WO2023109862A1 - Method for cooperatively playing audio during video playback and communication system - Google Patents

Method for cooperatively playing audio during video playback and communication system

Info

Publication number
WO2023109862A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
electronic device
video
playback
audio playback
Prior art date
Application number
PCT/CN2022/138988
Other languages
English (en)
Chinese (zh)
Inventor
张泰�
程力
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023109862A1



Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/403 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers loud-speakers

Definitions

  • the present application relates to the technical field of terminals, and in particular to a method and a communication system for cooperatively playing audio during video playback.
  • sounds in nature, such as human speech, thunder, or the sound of a train, are all stereophonic. When people hear these sounds, in addition to perceiving the loudness, pitch, and timbre of the sound, they can also identify the direction from which the sound comes.
  • when a video playback device plays audio synchronously with a video, it is often difficult for the user to distinguish the directions of the sounds of the different sounding objects in the audio. As a result, during video playback the user cannot quickly (or at all) place himself in the scene shown in the video, cannot experience the feeling of being at the scene, and the user experience is limited.
  • the present application provides a method and a communication system for cooperatively playing audio during video playing.
  • the technical solution provided by this application uses the positions of multiple audio playback devices, the position of the user, and the played video to make the audio playback devices simulate the orientations of the different sounding objects in the live environment of the video. The user can thus perceive the orientations of the different sounding objects and feel present in the live environment of the video, which increases the user's sense of immersion and engagement when watching the video and improves the user experience.
  • the present application provides a method for cooperatively playing audio during video playback.
  • the method can be applied to electronic devices.
  • the electronic device is capable of communicating with M audio playback devices.
  • The electronic device may include a display screen. M is a positive integer greater than or equal to 2.
  • the electronic device may acquire the first video.
  • the first video includes a first image group and a first audio within a first time period.
  • the electronic device may determine that the first video contains a first sounding object and a first background sound within the first time period, and separate, from the first audio, the first audio component of the first sounding object and the second audio component of the first background sound.
  • the electronic device may send the first message to a first audio playback device among the M audio playback devices.
  • the first message includes the first audio component and the first playing parameter, and the first message is used to instruct the first audio playing device to play the first audio component with the first playing parameter.
  • the electronic device may send a second message to a second audio playback device among the M audio playback devices. The second message includes the second audio component and a second playback parameter, and is used to instruct the second audio playback device to play the second audio component with the second playback parameter.
  • the electronic device may display the images in the first image group.
  • in this way, the directional sound effect with which the first audio playback device simulates the first sounding object matches the position of the first sounding object in the three-dimensional space presented by the video.
  • the observer can thus place himself in the scene presented by the video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
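  • As a minimal illustration of the first and second messages described above, the following Python sketch pairs a separated audio component with its playback parameters and sends it to one device. The field names, units, and the device.send() transport are assumptions made for illustration only; the application does not specify a message format.

```python
from dataclasses import dataclass

@dataclass
class PlaybackParams:
    start_time_ms: float   # when the device should start playing (the playback time)
    intensity_db: float    # target sound intensity for this component

@dataclass
class AudioMessage:
    audio_component: bytes   # separated audio of a sounding object or background sound
    params: PlaybackParams   # how the receiving device should play it

def dispatch(device, component: bytes, params: PlaybackParams) -> None:
    """Send one audio component plus its playback parameters to one device.
    `device` is any object exposing a send() method (a hypothetical interface)."""
    device.send(AudioMessage(component, params))
```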
  • the electronic device may further determine that the first video contains a second sounding object within the first time period, and separate the third audio component of the second sounding object from the first audio.
  • the electronic device may send a third message to a third audio playback device among the M audio playback devices. The third message includes the third audio component and a third playback parameter, and is used to instruct the third audio playback device to play the third audio component with the third playback parameter.
  • the above-mentioned first audio playback device and the second audio playback device may be different audio playback devices.
  • the electronic device can select different audio playback devices to simulate the sounds of different sounding objects.
  • The directional sound effect with which each audio playback device simulates its sounding object matches the position of that sounding object in the three-dimensional space presented by the video. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the first video may also include more sounding objects within the first time period.
  • for the method of separating the audio components of other sounding objects and of determining the audio playback devices and playing times for those audio components, refer to the processing of the first sounding object and the second sounding object described above; details are not repeated here.
  • the content contained in the first video in the first time period may be the content of part of the duration of the first video.
  • the first video is a video with a duration of 1 minute.
  • the above-mentioned first time period may be a time period of 0-5 seconds of the first video.
  • the content contained in the first video in the first time period may be the entire duration of the first video.
  • the first video is a video with a duration of 1 minute.
  • the above-mentioned first time period may be a time period from the beginning to the end of the first video.
  • the above-mentioned first audio playback device may be obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the electronic device can determine the position of the first sounding object in the first scene, and select an audio playback device close to the position of the first sounding object in the first scene to simulate the sounding of the first sounding object.
  • the closer an audio playback device is to the position of a sounding object, the easier it is, by adjusting the playback parameters with which that device plays the audio component of the sounding object, to make the observer perceive the sound as coming from the position of the sounding object. If the position of one or more of the observer, the audio playback devices, and the first sounding object changes in the first scene, the electronic device may re-select the audio playback device that simulates the sound of the first sounding object.
  • the first playback parameter is obtained based on the position of the first audio playback device in the first scene relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the first audio playback device plays the first audio component with the first playback parameter, which can make the observer feel that the sound is emitted from the position of the first sounding object in the first scene. The observer can then place himself in the scene presented by the video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the third audio playback device is obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the electronic device can determine the position of the second sounding object in the first scene, and select an audio playback device close to the position of the second sounding object in the first scene to simulate the sounding of the second sounding object.
  • the third playback parameter is obtained based on the position of the third audio playback device in the first scene relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the third audio playback device plays the third audio component with the third playback parameter, which can make the observer feel that the sound is emitted from the position of the second sounding object in the first scene. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the position corresponding to the position of the observer in the first scene may be the same position as the position of the observer in the first scene.
  • the observer is located at the first position in the first scene
  • the first audio playback device is located at the second position in the first scene.
  • the electronic device may obtain the third position of the first sounding object in the first scene based on the first position and on the position of the first sounding object relative to the virtual camera in the second scene.
  • the first angle is the smallest of the angles that take the first position as their vertex and are formed by the first position, the third position, and the positions of all or some of the M audio playback devices.
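  • The selection rule above admits a simple sketch: with the observer (first position) as vertex, compute the angle between the direction to the sounding object (third position) and the direction to each candidate device, then pick the device with the smallest angle. This is a minimal 2-D sketch under the assumption, stated earlier, that the virtual camera's position maps to the observer's position; the function names are illustrative.

```python
import math

def angle_at(vertex, a, b):
    """Angle in radians at `vertex` between the rays vertex->a and vertex->b."""
    ax, ay = a[0] - vertex[0], a[1] - vertex[1]
    bx, by = b[0] - vertex[0], b[1] - vertex[1]
    cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def object_position_in_room(observer, offset_from_camera):
    """Third position: place the virtual camera at the observer and apply the
    sounding object's offset relative to the camera in the second scene."""
    return (observer[0] + offset_from_camera[0], observer[1] + offset_from_camera[1])

def pick_device(observer, object_pos, device_positions):
    """Index of the device forming the smallest 'first angle' with the object."""
    return min(range(len(device_positions)),
               key=lambda i: angle_at(observer, object_pos, device_positions[i]))
```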
  • the second audio playback device is an audio playback device, among the M audio playback devices, that does not play any audio component of a sounding object contained in the first video within the first time period; or, the second audio playback device is an audio playback device, among the M audio playback devices, that plays audio components of sounding objects contained in the first video within the first time period.
  • the electronic device may preferentially select an audio playback device in an idle state to play the second audio component of the first background sound. This reduces cases in which one audio playback device plays too many audio components at the same time, makes more reasonable use of the M audio playback devices to achieve a better stereo effect, and helps the observer perceive the orientations of the different sounding objects during video playback.
  • the first image group includes one or more image frames
  • the first playback parameters include a first playback time and a first sound intensity
  • the second playing parameters include a second playing time and a second sound intensity.
  • adjusting the time at which an audio playback device plays the audio component of a sounding object adjusts the direction from which the observer perceives the sounding object's sound.
  • Adjusting the sound intensity with which an audio playback device plays the audio component of a sounding object adjusts the distance at which the observer perceives the sounding object.
  • in this way, when an audio playback device plays the audio component of a sounding object, the observer perceives the sound as emitted from the position of that sounding object in the first scene. The observer can thus place himself in the scene presented by the video and perceive the orientations of the different sounding objects.
  • the first playback time and the second playback time are within the first time period.
  • the above-mentioned first time period is a time period of 0-5 seconds of the first video.
  • the first audio playback device plays the first audio component
  • the second audio playback device plays the second audio component. That is, both the first playing time and the second playing time are within the time period of 0-5 seconds of the first video.
  • the electronic device instructs the audio playing device to adjust the playing time of playing the audio component of the sounding object.
  • the adjustment range of the above playback time is usually on the order of milliseconds. The first playing time and/or the second playing time may therefore fall slightly outside the first time period, as the sketch below illustrates.
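  • A minimal sketch of this timing adjustment, assuming a simple monotone model in which the delay grows with the observer-vertex angle described elsewhere in this application; the linear form and the constant are illustrative assumptions, not values from the application.

```python
def time_difference_ms(first_angle_rad: float, ms_per_rad: float = 0.5) -> float:
    """Assumed monotone model: the smaller the observer-vertex angle between
    the playback device and the sounding object, the smaller the delay."""
    return ms_per_rad * first_angle_rad

def adjusted_play_time_ms(nominal_ms: float, first_angle_rad: float) -> float:
    """Shift the component's nominal playback time by a millisecond-scale
    delay; the result may fall slightly outside the current time period."""
    return nominal_ms + time_difference_ms(first_angle_rad)
```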
  • the electronic device may send a fourth message to a fourth audio playback device among the M audio playback devices. The fourth message includes the second audio component and a fourth playback parameter, and is used to instruct the fourth audio playback device to play the second audio component with the fourth playback parameter.
  • the second audio playback device may be located on the first side of the electronic device
  • the fourth audio playback device may be located on the second side of the electronic device, where the first side and the second side are the two sides divided by the facing direction of the display screen of the electronic device.
  • having both the second audio playback device and the fourth audio playback device play the second audio component of the first background sound can better form stereo sound in the video playback environment and helps the observer perceive sounding objects making sounds in different directions.
  • the electronic device may obtain a first position according to the positions of multiple observers in the first scene, and the first position is used to represent the position of the observer watching the electronic device.
  • the above-mentioned first position may be the center of the positions of the plurality of observers.
  • the electronic device can take the positions of multiple observers into account when adjusting the sounding object assigned to each audio playback device and the playback parameters with which that device plays the sounding object's audio component.
  • each of the multiple observers mentioned above can thus place himself in the scene in which the sounding objects make sounds in the video, and perceive different sounding objects making sounds in different directions around him.
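  • One plausible reading of "the center of the positions of the plurality of observers" is the centroid, sketched below; other aggregations (for example a weighted center) would fit the description equally well.

```python
def listening_position(observer_positions):
    """First position: centroid of the positions of all observers."""
    n = len(observer_positions)
    return (sum(x for x, _ in observer_positions) / n,
            sum(y for _, y in observer_positions) / n)

# Example: three observers on a sofa are represented by one listening position.
print(listening_position([(0.0, 2.0), (0.5, 2.0), (1.0, 2.0)]))  # (0.5, 2.0)
```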
  • even if an observer moves, the electronic device can still adjust, in real time, the sounding objects assigned to the audio playback devices and the playback parameters with which those devices play the sounding objects' audio components. While moving, the observer can thus continue to follow the perspective of the virtual camera and perceive the sounding objects in the video making sounds in different directions.
  • the first scene is a scene where an observer watches an electronic device
  • the second scene is a scene presented by the first video. That is to say, the first scene may be equivalent to a scene in a real three-dimensional space.
  • the second scene may correspond to a scene in a virtual three-dimensional space.
  • the first video includes the second image group and the second audio within the second time period.
  • the electronic device may determine that the first video contains the first sounding object and a second background sound within the second time period, and separate, from the second audio, the fourth audio component of the first sounding object and the fifth audio component of the second background sound.
  • the electronic device may send a fifth message to the first audio playback device, where the fifth message includes the fourth audio component and fifth playback parameters, and the fifth message is used to instruct the first audio playback device to play the fourth audio component with the fifth playback parameter.
  • the electronic device sends a sixth message to the second audio playback device, the sixth message includes the fifth audio component and the sixth playback parameter, and the sixth message is used to instruct the second audio playback device to play the fifth audio component with the sixth playback parameter.
  • the electronic device may display images in the second image group.
  • the first background sound and the second background sound may be the same. In some other embodiments, the above-mentioned first background sound is different from the above-mentioned second background sound.
  • the electronic device can analyze the first video segment by segment, and detect in real time changes in the position of the sounding object, the position of the observer, and the position of the audio playback device during the content playback of each time segment of the first video.
  • when one or more of the position of the sounding object, the position of the observer, and the positions of the audio playback devices change, the electronic device can promptly adjust the audio playback devices that play the sounding object's audio components and the corresponding playback parameters, as in the sketch below.
  • the above-mentioned method can help the observer stay in the scene presented by the video during the whole playback process, following the perspective of the virtual camera and perceiving the sounding objects in the video making sounds in different directions.
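  • The following sketch strings the steps above into a per-segment loop. The callables separate_audio, locate_observer, compute_params, and display, and the device objects, are placeholders for operations the application describes only abstractly; pick_device is the angle-based selection sketched earlier.

```python
def play_video(segments, devices, separate_audio, locate_observer, compute_params, display):
    """Per-segment pipeline: separate audio, re-evaluate positions, dispatch, display."""
    for segment in segments:                             # successive time periods
        objects, background = separate_audio(segment)    # sounding objects + background sound
        observer = locate_observer()                     # positions may change between segments
        for obj in objects:
            i = pick_device(observer, obj.position, [d.position for d in devices])
            devices[i].send(obj.component, compute_params(observer, devices[i], obj))
        display(segment.images)                          # images of the segment's image group
```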
  • the electronic device can obtain, according to the first time period of the first video, the second playing time at which the second audio playback device plays the first background sound. The electronic device may calculate the second sound pressure of the second audio component and obtain, according to the relationship between sound pressure and sound intensity, the second sound intensity at which the second audio playback device plays the second audio component.
  • the electronic device may obtain, according to the first time period of the first video, the third playback time of the first audio component of the first sounding object. It can be understood that the third playback time is the original playback time of the first audio component in the first video.
  • the electronic device may determine the first time difference according to the first position, in the first scene, of the observer watching the electronic device, the second position of the first audio playback device in the first scene, and the third position of the first sounding object in the first scene.
  • the third position is obtained based on the position of the first sounding object in the second scene relative to the virtual camera of the first video, with the position corresponding to the position of the observer in the first scene determined as the position of the virtual camera in the second scene. Taking the first position as the vertex of the angle, the smaller the angle formed by the first position, the second position, and the third position, the smaller the first time difference.
  • the electronic device obtains, according to the sum of the third playing time and the first time difference, the first playing time at which the first audio playing device plays the first audio component.
  • the electronic device can determine the first sound pressure according to the second sound pressure, the first position, the second position, and the third position, wherein the closer the second position is to the first position relative to the third position, the smaller the first sound pressure; and the smaller the second sound pressure, the smaller the first sound pressure.
  • the electronic device may obtain the first sound intensity at which the first audio playback device plays the first audio component according to the first sound pressure and the relationship between the sound pressure and the sound intensity.
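  • Taken together, the rules above admit a simple concrete reading, sketched below: intensity follows the standard acoustics relation I = p^2/(rho*c); the first sound pressure scales the second sound pressure by the device-to-observer versus object-to-observer distance ratio (an inverse-distance assumption); and the first playing time is the original playing time plus the first time difference. The constant and the inverse-distance model are illustrative assumptions, not values from the application.

```python
RHO_C = 413.0  # characteristic acoustic impedance of air, approx. 413 kg/(m^2*s) at 20 C

def intensity_from_pressure(p_rms: float) -> float:
    """Relationship between sound pressure and sound intensity: I = p^2 / (rho*c)."""
    return p_rms ** 2 / RHO_C

def first_sound_pressure(p_second: float, d_device: float, d_object: float) -> float:
    """Monotone model consistent with the rules above: the closer the device is
    to the observer relative to the sounding object, the lower the pressure;
    a smaller second sound pressure also gives a smaller first sound pressure."""
    return p_second * (d_device / d_object)

def first_play_time_ms(third_play_time_ms: float, first_time_diff_ms: float) -> float:
    """First playing time = original (third) playing time + first time difference."""
    return third_play_time_ms + first_time_diff_ms
```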
  • the fourth playback parameter includes a fourth playback time and a third sound intensity.
  • the above-mentioned fourth playing parameter is a playing parameter for playing the second audio component of the first background sound by the fourth audio playing device.
  • the electronic device may obtain the fourth playing time according to the second playing time and the positions, in the first scene, of the second audio playback device and the fourth audio playback device relative to the observer watching the electronic device, wherein the closer the fourth audio playback device is to the observer than the second audio playback device, the later the fourth playing time is than the second playing time.
  • the electronic device may obtain the third sound intensity according to the second sound intensity and the positions, in the first scene, of the second audio playback device and the fourth audio playback device relative to the observer watching the electronic device, wherein the closer the fourth audio playback device is to the observer than the second audio playback device, the smaller the third sound intensity is than the second sound intensity.
  • in this way, the first background sound played by the multiple audio playback devices can reach the observer at the same time, and the sound intensity reaching the observer is the same.
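  • A minimal free-field sketch of this equalization, assuming straight-line propagation delay and inverse-square spreading; both assumptions are illustrative and are not stated in the application.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C

def background_params(t_second_ms, i_second, d_second, d_fourth):
    """Delay and attenuate the nearer device so the background sound from both
    devices reaches the observer simultaneously and with the same intensity.
    d_second, d_fourth: device-to-observer distances in metres."""
    t_fourth_ms = t_second_ms + (d_second - d_fourth) / SPEED_OF_SOUND * 1000.0
    i_fourth = i_second * (d_fourth / d_second) ** 2  # inverse-square compensation
    return t_fourth_ms, i_fourth
```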
  • one audio playing device can play audio components of multiple sounding objects simultaneously.
  • An audio playback device can also simultaneously play the audio component of the sounding object and the second audio component of the first background sound.
  • the audio playback device may play the corresponding audio component according to the playback parameters sent by the electronic device.
  • the position, in the first scene, of the observer watching the electronic device changes. For example, while the first video plays from the content of the first time period to the content of the second time period, the observer moves from the first position to a fourth position; when the observer is at the fourth position, the content of the second time period of the first video is playing.
  • the first video contains a second image group and a second audio within a second time period.
  • the above-mentioned first time period and second time period may be adjacent time periods in the first video.
  • the electronic device determines that the first video contains the first sounding object within the second time period, and separates a fourth audio component of the first sounding object from the second audio.
  • the electronic device may send a seventh message to a fifth audio playback device among the M audio playback devices. The seventh message includes the fourth audio component and a seventh playback parameter, and is used to instruct the fifth audio playback device to play the fourth audio component with the seventh playback parameter.
  • the electronic device may display images in the second image group.
  • the above-mentioned seventh playback parameter may be obtained based on the position of the fifth audio playback device in the first scene relative to the fourth position, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the fourth position.
  • the fifth audio playback device is obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the fourth position, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the fourth position.
  • in this way, when the position of the observer changes, the electronic device can adjust the audio playback devices and playback parameters that simulate the sounds of the sounding objects. After changing position, the observer can still place himself in the scene presented by the video, follow the perspective of the virtual camera, and perceive the orientations of the different sounding objects.
  • the above method can increase the observer's sense of immersion and engagement when watching a video and improve the observer's experience.
  • the first video includes a third image group and a third audio within a third time period.
  • the third time period is after the first time period described above.
  • the electronic device may determine that the first video contains the first sounding object within the third time period, and separate the sixth audio component of the first sounding object from the third audio.
  • the position of the first sounding object in the first scene changes from the third position to the fifth position.
  • the third position is the position of the first sounding object within the first time period of the first video
  • the fifth position is the position of the first sounding object within the third time period of the first video.
  • the electronic device may send an eighth message to a sixth audio playback device among the M audio playback devices. The eighth message includes the sixth audio component and an eighth playback parameter, and is used to instruct the sixth audio playback device to play the sixth audio component with the eighth playback parameter.
  • the electronic device may display images in the third image group.
  • the above-mentioned eighth playback parameter may be obtained based on the position of the sixth audio playback device in the first scene relative to the observer watching the electronic device, the fifth position, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the above-mentioned sixth audio playback device is obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the fifth position, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the electronic device can adjust the audio playback device and playback parameters that simulate the sounding of the sounding object.
  • the observer can place himself in the scene presented by the video and perceive the change in the position of the sounding object.
  • the above method can increase the observer's sense of immersion and engagement when watching a video and improve the observer's experience.
  • the present application provides a method for cooperatively playing audio during video playback.
  • the method can be applied to a communication system including an electronic device and M audio playback devices, the electronic device includes a display screen, and M is a positive integer greater than or equal to 2.
  • the electronic device may acquire a first video, and the first video includes a first image group and a first audio within a first time period.
  • the electronic device may determine that the first video contains a first sounding object and a first background sound within the first time period, and separate, from the first audio, the first audio component of the first sounding object and the second audio component of the first background sound.
  • the electronic device may send a first message to a first audio playback device among the M audio playback devices. The first message includes the first audio component and a first playback parameter, and is used to instruct the first audio playback device to play the first audio component with the first playback parameter.
  • the electronic device may send a second message to a second audio playback device among the M audio playback devices. The second message includes the second audio component and a second playback parameter, and is used to instruct the second audio playback device to play the second audio component with the second playback parameter.
  • the electronic device may display the images in the first image group, the first audio playback device may play the first audio component synchronously with the first playback parameters according to the first message, and the second audio The playback device can play the second audio component synchronously with the second playback parameter according to the second message.
  • in this way, the directional sound effect with which the first audio playback device simulates the first sounding object matches the position of the first sounding object in the three-dimensional space presented by the video.
  • the observer can thus place himself in the scene presented by the video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the electronic device may further determine that the first video contains a second sounding object within the first time period, and separate the third audio component of the second sounding object from the first audio.
  • the electronic device may send a third message to a third audio playback device among the M audio playback devices. The third message includes the third audio component and a third playback parameter, and is used to instruct the third audio playback device to play the third audio component with the third playback parameter.
  • the third audio playing device may play the third audio component synchronously with the third playing parameter according to the third message.
  • the electronic device can select an audio playback device to simulate different sounding objects making sounds.
  • The directional sound effect with which each audio playback device simulates its sounding object matches the position of that sounding object in the three-dimensional space presented by the video. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the first audio playback device may be obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the electronic device can determine the position of the first sounding object in the first scene, and select an audio playback device close to the position of the first sounding object in the first scene to simulate the sounding of the first sounding object. If the position of one or more of the observer, the audio playback device, and the first sound-emitting object changes in the first scene, the electronic device may re-select the audio playback device that simulates the sound of the first sound-emitting object.
  • the first playback parameter may be obtained based on the position of the first audio playback device in the first scene relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the first audio playback device plays the first audio component with the first playback parameter, which can make the observer feel that the sound is emitted from the position of the first sounding object in the first scene. The observer can then place himself in the scene presented by the video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the third audio playback device is obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the third playback parameter is obtained based on the position of the third audio playback device in the first scene relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the first video may also include more sounding objects within the first time period.
  • for the method of separating the audio components of other sounding objects and of determining the audio playback devices and playing times for those audio components, refer to the processing of the first sounding object and the second sounding object described above; details are not repeated here.
  • the position corresponding to the position of the observer in the first scene may be the same position as the position of the observer in the first scene.
  • the observer is located at the first position in the first scene
  • the first audio playback device is located at the second position in the first scene.
  • the electronic device may obtain the third position of the first sounding object in the first scene based on the first position and on the position of the first sounding object relative to the virtual camera in the second scene.
  • the first angle is the smallest of the angles that take the first position as their vertex and are formed by the first position, the third position, and the positions of all or some of the M audio playback devices.
  • the second audio playback device is an audio playback device, among the M audio playback devices, that does not play any audio component of a sounding object contained in the first video within the first time period; or, the second audio playback device is an audio playback device, among the M audio playback devices, that plays audio components of sounding objects contained in the first video within the first time period.
  • the electronic device may preferentially select an audio playback device in an idle state to play the second audio component of the first background sound. This reduces cases in which one audio playback device plays too many audio components at the same time, makes more reasonable use of the M audio playback devices to achieve a better stereo effect, and helps the observer perceive the orientations of the different sounding objects during video playback.
  • the first image group includes one or more image frames
  • the first playback parameters include a first playback time and a first sound intensity
  • the second playing parameters include a second playing time and a second sound intensity.
  • the first playback time and the second playback time are within the first time period. In some other embodiments, the observer's perception of the sounding direction of a sounding object needs to be adjusted.
  • the electronic device instructs the audio playing device to adjust the playing time of playing the audio component of the sounding object.
  • the adjustment range of the above playback time is usually on the order of milliseconds. The first playing time and/or the second playing time may therefore fall slightly outside the first time period.
  • the electronic device may send a fourth message to a fourth audio playback device among the M audio playback devices. The fourth message includes the second audio component and a fourth playback parameter, and is used to instruct the fourth audio playback device to play the second audio component with the fourth playback parameter.
  • the fourth audio playing device may play the second audio component synchronously with the fourth playing parameter according to the fourth message.
  • the second audio playback device may be located on the first side of the electronic device
  • the fourth audio playback device may be located on the second side of the electronic device, where the first side and the second side are the two sides divided by the facing direction of the display screen of the electronic device.
  • having both the second audio playback device and the fourth audio playback device play the second audio component of the first background sound can better form stereo sound in the video playback environment and helps the observer perceive sounding objects making sounds in different directions.
  • the electronic device may obtain a first position according to the positions of multiple observers in the first scene, and the first position is used to represent the position of the observer watching the electronic device.
  • the above-mentioned first position may be the center of the positions of the plurality of observers.
  • the electronic device can take the positions of multiple observers into account when adjusting the sounding object assigned to each audio playback device and the playback parameters with which that device plays the sounding object's audio component.
  • each of the multiple observers mentioned above can thus place himself in the scene in which the sounding objects make sounds in the video, and perceive different sounding objects making sounds in different directions around him.
  • even if an observer moves, the electronic device can still adjust, in real time, the sounding objects assigned to the audio playback devices and the corresponding playback parameters, so that the observer continues to follow the perspective of the virtual camera and to perceive the sounding objects in the video making sounds in different directions.
  • the first scene is a scene where an observer watches an electronic device
  • the second scene is a scene presented by the first video
  • the first video includes the second image group and the second audio within the second time period.
  • the electronic device may determine that the first video contains the first sounding object and a second background sound within the second time period, and separate, from the second audio, the fourth audio component of the first sounding object and the fifth audio component of the second background sound.
  • the electronic device may send a fifth message to the first audio playback device, where the fifth message includes the fourth audio component and fifth playback parameters, and the fifth message is used to instruct the first audio playback device to play the fourth audio component with the fifth playback parameter.
  • the electronic device sends a sixth message to the second audio playback device, the sixth message includes the fifth audio component and the sixth playback parameter, and the sixth message is used to instruct the second audio playback device to play the fifth audio component with the sixth playback parameter.
  • the electronic device can display images in the second image group, and the first audio playback device can play the fourth audio component synchronously with the fifth playback parameters according to the fifth message, and the second audio The playback device may play the fifth audio component synchronously with the sixth playback parameter according to the sixth message.
  • the first background sound and the second background sound may be the same. In some other embodiments, the above-mentioned first background sound is different from the above-mentioned second background sound.
  • the electronic device can analyze the first video segment by segment, and detect in real time changes in the position of the sounding object, the position of the observer, and the position of the audio playback device during the content playback of each time segment of the first video.
  • when one or more of the position of the sounding object, the position of the observer, and the positions of the audio playback devices change, the electronic device can promptly adjust the audio playback devices that play the sounding object's audio components and the corresponding playback parameters.
  • the above-mentioned method can help the observer stay in the scene presented by the video during the whole playback process, following the perspective of the virtual camera and perceiving the sounding objects in the video making sounds in different directions.
  • the present application provides a method for cooperatively playing audio during video playback, and the method can be applied to electronic devices.
  • the electronic device is capable of communicating with M audio playback devices.
  • Electronic devices may include display screens.
  • M is a positive integer greater than or equal to 2.
  • the electronic device may acquire a first video, and the first video includes a first image group and a first audio within a first time period.
  • the electronic device may determine that the first video contains a first sounding object and a second sounding object within the first time period, and separate, from the first audio, the first audio component of the first sounding object and the third audio component of the second sounding object.
  • the electronic device may send a first message to a first audio playback device among the M audio playback devices. The first message includes the first audio component and a first playback parameter, and is used to instruct the first audio playback device to play the first audio component with the first playback parameter.
  • the electronic device may send a third message to a third audio playback device among the M audio playback devices. The third message includes the third audio component and a third playback parameter, and is used to instruct the third audio playback device to play the third audio component with the third playback parameter.
  • the electronic device may display images in the first image group.
  • in this way, the directional sound effect of the audio played by each audio playback device matches the position, in the three-dimensional space presented by the first video, of the sounding object that the device simulates.
  • the observer can thus place himself in the scene presented by the first video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the above-mentioned first audio playback device may be obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the electronic device can determine the position of the first sounding object in the first scene, and select an audio playback device close to the position of the first sounding object in the first scene to simulate the sounding of the first sounding object. If the position of one or more of the observer, the audio playback device, and the first sound-emitting object changes in the first scene, the electronic device can reselect the audio playback device that simulates the sound of the first sound-emitting object.
  • the first playback parameter is obtained based on the position of the first audio playback device in the first scene relative to the observer watching the electronic device, the position of the first sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the first audio playback device plays the first audio component with the first playback parameter, which can make the observer feel that the sound is emitted from the position of the first sounding object in the first scene. The observer can then place himself in the scene presented by the video and perceive the orientations of the different sounding objects more realistically. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the third audio playback device is obtained based on the positions, in the first scene, of all or some of the M audio playback devices relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the third playback parameter is obtained based on the position of the third audio playback device in the first scene relative to the observer watching the electronic device, the position of the second sounding object in the second scene relative to the virtual camera of the first video, and by determining, as the position of the virtual camera in the second scene, the position corresponding to the position of the observer in the first scene.
  • the third audio playback device plays the third audio component with the third playback parameter, which can make the observer feel that the sound is emitted from the position of the second sounding object in the first scene. This can increase the observer's sense of immersion and engagement when watching the video and improve the observer's experience.
  • the first video may also include more sounding objects within the first time period.
  • for the method of separating the audio components of other sounding objects and of determining the audio playback devices and playing times for those audio components, refer to the processing of the first sounding object and the second sounding object described above; details are not repeated here.
  • the position corresponding to the position of the observer in the first scene may be the same position as the position of the observer in the first scene.
  • the first image group includes one or more image frames
  • the playback parameters include playback time and sound intensity.
  • the first playback parameter includes the first playback time and the first sound intensity.
  • the first scene is a scene where an observer watches an electronic device
  • the second scene is a scene presented by the first video
  • the electronic device may obtain a first position according to the positions of multiple observers in the first scene, and the first position is used to represent the position of the observer watching the electronic device.
  • the above-mentioned first position may be the center of the positions of the plurality of observers.
  • the electronic device can take the positions of multiple observers into account when adjusting the sounding object assigned to each audio playback device and the playback parameters with which that device plays the sounding object's audio component.
  • each of the multiple observers mentioned above can thus place himself in the scene in which the sounding objects make sounds in the video, and perceive different sounding objects making sounds in different directions around him.
  • even if an observer moves, the electronic device can still adjust, in real time, the sounding objects assigned to the audio playback devices and the corresponding playback parameters, so that the observer continues to follow the perspective of the virtual camera and to perceive the sounding objects in the video making sounds in different directions.
  • the present application provides an electronic device.
  • the electronic device includes memory and a processor.
  • memory can be used to store computer programs.
  • the processor may be used to invoke the above computer program, so that the electronic device executes any possible implementation method in the first aspect or the third aspect.
  • the present application provides a communication system. The communication system includes an electronic device and M audio playback devices, the electronic device includes a display screen, M is a positive integer greater than or equal to 2, and the M audio playback devices include a first audio playback device and a second audio playback device.
  • the electronic device may be used to execute any possible implementation method in the first aspect or the third aspect.
  • the first audio playback device may be configured to synchronously play the first audio component with the first playback parameter during the playback of the first video during the first time period.
  • the second audio playback device may be used to synchronously play the second audio component with the second playback parameter during the playback of the first video during the first time period.
  • the present application provides a computer-readable storage medium.
  • the computer readable storage medium stores a computer program.
  • when the computer program runs on an electronic device, the electronic device is made to execute any possible implementation method of the first aspect or the third aspect.
  • the present application provides a computer program product
  • the computer program product may contain computer instructions, and when the computer instructions run on an electronic device, the electronic device executes any possible implementation method of the first aspect or the third aspect.
  • the present application provides a chip applied to an electronic device. The chip includes one or more processors, and the processors are used to invoke computer instructions so that the electronic device executes any possible implementation method of the first aspect or the third aspect.
  • the electronic device provided by the fourth aspect, the communication system provided by the fifth aspect, the computer-readable storage medium provided by the sixth aspect, the computer program product provided by the seventh aspect, and the chip provided by the eighth aspect are all used to execute the methods provided in the embodiments of this application. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods; details are not repeated here.
  • FIG. 1 is a schematic diagram of a scenario of cooperatively playing audio during video playback provided by an embodiment of the present application
  • Fig. 2A-Fig. 2D are some schematic diagrams of the principle of identifying the direction of the sound source provided by the embodiment of the present application;
  • FIG. 3A is a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application.
  • FIG. 3B is a schematic structural diagram of an audio playback device 200 provided in an embodiment of the present application.
  • FIG. 4 is a flow chart of a method for cooperatively playing audio during video playback provided by an embodiment of the present application
  • FIG. 5A is a flow chart of a method for determining a sounding object contained in a video and an audio component of the sounding object provided by an embodiment of the present application;
  • Fig. 5B is a schematic diagram of an image recognition provided by an embodiment of the present application.
  • FIG. 5C is a schematic diagram of a separated audio component provided by an embodiment of the present application.
  • Fig. 5D is a flowchart of a method for determining the sounding object contained in the video and the audio component of the sounding object provided by the embodiment of the present application;
  • FIG. 5E is a flow chart of another method for determining the sounding object contained in the video and the audio component of the sounding object provided by the embodiment of the present application;
  • Fig. 5F is a flow chart of another method for determining the sounding object contained in the video and the audio component of the sounding object provided by the embodiment of the present application;
  • Fig. 6A is a schematic diagram of a method for determining the position of the sounding object and the position of the virtual camera provided by the embodiment of the present application;
  • Figure 6B and Figure 6C are schematic diagrams of the positional relationship between some sound objects and virtual cameras provided by the embodiment of the present application.
  • FIG. 6D is a flow chart of a method for acquiring the location of an audio playback device provided in an embodiment of the present application.
  • FIG. 6E is a schematic diagram of the positional relationship between an observer, an electronic device 100, and an audio playback device according to an embodiment of the present application;
  • FIG. 6F is a schematic diagram of the positional relationship between an observer, an electronic device 100, an audio playback device, and a sounding object provided by an embodiment of the present application;
  • Figures 7A to 7C are schematic diagrams of determining the sounding objects and playback parameters corresponding to the audio playback device provided by the embodiment of the present application;
  • FIG. 8A and FIG. 8B are schematic diagrams of the positional relationship among other observers, electronic equipment 100, audio playback equipment, and sounding objects provided by the embodiment of the present application;
  • FIG. 9A is a schematic diagram of another positional relationship among an observer, an electronic device 100, and an audio playback device provided by an embodiment of the present application;
  • FIG. 9B is a schematic diagram of the positional relationship between another sounding object and a virtual camera provided by the embodiment of the present application.
  • FIG. 9C is another schematic diagram of the positional relationship between the observer, the electronic device 100, the audio playback device, and the sounding object provided by the embodiment of the present application;
  • FIG. 9D and FIG. 9E are schematic diagrams of scenarios of a method for a user to select cooperative audio playback during video playback provided by an embodiment of the present application.
  • FIG. 10 to FIG. 12 are schematic structural diagrams of some communication systems provided by the embodiments of the present application.
  • "A and/or B" may indicate three cases: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural.
  • the character "/" generally indicates an "or" relationship between the objects before and after it.
  • references to "one embodiment” or “some embodiments” or the like in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments," unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • the term “connected” includes both direct and indirect connections, unless otherwise stated. "First” and “second” are used for descriptive purposes only, and should not be understood as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features.
  • a device for playing video may include an electronic device 100 and one or more audio playback devices.
  • the audio playback device may include audio playback device 200 , audio playback device 201 , audio playback device 202 , and audio playback device 203 .
  • the audio playback device 200 , the audio playback device 201 , the audio playback device 202 , and the audio playback device 203 can all establish a communication connection with the electronic device 100 .
  • the aforementioned communication connection may include a wired communication connection, or may include a wireless communication connection such as a Bluetooth communication connection, a wireless fidelity (Wi-Fi) communication connection, a ZigBee communication connection, and the like.
  • the embodiment of the present application does not limit the specific manner of establishing the foregoing communication connection.
  • the electronic device 100 may be a device with a display screen (such as a television, a monitor, etc.), and may be used to display images in a video.
  • the electronic device 100 has an audio output device (such as a speaker), and the electronic device 100 can also be used to play the audio in the video.
  • the audio playback devices 200-203 may be devices with audio output devices (such as speakers) such as speakers and power amplifiers, and may be used to cooperatively play audio in the video during video playback.
  • the audio playback devices 200 - 203 may be distributed on both sides (such as left and right) of the electronic device 100 .
  • the audio playback devices 200-203 may be devices of the same model or different models.
  • the electronic device 100 may instruct one or more of the audio playback devices 200-203 to play the audio in the video cooperatively, so as to bring a better listening experience to the user.
  • the observer shown in FIG. 1 may be a user watching a video.
  • the user can browse the video screen through the electronic device 100 and listen to the audio in the video through the audio playback devices 200-203.
  • the user can experience the loudness, pitch and timbre of the sound produced by the audio playback devices 200-203 through their own ears. Relying on both ears, the user can also distinguish the direction of the sound source and determine the location of the sound source.
  • the location of the sound source may be the location of one or more audio playback devices among the audio playback devices 200-203.
  • the user can identify the position of a sound source by hearing, mainly by perceiving at least one of the time difference, phase difference, sound level difference, and timbre difference of the sound reaching the two ears.
  • the above-mentioned time difference may represent the difference in the time when the sound arrives at the two ears.
  • the aforementioned phase difference may represent a difference in phase between sound waves of sounds heard by the two ears.
  • the above sound level difference may represent the difference in the sound intensity heard by the two ears.
  • the above-mentioned timbre difference may represent the difference in timbre of the sounds heard by the two ears.
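  • As an illustration of the time and level cues just defined, the sketch below estimates the interaural time and level differences for a point source; the geometry, function names, and free-field 1/r falloff are illustrative assumptions, not part of the present application.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, approximate speed of sound in air

def interaural_cues(source, left_ear, right_ear):
    """Illustrative estimate of the time and level differences between
    the two ears for a point sound source (free-field assumption)."""
    d_left = math.dist(source, left_ear)
    d_right = math.dist(source, right_ear)
    # Time difference: the sound reaches the nearer ear first.
    time_diff = (d_left - d_right) / SPEED_OF_SOUND  # seconds, > 0 means right ear first
    # Level difference: pressure amplitude falls off roughly as 1/distance,
    # so the nearer ear hears a louder sound (expressed in dB here).
    level_diff_db = 20.0 * math.log10(d_left / d_right)
    return time_diff, level_diff_db

# Sound source at the user's front right, ears 0.2 m apart:
t, l = interaural_cues(source=(1.0, -1.0), left_ear=(-0.1, 0.0), right_ear=(0.1, 0.0))
print(f"time difference: {t * 1000:.3f} ms, level difference: {l:.2f} dB")
```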
  • the following describes how the user distinguishes the direction of a sound source by perceiving the differences between the sounds arriving at the two ears.
  • FIG. 2A shows a scenario in which a sound reaches a user according to an embodiment of the present application.
  • the sound source is at the lower right of the user (also referred to as the user's front right).
  • the distance of the sound source from the user's right ear is smaller than the distance of the sound source from the user's left ear.
  • the user's right ear hears the sound generated by the above-mentioned sound source before the left ear. Moreover, the farther a sound travels in the air, the more energy it loses; therefore, the sound intensity heard by the user's right ear is greater than that heard by the left ear.
  • the user can judge that the sound source is located on his right according to one or more of the above time difference and sound level difference.
  • the sound wave of the sound may also have a phase difference.
  • FIG. 2B shows a phase diagram of a signal provided by the embodiment of the present application.
  • Fig. 2B shows the phase of the sound wave by taking a sine wave as an example.
  • Phase is a measure that describes changes in the waveform of a signal. Phase can be measured in degrees (°).
  • the signal waveform changes periodically, and one cycle of the signal waveform corresponds to 360°.
  • an ideal sine wave can change its phase from 0° to 360° in a complete cycle, experience a peak and a trough, and finally return to the original position.
  • if the sine wave starts from the positive-going zero crossing, the phase of the peak is 90°, the phase of the trough is 270°, and the phase difference between the peak and the trough is 180°. Since the sound generated by a sound source reaches the user's two ears at different times, the phases of the sound waves arriving at the two ears may be different.
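  • For a pure tone, the arrival-time difference between the two ears translates into a phase difference by the standard relation below (an illustrative formula, with f the tone frequency and Δt the arrival-time difference):

```latex
\Delta\varphi = 360^{\circ} \cdot f \cdot \Delta t \pmod{360^{\circ}}
```

  For example, a 500 Hz tone arriving 0.5 ms later at one ear exhibits a phase difference of 360° × 500 × 0.0005 = 90°.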
  • FIG. 2C and FIG. 2D show other scenarios in which sounds reach the user provided by the embodiments of the present application.
  • the phase when the sound wave reaches the user's right ear is phase A1
  • the phase when the sound wave reaches the user's left ear is phase A2 .
  • Phase A1 and phase A2 are different.
  • the human ear can perceive the above-mentioned phase difference, and the user can then judge the direction of the sound source according to that phase difference.
  • sound waves are also subject to a shielding effect (also called a masking effect) when they encounter obstacles during propagation; whether a sound wave is blocked by an obstacle is related to its wavelength.
  • the higher the frequency of a sound wave, the shorter its wavelength.
  • the wavelength of a sound wave with a frequency of 20 hertz (Hz) at normal temperature is 17 meters (m)
  • the wavelength of a sound wave with a frequency of 200 Hz is 1.7 m.
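  • These wavelengths follow from the standard relation between wavelength, sound speed, and frequency; taking C = 340 m/s as above:

```latex
\lambda = \frac{C}{f}, \qquad \frac{340\ \text{m/s}}{20\ \text{Hz}} = 17\ \text{m}, \qquad \frac{340\ \text{m/s}}{200\ \text{Hz}} = 1.7\ \text{m}
```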
  • the sound generated by the sound source is not disturbed by obstacles when it reaches the user's right ear.
  • the user's right ear can hear all the sound waves (such as high-frequency sound waves, low-frequency sound waves) in the sound.
  • the sound generated by the sound source is disturbed by the user's head while reaching the user's left ear. Some of the high-frequency sound waves in the sound may be blocked from reaching the user's left ear. The user's left ear cannot hear the above-mentioned blocked high-frequency sound waves. Then, the frequency of the sound heard by the user's left ear and the right ear is different, that is, the timbre of the sound heard by both ears is different.
  • the propagation process of the sound from the front and rear of the user into the ear canal is different. Among them, the sound generated by the sound source located in front of the user can be reflected by the pinna and directly enter the ear canal. Sound from a sound source located behind the user needs to bypass the pinna to enter the ear canal. It can be seen that the sound generated by the sound source located behind the user will be blocked by the pinna during transmission. Then, part of the high-frequency sound waves from the rear sound may not be able to enter the user's ear canal.
  • the sound heard by the user will have a timbre difference.
  • the auditory area of the brain can compare the timbre of the sound it hears with sound characteristics it has previously learned, and determine whether the sound source is in front of or behind the user.
  • the electronic device 100 may display images in the video through a display screen, and the audio playback devices 200-203 may play audio in the video synchronously.
  • for the user, the sound source is the above-mentioned audio playback devices.
  • the above-mentioned played video may contain multiple speaking objects (for example, the above-mentioned video is a video recording a conversation among multiple people, and the speaking objects in the video include multiple people).
  • the sounds of the multiple sounding objects are all transmitted to the user by the audio playback devices during video playback, so it is difficult for the user to determine the orientations of the multiple sounding objects by hearing their sounds. That is, the user can hardly immerse himself in the video scene or feel that different sounding objects are making sounds from different directions, which limits the user experience.
  • the present application provides a method for cooperatively playing audio during video playback.
  • the electronic device 100 may analyze the image and audio included in the video to be played, and determine the sounding object and the audio component of the sounding object in the video.
  • the electronic device 100 may determine the positions of the sounding object and the virtual camera according to the images included in the video.
  • the position of the above-mentioned virtual camera can represent the position of the camera during the actual shooting of the above-mentioned video to be played.
  • the electronic device 100 can determine the positions of the observer and the audio playback device.
  • the electronic device 100 may determine the positional relationship between the sounding object in the video and the observer by taking the position of the above-mentioned virtual camera as the position of the observer.
  • the electronic device 100 may instruct the audio playing device to simulate the sound of the sounding object according to the positional relationship among the audio playing device, the sounding object and the observer.
  • the electronic device 100 may instruct the audio playback device to adjust playback parameters (such as playback time, sound intensity, etc.) and to play the audio component with the adjusted playback parameters so as to simulate the sounding of the sounding object, so that the position of the sound source perceived by the user when the audio playback device emits the sound is the position of the sounding object simulated by that audio playback device.
  • the user can substitute himself into the scene where the sounding object is making a sound in the video, and feel that different sounding objects are making sounds in different directions.
  • the above method can enhance the user's sense of immersion and enjoyment when watching a video, and improve the user experience.
  • the above-mentioned electronic device 100 may be a display device with an image display function.
  • the device used to analyze the image and audio contained in the video and instruct the audio playback device to simulate the sound of the sounding object may be other electronic devices (for example, video analysis equipment and control equipment).
  • the above video analysis device can be used to analyze the images and audio included in the video.
  • the above-mentioned control device can be used to instruct the electronic device 100 to display images in the video, and instruct the audio playback device to simulate the sounding object to make a sound.
  • the communication system to which the method for cooperatively playing audio in video playing is applied may include the electronic device 100 and one or more audio playing devices.
  • the communication system may further include the above-mentioned video analysis device and control device.
  • the embodiment of the present application does not limit the devices included in the communication system.
  • the communication system composed of the electronic device 100 and the audio playback device is taken as an example for illustration.
  • Video in this application may refer to multimedia data including images and audio.
  • the video may be collected by a camera, a mobile phone, a tablet computer, a notebook computer, a TV, and other devices equipped with a camera.
  • the video may also be obtained by synthesizing multiple frames of images.
  • the embodiment of the present application does not limit the method of generating the video.
  • the file format of the video can include avi, mp4, mov, wmv, etc.
  • the embodiment of the present application does not limit the file format of the video.
  • Video playback equipment (such as mobile phones, televisions, etc.) has a display screen and an audio output device.
  • the video playback device can play images contained in the video at a preset frame rate, such as 24 frames per second, 30 frames per second, 60 frames per second, and so on.
  • the aforementioned 24 frames per second may mean that the video playback device continuously displays 24 frames of images on the display screen per second.
  • the images in the video have a temporal correspondence with the audio.
  • when the video playback device plays the video according to the above-mentioned temporal correspondence, the video picture and the sound can be synchronized.
  • the display time of an image is the same as the playing time of the audio corresponding to that image, or the interval between the display time of the image and the playing time of its corresponding audio is less than a preset time interval.
  • the above-mentioned preset time interval may be the maximum time difference, such as 100 milliseconds, for the user to watch the video without feeling that the video picture is out of sync with the sound.
  • the video picture and sound are synchronized in the user's perception. For example, if the user can hear the voice of a person speaking while seeing a person speaking on the video screen, then the video screen and the sound can be considered to be synchronous.
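  • A minimal sketch of such a synchronization check, assuming timestamps in milliseconds and the 100 ms threshold mentioned above:

```python
MAX_AV_DESYNC_MS = 100  # illustrative threshold from the description above

def is_av_synchronized(image_display_ms: int, audio_play_ms: int) -> bool:
    """Return True if the picture and its corresponding audio are close
    enough in time that a viewer perceives them as synchronous."""
    return abs(image_display_ms - audio_play_ms) < MAX_AV_DESYNC_MS

print(is_av_synchronized(10_000, 10_060))  # True: 60 ms apart
print(is_av_synchronized(10_000, 10_150))  # False: 150 ms apart
```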
  • the images in the video may be two-dimensional images. Objects contained in images in a video can be located in a three-dimensional space. Objects contained in the image can be in different positions in this three-dimensional space.
  • the virtual camera may be a camera that shoots an object in a three-dimensional space as the object included in the image in the above-mentioned video. That is, the images in the video can be considered as captured by the virtual camera.
  • the above three-dimensional reconstruction can construct the three-dimensional space in which the video was shot, so as to determine the positional relationship, in that three-dimensional space, between the user watching the video and the sounding objects in the video. In this way the user can feel as if the different sounding objects in the video are making sounds from different directions around him, placing the user in the scene.
  • a sounding object may refer to an object capable of making sound.
  • for example, a sounding object may be a person; an animal such as a cat or a dog; a vehicle such as a train or an automobile; or a natural phenomenon such as a waterfall, rain, thunder, or the sea.
  • the embodiment of the present application does not limit the type of the speaking object.
  • the playback parameter may represent a parameter for the audio playback device to play audio.
  • An audio playback device may be a device for playing audio.
  • Playing parameters may include playing time and sound intensity. Sound intensity can also be called volume.
  • Sound effects may include the user's subjective perception of sound, such as loudness, pitch, and timbre. Sound effects can also include objective physical quantities of sound, such as phase and sound pressure of sound waves.
  • when the audio playback device increases the sound intensity of the played audio, the sound heard by the user becomes louder, and the sound pressure of the sound wave reaching the user's ears becomes greater.
  • when a sound wave passes through a propagation medium, the pressure change caused by the vibration of the sound wave is the sound pressure. That is to say, the sound pressure represents the pressure change caused by the sound wave as it travels through the medium.
  • the magnitude of the sound pressure can be represented by the sound pressure level (SPL). Sound pressure has the following relationship with sound intensity: p = √(I·ρ·C), where p represents the sound pressure, I represents the sound intensity, ρ represents the density of the medium, and C represents the speed of sound. In air, C can take a value of 340 m/s. It can be seen that, for the same medium density, the greater the sound intensity, the greater the sound pressure.
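  • As an illustrative numerical check of this relation (the values are chosen arbitrarily): for I = 0.01 W/m², ρ = 1.2 kg/m³, and C = 340 m/s,

```latex
p = \sqrt{I\,\rho\,C} = \sqrt{0.01 \times 1.2 \times 340} = \sqrt{4.08} \approx 2.02\ \text{Pa}
```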
  • two audio playback devices may play different audio components of a piece of audio.
  • the audio playback device A1 plays the audio component B1 in a piece of audio.
  • the audio playback device A2 plays the audio component B2 in this piece of audio.
  • the audio playback device A1 does not change playback parameters during playback of the audio component B1.
  • if the audio playback device A2 adjusts the sound intensity while playing the audio component B2, the user's judgment of the distance between himself and the sound source of the audio component B2 (that is, the audio playback device A2) changes. For example, when the sound intensity increases, the user may feel that the distance between the audio playback device A2 and himself is shortening.
  • the audio playback device A2 adjusts the playback time of the audio component B2 during playback.
  • the audio corresponding to time point Tc in the audio component B1 of the above-mentioned piece of audio should originally be played simultaneously with the audio corresponding to time point Tc in the audio component B2. Since the audio playback device A2 adjusts the playback time of the audio component B2, the two are no longer played simultaneously, and there is a certain time difference between them.
  • the phase at which the above-mentioned piece of audio reaches the user's ears may change, so that the user's resolution of the direction of the sound source of the audio component B2 (ie, the audio playback device A2) relative to the user changes. That is to say, by changing the playing time of the audio components in the audio, so that there is a time difference between the playing times of different audio components in the audio, the phase of the sound wave can be changed.
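  • The sketch below illustrates this time-shifting of one audio component; the sample rate, tone frequency, and delay value are illustrative assumptions.

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz, assumed sample rate

def delay_component(component: np.ndarray, delay_s: float) -> np.ndarray:
    """Delay one audio component by padding it with silence, so that it is
    no longer played simultaneously with the other components of the mix."""
    pad = int(round(delay_s * SAMPLE_RATE))
    return np.concatenate([np.zeros(pad), component])[: len(component)]

# A 500 Hz tone delayed by 0.5 ms: at this frequency the shift corresponds
# to a phase change of 360 * 500 * 0.0005 = 90 degrees.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
b2 = np.sin(2 * np.pi * 500 * t)          # audio component B2
b2_shifted = delay_component(b2, 0.0005)  # B2 as played by device A2
```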
  • the embodiment of the present application does not limit the type of the playback parameter.
  • the above playback parameters may also include the frequency of the sound wave and the like; adjusting the frequency of the sound waves can change sound effects such as the pitch and timbre heard by the user.
  • the electronic device 100 may include a processor 110, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, a battery 142, Antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone interface 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194 and the like.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • the USB interface 130 is an interface conforming to the USB standard specification.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100 , and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through them.
  • the charging management module 140 is configured to receive a charging input from a charger. While the charging management module 140 is charging the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the electronic device 100 may further include a millimeter wave radar module.
  • the millimeter-wave radar module can transmit millimeter-wave radar signals and receive the reflected millimeter-wave radar signals. According to the transmitted millimeter-wave signal and the received reflected signal, the millimeter-wave radar module can obtain a difference frequency signal and use it to determine the position of a target (such as an object or a human body).
  • the electronic device 100 may further include an ultra wide band (UWB) module.
  • the UWB module can provide a wireless communication solution based on UWB technology applied to the electronic device 100 .
  • a UWB module can realize the function of a UWB base station.
  • UWB base stations can be used to locate UWB tags.
  • by detecting the UWB signal and combining it with positioning algorithms, the time of flight of the UWB signal in the air can be calculated; multiplying this duration by the propagation speed of the UWB signal in the air (that is, the speed of light) gives the distance between the UWB tag and the UWB base station.
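  • A minimal sketch of this time-of-flight distance calculation (the function name and units are assumptions):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, propagation speed of the UWB signal in air

def uwb_distance(time_of_flight_s: float) -> float:
    """Distance between a UWB tag and a UWB base station, estimated from
    the measured time of flight of the UWB signal."""
    return time_of_flight_s * SPEED_OF_LIGHT

print(f"{uwb_distance(10e-9):.2f} m")  # a 10 ns flight time is about 3 m
```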
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • the display screen 194 is used to display images, videos and the like.
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor. Camera 193 is used to capture still images or video.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (MPEG)1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as: image recognition, image three-dimensional reconstruction, face recognition, voice recognition, text understanding, and the like.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 .
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • Speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • Receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the microphone 170C, also called a "mike" or "mic", is used to convert sound signals into electrical signals.
  • the electronic device 100 may be provided with at least one microphone 170C.
  • the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals.
  • the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 170D is used for connecting wired earphones.
  • the sensor module 180 may include one or more of the following: pressure sensor, gyroscope sensor, air pressure sensor, magnetic sensor, acceleration sensor, distance sensor, proximity light sensor, infrared sensor, fingerprint sensor, temperature sensor, touch sensor, ambient light sensor , bone conduction sensor.
  • the keys 190 include a power key, a volume key and the like.
  • the electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate a vibrating reminder.
  • the indicator 192 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the electronic device 100 shown in FIG. 3A may be a portable electronic device running various operating systems, such as a mobile phone, a tablet computer, or a notebook computer; it may also be a laptop (Laptop) with a touch-sensitive surface or touch panel, or a non-portable electronic device such as a desktop computer with a touch-sensitive surface or touch panel.
  • the embodiment of the present application does not limit the type of the electronic device 100 .
  • a schematic structural diagram of an audio playback device 200 involved in the present application is introduced below.
  • the audio playback device 200 may include a communication device 210, a memory 211, a processor 212, a microphone 213, and a speaker 214 coupled through a bus, wherein:
  • the communication device 210 can be used for the audio playback device 200 to establish a communication connection with other electronic devices (such as the electronic device 100 ).
  • the audio playback device 200 may receive the audio components, playback parameters, and playback instructions sent by the electronic device 100 through the communication device 210 .
  • the above playing instruction may be used to instruct the audio playing device 200 to play the audio component with the playing parameters.
  • the memory 211 can be used to store various software programs and/or sets of instructions.
  • the memory 211 can also store a communication program, which can be used to communicate with devices such as the electronic device 100 .
  • the memory 211 can also store a video analysis program.
  • the video analysis program can be used to analyze the image and audio contained in the video, and obtain information such as the sounding object in the video, the audio component of the sounding object, and the location of the sounding object.
  • the memory 211 can also store a playback parameter determination program.
  • the playback parameter determination program can be used to determine the playback parameters used by an audio playback device to play the audio component of a sounding object.
  • the processor 212 can be used to read and execute the programs in the memory 211, such as the communication program, the video analysis program, and the playback parameter determination program. That is to say, the audio playback device 200 can determine the sounding objects in a video and the audio components of the sounding objects; it can also determine which audio playback device is to play the audio component of a sounding object, and the playback parameters of that audio playback device when simulating the sounding of the sounding object.
  • the microphone 213 can be used to collect sound signals and convert the sound signals into electrical signals.
  • One or more microphones may be included in the audio playback device 200 .
  • the audio playback device 200 can collect sound signals through a microphone to identify the direction of the sound source.
  • the audio playback device 200 may also include other types of audio input devices.
  • the speaker 214 may be used to convert audio electrical signals into sound signals. That is, the audio playback device 200 can play the audio in a video through the speaker 214. The audio playback device 200 is not limited to the speaker 214 and may also include other types of audio output devices.
  • the structure shown in the embodiment of the present application does not constitute a specific limitation on the audio playback device 200 .
  • the audio playback device 200 may include more or fewer components than those shown in the illustration, or some components may be combined, or some components may be separated, or different component arrangements may be made.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • for the structures of other audio playback devices involved in this embodiment of the present application (such as the audio playback devices 201 to 203), reference may be made to the structure shown in FIG. 3B, and details are not repeated here.
  • the following specifically introduces a method for cooperatively playing audio during video playback provided by the embodiment of the present application.
  • FIG. 4 exemplarily shows a flowchart of a method for cooperatively playing audio during video playback provided by an embodiment of the present application.
  • the method can be applied to a communication system including the electronic device 100 , the audio playback device 200 , and the audio playback device 201 . It is not limited to the audio playback device 200 and the audio playback device 201, and more audio playback devices may be included in the communication system.
  • the embodiment of the present application specifically takes two audio playback devices as an example for description.
  • a communication connection is established between the electronic device 100 and the audio playback device 200 and the audio playback device 201 .
  • Above-mentioned communication connection can comprise wired communication connection, and wireless communication connection such as bluetooth communication connection, Wi-Fi communication connection, ZigBee communication connection.
  • the embodiment of the present application does not limit the specific manner of establishing the foregoing communication connection.
  • the method may include steps S411-S416, wherein:
  • the observer can perform the first operation.
  • the first operation can be used to play the first video.
  • the aforementioned observer may represent a user watching the first video (that is, watching the electronic device 100).
  • the above-mentioned first operation may be performed on the electronic device 100 .
  • the electronic device 100 has a video play control for playing the first video.
  • the above-mentioned first operation may be an operation on the video playback control.
  • the first operation above may also be applied to other electronic devices other than the electronic device 100 (such as a remote control, a sound box, a mobile phone, etc. used to control the electronic device 100 ).
  • the above-mentioned other electronic devices may send a video playing instruction to the electronic device 100 .
  • the video playing instruction may be used to instruct the electronic device 100 to play the first video.
  • the embodiment of the present application does not limit the specific implementation manner of the foregoing first operation.
  • the first video may be divided into a plurality of different time segments according to the playing time sequence. For example, a first time period, a second time period, a third time period, etc. (the number of specific time periods is not limited).
  • the length of a time period may be, for example, 4 seconds, 5 seconds, and so on.
  • the embodiment of the present application does not limit the value of the time length of a time period.
  • the electronic device 100 may analyze the first video segment by segment according to the sequence of time segments. Wherein, the audio in the first video usually includes background sound in addition to the sound from the specific sounding object.
  • the above-mentioned background sound may include sounds other than the sound of the sounding object in the environment during the recording of the first video, machine noise of the video recording device, and the like. There is no corresponding sound object for the background sound.
  • the electronic device 100 may determine audio components of different sounding objects in a piece of audio. After determining the audio components of all sounding objects in a piece of audio, the electronic device 100 may determine the audio components remaining after the audio components of the sounding objects as the audio components of the background sound.
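  • Since, as described later, the time-domain signal of a piece of audio is the sum of the signals of its audio components, the background sound can be obtained as a residual. The sketch below assumes the separated components are time-aligned with the mixed audio and share its length and sample rate.

```python
import numpy as np

def background_component(mixed_audio: np.ndarray,
                         object_components: list[np.ndarray]) -> np.ndarray:
    """Treat whatever remains after subtracting the audio components of all
    identified sounding objects as the background-sound component."""
    residual = mixed_audio.copy()
    for component in object_components:
        residual -= component
    return residual
```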
  • a first video contains a first group of images and a first audio within a first time period.
  • the electronic device 100 may determine, according to the first image group and the first audio, that the first video contains the first sounding object and the first background sound within the first time period, and separate the first audio component of the first sounding object and the second audio component of the first background sound from the first audio.
  • similarly, the first video includes a second image group and a second audio within a second time period. The electronic device 100 may determine, according to the second image group and the second audio, the sounding object contained in the first video within the second time period and the fifth audio component of the second background sound.
  • the above-mentioned first background sound may be the same as the above-mentioned second background sound.
  • the first background sound may be different from the second background sound.
  • the first video may only include one time period, that is, the first time period.
  • the following description takes as an example the first audio component of the first sounding object and the second audio component of the first background sound contained in the first video within the first time period.
  • the audio components of the sounding objects and background sounds contained in other time periods of the first video (such as the second time period) are processed in the same way as the above-mentioned first audio component and second audio component, and will not be described again in this application.
  • the electronic device 100 receives a first operation, and acquires a first video.
  • the data in the first time period in the first video contains the first image group and the first audio
  • the images in the first image group present the action of the first sounding object
  • the first audio contains the first audio component of the first sounding object and the second audio component of the first background sound.
  • the first video may be a video stored locally on the electronic device 100 .
  • the electronic device 100 may acquire data of the first video from a local storage.
  • the first video may also be a video stored on a cloud server.
  • the electronic device 100 may request data of the first video from the cloud server.
  • the embodiment of the present application does not limit the method for the electronic device 100 to acquire the first video.
  • the electronic device 100 may analyze the first video in real time while playing it, and determine the sounding objects in the first video, their audio components, and their locations.
  • the position of the above-mentioned sounding object can be used to determine an audio playback device for playing the audio component of the sounding object.
  • the electronic device 100 can select, according to the position of the sounding object, the position of the audio playback device, and the position of the observer, an audio playback device from multiple audio playback devices to play the audio component of the sounding object, and determine the playback parameters for playing that audio component.
  • the implementation method by which the electronic device 100 obtains the location of the sounding object, the location of the audio playback device, and the location of the observer will be specifically introduced in subsequent embodiments, and is not expanded upon here.
  • the position of the sounding object in the first video may change, and the position of the observer may also change during playback of the first video; analyzing the first video in real time to determine the audio components and playback parameters used by the audio playback devices enables the audio playback devices to better simulate the sounding process of the sounding objects, improving the user's sense of immersion when watching the video.
  • the above-mentioned first image group may include one or more frames of images.
  • the electronic device 100 sends a first message to the audio playback device 200, where the first message includes the first audio component and a first playback parameter, and the first message is used to instruct to play the first audio component with the first playback parameter.
  • the electronic device 100 may determine that the audio playback device 200 corresponds to the first sound-emitting object according to the position of the first sound-emitting object, the position of the audio playback device 200 , and the position of the observer.
  • the position of the first utterance object may be the position of the first utterance object within the first time period.
  • the position of the observer may be the position of the observer when the video in the first time period is played.
  • the electronic device 100 may instruct the audio playing device 200 to play the first audio component of the first sounding object.
  • the first playing parameters may include a first playing time and a first sound intensity.
  • the above-mentioned first playback parameter may be determined according to the first audio, the position of the first sounding object, the position of the audio playback device 200, and the position of the observer.
  • both the above-mentioned observer and the audio playback device 200 can be located in the first scene, and the above-mentioned first scene may represent a scene of watching the first video (that is, a real scene).
  • the position of the observer mentioned above may be the position of the observer in the first scene.
  • the above-mentioned position of the audio playback device 200 may also be the position of the audio playback device 200 in the first scene.
  • Both the above-mentioned first sounding object and the virtual camera of the first video can be located in the second scene.
  • the second scene may represent a scene embodied in the first video (ie, a virtual scene).
  • by taking the position of the virtual camera as the position of the observer, the positional relationship between the first sounding object and the observer in the above-mentioned first scene can be obtained.
  • the method by which the electronic device 100 determines the correspondence between audio playback devices and sounding objects, and determines the playback parameters, will be described in detail in subsequent embodiments, and is not expanded upon here.
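  • For illustration only, the messages sent to the audio playback devices in this method might carry contents like the following; the field names and types are assumptions, not the application's wording.

```python
from dataclasses import dataclass

@dataclass
class PlaybackMessage:
    """Illustrative contents of a message instructing an audio playback
    device to play an audio component with given playback parameters."""
    audio_component: bytes   # encoded audio component to play
    play_time_ms: int        # playback time (when to start playing)
    sound_intensity: float   # playback volume for this component

# The electronic device 100 would send something like this to device 200:
first_message = PlaybackMessage(audio_component=b"...", play_time_ms=0, sound_intensity=0.8)
```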
  • the electronic device 100 sends a second message to the audio playback device 201, where the second message includes the second audio component and a second playback parameter, and the second message is used to instruct to play the second audio component with the second playback parameter.
  • the electronic device 100 may select an idle audio playback device among the multiple audio playback devices to play the second audio component of the first background sound. For example, during the video playing process within the above-mentioned first time period, the audio playback device 201 does not play the audio component of the sounding object (that is, the audio playback device 201 is in an idle state). The electronic device 100 may instruct the audio playing device 201 to play the second audio component.
  • the second play parameter can include a second play time and a second sound intensity. The foregoing second playback parameter may be determined according to the second audio component.
  • the electronic device 100 may select at least one audio playback device from the audio playback devices on one of its sides (such as the left side), and at least one audio playback device from the audio playback devices on its other side (such as the right side), to play the second audio component.
  • the aforementioned one side and other side of the electronic device 100 may be the two sides in the direction in which the display screen of the electronic device 100 faces.
  • the electronic device 100 may preferentially select an idle audio playback device to play the second audio component of the first background sound.
  • having audio playback devices on both sides of the electronic device 100 play the second audio component of the first background sound can better create a stereo effect in the video playback environment and help the user perceive sounding objects making sounds in different directions.
  • one of the audio playback devices may be a reference audio playback device.
  • the reference audio playback device can play the second audio component according to the above-mentioned second playback parameter.
  • the playback parameters used by the other audio playback devices to play the second audio component may be determined according to the second playback parameter, the position of the reference audio playback device, the positions of the other audio playback devices, and the position of the observer.
  • in this way, the first background sound played by the above-mentioned multiple audio playback devices can reach the observer at the same time and with the same sound intensity.
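  • A minimal sketch of how such playback parameters might be derived so that the background sound from each device arrives at the observer simultaneously and at equal level; the free-field 1/r pressure falloff and the function names are illustrative assumptions.

```python
SPEED_OF_SOUND = 340.0  # m/s

def align_to_reference(d_ref: float, a_ref: float, d_other: float):
    """Playback start offset (s) and amplitude for another device so that its
    background sound reaches the observer at the same time and at the same
    level as the reference device's (1/r pressure falloff assumed)."""
    start_offset = (d_ref - d_other) / SPEED_OF_SOUND  # negative: start earlier
    amplitude = a_ref * (d_other / d_ref)              # farther device plays louder
    return start_offset, amplitude

# Reference device 2 m from the observer, another device 3 m away:
print(align_to_reference(d_ref=2.0, a_ref=1.0, d_other=3.0))
# (-0.0029..., 1.5): start about 2.9 ms earlier and play 1.5x louder
```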
  • the electronic device 100 displays the images in the first image group.
  • the audio playback device 200 plays the first audio component synchronously according to the first playback parameter.
  • the audio playback device 201 plays the second audio component synchronously according to the second playback parameter.
  • Both the first audio component and the second audio component are audio of the first video within the first time period.
  • while the audio playback device 200 plays the first audio component and the audio playback device 201 plays the second audio component, the electronic device 100 displays the images in the first image group synchronously.
  • for example, when the electronic device 100 displays an image in the first image group showing the first sounding object making a sound (e.g., an image showing person 1 speaking), the audio playback device 200 plays the first audio component. That is to say, the video picture is synchronized with the audio in the video.
  • in addition, the direction of the sound simulated by each audio playback device matches the position, in the three-dimensional space represented by the video picture, of the sounding object that the device simulates. In this way, the user can more genuinely feel immersed in the scene, with the sounding objects in the video making sounds from different directions.
  • the above method can enhance the user's sense of immersion and enjoyment when watching a video, and improve the user experience.
  • the following specifically introduces an implementation method for determining a sounding object contained in a video and an audio component of the sounding object provided by the embodiment of the present application.
  • FIG. 5A exemplarily shows a flow chart of a method for determining a sounding object contained in a video and an audio component of the sounding object.
  • the method is applicable to the electronic device 100 .
  • the determination of the sounding object and the audio component of the sounding object contained in the first video within the first time period is specifically taken as an example for illustration.
  • the method may include steps S510-S530, wherein:
  • S510: Perform image recognition on the images in the first image group to identify the objects in the images.
  • the electronic device 100 may perform image recognition on the images included in the first image group in the first video, and determine the objects included in the images in the first image group.
  • the electronic device 100 may determine potential sounding objects included in the images in the first image group.
  • the aforementioned potential sounding objects may represent objects that may make sounds.
  • the above-mentioned first image group may include multiple frames of images.
  • the multiple frames of images included in the first image group may be continuous multiple frames of images within a period of time in the first video.
  • the first image group includes the images shown in Fig. 5B.
  • the image shown in Fig. 5B is taken as an example to illustrate the method of recognizing images in images.
  • the electronic device 100 performs image recognition on the image 610 shown in FIG. 5B , and can recognize that the image 610 contains objects: character Ha 611 , character Hb 612 , cat 613 , trash can 614 , car 615 and tree 616 .
  • the above-mentioned characters Ha611 and Hb612 may represent two different characters.
  • the electronic device 100 may determine the person Ha611, the person Hb612, the cat 613, and the car 615 as potential utterance objects. Since the trash can 614 and the tree 616 are generally unable to emit sound, the electronic device 100 may determine that the trash can 614 and the tree 616 are not potential sound emitting objects.
  • the electronic device 100 may use an image recognition model to recognize objects included in the images in the first image group.
  • the aforementioned image recognition model may be acquired by the electronic device 100 from a local storage, or may be acquired by the electronic device 100 from a server (eg, a cloud server).
  • the aforementioned image recognition model may be a trained model based on a neural network. Input a frame of image into the image recognition model, and the image recognition model can output information such as the category, quantity, and position of the objects contained in the frame of image.
  • the embodiment of the present application does not limit the type of the image recognition model and the training method of the image recognition model.
  • a category table of potential utterance objects may be stored in the electronic device 100 .
  • the electronic device 100 may determine the potential utterance objects contained in the images in the first image group according to the category table of potential utterance objects.
  • the image recognition model described above can be trained to recognize specified types of objects.
  • the object of the above-mentioned specified type may be the above-mentioned potential sounding object.
  • the electronic device 100 may use the image recognition model to determine the potential sounding object included in the images in the first image group.
  • the electronic device 100 can also identify the gender of a person in an image, such as man or woman. Since the voiceprint characteristics of men and women differ significantly, the electronic device 100 can associate the characters in the image with the audio components in the audio according to gender. For example, the character Ha611 in the image 610 is a woman, and the character Hb612 is a man. If it is determined that an audio component played at the display time of the image 610 is a female voice, the electronic device 100 may associate that audio component with the character Ha611 in the image 610. If it is determined that an audio component played at the display time of the image 610 is a male voice, the electronic device 100 may associate that audio component with the character Hb612 in the image 610.
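  • A minimal sketch of filtering recognized objects against such a category table of potential sounding objects; the detection labels and table entries are hypothetical.

```python
# Hypothetical detection results from an image recognition model for image 610;
# the model itself and its output format are assumptions for illustration.
detections = ["person", "person", "cat", "trash_can", "car", "tree"]

# Category table of potential sounding objects, as described above.
POTENTIAL_SOUNDING_OBJECTS = {"person", "cat", "dog", "car", "train", "waterfall"}

potential = [obj for obj in detections if obj in POTENTIAL_SOUNDING_OBJECTS]
print(potential)  # ['person', 'person', 'cat', 'car'] - trash can and tree are dropped
```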
  • the embodiment of the present application does not limit the method for the electronic device 100 to identify the object included in the image.
  • the electronic device 100 may also perform audio recognition on the first audio in the first video, and determine audio components of different sounding objects in the first audio.
  • the electronic device 100 may use an audio recognition model to recognize audio components of different sound types in the first audio.
  • the aforementioned sound types may include the sound of people talking, cats barking, dogs barking, cars, rain, thunder and so on.
  • the foregoing audio recognition model may be acquired by the electronic device 100 from a local storage, or may be acquired by the electronic device 100 from a server (eg, a cloud server).
  • the aforementioned audio recognition model may be a trained model based on a neural network.
  • an audio recognition model can be trained to recognize audio components of specified sound types. When a piece of audio is input into the audio recognition model, the audio recognition model can output the audio components of one or more sound types contained in that piece of audio.
  • a piece of audio can consist of one or more audio components.
  • the time-domain signal of a piece of audio can be obtained by adding the time-domain signals of the audio components that make up the piece of audio.
• An audio component can represent the sound signal produced by a sounding object making a sound over a period of time.
  • the electronic device 100 can obtain the audio components of different types of sounding objects in the audio A.
  • the first audio contains audio components produced by a plurality of characters speaking.
  • the voiceprint characteristics in different human voices are different.
  • the above-mentioned voiceprint feature can represent the sound wave spectrum carried by the sound signal.
  • the electronic device 100 may perform voiceprint feature recognition on the audio component corresponding to the human voice in the first audio, so as to distinguish the audio components of different people.
  • the electronic device 100 may perform a slice operation on the audio A by using a sliding window method to obtain multiple voice segments of the audio A.
  • the length of the window used for the slicing operation may be, for example, 0.5 second, 1 second, or any other length. This embodiment of the present application does not limit it.
  • the electronic device 100 may extract voiceprint features of each voice segment.
• the aforementioned extraction of voiceprint features may be extraction of mel-frequency cepstral coefficient (MFCC) features.
  • the electronic device 100 may detect the similarity of voiceprint features of each voice segment, and determine the number of speakers included in audio A.
• the above-mentioned method for detecting the similarity of voiceprint features may include a similarity judgment method based on the Bayesian information criterion (BIC), a similarity judgment method based on the Akaike information criterion (AIC), etc.
  • the electronic device 100 may cluster the voice segments according to the voiceprint features of each voice segment to obtain voice segments of different speakers in the audio A.
• the above-mentioned clustering may be implemented by, for example, Gaussian mixture model (GMM) clustering, support vector machine (SVM) clustering, k-means clustering, or a neural-network-based clustering algorithm.
  • the electronic device 100 may splice the speech segments of a speaker to obtain the audio component of the speaker.
  • the electronic device 100 may further optimize the above clustering result by combining features such as energy, zero-crossing rate, and formant of the speech segment, so as to improve the accuracy of classifying speech segments of different speakers.
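• as a non-limiting illustration, the following Python sketch strings the steps above together: slice the audio with a sliding window, extract MFCC features per segment, cluster the segments by voiceprint similarity, and splice segments of the same cluster into one speaker's audio component. It assumes librosa and scikit-learn are available; the window length and speaker count are illustrative, and k-means stands in for any of the clustering options listed above.

    import numpy as np
    import librosa
    from sklearn.cluster import KMeans

    def split_speakers(audio, sr, n_speakers, win_s=1.0):
        # Slice the audio into fixed-length segments (sliding window).
        win = int(win_s * sr)
        segments = [audio[i:i + win]
                    for i in range(0, len(audio) - win + 1, win)]

        # One MFCC feature vector per segment (mean over time frames).
        feats = np.array([
            librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=20).mean(axis=1)
            for seg in segments])

        # Cluster segments by voiceprint similarity.
        labels = KMeans(n_clusters=n_speakers, n_init=10).fit_predict(feats)

        # Splice each cluster's segments into that speaker's component.
        return {k: np.concatenate([s for s, l in zip(segments, labels)
                                   if l == k])
                for k in range(n_speakers)}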
  • the embodiment of the present application does not limit the implementation method for separating the audio components of different characters in the audio.
  • the electronic device 100 can separate audio components of different voice types from the first audio, and separate audio components of different people from audio components whose voice type is human voice. In this way, the electronic device 100 can determine the audio components of different sounding objects in the first audio.
  • the first audio also includes the second audio component of the first background sound.
  • the electronic device 100 may determine the remaining audio components in the first audio except the audio components of the sound objects as the second audio components of the first background sound.
  • the electronic device 100 may also extract the second audio component of the first background sound in the first audio through other implementation methods. This embodiment of the present application does not limit it.
• in some scenarios, the types of sounding objects may only include people.
  • the electronic device 100 may directly use the above method to separate the audio components of different people in the first audio, and determine other audio components other than the audio components of the people as the second audio components of the first background sound.
• f(t) may represent the time-domain signal of the above-mentioned first audio.
  • the electronic device 100 analyzes the first audio, and determines that the first audio contains two sounding objects: the first sounding object and the second sounding object.
  • the electronic device 100 may separate the first audio into an audio component of the first sounding object, an audio component of the second sounding object, and a second audio component of the first background sound.
  • f1(t) may represent the time-domain signal of the audio component of the first sounding object.
  • f2(t) may represent a time-domain signal of an audio component of the second sounding object.
  • fn(t) may represent a time-domain signal of the second audio component of the first background sound.
• f(t) = f1(t) + f2(t) + fn(t).
  • the electronic device 100 may also determine the sound pressures of different sounding objects.
  • the sound pressure of a sounding object can be obtained by integrating the time domain signal of the audio component of the sounding object.
  • the sound pressure p1 of the first sound-emitting object can be obtained by integrating f1(t).
  • the sound pressure p2 of the second sounding object can be obtained by integrating f2(t).
  • the sound pressure pn of the first background sound can be obtained by integrating fn(t).
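• a small numerical sketch of the relations above, in Python: the first audio f(t) is the sum of its component signals, and a per-component value is obtained by integrating the time-domain signal. Since integrating a signed waveform directly cancels to nearly zero, this sketch integrates the absolute value; that reading is an assumption, not stated in the text.

    import numpy as np

    sr = 48_000
    t = np.arange(sr) / sr                   # one second of samples
    f1 = 0.3 * np.sin(2 * np.pi * 220 * t)   # first sounding object
    f2 = 0.2 * np.sin(2 * np.pi * 440 * t)   # second sounding object
    fn = 0.05 * np.random.randn(sr)          # first background sound

    f = f1 + f2 + fn                         # f(t) = f1(t) + f2(t) + fn(t)

    dt = 1.0 / sr
    p1 = np.sum(np.abs(f1)) * dt             # sound pressure of object 1
    p2 = np.sum(np.abs(f2)) * dt             # sound pressure of object 2
    pn = np.sum(np.abs(fn)) * dt             # sound pressure of background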
• the electronic device 100 may determine one or more objects included in the images in the first image group. After the above step S520, the electronic device 100 can determine the audio components included in the first audio. Further, the electronic device 100 needs to associate the objects in the images with the audio components in the first audio, so as to determine which object in the image is making a sound and which audio component is generated by that object. In this way, when the image displayed on the electronic device 100 presents the sounding process of a sounding object, the corresponding audio playback device can play the audio component of that sounding object, keeping the video picture and the sound synchronized in the user's perception.
  • FIG. 5D exemplarily shows a flowchart of a method for associating an object contained in an image with an audio component.
  • the method is applicable to the electronic device 100 .
• the association of an object contained in an image in the first image group with an audio component in the first audio is specifically taken as an example for illustration.
• that is, determining which object in the image produced a given audio component by making a sound.
• the method may include steps S531-S538, wherein:
  • the video screen may include an image showing the sounding object
  • the audio in the video may include the audio component of the sounding object
  • the playing time of the above-mentioned first audio component may be a period of time.
  • the image Ga1 may be an image displayed by the electronic device 100 during playing of the first audio component.
  • Image Ga1 may contain one or more frames of images.
  • the image Ga1 is an image in the above-mentioned first group of images.
• if the first sounding object indicated by the first audio component is a person, the electronic device 100 may determine whether the image Ga1 contains a person.
• if the first sounding object indicated by the first audio component is a cat, the electronic device 100 may determine whether the image Ga1 contains a cat.
  • the electronic device 100 may directly associate the first sounding object in the image Ga1 with the first audio component. That is, the first audio component is generated by the sound of the first sound-emitting object in the image Ga1.
  • the first sounding object indicated by the first audio component is a cat.
  • the object contained in the image Ga1 is a cat.
  • the electronic device 100 may associate the cat in the image Ga1 with the first audio component.
  • the first video may be a video of multiple people talking
  • the object contained in the image Ga1 may have multiple people
  • the first audio may contain audio components of multiple human voices.
  • the electronic device 100 needs to identify which person in the image the audio component of a human voice is associated with.
• the electronic device 100 may execute the following step S532 to determine whether the first sounding object is a person.
  • the electronic device 100 may execute the following step S533 to determine whether the image Ga1 contains multiple persons.
• the electronic device 100 may execute the following step S535 to directly associate the first sounding object in the image Ga1 with the first audio component.
• the electronic device 100 may determine whether there are multiple persons in one frame of the image Ga1. If there are multiple persons in a frame of image, it is difficult for the electronic device 100 to directly determine which person in the image is the first sounding object.
  • the electronic device 100 may execute the following step S534 to determine which character in the image Ga1 is speaking.
• the electronic device 100 may directly determine that the first sounding object is the person contained in the image Ga1. That is, the first audio component is produced by the voice of that person. Then, the electronic device 100 may execute the following step S535.
• if the image Ga1 contains a plurality of persons, the electronic device 100 may perform face recognition on the plurality of persons in the image Ga1, judge the facial movements of the plurality of persons, and determine that person 1 among the plurality of persons is speaking; person 1 is then the first sounding object.
  • the electronic device 100 may recognize the face area of the image Ga1 containing multiple persons to determine which person in the image Ga1 is speaking. Among them, the facial movements of a person when they are speaking are usually different from those when they are not speaking.
  • the electronic device 100 can extract the facial features of different people in the image Ga1, and recognize facial movements (such as mouth movements) of different people according to the facial features to determine which character is speaking.
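• a minimal Python sketch of this judgment, assuming an upstream face recognizer (not specified here) already provides a per-frame mouth-opening value for each person in the image Ga1: the person whose mouth moves the most during the playing time of the audio component is taken as the speaker.

    import numpy as np

    def find_speaking_person(mouth_openings):
        """mouth_openings: {person_id: [mouth opening per frame, ...]}.

        A speaking person shows larger frame-to-frame variation in mouth
        opening than a non-speaking one.
        """
        def motion(vals):
            v = np.asarray(vals, dtype=float)
            return np.abs(np.diff(v)).mean() if len(v) > 1 else 0.0

        return max(mouth_openings, key=lambda p: motion(mouth_openings[p]))

    # Person "Ha" barely moves; person "Hb" is talking.
    speaker = find_speaking_person({"Ha": [2, 2, 2, 2], "Hb": [1, 6, 2, 7]})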
• the electronic device 100 may associate the person determined to be speaking in the image Ga1 with the audio component of the human voice (such as the first audio component) whose playing time is the same as the display time of the image Ga1. It can be understood that, when it is determined according to the images of the first video that a person in an image is speaking, the audio component of the human voice played at the display time of that image in the first video is the audio component produced by the speech of that person.
  • the electronic device 100 can associate different people in the image with different audio components of the human voice in the audio.
  • the electronic device 100 may directly execute the following step S535.
  • the association between the first sound-emitting object in the image Ga1 and the first audio component may indicate that the first audio component is produced by the first sound-emitting object in the image Ga1. That is, the first audio component is the audio component of the first sounding object in the image Ga1.
  • the first sounding object does not exist in the image Ga1, but the first sounding object may exist in an image before the first time period in the first video, such as the image Gb.
  • the above-mentioned image Gb may include one or more frames of images.
  • the electronic device 100 may execute the following step S537 to associate the first sounding object in the image Gb with the first audio component.
  • the electronic device 100 may execute the following step S538 to determine the first audio component as the second audio component of the first background sound of the first video in the first time period.
  • the image Gb includes a first sounding object
  • the electronic device 100 may associate the cat in the image Gb with the first audio component.
  • the sounding objects of the first video within the first time period include the above-mentioned cat.
  • the electronic device 100 may use the image Gb to determine the position of the sounding object.
  • a method for associating the audio component of the human voice with the object in the image provided by the embodiment of the present application is exemplified here.
• when associating an audio component of a human voice with a person in an image, the electronic device 100 may associate the voiceprint feature of the audio component with the face feature of that person.
• in this way, even if the person does not appear in the current image, the electronic device 100 can still determine, according to the face feature associated with the voiceprint feature, the person associated with this audio component.
  • a video may contain images and audio of conversations between multiple people.
• one of the multiple people may appear in the video picture during one period of speaking and not appear during another period, while the audio of both periods contains the audio of this person's speech.
• when the electronic device 100 associates the audio component of a human voice with the person shown in the image at the same time, it may associate the voiceprint feature of that audio component with the face feature of the person in the image.
• later, when the electronic device 100 separates from the audio an audio component whose voiceprint feature is associated with the face feature of that person, it can determine that the audio component was produced by that person speaking.
  • the electronic device 100 may determine whether an audio component of another time period having the same voiceprint feature as the first audio component, such as the audio component Fa3, has been associated with an object in the image. If the audio component Fa3 is associated with the object 1 in the image Gc, the electronic device 100 may associate the first audio component with the object 1 associated with the audio component Fa3. That is, the first audio component is the audio component produced by the object 1 making a sound. It can be seen that, in the case that the above-mentioned object 1 does not exist in the image Ga, the electronic device 100 can still determine the association relationship between the first audio component and the object 1 . Object 1 is a sounding object of the first video in the first time period. The position of the object 1 can be determined from the above-mentioned image Gc.
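• this association can be kept in a simple lookup structure, sketched below in Python: once an audio component's voiceprint has been matched to an object in some image, a later audio component with the same voiceprint can be attributed to that object even when the object is absent from the current image. The identifiers are illustrative stand-ins for the voiceprint and face features.

    class VoiceFaceRegistry:
        """Remembers which person/object each voiceprint was matched to."""

        def __init__(self):
            self._by_voiceprint = {}      # voiceprint id -> object id

        def associate(self, voiceprint_id, object_id):
            self._by_voiceprint[voiceprint_id] = object_id

        def lookup(self, voiceprint_id):
            # Returns the associated object, or None if never matched.
            return self._by_voiceprint.get(voiceprint_id)

    registry = VoiceFaceRegistry()
    registry.associate("vp_1", "object_1")        # audio component Fa3
    assert registry.lookup("vp_1") == "object_1"  # first audio component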
  • step S536 and step S537 are optional.
  • the electronic device 100 may directly execute the following step S538.
• the electronic device 100 may determine the first audio component as the second audio component of the first background sound of the first video within the first time period. Wherein, the electronic device 100 may mix the first audio component with the separated second audio component of the first background sound (such as adding the time-domain signals of the audio components). For example, the electronic device 100 separates the audio component of a dog barking sound from the first audio, but does not recognize a dog in the images contained in the first image group. Then, the electronic device 100 may determine the audio component of the dog barking sound as part of the second audio component of the first background sound.
• if an object in the images included in the first image group fails to be associated with any audio component in the first audio, it can be considered that the object has not made a sound during the display time of those images, i.e., the object is not a sounding object.
  • the electronic device 100 recognizes that a tree exists in the images included in the first image group, but does not separate the audio component of the tree from the first audio. Then, the tree in the image is not a sounding object.
  • the first video may also include more sounding objects within the first time period.
  • the electronic device 100 determines the audio components of these sounding objects.
  • the audio component of the above-mentioned sounding object may have a time attribute.
  • This temporal attribute may be determined by where the audio component is located on the time axis of the video.
  • the time attribute of an audio component can be used to determine the playing time of this audio component.
  • one audio component is the audio component of the first time period on the time axis of the video.
  • the audio playback device can play this audio component within the first time period of video playback.
• the electronic device 100 can improve the synchronization between audio playback and video image playback by using the time attributes of audio components, and reduce cases where the audio components of different sounding objects lead or lag during playback.
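• the time attribute can be modeled as a simple interval on the video time axis, as in the Python sketch below; the field names are illustrative. During playback, a device plays only the components whose interval covers the current position.

    from dataclasses import dataclass

    @dataclass
    class AudioComponent:
        source: str      # the sounding object, or "background"
        start_s: float   # start on the video time axis, seconds
        end_s: float     # end on the video time axis, seconds

    def components_due(components, playback_pos_s):
        """Components whose playing time covers the current position."""
        return [c for c in components
                if c.start_s <= playback_pos_s < c.end_s]

    timeline = [AudioComponent("first sounding object", 0.0, 4.0),
                AudioComponent("background", 0.0, 10.0)]
    print(components_due(timeline, 2.5))   # both are due at t = 2.5 s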
  • the electronic device 100 can separately analyze the image and audio in the first video, and determine the object in the image and the sounding object in the audio.
  • the electronic device 100 may determine whether the image and audio corresponding to the same time in the first video contain the same object.
  • the image and audio corresponding to the same time in the first video may represent the image displayed and the audio played at the same time. If the image and audio corresponding to the same time in the first video contain the same object, such as object 1, the electronic device 100 may associate the audio component of the object 1 in the audio with the object 1 in the image.
  • the image and the audio corresponding to the same time in the first video respectively contain a car and an audio component of the car.
• the electronic device 100 can then associate the audio component of the car with the car in the image.
  • FIG. 5E exemplarily shows a flow chart of another method for determining a sounding object contained in a video and an audio component of the sounding object.
  • the method is applicable to the electronic device 100 .
  • the determination of the sounding object and the audio component of the sounding object contained in the first video within the first time period is specifically taken as an example for illustration.
• the method may include steps S551-S554, wherein:
• Step S551 may refer to the aforementioned step S520 in FIG. 5A. Details are not repeated here.
  • the electronic device 100 may identify whether an image in the first image group includes a corresponding sounding object according to the sounding object in the first audio. For example, the electronic device 100 separates an audio component of a meowing sound from the first audio. According to the audio component of the meowing sound, the electronic device 100 can identify whether there is a cat in the images in the first image group.
• the first image group and the first audio are data of the first video in the same time period (ie, the first time period). If the first audio contains the first audio component of the first sounding object, and the images in the first image group contain the first sounding object, the electronic device 100 may determine that the first audio component is the audio component produced by the first sounding object making a sound within the first time period.
• for example, the electronic device 100 determines that the first sounding object is a person according to the first audio component of the first sounding object, but the images in the first image group contain multiple persons. The electronic device 100 may determine which of the above-mentioned multiple persons the first sounding object is according to step S534 and step S535 shown in FIG. 5D, and then associate the determined person with the first audio component.
• the fact that the images in the first image group do not contain the first sounding object may indicate that the first audio component of the first sounding object cannot be associated with an object contained in the images in the first image group.
  • the electronic device 100 may determine the first audio component of the first sounding object in the first audio as the second audio component of the first background sound of the first video within the first time period. Wherein, the electronic device 100 may mix the first audio component with the separated second audio component of the first background sound (such as adding the time-domain signals of the audio components).
• the electronic device 100 may also determine whether the images of the first video before the first time period include the first sounding object. If an image of the first video before the first time period contains the first sounding object, the electronic device 100 may associate the first audio component of the first sounding object with the first sounding object in that image.
• for a specific implementation method, reference may be made to step S537 shown in FIG. 5D above. Details are not repeated here.
  • the electronic device 100 can analyze the audio in the first video, and determine the sounding object and the audio component of the sounding object in the audio. According to the sounding object determined from the audio, the electronic device 100 may identify whether there is the sounding object in the audio in the image played at the same time as the audio in the first video. If there is, the electronic device 100 may associate the sounding object in the image with the audio component in the corresponding audio.
  • the electronic device 100 can perform image recognition on the image in the video only according to the sounding object in the audio, without recognizing all the objects in the image. This can save computing resources of the electronic device 100, and improve the efficiency of determining the sounding object in the video by using the matching relationship between the image and the audio.
  • FIG. 5F exemplarily shows a flowchart of another method for determining a sounding object contained in a video and an audio component of the sounding object.
  • the method is applicable to the electronic device 100 .
  • the determination of the sounding object and the audio component of the sounding object contained in the first video within the first time period is specifically taken as an example for illustration.
• the method may include steps S561-S564, wherein:
• Step S561 may refer to the aforementioned step S510 in FIG. 5A. Details are not repeated here.
  • the electronic device 100 may identify whether the corresponding object is contained in the first audio according to the objects contained in the images in the first image group. For example, the electronic device 100 recognizes that the images in the first image group contain cats. According to the cat in the image, the electronic device 100 can identify whether there is an audio component of a meowing sound in the first audio. Wherein, the electronic device 100 may use the audio recognition model of meowing to determine whether there is an audio component of meowing in the first audio. If there is an audio component of meowing in the first audio, the electronic device 100 may use the audio recognition model of meowing to separate the audio component of meowing from the first audio.
• the electronic device 100 may associate the cat included in the images in the first image group with the audio component of the meowing sound in the first audio.
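• the image-driven direction of FIG. 5F can be sketched as below in Python: only the object categories actually recognized in the images trigger audio recognition. The per-category recognizer interface (a separate method returning None when the sound is absent) is an assumption for illustration.

    def associate_by_image(objects_in_images, first_audio, audio_models):
        """objects_in_images: categories recognized in the first image group.
        audio_models: {category: recognizer with a .separate(audio) method}.

        Returns {category: separated audio component} for every image
        object whose sound is present in the first audio.
        """
        associations = {}
        for category in objects_in_images:
            model = audio_models.get(category)
            if model is None:
                continue                  # no recognizer for this category
            component = model.separate(first_audio)
            if component is not None:     # None means the sound is absent
                associations[category] = component
        return associations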
  • the electronic device 100 may further determine which of the multiple characters is speaking according to the images. For a specific method, reference may be made to step S534 in the aforementioned method shown in FIG. 5D .
• the electronic device 100 may use the speaking character as the sounding object in the image, and identify whether there is a human-voice audio component in the audio played at the display time of the image in which the character speaks. If there is an audio component of the human voice, the electronic device 100 may associate the person in the image with that audio component. According to the aforementioned step S534, a non-speaking person in the image is not a sounding object within the display time of the image, so the electronic device 100 does not associate any audio component of the audio played at the display time of the image with that non-speaking character.
• that is, the electronic device 100 only needs to judge which persons are speaking from the images in the first image group, and identify the speech of those speaking persons in the audio.
  • the electronic device 100 recognizes that the images in the first image group contain dogs.
  • the electronic device 100 may use the audio recognition model of dog barking to determine whether there is an audio component of dog barking in the first audio. If there is no audio component of dog barking in the first audio, the electronic device 100 may determine that the dog included in the image Ga has not made a sound within the first time period of the first video.
• the electronic device 100 may associate one or more objects included in the images in the first image group with the audio components in the first audio. In this way, the electronic device 100 can determine the sounding object and the audio component of the sounding object in the first video within the first time period. Furthermore, the electronic device 100 may determine the audio components in the first audio except the audio components of all sound objects as the second audio components of the first background sound.
  • the electronic device 100 can analyze the images in the first video to determine the objects existing in the images. According to the object determined from the image, the electronic device 100 may identify whether the object in the image exists in the audio played at the same time as the image in the first video. If present, the electronic device 100 may associate an audio component in the audio with a corresponding object in the image.
• the electronic device 100 can perform audio recognition on the audio only according to the objects in the images contained in the video, without identifying all the sounding objects in the audio and separating the audio components of all the sounding objects from the audio. This can save computing resources of the electronic device 100, and improve the efficiency of determining the sounding object in the video by using the matching relationship between the image and the audio.
• in addition to separating the audio components (such as the first audio component of the first sounding object and the second audio component of the first background sound) from the first audio contained in the first video in the first time period, the electronic device 100 also needs to determine which audio playback device plays each audio component. Wherein, the electronic device 100 may determine the corresponding relationship between the audio playback devices and the sounding objects. An audio playback device can play the audio component of the sounding object corresponding to itself. The electronic device 100 may also select one or more audio playback devices from all audio playback devices to play the second audio component of the first background sound.
  • the electronic device 100 may determine the corresponding relationship between the audio playback device and the sounding object according to the position of the sounding object, the position of the audio playback device, and the position of the observer.
  • the following specifically introduces an implementation method for determining the position of a sounding object in a video provided by the embodiment of the present application.
  • FIG. 6A exemplarily shows a schematic diagram of a method for determining a position of a sounding object in a video.
  • the determination of the position of the sounding object in the first video within the first time period is specifically taken as an example for illustration.
  • the electronic device 100 may input the images in the first image group into the three-dimensional object reconstruction model to obtain the position of the sounding object in the image and the position of the virtual camera that captures the image.
  • the images input to the above three-dimensional object reconstruction model may include one or more frames of images.
  • the position of the above-mentioned sounding object and the position of the virtual camera are the position of the sounding object and the position of the virtual camera in the first video in the first time period.
  • the above three-dimensional object reconstruction model can be used to perform three-dimensional reconstruction on the images in the first image group.
  • shooting an object in a three-dimensional space can obtain a two-dimensional image.
• the position of an object in the three-dimensional space has a corresponding relationship with its position in the two-dimensional image.
  • the aforementioned three-dimensional reconstruction of an image may refer to using a two-dimensional image showing a three-dimensional scene or object as basic data, and processing the basic data to obtain three-dimensional data of the scene or object, thereby generating a three-dimensional scene or object.
  • the above-mentioned three-dimensional object reconstruction model may be a model based on a neural network, which can realize reconstruction from a two-dimensional image to a three-dimensional model.
• the 3D object reconstruction model can be trained to perform 3D reconstruction of a specified type of object in an image and of the virtual camera that captures the image. Input one or more frames of images into the 3D object reconstruction model, and the 3D object reconstruction model can output the positions of one or more objects of this frame of image or multiple frames of images in the three-dimensional space, and the position of the virtual camera in the three-dimensional space.
  • the embodiment of the present application does not limit the training method of the above three-dimensional object reconstruction model.
  • the foregoing three-dimensional object reconstruction model may be obtained by the electronic device 100 from a local storage, or may be obtained by the electronic device 100 from a server (eg, a cloud server). This application is not limited to this.
• the electronic device 100 may input a frame of image contained in the first image group into the above-mentioned three-dimensional object reconstruction model, and obtain the position of the sounding object in this frame of image and the position of the virtual camera that captured this frame of image.
  • the first image group may include multiple frames of images.
• the electronic device 100 can input the multiple frames of images contained in the first image group into the three-dimensional object reconstruction model frame by frame in the order of display time, to determine the position of the sounding object in each frame of image and the position of the virtual camera that captured the corresponding frame of image. In this way, the accuracy of positioning the sounding object and the virtual camera in the video can be improved, and errors caused by changes in the position of the sounding object or the virtual camera can be reduced.
  • the electronic device 100 may also select a frame of images every preset number of frames from the multiple frames of images included in the first image group.
  • the electronic device 100 may input the selected image into the three-dimensional object reconstruction model to determine the position of the sounding object in the selected image and the position of the virtual camera that captures the selected image.
  • the frame rate of the video may be, for example, 24 frames per second, 60 frames per second, and so on.
  • the position of the sounding object in the video and the position of the virtual camera usually do not change within a short period of time (such as within the time of two consecutive frames of images, or within the time of five consecutive frames of images).
  • the electronic device 100 determines the position of the sounding object and the position of the virtual camera in the video at preset intervals without causing large errors. This can save computing resources of the electronic device 100 and reduce requirements on the computing capability of the electronic device 100 .
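• a short Python sketch of this frame-selection strategy: feed the three-dimensional object reconstruction model one frame out of every preset number of frames instead of every frame. reconstruct_3d stands in for the model and is assumed here.

    def positions_by_subsampling(frames, reconstruct_3d, step=5):
        """Returns {frame index: (object positions, camera position)}."""
        results = {}
        for i in range(0, len(frames), step):
            # Positions change little over a few consecutive frames, so
            # skipped frames can reuse the most recent estimate.
            results[i] = reconstruct_3d(frames[i])
        return results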
  • the electronic device 100 uses the three-dimensional object reconstruction model to determine the position of the sounding object in a frame of image and the position of the virtual camera that captures this frame of image, as shown in FIG. 6B .
  • the coordinate system Ow-Xw-Yw-Zw may be a world coordinate system.
  • the three-dimensional object reconstruction model can use the world coordinate system as a reference coordinate system to describe the position of the sounding object and the position of the virtual camera.
  • the embodiment of the present application does not limit the establishment method of the above coordinate system Ow-Xw-Yw-Zw.
  • the coordinate system Ow-Xw-Yw-Zw can be a left-handed coordinate system or a right-handed coordinate system. In this application, a left-handed coordinate system is taken as an example for illustration.
  • the above-mentioned one frame of image may include four sounding objects.
  • the position of the first sounding object may be (x_a, y_a, z_a).
  • the position of the second sounding object may be (x_b, y_b, z_b).
  • the position of the third sounding object may be (x_c, y_c, z_c).
  • the position of the fourth sounding object may be (x_d, y_d, z_d).
  • the position of the virtual camera can be (x_e, y_e, z_e).
  • the electronic device 100 may project the above-mentioned positional relationship between the sounding object and the virtual camera from a three-dimensional coordinate system to a two-dimensional coordinate system.
• the electronic device 100 can project in the direction of the Zw axis of the above-mentioned coordinate system Ow-Xw-Yw-Zw, and establish the coordinate system Xc-Oc-Yc shown in FIG. 6C with the position of the virtual camera as the origin (0,0).
  • the Yc axis of the coordinate system Xc-Oc-Yc may be a straight line parallel to the optical axis of the virtual camera, and the Xc axis may be a straight line perpendicular to the Yc axis.
• the embodiment of the present application does not limit the method for determining the Yc axis and the Xc axis in the coordinate system Xc-Oc-Yc.
  • the electronic device 100 can determine the position of the above-mentioned sounding object in the coordinate system Xc-Oc-Yc.
  • the position of the first sounding object may be (x5, y5).
  • the position of the second sounding object may be (x6, y6).
  • the position of the third sounding object may be (x7, y7).
  • the position of the fourth sounding object may be (x8, y8).
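• the projection above can be sketched in Python as follows: drop the Zw coordinate, translate so the virtual camera sits at the origin, and rotate so the Yc axis runs along the camera's optical axis. How the reconstruction model reports the optical-axis direction is not specified in the text, so it is taken as an input here.

    import math

    def to_camera_plane(obj_xyz, cam_xyz, optical_axis_xy):
        # Drop Zw, then translate so the virtual camera is the origin.
        ox = obj_xyz[0] - cam_xyz[0]
        oy = obj_xyz[1] - cam_xyz[1]
        # Unit vector along the optical axis (the Yc direction).
        ax, ay = optical_axis_xy
        norm = math.hypot(ax, ay)
        ax, ay = ax / norm, ay / norm
        # Project the offset onto the Xc and Yc axes.
        xc = ox * ay - oy * ax
        yc = ox * ax + oy * ay
        return (xc, yc)

    # First sounding object at (x_a, y_a, z_a), virtual camera at
    # (x_e, y_e, z_e), optical axis pointing along +Yw:
    print(to_camera_plane((3.0, 5.0, 1.2), (1.0, 1.0, 1.5), (0.0, 1.0)))
    # -> (2.0, 4.0), i.e. (x5, y5) in the coordinate system Xc-Oc-Yc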
  • the position of the sounding object shown in FIG. 6B and FIG. 6C is the position of the sounding object in the second scene (ie, the virtual scene), and the position of the virtual camera is the position of the virtual camera in the second scene.
  • the position of the above-mentioned sounding object and the position of the virtual camera may have a time attribute.
• the time attribute of the position of the sounding object and of the position of the virtual camera can be determined by the position, on the time axis of the video, of the image from which these positions are determined.
  • the position of the virtual camera has a time attribute that can be used to determine when the virtual camera is at a position.
  • the location of a sounding object has a time attribute that can be used to determine how long a sounding object was at a location. It can be understood that if the time attribute of the location of the sounding object matches the time attribute of the audio component of the sounding object, the audio component may be an audio component generated by the sounding object at the position. Alternatively, if the audio component of the sounding object is associated with the sounding object on the image, the audio component may be an audio component produced by the sounding object at a position determined according to the image.
  • the electronic device 100 can adjust the audio playback device that simulates the sounding of the sounding object in real time in combination with the above time attributes. This can enable the user to more realistically experience the process of the sounding objects in the video making sounds in different directions following the perspective of the virtual camera.
  • the electronic device 100 can determine the position of the virtual camera and the position of the sounding object in the first video in real time.
  • the following specifically introduces an implementation method for acquiring the location of an audio playback device provided by the embodiment of the present application.
  • the electronic device 100 may obtain the position of the audio playback device by using an ultrasonic positioning method.
  • both the electronic device 100 and the audio playback device may have an audio output device (such as a speaker) and an audio input device (such as a microphone).
  • the above-mentioned audio output device can emit ultrasonic waves.
  • the above-mentioned audio input device can receive ultrasonic waves.
• the frequency of ultrasonic waves exceeds 20,000 Hz, the upper limit of what the human ear can hear. That is, people cannot perceive the ultrasonic waves in the environment. Therefore, the ultrasonic positioning performed by the electronic device 100 will not affect the user's listening to other sounds in the environment.
  • the electronic device 100 may acquire directions and distances of multiple audio playback devices relative to itself one by one.
  • the acquisition of the position of the audio playback device 200 is taken as an example for description.
  • FIG. 6D exemplarily shows a flowchart of a method for acquiring the location of an audio playback device.
• the method may include steps S611-S615, wherein:
• the electronic device 100 sends an ultrasonic emission instruction to the audio playback device 200, where the ultrasonic emission instruction is used to instruct the audio playback device 200 to emit ultrasonic waves.
  • the electronic device 100 may send an ultrasonic wave transmission instruction to the audio playback device 200 through the communication connection between itself and the audio playback device 200 .
  • the ultrasonic emission instruction can be used to instruct the audio playback device 200 to emit ultrasonic waves.
  • the audio playback device 200 emits ultrasonic waves.
  • the audio playback device 200 may emit ultrasonic waves.
  • the electronic device 100 receives the ultrasonic wave from the audio playback device 200, and obtains the direction of the audio playback device 200 according to the ultrasonic wave.
  • the electronic device 100 may receive ultrasonic waves from the audio playback device 200 .
  • the audio input device of the electronic device 100 may include a microphone array.
  • the microphone array can be understood as a plurality of microphones distributed according to specified rules (such as three rows and three columns, five rows and five columns, etc.).
  • the electronic device 100 can obtain the direction of the ultrasonic wave by using the time difference when multiple microphones in the microphone array receive the ultrasonic wave.
  • the direction of the ultrasonic wave is the direction of the audio playback device 200 . In this way, the electronic device 100 can obtain the direction of the audio playback device 200 relative to itself.
• for the specific implementation process of obtaining the direction of the audio playback device through the microphone array, please refer to the Chinese invention patent application with the application number 202011556351.2. Details are not repeated here. It should be noted that all content about positioning in the Chinese invention patent application with application number 202011556351.2 is incorporated into this application, and is within the scope of this application.
  • the electronic device 100 emits ultrasonic waves.
• the electronic device 100 obtains the distance between the electronic device 100 and the audio playback device 200 according to the time difference between the time when the ultrasonic wave is emitted and the time when the ultrasonic wave reflected from the direction of the audio playback device 200 is received.
  • the electronic device 100 can emit ultrasonic waves and receive the reflected ultrasonic waves. Wherein, the electronic device 100 can obtain the distance of the audio playback device 200 relative to itself through the time ts of the ultrasonic wave emitted and the time tr of the received ultrasonic wave reflected from the direction of the audio playback device 200 .
  • the distance between the audio playback device 200 and the electronic device 100 may be C*(tr-ts)/2.
  • C may be the propagation speed of ultrasonic waves. In air, C can have a value of 340 m/s.
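• the two measurements can be sketched in Python: direction from the arrival-time difference at two microphones a known distance apart, and distance from the round-trip time of the reflected wave, d = C*(tr-ts)/2. A real microphone array uses more elements; two are enough to show the idea, and the numbers are illustrative.

    import math

    C = 340.0   # propagation speed of ultrasonic waves in air, m/s

    def direction_from_tdoa(delta_t, mic_spacing):
        """Angle of arrival relative to the array broadside, in degrees."""
        # delta_t * C is the extra path length to the farther microphone.
        s = max(-1.0, min(1.0, delta_t * C / mic_spacing))
        return math.degrees(math.asin(s))

    def distance_from_round_trip(ts, tr):
        """Distance to the reflector: half the round-trip path."""
        return C * (tr - ts) / 2

    print(direction_from_tdoa(2.0e-4, 0.1))      # about 43 degrees
    print(distance_from_round_trip(0.0, 0.02))   # 3.4 m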
  • the audio playback device 200 may stop emitting ultrasonic waves.
• in this way, the ultrasonic waves received by the electronic device 100 are the reflections of the ultrasonic waves emitted by the electronic device 100 itself.
  • the electronic device 100 may send an instruction to stop emitting ultrasonic waves to the audio playback device 200 before emitting ultrasonic waves.
  • the audio playback device 200 may stop emitting ultrasonic waves.
  • the audio playback device 200 may emit ultrasonic waves within a preset time period (such as 1 second, 2 seconds, etc.). When the emission of ultrasonic waves is stopped, the audio playback device 200 may send an ultrasonic wave stop message to the electronic device 100 . The electronic device 100 may transmit the ultrasonic wave after receiving the ultrasonic wave stop message.
  • the electronic device 100 can obtain the directions and distances of other audio playback devices other than the audio playback device 200 relative to itself through the ultrasonic positioning method shown in FIG. 6D above.
  • the electronic device 100 may emit ultrasonic waves before an audio playback device is placed, and receive the reflected ultrasonic waves. Then, the electronic device 100 may emit ultrasonic waves again after the audio playback device is placed, and receive the reflected ultrasonic waves. The electronic device 100 can obtain the position of the audio playback device by comparing the received reflected ultrasonic waves before and after the audio playback device is placed.
  • the electronic device 100 may have an image acquisition device (such as a camera).
  • the electronic device 100 may collect an image including an audio playback device.
  • the electronic device 100 may perform image recognition on the above image to obtain the model of the audio playback device, thereby obtaining the actual size of the audio playback device. According to the position of the audio playback device in the image and the ratio of the actual size of the audio playback device to the size of the audio playback device in the image, the electronic device 100 can acquire the position of the audio playback device.
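• the size-ratio idea reduces to the pinhole-camera relation, sketched below in Python: with the device model identified, its real size is known, and distance = focal length x real size / image size. The focal length in pixels is assumed known (e.g., from camera calibration); the numbers are illustrative.

    def distance_from_size(real_height_m, image_height_px, focal_px):
        """Pinhole-camera distance estimate from a known object size."""
        return focal_px * real_height_m / image_height_px

    # A 0.2 m tall speaker imaged 80 px tall with a 1000 px focal length:
    print(distance_from_size(0.2, 80, 1000))   # 2.5 m from the camera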
  • the electronic device 100 may also take images containing audio playback devices at different focal lengths.
• the electronic device 100 may recognize the images of an audio playback device captured at different focal lengths to obtain the location of the audio playback device.
  • the electronic device 100 may also instruct other devices (such as a video analysis device) to recognize the image. This embodiment of the present application does not limit it.
  • the electronic device 100 may also obtain the position of each audio playback device through positioning technologies such as millimeter wave radar positioning and UWB positioning.
• for the specific implementation process of positioning the audio playback device through millimeter-wave radar positioning or UWB positioning, please refer to the Chinese invention patent application with application number 202111243798.9. Details are not repeated here. It should be noted that all content about positioning in the Chinese invention patent application with application number 202111243798.9 is incorporated into this application, and is within the scope of this application.
• the electronic device 100 may obtain the position of the observer through an ultrasonic positioning method. Specifically, the electronic device 100 can emit ultrasonic waves and receive the reflected ultrasonic waves. The electronic device 100 can recognize a human-shaped target (such as a standing human shape or a sitting human shape) through the reflected ultrasonic waves. This human-shaped target can be regarded as an observer. The electronic device 100 can obtain the direction and position of the target relative to the electronic device 100 according to the ultrasonic waves corresponding to the human-shaped target among the reflected ultrasonic waves. In this way, the electronic device 100 can obtain the position of the observer.
• alternatively, the electronic device 100 may collect an image including the observer.
• the electronic device 100 can identify the positional relationship between the observer and one or more audio playback devices in the image. Further, the electronic device 100 may combine the positions of the one or more audio playback devices to acquire the position of the observer.
• for the specific implementation process of positioning the observer through millimeter-wave radar positioning and UWB positioning, please refer to the Chinese invention patent application with application number 202111243798.9. Details are not repeated here. It should be noted that all content about positioning in the Chinese invention patent application with application number 202111243798.9 is incorporated into this application, and is within the scope of this application.
  • the number of the above observers can be one or more.
  • the embodiment of the present application does not limit the implementation method for the electronic device 100 to acquire the position of the observer.
  • the electronic device 100 may also obtain the position of the observer through positioning technologies such as millimeter-wave radar positioning, UWB positioning, and infrared positioning.
• the electronic device 100 can establish a coordinate system with the position of the observer as the origin, to determine the positional relationship of the electronic device 100 and each audio playback device with the observer.
  • FIG. 6E exemplarily shows a schematic diagram of the positions of the audio playback device and the electronic device 100 in a coordinate system established with the observer's position as the origin.
  • four audio playback devices are taken as an example for illustration.
• the origin (0,0) of the coordinate system Xt-Ot-Yt is the position of the observer.
• the Yt axis may be a straight line parallel to the orientation of the display screen of the electronic device 100.
• the Xt axis may be a straight line perpendicular to the Yt axis.
  • the embodiment of the present application does not limit the method for determining the Yt axis and the Xt axis in the coordinate system Xt-Ot-Yt.
  • the electronic device 100 can determine the position of itself and each audio playback device in the coordinate system Xt-Ot-Yt according to the directions and distances of each audio playback device and the observer relative to itself. For example, in the coordinate system Xt-Ot-Yt, the coordinates of the electronic device 100 are (0, y0). The coordinates of the audio playback device 200 are (x1, y1). The coordinates of the audio playback device 201 are (x2, y2). The coordinates of the audio playback device 202 are (x3, y3). The coordinates of the audio playback device 203 are (x4, y4).
  • the position of the observer shown in FIG. 6E is the position of the observer in the first scene (ie, the real scene), and the position of the audio playback device is the position of the audio playback device in the first scene.
• the electronic device 100 may determine whether all audio playback devices are located on one side of it. When it is determined that all audio playback devices are located on one side of the electronic device 100, the electronic device 100 can prompt the user to adjust the positions of the audio playback devices so that there are audio playback devices on both sides of the electronic device 100, and then reacquire the locations of the audio playback devices.
  • the viewer will face the display screen of the electronic device 100 when watching the video.
• multiple audio playback devices can make the sound of part of the sounding objects reach the observer from the observer's left side when playing the audio in the video, and make the sound of another part of the sounding objects reach the observer from the observer's right side. Then, with audio playback devices distributed on the left side and the right side of the electronic device 100, the audio playback devices can better simulate the sounding process of the sounding objects in the video.
• the electronic device 100 may determine whether all audio playback devices are located on one side of it. Exemplarily, the electronic device 100 may compare the coordinate values of the audio playback devices 200-203 and the electronic device 100 on the Xt axis in the coordinate system shown in FIG. 6E. If the coordinates of the audio playback devices 200-203 on the Xt axis are all less than, or all greater than, the coordinate of the electronic device 100 on the Xt axis, the electronic device 100 can determine that the audio playback devices 200-203 are all located on one side of the electronic device 100.
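• this check can be sketched in the Xt-Ot-Yt coordinates of FIG. 6E, as below in Python: the devices are all on one side when all their Xt coordinates fall on the same side of the Xt coordinate of the electronic device 100 (which is 0 in this coordinate system). The example positions are illustrative.

    def all_on_one_side(device_positions, device100_x=0.0):
        """device_positions: [(xt, yt), ...] of the audio playback devices."""
        xs = [x for x, _ in device_positions]
        return (all(x < device100_x for x in xs) or
                all(x > device100_x for x in xs))

    # Devices 200-203 all with positive Xt -> one side of the device:
    print(all_on_one_side([(1.0, 2.0), (2.5, 1.0), (0.5, 3.0), (3.0, 0.5)]))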
  • the electronic device 100 may prompt the user to adjust the positions of the audio playback devices. For example, the user may move some audio playback devices among all the audio playback devices to a side of the electronic device 100 where no audio playback devices are distributed.
  • the electronic device 100 may reacquire the location of the audio playback device.
  • the electronic device 100 may only re-acquire the position of the audio playback device whose position has changed.
  • the embodiment of the present application does not limit the time when the electronic device 100 obtains the location of each audio playback device.
  • the electronic device 100 may obtain the location of the one or more audio playback devices.
  • the electronic device 100 may obtain the location of the audio playback device and the observer before playing the video.
  • the electronic device 100 may acquire the positions of the audio playing device and the observer at regular or irregular intervals.
  • the electronic device 100 may provide the user with options for playing the video: a normal playback mode and a stereo playback mode.
• the above-mentioned normal playback mode may mean that the electronic device 100 does not separate the audio components of different sounding objects in the video or distribute them to different audio playback devices for playback. During video playback, all audio playback devices play the same audio.
• the above stereo playback mode may mean that the electronic device 100 distributes the audio components of different sounding objects to different audio playback devices for playback, according to the method for cooperatively playing audio during video playback provided in this application.
  • the electronic device 100 may acquire the position of the audio playback device and the observer before playing the video.
  • the following specifically introduces an implementation method for determining the positional relationship among the sounding object, the audio playback device, and the observer provided by the embodiment of the present application.
• the electronic device 100 can determine the position of the virtual camera in the second scene from the position of the observer in the first scene, and obtain the positional relationship among the sounding object, the audio playback device, and the observer in the first scene.
• the electronic device 100 can overlap the coordinate system Xc-Oc-Yc shown in FIG. 6C with the coordinate system Xt-Ot-Yt shown in FIG. 6E, so as to determine the position of each sounding object in the coordinate system Xt-Ot-Yt.
  • the origin of the coordinate system Xt-Ot-Yt may be the position of the observer in the first scene, that is, the position of the virtual camera in the second scene.
• for the positions of the first to fourth sounding objects and the audio playback devices 200 to 203 in the two-dimensional coordinate system, reference may be made to the introduction of the foregoing embodiments.
  • more or less sounding objects may be included in the first video.
  • the positional relationship among the sounding object, the audio playback device, and the observer mentioned in the embodiment of the present application may refer to the positional relationship among the sounding object, the audio playback device, and the observer in the first scene.
• determining the position of the observer in the first scene as the position of the virtual camera in the second scene can enable the user to stand in the perspective of the virtual camera, whose angle of view changes with the change of the camera position, so as to personally experience the three-dimensional scene represented by the video picture and feel different sounding objects making sounds in different directions.
• the electronic device 100 may also determine a position having a mapping relationship with the position of the observer in the first scene as the position of the virtual camera in the second scene.
  • the position having a mapping relationship with the position of the observer in the first scene may be a position at a distance of 1 meter in the first direction of the position of the observer in the first scene.
  • the embodiment of the present application does not specifically limit the foregoing mapping relationship.
  • the embodiment of the present application does not limit the time when the electronic device 100 obtains the position of the above-mentioned audio playback device and the position of the observer, and determines the position of the sounding object and the position of the virtual camera.
• the electronic device 100 may acquire the location of each audio playback device when establishing a communication connection with each audio playback device for the first time. When a new audio playback device is added, the electronic device 100 can establish a communication connection with this audio playback device and obtain its position. This is practical because the position of an audio playback device changes relatively rarely.
  • the electronic device 100 may store the location of each audio playback device.
  • the electronic device 100 may reacquire the location of the audio playback device and update it in the memory.
  • the electronic device 100 may acquire the location of the audio playback device from the memory. In this way, the electronic device 100 does not need to calculate the position of the audio playback device every time the audio component played by the audio playback device is determined during video playback.
  • the electronic device 100 can calculate the position of the observer, the position of the sounding object in the video, and the position of the virtual camera in real time during the video playing process.
  • the electronic device 100 may determine the corresponding relationship between the audio playback device and the sounding object.
  • the following specifically introduces an implementation method for determining the corresponding relationship between an audio playback device and a sounding object provided by the embodiment of the present application.
• an audio playback device simulating the sounding of a sounding object may be understood as being equivalent to the audio component of the sounding object being played from the position of the sounding object.
  • FIG. 7A exemplarily shows a flow chart of a method for determining the correspondence between an audio playback device and a sounding object.
• the determination of the audio playback device corresponding to the first sounding object is specifically taken as an example for description.
• the method may include steps S711 to S717, wherein:
• step S713: if there is only one audio playback device selected in step S711, instruct the audio playback device selected in step S711 to play the first audio component of the first sounding object; the selected audio playback device corresponds to the first sounding object.
  • step S714 If there are multiple audio playback devices selected in step S711, select the audio playback device with the smallest distance from the first sounding object from the audio playback devices selected in step S711.
• step S716: if there is only one audio playback device selected in step S714, instruct the audio playback device selected in step S714 to play the first audio component of the first sounding object; the selected audio playback device corresponds to the first sounding object.
• step S717: if there are multiple audio playback devices selected in step S714, select an idle audio playback device from the audio playback devices selected in step S714, or select the audio playback device with the least number of corresponding sounding objects, and instruct the audio playback device selected in step S717 to play the first audio component of the first sounding object; the audio playback device selected in step S717 corresponds to the first sounding object.
  • the above-mentioned idle audio playback device may refer to an audio playback device that does not play other audio components within the playback time of the first audio component of the first sounding object.
  • the above steps S712 to S716 are optional.
  • the electronic device 100 may select multiple audio playback devices.
  • the electronic device 100 may select an audio playback device from the multiple audio playback devices according to the above step S717.
  • the above step S717 is also optional.
• if the electronic device 100 selects a plurality of audio playback devices in the above step S711, the electronic device 100 may arbitrarily select an audio playback device from the plurality of audio playback devices to play the first audio component of the first sounding object.
• similarly, when the electronic device 100 selects a plurality of audio playback devices in the above step S714, the electronic device 100 can arbitrarily select an audio playback device from the plurality of audio playback devices to play the first audio component of the first sounding object.
  • the electronic device 100 may directly select the audio playback device corresponding to the first sound-emitting object according to the distance between the first sound-emitting object and each audio playback device. Specifically, the electronic device 100 may calculate the distance between the first sound-emitting object and each audio playback device. The electronic device 100 may select an audio playback device with the shortest distance to the first sounding object to play the first audio component of the first sounding object.
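• for illustration, the selection flow of steps S711 to S717 can be sketched as follows. This is a minimal Python sketch; the dictionary fields (angle, distance, idle, object_count) are illustrative names, not terms from the patent:

```python
import math

def select_playback_device(devices):
    # S711: keep the device(s) whose direction relative to the observer
    # deviates least from the direction of the first sounding object.
    min_angle = min(d["angle"] for d in devices)
    candidates = [d for d in devices if math.isclose(d["angle"], min_angle)]
    if len(candidates) == 1:  # S713: a single candidate plays the component
        return candidates[0]
    # S714: among the tied devices, keep those closest to the sounding object.
    min_dist = min(d["distance"] for d in candidates)
    candidates = [d for d in candidates if math.isclose(d["distance"], min_dist)]
    if len(candidates) == 1:  # S716: a single candidate plays the component
        return candidates[0]
    # S717: prefer an idle device; otherwise the one simulating the fewest objects.
    idle = [d for d in candidates if d["idle"]]
    pool = idle if idle else candidates
    return min(pool, key=lambda d: d["object_count"])

devices = [
    {"name": "200", "angle": 10.0, "distance": 3.0, "idle": True, "object_count": 0},
    {"name": "201", "angle": 10.0, "distance": 3.0, "idle": False, "object_count": 2},
]
print(select_playback_device(devices)["name"])  # -> "200" (idle device preferred)
```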
  • FIG. 7B exemplarily shows four audio playback devices (audio playback device 200, audio playback device 201, audio playback device 202, audio playback device 203), one sounding object (first sounding object) and electronic device 100.
• for the coordinate system Xt-Ot-Yt shown in FIG. 7B, reference may be made to the introduction of the aforementioned FIG. 6F.
• the included angle between the straight line L1, where the first sounding object and the observer are located, and the straight line L2, where the audio playback device 200 and the observer are located, is θ1.
• θ1 is smaller than the angle between the above-mentioned straight line L1 and the straight line L3 where the audio playback device 201 and the observer are located.
• θ1 is smaller than the angle between the above-mentioned straight line L1 and the straight line L4 where the audio playback device 202 and the observer are located.
• θ1 is smaller than the angle between the above-mentioned straight line L1 and the straight line L5 where the audio playback device 203 and the observer are located.
  • the electronic device 100 may associate the first sounding object with the audio playback device 200 .
  • the audio playing device 200 can play the audio component of the sounding object corresponding to itself.
  • the electronic device 100 may instruct the audio playback device 200 to play the audio component of the first sound-emitting object when it is at the position shown in FIG. 7B .
  • the audio playback device 200 can play the audio component.
• the electronic device 100 may determine the position of the first sounding object from the image at the frame time at which the first sounding object last appeared on the video screen. If the image from the last time the first sounding object appeared on the video screen indicates that the first sounding object is located at the position shown in FIG. 7B, the audio playback device 200 may play the first audio component of the first sounding object.
  • the position of a sounding object in the second scene may change. Then the corresponding relationship between the sounding object and the audio playback device may change as the position of the sounding object changes.
  • the electronic device 100 may re-determine the positional relationship among the first sounding object, the audio playback device, and the observer. If the changed position of the first sounding object is closest to the position of the audio playback device 201 , the electronic device 100 may instruct the audio playback device 201 to play the audio component when the first sounding object is located at the changed position. That is to say, the electronic device 100 can detect whether the position of the sounding object changes in real time, so as to adjust the audio playback device for playing the audio component of the sounding object in real time.
  • the observer's position in the first scene may change.
  • the corresponding relationship between the sounding object and the audio playback device may change with the position of the observer.
• the electronic device 100 may re-determine the positional relationship among the first sounding object, the audio playback device, and the observer. If, after the position of the observer changes, the electronic device 100 determines according to the method shown in FIG. 7A that the audio playback device corresponding to the first sounding object has changed to the audio playback device 202, the electronic device 100 may instruct the audio playback device 202 to play the first audio component of the first sounding object. That is to say, the electronic device 100 can detect whether the position of the observer changes in real time, so as to adjust in real time the audio playback device for playing the audio component of the sounding object.
  • the electronic device 100 can also detect in real time whether the position of the audio playback device changes. A change in the position of the audio playback device will affect the positional relationship between the audio playback device and the sounding object.
  • the electronic device 100 may re-determine the corresponding relationship between the sounding object and the audio playing device according to the position of the audio playing device, the position of the sounding object, and the position of the observer after detecting the change of the position of the audio playing device.
  • the electronic device 100 may also select an audio playback device for playing the second audio component of the first background sound.
  • the electronic device 100 may select an audio playback device from multiple audio playback devices to play the second audio component of the first background sound.
  • the aforementioned audio playback device that plays the second audio component of the first background sound may be an audio playback device in an idle state.
• the electronic device 100 can select at least one audio playback device from the audio playback devices located on one side (such as the left side) of the electronic device 100, and at least one audio playback device from those located on the other side (such as the right side) of the electronic device 100, to play the second audio component of the first background sound.
  • the aforementioned one side of the electronic device 100 and the other side of the electronic device 100 may be one side and the other side in the direction in which the display screen of the electronic device 100 is facing.
  • the electronic device 100 may preferentially select an idle audio playback device to play the second audio component of the first background sound.
• having audio playback devices on both sides of the electronic device 100 play the second audio component of the first background sound can better form stereo sound in the video playback environment, and helps users perceive sounding objects making sounds in different directions.
  • the electronic device 100 may adjust an audio playback device for simulating the sounding of a sounding object. That is, the audio playback device that is idle during video playback may change.
  • the electronic device 100 may preferentially select an idle audio playback device to play the second audio component of the first background sound. Then, during the video playing process, the audio playing device for playing the second audio component of the first background sound may be changed.
  • the electronic device 100 may select the audio playback device that requires the least number of simulated sound objects to play the second audio component of the first background sound.
• the audio playback device can play the audio component of a sounding object according to the playback parameters of that audio component, and play the second audio component of the first background sound according to the playback parameters of the second audio component of the first background sound.
• the electronic device 100 may further determine playback parameters for the audio playback device to play the audio component. In this way, the electronic device 100 can instruct the audio playback device to adjust the playback parameters to play the corresponding audio component, so that when the user distinguishes the sounds of different sounding objects during video playback, the user determines the position of the sound source of the sound from the audio playback device as the position of the sounding object simulated by that audio playback device.
  • the following specifically introduces an implementation method for determining playback parameters provided by the embodiment of the present application.
• the electronic device 100 may instruct the audio playback device to adjust the playback time and sound intensity, so that when the audio playback device plays the audio component corresponding to its sounding object, the user hears the sound as being emitted from the location of the sounding object corresponding to the audio playback device.
• the electronic device 100 may use the playing time of the second audio component of the first background sound as a reference, and adjust the playing times of the audio components of different sounding objects in the audio played by each audio playback device, so as to change the phase at which each audio component reaches the observer's ears. That is, by adjusting the time difference between each audio playback device playing the audio component of a sounding object and the second audio component of the first background sound, the observer's discrimination result of the sounding direction of the sounding object can be adjusted.
• the playing time of the second audio component of the first background sound, used as a reference, may be determined according to the time attribute of the second audio component of the first background sound in the video.
  • the above-mentioned difference in sound level may represent a difference in sound intensity.
• as sound propagates, the sound intensity gradually attenuates.
• sound intensity has a positive correlation with sound pressure: the lower the sound intensity, the lower the sound pressure. The farther the sound source is from the observer, the lower the sound pressure that reaches the user's ears when the sound is played at the same sound intensity.
• the electronic device 100 may use the sound intensity of the second audio component of the first background sound as a reference to adjust the sound intensity of the audio components of different sounding objects played by each audio playback device, so as to change the sound pressure at which each audio component reaches the observer's ears.
• the electronic device 100 may determine, by using formula (1) in the foregoing embodiment, the sound intensity of the audio component whose sound intensity serves as the above-mentioned reference.
  • the sound pressure used for calculating the sound intensity may be obtained by integrating the time domain signal of the audio component.
  • the distance between the audio playback device 200 and the observer is D1.
  • the distance between the first sounding object and the observer is D2.
• the electronic device 100 may determine the sound pressure p1' at which the audio playback device 200 plays the first audio component of the first sounding object according to the following formula (2).
  • pn may represent the sound pressure of the second audio component of the first background sound.
  • the electronic device 100 may use the above p1' and the above formula (1) to determine the sound intensity of the first audio component of the first sound-emitting object played by the audio playback device 200 .
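• as a worked illustration of formula (1), the following Python sketch computes a playback sound intensity from a sound pressure. The inverse-distance reading of formula (2), p1' = pn × D1/D2, is an assumption (formula (2) itself is not reproduced in this text), and the numeric values are illustrative:

```python
RHO = 1.2  # assumed density of air in kg/m^3
C = 340.0  # speed of sound in m/s, as given in the description

def sound_intensity(p):
    # Formula (1): I = p^2 / (rho * C)
    return p * p / (RHO * C)

def device_pressure(pn, d1, d2):
    # Assumed reading of formula (2): sound pressure decays inversely with
    # distance, so a device at distance D1 plays with pressure p1' such that
    # the observer hears the same pressure as a source pn at distance D2.
    return pn * d1 / d2

pn = 0.02  # illustrative reference pressure of the background-sound component (Pa)
p1 = device_pressure(pn, d1=2.0, d2=5.0)
print(sound_intensity(p1))  # intensity at which device 200 plays the component
```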
  • the electronic device 100 can adjust the sound intensity of the audio components of each sounding object by taking the sound intensity of the second audio component of the first background sound as a reference.
  • the electronic device 100 may select an audio playback device to play the second audio component of the first background sound.
  • the electronic device 100 can determine the sound intensity of the second audio component of the first background sound through the above pn and the aforementioned formula (1).
  • the electronic device 100 may select at least two audio playback devices to play the second audio component of the first background sound.
  • the electronic device 100 may indicate that one of the multiple audio playback devices playing the first background sound is the reference audio playback device.
  • the above-mentioned reference audio playing device may play the second audio component of the first background sound at the sound intensity of the second audio component.
• the electronic device 100 may determine the sound intensity at which other audio playback devices play the second audio component of the first background sound according to the distances of the reference audio playback device and of the other audio playback devices that play the second audio component of the first background sound relative to the observer.
• the farther an audio playback device is from the observer, the greater the sound intensity at which that audio playback device plays the second audio component of the first background sound. This can make the sound intensity of the first background sound played by different audio playback devices the same when it reaches the observer.
  • the electronic device 100 may select the audio playback device 201 and the audio playback device 202 to play the second audio component of the first background sound.
  • the distance between the audio playback device 201 and the observer is D3.
  • the distance between the audio playback device 202 and the observer is D4.
  • the audio playback device 201 is the aforementioned reference audio playback device.
  • the electronic device 100 may determine the sound pressure pn3 of the second audio component of the first background sound at the audio playback device 202 according to the following formula (3).
  • the electronic device 100 can determine that the sound intensity of the second audio component of the first background sound played by the audio playback device 202 is In3.
• In3 = pn3²/(ρ × C), where ρ may represent the density of air and C the speed of sound.
  • the embodiment of the present application does not limit the sound intensity used as a reference.
• the electronic device 100 may also use the sound intensity of the audio component of a sounding object in the video as a reference to determine the sound intensity at which the audio playback device plays the audio components of other sounding objects.
  • the audio component whose sound intensity is used as a reference and other audio components whose sound intensity is to be determined may be different audio components determined from audio within a time period in the video.
  • the electronic device 100 separates the audio components of multiple sounding objects and the second audio component of the first background sound from the audio of the first time period.
  • the electronic device 100 may use the sound intensity of the second audio component of the first background sound within the first time period as a reference to determine the sound intensity of the audio components of the multiple sounding objects played by the audio playback device within the first time period.
  • the electronic device 100 may determine the playing time T′ for the audio playing device 200 to play the first audio component of the first sounding object according to the following formulas (4-1) and (4-2).
• T′ = T1 + ΔT (4-1)
  • T1 may be the playing time of the audio component determined based on the time attribute of the first audio component of the first sounding object in the video. That is, T1 is the original playback time of the first audio component of the first sound-emitting object before the above-mentioned playback time adjustment.
  • C represents the speed of sound, and the value can be 340 m/s.
• D1 may be the straight-line distance over which the sound of the first audio component of the first sounding object, played by the audio playback device 200, travels to reach the observer.
• by adjusting the playback time at which the audio playback device plays the first audio component of the first sounding object, the electronic device 100 may adjust the user's discrimination result of the sounding direction of the first sounding object. This may be equivalent to moving the audio playback device 200 along the tangent line at the location of the audio playback device 200 shown in FIG. 7B.
• determining the time difference for adjusting the playing time of the first audio component of the first sounding object as the aforementioned ΔT may be equivalent to moving the audio playback device 200 to the position S1 shown in FIG. 7B.
• the above ΔT may be the time required for sound to propagate along a straight line from the position S1 shown in FIG. 7B to the position S2.
• the above position S1 may be the intersection of the tangent line on which the audio playback device 200 is located and the straight line on which the first sounding object and the observer are located. That is, the direction of the position S1 relative to the observer is the same as the direction of the first sounding object relative to the observer.
• the above-mentioned position S2 may be the position on the straight line where the first sounding object and the observer are located whose distance from the observer is D1.
• the distance between position S1 and position S2 is D0. It can be seen that, when the value of D1 is constant, the smaller θ1 is, the smaller the value of D0, and the smaller the value of ΔT.
• a smaller θ1 may indicate that the direction of the audio playback device 200 relative to the observer is closer to the direction of the first sounding object relative to the observer.
• the smaller θ1 is, the smaller the playback-time adjustment the audio playback device 200 needs in order for the user to hear the first audio component of the first sounding object as coming from the direction of the first sounding object.
• when the direction of the audio playback device 200 relative to the observer is the same as the direction of the first sounding object relative to the observer, the adjustment amount of the audio playback device 200 to the playback time of the first audio component of the first sounding object can be 0 (that is, ΔT is 0).
  • the original playing time of the first audio component of the first sounding object starts from the first minute, 30 seconds and 10 milliseconds of the video.
  • the distance D1 between the audio playback device 200 and the observer is 8 meters.
• the aforementioned θ1 is 60°.
• the electronic device 100 may determine that the above time difference ΔT is 23.5 milliseconds. Then, the electronic device 100 may instruct the audio playback device 200 to start playing the first audio component of the first sounding object at the first minute, 30 seconds, 33.5 milliseconds of the video.
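• the worked example above can be checked with a short sketch. A natural reading of formulas (4-1) and (4-2), consistent with the geometry of S1 and S2 described above, is ΔT = D0/C with D0 = D1 × (1/cos θ1 − 1); this reconstruction is an assumption, but it reproduces the 23.5-millisecond figure:

```python
import math

C = 340.0  # speed of sound in m/s

def playback_delay_ms(d1_m, theta_deg):
    # S1 lies on the observer/sounding-object line at distance D1 / cos(theta1)
    # (tangent-line intersection); S2 lies on the same line at distance D1.
    d0 = d1_m * (1.0 / math.cos(math.radians(theta_deg)) - 1.0)
    return 1000.0 * d0 / C  # delta_T in milliseconds

# D1 = 8 m and theta1 = 60 degrees, as in the example above:
print(round(playback_delay_ms(8.0, 60.0), 1))  # -> 23.5
# T' = T1 + delta_T: 1 min 30 s 10 ms + 23.5 ms = 1 min 30 s 33.5 ms
```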
  • the adjustment range of the above playback time is usually on the order of milliseconds.
  • the time difference between the time when the audio is played and the time when the corresponding image is displayed is within 100 milliseconds, which is generally acceptable. That is, the user basically does not feel that the video picture is out of sync with the sound.
  • the distance between the audio playback device and the user usually does not make the above-mentioned time difference of adjusting the playback time exceed 100 milliseconds. Then, the audio playback device adjusts the playback time to play the corresponding audio component without causing the user to feel that the video picture is out of sync with the sound.
  • the electronic device 100 may select an audio playback device to play the second audio component of the first background sound.
  • the electronic device 100 may determine the playing time of playing the second audio component of the first background sound according to the time attribute of the second audio component of the first background sound in the video.
  • the second audio component of the first background sound is the audio component of the first time period on the time axis of the video, and the electronic device 100 may instruct the audio playback device to play the second audio component of the first background sound within the first time period of video playback .
  • the electronic device 100 may select at least two audio playback devices to play the audio component of the first background sound.
  • the electronic device 100 may indicate that one of the multiple audio playback devices playing the first background sound is the reference audio playback device.
  • the playing time of the second audio component of the first background sound played by the reference audio playback device may be determined according to the time attribute of the second audio component of the first background sound in the video.
• the electronic device 100 may determine the playing time at which other audio playback devices play the second audio component of the first background sound according to the distances of the reference audio playback device and of the other audio playback devices that play the second audio component of the first background sound relative to the observer.
• since an audio playback device that is farther from the observer needs more time for the sound it plays to reach the observer, the playback time at which that audio playback device plays the second audio component of the first background sound can be earlier. This can make the first background sounds played by different audio playback devices reach the observer at the same time.
  • the electronic device 100 may select the audio playback device 201 and the audio playback device 202 to play the second audio component of the first background sound.
  • the distance between the audio playback device 201 and the observer is D3.
  • the distance between the audio playback device 202 and the observer is D4.
  • the audio playback device 201 is the aforementioned reference audio playback device.
  • the playing time for the audio playing device 201 to play the second audio component of the first background sound may be T2.
  • T2 may be determined according to the time attribute of the second audio component of the first background sound in the video.
• the electronic device 100 may determine the time difference ΔT2 between the audio playback device 201 and the audio playback device 202 playing the second audio component of the first background sound according to the following formula (5).
  • C is the speed of sound, and the value may be 340 m/s.
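• formula (5) is not reproduced in this text; a natural reading, given that the farther device should start earlier so that both background-sound components reach the observer simultaneously, is ΔT2 = (D4 − D3)/C. A minimal sketch under that assumption:

```python
C = 340.0  # speed of sound in m/s

def background_offset_ms(d3_m, d4_m):
    # Assumed reading of formula (5): the device farther from the observer
    # starts earlier by the extra propagation time.
    return 1000.0 * (d4_m - d3_m) / C

# Illustrative distances: device 201 (reference) at 3 m, device 202 at 5.5 m.
print(round(background_offset_ms(3.0, 5.5), 1), "ms earlier for device 202")
```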
  • the audio playing device can also adjust other playing parameters, such as the frequency of the sound wave.
• the playback parameters for the audio playback device to play the audio component of the sounding object may be determined according to the positional relationship among the sounding object, the audio playback device, and the observer. Then, when the position of one or more of the sounding object, the audio playback device, and the observer changes, this positional relationship changes. The electronic device can then re-determine the playback parameters for the audio playback device to play the audio component of the sounding object according to the changed positional relationship among the sounding object, the audio playback device, and the observer.
• the electronic device 100 can use the changed positional relationship among the audio playback device, the sounding object, and the observer to determine the playback parameters for playing the audio component of this sounding object.
  • the embodiment of the present application does not limit the method for implementing the audio playback device to play the audio component of the sounding object so that the observer determines the position of the sound source as the position of the sounding object.
• in addition to the method in the above-mentioned embodiment of handing the audio component of a sounding object to one audio playback device for playback and adjusting the playback time and sound intensity at which that audio playback device plays the audio component, it is also possible to play the audio component of a sounding object through multiple audio playback devices, and adjust the playback time and sound intensity at which the multiple audio playback devices play the audio component, so that the observer determines the position of the sound source as the position of the sounding object.
  • FIG. 7C schematically shows another method for adjusting playback parameters of an audio playback device.
• the audio playback device 200 and the audio playback device 201 simulating the sounding of the first sounding object is taken as an example for illustration. Not limited to two audio playback devices, more audio playback devices can also be used to simulate one sounding object.
  • the positional relationship among the observer, the audio playback device 200 , the audio playback device 201 , and the first sounding object shown in FIG. 7C is only for illustrative purposes, and should not limit the present application.
• the electronic device 100 can adjust the playback time at which the audio playback device 200 plays the first audio component of the first sounding object according to the method of the foregoing embodiment, so that the audio playback device 200 is equivalent to sounding at the position where the audio playback device 200' is located.
  • the direction vector of the audio playback device 200' relative to the direction of the observer's right ear is a1.
  • the direction vector of the audio playback device 200' relative to the direction of the observer's left ear is a2.
  • the electronic device 100 can adjust the playback time of the audio playback device 201 to play the first audio component of the first sounding object according to the method of the foregoing embodiment, so that the audio playback device 201 is equivalent to making a sound at the position where the audio playback device 201' is located.
  • the direction vector of the audio playback device 201' relative to the direction of the observer's right ear is b1.
  • the direction vector of the audio playback device 201' relative to the direction of the observer's left ear is b2.
  • the direction vector of the first sound-emitting object relative to the direction of the observer's right ear is c1.
  • the direction vector of the first sound-emitting object relative to the direction of the observer's left ear is c2.
  • the direction of the direction vector obtained by a1+b1 is the same as that of c1.
  • the direction of the direction vector obtained by a2+b2 is the same as that of c2.
  • the audio playback device 200 and the audio playback device 201 simulate the sounding of the first sounding object, so as to adjust the phase at which the first audio component of the first sounding object reaches both ears of the observer.
  • the observer can determine that the direction of the sound is the direction of the first sounding object through the phase difference of the sound heard by the left ear and the right ear.
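• the direction-vector condition above can be verified numerically. The sketch below uses toy two-dimensional vectors; the positions are illustrative only:

```python
import math

def unit(v):
    # Normalize a 2-D vector to compare directions.
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def combined_direction(a, b):
    # Direction of the sum of two direction vectors, e.g. a1 + b1.
    return unit((a[0] + b[0], a[1] + b[1]))

# If a1 + b1 points the same way as c1, the first audio component reaches the
# right ear as if it came from the position of the first sounding object.
a1 = (1.0, 0.0)
b1 = (0.0, 1.0)
c1 = unit((1.0, 1.0))
print(combined_direction(a1, b1) == c1)  # -> True for this toy geometry
```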
• in some embodiments, the electronic device 100 does not need to adjust the playback time at which the audio playback device 200 and the audio playback device 201 play the first audio component of the first sounding object.
  • the method for determining the sound intensity of the first audio component played by the audio playback device 200 and the audio playback device 201 may refer to the foregoing embodiments, and details are not repeated here.
  • the above-mentioned audio playback device 200 and audio playback device 201 may be the two audio playback devices closest to the first sounding object among all audio playback devices.
  • the embodiment of the present application does not limit the selection method of the multiple audio playback devices.
• ΔT1 may represent a time difference between the playing times at which the audio playback device 200 and the audio playback device 201 play the first audio component of the first sounding object.
• ΔI may represent an intensity difference between the sound intensities at which the audio playback device 200 and the audio playback device 201 play the first audio component of the first sounding object. If the first audio component of the first sounding object played by the audio playback device 200 is used as a reference, the playback time at which the audio playback device 200 plays the audio component can be determined according to the time attribute of the audio component in the video; for example, the playback time is Tb.
• the sound intensity at which the audio playback device 200 plays the audio component may be determined according to the sound pressure of the audio component and the foregoing formula (1); for example, the sound intensity is Ib. Then the playing time at which the audio playback device 201 plays the first audio component of the first sounding object may be Tb + ΔT1, and the sound intensity at which the audio playback device 201 plays the first audio component of the first sounding object may be Ib + ΔI.
  • the above s1 may represent the position of the observer's left ear.
  • the above s2 may represent the position of the observer's right ear.
  • the above s3 may indicate the position of the audio playback device 1 .
  • the above s4 may represent the position of the audio playback device 2 .
  • the above s5 may represent the position of the first sounding object.
  • the above-mentioned position may be a position in a two-dimensional coordinate system or a three-dimensional coordinate system.
  • the above playback parameter determination model may be a model based on a neural network, or may be a model based on a head related transfer function (head related transfer function, HRTF), etc.
  • the embodiment of the present application does not limit the specific type of the playback parameter determination model.
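• as the model type is left open, the sketch below is a deliberately simplified geometric stand-in for the playback parameter determination model, not the patent's neural-network or HRTF implementation. It takes the positions s1 to s5 and returns (ΔT1, ΔI); the path-length heuristic is purely illustrative:

```python
import math

C = 340.0  # speed of sound in m/s

def playback_parameter_model(s1, s2, s3, s4, s5):
    # s1/s2: observer's left/right ear, s3/s4: the two audio playback
    # devices, s5: the first sounding object (unused in this toy heuristic).
    path3 = (math.dist(s3, s1) + math.dist(s3, s2)) / 2  # mean path, device 1
    path4 = (math.dist(s4, s1) + math.dist(s4, s2)) / 2  # mean path, device 2
    delta_t1 = (path4 - path3) / C            # time difference, in seconds
    delta_i = 20 * math.log10(path4 / path3)  # intensity difference, in dB
    return delta_t1, delta_i

dt, di = playback_parameter_model((0, 0), (0.2, 0), (2, 1), (3, -1), (5, 0))
print(f"delta_T1 = {dt * 1000:.2f} ms, delta_I = {di:.2f} dB")
```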
• the sound intensity at which the two audio playback devices play the audio component will not only affect the observer's discrimination result of the distance between the observer and the sound source of the audio component, but will also affect the observer's discrimination result of the direction of the sound source of the audio component relative to the observer. For example, if the sound intensity of the audio component played by the audio playback device 200 is greater than that played by the audio playback device 201, the observer will judge that the sound source of the audio component is more inclined toward the direction where the audio playback device 200 is located.
  • the distances between the two audio playback devices and the observer may be different.
• the above s0 may represent the position of the observer. That is to say, the above-mentioned playback parameter determination model can directly use the distances between each audio playback device or sounding object and the observer to determine the playback parameters, without refining those distances into the distances between each audio playback device or sounding object and the observer's left ear and the distances between each audio playback device or sounding object and the observer's right ear. This can simplify the calculation process and save the computing resources of the electronic device 100.
  • the electronic device 100 can instruct multiple audio playback devices to simulate the sounding of a sounding object, and adjust the playback time and sound intensity of the audio components of the sounding object played by the multiple audio playback devices according to the above-mentioned playback parameter determination model .
• the time difference and phase difference with which the audio component of a sounding object reaches the user's two ears can make the user match the position of the sound source of the audio component with the position of the sounding object.
• matching the position of the sound source of the audio component with the position of the sounding object may include: the position of the sound source of the audio component being the same as the position of the sounding object, or the position of the sound source of the audio component being within a preset range of the position of the sounding object (for example, within a radius of 1 meter).
• the case where the audio component of a sounding object is played by one audio playback device is taken as an example for illustration.
  • electronic device 100 detects multiple observers.
  • the electronic device 100 may calculate the centers of the positions of the multiple observers to obtain the center positions.
  • the electronic device 100 detects two observers, and the positions of the two observers are (x11, y11), (x21, y21) respectively.
  • the electronic device 100 can calculate the center positions of the two observers as ((x11+x21)/2, (y11+y21)/2).
  • the embodiment of the present application does not limit the specific calculation method of the above-mentioned center position.
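• one straightforward calculation is the arithmetic mean of the observer coordinates, matching the two-observer example above; a minimal sketch:

```python
def center_position(observers):
    # Arithmetic mean of the observer positions; for two observers this is
    # ((x11 + x21) / 2, (y11 + y21) / 2), as in the example above.
    n = len(observers)
    return (sum(x for x, _ in observers) / n, sum(y for _, y in observers) / n)

print(center_position([(1.0, 2.0), (3.0, 4.0)]))  # -> (2.0, 3.0)
```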
  • the electronic device 100 may determine the positional relationship between each audio playback device and the central location. For example, the electronic device 100 may take the center position as the origin, establish the coordinate system shown in FIG. 6E, and determine the position of each audio playback device in the coordinate system. Further, the electronic device 100 may use the position of the virtual camera as the central position to determine the positional relationship between each sounding object in the video and the central position. In this way, the electronic device 100 can determine the sounding object corresponding to each audio playing device, and the playback parameters of the audio component of the sounding object played by the audio playing device.
• when the positions of the observers change, the electronic device 100 may re-determine the central position of the above-mentioned multiple observers according to the changed positions, thereby updating the positional relationships among the central position, the sounding objects, and the audio playback devices. That is to say, the electronic device 100 can detect the positions of the observers in real time, and after the positions change, correspondingly adjust the sounding object corresponding to each audio playback device and the playback parameters at which the audio playback device plays the audio component of the sounding object.
• when an observer leaves, the electronic device 100 can determine the above-mentioned center position according to the positions of the remaining observers, and adjust the sounding object corresponding to each audio playback device and the playback parameters at which each audio playback device plays the audio component of its sounding object. This can give the observers who are still watching the video a better sense of substitution.
  • the electronic device 100 can adjust the sounding object corresponding to each audio playing device and the playback parameters of the audio component of the sounding object played by the audio playing device based on the positions of multiple observers.
  • the multiple observers mentioned above can substitute themselves into the scene where the sounding object is making a sound in the video, and feel that different sounding objects are making sound in different directions of themselves.
• the electronic device 100 can still adjust in real time the sounding objects corresponding to each audio playback device and the playback parameters at which the audio playback devices play the audio components of the sounding objects. This can enable the observer, while moving, to still feel that he or she is following the perspective of the virtual camera and that the sounding objects in the video are making sounds in different directions.
  • the electronic device 100 may determine whether the change range of the observer's position is smaller than a preset range. If the change range of the position of the observer is smaller than the preset range, the electronic device 100 may not change the sounding object corresponding to the audio playback device and the playback parameters of the audio components of the sounding object played by the audio playback device.
  • the aforementioned preset range may be, for example, a range within a radius of 1 meter. The embodiment of the present application does not limit the value of the above preset range.
  • the audio playback device plays the audio component of the corresponding sounding object according to the playback parameters determined before the observer's position changes, which can still make the observer have a better sense of substitution when watching the video . Then, the electronic device 100 does not adjust the sounding object and playback parameters corresponding to the audio playback device when the change range of the observer's position is small, which can save the computing resources of the electronic device 100 and reduce the requirement on the computing power of the electronic device 100 .
  • the electronic device 100 may further determine whether the observer returns to the position before the position change within a preset time. Wherein, the electronic device 100 determines that the observer is located at position 1 for a period of time, and then moves from position 1 to other positions.
  • the above-mentioned position 1 may be a position before the above-mentioned position changes.
• the return of the observer to the position before the position change within the preset time may include: the electronic device 100 judging that the observer is located at the above position 1 again within the preset time, or the electronic device 100 judging within the preset time that the observer is located within the preset range of the above position 1 (for example, within a radius of 1 meter).
• in this case, the electronic device 100 may not change the sounding object and the playback parameters corresponding to the audio playback device.
  • the aforementioned preset time may be, for example, 1 minute, 3 minutes and so on. The embodiment of the present application does not limit the value of the foregoing preset time.
  • the observer may only leave briefly (eg, get up to drink water, get up to go to the toilet, etc.). Since the observer returns to the original position within a relatively short period of time, the electronic device 100 may not adjust the sounding object and playback parameters corresponding to the audio playback device. This can save computing resources of the electronic device 100 and reduce requirements on the computing capability of the electronic device 100 .
• the electronic device 100 can, according to the correspondence between the audio playback device and the sounding object, send to the audio playback device the audio component corresponding to the sounding object and the playback parameters for playing that audio component.
  • the electronic device 100 may also send the second audio component of the first background sound and the playback parameters to an audio playback device for playing the second audio component of the first background sound.
  • the audio playing device may play the audio components according to the instruction of the electronic device 100 with the received playing parameters.
  • the electronic device 100 recognizes that the first video contains the sounding object E1 and the sounding object E2 within the first time period.
  • the electronic device 100 separates the audio component BE1 of the sounding object E1 , the audio component BE2 of the sounding object E2 and the second audio component BE3 of the first background sound from the first audio of the first video within the first time period.
  • the electronic device 100 may also determine the positions of the sounding object E1 and the sounding object E2 according to the first image group of the first video within the first time period.
  • the positions of the sounding object E1 and the sounding object E2 may be as shown in FIG. 8A .
  • the electronic device 100 can determine that the position of the audio playback device 200 is closest to the sounding object E1.
• the audio playback device 203 is closest to the sounding object E2.
  • the electronic device 100 may determine that the playback parameter for the audio playback device 200 to play the audio component BE1 is R1, and the playback parameter for the audio playback device 203 to play the audio component BE2 is R2.
  • the electronic device 100 may instruct the audio playback device 201 and the audio playback device 202 to play the second audio component BE3 of the first background sound. Wherein, the electronic device 100 may determine that the playback parameter for the audio playback device 201 to play the audio component BE3 is R3, and the playback parameter for the audio playback device 202 to play the audio component BE3 is R4.
• the electronic device 100 may send the audio component BE1, the playback parameter R1, and a first play instruction message to the audio playback device 200.
• the above first play instruction message may be used to instruct the audio playback device 200 to play the audio component BE1 with the playback parameter R1.
• the audio playback device 200 may play the audio component BE1 with the playback parameter R1 according to the first play instruction message.
• the electronic device 100 may send the audio component BE2, the playback parameter R2, and a second play instruction message to the audio playback device 203.
• the above second play instruction message may be used to instruct the audio playback device 203 to play the audio component BE2 with the playback parameter R2.
• the audio playback device 203 may play the audio component BE2 with the playback parameter R2 according to the second play instruction message.
  • the electronic device 100 may send the audio component BE3 , the playback parameter R3 and the playback instruction message 3 to the audio playback device 201 .
  • the above playing instruction message 3 may be used to instruct the audio playing device 201 to play the audio component BE3 with the playing parameter R3.
  • the audio playback device 201 can play the audio component BE3 with the playback parameter R3 according to the playback instruction message 3.
  • the electronic device 100 may send the audio component BE3 , the playback parameter R4 and the playback instruction message 4 to the audio playback device 202 .
  • the above playing instruction message 4 may be used to instruct the audio playing device 202 to play the audio component BE3 with the playing parameter R4.
  • the audio playback device 202 can play the audio component BE3 with the playback parameter R4 according to the playback instruction message 4.
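• the message exchange above can be summarized with a hypothetical message structure; the class and field names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class PlayInstruction:
    device_id: str     # e.g. "200"
    component_id: str  # e.g. "BE1"
    params_id: str     # e.g. "R1"

def dispatch(assignments):
    # assignments: (device, audio component, playback parameters) triples,
    # mirroring the four messages sent by the electronic device 100 above.
    return [PlayInstruction(d, c, r) for d, c, r in assignments]

for msg in dispatch([("200", "BE1", "R1"), ("203", "BE2", "R2"),
                     ("201", "BE3", "R3"), ("202", "BE3", "R4")]):
    print(f"device {msg.device_id}: play {msg.component_id} with {msg.params_id}")
```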
  • the audio playback device 200 can play the audio component BE1 with the playback parameter R1 so that the observer can distinguish the sound of the first sound-emitting object at the position where the first sound-emitting object is shown in FIG. 8A .
  • Playing the audio component BE2 by the audio playback device 203 with the playback parameter R2 may enable the observer to discern the second sound-emitting object making sound at the position where the second sound-emitting object is located as shown in FIG. 8A .
• in this way, the sound orientation of the audio played by each audio playback device matches the position, in the three-dimensional space represented by the video picture, of the sounding object simulated by that audio playback device. This can give the observer a better sense of substitution when watching the video.
  • one audio playback device may correspond to multiple sounding objects. That is, one audio playback device can play audio components of multiple sounding objects.
• the audio playback device closest to the sounding objects E1, E2, and E3 is the audio playback device 200.
  • the electronic device 100 may instruct the audio playing device 200 to play the audio components of the sounding object E1, the sounding object E2, and the sounding object E3.
• the audio playback device 200 can play the audio component of the sounding object E1 with the playback parameters of the audio component of the sounding object E1.
• the audio playback device 200 may play the audio component of the sounding object E2 with the playback parameters of the audio component of the sounding object E2.
• the audio playback device 200 may play the audio component of the sounding object E3 with the playback parameters of the audio component of the sounding object E3.
• the audio playback device 202 is located on one side of the electronic device 100, and the audio playback device 201 and the audio playback device 203 are located on the other side of the electronic device 100.
• the electronic device 100 may instruct the audio playback device 202 to play the second audio component of the first background sound, and instruct the audio playback device 201 and/or the audio playback device 203 to play the second audio component of the first background sound.
  • an audio playback device in addition to playing the audio components of one or more sounding objects, can also play the second audio component of the first background sound.
  • the electronic device 100 may determine the sounding object and playback parameters corresponding to the audio playing device according to the positional relationship among the observer, the audio playing device, and the sounding object in the two-dimensional coordinate system.
• the positions of observers, audio playback devices, and sounding objects may not actually be on the same horizontal plane. Differences in height will also affect the effect of each audio playback device in simulating the sound position of a sounding object.
• the electronic device 100 may determine the positional relationship among the observer, the audio playback device, and the sounding object in the three-dimensional coordinate system, and determine the corresponding sounding object and playback parameters of the audio playback device according to that positional relationship. This can improve the accuracy with which the audio playback device simulates the sounding of the sounding object, so that the sound-position effect of each audio playback device simulating a sounding object is more consistent with the position of the simulated sounding object in the three-dimensional space reflected in the video picture.
  • the electronic device 100 may obtain the positions of the observer and the audio playback device in the three-dimensional space through a three-dimensional ultrasound imaging method.
  • the electronic device 100 can take the position of the observer as the origin, establish the three-dimensional coordinate system Ot-Xt-Yt-Zt shown in FIG. 9A, and determine the position of the electronic device 100 and each audio playback device in the three-dimensional coordinate system Location.
• for the Xt axis and Yt axis of the three-dimensional coordinate system Ot-Xt-Yt-Zt, reference may be made to the introduction of the aforementioned FIG. 6E.
  • the Zt axis may be a straight line perpendicular to the Xt-Ot-Yt plane.
  • the position of the electronic device 100 is (0, y0, z0).
  • the position of the audio playback device 200 is (x1, y1, z1).
  • the position of the audio playback device 201 is (x2, y2, z2).
  • the positions of the above-mentioned electronic device 100 and the audio playback device are only illustrative, and should not be construed as limiting the present application.
  • the electronic device 100 may also obtain the positions of the observer, the display device, and the audio playback device in the three-dimensional space through other methods.
  • the electronic device 100 can perform three-dimensional reconstruction on images in the video to determine the positions of the sounding object and the virtual camera in the video.
  • the electronic device 100 may establish a three-dimensional coordinate system with the position of the virtual camera as the origin according to the positional relationship shown in FIG. 6B .
  • the above-mentioned three-dimensional coordinate system with the position of the virtual camera as the origin can refer to the three-dimensional coordinate system Oc-Xc-Yc-Zc shown in FIG. 9B.
  • the Zc axis may be a straight line perpendicular to the Xc-Oc-Yc plane.
  • the position of the first sounding object is (x5, y5, z5).
  • the position of the second sounding object is (x6, y6, z6).
  • the electronic device 100 may use the position of the observer as the position of the virtual camera to obtain the positional relationship between the sounding object and the observer.
• the electronic device 100 can overlap the three-dimensional coordinate system Oc-Xc-Yc-Zc shown in FIG. 9B with the three-dimensional coordinate system Ot-Xt-Yt-Zt shown in FIG. 9A to obtain the positions of the virtual camera and the sounding objects in the three-dimensional coordinate system Ot-Xt-Yt-Zt shown in FIG. 9C.
• in the three-dimensional coordinate system shown in FIG. 9C, the electronic device 100 can calculate data such as the distance between each sounding object and the observer, the distance between each audio playback device and the observer, and the angle between the straight line where a sounding object and the observer are located and the straight line where an audio playback device and the observer are located. According to the above data calculated in the three-dimensional coordinate system, the electronic device 100 can determine the sounding object and playback parameters corresponding to each audio playback device.
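• for illustration, the angle between the observer-to-object line and the observer-to-device line in the three-dimensional coordinate system can be computed with a dot product; the positions below are illustrative:

```python
import math

def angle_between(observer, a, b):
    # Angle (degrees) between the line observer->a and the line observer->b
    # in the three-dimensional coordinate system Ot-Xt-Yt-Zt.
    va = tuple(ai - oi for ai, oi in zip(a, observer))
    vb = tuple(bi - oi for bi, oi in zip(b, observer))
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return math.degrees(math.acos(dot / (na * nb)))

# Observer at the origin Ot, a sounding object and an audio playback device
# at illustrative 3-D positions:
print(round(angle_between((0, 0, 0), (1, 2, 0.5), (2, 1, 0.0)), 1))  # ~38.7
```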
• for the principle and specific method by which the electronic device 100 determines the sounding object and playback parameters corresponding to an audio playback device, reference may be made to the introduction in the foregoing embodiments of the calculation based on the positional relationship among the observer, the audio playback device, and the sounding object in the two-dimensional coordinate system.
  • the electronic device 100 can determine whether to use the positional relationship in the two-dimensional coordinate system or the positional relationship in the three-dimensional coordinate system to determine the corresponding sounding object and playback parameter. It can be understood that the complexity of calculation using the positional relationship in the three-dimensional coordinate system is higher than the complexity of calculation using the positional relationship in the two-dimensional coordinate system. Then, the electronic device 100 may use the positional relationship in the three-dimensional coordinate system to determine the sounding object and playback parameters corresponding to the audio playback device when its own computing capability is high.
  • the electronic device 100 is configured with a processing module that is solely used to determine the sounding object and playback parameters corresponding to the audio playback device.
  • the computing capability of the processing module is relatively strong.
  • the electronic device 100 may use the positional relationship of the observer, the audio playback device, and the sounding object in the three-dimensional coordinate system to perform calculations.
  • the electronic device 100 uses a general processing module (such as a CPU) to determine the sounding object and playback parameters corresponding to the audio playback device.
• in this case, fewer computing resources are available for determining the sounding object and playback parameters corresponding to the audio playback device, and the electronic device 100 can perform the calculation using the positional relationship among the observer, the audio playback device, and the sounding object in the two-dimensional coordinate system.
  • the electronic device 100 may determine the sounding object and playback parameters corresponding to the audio playback device by using the positional relationship in the two-dimensional coordinate system or the positional relationship in the three-dimensional coordinate system according to the user's selection.
  • the electronic device 100 may display a video playing interface 910 .
  • the video playing interface 910 may display video images.
  • the video playback interface 910 may include a stereo precision option 911 .
  • the stereo precision option 911 can be used for the user to select the precision of the audio playback device to simulate the sound of the sounding object.
  • the electronic device 100 may display an option box 912 shown in FIG. 9E.
  • the options box 912 may contain a low precision option 912A and a high precision option 912B.
  • the electronic device 100 may determine the corresponding sounding object and playback parameters of the audio playing device by using the positional relationship among the observer, the audio playing device, and the sounding object in the two-dimensional coordinate system.
  • the electronic device 100 may determine the corresponding sounding object and playback parameters of the audio playing device by using the positional relationship among the observer, the audio playing device, and the sounding object in the three-dimensional coordinate system.
• the electronic device 100 may prompt the user that selecting the high-precision option 912B requires more computing resources and results in higher device power consumption.
  • FIG. 10 exemplarily shows a schematic structural diagram of a communication system 1000 provided by an embodiment of the present application.
  • a communication system 1000 may include an electronic device 100 and one or more audio playback devices (such as an audio playback device 200, an audio playback device 201, etc.). A communication connection is established between the electronic device 100 and each audio playback device.
  • the electronic device 100 may include a control unit 1001, a video analysis unit 1002, and a display unit 1003.
  • the control unit 1001 can be used to obtain the position of the observer and the position of each audio playback device.
  • the video analysis unit 1002 can be used to analyze the video to determine the sounding object, the audio component of the sounding object, the position of the sounding object and the position of the virtual camera in the video.
  • the control unit 1001 can also be used to determine the correspondence between sounding objects and audio playback devices, and the playback parameters. Reference may be made to the foregoing embodiments for the implementation methods by which the control unit 1001 and the video analysis unit 1002 determine this information; details are not repeated here.
  • the control unit 1001 and the video analysis unit 1002 may be integrated on the same processor in the electronic device 100, such as a CPU.
  • the control unit 1001 and the video analysis unit 1002 may also be integrated into different processors in the electronic device 100.
  • for example, the control unit 1001 is integrated on the CPU, and the video analysis unit 1002 is integrated on the NPU.
  • the above-mentioned display unit 1003 can be used to display video images.
  • the display unit 1003 may include a display screen.
  • in addition to the control unit 1001, the video analysis unit 1002, and the display unit 1003, the electronic device 100 may also include more units, such as a communication unit, an audio output unit, and an audio input unit.
  • the above-mentioned audio output unit may include a speaker.
  • the above-mentioned audio input unit may include a microphone.
  • the electronic device 100 may display images in the video, and instruct one or more audio playback devices to play audio components included in the audio in the video.
  • the above-mentioned one or more audio playback devices can simulate the sounding process of the sounding object in the video, so that the observer can have an immersive feeling when watching the video.
  • the above-mentioned communication system 1000 can increase the user's sense of immersion and engagement when watching videos and improve the user's experience; a sketch of how its units could cooperate follows below.
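To visualize the division of work inside communication system 1000, here is a minimal Python sketch. The class names, method names, and the nearest-object matching rule are hypothetical stand-ins; the actual correspondence and playback parameters would come from the positional reasoning of the foregoing embodiments.

```python
class VideoAnalysisUnit:
    """Stands in for video analysis unit 1002 (e.g. hosted on an NPU)."""
    def analyze(self, video):
        # Hypothetical result: sounding objects (each with an audio
        # component and a position) plus the virtual camera position.
        objects = [{"name": "train", "position": (10.0, 0.0),
                    "audio": b"\x00\x01"}]
        virtual_camera = (0.0, 0.0)
        return objects, virtual_camera

class ControlUnit:
    """Stands in for control unit 1001 (e.g. hosted on the CPU)."""
    def __init__(self, analysis_unit, devices, display_unit):
        self.analysis_unit = analysis_unit
        self.devices = devices            # list of (device, position) pairs
        self.display_unit = display_unit

    def play(self, video, observer_position):
        # observer_position would feed the positional reasoning of the
        # foregoing embodiments; the placeholder rule below ignores it.
        objects, _camera = self.analysis_unit.analyze(video)
        for device, position in self.devices:
            # Placeholder matching rule: each device simulates the
            # sounding object nearest to it.
            obj = min(objects,
                      key=lambda o: self._dist(o["position"], position))
            device.play(obj["audio"], {"volume": 1.0, "delay_ms": 0.0})
        self.display_unit.show(video)

    @staticmethod
    def _dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
```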
  • FIG. 11 exemplarily shows a schematic structural diagram of a communication system 1100 provided by an embodiment of the present application.
  • a communication system 1100 may include a control device 1101, a video analysis device 1102, one or more audio playback devices (such as an audio playback device 1104, an audio playback device 1105, etc.), and a display device 1103.
  • the control device 1101 establishes a communication connection with the video analysis device 1102.
  • the control device 1101 also establishes communication connections with each audio playback device and with the display device 1103.
  • the above-mentioned control device 1101 can be used to acquire the position of the observer and the position of each audio playback device.
  • the control device 1101 may send a video and an instruction to analyze the video to the video analysis device 1102 through its communication connection with the video analysis device 1102.
  • the above-mentioned video analysis device 1102 can be used to analyze the video to determine the sounding object, the audio component of the sounding object, the position of the sounding object and the position of the virtual camera in the video.
  • after receiving the video and the instruction, the video analysis device 1102 can analyze the video to determine the sounding object in the video, the audio component of the sounding object, the position of the sounding object, and the position of the virtual camera.
  • the video analysis device 1102 may send the sounding object in the video, the audio component of the sounding object, the position of the sounding object, and the position of the virtual camera to the control device 1101.
  • the above-mentioned control device 1101 can also be used to determine the correspondence between sounding objects and audio playback devices, and the playback parameters, by using the sounding object in the video, the audio component of the sounding object, the position of the sounding object, and the position of the virtual camera.
  • the control device 1101 may send the audio component and the playback parameter to the audio playback device, and instruct the audio playback device to play the audio component according to the playback parameter.
  • the control device 1101 may also send images in the video to the display device 1103, instructing the display device 1103 to display the images in the video.
  • the display device 1103 may display images in the video.
  • One or more audio playback devices can play the audio component contained in the audio in the video.
  • the above-mentioned one or more audio playback devices can simulate the sounding process of the sounding object in the video, so that the observer can have an immersive feeling when watching the video.
  • the above-mentioned communication system 1100 can increase the user's sense of immersion and engagement when watching videos and improve the user's experience; one plausible shape for the messages exchanged in this system is sketched below.
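The division of labor in communication system 1100 is essentially a message exchange between the control device, the video analysis device, the audio playback devices, and the display device. The following sketch shows one plausible shape for those messages; all field names and values are assumptions for illustration, not definitions taken from this application.

```python
# Control device 1101 -> video analysis device 1102: request analysis.
analyze_request = {"type": "analyze_video", "video_id": "clip-001"}

# Video analysis device 1102 -> control device 1101: analysis result.
analysis_result = {
    "type": "analysis_result",
    "video_id": "clip-001",
    "virtual_camera": (0.0, 0.0, 1.5),
    "sounding_objects": [
        {"name": "thunder", "position": (5.0, 3.0, 8.0),
         "audio_component": "thunder.pcm"},
    ],
}

# Control device 1101 -> one audio playback device: play a component
# according to the playback parameters the control device determined.
play_command = {
    "type": "play_audio_component",
    "audio_component": "thunder.pcm",
    "playback_params": {"volume": 0.7, "delay_ms": 12.0},
}

# Control device 1101 -> display device 1103: display the video images.
display_command = {"type": "display_video", "video_id": "clip-001"}
```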
  • FIG. 12 exemplarily shows a schematic structural diagram of a communication system 1200 provided by an embodiment of the present application.
  • the communication system 1200 may include one or more audio playback devices (e.g., audio playback device 1210, audio playback device 1220, audio playback device 1230, etc.), and a display device 1240.
  • the audio playback device 1210 may include a control unit 1211, a video analysis unit 1212, and an audio output unit 1213.
  • for the control unit 1211, reference may be made to the control unit 1001 in the communication system 1000 shown in FIG. 10.
  • for the video analysis unit 1212, reference may be made to the aforementioned video analysis unit 1002 in the communication system 1000 shown in FIG. 10; details are not repeated here.
  • the audio output unit 1213 may include a speaker and may be used to convert audio into a sound signal.
  • the audio playback device 1210 may establish communication connections with other audio playback devices.
  • the audio playback device 1210 can establish a communication connection with the display device 1240.
  • in addition to the control unit 1211, the video analysis unit 1212, and the audio output unit 1213, the audio playback device 1210 may also include more units, such as a communication unit, an audio input unit, etc.
  • the audio playback device 1210 can obtain the position of the observer and the position of each audio playback device, and analyze the video to determine the sounding object in the video, the audio component of the sounding object, the position of the sounding object, and the position of the virtual camera.
  • the audio playback device 1210 may also determine the audio component to be played by each audio playback device, and the corresponding playback parameters.
  • the audio playback device 1210 may play the audio component assigned to itself, and instruct the other audio playback devices to play their corresponding audio components.
  • the audio playback device 1210 may instruct the display device 1240 to display the images in the video.
  • the above-mentioned one or more audio playback devices can simulate the sounding process of the sounding object in the video, so that the observer can have an immersive feeling when watching the video.
  • the above-mentioned communication system 1200 can increase the user's sense of immersion and engagement when watching videos and improve the user's experience; a sketch of the coordinating playback device follows below.
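Communication system 1200 differs from the previous two in that one audio playback device doubles as the coordinator. The following minimal Python sketch is written under that assumption; the class name, method names, and the round-robin assignment are hypothetical stand-ins for the positional reasoning described in the foregoing embodiments.

```python
class CoordinatingPlaybackDevice:
    """Sketch of audio playback device 1210: it hosts the control and
    video analysis functions itself, plays its own audio component, and
    instructs the peer playback devices and the display device."""

    def __init__(self, peers, display_device):
        self.peers = peers                  # the other audio playback devices
        self.display_device = display_device

    def analyze(self, video):
        # Placeholder for the on-device video analysis unit 1212.
        return [{"audio": b"\x00", "position": (0.0, 0.0)}]

    def start(self, video):
        objects = self.analyze(video)
        # Placeholder assignment: spread the sounding objects over this
        # device and its peers in round-robin order.
        devices = [self] + list(self.peers)
        for i, device in enumerate(devices):
            obj = objects[i % len(objects)]
            if device is self:
                self.play_local(obj["audio"])
            else:
                device.play(obj["audio"])
        self.display_device.show(video)

    def play_local(self, audio):
        # The audio output unit 1213 converts the component to sound.
        print(f"playing {len(audio)} byte(s) on the local speaker")
```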

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention relates to a method for cooperatively playing audio during video playback, and a communication system. The method can be applied to a communication system comprising multiple audio playback devices. According to the method, the image and audio of a video are analyzed to determine a sounding object in the video, as well as the audio component and the position of the sounding object. During video playback, a display device can display the images in the video, and the multiple audio playback devices can play the audio components of the sounding objects in the video, so that when a user distinguishes the sounds of different sounding objects during video playback, the position of the sound source from which each audio playback device produces sound can be determined as the position of the sounding object simulated by that audio playback device. The method can improve the user's sense of immersion and engagement when watching a video, and improve the user's experience.
PCT/CN2022/138988 2021-12-17 2022-12-14 Method for cooperatively playing audio during video playback, and communication system WO2023109862A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111552635.9 2021-12-17
CN202111552635.9A CN116266874A (zh) Method for cooperatively playing audio during video playback, and communication system

Publications (1)

Publication Number Publication Date
WO2023109862A1 true WO2023109862A1 (fr) 2023-06-22

Family

ID=86743737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138988 WO2023109862A1 (fr) 2021-12-17 2022-12-14 Method for cooperatively playing audio during video playback, and communication system

Country Status (2)

Country Link
CN (1) CN116266874A (fr)
WO (1) WO2023109862A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768064A (zh) * 2014-01-02 2015-07-08 冠捷投资有限公司 Method for dynamically optimizing image or sound based on user position
CN106027933A (zh) * 2016-06-21 2016-10-12 维沃移动通信有限公司 Video recording and playback method, and mobile terminal
CN109194999A (zh) * 2018-09-07 2019-01-11 深圳创维-Rgb电子有限公司 Method, apparatus, device, and medium for achieving co-location of sound and image
CN111258530A (zh) * 2020-01-09 2020-06-09 珠海格力电器股份有限公司 Audio playback control method, server, and audio playback system
CN111641865A (zh) * 2020-05-25 2020-09-08 惠州视维新技术有限公司 Playback control method for audio and video streams, television device, and readable storage medium
WO2021159864A1 (fr) * 2020-02-11 2021-08-19 华为技术有限公司 Method for transmitting video and audio data, cloud server, and system

Also Published As

Publication number Publication date
CN116266874A (zh) 2023-06-20

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22906607

Country of ref document: EP

Kind code of ref document: A1