WO2023109278A1 - Accompaniment generation method, device, and storage medium - Google Patents

Accompaniment generation method, device, and storage medium

Info

Publication number
WO2023109278A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
dry
virtual
accompaniment
chorus
Prior art date
Application number
PCT/CN2022/124590
Other languages
English (en)
French (fr)
Inventor
张超鹏
翁志强
寇志娟
Original Assignee
腾讯音乐娱乐科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Publication of WO2023109278A1 publication Critical patent/WO2023109278A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/341 Rhythm pattern selection, synthesis or composition

Definitions

  • the present application relates to the field of computer application technology, and in particular to an accompaniment generation method, device and storage medium.
  • Virtual 3D audio technology can create three-dimensional dynamic effects, and applying it to singing software can bring users an immersive experience.
  • the existing technical solution is to directly weight and superimpose multiple human voices.
  • this processing method produces a sound effect that is not three-dimensional, resulting in a poor user experience.
  • Embodiments of the present application provide an accompaniment generation method, device, and storage medium, which can realize an all-round audio stereo surround effect and improve the user experience.
  • the embodiment of the present application provides a method for generating an accompaniment, including:
  • Obtain a dry sound signal set which includes x dry sound signals corresponding to the target song, and x is an integer greater than 1;
  • the virtual sound signal set includes: a virtual sound signal at each of the N virtual three-dimensional space sound image positions;
  • the dry voice of the chorus and the background music of the target song are synthesized to obtain the accompaniment of the target song.
  • an accompaniment playback processing method including:
  • display a user interface, where the user interface is used to receive a selection instruction for the target song;
  • if the selection instruction received in the user interface indicates that the accompaniment mode of the target song is a chorus accompaniment mode, the accompaniment corresponding to the target song is obtained;
  • the accompaniment is generated based on the chorus dry sound and the background music
  • the chorus dry sound is generated based on multiple dry sound signals in the dry sound signal set
  • the multiple dry sound signals in the dry sound signal set correspond to multiple different virtual three-dimensional space sound image positions, and the dry sound signal set is obtained according to the dry sound signals recorded by multiple users for the target song.
  • an accompaniment generating device including:
  • the acquisition unit is used to acquire a dry sound signal set, which includes x dry sound signals corresponding to the target song, where x is an integer greater than 1; and to generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions, wherein the x dry sound signals correspond to N virtual three-dimensional space sound image positions, N is an integer greater than 1, the N virtual three-dimensional space sound image positions are not the same, and each virtual three-dimensional space sound image position is allowed to correspond to one or more of the x dry sound signals.
  • the processing unit is used to combine and process each virtual sound signal in the virtual sound signal set to obtain the chorus dry sound.
  • the virtual sound signal set includes: the virtual sound signal at each of the N virtual three-dimensional space sound image positions; according to the sound effect optimization rules, the chorus dry sound and the background music of the target song are subjected to sound effect synthesis processing to obtain the accompaniment of the target song.
  • an accompaniment playback processing device including:
  • the acquisition unit is used to display a user interface, where the user interface is used to receive a selection instruction for the target song; if the selection instruction received on the user interface indicates that the accompaniment mode of the target song is a chorus accompaniment mode, the accompaniment corresponding to the target song is acquired.
  • the processing unit is used to play the accompaniment corresponding to the target song; the accompaniment is generated according to the chorus dry sound and the background music, the chorus dry sound is generated according to multiple dry sound signals in the dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional space sound image positions, and the dry sound signal set is obtained according to dry sound signals recorded by multiple users for the target song.
  • an embodiment of the present application provides an electronic device, including: a memory, a processor, and a network interface, the processor being connected to the memory and the network interface, wherein the network interface is used to provide a network communication function, the memory is used to store program codes, and the processor is used to call the program codes to execute the methods in the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the method in the embodiments of the present application is implemented.
  • an embodiment of the present application provides a computer program product or computer program
  • the computer program product or computer program includes computer instructions
  • the computer instructions are stored in a computer-readable storage medium
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method in the embodiments of the present application.
  • the virtual sound signals of each dry sound signal corresponding to the target song in the dry sound signal set can be obtained at different virtual three-dimensional space sound image positions; the virtual sound signals corresponding to each dry sound signal are then merged to obtain the chorus dry sound; and finally the chorus dry sound and the background music of the target song are synthesized according to the sound effect optimization rules to obtain the accompaniment of the target song; on the other hand, the user's selection instruction for the target song can be received.
  • if the received selection instruction indicates that the accompaniment mode of the target song is the chorus accompaniment mode, the accompaniment corresponding to the target song may be acquired and played.
  • the sound image position of the dry sound signal in the virtual three-dimensional space can be simulated in an all-round way, and an audio stereo surround effect can be realized, so that the user has an immersive sense of hearing when obtaining the corresponding accompaniment.
  • FIG. 1 is a schematic diagram of an application scenario of an accompaniment generation method provided by an embodiment of the present application
  • Fig. 2 is a schematic flow chart of an accompaniment generation method provided by the embodiment of the present application.
  • Fig. 3a is a schematic diagram of a horizontal plane, an upper plane and a lower plane in an accompaniment generation method provided by an embodiment of the present application;
  • Fig. 3b is a schematic diagram of a virtual three-dimensional space sound image position in an accompaniment generation method provided by an embodiment of the present application;
  • Fig. 3c is a schematic diagram of dividing each plane at preset angle intervals in an accompaniment generation method provided by the embodiment of the present application;
  • Fig. 4 is a schematic flowchart of another accompaniment generation method provided by the embodiment of the present application.
  • FIG. 5 is a schematic flow diagram of obtaining a two-channel signal corresponding to a dry sound signal in a set of dry sound signals in a method for generating an accompaniment provided by an embodiment of the present application;
  • Fig. 6 is a schematic flowchart of an accompaniment playback processing method provided by an embodiment of the present application.
  • Fig. 7a is a schematic flow chart of acquiring an accompaniment corresponding to a target song in an accompaniment playback processing method provided in an embodiment of the present application;
  • Fig. 7b is a schematic diagram of a first single-sentence interface displayed in an accompaniment playback processing method provided by an embodiment of the present application;
  • Fig. 7c is a schematic diagram of a second single-sentence interface displayed in an accompaniment playback processing method provided by an embodiment of the present application.
  • Fig. 8a is a schematic structural diagram of an accompaniment generating device provided by an embodiment of the present application.
  • Fig. 8b is a schematic structural diagram of an accompaniment playback processing device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Dry sound signal refers to a pure human voice signal without accompaniment music; the dry sound signal is a monophonic sound signal, i.e., it does not include any direction information.
  • Binaural signal Binaural means that there are two sound channels. The principle is that when people hear the sound, they can judge the specific position of the sound source according to the phase difference between the left ear and the right ear.
  • the binaural signal in the embodiment of the present application refers to a left channel sound signal and a right channel sound signal.
  • HRTF: Head-Related Transfer Function.
  • HRTF can also be called the binaural transfer function, which describes the transmission process of sound waves from the sound source to both ears.
  • HRTF is a set of filters. It uses the principle that convolution in the time domain is equivalent to multiplication in the frequency domain. The virtual sound signal transmitted to both ears can be calculated based on the HRTF data corresponding to the sound source position information.
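  • By way of illustration, a minimal numpy sketch of the property this relies on (convolution in the time domain equals multiplication in the frequency domain), using arbitrary placeholder data rather than real HRTF measurements:

      import numpy as np

      x = np.random.randn(1024)   # placeholder mono dry sound frame
      h = np.random.randn(128)    # placeholder HRTF impulse response (one ear)

      y_time = np.convolve(x, h)  # direct time-domain convolution

      n = len(x) + len(h) - 1     # length of the full linear convolution
      y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

      assert np.allclose(y_time, y_freq)  # equal up to numerical precision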
  • the embodiment of the present application provides an accompaniment generation method, device, and storage medium.
  • a dry sound signal set composed of multiple dry sound signals corresponding to the same target song can be obtained, along with the virtual sound signals corresponding to each dry sound signal in the set at different virtual three-dimensional space sound image positions; the virtual sound signals corresponding to each dry sound signal are then combined to obtain the chorus dry sound, and finally the chorus dry sound and the background music of the target song are subjected to sound effect synthesis processing according to the sound effect optimization rules to obtain the accompaniment of the target song.
  • the sound image position of each dry sound signal in the dry sound signal set can be simulated in the virtual three-dimensional space in an all-round way, and the chorus dry sound obtained by merging the virtual sound signals corresponding to each dry sound signal at different virtual three-dimensional space sound image positions realizes an audio stereo surround effect.
  • the chorus dry sound and background music can be processed according to the sound effect optimization rules to obtain the accompaniment, so as to enhance the audio effect
  • this application can obtain richer audio processing effects and improve user experience.
  • FIG. 1 is a schematic diagram of an application scenario of a method for generating an accompaniment provided by an embodiment of the present application.
  • the application scenario may include a smart device 100 , the smart device communicates with a server 110 in a wired or wireless manner, and the server 110 is connected to a database 120 .
  • the method for generating an accompaniment may be implemented by an electronic device such as the smart device 100 .
  • the smart device 100 may acquire a set of dry sound signals corresponding to the target song, and generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions.
  • the virtual sound signal can be a two-channel signal; the virtual sound signals corresponding to each dry sound signal are then combined to obtain a chorus dry sound, and sound effect synthesis processing is performed on the chorus dry sound and the background music of the target song according to the sound effect optimization rule to obtain the accompaniment.
  • the smart device 100 in FIG. 1 shows the option of "chorus accompaniment".
  • the user can generate a selection instruction for the chorus accompaniment mode through voice control, or by triggering the selection control displayed on the user interface.
  • the dry sound signal set may be pre-stored locally by the smart device 100 , or may be acquired by the smart device 100 from the server 110 or the database 120 .
  • the method for generating an accompaniment may also be implemented by an electronic device such as the server 110 .
  • the server 110 may obtain a set of dry sound signals corresponding to the target song, and generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions.
  • the virtual sound signal can be a binaural signal, for example; the virtual sound signals corresponding to each dry sound signal are then combined to obtain a chorus dry sound, and sound effect synthesis processing is performed on the chorus dry sound and the background music of the target song according to the sound effect optimization rule to obtain the accompaniment.
  • the dry sound signal set can be pre-stored locally by the server 110, or acquired by the server 110 from the database 120, and the finally obtained accompaniment can be stored locally or stored in the database 120 for calling when needed.
  • the server 110 does not have to generate the accompaniment at the moment the received selection instruction indicates that the accompaniment mode of the target song is a chorus accompaniment mode; it may execute the relevant steps of the accompaniment generation method of the present application to generate the accompaniment at an appropriate time, for example, when the load of the server 110 is low.
  • the accompaniment of the chorus version can be generated in advance, and then stored in the server.
  • the user can use the smart device 100 to select "chorus accompaniment" on the user interface to issue a selection command for the target song, so that the server 110 can respond to the selection command, find the chorus accompaniment of the target song from a large number of generated accompaniments, and send the chorus accompaniment to the smart device 100.
  • the method for generating an accompaniment provided in the embodiment of the present application may also be implemented cooperatively by an electronic device such as the smart device 100 and an electronic device such as the server 110 .
  • the server 110 may generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions (the virtual sound signal may be a binaural signal, for example), then merge the virtual sound signals corresponding to each dry sound signal to obtain the chorus dry sound, then perform sound effect synthesis processing on the chorus dry sound and the background music of the target song according to the sound effect optimization rules to obtain an accompaniment, and send the obtained accompaniment to the smart device 100.
  • the method for generating an accompaniment provided by the embodiment of the present application can also be implemented by an electronic device such as the smart device 100 and an electronic device such as the server 110 by running a computer program.
  • a computer program may be a native program or a software module in an operating system, a local application program (APP, Application), or a small program.
  • the computer program may be any form of application program, module, or plug-in; this embodiment of the present application does not specifically limit it.
  • the smart devices involved in the embodiments of the present application may be personal computers, notebook computers, smart phones, tablet computers, smart watches, smart voice interaction devices, smart home appliances, vehicle-mounted terminals, smart wearable devices, and the like, but are not limited thereto.
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
  • the smart device and the server may be connected directly or indirectly through wired or wireless communication, which is not specifically limited in this embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for generating an accompaniment provided by an embodiment of the present application.
  • the method in the embodiments of the present application can be applied to an electronic device; the electronic device can be, for example, a smart device such as a smart phone, tablet computer, smart wearable device, or personal computer, or can be a server, etc.
  • the method may include, but is not limited to, the following steps:
  • S201 Acquire a dry sound signal set.
  • the electronic device may acquire a dry sound signal set, and the dry sound signal set includes several dry sound signals corresponding to the target song.
  • the set of dry sound signals may be obtained from an audio database, which includes initial dry sound signals recorded by multiple users when singing the same song. It should be noted that the initial dry sound signals in the audio database are entered after authorization and consent of the users.
  • the electronic device can filter dry sound signals satisfying conditions according to sound parameters of the initial dry sound signal to form a set of dry sound signals.
  • the electronic device may filter dry sound signals satisfying conditions from the initial set of dry sound signals according to the pitch feature parameter and the sound quality feature parameter.
  • the pitch feature parameters can include any one or more of pitch parameters, rhythm parameters and prosody parameters, and the dry sound signal that satisfies the conditions screened out according to the pitch feature parameters has the characteristics of high consistency between song pitch and rhythm and accompaniment melody
  • the sound quality characteristic parameters may include any one or more of noise parameters, energy parameters and speed parameters, and the dry sound signal that satisfies the conditions screened out according to the sound quality characteristic parameters has the characteristics of clear audio, appropriate audio energy, and uniform audio speed.
  • the embodiment of the present application does not limit the screening sequence of dry sound signals that meet the conditions.
  • the electronic device can first filter dry sound signals that meet the pitch feature parameter conditions and then, from those, filter the dry sound signals that meet the preset sound quality feature parameter conditions; or it can first filter dry sound signals that meet the sound quality feature parameter conditions and then, from those, filter the dry sound signals that meet the preset pitch feature parameter conditions.
  • the dry sound signal set formed by the dry sound signals screened from the initial dry sound signal set in this way has excellent pitch and sound quality.
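  • By way of illustration, a minimal Python sketch of this two-stage screening; the parameter names and thresholds below are assumptions, since the application does not fix concrete values:

      from dataclasses import dataclass

      @dataclass
      class DrySignal:
          pitch_score: float   # consistency of pitch/rhythm with the accompaniment melody
          noise_level: float   # estimated noise in the recording
          energy: float        # overall audio energy

      def screen(initial_set, pitch_min=0.8, noise_max=0.1, energy_min=0.2):
          # Stage 1: keep signals whose pitch features meet the preset condition.
          by_pitch = [s for s in initial_set if s.pitch_score >= pitch_min]
          # Stage 2: among those, keep signals whose sound quality meets the
          # condition; the stages may equally be applied in the opposite order.
          return [s for s in by_pitch
                  if s.noise_level <= noise_max and s.energy >= energy_min]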
  • S202 Generate a virtual sound signal based on a dry sound signal corresponding to each virtual three-dimensional sound image position in the N virtual three-dimensional space sound image positions.
  • the electronic device can simulate the different sound image positions of each dry sound signal in the dry sound signal set in the virtual three-dimensional space, and then generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions; the virtual sound signal may be a binaural signal, for example.
  • the N virtual three-dimensional space sound image positions are different, and each virtual three-dimensional space sound image position may correspond to one or more dry sound signals.
  • N virtual three-dimensional sound image positions can be simulated in the virtual three-dimensional space in the following manner: as shown in Figure 3a, the positive directions of the x, y, and z axes in the virtual three-dimensional space correspond respectively to the front, the left side, and the top-of-head direction of the human head, and the virtual three-dimensional space is divided into three planes: the horizontal plane 301, the upper plane 302 whose angle with the horizontal plane is the first angle threshold, and the lower plane 303 whose angle with the horizontal plane is the second angle threshold.
  • each virtual three-dimensional space sound image position in the virtual three-dimensional space includes an azimuth angle and an elevation angle; assuming that θ represents the azimuth angle of the virtual three-dimensional space sound image position and φ represents its elevation angle, each virtual three-dimensional space sound image position can be expressed as (θ, φ).
  • the horizontal plane 301 is the plane corresponding to an elevation angle of 0°
  • the upper plane is the plane corresponding to an elevation angle equal to the first angle threshold
  • the first angle threshold can be any angle value above the horizontal plane
  • the lower plane is the plane corresponding to an elevation angle equal to the second angle threshold
  • the second angle threshold may be any angle value below the horizontal plane.
  • the upper plane may be a plane corresponding to an elevation angle of 40°
  • the lower plane may be a plane corresponding to an elevation angle of -40°
  • the azimuth angle θ can be used to describe the angle, measured clockwise, between the virtual three-dimensional space sound image position on the plane and the target direction line.
  • FIG. 3 c after dividing the planes corresponding to different elevation angles at intervals of corresponding preset angles, multiple virtual three-dimensional spatial sound image positions can be obtained.
  • after dividing the horizontal plane at intervals of the first preset angle, n1 virtual three-dimensional space sound image positions on the horizontal plane can be obtained; after dividing the upper plane at intervals of the second preset angle, n2 virtual three-dimensional space sound image positions on the upper plane can be obtained; and after dividing the lower plane at intervals of the third preset angle, n3 virtual three-dimensional space sound image positions on the lower plane can be obtained.
  • for example, if the first preset angle is 10° and the second preset angle and the third preset angle are both 15°, then 36 virtual three-dimensional space sound image positions can be obtained after dividing the horizontal plane at intervals of 10°, 24 positions can be obtained after dividing the upper plane at intervals of 15°, and 24 positions can be obtained after dividing the lower plane at intervals of 15°, so a total of 36 + 24 + 24 = 84 different virtual three-dimensional space sound image positions will be obtained.
  • the first preset angle, the second preset angle, and the third preset angle in the embodiments of the present application can be any preset angle values; the specific values of the above three preset angles are only examples and do not constitute a limitation on the embodiments of the application.
  • multiple virtual three-dimensional space sound image positions can be virtualized in three different planes in the virtual three-dimensional space at different intervals of azimuth angles, realizing the omnidirectional immersion simulation of sound sources.
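  • By way of illustration, a short Python sketch enumerating the example grid above (horizontal plane every 10°, upper plane at +40° and lower plane at -40° every 15°), which yields the 84 positions mentioned:

      def sound_image_positions():
          positions = []
          for elevation, step in ((0, 10), (40, 15), (-40, 15)):
              for azimuth in range(0, 360, step):
                  positions.append((azimuth, elevation))  # (theta, phi) in degrees
          return positions

      assert len(sound_image_positions()) == 84  # 36 + 24 + 24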
  • each virtual three-dimensional spatial sound image position may correspond to one dry sound signal, or may correspond to multiple dry sound signals.
  • the electronic device can obtain the virtual sound signals corresponding to one or more dry sound signals at each virtual three-dimensional space sound image position.
  • the electronic device can obtain the virtual sound signal corresponding to each dry sound signal in the virtual three-dimensional space by the following scheme: obtain the azimuth and elevation angles of the virtual three-dimensional space sound image position corresponding to the dry sound signal, determine the head-related transfer function (HRTF) corresponding to that sound image position according to its azimuth and elevation angles, and calculate, from the HRTF data, the virtual sound signal corresponding to the dry sound signal at that virtual three-dimensional space sound image position.
  • for example, suppose the azimuth and elevation angles of the virtual three-dimensional space sound image position corresponding to the dry sound signal X are (θ, φ), and the HRTF data corresponding to that sound image position are H_L(θ, φ) for the left ear and H_R(θ, φ) for the right ear.
  • the virtual sound signals corresponding to the dry sound signal X at that virtual three-dimensional space sound image position are calculated as the binaural signals Y_L and Y_R, where Y_L is the left channel signal obtained by convolving X with H_L(θ, φ), and Y_R is the right channel signal obtained by convolving X with H_R(θ, φ).
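  • By way of illustration, a minimal rendering sketch in Python: the dry signal X is convolved with the HRTF pair for position (θ, φ) to give the binaural pair (Y_L, Y_R). The hrtf_db lookup table here is a placeholder for a real HRTF database:

      import numpy as np
      from scipy.signal import fftconvolve

      def render_binaural(x, theta, phi, hrtf_db):
          # hrtf_db maps (theta, phi) to the left/right ear impulse responses.
          h_left, h_right = hrtf_db[(theta, phi)]
          y_left = fftconvolve(x, h_left)    # Y_L = X * H_L(theta, phi)
          y_right = fftconvolve(x, h_right)  # Y_R = X * H_R(theta, phi)
          return y_left, y_right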
  • the electronic device can also obtain part of the dry sound signals from the dry sound signal set, for example randomly, or by filtering out dry sound signals with better pitch and sound quality according to a new screening rule; a delay processing operation is then performed on each of the filtered dry sound signals to obtain a delayed two-channel signal corresponding to each of them.
  • for example, 8 pairs of different time parameters can be selected; the 8 pairs comprise 8 time parameters used to obtain the delayed left channel signal and 8 time parameters used to obtain the delayed right channel signal, so a total of 16 time parameters are selected.
  • for example, 16 different parameters ranging from 21 ms to 79 ms can be selected as the time parameters; another number of different parameters may also be used as time parameters.
  • the dry sound signal located at the left ear or the right ear of the head can thereby be simulated, making the audio effect richer.
  • the selection method and the setting of the time parameter (delay duration parameter) during the delay processing operation can be adjusted through an interface, so as to facilitate flexible configuration by users who make chorus audio. It should be noted that the step of obtaining the virtual sound signal and the steps of obtaining the delayed left channel signal and the delayed right channel signal may be performed simultaneously or successively, which is not limited in this application.
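  • By way of illustration, a sketch of the delay step with 16 taps spread over the 21-79 ms range mentioned above; per-tap attenuation, which a real implementation would also apply, is omitted here for brevity:

      import numpy as np

      def delayed_binaural(w, sample_rate=44100):
          taps_ms = np.linspace(21, 79, 16)                 # 16 distinct time parameters
          left_ms, right_ms = taps_ms[0::2], taps_ms[1::2]  # 8 taps per channel

          def delay_and_sum(signal, delays_ms):
              out = np.zeros(len(signal) + int(sample_rate * delays_ms.max() / 1000))
              for d_ms in delays_ms:
                  k = int(sample_rate * d_ms / 1000)
                  out[k:k + len(signal)] += signal          # superimpose delayed copies
              return out

          return delay_and_sum(w, left_ms), delay_and_sum(w, right_ms)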
  • S203 Perform merging processing on each virtual sound signal in the virtual sound signal set to obtain a chorus dry sound.
  • the electronic device may combine the virtual sound signals in the virtual sound signal set to obtain the dry chorus sound.
  • the merging of the virtual sound signals corresponding to each dry sound signal may be implemented through normalization processing, so as to adjust the loudness of the merged virtual sound signals to [-1dB, 1dB].
  • the virtual sound signals targeted by the merging process include: the virtual sound signal corresponding to the dry sound signal at each of the obtained N virtual three-dimensional space sound image positions, and the delayed two-channel signals obtained by performing the delay processing operation on part of the dry sound signals in the dry sound signal set.
  • S204 Perform sound effect synthesis processing on the chorus dry sound and the background music of the target song according to sound effect optimization rules to obtain an accompaniment.
  • the electronic device can obtain the final accompaniment after performing sound effect synthesis processing on the dry voice of the chorus and the background music of the target song according to sound effect optimization rules.
  • the sound effect optimization rule may be, for example, adjusting the sound parameters of the background music of the target song and the virtual sound signals corresponding to the obtained multiple dry sound signals.
  • the sound parameters may be, for example, common adjustable parameters such as loudness and timbre.
  • after the electronic device obtains the chorus dry sound, it can obtain the background music of the target song; if the energy relationship between the obtained chorus dry sound and the background music of the target song does not satisfy the energy ratio condition, the electronic device can adjust the energy relationship between the chorus dry sound and the background music of the target song.
  • the energy ratio condition can be set such that the ratio between the energy value of the chorus dry sound and the energy value of the background music of the target song is less than a ratio threshold, or such that the loudness of the chorus dry sound is 3 dB lower than the loudness of the background music of the target song. In this way, the energy of the chorus dry sound can be prevented from exceeding that of the background music of the target song, making the final accompaniment more harmonious.
  • in this way, the virtual sound signals corresponding to each dry sound signal at different virtual three-dimensional space sound image positions can be obtained, the virtual sound signals can be merged to obtain the chorus dry sound, and sound effect synthesis processing can then be performed on the chorus dry sound and the background music of the target song according to the sound effect optimization rules to obtain the accompaniment of the target song, thereby realizing a stereo surround effect in the audio listening experience and enhancing the immersion of the audio effect, so that the user experience is improved.
  • FIG. 4 is a schematic flow chart of another accompaniment generation method provided by the embodiment of the present application.
  • the method in the embodiments of the present application can be applied to electronic devices such as mobile phones, tablets, smart wearable devices, personal computers, servers, and more.
  • the method may include, but is not limited to, the following steps:
  • S401 Acquire an initial dry sound signal set from an audio database.
  • the electronic device may acquire the initial dry sound signal set from the audio database. It should be noted that the initial dry sound signal set in the audio database is entered after authorization and consent of the user.
  • the audio database may be an independently configured database, or may be integrated with the electronic device, that is, the audio database may be regarded as being stored inside the electronic device.
  • the initial dry sound signal set refers to the set of original dry sound signals entered after the authorization and consent of the user when singing the same song in the audio database.
  • S402 Screen dry sound signals from the initial dry sound signal set according to the sound parameters of each initial dry sound signal, and the screened dry sound signals form a dry sound signal set.
  • the electronic device may filter dry sound signals satisfying conditions from the initial dry sound signal set according to the sound parameters of each initial dry sound signal, so as to form the dry sound signal set after narrowing down the initial dry sound signal set.
  • the sound parameters of the initial dry sound signal may include pitch feature parameters and tone quality feature parameters of the initial dry sound signal
  • the pitch feature parameters may include any one or more of pitch parameters, rhythm parameters, and prosody parameters
  • the sound quality characteristic parameters may include any one or more of noise parameters, energy parameters and speed parameters.
  • S403 Obtain a head-related transfer function corresponding to each virtual three-dimensional sound image position of the N virtual three-dimensional space sound image positions.
  • the electronic device can acquire the N virtual three-dimensional space sound image positions in the virtual three-dimensional space, and then obtain the head-related transfer function corresponding to each virtual three-dimensional space sound image position.
  • the head-related transfer functions corresponding to each virtual three-dimensional space sound image position in the virtual three-dimensional space can be stored in a head-related transfer function database in advance, so that the electronic device can call the corresponding head-related transfer function from the head-related transfer function database.
  • S404 Using the head-related transfer function corresponding to the target virtual three-dimensional space sound image position, process the dry sound signal corresponding to the target virtual three-dimensional space sound image position to obtain a virtual sound signal at the target virtual three-dimensional space sound image position.
  • the electronic device may process the target dry sound signal according to the head-related transfer function corresponding to the target virtual three-dimensional space sound image position, so as to obtain the virtual sound signal corresponding to the target dry sound signal at the target virtual three-dimensional space sound image position; the target virtual three-dimensional space sound image position can be any one of the N virtual three-dimensional space sound image positions, and the target dry sound signal can be any dry sound signal in the dry sound signal set.
  • the head-related transfer function corresponding to the target virtual three-dimensional space sound image position is HRTF data corresponding to the virtual three-dimensional space sound image position.
  • the HRTF data corresponding to the target virtual three-dimensional space sound image position can be determined from the known HRTF data, and then the electronic device can convolve the target dry sound signal with the HRTF data corresponding to the target virtual three-dimensional space sound image position to obtain the virtual sound signal corresponding to the target dry sound signal at that position.
  • S405 Acquire p dry sound signals from x dry sound signals included in the dry sound signal set.
  • the electronic device may randomly acquire p dry sound signals from the x dry sound signals included in the dry sound signal set. It should be noted that S404 and S405 may be executed simultaneously or successively, which is not limited in this application.
  • S406 Perform a delay processing operation on each of the p dry sound signals to obtain a delayed left channel signal and a delayed right channel signal corresponding to each of the p dry sound signals.
  • the electronic device may perform a delay processing operation with m1 time parameters on each of the p dry sound signals to obtain m1 delayed dry sound signals corresponding to each of them; by superimposing the m1 delayed dry sound signals corresponding to each dry sound signal, the delayed left channel signal corresponding to each of the p dry sound signals is obtained, where m1 is a positive integer. Then, a delay processing operation with m2 time parameters is performed on each of the p dry sound signals to obtain m2 delayed dry sound signals corresponding to each of them; by superimposing the m2 delayed dry sound signals corresponding to each dry sound signal, the delayed right channel signal corresponding to each of the p dry sound signals is obtained, where m2 is a positive integer.
  • for example, the electronic device can process a dry sound signal through 16 delayers with different time parameters to obtain 16 dry sound signals with different delays and attenuation degrees; the 16 signals are then divided evenly into two groups, and the signals within each group are superimposed, finally obtaining the delayed left channel signal and the delayed right channel signal corresponding to the dry sound signal.
  • in addition, the sound field of the dry sound signal can be widened by adding a bass enhancement and reverberation simulation module to reduce the correlation between the delayed left channel signal and the delayed right channel signal in the two-channel signal obtained through delay processing.
  • the steps of obtaining the virtual sound signal in S403 and S404, and the steps of obtaining the delayed left channel signal and the delayed right channel signal in S405 and S406, can be performed simultaneously or successively, which is not limited in this application; S405 and S406 are optional steps.
  • S407 Perform merging processing on each virtual sound signal in the virtual sound signal set to obtain a chorus dry sound.
  • the electronic device may combine the virtual sound signals in the virtual sound signal set to obtain the dry chorus sound.
  • the virtual sound signal set includes the virtual sound signals corresponding to each dry sound signal obtained by the electronic device by simulating the N virtual three-dimensional space sound image positions, and the delayed binaural signals obtained by the electronic device by performing delay processing operations on the p dry sound signals in the dry sound signal set.
  • each virtual sound signal in the set of virtual sound signals is a binaural signal
  • the binaural signal includes a left channel signal and a right channel signal
  • the merging of the virtual sound signals can process the left channel signals and the right channel signals separately, with the same processing rule applied to both.
  • the merging process can be realized through normalization processing, so that the loudness of the binaural signal after merging is within [-1dB, 1dB].
  • for example, assuming there are 1000 left channel signals, each left channel signal is normalized separately, and the sum of the 1000 normalized left channel signals is then divided by 1000 to obtain the merged left channel signal; the right channel signals are normalized and merged in the same way.
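  • By way of illustration, a sketch of this merge for one channel (applied identically to the left channel list and the right channel list):

      import numpy as np

      def merge_channel(signals):
          length = max(len(s) for s in signals)
          acc = np.zeros(length)
          for s in signals:
              peak = np.max(np.abs(s))
              acc[:len(s)] += s / peak if peak > 0 else s  # normalize each signal
          return acc / len(signals)  # e.g. sum of 1000 normalized signals / 1000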
  • the obtained energy relationship between the chorus dry sound and the background music of the target song may or may not satisfy the energy ratio condition. If it satisfies the energy ratio condition, step S408 can be skipped; correspondingly, if it does not satisfy the energy ratio condition, step S408 is executed.
  • S408 Obtain the background music of the target song, and adjust the energy relationship between the dry voice of the chorus and the background music.
  • the electronic device can obtain the background music of the target song and adjust the energy relationship between the chorus dry sound and the background music, such that the energy relationship between the adjusted chorus dry sound and the adjusted background music satisfies the energy ratio condition.
  • the energy of the chorus dry sound may be too large, causing it to overwhelm the energy of the background music.
  • the energy relationship between the adjusted chorus dry sound and the adjusted background music can satisfy the energy ratio condition, which handles the situation in which the energy of the chorus dry sound is too large.
  • the energy ratio condition can be set as the ratio between the energy value of the chorus dry sound and the energy value of the background music is less than a ratio threshold, and can also be set so that the loudness of the chorus dry sound is 3dB lower than the loudness of the background music.
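  • By way of illustration, a sketch of this check using RMS level as a simple stand-in for the loudness measure (which the application does not pin down):

      import numpy as np

      def rms_db(x):
          return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

      def balance_energy(chorus, bgm, margin_db=3.0):
          # Require the chorus dry sound to sit at least margin_db below the music.
          excess = rms_db(chorus) - (rms_db(bgm) - margin_db)
          if excess > 0:
              chorus = chorus * 10 ** (-excess / 20)  # attenuate the chorus
          return chorus, bgm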
  • for a detailed description of generating the virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional space sound image positions, refer to S202 above; the background music can also be processed in the same way to obtain a chorus dry sound and background music with similar effects, so as to achieve a more harmonious and unified listening experience.
  • S409 Perform spectrum equalization processing on the dry chorus at a preset frequency band.
  • the electronic device may perform spectrum equalization processing on the dry chorus sound in a preset frequency band.
  • the electronic device can add spectrum notch processing in a preset frequency band to achieve the purpose of spectrum equalization.
  • for example, the electronic device can add about 6dB of spectrum notch processing near 4kHz. In this way, the chorus dry sound can be made to sound more natural, preventing the high-frequency current sound caused by spectral incongruity.
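  • By way of illustration, one plausible realization of the ~6dB cut near 4kHz is a peaking-EQ biquad with negative gain (RBJ audio EQ cookbook coefficients); the application itself does not specify the filter design:

      import numpy as np
      from scipy.signal import lfilter

      def spectral_notch(x, fs=44100, f0=4000.0, gain_db=-6.0, q=1.0):
          a_lin = 10 ** (gain_db / 40)
          w0 = 2 * np.pi * f0 / fs
          alpha = np.sin(w0) / (2 * q)
          b = [1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin]
          a = [1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin]
          return lfilter(b, a, x)  # lfilter normalizes by a[0] internally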
  • in S410, the electronic device may acquire the loudness of the background music.
  • if the loudness of the background music is less than a loudness threshold, the electronic device may increase it to the loudness threshold.
  • for example, the loudness threshold may be set to -14dB; if the loudness of the background music is less than -14dB, the electronic device may increase it to -14dB.
  • in S411, the electronic device superimposes the chorus dry sound and the background music to obtain the final accompaniment.
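  • By way of illustration, a sketch of the last two steps together; RMS dB is again used as an approximation of the loudness measure:

      import numpy as np

      def final_accompaniment(chorus, bgm, loudness_floor_db=-14.0):
          level = 20 * np.log10(np.sqrt(np.mean(bgm ** 2)) + 1e-12)
          if level < loudness_floor_db:                   # S410: raise quiet music
              bgm = bgm * 10 ** ((loudness_floor_db - level) / 20)
          n = max(len(chorus), len(bgm))                  # S411: superimpose
          mix = np.zeros(n)
          mix[:len(chorus)] += chorus
          mix[:len(bgm)] += bgm
          return mix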
  • the accompaniment can be obtained according to any one or a combination of the steps in S408-S411; in an embodiment, S408-S411 can be selectively executed according to actual needs. For example, the energy relationship between the chorus dry sound and the background music may not need adjustment, in which case S408 is not executed.
  • spectral equalization processing for dry chorus at preset frequency bands is also optional.
  • the two steps of S410 and S411 may not be executed.
  • after obtaining the final accompaniment, the electronic device can store it in the database, so that it can directly obtain the corresponding accompaniment from the database when receiving a chorus request for the same song.
  • FIG. 5 is a schematic flow chart of obtaining a virtual sound signal in an accompaniment generation method provided by an embodiment of the present application.
  • obtaining the virtual sound signals includes: obtaining the virtual sound signal corresponding to the dry sound signal at each of the N virtual three-dimensional space sound image positions, and performing a delay processing operation on each of the p dry sound signals to obtain the delayed binaural signal corresponding to each of the p dry sound signals.
  • in other words, the virtual sound signal corresponding to the dry sound signal at each of the N virtual three-dimensional space sound image positions can be obtained, or a delay processing operation can be performed on each of the p dry sound signals in the dry sound signal set to obtain the delayed two-channel signal corresponding to each of the p dry sound signals.
  • the following uses the dry sound signal X and the dry sound signal W in the dry sound signal set to illustrate the above two methods of obtaining corresponding virtual sound signals, where the dry sound signal X and the dry sound signal W can be any dry sound signals in the dry sound signal set.
  • after the electronic device acquires the dry sound signal X in the dry sound signal set, it can describe the position information of the virtual three-dimensional space sound image position by its azimuth and elevation angles, namely (θ, φ); according to this position information, the head-related transfer functions H_L(θ, φ) and H_R(θ, φ) corresponding to the sound image position can be found, and after convolving the dry sound signal X with them, the virtual sound signal corresponding to the dry sound signal at that virtual three-dimensional space sound image position is obtained.
  • the virtual sound signal is a two-channel signal, including a left channel signal Y_L and a right channel signal Y_R.
  • the virtual sound signal corresponding to the dry sound signal obtained in this way can enhance the stereoscopic immersion of the user.
  • the electronic device may perform a delay processing operation on the dry sound signal W.
  • for example, the electronic device may pass the dry sound signal W through a total of 16 delayers with different time parameters, d_L(1), d_L(2), ..., d_L(8) and d_R(1), d_R(2), ..., d_R(8); the 8 delayed dry sound signals obtained from the delayers d_L(1) through d_L(8) are superimposed to obtain the delayed left channel signal W_L corresponding to the dry sound signal W, and the 8 delayed dry sound signals obtained from the delayers d_R(1) through d_R(8) are superimposed to obtain the delayed right channel signal W_R corresponding to the dry sound signal W.
  • the delayed binaural signal corresponding to the dry sound signal obtained in this way can simulate a binaural signal located at the left or right ear of the human head.
  • the above steps of obtaining the virtual sound signal, the steps of obtaining the delayed left channel signal and the delayed right channel signal may be performed simultaneously or successively, which is not limited in this application.
  • the dry sound signal in the dry sound signal set is obtained in the above two different ways to obtain the corresponding virtual sound signal, which can fully display the scene experience of the chorus and enrich the audio effect.
  • FIG. 6 is an accompaniment playback processing method provided by an embodiment of the present application.
  • the method in the embodiment of the present application can be applied to an electronic device, such as a smart phone, a tablet computer, a smart wearable device, a personal computer and other smart devices, or a server or the like.
  • the method may include, but is not limited to, the following steps:
  • S601 Display a user interface.
  • the electronic device may display a user interface, and the user interface is used to receive a user's selection instruction for a target song.
  • the selection instruction can include a selection instruction for the accompaniment mode of the target song, and the accompaniment mode of the target song can be a chorus accompaniment mode, an acoustic accompaniment mode and an artificial intelligence (Artificial Intelligence, AI) accompaniment mode, but not limited to this.
  • the selection instruction may be an instruction generated by the user after triggering a selection control displayed on the user interface, or may be a selection instruction generated by the user through voice control of the electronic device; for example, the user's voice instruction to the electronic device may be "please use the chorus accompaniment mode to play", so that the electronic device generates a selection instruction indicating that the accompaniment mode of the target song is the chorus accompaniment mode.
  • the electronic device may acquire the corresponding accompaniment in the chorus accompaniment mode of the target song.
  • the user interface may display a selection control for the accompaniment mode of the target song, and the selection control may include: a choral accompaniment mode selection control and an acoustic accompaniment mode selection control.
  • the electronic device can detect whether the selection operation for the chorus accompaniment mode selection control is obtained, and if so, confirm that the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode.
  • the corresponding accompaniment in the chorus accompaniment mode is generated according to the dry chorus and background music.
  • the chorus dry sound may be generated according to a virtual sound signal set; the virtual sound signal set includes the virtual sound signal at each of the N virtual three-dimensional space sound image positions generated according to the acquired dry sound signal set; the multiple dry sound signals in the dry sound signal set can correspond to multiple different virtual three-dimensional space sound image positions, and each virtual three-dimensional space sound image position can be associated with one or more dry sound signals.
  • the set of dry sound signals is obtained based on the dry sound signals recorded by multiple users for the target song. It should be noted that the users' dry sound signals for the target song are entered after authorization and consent of the users.
  • the method for generating the corresponding accompaniment in the chorus accompaniment mode can be seen in the above-mentioned embodiments shown in FIGS. 2-5 , and will not be repeated here.
  • after the electronic device obtains the accompaniment corresponding to the chorus accompaniment mode of the target song, it may play the accompaniment for the user.
  • the accompaniment corresponding to the target song can be applied to the scene of karaoke, and the user can sing while playing the accompaniment.
  • for example, the electronic device can collect the user's singing voice, fuse it with the accompaniment of the target song, and then play the result, allowing users to have a unique experience, as if they were at a live concert.
  • the acquisition of the accompaniment corresponding to the target song by the electronic device may include, but is not limited to, the following steps:
  • S701 Send an accompaniment request to a server.
  • the electronic device may send an accompaniment request to the server, and the accompaniment request may include identification information of the target song.
  • the identification information of the target song is unique information for identifying the target song, for example, the identification information may be the song name of the target song.
  • S702 Receive the chorus dry voice and background music returned by the server in response to the accompaniment request.
  • the electronic device may receive the chorus dry sound and background music returned by the server in response to the accompaniment request for the target song.
  • the server can return the dry chorus and the background music separately, or combine the dry chorus and the background music before returning, and the specific return method can be selected according to the settings of the user.
  • S703 Determine a target chorus dry sound segment according to the chorus dry sound.
  • the electronic device may determine the target chorus dry sound segment according to the returned chorus dry sound.
  • the electronic device may display a first single-sentence interface, as shown in FIG. 7b, which displays each single sentence in the text data corresponding to the chorus dry sound in the order of its playing time nodes.
  • the user can select the target chorus dry voice segment according to each single sentence displayed on the first single sentence interface.
  • the target chorus dry voice segment may be composed of some single sentences in the chorus dry voice, or may be composed of all single sentences in the chorus dry voice, which may be specifically determined by a user's selection operation.
  • S704 Obtain an accompaniment corresponding to the target song according to the chorus dry sound and background music corresponding to the target chorus dry sound segment.
  • the electronic device can obtain the accompaniment corresponding to the target song according to the chorus dry sound and the background music corresponding to the target chorus dry sound segment selected by the user.
  • in an embodiment, the electronic device may display a second single-sentence interface, as shown in FIG. 7c; the second single-sentence interface may be displayed during playback of the accompaniment corresponding to the target song, and displays each single sentence in the text data corresponding to the accompaniment in the order of the accompaniment's playing time nodes.
  • the electronic device can also detect whether the mute selection operation for the dry chorus in the accompaniment is obtained during playback, and if the user's mute selection operation for the dry chorus in the accompaniment is received, it can play at the current time The node cancels the playback of the chorus dry voice in the accompaniment, and only keeps the background music in the accompaniment.
  • by implementing this embodiment, on the one hand, the accompaniment corresponding to the target song can be acquired and played.
  • on the other hand, the accompaniment of the target song in the chorus accompaniment mode is generated from the chorus dry sound and the background music; a target chorus dry sound segment can be determined from the chorus dry sound, and the accompaniment of the target song can be generated from the chorus dry sound corresponding to that segment and the background music.
  • when the accompaniment corresponding to the target song is played, the user can feel like being at a live concert and enjoy an immersive listening experience.
  • in addition, the user can flexibly select the chorus dry sound within the accompaniment, which makes the accompaniment more engaging and improves the user experience.
  • FIG. 8a is a schematic structural diagram of an accompaniment generating device provided in an embodiment of the present application.
  • the apparatus in this embodiment of the present application can be applied to an electronic device such as a smartphone, tablet computer, smart wearable device, personal computer, or server; in one embodiment, as shown in FIG. 8a, the accompaniment generation apparatus 80 may include:
  • an acquisition unit 801, configured to acquire a dry sound signal set that includes x dry sound signals corresponding to the target song, where x is an integer greater than 1, and to generate a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals.
  • a processing unit 802, configured to merge the virtual sound signals in a virtual sound signal set to obtain the chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N virtual three-dimensional sound image positions, and to synthesize the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song.
  • the acquisition unit 801 can also be used to acquire an initial dry sound signal set from an audio database that includes the initial dry sound signals recorded when multiple users sang the same song; the processing unit 802 can also be used to filter dry sound signals out of the initial set according to the sound parameters of each initial dry sound signal, the filtered signals forming the dry sound signal set.
  • the dry sound signal set includes the dry sound signals filtered out of the initial set according to the pitch-accuracy feature parameters and the sound-quality feature parameters;
  • the pitch-accuracy feature parameters include any one or more of pitch parameters, rhythm parameters, and prosody parameters;
  • the sound-quality feature parameters include any one or more of noise parameters, energy parameters, and speed parameters.
  • the N virtual three-dimensional spatial sound image positions include: n1 positions on the horizontal plane, obtained by dividing the horizontal plane at intervals of a first preset angle; n2 positions on the upper plane, obtained by dividing the upper plane at intervals of a second preset angle, the angle between the upper plane and the horizontal plane being a first angle threshold; and n3 positions on the lower plane, obtained by dividing the lower plane at intervals of a third preset angle, the angle between the lower plane and the horizontal plane being a second angle threshold; where n1, n2, and n3 are positive integers whose sum equals N.
  • the acquisition unit 801 can also be used to obtain the head-related transfer function corresponding to each of the N virtual three-dimensional spatial sound image positions; the processing unit 802 can also be used to process the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the head-related transfer function corresponding to that position, to obtain the virtual sound signal at the target position;
  • the virtual sound signal at the target position is a binaural signal; the target position is any one of the N virtual three-dimensional spatial sound image positions.
  • the virtual sound signal set further includes a delayed left channel signal and a delayed right channel signal corresponding to each of p dry sound signals; the acquisition unit 801 can also be used to acquire p dry sound signals from the x dry sound signals in the dry sound signal set, where p is a positive integer less than or equal to x; the processing unit 802 can also be used to apply a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals corresponding to each of them, and to superimpose the m1 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed left channel signal corresponding to each of the p dry sound signals, where m1 is a positive integer; and to apply a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals corresponding to each of them, and to superimpose the m2 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed right channel signal corresponding to each of the p dry sound signals, where m2 is a positive integer.
  • the acquisition unit 801 can also be used to acquire the background music of the target song, and the processing unit 802 can also be used to adjust the energy relationship between the chorus dry sound and the background music so that the energy relationship between the adjusted chorus dry sound and the adjusted background music satisfies the energy ratio condition; the accompaniment is obtained from the adjusted chorus dry sound and background music.
  • the processing unit 802 can also be used to apply spectrum equalization to the chorus dry sound in a preset frequency band; the acquisition unit 801 can also be used to obtain the loudness of the background music; the processing unit 802 can also be used to raise the loudness of the background music to the loudness threshold if it is below that threshold; the accompaniment is obtained from the spectrum-equalized chorus dry sound and the loudness-processed background music.
  • FIG. 8b is a schematic structural diagram of an accompaniment playback processing device provided in an embodiment of the present application.
  • the apparatus in this embodiment of the present application can be applied to an electronic device such as a mobile phone, tablet computer, smart wearable device, personal computer, or server; in one embodiment, as shown in FIG. 8b, the accompaniment playback processing apparatus 81 may include:
  • an acquisition unit 811, configured to display a user interface for receiving a selection instruction for a target song, and, if the selection instruction received on the user interface indicates that the accompaniment mode of the target song is the chorus accompaniment mode, to obtain the accompaniment corresponding to the target song.
  • a processing unit 812, configured to play the accompaniment corresponding to the target song; the accompaniment is generated from the chorus dry sound and the background music, and the chorus dry sound is generated from multiple dry sound signals in the dry sound signal set;
  • the multiple dry sound signals correspond to multiple different virtual three-dimensional sound image positions, and the dry sound signal set is obtained from the dry sound signals recorded by multiple users for the target song.
  • the chorus dry sound is generated from a virtual sound signal set,
  • which includes the virtual sound signals, generated from the acquired dry sound signal set, at each of N virtual three-dimensional spatial sound image positions; the multiple dry sound signals in the set correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more dry sound signals.
  • the user interface displays selection controls for the accompaniment mode of the target song, including a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control; before the accompaniment corresponding to the target song is obtained, the processing unit 812 can also be used to detect whether a selection operation on the chorus accompaniment mode selection control has been received and, if so, to confirm that the selection instruction received on the user interface indicates that the accompaniment mode of the target song is the chorus accompaniment mode.
  • the processing unit 812 can also be used to send the server an accompaniment request that includes the identification information of the target song; the acquisition unit 811 can also be used to receive the chorus dry sound and background music returned by the server in response to the accompaniment request; the processing unit 812 can also be used to determine a target chorus dry sound segment from the chorus dry sound and to obtain the accompaniment corresponding to the target song from the chorus dry sound corresponding to the target segment and the background music.
  • the processing unit 812 can also be used to display the first single-sentence interface, which shows each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes; the target chorus dry sound segment is determined based on a single-sentence selection operation on the first single-sentence interface.
  • the processing unit 812 can also be used to display the second single-sentence interface, which shows each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes; to detect whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback; and, if so, to cancel playback of the chorus dry sound at the current time play node.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device may include: a network interface 901, a memory 902, and a processor 903.
  • the network interface 901, the memory 902, and the processor 903 are connected through one or more communication buses, and the communication buses are used to implement connection and communication between these components.
  • the network interface 901 may include a standard wired interface and a wireless interface (such as a WIFI interface).
  • the memory 902 may include volatile memory, such as random-access memory (RAM); the memory 902 may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); the memory 902 may also include a combination of the above kinds of memory.
  • the processor 903 may be a central processing unit (central processing unit, CPU).
  • the processor 903 may further include a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD), and the like.
  • the above-mentioned PLD may be a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL) and the like.
  • the memory 902 is also used to store program instructions, and the processor 903 can also call the program instructions to implement:
  • Obtain a dry sound signal set that includes x dry sound signals corresponding to the target song, where x is an integer greater than 1;
  • generate a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals;
  • merge the virtual sound signals in a virtual sound signal set to obtain the chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N virtual three-dimensional sound image positions;
  • synthesize the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song.
  • the processor 903 can also invoke the program instructions to implement: obtaining an initial dry sound signal set from an audio database that includes the initial dry sound signals recorded when multiple users sang the same song; and filtering dry sound signals out of the initial set according to the sound parameters of each initial dry sound signal, the filtered signals forming the dry sound signal set.
  • the dry sound signal set includes the dry sound signals filtered out of the initial set according to the pitch-accuracy feature parameters and the sound-quality feature parameters;
  • the pitch-accuracy feature parameters include any one or more of pitch parameters, rhythm parameters, and prosody parameters;
  • the sound-quality feature parameters include any one or more of noise parameters, energy parameters, and speed parameters.
  • the N virtual three-dimensional spatial sound image positions include: n1 positions on the horizontal plane, obtained by dividing the horizontal plane at intervals of a first preset angle; n2 positions on the upper plane, obtained by dividing the upper plane at intervals of a second preset angle, the angle between the upper plane and the horizontal plane being a first angle threshold; and n3 positions on the lower plane, obtained by dividing the lower plane at intervals of a third preset angle, the angle between the lower plane and the horizontal plane being a second angle threshold; where n1, n2, and n3 are positive integers whose sum equals N.
  • the processor 903 can also invoke the program instructions to implement: obtaining the head-related transfer function corresponding to each of the N virtual three-dimensional spatial sound image positions;
  • processing the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the head-related transfer function corresponding to that position, to obtain the virtual sound signal at the target position;
  • the virtual sound signal at the target position is a binaural signal;
  • the target position is any one of the N virtual three-dimensional spatial sound image positions.
  • the virtual sound signal set further includes a delayed left channel signal and a delayed right channel signal corresponding to each of p dry sound signals; the processor 903 can also invoke the program instructions to implement: obtaining p dry sound signals from the x dry sound signals in the dry sound signal set, where p is a positive integer less than or equal to x; applying a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals corresponding to each of them, and superimposing the m1 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed left channel signal corresponding to each of the p dry sound signals, where m1 is a positive integer; and applying a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals corresponding to each of them, and superimposing the m2 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed right channel signal corresponding to each of the p dry sound signals, where m2 is a positive integer.
  • the processor 903 can also invoke the program instructions to implement: obtaining the background music of the target song and adjusting the energy relationship between the chorus dry sound and the background music so that the energy relationship between the adjusted chorus dry sound and the adjusted background music satisfies the energy ratio condition; the accompaniment is obtained from the adjusted chorus dry sound and background music.
  • the processor 903 can also invoke the program instructions to implement: applying spectrum equalization to the chorus dry sound in a preset frequency band; obtaining the loudness of the background music; and raising the loudness of the background music to the loudness threshold if it is below that threshold; the accompaniment is obtained from the spectrum-equalized chorus dry sound and the loudness-processed background music.
  • the memory 902 is also used to store program instructions, and the processor 903 can also call the program instructions to implement:
  • display a user interface, the user interface being used to receive a selection instruction for the target song;
  • if the selection instruction received on the user interface indicates that the accompaniment mode of the target song is the chorus accompaniment mode, obtain the accompaniment corresponding to the target song;
  • play the accompaniment corresponding to the target song; the accompaniment is generated from the chorus dry sound and background music,
  • the chorus dry sound is generated from multiple dry sound signals in the dry sound signal set,
  • the multiple dry sound signals in the dry sound signal set correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from the dry sound signals recorded by multiple users for the target song.
  • the chorus dry sound is generated from a virtual sound signal set,
  • which includes the virtual sound signals, generated from the acquired dry sound signal set, at each of N virtual three-dimensional spatial sound image positions; the multiple dry sound signals in the set correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more dry sound signals.
  • the user interface displays selection controls for the accompaniment mode of the target song,
  • including a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control; before the accompaniment corresponding to the target song is obtained, the processor 903 can also invoke the program instructions to implement: detecting whether a selection operation on the chorus accompaniment mode selection control has been received and, if so, confirming that the selection instruction received on the user interface indicates that the accompaniment mode of the target song is the chorus accompaniment mode.
  • the processor 903 can also invoke the program instructions to implement: sending the server an accompaniment request that includes the identification information of the target song; receiving the chorus dry sound and background music returned by the server in response to the accompaniment request;
  • determining a target chorus dry sound segment from the chorus dry sound; and obtaining the accompaniment corresponding to the target song from the chorus dry sound corresponding to the target segment and the background music.
  • the processor 903 can also invoke the program instructions to implement: displaying the first single-sentence interface, which shows each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes; the target chorus dry sound segment is determined based on a single-sentence selection operation on the first single-sentence interface.
  • the processor 903 can also invoke the program instructions to implement: displaying the second single-sentence interface, which shows each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes; detecting whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback; and, if so, cancelling playback of the chorus dry sound at the current time play node.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the methods provided in the foregoing embodiments are implemented.
  • the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the methods provided in the foregoing embodiments.
  • Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
  • a person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium.
  • when the program is executed, it may include the processes of the embodiments of the above methods.
  • the above-mentioned storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Stereophonic System (AREA)

Abstract

Embodiments of this application disclose an accompaniment generation method, device, and storage medium. The accompaniment generation method includes: obtaining a dry sound signal set that includes x dry sound signals corresponding to a target song; generating a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals; merging the virtual sound signals in the virtual sound signal set to obtain a chorus dry sound; and synthesizing the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song. This application achieves a stereo surround effect for the accompaniment.

Description

Accompaniment generation method, device, and storage medium

Technical Field

This application relates to the field of computer application technology, and in particular to an accompaniment generation method, device, and storage medium.

Background

With the development of virtual reality (VR) technology, virtual three-dimensional (3D) audio technology has also been improving. Virtual 3D audio can create three-dimensional dynamic effects, and applying it to singing software can give users an immersive experience. At present, when virtual 3D audio is applied to multi-person chorus scenarios, the existing solution directly weights and superimposes the multiple vocal tracks; the resulting sound is not sufficiently three-dimensional, and the user experience suffers.

Summary

Embodiments of this application provide an accompaniment generation method, device, and storage medium that achieve an all-around stereo surround audio effect and improve the user experience.

In one aspect, an embodiment of this application provides an accompaniment generation method, including:

obtaining a dry sound signal set that includes x dry sound signals corresponding to a target song, where x is an integer greater than 1;

generating a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals;

merging the virtual sound signals in a virtual sound signal set to obtain a chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N virtual three-dimensional spatial sound image positions; and

synthesizing the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song.
In one aspect, an embodiment of this application provides an accompaniment playback processing method, including:

displaying a user interface for receiving a selection instruction for a target song;

if the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode, obtaining the accompaniment corresponding to the target song; and

playing the accompaniment corresponding to the target song;

where the accompaniment is generated from a chorus dry sound and background music, the chorus dry sound is generated from multiple dry sound signals in a dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from dry sound signals recorded by multiple users for the target song.

In another aspect, an embodiment of this application provides an accompaniment generation apparatus, including:

an obtaining unit, configured to obtain a dry sound signal set that includes x dry sound signals corresponding to a target song, where x is an integer greater than 1, and to generate a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals; and

a processing unit, configured to merge the virtual sound signals in a virtual sound signal set to obtain a chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N positions, and to synthesize the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song.

In another aspect, an embodiment of this application provides an accompaniment playback processing apparatus, including:

an obtaining unit, configured to display a user interface for receiving a selection instruction for a target song, and, if the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode, to obtain the accompaniment corresponding to the target song; and

a processing unit, configured to play the accompaniment corresponding to the target song, where the accompaniment is generated from a chorus dry sound and background music, the chorus dry sound is generated from multiple dry sound signals in a dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from dry sound signals recorded by multiple users for the target song.

Correspondingly, an embodiment of this application provides an electronic device including a memory, a processor, and a network interface, the processor being connected to the memory and the network interface, where the network interface provides network communication functions, the memory stores program code, and the processor calls the program code to perform the methods in the embodiments of this application.

Correspondingly, an embodiment of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the methods in the embodiments of this application.

Correspondingly, an embodiment of this application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the storage medium and executes them, causing the computer device to perform the methods in the embodiments of this application.

By implementing the embodiments of this application, on the one hand, the virtual sound signals of each dry sound signal of the target song at different virtual three-dimensional spatial sound image positions can be obtained, the virtual sound signals corresponding to the dry sound signals can be merged into a chorus dry sound, and the chorus dry sound can finally be synthesized with the target song's background music according to sound effect optimization rules to obtain the accompaniment of the target song; on the other hand, the user's selection instruction for a target song can be received, and when the instruction indicates that the target song's accompaniment mode is the chorus accompaniment mode, the corresponding accompaniment can be obtained and played. In this way, the sound image positions of the dry sound signals in virtual three-dimensional space can be simulated in all directions, achieving a stereo surround audio effect and giving the user an immersive listening experience when the corresponding accompaniment is obtained.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of this application or the prior art more clearly, the drawings required in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application scenario of an accompaniment generation method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of an accompaniment generation method provided by an embodiment of this application;
FIG. 3a is a schematic diagram of the horizontal plane, the upper plane, and the lower plane in an accompaniment generation method provided by an embodiment of this application;
FIG. 3b is a schematic diagram of a virtual three-dimensional spatial sound image position in an accompaniment generation method provided by an embodiment of this application;
FIG. 3c is a schematic diagram of dividing the planes at preset angular intervals in an accompaniment generation method provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of another accompaniment generation method provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of obtaining the binaural signals corresponding to the dry sound signals in a dry sound signal set in an accompaniment generation method provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of an accompaniment playback processing method provided by an embodiment of this application;
FIG. 7a is a schematic flowchart of obtaining the accompaniment corresponding to a target song in an accompaniment playback processing method provided by an embodiment of this application;
FIG. 7b is a schematic diagram of displaying a first single-sentence interface in an accompaniment playback processing method provided by an embodiment of this application;
FIG. 7c is a schematic diagram of displaying a second single-sentence interface in an accompaniment playback processing method provided by an embodiment of this application;
FIG. 8a is a schematic structural diagram of an accompaniment generation apparatus provided by an embodiment of this application;
FIG. 8b is a schematic structural diagram of an accompaniment playback processing apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of this application.

Before the embodiments of this application are described in further detail, the terms involved are explained as follows; these explanations apply throughout the embodiments of this application.

1) Dry sound signal: in the embodiments of this application, a dry sound signal is a pure vocal signal without accompaniment music. It is a monaural signal, i.e., it carries no direction information.

2) Binaural signal: "binaural" means two sound channels; the principle is that a listener judges the position of a sound source from the phase difference between the sound arriving at the left and right ears. In the embodiments of this application, a binaural signal consists of a left channel signal and a right channel signal.

3) Head-Related Transfer Functions (HRTF): also called binaural transfer functions, HRTFs describe the propagation of sound waves from a source to the two ears. An HRTF is a set of filters; using the principle that convolution in the time domain is equivalent to multiplication in the frequency domain, the virtual sound signal delivered to the two ears can be computed from the HRTF data corresponding to the position information of the sound source.
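The equivalence used in point 3) can be made concrete with a minimal numpy sketch. This is an illustration only, not part of the original disclosure: `hrtf_left` is a made-up 3-tap filter standing in for one channel of real HRTF data.

```python
import numpy as np

# Toy mono "dry" signal and a made-up 3-tap filter standing in for one HRTF channel.
x = np.random.randn(1024)               # dry sound signal (mono)
hrtf_left = np.array([0.5, 0.3, 0.2])   # hypothetical impulse response

# Time-domain convolution.
y_time = np.convolve(x, hrtf_left)

# Equivalent frequency-domain multiplication (zero-padded to the full output length).
n = len(x) + len(hrtf_left) - 1
y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(hrtf_left, n), n)

assert np.allclose(y_time, y_freq)      # identical up to numerical precision
```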
Embodiments of this application provide an accompaniment generation method, device, and storage medium. By implementing these embodiments, a dry sound signal set consisting of multiple dry sound signals for the same target song can be obtained; the virtual sound signal corresponding to each dry sound signal in the set at a different virtual three-dimensional spatial sound image position can be obtained; the virtual sound signals corresponding to the dry sound signals can be merged into a chorus dry sound; and the chorus dry sound can finally be synthesized with the target song's background music according to sound effect optimization rules to obtain the accompaniment of the target song. In this way, on the one hand, the sound image position of every dry sound signal in the set can be simulated in all directions in virtual three-dimensional space, and the chorus dry sound obtained by merging the per-position virtual sound signals achieves a stereo surround effect; on the other hand, synthesizing the chorus dry sound with the background music according to sound effect optimization rules before obtaining the accompaniment enhances the immersiveness of the audio. Overall, compared with directly superimposing the dry sound signals, this application yields a richer audio processing result and improves the user experience.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario of an accompaniment generation method provided by an embodiment of this application. As shown in FIG. 1, the scenario can include a smart device 100 that communicates with a server 110 in a wired or wireless manner, the server 110 being connected to a database 120.

The accompaniment generation method provided by the embodiments of this application can be implemented by an electronic device such as the smart device 100. For example, when a selection instruction received by the smart device 100 indicates that the accompaniment mode for the target song is the chorus accompaniment mode, the device can obtain the dry sound signal set corresponding to the target song, generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional spatial sound image positions (the virtual sound signal can be, for example, a binaural signal), merge the virtual sound signals corresponding to the dry sound signals into a chorus dry sound, and synthesize the chorus dry sound with the target song's background music according to sound effect optimization rules to obtain the accompaniment. As an example, the smart device 100 in FIG. 1 shows a "chorus accompaniment" option; the user can generate the chorus accompaniment mode selection instruction by voice control or by triggering a selection control displayed on the user interface. The dry sound signal set may be stored locally on the smart device 100 in advance, or obtained by the smart device 100 from the server 110 or the database 120.

The accompaniment generation method can also be implemented by an electronic device such as the server 110. For example, when a selection instruction received by the server 110 indicates the chorus accompaniment mode for the target song, the server can obtain the target song's dry sound signal set, generate a virtual sound signal (for example, a binaural signal) based on the dry sound signal corresponding to each of the N virtual three-dimensional spatial sound image positions, merge the virtual sound signals into a chorus dry sound, and synthesize the chorus dry sound with the background music according to sound effect optimization rules to obtain the accompaniment. The dry sound signal set may be stored locally on the server 110 in advance or obtained from the database 120, and the resulting accompaniment may be stored locally or in the database 120 to be called when needed. The server 110 need not start generating the accompaniment only upon receiving a selection instruction indicating the chorus accompaniment mode: it can run the steps of the accompaniment generation method of this application at a suitable time, for example when its load is low, when it receives new dry sound signals for the target song, or when it receives a management operation concerning accompaniment generation. Preferably, chorus-version accompaniments can be generated in advance and stored on the server; after accompaniments for a large number of songs have been generated, the user can issue a selection instruction for a target song through the smart device 100, for example by choosing "chorus accompaniment" on the user interface, and the server 110 can respond to the instruction by locating the target song's chorus accompaniment among the generated accompaniments and delivering it to the smart device 100.

The accompaniment generation method can also be implemented cooperatively by an electronic device such as the smart device 100 and an electronic device such as the server 110. For example, the server 110 can generate a virtual sound signal (for example, a binaural signal) based on the dry sound signal corresponding to each of the N positions, merge the virtual sound signals into a chorus dry sound, synthesize the chorus dry sound with the background music according to sound effect optimization rules to obtain the accompaniment, and deliver the accompaniment to the smart device 100.

The accompaniment generation method can also be implemented by electronic devices such as the smart device 100 and the server 110 running a computer program. The computer program may be a native program or software module of an operating system, a local application (APP), or a mini-program; in short, it may be any form of application, module, or plug-in, which is not specifically limited in the embodiments of this application.

The smart devices involved in the embodiments of this application may be, without limitation, personal computers, laptops, smartphones, tablets, smart watches, smart voice interaction devices, smart home appliances, vehicle-mounted terminals, and smart wearable devices. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The smart device and the server may be connected directly or indirectly by wired or wireless communication, which is not specifically limited in the embodiments of this application.

It should be understood that the numbers of dry sound signals and virtual three-dimensional spatial sound image positions shown in FIG. 1 are only illustrative; depending on implementation needs, the dry sound signal set may contain any number of dry sound signals, and the virtual three-dimensional space may contain any number of virtual sound image positions.
Further, referring to FIG. 2, FIG. 2 is a schematic flowchart of an accompaniment generation method provided by an embodiment of this application. The method can be applied to an electronic device, for example a smart device such as a smartphone, tablet, smart wearable device, or personal computer, or a server. The method may include, but is not limited to, the following steps:

S201: Obtain a dry sound signal set.

In this embodiment, the electronic device can obtain a dry sound signal set that includes several dry sound signals corresponding to the target song.

In one embodiment, the dry sound signal set can be obtained from an audio database containing the initial dry sound signals recorded when multiple users sang the same song; it should be noted that the initial dry sound signals in the audio database are recorded only with the users' authorization and consent. The electronic device can filter out the dry sound signals that satisfy given conditions according to the sound parameters of the initial dry sound signals, so as to form the dry sound signal set.

In one embodiment, the electronic device can filter the qualifying dry sound signals from the initial dry sound signal set according to pitch-accuracy feature parameters and sound-quality feature parameters. The pitch-accuracy feature parameters may include any one or more of pitch, rhythm, and prosody parameters; dry sound signals selected by these parameters are highly consistent with the accompaniment melody in pitch and rhythm. The sound-quality feature parameters may include any one or more of noise, energy, and speed parameters; dry sound signals selected by these parameters are clear, of suitable audio energy, and of even tempo. The embodiments of this application do not limit the order of filtering: the electronic device can first filter by the pitch-accuracy feature parameters and then, among the signals satisfying the preset pitch-accuracy conditions, filter by the sound-quality feature parameters, or vice versa. A dry sound signal set filtered from the initial set in this way has both good pitch accuracy and good sound quality.
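A minimal Python sketch of this two-stage filter follows. The field names and thresholds (`pitch_score`, `noise_level`, 0.8, 0.2, and so on) are hypothetical: the text names only the parameter families, not their scales.

```python
from dataclasses import dataclass

@dataclass
class DrySignalMeta:
    # Hypothetical per-recording scores; the text only names the parameter
    # families (pitch/rhythm/prosody and noise/energy/speed).
    pitch_score: float   # agreement of melody and rhythm with the reference
    noise_level: float   # estimated background noise
    energy: float        # average signal energy
    duration_s: float    # recording length in seconds

def select_dry_signals(candidates, pitch_min=0.8, noise_max=0.2,
                       energy_min=0.05, duration_min=30.0):
    """Two-stage filter: pitch-accuracy conditions first, then sound-quality."""
    in_tune = [c for c in candidates if c.pitch_score >= pitch_min]
    return [c for c in in_tune
            if c.noise_level <= noise_max
            and c.energy >= energy_min
            and c.duration_s >= duration_min]
```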
S202: Generate a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional spatial sound image positions.

In this embodiment, the electronic device can simulate a different sound image position in virtual three-dimensional space for each dry sound signal in the set, and then generate a virtual sound signal (for example, a binaural signal) based on the dry sound signal corresponding to each of the N positions. The N positions are all different, and each position can correspond to one or more dry sound signals.

In one embodiment, the N virtual three-dimensional spatial sound image positions can be simulated in virtual three-dimensional space as follows. As shown in FIG. 3a, the positive x, y, and z axes of the virtual three-dimensional space correspond respectively to directly in front of, to the left of, and above the listener's head, and the space is divided into three planes: a horizontal plane 301, an upper plane 302 whose angle to the horizontal plane is a first angle threshold, and a lower plane 303 whose angle to the horizontal plane is a second angle threshold. As shown in FIG. 3b, each virtual position consists of an azimuth and an elevation; denoting the azimuth by θ and the elevation by φ, each virtual three-dimensional spatial sound image position can be written as (θ, φ). Accordingly, the horizontal plane 301 is the plane of 0° elevation; the upper plane is the plane at the first angle threshold, which may be any elevation above the horizontal plane; and the lower plane is the plane at the second angle threshold, which may be any elevation below the horizontal plane. For example, the upper plane may be the plane at 40° elevation and the lower plane the plane at -40° elevation. The azimuth θ describes the clockwise angle, within the plane, between the sound image position and a target direction line. Further, as shown in FIG. 3c, dividing the planes at their corresponding preset angular intervals yields multiple virtual positions. Specifically, dividing the horizontal plane at intervals of a first preset angle yields n1 positions on the horizontal plane; dividing the upper plane at intervals of a second preset angle yields n2 positions on the upper plane; and dividing the lower plane at intervals of a third preset angle yields n3 positions on the lower plane. For example, if the first preset angle is 10° and the second and third preset angles are both 15°, dividing the horizontal plane at 10° intervals yields 36 positions, and dividing the upper and lower planes at 15° intervals yields 24 positions each, for a total of 84 distinct virtual three-dimensional spatial sound image positions. The first, second, and third preset angles may take any preset values; the specific values above are examples only and do not limit the embodiments of this application. In this way, multiple virtual positions at differently spaced azimuths can be created in three different planes of the virtual three-dimensional space, achieving an all-around immersive simulation of the sound sources.
In one embodiment, each virtual three-dimensional spatial sound image position can correspond to one dry sound signal or to multiple dry sound signals. The electronic device can obtain the virtual sound signal corresponding to the one or more dry sound signals at each position. Specifically, the electronic device can obtain the virtual sound signal corresponding to each dry sound signal in virtual three-dimensional space as follows: obtain the azimuth and elevation (θ, φ) of the virtual position corresponding to the dry sound signal; determine, from (θ, φ), the head-related transfer function HRTF corresponding to that position; and compute, from (θ, φ) and the corresponding HRTF data, the virtual sound signal corresponding to the dry sound signal at that position. For example, suppose the position of dry sound signal X has azimuth and elevation (θ, φ), and the HRTF data corresponding to that position are the filter pair H_L^(θ,φ) and H_R^(θ,φ); then the virtual sound signal of X at that position is computed as the binaural signal (Y_L, Y_R), where Y_L is the left channel signal and Y_R is the right channel signal.
In one embodiment, the electronic device can obtain part of the dry sound signals from the set, either at random or by a new filtering rule that selects signals with still better pitch accuracy and sound quality, and apply delay processing operations to each of them to obtain the delayed binaural signal corresponding to each dry sound signal in that part. Specifically, when delay-processing one dry sound signal, 8 pairs of distinct time parameters can be chosen; it should be noted that "8 pairs" means 8 time parameters for obtaining the delayed left channel signal and 8 time parameters for obtaining the delayed right channel signal, 16 in total. For example, 16 distinct parameters can be chosen from the range of 21 ms to 79 ms (taking 80 ms as the reverberation time, based on a typical room impulse response), or 16 (or another number of) distinct parameters can be chosen at random from another reasonable range as needed. In this way, dry sound signals located near the listener's left or right ear can be simulated, making the audio effect richer. In one embodiment, the selection method and the time (delay-length) parameters of the delay processing can be adjusted through an interface, so that users producing chorus audio can configure them flexibly. It should be noted that the step of obtaining the virtual sound signals and the steps of obtaining the delayed left channel signal and delayed right channel signal can be performed simultaneously or one after the other, which this application does not limit.
S203: Merge the virtual sound signals in the virtual sound signal set to obtain a chorus dry sound.

In this embodiment, the electronic device can merge the virtual sound signals in the virtual sound signal set to obtain the chorus dry sound.

In one embodiment, the merging of the virtual sound signals corresponding to the dry sound signals can be implemented by normalization, so that the loudness of the merged virtual sound signal is adjusted into [-1 dB, 1 dB]. The virtual sound signals being merged include the virtual sound signals corresponding to the dry sound signals at each of the N virtual three-dimensional spatial sound image positions, and the delayed binaural signals obtained by delay-processing part of the dry sound signals in the set.

S204: Synthesize the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment.

In this embodiment, the electronic device obtains the final accompaniment after synthesizing the chorus dry sound with the target song's background music according to sound effect optimization rules. The sound effect optimization rules can be, for example, adjusting the sound parameters of the target song's background music and of the virtual sound signals corresponding to the dry sound signals obtained above; the sound parameters can be common adjustable parameters such as loudness and timbre.

In one embodiment, after obtaining the chorus dry sound, the electronic device can obtain the target song's background music; if the energy relationship between the chorus dry sound and the background music does not satisfy an energy ratio condition, the electronic device can adjust that relationship. Here, the energy ratio condition can be set so that the ratio of the chorus dry sound's energy value to the background music's energy value is below a ratio threshold, or so that the loudness of the chorus dry sound is 3 dB below that of the background music. This prevents the chorus dry sound's energy from exceeding the background music's and makes the final accompaniment more harmonious.

By implementing this embodiment, the virtual sound signals corresponding to the dry sound signals at different virtual three-dimensional spatial sound image positions can be obtained and merged into a chorus dry sound, and the chorus dry sound can then be synthesized with the target song's background music according to sound effect optimization rules to obtain the target song's accompaniment, achieving a stereo surround listening effect and enhancing the immersiveness of the audio for an excellent user experience.
Further, referring to FIG. 4, FIG. 4 is a schematic flowchart of another accompaniment generation method provided by an embodiment of this application. The method can be applied to an electronic device such as a smartphone, tablet, smart wearable device, personal computer, or server. The method may include, but is not limited to, the following steps:

S401: Obtain an initial dry sound signal set from an audio database.

In this embodiment, the electronic device can obtain an initial dry sound signal set from an audio database; it should be noted that the initial dry sound signal set in the audio database is recorded only with the users' authorization and consent.

In one embodiment, the audio database can be an independently deployed database or integrated with the electronic device, i.e., stored inside the electronic device. Here, the initial dry sound signal set is the set of original dry sound signals in the audio database that were recorded, with authorization and consent, when users sang the same song.

S402: Filter dry sound signals out of the initial dry sound signal set according to the sound parameters of each initial dry sound signal; the filtered dry sound signals form the dry sound signal set.

In this embodiment, the electronic device can filter the qualifying dry sound signals out of the initial set according to the sound parameters of each initial dry sound signal, narrowing the initial set down to form the dry sound signal set.

In one embodiment, the sound parameters of an initial dry sound signal can include its pitch-accuracy feature parameters and sound-quality feature parameters; the pitch-accuracy feature parameters can include any one or more of pitch, rhythm, and prosody parameters, and the sound-quality feature parameters can include any one or more of noise, energy, and speed parameters. In this way, initial dry sound signals with poor audio quality, such as noisy-sounding recordings, off-key singing, overly short recordings, low audio energy, or clipping, can be removed from the initial set, yielding a dry sound signal set with good pitch accuracy and sound quality.

S403: Obtain the head-related transfer function corresponding to each of the N virtual three-dimensional spatial sound image positions.

In this embodiment, the electronic device can obtain the N virtual three-dimensional spatial sound image positions, and then obtain the head-related transfer function corresponding to each of them.

In one embodiment, the HRTFs corresponding to the positions in virtual three-dimensional space can be stored in advance in an HRTF database, so that the electronic device can call up the HRTF corresponding to a given virtual position from that database.

S404: Process the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the head-related transfer function corresponding to that position, to obtain the virtual sound signal at the target position.

In this embodiment, the electronic device can process a target dry sound signal according to the HRTF corresponding to the target position to obtain the target dry sound signal's virtual sound signal at that position; the target position can be any one of the N positions, and the target dry sound signal can be any dry sound signal in the set.

In one embodiment, the head-related transfer function corresponding to the target position is the HRTF data for that position. The HRTF data corresponding to the target position can be determined from its azimuth and elevation among known HRTF data; the electronic device can then convolve the target dry sound signal with the HRTF data corresponding to the target position to obtain the virtual sound signal of the target dry sound signal at the target position.
S405: Obtain p dry sound signals from the x dry sound signals in the dry sound signal set.

In this embodiment, the electronic device can obtain p dry sound signals at random from the x dry sound signals in the set. It should be noted that S404 and S405 can be performed simultaneously or one after the other, which this application does not limit.

S406: Apply a delay processing operation to each of the p dry sound signals to obtain the delayed left channel signal and the delayed right channel signal corresponding to each of them.

In this embodiment, the electronic device can apply a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals corresponding to each of them, and superimpose the m1 delayed dry sound signals corresponding to each dry sound signal to obtain that signal's delayed left channel signal, where m1 is a positive integer; it can then apply a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals corresponding to each of them, and superimpose the m2 delayed dry sound signals corresponding to each dry sound signal to obtain that signal's delayed right channel signal, where m2 is a positive integer.

In one embodiment, the electronic device can pass one dry sound signal through 16 delay lines with different time parameters to obtain 16 copies with different delays and degrees of attenuation, divide the 16 copies evenly into two groups, and superimpose the copies with different delays and attenuations within each group to finally obtain that dry sound signal's delayed left channel signal and delayed right channel signal; a sketch of this delay processing is given below.

In one embodiment, before the delayed binaural signal corresponding to each of the p dry sound signals is obtained, the sound field of the dry sound signal can be widened by adding bass-enhancement and reverberation-simulation modules, reducing the correlation between the delayed left channel signal and the delayed right channel signal in the binaural signal obtained by delay processing. It should be noted that the steps of obtaining virtual sound signals in S403 and S404 and the steps of obtaining delayed left and right channel signals in S405 and S406 can be performed simultaneously or one after the other, which this application does not limit; S405 and S406 are optional steps.
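The sketch referenced above is a minimal Python rendering of the 16-delay-line scheme. The delay values are drawn from the 21-79 ms range mentioned earlier; the attenuation profile in `gains` is an assumption, since the text does not specify the attenuation of each copy.

```python
import numpy as np

def delayed_channel(dry, delays_ms, gains, sr=44100):
    """Superimpose several delayed/attenuated copies of a dry signal.

    `delays_ms` plays the role of the 8 time parameters per ear;
    `gains` models the attenuation of each delayed copy.
    """
    max_delay = int(max(delays_ms) / 1000 * sr)
    out = np.zeros(len(dry) + max_delay)
    for d_ms, g in zip(delays_ms, gains):
        d = int(d_ms / 1000 * sr)
        out[d:d + len(dry)] += g * dry
    return out

# 8 delays per ear drawn from the 21-79 ms range; gains are hypothetical.
rng = np.random.default_rng(0)
dry = rng.standard_normal(44100)          # one second of a mono dry signal
left_delays = rng.uniform(21, 79, 8)
right_delays = rng.uniform(21, 79, 8)
gains = np.linspace(1.0, 0.3, 8)
w_left = delayed_channel(dry, left_delays, gains)    # delayed left channel W_L
w_right = delayed_channel(dry, right_delays, gains)  # delayed right channel W_R
```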
S407: Merge the virtual sound signals in the virtual sound signal set to obtain the chorus dry sound.

In this embodiment, the electronic device can merge the virtual sound signals in the virtual sound signal set to obtain the chorus dry sound. Here, the virtual sound signal set includes the virtual sound signals corresponding to the dry sound signals obtained by simulating the N virtual three-dimensional spatial positions, and the delayed binaural signals obtained by delay-processing the p dry sound signals in the set.

In one embodiment, each virtual sound signal in the set is a binaural signal consisting of a left channel signal and a right channel signal; when merging the virtual sound signals, the left and right channel signals can be processed separately, with the same processing rule applied to both channels. Here, the merge can be implemented by normalization so that the merged binaural signal's loudness lies in [-1 dB, 1 dB]. For example, given 1000 binaural signals, i.e., 1000 left channel signals and 1000 right channel signals, each left channel signal is normalized, and the sum of the 1000 normalized left channel signals is divided by 1000 to obtain the merged left channel signal; likewise, after each right channel signal is normalized, the sum of the 1000 normalized right channel signals is divided by 1000 to obtain the merged right channel signal. In this way, the chorus dry sound is obtained.
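A minimal per-channel sketch of this merge follows, using simple peak normalization as a stand-in for the loudness normalization described above (the exact normalization target is not specified beyond the [-1 dB, 1 dB] range).

```python
import numpy as np

def merge_channel(signals):
    """Normalize each signal, then average them (the 'sum divided by 1000' step)."""
    longest = max(len(s) for s in signals)
    acc = np.zeros(longest)
    for s in signals:
        peak = np.max(np.abs(s))
        norm = s / peak if peak > 0 else s   # peak normalization as a stand-in
        acc[:len(s)] += norm
    return acc / len(signals)

# Left and right channels use the same rule, applied separately:
# chorus_left = merge_channel(all_left_channels)
# chorus_right = merge_channel(all_right_channels)
```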
In one embodiment, the energy relationship between the resulting chorus dry sound and the target song's background music may or may not satisfy the energy ratio condition. If the relationship satisfies the condition, step S408 can be skipped; correspondingly, if it does not, step S408 is performed.

S408: Obtain the background music of the target song, and adjust the energy relationship between the chorus dry sound and the background music.

In this embodiment, the electronic device can obtain the target song's background music and adjust the energy relationship between the chorus dry sound and the corresponding background music, so that the energy relationship between the adjusted chorus dry sound and the adjusted background music satisfies the energy ratio condition.

In one embodiment, the chorus dry sound's energy may be too high, drowning out the background music's energy. By adjusting the chorus dry sound and the background music, their post-adjustment energy relationship can be made to satisfy the energy ratio condition, handling the case of an overly energetic chorus dry sound. Here, the energy ratio condition can be set so that the ratio of the chorus dry sound's energy value to the background music's energy value is below a ratio threshold, or so that the chorus dry sound's loudness is 3 dB below the background music's.
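A minimal sketch of the 3 dB headroom adjustment follows; RMS energy is used here as the loudness proxy, which is an assumption, since the text does not name a specific loudness measure.

```python
import numpy as np

def duck_chorus(chorus, bgm, headroom_db=3.0):
    """Scale the chorus so its RMS sits `headroom_db` below the BGM's RMS."""
    def rms(s):
        return np.sqrt(np.mean(s ** 2) + 1e-12)
    target = rms(bgm) * 10 ** (-headroom_db / 20)
    return chorus * (target / rms(chorus))
```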
In one embodiment, after the target song's background music is obtained, it can also be processed in the same way as described in detail in S202 above for generating a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional spatial sound image positions, so as to obtain a chorus dry sound and background music with similar effects and achieve a more harmonious and unified listening experience.
S409: Apply spectrum equalization to the chorus dry sound in a preset frequency band.

In this embodiment, the electronic device can apply spectrum equalization to the chorus dry sound in a preset frequency band.

In one embodiment, the electronic device can achieve spectrum equalization by adding a spectral notch in the preset band; for example, it can apply a notch of about 6 dB near 4 kHz. In this way, the chorus dry sound sounds more natural, preventing the high-frequency hiss that spectral imbalance would otherwise produce.
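The notch above can be approximated with an RBJ-cookbook peaking biquad cutting about 6 dB at 4 kHz, as in the following sketch; the Q value is an assumption, since the text specifies only the center frequency and depth.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_cut(x, f0=4000.0, gain_db=-6.0, q=1.0, sr=44100):
    """RBJ-cookbook peaking biquad: cuts `gain_db` around f0 (~-6 dB at 4 kHz)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / sr
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin]
    return lfilter(b, a, x)  # scipy normalizes by a[0] internally
```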
S410: Obtain the loudness of the background music.

In this embodiment, the electronic device can obtain the loudness of the background music.

S411: If the loudness is below a loudness threshold, raise the background music's loudness to the threshold.

In this embodiment, if the background music's loudness is below the loudness threshold, the electronic device can raise it to the threshold. For example, the threshold can be set to -14 dB; if the background music's loudness is below -14 dB, the electronic device can raise it to -14 dB.
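A minimal sketch of this floor adjustment follows. The -14 dB threshold is treated here as an RMS level in dBFS, which is an assumption; a production system might instead use an integrated loudness measure such as LUFS.

```python
import numpy as np

def raise_to_floor(bgm, floor_db=-14.0):
    """Measure RMS level in dBFS and gain the BGM up to the floor if needed."""
    rms = np.sqrt(np.mean(bgm ** 2) + 1e-12)
    level_db = 20 * np.log10(rms)
    if level_db < floor_db:
        bgm = bgm * 10 ** ((floor_db - level_db) / 20)
    return bgm
```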
S412: Obtain the accompaniment.

In this embodiment, the electronic device superimposes the chorus dry sound and the background music to obtain the final accompaniment. It should be noted that the accompaniment can be obtained by applying any one of steps S408 to S411 or a combination of several of them, and in one embodiment S408 to S411 can be performed selectively as needed; for example, the energy relationship between the chorus dry sound and the background music may not need adjustment, in which case S408 is not performed. Likewise, the spectrum equalization of the chorus dry sound in the preset band is optional, and steps S410 and S411 may also be omitted. FIG. 4 only illustrates the techniques used to make the accompaniment more harmonious and natural and of better quality; this application also does not limit the relative order of the energy-relationship adjustment in S408, the spectrum equalization in S409, and the loudness adjustment embodied in S410 and S411.

In one implementation, the final accompaniment, once obtained, can be stored in a database so that the electronic device can fetch the corresponding accompaniment directly when it receives a chorus request for the same song.

By implementing this embodiment, the virtual sound signals corresponding to the dry sound signals can be obtained by simulating N virtual three-dimensional spatial positions, and the delayed binaural signals corresponding to the dry sound signals can also be obtained by delay-processing each of them, enriching the chorus dry sound. In addition, adjusting the energy relationship between the chorus dry sound and the background music makes the final accompaniment sound more harmonious and natural, letting the user clearly feel the spatiality and immersion of a chorus.
Further, referring to FIG. 5, FIG. 5 is a schematic flowchart of obtaining virtual sound signals in an accompaniment generation method provided by an embodiment of this application. Obtaining the virtual sound signals includes: obtaining the virtual sound signal corresponding to the dry sound signal at each of the N virtual three-dimensional spatial sound image positions, and applying delay processing operations to each of p dry sound signals to obtain the delayed binaural signal corresponding to each of the p dry sound signals.

In this embodiment, after the dry sound signal set is obtained, the virtual sound signal corresponding to the dry sound signal at each of the N positions can be obtained; each of p dry sound signals in the set can also be delay-processed to obtain the delayed binaural signal corresponding to each of them.

As an example, as shown in FIG. 5, for dry sound signals X and W in the set, the corresponding virtual sound signals are obtained in the two ways above; X and W can be any dry sound signals in the set. Specifically, after obtaining dry sound signal X from the set, the electronic device can describe the position information of its virtual three-dimensional spatial sound image position by the azimuth and elevation, i.e., (θ, φ); from the position information (θ, φ) it can find the head-related transfer function H^(θ,φ) corresponding to that position; and after convolving X with H^(θ,φ), it obtains the virtual sound signal corresponding to the dry sound signal at that position, which is a binaural signal consisting of the left channel signal Y_L and the right channel signal Y_R. Virtual sound signals corresponding to dry sound signals obtained in this way strengthen the user's sense of three-dimensional immersion.

In addition, after obtaining dry sound signal W from the set, the electronic device can apply a delay processing operation to it. As an example, the electronic device can delay W through 16 delay lines with different time parameters, d_L(1), d_L(2), ..., d_L(8) and d_R(1), d_R(2), ..., d_R(8); the 8 dry sound signals obtained by delaying through d_L(1) to d_L(8) are superimposed to obtain W's delayed left channel signal W_L, and the 8 dry sound signals obtained by delaying through d_R(1) to d_R(8) are superimposed to obtain W's delayed right channel signal W_R. Delayed binaural signals obtained in this way can simulate binaural signals at the listener's left or right ear, enriching the listening experience.

In one embodiment, the final virtual sound signal set covers both of the above cases, i.e., the final set is Z = {Z_L, Z_R}, with Z_L = Y_L + W_L and Z_R = Y_R + W_R. It should be noted that the step of obtaining the virtual sound signals and the steps of obtaining the delayed left channel signal and delayed right channel signal can be performed simultaneously or one after the other, which this application does not limit. Obtaining the virtual sound signals corresponding to the dry sound signals in the set in these two different ways presents the chorus scene in all directions and makes the audio effect richer.
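A minimal sketch of the final per-channel combination Z_L = Y_L + W_L, Z_R = Y_R + W_R follows; the length handling is an assumption, since the two branches can produce signals of different lengths.

```python
import numpy as np

def combine(a, b):
    """Sum two signals of possibly different lengths (zero-padding the shorter)."""
    out = np.zeros(max(len(a), len(b)))
    out[:len(a)] += a
    out[:len(b)] += b
    return out

# z_left = combine(y_left, w_left)
# z_right = combine(y_right, w_right)
```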
Further, referring to FIG. 6, FIG. 6 shows an accompaniment playback processing method provided by an embodiment of this application. The method can be applied to an electronic device, for example a smart device such as a smartphone, tablet, smart wearable device, or personal computer, or a server. The method may include, but is not limited to, the following steps:

S601: Display a user interface.

In this embodiment, the electronic device can display a user interface for receiving the user's selection instruction for a target song.

In one embodiment, the selection instruction can include a selection of the target song's accompaniment mode, which can be, without limitation, a chorus accompaniment mode, an original-sound accompaniment mode, or an artificial intelligence (AI) accompaniment mode.

In one embodiment, the selection instruction can be generated when the user triggers a selection control displayed on the user interface, or generated by the user controlling the electronic device by voice; for example, the user's voice command can be "please play in chorus accompaniment mode", whereupon the electronic device can generate a selection instruction indicating that the accompaniment mode for the target song is the chorus accompaniment mode.

S602: If the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode, obtain the accompaniment corresponding to the target song.

In this embodiment, if the selection instruction received on the user interface indicates the chorus accompaniment mode for the target song, the electronic device can obtain the target song's accompaniment in chorus accompaniment mode.

In one embodiment, the user interface can display selection controls for the target song's accompaniment mode, which can include a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control. Before obtaining the target song's accompaniment, the electronic device can detect whether a selection operation on the chorus accompaniment mode selection control has been received; if so, it confirms that the selection instruction received on the user interface indicates the chorus accompaniment mode.

In one embodiment, the accompaniment in chorus accompaniment mode is generated from the chorus dry sound and the background music. The chorus dry sound can be generated from a virtual sound signal set that includes the virtual sound signals, generated from the obtained dry sound signal set, at each of N virtual three-dimensional spatial sound image positions; the multiple dry sound signals in the set can correspond to multiple different positions, each position can correspond to one or more dry sound signals, and the dry sound signal set is obtained from the dry sound signals recorded by multiple users for the target song. It should be noted that the users' dry sound signals for the target song are recorded only with their authorization and consent. The specific generation of the accompaniment in chorus accompaniment mode is described in the embodiments of FIGS. 2-5 above and is not repeated here.

S603: Play the accompaniment corresponding to the target song.

In this embodiment, after obtaining the target song's accompaniment in chorus accompaniment mode, the electronic device can play the accompaniment to the user.

In one implementation, the target song's accompaniment can be used in karaoke scenarios: the user can sing while the accompaniment plays, and, with the user's authorization and consent, the electronic device can mix the captured singing voice with the target song's accompaniment before playback, giving the user the unique experience of being at a live concert.
In one embodiment, as shown in FIG. 7a, the electronic device's obtaining of the target song's accompaniment may include, but is not limited to, the following steps:

S701: Send an accompaniment request to a server.

In this embodiment, the electronic device can send the server an accompaniment request, which can include the identification information of the target song.

In one embodiment, the identification information uniquely identifies the target song; for example, it can be the song title of the target song.

S702: Receive the chorus dry sound and background music returned by the server in response to the accompaniment request.

In this embodiment, the electronic device can receive the chorus dry sound and background music returned by the server in response to the accompaniment request for the target song.

In one embodiment, the server can return the chorus dry sound and the background music separately, or merge them before returning; the specific return method can be chosen according to the user's settings.

S703: Determine a target chorus dry sound segment from the chorus dry sound.

In this embodiment, the electronic device can determine the target chorus dry sound segment from the returned chorus dry sound.

In one embodiment, the electronic device can display a first single-sentence interface which, as shown in FIG. 7b, displays each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes. The user can select the target chorus dry sound segment from the single sentences displayed on the first single-sentence interface.

In one embodiment, the target chorus dry sound segment can consist of some of the single sentences in the chorus dry sound or of all of them, as determined by the user's selection operation.

S704: Obtain the accompaniment corresponding to the target song from the chorus dry sound corresponding to the target chorus dry sound segment and the background music.

In this embodiment, the electronic device can obtain the target song's accompaniment from the background music and the chorus dry sound corresponding to the target chorus dry sound segment selected by the user.

In one embodiment, the electronic device can display a second single-sentence interface which, as shown in FIG. 7c, can be displayed while the target song's accompaniment is playing and displays each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes.

In one embodiment, the electronic device can also detect whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback; if it receives the user's mute selection operation on the chorus dry sound, it can cancel playback of the chorus dry sound at the current time play node and keep only the playback of the background music in the accompaniment.

By implementing this embodiment, on the one hand, the user's selection instruction for the target song can be received, and when the instruction indicates the chorus accompaniment mode for the target song, the corresponding accompaniment can be obtained and played; on the other hand, the target song's accompaniment in chorus accompaniment mode is generated from the chorus dry sound and the background music, a target chorus segment can be determined from the chorus dry sound, and the target song's accompaniment can be generated from the chorus dry sound corresponding to the target segment and the background music. In this way, the user can feel like being at a live concert when the target song's accompaniment is played and enjoy an immersive listening experience; in addition, the user can flexibly select the chorus dry sound within the accompaniment, which makes the accompaniment more engaging and improves the user experience.
Further, referring to FIG. 8a, FIG. 8a is a schematic structural diagram of an accompaniment generation apparatus provided by an embodiment of this application. The apparatus can be applied to an electronic device such as a smartphone, tablet, smart wearable device, personal computer, or server. In one embodiment, as shown in FIG. 8a, the accompaniment generation apparatus 80 can include:

an obtaining unit 801, configured to obtain a dry sound signal set that includes x dry sound signals corresponding to a target song, where x is an integer greater than 1, and to generate a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals; and

a processing unit 802, configured to merge the virtual sound signals in a virtual sound signal set to obtain a chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N positions, and to synthesize the chorus dry sound with the target song's background music according to sound effect optimization rules to obtain the target song's accompaniment.

In one embodiment, the obtaining unit 801 can also be used to obtain an initial dry sound signal set from an audio database containing the initial dry sound signals recorded when multiple users sang the same song; the processing unit 802 can also be used to filter dry sound signals out of the initial set according to the sound parameters of each initial dry sound signal, the filtered signals forming the dry sound signal set.

In one embodiment, the dry sound signal set includes the dry sound signals filtered out of the initial set according to pitch-accuracy feature parameters and sound-quality feature parameters; the pitch-accuracy feature parameters include any one or more of pitch, rhythm, and prosody parameters, and the sound-quality feature parameters include any one or more of noise, energy, and speed parameters.

In one embodiment, the N virtual three-dimensional spatial sound image positions include: n1 positions on the horizontal plane, obtained by dividing the horizontal plane at intervals of a first preset angle; n2 positions on the upper plane, obtained by dividing the upper plane at intervals of a second preset angle, the angle between the upper plane and the horizontal plane being a first angle threshold; and n3 positions on the lower plane, obtained by dividing the lower plane at intervals of a third preset angle, the angle between the lower plane and the horizontal plane being a second angle threshold; n1, n2, and n3 are positive integers whose sum equals N.

In one embodiment, the obtaining unit 801 can also be used to obtain the head-related transfer function corresponding to each of the N positions; the processing unit 802 can also be used to process the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the HRTF corresponding to that position, to obtain the virtual sound signal at the target position; the virtual sound signal at the target position is a binaural signal, and the target position is any one of the N positions.

In one embodiment, the virtual sound signal set further includes the delayed left channel signal and delayed right channel signal corresponding to each of p dry sound signals; the obtaining unit 801 can also be used to obtain p dry sound signals from the x dry sound signals in the set, where p is a positive integer less than or equal to x; the processing unit 802 can also be used to apply a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals per signal and superimpose each signal's m1 delayed copies to obtain its delayed left channel signal, m1 being a positive integer, and to apply a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals per signal and superimpose each signal's m2 delayed copies to obtain its delayed right channel signal, m2 being a positive integer.

In one embodiment, the obtaining unit 801 can also be used to obtain the target song's background music, and the processing unit 802 can also be used to adjust the energy relationship between the chorus dry sound and the background music so that the adjusted relationship satisfies the energy ratio condition; the accompaniment is obtained from the adjusted chorus dry sound and background music.

In one embodiment, the processing unit 802 can also be used to apply spectrum equalization to the chorus dry sound in a preset frequency band; the obtaining unit 801 can also be used to obtain the background music's loudness; the processing unit 802 can also be used to raise the background music's loudness to the loudness threshold if it is below that threshold; the accompaniment is obtained from the spectrum-equalized chorus dry sound and the loudness-processed background music.

It should be noted that, for content not mentioned in the embodiment corresponding to FIG. 8a and the specific implementation of each step, refer to the embodiments shown in FIGS. 2-5 and the foregoing description, which are not repeated here.
Further, referring to FIG. 8b, FIG. 8b is a schematic structural diagram of an accompaniment playback processing apparatus provided by an embodiment of this application. The apparatus can be applied to an electronic device such as a smartphone, tablet, smart wearable device, personal computer, or server. In one embodiment, as shown in FIG. 8b, the accompaniment playback processing apparatus 81 can include:

an obtaining unit 811, configured to display a user interface for receiving a selection instruction for a target song, and, if the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode, to obtain the accompaniment corresponding to the target song; and

a processing unit 812, configured to play the accompaniment corresponding to the target song, where the accompaniment is generated from a chorus dry sound and background music, the chorus dry sound is generated from multiple dry sound signals in a dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from the dry sound signals recorded by multiple users for the target song.

In one embodiment, the chorus dry sound is generated from a virtual sound signal set that includes the virtual sound signals, generated from the obtained dry sound signal set, at each of N virtual three-dimensional spatial sound image positions; the multiple dry sound signals in the set correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more dry sound signals.

In one embodiment, the user interface displays selection controls for the target song's accompaniment mode, including a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control; before the target song's accompaniment is obtained, the processing unit 812 can also be used to detect whether a selection operation on the chorus accompaniment mode control has been received and, if so, to confirm that the selection instruction received on the user interface indicates the chorus accompaniment mode.

In one embodiment, the processing unit 812 can also be used to send the server an accompaniment request that includes the target song's identification information; the obtaining unit 811 can also be used to receive the chorus dry sound and background music returned by the server in response to the accompaniment request; the processing unit 812 can also be used to determine a target chorus dry sound segment from the chorus dry sound and to obtain the target song's accompaniment from the chorus dry sound corresponding to the target segment and the background music.

In one embodiment, the processing unit 812 can also be used to display a first single-sentence interface that shows each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes; the target chorus dry sound segment is determined based on a single-sentence selection operation on the first single-sentence interface.

In one embodiment, the processing unit 812 can also be used to display a second single-sentence interface that shows each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes, to detect whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback, and, if so, to cancel playback of the chorus dry sound at the current time play node.

It should be noted that, for content not mentioned in the embodiment corresponding to FIG. 8b and the specific implementation of each step, refer to the embodiments shown in FIGS. 2-7c and the foregoing description, which are not repeated here.
Further, referring to FIG. 9, FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of this application. The electronic device can include a network interface 901, a memory 902, and a processor 903, connected by one or more communication buses used for connection and communication between these components. The network interface 901 can include standard wired interfaces and wireless interfaces (such as a Wi-Fi interface). The memory 902 can include volatile memory, such as random-access memory (RAM); it can also include non-volatile memory, such as flash memory or a solid-state drive (SSD); it can also include combinations of the above kinds of memory. The processor 903 can be a central processing unit (CPU); it can further include a hardware chip, which can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like; the PLD can be a field-programmable gate array (FPGA), generic array logic (GAL), or the like.

Optionally, the memory 902 also stores program instructions, and the processor 903 can call the program instructions to implement:

obtaining a dry sound signal set that includes x dry sound signals corresponding to a target song, where x is an integer greater than 1;

generating a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, where the x dry sound signals correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals;

merging the virtual sound signals in a virtual sound signal set to obtain a chorus dry sound, the virtual sound signal set including the virtual sound signal at each of the N positions; and

synthesizing the chorus dry sound with the target song's background music according to sound effect optimization rules to obtain the target song's accompaniment.

In one embodiment, the processor 903 can also call the program instructions to implement: obtaining an initial dry sound signal set from an audio database containing the initial dry sound signals recorded when multiple users sang the same song; and filtering dry sound signals out of the initial set according to the sound parameters of each initial dry sound signal, the filtered signals forming the dry sound signal set.

In one embodiment, the dry sound signal set includes the dry sound signals filtered out of the initial set according to pitch-accuracy feature parameters and sound-quality feature parameters; the pitch-accuracy feature parameters include any one or more of pitch, rhythm, and prosody parameters, and the sound-quality feature parameters include any one or more of noise, energy, and speed parameters.

In one embodiment, the N virtual three-dimensional spatial sound image positions include: n1 positions on the horizontal plane, obtained by dividing the horizontal plane at intervals of a first preset angle; n2 positions on the upper plane, obtained by dividing the upper plane at intervals of a second preset angle, the angle between the upper plane and the horizontal plane being a first angle threshold; and n3 positions on the lower plane, obtained by dividing the lower plane at intervals of a third preset angle, the angle between the lower plane and the horizontal plane being a second angle threshold; n1, n2, and n3 are positive integers whose sum equals N.

In one embodiment, the processor 903 can also call the program instructions to implement: obtaining the head-related transfer function corresponding to each of the N positions; and processing the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the HRTF corresponding to that position, to obtain the virtual sound signal at the target position; the virtual sound signal at the target position is a binaural signal, and the target position is any one of the N positions.

In one embodiment, the virtual sound signal set further includes the delayed left channel signal and delayed right channel signal corresponding to each of p dry sound signals; the processor 903 can also call the program instructions to implement: obtaining p dry sound signals from the x dry sound signals in the set, where p is a positive integer less than or equal to x; applying a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals per signal and superimposing each signal's m1 delayed copies to obtain its delayed left channel signal, m1 being a positive integer; and applying a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals per signal and superimposing each signal's m2 delayed copies to obtain its delayed right channel signal, m2 being a positive integer.

In one embodiment, the processor 903 can also call the program instructions to implement: obtaining the target song's background music and adjusting the energy relationship between the chorus dry sound and the background music so that the adjusted relationship satisfies the energy ratio condition; the accompaniment is obtained from the adjusted chorus dry sound and background music.

In one embodiment, the processor 903 can also call the program instructions to implement: applying spectrum equalization to the chorus dry sound in a preset frequency band; obtaining the background music's loudness; and raising the background music's loudness to the loudness threshold if it is below that threshold; the accompaniment is obtained from the spectrum-equalized chorus dry sound and the loudness-processed background music.
Optionally, the memory 902 also stores program instructions, and the processor 903 can call the program instructions to implement:

displaying a user interface for receiving a selection instruction for a target song;

if the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode, obtaining the accompaniment corresponding to the target song; and

playing the accompaniment corresponding to the target song, where the accompaniment is generated from a chorus dry sound and background music, the chorus dry sound is generated from multiple dry sound signals in a dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from the dry sound signals recorded by multiple users for the target song.

In one embodiment, the chorus dry sound is generated from a virtual sound signal set that includes the virtual sound signals, generated from the obtained dry sound signal set, at each of N virtual three-dimensional spatial sound image positions; the multiple dry sound signals in the set correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more dry sound signals.

In one embodiment, the user interface displays selection controls for the target song's accompaniment mode, including a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control; before the target song's accompaniment is obtained, the processor 903 can also call the program instructions to implement: detecting whether a selection operation on the chorus accompaniment mode control has been received and, if so, confirming that the selection instruction received on the user interface indicates the chorus accompaniment mode.

In one embodiment, the processor 903 can also call the program instructions to implement: sending the server an accompaniment request that includes the target song's identification information; receiving the chorus dry sound and background music returned by the server in response to the accompaniment request; determining a target chorus dry sound segment from the chorus dry sound; and obtaining the target song's accompaniment from the chorus dry sound corresponding to the target segment and the background music.

In one embodiment, the processor 903 can also call the program instructions to implement: displaying a first single-sentence interface that shows each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes, the target chorus dry sound segment being determined based on a single-sentence selection operation on the first single-sentence interface.

In one embodiment, the processor 903 can also call the program instructions to implement: displaying a second single-sentence interface that shows each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes; detecting whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback; and, if so, cancelling playback of the chorus dry sound at the current time play node.

It should be understood that the principles and beneficial effects of the electronic device 90 described in this embodiment in solving the problem are similar to those of the embodiments shown in FIGS. 2-7c and the foregoing description, and are not repeated here for brevity.
In addition, this application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the methods provided by the foregoing embodiments are implemented.

An embodiment of this application also provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the storage medium and executes them, causing the computer device to perform the methods provided by the foregoing embodiments.

The steps in the methods of the embodiments of this application can be reordered, combined, and deleted according to actual needs.

The units in the apparatuses of the embodiments of this application can be combined, divided, and deleted according to actual needs.

A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

What is disclosed above is only part of the embodiments of this application and certainly cannot limit the scope of its claims; a person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope covered by this application.

Claims (16)

  1. An accompaniment generation method, characterized in that the method comprises:
    obtaining a dry sound signal set, the dry sound signal set comprising x dry sound signals corresponding to a target song, where x is an integer greater than 1;
    generating a virtual sound signal based on the dry sound signal corresponding to each of N virtual three-dimensional spatial sound image positions, wherein the x dry sound signals correspond to the N virtual three-dimensional spatial sound image positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more of the x dry sound signals;
    merging the virtual sound signals in a virtual sound signal set to obtain a chorus dry sound, the virtual sound signal set comprising the virtual sound signal at each of the N virtual three-dimensional spatial sound image positions; and
    synthesizing the chorus dry sound with background music of the target song according to sound effect optimization rules to obtain an accompaniment of the target song.
  2. The method of claim 1, characterized in that obtaining the dry sound signal set comprises:
    obtaining an initial dry sound signal set from an audio database, the audio database comprising initial dry sound signals recorded when multiple users sang the target song; and
    filtering x dry sound signals out of the initial dry sound signal set according to the sound parameters of each initial dry sound signal to form the dry sound signal set.
  3. The method of claim 2, characterized in that the sound parameters comprise pitch-accuracy feature parameters and sound-quality feature parameters;
    wherein the pitch-accuracy feature parameters comprise any one or more of pitch parameters, rhythm parameters, and prosody parameters, and the sound-quality feature parameters comprise any one or more of noise parameters, energy parameters, and speed parameters.
  4. The method of claim 1, characterized in that the N virtual three-dimensional spatial sound image positions comprise:
    n1 positions on a horizontal plane, obtained by dividing the horizontal plane at intervals of a first preset angle;
    n2 positions on an upper plane, obtained by dividing the upper plane at intervals of a second preset angle, the angle between the upper plane and the horizontal plane being a first angle threshold; and
    n3 positions on a lower plane, obtained by dividing the lower plane at intervals of a third preset angle, the angle between the lower plane and the horizontal plane being a second angle threshold;
    wherein n1, n2, and n3 are positive integers whose sum equals N.
  5. The method of any one of claims 1-4, characterized in that generating a virtual sound signal based on the dry sound signal corresponding to each of the N virtual three-dimensional spatial sound image positions comprises:
    obtaining the head-related transfer function corresponding to each of the N virtual three-dimensional spatial sound image positions; and
    processing the dry sound signal corresponding to a target virtual three-dimensional spatial sound image position with the head-related transfer function corresponding to that position, to obtain the virtual sound signal at the target position;
    wherein the virtual sound signal at the target position is a binaural signal;
    and the target position is any one of the N virtual three-dimensional spatial sound image positions.
  6. The method of any one of claims 1-4, characterized in that the virtual sound signal set further comprises a delayed left channel signal and a delayed right channel signal corresponding to each of p dry sound signals;
    before merging the virtual sound signals in the virtual sound signal set to obtain the chorus dry sound, the method further comprises:
    obtaining p dry sound signals from the x dry sound signals comprised in the dry sound signal set, where p is a positive integer less than or equal to x;
    applying a delay processing operation with m1 time parameters to each of the p dry sound signals to obtain m1 delayed dry sound signals corresponding to each of them, and superimposing the m1 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed left channel signal corresponding to each of the p dry sound signals, where m1 is a positive integer; and
    applying a delay processing operation with m2 time parameters to each of the p dry sound signals to obtain m2 delayed dry sound signals corresponding to each of them, and superimposing the m2 delayed dry sound signals corresponding to each dry sound signal to obtain the delayed right channel signal corresponding to each of the p dry sound signals, where m2 is a positive integer.
  7. The method of any one of claims 1-4, characterized in that synthesizing the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song comprises:
    obtaining the background music of the target song and adjusting the energy relationship between the chorus dry sound and the background music, so that the energy relationship between the adjusted chorus dry sound and the adjusted background music satisfies an energy ratio condition;
    wherein the accompaniment is obtained from the adjusted chorus dry sound and background music.
  8. The method of any one of claims 1-4, characterized in that synthesizing the chorus dry sound with the background music of the target song according to sound effect optimization rules to obtain the accompaniment of the target song comprises:
    applying spectrum equalization to the chorus dry sound in a preset frequency band;
    obtaining the loudness of the background music; and
    if the loudness of the background music is below a loudness threshold, raising the loudness of the background music to the loudness threshold;
    wherein the accompaniment is obtained from the spectrum-equalized chorus dry sound and the loudness-processed background music.
  9. An accompaniment playback processing method, characterized by comprising:
    displaying a user interface for receiving a selection instruction for a target song;
    if the selection instruction received on the user interface indicates that the accompaniment mode for the target song is a chorus accompaniment mode, obtaining the accompaniment corresponding to the target song; and
    playing the accompaniment corresponding to the target song;
    wherein the accompaniment is generated from a chorus dry sound and background music, the chorus dry sound is generated from multiple dry sound signals in a dry sound signal set, the multiple dry sound signals correspond to multiple different virtual three-dimensional spatial sound image positions, and the dry sound signal set is obtained from dry sound signals recorded by multiple users for the target song.
  10. The method of claim 9, characterized in that the chorus dry sound is generated from a virtual sound signal set comprising the virtual sound signals, generated from the obtained dry sound signal set, at each of N virtual three-dimensional spatial sound image positions;
    wherein the multiple dry sound signals in the dry sound signal set correspond to the N positions, N is an integer greater than 1, the N positions are all different, and each position is allowed to correspond to one or more dry sound signals.
  11. The method of claim 9 or 10, characterized in that the user interface displays selection controls for the accompaniment mode of the target song, the selection controls comprising a chorus accompaniment mode selection control and an original-sound accompaniment mode selection control; before obtaining the accompaniment corresponding to the target song, the method further comprises:
    detecting whether a selection operation on the chorus accompaniment mode selection control has been received; and
    if so, confirming that the selection instruction received on the user interface indicates that the accompaniment mode for the target song is the chorus accompaniment mode.
  12. The method of claim 9, characterized in that obtaining the accompaniment corresponding to the target song comprises:
    sending an accompaniment request to a server, the accompaniment request comprising identification information of the target song;
    receiving the chorus dry sound and the background music returned by the server in response to the accompaniment request;
    determining a target chorus dry sound segment from the chorus dry sound; and
    obtaining the accompaniment corresponding to the target song from the chorus dry sound corresponding to the target chorus dry sound segment and the background music.
  13. The method of claim 12, characterized in that before determining the target chorus dry sound segment from the chorus dry sound, the method further comprises:
    displaying a first single-sentence interface that shows each single sentence of the text data corresponding to the chorus dry sound in the order of the chorus dry sound's time play nodes;
    wherein the target chorus dry sound segment is determined based on a single-sentence selection operation on the first single-sentence interface.
  14. The method of claim 9 or 12, characterized in that after playing the accompaniment corresponding to the target song, the method comprises:
    displaying a second single-sentence interface that shows each single sentence of the text data corresponding to the accompaniment in the order of the accompaniment's time play nodes;
    detecting whether a mute selection operation on the chorus dry sound in the accompaniment is received during playback; and
    if so, cancelling playback of the chorus dry sound at the current time play node.
  15. An electronic device, characterized by comprising a memory, a processor, and a network interface, the processor being connected to the memory and the network interface, wherein the network interface is configured to provide network communication functions, the memory is configured to store program code, and the processor is configured to call the program code to perform the method of any one of claims 1-14.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-14.
PCT/CN2022/124590 2021-12-14 2022-10-11 Accompaniment generation method, device, and storage medium WO2023109278A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111527995.3A CN114242025A (zh) 2021-12-14 2021-12-14 Accompaniment generation method, device, and storage medium
CN202111527995.3 2021-12-14

Publications (1)

Publication Number Publication Date
WO2023109278A1 true WO2023109278A1 (zh) 2023-06-22

Family

ID=80756085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124590 WO2023109278A1 (zh) 2021-12-14 2022-10-11 一种伴奏的生成方法、设备及存储介质

Country Status (2)

Country Link
CN (1) CN114242025A (zh)
WO (1) WO2023109278A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170613A (zh) * 2022-09-08 2023-05-26 腾讯音乐娱乐科技(深圳)有限公司 Audio stream processing method, computer device, and computer program product

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114242025A (zh) 2021-12-14 2022-03-25 腾讯音乐娱乐科技(深圳)有限公司 Accompaniment generation method, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1176446A (zh) * 1996-09-03 1998-03-18 Chorus effect device with natural fluctuation derived from the singing voice
CN1287346A (zh) * 1999-09-03 2001-03-14 Singing accompaniment system
CN111554267A (zh) * 2020-04-23 2020-08-18 Audio synthesis method and apparatus, electronic device, and computer-readable medium
CN113077771A (zh) * 2021-06-04 2021-07-06 Asynchronous chorus mixing method and apparatus, storage medium, and electronic device
CN113192486A (zh) * 2021-04-27 2021-07-30 Chorus audio processing method, device, and storage medium
CN114242025A (zh) * 2021-12-14 2022-03-25 Accompaniment generation method, device, and storage medium


Also Published As

Publication number Publication date
CN114242025A (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2023109278A1 (zh) Accompaniment generation method, device, and storage medium
CN105874820B (zh) Generating binaural audio in response to multi-channel audio using at least one feedback delay network
CN105900457B (zh) Method and system for designing and applying numerically optimized binaural room impulse responses
US8520871B2 (en) Method of and device for generating and processing parameters representing HRTFs
JP4938015B2 (ja) Method and apparatus for generating three-dimensional audio
US10924875B2 (en) Augmented reality platform for navigable, immersive audio experience
CN107258091A (zh) Reverberation generation for headphone virtualization
WO2022228220A1 (zh) Chorus audio processing method, device, and storage medium
CN104768121A (zh) Generating binaural audio in response to multi-channel audio using at least one feedback delay network
US20050069143A1 (en) Filtering for spatial audio rendering
US11611840B2 (en) Three-dimensional audio systems
CN116437268B (zh) Adaptive frequency-division surround sound upmixing method, apparatus, device, and storage medium
Yeoward et al. Real-time binaural room modelling for augmented reality applications
JP2001186599A (ja) Sound field creation apparatus
CA3044260A1 (en) Augmented reality platform for navigable, immersive audio experience
US11102606B1 (en) Video component in 3D audio
JP2020518159A (ja) Stereo expansion with psychoacoustic grouping phenomenon
JP2004509544A (ja) Audio signal processing method for loudspeakers placed close to the ears
CN114598985B (zh) Audio processing method and apparatus
EP4254983A1 (en) Live data delivering method, live data delivering system, live data delivering device, live data reproducing device, and live data reproducing method
EP4254982A1 (en) Live data delivery method, live data delivery system, live data delivery device, live data reproduction device, and live data reproduction method
Pfanzagl-Cardone Comparative 3D Audio Microphone Array Tests
İçuz A subjective listening test on the preference of two different stereo microphone arrays on headphones and speakers listening setups
CN115206283A (zh) Audio processing method, apparatus, and computer device
CN115696170A (zh) Sound effect processing method, sound effect processing apparatus, terminal, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906030

Country of ref document: EP

Kind code of ref document: A1