CN112037739B - Data processing method and device and electronic equipment - Google Patents

Data processing method and device and electronic equipment

Info

Publication number
CN112037739B
CN112037739B (application number CN202010907240.5A)
Authority
CN
China
Prior art keywords
audio
pool
information
alternative
entering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010907240.5A
Other languages
Chinese (zh)
Other versions
CN112037739A (en)
Inventor
徐东
鲁霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202010907240.5A priority Critical patent/CN112037739B/en
Publication of CN112037739A publication Critical patent/CN112037739A/en
Application granted granted Critical
Publication of CN112037739B publication Critical patent/CN112037739B/en


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G10H1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for extraction of timing, tempo; Beat detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101: Music composition or musical creation; Tools or processes therefor
    • G10H2210/125: Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix

Abstract

The embodiment of the invention provides a data processing method, a data processing device, an electronic device, and a storage medium. The data processing method comprises the following steps: selecting a first audio and a second audio from a serial burning (medley) candidate pool; acquiring the music retrieval information of the first audio and of the second audio, where the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio and smoothing the second audio according to the time information of the second audio; and overlapping the smoothed first audio and the smoothed second audio to form the target audio. By selecting audios with similar beat information for overlapping and smoothing the audios to be overlapped, the embodiment of the invention can efficiently generate high-quality serial burning audio.

Description

Data processing method and device and electronic equipment
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a data processing method, a data processing device, and an electronic device.
Background
In today's era of continuously developing internet technology, music has become an indispensable part of daily life. As the pace of modern life quickens, the demand for diversified music is growing: beyond the experience of listening to a single audio from beginning to end, there is growing demand for audio composed of audio segments of different styles and types, i.e., serial burning (medley) audio.
At present, serial burning audio is mainly produced manually. Manually produced serial burning audio may splice together audios with large differences in style, which makes the result sound abrupt; at the same time, the transitions between audios may not be smooth enough, so the user experience of such serial burning audio is poor. Therefore, how to produce high-quality serial burning audio is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, an electronic device, and a storage medium, which can generate high-quality serial burning audio and enhance user experience by selecting audios with similar beat information for overlapping and separately smoothing each audio to be overlapped.
In a first aspect, an embodiment of the present invention provides a data processing method, where the data processing method includes:
selecting a first audio and a second audio from the serial burning candidate pool;
acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information;
if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio;
and overlapping the smoothed first audio and the smoothed second audio to form target audio.
In a second aspect, an embodiment of the present invention proposes a data processing apparatus, including:
the selection unit is used for selecting the first audio and the second audio from the serial burning candidate pool;
an acquisition unit configured to acquire music retrieval information of the first audio and music retrieval information of the second audio, the music retrieval information including beat information and time information;
the processing unit is used for smoothing the first audio according to the time information of the first audio and smoothing the second audio according to the time information of the second audio if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold;
and the processing unit is also used for carrying out overlapping processing on the first audio after the smoothing processing and the second audio after the smoothing processing to form target audio.
In a third aspect, an embodiment of the present invention provides an electronic device, where the device includes a processor, an input device, an output device, and a memory that are connected to one another; the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions for an electronic device, the instructions including instructions for performing the method of the first aspect.
In the embodiment of the invention, a first audio and a second audio with similar beat information are selected from the serial burning candidate pool, and smoothing and overlapping are applied to them, so that the two audios can be combined into a target audio of higher quality. Since the first audio and the second audio have similar beats, the target audio formed by overlapping them transitions smoothly in tempo. In addition, because the first audio and the second audio are each smoothed before overlapping, the serial burning audio turns smoothly between pieces and presents a better playing effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
FIG. 3a is a schematic diagram of a smoothing process for a first audio and a second audio according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of another smoothing process for the first audio and the second audio provided by an embodiment of the present invention;
FIG. 4 is an audio schematic of a target audio provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of smoothing a first audio, a second audio, and a third audio according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the descriptions of "first," "second," and the like in the embodiments of the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defining "first", "second" may include at least one such feature, either explicitly or implicitly.
In the prior art, serial burning audio is mainly produced manually, typically by splicing together two or more audio segments to be combined. Because the audios are chosen at random, the style and musical characteristics may differ greatly between them, and in particular the beat information may differ greatly; this makes the produced serial burning audio sound abrupt, and the joins between audios are neither natural nor smooth enough, resulting in a poor user experience of the serial burning audio.
Based on the above analysis, the embodiment of the invention provides a data processing method, a data processing device, an electronic device, and a storage medium. The data processing method specifically comprises the following steps: selecting a first audio and a second audio from the serial burning candidate pool; acquiring the music retrieval information of the first audio and of the second audio, where the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio and smoothing the second audio according to the time information of the second audio; and overlapping the smoothed first audio and the smoothed second audio to form the target audio. By implementing this data processing method, high-quality audios are selected into the serial burning candidate pool, audios with similar beat information are selected from the pool for overlapping, and the audios to be overlapped are smoothed, so that high-quality serial burning audio can be generated.
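The flow described above can be sketched in a few lines of Python. This is a minimal illustration only, not the patent's implementation: the ratio-based BPM similarity metric, the linear crossfade, and the dict schema (`bpm`, `samples`) are assumptions introduced for the example.

```python
def beat_similarity(bpm_a: float, bpm_b: float) -> float:
    """Similarity of two BPM values as a ratio in (0, 1] (an assumed metric)."""
    return min(bpm_a, bpm_b) / max(bpm_a, bpm_b)

def crossfade(a, b, overlap):
    """Fade out the tail of a, fade in the head of b, and overlap-add the joint."""
    out = list(a[:len(a) - overlap])
    for i in range(overlap):
        g = (i + 1) / (overlap + 1)  # linear fade gain
        out.append(a[len(a) - overlap + i] * (1 - g) + b[i] * g)
    out.extend(b[overlap:])
    return out

def make_medley(first, second, threshold=0.8, overlap=2):
    """first/second: dicts with 'bpm' and 'samples' (hypothetical schema)."""
    if beat_similarity(first["bpm"], second["bpm"]) <= threshold:
        return None  # beats too dissimilar for a smooth medley
    return crossfade(first["samples"], second["samples"], overlap)
```

Here smoothing is reduced to a linear fade across the overlapped region; the patent smooths each audio separately according to its own time information (e.g., around the chorus boundaries).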
The electronic device in the embodiments of the present application may be a device that provides voice and/or data connectivity to a user, for example, a handheld device with a wireless connection function or an in-vehicle device. The electronic device may also be another processing device connected to a wireless modem, and may communicate with a radio access network (RAN). The electronic device may also be referred to as a wireless terminal, subscriber unit, subscriber station, mobile station, remote station, access point, remote terminal, access terminal, user terminal, user agent, user device, or user equipment (UE). It may be a mobile terminal such as a mobile telephone (or "cellular" telephone) or a computer with a mobile terminal, for example a portable, pocket, hand-held, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. The electronic device may also be a personal communication service (PCS) phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), or the like. Common electronic devices include, for example: mobile phones, tablet computers, notebook computers, palm computers, mobile internet devices (MID), vehicles, roadside devices, aircraft, and wearable devices such as smart watches, smart bracelets, and pedometers, but the embodiments of the application are not limited thereto.
In order to better understand the data processing method provided by the embodiment of the present invention, a system architecture diagram to which the embodiment of the present invention is applied is first described below. Referring to FIG. 1, FIG. 1 is a schematic diagram illustrating a data processing system according to an embodiment of the present invention. As shown in fig. 1, the system architecture diagram at least includes: an electronic device 110, a server 120. The electronic device 110 and the server 120 may communicate data over a network, including but not limited to a local area network, a wide area network, a wireless communication network, and the like.
In one possible implementation, server 120 stores an audio information library in which a large number of audios (candidate pool-entering audios) are recorded, together with the audio information corresponding to each audio, which may include but is not limited to: audio name, audio format, number of channels, lyric information, audio popularity index, etc. Here an audio may be a song, a song segment, a piece of music, or a music segment. The audio name is the name under which the audio was released; the audio format indicates how the audio is stored, such as the lossless format FLAC or the lossy format MP3; the number of channels indicates how many channels the audio is stored with, such as mono or stereo; lyric information is the lyric content of the audio; the audio popularity index measures the popularity of the audio, generally using the play count as the indicator, though it can also be measured by indicators such as search volume or user purchase volume.
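As an illustration of what one entry of such an audio information library might hold, the sketch below uses Python's dataclasses; all field names are hypothetical and do not reflect the library's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AudioInfo:
    """One entry of the audio information library (illustrative fields only)."""
    name: str                 # audio name as released
    fmt: str                  # storage format, e.g. "FLAC" or "MP3"
    channels: int             # 1 = mono, 2 = stereo, ...
    lyrics: str = ""          # lyric information
    popularity: float = 0.0   # popularity index, e.g. normalized play count

# Example record for one audio
song = AudioInfo(name="Example Song", fmt="FLAC", channels=2, popularity=0.9)
```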
In one possible implementation, the electronic device 110 downloads the first audio and the second audio from the audio information library in the server 120 through the network; the electronic device 110 performs feature extraction on the first audio through a computer algorithm to obtain the music retrieval information of the first audio, and likewise on the second audio to obtain its music retrieval information, where the music retrieval information may include beat information and time information. If the beat information of the first audio is similar to that of the second audio, the electronic device 110 smooths the first audio and the second audio, and overlaps the smoothed first audio and the smoothed second audio to form a serial burning audio.
In one possible implementation, the electronic device 110 uploads the generated serial burning audio to the server 120 through the network; after receiving the serial burning audio sent by the electronic device, the server 120 stores it locally so that other electronic devices can download it from the server 120 through the network.
It may be understood that the schematic diagram of the system architecture described in the embodiment of the present invention is for more clearly describing the technical solution of the embodiment of the present invention, and does not constitute a limitation on the technical solution provided by the embodiment of the present invention, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided by the embodiment of the present invention is equally applicable to similar technical problems.
Referring to fig. 2, fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present invention. The method is applied to an electronic device, and as shown in fig. 2, the data processing method may include steps S210 to S240. Wherein:
step S210: and selecting the first audio and the second audio from the serial burning candidate pool.
In one possible implementation, before the electronic device selects the first audio and the second audio from the serial burning candidate pool, it may first construct the pool. The serial burning candidate pool is constructed as follows: the electronic device obtains any audio from a pre-created audio information library as a candidate pool-entering audio; it then performs feature extraction on the candidate pool-entering audio to obtain its characteristic information, and determines according to this information whether to place the audio into the serial burning candidate pool. It should be noted that the first audio and the second audio are any two candidate audios in the serial burning candidate pool, and the electronic device processes the first audio in the same way as the second audio; the processing of an arbitrary candidate pool-entering audio is described in detail below as an example.
The characteristic information mentioned in the embodiment of the invention comprises direct information and music retrieval information (Music Information Retrieval, MIR). The direct information of an audio includes, but is not limited to: audio name, audio format, number of channels, lyric information, and audio popularity index. The music retrieval information includes, but is not limited to: beat information (BPM), downbeat information, and time information, where the time information may include the chorus start time and chorus end time.
For example, the electronic device performs feature extraction on the candidate pool-entering audio to obtain its direct information, which may specifically proceed as follows. First, the electronic device numbers a large number of input audios. Specifically, suppose a user uploads 1000 audios to the server through the user terminal; the electronic device then obtains these 1000 audios from the server and numbers them 1, 2, 3, ..., 1000. Next, the electronic device queries the audio information library on the server through the network to obtain the audio name, audio format, number of channels, lyric information, audio popularity index, and other information corresponding to each audio, determines this information as the direct information of the audio, and denotes it infoD. Here, the audio name is the name under which the audio was released; the audio format indicates how the audio is stored, including but not limited to: the lossy format MP3, the lossless format FLAC, WMA (Windows Media Audio), MIDI (Musical Instrument Digital Interface), TwinVQ (Transform-domain Weighted Interleave Vector Quantization), and AMR (Adaptive Multi-Rate); the number of channels indicates how many channels the audio is stored with, such as mono or stereo; lyric information is the lyric content of the audio; and the audio popularity index measures the popularity of the audio, generally using the play count as the indicator.
For example, the electronic device performs feature extraction on the candidate pool-entering audio to obtain its music retrieval information, which may specifically proceed as follows: the electronic device obtains the music retrieval information of the candidate pool-entering audio through a neural-network-based information extraction technique and denotes it infoM. The music retrieval information may include, but is not limited to: beat information (BPM), downbeat information, and time information, where the time information may include the chorus start time and chorus end time. Specifically, the BPM of an audio is the number of beats it contains per minute and measures the speed of the music; the downbeat generally refers to the first beat in a musical bar, i.e., the strong beat. The BPM of an audio may be obtained by a BPM detection and analysis tool (e.g., the MixMeister BPM Analyzer), and the downbeat of an audio may be obtained by analyzing the audio, e.g., by a technician analyzing the audio.
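Since BPM is defined as the number of beats per minute, it can be estimated directly once beat timestamps are available. The sketch below illustrates only that definition; it is not the algorithm used by any particular BPM analysis tool, and it assumes the beat timestamps have already been detected.

```python
def bpm_from_beat_times(beat_times):
    """Estimate BPM from a sorted list of beat timestamps in seconds:
    BPM = 60 / (mean interval between consecutive beats)."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat timestamps")
    intervals = [t2 - t1 for t1, t2 in zip(beat_times, beat_times[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval
```

For example, beats detected every 0.5 seconds correspond to 120 BPM.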
In one possible implementation, the electronic device determines the chorus start time and chorus end time of the candidate pool-entering audio mainly in three steps: 1. first, convert the candidate pool-entering audio between the time domain and the frequency domain through the constant Q transform (CQT) to obtain its spectrum information; the CQT gives the energy at each note frequency in the candidate pool-entering audio, which helps characterize its subjective auditory features; 2. then, take the candidate pool-entering audio as the input of a convolutional recurrent neural network (CRNN) model, extract features through the CRNN, and output audio segments at a plurality of time points of the candidate pool-entering audio; 3. finally, traverse the audio segments at the different time points through a filter to obtain a predicted probability value for each time point, determine the time point corresponding to the maximum of the predicted probability values as the chorus start time, and determine the time point corresponding to the minimum of the predicted probability values as the chorus end time.
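Step 3 reduces to an argmax/argmin over the per-time-point probabilities. The sketch below illustrates only that selection rule as stated above; the CQT and CRNN stages are omitted, and a real system would presumably also enforce that the start time precedes the end time.

```python
def chorus_bounds(times, probs):
    """Per the rule above: chorus start = time of the maximum predicted
    probability, chorus end = time of the minimum predicted probability."""
    start = times[max(range(len(probs)), key=probs.__getitem__)]
    end = times[min(range(len(probs)), key=probs.__getitem__)]
    return start, end
```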
In one possible implementation, the characteristic information of the candidate pool-entering audio includes its audio name, audio format, and number of channels. The electronic device judges whether the candidate pool-entering audio meets the quality requirement according to this characteristic information, and if it does, puts it into the serial burning candidate pool.
In one possible implementation, the electronic device judges whether the candidate pool-entering audio meets the quality requirement according to its characteristic information as follows: judge whether the audio name of the candidate pool-entering audio is unique, whether its audio format is a preset format, and whether its number of channels is a preset number of channels. If the audio name is unique, the audio format is a preset format, and the number of channels is a preset number of channels, the candidate pool-entering audio is judged to meet the quality requirement; if the audio name is not unique, or the audio format is not a preset format, or the number of channels is not a preset number of channels, it is judged not to meet the quality requirement. The preset format may be a lossless format, and the preset number of channels may be two or more channels.
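The quality check described above is a conjunction of three predicates, which can be sketched as follows; the concrete preset format set and minimum channel count are assumptions chosen for illustration.

```python
PRESET_FORMATS = {"FLAC", "APE"}      # lossless formats (assumed presets)
PRESET_MIN_CHANNELS = 2               # at least two channels (assumed preset)

def meets_quality(name_count: int, fmt: str, channels: int) -> bool:
    """name_count: how many audios in the library share this audio name.
    Returns True only if all three preset conditions hold."""
    return (
        name_count == 1                       # audio name is unique
        and fmt in PRESET_FORMATS             # audio format is a preset format
        and channels >= PRESET_MIN_CHANNELS   # enough channels
    )
```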
In one possible implementation, the electronic device obtains a candidate pool-entering audio from the audio information library, which may be any audio in the library, and judges whether its audio name is unique. An audio name being unique means that only one audio with that name exists in the audio information library; an audio name being not unique means that multiple identical audio names exist in the library. If the audio name of the candidate pool-entering audio is not unique, all audios with the same name are obtained from the audio information library, and the audio with the highest audio popularity index among them is re-determined as the candidate pool-entering audio; alternatively, one of these audios is selected at random, or the audio with the highest audio quality is selected, and re-determined as the candidate pool-entering audio.
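The popularity-based tie-breaking rule among same-name audios can be sketched as below; the (id, popularity) pair representation is an assumption for illustration.

```python
def resolve_duplicates(candidates):
    """Among audios sharing one name, keep the one with the highest
    popularity index (one of the tie-breaking rules described above).
    candidates: list of (audio_id, popularity) pairs."""
    return max(candidates, key=lambda c: c[1])
```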
After the candidate pool-entering audio is re-determined, or after its audio name is judged to be unique, the electronic device performs feature extraction on the candidate pool-entering audio to obtain its characteristic information, and judges according to this information whether its audio format is a preset format and whether its number of channels is a preset number of channels. If the audio format is a preset format and the number of channels is a preset number of channels, the candidate pool-entering audio is put into the serial burning candidate pool.
In this way, the electronic device screens a large number of audios through the preset screening rules, obtaining audios of higher quality that users favor, and puts the candidate pool-entering audios into the serial burning candidate pool, thereby providing higher-quality material for the serial burning audio.
In one possible implementation, after extracting the characteristic information of the candidate pool-entering audio, the electronic device checks it for completeness. If the direct information contained in the characteristic information is missing, the candidate pool-entering audio cannot be put into the serial burning candidate pool; another audio is selected from the audio information library as a new candidate pool-entering audio, and the above judgment process is executed again. If the music retrieval information contained in the characteristic information is missing, the music retrieval information of the candidate pool-entering audio is re-acquired. The re-acquisition may proceed as follows: first, randomly intercept segments of the candidate pool-entering audio to obtain a plurality of audio segments; then, compute the music retrieval information of each audio segment to obtain a result for each segment, in the same way the electronic device computes the music retrieval information of the whole audio during feature extraction, which is not repeated here; finally, perform a weighted operation on the plurality of results to obtain new music retrieval information for the candidate pool-entering audio.
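The final weighted recombination step might look like the sketch below for the BPM component; the weighting scheme is not specified in the text, so equal weights are assumed by default.

```python
def recompute_bpm(segment_bpms, weights=None):
    """Weighted average of per-segment BPM estimates, as in the
    re-acquisition step above. Equal weights by default (the actual
    weighting used is unspecified and assumed here)."""
    if weights is None:
        weights = [1.0] * len(segment_bpms)
    total = sum(weights)
    return sum(b * w for b, w in zip(segment_bpms, weights)) / total
```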
Step S220: music retrieval information of the first audio and music retrieval information of the second audio are acquired, and the music retrieval information comprises beat information and time information.
Specifically, the electronic device acquires beat information and time information of the first audio from the serial burning candidate pool, and the electronic device acquires beat information and time information of the second audio from the serial burning candidate pool. For example, the electronic device obtains the BPM, downbeat, chorus start time, and chorus end time of the first audio, and the electronic device obtains the BPM, downbeat, chorus start time, and chorus end time of the second audio.
Step S230: if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio.
In one possible implementation manner, the electronic device acquires the first audio from the serial burning candidate pool, acquires beat information of the first audio, and selects an audio whose beat-information similarity to the first audio is greater than a similarity threshold from the serial burning candidate pool as the second audio. The similarity threshold may be a threshold preset by the user. If more than one audio with similarity greater than the similarity threshold exists in the candidate pool, the selection is performed according to the audio popularity index corresponding to each audio: specifically, the audio with the highest audio popularity index may be selected as the second audio, or an audio may be randomly selected as the second audio, which is not limited in the present invention.
For example, the electronic device randomly acquires the first audio from the serial burning candidate pool and obtains its BPM (denoted BPM1), and the electronic device selects an audio whose BPM similarity to BPM1 is greater than 80% from the serial burning candidate pool as the second audio. Suppose there are three audios in the serial burning candidate pool whose similarity to BPM1 exceeds 80%, namely audio 1, audio 2, and audio 3. The electronic device obtains the audio popularity indexes of the three audios; supposing that the audio popularity index corresponding to audio 1 is 80%, the index corresponding to audio 2 is 85%, and the index corresponding to audio 3 is 90%, the electronic device takes the audio with the highest audio popularity index (i.e., audio 3) as the second audio.
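A minimal sketch of this second-audio selection, assuming a simple ratio-based BPM similarity and dictionary records with hypothetical `name`, `bpm`, and `popularity` fields (none of these names come from the patent):

```python
def bpm_similarity(bpm_a, bpm_b):
    """Similarity of two BPM values as the ratio of the smaller to the larger."""
    return min(bpm_a, bpm_b) / max(bpm_a, bpm_b)

def pick_second_audio(first, pool, threshold=0.8):
    """Select the second audio from the candidate pool: keep candidates whose
    BPM similarity to `first` exceeds `threshold`, then break ties by the
    highest audio popularity index, as the example above describes."""
    matches = [a for a in pool
               if a is not first
               and bpm_similarity(first['bpm'], a['bpm']) > threshold]
    if not matches:
        return None  # no sufficiently similar candidate exists
    return max(matches, key=lambda a: a['popularity'])
```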
In one possible implementation, among the first audio and the second audio, the electronic device determines the audio with the preceding playing order and the audio with the following playing order; fade-out processing is performed on the ending part of the audio with the preceding playing order, and fade-in processing is performed on the beginning part of the audio with the following playing order. Suppose the electronic device determines that the first audio is the audio with the preceding playing order and the second audio is the audio with the following playing order. The electronic device then fades out the ending portion of the first audio and fades in the beginning portion of the second audio. The ending part of the first audio refers to an audio fragment covering a first time period traced back from the chorus end time of the first audio; the beginning part of the second audio refers to an audio fragment covering a second time period extending from the chorus start time of the second audio. The first time period and the second time period can be set according to actual conditions, and they may be the same or different. Specifically, the ending part of the first audio refers to the audio segment contained in a period (for example, 10 seconds) traced back from the chorus end time of the first audio (for example, 3 minutes 50 seconds), that is, from 3 minutes 40 seconds to 3 minutes 50 seconds of the first audio; the audio segment corresponding to these 10 seconds is the ending part of the first audio. The beginning part of the second audio refers to the audio segment contained in a period (for example, 10 seconds) extending backward from the chorus start time of the second audio (for example, 0 minutes 00 seconds), that is, from 0 minutes 00 seconds to 0 minutes 10 seconds of the second audio; the audio segment corresponding to these 10 seconds is the beginning part of the second audio.
For example, as shown in fig. 3a, fig. 3a is a schematic diagram of smoothing the first audio and the second audio according to an embodiment of the present invention. The dashed circle at the upper part of the bidirectional arrow in fig. 3a marks the fade-out end time of the first audio, where the fade-out end time may specifically be the chorus end time of the first audio; the solid circle at the lower part of the bidirectional arrow marks the fade-in end time of the second audio, where the fade-in end time may specifically be the chorus end time of the second audio. The electronic device can fade out the ending part of the first audio through a cosine function, and the electronic device can fade in the beginning part of the second audio through a sine function.
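The cosine fade-out and sine fade-in mentioned above can be sketched with quarter-period envelopes; the exact envelope shape and duration are assumptions for illustration, not specified by the patent:

```python
import math

def fade_out_cosine(samples):
    """Apply a quarter-period cosine envelope (gain 1 -> 0) to a sample list."""
    denom = max(len(samples) - 1, 1)
    return [s * math.cos(math.pi / 2 * i / denom) for i, s in enumerate(samples)]

def fade_in_sine(samples):
    """Apply a quarter-period sine envelope (gain 0 -> 1) to a sample list."""
    denom = max(len(samples) - 1, 1)
    return [s * math.sin(math.pi / 2 * i / denom) for i, s in enumerate(samples)]
```

Applying `fade_out_cosine` to the ending part of the earlier track and `fade_in_sine` to the beginning part of the later one yields complementary gains, so the combined loudness stays roughly steady across the transition.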
In one possible implementation manner, the playing order of the first audio and the playing order of the second audio are obtained. The electronic device may determine the playing order of the first audio and the second audio according to the audio duration; for example, if the audio duration of the first audio is longer than that of the second audio, the electronic device determines that the first audio is the audio with the preceding playing order and the second audio is the audio with the following playing order. For another example, the electronic device determines the playing order of the first audio and the second audio according to the audio style: the audio style of the first audio is style 1 and the audio style of the second audio is style 2; the electronic device determines, according to a specified rule, that the audio corresponding to style 1 is the audio with the preceding playing order and the audio corresponding to style 2 is the audio with the following playing order, that is, the first audio precedes the second audio. If, instead, the second audio precedes the first audio in the playing order, the electronic device performs fade-out processing on the ending part of the second audio and fade-in processing on the beginning part of the first audio. The ending part of the second audio refers to an audio fragment of a third time period traced back from the chorus end time of the second audio; the beginning part of the first audio refers to an audio fragment of a fourth time period extending from the chorus start time of the first audio.
For example, as shown in fig. 3b, fig. 3b is a schematic diagram of another smoothing process for the first audio and the second audio according to an embodiment of the present invention. The dashed circle at the upper part of the bidirectional arrow in fig. 3b marks the fade-out end time of the second audio, where the fade-out end time may specifically be the chorus end time of the second audio; the solid circle at the lower part of the bidirectional arrow marks the fade-in end time of the first audio, where the fade-in end time may specifically be the chorus start time of the first audio. The electronic device can fade out the ending part of the second audio through a cosine function, and the electronic device can fade in the beginning part of the first audio through a sine function.
It should be noted that, in addition to the fade-in and fade-out processing method, the electronic device may also perform the smoothing processing on the first audio and the second audio in a seamless transition manner, and specifically, the electronic device may perform overlap-add on the first audio and the second audio, that is, the electronic device superimposes n (n is a positive integer) beats of an ending portion of the first audio with n beats of a beginning portion of the second audio, so that seamless connection may be implemented on the first audio and the second audio.
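The overlap-add splice can be sketched as follows, treating the audios as plain sample lists and the n overlapped beats as a fixed number of overlapping samples (an assumption for illustration):

```python
def overlap_add(first, second, overlap):
    """Splice two sample lists by summing the last `overlap` samples of
    `first` with the first `overlap` samples of `second` (the region that
    would span the n overlapped beats), then appending the remainder of
    `second`, giving a seamless connection."""
    head = first[:len(first) - overlap]
    mixed = [a + b for a, b in zip(first[len(first) - overlap:], second[:overlap])]
    return head + mixed + second[overlap:]
```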
By means of the method, the electronic device performs smoothing on the first audio and smoothing on the second audio, so that the connection between the first audio and the second audio is smoother, and user experience is enhanced.
Step S240: and overlapping the smoothed first audio and the smoothed second audio to form target audio.
In one possible implementation, the end time point of the fade-out processing of the first audio corresponds to a downbeat in the first audio (which may be referred to as a first downbeat, for distinction from a downbeat of the second audio), and the end time point of the fade-in processing of the second audio corresponds to a downbeat in the second audio (which may be referred to as a second downbeat, for distinction from a downbeat of the first audio). The electronic device aligns the end time point of the fade-out processing of the first audio with the end time point of the fade-in processing of the second audio, and performs feature superposition on the first downbeat and the second downbeat to obtain the target audio, namely the serial burning audio.
In one possible implementation, the end time point of the fade-out processing of the second audio corresponds to a downbeat in the second audio (which may be referred to as a third downbeat, for distinction from a downbeat of the first audio), and the end time point of the fade-in processing of the first audio corresponds to a downbeat in the first audio (which may be referred to as a fourth downbeat, for distinction from a downbeat of the second audio). The electronic device aligns the end time point of the fade-out processing of the second audio with the end time point of the fade-in processing of the first audio, and performs feature superposition on the third downbeat and the fourth downbeat to obtain the target audio, namely the serial burning audio.
As shown in fig. 4, fig. 4 is an audio schematic diagram of a target audio according to an embodiment of the present invention. The electronic device performs overlapping operation on the smoothed first audio and the smoothed second audio, specifically, aligns an ending time point of the fade-out processing of the first audio with an ending time point of the fade-in processing of the second audio, and vertically adds a time sequence corresponding to the first audio and a time sequence corresponding to the second audio to obtain the target audio. The target audio may be referred to as a burn-in audio.
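The alignment-and-vertical-addition step can be sketched as follows; sample indices stand in for the fade-out/fade-in end time points, and the second audio is assumed to start no earlier than the first on the shared timeline:

```python
def mix_aligned(first, second, align_first, align_second):
    """Vertically add two sample lists after aligning sample index
    `align_first` of `first` (the fade-out end point) with sample index
    `align_second` of `second` (the fade-in end point). Assumes the
    resulting offset of `second` is non-negative."""
    offset = align_first - align_second  # where `second` starts on first's timeline
    total = max(len(first), offset + len(second))
    out = [0.0] * total
    for i, s in enumerate(first):
        out[i] += s
    for i, s in enumerate(second):
        out[offset + i] += s  # vertical addition in the overlap region
    return out
```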
In one possible implementation manner, the electronic device performs overlapping processing on the first audio after smoothing processing and the second audio after smoothing processing to form a target audio, and then selects a third audio from a candidate pool, and the electronic device acquires music retrieval information of the target audio and music retrieval information of the third audio, wherein the music retrieval information comprises beat information and time information; if the similarity between the beat information of the target audio and the beat information of the third audio is larger than a similarity threshold, smoothing the target audio according to the time information of the target audio, and smoothing the third audio according to the time information of the third audio; and overlapping the target audio after the smoothing treatment and the third audio after the smoothing treatment to form new serial burning audio.
Specifically, the electronic device takes the target audio as a new first audio and the third audio as a new second audio, and re-executes the step of generating the target audio according to the first audio and the second audio. As shown in fig. 5, after the electronic device generates the target audio according to the first audio and the second audio, the electronic device selects the third audio from the serial burning candidate pool, and then combines the target audio and the third audio to generate a new serial burning audio; by repeating this process, the electronic device can splice any number of audio segments to obtain serial burning audio of any duration. For example, the number of spliced audio segments may be compared against a preset threshold, so that if the number of audio segments contained in the serial burning audio reaches the threshold, the splicing is stopped. Alternatively, whether to terminate the splicing may be determined according to other conditions; for example, the duration of the spliced serial burning audio may be compared against a preset threshold, so that if the duration reaches the threshold, the splicing is stopped.
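The iterative splicing loop can be sketched as follows; the `similar` and `combine` callables stand in for the similarity test and the smoothing/overlap step described above, the duration-threshold stop condition is the one mentioned in the text, and the record fields are illustrative:

```python
def build_medley(pool, max_duration, similar, combine):
    """Start from the first pool entry and repeatedly combine it with a
    similar unused candidate until the running duration reaches
    `max_duration` or no similar candidate remains. Each pool entry is a
    dict with at least a 'duration' field (in seconds)."""
    target, duration = pool[0], pool[0]['duration']
    used = {id(pool[0])}
    while duration < max_duration:
        # Find the next unused candidate that passes the similarity test.
        nxt = next((a for a in pool if id(a) not in used and similar(target, a)), None)
        if nxt is None:
            break  # no candidate similar enough to extend the medley
        target = combine(target, nxt)  # smoothing + overlap of the two audios
        duration = target['duration']
        used.add(id(nxt))
    return target
```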
According to the data processing method provided by the embodiment of the invention, high-quality audio is first screened out through the preset screening rules and put into the audio candidate pool; the electronic device then selects audios with similar BPM values from the audio candidate pool, performs smoothing on each of them, and overlaps the smoothed audios to obtain the serial burning audio. By this method, high-quality serial burning audio can be generated and the listening experience of users enriched; compared with manual serial burning audio production, the production can be completed more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the invention. The data processing apparatus is configured to perform the steps performed by the electronic device in the method embodiments corresponding to fig. 2 to 5, and the data processing apparatus may include:
a selection unit 610, configured to select a first audio and a second audio from the candidate pool of serial burn;
an acquiring unit 620, configured to acquire music retrieval information of the first audio and music retrieval information of the second audio, where the music retrieval information includes beat information and time information;
a processing unit 630, configured to, if a similarity between beat information of the first audio and beat information of the second audio is greater than a similarity threshold, perform smoothing processing on the first audio according to time information of the first audio, and perform smoothing processing on the second audio according to time information of the second audio;
the processing unit 630 is further configured to perform overlapping processing on the smoothed first audio and the smoothed second audio to form a target audio.
In one possible implementation, a step of constructing the serial burning candidate pool is performed before the selection unit 610 selects the first audio and the second audio from the serial burning candidate pool; the step of constructing the serial burning candidate pool comprises the following steps:
The acquiring unit 620 acquires the candidate pooled audio from the audio information library constructed in advance;
the processing unit 630 performs feature extraction on the candidate in-pool audio to obtain feature information of the candidate in-pool audio;
the processing unit 630 judges whether the alternative in-pool audio meets the quality requirement according to the characteristic information of the alternative in-pool audio, and if the quality requirement is met, puts the alternative in-pool audio into the serial burning candidate pool.
in a possible implementation manner, the alternative in-pool audio comprises a plurality of time points, the characteristic information of the alternative in-pool audio comprises music retrieval information of the alternative in-pool audio, and the time information in the music retrieval information comprises a chorus start time and a chorus end time;
the processing unit 630 performs feature extraction on the candidate in-pool audio to obtain feature information of the candidate in-pool audio, including:
obtaining a prediction probability value corresponding to each time point contained in the alternative pool entering audio;
determining a time point corresponding to the maximum probability value in a plurality of predicted probability values as the starting time of the chorus;
and determining a time point corresponding to the minimum probability value in the plurality of predicted probability values as the finishing moment of the chorus.
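A minimal sketch of this chorus-boundary selection, taking the highest-probability time point as the chorus start and the lowest-probability time point as the chorus end, exactly as the steps above describe (the parallel-list input representation is an assumption):

```python
def chorus_bounds(times, probs):
    """Given time points and their predicted probability values, return
    (chorus start, chorus end): the time with the maximum probability and
    the time with the minimum probability, respectively."""
    pairs = list(zip(times, probs))
    start = max(pairs, key=lambda tp: tp[1])[0]  # highest predicted probability
    end = min(pairs, key=lambda tp: tp[1])[0]    # lowest predicted probability
    return start, end
```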
In one possible implementation manner, the characteristic information of the alternative in-pool audio includes an audio name, an audio format and a channel number of the alternative in-pool audio;
the processing unit 630 determines whether the candidate incoming audio meets the quality requirement according to the feature information of the candidate incoming audio, including:
judging whether the audio name of the alternative pool entering audio is unique, judging whether the audio format of the alternative pool entering audio is a preset format, and judging whether the number of channels of the alternative pool entering audio is a preset number of channels;
if the audio name of the alternative pool entering audio is unique, the audio format of the alternative pool entering audio is a preset format, and the number of channels of the alternative pool entering audio is a preset number of channels, judging that the alternative pool entering audio meets the quality requirement.
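These three checks can be sketched as a single predicate; the preset format and channel count shown here are illustrative values, and `name_counts` (occurrences of each name in the library) is a hypothetical helper input for the uniqueness check:

```python
def meets_quality(audio, name_counts, preset_format='wav', preset_channels=2):
    """Apply the three quality checks described above: the audio name is
    unique in the library, the format matches the preset format, and the
    channel count matches the preset channel count."""
    return (name_counts.get(audio['name'], 0) == 1
            and audio['format'] == preset_format
            and audio['channels'] == preset_channels)
```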
In one possible implementation, the processing unit 630 performs smoothing processing on the first audio according to time information of the first audio, and performs smoothing processing on the second audio according to time information of the second audio, including:
determining the audio with the front playing sequence and the audio with the rear playing sequence in the first audio and the second audio;
The method comprises the steps of performing fade-out processing on the ending part of the audio with the preceding playing sequence, and performing fade-in processing on the starting part of the audio with the following playing sequence;
the end part of the audio with the preceding playing sequence refers to an audio fragment of a first time period from the end time of the chorus in the audio with the preceding playing sequence; the beginning part of the audio with the subsequent play sequence refers to an audio clip of a second period of time that extends from the beginning moment of the chorus of the audio with the subsequent play sequence.
In one possible implementation, the end time point of the fade-out processing of the audio with the preceding playing order corresponds to a downbeat in that audio, and the end time point of the fade-in processing of the audio with the following playing order corresponds to a downbeat in that audio.
In one possible implementation, the processing unit 630 performs an overlapping process on the smoothed first audio and the smoothed second audio to form a target audio, including:
aligning an ending time point of the fade-out processing of the audio with the preceding playing sequence with an ending time point of the fade-in processing of the audio with the following playing sequence;
And vertically adding the aligned audio with the front playing sequence and the audio with the rear playing sequence to obtain target audio.
In one possible implementation, the processing unit 630 vertically adds the aligned audio with the preceding playing order and the audio with the following playing order to obtain the target audio, where the processing unit includes:
and performing feature superposition on the downbeat in the audio with the preceding playing order and the downbeat in the audio with the following playing order to obtain the target audio.
According to the data processing apparatus provided by the embodiment of the invention, high-quality audio is first screened out through the preset screening rules and put into the audio candidate pool; the electronic device then selects audios with similar BPM values from the audio candidate pool, performs smoothing on each of them, and overlaps the smoothed audios to obtain the serial burning audio. By this method, high-quality serial burning audio can be generated and the listening experience of users enriched; compared with manual serial burning audio production, the production can be completed more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device is configured to execute the steps executed by the electronic device in the method embodiments corresponding to fig. 2 to 5, and the electronic device includes: one or more processors 710; one or more input devices 720, one or more output devices 730, and a memory 740. The processor 710, the input device 720, the output device 730, and the memory 740 are connected by a bus 750. The memory 740 is used for storing a computer program comprising program instructions, and the processor 710 is used for executing the program instructions stored in the memory 740, performing the following operations:
Selecting a first audio and a second audio from the serial burning candidate pool; acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information; if the similarity between the beat information of the first audio and the beat information of the second audio is larger than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio; and overlapping the smoothed first audio and the smoothed second audio to form target audio.
In one possible implementation, the method further includes a step of constructing the serial burning candidate pool before the processor 710 selects the first audio and the second audio from the serial burning candidate pool; the step of constructing the serial burning candidate pool comprises the following steps:
obtaining alternative pool entering audio from a pre-constructed audio information base;
extracting the characteristics of the alternative pool-entering audio to obtain the characteristic information of the alternative pool-entering audio;
judging whether the alternative pool entering audio meets the quality requirement or not according to the characteristic information of the alternative pool entering audio, and if the alternative pool entering audio meets the quality requirement, placing the alternative pool entering audio into the serial burning candidate pool.
In a possible implementation manner, the alternative in-pool audio comprises a plurality of time points, the characteristic information of the alternative in-pool audio comprises music retrieval information of the alternative in-pool audio, and the time information in the music retrieval information comprises a chorus start time and a chorus end time;
the processor 710 performs feature extraction on the candidate in-pool audio to obtain feature information of the candidate in-pool audio, including:
obtaining a prediction probability value corresponding to each time point contained in the alternative pool entering audio;
determining a time point corresponding to the maximum probability value in a plurality of predicted probability values as the starting time of the chorus;
and determining a time point corresponding to the minimum probability value in the plurality of predicted probability values as the finishing moment of the chorus.
In one possible implementation manner, the characteristic information of the alternative in-pool audio includes an audio name, an audio format and a channel number of the alternative in-pool audio;
processor 710 determines, according to the feature information of the candidate incoming audio, whether the candidate incoming audio meets a quality requirement, including:
judging whether the audio name of the alternative pool entering audio is unique, judging whether the audio format of the alternative pool entering audio is a preset format, and judging whether the number of channels of the alternative pool entering audio is a preset number of channels;
If the audio name of the alternative pool entering audio is unique, the audio format of the alternative pool entering audio is a preset format, and the number of channels of the alternative pool entering audio is a preset number of channels, judging that the alternative pool entering audio meets the quality requirement.
In one possible implementation, the processor 710 performs smoothing on the first audio according to time information of the first audio, and performs smoothing on the second audio according to time information of the second audio, including:
determining the audio with the front playing sequence and the audio with the rear playing sequence in the first audio and the second audio;
the method comprises the steps of performing fade-out processing on the ending part of the audio with the preceding playing sequence, and performing fade-in processing on the starting part of the audio with the following playing sequence;
the end part of the audio with the preceding playing sequence refers to an audio fragment of a first time period from the end time of the chorus in the audio with the preceding playing sequence; the beginning part of the audio with the subsequent play sequence refers to an audio clip of a second period of time that extends from the beginning moment of the chorus of the audio with the subsequent play sequence.
In one possible implementation, the end time point of the fade-out processing of the audio with the preceding playing order corresponds to a downbeat in that audio, and the end time point of the fade-in processing of the audio with the following playing order corresponds to a downbeat in that audio.
In one possible implementation, the processor 710 performs an overlapping process on the smoothed first audio and the smoothed second audio to form a target audio, including:
aligning an ending time point of the fade-out processing of the audio with the preceding playing sequence with an ending time point of the fade-in processing of the audio with the following playing sequence;
and vertically adding the aligned audio with the front playing sequence and the audio with the rear playing sequence to obtain target audio.
In one possible implementation, the processor 710 vertically adds the aligned audio with the preceding play order and the audio with the following play order to obtain a target audio, including:
and performing feature superposition on the downbeat in the audio with the preceding playing order and the downbeat in the audio with the following playing order to obtain the target audio.
According to the electronic device provided by the embodiment of the invention, high-quality audio is first screened out through the preset screening rules and put into the audio candidate pool; the electronic device then selects audios with similar BPM values from the audio candidate pool, performs smoothing on each of them, and overlaps the smoothed audios to obtain the serial burning audio. By this method, high-quality serial burning audio can be generated and the listening experience of users enriched; compared with manual serial burning audio production, the production can be completed more efficiently, reducing the consumption of manpower and material resources and lowering the economic cost.
An embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program includes program instructions, where the program instructions, when executed by a processor, may perform the steps performed by the electronic device in the foregoing embodiment.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by computer programs stored on a computer readable storage medium, which when executed, may include embodiments of the above-described data processing methods. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The above disclosure describes only some embodiments of the present invention and is not intended to limit the scope of the claims of the present invention; those skilled in the art will understand that equivalent variations implementing all or part of the above embodiments, made within the scope of the claims of the present invention, still fall within the scope covered by the present invention.

Claims (9)

1. A method of data processing, the method comprising:
acquiring alternative pool entering audio, and acquiring prediction probability values respectively corresponding to all time points contained in the alternative pool entering audio; the prediction probability value is obtained by traversing audio fragments of the alternative pool-entering audio at a plurality of time points, and the audio fragments of the time points are obtained by extracting the alternative pool-entering audio;
determining a time point corresponding to the maximum probability value in a plurality of predicted probability values as the chorus start time of the alternative pool-entering audio; and determining a time point corresponding to the minimum probability value in the plurality of predicted probability values as the chorus end time of the alternative pool-entering audio;
putting the alternative pool audio containing the starting time and the ending time of the chorus into a serial burning candidate pool;
Selecting a first audio and a second audio from the serial burning candidate pool;
acquiring music retrieval information of the first audio and music retrieval information of the second audio, wherein the music retrieval information comprises beat information and time information; the time information comprises a chorus start time and a chorus end time;
if the similarity between the beat information of the first audio and the beat information of the second audio is larger than a similarity threshold, smoothing the first audio according to the time information of the first audio, and smoothing the second audio according to the time information of the second audio;
and overlapping the smoothed first audio and the smoothed second audio to form target audio.
2. The method of claim 1, further comprising a step of constructing the serial burning candidate pool before selecting the first audio and the second audio from the serial burning candidate pool; the step of constructing the serial burning candidate pool comprises the following steps:
obtaining alternative pool entering audio from a pre-constructed audio information base;
extracting the characteristics of the alternative pool-entering audio to obtain the characteristic information of the alternative pool-entering audio;
Judging whether the alternative pool entering audio meets the quality requirement or not according to the characteristic information of the alternative pool entering audio, and if the alternative pool entering audio meets the quality requirement, placing the alternative pool entering audio into the serial burning candidate pool.
3. The method of claim 2, wherein the feature information of the candidate pool-entry audio comprises an audio name, an audio format, and a channel count; and wherein judging whether the candidate pool-entry audio meets the quality requirement according to its feature information comprises:
judging whether the audio name of the candidate pool-entry audio is unique, whether its audio format is a preset format, and whether its channel count is a preset channel count; and
if the audio name is unique, the audio format is the preset format, and the channel count is the preset channel count, judging that the candidate pool-entry audio meets the quality requirement.
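A minimal sketch of the quality gate in claims 2 and 3: an audio item enters the medley candidate pool only if its name is unique within the pool, its format is the preset format, and its channel count is the preset channel count. The dictionary fields and the preset values (`mp3`, stereo) are assumptions chosen for illustration; the patent leaves them unspecified.

```python
PRESET_FORMAT = "mp3"   # assumed preset format
PRESET_CHANNELS = 2     # assumed preset channel count (stereo)

def meets_quality(item, pool):
    """Claim 3's three checks: unique name, preset format, preset channel count."""
    names_in_pool = {a["name"] for a in pool}
    return (item["name"] not in names_in_pool
            and item["format"] == PRESET_FORMAT
            and item["channels"] == PRESET_CHANNELS)

pool = []
for item in [
    {"name": "song_a", "format": "mp3", "channels": 2},
    {"name": "song_b", "format": "wav", "channels": 2},  # rejected: wrong format
    {"name": "song_a", "format": "mp3", "channels": 2},  # rejected: duplicate name
]:
    if meets_quality(item, pool):
        pool.append(item)

print([a["name"] for a in pool])  # ['song_a']
```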
4. The method of claim 1, wherein smoothing the first audio according to the time information of the first audio and smoothing the second audio according to the time information of the second audio comprises:
determining which of the first audio and the second audio precedes in play order and which follows; and
applying a fade-out to the ending portion of the preceding audio, and applying a fade-in to the beginning portion of the following audio,
wherein the ending portion of the preceding audio is the audio segment spanning a first time period back from the chorus end time of the preceding audio, and the beginning portion of the following audio is the audio segment spanning a second time period forward from the chorus start time of the following audio.
5. The method of claim 4, wherein the end point of the fade-out of the preceding audio coincides with a downbeat of the preceding audio, and the end point of the fade-in of the following audio coincides with a downbeat of the following audio.
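The smoothing of claims 4 and 5 can be sketched as below: the preceding audio is faded out over a window ending at its chorus end, the following audio is faded in over a window starting at its chorus start, and the fade end points are snapped to a downbeat. Linear gain ramps and the snapping helper are assumptions; the patent does not fix a fade curve.

```python
def fade_out(samples, fade_len):
    """Linearly ramp the last fade_len samples down to zero (claim 4 fade-out)."""
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        gain = (n - 1 - i) / max(n - 1, 1)   # 1.0 -> 0.0 across the window
        out[len(out) - n + i] *= gain
    return out

def fade_in(samples, fade_len):
    """Linearly ramp the first fade_len samples up from zero (claim 4 fade-in)."""
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        out[i] *= i / max(n - 1, 1)          # 0.0 -> 1.0 across the window
    return out

def snap_to_downbeat(t, downbeats):
    """Move a fade end point to the nearest downbeat (claim 5)."""
    return min(downbeats, key=lambda d: abs(d - t))

print(fade_out([1.0, 1.0, 1.0, 1.0], 3))       # [1.0, 1.0, 0.5, 0.0]
print(fade_in([1.0, 1.0, 1.0, 1.0], 3))        # [0.0, 0.5, 1.0, 1.0]
print(snap_to_downbeat(2.3, [0.0, 2.0, 4.0]))  # 2.0
```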
6. The method of claim 5, wherein overlapping the smoothed first audio and the smoothed second audio to form the target audio comprises:
aligning the end point of the fade-out of the preceding audio with the end point of the fade-in of the following audio; and
vertically summing the aligned preceding audio and following audio to obtain the target audio.
7. The method of claim 6, wherein vertically summing the aligned preceding audio and following audio to obtain the target audio comprises:
superimposing the downbeat features of the preceding audio and the downbeat features of the following audio to obtain the target audio.
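The overlap step of claims 6 and 7 amounts to a crossfade: once the fade end points (and hence the downbeats) are aligned, the two signals are summed sample-wise ("vertically") over the overlap region. The sketch below assumes both signals have already been faded per claim 4; the function name and sample lists are illustrative.

```python
def overlap_add(preceding, following, overlap_len):
    """Sum the last overlap_len samples of `preceding` with the first
    overlap_len samples of `following`, concatenating the non-overlapping rest."""
    overlap_len = min(overlap_len, len(preceding), len(following))
    head = preceding[:len(preceding) - overlap_len]
    # Vertical (sample-wise) addition over the aligned overlap region.
    mixed = [a + b for a, b in zip(preceding[len(preceding) - overlap_len:],
                                   following[:overlap_len])]
    tail = following[overlap_len:]
    return head + mixed + tail

prev_audio = [1.0, 1.0, 0.5, 0.0]   # already faded out (claim 4)
next_audio = [0.0, 0.5, 1.0, 1.0]   # already faded in (claim 4)
print(overlap_add(prev_audio, next_audio, 4))  # [1.0, 1.5, 1.5, 1.0]
```

Because the two fades are complementary over the aligned window, the summed overlap stays close to the original loudness, which is the point of smoothing before overlapping.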
8. A data processing apparatus, comprising:
an acquisition unit, configured to acquire candidate pool-entry audio and to acquire prediction probability values corresponding to each of a plurality of time points contained in the candidate pool-entry audio, wherein the prediction probability values are obtained by traversing audio segments of the candidate pool-entry audio at the plurality of time points, the audio segments at the time points being extracted from the candidate pool-entry audio;
a processing unit, configured to determine the time point corresponding to the maximum of the plurality of prediction probability values as the chorus start time of the candidate pool-entry audio, and to determine the time point corresponding to the minimum of the plurality of prediction probability values as the chorus end time of the candidate pool-entry audio;
the processing unit being further configured to place the candidate pool-entry audio, together with its chorus start time and chorus end time, into a medley candidate pool;
a selection unit, configured to select a first audio and a second audio from the medley candidate pool;
the acquisition unit being further configured to acquire music retrieval information of the first audio and music retrieval information of the second audio, the music retrieval information comprising beat information and time information, the time information comprising the chorus start time and the chorus end time;
the processing unit being further configured to, if the similarity between the beat information of the first audio and the beat information of the second audio is greater than a similarity threshold, smooth the first audio according to the time information of the first audio and smooth the second audio according to the time information of the second audio; and
the processing unit being further configured to overlap the smoothed first audio and the smoothed second audio to form a target audio.
9. An electronic device, comprising a memory and a processor, the memory storing a set of program codes, and the processor invoking the program codes stored in the memory to perform the data processing method of any one of claims 1 to 7.
CN202010907240.5A 2020-09-01 2020-09-01 Data processing method and device and electronic equipment Active CN112037739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010907240.5A CN112037739B (en) 2020-09-01 2020-09-01 Data processing method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN112037739A CN112037739A (en) 2020-12-04
CN112037739B true CN112037739B (en) 2024-02-27

Family

ID=73591005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010907240.5A Active CN112037739B (en) 2020-09-01 2020-09-01 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112037739B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031903B (en) * 2021-03-23 2023-01-24 青岛海信移动通信技术股份有限公司 Electronic equipment and audio stream synthesis method thereof
CN115700870A (en) * 2021-07-31 2023-02-07 华为技术有限公司 Audio data processing method and device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07295581A (en) * 1994-04-21 1995-11-10 Fuji Electric Co Ltd Karaoke medley music composition device
WO2003032294A1 (en) * 2001-10-05 2003-04-17 Thomson Automatic music generation method and device and the applications thereof
JP2004053736A (en) * 2002-07-17 2004-02-19 Daiichikosho Co Ltd Method for using karaoke system
JP2008003269A (en) * 2006-06-22 2008-01-10 Matsushita Electric Ind Co Ltd Musical piece connection device
CN102903357A (en) * 2011-07-29 2013-01-30 华为技术有限公司 Method, device and system for extracting chorus of song
CN104778221A (en) * 2015-03-20 2015-07-15 广东欧珀移动通信有限公司 Music collaborate splicing method and device
CN104778957A (en) * 2015-03-20 2015-07-15 广东欧珀移动通信有限公司 Song audio processing method and device
FR3038440A1 (en) * 2015-07-02 2017-01-06 Soclip! METHOD OF EXTRACTING AND ASSEMBLING SONGS FROM MUSICAL RECORDINGS
CN107481706A (en) * 2017-08-08 2017-12-15 腾讯音乐娱乐(深圳)有限公司 song Skewered method and device
CN107622775A (en) * 2015-03-20 2018-01-23 广东欧珀移动通信有限公司 The method and Related product of Noise song splicing
CN108831424A (en) * 2018-06-15 2018-11-16 广州酷狗计算机科技有限公司 Audio splicing method, apparatus and storage medium
CN110019922A (en) * 2017-12-07 2019-07-16 北京雷石天地电子技术有限公司 A kind of audio climax recognition methods and device
CN110377786A (en) * 2019-07-24 2019-10-25 中国传媒大学 Music emotion classification method
CN111459370A (en) * 2020-05-09 2020-07-28 Oppo广东移动通信有限公司 Song playing control method and device and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Main Melody Extraction Techniques for Popular Music; Li Wei; Feng Xiangyi; Wu Yiming; Zhang Xulong; Computer Science; 2017-05-15 (Issue 05); full text *

Also Published As

Publication number Publication date
CN112037739A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN104620313B (en) Audio signal analysis
EP1743286B1 (en) Feature extraction in a networked portable device
US20090199697A1 (en) Systems, methods, devices, and computer program products for providing music recommendation trekking
CN104282322B (en) A kind of mobile terminal and its method and apparatus for identifying song climax parts
CN111798821B (en) Sound conversion method, device, readable storage medium and electronic equipment
CN112037739B (en) Data processing method and device and electronic equipment
CN104395953A (en) Evaluation of beats, chords and downbeats from a musical audio signal
CN112399247B (en) Audio processing method, audio processing device and readable storage medium
WO2015114216A2 (en) Audio signal analysis
CN108766407A (en) Audio connection method and device
WO2023051246A1 (en) Video recording method and apparatus, device, and storage medium
CN111640411A (en) Audio synthesis method, device and computer readable storage medium
JP2017525023A (en) Extending content sources
US9755764B2 (en) Communicating data with audible harmonies
CN106095943B (en) It gives song recitals and knows well range detection method and device
CN107025902B (en) Data processing method and device
CN109510907B (en) Ring tone setting method and device
KR20160056104A (en) Analyzing Device and Method for User's Voice Tone
EP3575989B1 (en) Method and device for processing multimedia data
KR100542854B1 (en) Apparatus and method for providing music on demand service in mobile communication network
CN110400559A (en) A kind of audio synthetic method, device and equipment
WO2023273440A1 (en) Method and apparatus for generating plurality of sound effects, and terminal device
CN115171729B (en) Audio quality determination method and device, electronic equipment and storage medium
JP2007531933A (en) Interface configured to extract features from mobile stations and input media samples
CN117390217A (en) Method, device, equipment and medium for determining song segments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant