
CN112118481A - Audio clip generation method and device, player and storage medium

Info

Publication number: CN112118481A
Application number: CN202010987378.0A
Authority: CN
Inventors: 刘廷
Original assignee: Gree Electric Appliances Inc of Zhuhai
Current assignee: Gree Electric Appliances Inc of Zhuhai
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2020-12-22
Anticipated expiration: 2040-09-18
Also published as: CN112118481B

Abstract

The invention provides an audio clip generation method, an audio clip generation device, a player and a storage medium. The audio clip generation method comprises the following steps: acquiring an audio file; detecting the code stream change of the audio file; and generating a target audio clip by using a preset code stream range and a preset duration. This solves the problem of generating an audio clip that meets preset requirements so that a user can experience the charm of the whole audio through the clip. By detecting the code stream change of the acquired audio file and generating an audio clip that meets the preset conditions by using the preset code stream range and the preset duration, the climax part of the audio file (such as a song) is saved as a clip. The climax part of the audio file is thereby identified automatically and quickly, the user is helped to quickly preview a highlight of the audio, and the user can feel the charm of the whole audio through a clip that reflects the climax part.

Description

Audio clip generation method and device, player and storage medium

Technical Field

The invention belongs to the technical field of audio processing, and particularly relates to an audio clip generation method, an audio clip generation device, a player and a storage medium.

Background

With the continuing spread of internet technology, online multimedia music has gradually been accepted by users in China. More and more clients let non-paying users listen to a short segment of a piece of music on a trial basis, so that users can decide from the short segment whether to buy the single, which also saves them a great deal of time when choosing music.

In the prior art, the client simply cuts off the first 10 s or 15 s of a song for the trial, so the user does not really feel the charm of the song. There is therefore a strong need in the art to solve the problem of generating an audio clip that meets preset requirements so that the user can feel the charm of the whole audio through the clip.

Disclosure of Invention

The invention provides an audio clip generation method, an audio clip generation device, a player and a storage medium, which aim to solve the problem of generating an audio clip that meets preset requirements so that a user can feel the charm of the whole audio through the audio clip.

In a first aspect, the present invention provides an audio segment generating method, including:

acquiring an audio file;

detecting the code stream change of the audio file;

and generating a target audio clip by utilizing a preset code stream range and a preset time length.

According to an embodiment of the present invention, optionally, the obtaining an audio file includes:

loading an audio file;

and analyzing the loaded audio file.

According to the embodiment of the present invention, optionally, the generating the target audio segment by using the preset code stream range and the preset duration includes:

capturing, from the audio file, a target sub-segment whose code stream range matches a preset code stream range;

and generating a target audio clip according to the relation between the duration of the target sub-clip and a preset duration.

According to an embodiment of the present invention, optionally, the capturing a target sub-segment in the audio file, where the code stream range matches a preset code stream range, includes:

when detecting that the code stream of the audio file reaches the upper limit value of a preset code stream range, continuously detecting the code stream by taking the time position in the audio file corresponding to the upper limit value as a starting point;

when detecting that the code stream of the audio file reaches the lower limit value of a preset code stream range, taking the time position in the audio file corresponding to the lower limit value as an end point;

and capturing a segment between the starting point and the end point as a target sub-segment of which the code stream range is matched with a preset code stream range.

According to the embodiment of the present invention, optionally, when there is one target sub-segment, the generating a target audio segment according to the relationship between the duration of the target sub-segment and the preset duration includes:

when the time length of the target sub-segment is matched with a preset time length, the target sub-segment is a target audio segment;

and when the duration of the target sub-segment does not match the preset duration, intercepting, as the target audio segment, the audio segment that starts at the starting point and has the preset duration.

According to an embodiment of the present invention, optionally, when there are a plurality of target sub-segments, the generating a target audio segment according to a relationship between a duration of the target sub-segment and a preset duration includes:

when only one target sub-segment with the duration matched with the preset duration exists in the plurality of target sub-segments, the target sub-segment is a target audio segment;

and when a plurality of target sub-segments with the time length matched with the preset time length exist in the plurality of target sub-segments, determining one of the target sub-segments as a target audio segment.

According to the embodiment of the present invention, optionally, when there are a plurality of target sub-segments, the generating a target audio segment according to a relationship between a duration of the target sub-segment and a preset duration further includes:

when the time lengths of the plurality of target sub-segments are not matched with the preset time length, determining the time length of each target sub-segment;

determining the target sub-segment whose duration differs least from the preset duration;

and intercepting the audio segment with the preset time length from the determined starting point of the target sub-segment as the target audio segment.

According to an embodiment of the present invention, optionally, the determining the target sub-segment whose duration is the smallest difference from the preset duration includes:

and determining, from among the target sub-segments whose durations are greater than the preset duration, the target sub-segment whose duration differs least from the preset duration.

In a second aspect, the present invention provides an audio clip generating apparatus, comprising:

the acquisition module is used for acquiring an audio file;

the detection module is used for detecting the code stream change of the audio file;

and the generating module is used for generating the target audio clip by utilizing the preset code stream range and the preset time length.

In a third aspect, the present invention provides a player comprising: a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the audio clip generation method according to the first aspect.

In a fourth aspect, the present invention provides a storage medium comprising: the storage medium has stored thereon a computer program which, when executed by one or more processors, implements the audio clip generation method according to the first aspect.

Compared with the prior art, the invention at least has the following beneficial effects:

By detecting the code stream change of the acquired audio file and generating an audio clip that meets the preset conditions by using the preset code stream range and the preset duration, the climax part of the audio file (such as a song) is saved as a clip, so that the climax part is identified automatically and quickly and the user is helped to quickly preview a highlight of the audio. The user can feel the charm of the whole audio through the clip that reflects the climax part and may be attracted to buy the single. The user can thus efficiently find favorite music among massive amounts of online audio and choose correctly, saving a great deal of time and money.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a flowchart of an audio segment generating method according to an embodiment of the present invention;

Fig. 2 is a block diagram of an audio segment generating apparatus according to the second embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Example one

As shown in fig. 1, the present embodiment provides an audio segment generating method, which can be applied to a player, and in particular, can be applied to a network media player having an audio editing function, and the method includes the following steps:

Step S110, acquiring an audio file.

Optionally, the step of obtaining the audio file may include the following sub-steps:

Step S110-1, loading an audio file.

Specifically, the audio file is a variable bit rate (VBR) audio file, and a complete VBR audio file is loaded into the player; for example, the audio file may be a paid song obtained over a network. In general, a higher bit rate gives better sound quality, and the higher the pitch of an audio segment, the more space is needed to store it and the higher its bit rate. VBR encoding selects, for each audio frame, the bit rate best suited to that frame: frames with lower pitch get a lower bit rate and a smaller data size, while frames with higher pitch get a higher bit rate and a larger data size. The storage space of the audio data is thereby saved without loss of audio quality, the size of an mp3 file is further reduced, and the climax of the music can be reflected by the bit rate.
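The relation between frame size and bit rate described above can be illustrated with a minimal sketch. The document does not specify how the per-frame bit rate is obtained, so the function name `frame_bitrates_kbps` and the frame sizes below are hypothetical; the frame duration assumes MPEG-1 Layer III frames of 1152 samples at a 44.1 kHz sampling rate.

```python
from typing import List


def frame_bitrates_kbps(frame_sizes_bytes: List[int],
                        frame_duration_s: float) -> List[float]:
    """Return the bit rate of each frame in kbit/s.

    Each frame covers the same playback time, so a larger frame
    (more bytes) corresponds to a higher bit rate.
    """
    if frame_duration_s <= 0:
        raise ValueError("frame_duration_s must be positive")
    return [size * 8 / frame_duration_s / 1000 for size in frame_sizes_bytes]


# Assumption: one MPEG-1 Layer III frame holds 1152 samples at 44.1 kHz.
FRAME_DURATION_S = 1152 / 44100

# Hypothetical frame sizes in bytes: a quiet frame, a medium one, a loud one.
rates = frame_bitrates_kbps([209, 418, 836], FRAME_DURATION_S)
```

Under these assumptions the three frames come out at increasing bit rates, which is the property the method relies on when it treats high-bit-rate stretches as candidate climax parts.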

Step S110-2, analyzing the loaded audio file.

Step S120, detecting the code stream change of the audio file.

Step S130, generating a target audio clip by utilizing a preset code stream range and a preset duration.

The preset code stream (bit rate) range is [Min, Max], where Min is the lower limit value and Max is the upper limit value of the preset code stream range. The preset duration is L and may be set according to the actual application, for example 10 seconds or 15 seconds. By presetting the code stream range of the climax part of the audio file and the duration of the target audio clip, an audio clip meeting the preset requirements is generated, so that the user can feel the charm of the whole audio through the clip and, based on the impression gained when the clip is played by the player, choose whether to purchase or play the audio file.

As a preferred implementation manner, the step S130 of generating the target audio segment by using a preset code stream range and a preset time duration may include the following sub-steps:

Step S130-1, capturing, from the audio file, a target sub-segment whose code stream range matches a preset code stream range.

Specifically, an audio clip that may meet preset requirements is obtained by capturing a target sub-clip that meets a preset code stream range from an audio file.

As a preferred implementation manner, capturing a target sub-segment in an audio file, where a code stream range matches a preset code stream range, includes:

First, when the code stream of the audio file is detected to reach the upper limit value Max of the preset code stream range, the time position in the audio file corresponding to the upper limit value Max is taken as a starting point K1, and detection of the code stream continues.

Second, when the code stream of the audio file is detected to reach the lower limit value Min of the preset code stream range, the time position in the audio file corresponding to the lower limit value Min is taken as an end point K2.

Finally, the segment between the starting point K1 and the end point K2 is captured as a target sub-segment whose code stream range matches the preset code stream range. Step S130-2 is then executed to generate the audio clip meeting the preset requirements.
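The capture procedure above can be sketched as a single pass over a bit-rate trace. This is an illustration under stated assumptions, not the patented implementation: the trace is a hypothetical list of (time in seconds, bit rate) pairs, "reaches the upper limit" is read as `rate >= max_rate`, "reaches the lower limit" as `rate <= min_rate`, and a segment that is still open when the file ends is discarded. The name `capture_sub_segments` and the `first_only` flag are likewise assumptions.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (K1, K2) in seconds


def capture_sub_segments(samples: List[Tuple[float, float]],
                         min_rate: float,
                         max_rate: float,
                         first_only: bool = False) -> List[Segment]:
    """Capture (K1, K2) pairs from a list of (time, bit rate) samples."""
    segments: List[Segment] = []
    start: Optional[float] = None
    for t, rate in samples:
        if start is None:
            # Waiting for the bit rate to reach Max: this marks K1.
            if rate >= max_rate:
                start = t
        elif rate <= min_rate:
            # Bit rate has fallen to Min: this marks K2.
            segments.append((start, t))
            start = None
            if first_only:
                break
    return segments


# Hypothetical trace: two stretches that rise to 320 and fall back to 128 or below.
trace = [(0, 96), (10, 128), (20, 320), (30, 256),
         (40, 128), (50, 320), (60, 192), (70, 96)]
all_segments = capture_sub_segments(trace, min_rate=128, max_rate=320)
first_segment = capture_sub_segments(trace, 128, 320, first_only=True)
```

With this trace, `all_segments` holds the two pairs (20, 40) and (50, 70), while `first_only=True` stops after the first pair, trading completeness for speed.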

Step S130-2, generating a target audio clip according to the relationship between the duration of the target sub-segment and the preset duration.

Specifically, the duration of the target sub-segment is K2 - K1. The duration of the captured target sub-segment is compared with the preset duration, and a target audio segment meeting the preset duration is generated and stored. When the user wants to listen to the audio file on a trial basis, the audio segment is played directly for the user, who can then decide whether to purchase the audio file based on the listening experience.

According to the number of the captured target sub-segments and the duration of the target sub-segments, the target audio segments can be generated in different situations:

In the first case, when there is one target sub-segment, generating a target audio segment according to the relationship between the duration of the target sub-segment and the preset duration includes:

(1) When the duration of the target sub-segment matches the preset duration, the target sub-segment is the target audio segment.

Specifically, the duration of the target sub-segment matching the preset duration may mean that the two are equal. For example, if the preset duration is 15 seconds, the starting point K1 is at 1 minute 35 seconds of the audio file and the end point K2 is at 1 minute 50 seconds of the audio file, then the captured target sub-segment lasts 15 seconds, which equals the preset duration, so the two match.

In some cases, matching may also mean that the difference between the duration of the target sub-segment and the preset duration does not exceed a preset value. For example, with a preset duration of 15 seconds and a preset value of 1 second, if the starting point K1 is at 1 minute 35 seconds of the audio file and the end point K2 is at 1 minute 49 seconds of the audio file, then the captured target sub-segment lasts 14 seconds; the difference from the preset duration is 1 second, so the two are still considered to match.

(2) When the duration of the target sub-segment does not match the preset duration, the audio segment that starts at the starting point K1 and has the preset duration L is intercepted as the target audio segment.

It should be noted that when the duration of the target sub-segment does not match the preset duration, there are two possibilities. In the first, the duration of the target sub-segment is greater than the preset duration; the audio segment of the preset duration L starting from K1 is then intercepted from the audio file, that is, a portion of duration L is cut out of the captured target sub-segment as the target audio segment. In the second, the duration of the target sub-segment is less than the preset duration; the target sub-segment is then extended to the preset duration L, that is, a segment of duration L starting from the starting point K1 of the captured target sub-segment is taken as the target audio segment.
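The single-sub-segment case can be expressed compactly. The following is a minimal sketch, with times in seconds; the function name `clip_from_single_segment` and the `tolerance` parameter (standing in for the "preset value" of the tolerant match) are hypothetical, and clamping the clip to the end of the file is left out because the text does not address it.

```python
from typing import Tuple


def clip_from_single_segment(k1: float, k2: float, preset_len: float,
                             tolerance: float = 0.0) -> Tuple[float, float]:
    """Return (start, end) of the target audio segment.

    If the sub-segment duration K2 - K1 matches the preset duration
    (within `tolerance`), the sub-segment itself is used. Otherwise a
    clip of exactly `preset_len` seconds starting at K1 is taken, which
    trims a longer sub-segment and extends a shorter one.
    """
    duration = k2 - k1
    if abs(duration - preset_len) <= tolerance:
        return (k1, k2)
    return (k1, k1 + preset_len)


# K1 = 1 min 35 s = 95 s, preset duration L = 15 s.
exact = clip_from_single_segment(95, 110, 15)               # 15 s: matches
near = clip_from_single_segment(95, 109, 15, tolerance=1)   # 14 s: tolerated
longer = clip_from_single_segment(95, 125, 15)              # 30 s: trimmed
shorter = clip_from_single_segment(95, 100, 15)             # 5 s: extended
```

Both non-matching possibilities reduce to the same operation, taking L seconds from K1, which is why the sketch needs only one fallback branch.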

In this case there is one target sub-segment, either because only the first target sub-segment meeting the preset condition is captured while the code stream change of the audio file is being detected, or because detecting the code stream change of the whole audio file ultimately yields only one target sub-segment meeting the preset condition; this is not limited herein.

If only the first target sub-segment meeting the preset condition is captured while the code stream change of the audio file is being detected, detection efficiency is improved and a target audio segment meeting the preset condition can be generated quickly. If the code stream change of the whole audio file is detected, all target sub-segments meeting the preset condition in the whole audio file can be captured; when several are captured, the one meeting the preset requirements is selected from them as described in the second case, so that a more accurate target audio segment is obtained.

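In the second case, when there are a plurality of target sub-segments, generating a target audio segment according to the relationship between the duration of the target sub-segments and the preset duration includes:

(1) When only one of the plurality of target sub-segments has a duration matching the preset duration, that target sub-segment is the target audio segment.

(2) When several of the plurality of target sub-segments have durations matching the preset duration, one of them is determined as the target audio segment.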

Specifically, when several target sub-segments have durations matching the preset duration, the current audio file may contain several climax segments, and one of the target sub-segments may be selected at random as the target audio segment. Alternatively, since the climax of an audio file is often located in the middle of the whole file, the matching target sub-segment located at the middle time position may be selected as the target audio segment by default.

In this embodiment, when none of the durations of the captured target sub-segments matches the preset duration, the target audio segment is generated according to the difference between the duration of each target sub-segment and the preset duration: the target sub-segment whose duration differs least from the preset duration is determined, and an audio segment of the preset duration is intercepted from the audio file, starting from the starting point of that target sub-segment, as the target audio segment. Therefore, generating the target audio segment according to the relationship between the duration of the target sub-segment and the preset duration further includes:

when none of the durations of the plurality of target sub-segments matches the preset duration, determining the duration of each target sub-segment;

determining the target sub-segment whose duration differs least from the preset duration;

and intercepting, from the starting point of the determined target sub-segment, the audio segment of the preset duration as the target audio segment.

Of course, to reflect the climax content of the audio file more accurately, the target sub-segment whose duration differs least from the preset duration may be determined from among the target sub-segments whose durations are greater than the preset duration. The content of the audio segment intercepted from the audio file then belongs entirely to the climax content of the audio file, which gives the user a better experience.
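The selection among several target sub-segments can be sketched as follows. The names `clip_from_segments`, `tolerance`, `file_len` and `prefer_longer` are hypothetical. Two choices in this sketch are assumptions rather than statements of the text: among several matching sub-segments it picks the one whose midpoint is closest to the middle of the file (the text also allows a random choice), and when `prefer_longer` is set but no sub-segment exceeds the preset duration it falls back to all sub-segments.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (K1, K2) in seconds


def clip_from_segments(segments: List[Segment], preset_len: float,
                       file_len: float, tolerance: float = 0.0,
                       prefer_longer: bool = True) -> Optional[Segment]:
    """Choose the target audio segment from several captured sub-segments."""
    if not segments:
        return None

    def length(seg: Segment) -> float:
        return seg[1] - seg[0]

    matched = [s for s in segments
               if abs(length(s) - preset_len) <= tolerance]
    if len(matched) == 1:
        return matched[0]
    if matched:
        # Several matches: take the one nearest the middle of the file.
        middle = file_len / 2
        return min(matched, key=lambda s: abs((s[0] + s[1]) / 2 - middle))

    # No match: pick the sub-segment whose duration differs least from
    # the preset duration, optionally only among the longer ones.
    candidates = segments
    if prefer_longer:
        longer = [s for s in segments if length(s) > preset_len]
        if longer:
            candidates = longer
    best = min(candidates, key=lambda s: abs(length(s) - preset_len))
    return (best[0], best[0] + preset_len)


several = clip_from_segments([(20, 35), (100, 115), (200, 215)], 15, 240)
single = clip_from_segments([(20, 35), (100, 130)], 15, 240)
from_longer = clip_from_segments([(20, 32), (100, 120)], 15, 240)
from_any = clip_from_segments([(20, 32), (100, 120)], 15, 240,
                              prefer_longer=False)
```

In the last two calls the 12-second sub-segment is closer to the 15-second preset, but restricting the choice to longer sub-segments yields a clip that lies entirely inside a captured climax stretch.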

Example two

As shown in fig. 2, the present embodiment provides an audio segment generating apparatus, which includes the following modules:

the obtaining module 210 is configured to obtain an audio file.

The detecting module 220 is configured to detect a code stream change of the audio file.

The generating module 230 is configured to generate a target audio segment by using a preset code stream range and a preset duration.

It is understood that the obtaining module 210 can be configured to perform the step S110 in the first embodiment, the detecting module 220 can be configured to perform the step S120 in the first embodiment, and the generating module 230 can be configured to perform the step S130 in the first embodiment. The details of the specific steps are described in the first embodiment, and are not described herein again.
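The division of the device into three modules can be pictured as three pluggable callables run in sequence. This is only a structural sketch: the class name `AudioClipDevice`, the callable signatures and the stub behaviours below are hypothetical and stand in for real loading, bit-rate detection and clip generation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float]   # (time in seconds, bit rate)
Clip = Tuple[float, float]     # (start, end) in seconds


@dataclass
class AudioClipDevice:
    obtain: Callable[[str], bytes]            # obtaining module 210 (S110)
    detect: Callable[[bytes], List[Sample]]   # detecting module 220 (S120)
    generate: Callable[[List[Sample]], Clip]  # generating module 230 (S130)

    def run(self, path: str) -> Clip:
        audio = self.obtain(path)
        samples = self.detect(audio)
        return self.generate(samples)


# Stub modules for illustration only; they do not decode real audio.
device = AudioClipDevice(
    obtain=lambda path: b"\x00" * 4,
    detect=lambda audio: [(0.0, 128.0), (float(len(audio)), 320.0)],
    generate=lambda samples: (samples[-1][0], samples[-1][0] + 15.0),
)
clip = device.run("song.mp3")
```

Keeping the modules as separate callables mirrors the one-to-one mapping between modules 210, 220 and 230 and steps S110, S120 and S130.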

Further, the generating module 230 is configured to generate the target audio segment by using a preset code stream range and a preset duration, and includes:

capturing a target sub-segment of which the code stream range is matched with a preset code stream range in the audio file;

and generating a target audio clip according to the relation between the duration of the target sub-clip and the preset duration.

Specifically, when the generating module 230 generates the target audio segment according to the relationship between the duration of the target sub-segment and the preset duration, the target audio segment may be generated in different cases according to the number of captured target sub-segments and their durations:

(2) When the duration of the target sub-segment does not match the preset duration, the audio segment that starts at the starting point and has the preset duration L is intercepted as the target audio segment.

EXAMPLE III

The present invention provides a player, comprising: a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, implements the audio clip generation method provided by the foregoing embodiment.

In this embodiment, the player may be a network media player, and the Processor may be implemented by an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to execute the audio segment generating method in the foregoing embodiment. The method implemented when the computer program running on the processor is executed may refer to a specific embodiment of the audio segment generating method provided in the first embodiment of the present invention, and details are not described here again.

It will be appreciated that the player may also include multimedia components, input/output (I/O) interfaces, and communication components.

The multimedia components may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory or transmitted through the communication component. The audio component also includes at least one speaker for outputting audio signals. The I/O interface provides an interface between the processor and other interface modules, such as a keyboard, a mouse or buttons. These buttons may be virtual buttons or physical buttons. The communication component is used for wired or wireless communication with other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component may include a Wi-Fi module, a Bluetooth module and an NFC module.

Example four

The present invention provides a storage medium comprising: the storage medium has stored thereon a computer program which, when executed by one or more processors, implements the audio clip generation method according to the first aspect.

In this embodiment, the storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. The system and method embodiments described above are merely illustrative.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for generating an audio clip, comprising:

acquiring an audio file;

detecting the code stream change of the audio file;

2. The method of claim 1, wherein the obtaining an audio file comprises:

loading an audio file;

and analyzing the loaded audio file.

3. The method of claim 1, wherein the generating the target audio segment by using the preset code stream range and the preset duration comprises:

4. The method of claim 3, wherein the capturing of the target sub-segment of the audio file with the code stream range matching a preset code stream range comprises:

5. The method as claimed in claim 3 or 4, wherein when there is one target sub-segment, the generating a target audio segment according to the relationship between the duration of the target sub-segment and the preset duration comprises:

6. The method as claimed in claim 3 or 4, wherein when there are a plurality of target sub-segments, the generating a target audio segment according to the relationship between the duration of the target sub-segment and the preset duration comprises:

7. The method as claimed in claim 3 or 4, wherein when there are a plurality of target sub-segments, the generating a target audio segment according to the relationship between the duration of the target sub-segment and the preset duration further comprises:

8. The method of claim 7, wherein the determining the target sub-segment with the smallest difference between the duration and the preset duration comprises:

9. An audio clip generation apparatus, comprising:

the acquisition module is used for acquiring an audio file;

10. A player, comprising: a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the audio clip generating method of any of claims 1 to 8.

11. A storage medium, comprising: the storage medium has stored thereon a computer program which, when executed by one or more processors, implements the audio clip generation method of any one of claims 1 to 8.