CN109119101B

CN109119101B - Audio data processing method and device and mobile terminal

Info

Publication number: CN109119101B
Application number: CN201811102589.0A
Authority: CN
Inventors: 彭林峰
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2018-09-20
Filing date: 2018-09-20
Publication date: 2021-04-06
Anticipated expiration: 2038-09-20
Also published as: CN109119101A

Abstract

The embodiments of the application disclose a method and a device for processing audio data and a mobile terminal. The method comprises: obtaining audio data to be set as a ringtone; setting the audio data to an editable state if the duration of the audio data is greater than a preset duration threshold and no history information for setting a ringtone based on the audio data exists; and intercepting audio data to be recommended from the audio data based on an editing operation on the audio data. The method can improve the processing efficiency of the audio data and the user experience.

Description

Audio data processing method and device and mobile terminal

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing audio data, and a mobile terminal.

Background

With the rapid popularization of mobile terminals, mobile terminals, chiefly mobile phones, have become necessities of daily life and work. However, the most widely used mobile phones are concentrated in a few brands and models whose ringtones differ little, which causes confusion in crowded places when a phone rings.

At present, most mobile phones provide a certain number of ringtones for users to choose from, and users can also use downloaded audio data as the ringtone of the mobile phone. Users often dislike or rarely use the ringtones provided in the mobile phone and use downloaded audio data instead, but the downloaded audio data is usually long. For this reason, the user can manually intercept a section of the downloaded audio data, such as the climax section of a song, as the ringtone of the mobile phone.

However, in the process of manually intercepting the audio data, the user needs to open the audio data, browse it, and select the required segment (such as a climax segment) by repeated manual adjustment. The user therefore spends a large amount of time processing the audio data, the processing efficiency is low, and the user experience is poor.

Disclosure of Invention

The embodiment of the application aims to provide a method and a device for processing audio data and a mobile terminal, so as to solve the problems that in the prior art, the time cost is high and the user experience is poor in the process of processing the audio data.

In order to solve the above technical problem, the embodiment of the present application is implemented as follows:

in a first aspect, an embodiment of the present application provides a method for processing audio data, where the method includes:

acquiring audio data to be set as a ringtone;

if the duration of the audio data is greater than a preset duration threshold value and no history information for setting a ringtone based on the audio data exists, setting the audio data to be in an editable state;

and intercepting audio data to be recommended from the audio data based on editing operation on the audio data.

Optionally, the intercepting, based on the editing operation on the audio data, audio data to be recommended from the audio data includes:

receiving an editing instruction of the audio data;

extracting an audio peak area from the audio data according to the volume information and the audio information of the audio data;

and intercepting audio data to be recommended from the audio data according to the audio peak area.

Optionally, the extracting an audio peak region from the audio data according to the volume information and the audio information of the audio data includes:

dividing the audio data into multiple sections to obtain multiple sections of sub-audio data;

respectively determining candidate peak value areas from each section of sub-audio data, wherein the volume value contained in the candidate peak value areas and the audio value corresponding to the audio information are both greater than a preset selection threshold, and the duration of the candidate peak value areas is greater than a preset duration threshold;

and extracting candidate peak regions meeting the condition of a preset time length range from the candidate peak regions to serve as the audio peak regions.

respectively obtaining the sum of the volume value corresponding to the volume information of each section of sub-audio data and the audio numerical value corresponding to the audio information;

selecting sub-audio data meeting a preset segment number condition from the multiple segments of sub-audio data as the candidate peak area according to the numerical value of the sum corresponding to each segment of sub-audio data;

Optionally, the extracting, as the audio peak region, a candidate peak region that satisfies a predetermined time length range condition from the candidate peak regions includes:

if the time length of each candidate peak area in the candidate peak areas is in a preset time length range, acquiring the candidate peak areas with the time lengths in the preset time length range from the candidate peak areas as the audio peak areas;

and if the time length of each candidate peak area in the candidate peak areas is not in the preset time length range, acquiring a plurality of adjacent candidate peak areas with the total time length in the preset time length range from the candidate peak areas as the audio peak areas.

In a second aspect, an embodiment of the present application provides an apparatus for processing audio data, where the apparatus includes:

the audio acquisition module is used for acquiring audio data to be set as ring tones;

the editing module is used for setting the audio data into an editable state if the duration of the audio data is greater than a preset duration threshold and no history information for setting the ring tone based on the audio data exists;

and the intercepting module is used for intercepting the audio data to be recommended from the audio data based on the editing operation of the audio data.

Optionally, the intercepting module includes:

a receiving unit configured to receive an editing instruction for the audio data;

the extracting unit is used for extracting an audio peak area from the audio data according to the volume information and the audio information of the audio data;

and the intercepting unit is used for intercepting the audio data to be recommended from the audio data according to the audio peak area.

Optionally, the extracting unit is configured to:

Optionally, the extraction unit includes:

respectively acquiring the sum of the volume value of each section of sub-audio data and the audio numerical value corresponding to the audio information;

Optionally, the candidate peak regions satisfying a predetermined time range condition are extracted from the candidate peak regions as the audio peak regions, and are used for:

In a third aspect, an embodiment of the present application provides a mobile terminal, including a processor, a memory, and a computer program stored on the memory and operable on the processor, where the computer program, when executed by the processor, implements the steps of the audio data processing method provided in the foregoing embodiment.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the audio data processing method provided in the foregoing embodiment.

According to the technical scheme provided by the embodiment of the application, the audio data to be set as the ring tone are obtained, if the duration of the audio data is greater than the preset duration threshold value and no history information for setting the ring tone based on the audio data exists, the audio data are set to be in an editable state, and then the audio data to be recommended are intercepted from the audio data based on the editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment in a manual continuous adjustment mode, the processing efficiency of the audio data is improved, and the user experience is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a flowchart illustrating an embodiment of a method for processing audio data according to the present application;

FIG. 2 is a flowchart of another embodiment of a method for processing audio data according to the present application;

FIG. 3 is a schematic illustration of a display of a user operation for selecting a location of audio data according to the present application;

FIG. 4 is a flowchart of another embodiment of a method for processing audio data according to the present application;

FIG. 5 is a schematic diagram of an apparatus for processing audio data according to the present application;

FIG. 6 is a schematic structural diagram of a mobile terminal according to the present application.

Detailed Description

The embodiment of the application provides an audio data processing method and device and a mobile terminal.

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example one

As shown in FIG. 1, an embodiment of the present application provides a method for processing audio data. The method may be executed by a mobile terminal, for example a mobile phone or a tablet computer used by a user. With the method, after the audio data is obtained, an audio peak area can be automatically extracted according to the volume information and the audio information of the audio data and recommended to the user. The method may specifically comprise the following steps:

In step S102, audio data to be set as a ringtone is acquired.

The audio data to be intercepted and set as the ring tone may be any audio data, for example, audio data stored in the mobile terminal by the user, or audio data acquired online by the user, or may be audio data in video data stored in the mobile terminal or acquired online by the user, or the like.

In implementation, the development of the mobile internet has promoted the popularization of mobile terminals, and mobile terminals, chiefly mobile phones, have become necessities of daily life and work. In the mobile terminal market, however, usage is concentrated in a small number of phone brands and models, and manufacturers usually store only a certain number of ringtones in a phone at the factory for users to select. Users of the same brand and model therefore have few ringtones to choose from, many people end up using the same ringtone, and in crowded places a ringing phone can cause confusion. Besides the ringtones preinstalled in the mobile phone, the user can intercept part of some audio data and use it as the ringtone, to distinguish the phone from those of other users. However, during interception the user needs to browse all of the audio data to accurately locate the climax part to be intercepted, and sometimes needs to repeat the operation many times to find the required part, which costs the user considerable time and leads to a poor user experience. The embodiments of the present application therefore provide a technical solution to the above problems, as described below.

Taking as an example audio data that the user acquires online for use as a mobile phone ringtone, the user can select favorite audio data through audio playing software installed on the mobile terminal, download it to the mobile terminal, and use it as the audio data to be set as the ringtone. In addition, the user can edit the audio data, and the edited audio data can also be used as the acquired audio data to be set as the ringtone.

In practical application, some of the downloaded audio data may be suitable for direct use as a mobile phone ringtone. For this reason, a selection condition may also be set, for example a ringtone duration requirement such as a duration range of 40 seconds to 1 minute. If the duration of the downloaded audio data meets the ringtone duration requirement, the audio data can be used directly as the ringtone. If it does not meet the requirement, the downloaded audio data is taken as the acquired audio data to be set as a ringtone and is processed in the subsequent steps.

In this embodiment, the audio data to be set as the ringtone is described as being downloaded online by the user. In practical applications other cases are also possible, for example audio data that the user has stored in a mobile terminal such as a mobile phone, or audio data recorded by the user with the mobile terminal; any such audio data may serve as the acquired audio data to be set as the ringtone.

In step S104, if the duration of the audio data is greater than the predetermined duration threshold and there is no history information for setting a ringtone based on the audio data, the audio data is set to an editable state.

The predetermined duration threshold may be any duration, such as 20 seconds or 30 seconds. The editable state can be indicated by the color of the editing button: if the audio data is in the editable state, the editing button corresponding to the audio data can be highlighted, and if the audio data is not editable, the editing button can be shown in gray.

If the duration of the audio data to be set as the ringtone is greater than the predetermined duration threshold, the history information of ringtone settings is searched; if no history information exists for setting a ringtone based on the audio data, the audio data is set to the editable state. For example, suppose the predetermined duration threshold is 30 seconds and the duration of the audio data to be set as a ringtone is 1 minute. The duration is greater than the threshold, and the audio data has not yet been used to make a ringtone, so no history information for setting a ringtone based on it exists. The audio data can therefore be set to the editable state: an edit button can be added beside the audio data and highlighted.
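The check in step S104 can be sketched in a few lines of Python. This is only an illustrative sketch: the `RingtoneHistory` class, the `is_editable` function, and the use of string identifiers for audio data are assumptions made for the example, and the 30-second threshold is simply the example value given above.

```python
from dataclasses import dataclass, field

# Example threshold from the text; any duration may be used.
DURATION_THRESHOLD_S = 30.0


@dataclass
class RingtoneHistory:
    """Hypothetical store of audio IDs already used to set a ringtone."""
    used_ids: set = field(default_factory=set)

    def has_record(self, audio_id: str) -> bool:
        return audio_id in self.used_ids


def is_editable(audio_id: str, duration_s: float, history: RingtoneHistory,
                threshold_s: float = DURATION_THRESHOLD_S) -> bool:
    """Editable only if the audio is longer than the threshold AND
    no ringtone-setting history exists for it."""
    return duration_s > threshold_s and not history.has_record(audio_id)
```

With an empty history, a 1-minute track is editable and a 20-second track is not; once the track's identifier is recorded in the history, the same 1-minute track is no longer editable.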

In step S106, the audio data to be recommended is intercepted from the audio data based on the editing operation on the audio data.

The editing operation may be to intercept partial data in the audio data, or to select partial combined data in the audio data, for example, to intercept a climax part of a certain section in the audio data, or to combine two climax parts in the audio data, and the specific editing operation may be different according to the actual application situation.

In implementation, the audio data may be edited, and a middle portion of the audio data may be cut out as the audio data to be recommended, for example, if a segment of audio data is 3 minutes, a middle 30 seconds portion, that is, a portion from 1 minute 15 seconds to 1 minute 45 seconds, may be cut out as the audio data to be recommended.

In addition, the audio data may also be edited according to a preset selection rule. For example, the preset selection rule may obtain data from different time periods according to the type of the audio data. If the audio data is a song with a duration of 1 minute to 3 minutes, the middle 30 seconds of the audio data are obtained as the audio data to be recommended. If the audio data is short music with a duration of 30 seconds to 1 minute, the short music is used directly as the audio data to be recommended. If the audio data is long audio data with a duration of more than 3 minutes, the data from 3 minutes to 3 minutes 30 seconds may be selected as the audio data to be recommended. An operation by which the user adjusts the duration of the audio data to be recommended may also be received.
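As a rough illustration of the preset selection rule described above, the following Python sketch returns the start and end of the recommended clip for a given duration. The function name `recommend_window`, the handling of the exact 1-minute and 3-minute boundaries, and the clamping of the clip to the end of the audio are assumptions, not part of the described method.

```python
def recommend_window(duration_s: float, clip_s: float = 30.0):
    """Return (start_s, end_s) of the clip to recommend.

    Assumed boundaries: up to 60 s counts as short music,
    up to 180 s as a song, anything longer as long audio.
    """
    if duration_s <= 60.0:
        # Short music is used as-is.
        return (0.0, duration_s)
    if duration_s <= 180.0:
        # Song: take the middle clip_s seconds.
        start = (duration_s - clip_s) / 2
        return (start, start + clip_s)
    # Long audio: take the data starting at the 3-minute mark,
    # clamped so the clip never runs past the end (assumption).
    return (180.0, min(180.0 + clip_s, duration_s))
```

For a 3-minute song this yields the window from 75 to 105 seconds, that is, 1 minute 15 seconds to 1 minute 45 seconds, matching the earlier example.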

The above embodiment provides an optional and realizable editing operation on audio data, and the specific editing operation method may be various, which is not limited in this embodiment.

The embodiment of the application provides an audio data processing method, which comprises the steps of obtaining audio data to be set to be ring tones, setting the audio data to be in an editable state if the duration of the audio data is greater than a preset duration threshold value and no history information exists for setting the ring tones based on the audio data, and intercepting the audio data to be recommended from the audio data based on editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment in a manual continuous adjustment mode, the processing efficiency of the audio data is improved, and the user experience is improved.

Example two

As shown in FIG. 2, an embodiment of the present application provides a method for processing audio data. The method may be executed by a mobile terminal, for example a mobile phone or a tablet computer used by a user. With the method, after the audio data is obtained, an audio peak area can be automatically extracted according to the volume information and the audio information of the audio data and recommended to the user. The method may specifically comprise the following steps:

In step S202, audio data to be set as a ringtone is acquired.

For the specific processing procedure of S202, reference may be made to relevant contents of S102 in the first embodiment, which is not described herein again.

In step S204, if the duration of the audio data is greater than the predetermined duration threshold and there is no history information for setting a ringtone based on the audio data, the audio data is set to an editable state.

For the specific processing procedure of S204, reference may be made to the related content of S104 in the first embodiment, which is not described herein again.

In step S206, an editing instruction for audio data is received.

The editing instruction may be a click operation by the user on an editing button, or may be a click operation on the audio data.

In implementation, after the audio data is set to the editable state in step S204, an editing button may be placed next to the audio data, and a click operation by the user on the editing button is received as the editing instruction for the audio data. A long press, double click, or click operation on the audio data itself may also be received as an editing instruction for the audio data.

In practical application, if a preset commonly used audio data interception scheme is stored in the mobile terminal, the corresponding editing operation can be performed on the audio data according to that scheme, and the audio data to be recommended is intercepted from the audio data. If no such preset interception scheme is stored, the mobile terminal may extract an audio peak area from the audio data according to the volume information and the audio information of the audio data, and intercept the audio data to be recommended according to the audio peak area. This processing can be carried out in various ways; one optional way is described below.

In step S208, the audio data is divided into a plurality of segments, resulting in a plurality of segments of sub-audio data.

In implementation, the audio data may be divided into multiple segments according to a predetermined time interval. The predetermined time interval may be of any size, or may be a length of time related to the ringtone duration, where the ringtone duration is the time for which audio is played when the mobile terminal receives a prompt such as an incoming call or a short message. The ringtone duration is generally 20 seconds, or 30 to 45 seconds, and differs between application scenarios, which is not limited in this application. The audio data is divided according to the time interval into multiple segments of sub-audio data of the same duration. The predetermined time interval may be equal to or smaller than the ringtone duration, which is not limited in the embodiment of the present application. For example, the predetermined time interval may be set to 15 seconds; if the acquired audio data is 3 minutes long in total, it is divided into 12 segments of equal duration. The time interval may also be determined according to the duration of the audio data. For example, if the total length of the acquired audio data is 3 minutes 05 seconds (185 seconds), the time interval may be set to 5 seconds or 37 seconds, giving 37 or 5 segments of sub-audio data of equal duration.
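The equal-length division in step S208 can be illustrated with a short Python sketch. The function names `split_equal` and `equal_intervals` and the use of whole seconds are assumptions made for the example; the embodiment does not prescribe how the interval is chosen.

```python
def split_equal(total_s: int, interval_s: int):
    """Split [0, total_s) into equal segments of interval_s seconds.

    Returns a list of (start_s, end_s) pairs. Raises ValueError if the
    interval does not divide the total duration evenly.
    """
    if interval_s <= 0 or total_s % interval_s != 0:
        raise ValueError("interval must evenly divide the total duration")
    return [(s, s + interval_s) for s in range(0, total_s, interval_s)]


def equal_intervals(total_s: int):
    """Whole-second intervals (excluding 1 s and the full length)
    that divide the audio into segments of equal duration."""
    return [d for d in range(2, total_s) if total_s % d == 0]
```

Here `split_equal(180, 15)` produces 12 segments, and `equal_intervals(185)` gives 5 and 37 seconds as the only non-trivial intervals for audio that is 3 minutes 05 seconds long.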

In step S210, a candidate peak area is determined from each segment of sub-audio data, where the volume value included in the candidate peak area and the audio value corresponding to the audio information are both greater than a predetermined selection threshold, and the duration of the candidate peak area is greater than a predetermined duration threshold.

The volume information and the audio information in the audio data may be parameter information including the pitch, the intensity, and the tone of the audio. The audio peak region may be a region of the audio data where the values of the volume information and the audio information are large (e.g., a region exceeding a predetermined threshold).

The volume value and the audio value corresponding to the audio information may be data recording the audio features, for example an amplitude value that determines the volume value and a frequency value that determines the audio information. The predetermined selection threshold may be determined according to the total volume value and the total audio value of the audio data. For example, if the total volume value of audio data with a duration of 3 minutes is 600 dB and the audio data is divided into 12 segments of sub-audio data of 15 seconds each, the average volume value of each segment is 50 dB; the predetermined selection threshold may then be 50 dB, or a value above the average such as 60 dB. The predetermined duration threshold may be any duration less than the duration of each segment of sub-audio data, such as 3 seconds or 5 seconds.
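The threshold arithmetic in this example can be written out as a small Python sketch. The function name `average_volume_threshold` and the optional `margin_db` parameter are assumptions introduced only to show how a threshold at or above the per-segment average might be derived.

```python
def average_volume_threshold(total_volume_db: float, segment_count: int,
                             margin_db: float = 0.0) -> float:
    """Per-segment average of the total volume value, plus an
    optional margin to place the threshold above the average."""
    if segment_count <= 0:
        raise ValueError("segment_count must be positive")
    return total_volume_db / segment_count + margin_db
```

With a total of 600 dB over 12 segments the average is 50 dB, and a margin of 10 dB gives the higher threshold of 60 dB mentioned in the example.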

In implementation, for each segment of sub-audio data obtained in step S208, the volume value it contains and the audio value corresponding to the audio information may be calculated. If both values are greater than the predetermined selection threshold and the duration is also greater than the predetermined duration threshold, the sub-audio data may be determined as a candidate peak area.

For example, suppose the predetermined selection threshold is set to 50 dB for the volume value and 100 Hz for the audio value corresponding to the audio information, so that the volume value and the audio value of a candidate peak region must be greater than 50 dB and 100 Hz respectively. Suppose also that audio data with a duration of 3 minutes is divided into 12 segments of sub-audio data of 15 seconds each, and that the predetermined duration threshold is 10 seconds. If one segment of sub-audio data contains an audio data segment 11 seconds long in which the volume value at every point is greater than 50 dB and the audio value at every point is greater than 100 Hz, the duration of that audio data segment exceeds the 10-second duration threshold, and it can be determined as a candidate peak region. If several of the 12 segments of sub-audio data contain audio data segments that satisfy these conditions, each audio data segment that satisfies the predetermined selection threshold and the predetermined duration threshold may be determined as a candidate peak region.
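One way to picture the candidate peak search of step S210 is the Python sketch below. It assumes, purely for illustration, that each segment of sub-audio data is described by one volume value (dB) and one audio value (Hz) per second; the function name `candidate_peaks` and its parameters are hypothetical, and the default thresholds are the example values of 50 dB, 100 Hz, and 10 seconds.

```python
def candidate_peaks(volumes, freqs, seg_start_s=0,
                    vol_thr=50.0, freq_thr=100.0, min_len_s=10):
    """Find runs of consecutive seconds where both the volume value and
    the audio value exceed their thresholds, keeping only runs that are
    longer than min_len_s. Returns (start_s, end_s) pairs."""
    if len(volumes) != len(freqs):
        raise ValueError("volumes and freqs must have the same length")
    peaks = []
    run_start = None
    n = len(volumes)
    # Iterate one step past the end so a run reaching the end is closed.
    for i in range(n + 1):
        above = i < n and volumes[i] > vol_thr and freqs[i] > freq_thr
        if above and run_start is None:
            run_start = i
        elif not above and run_start is not None:
            if i - run_start > min_len_s:
                peaks.append((seg_start_s + run_start, seg_start_s + i))
            run_start = None
    return peaks
```

For a 15-second segment in which seconds 2 through 12 exceed both thresholds, the run lasts 11 seconds and is reported as a candidate; a run of exactly 10 seconds is not, because the duration must be greater than the threshold.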

In step S212, candidate peak regions satisfying a predetermined time length range condition are extracted from the candidate peak regions as audio peak regions.

The predetermined time range condition may be any time range meeting the requirement of the ring tone time of the mobile phone, and for example, may be greater than 10 seconds and less than 20 seconds, or closest to 20 seconds. The specific situation can be adjusted according to the application scenario, and the embodiment of the application does not make specific requirements for the situation.

In implementation, candidate peak regions that satisfy the predetermined time length range condition are found among the candidate peak regions obtained in step S210 and extracted as audio peak regions. For example, the condition may be set to a duration greater than 10 seconds and closest to 20 seconds. If there are several candidate peak regions with different durations, those longer than 10 seconds are selected and compared, and the one or more candidate peak regions whose duration is closest to 20 seconds are extracted as audio peak regions.
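The "greater than 10 seconds and closest to 20 seconds" condition can be sketched as follows. The function name `pick_closest` is hypothetical, and measuring closeness as the absolute difference from the target duration is an assumption; the embodiment does not state how ties or durations above 20 seconds are treated.

```python
def pick_closest(candidates, min_len_s=10, target_s=20):
    """From (start_s, end_s) candidates, keep those longer than
    min_len_s and return every one whose duration is closest to
    target_s (several are returned when they tie)."""
    eligible = [(s, e) for s, e in candidates if e - s > min_len_s]
    if not eligible:
        return []
    best = min(abs((e - s) - target_s) for s, e in eligible)
    return [(s, e) for s, e in eligible if abs((e - s) - target_s) == best]
```

Given candidates lasting 8, 12, 18, and 18 seconds, the 8-second candidate is discarded and both 18-second candidates are returned as audio peak regions.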

In step S214, if the time length of each of the candidate peak regions is within the predetermined time length range, a candidate peak region having a time length within the predetermined time length range is acquired as an audio peak region from the candidate peak regions.

In implementation, if the predetermined duration range is set to be greater than 10 seconds and less than 20 seconds, one or more candidate peak regions satisfying the predetermined duration range may be extracted as audio peak regions.

In step S216, if the time length of each of the candidate peak regions is not within the predetermined time length range, a plurality of adjacent candidate peak regions having a total time length within the predetermined time length range are acquired as the audio peak region from the candidate peak regions.

In implementation, if none of the candidate peak regions has a duration within the predetermined time length range, a plurality of adjacent candidate peak regions whose total duration is within the predetermined time length range may be acquired as the audio peak region. For example, suppose the predetermined time length range is 10 to 20 seconds and every candidate peak region is shorter than 10 seconds. If one candidate peak region lasts 8 seconds, the adjacent candidate peak region lasts 9 seconds, and the interval between them is 2 seconds, the combined duration of the two regions, including the interval, is 19 seconds, which falls within the range of 10 to 20 seconds, so the combination of the two candidate peak regions can be used as the audio peak region. If there are several groups of adjacent candidate peak regions whose combined duration satisfies the predetermined time length range, each of them may also be used as an audio peak region.
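The merging of adjacent candidate peak regions in step S216 can be sketched as below. The function name `merge_adjacent` is hypothetical; counting the gap between regions in the total duration follows the 8 + 2 + 9 = 19 second example, while treating the range bounds as inclusive is an assumption.

```python
def merge_adjacent(candidates, min_s=10, max_s=20):
    """Combine runs of two or more consecutive candidates whose overall
    span (gaps included) lies within [min_s, max_s].
    Returns the merged (start_s, end_s) spans."""
    ordered = sorted(candidates)
    merged = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            span = ordered[j][1] - ordered[i][0]
            if span > max_s:
                break  # adding further candidates only lengthens the span
            if span >= min_s:
                merged.append((ordered[i][0], ordered[j][1]))
    return merged
```

Two candidates at 100 to 108 seconds and 110 to 119 seconds merge into a single 19-second region from 100 to 119 seconds, whereas two candidates that are far apart produce no merged region.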

In step S218, the audio data to be recommended is intercepted from the audio data according to the audio peak area.

In implementation, after the audio peak area is obtained in step S212, the corresponding time point may be found in the audio data according to the start position and the end position of the audio peak area, and then the audio data may be intercepted based on the found time point, and the intercepted result may be displayed in the mobile terminal for the user to view.
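The interception in step S218 amounts to mapping the start and end positions of the audio peak area to sample positions and slicing. The sketch below assumes the audio data is available as a plain sequence of samples with a known sample rate; the function names `cut_clip` and `format_time` are hypothetical helpers, not part of the described method.

```python
def cut_clip(samples, sample_rate, start_s, end_s):
    """Return the samples between start_s and end_s (in seconds)."""
    start = int(round(start_s * sample_rate))
    end = int(round(end_s * sample_rate))
    if not 0 <= start < end <= len(samples):
        raise ValueError("clip must lie inside the audio data")
    return samples[start:end]


def format_time(seconds):
    """Format a position in seconds as m:ss for display on a timeline."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"
```

For instance, cutting the span from 110 to 123 seconds out of 3 minutes of audio yields a 13-second clip whose end points would be displayed as 1:50 and 2:03.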

In addition, according to the audio peak area, a time period corresponding to the audio peak area may be highlighted on a timeline for audio data playing.

For example, if the audio peak region to be intercepted from audio data with a duration of 3 minutes starts at 1 minute 50 seconds and ends at 2 minutes 03 seconds, the time region from 1 minute 50 seconds to 2 minutes 03 seconds may be highlighted on the time progress bar of the audio playback, for example displayed in a different color or in bold, with the positions of 1 minute 50 seconds and 2 minutes 03 seconds marked. If there are multiple audio peak regions, they may be marked on the timeline in different colors, or the time period corresponding to each audio peak region may be labeled to distinguish them. The user may adjust the obtained audio peak region as required. For example, if the audio data is a song, the intercepted audio peak region may be a climax part of the song; if a few lyrics of the climax part fall before the start point of the audio peak region, the user may drag the start point of the corresponding time period to the left on the time progress bar to bring those lyrics into the audio peak region, and may likewise drag the end point of the time period to the right to include lyrics that fall after it, as shown in FIG. 3.

The embodiment of the application provides an audio data processing method, which comprises the steps of obtaining audio data to be set as a ringtone, setting the audio data to an editable state if the duration of the audio data is greater than a preset duration threshold and no history information exists for setting a ringtone based on the audio data, and intercepting the audio data to be recommended from the audio data based on an editing operation on the audio data. Therefore, in the process of processing the audio data, the user does not need to browse the audio data and select a required segment by repeated manual adjustment, the processing efficiency of the audio data is improved, and the user experience is improved.

Example three

As shown in fig. 4, an embodiment of the present application provides an audio data processing method. An execution body of the method may be a mobile terminal, for example a mobile phone or a tablet computer, and the mobile terminal may be a mobile terminal used by a user. In this method, after the audio data is obtained, an audio peak region can be automatically extracted according to the volume information and the audio information of the audio data and recommended to the user. The method may specifically comprise the following steps:

In step S402, audio data to be set as a ringtone is acquired.

For the specific processing procedure of S402, reference may be made to the relevant content of S102 in the first embodiment, which is not described herein again.

In step S404, if the duration of the audio data is greater than the predetermined duration threshold and there is no history information for setting a ringtone based on the audio data, the audio data is set to an editable state.

For the specific processing procedure of S404, reference may be made to relevant contents of S104 in the first embodiment, which is not described herein again.
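As a minimal sketch of the condition in step S404, the check can be expressed as a single predicate. The function name, parameter names, and the 40-second threshold used in the example are assumptions for illustration; the embodiment does not fix a threshold value.

```python
def should_enter_edit_state(duration_s: float, threshold_s: float,
                            has_ringtone_history: bool) -> bool:
    """Return True when the audio is longer than the threshold and
    no earlier ringtone-setting record exists for it."""
    return duration_s > threshold_s and not has_ringtone_history


# Hypothetical values: a 180-second song and a 40-second threshold.
editable = should_enter_edit_state(180.0, 40.0, has_ringtone_history=False)
```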

In step S406, an editing instruction for audio data is received.

The specific processing procedure of S406 may refer to the related content of S206 in the second embodiment, which is not described herein again.

In step S408, the audio data is divided into a plurality of segments, resulting in a plurality of segments of sub-audio data.

The specific processing procedure of S408 may refer to the related content of S208 in the second embodiment, which is not described herein again.
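One possible way to perform the division in step S408 is sketched below. It assumes fixed-length segments measured in samples; the embodiment does not state how segment boundaries are chosen, and `split_into_segments` is a hypothetical helper.

```python
from typing import List, Tuple


def split_into_segments(total_samples: int, sample_rate: int,
                        segment_seconds: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sample ranges that cover the audio.

    The last segment may be shorter than the others.
    """
    step = int(segment_seconds * sample_rate)
    if step <= 0:
        raise ValueError("segment length must cover at least one sample")
    return [(start, min(start + step, total_samples))
            for start in range(0, total_samples, step)]


# 10 samples at 1 sample per second, cut into 4-second segments.
segments = split_into_segments(total_samples=10, sample_rate=1, segment_seconds=4)
```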

In step S410, the sum of the volume value of each segment of sub-audio data and the audio value corresponding to the audio information is obtained.

The sum may be the sum of the volume values (e.g., decibel values) corresponding to the volume information and the audio values corresponding to the audio information.

In implementation, the volume values in each segment of sub-audio data and the audio values corresponding to the audio information are obtained and summed. For example, the sum of the volume values of all the points in a segment of sub-audio data may be 500 dB, and the sum of all the audio values may be 600 Hz.
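The summation in step S410 can be sketched as follows. The per-point values are invented so that they reproduce the totals in the example above (500 for volume, 600 for the audio value), and `segment_sum` is a hypothetical name. The sketch simply adds the two totals as the text describes, without any unit normalization.

```python
from typing import Sequence


def segment_sum(volume_values: Sequence[float],
                audio_values: Sequence[float]) -> float:
    """Sum of all volume values plus all audio values in one segment."""
    return sum(volume_values) + sum(audio_values)


# Illustrative per-point values for one segment of sub-audio data.
volumes_db = [100.0, 150.0, 250.0]      # totals 500
audio_values_hz = [100.0, 200.0, 300.0]  # totals 600
score = segment_sum(volumes_db, audio_values_hz)
```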

In step S412, sub audio data satisfying a predetermined segment number condition is selected from the plurality of segments of sub audio data as the candidate peak area according to the value of the sum corresponding to each segment of sub audio data.

The predetermined number of segments may be one or more segments, and the specific setting may be determined according to the time length of the audio data or the number of times of the climax part in the audio data.

In implementation, the sum of the volume value and the audio value of each segment of sub-audio data is obtained in step S410, and the segments of sub-audio data are ranked according to this sum. If the predetermined number of segments is set to 2, the two segments of sub-audio data with the highest sums of the volume value and the audio value may be used as the candidate peak regions.
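The ranking in step S412 can be sketched as a top-N selection over the per-segment sums. The function name `select_candidates` and the sample sums are assumptions for illustration; returning the chosen indices in time order is a choice of this sketch that makes later adjacency checks simpler.

```python
from typing import List, Sequence


def select_candidates(sums: Sequence[float], segment_count: int) -> List[int]:
    """Return the indices of the `segment_count` segments with the largest
    sums, listed in their original (time) order."""
    ranked = sorted(range(len(sums)), key=lambda i: sums[i], reverse=True)
    return sorted(ranked[:segment_count])


# Hypothetical sums for four segments; the predetermined number of segments is 2.
candidates = select_candidates([700.0, 1100.0, 900.0, 1300.0], segment_count=2)
```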

In step S414, a candidate peak region satisfying a predetermined time length range condition is extracted from the candidate peak regions as an audio peak region.

For the specific processing procedure of S414, reference may be made to the related content of S212 in the second embodiment, which is not described herein again.

In step S416, if the time length of each of the candidate peak regions is within the predetermined time length range, a candidate peak region having a time length within the predetermined time length range is acquired as an audio peak region from the candidate peak regions.

For the specific processing procedure of S416, reference may be made to the related content of S214 in the second embodiment, which is not described herein again.

In step S418, if the time length of each of the candidate peak regions is not within the predetermined time length range, a plurality of adjacent candidate peak regions having a total time length within the predetermined time length range are acquired as the audio peak region from the candidate peak regions.

For the specific processing procedure of S418, reference may be made to the relevant content of S216 in the second embodiment, which is not described herein again.
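Steps S414 to S418 can be sketched as a two-stage selection. The sketch below makes several assumptions that the text does not confirm: candidates are (start, end) pairs in whole seconds sorted by start time, two candidates count as adjacent when one ends exactly where the next begins, and the first adjacent run whose total duration fits the range is returned. `pick_peak_regions` is a hypothetical name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_second, end_second), integers to keep equality exact


def pick_peak_regions(candidates: List[Span], min_len: int, max_len: int) -> List[Span]:
    """Keep candidates whose duration is within [min_len, max_len]; if none
    fits, merge adjacent candidates until the total duration fits."""
    def in_range(length: int) -> bool:
        return min_len <= length <= max_len

    fitting = [c for c in candidates if in_range(c[1] - c[0])]
    if fitting:
        return fitting

    # No single candidate fits: try runs of adjacent candidates.
    for i in range(len(candidates)):
        run_start, run_end = candidates[i]
        for j in range(i + 1, len(candidates)):
            if candidates[j][0] != run_end:  # gap found, the run is broken
                break
            run_end = candidates[j][1]
            if in_range(run_end - run_start):
                return [(run_start, run_end)]
    return []


# Three adjacent 5-second candidates, with an allowed range of 8 to 12 seconds.
merged = pick_peak_regions([(100, 105), (105, 110), (110, 115)], 8, 12)
```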

In step S420, the audio data to be recommended is intercepted from the audio data according to the audio peak region.

For the specific processing procedure of S420, reference may be made to the related content of S218 in the second embodiment, which is not described herein again.
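The interception in step S420 amounts to cutting the samples that fall inside the audio peak region. A minimal sketch, assuming the audio is held as a flat sequence of samples at a known sample rate (`clip_samples` and the example values are hypothetical):

```python
from typing import List, Sequence


def clip_samples(samples: Sequence[int], sample_rate: int,
                 start_s: float, end_s: float) -> List[int]:
    """Return the samples between start_s and end_s (seconds), clamped to the audio."""
    first = max(0, int(round(start_s * sample_rate)))
    last = min(len(samples), int(round(end_s * sample_rate)))
    return list(samples[first:last])


# A 180-second track at 10 samples per second; the peak region is 1:50 to 2:03.
track = list(range(180 * 10))
ringtone = clip_samples(track, sample_rate=10, start_s=110, end_s=123)
```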

The embodiment of the application provides an audio data processing method, which comprises the steps of obtaining audio data to be set to be ring tone, setting the audio data to be in an editable state if the duration of the audio data is greater than a preset duration threshold value and no history information for setting the ring tone based on the audio data exists, and intercepting the audio data to be recommended from the audio data based on the editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment in a manual continuous adjustment mode, the processing efficiency of the audio data is improved, and the user experience is improved.

Example four

Based on the same idea as the foregoing audio data processing method, an embodiment of the present application further provides an audio data processing apparatus, as shown in fig. 5.

The audio data processing apparatus includes: an audio obtaining module 501, an editing module 502, and an intercepting module 503, wherein:

an audio obtaining module 501, configured to obtain audio data to be set as a ringtone;

an editing module 502, configured to set the audio data to an editable state if the duration of the audio data is greater than a predetermined duration threshold and there is no history information for setting a ringtone based on the audio data;

an intercepting module 503, configured to intercept audio data to be recommended from the audio data based on an editing operation on the audio data.

In this embodiment of the present application, the intercepting module 503 includes:

In an embodiment of the present application, the extracting unit includes:

In an embodiment of the present application, the extracting unit is configured to:

The embodiment of the application provides an audio data processing device, which obtains audio data to be set as a ring tone, sets the audio data to an editable state if the duration of the audio data is greater than a preset duration threshold value and no history information for setting the ring tone based on the audio data exists, and then intercepts the audio data to be recommended from the audio data based on the editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment by repeated manual adjustment, so the processing efficiency of the audio data is improved, and the user experience is improved.

Example five

Figure 6 is a schematic diagram of a hardware configuration of a mobile terminal implementing various embodiments of the present invention.

The mobile terminal 600 includes, but is not limited to: a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 6 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.

The processor 610 is configured to obtain audio data to be set as a ring tone;

the processor 610 is further configured to set the audio data to an editable state if a duration of the audio data is greater than a predetermined duration threshold and there is no history information for setting a ringtone based on the audio data;

the processor 610 is further configured to intercept audio data to be recommended from the audio data based on the editing operation on the audio data.

In addition, the processor 610 is further configured to receive an editing instruction for the audio data; the processor 610 is further configured to extract an audio peak region from the audio data according to the volume information and the audio information of the audio data;

the processor 610 is further configured to intercept audio data to be recommended from the audio data according to the audio peak area.

In addition, the processor 610 is further configured to divide the audio data into multiple segments to obtain multiple segments of sub-audio data; respectively determining candidate peak value areas from each section of sub-audio data, wherein the volume value contained in the candidate peak value areas and the audio value corresponding to the audio information are both greater than a preset selection threshold, and the duration of the candidate peak value areas is greater than a preset duration threshold; and extracting candidate peak regions meeting the condition of a preset time length range from the candidate peak regions to serve as the audio peak regions.
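The threshold-based determination of candidate peak regions described above can be sketched as a scan for runs of consecutive points. The sketch assumes equally spaced points carrying a (volume, audio value) pair, uses separate thresholds for the two quantities, and measures duration as a number of points; these details and the name `find_candidate_regions` are assumptions for illustration, not taken from the embodiment.

```python
from typing import List, Sequence, Tuple


def find_candidate_regions(points: Sequence[Tuple[float, float]],
                           volume_threshold: float,
                           audio_threshold: float,
                           min_points: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index runs in which both values exceed
    their thresholds and the run is longer than `min_points` points."""
    regions: List[Tuple[int, int]] = []
    run_start = None
    for i, (volume, audio) in enumerate(points):
        above = volume > volume_threshold and audio > audio_threshold
        if above and run_start is None:
            run_start = i                      # a new run begins
        elif not above and run_start is not None:
            if i - run_start > min_points:     # keep only long enough runs
                regions.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the segment.
    if run_start is not None and len(points) - run_start > min_points:
        regions.append((run_start, len(points)))
    return regions


# Hypothetical (volume, audio value) points for one segment of sub-audio data.
segment = [(50, 100), (80, 500), (85, 520), (90, 510), (40, 90), (88, 505)]
found = find_candidate_regions(segment, volume_threshold=70,
                               audio_threshold=400, min_points=2)
```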

In addition, the processor 610 is further configured to divide the audio data into multiple segments to obtain multiple segments of sub-audio data; respectively acquiring the sum of the volume value of each section of sub-audio data and the audio numerical value corresponding to the audio information; selecting sub-audio data meeting a preset segment number condition from the multiple segments of sub-audio data as the candidate peak area according to the numerical value of the sum corresponding to each segment of sub-audio data; and extracting candidate peak regions meeting the condition of a preset time length range from the candidate peak regions to serve as the audio peak regions.

In addition, the processor 610 is further configured to, if the duration of each of the candidate peak regions is within a predetermined duration range, obtain, from the candidate peak regions, a candidate peak region having a duration within a predetermined duration range as the audio peak region.

In addition, the processor 610 is further configured to, if the duration of each of the candidate peak regions is not within the predetermined duration range, obtain, from the candidate peak regions, a plurality of adjacent candidate peak regions whose total duration is within the predetermined duration range as the audio peak region.

The embodiment of the application provides a mobile terminal, which is characterized in that audio data to be set to ring tone are acquired, if the duration of the audio data is greater than a preset duration threshold value and no history information for setting the ring tone based on the audio data exists, the audio data is set to be in an editable state, and then the audio data to be recommended is intercepted from the audio data based on the editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment in a manual continuous adjustment mode, the processing efficiency of the audio data is improved, and the user experience is improved.

It should be understood that, in the embodiment of the present application, the radio frequency unit 601 may be used for receiving and sending signals during a message sending and receiving process or a call process. Specifically, it receives downlink data from a base station and forwards the data to the processor 610 for processing; in addition, it transmits uplink data to the base station. In general, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. Further, the radio frequency unit 601 may also communicate with a network and other devices through a wireless communication system.

The mobile terminal provides the user with wireless broadband internet access through the network module 602, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.

The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into an audio signal and output it as sound. The audio output unit 603 may also provide audio output related to a specific function performed by the mobile terminal 600 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.

The input unit 604 is used to receive audio or video signals. The input unit 604 may include a Graphics Processing Unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of a still picture or video obtained by an image capturing apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 606. The image frames processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. The microphone 6042 can receive sound and process it into audio data. In the phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 601 and then output.

The mobile terminal 600 also includes at least one sensor 605, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 6061 and/or the backlight when the mobile terminal 600 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 605 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.

The display unit 606 is used to display information input by the user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 607 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near the touch panel 6071 using a finger, stylus, or any suitable object or accessory). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 610, and receives and executes commands from the processor 610. In addition, the touch panel 6071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 607 may include other input devices 6072 in addition to the touch panel 6071. Specifically, the other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein again.

Further, the touch panel 6071 can be overlaid on the display panel 6061, and when the touch panel 6071 detects a touch operation on or near the touch panel 6071, the touch operation is transmitted to the processor 610 to determine the type of the touch event, and then the processor 610 provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although the touch panel 6071 and the display panel 6061 are shown in fig. 6 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.

The interface unit 608 is an interface through which an external device is connected to the mobile terminal 600. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 600 or may be used to transmit data between the mobile terminal 600 and external devices.

The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone, and the like. Further, the memory 609 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.

The processor 610 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 609 and calling data stored in the memory 609, thereby integrally monitoring the mobile terminal. Processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.

The mobile terminal 600 may further include a power supply 611 (e.g., a battery) for supplying power to the various components, and preferably, the power supply 611 is logically connected to the processor 610 via a power management system, so that functions of managing charging, discharging, and power consumption are performed via the power management system.

Preferably, an embodiment of the present invention further provides a mobile terminal, which includes a processor 610, a memory 609, and a computer program stored in the memory 609 and capable of running on the processor 610, where the computer program is executed by the processor 610 to implement each process of the foregoing audio data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

Example six

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned audio data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The embodiment of the application provides a computer-readable storage medium, which is characterized in that audio data to be set to be ring tone are acquired, if the duration of the audio data is greater than a preset duration threshold value and no history information for setting the ring tone based on the audio data exists, the audio data are set to be in an editable state, and then the audio data to be recommended are intercepted from the audio data based on the editing operation of the audio data. Therefore, in the process of processing the audio data, a user does not need to browse the audio data and select a required segment in a manual continuous adjustment mode, the processing efficiency of the audio data is improved, and the user experience is improved.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (such as modulated data signals and carrier waves).

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method of processing audio data, the method comprising:

acquiring audio data to be set as a ringtone;

intercepting audio data to be recommended from the audio data based on editing operation on the audio data;

the intercepting of the audio data to be recommended from the audio data based on the editing operation on the audio data comprises:

receiving an editing instruction of the audio data;

extracting an audio peak area from the audio data according to the volume information and the audio information of the audio data;

intercepting audio data to be recommended from the audio data according to the audio peak area;

wherein, the extracting an audio peak area from the audio data according to the volume information and the audio information of the audio data comprises:

dividing the audio data into multiple sections to obtain multiple sections of sub-audio data; respectively determining candidate peak value areas from each section of sub-audio data, wherein the volume value contained in the candidate peak value areas and the audio value corresponding to the audio information are both greater than a preset selection threshold, and the duration of the candidate peak value areas is greater than a preset duration threshold; extracting candidate peak regions meeting the condition of a preset time length range from the candidate peak regions to serve as the audio peak regions; or

dividing the audio data into multiple sections to obtain multiple sections of sub-audio data; respectively acquiring the sum of the volume value of each section of sub-audio data and the audio numerical value corresponding to the audio information; selecting sub-audio data meeting the preset segment number condition from the multiple segments of sub-audio data as candidate peak areas according to the numerical value of the sum corresponding to each segment of sub-audio data; extracting candidate peak regions meeting the condition of a preset time length range from the candidate peak regions to serve as the audio peak regions;

wherein the extracting, from the candidate peak regions, a candidate peak region satisfying a predetermined time length range condition as the audio peak region comprises:

if the time length of each candidate peak area in the candidate peak areas is in a preset time length range, acquiring the candidate peak areas with the time lengths in the preset time length range from the candidate peak areas as the audio peak areas;

2. An apparatus for processing audio data, the apparatus comprising:

the intercepting module is used for intercepting audio data to be recommended from the audio data based on editing operation on the audio data;

wherein, the intercept module comprises:

the intercepting unit is used for intercepting audio data to be recommended from the audio data according to the audio peak area;

wherein the extraction unit is configured to:

wherein, the candidate peak regions satisfying the condition of the predetermined time length range are extracted from the candidate peak regions as the audio peak regions, and are used for:

3. A mobile terminal, characterized in that it comprises a processor, a memory and a computer program stored on said memory and executable on said processor, said computer program, when executed by said processor, implementing the steps of the method of processing audio data as claimed in claim 1.