CN108831423A

CN108831423A - Method, apparatus, terminal and storage medium for extracting the main melody track from audio data

Info

Publication number: CN108831423A
Application number: CN201810537265.3A
Authority: CN
Inventors: 孔令城
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2018-05-30
Filing date: 2018-05-30
Publication date: 2018-11-16
Anticipated expiration: 2038-05-30
Also published as: CN108831423B

Abstract

This application discloses a method, apparatus, terminal and storage medium for extracting the main melody track from audio data, in the field of audio processing. The method includes: extracting multiple tracks from target audio data and determining the time-period information of the voiced periods in each track, to obtain a time-period information set for each track; determining, in the lyrics corresponding to the target audio data, the time-period information of each lyric line, to obtain a time-period information set for the lyrics; determining the matching degree between each track's time-period information set and the lyrics' time-period information set; and determining the track with the highest matching degree as the main melody track of the target audio data. The application addresses the problem that the current track-by-track elimination method is unsuitable for audio in uncommon or atypical musical styles, where it easily identifies a non-main-melody track as the main melody. It thereby improves the generality and accuracy of identifying the main melody track in audio.

Description

Method, apparatus, terminal and storage medium for extracting the main melody track from audio data

Technical field

Embodiments of this application relate to the field of audio processing, and in particular to a method, apparatus, terminal and storage medium for extracting the main melody track from audio data.

Background

Musical Instrument Digital Interface (MIDI) is an interface used for producing music audio. Each MIDI audio file may contain multiple tracks, each holding the music of a different instrument. In MIDI audio, one track is usually used to store the main melody, while the other tracks store the accompaniment.

Servers can provide services such as music analysis, music retrieval, music recognition and similar-music recommendation based on the main melody of audio. In the related art, the tracks of a MIDI file are eliminated one by one, and the single remaining track is determined as the main melody of the MIDI audio.

However, for MIDI audio in uncommon or atypical musical styles, this track-by-track elimination method easily identifies a non-main-melody track as the main melody track. How to determine the main melody track of a song effectively has therefore become an urgent problem.

Summary of the invention

To solve the problems in the prior art, embodiments of this application provide a method, apparatus, terminal and storage medium for extracting the main melody track from audio data. The technical solutions are as follows:

According to a first aspect of the embodiments of this application, a method for extracting the main melody track from audio data is provided. The method includes:

extracting multiple tracks from target audio data and determining the time-period information of the voiced periods in each track, to obtain a time-period information set for each track;

determining, in the lyrics corresponding to the target audio data, the time-period information of each lyric line, to obtain a time-period information set for the lyrics;

determining the matching degree between each track's time-period information set and the lyrics' time-period information set; and

determining the track with the highest matching degree as the main melody track of the target audio data.

According to a second aspect of the embodiments of this application, an apparatus for extracting the main melody track from audio data is provided. The apparatus includes:

a first determining module, configured to extract multiple tracks from target audio data and determine the time-period information of the voiced periods in each track, to obtain a time-period information set for each track;

a second determining module, configured to determine, in the lyrics corresponding to the target audio data, the time-period information of each lyric line, to obtain a time-period information set for the lyrics;

a third determining module, configured to determine the matching degree between each track's time-period information set and the lyrics' time-period information set; and

a fourth determining module, configured to determine the track with the highest matching degree as the main melody track of the target audio data.

According to a third aspect of the embodiments of this application, a terminal is provided. The terminal includes a processor and a memory. The memory stores at least one instruction, which is loaded and executed by the processor to implement the method for extracting the main melody track from audio data according to the first aspect.

According to a fourth aspect of the embodiments of this application, a computer-readable storage medium is provided. The storage medium stores at least one instruction, which is loaded and executed by a processor to implement the method for extracting the main melody track from audio data according to the first aspect.

The technical solutions provided by the embodiments of this application bring the following beneficial effects:

The time-period information sets of multiple tracks in the target audio data are matched against the time-period information of the lyrics of that audio data, and the track with the highest matching degree is determined as the main melody track. This works because, in general, among all tracks of the target audio data, the main melody track's time-period information set matches the lyrics' time-period information most closely. This solves the problem that the current track-by-track elimination method is unsuitable for audio in uncommon or atypical musical styles and easily identifies a non-main-melody track as the main melody track, and improves the generality and accuracy of identifying the main melody track in audio.

Description of the drawings

To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings below show only some embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Figure 1A is a flowchart of a method for extracting the main melody track from audio data according to an embodiment of this application;

Figure 1B is a comparison diagram of the time-period information set of each track and the time-period information set of the lyrics according to an embodiment of this application;

Fig. 2 is a flowchart of a method for extracting the main melody track from audio data according to another embodiment of this application;

Fig. 3 is a structural block diagram of an apparatus for extracting the main melody track from audio data according to an embodiment of this application;

Fig. 4 is a structural block diagram of a terminal 400 according to an exemplary embodiment of this application.

Detailed description of embodiments

To make the objectives, technical solutions and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.

Figure 1A is a flowchart of a method for extracting the main melody track from audio data according to an embodiment of this application. As shown in Figure 1A, the method includes the following steps.

Step 101: Extract multiple tracks from the target audio data, determine the time-period information of the voiced periods in each track, and obtain a time-period information set for each track.

In this embodiment, the target audio data includes, but is not limited to, songs, music, and sung or hummed pieces, and may be obtained locally or from a server.

In this embodiment, the target audio data is in MIDI format.

Audio data in MIDI format usually contains one track that stores the main melody and several tracks that store the accompaniment. Each track usually contains both voiced periods and silent periods. Because silent segments have no reference value when the main melody track is determined later, after extracting the tracks from the target audio data only the time-period information of the voiced periods in each track needs to be determined to obtain each track's time-period information set, which reduces unnecessary processing on the terminal.

Audio data in MIDI format is usually an instruction file with the .mid extension. This file contains at least the start time and end time of every voiced period of each track. Each track of the target audio data can therefore be extracted from the corresponding instruction file, together with its time-period information set, which contains the start and end times of all voiced periods of that track.
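The text states only that the .mid file yields the start and end times of each voiced period. A minimal sketch of how the per-track start/end arrays might be built, assuming each track has already been decoded (by some MIDI parser, not shown) into note intervals in milliseconds, and treating overlapping notes, or notes separated by a silence of at most a hypothetical `max_gap_ms`, as one voiced period:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def voiced_periods(notes: List[Interval], max_gap_ms: int = 0) -> List[Interval]:
    """Merge the note intervals of one track into voiced periods.

    Assumption (not from the patent): overlapping notes, or notes separated
    by a silence of at most max_gap_ms, belong to the same voiced period.
    """
    merged: List[Interval] = []
    for start, end in sorted(notes):
        if merged and start - merged[-1][1] <= max_gap_ms:
            # Extend the current voiced period.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Silence longer than max_gap_ms: start a new voiced period.
            merged.append((start, end))
    return merged


def track_time_sets(tracks: List[List[Interval]], max_gap_ms: int = 0):
    """Return (start_times, end_times), one row per track."""
    st, et = [], []
    for notes in tracks:
        periods = voiced_periods(notes, max_gap_ms)
        st.append([s for s, _ in periods])
        et.append([e for _, e in periods])
    return st, et
```

For example, with `max_gap_ms=0`, a track with notes (600, 1200), (1100, 2000) and (6300, 9600) yields two voiced periods, [600, 2000] and [6300, 9600]. The gap tolerance is an illustrative design choice; a real track may need a larger value so that short rests between notes do not split one phrase into several periods.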

Optionally, the start times and the end times of all voiced periods of all tracks in the audio data are each represented by a two-dimensional array, denoted:

$$\mathrm{midi\_st} = \big[[t_{1,1}, t_{1,2}, \ldots, t_{1,m_1}], \ldots, [t_{k,1}, t_{k,2}, \ldots, t_{k,m_k}]\big]$$

$$\mathrm{midi\_et} = \big[[t_{1,1}, t_{1,2}, \ldots, t_{1,m_1}], \ldots, [t_{k,1}, t_{k,2}, \ldots, t_{k,m_k}]\big]$$

where k is the number of tracks; midi_st records the start times of all voiced periods of the k tracks and midi_et records their end times; t_{k,i} denotes the specific start/end time, in milliseconds, of the i-th voiced period in the k-th track; and m_k is the number of voiced periods in the k-th track.

Step 102: In the lyrics corresponding to the target audio data, determine the time-period information of each lyric line and obtain the time-period information set of the lyrics.

In this embodiment, the lyrics corresponding to the target audio data describe the content that the target audio data performs; that is, the lyrics are what is sung in the target audio data.

Take lyrics abc of target audio data ABC as an example. Lyrics abc read as follows:

[628,1980] a1a2a3a4a5a6,

[6301,9523] b1b2b3b4b5b6,

[12002,54301] c1c2c3c4c5c6,

……

In lyrics abc, "a1a2a3a4a5a6", "b1b2b3b4b5b6", "c1c2c3c4c5c6" and so on are the lyric lines, and the "[]" before each line is text describing that line's time attributes, usually in milliseconds. The time attributes of a lyric line are its start time and its end time. For example, [628,1980] is the time attribute text of the line "a1a2a3a4a5a6": 628 is its start time and 1980 is its end time. The line "a1a2a3a4a5a6" therefore plays during 628 ms to 1980 ms, i.e., it starts playing at 628 ms and stops at 1980 ms.

Lyrics are usually stored in a lyrics file with the .qrc extension, which contains at least the lyric lines and the start and end time of each line. The time-period information of each lyric line can therefore be extracted from the lyrics file corresponding to the target audio data, yielding the time-period information set of the lyrics.
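Assuming the simplified `[start,end] text` line layout of lyrics abc above (real .qrc files may use a different, richer layout), the start and end arrays can be pulled out with a regular expression. This is a sketch for that assumed layout, not a .qrc parser:

```python
import re
from typing import List, Tuple

# Matches the simplified "[start,end] text" line layout of lyrics abc,
# with an optional trailing comma after the text.
LINE_RE = re.compile(r"^\s*\[(\d+),\s*(\d+)\]\s*(.*?),?\s*$")


def parse_lyrics(text: str) -> Tuple[List[int], List[int], List[str]]:
    """Return (qrc_st, qrc_et, lines) for the lyric lines found in `text`."""
    starts: List[int] = []
    ends: List[int] = []
    lines: List[str] = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # blank lines, "……", or anything not in the assumed layout
        starts.append(int(m.group(1)))
        ends.append(int(m.group(2)))
        lines.append(m.group(3))
    return starts, ends, lines
```

Lines that do not match the pattern are skipped rather than rejected, which keeps the sketch tolerant of headers or ellipses in the file.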

Optionally, the start times and the end times of the lyric lines are each represented by a one-dimensional array, denoted:

$$\mathrm{qrc\_st} = [t_1, t_2, \ldots, t_n]$$

$$\mathrm{qrc\_et} = [t_1, t_2, \ldots, t_n]$$

where qrc_st records the start times of all lyric lines in the lyrics, qrc_et records their end times, t_i denotes the specific start/end time of the i-th lyric line in milliseconds, and n is the number of lyric lines.

Step 103: Determine the matching degree between each track's time-period information set and the lyrics' time-period information set.

Specifically, for each time-period entry A_i in the lyrics' time-period information set, the terminal searches each track's time-period information set for an entry B_j that satisfies a preset matching condition with A_i. The ratio of the number of entries A_i for which a matching B_j is found to the total number of entries in the lyrics' set is taken as the matching degree between that track's set and the lyrics' set.

Here i is an integer from 1 to n, and j is an integer from 1 to m, where m is the number of entries in the track's set.
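The ratio described in step 103 can be sketched as follows. The matching condition is passed in as a predicate so that either case described next can be plugged in; the brute-force search of every B_j for each A_i (O(n·m) comparisons) is an illustrative choice, not something the patent specifies:

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def matching_degree(lyrics: List[Interval], track: List[Interval],
                    matches: Callable[[Interval, Interval], bool]) -> float:
    """Fraction of lyric periods A_i for which some track period B_j matches."""
    if not lyrics:
        return 0.0  # assumption: with no lyric periods, nothing can match
    hits = sum(1 for a in lyrics if any(matches(a, b) for b in track))
    return hits / len(lyrics)
```

Because the numerator counts lyric periods rather than track periods, a track with many extra voiced periods is not rewarded for them; only coverage of the lyric lines matters.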

Optionally, the preset matching condition includes at least the following two cases:

Case 1: the time difference between the start time of A_i and the start time of B_j is within a preset first threshold, and the time difference between the end time of A_i and the end time of B_j is also within the first threshold.

Take a preset first threshold of 500 ms as an example. The lyrics' time-period information set contains the entries A₁ = [628,1980], A₂ = [6301,9523] and A₃ = [12002,54301]. The first track's set contains B₁ = [600,2000], B₂ = [6300,9600] and B₃ = [12000,54400], and the second track's set contains C₁ = [501,1580], C₂ = [6000,7000] and C₃ = [10000,53000]. For A₁, the terminal finds that both B₁ and C₁ have start and end times within 500 ms of A₁'s. For A₂, B₂ matches, but C₂ does not: its start time differs from A₂'s by 301 ms, but its end time differs by 2523 ms. For A₃, only B₃ matches, because C₃'s start time differs from A₃'s by 2002 ms. Matching entries are thus found for all three lyric entries in the first track's set and for one in the second track's set, so the matching degree between the first track's set and the lyrics' set is 3/3 = 1, and that of the second track is 1/3.

Case 2: the sum of the time difference between the start times of A_i and B_j and the time difference between the end times of A_i and B_j is within a preset second threshold.

Take a preset second threshold of 500 ms with the same entries as an example. For A₁, B₁ matches (28 ms + 20 ms = 48 ms), while C₁ does not (127 ms + 400 ms = 527 ms). For A₂, B₂ matches (1 ms + 77 ms = 78 ms), and for A₃, B₃ matches (2 ms + 99 ms = 101 ms); C₂ and C₃ do not match. Matching entries are thus found for all three lyric entries in the first track's set and for none in the second track's set, so the matching degree of the first track is 1 and that of the second track is 0.

It should be noted that this embodiment does not limit the specific values or the setting method of the preset first threshold and the preset second threshold.

Step 104: Determine the track with the highest matching degree as the main melody track of the target audio data.

Continuing the examples in step 103:

In case 1, the terminal determines the first track, which has the highest matching degree (1 > 1/3), as the main melody track of the audio data.

In case 2, the terminal determines the first track, which has the highest matching degree (1 > 0), as the main melody track of the audio data.
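The two matching conditions and step 104 can be checked against the worked example with a short sketch. The names `case1`, `case2` and `main_melody_track` are illustrative, "within the threshold" is read as "less than or equal to", and `Fraction` is used only so that the matching degrees come out as exact ratios:

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)
Condition = Callable[[Interval, Interval], bool]


def case1(threshold: int) -> Condition:
    # Start difference and end difference each within the first threshold.
    return lambda a, b: abs(a[0] - b[0]) <= threshold and abs(a[1] - b[1]) <= threshold


def case2(threshold: int) -> Condition:
    # Sum of start difference and end difference within the second threshold.
    return lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= threshold


def matching_degree(lyrics: List[Interval], track: List[Interval], cond: Condition) -> Fraction:
    hits = sum(1 for a in lyrics if any(cond(a, b) for b in track))
    return Fraction(hits, len(lyrics))


def main_melody_track(lyrics: List[Interval], tracks: Dict[str, List[Interval]], cond: Condition) -> str:
    # Step 104: pick the track with the highest matching degree.
    return max(tracks, key=lambda name: matching_degree(lyrics, tracks[name], cond))


lyrics = [(628, 1980), (6301, 9523), (12002, 54301)]          # A1..A3
tracks = {
    "track1": [(600, 2000), (6300, 9600), (12000, 54400)],    # B1..B3
    "track2": [(501, 1580), (6000, 7000), (10000, 53000)],    # C1..C3
}

for label, cond in [("case 1", case1(500)), ("case 2", case2(500))]:
    degrees = {name: matching_degree(lyrics, p, cond) for name, p in tracks.items()}
    print(label, degrees, main_melody_track(lyrics, tracks, cond))
```

With these example values, case 1 gives matching degrees 1 and 1/3 and case 2 gives 1 and 0, so track1 is selected in both cases. Case 2 is the stricter test here because it bounds the combined error rather than each error separately.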

Figure 1B is a comparison diagram of the time-period information set of each track and the time-period information set of the lyrics according to an embodiment of this application. As shown in Figure 1B, the horizontal axis indicates playback time; row 0 on the vertical axis shows the lyrics' time-period information set, and rows 1 to 12 show the time-period information sets of the individual tracks. Figure 1B shows at a glance that the set of the track in row 1 matches the lyrics' set most closely, so that track is determined as the main melody track of the target audio data.

In conclusion the method provided in this embodiment for extracting theme track in audio data, by target audio data In the corresponding period information aggregate of multiple tracks, time segment information corresponding with the lyrics information of the target audio data It is matched, the highest track of matching degree is determined as to the theme track of target audio data, due under normal conditions, target In all tracks of audio data, the corresponding period information aggregate of theme track time segment information corresponding with lyrics information Between matching degree highest；Solving current track, method for removing is not suitable for the audio of music style minority's abnormal type one by one, The problem of melody tracks non-master in audio are easily determined as the theme track of the audio has reached the master improved in identification audio The universality of melody tracks and the effect of accuracy.

In preset finite time section, if the time segment information in the corresponding period information aggregate of track meet it is pre- If matching condition, then the explanation of maximum probability is within other non-default periods, in the corresponding period information aggregate of the track Time segment information also comply with preset matching condition.Therefore in order to reduce the processing pressure of processor, terminal only need to be to target audio The segment of data carries out subsequent calculating.

Fig. 2 is the process of the method for theme track in the extraction audio data provided in another embodiment of the application Figure, as shown in Fig. 2, the method for theme track includes the following steps in the extraction audio data.

Step 201, multiple tracks in target audio data are extracted, determine sound each within the scope of preset finite time The time segment information of voice period in rail obtains the corresponding period information aggregate of each track.

For example, if the earliest time in the lyrics' time-period information set of the target audio data is 25000 and the latest is 225000, the finite time range can be chosen within [25000,225000].

Taking a preset finite time range of [40000,100000] as an example, after extracting the tracks from the target audio data, the terminal determines the time-period information of the voiced periods of each track within [40000,100000], obtaining the time-period information set ([60000,200000], [630000,960000]) for the first track and the set ([50100,158000], [600000,700000]) for the second track.
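Steps 201 and 202 apply the same finite time range to the tracks and to the lyrics. The patent does not say how a period that straddles the range boundary is handled; the sketch below assumes it is simply dropped, keeping only periods that lie entirely inside the range. The helper `lyric_span` (a hypothetical name) computes the outer bound within which the range may be chosen, as in the [25000,225000] example:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def lyric_span(lyrics: List[Interval]) -> Interval:
    """Earliest start and latest end over all lyric periods."""
    return min(s for s, _ in lyrics), max(e for _, e in lyrics)


def within(periods: List[Interval], window: Interval) -> List[Interval]:
    """Keep only periods lying entirely inside `window` (an assumed policy)."""
    lo, hi = window
    return [(s, e) for s, e in periods if lo <= s and e <= hi]
```

Applying `within` with the same window to every track and to the lyrics keeps the subsequent matching-degree computation comparable across tracks while touching only a fraction of the data.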

It should be noted that this embodiment does not limit the value range or the setting method of the preset finite time range.

Step 202: In the lyrics corresponding to the target audio data, determine the time-period information of each lyric line within the finite time range and obtain the time-period information set of the lyrics.

Continuing the example in step 201, with a preset finite time range of [40000,100000], the terminal determines, in the lyrics corresponding to the target audio data, the time-period information of each lyric line within [40000,100000], obtaining the time-period information set of the lyrics.

Step 203: Determine the matching degree between each track's time-period information set and the lyrics' time-period information set.

Step 204: Among the tracks whose matching degree reaches a preset matching degree threshold, determine the track with the highest matching degree as the main melody track of the target audio data.

When the target audio data contains no main melody, the terminal could wrongly take the track with the highest matching degree as the main melody track. To avoid this, a matching degree threshold is preset, and only tracks whose matching degree reaches the threshold are candidates for the main melody track.

After obtaining the matching degree between each track's set and the lyrics' set, the terminal first discards the tracks whose matching degree does not reach the preset threshold, and determines the main melody track only among the remaining tracks. If no track remains after this filtering, the terminal determines that the target audio data contains no main melody track.
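Step 204 can be sketched as a filter followed by a maximum. The threshold value is not given in the patent, "reaches" is read as greater than or equal to, and returning `None` stands for the outcome "the target audio data contains no main melody track":

```python
from typing import Dict, Optional


def pick_main_melody(degrees: Dict[str, float], min_degree: float) -> Optional[str]:
    """Highest-degree track among those reaching min_degree, else None."""
    # Discard tracks whose matching degree does not reach the threshold.
    candidates = {name: d for name, d in degrees.items() if d >= min_degree}
    if not candidates:
        return None  # judged to have no main melody track
    return max(candidates, key=lambda name: candidates[name])
```

For instance, with degrees {"t1": 0.9, "t2": 0.4} and a threshold of 0.5, only t1 survives the filter and is returned.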

It should be noted that step 203 in this embodiment is similar to step 103, so it is not described again here.

In this embodiment, restricting the calculation to a preset finite time range reduces the processing load on the processor, because a track whose time-period entries satisfy the preset matching condition within that range very likely satisfies it elsewhere as well. In addition, presetting a matching degree threshold and treating only tracks that reach it as candidates prevents the terminal from mistaking the highest-matching track for the main melody track when the target audio data contains no main melody.

The following are apparatus embodiments of this application. For details not described in the apparatus embodiments, refer to the corresponding method embodiments above.

Referring to Fig. 3, which shows a structural block diagram of an apparatus for extracting the main melody track from audio data according to an embodiment of this application. The apparatus includes a first determining module 301, a second determining module 302, a third determining module 303 and a fourth determining module 304.

The first determining module 301 is configured to extract multiple tracks from the target audio data and determine the time-period information of the voiced periods in each track, to obtain a time-period information set for each track;

the second determining module 302 is configured to determine, in the lyrics corresponding to the target audio data, the time-period information of each lyric line, to obtain the time-period information set of the lyrics;

the third determining module 303 is configured to determine the matching degree between each track's time-period information set and the lyrics' time-period information set; and

the fourth determining module 304 is configured to determine the track with the highest matching degree as the main melody track of the target audio data.

In conclusion the device provided in this embodiment for extracting theme track in audio data, by target audio data In the corresponding period information aggregate of multiple tracks, time segment information corresponding with the lyrics information of the target audio data It is matched, the highest track of matching degree is determined as to the theme track of target audio data, due under normal conditions, target In all tracks of audio data, the corresponding period information aggregate of theme track time segment information corresponding with lyrics information Between matching degree highest；Solving current track, method for removing is not suitable for the audio of music style minority's abnormal type one by one, The problem of melody tracks non-master in audio are easily determined as the theme track of the audio has reached the master improved in identification audio The universality of melody tracks and the effect of accuracy.

Based on the apparatus provided in the above embodiment, optionally, the first determining module is further configured to determine the time-period information of the voiced periods in each track within a preset finite time range;

the second determining module is further configured to determine the time-period information of each lyric line within the finite time range, to obtain the time-period information set of the lyrics.

Optionally, the third determining module includes:

a searching unit, configured to compare each time-period entry A_i in the lyrics' set, in turn, with the start and end times of each entry in the track's set, and find an entry B_j that satisfies the preset matching condition with A_i, where i is an integer from 1 to n and j is an integer from 1 to m; and

a determining unit, configured to determine, as the matching degree between the track's set and the lyrics' set, the ratio of the number of entries A_i for which a matching B_j is found to the total number of entries in the lyrics' set.

Optionally, the preset matching condition is that the time difference between the start times of A_i and B_j is within a preset first threshold, and the time difference between the end times of A_i and B_j is also within the first threshold; or

the preset matching condition is that the sum of the time difference between the start times of A_i and B_j and the time difference between their end times is within a preset second threshold.

Optionally, the fourth determining module is further configured to determine, among the tracks whose matching degree reaches a preset matching degree threshold, the track with the highest matching degree as the main melody track of the audio data.

Optionally, the target audio data is in MIDI format.

It should be noted that the division into the functional modules above is merely an example for the apparatus for extracting the main melody track from audio data provided in the above embodiment. In practice, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiment for extracting the main melody track from audio data belong to the same concept; see the method embodiment for the specific implementation process, which is not repeated here.

An embodiment of this application further provides a computer-readable storage medium. It may be the computer-readable storage medium included in the memory, or it may exist separately without being installed in the intelligent terminal. The computer-readable storage medium stores at least one instruction, which is used by one or more processors to execute the above method for extracting the main melody track from audio data.

Fig. 4 shows the structural block diagram of the terminal 400 of one exemplary embodiment of the application offer.The terminal 400 can be with It is：Smart phone, tablet computer, MP3 player (Moving Picture Experts Group Audio Layer III, Dynamic image expert's compression standard audio level 3), MP4 (Moving Picture Experts GroupAudio Layer IV, dynamic image expert's compression standard audio level 4) player, laptop or desktop computer.Terminal 400 be also possible to by Referred to as other titles such as user equipment, portable terminal, laptop terminal, terminal console.

Generally, the terminal 400 includes a processor 401 and a memory 402.

The processor 401 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 401 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 401 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to show. In some embodiments, the processor 401 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 402 may include one or more computer-readable storage media, which may be non-transitory. The memory 402 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 stores at least one instruction, and the at least one instruction is executed by the processor 401 to implement the method for extracting the theme track from audio data provided by the method embodiments of the present application.

In some embodiments, the terminal 400 may optionally further include a peripheral device interface 403 and at least one peripheral device. The processor 401, the memory 402 and the peripheral device interface 403 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 403 by a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 404, a touch display screen 405, a camera assembly 406, an audio circuit 407, a positioning component 408 and a power supply 409.

The peripheral device interface 403 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 401 and the memory 402. In some embodiments, the processor 401, the memory 402 and the peripheral device interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402 and the peripheral device interface 403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 404 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 404 communicates with communication networks and other communication devices via electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 404 may communicate with other terminals through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 404 may further include an NFC (Near Field Communication) related circuit, which is not limited in the present application.

The display screen 405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 405 is a touch display screen, it can also collect touch signals on or above its surface. A touch signal may be input to the processor 401 as a control signal for processing. In this case, the display screen 405 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 405, arranged on the front panel of the terminal 400; in other embodiments, there are at least two display screens 405, arranged on different surfaces of the terminal 400 or in a folding design; in still other embodiments, the display screen 405 may be a flexible display screen arranged on a curved or folding surface of the terminal 400. The display screen 405 may even be given a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 405 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 406 is used to capture images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to achieve background blurring, the main camera and the wide-angle camera can be fused to achieve panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions can be achieved. In some embodiments, the camera assembly 406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.

The audio circuit 407 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 401 for processing or to the radio frequency circuit 404 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 400. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 401 or the radio frequency circuit 404 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 407 may also include a headphone jack.

The positioning component 408 is used to locate the current geographic position of the terminal 400 to implement navigation or LBS (Location Based Service). The positioning component 408 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.

The power supply 409 supplies power to the components of the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the battery may support wired or wireless charging and may also support fast-charging technology.

In some embodiments, the terminal 400 further includes one or more sensors 410, including but not limited to an acceleration sensor 411, a gyroscope sensor 412, a pressure sensor 413, a fingerprint sensor 414, an optical sensor 415 and a proximity sensor 416.

The acceleration sensor 411 can detect the magnitude of acceleration along the three coordinate axes of the coordinate system established with respect to the terminal 400. For example, the acceleration sensor 411 may be used to detect the components of gravitational acceleration along the three axes. The processor 401 may control the touch display screen 405 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 411. The acceleration sensor 411 may also be used to collect motion data for games or of the user.
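As a rough illustration of how gravity components can drive the landscape/portrait decision, the sketch below compares the gravity components along the device's x axis (short edge) and y axis (long edge). The axis convention, the function name `choose_orientation` and the hysteresis margin are assumptions for illustration; the application does not specify how the decision is made.

```python
# Sketch: choose landscape or portrait view from gravity components (m/s^2).
# Assumed convention: x runs along the short edge, y along the long edge.

def choose_orientation(gx, gy, current="portrait", hysteresis=1.0):
    """Return 'portrait' or 'landscape'; keep the current view when the tilt is ambiguous."""
    if abs(gy) > abs(gx) + hysteresis:
        return "portrait"   # gravity mostly along the long edge: device held upright
    if abs(gx) > abs(gy) + hysteresis:
        return "landscape"  # gravity mostly along the short edge: device on its side
    return current          # near 45 degrees: avoid flickering between views
```

The hysteresis margin is a common design choice that prevents the view from toggling repeatedly when the device is held near a diagonal.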

The gyroscope sensor 412 can detect the body orientation and rotation angle of the terminal 400, and may cooperate with the acceleration sensor 411 to capture the user's 3D motion on the terminal 400. Based on the data collected by the gyroscope sensor 412, the processor 401 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.

The pressure sensor 413 may be arranged on a side frame of the terminal 400 and/or beneath the touch display screen 405. When the pressure sensor 413 is arranged on the side frame of the terminal 400, it can detect the user's grip signal on the terminal 400, and the processor 401 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 413. When the pressure sensor 413 is arranged beneath the touch display screen 405, the processor 401 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 405. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 414 collects the user's fingerprint, and the user's identity is recognized from the collected fingerprint either by the processor 401 or by the fingerprint sensor 414 itself. When the user's identity is recognized as trusted, the processor 401 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 414 may be arranged on the front, back or side of the terminal 400. When the terminal 400 has a physical button or a manufacturer logo, the fingerprint sensor 414 may be integrated with the physical button or the manufacturer logo.

The optical sensor 415 collects the ambient light intensity. In one embodiment, the processor 401 may control the display brightness of the touch display screen 405 according to the ambient light intensity collected by the optical sensor 415: when the ambient light intensity is high, the display brightness of the touch display screen 405 is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity collected by the optical sensor 415.
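A minimal sketch of ambient-light-driven brightness control is shown below. It assumes a logarithmic mapping from illuminance in lux to an 8-bit backlight level. The mapping curve, the 10000 lux ceiling, the level range and the name `display_brightness` are illustrative assumptions; the application states only that brightness is raised in bright surroundings and lowered in dim ones.

```python
import math

# Sketch: map ambient illuminance (lux) to a backlight level that rises with ambient light.
# The log curve, lux ceiling and level range are assumptions, not values from the application.

def display_brightness(lux, min_level=10, max_level=255, max_lux=10_000.0):
    """Return an integer backlight level in [min_level, max_level]."""
    lux = min(max(lux, 0.0), max_lux)  # clamp noisy or out-of-range sensor readings
    # Perceived brightness grows roughly with the logarithm of luminance.
    fraction = math.log1p(lux) / math.log1p(max_lux)
    return round(min_level + fraction * (max_level - min_level))
```

Because the mapping is monotonic, higher ambient light always yields a level at least as high, which matches the behavior described above.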

The proximity sensor 416, also called a distance sensor, is usually arranged on the front panel of the terminal 400 and measures the distance between the user and the front of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 is gradually decreasing, the processor 401 switches the touch display screen 405 from the screen-on state to the screen-off state; when the proximity sensor 416 detects that this distance is gradually increasing, the processor 401 switches the touch display screen 405 from the screen-off state back to the screen-on state.
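The screen switching described above can be sketched as a small state machine driven by successive distance readings. The class name `ScreenController`, the centimeter unit and the `min_step` jitter filter are hypothetical; the application specifies only that a decreasing distance turns the screen off and an increasing distance turns it back on.

```python
# Sketch: switch the screen off as the user approaches and back on as they move away.
# `min_step` (cm) is a hypothetical filter that ignores small jitter between readings.

class ScreenController:
    def __init__(self, min_step=0.5):
        self.screen_on = True
        self.last_distance = None
        self.min_step = min_step

    def on_distance(self, distance_cm):
        """Feed one proximity reading and return whether the screen is on afterwards."""
        if self.last_distance is not None:
            change = distance_cm - self.last_distance
            if change < -self.min_step and self.screen_on:
                self.screen_on = False  # distance shrinking: switch to screen-off
            elif change > self.min_step and not self.screen_on:
                self.screen_on = True   # distance growing: switch back to screen-on
        self.last_distance = distance_cm
        return self.screen_on
```

For example, readings of 10, 5, 2 and 8 cm yield the states on, off, off and on.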

Those skilled in the art will understand that the structure shown in Fig. 4 does not limit the terminal 400, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.

The foregoing are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims

1. A method for extracting a theme track from audio data, characterized in that the method comprises:

extracting multiple tracks from target audio data, and determining time period information of the voice periods in each track to obtain a time period information set corresponding to each track;

determining, in lyrics information corresponding to the target audio data, time period information of each lyric line to obtain a time period information set corresponding to the lyrics information;

determining a matching degree between the time period information set corresponding to each track and the time period information set corresponding to the lyrics information; and

determining the track with the highest matching degree as the theme track of the target audio data.

2. The method according to claim 1, wherein determining the time period information of the voice periods in each track comprises:

determining the time period information of the voice periods in each track within a preset finite time range;

and determining the time period information of each lyric line to obtain the time period information set corresponding to the lyrics information comprises:

determining the time period information of each lyric line within the finite time range to obtain the time period information set corresponding to the lyrics information.

3. The method according to claim 1, wherein determining the matching degree between the time period information set corresponding to each track and the time period information set corresponding to the lyrics information comprises:

comparing each piece of time period information A_i in the time period information set corresponding to the lyrics information, in turn, with the start time and end time of each piece of time period information in the time period information set corresponding to the track, to find time period information B_j that satisfies a preset matching condition with A_i, where i is an integer from 1 to n and j is an integer from 1 to m, n and m being the numbers of pieces of time period information in the set corresponding to the lyrics information and in the set corresponding to the track, respectively; and

determining the ratio of the number of A_i for which a corresponding B_j is found to the total number of pieces of time period information in the time period information set corresponding to the lyrics information as the matching degree between the time period information set corresponding to the track and the time period information set corresponding to the lyrics information.

4. The method according to claim 3, wherein the preset matching condition is that the time difference between the start time of A_i and the start time of B_j is within a preset first threshold, and the time difference between the end time of A_i and the end time of B_j is within the first threshold; or

the preset matching condition is that the sum of the time difference between the start time of A_i and the start time of B_j and the time difference between the end time of A_i and the end time of B_j is within a preset second threshold.

5. The method according to claim 1, wherein determining the track with the highest matching degree as the theme track of the target audio data comprises:

determining, among the tracks whose matching degree reaches a preset matching degree threshold, the track with the highest matching degree as the theme track of the target audio data.

6. The method according to any one of claims 1 to 5, wherein the target audio data is in MIDI format.

7. A device for extracting a theme track from audio data, characterized in that the device comprises:

a first determining module, configured to extract multiple tracks from target audio data and determine time period information of the voice periods in each track to obtain a time period information set corresponding to each track;

a second determining module, configured to determine, in lyrics information corresponding to the target audio data, time period information of each lyric line to obtain a time period information set corresponding to the lyrics information;

a third determining module, configured to determine a matching degree between the time period information set corresponding to each track and the time period information set corresponding to the lyrics information; and

a fourth determining module, configured to determine the track with the highest matching degree as the theme track of the target audio data.

8. The device according to claim 7, wherein the first determining module is further configured to determine the time period information of the voice periods in each track within a preset finite time range;

the second determining module is further configured to determine the time period information of each lyric line within the finite time range to obtain the time period information set corresponding to the lyrics information.

9. The device according to claim 7, wherein the third determining module comprises:

a searching unit, configured to compare each piece of time period information A_i in the time period information set corresponding to the lyrics information, in turn, with the start time and end time of each piece of time period information in the time period information set corresponding to the track, to find time period information B_j that satisfies a preset matching condition with A_i, where i is an integer from 1 to n and j is an integer from 1 to m; and

a determining unit, configured to determine the ratio of the number of A_i for which a corresponding B_j is found to the total number of pieces of time period information in the time period information set corresponding to the lyrics information as the matching degree between the time period information set corresponding to the track and the time period information set corresponding to the lyrics information.

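10. The device according to claim 9, wherein the preset matching condition is that the time difference between the start time of A_i and the start time of B_j is within a preset first threshold, and the time difference between the end time of A_i and the end time of B_j is within the first threshold; or

the preset matching condition is that the sum of the time difference between the start time of A_i and the start time of B_j and the time difference between the end time of A_i and the end time of B_j is within a preset second threshold.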

11. The device according to claim 7, wherein the fourth determining module is further configured to determine, among the tracks whose matching degree reaches a preset matching degree threshold, the track with the highest matching degree as the theme track of the target audio data.

12. The device according to any one of claims 7 to 11, wherein the target audio data is in MIDI format.

13. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method for extracting a theme track from audio data according to any one of claims 1 to 6.

14. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction that is loaded and executed by a processor to implement the method for extracting a theme track from audio data according to any one of claims 1 to 6.
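The method of claims 1 to 5 can be illustrated with a minimal Python sketch. It assumes that note intervals in seconds have already been extracted from each MIDI track and that lyric lines are given as `(start, end)` pairs. Several choices are hypothetical and are not taken from the application: merging notes into voice periods with a gap rule, keeping only periods that lie entirely inside the finite time range, treating time differences as absolute values, and all threshold values. All function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Period = Tuple[float, float]  # (start_s, end_s) of one piece of time period information


def voice_periods(notes: List[Period], max_gap: float = 0.3) -> List[Period]:
    """Merge one track's note intervals into voice periods.

    Hypothetical rule: notes separated by at most `max_gap` seconds form one period.
    """
    merged: List[List[float]] = []
    for start, end in sorted(notes):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current period
        else:
            merged.append([start, end])              # open a new period
    return [(s, e) for s, e in merged]


def within_range(periods: List[Period], t0: float, t1: float) -> List[Period]:
    """Claim 2 (assumed reading): keep periods lying entirely inside [t0, t1]."""
    return [(s, e) for s, e in periods if t0 <= s and e <= t1]


def matches(a: Period, b: Period, first_threshold: float = 0.5,
            second_threshold: Optional[float] = None) -> bool:
    """Claim 4: A_i matches B_j under the per-boundary or the summed condition."""
    d_start, d_end = abs(a[0] - b[0]), abs(a[1] - b[1])
    if second_threshold is not None:
        return d_start + d_end <= second_threshold  # summed condition
    return d_start <= first_threshold and d_end <= first_threshold


def matching_degree(lyric: List[Period], track: List[Period], **cond) -> float:
    """Claim 3: number of A_i with a matching B_j divided by the number of lyric periods."""
    if not lyric:
        return 0.0
    hits = sum(1 for a in lyric if any(matches(a, b, **cond) for b in track))
    return hits / len(lyric)


def select_theme_track(track_notes: Dict[str, List[Period]], lyric: List[Period],
                       min_degree: float = 0.0, window: Optional[Period] = None,
                       **cond) -> Optional[str]:
    """Claims 1 and 5: the track with the highest matching degree reaching min_degree."""
    if window is not None:
        lyric = within_range(lyric, *window)
    best: Optional[str] = None
    best_degree = -1.0
    for name, notes in track_notes.items():
        periods = voice_periods(notes)
        if window is not None:
            periods = within_range(periods, *window)
        degree = matching_degree(lyric, periods, **cond)
        if degree >= min_degree and degree > best_degree:  # first track wins ties
            best, best_degree = name, degree
    return best
```

With `min_degree=0.0` the selection reduces to claim 1. A positive `min_degree` plays the role of the matching degree threshold of claim 5, and the function returns `None` when no track reaches it. Passing `second_threshold` switches to the summed condition, which tolerates a larger error at one boundary as long as the other boundary is close.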