CN108268667A

CN108268667A - Audio file clustering method and device

Info

Publication number: CN108268667A
Application number: CN201810160189.9A
Authority: CN
Inventors: 龙飞
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2018-02-26
Filing date: 2018-02-26
Publication date: 2018-07-10

Abstract

The disclosure is directed to audio file clustering method and devices.This method includes：Obtain multiple audio files；According to the feature of the multiple audio file, the multiple audio file is clustered.In the technical solution, all audio files can be clustered according to feature, in this way, audio file can carry out classification storage by feature, searched audio file convenient for user, improved user experience.

Description

Audio file clustering method and device

Technical field

This disclosure relates to computer realm more particularly to audio file clustering method and device.

Background technology

At present, terminal not only can be also used as a storage device, especially as the equipment of an amusement communication It is the photo for storing user.Photo can make sound photo at present, i.e., while photo is shot or can record later One section of sound is as souvenir, in this way, the photo and sound is as a whole file storage, this document be can according to piece when Between stored, user can only classify for file manually.

Invention content

The embodiment of the present disclosure provides audio file clustering method and device.The technical solution is as follows：

According to the embodiment of the present disclosure in a first aspect, provide a kind of audio file clustering method, including：

Obtain multiple audio files；

According to the feature of the multiple audio file, the multiple audio file is clustered.

The technical scheme provided by this disclosed embodiment can include the following benefits：All audio files can be by It is clustered according to feature, in this way, audio file can carry out classification storage by feature, searches audio file convenient for user, improve User experience.

In one embodiment, the feature according to the multiple audio file carries out the multiple audio file Cluster includes：

Obtain the feature of each audio file；

The feature of each audio file is compared, the audio file with identical default feature is aggregated in identical file folder In.

In one embodiment, described to be characterized in frequency spectrum, the feature of comparison each audio file will have identical The audio file of default feature is aggregated in identical file folder and includes：

According to the frequency spectrum of each audio file, each audio file is classified；

Same type of audio file is aggregated in identical file folder.

In one embodiment, described to be characterized in duration, the feature of each audio file of comparison will have identical default The audio file of feature is aggregated in identical file folder and includes：

Determine that the audio file that duration belongs to same preset duration range is same type；

The audio file that the duration is belonged to same preset duration range is aggregated in identical file folder.

In one embodiment, the method further includes：

Time-domain sampling is carried out to the multiple audio file, obtains sampled result；

According to the sampled result and default neural network, classify to the multiple audio file；The default god It is obtained through network according to audio file sample training.

In one embodiment, fundamental frequency is obtained from the frequency spectrum of each audio file；

According to the fundamental frequency, each audio file is divided into male's class and women class；Wherein, the audio of male's class File includes audio file of the fundamental frequency in default male's base frequency range, and the audio file of the women class includes fundamental frequency default Audio file in women base frequency range.

In one embodiment, the harmonic wave intensity of the frequency spectrum for being characterized in audio file；It is described literary according to the audio The frequency spectrum of part includes audio file classification：

Harmonic wave intensity is obtained from the frequency spectrum of each audio file；

According to the harmonic wave intensity of frequency spectrum, spectra file of the harmonic wave intensity in default harmonic wave strength range is aggregated to same In file.

According to the second aspect of the embodiment of the present disclosure, a kind of audio file clustering apparatus is provided, including：

Acquisition module, for obtaining multiple audio files；

Cluster module for the feature according to the multiple audio file, clusters the multiple audio file.

In one embodiment, the cluster module includes：

Acquisition submodule, for obtaining the feature of each audio file；

Submodule is compared, for comparing the feature of each audio file, by the audio file with identical default feature It is aggregated in identical file folder.

In one embodiment, described to be characterized in frequency spectrum, the comparison submodule includes：

For the frequency spectrum according to each audio file, each audio file is classified for taxon；

First setting unit, for same type of audio file to be aggregated in identical file folder.

In one embodiment, described to be characterized in duration, the comparison submodule includes：

Determination unit, for determining that the audio file that duration belongs to same preset duration range is same type；

Second setting unit, the audio file for the duration to be belonged to same preset duration range are aggregated to same text In part folder.

In one embodiment, described device further includes：

Sampling module for carrying out time-domain sampling to the multiple audio file, obtains sampled result；

Processing module, for according to the sampled result and default neural network, dividing the multiple audio file Class；The default neural network is obtained according to audio file sample training.

In one embodiment, the fundamental frequency of the frequency spectrum for being characterized in audio file；The classification submodule includes：

First acquisition unit, for obtaining fundamental frequency from the frequency spectrum of each audio file；

Cutting unit, for according to the fundamental frequency, each audio file to be divided into male's class and women class；Wherein, institute The audio file for stating male's class includes audio file of the fundamental frequency in default male's base frequency range, the audio file of the women class Including fundamental frequency the audio file in default women base frequency range.

Second acquisition unit, for obtaining harmonic wave intensity from the frequency spectrum of each audio file；

Division unit, for the harmonic wave intensity according to frequency spectrum, by frequency spectrum of the harmonic wave intensity in default harmonic wave strength range File is aggregated in identical file folder.

According to the third aspect of the embodiment of the present disclosure, a kind of audio classification device is provided, including：

Processor；

For storing the memory of processor-executable instruction；

Wherein, the processor is configured as：

Obtain multiple audio files；

It should be understood that above general description and following detailed description are only exemplary and explanatory, not The disclosure can be limited.

Description of the drawings

Attached drawing herein is incorporated into specification and forms the part of this specification, shows the implementation for meeting the disclosure Example, and for explaining the principle of the disclosure together with specification.

Fig. 1 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 2 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 3 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 4 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 5 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 6 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 7 is the flow chart according to the audio file clustering method shown in an exemplary embodiment.

Fig. 8 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Fig. 9 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 10 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 11 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 12 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 13 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 14 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Figure 15 is the block diagram according to the audio file clustering apparatus shown in an exemplary embodiment.

Specific embodiment

Here exemplary embodiment will be illustrated in detail, example is illustrated in the accompanying drawings.Following description is related to During attached drawing, unless otherwise indicated, the same numbers in different attached drawings represent the same or similar element.Following exemplary embodiment Described in embodiment do not represent all embodiments consistent with the disclosure.On the contrary, they be only with it is such as appended The example of the consistent device and method of some aspects be described in detail in claims, the disclosure.

Fig. 1 be according to a kind of flow chart of audio file clustering method shown in an exemplary embodiment, as shown in Figure 1, Audio file clustering method is used in audio classification device, and this method may comprise steps of 101-102：

In a step 101, multiple audio files are obtained.

Here, audio file can be obtained from the device of storage audio file.

In a step 102, according to the feature of multiple audio files, multiple audio files are clustered.

The feature of audio file can include harmonic wave intensity of the frequency spectrum of audio file, duration, fundamental frequency and frequency spectrum etc..Duration Refer to the playing duration of audio file.

Here, cluster is that the set of physics or abstract object is divided into the process of multiple classes being made of similar object. By clustering the set that generated one kind is one group of data object, these data objects and the object in same class phase each other Seemingly, it is different with the object in other classes.Cluster has similar or identical feature for a kind of audio file in the present embodiment.

In one embodiment, as shown in Fig. 2, step 102, i.e., according to the feature of multiple audio files, to multiple audios File is clustered, and can be included：

In step 1021, the feature of each audio file is obtained.

Feature needed for being obtained from audio file.

In step 1022, the feature of each audio file of device is compared, the audio file with identical default feature is gathered It closes in identical file folder.

For being characterized in duration, identical preset is characterized in preset duration range.Duration is in the audio of preset duration range File is aggregated in a file.Here default is characterized in user setting.

In one embodiment, when being characterized in frequency spectrum, as shown in figure 3, step 1022, that is, compare each audio file of device Feature, by the audio file with identical default feature be aggregated in identical file folder in, can include：

In step 10221, according to the frequency spectrum of each audio file, each audio file is classified.

Frequency spectrum includes many information, such as fundamental frequency harmony intensity of wave, and gender can be distinguished by fundamental frequency, strong by harmonic wave Degree can distinguish different people.

In step 10222, same type of audio file is aggregated in identical file folder.

In one embodiment, when being characterized in duration, as shown in figure 4, step 1022, that is, the spy of each audio file is compared Audio file with identical default feature is aggregated in identical file folder, can include by sign：

In step 10223, determine that the audio file that duration belongs to same preset duration range is same type.

In step 10224, the audio file that duration is belonged to same preset duration range is aggregated in identical file folder.

Here, preset duration range has multiple, it is assumed that there are three preset duration ranges, and the first preset duration range is arrived 0 20 seconds, at 20 seconds to 5 minutes, third duration range was greater than 5 minutes the second duration range.Each preset duration range has one A file is corresponding to it.

If the duration of the first audio file be 4 points 30 seconds, then the duration of the first audio file is belonged to second and is preset First audio file is put into file corresponding with the second preset duration range by duration range；If the second audio file Duration be 10 seconds, then the duration of the second audio file belongs to the first preset duration range, by the second audio file be put into In the corresponding file of first preset duration range.

In one embodiment, as shown in figure 5, this method further includes：

In step 103, to device, multiple audio files carry out time-domain sampling, obtain sampled result.

Since audio is analog signal, it is therefore desirable to sample.

At step 104, according to sampled result and default neural network, classify to multiple audio files.

Wherein, default neural network is obtained according to audio file sample training.

Here, multiple audio files are to input default neural network one by one, and default neural network can export each sound The classification results of frequency file, each classification results characterize the type corresponding to corresponding audio file.

In the present embodiment, audio file sample is different, then the division result of the default neural network trained is also different.

If audio file sample is included for the audio file of training and audio file corresponding gender, to default god Through network inputs for the audio file of training, classification results are obtained, according to classification results gender corresponding with the audio file It is compared, changes the parameter of default neural network.

If audio file sample is included for the audio file of training and audio file corresponding user, to default god Through network inputs for the audio file of training, classification results are obtained, according to classification results user corresponding with the audio file It is compared, changes the parameter of default neural network.

In one embodiment, as shown in fig. 6, being characterized in the fundamental frequency of the frequency spectrum of audio file, step 102, i.e., according to dress The frequency spectrum of audio file is put, audio file is classified, can be included：

In step 1023, fundamental frequency is obtained from the frequency spectrum of each audio file.

In step 1024, according to fundamental frequency, each audio file is divided into male's class and women class.

Wherein, the audio file of male's class includes audio file of the fundamental frequency in default male's base frequency range, women class Audio file includes audio file of the fundamental frequency in default women base frequency range.

In the present embodiment, this feature is the fundamental frequency in frequency spectrum, presets feature and includes default male's base frequency range and women base Frequency range, since fundamental frequency can distinguish gender, then the sound that can distinguish in audio file by presetting feature be by Male or women sounding, and file can be named by gender.

In one embodiment, as shown in fig. 7, when the harmonic wave intensity for the frequency spectrum for being characterized in audio file, step 102, I.e. according to the frequency spectrum of audio file, audio file is classified, can be included：

In step 1025, harmonic wave intensity is obtained from the frequency spectrum of each audio file.

It is according to the harmonic wave intensity of frequency spectrum, frequency spectrum of the harmonic wave intensity in default harmonic wave strength range is literary in step 1026 Part is aggregated in identical file folder.

In the present embodiment, this feature is the harmonic wave intensity of the frequency spectrum of audio file.Tone, the sound spoken due to everyone Difference, therefore, the harmonic wave intensity of the frequency spectrum of everyone sound also can be different, can record the sound of several pre-set users in advance, The harmonic wave intensity (obtaining default feature) of the frequency spectrum of pre-set user is obtained according to sound, if in this way, the harmonic wave of audio file Intensity is identical with the harmonic wave intensity of some pre-set user, then, audio file is assigned to the file of the pre-set user；By audio Frequency spectrum harmonic wave intensity do not assigned in all audio files of the harmonic wave strength range of any one pre-set user one it is specified File.Here, file can be named by user, which can be named with unknown subscriber.

What deserves to be explained is the present embodiment can not only classify to audio file by the feature of the audio file, Associated associated with can classify to audio file, by taking sound photo as an example, sound photo include audio file and With the associated photo of audio file (associated with of the present embodiment), sound photo can be carried out by the feature of audio file Classification, for example, sound photo is put into corresponding file by the duration according to audio file.

Following is embodiment of the present disclosure, can be used for performing embodiments of the present disclosure.

Fig. 8 is according to a kind of block diagram of audio classification device shown in an exemplary embodiment, which can be by soft Part, hardware or both are implemented in combination with as some or all of of electronic equipment.As shown in figure 8, the audio classification device Including：

Acquisition module 201, for obtaining multiple audio files；

Cluster module 202 for the feature according to the multiple audio file, gathers the multiple audio file Class.

In one embodiment, as shown in figure 9, the cluster module 202 includes：

First acquisition submodule 2021, for obtaining the feature of each audio file；

Submodule 2022 is compared, for comparing the feature of each audio file, by the audio with identical default feature File is aggregated in identical file folder.

In one embodiment, as shown in Figure 10, described to be characterized in frequency spectrum, the comparison submodule 2022 includes：

For the frequency spectrum according to each audio file, each audio file is classified for taxon 20221；

First setting unit 20222, for same type of audio file to be aggregated in identical file folder.

In one embodiment, as shown in figure 11, described to be characterized in duration, the comparison submodule 2022 includes：

Determination unit 20223, for determining that the audio file that duration belongs to same preset duration range is same type；

Second setting unit 20224, the audio file for the duration to be belonged to same preset duration range are aggregated to In identical file folder.

In one embodiment, as shown in figure 12, described device further includes：

Sampling module 203 for carrying out time-domain sampling to the multiple audio file, obtains sampled result；

Processing module 204, for according to the sampled result and default neural network, being carried out to the multiple audio file Classification；The default neural network is obtained according to audio file sample training.

In one embodiment, as shown in figure 13, the fundamental frequency of the frequency spectrum for being characterized in audio file；The cluster module 202 include：

Second acquisition submodule 2023, for obtaining fundamental frequency from the frequency spectrum of each audio file；

Cutting unit 2024, for according to the fundamental frequency, each audio file to be divided into male's class and women class；Its In, the audio file of male's class includes audio file of the fundamental frequency in default male's base frequency range, the sound of the women class Frequency file includes audio file of the fundamental frequency in default women base frequency range；

In one embodiment, as shown in figure 14, the harmonic wave intensity of the frequency spectrum for being characterized in audio file；The cluster Module 202 includes：

Third acquisition submodule 2025, for obtaining harmonic wave intensity from the frequency spectrum of each audio file；

Submodule 2026 is divided, for the harmonic wave intensity according to frequency spectrum, by harmonic wave intensity in default harmonic wave strength range Spectra file be aggregated to identical file folder in.

Processor；

For storing the memory of processor-executable instruction；

Wherein, the processor is configured as：

Obtain multiple audio files；

Processor；

For storing the memory of processor-executable instruction；

Wherein, processor is configured as：

Obtain multiple audio files；

The feature according to the multiple audio file carries out the multiple audio file cluster and includes：

Obtain the feature of each audio file；

It is described to be characterized in frequency spectrum, the feature of comparison each audio file, by the audio with identical default feature File is aggregated in identical file folder and includes：

Same type of audio file is aggregated in identical file folder.

It is described to be characterized in duration, the feature of each audio file of comparison, by the audio file with identical default feature Identical file folder is aggregated in include：

The method further includes：

Fundamental frequency is obtained from the frequency spectrum of each audio file；

The harmonic wave intensity of the frequency spectrum for being characterized in audio file；The frequency spectrum according to the audio file, by audio Document classification includes：

About the device in above-described embodiment, wherein modules perform the concrete mode of operation in related this method Embodiment in be described in detail, explanation will be not set forth in detail herein.

Figure 15 is according to a kind of block diagram for audio file clustering apparatus shown in an exemplary embodiment, which fits For terminal device.For example, device 1700 can be mobile phone, and computer, digital broadcast terminal, messaging devices, trip Play console, tablet device, Medical Devices, body-building equipment, personal digital assistant etc..

Device 1700 can include following one or more components：Processing component 1702, memory 1704, power supply module 1706, multimedia component 1708, audio component 1710, input/output (I/O) interface 1712, sensor module 1714, Yi Jitong Believe component 1716.

The integrated operation of 1702 usual control device 1700 of processing component, such as with display, call, data communication, Camera operation and record operate associated operation.Processing component 1702 can be performed including one or more processors 1720 Instruction, to perform all or part of the steps of the methods described above.In addition, processing component 1702 can include one or more moulds Block, convenient for the interaction between processing component 1702 and other assemblies.For example, processing component 1702 can include multi-media module, To facilitate the interaction between multimedia component 1708 and processing component 1702.

Memory 1704 is configured as storing various types of data to support the operation in device 1700.These data Example is included for the instruction of any application program or method that is operated on device 1700, contact data, telephone book data, Message, picture, video etc..Memory 1704 can by any kind of volatibility or non-volatile memory device or they Combination is realized, such as static RAM (SRAM), electrically erasable programmable read-only memory (EEPROM), it is erasable can Program read-only memory (EPROM), programmable read only memory (PROM), read-only memory (ROM), magnetic memory, flash memory Reservoir, disk or CD.

Power supply module 1706 provides electric power for the various assemblies of device 1700.Power supply module 1706 can include power management System, one or more power supplys and other generate, manage and distribute electric power associated component with for device 1700.

Multimedia component 1708 is included in the screen of one output interface of offer between described device 1700 and user. In some embodiments, screen can include liquid crystal display (LCD) and touch panel (TP).If screen includes touch panel, Screen may be implemented as touch screen, to receive input signal from the user.Touch panel includes one or more touch and passes Sensor is to sense the gesture on touch, slide, and touch panel.The touch sensor can not only sense touch or slide dynamic The boundary of work, but also detect duration and pressure associated with the touch or slide operation.In some embodiments, it is more Media component 1708 includes a front camera and/or rear camera.When device 1700 is in operation mode, mould is such as shot When formula or video mode, front camera and/or rear camera can receive external multi-medium data.Each preposition camera shooting Head and rear camera can be a fixed optical lens system or have focusing and optical zoom capabilities.

Audio component 1710 is configured as output and/or input audio signal.For example, audio component 1710 includes a wheat Gram wind (MIC), when device 1700 is in operation mode, during such as call model, logging mode and speech recognition mode, microphone quilt It is configured to receive external audio signal.The received audio signal can be further stored in memory 1704 or via communication Component 1716 is sent.In some embodiments, audio component 1710 further includes a loud speaker, for exports audio signal.

I/O interfaces 1712 provide interface, above-mentioned peripheral interface module between processing component 1702 and peripheral interface module Can be keyboard, click wheel, button etc..These buttons may include but be not limited to：Home button, volume button, start button and Locking press button.

Sensor module 1714 includes one or more sensors, and the state for providing various aspects for device 1700 is commented Estimate.For example, sensor module 1714 can detect opening/closed state of device 1700, the relative positioning of component, such as institute The display and keypad that component is device 1700 are stated, sensor module 1714 can be with detection device 1700 or device 1,700 1 The position change of a component, the existence or non-existence that user contacts with device 1700,1700 orientation of device or acceleration/deceleration and dress Put 1700 temperature change.Sensor module 1714 can include proximity sensor, be configured in no any physics It is detected the presence of nearby objects during contact.Sensor module 1714 can also include optical sensor, as CMOS or ccd image are sensed Device, for being used in imaging applications.In some embodiments, which can also include acceleration sensing Device, gyro sensor, Magnetic Sensor, pressure sensor or temperature sensor.

Communication component 1716 is configured to facilitate the communication of wired or wireless way between device 1700 and other equipment.Dress The wireless network based on communication standard, such as WiFi can be accessed by putting 1700,2G or 3G or combination thereof.It is exemplary at one In embodiment, communication component 1716 receives broadcast singal or broadcast correlation from external broadcasting management system via broadcast channel Information.In one exemplary embodiment, the communication component 1716 further includes near-field communication (NFC) module, to promote short distance Communication.For example, radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band can be based in NFC module (UWB) technology, bluetooth (BT) technology and other technologies are realized.

In the exemplary embodiment, device 1700 can be by one or more application application-specific integrated circuit (ASIC), number Signal processor (DSP), digital signal processing appts (DSPD), programmable logic device (PLD), field programmable gate array (FPGA), controller, microcontroller, microprocessor or other electronic building bricks are realized, for performing the above method.

In the exemplary embodiment, a kind of non-transitorycomputer readable storage medium including instructing, example are additionally provided Such as include the memory 1704 of instruction, above-metioned instruction can be performed to complete the above method by the processor 1720 of device 1700.Example Such as, the non-transitorycomputer readable storage medium can be ROM, it is random access memory (RAM), CD-ROM, tape, soft Disk and optical data storage devices etc..

A kind of non-transitorycomputer readable storage medium, when the instruction in the storage medium is by the processing of device 1700 When device performs so that device 1700 is able to carry out above-mentioned audio file clustering method, the method includes：

Obtain multiple audio files；

Obtain the feature of each audio file；

Same type of audio file is aggregated in identical file folder.

The method further includes：

The fundamental frequency of the frequency spectrum for being characterized in audio file；The frequency spectrum according to the audio file, by audio file Classification includes：

Those skilled in the art will readily occur to the disclosure its after considering specification and putting into practice disclosure disclosed herein Its embodiment.This application is intended to cover any variations, uses, or adaptations of the disclosure, these modifications, purposes or Person's adaptive change follows the general principle of the disclosure and including the undocumented common knowledge in the art of the disclosure Or conventional techniques.Description and embodiments are considered only as illustratively, and the true scope and spirit of the disclosure are by following Claim is pointed out.

It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and And various modifications and changes may be made without departing from the scope thereof.The scope of the present disclosure is only limited by appended claim.

Claims

1. a kind of audio file clustering method, which is characterized in that including：

Obtain multiple audio files；

2. according to the method described in claim 1, it is characterized in that, the feature according to the multiple audio file, to institute It states multiple audio files and cluster and include：

Obtain the feature of each audio file；

The feature of each audio file is compared, the audio file with identical default feature is aggregated in identical file folder.

3. according to the method described in claim 2, it is characterized in that, described be characterized in frequency spectrum, comparison each audio text Audio file with identical default feature is aggregated in identical file folder and included by the feature of part：

Same type of audio file is aggregated in identical file folder.

4. according to the method described in claim 2, it is characterized in that, described be characterized in duration, each audio file of comparison Audio file with identical default feature is aggregated in identical file folder and included by feature：

5. according to the method described in claim 1, it is characterized in that, the method further includes：

According to the sampled result and default neural network, classify to the multiple audio file；The default nerve net Network is obtained according to audio file sample training.

6. the according to the method described in claim 3, it is characterized in that, fundamental frequency of the frequency spectrum for being characterized in audio file；It is described According to the frequency spectrum of the audio file, audio file classification is included：

According to the fundamental frequency, each audio file is divided into male's class and women class；Wherein, the audio file of male's class Including fundamental frequency the audio file in default male's base frequency range, the audio file of the women class includes fundamental frequency in default women Audio file in base frequency range.

7. the according to the method described in claim 3, it is characterized in that, harmonic wave intensity of the frequency spectrum for being characterized in audio file； The frequency spectrum according to the audio file includes audio file classification：

According to the harmonic wave intensity of frequency spectrum, spectra file of the harmonic wave intensity in default harmonic wave strength range is aggregated to identical file In folder.

8. a kind of audio file clustering apparatus, which is characterized in that including：

Acquisition module, for obtaining multiple audio files；

9. device according to claim 8, which is characterized in that the cluster module includes：

First acquisition submodule, for obtaining the feature of each audio file；

Submodule is compared, for comparing the feature of each audio file, will be polymerize with the audio file of identical default feature In identical file folder.

10. device according to claim 9, which is characterized in that described to be characterized in frequency spectrum, the comparison submodule includes：

11. device according to claim 9, which is characterized in that described to be characterized in duration, the comparison submodule includes：

Second setting unit, the audio file for the duration to be belonged to same preset duration range are aggregated to identical file folder In.

12. device according to claim 9, which is characterized in that described device further includes：

Processing module, for according to the sampled result and default neural network, classifying to the multiple audio file；Institute Default neural network is stated to be obtained according to audio file sample training.

13. device according to claim 10, which is characterized in that the fundamental frequency of the frequency spectrum for being characterized in audio file；Institute Cluster module is stated to include：

Second acquisition submodule, for obtaining fundamental frequency from the frequency spectrum of each audio file；

Divide submodule, for according to the fundamental frequency, each audio file to be divided into male's class and women class；Wherein, it is described The audio file of male's class includes audio file of the fundamental frequency in default male's base frequency range, the audio file packet of the women class Include audio file of the fundamental frequency in default women base frequency range.

14. device according to claim 10, which is characterized in that the harmonic wave of the frequency spectrum for being characterized in audio file is strong Degree；The cluster module includes：

Third acquisition submodule, for obtaining harmonic wave intensity from the frequency spectrum of each audio file；

Submodule is divided, for the harmonic wave intensity according to frequency spectrum, by frequency spectrum text of the harmonic wave intensity in default harmonic wave strength range Part is aggregated in identical file folder.

15. a kind of audio file clustering apparatus, which is characterized in that including：

Processor；

For storing the memory of processor-executable instruction；

Wherein, the processor is configured as：

Obtain multiple audio files；

16. a kind of computer readable storage medium, is stored thereon with computer program, which is characterized in that the program is by processor The step of any one of claim 1-7 the methods are realized during execution.