CN103035247A - Method and device of operation on audio/video file based on voiceprint information - Google Patents

Method and device of operation on audio/video file based on voiceprint information

Info

Publication number
CN103035247A
Authority
CN
China
Prior art keywords
audio
voiceprint
video file
contact person
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105181184A
Other languages
Chinese (zh)
Other versions
CN103035247B (en)
Inventor
杨帆
苏腾荣
李世全
马永健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201710439537.1A priority Critical patent/CN107274916B/en
Priority to CN201210518118.4A priority patent/CN103035247B/en
Publication of CN103035247A publication Critical patent/CN103035247A/en
Application granted granted Critical
Publication of CN103035247B publication Critical patent/CN103035247B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/632 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/7834 Retrieval characterised by using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method of operating on an audio or video file based on voiceprint information. The method comprises the following steps: collecting the voiceprint information of a vocalizing target; and searching audio/video files according to the voiceprint information. The invention further provides a terminal device. With the method and device, audio/video files can be categorized according to the voiceprint information of a specific contact. When a user wants to find the audio/video files that contain a specific contact, the files need not be played and checked one by one; a direct selection suffices, so the user can easily find the audio/video files containing that contact's voice. In addition, playback can jump directly to the time point at which a given contact speaks in an audio/video file, thereby improving the user's search efficiency.

Description

Method and device for operating on audio/video files based on voiceprint information
Technical field
The present invention relates to mobile communication applications, and in particular to a method and a device for operating on audio and video files on a terminal device according to the voiceprint of a particular contact.
Background technology
The recorder or camera on an existing terminal device makes it convenient for a user to record audio and video files. As terminal performance improves, storage capacity grows and multimedia applications multiply, users easily accumulate a large number of audio/video files. Faced with such a collection, when a user needs to find all the files in which a particular contact was recorded, or to locate and play a particular segment of a particular contact within a file, the content cannot be located quickly and the user has no way to search. Only by playing and checking the files one by one can the required file or segment be found.
In view of this, there is a need for a method and a terminal device that can quickly find and categorize target audio/video files and locate the time points at which a particular contact appears within a file, so that the user can conveniently find the files in which a specific person's voice or image was recorded.
Summary of the invention
To solve the above technical problem, the invention enables a user to quickly find the files in which a specific person's voice or image was recorded.
One object of the present invention is to provide a method of operating on audio/video files based on voiceprint information, comprising the steps of: collecting the voiceprint information of a vocalizing target; and searching audio/video files according to said voiceprint information.
Another object of the present invention is to provide a terminal device, comprising: a voiceprint extraction module for collecting the voiceprint information of a vocalizing target; and an execution module for searching audio/video files according to said voiceprint information.
The method and device provided by the invention can quickly find the files in which a specific person's voice or image was recorded, thereby improving the user's search efficiency.
Additional aspects and advantages of the invention are given in part in the following description; they will become apparent from the description or may be learned by practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flowchart of a method according to an embodiment of the invention;
Fig. 2 shows the interface of a terminal device before audio collection according to an embodiment of the invention;
Fig. 3 shows a flowchart of audio collection according to an embodiment of the invention;
Fig. 4 shows the interface of a terminal device during audio collection according to an embodiment of the invention;
Fig. 5 shows the interface after the terminal device has found the recorded audio and video files, marking in each file the time points at which the voiceprint of the vocalizing target appears and/or ends;
Fig. 6 shows a flowchart of browsing a contact's media library on the terminal device according to an embodiment of the invention;
Fig. 7 shows a flowchart of recording a contact's voice according to an embodiment of the invention;
Fig. 8 shows a schematic diagram of the overall architecture according to an embodiment of the invention;
Fig. 9 shows a schematic structural diagram according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are now described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept, purpose, design and scope of the invention to those skilled in the art. The terminology used in the detailed description of the exemplary embodiments illustrated in the drawings is not intended to limit the invention. In the drawings, like numerals refer to like elements.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural. It will be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include a wireless connection or coupling. The term "and/or" as used herein includes any unit of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in ordinary dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, should not be interpreted in an idealized or overly formal sense.
As shown in Fig. 1, the invention provides a method of operating on audio/video files based on voiceprint information, comprising the steps of: S1, collecting the voiceprint information of a vocalizing target; and S2, searching audio/video files according to the voiceprint information.
For example, step S1 may be realized as follows: when contact X1 calls user Y, the terminal device opens its built-in recorder and separately records a segment of X1's speech (for example, a segment of normal speech 7 to 10 seconds long) and extracts voiceprint information from it. After the call ends, the terminal device generates a speaker model M1 from the recorded voiceprint information and stores the sample in the media library. The terminal device then associates the speaker model with the record of contact X1 in the address book.
As another example, step S1 may be realized as follows: when user Y takes his son X2 to the park, the user opens the "record voiceprint sample" option in X2's address-book record and records X2's voiceprint information. After recording stops, the terminal device generates a speaker model M2 from the recorded voiceprint information, stores the sample in the terminal memory, and associates the speaker model with the files of contact X2 in the media library. It will be understood that "media library" is merely one way of referring to a stored multimedia collection; it may equally be called a folder, file manager, media manager, video manager, audio manager, and so on. As shown in Fig. 5, whenever the terminal device later encounters voiceprint information matching speaker models M1 or M2, it classifies and marks the audio and video files according to the particular target (for example, "Me" and "Son"). After classified storage, information such as subject fields, folders and media-library entries for the corresponding categories can be generated.
Step S1 may also be realized as follows: step S11, when a vocalizing target (for example, Zhang San) is selected in the address-book application, a "record voiceprint sample" option is presented on the display; step S12, when the user taps this option, the terminal device collects voiceprint information and stores the speaker model generated from it in the contact's media library; and step S13, after the contact's media-library page is entered, the display presents the audio/video files that have been found. Collecting the voiceprint information of a vocalizing target therefore comprises: collecting the voiceprint information when a vocalizing target is selected; and storing the collected voiceprint information.
Fig. 2 shows the interface of the terminal device before audio collection according to an embodiment of the invention. Fig. 3 shows a flowchart of audio collection according to an embodiment of the invention. The audio-collection flow comprises the following steps. Step 101: open the address book and select a particular contact. Step 102: via the "record voiceprint sample" option (as shown in Fig. 2), record the contact's voice (that is, collect the contact's voiceprint information). Step 103: after recording finishes, model the contact's voice to generate a speaker model, and save the speaker model in the contact's record. Collecting and storing voiceprint information therefore comprises: generating a speaker model from the voiceprint information; and storing the speaker model in a local storage module.
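The enrollment flow described above (record a short sample, build a speaker model, save it in the contact's record) can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation; the `Contact` fields and the toy statistics-based model builder are assumptions made here for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Contact:
    """Address-book entry extended with a 'voiceprint sample' field,
    as the patent describes (field names here are illustrative)."""
    name: str
    phone: str = ""
    speaker_model: object = None  # filled in after enrollment

def enroll_voiceprint(contact, audio_sample, build_model):
    """Record ~7-10 s of the contact's speech, build a speaker model
    from it, and store the model in the contact record."""
    if len(audio_sample) == 0:
        raise ValueError("empty enrollment sample")
    contact.speaker_model = build_model(audio_sample)
    return contact

def toy_model(sample):
    """Hypothetical model builder: just mean and energy statistics of
    the raw samples, standing in for real voiceprint modeling."""
    mean = sum(sample) / len(sample)
    energy = sum(x * x for x in sample) / len(sample)
    return {"mean": mean, "energy": energy}

son = Contact(name="Son")
enroll_voiceprint(son, [0.1, -0.2, 0.3, 0.05], toy_model)
```

A real implementation would replace `toy_model` with the UBM-GMM modeling the patent describes later; only the store-in-contact-record pattern is the point here.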
The modeling process of an embodiment of the invention is as follows. The technique of identifying a speaker's identity from voiceprint information is called speaker recognition (SR), and the corresponding model is called a speaker model (SM). A speaker-recognition system usually performs UBM-GMM modeling: a universal background model (UBM) is trained from a large amount of training audio (more than one speaker), and on the basis of this UBM a specific speaker is modeled by an adaptive method to obtain the speaker model (SM). Both the universal background model and the speaker model are usually built as Gaussian mixture models (GMM).
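The UBM-GMM decision can be illustrated with a minimal one-dimensional sketch: score the input feature frames against the speaker's GMM and against the UBM, and accept when the average log-likelihood ratio is positive. The mixture parameters below are hand-picked toy numbers, not trained models, and real systems work on multi-dimensional features.

```python
import math

def gauss_logpdf(x, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a 1-D GMM given as
    a list of (weight, mean, variance) components."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(gauss_logpdf(x, m, v)) for w, m, v in gmm)
        total += math.log(p)
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """UBM-GMM decision: log-likelihood ratio of the speaker model
    versus the universal background model."""
    llr = gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, ubm)
    return llr, llr > threshold

ubm = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]  # broad background mixture
sm = [(1.0, 1.2, 0.25)]                    # model adapted toward one speaker
llr, accepted = verify([1.1, 1.3, 1.0], sm, ubm)
```

In practice the SM is obtained by adapting the UBM's components toward the enrollment data rather than being specified by hand.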
Fig. 4 shows the interface of the terminal device during audio collection according to an embodiment of the invention. For example, in the address-book contact interface (as shown in Fig. 4), tapping the added "record voiceprint sample" button causes the terminal device to record the contact's voice.
Further, as shown in Fig. 3, the voiceprint-recognition flow comprises the following steps. Step 104: determine the audio/video file. Step 105: perform speaker segmentation on the speech in the audio/video file, generating n voice units, each containing the speech of only a single speaker. Step 106: perform contact voiceprint recognition on each of the n voice units and judge whether it matches. Step 107: if the recognition result matches, establish for the terminal device a database of the correspondence between the contact and this audio/video file. The correspondence database can record the audio/video files in which a contact's voice appears, and can further record the time points at which the contact's voice appears within each file; that is, the time points map to positions in the corresponding audio/video files.
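A minimal sketch of such a correspondence database is shown below, using SQLite for concreteness. The schema, table and column names are assumptions for illustration; the patent does not specify a storage format.

```python
import sqlite3

def init_db(conn):
    """One row per (contact, file, time point) match."""
    conn.execute("""CREATE TABLE IF NOT EXISTS contact_media (
        contact TEXT, file TEXT, time_point REAL)""")

def record_match(conn, contact, file, time_point):
    """Store one recognition result: contact's voice appears in
    this file at this offset (seconds)."""
    conn.execute("INSERT INTO contact_media VALUES (?, ?, ?)",
                 (contact, file, time_point))

def files_for_contact(conn, contact):
    rows = conn.execute(
        "SELECT DISTINCT file FROM contact_media WHERE contact = ?",
        (contact,)).fetchall()
    return [r[0] for r in rows]

def time_points(conn, contact, file):
    rows = conn.execute(
        "SELECT time_point FROM contact_media "
        "WHERE contact = ? AND file = ? ORDER BY time_point",
        (contact, file)).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
init_db(conn)
# The offsets 225 s, 1103 s, 2734 s correspond to the 3'45", 18'23",
# 45'34" example given later in the description.
for t in (225.0, 1103.0, 2734.0):
    record_match(conn, "Son", "childrens_day.mp4", t)
```

Querying `files_for_contact` backs the "search by contact" display, and `time_points` backs the jump-to-time-point playback.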
Fig. 6 shows a flowchart of browsing a contact's media library on the terminal device according to an embodiment of the invention. The flow comprises the following steps. Step 201: open the media library and enter the "contact media library" menu. Step 202: read the contact/audio-video relationship database. Step 203: after reading finishes, display the contacts with their corresponding media files and time points.
Fig. 5 shows the interface after the terminal device has found the recorded audio and video files, marking in each file the time points at which the voiceprint of the vocalizing target appears and/or ends. For example, the user opens the media library and enters the "contact media library" menu, at which point the contact-media-library interface is displayed, presenting the information read from the contact/audio-video relationship database. Searching audio/video files according to voiceprint information therefore comprises: displaying the audio/video files when the local storage module is opened.
Further, the interface of Fig. 5 shows two classes of media files, "Son" and "Me". In the "Children's Day" item of the "Son" folder there are three time points, namely 3'45", 18'23" and 45'34"; these are the time points at which "Son"'s voice appears in that item. For example, if the user selects 3'45", the terminal device automatically enters the "Children's Day" item and starts playback at 3 minutes 45 seconds. Storing the collected voiceprint information therefore comprises classified storage according to speaker model. Further, searching audio/video files according to the voiceprint information comprises displaying the audio/video files when the local storage module is opened. Further, the classification comprises displaying the audio/video files classified by speaker model, and searching the audio/video files by category of vocalizing target. Further, the display comprises showing the time points at which the vocalizing target appears in each file; when a time point in the classified display is selected, the portion of the audio/video containing the vocalizing target is played.
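Selecting a listed time point and jumping playback there can be sketched as follows. The minute'second" label format follows the example in the text; the returned player-state dict is a stand-in for a real playback API, which is an assumption of this sketch.

```python
def label_to_seconds(label):
    """Parse a time-point label such as 3'45" into seconds."""
    minutes, rest = label.split("'")
    return int(minutes) * 60 + int(rest.rstrip('"'))

def jump_and_play(file, label):
    """Simulate selecting a listed time point: a real player would
    seek to the offset and start playback; here we return the state."""
    return {"file": file,
            "position": label_to_seconds(label),
            "playing": True}

state = jump_and_play("childrens_day.mp4", "3'45\"")
```

On a real terminal the `position` value would be handed to the media player's seek call before playback begins.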
As shown in Figs. 1-6, according to another embodiment of the invention, when the terminal device classifies audio/video files by particular contact, the voiceprints of the key contacts must first be modeled and stored in the address-book module. The invention adds a "voiceprint sample" field to each contact record in the terminal's address-book module for storing that contact's voiceprint sample. The concrete procedure is as follows. The user creates or edits an important contact of interest (for example, "child"). A segment of this contact's audio is then recorded (for example, normal speech, 7 to 10 seconds long). The terminal device models the contact's voiceprint from the sound sample and saves it in the voiceprint-sample field of that contact's address-book record. The user then records and saves audio/video files on the terminal device. The invention can analyze the voiceprints of the important contacts, classify the files by contact, and mark the time points at which each contact's voice occurs. Speaker-segmentation techniques are used to extract the speech of all recorded speakers in an audio/video file and divide it into several voice units, each containing the speech of only one speaker; each voice unit is then subjected to voiceprint recognition using the speaker models. The recognition results are stored in the contact/audio-video relationship database, which records the correspondence between contacts and audio/video files and the time points at which each contact's voice occurs in each file. The "voiceprint" referred to by the invention is the acoustic spectrum of a person's voice, i.e. a biometric characteristic of that voice. By comparing voiceprints, the mobile terminal can find the corresponding targets in the stored multimedia. Accordingly, when the vocalizing target is a contact in the contacts application, collecting the target's voiceprint information comprises: while in a call with the contact, recording a segment of the contact's voice 7 to 10 seconds or longer that contains only that contact's voice, extracting voiceprint information from this segment, and generating a voiceprint template. Further, collecting the target's voiceprint information may comprise recording the contact's voiceprint information during a call with the contact, or having the user manually record the contact's speech. Further, searching the audio/video files comprises: when the contact is selected, playing the audio/video mapped to that contact.
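Extracting a template from a speech segment and comparing voiceprints can be sketched as below. Real systems extract MFCC features and compare against statistical models; the per-frame log-energy and zero-crossing-rate feature here is a deliberately crude stand-in, and the threshold is an arbitrary assumption.

```python
import math

def toy_voiceprint(samples, frame=160):
    """Crude stand-in for voiceprint extraction: average per-frame
    log-energy and zero-crossing rate over the clip."""
    energies, zcrs = [], []
    for i in range(0, len(samples) - frame + 1, frame):
        f = samples[i:i + frame]
        energies.append(math.log(sum(x * x for x in f) / frame + 1e-12))
        zcrs.append(sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / frame)
    n = len(energies)
    return (sum(energies) / n, sum(zcrs) / n)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matches(template, probe, threshold=0.99):
    """Compare a stored template with a probe voiceprint."""
    return cosine(template, probe) >= threshold

# Synthetic "voices": two tones of different frequency.
template = toy_voiceprint([math.sin(2 * math.pi * 5.3 * i / 160)
                           for i in range(480)])
same = toy_voiceprint([math.sin(2 * math.pi * 5.3 * i / 160)
                       for i in range(480)])
other = toy_voiceprint([math.sin(2 * math.pi * 20.7 * i / 160)
                        for i in range(480)])
```

The point is only the template-then-compare pattern; any real deployment would substitute proper speaker-recognition features and scoring.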
Fig. 7 shows a flowchart of recording a contact's voice according to an embodiment of the invention. The flow comprises: step 301: open a contact in the address book; then step 302: judge whether this is the first recording.
If it is the first recording, proceed to step 303: start recording. Step 304: save the audio after recording finishes. Step 305: perform voiceprint modeling on the audio. Step 306: save the voiceprint model. Step 307: identify the existing audio/video files with this voiceprint. Step 308: save the identified files and time points into the contact/audio-video relationship database. Finally, step 309: the voiceprint recording task ends.
If it is not the first recording, proceed to step 310: ask whether to re-record. If re-recording is required, proceed to step 311: delete the original recording, then return to step 303 and carry out steps 303 to 309 as above. If re-recording is not required, nothing is recorded and the flow ends at step 309.
According to another embodiment of the invention, a method of classifying and identifying audio and video on a terminal device based on voiceprint-recognition technology comprises the following steps. First, record the contact's voice in advance to obtain voiceprint information. Then perform speaker segmentation on the audio/video file, dividing it into several voice units each containing the speech of only one speaker, and perform voiceprint recognition on these voice units one by one. Then save the recognition results into the contact/audio-video relationship database. When the contact media library is entered, when the user performs a "classify by contact" or "search by contact" operation in any media library or file manager on the terminal device, or when the audio/video related to a contact is viewed directly in the contacts application, the contact/audio-video relationship database is read and the relationships are displayed. The invention can display the relationship between contacts and audio/video not only as a menu item in the media library but also as menus in the contacts application or file manager.
Further, according to another embodiment of the invention, in applications such as the terminal device's media library, contact manager and file manager, audio and video can be displayed and searched in classified form by selecting "classify by contact" or "search by contact". Further, according to another embodiment of the invention, the audio/video related to a contact can be viewed directly in the contacts application.
Accordingly, the method of operating on audio/video files based on voiceprint information provided by the invention can classify the files according to the voiceprint information of a particular contact. When the user wants to find the audio/video files that contain a particular contact, the files need not be played and checked one by one; the user simply makes a direct selection from the information displayed in the media library, contact manager or file manager, which makes it convenient to find the files containing a specific person's voice or image. Further, the method can jump directly to the time point at which a given contact speaks in an audio/video file and play from there, thereby improving the user's search efficiency.
As shown in Figure 8, the overall scheme of the present invention uses voiceprints to identify a speaker's identity; this technology may be called speaker recognition (Speaker Recognition, SR), and the corresponding model may be called a speaker model (Speaker Model, SM). A speaker recognition system usually performs modeling with the UBM-GMM method: a universal background model (Universal Background Model, UBM) is first trained from a large amount of training audio (from more than one speaker), and a specific speaker is then modeled on the basis of this UBM by an adaptive method, yielding the speaker model (SM). Both the universal background model and the speaker model usually adopt a Gaussian mixture model (Gaussian Mixture Model, GMM) structure.
As shown in Figure 8, the method provided by the present invention for operating on audio/video files based on voiceprint information may comprise a modeling process and an identification process. The modeling process may comprise the following steps: step 1: input the training audio; step 2: silence detection; step 3: speech segmentation; step 4: feature extraction; step 5: cross-adaptation based on the universal background model; step 6: generate the speaker model; step 7: perform Z-Norm processing based on impostor audio; step 8: output the normalized speaker model. The identification process may comprise the following steps: step 1: input the audio to be identified; step 2: silence detection; step 3: speech segmentation; step 4: feature extraction; step 5: compute a score against the normalized speaker model; step 6: perform T-Norm processing based on impostor audio; step 7: decision; step 8: output the recognition result. Here, the normalized speaker model combines the speaker model with the impostor (personation) models.
According to an embodiment of the present invention, the modeling of the speaker model can be roughly described in the following stages. 1. Feature extraction stage: a voice activity detection (Voice Activity Detection, VAD) technique detects the effective speech in the input audio and, according to the lengths of the silences between speech, splits the input audio into several speech segments; the speech features required for speaker recognition are then extracted from each segment. 2. UBM modeling stage: the universal background model (UBM) is computed from a large amount of speech features extracted from the training audio. 3. SM modeling stage: using the universal background model and a small amount of a specific speaker's speech features, that speaker's model (SM) is computed by an adaptive method. 4. SM normalization stage: to strengthen the speaker model's resistance to interference, after the speaker model is built it is often normalized using the speech features of several impostor speakers, finally yielding the normalized speaker model (Normalized SM).
According to an embodiment of the present invention, the identification process of speaker recognition can be roughly described in the following stages. 1. Feature extraction stage: identical to the feature extraction stage of the modeling process. 2. Score calculation stage: the score of the input speech features is computed using the speaker model. 3. Score normalization stage: the score obtained in the previous step is normalized using the normalized speaker model, and the final decision is made.
Furthermore, in the modeling and identification processes described above, some steps can be implemented in different ways. 1. The silence detection technique of the feature extraction stage: the method adopted in the present application first uses the energy and fundamental-frequency information of the input audio to distinguish silence from non-silence, and then uses a support vector machine (Support Vector Machine, SVM) model to distinguish speech from non-speech within the non-silent part. Once the speech parts are determined, the input audio can be divided into several speech segments according to the gap lengths between them. 2. The adaptive method for computing the speaker model from the universal background model: the present application adopts a combination of the eigenvoice (Eigenvoice) method, the constrained maximum likelihood linear regression (Constrained Maximum Likelihood Linear Regression, CMLLR) method, and the structured maximum a posteriori (Structured Maximum A Posterior, SMAP) method. 3. The speaker model normalization method: the present application adopts the Z-Norm method. 4. Score normalization: the present application adopts the T-Norm method. The combination of the Z-Norm and T-Norm normalization methods is currently the most popular normalization approach in speaker recognition technology; the former is used in the modeling stage and the latter in the identification stage.
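The silence-detection and speech-segmentation step described above can be sketched as follows. This is a minimal illustration using short-time energy only; the detector described in the application additionally uses fundamental-frequency information and an SVM speech/non-speech classifier. All function names and thresholds here are illustrative assumptions, not part of the application.

```python
import numpy as np

def energy_vad(samples, rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag each analysis frame as voiced/unvoiced by short-time log energy.

    The application's actual detector also uses pitch information and an
    SVM speech/non-speech classifier; this sketch uses energy alone."""
    frame = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        energy_db = 10.0 * np.log10(np.mean(window ** 2) + 1e-12)
        flags.append(bool(energy_db > threshold_db))
    return flags

def split_on_silence(flags, hop_ms=10, min_gap_ms=300):
    """Split frame-level flags into speech segments separated by silence
    gaps of at least min_gap_ms; returns a list of (start_ms, end_ms)."""
    segments, start, last_voiced = [], None, None
    for i, voiced in enumerate(flags):
        t = i * hop_ms
        if voiced:
            if start is None:
                start = t            # a new speech run begins here
            last_voiced = t + hop_ms  # extend the current run
        elif start is not None and t - last_voiced >= min_gap_ms:
            segments.append((start, last_voiced))  # gap long enough: close run
            start = None
    if start is not None:
        segments.append((start, last_voiced))
    return segments
```

For example, a one-second clip consisting of a tone, 0.4 s of silence, and another tone splits into two speech segments, since the 400 ms gap exceeds the assumed 300 ms minimum.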
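As a concrete illustration of the adaptation, score-calculation, and normalization stages, the sketch below derives a speaker model from a UBM by classical mean-only MAP adaptation and applies Z-Norm/T-Norm style score normalization. Note that the application itself adapts the UBM with a combination of eigenvoice, CMLLR, and SMAP methods, which are considerably more involved; the relevance-MAP recipe, the function names, and the use of scikit-learn's GaussianMixture are assumptions made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, features, relevance=16.0):
    """Mean-only MAP adaptation of a fitted UBM to one speaker's features
    (the textbook baseline; the application's eigenvoice/CMLLR/SMAP
    combination is more elaborate)."""
    post = ubm.predict_proba(features)      # (T, M) component responsibilities
    n_k = post.sum(axis=0)                  # soft frame counts per component
    f_k = post.T @ features                 # first-order statistics, (M, D)
    alpha = n_k / (n_k + relevance)         # data-dependent adaptation weight
    new_means = (alpha[:, None] * (f_k / np.maximum(n_k, 1e-8)[:, None])
                 + (1 - alpha)[:, None] * ubm.means_)
    sm = GaussianMixture(n_components=ubm.n_components,
                         covariance_type=ubm.covariance_type)
    sm.weights_ = ubm.weights_              # weights/covariances stay shared
    sm.means_ = new_means
    sm.covariances_ = ubm.covariances_
    sm.precisions_cholesky_ = ubm.precisions_cholesky_
    return sm

def llr_score(sm, ubm, features):
    """Average per-frame log-likelihood ratio of test features: SM vs UBM."""
    return sm.score(features) - ubm.score(features)

def z_norm(raw_score, impostor_scores):
    """Z-Norm: normalize a model's raw score using the scores that
    impostor (personation) audio obtains against the same model."""
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (raw_score - mu) / (sigma + 1e-8)

def t_norm(raw_score, cohort_scores):
    """T-Norm: normalize a test utterance's score using the scores the
    same utterance obtains against a cohort of impostor models."""
    mu, sigma = np.mean(cohort_scores), np.std(cohort_scores)
    return (raw_score - mu) / (sigma + 1e-8)
```

In use, a positive normalized log-likelihood ratio above some decision threshold would accept the claimed speaker; Z-Norm is applied once at enrollment time, T-Norm at each identification.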
As shown in Figure 9, another object of the present invention is to provide a terminal device, comprising: a voiceprint extraction module for collecting the voiceprint information of an audible target; and an execution module for searching audio/video files according to the voiceprint information.
Further, the voiceprint extraction module comprises: a voiceprint collecting unit for collecting the voiceprint information when a certain audible target is selected; and a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
Further, the device also comprises: a memory module for storing the collected voiceprint information.
Further, the memory module is also used to store the voiceprint template samples.
Further, the voiceprint extraction module comprises: a target classification unit that stores the speaker models by category.
Further, the device also comprises: a display that shows the audio/video files when the local memory module is opened.
Further, the display is used to show the audio/video files classified by the target classification unit according to the kind of the audible target.
Further, the display is used to show the time points at which the audible target appears in an audio/video file.
Further, the target classification unit is also used to search the audio/video files by category according to the kind of the audible target.
Further, the execution module is also used to play, when a time point in the classified display is selected, the audio/video of the audible target contained in the audio/video file from that time point.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the contact's voiceprint information during a call with that contact.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the contact's voiceprint information from the contact's voice manually recorded by the user.
Further, when the audible target is a certain contact in a contacts application, the execution module is also used to play the audio/video mapped to that contact when the contact is selected.
The method and apparatus provided by the present invention can quickly find the files that record a specific person's sound or video, thereby improving the user's search efficiency.
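The classified search-and-display behavior described above, in which each enrolled target is mapped to the files containing its voice and the time points at which it appears, can be sketched as a simple index structure. The class and method names below are illustrative assumptions, not part of the application:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceprintIndex:
    """Maps each enrolled target (e.g. a contact) to the audio/video files
    that contain their voice and the time points (in seconds) at which the
    voice appears, supporting the classified display described above."""
    entries: dict = field(default_factory=dict)

    def add_occurrence(self, target, filename, time_point):
        # Record that `target` is heard in `filename` at `time_point`.
        self.entries.setdefault(target, {}).setdefault(filename, []).append(time_point)

    def search(self, target):
        """Return {filename: sorted time points} for one target, so the
        player can start playback from a selected time point."""
        return {f: sorted(ts) for f, ts in self.entries.get(target, {}).items()}

    def classified_view(self):
        """Group files by target for the classified display."""
        return {t: sorted(files) for t, files in self.entries.items()}
```

A usage sketch: after enrollment scans the stored files, `idx.add_occurrence("Alice", "call_0101.mp4", 12.5)` records one occurrence, and `idx.search("Alice")` returns the files and time points from which playback can begin.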
Those skilled in the art will appreciate that the present invention may relate to equipment for performing one or more of the operations described in this application. The equipment may be specially designed and manufactured for the required purposes, or may comprise known devices in a general-purpose computer storing a program that is selectively activated or reconfigured. Such a computer program may be stored in a device-readable (for example, computer-readable) storage medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus; the computer-readable media include but are not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), random access memory (RAM), read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic cards, or optical cards. A readable medium includes any mechanism for storing or transmitting information in a form readable by a device (for example, a computer). For example, a readable medium includes random access memory (RAM), read-only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices, signals propagated in electrical, optical, acoustic, or other forms (such as carrier waves, infrared signals, digital signals), and so on.
The present invention has been described above with reference to structural diagrams and/or block diagrams and/or flow diagrams of methods, systems, and computer program products according to embodiments of the present invention. It should be understood that each block of these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams.
Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can be alternated, changed, combined, or deleted. Furthermore, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted. Furthermore, steps, measures, and schemes in the prior art corresponding to the various operations, methods, and flows disclosed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
Exemplary embodiments of the present invention are disclosed in the drawings and description. Although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention, which should be defined by the claims of the present invention.

Claims (25)

  1. A method for operating on audio/video files based on voiceprint information, characterized by comprising the steps of:
    collecting the voiceprint information of an audible target; and
    searching audio/video files according to said voiceprint information.
  2. The method according to claim 1, characterized in that said collecting the voiceprint information of the audible target comprises:
    collecting the voiceprint information when a certain audible target is selected; and
    storing the collected voiceprint information.
  3. The method according to claim 2, characterized in that said collecting and storing the voiceprint information comprise:
    generating a speaker model according to said voiceprint information; and
    storing said speaker model in a local memory module.
  4. The method according to claim 2 or 3, characterized in that said storing the collected voiceprint information comprises:
    storing by category according to said speaker model.
  5. The method according to claim 3, characterized in that said searching audio/video files according to said voiceprint information comprises:
    displaying said audio/video files when said local memory module is opened.
  6. The method according to claim 5, characterized in that said classification comprises:
    displaying the audio/video files by category according to said speaker model.
  7. The method according to claim 6, characterized in that said displaying comprises:
    showing the time points at which said audible target appears in an audio/video file.
  8. The method according to claim 7, characterized in that said classification comprises:
    searching the audio/video files by category according to the kind of said audible target.
  9. The method according to claim 6, characterized in that said time point comprises:
    when said time point in the classified display is selected, playing, from that time point, the audio/video of said audible target contained in said audio/video file.
  10. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said collecting the voiceprint information of the audible target comprises:
    recording said contact's voiceprint information during a call with that contact.
  11. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said collecting the voiceprint information of the audible target comprises:
    recording said contact's voiceprint information from that contact's voice manually recorded by the user.
  12. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said searching audio/video files comprises:
    playing the audio/video mapped to said contact when that contact is selected.
  13. A terminal device, characterized by comprising:
    a voiceprint extraction module for collecting the voiceprint information of an audible target; and
    an execution module for searching audio/video files according to said voiceprint information.
  14. The device according to claim 13, characterized in that said voiceprint extraction module comprises:
    a voiceprint collecting unit for collecting voiceprint information when a certain audible target is selected; and
    a voiceprint sample generation unit for generating a speaker model according to said voiceprint information.
  15. The device according to claim 14, characterized by further comprising:
    a memory module for storing the collected voiceprint information.
  16. The device according to claim 14, characterized in that said memory module is also used to: store said speaker model.
  17. The device according to claim 14 or 16, characterized in that said voiceprint extraction module comprises:
    a target classification unit that stores said speaker models by category.
  18. The device according to claim 15, characterized by further comprising:
    a display that shows said audio/video files when said local memory module is opened.
  19. The device according to claim 18, characterized in that said display is used to:
    display said audio/video files classified by said target classification unit according to the kind of said audible target.
  20. The device according to claim 19, characterized in that said display is used to:
    show all the time points at which said audible target appears in an audio/video file.
  21. The device according to claim 20, characterized in that said target classification unit is also used to:
    search the audio/video files by category according to the kind of the audible target.
  22. The device according to claim 19, characterized in that said execution module is also used to:
    when said time point in the classified display is selected, play, from that time point, the audio/video of said audible target contained in said audio/video file.
  23. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said voiceprint extraction module is used to:
    record said contact's voiceprint information during a call with that contact.
  24. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said voiceprint extraction module is used to:
    record said contact's voiceprint information from that contact's voice manually recorded by the user.
  25. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said execution module is also used to:
    play the audio/video mapped to said contact when that contact is selected.
CN201210518118.4A 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information Active CN103035247B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710439537.1A CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Division CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Publications (2)

Publication Number Publication Date
CN103035247A true CN103035247A (en) 2013-04-10
CN103035247B CN103035247B (en) 2017-07-07

Family

ID=48022078

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201210518118.4A Active CN103035247B (en) 2012-12-05 2012-12-05 Based on the method and device that voiceprint is operated to audio/video file
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Country Status (1)

Country Link
CN (2) CN103035247B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364663A (en) * 2018-01-02 2018-08-03 山东浪潮商用系统有限公司 A kind of method and module of automatic recording voice
CN108364654B (en) * 2018-01-30 2020-10-13 网易乐得科技有限公司 Voice processing method, medium, device and computing equipment
CN108920619A (en) * 2018-06-28 2018-11-30 Oppo广东移动通信有限公司 Document display method, device, storage medium and electronic equipment
CN111091844A (en) * 2018-10-23 2020-05-01 北京嘀嘀无限科技发展有限公司 Video processing method and system
CN112153461B (en) * 2020-09-25 2022-11-18 北京百度网讯科技有限公司 Method and device for positioning sound production object, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1156871A (en) * 1995-11-17 1997-08-13 雅马哈株式会社 Personal information database system
CN1307589C (en) * 2001-04-17 2007-03-28 皇家菲利浦电子有限公司 Method and apparatus of managing information about a person
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
WO2011149647A2 (en) * 2010-05-24 2011-12-01 Microsoft Corporation Voice print identification
CN102404278A (en) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 Song request system based on voiceprint recognition and application method thereof
CN102655002A (en) * 2011-03-01 2012-09-05 株式会社理光 Audio processing method and audio processing equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
CN102347060A (en) * 2010-08-04 2012-02-08 鸿富锦精密工业(深圳)有限公司 Electronic recording device and method

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117665A (en) * 2013-08-14 2019-01-01 华为终端(东莞)有限公司 Realize method for secret protection and device
CN104123115B (en) * 2014-07-28 2017-05-24 联想(北京)有限公司 Audio information processing method and electronic device
CN104123115A (en) * 2014-07-28 2014-10-29 联想(北京)有限公司 Audio information processing method and electronic device
CN104243934A (en) * 2014-09-30 2014-12-24 智慧城市信息技术有限公司 Method and device for acquiring surveillance video and method and device for retrieving surveillance video
CN105704512A (en) * 2014-10-06 2016-06-22 财团法人资讯工业策进会 Video capturing system and video capturing method thereof
CN104268279B (en) * 2014-10-16 2018-04-20 魔方天空科技(北京)有限公司 The querying method and device of corpus data
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data
CN105828179A (en) * 2015-06-24 2016-08-03 维沃移动通信有限公司 Video positioning method and device
CN105022263A (en) * 2015-07-28 2015-11-04 广东欧珀移动通信有限公司 Method for controlling intelligent watch and intelligent watch
WO2016165346A1 (en) * 2015-09-16 2016-10-20 中兴通讯股份有限公司 Method and apparatus for storing and playing audio file
CN106548793A (en) * 2015-09-16 2017-03-29 中兴通讯股份有限公司 Storage and the method and apparatus for playing audio file
CN105635452A (en) * 2015-12-28 2016-06-01 努比亚技术有限公司 Mobile terminal and contact person identification method thereof
CN105654942A (en) * 2016-01-04 2016-06-08 北京时代瑞朗科技有限公司 Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter
CN106095764A (en) * 2016-03-31 2016-11-09 乐视控股(北京)有限公司 A kind of dynamic picture processing method and system
CN106448683A (en) * 2016-09-30 2017-02-22 珠海市魅族科技有限公司 Method and device for viewing recording in multimedia files
CN107452408A (en) * 2017-07-27 2017-12-08 上海与德科技有限公司 A kind of audio frequency playing method and device
CN107452408B (en) * 2017-07-27 2020-09-25 成都声玩文化传播有限公司 Audio playing method and device
US11538456B2 (en) 2017-11-06 2022-12-27 Tencent Technology (Shenzhen) Company Limited Audio file processing method, electronic device, and storage medium
WO2019086044A1 (en) * 2017-11-06 2019-05-09 腾讯科技(深圳)有限公司 Audio file processing method, electronic device and storage medium
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN108074574A (en) * 2017-11-29 2018-05-25 维沃移动通信有限公司 Audio-frequency processing method, device and mobile terminal
CN108319371A (en) * 2018-02-11 2018-07-24 广东欧珀移动通信有限公司 Control method for playing back and Related product
WO2020057347A1 (en) * 2018-09-21 2020-03-26 深圳市九洲电器有限公司 Multimedia file retrieval method and apparatus
CN111883139A (en) * 2020-07-24 2020-11-03 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for screening target voices

Also Published As

Publication number Publication date
CN107274916A (en) 2017-10-20
CN103035247B (en) 2017-07-07
CN107274916B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN103035247A (en) Method and device of operation on audio/video file based on voiceprint information
US10977299B2 (en) Systems and methods for consolidating recorded content
CN103377651B (en) The automatic synthesizer of voice and method
US9711167B2 (en) System and method for real-time speaker segmentation of audio interactions
WO2006025797A1 (en) A search system
CN107507626B (en) Mobile phone source identification method based on voice frequency spectrum fusion characteristics
US10242330B2 (en) Method and apparatus for detection and analysis of first contact resolution failures
Aggarwal et al. Cellphone identification using noise estimates from recorded audio
US9058384B2 (en) System and method for identification of highly-variable vocalizations
Ntalampiras et al. Acoustic detection of human activities in natural environments
CN102067589A (en) Digital video recorder system and operating method thereof
CN109710799B (en) Voice interaction method, medium, device and computing equipment
EP2926337A1 (en) Clustering and synchronizing multimedia contents
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN107679196A (en) A kind of multimedia recognition methods, electronic equipment and storage medium
Cotton et al. Soundtrack classification by transient events
US11538461B1 (en) Language agnostic missing subtitle detection
US8725508B2 (en) Method and apparatus for element identification in a signal
Pandey et al. Cell-phone identification from audio recordings using PSD of speech-free regions
CN110286775A (en) A kind of dictionary management method and device
CN106156299B (en) The subject content recognition methods of text information and device
Buddhika et al. Voicer: A crowd sourcing tool for speech data collection
CN108989551B (en) Position prompting method and device, storage medium and electronic equipment
CN105930522A (en) Intelligent music recommendation method, system and device
KR100869643B1 (en) Mp3-based popular song summarization installation and method using music structures, storage medium storing program for realizing the method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant