CN103035247A - Method and device of operation on audio/video file based on voiceprint information - Google Patents

Method and device of operation on audio/video file based on voiceprint information

Info

Publication number
CN103035247A
Authority
CN
China
Prior art keywords
audio
voiceprint
video file
contact person
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105181184A
Other languages
Chinese (zh)
Other versions
CN103035247B (en)
Inventor
杨帆
苏腾荣
李世全
马永健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201710439537.1A priority Critical patent/CN107274916B/en
Priority to CN201210518118.4A priority patent/CN103035247B/en
Publication of CN103035247A publication Critical patent/CN103035247A/en
Application granted granted Critical
Publication of CN103035247B publication Critical patent/CN103035247B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/632 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/7834 Retrieval characterised by using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method of operating on an audio or video file based on voiceprint information. The method comprises the following steps: collecting the voiceprint information of a vocalizing target; and searching audio/video files according to the voiceprint information. The invention further provides a terminal device. With the method and device, audio/video files can be categorized according to the voiceprint information of a specific contact. When a user wants to find the audio/video files that contain a specific contact, the files need not be played and checked one by one; a direct selection suffices, so the user can easily find the audio/video files containing that contact's voice. In addition, playback can jump directly to the time point at which a given contact speaks in an audio/video file, thereby improving the user's search efficiency.

Description

Method and device for operating on audio/video files based on voiceprint information
Technical field
The present invention relates to mobile communication applications, and in particular to a method and a device for operating on audio and video files on a terminal device according to the voiceprint of a particular contact.
Background technology
The recorder or camera on an existing terminal device makes it convenient for a user to record audio and video files. As terminal performance improves, storage capacity grows and multimedia applications multiply, users easily accumulate a large number of audio/video files. Faced with such a collection, when a user needs to find all the files in which a particular contact was recorded, or to locate and play a particular segment of a particular contact within a file, the content cannot be located quickly and the user has no way to search. Only by playing and checking the files one by one can the required file or segment be found.
In view of this, there is a need for a method and a terminal device that can quickly find and categorize target audio/video files and locate the time points at which a particular contact appears within a file, so that the user can conveniently find the files in which a specific person's voice or image was recorded.
Summary of the invention
To solve the above technical problem, the invention enables a user to quickly find the files in which a specific person's voice or image was recorded.
One object of the present invention is to provide a method of operating on audio/video files based on voiceprint information, comprising the steps of: collecting the voiceprint information of a vocalizing target; and searching audio/video files according to said voiceprint information.
Another object of the present invention is to provide a terminal device, comprising: a voiceprint extraction module for collecting the voiceprint information of a vocalizing target; and an execution module for searching audio/video files according to said voiceprint information.
The method and device provided by the invention can quickly find the files in which a specific person's voice or image was recorded, thereby improving the user's search efficiency.
Additional aspects and advantages of the invention are given in part in the following description; they will become apparent from the description or may be learned by practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flowchart of a method according to an embodiment of the invention;
Fig. 2 shows the interface of a terminal device before audio collection according to an embodiment of the invention;
Fig. 3 shows a flowchart of audio collection according to an embodiment of the invention;
Fig. 4 shows the interface of a terminal device during audio collection according to an embodiment of the invention;
Fig. 5 shows the interface after the terminal device has found the recorded audio and video files, marking in each file the time points at which the voiceprint of the vocalizing target appears and/or ends;
Fig. 6 shows a flowchart of browsing a contact's media library on the terminal device according to an embodiment of the invention;
Fig. 7 shows a flowchart of recording a contact's voice according to an embodiment of the invention;
Fig. 8 shows a schematic diagram of the overall architecture according to an embodiment of the invention;
Fig. 9 shows a schematic structural diagram according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are now described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept, purpose, design and scope of the invention to those skilled in the art. The terminology used in the detailed description of the exemplary embodiments illustrated in the drawings is not intended to limit the invention. In the drawings, like numerals refer to like elements.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural. It will be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include a wireless connection or coupling. The term "and/or" as used herein includes any unit of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in ordinary dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, should not be interpreted in an idealized or overly formal sense.
As shown in Fig. 1, the invention provides a method of operating on audio/video files based on voiceprint information, comprising the steps of: S1, collecting the voiceprint information of a vocalizing target; and S2, searching audio/video files according to the voiceprint information.
For example, step S1 may be realized as follows: when contact X1 calls user Y, the terminal device opens its built-in recorder and separately records a segment of X1's speech (for example, a segment of normal speech 7 to 10 seconds long) and extracts voiceprint information from it. After the call ends, the terminal device generates a speaker model M1 from the recorded voiceprint information and stores the sample in the media library. The terminal device then associates the speaker model with the record of contact X1 in the address book.
As another example, step S1 may be realized as follows: when user Y takes his son X2 to the park, the user opens the "record voiceprint sample" option in X2's address-book record and records X2's voiceprint information. After recording stops, the terminal device generates a speaker model M2 from the recorded voiceprint information, stores the sample in the terminal memory, and associates the speaker model with the files of contact X2 in the media library. It will be understood that "media library" is merely one way of referring to a stored multimedia collection; it may equally be called a folder, file manager, media manager, video manager, audio manager, and so on. As shown in Fig. 5, whenever the terminal device later encounters voiceprint information matching speaker models M1 or M2, it classifies and marks the audio and video files according to the particular target (for example, "Me" and "Son"). After classified storage, information such as subject fields, folders and media-library entries for the corresponding categories can be generated.
Step S1 may also be realized as follows: step S11, when a vocalizing target (for example, Zhang San) is selected in the address-book application, a "record voiceprint sample" option is presented on the display; step S12, when the user taps this option, the terminal device collects voiceprint information and stores the speaker model generated from it in the contact's media library; and step S13, after the contact's media-library page is entered, the display presents the audio/video files that have been found. Collecting the voiceprint information of a vocalizing target therefore comprises: collecting the voiceprint information when a vocalizing target is selected; and storing the collected voiceprint information.
Fig. 2 shows the interface of the terminal device before audio collection according to an embodiment of the invention. Fig. 3 shows a flowchart of audio collection according to an embodiment of the invention. The audio-collection flow comprises the following steps. Step 101: open the address book and select a particular contact. Step 102: via the "record voiceprint sample" option (as shown in Fig. 2), record the contact's voice (that is, collect the contact's voiceprint information). Step 103: after recording finishes, model the contact's voice to generate a speaker model, and save the speaker model in the contact's record. Collecting and storing voiceprint information therefore comprises: generating a speaker model from the voiceprint information; and storing the speaker model in a local storage module.
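The enrollment flow described above (record a short sample, build a speaker model, save it in the contact's record) can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation; the `Contact` fields and the toy statistics-based model builder are assumptions made here for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Contact:
    """Address-book entry extended with a 'voiceprint sample' field,
    as the patent describes (field names here are illustrative)."""
    name: str
    phone: str = ""
    speaker_model: object = None  # filled in after enrollment

def enroll_voiceprint(contact, audio_sample, build_model):
    """Record ~7-10 s of the contact's speech, build a speaker model
    from it, and store the model in the contact record."""
    if len(audio_sample) == 0:
        raise ValueError("empty enrollment sample")
    contact.speaker_model = build_model(audio_sample)
    return contact

def toy_model(sample):
    """Hypothetical model builder: just mean and energy statistics of
    the raw samples, standing in for real voiceprint modeling."""
    mean = sum(sample) / len(sample)
    energy = sum(x * x for x in sample) / len(sample)
    return {"mean": mean, "energy": energy}

son = Contact(name="Son")
enroll_voiceprint(son, [0.1, -0.2, 0.3, 0.05], toy_model)
```

A real implementation would replace `toy_model` with the UBM-GMM modeling the patent describes later; only the store-in-contact-record pattern is the point here.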
The modeling process of an embodiment of the invention is as follows. The technique of identifying a speaker's identity from voiceprint information is called speaker recognition (SR), and the corresponding model is called a speaker model (SM). A speaker-recognition system usually performs UBM-GMM modeling: a universal background model (UBM) is trained from a large amount of training audio (more than one speaker), and on the basis of this UBM a specific speaker is modeled by an adaptive method to obtain the speaker model (SM). Both the universal background model and the speaker model are usually built as Gaussian mixture models (GMM).
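The UBM-GMM decision can be illustrated with a minimal one-dimensional sketch: score the input feature frames against the speaker's GMM and against the UBM, and accept when the average log-likelihood ratio is positive. The mixture parameters below are hand-picked toy numbers, not trained models, and real systems work on multi-dimensional features.

```python
import math

def gauss_logpdf(x, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a 1-D GMM given as
    a list of (weight, mean, variance) components."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(gauss_logpdf(x, m, v)) for w, m, v in gmm)
        total += math.log(p)
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """UBM-GMM decision: log-likelihood ratio of the speaker model
    versus the universal background model."""
    llr = gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, ubm)
    return llr, llr > threshold

ubm = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]  # broad background mixture
sm = [(1.0, 1.2, 0.25)]                    # model adapted toward one speaker
llr, accepted = verify([1.1, 1.3, 1.0], sm, ubm)
```

In practice the SM is obtained by adapting the UBM's components toward the enrollment data rather than being specified by hand.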
Fig. 4 shows the interface of the terminal device during audio collection according to an embodiment of the invention. For example, in the address-book contact interface (as shown in Fig. 4), tapping the added "record voiceprint sample" button causes the terminal device to record the contact's voice.
Further, as shown in Fig. 3, the voiceprint-recognition flow comprises the following steps. Step 104: determine the audio/video file. Step 105: perform speaker segmentation on the speech in the audio/video file, generating n voice units, each containing the speech of only a single speaker. Step 106: perform contact voiceprint recognition on each of the n voice units and judge whether it matches. Step 107: if the recognition result matches, establish for the terminal device a database of the correspondence between the contact and this audio/video file. The correspondence database can record the audio/video files in which a contact's voice appears, and can further record the time points at which the contact's voice appears within each file; that is, the time points map to positions in the corresponding audio/video files.
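A minimal sketch of such a correspondence database is shown below, using SQLite for concreteness. The schema, table and column names are assumptions for illustration; the patent does not specify a storage format.

```python
import sqlite3

def init_db(conn):
    """One row per (contact, file, time point) match."""
    conn.execute("""CREATE TABLE IF NOT EXISTS contact_media (
        contact TEXT, file TEXT, time_point REAL)""")

def record_match(conn, contact, file, time_point):
    """Store one recognition result: contact's voice appears in
    this file at this offset (seconds)."""
    conn.execute("INSERT INTO contact_media VALUES (?, ?, ?)",
                 (contact, file, time_point))

def files_for_contact(conn, contact):
    rows = conn.execute(
        "SELECT DISTINCT file FROM contact_media WHERE contact = ?",
        (contact,)).fetchall()
    return [r[0] for r in rows]

def time_points(conn, contact, file):
    rows = conn.execute(
        "SELECT time_point FROM contact_media "
        "WHERE contact = ? AND file = ? ORDER BY time_point",
        (contact, file)).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
init_db(conn)
# The offsets 225 s, 1103 s, 2734 s correspond to the 3'45", 18'23",
# 45'34" example given later in the description.
for t in (225.0, 1103.0, 2734.0):
    record_match(conn, "Son", "childrens_day.mp4", t)
```

Querying `files_for_contact` backs the "search by contact" display, and `time_points` backs the jump-to-time-point playback.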
Fig. 6 shows a flowchart of browsing a contact's media library on the terminal device according to an embodiment of the invention. The flow comprises the following steps. Step 201: open the media library and enter the "contact media library" menu. Step 202: read the contact/audio-video relationship database. Step 203: after reading finishes, display the contacts with their corresponding media files and time points.
Fig. 5 shows the interface after the terminal device has found the recorded audio and video files, marking in each file the time points at which the voiceprint of the vocalizing target appears and/or ends. For example, the user opens the media library and enters the "contact media library" menu, at which point the contact-media-library interface is displayed, presenting the information read from the contact/audio-video relationship database. Searching audio/video files according to voiceprint information therefore comprises: displaying the audio/video files when the local storage module is opened.
Further, the interface of Fig. 5 shows two classes of media files, "Son" and "Me". In the "Children's Day" item of the "Son" folder there are three time points, namely 3'45", 18'23" and 45'34"; these are the time points at which "Son"'s voice appears in that item. For example, if the user selects 3'45", the terminal device automatically enters the "Children's Day" item and starts playback at 3 minutes 45 seconds. Storing the collected voiceprint information therefore comprises classified storage according to speaker model. Further, searching audio/video files according to the voiceprint information comprises displaying the audio/video files when the local storage module is opened. Further, the classification comprises displaying the audio/video files classified by speaker model, and searching the audio/video files by category of vocalizing target. Further, the display comprises showing the time points at which the vocalizing target appears in each file; when a time point in the classified display is selected, the portion of the audio/video containing the vocalizing target is played.
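Selecting a listed time point and jumping playback there can be sketched as follows. The minute'second" label format follows the example in the text; the returned player-state dict is a stand-in for a real playback API, which is an assumption of this sketch.

```python
def label_to_seconds(label):
    """Parse a time-point label such as 3'45" into seconds."""
    minutes, rest = label.split("'")
    return int(minutes) * 60 + int(rest.rstrip('"'))

def jump_and_play(file, label):
    """Simulate selecting a listed time point: a real player would
    seek to the offset and start playback; here we return the state."""
    return {"file": file,
            "position": label_to_seconds(label),
            "playing": True}

state = jump_and_play("childrens_day.mp4", "3'45\"")
```

On a real terminal the `position` value would be handed to the media player's seek call before playback begins.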
As shown in Figs. 1-6, according to another embodiment of the invention, when the terminal device classifies audio/video files by particular contact, the voiceprints of the key contacts must first be modeled and stored in the address-book module. The invention adds a "voiceprint sample" field to each contact record in the terminal's address-book module for storing that contact's voiceprint sample. The concrete procedure is as follows. The user creates or edits an important contact of interest (for example, "child"). A segment of this contact's audio is then recorded (for example, normal speech, 7 to 10 seconds long). The terminal device models the contact's voiceprint from the sound sample and saves it in the voiceprint-sample field of that contact's address-book record. The user then records and saves audio/video files on the terminal device. The invention can analyze the voiceprints of the important contacts, classify the files by contact, and mark the time points at which each contact's voice occurs. Speaker-segmentation techniques are used to extract the speech of all recorded speakers in an audio/video file and divide it into several voice units, each containing the speech of only one speaker; each voice unit is then subjected to voiceprint recognition using the speaker models. The recognition results are stored in the contact/audio-video relationship database, which records the correspondence between contacts and audio/video files and the time points at which each contact's voice occurs in each file. The "voiceprint" referred to by the invention is the acoustic spectrum of a person's voice, i.e. a biometric characteristic of that voice. By comparing voiceprints, the mobile terminal can find the corresponding targets in the stored multimedia. Accordingly, when the vocalizing target is a contact in the contacts application, collecting the target's voiceprint information comprises: while in a call with the contact, recording a segment of the contact's voice 7 to 10 seconds or longer that contains only that contact's voice, extracting voiceprint information from this segment, and generating a voiceprint template. Further, collecting the target's voiceprint information may comprise recording the contact's voiceprint information during a call with the contact, or having the user manually record the contact's speech. Further, searching the audio/video files comprises: when the contact is selected, playing the audio/video mapped to that contact.
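Extracting a template from a speech segment and comparing voiceprints can be sketched as below. Real systems extract MFCC features and compare against statistical models; the per-frame log-energy and zero-crossing-rate feature here is a deliberately crude stand-in, and the threshold is an arbitrary assumption.

```python
import math

def toy_voiceprint(samples, frame=160):
    """Crude stand-in for voiceprint extraction: average per-frame
    log-energy and zero-crossing rate over the clip."""
    energies, zcrs = [], []
    for i in range(0, len(samples) - frame + 1, frame):
        f = samples[i:i + frame]
        energies.append(math.log(sum(x * x for x in f) / frame + 1e-12))
        zcrs.append(sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / frame)
    n = len(energies)
    return (sum(energies) / n, sum(zcrs) / n)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matches(template, probe, threshold=0.99):
    """Compare a stored template with a probe voiceprint."""
    return cosine(template, probe) >= threshold

# Synthetic "voices": two tones of different frequency.
template = toy_voiceprint([math.sin(2 * math.pi * 5.3 * i / 160)
                           for i in range(480)])
same = toy_voiceprint([math.sin(2 * math.pi * 5.3 * i / 160)
                       for i in range(480)])
other = toy_voiceprint([math.sin(2 * math.pi * 20.7 * i / 160)
                        for i in range(480)])
```

The point is only the template-then-compare pattern; any real deployment would substitute proper speaker-recognition features and scoring.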
Fig. 7 shows a flowchart of recording a contact's voice according to an embodiment of the invention. The flow comprises: step 301: open a contact in the address book; then step 302: judge whether this is the first recording.
If it is the first recording, proceed to step 303: start recording. Step 304: save the audio after recording finishes. Step 305: perform voiceprint modeling on the audio. Step 306: save the voiceprint model. Step 307: identify the existing audio/video files with this voiceprint. Step 308: save the identified files and time points into the contact/audio-video relationship database. Finally, step 309: the voiceprint recording task ends.
If it is not the first recording, proceed to step 310: ask whether to re-record. If re-recording is required, proceed to step 311: delete the original recording, then return to step 303 and carry out steps 303 to 309 as above. If re-recording is not required, nothing is recorded and the flow ends at step 309.
According to another embodiment of the invention, a method of classifying and identifying audio and video on a terminal device based on voiceprint-recognition technology comprises the following steps. First, record the contact's voice in advance to obtain voiceprint information. Then perform speaker segmentation on the audio/video file, dividing it into several voice units each containing the speech of only one speaker, and perform voiceprint recognition on these voice units one by one. Then save the recognition results into the contact/audio-video relationship database. When the contact media library is entered, when the user performs a "classify by contact" or "search by contact" operation in any media library or file manager on the terminal device, or when the audio/video related to a contact is viewed directly in the contacts application, the contact/audio-video relationship database is read and the relationships are displayed. The invention can display the relationship between contacts and audio/video not only as a menu item in the media library but also as menus in the contacts application or file manager.
Further, according to another embodiment of the invention, in applications such as the terminal device's media library, contact manager and file manager, audio and video can be displayed and searched in classified form by selecting "classify by contact" or "search by contact". Further, according to another embodiment of the invention, the audio/video related to a contact can be viewed directly in the contacts application.
Accordingly, the method of operating on audio/video files based on voiceprint information provided by the invention can classify the files according to the voiceprint information of a particular contact. When the user wants to find the audio/video files that contain a particular contact, the files need not be played and checked one by one; the user simply makes a direct selection from the information displayed in the media library, contact manager or file manager, which makes it convenient to find the files containing a specific person's voice or image. Further, the method can jump directly to the time point at which a given contact speaks in an audio/video file and play from there, thereby improving the user's search efficiency.
As shown in Figure 8, the overall scheme of the present invention uses voiceprints to identify a speaker's identity; this technology may be called speaker recognition (Speaker Recognition, SR), and the corresponding model may be called a speaker model (Speaker Model, SM). A speaker recognition system usually performs modeling with the UBM-GMM method: a universal background model (Universal Background Model, UBM) is first trained from a large amount of training audio (from more than one speaker), and a specific speaker is then modeled on the basis of this UBM by an adaptive method, yielding the speaker model (SM). Both the universal background model and the speaker model usually adopt a Gaussian mixture model (Gaussian Mixture Model, GMM) structure.
As shown in Figure 8, the method provided by the present invention for operating on audio/video files based on voiceprint information may comprise a modeling process and an identification process. The modeling process may comprise the following steps: step 1: input the training audio; step 2: silence detection; step 3: speech segmentation; step 4: feature extraction; step 5: cross-adaptation based on the universal background model; step 6: generate the speaker model; step 7: perform Z-Norm processing based on impostor audio; step 8: output the normalized speaker model. The identification process may comprise the following steps: step 1: input the audio to be identified; step 2: silence detection; step 3: speech segmentation; step 4: feature extraction; step 5: compute a score against the normalized speaker model; step 6: perform T-Norm processing based on impostor audio; step 7: decision; step 8: output the recognition result. Here, the normalized speaker model combines the speaker model with the impostor (personation) models.
According to an embodiment of the present invention, the modeling of the speaker model can be roughly described in the following stages. 1. Feature extraction stage: a voice activity detection (Voice Activity Detection, VAD) technique detects the effective speech in the input audio and, according to the lengths of the silences between speech, splits the input audio into several speech segments; the speech features required for speaker recognition are then extracted from each segment. 2. UBM modeling stage: the universal background model (UBM) is computed from a large amount of speech features extracted from the training audio. 3. SM modeling stage: using the universal background model and a small amount of a specific speaker's speech features, that speaker's model (SM) is computed by an adaptive method. 4. SM normalization stage: to strengthen the speaker model's resistance to interference, after the speaker model is built it is often normalized using the speech features of several impostor speakers, finally yielding the normalized speaker model (Normalized SM).
According to an embodiment of the present invention, the identification process of speaker recognition can be roughly described in the following stages. 1. Feature extraction stage: identical to the feature extraction stage of the modeling process. 2. Score calculation stage: the score of the input speech features is computed using the speaker model. 3. Score normalization stage: the score obtained in the previous step is normalized using the normalized speaker model, and the final decision is made.
Furthermore, in the modeling and identification processes described above, some steps can be implemented in different ways. 1. The silence detection technique of the feature extraction stage: the method adopted in the present application first uses the energy and fundamental-frequency information of the input audio to distinguish silence from non-silence, and then uses a support vector machine (Support Vector Machine, SVM) model to distinguish speech from non-speech within the non-silent part. Once the speech parts are determined, the input audio can be divided into several speech segments according to the gap lengths between them. 2. The adaptive method for computing the speaker model from the universal background model: the present application adopts a combination of the eigenvoice (Eigenvoice) method, the constrained maximum likelihood linear regression (Constrained Maximum Likelihood Linear Regression, CMLLR) method, and the structured maximum a posteriori (Structured Maximum A Posterior, SMAP) method. 3. The speaker model normalization method: the present application adopts the Z-Norm method. 4. Score normalization: the present application adopts the T-Norm method. The combination of the Z-Norm and T-Norm normalization methods is currently the most popular normalization approach in speaker recognition technology; the former is used in the modeling stage and the latter in the identification stage.
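The silence-detection and speech-segmentation step described above can be sketched as follows. This is a minimal illustration using short-time energy only; the detector described in the application additionally uses fundamental-frequency information and an SVM speech/non-speech classifier. All function names and thresholds here are illustrative assumptions, not part of the application.

```python
import numpy as np

def energy_vad(samples, rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag each analysis frame as voiced/unvoiced by short-time log energy.

    The application's actual detector also uses pitch information and an
    SVM speech/non-speech classifier; this sketch uses energy alone."""
    frame = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame + 1, hop):
        window = samples[start:start + frame]
        energy_db = 10.0 * np.log10(np.mean(window ** 2) + 1e-12)
        flags.append(bool(energy_db > threshold_db))
    return flags

def split_on_silence(flags, hop_ms=10, min_gap_ms=300):
    """Split frame-level flags into speech segments separated by silence
    gaps of at least min_gap_ms; returns a list of (start_ms, end_ms)."""
    segments, start, last_voiced = [], None, None
    for i, voiced in enumerate(flags):
        t = i * hop_ms
        if voiced:
            if start is None:
                start = t            # a new speech run begins here
            last_voiced = t + hop_ms  # extend the current run
        elif start is not None and t - last_voiced >= min_gap_ms:
            segments.append((start, last_voiced))  # gap long enough: close run
            start = None
    if start is not None:
        segments.append((start, last_voiced))
    return segments
```

For example, a one-second clip consisting of a tone, 0.4 s of silence, and another tone splits into two speech segments, since the 400 ms gap exceeds the assumed 300 ms minimum.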
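As a concrete illustration of the adaptation, score-calculation, and normalization stages, the sketch below derives a speaker model from a UBM by classical mean-only MAP adaptation and applies Z-Norm/T-Norm style score normalization. Note that the application itself adapts the UBM with a combination of eigenvoice, CMLLR, and SMAP methods, which are considerably more involved; the relevance-MAP recipe, the function names, and the use of scikit-learn's GaussianMixture are assumptions made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, features, relevance=16.0):
    """Mean-only MAP adaptation of a fitted UBM to one speaker's features
    (the textbook baseline; the application's eigenvoice/CMLLR/SMAP
    combination is more elaborate)."""
    post = ubm.predict_proba(features)      # (T, M) component responsibilities
    n_k = post.sum(axis=0)                  # soft frame counts per component
    f_k = post.T @ features                 # first-order statistics, (M, D)
    alpha = n_k / (n_k + relevance)         # data-dependent adaptation weight
    new_means = (alpha[:, None] * (f_k / np.maximum(n_k, 1e-8)[:, None])
                 + (1 - alpha)[:, None] * ubm.means_)
    sm = GaussianMixture(n_components=ubm.n_components,
                         covariance_type=ubm.covariance_type)
    sm.weights_ = ubm.weights_              # weights/covariances stay shared
    sm.means_ = new_means
    sm.covariances_ = ubm.covariances_
    sm.precisions_cholesky_ = ubm.precisions_cholesky_
    return sm

def llr_score(sm, ubm, features):
    """Average per-frame log-likelihood ratio of test features: SM vs UBM."""
    return sm.score(features) - ubm.score(features)

def z_norm(raw_score, impostor_scores):
    """Z-Norm: normalize a model's raw score using the scores that
    impostor (personation) audio obtains against the same model."""
    mu, sigma = np.mean(impostor_scores), np.std(impostor_scores)
    return (raw_score - mu) / (sigma + 1e-8)

def t_norm(raw_score, cohort_scores):
    """T-Norm: normalize a test utterance's score using the scores the
    same utterance obtains against a cohort of impostor models."""
    mu, sigma = np.mean(cohort_scores), np.std(cohort_scores)
    return (raw_score - mu) / (sigma + 1e-8)
```

In use, a positive normalized log-likelihood ratio above some decision threshold would accept the claimed speaker; Z-Norm is applied once at enrollment time, T-Norm at each identification.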
As shown in Figure 9, another object of the present invention is to provide a terminal device, comprising: a voiceprint extraction module for collecting the voiceprint information of an audible target; and an execution module for searching audio/video files according to the voiceprint information.
Further, the voiceprint extraction module comprises: a voiceprint collecting unit for collecting the voiceprint information when a certain audible target is selected; and a voiceprint sample generation unit for generating a speaker model according to the voiceprint information.
Further, the device also comprises: a memory module for storing the collected voiceprint information.
Further, the memory module is also used to store the voiceprint template samples.
Further, the voiceprint extraction module comprises: a target classification unit that stores the speaker models by category.
Further, the device also comprises: a display that shows the audio/video files when the local memory module is opened.
Further, the display is used to show the audio/video files classified by the target classification unit according to the kind of the audible target.
Further, the display is used to show the time points at which the audible target appears in an audio/video file.
Further, the target classification unit is also used to search the audio/video files by category according to the kind of the audible target.
Further, the execution module is also used to play, when a time point in the classified display is selected, the audio/video of the audible target contained in the audio/video file from that time point.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the contact's voiceprint information during a call with that contact.
Further, when the audible target is a certain contact in a contacts application, the voiceprint extraction module is used to record the contact's voiceprint information from the contact's voice manually recorded by the user.
Further, when the audible target is a certain contact in a contacts application, the execution module is also used to play the audio/video mapped to that contact when the contact is selected.
The method and apparatus provided by the present invention can quickly find the files that record a specific person's sound or video, thereby improving the user's search efficiency.
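The classified search-and-display behavior described above, in which each enrolled target is mapped to the files containing its voice and the time points at which it appears, can be sketched as a simple index structure. The class and method names below are illustrative assumptions, not part of the application:

```python
from dataclasses import dataclass, field

@dataclass
class VoiceprintIndex:
    """Maps each enrolled target (e.g. a contact) to the audio/video files
    that contain their voice and the time points (in seconds) at which the
    voice appears, supporting the classified display described above."""
    entries: dict = field(default_factory=dict)

    def add_occurrence(self, target, filename, time_point):
        # Record that `target` is heard in `filename` at `time_point`.
        self.entries.setdefault(target, {}).setdefault(filename, []).append(time_point)

    def search(self, target):
        """Return {filename: sorted time points} for one target, so the
        player can start playback from a selected time point."""
        return {f: sorted(ts) for f, ts in self.entries.get(target, {}).items()}

    def classified_view(self):
        """Group files by target for the classified display."""
        return {t: sorted(files) for t, files in self.entries.items()}
```

A usage sketch: after enrollment scans the stored files, `idx.add_occurrence("Alice", "call_0101.mp4", 12.5)` records one occurrence, and `idx.search("Alice")` returns the files and time points from which playback can begin.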
Those skilled in the art will appreciate that the present invention may relate to equipment for performing one or more of the operations described in this application. The equipment may be specially designed and manufactured for the required purposes, or may comprise known devices in a general-purpose computer storing a program that is selectively activated or reconfigured. Such a computer program may be stored in a device-readable (for example, computer-readable) storage medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus; the computer-readable media include but are not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), random access memory (RAM), read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic cards, or optical cards. A readable medium includes any mechanism for storing or transmitting information in a form readable by a device (for example, a computer). For example, a readable medium includes random access memory (RAM), read-only memory (ROM), magnetic disk storage media, optical storage media, flash memory devices, signals propagated in electrical, optical, acoustic, or other forms (such as carrier waves, infrared signals, digital signals), and so on.
The present invention has been described above with reference to structural diagrams and/or block diagrams and/or flow diagrams of methods, systems, and computer program products according to embodiments of the present invention. It should be understood that each block of these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams.
Those skilled in the art will appreciate that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can be alternated, changed, combined, or deleted. Furthermore, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted. Furthermore, steps, measures, and schemes in the prior art corresponding to the various operations, methods, and flows disclosed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
Exemplary embodiments of the present invention are disclosed in the drawings and description. Although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention, which should be defined by the claims of the present invention.

Claims (25)

  1. A method for operating on audio/video files based on voiceprint information, characterized by comprising the steps of:
    collecting the voiceprint information of an audible target; and
    searching audio/video files according to said voiceprint information.
  2. The method according to claim 1, characterized in that said collecting the voiceprint information of the audible target comprises:
    collecting the voiceprint information when a certain audible target is selected; and
    storing the collected voiceprint information.
  3. The method according to claim 2, characterized in that said collecting and storing the voiceprint information comprise:
    generating a speaker model according to said voiceprint information; and
    storing said speaker model in a local memory module.
  4. The method according to claim 2 or 3, characterized in that said storing the collected voiceprint information comprises:
    storing by category according to said speaker model.
  5. The method according to claim 3, characterized in that said searching audio/video files according to said voiceprint information comprises:
    displaying said audio/video files when said local memory module is opened.
  6. The method according to claim 5, characterized in that said classification comprises:
    displaying the audio/video files by category according to said speaker model.
  7. The method according to claim 6, characterized in that said displaying comprises:
    showing the time points at which said audible target appears in an audio/video file.
  8. The method according to claim 7, characterized in that said classification comprises:
    searching the audio/video files by category according to the kind of said audible target.
  9. The method according to claim 6, characterized in that said time point comprises:
    when said time point in the classified display is selected, playing, from that time point, the audio/video of said audible target contained in said audio/video file.
  10. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said collecting the voiceprint information of the audible target comprises:
    recording said contact's voiceprint information during a call with that contact.
  11. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said collecting the voiceprint information of the audible target comprises:
    recording said contact's voiceprint information from that contact's voice manually recorded by the user.
  12. The method according to claim 1, characterized in that, when said audible target is a certain contact in a contacts application, said searching audio/video files comprises:
    playing the audio/video mapped to said contact when that contact is selected.
  13. A terminal device, characterized by comprising:
    a voiceprint extraction module for collecting the voiceprint information of an audible target; and
    an execution module for searching audio/video files according to said voiceprint information.
  14. The device according to claim 13, characterized in that said voiceprint extraction module comprises:
    a voiceprint collecting unit for collecting voiceprint information when a certain audible target is selected; and
    a voiceprint sample generation unit for generating a speaker model according to said voiceprint information.
  15. The device according to claim 14, characterized by further comprising:
    a memory module for storing the collected voiceprint information.
  16. The device according to claim 14, characterized in that said memory module is also used to: store said speaker model.
  17. The device according to claim 14 or 16, characterized in that said voiceprint extraction module comprises:
    a target classification unit that stores said speaker models by category.
  18. The device according to claim 15, characterized by further comprising:
    a display that shows said audio/video files when said local memory module is opened.
  19. The device according to claim 18, characterized in that said display is used to:
    display said audio/video files classified by said target classification unit according to the kind of said audible target.
  20. The device according to claim 19, characterized in that said display is used to:
    show all the time points at which said audible target appears in an audio/video file.
  21. The device according to claim 20, characterized in that said target classification unit is also used to:
    search the audio/video files by category according to the kind of the audible target.
  22. The device according to claim 19, characterized in that said execution module is also used to:
    when said time point in the classified display is selected, play, from that time point, the audio/video of said audible target contained in said audio/video file.
  23. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said voiceprint extraction module is used to:
    record said contact's voiceprint information during a call with that contact.
  24. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said voiceprint extraction module is used to:
    record said contact's voiceprint information from that contact's voice manually recorded by the user.
  25. The device according to claim 13, characterized in that, when said audible target is a certain contact in a contacts application, said execution module is also used to:
    play the audio/video mapped to said contact when that contact is selected.
CN201210518118.4A 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information Active CN103035247B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710439537.1A CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210518118.4A CN103035247B (en) 2012-12-05 2012-12-05 Method and device for operating on audio/video files based on voiceprint information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Division CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Publications (2)

Publication Number Publication Date
CN103035247A true CN103035247A (en) 2013-04-10
CN103035247B CN103035247B (en) 2017-07-07

Family

ID=48022078

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201210518118.4A Active CN103035247B (en) 2012-12-05 2012-12-05 Based on the method and device that voiceprint is operated to audio/video file
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710439537.1A Active CN107274916B (en) 2012-12-05 2012-12-05 Method and device for operating audio/video file based on voiceprint information

Country Status (1)

Country Link
CN (2) CN103035247B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364663A (en) * 2018-01-02 2018-08-03 山东浪潮商用系统有限公司 A kind of method and module of automatic recording voice
CN108364654B (en) * 2018-01-30 2020-10-13 网易乐得科技有限公司 Voice processing method, medium, device and computing equipment
CN108920619A (en) * 2018-06-28 2018-11-30 Oppo广东移动通信有限公司 Document display method, device, storage medium and electronic equipment
CN111091844A (en) * 2018-10-23 2020-05-01 北京嘀嘀无限科技发展有限公司 Video processing method and system
CN112153461B (en) * 2020-09-25 2022-11-18 北京百度网讯科技有限公司 Method and device for positioning sound production object, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1156871A (en) * 1995-11-17 1997-08-13 雅马哈株式会社 Personal information database system
CN1307589C (en) * 2001-04-17 2007-03-28 皇家菲利浦电子有限公司 Method and apparatus of managing information about a person
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
WO2011149647A2 (en) * 2010-05-24 2011-12-01 Microsoft Corporation Voice print identification
CN102404278A (en) * 2010-09-08 2012-04-04 盛乐信息技术(上海)有限公司 Song request system based on voiceprint recognition and application method thereof
CN102655002A (en) * 2011-03-01 2012-09-05 株式会社理光 Audio processing method and audio processing equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
CN102347060A (en) * 2010-08-04 2012-02-08 鸿富锦精密工业(深圳)有限公司 Electronic recording device and method

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117665A (en) * 2013-08-14 2019-01-01 华为终端(东莞)有限公司 Realize method for secret protection and device
CN104123115B (en) * 2014-07-28 2017-05-24 联想(北京)有限公司 Audio information processing method and electronic device
CN104123115A (en) * 2014-07-28 2014-10-29 联想(北京)有限公司 Audio information processing method and electronic device
CN104243934A (en) * 2014-09-30 2014-12-24 智慧城市信息技术有限公司 Method and device for acquiring surveillance video and method and device for retrieving surveillance video
CN105704512A (en) * 2014-10-06 2016-06-22 财团法人资讯工业策进会 Video capturing system and video capturing method thereof
CN104268279B (en) * 2014-10-16 2018-04-20 魔方天空科技(北京)有限公司 The querying method and device of corpus data
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data
CN105828179A (en) * 2015-06-24 2016-08-03 维沃移动通信有限公司 Video positioning method and device
CN105022263A (en) * 2015-07-28 2015-11-04 广东欧珀移动通信有限公司 Method for controlling intelligent watch and intelligent watch
WO2016165346A1 (en) * 2015-09-16 2016-10-20 中兴通讯股份有限公司 Method and apparatus for storing and playing audio file
CN106548793A (en) * 2015-09-16 2017-03-29 中兴通讯股份有限公司 Storage and the method and apparatus for playing audio file
CN105635452A (en) * 2015-12-28 2016-06-01 努比亚技术有限公司 Mobile terminal and contact person identification method thereof
CN105654942A (en) * 2016-01-04 2016-06-08 北京时代瑞朗科技有限公司 Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter
CN106095764A (en) * 2016-03-31 2016-11-09 乐视控股(北京)有限公司 A kind of dynamic picture processing method and system
CN106448683A (en) * 2016-09-30 2017-02-22 珠海市魅族科技有限公司 Method and device for viewing recording in multimedia files
CN107452408A (en) * 2017-07-27 2017-12-08 上海与德科技有限公司 A kind of audio frequency playing method and device
CN107452408B (en) * 2017-07-27 2020-09-25 成都声玩文化传播有限公司 Audio playing method and device
US11538456B2 (en) 2017-11-06 2022-12-27 Tencent Technology (Shenzhen) Company Limited Audio file processing method, electronic device, and storage medium
WO2019086044A1 (en) * 2017-11-06 2019-05-09 腾讯科技(深圳)有限公司 Audio file processing method, electronic device and storage medium
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN108074574A (en) * 2017-11-29 2018-05-25 维沃移动通信有限公司 Audio-frequency processing method, device and mobile terminal
CN108319371A (en) * 2018-02-11 2018-07-24 广东欧珀移动通信有限公司 Control method for playing back and Related product
WO2020057347A1 (en) * 2018-09-21 2020-03-26 深圳市九洲电器有限公司 Multimedia file retrieval method and apparatus
CN111883139A (en) * 2020-07-24 2020-11-03 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for screening target voices

Also Published As

Publication number Publication date
CN107274916A (en) 2017-10-20
CN103035247B (en) 2017-07-07
CN107274916B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN103035247A (en) Method and device of operation on audio/video file based on voiceprint information
US10977299B2 (en) Systems and methods for consolidating recorded content
CN103377651B (en) The automatic synthesizer of voice and method
US9711167B2 (en) System and method for real-time speaker segmentation of audio interactions
WO2006025797A1 (en) A search system
CN107507626B (en) Mobile phone source identification method based on voice frequency spectrum fusion characteristics
US10242330B2 (en) Method and apparatus for detection and analysis of first contact resolution failures
Aggarwal et al. Cellphone identification using noise estimates from recorded audio
US9058384B2 (en) System and method for identification of highly-variable vocalizations
Ntalampiras et al. Acoustic detection of human activities in natural environments
CN102067589A (en) Digital video recorder system and operating method thereof
CN109710799B (en) Voice interaction method, medium, device and computing equipment
EP2926337A1 (en) Clustering and synchronizing multimedia contents
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN107679196A (en) A kind of multimedia recognition methods, electronic equipment and storage medium
Cotton et al. Soundtrack classification by transient events
US11538461B1 (en) Language agnostic missing subtitle detection
US8725508B2 (en) Method and apparatus for element identification in a signal
Pandey et al. Cell-phone identification from audio recordings using PSD of speech-free regions
CN110286775A (en) A kind of dictionary management method and device
CN106156299B (en) The subject content recognition methods of text information and device
Buddhika et al. Voicer: A crowd sourcing tool for speech data collection
CN108989551B (en) Position prompting method and device, storage medium and electronic equipment
CN105930522A (en) Intelligent music recommendation method, system and device
KR100869643B1 (en) Mp3-based popular song summarization installation and method using music structures, storage medium storing program for realizing the method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant