CN101447188B

CN101447188B - Digital voice print identification system and validation and identification method

Info

Publication number: CN101447188B
Application number: CN2007101781412A
Authority: CN
Inventors: 约翰·叶; 里奥纳德·程
Original assignee: BEIJING TONALE DIGITAL TECHNOLOGY Co Ltd
Current assignee: BEIJING TONALE DIGITAL TECHNOLOGY Co Ltd
Priority date: 2007-11-27
Filing date: 2007-11-27
Publication date: 2011-06-15
Anticipated expiration: 2027-11-27
Also published as: CN101447188A

Abstract

The invention discloses a digital voice print identification system. The system comprises a plurality of collectors; a voice print processor connected with the plurality of collectors; a voice print database connected with the voice print processor; and a comparing and adjusting engine connected with the voice print database. The invention further provides a validation method of forensic identity, comprising the following steps: performing voice print processing on a live recording to obtain a first voice print characteristic; obtaining second voice print data; computing the data to obtain a validation model library; comparing and storing a first difference in the voice print database; and determining the similarity of the first difference and a second difference. The invention further provides an identification method of forensic identity, comprising the following steps: processing the data to obtain third voice print data; compiling an identification model library according to the identity information in the live recording; comparing the sampled voice print characteristics of suspects with the identification model library and recording the first difference; comparing the third voice print characteristic with the identification model library to obtain the second difference; and ordering the first difference and the second difference by similarity to output an identification result.

Description

Digital voice print identification system and validation and identification methods

Technical field

The present invention relates to the field of human biometric identification, and in particular to a digital voice print identification system and to validation and identification methods used for the forensic identification of identity.

Background technology

Human biometric characteristics include DNA, fingerprints, the iris, the retina, the palm and the voice print. Among these, DNA and fingerprints are increasingly mature in their application across industries, and in forensic identification in particular they are used to identify suspects. However, building DNA and fingerprint databases and collecting DNA and fingerprints is very laborious, and DNA and fingerprints are harder to obtain at a crime scene than sound.

At present, automatic speech recognition, automatic voice verification, automatic speaker identification, voice spectrogram and voice spectrum analysis technologies are all developing rapidly. There are many methods for speaker verification and identification; for example, a Gaussian mixture model (GMM) based on a multi-space probability distribution can be used to model the voice spectrum and thereby identify the speaker. However, such schemes can hardly be applied in forensic identification, because the precision of voice spectrum analysis is insufficient: the analysis depends mainly on the analyst's working experience, so different analysts may reach entirely different results for the same voice trace, which makes the approach unsuitable for forensic use.

Summary of the invention

The object of the invention is to address the defects of the prior art by providing a digital voice print identification system that identifies a suspect from a live recording.

To achieve this object, the invention provides a digital voice print identification system, comprising:

a plurality of collectors, used to collect voice traces of suspects;

a voice print processor, connected with the plurality of collectors, used to process the voice traces collected by the collectors, or a live recording, by noise reduction, slicing and modeling to obtain voice print data, and to register the voice trace under a case number;

a voice print database, connected with the voice print processor, used to store the characteristic information of the processed suspect voice prints registered in association with case numbers;

a comparing and adjusting engine, connected with the voice print database, used to set the conditions for comparing the voice prints in the voice print database with the voice print of an input live recording, and to adjust the comparison conditions in real time during comparison.

The voice print processor comprises:

a noise reduction module, used to perform automatic noise reduction on the live recording and the voice traces;

a slicing module, used to receive the noise-reduced file sent by the noise reduction module and slice it automatically;

a keyword identification module, used to receive the sliced file sent by the slicing module and select key fields of 5-8 characters in which no two characters repeat;

a parameterization module, used to receive the identified key fields and parameterize them;

a modeling module, used to model the parameterized key fields by word, character, initial and final, thereby obtaining a voice print;

a registration module, used to register the voice print obtained by the modeling module in the voice print database.

The digital voice print identification system further comprises:

a validation model library, used to provide a universal standard for validation;

an investigation processor, used to input a live recording and a case number, find the corresponding suspect's voice print in the voice print database according to the case number, and compare that suspect's voice print with the voice print obtained by passing the input live recording through the voice print processor, using the validation model library, to confirm whether the live recording was spoken by this suspect.

The digital voice print identification system further comprises:

an identification model library, used to provide a universal standard for identification;

an identification processor, used to write the identification model library from the voice print database according to boundary conditions, and to find a list of suspects according to the voice print data in the identification model library.

The invention provides a validation method of forensic identity, comprising:

performing voice print processing on a live recording to obtain a first voice print feature;

obtaining a voice trace with the same content as the live recording, and performing voice print processing on it to obtain second voice print data;

obtaining a plurality of third voice print features, and computing a validation model library from the second voice print feature and the plurality of third voice print features;

comparing the second voice print feature with the validation model library, and storing the resulting first difference in the voice print database;

comparing the first voice print feature with the validation model library to obtain a second difference, and determining the similarity of the first difference and the second difference.

The voice print processing comprises: subjecting the live recording or voice trace to noise reduction and slicing, selecting key fields of 5-8 characters in which no two characters repeat, parameterizing them, and then modeling by word, character, initial and final, thereby obtaining the first voice print data or the second voice print data.

The invention provides an identification method of forensic identity, comprising:

performing voice print processing on a live recording to obtain third voice print data;

obtaining from the database, according to the identity information in the live recording, the suspects who satisfy these boundary conditions, and writing them into the identification model library;

comparing the sampled voice print features of the suspects with the identification model library, recording a first difference, and storing this difference in the voice print database;

comparing the third voice print feature with the identification model library to obtain a second difference;

ordering the first difference and the second difference by similarity and outputting an identification result.

The voice print processing comprises: subjecting the live recording to noise reduction and slicing, selecting key fields of 5-8 characters in which no two characters repeat, parameterizing them, and then modeling by word, character, initial and final, thereby obtaining the voice print data.

Therefore, the system and methods provided by the invention can identify or confirm a suspect effectively and with high precision, which helps in tracking down and arresting criminals.

The technical scheme of the invention is described in further detail below with reference to the drawings and embodiments.

Description of drawings

Fig. 1 is a structural diagram of an embodiment of the voice print identification system of the invention;

Fig. 2 is a structural diagram of the voice print processor 2 of the invention;

Fig. 3 is a structural diagram of embodiment 2 of the voice print identification system of the invention;

Fig. 4 is a structural diagram of embodiment 3 of the voice print identification system of the invention;

Fig. 5 is a flow chart of the validation process of the voice print identification system of the invention;

Fig. 6 is a flow chart of the identification process of the voice print identification system of the invention.

Embodiment

The invention is mainly used in forensic identification. After a voice trace is obtained at the scene, recordings of the suspect are collected, noise-reduced and clipped, and then parameterized and modeled into a voice print, which is stored in a suspect voice print database. The scene voice trace is then compared with the suspect's voice print to determine whether the suspect and the speaker at the scene are the same person; this is called the validation process. Alternatively, when only the scene voice trace has been obtained, an identification model library can be written from established historical voice print databases, such as an offender voice print database, a suspect voice print database and a fugitive voice print database, and compared against to find the suspect voice prints closest to the scene voice trace, yielding a list of suspects; this is the identification process.

Fig. 1 is a structural diagram of an embodiment of the voice print identification system of the invention. The system 100 comprises: a plurality of collectors 1, used to collect voice traces of suspects; a voice print processor 2, connected with the plurality of collectors 1, used to process the voice traces collected by the collectors 1 by noise reduction, slicing and modeling to obtain voice print data and to register them under a case number; a voice print database, comprising a first voice print database 3 and a second voice print database 4, each connected with the voice print processor 2, used to store the processed data of the suspects registered for each case number, such as identity information (age, sex, accent region, weight, height) and voice print characteristic information (voice print index parameters such as speaking frequency, speaking amplitude and speaking energy); and a comparing and adjusting engine 5, connected with the first voice print database 3 and the second voice print database 4, used to set the conditions for comparing the suspects' voice prints in the first and second voice print databases with the voice print of an input live recording, and to adjust the comparison conditions in real time during comparison.

The first voice print database is also called the urgent voice print database: once a live recording and suspect recordings have been obtained, it can be used at once to file the case and to carry out identification and validation. The second voice print database is also called the reserve database: where there is a suspect but no live recording, the suspect's voice print is registered there for later use. The voice print database stores detailed information about each suspect, such as identity information (age, sex, accent region, weight, height) and voice print characteristic information (voice print index parameters such as speaking frequency, speaking amplitude and speaking energy).
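By way of illustration only, the record layout described above might be sketched as follows. The class names, field names, the case number and all numeric values are hypothetical and do not come from the source, which does not specify a storage format; the two in-memory objects merely stand in for the first and second voice print databases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoicePrintRecord:
    """One registered suspect entry (all field names are illustrative)."""
    case_number: str
    name: str
    age: int
    sex: str
    accent_region: str
    height_cm: Optional[float] = None
    weight_kg: Optional[float] = None
    # Voice print index parameters, e.g. frequency, amplitude, energy.
    features: Dict[str, float] = field(default_factory=dict)


class VoicePrintDatabase:
    """In-memory stand-in for the first (urgent) or second (reserve) database."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._records: List[VoicePrintRecord] = []

    def register(self, record: VoicePrintRecord) -> None:
        """Store a processed voice print under its case number."""
        self._records.append(record)

    def find_by_case(self, case_number: str) -> List[VoicePrintRecord]:
        """Return every record registered under the given case number."""
        return [r for r in self._records if r.case_number == case_number]


urgent_db = VoicePrintDatabase("first (urgent)")
reserve_db = VoicePrintDatabase("second (reserve)")

# Placeholder values, invented for the example.
urgent_db.register(
    VoicePrintRecord(
        case_number="case-001",
        name="Zhang San",
        age=30,
        sex="male",
        accent_region="Henan",
        features={"frequency": 0.5, "amplitude": 0.25, "energy": 0.75},
    )
)
```

Looking up `urgent_db.find_by_case("case-001")` returns the single registered record, which mirrors how the investigation processor is described as retrieving a suspect's voice print by case number.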

Fig. 2 is a structural diagram of the voice print processor 2 of the invention, which comprises a noise reduction module 21, a slicing module 22, a keyword identification module 23, a parameterization module 24, a modeling module 25 that models by word, character, initial and final, and a registration module 26 that registers the voice print in the first or second database. The noise reduction module 21 performs automatic noise reduction on an obtained recording file, such as a live recording file or a suspect recording file, for example a sound file containing "This is Building 10, Fifth Street". The noise-reduced file is then sent to the slicing module 22 for automatic slicing, for example cutting out the sound file "This is Building 10, Fifth Street". After slicing, automatic keyword identification is performed on the sliced file, for example yielding "Building 10, Fifth Street"; the criterion for selecting keywords is to choose 5-8 characters with no two or more characters repeated. The parameterization module then parameterizes the sliced file containing the keywords, the modeling module 25 models it by word, character, initial and final, and the resulting voice print is registered in the first or second voice print database by the registration module.
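The keyword selection criterion can be sketched as a simple check. This is a minimal sketch under two assumptions not stated in the source: that a text transcript of each slice is available, and that "no two or more characters repeated" means every character in the field is distinct. The function names are hypothetical, and the Latin letters in the example each stand for one character of the spoken text.

```python
from typing import List


def is_key_field(text: str, min_len: int = 5, max_len: int = 8) -> bool:
    """True if the text has 5-8 characters and no character occurs twice."""
    return min_len <= len(text) <= max_len and len(set(text)) == len(text)


def candidate_key_fields(transcript: str, min_len: int = 5, max_len: int = 8) -> List[str]:
    """Return every contiguous span of the transcript that qualifies as a key field."""
    spans = []
    for start in range(len(transcript)):
        for length in range(min_len, max_len + 1):
            span = transcript[start:start + length]
            # Skip spans cut short by the end of the transcript.
            if len(span) == length and is_key_field(span, min_len, max_len):
                spans.append(span)
    return spans


# Each letter is a placeholder for one spoken character.
fields = candidate_key_fields("abcdea")
```

Here the six-character input is rejected as a whole because the first character recurs, while its two five-character sub-spans qualify.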

Referring to Fig. 3, a structural diagram of embodiment 2 of the voice print identification system of the invention, the system further comprises: a validation model library, used to provide a universal standard for validation; and an investigation processor 6, used to input a live recording and a case number. The corresponding suspect's voice print is found in the first voice print database according to the case number, and that voice print is compared, using the validation model library, with the voice print obtained by passing the input live recording through the voice print processor, to confirm whether the live recording was spoken by this suspect. This process, called the validation process, is shown in Fig. 5 and comprises the following steps:

Step 31: the voice print processor obtains the live recording and processes it into a first voice print, also called the known voice print. The voice print processing of the recording is as shown in Fig. 2; in this example, initial-based modeling is used. For example, from the sentences "This is Building 10, Fifth Street" and "Meet on time at 7 tomorrow morning" in the live recording, fields of 5-8 characters with no two or more characters repeated, such as "is No. 10, Fifth Street" and "meet on time at 7 in the morning", are taken for voice print processing;

Step 32: the voice print processor obtains the voice trace of a suspect, for example Zhang San, and processes it into a second voice print, also called the claimed voice print. The voice print processing of the suspect's voice trace is as shown in Fig. 2; in this example, initial-based modeling is used. That is, the suspect Zhang San repeats "is No. 10, Fifth Street" and "meet on time at 7 in the morning" from the live recording, is recorded several times, and the recordings undergo the same voice print processing. Several unrelated persons, for example 5, are also sampled. The sampling principles are: the sample must have a certain length, such as "This is Building 10, Fifth Street" (a length of 5-7 consecutive characters is required); the sample must be text-dependent, that is, its content must match the scene speech content, such as "This is Building 10, Fifth Street"; and sampling must be repeated, that is, the suspect says the sample content several times;

Step 33: after sampling, the sampled results are processed by noise reduction, clipping and so on and input into a computer, where a suspect model calculation comprising feature extraction and voice modeling yields a validation model library;

Step 34: the voice print feature of the suspect Zhang San is compared with the validation model library, and its difference from the validation model library is recorded and added to the voice print database;

Step 35: the scene speech sample is compared with the validation model library to obtain a difference, and the similarity between this difference and the suspect's difference from the validation model library stored in the voice print database is examined. If the two are fully similar, the speaker in the live recording and the suspect are deemed to be the same person, that is, identity is affirmed; if they are completely dissimilar, the result is that identity is negated; if the two differences show a certain similarity, that is, they are fairly similar, the result is an inclination to affirm identity; and if the two differences show a certain dissimilarity, that is, they are fairly dissimilar, the result is an inclination to negate identity.
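Steps 33 to 35 can be sketched as follows. This is only an illustrative stand-in, not the model calculation of the invention: the source does not disclose how the validation model library, the differences or their similarity are computed. The sketch assumes that each voice print is a short feature vector, that the library is the per-dimension mean of the suspect's and the unrelated persons' samples, and that similarity is derived from Euclidean distance. All function names, feature values and thresholds are hypothetical, and two unrelated persons are used instead of five for brevity.

```python
import math
from typing import List, Sequence


def build_model_library(samples: List[Sequence[float]]) -> List[float]:
    """Stand-in validation model library: the per-dimension mean of all samples."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def difference(features: Sequence[float], library: Sequence[float]) -> List[float]:
    """Difference between one voice print feature vector and the library."""
    return [f - m for f, m in zip(features, library)]


def similarity(diff_a: Sequence[float], diff_b: Sequence[float]) -> float:
    """Map the distance between two differences into the range (0, 1]."""
    return 1.0 / (1.0 + math.dist(diff_a, diff_b))


def verdict(score: float) -> str:
    """Four-level outcome of step 35; thresholds are arbitrary placeholders."""
    if score >= 0.9:
        return "affirm identity"
    if score >= 0.6:
        return "inclined to affirm identity"
    if score >= 0.3:
        return "inclined to negate identity"
    return "negate identity"


# Step 32: suspect sample plus unrelated persons (invented values).
suspect = [2.0, 2.0]
unrelated = [[0.0, 0.0], [1.0, 4.0]]

# Step 33: compute the validation model library.
library = build_model_library([suspect] + unrelated)

# Step 34: the suspect's difference, which would be stored in the database.
first_difference = difference(suspect, library)

# Step 35: the scene sample's difference and the resulting verdict.
scene = [2.0, 2.0]
second_difference = difference(scene, library)
result = verdict(similarity(first_difference, second_difference))
```

In this toy case the scene sample coincides with the suspect's sample, so the two differences are identical and the outcome is that identity is affirmed.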

Referring to Fig. 4, embodiment 3 of the voice print identification system of the invention further comprises an identification processor 7 and an identification model library 8. The identification model library 8 provides a universal standard for identification. The identification processor 7 is connected with the first and second voice print databases 3 and 4 and with the identification model library 8; it writes the identification model library 8 from the first and second voice print databases 3 and 4 according to boundary conditions, and finds the list of suspects corresponding to the voice print according to the index parameters in the identification model library 8. This process identifies, from a fragment of live recording, a list of possible speakers of that fragment in the voice print database. The identification model library holds the suspect information obtained from the first and second voice print databases according to certain input parameters such as sex, age and birthplace; it is a subset of the first and second voice print databases. This process, called the identification process, is shown in Fig. 6 and comprises the following steps:

Step 41: a fragment of live recording, such as "This is Building 10, Fifth Street", is obtained and processed into a third voice print through basic processing such as noise reduction and clipping;

Step 42: according to identity information in the live recording fragment, for example that the speaker is male, 25-25 years old, with a Henan accent, the information of the suspects who meet these input parameters is exported from the first and second voice print databases, and these suspects are sampled. Suppose there are 10 suspects; this is merely exemplary and not limiting. The sampling principles are: the sample must have a certain length, such as "This is Building 10, Fifth Street" (a length of 5-7 consecutive characters is required); the sample must be text-dependent, that is, its content must match the scene speech content, such as "This is Building 10, Fifth Street"; and sampling must be repeated, that is, the suspect says the sample content several times;

Step 43: after sampling, the sampled results are processed by noise reduction, clipping and so on and input into a computer, where a suspect model calculation comprising feature extraction and voice modeling yields a suspect model library (holding the common characteristics of these 10 people), that is, a universal standard for measuring the voice print features of these 10 suspects;

Step 44: the voice print features of the 10 suspects are each compared with the suspect model library, and their respective differences from the suspect model library are recorded and added to the voice print database;

Step 45: the scene speech sample is compared with the suspect model library to obtain the difference between them. The 10 suspects are then sorted from high to low by the similarity between each suspect's stored difference from the suspect model library and the scene speech sample's difference from the suspect model library, and the identification result is output as a list of suspects. For example, Zhang San has an energy confidence of 2.1, a frequency confidence of 2.0 and an amplitude confidence of 1.9, giving an identification confidence of 98% for this suspect. The second suspect, Li Si, has an energy confidence of 1.8, a frequency confidence of 2.0 and an amplitude confidence of 1.7, giving an identification confidence of 88%. The third suspect, Wang Wu, has an energy confidence of 1.5, a frequency confidence of 1.9 and an amplitude confidence of 1.2, giving an identification confidence of 76%. This yields a list in which the suspects are arranged in descending order of identification confidence, so the suspect ranked first in the list is under the greatest suspicion.
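The flow of steps 42 to 45 can be sketched as a filter followed by a ranking. This is a hedged illustration rather than the disclosed algorithm: the record layout, the age range, the feature values, the mean-based suspect model library and the distance-based similarity score are all assumptions, and the scores it produces are unrelated to the confidence figures quoted in step 45.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def filter_candidates(records: List[Record], sex: str,
                      age_range: Tuple[int, int], accent: str) -> List[Record]:
    """Step 42: keep only the records that satisfy the boundary conditions."""
    low, high = age_range
    return [r for r in records
            if r["sex"] == sex and low <= r["age"] <= high and r["accent"] == accent]


def rank_suspects(scene: Sequence[float],
                  candidates: List[Record]) -> List[Tuple[str, float]]:
    """Steps 43-45: build a stand-in library, compare differences, sort descending."""
    vectors = [c["features"] for c in candidates]
    library = [sum(col) / len(vectors) for col in zip(*vectors)]
    scene_diff = [s - m for s, m in zip(scene, library)]
    scored = []
    for c in candidates:
        diff = [f - m for f, m in zip(c["features"], library)]
        scored.append((c["name"], 1.0 / (1.0 + math.dist(diff, scene_diff))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented records; "Person D" fails the sex condition and is filtered out.
database = [
    {"name": "Zhang San", "sex": "male", "age": 30, "accent": "Henan", "features": [2.0, 2.0]},
    {"name": "Li Si", "sex": "male", "age": 28, "accent": "Henan", "features": [1.0, 1.0]},
    {"name": "Wang Wu", "sex": "male", "age": 35, "accent": "Henan", "features": [5.0, 5.0]},
    {"name": "Person D", "sex": "female", "age": 30, "accent": "Henan", "features": [2.0, 2.0]},
]

candidates = filter_candidates(database, sex="male", age_range=(20, 40), accent="Henan")
ranking = rank_suspects([2.0, 2.0], candidates)
```

With a plain mean and Euclidean distance the library term cancels out of the comparison, so this sketch shows only the data flow of the steps; an actual scoring model would presumably not reduce in this way.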

The advantage of using a suspect model library lies in the algorithm: what is obtained after a speech sample is sampled and registered is its difference from the suspect model library. Consequently, two objects alone cannot be compared by using one of them as the standard; a certain number of objects is needed to form the suspect model library, and once each object's difference from the library has been obtained, any one of them can be compared with any other. For example, suppose two objects, A and B, are to be compared. If A is taken as the standard, that is, as the suspect model library, then A's difference from the library is 0 while B's difference from the library must be nonzero (because even two recordings of the same person differ somewhat), so one could only judge crudely that the two objects are the same person whenever this difference falls below a set value.

Second, the principle adopted in building the voice print database is to save storage space as far as possible. If every sound sample recorded all of its voice print features, the voice print database would become very large and waste resources. With a suspect model library, by contrast, only the difference between the speech sample and the suspect model library need be recorded; the identical parts are omitted, which saves resources.
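The storage-saving idea can be illustrated with a sparse difference encoding. This is a sketch under assumptions: the source does not say how differences are encoded, so the dictionary representation, the tolerance and the function names below are hypothetical, and the feature values are invented.

```python
from typing import Dict, List, Sequence


def encode_difference(features: Sequence[float], library: Sequence[float],
                      tolerance: float = 1e-9) -> Dict[int, float]:
    """Keep only the dimensions where the sample differs from the library."""
    return {i: f - m
            for i, (f, m) in enumerate(zip(features, library))
            if abs(f - m) > tolerance}


def decode_difference(delta: Dict[int, float], library: Sequence[float]) -> List[float]:
    """Rebuild the full feature vector from the library and the stored difference."""
    return [m + delta.get(i, 0.0) for i, m in enumerate(library)]


library = [1.0, 2.0, 3.0]
sample = [1.0, 2.5, 3.0]

stored = encode_difference(sample, library)    # only one entry is kept
restored = decode_difference(stored, library)
```

Only the single differing dimension is stored, and the full vector can still be recovered from the library when needed.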

Finally, it should be noted that the above embodiments only illustrate the technical scheme of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical scheme of the invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications should be covered by the scope of the claims of the invention.

Claims

1. A digital voice print identification system, characterized by comprising:

a plurality of collectors, used to collect voice traces of suspects;

a voice print processor, connected with the plurality of collectors, used to process the voice traces collected by the plurality of collectors, or a live recording, by noise reduction, slicing and modeling to obtain voice print data, and to register the voice trace under a case number;

a comparing and adjusting engine, connected with a voice print database, used to set the conditions for comparing the voice prints in the voice print database with the voice print of an input live recording, and to adjust the comparison conditions in real time during comparison;

wherein the voice print processor comprises: a noise reduction module, used to perform automatic noise reduction on the live recording and the voice traces; a slicing module, used to receive the noise-reduced file sent by the noise reduction module and slice the file automatically; a keyword identification module, used to receive the sliced file sent by the slicing module and select key fields of 5-8 characters in which no two characters repeat; a parameterization module, used to receive the identified key fields and parameterize them; a modeling module, used to model the parameterized key fields by word, character, initial and final, thereby obtaining a voice print; a registration module, used to register the voice print obtained by the modeling module in the voice print database; a validation model library, used to provide a universal standard for validation; and an investigation processor, used to input a live recording and a case number, find the corresponding suspect's voice print in the voice print database according to the case number, and compare that suspect's voice print with the voice print obtained by passing the input live recording through the voice print processor, using the validation model library, to confirm whether the live recording was spoken by this suspect.

2. The digital voice print identification system according to claim 1, characterized in that the digital voice print identification system further comprises: