CN109817223A - Phoneme labeling method and device based on audio fingerprints

Info

Publication number: CN109817223A
Application number: CN201910086808.9A
Authority: CN
Inventors: 郑棉洲; 潘雷明; 陈昊亮
Original assignee: Guangzhou Speakin Network Technology Co Ltd
Current assignee: Guangzhou Speakin Network Technology Co Ltd
Priority date: 2019-01-29
Filing date: 2019-01-29
Publication date: 2019-05-28

Abstract

The present invention relates to the technical field of voiceprint identification and discloses a phoneme labeling method and device based on audio fingerprints. The method comprises: extracting the audio fingerprint of the speech to be labeled and obtaining the speech-spectrum pole (extremum point) information of that audio fingerprint; comparing the pole information with all audio fingerprints in a phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number; and judging whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled, and if so, confirming the N retrieved phonemes as the label phonemes of the speech to be labeled. Because only the spectral poles are compared, the comparison time is reduced and fast labeling is achieved.

Description

Phoneme labeling method and device based on audio fingerprints

Technical field

The present invention relates to the technical field of voiceprint identification, and in particular to a phoneme labeling method based on audio fingerprints.

Background art

Audio fingerprinting extracts data features from a sound and compares the content to be identified against an established audio fingerprint database. The identification process is not affected by the audio's storage format, encoding, bit rate, or compression technique. Audio fingerprint matching is high-precision matching that does not depend on file elements such as page meta-information, watermarks, or file hash values.

A phoneme is the smallest unit of speech. It is analyzed according to the articulatory actions within a syllable: one action constitutes one phoneme. Phonemes fall into two major classes, vowels and consonants.

Voiceprint identification, also known as voice identity identification or speaker identification, refers to the scientific judgment, reached through comparison and analysis, of the identity of a voice recorded in audio-visual material. In practical public security and judicial work, examiners usually need to examine case-related speech (such as recordings of extortion or threatening phone calls, or recordings of conversations between the parties to an economic dispute), analyze the identity of the speaker, and judge whether the case-related speech (the questioned sample) and the speech of a specific person (the reference sample) come from the same person. They then issue a scientific written conclusion, the voice identity expert opinion, which provides clues and direction for the investigation of the case and evidence for court proceedings.

In existing voice identity identification, the audio-visual material must be decomposed into basic phonemes/syllables; identical phonemes/syllables from different sources are then compared to judge whether the speakers are the same person.

Voiceprint identification falls into two main categories: speaker identification and speaker verification. The former judges which of several people spoke a given speech segment and is a "one-of-many" problem; the latter confirms whether a given speech segment was spoken by a specified person and is a "one-to-one" problem. For example, identification may be needed to narrow the scope of a criminal investigation, whereas verification is needed for bank transactions.
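By way of non-limiting illustration, the difference between the two categories can be sketched as follows. The text does not specify how voiceprints are modeled or scored; the feature vectors, the cosine similarity measure, the threshold value, and the function names `identify` and `verify` are all assumptions made only for this example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(probe, enrolled):
    """One-of-many: return the enrolled speaker whose model is closest."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))


def verify(probe, claimed_model, threshold=0.8):
    """One-to-one: accept or reject a single claimed identity."""
    return cosine(probe, claimed_model) >= threshold


# Hypothetical voiceprint models and a hypothetical probe vector.
enrolled = {"speaker_a": [1.0, 0.0, 0.2], "speaker_b": [0.1, 1.0, 0.3]}
probe = [0.9, 0.1, 0.2]

best_match = identify(probe, enrolled)
accepted_a = verify(probe, enrolled["speaker_a"])
accepted_b = verify(probe, enrolled["speaker_b"])
```

Identification always returns one of the enrolled candidates, whereas verification returns a yes/no decision about a single claimed identity.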

Whether for identification or verification, the speaker's voiceprint must first be modeled, and modeling requires extracting the phonemes of the target speaker from the audio-visual material. Extraction is currently done mainly either manually or purely by machine. Manual identification is highly accurate, but it is labor-intensive, time-consuming, and inefficient.

Summary of the invention

An object of the present invention is to provide a phoneme labeling method and device based on audio fingerprints in which only spectral poles (extremum points of the speech spectrum) are compared, so that the comparison time is reduced and fast labeling is achieved.

To achieve this object, the present invention provides a phoneme labeling method based on audio fingerprints, comprising the following steps:

extracting the audio fingerprint of the speech to be labeled, and obtaining the speech-spectrum pole information of that audio fingerprint;

comparing the pole information with all audio fingerprints in a phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number;

judging whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled; if so, confirming the N retrieved phonemes as the label phonemes of the speech to be labeled.

Preferably, before extracting the audio fingerprint of the speech to be labeled and obtaining the speech-spectrum pole information of that audio fingerprint, the method further comprises:

preprocessing a speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled.

Preferably, extracting the audio fingerprint of the speech to be labeled and obtaining the speech-spectrum pole information of that audio fingerprint specifically comprises:

extracting the audio fingerprint of the speech to be labeled, and obtaining the extreme value of each pole of the speech spectrum of that audio fingerprint, in the chronological order in which the poles occur.

Preferably, the step of judging whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled, and if so confirming the N retrieved phonemes as the label phonemes of the speech to be labeled, further comprises:

if not, setting N = N + 1 and returning to the step of comparing the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values.

In another aspect, the present invention provides a phoneme labeling device based on audio fingerprints, comprising:

a pole acquiring unit, configured to extract the audio fingerprint of the speech to be labeled and obtain the speech-spectrum pole information of that audio fingerprint;

a comparing unit, configured to compare the pole information with all audio fingerprints in a phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number;

a judging unit, configured to judge whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled;

a phoneme labeling unit, configured to confirm the N retrieved phonemes as the label phonemes of the speech to be labeled if the judging result of the judging unit is yes.

Preferably, the device further comprises:

a preprocessing unit, configured to preprocess a speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled.

Preferably, the pole acquiring unit is specifically configured to:

extract the audio fingerprint of the speech to be labeled, and obtain the extreme value of each pole of the speech spectrum of that audio fingerprint, in the chronological order in which the poles occur.

Preferably, the device further comprises:

a return execution unit, configured to set N = N + 1 if the judging result of the judging unit is no, and then return to comparing the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values.

The beneficial effects of the present invention are as follows: a phoneme labeling method and device based on audio fingerprints is provided in which audio fingerprints are compared using only the spectral poles. This reduces the comparison time, avoids the influence of noise in the audio, and achieves fast labeling.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow diagram of the phoneme labeling method based on audio fingerprints provided by Embodiment one;

Fig. 2 is a structural block diagram of the phoneme labeling device based on audio fingerprints provided by Embodiment two.

Detailed description of the embodiments

To make the purpose, features, and advantages of the present invention clearer and easier to understand, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment one

This embodiment provides a phoneme labeling method based on audio fingerprints. It is suited to phoneme labeling scenarios in the field of speech recognition and can improve labeling efficiency. The method is executed by a phoneme labeling device based on audio fingerprints, which is implemented in software and/or hardware.

Fig. 1 is a flow diagram of the phoneme labeling method based on audio fingerprints provided by Embodiment one.

Referring to Fig. 1, the phoneme labeling method based on audio fingerprints includes the following steps:

S10: Preprocess the speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled.

Specifically, pre-emphasis boosts the high-frequency part of the speech signal, removes the influence of lip radiation, and increases the high-frequency resolution of the speech. After the pre-emphasis digital filtering comes windowing and framing. A speech signal is short-time stationary (it can be regarded as approximately constant within 10 to 30 ms), so it can be divided into short segments for processing; this is framing. Framing is implemented by weighting the signal with a movable window of finite length. The frame rate is generally about 33 to 100 frames per second, depending on the circumstances. Framing generally uses overlapping segmentation: the overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5.
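As a non-limiting sketch of step S10, the preprocessing chain might look like the following. The text does not give numeric parameters for the filter or the window, so the pre-emphasis coefficient 0.97, the 8000 Hz sample rate, the 20 ms frame length, the overlap ratio of 0.5, and all function names are assumptions chosen only for illustration.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Hamming window coefficients for one frame."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_and_window(signal, frame_len, hop):
    """Cut the signal into overlapping frames and weight each by the window."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def preprocess(signal, sample_rate=8000, frame_ms=20, overlap_ratio=0.5):
    """Pre-emphasis, then framing with a Hamming window (assumed parameters)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1 - overlap_ratio)))
    return frame_and_window(pre_emphasize(signal), frame_len, hop)


# One second of a 440 Hz test tone sampled at 8000 Hz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
frames = preprocess(tone)
```

With these assumed values each frame holds 160 samples and successive frames start 80 samples apart, which gives 99 frames for one second of audio, inside the 33 to 100 frames per second range mentioned above.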

S20: Extract the audio fingerprint of the speech to be labeled and obtain the speech-spectrum pole information of that audio fingerprint.

Generally, the horizontal axis of a speech spectrogram is time and the vertical axis is frequency. The spectrogram of the speech to be labeled contains a number of poles (extremum points), and the extreme values of these poles, taken in the chronological order in which they occur, constitute the pole information. Specifically, step S20 is: extract the audio fingerprint of the speech to be labeled and obtain the extreme value of each pole of the speech spectrum of that audio fingerprint, in chronological order.
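One possible reading of step S20 is sketched below, without limitation. Here a "pole" is taken to be a local maximum of the magnitude spectrum of a frame, and the pole information is a list of (frame index, frequency bin, magnitude) triples in time order. This interpretation, the direct DFT, the magnitude floor, and the function names are assumptions; the text does not prescribe a particular extraction algorithm.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame, for bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def frame_peaks(spectrum, floor):
    """Local maxima of one spectrum that rise above the magnitude floor."""
    peaks = []
    for k in range(1, len(spectrum) - 1):
        is_local_max = spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if spectrum[k] > floor and is_local_max:
            peaks.append((k, spectrum[k]))
    return peaks


def pole_information(frames, floor=1.0):
    """Collect (frame index, bin, magnitude) triples in chronological order."""
    poles = []
    for t, frame in enumerate(frames):
        for k, magnitude in frame_peaks(magnitude_spectrum(frame), floor):
            poles.append((t, k, magnitude))
    return poles


# Two synthetic 32-sample frames: a tone at bin 4, then a tone at bin 8.
frame_a = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
frame_b = [math.cos(2 * math.pi * 8 * t / 32) for t in range(32)]
poles = pole_information([frame_a, frame_b])
```

Keeping only these few points per frame, rather than the full spectrum, is what makes the later comparison cheap.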

S30: Compare the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number.

Specifically, comparing the pole information of the speech to be labeled with all audio fingerprints in the phoneme database extracts the comparison information in a targeted way, without anyone having to decide manually what content to extract, and so achieves fast labeling. Further, N is a preset value that can be chosen according to requirements such as retrieval precision and target retrieval time.
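Step S30 can be illustrated, without limitation, by the following sketch. The text does not define how the matching value is computed; here it is assumed to be the Jaccard overlap between two sets of (frame index, frequency bin) pole positions, and the phoneme database is assumed to be a plain dictionary from phoneme label to stored pole positions. The names `match_value` and `top_n_phonemes` and the sample data are hypothetical.

```python
import heapq


def match_value(query_poles, reference_poles):
    """Assumed matching value: shared pole positions over all pole positions."""
    query, reference = set(query_poles), set(reference_poles)
    if not query or not reference:
        return 0.0
    return len(query & reference) / len(query | reference)


def top_n_phonemes(query_poles, phoneme_db, n):
    """Return the n phoneme labels with the highest matching values."""
    scored = (
        (match_value(query_poles, fingerprint), label)
        for label, fingerprint in phoneme_db.items()
    )
    return [label for _, label in heapq.nlargest(n, scored)]


# Hypothetical database: phoneme label -> (frame index, frequency bin) poles.
phoneme_db = {
    "a": [(0, 4), (1, 8), (2, 6)],
    "o": [(0, 4), (1, 7), (2, 6)],
    "i": [(0, 2), (1, 20), (2, 21)],
}
query = [(0, 4), (1, 8), (2, 6)]
top_two = top_n_phonemes(query, phoneme_db, 2)
```

Every database entry is still visited, as the step requires, but each comparison touches only a handful of pole positions instead of a full spectrum.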

S40: Judge whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled. If so, confirm the N retrieved phonemes as the label phonemes of the speech to be labeled; if not, set N = N + 1 and return to step S30.

The phoneme labeling method based on audio fingerprints provided by this embodiment compares audio fingerprints using only the poles of the speech spectrum. This both reduces the comparison time and avoids the influence of noise in the audio, thereby achieving fast labeling.
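The judge-and-widen loop of step S40 might be sketched as follows, without limitation. Two simplifications are assumed: the database ranking is computed once, so that returning to step S30 with a larger N amounts to taking a longer prefix of the same ranking, and the loop stops when N exceeds the number of database entries, a guard the text does not mention. The pronunciation check is passed in as a callable because the text does not say how the judgment is made; the names are hypothetical.

```python
def label_phoneme(ranked_labels, pronunciation_matches, n=1):
    """Widen the candidate list until one candidate's pronunciation matches.

    ranked_labels: every database phoneme, sorted by descending matching value.
    pronunciation_matches: callable returning True when a candidate's
        pronunciation is consistent with the phoneme to be labeled.
    Returns the top-n candidates once a match is found, otherwise None.
    """
    while n <= len(ranked_labels):
        candidates = ranked_labels[:n]
        if any(pronunciation_matches(label) for label in candidates):
            return candidates
        n += 1  # no match among the top n: set N = N + 1 and retry
    return None


# Hypothetical ranking in which the true phoneme "a" is ranked second.
ranking = ["o", "a", "i"]
result = label_phoneme(ranking, lambda label: label == "a")
```

In this example the first pass (N = 1) fails, N is incremented to 2, and the second pass succeeds, so the two top-ranked candidates are returned.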

Embodiment two

The phoneme labeling device based on audio fingerprints provided by this embodiment can be used to execute the phoneme labeling method based on audio fingerprints provided by the embodiments of the present invention, and has the corresponding functions and beneficial effects.

Referring to Fig. 2, a phoneme labeling device based on audio fingerprints comprises:

a preprocessing unit 1, configured to preprocess a speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled;

a pole acquiring unit 2, configured to extract the audio fingerprint of the speech to be labeled and obtain the speech-spectrum pole information of that audio fingerprint; the pole acquiring unit 2 is specifically configured to extract the audio fingerprint of the speech to be labeled and obtain the extreme value of each pole of the speech spectrum of that audio fingerprint, in the chronological order in which the poles occur;

a comparing unit 3, configured to compare the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number;

a judging unit 4, configured to judge whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled;

a phoneme labeling unit 5, configured to confirm the N retrieved phonemes as the label phonemes of the speech to be labeled if the judging result of the judging unit 4 is yes;

a return execution unit 6, configured to set N = N + 1 if the judging result of the judging unit 4 is no, and then return to comparing the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values.
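Purely as an illustration of how the six units could be wired together in software, the sketch below models each of units 1 to 4 as an injected callable and expresses units 5 and 6 as the two branches of a loop. The class name, the field names, the callable signatures, and the `max_n` guard are assumptions introduced for this example; they are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhonemeLabelingDevice:
    preprocess: Callable      # unit 1: speech signal -> speech to be labeled
    acquire_poles: Callable   # unit 2: speech to be labeled -> pole information
    compare: Callable         # unit 3: (pole information, n) -> top-n phonemes
    judge: Callable           # unit 4: candidate phonemes -> True / False
    max_n: int = 10           # assumed upper bound so the loop always ends

    def run(self, signal, n=1):
        speech = self.preprocess(signal)
        poles = self.acquire_poles(speech)
        while n <= self.max_n:
            candidates = self.compare(poles, n)
            if self.judge(candidates):
                return candidates  # unit 5: confirm the label phonemes
            n += 1                 # unit 6: set N = N + 1 and compare again
        return None


# Stub units standing in for real implementations.
device = PhonemeLabelingDevice(
    preprocess=lambda signal: signal,
    acquire_poles=lambda speech: speech,
    compare=lambda poles, n: ["o", "a", "i"][:n],
    judge=lambda candidates: "a" in candidates,
)
labels = device.run([0.0])
```

Because each unit is just a field, two or more of them can be replaced by a single combined function without changing the control flow.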

In the embodiments provided in this application, it should be understood that the disclosed systems, units, devices, and methods may be implemented in other ways. For example, the embodiments described above are merely illustrative: the division into units or modules is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units, modules, and components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a computer-readable storage medium and includes instructions that cause a terminal device (which may be a mobile phone, a notebook computer, or another electronic device) to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A phoneme labeling method based on audio fingerprints, characterized by comprising the following steps:

extracting the audio fingerprint of the speech to be labeled, and obtaining the speech-spectrum pole information of that audio fingerprint;

comparing the pole information with all audio fingerprints in a phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number;

judging whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled; if so, confirming the N retrieved phonemes as the label phonemes of the speech to be labeled.

2. The phoneme labeling method based on audio fingerprints according to claim 1, characterized in that, before extracting the audio fingerprint of the speech to be labeled and obtaining the speech-spectrum pole information of that audio fingerprint, the method further comprises:

preprocessing a speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled.

3. The phoneme labeling method based on audio fingerprints according to claim 1, characterized in that extracting the audio fingerprint of the speech to be labeled and obtaining the speech-spectrum pole information of that audio fingerprint specifically comprises:

extracting the audio fingerprint of the speech to be labeled, and obtaining the extreme value of each pole of the speech spectrum of that audio fingerprint, in the chronological order in which the poles occur.

4. The phoneme labeling method based on audio fingerprints according to claim 1, characterized in that the step of judging whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled, and if so confirming the N retrieved phonemes as the label phonemes of the speech to be labeled, further comprises:

if not, setting N = N + 1 and returning to the step of comparing the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values.

5. A phoneme labeling device based on audio fingerprints, characterized by comprising:

a pole acquiring unit, configured to extract the audio fingerprint of the speech to be labeled and obtain the speech-spectrum pole information of that audio fingerprint;

a comparing unit, configured to compare the pole information with all audio fingerprints in a phoneme database to obtain the N retrieved phonemes with the highest matching values, where N is a natural number;

a judging unit, configured to judge whether the pronunciation of any one of the top N retrieved phonemes is consistent with the pronunciation of the phoneme to be labeled;

a phoneme labeling unit, configured to confirm the N retrieved phonemes as the label phonemes of the speech to be labeled if the judging result of the judging unit is yes.

6. The phoneme labeling device based on audio fingerprints according to claim 5, characterized by further comprising:

a preprocessing unit, configured to preprocess a speech signal by pre-emphasis, framing, and Hamming windowing to obtain the speech to be labeled.

7. The phoneme labeling device based on audio fingerprints according to claim 5, characterized in that the pole acquiring unit is specifically configured to:

extract the audio fingerprint of the speech to be labeled, and obtain the extreme value of each pole of the speech spectrum of that audio fingerprint, in the chronological order in which the poles occur.

8. The phoneme labeling device based on audio fingerprints according to claim 5, characterized by further comprising:

a return execution unit, configured to set N = N + 1 if the judging result of the judging unit is no, and then return to comparing the pole information with all audio fingerprints in the phoneme database to obtain the N retrieved phonemes with the highest matching values.