CN108766439A

CN108766439A - Monitoring method and device based on voiceprint recognition

Info

Publication number: CN108766439A
Application number: CN201810394740.6A
Authority: CN
Inventors: 吴松海; 陈昊亮
Original assignee: Guangzhou National Sound Technology Co Ltd
Current assignee: Guangzhou National Sound Technology Co Ltd
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2018-11-06

Abstract

An embodiment of the invention discloses a monitoring method and device based on voiceprint recognition. It addresses the technical problem that existing monitoring technology generally relies on cameras, which cannot capture usable images once deliberately blocked and whose footage is easily limited by viewing angle and lighting conditions, resulting in incomplete monitoring. The method of the embodiment includes: S1, acquiring monitored audio; S2, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step S3; S3, performing voiceprint recognition on the monitored audio, comparing the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library and, if an identical voiceprint is matched, sending location information to an early-warning platform and responding to the early-warning platform.

Description

Monitoring method and device based on voiceprint recognition

Technical field

The present invention relates to the field of monitoring technology, and in particular to a monitoring method and device based on voiceprint recognition.

Background art

With the growing adoption of cameras and face recognition technology, such systems are deployed in scenarios such as streets and indoor spaces, where they support practical applications including real-time monitoring, surveillance of designated areas, target tracking, and public security.

Existing monitoring technology generally relies on cameras. A camera cannot capture usable images once it has been deliberately blocked, and its footage is easily limited by viewing angle and lighting conditions, which leads to the technical problem of incomplete monitoring.

Summary of the invention

The present invention provides a monitoring method and device based on voiceprint recognition, which solve the technical problem that existing monitoring technology generally relies on cameras, that a camera cannot capture usable images once deliberately blocked, and that camera footage is easily limited by viewing angle and lighting conditions, resulting in incomplete monitoring.

The present invention provides a monitoring method based on voiceprint recognition, including:

S1, acquiring monitored audio;

S2, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step S3;

S3, performing voiceprint recognition on the monitored audio, and comparing the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library; if an identical voiceprint is matched, sending location information to an early-warning platform and responding to the early-warning platform.

Optionally, before step S1, the method further includes:

S01, acquiring enrolled audio;

S02, extracting the second voiceprint from the enrolled audio and saving it to the preset voiceprint library.

Optionally, after step S01 and before step S02, the method further includes:

performing voice quality detection on the enrolled audio, including:

calculating a first signal-to-noise ratio, a first average energy value, and a first effective speech duration of the enrolled audio;

comparing, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, determining that the voice quality of the enrolled audio is acceptable and executing the next step; otherwise, prompting the user to record the audio again and returning to the step of acquiring enrolled audio.

Optionally, before calculating the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio, the method further includes:

determining the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

determining the first preset threshold corresponding to the first effective speech duration according to the content type of the enrolled audio.

Optionally, step S3 specifically includes:

performing voiceprint recognition on the monitored audio and extracting the first voiceprint from the monitored audio;

comparing the first voiceprint from the monitored audio with the second voiceprints in the preset voiceprint library to obtain a matching value;

determining whether the matching value is higher than a preset matching threshold and, when it is, sending location information to the early-warning platform and responding to the early-warning platform.

Optionally, when the matching value is lower than the preset matching threshold, the first voiceprint from the monitored audio is added to the preset voiceprint library, and the early-warning platform is responded to.

The present invention provides a monitoring device based on voiceprint recognition, including:

a first acquisition unit, configured to acquire monitored audio;

a speech recognition unit, configured to perform speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, to pass control to the voiceprint comparison unit;

a voiceprint comparison unit, configured to perform voiceprint recognition on the monitored audio and compare the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library and, if an identical voiceprint is matched, to send location information to an early-warning platform and respond to the early-warning platform.

Optionally, the monitoring device based on voiceprint recognition provided by the present invention further includes:

a second acquisition unit, configured to acquire enrolled audio;

a voiceprint extraction unit, configured to extract the second voiceprint from the enrolled audio and save it to the preset voiceprint library;

a voice quality detection unit, configured to perform voice quality detection on the enrolled audio.

The voice quality detection unit includes:

a calculation subunit, configured to calculate a first signal-to-noise ratio, a first average energy value, and a first effective speech duration of the enrolled audio;

a comparison subunit, configured to compare, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, to determine that the voice quality of the enrolled audio is acceptable and execute the next step; otherwise, to prompt the user to record the audio again and return to acquiring enrolled audio.

Optionally, the voice quality detection unit further includes:

a judgment subunit, configured to determine the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

a threshold determination subunit, configured to determine the first preset threshold corresponding to the first effective speech duration according to the content type of the enrolled audio.

As can be seen from the above technical solutions, the present invention has the following advantages:

The present invention provides a monitoring method based on voiceprint recognition, including: S1, acquiring monitored audio; S2, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step S3; S3, performing voiceprint recognition on the monitored audio, and comparing the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library; if an identical voiceprint is matched, sending location information to an early-warning platform and responding to the early-warning platform.

In the present invention, monitored audio is acquired and checked for a preset keyword. If a preset keyword is detected, voiceprint recognition is performed on the monitored audio, and the recognized first voiceprint is compared with the second voiceprints in the preset voiceprint library to determine whether the speaker is a tracked target. This solves the technical problem that existing monitoring technology generally relies on cameras, that a camera cannot capture usable images once deliberately blocked, and that camera footage is easily limited by viewing angle and lighting conditions, resulting in incomplete monitoring.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow diagram of one embodiment of the monitoring method based on voiceprint recognition provided by the present invention;

Fig. 2 is a flow diagram of another embodiment of the monitoring method based on voiceprint recognition provided by the present invention;

Fig. 3 is a structural diagram of one embodiment of the monitoring device based on voiceprint recognition provided by the present invention;

Fig. 4 is a structural diagram of another embodiment of the monitoring device based on voiceprint recognition provided by the present invention.

Detailed description

The embodiments of the present invention provide a monitoring method and device based on voiceprint recognition, which solve the technical problem that existing monitoring technology generally relies on cameras, that a camera cannot capture usable images once deliberately blocked, and that camera footage is easily limited by viewing angle and lighting conditions, resulting in incomplete monitoring.

To make the purpose, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the embodiments disclosed below are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Referring to Fig. 1, the present invention provides a monitoring method based on voiceprint recognition, including:

101, acquiring monitored audio;

102, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step 103;

103, performing voiceprint recognition on the monitored audio, and comparing the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library; if an identical voiceprint is matched, sending location information to an early-warning platform and responding to the early-warning platform.

In this embodiment of the present invention, monitored audio is acquired and checked for a preset keyword. If a preset keyword is detected, voiceprint recognition is performed on the monitored audio, and the recognized first voiceprint is compared with the second voiceprints in the preset voiceprint library to determine whether the speaker is a tracked target. This solves the technical problem that existing monitoring technology generally relies on cameras, that a camera cannot capture usable images once deliberately blocked, and that camera footage is easily limited by viewing angle and lighting conditions, resulting in incomplete monitoring.
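The flow of steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: `transcribe`, `extract_voiceprint`, `is_same_speaker`, and `send_alert` are hypothetical callables standing in for speech recognition, voiceprint extraction, voiceprint comparison, and the early-warning platform interface, none of which the text specifies.

```python
from typing import Callable, Iterable, Sequence


def monitor_once(
    audio: bytes,
    location: str,
    keywords: Iterable[str],
    library: Sequence[object],
    transcribe: Callable[[bytes], str],
    extract_voiceprint: Callable[[bytes], object],
    is_same_speaker: Callable[[object, object], bool],
    send_alert: Callable[[str], None],
) -> bool:
    """Run steps 101-103 on one piece of monitored audio.

    Returns True if an alert carrying the location was sent.
    All callables are hypothetical placeholders.
    """
    # Step 102: speech recognition, then check for a preset keyword.
    text = transcribe(audio)
    if not any(keyword in text for keyword in keywords):
        return False  # no keyword, so voiceprint recognition is skipped

    # Step 103: extract the first voiceprint and compare it with each
    # second voiceprint in the preset library.
    first = extract_voiceprint(audio)
    if any(is_same_speaker(first, second) for second in library):
        send_alert(location)
        return True
    return False
```

The point of the structure is that the comparatively expensive voiceprint step only runs on audio that has already passed the keyword check.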

The above describes one embodiment of the monitoring method based on voiceprint recognition provided by the present invention. Another embodiment of the monitoring method based on voiceprint recognition provided by the present invention is described below.

Referring to Fig. 2, the present invention provides a monitoring method based on voiceprint recognition, including:

201, acquiring enrolled audio;

It should be noted that, before the preset voiceprint library is built, the audio to be enrolled is acquired first.

202, performing voice quality detection on the enrolled audio, including:

2021, determining the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

It should be noted that the content type of the enrolled audio is determined, the content type including random digits, random phrase, random long sentence, and fixed phrase.

2022, determining the first preset threshold corresponding to the first effective speech duration according to the content type of the enrolled audio;

It should be noted that the first preset threshold corresponding to the first effective speech duration is determined according to the content type of the enrolled audio: for random digits it is 1.2 seconds; for a random phrase, 1.8 seconds; for a random long sentence, 16 seconds; and for a fixed phrase, 0.8 seconds.
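The mapping from content type to duration threshold in step 2022 amounts to a lookup table. A minimal sketch follows; the durations are the ones stated in the text, while the string keys and the function name are hypothetical labels chosen for illustration.

```python
# Minimum effective speech duration in seconds per content type,
# using the values given for step 2022. The key names are hypothetical.
MIN_EFFECTIVE_SPEECH_SECONDS = {
    "random_digits": 1.2,
    "random_phrase": 1.8,
    "random_long_sentence": 16.0,
    "fixed_phrase": 0.8,
}


def duration_threshold(content_type: str) -> float:
    """Return the first preset threshold for the effective speech duration."""
    try:
        return MIN_EFFECTIVE_SPEECH_SECONDS[content_type]
    except KeyError:
        raise ValueError(f"unknown content type: {content_type!r}") from None
```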

2023, calculating the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio;

It should be noted that the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio are calculated.
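The text does not say how the three quantities in step 2023 are computed. The sketch below shows one plausible way, under assumptions that are not from the patent: audio is a sequence of PCM sample amplitudes, a frame counts as speech when its mean absolute amplitude exceeds a hypothetical `vad_level`, the average energy is taken as the mean absolute amplitude of the speech frames, and the signal-to-noise ratio is the ratio of speech-frame power to non-speech-frame power in decibels.

```python
import math
from typing import Sequence, Tuple


def speech_metrics(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    vad_level: float = 500.0,
) -> Tuple[float, float, float]:
    """Return (snr_db, average_energy, effective_speech_seconds).

    Assumption: simple energy-based speech detection; the patent does
    not define these computations.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    # Split into whole frames; a trailing partial frame is dropped.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]

    speech, noise = [], []
    for frame in frames:
        level = sum(abs(s) for s in frame) / len(frame)
        (speech if level > vad_level else noise).append(frame)

    def power(group) -> float:
        count = sum(len(f) for f in group)
        return sum(s * s for f in group for s in f) / count if count else 0.0

    if not speech:
        return float("-inf"), 0.0, 0.0

    speech_power, noise_power = power(speech), power(noise)
    average_energy = (sum(abs(s) for f in speech for s in f)
                      / sum(len(f) for f in speech))
    if noise_power == 0.0:
        snr_db = float("inf")  # no measurable noise floor
    else:
        snr_db = 10.0 * math.log10(speech_power / noise_power)
    duration = len(speech) * frame_len / sample_rate
    return snr_db, average_energy, duration
```

A production system would more likely use a dedicated voice activity detector; the fixed amplitude level here only keeps the example self-contained.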

2024, comparing, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, determining that the voice quality of the enrolled audio is acceptable and executing the next step; otherwise, prompting the user to record the audio again and returning to the step of acquiring enrolled audio;

It should be noted that the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio are compared in turn with the corresponding first preset thresholds. If all three are higher than their corresponding first preset thresholds, the voice quality of the enrolled audio is determined to be acceptable and the next step is executed; otherwise, the user is prompted to record the audio again and the method returns to acquiring enrolled audio. The first preset threshold corresponding to the first signal-to-noise ratio is 10 decibels, the first preset threshold corresponding to the first average energy value is [1000, 30000], and the first preset threshold corresponding to the first effective speech duration was already determined in step 2022.
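The comparison in step 2024 can be sketched as a single predicate. The 10 decibel and [1000, 30000] values come from the text. Because the energy threshold is given as an interval, the sketch assumes the average energy must fall inside it, which is an interpretation rather than something the text states. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QualityThresholds:
    min_snr_db: float = 10.0                               # from the text: 10 decibels
    energy_range: Tuple[float, float] = (1000.0, 30000.0)  # from the text: [1000, 30000]
    min_speech_seconds: float = 1.2                        # set per content type in step 2022


def quality_ok(
    snr_db: float,
    average_energy: float,
    speech_seconds: float,
    thresholds: Optional[QualityThresholds] = None,
) -> bool:
    """Return True when all three measurements pass their thresholds."""
    t = thresholds or QualityThresholds()
    low, high = t.energy_range
    return (
        snr_db > t.min_snr_db
        # Assumption: the interval is read as an allowed range.
        and low <= average_energy <= high
        and speech_seconds > t.min_speech_seconds
    )
```

For a random long sentence, for example, the caller would pass `QualityThresholds(min_speech_seconds=16.0)`.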

203, extracting the second voiceprint from the enrolled audio and saving it to the preset voiceprint library;

It should be noted that, once the voice quality of the enrolled audio has been determined to be acceptable, the second voiceprint is extracted from the enrolled audio and saved to the preset voiceprint library.
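Steps 201 to 203 together form a record, check, and store loop. The sketch below illustrates that control flow only. `record`, `passes_quality_check`, and `extract_voiceprint` are hypothetical callables, the library is modelled as a plain dictionary keyed by a user identifier, and the `max_attempts` limit is an added assumption, since the text simply returns to recording without stating a limit.

```python
from typing import Callable, Dict, List


def enroll(
    user_id: str,
    record: Callable[[], bytes],
    passes_quality_check: Callable[[bytes], bool],
    extract_voiceprint: Callable[[bytes], List[float]],
    library: Dict[str, List[float]],
    max_attempts: int = 3,
) -> bool:
    """Enroll one user; return True once a voiceprint has been stored."""
    for _ in range(max_attempts):
        audio = record()                                  # step 201
        if passes_quality_check(audio):                   # step 202
            library[user_id] = extract_voiceprint(audio)  # step 203
            return True
        # Quality check failed: the user is asked to record again.
    return False
```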

204, acquiring monitored audio;

It should be noted that, during monitoring, the monitored audio is acquired.

205, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step 206;

It should be noted that speech recognition is performed on the monitored audio to determine whether it contains a preset keyword; if so, step 206 is executed. The preset keyword is set by the user.

206, performing voiceprint recognition on the monitored audio and extracting the first voiceprint from the monitored audio;

It should be noted that voiceprint recognition is performed on the audio in which the preset keyword is present, and the first voiceprint in the monitored audio is extracted.

207, comparing the first voiceprint from the monitored audio with the second voiceprints in the preset voiceprint library to obtain a matching value;

It should be noted that the first voiceprint from the monitored audio is compared with the second voiceprints in the preset voiceprint library. The preset voiceprint library contains the second voiceprint of at least one enrolled user, so at least one matching value is obtained.
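The text does not specify how a matching value is computed. As an illustration only, the sketch below assumes voiceprints are fixed-length numeric vectors and uses cosine similarity as the matching value, producing one value per enrolled user as described for step 207. Both function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_values(
    first_voiceprint: Sequence[float],
    library: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compare the first voiceprint with every second voiceprint in the library."""
    return {
        user_id: cosine_similarity(first_voiceprint, second_voiceprint)
        for user_id, second_voiceprint in library.items()
    }
```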

208, determining whether the matching value is higher than a preset matching threshold and, when it is, sending location information to the early-warning platform and responding to the early-warning platform;

It should be noted that determining whether the obtained matching value is higher than the preset matching threshold amounts to determining whether the monitored audio contains the voiceprint of a user enrolled in the preset voiceprint library; if so, location information is sent to the early-warning platform and the early-warning platform is responded to.

209, when the matching value is lower than the preset matching threshold, adding the first voiceprint from the monitored audio to the preset voiceprint library and responding to the early-warning platform;

It should be noted that a matching value lower than the preset matching threshold indicates that no related second voiceprint is stored in the preset voiceprint library. Because the monitored audio nevertheless contains a preset keyword, the first voiceprint corresponding to the monitored audio needs to be saved to the preset voiceprint library, and the early-warning platform is responded to.
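The branch between steps 208 and 209 can be sketched as below. The assumptions are labelled in the code: the best matching value decides the outcome, a value exactly equal to the threshold is treated like a lower one (the text does not cover equality), the library is a dictionary, the `alert` callable stands in for the early-warning platform interface, and the generated identifier for an unknown speaker is a hypothetical naming scheme.

```python
from typing import Callable, Dict, List, Sequence


def handle_matching_values(
    first_voiceprint: Sequence[float],
    scores: Dict[str, float],
    library: Dict[str, List[float]],
    location: str,
    match_threshold: float,
    alert: Callable[[dict], None],
) -> str:
    """Apply steps 208 and 209; return "matched" or "added"."""
    best = max(scores.values(), default=float("-inf"))

    if best > match_threshold:
        # Step 208: a known voiceprint was heard, so report the location.
        alert({"event": "known_speaker", "location": location})
        return "matched"

    # Step 209: keyword was spoken by an unknown voice. Store the first
    # voiceprint under a hypothetical generated id and notify the platform.
    new_id = f"unknown-{len(library) + 1}"
    library[new_id] = list(first_voiceprint)
    alert({"event": "new_speaker", "speaker_id": new_id})
    return "added"
```

Storing the unknown voiceprint means a later utterance by the same speaker would take the step 208 branch.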

The above describes another embodiment of the monitoring method based on voiceprint recognition provided by the present invention. An embodiment of the monitoring device based on voiceprint recognition provided by the present invention is described below.

Referring to Fig. 3, the present invention provides an embodiment of a monitoring device based on voiceprint recognition, including:

a first acquisition unit 301, configured to acquire monitored audio;

a speech recognition unit 302, configured to perform speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, to pass control to the voiceprint comparison unit 303;

a voiceprint comparison unit 303, configured to perform voiceprint recognition on the monitored audio and compare the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library and, if an identical voiceprint is matched, to send location information to an early-warning platform and respond to the early-warning platform.

The above describes one embodiment of the monitoring device based on voiceprint recognition provided by the present invention. Another embodiment of the monitoring device based on voiceprint recognition provided by the present invention is described below.

Referring to Fig. 4, the present invention provides another embodiment of a monitoring device based on voiceprint recognition, including:

a second acquisition unit 401, configured to acquire enrolled audio;

a voice quality detection unit 402, configured to perform voice quality detection on the enrolled audio;

The voice quality detection unit 402 includes:

a judgment subunit 4021, configured to determine the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

a threshold determination subunit 4022, configured to determine the first preset threshold corresponding to the first effective speech duration according to the content type of the enrolled audio;

a calculation subunit 4023, configured to calculate the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio;

a comparison subunit 4024, configured to compare, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, to determine that the voice quality of the enrolled audio is acceptable and execute the next step; otherwise, to prompt the user to record the audio again and return to acquiring enrolled audio;

a voiceprint extraction unit 403, configured to extract the second voiceprint from the enrolled audio and save it to the preset voiceprint library;

a first acquisition unit 404, configured to acquire monitored audio;

a speech recognition unit 405, configured to perform speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, to pass control to the voiceprint comparison unit 406;

a voiceprint comparison unit 406, configured to perform voiceprint recognition on the monitored audio and compare the first voiceprint corresponding to the monitored audio with the second voiceprints in the preset voiceprint library and, if an identical voiceprint is matched, to send location information to the early-warning platform and respond to the early-warning platform;

The voiceprint comparison unit 406 specifically includes:

an extraction subunit 4061, configured to perform voiceprint recognition on the monitored audio and extract the first voiceprint from the monitored audio;

a comparison subunit 4062, configured to compare the first voiceprint from the monitored audio with the second voiceprints in the preset voiceprint library to obtain a matching value;

a matching subunit 4063, configured to determine whether the matching value is higher than a preset matching threshold and, when it is, to send location information to the early-warning platform and respond to the early-warning platform;

The matching subunit 4063 is further configured, when the matching value is lower than the preset matching threshold, to add the first voiceprint from the monitored audio to the preset voiceprint library and respond to the early-warning platform.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

As stated above, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A monitoring method based on voiceprint recognition, characterized by including:

S1, acquiring monitored audio;

S2, performing speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, executing step S3;

S3, performing voiceprint recognition on the monitored audio, and comparing the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library; if an identical voiceprint is matched, sending location information to an early-warning platform and responding to the early-warning platform.

2. The monitoring method based on voiceprint recognition according to claim 1, characterized in that, before step S1, the method further includes:

S01, acquiring enrolled audio;

3. The monitoring method based on voiceprint recognition according to claim 2, characterized in that, after step S01 and before step S02, the method further includes:

performing voice quality detection on the enrolled audio, including:

comparing, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, determining that the voice quality of the enrolled audio is acceptable and executing the next step; otherwise, prompting the user to record the audio again and returning to the step of acquiring enrolled audio.

4. The monitoring method based on voiceprint recognition according to claim 3, characterized in that, before calculating the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio, the method further includes:

determining the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

5. The monitoring method based on voiceprint recognition according to claim 1, characterized in that step S3 specifically includes:

comparing the first voiceprint from the monitored audio with the second voiceprints in the preset voiceprint library to obtain a matching value;

determining whether the matching value is higher than a preset matching threshold and, when it is, sending location information to the early-warning platform and responding to the early-warning platform.

6. The monitoring method based on voiceprint recognition according to claim 5, characterized in that, when the matching value is lower than the preset matching threshold, the first voiceprint from the monitored audio is added to the preset voiceprint library, and the early-warning platform is responded to.

7. A monitoring device based on voiceprint recognition, characterized by including:

a first acquisition unit, configured to acquire monitored audio;

a speech recognition unit, configured to perform speech recognition on the monitored audio and, when the monitored audio contains a preset keyword, to pass control to the voiceprint comparison unit;

a voiceprint comparison unit, configured to perform voiceprint recognition on the monitored audio and compare the first voiceprint corresponding to the monitored audio with the second voiceprints in a preset voiceprint library and, if an identical voiceprint is matched, to send location information to an early-warning platform and respond to the early-warning platform.

8. The monitoring device based on voiceprint recognition according to claim 7, characterized by further including:

a second acquisition unit, configured to acquire enrolled audio;

9. The monitoring device based on voiceprint recognition according to claim 8, characterized by further including:

The voice quality detection unit includes:

a calculation subunit, configured to calculate the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio;

a comparison subunit, configured to compare, in turn, the first signal-to-noise ratio, the first average energy value, and the first effective speech duration of the enrolled audio with the corresponding first preset thresholds; if all three are higher than their corresponding first preset thresholds, to determine that the voice quality of the enrolled audio is acceptable and execute the next step; otherwise, to prompt the user to record the audio again and return to acquiring enrolled audio.

10. The monitoring device based on voiceprint recognition according to claim 9, characterized in that the voice quality detection unit further includes:

a judgment subunit, configured to determine the content type of the enrolled audio, the content type including random digits, random phrase, random long sentence, and fixed phrase;

a threshold determination subunit, configured to determine the first preset threshold corresponding to the first effective speech duration according to the content type of the enrolled audio.