WO2020140376A1 - Drunk driving detection method, apparatus, device and storage medium based on voiceprint recognition - Google Patents


Info

Publication number
WO2020140376A1
WO2020140376A1 (application PCT/CN2019/089161; priority application CN2019089161W)
Authority
WO
WIPO (PCT)
Prior art keywords
standard
test
feature value
sample
preset
Application number
PCT/CN2019/089161
Other languages
English (en)
French (fr)
Inventor
黄夕桐
王健宗
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020140376A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Definitions

  • The present application belongs to the field of artificial intelligence, and more specifically relates to a method, apparatus, device, and storage medium for drunk driving detection based on voiceprint recognition.
  • Drunk driving refers to driving within 8 hours after drinking alcohol, or within 24 hours after being drunk. Statistics show that after drinking, drivers are 15 times more likely to have accidents, and 30% of road traffic accidents are caused by driving after drinking or while drunk.
  • At present, the main detection method for drunk driving uses a blow-type tester (breathalyzer) to determine whether a driver's behavior constitutes drunk driving by measuring the alcohol content in the driver's breath: when the driver's blood alcohol content is greater than or equal to 20 mg/100 ml and less than 80 mg/100 ml, the driving behavior constitutes driving after drinking; when the driver's blood alcohol content is greater than or equal to 80 mg/100 ml, the driving behavior constitutes drunk driving.
  • The inventor realized that drivers who drink and drive often delay or refuse to cooperate with the blow test, so that traffic law enforcement personnel spend too long checking a single vehicle, during which drunk drivers in other vehicles may escape detection; as a result, detection efficiency is low.
  • Embodiments of the present application provide a method, apparatus, computer device, and storage medium for detecting drunk driving based on voiceprint recognition, to solve the problem of low efficiency of current drunk driving detection.
  • a method for detecting drunk driving based on voiceprint recognition includes:
  • obtaining a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value;
  • obtaining a standard feature value, the standard feature value being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
  • constructing a first confidence interval according to the standard feature value; and
  • if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
  • a drunk driving detection device based on voiceprint recognition includes:
  • a test feature value acquiring module configured to acquire a test sound sample, perform preset feature value analysis on the test sound sample, and obtain a test feature value
  • a standard feature value acquiring module configured to acquire a standard feature value, which is obtained by analyzing preset feature values of N preset standard sound samples, where N is a positive integer;
  • a first confidence interval construction module configured to construct a first confidence interval according to the standard feature value
  • the drunk driving sample determination module is configured to determine that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the above drunk driving detection method based on voiceprint recognition.
  • One or more computer non-volatile storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to execute the above drunk driving detection method based on voiceprint recognition.
  • FIG. 1 is a schematic diagram of an application environment of a drunk driving detection method based on voiceprint recognition in an embodiment of the present application
  • FIG. 2 is a flowchart of a method for detecting drunk driving based on voiceprint recognition in an embodiment of the present application
  • FIG. 3 is another flowchart of a method for detecting drunk driving based on voiceprint recognition in an embodiment of the present application
  • FIG. 4 is another flowchart of a method for detecting drunk driving based on voiceprint recognition in an embodiment of the present application
  • FIG. 5 is another flowchart of a method for detecting drunk driving based on voiceprint recognition in an embodiment of the present application
  • FIG. 6 is another flowchart of a method for detecting drunk driving based on voiceprint recognition in an embodiment of the present application
  • FIG. 7 is a schematic block diagram of a drunk driving detection device based on voiceprint recognition in an embodiment of the present application.
  • FIG. 8 is another schematic block diagram of a drunk driving detection device based on voiceprint recognition in an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a test feature value acquisition module in a drunk driving detection device based on voiceprint recognition in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • The method for detecting drunk driving based on voiceprint recognition can be applied in the application environment shown in FIG. 1, in which a client communicates with a server through a network. The server obtains a test sound sample through the client and performs preset feature value analysis on it to obtain a test feature value; the server then obtains a standard feature value and constructs a first confidence interval according to the standard feature value. If the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample, and the judgment result is output to the client.
  • The client may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • a method for detecting drunk driving based on voiceprint recognition is provided. Taking the method applied to the server in FIG. 1 as an example for illustration, it includes the following steps:
  • S10 Obtain a test sound sample, and perform predetermined feature value analysis on the test sound sample to obtain a test feature value.
  • the test sound sample refers to the sound sample of the driver obtained on the spot, which can be obtained by the client and sent to the server.
  • the client includes an acquisition module for testing sound samples, and the acquisition module for testing sound samples is, for example, a recording device.
  • The traffic police law enforcement personnel can have a conversation with the driver; when the driver speaks, they activate the client's voice acquisition module to obtain the driver's voice as a test sound sample.
  • the client can also use voiceprint recognition technology to recognize the recorded sound, and automatically select the driver's voice as the test sound sample according to the recognition result, to reduce the impact of environmental noise or other personnel's voice on the detection of drunk driving.
  • the preset feature value refers to a feature value that can reflect whether the behavior corresponding to the sound sample is drunk driving. It can be understood that after a person drinks alcohol, the person's throat will be stimulated by alcohol, and a certain degree of change will occur. These changes in the throat can be used for voiceprint recognition of test sound samples, which can be reflected by detecting preset feature values.
  • the preset characteristic value may be at least one of a frequency perturbation value, an amplitude perturbation value, or a normalized noise energy value, or other acoustic indicators that can reflect changes in the throat, which is not specifically limited in this embodiment.
  • The server obtains the driver's voice through the client as the test sound sample, then performs preset feature value analysis on the test sound sample via a connected voiceprint recognition device, and uses the resulting preset feature value of the test sound sample as the test feature value.
  • the server sends the test sound samples obtained by the client to the voiceprint recognition device connected to the server to perform preset feature value analysis, thereby obtaining the test feature value.
  • The voiceprint recognition device may be an electroglottograph, which can provide and test feature values such as the fundamental frequency of vocal cord vibration, the closing rate, the frequency perturbation value, the amplitude perturbation value, the normalized noise energy value, the degree of vocal cord abduction, and changes in the height of the larynx.
  • the server inputs the test sound sample into the electro-glottograph, and the corresponding value of the preset feature value can be obtained as the test feature value.
  • The voiceprint recognition device may also be integrated into the client; after the client obtains the test sound sample, the voiceprint recognition device performs preset feature value analysis to obtain the test feature value, and the client then sends the analyzed test feature value to the server.
  • S20 Obtain a standard feature value, which is obtained by analyzing preset feature values of N preset standard sound samples, where N is a positive integer.
  • The standard sound sample refers to a sound sample of a person confirmed to meet the drunk driving standard.
  • Recordings of persons confirmed to meet the drunk driving standard through methods such as blow tests or blood tests can be obtained, and these recordings are used as standard sound samples.
  • a large number of standard sound samples may be collected in advance and stored in a database on the server side as preset standard sound samples.
  • the server acquires the sound samples of N persons who meet the standards of drunk driving as standard sound samples, and then performs predetermined feature value analysis on the N standard sound samples, and uses the analyzed feature values as standard feature values.
  • A confidence interval (CI) is an interval estimate of an unknown parameter of the population that generates the sample, constructed from a sample statistic. Relative to a point estimate, a confidence interval also conveys the precision of the estimate. It can be understood that constructing a confidence interval improves the accuracy of determining whether the test sound sample belongs to the drunk driving samples.
  • the server can first obtain the variance of the standard feature values of the N preset standard sound samples, and then construct a first confidence interval according to the obtained variance, so that whether the test sound sample belongs to the drunk driving sample can be determined according to the first confidence interval .
  • the server acquires the standard feature value after obtaining the test sound sample, and constructs the first confidence interval based on the standard feature value to determine whether the test sound sample belongs to the drunk driving sample.
  • The server can obtain a new first confidence interval based on the changed standard sound samples, and then determine whether the test sound sample belongs to the drunk driving samples according to the new first confidence interval, which facilitates updating and upgrading the drunk driving detection method based on voiceprint recognition.
  • The server can also obtain the average value of the standard feature values of the N standard sound samples and set a preset drunk driving threshold based on that average. If the test feature value is greater than or equal to the preset drunk driving threshold, the test sound sample is determined to be a drunk driving sample; if the test feature value is less than the preset drunk driving threshold, the test sound sample is determined to be a non-drunk driving sample.
  • For example, if the preset drunk driving threshold for the frequency perturbation value is set to A based on the average frequency perturbation value of the standard feature values, and the frequency perturbation value corresponding to the test feature value is greater than A, the test sound sample is determined to be a drunk driving sample.
  • The preset drunk driving threshold may be obtained by adding or subtracting an empirical value to or from the average of the standard feature values.
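  • As a rough illustration of this threshold-based alternative, the following sketch uses hypothetical jitter values and an illustrative empirical margin; the patent only specifies that the threshold is derived from the average standard feature value adjusted by an empirical value:

```python
import statistics

# Hypothetical jitter values (%) measured from standard (drunk driving)
# sound samples; the empirical margin is likewise illustrative.
standard_jitter = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
EMPIRICAL_MARGIN = 0.3

# Threshold "A": average of the standard feature values minus an
# empirical value (the embodiment allows adding or subtracting one).
threshold_a = statistics.mean(standard_jitter) - EMPIRICAL_MARGIN

def is_drunk_by_threshold(test_jitter: float) -> bool:
    """A test feature value at or above the preset drunk driving
    threshold is classified as a drunk driving sample."""
    return test_jitter >= threshold_a
```

All names and numbers here are assumptions for illustration, not values fixed by the patent.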
  • S40 If the test feature value does not belong to the first confidence interval, it is determined that the test sound sample is a drunk driving sample.
  • If the test feature value does not belong to the first confidence interval, the server determines that the test sound sample is a drunk driving sample and that the corresponding driver is in a drunk driving state, and sends the judgment result to the client; if the test feature value belongs to the first confidence interval, the server determines that the test sound sample is a non-drunk driving sample and that the corresponding driver is in a non-drunk driving state, and sends the judgment result to the client.
  • When there are multiple test feature values, the server may determine that the test sound sample is a drunk driving sample only when every test feature value falls outside the first confidence interval corresponding to its test content; alternatively, it may make the determination when a preset subset of the test feature values falls outside the corresponding first confidence intervals, with the remaining test feature values used as reference data.
  • In this embodiment, the test sound sample is acquired and preset feature value analysis is performed on it to obtain the test feature value; the standard feature value is then acquired, and the first confidence interval is constructed from it. If the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample.
  • By obtaining the driver's voice and comparing it with the first confidence interval obtained from the standard sound samples to determine whether the driver is drunk, the method avoids situations in which the driver delays or refuses to cooperate with a blow test for a long time, and thus improves the efficiency of drunk driving identification.
  • In an embodiment, the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value. Before step S20, that is, before the step of acquiring the standard feature value, the method for detecting drunk driving based on voiceprint recognition provided in this embodiment further includes the following steps:
  • the method for obtaining standard sound samples by the server is the same as the method in step S20, and will not be repeated here.
  • S52 Perform voiceprint recognition on each standard sound sample to obtain a preset feature value of each standard sound sample.
  • Voiceprint recognition refers to sending standard sound samples to a voiceprint recognition device for analysis of preset feature values, where the preset feature values are at least one of frequency perturbation values, amplitude perturbation values, and normalized noise energy values.
  • the voiceprint recognition device is an electro-glottograph, or other voiceprint recognition devices, which is not specifically limited here.
  • The frequency perturbation (jitter) value is a physical quantity describing the change in fundamental frequency between adjacent cycles of a sound wave; it mainly reflects the roughness of the voice and also the degree of hoarseness.
  • The jitter value of a voice signal is consistent with the functional state of the glottis: under normal circumstances, most sound waves within a voicing cycle share the same frequency and few differ, so the frequency perturbation value is very small. After a person drinks alcohol, the throat is stimulated by the alcohol and the functional state of the glottis changes, making the voice rough and increasing the jitter value.
  • the amplitude perturbation (shimmer) value describes the change in the amplitude of sound waves between adjacent cycles, mainly reflecting the degree of hoarseness.
  • After a person drinks alcohol, the throat is stimulated by the alcohol and the functional state of the glottis changes, making the voice hoarse and increasing the shimmer value.
  • Both the jitter value and the shimmer value reflect the stability of vocal cord vibration: the larger the value, the greater the slight cycle-to-cycle variation in the acoustic signal during vocalization.
  • The normalized noise energy (NNE) value is the energy of the glottal noise caused by incomplete closure of the glottis during phonation; it mainly reflects the degree of breathiness, and also reflects hoarseness and glottal closure.
  • After drinking, the throat is stimulated by the alcohol and the functional state of the glottis changes, which increases the breathiness of the voice and thereby increases the NNE value.
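  • The jitter and shimmer measures described above can be sketched using their common "local" definitions (mean cycle-to-cycle perturbation relative to the mean, as used in tools such as Praat). The patent does not give explicit formulas, so this is an illustrative approximation rather than the patented computation:

```python
import numpy as np

def jitter_percent(cycle_lengths):
    """Local jitter: mean absolute difference between consecutive glottal
    cycle lengths, relative to the mean cycle length (in percent)."""
    p = np.asarray(cycle_lengths, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_percent(peak_amplitudes):
    """Local shimmer: mean absolute difference between consecutive peak
    amplitudes, relative to the mean amplitude (in percent)."""
    a = np.asarray(peak_amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

# A steady voice shows small cycle-to-cycle variation; a rough or hoarse
# voice (e.g. after alcohol irritates the throat) shows larger variation.
steady = [0.0100, 0.0101, 0.0100, 0.0099, 0.0100]  # seconds per cycle
rough = [0.0100, 0.0108, 0.0095, 0.0110, 0.0092]
assert jitter_percent(steady) < jitter_percent(rough)
```

Extracting the cycle lengths and peak amplitudes from a raw recording (pitch tracking) is a separate step not shown here.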
  • The server performs voiceprint recognition on each of the N standard sound samples and analyzes at least one of the jitter value, shimmer value, or NNE value, thereby obtaining at least one of the jitter value, shimmer value, or NNE value for each standard sound sample.
  • The server saves, in its database, the acquired jitter value, shimmer value, and/or NNE value of each standard sound sample as the standard feature value of the corresponding standard sound sample. It can be understood that when the standard feature value includes all three of the jitter value, the shimmer value, and the NNE value, the voice characteristics of drunk drivers are reflected more fully, and the accuracy of the drunk driving detection method based on voiceprint recognition is higher.
  • In this embodiment, the preset feature value of each standard sound sample is obtained, and the preset feature value of each standard sound sample is used as the standard feature value of the corresponding standard sound sample.
  • In an embodiment, step S10, that is, obtaining a test sound sample and performing preset feature value analysis on it to obtain a test feature value, may specifically include the following steps:
  • S11 Acquire a test sound sample based on at least one of a preset pronunciation phoneme, a preset utterance time, or a preset environmental condition.
  • The preset pronunciation phoneme may be a vowel, such as a, u, or o. A vowel is a type of phoneme, as opposed to a consonant: a sound produced when the airflow passes through the mouth unobstructed, with different vowels resulting from different mouth shapes. Understandably, vowels better reflect the characteristics of the voice.
  • the preset utterance time may be 3 to 5 seconds, so that the sound can be completely recorded, and the preset feature value of the sound can be fully reflected.
  • The preset environmental condition may be to keep environmental noise below 45 dB SPL. Since on-site testing is usually affected by on-site noise, the preset environmental condition may be realized by taking the driver to a suitable environment (such as the traffic police car or a traffic police office); alternatively, after a preliminary drunk driving determination based on the preset pronunciation phoneme and/or preset utterance time, further confirmation may be made under the preset environmental condition.
  • The server acquires the test sound sample under at least one of the preset pronunciation phoneme, preset utterance time, and preset environmental conditions; understandably, the more conditions satisfied, the higher the accuracy of drunk driving identification.
  • S12 Perform voiceprint recognition on the test sound sample to obtain the preset characteristic value of the test sound sample.
  • Voiceprint recognition refers to sending a test sound sample to a voiceprint recognition device for analysis of a preset feature value, where the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value.
  • the voiceprint recognition device is an electro-glottograph, or other voiceprint recognition devices, which is not specifically limited here.
  • The server analyzes at least one of the jitter value, shimmer value, or NNE value of the obtained test sound sample, thereby obtaining the corresponding value or values for the test sound sample.
  • the server uses at least one preset feature value from the obtained jitter value, shimmer value, or NNE value of the test sound sample as the test feature value, and saves it in the database on the server side.
  • The values included in the standard feature value correspond to those included in the test feature value: for example, if the standard feature value includes the jitter value and the shimmer value, the test feature value also includes the jitter value and the shimmer value.
  • In this embodiment, a test sound sample is obtained based on at least one of a preset pronunciation phoneme, a preset utterance time, or a preset environmental condition; voiceprint recognition is then performed on the test sound sample to obtain its preset feature value; finally, the preset feature value of the test sound sample is used as the test feature value.
  • In this way, the obtained test sound samples better reflect the preset feature values, thereby improving the accuracy of determining whether a test sound sample is a drunk driving sample.
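  • The three acquisition conditions of step S11 might be checked as in the sketch below. The vowel set, the 3 to 5 second utterance time, and the 45 dB SPL noise ceiling are the illustrative values given in this embodiment; the function name and structure are assumptions:

```python
def check_acquisition_conditions(phoneme: str, duration_s: float,
                                 noise_db_spl: float) -> dict:
    """Report which of the three preset acquisition conditions hold.

    The method requires only that at least one condition be satisfied;
    the more conditions satisfied, the more reliable the identification."""
    return {
        "vowel_phoneme": phoneme in {"a", "o", "u", "e", "i"},
        "utterance_time_ok": 3.0 <= duration_s <= 5.0,
        "environment_quiet": noise_db_spl < 45.0,
    }

conditions = check_acquisition_conditions("a", 4.0, 40.0)
assert any(conditions.values())  # at least one condition must hold
```

A client could use such a report to prompt re-recording, or simply to annotate how many conditions the sample met.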
  • In an embodiment, step S30, that is, constructing the first confidence interval according to the standard feature value, may specifically include the following steps:
  • S31 The server can calculate the variance of the standard feature values corresponding to the N standard sound samples using the following formula: s² = (1/(N − 1)) · Σᵢ (xᵢ − x̄)², i = 1, ..., N, where xᵢ refers to the standard feature value of the i-th standard sound sample and x̄ is the average value of (x₁, x₂, x₃, ..., x_N).
  • the server can select a significance level according to the input of the client, and then construct the first confidence interval according to the selected significance level.
  • The significance level is the probability of making a mistake when constructing the confidence interval, that is, the probability that a conclusion drawn from the interval is wrong. For example, suppose the constructed first confidence interval is B: if the test feature value falls within B, the server determines that the test sound sample is a non-drunk driving sample; if the test sound sample is actually a drunk driving sample, a mistake has been made.
  • The significance level α can be taken as 0.05 or 0.01. Taking α = 0.05 as an example, the confidence level is 95%, and the first confidence interval is constructed as x̄ ± z_(α/2) · s/√N, that is, the average of the standard feature values plus or minus the margin of error.
  • The server compares the test feature value of the test sound sample with the constructed first confidence interval. If the test feature value falls within the first confidence interval, the test sound sample is determined to be a non-drunk driving sample, and the judgment result is sent to the client; if the test feature value falls outside the first confidence interval, the test sound sample is determined to be a drunk driving sample, and the judgment result is sent to the client.
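  • Steps S31 through the comparison above can be sketched as follows, assuming a normal-approximation interval x̄ ± z_(α/2) · s/√N and following the determination rule in the claims (a test feature value outside the interval is flagged as a drunk driving sample); the feature values are illustrative:

```python
import math
import statistics

Z = {0.05: 1.96, 0.01: 2.576}  # two-sided critical values z_(alpha/2)

def first_confidence_interval(standard_values, alpha=0.05):
    """mean ± z_(alpha/2) * s / sqrt(N), with s the sample standard
    deviation of the N standard feature values."""
    n = len(standard_values)
    mean = statistics.mean(standard_values)
    margin = Z[alpha] * statistics.stdev(standard_values) / math.sqrt(n)
    return mean - margin, mean + margin

def is_drunk_driving_sample(test_value, interval):
    """Per the claims: a test feature value outside the first
    confidence interval is determined to be a drunk driving sample."""
    low, high = interval
    return not (low <= test_value <= high)

# Illustrative jitter values (%) from the standard sound samples.
standard = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.1, 2.2]
ci = first_confidence_interval(standard, alpha=0.05)
assert is_drunk_driving_sample(1.0, ci)       # far outside the interval
assert not is_drunk_driving_sample(2.15, ci)  # near the sample mean
```

Whether the variance uses N or N − 1 in the denominator is not recoverable from the published text; the sample standard deviation (N − 1) is used here as the conventional choice.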
  • In an embodiment, the drunk driving detection method based on voiceprint recognition further includes the following steps:
  • the data difference between the test feature value and the standard feature value can be measured by a significance level.
  • The preset significance level can be set according to actual needs, and can be 0.05 or 0.01. Understandably, different preset significance levels yield different confidence intervals. Since the server must keep the probability of wrongly saving a sample as a new standard sound sample low, the preset significance level for saving a drunk driving sample into the database as a new standard sound sample can be set to 0.01, while the significance level for determining that a test sound sample is a drunk driving sample is 0.05.
  • S61 The server can store the corresponding drunk driving sample (that is, the test sound sample) in the server database as a new standard sound sample. It can be understood that the more standard sound samples in the database, the more accurate the drunk driving detection method based on voiceprint recognition becomes.
  • S62 Incorporate the new standard sound samples into the N standard sound samples to obtain the updated standard sound samples.
  • the server merges the new standard sound samples into the original N standard sound samples to obtain the updated standard sound samples.
  • In an embodiment, n drunk driving samples may be converted into new standard sound samples at the same time.
  • the server may also update the standard sound samples according to a preset time interval, such as one day, one week, or one month, etc., which is not specifically limited.
  • S63 Obtain the variance of the updated standard sound sample, and construct a second confidence interval according to the variance of the updated standard sound sample.
  • the server may obtain the variance of the updated standard sound sample according to the method of step S31, and then rebuild the confidence interval according to the variance of the updated standard sound sample to obtain the second confidence interval.
  • If a new test feature value does not belong to the second confidence interval, the server determines that the new test sound sample is a drunk driving sample. It can be understood that continuously expanding the standard sound samples in the server-side database makes the drunk driving detection method based on voiceprint recognition more accurate.
  • In this embodiment, the corresponding drunk driving sample is saved in the database as a new standard sound sample; the new standard sound sample is then merged into the N standard sound samples to obtain the updated standard sound samples; finally, the variance of the updated standard sound samples is obtained, and the second confidence interval is constructed according to that variance.
  • The number of standard sound samples is expanded by incorporating drunk driving samples that satisfy the preset significance level into the standard sound samples; determining whether subsequent test sound samples belong to the drunk driving samples with the confidence interval obtained from the expanded standard sound samples can continuously improve the accuracy of drunk driving identification.
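  • A minimal sketch of steps S61 through S63, assuming the interval form x̄ ± z · s/√N described for step S31 and illustrative feature values (the merge here is a simple list concatenation; the patent does not specify the storage mechanism):

```python
import math
import statistics

def confidence_interval(values, z=1.96):
    """mean ± z * s / sqrt(N), with s the sample standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    margin = z * statistics.stdev(values) / math.sqrt(n)
    return mean - margin, mean + margin

# Original N standard feature values plus the feature values of newly
# confirmed drunk driving samples (all numbers illustrative).
standard = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
new_drunk = [2.5, 2.2]

updated = standard + new_drunk            # S61/S62: save and merge
second_ci = confidence_interval(updated)  # S63: rebuild the interval
assert second_ci[0] < statistics.mean(updated) < second_ci[1]
```

Subsequent test feature values would then be compared against `second_ci` instead of the original interval.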
  • a drunk driving detection device based on voiceprint recognition corresponds one-to-one to the drunk driving detection method based on voiceprint recognition in the foregoing embodiment.
  • the drunk driving detection device based on voiceprint recognition includes a test feature value acquisition module 10, a standard feature value acquisition module 20, a first confidence interval construction module 30 and a drunk driving sample determination module 40.
  • the detailed description of each functional module is as follows:
  • the test characteristic value obtaining module 10 is used to obtain test sound samples, and perform predetermined characteristic value analysis on the test sound samples to obtain test characteristic values;
  • the standard feature value obtaining module 20 is used to obtain a standard feature value, which is obtained by performing a preset feature value analysis on N preset standard sound samples, where N is a positive integer;
  • a first confidence interval construction module 30 configured to construct a first confidence interval according to the standard feature value
  • the drunk driving sample determination module 40 is configured to determine that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval.
  • the preset characteristic value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value.
  • The drunk driving detection device based on voiceprint recognition provided in this embodiment further includes a preset feature value analysis module 50.
  • the preset feature value analysis module 50 includes a standard sample acquisition unit 51, a first voiceprint recognition unit 52, and a standard feature value determination unit 53.
  • the standard sample obtaining unit 51 is used to obtain standard sound samples
  • the first voiceprint recognition unit 52 is configured to perform voiceprint recognition on each standard sound sample to obtain a preset feature value of each standard sound sample;
  • the standard feature value determining unit 53 is configured to use the preset feature value of each standard sound sample as the standard feature value of the corresponding standard sound sample.
  • In an embodiment, the test feature value acquisition module 10 further includes a test sample acquisition unit 11, a second voiceprint recognition unit 12, and a test feature value determination unit 13.
  • the test sample obtaining unit 11 is configured to obtain a test sound sample based on at least one of a preset pronunciation phoneme, a preset utterance time, or a preset environmental condition;
  • the second voiceprint recognition unit 12 is used to perform voiceprint recognition on the test sound sample to obtain the preset characteristic value of the test sound sample;
  • the test feature value determination unit 13 is configured to use the preset feature value of the test sound sample as the test feature value.
  • first confidence interval construction module 30 is also used to:
  • obtain the variance of the N standard feature values;
  • construct the first confidence interval according to the variance of the N standard feature values.
  • the device for detecting drunk driving based on voiceprint recognition provided in this embodiment further includes a second confidence interval construction module, where the second confidence interval construction module is specifically used for:
  • if the data difference between the test feature value and the N standard feature values is less than the preset significance level, the corresponding drunk driving sample is saved to the database as a new standard sound sample;
  • the new standard sound sample is merged into the N standard sound samples to obtain updated standard sound samples;
  • the variance of the updated standard sound samples is acquired, and the second confidence interval is constructed according to the variance of the updated standard sound samples.
  • each module in the above drunk driving detection device based on voiceprint recognition may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in the hardware form or independent of the processor in the computer device, or may be stored in the memory in the computer device in the form of software so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store test sound samples, standard sound samples, test feature values, standard feature values, and drunk driving samples.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer readable instructions are executed by the processor to implement a method for detecting drunk driving based on voiceprint recognition.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented:
  • acquire a test sound sample and perform preset feature value analysis on the test sound sample to obtain a test feature value; acquire standard feature values obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer; and construct a first confidence interval according to the standard feature values;
  • if the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample.
  • one or more non-volatile computer storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • acquire a test sound sample and perform preset feature value analysis on the test sound sample to obtain a test feature value; acquire standard feature values obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer; and construct a first confidence interval according to the standard feature values;
  • if the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

This application belongs to the field of artificial intelligence and discloses a drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition. The method includes: acquiring a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value; acquiring standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer; constructing a first confidence interval according to the standard feature values; and, if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample. The drunk driving detection method provided by this application can determine whether a driver is driving drunk by detecting the driver's speech, improving the efficiency of drunk driving detection.

Description

Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition
This application is based on, and claims priority from, Chinese invention patent application No. 201910007825.9, filed on January 4, 2019 and entitled "Drunk driving detection method, apparatus, device and storage medium based on voiceprint recognition".
Technical Field
This application belongs to the field of artificial intelligence, and more specifically relates to a drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition.
Background
Drunk driving refers to driving a vehicle within 8 hours of drinking alcohol, or within 24 hours of being intoxicated. Statistics show that a driver who has been drinking is 15 times more likely to have an accident than usual, and 30% of road traffic accidents are caused by driving after drinking or drunk driving.
At present, the main detection method for drunk driving uses a breathalyzer: the alcohol content in the breath exhaled by the driver is measured to determine whether the driver's behavior constitutes drunk driving. When the alcohol content in the driver's blood is greater than or equal to 20 mg/100 ml and less than 80 mg/100 ml, the driving behavior constitutes driving after drinking; when the alcohol content in the driver's blood is greater than or equal to 80 mg/100 ml, the driving behavior constitutes drunk driving. However, the inventors realized that drivers who have been drinking often stall or refuse to cooperate with the breath test, so a traffic-enforcement officer spends too long checking a single vehicle; other vehicles whose drivers are driving drunk may go unchecked as a result, and detection efficiency is low.
Summary
Embodiments of this application provide a drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition, to solve the problem that current drunk driving detection is inefficient.
A drunk driving detection method based on voiceprint recognition includes:
acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
acquiring standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
constructing a first confidence interval according to the standard feature values; and
if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
A drunk driving detection apparatus based on voiceprint recognition includes:
a test feature value acquisition module configured to acquire a test sound sample and perform preset feature value analysis on the test sound sample to obtain a test feature value;
a standard feature value acquisition module configured to acquire standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
a first confidence interval construction module configured to construct a first confidence interval according to the standard feature values; and
a drunk driving sample determination module configured to determine that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the above drunk driving detection method based on voiceprint recognition when executing the computer-readable instructions.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the above drunk driving detection method based on voiceprint recognition.
Details of one or more embodiments of this application are set forth in the drawings and the description below; other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 2 is a flowchart of a drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 3 is another flowchart of the drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 4 is another flowchart of the drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 5 is another flowchart of the drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 6 is another flowchart of the drunk driving detection method based on voiceprint recognition according to an embodiment of this application;
FIG. 7 is a schematic block diagram of a drunk driving detection apparatus based on voiceprint recognition according to an embodiment of this application;
FIG. 8 is another schematic block diagram of the drunk driving detection apparatus based on voiceprint recognition according to an embodiment of this application;
FIG. 9 is a schematic block diagram of the test feature value acquisition module in the drunk driving detection apparatus based on voiceprint recognition according to an embodiment of this application;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The drunk driving detection method based on voiceprint recognition provided by this application can be applied in the application environment of FIG. 1, in which a client communicates with a server over a network. The server acquires a test sound sample through the client and performs preset feature value analysis on the test sound sample to obtain a test feature value; the server then acquires standard feature values and constructs a first confidence interval according to the standard feature values; if the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample, and the determination result is output to the client. The client may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a drunk driving detection method based on voiceprint recognition is provided. Taking the application of the method to the server in FIG. 1 as an example, it includes the following steps:
S10: Acquire a test sound sample, and perform preset feature value analysis on the test sound sample to obtain a test feature value.
Here, the test sound sample is a sound sample of the driver acquired on site, which may be acquired by the client and sent to the server. Optionally, the client includes a test sound sample acquisition module, for example a recording device. During on-site acquisition, a traffic-police officer can talk with the driver; when the driver speaks, the officer turns on the client's sound acquisition module and records the driver's speech as the test sound sample. Further, the client can also apply voiceprint recognition to the recorded sound and, based on the recognition result, automatically select the driver's voice as the test sound sample, reducing the influence of environmental noise or other people's voices on drunk driving detection.
Here, the preset feature value is a feature value that can reflect whether the behavior corresponding to a sound sample is drunk driving. It can be understood that after a person drinks, the throat is stimulated by alcohol and changes to a certain degree; these throat changes can be captured by performing voiceprint recognition on the test sound sample and detecting the preset feature value. Optionally, the preset feature value may be at least one of a frequency perturbation (jitter) value, an amplitude perturbation (shimmer) value, or a normalized noise energy (NNE) value, or another acoustic indicator that can reflect throat changes; this embodiment imposes no specific limitation.
Specifically, the server acquires the driver's voice through the client as the test sound sample, then performs preset feature value analysis on the test sound sample through a connected voiceprint recognition device, and uses the obtained preset feature value of the test sound sample as the test feature value. For example, the server sends the test sound sample acquired through the client to the voiceprint recognition device connected to the server for preset feature value analysis, thereby obtaining the test feature value. Optionally, the voiceprint recognition device may be an electroglottograph, which can detect feature values such as the fundamental frequency of vocal-fold vibration, the closed quotient, the frequency perturbation value, the amplitude perturbation value, the normalized noise energy value, the degree of vocal-fold abduction, and changes in larynx height. During preset feature value analysis, the server inputs the test sound sample into the electroglottograph, and the values corresponding to the preset feature values are obtained as the test feature values. Optionally, the voiceprint recognition device may also be integrated into the client: after acquiring the test sound sample, the client performs preset feature value analysis with the voiceprint recognition device to obtain the test feature value and then sends the test feature value to the server.
S20: Acquire standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer.
Here, a standard sound sample is a sound sample of a person who has reached the drunk driving standard. Optionally, recordings of people confirmed to have reached the drunk driving standard by breath testing or blood testing can be acquired and used as standard sound samples. People who have drunk to different degrees can also be invited to make recordings, so that the standard sound samples for driving after drinking and drunk driving can be further subdivided. There are N standard sound samples; it can be understood that the larger N is, i.e. the more standard sound samples there are, the more accurate the drunk driving detection method based on voiceprint recognition is. Optionally, a large number of standard sound samples can be collected in advance and stored in the server's database as the preset standard sound samples.
Specifically, the server acquires the sound samples of N people who have reached the drunk driving standard as the standard sound samples, then performs preset feature value analysis on these N standard sound samples, and uses the feature values obtained from the analysis as the standard feature values.
S30: Construct a first confidence interval according to the standard feature values.
Here, a confidence interval (CI) is an interval estimate of an unknown parameter of the distribution of the population from which the samples are drawn. Compared with a point estimate, which estimates the parameter with a single sample statistic, a confidence interval also conveys information about the precision of the estimate. It can be understood that constructing a confidence interval can improve the accuracy of judging whether the test sound sample is a drunk driving sample.
Specifically, the server may first obtain the variance of the standard feature values of the N preset standard sound samples and then construct the first confidence interval according to the obtained variance, so that whether the test sound sample is a drunk driving sample can be judged from the first confidence interval.
It can be understood that the server acquires the standard feature values after acquiring the test sound sample and constructs the first confidence interval from the standard feature values to judge whether the test sound sample is a drunk driving sample. When the standard sound samples change, for example when their number grows, the server can derive a new first confidence interval from the changed standard sound samples and judge the test sound sample against the new first confidence interval, which facilitates updating and upgrading the drunk driving detection method based on voiceprint recognition.
In a specific embodiment, the server may instead first obtain the average of the standard feature values of the N standard sound samples and then set a preset drunk driving threshold according to the obtained average: if the test feature value is greater than or equal to the preset drunk driving threshold, the test sound sample is determined to be a drunk driving sample; if the test feature value is less than the preset drunk driving threshold, the test sound sample is determined to be a non-drunk-driving sample. For example, if the preset drunk driving threshold for the frequency perturbation value is set to A according to the average frequency perturbation value of the standard feature values, and the frequency perturbation value of the test feature value is greater than A, the test sound sample is determined to be a drunk driving sample. Optionally, the preset drunk driving threshold can be obtained by adding an empirical value to, or subtracting it from, the average of the standard feature values.
S40: If the test feature value does not belong to the first confidence interval, determine that the test sound sample is a drunk driving sample.
Specifically, if the test feature value does not belong to the first confidence interval, the server determines that the test sound sample is a drunk driving sample and that the corresponding driver is in a drunk driving state, and sends the determination result to the client; if the test feature value belongs to the first confidence interval, the server determines that the test sound sample is a non-drunk-driving sample and that the corresponding driver is not in a drunk driving state, and sends the determination result to the client.
It should be understood that since the preset feature value may include multiple detection items, different first confidence intervals can be constructed for the respective detection items. Optionally, the server can be set to determine that the test sound sample is a drunk driving sample when every test feature value satisfies the criterion for the first confidence interval corresponding to its detection item, or to determine that the test sound sample is a drunk driving sample when a preset subset of the test feature values satisfies the criterion for the corresponding first confidence intervals, with the remaining test feature values serving as reference data.
In the embodiment corresponding to FIG. 2, a test sound sample is acquired and preset feature value analysis is performed on it to obtain a test feature value; standard feature values are then acquired and a first confidence interval is constructed from them; if the test feature value does not belong to the first confidence interval, the test sound sample is determined to be a drunk driving sample. By acquiring the driver's voice and comparing it against the first confidence interval obtained from the standard sound samples, whether the driver is driving drunk can be judged, avoiding the long detection times caused by drivers stalling or refusing to cooperate with breath testing, and improving the efficiency of drunk driving detection.
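The threshold-based alternative described in the specific embodiment above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; the function names and the `margin` parameter (standing in for the empirical adjustment value the text mentions) are assumptions.

```python
def preset_threshold(standard_values, margin=0.0):
    """Derive a preset drunk-driving threshold from the average of the
    N standard feature values, adjusted by an empirical margin."""
    average = sum(standard_values) / len(standard_values)
    return average + margin

def is_drunk_by_threshold(test_value, threshold):
    """A test feature value at or above the threshold marks the test
    sound sample as a drunk-driving sample in this alternative."""
    return test_value >= threshold
```

For example, with jitter-like standard values [1.0, 1.2, 0.8], the threshold is their mean (1.0) plus whatever empirical margin is chosen.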
In one embodiment, as shown in FIG. 3, the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value. Before step S20, i.e. before the step of acquiring standard feature values, the drunk driving detection method based on voiceprint recognition provided in this embodiment further includes the following steps:
S51: Acquire the standard sound samples.
The method by which the server acquires the standard sound samples is the same as in step S20 and is not repeated here.
S52: Perform voiceprint recognition on each standard sound sample to obtain the preset feature value of each standard sound sample.
Here, voiceprint recognition means sending the standard sound sample to a voiceprint recognition device for preset feature value analysis, the preset feature value being at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value. Optionally, the voiceprint recognition device is an electroglottograph, but it may also be another voiceprint recognition device; no specific limitation is imposed here.
The frequency perturbation (jitter) value is a physical quantity describing the change in fundamental frequency between adjacent cycles of a sound wave; it mainly reflects the degree of roughness of the voice and also reflects the degree of hoarseness. The jitter value of a voice signal is consistent with the functional state of the glottal region: under normal conditions, most cycles of the voice share the same frequency and few differ, so the jitter value is small; after a person drinks, the throat is stimulated by alcohol, the functional state of the glottal region changes, the voice becomes rough, and the jitter value increases.
The amplitude perturbation (shimmer) value describes the change in sound-wave amplitude between adjacent cycles and mainly reflects the degree of hoarseness. After a person drinks, the throat is stimulated by alcohol, the functional state of the glottal region changes, the voice becomes hoarse, and the shimmer value increases. It can be understood that both the jitter value and the shimmer value reflect the stability of vocal-fold vibration; the larger the value, the larger the small variations in the acoustic signal during phonation.
The normalized noise energy (NNE) value measures the energy of glottal noise caused by incomplete glottal closure during phonation. It mainly reflects the degree of breathiness, and also reflects the degree of hoarseness and of glottal closure. After a person drinks, the throat is stimulated by alcohol, the functional state of the glottal region changes, breathiness in the voice increases, and the NNE value increases.
Specifically, the server performs voiceprint recognition on each of the N standard sound samples, analyzing at least one of the jitter value, shimmer value, or NNE value as the preset feature value, and obtains at least one of the jitter value, shimmer value, or NNE value of each standard sound sample.
S53: Use the preset feature value of each standard sound sample as the standard feature value of the corresponding standard sound sample.
Specifically, the server saves at least one of the jitter value, shimmer value, or NNE value obtained for each standard sound sample in the server's database as the standard feature value of the corresponding standard sound sample. It can be understood that when the standard feature value includes all three of the jitter value, shimmer value, and NNE value, it better reflects the voice characteristics of drunk drivers, and the accuracy of the drunk driving detection method based on voiceprint recognition is higher.
In the embodiment corresponding to FIG. 3, the standard sound samples are acquired, voiceprint recognition is performed on each standard sound sample to obtain its preset feature value, and finally the preset feature value of each standard sound sample is used as the standard feature value of the corresponding standard sound sample. Analyzing at least one of the frequency perturbation value, amplitude perturbation value, or normalized noise energy value of the standard sound samples provides data support and a basis for judging whether the test sound sample is a drunk driving sample. In addition, since these values reflect the voice characteristics of drunk drivers well, the accuracy of judging whether a driver is driving drunk can be improved.
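As a rough illustration of the perturbation measures described above, the following sketch computes local jitter and shimmer percentages from sequences of pitch-period durations and cycle peak amplitudes. This is one common textbook formulation (relative mean absolute difference between consecutive cycles), not the electroglottograph's actual algorithm, and the function names are assumptions.

```python
def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, relative to the mean period, in percent."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_period = sum(periods) / len(periods)
    return (sum(diffs) / len(diffs)) / mean_period * 100

def shimmer_percent(amplitudes):
    """Local shimmer: the analogous perturbation measure over
    cycle peak amplitudes, in percent."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_amp = sum(amplitudes) / len(amplitudes)
    return (sum(diffs) / len(diffs)) / mean_amp * 100
```

A perfectly periodic voice yields 0%, and an unstable (rough or hoarse) one yields a larger value, matching the behavior described in the text.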
In one embodiment, as shown in FIG. 4, step S10, i.e. acquiring a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value, may specifically include the following steps:
S11: Acquire the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition.
It can be understood that, to obtain a better voiceprint recognition result, at least one of the utterance duration, the pronunciation phoneme, or the environmental conditions can be specified when acquiring the test sound sample. Optionally, the preset pronunciation phoneme may be a vowel. Vowels, such as a, u, and o, are a class of phonemes, as opposed to consonants; a vowel is a sound produced during phonation by airflow passing through the oral cavity without obstruction, and different vowels are created by different shapes of the oral cavity. It can be understood that vowels better reflect the characteristics of a voice. Optionally, the preset utterance duration may be 3 to 5 seconds, so that the sound can be recorded completely and the preset feature values of the sound can be fully reflected. Optionally, the preset environmental condition may be keeping the environmental noise below 45 dB SPL. Since on-site testing is usually affected by on-site noise, the preset environmental condition can be realized by taking the driver to a suitable environment (for example, a police vehicle or a police office), or further confirmation under the preset environmental condition can be performed after a drunk driving determination based on the preset pronunciation phoneme and/or the preset utterance duration.
Specifically, the server acquires the test sound sample under at least one of the preset pronunciation phoneme, the preset utterance duration, and the preset environmental condition. It can be understood that the more conditions are satisfied, the higher the accuracy of drunk driving detection.
S12: Perform voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample.
Here, voiceprint recognition means sending the test sound sample to a voiceprint recognition device for preset feature value analysis, the preset feature value being at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value. Optionally, the voiceprint recognition device is an electroglottograph, but it may also be another voiceprint recognition device; no specific limitation is imposed here.
Specifically, the server analyzes at least one of the jitter value, shimmer value, or NNE value of the acquired test sound sample, obtaining at least one of the jitter value, shimmer value, or NNE value of the test sound sample.
S13: Use the preset feature value of the test sound sample as the test feature value.
Specifically, the server uses at least one of the obtained jitter value, shimmer value, or NNE value of the test sound sample as the test feature value and saves it in the server's database. Optionally, the standard feature values correspond to the jitter, shimmer, and NNE values contained in the test feature values: for example, if the standard feature values include the jitter value and the shimmer value, the test feature values also include the jitter value and the shimmer value.
In the embodiment corresponding to FIG. 4, the test sound sample is acquired under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition; voiceprint recognition is then performed on the test sound sample to obtain its preset feature value; finally, the preset feature value of the test sound sample is used as the test feature value. Standardizing the pronunciation phoneme, utterance duration, or environmental conditions when acquiring the test sound sample makes the acquired sample better reflect the preset feature values, thereby improving the accuracy of judging whether the test sound sample is a drunk driving sample.
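The acquisition conditions of step S11 can be expressed as a simple validity check. The 3 to 5 second duration and the 45 dB SPL bound come from the optional values in this embodiment; treating them as hard defaults, and the function name itself, are assumptions for illustration.

```python
def sample_meets_conditions(duration_s, noise_db_spl,
                            min_s=3.0, max_s=5.0, max_noise_db=45.0):
    """Return True when a recording satisfies the preset utterance
    duration (3-5 s) and the preset environmental condition
    (ambient noise at or below 45 dB SPL)."""
    return min_s <= duration_s <= max_s and noise_db_spl <= max_noise_db
```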
In one embodiment, as shown in FIG. 5, step S30, i.e. constructing a first confidence interval according to the standard feature values, may specifically include the following steps:
S31: Obtain the variance of the N standard feature values.
Specifically, if the N standard feature values are denoted (x_1, x_2, x_3, ..., x_N), the server can calculate the variance of the standard feature values corresponding to the N standard sound samples with the following formula:
σ² = (1/N) · Σ_{i=1}^{N} (x_i − μ)²
where x_i denotes a standard feature value and μ is the average of (x_1, x_2, x_3, ..., x_N).
S32: Construct the first confidence interval according to the variance of the N standard feature values.
Specifically, the server can select a significance level according to the client's input and construct the first confidence interval according to the selected significance level. The significance level is the probability of making an error when estimating that the population parameter falls within a given interval, i.e. the probability of error when constructing a confidence interval at that significance level. For example, suppose the constructed first confidence interval is B: if the test feature value does not fall within B, the server determines that the test sound sample is a non-drunk-driving sample; if the test sound sample is in fact a drunk driving sample, an error has been made. Optionally, the significance level may be 0.05 or 0.01. Taking a significance level of 0.05 as an example, the confidence level is correspondingly 95%, and the first confidence interval is constructed as:
(μ − 1.96·σ/√N, μ + 1.96·σ/√N)
that is, the first confidence interval is constructed with
1.96·σ/√N
as the error range.
Specifically, the server compares the test feature value of the test sound sample with the constructed first confidence interval. If the test feature value falls within the constructed first confidence interval, the test sound sample is determined to be a non-drunk-driving sample, and the non-drunk-driving determination result is sent to the client; if the test feature value falls outside the constructed first confidence interval, the test sound sample is determined to be a drunk driving sample, and the drunk driving determination result is sent to the client.
In the embodiment corresponding to FIG. 5, obtaining the variance of the N standard feature values and constructing the first confidence interval from that variance provides a basis for determining that the test sound sample is a drunk driving sample. In addition, implementing drunk driving sample recognition by constructing a confidence interval improves the accuracy of drunk driving detection.
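Steps S31 and S32 can be sketched as follows, assuming the 95% interval takes the standard mean ± 1.96·σ/√N form; the constant 1.96 mirrors the 0.05 significance level used as an example above, and the function names are illustrative rather than the patent's implementation.

```python
import math

def build_first_interval(standard_values, z=1.96):
    """Compute the mean and (population) variance of the N standard
    feature values and return the confidence interval around the mean."""
    n = len(standard_values)
    mu = sum(standard_values) / n
    variance = sum((x - mu) ** 2 for x in standard_values) / n
    half_width = z * math.sqrt(variance) / math.sqrt(n)
    return (mu - half_width, mu + half_width)

def is_drunk_driving_sample(test_value, interval):
    """Step S40: a test feature value outside the interval marks the
    test sound sample as a drunk-driving sample."""
    low, high = interval
    return not (low <= test_value <= high)
```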
In one embodiment, as shown in FIG. 6, after step S40, i.e. after the step of determining that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval, the drunk driving detection method based on voiceprint recognition provided in this embodiment further includes the following steps:
S61: If the data difference between the test feature value and the N standard feature values is less than the preset significance level, save the corresponding drunk driving sample to the database as a new standard sound sample.
Here, the data difference between the test feature value and the standard feature values can be measured by a significance level. In this embodiment, the preset significance level can be set according to actual needs, for example 0.05 or 0.01. It can be understood that different preset significance levels lead to different constructed confidence intervals. Since the probability of error when the server saves a test sound sample as a drunk driving standard sample must be kept lower, the preset significance level can be set to 0.01; that is, the significance level is 0.01 when the server saves a drunk driving sample in the database as a new standard sound sample, while the significance level for determining that a test sound sample is a drunk driving sample is 0.05.
Specifically, if the data difference between the test feature value and the N standard feature values is less than the preset significance level, the server can save the corresponding drunk driving sample (the test sound sample) in the server's database as a new standard sound sample. It can be understood that the more standard sound samples there are in the database, the more accurate the drunk driving detection method based on voiceprint recognition is.
S62: Merge the new standard sound sample into the N standard sound samples to obtain updated standard sound samples.
Specifically, the server merges the new standard sound sample into the original N standard sound samples to obtain the updated standard sound samples. It can be understood that there may be multiple new standard sound samples, i.e. n drunk driving samples may be converted into new standard sound samples at the same time; for example, if there are n new standard sound samples, the server's database contains N + n standard sound samples after the addition. Optionally, the server may also update the standard sound samples at a preset time interval, such as one day, one week, or one month; no specific limitation is imposed.
S63: Obtain the variance of the updated standard sound samples, and construct a second confidence interval according to the variance of the updated standard sound samples.
Specifically, the server can obtain the variance of the updated standard sound samples by the method of step S31 and then reconstruct the confidence interval from that variance to obtain a second confidence interval. When a new test sound sample needs to be judged, the newly constructed second confidence interval is used: if the test feature value corresponding to the new test sound sample does not belong to the second confidence interval, the server determines that the new test sound sample is a drunk driving sample. It can be understood that continuously expanding the standard sound samples in the server's database makes the drunk driving detection method based on voiceprint recognition more accurate.
In the embodiment corresponding to FIG. 6, if the data difference between the test feature value and the N standard feature values is less than the preset significance level, the corresponding drunk driving sample is saved to the database as a new standard sound sample; the new standard sound sample is then merged into the N standard sound samples to obtain updated standard sound samples; finally, the variance of the updated standard sound samples is obtained and a second confidence interval is constructed from it. Merging drunk driving samples below the preset significance level into the standard sound samples expands the number of standard sound samples, and judging whether a test sound sample is a drunk driving sample against the confidence interval derived from the expanded standard sound samples continuously improves the accuracy of drunk driving detection.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
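The update procedure of steps S61 to S63 can be sketched as follows, reusing the variance-based interval construction; the 1.96 constant and the function names are assumptions, and the data-difference test against the preset significance level is left to the caller, since the description leaves its exact computation open.

```python
import math

def merge_and_rebuild(standards, confirmed_samples, z=1.96):
    """Merge confirmed drunk-driving feature values into the standard
    set (S61-S62), then rebuild the interval from the updated
    variance (S63) and return both."""
    updated = standards + confirmed_samples
    n = len(updated)
    mu = sum(updated) / n
    variance = sum((x - mu) ** 2 for x in updated) / n
    half_width = z * math.sqrt(variance) / math.sqrt(n)
    return updated, (mu - half_width, mu + half_width)
```

New test feature values would then be judged against the returned second interval instead of the first one.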
In one embodiment, a drunk driving detection apparatus based on voiceprint recognition is provided, corresponding one-to-one to the drunk driving detection method based on voiceprint recognition in the above embodiments. As shown in FIG. 7, the apparatus includes a test feature value acquisition module 10, a standard feature value acquisition module 20, a first confidence interval construction module 30, and a drunk driving sample determination module 40. The functional modules are described in detail as follows:
the test feature value acquisition module 10 is configured to acquire a test sound sample and perform preset feature value analysis on the test sound sample to obtain a test feature value;
the standard feature value acquisition module 20 is configured to acquire standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
the first confidence interval construction module 30 is configured to construct a first confidence interval according to the standard feature values;
the drunk driving sample determination module 40 is configured to determine that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval.
Further, the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value. As shown in FIG. 8, the drunk driving detection apparatus based on voiceprint recognition provided in this embodiment further includes a preset feature value analysis module 50, which includes a standard sample acquisition unit 51, a first voiceprint recognition unit 52, and a standard feature value determination unit 53:
the standard sample acquisition unit 51 is configured to acquire the standard sound samples;
the first voiceprint recognition unit 52 is configured to perform voiceprint recognition on each standard sound sample to obtain the preset feature value of each standard sound sample;
the standard feature value determination unit 53 is configured to use the preset feature value of each standard sound sample as the standard feature value of the corresponding standard sound sample.
Further, as shown in FIG. 9, the test feature value acquisition module 10 further includes a test sample acquisition unit 11, a second voiceprint recognition unit 12, and a test feature value determination unit 13:
the test sample acquisition unit 11 is configured to acquire the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition;
the second voiceprint recognition unit 12 is configured to perform voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample;
the test feature value determination unit 13 is configured to use the preset feature value of the test sound sample as the test feature value.
Further, the first confidence interval construction module 30 is also configured to:
obtain the variance of the N standard feature values; and
construct the first confidence interval according to the variance of the N standard feature values.
Further, the drunk driving detection apparatus based on voiceprint recognition provided in this embodiment further includes a second confidence interval construction module, which is specifically configured to:
if the data difference between the test feature value and the N standard feature values is less than the preset significance level, save the corresponding drunk driving sample to the database as a new standard sound sample;
merge the new standard sound sample into the N standard sound samples to obtain updated standard sound samples; and
obtain the variance of the updated standard sound samples, and construct a second confidence interval according to the variance of the updated standard sound samples.
For specific limitations on the drunk driving detection apparatus based on voiceprint recognition, reference may be made to the limitations on the drunk driving detection method based on voiceprint recognition above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor in the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores test sound samples, standard sound samples, test feature values, standard feature values, drunk driving samples, and the like. The network interface of the computer device communicates with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a drunk driving detection method based on voiceprint recognition.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
acquiring standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
constructing a first confidence interval according to the standard feature values; and
if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
In one embodiment, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
acquiring standard feature values, the standard feature values being obtained by performing preset feature value analysis on N preset standard sound samples, where N is a positive integer;
constructing a first confidence interval according to the standard feature values; and
if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features therein; these modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A drunk driving detection method based on voiceprint recognition, characterized by comprising:
    acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
    acquiring standard feature values, wherein the standard feature values are obtained by performing preset feature value analysis on N preset standard sound samples, N being a positive integer;
    constructing a first confidence interval according to the standard feature values; and
    if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
  2. The drunk driving detection method based on voiceprint recognition according to claim 1, characterized in that the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value;
    before the step of acquiring standard feature values, the drunk driving detection method based on voiceprint recognition further comprises:
    acquiring the standard sound samples;
    performing voiceprint recognition on each of the standard sound samples to obtain the preset feature value of each of the standard sound samples; and
    using the preset feature value of each of the standard sound samples as the standard feature value of the corresponding standard sound sample.
  3. The drunk driving detection method based on voiceprint recognition according to claim 2, characterized in that the acquiring a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value comprises:
    acquiring the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition;
    performing voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample; and
    using the preset feature value of the test sound sample as the test feature value.
  4. The drunk driving detection method based on voiceprint recognition according to claim 3, characterized in that the constructing a first confidence interval according to the standard feature values comprises:
    obtaining the variance of the N standard feature values; and
    constructing the first confidence interval according to the variance of the N standard feature values.
  5. The drunk driving detection method based on voiceprint recognition according to claim 4, characterized in that, after the step of determining that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval, the drunk driving detection method based on voiceprint recognition further comprises:
    if the data difference between the test feature value and the N standard feature values is less than a preset significance level, saving the corresponding drunk driving sample to a database as a new standard sound sample;
    merging the new standard sound sample into the N standard sound samples to obtain updated standard sound samples; and
    obtaining the variance of the updated standard sound samples, and constructing a second confidence interval according to the variance of the updated standard sound samples.
  6. A drunk driving detection apparatus based on voiceprint recognition, characterized by comprising:
    a test feature value acquisition module configured to acquire a test sound sample and perform preset feature value analysis on the test sound sample to obtain a test feature value;
    a standard feature value acquisition module configured to acquire standard feature values, wherein the standard feature values are obtained by performing preset feature value analysis on N preset standard sound samples, N being a positive integer;
    a first confidence interval construction module configured to construct a first confidence interval according to the standard feature values; and
    a drunk driving sample determination module configured to determine that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval.
  7. The drunk driving detection apparatus based on voiceprint recognition according to claim 6, characterized in that the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value;
    the drunk driving detection apparatus based on voiceprint recognition further comprises a preset feature value analysis module, the preset feature value analysis module comprising a standard sample acquisition unit, a first voiceprint recognition unit, and a standard feature value determination unit;
    the standard sample acquisition unit is configured to acquire the standard sound samples;
    the first voiceprint recognition unit is configured to perform voiceprint recognition on each of the standard sound samples to obtain the preset feature value of each of the standard sound samples; and
    the standard feature value determination unit is configured to use the preset feature value of each of the standard sound samples as the standard feature value of the corresponding standard sound sample.
  8. The drunk driving detection apparatus based on voiceprint recognition according to claim 7, characterized in that the test feature value acquisition module comprises a test sample acquisition unit, a second voiceprint recognition unit, and a test feature value determination unit;
    the test sample acquisition unit is configured to acquire the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition;
    the second voiceprint recognition unit is configured to perform voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample; and
    the test feature value determination unit is configured to use the preset feature value of the test sound sample as the test feature value.
  9. The drunk driving detection apparatus based on voiceprint recognition according to claim 8, characterized in that the first confidence interval construction module is further configured to:
    obtain the variance of the N standard feature values; and
    construct the first confidence interval according to the variance of the N standard feature values.
  10. The drunk driving detection apparatus based on voiceprint recognition according to claim 9, characterized by further comprising a second confidence interval construction module, the second confidence interval construction module being configured to:
    if the data difference between the test feature value and the N standard feature values is less than a preset significance level, save the corresponding drunk driving sample to a database as a new standard sound sample;
    merge the new standard sound sample into the N standard sound samples to obtain updated standard sound samples; and
    obtain the variance of the updated standard sound samples, and construct a second confidence interval according to the variance of the updated standard sound samples.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
    acquiring standard feature values, wherein the standard feature values are obtained by performing preset feature value analysis on N preset standard sound samples, N being a positive integer;
    constructing a first confidence interval according to the standard feature values; and
    if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
  12. The computer device according to claim 11, characterized in that the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value;
    before the step of acquiring standard feature values, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring the standard sound samples;
    performing voiceprint recognition on each of the standard sound samples to obtain the preset feature value of each of the standard sound samples; and
    using the preset feature value of each of the standard sound samples as the standard feature value of the corresponding standard sound sample.
  13. The computer device according to claim 12, characterized in that the acquiring a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value comprises:
    acquiring the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition;
    performing voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample; and
    using the preset feature value of the test sound sample as the test feature value.
  14. The computer device according to claim 13, characterized in that the constructing a first confidence interval according to the standard feature values comprises:
    obtaining the variance of the N standard feature values; and
    constructing the first confidence interval according to the variance of the N standard feature values.
  15. The computer device according to claim 14, characterized in that, after the step of determining that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval, the processor further implements the following steps when executing the computer-readable instructions:
    if the data difference between the test feature value and the N standard feature values is less than a preset significance level, saving the corresponding drunk driving sample to a database as a new standard sound sample;
    merging the new standard sound sample into the N standard sound samples to obtain updated standard sound samples; and
    obtaining the variance of the updated standard sound samples, and constructing a second confidence interval according to the variance of the updated standard sound samples.
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
    acquiring a test sound sample, and performing preset feature value analysis on the test sound sample to obtain a test feature value;
    acquiring standard feature values, wherein the standard feature values are obtained by performing preset feature value analysis on N preset standard sound samples, N being a positive integer;
    constructing a first confidence interval according to the standard feature values; and
    if the test feature value does not belong to the first confidence interval, determining that the test sound sample is a drunk driving sample.
  17. The non-volatile computer-readable storage media according to claim 16, characterized in that the preset feature value is at least one of a frequency perturbation value, an amplitude perturbation value, and a normalized noise energy value;
    before the step of acquiring standard feature values, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps:
    acquiring the standard sound samples;
    performing voiceprint recognition on each of the standard sound samples to obtain the preset feature value of each of the standard sound samples; and
    using the preset feature value of each of the standard sound samples as the standard feature value of the corresponding standard sound sample.
  18. The non-volatile computer-readable storage media according to claim 17, characterized in that the acquiring a test sound sample and performing preset feature value analysis on the test sound sample to obtain a test feature value comprises:
    acquiring the test sound sample under at least one of a preset pronunciation phoneme, a preset utterance duration, or a preset environmental condition;
    performing voiceprint recognition on the test sound sample to obtain the preset feature value of the test sound sample; and
    using the preset feature value of the test sound sample as the test feature value.
  19. The non-volatile computer-readable storage media according to claim 18, characterized in that the constructing a first confidence interval according to the standard feature values comprises:
    obtaining the variance of the N standard feature values; and
    constructing the first confidence interval according to the variance of the N standard feature values.
  20. The non-volatile computer-readable storage media according to claim 19, characterized in that, after the step of determining that the test sound sample is a drunk driving sample if the test feature value does not belong to the first confidence interval, when the computer-readable instructions are executed by the one or more processors, the one or more processors further perform the following steps:
    if the data difference between the test feature value and the N standard feature values is less than a preset significance level, saving the corresponding drunk driving sample to a database as a new standard sound sample;
    merging the new standard sound sample into the N standard sound samples to obtain updated standard sound samples; and
    obtaining the variance of the updated standard sound samples, and constructing a second confidence interval according to the variance of the updated standard sound samples.
PCT/CN2019/089161 2019-01-04 2019-05-30 Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition WO2020140376A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007825.9 2019-01-04
CN201910007825.9A CN109599121A (zh) 2019-01-04 2019-04-09 Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition

Publications (1)

Publication Number Publication Date
WO2020140376A1 true WO2020140376A1 (zh) 2020-07-09

Family

ID=65964943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089161 WO2020140376A1 (zh) 2019-01-04 2019-05-30 Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition

Country Status (2)

Country Link
CN (1) CN109599121A (zh)
WO (1) WO2020140376A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109599121A (zh) * 2019-01-04 2019-04-09 平安科技(深圳)有限公司 基于声纹识别的酒驾检测方法、装置、设备及存储介质
CN113590868A (zh) * 2021-06-11 2021-11-02 深圳供电局有限公司 声音阈值的更新方法、装置、计算机设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887722A (zh) * 2009-06-18 2010-11-17 博石金(北京)信息技术有限公司 Rapid voiceprint authentication method
CN102194134A (zh) * 2010-03-01 2011-09-21 中国科学院自动化研究所 Method for predicting biometric recognition performance indicators based on statistical learning
WO2014114048A1 (zh) * 2013-01-24 2014-07-31 华为终端有限公司 Voice recognition method and apparatus
CN107424614A (zh) * 2017-07-17 2017-12-01 广东讯飞启明科技发展有限公司 Voiceprint model updating method
CN109102825A (zh) * 2018-07-27 2018-12-28 科大讯飞股份有限公司 Drinking state detection method and apparatus
CN109599121A (zh) * 2019-01-04 2019-04-09 平安科技(深圳)有限公司 Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457753B2 (en) * 2005-06-29 2008-11-25 University College Dublin National University Of Ireland Telephone pathology assessment
US8560316B2 (en) * 2006-12-19 2013-10-15 Robert Vogt Confidence levels for speaker recognition
JP2009096327A (ja) * 2007-10-17 2009-05-07 Xanavi Informatics Corp Drinking state detection device
US9336780B2 (en) * 2011-06-20 2016-05-10 Agnitio, S.L. Identification of a local speaker
CN104505102A (zh) * 2014-12-31 2015-04-08 宇龙计算机通信科技(深圳)有限公司 身体状况检测的方法及装置
CN108269574B (zh) * 2017-12-29 2021-05-25 安徽科大讯飞医疗信息技术有限公司 语音信号处理以表示用户声带状态的方法及装置、存储介质、电子设备


Also Published As

Publication number Publication date
CN109599121A (zh) 2019-04-09

Similar Documents

Publication Publication Date Title
AU2016216737B2 (en) Voice Authentication and Speech Recognition System
US9858917B1 (en) Adapting enhanced acoustic models
US9940935B2 (en) Method and device for voiceprint recognition
US9589564B2 (en) Multiple speech locale-specific hotword classifiers for selection of a speech locale
US10002613B2 (en) Determining hotword suitability
US20160372116A1 (en) Voice authentication and speech recognition system and method
US8660844B2 (en) System and method of evaluating user simulations in a spoken dialog system with a diversion metric
KR20050098839A (ko) Intermediate processor for voice processing in a network environment
CN105940407A (zh) 用于评估音频口令的强度的系统和方法
AU2013203139A1 (en) Voice authentication and speech recognition system and method
WO2014114116A1 (en) Method and system for voiceprint recognition
US8447603B2 (en) Rating speech naturalness of speech utterances based on a plurality of human testers
WO2020140376A1 (zh) Drunk driving detection method, apparatus, device, and storage medium based on voiceprint recognition
CN110459242A (zh) Voice change detection method, terminal, and computer-readable storage medium
KR20190068830A (ko) Apparatus and method for determining recommendation reliability based on a vehicle's environment
KR20230116886A (ko) Self-supervised speech representations for fake audio detection
CN116114015A (zh) 用于语音使能设备的混沌测试
KR102113879B1 (ko) Speaker voice recognition method using a reference database, and apparatus therefor
US11831644B1 (en) Anomaly detection in workspaces
JP4864783B2 (ja) Pattern matching device, pattern matching program, and pattern matching method
KR101892736B1 (ko) Utterance verification apparatus and method using real-time per-word duration modeling
US11024302B2 (en) Quality feedback on user-recorded keywords for automatic speech recognition systems
WO2020073839A1 (zh) Voice wake-up method, apparatus and system, and electronic device
US20230335114A1 (en) Evaluating reliability of audio data for use in speaker identification
CN113168438A (zh) User authentication method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19907116; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19907116; Country of ref document: EP; Kind code of ref document: A1)