CN109273003B - Voice control method and system for automobile data recorder - Google Patents

Voice control method and system for automobile data recorder

Publication number
CN109273003B
CN109273003B (application CN201811380932.8A)
Authority
CN
China
Prior art keywords
control command
audio
command word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811380932.8A
Other languages
Chinese (zh)
Other versions
CN109273003A (en)
Inventor
白生炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN201811380932.8A priority Critical patent/CN109273003B/en
Publication of CN109273003A publication Critical patent/CN109273003A/en
Application granted granted Critical
Publication of CN109273003B publication Critical patent/CN109273003B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Abstract

An embodiment of the invention provides a voice control method for a vehicle event data recorder. The method comprises the following steps: collecting sound in the vehicle in real time and generating corresponding audio; extracting Fbank features from the audio, analyzing the Fbank features through a built-in neural network, and determining the posterior probability that the audio hits each command word within the control command words; filtering the posterior probability of each command word to determine the joint probability that the audio hits each control command word; taking the control command word with the highest joint probability as the effective control command word; and obtaining a preset recognition threshold, and when the joint probability of the effective control command word reaches the preset recognition threshold, associating the audio with the effective control command word and executing the corresponding operation. An embodiment of the invention also provides a voice control system for the automobile data recorder. Because only Fbank features are extracted from the collected audio, the amount of computation is reduced, and because no decoding is performed, memory and hardware resources are saved.

Description

Voice control method and system for automobile data recorder
Technical Field
The invention relates to the field of intelligent voice, in particular to a voice control method and system for a vehicle event data recorder.
Background
A driving recorder is an instrument that records images and sound while a vehicle is being driven. Once installed, it can record video, images and sound over the whole driving process and provide evidence in traffic accidents. With the development of voice technology, control of the driving recorder is gradually moving from touch screens and buttons to voice. Voice control frees both of the driver's hands and keeps the driver's attention from being diverted, which is safer.
A voice-controllable driving recorder usually adopts real-time speech decoding and recognition: it recognizes and decodes the collected audio, outputs the text corresponding to what the driver said, compares that text against the command words, confirms the recognition result, and then executes the corresponding control instruction.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
the recognition-and-decoding approach is limited by the algorithmic characteristics of the decoder: the model resources actually used are large, and the computation and storage requirements are high, so it must run on devices with strong processing performance and large storage, which raises cost. In addition, decoding is affected by in-vehicle noise and wind noise, which reduces recognition accuracy and harms the experience. Some driving recorders do not perform local speech recognition at all and simply send the received audio over a wireless network to a neural network in the cloud for recognition; the recorder then receives the specific instruction fed back by the cloud and acts on it. However, this method requires an unobstructed wireless network: if the network is delayed or absent, voice control cannot be achieved.
Disclosure of Invention
The embodiments of the invention aim to at least solve the following problems in the prior art: owing to the characteristics of the recognition-and-decoding method, the driving recorder must have large storage and strong processing performance, which raises cost; recognition decoding is disturbed by noise, so the accuracy of the recognition result is low; and voice control cannot be achieved when the network is delayed or absent.
In a first aspect, an embodiment of the present invention provides a voice control method for a vehicle event data recorder, including:
collecting sound in the vehicle in real time and generating corresponding audio;
extracting Fbank features from the audio, analyzing the Fbank features through a built-in neural network, and determining the posterior probability that the audio hits each command word within the control command words;
filtering the posterior probability of each command word within the control command words to determine the joint probability that the audio hits each control command word;
taking the control command word with the highest joint probability as the effective control command word;
and obtaining a preset recognition threshold, and when the joint probability of the effective control command word reaches the preset recognition threshold, associating the audio with the effective control command word and executing the operation corresponding to the effective control command word.
In a second aspect, an embodiment of the present invention provides a voice control system for a vehicle event data recorder, including:
a sound collection program module, configured to collect sound in the vehicle in real time and generate corresponding audio;
a command word posterior probability determination program module, configured to extract Fbank features from the audio, analyze the Fbank features through a built-in neural network, and determine the posterior probability that the audio hits each command word within the control command words;
a joint probability determination program module, configured to filter the posterior probability of each command word within the control command words to determine the joint probability that the audio hits each control command word;
an effective control command word determination program module, configured to take the control command word with the highest joint probability as the effective control command word;
and a control program module, configured to obtain a preset recognition threshold and, when the joint probability of the effective control command word reaches the preset recognition threshold, associate the audio with the effective control command word and execute the operation corresponding to the effective control command word.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the voice control method for a driving recorder of any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the voice control method for a driving recorder of any embodiment of the invention.
The embodiments of the invention have the following beneficial effects: Fbank feature extraction converts the collected audio into feature vectors, reducing the amount of data actually fed into the neural network; because no decoding stage is implemented, memory and hardware resources are saved; and digital filtering yields a more stable output, improving recognition accuracy. The neural network is configured locally in the driving recorder, so no network connection is needed: the usable scenarios are broader, network-speed problems are avoided, and the user experience is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice control method for a vehicle event data recorder according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a voice control system for a car recorder according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a voice control method for a car recorder according to an embodiment of the present invention, which includes the following steps:
s11: collecting sound in a vehicle in real time to generate corresponding audio;
s12: extracting Fbank characteristics of the audio, analyzing the Fbank characteristics through a built-in neural network, and determining the posterior probability of each command word in the control command words hit by the audio;
s13: processing the posterior probability of each command word in the control command words through filtering, and determining the joint probability of the audio hitting each control command word;
s14: taking the control command word with the maximum joint probability as an effective control command word;
s15: and acquiring a preset identification threshold, and when the joint probability of the effective control command words reaches the preset identification threshold, corresponding the audio to the effective control command words and executing the operation corresponding to the effective control command words.
In this embodiment, to cope with the complicated acoustic environment inside a vehicle, the neural network built into the driving recorder is trained in advance, using actual in-vehicle recordings so as to cover most in-vehicle use scenarios.
For step S11, after the vehicle is started, the driving recorder collects sound in the vehicle in real time, so the user's voice can be picked up at any time. Sound is captured at the position where the recorder is mounted, or the recorder may be fitted with a dedicated additional microphone; mounting the recorder or the microphone near the driver's head makes the collected sound clearer and further improves its quality. Corresponding audio is then generated from the collected sound.
For step S12, Fbank features are extracted from the audio generated in step S11. Extracting the Fbank features of the audio includes: pre-emphasis, which compensates the high-frequency part of the speech signal suppressed by the articulation system, removing effects introduced by the vocal cords and lips during production and highlighting the high-frequency formants; framing, which divides the speech signal into frames, with a frame length usually of 20-40 ms and a frame shift of 10 ms (the exact values can be chosen as needed); windowing, which applies a Hamming or Hanning window to each frame so that both ends of the frame are attenuated close to zero; STFT, which yields the vector features and converts the energy (amplitude) spectrum into a power spectrum; Mel filtering, which passes the power spectrum through a Mel filter bank to obtain a spectrum matching the hearing characteristics of the human ear, usually followed by taking the logarithm to convert the unit to dB; and DCT (discrete cosine transform), which yields the cepstral coefficients (strictly, the log-Mel output before the DCT is the Fbank feature; the DCT step produces cepstral, i.e. MFCC-style, coefficients). The Fbank features are analyzed through the pre-trained built-in neural network to determine the posterior probability that the audio hits each command word within the control command words. For example, if the control command words include "play music", "next", "previous", and so on, the individual command words include "play", "music", "down", "up", "one", "first", and so on, and a posterior probability is determined for each of these words.
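The extraction stages above can be sketched with NumPy. The 8 kHz sample rate, 256-point FFT, and 26 filters below are illustrative assumptions rather than values given in the patent, and the sketch stops at the log-Mel (Fbank) output:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def fbank(signal, sample_rate=8000, frame_ms=25, shift_ms=10,
          n_fft=256, n_filters=26, pre_emphasis=0.97):
    # 1) Pre-emphasis: boost the high-frequency part suppressed by the vocal tract.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # 2) Framing: 25 ms frames with a 10 ms shift (within the 20-40 ms range in the text).
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_shift = int(sample_rate * shift_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :] +
           frame_shift * np.arange(n_frames)[:, None])
    frames = emphasized[idx]

    # 3) Windowing: a Hamming window tapers both ends of each frame toward zero.
    frames = frames * np.hamming(frame_len)

    # 4) STFT magnitude, converted into a power spectrum.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # 5) Triangular Mel filter bank, then log to obtain the Fbank features.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                                      n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    feats = power @ fb.T
    return np.log(np.maximum(feats, 1e-10))
```

One second of 8 kHz audio yields a (98, 26) feature matrix here: far fewer numbers per frame than the raw samples, which is the data reduction the text attributes to Fbank extraction.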
For step S13, the posterior probability of each command word within the control command words, determined in step S12, is filtered, and the joint probability that the audio hits each control command word is then computed from those word-level posterior probabilities.
For step S14, since this method is used for voice control of the driving recorder, and the recorder must always settle on one corresponding operation after recognizing the audio, the control command word with the highest joint probability is taken as the effective control command word.
For step S15, a preset recognition threshold is obtained. The higher the recognition threshold is set, the more accurate an effective control command word that reaches it will be; however, if it is set too high, the joint probability of the effective control command word may never reach the threshold and no control command can be recognized, so the threshold should be adjusted to the circumstances. When the joint probability of the effective control command word is determined to reach the preset recognition threshold, the audio is associated with the effective control command word, and the operation corresponding to the effective control command word is executed.
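The threshold decision in step S15 amounts to a single comparison. In this sketch, the English command labels and the 0.40 threshold are illustrative assumptions; the joint probabilities reuse the 53.55% / 22.95% example given later in the text:

```python
def decide(joint_probs, threshold=0.40):
    """Pick the control command word with the highest joint probability and
    accept it only if it reaches the preset recognition threshold."""
    best = max(joint_probs, key=joint_probs.get)
    if joint_probs[best] >= threshold:
        return best   # execute the corresponding operation
    return None       # recognition failure: feed information back to the driver

probs = {"next": 0.5355, "previous": 0.2295}
decide(probs)         # accepted: "next" reaches the 0.40 threshold
decide(probs, 0.60)   # rejected: threshold set too high for this utterance
```

This also shows the trade-off described above: the same utterance is accepted at a 0.40 threshold and rejected at 0.60.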
With this method, Fbank feature extraction converts the collected audio into feature vectors, reducing the amount of data actually fed into the neural network, and because no decoding stage is implemented, memory and hardware resources are saved. Since the neural network is configured locally on the driving recorder, no network is needed: the usable scenarios are broader, network-speed problems are avoided, and the user experience is improved.
As one implementation, in this embodiment the filtering process includes digital filtering.
The digital filtering of the posterior probability of each command word within the control command words includes:
taking the maximum of the posterior probability of each command word within each control command word as that command word's effective posterior probability;
and multiplying the effective posterior probabilities of the command words within each control command word together in sequence to determine the joint probability of that control command word.
In this embodiment, after the neural network outputs the posterior probability of each command word, only digital filtering of those word-level posterior probabilities is needed; the command judgment is then made by comparing the resulting probability of the command word with the preset recognition threshold, and the corresponding command is output. The digital filtering stage filters each command word output by the neural network: within a window of fixed length, the average of each word's posterior probability can be taken to avoid false recognition caused by glitches; the maximum of each command word's posterior probability is then found; finally, the maxima are multiplied together in the order of the words in the command to obtain the joint probability of the control command word. For example, with the posterior probability of "down" at 70%, "up" at 30%, "one" at 85% and "first" at 90%, the joint probability of "next" is 53.55% and the joint probability of "previous" is 22.95%.
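The filtering described above can be sketched in two small steps; the window length of 5 frames is an illustrative assumption:

```python
import numpy as np

def smoothed_max(posterior_track, window=5):
    """Moving average over a fixed-length window (suppresses single-frame
    glitches), then the maximum over time as the word's effective posterior."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(posterior_track, kernel, mode="valid")
    return float(smoothed.max())

def joint_probability(word_posteriors):
    """Multiply the effective per-word posteriors together in sequence."""
    p = 1.0
    for post in word_posteriors:
        p *= post
    return p

# The numeric example from the text: "next" = down/one/first, "previous" = up/one/first.
next_p = joint_probability([0.70, 0.85, 0.90])   # 0.5355, i.e. 53.55%
prev_p = joint_probability([0.30, 0.85, 0.90])   # 0.2295, i.e. 22.95%
```

Note how the averaging step tames glitches: a single spurious frame with posterior 1.0 inside an otherwise silent 5-frame window contributes an effective posterior of only 0.2.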
With this implementation, digital filtering produces a more stable output and improves recognition accuracy.
As one implementation, in this embodiment, collecting sound in the vehicle in real time and generating the corresponding audio further includes:
collecting sound in the vehicle in real time, and generating the corresponding audio when the sound in the vehicle reaches a preset sound pressure level.
In this embodiment, sound in the vehicle is collected in real time, but corresponding audio is generated only when the sound reaches the preset sound pressure level. Considering that the driver is not speaking all the time and only occasionally needs a function, there is no need to generate audio continuously; when the driver speaks, the sound pressure level changes, and audio is then generated for recognition.
This embodiment shows that presetting a sound pressure level avoids recognition passes that serve no function, further reducing the computational load on the driving recorder.
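The sound-pressure-level gate can be sketched as follows. The assumption that samples are calibrated to pascals and the 60 dB trigger level are illustrative; real devices need microphone calibration to map sample values to true SPL:

```python
import numpy as np

REF = 20e-6  # 20 micropascal: the standard reference pressure for SPL in air

def sound_pressure_level(samples):
    """RMS level of a buffer in dB SPL, assuming samples are in pascals
    (an assumption; uncalibrated devices get only a relative level)."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return 20.0 * np.log10(max(rms, 1e-12) / REF)

def gate(buffer, threshold_db=60.0):
    """Pass the buffer on for recognition only when the in-vehicle sound
    reaches the preset sound pressure level; otherwise discard it."""
    return buffer if sound_pressure_level(buffer) >= threshold_db else None
```

Buffers below the threshold return `None` and never reach feature extraction, which is the computational saving this embodiment describes.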
As one implementation, in this embodiment, when the joint probability of the effective control command word does not reach the preset recognition threshold, the control command word corresponding to the audio cannot be determined, and recognition-failure information is fed back.
In this embodiment, when the joint probability of the effective control command word spoken by the driver is determined not to reach the preset recognition threshold, the driving recorder cannot settle on a corresponding command. Recognition-failure information is therefore fed back to remind the driver.
With this implementation, feeding back recognition-failure information reminds the driver and improves the user experience.
Fig. 2 is a schematic structural diagram of a voice control system for a driving recorder according to an embodiment of the present invention; the system can execute the voice control method for a driving recorder of any of the above embodiments and is configured in a terminal.
The voice control system for the driving recorder provided by this embodiment comprises: a sound collection program module 11, a command word posterior probability determination program module 12, a joint probability determination program module 13, an effective control command word determination program module 14, and a control program module 15.
The sound collection program module 11 is configured to collect sound in the vehicle in real time and generate corresponding audio; the command word posterior probability determination program module 12 is configured to extract Fbank features from the audio, analyze them through a built-in neural network, and determine the posterior probability that the audio hits each command word within the control command words; the joint probability determination program module 13 is configured to filter the posterior probability of each command word within the control command words to determine the joint probability that the audio hits each control command word; the effective control command word determination program module 14 is configured to take the control command word with the highest joint probability as the effective control command word; and the control program module 15 is configured to obtain a preset recognition threshold and, when the joint probability of the effective control command word reaches it, associate the audio with the effective control command word and execute the corresponding operation.
Further, in the system, the filtering process includes digital filtering.
Further, the joint probability determination program module is for:
taking the maximum of the posterior probability of each command word within each control command word as that command word's effective posterior probability;
and multiplying the effective posterior probabilities of the command words within each control command word together in sequence to determine the joint probability of that control command word.
Further, the sound collection program module is further configured to:
collect sound in the vehicle in real time, and generate the corresponding audio when the sound in the vehicle reaches a preset sound pressure level.
Further, the system is also configured to:
and when the joint probability of the effective control command words does not reach the preset identification threshold, the control command words corresponding to the audio cannot be determined, and identification failure information is fed back.
An embodiment of the invention also provides a non-volatile computer storage medium storing computer-executable instructions that can execute the voice control method for a driving recorder in any of the above method embodiments.
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
collecting sound in the vehicle in real time and generating corresponding audio;
extracting Fbank features from the audio, analyzing the Fbank features through a built-in neural network, and determining the posterior probability that the audio hits each command word within the control command words;
filtering the posterior probability of each command word within the control command words to determine the joint probability that the audio hits each control command word;
taking the control command word with the highest joint probability as the effective control command word;
and obtaining a preset recognition threshold, and when the joint probability of the effective control command word reaches the preset recognition threshold, associating the audio with the effective control command word and executing the operation corresponding to the effective control command word.
As a non-volatile computer-readable storage medium, it may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-transitory computer-readable storage medium and, when executed by a processor, perform the voice control method for a driving recorder in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random-access memory and may also include non-volatile memory, such as at least one magnetic-disk storage device, flash-memory device, or other non-volatile solid-state storage device. In some embodiments, the non-transitory computer-readable storage medium optionally includes memory located remotely from the processor, connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the voice control method for a driving recorder of any embodiment of the invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally feature mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPods), handheld game consoles, electronic books, as well as smart toys and portable car navigation devices.
(4) Other electronic devices with processing functions.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A voice control method for a tachograph, comprising:
collecting sound in a vehicle in real time to generate corresponding audio;
extracting Fbank features of the audio, analyzing the Fbank features through a built-in neural network, and determining the posterior probability of the audio hitting each command word within the control command words; the built-in neural network is trained in advance on actual in-vehicle recordings;
processing the posterior probability of each command word in the control command words through filtering, and determining the joint probability of the audio hitting each control command word;
taking the control command word with the maximum joint probability as an effective control command word;
acquiring a preset identification threshold, and when the joint probability of the effective control command word reaches the preset identification threshold, associating the audio with the effective control command word and executing the operation corresponding to the effective control command word;
the filtering process includes: digital filtering;
the digital filtering filters a posterior probability of each command word within the control command word, including:
taking the maximum posterior probability of each command word within each control command word as that command word's effective posterior probability;
and multiplying together the effective posterior probabilities of the command words within each control command word to determine the joint probability of that control command word.
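The scoring steps of claim 1 can be sketched in plain Python (the function names and the example posteriors are hypothetical; the claim does not prescribe an implementation): each command word's effective posterior is its maximum posterior over the analysis window, a control command word's joint probability is the product of those effective posteriors, and the best-scoring control command word is accepted only if it reaches the preset identification threshold.

```python
from math import prod

def joint_probability(posteriors):
    """posteriors: one list per audio frame, each holding the posterior of
    every command word within a single control command word.
    Digital filtering step: per command word, keep the maximum posterior
    over the window (its effective posterior), then multiply them together."""
    num_words = len(posteriors[0])
    effective = [max(frame[i] for frame in posteriors) for i in range(num_words)]
    return prod(effective)

def pick_command(posteriors_by_command, threshold):
    """Score every control command word; return the best one only if its
    joint probability reaches the preset identification threshold,
    otherwise None (identification failure, as in claim 3)."""
    scores = {cmd: joint_probability(p) for cmd, p in posteriors_by_command.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

For example, frame posteriors [[0.2, 0.1], [0.9, 0.3], [0.5, 0.8]] give effective posteriors 0.9 and 0.8, hence a joint probability of 0.72.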
2. The method of claim 1, wherein collecting sound in the vehicle in real time to generate corresponding audio further comprises:
the method comprises the steps of collecting sound in a vehicle in real time, and generating corresponding audio when the sound in the vehicle reaches a preset sound pressure level.
3. The method of claim 1, wherein the method further comprises:
and when the joint probability of the effective control command word does not reach the preset identification threshold, determining that no control command word corresponds to the audio, and feeding back identification failure information.
4. A voice control system for a tachograph, comprising:
the sound acquisition program module is used for acquiring sound in the vehicle in real time and generating corresponding audio;
the command word posterior probability determining program module is used for extracting Fbank features of the audio, analyzing the Fbank features through a built-in neural network, and determining the posterior probability of the audio hitting each command word within the control command words; the built-in neural network is trained in advance on actual in-vehicle recordings;
a joint probability determination program module for determining the joint probability of the audio hitting each control command word by filtering the posterior probability of each command word in the control command words;
an effective control command word determining program module, configured to use the control command word with the highest joint probability as an effective control command word;
the control program module is used for acquiring a preset identification threshold, associating the audio with the effective control command word when the joint probability of the effective control command word reaches the preset identification threshold, and executing the operation corresponding to the effective control command word;
the filtering process includes: digital filtering;
the joint probability determination program module is to:
taking the maximum posterior probability of each command word within each control command word as that command word's effective posterior probability;
and multiplying together the effective posterior probabilities of the command words within each control command word to determine the joint probability of that control command word.
5. The system of claim 4, wherein the sound collection program module is further to:
the method comprises the steps of collecting sound in a vehicle in real time, and generating corresponding audio when the sound in the vehicle reaches a preset sound pressure level.
6. The system of claim 4, wherein the system is further configured to:
and when the joint probability of the effective control command word does not reach the preset identification threshold, determining that no control command word corresponds to the audio, and feeding back identification failure information.
CN201811380932.8A 2018-11-20 2018-11-20 Voice control method and system for automobile data recorder Active CN109273003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811380932.8A CN109273003B (en) 2018-11-20 2018-11-20 Voice control method and system for automobile data recorder

Publications (2)

Publication Number Publication Date
CN109273003A (en) 2019-01-25
CN109273003B (en) 2021-11-02

Family

Family ID: 65189287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811380932.8A Active CN109273003B (en) 2018-11-20 2018-11-20 Voice control method and system for automobile data recorder

Country Status (1)

Country Link
CN (1) CN109273003B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110261816B (en) * 2019-07-10 2020-12-15 苏州思必驰信息科技有限公司 Method and device for estimating direction of arrival of voice

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105679316A (en) * 2015-12-29 2016-06-15 深圳微服机器人科技有限公司 Voice keyword identification method and apparatus based on deep neural network
CN107578771A (en) * 2017-07-25 2018-01-12 科大讯飞股份有限公司 Audio recognition method and device, storage medium, electronic equipment
JP2018081294A (en) * 2016-11-10 2018-05-24 日本電信電話株式会社 Acoustic model learning device, voice recognition device, acoustic model learning method, voice recognition method, and program
CN108198566A (en) * 2018-01-24 2018-06-22 咪咕文化科技有限公司 Information processing method and device, electronic equipment and storage medium
CN108615526A (en) * 2018-05-08 2018-10-02 腾讯科技(深圳)有限公司 The detection method of keyword, device, terminal and storage medium in voice signal


Also Published As

Publication number Publication date
CN109273003A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN110956957B (en) Training method and system of speech enhancement model
CN112435684B (en) Voice separation method and device, computer equipment and storage medium
KR101610151B1 (en) Speech recognition device and method using individual sound model
KR100636317B1 (en) Distributed Speech Recognition System and method
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
US7613611B2 (en) Method and apparatus for vocal-cord signal recognition
CN108877823B (en) Speech enhancement method and device
CN110610707B (en) Voice keyword recognition method and device, electronic equipment and storage medium
CN108597505B (en) Voice recognition method and device and terminal equipment
US20130006633A1 (en) Learning speech models for mobile device users
CN110517670A (en) Promote the method and apparatus for waking up performance
CN103151039A (en) Speaker age identification method based on SVM (Support Vector Machine)
CN110910885B (en) Voice wake-up method and device based on decoding network
CN111667835A (en) Voice recognition method, living body detection method, model training method and device
CN110600008A (en) Voice wake-up optimization method and system
CN109065043B (en) Command word recognition method and computer storage medium
CN108364656B (en) Feature extraction method and device for voice playback detection
CN111179915A (en) Age identification method and device based on voice
CN111653283B (en) Cross-scene voiceprint comparison method, device, equipment and storage medium
CN111161746B (en) Voiceprint registration method and system
CN110689887B (en) Audio verification method and device, storage medium and electronic equipment
CN114067782A (en) Audio recognition method and device, medium and chip system thereof
CN109273003B (en) Voice control method and system for automobile data recorder
CN107977187B (en) Reverberation adjusting method and electronic equipment
CN112116909A (en) Voice recognition method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Voice control method and system for driving recorder

Effective date of registration: 20230726

Granted publication date: 20211102

Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch

Pledgor: Sipic Technology Co.,Ltd.

Registration number: Y2023980049433
