CN110164431A - A kind of audio data processing method and device, storage medium - Google Patents
A kind of audio data processing method and device, storage medium
- Publication number
- CN110164431A (application number CN201811361659.4A)
- Authority
- CN
- China
- Prior art keywords
- detection
- speech
- default
- time point
- reset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
An embodiment of the invention provides an audio data processing method and device, and a storage medium. The method includes: obtaining a speech detection model, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result; determining a reference object based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged; determining a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; and resetting the speech detection model when the reset time point is reached.
Description
Technical field
The present invention relates to speech recognition technology in the field of electronic applications, and in particular to an audio data processing method and device, and a storage medium.
Background art
With the development of smart speakers and their derivative products, voice interaction between human and machine, especially far-field voice interaction, has gradually become an important human-computer interface.

At present, the voice-interaction smart devices in the electronics field are mainly smart speakers, for example products such as smart TVs or TV boxes with a voice-control function. One or more wake-up words are generally set in such voice-interaction smart devices. Taking a smart speaker as an example: after the user says the wake-up word to the smart speaker and the speaker detects it, the voice data (audio data) the user says next is passed to the smart speaker as a voice command for speech recognition, thereby opening the voice-interaction function between human and machine. A long short-term memory model (LSTM, Long Short-Term Memory) is generally used as the wake-up detection model for wake-word detection.

However, an important feature of the LSTM is its historical-information accumulation characteristic: when speech recognition is performed with an LSTM, the detection result for a segment of voice data (for example, a wake-up word) is related not only to that segment itself, but is also strongly influenced by the audio data preceding it. Therefore, false wake-ups are unavoidable in wake-word detection, and after noise has accumulated for a period of time, the accumulated noise data affects the later detection performance for the wake-up word, causing the accuracy of wake-word speech recognition to decline.
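The historical-accumulation problem described above can be illustrated with a minimal recurrent-detector sketch (the model, decay factor, and all numeric values are illustrative assumptions, not the patent's actual LSTM):

```python
import math

class TinyRecurrentDetector:
    """Stand-in for an LSTM-style detector whose score depends on an
    accumulated internal state, so past audio influences later results."""

    def __init__(self):
        self.state = 0.0  # accumulated history

    def step(self, frame_energy):
        # Each frame folds into the state; a long run of noise therefore
        # keeps influencing the detection score of later frames.
        self.state = 0.9 * self.state + 0.1 * frame_energy
        return 1.0 / (1.0 + math.exp(-(frame_energy + self.state - 1.0)))

    def reset(self):
        # Initialize the historical accumulation.
        self.state = 0.0

det = TinyRecurrentDetector()
for _ in range(100):                  # a long run of background noise
    det.step(0.8)
score_with_history = det.step(1.0)    # wake-word frame after the noise...
det.reset()
score_after_reset = det.step(1.0)     # ...scores differently once reset
```

Here the same wake-word frame produces a different score depending on the accumulated noise, which is exactly the detection-performance drift the reset operation is meant to remove.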
Summary of the invention
The embodiments of the present invention provide an audio data processing method and device, and a storage medium, which can improve the accuracy of speech recognition.
The technical solution of the embodiments of the present invention is achieved as follows:
An embodiment of the present invention provides an audio data processing method, including:
obtaining a speech detection model, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result;
determining a reference object based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged;
determining a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; and
resetting the speech detection model when the reset time point is reached.
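Interpreted as pseudocode, the claimed steps might look as follows (the function names, threshold, and reset period are illustrative assumptions, not values from the patent):

```python
def choose_reference_object(num_detection_paths):
    """The reference object depends on the detected quantity of
    detection paths: one path -> current detection result,
    more than one -> current time point."""
    return "detection_result" if num_detection_paths == 1 else "time_point"

def should_reset(reference_object, score=None, now=None,
                 reset_threshold=0.9, period=60.0, last_reset=0.0):
    """Judge whether the reset time point has been reached,
    based on the chosen reference object."""
    if reference_object == "detection_result":
        return score is not None and score >= reset_threshold
    return now is not None and (now - last_reset) >= period
```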
An embodiment of the present invention provides an audio data processing device, including:
an acquiring unit, configured to obtain a speech detection model, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result;
a determination unit, configured to determine a reference object based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged, and to determine a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; and
a reset unit, configured to reset the speech detection model when the reset time point is reached.
In the above device, the determination unit is further configured to determine, when the detected quantity of detection paths is one, that the reference object is the current detection result.
Correspondingly, the acquiring unit is further configured to obtain audio data to be detected, and to recognize the audio data to be detected with the speech detection model to obtain the current detection result.
The determination unit is further specifically configured to determine, when the current detection result meets a preset reset threshold, that the current time point is the reset time point, where the preset reset threshold is greater than or equal to a preset wake-up threshold.
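The single-path rule above (the reset threshold is at least the wake-up threshold, so only a detection at least as confident as a wake-up triggers a reset) can be sketched as follows, with illustrative threshold values:

```python
WAKE_THRESHOLD = 0.80   # illustrative value
RESET_THRESHOLD = 0.90  # per the claim, must be >= WAKE_THRESHOLD

def handle_score(score):
    """Return (woke, reset_now) for one current detection result."""
    woke = score > WAKE_THRESHOLD
    reset_now = score >= RESET_THRESHOLD  # current time becomes reset point
    return woke, reset_now
```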
In the above device, the determination unit is further configured to determine, when the detected quantity of detection paths is greater than one, that the reference object is the current time point.
In the above device, the acquiring unit is further configured to obtain the history detection results before the current time point after the audio data to be detected has been recognized with the speech detection model and the current detection result obtained.
The determination unit is further configured to determine, when the variation range between the current detection result and the history detection results meets a preset false-wake-up range, that the current time point is the reset time point.
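The rule above compares the current detection result with the history detection results; one possible reading, with an illustrative false-wake-up band and a mean-of-history baseline as assumptions, is:

```python
def variation_triggers_reset(current_score, history_scores,
                             false_wake_low=0.3, false_wake_high=0.7):
    """Reset when the jump from the recent history baseline to the
    current score falls inside the preset false-wake-up range
    (band limits and baseline choice are illustrative assumptions)."""
    if not history_scores:
        return False
    baseline = sum(history_scores) / len(history_scores)
    variation = abs(current_score - baseline)
    return false_wake_low <= variation <= false_wake_high
```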
In the above device, the at least one detection path includes a backup detection path.
The acquiring unit is further configured to obtain the current time point.
The determination unit is further configured to determine, when the current time point reaches a preset preheating time point, the current time point as the reset time point of the backup detection path, where the preset preheating time point is the time point that precedes a preset reset time point by a preset preheating duration.
In the above device, the reset unit is specifically configured to reset and start the backup detection path when the current time point reaches the preset preheating time point.
In the above device, the at least one detection path further includes a main detection path, and the audio data processing device further includes a recognition unit and a closing unit.
The recognition unit is configured to perform speech recognition with the main detection path and the backup detection path after the backup detection path has been reset and started.
The reset unit is further specifically configured to reset the main detection path when, after the preset preheating duration has elapsed, the preset reset time point is reached.
The closing unit is configured to close the backup detection path when the preset preheating duration has elapsed since the preset reset time point.
The recognition unit is further configured to then perform speech recognition with the main detection path.
In the above device, the preset reset time points form a time series spaced by a preset duration;
the preset duration lies in the range between twice the preset preheating duration and a preset tolerable wake-up threshold value;
the preset tolerable wake-up threshold value lies between a preset optimal wake-up upper limit and a preset optimal false-wake-up lower limit; and
the preset preheating duration is greater than or equal to the duration of the preset wake-up word.
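The main/backup scheduling described above can be sketched as an event timeline: the backup path is reset and started one preheating duration before each preset reset time point, the main path is reset at the reset point, and the backup is closed one preheating duration after it (all duration values are illustrative):

```python
def handover_schedule(reset_period, preheat, horizon):
    """Build (time, action) events for the main/backup handover.
    The claim requires the reset period to be at least twice the
    preheating duration, so consecutive warm-ups never overlap."""
    assert reset_period >= 2 * preheat
    events = []
    t = reset_period
    while t <= horizon:
        events.append((t - preheat, "reset_and_start_backup"))
        events.append((t, "reset_main"))
        events.append((t + preheat, "close_backup"))
        t += reset_period
    return events
```

For example, with a 60-second period and a 2-second preheat, the backup starts at t = 58, the main path is reset at t = 60, and the backup is closed at t = 62.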
In the above device, the audio data processing device further includes a receiving unit and a comprehensive processing unit.
The receiving unit is configured to receive audio data to be detected.
The recognition unit is specifically configured to perform speech recognition on the audio data to be detected with the main detection path to obtain a main detection result, and, when the main detection result is greater than the preset wake-up threshold, to recognize the audio data to be detected as the wake-up word and start the wake-up function.
In the above device, the audio data processing device further includes a recognition unit, configured to perform speech recognition with the reset speech detection model after the speech detection model has been reset upon the reset time point being reached.
In the above device, the audio data processing device further includes a comprehensive processing unit.
The recognition unit is specifically configured, in speech detection based on at least one direction branch, to perform speech recognition on each of the at least one direction branch with the reset speech detection model, obtaining at least one current detection result.
The comprehensive processing unit is configured to perform comprehensive processing on the at least one current detection result to obtain a comprehensive detection result.
The recognition unit is further configured, when the comprehensive detection result is greater than the preset wake-up threshold, to recognize the wake-up word and start the wake-up function.
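The multi-direction-branch fusion above might be sketched as follows; taking the maximum over branch scores is an illustrative choice of comprehensive processing, not one the patent specifies:

```python
def comprehensive_result(branch_scores):
    """Fuse per-direction detection results into one comprehensive result."""
    return max(branch_scores)

def wake_decision(branch_scores, wake_threshold=0.8):
    """Start the wake-up function when the comprehensive result exceeds
    the preset wake-up threshold (threshold value illustrative)."""
    return comprehensive_result(branch_scores) > wake_threshold
```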
In the above device, the reset unit is specifically configured to initialize, when the reset time point is reached, the data having the historical-accumulation characteristic in the speech detection model, obtaining the reset speech detection model.
An embodiment of the present invention provides an audio data processing device, including:
a memory, configured to store executable audio data processing instructions; and
a processor, configured to implement the audio data processing method provided by the embodiments of the present invention when executing the executable audio data processing instructions stored in the memory.

An embodiment of the present invention provides a computer-readable storage medium storing executable audio data processing instructions which, when executed by a processor, implement the audio data processing method provided by the embodiments of the present invention.
The embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention provide an audio data processing method and device, and a storage medium. A speech detection model is obtained, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result; a reference object is determined based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged; a reset time point is determined based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; and the speech detection model is reset when the reset time point is reached. With the above technical solution, the audio data processing device can determine, according to the quantity of detection paths of different speech detection models, how the reset operation in the speech detection model is to be judged, and can further determine the reset time point based on the reference object. That is, for the different detection paths of a speech detection model, the respective reset time points can be judged through the determination of different reference objects, and each reset time point is a moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed. At the reset time point, once the speech detection model has been reset, it carries no historical trace. In this way, with the reset time point guaranteeing wake-up performance, and the model freed from the influence of long-term historical accumulation, the accuracy of wake-word speech recognition is improved.
Brief description of the drawings
Fig. 1 is an optional architecture diagram of the audio data processing system provided by an embodiment of the present invention;
Fig. 2 is an optional structural diagram of the terminal provided by an embodiment of the present invention;
Fig. 3 is an optional structural diagram of the audio data processing device provided by an embodiment of the present invention;
Fig. 4 is a first optional flow diagram of the audio data processing method provided by an embodiment of the present invention;
Fig. 5A is a first exemplary scenario diagram of wake-word detection provided by an embodiment of the present invention;
Fig. 5B is a second exemplary scenario diagram of wake-word detection provided by an embodiment of the present invention;
Fig. 6 is a structure chart of an exemplary LSTM memory cell provided by an embodiment of the present invention;
Fig. 7 is a second optional flow diagram of the audio data processing method provided by an embodiment of the present invention;
Fig. 8 is a third optional flow diagram of the audio data processing method provided by an embodiment of the present invention;
Fig. 9 is an exemplary speech recognition scenario diagram with at least two detection paths provided by an embodiment of the present invention;
Fig. 10 is an exemplary curve of first-time wake-up success rate versus standby time provided by an embodiment of the present invention;
Fig. 11 is an exemplary timing diagram of the main and backup detection paths provided by an embodiment of the present invention;
Fig. 12 is a fourth optional flow diagram of the audio data processing method provided by an embodiment of the present invention;
Fig. 13 is a first exemplary speech detection scenario diagram of multi-direction branches provided by an embodiment of the present invention;
Fig. 14 is a second exemplary speech detection scenario diagram of multi-direction branches provided by an embodiment of the present invention;
Fig. 15 is a third exemplary speech detection scenario diagram of multi-direction branches provided by an embodiment of the present invention;
Fig. 16 is a fourth exemplary speech detection scenario diagram of multi-direction branches provided by an embodiment of the present invention;
Fig. 17 is a first exemplary speech recognition scenario diagram provided by an embodiment of the present invention;
Fig. 18 is a second exemplary speech recognition scenario diagram provided by an embodiment of the present invention.
Specific embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used herein are intended only to describe the embodiments of the present invention and are not intended to limit the present invention.
Before the embodiments of the present invention are further elaborated, the nouns and terms involved in the embodiments of the present invention are explained; they apply to the following interpretations.
1) Wake-up word: the keyword that starts a voice-interaction smart device; in the embodiments of the present invention, it refers to the voice signal corresponding to the keyword that starts the audio data processing device.
2) Feature extraction: converting raw features into a group of features with obvious physical significance (Gabor, geometric features [corner points, invariants], texture [LBP, HOG], etc.) or statistical significance, or kernel features. In the embodiments of the present invention, feature extraction refers to extracting the feature quantities of important audio information from audio data.
The following describes exemplary applications of the audio data processing device implementing the embodiments of the present invention. The audio data processing device provided by the embodiments of the present invention may be implemented as various types of user terminals with a speech recognition or audio data processing function, such as a smartphone, a tablet computer, a laptop, or a voice-interaction smart device (for example, a smart speaker), and may also be implemented as a server, the server here being a background server running an audio data processing or speech recognition application. In the following, an exemplary application covering the terminal is described for the case where the audio data processing device is implemented as a terminal.
Referring to Fig. 1, Fig. 1 is an optional architecture diagram of the audio data processing system 100 provided by an embodiment of the present invention, supporting an exemplary application. The terminal 400 (terminal 400-1 and terminal 400-2 are illustrated) is connected to the server 300 through the network 200; the network 200 may be a wide area network or a local area network, or a combination of the two, and uses wireless links for data transmission.

The terminal 400 is configured to: obtain a speech detection model, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result; determine a reference object based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged; determine a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; reset the speech detection model when the reset time point is reached, obtaining a reset speech detection model; perform speech recognition on the obtained audio data to be detected with the reset speech detection model and determine whether to start the wake-up function; and, when the wake-up function is determined, receive audio data to be detected, perform speech recognition on the audio data to be detected to obtain a functional voice instruction, and send the functional voice instruction to the server 300.

The server 300 is configured to generate a function triggering instruction according to the functional voice instruction, and to control the terminal 400 or other terminals according to the function triggering instruction so as to realize the function triggered by the functional voice instruction.
The audio data processing device provided by the embodiments of the present invention may be implemented in hardware or in a combination of hardware and software. Various exemplary implementations of the device provided by the embodiments of the present invention are described below.
Referring to Fig. 2, Fig. 2 is an optional structural diagram of the terminal 400 provided by an embodiment of the present invention. The terminal 400 may be a mobile phone, a computer, a digital broadcast terminal, an audio data transceiver, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, and so on. From the structure of the terminal 400, an exemplary structure for implementing the audio data processing device as a terminal can be anticipated, so the structure described here should not be construed as limiting; for example, some of the components described below may be omitted, or components not described here may be added to meet the special requirements of certain applications.

The terminal 400 shown in Fig. 2 includes at least one processor 410, a memory 440, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled by a bus system 450. It can be understood that the bus system 450 is used to realize connection and communication between these components. In addition to a data bus, the bus system 450 includes a power bus, a control bus, and a status signal bus. For the sake of clarity, however, the various buses are all designated as the bus system 450 in Fig. 2.
The user interface 430 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like.
The memory 440 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM, Read-Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), a flash memory, and the like. The volatile memory may be a random access memory (RAM, Random Access Memory), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory) and synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory). The memory 440 described in the embodiments of the present invention is intended to include these and any other suitable types of memory.
The memory 440 in the embodiments of the present invention can store data to support the operation of the terminal 400. Examples of such data include any computer program for running on the terminal 400, such as an operating system 442 and an executable program 441. The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for realizing various basic services and processing hardware-based tasks. The executable program may include various application programs, such as executable audio data processing instructions.
As an example of implementing the audio data processing method provided by the embodiments of the present invention with a combination of software and hardware, the audio data processing method provided by the embodiments of the present invention may be directly embodied as a combination of software modules executed by the processor 410. The software modules may be located in a storage medium, the storage medium is located in the memory 440, and the processor 410 reads the executable audio data processing instructions included in the software modules in the memory 440 and completes, in combination with the necessary hardware (for example, including the processor 410 and other components connected to the bus 450), the audio data processing method provided by the embodiments of the present invention.

As an example, the processor 410 may be an integrated circuit chip with signal processing capability, for example a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like, where the general-purpose processor may be a microprocessor or any conventional processor.
Illustratively, an embodiment of the present invention provides an audio data processing device, including at least:
a memory 440, configured to store executable audio data processing instructions; and
a processor 410, configured to implement the audio data processing method provided by the embodiments of the present invention when executing the executable audio data processing instructions stored in the memory 440.
The exemplary structure of the software modules is described below. In some embodiments, as shown in Fig. 3, the software modules in the audio data processing device 1 may include an acquiring unit 10, a determination unit 11, and a reset unit 12, where:
the acquiring unit 10 is configured to obtain a speech detection model, the speech detection model being a correspondence between audio data of at least one detection path having a historical-accumulation characteristic and a speech recognition result;
the determination unit 11 is configured to determine a reference object based on the detected quantity of the at least one detection path, the reference object being the factor by which a reset operation is judged, and to determine a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed; and
the reset unit 12 is configured to reset the speech detection model when the reset time point is reached.
In some embodiments of the present invention, the determination unit 11 is further configured to determine, when the detected quantity of detection paths is one, that the reference object is the current detection result.

In some embodiments of the present invention, the determination unit 11 is further configured to determine, when the detected quantity of detection paths is greater than one, that the reference object is the current time point.

In some embodiments of the present invention, the acquiring unit 10 is further configured to obtain audio data to be detected, and to recognize the audio data to be detected with the speech detection model to obtain the current detection result; the determination unit 11 is further specifically configured to determine, when the current detection result meets the preset reset threshold, that the current time point is the reset time point, where the preset reset threshold is greater than or equal to the preset wake-up threshold.

In some embodiments of the present invention, the acquiring unit 10 is further configured to obtain the history detection results before the current time point after the audio data to be detected has been recognized with the speech detection model and the current detection result obtained; the determination unit 11 is further configured to determine, when the variation range between the current detection result and the history detection results meets the preset false-wake-up range, that the current time point is the reset time point.
In some embodiments of the present invention, the at least one detection path includes a backup detection path; the acquiring unit 10 is further configured to obtain the current time point; and the determination unit 11 is further configured to determine, when the current time point reaches the preset preheating time point, the current time point as the reset time point of the backup detection path, where the preset preheating time point is the time point that precedes a preset reset time point by the preset preheating duration.

In some embodiments of the present invention, the reset unit 12 is specifically configured to reset and start the backup detection path when the current time point reaches the preset preheating time point.

In some embodiments of the present invention, the at least one detection path further includes a main detection path, and the audio data processing device 1 further includes a recognition unit 13 and a closing unit 14; the recognition unit 13 is configured to perform speech recognition with the main detection path and the backup detection path after the backup detection path has been reset and started; the reset unit 12 is further specifically configured to reset the main detection path when, after the preset preheating duration has elapsed, the preset reset time point is reached; the closing unit 14 is configured to close the backup detection path when the preset preheating duration has elapsed since the preset reset time point; and the recognition unit 13 is further configured to then perform speech recognition with the main detection path.

In some embodiments of the present invention, the preset reset time points form a time series spaced by a preset duration; the preset duration lies in the range between twice the preset preheating duration and the preset tolerable wake-up threshold value; the preset tolerable wake-up threshold value lies between the preset optimal wake-up upper limit and the preset optimal false-wake-up lower limit; and the preset preheating duration is greater than or equal to the duration of the preset wake-up word.
In some embodiments of the present invention, the audio data processing device 1 further includes a receiving unit 15 and a comprehensive processing unit 16; the receiving unit 15 is configured to receive audio data to be detected; and the recognition unit 13 is specifically configured to perform speech recognition on the audio data to be detected with the main detection path to obtain a main detection result, and, when the main detection result is greater than the preset wake-up threshold, to recognize the audio data to be detected as the wake-up word and start the wake-up function.
In some embodiments of the invention, the audio-frequency data processing device 1 further includes recognition unit 13;
The recognition unit 13, for described when the reset time point reaches, reset the speech detection model it
Afterwards, speech recognition is carried out using the speech detection model after resetting.
In some embodiments of the invention, the audio-frequency data processing device 1 further includes integrated treatment unit 16;
The recognition unit 13, specifically in the speech detection based at least one direction branch, according to described heavy
The speech detection model postponed carries out speech recognition at least one direction branch respectively, obtains at least one current detection knot
Fruit;
The integrated treatment unit 16 obtains comprehensive for carrying out integrated treatment at least one described current detection result
Close testing result;
The recognition unit 13 is identified also particularly useful for when the comprehensive detection result is greater than default wake-up thresholding
Word is waken up, arousal function is started.
In some embodiments of the invention, the reset cell 12 is specifically used for when the reset time point reaches,
Initialize the data with historical accumulation characteristic in the speech detection model, the speech detection model after being reset.
In practical applications, the acquiring unit 10, the determination unit 11, the reset cell 12, the identification are single
Member 13, closing unit 14 and the integrated treatment unit 16 can be realized by processor, and receiving unit 15 can then be connect by user
It mouthful realizes, the embodiment of the present invention is with no restriction.
As an example in which the audio data processing method provided in the embodiments of the present invention is implemented in hardware, the method may be executed directly by a processor 410 in the form of a hardware decoding processor, for example by one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array) or other electronic components.
The audio data processing method of the embodiments of the present invention is described below in conjunction with the exemplary applications and implementations of the audio data processing apparatus described above.
Referring to Fig. 4, Fig. 4 is an optional flow diagram of the audio data processing method provided in the embodiments of the present invention; the steps shown in Fig. 4 are described as follows.
S101: Obtain a speech detection model, the speech detection model being a correspondence between the audio data of at least one detection path with a historical-accumulation characteristic and a speech recognition result.
S102: Determine a reference object based on the number of detected detection paths; the reference object is the factor used to judge whether to perform a reset operation.
S103: Based on the reference object, determine a reset time point; the reset time point is a moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is preserved.
S104: When the reset time point arrives, reset the speech detection model.
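Steps S101-S104 can be condensed into a short control-flow sketch. The class, method and attribute names below are illustrative assumptions chosen for exposition, not part of the disclosed implementation.

```python
# Minimal sketch of S101-S104, under assumed names: obtain a model with
# historical accumulation (S101), pick the reference object from the
# number of detection paths (S102/S103), and reset at the reset time
# point by clearing the accumulated history (S104).

class SpeechDetectionModel:
    def __init__(self, num_paths):
        self.num_paths = num_paths
        self.history = []      # data with the historical-accumulation characteristic

    def reset(self):
        self.history.clear()   # S104: initialize the historical accumulation

def choose_reference_object(model):
    # S102: one path -> judge by the current detection result;
    # more than one -> judge by the current time point.
    return "current_result" if model.num_paths == 1 else "current_time"

model = SpeechDetectionModel(num_paths=2)    # S101
print(choose_reference_object(model))        # -> current_time
model.history = [0.2, 0.5, 0.9]
model.reset()                                # S104
print(model.history)                         # -> []
```

The two branches of `choose_reference_object` correspond to the single-path and multi-path embodiments detailed further below.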
The audio data processing method provided in the embodiments of the present invention is applied in speech detection or speech recognition scenarios, such as wake-up word detection; the embodiment of the present invention imposes no restriction on this.
The method is illustrated below using a wake-up word detection scenario as an example.
In the wake-up word detection scenario shown in Fig. 5A, the audio data processing apparatus receives audio data to be detected in real time, inputs the received audio data into the wake-up word detection model (i.e., the speech detection model) for recognition, and finally outputs a wake-up word detection result; whether to wake the apparatus is decided according to this detection result.
Illustratively, the audio data to be detected may be a continuous monophonic signal (a continuous time-domain or continuous frequency-domain signal; the embodiment of the present invention imposes no restriction on this), which is typically fed into the wake-up word detection model frame by frame. After receiving each input frame, the wake-up word detection model detects/judges whether the predefined wake-up word appears within the latest time window T, i.e., identifies whether the input is the preset wake-up word. Finally, the wake-up word detection model outputs a detection result per frame.
It should be noted that the embodiments of the present invention do not restrict the output form of the detection result: it may be a specific score, a yes/no indication, or either of two identification forms of the wake-up word, such as a binary representation or a text result; the embodiment of the present invention imposes no restriction on this.
Illustratively, with a binary representation, an output of 1 indicates that the wake-up word was detected within the time window T, and an output of 0 indicates that it was not.
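The frame-by-frame binary output over a sliding window of T frames can be illustrated with a toy detector; the window length, scores and threshold below are invented for the example.

```python
# Toy illustration of the binary per-frame output: emit 1 for frame t if
# any score inside the latest window of T frames reaches the threshold,
# else 0. Scores stand in for a real wake-word detector's outputs.

def frame_outputs(scores, window, threshold):
    out = []
    for t in range(len(scores)):
        recent = scores[max(0, t - window + 1): t + 1]
        out.append(1 if max(recent) >= threshold else 0)
    return out

scores = [0.1, 0.2, 0.95, 0.3, 0.1, 0.1]
print(frame_outputs(scores, window=3, threshold=0.8))  # -> [0, 0, 1, 1, 1, 0]
```

A single high-scoring frame keeps the output at 1 for the following T-1 frames, matching the "detected within the latest time window T" convention described above.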
Based on the speech-recognition-like scenario shown in Fig. 5A, the embodiments of the present invention propose a method for choosing the moment at which the speech detection model is reset, so that during subsequent audio data detection the reset speech detection model maintains a high level of recognition accuracy.
Here, the audio data processing apparatus performs speech recognition using a speech detection model, which is a correspondence between the audio data of at least one detection path with a historical-accumulation characteristic and a speech recognition result. The apparatus first obtains the speech detection model. Since the model may contain one or more detection paths for speech recognition, the apparatus first detects the detection paths of the model; after detecting at least one detection path, it determines, based on the number of detection paths, the reference object corresponding to each case. The reference object is the factor used to judge whether to perform a reset operation, chosen so that resetting the model at the reset time point determined by this judgement preserves the data or characteristics that keep wake-up word detection accurate. Having obtained the reference object, the apparatus determines, for the two kinds of detection-path cases, the corresponding reset time point, where the reset time point is a moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is preserved. When the reset time point arrives, the speech detection model is reset, yielding a reset speech detection model.
In some embodiments of the invention, the specific reset process is: when the reset time point arrives, initialize the data with the historical-accumulation characteristic in the speech detection model, obtaining a reset speech detection model.
In some embodiments of the invention, when the number of detected detection paths is one, the reference object is determined to be the current detection result.
In some embodiments of the invention, when the number of detected detection paths is greater than one, the reference object is determined to be the current time point.
That is, the embodiments of the present invention distinguish the case of a single detection path from the case of at least two (more than one) detection paths. With a single detection path, the audio data processing apparatus judges the reset time point of the speech detection model based on the current detection result; with at least two detection paths, it judges the reset time point based on the current time point, or more precisely according to the current time point and a preset reset time condition, as detailed in the embodiments below.
It can be understood that, because the audio data processing apparatus can decide how to judge the reset operation from the number of detection paths of the speech detection model, and then further determine the reset time point based on the reference object, different detection-path configurations of the speech detection model lead, through different reference objects, to their respective reset time points. The reset time point is a moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is preserved; once the model has been reset at that point, it carries no historical trace. In this way, on the premise that wake-up performance is guaranteed at the reset time, and free from the influence of long-term historical accumulation, the speech detection model achieves higher accuracy when recognizing the wake-up word.
In some embodiments of the invention, after S104, the audio data processing apparatus can perform speech recognition with the reset speech detection model, and the recognition accuracy of the resulting detection results is correspondingly high.
It should be noted that, in the embodiments of the present invention, the speech detection model is a speech recognition model with a historical-accumulation characteristic, for example an LSTM.
An LSTM is a recurrent neural network over time that can selectively remember historical information (the historical-accumulation characteristic). It improves on the RNN model: replacing the hidden-layer nodes of an RNN with LSTM units yields an LSTM network.
The state of the memory cell (Memory Cell, Cell) of an LSTM unit (i.e., the core gate) is controlled by three gates: the input gate, the forget gate and the output gate.
The input gate selectively feeds the current data into the memory cell; the forget gate regulates the influence of historical information on the current memory cell state; the output gate selectively outputs the memory cell state. The three gates together with the independent memory cell give the LSTM unit the ability to save, read, reset and update long-range historical information. Illustratively, Fig. 6 shows the structure of one LSTM memory cell.
First, the input feature x_t at time t and the hidden-layer variable h_{t-1} at time t-1, under the joint action of the weight transfer matrices W and U and the bias vector b, generate the state quantities i_t, f_t and o_t at time t; see formulas (1) to (3). Then, with the aid of the core gate state c_{t-1} at time t-1, the core gate state c_t at time t is generated; see formula (4). Finally, under the action of the core gate state c_t and the output gate state o_t, the hidden-layer variable h_t at time t is generated, which in turn influences the internal change of the LSTM neuron at time t+1; see formula (5).
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)   (1)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)   (2)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)   (3)
c_t = f_t * c_{t-1} + i_t * φ(W_c x_t + U_c h_{t-1} + b_c)   (4)
h_t = o_t * φ(c_t)   (5)
Here, the two nonlinear activation functions are σ(x_t) = 1 / (1 + e^(-x_t)) and φ(x_t) = tanh(x_t).
i_t, f_t, o_t and c_t denote, respectively, the input gate state, forget gate state, output gate state and core gate state at time t. In the embodiments of the present invention, for each logic gate, W_i, W_f, W_o and W_c denote the weight transfer matrices corresponding to the input gate, forget gate, output gate and core gate; U_i, U_f, U_o and U_c denote the weight transfer matrices applied to the hidden-layer variable h_{t-1} at time t-1 for the input gate, forget gate, output gate and core gate; and b_i, b_f, b_o and b_c denote the bias vectors corresponding to the input gate, forget gate, output gate and core gate.
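Equations (1)-(5) can be transcribed directly into code. The sketch below uses plain Python lists so each term is explicit; the tiny dimensions and constant weights are illustrative stand-ins, not trained parameters.

```python
# One LSTM unit step, transcribed term by term from equations (1)-(5).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # σ in the equations

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b are dicts keyed by gate: 'i', 'f', 'o', 'c' (core)."""
    def gate(k, act):
        wx = matvec(W[k], x_t)
        uh = matvec(U[k], h_prev)
        return [act(a + u + bb) for a, u, bb in zip(wx, uh, b[k])]
    i_t = gate('i', sigmoid)                     # eq. (1)
    f_t = gate('f', sigmoid)                     # eq. (2)
    o_t = gate('o', sigmoid)                     # eq. (3)
    cand = gate('c', math.tanh)
    c_t = [f * cp + i * g                        # eq. (4)
           for f, cp, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]   # eq. (5)
    return h_t, c_t

# Toy 2-dimensional demo with constant weights and zero initial state.
W = {k: [[0.5, 0.5], [0.5, 0.5]] for k in 'ifoc'}
U = {k: [[0.0, 0.0], [0.0, 0.0]] for k in 'ifoc'}
b = {k: [0.0, 0.0] for k in 'ifoc'}
h, c = lstm_step([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], W, U, b)
```

Because c_t depends on c_{t-1} through the forget gate, the cell state carries exactly the kind of historical trace that the reset operation described in this document clears.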
Illustratively, because an LSTM has a historical trace (which can be understood as the historical-accumulation characteristic), speech detection or speech recognition on audio data to be detected is influenced by historical detection data, which in turn affects the output detection result. That historical trace is finite and cannot extend without bound; moreover, within the time span over which the historical trace exists, the false-wake-up rate climbs as the apparatus's standby time grows, i.e., the probability of false wake-up keeps increasing. The reset time point in the embodiments of the present invention is precisely a time point set within this finite historical-trace window: resetting the speech detection model at the reset time point restores good wake-up performance. Concretely, the reset process is that, at the reset time point, the audio data processing apparatus initializes and clears the data with historical memory stored in the speech detection model, so that the reset model is no longer affected by the historical trace of a long standby period.
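In a recurrent detector of this kind, the only data with a historical-accumulation characteristic is the carried-over state, so the "initialization cleaning" described above amounts to re-zeroing that state. A stand-in model (not the patented one) makes this concrete:

```python
# Sketch of the reset operation: the history trace lives in the
# carried-over state vectors, and resetting re-initializes them.
# The update rule here is a toy stand-in for a real detector.

class RecurrentDetector:
    def __init__(self, size):
        self.size = size
        self.h = [0.0] * size   # hidden state (history trace)
        self.c = [0.0] * size   # cell state (history trace)

    def step(self, frame_energy):
        # Toy update: the state leaks in a fraction of each frame's energy.
        self.c = [0.9 * c + 0.1 * frame_energy for c in self.c]
        self.h = [0.5 * c for c in self.c]

    def reset(self):
        # Initialization cleaning at the reset time point.
        self.h = [0.0] * self.size
        self.c = [0.0] * self.size

d = RecurrentDetector(size=2)
for energy in [1.0, 1.0, 1.0]:
    d.step(energy)
print(any(v != 0.0 for v in d.c))   # -> True (history has accumulated)
d.reset()
print(d.c, d.h)                     # -> [0.0, 0.0] [0.0, 0.0]
```

After `reset()` the detector's next outputs no longer depend on anything heard during standby, which is exactly the property the document attributes to the reset model.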
In some embodiments of the invention, when the number of detected detection paths is one, the reference object is the current detection result, and the audio data processing apparatus resets the model in the course of speech recognition. Referring to Fig. 7, Fig. 7 is an optional flow diagram of the audio data processing method provided in the embodiments of the present invention; after S102, S201-S205 may also be executed, as follows.
S201: Obtain audio data to be detected.
S202: Recognize the audio data to be detected using the speech detection model, obtaining a current detection result.
S203: When the current detection result meets the preset reset threshold, determine the current time point as the reset time point.
Here, the preset reset threshold is greater than or equal to the preset wake-up threshold.
In the embodiments of the present invention, when the number of detected detection paths is one, the reference object is the current detection result, and the audio data processing apparatus resets the model during speech recognition.
In S201, the audio data processing apparatus obtains or receives the audio data to be detected in real time.
Because the data is obtained in real time, the audio data to be detected may be external noise, or a continuous signal input by a user or another sounding device; the embodiment of the present invention imposes no restriction on this.
In S202, since the audio data processing apparatus is provided with the speech detection model, after receiving the audio data to be detected it can perform speech recognition on that data with the speech detection model and output a current detection result.
In the embodiments of the present invention, in the course of performing speech detection on the audio data to be detected, the audio data processing apparatus first extracts audio features from the audio data and feeds the audio features into the speech detector, which outputs the current detection result.
In some embodiments of the invention, the feature extraction methods include SPP feature extraction, mel-frequency cepstral coefficient features and the like; the embodiment of the present invention imposes no restriction on this.
It should be noted that the detection result in the embodiments of the present invention may be a score or identification information (for example, 0 or 1); the embodiment of the present invention imposes no restriction on this.
In S203, the preset reset threshold is a value of the same type as the current detection result, that is, data that can be compared with the current detection result. The audio data processing apparatus compares the current detection result with the preset reset threshold; when the current detection result meets the preset reset threshold, this indicates that the speech detection model can be reset, so the apparatus takes the current time point and determines it to be the reset time point. Here, the preset reset threshold is greater than or equal to the preset wake-up threshold.
In the embodiments of the present invention, the preset reset threshold may characterize a lower-limit value at which the speech detection model can be reset, or a value range within which it can be reset; when the current detection result meets that lower-limit value, or falls within that range, the speech detection model can be reset.
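The ordering constraint (preset reset threshold >= preset wake-up threshold) can be sketched in a few lines; the point values match the worked example given later in this description, and the function name is an assumption.

```python
# Single-path decision order: because RESET_THRESHOLD >= WAKE_THRESHOLD,
# a reset can only follow a completed wake-up judgement, never interrupt one.

WAKE_THRESHOLD = 80
RESET_THRESHOLD = 90   # must satisfy RESET_THRESHOLD >= WAKE_THRESHOLD

def judge(score):
    woken = score >= WAKE_THRESHOLD
    reset = score >= RESET_THRESHOLD   # evaluated after the wake judgement
    return woken, reset

print(judge(85))   # -> (True, False): wake, but keep the model's history
print(judge(95))   # -> (True, True): wake first, then reset the model
print(judge(70))   # -> (False, False)
```

Inverting the two thresholds would allow a reset to fire before the wake-up condition is reached, which is precisely the failure mode the description warns against.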
It should be noted that, when applied in the wake-up word detection scenario shown in Fig. 5B, the audio data processing apparatus obtains the audio data to be detected, recognizes it using the wake-up word detection model (the speech detection model) to obtain the current detection result, and judges whether to reset according to that result: when the current detection result meets the preset reset threshold, the current time point is determined to be the reset time point, and the wake-up word detection algorithm is reset at that point.
The preset reset threshold must be greater than the preset wake-up threshold. The preset wake-up threshold is the threshold on the detection result that determines whether the wake-up function of the audio data processing apparatus can be triggered.
It should be noted that, in the embodiments of the present invention, the current detection result is used both for the reset judgement and for the wake-up judgement.
In the embodiments of the present invention, the speech detection model (the wake-up word detection algorithm) is reset when the current detection result exceeds the preset reset threshold. It can be understood that, when the preset reset threshold is chosen to be greater than or equal to the wake-up threshold, the reset operation always follows the wake-up judgement. This avoids performing a reset in the middle of wake-up word detection, which would cause recognition errors and reduce accuracy.
Illustratively, suppose the audio data processing apparatus performs speech detection on audio data 1 and obtains a detection result of 85 points, with the preset reset threshold at 90 points and the preset wake-up threshold at 80 points. In this detection, the apparatus passes the wake-up judgement and is woken, but does not meet the reset threshold, so the speech detection model need not be reset. If, however, the detection result is 95 points, with the score output by the speech detection model climbing slowly until it finally reaches 95, then the wake-up judgement is made when the result rises past 80 points and the apparatus is woken, and only when the result keeps climbing past 90 points is a reset of the speech detection model judged necessary, by which time the wake-up judgement has already been completed. Were the preset reset threshold instead smaller than the preset wake-up threshold, the speech detection model would keep being reset before the wake-up condition was reached, producing spurious resets; keeping the reset threshold above the wake-up threshold avoids the risk of resetting in the middle of wake-up word detection.
It should be noted that the preset reset threshold and the preset wake-up threshold are of the same type; the embodiments of the present invention impose no restriction on their specific values.
It should be noted that the above setting of the reset time point is best suited to usage scenarios in which the user needs to perform multiple wake-up operations within a short time.
In the embodiments of the present invention, if the user needs to perform multiple wake-up operations within a short time, then once the score output by the speech detection model for the user's wake-up word (the audio data to be detected), i.e., the current detection result, has successfully exceeded the preset reset threshold once, every subsequent wake-up operation or wake-up judgement obtains the optimal wake-up performance response (because wake-up performance immediately after a reset is always optimal). Meanwhile, because the speech detection model has been reset, the next wake-up word more easily obtains a high score, and a high score in turn more easily reaches the preset reset threshold, i.e., more easily triggers another reset of the speech detection model.
Meanwhile, regarding false wake-up: if the preset reset threshold is sufficiently high (greater than or equal to the preset wake-up threshold used for waking), the probability that noise causes a reset of the speech detection model while the audio data processing apparatus is in standby is very small. Moreover, because the expected mean of the time span from an initialization of the speech detection model to its first false wake-up is far larger than the typical interval between resets, the probability of a noise-induced false wake-up or spurious reset before the false-wake-up behavior reaches its optimal state is very low. Therefore, even if the apparatus is falsely woken by noise and spuriously reset while in standby, its wake-up performance suffers no obvious damage, and the accuracy of speech recognition, e.g., of wake-up operations, is still improved.
S204: Obtain the history detection results prior to the current time point.
S205: When the variation range between the current detection result and the history detection results meets the preset false-wake-up range, determine the current time point as the reset time point.
In S204, the audio data processing apparatus obtains the audio data to be detected in real time, and can therefore perform speech detection or speech recognition in real time, accumulating many detection results. Before the current time point, the apparatus has already performed speech detection many times, so it can obtain the history detection results prior to the current time point.
Illustratively, before time t the audio data processing apparatus obtains the 50 history detection results of the 50 speech detections preceding time t.
In some embodiments of the invention, the audio data processing apparatus may instead take all detection results within a preset period before the current time point as the history detection results; the embodiment of the present invention imposes no restriction on the specific implementation.
In S205, based on the current detection result and the history detection results, the audio data processing apparatus can judge whether, across repeated detections, the detection result is varying violently or over a large range. When the detection result changes greatly, declining quickly and sharply, the speech detection model needs to be reset. That is, when the variation range between the current detection result and the history detection results meets the preset false-wake-up range, the current time point is determined to be the reset time point; the speech detection model is reset at that point, and speech recognition or speech detection then continues.
Here, the preset false-wake-up range characterizes the numerical range of a sharp decline in the detection result; within this range, the probability of false wake-up is very high.
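One way to realize the S205 judgement is to compare the current result against the recent average; the window and drop ratio below are illustrative tuning assumptions, since the description does not fix the exact form of the preset false-wake-up range.

```python
# Sketch of the S205 trigger: reset when the current detection result
# drops sharply relative to the recent history of results.

def should_reset(history, current, drop_ratio=0.5):
    """True when the current result falls below drop_ratio of the
    recent average, i.e. a quick and violent decline."""
    if not history:
        return False
    avg = sum(history) / len(history)
    return current < drop_ratio * avg

history = [62, 60, 61, 63, 59]        # stable scores in normal noise
print(should_reset(history, 58))      # -> False: slow, small decline
print(should_reset(history, 12))      # -> True: abrupt collapse, reset
```

A slow, small decline under familiar noise leaves the state intact; only the kind of abrupt collapse attributed above to loud or unseen noise triggers a reset.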
It should be noted that the speech detection model is reset when the detection result exhibits a quick and violent decline. It can be understood that, under normal noise (noise of a type contained in the training data of the speech detection model), the detection result of a speech detection model with a historical trace generally declines only slowly and slightly; only very loud noise, or a noise type the speech detection model never encountered during training, causes a quick and sharp decline of the detection result during speech detection, which in turn visibly degrades wake-up performance in the following period. Therefore, resetting the speech detection model when the audio data processing apparatus detects this kind of variation in the detection result avoids that problem, without significantly affecting wake-up performance, false-wake-up performance, memory or computation under ordinary usage scenarios, and also improves wake-up accuracy.
It should be noted that, in the embodiments of the present invention, S203 and S204-S205 are two optional implementations after S202; the audio data processing apparatus may execute either after S202 according to the actual situation, and the embodiment of the present invention imposes no restriction on this.
In some embodiments of the invention, when the quantity of the detection path detected is greater than one, references object
For current point in time, detection path at this moment includes: backup detection path and main detection path;Audio-frequency data processing device is in language
The reset process of progress model is audio data processing side provided in an embodiment of the present invention referring to Fig. 8, Fig. 8 during sound identifies
S301-S306 can also be performed after S102 in one optional flow diagram of method.It is as follows:
S301, current point in time is obtained.
S302, when current point in time reaches default preheating time point, current point in time is determined as backup detection path
Reset time point, wherein default preheating time point be since default reset time point before section of default preheating time
Time point.
S303, when current point in time reaches default preheating time point, reset and start backup detection path.
S304, speech recognition is carried out using main detection path and backup detection path.
S305, after by default preheating time section, when reaching the default reset time point, it is logical to reset main detection
Road.
S306, when since default reset time point using default preheating time section, close backup detection path, adopt
Speech recognition is carried out with main detection path.
In embodiments of the present invention, when the quantity of the detection path detected is greater than one, references object is current
Time point, and detection path includes: backup detection path and main detection path;Wherein, backup detection path and main detection path
The number embodiment of the present invention all with no restriction.
Illustratively, speech detection process schematic diagram shown in Figure 9, in audio-frequency data processing device, with one
It is illustrated for main detection path and a backup detection path, is provided among main detection path and backup detection path
Resetting and starting controller, the resetting and starting controller are used to control the resetting of main detection path, and control backup detection
The resetting and starting of access.Audio data to be detected can be detected after main detection path and backup detection path
As a result (main testing result and backup testing result), finally, final by being exported again after the progress integrated treatment of all testing results
Testing result, i.e., total testing result.
In embodiments of the present invention, references object is current point in time, specifically, audio-frequency data processing device is to be based on working as
Preceding time point and preset time condition carry out the determination of reset time point.
Wherein, the time parameter in preset time condition includes default reset time point, presets optimal wake-up upper limit value, is pre-
If best false wake-up lower limit value, default preheating time section and default wake-up word duration.Wherein, preset preheating time point be from
Preset the time point of the section of default preheating time before reset time point starts.
In this way, audio-frequency data processing device is after obtaining current point in time, when current point in time reaches default preheating time
When point, current point in time is determined as to the reset time point of backup detection path.When current point in time reaches default preheating time
When point, resets and start backup detection path.Speech recognition is carried out using main detection path and backup detection path.When by pre-
If when reaching the default reset time point, resetting main detection path after preheating time section.It is opened when from default reset time point
When beginning using default preheating time section, backup detection path is closed, speech recognition is carried out using main detection path.
The time parameters of the preset time condition satisfy the following relations:
the preset reset time points form a time series spaced by a preset time interval;
the preset time interval lies between twice the preset warm-up period and the preset tolerance wake-up threshold;
the preset tolerance wake-up threshold lies between the preset optimal wake-up upper limit and the preset optimal false-wake-up lower limit;
the preset warm-up period is greater than or equal to the preset wake-up-word duration.
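The relations among these time parameters can be sketched as a small validity check. This is an illustrative sketch only; the names (T0 for the optimal wake-up upper limit, T1 for the optimal false-wake-up lower limit, T2 for the tolerance wake-up threshold, K for the warm-up period, tau for the wake-up-word duration, D for the reset interval) are assumptions, not part of any real API.

```python
# Hypothetical check of the preset time condition described above.
# T0, T1, T2, K, tau, D are illustrative parameter names.

def time_condition_ok(T0, T1, T2, K, tau, D):
    """Return True if the preset time parameters satisfy the stated relations."""
    return (
        2 * K < D <= T2      # reset interval D lies between 2K and the tolerance T2
        and T0 <= T2 <= T1   # tolerance threshold between the two optimal limits
        and K >= tau         # warm-up period covers the wake-up-word duration
    )

print(time_condition_ok(T0=10.0, T1=60.0, T2=30.0, K=2.0, tau=1.5, D=20.0))  # True
```

A parameter set that violates any one relation (for example, a reset interval D not exceeding 2K) would fail this check.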
It should be noted that, for a speech detection model with a historical-accumulation characteristic, the wake-up success rate in a wake-up detection scenario changes over time.
Illustratively, Figure 10 shows the relation between the first-wake-up success rate and the standby time. After the audio data processing device has been in standby (i.e., has received no wake-up word from the user) for a time t satisfying t ≥ T_0, the wake-up success rate of the next one or first few wake-up operations drops markedly. The magnitude of the drop depends on t and on the intensity and characteristics of the ambient noise during the standby period t. Here T_0 denotes the lower limit of the history-insensitive time of the wake-up-word detection algorithm (i.e., of the speech detection model), that is, the preset optimal wake-up upper limit. When t ≤ T_0, the wake-up success rate does not drop significantly (provided the ambient noise in the standby period does not differ too much from the noise data used when training the model). The value of T_0 depends on the data configuration used during model training. Moreover, the historical tracking duration of the wake-up-word detection algorithm is limited; it is denoted T_1 (the preset optimal false-wake-up lower limit), and its value is determined by the model structure and tuning parameters of the speech detection model. Data accumulated earlier than T_1 has no effect (or a negligible effect) on the current result of the wake-up-word detection algorithm.
Therefore, in the embodiments of the present invention, t ≤ T_1 is the range within which the false-wake-up performance remains optimal.
In the embodiments of the present invention, when the user's wake-up operations are randomly distributed in time and the interval between two successive wake-up operations (the preset time interval) is long, a reset operation must be performed in the standby state to ensure that the standby time t before the user's next wake-up operation satisfies t ≤ T_1.
Illustratively, as shown in Figure 11, while the audio data processing device is in standby, the reset-and-start controller issues reset and start operations to the wake-up-word detection algorithm of the backup detection path at the moments {t_1 − K, t_2 − K, t_3 − K, …}. After the detection module of the backup detection path receives the reset-and-start command, it clears its internally accumulated historical data and begins receiving the input audio data to be detected. Here K is called the preset warm-up period; K must be greater than or equal to the preset wake-up-word duration τ, i.e. K ≥ τ, to guarantee that the backup detection path can correctly detect a wake-up word, which improves the accuracy of wake-up-word detection.
In addition, every period D (at each preset reset time point), the reset-and-start controller module issues a reset operation to the wake-up-word detection module of the main detection path. D may be a constant smaller than T_1, or a random number regenerated each time.
In the embodiments of the present invention, the preset reset time points are denoted {t_1, t_2, t_3, …}. The choice of reset time points must satisfy formula (6):

2K < t_{i+1} − t_i ≤ T_2  (6)

where K is the preset warm-up period and T_2 is the tolerable performance-degradation time (the preset tolerance wake-up threshold) chosen at system design, satisfying T_0 ≤ T_2 ≤ T_1.
At the moments {t_1 + K, t_2 + K, t_3 + K, …}, the reset-and-start controller issues a stop command to the wake-up-word detection algorithm of the backup detection path, and the backup detection path stops running or is closed. The running time of the backup detection path therefore extends from t_i − K to t_i + K.
It can be understood that if t_i falls within the audio data of some wake-up word, then at least the backup detection path receives the complete audio data of that wake-up word, so the wake-up word is still detected and the accuracy of wake-up-word detection is improved. Moreover, as long as formula (7) is satisfied:

T_0 / 2 ≥ K ≥ τ  (7)

any wake-up word occurring in the period from t_i − K to t_i + K will be answered by the backup detection path at its optimal wake-up performance, achieving optimal wake-up-word detection accuracy.
It should be noted that, in the embodiments of the present invention, the initial state of the backup detection path is closed; it is started only when the preset warm-up time point is reached.
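The reset schedule described above (reset and start the backup path at t_i − K, reset the main path at t_i, stop the backup path at t_i + K) can be sketched as an event timeline. The function and action names below are illustrative assumptions, not from the patent or any real API.

```python
# Illustrative sketch of the reset schedule. For each preset reset time point
# t_i, the reset-and-start controller:
#   - resets and starts the backup detection path at t_i - K,
#   - resets the main detection path at t_i,
#   - stops the backup detection path at t_i + K.

def reset_schedule(reset_points, K):
    """Return chronologically sorted (time, action) events for reset points t_i."""
    events = []
    for t in reset_points:
        events.append((t - K, "reset_and_start_backup"))
        events.append((t, "reset_main"))
        events.append((t + K, "stop_backup"))
    return sorted(events)

for time, action in reset_schedule([10.0, 30.0, 50.0], K=2.0):
    print(f"{time:5.1f}s  {action}")
```

With reset points spaced by more than 2K, as formula (6) requires, the backup path's running windows never overlap.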
In some embodiments of the present invention, the specific process by which the audio data processing device performs speech recognition in S304 is: receive the audio data to be detected; perform speech recognition on the audio data to be detected using the main detection path and the backup detection path respectively, obtaining a main detection result and a backup detection result; jointly process the main detection result and the backup detection result to obtain a total detection result; and, when the total detection result exceeds the preset wake-up threshold, identify the audio data to be detected as a wake-up word and start the wake-up function.
In some embodiments of the present invention, the specific process by which the audio data processing device performs speech recognition in S306 is: receive the audio data to be detected; perform speech recognition on the audio data to be detected using the main detection path, obtaining a main detection result; and, when the main detection result exceeds the preset wake-up threshold, identify the audio data to be detected as a wake-up word and start the wake-up function.
In the embodiments of the present invention, while the backup detection path is started, both the main detection path and the backup detection path perform speech detection, so a main detection result and a backup detection result are both available; the audio data processing device can then make the wake-up decision based on the combination of the main detection result and the backup detection result, i.e. the total detection result. When the backup detection path is stopped or closed, only the main detection path performs speech detection, so only the main detection result is available, and the audio data processing device makes the wake-up decision based on the main detection result. In this way, as the accuracy of speech recognition improves, the wake-up accuracy improves as well.
In the embodiments of the present invention, the wake-up-word detection results of the main detection path and the backup detection path are combined, and after joint processing the total detection result is output.
Illustratively, a simple joint processing of the detection results is: while the backup detection path is not running (t_{i−1} + K to t_i − K), use only the main detection result of the main detection path; while the main path and the backup path run simultaneously (t_i − K to t_i + K), use the higher of the detection results of the main detection path and the backup detection path. Denoting the main detection result by z(t), the backup detection result by b(t), and the total detection result after joint processing by s(t), this gives formula (8):

s(t) = z(t), t ∈ (t_{i−1} + K, t_i − K)
s(t) = max(z(t), b(t)), t ∈ (t_i − K, t_i + K)  (8)

It should be noted that the joint processing may also be an arithmetic mean, a geometric mean, a weighted combination, or the like; the embodiments of the present invention impose no restriction here.
In the embodiments of the present invention, once the audio data processing device has obtained the total detection result, it can compare it with the preset wake-up threshold and make the wake-up decision.
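A minimal sketch of the combination rule of formula (8), assuming the main and backup detection scores are available as functions of time; all names here are illustrative, not from the patent.

```python
# Sketch of formula (8): use the main score alone while the backup path is
# idle, and the larger of the two scores while both paths run.

def total_score(t, z, b, t_prev, t_i, K):
    """Combine main score z(t) and backup score b(t) per formula (8).

    t_prev is the previous reset point t_{i-1}; t_i is the next reset point.
    """
    if t_prev + K < t <= t_i - K:   # backup path not running: main result only
        return z(t)
    if t_i - K < t <= t_i + K:      # both paths running: take the larger score
        return max(z(t), b(t))
    raise ValueError("t outside the modelled interval")

z = lambda t: 0.4   # hypothetical main detection score
b = lambda t: 0.7   # hypothetical backup detection score
print(total_score(9.5, z, b, t_prev=0.0, t_i=10.0, K=2.0))  # 0.7 (both active)
```

Replacing `max` with a mean or a weighted sum gives the other joint-processing variants the text mentions.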
In some embodiments of the present invention, building on the speech-detection-model reset described in the previous embodiments, refer to Figure 12, which is an optional flow diagram of the audio data processing method provided by an embodiment of the present invention. Figure 12 shows that, after S104, the audio data processing device can perform speech recognition using the reset speech detection model; in a specific implementation, S105 to S107 may also be executed, as follows:
S105: in speech detection based on at least one direction branch, perform speech recognition on each of the at least one direction branch according to the reset speech detection model, obtaining at least one current detection result.
S106: jointly process the at least one current detection result to obtain a comprehensive detection result.
S107: when the comprehensive detection result exceeds the preset wake-up threshold, identify the wake-up word and start the wake-up function.
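Steps S105 to S107 can be sketched as follows, assuming each direction branch yields a numeric detection score; the function names and the use of the maximum as the joint processing are illustrative assumptions.

```python
# Sketch of S105-S107: per-branch detection scores are jointly processed
# (here: maximum) and compared with the preset wake-up threshold.

def multi_branch_wake(branch_scores, wake_threshold, combine=max):
    """Return True if the combined per-branch score exceeds the threshold."""
    combined = combine(branch_scores)   # S106: joint processing of branch results
    return combined > wake_threshold    # S107: wake-up decision

print(multi_branch_wake([0.2, 0.8, 0.5], wake_threshold=0.6))  # True
```

Passing a different `combine` callable (e.g. a mean) models the other joint-processing options.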
In the embodiments of the present invention, a speech detection architecture with multiple direction branches may be used; the previous embodiments all describe the speech detection model architecture on a single direction.
In some embodiments of the present invention, the speech detection architecture with multiple direction branches (at least one direction) can distribute the microphone-array signals to the different direction branches through a microphone array. After the audio data to be detected is input, it can be passed to the speech detection of the multiple direction branches; each direction branch performs speech detection and obtains one detection result, so the speech detection of the multiple direction branches yields at least one detection result (as shown in Figure 13).
In the embodiments of the present invention, each direction branch is provided with a single-channel speech detection model, and this single-channel speech detection model is exactly the speech detection model described in the above embodiments.
Therefore, in speech detection based on at least one direction branch, the audio data processing device performs speech recognition on each of the at least one direction branch using the reset speech detection model (the reset single-channel speech detection model), obtaining at least one current detection result; it jointly processes the at least one current detection result to obtain a comprehensive detection result, and makes the wake-up decision based on the comprehensive detection result and the preset wake-up threshold, i.e. when the comprehensive detection result exceeds the preset wake-up threshold, it identifies the wake-up word and starts the wake-up function.
The reset single-channel speech detection model in each direction branch is obtained in the same way as all of the reset processes of the speech detection model described in the preceding embodiments.
That is, at the reset time points of the preceding embodiments, resetting the speech detection model can be implemented simply and independently in each direction branch of Figure 13, i.e. each direction branch performs the reset operation on itself according to its own detection result; alternatively, the single-channel speech detection models in all direction branches can be reset uniformly according to the maximum of the detection results across the direction branches.
Illustratively, Figure 14 shows the process of wake-up-word detection and reset detection for the multiple direction branches of Figure 13 using a single detection path. Figure 15 shows, for one direction branch, the process of wake-up-word detection and reset detection for the multiple direction branches of Figure 13 using one main detection path (single-channel wake-up-word detection) and one backup detection path (backup single-channel wake-up-word detection). Figure 16 shows that, during wake-up-word detection for multiple direction branches, different direction branches may use a single detection path and at least two detection paths in combination.
It should be noted that each direction branch may use any of the reset decision schemes of Figures 14 to 16; the embodiments of the present invention do not limit which specific direction branch is reset. Detailed descriptions were given in the preceding embodiments and are not repeated here.
In some embodiments of the present invention, in the scenario of a main detection path and a backup detection path, the reset and backup operations are performed on all direction branches in turn: at an optional reset time point t_i, the (i % N)-th branch is reset, where N is the number of branches and "%" denotes the remainder operation. Alternatively, at any reset time point t_i, the branch with the lowest current detection result is selected and the reset and backup operations are performed on it at time t_{i+1}; the embodiments of the present invention impose no restriction here.
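Both branch-selection strategies described above can be sketched as follows; the function names are illustrative assumptions.

```python
# Round-robin strategy: at the i-th reset time point, branch i % N is reset,
# where N is the number of direction branches.

def branch_to_reset(i, num_branches):
    """Select which direction branch is reset at the i-th reset time point."""
    return i % num_branches

# Alternative strategy: reset the branch with the lowest current detection score.
def branch_to_reset_min(scores):
    """Return the index of the branch whose current detection score is lowest."""
    return min(range(len(scores)), key=lambda k: scores[k])

# With N = 4 branches, successive reset points cycle through branches 0..3:
print([branch_to_reset(i, 4) for i in range(6)])  # [0, 1, 2, 3, 0, 1]
print(branch_to_reset_min([0.9, 0.2, 0.6]))       # 1
```

The round-robin form guarantees every branch is reset within N reset intervals; the minimum-score form resets first the branch most likely to be degraded.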
In the following, an exemplary application of the embodiments of the present invention in a practical scenario, wake-up-word detection on a smart speaker, is described, taking the reset scheme with at least two detection paths as an example.
As shown in Figure 17, at moment 1 the user utters audio data 1 (the audio data to be detected) containing the wake-up word (e.g. "Xiao Si"). The audio data 1 is received by the smart speaker, which performs wake-up detection and reset detection on it. The smart speaker compares moment 1 with the preset warm-up time point and the preset reset time point and finds that moment 1 reaches the preset warm-up time point; it therefore resets and starts the backup detection path. In this case, the smart speaker performs wake-up recognition using the main detection path and the backup detection path, obtaining a main detection result and a backup detection result; it jointly processes the main detection result and the backup detection result to obtain a total detection result; and when the total detection result exceeds the preset wake-up threshold, it identifies the audio data to be detected as the wake-up word, starts the wake-up function, and outputs a voice prompt (e.g. "I am here") to the user. The user thus knows that the next voice instruction can be issued to control the smart speaker to perform some application function. In the embodiments of the present invention, the application function may be an application function of the smart speaker itself, or an application function of another terminal in the same local area network, controlled via a server.
Illustratively, as shown in Figure 18, after the smart speaker has been woken up, it receives audio data 2, "turn on the TV". After the reset detection and wake-up decision described above, the smart speaker starts the function of turning on the TV: it generates a TV power-on instruction and sends it to the server, which turns on the TV over the network according to the instruction, and the prompt "powering on" is displayed on the TV screen.
An embodiment of the present invention provides a computer-readable storage medium storing executable instructions. When the executable audio data processing instructions are executed by a processor, the processor is caused to execute the audio data processing method provided by the embodiments of the present invention.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any device including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may take the form of a program, software, a software module, a script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to a file in a file system; they may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g. files storing one or more modules, subprograms, or code sections).
As an example, the executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
The above are merely embodiments of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.
Claims (15)
1. An audio data processing method, comprising:
obtaining a speech detection model, the speech detection model being a correspondence, with a historical-accumulation characteristic, between the audio data of at least one detection path and a speech recognition result;
determining a reference object based on the number of the at least one detection path detected, the reference object being the factor on which the reset-operation decision is based;
determining a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed;
and resetting the speech detection model when the reset time point is reached.
2. The method according to claim 1, wherein determining the reference object based on the number of the at least one detection path detected comprises:
when the number of detection paths detected is one, determining that the reference object is the current detection result;
and correspondingly, determining the reset time point based on the reference object comprises:
obtaining audio data to be detected;
recognizing the audio data to be detected using the speech detection model to obtain the current detection result;
and when the current detection result meets a preset reset threshold, determining the current time point as the reset time point;
wherein the preset reset threshold is greater than or equal to a preset wake-up threshold.
3. The method according to claim 1, wherein determining the reference object based on the number of the at least one detection path detected comprises:
when the number of detection paths detected is greater than one, determining that the reference object is the current time point;
and correspondingly, the at least one detection path comprises a backup detection path, and determining the reset time point based on the reference object comprises:
obtaining the current time point;
and when the current time point reaches a preset warm-up time point, determining the current time point as the reset time point of the backup detection path, wherein the preset warm-up time point is the time point one preset warm-up period before a preset reset time point.
4. The method according to claim 2, wherein, after recognizing the audio data to be detected using the speech detection model to obtain the current detection result, the method further comprises:
obtaining a history detection result from before the current time point;
and when the variation between the current detection result and the history detection result falls within a preset false-wake-up range, determining the current time point as the reset time point.
5. The method according to claim 3, wherein resetting the speech detection model when the reset time point is reached comprises:
when the current time point reaches the preset warm-up time point, resetting and starting the backup detection path.
6. The method according to claim 5, wherein the at least one detection path further comprises a main detection path, and after the resetting and starting of the backup detection path the method further comprises:
performing speech recognition using the main detection path and the backup detection path;
after the preset warm-up period has elapsed, when the preset reset time point is reached, resetting the main detection path;
and when the preset warm-up period has elapsed from the preset reset time point, closing the backup detection path and performing speech recognition using the main detection path.
7. The method according to any one of claims 3, 5, and 6, wherein:
the preset reset time points form a time series spaced by a preset time interval;
the preset time interval lies between twice the preset warm-up period and a preset tolerance wake-up threshold;
the preset tolerance wake-up threshold lies between a preset optimal wake-up upper limit and a preset optimal false-wake-up lower limit;
and the preset warm-up period is greater than or equal to a preset wake-up-word duration.
8. The method according to claim 6, wherein performing speech recognition using the main detection path and the backup detection path comprises:
receiving audio data to be detected;
performing speech recognition on the audio data to be detected using the main detection path and the backup detection path respectively, obtaining a main detection result and a backup detection result;
jointly processing the main detection result and the backup detection result to obtain a total detection result;
and when the total detection result exceeds a preset wake-up threshold, identifying the audio data to be detected as a wake-up word and starting the wake-up function.
9. The method according to claim 6, wherein performing speech recognition using the main detection path comprises:
receiving audio data to be detected;
performing speech recognition on the audio data to be detected using the main detection path, obtaining a main detection result;
and when the main detection result exceeds a preset wake-up threshold, identifying the audio data to be detected as a wake-up word and starting the wake-up function.
10. The method according to claim 1, wherein, after resetting the speech detection model when the reset time point is reached, the method further comprises:
performing speech recognition using the reset speech detection model.
11. The method according to claim 10, wherein performing speech recognition using the reset speech detection model comprises:
in speech detection based on at least one direction branch, performing speech recognition on each of the at least one direction branch according to the reset speech detection model, obtaining at least one current detection result;
jointly processing the at least one current detection result to obtain a comprehensive detection result;
and when the comprehensive detection result exceeds a preset wake-up threshold, identifying the wake-up word and starting the wake-up function.
12. The method according to claim 1, wherein resetting the speech detection model when the reset time point is reached comprises:
when the reset time point is reached, initializing the data with the historical-accumulation characteristic in the speech detection model to obtain the reset speech detection model.
13. An audio data processing device, comprising:
an obtaining unit, configured to obtain a speech detection model, the speech detection model being a correspondence, with a historical-accumulation characteristic, between the audio data of at least one detection path and a speech recognition result;
a determining unit, configured to determine a reference object based on the number of the at least one detection path detected, the reference object being the factor on which the reset-operation decision is based, and to determine a reset time point based on the reference object, the reset time point being the moment at which the historical accumulation in the speech detection model is initialized while speech recognition performance is guaranteed;
and a reset unit, configured to reset the speech detection model when the reset time point is reached.
14. An audio data processing device, comprising:
a memory, configured to store executable audio data processing instructions;
and a processor, configured to implement the method according to any one of claims 1 to 12 when executing the executable audio data processing instructions stored in the memory.
15. A computer-readable storage medium, storing executable audio data processing instructions which, when executed by a processor, implement the method according to any one of claims 1 to 12.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811361659.4A CN110164431B (en) | 2018-11-15 | 2018-11-15 | Audio data processing method and device and storage medium |
CN201910810103.7A CN110364162B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence resetting method and device and storage medium |
CN201910809813.8A CN110415698B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
CN201910809694.6A CN110517680B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
CN201910809323.8A CN110517679B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence audio data processing method and device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811361659.4A CN110164431B (en) | 2018-11-15 | 2018-11-15 | Audio data processing method and device and storage medium |
Related Child Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809813.8A Division CN110415698B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
CN201910810103.7A Division CN110364162B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence resetting method and device and storage medium |
CN201910809323.8A Division CN110517679B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence audio data processing method and device and storage medium |
CN201910809694.6A Division CN110517680B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110164431A true CN110164431A (en) | 2019-08-23 |
CN110164431B CN110164431B (en) | 2023-01-06 |
Family
ID=67645151
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809694.6A Active CN110517680B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
CN201910810103.7A Active CN110364162B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence resetting method and device and storage medium |
CN201910809323.8A Active CN110517679B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence audio data processing method and device and storage medium |
CN201811361659.4A Active CN110164431B (en) | 2018-11-15 | 2018-11-15 | Audio data processing method and device and storage medium |
CN201910809813.8A Active CN110415698B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
Family Applications Before (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809694.6A Active CN110517680B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
CN201910810103.7A Active CN110364162B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence resetting method and device and storage medium |
CN201910809323.8A Active CN110517679B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence audio data processing method and device and storage medium |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809813.8A Active CN110415698B (en) | 2018-11-15 | 2018-11-15 | Artificial intelligence data detection method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (5) | CN110517680B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111341297B (en) * | 2020-03-04 | 2023-04-07 | 开放智能机器(上海)有限公司 | Voice wake-up rate test system and method |
CN114039398B (en) * | 2022-01-07 | 2022-05-17 | 深圳比特微电子科技有限公司 | Control method and device of new energy camera equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1264888A (en) * | 1998-12-17 | 2000-08-30 | 索尼国际(欧洲)股份有限公司 | Semi-monitoring speaker self-adaption |
US20160180838A1 (en) * | 2014-12-22 | 2016-06-23 | Google Inc. | User specified keyword spotting using long short term memory neural network feature extractor |
CN107358948A (en) * | 2017-06-27 | 2017-11-17 | 上海交通大学 | Language in-put relevance detection method based on attention model |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Method for recognizing semantics, device, storage medium and electronic equipment |
CN107680597A (en) * | 2017-10-23 | 2018-02-09 | 平安科技(深圳)有限公司 | Audio recognition method, device, equipment and computer-readable recording medium |
CN107978311A (en) * | 2017-11-24 | 2018-05-01 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method, device and interactive voice equipment |
CN108597519A (en) * | 2018-04-04 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | A kind of bill classification method, apparatus, server and storage medium |
Family Cites Families (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3873418B2 (en) * | 1997-12-26 | 2007-01-24 | 三菱電機株式会社 | Voice spotting device |
FR2808917B1 (en) * | 2000-05-09 | 2003-12-12 | Thomson Csf | METHOD AND DEVICE FOR VOICE RECOGNITION IN FLUATING NOISE LEVEL ENVIRONMENTS |
JP2002366187A (en) * | 2001-06-08 | 2002-12-20 | Sony Corp | Device and method for recognizing voice, program and recording medium |
JP4316494B2 (en) * | 2002-05-10 | 2009-08-19 | 旭化成株式会社 | Voice recognition device |
US20040064314A1 (en) * | 2002-09-27 | 2004-04-01 | Aubert Nicolas De Saint | Methods and apparatus for speech end-point detection |
CN1996870A (en) * | 2006-01-04 | 2007-07-11 | 中兴通讯股份有限公司 | A method and device for automatic switching of the communication channels in the mutual-backup board |
CN101034390A (en) * | 2006-03-10 | 2007-09-12 | 日电(中国)有限公司 | Apparatus and method for verbal model switching and self-adapting |
CN101334998A (en) * | 2008-08-07 | 2008-12-31 | 上海交通大学 | Chinese speech recognition system based on heterogeneous model differentiated fusion |
US8700399B2 (en) * | 2009-07-06 | 2014-04-15 | Sensory, Inc. | Systems and methods for hands-free voice control and voice search |
JP2011180308A (en) * | 2010-02-26 | 2011-09-15 | Masatomo Okumura | Voice recognition device and recording medium |
US8428759B2 (en) * | 2010-03-26 | 2013-04-23 | Google Inc. | Predictive pre-recording of audio for voice input |
US8473287B2 (en) * | 2010-04-19 | 2013-06-25 | Audience, Inc. | Method for jointly optimizing noise reduction and voice quality in a mono or multi-microphone system |
US9110452B2 (en) * | 2011-09-19 | 2015-08-18 | Fisher-Rosemount Systems, Inc. | Inferential process modeling, quality prediction and fault detection using multi-stage data segregation |
CN102543071B (en) * | 2011-12-16 | 2013-12-11 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition system and method for mobile devices |
US20140365221A1 (en) * | 2012-07-31 | 2014-12-11 | Novospeech Ltd. | Method and apparatus for speech recognition |
KR20140077422A (en) * | 2012-12-14 | 2014-06-24 | 한국전자통신연구원 | Voice recognition performance improvement method |
CN104167206B (en) * | 2013-05-17 | 2017-05-31 | 佳能株式会社 | Acoustic model merging method and equipment and audio recognition method and system |
US9299340B2 (en) * | 2013-10-07 | 2016-03-29 | Honeywell International Inc. | System and method for correcting accent induced speech in an aircraft cockpit utilizing a dynamic speech database |
KR101528518B1 (en) * | 2013-11-08 | 2015-06-12 | 현대자동차주식회사 | Vehicle and control method thereof |
US20150255068A1 (en) * | 2014-03-10 | 2015-09-10 | Microsoft Corporation | Speaker recognition including proactive voice model retrieval and sharing features |
KR101598948B1 (en) * | 2014-07-28 | 2016-03-02 | 현대자동차주식회사 | Speech recognition apparatus, vehicle having the same and speech recognition method |
CN104538028B (en) * | 2014-12-25 | 2017-10-17 | 清华大学 | Continuous speech recognition method based on deep long short-term memory recurrent neural networks |
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
CN105096941B (en) * | 2015-09-02 | 2017-10-31 | 百度在线网络技术(北京)有限公司 | Audio recognition method and device |
CN105355198B (en) * | 2015-10-20 | 2019-03-12 | 河海大学 | Speech recognition method based on multiple adaptive model compensation |
CN116229981A (en) * | 2015-11-12 | 2023-06-06 | 谷歌有限责任公司 | Generating a target sequence from an input sequence using partial conditions |
CN106856092B (en) * | 2015-12-09 | 2019-11-15 | 中国科学院声学研究所 | Chinese speech keyword retrieval method based on feedforward neural network language model |
CN105489222B (en) * | 2015-12-11 | 2018-03-09 | 百度在线网络技术(北京)有限公司 | Audio recognition method and device |
CN107800860A (en) * | 2016-09-07 | 2018-03-13 | 中兴通讯股份有限公司 | Method of speech processing, device and terminal device |
CN108461080A (en) * | 2017-02-21 | 2018-08-28 | 中兴通讯股份有限公司 | Acoustic modeling method and apparatus based on HLSTM models |
CN107220845B (en) * | 2017-05-09 | 2021-06-29 | 北京星选科技有限公司 | User re-purchase probability prediction/user quality determination method and device and electronic equipment |
CN107544726B (en) * | 2017-07-04 | 2021-04-16 | 百度在线网络技术(北京)有限公司 | Speech recognition result error correction method and device based on artificial intelligence and storage medium |
CN108076224B (en) * | 2017-12-21 | 2021-06-29 | Oppo广东移动通信有限公司 | Application program control method and device, storage medium and mobile terminal |
US11043218B1 (en) * | 2019-06-26 | 2021-06-22 | Amazon Technologies, Inc. | Wakeword and acoustic event detection |
CN110415685A (en) * | 2019-08-20 | 2019-11-05 | 河海大学 | Speech recognition method |
- 2018
- 2018-11-15 CN CN201910809694.6A patent/CN110517680B/en active Active
- 2018-11-15 CN CN201910810103.7A patent/CN110364162B/en active Active
- 2018-11-15 CN CN201910809323.8A patent/CN110517679B/en active Active
- 2018-11-15 CN CN201811361659.4A patent/CN110164431B/en active Active
- 2018-11-15 CN CN201910809813.8A patent/CN110415698B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1264888A (en) * | 1998-12-17 | 2000-08-30 | 索尼国际(欧洲)股份有限公司 | Semi-supervised speaker adaptation |
US20160180838A1 (en) * | 2014-12-22 | 2016-06-23 | Google Inc. | User specified keyword spotting using long short term memory neural network feature extractor |
CN107358948A (en) * | 2017-06-27 | 2017-11-17 | 上海交通大学 | Language input relevance detection method based on attention model |
CN107644642A (en) * | 2017-09-20 | 2018-01-30 | 广东欧珀移动通信有限公司 | Semantic recognition method, apparatus, storage medium and electronic device |
CN107680597A (en) * | 2017-10-23 | 2018-02-09 | 平安科技(深圳)有限公司 | Speech recognition method, apparatus, device and computer-readable storage medium |
CN107978311A (en) * | 2017-11-24 | 2018-05-01 | 腾讯科技(深圳)有限公司 | Voice data processing method and apparatus, and voice interaction device |
CN108597519A (en) * | 2018-04-04 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | Bill classification method, apparatus, server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110415698B (en) | 2022-05-13 |
CN110364162B (en) | 2022-03-15 |
CN110517679A (en) | 2019-11-29 |
CN110164431B (en) | 2023-01-06 |
CN110364162A (en) | 2019-10-22 |
CN110517680B (en) | 2023-02-03 |
CN110517679B (en) | 2022-03-08 |
CN110517680A (en) | 2019-11-29 |
CN110415698A (en) | 2019-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107481718B (en) | Audio recognition method, device, storage medium and electronic equipment | |
CN107644642B (en) | Semantic recognition method and device, storage medium and electronic equipment | |
CN103065631B (en) | Speech recognition method and device | |
CN103971680B (en) | Speech recognition method and apparatus | |
CN109979438A (en) | Voice awakening method and electronic equipment | |
CN109192210B (en) | Voice recognition method, wake-up word detection method and device | |
CN110148405B (en) | Voice instruction processing method and device, electronic equipment and storage medium | |
CN104737101A (en) | Computing device with force-triggered non-visual responses | |
US11249645B2 (en) | Application management method, storage medium, and electronic apparatus | |
CN110544468B (en) | Application awakening method and device, storage medium and electronic equipment | |
US10911910B2 (en) | Electronic device and method of executing function of electronic device | |
CN108766438A (en) | Man-machine interaction method, device, storage medium and intelligent terminal | |
KR20190009488A (en) | An electronic device and system for deciding a duration of receiving voice input based on context information | |
CN109272991A (en) | Method, apparatus, equipment and the computer readable storage medium of interactive voice | |
US10950221B2 (en) | Keyword confirmation method and apparatus | |
CN112669822B (en) | Audio processing method and device, electronic equipment and storage medium | |
CN111722696B (en) | Voice data processing method and device for low-power-consumption equipment | |
CN110580897B (en) | Audio verification method and device, storage medium and electronic equipment | |
CN110164431A (en) | Audio data processing method and apparatus, and storage medium | |
CN112185382A (en) | Method, device, equipment and medium for generating and updating wake-up model | |
CN112269322A (en) | Awakening method and device of intelligent device, electronic device and medium | |
CN112037772A (en) | Multi-mode-based response obligation detection method, system and device | |
WO2020102991A1 (en) | Method and apparatus for waking up device, storage medium and electronic device | |
TWI748587B (en) | Acoustic event detection system and method | |
EP4276826A1 (en) | Electronic device providing operation state information of home appliance, and operation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||