CN112053686A - Audio interruption method and device and computer readable storage medium - Google Patents
- Publication number
- CN112053686A CN112053686A CN202010739039.0A CN202010739039A CN112053686A CN 112053686 A CN112053686 A CN 112053686A CN 202010739039 A CN202010739039 A CN 202010739039A CN 112053686 A CN112053686 A CN 112053686A
- Authority
- CN
- China
- Prior art keywords
- audio
- data
- feature vector
- vector data
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/222—Barge in, i.e. overridable guidance for interrupting prompts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0638—Interactive procedures
Abstract
The invention discloses an audio interruption method, an audio interruption device and a computer readable storage medium, wherein the method comprises the following steps: acquiring a plurality of feature vector data of the audio data; generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio; stopping the output of the current audio information according to the generated confidence. Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
Description
Technical Field
The present invention relates to the field of speech processing, and in particular, to an audio interrupt method and apparatus, and a computer-readable storage medium.
Background
The existing barge-in technology is mainly applied to intelligent customer-service dialogue: the user can interrupt the robot's speech at any time while the robot is talking. However, the recognition result of the ASR (Automatic Speech Recognition) system arrives with a large delay: from the moment the user starts speaking to the moment the interruption event is triggered, the delay is close to 1 s, so the intelligent customer service keeps performing TTS (Text To Speech) broadcasting for up to 1 s after the interruption, which degrades the interrupting user's experience.
Disclosure of Invention
The embodiment of the invention provides an audio interruption method, an audio interruption device and a computer-readable storage medium, which have the technical effects of reducing interruption delay and improving user experience.
One aspect of the present invention provides an audio interruption method, including: acquiring a plurality of feature vector data of the audio data; generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio; stopping the output of the current audio information according to the generated confidence.
In an embodiment, the obtaining the plurality of feature vector data of the audio data includes: extracting a plurality of continuous audio fragment data in the audio data in a streaming manner; and respectively extracting the characteristics of the plurality of audio fragment data to generate a plurality of characteristic vector data.
In one embodiment, the plurality of consecutive audio clip data are extracted at equal intervals, and adjacent audio clip data are partially overlapped with each other.
In an embodiment, the generating, for a plurality of the feature vector data, a confidence for characterizing the audio data as a specific audio includes: respectively generating, for each feature vector data, a probability value for characterizing the feature vector data as preset classification information; and generating a confidence for characterizing the audio data as a specific audio according to the probability value corresponding to each feature vector data.
In an embodiment, the generating, for each of the feature vector data, a probability value for characterizing the feature vector data as preset classification information includes: and respectively inputting each feature vector data into a classifier model for training, and respectively outputting a probability value for representing the feature vector data as preset classification information.
In an embodiment, the classifier model is a two-classifier model, and the preset classification information is human information.
In an embodiment, the generating a confidence level for characterizing the audio data as a specific audio according to the probability value corresponding to each of the feature vector data includes: counting in a streaming manner a number of at least some of the probability values exceeding a probability threshold; and if the counted number is judged to exceed the specified number threshold, generating a confidence coefficient for representing the audio data as the specific audio according to the probability value of the participated statistics.
In an embodiment, the generating a confidence for characterizing the audio data as a specific audio according to the probability values participating in the statistics comprises: selecting the probability values exceeding the probability threshold from the probability values participating in the statistics; and calculating the geometric mean of the selected probability values to generate the confidence, where the calculation formula is: Con = (p_1 × p_2 × … × p_M)^(1/M) when M ≥ T_c, and Con = 0 when M < T_c; where Con represents the confidence, M represents the number of probability values exceeding the probability threshold, p_i represents the probability value that the feature vector data is the preset classification information, T_p represents the probability threshold, and T_c represents the specified number threshold.
Another aspect of the present invention provides an audio interrupting device, comprising: the characteristic acquisition module is used for acquiring a plurality of characteristic vector data of the audio data; a confidence generating module, configured to generate, for a plurality of the feature vector data, a confidence for characterizing the audio data as a specific audio; and the confidence coefficient execution module is used for stopping the output of the current audio information according to the generated confidence coefficient.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform any of the audio interruption methods described above.
In the embodiment of the invention, the output of the current audio information is determined to stop by utilizing the confidence coefficient generated by the feature vector data, and the recognition result is not required to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart illustrating an implementation of an audio interruption method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a relationship between adjacent audio clip data according to an audio interruption method of the present invention;
fig. 3 is a schematic structural diagram of an audio interrupt device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of an audio interruption method according to an embodiment of the present invention.
As shown in fig. 1, an aspect of the present invention provides an audio interruption method, including:
step 101, acquiring a plurality of feature vector data of audio data;
step 102, generating, for the plurality of feature vector data, a confidence for characterizing the audio data as a specific audio;
and step 103, stopping the output of the current audio information according to the generated confidence.
In this embodiment, in step 101, the audio data may be acquired by an audio acquisition device, such as a voice recorder or a microphone, and the audio data may be a voice of a human being, a voice of an animal, or a natural sound.
In step 102, the specific audio may be one of a human voice, an animal voice, or a natural sound, and may be specified in advance according to the actual application.
In step 103, the confidence level is used to indicate the reliability of the audio data as a specific audio, and the higher the confidence level, the higher the probability that the audio data is a specific audio. The current audio information is mainly output by a machine end or an equipment end, and when the confidence coefficient meets a certain condition, the output of the current audio information is stopped.
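The three-step flow just described can be sketched end to end; the following is a minimal illustration in which the classifier and every threshold (`t_p`, `t_c`, `con_threshold`) are hypothetical placeholders, since the patent leaves the concrete values open:

```python
import math

def barge_in(feature_vectors, classify, t_p=0.5, t_c=10, con_threshold=0.6):
    """Decide whether to stop the current audio output (steps 101-103)."""
    # step 102a: one probability per feature vector that it is the target class
    probs = [classify(v) for v in feature_vectors]
    # step 102b: confidence is the geometric mean of the probabilities
    # above t_p, or 0 when fewer than t_c of them exceed the threshold
    selected = [p for p in probs if p > t_p]
    if len(selected) < t_c:
        return False  # keep playing
    con = math.prod(selected) ** (1.0 / len(selected))
    # step 103: stop the output when the confidence is high enough
    return con > con_threshold
```

`barge_in(vectors, classify)` returns `True` only when at least `t_c` probabilities exceed `t_p` and their geometric mean exceeds `con_threshold`, mirroring the decision the device end would make before cutting TTS playback.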
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
When the method is applied to an intelligent customer-service dialogue scene, once the device end judges that the received audio data is human voice, the intelligent customer service immediately stops the current audio output and continues to receive the user's voice.
The method can also be applied to audio output equipment, for example, in the process that the audio output equipment such as a vehicle-mounted sound box is playing, if the whistling sound around the vehicle is received, the current playing is stopped, so that a driver can hear the whistling sound, and the driving safety is improved.
In one embodiment, obtaining a plurality of feature vector data of audio data comprises:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
features of the plurality of pieces of audio segment data are extracted, respectively, to generate a plurality of pieces of feature vector data.
In this embodiment, the specific process of step 101 is as follows:
extracting a plurality of continuous audio fragment data from the audio data in an order from a head data node to a tail data node;
then, MFCC (Mel-Frequency Cepstral Coefficients) features or FilterBank features are extracted from each piece of audio segment data to generate a plurality of feature vector data.
Fig. 2 is a schematic diagram illustrating a relationship between adjacent audio clip data in an audio interruption method according to an embodiment of the present invention.
In one embodiment, the extraction time intervals of a plurality of consecutive audio clip data are equal, and the data overlap between adjacent audio clip data.
In the present embodiment, as shown in fig. 2, each audio segment is one frame, preferably 25 ms long. To avoid missing audio data, a new frame is preferably extracted every 10 ms, so that adjacent audio segments partially overlap; the shaded portion in fig. 2 is the overlapping portion.
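The 25 ms / 10 ms framing can be sketched as follows; the 16 kHz sample rate is an assumption, since the patent does not state one:

```python
import numpy as np

def frame_audio(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    # split a 1-D sample array into overlapping frames: 25 ms frames
    # taken every 10 ms, as in the text; 16 kHz is an assumed rate
    frame_len = sample_rate * frame_ms // 1000   # 400 samples per frame
    hop_len = sample_rate * hop_ms // 1000       # new frame every 160 samples
    n_frames = 1 + max(0, (len(samples) - frame_len) // hop_len)
    return np.stack([samples[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])

frames = frame_audio(np.zeros(16000))  # one second of audio
# adjacent frames share frame_len - hop_len = 240 samples, i.e. 15 ms
```

Each row of `frames` would then feed the MFCC or FilterBank feature extraction described above.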
In an embodiment, for each feature vector data, generating a confidence for characterizing the audio data as a specific audio comprises:
respectively generating probability values for representing the feature vector data as preset classification information aiming at each feature vector data;
and generating a confidence coefficient for representing the audio data as the specific audio according to the probability value corresponding to each feature vector data.
In this embodiment, the specific process of step 102 is:
and judging and generating the probability value of the feature vector data as preset classification information aiming at each feature vector data, wherein the preset classification information can be set according to practical application, for example, when the preset classification information is applied to intelligent customer service conversation, the preset classification information is human voice, and when the preset classification information is applied to vehicle driving, the preset classification information is whistling.
And generating confidence coefficient for representing the audio data as specific audio according to the probability value of each feature vector data.
In an implementation manner, for each piece of feature vector data, respectively generating a probability value for characterizing the feature vector data as preset classification information includes:
and respectively inputting each feature vector data into the classifier model for training, and respectively outputting probability values for representing the feature vector data as preset classification information.
In this embodiment, the specific steps of generating the probability value are as follows:
and respectively outputting each feature vector to a classifier model for training, wherein the classifier model can map the data to one of the given classes so as to be applied to data prediction. In a word, the classifier is a general term of a method for classifying samples in data mining, and includes algorithms such as decision trees, logistic regression, naive bayes, neural networks and the like.
When the classifier is applied to the method, the classifier needs to be trained, and generally the following steps are carried out:
1. Select samples (including positive and negative samples) and divide them into two parts: training samples and test samples.
2. Run the classifier algorithm on the training samples to generate a classification model.
3. Run the classification model on the test samples to generate prediction results.
4. Compute the necessary evaluation metrics from the prediction results and evaluate the performance of the classification model.
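The four training steps above can be sketched with scikit-learn's logistic regression (one of the algorithms listed); the synthetic 13-dimensional features stand in for real MFCC vectors, and all values here are illustrative rather than taken from the patent:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# step 1: positive (target-class) and negative samples, split into
# training and test sets; 13 dims mimic a typical MFCC vector size
X = np.vstack([rng.normal(1.0, 1.0, (200, 13)),    # positive samples
               rng.normal(-1.0, 1.0, (200, 13))])  # negative samples
y = np.array([1] * 200 + [0] * 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# step 2: run the classifier algorithm on the training samples
clf = LogisticRegression().fit(X_train, y_train)

# step 3: run the model on the test samples to get predictions
pred = clf.predict(X_test)

# step 4: compute an evaluation metric
acc = accuracy_score(y_test, pred)
```

On this well-separated synthetic data the accuracy is close to 1; real voice/non-voice features would of course overlap far more.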
In one embodiment, the classifier model is a two-classifier model, and the predetermined classification information is human voice information.
In this embodiment, the classifier model is a binary classifier model, preferably used to predict the probability that the feature vector data is human voice information (denoted p_i) and the probability that it is non-human-voice information (denoted q_i), which satisfy p_i + q_i = 1.
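That the two output probabilities of a binary classifier sum to one can be seen directly with scikit-learn's `predict_proba`; a hedged illustration on synthetic data, since the patent does not name any particular library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (50, 4)),     # "voice" samples
               rng.normal(-1.0, 1.0, (50, 4))])   # "non-voice" samples
y = np.array([1] * 50 + [0] * 50)
clf = LogisticRegression().fit(X, y)

# predict_proba returns one column per class, ordered by clf.classes_;
# here column 0 is the non-voice probability q_i, column 1 the voice p_i
q_i, p_i = clf.predict_proba(X[:1])[0]
```

Whatever classifier is chosen, the pair (p_i, q_i) always sums to 1, which is the constraint the embodiment relies on.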
In an embodiment, generating a confidence level for characterizing the audio data as a specific audio according to the probability value corresponding to each feature vector data includes:
counting in a streaming manner the number of at least partial probability values exceeding a probability threshold;
and if the counted number is judged to exceed the specified number threshold, generating a confidence coefficient for representing the audio data as the specific audio according to the probability value of the participated statistics.
In this embodiment, the specific process of generating the confidence coefficient is as follows:
and (3) selecting the probability values of the designated number in a streaming manner according to the sequence of the data stream by using a Sliding window (Sliding window) technology, and judging the number exceeding the probability threshold value in all the selected probability values, wherein the probability threshold value can be set in advance.
If the number exceeding the probability threshold exceeds a specified number threshold, which can be set in advance, the "steady-state audio" is considered to be detected, and then a confidence coefficient for representing the audio data as the specific audio is generated according to the probability value of the participated statistics.
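The streaming count over a sliding window might look like the following sketch; the window size and both thresholds are illustrative assumptions, not values from the patent:

```python
from collections import deque

def steady_audio_window(prob_stream, window=20, t_p=0.5, t_c=15):
    """Consume probabilities in stream order; once more than t_c of the
    last `window` values exceed t_p, return those window values (the
    probabilities 'participating in the statistics'), else None."""
    win = deque(maxlen=window)  # oldest value drops out automatically
    for p in prob_stream:
        win.append(p)
        if sum(1 for q in win if q > t_p) > t_c:
            return list(win)
    return None  # steady-state audio was never detected
```

The returned window contents are exactly the probabilities from which the confidence would then be computed.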
In an embodiment, generating a confidence level for characterizing the audio data as the specific audio according to the probability value of the participated statistics comprises:
selecting probability values exceeding a probability threshold value from the probability values participating in statistics;
and calculating the geometric mean of the selected probability values to generate the confidence, where the calculation formula is: Con = (p_1 × p_2 × … × p_M)^(1/M) when M ≥ T_c, and Con = 0 when M < T_c;
where Con represents the confidence, M represents the number of probability values exceeding the probability threshold, p_i represents the probability value that the feature vector data is the preset classification information, T_p represents the probability threshold, and T_c represents the specified number threshold.
In this embodiment, when the calculated confidence is higher than the specified value, it is determined that the audio data is the specific audio, and the output of the current audio information is stopped.
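The Con computation described above can be checked numerically; a minimal sketch with illustrative thresholds:

```python
import math

def con(probs, t_p=0.5, t_c=3):
    # geometric mean of the probabilities above t_p,
    # or 0 when fewer than t_c of them exceed the threshold
    selected = [p for p in probs if p > t_p]
    m = len(selected)
    return 0.0 if m < t_c else math.prod(selected) ** (1.0 / m)
```

For example, `con([0.8, 0.9, 0.72, 0.3])` yields the geometric mean of the three values above 0.5, while `con([0.8, 0.9])` is 0 because fewer than `t_c` values qualify.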
Fig. 3 is a schematic structural diagram of an audio interrupt device according to an embodiment of the present invention.
As shown in fig. 3, another aspect of the present invention provides an audio interrupting device, comprising:
a feature obtaining module 201, configured to obtain a plurality of feature vector data of the audio data;
a confidence generating module 202, configured to generate, for the plurality of feature vector data, a confidence for characterizing the audio data as a specific audio;
and the confidence coefficient executing module 203 is used for stopping the output of the current audio information according to the generated confidence coefficient.
In this embodiment, in the feature obtaining module 201, the audio data may be collected through an audio collecting device such as a voice recorder or a microphone, and the audio data may specifically be a human voice, an animal sound, or a natural sound.
In the confidence generating module 202, the specific audio may also be one of human voice, animal voice or natural sound, and may be specified in advance according to the actual application.
In the confidence performing module 203, the confidence is used to indicate the reliability of the audio data as a specific audio, and the higher the confidence is, the higher the probability that the audio data is a specific audio is. The current audio information is mainly output by a machine end or an equipment end, and when the confidence coefficient meets a certain condition, the output of the current audio information is stopped.
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
When the device is applied to an intelligent customer service conversation scene, the intelligent customer service can immediately stop current audio output and continue to receive the sound of a user when the equipment end judges that the received audio data is the voice.
The device can also be applied to audio output equipment, for example, in the process that the audio output equipment such as a vehicle-mounted sound box is playing, if the whistling sound around the vehicle is received, the current playing is stopped, so that a driver can hear the whistling sound, and the driving safety is improved.
In an implementation manner, the feature obtaining module 201 is specifically configured to:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
features of the plurality of pieces of audio segment data are extracted, respectively, to generate a plurality of pieces of feature vector data.
In this embodiment, a plurality of consecutive audio fragment data are extracted from the audio data in the order from the head data node to the tail data node;
then, MFCC (Mel-Frequency Cepstral Coefficients) features or FilterBank features are extracted from each piece of audio segment data to generate a plurality of feature vector data.
In an implementation, the confidence generation module 202 is specifically configured to:
respectively generating probability values for representing the feature vector data as preset classification information aiming at each feature vector data;
and generating a confidence coefficient for representing the audio data as the specific audio according to the probability value corresponding to each feature vector data.
In this embodiment, for each piece of feature vector data, the probability value that the generated feature vector data is the preset classification information is determined, where the preset classification information may be set according to practical applications, for example, when the preset classification information is applied to an intelligent customer service conversation, the preset classification information is a human voice, and when the preset classification information is applied to vehicle driving, the preset classification information is a whistling sound.
And generating confidence coefficient for representing the audio data as specific audio according to the probability value of each feature vector data.
The confidence generating module 202 specifically includes the following steps in generating the probability value:
and respectively outputting each feature vector to a classifier model for training, wherein the classifier model can map the data to one of the given classes so as to be applied to data prediction. In a word, the classifier is a general term of a method for classifying samples in data mining, and includes algorithms such as decision trees, logistic regression, naive bayes, neural networks and the like.
The specific process of the confidence generation module 202 in generating the confidence is as follows:
and (3) selecting the probability values of the designated number in a streaming manner according to the sequence of the data stream by using a Sliding window (Sliding window) technology, and judging the number exceeding the probability threshold value in all the selected probability values, wherein the probability threshold value can be set in advance.
If the number exceeding the probability threshold exceeds the specified number threshold, the steady-state audio is considered detected, and the probability values exceeding the probability threshold are selected from the probability values participating in the statistics; the geometric mean of the selected probability values is then calculated to generate the confidence, where the calculation formula is: Con = (p_1 × p_2 × … × p_M)^(1/M) when M ≥ T_c, and Con = 0 when M < T_c;
where Con represents the confidence, M represents the number of probability values exceeding the probability threshold, p_i represents the probability value that the feature vector data is the preset classification information, T_p represents the probability threshold, and T_c represents the specified number threshold.
When the confidence coefficient execution module 203 determines that the calculated confidence coefficient is higher than the specified value, it determines that the audio data is a specific audio, and stops outputting the current audio information.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform an audio interrupt method.
In an embodiment of the invention, a computer-readable storage medium includes a set of computer-executable instructions that, when executed, obtain a plurality of feature vector data for audio data; generating, for a plurality of feature vector data, a confidence level for characterizing the audio data as a particular audio; stopping the output of the current audio information according to the generated confidence.
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An audio interruption method, the method comprising:
acquiring a plurality of feature vector data of the audio data;
generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio;
stopping the output of the current audio information according to the generated confidence.
2. The method of claim 1, wherein obtaining the plurality of feature vector data for the audio data comprises:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
and respectively extracting the characteristics of the plurality of audio fragment data to generate a plurality of characteristic vector data.
3. The method according to claim 2, wherein the plurality of consecutive audio clip data are extracted at equal time intervals, and the adjacent audio clip data are overlapped with each other by a part of data.
4. The method of claim 1, wherein generating, from the plurality of feature vector data, the confidence level characterizing the audio data as the specific audio comprises:
generating, for each feature vector data, a probability value characterizing the feature vector data as preset classification information; and
generating, from the probability values corresponding to the feature vector data, the confidence level characterizing the audio data as the specific audio.
5. The method of claim 4, wherein generating, for each feature vector data, the probability value characterizing the feature vector data as the preset classification information comprises:
inputting each feature vector data into a trained classifier model, which outputs the probability value characterizing the feature vector data as the preset classification information.
6. The method of claim 5, wherein the classifier model is a binary classifier model, and the preset classification information is human voice information.
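Claims 5 and 6 map each feature vector to a probability that it represents human voice via a binary classifier. A minimal illustration, assuming a logistic-regression-style model; the function name, weights, and bias are hypothetical, and the patent leaves the concrete classifier architecture open:

```python
import math

def voice_probability(feature_vec, weights, bias):
    """Sketch of claims 5-6: a binary classifier maps one feature
    vector to the probability that it is human-voice information.
    The logistic form and its parameters are illustrative assumptions."""
    z = sum(w * x for w, x in zip(weights, feature_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes score into (0, 1)
```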
7. The method of claim 4, wherein generating, from the probability values corresponding to the feature vector data, the confidence level characterizing the audio data as the specific audio comprises:
counting, in a streaming manner, the number of probability values that exceed a probability threshold; and
if the counted number exceeds a specified number threshold, generating, from the probability values included in the count, the confidence level characterizing the audio data as the specific audio.
8. The method of claim 7, wherein generating, from the probability values included in the count, the confidence level characterizing the audio data as the specific audio comprises:
selecting, from the probability values included in the count, the probability values that exceed the probability threshold; and
calculating the geometric mean of the selected probability values to generate the confidence level, where the calculation formula is:
Con = (p_1 × p_2 × … × p_M)^(1/M), with each p_i > T_p and M > T_c
where Con represents the confidence level, M represents the number of probability values that exceed the probability threshold, p_i represents a probability value characterizing the feature vector data as the preset classification information, T_p represents the probability threshold, and T_c represents the specified number threshold.
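The selection and geometric-mean steps of claims 7 and 8 can be sketched as follows; the threshold values T_p and T_c are illustrative assumptions:

```python
import math

def confidence(probs, t_p=0.5, t_c=3):
    """Sketch of claims 7-8: keep probability values above the
    probability threshold t_p; if more than t_c survive, return
    their geometric mean as the confidence level, else None."""
    selected = [p for p in probs if p > t_p]
    m = len(selected)
    if m <= t_c:
        return None  # not enough evidence of the specific audio yet
    # geometric mean computed in log space for numerical stability
    return math.exp(sum(math.log(p) for p in selected) / m)
```

The geometric mean penalizes low outliers more than an arithmetic mean would, so a single near-zero probability pulls the confidence down sharply.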
9. An audio interruption device, the device comprising:
a feature acquisition module, configured to acquire a plurality of feature vector data of audio data;
a confidence generation module, configured to generate, from the plurality of feature vector data, a confidence level characterizing the audio data as a specific audio; and
a confidence execution module, configured to stop output of the current audio information according to the generated confidence level.
10. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the audio interruption method of any of claims 1-8.
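Taken together, the claims describe a streaming decision loop: score segments, count scores above the probability threshold, and interrupt playback once enough evidence accumulates. A sketch of that trigger logic, with illustrative thresholds (not values from the patent):

```python
def should_interrupt(probabilities, t_p=0.5, t_c=3):
    """Sketch of the streaming count of claim 7: return True as soon
    as more than t_c probability values have exceeded t_p, i.e. the
    point at which output of the current audio would be stopped."""
    count = 0
    for p in probabilities:
        if p > t_p:
            count += 1
        if count > t_c:
            return True  # enough evidence: stop the current audio output
    return False
```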
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010739039.0A CN112053686B (en) | 2020-07-28 | 2020-07-28 | Audio interruption method, device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112053686A true CN112053686A (en) | 2020-12-08 |
CN112053686B CN112053686B (en) | 2024-01-02 |
Family
ID=73602486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010739039.0A Active CN112053686B (en) | 2020-07-28 | 2020-07-28 | Audio interruption method, device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112053686B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700775A (en) * | 2020-12-29 | 2021-04-23 | 维沃移动通信有限公司 | Method and device for updating voice receiving period and electronic equipment |
CN113257242A (en) * | 2021-04-06 | 2021-08-13 | 杭州远传新业科技有限公司 | Voice broadcast suspension method, device, equipment and medium in self-service voice service |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102004560A (en) * | 2010-12-01 | 2011-04-06 | 哈尔滨工业大学 | User character recognition method and online once learning method in statement-level Chinese character input method and machine learning system |
US20150356461A1 (en) * | 2014-06-06 | 2015-12-10 | Google Inc. | Training distilled machine learning models |
CN108182937A (en) * | 2018-01-17 | 2018-06-19 | 出门问问信息科技有限公司 | Keyword recognition method, device, equipment and storage medium |
CN110827798A (en) * | 2019-11-12 | 2020-02-21 | 广州欢聊网络科技有限公司 | Audio signal processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107928673B (en) | Audio signal processing method, audio signal processing apparatus, storage medium, and computer device | |
US8793127B2 (en) | Method and apparatus for automatically determining speaker characteristics for speech-directed advertising or other enhancement of speech-controlled devices or services | |
US8838452B2 (en) | Effective audio segmentation and classification | |
US20130054236A1 (en) | Method for the detection of speech segments | |
US9311930B2 (en) | Audio based system and method for in-vehicle context classification | |
JP3913772B2 (en) | Sound identification device | |
EP2560167A2 (en) | Methods and apparatus for performing song detection in audio signal | |
CN110852215A (en) | Multi-mode emotion recognition method and system and storage medium | |
Socoró et al. | Development of an Anomalous Noise Event Detection Algorithm for dynamic road traffic noise mapping | |
Valero et al. | Hierarchical classification of environmental noise sources considering the acoustic signature of vehicle pass-bys | |
CN112053686B (en) | Audio interruption method, device and computer readable storage medium | |
CN112802498B (en) | Voice detection method, device, computer equipment and storage medium | |
CN110299150A (en) | A kind of real-time voice speaker separation method and system | |
Kiktova et al. | Comparison of different feature types for acoustic event detection system | |
Jiang et al. | Video segmentation with the assistance of audio content analysis | |
CN112466287A (en) | Voice segmentation method and device and computer readable storage medium | |
KR102066718B1 (en) | Acoustic Tunnel Accident Detection System | |
JP5105097B2 (en) | Speech classification apparatus, speech classification method and program | |
CN112489692A (en) | Voice endpoint detection method and device | |
CN113963719A (en) | Deep learning-based sound classification method and apparatus, storage medium, and computer | |
CN114038487A (en) | Audio extraction method, device, equipment and readable storage medium | |
EP3309777A1 (en) | Device and method for audio frame processing | |
CN112992175B (en) | Voice distinguishing method and voice recording device thereof | |
CN114329042A (en) | Data processing method, device, equipment, storage medium and computer program product | |
EP3847646B1 (en) | An audio processing apparatus and method for audio scene classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||