CN112053686A - Audio interruption method and device and computer readable storage medium - Google Patents

Audio interruption method and device and computer readable storage medium

Info

Publication number
CN112053686A
CN112053686A (application CN202010739039.0A)
Authority
CN
China
Prior art keywords
audio
data
feature vector
vector data
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010739039.0A
Other languages
Chinese (zh)
Other versions
CN112053686B (English)
Inventor
邢安昊
陈晓宇
雷欣
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mobvoi Information Technology Co Ltd
Original Assignee
Mobvoi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mobvoi Information Technology Co Ltd filed Critical Mobvoi Information Technology Co Ltd
Priority to CN202010739039.0A priority Critical patent/CN112053686B/en
Publication of CN112053686A publication Critical patent/CN112053686A/en
Application granted granted Critical
Publication of CN112053686B publication Critical patent/CN112053686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/222 Barge in, i.e. overridable guidance for interrupting prompts
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an audio interruption method, an audio interruption device, and a computer-readable storage medium. The method comprises the following steps: acquiring a plurality of feature vector data of audio data; generating, from the plurality of feature vector data, a confidence level characterizing the audio data as a specific audio; and stopping the output of the current audio information according to the generated confidence level. Because the decision to stop the current audio output is made from a confidence level computed directly from the feature vector data, no recognition result from a speech recognition decoder is required as in the prior art, so the amount of computation is greatly reduced, the interruption delay is shortened, and the user experience is improved.

Description

Audio interruption method and device and computer readable storage medium
Technical Field
The present invention relates to the field of speech processing, and in particular, to an audio interrupt method and apparatus, and a computer-readable storage medium.
Background
The existing interruption (barge-in) technology is mainly applied to intelligent customer-service dialogue, where the user can interrupt the robot's speech at any time while it is talking. However, the recognition result of the ASR (automatic speech recognition) system arrives with a large delay: nearly 1 s elapses between the moment the user starts speaking and the moment the interruption event is triggered. As a result, the intelligent customer service continues its TTS (text-to-speech) playback for up to 1 s after being interrupted, which degrades the user experience.
Disclosure of Invention
The embodiment of the invention provides an audio interruption method, an audio interruption device and a computer-readable storage medium, which have the technical effects of reducing interruption delay and improving user experience.
One aspect of the present invention provides an audio interruption method, including: acquiring a plurality of feature vector data of the audio data; generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio; stopping the output of the current audio information according to the generated confidence.
In an embodiment, the obtaining the plurality of feature vector data of the audio data includes: extracting a plurality of continuous audio fragment data in the audio data in a streaming manner; and respectively extracting the characteristics of the plurality of audio fragment data to generate a plurality of characteristic vector data.
In one embodiment, the plurality of consecutive audio clip data are extracted at equal time intervals, and adjacent audio clip data partially overlap each other.
In an embodiment, generating, from the plurality of feature vector data, a confidence level characterizing the audio data as a specific audio includes: generating, for each feature vector data, a probability value characterizing the feature vector data as preset classification information; and generating, from the probability value corresponding to each feature vector data, a confidence level characterizing the audio data as the specific audio.
In an embodiment, generating, for each of the feature vector data, a probability value characterizing the feature vector data as preset classification information includes: inputting each feature vector data into a trained classifier model, which outputs, for each input, a probability value characterizing that feature vector data as the preset classification information.
In an embodiment, the classifier model is a two-classifier model, and the preset classification information is human information.
In an embodiment, generating a confidence level characterizing the audio data as a specific audio from the probability value corresponding to each of the feature vector data includes: counting, in a streaming manner, the number of at least some of the probability values that exceed a probability threshold; and if the counted number exceeds a specified number threshold, generating a confidence level characterizing the audio data as the specific audio from the probability values included in the count.
In an embodiment, generating the confidence level from the probability values included in the count comprises: selecting, from the probability values included in the count, those exceeding the probability threshold; and calculating the geometric mean of the selected probability values to generate the confidence level, using the following formula:
Con = (p_1 × p_2 × … × p_M)^(1/M), when M ≥ T_c;
Con = 0, when M < T_c;
where Con represents the confidence level, M represents the number of selected probability values exceeding the probability threshold, p_i represents the probability value characterizing the feature vector data as the preset classification information (only values with p_i > T_p enter the product), T_p represents the probability threshold, and T_c represents the specified number threshold.
Another aspect of the present invention provides an audio interrupting device, comprising: the characteristic acquisition module is used for acquiring a plurality of characteristic vector data of the audio data; a confidence generating module, configured to generate, for a plurality of the feature vector data, a confidence for characterizing the audio data as a specific audio; and the confidence coefficient execution module is used for stopping the output of the current audio information according to the generated confidence coefficient.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform any of the audio interruption methods described above.
In the embodiment of the invention, the output of the current audio information is determined to stop by utilizing the confidence coefficient generated by the feature vector data, and the recognition result is not required to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart illustrating an implementation of an audio interruption method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a relationship between adjacent audio clip data according to an audio interruption method of the present invention;
fig. 3 is a schematic structural diagram of an audio interrupt device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of an audio interruption method according to an embodiment of the present invention.
As shown in fig. 1, an aspect of the present invention provides an audio interruption method, including:
step 101, acquiring a plurality of feature vector data of audio data;
step 102, generating a confidence coefficient for representing the audio data as a specific audio for the plurality of feature vector data;
step 103, stopping the output of the current audio information according to the generated confidence level.
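The three steps above can be sketched end to end. The following is a minimal, hypothetical illustration only: the function names, window sizes, sample rate, and thresholds are assumptions, and the toy scoring function stands in for the trained classifier described later; none of it is the patent's actual implementation.

```python
import numpy as np

def extract_feature_vectors(audio):
    """Step 101 (sketch): one vector per 25 ms frame, advanced every 10 ms."""
    win, hop = 400, 160                       # samples, assuming 16 kHz audio
    return [audio[i:i + win] for i in range(0, len(audio) - win + 1, hop)]

def voice_probability(vec):
    """Stand-in for a trained binary classifier's human-voice probability."""
    return 1.0 / (1.0 + np.exp(-vec.mean()))  # toy sigmoid score

def confidence(probs, t_p=0.5, t_c=5):
    """Step 102: geometric mean of probabilities above t_p; 0 if fewer than t_c."""
    sel = [p for p in probs if p > t_p]
    return float(np.prod(sel) ** (1.0 / len(sel))) if len(sel) >= t_c else 0.0

audio = np.ones(16000) * 2.0                  # 1 s of dummy "voiced" signal
probs = [voice_probability(v) for v in extract_feature_vectors(audio)]
if confidence(probs) > 0.8:                   # step 103: interrupt playback
    print("interrupt: stop current audio output")
```

The thresholds (0.5, 5, 0.8) are placeholders; in practice they correspond to the probability threshold T_p, number threshold T_c, and confidence cutoff discussed below.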
In this embodiment, in step 101, the audio data may be acquired by an audio acquisition device, such as a voice recorder or a microphone, and the audio data may be a voice of a human being, a voice of an animal, or a natural sound.
In step 102, the specific audio may be one of a human voice, an animal voice, or a natural sound, and may be specified in advance according to the actual application.
In step 103, the confidence level is used to indicate the reliability of the audio data as a specific audio, and the higher the confidence level, the higher the probability that the audio data is a specific audio. The current audio information is mainly output by a machine end or an equipment end, and when the confidence coefficient meets a certain condition, the output of the current audio information is stopped.
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
When the method is applied to an intelligent customer-service conversation scenario, once the device end judges that the received audio data is human voice, the intelligent customer service immediately stops the current audio output and continues to receive the user's speech.
The method can also be applied to audio output equipment. For example, while an audio output device such as a vehicle-mounted speaker is playing, if a horn (whistle) sound around the vehicle is detected, the current playback is stopped so that the driver can hear the horn, improving driving safety.
In one embodiment, obtaining a plurality of feature vector data of audio data comprises:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
features of the plurality of pieces of audio segment data are extracted, respectively, to generate a plurality of pieces of feature vector data.
In this embodiment, the specific process of step 101 is as follows:
extracting a plurality of continuous audio fragment data from the audio data in an order from a head data node to a tail data node;
then, MFCC (Mel-Frequency Cepstral Coefficient) features or FilterBank features are extracted from each audio fragment data, generating a plurality of feature vector data.
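As one possible illustration of this feature-extraction step, the sketch below computes a log-mel FilterBank vector for a single 25 ms frame in plain NumPy. The sample rate, FFT size, and number of mel filters are assumed values for demonstration; the patent does not specify them.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_features(frame, sr=16000, n_fft=512, n_mels=26):
    """Log-mel FilterBank feature vector for one windowed frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular mel filters spanning 0 Hz .. Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fbank[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(fbank @ spec + 1e-10)  # small floor avoids log(0)

frame = np.random.default_rng(0).standard_normal(400)  # one 25 ms frame at 16 kHz
vec = log_mel_features(frame)
print(vec.shape)
```

Taking a discrete cosine transform of this vector would yield MFCCs; libraries such as librosa provide both feature types ready-made.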
Fig. 2 is a schematic diagram illustrating a relationship between adjacent audio clip data in an audio interruption method according to an embodiment of the present invention.
In one embodiment, the extraction time intervals of a plurality of consecutive audio clip data are equal, and the data overlap between adjacent audio clip data.
In the present embodiment, as shown in fig. 2, each frame preferably covers one frame time, i.e., 25 ms. To avoid omitting audio data, a new frame is preferably extracted every 10 ms, so that adjacent audio segments overlap each other; the shaded portion in fig. 2 is the overlapping region.
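The framing scheme just described (25 ms frames extracted every 10 ms, so neighboring frames share 15 ms of samples) can be sketched as follows; the 16 kHz sample rate is an assumption:

```python
import numpy as np

def stream_frames(audio, sr=16000, win_ms=25, hop_ms=10):
    """Yield overlapping frames: 25 ms windows advanced every 10 ms,
    so consecutive frames share a 15 ms overlap (the shaded region in fig. 2)."""
    win = int(sr * win_ms / 1000)   # 400 samples per frame
    hop = int(sr * hop_ms / 1000)   # 160 samples between frame starts
    for start in range(0, len(audio) - win + 1, hop):
        yield audio[start:start + win]

audio = np.arange(16000.0)          # 1 second of dummy samples
frames = list(stream_frames(audio))
print(len(frames))                  # 98 frames
```

Because the hop (160 samples) is smaller than the window (400 samples), the last 240 samples of each frame reappear at the start of the next one, which is exactly the overlap the embodiment relies on.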
In an embodiment, for each feature vector data, generating a confidence for characterizing the audio data as a specific audio comprises:
respectively generating probability values for representing the feature vector data as preset classification information aiming at each feature vector data;
and generating a confidence coefficient for representing the audio data as the specific audio according to the probability value corresponding to each feature vector data.
In this embodiment, the specific process of step 102 is:
and judging and generating the probability value of the feature vector data as preset classification information aiming at each feature vector data, wherein the preset classification information can be set according to practical application, for example, when the preset classification information is applied to intelligent customer service conversation, the preset classification information is human voice, and when the preset classification information is applied to vehicle driving, the preset classification information is whistling.
And generating confidence coefficient for representing the audio data as specific audio according to the probability value of each feature vector data.
In an implementation manner, for each piece of feature vector data, respectively generating a probability value for characterizing the feature vector data as preset classification information includes:
and respectively inputting each feature vector data into the classifier model for training, and respectively outputting probability values for representing the feature vector data as preset classification information.
In this embodiment, the specific steps of generating the probability value are as follows:
and respectively outputting each feature vector to a classifier model for training, wherein the classifier model can map the data to one of the given classes so as to be applied to data prediction. In a word, the classifier is a general term of a method for classifying samples in data mining, and includes algorithms such as decision trees, logistic regression, naive bayes, neural networks and the like.
When the classifier is applied to the method, the classifier needs to be trained, and generally the following steps are carried out:
1. samples (including positive samples and negative samples) are selected, and all samples are divided into two parts, namely training samples and testing samples.
2. And executing a classifier algorithm on the training samples to generate a classification model.
3. And executing the classification model on the test sample to generate a prediction result.
4. And calculating necessary evaluation indexes according to the prediction result, and evaluating the performance of the classification model.
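The four training steps above can be illustrated with a toy binary classifier. Everything below is an assumption for demonstration: the synthetic "voice"/"non-voice" features, the logistic-regression model, and the 300/100 train/test split are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic samples: "voice" (label 1) vs "non-voice" (label 0) feature vectors
X = np.vstack([rng.normal(1.0, 1.0, (200, 26)),    # positive samples
               rng.normal(-1.0, 1.0, (200, 26))])  # negative samples
y = np.array([1] * 200 + [0] * 200)

# Step 1: divide all samples into training and test sets
idx = rng.permutation(400)
train, test = idx[:300], idx[300:]

# Step 2: run a classifier algorithm (logistic regression via gradient descent)
w, b = np.zeros(26), 0.0
for _ in range(500):
    z = np.clip(X[train] @ w + b, -30, 30)     # clip to keep exp() stable
    p = 1.0 / (1.0 + np.exp(-z))               # sigmoid probabilities
    grad = p - y[train]
    w -= 0.1 * X[train].T @ grad / len(train)
    b -= 0.1 * grad.mean()

# Step 3: execute the model on the test samples to generate predictions
z_test = np.clip(X[test] @ w + b, -30, 30)
p_test = 1.0 / (1.0 + np.exp(-z_test))

# Step 4: compute an evaluation index (accuracy) from the predictions
accuracy = float(((p_test > 0.5) == y[test]).mean())
print(round(accuracy, 2))
```

With real audio one would substitute FilterBank or MFCC vectors for the synthetic data and typically a stronger model, but the train/predict/evaluate loop is the same.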
In one embodiment, the classifier model is a two-classifier model, and the predetermined classification information is human voice information.
In this embodiment, the classifier model is a binary classifier model, preferably used to predict the probability that the feature vector data is human voice information (denoted p_i) and the probability that it is non-human-voice information (denoted q_i), satisfying p_i + q_i = 1.
In an embodiment, generating a confidence level for characterizing the audio data as a specific audio according to the probability value corresponding to each feature vector data includes:
counting in a streaming manner the number of at least partial probability values exceeding a probability threshold;
and if the counted number is judged to exceed the specified number threshold, generating a confidence coefficient for representing the audio data as the specific audio according to the probability value of the participated statistics.
In this embodiment, the specific process of generating the confidence coefficient is as follows:
and (3) selecting the probability values of the designated number in a streaming manner according to the sequence of the data stream by using a Sliding window (Sliding window) technology, and judging the number exceeding the probability threshold value in all the selected probability values, wherein the probability threshold value can be set in advance.
If the number exceeding the probability threshold exceeds a specified number threshold, which can be set in advance, the "steady-state audio" is considered to be detected, and then a confidence coefficient for representing the audio data as the specific audio is generated according to the probability value of the participated statistics.
In an embodiment, generating a confidence level characterizing the audio data as the specific audio from the probability values included in the count comprises:
selecting, from the probability values included in the count, those exceeding the probability threshold;
and calculating the geometric mean of the selected probability values to generate the confidence level, using the following formula:
Con = (p_1 × p_2 × … × p_M)^(1/M), when M ≥ T_c;
Con = 0, when M < T_c;
where Con represents the confidence level, M represents the number of selected probability values exceeding the probability threshold, p_i represents the probability value characterizing the feature vector data as the preset classification information (only values with p_i > T_p enter the product), T_p represents the probability threshold, and T_c represents the specified number threshold.
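The confidence computation just described (count the window's probability values above T_p, then take the geometric mean of those values if the count reaches T_c, else return zero) might be sketched as follows; the threshold values are illustrative:

```python
import numpy as np

def confidence(probs, t_p=0.5, t_c=5):
    """Geometric mean of the probability values in the window that exceed t_p;
    zero if fewer than t_c of them exceed it (no steady-state audio detected)."""
    selected = [p for p in probs if p > t_p]
    M = len(selected)
    if M < t_c:
        return 0.0
    return float(np.prod(selected) ** (1.0 / M))  # M-th root of the product

window = [0.9, 0.8, 0.95, 0.3, 0.85, 0.9, 0.7]    # six values exceed 0.5
print(confidence(window))
```

For long windows, computing `exp(mean(log(selected)))` is numerically safer than taking the product directly, since a product of many probabilities underflows toward zero.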
In this embodiment, when the calculated confidence is higher than the specified value, it is determined that the audio data is the specific audio, and the output of the current audio information is stopped.
Fig. 3 is a schematic structural diagram of an audio interrupt device according to an embodiment of the present invention.
As shown in fig. 3, another aspect of the present invention provides an audio interrupting device, comprising:
a feature obtaining module 201, configured to obtain a plurality of feature vector data of the audio data;
a confidence generating module 202, configured to generate, for the plurality of feature vector data, a confidence for characterizing the audio data as a specific audio;
and the confidence coefficient executing module 203 is used for stopping the output of the current audio information according to the generated confidence coefficient.
In this embodiment, in the feature obtaining module 201, the audio data may be collected through an audio collection device, such as a voice recorder or a microphone, and the audio data may specifically be a human voice, an animal sound, or a natural sound.
In the confidence generating module 202, the specific audio may also be one of human voice, animal voice or natural sound, and may be specified in advance according to the actual application.
In the confidence performing module 203, the confidence is used to indicate the reliability of the audio data as a specific audio, and the higher the confidence is, the higher the probability that the audio data is a specific audio is. The current audio information is mainly output by a machine end or an equipment end, and when the confidence coefficient meets a certain condition, the output of the current audio information is stopped.
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
When the device is applied to an intelligent customer-service conversation scenario, once the device end judges that the received audio data is human voice, the intelligent customer service immediately stops the current audio output and continues to receive the user's speech.
The device can also be applied to audio output equipment. For example, while an audio output device such as a vehicle-mounted speaker is playing, if a horn (whistle) sound around the vehicle is detected, the current playback is stopped so that the driver can hear the horn, improving driving safety.
In an implementation manner, the feature obtaining module 201 is specifically configured to:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
features of the plurality of pieces of audio segment data are extracted, respectively, to generate a plurality of pieces of feature vector data.
In this embodiment, a plurality of consecutive audio fragment data are extracted from the audio data in the order from the head data node to the tail data node;
then, MFCC (Mel-Frequency Cepstral Coefficient) features or FilterBank features are extracted from each audio fragment data, generating a plurality of feature vector data.
In an implementation, the confidence generation module 202 is specifically configured to:
respectively generating probability values for representing the feature vector data as preset classification information aiming at each feature vector data;
and generating a confidence coefficient for representing the audio data as the specific audio according to the probability value corresponding to each feature vector data.
In this embodiment, for each feature vector data, a probability value that the feature vector data corresponds to preset classification information is generated. The preset classification information can be set according to the practical application: for intelligent customer-service conversation it is human voice, and for vehicle driving it is a horn (whistle) sound.
And generating confidence coefficient for representing the audio data as specific audio according to the probability value of each feature vector data.
The confidence generating module 202 specifically includes the following steps in generating the probability value:
and respectively outputting each feature vector to a classifier model for training, wherein the classifier model can map the data to one of the given classes so as to be applied to data prediction. In a word, the classifier is a general term of a method for classifying samples in data mining, and includes algorithms such as decision trees, logistic regression, naive bayes, neural networks and the like.
The specific process of the confidence generation module 202 in generating the confidence is as follows:
and (3) selecting the probability values of the designated number in a streaming manner according to the sequence of the data stream by using a Sliding window (Sliding window) technology, and judging the number exceeding the probability threshold value in all the selected probability values, wherein the probability threshold value can be set in advance.
If the number exceeding the probability threshold exceeds the specified number threshold, considering that the steady-state audio is detected, and selecting the probability value exceeding the probability threshold from the probability values of the participated statistics; and calculating the geometric mean value of the selected probability value to generate confidence coefficient, wherein the calculation formula is as follows:
Con = (p_1 × p_2 × … × p_M)^(1/M), when M ≥ T_c;
Con = 0, when M < T_c;
where Con represents the confidence level, M represents the number of selected probability values exceeding the probability threshold, p_i represents the probability value characterizing the feature vector data as the preset classification information (only values with p_i > T_p enter the product), T_p represents the probability threshold, and T_c represents the specified number threshold.
When the confidence coefficient execution module 203 determines that the calculated confidence coefficient is higher than the specified value, it determines that the audio data is a specific audio, and stops outputting the current audio information.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform an audio interrupt method.
In an embodiment of the invention, a computer-readable storage medium includes a set of computer-executable instructions that, when executed, obtain a plurality of feature vector data for audio data; generating, for a plurality of feature vector data, a confidence level for characterizing the audio data as a particular audio; stopping the output of the current audio information according to the generated confidence.
Therefore, the output of the current audio information is determined to be stopped by using the confidence coefficient generated by the feature vector data, and a recognition result does not need to be obtained by a speech recognition decoder in the prior art, so that the calculation amount is greatly reduced, the interruption delay is reduced, and the user experience is improved.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An audio interruption method, the method comprising:
acquiring a plurality of feature vector data of the audio data;
generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio;
stopping the output of the current audio information according to the generated confidence.
2. The method of claim 1, wherein obtaining the plurality of feature vector data for the audio data comprises:
extracting a plurality of continuous audio fragment data in the audio data in a streaming manner;
and respectively extracting the characteristics of the plurality of audio fragment data to generate a plurality of characteristic vector data.
3. The method according to claim 2, wherein the plurality of consecutive audio clip data are extracted at equal time intervals, and the adjacent audio clip data are overlapped with each other by a part of data.
4. The method of claim 1, wherein the generating, for a plurality of the feature vector data, a confidence level for characterizing the audio data as a particular audio comprises:
respectively generating a probability value for representing the feature vector data as preset classification information aiming at each feature vector data;
and generating a confidence coefficient for representing the audio data as specific audio according to the probability value corresponding to each feature vector data.
5. The method of claim 4, wherein generating, for each of the feature vector data, a probability value characterizing the feature vector data as preset classification information comprises:
inputting each of the feature vector data into a trained classifier model, which outputs, for each input, a probability value characterizing the feature vector data as the preset classification information.
6. The method of claim 5, wherein the classifier model is a binary classifier model and the preset classification information is human-voice information.
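Claims 5 and 6 leave the classifier unspecified beyond being a trained binary model. As an illustrative stand-in (the weights, bias, and feature dimension below are invented for this example, not taken from the patent), a logistic scorer maps one feature vector to a human-voice probability:

```python
import math
import numpy as np

# Hypothetical parameters of an already-trained binary (voice / non-voice)
# classifier; a real system would load a trained model instead.
W = np.array([0.8, -0.5, 1.2])
B = -0.1

def voice_probability(feature_vec: np.ndarray) -> float:
    """Return the probability that one feature vector represents human
    voice, i.e. the preset classification information of claim 6."""
    z = float(feature_vec @ W + B)       # linear score
    return 1.0 / (1.0 + math.exp(-z))    # logistic squashing to (0, 1)

p = voice_probability(np.array([1.0, 0.0, 1.0]))
```

Each feature vector is scored independently, which matches the per-vector probability values of claim 5.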
7. The method of claim 4, wherein generating a confidence level characterizing the audio data as the specific audio according to the probability value corresponding to each of the feature vector data comprises:
counting, in a streaming manner, the number of probability values, among at least some of the probability values, that exceed a probability threshold; and
if the counted number exceeds a specified number threshold, generating, according to the probability values involved in the counting, a confidence level characterizing the audio data as the specific audio.
8. The method of claim 7, wherein generating a confidence level characterizing the audio data as the specific audio according to the probability values involved in the counting comprises:
selecting, from the probability values involved in the counting, the probability values exceeding the probability threshold; and
calculating the geometric mean of the selected probability values to generate the confidence level, the calculation formula being:
Con = (p_1 × p_2 × … × p_M)^(1/M), the product taken over the M probability values p_i > T_p, with M > T_c,
where Con represents the confidence level, M represents the number of probability values that exceed the probability threshold, p_i represents the probability value characterizing the feature vector data as preset classification information, T_p represents the probability threshold, and T_c represents the specified number threshold.
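Under the notation of claim 8, the count-then-geometric-mean computation can be sketched as follows; the function and variable names are ours, and computing the mean in log space is an implementation choice for numerical stability, not part of the claim:

```python
import math

def confidence(probs, t_p, t_c):
    """Claims 7-8: keep the probability values exceeding the probability
    threshold t_p; once their number M exceeds the specified number
    threshold t_c, return the geometric mean Con = (p_1 * ... * p_M)^(1/M).
    Returns None while the evidence is still insufficient."""
    selected = [p for p in probs if p > t_p]
    m = len(selected)
    if m <= t_c:
        return None
    # geometric mean computed in log space to avoid underflow on long runs
    return math.exp(sum(math.log(p) for p in selected) / m)

con = confidence([0.9, 0.4, 0.8, 0.95], t_p=0.5, t_c=2)
```

With the sample inputs, three of the four probabilities exceed 0.5, so the result is the geometric mean of 0.9, 0.8, and 0.95.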
9. An audio interruption device, the device comprising:
a feature acquisition module, configured to acquire a plurality of feature vector data of audio data;
a confidence generation module, configured to generate, from the plurality of feature vector data, a confidence level characterizing the audio data as a specific audio; and
a confidence execution module, configured to stop output of current audio information according to the generated confidence level.
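The three modules of claim 9 map naturally onto a small composition object. A sketch under assumed interfaces: the callables, the class name, and the interruption threshold are illustrative, not taken from the patent:

```python
class AudioInterruptionDevice:
    """Illustrative composition of the three modules in claim 9. The
    callables stand in for the feature acquisition, confidence
    generation, and playback-stopping logic."""

    def __init__(self, acquire_features, generate_confidence, stop_playback,
                 confidence_threshold=0.5):
        self.acquire_features = acquire_features        # feature acquisition module
        self.generate_confidence = generate_confidence  # confidence generation module
        self.stop_playback = stop_playback              # confidence execution module
        self.confidence_threshold = confidence_threshold

    def process(self, audio_data) -> bool:
        """Run one pass over incoming audio; stop current playback and
        return True when the confidence warrants an interruption."""
        feature_vectors = self.acquire_features(audio_data)
        con = self.generate_confidence(feature_vectors)
        if con is not None and con > self.confidence_threshold:
            self.stop_playback()
            return True
        return False
```

Keeping the three responsibilities behind separate callables mirrors the module boundaries of the claim and makes each piece testable in isolation.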
10. A computer-readable storage medium comprising a set of computer-executable instructions which, when executed, perform the audio interruption method of any one of claims 1 to 8.
CN202010739039.0A 2020-07-28 2020-07-28 Audio interruption method, device and computer readable storage medium Active CN112053686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010739039.0A CN112053686B (en) 2020-07-28 2020-07-28 Audio interruption method, device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN112053686A 2020-12-08
CN112053686B 2024-01-02

Family

ID=73602486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010739039.0A Active CN112053686B (en) 2020-07-28 2020-07-28 Audio interruption method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112053686B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700775A (en) * 2020-12-29 2021-04-23 维沃移动通信有限公司 Method and device for updating voice receiving period and electronic equipment
CN113257242A (en) * 2021-04-06 2021-08-13 杭州远传新业科技有限公司 Voice broadcast suspension method, device, equipment and medium in self-service voice service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004560A (en) * 2010-12-01 2011-04-06 哈尔滨工业大学 User character recognition method and online once learning method in statement-level Chinese character input method and machine learning system
US20150356461A1 (en) * 2014-06-06 2015-12-10 Google Inc. Training distilled machine learning models
CN108182937A (en) * 2018-01-17 2018-06-19 出门问问信息科技有限公司 Keyword recognition method, device, equipment and storage medium
CN110827798A (en) * 2019-11-12 2020-02-21 广州欢聊网络科技有限公司 Audio signal processing method and device



Also Published As

Publication number Publication date
CN112053686B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
CN107928673B (en) Audio signal processing method, audio signal processing apparatus, storage medium, and computer device
US8793127B2 (en) Method and apparatus for automatically determining speaker characteristics for speech-directed advertising or other enhancement of speech-controlled devices or services
US8838452B2 (en) Effective audio segmentation and classification
US20130054236A1 (en) Method for the detection of speech segments
US9311930B2 (en) Audio based system and method for in-vehicle context classification
JP3913772B2 (en) Sound identification device
EP2560167A2 (en) Methods and apparatus for performing song detection in audio signal
CN110852215A (en) Multi-mode emotion recognition method and system and storage medium
Socoró et al. Development of an Anomalous Noise Event Detection Algorithm for dynamic road traffic noise mapping
Valero et al. Hierarchical classification of environmental noise sources considering the acoustic signature of vehicle pass-bys
CN112053686B (en) Audio interruption method, device and computer readable storage medium
CN112802498B (en) Voice detection method, device, computer equipment and storage medium
CN110299150A (en) A kind of real-time voice speaker separation method and system
Kiktova et al. Comparison of different feature types for acoustic event detection system
Jiang et al. Video segmentation with the assistance of audio content analysis
CN112466287A (en) Voice segmentation method and device and computer readable storage medium
KR102066718B1 (en) Acoustic Tunnel Accident Detection System
JP5105097B2 (en) Speech classification apparatus, speech classification method and program
CN112489692A (en) Voice endpoint detection method and device
CN113963719A (en) Deep learning-based sound classification method and apparatus, storage medium, and computer
CN114038487A (en) Audio extraction method, device, equipment and readable storage medium
EP3309777A1 (en) Device and method for audio frame processing
CN112992175B (en) Voice distinguishing method and voice recording device thereof
CN114329042A (en) Data processing method, device, equipment, storage medium and computer program product
EP3847646B1 (en) An audio processing apparatus and method for audio scene classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant