CN112447187A - Device and method for recognizing sound event - Google Patents
- Publication number
- CN112447187A (application CN201910822623.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Abstract
Disclosed is a sound event recognition apparatus including: an encoder configured to convert a sound signal containing a plurality of sound events into features in a low-dimensional space; and a detector configured to map the features to a posterior probability for each sound event, wherein the detector performs a plurality of hole convolution operations on the features. The recognition apparatus according to the present disclosure performs automatic sound event detection more efficiently in an end-to-end manner.
Description
Technical Field
The present disclosure relates to the field of sound processing, and in particular, to a device and a method for recognizing a sound event.
Background
This section provides background information related to the present disclosure, which is not necessarily prior art.
Sound carries a great deal of information about the daily environment and the physical events that occur therein. A person may perceive the sound scene (busy street, office, etc.) in which he is located and may identify individual sound events (car passing, footsteps, etc.). Automatic detection of these sound events has many applications in real life. For example, it is very useful for smart devices, robots, etc. in environmental awareness, and furthermore, automatic detection of sound events can help build a complete monitoring system when radar or video systems may not work in some situations.
Disclosure of Invention
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
An object of the present disclosure is to provide a sound event recognition apparatus and method that perform automatic sound event detection more efficiently in an end-to-end manner. Unlike traditional models based on recurrent neural networks, the device according to the present disclosure is built entirely on a one-dimensional convolutional neural network, which is easier to parallelize and performs better in certain environments. Moreover, the device according to the present disclosure is a complete end-to-end system that requires no human intervention: the input is the raw sound signal, and the output is the posterior probability of each sound event.
According to an aspect of the present disclosure, there is provided a sound event recognition apparatus including: an encoder configured to convert a sound signal having a plurality of sound events contained therein into features in a low-dimensional space; and a detector configured to map the features to a posterior probability for each sound event, wherein the detector performs a plurality of hole convolution operations on the features.
According to another aspect of the present disclosure, there is provided a method for recognizing a sound event, including: converting an acoustic signal having a plurality of acoustic events therein to features in a low dimensional space; and mapping the features to a posterior probability for each sound event, wherein a plurality of hole convolution operations are performed on the features.
According to another aspect of the present disclosure, there is provided a program product comprising machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, is capable of causing the computer to perform a method of recognition of sound events according to the present disclosure.
According to another aspect of the present disclosure, a machine-readable storage medium is provided, having embodied thereon a program product according to the present disclosure.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
Drawings
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure. In the drawings:
FIG. 1 shows a block diagram of a recognition device of a sound event according to one embodiment of the present disclosure;
FIG. 2 illustrates an overall framework of a recognition network of sound events according to one embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of a method of recognition of a sound event according to one embodiment of the present disclosure;
and
fig. 4 is a block diagram of an exemplary structure of a general-purpose personal computer in which a recognition apparatus of a sound event and a recognition method of a sound event according to an embodiment of the present disclosure can be implemented.
While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure. It is noted that throughout the several views, corresponding reference numerals indicate corresponding parts.
Detailed Description
Examples of the present disclosure will now be described more fully with reference to the accompanying drawings. The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses.
Example embodiments are provided so that this disclosure will be thorough, and will fully convey the scope to those skilled in the art. Numerous specific details are set forth such as examples of specific components, devices, and methods to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms, and that neither should be construed to limit the scope of the disclosure. In certain example embodiments, well-known processes, well-known structures, and well-known technologies are not described in detail.
According to an embodiment of the present disclosure, there is provided a sound event recognition apparatus including: an encoder configured to convert a sound signal having a plurality of sound events contained therein into features in a low-dimensional space; and a detector configured to map the features to a posterior probability for each sound event, wherein the detector performs a plurality of hole convolution operations on the features.
As illustrated in fig. 1, the apparatus 100 for recognizing a sound event according to the present disclosure may include an encoder 101 and a detector 102.
The encoder 101 may convert a sound signal containing a plurality of sound events into features in a low-dimensional space. Such features allow the task of identifying sound events to be carried out more efficiently. Here, it should be apparent to those skilled in the art that the plurality of sound events may include two or more different types of sound events (e.g., the sounds of pedestrian footsteps and car horns on a street). The encoder 101 may convert the signal containing these sound events into feature vectors in a low-dimensional space.
Next, the detector 102 may map the feature vectors in the low-dimensional space to a posterior probability of each sound event, e.g., for each frame, the posterior probability of pedestrian footsteps or a car horn on the street. According to one embodiment of the present disclosure, these posterior probabilities may represent the type, start time, end time, etc. of the sound event. Here, it should be apparent to those skilled in the art that the above events are merely exemplary, and the present disclosure is not limited thereto.
According to one embodiment of the present disclosure, the detector 102 may perform a plurality of hole convolution operations on the feature vector to obtain a posterior probability for each sound event. Hole convolution, also known as dilated or atrous convolution, introduces a new parameter called the "dilation rate" into the convolution layer; this parameter defines the spacing between the values sampled by the convolution kernel. According to one embodiment of the present disclosure, the detector 102 may perform the hole convolution operation three times on the feature vector to provide a larger receptive field. In a convolutional neural network (CNN), the receptive field is the region of the input layer that maps to a single element in the output of a given layer. In other words, a larger receptive field means a larger amount of information. Here, it should be apparent to those skilled in the art that performing the hole convolution operation three times is merely an example, and the present disclosure is not limited thereto. Those skilled in the art may perform more or fewer hole convolution operations according to the requirements of the actual computational load and the like.
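To make the mechanics concrete, the following is a minimal illustrative sketch in plain Python (not part of the disclosed apparatus; the helper name and the toy signal are invented for demonstration). It shows how the dilation rate spaces out the kernel taps, so that one output element covers a wider span of the input:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """1-D convolution 'with holes': kernel taps are spaced `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation  # input span covered by a single output element
    out = []
    for t in range(span, len(x)):
        # tap i looks back (k - 1 - i) * dilation samples from position t
        out.append(sum(kernel[i] * x[t - (k - 1 - i) * dilation] for i in range(k)))
    return out

signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.5]
# dilation 1 averages adjacent samples; dilation 4 averages samples 4 apart
print(dilated_conv1d(signal, kernel, dilation=1))  # [0.5, 0.5, 0.0, 1.0, 1.0, 0.0, 0.0]
print(dilated_conv1d(signal, kernel, dilation=4))  # [1.0, 0.5, 0.0, 0.0]
```

With a larger dilation rate, each output element summarizes a wider stretch of the signal, which is why stacking such layers enlarges the receptive field without adding kernel weights.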
According to an embodiment of the present disclosure, the encoder 101 may perform a one-dimensional convolution operation, a parametric ReLU (PReLU) operation, a normalization operation, and a 1 × 1 convolution operation on the sound signal to obtain the feature vector. The normalization operation normalizes the feature vector so as to speed up training. The 1 × 1 convolution operation may be used to adjust the size of the last dimension of the feature vector; that is, the feature vectors processed by the 1 × 1 convolution operation maintain a uniform size. Here, it should be apparent to those skilled in the art that the above-described operations are merely exemplary, and the present disclosure is not limited thereto. Those skilled in the art may add, delete, or replace operations according to actual needs.
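As a hedged illustration of two of these building blocks (a sketch only, not the patented implementation; `alpha` stands in for the learned PReLU parameter), a parametric ReLU and a mean-variance normalization can be written as:

```python
def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for non-negative inputs, learned slope `alpha` otherwise."""
    return [v if v >= 0 else alpha * v for v in x]

def normalize(x, eps=1e-8):
    """Zero-mean, unit-variance normalization of a feature vector (speeds up training)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

feat = prelu([-2.0, 0.0, 2.0])   # negative input is scaled by alpha, not zeroed
print(feat)                      # [-0.5, 0.0, 2.0]
print(normalize([1.0, 2.0, 3.0]))
```

Unlike a plain ReLU, the PReLU keeps a small learned gradient for negative activations, which is often preferred in audio front-ends.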
According to an embodiment of the present disclosure, after performing the hole convolution operation on the features a plurality of times, the detector 102 may further perform a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation to obtain the posterior probability. Here, it should be apparent to those skilled in the art that the above-described operations are merely exemplary, and the present disclosure is not limited thereto. Those skilled in the art may add, delete, or replace operations according to actual needs.
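The Softmax step can likewise be sketched stand-alone (illustrative only; the three-class logits below are invented). It maps unbounded per-frame detector outputs to probabilities that sum to one:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical per-frame scores for three sound-event classes
posterior = softmax([2.0, 1.0, 0.1])
print(posterior)
```

The largest logit receives the largest posterior, and the max-subtraction leaves the result unchanged while avoiding overflow for large scores.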
According to an embodiment of the present disclosure, the detector 102 may further perform a 1 × 1 convolution operation, a parametric ReLU operation, a normalization operation, and a depthwise convolution operation in the course of each hole convolution operation. Here, it should be apparent to those skilled in the art that the above-described operations are merely exemplary, and the present disclosure is not limited thereto. Those skilled in the art may add, delete, or replace operations according to actual needs.
For example, as shown in fig. 2, an input sound signal containing a plurality of sound events may be subjected to a one-dimensional convolution operation, a parametric ReLU operation, a normalization operation, and a 1 × 1 convolution operation to obtain a feature vector. Next, the obtained feature vector is subjected to the hole convolution operation three times, followed by a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation, so as to obtain the posterior probability.
Referring again to fig. 2, the process of the hole convolution operation is illustrated in detail. Each circle represents a point in time, with time running from left to right, and each convolutional layer has its own dilation rate. The dilation rate rises exponentially to ensure that the convolutional layers can gather information over a sufficiently long time span. For example, fig. 2 schematically shows four convolutional layers, where the dilation rate d of the first layer is 1, that of the second layer is 2, that of the third layer is 4, and that of the fourth layer is 8. The dilation rate determines how much temporal context the feature vector captures. Here, it should be apparent to those skilled in the art that the convolutional layers shown in fig. 2 are merely an example, and the present disclosure is not limited thereto.
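Under the usual assumption of stride-1 stacked convolutions, the receptive-field growth described above can be checked with a short helper (an illustrative sketch; the kernel size of 2 matches the pairwise merging drawn in fig. 2 but is otherwise an assumption):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 stacked dilated conv layers:
    each layer with dilation d adds (kernel_size - 1) * d input positions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# four layers with exponentially rising dilation rates 1, 2, 4, 8, as in fig. 2
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

Because the dilation rates double at each layer, the receptive field grows exponentially with depth while the parameter count grows only linearly.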
Then, according to an embodiment of the present disclosure, the 1 × 1 convolution operation, the parametric ReLU operation, the normalization operation, and the depthwise convolution operation may further be performed in the course of the hole convolution operation.
With the sound event recognition device according to the present disclosure, automatic sound event detection can be performed more effectively thanks to its end-to-end framework, and the multiple hole convolution operations it employs capture more information over a large time scale, thereby achieving better detection results.
According to the sound event recognition apparatus of one embodiment of the present disclosure, in the training phase, the encoder 101 and the detector 102 may be trained using sound data with event labels. In the evaluation phase, the trained encoder 101 and detector 102 may be used to detect each event in an input sound mixture, and their performance may then be evaluated.
A recognition method for a sound event according to an embodiment of the present disclosure will be described below with reference to fig. 3. As shown in fig. 3, the recognition method for a sound event according to an embodiment of the present disclosure starts at step S310.
In step S310, a sound signal containing a plurality of sound events is converted into features in a low-dimensional space.
Next, in step S320, the features are mapped to a posterior probability of each sound event.
In step S320, a plurality of hole convolution operations are performed on the features.
The recognition method for sound events according to one embodiment of the present disclosure further includes the step of performing a one-dimensional convolution operation, a parametric ReLU operation, a normalization operation, and a 1 × 1 convolution operation on the sound signal to obtain the features.
The recognition method for a sound event according to one embodiment of the present disclosure further includes the step of performing a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation, after performing a plurality of hole convolution operations on the features, to obtain the posterior probability.
The recognition method for sound events according to one embodiment of the present disclosure further includes the step of performing the hole convolution operation three times on the features.
The recognition method for a sound event according to one embodiment of the present disclosure further includes the step of performing a 1 × 1 convolution operation, a parametric ReLU operation, a normalization operation, and a depthwise convolution operation in the course of each hole convolution operation.
A recognition method for sound events according to one embodiment of the present disclosure further includes the step of training the encoder and the detector using sound data having an event tag.
In the recognition method for a sound event according to one embodiment of the present disclosure, the features are based on each frame of the sound signal.
With the recognition method for sound events described above, automatic sound event detection can be performed more effectively thanks to the end-to-end framework, and the multiple hole convolution operations employed in the method capture more information over a large time scale, thereby achieving better detection results.
Various embodiments of the above-described steps of the recognition method for a sound event according to an embodiment of the present disclosure have been described in detail above, and a description thereof will not be repeated.
It is apparent that the respective operational procedures of the recognition method for a sound event according to the present disclosure can be implemented in the form of computer-executable programs stored in various machine-readable storage media.
Moreover, the object of the present disclosure can also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or apparatus, and a computer or central processing unit (CPU) in that system or apparatus reads out and executes the program code. As long as the system or apparatus is capable of executing the program, the embodiments of the present disclosure are not limited to a particular form of program; the program may be, for example, an object program, a program executed by an interpreter, or a script program provided to an operating system.
Such machine-readable storage media include, but are not limited to: various memories and storage units; semiconductor storage devices; disk units such as optical, magnetic, and magneto-optical disks; and other media suitable for storing information.
In addition, the computer can also implement the technical solution of the present disclosure by connecting to a corresponding website on the internet, downloading and installing the computer program code according to the present disclosure into the computer and then executing the program.
Fig. 4 is a block diagram of an exemplary structure of a general-purpose personal computer 1300 in which a recognition apparatus and a recognition method for a sound event according to an embodiment of the present disclosure may be implemented.
As shown in fig. 4, the CPU 1301 executes various processes in accordance with a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 also stores, as necessary, data needed when the CPU 1301 executes the various processes. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another via a bus 1304. An input/output interface 1305 is also connected to the bus 1304.
The following components are connected to the input/output interface 1305: an input portion 1306 (including a keyboard, a mouse, and the like); an output portion 1307 (including a display, such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker); a storage portion 1308 (including a hard disk and the like); and a communication portion 1309 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1309 performs communication processing via a network such as the internet. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read therefrom is installed into the storage portion 1308 as needed.
In the case where the above-described series of processes is realized by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 1311.
It should be understood by those skilled in the art that the storage medium is not limited to the removable medium 1311 shown in fig. 4, which stores the program and is distributed separately from the apparatus in order to provide the program to the user. Examples of the removable medium 1311 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a Digital Versatile Disc (DVD)), a magneto-optical disk (including a Mini Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1302, a hard disk contained in the storage section 1308, or the like, in which the program is stored and which is distributed to the user together with the apparatus containing it.
In the systems and methods of the present disclosure, it is apparent that individual components or steps may be broken down and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, it should be understood that the above-described embodiments are merely illustrative of the present disclosure and do not constitute a limitation of the present disclosure. It will be apparent to those skilled in the art that various modifications and variations can be made in the above-described embodiments without departing from the spirit and scope of the disclosure. Accordingly, the scope of the disclosure is to be defined only by the claims appended hereto, and by their equivalents.
With respect to the embodiments including the above embodiments, the following supplementary notes are also disclosed:
Supplementary note 1. An apparatus for recognition of a sound event, comprising:
an encoder configured to convert a sound signal having a plurality of sound events contained therein into features in a low-dimensional space; and
a detector configured to map the features to a posterior probability for each sound event,
wherein the detector performs a plurality of hole convolution operations on the feature.
Supplementary note 2. The apparatus according to supplementary note 1, wherein the encoder performs a one-dimensional convolution operation, a parametric ReLU operation, a normalization operation, and a 1 × 1 convolution operation on the sound signal to obtain the features.
Supplementary note 3. The apparatus according to supplementary note 2, wherein the detector further performs a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation, after performing a plurality of hole convolution operations on the features, to obtain the posterior probability.
Supplementary note 4. The apparatus according to any one of supplementary notes 1 to 3, wherein the detector performs the hole convolution operation three times on the features.
Supplementary note 5. The apparatus according to supplementary note 4, wherein the detector further performs a 1 × 1 convolution operation, a parametric ReLU operation, a normalization operation, and a depthwise convolution operation in the course of each hole convolution operation.
Supplementary note 6. The apparatus according to supplementary note 1, wherein the encoder and the detector are trained using sound data with event labels.
Supplementary note 7. The apparatus according to supplementary note 1, wherein the features are based on each frame of the sound signal.
Supplementary note 8. A method of recognition of a sound event, comprising:
converting an acoustic signal having a plurality of acoustic events therein into features in a low-dimensional space; and
mapping the features to a posterior probability for each sound event,
wherein a plurality of hole convolution operations are performed on the features.
Supplementary note 9. The method according to supplementary note 8, further comprising performing a one-dimensional convolution operation, a parametric ReLU operation, a normalization operation, and a 1 × 1 convolution operation on the sound signal to obtain the features.
Supplementary note 10. The method according to supplementary note 9, further comprising performing a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation, after performing a plurality of hole convolution operations on the features, to obtain the posterior probability.
Supplementary note 11. The method according to any one of supplementary notes 8 to 10, wherein the hole convolution operation is performed three times on the features.
Supplementary note 12. The method according to supplementary note 11, further comprising performing a 1 × 1 convolution operation, a parametric ReLU operation, a normalization operation, and a depthwise convolution operation in the course of each hole convolution operation.
Supplementary note 13. The method according to supplementary note 8, wherein the sound signal having a plurality of sound events therein is converted into features in a low-dimensional space by an encoder, and the features are mapped to a posterior probability of each sound event by a detector,
the method further includes training the encoder and the detector using sound data having an event tag.
Supplementary note 14. The method according to supplementary note 8, wherein the features are based on each frame of the sound signal.
Supplementary note 15. A program product comprising machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of supplementary notes 8 to 14.
Claims (9)
1. An apparatus for recognition of a sound event, comprising:
an encoder configured to convert a sound signal having a plurality of sound events contained therein into features in a low-dimensional space; and
a detector configured to map the features to a posterior probability for each sound event,
wherein the detector performs a plurality of hole convolution operations on the feature.
2. The apparatus of claim 1, wherein said encoder performs a one-dimensional convolution operation, a parametric ReLU operation, a normalization operation, and a 1 x 1 convolution operation on the sound signal to obtain the feature.
3. The apparatus of claim 2, wherein the detector further performs a 1 × 1 convolution operation, a fully connected operation, and a Softmax operation to obtain the posterior probability after performing a plurality of hole convolution operations on the features.
4. The apparatus of any of claims 1 to 3, wherein the detector performs 3 hole convolution operations on the feature.
5. The apparatus of claim 4, wherein the detector further performs a 1 × 1 convolution operation, a parametric ReLU operation, a normalization operation, and a depthwise convolution operation in the course of performing each hole convolution operation.
6. The apparatus of claim 1, wherein the encoder and the detector are trained using sound data with event tags.
7. The apparatus of claim 1, wherein the feature is a feature based on each frame of a sound signal.
8. A method of recognition of a sound event, comprising:
converting an acoustic signal having a plurality of acoustic events therein to features in a low dimensional space; and
mapping the features to a posterior probability for each sound event,
wherein a plurality of hole convolution operations are performed on the features.
9. A machine-readable storage medium having a program product embodied thereon, the program product comprising machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, is capable of causing the computer to perform the method of claim 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822623.XA CN112447187A (en) | 2019-09-02 | 2019-09-02 | Device and method for recognizing sound event |
JP2020104793A JP2021039332A (en) | 2019-09-02 | 2020-06-17 | Device and method for recognizing voice event |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822623.XA CN112447187A (en) | 2019-09-02 | 2019-09-02 | Device and method for recognizing sound event |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112447187A true CN112447187A (en) | 2021-03-05 |
Family
ID=74734303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910822623.XA (CN112447187A, Pending) | Device and method for recognizing sound event | 2019-09-02 | 2019-09-02 |
Country Status (2)
Country | Link |
---|---|
JP | JP2021039332A |
CN | CN112447187A |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107545890A (en) * | 2017-08-31 | 2018-01-05 | 桂林电子科技大学 | A sound event recognition method |
CN108538311A (en) * | 2018-04-13 | 2018-09-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio classification method, device and computer-readable storage medium |
CN109065030A (en) * | 2018-08-01 | 2018-12-21 | 上海大学 | Ambient sound recognition method and system based on convolutional neural network |
US20190043516A1 (en) * | 2018-06-22 | 2019-02-07 | Intel Corporation | Neural network for speech denoising trained with deep feature losses |
CN109767785A (en) * | 2019-03-06 | 2019-05-17 | 河北工业大学 | Ambient noise identification and classification method based on convolutional neural network |
- 2019-09-02: CN application CN201910822623.XA, published as CN112447187A, status: active, Pending
- 2020-06-17: JP application JP2020104793A, published as JP2021039332A, status: active, Pending
Non-Patent Citations (1)
Title |
---|
YAN CHEN et al.: "Environmental sound classification with dilated convolutions", Applied Acoustics, vol. 148, pp. 123-132, XP093068656, DOI: 10.1016/j.apacoust.2018.12.019 * |
Also Published As
Publication number | Publication date |
---|---|
JP2021039332A (en) | 2021-03-11 |
Similar Documents
Publication | Title |
---|---|
CN106887225B | Acoustic feature extraction method and device based on convolutional neural network and terminal equipment |
CN108989882B | Method and apparatus for outputting music pieces in video |
WO2023273628A1 | Video loop recognition method and apparatus, computer device, and storage medium |
CN111783712A | Video processing method, device, equipment and medium |
CN111276119A | Voice generation method and system and computer equipment |
CN111816170B | Training of audio classification model and garbage audio recognition method and device |
Dogan et al. | A novel ternary and signum kernelled linear hexadecimal pattern and hybrid feature selection based environmental sound classification method |
JP2002041464A | Method and device for identifying end-user transaction |
Zhang et al. | Learning audio sequence representations for acoustic event classification |
CN113409803B | Voice signal processing method, device, storage medium and equipment |
CN114818864A | Gesture recognition method based on small samples |
CN107578774B | Method and system for facilitating detection of time series patterns |
CN112447187A | Device and method for recognizing sound event |
CN116737895A | Data processing method and related equipment |
CN116978370A | Speech processing method, device, computer equipment and storage medium |
CN113298822B | Point cloud data selection method and device, equipment and storage medium |
CN113889086A | Training method of voice recognition model, voice recognition method and related device |
CN109671440B | Method, device, server and storage medium for simulating audio distortion |
CN113570044A | Customer loss analysis model training method and device |
CN112905792A | Text clustering method, device and equipment based on non-text scene and storage medium |
CN112017638A | Voice semantic recognition model construction method, semantic recognition method, device and equipment |
CN115910042B | Method and device for identifying information type of formatted audio file |
CN113032401B | Big data processing method and device based on special-shaped structure tree and related equipment |
CN113536078B | Method, apparatus and computer storage medium for screening data |
CN116206121A | Method and equipment for constructing welding data analysis result image interpretation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||