CN115148220A - Audio detection system and audio detection method - Google Patents

Audio detection system and audio detection method

Info

Publication number
CN115148220A
CN115148220A
Authority
CN
China
Prior art keywords
audio
audio data
audio detection
original
format conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110352178.2A
Other languages
Chinese (zh)
Inventor
刘锴
宋宁
徐庆嵩
杜金凤
詹宁斯·格兰特
Current Assignee
Gowin Semiconductor Corp
Original Assignee
Gowin Semiconductor Corp
Priority date
Filing date
Publication date
Application filed by Gowin Semiconductor Corp filed Critical Gowin Semiconductor Corp
Priority to CN202110352178.2A
Publication of CN115148220A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/173: Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding

Abstract

The embodiments of the present application disclose an audio detection system and an audio detection method. The audio detection system comprises a micro control unit (MCU), a programmable logic device, and a shared memory. The shared memory is configured to store original audio data. The MCU is configured to acquire the original audio data from the shared memory and perform format conversion on it according to a preset conversion rule. The programmable logic device is configured to collect the original audio data and store it in the shared memory, and to detect the format-converted audio data according to a pre-trained AI audio detection model to determine an audio detection result. The embodiments of the present application have the advantages of low power consumption, low latency, low cost, and easy expansion, and are suitable for use in edge devices.

Description

Audio detection system and audio detection method
Technical Field
The embodiment of the application relates to the field of artificial intelligence, in particular to an audio detection system and an audio detection method.
Background
With the development and wide application of AI (Artificial Intelligence) technology, AI computation in different scenarios poses more and more challenges. AI computing applications have gradually expanded from the cloud to edge devices.
At present, there are three general approaches to audio detection:
The first is to analyze and process audio sample data using complex audio processing algorithms to calculate the content of the audio data.
The second is to infer the content of audio data by means of powerful hardware AI computation capability, based on dedicated hardware such as an AI server or an AI processor.
The third is to infer and predict the content of the audio data with an embedded AI algorithm on a high-end edge device chip.
The first two approaches are not suitable for edge devices, and the third usually requires an expensive high-end chip, whose cost is unsuitable for edge devices that are meant to be small and cheap.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein and is not intended to limit the scope of the appended claims.
An embodiment of the present disclosure provides an audio detection system, comprising:
a micro control unit (MCU), a programmable logic device, and a shared memory;
the shared memory is configured to store original audio data;
the micro control unit MCU is configured to acquire the original audio data from the shared memory and perform format conversion on the original audio data according to a preset conversion rule;
the programmable logic device is configured to collect the original audio data and store the original audio data in the shared memory, and to detect the format-converted audio data according to a pre-trained AI audio detection model to determine an audio detection result.
The embodiment of the present disclosure further provides an audio detection method applied to the above audio detection system, comprising:
the programmable logic device collects original audio data and stores the data in a shared memory;
the micro control unit MCU acquires the original audio data from the shared memory and performs format conversion on the original audio data according to a preset conversion rule;
and the programmable logic device detects the format-converted audio data according to a pre-trained AI audio detection model and determines an audio detection result.
In the embodiments of the present application, the MCU and the programmable logic device cooperate to jointly perform voice detection with an AI model, so that the respective advantages of the two are fully utilized. Detection of the collected audio data can be achieved with only a small amount of logic resources and limited data computing capacity, giving the system the advantages of low power consumption, low latency, low cost, high performance, and easy expansion, which makes it suitable for use in edge devices.
Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a schematic diagram of an audio detection system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another audio detection system in an embodiment of the present application;
FIG. 3 is a schematic diagram of another audio detection system in an embodiment of the present application;
FIG. 4 is a flowchart illustrating an audio detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an exemplary overall audio detection system;
FIG. 6 is a schematic diagram of the structure of an exemplary SoC for audio detection;
FIG. 7 is a schematic flow chart of audio format conversion in an example;
fig. 8 is a schematic flow chart of AI audio detection model inference in an example.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with, or instead of, any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in the present application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the appended claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the appended claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims appended hereto. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
An embodiment of the present application provides an audio detection system, as shown in fig. 1, including: a micro control unit MCU11, a programmable logic device 12, and a shared memory 13;
the MCU11 is configured to acquire original audio data from the shared memory 13 and perform format conversion on the original audio data according to a preset conversion rule;
the programmable logic device 12 is configured to collect the original audio data and store it in the shared memory 13, and is further configured to detect the format-converted audio data according to a pre-trained AI audio detection model and determine an audio detection result;
the shared memory 13 is configured to store the raw audio data collected by the programmable logic device 12.
In some exemplary embodiments, as shown in fig. 2, the programmable logic device 12 includes: an audio acquisition module 1201 and an AI audio detection model inference module 1202;
the audio capture module 1201 is configured to capture the raw audio data input from a microphone device. The audio acquisition module 1201 receives a sound signal from the microphone device through an input port thereof, acquires original audio data, and stores the original audio data in the shared memory 13;
the AI audio detection model inference module 1202 is configured to sequentially execute the operational modes corresponding to the pre-trained AI audio detection models according to the format-converted audio data to determine the audio detection result.
In some exemplary embodiments, the MCU11 includes an audio format conversion module 1101, the audio format conversion module 1101 is disposed in a core of the MCU;
the audio format conversion module 1101 is configured to convert the original audio data into a spectrogram according to a preset conversion rule, and the spectrogram is used as the audio data after the format conversion.
In some exemplary embodiments, the raw audio data collected by the audio acquisition module 1201 cannot be directly input to the AI audio detection model inference module 1202; it first needs to be converted into a spectrogram format and then input to the AI audio detection model inference module 1202. This format conversion step is completed by the audio format conversion module 1101 in the MCU, which computes the conversion using the data processing capability of the MCU core.
In some exemplary embodiments, the audio format conversion module 1101 converts the raw audio data into a spectrogram according to a preset conversion rule as follows:
corresponding original audio data segments are sequentially acquired from the original audio data according to a first preset duration, and each segment is processed by the following steps:
performing a fast Fourier transform calculation on the original audio data segment to determine a first preset number of transformed audio data;
and calculating average values of the transformed audio data to determine a second preset number of averaged audio data, which are stored into the spectrogram at the position corresponding to the original audio data segment.
In some exemplary embodiments, the first predetermined duration is 30 milliseconds, the first predetermined number is 256, and the second predetermined number is 43. Other values may be preset by those skilled in the art according to application needs, and are not limited to the examples of the embodiments of the present disclosure.
In some exemplary embodiments, the audio format conversion module 1101 converts the raw audio data into the spectrogram according to a preset conversion rule as follows:
the read original audio data is divided into segments of a preset fixed duration; each segment undergoes a fast Fourier transform followed by average-value calculation, and the output audio data is stored into the spectrogram; all original audio data segments are processed in this manner until none remain. For example, if each segment spans 30 milliseconds, the fast Fourier transform calculation outputs 256 audio data, and the subsequent average-value calculation outputs 43 audio data, which are stored in the spectrogram.
The target format produced by the audio data conversion is determined by the data input requirement of the selected AI audio detection model. Different AI audio detection models are used for inference prediction after training, and if the input audio data formats they require differ, the output of the audio format conversion module 1101 is adjusted correspondingly; it is not limited to the spectrogram illustrated in the embodiments of the present disclosure.
In some exemplary embodiments, the programmable logic device 12 is a field programmable gate array (FPGA); owing to the programmable nature of the FPGA, the system is easily extensible. The audio acquisition module 1201 and the AI audio detection model inference module 1202 are both disposed in the core of the FPGA.
In some exemplary embodiments, the shared memory 13 is connected to the MCU core through a system bus, and the MCU core may read raw audio data from the shared memory 13 in real time, load it into the data memory of the MCU core, and input it to the audio format conversion module 1101 to perform audio format conversion. The shared memory 13 is shared by the MCU core and the FPGA core, and both cores can directly access it and read and write data in real time.
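As a software analogy of this dual-core access pattern, Python's `multiprocessing.shared_memory` can stand in for the on-chip shared memory. This is purely an illustration of the shared-buffer idea, not the SoC's actual bus mechanism:

```python
import numpy as np
from multiprocessing import shared_memory

# Software analogy (an assumption for illustration): one shared region
# that a producer ("FPGA core") writes and a consumer ("MCU core")
# reads, each through its own handle.
shm = shared_memory.SharedMemory(create=True, size=16000 * 2)

# "FPGA core" side: write 16-bit raw audio samples into the buffer
fpga_view = np.ndarray((16000,), dtype=np.int16, buffer=shm.buf)
fpga_view[:] = (1000 * np.sin(np.linspace(0, 100, 16000))).astype(np.int16)

# "MCU core" side: attach to the same region by name and read it
mcu_side = shared_memory.SharedMemory(name=shm.name)
mcu_view = np.ndarray((16000,), dtype=np.int16, buffer=mcu_side.buf)
raw_audio = mcu_view.copy()   # load into the MCU side's own data memory

same = np.array_equal(raw_audio, fpga_view)
print("MCU read matches FPGA write:", same)   # True

# Release the shared region
mcu_side.close()
shm.close()
shm.unlink()
```

On the real SoC the two cores reach the same physical memory over a system bus and a parallel bus; here the named attachment plays that role.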
In some exemplary embodiments, the operations corresponding to the pre-trained AI audio detection model include: a depthwise convolution operation, a fully-connected operation, and a softmax operation. That is, the AI audio detection model inference module 1202 sequentially performs the depthwise convolution, fully-connected, and softmax operations on the format-converted audio data, carrying out computational inference based on the pre-trained AI audio detection model to predict the content of the audio data and thereby complete audio detection.
In some exemplary embodiments, the AI audio detection model is trained in a device or system other than the audio detection system. The pre-trained AI audio detection model need not come from the cloud; for example, it may be trained or downloaded by another device and then input to the audio detection system, or stored at a designated location for the audio detection system to read by itself. The AI audio detection model is an AI model that has been trained on a large amount of sample audio data in the cloud or in other external equipment (or systems) and can be accurately used for audio detection.
In some exemplary embodiments, the AI audio detection model may include: several layers of operators such as Reshape, DepthWiseConv2D, FullyConnected, and SoftMax, an audio input data layer, and a detection conclusion output data layer (outputting the inference and prediction result). In some exemplary embodiments, TensorFlow is used in the cloud to train the AI audio detection model on a large amount of sample audio data, so as to obtain a trained AI audio detection model. Here, the sample audio data is audio data annotated with audio features.
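A sketch of what the cloud-side TensorFlow training setup might look like, using the layer types named above. All shapes, the class count, and the training configuration are assumptions for illustration; the patent does not specify them:

```python
import tensorflow as tf

# Illustrative dimensions only: a 49-segment x 43-bin spectrogram and
# 4 output classes are assumptions, not values from the patent.
N_SEGMENTS, N_BINS, N_CLASSES = 49, 43, 4

model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_SEGMENTS * N_BINS,)),
    # Reshape: flat spectrogram -> 2-D "image" with one channel
    tf.keras.layers.Reshape((N_SEGMENTS, N_BINS, 1)),
    # DepthWiseConv2D: one filter per input channel
    tf.keras.layers.DepthwiseConv2D(kernel_size=(8, 8), strides=2,
                                    activation="relu"),
    tf.keras.layers.Flatten(),
    # FullyConnected: map features to per-class scores
    tf.keras.layers.Dense(N_CLASSES),
    # SoftMax: scores -> probability distribution over classes
    tf.keras.layers.Softmax(),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(spectrograms, labels, ...) would perform the cloud-side
# training on sample audio data annotated with audio features.
pred = model(tf.zeros((1, N_SEGMENTS * N_BINS)))
print(pred.shape)   # (1, 4)
```

After training, the weights of these layers are what gets downloaded into the SoC for on-chip inference.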
In the audio detection scheme provided in the embodiment of the present disclosure, the AI audio detection model may also adopt other AI models in the related art, and is not limited to the models illustrated in the embodiment of the present disclosure. According to the description of the embodiment of the present disclosure, when different AI audio detection models are selected, the training mode, the sample audio data, and/or the operation mode to be executed in the AI audio detection model inference module of the programmable logic device 12 may be adjusted correspondingly.
In some exemplary embodiments, the programmable logic device 12 is further configured to update the pre-trained AI audio detection model. As the detection function or performance is improved and upgraded, the AI audio detection model can be continuously learned/trained with more or newer sample audio data in an external device or cloud system, and further optimized to improve detection accuracy. The retrained AI audio detection model can then be updated into the programmable logic device 12 to implement a function/performance upgrade of the audio detection system.
In some exemplary embodiments, as shown in fig. 3, the MCU11 further includes a detection result obtaining module 1102 configured to obtain the audio detection result determined by the programmable logic device 12. The audio detection result may be stored; or, provided to an application in the system-on-chip; or, output to an external system.
In some exemplary embodiments, the detection result obtaining module 1102 obtains the result from the AI audio detection model inference module 1202; alternatively, the AI audio detection model inference module 1202 may store the result in the shared memory 13, from which the detection result obtaining module 1102 then obtains it.
In some exemplary embodiments, the detection result obtaining module 1102 is further configured to output the audio detection result to an external system, or provide an interface for the external system to obtain.
In some exemplary embodiments, the shared memory 13 is connected to the core of the MCU through a bus system.
In some exemplary embodiments, the MCU is a Cortex-M series processor.
In some exemplary embodiments, the programmable logic device 12 is a low-end or mid-range FPGA.
It can be seen that the audio detection system provided by the embodiments of the disclosure can be implemented on a lightweight System on Chip (SoC) built from a low-end or mid-range FPGA and a Cortex-M series processor, with only a small amount of logic resources and limited data calculation capability. It has the advantages of low power consumption, low latency, low cost, and easy expansion, and is suitable for use in edge mobile devices.
In some exemplary embodiments, the low-end and mid-range FPGAs are low-power, low-cost FPGA products that contain a small set of necessary logic resources. In some exemplary embodiments, the low-end FPGA may be a Gowin Semiconductor GW1NSR-4C series FPGA product.
An embodiment of the present application further provides an audio detection method, which is applied to the audio detection system according to any of the above embodiments, where the method is shown in fig. 4, and includes:
step 401, a programmable logic device collects original audio data and stores the data in a shared memory;
step 402, the MCU acquires the original audio data from the shared memory and performs format conversion on the original audio data according to a preset conversion rule;
step 403, the programmable logic device detects the format-converted audio data according to the pre-trained AI audio detection model and determines an audio detection result.
In some exemplary embodiments, step 403 comprises:
and sequentially executing operation modes corresponding to the pre-trained AI audio detection model according to the audio data after format conversion so as to determine the audio detection result.
In some exemplary embodiments, the operations corresponding to the pre-trained AI audio detection model include: a depthwise convolution operation, a fully-connected operation, and a softmax operation.
In some exemplary embodiments, step 402 comprises: converting the original audio data into a spectrogram according to a preset conversion rule, which serves as the format-converted audio data.
In some exemplary embodiments, the converting step correspondingly includes:
sequentially acquiring corresponding original audio data segments from the original audio data according to a first preset duration, and processing each segment by the following steps:
performing a fast Fourier transform calculation on the original audio data segment to determine a first preset number of transformed audio data;
and calculating average values of the transformed audio data to determine a second preset number of averaged audio data, which are stored into the spectrogram at the position corresponding to the original audio data segment.
In some exemplary embodiments, the pre-trained AI audio detection model is trained in a device or system other than the audio detection system.
In some exemplary embodiments, the method further comprises: updating the pre-trained AI audio detection model.
In some exemplary embodiments, other method implementation details can be found in the previous embodiments.
The above embodiments disclosed herein are illustrated below by way of an example.
This example is a voice detection system implemented on a SoC combining a lightweight MCU with a low-end, low-power FPGA, which can infer and predict the content of audio data.
In this example, the overall process of voice detection is shown in fig. 5: an AI audio detection model is trained in the cloud on a large amount of sample audio data to obtain a model for audio detection (i.e., the trained AI audio detection model). The trained AI audio detection model is downloaded to the system on chip and used when audio detection is needed.
When audio detection is performed, the data flow path is as shown in fig. 5:
Audio signals are input through a microphone device; the audio acquisition module acquires the audio signals input from the microphone device to obtain original audio data; the audio format conversion module acquires the original audio data and converts it into the target format; the AI audio detection model inference module then performs inference prediction on the converted audio data according to the pre-trained AI audio detection model to determine a detection result, which is further output to other applications or external systems.
In this example, the structure of the audio detection system (SoC) is shown in fig. 6. The SoC includes an MCU core, an FPGA core, and a shared memory, and acquires audio from a microphone device through an audio acquisition module. The MCU core is connected with the shared memory through a system bus, and the FPGA core is connected with the shared memory through a parallel bus. The MCU core contains the audio format conversion module. The FPGA core contains the audio acquisition module and the AI audio detection model inference module.
The shared memory in the chip is shared by the MCU core and the FPGA core, and both cores can directly access it and read and write data in real time.
Three modules in this example are described separately below:
(1) Audio acquisition module
This module is used for collecting audio data; it is located in the FPGA core and implemented with FPGA logic resources.
When the audio system is started, the audio acquisition module acquires original audio data, which is input through the FPGA port and stored into the on-chip shared memory. Meanwhile, the shared memory is connected with the MCU core through a system bus, and the MCU core can read the original audio data from the shared memory in real time, load it into the data memory of the MCU core, and input it to the audio format conversion module to perform audio format conversion.
(2) Audio format conversion module
The audio data read from the shared memory by the MCU core is the original audio data collected by the audio acquisition module. It cannot be directly used as input to the AI audio detection model inference module; it must first undergo audio format conversion into a spectrogram, which is then input into the AI audio detection model inference module.
The audio format conversion module is located in the MCU core and computes the format conversion using the data processing capability of the MCU core. The original audio data read by the MCU core is taken as one segment per fixed duration; each segment is processed by fast Fourier transform followed by average-value calculation, and the output audio data is stored into the spectrogram. All the original audio data is processed in this manner until it is exhausted.
In some exemplary embodiments, as shown in fig. 7, every 30 milliseconds of audio is taken as one segment, and the fast Fourier transform calculation outputs 256 audio data; the subsequent average-value calculation then outputs 43 audio data, which are stored in the spectrogram. All converted audio data are sequentially stored in the spectrogram. The audio data in the spectrogram are input to the AI audio detection model inference module, which executes the inference and prediction of AI audio detection. The duration of each audio segment, the number of audio data obtained by the fast Fourier transform, and the number of audio data obtained by average-value calculation are all preset and may be adjusted according to application requirements; the disclosure is not limited to the illustrated example.
(3) AI audio detection model reasoning module
The spectrogram audio data output by the audio format conversion module serves as the input of the AI audio detection model inference module. The AI audio detection model inference module is located in the FPGA core; the convolution operations are implemented with FPGA logic resources, and the inference and prediction of the AI audio detection model are accelerated by the powerful hardware parallel processing capability of the FPGA.
The AI audio detection model inference module includes the depthwise convolution operation, the fully-connected operation, and the softmax operation, in one-to-one correspondence with the layers of the AI audio detection model; it is used to compute all operations in the AI audio detection model, and the calculation process is shown in fig. 8.
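The three operations can be written out explicitly. The NumPy sketch below uses made-up shapes and random weights purely to show the dataflow; on the SoC, real weights come from the cloud-trained model, and loops like these are what the FPGA logic parallelises:

```python
import numpy as np

def softmax(scores):
    """Softmax: exponentiate and normalise scores into probabilities."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def depthwise_conv2d(img, kernel):
    """Minimal single-channel, valid-padding depthwise convolution."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((49, 43))     # assumed input shape
kernel = rng.standard_normal((8, 8))            # assumed conv kernel
fc_weights = rng.standard_normal((42 * 36, 4))  # 4 hypothetical classes

features = depthwise_conv2d(spectrogram, kernel)   # depthwise conv
scores = features.reshape(-1) @ fc_weights         # fully-connected
probs = softmax(scores)                            # softmax
detected = int(np.argmax(probs))  # index of the predicted audio class
print(round(probs.sum(), 6))      # 1.0 (a probability distribution)
```

The argmax over the softmax output is the audio detection result that the MCU's detection result obtaining module would then read out.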
In the cloud, through machine learning, the AI audio detection model learns from a large amount of audio data and is trained into an AI model that can be accurately used for audio detection. The AI audio detection model inference module computes and reasons over the input spectrogram audio data based on the trained model and predicts the content of the audio data, thereby completing audio detection.
It can be seen that the lightweight AI audio detection system provided by this example uses a low-cost MCU + FPGA SoC chip with very few logic resources as its carrier. The system has the characteristics of low power consumption, low latency, low cost, and high performance, is suitable for edge mobile device applications, expands the application range of AI, and reduces the complexity of AI audio detection.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. An audio detection system, comprising:
a micro control unit (MCU), a programmable logic device, and a shared memory;
wherein the shared memory is configured to store original audio data;
the micro control unit (MCU) is configured to acquire the original audio data from the shared memory and perform format conversion on the original audio data according to a preset conversion rule; and
the programmable logic device is configured to collect the original audio data and store it in the shared memory, and to detect the format-converted audio data according to a pre-trained AI audio detection model to determine an audio detection result.
2. The audio detection system of claim 1, wherein
the programmable logic device comprises an audio acquisition module and an AI audio detection model inference module;
the audio acquisition module is configured to acquire the original audio data input from a microphone device; and
the AI audio detection model inference module is configured to sequentially execute the operations of the pre-trained AI audio detection model on the format-converted audio data to determine the audio detection result.
3. The audio detection system of claim 1 or 2, wherein
the programmable logic device is a field programmable gate array (FPGA); and
the audio acquisition module and the AI audio detection model inference module are both implemented in the core of the FPGA.
4. The audio detection system of claim 2, wherein
the operations of the pre-trained AI audio detection model comprise: a depthwise convolution operation, a fully connected operation, and a softmax operation.
5. The audio detection system of claim 1 or 2, wherein
the MCU comprises an audio format conversion module;
the audio format conversion module is implemented in the core of the MCU; and
the audio format conversion module is configured to convert the original audio data into a spectrogram according to the preset conversion rule, the spectrogram serving as the format-converted audio data.
6. The audio detection system of claim 5, wherein
converting the original audio data into a spectrogram according to the preset conversion rule comprises:
sequentially extracting original audio data segments of a first preset duration from the original audio data, and processing each segment as follows:
performing a fast Fourier transform on the segment to determine a first preset number of transformed audio data; and
averaging the transformed audio data to determine a second preset number of averaged audio data, and storing the averaged audio data into the portion of the spectrogram corresponding to the segment.
7. The audio detection system of claim 1 or 2, wherein
the pre-trained AI audio detection model is trained on a device or system other than the audio detection system.
8. The audio detection system of claim 1 or 2, wherein
the programmable logic device is further configured to update the pre-trained AI audio detection model.
9. The audio detection system of claim 1 or 2, wherein
the shared memory is connected to the core of the MCU through a bus system.
10. An audio detection method applied to the audio detection system of any one of claims 1 to 9, the method comprising:
collecting, by the programmable logic device, original audio data and storing it in the shared memory;
acquiring, by the micro control unit (MCU), the original audio data from the shared memory and performing format conversion on it according to a preset conversion rule; and
detecting, by the programmable logic device, the format-converted audio data according to a pre-trained AI audio detection model to determine an audio detection result.
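The pipeline described by claims 4, 6, and 10 (segment the raw audio, FFT each segment, average the spectrum down to a small number of bins, then run a depthwise convolution, a fully connected layer, and a softmax) can be sketched in NumPy as follows. All concrete sizes here (256-sample segments, 16 frequency bins, length-3 depthwise kernels, 2 output classes) and the random weights are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def audio_to_spectrogram(audio, frame_len=256, n_bins=16):
    """Claim 6 sketch: slice raw audio into fixed-length segments, FFT each
    segment, then average groups of FFT magnitudes down to n_bins values
    that form one row of the spectrogram."""
    n_frames = len(audio) // frame_len
    spec = np.empty((n_frames, n_bins))
    for i in range(n_frames):
        segment = audio[i * frame_len:(i + 1) * frame_len]
        # one-sided magnitude spectrum (first preset number of values)
        mag = np.abs(np.fft.rfft(segment))[:frame_len // 2]
        # average adjacent frequency bins down to n_bins averaged values
        spec[i] = mag.reshape(n_bins, -1).mean(axis=1)
    return spec

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def detect(spec, dw_kernels, fc_weights, fc_bias):
    """Claim 4 sketch: depthwise convolution (one 1-D kernel per frequency
    bin, applied along time), then a fully connected layer, then softmax."""
    conv = np.stack([np.convolve(spec[:, c], dw_kernels[c], mode="valid")
                     for c in range(spec.shape[1])], axis=1)
    logits = conv.reshape(-1) @ fc_weights + fc_bias
    return softmax(logits)

# Toy end-to-end run with random audio and random weights (illustrative only).
rng = np.random.default_rng(0)
audio = rng.standard_normal(256 * 8)          # 8 segments of raw audio
spec = audio_to_spectrogram(audio)            # shape (8, 16)
dw_kernels = rng.standard_normal((16, 3))     # one length-3 kernel per bin
fc_w = rng.standard_normal((6 * 16, 2))       # "valid" conv leaves 6 steps
probs = detect(spec, dw_kernels, fc_w, np.zeros(2))
# probs is a length-2 vector of class probabilities summing to 1
```

In the patent's system the spectrogram step would run on the MCU core and the inference step in the FPGA fabric; the sketch only shows the arithmetic each side performs.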
CN202110352178.2A 2021-03-31 2021-03-31 Audio detection system and audio detection method Pending CN115148220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110352178.2A CN115148220A (en) 2021-03-31 2021-03-31 Audio detection system and audio detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110352178.2A CN115148220A (en) 2021-03-31 2021-03-31 Audio detection system and audio detection method

Publications (1)

Publication Number Publication Date
CN115148220A true CN115148220A (en) 2022-10-04

Family

ID=83405195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110352178.2A Pending CN115148220A (en) 2021-03-31 2021-03-31 Audio detection system and audio detection method

Country Status (1)

Country Link
CN (1) CN115148220A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150047803A (en) * 2013-10-25 2015-05-06 삼성전자주식회사 Artificial intelligence audio apparatus and operation method thereof
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
CN106504768A (en) * 2016-10-21 2017-03-15 百度在线网络技术(北京)有限公司 Phone testing audio frequency classification method and device based on artificial intelligence
CN109658923A (en) * 2018-10-19 2019-04-19 平安科技(深圳)有限公司 Voice quality detecting method, equipment, storage medium and device based on artificial intelligence
CN109994127A (en) * 2019-04-16 2019-07-09 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency detection, device, electronic equipment and storage medium
CN110266894A (en) * 2019-06-18 2019-09-20 浙江百应科技有限公司 A kind of call method and system of automatic busy tone detecting
CN111105788A (en) * 2019-12-20 2020-05-05 北京三快在线科技有限公司 Sensitive word score detection method and device, electronic equipment and storage medium
US10645216B1 (en) * 2019-03-26 2020-05-05 Ribbon Communications Operating Company, Inc. Methods and apparatus for identification and optimization of artificial intelligence calls
US20200293875A1 (en) * 2019-03-12 2020-09-17 International Business Machines Corporation Generative Adversarial Network Based Audio Restoration

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Alejandro Morán et al.: "Hardware-Optimized Reservoir Computing System for Edge Intelligence Applications", Cognitive Computation, 28 February 2021 (2021-02-28), pages 1461-1469 *
XU Cheng et al.: Introduction to Embedded Systems (《嵌入式系统导论》), China Railway Publishing House, 31 January 2011, page 16 *
ZHU Lifang: "Research on Security Risks and Control of Artificial Intelligence Technology in Applications", Telecom Engineering Technics and Standardization, no. 12, 15 December 2019 (2019-12-15) *
LI Yong; FAN Xue; YANG Hongbo: "Application of Spectrograms in Mandarin Chinese Tone Recognition", Information & Communications, no. 07, 15 July 2017 (2017-07-15) *
BI Chunyan; CHEN Yingying: "Design of an Audio Signal Integrity Detection System in an Artificial Intelligence Environment", Modern Electronics Technique, no. 08, 15 April 2020 (2020-04-15) *

Similar Documents

Publication Publication Date Title
CN110347873B (en) Video classification method and device, electronic equipment and storage medium
CN111401516B (en) Searching method for neural network channel parameters and related equipment
US9020871B2 (en) Automated classification pipeline tuning under mobile device resource constraints
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
CN111582323B (en) Transmission line channel detection method, device and medium
Meyer et al. Efficient convolutional neural network for audio event detection
WO2018228399A1 (en) Computing device and method
Gope et al. Ternary hybrid neural-tree networks for highly constrained iot applications
CN116304720B (en) Cost model training method and device, storage medium and electronic equipment
CN110797031A (en) Voice change detection method, system, mobile terminal and storage medium
CN114443891A (en) Encoder generation method, fingerprint extraction method, medium, and electronic device
CN116012681A (en) Method and system for diagnosing motor faults of pipeline robot based on sound vibration signal fusion
Sailesh et al. A novel framework for deployment of CNN models using post-training quantization on microcontroller
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
CN114358274A (en) Method and apparatus for training neural network for image recognition
CN110070891B (en) Song identification method and device and storage medium
CN117527495A (en) Modulation mode identification method and device for wireless communication signals
CN115148220A (en) Audio detection system and audio detection method
CN115758237A (en) Bearing fault classification method and system based on intelligent inspection robot
CN113033397A (en) Target tracking method, device, equipment, medium and program product
CN111354372B (en) Audio scene classification method and system based on front-end and back-end combined training
US20240105211A1 (en) Weakly-supervised sound event detection method and system based on adaptive hierarchical pooling
CN112348162B (en) Method and device for generating a recognition model
KR102626550B1 (en) Deep learning-based environmental sound classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination