CN117395587A - Sound detection method, device, equipment and medium based on machine vision - Google Patents

Sound detection method, device, equipment and medium based on machine vision

Info

Publication number
CN117395587A
Authority
CN
China
Prior art keywords
audio
detection
determining
value
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311526520.1A
Other languages
Chinese (zh)
Inventor
刘吉悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN202311526520.1A priority Critical patent/CN117395587A/en
Publication of CN117395587A publication Critical patent/CN117395587A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 29/00 Monitoring arrangements; Testing arrangements
    • H04R 29/001 Monitoring arrangements; Testing arrangements for loudspeakers

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The application discloses a sound detection method, device, equipment and medium based on machine vision, belonging to the technical field of acoustic detection. In the method, intelligent acoustic quality detection in the new product research and development and NPI stages is realized through automatic machine vision detection, so that product quality is ensured from the research and development end. First, the recorded audio captured while the sound to be tested plays a preset test audio is converted into a time domain image; then, detection values of preset detection items of the recorded audio are detected on the time domain image; and if all the detection values meet the preset thresholds of their corresponding preset detection items, the sound to be tested is determined to be qualified.

Description

Sound detection method, device, equipment and medium based on machine vision
Technical Field
The present application relates to the technical field of acoustic detection, and in particular, to an acoustic detection method based on machine vision, an acoustic detection device based on machine vision, an acoustic detection apparatus based on machine vision, and a computer readable storage medium.
Background
The intelligent sound combines emerging technologies such as voice recognition and natural language processing with the traditional sound, giving it functions such as audio playback, intelligent voice interaction and smart home control. Because of its rich functions, the intelligent sound is regarded as the control center of the smart home and has become one of the fastest-developing electronic products today. Audio playback is the basic function of the intelligent sound, so guaranteeing its acoustic performance is a basic requirement on the quality side, and the acoustic performance needs to be detected from the new product research and development and NPI (new product introduction) stage.
At present, in traditional manufacturing scenarios, acoustic performance in the new product research and development and NPI stages is often judged by manual listening. This detection method consumes labor, reduces detection efficiency, has poor robustness, is greatly influenced by subjective human factors, and long detection sessions cause auditory fatigue in inspectors. In high-end manufacturing scenarios, acoustic performance judgment is completed by complex precision acoustic instruments; this method is costly, imposes strict requirements on the detection conditions, and its detection efficiency still needs to be improved.
Disclosure of Invention
The main object of the present application is to provide a machine vision-based acoustic detection method, a machine vision-based acoustic detection device, a machine vision-based acoustic detection apparatus, and a computer-readable storage medium, which aim to accurately detect acoustic quality of sound.
To achieve the above object, the present application provides a machine vision-based acoustic detection method, which includes:
acquiring recorded audio recorded by a sound to be tested when playing preset test audio, and converting the recorded audio into a time domain image;
detecting a detection value of a preset detection item of the recorded audio on the time domain image;
and if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value, determining that the to-be-detected sound is qualified.
Illustratively, the step of detecting the detection value of the preset detection item of the recorded audio on the time-domain image includes:
determining rising time, peak time and overshoot of the preset test audio in the time domain image;
when the preset test audio is step signal audio, the adjusting time of the step signal audio is also determined in the time domain image;
and when the preset test audio is square wave signal audio, determining the inclination of the square wave signal audio in the time domain image.
Illustratively, the step of determining the rising time, the peak time and the overshoot of the preset test audio in the time domain image includes:
binarizing the time domain image, and extracting an edge contour line through edge detection;
determining a starting point, a vertical axis coordinate maximum point and a steady state value point of the edge contour line through curve tracking and local optimizing of the edge contour line, and establishing a two-dimensional coordinate system by taking the starting point as an origin;
determining the peak time of the preset test audio as the abscissa value of the maximum value point of the ordinate;
determining that the overshoot of the preset test audio is the ratio of the difference value of the vertical coordinates of the maximum value point of the vertical axis coordinates and the steady state value point to the vertical coordinates of the steady state value point;
and determining the rising time of the preset test audio as an abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, wherein the reference ordinate is determined by the ordinate of the steady-state value point and a preset reference coefficient.
The step of determining the adjustment time of the step signal audio in the time domain image when the preset test audio is the step signal audio includes:
determining a vertical coordinate fluctuation interval of the steady-state value point according to the vertical coordinate of the steady-state value point and a preset fluctuation coefficient;
determining extreme points when the monotonicity of the edge contour line changes;
if the ordinate of the extreme point is in the ordinate fluctuation interval, determining a target point at which the edge contour line intersects with an interval ordinate maximum value or an interval ordinate minimum value of the ordinate fluctuation interval before the extreme point;
and determining the adjusting time of the step signal audio as the abscissa value of the target point.
The step of determining the inclination of the square wave signal audio in the time domain image when the preset test audio is the square wave signal audio includes:
determining an inclined point, wherein the abscissa value of the inclined point is the abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, and the ordinate value of the inclined point is the reference ordinate;
and determining the inclination of the square wave signal audio as the inclination between the inclination point and the starting point.
Illustratively, before the step of obtaining the recorded audio recorded by the sound to be tested when playing the preset test audio, the method includes:
measuring the free field frequency response of the active speaker by a standard microphone, taking the measurement result as a reference frequency response;
repeatedly measuring a microphone to be measured at the same measuring position of the standard microphone, recording the frequency response deviation of the frequency response of the microphone to be measured and the reference frequency response, and taking the frequency response deviation as a compensation value of the microphone to be measured;
and collecting recording audio recorded by the sound to be tested when the preset test audio is played by using the standard microphone or the microphone to be tested after compensation is performed based on the compensation value.
Illustratively, the method further comprises:
if the sound to be detected of the current sampling inspection batch is qualified, sending the sound to be detected corresponding to the current sampling inspection batch to a subsequent workstation;
if unqualified sound to be detected exists in the current spot check batch, increasing the spot check number of the sound to be detected until the sound to be detected in the subsequent spot check batch is qualified.
The present application also provides a machine vision-based audio detection device, and the machine vision-based audio detection device includes:
the acquisition module is used for acquiring recorded audio recorded by the sound to be tested when the preset test audio is played, and converting the recorded audio into a time domain image;
the detection module is used for detecting the detection value of a preset detection item of the recording audio on the time domain image;
and the determining module is used for determining that the sound to be detected is qualified if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value.
The present application also provides a machine vision-based acoustic detection apparatus, and the machine vision-based acoustic detection apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the machine vision-based acoustic detection method as described above.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the machine vision-based acoustic detection method as described above.
According to the machine vision-based sound detection method, the machine vision-based sound detection device, the machine vision-based sound detection equipment and the computer-readable storage medium, recording audio recorded when a sound to be detected plays preset test audio is obtained, and the recording audio is converted into a time domain image; detecting a detection value of a preset detection item of the recorded audio on the time domain image; and if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value, determining that the to-be-detected sound is qualified.
In the method, intelligent acoustic quality detection in the new product research and development and NPI stages is realized through automatic machine vision detection, so that product quality is ensured from the research and development end. First, the recorded audio captured while the sound to be tested plays the preset test audio is converted into a time domain image; then, the detection values of the preset detection items of the recorded audio are detected on the time domain image; and if all the detection values meet the preset thresholds of their corresponding preset detection items, the sound to be tested is determined to be qualified. To guarantee intelligent sound quality, the acoustic performance must be detected from the new product research and development and NPI stages: manual detection consumes considerable human resources, long detection sessions cause auditory fatigue, and missed detection and false detection occur easily; detection with high-end acoustic instruments is expensive and places strict requirements on the detection environment. The machine vision-based intelligent sound laboratory quality inspection described here realizes low-cost automatic acoustic performance detection and accurately detects the acoustic quality of the sound.
Drawings
FIG. 1 is a schematic diagram of an operating device of a hardware operating environment according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an embodiment of a machine vision-based audio detection method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an audio detection algorithm according to an embodiment of a machine vision-based audio detection method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating step signal detection according to an embodiment of a machine vision-based audio detection method according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating detection of square wave signals according to an embodiment of a machine vision-based audio detection method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a process of sound quality inspection according to an embodiment of a machine vision-based sound inspection method according to an embodiment of the present application;
fig. 7 is an application schematic diagram of an embodiment of a machine vision-based audio detection method according to an embodiment of the present application;
fig. 8 is a schematic diagram of an acoustic detection device based on machine vision according to an embodiment of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of an operating device of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 1, the operation device may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 is not limiting of the operating device and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
As shown in fig. 1, an operating system, a data storage module, a network communication module, a user interface module, and a computer program may be included in the memory 1005 as one type of storage medium.
In the operating device shown in fig. 1, the network interface 1004 is mainly used for data communication with other devices, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be provided in the operation device, which calls the computer program stored in the memory 1005 through the processor 1001 and performs the following operations:
acquiring recorded audio recorded by a sound to be tested when playing preset test audio, and converting the recorded audio into a time domain image;
detecting a detection value of a preset detection item of the recorded audio on the time domain image;
and if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value, determining that the to-be-detected sound is qualified.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
the step of detecting the detection value of the preset detection item of the recording audio on the time domain image comprises the following steps:
determining rising time, peak time and overshoot of the preset test audio in the time domain image;
when the preset test audio is step signal audio, the adjusting time of the step signal audio is also determined in the time domain image;
and when the preset test audio is square wave signal audio, determining the inclination of the square wave signal audio in the time domain image.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
the step of determining the rising time, the peak time and the overshoot of the preset test audio in the time domain image comprises the following steps:
binarizing the time domain image, and extracting an edge contour line through edge detection;
determining a starting point, a vertical axis coordinate maximum point and a steady state value point of the edge contour line through curve tracking and local optimizing of the edge contour line, and establishing a two-dimensional coordinate system by taking the starting point as an origin;
determining the peak time of the preset test audio as the abscissa value of the maximum value point of the ordinate;
determining that the overshoot of the preset test audio is the ratio of the difference value of the vertical coordinates of the maximum value point of the vertical axis coordinates and the steady state value point to the vertical coordinates of the steady state value point;
and determining the rising time of the preset test audio as an abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, wherein the reference ordinate is determined by the ordinate of the steady-state value point and a preset reference coefficient.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
and when the preset test audio is step signal audio, determining the adjusting time of the step signal audio in the time domain image, wherein the step comprises the following steps:
determining a vertical coordinate fluctuation interval of the steady-state value point according to the vertical coordinate of the steady-state value point and a preset fluctuation coefficient;
determining extreme points when the monotonicity of the edge contour line changes;
if the ordinate of the extreme point is in the ordinate fluctuation interval, determining a target point at which the edge contour line intersects with an interval ordinate maximum value or an interval ordinate minimum value of the ordinate fluctuation interval before the extreme point;
and determining the adjusting time of the step signal audio as the abscissa value of the target point.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
the step of determining the inclination of the square wave signal audio frequency in the time domain image when the preset test audio frequency is the square wave signal audio frequency further comprises the following steps:
determining an inclined point, wherein the abscissa value of the inclined point is the abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, and the ordinate value of the inclined point is the reference ordinate;
and determining the inclination of the square wave signal audio as the inclination between the inclination point and the starting point.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
before the step of obtaining the recorded audio of the sound to be tested recorded when the preset test audio is played, the method comprises the following steps:
measuring the free field frequency response of the active speaker by a standard microphone, taking the measurement result as a reference frequency response;
repeatedly measuring a microphone to be measured at the same measuring position of the standard microphone, recording the frequency response deviation of the frequency response of the microphone to be measured and the reference frequency response, and taking the frequency response deviation as a compensation value of the microphone to be measured;
and collecting recording audio recorded by the sound to be tested when the preset test audio is played by using the standard microphone or the microphone to be tested after compensation is performed based on the compensation value.
In an embodiment, the processor 1001 may call a computer program stored in the memory 1005, and further perform the following operations:
the method further comprises the steps of:
if the sound to be detected of the current sampling inspection batch is qualified, sending the sound to be detected corresponding to the current sampling inspection batch to a subsequent workstation;
if unqualified sound to be detected exists in the current spot check batch, increasing the spot check number of the sound to be detected until the sound to be detected in the subsequent spot check batch is qualified.
An embodiment of the present application provides a machine vision-based audio detection method, referring to fig. 2, in an embodiment of the machine vision-based audio detection method, the method includes:
step S10, recording audio recorded by a sound to be tested when playing preset test audio is obtained, and the recording audio is converted into a time domain image;
To ensure the acoustic performance of the intelligent sound, its quality needs to be detected from the new product research and development and NPI stages. The detection is completed in a quality laboratory: first, the product to be detected (a whole machine or a module) plays the test audio; a standard microphone collects the audio signal and sends it to an upper computer through a serial port; the audio signal is converted into a time domain image in the upper computer; and a machine vision-based detection algorithm is then applied to the time domain image to judge the acoustic performance of the sound.
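The patent describes this conversion only at the level of frame count, sampling rate and amplitude; the following minimal Python sketch (the function and parameter names are illustrative, not from the patent) shows one way the recorded samples could be rasterized into the kind of binary time-domain image that the later steps operate on:

```python
import numpy as np

def audio_to_time_domain_image(samples, height=480, width=1920):
    """Rasterize a 1-D audio signal into a binary time-domain image.

    samples: 1-D array of recorded amplitudes (hypothetical input; the
    patent converts the serial-port audio signal using the total number
    of frames, the sampling rate and the amplitude).
    Returns a uint8 image with waveform pixels at 255 (foreground) and
    background at 0, matching the binarized form used later.
    """
    samples = np.asarray(samples, dtype=float)
    # Resample onto `width` columns by linear interpolation.
    x = np.linspace(0, len(samples) - 1, width)
    y = np.interp(x, np.arange(len(samples)), samples)
    # Normalize amplitudes into pixel rows.
    y_min, y_max = y.min(), y.max()
    rows = ((y - y_min) / (y_max - y_min + 1e-12) * (height - 1)).astype(int)
    img = np.zeros((height, width), dtype=np.uint8)
    # Flip vertically so larger amplitudes sit higher in the image.
    img[height - 1 - rows, np.arange(width)] = 255
    return img
```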
Illustratively, before the step of obtaining the recorded audio recorded by the sound to be tested when playing the preset test audio, the method includes:
measuring the free field frequency response of the active speaker by a standard microphone, taking the measurement result as a reference frequency response;
repeatedly measuring a microphone to be measured at the same measuring position of the standard microphone, recording the frequency response deviation of the frequency response of the microphone to be measured and the reference frequency response, and taking the frequency response deviation as a compensation value of the microphone to be measured;
and collecting recording audio recorded by the sound to be tested when the preset test audio is played by using the standard microphone or the microphone to be tested after compensation is performed based on the compensation value.
Before obtaining the recorded audio captured while the sound to be tested plays the preset test audio, the recording device, such as the microphone, also needs to be calibrated. Standard microphones are used to record the audio, so the microphones must be calibrated prior to testing.
The calibration is performed in a sound-proof environment, with a standard microphone and a number of microphones to be calibrated prepared. First, the positions of an active speaker and the standard microphone are fixed in the field, the free field frequency response of the active speaker is measured, and the measurement result is taken as the reference frequency response. The standard microphone is then replaced by a microphone to be tested at the same position, the measurement is repeated, and the frequency response deviation is recorded; this deviation can be compensated in software in subsequent tests.
In addition, to improve the reliability of calibration, the signal-to-noise ratio of the spectrum measurement should exceed 20 dB, and repeated calibration ensures the consistency of the standard microphone's calibration.
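A minimal sketch of the compensation step described above, assuming the reference and test-microphone frequency responses are available as dB magnitudes on a shared frequency grid (the array and function names are assumptions for illustration):

```python
import numpy as np

def frequency_response_deviation(reference_db, measured_db):
    """Compensation value: deviation of the test microphone's frequency
    response from the reference frequency response, per frequency bin."""
    return np.asarray(measured_db) - np.asarray(reference_db)

def apply_compensation(spectrum_db, deviation_db):
    """Subtract the stored deviation so a measurement made with the test
    microphone approximates what the standard microphone would record."""
    return np.asarray(spectrum_db) - np.asarray(deviation_db)
```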
Step S20, detecting the detection value of a preset detection item of the recording audio on the time domain image;
after the recorded audio of the acquired sound to be tested, which is recorded when the preset test audio is played, is converted into a time domain image, the detection value of the preset detection item of the recorded audio can be detected on the time domain image, and whether the sound to be tested is qualified is determined through the detection value.
In an embodiment, referring to fig. 3, the upper computer first receives the audio signal, a digital signal, through the serial port, and then converts it into a time domain image according to information such as the total number of frames, the sampling rate and the amplitude of the digital signal. Image noise points are identified using a window function: if there are too many noise points the result is judged NG; otherwise the subsequent detection steps are executed. The main signal region is extracted by image segmentation (which also denoises the image), and the corresponding test algorithm is then executed according to the particular test audio.
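The patent does not specify the window function used for the noise check; one plausible reading, sketched below under that assumption, treats isolated foreground pixels of the binarized image as noise points and judges NG when their count exceeds a placeholder threshold:

```python
import numpy as np

def count_noise_points(binary_img, win=3, max_neighbors=1):
    """Count isolated foreground pixels as noise points.

    A pixel is treated as noise if its win x win neighborhood contains at
    most `max_neighbors` other foreground pixels. This is only one possible
    interpretation of the patent's window-function noise check.
    """
    img = (np.asarray(binary_img) > 0).astype(np.int32)
    pad = win // 2
    padded = np.pad(img, pad)
    noise = 0
    ys, xs = np.nonzero(img)
    for y, x in zip(ys, xs):
        window = padded[y:y + win, x:x + win]
        if window.sum() - 1 <= max_neighbors:  # subtract the pixel itself
            noise += 1
    return noise

def noise_check(binary_img, threshold=50):
    """NG if there are too many noise points, otherwise continue detection.
    The threshold is an arbitrary placeholder, not from the patent."""
    return "NG" if count_noise_points(binary_img) > threshold else "OK"
```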
Illustratively, the step of detecting the detection value of the preset detection item of the recorded audio on the time-domain image includes:
determining rising time, peak time and overshoot of the preset test audio in the time domain image;
when the preset test audio is step signal audio, the adjusting time of the step signal audio is also determined in the time domain image;
and when the preset test audio is square wave signal audio, determining the inclination of the square wave signal audio in the time domain image.
When detecting the detection values of the preset detection items of the recorded audio on the time domain image, the test audio of the non-detailed test is a step signal, and the test content is to detect the rise time, peak time, overshoot and adjustment time of the recorded step signal using a machine vision algorithm. The test audio of the detailed test is a square wave signal, and the test content is to detect the rise time, peak time, overshoot and inclination of the recorded square wave signal using a machine vision algorithm.
Illustratively, the step of determining the rising time, the peak time and the overshoot of the preset test audio in the time domain image includes:
binarizing the time domain image, and extracting an edge contour line through edge detection;
determining a starting point, a vertical axis coordinate maximum point and a steady state value point of the edge contour line through curve tracking and local optimizing of the edge contour line, and establishing a two-dimensional coordinate system by taking the starting point as an origin;
determining the peak time of the preset test audio as the abscissa value of the maximum value point of the ordinate;
determining that the overshoot of the preset test audio is the ratio of the difference value of the vertical coordinates of the maximum value point of the vertical axis coordinates and the steady state value point to the vertical coordinates of the steady state value point;
and determining the rising time of the preset test audio as an abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, wherein the reference ordinate is determined by the ordinate of the steady-state value point and a preset reference coefficient.
Taking the non-detailed test as an example, in one embodiment, referring to fig. 4, image preprocessing is performed first: the time domain image is binarized, and the signal edge contour line is extracted by edge detection (such as the Canny algorithm). Fig. 4 is a partially enlarged schematic diagram of the step signal after image preprocessing.
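A sketch of this preprocessing step using OpenCV is shown below; the Otsu thresholding and the Canny hysteresis thresholds (50/150) are illustrative choices, since the patent only names binarization and edge detection "such as Canny":

```python
import cv2
import numpy as np

def extract_edge_contour(time_domain_img):
    """Binarize the time-domain image and extract the signal edge contour.

    Returns the binarized image (foreground 255, background 0) and the
    Canny edge map from which the contour line is traced.
    """
    img = np.asarray(time_domain_img, dtype=np.uint8)
    # Otsu binarization separates the waveform foreground from the background.
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Canny edge detection; 50/150 are illustrative hysteresis thresholds.
    edges = cv2.Canny(binary, 50, 150)
    return binary, edges
```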
The step signal features are then marked by image processing. In the binarized image, foreground and background are distinguished by 255 and 0, so the key points can be obtained through curve tracking and local optimization. The starting point of curve tracking, whose absolute coordinates in the image are (x_0, y_0), is marked as the origin (0, 0) of the relative coordinate system.
Local optimization is then performed: the absolute coordinates (x_1, y_1) of the maximum point of the front section of the curve in the y-axis direction (the vertical-axis-coordinate maximum point) are recorded, and the relative coordinates (t_m, y_m) are obtained from x_1 - x_0 and y_1 - y_0.
Curve tracking: take the ordinate y_2 of any steady-state value point in the rear section of the image, and obtain the relative ordinate y_s from y_2 - y_0.
In one embodiment, the preset reference coefficient takes the value 0.9, so the reference ordinate is 0.9·y_s + y_0. Starting from the point (x_0, 0.9·y_s + y_0), traverse along the positive x-axis direction and record the abscissa of the first intersection with the curve as x_3; the relative abscissa t_r is obtained from x_3 - x_0.
This gives the rise time t_r, the peak time t_m, and the overshoot σ = (y_m - y_s)/y_s.
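Collecting the three quantities, the following sketch assumes the traced edge contour has already been reduced to a one-dimensional array y of relative ordinates indexed by the relative abscissa (the steady-state index and the helper name are hypothetical, for illustration only):

```python
import numpy as np

def step_metrics(y, steady_idx=-1, ref_coef=0.9):
    """Rise time t_r, peak time t_m and overshoot sigma from a traced
    contour y[0..N-1], where y[0] = 0 is the starting point (origin).

    steady_idx selects a steady-state value point in the rear section;
    ref_coef is the preset reference coefficient (0.9 in the embodiment).
    """
    y = np.asarray(y, dtype=float)
    y_s = y[steady_idx]                      # steady-state ordinate
    t_m = int(np.argmax(y))                  # peak time: abscissa of the maximum
    y_m = y[t_m]
    sigma = (y_m - y_s) / y_s                # overshoot
    # Rise time: first abscissa where the contour reaches ref_coef * y_s.
    t_r = int(np.argmax(y >= ref_coef * y_s))
    return t_r, t_m, sigma
```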
In one embodiment, referring to fig. 5, the rise time t_r, the peak time t_m and the overshoot σ = (y_m - y_s)/y_s of the square wave signal are calculated in the same way as for the step signal, and the calculation is not repeated here.
The step of determining the adjustment time of the step signal audio in the time domain image when the preset test audio is the step signal audio includes:
determining a vertical coordinate fluctuation interval of the steady-state value point according to the vertical coordinate of the steady-state value point and a preset fluctuation coefficient;
determining extreme points when the monotonicity of the edge contour line changes;
if the ordinate of the extreme point is in the ordinate fluctuation interval, determining a target point at which the edge contour line intersects with an interval ordinate maximum value or an interval ordinate minimum value of the ordinate fluctuation interval before the extreme point;
and determining the adjusting time of the step signal audio as the abscissa value of the target point.
In one embodiment, referring to fig. 4, curve tracking is performed: starting from the point (t_m, y_m), traverse the curve along the positive x-axis direction and compare the relative ordinate y_x with y_s + Δ and y_s - Δ, where the preset fluctuation coefficient takes 0.02, so Δ = 0.02·y_s. The value of y_x cycles from decreasing to increasing to decreasing until it stabilizes. If, at some monotonicity change, y_x satisfies y_s - Δ < y_x < y_s + Δ, then find the preceding point where y_x = y_s - Δ or y_x = y_s + Δ; its relative abscissa is recorded as the adjustment time t_s of the step signal audio.
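A sketch of this settling-time search over the same contour array y, using the 0.02 fluctuation coefficient of the embodiment; the loop structure is one plausible reading of the traversal described above, not a verbatim implementation:

```python
import numpy as np

def adjustment_time(y, t_m, fluct_coef=0.02, steady_idx=-1):
    """Adjustment (settling) time t_s of the step response.

    Walks the contour from the peak (t_m, y_m) along the positive x
    direction; once a monotonicity change (extreme point) falls inside the
    band [y_s - delta, y_s + delta], returns the abscissa of the previous
    crossing of a band boundary.
    """
    y = np.asarray(y, dtype=float)
    y_s = y[steady_idx]
    delta = fluct_coef * y_s
    upper, lower = y_s + delta, y_s - delta
    last_crossing = t_m
    for i in range(t_m + 1, len(y) - 1):
        crossed_upper = (y[i] - upper) * (y[i - 1] - upper) <= 0
        crossed_lower = (y[i] - lower) * (y[i - 1] - lower) <= 0
        if crossed_upper or crossed_lower:
            last_crossing = i
        is_extreme = (y[i] - y[i - 1]) * (y[i + 1] - y[i]) < 0
        if is_extreme and lower < y[i] < upper:
            return last_crossing
    return len(y) - 1  # fall back to the end if no qualifying extreme point
```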
The step of determining the inclination of the square wave signal audio in the time domain image when the preset test audio is the square wave signal audio includes:
determining an inclined point, wherein the abscissa value of the inclined point is the abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, and the ordinate value of the inclined point is the reference ordinate;
and determining the inclination of the square wave signal audio as the inclination between the inclination point and the starting point.
In one embodiment, referring to fig. 5, the square wave signal additionally requires detection of its inclination, which is calculated from the starting point (t_0, 0) and the point (t_r, 0.9·y_s): L = 0.9·y_s/(t_r - t_0).
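In the same notation, the inclination follows directly from those two points; a one-line sketch:

```python
def inclination(t_r, y_s, t_0=0, ref_coef=0.9):
    """Inclination L of the square wave response: slope between the
    starting point (t_0, 0) and the point (t_r, ref_coef * y_s)."""
    return ref_coef * y_s / (t_r - t_0)
```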
And step S30, if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value, determining that the sound to be detected is qualified.
When the test audio is a step signal, if the rise time, peak time, overshoot and adjustment time all meet their corresponding preset thresholds, the sound to be tested is determined to be qualified.
When the test audio is a square wave signal, if the rise time, peak time, overshoot and inclination all meet their corresponding preset thresholds, the sound to be tested is determined to be qualified.
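A minimal sketch of this pass/fail decision; the threshold ranges below are placeholders, since the patent only requires that each detection value meet its corresponding preset threshold:

```python
def judge(detections, thresholds):
    """Return 'PASS' only if every detection value meets its preset
    threshold, interpreted here as an inclusive (low, high) range."""
    for item, value in detections.items():
        low, high = thresholds[item]
        if not (low <= value <= high):
            return "NG"
    return "PASS"

# Example with placeholder numbers (units and limits are illustrative only):
step_result = judge(
    {"rise_time": 12, "peak_time": 20, "overshoot": 0.08, "adjustment_time": 55},
    {"rise_time": (0, 15), "peak_time": (0, 30),
     "overshoot": (0, 0.10), "adjustment_time": (0, 60)},
)
```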
Illustratively, the method further comprises:
if the sound to be detected of the current sampling inspection batch is qualified, sending the sound to be detected corresponding to the current sampling inspection batch to a subsequent workstation;
if unqualified sound to be detected exists in the current spot check batch, increasing the spot check number of the sound to be detected until the sound to be detected in the subsequent spot check batch is qualified.
In one embodiment, if the result is PASS, the product is determined to be acceptable and proceeds to the subsequent detection steps; if the result is NG, the test data is analyzed and uploaded to the research and development system, where the product can be retested or otherwise processed (including increasing the number of spot checks during the MP stage). Meanwhile, after laboratory detection is completed, the detection results can be transmitted to the research and development system after simple data analysis and report generation, so that the product can be improved in a targeted manner from the research and development end during the new product research and development and NPI stages.
In an application scenario of the machine vision-based sound detection method, referring to fig. 6, the whole process of machine vision-based intelligent sound laboratory quality inspection is divided into a laboratory preparation stage and a laboratory detection stage. The laboratory preparation stage ensures a stable detection environment and a calibrated standard microphone, and comprises setting up the test environment and calibrating the standard microphone. The intelligent sound laboratory quality-inspection environment must be stable, sound-proof and free of interfering sound sources, and the positions of the product to be tested and the standard microphone must be fixed.
In the laboratory detection stage, the acoustic complete machine or module to be detected is sampled and sent to the laboratory test environment, the test audio (step signal or square wave signal) is played, and the standard microphone acquires the audio signal and transmits it to the upper computer, after which audio detection and subsequent processing are performed.
Referring to fig. 7, the flow of the intelligent audio laboratory quality inspection method based on machine vision is described in detail as follows:
(1) Setting up a laboratory test environment;
(2) Calibrating the test microphone to obtain a standard microphone;
(3) Selecting a product to be tested, and playing corresponding audio (step signal or square wave signal) according to the test content;
(4) The standard microphone records the test audio and transmits it to the laboratory upper computer through a serial port;
(5) Converting the audio signal into an image in the upper computer so as to perform image processing on the image;
(6) Performing an audio quality detection algorithm based on machine vision;
(7) If an NG product exists in step (6), the analysis data is fed back to the research and development system, and then retesting, increasing the number of spot checks (MP stage), repair or other operations are executed;
(8) If all the products to be detected in step (6) are PASS, their audio quality is judged qualified, the subsequent detection steps are executed, and the machine vision-based intelligent sound laboratory quality inspection ends.
By providing a low-cost automatic detection method for the intelligent sound laboratory stage, audio quality is detected automatically instead of manually, detection efficiency and accuracy are improved, and product quality is ensured. The method can also be linked with the research and development system to optimize product design, processes and the like, improving the degree of intelligence in manufacturing and the efficiency of product research and development. It has high integration, high reliability, a high degree of automation and a certain intelligent capability, reduces detection cost, improves detection efficiency, and can be extended to other products with an audio playback function.
Referring to fig. 8, in addition, the embodiment of the present application further provides an acoustic detection device based on machine vision, where the acoustic detection device based on machine vision includes:
the acquisition module M1 is used for acquiring recorded audio recorded by the sound to be tested when the preset test audio is played, and converting the recorded audio into a time domain image;
the detection module M2 is used for detecting the detection value of a preset detection item of the recording audio on the time domain image;
and the determining module M3 is used for determining that the sound to be detected is qualified if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value.
Illustratively, the detection module is further configured to:
determining rising time, peak time and overshoot of the preset test audio in the time domain image;
when the preset test audio is step signal audio, the adjusting time of the step signal audio is also determined in the time domain image;
and when the preset test audio is square wave signal audio, determining the inclination of the square wave signal audio in the time domain image.
Illustratively, the detection module is further configured to:
binarizing the time domain image, and extracting an edge contour line through edge detection;
determining a starting point, a vertical axis coordinate maximum point and a steady state value point of the edge contour line through curve tracking and local optimizing of the edge contour line, and establishing a two-dimensional coordinate system by taking the starting point as an origin;
determining the peak time of the preset test audio as the abscissa value of the maximum value point of the ordinate;
determining the overshoot of the preset test audio as the ratio of the difference between the ordinates of the vertical-axis-coordinate maximum point y_m and the steady-state value point y_s to the ordinate y_s of the steady-state value point;
and determining the rising time of the preset test audio as an abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, wherein the reference ordinate is determined by the ordinate of the steady-state value point and a preset reference coefficient.
Illustratively, the detection module is further configured to:
determining a vertical coordinate fluctuation interval of the steady-state value point according to the vertical coordinate of the steady-state value point and a preset fluctuation coefficient;
determining extreme points when the monotonicity of the edge contour line changes;
if the ordinate of the extreme point is in the ordinate fluctuation interval, determining a target point at which the edge contour line intersects with an interval ordinate maximum value or an interval ordinate minimum value of the ordinate fluctuation interval before the extreme point;
and determining the adjusting time of the step signal audio as the abscissa value of the target point.
Illustratively, the detection module is further configured to:
determining an inclined point, wherein the abscissa value of the inclined point is the abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, and the ordinate value of the inclined point is the reference ordinate;
and determining the inclination of the square wave signal audio as the inclination between the inclination point and the starting point.
Illustratively, the acquiring module is further configured to:
before the step of obtaining the recorded audio recorded by the sound to be tested when the preset test audio is played,
measuring the free field frequency response of the active speaker by a standard microphone, taking the measurement result as a reference frequency response;
repeatedly measuring a microphone to be measured at the same measuring position of the standard microphone, recording the frequency response deviation of the frequency response of the microphone to be measured and the reference frequency response, and taking the frequency response deviation as a compensation value of the microphone to be measured;
and collecting recording audio recorded by the sound to be tested when the preset test audio is played by using the standard microphone or the microphone to be tested after compensation is performed based on the compensation value.
Illustratively, the determining module is further configured to:
if the sound to be detected of the current sampling inspection batch is qualified, sending the sound to be detected corresponding to the current sampling inspection batch to a subsequent workstation;
if unqualified sound to be detected exists in the current spot check batch, increasing the spot check number of the sound to be detected until the sound to be detected in the subsequent spot check batch is qualified.
The machine vision-based sound detection device provided by the application adopts the machine vision-based sound detection method in the embodiment, and aims to accurately detect the acoustic quality of sound. Compared with the conventional technology, the machine vision-based acoustic detection device provided by the embodiment of the present application has the same beneficial effects as the machine vision-based acoustic detection method provided by the above embodiment, and other technical features in the machine vision-based acoustic detection device are the same as the features disclosed in the method of the above embodiment, and are not repeated herein.
In addition, the embodiment of the present application also provides a machine vision-based acoustic detection apparatus, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the machine vision-based acoustic detection method as described above.
In addition, the embodiment of the application further provides a computer readable storage medium, and the computer readable storage medium stores a computer program, and the computer program realizes the steps of the sound detection method based on machine vision when being executed by a processor.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the conventional technology in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (10)

1. A machine vision-based acoustic inspection method, the method comprising:
acquiring recorded audio recorded by a sound to be tested when playing preset test audio, and converting the recorded audio into a time domain image;
detecting a detection value of a preset detection item of the recorded audio on the time domain image;
and if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value, determining that the to-be-detected sound is qualified.
2. The machine vision-based acoustic detection method as set forth in claim 1, wherein the step of detecting a detection value of a preset detection item of the recorded audio on the time-domain image includes:
determining rising time, peak time and overshoot of the preset test audio in the time domain image;
when the preset test audio is step signal audio, the adjusting time of the step signal audio is also determined in the time domain image;
and when the preset test audio is square wave signal audio, determining the inclination of the square wave signal audio in the time domain image.
3. The machine vision-based acoustic inspection method according to claim 2, wherein the step of determining rise time, peak time, overshoot of the preset test audio in the time domain image includes:
binarizing the time domain image, and extracting an edge contour line through edge detection;
determining a starting point, a vertical axis coordinate maximum point and a steady state value point of the edge contour line through curve tracking and local optimizing of the edge contour line, and establishing a two-dimensional coordinate system by taking the starting point as an origin;
determining the peak time of the preset test audio as the abscissa value of the maximum value point of the ordinate;
determining that the overshoot of the preset test audio is the ratio of the difference value of the vertical coordinates of the maximum value point of the vertical axis coordinates and the steady state value point to the vertical coordinates of the steady state value point;
and determining the rising time of the preset test audio as an abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, wherein the reference ordinate is determined by the ordinate of the steady-state value point and a preset reference coefficient.
4. The machine vision-based acoustic inspection method as claimed in claim 3, wherein the step of further determining an adjustment time of the step signal audio in the time-domain image when the preset test audio is the step signal audio, comprises:
determining a vertical coordinate fluctuation interval of the steady-state value point according to the vertical coordinate of the steady-state value point and a preset fluctuation coefficient;
determining extreme points when the monotonicity of the edge contour line changes;
if the ordinate of the extreme point is in the ordinate fluctuation interval, determining a target point at which the edge contour line intersects with an interval ordinate maximum value or an interval ordinate minimum value of the ordinate fluctuation interval before the extreme point;
and determining the adjusting time of the step signal audio as the abscissa value of the target point.
5. The machine vision-based acoustic inspection method as claimed in claim 3, wherein the step of further determining the inclination of the square wave signal audio in the time domain image when the preset test audio is the square wave signal audio, comprises:
determining an inclined point, wherein the abscissa value of the inclined point is the abscissa value when the ordinate of the edge contour line reaches a reference ordinate for the first time, and the ordinate value of the inclined point is the reference ordinate;
and determining the inclination of the square wave signal audio as the inclination between the inclination point and the starting point.
6. The machine vision-based sound detection method as set forth in claim 1, wherein before the step of obtaining the recorded audio recorded by the sound to be detected when the preset test audio is played, the method includes:
measuring the free field frequency response of the active speaker by a standard microphone, taking the measurement result as a reference frequency response;
repeatedly measuring a microphone to be measured at the same measuring position of the standard microphone, recording the frequency response deviation of the frequency response of the microphone to be measured and the reference frequency response, and taking the frequency response deviation as a compensation value of the microphone to be measured;
and collecting recording audio recorded by the sound to be tested when the preset test audio is played by using the standard microphone or the microphone to be tested after compensation is performed based on the compensation value.
7. The machine vision-based acoustic inspection method of claim 1, further comprising:
if the sound to be detected of the current sampling inspection batch is qualified, sending the sound to be detected corresponding to the current sampling inspection batch to a subsequent workstation;
if unqualified sound to be detected exists in the current spot check batch, increasing the spot check number of the sound to be detected until the sound to be detected in the subsequent spot check batch is qualified.
8. An acoustic detection device based on machine vision, characterized in that the acoustic detection device based on machine vision includes:
the acquisition module is used for acquiring recorded audio recorded by the sound to be tested when the preset test audio is played, and converting the recorded audio into a time domain image;
the detection module is used for detecting the detection value of a preset detection item of the recording audio on the time domain image;
and the determining module is used for determining that the sound to be detected is qualified if the detection values all meet the preset threshold value of the preset detection item corresponding to the detection value.
9. An acoustic inspection apparatus based on machine vision, characterized in that the acoustic inspection apparatus based on machine vision comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the machine vision-based acoustic detection method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the machine vision-based acoustic detection method according to any one of claims 1 to 7.
CN202311526520.1A 2023-11-15 2023-11-15 Sound detection method, device, equipment and medium based on machine vision Pending CN117395587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311526520.1A CN117395587A (en) 2023-11-15 2023-11-15 Sound detection method, device, equipment and medium based on machine vision

Publications (1)

Publication Number Publication Date
CN117395587A true CN117395587A (en) 2024-01-12

Family

ID=89470116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311526520.1A Pending CN117395587A (en) 2023-11-15 2023-11-15 Sound detection method, device, equipment and medium based on machine vision

Country Status (1)

Country Link
CN (1) CN117395587A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination