CN114374924B

CN114374924B - Recording quality detection method and related device

Info

Publication number: CN114374924B
Application number: CN202210015387.2A
Authority: CN
Inventors: 王国行; 徐国权; 顾陈曦; 蒋沛佚; 王东; 蒋德铭
Original assignee: Shanghai Newtailun Education Technology Co ltd
Current assignee: Shanghai Newtailun Education Technology Co ltd
Priority date: 2022-01-07
Filing date: 2022-01-07
Publication date: 2024-01-19
Anticipated expiration: 2042-01-07
Also published as: CN114374924A

Abstract

In the recording quality detection method and the related device, the detection device acquires the first audio recorded by the target recording device and filters the interference information in the first audio to obtain the optimized second audio; it then determines the recording quality of the target recording device according to the difference between the first audio and the second audio. The recording quality of the target recording device is thereby detected efficiently and objectively.

Description

Recording quality detection method and related device

Technical Field

The application relates to the field of data processing, in particular to a recording quality detection method and a related device.

Background

In order to provide high-quality services, the internet platform sometimes needs to detect the device performance of the audio device used by the content creator, and therefore, there is a need to provide a technical means for detecting the recording quality of the audio device.

Disclosure of Invention

In order to overcome at least one of the disadvantages in the prior art, the present application provides a recording quality detection method and related device, including:

in a first aspect, the present application provides a recording quality detection method, applied to a detection device, the method including:

acquiring a first audio recorded by target recording equipment;

filtering interference information in the first audio to obtain a second audio;

and obtaining the recording quality of the target recording device according to the difference between the first audio and the second audio.

In a second aspect, the present application provides a sound quality detection apparatus, applied to a detection device, the sound quality detection apparatus including:

the audio acquisition module is used for acquiring first audio recorded by the target recording equipment;

the audio processing module is used for filtering interference information in the first audio to obtain a second audio;

and the quality detection module is used for obtaining the recording quality of the target recording device according to the difference between the first audio and the second audio.

In a third aspect, the present application provides a detection apparatus, where the detection apparatus includes a processor and a memory, where the memory stores a computer program, and where the computer program, when executed by the processor, implements the recording quality detection method.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the recording quality detection method.

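Compared with the prior art, the application has the following beneficial effects: the detection device acquires the first audio recorded by the target recording device and filters the interference information in it to obtain the optimized second audio, and then determines the recording quality of the target recording device according to the difference between the first audio and the second audio, so that the recording quality is detected efficiently and objectively.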

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic structural diagram of a detection device according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a device according to an embodiment of the present application.

Reference numerals: 120 - memory; 130 - processor; 140 - communication unit; 201 - audio acquisition module; 202 - audio processing module; 203 - quality detection module.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

Internet platforms sometimes require detection of device capabilities of audio devices used by content creators. The Internet platform can comprise an audio reading platform, a music creation platform, a video platform, a live broadcast platform, an electronic commerce platform and the like.

The following description takes the audio reading platform as an example. An anchor on the audio reading platform conveys the emotion of a novel by narrating it as an audiobook; the requirements on the recording equipment are therefore strict. If professional recording equipment is not used, the recorded audio reading contains more noise, which degrades the quality of the audio work and the listener's experience. Therefore, in order for a listener to obtain a high-quality audio reading from the audio reading platform, it is necessary to detect the recording quality of the recording device used by the anchor.

In the related art, a person usually listens to a piece of audio and judges by ear whether it was recorded with professional recording equipment. This approach is extremely inefficient, highly subjective, and prone to misjudgment.

In view of this, the present embodiment provides a recording quality detection method applied to a detection device. In the method, the detection device acquires the first audio recorded by a target recording device and filters the interference information in the first audio to obtain the optimized second audio; it then determines the recording quality of the target recording device according to the difference between the first audio and the second audio. The recording quality of the target recording device is thereby detected efficiently and objectively.

The detection device may be a server, for example, a Web server, an FTP (File Transfer Protocol) server, a data processing server, or the like. Further, the server may be a single server or a server group. The server group may be centralized or distributed (e.g., the server may be a distributed system). In some embodiments, the server may be local or remote relative to the user terminal. In some embodiments, the server may be implemented on a cloud platform; by way of example only, the cloud platform may include a private cloud, public cloud, hybrid cloud, community cloud, distributed cloud, inter-cloud, multi-cloud, or the like, or any combination thereof. In some embodiments, the server may be implemented on an electronic device having one or more components.

That is, when the detection device is a server, the server may examine the first audio uploaded by the user terminal, thereby determining the recording quality of the target recording device that captured the first audio.

The user terminal may be a mobile terminal, tablet computer, laptop computer, or the like, or any combination thereof. In some embodiments, the mobile terminal may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, or an augmented reality device, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart shoelaces, smart glasses, a smart helmet, a smart watch, a smart garment, a smart backpack, a smart accessory, etc., or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA), a gaming device, a navigation device, or a point of sale (POS) device, or the like, or any combination thereof.

It should be understood that the target recording device may be a microphone configured by the user terminal itself or a professional recording device communicatively connected to a peripheral interface of the user terminal.

The present embodiment also provides a block diagram for describing the hardware structure of the detection device. As shown in fig. 1, the detection device includes a memory 120, a processor 130, and a communication unit 140. The memory 120, the processor 130, and the communication unit 140 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.

The memory 120 may be, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.

The communication unit 140 is used for transmitting and receiving data through a network. The network may include a wired network, a wireless network, a fiber optic network, a telecommunications network, an intranet, the internet, a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), a wireless local area network (Wireless Local Area Network, WLAN), a metropolitan area network (Metropolitan Area Network, MAN), a public switched telephone network (Public Switched Telephone Network, PSTN), a Bluetooth network, a ZigBee network, a near field communication (Near Field Communication, NFC) network, or the like, or any combination thereof. In some embodiments, the network may include one or more network access points. For example, the network may include wired or wireless network access points, such as base stations and/or network switching nodes, through which one or more components of the system may connect to the network to exchange data and/or information.

The processor 130 may be an integrated circuit chip with signal processing capabilities and may include one or more processing cores (e.g., a single-core processor or a multi-core processor). By way of example only, the processor may include a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), an application-specific instruction-set processor (Application Specific Instruction-set Processor, ASIP), a graphics processing unit (Graphics Processing Unit, GPU), a physics processing unit (Physics Processing Unit, PPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), a programmable logic device (Programmable Logic Device, PLD), a controller, a microcontroller unit, a reduced instruction set computer (Reduced Instruction Set Computer, RISC), a microprocessor, or the like, or any combination thereof.

Based on the above description, the recording quality detection method in this embodiment is described in detail below with reference to the flowchart shown in FIG. 2. It should be understood that the operations of the flowchart need not be performed in the order shown, and that steps with no logical dependency on one another may be performed in reverse order or concurrently. Moreover, those skilled in the art may, guided by the present disclosure, add one or more other operations to the flowchart or remove one or more operations from it. As shown in FIG. 2, the method includes:

S101, acquiring first audio recorded by target recording equipment.

Illustratively, it is assumed that the detecting device is a server that is communicatively connected to the user terminal. The user terminal provides an uploading interface, so that a user can send first audio recorded by the target recording device to the server through the uploading interface.

S102, filtering interference information in the first audio to obtain a second audio.

S103, according to the difference between the first audio and the second audio, the recording quality of the target recording device is obtained.

The detection device thus compares the first audio (before optimization) with the second audio (after optimization). If the difference between the two is large, more interference information was filtered out of the first audio, and the recording quality of the target recording device can be determined to be poor; similarly, if the difference between the two is small, less interference information was filtered out of the first audio, and the recording quality of the target recording device can be determined to be good.
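The idea behind steps S101 to S103 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the moving-average filter stands in for whatever noise reduction the detection device actually applies, the mean absolute difference stands in for the perceptual comparison, and the function names are hypothetical. It only shows that a noisier recording changes more when filtered.

```python
import math
import random


def moving_average(samples, window=5):
    """Toy noise reduction (assumption): smooth each sample with its neighbours."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def mean_abs_difference(first_audio, second_audio):
    """Toy difference measure (assumption): average per-sample change."""
    total = sum(abs(a - b) for a, b in zip(first_audio, second_audio))
    return total / len(first_audio)


def recording_difference(first_audio):
    second_audio = moving_average(first_audio)              # S102: filter interference
    return mean_abs_difference(first_audio, second_audio)   # S103: compare the two


# Two synthetic "recordings" of the same slow tone with different noise levels.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
clean_recording = [x + rng.gauss(0, 0.01) for x in tone]
noisy_recording = [x + rng.gauss(0, 0.2) for x in tone]

clean_diff = recording_difference(clean_recording)
noisy_diff = recording_difference(noisy_recording)
print(clean_diff < noisy_diff)
```

In this toy measure a larger value means a larger difference, which corresponds to poorer recording quality.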

It was found that the interference information in the first audio mainly consists of background noise. Background noise, also called the noise floor, generally refers to all noise in an electroacoustic system other than the useful signal; it comprises two parts, namely noise from the audio equipment and noise from the playback environment. In an audio reading, for example, it appears as a hissing sound in addition to the anchor's voice, and its presence necessarily degrades the quality of the audio reading. Therefore, step S102 may filter out the interference information by the following embodiment:

S102-1, performing noise reduction processing on the first audio to obtain a second audio.

Therefore, the background noise in the first audio is filtered, and the optimized second audio is obtained.

In this embodiment, the PESQ (Perceptual Evaluation of Speech Quality) algorithm, an objective speech quality assessment method, is used to measure the difference between the first audio and the second audio, so step S103 compares the first audio and the second audio as follows:

S103-1, evaluating the difference between the first audio and the second audio through a PESQ algorithm to obtain a comparison score between the first audio and the second audio.

S103-2, determining the recording quality of the target recording equipment according to the comparison score.

In an alternative embodiment, the detection device compares the comparison score between the first audio and the second audio with a preset score threshold value, and judges whether the comparison score is greater than the score threshold value; and if the comparison score is larger than the score threshold, determining that the recording quality of the target recording equipment meets the recording requirement.

Otherwise, if the comparison score is smaller than or equal to the score threshold value, determining that the recording quality of the target recording device does not meet the recording requirement.
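The threshold decision can be sketched as a single comparison. The function name and the threshold value of 3.0 are hypothetical; the source does not state a concrete threshold. Note that a PESQ-style score is higher when the two signals are perceptually closer, so a high comparison score means that little was removed by noise reduction.

```python
def meets_recording_requirement(comparison_score, score_threshold):
    """Return True only when the score is strictly greater than the threshold."""
    return comparison_score > score_threshold


SCORE_THRESHOLD = 3.0  # hypothetical value, chosen only for this example

print(meets_recording_requirement(3.6, SCORE_THRESHOLD))
print(meets_recording_requirement(3.0, SCORE_THRESHOLD))
```

A score exactly equal to the threshold is treated as not meeting the requirement, matching the "smaller than or equal to" branch described above.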

It has also been found that when the audio is too long, the PESQ algorithm takes too much time to compute. To improve efficiency, the first audio includes a plurality of first audio segments, and the second audio includes a plurality of second audio segments in one-to-one correspondence with the first audio segments, so that step S103-1 can obtain the comparison score as follows:

S103-1A, generating a plurality of audio fragment sets according to the plurality of first audio fragments and the plurality of second audio fragments. Wherein each audio clip set includes a first audio clip and a second audio clip at the same clip location.

For example, assuming that the detection device is the server in fig. 3, the user terminal in fig. 3 runs an applet, and obtains audio to be detected collected by the target recording device through the applet, and sends the audio to the server. The server performs slicing processing on the received audio to be detected to obtain a plurality of first audio fragments.

Assuming that the audio to be detected is sliced into 5 first audio segments, the 5 first audio segments are denoted, in order of acquisition time, as $s_1, s_2, s_3, s_4, s_5$. The server performs noise reduction processing on each of the 5 first audio segments, and the resulting 5 second audio segments are denoted in order as $\hat{s}_1, \hat{s}_2, \hat{s}_3, \hat{s}_4, \hat{s}_5$.

Therefore, the 5 first audio segments and the 5 second audio segments are in one-to-one correspondence, which may be expressed as $s_n$ corresponding to $\hat{s}_n$, where $n$ is the index of the audio segment. For example, $s_1$ corresponds to $\hat{s}_1$, meaning that $\hat{s}_1$ is the noise-reduced version of $s_1$; $s_2$ corresponds to $\hat{s}_2$, meaning that $\hat{s}_2$ is the noise-reduced version of $s_2$.

As an embodiment, the server takes the first audio segment and the second audio segment at the same segment position as one audio segment set, so that 5 audio segment sets can be obtained from the above 5 first audio segments and 5 second audio segments, denoted respectively as $(s_1, \hat{s}_1), (s_2, \hat{s}_2), (s_3, \hat{s}_3), (s_4, \hat{s}_4), (s_5, \hat{s}_5)$, where $(s_1, \hat{s}_1)$ denotes the audio segment set formed by $s_1$ and $\hat{s}_1$.

As another embodiment, the target recording device is constrained by its own structure and has an optimal distance from the sound source: when the user is at that distance, audio with the least background noise can be recorded. In addition, the signal-to-noise ratio of the recorded audio depends on how loudly the user speaks while recording. That is, the user's state while recording affects the quality of the recorded audio: when the user has just started speaking, the state fluctuates noticeably, and it settles into a steady state only after a period of time. Thus, the first audio segment set, which covers the start of the recording, may distort the final comparison score.

Research also found that when the PESQ algorithm evaluates the difference between a first audio segment and a second audio segment, segments that are too short are difficult to evaluate objectively and easily introduce evaluation errors; and when the audio to be detected is sliced into a plurality of first audio segments, the last segment may be too short.

For example, assuming that the audio to be detected is 21min long, the audio to be detected is sliced according to the preset time length of 4min, the audio to be detected with the time length of 21min can be sliced into 6 first audio segments, and the time length of the last first audio segment is only 1min.
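The arithmetic of this example can be checked with a short sketch. The function name is hypothetical, and durations are handled in whole minutes purely for readability.

```python
def slice_durations(total_minutes, segment_minutes):
    """Return the duration of each slice when audio is cut at a fixed length."""
    durations = []
    remaining = total_minutes
    while remaining > 0:
        # The final slice keeps whatever is left over.
        durations.append(min(segment_minutes, remaining))
        remaining -= segment_minutes
    return durations


durations = slice_durations(21, 4)
print(len(durations))   # 6 slices
print(durations[-1])    # the last slice lasts 1 minute
```

Five slices of 4 min cover 20 min, which leaves a sixth slice of only 1 min.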

For the above reasons, the plurality of audio clip sets generated by the detection device do not include a start clip set and a stop clip set, wherein the start clip set represents a clip set composed of a first audio clip and a second audio clip at a start clip position, and the stop clip set represents a clip set composed of a first audio clip and a second audio clip at an end clip position.

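Illustratively, continuing with the above first audio segments $s_1, s_2, s_3, s_4, s_5$ and their noise-reduced counterparts $\hat{s}_1, \hat{s}_2, \hat{s}_3, \hat{s}_4, \hat{s}_5$ as an example, the audio segment sets generated by the server may include only $(s_2, \hat{s}_2), (s_3, \hat{s}_3), (s_4, \hat{s}_4)$, thereby excluding the start segment set $(s_1, \hat{s}_1)$ and the end segment set $(s_5, \hat{s}_5)$ and improving the accuracy of the final comparison score.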
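Step S103-1A with the start and end segment sets excluded can be sketched as pairing the segments by position and dropping the first and last pair. The function name is hypothetical, and the strings are placeholders for real audio segments.

```python
def build_segment_sets(first_segments, second_segments):
    """Pair segments at the same position, excluding the start and end pairs."""
    if len(first_segments) != len(second_segments):
        raise ValueError("segments must correspond one to one")
    pairs = list(zip(first_segments, second_segments))
    # Drop the pair at the start position and the pair at the end position.
    return pairs[1:-1]


first = ["s1", "s2", "s3", "s4", "s5"]
second = ["d1", "d2", "d3", "d4", "d5"]  # d_n: noise-reduced version of s_n
segment_sets = build_segment_sets(first, second)
print(segment_sets)
```

With two or fewer segments this sketch returns an empty list, so a caller would need enough audio to leave at least one middle segment.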

S103-1B, respectively evaluating the difference of each audio fragment set through a PESQ algorithm to obtain a plurality of fragment scores.

S103-1C, obtaining a comparison score according to the scores of the fragments.

In one embodiment, the server may calculate an average score of the plurality of segment scores, with the average score being the comparison score. In another embodiment, the server selects a maximum score from the plurality of segment scores and uses the maximum score as the comparison score.
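Steps S103-1B and S103-1C can be sketched as scoring each segment set and then aggregating. The scoring function is passed in as a parameter because the source does not name a specific PESQ implementation; the `fake_score` function below is a stand-in used only to make the example runnable, and all names are hypothetical.

```python
from statistics import mean


def score_segment_sets(segment_sets, score_fn):
    """S103-1B: apply a scoring function to every (first, second) segment pair."""
    return [score_fn(first, second) for first, second in segment_sets]


def comparison_score(segment_scores, strategy="mean"):
    """S103-1C: combine segment scores by average or by maximum."""
    if not segment_scores:
        raise ValueError("at least one segment score is required")
    if strategy == "mean":
        return mean(segment_scores)
    if strategy == "max":
        return max(segment_scores)
    raise ValueError(f"unknown strategy: {strategy}")


def fake_score(first, second):
    """Stand-in scorer (assumption): looks up a fixed score per segment."""
    return {"s2": 3.0, "s3": 4.0, "s4": 3.5}[first]


scores = score_segment_sets([("s2", "d2"), ("s3", "d3"), ("s4", "d4")], fake_score)
print(comparison_score(scores, "mean"))
print(comparison_score(scores, "max"))
```

Averaging smooths out a single unusual segment, while taking the maximum is more lenient toward the recording device; which one fits depends on how strict the platform wants the check to be.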

It should be understood that the applet in the above embodiment is an application that can be used without being downloaded and installed. It makes applications available "within reach": the user can open one by scanning a code or searching for it. It also embodies the idea of "use it and leave", so that the user does not need to worry about installing too many applications; applications are available everywhere and at any time without being installed or uninstalled.

Based on the comparison score obtained in the above embodiment, the recording quality detection method further includes the following embodiments, configured to improve content quality on an internet platform:

S104, obtaining third audio recorded by the target recording equipment.

S105, if the recording quality of the target recording equipment does not meet the recording requirement, carrying out noise reduction processing on the third audio to obtain fourth audio after noise reduction.

Continuing with the audio reading platform of the above embodiment as an example, when the server detects that the recording quality of the target recording device does not meet the recording requirement, the audio recorded by the target recording device contains considerable background noise. Therefore, the server performs noise reduction on the audio readings captured by the target recording device, so that the audio readings received by the listener are optimized.

Meanwhile, noise reduction can improve an audio reading only to a limited degree, so to improve the quality of the audio reading fundamentally, the anchor needs to use a target recording device with better recording quality. In view of this, the server can also send the recording quality of the target recording device to the anchor, thereby reminding the anchor to switch to a recording device with better recording quality.
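Steps S104 and S105, together with the reminder to the anchor, can be sketched as follows. The function name, the reminder text, and the `denoise` callable are all hypothetical; the source does not specify how the noise reduction or the notification is implemented.

```python
def handle_third_audio(third_audio, meets_requirement, denoise):
    """Return (audio to publish, reminder for the anchor or None)."""
    if meets_requirement:
        # The device is good enough: publish the recording unchanged.
        return third_audio, None
    fourth_audio = denoise(third_audio)  # S105: noise reduction on the third audio
    reminder = ("Recording quality does not meet the requirement; "
                "consider switching to a better recording device.")
    return fourth_audio, reminder


# Placeholder denoiser (assumption): halves every sample to keep the demo simple.
audio, reminder = handle_third_audio([0.2, 0.4], False, lambda a: [x / 2 for x in a])
print(audio)
print(reminder is not None)
```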

Based on the same inventive concept as the recording quality detection method, the present embodiment further provides related apparatus, including:

The embodiment also provides a sound quality detection apparatus applied to the detection device. The sound quality detection apparatus includes at least one software function module that may be stored in the memory 120 in the form of software or firmware, or embedded in an operating system (OS) of a server. As shown in FIG. 4, divided by function, the sound quality detection apparatus may include:

The audio acquisition module 201 is configured to acquire a first audio recorded by the target recording device.

In this embodiment, the audio acquisition module 201 is used to implement step S101 in fig. 2, and the detailed description of the audio acquisition module 201 can be referred to the detailed description of step S101.

The audio processing module 202 is configured to filter the interference information in the first audio to obtain the second audio.

In this embodiment, the audio processing module 202 is used to implement step S102 in fig. 2, and for a detailed description of the audio processing module 202, reference may be made to the detailed description of step S102.

The quality detection module 203 is configured to obtain the recording quality of the target recording device according to the difference between the first audio and the second audio.

In this embodiment, the quality detection module 203 is used to implement step S103 in fig. 2, and the detailed description of the quality detection module 203 can be referred to the detailed description of step S103.

It should be noted that, the number of software functional modules included in the sound quality detection apparatus is related to the module division standard, so in some embodiments, the sound quality detection apparatus may further include other software functional modules or sub-modules for implementing other steps or sub-steps of the recording quality detection method. In other embodiments, the above-mentioned audio acquisition module 201, audio processing module 202 and quality detection module 203 may be used for other steps or sub-steps of the recording quality detection method as well.

The embodiment also provides a detection device, which comprises a processor and a memory, wherein the memory stores a computer program, and the computer program realizes a recording quality detection method when being executed by the processor.

The embodiment also provides a computer readable storage medium, and the computer readable storage medium stores a computer program, and when the computer program is executed by a processor, the recording quality detection method is realized.

It should be noted that the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

It should also be understood that the apparatus and method disclosed in this embodiment may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

The foregoing is merely various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A recording quality detection method, characterized by being applied to a detection apparatus, the method comprising:

acquiring a first audio recorded by target recording equipment;

filtering interference information in the first audio to obtain second audio, wherein the first audio comprises a plurality of first audio fragments, and the second audio comprises a plurality of second audio fragments corresponding to the plurality of first audio fragments one by one;

generating a plurality of audio fragment sets according to the plurality of first audio fragments and the plurality of second audio fragments, wherein each audio fragment set comprises a first audio fragment and a second audio fragment at the same fragment position, the plurality of audio fragment sets do not comprise a start fragment set and a stop fragment set, wherein the start fragment set represents a fragment set formed by the first audio fragment and the second audio fragment at the start fragment position, and the stop fragment set represents a fragment set formed by the first audio fragment and the second audio fragment at the end fragment position;

respectively evaluating the difference of each audio fragment set through a PESQ algorithm to obtain a plurality of fragment scores;

obtaining a comparison score according to the plurality of segment scores;

and determining the recording quality of the target recording equipment according to the comparison score.

2. The recording quality detection method of claim 1, wherein the interference information includes a background noise, and the filtering the interference information in the first audio to obtain a second audio includes:

and carrying out noise reduction processing on the first audio to obtain the second audio.

3. The recording quality detection method according to claim 1, wherein the determining the recording quality of the target recording device according to the comparison score includes:

if the comparison score is larger than the score threshold, determining that the recording quality of the target recording equipment meets the recording requirement;

and if the comparison score is smaller than or equal to the score threshold, determining that the recording quality of the target recording equipment does not meet the recording requirement.

4. The recording quality detection method of claim 1, wherein the method further comprises:

acquiring a third audio recorded by the target recording equipment;

and if the recording quality of the target recording equipment does not meet the recording requirement, carrying out noise reduction processing on the third audio to obtain a fourth audio after noise reduction.

5. A sound quality detection apparatus, characterized by being applied to a detection device, comprising:

the audio processing module is used for filtering interference information in the first audio to obtain second audio, wherein the first audio comprises a plurality of first audio fragments, and the second audio comprises a plurality of second audio fragments which are in one-to-one correspondence with the plurality of first audio fragments;

the quality detection module is used for generating a plurality of audio fragment sets according to the plurality of first audio fragments and the plurality of second audio fragments, wherein each audio fragment set comprises a first audio fragment and a second audio fragment with the same fragment position, the plurality of audio fragment sets do not comprise a start fragment set and a stop fragment set, the start fragment set represents a fragment set formed by the first audio fragment and the second audio fragment with the start fragment position, and the stop fragment set represents a fragment set formed by the first audio fragment and the second audio fragment with the end fragment position;

obtaining a comparison score according to the plurality of segment scores;

6. A detection apparatus, characterized in that the detection apparatus comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the recording quality detection method of any one of claims 1-4.

7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the recording quality detection method of any one of claims 1-4.