CN112885370B - Sound card validity detection method and device - Google Patents
- Publication number
- CN112885370B (application CN202110033517.0A)
- Authority
- CN
- China
- Prior art keywords
- sound
- feature
- sound card
- confidence coefficient
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The application provides a sound card validity detection method and device. The method comprises: acquiring a sound card message to be detected and extracting the sound signal it carries; extracting sound features of the sound signal, including short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; and inputting the sound features into a preset sound feature detection model, so that the validity detection result for the sound card message is determined from the feature confidence output by the model, combined with a preset correspondence between feature confidence values and validity detection results. The method solves the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
Description
Technical Field
The application relates to the technical field of voice processing, and in particular to a method and device for detecting the validity of a sound card.
Background
With the rapid development of 5G and artificial intelligence and the rise of mass-entertainment products such as live streaming and short video, new forms of social interaction keep emerging. Users therefore need to build a unique sound profile at registration and during interaction, and attributes such as behavior and feature tags are detected from their voice. Moreover, in actual interaction a large number of sound cards must pass validity detection: cards that pass are analyzed further, while cards that fail are deleted and re-recorded.
Current voice validity detection schemes fall into two categories. In the first, the start and end of speech are determined by the user manually pressing a recording button: audio is recorded only after a manual click, and no per-frame validity detection is performed on the sound signal, so a large number of invalid sound cards are transmitted and stored. In the second, whether a signal is speech is judged from its short-time average energy and average zero-crossing rate; but because speech and the noise of complex scenes show no clear separation in these two features, the accuracy of this double-threshold approach is low, and a large number of abnormal sound cards are recommended and spread.
Disclosure of Invention
The application provides a sound card validity detection method and device to solve the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
In a first aspect, the present application provides a sound card validity detection method, comprising:
Acquiring a sound card message to be detected, and extracting a sound signal in the sound card message;
Extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
Inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, after extracting the sound signal in the sound card message, the method further comprises:
preprocessing the sound signal, wherein the preprocessing comprises the following steps: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
Matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
In a second aspect, the present application provides a sound card validity detection apparatus, comprising:
the sound signal extraction unit is used for acquiring the sound card message to be detected and extracting the sound signal in the sound card message;
A sound feature extraction unit for extracting sound features of the sound signal, the sound features including: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
The validity detection unit is used for inputting the sound features into a preset sound feature detection model, carrying out feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to the sound card messages according to feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, the method further comprises:
a preprocessing unit, configured to perform preprocessing on the sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
Matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
From the above technical solutions, it can be seen that the application has the following advantages:
The first aspect of the present application provides a sound card validity detection method comprising: acquiring a sound card message to be detected, and extracting a sound signal in the sound card message; extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
In the application, a decision tree model is trained on multiple feature attributes to obtain the sound feature detection model, and the trained model performs voice validity detection by fusing these features. This reduces the manual tuning the system requires and, compared with hard-threshold decisions made on the two features of the double-threshold scheme, is more robust and generalizes better, thereby solving the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
Drawings
In order to more clearly illustrate the embodiments of the application and the technical solutions of the prior art, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a first embodiment of the sound card validity detection method provided by the present application;
Fig. 2 is a flowchart of a second embodiment of the sound card validity detection method provided by the present application;
Fig. 3 is a schematic structural diagram of a first embodiment of the sound card validity detection device provided by the present application.
Detailed Description
The embodiment of the application provides a sound card effectiveness detection method and device, which are used for solving the technical problem that abnormal sound cards are recommended and transmitted in a large quantity due to low accuracy of existing voice effectiveness detection.
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, a first embodiment of the present application provides a sound card validity detection method, which includes:
step 101, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
It should be noted that, when implementing the technical solution of the embodiment of the present application, firstly, a sound card message to be detected is obtained, and then, a corresponding sound signal is extracted from the sound card message.
Step 102, extracting sound characteristics of the sound signal, wherein the sound characteristics comprise: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
It should be noted that, based on the sound signal obtained in step 101, feature extraction is further performed to obtain sound features of the sound signal, where the sound features of the embodiment specifically include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
Step 103, inputting the sound features into a preset sound feature detection model, and performing feature confidence operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to sound card messages according to the feature confidence operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
It should be noted that, the sound feature extracted in step 102 is used as a model input, and is input into a preset sound feature detection model, and the sound feature input into the model is subjected to feature confidence operation through the sound feature detection model, so that the sound card validity detection result corresponding to the sound card message is determined according to the feature confidence operation result output by the sound feature detection model and in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result.
The sound feature detection model is a decision tree model obtained by training a preset sound feature sample, and the sound feature sample mentioned in this embodiment is a feature sample extracted by the same feature extraction manner in step 102 through a preset sound card sample, and the content of the sound feature sample is consistent with the sound feature mentioned in step 102.
The foregoing is a detailed description of a first embodiment of a sound card validity detection method provided by the present application, and the following is a detailed description of a second embodiment of a sound card validity detection method provided by the present application.
Referring to fig. 2, a second embodiment of the present application provides a sound card validity detection method, which includes:
step 201, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
Step 202, preprocessing the sound signal, wherein the preprocessing comprises: windowing framing processing and pre-emphasis processing.
It should be noted that preprocessing consists of framing, windowing and pre-emphasizing the PCM data. Framing slices the speech signal according to its short-time stationarity; this embodiment uses a frame length of 16 ms. The window is typically a Hamming or Hanning window, and this embodiment preferably uses a Hamming window. Pre-emphasis boosts the energy of the high-frequency components to compensate for the roughly 6 dB/octave attenuation of components above 800 Hz caused by glottal vibration and oral-nasal radiation. In view of the pre-emphasis effect, this embodiment preferably implements pre-emphasis with a first-order high-pass filter.
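As a minimal sketch, the preprocessing chain just described — first-order high-pass pre-emphasis, 16 ms framing, Hamming windowing — reduces to a few array operations. The function name, the pre-emphasis coefficient of 0.97, the 16 kHz sample rate and the non-overlapping hop are illustrative assumptions; the text fixes only the 16 ms frame length, the Hamming window and the first-order high-pass filter.

```python
import numpy as np

def preprocess(pcm, sr=16000, frame_ms=16, alpha=0.97):
    """Pre-emphasize, then split into Hamming-windowed frames.

    alpha=0.97 and non-overlapping frames are illustrative choices,
    not mandated by the text.
    """
    # First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(pcm[0], pcm[1:] - alpha * pcm[:-1])

    frame_len = int(sr * frame_ms / 1000)          # 256 samples at 16 kHz
    n_frames = len(emphasized) // frame_len
    frames = emphasized[: n_frames * frame_len].reshape(n_frames, frame_len)

    return frames * np.hamming(frame_len)          # windowed frames
```

At 16 kHz a 16 ms frame is 256 samples, so a one-second signal yields 62 complete frames.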
Step 203, extracting the sound characteristics of the sound signal, wherein the sound characteristics specifically comprise short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude sum and peak-to-valley ratio.
It should be noted that, based on a feature combination selected through experiment, the preferred scheme of this embodiment extracts sound features consisting of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio as input parameters of the subsequent sound feature detection model, so as to balance detection accuracy against detection latency.
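As one hedged illustration of the per-frame feature extraction, two of the listed features — short-time zero-crossing rate and short-time energy — reduce to simple array operations on the windowed frames from the preprocessing step. The function name is an assumption; pitch period, formants and peak-to-valley ratio require autocorrelation or LPC machinery and are omitted here.

```python
import numpy as np

def short_time_features(frames):
    """Per-frame zero-crossing rate and energy for an array of frames
    with shape (n_frames, frame_len)."""
    # Short-time zero-crossing rate: fraction of adjacent-sample sign flips
    signs = np.sign(frames)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    zcr = np.mean(np.abs(np.diff(signs, axis=1)) / 2.0, axis=1)

    # Short-time energy: sum of squared samples in each frame
    energy = np.sum(frames ** 2, axis=1)
    return zcr, energy
```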
Step 204, inputting the sound feature into a preset sound feature detection model, and performing feature confidence operation on the sound feature through the sound feature detection model so as to determine a sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the sound feature detection model and the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, wherein the sound feature detection model is a decision tree model trained according to the preset sound feature sample.
More specifically, the commonly used decision tree algorithms are ID3, C4.5 and CART. According to actual needs, this embodiment preferably adopts the C4.5 decision tree algorithm, a family of algorithms used for classification problems in machine learning and data mining. C4.5 is a supervised learning method: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, the goal of C4.5 is to learn a mapping from attribute values to classes that can then be used to classify new instances of unknown class.
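The patent gives no training code, but the split criterion that distinguishes C4.5 from its predecessor ID3 — information gain normalized by split information, the gain ratio — can be sketched in a few lines. The function names, the binary threshold split and the toy data in the test are illustrative, not taken from the source.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels, threshold):
    """C4.5's criterion for a binary split of `feature` at `threshold`:
    information gain divided by the split's intrinsic (split) entropy."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(np.array([0] * len(left) + [1] * len(right)))
    return gain / split_info
```

C4.5 grows the tree by repeatedly choosing the attribute and threshold with the highest gain ratio; for a perfectly separating threshold on balanced classes the ratio is 1.0.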
The correspondence between feature confidence and sound card validity detection result in this embodiment consists of a preset first confidence interval and second confidence interval. The feature confidence extracted from the output of the sound feature detection model is matched against both intervals: if it falls within the first confidence interval, the validity detection result for the sound card message is determined to be valid; if it falls within the second confidence interval, the result is determined to be invalid. In general, the higher the feature confidence, the more likely the sound card is valid, and vice versa; accordingly, in this embodiment the first confidence interval lies above the second, and different business scenarios can set reasonable interval bounds according to their own discrimination rules.
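The interval-matching step can be sketched as a small function. The concrete bounds 0.8–1.0 and 0.0–0.2, and the behavior when the confidence falls outside both intervals, are assumptions for illustration; the text only requires that the valid interval lie above the invalid one and leaves the bounds to each business scenario.

```python
def card_validity(confidence, valid_range=(0.8, 1.0), invalid_range=(0.0, 0.2)):
    """Map a feature confidence to a validity verdict.

    The default interval bounds are illustrative placeholders.
    """
    lo, hi = valid_range
    if lo <= confidence <= hi:
        return "valid"
    lo, hi = invalid_range
    if lo <= confidence <= hi:
        return "invalid"
    # Outside both intervals: the text does not specify a verdict here
    return "undetermined"
```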
The second embodiment of the method for detecting the validity of a sound card provided by the present application is described in detail above, and the following is a detailed description of the first embodiment of the device for detecting the validity of a sound card provided by the present application.
Referring to fig. 3, a third embodiment of the present application provides a sound card validity detecting device, including:
A sound signal extraction unit 301, configured to obtain a sound card message to be detected, and extract a sound signal in the sound card message;
A sound feature extraction unit 302, configured to extract sound features of the sound signal, where the sound features include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
the validity detection unit 303 is configured to input the sound feature into a preset sound feature detection model, perform feature confidence operation on the sound feature through the sound feature detection model, so as to determine a sound card validity detection result corresponding to the sound card message according to a feature confidence operation result output by the sound feature detection model and a corresponding relation between the preset feature confidence operation result and the sound card validity detection result, where the sound feature detection model is a decision tree model obtained according to preset sound feature sample training.
More specifically, it further comprises:
The preprocessing unit 300 is configured to preprocess a sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
More specifically, the sound features consist of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
More specifically, according to the feature confidence operation result output by the sound feature detection model, in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, determining the sound card validity detection result corresponding to the sound card message specifically includes:
extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
More specifically, the acoustic feature detection model is specifically a C4.5 decision tree model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (6)
1. A sound card validity detection method, characterized by comprising:
acquiring a sound card message to be detected, and extracting the sound signal from the sound card message;
preprocessing the sound signal, wherein the preprocessing comprises: windowing and framing, and pre-emphasis;
extracting sound features of the sound signal, the sound features comprising: short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio;
inputting the sound features into a preset sound feature detection model and performing a feature confidence operation on the sound features through the model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the model, combined with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically comprises:
extracting the feature confidence coefficient from the feature confidence operation result output by the sound feature detection model; and
matching the feature confidence coefficient against a preset first confidence interval and a preset second confidence interval: if the feature confidence coefficient falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid; if it falls within the second confidence interval, determining that the result is invalid;
wherein the sound feature detection model is a decision tree model trained on preset sound feature samples.
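As an illustrative sketch only (not part of the claimed method), the preprocessing and short-time feature steps of claim 1 could look like the following NumPy code; the frame length, hop size, and pre-emphasis coefficient are assumed values, and only two of the claimed features (short-time energy and zero-crossing rate) are shown.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (boosts high frequencies)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def short_time_features(frames):
    """Per-frame short-time energy and zero-crossing rate."""
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr
```

The remaining claimed features (autocorrelation coefficients, fundamental frequency, pitch period, formants, peak-to-valley ratio) would be computed per frame in the same fashion.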
2. The sound card validity detection method according to claim 1, wherein the sound features consist of the short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude, and peak-to-valley ratio.
3. The sound card validity detection method according to claim 1, wherein the sound feature detection model is specifically a C4.5 decision tree model.
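For illustration, the decision-tree classification and confidence-interval mapping of claims 1 and 3 could be sketched as below. Note that scikit-learn implements CART, used here as a stand-in for the claimed C4.5 model; the training data and the interval bounds are assumed, since the patent does not disclose them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: each row is a sound-feature vector
# (e.g. zero-crossing rate, energy, pitch period, formant ratio);
# label 1 = valid sound card message, 0 = invalid.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.2, (50, 4)),    # "valid" samples
               rng.normal(-1.0, 0.2, (50, 4))])  # "invalid" samples
y = np.array([1] * 50 + [0] * 50)

# CART decision tree as a stand-in for C4.5.
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

def detect_validity(features, model, valid_interval=(0.5, 1.0)):
    """Map the model's confidence for the 'valid' class onto the preset
    intervals of claim 1: inside valid_interval -> valid, otherwise invalid."""
    conf = model.predict_proba([features])[0][list(model.classes_).index(1)]
    return "valid" if valid_interval[0] <= conf <= valid_interval[1] else "invalid"
```

The leaf-node class proportions of the tree serve as the feature confidence coefficient in this sketch.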
4. A sound card validity detection apparatus, comprising:
a sound signal extraction unit, configured to acquire the sound card message to be detected and extract the sound signal from the sound card message;
a preprocessing unit, configured to preprocess the sound signal, wherein the preprocessing comprises: windowing and framing, and pre-emphasis;
a sound feature extraction unit, configured to extract sound features of the sound signal, the sound features comprising: short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio;
a validity detection unit, configured to input the sound features into a preset sound feature detection model and perform a feature confidence operation on the sound features through the model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the model, combined with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically comprises:
extracting the feature confidence coefficient from the feature confidence operation result output by the sound feature detection model; and
matching the feature confidence coefficient against a preset first confidence interval and a preset second confidence interval: if the feature confidence coefficient falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid; if it falls within the second confidence interval, determining that the result is invalid;
wherein the sound feature detection model is a decision tree model trained on preset sound feature samples.
5. The sound card validity detection device according to claim 4, wherein the sound features consist of the short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude, and peak-to-valley ratio.
6. The sound card validity detection device according to claim 4, wherein the sound feature detection model is specifically a C4.5 decision tree model.
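The four units of the apparatus in claim 4 form a linear pipeline. A minimal structural sketch, with the four unit implementations left as hypothetical injected callables, could look like:

```python
class SoundCardValidityDetector:
    """Sketch of the claim-4 apparatus: four units chained into a pipeline.
    The extract_signal / preprocess / extract_features / detect callables
    are hypothetical stand-ins for the patent's units."""

    def __init__(self, extract_signal, preprocess, extract_features, detect):
        self.extract_signal = extract_signal      # sound signal extraction unit
        self.preprocess = preprocess              # preprocessing unit
        self.extract_features = extract_features  # sound feature extraction unit
        self.detect = detect                      # validity detection unit

    def check(self, message):
        """Run a sound card message through the full detection pipeline."""
        signal = self.extract_signal(message)
        features = self.extract_features(self.preprocess(signal))
        return self.detect(features)
```

Keeping the units as separate callables mirrors the patent's note that the units may be physically separate or integrated into one processing unit.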
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110033517.0A CN112885370B (en) | 2021-01-11 | 2021-01-11 | Sound card validity detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112885370A CN112885370A (en) | 2021-06-01 |
CN112885370B (en) | 2024-05-31 |
Family
ID=76044703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110033517.0A Active CN112885370B (en) | 2021-01-11 | 2021-01-11 | Sound card validity detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885370B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105529028A (en) * | 2015-12-09 | 2016-04-27 | 百度在线网络技术(北京)有限公司 | Voice analytical method and apparatus |
CN108564940A (en) * | 2018-03-20 | 2018-09-21 | 平安科技(深圳)有限公司 | Audio recognition method, server and computer readable storage medium |
CN110265063A (en) * | 2019-07-22 | 2019-09-20 | 东南大学 | A kind of lie detecting method based on fixed duration speech emotion recognition sequence analysis |
CN110880318A (en) * | 2019-11-27 | 2020-03-13 | 云知声智能科技股份有限公司 | Voice recognition method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170140750A1 (en) * | 2015-11-17 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Method and device for speech recognition |
CN108305615B (en) * | 2017-10-23 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Object identification method and device, storage medium and terminal thereof |
- 2021-01-11: filed in CN as application CN202110033517.0A (patent CN112885370B, status active)
Also Published As
Publication number | Publication date |
---|---|
CN112885370A (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Foster et al. | Chime-home: A dataset for sound source recognition in a domestic environment | |
WO2020024690A1 (en) | Speech labeling method and apparatus, and device | |
KR101269296B1 (en) | Neural network classifier for separating audio sources from a monophonic audio signal | |
Umapathy et al. | Audio signal feature extraction and classification using local discriminant bases | |
KR102128926B1 (en) | Method and device for processing audio information | |
CN106653001A (en) | Baby crying identifying method and system | |
Krijnders et al. | Sound event recognition through expectancy-based evaluation of signal-driven hypotheses | |
Jiang et al. | An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K‐Means | |
Ghaemmaghami et al. | Noise robust voice activity detection using features extracted from the time-domain autocorrelation function | |
Ntalampiras et al. | Acoustic detection of human activities in natural environments | |
Kim et al. | Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology. | |
CN102623009A (en) | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis | |
CN111951824A (en) | Detection method for distinguishing depression based on sound | |
CN108899033B (en) | Method and device for determining speaker characteristics | |
WO2017045429A1 (en) | Audio data detection method and system and storage medium | |
CN109408660A (en) | A method of the music based on audio frequency characteristics is classified automatically | |
CN113707173B (en) | Voice separation method, device, equipment and storage medium based on audio segmentation | |
CN112435687A (en) | Audio detection method and device, computer equipment and readable storage medium | |
Wood et al. | Classification of African elephant Loxodonta africana rumbles using acoustic parameters and cluster analysis | |
WO2022134781A1 (en) | Prolonged speech detection method, apparatus and device, and storage medium | |
Huang et al. | Fast diagnosis of bowel activities | |
CN112885370B (en) | Sound card validity detection method and device | |
CN113782051B (en) | Broadcast effect classification method and system, electronic equipment and storage medium | |
CN108766465A (en) | A kind of digital audio based on ENF universal background models distorts blind checking method | |
CN114463671A (en) | User personality identification method based on video data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||