CN112885370B - Sound card validity detection method and device - Google Patents

Sound card validity detection method and device

Info

Publication number
CN112885370B
CN112885370B (application CN202110033517.0A; earlier publication CN112885370A)
Authority
CN
China
Prior art keywords
sound
feature
sound card
confidence coefficient
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110033517.0A
Other languages
Chinese (zh)
Other versions
CN112885370A (en)
Inventor
马金龙
熊佳
汪暾
罗箫
焦南凯
徐志坚
谢睿
陈光尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huancheng Culture Media Co ltd
Original Assignee
Guangzhou Huancheng Culture Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huancheng Culture Media Co ltd filed Critical Guangzhou Huancheng Culture Media Co ltd
Priority to CN202110033517.0A priority Critical patent/CN112885370B/en
Publication of CN112885370A publication Critical patent/CN112885370A/en
Application granted granted Critical
Publication of CN112885370B publication Critical patent/CN112885370B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being correlation coefficients
    • G10L25/15: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being formant information
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The application provides a sound card validity detection method and device. The method comprises the following steps: acquiring a sound card message to be detected and extracting the sound signal in the sound card message; extracting sound features of the sound signal, the sound features including short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; and inputting the sound features into a preset sound feature detection model, so that the sound card validity detection result corresponding to the sound card message is determined from the feature confidence result output by the model, combined with a preset correspondence between feature confidence results and validity detection results. The method addresses the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be recommended and spread in large numbers.

Description

Sound card validity detection method and device
Technical Field
The application relates to the field of speech technology, and in particular to a method and a device for detecting the validity of a sound card.
Background
With the rapid development of 5G and artificial intelligence and the rise of mass-entertainment products such as live streaming and short video, new forms of social interaction keep emerging. Users therefore need to build a distinctive voice identity at registration and during interaction, and their behavioural attributes, feature tags and the like are inferred from their voice. Meanwhile, a large number of sound cards must pass validity detection during actual interaction: cards that pass the validity check are analysed further, while unqualified cards are re-recorded or deleted.
Current voice validity detection schemes fall into two types. In the first, the start and end of speech are judged by the user manually pressing a recording button, so audio is recorded only after a manual click and no per-frame validity check is performed on the sound signal, generating a large number of invalid sound cards that are then transmitted and stored. The second judges whether a signal is speech from its short-time average energy and average zero-crossing rate; but because speech and the noise of complex scenes show no clear feature separation, this dual-threshold approach is not accurate, and large numbers of abnormal sound cards are recommended and spread.
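The dual-threshold scheme criticised above can be sketched as follows. This is a minimal illustration of the prior-art baseline, not code from the patent; the threshold values are placeholders chosen for the example.

```python
import numpy as np

def dual_threshold_vad(frame, energy_thresh=0.01, zcr_thresh=0.3):
    """Classic dual-threshold check: a frame counts as speech only if its
    short-time average energy is high enough AND its average zero-crossing
    rate is low enough. Both thresholds are hard-coded, which is exactly
    the brittleness the patent's decision-tree approach aims to remove."""
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)                   # short-time average energy
    signs = np.sign(frame)
    zcr = np.mean(np.abs(np.diff(signs)) > 0)      # average zero-crossing rate
    return bool(energy > energy_thresh and zcr < zcr_thresh)

# A 440 Hz tone at 16 kHz passes; near-silence fails on the energy threshold.
t = np.arange(256) / 16000.0
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
silence = 0.001 * np.random.default_rng(0).standard_normal(256)
print(dual_threshold_vad(tone), dual_threshold_vad(silence))  # True False
```

Because both thresholds are fixed, noise whose energy and zero-crossing rate happen to resemble speech is misclassified, which motivates the multi-feature model of the application.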
Disclosure of Invention
The application provides a sound card effectiveness detection method and device, which are used for solving the technical problem that abnormal sound cards are recommended and transmitted in a large quantity due to low accuracy of existing voice effectiveness detection.
First, the first aspect of the present application provides a sound card validity detection method, including:
Acquiring a sound card message to be detected, and extracting a sound signal in the sound card message;
Extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
Inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, extracting the sound signal in the sound card message further comprises:
preprocessing the sound signal, wherein the preprocessing comprises the following steps: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
And according to the matching result of the feature confidence coefficient and the preset first confidence coefficient interval and second confidence coefficient interval, if the feature confidence coefficient is in the range of the first confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if the feature confidence coefficient is in the range of the second confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
Meanwhile, a second aspect of the present application provides a sound card validity detection apparatus comprising:
the sound signal extraction unit is used for acquiring the sound card message to be detected and extracting the sound signal in the sound card message;
A sound feature extraction unit for extracting sound features of the sound signal, the sound features including: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
The validity detection unit is used for inputting the sound features into a preset sound feature detection model, carrying out feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to the sound card messages according to feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, the method further comprises:
a preprocessing unit, configured to perform preprocessing on the sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
And according to the matching result of the feature confidence coefficient and the preset first confidence coefficient interval and second confidence coefficient interval, if the feature confidence coefficient is in the range of the first confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if the feature confidence coefficient is in the range of the second confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
From the above technical scheme, the application has the following advantages:
The first aspect of the present application provides a sound card validity detection method comprising: acquiring a sound card message to be detected, and extracting a sound signal in the sound card message; extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
According to the application, a sound feature detection model is trained by combining a decision tree model with multiple feature attributes, and the trained model performs voice validity detection by fusing those features, reducing the amount of manual debugging the system needs. Compared with hard-threshold judgment on dual-threshold feature components, it is more robust and generalises better, solving the technical problem that abnormal sound cards are widely recommended and spread because existing voice validity detection is inaccurate.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the application, and that other drawings can be obtained from these drawings without inventive faculty for a person skilled in the art.
Fig. 1 is a flowchart of a first embodiment of a sound card validity detection method provided by the present application;
FIG. 2 is a flowchart illustrating a method for detecting validity of a sound card according to a second embodiment of the present application;
fig. 3 is a schematic structural diagram of a first embodiment of a sound card validity detection device provided by the present application.
Detailed Description
The embodiment of the application provides a sound card effectiveness detection method and device, which are used for solving the technical problem that abnormal sound cards are recommended and transmitted in a large quantity due to low accuracy of existing voice effectiveness detection.
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, a first embodiment of the present application provides a sound card validity detection method, which includes:
step 101, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
It should be noted that, when implementing the technical solution of the embodiment of the present application, firstly, a sound card message to be detected is obtained, and then, a corresponding sound signal is extracted from the sound card message.
Step 102, extracting sound characteristics of the sound signal, wherein the sound characteristics comprise: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
It should be noted that, based on the sound signal obtained in step 101, feature extraction is further performed to obtain sound features of the sound signal, where the sound features of the embodiment specifically include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
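As an illustration of step 102, the following sketch computes a subset of the named features for one frame. The patent does not give formulas, so the definitions below (and the 50-400 Hz pitch search range) are common textbook choices, not the patent's own:

```python
import numpy as np

def frame_features(frame, sr=16000):
    """Per-frame features from the list in step 102 (a subset, as a sketch):
    short-time zero-crossing rate, short-time amplitude, short-time energy,
    normalised lag-1 autocorrelation, and a pitch-period/fundamental-frequency
    estimate taken from the autocorrelation peak in the 50-400 Hz range."""
    frame = np.asarray(frame, dtype=float)
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    amplitude = np.sum(np.abs(frame))              # short-time amplitude
    energy = np.sum(frame ** 2)                    # short-time energy
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac1 = ac[1] / ac[0] if ac[0] > 0 else 0.0      # normalised autocorrelation coefficient
    lo, hi = sr // 400, sr // 50                   # plausible pitch-period lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return {"zcr": zcr, "amplitude": amplitude, "energy": energy,
            "ac1": ac1, "pitch_period": lag / sr, "f0": sr / lag}

# A 200 Hz tone at 16 kHz has a pitch period of 80 samples, so f0 comes out near 200.
t = np.arange(400) / 16000.0
feats = frame_features(np.sin(2 * np.pi * 200 * t))
print(round(feats["f0"]))
```

The remaining features in the list (short-time energy spectrum, harmonic energy sum, formants, peak-to-valley ratio) would be derived analogously from the frame's spectrum.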
Step 103, inputting the sound features into a preset sound feature detection model, and performing feature confidence operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to sound card messages according to the feature confidence operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
It should be noted that, the sound feature extracted in step 102 is used as a model input, and is input into a preset sound feature detection model, and the sound feature input into the model is subjected to feature confidence operation through the sound feature detection model, so that the sound card validity detection result corresponding to the sound card message is determined according to the feature confidence operation result output by the sound feature detection model and in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result.
The sound feature detection model is a decision tree model obtained by training a preset sound feature sample, and the sound feature sample mentioned in this embodiment is a feature sample extracted by the same feature extraction manner in step 102 through a preset sound card sample, and the content of the sound feature sample is consistent with the sound feature mentioned in step 102.
The foregoing is a detailed description of a first embodiment of a sound card validity detection method provided by the present application, and the following is a detailed description of a second embodiment of a sound card validity detection method provided by the present application.
Referring to fig. 2, a second embodiment of the present application provides a sound card validity detection method, which includes:
step 201, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
Step 202, preprocessing the sound signal, wherein the preprocessing comprises: windowing framing processing and pre-emphasis processing.
It should be noted that preprocessing frames and windows the PCM data and applies pre-emphasis. Framing slices the speech signal according to its short-time stationarity; this embodiment uses a frame length of 16 ms. The window is typically a Hamming or Hanning window, and this embodiment prefers a Hamming window. Pre-emphasis boosts the energy of the high-frequency components to compensate for the roughly 6 dB/octave attenuation above 800 Hz caused by the user's glottal vibration and oral-nasal radiation. In view of the pre-emphasis effect, this embodiment preferably implements pre-emphasis with a first-order high-pass filter.
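The preprocessing of step 202 can be sketched as below. The 16 ms frame length is from the embodiment; the filter coefficient 0.97 and the 50% frame overlap are conventional values the patent does not specify:

```python
import numpy as np

def preprocess(pcm, sr=16000, frame_ms=16, alpha=0.97):
    """Pre-emphasis, then framing and Hamming windowing, matching step 202.
    The first-order high-pass filter y[n] = x[n] - alpha*x[n-1] boosts the
    high-frequency components attenuated by glottal and oral-nasal radiation;
    alpha = 0.97 is a common choice, not a value stated in the patent."""
    pcm = np.asarray(pcm, dtype=float)
    emphasised = np.append(pcm[0], pcm[1:] - alpha * pcm[:-1])
    frame_len = int(sr * frame_ms / 1000)          # 16 ms -> 256 samples at 16 kHz
    hop = frame_len // 2                           # 50% overlap (an assumption)
    n_frames = 1 + max(0, (len(emphasised) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([emphasised[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames

frames = preprocess(np.random.default_rng(1).standard_normal(16000))
print(frames.shape)  # one second of audio -> (124, 256)
```

Each row of the returned array is one windowed frame, ready for the feature extraction of step 203.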
Step 203, extracting the sound characteristics of the sound signal, wherein the sound characteristics specifically comprise short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude sum and peak-to-valley ratio.
It should be noted that, based on a sound feature combination selected by trial and error, the preferred scheme of this embodiment extracts sound features composed of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio as the input parameters of the subsequent sound feature detection model, balancing detection accuracy against detection latency.
Step 204, inputting the sound feature into a preset sound feature detection model, and performing feature confidence operation on the sound feature through the sound feature detection model so as to determine a sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the sound feature detection model and the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, wherein the sound feature detection model is a decision tree model trained according to the preset sound feature sample.
More specifically, the commonly used decision tree algorithms are ID3, C4.5 and CART. This embodiment preferably adopts the C4.5 decision tree algorithm according to actual needs; C4.5 is an algorithm used for classification problems in machine learning and data mining. Under the supervised learning mechanism of the C4.5 decision tree algorithm, a data set is given in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes. The goal of C4.5 is to learn a mapping from attribute values to classes that can then be used to classify new instances of unknown class.
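A minimal training sketch in the spirit of this step is shown below. Note the substitution: scikit-learn's `DecisionTreeClassifier` implements CART, not C4.5; using `criterion="entropy"` gives information-gain splits close to C4.5's approach, although C4.5 proper uses the gain ratio. The feature matrix and labels here are synthetic stand-ins for the patent's preset sound feature samples.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the training data: each row is a per-card feature vector
# (order assumed: zcr, energy spectrum, pitch period, formant, amplitude,
# peak-to-valley ratio); labels mark valid (1) vs invalid (0) cards.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = (X[:, 1] > 0.5).astype(int)   # synthetic rule: the "energy" column decides validity

model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
model.fit(X, y)

# predict_proba plays the role of the "feature confidence" the model outputs.
confidence = model.predict_proba(X[:1])[0, 1]
print(round(float(confidence), 2))
```

On real data, `X` would hold the features extracted in step 203 for labelled valid and invalid sound cards.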
The correspondence between the feature confidence result and the sound card validity detection result in this embodiment specifically comprises a preset first confidence interval and second confidence interval. The feature confidence extracted from the result output by the sound feature detection model is matched against the two intervals: if it falls within the first confidence interval, the validity detection result for the sound card message is determined to be valid; if it falls within the second confidence interval, the result is determined to be invalid. In general, the higher the feature confidence, the higher the validity of the sound card, and vice versa; the first confidence interval therefore covers higher values than the second, and different business scenarios can set reasonable interval ranges according to their own discrimination rules.
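The interval-matching step can be expressed directly. The bounds below are illustrative only; the patent leaves the concrete intervals to be tuned per business scenario:

```python
def card_validity(confidence, valid_interval=(0.8, 1.0), invalid_interval=(0.0, 0.5)):
    """Map the model's feature confidence onto a validity verdict via the two
    preset intervals. The bounds here are hypothetical examples; the patent
    notes only that the valid interval sits above the invalid one."""
    lo_v, hi_v = valid_interval
    lo_i, hi_i = invalid_interval
    if lo_v <= confidence <= hi_v:
        return "valid"
    if lo_i <= confidence <= hi_i:
        return "invalid"
    return "undetermined"   # between the two intervals: no verdict given

print(card_validity(0.93), card_validity(0.2), card_validity(0.65))
# valid invalid undetermined
```

A gap between the two intervals, as sketched here, lets a deployment route borderline cards to re-recording rather than forcing a binary decision.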
The second embodiment of the method for detecting the validity of a sound card provided by the present application is described in detail above, and the following is a detailed description of the first embodiment of the device for detecting the validity of a sound card provided by the present application.
Referring to fig. 3, a third embodiment of the present application provides a sound card validity detecting device, including:
A sound signal extraction unit 301, configured to obtain a sound card message to be detected, and extract a sound signal in the sound card message;
A sound feature extraction unit 302, configured to extract sound features of the sound signal, where the sound features include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
the validity detection unit 303 is configured to input the sound feature into a preset sound feature detection model, perform feature confidence operation on the sound feature through the sound feature detection model, so as to determine a sound card validity detection result corresponding to the sound card message according to a feature confidence operation result output by the sound feature detection model and a corresponding relation between the preset feature confidence operation result and the sound card validity detection result, where the sound feature detection model is a decision tree model obtained according to preset sound feature sample training.
More specifically, it further comprises:
The preprocessing unit 300 is configured to preprocess a sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
More specifically, the sound features are composed of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
More specifically, according to the feature confidence operation result output by the sound feature detection model, in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, determining the sound card validity detection result corresponding to the sound card message specifically includes:
extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
according to the matching result of the feature confidence coefficient and the preset first confidence coefficient interval and second confidence coefficient interval, if the feature confidence coefficient is within the range of the first confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if the feature confidence coefficient is within the range of the second confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is invalid.
More specifically, the acoustic feature detection model is specifically a C4.5 decision tree model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of that technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (6)

1. A sound card validity detection method, characterized by comprising:
Acquiring a sound card message to be detected, and extracting a sound signal in the sound card message;
preprocessing the sound signal, wherein the preprocessing comprises: windowing and framing processing and pre-emphasis processing;
Extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
inputting the sound feature into a preset sound feature detection model, and performing a feature confidence operation on the sound feature through the sound feature detection model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the sound feature detection model in combination with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically comprises:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
According to the matching result of the feature confidence coefficient and a preset first confidence coefficient interval and a preset second confidence coefficient interval, if the feature confidence coefficient is in the range of the first confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if the feature confidence coefficient is in the range of the second confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is invalid;
The sound feature detection model is a decision tree model obtained through training according to a preset sound feature sample.
2. The sound card validity detection method according to claim 1, wherein the sound features specifically consist of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude sum, and peak-to-valley ratio.
3. The sound card validity detection method according to claim 1, wherein the sound feature detection model is a C4.5 decision tree model.
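As an illustration of the preprocessing and feature-extraction steps recited in claim 1, the sketch below implements pre-emphasis, windowed framing, and two of the listed features (short-time zero-crossing rate and short-time energy). The frame length, hop size, and pre-emphasis coefficient are illustrative assumptions; the patent does not specify concrete values.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis followed by windowed framing.

    frame_len, hop, and alpha are assumed values for illustration;
    a Hamming window stands in for the unspecified window function.
    """
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames

def short_time_features(frames):
    """Two of the features listed in claim 1, computed per frame:
    short-time zero-crossing rate and short-time energy."""
    # Fraction of adjacent-sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Sum of squared samples within each windowed frame
    energy = np.sum(frames ** 2, axis=1)
    return zcr, energy
```

For a 16000-sample signal with these defaults, `preprocess` yields 98 frames of 400 samples each; the remaining claimed features (pitch period, formants, harmonic energy sum, and so on) would be computed from the same frame matrix.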
4. A sound card validity detection apparatus, comprising:
the sound signal extraction unit is used for acquiring the sound card message to be detected and extracting the sound signal in the sound card message;
A preprocessing unit, configured to preprocess the sound signal, where the preprocessing includes: windowing and framing processing and pre-emphasis processing;
A sound feature extraction unit for extracting sound features of the sound signal, the sound features including: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
the validity detection unit is configured to input the sound feature into a preset sound feature detection model and perform a feature confidence operation on the sound feature through the sound feature detection model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the sound feature detection model in combination with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
According to the matching result of the feature confidence coefficient and a preset first confidence coefficient interval and a preset second confidence coefficient interval, if the feature confidence coefficient is in the range of the first confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if the feature confidence coefficient is in the range of the second confidence coefficient interval, determining that the sound card validity detection result corresponding to the sound card message is invalid;
The sound feature detection model is a decision tree model obtained through training according to a preset sound feature sample.
5. The sound card validity detection device according to claim 4, wherein the sound features consist of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude sum, and peak-to-valley ratio.
6. The sound card validity detection device according to claim 4, wherein the sound feature detection model is specifically a C4.5 decision tree model.
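The interval-matching step shared by claims 1 and 4 reduces to comparing the model's feature confidence coefficient against two preset intervals. A minimal sketch follows; the 0.5 boundary and the interval endpoints are assumptions for illustration, as the patent fixes no concrete thresholds.

```python
def card_validity(confidence,
                  valid_interval=(0.5, 1.0),    # assumed first confidence interval
                  invalid_interval=(0.0, 0.5)): # assumed second confidence interval
    """Map a feature confidence coefficient onto the sound card
    validity result by matching it against two preset intervals."""
    lo, hi = valid_interval
    if lo <= confidence <= hi:
        return "valid"
    lo, hi = invalid_interval
    if lo <= confidence < hi:
        return "invalid"
    return "undetermined"  # falls outside both preset intervals
```

In practice the confidence coefficient would come from the decision tree model's output; the two intervals partition its range so every in-range score maps to exactly one result.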
CN202110033517.0A 2021-01-11 2021-01-11 Sound card validity detection method and device Active CN112885370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110033517.0A CN112885370B (en) 2021-01-11 2021-01-11 Sound card validity detection method and device


Publications (2)

Publication Number Publication Date
CN112885370A (en) 2021-06-01
CN112885370B (en) 2024-05-31

Family

ID=76044703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110033517.0A Active CN112885370B (en) 2021-01-11 2021-01-11 Sound card validity detection method and device

Country Status (1)

Country Link
CN (1) CN112885370B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529028A (en) * 2015-12-09 2016-04-27 百度在线网络技术(北京)有限公司 Voice analytical method and apparatus
CN108564940A (en) * 2018-03-20 2018-09-21 平安科技(深圳)有限公司 Audio recognition method, server and computer readable storage medium
CN110265063A (en) * 2019-07-22 2019-09-20 东南大学 A kind of lie detecting method based on fixed duration speech emotion recognition sequence analysis
CN110880318A (en) * 2019-11-27 2020-03-13 云知声智能科技股份有限公司 Voice recognition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170140750A1 (en) * 2015-11-17 2017-05-18 Le Holdings (Beijing) Co., Ltd. Method and device for speech recognition
CN108305615B (en) * 2017-10-23 2020-06-16 腾讯科技(深圳)有限公司 Object identification method and device, storage medium and terminal thereof



Similar Documents

Publication Publication Date Title
Foster et al. Chime-home: A dataset for sound source recognition in a domestic environment
WO2020024690A1 (en) Speech labeling method and apparatus, and device
KR101269296B1 (en) Neural network classifier for separating audio sources from a monophonic audio signal
Umapathy et al. Audio signal feature extraction and classification using local discriminant bases
KR102128926B1 (en) Method and device for processing audio information
CN106653001A (en) Baby crying identifying method and system
Krijnders et al. Sound event recognition through expectancy-based evaluation of signal-driven hypotheses
Jiang et al. An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K‐Means
Ghaemmaghami et al. Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
Ntalampiras et al. Acoustic detection of human activities in natural environments
Kim et al. Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology.
CN102623009A (en) Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN111951824A (en) Detection method for distinguishing depression based on sound
CN108899033B (en) Method and device for determining speaker characteristics
WO2017045429A1 (en) Audio data detection method and system and storage medium
CN109408660A (en) A method of the music based on audio frequency characteristics is classified automatically
CN113707173B (en) Voice separation method, device, equipment and storage medium based on audio segmentation
CN112435687A (en) Audio detection method and device, computer equipment and readable storage medium
Wood et al. Classification of African elephant Loxodonta africana rumbles using acoustic parameters and cluster analysis
WO2022134781A1 (en) Prolonged speech detection method, apparatus and device, and storage medium
Huang et al. Fast diagnosis of bowel activities
CN112885370B (en) Sound card validity detection method and device
CN113782051B (en) Broadcast effect classification method and system, electronic equipment and storage medium
CN108766465A (en) A kind of digital audio based on ENF universal background models distorts blind checking method
CN114463671A (en) User personality identification method based on video data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant