CN112885370B - Sound card validity detection method and device - Google Patents
- Publication number
- CN112885370B (application CN202110033517.0A)
- Authority
- CN
- China
- Prior art keywords
- sound
- feature
- sound card
- confidence coefficient
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The application provides a sound card validity detection method and device. The method comprises: acquiring a sound card message to be detected and extracting the sound signal it carries; extracting sound features of the sound signal, including short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; and inputting the sound features into a preset sound feature detection model, so that the validity detection result for the sound card message is determined from the feature confidence output by the model, combined with a preset correspondence between feature confidence values and validity detection results. The method solves the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
Description
Technical Field
The application relates to the technical field of voice processing, and in particular to a method and device for detecting the validity of a sound card.
Background
With the rapid development of 5G and artificial intelligence and the rise of mass-entertainment products such as live streaming and short video, new forms of social interaction keep emerging. Users therefore need to build a unique sound profile at registration and during interaction, and attributes such as behavior and feature tags are detected from their voice. Moreover, in actual interaction a large number of sound cards must pass validity detection: cards that pass are analyzed further, while cards that fail are deleted and re-recorded.
Current voice validity detection schemes fall into two categories. In the first, the start and end of speech are determined by the user manually pressing a recording button: audio is recorded only after a manual click, and no per-frame validity detection is performed on the sound signal, so a large number of invalid sound cards are transmitted and stored. In the second, whether a signal is speech is judged from its short-time average energy and average zero-crossing rate; but because speech and the noise of complex scenes show no clear separation in these two features, the accuracy of this double-threshold approach is low, and a large number of abnormal sound cards are recommended and spread.
Disclosure of Invention
The application provides a sound card validity detection method and device to solve the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
In a first aspect, the present application provides a sound card validity detection method, comprising:
Acquiring a sound card message to be detected, and extracting a sound signal in the sound card message;
Extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
Inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, after extracting the sound signal in the sound card message, the method further comprises:
preprocessing the sound signal, wherein the preprocessing comprises the following steps: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
Matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
In a second aspect, the present application provides a sound card validity detection apparatus, comprising:
the sound signal extraction unit is used for acquiring the sound card message to be detected and extracting the sound signal in the sound card message;
A sound feature extraction unit for extracting sound features of the sound signal, the sound features including: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
The validity detection unit is used for inputting the sound features into a preset sound feature detection model, carrying out feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to the sound card messages according to feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
Preferably, the method further comprises:
a preprocessing unit, configured to perform preprocessing on the sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
Preferably, the sound features consist specifically of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
Preferably, the determining, according to the feature confidence operation result output by the sound feature detection model and in combination with a corresponding relation between a preset feature confidence operation result and a sound card validity detection result, the sound card validity detection result corresponding to the sound card message specifically includes:
Extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
Matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
Preferably, the acoustic feature detection model is specifically a C4.5 decision tree model.
From the above technical solutions, it can be seen that the application has the following advantages:
The first aspect of the present application provides a sound card validity detection method comprising: acquiring a sound card message to be detected, and extracting a sound signal in the sound card message; extracting sound features of the sound signal, the sound features comprising: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio; inputting the sound features into a preset sound feature detection model, and performing feature confidence coefficient operation on the sound features through the sound feature detection model so as to determine sound card effectiveness detection results corresponding to the sound card messages according to the feature confidence coefficient operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence coefficient operation results and the sound card effectiveness detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
In the application, a decision tree model is trained on multiple feature attributes to obtain the sound feature detection model, and the trained model performs voice validity detection by fusing these features. This reduces the manual tuning the system requires and, compared with hard-threshold decisions made on the two features of the double-threshold scheme, is more robust and generalizes better, thereby solving the technical problem that the low accuracy of existing voice validity detection allows abnormal sound cards to be widely recommended and spread.
Drawings
In order to more clearly illustrate the embodiments of the application and the technical solutions of the prior art, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a first embodiment of the sound card validity detection method provided by the present application;
Fig. 2 is a flowchart of a second embodiment of the sound card validity detection method provided by the present application;
Fig. 3 is a schematic structural diagram of a first embodiment of the sound card validity detection device provided by the present application.
Detailed Description
The embodiment of the application provides a sound card effectiveness detection method and device, which are used for solving the technical problem that abnormal sound cards are recommended and transmitted in a large quantity due to low accuracy of existing voice effectiveness detection.
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, a first embodiment of the present application provides a sound card validity detection method, which includes:
step 101, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
It should be noted that, when implementing the technical solution of the embodiment of the present application, firstly, a sound card message to be detected is obtained, and then, a corresponding sound signal is extracted from the sound card message.
Step 102, extracting sound characteristics of the sound signal, wherein the sound characteristics comprise: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
It should be noted that, based on the sound signal obtained in step 101, feature extraction is further performed to obtain sound features of the sound signal, where the sound features of the embodiment specifically include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio.
Step 103, inputting the sound features into a preset sound feature detection model, and performing feature confidence operation on the sound features through the sound feature detection model so as to determine sound card validity detection results corresponding to sound card messages according to the feature confidence operation results output by the sound feature detection model and the corresponding relation between the preset feature confidence operation results and the sound card validity detection results, wherein the sound feature detection model is a decision tree model trained according to preset sound feature samples.
It should be noted that, the sound feature extracted in step 102 is used as a model input, and is input into a preset sound feature detection model, and the sound feature input into the model is subjected to feature confidence operation through the sound feature detection model, so that the sound card validity detection result corresponding to the sound card message is determined according to the feature confidence operation result output by the sound feature detection model and in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result.
The sound feature detection model is a decision tree model obtained by training a preset sound feature sample, and the sound feature sample mentioned in this embodiment is a feature sample extracted by the same feature extraction manner in step 102 through a preset sound card sample, and the content of the sound feature sample is consistent with the sound feature mentioned in step 102.
The foregoing is a detailed description of a first embodiment of a sound card validity detection method provided by the present application, and the following is a detailed description of a second embodiment of a sound card validity detection method provided by the present application.
Referring to fig. 2, a second embodiment of the present application provides a sound card validity detection method, which includes:
step 201, acquiring a sound card message to be detected, and extracting a sound signal in the sound card message.
Step 202, preprocessing the sound signal, wherein the preprocessing comprises: windowing framing processing and pre-emphasis processing.
It should be noted that preprocessing consists of framing, windowing and pre-emphasizing the PCM data. Framing slices the speech signal according to its short-time stationarity; this embodiment uses a frame length of 16 ms. The window is typically a Hamming or Hanning window, and this embodiment preferably uses a Hamming window. Pre-emphasis boosts the energy of the high-frequency components to compensate for the roughly 6 dB/octave attenuation of components above 800 Hz caused by glottal vibration and oral-nasal radiation. In view of the pre-emphasis effect, this embodiment preferably implements pre-emphasis with a first-order high-pass filter.
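As a minimal sketch, the preprocessing chain just described — first-order high-pass pre-emphasis, 16 ms framing, Hamming windowing — reduces to a few array operations. The function name, the pre-emphasis coefficient of 0.97, the 16 kHz sample rate and the non-overlapping hop are illustrative assumptions; the text fixes only the 16 ms frame length, the Hamming window and the first-order high-pass filter.

```python
import numpy as np

def preprocess(pcm, sr=16000, frame_ms=16, alpha=0.97):
    """Pre-emphasize, then split into Hamming-windowed frames.

    alpha=0.97 and non-overlapping frames are illustrative choices,
    not mandated by the text.
    """
    # First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(pcm[0], pcm[1:] - alpha * pcm[:-1])

    frame_len = int(sr * frame_ms / 1000)          # 256 samples at 16 kHz
    n_frames = len(emphasized) // frame_len
    frames = emphasized[: n_frames * frame_len].reshape(n_frames, frame_len)

    return frames * np.hamming(frame_len)          # windowed frames
```

At 16 kHz a 16 ms frame is 256 samples, so a one-second signal yields 62 complete frames.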
Step 203, extracting the sound characteristics of the sound signal, wherein the sound characteristics specifically comprise short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude sum and peak-to-valley ratio.
It should be noted that, based on a feature combination selected through experiment, the preferred scheme of this embodiment extracts sound features consisting of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio as input parameters of the subsequent sound feature detection model, so as to balance detection accuracy against detection latency.
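As one hedged illustration of the per-frame feature extraction, two of the listed features — short-time zero-crossing rate and short-time energy — reduce to simple array operations on the windowed frames from the preprocessing step. The function name is an assumption; pitch period, formants and peak-to-valley ratio require autocorrelation or LPC machinery and are omitted here.

```python
import numpy as np

def short_time_features(frames):
    """Per-frame zero-crossing rate and energy for an array of frames
    with shape (n_frames, frame_len)."""
    # Short-time zero-crossing rate: fraction of adjacent-sample sign flips
    signs = np.sign(frames)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    zcr = np.mean(np.abs(np.diff(signs, axis=1)) / 2.0, axis=1)

    # Short-time energy: sum of squared samples in each frame
    energy = np.sum(frames ** 2, axis=1)
    return zcr, energy
```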
Step 204, inputting the sound feature into a preset sound feature detection model, and performing feature confidence operation on the sound feature through the sound feature detection model so as to determine a sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the sound feature detection model and the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, wherein the sound feature detection model is a decision tree model trained according to the preset sound feature sample.
More specifically, the commonly used decision tree algorithms are ID3, C4.5 and CART. According to actual needs, this embodiment preferably adopts the C4.5 decision tree algorithm, a family of algorithms used for classification problems in machine learning and data mining. C4.5 is a supervised learning method: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, the goal of C4.5 is to learn a mapping from attribute values to classes that can then be used to classify new instances of unknown class.
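The patent gives no training code, but the split criterion that distinguishes C4.5 from its predecessor ID3 — information gain normalized by split information, the gain ratio — can be sketched in a few lines. The function names, the binary threshold split and the toy data in the test are illustrative, not taken from the source.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels, threshold):
    """C4.5's criterion for a binary split of `feature` at `threshold`:
    information gain divided by the split's intrinsic (split) entropy."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(np.array([0] * len(left) + [1] * len(right)))
    return gain / split_info
```

C4.5 grows the tree by repeatedly choosing the attribute and threshold with the highest gain ratio; for a perfectly separating threshold on balanced classes the ratio is 1.0.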
The correspondence between feature confidence and sound card validity detection result in this embodiment consists of a preset first confidence interval and second confidence interval. The feature confidence extracted from the output of the sound feature detection model is matched against both intervals: if it falls within the first confidence interval, the validity detection result for the sound card message is determined to be valid; if it falls within the second confidence interval, the result is determined to be invalid. In general, the higher the feature confidence, the more likely the sound card is valid, and vice versa; accordingly, in this embodiment the first confidence interval lies above the second, and different business scenarios can set reasonable interval bounds according to their own discrimination rules.
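The interval-matching step can be sketched as a small function. The concrete bounds 0.8–1.0 and 0.0–0.2, and the behavior when the confidence falls outside both intervals, are assumptions for illustration; the text only requires that the valid interval lie above the invalid one and leaves the bounds to each business scenario.

```python
def card_validity(confidence, valid_range=(0.8, 1.0), invalid_range=(0.0, 0.2)):
    """Map a feature confidence to a validity verdict.

    The default interval bounds are illustrative placeholders.
    """
    lo, hi = valid_range
    if lo <= confidence <= hi:
        return "valid"
    lo, hi = invalid_range
    if lo <= confidence <= hi:
        return "invalid"
    # Outside both intervals: the text does not specify a verdict here
    return "undetermined"
```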
The second embodiment of the method for detecting the validity of a sound card provided by the present application is described in detail above, and the following is a detailed description of the first embodiment of the device for detecting the validity of a sound card provided by the present application.
Referring to fig. 3, a third embodiment of the present application provides a sound card validity detecting device, including:
A sound signal extraction unit 301, configured to obtain a sound card message to be detected, and extract a sound signal in the sound card message;
A sound feature extraction unit 302, configured to extract sound features of the sound signal, where the sound features include: short time zero crossing rate, short time amplitude, short time energy spectrum, autocorrelation coefficient, fundamental frequency, pitch period, harmonic energy sum, formants and peak-to-valley ratio;
the validity detection unit 303 is configured to input the sound feature into a preset sound feature detection model, perform feature confidence operation on the sound feature through the sound feature detection model, so as to determine a sound card validity detection result corresponding to the sound card message according to a feature confidence operation result output by the sound feature detection model and a corresponding relation between the preset feature confidence operation result and the sound card validity detection result, where the sound feature detection model is a decision tree model obtained according to preset sound feature sample training.
More specifically, it further comprises:
The preprocessing unit 300 is configured to preprocess a sound signal, where the preprocessing includes: windowing framing processing and pre-emphasis processing.
More specifically, the sound features consist of short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude and peak-to-valley ratio.
More specifically, according to the feature confidence operation result output by the sound feature detection model, in combination with the corresponding relation between the preset feature confidence operation result and the sound card validity detection result, determining the sound card validity detection result corresponding to the sound card message specifically includes:
extracting feature confidence coefficient in the feature confidence coefficient operation result according to the feature confidence coefficient operation result output by the sound feature detection model;
matching the feature confidence against a preset first confidence interval and second confidence interval; if the feature confidence falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid, and if it falls within the second confidence interval, determining that the result is invalid.
More specifically, the acoustic feature detection model is specifically a C4.5 decision tree model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (6)
1. A sound card validity detection method, characterized by comprising:
acquiring a sound card message to be detected, and extracting the sound signal from the sound card message;
preprocessing the sound signal, wherein the preprocessing comprises: windowing and framing, and pre-emphasis;
extracting sound features of the sound signal, the sound features comprising: short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio;
inputting the sound features into a preset sound feature detection model and performing a feature confidence operation on the sound features through the model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the model, combined with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically comprises:
extracting the feature confidence coefficient from the feature confidence operation result output by the sound feature detection model; and
matching the feature confidence coefficient against a preset first confidence interval and a preset second confidence interval: if the feature confidence coefficient falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid; if it falls within the second confidence interval, determining that the result is invalid;
wherein the sound feature detection model is a decision tree model trained on preset sound feature samples.
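As an illustrative sketch only (not part of the claimed method), the preprocessing and short-time feature steps of claim 1 could look like the following NumPy code; the frame length, hop size, and pre-emphasis coefficient are assumed values, and only two of the claimed features (short-time energy and zero-crossing rate) are shown.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (boosts high frequencies)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def short_time_features(frames):
    """Per-frame short-time energy and zero-crossing rate."""
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr
```

The remaining claimed features (autocorrelation coefficients, fundamental frequency, pitch period, formants, peak-to-valley ratio) would be computed per frame in the same fashion.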
2. The sound card validity detection method according to claim 1, wherein the sound features consist of the short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude, and peak-to-valley ratio.
3. The sound card validity detection method according to claim 1, wherein the sound feature detection model is specifically a C4.5 decision tree model.
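For illustration, the decision-tree classification and confidence-interval mapping of claims 1 and 3 could be sketched as below. Note that scikit-learn implements CART, used here as a stand-in for the claimed C4.5 model; the training data and the interval bounds are assumed, since the patent does not disclose them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: each row is a sound-feature vector
# (e.g. zero-crossing rate, energy, pitch period, formant ratio);
# label 1 = valid sound card message, 0 = invalid.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.2, (50, 4)),    # "valid" samples
               rng.normal(-1.0, 0.2, (50, 4))])  # "invalid" samples
y = np.array([1] * 50 + [0] * 50)

# CART decision tree as a stand-in for C4.5.
model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

def detect_validity(features, model, valid_interval=(0.5, 1.0)):
    """Map the model's confidence for the 'valid' class onto the preset
    intervals of claim 1: inside valid_interval -> valid, otherwise invalid."""
    conf = model.predict_proba([features])[0][list(model.classes_).index(1)]
    return "valid" if valid_interval[0] <= conf <= valid_interval[1] else "invalid"
```

The leaf-node class proportions of the tree serve as the feature confidence coefficient in this sketch.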
4. A sound card validity detection apparatus, comprising:
a sound signal extraction unit, configured to acquire the sound card message to be detected and extract the sound signal from the sound card message;
a preprocessing unit, configured to preprocess the sound signal, wherein the preprocessing comprises: windowing and framing, and pre-emphasis;
a sound feature extraction unit, configured to extract sound features of the sound signal, the sound features comprising: short-time zero-crossing rate, short-time amplitude, short-time energy spectrum, autocorrelation coefficients, fundamental frequency, pitch period, harmonic energy sum, formants, and peak-to-valley ratio;
a validity detection unit, configured to input the sound features into a preset sound feature detection model and perform a feature confidence operation on the sound features through the model, so as to determine the sound card validity detection result corresponding to the sound card message according to the feature confidence operation result output by the model, combined with a preset correspondence between feature confidence operation results and sound card validity detection results, which specifically comprises:
extracting the feature confidence coefficient from the feature confidence operation result output by the sound feature detection model; and
matching the feature confidence coefficient against a preset first confidence interval and a preset second confidence interval: if the feature confidence coefficient falls within the first confidence interval, determining that the sound card validity detection result corresponding to the sound card message is valid; if it falls within the second confidence interval, determining that the result is invalid;
wherein the sound feature detection model is a decision tree model trained on preset sound feature samples.
5. The sound card validity detection device according to claim 4, wherein the sound features consist of the short-time zero-crossing rate, short-time energy spectrum, pitch period, formants, short-time amplitude, and peak-to-valley ratio.
6. The sound card validity detection device according to claim 4, wherein the sound feature detection model is specifically a C4.5 decision tree model.
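The four units of the apparatus in claim 4 form a linear pipeline. A minimal structural sketch, with the four unit implementations left as hypothetical injected callables, could look like:

```python
class SoundCardValidityDetector:
    """Sketch of the claim-4 apparatus: four units chained into a pipeline.
    The extract_signal / preprocess / extract_features / detect callables
    are hypothetical stand-ins for the patent's units."""

    def __init__(self, extract_signal, preprocess, extract_features, detect):
        self.extract_signal = extract_signal      # sound signal extraction unit
        self.preprocess = preprocess              # preprocessing unit
        self.extract_features = extract_features  # sound feature extraction unit
        self.detect = detect                      # validity detection unit

    def check(self, message):
        """Run a sound card message through the full detection pipeline."""
        signal = self.extract_signal(message)
        features = self.extract_features(self.preprocess(signal))
        return self.detect(features)
```

Keeping the units as separate callables mirrors the patent's note that the units may be physically separate or integrated into one processing unit.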
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110033517.0A CN112885370B (en) | 2021-01-11 | 2021-01-11 | Sound card validity detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112885370A CN112885370A (en) | 2021-06-01 |
CN112885370B (en) | 2024-05-31 |
Family
ID=76044703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110033517.0A Active CN112885370B (en) | 2021-01-11 | 2021-01-11 | Sound card validity detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885370B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105529028A (en) * | 2015-12-09 | 2016-04-27 | 百度在线网络技术(北京)有限公司 | Voice analytical method and apparatus |
CN108564940A (en) * | 2018-03-20 | 2018-09-21 | 平安科技(深圳)有限公司 | Audio recognition method, server and computer readable storage medium |
CN110265063A (en) * | 2019-07-22 | 2019-09-20 | 东南大学 | A kind of lie detecting method based on fixed duration speech emotion recognition sequence analysis |
CN110880318A (en) * | 2019-11-27 | 2020-03-13 | 云知声智能科技股份有限公司 | Voice recognition method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170140750A1 (en) * | 2015-11-17 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Method and device for speech recognition |
CN108305615B (en) * | 2017-10-23 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Object identification method and device, storage medium and terminal thereof |
- 2021-01-11: filed in CN as application CN202110033517.0A (patent CN112885370B, status active)
Also Published As
Publication number | Publication date |
---|---|
CN112885370A (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Foster et al. | Chime-home: A dataset for sound source recognition in a domestic environment | |
WO2020024690A1 (en) | Speech labeling method and apparatus, and device | |
KR101269296B1 (en) | Neural network classifier for separating audio sources from a monophonic audio signal | |
Umapathy et al. | Audio signal feature extraction and classification using local discriminant bases | |
KR102128926B1 (en) | Method and device for processing audio information | |
CN106653001A (en) | Baby crying identifying method and system | |
Krijnders et al. | Sound event recognition through expectancy-based evaluation of signal-driven hypotheses | |
Jiang et al. | An Improved Speech Segmentation and Clustering Algorithm Based on SOM and K‐Means | |
Ghaemmaghami et al. | Noise robust voice activity detection using features extracted from the time-domain autocorrelation function | |
Ntalampiras et al. | Acoustic detection of human activities in natural environments | |
Kim et al. | Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology. | |
CN102623009A (en) | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis | |
CN111951824A (en) | Detection method for distinguishing depression based on sound | |
CN108899033B (en) | Method and device for determining speaker characteristics | |
WO2017045429A1 (en) | Audio data detection method and system and storage medium | |
CN109408660A (en) | A method of the music based on audio frequency characteristics is classified automatically | |
CN113707173B (en) | Voice separation method, device, equipment and storage medium based on audio segmentation | |
CN112435687A (en) | Audio detection method and device, computer equipment and readable storage medium | |
Wood et al. | Classification of African elephant Loxodonta africana rumbles using acoustic parameters and cluster analysis | |
WO2022134781A1 (en) | Prolonged speech detection method, apparatus and device, and storage medium | |
Huang et al. | Fast diagnosis of bowel activities | |
CN112885370B (en) | Sound card validity detection method and device | |
CN113782051B (en) | Broadcast effect classification method and system, electronic equipment and storage medium | |
CN108766465A (en) | A kind of digital audio based on ENF universal background models distorts blind checking method | |
CN114463671A (en) | User personality identification method based on video data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||