CN113270111A - Height prediction method, device, equipment and medium based on audio data - Google Patents

Height prediction method, device, equipment and medium based on audio data

Info

Publication number
CN113270111A
CN113270111A (application CN202110536777.XA)
Authority
CN
China
Prior art keywords
audio data
preset
predicted
inputting
height prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110536777.XA
Other languages
Chinese (zh)
Inventor
吴建花
李南南
张乔石
余魏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Intelligent Technology Co ltd
Original Assignee
Guangzhou Speakin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Intelligent Technology Co ltd filed Critical Guangzhou Speakin Intelligent Technology Co ltd
Priority to CN202110536777.XA priority Critical patent/CN113270111A/en
Publication of CN113270111A publication Critical patent/CN113270111A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The application discloses a height prediction method, apparatus, device, and medium based on audio data, which provide a way to predict a speaker's height from voice data and give public security officers an efficient means of screening clues. The method comprises the following steps: acquiring audio data to be predicted; extracting Mel-frequency cepstral coefficient (MFCC) features of the audio data to be predicted; inputting the MFCC features into a preset Gaussian mixture general background model for feature extraction to obtain feature vectors; and inputting the feature vectors into a preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.

Description

Height prediction method, device, equipment and medium based on audio data
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a height prediction method, apparatus, device, and medium based on audio data.
Background
Biometric technology, which performs identity recognition based on human physiological or behavioral characteristics, is an important field of the new generation of artificial intelligence. In recent years, driven by the rapid development of information technologies such as cloud computing, big data, the Internet of Things, and deep learning, biometric recognition has made continuous breakthroughs in basic theory, algorithm models, and innovative applications.
Voiceprints are a common biometric feature and are widely applied in the field of speech processing, where they can be used for gender recognition, age prediction, identity recognition, and the like. In speech monitoring application scenarios, estimating speaker attributes from audio data is a key link in generating biometric evidence. However, the prior art does not provide a method for predicting height from voice data. Providing such a method is therefore an urgent technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a height prediction method, apparatus, device, and medium based on audio data, which are used to provide a method for predicting height from voice data.
In view of the above, a first aspect of the present application provides a height prediction method based on audio data, including:
acquiring audio data to be predicted;
extracting the Mel frequency cepstrum coefficient characteristics of the audio data to be predicted;
and inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, and inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to the audio data to be predicted.
Optionally, after the extracting of the Mel frequency cepstrum coefficient features of the audio data to be predicted, the method further comprises:
performing normalization processing on the Mel frequency cepstrum coefficient features.
Optionally, the preset gaussian mixture general background model includes a first preset gaussian mixture general background model and a second preset gaussian mixture general background model, and the preset support vector machine regression model includes a first preset support vector machine regression model and a second preset support vector machine regression model;
the method comprises the following steps of inputting the Mel frequency cepstrum coefficient features into a preset Gaussian mixture general background model for feature extraction to obtain feature vectors, inputting the feature vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to the audio data to be predicted, and the method also comprises the following steps:
judging the gender of the voice in the audio data to be predicted;
the method for obtaining the height prediction result corresponding to the audio data to be predicted comprises the following steps of inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted, and the method comprises the following steps:
when the gender of the voice in the audio data to be predicted is female, inputting the Mel frequency cepstrum coefficient characteristics into the first preset Gaussian mixture general background model for characteristic extraction to obtain a first characteristic vector, and inputting the first characteristic vector into the first preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted;
and when the gender of the voice in the audio data to be predicted is male, inputting the Mel frequency cepstrum coefficient characteristics into the second preset Gaussian mixture general background model for characteristic extraction to obtain a second characteristic vector, and inputting the second characteristic vector into the second preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
Optionally, the configuration process of the preset gaussian mixture general background model is as follows:
acquiring a plurality of female background audio data and male background audio data, and respectively training a Gaussian mixture general background model through the female audio data and the male audio data to obtain a first Gaussian mixture general background model and a second Gaussian mixture general background model;
acquiring audio data to be trained, and dividing the audio data according to the voice and the gender in the audio data to be trained to obtain female audio data to be trained and male audio data to be trained, wherein the audio data to be trained is provided with a height label;
respectively extracting Mel frequency cepstrum coefficient characteristics of the female audio data to be trained and the male audio data to be trained;
inputting the Mel frequency cepstrum coefficient characteristics of the female audio data to be trained into the first Gaussian mixture general background model for training to obtain the first preset Gaussian mixture general background model;
and inputting the Mel frequency cepstrum coefficient characteristics of the male audio data to be trained into the second Gaussian mixture general background model for training to obtain the second preset Gaussian mixture general background model.
Optionally, the configuration process of the preset support vector machine regression model is as follows:
inputting the Mel frequency cepstrum coefficient characteristics of the female audio data to be trained into the first Gaussian mixture general background model for characteristic extraction to obtain the characteristic vector of the female audio data to be trained;
inputting the Mel frequency cepstrum coefficient characteristics of the male audio data to be trained into the second Gaussian mixture general background model for characteristic extraction to obtain the characteristic vector of the male audio data to be trained;
inputting the feature vector of the female audio data to be trained into a first support vector machine regression model for supervised training to obtain a first preset support vector machine regression model;
and inputting the feature vector of the male audio data to be trained into a second support vector machine regression model for supervised training to obtain the second preset support vector machine regression model.
A second aspect of the present application provides a height prediction apparatus based on audio data, comprising:
an acquisition unit configured to acquire audio data to be predicted;
the characteristic extraction unit is used for extracting the Mel frequency cepstrum coefficient characteristic of the audio data to be predicted;
and the prediction unit is used for inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to the audio data to be predicted.
Optionally, the method further includes:
and the processing unit is used for carrying out normalization processing on the Mel frequency cepstrum coefficient characteristics.
Optionally, the preset gaussian mixture general background model includes a first preset gaussian mixture general background model and a second preset gaussian mixture general background model, the preset support vector machine regression model includes a first preset support vector machine regression model and a second preset support vector machine regression model, and the apparatus further includes:
the judging unit is used for judging the gender of the human voice in the audio data to be predicted;
the prediction unit is specifically configured to:
when the gender of the voice in the audio data to be predicted is female, inputting the Mel frequency cepstrum coefficient characteristics into the first preset Gaussian mixture general background model for characteristic extraction to obtain a first characteristic vector, and inputting the first characteristic vector into the first preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted;
and when the gender of the voice in the audio data to be predicted is male, inputting the Mel frequency cepstrum coefficient characteristics into the second preset Gaussian mixture general background model for characteristic extraction to obtain a second characteristic vector, and inputting the second characteristic vector into the second preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
A third aspect of the application provides a height prediction device based on audio data, the device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the method for height prediction based on audio data according to any of the first aspect according to instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code for executing the method for audio data based height prediction according to any of the first aspects.
According to the technical scheme, the method has the following advantages:
the application provides a height prediction method based on audio data, which comprises the following steps: acquiring audio data to be predicted; extracting the Mel frequency cepstrum coefficient characteristics of the audio data to be predicted; inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to the audio data to be predicted.
According to the method, after the audio data to be predicted are obtained, the Mel frequency cepstrum coefficient characteristics are extracted and input into a preset Gaussian mixture general background model for characteristic extraction, characteristic vectors are obtained, then the characteristic vectors are input into a preset support vector machine regression model for height prediction, height prediction results corresponding to the audio data to be predicted are obtained, and the method for height prediction through sound data is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a height prediction method based on audio data according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of another height prediction method based on audio data according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a height prediction apparatus based on audio data according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For easy understanding, please refer to fig. 1, an embodiment of a height prediction method based on audio data provided by the present application includes:
step 101, audio data to be predicted are obtained.
Each person's voice characteristics, vocal tract characteristics, and pronunciation habits are almost unique. By building acoustic models of the voice features that represent and identify a speaker, research can be carried out on speaker identity recognition and on estimating biometric attributes such as height and age. It has been found that taller people usually have a larger lower respiratory tract; the extra space, including the lungs, produces a deeper sound, and as height increases, the frequency of the sound produced by the airway decreases significantly, so taller people tend to have lower-pitched voices. Speech thus carries not only the speaker's language content and identity information but also paralinguistic information such as height, age, gender, and emotion, which makes it possible to predict a speaker's height from speech.
In the embodiment of the application, the audio data to be predicted is obtained through audio acquisition equipment, recording equipment, or the like.
And 102, extracting the Mel frequency cepstrum coefficient characteristics of the audio data to be predicted.
After the audio data to be predicted is obtained, Mel-frequency cepstral coefficient (MFCC) features of the audio data are extracted, after which the MFCC features may be normalized. The extraction process of Mel frequency cepstrum coefficient features belongs to the prior art and is not described again here.
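As an illustration of this step, the following sketch computes MFCC features and the optional per-utterance normalization using plain NumPy. The frame length, hop size, filter count, and coefficient count are common defaults assumed for illustration, not values specified by the application.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Windowed power spectrum per frame -> mel filterbank -> log -> DCT-II."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    # DCT-II basis used to decorrelate the log mel energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    feats = np.empty((n_frames, n_ceps))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
        feats[t] = dct @ np.log(fb @ power + 1e-10)
    return feats

def cmvn(feats):
    """Per-utterance cepstral mean-variance normalization."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-10)
```

Calling `cmvn(mfcc(signal))` yields a frames-by-coefficients matrix with zero mean and unit variance per coefficient, ready to be scored against the background model.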
103, inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, and inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to the audio data to be predicted.
The embodiment of the application inputs the normalized Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, and inputs the characteristic vectors into a preset support vector machine regression model for height prediction to obtain height prediction results corresponding to audio data to be predicted.
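The following NumPy sketch illustrates one common way such a feature vector can be obtained from a trained Gaussian mixture general background model (GMM-UBM): the utterance's MFCC frames are scored against the model, the component means are MAP-adapted toward the utterance, and the adapted means are stacked into a fixed-length supervector. The relevance factor `r` and the diagonal-covariance form are conventional assumptions, not details taken from the application.

```python
import numpy as np

def posteriors(frames, weights, means, variances):
    """Frame-level component posteriors under a diagonal-covariance GMM."""
    # log N(x; mu, diag(var)) for every frame/component pair
    ll = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                 + (((frames[:, None, :] - means) ** 2) / variances).sum(axis=2))
    ll += np.log(weights)
    ll -= ll.max(axis=1, keepdims=True)   # stabilize before exponentiating
    p = np.exp(ll)
    return p / p.sum(axis=1, keepdims=True)

def map_supervector(frames, weights, means, variances, r=16.0):
    """MAP-adapt the UBM means toward the utterance and stack them
    into a single fixed-length supervector."""
    p = posteriors(frames, weights, means, variances)   # (T, C)
    n = p.sum(axis=0)                                   # soft counts per component
    f = p.T @ frames                                    # (C, D) first-order stats
    alpha = (n / (n + r))[:, None]                      # data/prior mixing weight
    adapted = alpha * (f / np.maximum(n[:, None], 1e-10)) + (1 - alpha) * means
    return adapted.ravel()
```

A linear support vector machine regression model could then score this fixed-length vector directly, e.g. `height = sv @ w_svr + b_svr` for learned `w_svr` and `b_svr` (hypothetical names).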
According to the method, after the audio data to be predicted are obtained, the Mel frequency cepstrum coefficient characteristics are extracted and input into a preset Gaussian mixture general background model for characteristic extraction, characteristic vectors are obtained, then the characteristic vectors are input into a preset support vector machine regression model for height prediction, height prediction results corresponding to the audio data to be predicted are obtained, and the method for height prediction through sound data is achieved.
The above is an embodiment of a height prediction method based on audio data provided by the present application, and the following is another embodiment of a height prediction method based on audio data provided by the present application.
Step 201, audio data to be predicted is obtained.
Step 202, extracting mel frequency cepstrum coefficient characteristics of the audio data to be predicted.
The specific processes of steps 201 to 202 are the same as those of steps 101 to 102, and are not described herein again.
And step 203, judging the gender of the voice in the audio data to be predicted.
The gender of the voice in the audio data to be predicted may be judged by a gender recognition model: a network model is trained on audio data containing both female and male voices to obtain the gender recognition model, and the gender recognition model then detects the gender of the voice in the audio data to be predicted.
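The application does not fix a particular gender recognition model. As a deliberately simple stand-in, the sketch below thresholds the median fundamental frequency estimated by autocorrelation; the 165 Hz threshold is a rough rule of thumb, and a trained network model as described above would replace this in practice.

```python
import numpy as np

def estimate_f0(frame, sr=16000, fmin=60, fmax=400):
    """Crude autocorrelation pitch estimate for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-period lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def guess_gender(frames, sr=16000, threshold_hz=165.0):
    """Median-F0 threshold: female voices typically sit above ~165 Hz."""
    f0s = [estimate_f0(f, sr) for f in frames]
    return "female" if np.median(f0s) > threshold_hz else "male"
```

Once the gender is decided, the MFCC features are routed to the gender-matched background model and regression model as described in steps 204 and 205.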
And 204, when the gender of the voice in the audio data to be predicted is female, inputting the Mel frequency cepstrum coefficient characteristics into a first preset Gaussian mixture general background model for characteristic extraction to obtain a first characteristic vector, and inputting the first characteristic vector into a first preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
And step 205, when the gender of the voice in the audio data to be predicted is male, inputting the Mel frequency cepstrum coefficient characteristics into a second preset Gaussian mixture general background model for characteristic extraction to obtain a second characteristic vector, and inputting the second characteristic vector into a second preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
The preset Gaussian mixture general background model in the embodiment of the application comprises a first preset Gaussian mixture general background model and a second preset Gaussian mixture general background model, and the preset support vector machine regression model comprises a first preset support vector machine regression model and a second preset support vector machine regression model.
When the gender of the voice in the audio data to be predicted is female, inputting the Mel frequency cepstrum coefficient characteristics into a first preset Gaussian mixture general background model for characteristic extraction to obtain a first characteristic vector, and inputting the first characteristic vector into a first preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
When the gender of the voice in the audio data to be predicted is male, inputting the Mel frequency cepstrum coefficient characteristics into a second preset Gaussian mixture general background model for characteristic extraction to obtain a second characteristic vector, and inputting the second characteristic vector into a second preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
Further, the configuration process of the preset gaussian mixture general background model in the embodiment of the present application is as follows:
a1, acquiring a plurality of female background audio data and male background audio data, and respectively training a Gaussian mixture general background model through the female audio data and the male audio data to obtain a first Gaussian mixture general background model and a second Gaussian mixture general background model;
in order to improve the accuracy of the height prediction result, a model is trained for the male and female audio data respectively, so that the height prediction is performed on the male and female audio data respectively. Acquiring a plurality of female background audio data and male background audio data, respectively training a Gaussian mixture general background model through the female audio data and the male audio data, and obtaining a first Gaussian mixture general background model and a second Gaussian mixture general background model, wherein the network structures of the first Gaussian mixture general background model and the second Gaussian mixture general background model are consistent, and only the trained network parameters are different.
In the embodiment of the application, a general background model is first trained on the female and male background audio data, and targeted training is then performed with the audio data to be trained. This alleviates the problem of insufficient training data and improves the generalization ability of the model.
A2, obtaining audio data to be trained, and dividing the audio data according to the voice and the sex in the audio data to be trained to obtain audio data to be trained for females and audio data to be trained for males, wherein the audio data to be trained has height labels;
the audio data of a large number of known objects (including males and females) can be collected, and then height labeling is carried out on each audio data to obtain the audio data to be trained. And then, dividing according to the voice and gender in the audio data to be trained to obtain the audio data to be trained for females and the audio data to be trained for males.
A3, respectively extracting Mel frequency cepstrum coefficient characteristics of the audio data to be trained of the female and the audio data to be trained of the male;
a4, inputting Mel frequency cepstrum coefficient characteristics of female audio data to be trained into a first Gaussian mixture general background model for training to obtain a first preset Gaussian mixture general background model;
a5, inputting the Mel frequency cepstrum coefficient characteristics of the male audio data to be trained into a second Gaussian mixture general background model for training to obtain a second preset Gaussian mixture general background model.
Further, the configuration process of the preset support vector machine regression model in the embodiment of the present application is as follows:
b1, inputting Mel frequency cepstrum coefficient characteristics of the female audio data to be trained into a first Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors of the female audio data to be trained;
b2, inputting the Mel frequency cepstrum coefficient characteristics of the male audio data to be trained into a second Gaussian mixture general background model for characteristic extraction to obtain the characteristic vector of the male audio data to be trained;
b3, inputting the feature vector of the audio data to be trained of the female into a first support vector machine regression model for supervised training to obtain a first preset support vector machine regression model;
and B4, inputting the feature vector of the male audio data to be trained into a second support vector machine regression model for supervised training to obtain a second preset support vector machine regression model.
When training a support vector machine regression model, a loss value is calculated from the height prediction result corresponding to the audio data to be trained and the real height, and the parameters of the model are updated according to the loss value until the model converges. The converged model is used as the preset support vector machine regression model.
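As a sketch of this loss-driven update, the snippet below fits a linear support vector machine regression model by subgradient descent on the epsilon-insensitive loss. In practice a kernelized solver and the actual supervector features would be used; all hyperparameters and the one-dimensional toy feature here are illustrative assumptions.

```python
import numpy as np

def train_linear_svr(X, y, epsilon=0.1, C=10.0, lr=0.01, n_iter=4000):
    """Subgradient descent on the SVR objective
    0.5*||w||^2 + C * mean(max(0, |X@w + b - y| - epsilon))."""
    T, D = X.shape
    w = np.zeros(D)
    b = 0.0
    for _ in range(n_iter):
        err = X @ w + b - y
        # subgradient of the epsilon-insensitive loss: 0 inside the tube
        g = np.where(err > epsilon, 1.0, np.where(err < -epsilon, -1.0, 0.0))
        w -= lr * (w + C * (X.T @ g) / T)
        b -= lr * C * g.mean()
    return w, b
```

With height labels `y` in centimeters, the loop drives the bias toward the average height and the weights toward the feature-to-height mapping, stopping points that already lie within the epsilon tube from contributing to the update.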
In the embodiment of the application, after the audio data to be predicted is obtained, the Mel frequency cepstrum coefficient characteristics are extracted and input into a preset Gaussian mixture general background model for characteristic extraction, so that the characteristic vector is obtained, then the characteristic vector is input into a preset support vector machine regression model for height prediction, so that a height prediction result corresponding to the audio data to be predicted is obtained, and the method for height prediction through the sound data is realized.
Furthermore, in the embodiment of the application, a general background model is first trained on the female and male background audio data, and targeted training is then performed with the audio data to be trained, which alleviates the problem of insufficient training data and improves the generalization ability of the model. A separate support vector machine is trained on the female audio data to be trained and on the male audio data to be trained, so that height is predicted separately for men and women. Each model can thus learn, in a targeted way, the mapping between female voiceprint features and height or between male voiceprint features and height, which improves the accuracy of height prediction.
The above is another embodiment of the method for height prediction based on audio data provided by the present application, and the following is an embodiment of the apparatus for height prediction based on audio data provided by the present application.
Referring to fig. 3, an embodiment of the present invention provides a height prediction apparatus based on audio data, including:
an acquisition unit configured to acquire audio data to be predicted;
the characteristic extraction unit is used for extracting the Mel frequency cepstrum coefficient characteristic of the audio data to be predicted;
and the prediction unit is used for inputting the Mel frequency cepstrum coefficient characteristics into a preset Gaussian mixture general background model for characteristic extraction to obtain characteristic vectors, inputting the characteristic vectors into a preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
As a further improvement, the apparatus further includes:
a processing unit, configured to normalize the Mel-frequency cepstral coefficient features.
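The feature extraction and normalization handled by these units can be sketched as follows. This is an illustrative, self-contained sketch, not the patent's implementation: the frame length, hop size, filterbank size, and coefficient count are assumed values, cepstral mean-variance normalization is one common choice for the unspecified "normalization processing", and a production system would typically use a tested library such as librosa.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Mel-frequency cepstral coefficients of a mono signal (illustrative)."""
    # 1. Slice the signal into overlapping Hamming-windowed frames.
    window = np.hamming(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * window for s in starts])
    # 2. Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank applied to the power spectrum.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, mid, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:mid] = (np.arange(left, mid) - left) / max(mid - left, 1)
        fbank[i - 1, mid:right] = (right - np.arange(mid, right)) / max(right - mid, 1)
    # 4. Log filterbank energies, then a DCT to decorrelate them.
    log_energy = np.log(power @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

def cmvn(feats):
    """Cepstral mean-variance normalization: an assumed, common choice for
    the patent's normalization step (zero mean, unit variance per dimension)."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-10)
```

The `mfcc` output is a (frames × coefficients) matrix; normalizing per coefficient, as `cmvn` does, removes channel and loudness offsets before the features reach the background model.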
As a further improvement, the preset Gaussian mixture universal background model includes a first preset Gaussian mixture universal background model and a second preset Gaussian mixture universal background model, the preset support vector machine regression model includes a first preset support vector machine regression model and a second preset support vector machine regression model, and the apparatus further includes:
a judging unit, configured to judge the gender of the human voice in the audio data to be predicted;
the prediction unit is specifically configured to:
when the gender of the voice in the audio data to be predicted is female, input the Mel-frequency cepstral coefficient features into the first preset Gaussian mixture universal background model for feature extraction to obtain a first feature vector, and input the first feature vector into the first preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted;
and when the gender of the voice in the audio data to be predicted is male, input the Mel-frequency cepstral coefficient features into the second preset Gaussian mixture universal background model for feature extraction to obtain a second feature vector, and input the second feature vector into the second preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted.
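A minimal sketch of this gender-branched prediction, under several loudly-stated assumptions: scikit-learn's `GaussianMixture` stands in for the universal background model, a MAP-adapted GMM supervector stands in for the feature vector, and all training data, heights, and hyperparameters are synthetic toys rather than anything from the patent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them (GMM supervector)."""
    post = ubm.predict_proba(frames)                  # (T, K) responsibilities
    counts = post.sum(axis=0)                         # soft frame count per component
    obs_means = post.T @ frames / (counts[:, None] + 1e-10)
    alpha = (counts / (counts + relevance))[:, None]  # data-vs-prior weight
    return (alpha * obs_means + (1 - alpha) * ubm.means_).ravel()

def train_gender_branch(mean_height):
    """Toy UBM + SVR pair for one gender; a real system trains on labelled speech."""
    ubm = GaussianMixture(n_components=4, random_state=0)
    ubm.fit(rng.normal(size=(400, 13)))
    X, y = [], []
    for _ in range(30):
        h = mean_height + rng.normal(scale=7.0)
        # Shift the toy frames with height so there is something to learn.
        frames = rng.normal(loc=(h - mean_height) / 10.0, size=(50, 13))
        X.append(supervector(ubm, frames))
        y.append(h)
    return ubm, SVR(kernel='rbf', C=1.0).fit(X, y)

# One (UBM, SVR) pair per gender, mirroring the first/second preset models.
models = {'female': train_gender_branch(160.0),
          'male': train_gender_branch(175.0)}

def predict_height(frames, gender):
    """Route the utterance to the model pair matching the judged gender."""
    ubm, svr = models[gender]
    return float(svr.predict([supervector(ubm, frames)])[0])
```

The branch itself is just the dictionary lookup in `predict_height`; the judging unit's gender decision selects which pre-trained pair the MFCC frames flow through.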
As a further improvement, the preset Gaussian mixture universal background models are configured as follows:
acquiring a plurality of pieces of female background audio data and male background audio data, and training a Gaussian mixture universal background model on each of the female background audio data and the male background audio data, to obtain a first Gaussian mixture universal background model and a second Gaussian mixture universal background model;
acquiring audio data to be trained, each item of which carries a height label, and dividing it according to the gender of the voice it contains, to obtain female audio data to be trained and male audio data to be trained;
extracting the Mel-frequency cepstral coefficient features of the female audio data to be trained and of the male audio data to be trained respectively;
inputting the Mel-frequency cepstral coefficient features of the female audio data to be trained into the first Gaussian mixture universal background model for training, to obtain the first preset Gaussian mixture universal background model;
and inputting the Mel-frequency cepstral coefficient features of the male audio data to be trained into the second Gaussian mixture universal background model for training, to obtain the second preset Gaussian mixture universal background model.
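The two-stage configuration above (fit a per-gender background model, then continue training on the smaller labelled set) might be sketched as below. scikit-learn's EM fit stands in for UBM training, warm-starting EM from the background parameters stands in for the second training pass, and all data, component counts, and dimensions are synthetic assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stage 1: one universal background model per gender, fitted by EM on pooled
# background MFCC frames (synthetic here) from many speakers.
female_bg = rng.normal(loc=0.5, size=(1000, 13))
male_bg = rng.normal(loc=-0.5, size=(1000, 13))
female_ubm = GaussianMixture(n_components=8, covariance_type='diag',
                             random_state=0).fit(female_bg)
male_ubm = GaussianMixture(n_components=8, covariance_type='diag',
                           random_state=0).fit(male_bg)

def adapt_ubm(ubm, frames, n_iter=2):
    """Stage 2: refine the background model on the labelled training frames by
    warm-starting EM from its parameters (a stand-in for MAP adaptation)."""
    refined = GaussianMixture(
        n_components=ubm.n_components,
        covariance_type='diag',
        weights_init=ubm.weights_,
        means_init=ubm.means_,
        # For 'diag' covariances, precisions_cholesky_ is sqrt(precision).
        precisions_init=ubm.precisions_cholesky_ ** 2,
        max_iter=n_iter,
        random_state=0)
    return refined.fit(frames)

female_train = rng.normal(loc=0.6, size=(300, 13))
male_train = rng.normal(loc=-0.6, size=(300, 13))
preset_female_ubm = adapt_ubm(female_ubm, female_train)
preset_male_ubm = adapt_ubm(male_ubm, male_train)
```

Keeping the iteration count of the second pass small preserves most of what the background data contributed, which is the point of the UBM-then-adapt design when labelled audio is scarce.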
As a further improvement, the preset support vector machine regression models are configured as follows:
inputting the Mel-frequency cepstral coefficient features of the female audio data to be trained into the first Gaussian mixture universal background model for feature extraction, to obtain feature vectors of the female audio data to be trained;
inputting the Mel-frequency cepstral coefficient features of the male audio data to be trained into the second Gaussian mixture universal background model for feature extraction, to obtain feature vectors of the male audio data to be trained;
inputting the feature vectors of the female audio data to be trained into a first support vector machine regression model for supervised training, to obtain the first preset support vector machine regression model;
and inputting the feature vectors of the male audio data to be trained into a second support vector machine regression model for supervised training, to obtain the second preset support vector machine regression model.
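The supervised regression stage can be sketched as follows, with synthetic stand-ins for the UBM-derived feature vectors and height labels; the kernel, `C`, and `epsilon` values are illustrative choices, not values from the patent.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: one UBM-derived feature vector per training utterance,
# with the first dimension weakly tied to the height label (in cm).
n_utts, dim = 200, 8
heights = rng.normal(loc=160.0, scale=7.0, size=n_utts)   # e.g. the female branch
features = rng.normal(size=(n_utts, dim))
features[:, 0] = (heights - 160.0) / 7.0 + rng.normal(scale=0.3, size=n_utts)

X_tr, X_te, y_tr, y_te = train_test_split(features, heights,
                                          test_size=0.25, random_state=0)

# Supervised training of the regression model on (feature vector, height) pairs.
svr = SVR(kernel='rbf', C=10.0, epsilon=1.0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, svr.predict(X_te))
```

A held-out mean absolute error in centimetres, as computed here, is one natural way to compare the two per-gender regression models, though the patent itself does not specify an evaluation metric.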
In an embodiment of the present application, after the audio data to be predicted is obtained, its Mel-frequency cepstral coefficient (MFCC) features are extracted and input into a preset Gaussian mixture universal background model for feature extraction to obtain a feature vector; the feature vector is then input into a preset support vector machine regression model for height prediction, yielding a height prediction result corresponding to the audio data to be predicted. A method of predicting height from voice data is thereby realized.
Furthermore, in an embodiment of the present application, a universal background model is first trained on female background audio data and male background audio data and then further trained on the target audio data to be trained; this mitigates the problem of an insufficient amount of training audio data and improves the generalization ability of the model. The support vector machines are trained separately on the female audio data to be trained and the male audio data to be trained, so that male and female heights are predicted by separate models; each model can thus specifically learn the mapping between female voiceprint features and height, or between male voiceprint features and height, which improves the accuracy of height prediction.
An embodiment of the present application further provides a height prediction device based on audio data, which includes a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the audio-data-based height prediction method of the foregoing method embodiments.
An embodiment of the present application further provides a computer-readable storage medium configured to store program code, the program code being configured to execute the audio-data-based height prediction method of the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for height prediction based on audio data, comprising:
acquiring audio data to be predicted;
extracting Mel-frequency cepstral coefficient features of the audio data to be predicted;
and inputting the Mel-frequency cepstral coefficient features into a preset Gaussian mixture universal background model for feature extraction to obtain a feature vector, and inputting the feature vector into a preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
2. The method for height prediction based on audio data according to claim 1, wherein after said extracting the Mel-frequency cepstral coefficient features of the audio data to be predicted, the method further comprises:
normalizing the Mel-frequency cepstral coefficient features.
3. The method for height prediction based on audio data according to claim 1, wherein the preset Gaussian mixture universal background model comprises a first preset Gaussian mixture universal background model and a second preset Gaussian mixture universal background model, and the preset support vector machine regression model comprises a first preset support vector machine regression model and a second preset support vector machine regression model;
before said inputting the Mel-frequency cepstral coefficient features into the preset Gaussian mixture universal background model for feature extraction to obtain a feature vector, and inputting the feature vector into the preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted, the method further comprises:
judging the gender of the human voice in the audio data to be predicted;
said inputting the Mel-frequency cepstral coefficient features into the preset Gaussian mixture universal background model for feature extraction to obtain a feature vector, and inputting the feature vector into the preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted comprises:
when the gender of the voice in the audio data to be predicted is female, inputting the Mel-frequency cepstral coefficient features into the first preset Gaussian mixture universal background model for feature extraction to obtain a first feature vector, and inputting the first feature vector into the first preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted;
and when the gender of the voice in the audio data to be predicted is male, inputting the Mel-frequency cepstral coefficient features into the second preset Gaussian mixture universal background model for feature extraction to obtain a second feature vector, and inputting the second feature vector into the second preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted.
4. The method according to claim 3, wherein the preset Gaussian mixture universal background models are configured by:
acquiring a plurality of pieces of female background audio data and male background audio data, and training a Gaussian mixture universal background model on each of the female background audio data and the male background audio data, to obtain a first Gaussian mixture universal background model and a second Gaussian mixture universal background model;
acquiring audio data to be trained, each item of which carries a height label, and dividing it according to the gender of the voice it contains, to obtain female audio data to be trained and male audio data to be trained;
extracting the Mel-frequency cepstral coefficient features of the female audio data to be trained and of the male audio data to be trained respectively;
inputting the Mel-frequency cepstral coefficient features of the female audio data to be trained into the first Gaussian mixture universal background model for training, to obtain the first preset Gaussian mixture universal background model;
and inputting the Mel-frequency cepstral coefficient features of the male audio data to be trained into the second Gaussian mixture universal background model for training, to obtain the second preset Gaussian mixture universal background model.
5. The method according to claim 4, wherein the preset support vector machine regression models are configured by:
inputting the Mel-frequency cepstral coefficient features of the female audio data to be trained into the first Gaussian mixture universal background model for feature extraction, to obtain feature vectors of the female audio data to be trained;
inputting the Mel-frequency cepstral coefficient features of the male audio data to be trained into the second Gaussian mixture universal background model for feature extraction, to obtain feature vectors of the male audio data to be trained;
inputting the feature vectors of the female audio data to be trained into a first support vector machine regression model for supervised training, to obtain the first preset support vector machine regression model;
and inputting the feature vectors of the male audio data to be trained into a second support vector machine regression model for supervised training, to obtain the second preset support vector machine regression model.
6. A height prediction apparatus based on audio data, comprising:
an acquisition unit, configured to acquire audio data to be predicted;
a feature extraction unit, configured to extract Mel-frequency cepstral coefficient features of the audio data to be predicted;
and a prediction unit, configured to input the Mel-frequency cepstral coefficient features into a preset Gaussian mixture universal background model for feature extraction to obtain a feature vector, and input the feature vector into a preset support vector machine regression model for height prediction to obtain a height prediction result corresponding to the audio data to be predicted.
7. The height prediction apparatus based on audio data according to claim 6, further comprising:
a processing unit, configured to normalize the Mel-frequency cepstral coefficient features.
8. The height prediction apparatus based on audio data according to claim 6, wherein the preset Gaussian mixture universal background model comprises a first preset Gaussian mixture universal background model and a second preset Gaussian mixture universal background model, and the preset support vector machine regression model comprises a first preset support vector machine regression model and a second preset support vector machine regression model, the apparatus further comprising:
a judging unit, configured to judge the gender of the human voice in the audio data to be predicted;
wherein the prediction unit is specifically configured to:
when the gender of the voice in the audio data to be predicted is female, input the Mel-frequency cepstral coefficient features into the first preset Gaussian mixture universal background model for feature extraction to obtain a first feature vector, and input the first feature vector into the first preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted;
and when the gender of the voice in the audio data to be predicted is male, input the Mel-frequency cepstral coefficient features into the second preset Gaussian mixture universal background model for feature extraction to obtain a second feature vector, and input the second feature vector into the second preset support vector machine regression model for height prediction to obtain the height prediction result corresponding to the audio data to be predicted.
9. A height prediction device based on audio data, wherein the device comprises a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
and the processor is configured to execute, according to instructions in the program code, the audio-data-based height prediction method according to any one of claims 1-5.
10. A computer-readable storage medium configured to store program code, wherein the program code is configured to execute the audio-data-based height prediction method according to any one of claims 1-5.
CN202110536777.XA 2021-05-17 2021-05-17 Height prediction method, device, equipment and medium based on audio data Pending CN113270111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110536777.XA CN113270111A (en) 2021-05-17 2021-05-17 Height prediction method, device, equipment and medium based on audio data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110536777.XA CN113270111A (en) 2021-05-17 2021-05-17 Height prediction method, device, equipment and medium based on audio data

Publications (1)

Publication Number Publication Date
CN113270111A true CN113270111A (en) 2021-08-17

Family

ID=77231351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110536777.XA Pending CN113270111A (en) 2021-05-17 2021-05-17 Height prediction method, device, equipment and medium based on audio data

Country Status (1)

Country Link
CN (1) CN113270111A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178897A (en) * 2007-12-05 2008-05-14 浙江大学 Speaking man recognizing method using base frequency envelope to eliminate emotion voice
KR20080077719A (en) * 2007-02-21 2008-08-26 인하대학교 산학협력단 A voice-based gender identification method using a support vector machine(svm)
CN102034288A (en) * 2010-12-09 2011-04-27 江南大学 Multiple biological characteristic identification-based intelligent door control system
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
CN102881284A (en) * 2012-09-03 2013-01-16 江苏大学 Unspecific human voice and emotion recognition method and system
CN107146615A (en) * 2017-05-16 2017-09-08 南京理工大学 Audio recognition method and system based on the secondary identification of Matching Model
CN109446948A (en) * 2018-10-15 2019-03-08 西安交通大学 A kind of face and voice multi-biological characteristic fusion authentication method based on Android platform
CN109817246A (en) * 2019-02-27 2019-05-28 平安科技(深圳)有限公司 Training method, emotion identification method, device, equipment and the storage medium of emotion recognition model
CN111161713A (en) * 2019-12-20 2020-05-15 北京皮尔布莱尼软件有限公司 Voice gender identification method and device and computing equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhu Wenfeng et al., "Diagnostics of Traditional Chinese Medicine" (《中医诊断学》), People's Medical Publishing House, page 115 *

Similar Documents

Publication Publication Date Title
US11538472B2 (en) Processing speech signals in voice-based profiling
CN107492382B (en) Voiceprint information extraction method and device based on neural network
CN107610709B (en) Method and system for training voiceprint recognition model
CN109859772B (en) Emotion recognition method, emotion recognition device and computer-readable storage medium
CN108305643B (en) Method and device for determining emotion information
Busso et al. Iterative feature normalization scheme for automatic emotion detection from speech
CN112259106A (en) Voiceprint recognition method and device, storage medium and computer equipment
CN108197115A (en) Intelligent interactive method, device, computer equipment and computer readable storage medium
Mariooryad et al. Compensating for speaker or lexical variabilities in speech for emotion recognition
CN110265040A (en) Training method, device, storage medium and the electronic equipment of sound-groove model
CN113380271B (en) Emotion recognition method, system, device and medium
US20210020191A1 (en) Methods and systems for voice profiling as a service
Sethu et al. Speech based emotion recognition
CN102404278A (en) Song requesting system based on voiceprint recognition and application method thereof
CN112735371B (en) Method and device for generating speaker video based on text information
CN108711429A (en) Electronic equipment and apparatus control method
CN112017690B (en) Audio processing method, device, equipment and medium
CN113851136A (en) Clustering-based speaker recognition method, device, equipment and storage medium
CN111710337A (en) Voice data processing method and device, computer readable medium and electronic equipment
CN111179940A (en) Voice recognition method and device and computing equipment
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN112735432B (en) Audio identification method, device, electronic equipment and storage medium
JP2015175859A (en) Pattern recognition device, pattern recognition method, and pattern recognition program
Kamińska et al. Comparison of perceptual features efficiency for automatic identification of emotional states from speech
CN116052644A (en) Speaker recognition method based on trivial pronunciation and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination