CN111128235A - Age prediction method, device and equipment based on voice - Google Patents

Age prediction method, device and equipment based on voice

Info

Publication number
CN111128235A
Authority
CN
China
Prior art keywords
long short-term memory network
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911234436.6A
Other languages
Chinese (zh)
Inventor
陈文敏
李稀敏
肖龙源
蔡振华
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201911234436.6A
Publication of CN111128235A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses an age prediction method, device and equipment based on voice. The method comprises the following steps: acquiring voice data from people of different ages; constructing a long short-term memory network regression model based on the voice data; training the constructed regression model with the long short-term memory network; and predicting a person's age from his or her voice according to the trained model. In this way, a person's age can be predicted from his or her voice.

Description

Age prediction method, device and equipment based on voice
Technical Field
The invention relates to the technical field of age prediction, in particular to an age prediction method, an age prediction device and age prediction equipment based on voice.
Background
Speech is sound produced by the human vocal organs that carries meaning and serves social communication. The human voice generally changes with age.
Existing age prediction schemes generally obtain a facial image of a person and perform face recognition on that image in order to predict the person's age.
However, such schemes cannot predict a person's age from his or her voice.
Disclosure of Invention
In view of the above, the present invention provides a speech-based age prediction method, device and equipment that can predict a person's age from his or her voice.
According to an aspect of the present invention, there is provided a speech-based age prediction method, comprising: acquiring voice data from people of different ages; constructing a long short-term memory network regression model based on the voice data; training the constructed regression model with a long short-term memory network; and predicting a person's age from his or her voice according to the trained regression model.
Training the constructed regression model with the long short-term memory network comprises: labeling each voice sample in the voice data with an age tag for the corresponding age; extracting acoustic features of each sample from the labeled voice data; extracting Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs to the long short-term memory network; and training the constructed regression model with the network that takes these features as its training inputs.
Predicting a person's age from his or her voice according to the trained regression model comprises: extracting the Mel cepstral coefficient features and fundamental frequency features associated with the voice; inputting the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtaining the predicted age from the trained model.
After predicting a person's age according to the trained regression model, the method further comprises: updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and retraining the regression model with the updated network over iterated rounds of prediction.
According to another aspect of the present invention, there is provided a speech-based age prediction device, comprising an acquisition module, a construction module, a training module and a prediction module. The acquisition module is used for acquiring voice data from people of different ages; the construction module is used for constructing a long short-term memory network regression model based on the voice data; the training module is used for training the constructed regression model with a long short-term memory network; and the prediction module is used for predicting a person's age from his or her voice according to the trained regression model.
Wherein the training module is specifically configured to: label each voice sample in the voice data with an age tag for the corresponding age; extract acoustic features of each sample from the labeled voice data; extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs to the long short-term memory network; and train the constructed regression model with the network that takes these features as its training inputs.
Wherein the prediction module is specifically configured to: extract the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice; input the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtain the predicted age from the trained model.
Wherein the speech-based age prediction device further comprises an update module, which is used for updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and for retraining the regression model with the updated network over iterated rounds of prediction.
According to still another aspect of the present invention, there is provided a voice-based age prediction apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of speech-based age prediction as defined in any one of the above.
According to a further aspect of the present invention, there is provided a computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a speech based age prediction method as defined in any one of the above.
It can be seen that, with the above scheme, voice data from people of different age groups can be acquired, a long short-term memory network regression model can be constructed from the voice data and trained with a long short-term memory network, and the trained model can then predict the age corresponding to a person's voice, so that a person's age can be predicted from his or her voice.
Furthermore, with the above scheme, a long short-term memory network can be used to label each voice sample in the voice data with an age tag for the corresponding age, extract acoustic features of each sample from the labeled data, and extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs; the constructed regression model is then trained with the network that takes these features as its training inputs.
Furthermore, with the above scheme, the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice can be extracted and input into the trained long short-term memory network regression model, which yields the predicted age corresponding to that voice.
Furthermore, the above scheme can update the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm and retrain the regression model with the updated network over iterated rounds of prediction, which improves the accuracy of predicting the age corresponding to a person's voice.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for age prediction based on speech according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for predicting age based on speech according to another embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an embodiment of an apparatus for predicting age based on speech according to the present invention;
FIG. 4 is a schematic diagram of another embodiment of the age prediction apparatus based on speech according to the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of the age prediction apparatus based on speech according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be noted that the following embodiments are only illustrative and do not limit the scope of the present invention. Likewise, the following embodiments are only some, not all, embodiments of the present invention, and all other embodiments obtained by those skilled in the art without inventive work fall within the scope of the present invention.
The invention provides a speech-based age prediction method, which can predict a person's age from his or her voice.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a method for predicting age based on speech according to the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:
S101: acquiring voice data from people of different ages.
In this embodiment, the voice data of people of different ages may be acquired all at once, in several batches, or sample by sample, among other possibilities; the invention is not limited in this respect.
In this embodiment, the voice data may come from different people of different ages, or from the same person at different ages; the invention is not limited in this respect.
S102: an LSTM (Long Short-Term Memory) regression model based on the voice data is constructed.
In this embodiment, the constructed long short-term memory network regression model predicts a person's age with a suitable regressor, exploiting the differences in speech characteristics between people of different ages.
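The patent does not specify a network architecture, but the idea behind S102 can be sketched: an LSTM consumes a sequence of per-frame acoustic features, and its final hidden state feeds a linear regression head that outputs an age. The toy NumPy forward pass below is purely illustrative; the `AgeLSTM` name, layer sizes, and random (untrained) weights are assumptions, and a real implementation would use a trained model in a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AgeLSTM:
    """Toy LSTM regressor: per-frame features -> final hidden state -> age."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.normal(0.0, 0.1, n_hidden)  # linear regression head
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def predict(self, frames):
        """frames: (T, n_in) array of per-frame acoustic features."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)          # gate pre-activations
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # The last hidden state summarizes the utterance; map it to one number.
        return float(self.w_out @ h + self.b_out)

model = AgeLSTM(n_in=14, n_hidden=8)
utterance = np.random.default_rng(1).normal(size=(50, 14))  # 50 frames x 14 features
age = model.predict(utterance)  # meaningless until trained, but shows the flow
```

With random weights the output is of course not a real age; the point is only the data flow from a variable-length feature sequence to a single regression output.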
S103: training the constructed long short-term memory network regression model with the long short-term memory network.
Training the regression model with the long short-term memory network may include the following:
the method comprises the steps of marking age tags of corresponding ages of each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting MFCC (Mel-scale Frequency Cepstral Coefficients) features and fundamental Frequency features from the acoustic features as training inputs of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel-scale Cepstral coefficient features and the fundamental Frequency features as the training inputs.
S104: predicting a person's age from his or her voice according to the trained long short-term memory network regression model.
Predicting a person's age according to the trained regression model may include the following:
According to the trained long short-term memory network regression model, the Mel cepstral coefficient features and fundamental frequency features associated with the voice are extracted from a person's voice and input into the trained model, which outputs the predicted age corresponding to that voice.
After predicting a person's age according to the trained regression model, the method may further include the following:
the long-short term memory network is subjected to parameter updating through a loss function of cross entropy loss and an optimization algorithm, and the long-short term memory network regression model is trained and updated through iteration of prediction times by adopting the long-short term memory network subjected to parameter updating, so that the advantage that the accuracy of predicting the human age corresponding to the voice of the human body can be improved.
It can be seen that, in this embodiment, voice data from people of different age groups can be acquired, a long short-term memory network regression model can be constructed from the voice data and trained, and the trained model can then predict the age corresponding to a person's voice, so that a person's age can be predicted from his or her voice.
Further, in this embodiment, a long short-term memory network may be used to label each voice sample in the voice data with an age tag for the corresponding age, extract acoustic features of each sample from the labeled data, and extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs; the constructed regression model is then trained with the network that takes these features as its training inputs.
Further, in this embodiment, the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice may be extracted and input into the trained long short-term memory network regression model to predict the corresponding age.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for predicting age based on speech according to another embodiment of the present invention. In this embodiment, the method includes the steps of:
S201: acquiring voice data from people of different ages.
See S101 above; the details are not repeated here.
S202: constructing a long short-term memory network regression model based on the voice data.
See S102 above; the details are not repeated here.
S203: training the constructed long short-term memory network regression model with the long short-term memory network.
See S103 above; the details are not repeated here.
S204: predicting a person's age from his or her voice according to the trained long short-term memory network regression model.
See S104 above; the details are not repeated here.
S205: updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and retraining the regression model with the updated network over iterated rounds of prediction.
It can be seen that, in this embodiment, the parameters of the long short-term memory network can be updated with a cross-entropy loss function and an optimization algorithm, and the regression model retrained with the updated network over iterated rounds of prediction, which improves the accuracy of predicting the age corresponding to a person's voice.
The invention also provides a speech-based age prediction device, which can predict a person's age from his or her voice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an age prediction apparatus based on speech according to an embodiment of the present invention. In this embodiment, the age prediction device 30 based on speech includes an obtaining module 31, a constructing module 32, a training module 33, and a predicting module 34.
The acquisition module 31 is configured to acquire voice data from people of different ages.
The construction module 32 is configured to construct a long short-term memory network regression model based on the voice data.
The training module 33 is configured to train the constructed regression model with the long short-term memory network.
The prediction module 34 is configured to predict a person's age from his or her voice according to the trained regression model.
Optionally, the training module 33 may be specifically configured to:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
Optionally, the prediction module 34 may be specifically configured to:
extract the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice; input the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtain the predicted age from the trained model.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an age prediction apparatus based on speech according to another embodiment of the present invention. Different from the previous embodiment, the age prediction apparatus 40 based on speech of the present embodiment further includes an update module 41.
The updating module 41 is configured to update the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and to retrain the regression model with the updated network over iterated rounds of prediction.
Each module of the speech-based age prediction device 30/40 can execute the corresponding steps of the above method embodiments; the modules are therefore not described in detail here, and reference is made to the description of the corresponding steps above.
The present invention further provides a voice-based age prediction apparatus, as shown in fig. 5, including: at least one processor 51; and a memory 52 communicatively coupled to the at least one processor 51; the memory 52 stores instructions executable by the at least one processor 51, and the instructions are executable by the at least one processor 51 to enable the at least one processor 51 to perform the above-described speech-based age prediction method.
The memory 52 and the processor 51 are connected by a bus, which may comprise any number of interconnected buses and bridges linking the various circuits of the processor 51 and the memory 52 together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 51 is transmitted over a wireless medium via an antenna, which also receives incoming data and passes it to the processor 51.
The processor 51 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 52 may be used to store data used by the processor 51 in performing operations.
The present invention further provides a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may use a different division; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in whole or in part, in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only some of the embodiments of the present invention and is not intended to limit its scope; all equivalent devices or equivalent processes derived from the contents of this specification and the drawings, or applied directly or indirectly in other related technical fields, fall within the scope of the present invention.

Claims (10)

1. A method for age prediction based on speech, comprising:
acquiring voice data of people of different age groups;
constructing a long-short term memory network regression model based on the voice data;
training the constructed long-short term memory network regression model by adopting a long-short term memory network;
and predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice.
2. The speech-based age prediction method of claim 1 wherein the training of the constructed long-short term memory network regression model using the long-short term memory network comprises:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
3. The method of claim 1, wherein the predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice comprises:
extracting Mel cepstrum coefficient features and fundamental frequency features associated with the voice from a person's voice, inputting the extracted Mel cepstrum coefficient features and fundamental frequency features into the trained long-short term memory network regression model to predict the age of the person corresponding to the voice, and obtaining the predicted age through the trained long-short term memory network regression model.
4. The speech-based age prediction method of claim 1, further comprising, after the predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice:
updating the parameters of the long-short term memory network through a cross-entropy loss function and an optimization algorithm, and training and updating the long-short term memory network regression model with the parameter-updated long-short term memory network through iteration of prediction rounds.
5. A speech-based age prediction apparatus, comprising:
the device comprises an acquisition module, a construction module, a training module and a prediction module;
the acquisition module is used for acquiring voice data of people of different age groups;
the construction module is used for constructing a long-short term memory network regression model based on the voice data;
the training module is used for training the constructed long-short term memory network regression model by adopting a long-short term memory network;
and the prediction module is used for predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice.
6. The speech-based age prediction device of claim 5, wherein the training module is specifically configured to:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
7. The speech-based age prediction device of claim 5, wherein the prediction module is specifically configured to:
extracting Mel cepstrum coefficient features and fundamental frequency features associated with the voice from a person's voice, inputting the extracted Mel cepstrum coefficient features and fundamental frequency features into the trained long-short term memory network regression model to predict the age of the person corresponding to the voice, and obtaining the predicted age through the trained long-short term memory network regression model.
8. The speech-based age prediction device of claim 5, further comprising:
an update module;
the updating module is used for updating the parameters of the long-short term memory network through a cross-entropy loss function and an optimization algorithm, and for training and updating the long-short term memory network regression model with the parameter-updated long-short term memory network through iteration of prediction rounds.
9. A speech-based age prediction device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech-based age prediction method of any one of claims 1 to 4.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the speech-based age prediction method of any one of claims 1 to 4.
CN201911234436.6A 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice Pending CN111128235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911234436.6A CN111128235A (en) 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice


Publications (1)

Publication Number Publication Date
CN111128235A true CN111128235A (en) 2020-05-08

Family

ID=70497559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911234436.6A Pending CN111128235A (en) 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice

Country Status (1)

Country Link
CN (1) CN111128235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023281606A1 (en) * 2021-07-05 2023-01-12 日本電信電話株式会社 Learning device, learning method, and learning program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700843A (en) * 2015-02-05 2015-06-10 海信集团有限公司 Method and device for identifying ages
CN107680597A (en) * 2017-10-23 2018-02-09 平安科技(深圳)有限公司 Audio recognition method, device, equipment and computer-readable recording medium
CN108197294A (en) * 2018-01-22 2018-06-22 桂林电子科技大学 A kind of text automatic generation method based on deep learning
CN108847224A (en) * 2018-07-05 2018-11-20 广州势必可赢网络科技有限公司 A kind of sound mural painting plane display method and device
CN109801621A (en) * 2019-03-15 2019-05-24 三峡大学 A kind of audio recognition method based on residual error gating cycle unit
CN109817222A (en) * 2019-01-26 2019-05-28 平安科技(深圳)有限公司 A kind of age recognition methods, device and terminal device



Similar Documents

Publication Publication Date Title
CN109299458B (en) Entity identification method, device, equipment and storage medium
US11017762B2 (en) Method and apparatus for generating text-to-speech model
KR102382499B1 (en) Translation method, target information determination method, related apparatus and storage medium
CN109271631B (en) Word segmentation method, device, equipment and storage medium
EP3133595B1 (en) Speech recognition
US10083169B1 (en) Topic-based sequence modeling neural networks
US10832658B2 (en) Quantized dialog language model for dialog systems
CN102254555B (en) Improving the robustness to environmental changes of a context dependent speech recognizer
CN111210840A (en) Age prediction method, device and equipment
CN109410918B (en) Method and device for acquiring information
CN111009238B (en) Method, device and equipment for recognizing spliced voice
CN110264992A (en) Speech synthesis processing method, device, equipment and storage medium
US11817081B2 (en) Learning device, learning method, learning program, retrieval device, retrieval method, and retrieval program
US11468892B2 (en) Electronic apparatus and method for controlling electronic apparatus
CN111261196A (en) Age estimation method, device and equipment
CN111339309A (en) Corpus expansion method and system for user intention
CN113887627A (en) Noise sample identification method and device, electronic equipment and storage medium
KR20210028041A (en) Electronic device and Method for controlling the electronic device thereof
CN111128235A (en) Age prediction method, device and equipment based on voice
US11830478B2 (en) Learning device, learning method, and learning program for images and sound which uses a similarity matrix
CN111194463A (en) Artificial intelligence system and method for displaying a destination on a mobile device
CN105374351A (en) Methods and apparatus for interpreting received speech data using speech recognition
CN112487813A (en) Named entity recognition method and system, electronic equipment and storage medium
US20210004603A1 (en) Method and apparatus for determining (raw) video materials for news
CN111128234B (en) Spliced voice recognition detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508