CN111128235A - Age prediction method, device and equipment based on voice - Google Patents

Age prediction method, device and equipment based on voice

Info

Publication number
CN111128235A
Authority
CN
China
Prior art keywords
long short-term memory network
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911234436.6A
Other languages
Chinese (zh)
Inventor
陈文敏
李稀敏
肖龙源
蔡振华
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201911234436.6A
Publication of CN111128235A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention discloses an age prediction method, device and equipment based on voice. The method comprises the following steps: acquiring voice data from people of different ages; constructing a long short-term memory network regression model based on the voice data; training the constructed regression model with the long short-term memory network; and predicting a person's age from his or her voice according to the trained model. In this way, a person's age can be predicted from his or her voice.

Description

Age prediction method, device and equipment based on voice
Technical Field
The invention relates to the technical field of age prediction, in particular to an age prediction method, an age prediction device and age prediction equipment based on voice.
Background
Speech is sound produced by the human vocal organs that carries meaning and serves social communication. The human voice generally changes with age.
Existing age prediction schemes generally obtain a facial image of a person and perform face recognition on that image in order to predict the person's age.
However, such schemes cannot predict a person's age from his or her voice.
Disclosure of Invention
In view of the above, the present invention provides a speech-based age prediction method, device and equipment that can predict a person's age from his or her voice.
According to an aspect of the present invention, there is provided a speech-based age prediction method, comprising: acquiring voice data from people of different ages; constructing a long short-term memory network regression model based on the voice data; training the constructed regression model with a long short-term memory network; and predicting a person's age from his or her voice according to the trained regression model.
Training the constructed regression model with the long short-term memory network comprises: labeling each voice sample in the voice data with an age tag for the corresponding age; extracting acoustic features of each sample from the labeled voice data; extracting Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs to the long short-term memory network; and training the constructed regression model with the network that takes these features as its training inputs.
Predicting a person's age from his or her voice according to the trained regression model comprises: extracting the Mel cepstral coefficient features and fundamental frequency features associated with the voice; inputting the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtaining the predicted age from the trained model.
After predicting a person's age according to the trained regression model, the method further comprises: updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and retraining the regression model with the updated network over iterated rounds of prediction.
According to another aspect of the present invention, there is provided a speech-based age prediction device, comprising an acquisition module, a construction module, a training module and a prediction module. The acquisition module is used for acquiring voice data from people of different ages; the construction module is used for constructing a long short-term memory network regression model based on the voice data; the training module is used for training the constructed regression model with a long short-term memory network; and the prediction module is used for predicting a person's age from his or her voice according to the trained regression model.
Wherein the training module is specifically configured to: label each voice sample in the voice data with an age tag for the corresponding age; extract acoustic features of each sample from the labeled voice data; extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs to the long short-term memory network; and train the constructed regression model with the network that takes these features as its training inputs.
Wherein the prediction module is specifically configured to: extract the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice; input the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtain the predicted age from the trained model.
Wherein the speech-based age prediction device further comprises an update module, which is used for updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and for retraining the regression model with the updated network over iterated rounds of prediction.
According to still another aspect of the present invention, there is provided a voice-based age prediction apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of speech-based age prediction as defined in any one of the above.
According to a further aspect of the present invention, there is provided a computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a speech based age prediction method as defined in any one of the above.
It can be seen that, with the above scheme, voice data from people of different age groups can be acquired, a long short-term memory network regression model can be constructed from the voice data and trained with a long short-term memory network, and the trained model can then predict the age corresponding to a person's voice, so that a person's age can be predicted from his or her voice.
Furthermore, with the above scheme, a long short-term memory network can be used to label each voice sample in the voice data with an age tag for the corresponding age, extract acoustic features of each sample from the labeled data, and extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs; the constructed regression model is then trained with the network that takes these features as its training inputs.
Furthermore, with the above scheme, the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice can be extracted and input into the trained long short-term memory network regression model, which yields the predicted age corresponding to that voice.
Furthermore, the above scheme can update the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm and retrain the regression model with the updated network over iterated rounds of prediction, which improves the accuracy of predicting the age corresponding to a person's voice.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for age prediction based on speech according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for predicting age based on speech according to another embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an embodiment of an apparatus for predicting age based on speech according to the present invention;
FIG. 4 is a schematic diagram of another embodiment of the age prediction apparatus based on speech according to the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of the age prediction apparatus based on speech according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be noted that the following embodiments are only illustrative and do not limit the scope of the present invention. Likewise, the following embodiments are only some, not all, embodiments of the present invention, and all other embodiments obtained by those skilled in the art without inventive work fall within the scope of the present invention.
The invention provides a speech-based age prediction method, which can predict a person's age from his or her voice.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a method for predicting age based on speech according to the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:
S101: acquiring voice data from people of different ages.
In this embodiment, the voice data of people of different ages may be acquired all at once, in several batches, or sample by sample, among other possibilities; the invention is not limited in this respect.
In this embodiment, the voice data may come from different people of different ages, or from the same person at different ages; the invention is not limited in this respect.
S102: an LSTM (Long Short-Term Memory) regression model based on the voice data is constructed.
In this embodiment, the constructed long short-term memory network regression model predicts a person's age with a suitable regressor, exploiting the differences in speech characteristics between people of different ages.
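The patent does not specify a network architecture, but the idea behind S102 can be sketched: an LSTM consumes a sequence of per-frame acoustic features, and its final hidden state feeds a linear regression head that outputs an age. The toy NumPy forward pass below is purely illustrative; the `AgeLSTM` name, layer sizes, and random (untrained) weights are assumptions, and a real implementation would use a trained model in a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AgeLSTM:
    """Toy LSTM regressor: per-frame features -> final hidden state -> age."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.normal(0.0, 0.1, n_hidden)  # linear regression head
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def predict(self, frames):
        """frames: (T, n_in) array of per-frame acoustic features."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)          # gate pre-activations
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # The last hidden state summarizes the utterance; map it to one number.
        return float(self.w_out @ h + self.b_out)

model = AgeLSTM(n_in=14, n_hidden=8)
utterance = np.random.default_rng(1).normal(size=(50, 14))  # 50 frames x 14 features
age = model.predict(utterance)  # meaningless until trained, but shows the flow
```

With random weights the output is of course not a real age; the point is only the data flow from a variable-length feature sequence to a single regression output.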
S103: training the constructed long short-term memory network regression model with the long short-term memory network.
Training the regression model with the long short-term memory network may include the following:
the method comprises the steps of marking age tags of corresponding ages of each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting MFCC (Mel-scale Frequency Cepstral Coefficients) features and fundamental Frequency features from the acoustic features as training inputs of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel-scale Cepstral coefficient features and the fundamental Frequency features as the training inputs.
S104: predicting a person's age from his or her voice according to the trained long short-term memory network regression model.
Predicting a person's age according to the trained regression model may include the following:
According to the trained long short-term memory network regression model, the Mel cepstral coefficient features and fundamental frequency features associated with the voice are extracted from a person's voice and input into the trained model, which outputs the predicted age corresponding to that voice.
After predicting a person's age according to the trained regression model, the method may further include the following:
the long-short term memory network is subjected to parameter updating through a loss function of cross entropy loss and an optimization algorithm, and the long-short term memory network regression model is trained and updated through iteration of prediction times by adopting the long-short term memory network subjected to parameter updating, so that the advantage that the accuracy of predicting the human age corresponding to the voice of the human body can be improved.
It can be seen that, in this embodiment, voice data from people of different age groups can be acquired, a long short-term memory network regression model can be constructed from the voice data and trained, and the trained model can then predict the age corresponding to a person's voice, so that a person's age can be predicted from his or her voice.
Further, in this embodiment, a long short-term memory network may be used to label each voice sample in the voice data with an age tag for the corresponding age, extract acoustic features of each sample from the labeled data, and extract Mel cepstral coefficient features and fundamental frequency features from the acoustic features as training inputs; the constructed regression model is then trained with the network that takes these features as its training inputs.
Further, in this embodiment, the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice may be extracted and input into the trained long short-term memory network regression model to predict the corresponding age.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for predicting age based on speech according to another embodiment of the present invention. In this embodiment, the method includes the steps of:
S201: acquiring voice data from people of different ages.
See S101 above; the details are not repeated here.
S202: constructing a long short-term memory network regression model based on the voice data.
See S102 above; the details are not repeated here.
S203: training the constructed long short-term memory network regression model with the long short-term memory network.
See S103 above; the details are not repeated here.
S204: predicting a person's age from his or her voice according to the trained long short-term memory network regression model.
See S104 above; the details are not repeated here.
S205: updating the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and retraining the regression model with the updated network over iterated rounds of prediction.
It can be seen that, in this embodiment, the parameters of the long short-term memory network can be updated with a cross-entropy loss function and an optimization algorithm, and the regression model retrained with the updated network over iterated rounds of prediction, which improves the accuracy of predicting the age corresponding to a person's voice.
The invention also provides a speech-based age prediction device, which can predict a person's age from his or her voice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an age prediction apparatus based on speech according to an embodiment of the present invention. In this embodiment, the age prediction device 30 based on speech includes an obtaining module 31, a constructing module 32, a training module 33, and a predicting module 34.
The acquisition module 31 is configured to acquire voice data from people of different ages.
The construction module 32 is configured to construct a long short-term memory network regression model based on the voice data.
The training module 33 is configured to train the constructed regression model with the long short-term memory network.
The prediction module 34 is configured to predict a person's age from his or her voice according to the trained regression model.
Optionally, the training module 33 may be specifically configured to:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
Optionally, the prediction module 34 may be specifically configured to:
extract the Mel cepstral coefficient features and fundamental frequency features associated with a person's voice; input the extracted features into the trained long short-term memory network regression model to predict the corresponding age; and obtain the predicted age from the trained model.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an age prediction apparatus based on speech according to another embodiment of the present invention. Different from the previous embodiment, the age prediction apparatus 40 based on speech of the present embodiment further includes an update module 41.
The updating module 41 is configured to update the parameters of the long short-term memory network with a cross-entropy loss function and an optimization algorithm, and to retrain the regression model with the updated network over iterated rounds of prediction.
Each module of the speech-based age prediction device 30/40 can execute the corresponding steps of the above method embodiments; the modules are therefore not described in detail here, and reference is made to the description of the corresponding steps above.
The present invention further provides a voice-based age prediction apparatus, as shown in fig. 5, including: at least one processor 51; and a memory 52 communicatively coupled to the at least one processor 51; the memory 52 stores instructions executable by the at least one processor 51, and the instructions are executable by the at least one processor 51 to enable the at least one processor 51 to perform the above-described speech-based age prediction method.
The memory 52 and the processor 51 are connected by a bus, which may comprise any number of interconnected buses and bridges linking the various circuits of the processor 51 and the memory 52 together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 51 is transmitted over a wireless medium via an antenna, which also receives incoming data and passes it to the processor 51.
The processor 51 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory 52 may be used to store data used by the processor 51 in performing operations.
The present invention further provides a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may use a different division; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in whole or in part, in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only some of the embodiments of the present invention and is not intended to limit its scope; all equivalent devices or equivalent processes derived from the contents of this specification and the drawings, or applied directly or indirectly in other related technical fields, fall within the scope of the present invention.

Claims (10)

1. A method for age prediction based on speech, comprising:
acquiring voice data of people of different age groups;
constructing a long-short term memory network regression model based on the voice data;
training the constructed long-short term memory network regression model by adopting a long-short term memory network;
and predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice.
2. The speech-based age prediction method of claim 1 wherein the training of the constructed long-short term memory network regression model using the long-short term memory network comprises:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
3. The method of claim 1, wherein the predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice comprises:
extracting Mel cepstrum coefficient features and fundamental frequency features associated with the voice from a person's voice, inputting the extracted Mel cepstrum coefficient features and fundamental frequency features into the trained long-short term memory network regression model to predict the age of the person corresponding to the voice, and obtaining the predicted age through the trained long-short term memory network regression model.
4. The speech-based age prediction method of claim 1, further comprising, after the predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice:
updating the parameters of the long-short term memory network through a cross-entropy loss function and an optimization algorithm, and training and updating the long-short term memory network regression model with the parameter-updated long-short term memory network through iteration of prediction rounds.
5. A speech-based age prediction apparatus, comprising:
the device comprises an acquisition module, a construction module, a training module and a prediction module;
the acquisition module is used for acquiring voice data of people of different age groups;
the construction module is used for constructing a long-short term memory network regression model based on the voice data;
the training module is used for training the constructed long-short term memory network regression model by adopting a long-short term memory network;
and the prediction module is used for predicting, according to the trained long-short term memory network regression model, the age of the person corresponding to the voice.
6. The speech-based age prediction device of claim 5, wherein the training module is specifically configured to:
the method comprises the steps of marking age tags of corresponding ages on each voice in voice data by adopting a long-short term memory network, extracting acoustic features of each voice from the voice data marked by the age tags, extracting Mel cepstrum coefficient features and fundamental frequency features from the acoustic features to be used as training input of the long-short term memory network, and training the constructed long-short term memory network regression model by adopting the long-short term memory network with the Mel cepstrum coefficient features and the fundamental frequency features as training input.
7. The speech-based age prediction device of claim 5, wherein the prediction module is specifically configured to:
extracting Mel cepstrum coefficient features and fundamental frequency features associated with the voice from a person's voice, inputting the extracted Mel cepstrum coefficient features and fundamental frequency features into the trained long-short term memory network regression model to predict the age of the person corresponding to the voice, and obtaining the predicted age through the trained long-short term memory network regression model.
8. The speech-based age prediction device of claim 5, further comprising:
an update module;
the updating module is used for updating the parameters of the long-short term memory network through a cross-entropy loss function and an optimization algorithm, and for training and updating the long-short term memory network regression model with the parameter-updated long-short term memory network through iteration of prediction rounds.
9. A speech-based age prediction device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the speech-based age prediction method of any one of claims 1 to 4.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the speech-based age prediction method of any one of claims 1 to 4.
CN201911234436.6A 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice Pending CN111128235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911234436.6A CN111128235A (en) 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice


Publications (1)

Publication Number Publication Date
CN111128235A true CN111128235A (en) 2020-05-08

Family

ID=70497559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911234436.6A Pending CN111128235A (en) 2019-12-05 2019-12-05 Age prediction method, device and equipment based on voice

Country Status (1)

Country Link
CN (1) CN111128235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023281606A1 (en) * 2021-07-05 2023-01-12 日本電信電話株式会社 Learning device, learning method, and learning program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700843A (en) * 2015-02-05 2015-06-10 海信集团有限公司 Method and device for identifying ages
CN107680597A (en) * 2017-10-23 2018-02-09 平安科技(深圳)有限公司 Audio recognition method, device, equipment and computer-readable recording medium
CN108197294A (en) * 2018-01-22 2018-06-22 桂林电子科技大学 A kind of text automatic generation method based on deep learning
CN108847224A (en) * 2018-07-05 2018-11-20 广州势必可赢网络科技有限公司 A kind of sound mural painting plane display method and device
CN109801621A (en) * 2019-03-15 2019-05-24 三峡大学 A kind of audio recognition method based on residual error gating cycle unit
CN109817222A (en) * 2019-01-26 2019-05-28 平安科技(深圳)有限公司 A kind of age recognition methods, device and terminal device



Similar Documents

Publication Publication Date Title
CN109299458B (en) Entity identification method, device, equipment and storage medium
US11017762B2 (en) Method and apparatus for generating text-to-speech model
KR102382499B1 (en) Translation method, target information determination method, related apparatus and storage medium
CN109271631B (en) Word segmentation method, device, equipment and storage medium
EP3133595B1 (en) Speech recognition
US10083169B1 (en) Topic-based sequence modeling neural networks
US10832658B2 (en) Quantized dialog language model for dialog systems
CN102254555B (en) Improving the robustness to environmental changes of a context dependent speech recognizer
CN111210840A (en) Age prediction method, device and equipment
CN109410918B (en) Method and device for acquiring information
CN111009238B (en) Method, device and equipment for recognizing spliced voice
CN110264992A (en) Speech synthesis processing method, device, equipment and storage medium
US11817081B2 (en) Learning device, learning method, learning program, retrieval device, retrieval method, and retrieval program
US11468892B2 (en) Electronic apparatus and method for controlling electronic apparatus
CN111261196A (en) Age estimation method, device and equipment
CN111339309A (en) Corpus expansion method and system for user intention
CN113887627A (en) Noise sample identification method and device, electronic equipment and storage medium
KR20210028041A (en) Electronic device and Method for controlling the electronic device thereof
CN111128235A (en) Age prediction method, device and equipment based on voice
US11830478B2 (en) Learning device, learning method, and learning program for images and sound which uses a similarity matrix
CN111194463A (en) Artificial intelligence system and method for displaying a destination on a mobile device
CN105374351A (en) Methods and apparatus for interpreting received speech data using speech recognition
CN112487813A (en) Named entity recognition method and system, electronic equipment and storage medium
US20210004603A1 (en) Method and apparatus for determining (raw) video materials for news
CN111128234B (en) Spliced voice recognition detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508