CN111933148A - Age identification method and device based on convolutional neural network and terminal - Google Patents


Info

Publication number
CN111933148A
CN111933148A
Authority
CN
China
Prior art keywords
age
audio
neural network
audio data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010601537.9A
Other languages
Chinese (zh)
Inventor
叶志坚
李稀敏
肖龙源
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010601537.9A
Publication of CN111933148A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • G10L 17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention provides an age identification method based on a convolutional neural network, which comprises the following steps: collecting audio data of different age groups, and dividing the collected data into n categories according to the age group each recording belongs to; constructing a multi-classification convolutional neural network age classification model; constructing a network structure and performing model training, obtaining a trained model after a number of iterations; inputting a test audio, extracting its audio features, and feeding them into the trained network model for testing; and matching the result according to the information output by the trained network model, judging which age group the result belongs to, and outputting the age-group information. The invention further provides a corresponding age identification device and terminal based on the convolutional neural network. Because the method identifies age from collected audio data, the identification accuracy is high and the identification efficiency is greatly improved.

Description

Age identification method and device based on convolutional neural network and terminal
Technical Field
The invention relates to the technical field of information processing, in particular to an age identification method, device and terminal based on a convolutional neural network.
Background
In daily life, age identification is generally performed by facial recognition; in some specific cases, however, facial information cannot be acquired, so age cannot be identified this way.
In certain tasks and environments, voice information can instead be collected and recognized to obtain valuable information. For example, in a criminal investigation case, little information about a suspect may be available, so the suspect's age group can be identified from the captured audio, narrowing the range the police must examine. Patent application No. 201910076388.6 discloses an age identification device based on a preset neural network, which trains the network model iteratively until the prediction error is smaller than a set threshold.
Existing age identification methods usually need to collect a large amount of data, and variation among the collected samples can interfere with recognition, making the age identification inaccurate.
Disclosure of Invention
In view of the above, it is desirable to provide an age identification method, an age identification device and an age identification terminal based on a convolutional neural network, which have high identification efficiency and accurate identification result, so as to solve the above problems.
The invention provides an age identification method based on a convolutional neural network, which comprises the following steps:
collecting audio data of different age groups, and dividing the collected data into n categories according to the different age groups of the audio data;
constructing a multi-classification convolutional neural network age-group classification model, and classifying the audio data with a suitable classifier according to the differences in pronunciation habits and frequency characteristics across age groups;
training a classification model by using a convolutional neural network, dividing audio data into a training set and a test set, carrying out vad processing on the audio data, carrying out feature extraction, constructing a network structure, carrying out model training, and carrying out iteration for a plurality of times to obtain a trained model;
inputting a test audio, extracting audio features of the test audio, and inputting the audio features into a trained network model for testing;
and outputting age group information, matching the result according to the information output by the trained network model, judging which age group the result belongs to, and outputting the age group information.
Further, the training of the classification model by using the convolutional neural network comprises:
dividing the audio data, taking 80% of all collected audio data as the training set and 20% as the test set;
performing VAD processing on the audio data, removing the silent segments, and cutting the VAD-processed audio into 4 s segments;
extracting features, namely extracting STFT features from the VAD-processed audio data, where 257-dimensional STFT features are used as the low-level acoustic features;
constructing the network structure and performing model training, where the output layer is an n-node softmax layer and a one-hot code represents the age bracket;
and updating the network parameters, where the network uses cross-entropy loss, updates the parameters with the Adam algorithm, and yields a trained model after a number of iterations.
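The data split and the cross-entropy objective described above can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation; the function names, the fixed random seed, and the batch-mean normalisation are assumptions:

```python
import numpy as np

def split_dataset(clips, train_frac=0.8, seed=0):
    """Shuffle the collected clips and split them 80%/20% into train/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(clips))
    cut = int(len(clips) * train_frac)
    return [clips[i] for i in idx[:cut]], [clips[i] for i in idx[cut:]]

def cross_entropy(probs, one_hot_labels):
    """Mean cross-entropy between softmax outputs and one-hot age labels."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(one_hot_labels * np.log(probs + 1e-12)) / len(probs))
```

For a uniform softmax output over four brackets, the loss equals log 4, the usual sanity check for an untrained n-way classifier.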
Further, the network structure specifically includes:
a first layer: DNN layer; a second layer: DNN layer; a third layer: DNN layer; a fourth layer: CNN layer; fifth to seventh layers: CNN layers; an eighth layer: pooling layer; a ninth layer: fully connected layer.
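The nine-layer stack can be sketched as a NumPy forward pass. This is an illustration only: the patent specifies the layer types but not their sizes, so the layer widths (32 DNN units, 16 CNN channels), the kernel width of 3, the ReLU activations, and global average pooling are all assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(x, k):
    """'Valid' 1-D convolution. x: (in_ch, T); k: (out_ch, in_ch, width)."""
    out_ch, in_ch, width = k.shape
    T = x.shape[1] - width + 1
    y = np.zeros((out_ch, T))
    for o in range(out_ch):
        for t in range(T):
            y[o, t] = np.sum(k[o] * x[:, t:t + width])
    return relu(y)

def init_params(n_in=257, n_dnn=32, n_ch=16, n_classes=4, seed=0):
    rng = np.random.default_rng(seed)
    dims = [n_in, n_dnn, n_dnn, n_dnn]
    dnn = [(rng.standard_normal((dims[i], dims[i + 1])) * 0.05,
            np.zeros(dims[i + 1])) for i in range(3)]            # layers 1-3
    chans = [n_dnn, n_ch, n_ch, n_ch, n_ch]
    cnn = [rng.standard_normal((chans[i + 1], chans[i], 3)) * 0.05
           for i in range(4)]                                     # layers 4-7
    fc = (rng.standard_normal((n_classes, n_ch)) * 0.05,
          np.zeros(n_classes))                                    # layer 9
    return {"dnn": dnn, "cnn": cnn, "fc": fc}

def forward(frames, params):
    """frames: (T, 257) STFT features -> softmax over the n age classes."""
    h = frames
    for w, b in params["dnn"]:
        h = relu(h @ w + b)                # layers 1-3: DNN
    h = h.T                                # (channels, time) for the conv stack
    for k in params["cnn"]:
        h = conv1d(h, k)                   # layers 4-7: CNN
    h = h.mean(axis=1)                     # layer 8: pooling (global average)
    logits = params["fc"][0] @ h + params["fc"][1]   # layer 9: fully connected
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # n-node softmax output
```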
Furthermore, dropout operation is added in the process of constructing a network structure and training a model, so that overfitting of the model is prevented.
Further, the acquiring audio data of different age groups comprises:
inputting the audio information;
performing front-end preprocessing, including signal processing and feature extraction;
performing back-end processing on the audio information based on an acoustic model and a language model;
and outputting a voice recognition result.
Further, the feature extraction of the audio information includes:
preprocessing the audio information;
carrying out signal transformation on each frame of audio information to obtain an amplitude spectrum;
adding a Mel filter bank to the magnitude spectrum;
and performing a logarithm operation on the filter-bank output, followed by a discrete cosine transform, to obtain the MFCC features.
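The MFCC pipeline above (pre-emphasis, framing, magnitude spectrum, Mel filter bank, logarithm, DCT) might be sketched as below; the parameter values (16 kHz sampling rate, 512-point FFT, 160-sample hop, 26 Mel filters, 13 coefficients) are typical choices assumed for illustration, not taken from the patent:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    # 1) pre-emphasis boosts the high-frequency part of the signal
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2) framing + Hamming window, then FFT magnitude spectrum per frame
    frames = [sig[i:i + n_fft] * np.hamming(n_fft)
              for i in range(0, len(sig) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(frames, n_fft))          # (T, n_fft//2 + 1)
    # 3) triangular Mel filter bank applied to the magnitude spectrum
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    pts = mel2hz(np.linspace(0, hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4) log of the filter-bank output, then DCT-II -> MFCC features
    logE = np.log(mag @ fb.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logE @ dct.T                               # (T, n_mfcc)
```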
The application also provides an age identification device based on a convolutional neural network, including:
an audio acquisition module for collecting audio data of different age groups;
a classification model construction module for constructing a multi-classification convolutional neural network age classification model;
a classification model training module for training the classification model with a convolutional neural network;
a test audio input module for extracting the audio features of the test audio;
and an age-group information output module that determines which age group the result belongs to and outputs the age-group information.
Further, the audio acquisition module comprises:
the preprocessing module is used for processing signals and extracting features;
and the back-end processing module is used for carrying out back-end processing on the audio information.
The application also provides a terminal device comprising a memory and a processor, characterized in that the processor implements the steps of the above method when executing a computer program stored in the memory.
The present application also proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of the present application.
According to the age identification method based on the convolutional neural network, the audio data are collected and processed so that their features can be extracted, ensuring the consistency of the different recordings and avoiding interference from irrelevant factors; a multi-classification convolutional neural network age classification model is constructed and trained, and after a number of iterations a trained model is obtained, from which the result is judged and the age information output. Compared with the prior art, the method can identify the audio data accurately, and because a convolutional neural network model is adopted, age identification based on audio data is both accurate and efficient.
Drawings
Fig. 1 is a schematic flowchart of an age identification method based on a convolutional neural network according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of model training based on a convolutional neural network in an embodiment of the present invention.
Fig. 3 is a detailed flow chart of audio data acquisition according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of feature extraction of audio information in an embodiment of the present invention.
Fig. 5 is a schematic diagram of signal transformation of audio information in an embodiment of the invention.
Fig. 6 is a block diagram showing a specific configuration of the age identifying apparatus based on the convolutional neural network according to the present invention.
Fig. 7 is a block diagram of an audio capture module in an embodiment of the invention.
Fig. 8 is a block diagram of a detailed structure of a terminal according to an embodiment of the present invention.
Description of the main elements
Terminal 100
Audio acquisition module 110
Preprocessing module 111
Back-end processing module 112
Classification model constructing module 120
Classification model training module 130
Test audio input module 140
Age group information output module 150
Processor 210
Memory 220
RAM 221
Cache 222
Storage system 223
Program module 224
I/O interface 230
Network adapter 240
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be further noted that, for the convenience of description, only some but not all of the matters related to the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts various operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of various operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Referring to fig. 1, the present invention provides an age identification method based on a convolutional neural network, and the method of the present embodiment may be implemented by an age identification apparatus based on a convolutional neural network, which may be implemented in a hardware or software manner and may be generally integrated in a device, such as a server. The method of the embodiment specifically includes:
and S100, collecting audio data of different ages.
In this embodiment, after audio data is collected, the data is divided into n categories according to the age group to which the audio data belongs.
S200, constructing a multi-classification convolutional neural network age classification model.
In this embodiment, the audio data is classified by using a suitable classifier according to the pronunciation habit difference and the frequency variation characteristics of different age groups.
And S300, training a classification model by using a convolutional neural network.
In this embodiment, audio data is first divided into a training set and a test set, vad processing is performed on the audio data, feature extraction is performed, a network structure is constructed, model training is performed, and a trained model is obtained after a plurality of iterations.
And S400, inputting a test audio.
After the test audio is input, its audio features are extracted and fed into the trained network model for testing.
And S500, outputting age group information.
And matching the result according to the information output by the trained network model, judging which age group the result belongs to, and outputting age group information.
According to the age identification method based on the convolutional neural network, provided by the embodiment of the invention, the convolutional neural network classification model is constructed by collecting the audio data, the age group is further identified, the identification result is accurate, and the identification efficiency is high.
Fig. 2 shows a model training diagram based on the convolutional neural network. As shown in fig. 2, the training of the classification model using the convolutional neural network includes:
and S310, dividing the audio data.
In this embodiment, 80% of all collected audio data is taken as the training set and 20% is taken as the test set.
And S320, carrying out vad processing on the audio data.
The silent segments of the audio data are cut off, and the VAD-processed audio is cut into 4 s segments.
It should be noted that VAD processing, i.e., voice endpoint detection, separates silence from actual speech, so that only the actual speech is retained.
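A minimal energy-based sketch of the silence removal and 4 s segmentation might look like this; the patent does not specify the VAD algorithm, so the frame size and the −40 dB threshold relative to the loudest frame are illustrative assumptions:

```python
import numpy as np

def energy_vad(signal, sr=16000, frame=400, thresh_db=-40.0):
    """Drop frames whose energy falls below a threshold relative to the peak frame."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energy = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    keep = energy > energy.max() + thresh_db      # thresh_db is negative
    return frames[keep].reshape(-1)

def segment_4s(signal, sr=16000):
    """Cut the VAD-trimmed audio into non-overlapping 4 s pieces, dropping the tail."""
    step = 4 * sr
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]
```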
And S330, feature extraction.
STFT features are extracted from the VAD-processed audio data, where 257-dimensional STFT features are used as the low-level acoustic features.
In the present embodiment, the short-time Fourier transform (STFT), i.e., a series of windowed Fourier transforms, is used to extract the audio features.
And S340, constructing a network structure and carrying out model training.
In this embodiment, the network structure specifically includes:
first layer: DNN layer; second layer: DNN layer; third layer: DNN layer; fourth layer: CNN layer; fifth to seventh layers: CNN layers; eighth layer: pooling layer; ninth layer: fully connected layer.
Furthermore, a dropout operation is added to the network structure to prevent the model from overfitting.
The output layer is an n-node softmax layer, and one-hot codes represent the age brackets. Illustratively, if the age brackets are ordered 0-5, 5-10, 10-15, 15-20 years old, …, then 0-5 years old is encoded as 1000…, 5-10 years old as 0100…, and 10-15 years old as 0010….
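The one-hot encoding and the matching of the softmax output back to an age bracket can be sketched as follows; the four example brackets mirror the ones listed above, and the helper names are assumptions:

```python
import numpy as np

AGE_BRACKETS = ["0-5", "5-10", "10-15", "15-20"]  # illustrative case of n = 4

def one_hot(bracket, brackets=AGE_BRACKETS):
    """Encode an age bracket as a one-hot vector over the n output nodes."""
    v = np.zeros(len(brackets), dtype=int)
    v[brackets.index(bracket)] = 1
    return v

def decode(softmax_out, brackets=AGE_BRACKETS):
    """Match the network's softmax output to an age bracket via argmax."""
    return brackets[int(np.argmax(softmax_out))]
```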
And S350, updating the network parameters.
In this embodiment, the network adopts a loss function as cross entropy loss, updates network parameters by using an Adam algorithm, and obtains a trained model through a plurality of iterations.
Further, the Adam algorithm updates the network parameters through initialization, iterative processing, weighted-average calculation, bias correction, and weight updating, and a trained model is obtained after a number of iterations.
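The five Adam steps named above (initialization, iterative processing, weighted-average calculation, bias correction, weight updating) map directly onto a short NumPy routine; the hyperparameter values are the common defaults, assumed here for illustration:

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimal Adam loop: initialise the moment estimates, then per iteration
    compute the exponentially weighted averages of the gradient and its square,
    correct their bias, and update the weights."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)                 # initialization: first-moment estimate
    v = np.zeros_like(w)                 # initialization: second-moment estimate
    for t in range(1, steps + 1):        # iterative processing
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g            # weighted average of gradient
        v = beta2 * v + (1 - beta2) * g * g        # weighted average of its square
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # weight updating
    return w
```

Driving it with the gradient of a simple quadratic shows the weights converging toward the minimum.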
In this embodiment, the collected audio data are VAD-processed, their data features extracted, a network structure built for model training, and the network parameters updated with the Adam algorithm, which ensures the distinguishability of the collected audio data and improves the accuracy of age identification.
Fig. 3 is a detailed flow chart of audio data acquisition. As shown in fig. 3, the acquiring audio data of different age groups includes:
inputting the audio information;
performing front-end preprocessing, including signal processing and feature extraction;
performing back-end processing on the audio information based on an acoustic model and a language model;
and outputting a voice recognition result.
In this embodiment, an acoustic model and a language model are combined, integrating acoustic and pronunciation information, and the collected audio data are taken as the initial input to obtain the audio recognition result.
Fig. 4 is a schematic diagram of feature extraction of audio information. Referring to fig. 4, the feature extraction of the audio information includes:
preprocessing the audio information;
carrying out signal transformation on each frame of audio information to obtain an amplitude spectrum;
adding a Mel filter bank to the magnitude spectrum;
and carrying out logarithm operation on the output of the filter, and then carrying out one-step discrete cosine transform to obtain the MFCC characteristics.
In this embodiment, the preprocessing of the audio information is framing, i.e., the speech stream is processed into segments. Pre-emphasis compensates the high-frequency components of the speech signal at the transmitting end, boosting the high-frequency part and reducing the influence of sharp noise.
After the preprocessing, a Fourier transform is applied to the audio information, as shown in fig. 5. In this embodiment, a Fourier transform of each audio frame yields a vector giving the magnitude at each frequency bin. Stacking the vectors of successive frames then yields the magnitude spectrogram.
Further, after the amplitude spectrogram is obtained, a filter bank is added to the amplitude spectrogram, logarithm operation is performed on the output of the filter bank, and dynamic features are further obtained through discrete cosine transform, so that feature vectors are output.
The audio information feature extraction provided by the embodiment can be used for processing the audio information rapidly and efficiently, and further outputting the feature vector, so that the age identification efficiency based on the convolutional neural network is effectively improved.
Fig. 6 is a block diagram showing a specific configuration of the age identifying apparatus based on the convolutional neural network according to the present invention. As shown in fig. 6, the apparatus includes:
the audio acquisition module 110 is used for collecting audio data of different age groups;
a classification model construction module 120, configured to construct a multi-class convolutional neural network age classification model;
a classification model training module 130 for training a classification model by using a convolutional neural network;
a test audio input module 140 for extracting audio features of the test audio;
the age group information output module 150 determines which age group the result belongs to, and outputs age group information.
Further, as shown in fig. 7, the audio capture module includes:
a preprocessing module 111 for performing signal processing and feature extraction;
and a back-end processing module 112, configured to perform back-end processing on the audio information.
The age identification device based on the convolutional neural network provided by the embodiment constructs the convolutional neural network classification model by acquiring audio data, so that the age bracket is further identified, the identification result is accurate, and the identification efficiency is high.
Fig. 8 is a block diagram of a terminal according to an embodiment of the present invention. The terminal 100 shown in fig. 8 is suitable for implementing embodiments of the present invention. The terminal 100 shown in fig. 8 is only an example, and should not bring any limitation to the functions and applicable scope of the embodiments of the present invention.
As shown in fig. 8, the components of terminal 100 may include, but are not limited to: one or more processors 210, and a system memory 220. In the present embodiment, the terminal 100 includes a variety of computer system readable media. Such media may be any available media that is accessible by terminal 100 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 220 may include computer system readable media in the form of volatile memory, such as random access memory (RAM221) and/or cache memory 222. Memory 220 may include at least one program product having a set (e.g., at least one) of program modules 224 that are configured to carry out the functions of embodiments of the invention.
The terminal 100 can communicate with one or more terminals that enable a user to interact with the terminal 100, such communication being via input/output (I/O) interfaces 230. The terminal 100 may also communicate with one or more networks (e.g., a local area network, a wide area network, the internet, etc.) through a network adapter 240.
The processor 210 executes programs stored in the memory 220 to perform various functional applications and data processing, such as the age identification method based on the convolutional neural network provided by the embodiment of the present invention.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the convolutional-neural-network-based age identification method of an embodiment of the present invention. Computer storage media in accordance with embodiments of the present invention may employ any combination of one or more computer-readable media.
The computer readable storage medium of the present embodiments may be an electronic, magnetic, optical, or semiconductor system, apparatus, or device, or any combination thereof. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In embodiments of the present invention, computer program code for the operation of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages. The program code may execute entirely on the computer, partly on the computer, or remotely.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The units or computer means recited in the computer means claims may also be implemented by the same unit or computer means, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An age identification method based on a convolutional neural network, characterized by comprising the following steps:
collecting audio data of different age groups, and dividing the collected data into n categories according to the different age groups of the audio data;
constructing a multi-classification convolutional neural network age-group classification model, and classifying the audio data with a suitable classifier according to the differences in pronunciation habits and frequency characteristics across age groups;
training a classification model by using a convolutional neural network, dividing audio data into a training set and a test set, carrying out vad processing on the audio data, carrying out feature extraction, constructing a network structure, carrying out model training, and carrying out iteration for a plurality of times to obtain a trained model;
inputting a test audio, extracting audio features of the test audio, and inputting the audio features into a trained network model for testing;
and outputting age group information, matching the result according to the information output by the trained network model, judging which age group the result belongs to, and outputting the age group information.
2. The method of claim 1, wherein the training of the classification model using the convolutional neural network comprises:
dividing the audio data, taking 80% of all collected audio data as the training set and 20% as the test set;
performing VAD processing on the audio data, removing the silent segments, and cutting the VAD-processed audio into 4 s segments;
extracting features, namely extracting STFT features from the VAD-processed audio data, where 257-dimensional STFT features are used as the low-level acoustic features;
constructing the network structure and performing model training, where the output layer is an n-node softmax layer and a one-hot code represents the age bracket;
and updating the network parameters, where the network uses cross-entropy loss, updates the parameters with the Adam algorithm, and yields a trained model after a number of iterations.
3. The age identification method based on the convolutional neural network as claimed in claim 2, wherein the network structure specifically comprises:
a first layer: DNN layer; a second layer: DNN layer; a third layer: DNN layer; a fourth layer: CNN layer; fifth to seventh layers: CNN layers; an eighth layer: pooling layer; a ninth layer: fully connected layer.
4. The age identification method based on the convolutional neural network as claimed in claim 2, wherein a dropout operation is added in the process of constructing the network structure and training the model to prevent the model from over-fitting.
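The dropout operation of claim 4 is commonly implemented as "inverted dropout", sketched below under the assumption of a 0.5 drop rate (the claim does not specify one): activations are randomly zeroed during training and the survivors rescaled so that no change is needed at test time.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero activations during training and
    rescale the survivors so the expected activation is unchanged at test time."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate        # keep each unit with prob. 1 - rate
    return x * mask / (1.0 - rate)

x = np.ones((4, 8))
y = dropout(x, rate=0.5)
print(y.shape)  # surviving entries equal 2.0, dropped ones 0.0
```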
5. The method of claim 1, wherein collecting the audio data of different age groups comprises:
inputting the audio information;
performing front-end preprocessing, including signal processing and feature extraction;
performing back-end processing on the audio information based on an acoustic model and a language model;
and outputting a voice recognition result.
6. The age identification method based on the convolutional neural network as claimed in claim 5, wherein the feature extraction of the audio information comprises:
preprocessing the audio information;
carrying out signal transformation on each frame of audio information to obtain an amplitude spectrum;
applying a Mel filter bank to the magnitude spectrum;
and performing a logarithm operation on the outputs of the filter bank, and then performing a discrete cosine transform to obtain the MFCC features.
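The MFCC chain of claim 6 (magnitude spectrum, Mel filter bank, logarithm, DCT) can be sketched for a single frame as follows. The filter count (26), coefficient count (13), FFT size, and sampling rate are conventional defaults assumed here, not values fixed by the claim.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular Mel filters applied to the (n_fft // 2 + 1)-bin magnitude spectrum."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    hz_pts = 700.0 * (10 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(frame, n_coeffs=13, n_fft=512, sr=16000):
    """Magnitude spectrum -> Mel filter bank -> log -> DCT-II -> MFCCs."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft))
    energies = mel_filterbank(n_fft=n_fft, sr=sr) @ spectrum
    log_e = np.log(energies + 1e-10)
    n = len(log_e)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n) + 0.5) / n)
    return dct @ log_e    # the first n_coeffs cepstral coefficients

frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)
print(mfcc(frame).shape)  # -> (13,)
```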
7. An age identifying apparatus based on a convolutional neural network, comprising:
the audio acquisition module is used for acquiring audio data of different age groups;
the classification model building module is used for building a multi-class convolutional neural network age classification model;
the classification model training module is used for training the classification model by using a convolutional neural network;
the test audio input module is used for extracting the audio features of a test audio;
and the age group information output module is used for determining which age group the result belongs to and outputting the age group information.
8. The convolutional neural network-based age identification device of claim 7, wherein the audio acquisition module comprises:
the preprocessing module is used for processing signals and extracting features;
and the back-end processing module is used for carrying out back-end processing on the audio information.
9. A terminal device comprising a memory and a processor, characterized in that the memory stores a computer program and the processor implements the steps of the method of any one of claims 1-6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202010601537.9A 2020-06-29 2020-06-29 Age identification method and device based on convolutional neural network and terminal Pending CN111933148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601537.9A CN111933148A (en) 2020-06-29 2020-06-29 Age identification method and device based on convolutional neural network and terminal

Publications (1)

Publication Number Publication Date
CN111933148A true CN111933148A (en) 2020-11-13

Family

ID=73316394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601537.9A Pending CN111933148A (en) 2020-06-29 2020-06-29 Age identification method and device based on convolutional neural network and terminal

Country Status (1)

Country Link
CN (1) CN111933148A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108281138A (en) * 2017-12-18 2018-07-13 百度在线网络技术(北京)有限公司 Age discrimination model training and intelligent sound exchange method, equipment and storage medium
US20180261213A1 (en) * 2017-03-13 2018-09-13 Baidu Usa Llc Convolutional recurrent neural networks for small-footprint keyword spotting
CN110349588A (en) * 2019-07-16 2019-10-18 重庆理工大学 A kind of LSTM network method for recognizing sound-groove of word-based insertion
CN110534098A (en) * 2019-10-09 2019-12-03 国家电网有限公司客户服务中心 A kind of the speech recognition Enhancement Method and device of age enhancing
CN111179915A (en) * 2019-12-30 2020-05-19 苏州思必驰信息科技有限公司 Age identification method and device based on voice
CN111210840A (en) * 2020-01-02 2020-05-29 厦门快商通科技股份有限公司 Age prediction method, device and equipment
CN111261192A (en) * 2020-01-15 2020-06-09 厦门快商通科技股份有限公司 Audio detection method based on LSTM network, electronic equipment and storage medium
CN111261196A (en) * 2020-01-17 2020-06-09 厦门快商通科技股份有限公司 Age estimation method, device and equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581942A (en) * 2020-12-29 2021-03-30 云从科技集团股份有限公司 Method, system, device and medium for recognizing target object based on voice
CN112651372A (en) * 2020-12-31 2021-04-13 北京眼神智能科技有限公司 Age judgment method and device based on face image, electronic equipment and storage medium
CN113782032A (en) * 2021-09-24 2021-12-10 广东电网有限责任公司 Voiceprint recognition method and related device
CN113782032B (en) * 2021-09-24 2024-02-13 广东电网有限责任公司 Voiceprint recognition method and related device
CN114360148A (en) * 2021-12-06 2022-04-15 深圳市亚略特科技股份有限公司 Automatic selling method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN111179975B (en) Voice endpoint detection method for emotion recognition, electronic device and storage medium
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN111276131B (en) Multi-class acoustic feature integration method and system based on deep neural network
CN111933148A (en) Age identification method and device based on convolutional neural network and terminal
CN108198547B (en) Voice endpoint detection method and device, computer equipment and storage medium
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN110444202B (en) Composite voice recognition method, device, equipment and computer readable storage medium
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
CN111081223A (en) Voice recognition method, device, equipment and storage medium
CN113628612A (en) Voice recognition method and device, electronic equipment and computer readable storage medium
CN115457938A (en) Method, device, storage medium and electronic device for identifying awakening words
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
CN111932056A (en) Customer service quality scoring method and device, computer equipment and storage medium
Dhakal et al. Detection and identification of background sounds to improvise voice interface in critical environments
CN117312548A (en) Multi-source heterogeneous disaster situation data fusion understanding method
CN111785262A (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Therese et al. A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system
CN113129926A (en) Voice emotion recognition model training method, voice emotion recognition method and device
CN116153337B (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
CN117079673B (en) Intelligent emotion recognition method based on multi-mode artificial intelligence
CN112669881B (en) Voice detection method, device, terminal and storage medium
CN109378002B (en) Voiceprint verification method, voiceprint verification device, computer equipment and storage medium
KR102300599B1 (en) Method and Apparatus for Determining Stress in Speech Signal Using Weight
CN113921018A (en) Voiceprint recognition model training method and device and voiceprint recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113