CN111312286A - Age identification method, age identification device, age identification equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111312286A
CN111312286A · CN202010094834.9A
Authority
CN
China
Prior art keywords
voice
age
target
age identification
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010094834.9A
Other languages
Chinese (zh)
Inventor
马坤
赵之砚
施奕明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010094834.9A
Publication of CN111312286A
Priority to PCT/CN2021/071262 (WO2021159902A1)
Legal status: Pending (current)


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 — Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/048 — Activation functions
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18 — Speech or voice analysis techniques where the extracted parameters are spectral information of each sub-band
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 — Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides an age identification method, an age identification device, age identification equipment and a computer-readable storage medium. The method comprises the following steps: acquiring a real voice sample from a preset database, and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample; training with the expanded voice sample to obtain an age identification network model; acquiring a target voice of a target user, and converting the target voice into a corresponding input spectrogram; and extracting the depth feature of the input spectrogram through the age identification network model, and determining the target age group to which the target user belongs according to the depth feature. The invention can improve the accuracy of age identification.

Description

Age identification method, age identification device, age identification equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an age identification method, an age identification device, age identification equipment and a computer readable storage medium.
Background
At present, when a loan company carries out debt collection, in order to improve both the user experience and the effectiveness of collection, the company often identifies the user's age from the user's voice during the call, and then adopts a different collection approach according to that age. Most traditional voice-based age identification methods perform statistical analysis on the phonetic-signal features of speech and then infer the speaker's age; however, because of the limitations of phonetic-signal features, such methods generalize poorly, achieve low recognition accuracy in practical applications, and therefore perform poorly when deployed.
Disclosure of Invention
The invention mainly aims to provide an age identification method, an age identification device, age identification equipment and a computer-readable storage medium, in order to solve the technical problem of low accuracy in traditional age identification.
In order to achieve the above object, an embodiment of the present invention provides an age identifying method, including:
acquiring a real voice sample from a preset database, and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample;
training with the expanded voice sample to obtain an age identification network model;
acquiring a target voice of a target user, and converting the target voice into a corresponding input spectrogram;
and extracting the depth feature of the input spectrogram through the age identification network model, and determining a target age group to which the target user belongs according to the depth feature.
Further, to achieve the above object, an embodiment of the present invention further provides an age identifying apparatus, including:
the sample expansion module is used for acquiring a real voice sample from a preset database and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample;
the model training module is used for training with the expanded voice sample to obtain an age identification network model;
the voice conversion module is used for acquiring a target voice of a target user and converting the target voice into a corresponding input spectrogram;
and the age determining module is used for extracting the depth feature of the input spectrogram through the age identification network model and determining the target age bracket to which the target user belongs according to the depth feature.
Furthermore, in order to achieve the above object, an embodiment of the present invention further provides an age identifying device, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the age identifying method as described above.
Furthermore, to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the age identifying method as described above.
According to the embodiment of the invention, large-scale data samples are obtained through data expansion with a generative adversarial network (GAN); this increases the number of data samples while keeping them consistent with the distribution of real data (i.e., it preserves sample quality). An end-to-end network model is then trained on these sufficiently numerous and sufficiently realistic samples, so that the hidden regularities of the data can be captured more accurately during training, improving the performance of the resulting network model and, in turn, the accuracy of age identification subsequently performed with it. The target voice to be recognized is then converted into a spectrogram, and features are extracted from the spectrogram by the trained network model to obtain the depth features of the target voice. Compared with traditional age recognition based on phonetic-signal features, the depth features capture more information and can attend to age-related characteristics of the target voice that are otherwise hard to recognize. Identifying the target age group of the target user from these depth features therefore grasps the association between age and voice more accurately, improves the generalization ability of age recognition, and improves its accuracy.
Drawings
Fig. 1 is a schematic diagram of a hardware configuration of an age identifying apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an age identifying method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating an age identifying method according to a second embodiment of the present invention;
fig. 4 is a flowchart illustrating an age identifying method according to a third embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The age identification method according to the embodiment of the present invention is mainly applied to an age identification device, which may be a device having a data processing function, such as a server, a Personal Computer (PC), or a notebook computer.
Referring to fig. 1, fig. 1 is a schematic diagram of the hardware structure of an age identifying apparatus according to an embodiment of the present invention. In this embodiment, the age identifying apparatus may include a processor 1001 (e.g., a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used for connection and communication among these components; the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface); the memory 1005 may be a random access memory (RAM) or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in fig. 1 does not limit the present invention, and that more or fewer components than those shown, combinations of some components, or different arrangements of components may be used.
With continued reference to FIG. 1, the memory 1005 of FIG. 1, which is one type of computer-readable storage medium, may include an operating system, a network communication module, and a computer program. In fig. 1, the network communication module may be configured to connect to a preset database, and perform data communication with the database; and the processor 1001 may call a computer program stored in the memory 1005 and perform the age identifying method provided by the embodiment of the present invention.
Based on the hardware architecture, embodiments of the age identification method of the present invention are provided.
The embodiment of the invention provides an age identification method.
Referring to fig. 2, fig. 2 is a flowchart illustrating an age identifying method according to a first embodiment of the present invention.
In this embodiment, the age identification method includes the steps of:
step S10, acquiring a real voice sample from a preset database, and performing sample expansion on the real voice sample based on a generative countermeasure network GAN to obtain an expanded voice sample;
at present, when the loan company hastens, in order to strengthen the user experience and hasten the effect of collecting, often can discern the user age according to the user's pronunciation in the conversation process, then adopt the different mode of hastening the collection to urge the collection according to the user age. Most of the traditional voice age identification methods are based on the voice signal characteristic of voice to perform statistical analysis, so as to determine the age of a speaker; however, due to the limitation of the phonetic signal characteristics, the method has insufficient generalization capability, low recognition accuracy in practical application and poor application effect. In contrast, the embodiment provides an age identification method, which includes obtaining large-scale data samples in a generative confrontation network GAN data expansion mode, enabling the data samples to better conform to distribution of real data (namely, ensuring quality of the samples) while increasing the number of the data samples, and training and constructing an end-to-end network model by using enough and real data samples, so that a hidden rule of the data samples can be more accurately understood in a model training process, performance of the obtained network model is improved, and accuracy of age identification by subsequently using the network model is improved; then the target voice to be recognized is converted into a spectrogram, the spectrogram is subjected to feature extraction through an obtained network model, the depth features of the target voice are obtained, compared with the traditional age recognition based on the signalology features, the depth features comprise more features, the age attribute representation which is difficult to recognize in the target voice can be concerned more, and the target age bracket to which the target user belongs is recognized through the depth features, so that the correlation between the age and 
the voice can be accurately grasped, the generalization capability of the age recognition is improved, and the accuracy of the age recognition is improved.
The age identifying method in this embodiment is implemented by an age identifying device, which may be a server, a personal computer, a notebook computer, or the like. The server in this embodiment may be a server in a collection system, and the server is connected to a preset database, where a plurality of real voice samples collected in advance are stored in the database, and the real voice samples may be in the form of original voice or in the form of spectrogram; the real voice samples comprise corresponding sample annotations, and the annotation content comprises the age bracket of the user to which the real voice samples belong (of course, the annotation content can also comprise other information).
In this embodiment, before performing age identification, an age identification network model for identifying an age needs to be trained and constructed, and the age identification network model is constructed based on a deep neural network of machine learning. Considering that there may be a problem of data imbalance in a real voice sample that can be obtained in practice, and the number and quality of the sample have a large influence on a training result (model capability) of the model, for this reason, in this embodiment, sample expansion needs to be performed on the real voice sample to obtain an expanded voice sample, so as to obtain a large-scale data sample.
In this embodiment, sample expansion can be based on a generative adversarial network (GAN) in order to improve expansion efficiency and to guarantee the quality of the expanded samples. Note that the real voice samples used for expansion should be in the form of spectrograms (containing the three dimensions of time, frequency and amplitude); real voice samples in the form of raw speech are first converted into corresponding spectrograms via the short-time Fourier transform (or other means). A Generative Adversarial Network (GAN) comprises two sub-networks, called the generator G and the discriminator D. G is the network that generates expanded samples: from random noise it produces simulated samples, denoted G(z), whose distribution conforms as closely as possible to that of the real voice samples. D is a discriminating network that judges whether an input sample is real: an output of 1 indicates the input is judged real, and an output of 0 indicates it is judged fake. During training, the goal of G is to generate simulated samples realistic enough to deceive D, while the goal of D is to distinguish the simulated samples produced by G from the real voice samples. In the ideal case, G generates simulated samples G(z) genuine-looking enough that D cannot tell whether they are real, i.e., D(G(z)) = 0.5; once this condition is reached, the fully trained G (i.e., GAN training is complete) is used to perform sample expansion on the real voice samples to obtain the expanded voice samples.
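The adversarial objective just described can be sketched in a few lines of NumPy. This is a toy illustration, not the patent's implementation: the generator and discriminator are single linear layers, the "real samples" are random vectors standing in for spectrograms, and all sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes (not from the patent): 8-dim noise z, 16-dim samples.
Z_DIM, X_DIM = 8, 16
G_W = rng.normal(scale=0.1, size=(Z_DIM, X_DIM))   # generator weights
D_W = rng.normal(scale=0.1, size=(X_DIM, 1))       # discriminator weights

def G(z):
    """Generator: map random noise z to a simulated sample G(z)."""
    return np.tanh(z @ G_W)

def D(x):
    """Discriminator: output near 1 means 'judged real', near 0 'judged fake'."""
    return sigmoid(x @ D_W)

def gan_losses(x_real, z):
    """Standard GAN value function: D maximizes log D(x) + log(1 - D(G(z)));
    G minimizes log(1 - D(G(z))), i.e. tries to push D(G(z)) toward 1."""
    x_fake = G(z)
    d_loss = -np.mean(np.log(D(x_real) + 1e-8) + np.log(1.0 - D(x_fake) + 1e-8))
    g_loss = -np.mean(np.log(D(x_fake) + 1e-8))
    return d_loss, g_loss

x_real = rng.normal(size=(32, X_DIM))  # stand-in for real spectrogram samples
z = rng.normal(size=(32, Z_DIM))
d_loss, g_loss = gan_losses(x_real, z)
```

At the ideal equilibrium described above, D(G(z)) ≈ 0.5 for every generated sample, and the trained G alone is then used to produce the expanded samples.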
Step S20, obtaining an age identification network model through the training of the extended voice sample;
In this embodiment, once the expanded voice samples are obtained, the server trains on them to obtain the age identification network model. For convenience of subsequent processing, the age identification network model can be set up end to end: its input is voice and its output is the age group to which the voice belongs. Compared with the traditional pipeline of extracting features with one model and then classifying with another, the end-to-end form does not require each stage to be labelled separately, which reduces annotation workload while improving the accuracy of age identification. To improve the generalization and recognition accuracy of the model, a deep network can be adopted; for example, the model may be implemented on the basis of the classical deep residual network ResNet50, and when the ResNet50 architecture is adopted, parts of its structure may of course be adjusted to the actual situation.
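The ResNet50 architecture mentioned above is built from residual units, whose defining property is an identity shortcut added back onto the layer output. A minimal sketch of that property (plain NumPy, with illustrative weights rather than anything from the patent):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Basic residual unit: the block learns a correction F(x) = relu(x@W1)@W2
    that is added onto the identity shortcut, y = relu(x + F(x))."""
    return relu(x + relu(x @ W1) @ W2)

rng = np.random.default_rng(1)
d = 64
x = rng.normal(size=(4, d))
W1 = rng.normal(scale=0.05, size=(d, d))
W2 = np.zeros((d, d))  # zero correction: the block passes the input through

y = residual_block(x, W1, W2)
```

With the second weight matrix at zero the block reduces to relu(x), which is the point of the shortcut: residual layers can start near the identity and learn only the needed correction, making very deep networks like ResNet50 trainable.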
Step S30, obtaining a target voice of a target user and converting the target voice into a corresponding input spectrogram;
In this embodiment, once the age identification network model is obtained, the server can use it for voice age identification during the collection process. Specifically, when a certain collection case needs to be pursued, the case information — for example, the borrower of a given loan (the target user) and the contact details — can first be obtained, and a collection call is placed accordingly. When the call is answered, a generic greeting can be played first to confirm the user's identity; when the target user replies by voice over the phone, the server acquires the target voice of the target user and converts it into the corresponding input spectrogram via the short-time Fourier transform for subsequent analysis.
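The short-time Fourier transform step above can be sketched directly: window the signal, transform each frame, and keep the bin amplitudes, yielding the (frequency × time) amplitude spectrogram the model consumes. The sampling rate, frame length, and hop size below are assumptions for illustration; the patent does not specify them.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform: the result
    holds time (columns), frequency (rows) and amplitude (values) -- the
    three dimensions of information mentioned above."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft//2 + 1, n_frames)

fs = 8000                                    # assumed sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
voice = np.sin(2 * np.pi * 220 * t)          # synthetic stand-in for the target voice
spec = spectrogram(voice)
peak_hz = np.argmax(spec.mean(axis=1)) * fs / 256  # frequency bin with the most energy
```

For the synthetic 220 Hz tone, the strongest row of the spectrogram sits at the nearest frequency bin (resolution fs/n_fft = 31.25 Hz here), which is how pitch-related structure ends up visible to the downstream network.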
Step S40, extracting the depth feature of the input spectrogram through the age identification network model, and determining the target age bracket to which the target user belongs according to the depth feature.
In this embodiment, when the input spectrogram is obtained, feature extraction may be performed on the input spectrogram through the age identification network model to obtain corresponding depth features, and then the target age group to which the target user belongs may be determined according to the depth features.
Further, a speech signal has both time-domain and frequency-domain attributes, which are represented correspondingly in the spectrogram and correspond to various features; some of these features are age-related and some are not (e.g., ambient-noise features). To improve the accuracy of age identification, this embodiment may introduce an attention mechanism into the age identification network model when performing feature extraction; that is, an attention module is constructed and embedded into the age identification network model, for example inserted after some intermediate feature layer to perform feature refinement, after which the resulting optimized features undergo subsequent processing (for example, they are fed into the next layer, or taken as the final features). The age identification network model in this embodiment includes an intermediate feature layer and a feature optimization layer, and step S40 includes:
performing original feature extraction on the input spectrogram through a middle feature layer of the age identification network model to obtain corresponding original features;
the age identification network model in the embodiment includes an intermediate feature layer and a feature optimization layer, wherein the intermediate feature layer has the feature extraction function (including convolution, pooling and the like) of a general network intermediate layer, and the feature optimization layer is constructed based on an attention mechanism. When the input spectrogram is obtained, the server can firstly extract the original features of the input spectrogram through the intermediate feature layer of the age identification network model to obtain the corresponding original features. The original features can be considered to include all features of the input spectrogram, but the features are not necessarily related to the age, and if all the features are used for age identification, the accuracy of identification may be affected; meanwhile, the recognition speed is also influenced due to overlarge calculated amount; the present embodiment will also perform feature optimization (define failure) on these original features.
And performing feature optimization on the original features through a feature optimization layer of the age identification network model based on an attention mechanism to obtain corresponding optimization features, and determining the optimization features as the depth features of the input spectrogram.
When the original features are obtained, the server performs feature optimization on them, based on an attention mechanism, through the feature optimization layer of the age identification network model to obtain the corresponding optimized features. Specifically, the original features may be represented as an original feature map, denoted F, and the optimized features as an optimized feature map, denoted F″. When performing feature optimization, the feature optimization layer of the age identification network model first takes the original feature map F, with
F ∈ R^(C×H×W)
where R is the feature-map (spectrogram) space, C is the number of channels, H is the height, and W is the width.
For this F, a corresponding one-dimensional channel attention map, denoted M_C(F), can be computed, with
M_C(F) ∈ R^(C×1×1)
Each channel of F can be regarded as a feature detector, and channel attention mainly focuses on what is meaningful in the input. For efficient computation of channel attention, this embodiment compresses F along the spatial dimensions using max pooling and average pooling respectively, obtaining two different spatial context descriptors, F^c_max and F^c_avg. A shared network consisting of a multi-layer perceptron (MLP) is then applied to these two descriptors to obtain M_C(F) for F, that is,
M_C(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))
where σ is the sigmoid function, W_0 ∈ R^(C/r×C), W_1 ∈ R^(C×C/r), r is the compression ratio, and W_0 is followed by a ReLU activation function.
After the channel attention map M_C(F) is obtained, element-wise multiplication is performed between F and the channel attention map to obtain the corresponding intermediate feature map F′, that is,
F′ = M_C(F) ⊗ F
where ⊗ denotes element-wise multiplication.
When F′ is obtained, a corresponding two-dimensional spatial attention map, denoted M_S(F′), can be computed, with
M_S(F′) ∈ R^(1×H×W)
where H and W are as defined above. Spatial attention is mainly concerned with where the informative locations are. When computing spatial attention, max pooling and average pooling are first applied to F′ along the channel dimension to obtain two different feature descriptors, F′^s_max and F′^s_avg. These two descriptors are then concatenated, and a convolution operation is used to generate M_S(F′) for F′, that is,
M_S(F′) = σ(f^(7×7)([F′^s_avg; F′^s_max]))
where σ is the sigmoid function, f^(7×7) denotes a 7×7 convolution layer, F′^s_max is the descriptor from max pooling F′ along the channel dimension, and F′^s_avg is the descriptor from average pooling F′ along the channel dimension.
When the spatial attention map M_S(F′) is obtained, element-wise multiplication is performed between F′ and M_S(F′) to obtain the corresponding optimized feature map F″, that is,
F″ = M_S(F′) ⊗ F′
In the formula above, F″ can be regarded as the optimized feature, and the server can take this optimized feature as the depth feature of the input spectrogram and perform age identification from it. It is worth noting that in practice the age identification network model may have two or more intermediate feature layers, and the feature optimization layer may be placed after any of them. For example, suppose there are two intermediate feature layers, called the first layer and the second layer in order from the input. The feature optimization layer may sit after the first layer and before the second: its input is then the output of the first layer, its output (the optimized features) becomes the input of the second layer, and the final depth features for age identification are obtained after the second layer. Alternatively, the feature optimization layer may sit after the second layer: its input is then the output of the second layer, and its output (the optimized features) is used directly as the final depth features for age identification.
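The channel-then-spatial procedure above (compute M_C(F), multiply to get F′, compute M_S(F′), multiply to get F″) can be sketched in NumPy as follows. All weights are random placeholders and the tensor sizes are invented; in a real feature optimization layer, W_0, W_1 and the 7×7 kernel would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """M_C(F) = sigmoid(MLP(avg-pooled F) + MLP(max-pooled F)), shape (C, 1, 1)."""
    avg = F.mean(axis=(1, 2))                        # F^c_avg, shape (C,)
    mx = F.max(axis=(1, 2))                          # F^c_max, shape (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)     # shared MLP, ReLU after W0
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(F, kernel):
    """M_S(F') = sigmoid(conv7x7([F'^s_avg; F'^s_max])), shape (1, H, W)."""
    avg = F.mean(axis=0)                             # average-pool over channels
    mx = F.max(axis=0)                               # max-pool over channels
    stacked = np.stack([avg, mx])                    # concatenated descriptors, (2, H, W)
    k = kernel.shape[-1]; p = k // 2
    padded = np.pad(stacked, ((0, 0), (p, p), (p, p)))
    H, W = avg.shape
    out = np.empty((H, W))
    for i in range(H):                               # naive same-padded 7x7 convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]

def feature_optimization(F, W0, W1, kernel):
    Fp = channel_attention(F, W0, W1) * F            # F'  = M_C(F)  (x) F
    return spatial_attention(Fp, kernel) * Fp        # F'' = M_S(F') (x) F'

C, H, W, r = 8, 6, 6, 2                              # toy sizes; r is the compression ratio
F = rng.normal(size=(C, H, W))
W0 = rng.normal(scale=0.1, size=(C // r, C))
W1 = rng.normal(scale=0.1, size=(C, C // r))
kernel = rng.normal(scale=0.1, size=(2, 7, 7))
F2 = feature_optimization(F, W0, W1, kernel)
```

Because both attention maps pass through a sigmoid, every entry lies in (0, 1): the module can only down-weight channels and locations, which is how it suppresses age-irrelevant content such as ambient noise while keeping F″ the same shape as F.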
When the depth feature is obtained, the target age group to which the target user belongs can be determined from it. Since the age identification network model in this embodiment is end to end, age identification is carried out at the model's output layer. In the output layer, the server computes the spatial distance between the depth feature and the sample feature of each expanded voice sample, and takes the expanded voice sample with the smallest spatial distance as the target sample matching the input spectrogram (target voice). The sample annotation of the target sample is then queried to determine the sample age corresponding to the target sample; this sample age can be regarded as the voice age of the target voice, from which the target age group of the target user is determined. Of course, the span of each age group can be set according to the actual situation.
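The output-layer matching just described — smallest spatial distance between the depth feature and the expanded samples' features, then reading off the matched sample's annotated age group — reduces to a nearest-neighbour lookup. The feature vectors and age brackets below are toy values for illustration, not from the patent:

```python
import numpy as np

def predict_age_group(depth_feature, sample_features, sample_age_groups):
    """Pick the expanded sample whose feature is closest (Euclidean distance)
    to the depth feature, and return that sample's annotated age group."""
    dists = np.linalg.norm(sample_features - depth_feature, axis=1)
    return sample_age_groups[int(np.argmin(dists))]

# Toy 3-D sample features; the brackets are illustrative spans.
sample_features = np.array([[0.0, 0.0, 1.0],
                            [1.0, 1.0, 0.0],
                            [0.9, 0.1, 0.1]])
sample_age_groups = ["18-30", "31-45", "46-60"]

group = predict_age_group(np.array([0.95, 0.9, 0.05]),
                          sample_features, sample_age_groups)
```

Here the query feature lies closest to the second sample, so its annotated bracket is returned; in practice the comparison runs over the full set of expanded voice samples and their sample annotations.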
Further, when the target age group of the target user has been determined, the server can query a preset script library according to the target age group to obtain the corresponding target script template. Script templates can be configured and stored in advance by an administrator, with different templates for different age groups. When the server obtains the target script template, it can carry out voice broadcasting according to that template, thereby performing voice-based collection on the target user.
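The script-library query above amounts to a keyed lookup with a fallback for unconfigured age groups. The template texts, bracket keys, and the default key below are hypothetical examples, not content from the patent:

```python
def pick_script_template(age_group, templates, default="standard"):
    """Look up the collection-call script for the identified age group;
    fall back to a default when no template is configured for it."""
    return templates.get(age_group, templates.get(default))

# Hypothetical templates an administrator might configure per age group.
templates = {
    "18-30": "Hi {name}, a quick reminder that your payment is due...",
    "46-60": "Hello {name}, we are calling about your outstanding loan...",
    "standard": "Hello {name}, this is a reminder from your loan provider...",
}

script = pick_script_template("18-30", templates)
```

An age group without its own entry (e.g. "31-45" here) simply receives the default script, so adding age-specific wording is a configuration change rather than a code change.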
In this embodiment, a real voice sample is obtained from a preset database, and sample expansion is performed on the real voice sample based on a generative adversarial network (GAN) to obtain expanded voice samples; an age identification network model is obtained by training on the expanded voice samples; a target voice of a target user is acquired and converted into a corresponding input spectrogram; and the depth feature of the input spectrogram is extracted through the age identification network model, and the target age group to which the target user belongs is determined according to the depth feature. In this manner, a large-scale data sample is obtained through GAN-based data expansion, so that the expanded samples better conform to the distribution of real data (i.e., sample quality is ensured) while their number is increased. An end-to-end network model is then trained on these sufficiently numerous and sufficiently realistic samples, so that the hidden regularities of the data can be captured more accurately during training, improving the performance of the resulting network model and hence the accuracy of subsequent age identification. The target voice to be recognized is then converted into a spectrogram, and feature extraction is performed on the spectrogram by the trained network model to obtain the depth feature of the target voice. Compared with traditional age identification based on signal-processing features, the depth feature contains richer information and can attend to age-related attributes in the target voice that are otherwise difficult to capture; identifying the target age group of the target user through the depth feature therefore grasps the correlation between age and voice more accurately, improves the generalization capability of age identification, and improves its accuracy.
Based on the embodiment shown in fig. 2, a second embodiment of the age identifying method of the present invention is provided.
Referring to fig. 3, fig. 3 is a flowchart illustrating an age identifying method according to a second embodiment of the present invention.
In this embodiment, the step S30 includes:
Step S31, acquiring a target voice of a target user, and judging whether the voice duration of the target voice is greater than a preset duration threshold;
if a long target voice is directly subjected to age identification processing, the computation required by the model may be excessive. In this embodiment, therefore, a voice with a long duration may be cut into multiple segments and age identification performed on each segment, which reduces the amount of computation and also improves the accuracy of age identification. Specifically, when the server obtains the target voice of the target user, it may first judge whether the voice duration of the target voice is greater than a preset duration threshold, where the threshold may be set according to actual needs.
Step S32, if the voice duration is greater than the preset duration threshold, performing voice cutting on the target voice to obtain two or more voice segments, and respectively converting each voice segment into a corresponding segment spectrogram;
in this embodiment, if the voice duration of the target voice is greater than the preset duration threshold, the server cuts the target voice into two or more voice segments and converts each voice segment into a corresponding segment spectrogram. Different rules for the duration of each segment may be defined according to actual conditions. For example, different voice durations may correspond to different numbers of segments: if the preset duration threshold is 3 seconds, a duration greater than 3 seconds and not greater than 4 seconds may correspond to 2 segments, and a duration greater than 4 seconds to 3 segments; the target voice is then cut evenly according to the determined number of segments so that all segments have the same duration. As another example, the target voice may be cut every preset slice length: with a threshold of 3 seconds, the target voice is cut every 3 seconds, so a 5-second voice is cut into two segments of 3 seconds and 2 seconds respectively. Of course, other cutting methods are also possible in practice. It should be noted that if the voice duration of the target voice is less than or equal to the preset duration threshold, the whole target voice may be directly converted into the corresponding input spectrogram and the age identification processing of step S40 executed.
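The second cutting rule above (cut every preset slice length) can be sketched as follows; durations are in seconds, and the function returns only the segment durations rather than audio data:

```python
def cut_voice(duration, threshold=3.0, slice_len=3.0):
    """Cut a voice of `duration` seconds into segment durations.

    If the duration exceeds the preset threshold, the voice is cut every
    `slice_len` seconds; otherwise it is kept whole."""
    if duration <= threshold:
        return [duration]           # short voice: convert whole, no cutting
    segments = []
    remaining = duration
    while remaining > 0:
        segments.append(min(slice_len, remaining))
        remaining -= slice_len
    return segments
```

With the patent's example values, a 5-second voice yields segments of 3 and 2 seconds, while a 2.5-second voice is left uncut.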
The step S40 includes:
Step S41, respectively extracting the depth features of each segment spectrogram through the age identification network model, and respectively determining the segment age group corresponding to each segment spectrogram according to its depth features;
in this embodiment, when each segment spectrogram is obtained, its depth features may be extracted through the age identification network model, and the segment age group corresponding to each segment spectrogram determined according to those depth features. The feature extraction and segment age group determination for each segment spectrogram are as described in step S40 and are not repeated here.
Step S42, determining a target age group to which the target user belongs according to the segment age groups corresponding to the segment spectrograms.
In this embodiment, when the segment age groups corresponding to the segment spectrograms are determined, the target age group to which the target user belongs can be determined. If the segment age groups corresponding to all segment spectrograms are the same, that common age group is determined as the target age group of the target user; if they differ, a voting decision rule may be defined according to the actual situation, and the target age group determined according to that rule and the determined segment age groups. For example, the voting decision rule may be a median-mean approach: if the target voice corresponds to three segment spectrograms whose segment age groups are 22 to 24, 26 to 28, and 28 to 30, the medians 23, 27, and 29 of the three age groups may be taken, and their mean, about 26.3, taken as the target age of the target user. The rule may also be a majority vote: if the target voice corresponds to three segment spectrograms, two with the age group 22 to 24 and one with 26 to 28, then the age group 22 to 24, which corresponds to the largest number of segment spectrograms, may be determined as the target age group. Of course, other voting decision rules may be adopted, such as weighting each segment's age group by its voice duration to compute a confidence for each age group, and taking the age group with the highest confidence as the target age group of the target user.
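The two example voting decision rules can be sketched as:

```python
from collections import Counter

def median_mean_age(age_groups):
    """Median-mean rule: mean of each segment age group's midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in age_groups]
    return sum(mids) / len(mids)

def majority_age_group(age_groups):
    """Majority rule: the age group backed by the most segments."""
    return Counter(age_groups).most_common(1)[0][0]

# The worked examples from the text:
est = median_mean_age([(22, 24), (26, 28), (28, 30)])   # medians 23, 27, 29
grp = majority_age_group([(22, 24), (22, 24), (26, 28)])
```

The first call reproduces the text's estimate of about 26.3 years; the second returns the 22-to-24 bracket chosen by two of the three segments.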
In this embodiment, a voice with a longer duration is cut into multiple segments and the age of each segment is recognized separately, which reduces the amount of computation and improves the efficiency of age identification. Moreover, because the target age group of the target user is determined from the recognition results of all segments, recognition errors caused by accidental factors in any single segment are reduced, improving the accuracy of age identification.
Based on the embodiment shown in fig. 2, a third embodiment of the age identifying method of the present invention is provided.
Referring to fig. 4, fig. 4 is a flowchart illustrating an age identifying method according to a third embodiment of the present invention.
In this embodiment, after step S30, the method further includes:
step S50, when receiving the order of urging to accept, dialing urging to accept the call according to the order of urging to accept, and obtaining the corresponding call-in voice after the call is connected;
in this embodiment, when the server receives the collection instruction, it may obtain the corresponding collection item information, such as the borrower (target user) of a certain loan and the contact information, then dial a collection call according to the collection item information, and acquire the other party's connection voice after the call is connected. The collection instruction may be triggered by an administrator from a terminal, or a collection plan may be stored on the server and the instruction triggered automatically when the time set by the plan is reached.
Step S60, determining whether or not there are two or more user voices in the connection voice;
when the connection voice is obtained, the server judges whether two or more user voices exist in the connection voice.
Step S70, if there are two or more user voices in the connection voice, determining the target voice of the target user among the user voices according to the voice duration and/or voice volume of each user voice.
In this embodiment, when the target user answers the call, he or she may be in a noisy environment or talking with someone else, in which case the connection voice acquired by the server may contain two or more user voices. If so, the server needs to determine the target voice of the target user from the connection voice in order to identify the target user's age accurately. Specifically, the server can separate the user voices according to frequency, obtain the voice attributes of each user voice, including voice frequency, voice duration, and voice volume, and then determine the target voice according to these attributes. For example, in practice the target user generally holds the telephone throughout the call, so the user voice with the longest duration may be determined as the target voice; alternatively, the target user is the person closest to the telephone, so his or her voice is relatively the loudest, and the user voice with the largest volume may be determined as the target voice. The two factors can also be combined: for each user voice, a duration score and a volume score are obtained from its voice duration and voice volume, the two scores are added to obtain a composite score, and the user voice with the highest composite score is determined as the target voice. It should be noted that if only one user voice exists in the connection voice, that voice may be used directly as the target voice. Once the target voice of the target user is determined, the age identification processing of steps S30 and S40 above may be performed.
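The combined duration-plus-volume scoring described above might look like the following sketch; the equal weighting of the two normalised scores is an illustrative choice, since the patent leaves the exact scoring rule open:

```python
def pick_target_voice(voices):
    """Select the target voice from multiple user voices.

    `voices` maps a speaker id to (duration_seconds, volume). Each voice
    gets a duration score and a volume score normalised against the
    maxima observed in the call, and the highest combined score wins."""
    max_dur = max(d for d, _ in voices.values())
    max_vol = max(v for _, v in voices.values())

    def score(dv):
        d, v = dv
        return d / max_dur + v / max_vol   # equal weighting (assumption)

    return max(voices, key=lambda k: score(voices[k]))

# Speaker A talks longest and is nearly the loudest, so A wins.
voices = {"A": (12.0, 0.8), "B": (4.0, 0.9), "C": (6.0, 0.3)}
target = pick_target_voice(voices)
```

Other weightings (e.g. favouring volume when durations are close) are equally possible; only the winner of the composite score changes.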
In this way, when the server of this embodiment dials a collection call, if two or more user voices exist in the connection voice, it first determines the target voice from among them before performing subsequent age identification processing. This avoids the misrecognition that could result from performing age identification on multiple user voices, and improves the accuracy of age identification.
Based on the embodiment shown in fig. 2, a fourth embodiment of the age identifying method of the present invention is provided.
In this embodiment, after step S40, the method further includes:
acquiring historical target age groups of a preset number of historical target users, or of historical target users within a preset period, and obtaining a collection age distribution according to the historical target age groups;
in this embodiment, the server may further store the current target age group to which the target user belongs when the target age group is obtained. When the historical target age groups of a preset number of historical target users, or of historical target users within a preset period, have been collected, they can be aggregated and counted to obtain the corresponding collection age distribution. For example, of 100 historical target users, 30 belong to the age group 26 to 28 and 70 to the age group 30 to 32; or, in the collection cycle of the last month, there are 100 historical target users, 30 of whom belong to the age group 26 to 28 and 70 to the age group 30 to 32.
Judging whether an age group with an abnormal number of users exists according to the collection age distribution, and if so, sending corresponding abnormality prompt information to the corresponding management terminal.
In this embodiment, when the collection age distribution is obtained, the server may judge whether there is an age group with an abnormal number of users according to the distribution. The judgment may be performed against a preset abnormality rule; for example, when the number or proportion of users in a certain age group exceeds an abnormality threshold, the number of users in that age group is judged abnormal. If such an age group exists, there may be a certain risk in that age group for the current loan business, or the recognition capability of the age identification network model may have degraded so that too many users are identified as belonging to that age group. In that case, the server may send corresponding abnormality prompt information to the corresponding management terminal to prompt the relevant administrators to inspect and handle the situation in time.
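One possible form of the abnormality rule (flagging an age group whose share of users exceeds a preset ratio) can be sketched as follows; the 60% threshold is an illustrative value, not one given in the patent:

```python
def abnormal_age_groups(distribution, total, ratio_threshold=0.6):
    """Flag age groups whose user share exceeds a preset ratio threshold.

    `distribution` maps an age group (lo, hi) to its user count."""
    return [g for g, n in distribution.items()
            if n / total > ratio_threshold]

# The example distribution from the text: 30 users aged 26-28, 70 aged 30-32.
dist = {(26, 28): 30, (30, 32): 70}
flagged = abnormal_age_groups(dist, total=100)
```

Here the 30-to-32 group holds 70% of users, above the 60% threshold, so it alone is flagged for an abnormality prompt.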
In this way, the server can analyze the collection age distribution of historical target users and determine whether abnormal conditions exist, so that abnormalities can be found in time, reducing business risk and maintaining the stability of the age identification network model.
In addition, the embodiment of the invention also provides an age identification device.
In this embodiment, the age identifying apparatus includes:
the sample expansion module is used for acquiring a real voice sample from a preset database and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample;
the model training module is used for obtaining an age identification network model through training on the expanded voice sample;
the voice conversion module is used for acquiring a target voice of a target user and converting the target voice into a corresponding input spectrogram;
and the age determining module is used for extracting the depth feature of the input spectrogram through the age identification network model and determining the target age bracket to which the target user belongs according to the depth feature.
Each virtual function module of the age identification apparatus is stored in the memory 1005 of the age identification device shown in fig. 1 as part of a computer program; when executed by the processor 1001, the modules perform the age identification functions.
Further, the age identification network model comprises an intermediate feature layer and a feature optimization layer, and the age determination module comprises:
the feature extraction unit is used for performing original feature extraction on the input spectrogram through an intermediate feature layer of the age identification network model to obtain corresponding original features;
and the feature optimization unit is used for performing feature optimization on the original features through a feature optimization layer of the age identification network model based on an attention mechanism to obtain corresponding optimized features, and determining the optimized features as the depth features of the input spectrogram.
Further, the original features comprise an original feature map F, and the optimized features comprise an optimized feature map F″. The feature optimization unit is further configured to obtain the original feature map F of the original features through the feature optimization layer of the age identification network model; calculate a one-dimensional channel attention map corresponding to F; perform element-wise multiplication of F and the channel attention map to obtain a corresponding intermediate feature map F′; calculate a two-dimensional spatial attention map corresponding to F′; and perform element-wise multiplication of F′ and the spatial attention map to obtain the corresponding optimized feature map F″.
Further, the voice conversion module includes:
the time length judging unit is used for acquiring the target voice of the target user and judging whether the voice time length of the target voice is greater than a preset time length threshold value or not;
the voice segmentation unit is used for performing voice cutting on the target voice to obtain two or more voice segments and respectively converting each voice segment into a corresponding segment spectrogram if the voice duration is greater than the preset duration threshold;
the age determining module is further configured to respectively extract the depth features of each segment spectrogram through the age identification network model, and respectively determine the segment age group corresponding to each segment spectrogram according to its depth features; and to determine the target age group to which the target user belongs according to the segment age groups corresponding to the segment spectrograms.
Further, the age identifying apparatus further includes:
the voice acquisition module is used for dialing a collection call according to a collection instruction when the collection instruction is received, and acquiring the corresponding connection voice after the call is connected;
the voice judging module is used for judging whether two or more user voices exist in the connection voice;
and the voice determining module is used for determining the target voice of the target user among the user voices according to the voice duration and/or voice volume of each user voice if two or more user voices exist in the connection voice.
Further, the age identifying apparatus further includes:
and the voice collection module is used for acquiring a corresponding target script template according to the target age group and performing voice collection prompting on the target user according to the target script template.
Further, the age identifying apparatus further includes:
the distribution acquisition module is used for acquiring historical target age groups of a preset number of historical target users, or of historical target users within a preset period, and obtaining a collection age distribution according to the historical target age groups;
and the abnormality judgment module is used for judging whether an age group with an abnormal number of users exists according to the collection age distribution, and if so, sending corresponding abnormality prompt information to the corresponding management terminal.
The function implementation of each module in the age identification device corresponds to each step in the age identification method embodiment, and the function and implementation process are not described in detail herein.
In addition, the embodiment of the invention also provides a computer readable storage medium.
The computer-readable storage medium of the invention has stored thereon a computer program which, when executed by a processor, carries out the steps of the age identification method as described above.
The method implemented when the computer program is executed may refer to various embodiments of the age identifying method of the present invention, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An age identification method, comprising:
acquiring a real voice sample from a preset database, and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample;
training through the expanded voice sample to obtain an age identification network model;
acquiring a target voice of a target user, and converting the target voice into a corresponding input spectrogram;
and extracting the depth feature of the input spectrogram through the age identification network model, and determining a target age group to which the target user belongs according to the depth feature.
2. The age identification method of claim 1, wherein the age identification network model comprises an intermediate feature layer and a feature optimization layer, and the step of extracting the depth features of the input spectrogram through the age identification network model comprises:
performing original feature extraction on the input spectrogram through a middle feature layer of the age identification network model to obtain corresponding original features;
and performing feature optimization on the original features through a feature optimization layer of the age identification network model based on an attention mechanism to obtain corresponding optimized features, and determining the optimized features as the depth features of the input spectrogram.
3. The age identification method of claim 2, wherein the raw features comprise a raw feature map F, the optimized features comprise an optimized feature map F ",
the step of performing feature optimization on the original features through the feature optimization layer of the age identification network model based on an attention mechanism to obtain corresponding optimized features comprises:
acquiring an original feature map F of the original features through a feature optimization layer of the age identification network model;
calculating a one-dimensional channel attention map corresponding to the F;
performing element-wise multiplication of the F and the channel attention map to obtain a corresponding intermediate feature map F′;
calculating a two-dimensional spatial attention map corresponding to the F′;
and performing element-wise multiplication of the F′ and the spatial attention map to obtain a corresponding optimized feature map F″.
4. The age identification method of claim 1, wherein the step of acquiring a target voice of a target user and converting the target voice into a corresponding input spectrogram comprises:
acquiring a target voice of a target user, and judging whether the voice time of the target voice is greater than a preset time threshold;
if the voice duration is greater than the preset duration threshold, performing voice cutting on the target voice to obtain two or more voice segments, and respectively converting each voice segment into a corresponding segment spectrogram;
the step of extracting the depth feature of the input spectrogram through the age identification network model and determining the target age bracket to which the target user belongs according to the depth feature comprises the following steps:
respectively extracting the depth features of each segment spectrogram through the age identification network model, and respectively determining the segment age group corresponding to each segment spectrogram according to its depth features;
and determining the target age range to which the target user belongs according to the segment age range corresponding to each segment spectrogram.
5. The age identification method as claimed in claim 1, wherein the step of obtaining the target voice of the target user and converting the target voice into the corresponding input spectrogram is preceded by the steps of:
when a collection instruction is received, dialing a collection call according to the collection instruction, and acquiring the corresponding connection voice after the call is connected;
judging whether two or more user voices exist in the connection voice;
and if two or more user voices exist in the connection voice, determining the target voice of the target user among the user voices according to the voice duration and/or voice volume of each user voice.
6. The age identification method of claim 1, wherein after the step of extracting the depth features of the input spectrogram through the age identification network model and determining the target age group to which the target user belongs according to the depth features, the method further comprises:
and acquiring a corresponding target script template according to the target age group, and performing voice collection prompting on the target user according to the target script template.
7. The age identification method according to any one of claims 1 to 6, wherein after the step of extracting the depth features of the input spectrogram through the age identification network model and determining the target age group to which the target user belongs according to the depth features, the method further comprises:
acquiring historical target age groups of a preset number of historical target users, or of historical target users within a preset period, and obtaining a collection age distribution according to the historical target age groups;
and judging whether an age group with an abnormal number of users exists according to the collection age distribution, and if so, sending corresponding abnormality prompt information to a corresponding management terminal.
8. An age identifying device, characterized in that the age identifying device comprises:
the sample expansion module is used for acquiring a real voice sample from a preset database and performing sample expansion on the real voice sample based on a generative adversarial network (GAN) to obtain an expanded voice sample;
the model training module is used for obtaining an age identification network model through training on the expanded voice sample;
the voice conversion module is used for acquiring a target voice of a target user and converting the target voice into a corresponding input spectrogram;
and the age determining module is used for extracting the depth feature of the input spectrogram through the age identification network model and determining the target age bracket to which the target user belongs according to the depth feature.
9. An age identification device, characterized in that the age identification device comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the age identification method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, carries out the steps of the age identification method according to any one of claims 1 to 7.
CN202010094834.9A 2020-02-12 2020-02-12 Age identification method, age identification device, age identification equipment and computer readable storage medium Pending CN111312286A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010094834.9A CN111312286A (en) 2020-02-12 2020-02-12 Age identification method, age identification device, age identification equipment and computer readable storage medium
PCT/CN2021/071262 WO2021159902A1 (en) 2020-02-12 2021-01-12 Age recognition method, apparatus and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111312286A true CN111312286A (en) 2020-06-19

Family

ID=71150902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010094834.9A Pending CN111312286A (en) 2020-02-12 2020-02-12 Age identification method, age identification device, age identification equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111312286A (en)
WO (1) WO2021159902A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021159902A1 (en) * 2020-02-12 2021-08-19 Shenzhen OneConnect Smart Technology Co., Ltd. Age recognition method, apparatus and device, and computer-readable storage medium
CN114708872A (en) * 2022-03-22 2022-07-05 Qingdao Haier Technology Co., Ltd. Voice instruction response method and device, storage medium and electronic device

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114760523A (en) * 2022-03-30 2022-07-15 MIGU Digital Media Co., Ltd. Audio and video processing method, device, equipment and storage medium

Citations (9)

Publication number Priority date Publication date Assignee Title
WO2016149881A1 (en) * 2015-03-20 2016-09-29 Intel Corporation Object recogntion based on boosting binary convolutional neural network features
KR101809511B1 (en) * 2016-08-04 2017-12-15 Industry-Academic Cooperation Foundation of Sejong University Apparatus and method for age group recognition of speaker
CN108924218A (en) * 2018-06-29 2018-11-30 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for pushing information
CN109299701A (en) * 2018-10-15 2019-02-01 Nanjing University of Information Science and Technology Face age estimation method based on GAN-augmented multi-ethnic feature collaborative selection
CN110136726A (en) * 2019-06-20 2019-08-16 Xiamen Meiya Pico Information Co., Ltd. Voice gender estimation method, apparatus, system and storage medium
WO2019222996A1 (en) * 2018-05-25 2019-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for voice recognition
WO2019225801A1 (en) * 2018-05-23 2019-11-28 Korea Advanced Institute of Science and Technology (KAIST) Method and system for simultaneously recognizing emotion, age, and gender on basis of voice signal of user
CN110556129A (en) * 2019-09-09 2019-12-10 Peking University Shenzhen Graduate School Bimodal emotion recognition model training method and bimodal emotion recognition method
CN110619889A (en) * 2019-09-19 2019-12-27 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Vital sign data recognition method and device, electronic equipment and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
KR20080090034A (en) * 2007-04-03 Samsung Electronics Co., Ltd. Voice speaker recognition method and apparatus
CN103310788B (en) * 2013-05-23 2016-03-16 Beijing Unisound Information Technology Co., Ltd. Voice information recognition method and system
KR102410914B1 (en) * 2015-07-16 2022-06-17 Samsung Electronics Co., Ltd. Modeling apparatus for voice recognition and method and apparatus for voice recognition
CN108922518B (en) * 2018-07-18 2020-10-23 Suzhou AISpeech Information Technology Co., Ltd. Voice data amplification method and system
CN109147810B (en) * 2018-09-30 2019-11-26 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, device and computer storage medium for establishing a speech enhancement network
CN109559736B (en) * 2018-12-05 2022-03-08 China Jiliang University Automatic dubbing method for movie actors based on adversarial networks
CN111312286A (en) * 2020-02-12 2020-06-19 Shenzhen OneConnect Smart Technology Co., Ltd. Age identification method, age identification device, age identification equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2021159902A1 (en) 2021-08-19

Similar Documents

Publication Publication Date Title
CN109658923B (en) Speech quality inspection method, equipment, storage medium and device based on artificial intelligence
CN104732978B (en) The relevant method for distinguishing speek person of text based on combined depth study
CN111312286A (en) Age identification method, age identification device, age identification equipment and computer readable storage medium
CN110110038B (en) Telephone traffic prediction method, device, server and storage medium
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN110556126A (en) Voice recognition method and device and computer equipment
CN111816185A (en) Method and device for identifying speaker in mixed voice
CN115394318A (en) Audio detection method and device
CN114495217A (en) Scene analysis method, device and system based on natural language and expression analysis
CN110797046B (en) Method and device for establishing prediction model of voice quality MOS value
CN111523317A (en) Voice quality inspection method and device, electronic equipment and medium
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
CN108717851A (en) A kind of audio recognition method and device
CN113409774A (en) Voice recognition method and device and electronic equipment
CN110931020B (en) Voice detection method and device
CN114218428A (en) Audio data clustering method, device, equipment and storage medium
CN111341304A (en) Method, device and equipment for training speech characteristics of speaker based on GAN
CN113593525A (en) Method, device and storage medium for training accent classification model and accent classification
CN112489678A (en) Scene recognition method and device based on channel characteristics
CN111639549A (en) Method and device for determining service satisfaction degree and electronic equipment
CN106971725B (en) Voiceprint recognition method and system with priority
CN110689875A (en) Language identification method and device and readable storage medium
CN113409763B (en) Voice correction method and device and electronic equipment
CN111179942B (en) Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and computer readable storage medium
CN108744498A (en) A kind of virtual game quick start method based on double VR equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination