CN109272993A - Speech category recognition method and apparatus, computer device, and storage medium - Google Patents
Speech category recognition method and apparatus, computer device, and storage medium
- Publication number
- CN109272993A CN109272993A CN201810956681.7A CN201810956681A CN109272993A CN 109272993 A CN109272993 A CN 109272993A CN 201810956681 A CN201810956681 A CN 201810956681A CN 109272993 A CN109272993 A CN 109272993A
- Authority
- CN
- China
- Prior art keywords
- classification
- speech information
- spectrogram
- speech
- speech classification model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Abstract
This application relates to the field of speech recognition and provides a speech category recognition method and apparatus, a computer device, and a storage medium. The method comprises: obtaining first speech information to be recognized, and converting the first speech information into a first spectrogram; inputting the first spectrogram into a preset speech classification model to obtain a classification result for the first spectrogram, and taking the classification result as the category of the first speech information; wherein the speech classification model is trained on a deep convolutional neural network using spectrograms of known emotion or personality categories. The speech category recognition method and apparatus, computer device, and storage medium provided herein help improve the classification of emotion and personality in speech information.
Description
Technical field
This application relates to the technical field of speech recognition, and in particular to a speech category recognition method and apparatus, a computer device, and a storage medium.
Background art
At present, research on speech emotion and personality recognition focuses mainly on acoustic feature extraction. In existing speech emotion recognition technology, the acoustic features used for recognition include prosodic features, voice quality features, spectrum-related features, and fused features selected from the above. These features are largely confined to either the time domain or the frequency domain alone; for speech signals whose time-domain and frequency-domain characteristics vary jointly, such features tend to lose part of the characteristic information, which degrades emotion and personality recognition. Moreover, during extraction these acoustic features are affected by factors unrelated to emotion or personality, such as the speech content, the speaker, and the environment. When such irrelevant factors are embedded in the extracted acoustic features, they also greatly degrade the emotion and personality classification results.
Summary of the invention
The main purpose of this application is to provide a speech category recognition method and apparatus, a computer device, and a storage medium that improve the emotion and personality classification of speech information.
To achieve the above object, this application provides a speech category recognition method, comprising the following steps:
obtaining first speech information to be recognized, and converting the first speech information into a first spectrogram;
inputting the first spectrogram into a preset speech classification model to obtain a classification result for the first spectrogram, and taking the classification result as the category of the first speech information; wherein the speech classification model is trained on a deep convolutional neural network using spectrograms of known emotion or personality categories, and the category of the first speech information is an emotion category or a personality category.
Further, the step of converting the first speech information into the first spectrogram includes:
converting the first speech information into the corresponding first spectrogram by Fourier analysis.
Further, before the step of obtaining the first speech information to be recognized and converting it into the first spectrogram, the method includes:
inputting the training spectrograms in a training set into the deep convolutional neural network for training, to obtain the speech classification model.
Further, after the step of inputting the training spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
inputting the test spectrograms in a test set into the speech classification model to output corresponding classification results, and verifying whether the classification results match the known categories of the test spectrograms in the test set.
Further, before the step of inputting the spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
converting the second speech information of each known category into corresponding second spectrograms, and allocating the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
Further, after the step of inputting the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram and taking the classification result as the category of the first speech information, the method includes:
matching, according to the category of the first speech information, a preset response message for that category, and pushing the preset response message to a customer service terminal.
Further, after the step of inputting the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram and taking the classification result as the category of the first speech information, the method includes:
obtaining the identity information of the source user of the first speech information, establishing a binding relationship between the category of the first speech information and the identity information, and storing the binding in a database.
This application also provides a speech category recognition apparatus, comprising:
a converting unit, configured to obtain first speech information to be recognized and convert the first speech information into a first spectrogram;
a recognition unit, configured to input the first spectrogram into the speech classification model trained on a deep convolutional neural network, and to output the classification result for the first spectrogram as the category of the first speech information; the category of the first speech information is an emotion category or a personality category.
This application also provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the above methods when executing the computer program.
This application also provides a computer storage medium on which a computer program is stored, wherein the computer program implements the steps of any of the above methods when executed by a processor.
The speech category recognition method and apparatus, computer device, and storage medium provided herein have the following beneficial effects: first speech information to be recognized is obtained and converted into a first spectrogram; the first spectrogram is input into a preset speech classification model to obtain a classification result, which is taken as the category of the first speech information; the speech classification model is trained on a deep convolutional neural network using spectrograms of known emotion or personality categories. This helps improve the emotion and personality classification of speech information.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the speech category recognition method in one embodiment of the application;
Fig. 2 is a schematic diagram of the steps of the speech category recognition method in another embodiment of the application;
Fig. 3 is a structural block diagram of the speech category recognition apparatus in one embodiment of the application;
Fig. 4 is a structural block diagram of the speech category recognition apparatus in another embodiment of the application;
Fig. 5 is a structural block diagram of the speech category recognition apparatus in yet another embodiment of the application;
Fig. 6 is a schematic structural block diagram of the computer device of one embodiment of the application.
The realization, functional characteristics, and advantages of the purpose of this application will be further described with reference to the accompanying drawings and embodiments.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of this application clearer, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the application, not to limit it.
Referring to Fig. 1, one embodiment of the application provides a speech category recognition method, comprising the following steps:
Step S1: obtain first speech information to be recognized, and convert the first speech information into a first spectrogram.
In this embodiment, the first spectrogram is a kind of spectral image (two- or three-dimensional) that represents how the frequency spectrum of speech changes over time. The first speech information may be a customer's voice captured in a customer service system, or any speech information in a database whose category needs to be recognized.
Converting the first speech information to be recognized into a spectrogram not only presents the time-domain and frequency-domain characteristics of the speech simultaneously, avoiding the loss of useful characteristic information, but also reflects the language characteristics of the speaker. In this embodiment, the corresponding first spectrogram is obtained by performing Fourier analysis on the first speech information to be recognized; the image produced by Fourier analysis is called a spectrogram.
Step S2: input the first spectrogram into a preset speech classification model to obtain a classification result for the first spectrogram, and take the classification result as the category of the first speech information; wherein the speech classification model is trained on a deep convolutional neural network using spectrograms of known emotion or personality categories, and the category of the first speech information is an emotion category or a personality category.
In this embodiment, the category of the first speech information refers to its emotion category or personality category; in this embodiment, the method is mainly used to classify the emotion of the first speech information. Depending on the training set used to train the deep convolutional neural network, the resulting speech classification model differs, and so do the classification results it outputs. Specifically, if the training set is labelled with emotion categories, the resulting speech classification model recognizes and classifies the emotion of the first speech information, and its output is the emotion category of the first speech information; the emotion categories include multiple emotion classes, such as impatient, irascible, and patient. If the training set is labelled with personality categories, the resulting speech classification model recognizes and classifies the personality of the first speech information, and its output is the personality category of the first speech information; the personality categories include multiple personality classes, such as optimistic and pessimistic.
In this embodiment, the first spectrogram is input into the speech classification model, which is trained on a deep convolutional neural network comprising multiple network layers. Each network layer produces a feature map, i.e. features of the image (which are the speech features of the first speech information). The network layers learn the first spectrogram level by level to extract its features; through the feature extraction of successive layers, the higher the layer, the more semantic, discriminative, and representative the extracted features become, highlighting features relevant to emotion and personality and thus accentuating the differences between different spectrograms. After learning through multiple network layers, classification is finally performed at the last layer (the softmax layer) of the deep convolutional neural network, yielding the category of the first speech information. Conventional speech recognition methods require suitable speech features to be defined or selected by hand; in this embodiment, features are extracted automatically by the deep convolutional neural network and then classified by its last layer. Compared with conventional classification methods, using a deep convolutional neural network as the classifier gives better recognition performance and improves the emotion and personality classification of speech information.
In one embodiment, the step of converting the first speech information into the first spectrogram includes:
converting the first speech information into the corresponding first spectrogram by Fourier analysis. For a segment of first speech information x(t), the signal is first framed to obtain x(m, n), where n is the frame length and m is the frame index; an FFT (Fourier transform) is then applied to obtain X(m, n); the periodogram is computed as Y(m, n) = X(m, n) · X(m, n)'; then 10·log10(Y(m, n)) is taken, m is mapped to a time scale M and n to a frequency scale N, and finally M, N, and 10·log10(Y(m, n)) are plotted as a two-dimensional image to obtain the first spectrogram (it may also be drawn as a three-dimensional figure).
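The framing, Fourier transform, and log-periodogram procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses a naive DFT for clarity (a real system would use an FFT routine), non-overlapping frames, and a small epsilon to avoid log(0); all function and variable names are hypothetical.

```python
import cmath
import math

def spectrogram(signal, frame_len):
    """Frame the signal x(t) into x(m, n), transform each frame, and take
    10*log10 of the periodogram Y(m, n) = X(m, n) * conj(X(m, n))."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spec = []
    for frame in frames:
        row = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequencies
            X = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))
            Y = abs(X) ** 2                  # periodogram Y(m, n)
            row.append(10 * math.log10(Y + 1e-12))  # dB scale; eps avoids log(0)
        spec.append(row)
    return spec  # spec[m][k]: frame index m (time), bin k (frequency)

# Tiny demo: a pure tone concentrates its energy in one frequency bin.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
S = spectrogram(tone, 32)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(peak_bin)  # 4, matching the 4-cycles-per-frame tone
```

The resulting matrix is what would be rendered as the two-dimensional spectrogram image described above.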
Referring to Fig. 2, in one embodiment, before step S1 of obtaining the first speech information to be recognized and converting it into the first spectrogram, the method includes:
Step S101: input the training spectrograms in the training set into the deep convolutional neural network for training, to obtain the speech classification model.
In this step, the deep convolutional neural network is trained in advance to obtain the speech classification model. Specifically, a large number of training spectrograms from the training set, each of known emotion or personality category, are input into the deep convolutional neural network for training, and the network's output is driven to be essentially equal (identical) to the corresponding emotion or personality category, yielding the trained parameters; these parameters are loaded into the deep convolutional neural network to obtain the optimal speech classification model. Thereafter, an unknown piece of first speech information can be converted into a spectrogram and input into the speech classification model, which then outputs the corresponding category of the first speech information.
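As a hedged illustration of the training step, the sketch below trains only the softmax output layer (the patent's last network layer) with per-sample gradient descent, using toy pooled features that stand in for the CNN's feature maps; the convolutional layers themselves are omitted, and all data, labels, and names are invented for the example.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train(samples, labels, n_classes, epochs=200, lr=0.5):
    """Fit per-class weights W[c][j] and bias b[c] of a softmax layer by
    minimizing cross-entropy, so outputs match the known categories."""
    dim = len(samples[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = softmax([sum(w * v for w, v in zip(W[c], x)) + b[c]
                         for c in range(n_classes)])
            for c in range(n_classes):       # gradient of cross-entropy
                g = p[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * g
                for j in range(dim):
                    W[c][j] -= lr * g * x[j]
    return W, b

def predict(W, b, x):
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c]
              for c in range(len(W))]
    return max(range(len(scores)), key=lambda c: scores[c])

# Toy pooled features: class 0 ("impatient") has high energy in feature 0,
# class 1 ("patient") in feature 1.
random.seed(0)
X = [[1.0 + random.random(), 0.1] for _ in range(20)] + \
    [[0.1, 1.0 + random.random()] for _ in range(20)]
y = [0] * 20 + [1] * 20
W, b = train(X, y, n_classes=2)
print(predict(W, b, [1.5, 0.0]), predict(W, b, [0.0, 1.5]))  # 0 1
```

In the patent's design these pooled features would instead be produced by the stacked convolutional layers learning the spectrogram level by level.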
In this embodiment, after step S101 of inputting the training spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
Step S102: input the test spectrograms in the test set into the speech classification model to output corresponding classification results, and verify whether the classification results match the known categories of the test spectrograms in the test set.
The test spectrograms in the test set are spectrograms of known category. To verify the classification accuracy of the speech classification model, a large number of test spectrograms from the test set are input into the speech classification model, and it is judged whether the corresponding output classification results match the known categories of the test spectrograms. Finally, to improve the reliability of the verification, the accuracy rate over all test spectrograms can also be computed; if the accuracy rate exceeds a set threshold, the speech classification model is considered to classify accurately.
In one embodiment, before step S101 of inputting the spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
Step S10: convert the second speech information of each known category into corresponding second spectrograms, and allocate the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
In this embodiment, the step of converting the second speech information of known categories into second spectrograms is similar to step S1; the only difference is the speech information involved. The second speech information in this embodiment is speech information whose category is already known; after conversion into second spectrograms, the resulting spectrograms are likewise data of known emotion or personality category. The second spectrograms are allocated, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set; for example, allocating the second spectrograms at a 4:1 ratio makes the data volume ratio of the training set to the test set 4:1.
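The 4:1 allocation can be sketched as a simple shuffled split; the function name and the labelled data are assumptions for illustration only.

```python
import random

def split(spectrograms, ratio=4):
    """Shuffle labelled spectrograms and split them train:test = ratio:1."""
    data = list(spectrograms)
    random.shuffle(data)
    cut = len(data) * ratio // (ratio + 1)
    return data[:cut], data[cut:]

random.seed(42)
labelled = [(f"spec_{i}", i % 3) for i in range(100)]  # (spectrogram, category)
train_set, test_set = split(labelled)
print(len(train_set), len(test_set))  # 80 20
```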
In one embodiment, after step S2 of inputting the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram and taking the classification result as the category of the first speech information, the method includes:
Step S3a: match, according to the category of the first speech information, the preset response message for that category, and push the preset response message to the customer service terminal.
In one embodiment, the above method is applied in an agent customer service call scenario, where the database stores preset response messages corresponding to different speech information categories. After the customer's first speech information is classified by the above speech category recognition method, the preset response message matching the recognized category is pushed to the customer service terminal, so that the agent can respond according to it. For example, when the customer's emotion appears impatient, the preset response message may prompt the agent to switch topics or end the call.
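A minimal sketch of step S3a's category-to-response matching; the response table, texts, and function names are invented for illustration, and a real system would push the message to the agent's terminal rather than simply return it.

```python
# Hypothetical mapping from recognized emotion category to a canned
# response pushed to the agent's terminal.
RESPONSES = {
    "impatient": "Suggest switching topics or wrapping up the call.",
    "irascible": "Stay calm, apologize, and offer to escalate.",
    "patient":   "Continue with the standard script.",
}

def push_to_agent(category, default="No guidance for this category."):
    """Match the category to its preset response; fall back when unknown."""
    return RESPONSES.get(category, default)

print(push_to_agent("impatient"))
```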
In another embodiment, the above method is applied in an insurance recommendation scenario. After the customer's first speech information is classified by personality category, a corresponding prompt message is sent to the customer service agent according to the customer's personality, and different insurance products are pushed to the customer service terminal according to the different customer personality categories, making it easier for the agent to recommend insurance products to the customer.
In another embodiment, after step S2 of inputting the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram and taking the classification result as the category of the first speech information, the method includes:
Step S3b: obtain the identity information of the source user of the first speech information, establish a binding relationship between the category of the first speech information and the identity information, and store the binding in a database. For example, in an insurance recommendation scenario, an insurance agent can retrieve the personality of a customer from the database according to the customer's identity information, so that the agent can tailor the conversation plan to the customer's personality.
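Step S3b's identity-to-category binding can be sketched with an in-memory SQLite table; the schema, table name, and identifiers are assumptions for illustration, not the patent's database design.

```python
import sqlite3

# Bind a caller's identity to the recognized personality category and
# store it; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_category (
                    user_id  TEXT PRIMARY KEY,
                    category TEXT NOT NULL)""")

def bind(user_id, category):
    """Store (or overwrite) the user -> category binding relationship."""
    conn.execute("INSERT OR REPLACE INTO user_category VALUES (?, ?)",
                 (user_id, category))
    conn.commit()

bind("client_42", "optimistic")
row = conn.execute("SELECT category FROM user_category WHERE user_id = ?",
                   ("client_42",)).fetchone()
print(row[0])  # optimistic
```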
In another embodiment, the category of the first speech information is a personality category, and the above method is applied in a social platform. After step S2 of inputting the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram and taking the classification result as the category of the first speech information, the method includes:
matching, in a social database and according to the personality category of the first speech information, a target user whose personality category matches that of the first speech information, and recommending the social information of the target user to the source user of the first speech information; alternatively, in other embodiments, the social information of the source user of the first speech information may be pushed to the target user. The first speech information is sent through the social platform and may carry the social information of the source user (ID information, gender, etc.); the social database stores a large amount of user information and the corresponding personality categories.
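A toy sketch of the personality matching in the social-platform embodiment, where "matching" is simplified to sharing the same personality category; the database contents and field names are invented for illustration.

```python
# Hypothetical social database: each record carries the user's social
# information and the personality category recognized from their speech.
SOCIAL_DB = [
    {"id": "u1", "gender": "F", "personality": "optimistic"},
    {"id": "u2", "gender": "M", "personality": "pessimistic"},
    {"id": "u3", "gender": "F", "personality": "optimistic"},
]

def recommend(source_personality, source_id=None):
    """Return ids of users whose personality category matches the source
    user's, excluding the source user themselves."""
    return [u["id"] for u in SOCIAL_DB
            if u["personality"] == source_personality and u["id"] != source_id]

print(recommend("optimistic", source_id="u1"))  # ['u3']
```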
In conclusion obtaining the first language to be identified for the recognition methods of the voice class provided in the embodiment of the present application
Message breath, and first voice messaging is converted into the first sound spectrograph;First sound spectrograph is input to preset voice
In disaggregated model, to obtain the classification results of first sound spectrograph, and believe the classification results as first voice
The classification of breath;Wherein, time domain, the frequency domain character of voice messaging can be not only presented in the first sound spectrograph simultaneously, and favorable characteristics is avoided to believe
The loss of breath, and can reflect the language feature of speaker in voice messaging;First sound spectrograph is input to Classification of Speech model
In, by the layer-by-layer feature extraction of multiple network layers, more past high level, the feature of extraction more has Semantic, more has and distinguishes
Property and representativeness, highlight feature relevant to emotion, personality, so as to the difference between the different sound spectrographs of protrusion, just
In the effect for promoting emotion in voice messaging, personality classification.
Referring to Fig. 3, one embodiment of the application further provides a speech category recognition apparatus, comprising:
a converting unit 10, configured to obtain first speech information to be recognized and convert the first speech information into a first spectrogram.
In this embodiment, the first spectrogram is a kind of spectral image (two- or three-dimensional) that represents how the frequency spectrum of speech changes over time. The first speech information may be a customer's voice captured in a customer service system, or any speech information in a database whose category needs to be recognized.
In this embodiment, the converting unit 10 converts the first speech information to be recognized into the first spectrogram, which not only presents the time-domain and frequency-domain characteristics of the speech simultaneously, avoiding the loss of useful characteristic information, but also reflects the language characteristics of the speaker. The corresponding first spectrogram is obtained by performing Fourier analysis on the first speech information to be recognized; the image produced by Fourier analysis is called a spectrogram.
a recognition unit 20, configured to input the first spectrogram into the preset speech classification model to obtain the classification result for the first spectrogram, and to take the classification result as the category of the first speech information; wherein the speech classification model is trained on a deep convolutional neural network using spectrograms of known emotion or personality categories, and the category of the first speech information is an emotion category or a personality category.
In this embodiment, the category of the first speech information refers to its emotion category or personality category; in this embodiment, the apparatus is mainly used to classify the emotion of the first speech information. Depending on the training set used to train the deep convolutional neural network, the resulting speech classification model differs, and so do the classification results it outputs. Specifically, if the training set is labelled with emotion categories, the resulting speech classification model recognizes and classifies the emotion of the first speech information, and its output is the emotion category of the first speech information; the emotion categories include multiple emotion classes, such as impatient, irascible, and patient. If the training set is labelled with personality categories, the resulting speech classification model recognizes and classifies the personality of the first speech information, and its output is the personality category of the first speech information; the personality categories include multiple personality classes, such as optimistic and pessimistic.
In this embodiment, recognition unit 20 inputs the first spectrogram into the speech classification model, which is obtained by training a deep convolutional neural network. The network comprises multiple layers, each of which produces feature maps, i.e. features of the image (here, speech features of the first voice information). The layers learn the first spectrogram level by level to extract its features; through this layer-by-layer feature extraction, features at higher layers become more semantic, more discriminative, and more representative, highlighting features related to emotion and personality and thus the differences between different spectrograms. After learning through multiple layers, classification is performed at the last layer (a softmax layer) of the deep convolutional neural network, yielding the classification of the first voice information. Conventional speech recognition methods require manually defining or selecting suitable speech features; in this embodiment, features are extracted automatically by the deep convolutional neural network and then classified by its last layer. Using a deep convolutional neural network as the classifier gives better recognition performance than conventional classification methods and improves the classification of emotion and personality in voice information.
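As a sketch of the pipeline just described (convolutional feature maps, ReLU, pooling, then a softmax output layer), the minimal NumPy forward pass below uses random filters, a random 32x32 array standing in for the first spectrogram, and three illustrative emotion classes; none of these values come from the patent's trained model.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, k=2):
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
spect = rng.standard_normal((32, 32))      # stand-in 32x32 first spectrogram
filters = rng.standard_normal((4, 3, 3))   # 4 filters (random here; learned during training)
# One convolutional layer: feature maps = conv -> ReLU -> 2x2 max pooling
fmaps = np.stack([max_pool(relu(conv2d(spect, f))) for f in filters])
# Flatten and classify with a softmax output layer over 3 illustrative emotion classes
W = rng.standard_normal((3, fmaps.size))
probs = softmax(W @ fmaps.ravel())
print(fmaps.shape, probs.shape)  # (4, 15, 15) (3,)
```

A trained network would stack several such layers, which is what gives the higher layers their more semantic, more discriminative features.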
In one embodiment, converting unit 10 is specifically configured to convert the first voice information into the corresponding first spectrogram by Fourier analysis. For a segment of first voice information x(t), it is first framed into x(m, n), where n is the frame length and m is the frame index; an FFT (Fourier transform) is then applied to obtain X(m, n); the periodogram is Y(m, n) = X(m, n) · X(m, n)′, the product with the conjugate; then 10·log10(Y(m, n)) is taken, m is mapped to the time scale M and n to the frequency scale N, and finally M, N, and 10·log10(Y(m, n)) are drawn as a two-dimensional image to obtain the first spectrogram (it can also be drawn as a three-dimensional figure).
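The steps just described (framing, FFT, periodogram, 10·log10) can be sketched in NumPy as follows; the sample rate, frame length, and hop size are illustrative choices, and the Hann window is a common refinement the text does not mention.

```python
import numpy as np

def sound_spectrogram_db(x, frame_len=256, hop=128):
    """Framing -> FFT -> periodogram Y(m,n) = X(m,n) * conj(X(m,n)) -> 10*log10."""
    n_frames = 1 + (len(x) - frame_len) // hop
    # x(m, n): m frames, each of length n = frame_len
    frames = np.stack([x[m * hop: m * hop + frame_len] for m in range(n_frames)])
    frames = frames * np.hanning(frame_len)  # windowing (assumption, not in the text)
    X = np.fft.rfft(frames, axis=1)          # X(m, n), non-negative frequency half
    Y = (X * np.conj(X)).real                # periodogram Y(m, n)
    return 10 * np.log10(Y + 1e-12)          # small floor avoids log(0)

sr = 8000                                    # illustrative sample rate
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)              # one second of a 440 Hz tone
S = sound_spectrogram_db(x)
peak_bin = int(S.mean(axis=0).argmax())      # 440 Hz / (8000/256 Hz per bin) ≈ bin 14
print(S.shape, peak_bin)                     # (61, 129) 14
```

Plotting `S` with time frames on one axis and frequency bins on the other yields the two-dimensional spectrogram image described above.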
Referring to Fig. 4, in one embodiment, the identification device of the voice class further includes:
Training unit 101, configured to input the training spectrograms in the training set into the deep convolutional neural network for training, so as to obtain the speech classification model.
In this embodiment, training unit 101 trains the deep convolutional neural network in advance to obtain the speech classification model. Specifically, training unit 101 uses a large number of training spectrograms in the training set, each of a known emotional or personality category, and inputs them into the deep convolutional neural network for training so that its output is substantially equal (that is, identical) to the corresponding emotional or personality category, obtaining the corresponding training parameters; these training parameters are loaded into the deep convolutional neural network to obtain the optimal speech classification model. An unknown first voice information can then be converted to a spectrogram and input into the speech classification model, which outputs the classification corresponding to that voice information.
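A minimal stand-in for this training procedure, assuming a toy two-class dataset and a single softmax layer in place of the full deep convolutional network (a real system would learn all convolutional filters as well):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-in training set: 16-dim flattened "training spectrograms" of two
# known emotional categories, drawn from well-separated distributions.
X = np.vstack([rng.normal(0.0, 1.0, (20, 16)), rng.normal(2.0, 1.0, (20, 16))])
y = np.array([0] * 20 + [1] * 20)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((16, 2))                        # "training parameters" to be learned
for _ in range(200):                         # gradient descent on cross-entropy loss
    grad = X.T @ (softmax(X @ W) - np.eye(2)[y]) / len(y)
    W -= 0.5 * grad

# After training, the outputs should (substantially) equal the known categories
acc = (softmax(X @ W).argmax(axis=1) == y).mean()
print(acc)
```

The loop mirrors the text's goal: adjust parameters until the model's output matches the labeled category for each training spectrogram.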
In one embodiment, the identification device of the voice class further includes:
Test unit 102, configured to input the test spectrograms in the test set into the speech classification model to output the corresponding classification results, and to verify whether each classification result is identical to the known classification of the test spectrogram in the test set.
The test spectrograms in the test set are spectrograms of known classification. To verify the classification accuracy of the speech classification model, test unit 102 inputs a large number of test spectrograms from the test set into the model and judges whether each output classification result matches the known classification of the corresponding test spectrogram; finally, the accuracy rate over all test spectrograms can be counted, and when the accuracy rate exceeds a set threshold, the speech classification model is considered to classify accurately.
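This verification step can be sketched as a simple accuracy count; the test set, model, and threshold below are illustrative stand-ins, not values from the patent.

```python
def classification_accuracy(model, test_set):
    """model: spectrogram -> predicted class; test_set: (spectrogram, known class) pairs."""
    correct = sum(model(s) == c for s, c in test_set)
    return correct / len(test_set)

# Illustrative labeled test set and a deliberately crude constant model
test_set = [("spect_1", "impatient"), ("spect_2", "patient"), ("spect_3", "impatient")]
model = lambda s: "impatient"
acc = classification_accuracy(model, test_set)
threshold = 0.6                              # the text's "setting value"
print(acc, acc > threshold)
```

If the accuracy falls below the threshold, the model would be retrained rather than deployed.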
Referring to Fig. 5, in one embodiment, the identification device of the voice class further includes:
Allocation unit 103, configured to convert the second voice information of each known classification into a corresponding second spectrogram, and to assign the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
In this embodiment, allocation unit 103 converts each second voice information of known classification one-to-one into a second spectrogram; the process is similar to the conversion performed by converting unit 10, differing only in the voice information it operates on. The second voice information in this embodiment is voice information of known classification, so the resulting second spectrograms are likewise data of known emotional or personality category. The second spectrograms are assigned, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set; for example, assigning them at a 4:1 ratio makes the data volume ratio of the training set to the test set 4:1.
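A minimal sketch of this allocation, assuming a deterministic shuffle and the 4:1 example ratio; the data items and seed are placeholders.

```python
import random

def assign_spectrograms(items, ratio=4, seed=42):
    """Assign known-class second spectrograms to training/test sets at ratio:1."""
    items = list(items)
    random.Random(seed).shuffle(items)       # deterministic shuffle for the sketch
    n_test = len(items) // (ratio + 1)
    return items[n_test:], items[:n_test]    # (training spectrograms, test spectrograms)

data = [(f"spect_{i}", "optimistic" if i % 2 else "pessimistic") for i in range(100)]
train_set, test_set = assign_spectrograms(data)
print(len(train_set), len(test_set))         # 80 20, i.e. a 4:1 data volume ratio
```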
In one embodiment, the identification device of the voice class further includes:
First matching unit, configured to match, according to the classification of the first voice information, the preset response information corresponding to that classification, and to push the preset response information to the customer service terminal.
In one embodiment, the identification device of the voice class is applied in an agent customer-service call scenario, and the database is preloaded with preset response information corresponding to different voice information classifications. After the client's first voice information is classified by the above recognition method, the preset response information corresponding to its classification is matched and pushed to the customer service terminal, so that the agent can respond accordingly; for example, if the client's emotion is impatience, the preset response information may prompt the agent to switch topics or end the call.
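A dictionary lookup is one minimal way to sketch this matching; the category names and response texts below are invented for illustration, not taken from the patent's database.

```python
# Hypothetical mapping from voice-information classifications to the preset
# response information stored in the database; all texts are illustrative.
PRESET_RESPONSES = {
    "impatient": "Prompt the agent to switch topics or politely end the call.",
    "irritable": "Prompt the agent to apologize and slow the pace.",
    "patient": "Prompt the agent to continue with the standard script.",
}

def push_preset_response(category):
    """Match the preset response for a classification; a real system would push
    it to the customer service terminal, here we simply return it."""
    return PRESET_RESPONSES.get(category, "No preset response for this classification.")

print(push_preset_response("impatient"))
```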
In another embodiment, the device is applied in an insurance recommendation scenario. After the client's first voice information is classified by personality category using the above method, a corresponding prompt is sent to customer service according to the client's personality, and different insurance products are pushed to the customer service terminal according to different client personality categories, helping customer service recommend insurance products to clients.
In another embodiment, the identification device of the voice class further includes:
Storage unit, configured to obtain the identity information of the source user of the first voice information, establish a binding relationship between the classification of the first voice information and the identity information, and store it in the database. For example, in the insurance recommendation scenario, an insurance agent can retrieve the corresponding client's personality from the database according to the client's identity information, so that the agent can tailor the conversation plan to the client's personality.
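A sketch of this binding-and-storage step using an in-memory SQLite database; the table name, columns, and identifiers are illustrative assumptions.

```python
import sqlite3

# Bind the first voice information's classification to the source user's
# identity information and store it; schema and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voice_class (user_id TEXT PRIMARY KEY, category TEXT)")
conn.execute("INSERT INTO voice_class VALUES (?, ?)", ("user_001", "optimistic"))
conn.commit()

# Later, e.g. an insurance agent looks up the client's personality by identity
row = conn.execute(
    "SELECT category FROM voice_class WHERE user_id = ?", ("user_001",)
).fetchone()
print(row[0])
```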
In another embodiment, the classification of the first voice information is a personality category, and the identification device of the voice class further includes:
Second matching unit, configured to match, in a social database according to the personality category of the first voice information, target users whose personality category matches that of the first voice information, and to recommend the social information of the target users to the source user of the first voice information. Alternatively, in other embodiments, the social information of the source user of the first voice information may be pushed to the target users. Here, the first voice information is sent through a social platform; the social information of the source user (ID information, gender, etc.) can be carried when the first voice information is sent, and the social database stores a large amount of user information together with the corresponding personality categories.
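A minimal sketch of this matching step, assuming the social database is a simple mapping from user ID to personality category; all names are invented.

```python
# Hypothetical social database mapping user IDs to known personality categories.
SOCIAL_DB = {"alice": "optimistic", "bob": "pessimistic", "carol": "optimistic"}

def match_target_users(personality, db=SOCIAL_DB):
    """Return users whose personality category matches the first voice
    information's personality category."""
    return sorted(user for user, p in db.items() if p == personality)

print(match_target_users("optimistic"))   # ['alice', 'carol']
```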
In conclusion for the identification device of the voice class provided in the embodiment of the present application, converting unit 10 is obtained wait know
Other first voice messaging, and first voice messaging is converted into the first sound spectrograph;Recognition unit 20 is by first language
Spectrogram is input in preset Classification of Speech model, to obtain the classification results of first sound spectrograph, and the classification is tied
Classification of the fruit as first voice messaging;Wherein, time domain, the frequency domain of voice messaging can be not only presented in the first sound spectrograph simultaneously
Feature, avoids the loss of favorable characteristics information, and can reflect the language feature of speaker in voice messaging;First sound spectrograph
It is input in Classification of Speech model, by the layer-by-layer feature extraction of multiple network layers, the feature of more past high level, extraction more has language
Justice more has distinction and representativeness, feature relevant to emotion, personality is highlighted, so as to the different languages of protrusion
Difference between spectrogram, convenient for promoting the effect of emotion in voice messaging, personality classification.
Referring to Fig. 6, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, while the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores data such as the speech classification model. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a recognition method of the voice class.
The processor executes the steps of the recognition method of the voice class:
obtaining the first voice information to be recognized, and converting the first voice information into a first spectrogram;
inputting the first spectrogram into a preset speech classification model to obtain the classification result of the first spectrogram, and taking the classification result as the classification of the first voice information; wherein the speech classification model is obtained by training a deep convolutional neural network on spectrograms of known emotional or personality categories, and the classification of the first voice information is an emotional category or a personality category.
In one embodiment, the processor's step of converting the first voice information into a first spectrogram includes:
converting the first voice information into the corresponding first spectrogram by Fourier analysis.
In one embodiment, before the processor's step of obtaining the first voice information to be recognized and converting it into a first spectrogram, the method includes:
inputting the training spectrograms in the training set into the deep convolutional neural network for training, to obtain the speech classification model.
In one embodiment, after the processor's step of inputting the training spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
inputting the test spectrograms in the test set into the speech classification model to output the corresponding classification results, and verifying whether the classification results are identical to the known classifications of the test spectrograms in the test set.
In one embodiment, before the processor's step of inputting the spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
converting the second voice information of each known classification into corresponding second spectrograms, and assigning the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
In one embodiment, after the processor's step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method includes:
matching, according to the classification of the first voice information, the preset response information of the corresponding classification, and pushing the preset response information to the customer service terminal.
In one embodiment, after the processor's step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method includes:
obtaining the identity information of the source user of the first voice information, establishing a binding relationship between the classification of the first voice information and the identity information, and storing it in the database.
Those skilled in the art will understand that the structure shown in Fig. 6 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer devices to which the solution of the present application is applied.
An embodiment of the present application also provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements a recognition method of the voice class, specifically:
obtaining the first voice information to be recognized, and converting the first voice information into a first spectrogram;
inputting the first spectrogram into a preset speech classification model to obtain the classification result of the first spectrogram, and taking the classification result as the classification of the first voice information; wherein the speech classification model is obtained by training a deep convolutional neural network on spectrograms of known emotional or personality categories, and the classification of the first voice information is an emotional category or a personality category.
In one embodiment, the processor's step of converting the first voice information into a first spectrogram includes:
converting the first voice information into the corresponding first spectrogram by Fourier analysis.
In one embodiment, before the processor's step of obtaining the first voice information to be recognized and converting it into a first spectrogram, the method includes:
inputting the training spectrograms in the training set into the deep convolutional neural network for training, to obtain the speech classification model.
In one embodiment, after the processor's step of inputting the training spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
inputting the test spectrograms in the test set into the speech classification model to output the corresponding classification results, and verifying whether the classification results are identical to the known classifications of the test spectrograms in the test set.
In one embodiment, before the processor's step of inputting the spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method includes:
converting the second voice information of each known classification into corresponding second spectrograms, and assigning the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
In one embodiment, after the processor's step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method includes:
matching, according to the classification of the first voice information, the preset response information of the corresponding classification, and pushing the preset response information to the customer service terminal.
In one embodiment, after the processor's step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method includes:
obtaining the identity information of the source user of the first voice information, establishing a binding relationship between the classification of the first voice information and the identity information, and storing it in the database.
In conclusion for the recognition methods of the voice class provided in the embodiment of the present application, device, computer equipment and depositing
Storage media obtains the first voice messaging to be identified, and first voice messaging is converted to the first sound spectrograph;By described
One sound spectrograph is input in preset Classification of Speech model, to obtain the classification results of first sound spectrograph, and will be described point
Classification of the class result as first voice messaging;Wherein, the first sound spectrograph can not only present simultaneously voice messaging time domain,
Frequency domain character, avoids the loss of favorable characteristics information, and can reflect the language feature of speaker in voice messaging;First language
Spectrogram is input in Classification of Speech model, and by the layer-by-layer feature extraction of multiple network layers, the feature of more past high level, extraction more has
There is Semantic, more there is distinction and representativeness, highlight feature relevant to emotion, personality, so as to protrude not
With the difference between sound spectrograph, convenient for promoting the effect of emotion in voice messaging, personality classification.
Those of ordinary skill in the art will understand that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used herein and in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, herein, the terms "include" and "comprise", or any other variant thereof, are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes that element.
The above are merely preferred embodiments of the present application and are not intended to limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.
Claims (10)
1. A recognition method of a voice class, characterized by comprising the following steps:
obtaining first voice information to be recognized, and converting the first voice information into a first spectrogram;
inputting the first spectrogram into a preset speech classification model to obtain a classification result of the first spectrogram, and taking the classification result as the classification of the first voice information; wherein the speech classification model is obtained by training a deep convolutional neural network on spectrograms of known emotional categories or personality categories, and the classification of the first voice information is an emotional category or a personality category.
2. The recognition method of the voice class according to claim 1, characterized in that the step of converting the first voice information into a first spectrogram comprises:
converting the first voice information into the corresponding first spectrogram by Fourier analysis.
3. The recognition method of the voice class according to claim 1, characterized in that, before the step of obtaining the first voice information to be recognized and converting the first voice information into a first spectrogram, the method comprises:
inputting training spectrograms in a training set into the deep convolutional neural network for training, to obtain the speech classification model.
4. The recognition method of the voice class according to claim 3, characterized in that, after the step of inputting the training spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method comprises:
inputting test spectrograms in a test set into the speech classification model to output corresponding classification results, and verifying whether the classification results are identical to the known classifications of the test spectrograms in the test set.
5. The recognition method of the voice class according to claim 3, characterized in that, before the step of inputting the spectrograms in the training set into the deep convolutional neural network for training to obtain the speech classification model, the method comprises:
converting second voice information of each known classification into corresponding second spectrograms, and assigning the second spectrograms, according to a set ratio, into training spectrograms in the training set and test spectrograms in the test set.
6. The recognition method of the voice class according to claim 1, characterized in that, after the step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method comprises:
matching, according to the classification of the first voice information, preset response information of the corresponding classification, and pushing the preset response information to a customer service terminal.
7. The recognition method of the voice class according to claim 1, characterized in that, after the step of inputting the first spectrogram into the preset speech classification model to obtain the classification result of the first spectrogram and taking the classification result as the classification of the first voice information, the method comprises:
obtaining identity information of the source user of the first voice information, establishing a binding relationship between the classification of the first voice information and the identity information, and storing it in a database.
8. An identification device of a voice class, characterized by comprising:
a converting unit, configured to obtain first voice information to be recognized and convert the first voice information into a first spectrogram;
a recognition unit, configured to input the first spectrogram into a preset speech classification model to obtain a classification result of the first spectrogram, and to take the classification result as the classification of the first voice information; wherein the speech classification model is obtained by training a deep convolutional neural network on spectrograms of known emotional categories or personality categories, and the classification of the first voice information is an emotional category or a personality category.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810956681.7A CN109272993A (en) | 2018-08-21 | 2018-08-21 | Recognition methods, device, computer equipment and the storage medium of voice class |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109272993A true CN109272993A (en) | 2019-01-25 |
Family
ID=65153984
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109272993A (en) |
2018-08-21: application CN201810956681.7A filed (CN); published as CN109272993A; legal status: Pending.
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105047194A (en) * | 2015-07-28 | 2015-11-11 | 东南大学 | Self-learning spectrogram feature extraction method for speech emotion recognition |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
CN108364662A (en) * | 2017-12-29 | 2018-08-03 | 中国科学院自动化研究所 | Based on the pairs of speech-emotion recognition method and system for differentiating task |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110047516A (en) * | 2019-03-12 | 2019-07-23 | 天津大学 | A kind of speech-emotion recognition method based on gender perception |
CN110188235A (en) * | 2019-05-05 | 2019-08-30 | 平安科技(深圳)有限公司 | Music style classification method, device, computer equipment and storage medium |
CN110397131A (en) * | 2019-07-01 | 2019-11-01 | 厦门瑞尔特卫浴科技股份有限公司 | A kind of automatic control system and method for closet flushing amount |
CN110349564A (en) * | 2019-07-22 | 2019-10-18 | 苏州思必驰信息科技有限公司 | Across the language voice recognition methods of one kind and device |
CN110349564B (en) * | 2019-07-22 | 2021-09-24 | 思必驰科技股份有限公司 | Cross-language voice recognition method and device |
WO2021012495A1 (en) * | 2019-07-23 | 2021-01-28 | 平安科技(深圳)有限公司 | Method and device for verifying speech recognition result, computer apparatus, and medium |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN110570844A (en) * | 2019-08-15 | 2019-12-13 | 平安科技(深圳)有限公司 | Speech emotion recognition method and device and computer readable storage medium |
CN110570844B (en) * | 2019-08-15 | 2023-05-05 | 平安科技(深圳)有限公司 | Speech emotion recognition method, device and computer readable storage medium |
CN110600015A (en) * | 2019-09-18 | 2019-12-20 | 北京声智科技有限公司 | Voice dense classification method and related device |
CN110992941A (en) * | 2019-10-22 | 2020-04-10 | 国网天津静海供电有限公司 | Power grid dispatching voice recognition method and device based on spectrogram |
CN111048071A (en) * | 2019-11-11 | 2020-04-21 | 北京海益同展信息科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109272993A (en) | Recognition methods, device, computer equipment and the storage medium of voice class | |
CN109451188B (en) | Method and device for differential self-help response, computer equipment and storage medium | |
CN110675288B (en) | Intelligent auxiliary judgment method, device, computer equipment and storage medium | |
CN110392281B (en) | Video synthesis method and device, computer equipment and storage medium | |
CN111883140B (en) | Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition | |
CN106503236A (en) | Question classification method and device based on artificial intelligence | |
CN109388701A (en) | Minutes generation method, device, equipment and computer storage medium | |
CN106407178A (en) | Session abstract generation method and device | |
CN109256136A (en) | A kind of audio recognition method and device | |
CN111182162B (en) | Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence | |
CN110704571B (en) | Court trial auxiliary processing method, trial auxiliary processing device, equipment and medium | |
CN109190124B (en) | Method and apparatus for participle | |
CN108281139A (en) | Speech transcription method and apparatus, robot | |
CN107256428A (en) | Data processing method, data processing equipment, storage device and the network equipment | |
CN111724908A (en) | Epidemic situation investigation method and device based on robot process automation RPA | |
CN110246503A (en) | Blacklist vocal print base construction method, device, computer equipment and storage medium | |
CN113239147A (en) | Intelligent conversation method, system and medium based on graph neural network | |
CN109933671A (en) | Construct method, apparatus, computer equipment and the storage medium of personal knowledge map | |
CN110427455A (en) | A kind of customer service method, apparatus and storage medium | |
CN110209841A (en) | A kind of fraud analysis method and device based on swindle case merit | |
CN110556098B (en) | Voice recognition result testing method and device, computer equipment and medium | |
CN109800309A (en) | Classroom Discourse genre classification methods and device | |
CN113096634A (en) | Speech synthesis method, apparatus, server and storage medium | |
CN111724909A (en) | Epidemic situation investigation method and device combining RPA and AI | |
CN109473101A (en) | A kind of speech chip structures and methods of the random question and answer of differentiation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2019-01-25