WO2019080502A1 - Voice-based disease prediction method, application server, and computer readable storage medium - Google Patents

Voice-based disease prediction method, application server, and computer readable storage medium

Info

Publication number
WO2019080502A1
WO2019080502A1 · PCT/CN2018/089428 · CN2018089428W
Authority
WIPO (PCT)
Prior art keywords
voice, patient, category, neural network, voice data
Prior art date
Application number
PCT/CN2018/089428
Other languages
French (fr)
Chinese (zh)
Inventor
梁浩 (Liang Hao)
王健宗 (Wang Jianzong)
肖京 (Xiao Jing)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019080502A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques for comparison or discrimination for extracting parameters related to health condition

Definitions

  • the present application relates to the field of disease prediction, and in particular, to a method for predicting disease using voice, an application server, and a computer readable storage medium.
  • The present application provides a method for disease prediction using voice, an application server, and a computer readable storage medium, which can conveniently make a quick preliminary diagnosis of a patient from the patient's voice before the patient receives formal treatment, thereby providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
  • a first aspect of the present application provides a method for predicting disease using voice, the method being applied to an application server, the method comprising:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • A second aspect of the present application provides an application server, the application server including a memory and a processor, the memory storing a program for disease prediction using voice that can be run on the processor; when the program for disease prediction using voice is executed by the processor, the following steps are implemented:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • A third aspect of the present application provides a computer readable storage medium storing a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • Compared with the prior art, the application server, the method for disease prediction using voice, and the computer readable storage medium proposed by the present application first train a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquire real-time patient voice data; perform data processing on the patient voice data; send the processed patient voice data to the input layer of the trained deep neural network model; acquire the output state of the output layer of the model; and finally determine, according to the acquired output state, the category to which the patient voice data belongs.
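Read together, the three aspects describe one six-step pipeline. The following Python sketch wires those steps together purely for illustration; every helper named here (train_model, record_patient_voice, preprocess, extract_features) is a hypothetical placeholder rather than anything disclosed in the patent, and the final argmax is a simplification of the mapping-table match described in the embodiments below.

```python
# A minimal sketch of the claimed six-step flow, under the assumptions above.

CATEGORIES = ["severe cold", "mild cold", "severe cough", "mild cough", "non-disease"]

def predict_disease_from_voice(raw_audio, model):
    """Return the speech category predicted for one patient recording."""
    voice = preprocess(raw_audio)            # step 3: noise reduction, endpoint detection
    features = extract_features(voice)       # step 3: feature value extraction and selection
    output_state = model.predict(features)   # steps 4-5: feed the input layer, read the output state
    # step 6: take the category with the highest output probability
    best = max(range(len(CATEGORIES)), key=lambda i: output_state[i])
    return CATEGORIES[best]

# Usage (all objects hypothetical):
#   model = train_model(training_data)                  # step 1
#   raw_audio = record_patient_voice(call_center_line)  # step 2
#   print(predict_disease_from_voice(raw_audio, model))
```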
  • 1 is a schematic diagram of an optional hardware architecture of an application server
  • FIG. 2 is a program module diagram of the first embodiment of the program for disease prediction using voice according to the present application
  • FIG. 3 is a structural diagram of a deep neural network model in a preferred embodiment of the present application.
  • FIG. 4 is a flow chart of a first embodiment of a method for disease prediction using speech
  • FIG. 5 is a flow chart of a second embodiment of a method for disease prediction using voice.
  • application server 1; memory 11; processor 12; network interface 13; program for disease prediction using voice 200; training module 20; acquisition module 21; data processing module 22; input module 23; judgment module 24
  • Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the application server 1 is shown.
  • The application server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server.
  • the application server 1 may be a stand-alone server or a server cluster composed of multiple servers.
  • The application server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicably connected to one another through a system bus.
  • the application server 1 connects to the network through the network interface 13 to obtain information.
  • The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
  • Figure 1 only shows the application server 1 with components 11-13, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
  • The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like.
  • the memory 11 may be an internal storage unit of the application server 1, such as a hard disk or memory of the application server 1.
  • The memory 11 may also be an external storage device of the application server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 1.
  • the memory 11 can also include both the internal storage unit of the application server 1 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the application server 1 and various types of application software, such as program codes of the program 200 for performing disease prediction using voice. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the application server 1, such as performing data interaction or communication related control and processing, and the like.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running the program 200 for performing disease prediction using voice.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the application server 1 and other electronic devices.
  • In this embodiment, a program 200 for disease prediction using voice is installed and runs in the application server 1. When the program 200 runs, the application server 1 trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and determines, according to the acquired output state, the category to which the patient voice data belongs.
  • First, the present application proposes a program 200 for disease prediction using voice.
  • In this embodiment, the program 200 for disease prediction using voice comprises a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the disease prediction control operations of the embodiments of the present application can be implemented.
  • In some embodiments, based on the particular operations implemented by the various portions of the computer program instructions, the program 200 for disease prediction using voice may be divided into one or more modules. For example, in FIG. 2, the program 200 may be divided into a training module 20, an acquisition module 21, a data processing module 22, an input module 23, and a judgment module 24, wherein:
  • the training module 20 is configured to train the deep neural network model with the training data.
  • Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment.
  • The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
  • In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
  • the deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203.
  • the input layer 201 is configured to calculate an output value of the hidden layer unit input to the lowest layer according to the voice feature data input to the deep neural network.
  • the voice feature data refers to voice data extracted from the training data.
  • Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above.
  • The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation.
  • the output probability is an output probability corresponding to the training data of the voice category.
  • That is, training data of the speech categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
  • An output value of each layer of the deep neural network can be computed as y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
  • After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the output of the output layer is calculated with a softmax function.
  • The softmax function is: p_j = exp(x_j) / Σ_k exp(x_k), where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
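A minimal NumPy sketch of the forward computation just described: per-layer weighted sums y_j = w * x_j followed by a softmax at the output layer. The patent's simplified formula has no bias terms and names no activation function, so the ReLU between hidden layers is an assumption added here.

```python
import numpy as np

def softmax(x):
    """p_j = exp(x_j) / sum_k exp(x_k); subtracting max(x) only adds numerical stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward(features, weights):
    """Propagate one voice feature vector through the network.

    `weights` is a list of per-layer weight matrices, mirroring the per-layer
    weighted sum described above.
    """
    x = np.asarray(features, dtype=float)
    for w in weights[:-1]:             # hidden layers 202: weighted sum of the layer below
        x = np.maximum(0.0, w @ x)     # assumed ReLU activation (not specified in the text)
    return softmax(weights[-1] @ x)    # output layer 203: weighted sum, then output probabilities
```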
  • After the training module 20 determines the structure of the deep neural network, it needs to determine the weighting values of each layer of the network.
  • When the deep neural network is trained with all of the voice feature data, the training module 20 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error.
  • After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
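The text only states that the weights are adjusted according to the error between the output probability and the expected output probability; it does not give the update rule. The sketch below, reusing `softmax` from the previous snippet, assumes plain stochastic gradient descent and, for brevity, updates only the output-layer weights; a complete implementation would backpropagate the error into every hidden layer, as the text describes.

```python
import numpy as np

def train(weights, samples, expected_probs, lr=0.01, epochs=10):
    """samples: voice feature vectors; expected_probs: expected output probabilities (one-hot)."""
    for _ in range(epochs):
        for x, expected in zip(samples, expected_probs):
            h = np.asarray(x, dtype=float)
            for w in weights[:-1]:                   # forward pass, keeping the topmost hidden activation
                h = np.maximum(0.0, w @ h)
            p = softmax(weights[-1] @ h)             # output probability of the network
            error = p - expected                     # error vs. the expected output probability
            weights[-1] -= lr * np.outer(error, h)   # adjust weights against that error
    return weights
```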
  • The acquisition module 21 is configured to acquire real-time patient voice data. Specifically, the acquisition module 21 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data.
  • The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app.
  • The acquisition module 21 can also actively record patient voice data. For example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
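A trivial sketch of this storage step, assuming the recording arrives as raw WAV bytes; the on-disk layout is an illustrative assumption, since the text only requires that recordings be stored under the telephone number or patient identifier.

```python
from pathlib import Path

def store_recording(audio_bytes: bytes, identifier: str, root: str = "recordings") -> Path:
    """Save one patient recording, indexed by telephone number or patient ID."""
    path = Path(root) / f"{identifier}.wav"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio_bytes)
    return path

# e.g. store_recording(call_audio, "13800000000")     # keyed by telephone number
#      store_recording(ward_audio, "patient-420123")  # keyed by patient ID
```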
  • The data processing module 22 is configured to perform data processing on the patient voice data. Specifically, the data processing module 22 performs front-end processing on the acquired patient voice data, where the front-end processing includes noise reduction and endpoint detection. Further, the data processing module 22 performs feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
  • The endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system.
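One common way to realize such an endpoint check is a short-time-energy gate: if no frame exceeds an energy threshold, the recording is treated as invalid speech and skipped. The frame length and threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def is_valid_voice(signal, frame_len=400, energy_threshold=1e-4):
    """Return True if any frame's short-time energy exceeds the threshold."""
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return False
    for i in range(0, max(len(signal) - frame_len, 1), frame_len):
        frame = signal[i:i + frame_len]
        if float(np.mean(frame ** 2)) > energy_threshold:
            return True   # a voiced frame was found; continue with data processing
    return False          # no valid speech; skip further processing
```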
  • The feature values that the data processing module 22 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include the short-time average energy, the short-time average amplitude, the short-time average zero-crossing rate, formants, and the fundamental frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
  • The fundamental frequency reflects the glottal excitation characteristics.
  • The formants reflect the characteristics of the vocal tract response.
  • LPC and LPCC reflect both glottal excitation and vocal tract response.
  • MFCC models the auditory characteristics of the human ear.
  • Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
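As a sketch of this extraction step, the snippet below collects a handful of the listed parameters into one feature vector. The use of librosa (and the specific counts, 12 LPC coefficients and 13 MFCCs) is an assumption for illustration; the patent names the features but prescribes no library or dimensionality.

```python
import numpy as np
import librosa

def extract_features(signal, sr=16000):
    """Gather some of the listed time- and frequency-domain feature parameters."""
    signal = np.asarray(signal, dtype=float)
    energy = float(np.mean(signal ** 2))        # short-time average energy
    amplitude = float(np.mean(np.abs(signal)))  # short-time average amplitude
    zcr = float(np.mean(librosa.feature.zero_crossing_rate(signal)))   # zero-crossing rate
    lpc = librosa.lpc(signal, order=12)[1:]     # linear prediction coefficients (LPC)
    mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13), axis=1)  # MFCC
    return np.concatenate([[energy, amplitude, zcr], lpc, mfcc])
```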
  • After the data processing module 22 has processed the patient voice data, the input module 23 sends the processed patient voice data to the input layer of the trained deep neural network model.
  • The acquisition module 21 is further configured to acquire the output state of the output layer of the deep neural network model after the processed patient voice data has been sent to the input layer of the trained model.
  • The judgment module 24 determines, according to the acquired output state, the category to which the patient voice data belongs.
  • To obtain the category of the patient voice data clearly and intuitively, the training module 20 is further configured to establish a mapping table between each speech category and the expected state that the category outputs in the trained deep neural network model. The judgment module 24 then matches the acquired output state against the expected states in the mapping table and, from the speech category that the matched expected state corresponds to in the table, determines that the patient corresponding to the patient voice data belongs to that speech category.
  • In this embodiment, the expected state of each speech category in the deep neural network model is the expected probability that the category outputs in the trained model. For example, if the output state obtained by feeding patient voice data into the trained model matches the expected probability of the speech category 'severe cold' in the trained model, the patient can be judged to have a severe cold, which provides data support for the doctor's subsequent diagnosis.
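A small sketch of this mapping-table match, assuming one-hot expected probabilities and nearest-vector matching; the patent fixes neither the exact expected states nor the comparison rule, so both are illustrative choices here.

```python
import numpy as np

EXPECTED_STATE = {                      # illustrative expected probabilities per category
    "severe cold":  np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    "mild cold":    np.array([0.0, 1.0, 0.0, 0.0, 0.0]),
    "severe cough": np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
    "mild cough":   np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
    "non-disease":  np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
}

def match_category(output_state):
    """Return the category whose expected state is nearest the acquired output state."""
    output_state = np.asarray(output_state, dtype=float)
    return min(EXPECTED_STATE,
               key=lambda c: float(np.linalg.norm(output_state - EXPECTED_STATE[c])))
```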
  • Through the above program modules 20-24, the program 200 for disease prediction using voice proposed by the present application first trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and finally determines, according to the acquired output state, the category to which the patient voice data belongs.
  • the present application also proposes a method for predicting disease using speech.
  • Referring to FIG. 4, a flowchart of the first embodiment of the method for disease prediction using voice according to the present application is shown.
  • the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
  • Step S401: train the deep neural network model with the training data.
  • Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment.
  • The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
  • In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
  • the deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203.
  • the input layer 201 is configured to calculate an output value of the hidden layer unit input to the lowest layer according to the voice feature data input to the deep neural network.
  • the voice feature data refers to voice data extracted from the training data.
  • Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above.
  • The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation.
  • the output probability is an output probability corresponding to the training data of the voice category.
  • That is, training data of the speech categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
  • An output value of each layer of the deep neural network can be computed as y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
  • After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the application server 1 calculates the output of the output layer with a softmax function.
  • The softmax function is: p_j = exp(x_j) / Σ_k exp(x_k), where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
  • After determining the structure of the deep neural network, the application server 1 needs to determine the weighting values of each layer of the network.
  • When the deep neural network is trained with all of the voice feature data, the application server 1 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error.
  • After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
  • Step S402: acquire real-time patient voice data.
  • Specifically, the application server 1 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data.
  • The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app.
  • The application server 1 can also actively record patient voice data. For example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
  • Step S403: perform data processing on the patient voice data. This step is described in detail in the second embodiment of the method for disease prediction using voice of the present application (see FIG. 5).
  • Step S404: send the processed patient voice data to the input layer of the trained deep neural network model.
  • Step S405: acquire the output state of the output layer of the deep neural network model.
  • In this embodiment, the expected state of each speech category in the deep neural network model is the expected probability that the category outputs in the trained deep neural network model.
  • Step S406: determine, according to the acquired output state, the category to which the patient voice data belongs.
  • To obtain the category of the patient voice data clearly and intuitively, before determining the category according to the acquired output state, the application server 1 also establishes a mapping table between each speech category and the expected state that the category outputs in the trained deep neural network model. The application server 1 then matches the acquired output state against the expected states in the mapping table and, from the speech category that the matched expected state corresponds to in the table, determines that the patient corresponding to the patient voice data belongs to that speech category.
  • For example, if the output state obtained by feeding patient voice data into the trained deep neural network model matches the expected probability of the speech category 'severe cold' in the trained model, the patient can be judged to have a severe cold, which provides data support for the doctor's subsequent diagnosis.
  • The method for disease prediction using voice proposed by the present application first trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and finally determines, according to the acquired output state, the category to which the patient voice data belongs.
  • The step of performing data processing on the patient voice data includes:
  • Step S501: perform front-end processing on the acquired patient voice data.
  • The front-end processing includes noise reduction and endpoint detection.
  • The endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system.
  • Step S502: perform feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
  • The feature values that the application server 1 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include the short-time average energy, the short-time average amplitude, the short-time average zero-crossing rate, formants, and the fundamental frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
  • The fundamental frequency reflects the glottal excitation characteristics; the formants reflect the characteristics of the vocal tract response; LPC and LPCC reflect both glottal excitation and vocal tract response; and MFCC models the auditory characteristics of the human ear. Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
  • The method for disease prediction using voice proposed by the present application improves the efficiency of the overall system by performing front-end processing on the acquired patient voice data, and gives an initial indication of the patient's degree of disease through the extraction and selection of feature values of the speech signal from the front-end-processed patient voice data.
  • The methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware; in many cases, however, the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Abstract

Disclosed in the present application is a voice-based disease prediction method. The method comprises: training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer; acquiring patient voice data in real time; performing data processing on the patient voice data; sending the processed patient voice data to the input layer of the trained deep neural network model; acquiring the output state of the output layer of the deep neural network model; and determining the category of the patient voice data according to the acquired output state. The present application further provides an application server and a computer readable storage medium. According to the voice-based disease prediction method, application server, and computer readable storage medium provided by the present application, a preliminary diagnosis can be made quickly for a patient on the basis of the patient's voice, thus providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.

Description

Method for disease prediction using voice, application server, and computer readable storage medium
This application claims priority under the Paris Convention to the Chinese patent application filed on October 23, 2017 with application number CN 201710995691.7, entitled "Method and Application Server for Disease Prediction Using Voice", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of disease prediction, and in particular to a method for disease prediction using voice, an application server, and a computer readable storage medium.
Background
Diagnosing disease by listening to the voice, one of the traditional diagnostic methods of observing, listening, inquiring, and pulse-taking, is considered an effective diagnostic means of traditional Chinese medicine. A good Chinese medicine practitioner usually has rich experience in practicing medicine, which takes time to accumulate, and during consultation, because the number of experts is limited, patients often have to wait a long time to obtain their diagnosis. Difficult and inefficient access to medical care is a prominent livelihood issue today. How to improve diagnostic efficiency so that everyone can have their own family doctor is a problem that urgently needs to be solved in the new era.
Summary of the Invention
The present application provides a method for disease prediction using voice, an application server, and a computer readable storage medium, which can conveniently make a quick preliminary diagnosis of a patient from the patient's voice before the patient receives formal treatment, thereby providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
A first aspect of the present application provides a method for disease prediction using voice, the method being applied to an application server and comprising:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
A second aspect of the present application provides an application server, the application server including a memory and a processor, the memory storing a program for disease prediction using voice that can be run on the processor; when the program for disease prediction using voice is executed by the processor, the following steps are implemented:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
A third aspect of the present application provides a computer readable storage medium storing a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
Compared with the prior art, the application server, the method for disease prediction using voice, and the computer readable storage medium proposed by the present application first train a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquire real-time patient voice data; perform data processing on the patient voice data; send the processed patient voice data to the input layer of the trained deep neural network model; acquire the output state of the output layer of the model; and finally determine, according to the acquired output state, the category to which the patient voice data belongs. In this way, the drawback of the prior art that the number of experts is limited and patients have to wait a long time for their diagnosis, making medical care difficult and inefficient, can be avoided: a quick preliminary diagnosis can conveniently be made from the patient's voice before the patient receives formal treatment, providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the application server;
FIG. 2 is a program module diagram of the first embodiment of the program for disease prediction using voice of the present application;
FIG. 3 is a structural diagram of the deep neural network model in a preferred embodiment of the present application;
FIG. 4 is a flowchart of the first embodiment of the method for disease prediction using voice of the present application;
FIG. 5 is a flowchart of the second embodiment of the method for disease prediction using voice of the present application.
Reference numerals:
Application server: 1
Memory: 11
Processor: 12
Network interface: 13
Program for disease prediction using voice: 200
Training module: 20
Acquisition module: 21
Data processing module: 22
Input module: 23
Judgment module: 24
Detailed Description of the Embodiments
The principles and features of the present application are described below with reference to the accompanying drawings; the examples given are only intended to explain the present application and are not intended to limit its scope.
Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the application server 1 is shown.
The application server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be a stand-alone server or a server cluster composed of multiple servers.
In this embodiment, the application server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicably connected to one another through a system bus.
The application server 1 connects to a network through the network interface 13 to obtain information. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
It should be pointed out that FIG. 1 only shows the application server 1 with components 11-13, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 1, such as its hard disk or internal memory. In other embodiments, the memory 11 may also be an external storage device of the application server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 1. Of course, the memory 11 may also include both the internal storage unit of the application server 1 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system installed in the application server 1 and various types of application software, such as the program code of the program 200 for disease prediction using voice. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used to control the overall operation of the application server 1, such as the control and processing related to data interaction or communication. In this embodiment, the processor 12 is configured to run the program code or process the data stored in the memory 11, for example to run the program 200 for disease prediction using voice.
The network interface 13 may include a wireless network interface or a wired network interface and is typically used to establish a communication connection between the application server 1 and other electronic devices.
In this embodiment, the program 200 for disease prediction using voice is installed and runs in the application server 1. When the program 200 runs, the application server 1 trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and determines, according to the acquired output state, the category to which the patient voice data belongs. In this way, the drawback of the prior art that the number of experts is limited and patients have to wait a long time for their diagnosis can be avoided, and a quick preliminary diagnosis can conveniently be made from the patient's voice before formal treatment, providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
So far, the hardware structure and functions of the related devices of the various embodiments of the present application have been described in detail. Various embodiments of the present application are proposed below on the basis of the above application environment and related devices.
First, the present application proposes a program 200 for disease prediction using voice.
In this embodiment, the program 200 for disease prediction using voice comprises a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the disease prediction control operations of the embodiments of the present application can be implemented. In some embodiments, based on the particular operations implemented by the various portions of the computer program instructions, the program 200 may be divided into one or more modules. For example, in FIG. 2, the program 200 is divided into a training module 20, an acquisition module 21, a data processing module 22, an input module 23, and a judgment module 24, wherein:
The training module 20 is configured to train the deep neural network model with the training data.
Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment. The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
FIG. 3 is a structural diagram of the deep neural network model in this embodiment. The deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203. The input layer 201 is configured to calculate, from the voice feature data input to the deep neural network, the output values fed into the units of the lowest hidden layer; the voice feature data refers to the feature data extracted from the training data. Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above. The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation. The output probability is the output probability corresponding to the training data of the speech category: training data of the categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
An output value of each layer of the deep neural network can be computed according to the following formula:
y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the training module 20 calculates the output of the output layer with a softmax function. The softmax function is:
p_j = exp(x_j) / Σ_k exp(x_k)
where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
After the training module 20 determines the structure of the deep neural network, it needs to determine the weighting values of each layer of the network. When the deep neural network is trained with all of the voice feature data, the training module 20 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers according to that error. After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
The acquisition module 21 is configured to acquire real-time patient voice data. Specifically, the acquisition module 21 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data. The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app. The acquisition module 21 can also actively record patient voice data; for example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
After the acquisition module 21 has acquired real-time patient voice data, the data processing module 22 performs data processing on it. Specifically, the data processing module 22 performs front-end processing on the acquired patient voice data, where the front-end processing includes noise reduction and endpoint detection. Further, the data processing module 22 performs feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
In this embodiment, the endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system. The feature values that the data processing module 22 needs to extract include time-domain feature parameters (short-time average energy, short-time average amplitude, short-time average zero-crossing rate, formants, fundamental frequency, etc.) and frequency-domain feature parameters (linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), etc.). The fundamental frequency reflects the glottal excitation characteristics; the formants reflect the characteristics of the vocal tract response; LPC and LPCC reflect both glottal excitation and vocal tract response; and MFCC models the auditory characteristics of the human ear. Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
Further, after the data processing module 22 has performed data processing on the patient voice data, the input module 23 feeds the processed patient voice data into the input layer of the trained deep neural network model.
The obtaining module 21 is further configured to obtain the output state of the output layer of the deep neural network model after the processed patient voice data has been fed into the input layer of the trained deep neural network model.
The determining module 24 determines, according to the obtained output state, the category to which the patient voice data belongs. To make the category of the patient voice data clear and intuitive, the training module 20 is further configured to build a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model. The determining module 24 then matches the obtained output state against the expected states in the mapping table; from the voice category that the matched expected state corresponds to in the mapping table, it can determine that the patient corresponding to the patient voice data belongs to that voice category.
In this embodiment, the expected state that each voice category produces in the deep neural network model is the expected probability output for that voice category by the trained model. For example, if the output state obtained by feeding a patient's voice data into the trained deep neural network model matches the expected probability of the voice category "severe cold", it can be determined that the patient has a severe cold, which in turn provides data support for the subsequent diagnosis by a doctor.
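The sketch below illustrates one plausible form of the mapping table and the matching step, assuming a nearest-match comparison between the model's output probabilities and each category's expected state. The category names come from this embodiment; the numeric values and the helper match_category are illustrative assumptions.

```python
import numpy as np

# Hypothetical mapping table: voice category -> expected output state of the
# trained model. The probability vectors are invented purely for illustration.
EXPECTED_STATES = {
    "severe cold":  np.array([0.90, 0.05, 0.02, 0.02, 0.01]),
    "mild cold":    np.array([0.05, 0.90, 0.02, 0.02, 0.01]),
    "severe cough": np.array([0.02, 0.02, 0.90, 0.05, 0.01]),
    "mild cough":   np.array([0.02, 0.02, 0.05, 0.90, 0.01]),
    "non-disease":  np.array([0.01, 0.01, 0.01, 0.02, 0.95]),
}

def match_category(output_state: np.ndarray) -> str:
    """Return the voice category whose expected state best matches the
    obtained output state (nearest match in Euclidean distance)."""
    return min(EXPECTED_STATES,
               key=lambda cat: np.linalg.norm(output_state - EXPECTED_STATES[cat]))
```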
Through the above program modules 20-24, the program 200 for disease prediction using voice proposed by the present application first trains a deep neural network model with training data, the training data having specific voice categories and the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category; secondly, it acquires real-time patient voice data; then it performs data processing on the patient voice data; next, it feeds the processed patient voice data into the input layer of the trained deep neural network model; it then obtains the output state of the output layer of the deep neural network model; and finally, it determines, according to the obtained output state, the category to which the patient voice data belongs. This avoids the drawbacks of the prior art, in which the number of experts is limited and patients must wait a long time for a diagnosis, making medical care hard to obtain and inefficient. Instead, before the patient undergoes formal treatment, a preliminary diagnosis can conveniently and quickly be made from the patient's voice, providing data support and a reference for the doctor's subsequent formal diagnosis and thereby greatly benefiting both doctors and patients.
In addition, the present application also proposes a method for disease prediction using voice.
Referring to FIG. 4, which is a flowchart of a first embodiment of the method for disease prediction using voice according to the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
Step S401: train a deep neural network model with training data.
Specifically, the training data refers to the voice sample data used to train the deep neural network model. The amount of voice sample data is chosen according to actual needs, and this embodiment places no specific limit on it. The training data has specific voice categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a voice category is the probability of that voice category occurring.
In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of each voice category.
As shown in FIG. 3, which is a structural diagram of the deep neural network model in this embodiment. The deep neural network includes an input layer 201, a plurality of hidden layers 202, and an output layer 203. The input layer 201 computes, from the voice feature data fed into the deep neural network, the values passed into the lowest hidden layer; the voice feature data are the features extracted from the training data. Each hidden layer 202 computes a weighted sum of the values from the layer below it, using its own weighting values, and produces the values passed to the layer above it. The output layer 203 computes a weighted sum of the values from the topmost hidden layer, using its own weighting values, and computes output probabilities from the result of that weighted sum. The output probabilities are the output probabilities corresponding to the training data of each voice category: training data of the categories severe cold, mild cold, severe cough, mild cough, and non-disease are fed into the basic deep neural network model, and the output probability corresponding to each category's training data is computed.
An output value of each layer of the deep neural network can be computed according to the following formula:
y_j = w · x_j, where y_j denotes the output value of the j-th training sample at the current layer, w denotes the weighting value of the current layer, and x_j denotes the input value of the j-th training sample at the current layer.
After computing the weighted-sum result of the output layer using the weighting values of the output layer 203, the application server 1 computes the output of the output layer using the softmax function. The softmax function is as follows:
p_j = exp(x_j) / Σ_k exp(x_k)
where p_j denotes the output probability of the j-th training sample at the output layer, x_j denotes the weighted-sum result of the j-th training sample at the output layer, and the sum in the denominator runs over all units of the output layer.
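Taken together, the per-layer formula and the softmax above amount to the forward pass sketched below. The use of one weight matrix per layer is an assumption made for illustration, since the embodiment states only the scalar form y_j = w · x_j.

```python
import numpy as np

def forward(x, hidden_weights, w_out):
    """Forward pass matching the formulas above: weighted sums through the
    hidden layers, then softmax at the output layer.

    x:              (n_features,) voice feature vector fed to the input layer
    hidden_weights: list of weight matrices, one per hidden layer
    w_out:          weight matrix of the output layer
    """
    h = x
    for w in hidden_weights:
        h = w @ h                    # y_j = w * x_j at each hidden layer
    z = w_out @ h                    # weighted-sum result of the output layer
    p = np.exp(z - z.max())          # softmax: p_j = exp(x_j) / sum_k exp(x_k)
    return p / p.sum()
```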
After determining the structure of the deep neural network, the application server 1 needs to determine the weighting values of each layer of the deep neural network. When training the deep neural network with all of the voice feature data, the application server 1 feeds all of the voice feature data into the deep neural network through its input layer, obtains the output probability of the deep neural network, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error. Once the adjusted weighting values of all layers have been obtained, the trained deep neural network model has been obtained.
Step S402: acquire real-time patient voice data. Specifically, the application server 1 records calls placed by the patient through the recording device of a call center, and stores each recorded call keyed by the caller's telephone number in order to obtain real-time patient voice data. The call center may be, but is not limited to, a hospital's telephone recording platform or a remote server connected to a mobile phone app. Alternatively, the application server 1 may actively record patient voice data; for example, in a hospital, a nurse may use a dedicated recording device to collect voice data directly from a patient and store it keyed by the patient's name (or other attribute data identifying the patient, such as an ID card number or a social security card number).
Step S403: perform data processing on the patient voice data. Specifically, the step of performing data processing on the patient voice data is described in detail in the second embodiment of the method for disease prediction using voice of the present application (see FIG. 5).
Step S404: feed the processed patient voice data into the input layer of the trained deep neural network model.
Step S405: obtain the output state of the output layer of the deep neural network model. In this embodiment, the expected state that each voice category produces in the deep neural network model is the expected probability output for that voice category by the trained deep neural network model.
Step S406: determine, according to the obtained output state, the category to which the patient voice data belongs.
To make the category of the patient voice data clear and intuitive, before determining the category to which the patient voice data belongs according to the obtained output state, the application server 1 also builds a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model. The application server 1 then matches the obtained output state against the expected states in the mapping table; from the voice category that the matched expected state corresponds to in the mapping table, it can determine that the patient corresponding to the patient voice data belongs to that voice category.
For example, if the output state obtained by feeding a patient's voice data into the trained deep neural network model matches the expected probability of the voice category "severe cold" in the trained model, it can be determined that the patient has a severe cold, which in turn provides data support for the subsequent diagnosis by a doctor.
Through the above steps S401-S406, the method for disease prediction using voice proposed by the present application first trains a deep neural network model with training data, the training data having specific voice categories and the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category; secondly, it acquires real-time patient voice data; then it performs data processing on the patient voice data; next, it feeds the processed patient voice data into the input layer of the trained deep neural network model; it then obtains the output state of the output layer of the deep neural network model; and finally, it determines, according to the obtained output state, the category to which the patient voice data belongs. This avoids the drawbacks of the prior art, in which the number of experts is limited and patients must wait a long time for a diagnosis, making medical care hard to obtain and inefficient. Instead, before the patient undergoes formal treatment, a preliminary diagnosis can conveniently and quickly be made from the patient's voice, providing data support and a reference for the doctor's subsequent formal diagnosis and thereby greatly benefiting both doctors and patients.
As shown in FIG. 5, which is a flowchart of a second embodiment of the method for disease prediction using voice according to the present application. In this embodiment, the step of performing data processing on the patient voice data includes:
Step S501: perform front-end processing on the acquired patient voice data. Specifically, the front-end processing includes noise reduction and endpoint detection. In this embodiment, endpoint detection determines whether the patient voice data to be processed is valid speech; if it is not valid speech, no data processing is performed on the voice data, which improves the efficiency of the overall system.
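The embodiment does not specify an endpoint-detection algorithm. The sketch below shows one common realization, a short-time-energy threshold check; the frame size and thresholds are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def is_valid_speech(y: np.ndarray, frame_len: int = 512,
                    energy_thresh: float = 1e-4, min_frames: int = 10) -> bool:
    """Endpoint detection via a short-time-energy threshold: the recording is
    treated as valid speech only if enough frames exceed the threshold."""
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)      # short-time average energy per frame
    return int((energy > energy_thresh).sum()) >= min_frames
```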
Step S502: extract and select feature values of the speech signal from the front-end-processed patient voice data.
In this embodiment, the feature values that the application server 1 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include short-time average energy, short-time average amplitude, short-time average zero-crossing rate, formants, and pitch frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC). The pitch frequency reflects the characteristics of glottal excitation, the formants reflect the characteristics of the vocal-tract response, LPC and LPCC reflect the characteristics of both glottal excitation and vocal-tract response, and MFCC models the auditory characteristics of the human ear. Speech associated with different diseases (and degrees of disease) will have different feature parameter values; therefore, feature extraction can give a preliminary indication of the severity of the patient's condition.
Through the above steps S501-S502, the method for disease prediction using voice proposed by the present application improves the efficiency of the overall system by applying front-end processing to the acquired patient voice data, and gives a preliminary indication of the severity of the patient's condition by extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
The above description is only of preferred embodiments of the present application and does not thereby limit the scope of the patent; any equivalent structural transformation made using the contents of the specification and drawings of the present application under its inventive concept, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for disease prediction using voice, applied to an application server, wherein the method comprises:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  2. The method for disease prediction using voice of claim 1, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  3. The method for disease prediction using voice of claim 2, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  4. The method for disease prediction using voice of claim 1, wherein the step of acquiring real-time patient voice data comprises:
    recording calls placed by the patient through a recording device of a call center, and storing the recorded calls keyed by telephone number.
  5. The method for disease prediction using voice of claim 4, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  6. The method for disease prediction using voice of claim 5, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  7. The method for disease prediction using voice of claim 1, wherein the step of performing data processing on the patient voice data comprises:
    performing front-end processing on the acquired patient voice data, the front-end processing comprising noise reduction and endpoint detection; and
    extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
  8. The method for disease prediction using voice of claim 7, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  9. The method for disease prediction using voice of claim 8, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  10. An application server, wherein the application server comprises a memory and a processor, the memory storing a program for disease prediction using voice that is executable on the processor, the program, when executed by the processor, implementing the following steps:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  11. The application server of claim 10, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  12. The application server of claim 11, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  13. The application server of claim 10, wherein the step of acquiring real-time patient voice data comprises:
    recording calls placed by the patient through a recording device of a call center, and storing the recorded calls keyed by telephone number.
  14. The application server of claim 13, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  15. The application server of claim 14, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  16. The application server of claim 10, wherein the step of performing data processing on the patient voice data comprises:
    performing front-end processing on the acquired patient voice data, the front-end processing comprising noise reduction and endpoint detection; and
    extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
  17. The application server of claim 16, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  18. The application server of claim 17, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  19. A computer readable storage medium, wherein the computer readable storage medium stores a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  20. The computer readable storage medium of claim 19, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the at least one processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
PCT/CN2018/089428 2017-10-23 2018-06-01 Voice-based disease prediction method, application server, and computer readable storage medium WO2019080502A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710995691.7A CN108053841A (en) 2017-10-23 2017-10-23 Method and application server for disease prediction using voice
CN201710995691.7 2017-10-23

Publications (1)

Publication Number Publication Date
WO2019080502A1 true WO2019080502A1 (en) 2019-05-02

Family

ID=62119669

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089428 WO2019080502A1 (en) 2017-10-23 2018-06-01 Voice-based disease prediction method, application server, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN108053841A (en)
WO (1) WO2019080502A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022167243A1 (en) * 2021-02-05 2022-08-11 Novoic Ltd. Speech processing method for identifying data representations for use in monitoring or diagnosis of a health condition

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053841A (en) * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 The method and application server of disease forecasting are carried out using voice
CN108518817A (en) * 2018-04-10 2018-09-11 珠海格力电器股份有限公司 Autonomous adjustment control method, device, and air-conditioning system
CN109431507A (en) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 Cough disease identification method and device based on deep learning
KR20220024217A (en) * 2019-05-30 2022-03-03 Insurance Services Office, Inc. Systems and methods for machine learning of speech properties
CN110473616B (en) * 2019-08-16 2022-08-23 北京声智科技有限公司 Voice signal processing method, device and system
CN112259126B (en) * 2020-09-24 2023-06-20 广州大学 Robot and method for assisting in identifying autism voice features
CN116530944B (en) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102342858A (en) * 2010-08-06 2012-02-08 上海中医药大学 Chinese medicine sound diagnosis acquisition and analysis system
WO2016192612A1 (en) * 2015-06-02 2016-12-08 陈宽 Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN106709254A (en) * 2016-12-29 2017-05-24 天津中科智能识别产业技术研究院有限公司 Medical diagnostic robot system
CN106710599A (en) * 2016-12-02 2017-05-24 深圳撒哈拉数据科技有限公司 Particular sound source detection method and particular sound source detection system based on deep neural network
CN108053841A (en) * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 Method and application server for disease prediction using voice

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739869B (en) * 2008-11-19 2012-03-28 中国科学院自动化研究所 Priori knowledge-based pronunciation evaluation and diagnosis system
WO2013142908A1 (en) * 2012-03-29 2013-10-03 The University Of Queensland A method and apparatus for processing patient sounds
CN103578470B * 2012-08-09 2019-10-18 科大讯飞股份有限公司 Telephone recording data processing method and system
CN104347066B * 2013-08-09 2019-11-12 上海掌门科技有限公司 Baby cry recognition method and system based on deep neural network
US9687208B2 (en) * 2015-06-03 2017-06-27 iMEDI PLUS Inc. Method and system for recognizing physiological sound
CN105869658B * 2016-04-01 2019-08-27 金陵科技学院 Voice endpoint detection method using nonlinear features
CN105869627A (en) * 2016-04-28 2016-08-17 成都之达科技有限公司 Vehicle-networking-based speech processing method
CN106778014B (en) * 2016-12-29 2020-06-16 浙江大学 Disease risk prediction modeling method based on recurrent neural network
CN107068167A (en) * 2017-03-13 2017-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures


Also Published As

Publication number Publication date
CN108053841A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
WO2019080502A1 (en) Voice-based disease prediction method, application server, and computer readable storage medium
US20180261236A1 (en) Speaker recognition method and apparatus, computer device and computer-readable medium
US20200380957A1 (en) Systems and Methods for Machine Learning of Voice Attributes
US10270736B2 (en) Account adding method, terminal, server, and computer storage medium
KR20190022432A (en) ELECTRONIC DEVICE, IDENTIFICATION METHOD, SYSTEM, AND COMPUTER READABLE STORAGE MEDIUM
WO2021000408A1 (en) Interview scoring method and apparatus, and device and storage medium
WO2019136909A1 (en) Voice living-body detection method based on deep learning, server and storage medium
US20090326937A1 (en) Using personalized health information to improve speech recognition
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
CN109299227B (en) Information query method and device based on voice recognition
CN111933291A (en) Medical information recommendation device, method, system, equipment and readable storage medium
US11749298B2 (en) Health-related information generation and storage
WO2021159902A1 (en) Age recognition method, apparatus and device, and computer-readable storage medium
WO2020233381A1 (en) Speech recognition-based service request method and apparatus, and computer device
WO2022047311A1 (en) Computerized decision support tool and medical device for respiratory condition monitoring and care
WO2021159755A1 (en) Smart diagnosis and treatment data processing method, device, apparatus, and storage medium
CN110752027A (en) Electronic medical record data pushing method and device, computer equipment and storage medium
WO2019187107A1 (en) Information processing device, control method, and program
CN110767282B (en) Health record generation method and device and computer readable storage medium
WO2022205249A1 (en) Audio feature compensation method, audio recognition method, and related product
CN114141251A (en) Voice recognition method, voice recognition device and electronic equipment
CN110858819A (en) Corpus collection method and device based on WeChat applet and computer equipment
CN111967235A (en) Form processing method and device, computer equipment and storage medium
CN114201580A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112927413A (en) Medical registration method, medical registration device, medical registration equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18871385

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18871385

Country of ref document: EP

Kind code of ref document: A1