WO2019080502A1 - Voice-based disease prediction method, application server, and computer readable storage medium - Google Patents

Voice-based disease prediction method, application server, and computer readable storage medium

Info

Publication number
WO2019080502A1
WO2019080502A1 · PCT/CN2018/089428 · CN2018089428W
Authority
WIPO (PCT)
Prior art keywords
voice, patient, category, neural network, voice data
Prior art date
Application number
PCT/CN2018/089428
Other languages
French (fr)
Chinese (zh)
Inventor
梁浩 (Liang Hao)
王健宗 (Wang Jianzong)
肖京 (Xiao Jing)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019080502A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques for comparison or discrimination for extracting parameters related to health condition

Definitions

  • the present application relates to the field of disease prediction, and in particular, to a method for predicting disease using voice, an application server, and a computer readable storage medium.
  • The present application provides a method for disease prediction using voice, an application server, and a computer readable storage medium, which can conveniently make a quick preliminary diagnosis of a patient from the patient's voice before the patient receives formal treatment, thereby providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
  • a first aspect of the present application provides a method for predicting disease using voice, the method being applied to an application server, the method comprising:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • A second aspect of the present application provides an application server, the application server including a memory and a processor, the memory storing a program for disease prediction using voice that can be run on the processor; when the program for disease prediction using voice is executed by the processor, the following steps are implemented:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • A third aspect of the present application provides a computer readable storage medium storing a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
  • training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
  • acquiring real-time patient voice data;
  • performing data processing on the patient voice data;
  • sending the processed patient voice data to the input layer of the trained deep neural network model;
  • acquiring the output state of the output layer of the deep neural network model; and
  • determining, according to the acquired output state, the category to which the patient voice data belongs.
  • Compared with the prior art, the application server, the method for disease prediction using voice, and the computer readable storage medium proposed by the present application first train a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquire real-time patient voice data; perform data processing on the patient voice data; send the processed patient voice data to the input layer of the trained deep neural network model; acquire the output state of the output layer of the model; and finally determine, according to the acquired output state, the category to which the patient voice data belongs.
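Read together, the three aspects describe one six-step pipeline. The following Python sketch wires those steps together purely for illustration; every helper named here (train_model, record_patient_voice, preprocess, extract_features) is a hypothetical placeholder rather than anything disclosed in the patent, and the final argmax is a simplification of the mapping-table match described in the embodiments below.

```python
# A minimal sketch of the claimed six-step flow, under the assumptions above.

CATEGORIES = ["severe cold", "mild cold", "severe cough", "mild cough", "non-disease"]

def predict_disease_from_voice(raw_audio, model):
    """Return the speech category predicted for one patient recording."""
    voice = preprocess(raw_audio)            # step 3: noise reduction, endpoint detection
    features = extract_features(voice)       # step 3: feature value extraction and selection
    output_state = model.predict(features)   # steps 4-5: feed the input layer, read the output state
    # step 6: take the category with the highest output probability
    best = max(range(len(CATEGORIES)), key=lambda i: output_state[i])
    return CATEGORIES[best]

# Usage (all objects hypothetical):
#   model = train_model(training_data)                  # step 1
#   raw_audio = record_patient_voice(call_center_line)  # step 2
#   print(predict_disease_from_voice(raw_audio, model))
```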
  • 1 is a schematic diagram of an optional hardware architecture of an application server
  • FIG. 2 is a program module diagram of the first embodiment of the program for disease prediction using voice according to the present application
  • FIG. 3 is a structural diagram of a deep neural network model in a preferred embodiment of the present application.
  • FIG. 4 is a flow chart of a first embodiment of a method for disease prediction using speech
  • FIG. 5 is a flow chart of a second embodiment of a method for disease prediction using voice.
  • application server 1; memory 11; processor 12; network interface 13; program for disease prediction using voice 200; training module 20; acquisition module 21; data processing module 22; input module 23; judgment module 24
  • Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the application server 1 is shown.
  • The application server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server.
  • the application server 1 may be a stand-alone server or a server cluster composed of multiple servers.
  • The application server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicably connected to one another through a system bus.
  • the application server 1 connects to the network through the network interface 13 to obtain information.
  • The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
  • Figure 1 only shows the application server 1 with components 11-13, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
  • The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like.
  • the memory 11 may be an internal storage unit of the application server 1, such as a hard disk or memory of the application server 1.
  • The memory 11 may also be an external storage device of the application server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 1.
  • the memory 11 can also include both the internal storage unit of the application server 1 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the application server 1 and various types of application software, such as program codes of the program 200 for performing disease prediction using voice. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the application server 1, such as performing data interaction or communication related control and processing, and the like.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running the program 200 for performing disease prediction using voice.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the application server 1 and other electronic devices.
  • In this embodiment, a program 200 for disease prediction using voice is installed and runs in the application server 1. When the program 200 runs, the application server 1 trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and determines, according to the acquired output state, the category to which the patient voice data belongs.
  • First, the present application proposes a program 200 for disease prediction using voice.
  • In this embodiment, the program 200 for disease prediction using voice comprises a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the disease prediction control operations of the embodiments of the present application can be implemented.
  • In some embodiments, based on the particular operations implemented by the various portions of the computer program instructions, the program 200 for disease prediction using voice may be divided into one or more modules. For example, in FIG. 2, the program 200 may be divided into a training module 20, an acquisition module 21, a data processing module 22, an input module 23, and a judgment module 24, wherein:
  • the training module 20 is configured to train the deep neural network model with the training data.
  • Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment.
  • The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
  • In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
  • the deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203.
  • the input layer 201 is configured to calculate an output value of the hidden layer unit input to the lowest layer according to the voice feature data input to the deep neural network.
  • the voice feature data refers to voice data extracted from the training data.
  • Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above.
  • The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation.
  • the output probability is an output probability corresponding to the training data of the voice category.
  • That is, training data of the speech categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
  • An output value of each layer of the deep neural network can be computed as y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
  • After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the output of the output layer is calculated with a softmax function.
  • The softmax function is: p_j = exp(x_j) / Σ_k exp(x_k), where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
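A minimal NumPy sketch of the forward computation just described: per-layer weighted sums y_j = w * x_j followed by a softmax at the output layer. The patent's simplified formula has no bias terms and names no activation function, so the ReLU between hidden layers is an assumption added here.

```python
import numpy as np

def softmax(x):
    """p_j = exp(x_j) / sum_k exp(x_k); subtracting max(x) only adds numerical stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward(features, weights):
    """Propagate one voice feature vector through the network.

    `weights` is a list of per-layer weight matrices, mirroring the per-layer
    weighted sum described above.
    """
    x = np.asarray(features, dtype=float)
    for w in weights[:-1]:             # hidden layers 202: weighted sum of the layer below
        x = np.maximum(0.0, w @ x)     # assumed ReLU activation (not specified in the text)
    return softmax(weights[-1] @ x)    # output layer 203: weighted sum, then output probabilities
```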
  • After the training module 20 determines the structure of the deep neural network, it needs to determine the weighting values of each layer of the network.
  • When the deep neural network is trained with all of the voice feature data, the training module 20 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error.
  • After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
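The text only states that the weights are adjusted according to the error between the output probability and the expected output probability; it does not give the update rule. The sketch below, reusing `softmax` from the previous snippet, assumes plain stochastic gradient descent and, for brevity, updates only the output-layer weights; a complete implementation would backpropagate the error into every hidden layer, as the text describes.

```python
import numpy as np

def train(weights, samples, expected_probs, lr=0.01, epochs=10):
    """samples: voice feature vectors; expected_probs: expected output probabilities (one-hot)."""
    for _ in range(epochs):
        for x, expected in zip(samples, expected_probs):
            h = np.asarray(x, dtype=float)
            for w in weights[:-1]:                   # forward pass, keeping the topmost hidden activation
                h = np.maximum(0.0, w @ h)
            p = softmax(weights[-1] @ h)             # output probability of the network
            error = p - expected                     # error vs. the expected output probability
            weights[-1] -= lr * np.outer(error, h)   # adjust weights against that error
    return weights
```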
  • The acquisition module 21 is configured to acquire real-time patient voice data. Specifically, the acquisition module 21 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data.
  • The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app.
  • The acquisition module 21 can also actively record patient voice data. For example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
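A trivial sketch of this storage step, assuming the recording arrives as raw WAV bytes; the on-disk layout is an illustrative assumption, since the text only requires that recordings be stored under the telephone number or patient identifier.

```python
from pathlib import Path

def store_recording(audio_bytes: bytes, identifier: str, root: str = "recordings") -> Path:
    """Save one patient recording, indexed by telephone number or patient ID."""
    path = Path(root) / f"{identifier}.wav"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio_bytes)
    return path

# e.g. store_recording(call_audio, "13800000000")     # keyed by telephone number
#      store_recording(ward_audio, "patient-420123")  # keyed by patient ID
```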
  • The data processing module 22 is configured to perform data processing on the patient voice data. Specifically, the data processing module 22 performs front-end processing on the acquired patient voice data, where the front-end processing includes noise reduction and endpoint detection. Further, the data processing module 22 performs feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
  • The endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system.
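One common way to realize such an endpoint check is a short-time-energy gate: if no frame exceeds an energy threshold, the recording is treated as invalid speech and skipped. The frame length and threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def is_valid_voice(signal, frame_len=400, energy_threshold=1e-4):
    """Return True if any frame's short-time energy exceeds the threshold."""
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return False
    for i in range(0, max(len(signal) - frame_len, 1), frame_len):
        frame = signal[i:i + frame_len]
        if float(np.mean(frame ** 2)) > energy_threshold:
            return True   # a voiced frame was found; continue with data processing
    return False          # no valid speech; skip further processing
```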
  • The feature values that the data processing module 22 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include the short-time average energy, the short-time average amplitude, the short-time average zero-crossing rate, formants, and the fundamental frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
  • The fundamental frequency reflects the glottal excitation characteristics.
  • The formants reflect the characteristics of the vocal tract response.
  • LPC and LPCC reflect both glottal excitation and vocal tract response.
  • MFCC models the auditory characteristics of the human ear.
  • Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
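As a sketch of this extraction step, the snippet below collects a handful of the listed parameters into one feature vector. The use of librosa (and the specific counts, 12 LPC coefficients and 13 MFCCs) is an assumption for illustration; the patent names the features but prescribes no library or dimensionality.

```python
import numpy as np
import librosa

def extract_features(signal, sr=16000):
    """Gather some of the listed time- and frequency-domain feature parameters."""
    signal = np.asarray(signal, dtype=float)
    energy = float(np.mean(signal ** 2))        # short-time average energy
    amplitude = float(np.mean(np.abs(signal)))  # short-time average amplitude
    zcr = float(np.mean(librosa.feature.zero_crossing_rate(signal)))   # zero-crossing rate
    lpc = librosa.lpc(signal, order=12)[1:]     # linear prediction coefficients (LPC)
    mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13), axis=1)  # MFCC
    return np.concatenate([[energy, amplitude, zcr], lpc, mfcc])
```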
  • After the data processing module 22 has processed the patient voice data, the input module 23 sends the processed patient voice data to the input layer of the trained deep neural network model.
  • The acquisition module 21 is further configured to acquire the output state of the output layer of the deep neural network model after the processed patient voice data has been sent to the input layer of the trained model.
  • The judgment module 24 determines, according to the acquired output state, the category to which the patient voice data belongs.
  • To obtain the category of the patient voice data clearly and intuitively, the training module 20 is further configured to establish a mapping table between each speech category and the expected state that the category outputs in the trained deep neural network model. The judgment module 24 then matches the acquired output state against the expected states in the mapping table and, from the speech category that the matched expected state corresponds to in the table, determines that the patient corresponding to the patient voice data belongs to that speech category.
  • In this embodiment, the expected state of each speech category in the deep neural network model is the expected probability that the category outputs in the trained model. For example, if the output state obtained by feeding patient voice data into the trained model matches the expected probability of the speech category 'severe cold' in the trained model, the patient can be judged to have a severe cold, which provides data support for the doctor's subsequent diagnosis.
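A small sketch of this mapping-table match, assuming one-hot expected probabilities and nearest-vector matching; the patent fixes neither the exact expected states nor the comparison rule, so both are illustrative choices here.

```python
import numpy as np

EXPECTED_STATE = {                      # illustrative expected probabilities per category
    "severe cold":  np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    "mild cold":    np.array([0.0, 1.0, 0.0, 0.0, 0.0]),
    "severe cough": np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
    "mild cough":   np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
    "non-disease":  np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
}

def match_category(output_state):
    """Return the category whose expected state is nearest the acquired output state."""
    output_state = np.asarray(output_state, dtype=float)
    return min(EXPECTED_STATE,
               key=lambda c: float(np.linalg.norm(output_state - EXPECTED_STATE[c])))
```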
  • Through the above program modules 20-24, the program 200 for disease prediction using voice proposed by the present application first trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and finally determines, according to the acquired output state, the category to which the patient voice data belongs.
  • the present application also proposes a method for predicting disease using speech.
  • Referring to FIG. 4, a flowchart of the first embodiment of the method for disease prediction using voice according to the present application is shown.
  • the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
  • Step S401: train the deep neural network model with the training data.
  • Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment.
  • The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
  • In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
  • the deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203.
  • the input layer 201 is configured to calculate an output value of the hidden layer unit input to the lowest layer according to the voice feature data input to the deep neural network.
  • the voice feature data refers to voice data extracted from the training data.
  • Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above.
  • The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation.
  • the output probability is an output probability corresponding to the training data of the voice category.
  • That is, training data of the speech categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
  • An output value of each layer of the deep neural network can be computed as y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
  • After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the application server 1 calculates the output of the output layer with a softmax function.
  • The softmax function is: p_j = exp(x_j) / Σ_k exp(x_k), where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
  • After determining the structure of the deep neural network, the application server 1 needs to determine the weighting values of each layer of the network.
  • When the deep neural network is trained with all of the voice feature data, the application server 1 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error.
  • After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
  • Step S402: acquire real-time patient voice data.
  • Specifically, the application server 1 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data.
  • The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app.
  • The application server 1 can also actively record patient voice data. For example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
  • Step S403: perform data processing on the patient voice data. This step is described in detail in the second embodiment of the method for disease prediction using voice of the present application (see FIG. 5).
  • Step S404: send the processed patient voice data to the input layer of the trained deep neural network model.
  • Step S405: acquire the output state of the output layer of the deep neural network model.
  • In this embodiment, the expected state of each speech category in the deep neural network model is the expected probability that the category outputs in the trained deep neural network model.
  • Step S406: determine, according to the acquired output state, the category to which the patient voice data belongs.
  • To obtain the category of the patient voice data clearly and intuitively, before determining the category according to the acquired output state, the application server 1 also establishes a mapping table between each speech category and the expected state that the category outputs in the trained deep neural network model. The application server 1 then matches the acquired output state against the expected states in the mapping table and, from the speech category that the matched expected state corresponds to in the table, determines that the patient corresponding to the patient voice data belongs to that speech category.
  • For example, if the output state obtained by feeding patient voice data into the trained deep neural network model matches the expected probability of the speech category 'severe cold' in the trained model, the patient can be judged to have a severe cold, which provides data support for the doctor's subsequent diagnosis.
  • The method for disease prediction using voice proposed by the present application first trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and finally determines, according to the acquired output state, the category to which the patient voice data belongs.
  • The step of performing data processing on the patient voice data includes:
  • Step S501: perform front-end processing on the acquired patient voice data.
  • The front-end processing includes noise reduction and endpoint detection.
  • The endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system.
  • Step S502: perform feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
  • The feature values that the application server 1 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include the short-time average energy, the short-time average amplitude, the short-time average zero-crossing rate, formants, and the fundamental frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
  • The fundamental frequency reflects the glottal excitation characteristics; the formants reflect the characteristics of the vocal tract response; LPC and LPCC reflect both glottal excitation and vocal tract response; and MFCC models the auditory characteristics of the human ear. Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
  • The method for disease prediction using voice proposed by the present application improves the efficiency of the overall system by performing front-end processing on the acquired patient voice data, and gives an initial indication of the patient's degree of disease through the extraction and selection of feature values of the speech signal from the front-end-processed patient voice data.
  • The methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware; in many cases, however, the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Abstract

Disclosed in the present application is a voice-based disease prediction method. The method comprises: training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer; acquiring patient voice data in real time; performing data processing on the patient voice data; sending the processed patient voice data to the input layer of the trained deep neural network model; acquiring the output state of the output layer of the deep neural network model; and determining the category of the patient voice data according to the acquired output state. The present application further provides an application server and a computer readable storage medium. According to the voice-based disease prediction method, application server, and computer readable storage medium provided by the present application, a preliminary diagnosis can be made quickly for a patient on the basis of the patient's voice, thus providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.

Description

Method for disease prediction using voice, application server, and computer readable storage medium
This application claims priority under the Paris Convention to the Chinese patent application filed on October 23, 2017 with application number CN 201710995691.7, entitled "Method and Application Server for Disease Prediction Using Voice", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of disease prediction, and in particular to a method for disease prediction using voice, an application server, and a computer readable storage medium.
Background
Diagnosing disease by listening to the voice, one of the traditional diagnostic methods of observing, listening, inquiring, and pulse-taking, is considered an effective diagnostic means of traditional Chinese medicine. A good Chinese medicine practitioner usually has rich experience in practicing medicine, which takes time to accumulate, and during consultation, because the number of experts is limited, patients often have to wait a long time to obtain their diagnosis. Difficult and inefficient access to medical care is a prominent livelihood issue today. How to improve diagnostic efficiency so that everyone can have their own family doctor is a problem that urgently needs to be solved in the new era.
Summary of the Invention
The present application provides a method for disease prediction using voice, an application server, and a computer readable storage medium, which can conveniently make a quick preliminary diagnosis of a patient from the patient's voice before the patient receives formal treatment, thereby providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
A first aspect of the present application provides a method for disease prediction using voice, the method being applied to an application server and comprising:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
A second aspect of the present application provides an application server, the application server including a memory and a processor, the memory storing a program for disease prediction using voice that can be run on the processor; when the program for disease prediction using voice is executed by the processor, the following steps are implemented:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
A third aspect of the present application provides a computer readable storage medium storing a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
training a deep neural network model with training data, the training data having a specific speech category, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of the speech category;
acquiring real-time patient voice data;
performing data processing on the patient voice data;
sending the processed patient voice data to the input layer of the trained deep neural network model;
acquiring the output state of the output layer of the deep neural network model; and
determining, according to the acquired output state, the category to which the patient voice data belongs.
Compared with the prior art, the application server, the method for disease prediction using voice, and the computer readable storage medium proposed by the present application first train a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; then acquire real-time patient voice data; perform data processing on the patient voice data; send the processed patient voice data to the input layer of the trained deep neural network model; acquire the output state of the output layer of the model; and finally determine, according to the acquired output state, the category to which the patient voice data belongs. In this way, the drawback of the prior art that the number of experts is limited and patients have to wait a long time for their diagnosis, making medical care difficult and inefficient, can be avoided: a quick preliminary diagnosis can conveniently be made from the patient's voice before the patient receives formal treatment, providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the application server;
FIG. 2 is a program module diagram of the first embodiment of the program for disease prediction using voice of the present application;
FIG. 3 is a structural diagram of the deep neural network model in a preferred embodiment of the present application;
FIG. 4 is a flowchart of the first embodiment of the method for disease prediction using voice of the present application;
FIG. 5 is a flowchart of the second embodiment of the method for disease prediction using voice of the present application.
Reference numerals:
Application server: 1
Memory: 11
Processor: 12
Network interface: 13
Program for disease prediction using voice: 200
Training module: 20
Acquisition module: 21
Data processing module: 22
Input module: 23
Judgment module: 24
Detailed Description of the Embodiments
The principles and features of the present application are described below with reference to the accompanying drawings; the examples given are only intended to explain the present application and are not intended to limit its scope.
Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the application server 1 is shown.
The application server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be a stand-alone server or a server cluster composed of multiple servers.
In this embodiment, the application server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which are communicably connected to one another through a system bus.
The application server 1 connects to a network through the network interface 13 to obtain information. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
It should be pointed out that FIG. 1 only shows the application server 1 with components 11-13, but it should be understood that implementing all of the illustrated components is not required; more or fewer components may be implemented instead.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 1, such as its hard disk or internal memory. In other embodiments, the memory 11 may also be an external storage device of the application server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 1. Of course, the memory 11 may also include both the internal storage unit of the application server 1 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system installed in the application server 1 and various types of application software, such as the program code of the program 200 for disease prediction using voice. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used to control the overall operation of the application server 1, such as the control and processing related to data interaction or communication. In this embodiment, the processor 12 is configured to run the program code or process the data stored in the memory 11, for example to run the program 200 for disease prediction using voice.
The network interface 13 may include a wireless network interface or a wired network interface and is typically used to establish a communication connection between the application server 1 and other electronic devices.
In this embodiment, the program 200 for disease prediction using voice is installed and runs in the application server 1. When the program 200 runs, the application server 1 trains a deep neural network model using training data having a specific speech category, the model having an input layer and an output layer, the output layer being able to output the state of the speech category; acquires real-time patient voice data; performs data processing on the patient voice data; sends the processed patient voice data to the input layer of the trained deep neural network model; acquires the output state of the output layer of the model; and determines, according to the acquired output state, the category to which the patient voice data belongs. In this way, the drawback of the prior art that the number of experts is limited and patients have to wait a long time for their diagnosis can be avoided, and a quick preliminary diagnosis can conveniently be made from the patient's voice before formal treatment, providing data support and reference for the doctor's subsequent formal diagnosis and greatly facilitating doctors and patients.
So far, the hardware structure and functions of the related devices of the various embodiments of the present application have been described in detail. Various embodiments of the present application are proposed below on the basis of the above application environment and related devices.
First, the present application proposes a program 200 for disease prediction using voice.
In this embodiment, the program 200 for disease prediction using voice comprises a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the disease prediction control operations of the embodiments of the present application can be implemented. In some embodiments, based on the particular operations implemented by the various portions of the computer program instructions, the program 200 may be divided into one or more modules. For example, in FIG. 2, the program 200 is divided into a training module 20, an acquisition module 21, a data processing module 22, an input module 23, and a judgment module 24, wherein:
The training module 20 is configured to train the deep neural network model with the training data.
Specifically, the training data refers to the voice sample data used to train the deep neural network model; the amount of voice sample data is chosen according to actual needs and is not specifically limited in this embodiment. The training data has specific speech categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a speech category is the probability of occurrence of that speech category.
In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of the speech category.
FIG. 3 is a structural diagram of the deep neural network model in this embodiment. The deep neural network includes an input layer 201, a plurality of hidden layers 202, and a plurality of output layers 203. The input layer 201 is configured to calculate, from the voice feature data input to the deep neural network, the output values fed into the units of the lowest hidden layer; the voice feature data refers to the feature data extracted from the training data. Each hidden layer 202 is configured to compute, using the layer's weighting values, a weighted sum of the input values from the layer below and to calculate the output value passed to the hidden layer above. The output layer 203 is configured to compute, using the layer's weighting values, a weighted sum of the output values from the topmost hidden layer and to calculate an output probability from the result of the weighted summation. The output probability is the output probability corresponding to the training data of the speech category: training data of the categories severe cold, mild cold, severe cough, mild cough, and non-disease is fed into the basic deep neural network model, and the output probability corresponding to the training data of each speech category is calculated.
An output value of each layer of the deep neural network can be computed according to the following formula:
y_j = w * x_j, where y_j denotes the output value of the j-th training datum at the current layer, w the weighting value of the current layer, and x_j the input value of the j-th training datum at the current layer.
After the weighted summation result of the output layer is obtained using the weighting values of the output layer 203, the training module 20 calculates the output of the output layer with a softmax function. The softmax function is:
p_j = exp(x_j) / Σ_k exp(x_k)
where p_j denotes the output probability of the j-th training datum at the output layer and x_j the weighted summation result of the j-th training datum at the output layer.
After the training module 20 determines the structure of the deep neural network, it needs to determine the weighting values of each layer of the network. When the deep neural network is trained with all of the voice feature data, the training module 20 inputs the voice feature data at the input layer of the deep neural network, obtains the network's output probability, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers according to that error. After the adjusted weighting values of each layer are obtained, the trained deep neural network model is obtained.
The acquisition module 21 is configured to acquire real-time patient voice data. Specifically, the acquisition module 21 records the patient's incoming telephone voice via the recording device of a call center and stores the recording indexed by telephone number, thereby obtaining real-time patient voice data. The call center can be, but is not limited to, a hospital telephone recording platform or a remote server connected through a mobile phone app. The acquisition module 21 can also actively record patient voice data; for example, in a hospital, a nurse can use dedicated recording equipment to collect voice data specifically from a patient and store it indexed by the patient's name (or other attribute data representing the patient's identity, such as an ID number or social security card number).
After the acquisition module 21 has acquired real-time patient voice data, the data processing module 22 performs data processing on it. Specifically, the data processing module 22 performs front-end processing on the acquired patient voice data, where the front-end processing includes noise reduction and endpoint detection. Further, the data processing module 22 performs feature value extraction and selection of the speech signal on the front-end-processed patient voice data.
In this embodiment, the endpoint detection is used to determine whether the patient voice data to be processed is valid speech; if it is not, the voice data is not processed further, which improves the efficiency of the overall system. The feature values that the data processing module 22 needs to extract include time-domain feature parameters (short-time average energy, short-time average amplitude, short-time average zero-crossing rate, formants, fundamental frequency, etc.) and frequency-domain feature parameters (linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), etc.). The fundamental frequency reflects the glottal excitation characteristics; the formants reflect the characteristics of the vocal tract response; LPC and LPCC reflect both glottal excitation and vocal tract response; and MFCC models the auditory characteristics of the human ear. Voices of different diseases (and degrees of disease) have different feature parameter values, so the patient's degree of disease can be initially reflected by extracting these feature values.
Further, after the data processing module 22 has performed data processing on the patient voice data, the input module 23 feeds the processed patient voice data into the input layer of the trained deep neural network model.
The obtaining module 21 is further configured to obtain the output state of the output layer of the deep neural network model after the processed patient voice data has been fed into the input layer of the trained deep neural network model.
The determining module 24 determines, according to the obtained output state, the category to which the patient voice data belongs. To make the category of the patient voice data clear and intuitive, the training module 20 is further configured to build a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model. The determining module 24 then matches the obtained output state against the expected states in the mapping table; from the voice category that the matched expected state corresponds to in the mapping table, it can determine that the patient corresponding to the patient voice data belongs to that voice category.
In this embodiment, the expected state that each voice category produces in the deep neural network model is the expected probability output for that voice category by the trained model. For example, if the output state obtained by feeding a patient's voice data into the trained deep neural network model matches the expected probability of the voice category "severe cold", it can be determined that the patient has a severe cold, which in turn provides data support for the subsequent diagnosis by a doctor.
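The sketch below illustrates one plausible form of the mapping table and the matching step, assuming a nearest-match comparison between the model's output probabilities and each category's expected state. The category names come from this embodiment; the numeric values and the helper match_category are illustrative assumptions.

```python
import numpy as np

# Hypothetical mapping table: voice category -> expected output state of the
# trained model. The probability vectors are invented purely for illustration.
EXPECTED_STATES = {
    "severe cold":  np.array([0.90, 0.05, 0.02, 0.02, 0.01]),
    "mild cold":    np.array([0.05, 0.90, 0.02, 0.02, 0.01]),
    "severe cough": np.array([0.02, 0.02, 0.90, 0.05, 0.01]),
    "mild cough":   np.array([0.02, 0.02, 0.05, 0.90, 0.01]),
    "non-disease":  np.array([0.01, 0.01, 0.01, 0.02, 0.95]),
}

def match_category(output_state: np.ndarray) -> str:
    """Return the voice category whose expected state best matches the
    obtained output state (nearest match in Euclidean distance)."""
    return min(EXPECTED_STATES,
               key=lambda cat: np.linalg.norm(output_state - EXPECTED_STATES[cat]))
```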
Through the above program modules 20-24, the program 200 for disease prediction using voice proposed by the present application first trains a deep neural network model with training data, the training data having specific voice categories and the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category; secondly, it acquires real-time patient voice data; then it performs data processing on the patient voice data; next, it feeds the processed patient voice data into the input layer of the trained deep neural network model; it then obtains the output state of the output layer of the deep neural network model; and finally, it determines, according to the obtained output state, the category to which the patient voice data belongs. This avoids the drawbacks of the prior art, in which the number of experts is limited and patients must wait a long time for a diagnosis, making medical care hard to obtain and inefficient. Instead, before the patient undergoes formal treatment, a preliminary diagnosis can conveniently and quickly be made from the patient's voice, providing data support and a reference for the doctor's subsequent formal diagnosis and thereby greatly benefiting both doctors and patients.
In addition, the present application also proposes a method for disease prediction using voice.
Referring to FIG. 4, which is a flowchart of a first embodiment of the method for disease prediction using voice according to the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
Step S401: train a deep neural network model with training data.
Specifically, the training data refers to the voice sample data used to train the deep neural network model. The amount of voice sample data is chosen according to actual needs, and this embodiment places no specific limit on it. The training data has specific voice categories, including severe cold, mild cold, severe cough, mild cough, and non-disease; the state of a voice category is the probability of that voice category occurring.
In this embodiment, the deep neural network model has an input layer and an output layer; further, the deep neural network model also has hidden layers. The output layer can output the state of each voice category.
As shown in FIG. 3, which is a structural diagram of the deep neural network model in this embodiment. The deep neural network includes an input layer 201, a plurality of hidden layers 202, and an output layer 203. The input layer 201 computes, from the voice feature data fed into the deep neural network, the values passed into the lowest hidden layer; the voice feature data are the features extracted from the training data. Each hidden layer 202 computes a weighted sum of the values from the layer below it, using its own weighting values, and produces the values passed to the layer above it. The output layer 203 computes a weighted sum of the values from the topmost hidden layer, using its own weighting values, and computes output probabilities from the result of that weighted sum. The output probabilities are the output probabilities corresponding to the training data of each voice category: training data of the categories severe cold, mild cold, severe cough, mild cough, and non-disease are fed into the basic deep neural network model, and the output probability corresponding to each category's training data is computed.
An output value of each layer of the deep neural network can be computed according to the following formula:
y_j = w · x_j, where y_j denotes the output value of the j-th training sample at the current layer, w denotes the weighting value of the current layer, and x_j denotes the input value of the j-th training sample at the current layer.
After computing the weighted-sum result of the output layer using the weighting values of the output layer 203, the application server 1 computes the output of the output layer using the softmax function. The softmax function is as follows:
p_j = exp(x_j) / Σ_k exp(x_k)
where p_j denotes the output probability of the j-th training sample at the output layer, x_j denotes the weighted-sum result of the j-th training sample at the output layer, and the sum in the denominator runs over all units of the output layer.
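Taken together, the per-layer formula and the softmax above amount to the forward pass sketched below. The use of one weight matrix per layer is an assumption made for illustration, since the embodiment states only the scalar form y_j = w · x_j.

```python
import numpy as np

def forward(x, hidden_weights, w_out):
    """Forward pass matching the formulas above: weighted sums through the
    hidden layers, then softmax at the output layer.

    x:              (n_features,) voice feature vector fed to the input layer
    hidden_weights: list of weight matrices, one per hidden layer
    w_out:          weight matrix of the output layer
    """
    h = x
    for w in hidden_weights:
        h = w @ h                    # y_j = w * x_j at each hidden layer
    z = w_out @ h                    # weighted-sum result of the output layer
    p = np.exp(z - z.max())          # softmax: p_j = exp(x_j) / sum_k exp(x_k)
    return p / p.sum()
```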
After determining the structure of the deep neural network, the application server 1 needs to determine the weighting values of each layer of the deep neural network. When training the deep neural network with all of the voice feature data, the application server 1 feeds all of the voice feature data into the deep neural network through its input layer, obtains the output probability of the deep neural network, calculates the error between the output probability and the expected output probability, and adjusts the weighting values of the hidden layers of the deep neural network according to that error. Once the adjusted weighting values of all layers have been obtained, the trained deep neural network model has been obtained.
Step S402: acquire real-time patient voice data. Specifically, the application server 1 records calls placed by the patient through the recording device of a call center, and stores each recorded call keyed by the caller's telephone number in order to obtain real-time patient voice data. The call center may be, but is not limited to, a hospital's telephone recording platform or a remote server connected to a mobile phone app. Alternatively, the application server 1 may actively record patient voice data; for example, in a hospital, a nurse may use a dedicated recording device to collect voice data directly from a patient and store it keyed by the patient's name (or other attribute data identifying the patient, such as an ID card number or a social security card number).
Step S403: perform data processing on the patient voice data. Specifically, the step of performing data processing on the patient voice data is described in detail in the second embodiment of the method for disease prediction using voice of the present application (see FIG. 5).
Step S404: feed the processed patient voice data into the input layer of the trained deep neural network model.
Step S405: obtain the output state of the output layer of the deep neural network model. In this embodiment, the expected state that each voice category produces in the deep neural network model is the expected probability output for that voice category by the trained deep neural network model.
Step S406: determine, according to the obtained output state, the category to which the patient voice data belongs.
To make the category of the patient voice data clear and intuitive, before determining the category to which the patient voice data belongs according to the obtained output state, the application server 1 also builds a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model. The application server 1 then matches the obtained output state against the expected states in the mapping table; from the voice category that the matched expected state corresponds to in the mapping table, it can determine that the patient corresponding to the patient voice data belongs to that voice category.
For example, if the output state obtained by feeding a patient's voice data into the trained deep neural network model matches the expected probability of the voice category "severe cold" in the trained model, it can be determined that the patient has a severe cold, which in turn provides data support for the subsequent diagnosis by a doctor.
Through the above steps S401-S406, the method for disease prediction using voice proposed by the present application first trains a deep neural network model with training data, the training data having specific voice categories and the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category; secondly, it acquires real-time patient voice data; then it performs data processing on the patient voice data; next, it feeds the processed patient voice data into the input layer of the trained deep neural network model; it then obtains the output state of the output layer of the deep neural network model; and finally, it determines, according to the obtained output state, the category to which the patient voice data belongs. This avoids the drawbacks of the prior art, in which the number of experts is limited and patients must wait a long time for a diagnosis, making medical care hard to obtain and inefficient. Instead, before the patient undergoes formal treatment, a preliminary diagnosis can conveniently and quickly be made from the patient's voice, providing data support and a reference for the doctor's subsequent formal diagnosis and thereby greatly benefiting both doctors and patients.
As shown in FIG. 5, which is a flowchart of a second embodiment of the method for disease prediction using voice according to the present application. In this embodiment, the step of performing data processing on the patient voice data includes:
Step S501: perform front-end processing on the acquired patient voice data. Specifically, the front-end processing includes noise reduction and endpoint detection. In this embodiment, endpoint detection determines whether the patient voice data to be processed is valid speech; if it is not valid speech, no data processing is performed on the voice data, which improves the efficiency of the overall system.
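The embodiment does not specify an endpoint-detection algorithm. The sketch below shows one common realization, a short-time-energy threshold check; the frame size and thresholds are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def is_valid_speech(y: np.ndarray, frame_len: int = 512,
                    energy_thresh: float = 1e-4, min_frames: int = 10) -> bool:
    """Endpoint detection via a short-time-energy threshold: the recording is
    treated as valid speech only if enough frames exceed the threshold."""
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)      # short-time average energy per frame
    return int((energy > energy_thresh).sum()) >= min_frames
```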
Step S502: extract and select feature values of the speech signal from the front-end-processed patient voice data.
In this embodiment, the feature values that the application server 1 needs to extract include time-domain feature parameters and frequency-domain feature parameters. The time-domain feature parameters include short-time average energy, short-time average amplitude, short-time average zero-crossing rate, formants, and pitch frequency; the frequency-domain feature parameters include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC). The pitch frequency reflects the characteristics of glottal excitation, the formants reflect the characteristics of the vocal-tract response, LPC and LPCC reflect the characteristics of both glottal excitation and vocal-tract response, and MFCC models the auditory characteristics of the human ear. Speech associated with different diseases (and degrees of disease) will have different feature parameter values; therefore, feature extraction can give a preliminary indication of the severity of the patient's condition.
Through the above steps S501-S502, the method for disease prediction using voice proposed by the present application improves the efficiency of the overall system by applying front-end processing to the acquired patient voice data, and gives a preliminary indication of the severity of the patient's condition by extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
The above description is only of preferred embodiments of the present application and does not thereby limit the scope of the patent; any equivalent structural transformation made using the contents of the specification and drawings of the present application under its inventive concept, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for disease prediction using voice, applied to an application server, wherein the method comprises:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  2. The method for disease prediction using voice of claim 1, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  3. The method for disease prediction using voice of claim 2, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  4. The method for disease prediction using voice of claim 1, wherein the step of acquiring real-time patient voice data comprises:
    recording calls placed by the patient through a recording device of a call center, and storing the recorded calls keyed by telephone number.
  5. The method for disease prediction using voice of claim 4, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  6. The method for disease prediction using voice of claim 5, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  7. The method for disease prediction using voice of claim 1, wherein the step of performing data processing on the patient voice data comprises:
    performing front-end processing on the acquired patient voice data, the front-end processing comprising noise reduction and endpoint detection; and
    extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
  8. The method for disease prediction using voice of claim 7, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the method further comprises:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  9. The method for disease prediction using voice of claim 8, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  10. An application server, wherein the application server comprises a memory and a processor, the memory storing a program for disease prediction using voice that is executable on the processor, the program, when executed by the processor, implementing the following steps:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  11. The application server of claim 10, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  12. The application server of claim 11, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  13. The application server of claim 10, wherein the step of acquiring real-time patient voice data comprises:
    recording calls placed by the patient through a recording device of a call center, and storing the recorded calls keyed by telephone number.
  14. The application server of claim 13, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  15. The application server of claim 14, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  16. The application server of claim 10, wherein the step of performing data processing on the patient voice data comprises:
    performing front-end processing on the acquired patient voice data, the front-end processing comprising noise reduction and endpoint detection; and
    extracting and selecting feature values of the speech signal from the front-end-processed patient voice data.
  17. The application server of claim 16, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
  18. The application server of claim 17, wherein the step of determining, according to the obtained output state, the category to which the patient voice data belongs comprises:
    matching the obtained output state against the expected states in the mapping table; and
    determining, from the voice category corresponding in the mapping table to the matched expected state, that the patient corresponding to the patient voice data belongs to the voice category corresponding to that expected state.
  19. A computer readable storage medium, wherein the computer readable storage medium stores a program for disease prediction using voice, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
    training a deep neural network model with training data, the training data having specific voice categories, the deep neural network model having an input layer and an output layer, the output layer being able to output the state of each voice category;
    acquiring real-time patient voice data;
    performing data processing on the patient voice data;
    feeding the processed patient voice data into the input layer of the trained deep neural network model;
    obtaining an output state of the output layer of the deep neural network model; and
    determining, according to the obtained output state, the category to which the patient voice data belongs.
  20. The computer readable storage medium of claim 19, wherein, before the step of determining, according to the obtained output state, the category to which the patient voice data belongs, the program for disease prediction using voice, when executed by the at least one processor, further implements the following step:
    building a mapping table between each voice category and the expected state output for that voice category by the trained deep neural network model.
PCT/CN2018/089428 2017-10-23 2018-06-01 Voice-based disease prediction method, application server, and computer readable storage medium WO2019080502A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710995691.7A CN108053841A (en) 2017-10-23 2017-10-23 Method and application server for disease prediction using voice
CN201710995691.7 2017-10-23

Publications (1)

Publication Number Publication Date
WO2019080502A1 true WO2019080502A1 (en) 2019-05-02

Family

ID=62119669

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089428 WO2019080502A1 (en) 2017-10-23 2018-06-01 Voice-based disease prediction method, application server, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN108053841A (en)
WO (1) WO2019080502A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022167243A1 (en) * 2021-02-05 2022-08-11 Novoic Ltd. Speech processing method for identifying data representations for use in monitoring or diagnosis of a health condition

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053841A (en) * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 The method and application server of disease forecasting are carried out using voice
CN108518817A (en) * 2018-04-10 2018-09-11 珠海格力电器股份有限公司 Autonomous adjustment control method, device, and air-conditioning system
CN109431507A (en) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 Cough disease identification method and device based on deep learning
KR20220024217A (en) * 2019-05-30 2022-03-03 Insurance Services Office, Inc. Systems and methods for machine learning of speech properties
CN110473616B (en) * 2019-08-16 2022-08-23 北京声智科技有限公司 Voice signal processing method, device and system
CN112259126B (en) * 2020-09-24 2023-06-20 广州大学 Robot and method for assisting in identifying autism voice features
CN116530944B (en) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102342858A (en) * 2010-08-06 2012-02-08 上海中医药大学 Chinese medicine sound diagnosis acquisition and analysis system
WO2016192612A1 (en) * 2015-06-02 2016-12-08 陈宽 Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN106709254A (en) * 2016-12-29 2017-05-24 天津中科智能识别产业技术研究院有限公司 Medical diagnostic robot system
CN106710599A (en) * 2016-12-02 2017-05-24 深圳撒哈拉数据科技有限公司 Particular sound source detection method and particular sound source detection system based on deep neural network
CN108053841A (en) * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 Method and application server for disease prediction using voice

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739869B (en) * 2008-11-19 2012-03-28 中国科学院自动化研究所 Priori knowledge-based pronunciation evaluation and diagnosis system
WO2013142908A1 (en) * 2012-03-29 2013-10-03 The University Of Queensland A method and apparatus for processing patient sounds
CN103578470B * 2012-08-09 2019-10-18 科大讯飞股份有限公司 Telephone recording data processing method and system
CN104347066B * 2013-08-09 2019-11-12 上海掌门科技有限公司 Baby cry recognition method and system based on deep neural network
US9687208B2 (en) * 2015-06-03 2017-06-27 iMEDI PLUS Inc. Method and system for recognizing physiological sound
CN105869658B * 2016-04-01 2019-08-27 金陵科技学院 Voice endpoint detection method using nonlinear features
CN105869627A (en) * 2016-04-28 2016-08-17 成都之达科技有限公司 Vehicle-networking-based speech processing method
CN106778014B (en) * 2016-12-29 2020-06-16 浙江大学 Disease risk prediction modeling method based on recurrent neural network
CN107068167A (en) * 2017-03-13 2017-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures


Also Published As

Publication number Publication date
CN108053841A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
WO2019080502A1 (en) Voice-based disease prediction method, application server, and computer readable storage medium
US20180261236A1 (en) Speaker recognition method and apparatus, computer device and computer-readable medium
US20200380957A1 (en) Systems and Methods for Machine Learning of Voice Attributes
US10270736B2 (en) Account adding method, terminal, server, and computer storage medium
KR20190022432A (en) ELECTRONIC DEVICE, IDENTIFICATION METHOD, SYSTEM, AND COMPUTER READABLE STORAGE MEDIUM
WO2021000408A1 (en) Interview scoring method and apparatus, and device and storage medium
WO2019136909A1 (en) Voice living-body detection method based on deep learning, server and storage medium
US20090326937A1 (en) Using personalized health information to improve speech recognition
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
CN109299227B (en) Information query method and device based on voice recognition
CN111933291A (en) Medical information recommendation device, method, system, equipment and readable storage medium
US11749298B2 (en) Health-related information generation and storage
WO2021159902A1 (en) Age recognition method, apparatus and device, and computer-readable storage medium
WO2020233381A1 (en) Speech recognition-based service request method and apparatus, and computer device
WO2022047311A1 (en) Computerized decision support tool and medical device for respiratory condition monitoring and care
WO2021159755A1 (en) Smart diagnosis and treatment data processing method, device, apparatus, and storage medium
CN110752027A (en) Electronic medical record data pushing method and device, computer equipment and storage medium
WO2019187107A1 (en) Information processing device, control method, and program
CN110767282B (en) Health record generation method and device and computer readable storage medium
WO2022205249A1 (en) Audio feature compensation method, audio recognition method, and related product
CN114141251A (en) Voice recognition method, voice recognition device and electronic equipment
CN110858819A (en) Corpus collection method and device based on WeChat applet and computer equipment
CN111967235A (en) Form processing method and device, computer equipment and storage medium
CN114201580A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112927413A (en) Medical registration method, medical registration device, medical registration equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18871385

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18871385

Country of ref document: EP

Kind code of ref document: A1