JP2004240394A - Speaker voice analysis system and server device used therefor, medical examination method using speaker voice analysis, and speaker voice analyzer - Google Patents

Info

Publication number
JP2004240394A
Authority
JP
Japan
Prior art keywords
speaker voice
analysis
voice
user
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003182824A
Other languages
Japanese (ja)
Inventor
Hiroshi Tanimoto
広志 谷本
Original Assignee
Sense It Smart Corp
株式会社センス・イット・スマート
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2002361526
Application filed by Sense It Smart Corp (株式会社センス・イット・スマート)
Priority to JP2003182824A
Publication of JP2004240394A
Application status: Pending

Abstract

PROBLEM TO BE SOLVED: To enable users to easily check their own health condition at any desired place.

SOLUTION: Speaker voices input from users' mobile phones 1a, 1b are recorded in a database 7 through a CTI server 5 and a DB server 6. An analysis server 8 chaos-analyzes the recorded voices to determine a feature level representing the user's degree of fatigue, and the analysis result is presented to the mobile phones 1a, 1b through a WEB server 4. In this way, simply by recording one's voice with a mobile phone at any location, the "degree of fatigue", which until now could be expressed only in vague terms, is provided as a clear numerical value.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a speaker voice analysis system, a server device used for the same, a health diagnosis method using speaker voice analysis, and a speaker voice analysis program. More specifically, it relates to a method of analyzing the irregularly changing fluctuation of a user's voice and expressing the user's health state or mental state as a numerical value.
[0002]
[Prior art]
In recent years there has been a health-oriented boom: various health-related products have come onto the market, and many people pay more attention to their health than before. Many causes of the semi-healthy state are said to stem from the stress prevailing in modern society; not only adults but also young people are under stress, and many disorders are attributed to it.
[0003]
Under such circumstances, efforts have been made to check one's own health condition. As a measuring device therefor, a weight scale, a body fat percentage meter, a sphygmomanometer, a pulse meter and the like are provided.
[0004]
[Problems to be solved by the invention]
However, although a weight scale, a body fat percentage meter, and the like are relatively easy to use, they can only be used at home where the device is kept, and do not allow free measurement at any time while out. Moreover, it is practically impossible to grasp one's current health condition from the measured values alone.
[0005]
In addition, blood pressure monitors and pulse meters likewise cannot be used freely anytime, anywhere. Furthermore, to take a measurement the user must remove the device from its storage place, wrap the cuff around the arm, turn on the switch, and wait for a while; handling is very troublesome.
[0006]
The present invention has been made to solve such a problem, and an object of the present invention is to make it possible to easily check one's own health condition at any place.
[0007]
[Means for Solving the Problems]
A speaker voice analysis system of the present invention is a system in which a server device and a client device are connectable via a network. The server device comprises: speaker voice acquiring means for acquiring, via the network, a user's speaker voice input at the client device; speaker voice analyzing means for analyzing the acquired speaker voice to obtain an index value representing the user's health state, mental state, and the like; and analysis result providing means for providing the analysis result by the speaker voice analyzing means to the client device. The client device comprises: voice input means for inputting the user's speaker voice; speaker voice providing means for providing the speaker voice input by the voice input means to the server device; and analysis result acquiring and outputting means for acquiring and outputting the analysis result provided from the server device.
[0008]
In another aspect of the present invention, the speaker voice analyzing means comprises: feature amount calculating means for calculating, from the speaker voice acquired by the speaker voice acquiring means, a state vector that is a chaotic feature amount of the voice; and neural network calculating means for obtaining an index value representing the user's health state, mental state, and the like by performing a neural network operation using the state vector and a plurality of coefficients.
[0009]
Another aspect of the present invention further comprises: target value calculating means for calculating a target value of the index representing the user's health state, mental state, and the like; and learning means for optimizing the plurality of coefficients in the neural network by minimizing the error between the output value of the neural network when the state vector is given to its input and the target value obtained by the target value calculating means.
[0010]
In another aspect of the present invention, the target value calculating means calculates a target value of the degree of fatigue by performing a predetermined calculation using data values obtained from a flicker test and from a survey in which the subject rates his or her own degree of fatigue in questionnaire form.
[0011]
Further, the server device of the present invention comprises: speaker voice acquiring means for acquiring, via a network, a user's speaker voice input at a client device; speaker voice analyzing means for analyzing the acquired speaker voice to obtain an index value representing the user's health state, mental state, and the like; and analysis result providing means for providing the analysis result by the speaker voice analyzing means to the client device.
[0012]
Also, a health diagnosis method using speaker voice analysis according to the present invention comprises: a speaker voice transmitting step of inputting a user's speaker voice at a client device and transmitting it to a server device via a network; a speaker voice receiving step in which the server device receives the speaker voice transmitted in the speaker voice transmitting step; a speaker voice analyzing step of analyzing the received speaker voice to obtain an index value representing the user's health state, mental state, and the like; an analysis result providing step of providing the result analyzed in the speaker voice analyzing step to the client device; and an analysis result output step in which the client device acquires and outputs the provided analysis result.
[0013]
The speaker voice analysis program according to the present invention causes a computer to function as: speaker voice acquiring means for acquiring, via a network, a user's speaker voice input at a client device; speaker voice analyzing means for analyzing the acquired speaker voice to obtain an index value representing the user's health state, mental state, and the like; and analysis result providing means for providing the analysis result by the speaker voice analyzing means to the client device.
[0014]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an example of the overall configuration of the speaker voice analysis system according to the present embodiment.
[0015]
In FIG. 1, reference numerals 1a and 1b denote mobile phones used by a user and have a call function and an Internet connection function. 2a is a mobile phone packet network, 2b is the Internet, and 2c is a public line network. Reference numeral 3 denotes a router having a firewall function, 4 denotes a WEB server, 5 denotes a CTI (Computer Telephony Integration) server, 6 denotes a DB server, 7 denotes a database (DB), and 8 denotes an analysis server.
[0016]
The WEB server 4 provides a WWW browser screen to the mobile phones 1a and 1b as the interface for the user, obtains necessary information from the mobile phones 1a and 1b through the browser screen, and outputs the analysis result of a speaker voice. The information obtained through the WWW browser screen includes the user's personal information (name, e-mail address, mobile phone number, gender and other basic data, date of birth, blood type, hometown, physical condition, personality, password, etc.) and various menu operation information.
[0017]
The CTI server 5 automatically plays back a pre-recorded voice in response to a dialing operation from the mobile phones 1a and 1b, performs an automatic voice response, and acquires necessary information from the mobile phones 1a and 1b. The information acquired here is the user's speaker voice. The speaker voice may be in any language, and a sample of, for example, about 2 to 10 seconds is assumed. Preferably, a preliminary experiment is performed to identify a word from which the fluctuation (chaotic character) of the voice is easily extracted, and the user is asked to utter that word.
[0018]
On the browser screen provided to the mobile phones 1a and 1b by the WEB server 4 described above, an anchor tag of the form <a href="tel:(telephone number)"> is described in the HTML so that the user can easily place a call to the CTI server 5. When the user selects this "TEL" link on the browser screen, the mobile phone 1a or 1b automatically calls the CTI server 5, and the CTI server 5 automatically answers the incoming call placed by this automatic call function. Thereafter, the user's voice is recorded through an automatic voice response and registered in the database 7 through the DB server 6.
[0019]
The DB server 6 manages information shared among the WEB server 4, the CTI server 5, and the analysis server 8. For example, it stores in the database 7 the data obtained from the mobile phones 1a and 1b through the WEB server 4 and the CTI server 5 (personal information data, speaker voice data, etc.) and the analysis results of the speaker voices produced by the analysis server 8. In response to requests from the mobile phones 1a and 1b, it also extracts analysis results from the database 7 and provides them to the WEB server 4.
[0020]
The analysis server 8 monitors an analysis request from the CTI server 5, and executes a predetermined analysis process when the request is detected. That is, the analysis server 8 reads, from the database 7, the speaker voice requested to be analyzed from the CTI server 5, and performs chaos analysis on this. Then, the analysis result is supplied to the DB server 6 and stored in the database 7. The details of the chaos analysis will be described later.
[0021]
The router 3, the WEB server 4, the CTI server 5, the DB server 6, the database 7, and the analysis server 8 constitute the server device 10 of the present embodiment. Each server constituting the server device 10 is in practice provided with the CPU or MPU, RAM, ROM, and so on of a computer, and its functions can be realized by running a program stored in the RAM or ROM.
[0022]
Therefore, the present invention can be realized by recording a program that causes a computer to perform the functions of the present embodiment on a recording medium such as a CD-ROM, and reading the program into the computer. As a recording medium for recording the program, a flexible disk, a hard disk, a magnetic tape, an optical disk, a magneto-optical disk, a DVD, a nonvolatile memory card, and the like can be used in addition to the CD-ROM. Further, the present invention can also be realized by downloading the program to a computer via a network such as the Internet 2b.
[0023]
Further, in order to realize the functions of the server device 10 according to the present embodiment in a network environment, all or some of the programs may be executed by another computer.
[0024]
The functions of the present embodiment are realized not only when the computer executes the supplied program by itself, but also when the program realizes them in cooperation with the OS (Operating System) or other application software running on the computer, or when all or part of the processing of the supplied program is performed by a function expansion board or function expansion unit of the computer. Such programs are also included in this embodiment.
[0025]
Next, the operation of the speaker voice analysis system according to the present embodiment, configured as described above, will be described. FIG. 2 is a sequence flowchart showing the overall operation of the analysis system. As shown in FIG. 2, the user first accesses the WEB server 4 from the mobile phone 1a or 1b, performs menu operations on the browser screen provided by the WEB server 4, and enters the fatigue diagnosis site (step S1).
[0026]
FIG. 3 is a diagram showing an example of a menu screen displayed on the mobile phones 1a and 1b. In the top menu shown in FIG. 3 (a), when the item "latest information" or "reception check" is selected and the menu operation is further continued, an item "diagnosis" (not shown) appears. By selecting this item, a screen as shown in FIG. 3B is displayed, and the user is ready to start a self-check.
[0027]
Next, when the user selects the item "call" on the screen of FIG. 3(b) (step S2), the mobile phone 1a or 1b automatically places a call to the CTI server 5 in response to the selection (step S3). When the CTI server 5 receives the call placed by this automatic call function, it automatically replies to the mobile phone 1a or 1b by voice so as to prompt the user to record a voice sample (step S4).
[0028]
The user inputs his / her voice in accordance with the voice guidance provided by the CTI server 5, and then presses the “#” button. Thereby, the CTI server 5 acquires the speaker's voice of the user and stores it in the database 7 (step S5). Then, a recording end message is output to the mobile phones 1a and 1b, and the line with the mobile phones 1a and 1b is disconnected (step S6).
[0029]
FIG. 4 is a flowchart showing the details of the processing in steps S4 to S6. In FIG. 4, the CTI server 5 monitors for incoming calls from the mobile phones 1a and 1b (step S11). When there is an incoming call, it first outputs an opening message such as "Your voice will be registered and checked. Please operate according to the guidance." (step S12).
[0030]
Next, the CTI server 5 requests the user to record the voice by sending a message such as "Register the voice after the dial tone and press #." (Step S13). In response to this, the user inputs his / her own voice and then presses the “#” button to record the voice (step S14). Next, the CTI server 5 reproduces the recorded voice (step S15), and confirms with the user whether or not the content is acceptable (step S16).
[0031]
For example, a message is issued instructing the user to press the "#" push button if the content is acceptable, or the "9" push button to change it. When the "9" push button is pressed, the process returns to step S13 and the voice is recorded again. When the "#" push button is pressed, an end message such as "Your voice has been registered." is output (step S17), and the line is disconnected (step S18).
[0032]
Returning to FIG. 2, when the recording of the speaker's voice is completed as described above, the CTI server 5 requests the analysis server 8 to analyze the recorded speaker's voice (step S7). The analysis server 8 that has received the analysis request performs a chaos analysis process described later in detail, and stores the analysis result in the database 7 (step S8). Thereafter, when the user accesses the WEB server 4 and selects the item “diagnosis result display” from the menu screens of the mobile phones 1a and 1b, the WEB server 4 takes out the requested analysis result from the database 7 and displays it ( Step S9).
[0033]
FIGS. 5 and 6 are flowcharts showing the operation of the analysis server 8: FIG. 5 shows its main operation, and FIG. 6 shows the detailed operation of the chaos analysis processing. In FIG. 5, when the power is first turned on, the analysis server 8 performs predetermined initialization processing such as setting system information (step S21).
[0034]
Next, the analysis server 8 performs an end check of this processing (step S22), and determines whether or not there is an end request (step S23). If there is no end request, a chaos analysis process is performed (step S24). After that, after a sleep state for a predetermined time (step S25), the process returns to step S22. On the other hand, if there is a request for ending this processing, predetermined post-processing is performed (step S26), and this processing ends.
[0035]
The chaos analysis processing in step S24 is performed according to the flowchart in FIG. In FIG. 6, the analysis server 8 performs an analysis request check (step S31), and determines whether there is an analysis request from the CTI server 5 (step S32). If there is no analysis request, the process exits the chaos analysis process. On the other hand, when there is an analysis request, the analysis data is extracted from the speaker voice data recorded in the database 7 (step S33).
[0036]
The extraction of the analysis data is performed as follows. First, from the voice time-series data recorded in the database 7, data for a predetermined number of sample points is extracted from the central part of the time series after a silent-part removal process. For example, the maximum absolute value of the voice time-series data is obtained; if, over 20 consecutive points starting from some point, every data value is less than one fifth (20%) of that maximum, those points are omitted. If even one of the 20 points has a value of one fifth (20%) or more of the maximum, the middle point of those 20 points is taken as output data. Output data are extracted in this way from the central part of the whole voice time series until the previously specified number of sample points is reached.
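The silent-part removal just described can be sketched in Python. This is only an illustrative interpretation of the patent's description; the function name, the non-overlapping 20-point windows, and the centre-extraction logic are our assumptions, not the actual implementation.

```python
def extract_analysis_data(samples, n_points, window=20, ratio=0.2):
    """Hypothetical sketch of the silent-part removal: windows of 20
    points whose values all stay below 20% of the global peak are
    dropped; otherwise the window's middle point is kept as output
    data.  Finally n_points are taken from the centre of the
    surviving data."""
    peak = max(abs(s) for s in samples)
    kept = []
    # scan the series in non-overlapping 20-point windows (assumption)
    for i in range(0, len(samples) - window + 1, window):
        block = samples[i:i + window]
        if any(abs(s) >= ratio * peak for s in block):
            kept.append(block[window // 2])  # middle point as output
    mid, half = len(kept) // 2, n_points // 2
    return kept[max(0, mid - half):mid + half]
```

Applied to a series that is silent except for a burst in the middle, only points from the voiced region survive.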
[0037]
After extracting the time-series analysis data, the analysis server 8 performs a chaos calculation on the time-series data to obtain the feature amounts of the speaker voice (step S34). In this chaos calculation, three parameters, the Lyapunov exponent (L), the entropy (E), and the F-constant (F), are first calculated, and the calculated parameters are then input to a neural network program, thereby converting the speaker-voice time series into a numerical value.
[0038]
The Lyapunov exponent of a map x(n+1) = f(x(n)) means the degree of divergence, as time n → ∞, of two trajectories starting from two close points, and is defined by the following (Equation 1). Here, N is the total number of reconstructed vectors.
[0039]
(Equation 1)
λ = (1/N) · Σ (n = 1 to N) ln |f′(x(n))|
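As a concrete, hypothetical illustration of the time-average form of (Equation 1), the following Python sketch estimates the Lyapunov exponent of the logistic map x(n+1) = 4·x(n)·(1 − x(n)), whose exact exponent is ln 2 ≈ 0.693. The map and the function name are ours, chosen only to demonstrate the calculation; the patent computes the exponent from reconstructed voice vectors instead.

```python
import math

def lyapunov_logistic(x0=0.3, r=4.0, n=10000, burn=100):
    """Estimate the largest Lyapunov exponent of the logistic map
    x(n+1) = r*x*(1-x) as the time average (1/N) * sum ln|f'(x(n))|,
    the one-dimensional form of (Equation 1); f'(x) = r*(1 - 2x)."""
    x = x0
    for _ in range(burn):            # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n
```

For r = 4 the estimate converges toward ln 2; a positive exponent indicates the chaotic divergence of nearby trajectories that the text describes.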
[0040]
The entropy means a quantitative measure of the irregularity of a system, or of the amount of information necessary to specify the state of the system, and is defined by the following (Equation 2). Consider a virtual statistical system in which a certain measurement result always falls on the unit interval, and divide this interval into N small sub-sections. If the i-th sub-section contains a certain range of the possible results, a probability P(i) can be assigned to it. Simply put, the smaller the entropy, the more organized and useful the information; the larger the entropy, the more disorganized and less useful it is.
[0041]
(Equation 2)
E = −Σ (i = 1 to N) P(i) · ln P(i)
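A minimal Python sketch of (Equation 2), assuming, as the text suggests, that the probabilities P(i) are the fractions of samples falling in N equal sub-sections of the sample range; the histogram binning and function name are our assumptions.

```python
import math

def shannon_entropy(samples, n_bins=10):
    """E = -sum_i P(i) * ln P(i)  (Equation 2), with P(i) estimated as
    the fraction of samples in the i-th of n_bins equal sub-sections
    of the sample range."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0     # avoid zero-width sections
    counts = [0] * n_bins
    for s in samples:
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in counts if c)
```

A series spread evenly over the sub-sections gives the maximum entropy ln(n_bins), while a constant series gives 0, matching the organized-versus-disorganized reading above.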
[0042]
The fractal dimension is an extension of the concept of a normal dimension to a non-integer region. Higuchi fractal dimension, Hausdorff dimension, correlation dimension, and the like have been proposed. Among them, the Higuchi fractal dimension means the degree of geometrical complexity when the time-series waveform is regarded as a one-dimensional geometric structure, and takes a larger value as the structure becomes more complicated, that is, as the fluctuation becomes larger.
[0043]
In the algorithm for calculating the Higuchi fractal dimension, time-series data X(1), X(2), ..., X(N) sampled at equal time intervals are used as input. First, from the input time series, new time-series data X_m^k are made as shown in the following (Equation 3). Here, m is the initial time, k is the time interval, and [ ] denotes the Gauss symbol (the largest integer not exceeding (N − m)/k).
[0044]
(Equation 3)
X_m^k : X(m), X(m + k), X(m + 2k), ..., X(m + [(N − m)/k] · k)   (m = 1, 2, ..., k)
[0045]
According to this algorithm, k sets of time-series data are eventually created. For example, if k = 3 and N = 100,
X_1^3 : X(1), X(4), X(7), ..., X(97), X(100)
X_2^3 : X(2), X(5), X(8), ..., X(98)
X_3^3 : X(3), X(6), X(9), ..., X(99)
are generated.
[0046]
Next, the length L_m(k) of the curve of the time-series data X_m^k is defined as in the following (Equation 4). Here, the term (N − 1)/{[(N − m)/k] · k} is a coefficient that standardizes the length of the time-series curve.
[0047]
(Equation 4)
L_m(k) = { Σ (i = 1 to [(N − m)/k]) |X(m + i·k) − X(m + (i − 1)·k)| · (N − 1) / ([(N − m)/k] · k) } / k
[0048]
Let <L(k)> be the average of the curve lengths L_m(k) obtained for the k sets of time-series data X_m^k; this is defined as the length of the curve at time interval k. If <L(k)> ∝ k^(−D) holds, D is the fractal dimension. That is, points are plotted with log10 k on the horizontal axis and log10 <L(k)> on the vertical axis, the slope of the straight-line portion is determined, and that slope multiplied by −1 is the Higuchi fractal dimension.
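The Higuchi procedure of (Equation 3), (Equation 4), and the log-log slope can be sketched as follows. The function name, the default k range, and the least-squares fit over all k are our assumptions; the patent fits only the straight-line portion of the plot.

```python
import math

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: build the sub-series X_m^k of
    (Equation 3), compute the normalised curve lengths L_m(k) of
    (Equation 4), average them into <L(k)>, and return minus the
    least-squares slope of log10 <L(k)> against log10 k."""
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):
            terms = (n - m) // k              # Gauss symbol [(N-m)/k]
            dist = sum(abs(x[m - 1 + i * k] - x[m - 1 + (i - 1) * k])
                       for i in range(1, terms + 1))
            norm = (n - 1) / (terms * k)      # standardising coefficient
            lengths.append(dist * norm / k)
        log_k.append(math.log10(k))
        log_l.append(math.log10(sum(lengths) / len(lengths)))
    mk = sum(log_k) / len(log_k)
    ml = sum(log_l) / len(log_l)
    slope = (sum((a - mk) * (b - ml) for a, b in zip(log_k, log_l))
             / sum((a - mk) ** 2 for a in log_k))
    return -slope
```

A straight line has dimension 1 by this measure; splitting the fit at k = 10 into separate slopes would give the two dimensions D1 and D2 of the F-constant described next.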
[0049]
The F-constant is obtained from the result of the Higuchi fractal dimension calculation by splitting the horizontal axis log10 k at k = 10 and computing the slope separately for k of 10 or less and 10 or more, giving D1 and D2. That is, D1 is the fractal dimension on a micro time scale and D2 is the fractal dimension on a macro time scale. The F-constant thus represents the relationship between the small-scale and large-scale fractal dimensions.
[0050]
Conventionally, it was considered healthy for a person to maintain a constant state. Recent studies, however, have shown that moderate "fluctuation", rather than maintenance of a steady state, is healthier and more adaptable to external factors (homeodynamics). Thus, in the present embodiment, data specific to the person (Lyapunov exponent, entropy, F-constant) are extracted from the voice recorded via the mobile phones 1a and 1b, and the degree of fatigue is expressed as a numerical value by analyzing the extracted data with a neural network.
[0051]
FIG. 7 is a diagram illustrating an example of the neural network operation according to the present embodiment. As shown in FIG. 7, the neural network according to the present embodiment has a four-layer structure consisting of one input layer, two intermediate layers, and one output layer. By inputting the state vector (L, E, F), the chaotic feature amount of the speech, to the input layer and adjusting the weighting coefficients of the connections between the layers, an appropriate numerical value is output from the output layer.
[0052]
Then, the largest of the plural output values A-1 to A-3 is taken as the numerical value A representing the user's degree of fatigue. This value A is normalized between 0 and 1; based on it, the user's fatigue level is expressed as an integer from 0 to 100, as shown in FIG. 8 for example. Although FIG. 7 shows, for simplicity of description, only four nodes in the first intermediate layer, three in the second intermediate layer, and three in the output layer, the hierarchy may be configured with more nodes (for example, 100 each in the first intermediate layer, the second intermediate layer, and the output layer).
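The forward pass of the four-layer network of FIG. 7 can be sketched as follows. The layer sizes, example weights, and function names are illustrative assumptions; the real coupling coefficients would come from the learning described later in the text.

```python
import math

def sigmoid(s):
    """f(s) = 1 / (1 + e^(-s)), the activation of (Equation 7)."""
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weights):
    """One fully connected layer: each neuron forms the weighted sum S
    of (Equation 6) over its inputs and applies the sigmoid."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weights]

def fatigue_from_state_vector(state, w1, w2, w3):
    """Input layer -> two intermediate layers -> output layer; the
    largest of the outputs A-1..A-3 is taken as the fatigue value A,
    a number between 0 and 1."""
    return max(layer(layer(layer(state, w1), w2), w3))
```

Multiplying A by 100 and rounding would then give the 0 to 100 display value of FIG. 8.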
[0053]
In the example of FIG. 7, a sigmoid function is used for the operations in the neural network. As the initial values of the inter-layer coupling coefficients used here, a preliminary test is performed on, for example, about 50 to 100 subjects before the system goes into operation, and appropriate values calculated from the results are registered in the system in advance. The details of the preliminary test are described below.
[0054]
In the preliminary test, first, a flicker test is performed on the test subject, and a subjective assessment of the degree of fatigue is performed (a survey in which the test subject answers the degree of fatigue in a questionnaire form). In the flicker test, a subject is made to look directly at discontinuous blinking light, and the blinking frequency is gradually changed. Then, the test is to measure the blinking frequency when the subject starts to feel the flickering of the blinking light or when the subject no longer feels the flickering.
[0055]
For example, when the frequency of the blinking is gradually increased, the subject watching this gradually becomes unable to feel the blinking. The flicker frequency at which flickering is no longer felt is obtained as a flicker value. Conversely, the blinking frequency may be gradually lowered, and the blinking frequency at which the subject looking at it begins to feel the flicker of light may be obtained as the flicker value.
[0056]
The flicker value is used as an index of mental fatigue and of the arousal level of the central nervous system: the smaller the value, the more fatigued the subject can be evaluated to be. To make the obtained flicker value more objective, it is preferable to perform the flicker test several times and take the average of the flicker values. Incidentally, the average flicker value in a normal, not particularly fatigued state is said to be about 38 Hz, although there are individual differences.
[0057]
In addition, the subjective symptom survey is a questionnaire, prepared by the Japan Society for Occupational Health, for investigating the degree of fatigue the subject feels at that time. The questionnaire items are divided into the following five groups.
Group I Drowsiness: feeling sleepy, yawning, lacking motivation, whole body feels listless, etc.
Group II Instability: feeling anxious, depressed mood, restlessness, irritability, etc.
Group III Discomfort: headache, heavy head, feeling unwell, foggy head, etc.
Group IV Sluggishness: heavy arms, lower back pain, heavy legs, stiff shoulders, etc.
Group V Eyestrain: blurry eyes, tired eyes, dry eyes, blurred vision, etc.
[0058]
The subject answers the plural questions provided for each of these five groups on a scale of 1 to 5 according to the degree felt. The average score is then obtained for each of the five groups, and the fatigue status is evaluated per group: the higher the score, the more fatigued the subject can be evaluated to be.
[0059]
Both the flicker value and the subjective symptom scores described above are reliable indexes for evaluating the degree of fatigue. In the present embodiment, to obtain a more objective fatigue value using these indexes, a numerical value representing the subject's degree of fatigue (an integer from 0 to 100) is calculated by the following (Equation 5).
Fatigue level = −Flicker average value + 0.5 × Group I average score + 0.5 × Group II average score + 4 × Group III average score + 0.5 × Group IV average score + 0.5 × Group V average score + 25 ... (Equation 5)
Note that this (Equation 5) is merely an example, and the present invention is not limited to this calculation.
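(Equation 5) translates directly into code; the function name and argument order are ours. Note that with the patent's verbatim weights the result is not strictly confined to 0 to 100 for all inputs, which is consistent with the equation being given only as an example.

```python
def fatigue_target(flicker_avg, g1, g2, g3, g4, g5):
    """(Equation 5): target fatigue value from the average flicker
    frequency (Hz) and the five questionnaire group average scores
    (each rated 1..5).  The weights and the +25 offset are taken
    verbatim from the patent text."""
    return (-flicker_avg
            + 0.5 * g1 + 0.5 * g2 + 4.0 * g3 + 0.5 * g4 + 0.5 * g5
            + 25.0)
```

For instance, at the cited normal flicker average of 38 Hz with all group scores at the minimum of 1 the raw value is −7, while a strongly fatigued profile (low flicker value, high scores) scores far higher.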
[0060]
Next, the subject's voice data are recorded. The state vector (L, E, F) is calculated from the recorded voice data, and the calculated state vector is input to the input layer of the neural network shown in FIG. 7. The neural network is then trained by the back-propagation learning rule so that a value as close as possible to the fatigue value calculated by (Equation 5) above is output from the output layer. By training on about 50 to 100 subjects, an almost correct degree of fatigue can be output even when the state vector of an unknown user is input.
[0061]
Learning of the neural network by back propagation adjusts each coupling coefficient so as to minimize the error between the output data (the fatigue value obtained from the subject's voice data through the neural network) and the teacher data. The details of the error calculation are described below.
[0062]
That is, in the back propagation method, when the state vector (L, E, F) based on the voice data recorded for a subject is given to the input layer, each of the coupling coefficients W1 to W3 is changed, from the output layer toward the input layer, so as to reduce the error between the fatigue value output from the output layer and the fatigue value calculated by (Equation 5) for the same subject.
[0063]
Let the input data to a neuron be X = (x1, x2, ..., xj) and the coupling coefficients be W = (w1, w2, ..., wj). The weighted sum of these connections is the neuron state S, represented by the following (Equation 6).
S = x1·w1 + x2·w2 + ... + xj·wj ... (Equation 6)
This neuron state S is further processed by the activation function f(s). f(s) is defined by the sigmoid function of the following (Equation 7), which makes it possible to handle inputs and outputs as continuous values from 0 to 1.
f(s) = 1 / (1 + e^(−s)) ... (Equation 7)
[0064]
When output data Y = f(s) is obtained at the output layer of the neural network, the change amount σ for adjusting the coupling coefficient W is determined. The change σ is calculated by multiplying the change Δf(s) of the sigmoid function, expressed by the following (Equation 8), by the error E between the output data of the neuron and the teacher data, as in (Equation 9). Adjusting the coupling coefficient W using this change σ constitutes learning.
[0065]
(Equation 8)
Δf(s) = f(s) · (1 − f(s))
(Equation 9)
σ = E × Δf(s)
[0066]
The error E is represented by a decimal number from 0 to 1 by multiplying the error E by the variation Δf (s) of the sigmoid function described above. At this time, when the error E is large, the variation σ takes a large value, and when the error E is small, the variation σ takes a small value. If the coupling coefficient W3 of the neuron connected to the output layer is to be changed, the modification of the coupling coefficient W3 is as in the following (Equation 10).
W3ij(t + 1) = W3ij(t) + a × σj  ... (Equation 10)
Here, a is a coefficient less than 1, and is usually set to 0.8.
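A minimal sketch of the (Equation 10) update, assuming the error E is taken as teacher minus output (the text does not fix the sign convention) and using the standard identity Δf(s) = y(1 − y) for a sigmoid with output y:

```python
def sigmoid_delta(y):
    # Variation Δf(s) of the sigmoid, written in terms of its output y: y * (1 - y)
    return y * (1.0 - y)

def update_output_weight(w, y, t, a=0.8):
    """One step of the (Equation 10)-style update W3(t+1) = W3(t) + a * σ,
    with σ = E * Δf(s) and E = t - y (teacher minus output; sign convention
    is this sketch's assumption). Following the formula as written in the
    text, the input activation is not included in the product."""
    sigma = (t - y) * sigmoid_delta(y)
    return w + a * sigma

# Teacher above the current output: the weight should increase.
w_new = update_output_weight(w=0.5, y=0.3, t=1.0)
```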
[0067]
After updating the coupling coefficient W3 belonging to the output layer, the coupling coefficient W2 belonging to the intermediate layer is changed. Here, a new change amount σ′ is generated from the above (Equation 10). As shown in the following (Equation 11), σ′ is obtained by multiplying the sum of the products of the change amounts σ and the coupling coefficients W3 from the output layer by the variation Δf(s) of the sigmoid function, and this σ′ is used to update W2.
[0068]
[Equation 11 (image in original)]
[0069]
Using the change amount σ′ thus obtained, the coupling coefficient W2 of the intermediate layer is updated by the following (Equation 12), in the same manner as the coupling coefficient W3 belonging to the output layer.
W2ij(t + 1) = W2ij(t) + a × σ′j  ... (Equation 12)
Such calculation is repeated until the coupling coefficient W1 belonging to the input layer is updated.
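The intermediate-layer change amount σ′ of (Equation 11) can be sketched as follows; for one hidden neuron, sigma_out holds the change amounts σ of the output neurons and w_out the W3 coefficients connecting the hidden neuron to them (the argument names are this sketch's own):

```python
def hidden_sigma(sigma_out, w_out, y_hidden):
    """Change amount σ' for an intermediate-layer neuron (Equation 11):
    the sum over output neurons of σ times the connecting coefficient W3,
    multiplied by Δf(s) of the hidden neuron (written via its output
    y_hidden as y * (1 - y))."""
    back = sum(s * w for s, w in zip(sigma_out, w_out))
    return back * y_hidden * (1.0 - y_hidden)

# Two output neurons feeding back into one hidden neuron with output 0.5.
sp = hidden_sigma(sigma_out=[0.1, -0.05], w_out=[0.5, 0.2], y_hidden=0.5)
```

The same σ′ then drives the (Equation 12) update of W2, and the calculation is repeated layer by layer toward W1.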
[0070]
At this time, an evaluation element indicating how far learning of the neural network has progressed is required. This element is expressed as an evaluation function or cost function. The RMS error (root mean square error) is used as the cost function; as shown in the following (Equation 13), it is expressed in terms of the neuron output data Y and the teacher data T. Learning of the neural network proceeds so as to minimize this cost function.
[0071]
[Equation 13 (image in original)]
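As an illustrative sketch, the RMS cost over a set of outputs and teacher values can be written directly:

```python
import math

def rms_error(outputs, teachers):
    """RMS (root mean square) error between neuron outputs Y and teacher data T:
    sqrt of the mean of the squared differences."""
    n = len(outputs)
    return math.sqrt(sum((y - t) ** 2 for y, t in zip(outputs, teachers)) / n)
```

Learning is stopped, or the coefficients accepted, when this value falls below a chosen threshold.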
[0072]
The values of the coefficients W1 to W3 initially set in the system through the learning of the neural network described above can be changed arbitrarily after the initial setting (during operation of the system). For example, these coefficients W1 to W3 are registered in a database and can be updated by further learning of the neural network even while the system is in operation. For this purpose, a flicker test and a questionnaire survey on subjective symptoms are also performed each time a user carries out voice analysis during operation of the system, and the coefficients W1 to W3 are updated using the results.
[0073]
In this case, it is preferable that these tests can be performed on the mobile phones 1a and 1b so that the user does not need to go to a test room for the flicker test or the subjective symptom survey. Specifically, a program is downloaded from a specific website to the mobile phone, and a blinking light is displayed on the screen of the mobile phone according to the program. When the user watching this begins to perceive the flickering of the blinking light, or ceases to perceive it, he or she presses a predetermined button, from which the flicker value can be obtained.
[0074]
In addition, regarding the subjective symptoms, a questionnaire survey can be conducted using a CGI (Common Gateway Interface). The flicker values and the subjective symptom scores obtained on the mobile phones 1a and 1b in this manner are transmitted to the server device 10 in FIG. 1 and registered in the database 7 through the DB server 6. Then, the teacher data is calculated from the flicker value and the subjective symptom score registered in the database 7 according to the above-mentioned (Equation 5), and learning of the neural network is performed.
[0075]
Note that, also at the time of the preliminary test, the flicker test and the subjective symptom survey may be performed from the mobile phones 1a and 1b. This has the advantage that the preliminary test itself can be carried out easily.
[0076]
Here, the relationship between the neural network, the state vector (L, E, F), and the fatigue value A will be outlined. A space in which each coordinate axis is associated with one dynamical variable is called a state space. One point in the state space represents the state of the system at a certain time. Chaotic systems follow complex trajectories in state space, but the trajectories pass only through certain regions of the state space and never through others. Such an orbit describes a chaotic attractor.
[0077]
This chaotic attractor can be reconstructed by embedding time-series data exhibiting chaos in a multidimensional state space. For the embedding, n state variables can be restored from a single state variable by using the embedding delay time τ according to Takens' method. If the embedding succeeds, the reconstructed chaotic attractor is a deformation of the original attractor, and quantities such as the Lyapunov exponent L, the entropy E, and the Higuchi fractal dimension F are topologically preserved. Thus, to reconstruct the chaotic attractor from univariate time-series data, it suffices to embed the data by conversion to a delay-time coordinate system.
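Delay-coordinate embedding per Takens' method can be sketched as follows (not the patent's own implementation); dim plays the role of the embedding dimension n and tau that of the embedding delay time τ, here in units of samples:

```python
def delay_embed(series, dim, tau):
    """Reconstruct dim-dimensional state vectors from a single time series
    using delay coordinates: v_i = (s_i, s_{i+tau}, ..., s_{i+(dim-1)*tau})."""
    last = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(last)]

# Toy series of 6 samples embedded in 3 dimensions with delay 2.
emb = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
```

Chaos parameters such as L, E, and F would then be estimated from the point cloud of reconstructed vectors.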
[0078]
Chaos has the stability that the geometrical structure of an attractor exhibiting steady-state behavior in state space does not change even if orbital instability occurs due to minute disturbances (see "Time-series analysis system based on deterministic chaos theory," Instrumentation, August issue, Vol. 40, No. 8 (1997)). Therefore, in the neural network of the present embodiment, as described above, the state vector (L, E, F) can be input to the input layer, and a stable fatigue level A can be output from the output layer.
[0079]
Returning to the chaos analysis flowchart: after the chaos calculation is performed by the method shown in FIG. 7 and the feature quantity of the speaker voice is obtained, the file of the speaker voice recorded in the database 7 is deleted (step S35). Then, the analysis result of the chaos calculation is supplied to the DB server 6 and registered in the database 7 (step S36), and the series of chaos analysis processes ends.
[0080]
As described above, the user can always see the analysis result by accessing the WEB server 4 from the mobile phones 1a and 1b. FIG. 9 is a diagram showing an example of a screen of a diagnosis result displayed on the mobile phones 1a and 1b. As shown in FIG. 9A, on the top screen of the analysis result display, a list of newly arrived diagnosis results and past diagnosis results is displayed.
[0081]
When a newly arrived diagnosis result is selected on the top screen, the screen is changed to a detail screen shown in FIG. 9B, and details of the fatigue level of the person can be confirmed. Further, in the list of past diagnosis results, numerical values indicating the degree of fatigue are displayed, and the transition of the degree of fatigue can be seen. By selecting any of them, the past diagnosis results can be viewed.
[0082]
As described above in detail, in the present embodiment, the speaker voice input from the user's mobile phones 1a and 1b is recorded in the database 7 through the CTI server 5 and the DB server 6, and the recorded speaker voice is subjected to chaos analysis by the analysis server 8 to obtain a feature quantity representing the user's degree of fatigue. Then, in response to a request from the user, the analysis result is presented to the mobile phones 1a and 1b through the WEB server 4.
[0083]
Thus, the user can easily check his / her own health condition from any place at any time simply by recording the voice using the mobile phones 1a and 1b. In addition, the "fatigue degree", which could be expressed only in an ambiguous expression, can be obtained as a clear numerical value. In addition, by continuously using this system, it is possible to obtain motivation to correct lifestyles and reduce overtime while following changes in the degree of fatigue.
[0084]
Further, according to the present embodiment, the teacher data is calculated using the flicker value and the score of the subjective test, and the learning of the neural network is performed based on the calculated teacher data. Thereby, a more objective fatigue value reflecting the results of the flicker test and the subjective examination can be easily obtained by simply recording the voice using the mobile phones 1a and 1b.
[0085]
In the above embodiment, the mobile phones 1a and 1b are used as the terminals operated by the user. However, any mobile terminal having a voice input function and a network connection function (for example, a notebook personal computer or a PDA (Personal Digital Assistant)) can also be used as the user terminal, other than the mobile phones 1a and 1b.
[0086]
In the above embodiment, the neural network is used for the chaos analysis. However, the method of calculating the correlation between the state vector (L, E, F) and the fatigue value A is not limited to this. For example, the correlation may be calculated by a statistical method, and the fatigue value A may be obtained by this.
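As a hypothetical illustration of such a statistical method (not the patent's own procedure), the Pearson correlation between one chaos parameter and the measured fatigue value could be computed and then used to build a regression-style estimate of A:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between one chaos parameter (e.g. L)
    measured across subjects and the corresponding fatigue values A."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A strong correlation for a given parameter would justify weighting it more heavily when estimating the fatigue value A.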
[0087]
Further, in the above-described embodiment, the example in which the flicker test and the subjective symptom survey are performed to obtain the teacher data used for learning the neural network has been described, but the present invention is not limited to this. In other words, instead of or in addition to these tests, all or part of a test measuring urine pH and protein, a test measuring blood pressure and pulse, a test measuring brain waves and electrocardiograms, and a test measuring metabolites in urine or the like may be performed, and the teacher data may be calculated from the results using a predetermined arithmetic expression. In this way, a more objective fatigue level value combining the plural test results can be obtained easily, simply by inputting voice with the mobile phones 1a and 1b.
[0088]
Further, in the above-described embodiment, the case where the fatigue degree is obtained has been described as an example of the user's health state, but the present invention is not limited to this. For example, by improving the voice analysis engine of the analysis server 8, it is also possible to perform analyses relating to the "fluctuation" of a living body, such as a hangover diagnosis, a "muddy blood" diagnosis, a seriousness (bluffing) diagnosis, a compatibility diagnosis, and a lie-detection diagnosis. That is, it is also possible to analyze health states other than the degree of fatigue, mental states, and the like.
[0089]
It is also possible to diagnose whether there is a suspicion of dementia from a human voice. As with the flicker test and subjective symptom survey used for the fatigue degree described above, objective and reliable indices for evaluating the degree of dementia have conventionally been available. Screening tests such as the revised Hasegawa simplified intelligence evaluation scale (HDS-R), the MMS (Mini-Mental State), and the kana pick-out test are typical (all are well known, and detailed description is omitted here).
[0090]
For example, a subject is given the HDS-R test, and a score of 1 is assigned when the result suggests suspected dementia and 0 when it does not. An MMS test is likewise performed, scoring 1 when dementia is suspected and 0 when it is not, and the kana pick-out test is scored in the same way. When the total of these three test results is 0 or 1, it is determined that there is no suspicion of dementia. This result is then used as teacher data for the neural network.
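The scoring rule above can be transcribed as a sketch (each flag argument is 1 when the respective test suggests dementia, 0 otherwise; the label strings are this sketch's own):

```python
def dementia_teacher(hdsr_flag, mms_flag, kana_flag):
    """Teacher label from the three screening tests: a total of 0 or 1
    positive results is read as 'no suspicion', 2 or 3 as 'suspicion'."""
    total = hdsr_flag + mms_flag + kana_flag
    return "no suspicion" if total <= 1 else "suspicion"
```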
[0091]
Next, the subject's voice data is recorded. The state vector (L, E, F) is extracted from the recorded voice data and input to the input layer of the neural network. The neural network here consists of, for example, an input layer of three units (the three chaos parameters L, E, and F), two intermediate layers of 100 units each, and an output layer of two units ("no suspicion of dementia" and "suspicion of dementia").
[0092]
Then, when the three chaos parameters (L, E, F) are input to the input layer, the neural network is trained according to the back propagation learning rule so that the correct one of the two output cells, "no suspicion of dementia" or "suspicion of dementia", fires in the output layer. In this case as well, by learning from about 50 to 100 subjects, an almost correct degree of dementia can be output even when the state vector of an unknown user is input.
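Reading off which of the two output cells fires more strongly can be sketched as follows (the label strings and ordering are this sketch's assumption, matching the layer description above):

```python
def classify_dementia(outputs):
    """Pick the output cell with the strongest activation:
    index 0 = 'no suspicion of dementia', index 1 = 'suspicion of dementia'."""
    labels = ["no suspicion of dementia", "suspicion of dementia"]
    return labels[max(range(len(outputs)), key=lambda i: outputs[i])]
```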
[0093]
In addition, each of the above-described embodiments is merely an example of the embodiment for carrying out the present invention, and the technical scope of the present invention should not be interpreted in a limited manner. That is, the present invention can be embodied in various forms without departing from the spirit or main features thereof.
[0094]
【The invention's effect】
As described above, according to the present invention, a user can easily check his or her own health condition, mental condition, and the like at any place and at any time, simply by recording a voice using a portable client device.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating an overall configuration of a speaker voice analysis system according to an embodiment.
FIG. 2 is a sequence flowchart showing an overall operation of the speaker voice analysis system according to the embodiment;
FIG. 3 is a diagram showing an example of a menu screen displayed on the mobile phone of the embodiment.
FIG. 4 is a flowchart showing a recording operation performed by the CTI server of the embodiment.
FIG. 5 is a flowchart showing a main operation of the analysis server according to the embodiment.
FIG. 6 is a flowchart illustrating a detailed operation of chaos analysis processing performed by the analysis server of the present embodiment.
FIG. 7 is a diagram showing an example of a neural network operation performed by the analysis server of the embodiment.
FIG. 8 is a diagram illustrating an example of a fatigue level as a result of chaos analysis.
FIG. 9 is a diagram showing a screen display example of a chaos analysis result.
[Explanation of symbols]
1a, 1b Mobile phone
2a Mobile phone packet network
2b Internet
2c Public line network
3 router
4 WEB server
5 CTI server
6 DB server
7 Database
8 Analysis server
10 Server device

Claims (7)

  1. A speaker voice analysis system in which a server device and a client device are connectable via a network, wherein
    the server device comprises: speaker voice acquiring means for acquiring, from the client device via the network, the user's speaker voice input at the client device;
    speaker voice analyzing means for analyzing the speaker voice acquired by the speaker voice acquiring means and obtaining an index value representing the health state, mental state, or the like of the user; and
    analysis result providing means for providing the analysis result of the speaker voice analyzing means to the client device; and
    the client device comprises: voice input means for inputting the speaker voice of the user;
    speaker voice providing means for providing the speaker voice input by the voice input means to the server device; and
    analysis result acquiring and outputting means for acquiring and outputting the analysis result provided from the server device by the analysis result providing means.
  2. The speaker voice analysis system according to claim 1, wherein the speaker voice analyzing means comprises: feature amount calculating means for calculating a state vector, which is a chaotic feature amount of the voice, based on the speaker voice acquired by the speaker voice acquiring means; and
    neural network operation means for obtaining an index value representing the user's health state, mental state, or the like by inputting the state vector obtained by the feature amount calculating means and performing a neural network operation using the state vector and a plurality of coefficients.
  3. The speaker voice analysis system according to claim 2, further comprising: target value calculating means for calculating a target value relating to the index value representing the user's health state, mental state, or the like; and
    learning means for optimizing the plurality of coefficients in the neural network by minimizing the error between the output value of the neural network when the state vector is given to its input and the target value obtained by the target value calculating means.
  4. The speaker voice analysis system according to claim 3, wherein the target value calculating means calculates a target value of the degree of fatigue by performing a predetermined calculation using data values obtained as results of a flicker test and a survey in which the subject answers questions about his or her own degree of fatigue in questionnaire form.
  5. A server device comprising: speaker voice acquiring means for acquiring, from a client device via a network, the user's speaker voice input at the client device;
    speaker voice analyzing means for analyzing the speaker voice acquired by the speaker voice acquiring means and obtaining an index value representing the health state, mental state, or the like of the user; and
    analysis result providing means for providing the analysis result of the speaker voice analyzing means to the client device.
  6. A health diagnosis method using speaker voice analysis, comprising: a speaker voice transmitting step of inputting a user's speaker voice at a client device and transmitting the voice to a server device via a network;
    a speaker voice receiving step in which the server device receives the speaker voice transmitted in the speaker voice transmitting step;
    a speaker voice analyzing step of analyzing the speaker voice received in the speaker voice receiving step and obtaining an index value representing the health state and mental state of the user;
    an analysis result providing step of providing the result obtained in the speaker voice analyzing step to the client device; and
    an analysis result outputting step in which the client device acquires and outputs the analysis result provided in the analysis result providing step.
  7. A speaker voice analysis program for causing a computer to function as: speaker voice acquiring means for acquiring, from a client device via a network, the user's speaker voice input at the client device;
    speaker voice analyzing means for analyzing the speaker voice acquired by the speaker voice acquiring means and obtaining an index value representing the health state, mental state, or the like of the user; and
    analysis result providing means for providing the analysis result of the speaker voice analyzing means to the client device.
JP2003182824A 2002-12-12 2003-06-26 Speaker voice analysis system and server device used therefor, medical examination method using speaker voice analysis, and speaker voice analyzer Pending JP2004240394A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2002361526 2002-12-12
JP2003182824A JP2004240394A (en) 2002-12-12 2003-06-26 Speaker voice analysis system and server device used therefor, medical examination method using speaker voice analysis, and speaker voice analyzer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003182824A JP2004240394A (en) 2002-12-12 2003-06-26 Speaker voice analysis system and server device used therefor, medical examination method using speaker voice analysis, and speaker voice analyzer

Publications (1)

Publication Number Publication Date
JP2004240394A true JP2004240394A (en) 2004-08-26

Family

ID=32964522

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003182824A Pending JP2004240394A (en) 2002-12-12 2003-06-26 Speaker voice analysis system and server device used therefor, medical examination method using speaker voice analysis, and speaker voice analyzer

Country Status (1)

Country Link
JP (1) JP2004240394A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007199807A (en) * 2006-01-24 2007-08-09 Fuji Xerox Co Ltd Learning system and device, control method for computer, and program
WO2008096634A1 (en) * 2007-02-06 2008-08-14 Nec Corporation Health management system, health managing method, and health management program
WO2015146824A1 (en) * 2014-03-25 2015-10-01 シャープ株式会社 Interactive household-electrical-appliance system, server device, interactive household electrical appliance, method whereby household-electrical-appliance system performs interaction, and non-volatile computer-readable data recording medium having, stored thereon, program for executing said method on computer
JP2015184563A (en) * 2014-03-25 2015-10-22 シャープ株式会社 Interactive household electrical system, server device, interactive household electrical appliance, method for household electrical system to interact, and program for realizing the same by computer
US10224060B2 (en) 2014-03-25 2019-03-05 Sharp Kabushiki Kaisha Interactive home-appliance system, server device, interactive home appliance, method for allowing home-appliance system to interact, and nonvolatile computer-readable data recording medium encoded with program for allowing computer to implement the method
US10478111B2 (en) 2014-08-22 2019-11-19 Sri International Systems for speech-based assessment of a patient's state-of-mind
JP2018025932A (en) * 2016-08-09 2018-02-15 ファナック株式会社 Work management system including sensor and mechanical learning part
JP6312014B1 (en) * 2017-08-28 2018-04-18 パナソニックIpマネジメント株式会社 Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method and program
WO2019044255A1 (en) * 2017-08-28 2019-03-07 パナソニックIpマネジメント株式会社 Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method, and program
JP6263308B1 (en) * 2017-11-09 2018-01-17 パナソニックヘルスケアホールディングス株式会社 Dementia diagnosis apparatus, dementia diagnosis method, and dementia diagnosis program


Legal Events

Date, Code, Title (Description)
2006-06-19: A621, Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2009-07-31: A977, Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2009-10-06: A131, Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2010-02-16: A02, Decision of refusal (JAPANESE INTERMEDIATE CODE: A02)