EP4346558A1

EP4346558A1 - Software-based, speech-operated and objective diagnostic tool for use in diagnosing a chronic neurological disorder

Info

Publication number: EP4346558A1
Application number: EP22732938.0A
Authority: EP
Inventors: Peter O. Owotoki; Leah W. Owotoki; David Lehmann; Diana Wanjiku; Moriah-Jane Lorentz
Original assignee: VitafluenceAi GmbH
Current assignee: VitafluenceAi GmbH
Priority date: 2021-05-31
Filing date: 2022-05-30
Publication date: 2024-04-10
Also published as: WO2022253742A1; DE102021205548A1

Abstract

The invention relates to a software-based diagnostic tool, a method for operating same, and a diagnostic system for use in diagnosing a chronic neurological disorder such as autism in both children and adults. The diagnostic tool comprises a speech analysis module (21) for identifying characteristic values (28) of a vocal biomarker in a speech signal (26) of a subject (11), at least one further module (22, 23) for identifying characteristic values (30, 32) of a second biomarker, and an evaluation unit (25) connected downstream thereof. The speech analysis module (21) comprises: a speech-signal-triggering controller (21a) which displays image data on an image display device (7) in order to trigger at least one speech signal (26) in the subject (11); a speech recording unit (21b) which records the speech signal (26); and a speech signal analyzer (21c) which subsequently evaluates the speech signal (26) to determine first at what point in time which pitch level is present, and subsequently determines a frequency distribution of the pitch levels among a number of frequency bands of a selected frequency spectrum, with this frequency distribution forming the characteristic values (28) of the vocal biomarker. On the basis of the characteristic values (28, 30, 32) of the biomarkers, the evaluation unit (25) determines, by applying a machine learning algorithm and comparison with a multi-dimensional interface, whether the subject (11) has the chronic neurological disorder.

Description

Software-based, voice-driven, and objective diagnostic tool for use in the diagnosis of a chronic neurological disorder

The invention relates to a software-based diagnostic tool for use in diagnosing a chronic, neurological disorder in a human using artificial intelligence, as well as a method for operating the diagnostic tool and a system comprising the diagnostic tool.

Chronic neurological disorders are common in humans. They manifest as atypical intellectual development and/or atypical social behavior. Examples of such disorders are autism, attention deficit hyperactivity disorder (ADHD), schizophrenia, Alzheimer's disease, psychosis, etc. Autism is one of the best-known chronic neurological disorders, which is why it is considered below as an example, representative of all chronic neurological disorders, and as the starting point for the invention.

"Autism" is understood to be a profound disorder of the neuronal and mental development of humans, which can already occur in childhood in different severities and forms and is generally diagnosed as autism spectrum disorder, abbreviated ASS in German or ASD (Autism Spectrum Disorder) in English. Autism manifests itself externally, especially in behavior and communication. What is striking about this developmental disorder is, on the one hand, the social interaction, i.e. dealing with and exchanging ideas with other people, together with an interest limited to repetitive, identical or similar processes, and on the other hand the verbal and non-verbal language of the autistic person, i.e. the voice and body language such as facial expressions, eye contact and gestures. A reduction in intelligence can also often be determined, but there are also forms of autism in which the affected person is of average or even high intelligence. This can be the case, for example, in people with so-called Asperger syndrome, which is usually associated with less restricted language development and is therefore considered a mild form of autism. According to reports from the World Health Organization (WHO), roughly 1-2% of the world's population has ASD, which corresponds to around 100 million people worldwide on average. Because autistic people require special encouragement and support in everyday life due to this developmental disorder, its early and correct diagnosis is of great importance.

Autism is diagnosed in the classic way by a specialized doctor, neurologist or therapist by asking the potentially autistic patient a more or less large number of specially developed questions from a list of questions and by subsequently observing and evaluating the answers and reactions. However, it is known that only the combination of autism-specific symptoms, i.e. the symptom constellation, allows a clear diagnosis, since individual, similarly conspicuous behavioral characteristics also occur in other disorders.

Classic diagnostics have several disadvantages. On the one hand, it should be noted that the assessment by a medical expert is always subjective and can therefore be inaccurate, in both directions of the diagnosis, which can have dire consequences for the patient and his family. This degree of subjectivity, which is partly due to a certain degree of bias, is an integral part of the evaluation process, which in individual cases can lead to incorrect results. A well-known example is the finding that girls are underrepresented in diagnostics because they are more adaptable and therefore show less pronounced behavioral problems. Another example is the prejudice that autism occurs predominantly in boys (see Lockwood Estrin, G., Milner, V., Spain, D. et al., “Barriers to Autism Spectrum Disorder Diagnosis for Young Women and Girls: a Systematic Review”, Review Journal of Autism and Developmental Disorders, 2020). Even if an attempt is made to make the assessment as objective as possible, it takes doctors or therapists many years to acquire the necessary experience, an experience that is difficult to verbalize, teach, quantify, standardize or validate.

Further disadvantages are the temporally and geographically limited availability of medical experts, the limited access to them and their diagnostics, especially in less developed regions of the world such as Africa or South America, and the high costs associated with an expert diagnosis, especially as there are few experts and the diagnosis is regularly made on site in the expert's practice, clinic or other facility. Affected people and their relatives often have to put up with long, arduous and costly journeys to get to an expert and be able to make use of his diagnosis. The global pandemic caused by the novel coronavirus SARS-CoV-2 has additionally restricted access to the experts.

Irrespective of this, the number of experts is small compared to the need, so that there can be long waiting times to get an examination appointment. Even in Germany, this waiting time can be several years in some cases, especially for adults, because children are preferred. In contrast, in some parts of the world, such as parts of Africa, children have no possibility of diagnosis at all.

Finally, diagnosis using a questionnaire is also disadvantageous because the questions take a long time to pose, for example between one and three hours, and because the questions and observations have to be adapted to the patient's age, regional language and ethnic background. The latter requires that the medical professional be familiar with the ethnic characteristics of the patient, because behavior as well as verbal and non-verbal communication differ from people to people.

The aforementioned deficits, explained using the example of autism, also apply to other chronic, neurological disorders. Here, too, there is a lack of sufficient experts and expert knowledge, their quick and easy accessibility and, above all, an objective diagnosis.

The object of the present invention is to provide a device, a system and an operating method that overcome the disadvantages mentioned and enable an objective, at least assistive diagnosis of a chronic neurological disorder, in particular autism and its associated neurological diseases, which is preferably accessible at any time and from anywhere in the world, regardless of the language and ethnic origin of the person concerned.

This object is achieved by a diagnostic tool having the features of claim 1, a system according to claim 18 and an operating method according to claim 22. Advantageous developments are specified in the respective dependent claims.

The diagnostic tool according to the invention and the method used and executed by it are based on improvements in the state of the art and innovations in the field of artificial intelligence. By obtaining and evaluating certain biomarkers as objective and incontrovertible proof of the presence or absence of autism, a cost-effective, user-friendly and rapid diagnosis is made with the aid of the diagnostic tool according to the invention and its operating method.

A biomarker is a measurable and therefore analyzable variable of a biological characteristic of a person, more precisely a variable that enables a qualitative or quantitative assessment of a physical, physiological or behavioral characteristic of a person.

According to the invention, a software-based diagnostic tool for use in diagnosing a chronic neurological disorder in a human subject using artificial intelligence is proposed, comprising

- a higher-level operating software,

- a speech analysis module for determining characteristic values of a first, namely vocal biomarker of a speech signal of the test person,

- at least one further module for determining characteristic values of a second biomarker, and

- an overall result evaluation unit downstream of the speech analysis module and the further module.

The operating software is set up to trigger the speech analysis module and the at least one further module one after the other and to feed their determined characteristic values to the overall result evaluation unit. The speech analysis module includes

- a voice signal trigger control, which is set up to display a set of individual images and/or individual videos or a text on an image display device for the test person in order to trigger at least one voice signal in the test person, in the form of naming an object contained in the respective individual image or individual video or in the form of reading the text aloud,

- a voice recording unit which is set up to record the voice signal in an audio recording with the aid of a voice input device, and

- a speech signal analyzer, which is set up to first evaluate the speech signal in the audio recording as to which pitch occurs at which point in time, and then to determine a frequency distribution of the pitches over a number of frequency bands of a frequency spectrum under consideration, this frequency distribution forming the characteristic values of the first biomarker.

The overall result evaluation unit is set up to determine whether the test person has the chronic, neurological disorder on the basis of the characteristic values of the test person's biomarkers using a machine learning algorithm based on artificial intelligence by comparison with a multidimensional interface. The interface can be understood as a mathematical hyperplane in a multidimensional space whose dimensions are defined by the number of characteristic values of all biomarkers. The interface represents a mathematical boundary between the biomarker values of people with the chronic, neurological disorder and people without such a disorder. More precisely, the overall result evaluation unit is a classification model trained with biomarker values from comparison persons, which determines whether, and with what degree of probability, the identified biomarker values of the subject lie on the side of the interface associated with the comparison persons with the chronic neurological disorder or on the side of the interface associated with the comparison persons without the chronic neurological disorder. The learning algorithm is preferably a support vector machine (SVM), a so-called random forest or a deep convolutional neural network algorithm, the learning algorithm having been trained with a number of first and second comparison data sets of characteristic values of the biomarkers, the first comparison data sets being assigned to a group of reference persons who have the chronic, neurological disorder, and the second comparison data sets being assigned to a group of reference persons who do not have the chronic, neurological disorder.
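The comparison with the interface can be illustrated for the simplest, linear case. The following is only a minimal sketch under stated assumptions, not the trained classification model of the invention: the weight vector, the offset and the logistic mapping from distance to probability are hypothetical choices made for illustration.

```python
import math

def signed_distance(features, weights, bias):
    """Signed distance of a feature vector from the hyperplane w.x + b = 0."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("weights must not all be zero")
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm

def classify(features, weights, bias, scale=1.0):
    """Return (on_disorder_side, probability).

    The logistic mapping and `scale` are assumptions for this sketch;
    a real model would calibrate probabilities on comparison data.
    """
    d = signed_distance(features, weights, bias)
    probability = 1.0 / (1.0 + math.exp(-scale * d))
    return d > 0, probability

# Hypothetical hyperplane in a 3-dimensional biomarker space.
weights = [3.0, 4.0, 0.0]
bias = -5.0
on_disorder_side, probability = classify([3.0, 4.0, 1.0], weights, bias)
```

In a real tool the number of dimensions would equal the total number of characteristic values of all biomarkers, and the hyperplane would result from training rather than being written by hand.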

A special feature of the learning algorithm is that it can be continuously optimized or trained with new comparison data sets in order to classify the biomarker characteristic values as precisely as possible, so that it becomes better and better at differentiating the biomarker values of people with and without chronic, neurological disorders, i.e. at defining the interface. A random forest is described, for example, in A. Paul, D. P. Mukherjee, P. Das, A. Gangopadhyay, A. R. Chintha and S. Kundu, "Improved Random Forest for Classification," IEEE Transactions on Image Processing, Vol. 27, No. 8, pp. 4012-4024, Aug. 2018. In particular, it represents a good choice for the learning algorithm when the training data, i.e. the number of comparison data sets used to create the classification model, increases, in particular to between a few hundred and a few thousand comparison data sets. Furthermore, a deep convolutional neural network algorithm is particularly suitable if the training data, i.e. the number of comparison data sets used to create the classification model, is particularly large, in particular over 5000, with such a model even achieving a classification accuracy of close to 99%.

The diagnostic tool thus evaluates at least two biomarkers, with the first biomarker (vocal biomarker) being of particular importance and characterizing a property of the test person's voice. More specifically, the first biomarker identifies the tone spectrum used by the subject as a first criterion for assessing the presence of a chronic neurological disorder. With the help of this vocal biomarker, one can determine with 95% certainty whether the test person has a specific chronic neurological disorder. In order to improve the accuracy of the diagnosis, at least one second biomarker is used, the characteristic values of which are determined by the at least one further module.

In one embodiment variant, the further module can be an emotion analysis module for evaluating the reaction of the test person to an emotional stimulus as a second biomarker and can include at least the following:

- an emotion-triggering control, which is set up to display a set of individual images and/or individual videos or at least one individual video on the image display device in order to stimulate a number of individual emotions in the test person, and

- an emotion observation unit, which is set up to evaluate a (video) recording of the test person's face, obtained with the aid of an image recording device, at least to determine when it shows an emotional reaction.

The emotion analysis module is set up to determine at least the respective reaction time between the stimulation of the respective emotion and the occurrence of the emotional reaction, with at least these reaction times forming the characteristic values of the second biomarker in this embodiment variant.

In another embodiment variant, the additional module can be a viewing direction analysis module for evaluating the viewing direction of the test person as a second biomarker and can include at least the following:

- a line of sight guide, which is set up to display at least one image or video on the image display device in order to guide the line of sight of the test person, and

- a viewing direction observation unit, which is set up to determine the viewing direction over time from a (video) recording of the subject's face obtained with the aid of an image recording device, with this viewing direction course forming the characteristic values of the second biomarker in this embodiment variant.

Thus, depending on the embodiment variant, the second biomarker can be a property either of the emotion processing or of the subject's gaze. It thus characterizes a property of their ability to interact socially, namely either the reaction time to an emotional stimulus or the direction of their gaze, and can thus be referred to as a "social biomarker".

However, there is also the possibility of cumulatively evaluating the reaction to an emotional stimulus as a first additional biomarker and the viewing direction as a second additional biomarker, so that the diagnostic tool examines a total of three biomarkers.

Thus, only the speech analysis module and the emotion analysis module can be present in one embodiment of the diagnostic tool, in another embodiment only the speech analysis module and the gaze analysis module, and in a third embodiment the speech analysis module, the emotion analysis module and the gaze analysis module.

In the third embodiment variant, the emotion analysis module then forms a first further module and the viewing direction analysis module forms a second further module, with at least the reaction times to the emotional stimuli forming characteristic values of the second biomarker and the viewing direction over time forming characteristic values of a third biomarker of the test person. The overall result evaluation unit is then set up to determine whether the test person has the chronic neurological disorder based on the characteristic values of the first, second and third biomarker of the test person using the machine learning algorithm based on artificial intelligence by comparison with a multidimensional interface (hyperplane). The order in which the characteristic values of the second and third biomarkers are determined is not important.
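The sequencing performed by the operating software in the three-module variant can be sketched as follows. This is an illustration under assumptions only: the three module functions return fixed, made-up values, and `dummy_evaluate` is a placeholder for the overall result evaluation unit, not a real classifier.

```python
from typing import Callable, List, Sequence

def run_diagnosis(modules: Sequence[Callable[[], List[float]]],
                  evaluate: Callable[[List[float]], bool]) -> bool:
    """Trigger each module in turn and feed all characteristic values onward."""
    features: List[float] = []
    for module in modules:          # modules are triggered one after the other
        features.extend(module())   # collect this module's characteristic values
    return evaluate(features)

# Hypothetical stand-ins for the three modules (fixed values for illustration).
def speech_module() -> List[float]:
    return [0.1] * 12               # e.g. one value per frequency band

def emotion_module() -> List[float]:
    return [0.8, 1.0, 1.0] * 6      # e.g. three values per stimulated emotion

def gaze_module() -> List[float]:
    return [0.0, 0.5, 1.0]          # e.g. sampled viewing directions

seen_lengths = []

def dummy_evaluate(features: List[float]) -> bool:
    """Placeholder for the trained evaluation unit; only records the input size."""
    seen_lengths.append(len(features))
    return False

result = run_diagnosis([speech_module, emotion_module, gaze_module], dummy_evaluate)
```

Because the evaluation unit only sees the concatenated values, swapping the order of the second and third module changes nothing as long as the same order is used for training and for evaluation.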

The diagnostic tool is preferably set up to select and display the set of individual images and/or individual videos or the text for triggering the speech signal, and/or the set of individual images and/or individual videos or the at least one video for the emotion stimulation, and/or the at least one image or video for the gaze direction guidance, depending on person-specific data of the test person. Among other things, it can be provided that the voice signal trigger control is set up to select and display either the set of individual images and/or individual videos or the text depending on the age of the test person. Preferably, children are shown the set of individual images and/or individual videos and adults are shown the text on the image display device. The set of individual images and/or individual videos can also be used if the test person cannot read. Otherwise, it is preferable to use a text to be read aloud, because this way the speech sample is longer, more extensive in terms of sound and tonality, and overall more homogeneous.

Preferably, the diagnostic tool can have a filter to filter out background or ambient noise from the speech signal before the pitch evaluation, in particular the voice or voices of other people, such as an assistant, who are or may be present in the vicinity of the test person and speak during the audio recording.

The diagnostic tool can preferably have a bandpass filter that is set up to restrict the pitch spectrum under consideration to the range between 30 and 600 Hz. The human voice covers a frequency range between 30 Hz and 2000 Hz, with spoken language usually being below 600 Hz. Limiting the pitch spectrum to the range between 30 and 600 Hz with the same number of frequency bands improves pitch analysis accuracy because the individual frequency bands are narrower.

The number of frequency bands is preferably between 6 and 18, ideally 12. This number represents a good balance between the accuracy of the pitch determination and the computing time and computing power required for it.
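The step from a pitch track to the characteristic values of the vocal biomarker can be sketched as a simple histogram. This is a minimal sketch, assuming that pitch values in Hz have already been estimated per time frame and that the bands are of equal width; the text does not specify the band spacing, and the function name is hypothetical.

```python
def pitch_band_distribution(pitches_hz, low_hz=30.0, high_hz=600.0, n_bands=12):
    """Relative frequency of pitch values per band.

    Pitch values outside [low_hz, high_hz], and frames without a pitch
    (None), are ignored. Equal-width bands are an assumption of this sketch.
    """
    width = (high_hz - low_hz) / n_bands
    counts = [0] * n_bands
    for f in pitches_hz:
        if f is None or not (low_hz <= f <= high_hz):
            continue
        # The upper edge (f == high_hz) is assigned to the last band.
        index = min(int((f - low_hz) / width), n_bands - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bands
    return [c / total for c in counts]

# Hypothetical pitch track; 20 Hz and 700 Hz fall outside the considered range.
distribution = pitch_band_distribution([100.0, 110.0, 120.0, 230.0, 600.0, 20.0, 700.0, None])
```

With 12 bands between 30 and 600 Hz, each band is 47.5 Hz wide; the resulting 12 relative frequencies would then serve as the characteristic values of the first biomarker.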

The speech signal analyzer preferably includes a deep convolutional neural network algorithm in order to estimate the pitches, a task also referred to as pitch detection in technical jargon. However, another high-quality pitch estimation algorithm can also be used, such as "PRAAT". A key feature of the speech signal analyzer, in particular of the deep convolutional neural network algorithm, is its ability to learn: the models it uses for pitch estimation can be continuously improved and old models replaced by improved new models, whether because more comparison data is available for training the models or because a more intelligent way of optimization has been found.

According to one embodiment variant, the emotion observation unit and/or the viewing direction observation unit is set up to evaluate the facial image in real time. In other words, the examination is carried out while the test person is looking at the image reproduction device or is being shown the set of individual images and/or individual videos or the at least one video or image on it.

Alternatively, an offline examination can be carried out. In this case, the emotion observation unit and/or the line of sight observation unit can each have a video recording unit, or use such a video recording unit that is part of the diagnostic tool, in order to save a corresponding video recording while the test person views the set of individual images and/or individual videos or is shown the at least one video or image. This video recording can then be made available to the emotion observation unit or the viewing direction observation unit for evaluation.

Preferably, the emotion observation unit comprises face recognition software based on Compassionate Artificial Intelligence, which is trained on certain emotions, namely those emotions that are stimulated by the individual images or individual videos of the set or by the video, such as joy, sadness, anger or fear.

The emotion observation unit is preferably set up to determine the type of reaction to the respectively stimulated emotion in addition to the reaction time, this type of response being part of the characteristics of the second biomarker. In the simplest case, the type of reaction can be binary information that indicates whether the reaction is a positive or negative emotion. For example, joy and sadness can be interpreted as positive emotions, and anger and fear as negative emotions. Alternatively, the response type can be the specific emotion with which the subject responds. The response type can then form part of the characteristics of the second biomarker, together with the corresponding response time for the particular emotional response to which the response type is linked.

It can additionally be provided that the emotion analysis module is set up to determine whether the reaction shown by the test person corresponds to the stimulated emotion. In the simplest case, this can be done by comparing whether both the emotional stimulus and the type of reaction are positive or negative emotions. If this is the case, the test person reacted as expected or "normally". If this is not the case, i.e. the emotional reaction is positive although the emotional stimulus was negative or vice versa, the test person reacted unexpectedly or "abnormally". At best, a comparison can also be made as to whether the specifically determined emotion with which the test person reacts corresponds to that of the stimulated emotion or whether these emotions are different. The result of this respective comparison can be given in a congruence indicator, e.g. such that a "1" indicates agreement of the emotional response with the stimulated emotion and a "0" indicates a lack of agreement, at least with regard to whether they are positive or negative emotions. Alternatively, a "-1" may indicate a lack of correspondence between the emotional response and the stimulated emotion and a "0" the fact that the subject showed no response at all. The congruence indicator can then also form part of the characteristics of the second biomarker, together with the corresponding reaction time for the emotional reaction to which the congruence indicator is linked.
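The simplest form of the congruence indicator, based on whether stimulus and reaction are both positive or both negative, can be sketched as follows. This is only an illustration: the valence table is a hypothetical, incomplete example supplied by the caller, and combining "1" for agreement with "-1" for a mismatch and "0" for no reaction is one reading of the alternatives described above.

```python
def congruence_indicator(stimulated, observed, valence):
    """Return 1 if the reaction matches the stimulus in valence,
    -1 if it does not, and 0 if no reaction was observed (observed is None)."""
    if observed is None:
        return 0
    return 1 if valence[stimulated] == valence[observed] else -1

# Hypothetical valence table (+1 positive, -1 negative); extend as required.
VALENCE = {"joy": 1, "anger": -1, "fear": -1}

examples = [
    congruence_indicator("joy", "joy", VALENCE),     # same emotion
    congruence_indicator("fear", "anger", VALENCE),  # different emotion, same valence
    congruence_indicator("joy", "anger", VALENCE),   # opposite valence
    congruence_indicator("joy", None, VALENCE),      # no reaction at all
]
```

A stricter variant would compare the specific emotions instead of their valence, in which case the second example would count as a mismatch.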

The congruence indicator is particularly helpful and meaningful information, at least when the test person does not react to a specific stimulus with the emotion that would be expected, because this is indicative of a chronic neurological disorder.

Provision can preferably be made for the emotion analysis module to supply three items of information for each stimulated emotion, namely the reaction time to the stimulation, the emotional reaction thereto (positive/negative or specifically determined emotion) and the congruence indicator. These three items of information for each of the emotions stimulated then form the characteristic values of the second biomarker. In the case of n stimulated emotions, the second biomarker comprises 3n parameters in this case.
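How the 3n parameters could be laid out as a flat vector is sketched below. The record type, its field names and the sentinel value used when no reaction occurred are assumptions for illustration; the text does not prescribe an encoding.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EmotionResult:
    """Result for one stimulated emotion (hypothetical record layout)."""
    reaction_time_s: Optional[float]  # None if no reaction was observed
    reaction_valence: int             # +1 positive, -1 negative, 0 no reaction
    congruence: int                   # congruence indicator

def second_biomarker_vector(results: List[EmotionResult],
                            no_reaction_time_s: float = -1.0) -> List[float]:
    """Flatten n per-emotion results into 3n characteristic values.

    `no_reaction_time_s` is an assumed sentinel for a missing reaction time.
    """
    vector: List[float] = []
    for r in results:
        t = r.reaction_time_s if r.reaction_time_s is not None else no_reaction_time_s
        vector.extend([t, float(r.reaction_valence), float(r.congruence)])
    return vector

# Six stimulated emotions, the preferred number, with made-up values.
results = [EmotionResult(0.8, 1, 1)] * 5 + [EmotionResult(None, 0, 0)]
vector = second_biomarker_vector(results)
```

With six stimulated emotions the vector has 18 entries, which would be appended to the values of the vocal biomarker before classification.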

Provision is preferably made for the emotion triggering control to be set up to stimulate between 4 and 12 emotions, preferably 6 emotions.

In an embodiment variant, the line of sight guidance can be set up to display the at least one image or video in discrete positions of the image display device one after the other or to move it along a continuous path. The image or video is thus reproduced smaller than the display area (screen) of the image display device and is moved across the display area, with the test subject being supposed to follow the chronological sequence of the display locations or the display path with their eyes. However, it is also possible to show a single video over the entire surface of the display, in which case this video contains one or more objects whose position changes in relation to the spatial limitations of the display, e.g. a butterfly flying back and forth.

The line of sight observation unit preferably includes eye-tracking software.

The diagnostic tool according to the invention can advantageously be used as a software application for a portable communication terminal, in particular a smartphone or tablet. This means that the diagnostic tool can be used by almost anyone at any time. The diagnostic tool according to the invention can also be used as a software application on a server that can be controlled via a computer network by a browser on an external terminal in order to run the diagnostic tool. This variant also ensures high accessibility of the diagnostic tool or access to it at any time from anywhere in the world, with the variant also taking into account the fact that the computing power in a portable communication terminal device may not be sufficient to execute the artificial intelligence algorithms mentioned. A server with a processing unit with sufficient computing power is better suited for this.

According to the invention, a diagnostic system for use in the diagnosis of a chronic, neurological disorder in a human subject using artificial intelligence is also proposed, comprising

- a diagnostic tool according to the invention,

- at least one non-volatile memory with program code and data forming the diagnostic tool,

- a processing unit, such as a processor, for executing the program code and processing the data of the diagnostic tool, and

- the following peripherals:

- a voice input device, such as a microphone, for recording at least one voice signal from the test person for the diagnostic tool,

- an image capturing device, such as a CCD camera, for capturing an image of the subject's face for the diagnostic tool,

- an image display device, such as a monitor or a display, for displaying image data for the test person and

- at least one input means, such as keys or a touch screen, for making inputs by the test person,

with the peripheral devices being operatively connected to the processing unit and the diagnostic tool being set up to at least indirectly control the voice input device, the image recording device and the image display device and to evaluate the recordings from the voice input device and the image recording device.

The diagnostic system is preferably a portable communication terminal device, in particular a smartphone or tablet, on which the diagnostic tool is run as a software application. In this case, the non-volatile memory, the processing unit, the voice input device, the image recording device, the image display device and the input means represent integral components of the communication terminal.

Alternatively, the processing unit can be part of a server connected to a computer network such as the Internet and controllable via a browser, with the non-volatile memory being connected to the server and the peripheral devices being part of an external terminal device, in particular a portable communication terminal device. In other words, in this embodiment variant, the diagnostic tool can be called up via the network/Internet and executed on the server.

In a further embodiment variant, the external terminal device can also have a volatile memory, with the diagnostic tool being stored partly on the server-side memory and partly on the terminal-side memory. For example, the image or text data used by the modules, as well as at least the voice signal triggering control and the voice recording unit of the voice analysis module, the emotion triggering control of the emotion analysis module and/or the line of sight guidance of the line of sight analysis module, can be stored on the terminal device and executed there, whereas the speech signal analyzer, the emotion observation unit and the reaction assessment unit, the line of sight observation unit and the overall result evaluation unit are stored on the server-side memory and executed on the server. Consequently, all computationally intensive functional units of the diagnostic tool are arranged on the server side. There is also the possibility of arranging all functional units of the diagnostic tool on the terminal side, except for the overall result evaluation unit. On the one hand, this makes sense because the overall result evaluation unit can be continuously trained with new comparison data sets and thus improved. On the other hand, the data to be transmitted to the server, namely the biomarker characteristic values to be evaluated by the overall result evaluation unit, do not contain personal data, so that this procedure is also advantageous for data protection reasons.
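The variant in which only the biomarker values leave the terminal can be sketched as a small serialization step. This is an illustration only: the JSON format and the field names are hypothetical and not specified in the text.

```python
import json

def build_server_payload(voice, emotion, gaze):
    """Serialize only the biomarker characteristic values for the server.

    No name, age, audio or video data is included; the key names are
    assumptions made for this sketch.
    """
    payload = {
        "voice": list(voice),
        "emotion": list(emotion),
        "gaze": list(gaze),
    }
    return json.dumps(payload)

# Made-up values standing in for the three biomarkers.
message = build_server_payload([0.25, 0.75], [0.8, 1.0, 1.0], [0.1, 0.2])
decoded = json.loads(message)
```

The server-side evaluation unit would then only need to decode the message and concatenate the three lists in a fixed order.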

According to the invention, a method for operating the software-based diagnostic tool for use in diagnosing a chronic neurological disorder in a human subject using artificial intelligence is also proposed, comprising

- a higher-level operating software,

- an overall result evaluation unit downstream of the speech analysis module and the further module, wherein

- the operating software triggers the speech analysis module and the at least one other module one after the other and feeds their determined characteristic values to the overall result evaluation unit,

- a speech signal trigger control of the speech analysis module presents a set of individual images and/or individual videos or a text on an image display device for the test person in order to trigger at least one speech signal in the test person, in the form of naming an object contained in the respective individual image or individual video or in the form of reading the text aloud,

- a voice recording unit of the voice analysis module records the voice signal in an audio recording with the aid of a voice input device, and

- a speech signal analyzer of the speech analysis module first evaluates the speech signal in the audio recording to determine which pitch occurs at which point in time, and then determines a frequency distribution of the pitches over a number of frequency bands of a frequency spectrum under consideration, with this frequency distribution forming the characteristic values of the first biomarker, and

- the overall result evaluation unit determines whether the test person has the chronic neurological disorder on the basis of the characteristic values of the biomarkers of the test person using a machine learning algorithm based on artificial intelligence by comparison with a multidimensional interface.

In addition, in an embodiment variant of the operating method, in which the further module is an emotion analysis module for evaluating the reaction of the test person to an emotional stimulus as a second biomarker, it can be provided that

- an emotion triggering control of the emotion analysis module displays a set of individual images and/or individual videos or at least one individual video on the image display device in order to stimulate a number of individual emotions in the test person, and

- an emotion observation unit of the emotion analysis module evaluates a recording of the subject's face, obtained with the aid of an image recording device (6), at least to determine when the subject shows an emotional reaction, and

- the emotion analysis module determines the respective reaction time between the stimulation of the respective emotion and its occurrence, and at least these reaction times form the characteristic values of the second biomarker.

As explained above, the emotion observation unit can also evaluate the recording of the test person's face to determine which emotional reaction it shows, i.e. the type of reaction, for example whether it is a positive or negative emotional reaction, or which specific emotion it is. In this case, the respective reaction time and reaction type form the characteristic values of the second biomarker for each stimulated emotion.

As also stated above, the emotion analysis module can also determine a congruence indicator that indicates whether the emotional response corresponds to the stimulated emotion, for example whether both are positive or both are negative emotions, or even whether the emotion type matches. In this case, for each stimulated emotion, the respective reaction time and the congruence indicator form the characteristic values of the second biomarker. Preferably, however, the emotion analysis module determines three pieces of information for each stimulated emotion, namely the reaction time, the type of reaction and the congruence indicator. In this case, for each stimulated emotion, the respective reaction time, reaction type and congruence indicator form the characteristic values of the second biomarker.
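The three pieces of information per stimulated emotion can be sketched as follows. This is a minimal illustration under stated assumptions: the names `EmotionResult`, `assess_reaction` and the `VALENCE` table are hypothetical, the face analysis that detects the reaction is assumed to exist elsewhere, and congruence is judged here only by positive or negative valence.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical valence table, used only for this illustration.
VALENCE = {"joy": "positive", "sadness": "negative",
           "anger": "negative", "fear": "negative"}


@dataclass
class EmotionResult:
    reaction_time_s: Optional[float]  # None if no reaction was observed
    reaction_type: Optional[str]      # observed emotion
    congruent: Optional[bool]         # congruence indicator


def assess_reaction(stimulated: str, stimulus_time_s: float,
                    observed: Optional[str],
                    reaction_onset_s: Optional[float]) -> EmotionResult:
    """Derive reaction time, reaction type and congruence indicator."""
    if observed is None or reaction_onset_s is None:
        return EmotionResult(None, None, None)
    # Reaction time: span between stimulation and occurrence of the emotion.
    reaction_time = reaction_onset_s - stimulus_time_s
    # Congruent if stimulated and observed emotion share the same valence.
    congruent = VALENCE[observed] == VALENCE[stimulated]
    return EmotionResult(reaction_time, observed, congruent)
```

A stricter variant could compare the emotion types themselves instead of their valence.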

In addition, in another embodiment variant of the operating method, in which the further module is a viewing direction analysis module for evaluating the viewing direction of the test person as a second biomarker, it can be provided that

- a line of sight guide of the line of sight analysis module displays at least one image or video on the image display device in order to guide the line of sight of the test person, and

- a viewing direction monitoring unit of the viewing direction analysis module determines the viewing direction over time from a recording of the subject's face obtained with the aid of an image recording device (6), this viewing direction profile forming the characteristic values of the second biomarker.

Finally, in a further embodiment variant of the operating method, it can be provided that the emotion analysis module is a first additional module and the viewing direction analysis module is a second additional module and that these modules are triggered one after the other, with at least the reaction times to the emotional stimuli forming characteristic values of the second biomarker and the viewing direction over time forming characteristic values of a third biomarker of the test person, and wherein the overall result evaluation unit determines whether the test person has the chronic neurological disorder on the basis of the characteristic values of the first, second and third biomarkers of the test person using the machine learning algorithm based on artificial intelligence by comparison with a multidimensional interface.

Otherwise, the operating method is set up to control the diagnostic tool in such a way that it executes the steps and functions for which it is set up accordingly, as described above. The software-based diagnostic tool and its operating method are described in more detail below using a specific example and the accompanying figures.

The figures show:

FIG. 1: a schematic representation of the structure of a first diagnostic system according to the invention

FIG. 2: a schematic representation of the structure of a second diagnostic system according to the invention

Figure 3: a schematic representation of the functional units of the speech analysis module of the diagnostic tool

Figure 4: a schematic representation of the functional units of the emotion analysis module of the diagnostic tool

Figure 5: a schematic representation of the functional units of the gaze analysis module of the diagnostic tool

Figure 6: a schematic representation of the structure of a third diagnostic system according to the invention

FIG. 7: a flow chart of an operating method according to the invention

FIG. 8: a schematic signal flow chart

Figure 9: a recorded speech signal comprising eight individual speech signals

Figure 10: the pitch signals of the eight individual speech signals from Figure 9 over time (pitch spectrum)

Figure 11: a pitch histogram for the eight pitch signals in Figure 10

Figure 12: an example pitch histogram of an autistic subject

Figure 13: an example pitch histogram of a non-autistic subject

Figure 14: further examples of pitch histograms of autistic subjects

Figure 15: further examples of pitch histograms of non-autistic subjects

Figure 16: a diagram illustrating emotional stimuli and their effect on the subject

FIG. 17: a chronological sequence of representations of an image on the image display device at different positions

FIG. 18: a determined viewing direction path

FIG. 1 shows a software-based diagnostic tool as part of a diagnostic system 1 according to a first embodiment variant. FIG. 7 illustrates an operating method for this diagnostic tool or for the diagnostic system. The diagnostic system 1 comprises, on the one hand, a computer system 2, which has at least one processing unit 3 in the form of a processor 3 with one, two or more cores and at least one non-volatile memory 4, and, on the other hand, peripheral devices 5, 6, 7, 8, which are operatively connected to the computer system 2, more precisely connected to it for communication, so that the peripheral devices 5, 6, 7, 8 can receive control data from the computer system 2, i.e. be controlled by it, and/or can transmit useful data, in particular image and sound data, to it.

The peripheral devices 5, 6, 7, 8 are a voice input device 5 in the form of a microphone 5, an image recording device 6 in the form of a camera 6, for example a CCD camera, an image display device 7 in the form of a display 7 or monitor, and an input means 8, e.g. in the form of control keys, a keyboard or a touch-sensitive surface of the image display device 7 in conjunction with a graphical user interface displayed thereon, which graphically highlights the partial area of the image display device 7 to be touched for a possible input. The input means 8 can also be formed by a speech recognition module. The peripheral devices 5, 6, 7, 8 are locally assigned to a test person 11, in particular accessible to them, so that they can interact with the peripheral devices 5, 6, 7, 8.

In one embodiment variant, the peripheral devices 5, 6, 7, 8 can be connected to the computer system 2 via one or more cable connections, either via a common cable connection or via individual cable connections. Instead of the cable connection, the peripheral devices 5, 6, 7, 8 can also be connected to the computer system 2 via a wireless connection, in particular a radio connection such as Bluetooth or WLAN. A mixture of these connection types is also possible, so that one or more of the peripheral devices 5, 6, 7, 8 can be connected to the computer system 2 via a cable connection and one or more of the peripheral devices 5, 6, 7, 8 via a wireless connection, in particular a radio connection. Furthermore, the peripheral devices 5, 6, 7, 8 can be connected to the computer system directly or indirectly via an external device 12, for example via an external computer such as a personal computer, which in turn can be connected to the computer system 2 for communication, wirelessly or via cable, via at least one local and/or global network 9 such as the Internet. This is illustrated in FIG.

The peripheral devices 5, 6, 7, 8 can each form individual devices. Alternatively, however, some of them can also be combined with one another in one device. For example, the camera 6 and microphone 5 can be housed in a common housing, or the display 7 and the input means 8 can form an integrated functional unit. As a further alternative, the peripheral devices can all be an integral part of the external device 12, which can then be, for example, a mobile telecommunications terminal 12, in particular a laptop, a smartphone or a tablet. An embodiment variant of the external device 12 in the form of a smartphone 12 is illustrated in FIG. In this case too, the peripheral devices 5, 6, 7, 8 communicate with the computer system 2 via the external device 12 and a local and/or global network 9 such as the Internet 9, to which the external device 12 on the one hand and the computer system 2 on the other hand are each connected, wirelessly or via cable.

In the event of access from the network 9, in particular the Internet, the computer system 2 acts as a central server and has a corresponding communication interface 10 for this purpose, in particular an IP-based interface, via which communication with the external device 12 takes place. This enables unrestricted access to the diagnostic tool in terms of time and location. In particular, the communication with the computer system 2 as a server can take place via a special software application on the external device or via an Internet address or website that can be called up in a browser on the external device 12. When realizing the peripheral devices 5, 6, 7, 8 and their connection to the computer system, there are numerous physical design options that enable multiple scenarios for the use of the diagnostic tool according to the invention.

The diagnostic system 1 or the computer system 2 and the peripheral devices 5, 6, 7, 8 that are operatively connected to it can be located as a common functional unit locally at the workplace of a doctor or therapist, e.g. in their practice or clinic. In this case, the test person 11 must be present in person in order to be able to use the diagnostic system 1. It is also possible for only the external device 12 with the peripheral devices 5, 6, 7, 8 to be located at said workplace, with this device accessing the computer system 2 or the diagnostic tool via the network 9. In this case, the test person 11 still has to be personally present at the doctor or therapist, but the investment costs for the doctor or therapist are lower. However, it is of particular advantage if the external device 12 is a mobile device, for example a laptop, smartphone or tablet, which also allows access to the computer system 2 or to the diagnostic tool from home. This eliminates time-consuming trips to the doctor or therapist.

A medical expert is in principle not required to use the diagnostic system 1 according to the invention, since the diagnosis is carried out independently and above all objectively by the diagnostic tool on the basis of the information provided by the test person 11 via the microphone 5 and the camera 6. The test person 11 interacts with the diagnostic system 1 on the basis of textual or spoken instructions which it outputs on the image display device 7 or via a loudspeaker as a further peripheral device and which the test person 11 has to follow. For children and those adults who are inexperienced in using laptops, smartphones or tablets and their programs, another person such as a parent or caregiver can support the operation of the diagnostic system 1, but this does not require a medical expert. Nevertheless, the diagnostic result should be discussed and evaluated with a medical expert, especially with regard to any therapy resulting from a positive autism diagnosis. Because a positive autism diagnosis can be emotionally distressing, it is also recommended that the diagnostic system 1 be used under the supervision of another adult.

In the narrower sense, the diagnostic tool according to the invention consists of a combination of software 15 and data 14 that are stored in a non-volatile memory 4. Figures 1 and 2 represent the simple case that the software 15 and data 14 are stored together in a memory 4, which is part of the computer system 2, for example a hard disk memory. However, this memory 4 can also be arranged outside of the computer system 2, for example in the form of a network drive or a cloud. Furthermore, it is not mandatory that the software and the data are in the same memory 4.

Rather, the data 14 and the software 15 can also be distributed in different memories, stored inside or outside the computer system, e.g. in a network memory or a cloud, which the computer system 2 accesses when required. Furthermore, not only can all of the data 14 and all of the software 15 be stored in separate memories, rather parts of the data and/or parts of the software can also be stored on different memories. Thus, there are also numerous design options for the arrangement and distribution of the diagnostic tool within the diagnostic system 1.

In particular, the data 14 of the diagnostic tool include image data 16, 18, 19 in the form of individual images and/or individual videos and text data 17, which are intended to be displayed by the diagnostic tool on the image display device 7 in order to elicit a spoken utterance or an emotional reaction or to guide the viewing direction. With regard to this purpose, the image and text data 16, 17, 18, 19 are preferably each combined into a specific group or a specific data set, which is selected by the diagnostic tool depending on the person-specific information provided by the test person. The text data 17 are provided in order to display them on the image display device 7 to an adult test person 11 who is able to read, so that they can be read aloud. The text data 17 can comprise a first text 17a in a first language, e.g. English, and a second text 17b in a second language, e.g. Swahili. The text can be, for example, a well-known standard text, e.g. a fairy tale or a story, such as Little Red Riding Hood or "A Tale of Two Cities".

A first part 16 of the image data is provided in order to display individual images and/or individual videos one after the other on the image display device 7 to an adult or child test person 11 who is unable to read, so that the test person 11 names the object shown in the individual images and/or individual videos. These individual images and/or individual videos 16 are designed in such a way that only a single object that is comparatively easy to name is shown in them, such as an elephant, a car, an airplane, a doll, a soccer ball, etc. In the case of a video, this object can also be shown in motion. The individual images and/or individual videos can reflect reality or be drawn. Since people, in particular children, of different ages, genders and ethnic backgrounds have different interests and different socio-cultural backgrounds, the individual images and/or individual videos can be divided into individual sets 16a, 16b of individual images and/or individual videos, the content of which is tailored to age, gender and ethnic origin or has a specific age-related, gender-related and/or cultural context, in order to ensure that the test person 11 actually recognizes and names the respective object. The language in which the name is given, however, is irrelevant for the diagnostic tool.

Thus, a first set 16a of individual images can be intended to be presented to a boy or a child of a first ethnic origin on the image display device 7, and a second set 16b of individual images can be intended to be presented to a girl or a child of a second ethnic origin on the image display device 7.

A second part 18 of the image data is provided in order to display individual images and/or individual videos one after the other to the test person 11 on the image display device 7 in order to trigger a specific emotional reaction in the test person 11, e.g. joy, sadness, anger or fear. Although still images, such as a short comic containing a joke, are in principle suitable for triggering an emotional reaction, videos can show situations that evoke more intense emotions, which is why videos are generally better suited. In this case, too, the second part 18 of the image data 16, 18 can be divided into individual sets 18a, 18b of individual images and/or individual videos, the content of which is tailored to age, gender and ethnic background, in order to ensure that the test person 11 reacts to a certain situation with a certain emotion. The individual images and/or individual videos can reflect reality or be drawn. The latter is ideal for children.

Optionally, a third part 19 of image data can be provided, comprising at least one single image or video, which is displayed to the test person 11 on the image display device 7, in particular at different positions in succession, in order to guide their line of sight on the image display device 7. In principle, a single individual image 19a (cf. FIG. 17) is sufficient for this purpose, which is displayed discretely one after the other at different positions or is continuously moved to different positions. In the simplest case, the individual image 19a can be any graphic object such as a symbol, an icon, a logo, a text or a figure. It can alternatively be a photo or a drawing. The individual image can come from the set of individual images of the first part 16 or second part 18 of the image data, so that in this case no third part 19 of image data is required for guiding the viewing direction. In order to ensure that the test person 11 does not lose interest in following the image, it is advisable to use different individual images or at least one video for guiding the viewing direction, which then form the third part 19 of the image data. However, these individual images or the video can also come from the first part 16 or second part 18 of the image data, so that in this case no third part 19 of image data is required either.

As explained above, in addition to the data 14, the diagnostic tool consists of software 15 (program code) with instructions for execution on the processor 3. More precisely, this software includes operating software 20, several analysis modules 21, 22, 23 and an overall result evaluation unit 24. The operating software 20 takes over the higher-level control of the processes in the diagnostic tool; in particular, it triggers the individual analysis modules 21, 22, 23 one after the other and controls the overall result evaluation unit 24, compare Figure 7.

The first of the analysis modules is a speech analysis module 21 for determining characteristic values 28 of a first biomarker, referred to here as a vocal biomarker, of a speech signal 26 of the test person 11 that is contained in an audio recording 27. In order to trigger the voice signal 26 and obtain the audio recording 27, the voice analysis module 21 comprises a voice signal trigger controller 21a and a voice recording unit 21b. A speech signal analyzer 21c is also part of the speech analysis module 21 in order to obtain characteristic values of the vocal biomarker, as is shown schematically in FIG.

The speech analysis module 21 is triggered as the first analysis module by the operating software 20 after the test person 11 or another person assisting them, such as an adult or the doctor, has activated the diagnostic tool, see Figure 7, and, if applicable, has entered person-specific data, in particular age, gender and ethnic background, when prompted by the diagnostic tool. There is also the possibility that this person-specific data is part of a person profile that already exists before the start of the diagnostic tool and can be used by it.

The person-specific data can be specified by the test person 11 via the input means 8. For this purpose, the diagnostic tool expects a corresponding input via the input means 8 in order to then select the data 14 on the basis of the input made. If, however, a simple variant of the diagnostic tool according to the invention is intended only for a certain group of people, e.g. only adults or only children, the data can be specially tailored to this group of people and there is no need to enter the person-specific data. The data or individual images and individual videos are then preferably stored in memory 4 in a gender-neutral and ethnically and culturally neutral form.

The voice analysis module 21 is configured to first execute the voice signal trigger control 21a. This in turn is set up to load a set 16a, 16b of individual images or individual videos from the first image data 16 in the memory 4, or to load a text 17a, 17b from the text data 17 in the memory 4 and to display it on the image display device 7. In the case of single images or single videos, this is done one after the other.

The set 16a, 16b of individual images or individual videos or text 17a, 17b is preferably selected as a function of the personal data.

If the person-specific data states that the test person 11 is a child, or that their age is below a certain age limit of, for example, 12 years, the set 16a, 16b of individual images or individual videos is loaded, otherwise the text 17a, 17b. This condition can also be linked to an additional condition to be checked, namely whether the test person 11 has a reading disability, which can also be part of the person-specific data. If such a reading disability is present, the set 16a, 16b of individual images or individual videos is also used. Furthermore, depending on the gender and/or the ethnic origin of the test person 11, a first set 16a or a second set 16b of individual images or individual videos can be selected, each of which is specifically tailored to the corresponding group of people. Furthermore, depending on the ethnic background or the national language of the test person 11, a first text 17a or a second text 17b can be selected, each of which is adapted to the corresponding group of people.
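The selection rule described here can be sketched as follows. This is a minimal illustration under stated assumptions: the `Subject` class, its `group` field (standing in for gender, ethnic origin or language) and the function `select_stimulus` are hypothetical, and the function returns the reference sign of the data set to load rather than the data itself.

```python
from dataclasses import dataclass

AGE_LIMIT_YEARS = 12  # example age limit named in the description


@dataclass
class Subject:
    age: int
    reading_disability: bool = False
    group: str = "first"  # hypothetical key for gender/ethnic/language group


def select_stimulus(subject: Subject) -> str:
    """Return the reference sign of the image set or text to load."""
    # Images or videos are used for children and for subjects who cannot read.
    use_images = subject.age < AGE_LIMIT_YEARS or subject.reading_disability
    if use_images:
        return "16a" if subject.group == "first" else "16b"
    # Otherwise a text in the matching language is read aloud.
    return "17a" if subject.group == "first" else "17b"
```

For example, `select_stimulus(Subject(age=8))` selects the first image set, while an adult of the second group without a reading disability is given the second text.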

The evaluation of the person-specific data, more precisely the check of whether the test person 11 is under the age limit or has a reading disability, what gender or ethnic origin they have and what language they speak or understand, as well as the selection of the appropriate set 16a, 16b of individual images or individual videos or of the text 17a, 17b, are process steps which the speech signal trigger control 21a executes. It then loads the corresponding set 16a, 16b of individual images or individual videos or the corresponding text 17a, 17b from the memory 4 and controls the image display device 7 in such a way that the individual images or individual videos of the set 16a, 16b are displayed one after the other, or the text 17a, 17b is displayed, on the image display device 7.

The individual images and individual videos of the set 16a, 16b and the text 17a, 17b are intended to elicit a spoken utterance from the test person 11, referred to below as the voice signal 26. In the case of the individual images or individual videos, it is provided that the spoken utterance is a single-word designation of the object that is shown in the respective individual image or in the respective individual video. In the case of the text 17a, 17b, it is provided that the spoken utterance is the reading aloud of this text 17a, 17b. In order to convey this to the test person 11, it can be provided that the diagnostic tool, in particular the higher-level operating software 20 or the speech analysis module 21, outputs a corresponding textual or verbal instruction to the test person 11 before the playback of the individual images or individual videos of the set 16a, 16b or of the text 17a, 17b, for example via the image display device 7 and/or a loudspeaker.

For example, the set 16a, 16b may include seven or more still images or still videos. The individual frames or videos can be played back for a fixed period of time, e.g. for 5 or 6 seconds, so that after this period the next frame or video is played back until all the frames or videos have been played back.

Simultaneously with or shortly before the start of the playback of the individual images or individual videos of the set 16a, 16b or of the text 17a, 17b on the image display device 7, the voice signal trigger control 21a activates the voice recording unit 21b to record the voice of the test person 11 as a voice signal 26. To this end, the voice recording unit 21b switches on the speech input device 5 (microphone), records the time-continuous speech signal 26 or speech signals in an audio recording 27 and stores this in an audio data memory 13a for recorded speech signals. The audio recording 27 itself is digital, in which case the voice signal 26 can already be digitized (sampled) in the voice input device 5 or in an analog/digital converter downstream of it, which can be part of the processing unit 3 or of a separate digital signal processor (DSP). The audio data memory 13a can be part of the non-volatile memory 4. Alternatively, it can be a separate memory in the computer system 2 or a memory that is separate from the computer system 2, for example a memory in a network drive or in a cloud.

The voice recording unit 21b can be set up to end the recording after a specified period of time in order to obtain an audio recording 27 of a specific length of time, for example an average of 45 seconds for children and an average of 60 seconds for adults. The voice input device 5 can then also be switched off. Alternatively, it can be switched off when the audio signal from the voice input device 5 is below a certain limit value for a certain time after a voice signal 26, i.e. the test person 11 is no longer speaking.

According to another embodiment variant, manual triggering and ending of the audio recording can be provided. In this case, the diagnostic tool receives a corresponding start or stop input via the input means 8.

Furthermore, the audio recording can be uninterrupted for the duration of the playback of the individual images, individual videos or the text, so that the recording is started once, namely at the beginning of playback, and is ended once, namely at the end of playback. Alternatively, it is possible to start a new audio recording for each individual image or individual video, so that each voice signal 26 is contained in a separate audio recording. The recording can be started before or at the beginning of the playback of each individual image or individual video and then ended, in particular after receipt of the voice signal 26 from the test person 11, either after a specified period of time has elapsed or if the audio signal of the voice input device 5 is below a certain threshold for a certain time after a voice signal 26. An example of such individual audio recordings is shown in FIG.
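The threshold-based end of a recording can be sketched as follows. This is a minimal illustration under stated assumptions: the audio level is assumed to be available as one value per frame, and the function name `find_stop_index` and its parameters `threshold` and `hold_frames` are hypothetical choices for this example.

```python
from typing import Optional, Sequence


def find_stop_index(levels: Sequence[float], threshold: float,
                    hold_frames: int) -> Optional[int]:
    """Return the index of the frame at which the recording may end:
    the first frame at which, after speech has been detected, the level
    has stayed below `threshold` for `hold_frames` consecutive frames.
    Returns None if this never happens."""
    speech_seen = False
    quiet_run = 0
    for i, level in enumerate(levels):
        if level >= threshold:
            speech_seen = True   # a voice signal is present
            quiet_run = 0
        elif speech_seen:
            quiet_run += 1       # silence following a voice signal
            if quiet_run >= hold_frames:
                return i
    return None
```

Silence before the first voice signal is deliberately ignored, so that a test person who hesitates before naming the object does not end the recording prematurely.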

FIG. 9 shows the curves of the amplitude or the sound pressure level of eight individual speech signals 26, each recorded in an audio recording, over time. The speech signals 26 are each based on a single more or less long spoken word. The individual audio recordings can first be processed individually or combined to form an overall recording, which is then processed further. In FIG. 8, all of the audio recordings are provided with the reference numeral 27, regardless of whether they are a number of individual audio recordings or a single overall recording.

The audio recording(s) 27 is/are then evaluated in the speech signal analyzer 21c, characteristic values 28 of a vocal biomarker of the recorded speech signal 26 being determined, see FIG . It is therefore not important whether the naming of the object on the respective single image or video was correct.

The evaluation of the audio recording(s) 27 by the speech signal analyzer 21c takes place in that the fundamental vocal frequencies or pitches in the speech signal 26 contained in the audio recording 27 are first estimated over time with the aid of artificial intelligence. This is called the pitch spectrum. The speech signal analyzer 21c thus examines the basic tonal structure of the speech signal 26 in the audio recording 27. For this purpose, the audio recording 27 is processed in a "deep convolutional neural network" algorithm, which is part of the speech signal analyzer 21c. The basic principle of such an algorithm is described in the technical paper Luc Ardaillon, Axel Roebel: "Fully-Convolutional Network for Pitch Estimation of Speech Signals", Interspeech 2019, Sep 2019, Graz, Austria, doi:10.21437/Interspeech.2019-2815, hal-02439798. An example of a deep convolutional neural network algorithm for pitch spectrum estimation is CREPE (Convolutional Representation for Pitch Estimation), which is based on a 6-level deep neural network that processes an audio signal in the time domain.

The deep convolutional neural network algorithm estimates the pitch of the audio signal 26 at any point in time, in particular within a specific frequency spectrum from 30 Hz to 1000 Hz, which includes all possible tones of the human voice. The progression of the pitch over time is called the pitch spectrum. Figure 10 shows the pitch spectra for the eight individual audio recordings from Figure 9.

Experience has shown that taking into account the frequency range above 600 Hz does not lead to any significant improvement in the analysis of the vocal biomarker, so this frequency range can be neglected. This can be done, for example, by bandpass filtering, in which the frequency range from 30 Hz to 600 Hz is extracted from the voice signal 26. This is preferably done after the pitch estimation or determination of the pitch spectrum, so that the further analysis is based only on the relevant part of the human voice. For this purpose, a digital bandpass filter can be applied to the audio recording(s) 27, which is also part of the speech signal analyzer 21c. In an embodiment variant, this bandpass filter can have fixed cut-off frequencies, in particular at 30 Hz and 600 Hz. Alternatively, the bandpass filter can have variable cut-off frequencies, with provision being made to determine the minimum and maximum frequencies in the pitch spectrum and then to configure the bandpass filter in such a way that the lower cut-off frequency corresponds to the determined minimum frequency and the upper cut-off frequency corresponds to the determined maximum frequency.
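The two variants of the cut-off frequencies can be pictured with the following sketch. It is a simplification under stated assumptions: instead of a digital bandpass filter acting on the audio recording, it simply discards estimated pitch values outside the band, and the function names `band_limits` and `restrict_to_band` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def band_limits(pitches_hz: Sequence[float],
                fixed: Optional[Tuple[float, float]] = (30.0, 600.0)
                ) -> Tuple[float, float]:
    """Return (lower, upper) cut-off frequency in Hz.
    With `fixed` set, the fixed limits are used; with fixed=None the
    limits follow the minimum and maximum of the pitch spectrum."""
    if fixed is not None:
        return fixed
    return (min(pitches_hz), max(pitches_hz))


def restrict_to_band(pitches_hz: Sequence[float],
                     low_hz: float, high_hz: float) -> List[float]:
    """Keep only the pitch values inside the band (limits inclusive)."""
    return [p for p in pitches_hz if low_hz <= p <= high_hz]
```

With the fixed limits, a pitch track of 20, 120, 320 and 750 Hz is reduced to the values 120 and 320 Hz.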

Before the audio recording is processed in the deep convolutional neural network algorithm, the speech signal 26 can also be filtered in such a way that background noise, such as the speech of persons other than the test person 11 in the speech signal 26, is eliminated. For this purpose, a corresponding digital filter can be applied to the audio recording(s) 27, which is also a component of the speech signal analyzer 21c. Digital filters of this type are known per se. It is sensible to filter out background noise before the pitch estimation or determination of the pitch spectrum, so that the result of this estimation is not falsified.

A histogram analysis is then applied to the pitch spectrum of the audio recording(s). A histogram that is the result of this analysis is shown in FIG. 11. In the histogram analysis, the frequency range under consideration, here the range between 30 Hz and 600 Hz, is divided into a number n of equal sections, each of which forms a container. Each individual pitch determined in the audio recording is then assigned to the corresponding section or container. This corresponds to a section-wise counting of the occurrences of the individual pitches. In other words, it is determined for each frequency section how often one of its pitches is contained in the audio recording. The number of pitches counted in each section is then divided by the total number of pitches determined. The histogram thus indicates in % how often the pitches or frequencies of a specific frequency section occurred in the audio recording. In the case of several individual audio recordings, as shown in FIG. 9, the totality of all audio recordings or the totality of all pitch spectra (FIG. 10) is considered. In the present example, the relevant frequency range has been divided into 12 sections, although there can be fewer or more.

Example: If a pitch of 320 Hz occurs in the audio recording 27, it is assigned to the 7th section. If another pitch occurs at 280 Hz, it is assigned to the 6th section. A pitch of 340 Hz is again assigned to the 7th section, and so on. If the same pitch, e.g. 320 Hz, occurs again, it is once more assigned to the 7th section. If only these four pitches were present, the 6th section would have one assignment and the 7th section three, so that the frequency range between 250 Hz and 300 Hz (6th section) would have a frequency of 25% and the frequency range between 300 Hz and 350 Hz (7th section) a frequency of 75%. A histogram illustrates these frequencies.
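The worked example above can be reproduced with a minimal sketch. It assumes section boundaries at multiples of 50 Hz starting from 0 Hz, which is what the example implies (6th section = 250 Hz to 300 Hz with 12 sections up to 600 Hz); the function name `pitch_histogram` and its parameters are hypothetical and not taken from the source.

```python
def pitch_histogram(pitches_hz, low_hz=0.0, high_hz=600.0, n_sections=12):
    """Return the relative frequency (in %) of pitches per equal-width section.

    Assumption: sections are half-open intervals [low, high) of equal width,
    so with the defaults the 7th section covers 300 Hz up to (not including) 350 Hz.
    """
    width = (high_hz - low_hz) / n_sections
    counts = [0] * n_sections
    for pitch in pitches_hz:
        if not (low_hz <= pitch < high_hz):
            continue  # pitch lies outside the frequency range under consideration
        counts[int((pitch - low_hz) // width)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_sections
    # Divide each section count by the total number of pitches determined.
    return [100.0 * count / total for count in counts]


# The four pitches from the example: 320 Hz, 280 Hz, 340 Hz and 320 Hz again.
hist = pitch_histogram([320, 280, 340, 320])
print(hist[5], hist[6])  # 6th section: 25.0, 7th section: 75.0
```

The returned list has one entry per section, so with 12 sections it yields the twelve characteristic values of the vocal biomarker described in the text.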

In the pitch histogram in Figure 11, for example, the frequency range between 200 Hz and 250 Hz (5th section) occurs with a frequency of approx. 13%, the frequency range between 250 Hz and 300 Hz (6th section) with a frequency of approx. 23.5%, the frequency range between 300 Hz and 350 Hz (7th section) with a frequency of approx. 26% and the frequency range between 350 Hz and 400 Hz (8th section) with a frequency of approx. 14%.

FIGS. 12 and 13 each show another pitch histogram as the result of a histogram analysis. The histogram in FIG. 12 belongs to a speech signal of a test person 11 who has been proven to be autistic, whereas the histogram in FIG. 13 belongs to a speech signal of a reference person who has been shown to be non-autistic. The histogram provides information about the pitch variability in the voice of the subject 11, which is an objective biomarker for distinguishing an autistic person 11 from a reference person without autism. As Figures 12 and 13 illustrate in comparison, the voice varies less in pitch in a non-autistic person, being more confined to certain frequencies. The frequencies used here are in a comparatively narrow frequency band, namely between 250 Hz and 400 Hz, and have a clear peak there, namely at approx. 300 Hz, see Figure 13. In contrast, the variability of the pitch of the voice is greater in an autistic person, as Figure 11 shows. Here the dominant frequencies extend over a much broader frequency band, namely between 50 Hz and 350 Hz, see Figure 12, and their distribution is more even, i.e. it does not have a clearly pronounced peak.

The histograms of autistic test persons in FIG. 14 in comparison to the histograms of non-autistic reference persons in FIG. 15 also confirm this finding. FIGS. 14 and 15 each show four histograms. It can be clearly seen that autistic people use a broader spectrum of sounds,

The pitch histogram can be understood as a vocal biomarker. In this case, the characteristic values of this biomarker are formed by the frequencies of occurrence of the n frequency sections. In other words, the histogram analysis according to FIG. 11 supplies twelve characteristic values, i.e. a frequency of occurrence for each frequency section. The histogram or the characteristic values of this biomarker can then be evaluated in a preliminary evaluation unit 24a to determine whether the test person 11 is not autistic, compare FIG. 8. This can be determined with a certainty of more than 95%. However, the vocal biomarker alone is not meaningful enough to allow a clearly positive diagnosis of autism, so that further investigations are required, as explained below. The result of the preliminary evaluation unit 24a is thus the intermediate diagnosis 33 that the test person 11 is either clearly not autistic or needs to be examined further.

The preliminary evaluation unit 24a can likewise be part of the speech analysis module 21, see FIG. It should be noted that the intermediate diagnosis 33 by the preliminary evaluation unit 24a is not absolutely necessary. Rather, it can be provided that each test person 11 carries out all analyses offered by the diagnostic tool.

The preliminary evaluation unit 24a is an algorithm that compares the characteristic values with a multidimensional plane, also called a hyperplane, which, figuratively speaking, forms an interface between test persons with and test persons without autism in a multidimensional data space. The algorithm can be a machine learning algorithm, preferably a support vector machine (SVM). Such algorithms are generally known, for example from Boser, Bernhard E.; Guyon, Isabelle M.; Vapnik, Vladimir N. (1992). "A training algorithm for optimal margin classifiers". Proceedings of the fifth annual workshop on Computational learning theory - COLT '92. p. 144, or Fradkin, Dmitriy; Muchnik, Ilya (2006). "Support Vector Machines for Classification". In Abello, J.; Cormode, G. (eds.). Discrete Methods in Epidemiology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 70. pp. 13-20. This is a model that has been trained with data sets of vocal biomarkers from a large number of reference persons with and without autism, so that the model can assign the determined characteristic values 28 of the vocal biomarker of the test person 11 with high accuracy either to a person with autism or to a person without autism, the assignment accuracy for test persons 11 without autism being more than 95%.
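The comparison with a hyperplane can be illustrated with a minimal sketch of a linear decision function. It assumes a linear separating hyperplane given by a weight vector and a bias; the weights, the bias, the `margin` threshold and both function names are hypothetical placeholders for illustration and are not trained values or part of the source.

```python
def decision_value(features, weights, bias):
    """Signed score w.x + b of a feature vector relative to the hyperplane."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have equal length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def intermediate_diagnosis(features, weights, bias, margin=1.0):
    """Assumption: negative scores lie on the 'without autism' side.

    Only scores at least `margin` away from the hyperplane on that side are
    reported as clearly not autistic; everything else is examined further.
    """
    score = decision_value(features, weights, bias)
    return "not autistic" if score <= -margin else "examine further"


# Toy two-dimensional example with made-up weights (not trained values).
toy_weights, toy_bias = [1.0, -1.0], 0.0
clear_case = intermediate_diagnosis([0.1, 2.0], toy_weights, toy_bias)
open_case = intermediate_diagnosis([0.6, 0.5], toy_weights, toy_bias)
print(clear_case, "|", open_case)
```

In a real model the weights and bias would result from training on the reference data sets; for the vocal biomarker alone the feature vector would hold the n histogram values.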

If the result of the intermediate diagnosis 33 is that the test person 11 is not clearly not autistic, or if an intermediate diagnosis 33 is not made, the analysis of the vocal biomarker is followed by the analysis of a further biomarker, either in the form of the reaction time of the test person 11 to an emotional stimulus, or in the form of the viewing direction of the test person 11, with both of the other biomarkers mentioned preferably being analyzed and with a specific sequence not being important.

According to one embodiment variant, the operating software 20 activates the emotion analysis module 22 after the speech analysis module 21, see FIG. The emotion analysis module 22 includes an emotion trigger control 22a, an emotion observation unit 22b and a reaction evaluation unit 22c, see Figure 4. The emotion analysis module 22 measures the reaction time of the test person 11 to an emotional stimulus, which is triggered in the test person 11 by the display of selected image data 18 in the form of individual images or individual videos on the image display device 7, where the measurement is performed using facial recognition software and compassionate AI capable of recognizing certain emotions in a face. This artificial intelligence is preferably a so-called "deep learning model" that has been trained with representative data sets on the emotions to be stimulated.

For this purpose, the emotion analysis module 22 starts the emotion trigger control 22a in a first step. This is set up to load a set 18a, 18b of image data 18 from the memory 4 and to display it on the image display device 7 or to have it displayed. As with the speech analysis module 21, a set 18a, 18b can be selected from a plurality of sets depending on the aforementioned person-specific data, so that children or girls or persons of a first ethnic origin are shown a first set 18a of the image data 18, and adults or boys or persons of a second ethnic origin are shown a second set 18b of the image data 18. This image data 18 is a number of individual images or individual videos that are displayed one after the other on the image display device 7. Their content is chosen in such a way that it triggers an emotional reaction in the test person in the form of joy, cheerfulness, sadness, fear or anger.

The set of image data 18 suitably comprises a total of 6 individual images and/or individual videos, which stimulate an equal number of positive emotions such as joy or cheerfulness and negative emotions such as sadness, fear or anger.

At the same time as or shortly before the playback of the first individual image or video, the emotion trigger control 22a activates the emotion observation unit 22b, which in turn activates the image recording device 6 in order to capture the face of the test person 11 or their facial expression and, if necessary, to record it at least temporarily. In one embodiment variant, the emotion observation unit 22b can be set up to store the captured face in a video recording and to analyze it "offline", i.e. after the entire set 18a, 18b of individual images or videos has been shown. Alternatively, the face captured by the image recording device 6 can be evaluated in real time, so that no video recording has to be saved. A video recording 29 is shown in FIG. 8, which represents the output signal of the image recording device 6 and can be either a stored video recording or a real-time recording, which is fed to the emotion observation unit 22b as a signal.

The emotion trigger control 22a can set a start time marker t1, t2, t3, t4 with each playback of a new individual image or video, which later serves as a reference. FIG. 16 illustrates this using four individual videos 18a1, 18a2, 18a3, 18a4 of the first set 18a of the image data 18, which are shown one after the other. The facial recognition software with compassionate artificial intelligence mentioned above is part of the emotion observation unit 22b, which evaluates the video recording 29 to determine when the facial features of the test person 11 change to an extent that clearly indicates an emotional reaction, in particular one associated with a specific expected emotion. In each of these recognized cases, the emotion observation unit 22b sets a reaction time marker E1, E2, E4. From the difference between the respective reaction time marker E1, E2, E3, E4 and the corresponding previously set start time marker t1, t2, t3, t4 as a reference, the reaction time R1, R2, R4 is then determined for each of the stimulated emotions (Ri = Ei - ti, with i = 1, 2, 4). This takes place in the reaction evaluation unit 22c. In the example shown in FIG. 16, it is assumed that the test person 11 shows no or insufficient emotion in the third individual video 18a3, so that no reaction time marker could be set here.
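The relation Ri = Ei - ti can be written out as a short sketch. The marker values below are invented purely for illustration, and representing a missing reaction marker as `None` is an assumption; the function name `reaction_times` is hypothetical.

```python
def reaction_times(start_marks, reaction_marks):
    """Compute Ri = Ei - ti for each stimulus.

    A reaction marker of None means no emotional reaction was detected
    for that stimulus, so no reaction time can be determined.
    """
    return [
        None if e is None else e - t
        for t, e in zip(start_marks, reaction_marks)
    ]


# Invented example values in seconds; the third stimulus triggers no reaction.
t = [0.0, 10.0, 20.0, 30.0]   # start time markers t1..t4
E = [1.5, 12.0, None, 33.5]   # reaction time markers E1, E2, (none), E4
R = reaction_times(t, E)
print(R)  # [1.5, 2.0, None, 3.5]
```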

The individual images or individual videos 18a1, 18a2, 18a3, 18a4 can be played back by the emotion trigger control 22a for a specific, specified duration, with the individual durations being able to be the same or different. Thus, the next individual image or video is shown when the duration of the previous one has expired. Alternatively or additionally, the next individual image or individual video can be shown as soon as or shortly after the emotion observation unit 22b has recognized an emotion. In this case, the emotion observation unit 22b gives feedback to the emotion trigger control 22a to show the next individual image or individual video.

However, since there is a risk that a specific emotion will not be triggered in the test person 11 at all, ending the playback of an individual image or individual video after the corresponding duration forms a necessary fallback position. In this case, the non-stimulation of an emotion is also noted by the reaction evaluation unit 22c, e.g. with the value zero.

In an embodiment variant it can be provided that the emotion trigger control 22a starts a timer instead of setting the start time markers, with the emotion observation unit 22b stopping the timer again when an emotion is recognized instead of setting the reaction time markers. The timer reading is then read out by the reaction evaluation unit 22c and stored, since it represents the respective reaction time to the respective stimulated emotion. If a certain emotion is not triggered in the test person 11, the timer can be reset when the playback of the next individual image or video begins or when the playback time of the last individual image or video ends. This case of non-stimulation of an emotion is also noted by the reaction evaluation unit 22c.

In addition to determining an emotional reaction, the emotion observation unit 22b can be set up to determine whether a positive or negative emotion was stimulated in the test person 11. This determination, referred to below as the type of reaction, can be represented in the form of binary information +1 or -1 and linked to the corresponding reaction time R1, R2, R4. It serves as a plausibility check or enables the congruence of the emotional reaction with the stimulated emotion to be determined.

It can thus additionally be provided to use the type of reaction to determine whether the test person 11 shows the reaction to the stimulation that is to be expected. This is illustrated in FIG. 16 using a congruence indicator K1, K2, K3, K4. This is the result of a comparison as to whether the type of reaction determined corresponds to the emotion stimulated. This comparison can also be carried out by the reaction evaluation unit 22c. If the type of reaction and the emotion stimulated are both positive or both negative, there is congruence or agreement. The congruence indicator can show this case with the value 1. Referring to FIG. 16, the test person 11 reacts as expected to the emotions stimulated by the first two individual images or individual videos 18a1, 18a2, so that the first and second congruence indicators K1, K2 each have the value 1. If the type of reaction and the emotion stimulated are different, there is a lack of congruence or agreement. The congruence indicator can show this case with the value 0 or -1. In FIG. 16, the value -1 has been chosen so that the congruence indicator value 0 can be used to indicate that there has been no reaction. This is the case with the third individual image or individual video 18a3, for which the third congruence indicator K3 = 0. In the example in FIG. 16, the test person 11 reacts unexpectedly to the emotion stimulated by the fourth individual image or individual video 18a4. Here the type of emotional reaction does not match the stimulated emotion, so that the fourth congruence indicator K4 has the value -1. As an alternative to the variant described above with individual images and/or individual videos, it can be provided that the emotion trigger control 22a shows a single video on the image display device 7 which contains the individual emotional stimuli at specific times known to the emotion trigger control 22a. Setting start time markers is then not necessary.
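The rule for the congruence indicator can be sketched as follows. The encoding (+1 positive, -1 negative, `None` for no reaction) follows the description; the polarities assigned to the four stimuli in the usage example are assumptions chosen only so that the result matches the values K1 = 1, K2 = 1, K3 = 0, K4 = -1 named for FIG. 16, and the function name is hypothetical.

```python
def congruence_indicator(stimulated, reaction_type):
    """Return +1 (congruent), -1 (not congruent) or 0 (no reaction).

    stimulated:    +1 for a positive stimulated emotion, -1 for a negative one
    reaction_type: +1 / -1 for the observed reaction, None if none was observed
    """
    if reaction_type is None:
        return 0
    return 1 if reaction_type == stimulated else -1


# Assumed polarities of the four stimuli and of the observed reactions.
stimulated = [+1, -1, +1, -1]
observed = [+1, -1, None, +1]
K = [congruence_indicator(s, r) for s, r in zip(stimulated, observed)]
print(K)  # [1, 1, 0, -1]
```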

The emotion analysis module 22 thus supplies a reaction time Ri, a reaction type (positive or negative emotion, +1, -1) and a congruence indicator (values -1, 0, +1) for a number of emotions stimulated in the test person 11, which in their entirety form the characteristic values 30 of the second biomarker, referred to as "emotional response biomarker" in FIG. A table of these characteristics is shown below:

After the characteristic values 30 of the second biomarker have been determined by the emotion analysis module 22, the operating software 20 activates the gaze direction analysis module 23, which is responsible for determining characteristic values 32 of a third biomarker of the test person 11, see Figure 7. This can be done automatically or in response to a corresponding input from the test person 11, which the diagnostic tool awaits. The gaze direction analysis module 23 includes a line of sight guide 23a and a line of sight observation unit 23b, see Figure 5. The gaze direction analysis module 23 measures and tracks the line of sight of the test person 11 while he or she is looking at the image display device 7. The image recording device 6 is expediently arranged relative to the image display device 7 in such a way that it captures the face of the test person 11. For this purpose, the gaze direction analysis module 23 starts the line of sight guide 23a in a first step. This is set up to load at least one individual image 19a or video of the image data 16, 18, 19 from the memory 4 and to display it on the image display device 7 or to have it displayed. As with the speech analysis module 21 and the emotion analysis module 22, the at least one individual image 19a or video can be selected depending on the aforementioned person-specific data.

In order to steer the gaze of the test person 11, different embodiment variants are conceivable. In the case of an individual image 19a, this is reproduced on the image display device 7 in a smaller size than its total area or resolution allows, so that the individual image 19a only fills part of the display screen of the image display device 7. It is then displayed in this form by the line of sight guide 23a in chronological succession at different positions on the display screen, the individual image 19a being able to appear discretely in succession at these positions or to be moved continuously from position to position along a continuous path. This variant is illustrated in FIG.

If they appear discretely, instead of a single frame, two or more different frames of the image data 16, 18, 19 can be loaded from the memory 4 and displayed alternately or randomly on different screen positions of the image display device 7.

It is also possible to use one or more videos instead of the single image or images, which are played back one after the other in the individual positions. The playback size of this video or these videos is therefore also smaller than the total area or resolution of the image display device 7.

The subject 11 is required to carefully follow the reproduction location of the frame or frames. For this purpose, the diagnostic tool can issue a corresponding request in advance on the image display device 7 or via a loudspeaker. Instead of the one or more individual images, the line of sight guide 23a can show a video on the image display device 7, specifically over the entire surface, which is designed to direct the gaze of the test person 11 via the image display device 7 along a specific path. For this purpose, the video can contain, for example, an object moving relative to a stationary background, such as a clown fish moving in an aquarium. Alternatively, events that attract the test person's attention can occur one after the other at different spatial points in the video. In these cases, the line of sight guide 23a consequently only needs this one video.

At the same time as or shortly before the playback of the individual image or video, the line of sight guide 23a activates the line of sight observation unit 23b, which in turn activates the image recording device 6 in order to capture the face of the test person 11 or their line of sight and, if necessary, to record it at least temporarily. In one embodiment variant, the gaze direction analysis module 23 can be set up to store the captured face in a video recording and to analyze it "offline", i.e. after the at least one individual image or video has been shown. However, a real-time evaluation of the viewing direction of the face captured by the image recording device 6 is preferably carried out, so that no video recording has to be stored permanently. A video recording 31 is shown in FIG. 8, which represents the output signal of the image recording device 6 and can be either a stored video recording or a real-time recording, which is fed to the line of sight observation unit 23b as a signal.

The line of sight observation unit 23b is formed by eye-tracking software based on artificial intelligence. Such software is well known, e.g. from Krafka K, Khosla A, Kellnhofer P, Kannan H., "Eye Tracking for Everyone", IEEE Conference on Computer Vision and Pattern Recognition. 2016; 2176-2184. It determines the viewing direction of the test person 11 in the form of x, y coordinates of the focus of the eye at any point in time and stores them, so that a viewing direction path 35 results over time, as shown in FIG. This path represents the characteristic values 32 of the third biomarker, also referred to below as the viewing direction biomarker, see also FIG.
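How the stored x, y coordinates accumulate into a viewing direction path can be sketched as follows. The eye-tracking model itself is not reproduced here: `estimate_gaze` is a hypothetical stand-in for it, assumed to return an (x, y) tuple per camera frame or `None` when no gaze could be determined, and the function name `record_gaze_path` is likewise an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple

GazeSample = Tuple[float, float, float]  # (timestamp in s, x, y)


def record_gaze_path(
    frames: Iterable[object],
    timestamps: Iterable[float],
    estimate_gaze: Callable[[object], Optional[Tuple[float, float]]],
) -> List[GazeSample]:
    """Store one (t, x, y) sample per frame in which a gaze point was found."""
    path: List[GazeSample] = []
    for ts, frame in zip(timestamps, frames):
        point = estimate_gaze(frame)
        if point is None:
            continue  # e.g. eyes closed or face not visible in this frame
        x, y = point
        path.append((ts, x, y))
    return path


# Usage with a stub estimator: each "frame" already carries its gaze point.
stub_frames = [(0.1, 0.2), None, (0.4, 0.5)]
gaze_path = record_gaze_path(stub_frames, [0.0, 0.04, 0.08], lambda f: f)
print(gaze_path)  # [(0.0, 0.1, 0.2), (0.08, 0.4, 0.5)]
```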

As FIG. 8 shows, the characteristic values 28, 30, 32 of the three biomarkers, more precisely the vocal biomarker, the emotional response biomarker and the viewing direction biomarker, are supplied to an overall result evaluation unit 24, which is part of the diagnostic tool according to the invention and which evaluates the characteristic values 28, 30, 32 of the biomarkers in combination. Like the preliminary evaluation unit 24a, the overall result evaluation unit 24 is an artificial intelligence-based algorithm and is in the form of a model that has been trained with data sets of the three biomarkers of a large number of reference persons with and without autism. Strictly speaking, the algorithm is a classification algorithm that classifies the biomarkers of the test person as "autistic" or "non-autistic" with a certain degree of probability. The algorithm can be a machine learning algorithm, preferably a support vector machine (SVM). It compares the entirety of all characteristic values 28, 30, 32 of the three biomarkers simultaneously with a hyperplane forming an interface between test persons with and test persons without autism in a multidimensional data space, in order to assign the entirety of the data formed by the characteristic values either to a reference group of people with autism or to a reference group of people without autism. Depending on this assignment result, the diagnosis 34 is that the test person 11 is autistic or non-autistic with a certain probability.

Through the supporting use of the diagnostic tool according to the invention in the diagnosis of autism, it can be determined with an accuracy of more than 95% whether a test person 11 suffers from autism. The evaluation of the biomarkers leads to a robust and, above all, objective result. Given the large number of individuals potentially suffering from autism who are waiting to be diagnosed by a medical expert, using the diagnostic tool can help reduce the diagnostic backlog and facilitate the decision as to which patients should be given priority for diagnosis by the medical expert.

A particular advantage of the diagnostic tool is that both adults and children can be examined with it and the diagnostic tool can be used from almost anywhere and at any time, especially from home.

As explained in relation to Figure 1, the software-based diagnostic tool is part of a diagnostic system 1. In a first embodiment variant, this can be a computer system 2 with peripheral devices connected to it, in particular a microphone 5, a camera 6, a display/monitor 7 and an input device 8. The computer system 2 itself can be a personal computer with a non-volatile memory 4 in which the diagnostic tool consisting of the aforementioned software components or modules and data is stored.

In a second embodiment variant, which is shown in FIG. 2, the computer system 2 can act as a server that can be reached via the Internet 9 with an external, in particular mobile, device 12. In this case, the peripheral devices 5, 6, 7, 8 are part of the external device, which is a smartphone or a tablet, for example. In this case, the diagnostic tool is still formed by the software components or modules and data stored in the memory 4 of the computer system 2.

In a third embodiment variant, which is illustrated in FIG. 6, the diagnostic tool can be arranged in a distributed manner; more precisely, it can be embodied partly in the computer system 2 and partly in the external device 12. This embodiment variant implements an offline analysis of the biomarkers. A non-volatile memory 4' and a processor (not shown here) can thus be present in the external device 12. The non-volatile memory 4' stores the image data 16, 18, 19 and the text data 17 on the one hand, as well as a part 20' of the operating software and those components 21a, 21b, 22a, 23a of the analysis modules 21, 22, 23 that do not require high computing power and do not place any special demands on the processor, e.g. a multi-core processor. Of the speech analysis module 21, the speech signal trigger control 21a and the speech recording unit 21b are stored in the memory 4'. They perform the same process as previously explained, the difference being that the audio recording 27 is stored in the audio data memory 13a and not analyzed on the external device 12. In addition, the emotion trigger control 22a of the emotion analysis module 22 and the line of sight guide 23a of the gaze direction analysis module 23 are stored in the memory 4'. These also each carry out the same method as explained above, with one difference being that during the respective playback of the images or the image, a video recording 29, 31 takes place, which is stored in the video data memory 13b and not analyzed on the external device 12. For recording the video recordings 29, 31, a video recording unit 25 is also present in the memory 4', analogous to the speech recording unit 21b.

In contrast, the computer system 2 contains in its memory 4, in addition to a second part 20" of the operating software, only those components of the analysis modules 21, 22, 23 that perform the actual analysis of the biomarkers, namely the speech signal analyzer 21c of the speech analysis module 21, the emotion observation unit 22b and the reaction evaluation unit 22c of the emotion analysis module 22 and the line of sight observation unit 23b of the gaze direction analysis module 23. Finally, the overall result evaluation unit 24 is also present in the memory 4. Furthermore, an audio data memory 13a and a video data memory 13b are also provided in the memory 4 of the computer system 2, into which the audio and video recordings 27, 29, 31 stored on the external device 12 are transferred. This can be done immediately after the corresponding recording has been saved or only after all recordings have been made. The evaluation of the individual biomarkers and the joint assessment of their characteristic values then continue to take place on the computer system 2.

In a fourth embodiment variant, not shown, it can be provided in a development of the third variant that the analyzing components 21c, 22b, 22c, 23b of the analysis modules 21, 22, 23 are also arranged in the external device 12, so that the determination of the characteristic values 28, 30, 32 of the biomarkers also takes place on the external device 12. As a result, only these characteristic values 28, 30, 32 are then transmitted to the computer system 2, where they are evaluated jointly by the overall result evaluation unit 24. This is advantageous for reasons of data protection because the characteristic values of the biomarkers do not allow the test person to be identified.
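The idea that only characteristic values leave the external device can be sketched as a small serialization step. The payload layout, the key names and the function name `build_payload` are hypothetical and chosen for illustration only; the source does not specify a transmission format.

```python
import json


def build_payload(vocal_values, emotion_values, gaze_values):
    """Serialize only the characteristic values 28, 30, 32 for transmission.

    No audio or video recording is included; the key names are assumptions.
    """
    return json.dumps(
        {
            "characteristic_values_28": vocal_values,    # pitch histogram in %
            "characteristic_values_30": emotion_values,  # reaction time, type, congruence
            "characteristic_values_32": gaze_values,     # viewing direction path (t, x, y)
        },
        sort_keys=True,
    )


payload = build_payload(
    [25.0, 75.0],
    [{"reaction_time": 1.5, "type": 1, "congruence": 1}],
    [[0.0, 0.1, 0.2]],
)
decoded = json.loads(payload)
print(sorted(decoded))
```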

In a fifth embodiment variant it is provided that the diagnostic tool is arranged entirely in the external device 12, so that the diagnostic system 1 is formed only from this external device 12 with the peripheral devices 5, 6, 7, 8 already integrated therein and the diagnostic tool stored thereon. The diagnostic tool can be implemented in an application, called app for short, and executed on a corresponding processor of the external device. Preferably the external device is a smartphone or tablet.

List of reference symbols

1 Diagnostic system
2 Computer system/server
3 Processing unit/processor
4, 4' Non-volatile memory
5 Voice input device/microphone
6 Image recording device/camera
7 Image display device/display
8 Input means/control buttons
9 Network/Internet
10 Interface
11 Test person
12 External, possibly mobile terminal device, tablet/smartphone/laptop
13a Audio data memory for recorded speech elements
13b Video data memory for recorded video recordings
14 Data
15 Program code/software
16 First image data, for triggering speech
16a First set of individual images or videos to be reproduced
16b Second set of individual images or videos to be reproduced
17 Text data of texts to be reproduced
17a First set of text data
17b Second set of text data
18 Second image data, for triggering emotions
18a First set of individual images or videos to be reproduced
18b Second set of individual images or videos to be reproduced
18a1, 18a2, 18a3, 18a4 Individual videos
19 Third image data, for viewing direction guidance
19a Individual image
20, 20', 20" Operating software
21 Speech analysis module
21a Speech signal trigger control
21b Speech recording unit
21c Speech signal analyzer
22 Emotion analysis module
22a Emotion trigger control
22b Emotion observation unit
22c Reaction evaluation unit
23 Gaze direction analysis module
23a Line of sight guide
23b Line of sight observation unit/eye tracking
24 Overall result evaluation unit
24a Preliminary evaluation unit
25 Video recording unit
26 Speech signal
27 Audio recording
28 Characteristic values of the vocal biomarker
29 First video recording
30 Characteristic values of the emotional response biomarker
31 Second video recording
32 Characteristic values of the viewing direction biomarker
33 Assessment result/intermediate diagnosis
34 Evaluation result/diagnosis
35 Viewing direction path

Claims

1. A software-based diagnostic tool for use in diagnosing a chronic neurological disorder in a human test person (11) using artificial intelligence, characterized by

- a higher-level operating software (20, 20', 20"),

- a speech analysis module (21) for determining characteristic values (28) of a first, specifically vocal, biomarker of a speech signal (26) of the test person (11),

- at least one further module (22, 23) for determining characteristic values (30, 32) of a second biomarker, and

- an overall result evaluation unit (25) connected downstream of the speech analysis module (21) and the further module (22, 23), wherein

- the operating software (20, 20', 20") is set up to trigger the speech analysis module (21) and the at least one further module (22, 23) one after the other and to feed their determined characteristic values (28, 30, 32) to the overall result evaluation unit (25),

- the speech analysis module (21) comprises the following: a speech signal trigger control (21a), which is set up to display a set (16a, 16b) of individual images and/or individual videos (16) or a text (17a, 17b) on an image display device (7) for the test person (11) in order to trigger at least one speech signal (26) in the test person (11) in the form of a naming of an object contained in the respective individual image or individual video (16) or in the form of a reading aloud of the text (17a, 17b); a speech recording unit (21b), which is set up to record the speech signal (26) in an audio recording (27) with the aid of a voice input device (5); and a speech signal analyzer (21c), which is set up to first evaluate the speech signal (26) in the audio recording (27) as to which pitch occurs at which point in time, and then to determine a frequency distribution of the pitches over a number of frequency bands of a frequency spectrum, this frequency distribution forming the characteristic values (28) of the first biomarker, and

- the overall result evaluation unit (25) is set up to determine, based on the characteristic values (28, 30, 32) of the biomarkers of the test person (11) using a machine learning algorithm based on artificial intelligence by comparison with a multidimensional interface, whether the test person (11) has the chronic neurological disorder.

2. Diagnostic tool according to claim 1, characterized in that the further module (22, 23) is an emotion analysis module (22) for evaluating the reaction of the test person (11) to an emotional stimulus as a second biomarker, and the emotion analysis module (22) comprises at least the following: an emotion trigger control (22a), which is set up to display a set (18a, 18b) of individual images and/or individual videos (18) or at least one individual video on the image display device (7) in order to stimulate a number of individual emotions in the test person (11), and an emotion observation unit (22b), which is set up to evaluate a recording of the face of the test person (11), obtained with the aid of an image recording device (6), at least to determine when the test person shows an emotional reaction, wherein the emotion analysis module (22) is set up to determine at least the respective reaction time (R1, R2, R4) between the stimulation of the respective emotion and the occurrence of the emotional reaction, and at least these reaction times form the characteristic values (30) of the second biomarker.

3. Diagnostic tool according to claim 1, characterized in that the further module (22, 23) is a viewing direction analysis module (23) for evaluating the viewing direction of the test person (11) as a second biomarker, and the viewing direction analysis module (23) comprises at least the following: a line of sight guide (23a), which is set up to display at least one image (19) or video on the image display device (7) in order to guide the line of sight of the test person (11), and a line of sight observation unit (23b), which is set up to determine the line of sight of the test person (11) over time from a recording of the face of the test person (11) obtained using an image recording device (6), this line of sight over time forming the characteristic values (32) of the second biomarker.

4. Diagnostic tool according to claims 2 and 3, characterized in that the emotion analysis module (22) is a first further module (22, 23) and the line of sight analysis module (23) is a second further module (22, 23), and at least the reaction times (R1, R2, R4) to the emotion stimuli form characteristic values of the second biomarker and the line of sight over time forms characteristic values of a third biomarker of the test person (11), the overall result evaluation unit (25) being set up to determine, on the basis of the characteristic values (28, 30, 32) of the first, second and third biomarkers of the test person (11), using the machine learning algorithm based on artificial intelligence and by comparison with a multidimensional interface, whether the test person (11) has the chronic neurological disorder.

5. Diagnostic tool according to one of the preceding claims, characterized in that the learning algorithm is a Support Vector Machine (SVM), a Random Forest or a Deep Convolutional Neural Network algorithm, wherein the learning algorithm has been trained with a number of first and second comparison data sets of characteristic values of the biomarkers, wherein the first comparison data sets are associated with a group of reference persons who have the chronic neurological disorder, and the second comparison data sets are associated with a group of reference persons who do not have the chronic neurological disorder.
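
The training on two groups of comparison data sets described in claim 5 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the "multidimensional interface" is read as the decision boundary of a linear SVM, trained here by plain subgradient descent on the hinge loss. The function names, hyperparameters and the two-dimensional toy feature vectors are hypothetical and do not come from the claims.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples: Sequence[Vector], labels: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Fit w, b by full-batch subgradient descent on the L2-regularised hinge loss.

    labels: +1 = reference person with the disorder, -1 = without it.
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]      # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1.0:    # sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w: Vector, b: float, features: Vector) -> int:
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if dot(w, features) + b >= 0.0 else -1


# Made-up comparison data sets (NOT real biomarker values).
first_group = [[2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]         # with the disorder
second_group = [[-2.0, -1.5], [-1.5, -2.0], [-2.5, -2.5]]  # without it
X = first_group + second_group
y = [1] * len(first_group) + [-1] * len(second_group)
w, b = train_linear_svm(X, y)
```

In practice the feature vector would concatenate the characteristic values (28, 30, 32) of all biomarkers, and a Random Forest or a convolutional network could take the place of the linear model.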

6. Diagnostic tool according to one of the preceding claims, characterized in that it is set up to select and display the set (16a, 16b) of individual images and/or individual videos (16) or the text (17a, 17b) for triggering the speech signal, and/or the set (18a, 18b) of individual images and/or individual videos (18) or the at least one video for the emotion stimulation, and/or the at least one image (19) or video for the line of sight guidance, depending on personal data of the test person (11), in particular in that the speech signal trigger control (21a) is set up to select and display either the set (16a, 16b) of individual images and/or individual videos (16) or the text (17a, 17b) depending on the age of the test person (11).

7. Diagnostic tool according to one of the preceding claims, characterized by a bandpass filter which is set up to restrict the pitch spectrum under consideration to the range between 30 and 600 Hz.

8. Diagnostic tool according to one of the preceding claims, characterized in that the number of frequency bands is between 6 and 18, preferably 12.
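
The frequency distribution of claims 7 and 8 can be sketched as a normalised histogram over the pitch track. This is only an illustration under stated assumptions: the bands are taken to be of equal width (the claims do not say whether the spacing is linear or logarithmic), the band-pass restriction is approximated by discarding pitch estimates outside 30 to 600 Hz, and the function name is hypothetical.

```python
from typing import List, Sequence


def pitch_band_distribution(pitches_hz: Sequence[float],
                            low_hz: float = 30.0,
                            high_hz: float = 600.0,
                            n_bands: int = 12) -> List[float]:
    """Return the share of pitch estimates falling into each frequency band.

    pitches_hz: one pitch estimate per analysis frame (for example from a
    pitch tracker); values outside [low_hz, high_hz] are ignored.
    """
    width = (high_hz - low_hz) / n_bands      # equal-width bands (assumption)
    counts = [0] * n_bands
    for f in pitches_hz:
        if f < low_hz or f > high_hz:
            continue                          # outside the considered spectrum
        index = min(int((f - low_hz) / width), n_bands - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bands                # no voiced frames in range
    return [c / total for c in counts]


# Made-up pitch track in Hz; 700.0 and 20.0 fall outside the range.
distribution = pitch_band_distribution([100.0, 110.0, 230.0, 700.0, 20.0])
```

The resulting list of twelve fractions would play the role of the characteristic values (28) of the vocal biomarker.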

9. Diagnostic tool according to one of the preceding claims, characterized in that the speech signal analyzer (21c) includes a deep convolutional neural network algorithm, in particular CREPE, or a PRAAT algorithm, in order to estimate the pitches.

10. Diagnostic tool at least according to claim 2, 3 or 4, characterized in that the emotion observation unit (21b) and/or the viewing direction observation unit (23b) are set up to evaluate the facial image in real time.

11. Diagnostic tool at least according to claim 2 or 4, characterized in that the emotion observation unit (21b) comprises facial recognition software based on compassionate artificial intelligence that is trained to recognize certain emotions.

12. Diagnostic tool at least according to claim 2 or 4, characterized in that the emotion observation unit (21b) is set up to determine the reaction type to the respective stimulated emotion in addition to the reaction time, and that this reaction type is part of the characteristic values of the second biomarker.
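
The reaction time and reaction type of claims 2 and 12 can be sketched as follows. The sketch assumes that the emotion observation unit delivers time-stamped detections of facial emotions and that the first detection after a stimulus onset, and before the next stimulus, counts as the reaction; the data layout and the function name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Stimulus = Tuple[str, float]    # (stimulated emotion, onset time in seconds)
Detection = Tuple[float, str]   # (time in seconds, detected facial emotion)
Result = Tuple[str, Optional[float], Optional[str]]


def reaction_times(stimuli: Sequence[Stimulus],
                   detections: Sequence[Detection]) -> List[Result]:
    """For each stimulus return (emotion, reaction time, reaction type).

    Reaction time and type are None if no reaction was detected before
    the next stimulus started. Both inputs must be sorted by time.
    """
    results: List[Result] = []
    for i, (emotion, onset) in enumerate(stimuli):
        # The observation window ends when the next stimulus begins.
        end = stimuli[i + 1][1] if i + 1 < len(stimuli) else float("inf")
        first = next(((t, kind) for t, kind in detections
                      if onset <= t < end), None)
        if first is None:
            results.append((emotion, None, None))
        else:
            results.append((emotion, first[0] - onset, first[1]))
    return results


# Made-up example: three stimulated emotions, two detected reactions.
stimuli = [("joy", 0.0), ("fear", 5.0), ("anger", 10.0)]
detections = [(1.25, "joy"), (6.5, "surprise")]
values = reaction_times(stimuli, detections)
```

A missing reaction is kept as `None` rather than dropped, so that the characteristic values (30) keep one entry per stimulated emotion.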

13. Diagnostic tool at least according to claim 2 or 4, characterized in that the emotion triggering control (22a) is set up to stimulate between 4 and 12 emotions, preferably 6 emotions.

14. Diagnostic tool at least according to claim 3 or 4, characterized in that the line of sight guide (23a) is set up to display the at least one image (19) or video at discrete positions of the image display device (7) one after the other or to move it along a continuous path.

15. Diagnostic tool at least according to claim 3 or 4, characterized in that the line of sight observation unit (23b) comprises eye-tracking software.

16. Use of the diagnostic tool according to any one of claims 1 to 15 as a software application for a portable communication terminal, in particular a smartphone (12) or tablet.

17. Use of the diagnostic tool according to one of claims 1 to 15 as a software application on a server which can be controlled via a computer network (9) by a browser on an external terminal (12) in order to run the diagnostic tool.

18. Diagnostic system for use in the diagnosis of a chronic neurological disorder in a human subject (11) using artificial intelligence, characterized by a diagnostic tool according to any one of claims 1 to 15, at least one non-volatile memory (4, 4') with the program code (15) and data (14) forming the diagnostic tool, a processing unit (2) for executing the program code (15) and processing the data (14) of the diagnostic tool and the following peripheral devices (5, 6, 7, 8):

- a voice input device (5) for recording at least one voice signal (26) of the test person (11) for the diagnostic tool,

- an image recording device (6) for recording the face of the test person (11) for the diagnostic tool,

- an image display device (7) for displaying image data for the test person (11) and

- at least one input means (8) for making inputs by the test person (11), the peripheral devices (5, 6, 7, 8) being operatively connected to the processing unit (2) and the diagnostic tool being set up to control the voice input device (5), the image recording device (6) and the image display device (7) at least indirectly and to evaluate the recordings (27, 29, 31) from the voice input device (5) and the image recording device (6).

19. Diagnostic system according to claim 18, characterized in that it is a portable communication terminal, in particular a smartphone (12) or tablet.

20. Diagnostic system according to claim 18, characterized in that the processing unit (2) is part of a server which is connected to a computer network (9) and can be controlled via a browser, and the non-volatile memory (4) is connected to the server, and that the peripheral devices are part of an external terminal (12), in particular a portable communication terminal.

21. Diagnostic system according to claim 20, characterized in that the external terminal (12) has a further non-volatile memory (4'), the diagnostic tool being stored partially on the server-side memory (4) and partially on the terminal-side memory (4').

22. A method of operating a software-based diagnostic tool for use in diagnosing a chronic neurological disorder in a human subject (11) using artificial intelligence, comprising

- a higher-level operating software (20, 20', 20"),

- the operating software (20, 20', 20") triggers the speech analysis module (21) and the at least one further module (22, 23) one after the other and feeds their determined characteristic values (28, 30, 32) to the overall result evaluation unit (25),

- a speech signal trigger control (21a) of the speech analysis module (21) displays a set (16a, 16b) of individual images and/or individual videos (16) or a text (17a, 17b) on an image display device (7) for the test person (11) in order to trigger at least one speech signal (26) in the test person (11) in the form of a naming of an object contained in the respective individual image or individual video (16) or in the form of a reading of the text (17a, 17b),

- a voice recording unit (21b) of the speech analysis module (21) records the speech signal (26) using a voice input device (5) in an audio recording (27), and

- a speech signal analyzer (21c) of the speech analysis module (21) first evaluates the speech signal (26) in the audio recording (27) as to which pitch occurs at which point in time, and then determines a frequency distribution of the pitches over a number of frequency bands of a frequency spectrum under consideration, wherein this frequency distribution forms the characteristic values (28) of the first biomarker, and

- the overall result evaluation unit (25) determines, based on the characteristic values (28, 30, 32) of the biomarkers of the subject (11), using a machine learning algorithm based on artificial intelligence and by comparison with a multidimensional interface, whether the subject (11) has the chronic neurological disorder.

23. The method according to claim 22, characterized in that the further module (22, 23) is an emotion analysis module (22) for evaluating the reaction of the test person (11) to an emotional stimulus as a second biomarker, and the emotion analysis module (22) carries out the following steps:

- an emotion triggering control (22a) of the emotion analysis module (22) displays a set (18a, 18b) of individual images and/or individual videos (18) or at least one individual video on the image display device (7) in order to stimulate a number of individual emotions in the test subject (11), and

- An emotion observation unit (21b) of the emotion analysis module (22) evaluates a recording (29) of the face of the test person (11) obtained with the aid of an image recording device (6) at least to determine when it shows an emotional reaction, and

- the emotion analysis module (22) determines at least the respective reaction time (R1, R2, R4) between the stimulation of the respective emotion and the occurrence of the emotional reaction, with at least these reaction times forming the characteristic values (30) of the second biomarker.

24. The method according to claim 22, characterized in that the further module (22, 23) is a viewing direction analysis module (23) for evaluating the viewing direction of the test person (11) as a second biomarker, and the viewing direction analysis module (23) carries out the following steps:

- a line of sight guide (23a) of the line of sight analysis module (23) displays at least one image (19) or video on the image display device (7) in order to guide the line of sight of the subject (11), and

- a viewing direction monitoring unit (23b) of the viewing direction analysis module (23) determines the viewing direction over time from a recording of the face of the subject (11) obtained with the aid of an image recording device (6), this viewing direction over time forming the characteristic values (32) of the second biomarker.

25. The method according to claims 23 and 24, characterized in that the emotion analysis module (22) is a first further module (22, 23) and the viewing direction analysis module (23) is a second further module (22, 23) and these modules (22, 23) are triggered one after the other, wherein at least the reaction times (R1, R2, R4) to the emotion stimuli form characteristic values of the second biomarker and the viewing direction over time forms characteristic values of a third biomarker of the test person (11), and wherein the overall result evaluation unit (25) determines, based on the characteristic values (28, 30, 32) of the first, second and third biomarkers of the subject (11), using the machine learning algorithm based on artificial intelligence and by comparison with a multidimensional interface, whether the subject (11) has the chronic neurological disorder.
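
The sequence of the method claims, in which the operating software triggers the modules one after the other and feeds their characteristic values to the overall result evaluation unit, can be sketched as a small orchestration function. The module and evaluator callables below are hypothetical stand-ins that return made-up values; they do not implement the actual analysis.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Module = Callable[[], List[float]]          # returns characteristic values
Evaluator = Callable[[List[float]], bool]   # True = disorder indicated


def run_diagnostic(modules: Sequence[Tuple[str, Module]],
                   evaluate: Evaluator) -> Tuple[bool, Dict[str, List[float]]]:
    """Trigger each module in order and pass all values to the evaluator."""
    values: Dict[str, List[float]] = {}
    feature_vector: List[float] = []
    for name, run_module in modules:        # strictly one after the other
        result = list(run_module())
        values[name] = result
        feature_vector.extend(result)       # concatenate the biomarkers
    return evaluate(feature_vector), values


# Hypothetical stand-in modules with made-up characteristic values.
order: List[str] = []


def speech_module() -> List[float]:
    order.append("speech")
    return [0.25, 0.75]


def emotion_module() -> List[float]:
    order.append("emotion")
    return [1.5, 2.0]


def gaze_module() -> List[float]:
    order.append("gaze")
    return [0.5]


def toy_evaluator(features: List[float]) -> bool:
    """Placeholder for the trained learning algorithm."""
    return sum(features) > 10.0


finding, per_module = run_diagnostic(
    [("speech", speech_module), ("emotion", emotion_module),
     ("gaze", gaze_module)],
    toy_evaluator,
)
```

Keeping the per-module values alongside the overall finding makes it possible to inspect each biomarker separately after the run.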