WO2021104099A1 - Multimodal depression detection method and system employing context awareness - Google Patents


Info

Publication number
WO2021104099A1
Authority
WO
WIPO (PCT)
Prior art keywords
depression
text
acoustic
channel subsystem
context
Prior art date
Application number
PCT/CN2020/129214
Other languages
French (fr)
Chinese (zh)
Inventor
苏荣锋
王岚
燕楠
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Publication of WO2021104099A1 publication Critical patent/WO2021104099A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques using neural networks
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques for estimating an emotional state
    • G10L25/66 - Speech or voice analysis techniques for extracting parameters related to health condition

Definitions

  • The present invention relates to the technical field of depression detection, in particular to a multimodal depression detection method and system based on context awareness.
  • Deep learning is a newer field of machine learning that builds high-level abstractions of data by composing multiple layers of non-linear transformations; deep learning algorithms make raw data easier to adapt to learning and training in various directions.
  • One prior approach combines a CNN and an LSTM into a new deep network and then extracts acoustic features from the speech signal for depression detection.
  • Another applies semantic analysis to conversations between doctors and depressed patients, using techniques such as filled-pause extraction, Principal Components Analysis (PCA), and the whitening transform to obtain text features, which are combined with a linear Support Vector Regressor (SVR) classifier for depression classification.
  • The acoustic features used in the prior art are artificially defined 279-dimensional features, and the text features are 100-dimensional word-embedding vectors extracted with the Doc2Vec tool.
  • The existing technology mainly has the following problems: in terms of training data, most existing multimodal depression detection systems based on speech, text, or images are trained on limited depression data, so their performance is low;
  • existing feature extraction methods lack verbal information related to topic and context and are insufficiently expressive for depression detection, which limits the performance of the final system; in terms of depression classification modeling, the prior art does not consider the long-term dependence between speech and text features and the depression diagnosis; in terms of multimodal fusion, the prior art simply concatenates the subsystem outputs obtained from different modalities or channels and then makes a decision, ignoring the relative weight of each modality or channel, so performance is limited.
  • The purpose of the present invention is to overcome the above shortcomings of the prior art and provide a multimodal depression detection method and system based on context awareness.
  • A multimodal depression detection method based on context awareness includes the following steps:
  • Step S1: construct a training sample set that includes topic information, spectrograms, and corresponding text information;
  • Step S2: use a convolutional neural network combined with multi-task learning to perform acoustic feature extraction on the spectrograms of the training sample set, obtaining context-aware acoustic features;
  • Step S3: use the training sample set and a Transformer model to process word embeddings and extract context-aware text features;
  • Step S4: establish an acoustic-channel subsystem for depression detection from the context-aware acoustic features, and a text-channel subsystem for depression detection from the context-aware text features;
  • Step S5: fuse the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain depression classification information.
  • The context-aware acoustic features are obtained according to the following steps:
  • the convolutional neural network includes an input layer, multiple convolutional layers, multiple fully connected layers, an output layer, and a bottleneck layer located between the last fully connected layer and the output layer;
  • the bottleneck layer has fewer nodes than the convolutional and fully connected layers;
  • the output layer contains a depression classification task and a topic labeling task;
  • the context-aware acoustic features are extracted from the bottleneck layer of the convolutional neural network.
  • The context-aware text features are extracted according to the following steps:
  • the Transformer model includes multiple encoders and decoders with self-attention and a softmax layer as the last layer;
  • after training, the softmax layer is removed and the output of the Transformer model is used as the context-aware text features.
  • Step S5 includes:
  • fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain a classification score for depression.
  • The classification score for depression is expressed as a weighted sum Score = Σ_i w_i S_i of the subsystem outputs,
  • where the weight w_i = [λ_1, λ_2, ..., λ_c] and c is the number of depression classes.
  • The acoustic-channel subsystem and the text-channel subsystem are established based on a BLSTM network; the network input of the acoustic-channel subsystem is the perceptual linear prediction coefficients of multiple consecutive frames together with the context-aware acoustic features,
  • and its output is a depression classification label;
  • the network input of the text-channel subsystem is text information,
  • and its output is a depression classification label.
  • The topic information in the training sample set includes multiple types of identifiers based on the content of conversations between doctors and depressed patients.
  • A multimodal depression detection system based on context awareness includes:
  • a training sample construction unit for constructing a training sample set that includes topic information, spectrograms, and corresponding text information;
  • an acoustic feature extraction unit for using a convolutional neural network combined with multi-task learning to extract acoustic features from the spectrograms of the training sample set and obtain context-aware acoustic features;
  • a text feature extraction unit for using the training sample set and a Transformer model to process word embeddings and extract context-aware text features;
  • a classification subsystem establishment unit for establishing an acoustic-channel subsystem for depression detection from the context-aware acoustic features and a text-channel subsystem for depression detection from the context-aware text features;
  • a classification fusion unit for fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain depression classification information.
  • Compared with the prior art, the present invention has the advantage of using data augmentation to expand the depression speech and text training data according to the topic information in free conversations between doctors and depressed patients, and using this data for model training;
  • acquiring verbal information relevant to depression detection, including speaker-independent, depression-related, context-aware acoustic features and depression-related, context-aware text features;
  • establishing depression detection subsystems in the acoustic channel and the text channel;
  • and using a reinforcement learning method to obtain a multi-system fusion framework for robust multimodal automatic depression detection.
  • Fig. 1 is a general framework diagram of a multimodal depression detection method based on context awareness according to an embodiment of the present invention;
  • Fig. 2 is a flowchart of a multimodal depression detection method based on context awareness according to an embodiment of the present invention;
  • Fig. 3 is a schematic diagram of topic-based data augmentation;
  • Fig. 4 is a schematic diagram of the acoustic feature extraction process based on CNN and multi-task learning;
  • Fig. 5 is a schematic diagram of the text feature extraction process based on the multi-head self-attention mechanism;
  • Fig. 6 is a schematic diagram of reinforcement learning.
  • The overall technical solution includes: first, a topic-based data augmentation method is adopted to obtain more topic-related depression speech and text data; then a CNN network combined with multi-task learning extracts context-aware acoustic features from the spectrogram, and a Transformer processes word embeddings to obtain context-aware text features; next, the context-aware acoustic and text features are used to establish depression detection subsystems with BLSTM (bidirectional long short-term memory) models; finally, a reinforcement learning method makes a fusion decision on the output of each subsystem to obtain the final depression classification.
  • The multimodal depression detection method based on context awareness includes the following steps:
  • Step S210: obtain a context-aware training sample set.
  • The training sample set can be expanded from the original training set so that it contains context-awareness information;
  • the original data set usually includes only the correspondence between speech and text.
  • Topic labeling is performed on each pair of speech and text data in the existing training set. For example, the content of conversations between doctors and depressed patients is divided into 7 topics: whether they are interested in things, whether they sleep well, whether they feel depressed, whether they feel like a failure, self-evaluation, whether they have ever been diagnosed with depression, and whether their parents ever suffered from depression.
  • New training samples obtained in this way are spliced together with the original training samples to expand the original data set into a new training sample set.
  • In this step, by defining the content of multiple topics discussed between doctors and depressed patients and expanding the original training data set through random combination, a richer context-aware training sample set can be obtained, including topic information, spectrograms, text information, and corresponding classification labels, thereby improving the accuracy of subsequent training.
  • Step S220: extract context-aware acoustic features based on a CNN (Convolutional Neural Network) and multi-task learning.
  • The present invention combines multi-task learning with the CNN for classification network training.
  • The input of the CNN is the spectrogram of each training sample, and the CNN contains several convolutional layers and several fully connected layers.
  • In the convolutional layers, downsampling is performed using, for example, max pooling.
  • The embodiment of the present invention inserts a bottleneck layer, which contains only a few nodes, for example 39.
  • The output layer of the CNN contains two tasks.
  • The first task is depression classification, for example into categories such as mild, moderate, severe, and normal.
  • The second task is the labeling of different topics (topic identification).
  • The context-aware acoustic features are extracted from the bottleneck layer of the CNN and are concatenated with traditional acoustic features for subsequent classification network training.
  • In this step, a CNN and multi-task learning are used, where the first task is depression classification and the second task is the labeling of different topics; the output of the network bottleneck layer is used as the acoustic feature with topic-context-awareness characteristics.
  • Step S230: extract context-aware text features based on the multi-head self-attention mechanism.
  • A Transformer model based on the multi-head self-attention mechanism is used to analyze sentence semantics and thereby extract context-aware text features.
  • The input of the Transformer model is a traditional word embedding plus a topic ID (identifier), and its main structure consists of multiple encoders and decoders containing self-attention, the so-called multi-head mechanism.
  • Because the Transformer model allows direct connections between data units, it can take into account attention information from different positions and better capture long-term dependencies.
  • In the embodiment of the present invention, the Transformer parameters are first pre-trained on large-scale text corpora (such as Weibo and Wikipedia) using an unsupervised training method; then transfer learning is used to perform adaptive training on the collected depression text data.
  • After training, the last softmax layer in Fig. 5 is removed and the output is used as the text feature, namely the extracted context-aware text feature, which is used for subsequent depression detection model training.
  • The Transformer model can thus be used to extract robust text features.
  • In step S240, depression detection subsystems are established for the context-aware acoustic features and the context-aware text features respectively.
  • The embodiment of the present invention adopts a BLSTM-based method to establish the depression classification sub-networks (or subsystems).
  • A BLSTM can cache the current input and use it in the previous and next computations, implicitly incorporating temporal information into the model and thereby modeling long-term dependencies.
  • The BLSTM network adopted in the embodiment of the present invention has 3 BLSTM layers, each containing 128 nodes.
  • For the acoustic channel, the network input is 11 consecutive frames of PLP (perceptual linear prediction) coefficients together with the context-aware acoustic features, and the output is the depression classification label;
  • for the text channel, the network input is the context-aware text features of a training sample,
  • and the output is the depression classification label.
  • The BLSTM network is used to establish the depression classification models in order to capture the long-term dependence between acoustic or text features and the depression diagnosis.
  • Step S250: use reinforcement learning to fuse the outputs of the depression detection subsystems to obtain the final depression classification.
  • The embodiment of the present invention adopts a reinforcement learning mechanism that adjusts the weight of each subsystem to minimize the difference between the final depression prediction of the combined system and the feedback information.
  • The final depression score is expressed as Score = Σ_i w_i S_i.
  • The decision score function L_t of reinforcement learning at time t is defined as L_t = W(A_{t-1}) D - C, where
  • A_{t-1} represents the feedback at time t-1,
  • D represents the difference between the true and predicted results on the development set,
  • W represents the weights {w_i} of all subsystems,
  • and C represents the global accuracy on the development set. Therefore, L_t is summed over all times and maximized; the resulting W* gives the final subsystem weights, expressed as W* = arg max_W Σ_t L_t.
  • A hidden Markov model or other models can be used for the reinforcement learning.
  • The reinforcement learning method automatically adjusts the weights of the acoustic-channel subsystem score and the text-channel subsystem score so that they are organically fused for the final depression classification.
  • In practical applications, the trained network models can be applied to new data (including topics, speech, text, etc.) using a process similar to training in order to predict the depression classification.
  • Besides the BLSTM, other models containing temporal information can also be used.
  • The present invention also provides a multimodal depression detection system based on context awareness.
  • The system includes: a training sample construction unit for constructing a training sample set that includes topic information, spectrograms, and corresponding text information; an acoustic feature extraction unit for using a convolutional neural network combined with multi-task learning to extract acoustic features from the spectrograms of the training sample set and obtain context-aware acoustic features; a text feature extraction unit for using the training sample set and a Transformer model to process word embeddings and extract context-aware text features; a classification subsystem establishment unit for establishing an acoustic-channel subsystem for depression detection from the context-aware acoustic features and a text-channel subsystem for depression detection from the context-aware text features; and a classification fusion unit for fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain depression classification information.
  • The present invention combines the information obtained from the acoustic channel and the text channel to achieve high-accuracy multimodal depression detection.
  • The main technical content includes: topic-related data augmentation, which, on the basis of limited depression speech and text data, uses the topic information in free conversations between doctors and depressed patients to expand the depression speech and text training data; robust analysis and extraction of depression-related features, which combines transfer learning and the multi-head self-attention mechanism to extract topic- and context-aware acoustic and text feature descriptions reflecting the characteristics of depressed patients, improving the accuracy of the detection system; a BLSTM-based depression classification model, which uses the strong sequence modeling capability of the BLSTM network to capture the long-term dependence between acoustic and text information and the depression diagnosis; and a multimodal fusion framework, which uses a reinforcement learning method to fuse the depression detection subsystems of the acoustic channel and the text channel.
  • The present invention has the following advantages:
  • existing depression detection methods use only limited depression speech and text data; in contrast, the present invention uses a topic-based data augmentation method to expand the original training data set;
  • the present invention uses a CNN with multi-task learning to extract acoustic features with topic-context-awareness characteristics, and uses a Transformer model to extract in-depth text features with topic-context awareness, which can improve the robustness of depression detection;
  • existing depression detection modeling does not consider the long-term dependence between speech and text features and the depression diagnosis; the present invention uses a BLSTM network to capture this long-term dependence, giving better performance;
  • existing multimodal depression detection simply concatenates the outputs of different subsystems for decision-making; the present invention uses a reinforcement learning method to automatically adjust the subsystem score weights of different channels and make the final classification decision, giving better performance.
  • The present invention may be a system, a method, and/or a computer program product.
  • The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present invention.
  • The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device.
  • The computer-readable storage medium may include, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • More specific examples of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punch cards with instructions stored thereon.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multimodal depression detection method and system employing context awareness. The method comprises: constructing a training sample set comprising topic information, spectrograms, and corresponding text information; using a convolutional neural network in combination with multi-task learning to perform acoustic feature extraction on the spectrograms of the training sample set and obtain context-aware acoustic features; using a Transformer model to process word embeddings on the basis of the training sample set and extract context-aware textual features; establishing, from the context-aware acoustic features, an acoustic-channel subsystem for depression detection; establishing, from the context-aware textual features, a textual-channel subsystem for depression detection; and fusing the outputs of the acoustic-channel subsystem and the textual-channel subsystem to obtain depression classification information. The present invention can improve the accuracy of depression detection.

Description

A multimodal depression detection method and system based on context awareness

Technical Field

The present invention relates to the technical field of depression detection, and in particular to a multimodal depression detection method and system based on context awareness.

Background
In terms of feature extraction for depression, early speech-based research focused mainly on time-domain features such as pause time, recording duration, response time to questions, and speaking rate. It was later found that no single feature carries enough discriminative information to assist clinical diagnosis. With deeper study of speech signals, many additional signal features were constructed, and researchers tried various combinations of speech features in the hope of building a classification model for detecting depression. These features include pitch, energy, speaking rate, formants, and Mel-frequency cepstral coefficients (MFCC). Text is another kind of depression-related information "hidden" in the speech signal, and it is relatively easy to obtain from it. Studies have shown that depressed patients use significantly more negative-emotion words and anger words than healthy people. Word-frequency statistics are often used as a text feature representation; these are low-level text features. More recently, high-level text features have been preferred for describing the depressive state, namely word-embedding features, which are commonly obtained with network structures such as skip-gram or CBOW (continuous bag-of-words).

In terms of depression detection under limited speech and text data, large-scale collection of speech and text data from depressed patients is difficult, so the speech databases available for depression research are generally small. Researchers therefore generally use relatively simple classification models for depression detection. Traditional speech-based methods include the Support Vector Machine (SVM), decision trees, and the Gaussian Mixture Model (GMM). Deep learning is a newer field of machine learning that builds high-level abstractions of data by composing multiple layers of non-linear transformations, making raw data easier to adapt to learning and training in various directions. For example, a CNN and an LSTM can be combined into a new deep network that extracts acoustic features from the speech signal for depression detection. As another example, semantic analysis of conversations between doctors and depressed patients, using techniques such as filled-pause extraction, Principal Components Analysis (PCA), and the whitening transform, yields text features that are combined with a linear Support Vector Regressor (SVR) classifier for depression classification. In yet another example, separate LSTM layers first process the acoustic channel and the text channel, the resulting features are fed into a fully connected layer, and the depression category is output. The acoustic features used in the prior art are artificially defined 279-dimensional features, and the text features are 100-dimensional word-embedding vectors extracted with the Doc2Vec tool.

In the prior art, detection methods based on biochemical reagents or EEG are usually adopted, while technical solutions based on speech, text, or images mostly rely on speech data and perform depression detection on the basis of feature extraction and classification. In short, the existing technology has the following problems. Regarding training data, most existing multimodal depression detection systems based on speech, text, or images are trained on limited depression data, so their performance is low. Regarding feature extraction, existing methods lack verbal information related to topic and context and are insufficiently expressive for depression detection, which limits the performance of the final system. Regarding depression classification modeling, the prior art does not consider the long-term dependence between speech and text features and the depression diagnosis. Regarding multimodal fusion, the prior art simply concatenates the subsystem outputs obtained from different modalities or channels and makes a decision, ignoring the relative weight of each modality or channel, so performance is limited.
Summary of the Invention

The purpose of the present invention is to overcome the above shortcomings of the prior art and provide a multimodal depression detection method and system based on context awareness.

According to a first aspect of the present invention, a multimodal depression detection method based on context awareness is provided. The method includes the following steps:

Step S1: construct a training sample set that includes topic information, spectrograms, and corresponding text information;

Step S2: use a convolutional neural network combined with multi-task learning to perform acoustic feature extraction on the spectrograms of the training sample set, obtaining context-aware acoustic features;

Step S3: use the training sample set and a Transformer model to process word embeddings and extract context-aware text features;

Step S4: establish an acoustic-channel subsystem for depression detection from the context-aware acoustic features, and a text-channel subsystem for depression detection from the context-aware text features;

Step S5: fuse the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain depression classification information.
In one embodiment, the context-aware acoustic features are obtained according to the following steps:

construct a convolutional neural network that includes an input layer, multiple convolutional layers, multiple fully connected layers, an output layer, and a bottleneck layer located between the last fully connected layer and the output layer, the bottleneck layer having fewer nodes than the convolutional and fully connected layers;

input the spectrograms of the training sample set into the convolutional neural network, the output layer containing a depression classification task and a topic labeling task;

extract the context-aware acoustic features from the bottleneck layer of the convolutional neural network.

In one embodiment, the context-aware text features are extracted according to the following steps:

construct a Transformer model whose input is a word embedding plus a topic identifier, the model comprising multiple encoders and decoders with self-attention and a softmax layer as the last layer;

pre-train the Transformer model parameters on existing text corpora using an unsupervised training method, then use transfer learning to perform adaptive training on the collected depression text data;

after training is completed, remove the softmax layer and use the output of the Transformer model as the context-aware text features.

In one embodiment, step S5 includes:

using a reinforcement learning mechanism to adjust the weight of the acoustic-channel subsystem and the weight of the text-channel subsystem so that the difference between the final depression classification prediction and the feedback information is minimized;

fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain a classification score for depression.
In one embodiment, the classification score for depression is expressed as:
Score = Σ_i w_i S_i  (1)
where the weight w_i = [λ_1, λ_2, ..., λ_c] and c is the number of depression classes.

In one embodiment, the acoustic-channel subsystem and the text-channel subsystem are established based on a BLSTM network; the network input of the acoustic-channel subsystem is the perceptual linear prediction coefficients of multiple consecutive frames together with the context-aware acoustic features, and its output is a depression classification label; the network input of the text-channel subsystem is text information, and its output is a depression classification label.

In one embodiment, the topic information in the training sample set includes multiple types of identifiers based on the content of conversations between doctors and depressed patients.
According to a second aspect of the present invention, a multimodal depression detection system based on context awareness is provided. The system includes:

a training sample construction unit for constructing a training sample set that includes topic information, spectrograms, and corresponding text information;

an acoustic feature extraction unit for using a convolutional neural network combined with multi-task learning to perform acoustic feature extraction on the spectrograms of the training sample set and obtain context-aware acoustic features;

a text feature extraction unit for using the training sample set and a Transformer model to process word embeddings and extract context-aware text features;

a classification subsystem establishment unit for establishing an acoustic-channel subsystem for depression detection from the context-aware acoustic features and a text-channel subsystem for depression detection from the context-aware text features;

a classification fusion unit for fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain depression classification information.

Compared with the prior art, the advantages of the present invention are: using data augmentation to expand the depression speech and text training data according to the topic information in free conversations between doctors and depressed patients, and using this data for model training; acquiring verbal information relevant to depression detection, including speaker-independent, depression-related, context-aware acoustic features and depression-related, context-aware text features; considering the topic context information in free conversations between doctors and depressed patients, establishing depression detection subsystems in the acoustic channel and the text channel; and using a reinforcement learning method to obtain a multi-system fusion framework for robust multimodal automatic depression detection.
Description of the Drawings

The following drawings only schematically illustrate and explain the present invention and are not intended to limit its scope:

Fig. 1 is a general framework diagram of a multimodal depression detection method based on context awareness according to an embodiment of the present invention;

Fig. 2 is a flowchart of a multimodal depression detection method based on context awareness according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of topic-based data augmentation;

Fig. 4 is a schematic diagram of the acoustic feature extraction process based on CNN and multi-task learning;

Fig. 5 is a schematic diagram of the text feature extraction process based on the multi-head self-attention mechanism;

Fig. 6 is a schematic diagram of reinforcement learning.
Detailed Description

In order to make the objectives, technical solutions, design methods, and advantages of the present invention clearer, the present invention is further described in detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

In all examples shown and discussed herein, any specific value should be construed as merely exemplary, not as a limitation; other examples of the exemplary embodiments may therefore have different values.

Technologies, methods, and equipment known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
To further understand the present invention, referring first to Fig. 1, the overall technical solution includes: first, a topic-based data augmentation method is adopted to obtain more topic-related depression speech and text data; then a CNN network combined with multi-task learning extracts context-aware acoustic features from the spectrogram, and a Transformer processes word embeddings to obtain context-aware text features; next, the context-aware acoustic features and context-aware text features are used to establish depression detection subsystems with BLSTM (bidirectional long short-term memory) models; finally, a reinforcement learning method makes a fusion decision on the outputs of the subsystems to obtain the final depression classification.

Specifically, referring to Fig. 2, the multimodal depression detection method based on context awareness according to an embodiment of the present invention includes the following steps:
Step S210: obtain a context-aware training sample set.

The training sample set can be expanded from the original training set so that it contains context-awareness information; the original data set usually includes only the correspondence between speech and text.

Specifically, topic labeling is first performed on each pair of speech and text data in the existing training set. For example, the content of conversations between doctors and depressed patients is divided into 7 topics: whether they are interested in things, whether they sleep well, whether they feel depressed, whether they feel like a failure, self-evaluation, whether they have ever been diagnosed with depression, and whether their parents ever suffered from depression.
Next, the original training set is expanded as follows (see the sketch after this list):

For the speech and text belonging to each subject in the training set, count the number of unique topics; if this number is greater than or equal to m, take the subject as a candidate for data augmentation, where m is the minimum required number of topics.

For each candidate subject, randomly select n speech-text data pairs belonging to that subject as a new combination.

For each new combination, randomly shuffle the order of the speech-text data pairs and use the result as a new training sample, as shown in Fig. 3.

New training samples obtained in this way are spliced together with the original training samples to expand the original data set into a new training sample set.
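As a concrete illustration, the augmentation loop described above might look like the following Python sketch. This is a minimal reading of the procedure, not the patented implementation; the data layout, the default values of m and n, and the number of new combinations drawn per subject are assumptions.

```python
import random

def topic_augment(pairs_by_subject, m=3, n=3, new_per_subject=5, seed=0):
    """Topic-based data augmentation (sketch).

    pairs_by_subject maps a subject ID to that subject's list of
    (speech, text, topic) tuples. A subject covering at least m distinct
    topics is an augmentation candidate; for each candidate, n of its
    pairs are drawn and their order shuffled to form one new sample.
    """
    rng = random.Random(seed)
    new_samples = []
    for subject, pairs in pairs_by_subject.items():
        topics = {topic for (_speech, _text, topic) in pairs}
        if len(topics) < m:
            continue  # too few distinct topics: not a candidate
        for _ in range(new_per_subject):
            combo = rng.sample(pairs, min(n, len(pairs)))
            rng.shuffle(combo)  # randomize the order of the selected pairs
            new_samples.append((subject, combo))
    return new_samples  # spliced onto the original training set downstream
```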
In this step, by defining the content of multiple topics discussed between doctors and depressed patients and expanding the original training data set through random combination, a richer context-aware training sample set can be obtained, including topic information, spectrograms, text information, and corresponding classification labels, thereby improving the accuracy of subsequent training.

Step S220: extract context-aware acoustic features based on CNN and multi-task learning.

In traditional methods, the acoustic features used (such as speaking rate, pitch, and pause duration) are designed based on domain-specific human knowledge. Because these traditional features are insufficiently expressive in the depression domain, they affect the accuracy of the final detection results. Biologically, human visual perception proceeds from low-level local perception to high-level global perception, and the Convolutional Neural Network (CNN) simulates exactly this process. In a CNN, after local weight sharing and a series of non-linear transformations, redundant and confusing information in the original visual input is removed and only the most discriminative information of each local region is retained. In other words, the features obtained by a CNN contain only the "common" description of different speakers, and individual information is discarded.

In order to make the final features contain information at different levels, the present invention combines multi-task learning with the CNN for classification network training. As shown in Fig. 4, the input of the CNN is the spectrogram of each training sample, and the CNN contains several convolutional layers and several fully connected layers. In the convolutional layers, downsampling is performed using, for example, max pooling. Between the last fully connected layer and the output layer, the embodiment of the present invention inserts a bottleneck layer, which contains only a few nodes, for example 39. The output layer of the CNN contains two tasks: the first is depression classification, for example into categories such as mild, moderate, severe, and normal; the second is the labeling of different topics (topic identification).

It should be noted that in the embodiment of the present invention, the context-aware acoustic features are extracted from the bottleneck layer of the CNN and are concatenated with traditional acoustic features for subsequent classification network training.

In this step, a CNN and multi-task learning are used, where the first task is depression classification and the second task is the labeling of different topics; the output of the network bottleneck layer is used as the acoustic feature with topic-context-awareness characteristics.
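To make the architecture concrete, here is a minimal PyTorch-style sketch of the multi-task CNN. The 39-node bottleneck and the two output heads (depression class and topic label) follow the description above; the number and sizes of the convolutional and fully connected layers, and the four-class/seven-topic head widths, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskCNN(nn.Module):
    """CNN with a small bottleneck layer and two task heads (sketch)."""

    def __init__(self, n_depression_classes=4, n_topics=7, bottleneck_dim=39):
        super().__init__()
        self.conv = nn.Sequential(              # several convolutional layers,
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )                                       # downsampled by max pooling
        self.fc = nn.Sequential(                # several fully connected layers
            nn.Flatten(), nn.LazyLinear(512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(256, bottleneck_dim)
        self.depression_head = nn.Linear(bottleneck_dim, n_depression_classes)
        self.topic_head = nn.Linear(bottleneck_dim, n_topics)

    def forward(self, spectrogram):             # (batch, 1, freq, time)
        z = self.bottleneck(self.fc(self.conv(spectrogram)))
        return self.depression_head(z), self.topic_head(z), z

# Training would minimize the sum of two cross-entropy losses (depression
# classification and topic labeling); at extraction time, z is read from the
# bottleneck and concatenated with traditional acoustic features.
```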
Step S230: extract context-aware text features based on the multi-head self-attention mechanism.

Traditional methods use word embeddings to describe a piece of text; however, this feature has difficulty capturing sentence meaning from a semantic perspective, and on some depression-related topics it seriously lacks the associated semantic-emotional representation. The self-attention mechanism mimics the internal process of biological observation and is good at capturing the internal correlations of data or features.

In the embodiment of the present invention, a Transformer model based on the multi-head self-attention mechanism is used to analyze sentence semantics and thereby extract context-aware text features. As shown in Fig. 5, the input of the Transformer model is a traditional word embedding plus a topic ID (identifier), and its main structure consists of multiple encoders and decoders containing self-attention, the so-called multi-head mechanism. Because the Transformer model allows direct connections between data units, it can take into account attention information from different positions and better capture long-term dependencies. In addition, in order to train the Transformer model sufficiently, in the embodiment of the present invention the Transformer parameters are first pre-trained on large-scale text corpora (such as Weibo and Wikipedia) using an unsupervised training method, and then transfer learning is used to perform adaptive training on the collected depression text data. After training, the last softmax layer in Fig. 5 is removed and the output is used as the text feature, namely the extracted context-aware text feature, which is used for subsequent depression detection model training.

In this step, combining word embeddings and topic context information as input, robust text features can be extracted with the Transformer model.
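The following sketch shows one way such a feature extractor could be set up in PyTorch. For brevity it uses an encoder-only Transformer (the patent describes encoders and decoders) and omits positional encoding; the vocabulary size, model dimensions, pooling method, and head width are assumptions. The removable softmax head follows the description above.

```python
import torch
import torch.nn as nn

class TopicAwareTextEncoder(nn.Module):
    """Transformer-based context-aware text feature extractor (sketch)."""

    def __init__(self, vocab_size=30000, n_topics=7, d_model=256,
                 n_heads=8, n_layers=4, n_classes=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.topic_emb = nn.Embedding(n_topics, d_model)  # topic ID input
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)   # dropped after training

    def forward(self, token_ids, topic_id, return_features=False):
        # Add the topic embedding to every word embedding in the sentence.
        x = self.word_emb(token_ids) + self.topic_emb(topic_id).unsqueeze(1)
        h = self.encoder(x).mean(dim=1)  # pooled sentence representation
        if return_features:
            return h                     # the context-aware text feature
        return self.classifier(h)        # softmax head used only in training
```

In the workflow described above, this model would first be pre-trained on a large corpus, fine-tuned on the depression text data, and then queried with return_features=True.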
Step S240: establish depression detection subsystems for the context-aware acoustic features and the context-aware text features respectively.

The diagnosis of depression is usually not determined by one frame or one sentence at a particular moment, but by the combined information of many sentences over a long time, the so-called long-term dependence. To capture this long-term dependence, the embodiment of the present invention adopts a BLSTM-based method to establish the depression classification sub-networks (or subsystems). A BLSTM can cache the current input and use it in the previous and next computations, implicitly incorporating temporal information into the model and thereby modeling long-term dependencies. The BLSTM network adopted in the embodiment of the present invention has 3 BLSTM layers, each containing 128 nodes. For the acoustic channel, the network input is 11 consecutive frames of PLP (perceptual linear prediction) coefficients together with the context-aware acoustic features, and the output is the depression classification label; for the text channel, the network input is the context-aware text features of a training sample, and the output is the depression classification label.

In this step, the BLSTM network is used to establish the depression classification models in order to capture the long-term dependence between acoustic or text features and the depression diagnosis.
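A minimal sketch of one such subsystem follows. The 3 bidirectional LSTM layers with 128 nodes each follow the description; the PLP dimensionality per frame, the text feature dimension, and the use of the last time step for classification are assumptions.

```python
import torch
import torch.nn as nn

class BLSTMClassifier(nn.Module):
    """Depression classification subsystem: 3 BLSTM layers x 128 nodes (sketch)."""

    def __init__(self, input_dim, n_classes=4):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, 128, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 128, n_classes)  # 2x for both directions

    def forward(self, x):          # x: (batch, time, input_dim)
        h, _ = self.blstm(x)
        return self.out(h[:, -1])  # classify from the final time step

# Acoustic channel: 11 stacked PLP frames (e.g. 13-dim each) plus the
# 39-dim bottleneck feature; text channel: one context-aware text feature
# vector per utterance (e.g. 256-dim). These dimensions are assumptions.
acoustic_net = BLSTMClassifier(input_dim=11 * 13 + 39)
text_net = BLSTMClassifier(input_dim=256)
```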
Step S250: use reinforcement learning to fuse the outputs of the depression detection subsystems to obtain the final depression classification.

For the multimodal information fusion strategy, the embodiment of the present invention adopts a reinforcement learning mechanism that adjusts the weight of each subsystem so that the difference between the final depression prediction of the combined system and the feedback information is minimized. The final depression score is expressed as:
Score = Σ_i w_i S_i  (1)
其中,权重w i=[λ 12,…,λ c],c为抑郁症的分类个数,S i对应子系统。而强化学习在t时刻的决策得分函数L t定义为: Wherein the weights w i = [λ 1, λ 2, ..., λ c], c is the number of classification depression, S i the corresponding subsystem. The decision score function L t of reinforcement learning at time t is defined as:
L t=W(A t-1)D-C(2) L t =W(A t-1 )DC(2)
其中A t-1表示在t-1时刻的反馈,D表示开发集中真实和预测结果的差异,W表示所有子系统的权重{w i},C表示在开发集上的全局准确率。因此,需要对所有时刻的L t求和并令其最大化,所得到的W *就是最终的子系统的权重,将其表示为: Where A t-1 represents the feedback at time t-1, D represents the difference between the actual and predicted results of the development set, W represents the weight of all subsystems {w i }, and C represents the global accuracy rate on the development set. Therefore, it is necessary to sum L t at all times and maximize it, and the obtained W * is the weight of the final subsystem, which is expressed as:
$$W^{*} = \arg\max_{W} \sum_{t} L_t \qquad (3)$$
In embodiments of the present invention, the reinforcement learning may employ a hidden Markov model or other models.
In this step, reinforcement learning is used to automatically adjust the weights of the acoustic-channel subsystem score and the text-channel subsystem score, so that the two are organically fused for the final depression classification.
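Because the concrete learner is left open here (a hidden Markov model or other models), the following NumPy sketch is only a simplified stand-in for equations (1) to (3): it fuses per-class subsystem scores with per-subsystem weights and searches for the weight matrix $W^{*}$ that maximizes accuracy on a development set (the $C$ term); the random search takes the place of the full reinforcement-learning update:

```python
import numpy as np

def fuse(weights, subsystem_scores):
    """Equation (1): weighted sum of subsystem scores per class.
    weights: (n_subsystems, c); subsystem_scores: (n_subsystems, n, c)."""
    return np.einsum('sc,snc->nc', weights, subsystem_scores)

def search_weights(subsystem_scores, dev_labels, iters=2000, seed=0):
    """Propose random weight matrices W and keep the one that maximizes
    development-set accuracy -- a crude surrogate for maximizing sum(L_t)."""
    rng = np.random.default_rng(seed)
    n_sub, _, c = subsystem_scores.shape
    best_w, best_acc = None, -1.0
    for _ in range(iters):
        w = rng.random((n_sub, c))
        w /= w.sum(axis=0, keepdims=True)   # normalize across subsystems
        acc = (fuse(w, subsystem_scores).argmax(axis=1) == dev_labels).mean()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy usage: acoustic + text subsystems, 100 dev samples, c = 2 classes.
scores = np.random.rand(2, 100, 2)
labels = np.random.randint(0, 2, size=100)
w_star, dev_acc = search_weights(scores, labels)
```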
It should be understood that although a training process is described herein, in practical applications the trained network models can be applied to new data (including topics, speech, text, etc.) with a procedure analogous to training in order to predict the depression classification. In addition, models other than BLSTM that incorporate temporal information may also be used.
Correspondingly, the present invention further provides a context-aware multimodal depression detection system for implementing one or more aspects of the above method. For example, the system includes: a training sample construction unit for building a training sample set comprising topic information, spectrograms, and the corresponding text information; an acoustic feature extraction unit for extracting acoustic features from the spectrograms of the training sample set using a convolutional neural network combined with multi-task learning, obtaining context-aware acoustic features; a text feature extraction unit for processing word embeddings with a Transformer model on the training sample set to extract context-aware text features; a classification subsystem establishment unit for building an acoustic-channel subsystem for depression detection from the context-aware acoustic features and a text-channel subsystem for depression detection from the context-aware text features; and a classification fusion unit for fusing the outputs of the acoustic-channel subsystem and the text-channel subsystem to obtain the depression classification information.
In summary, the present invention combines the information obtained from the acoustic channel and the text channel to achieve high-accuracy multimodal depression detection. The main technical content includes: topic-related data augmentation, in which topic information from free conversations between doctors and depression patients is used to expand the limited depression speech-text training data; robust analysis and extraction of depression-related features, in which transfer learning and a multi-head self-attention mechanism are combined to extract acoustic and text feature descriptions that are topic-context-aware and reflect the characteristics of depression patients, improving the accuracy of the detection system; a BLSTM-based depression classification model, in which the strong temporal modeling capability of the BLSTM network captures the long-term dependencies between acoustic/text information and the diagnosis of depression; and a multimodal fusion framework, in which reinforcement learning fuses the depression detection subsystems of the acoustic channel and the text channel.
Compared with the prior art, the present invention has the following advantages:
1) Existing depression detection methods use only limited depression speech-text data; by contrast, the present invention expands the original training set with a topic-based data augmentation method.
2) Most of the prior art uses features that lack topic-context awareness; by contrast, the present invention uses a CNN with multi-task learning to extract topic-context-aware acoustic features, and a Transformer model to extract topic-context-aware text features. These deep feature descriptions improve the robustness of depression detection.
3) Existing depression detection modeling does not consider the long-term dependencies between speech/text features and the diagnosis of depression; by contrast, the present invention uses a BLSTM network to capture these long-term dependencies, yielding better performance.
4) Existing multimodal depression detection simply concatenates the outputs of different subsystems for decision-making; by contrast, the present invention uses reinforcement learning to automatically adjust the subsystem score weights across channels before making the final classification decision, yielding better performance.
It should be noted that although the steps are described above in a specific order, this does not mean that they must be executed in that order; in fact, some of the steps may be executed concurrently or in a different order, as long as the required functions are achieved.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
The computer-readable storage medium may be a tangible device that holds and stores instructions for use by an instruction execution device. The computer-readable storage medium may include, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific (non-exhaustive) examples of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The embodiments of the present invention have been described above; the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

  1. A context-aware multimodal depression detection method, comprising the following steps:
    step S1: constructing a training sample set, the training sample set comprising topic information, spectrograms, and the corresponding text information;
    step S2: performing acoustic feature extraction on the spectrograms of the training sample set using a convolutional neural network combined with multi-task learning, to obtain context-aware acoustic features;
    step S3: using the training sample set, processing word embeddings with a Transformer model to extract context-aware text features;
    step S4: establishing an acoustic channel subsystem for depression detection from the context-aware acoustic features, and establishing a text channel subsystem for depression detection from the context-aware text features;
    step S5: fusing the outputs of the acoustic channel subsystem and the text channel subsystem to obtain depression classification information.
  2. The method according to claim 1, characterized in that the context-aware acoustic features are obtained according to the following steps:
    constructing a convolutional neural network comprising an input layer, a plurality of convolutional layers, a plurality of fully connected layers, an output layer, and a bottleneck layer located between the last fully connected layer and the output layer, the bottleneck layer having fewer nodes than the convolutional layers and the fully connected layers;
    inputting the spectrograms of the training sample set into the convolutional neural network, the output layer covering both a depression classification task and a topic labeling task;
    extracting the context-aware acoustic features from the bottleneck layer of the convolutional neural network.
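For illustration only, a minimal PyTorch sketch of such a multi-task bottleneck CNN follows; the layer counts, channel sizes, and the two-class/ten-topic heads are assumptions of the example rather than limitations of the claim, and the multi-task training loss would simply sum the cross-entropies of the two heads:

```python
import torch
import torch.nn as nn

class BottleneckCNN(nn.Module):
    """CNN over spectrograms with a low-dimensional bottleneck placed
    between the last fully connected layer and two output heads
    (depression classification and topic labeling)."""

    def __init__(self, num_classes=2, num_topics=10, bottleneck_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(512, bottleneck_dim)  # fewer nodes: the feature layer
        self.depression_head = nn.Linear(bottleneck_dim, num_classes)
        self.topic_head = nn.Linear(bottleneck_dim, num_topics)

    def forward(self, spectrogram):            # (batch, 1, freq, time)
        z = self.bottleneck(self.fc(self.conv(spectrogram)))
        return self.depression_head(z), self.topic_head(z), z  # z = feature

# Toy usage: after multi-task training, z is the context-aware acoustic feature.
dep_logits, topic_logits, feat = BottleneckCNN()(torch.randn(2, 1, 64, 64))
```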
  3. The method according to claim 1, characterized in that the context-aware text features are extracted according to the following steps:
    constructing a Transformer model that takes word embeddings plus a topic identifier as input, the Transformer model comprising a plurality of encoders and decoders with self-attention and a softmax layer as the last layer;
    pre-training the parameters of the Transformer model on an existing text corpus using an unsupervised training method, and then applying transfer learning to adaptively train on the collected depression text data;
    after training is complete, removing the softmax layer and taking the output of the Transformer model as the context-aware text features.
  4. The method according to claim 1, characterized in that step S5 comprises:
    adjusting the weight of the acoustic channel subsystem and the weight of the text channel subsystem by means of a reinforcement learning mechanism, so as to minimize the difference between the final depression classification prediction and the feedback information;
    fusing the outputs of the acoustic channel subsystem and the text channel subsystem to obtain a depression classification score.
  5. The method according to claim 4, characterized in that the depression classification score is expressed as:
    $$\mathrm{Score} = \sum_{i} w_i \cdot S_i$$
    where the weights $w_i = [\lambda_1, \lambda_2, \ldots, \lambda_c]$ and $c$ is the number of depression classes.
  6. The method according to claim 1, characterized in that the acoustic channel subsystem and the text channel subsystem are established on the basis of a BLSTM network; the network input of the acoustic channel subsystem is a plurality of consecutive frames of perceptual linear prediction coefficients together with the context-aware acoustic features and its output is the depression class label, and the network input of the text channel subsystem is the text information and its output is the depression class label.
  7. The method according to claim 1, characterized in that the topic information in the training sample set comprises a plurality of type identifiers divided according to the content of conversations between doctors and depression patients.
  8. A context-aware multimodal depression detection system, comprising:
    a training sample construction unit for constructing a training sample set, the training sample set comprising topic information, spectrograms, and the corresponding text information;
    an acoustic feature extraction unit for performing acoustic feature extraction on the spectrograms of the training sample set using a convolutional neural network combined with multi-task learning, to obtain context-aware acoustic features;
    a text feature extraction unit for using the training sample set to process word embeddings with a Transformer model, extracting context-aware text features;
    a classification subsystem establishment unit for establishing an acoustic channel subsystem for depression detection from the context-aware acoustic features, and establishing a text channel subsystem for depression detection from the context-aware text features;
    a classification fusion unit for fusing the outputs of the acoustic channel subsystem and the text channel subsystem to obtain depression classification information.
  9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/129214 2019-11-29 2020-11-17 Multimodal depression detection method and system employing context awareness WO2021104099A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911198356.X 2019-11-29
CN201911198356.XA CN110728997B (en) 2019-11-29 2019-11-29 Multi-modal depression detection system based on context awareness

Publications (1)

Publication Number Publication Date
WO2021104099A1 true WO2021104099A1 (en) 2021-06-03

Family

ID=69225856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129214 WO2021104099A1 (en) 2019-11-29 2020-11-17 Multimodal depression detection method and system employing context awareness

Country Status (2)

Country Link
CN (1) CN110728997B (en)
WO (1) WO2021104099A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728997B (en) * 2019-11-29 2022-03-22 中国科学院深圳先进技术研究院 Multi-modal depression detection system based on context awareness
CN111150372B (en) * 2020-02-13 2021-03-16 云南大学 Sleep stage staging system combining rapid representation learning and semantic learning
CN111329494B (en) * 2020-02-28 2022-10-28 首都医科大学 Depression reference data acquisition method and device
CN111581470B (en) * 2020-05-15 2023-04-28 上海乐言科技股份有限公司 Multi-mode fusion learning analysis method and system for scene matching of dialogue system
CN112006697B (en) * 2020-06-02 2022-11-01 东南大学 Voice signal-based gradient lifting decision tree depression degree recognition system
CN111798874A (en) * 2020-06-24 2020-10-20 西北师范大学 Voice emotion recognition method and system
CN113269277B (en) * 2020-07-27 2023-07-25 西北工业大学 Continuous dimension emotion recognition method based on transducer encoder and multi-head multi-mode attention
CN112966429A (en) * 2020-08-11 2021-06-15 中国矿业大学 Non-linear industrial process modeling method based on WGANs data enhancement
CN112631147B (en) * 2020-12-08 2023-05-02 国网四川省电力公司经济技术研究院 Intelligent power grid frequency estimation method and system oriented to impulse noise environment
CN112768070A (en) * 2021-01-06 2021-05-07 万佳安智慧生活技术(深圳)有限公司 Mental health evaluation method and system based on dialogue communication
CN112885334A (en) * 2021-01-18 2021-06-01 吾征智能技术(北京)有限公司 Disease recognition system, device, storage medium based on multi-modal features
CN112818892B (en) * 2021-02-10 2023-04-07 杭州医典智能科技有限公司 Multi-modal depression detection method and system based on time convolution neural network
CN113012720B (en) * 2021-02-10 2023-06-16 杭州医典智能科技有限公司 Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
CN115346657B (en) * 2022-07-05 2023-07-28 深圳市镜象科技有限公司 Training method and device for improving identification effect of senile dementia by utilizing transfer learning
CN116843377A (en) * 2023-07-25 2023-10-03 河北鑫考科技股份有限公司 Consumption behavior prediction method, device, equipment and medium based on big data
CN116965817B (en) * 2023-07-28 2024-03-15 长江大学 EEG emotion recognition method based on one-dimensional convolution network and transducer
CN116978409A (en) * 2023-09-22 2023-10-31 苏州复变医疗科技有限公司 Depression state evaluation method, device, terminal and medium based on voice signal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204625B2 (en) * 2010-06-07 2019-02-12 Affectiva, Inc. Audio analysis learning using video data
EP3252769B8 (en) * 2016-06-03 2020-04-01 Sony Corporation Adding background sound to speech-containing audio data
WO2019017462A1 (en) * 2017-07-21 2019-01-24 日本電信電話株式会社 Satisfaction estimation model learning device, satisfaction estimation device, satisfaction estimation model learning method, satisfaction estimation method, and program
CN107316654A (en) * 2017-07-24 2017-11-03 湖南大学 Emotion identification method based on DIS NV features
GB2567826B (en) * 2017-10-24 2023-04-26 Cambridge Cognition Ltd System and method for assessing physiological state
CN108764010A (en) * 2018-03-23 2018-11-06 姜涵予 Emotional state determines method and device
WO2019225801A1 (en) * 2018-05-23 2019-11-28 한국과학기술원 Method and system for simultaneously recognizing emotion, age, and gender on basis of voice signal of user
CN109389992A (en) * 2018-10-18 2019-02-26 天津大学 A kind of speech-emotion recognition method based on amplitude and phase information
CN109841231B (en) * 2018-12-29 2020-09-04 深圳先进技术研究院 Early AD (AD) speech auxiliary screening system for Chinese mandarin
CN110047516A (en) * 2019-03-12 2019-07-23 天津大学 A kind of speech-emotion recognition method based on gender perception

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016028495A1 (en) * 2014-08-22 2016-02-25 Sri International Systems for speech-based assessment of a patient's state-of-mind
JP2018121749A (en) * 2017-01-30 2018-08-09 株式会社リコー Diagnostic apparatus, program, and diagnostic system
CN107133481A (en) * 2017-05-22 2017-09-05 西北工业大学 The estimation of multi-modal depression and sorting technique based on DCNN DNN and PV SVM
CN107657964A (en) * 2017-08-15 2018-02-02 西北大学 Depression aided detection method and grader based on acoustic feature and sparse mathematics
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition methods based on attention mechanism and convolutional neural networks
CN110728997A (en) * 2019-11-29 2020-01-24 中国科学院深圳先进技术研究院 Multi-modal depression detection method and system based on context awareness

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220180056A1 (en) * 2020-12-09 2022-06-09 Here Global B.V. Method and apparatus for translation of a natural language query to a service execution language
CN113627377A (en) * 2021-08-18 2021-11-09 福州大学 Cognitive radio frequency spectrum sensing method and system Based on Attention-Based CNN
CN113822192A (en) * 2021-09-18 2021-12-21 山东大学 Method, device and medium for identifying emotion of escort personnel based on Transformer multi-modal feature fusion
CN113822192B (en) * 2021-09-18 2023-06-30 山东大学 Method, equipment and medium for identifying emotion of on-press personnel based on multi-mode feature fusion of Transformer
CN114118200A (en) * 2021-09-24 2022-03-01 杭州电子科技大学 Multi-modal emotion classification method based on attention-guided bidirectional capsule network
CN113674767A (en) * 2021-10-09 2021-11-19 复旦大学 Depression state identification method based on multi-modal fusion
CN114464182A (en) * 2022-03-03 2022-05-10 慧言科技(天津)有限公司 Voice recognition fast self-adaption method assisted by audio scene classification
CN114464182B (en) * 2022-03-03 2022-10-21 慧言科技(天津)有限公司 Voice recognition fast self-adaption method assisted by audio scene classification
CN114973120A (en) * 2022-04-14 2022-08-30 山东大学 Behavior identification method and system based on multi-dimensional sensing data and monitoring video multi-mode heterogeneous fusion
CN114973120B (en) * 2022-04-14 2024-03-12 山东大学 Behavior recognition method and system based on multi-dimensional sensing data and monitoring video multimode heterogeneous fusion
CN115346561A (en) * 2022-08-15 2022-11-15 南京脑科医院 Method and system for estimating and predicting depression mood based on voice characteristics
CN115346561B (en) * 2022-08-15 2023-11-24 南京医科大学附属脑科医院 Depression emotion assessment and prediction method and system based on voice characteristics
CN115481681A (en) * 2022-09-09 2022-12-16 武汉中数医疗科技有限公司 Artificial intelligence-based breast sampling data processing method
CN115481681B (en) * 2022-09-09 2024-02-06 武汉中数医疗科技有限公司 Mammary gland sampling data processing method based on artificial intelligence
CN115969381A (en) * 2022-11-16 2023-04-18 西北工业大学 Electroencephalogram signal analysis method based on multi-band fusion and space-time Transformer
CN115969381B (en) * 2022-11-16 2024-04-30 西北工业大学 Electroencephalogram signal analysis method based on multi-band fusion and space-time transducer
CN117497140A (en) * 2023-10-09 2024-02-02 合肥工业大学 Multi-level depression state detection method based on fine granularity prompt learning
CN117497140B (en) * 2023-10-09 2024-05-31 合肥工业大学 Multi-level depression state detection method based on fine granularity prompt learning
CN117137488B (en) * 2023-10-27 2024-01-26 吉林大学 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images
CN117137488A (en) * 2023-10-27 2023-12-01 吉林大学 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Also Published As

Publication number Publication date
CN110728997A (en) 2020-01-24
CN110728997B (en) 2022-03-22

Similar Documents

Publication Publication Date Title
WO2021104099A1 (en) Multimodal depression detection method and system employing context awareness
Shou et al. Conversational emotion recognition studies based on graph convolutional neural networks and a dependent syntactic analysis
Mirheidari et al. Detecting Signs of Dementia Using Word Vector Representations.
Schuller et al. Cross-corpus acoustic emotion recognition: Variances and strategies
Batliner et al. The automatic recognition of emotions in speech
Gu et al. Speech intention classification with multimodal deep learning
Atmaja et al. Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM
Wang et al. Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition.
Qin et al. An end-to-end approach to automatic speech assessment for Cantonese-speaking people with aphasia
Harati et al. Speech-based depression prediction using encoder-weight-only transfer learning and a large corpus
Saha et al. Emotion aided dialogue act classification for task-independent conversations in a multi-modal framework
Sechidis et al. A machine learning perspective on the emotional content of Parkinsonian speech
Zhang et al. Deep cross-corpus speech emotion recognition: Recent advances and perspectives
CN115640530A (en) Combined analysis method for dialogue sarcasm and emotion based on multi-task learning
CN116130092A (en) Method and device for training multi-language prediction model and predicting Alzheimer's disease
Prabhakaran et al. Detecting institutional dialog acts in police traffic stops
Özkanca et al. Multi-lingual depression-level assessment from conversational speech using acoustic and text features
Pérez-Espinosa et al. Using acoustic paralinguistic information to assess the interaction quality in speech-based systems for elderly users
Jia et al. A deep learning system for sentiment analysis of service calls
JP6992725B2 (en) Para-language information estimation device, para-language information estimation method, and program
Johar Paralinguistic profiling using speech recognition
Ryumina et al. Emotional speech recognition based on lip-reading
Akhtiamov et al. Gaze, prosody and semantics: relevance of various multimodal signals to addressee detection in human-human-computer conversations
Ohta et al. Response type selection for chat-like spoken dialog systems based on LSTM and multi-task learning
Meddeb et al. Content-based arabic speech similarity search and emotion detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20892740

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20892740

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 240123)

122 Ep: pct application non-entry in european phase

Ref document number: 20892740

Country of ref document: EP

Kind code of ref document: A1