WO2024019567A1

WO2024019567A1 - Method, apparatus, and computer program for generating sleep analysis model predicting sleep state on the basis of sound information

Info

Publication number: WO2024019567A1
Application number: PCT/KR2023/010525
Authority: WO
Inventors: 홍준기; 트란홍하이; 김대우; 이동헌; 정진환; 김종목; 김형국
Original assignee: 주식회사 에이슬립
Priority date: 2022-07-20
Filing date: 2023-07-20
Publication date: 2024-01-25

Abstract

The present invention provides an artificial neural network model for determining a sleep state of a user on the basis of sound information sensed in the sleep environment of the user. This method for generating a sleep analysis model may comprise the steps of: acquiring sleep sound information about the user; pre-processing the sleep sound information; and acquiring sleep state information by analyzing the pre-processed sleep sound information. According to the present invention, sound information related to a sleep environment is easily acquired through a user terminal (for example, a mobile terminal) carried by a user, and a sleep stage of the user is analyzed on the basis of the acquired sound information so that a sleep state can be determined.

Description

Method, apparatus, and computer program for generating a sleep analysis model that predicts sleep state based on acoustic information

The present invention is intended to analyze a user's sleeping state, and more specifically, to analyze the sleeping state based on acoustic information obtained from the user's sleeping environment.

There are various ways to maintain and improve health, such as exercise and diet, but managing sleep well is the most important, since sleep takes up more than 30% of the day. However, even though machines have taken over simple labor and life has become more leisurely, modern people are unable to sleep well due to irregular eating habits, lifestyle habits, and stress, and suffer from sleep disorders such as insomnia, hypersomnia, sleep apnea syndrome, nightmares, night terrors, and sleepwalking.

According to the National Health Insurance Service, the number of patients with sleep disorders in Korea increased by about 8% on average per year from 2014 to 2018, and the number of patients treated for sleep disorders in Korea in 2018 reached approximately 570,000.

Additionally, according to a 2019 sleep-related survey, 62% of adults around the world do not get as much sleep as they want, and 67% of adults experience at least one sleep disorder every night. And while eight out of 10 adults around the world want to improve their sleep, 60% are unable to seek help from a medical professional, and 44% of adults worldwide say their sleep quality has worsened over the past five years.

Interest in a good night's sleep is increasing as sleep is recognized as an important factor affecting physical and mental health. However, treating a sleep disorder requires visiting a specialized medical institution, paying a separate test fee, and undergoing continuous treatment. Because of this difficulty in management, users' efforts toward treatment are insufficient.

As sleep problems become more serious day by day, the need for sleep health management is increasing, and the sleep tech market, which seeks to solve sleep problems through technology, is also growing rapidly.

In addition, when analyzing and inferring sleep information for sleep health management, it is necessary to learn from various types of data in a multimodal manner rather than from a single type of data, and thereby to make more accurate inferences.

Republic of Korea Patent Publication No. 10-2003-0032529 discloses a sleep induction device and a sleep induction method that receive the user's physical information and, according to the user's physical condition during sleep, output vibration and/or ultrasonic waves in a frequency band detected through repetitive learning, thereby inducing optimal sleep.

However, in the conventional technology, there is a risk that sleep quality may be reduced due to discomfort caused by body-worn equipment, and periodic management of the equipment (e.g., charging) is required. Accordingly, research has recently been conducted to estimate the sleep state by monitoring the user's sleep in a non-contact manner and to manage the user's sleep according to the estimated sleep state.

In particular, a method of analyzing a user's sleep using a wearable device has recently been proposed. Republic of Korea Patent Publication No. 10-2022-0015835 relates to an electronic device for evaluating sleep quality and an operating method in the electronic device, which identifies the sleep cycle based on sleep-related information acquired by the wearable device during sleep time and proposes a method of evaluating sleep quality accordingly.

However, the conventional sleep analysis method using a wearable device has a problem in that sleep analysis is not possible when the wearable device is not in proper contact with the user's body or when the user is not wearing the wearable device. Additionally, when multiple users sleep in the same space, not only does the movement of a user who is not wearing a wearable device interfere with the sleep analysis of the user who is wearing one, but sleep analysis for the non-wearing user is also impossible.

Therefore, there may be a demand for technology that, even without separate equipment, easily acquires acoustic information related to the sleep environment through a user terminal (e.g., a mobile terminal) carried by the user, and detects the sleep state by analyzing the user's sleep stage based on the acquired acoustic information.

In addition, there may be a demand for a technology that can easily and accurately analyze sleep status without any special devices in an unrestricted general environment (e.g., home environment, etc.) rather than a limited environment such as a PSG environment, hospital environment, or laboratory environment.

Additionally, there may be a demand for technology that detects the user's sleep state based on at least one of sleep sound information or other sleep environment information.

Additionally, there may be a demand for a technology that detects sleep status in real time based on at least one of the user's sleep sound information or other sleep environment information.

The present invention was developed in response to the above-described background technology, and is intended to provide an artificial neural network model that determines the user's sleep state based on at least one of acoustic information or sleep environment information detected in the user's sleep environment.

The problems to be solved by the present invention are not limited to the problems mentioned above, and other problems not mentioned can be clearly understood by those skilled in the art from the description below.

In one embodiment of the present invention to solve the above-described problem, a method for generating a sleep analysis model that predicts a sleep state based on acoustic information is disclosed.

The method of generating the sleep analysis model may include the steps of obtaining sleep sound information of a user, performing preprocessing on the sleep sound information, and performing analysis on the preprocessed sleep sound information to obtain sleep state information.

In one embodiment of the present invention, acquiring the sleep state information may include obtaining the sleep state information using a sleep analysis model including one or more network functions.

In one embodiment of the present invention, the method for generating a sleep analysis model may include the step of converting raw acoustic information in the time domain, which has amplitude, phase, and frequency, into information that includes changes in the frequency components along the time axis. Here, according to an embodiment of the present invention, the converted information may be visualized.

Alternatively, in one embodiment of the present invention, it may include converting raw acoustic information in the time domain into information in the frequency domain having amplitude and frequency. Here, according to an embodiment of the present invention, the information on the converted frequency domain may be visualized.

In one embodiment of the present invention, the step of converting acoustic information into spectrogram information in the frequency domain may be included.

In one embodiment of the present invention, the step of applying the Mel scale to the spectrogram and converting it into a Mel spectrogram may be included.
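The conversion from a time-domain signal to a spectrogram and then to a Mel spectrogram can be sketched as follows. This is a minimal NumPy illustration, not the implementation of the present invention: the sample rate, FFT size, hop length, number of Mel bands, the Hann window, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters equally spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=20):
    """Apply the Mel filterbank to the power spectrogram, shape (n_mels, n_frames)."""
    return mel_filterbank(sr, n_fft, n_mels) @ stft_power(signal, n_fft, hop)
```

The result is a two-dimensional array (Mel band by time frame) that can be treated as an image in later steps.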

In one embodiment of the present invention, it may include performing preprocessing on raw acoustic information in the time domain or information in the frequency domain.

In one embodiment of the present invention, performing preprocessing on the acoustic information may include performing spectral noise gating or deep learning-based noise reduction.
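As an illustration of spectral noise gating, the sketch below estimates a per-frequency threshold from a noise-only spectrogram and zeroes every time-frequency bin that falls below it. It is a simplified example under assumptions: the threshold rule (mean plus a multiple of the standard deviation), the `n_std` parameter, and the hard mask without smoothing are choices made here and are not specified in the description.

```python
import numpy as np

def spectral_gate(spec, noise_spec, n_std=1.5):
    """Suppress bins of a magnitude spectrogram that do not exceed the noise floor.

    spec, noise_spec: arrays of shape (n_freq, n_time); noise_spec is assumed
    to come from a segment containing only background noise.
    """
    # One threshold per frequency row, estimated from the noise segment.
    threshold = (noise_spec.mean(axis=1, keepdims=True)
                 + n_std * noise_spec.std(axis=1, keepdims=True))
    mask = spec > threshold
    return spec * mask
```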

In one embodiment of the present invention, the step of performing data augmentation on information in the frequency domain may be included.

In one embodiment of the present invention, the data augmentation may include performing one or more of pitch shifting, TUT (Tile UnTile) augmentation, or noise-added augmentation.

In one embodiment of the present invention, the noise addition augmentation may include a method of converting noise information and sleep sound information into the frequency domain, respectively, and adding information on the frequency domain.

In one embodiment of the present invention, the noise addition augmentation may include a method of converting noise information and sleep sound information into a spectrogram, respectively, and adding information on the domain converted into a spectrogram.

In one embodiment of the present invention, the noise addition augmentation may include a method of adding information on a domain converted into a Mel spectrogram to which a Mel scale is applied to the sleep sound information and noise information.
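Noise-added augmentation in the Mel spectrogram domain can be sketched as an element-wise sum of two spectrograms. The example below is an assumption-laden illustration: the signal-to-noise ratio parameter `snr_db` and the scaling rule are introduced here for the example and do not appear in the description.

```python
import numpy as np

def add_noise_mel(sleep_mel, noise_mel, snr_db=10.0):
    """Add a noise Mel spectrogram to a sleep-sound Mel spectrogram.

    Both inputs are power Mel spectrograms with identical shape. The noise is
    scaled so that the ratio of mean powers matches the requested SNR (assumed
    parameter).
    """
    sleep_power = sleep_mel.mean()
    noise_power = noise_mel.mean()
    scale = sleep_power / (noise_power * 10.0 ** (snr_db / 10.0))
    return sleep_mel + scale * noise_mel
```

Adding power spectrograms approximates mixing two uncorrelated signals; it avoids re-running the time-to-frequency conversion for every noise sample.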

In one embodiment of the present invention, a step of converting information in the frequency domain into a form close to a square may be included.

In one embodiment of the present invention, the step of converting to a shape close to a square may include at least one of reshaping, resizing, and split-cat methods.
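One possible reading of the split-cat method is to split a long, narrow spectrogram along the time axis and concatenate the pieces along the frequency axis. The sketch below follows that reading; the interpretation itself, the function name, and the number of chunks are assumptions.

```python
import numpy as np

def split_cat(spec, n_chunks):
    """Split along time into n_chunks equal pieces and stack them along frequency.

    spec: array of shape (n_freq, n_time). Trailing frames that do not fill a
    whole chunk are dropped.
    """
    n_freq, n_time = spec.shape
    usable = n_time - n_time % n_chunks
    chunks = np.split(spec[:, :usable], n_chunks, axis=1)
    return np.concatenate(chunks, axis=0)
```

For example, a 20 x 1200 spectrogram split into 8 chunks becomes 160 x 150, which is much closer to a square input for an image model.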

In one embodiment of the present invention, the step of converting information or spectrogram in the frequency domain to dB scale (log scale) may be included.

In one embodiment of the present invention, it may include performing a normalization process so that the average of all values is 0 and the standard deviation is 1 for information or spectrogram in the frequency domain.
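The dB-scale conversion and the zero-mean, unit-variance normalization described above can be sketched as follows. The small constants that guard against taking the logarithm of zero and dividing by zero are assumptions added for numerical safety.

```python
import numpy as np

def to_db(power_spec, eps=1e-10):
    """Convert a power spectrogram to a dB (log) scale."""
    return 10.0 * np.log10(np.maximum(power_spec, eps))

def standardize(x, eps=1e-8):
    """Normalize so that the mean of all values is 0 and the standard deviation is 1."""
    return (x - x.mean()) / (x.std() + eps)
```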

In one embodiment of the present invention, it may include the step of using frequency domain information, spectrogram or mel spectrogram information as an image as input to the artificial intelligence model.

In one embodiment of the present invention, the step may include configuring a plurality of frequency domain information, spectrograms, or mel spectrograms by dividing information, spectrograms, or mel spectrograms in the frequency domain into 30-second increments.

In one embodiment of the present invention, the step of extracting sleep state information corresponding to each piece of information in the frequency domain, a spectrogram, or a mel spectrogram divided into 30 second units may be included.

In one embodiment of the present invention, the step of extracting sleep state information by using, as input to a deep learning model, a series of information consisting of a plurality of pieces of frequency domain information, spectrograms, or mel spectrograms divided into 30-second units may be included.
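Dividing a spectrogram into 30-second units can be sketched as a reshape along the time axis. The frame rate argument and the function name are assumptions for illustration; incomplete trailing segments are simply dropped here.

```python
import numpy as np

def split_epochs(spec, frames_per_second, epoch_seconds=30):
    """Cut a spectrogram of shape (n_freq, n_time) into 30-second segments.

    Returns an array of shape (n_epochs, n_freq, frames_per_epoch).
    """
    frames_per_epoch = int(round(frames_per_second * epoch_seconds))
    n_epochs = spec.shape[1] // frames_per_epoch
    trimmed = spec[:, :n_epochs * frames_per_epoch]
    return trimmed.reshape(spec.shape[0], n_epochs, frames_per_epoch).transpose(1, 0, 2)
```

Each segment can then be analyzed on its own, or several consecutive segments can be passed together as a sequence.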

In one embodiment of the present invention, the step of outputting a vector with reduced dimensionality by using information in the frequency domain, a spectrogram, or a mel spectrogram containing time series information as input to an artificial intelligence model may be included.

In one embodiment of the present invention, the step of outputting a vector containing implied time series information by using a vector with reduced dimensionality as an input to an artificial intelligence model may be included.

In one embodiment of the present invention, the step of outputting a vector containing time series information by using a vector with a reduced dimension as an input to an intermediate layer may be included.

In one embodiment of the present invention, the intermediate layer that receives the vector of reduced dimension may include at least one of a linearization step that condenses the vector information, a normalization step based on the mean and variance, or a dropout step that deactivates some nodes.
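An intermediate layer of this kind can be sketched as a linear projection followed by normalization and dropout. The sketch assumes a layer-normalization style step (per-vector mean and variance) and inverted dropout; the description does not fix either choice, and the function and parameter names are hypothetical.

```python
import numpy as np

def intermediate_layer(x, weight, bias, drop_prob=0.1, training=True, rng=None):
    """Linear projection, per-vector normalization, then dropout.

    x: (batch, d_in), weight: (d_in, d_out), bias: (d_out,).
    """
    h = x @ weight + bias
    # Normalize each vector using its own mean and standard deviation.
    h = (h - h.mean(axis=-1, keepdims=True)) / (h.std(axis=-1, keepdims=True) + 1e-5)
    if training and drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(h.shape) >= drop_prob
        # Rescale kept nodes so the expected activation is unchanged.
        h = h * keep / (1.0 - drop_prob)
    return h
```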

In one embodiment of the present invention, a method of utilizing an unsupervised learning model that learns using unlabeled data in which the correct answer is not labeled may be included.

The unsupervised learning model used in one embodiment of the present invention may include a consistency training model using noise in the target environment.

The consistency training model used in one embodiment of the present invention may include the step of performing learning with data to which noise has been intentionally added and data to which noise has not been intentionally added.
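A consistency training objective can be sketched as a divergence between the model's prediction on the original input and its prediction on the same input with noise added. The use of KL divergence and the function names below are assumptions; the description only states that both noised and un-noised data are used for learning.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_clean, logits_noisy, eps=1e-12):
    """Mean KL divergence between predictions on clean and noise-added inputs.

    The clean prediction is treated as the target distribution.
    """
    p = softmax(logits_clean)
    q = softmax(logits_noisy)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))
```

The loss is zero when the two predictions agree and grows as the added noise changes the predicted sleep stage distribution, so no label is required.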

The unsupervised learning model used in one embodiment of the present invention may include an unsupervised domain adaptation (UDA) model.

The UDA model used in one embodiment of the present invention can perform primary learning using unlabeled data and labeled data, and secondary learning using unlabeled data.

In the first learning of the UDA model according to an embodiment of the present invention, learning can be performed using labeled data acquired in a specific environment and unlabeled data acquired in another environment or target environment.

The first learning of the UDA model according to an embodiment of the present invention may include a step in which data acquired in a specific environment and data acquired in a different environment or target environment are input to the sleep analysis model, and the model is trained to extract commonalities between the data.

The first learning of the UDA model according to an embodiment of the present invention may include a learning step in which data obtained in a specific environment and data acquired in a different environment or target environment are used as input to the sleep analysis model, and the commonalities in the resulting output data are used as input to a discriminator model that is trained to classify differences between the input data.

The secondary learning of the UDA model according to an embodiment of the present invention may include a step in which unlabeled data is used as input to the deep learning model, and learning is performed so that the class information contained in the predicted value of sleep state information output from the sleep analysis model becomes more reliable.

Semi-supervised learning using pseudo labels used in one embodiment of the present invention may include the step of using unlabeled data as input to a deep learning model and training the deep learning model by using the output data as a pseudo label.

Semi-supervised learning using pseudo labels used in one embodiment of the present invention may include performing augmentation preprocessing on the image.

The augmentation preprocessing method performed in semi-supervised learning using pseudo labels used in an embodiment of the present invention may include at least one of a weakly-augmented method that modulates the image relatively little, or a strongly-augmented method that modulates the image relatively heavily.

Augmentation preprocessing techniques performed in semi-supervised learning using pseudo labels used in an embodiment of the present invention may include one or more of pitch shifting augmentation, TUT (Tile UnTile) augmentation, or noise-added augmentation techniques.

Semi-supervised learning using pseudo labels used in one embodiment of the present invention may include a method in which image information processed by the weakly-augmented method, which modulates images relatively little, is used as input to a deep learning model, the output predicted value is used as a pseudo label, and learning is performed based on that pseudo label.

Semi-supervised learning using pseudo labels used in an embodiment of the present invention may include one or more of a moving average technique, a weighted average technique, a weighted moving average technique, or an exponential weighted moving average technique.
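Pseudo-label selection from weakly augmented inputs and an exponential weighted moving average update can be sketched as follows. The confidence threshold, the decay value, and the function names are assumptions introduced for the example; the description does not state how pseudo labels are filtered or which quantity is averaged.

```python
import numpy as np

def pseudo_labels(probs_weak, threshold=0.9):
    """Turn predictions on weakly augmented inputs into pseudo labels.

    probs_weak: (batch, n_classes) class probabilities.
    Returns the predicted class per sample and a mask of samples whose
    confidence reaches the (assumed) threshold.
    """
    labels = probs_weak.argmax(axis=-1)
    mask = probs_weak.max(axis=-1) >= threshold
    return labels, mask

def ema_update(running, new_value, decay=0.99):
    """Exponential weighted moving average of a scalar or array."""
    return decay * running + (1.0 - decay) * new_value
```

In a training loop, the masked pseudo labels would serve as targets for the predictions on strongly augmented versions of the same inputs.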

Semi-supervised learning using pseudo labels used in one embodiment of the present invention may include tuning so that the distribution of the predicted values output when data obtained from the target environment or target group is used as input to a deep learning model is formed in a direction that matches the distribution of the predicted values output when data obtained from a specific environment or comparison group is used as input to the deep learning model.

Unsupervised learning and/or semi-supervised learning used in an embodiment of the present invention may include a method of performing pre-training so that the reliability of the predicted value for image information can be increased even if the image data in the image domain has no labels.

Unsupervised learning and/or semi-supervised learning used in an embodiment of the present invention may include a method of damaging part of image information and then performing learning to predict the damaged part of image information.
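The idea of damaging part of an image and learning to predict the damaged part can be sketched as a masking step plus a reconstruction error computed only on the masked region. The rectangular patch, the zero fill value, and the mean squared error are assumptions chosen for illustration.

```python
import numpy as np

def mask_patch(spec, f0, t0, height, width):
    """Zero out a rectangular patch of a spectrogram image.

    Returns the damaged copy and a boolean mask marking the damaged region.
    """
    mask = np.zeros(spec.shape, dtype=bool)
    mask[f0:f0 + height, t0:t0 + width] = True
    corrupted = np.where(mask, 0.0, spec)
    return corrupted, mask

def masked_mse(prediction, target, mask):
    """Reconstruction error measured only on the damaged region."""
    return float(np.mean((prediction[mask] - target[mask]) ** 2))
```

A model trained this way receives `corrupted` as input and is scored by `masked_mse` against the original image, so no sleep stage label is needed.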

In addition, according to an embodiment of the present invention, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may be provided, the method including: a first information acquisition step of acquiring sound information in the time domain related to the user's sleep; a second information acquisition step of acquiring user sleep environment information related to the user's sleep; combining the first information and the second information into multimodal data; extracting features by using the multimodal data as input to a multimodally trained deep learning model; and obtaining user sleep state information by using the extracted features as input to the deep learning model.

Additionally, the first information acquisition step may include performing preprocessing of the obtained first information, and the second information acquisition step may include performing preprocessing of the obtained second information.

And, performing preprocessing of the first information may include extracting a first information feature based on the first information, and performing preprocessing of the second information may include extracting a second information feature based on the second information.

In addition, performing preprocessing of the first information may include performing data augmentation of the first information, and performing preprocessing of the second information may include performing data augmentation of the second information.

And, the deep learning model in the step of acquiring the user sleep state information may be a deep learning model based on natural language processing.

Additionally, performing preprocessing of the first information may include converting the first information in the time domain into information in the frequency domain.

According to one embodiment for achieving the object of the present invention, a non-transitory computer-readable storage medium may be provided that stores one or more programs configured to be executed by one or more processors to analyze sleep state information using sleep sound information and sleep environment information in a multimodal manner, wherein the one or more programs contain instructions to perform the methods described above.

According to one embodiment for achieving the purpose of the present invention, a smart device for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may be provided, the device including: a first information acquisition unit that acquires sound information in the time domain related to the user's sleep; a second information acquisition unit that acquires user sleep environment information related to the user's sleep; a data combiner that combines the first information and the second information into multimodal data; a feature inference unit that infers features by using the multimodal data as an input to a learned deep learning model; and a user sleep state information acquisition unit that acquires sleep state information by using the inferred features as input to a deep learning model.

Here, the device may further include a first information preprocessing unit that performs preprocessing of the first information, and a second information preprocessing unit that performs preprocessing of the second information.

In addition, the unit performing preprocessing of the first information may extract a first information feature based on the first information, and the unit performing preprocessing of the second information may extract a second information feature based on the second information.

Additionally, the preprocessing unit for the first information may perform data augmentation of the first information, and the preprocessing unit for the second information may perform data augmentation for the second information.

Additionally, the artificial intelligence model for analyzing the user's sleep state information may be an artificial intelligence model based on natural language processing.

Additionally, the preprocessing unit that performs the first information preprocessing may convert first information in the time domain into information in the frequency domain. Here, information on the frequency domain may be information including changes along the time axis of frequency components included in the first information on the time domain.

In order to achieve the purpose of the present invention, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner according to an embodiment may be provided, the method including: a first information acquisition step of acquiring sleep sound information related to the user's sleep; inferring first sleep state information by using the sleep sound information as input to a deep learning model; a second information acquisition step of acquiring user sleep environment information related to the user's sleep; inferring second sleep state information by using the user sleep environment information as input to an inference model; and a user sleep state information acquisition step of combining the first sleep state information and the second sleep state information to obtain user sleep state information.

In addition, the first information acquisition step may convert first information on the time domain into information on the frequency domain.

In addition, the step of inferring the first sleep state information may infer, as the first sleep state information, a hypnogram indicating a sleep stage or a hypnodensity graph indicating the reliability of the sleep stage as a probability.

And, the step of inferring the second sleep state information may infer, as the second sleep state information, a hypnogram indicating a sleep stage or a hypnodensity graph indicating the reliability of the sleep stage as a probability.

In addition, the step of acquiring the user sleep state information may further include a sleep state information combining step of combining the first sleep state information and the second sleep state information.
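Combining two hypnodensity estimates and reading a hypnogram from the result can be sketched as follows. The weighted average used for combining, the `weight` parameter, and the class ordering are assumptions; the description does not specify how the first and second sleep state information are combined.

```python
import numpy as np

def combine_hypnodensity(first, second, weight=0.5):
    """Weighted average of two hypnodensity arrays of shape (n_epochs, n_stages).

    Each row holds per-stage probabilities for one epoch; the result is
    renormalized so each row sums to 1.
    """
    fused = weight * first + (1.0 - weight) * second
    return fused / fused.sum(axis=-1, keepdims=True)

def to_hypnogram(hypnodensity):
    """Pick the most probable stage index for every epoch."""
    return hypnodensity.argmax(axis=-1)
```

For instance, if the sound-based estimate slightly favors one stage and the environment-based estimate strongly favors another, the combined hypnogram follows whichever stage has the higher averaged probability.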

In addition, the step of obtaining user sleep state information may further include a sleep state data augmentation step of performing data augmentation using the first sleep state information and the second sleep state information.

Here, the inference model may be an artificial intelligence sleep information inference model.

In addition, in the step of acquiring user sleep state information, user sleep state information may be inferred through an artificial intelligence learning model in order to obtain the user sleep state information.

As an embodiment for achieving the purpose of the present invention, a non-transitory computer-readable storage medium may be provided that stores one or more programs configured to be executed by one or more processors to analyze sleep state information using sleep sound information and sleep environment information in a multimodal manner, wherein the one or more programs contain instructions to perform the methods described above.

According to an embodiment for achieving the purpose of the present invention, a smart device for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may be provided, the device including: a first information acquisition unit that acquires sleep sound information related to the user's sleep; a first sleep state information inference unit that uses the sleep sound information as input to a deep learning model to infer first sleep state information; a second information acquisition unit that acquires user sleep environment information related to the user's sleep; a second sleep state information inference unit that uses the user sleep environment information as input to an inference model to infer second sleep state information; and a user sleep state information acquisition unit that combines the first sleep state information and the second sleep state information to obtain user sleep state information.

Here, according to an embodiment of the present invention, the first information acquisition unit may be configured to convert first information on the time domain into information on the frequency domain.

In addition, according to an embodiment of the present invention, the first sleep state information inference unit may infer, as the first sleep state information, a hypnogram indicating a sleep stage or a hypnodensity graph indicating the reliability of the sleep stage as a probability.

And, according to an embodiment of the present invention, the second sleep state information inference unit may infer, as the second sleep state information, a hypnogram indicating a sleep stage or a hypnodensity graph indicating the reliability of the sleep stage as a probability.

Additionally, according to one embodiment of the present invention, the user sleep state information acquisition unit may include a sleep state information combining unit that combines the first sleep state information and the second sleep state information.

And, according to an embodiment of the present invention, the user sleep state information acquisition unit may further include a sleep state data augmentation unit that performs data augmentation using the first sleep state information and the second sleep state information.

Additionally, according to an embodiment of the present invention, the inference model may be an artificial intelligence sleep information inference model.

And, according to one embodiment of the present invention, the user sleep state information acquisition unit may infer user sleep state information through an artificial intelligence learning model to obtain the user sleep state information.

Meanwhile, according to an embodiment for achieving the purpose of the present invention, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may be provided, the method including: a first information acquisition step of acquiring sleep sound information related to the user's sleep; inferring first sleep state information by using the sleep sound information as input to a deep learning model; a second information acquisition step of acquiring user sleep environment information related to the user's sleep; and a user sleep state information acquisition step of combining the first sleep state information and the second information to obtain user sleep state information.

And, according to an embodiment of the present invention, the second information may be user information obtained through a smart watch.

Additionally, according to an embodiment of the present invention, inferring the first sleep state information may include performing preprocessing of the obtained first information.

And, according to an embodiment of the present invention, performing preprocessing of the first information may include converting the first information on the time domain to information on the frequency domain.

Additionally, according to an embodiment of the present invention, the step of obtaining user sleep state information may further include a sleep state information combining step of combining the first sleep state information and the second information.

And, according to an embodiment of the present invention, the step of obtaining user sleep state information may further include a sleep state data augmentation step of performing data augmentation using the first sleep state information and the second information.

Additionally, according to an embodiment of the present invention, in the step of acquiring user sleep state information, inference of user sleep state information may be performed through an artificial intelligence learning model to obtain the user sleep state information.

Meanwhile, according to an embodiment for achieving the purpose of the present invention, a non-transitory computer-readable storage medium may be provided that stores one or more programs configured to be executed by one or more processors to analyze sleep state information using sleep sound information and sleep environment information in a multimodal manner, wherein the one or more programs include instructions to perform one or more of the methods described above.

In addition, according to an embodiment for achieving the purpose of the present invention, a device for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may be provided, the device including: a first information acquisition unit that acquires sleep sound information related to the user's sleep; a first sleep state information inference unit that uses the sleep sound information as input to a deep learning model to infer first sleep state information; a second information acquisition unit that acquires user sleep environment information related to the user's sleep; and a user sleep state information acquisition unit that combines the first sleep state information and the second information to obtain user sleep state information.

Additionally, according to one embodiment of the present invention, the first sleep state information inference unit may include a first information preprocessing unit that preprocesses the obtained first information.

And, according to an embodiment of the present invention, the first information preprocessing unit may convert the first information on the time domain into information on the frequency domain.
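
The time-domain to frequency-domain conversion described above can be illustrated with a minimal sketch. It uses a naive discrete Fourier transform written with the Python standard library only; the sample rate, the test tone, and the function name `dft_magnitudes` are assumptions made for illustration and are not taken from the specification, which does not state which transform is used.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitudes of the one-sided discrete Fourier transform."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at frequency bin k.
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


# Hypothetical input: one second of an 8 Hz tone sampled at 64 Hz.
sample_rate = 64
tone_hz = 8
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(sample_rate)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
print(peak_hz)
```

A practical implementation would normally use a fast Fourier transform routine rather than this quadratic-time loop; the loop is kept here only so that the sketch has no external dependencies.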

Additionally, according to one embodiment of the present invention, the user sleep state information acquisition unit may further include a sleep state information combining unit that combines the first sleep state information and the second information.

Additionally, according to one embodiment of the present invention, the user sleep state information acquisition unit may further include a sleep state data augmentation unit that performs data augmentation using the first sleep state information and the second information.

Additionally, according to one embodiment of the present invention, the user sleep state information acquisition unit may infer user sleep state information through an artificial intelligence learning model to obtain the user sleep state information.

Additionally, according to an embodiment of the present invention, a method for detecting a real-time sleep event based on at least one of sleep sound information or sleep environment information may be provided.

Additionally, according to an embodiment of the present invention, a method of learning a sleep analysis artificial intelligence model can be provided by an unsupervised learning method or a semi-supervised learning method.

Here, the learning method according to embodiments of the present invention may include semi-supervised learning based on sequential consistency loss. Through semi-supervised learning based on sequential consistency loss according to an embodiment of the present invention, the model can be trained to take into account the time-series characteristics of acoustic information.

Alternatively, the learning method according to an embodiment of the present invention may include learning based on semi-supervised contrast loss. Learning based on semi-supervised contrast loss according to an embodiment of the present invention may include setting a class reliability threshold and adjusting positions in the vector space with respect to anchor data based on the set class reliability threshold.

Here, the anchor data used for learning based on semi-supervised contrast loss according to an embodiment of the present invention may include at least one of labeled data given a label for the sleep state or pseudo-labeled data given a pseudo label for the sleep state.
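
One possible reading of the anchor selection and contrast loss described above is sketched below. The specification does not give the loss formula at this point, so a generic supervised-contrastive style loss over cosine similarities is swapped in as an assumption; the dictionary fields (`embedding`, `label`, `probs`), the threshold value, and the temperature are likewise hypothetical.

```python
import math


def select_anchors(samples, threshold):
    """Keep labeled samples, plus unlabeled ones whose top class
    probability reaches the class reliability threshold (pseudo labels)."""
    anchors = []
    for s in samples:
        if s["label"] is not None:
            anchors.append((s["embedding"], s["label"]))
        elif max(s["probs"]) >= threshold:
            pseudo_label = s["probs"].index(max(s["probs"]))
            anchors.append((s["embedding"], pseudo_label))
    return anchors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrast_loss(anchors, temperature=0.5):
    """Assumed loss: low when same-label embeddings are close together
    and different-label embeddings are far apart in the vector space."""
    total, count = 0.0, 0
    for i, (emb_i, label_i) in enumerate(anchors):
        others = [(e, l) for j, (e, l) in enumerate(anchors) if j != i]
        positives = [math.exp(cosine(emb_i, e) / temperature)
                     for e, l in others if l == label_i]
        everything = [math.exp(cosine(emb_i, e) / temperature) for e, _ in others]
        if positives:
            total += -math.log(sum(positives) / sum(everything))
            count += 1
    return total / count if count else 0.0


# Hypothetical mini-batch: label None marks unlabeled data.
samples = [
    {"embedding": [1.0, 0.0], "label": 0, "probs": None},
    {"embedding": [0.9, 0.1], "label": None, "probs": [0.95, 0.05]},
    {"embedding": [0.0, 1.0], "label": 1, "probs": None},
    {"embedding": [0.1, 0.9], "label": 1, "probs": None},
    {"embedding": [0.5, 0.5], "label": None, "probs": [0.6, 0.4]},
]
anchors = select_anchors(samples, threshold=0.9)
loss = contrast_loss(anchors)
```

In this sketch the low-confidence unlabeled sample is simply excluded from the anchors; how the actual method treats such samples is not stated in this passage.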

Additionally, according to an embodiment of the present invention, a deep learning model can be provided for analyzing the user's sleep state information through multi-task learning based on acoustic information.

Here, for multi-task learning, the deep learning model may have multiple heads. In this case, each head included in the multiple heads may perform one different task among the multiple tasks.

According to embodiments of the present invention, the plurality of tasks may include tasks such as multimodal learning, sleep event analysis, and sleep stage analysis.
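
The multi-head arrangement described above can be sketched as one shared feature extractor followed by one small head per task. This is a toy illustration under stated assumptions: the two hand-made features, the weight values, the task names, and the class counts are all hypothetical, and a real model would learn its weights rather than hard-code them.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, features):
    """Apply one weight row per output class to the shared features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def shared_backbone(frame):
    """Toy stand-in for a shared feature extractor: mean and peak energy."""
    return [sum(frame) / len(frame), max(frame)]


# One head per task; each head maps the same features to its own classes.
HEADS = {
    "sleep_stage": [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1], [0.0, 0.2]],  # 4 classes
    "sleep_event": [[0.3, 0.6], [-0.3, -0.6]],                          # 2 classes
}


def predict(frame):
    features = shared_backbone(frame)  # computed once, reused by every head
    return {task: softmax(linear(weights, features)) for task, weights in HEADS.items()}


outputs = predict([0.1, 0.4, 0.2, 0.9])
```

Sharing the backbone means the acoustic features are computed once per input, while each head can be trained with the loss appropriate to its own task.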

Other specific details of the invention are included in the detailed description and drawings.

The present invention was developed in response to the above-described background technology, and can provide an artificial neural network model that determines the user's sleep state based on at least one of acoustic information or sleep environment information detected in the user's sleep environment.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figures 1A to 1C are conceptual diagrams showing a system in which various aspects can be implemented for generating a sleep analysis model that predicts sleep state based on information related to an embodiment of the present invention.

Figure 2 is an exemplary diagram illustrating a plurality of sleep sound information acquired in various environments.

Figure 3 is a block diagram of a computing device for generating the sleep analysis model that predicts sleep state based on information related to an embodiment of the present invention.

Figure 4 is a diagram for explaining the process of acquiring sleep sound information in the sleep analysis method according to the present invention.

Figure 5 is a conceptual diagram illustrating a privacy protection method using Mel spectrogram transformation for sleep sound information extracted from a user in the sleep analysis method according to the present invention.

Figure 6 is a diagram for explaining a method of obtaining a spectrogram corresponding to sleep sound information in the sleep analysis method according to the present invention.

Figure 7 is a schematic diagram showing one or more network functions for performing the sleep analysis method according to the present invention.

Figure 8 is a diagram for explaining sleep stage analysis using a spectrogram in the sleep analysis method according to the present invention.

Figure 9 is a diagram illustrating sleep event determination using a spectrogram in the sleep analysis method according to the present invention.

Figure 10 is a diagram showing an experimental process for verifying the performance of the sleep analysis method according to the present invention.

Figure 11 is a graph verifying the performance of the sleep analysis method according to the present invention, comparing the polysomnography result (PSG result) with the analysis result (AI result) obtained using the AI algorithm according to the present invention.

Figure 12 is a graph verifying the performance of the sleep analysis method according to the present invention, comparing the polysomnography result (PSG result) with the analysis result (AI result) obtained using the AI algorithm according to the present invention, in relation to sleep apnea and hypopnea.

Figure 13 is a schematic diagram of a data set according to an embodiment of the present invention.

Figure 14 is a diagram for explaining noise reduction according to an embodiment of the present invention.

Figure 15 is a diagram for explaining pitch shifting according to an embodiment of the present invention.

FIG. 16 is a diagram illustrating a preprocessing method for converting information or a spectrogram in the frequency domain into a nearly square form according to an embodiment of the present invention.

Figures 17a and 17b are diagrams for explaining the overall structure of a sleep analysis model according to an embodiment of the present invention.

Figure 18 is a diagram for explaining a feature extraction model and a feature classification model according to an embodiment of the present invention.

Figure 19 is a diagram for explaining in detail the operation of a sleep analysis model according to an embodiment of the present invention.

Figure 20 is a diagram for explaining an unsupervised or semi-supervised learning model according to an embodiment of the present invention.

Figure 21 is a diagram for explaining consistency training according to an embodiment of the present invention.

Figure 22 is a diagram for explaining Unsupervised Domain Adaptation (UDA) according to an embodiment of the present invention.

Figure 23 is a diagram for explaining the TUT (Tile UnTile) augmentation method according to an embodiment of the present invention.

Figure 24 is a diagram for explaining the structure of a sleep analysis model using a natural language processing model according to an embodiment of the present invention.

Figure 25 shows a flowchart illustrating a method for analyzing a user's sleep state through sound information according to an embodiment of the present invention.

Figure 26 is a flow chart to explain a method for analyzing a sleep state according to an embodiment of the present invention, including a process of combining sleep sound information and sleep environment information into multimodal data.

Figure 27 is a flowchart for explaining a method for analyzing a sleep state according to an embodiment of the present invention, including the step of combining the inferred sleep sound information and sleep environment information into multimodal data.

Figure 28 is a flow chart to explain a method for analyzing a sleep state according to an embodiment of the present invention, including the step of combining inferred sleep sound information with sleep environment information and multimodal data.

Figures 29a and 29b are diagrams for explaining the performance of noise addition and sleep event determination of a sleep analysis model in the sleep analysis method according to embodiments of the present invention.

Figure 30 is a diagram illustrating an example of a learning method based on consistency loss or sequential consistency loss when the number of samples in the sequence is 6, according to an embodiment of the present invention.

Figure 31 is an example diagram for explaining the operating mechanism of a learning method based on semi-supervised contrast loss according to an embodiment of the present invention.

Figure 32 is a table comparing the analysis results of a sleep analysis model according to an embodiment of the present invention and the analysis results of a PSG test in a home environment.

Figure 33 is a table comparing sleep analysis results based on PSG audio data and analysis results of a sleep analysis model according to an embodiment of the present invention.

Figure 34 is a diagram illustrating a linear regression analysis function used to analyze sleep events that occur during sleep, according to an embodiment of the present invention.

Overall composition

The advantages and features of the present invention and methods for achieving them will become clear by referring to the embodiments described in detail below along with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The present embodiments are provided merely to make the disclosure of the present invention complete and to fully inform those skilled in the art to which the present invention pertains of the scope of the present invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for describing embodiments and is not intended to limit the invention. As used herein, singular forms also include plural forms, unless specifically stated otherwise in the context. As used in the specification, “comprises” and/or “comprising” does not exclude the presence or addition of one or more other elements in addition to the mentioned elements. Like reference numerals refer to like elements throughout the specification, and “and/or” includes each and every combination of one or more of the referenced elements. Although “first”, “second”, etc. are used to describe various components, these components are of course not limited by these terms. These terms are merely used to distinguish one component from another. Therefore, it goes without saying that the first component mentioned below may also be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification may be used with meanings commonly understood by those skilled in the art to which the present invention pertains. Additionally, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined.

As used in the specification, the term “unit” or “module” refers to software or a hardware component such as an FPGA or ASIC, and the “unit” or “module” performs certain roles. However, a “unit” or “module” is not limited to software or hardware. A “unit” or “module” may be configured to reside on an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a “unit” or “module” includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and “units” or “modules” may be combined into a smaller number of components and “units” or “modules”, or further separated into additional components and “units” or “modules”.

In this specification, a computer refers to all types of hardware devices including at least one processor, and depending on the embodiment, it may be understood as encompassing software configurations that operate on the hardware device. For example, a computer can be understood to include, but is not limited to, a smartphone, tablet PC, desktop, laptop, and user clients and applications running on each device.

Those of ordinary skill in the art to which the present invention pertains will additionally recognize that the various example logical blocks, configurations, modules, circuits, means, logics, and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logics, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented in hardware or software will depend on the specific application and design constraints imposed on the overall system. A skilled technician can implement the described functionality in a variety of ways for each specific application. However, such implementation decisions should not be construed as departing from the scope of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.

Each step described in this specification is described as being performed by a computer, but the subject of each step is not limited thereto, and depending on the embodiment, at least part of each step may be performed in a different device.

Figures 1A to 1C are conceptual diagrams illustrating a system in which various aspects of a method for generating a sleep analysis model that predicts sleep state based on information related to an embodiment of the present invention can be implemented.

A system according to embodiments of the present invention may include a computing device 100, a user terminal 10, an external server 20, and a network.

Here, the devices shown in FIG. 1A are only one example of a system for implementing the present invention. The configuration is not limited to the embodiment shown in FIG. 1A, and components may be added, changed, or deleted as necessary.

Meanwhile, Figures 1B and 1C show a conceptual diagram showing a system in which various aspects of performing a sleep analysis method related to another embodiment of the present invention can be implemented.

First, the system according to the embodiment shown in FIG. 1A will be described.

As shown in FIG. 1A, the computing device 100, the user terminal 10, and the external server 20 can mutually transmit and receive data for the system according to embodiments of the present invention through a network.

According to one embodiment of the present invention, the computing device 100 or the external server 20 may be a server that provides cloud computing services. More specifically, the computing device 100 or the external server 20 may be a type of Internet-based computing server that provides a cloud computing service that processes information not on the user's computer but on another computer connected to the Internet. The cloud computing service may be a service that stores data on the Internet and allows users to use it anytime, anywhere through Internet access without having to install necessary data or programs on their computer. Through the cloud computing service, data stored on the Internet can be manipulated with simple operations and easily shared and forwarded with a click.

In addition, the cloud computing service may be a service that not only stores data on a server on the Internet, but also allows desired tasks to be performed using the functions of applications provided on the web without installing a separate program, and allows multiple people to work on a document at the same time while sharing it.

Additionally, cloud computing services may be implemented in at least one of the following forms: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), virtual machine-based cloud server, and container-based cloud server. That is, the computing device 100 or the external server 20 of the present invention may be implemented in at least one form of the cloud computing service described above. The specific description of the cloud computing service described above is merely an example, and any platform for constructing the cloud computing environment of the present invention may be included.

Networks according to embodiments of the present invention may use various wired communication systems such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and local area network (LAN). In addition, the networks presented here may use a variety of wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems.

The network according to embodiments of the present invention can be configured regardless of the communication mode, such as wired or wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). Additionally, the network may be the well-known World Wide Web (WWW), or may use wireless transmission technology used for short-distance communication, such as Infrared Data Association (IrDA) or Bluetooth. The techniques described herein can be used in the networks mentioned above, as well as other networks.

According to one embodiment of the present invention, the user terminal 10 is a terminal that can receive information related to the user's sleep through information exchange with the computing device 100, and may refer to a terminal owned by the user. For example, the user terminal 10 may be a terminal related to a user who wants to improve his or her health through information related to his or her sleeping habits.

This user terminal 10 may refer to any type of entity(s) in the system that has a mechanism for communication with an external server 20 or computing device 100. For example, the user terminal 10 may include personal computers (PCs), notebook computers, mobile terminals, smart phones, tablet PCs, artificial intelligence (AI) speakers, artificial intelligence TVs, wearable devices, home appliances, etc., and may include all types of terminals that can connect to wired/wireless networks. Additionally, the user terminal 10 may include an arbitrary server implemented by at least one of an agent, an application programming interface (API), and a plug-in. Additionally, the user terminal 10 may include an application source and/or client application.

According to one embodiment of the present invention, the external server 20 may be a server that stores information about a plurality of learning data for learning a neural network. Alternatively, the external server 20 may be a digital device, such as a laptop computer, notebook computer, desktop computer, web pad, or mobile phone, equipped with a processor and equipped with memory and computing power. The external server 20 may be a web server that processes services. The types of servers described above are merely examples and the present invention is not limited thereto. The plurality of learning data may include, for example, sleep sound information obtained from a plurality of user terminals, or health checkup information and sleep checkup information obtained at a hospital. A detailed description of the learning dataset will be provided later.

According to one embodiment of the present invention, the external server 20 may be at least one of a hospital server and a government server, and may be a server that stores information about a plurality of polysomnography records, electronic health records, and electronic medical records. For example, a polysomnographic record may include information on the sleep examination subject's breathing and movements during sleep, and information on sleep diagnosis results (e.g., sleep stages, etc.) corresponding to that information. Information stored in the external server 20 can be used as learning data, verification data, and test data to train the neural network in the present invention.
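
As a minimal sketch of how stored records could be divided into learning, verification, and test data, the example below shuffles record identifiers and splits them by fixed proportions. The 80/10/10 ratio, the seed, and the identifier format are assumptions for illustration; the specification does not state how the split is made.

```python
import random


def split_records(record_ids, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle record IDs reproducibly and split them into three disjoint sets."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    n_val = int(len(ids) * val_ratio)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # everything left over
    return train, val, test


# Hypothetical identifiers, one per polysomnography record.
records = [f"psg_{i:03d}" for i in range(10)]
train_ids, val_ids, test_ids = split_records(records)
```

Splitting by record (one examination subject per record) rather than by individual audio segment keeps segments from the same night out of both the learning and test sets.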

Additionally, the external server 20 according to an embodiment of the present invention may store an artificial intelligence model for analyzing sleep state information. In this case, if sleep environment information is obtained from the user terminal 10 or the like and transmitted to the external server 20, sleep state information can be generated based on the sleep environment information through the artificial intelligence model mounted on the external server 20.

Alternatively, according to an embodiment of the present invention, if sleep environment information is acquired from the user terminal 10 and sleep sound information is acquired through preprocessing of the sleep environment information in the user terminal 10, the acquired sleep sound information may be transmitted to the external server 20, and the external server 20 may generate sleep state information based on the received sleep sound information.

The computing device 100 of the present invention may receive a plurality of sleep sound information, health checkup information, or sleep checkup information from the external server 20, and construct a learning data set based on the corresponding information. The computing device 100 may generate a sleep analysis model that calculates sleep state information in response to sleep sound information by performing learning on one or more network functions through a learning data set.

According to an embodiment of the present invention, at least one of the user terminal 10, the computing device 100, or the external server 20 may generate a sleep analysis model. The sleep analysis model may be a neural network model that predicts information about the user's sleep state based on information related to the user's sleep sound that is non-invasively acquired during the user's sleep. At least one of the electronic devices according to embodiments of the present invention may generate a sleep analysis model that outputs the user's sleep state information by receiving the user's sleep sound information as input. A detailed description of the construction of the learning data set for learning the neural network of the present invention, the learning method using the learning data set, and the creation and learning of the sleep analysis model will be described later.

According to an embodiment of the present invention, a user can obtain monitoring information related to his or her sleep through the user terminal 10. When at least one of the electronic devices according to an embodiment of the present invention acquires or receives sleep sound information, it can process the sleep sound information as an input to the sleep analysis model so that the sleep analysis model outputs sleep state information.

Meanwhile, according to an embodiment of the present invention, sleep sound information obtained from an electronic device such as the user terminal 10 may have a low signal-to-noise ratio (SNR). In general, the microphone module provided in the user terminal 10 carried by the user may be configured as a MEMS (Micro-Electro-Mechanical Systems) microphone, since it must fit in a relatively small user terminal 10. The microphone module provided in the user terminal 10 may be, for example, a common (low-performance, small) microphone. Such microphone modules can be manufactured very small, but may have a lower signal-to-noise ratio than a condenser microphone or dynamic microphone. A low signal-to-noise ratio means that the ratio of noise (sound that is not to be identified) to the sound that is to be identified is high, making the target sound difficult to identify (i.e., unclear). Sleep sound information is information about very quiet sounds (i.e., sounds that are difficult to distinguish), such as the user's breathing and movement, and is acquired together with other sounds in the sleep environment. Therefore, when it is acquired through a microphone module as described above (i.e., one with a low signal-to-noise ratio), deriving and analyzing information from it can be very difficult.
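
The signal-to-noise ratio discussed above is commonly expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The short sketch below computes it from two sample sequences; the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def mean_power(samples):
    """Average power of a sampled waveform (mean of squared amplitudes)."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical amplitudes: the target sound is ten times larger than the noise.
breathing = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
ratio_db = snr_db(breathing, background)
```

A tenfold amplitude ratio corresponds to a hundredfold power ratio, i.e. about 20 dB; when breathing sounds are barely louder than the background, the value approaches 0 dB, which is the difficult regime described in this paragraph.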

Therefore, according to an embodiment of the present invention, at least one of the various electronic devices can convert and/or adjust sleep sound data that was acquired unclearly and contains a lot of noise into data that can be analyzed, and can train an artificial neural network using the converted and/or adjusted data. When pre-training of the artificial neural network is completed, the user's sleep state information can be obtained through the trained neural network (e.g., an acoustic analysis model) based on the data (e.g., converted and/or adjusted data) obtained in correspondence with the sleep sound information, such as information including changes along the time axis of the frequency components contained in the raw sleep sound information, information in the frequency domain, or a spectrogram.
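
To illustrate what "changes along the time axis of the frequency components" can look like as data, the sketch below builds a small magnitude spectrogram by cutting the signal into frames and transforming each frame separately. The frame length, hop size, Hann window, and test signal are assumptions for illustration; the specification's own preprocessing (for example, its Mel spectrogram conversion) is not reproduced here.

```python
import cmath
import math


def spectrogram(samples, frame_len, hop):
    """Return a list of frames, each a list of one-sided DFT magnitudes."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window reduces leakage at the frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
            for i, x in enumerate(frame)
        ]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                windowed[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# Hypothetical signal whose dominant frequency changes halfway through.
rate = 32
first = [math.sin(2 * math.pi * 4 * t / rate) for t in range(rate)]
second = [math.sin(2 * math.pi * 8 * t / rate) for t in range(rate)]
spec = spectrogram(first + second, frame_len=rate, hop=rate)
peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of `spec` is one time step and each column one frequency bin, so the shift of the peak from one row to the next is exactly the kind of time-varying frequency information a neural network can consume as a two-dimensional input.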

According to one embodiment of the present invention, when the computing device 100 obtains sleep sound information with a low signal-to-noise ratio through a commonly used user terminal for collecting sound (e.g., an artificial intelligence speaker, bedroom IoT device, mobile phone, etc.), it can process the information into data suitable for analysis and process the resulting data to provide sleep state information related to changes in sleep stages. This eliminates the need to attach a contact microphone to the user's body to obtain clear sound, and allows the sleep state to be monitored in a typical home environment with just a software update, without purchasing an additional device with a high signal-to-noise ratio, thereby increasing convenience.

Monitoring information related to sleep may include, for example, sleep state information related to when the user fell asleep, time spent sleeping, time of waking up, etc., or, more specifically, sleep stage information related to changes in sleep stage during sleep.

For a specific example, sleep stage information may mean information on changes in the user's sleep to light sleep, normal sleep, deep sleep, or REM sleep at each time point during the user's 8 hours of sleep last night. The detailed description of the above-described sleep stage information is only an example, and the present invention is not limited thereto.
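
A minimal sketch of how such sleep stage information might be represented and summarized is shown below. It assumes one stage label per fixed-length epoch; the 30-second epoch length, the label strings, and the function name are assumptions for illustration and are not taken from this passage.

```python
from collections import Counter

EPOCH_SECONDS = 30  # assumed epoch length


def summarize_stages(stages):
    """Return minutes spent in each stage and the list of stage changes."""
    counts = Counter(stages)
    minutes = {stage: n * EPOCH_SECONDS / 60 for stage, n in counts.items()}
    transitions = [
        (index * EPOCH_SECONDS, previous, current)
        for index, (previous, current) in enumerate(zip(stages, stages[1:]), start=1)
        if previous != current
    ]
    return minutes, transitions


# Hypothetical per-epoch output of a sleep analysis model.
night = ["light", "light", "deep", "deep", "REM", "light"]
minutes, transitions = summarize_stages(night)
```

Each transition is reported as the elapsed time in seconds at which the new stage begins, together with the previous and the new stage, which corresponds to the "changes in the user's sleep at each time point" described above.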

Meanwhile, as shown in FIG. 1B, sleep analysis according to the present invention may be performed on the user terminal 10 or the external server 20 without a separate computing device.

Meanwhile, FIG. 1C shows a conceptual diagram showing a system in which various aspects of various electronic devices related to another embodiment of the present invention can be implemented.

The electronic devices shown in FIG. 1C can perform at least one of the operations performed by various devices according to embodiments of the present invention.

For example, operations performed by various devices according to embodiments of the present invention may include acquiring sleep environment information or environmental sensing information, training a sleep analysis model, inferring the sleep state through the sleep analysis model, and acquiring sleep state information.

Or, for example, the operations may include receiving information related to the user's sleep or sleep environment information, transmitting or receiving environmental sensing information, determining environmental sensing information, processing or manipulating data, processing or providing services, analyzing the sleep state, constructing a learning data set based on information related to the user's sleep, storing acquired data or information about a plurality of learning data for training a neural network, transmitting or receiving various information, or mutually transmitting and receiving data for the system according to embodiments of the present invention through a network.

The electronic devices shown in FIG. 1C may individually perform the operations performed by various devices according to the embodiment of the present invention, but may also perform one or more operations simultaneously or in time series.

Referring to FIG. 1C, the electronic device (reference numerals 1a to 1d) may be an electronic device within the range of the area 11a that can obtain object state information, such as information about the user's movement or breathing. Hereinafter, for convenience, the area 11a where object state information or environmental sensing information, such as information about the user's movement or breathing, can be obtained will be referred to as “area 11a.”

Meanwhile, referring to FIG. 1C, the electronic device (reference numerals 1a and 1d) may be a device composed of a combination of two or more electronic devices.

Meanwhile, referring to FIG. 1C, electronic devices (reference numerals 1a and 1b) may be electronic devices connected to a network within the area 11a.

Meanwhile, referring to FIG. 1C, electronic devices (reference numerals 1c and 1d) may be electronic devices that are not connected to the network within the area 11a.

Meanwhile, referring to FIG. 1C, electronic devices (reference numerals 2a to 2b) may be electronic devices outside the range of area 11a.

Meanwhile, referring to FIG. 1C, there may be a network that interacts with electronic devices within the scope of the area 11a, and there may be a network that interacts with electronic devices outside the scope of the area 11a.

Here, a network that interacts with electronic devices within the scope of area 11a may serve to transmit and receive information for controlling smart home appliances.

Additionally, the network interacting with electronic devices within the scope of area 11a may be, for example, a local area network or a local network. The network interacting with electronic devices outside the scope of area 11a may be, for example, a remote network or a global network.

Since the detailed description of the operation of the networks shown in FIG. 1C is the same as previously described, redundant description will be omitted.

Meanwhile, referring to FIG. 1C, there may be one or more electronic devices connected through a network outside the range of area 11a, and in this case, the electronic devices may distribute data to each other or perform one or more operations separately.

Alternatively, when there is more than one electronic device connected through a network outside the range of area 11a, the electronic devices may perform various operations independently of each other.

Meanwhile, according to an embodiment of the present invention, as shown in FIG. 3, the computing device 100 may include a network unit 110, a memory 120, and a processor 130. The components included in the above-described computing device 100 are exemplary, and the scope of the present invention is not limited to the above-described components. That is, depending on the implementation aspect of the embodiments of the present invention, additional components may be included or some of the above-described components may be omitted.

According to one embodiment of the present invention, the computing device 100 may include a network unit 110 that transmits and receives data with the user terminal 10 and the external server 20. The network unit 110 may transmit and receive, to and from other computing devices, servers, etc., data for performing a method for analyzing a sleep state based on sleep sound information according to an embodiment of the present invention. That is, the network unit 110 may provide a communication function between the computing device 100, the user terminal 10, and the external server 20. For example, the network unit 110 may receive sleep sound information from the user terminal 10 and transmit sleep state information corresponding to the received sleep sound information to the user terminal 10. Additionally, for example, the network unit 110 may receive sleep checkup records and electronic health records for a plurality of users from a hospital server. Additionally, the network unit 110 may allow information to be transferred between the computing device 100, the user terminal 10, and the external server 20 by calling a procedure on the computing device 100.

The network unit 110 according to an embodiment of the present invention may use various wired/wireless communication systems, such as the network described above. Since the description of the network was previously described, overlapping descriptions will be omitted.

According to an embodiment of the present invention, the memory 120 may store a computer program for performing a method of generating a sleep analysis model that predicts the sleep state based on sound information and of analyzing the sleep state through sleep sound information, and the stored computer program may be read and driven by the processor 130. Additionally, the memory 120 may store any type of information generated or determined by the processor 130 and any type of information received by the network unit 110. Additionally, the memory 120 may store data related to the user's sleep. For example, the memory 120 may temporarily or permanently store input/output data (e.g., sleep sound information related to the user's sleep environment, sleep state information corresponding to sleep sound information, etc.).

According to one embodiment of the present invention, the memory 120 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD memory), a read-only memory, a magnetic memory, a magnetic disk, and an optical disk. The computing device 100 may operate in connection with web storage that performs the storage function of the memory 120 on the Internet. The description of the memory above is only an example, and the present invention is not limited thereto.

According to one embodiment of the present invention, the processor 130 may be composed of one or more cores, and may include a processor for data analysis, machine learning, or deep learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of a computing device.

The processor 130 according to an embodiment of the present invention may read a computer program stored in the memory 120 and perform data processing for model learning. According to one embodiment of the present invention, the processor 130 may perform calculations for learning a neural network, such as processing input data for learning in machine learning or deep learning, extracting features from input data, calculating errors, and updating the weights of the neural network using backpropagation.

Additionally, at least one of the CPU, GPGPU, and TPU of the processor 130 may process learning of the network function. For example, CPU and GPGPU can work together to process learning of network functions and data classification using network functions. Additionally, in one embodiment of the present invention, the processors of a plurality of computing devices can be used together to process learning of network functions and data classification using network functions. Additionally, a computer program executed in a computing device according to an embodiment of the present invention may be a CPU, GPGPU, or TPU executable program.

In this specification, network function may be used interchangeably with artificial neural network or neural network. In this specification, a network function may include one or more neural networks, and in this case, the output of the network function may be an ensemble of the outputs of one or more neural networks. Additionally, in this specification, the model may include a network function. A model may include one or more network functions, in which case the output of the model may be an ensemble of the outputs of one or more network functions.

The processor 130 may read a computer program stored in the memory 120 and execute a sleep analysis model according to an embodiment of the present invention. According to an embodiment of the present invention, the processor 130 may perform calculations to calculate sleep analysis information based on sleep sensing data. Alternatively, according to an embodiment of the present invention, the processor 130 may perform calculations to learn a sleep analysis model.

According to one embodiment of the present invention, the processor 130 may typically process the overall operation of the computing device 100. The processor 130 can provide or process appropriate information or functions for the user terminal by processing signals, data, information, etc. input or output through the components discussed above, or by running an application program stored in the memory 120.

According to one embodiment of the present invention, the processor 130 may acquire a plurality of learning data to perform learning on a neural network (or one or more network functions). The plurality of learning data may be related to a plurality of sleep sound information related to each of a plurality of users. The processor 130 may acquire a plurality of sleep sound information related to the sleep of a plurality of users, and may generate a sleep analysis model by performing learning on one or more network functions through a learning data set including the plurality of sleep sound information.

According to an embodiment of the present invention, acquiring a plurality of sleep sound information may be acquiring or loading sleep sound information stored in the memory 120. In one embodiment, a plurality of sleep sound information may be received from the external server 20 through the network unit 110, and the received sleep sound information may be stored in the memory 120. Additionally, acquisition of sleep sound information may involve receiving or loading data from another storage medium, another computing device, or a separate processing module within the same computing device based on wired/wireless communication means.

Sleep status information

Meanwhile, in the present invention, sleep state information may be information related to whether the user is sleeping. Specifically, the sleep state information may include at least one of first sleep state information indicating that the user is before sleep, second sleep state information indicating that the user is sleeping, and third sleep state information indicating that the user is after sleep. In other words, when first sleep state information is inferred with respect to the user, the processor 130 may determine that the user is in a pre-sleep state (i.e., before going to bed); when second sleep state information is inferred, it may determine that the user is in a sleeping state; and when third sleep state information is obtained, it may determine that the user is in a post-sleep state (i.e., after waking up).

This sleep state information may be obtained based on environmental sensing information or actigraphy. Environmental sensing information may be sensing information obtained in a non-contact manner in the space where the user is located. For example, the processor 130 may extract sleep state information based on acquired environmental sensing information (sound information related to cleaning, sound information related to food cooking, sound information related to watching TV, sleep sound information acquired during sleep, etc.), actigraphy, biometric information, etc. Here, sleep sound information acquired during the user's sleep may include sounds generated as the user tosses and turns during sleep, sounds related to muscle movements, or breathing sounds during sleep. That is, sleep sound information in the present invention may mean sound information related to movement patterns and breathing patterns related to the user's sleep.

In addition, sleep state information according to an embodiment of the present invention may include, in addition to sleep stage information, various information related to sleep, such as information related to breathing during sleep, bruxism information, whether the user coughs, the degree of coughing, whether the user sneezes, tossing and turning information, sleep talking information, and sleep events.

Sleep stage information

According to one embodiment, the processor 130 may extract sleep stage information. Sleep stage information may be extracted based on the user's environmental sensing information. Sleep stages can be divided into NREM (non-REM) sleep and REM (rapid eye movement) sleep, and NREM sleep can be further divided into multiple stages (e.g., two stages of light and deep sleep, or four stages of N1 to N4). The sleep stages may be defined as the generally accepted sleep stages, but may also be set arbitrarily by the designer. Through sleep stage analysis, it is possible to predict not only sleep quality but also various sleep events, such as sleep disorders (e.g., sleep apnea) and their underlying causes (e.g., snoring).

Sleep environment information

In an embodiment, sleep environment information of the present invention may be obtained through the user terminal 10. Sleep environment information may refer to information related to sleep obtained in the space where the user is located. Sleep environment information may be sensing information obtained in a space where the user is located using a non-contact method. Sleep environment information may be information related to the user's sleep obtained from a smart watch, smart home appliance, etc.

For example, sleep environment information may be acoustic information obtained in the bedroom where the user sleeps. According to an embodiment, sleep environment information acquired through the user terminal 10 may be information that serves as the basis for obtaining the user's sleep state information in the present invention. For a specific example, sleep state information related to whether the user is before, during, or after sleep may be obtained through sleep environment information obtained in relation to the user's activities. As another specific example, the sleep environment information may include various information such as the user's heart rate, the user's breathing, illumination level, and noise information regarding the user's sleep environment.

In addition, sleep environment information may be at least one of noise information commonly occurring in daily life (e.g., sound information related to cleaning, sound information related to food cooking, sound information related to watching TV, cat sounds, dog sounds, bird sounds, car sounds, wind noise, rain sounds, etc.) or other biometric information (e.g., electrocardiogram, brain wave, pulse information, information on muscle movement, etc.).

Data for sleep analysis

Data according to an embodiment of the present invention may be raw acoustic information collected through a microphone. Here, raw acoustic information may be information in the time domain having amplitude, phase, and frequency.

Additionally, data according to an embodiment of the present invention may be raw sound information converted into information including changes in the frequency components of the raw sound information along the time axis.

Alternatively, data according to an embodiment of the present invention may be raw acoustic information converted into information in the frequency domain rather than information in the time domain. Here, Fourier Transform or Wavelet Transform can be performed to convert the information into frequency domain information.

Additionally, information converted to the frequency domain according to an embodiment of the present invention may be information having amplitude and frequency.

Additionally, data according to an embodiment of the present invention may correspond to a spectrogram obtained by converting acoustic information into information in the frequency domain.
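The conversion from raw time-domain acoustic information to a spectrogram described above can be illustrated with a short-time Fourier transform. The following is only a minimal sketch under stated assumptions: the function name `spectrogram`, the frame length of 400 samples, the hop of 160 samples, the Hann window, and the 16 kHz sampling rate are all hypothetical choices for illustration and are not specified in this description.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (n_frames, frame_len // 2 + 1): rows follow
    the time axis and columns follow the frequency axis.
    frame_len and hop are illustrative values, not taken from the source.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Fourier transform of each frame; keep the amplitude only.
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical input: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 400  # bin index converted to Hz
```

Each row of `spec` holds the frequency components of one short time frame, so the array as a whole carries the change of the frequency components along the time axis.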

Alternatively, data according to an embodiment of the present invention may be a Mel spectrogram in which the Mel scale is applied to the spectrogram. Specifically, a Mel spectrogram can be obtained by applying a Mel filter bank to the spectrogram. In general, the part of the human cochlea that vibrates differs depending on the frequency of voice data. In addition, the human cochlea detects frequency changes well in low frequency bands and has difficulty detecting frequency changes in high frequency bands. Accordingly, the processor 130 can obtain a Mel spectrogram by applying a Mel filter bank to the spectrogram, so that voice data is recognized in a manner similar to the characteristics of the human cochlea. That is, the Mel filter bank may apply narrow filters in the low frequency band and progressively wider filters toward higher frequencies. The Mel spectrogram may therefore include frequency components that reflect human hearing characteristics. In the present invention, the spectrogram generated in response to sleep sound information and subject to analysis using a neural network may include the Mel spectrogram described above.
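As an illustration of the Mel filter bank described above, the following sketch builds triangular filters whose edges are equally spaced on the Mel scale. It assumes the commonly used conversion mel = 2595 log10(1 + f/700); the function names, the number of filters (20), the FFT size (512), and the sampling rate (16 kHz) are hypothetical, and the random `spec` array merely stands in for a real magnitude spectrogram.

```python
import numpy as np

def hz_to_mel(hz):
    # Common Mel-scale formula (an assumption; the source names no formula).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Filter edges are equally spaced in Mel, hence unevenly spaced in Hz.
    edges_hz = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2)
    )
    bank = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, center, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - lo) / (center - lo)
        falling = (hi - bin_hz) / (hi - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, edges_hz

bank, edges_hz = mel_filter_bank(n_mels=20, n_fft=512, sample_rate=16000)
widths = edges_hz[2:] - edges_hz[:-2]  # bandwidth of each filter in Hz

# Stand-in magnitude spectrogram (10 frames x 257 frequency bins).
spec = np.abs(np.random.default_rng(0).normal(size=(10, 257)))
mel_spec = spec @ bank.T  # Mel spectrogram, shape (10, 20)
```

The `widths` array grows with the filter index, which corresponds to the finer frequency resolution at low frequencies and the coarser resolution at high frequencies noted above.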

Here, information converted into the frequency domain, such as a spectrogram or Mel spectrogram according to an embodiment of the present invention, has amplitude and frequency, and may have been converted so as to contain changes in the frequency components of the acoustic information along the time axis.

Additionally, data according to embodiments of the present invention is a visualization of the above-described information and can be input into an artificial intelligence model based on image processing. For example, raw acoustic information converted into information including changes in the frequency components along the time axis can be visualized and used as input to an artificial intelligence model. Alternatively, information converted to the frequency domain can be visualized and used as input to an artificial intelligence model. The artificial intelligence model into which the above information is input may be an image processing-based artificial intelligence model.

Meanwhile, according to an embodiment of the present invention, sleep sound information among sleep state information may be collected through polysomnography (PSG) in a hospital environment, or may be collected by a user in a home environment through a microphone built into a user terminal such as a wearable device or smartphone.

The data set according to an embodiment of the present invention can be collected and constructed through electroencephalography (Video-EEG, video-electroencephalography) during polysomnography (PSG) or through a microphone during polysomnography (PSG).

Alternatively, the data set according to an embodiment of the present invention may be constructed by collecting acoustic signals generated during sleep through a microphone built into an electronic device such as a user terminal.

How to derive sleep analysis results

Below, a method in which the processor 130 derives the final sleep analysis result using biometric information (Bio-Signal), movement information (ACTIGRAPHY), and sleep sound information (SOUND) will be described.

First, the processor 130 may derive the final sleep analysis result using weights. Specifically, the processor 130 may derive the second sleep analysis result by applying the same weight to the first sleep analysis result and to the sleep analysis result using sleep sound information. Alternatively, the processor 130 may derive the second sleep analysis result by applying different weights to the first sleep analysis result and to the sleep analysis result using sleep sound information. For example, the final second sleep analysis result can be derived by placing a 30% weight on the contact-type first sleep analysis based on HRV and actigraphy, and a 70% weight on the AI analysis using sleep sounds.
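The weighting described above can be sketched as a weighted average of per-epoch stage probabilities. This is an illustrative sketch only: the description does not state what form the two analysis results take, so representing each as a probability distribution over four stages, the `STAGES` label set, and the function name `fuse_weighted` are all assumptions. Only the 30%/70% split comes from the example in the text.

```python
import numpy as np

STAGES = ("wake", "light", "deep", "rem")  # hypothetical label set

def fuse_weighted(first_probs, sound_probs, w_first=0.3, w_sound=0.7):
    """Combine two per-epoch stage distributions with fixed weights.

    first_probs: contact-type analysis (e.g., HRV and actigraphy), (n_epochs, n_stages)
    sound_probs: sound-based AI analysis, same shape
    """
    first_probs = np.asarray(first_probs, dtype=float)
    sound_probs = np.asarray(sound_probs, dtype=float)
    fused = w_first * first_probs + w_sound * sound_probs
    # Renormalize so each epoch still sums to 1 even if the weights do not.
    fused = fused / fused.sum(axis=1, keepdims=True)
    return fused, [STAGES[i] for i in fused.argmax(axis=1)]

# Made-up probabilities for two epochs.
first = [[0.6, 0.3, 0.05, 0.05], [0.1, 0.2, 0.6, 0.1]]
sound = [[0.2, 0.6, 0.1, 0.1], [0.1, 0.3, 0.5, 0.1]]
fused, stages = fuse_weighted(first, sound)
# stages -> ['light', 'deep']
```

Setting `w_first` equal to `w_sound` gives the equal-weight variant mentioned in the same paragraph.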

In another embodiment, the processor 130 may derive the final sleep analysis result by determining that the user has entered a given sleep stage only when the sleep stages in the first and second sleep analysis results completely match.
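The agreement rule above can be sketched as follows. The description does not say what is reported for a time period in which the two results disagree, so returning an `undecided` placeholder (here `None`) is an assumption made for illustration, as are the function name and the stage labels.

```python
def fuse_by_agreement(first_stages, second_stages, undecided=None):
    """Accept a stage for a time period only when both analyses agree.

    Periods where the results differ are marked `undecided`; how such
    periods are handled is not specified in the source.
    """
    if len(first_stages) != len(second_stages):
        raise ValueError("both results must cover the same time periods")
    return [a if a == b else undecided
            for a, b in zip(first_stages, second_stages)]

# Made-up per-period stage labels.
first_result = ["wake", "light", "deep", "rem"]
second_result = ["wake", "light", "light", "rem"]
final = fuse_by_agreement(first_result, second_result)
# final -> ['wake', 'light', None, 'rem']
```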

In another embodiment, the processor 130 may use a method of training an AI sleep analysis model that takes at least one of biometric information (Bio-Signal), movement information (ACTIGRAPHY), and sleep sound information (SOUND) as input. The learning method of the AI sleep analysis model will be explained in more detail below; briefly, an AI sleep analysis model that performs sleep analysis based on one or more factors can be created by feeding one or more types of information into the input layer of the artificial intelligence model.

In another embodiment, the processor 130 first performs a secondary sleep analysis on sleep sound information (SOUND) using an AI sleep analysis model described later, and additionally extracts the AI's certainty for the sleep stage in each time period. If the extracted certainty is less than a predetermined value, the sleep stage derived by the first sleep analysis is adopted as the sleep stage for the corresponding time period. In other words, more reliable sleep analysis results can be derived by relying mainly on the second sleep analysis results and additionally adopting the first sleep analysis results.
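The certainty-based fallback described above can be sketched as follows. The threshold of 0.6, the function name, and the use of a single scalar certainty per time period are hypothetical; the description only states that a "predetermined value" is used.

```python
def fuse_with_fallback(first_stages, sound_stages, sound_certainty, threshold=0.6):
    """Prefer the sound-based stage, falling back when certainty is low.

    For each time period, the sound-based (secondary) stage is kept when
    its certainty is at least `threshold`; otherwise the stage from the
    first sleep analysis is adopted. The threshold value is illustrative.
    """
    if not (len(first_stages) == len(sound_stages) == len(sound_certainty)):
        raise ValueError("all inputs must cover the same time periods")
    return [s if c >= threshold else f
            for f, s, c in zip(first_stages, sound_stages, sound_certainty)]

# Made-up labels and certainties for three time periods.
first_result = ["light", "deep", "rem"]
sound_result = ["wake", "deep", "light"]
certainty = [0.45, 0.90, 0.75]
final = fuse_with_fallback(first_result, sound_result, certainty)
# final -> ['light', 'deep', 'light']
```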

In another embodiment, the processor 130 first secures statistics of parts that are inconsistent with actual analysis results in the AI sleep analysis model, which will be described later. Statistics may be entered by a user, but may also be independently obtained through data from multiple users. The processor 130 may center around the secondary sleep analysis results (SOUND-based analysis) and additionally adopt the primary sleep analysis results in areas where the obtained statistics do not match the actual analysis results.

In another embodiment, the processor 130 may use a method of training an AI sleep analysis model based on the primary sleep analysis results obtained from biometric information (Bio-Signal) and movement information (ACTIGRAPHY), together with sleep sound information (SOUND). The learning method of the AI sleep analysis model will be explained in more detail below; briefly, by feeding two pieces of information (the first sleep analysis result and sleep sound information) into the input layer of the artificial intelligence model, an AI sleep analysis model that performs sleep analysis based on two factors can be created.

Sleep stages can be divided into NREM (non-REM) sleep and REM (rapid eye movement) sleep, and NREM sleep can be further divided into multiple stages (e.g., two stages of light and deep sleep, or four stages of N1 to N4). Sleep stage settings may be defined based on generally accepted sleep stages, but may also be set arbitrarily by the designer. Through sleep stage analysis, not only sleep quality but also sleep diseases (e.g., sleep apnea) and their underlying causes (e.g., snoring) can be predicted.

A method for learning and predicting a plurality of sleep state information according to an embodiment of the present invention will be described using an example. However, the specific description related to the sleep state below is only an example, and the present invention is not limited thereto. It should be understood that learning may also be performed on other sleep state information not mentioned here (e.g., sleep event information such as movement information, snoring information, or sleep disease information) and/or on differences in sleep state information according to differences in environment (e.g., race, etc.).

Acoustic information acquired over a long time interval may be required to learn or predict sleep stage information. In addition, in order to learn or predict sleep state information other than sleep stage information (e.g., sleep event information such as snoring or apnea information), acoustic information obtained during a relatively short time interval (e.g., 1 minute) before and after the corresponding sleep state occurs may be required.

An embodiment of the present invention may include a method of learning a plurality of sleep state information by converting the acquired acoustic information into information including changes in the frequency components along the time axis, information in the frequency domain, or a spectrogram, and using it as input to the same artificial intelligence model. Alternatively, it may include a method of learning a plurality of sleep state information by visualizing information including changes in the frequency components of the acquired acoustic information along the time axis and using it as input to an image processing-based artificial intelligence model.

For example, by using the information of the spectrogram according to an embodiment of the present invention as input to the same feature extraction model, the output information can be input to different feature classification models to perform learning.

Through this method, learning can be performed to predict various sleep state information based on one sound information.

Alternatively, through this method, various sleep state information can be complementarily learned based on one sound information. For example, an artificial intelligence model that only learns sleep stages may incorrectly predict a wake state if a state that generates loud noise, such as apnea or snoring, is recognized. On the other hand, in the case of an artificial intelligence model designed to learn multiple sleep state information, the above problem can be prevented as a result of complementary learning of other sleep state information such as apnea or snoring in addition to sleep stage.
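The arrangement described above, in which one feature extraction model feeds several feature classification models, can be sketched as a forward pass with a shared extractor and separate heads. This is only a structural illustration under stated assumptions: the weights are random and untrained, the extractor is a single projection with mean pooling over time, and the class name, head names (`sleep_stage`, `snoring`, `apnea`), and layer sizes are hypothetical and do not reflect the actual model of the invention.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SharedExtractorWithHeads:
    """One feature extractor shared by several classification heads."""

    def __init__(self, n_mels, hidden, head_sizes):
        # Random, untrained weights: this shows the data flow only.
        self.w_feat = rng.normal(scale=0.1, size=(n_mels, hidden))
        self.heads = {
            name: rng.normal(scale=0.1, size=(hidden, n_classes))
            for name, n_classes in head_sizes.items()
        }

    def extract(self, spec):
        # spec: (n_frames, n_mels). ReLU projection, then mean over time.
        return np.maximum(spec @ self.w_feat, 0.0).mean(axis=0)

    def predict(self, spec):
        features = self.extract(spec)  # computed once, reused by every head
        return {name: softmax(features @ w) for name, w in self.heads.items()}

model = SharedExtractorWithHeads(
    n_mels=20,
    hidden=16,
    head_sizes={"sleep_stage": 4, "snoring": 2, "apnea": 2},
)
spec = np.abs(rng.normal(size=(30, 20)))  # stand-in Mel spectrogram
outputs = model.predict(spec)
```

Because every head reads the same features, a training signal from one task (such as apnea) would also shape the features used for another (such as sleep stage), which is one way to read the complementary learning described above.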

The specific description regarding the above-mentioned time interval and the specific description regarding sleep state information are merely examples for explaining the present invention, and the present invention is not limited thereto.

Acquisition of sleep acoustic information

Sleep sound information is information related to sleep sounds, and may include, for example, sounds generated as the user tosses and turns during sleep, sounds related to muscle movement, or sounds related to the user's breathing during sleep. That is, the sleep sound information of the present invention may include sound information related to the user's movement patterns and breathing patterns during sleep.

In the present invention, sleep sound information is related to sounds of breathing and body movement, so it may be a very quiet sound. Accordingly, the processor 130 may perform sound analysis by converting the sleep sound information into information containing changes in the frequency components of the raw sleep sound information along the time axis, or into a spectrogram. In this case, as described above, the converted information shows how the frequency spectrum of the sound changes over time, so breathing or movement patterns can be easily identified and the efficiency of analysis can be improved.

According to one embodiment of the present invention, as shown in FIG. 8, each spectrogram may be configured to have a frequency spectrum of different concentration according to various sleep stages. In other words, it may be difficult to predict whether the sleep sound information is at least one of the awake state, REM sleep state, light sleep state, and deep sleep state based solely on changes in the energy level of the sleep sound information, but by converting the sleep sound information into a spectrogram, each frequency Since changes in the spectrum can be easily detected, analysis corresponding to small sounds (e.g., breathing and body movements) may be possible. In other words, by imaging the sound and analyzing the image pattern, it can be possible to analyze low-quality sound.

According to one embodiment of the present invention, sleep sound information, which is the basis for sleep state analysis, may include various noises. In an embodiment, sleep sound information acquired corresponding to each of a plurality of users may be acquired in a different bedroom environment for each user and may include different types of noise. Consistent analysis or prediction of sleep sound information using a sleep analysis model may be difficult due to the influence of various types of noise as described above.

Specifically, even if the user's sleep sound is the same, the sleep sound information actually obtained may differ due to various noises generated depending on the bedroom environment or sound measurement device, and accordingly the neural network model (i.e., sleep analysis model) may output different prediction information (i.e., sleep state information). For example, even if the sleep sounds generated during the user's sleep are all the same, different sleep sound information may be obtained, as shown in FIG. 2, depending on environmental factors related to the space where the user sleeps and differences between the devices that acquire the sleep sounds. FIG. 2 exemplarily shows that different types of sleep sound information are obtained when the same sound is acquired through various sleep environments and/or various measuring devices.

For a more specific example, the background noise of the sleep sound information obtained may be different depending on the size or structure of the bedroom where the user sleeps. In addition, for example, the sleep sound information obtained may be different due to various noises related to sounds generated in the space where the user sleeps (e.g., sounds of air conditioners, fans, pets, or refrigerators, etc.).

For another example, depending on the type of sound measurement device used to obtain sleep sound information, the sleep sound information obtained in response to the same sleep sound may differ. As a specific example, when first sleep sound information is acquired through a first user terminal and second sleep sound information is acquired through a second user terminal in response to the same sleep sound, the first sleep sound information and the second sleep sound information may not be completely identical due to differences between the microphone modules provided in the two devices. Because the microphone module used by each device is different, sleep sound information may include various types of noise.

In other words, the sleep sound information that a user wishes to obtain and analyze personally contains various noises, because it is acquired through a different bedroom environment and a different sound measurement device for each user, so consistent analysis using the sleep analysis model can be difficult.

For example, a sleep analysis model can be created by training a neural network using data acquired in environments without noise as learning data. As a specific example, a user may sleep in a space (e.g., a hospital) that has no noise or contains only predetermined noise and is equipped with an acoustic measurement device of predetermined performance, and sleep sound information may be acquired during that sleep. The sleep sound information acquired in this way can be labeled with the correct answer (i.e., sleep stage) for each time point of the time-series sleep sound by a medical professional (e.g., a sleep technician), and learning on the neural network can be performed using the sleep sound and the labeled data to create a sleep analysis model.

However, because the sleep analysis model generated in the above manner is trained on learning data in which the correct answers are labeled on sleep sound information acquired through a high-performance microphone module in a sleep environment that has little or no noise or contains only predefined noise, it is difficult to expect robust performance for acoustic data acquired in various noisy environments. In addition, since it is difficult to confirm that the model is robust across various environments, there is a concern that generalizing the sleep analysis model will not be easy.

In particular, since the present invention is intended to easily provide sleep state information in the user's real life, accurate analysis should be possible even for sleep sound information containing various noises depending on the bedroom environment and sound measurement device of each individual user.

According to an embodiment of the present invention, an electronic device may generate a sleep analysis model that performs robustly on sleep sound information including various background noises, through adaptive learning of a neural network based on sleep sound information related to different domains. Sleep sound information related to different domains may mean sleep sound information acquired by different methods in different environments.

The specific method by which an electronic device according to an embodiment of the present invention generates a sleep analysis model with robust performance even on sleep sound information including various background noises, through adaptive learning of a neural network based on sleep sound information related to different domains, and the specific method of providing sleep state information through the sleep analysis model, will be described in detail later. In addition, according to an embodiment of the present invention, preprocessing may be performed to remove or reduce noise, or sleep state information may be obtained based on acoustic information to which noise has been added. These methods will also be described in detail later.

Meanwhile, according to an embodiment of the present invention, the processor 130 may obtain sleep state information based on acoustic information, actigraphy, and biometric information obtained from the user terminal 10. Specifically, the processor 130 may identify a singularity where information of a preset pattern is sensed in the acoustic information. Here, the preset pattern information may be related to breathing and movement patterns related to sleep.

For example, in the awake state, all nervous systems are activated, so breathing patterns may be irregular and body movements may be frequent. Additionally, breathing sounds may be very low because the neck muscles are not relaxed. On the other hand, when the user sleeps, the autonomic nervous system stabilizes, so breathing becomes regular, body movements may decrease, and breathing sounds may become louder. That is, the processor 130 may identify, as a singular point in the sound information, the point in time at which sound information of a preset pattern related to regular breathing, small body movement, small breathing sounds, etc., is detected. Additionally, the processor 130 may obtain sleep sound information based on the sound information obtained with reference to the identified singularity. In other words, the processor 130 may identify a singularity related to the user's sleep time from the sound information acquired in time series and obtain sleep sound information based on the singularity.

Figure 4 is a diagram for explaining the process of acquiring sleep sound information in the sleep analysis method according to the present invention. Referring to FIG. 4, the processor 130 may identify a singularity (P) related to the user's sleep from the acoustic information (E). The processor 130 may acquire sleep sound information (SS) based on the identified singular point (P) and sound information acquired after the singular point (P). The waveforms and singularities related to sound in FIG. 4 are merely examples for understanding the present invention, and the present invention is not limited thereto.

In other words, the processor 130 identifies the singularity (P) related to the user's sleep from the acoustic information, and can thereby extract and obtain only the sleep acoustic information (SS) from the vast amount of environmental sensing information (i.e., acoustic information) based on the singularity (P). This provides convenience by automating the process of recording the user's sleep time, and can also contribute to improving the accuracy of the acquired sleep sound information.

Additionally, in an embodiment, the processor 130 may obtain sleep state information related to whether the user is before sleep or in sleep based on the singular point (P) identified from the sound information (E). Specifically, if the singular point (P) is not identified, the processor 130 may determine that the user is before sleep, and if the singular point (P) is identified, the processor 130 may determine that the user is sleeping after the singular point (P). In addition, after the singular point (P) is identified, the processor 130 may identify a time point (e.g., wake-up time) at which the preset pattern is no longer observed, and when that time point is identified, may determine that the user has woken up after sleeping.

That is, the processor 130 can obtain sleep state information related to whether the user is before, during, or after sleep based on whether a singular point (P) is identified in the acoustic information (E) and whether sleep is continuously detected after the singular point is identified.
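
The before/during/after decision described above can be sketched as follows. This is only an illustration under stated assumptions: the acoustic information is assumed to have already been reduced to one boolean per analysis window indicating whether the preset sleep-related pattern was observed, and the function name `label_sleep_phases` and the persistence threshold `min_run` are hypothetical and do not appear in the disclosure.

```python
from typing import List, Optional, Tuple


def label_sleep_phases(
    pattern_seen: List[bool], min_run: int = 3
) -> Tuple[List[str], Optional[int], Optional[int]]:
    """Label each window as 'before', 'sleep', or 'after'.

    pattern_seen[i] is True when the preset breathing/movement pattern
    was observed in window i. min_run (an assumed parameter) is the
    number of consecutive windows that must agree before a transition
    is accepted.
    """
    n = len(pattern_seen)
    singular = None  # index of the singular point P (sleep onset)
    wake = None      # first index at which the pattern is no longer observed
    for i in range(n - min_run + 1):
        window = pattern_seen[i:i + min_run]
        if singular is None:
            if all(window):
                singular = i
        elif not any(window):
            wake = i
            break

    labels = []
    for i in range(n):
        if singular is None or i < singular:
            labels.append("before")
        elif wake is None or i < wake:
            labels.append("sleep")
        else:
            labels.append("after")
    return labels, singular, wake
```

If no singular point is found, every window is labeled as before sleep, which matches the rule that the user is regarded as not yet asleep until the singular point (P) is identified.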

Meanwhile, the processor 130 may obtain sleep state information based on actigraphy or biometric information rather than acoustic information (E). It may be advantageous to obtain the user's movement information through a sensor unit in contact with the body. In the present invention, since the user's sleep state information is identified in advance using actigraphy or biometric information during the first sleep analysis, the reliability of the sleep state analysis can be further improved.

Meanwhile, the technical idea of obtaining sleep state information based on the above-described sleep-related pattern information or singularity is merely an example, and the present invention is not limited to performing inference based on preset pattern information or singularity. The present invention may also include performing inference through an artificial intelligence model created to obtain sleep state information.

Sleep acoustic information acquired in various environments

According to one embodiment of the present invention, the plurality of sleep sound information may include sleep sound information related to different domains. Specifically, a plurality of sleep sound information may be configured to include a plurality of source data and a plurality of target data. The plurality of source data and the plurality of target data are information about sleep sounds related to different domains, and may be characterized as being acquired in different sleep environments.

As an example, the plurality of source data may be acoustic data acquired in a professional sleep measurement environment (e.g., polysomnography) and may be related to the first domain, and the plurality of target data may be acoustic data acquired in the daily sleep environment of individual users and may be related to the second domain. For example, the plurality of target data may be a large amount of data (i.e., sleep sound information) acquired from multiple users as the computing device 100 provides a sleep analysis service. For another example, the plurality of target data may be sleep sound information acquired through the microphone module of the user terminal 10.

In an embodiment of the present invention, the plurality of source data may be sleep sound information obtained in a space (e.g., a hospital) where noise is absent or only predetermined noise is present, and which is equipped with an acoustic measurement device of predetermined performance. Additionally, in one embodiment of the present invention, the plurality of source data is data labeled with information about a plurality of sleep states by a medical professional (e.g., a sleep technician), and may be data containing predefined noise.

In addition, in an embodiment of the present invention, the plurality of target data may be sleep sound information that includes various types of noise depending on each individual's bedroom environment, or that is acquired through each of a plurality of user terminals equipped with different microphone modules. Additionally, in one embodiment of the present invention, the plurality of target data may be data in which information about the plurality of sleep states is not labeled and may be data containing undefined noise.

That is, the plurality of source data may be data related to a plurality of sleep sound data acquired through equipment set up in a professional institution (e.g., a hospital) with minimal noise, and the plurality of target data may include data related to sleep sound data individually acquired from each individual user. Because the plurality of target data is acquired in different ways depending on each user's bedroom environment, it contains various noises and may be acoustic data for which the correct answer regarding sleep state (e.g., sleep stage) is not labeled.

In the case of multiple target data, various noises may be included depending on the bedroom environment of each user. For example, even if the sleep sound is the same, the sleep sound information obtained may be different depending on the size or structure of the individual user's bedroom and the difference in distance between the user and the sound measuring device. That is, sleep sound information (i.e., a plurality of target data) including different background noise can be obtained depending on the size and shape of the sleeping space and the location of the sound measurement device when the user sleeps.

Additionally, for example, the plurality of target data may include various noises related to sounds generated in a space where the user sleeps. For example, the space where the user sleeps may contain various noises, such as the sound of electronic appliances such as air conditioners and fans operating, or the sounds of pets.

For another example, the plurality of target data may contain various noises depending on the type of sound measurement device used to obtain sleep sound information. For a specific example, when first sleep sound information is acquired through a first user terminal and second sleep sound information is acquired through a second user terminal in response to the same sleep sound, the first sleep sound information and the second sleep sound information may not be completely identical due to differences in the specifications of the microphone modules provided in the two devices. Because the microphone module used differs from device to device, sleep sound information may include various types of noise.

That is, as described above, the plurality of target data is acoustic data acquired in the individual sleeping environment of each of the plurality of users, and may include more diverse noises.

According to the embodiment, each of the plurality of target data may be acquired through the user terminal 10 carried by the user. For example, sleep sound information related to the user's sleep environment may be obtained through a microphone module provided in the user terminal 10.

In general, the microphone module provided in the user terminal 10 carried by the user may be configured as a MEMS (Micro-Electro Mechanical System) since it must be provided in the user terminal 10 of a relatively small size. These microphone modules can be manufactured very small, but can have a lower signal-to-noise ratio (SNR) than condenser microphones or dynamic microphones. A low signal-to-noise ratio may mean that the ratio of noise, which is a sound that is not to be identified, to the sound that is to be identified is high, making it difficult to identify the sound (i.e., unclear). Therefore, it is necessary to remove or alleviate noise, which will be described in detail below.
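
For reference, the signal-to-noise ratio mentioned above is conventionally expressed in decibels as follows. The symbols $P_{\text{signal}}$ and $P_{\text{noise}}$ (the average power of the sound to be identified and of the noise, respectively) are notation introduced here for illustration and are not taken from the disclosure.

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\!\left( \frac{P_{\text{signal}}}{P_{\text{noise}}} \right)
```

For example, if the power of the breathing sound is four times the power of the noise, the signal-to-noise ratio is approximately 6 dB, and if the two powers are equal it is 0 dB.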

Noise reduction preprocessing

As shown in Figure 14 or Figure 5, sound information extracted from the user or sleep sound information (raw data) extracted therefrom undergoes a pre-processing process of noise reduction. In the noise reduction process, noise (e.g. white noise) included in raw data is removed. The noise reduction process can be accomplished using algorithms such as spectral gating and spectral subtraction to remove background noise. Furthermore, in the present invention, a noise removal process can be performed using a deep learning-based noise reduction algorithm. The deep learning-based noise reduction algorithm can use a noise reduction algorithm specialized for the user's breathing or breathing sounds, that is, a noise reduction algorithm learned through the user's breathing or breathing sounds.

Preprocessing may be performed during the learning process of sleep state information, or may be performed during the inference process. Below, an example of the preprocessing process for noise reduction will be described.

Spectral noise gating

Spectral gating or spectral noise gating is a preprocessing method for acoustic information. Noise reduction can be performed on all of the acquired acoustic information, but splitting can be performed at regular time intervals (eg, 5 minutes, etc.), and then noise reduction can be performed on each of the split acoustic information. In order to perform noise reduction on acoustic information split at regular time intervals, a method of calculating a spectrum for each frame may first be included.

Among each spectrum frame calculated as a result, the frame with the frequency spectrum with the lowest energy can be specified.

A method may be included in which the frame having the frequency spectrum with the lowest energy among each spectrum frame is assumed to be static noise, and the frequency of the frequency spectrum frame assumed to be static noise is attenuated from the spectrum frame.

Meanwhile, according to an embodiment of the present invention, when noise reduction preprocessing is performed on a plurality of data, sleep sound information can be divided into one or more sound frames. Here, the minimum sound frame, that is, the frame with the lowest energy level, may be identified based on the energy level of each of the one or more sound frames. Accordingly, noise removal or reduction can be performed on the acoustic data based on the minimum sound frame.

As a specific example, the processor 130 may divide 30 seconds of sleep sound information (e.g., target data) into a plurality of very short sound frames of 40 ms each. Additionally, the processor 130 may identify the minimum sound frame with the minimum energy level by comparing the energy levels of the plurality of 40 ms sound frames.

The processor 130 may remove the identified minimum sound frame component from the entire sleep sound information (i.e., 30 seconds of sleep sound information). For example, referring to FIG. 14, as the minimum sound frame component is removed from the sleep sound information, preprocessed sleep sound information can be obtained. That is, the processor 130 may identify the minimum sound frame as the background noise frame and perform noise removal or reduction from the original signal (i.e., sleep sound information). The specific numerical description of the above-mentioned time interval is merely an example and is not limited thereto.
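
The minimum-frame approach described above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: it assumes non-overlapping rectangular 40 ms frames, a 16 kHz sampling rate, and plain magnitude spectral subtraction, none of which is specified in the text. The function name `reduce_noise_min_frame` is hypothetical.

```python
import numpy as np


def reduce_noise_min_frame(signal, sample_rate=16000, frame_ms=40):
    """Subtract the spectrum of the lowest-energy frame from every frame.

    Returns the processed signal and the index of the frame that was
    treated as background noise. Samples beyond the last full frame
    are dropped in this simplified version.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)

    # Per-frame spectrum, split into magnitude and phase.
    spectra = np.fft.rfft(frames, axis=1)
    magnitude = np.abs(spectra)
    phase = np.angle(spectra)

    # The frame with the minimum energy is assumed to be background noise.
    energy = np.sum(frames ** 2, axis=1)
    noise_idx = int(np.argmin(energy))
    noise_mag = magnitude[noise_idx]

    # Attenuate that noise magnitude in every frame (floored at zero).
    cleaned_mag = np.maximum(magnitude - noise_mag, 0.0)
    cleaned = cleaned_mag * np.exp(1j * phase)
    out = np.fft.irfft(cleaned, n=frame_len, axis=1).reshape(-1)
    return out, noise_idx
```

A practical implementation would more likely use overlapping windowed frames and a softer gain rule to limit audible artifacts; the hard subtraction here is chosen only to keep the sketch short.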

Deep learning-based noise reduction

Meanwhile, in order to perform noise reduction preprocessing according to an embodiment of the present invention, a deep learning-based noise reduction method performed on raw acoustic information in the time domain rather than the frequency domain may be used. For deep learning-based noise reduction, a method may be used in which information such as sleep sound information, which is necessary information to be used as input to a sleep analysis model, is maintained, and other sounds are attenuated.

Noise reduction can be performed not only on sound information obtained through PSG test results, but also on sound information acquired through a microphone built into a user terminal such as a smartphone.

Convert raw acoustic information to information in the frequency domain

The sleep analysis method according to the present invention creates an inference model through deep learning of acoustic information, and the inference model extracts the user's sleep state and sleep stage. To briefly explain again, environmental sensing information (sound information), including sleep sound information, is converted into information including changes in the frequency components of the sound information along the time axis, or into information in the frequency domain, and an inference model can be created based on the converted information.

Alternatively, according to one embodiment of the present invention, environmental sensing information including sleep sound information is converted into frequency domain information or a spectrogram, and an inference model is created based on the converted frequency domain information or spectrogram. Here, the information in the frequency domain may be information including changes along the time axis in the frequency components of raw sleep sound information.

Protection of user privacy cannot be overlooked in sleep analysis using acoustic information, and the present invention uses a process of preprocessing acoustic information to protect user privacy.

At this time, a method of converting raw acoustic information into information in the frequency domain or a spectrogram based only on the amplitude, excluding the phase, can be used. Through this method, not only is privacy protected, but the processing speed can also be improved by reducing the data size. However, in another embodiment, it is also possible to generate a spectrogram using both phase and amplitude.

One embodiment of the present invention can generate a sleep analysis model using a spectrogram (SP) converted based on sleep sound information (SS).

If the sleep sound information expressed as audio data is used as is, the amount of information is very large, so the amount of computation and the computation time increase significantly, and computation precision is lowered because unwanted signals are included. In addition, if all of the user's audio signals are transmitted to the server, there is a risk of privacy infringement.

Therefore, an embodiment of the present invention removes noise from sleep sound information using the above-described method, converts it into information in the frequency domain or a spectrogram, and learns the spectrogram to create a sleep analysis model, so that the amount of computation and the computation time can be reduced and individual privacy can be protected.

For example, in acoustic information acquired through a microphone, etc., the sleep acoustic information (e.g., the user's breathing sound, etc.) required for sleep stage analysis may be relatively quiet compared with other noise, but when converted to a spectrogram, the sleep acoustic information can be identified relatively well against the other surrounding noise.

On the other hand, when converting to a spectrogram according to an embodiment of the present invention, the resolution of the frequency domain may be lowered so that personal information cannot be identified. If the frequency resolution (number of frequency bins) is configured to be no more than a certain number (e.g., 20), personal information cannot be identified from a signal recovered from the spectrogram.

Additionally, according to an embodiment of the present invention, a method may be included to convert acquired acoustic information into a spectrogram in real time.

In addition, as compression of the frequency resolution of the spectrogram can be performed on the user's smartphone rather than on a server or cloud, leakage of personal information can be prevented.

At this time, de-identification of sound data can be done for natural language and breathing sounds, which can be converted into natural language conversion spectrogram, breathing sound conversion spectrogram, etc., respectively. In sleep analysis according to the present invention, calculation speed can be improved and calculation load can be reduced by using only the information necessary for the analysis model. Meanwhile, the spectrogram according to an embodiment of the present invention may be a Mel spectrogram to which the Mel scale is applied.

Method for converting raw sleep acoustic information

As shown in FIG. 6, the processor 130 may generate a spectrogram (SP) in response to the sleep sound information (SS). Raw data (raw acoustic information in the time domain) that is the basis for generating a spectrogram (SP) can be input, and raw data according to the present invention can also be collected through polysomnography (PSG) in a hospital environment. Additionally, user information in a home environment may be collected through a microphone built into a user terminal such as a wearable device or smartphone.

In addition, raw data may be acquired through the user terminal 10, such as a wearable device or smartphone, from a start point input by the user to an end point, or from the time the user operates the device (e.g., sets an alarm) to a time point corresponding to the device operation (e.g., the alarm time), or the time points may be automatically selected based on the user's sleep pattern. The time points may also be determined automatically based on sound, etc., or on changes in illumination.

The processor 130 may perform fast Fourier transform on the sleep sound information (SS) and convert it into information including changes in the frequency components of the sleep sound information (SS) along the time axis. Specifically, this information may be information in the frequency domain, and may be a spectrogram or a Mel spectrogram to which a Mel scale is applied. Information that includes changes along the time axis of these frequency components, information in the frequency domain, or a spectrogram (SP) is used to visualize and understand sound or waves, and may combine the characteristics of a waveform and a spectrum.

Additionally, this information may be visualized by representing the differences in amplitude, according to changes along the time axis and the frequency axis, as differences in print density or display color. When visualized in this way, the information can be used as input to an image processing-based artificial intelligence model to obtain sleep state information. By converting sound signals into image signals in this way, time-series sleep analysis becomes possible using data over a relatively long period of time, and the accuracy of sleep analysis can be higher than with analysis based on raw sleep sound information.

Preprocessed acoustic-related raw data can be cut into 30-second increments and converted into a spectrogram. Accordingly, a 30-second spectrogram has dimensions of 20 frequency bins x 1201 time steps. In the present invention, a rectangular spectrogram can be converted into a shape close to a square by using various methods such as reshaping, resizing, and split-cat. By using such methods, the amount of information can be preserved.
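
As an illustration of how a 30-second segment can yield a spectrogram of 20 frequency bins x 1201 time steps, the following sketch computes an amplitude-only spectrogram. The sampling rate (16 kHz), window length (50 ms), hop length (25 ms), and the pooling of adjacent FFT bins into 20 equal-width bands are all assumptions chosen to reproduce the stated dimensions; the disclosure does not specify them, and a Mel filterbank could be substituted for the band pooling. The function name `amplitude_spectrogram` is hypothetical.

```python
import numpy as np


def amplitude_spectrogram(signal, sample_rate=16000, win_ms=50,
                          hop_ms=25, n_bands=20):
    """Return an (n_bands, n_frames) amplitude spectrogram; phase is discarded."""
    win = int(sample_rate * win_ms / 1000)  # 800 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 400 samples at 16 kHz

    # Pad so that frames are centred on multiples of the hop length.
    padded = np.pad(np.asarray(signal, dtype=float), win // 2, mode="reflect")
    n_frames = 1 + (len(padded) - win) // hop

    # Gather overlapping frames and apply a Hann window.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(win)

    # Keep only the magnitude of the FFT of each frame.
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Lower the frequency resolution by averaging adjacent FFT bins.
    bands = np.array_split(mag, n_bands, axis=1)
    return np.stack([b.mean(axis=1) for b in bands], axis=0)
```

With these assumed parameters, 30 seconds of audio (480,000 samples) gives 1 + 480000 / 400 = 1201 frames, so the result has shape (20, 1201).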

Meanwhile, the present invention can use a method of simulating breathing sounds measured in various home environments by adding various noises occurring in the home environment to clean breathing sounds. Because sounds are additive, they can be added to each other. However, adding original sound signals such as mp3 or pcm and then converting the result to a spectrogram consumes a very large amount of computing resources. Therefore, the present invention proposes a method of converting the breathing sounds and the noise into spectrograms separately and then adding the spectrograms. Through this, it is possible to secure the robustness of sleep analysis in various home environments by simulating breathing sounds measured in various home environments and using them to train deep learning models.

Preprocessing of converted information

In an embodiment of the present invention, data is converted into information including changes in frequency components along the time axis, information in the frequency domain, or a spectrogram so that the converted information can be used as an input to a sleep analysis model, and so that the learned model can infer which sleep state or sleep stage a pattern corresponds to. Several preprocessing processes may be required before the converted information is used as input to the sleep analysis model.

In addition, according to an embodiment of the present invention, the converted information is converted so that the acoustic information becomes the input of an image processing-based artificial intelligence model, so the acoustic information may be visualized through this preprocessing process before being input.

These preprocessing processes may be performed only during the learning process, during both the learning process and the inference process, or only during the inference process.

Data augmentation preprocessing

According to an embodiment of the present invention, a preprocessing method that performs data augmentation on the spectrogram may be included.

Data augmentation is intended to secure a sufficient amount of learning data set or to conduct sufficient learning assuming a diverse and anomalous environment.

The data augmentation preprocessing method according to an embodiment of the present invention may include a method of adding Gaussian noise to the spectrogram to increase the amount of data; a pitch shifting method of gradually raising or lowering the pitch of the overall acoustic information; and a TUT (Tile UnTile) augmentation method in which the spectrogram or Mel spectrogram is converted into a vector during the learning process, and the converted vector is randomly cut (tiled) at the input stage of a node (neuron) and recombined (untiled) after the output of the node (neuron).

In addition, the data augmentation preprocessing method according to an embodiment of the present invention may also include a noise addition augmentation method that adds noise occurring in various environments other than Gaussian noise (e.g., external sounds, sounds of nature, the sound of a fan running, the sound of doors opening or closing, animal sounds, people talking, movement sounds, etc.).

To shorten the learning time when using a spectrogram as an input to a learning model, noise addition augmentation according to an embodiment of the present invention may include a method of converting noise information into a spectrogram and then artificially adding it to the spectrogram of the sleep sound information. In this case, there may be no significant difference between the spectrogram obtained by adding noise information to the sleep sound information in the original sound domain and the spectrogram obtained by adding the sleep sound information and the noise information in the spectrogram domain.
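
The two orders of operation compared above (add then convert, versus convert then add) can be sketched as follows. The sketch assumes power spectrograms, because for signals that are uncorrelated with each other the powers add on average while the cross terms tend to cancel; the disclosure does not state whether amplitude or power spectrograms are added. The function names, window length, hop length, and the `noise_gain` parameter are illustrative assumptions.

```python
import numpy as np


def power_spectrogram(signal, win=512, hop=256):
    """Return an (n_frames, n_bins) power spectrogram using a Hann window."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def add_noise_in_spectrogram_domain(breath_spec, noise_spec, noise_gain=1.0):
    """Mix a noise spectrogram into a breathing-sound spectrogram.

    noise_gain scales the noise power (an assumed convenience parameter).
    """
    if breath_spec.shape != noise_spec.shape:
        raise ValueError("spectrograms must have the same shape")
    return breath_spec + noise_gain * noise_spec
```

Because the noise spectrograms can be computed once and reused, each augmented training example then costs only an element-wise addition rather than a new Fourier transform.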

In addition, to protect the user's privacy, the noise addition augmentation according to an embodiment of the present invention may retain only the amplitude, out of the amplitude and the phase, from the spectrogram of each of the sleep sound information and the noise information, and add a random phase, thereby making it difficult to convert the spectrogram back to the original signal.
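
Keeping the amplitude while replacing the phase can be sketched as follows. This is a minimal illustration that assumes a complex-valued short-time spectrum as input and a uniform random phase; the function name `randomize_phase` and the choice of distribution are assumptions, not details from the disclosure.

```python
import numpy as np


def randomize_phase(complex_spec, rng):
    """Keep the magnitude of a complex spectrogram and attach a random phase.

    rng is a numpy.random.Generator. The original phase is discarded,
    so the original waveform cannot be read back directly.
    """
    magnitude = np.abs(complex_spec)
    phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
    return magnitude * np.exp(1j * phase)
```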

Alternatively, noise addition augmentation according to an embodiment of the present invention may include not only a method of adding sound information in the domain converted into a spectrogram, but also a method of adding noise in the domain converted into a Mel spectrogram to which the Mel scale is applied.

Additionally, the time required for hardware to process data can be shortened by the method of adding in the Mel-scale domain according to an embodiment of the present invention.

Meanwhile, the detailed description of the types of noise described above is merely an example for explaining the noise addition augmentation of the present invention, and the present invention is not limited thereto.

TUT (Tile UnTile) augmentation according to an embodiment of the present invention may include steps of randomly cutting and recombining a spectrogram or vector at the input and output stages of a node (neuron), in order to increase the amount of learning data of various patterns when using a spectrogram as an input to a learning model. A spectrogram or vector that is randomly cut at the input stage of a node (neuron) may have missing data, or may carry less information than the uncut spectrogram or vector input to the corresponding neural network layer. In this case, limited information is input to the node (neuron) and learned. A node (neuron) that takes a cut spectrogram or vector as input can output a vector after calculation. The outputs can then be recombined (untiled) in the same way as they were cut, before being used as the input of a node (neuron) of the next neural network layer.

In addition, TUT augmentation according to an embodiment of the present invention randomly cuts spectrograms or vectors at the input and output stages of nodes (neurons) and recombines them in the same way to induce learning from data with missing information, which can contribute to increasing the accuracy or reliability of the learning model.
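
One possible reading of the tile and untile steps is sketched below. The disclosure describes TUT only at a high level, so everything here is an assumption for illustration: the cut is made along the last (time) axis at random positions, `layer` stands in for whatever computation a node or layer performs, and the names `tile`, `untile`, and `tut_forward` are hypothetical.

```python
import numpy as np


def tile(x, n_tiles, rng):
    """Cut x into n_tiles pieces at random positions along the last axis."""
    n_steps = x.shape[-1]
    cuts = np.sort(
        rng.choice(np.arange(1, n_steps), size=n_tiles - 1, replace=False)
    )
    return np.split(x, cuts, axis=-1), cuts


def untile(tiles):
    """Recombine the pieces in the same order in which they were cut."""
    return np.concatenate(tiles, axis=-1)


def tut_forward(x, layer, n_tiles, rng):
    """Apply layer to each tile separately, then untile the outputs."""
    tiles, cuts = tile(x, n_tiles, rng)
    outputs = [layer(t) for t in tiles]
    return untile(outputs), cuts
```

When `layer` uses context from neighbouring time steps, each tile sees only part of that context, which corresponds to the limited information described above; for a purely element-wise `layer`, the tiled and untiled results coincide.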

Preprocessing to convert to a shape close to square

According to an embodiment of the present invention, after the information in the frequency domain or the spectrogram has gone through a data augmentation process such as the above-described pitch shifting, noise addition augmentation, or TUT augmentation, a preprocessing method may be performed to convert the information in the frequency domain or the spectrogram into a nearly square form.

According to an embodiment of the present invention, before the information in the frequency domain or the spectrogram on which data augmentation was performed is used as input to a deep learning model such as a CNN, Transformer, Vision Transformer (ViT), or Mobile Vision Transformer (MobileViT), it can be converted to a form close to a square, and the converted information or spectrogram can then be used as input to the deep learning model.

In performing preprocessing to convert information in the frequency domain or a spectrogram into a nearly square form according to an embodiment of the present invention, various methods such as reshaping, resizing, and split-cat can be used.

According to an embodiment of the present invention, when preprocessing is performed to convert to a shape close to a square by resizing, a method of lowering the resolution on the x-axis and increasing the resolution on the y-axis by copying values can be performed.

In addition, according to an embodiment of the present invention, a preprocessing method may be performed in which the entire 30-second spectrogram with dimensions of 20 frequency bins x 1201 time steps is converted into a form close to a square at once by resizing, and missing information can be supplemented by interpolation.

The split-cat preprocessing method according to an embodiment of the present invention may include splitting the spectrogram into pieces of a certain size and then concatenating the pieces into a nearly square shape using a concatenation function. In other words, the split-cat method first splits one spectrogram into pieces that can correspond to patches, so that learning can be performed on a patch-by-patch basis in a deep learning model based on ViT (Vision Transformer) or MobileViT (Mobile Vision Transformer), and second merges the pieces into a shape close to a square.

For example, according to one embodiment of the present invention, a 30-second spectrogram with dimensions of 20 frequency bins x 1201 time steps can be converted to dimensions of 150 frequency bins x 160 time steps. Afterwards, the 30-second spectrogram with dimensions of 150 frequency bins x 160 time steps can be converted to dimensions close to 160 frequency bins x 160 time steps using a resizing technique. According to this process, the spectrogram corresponding to every 30-second interval can be converted into a form close to a square.
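
One tiling that is consistent with these dimensions is sketched below. The disclosure does not give the tile size or layout, so the choices here are assumptions: the last time step is dropped (1201 to 1200), the spectrogram is split into tiles of 10 frequency bins x 80 time steps, and the tiles are placed two per row, which gives 15 rows of 10 bins, or 150 x 160. The rows are then resized to 160 by copying values (nearest-neighbour indexing). The names `split_cat` and `resize_rows` are hypothetical.

```python
import numpy as np


def split_cat(spec, tile_f=10, tile_t=80, cols=2):
    """Split a (freq, time) spectrogram into tiles and concatenate them."""
    n_f, n_t = spec.shape
    n_t_trim = (n_t // tile_t) * tile_t  # e.g. 1201 -> 1200
    spec = spec[:, :n_t_trim]
    if n_f % tile_f != 0:
        raise ValueError("frequency axis must divide evenly into tiles")

    # Split: time chunk in the outer loop, frequency chunk in the inner loop.
    tiles = [
        spec[f:f + tile_f, t:t + tile_t]
        for t in range(0, n_t_trim, tile_t)
        for f in range(0, n_f, tile_f)
    ]
    if len(tiles) % cols != 0:
        raise ValueError("number of tiles must be a multiple of cols")

    # Cat: place `cols` tiles side by side, then stack the rows vertically.
    rows = [
        np.concatenate(tiles[i:i + cols], axis=1)
        for i in range(0, len(tiles), cols)
    ]
    return np.concatenate(rows, axis=0)


def resize_rows(spec, n_rows):
    """Resize the first axis to n_rows by copying the nearest source row."""
    idx = np.floor(np.arange(n_rows) * spec.shape[0] / n_rows).astype(int)
    return spec[idx, :]
```

Under these assumptions, `split_cat` maps a 20 x 1201 array to 150 x 160 without changing any retained value, and `resize_rows(..., 160)` then produces 160 x 160.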

When learning is performed using a spectrogram converted to a nearly square form as input, a deep learning model based on the Transformer learning model can show further improved learning performance. The detailed numerical description of the bins, division time units, and number of divisions of the above-described spectrogram is only an example, and the present invention is not limited thereto.

Scale conversion and normalization preprocessing

The values of the information in the frequency domain or the spectrogram according to an embodiment of the present invention are very small, so if they are not converted to another scale, the parts where the value exceeds a certain level are rendered very bright while the remaining parts are rendered very dark, which may make the data inappropriate as deep learning model input. Accordingly, a preprocessing process can be performed to convert the information in the frequency domain or the spectrogram according to an embodiment of the present invention to dB scale (log scale) before using it as input to the deep learning model.

When performing log scale conversion preprocessing according to an embodiment of the present invention, the maximum value may be used as the default reference so that its logarithmic value is 0, and the remaining values may be converted to logarithmic values relative to it.

For the spectrogram converted to logarithmic values according to an embodiment of the present invention, normalization preprocessing may additionally be performed so that the average of all values is 0 and the standard deviation is 1, and the result may then be used as the input of the deep learning model.
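As a minimal sketch of the two preprocessing steps described above, the following Python code converts a spectrogram to a dB scale whose maximum is 0 and then normalizes it to zero mean and unit standard deviation. It assumes a power spectrogram (hence the factor 10; an amplitude spectrogram would use 20), and the function names and the small constants `eps` are hypothetical choices for this illustration.

```python
import numpy as np

def to_db(power_spec, eps=1e-10):
    """Convert a power spectrogram to dB relative to its own maximum (maximum maps to 0 dB)."""
    clipped = np.maximum(power_spec, eps)          # avoid log of zero
    return 10.0 * np.log10(clipped / clipped.max())

def standardize(x, eps=1e-8):
    """Shift and scale so that the mean is 0 and the standard deviation is about 1."""
    return (x - x.mean()) / (x.std() + eps)

rng = np.random.default_rng(0)
power_spec = rng.random((20, 1201)) ** 2           # placeholder data, not real sleep sound
db_spec = to_db(power_spec)
model_input = standardize(db_spec)
print(db_spec.max(), model_input.mean(), model_input.std())
```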

By using this preprocessed data as input to the image processing deep learning model, information such as sleep state information can be learned or inferred through image analysis of the spectrogram. The detailed numerical description of the maximum log value of the spectrogram described above is only an example, and the present invention is not limited thereto.

Sleep analysis model

According to one embodiment of the present invention, the method may include a step (S10) of acquiring sleep sound information related to the user's sleep.

According to an embodiment of the present invention, the method may include performing preprocessing on sleep sound information (S20).

According to an embodiment of the present invention, the method may include a step (S30) of obtaining sleep state information by performing analysis on preprocessed sleep sound information.

The order of the steps shown in FIG. 25 described above may be changed as needed, and at least one step may be omitted or added. That is, the above-described steps are only one embodiment of the present invention, and the scope of the present invention is not limited thereto.

In the present invention, sleep state information can be obtained through a sleep analysis model that analyzes the user's sleep stage based on sound information (sleep sound information).

In the present invention, the sleep sound information (SS) may be a very quiet sound because it is a sound related to breathing and body movement acquired during the user's sleeping time. Accordingly, the present invention can perform analysis of the sound by converting the sleep sound information (SS) into a spectrogram (SP) as described above. In this case, the spectrogram (SP) contains information that shows how the frequency spectrum of the sound changes over time, so breathing or movement patterns related to relatively quiet sounds can be easily identified, which can improve the efficiency of analysis. Specifically, it may be difficult to predict which of the awake state, REM sleep state, light sleep state, and deep sleep state the sleep sound information corresponds to based solely on changes in the energy level of the sleep sound information, but by converting the sleep sound information into a spectrogram, changes in the frequency spectrum of each sound can be easily detected, so analysis corresponding to quiet sounds (e.g., breathing and body movements) may be possible.

The processor 130 may obtain sleep state information by processing the information in the frequency domain or the spectrogram (SP) converted according to an embodiment of the present invention as an input to a sleep analysis model. Here, the sleep analysis model is a model for obtaining sleep state information related to changes in the user's sleep stage, and can output sleep state information by receiving as input the sleep sound information acquired during the user's sleep. In embodiments, the sleep analysis model may include a neural network model constructed through one or more network functions.

Network function

A sleep analysis model is comprised of one or more network functions, and one or more network functions may be comprised of a set of interconnected computational units, which may generally be referred to as 'nodes'. These 'nodes' may also be referred to as 'neurons'. One or more network functions are composed of at least one or more nodes. Nodes (or neurons) that make up one or more network functions may be interconnected by one or more 'links'.

Figure 7 is a schematic diagram showing one or more network functions for performing the sleep analysis method according to the present invention. A deep neural network (DNN) may refer to a neural network that includes multiple hidden layers in addition to the input layer and output layer. Deep neural networks make it possible to identify latent structures in data. In other words, it is possible to identify the latent structure of a photo, text, video, voice, or music (e.g., what object is in the photo, what the content and emotion of the text are, what the content and emotion of the voice are, etc.). Deep neural networks may include convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders, generative adversarial networks (GAN), restricted Boltzmann machines (RBM), deep belief networks (DBN), Q networks, U networks, Siamese networks, Transformer, Vision Transformer (ViT), Mobile Vision Transformer (MobileViT), etc. The description of the deep neural network described above is only an example and the present invention is not limited thereto.

In the present invention, the network function may include an autoencoder. An autoencoder may be a type of artificial neural network for outputting output data similar to input data. The autoencoder may include at least one hidden layer, and an odd number of hidden layers may be placed between the input and output layers. The number of nodes in each layer may be reduced from the number of nodes in the input layer to an intermediate layer called the bottleneck layer (encoding), and then expanded from the bottleneck layer to the output layer (symmetrical to the input layer), mirroring the reduction. The nodes of the dimensionality reduction layers and dimensionality restoration layers may or may not be symmetric. Autoencoders can perform nonlinear dimensionality reduction. The number of input layers and output layers may correspond to the number of sensors remaining after preprocessing of the input data. In an autoencoder structure, the number of nodes in the hidden layers included in the encoder may decrease as the distance from the input layer increases. If the number of nodes in the bottleneck layer (the layer with the fewest nodes, located between the encoder and decoder) is too small, not enough information may be conveyed, so the number may be maintained above a certain level (e.g., more than half of the input layer, etc.).
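The layer structure described above may be illustrated, under stated assumptions, by the following NumPy sketch of an untrained autoencoder. The layer sizes (64, 32, 16, 32, 64), the tanh activation, and the random weights are hypothetical and serve only to show the symmetric narrowing toward a bottleneck and the subsequent expansion; no training is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input 64 -> 32 -> 16 (bottleneck) -> 32 -> output 64.
# Three hidden layers (an odd number) sit between the input and output layers.
layer_sizes = [64, 32, 16, 32, 64]
weights = [rng.normal(0.0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x, start, stop):
    """Apply layers start..stop-1; every layer except the very last one uses tanh."""
    for i in range(start, stop):
        x = x @ weights[i] + biases[i]
        if i < len(weights) - 1:
            x = np.tanh(x)
    return x

n_encoder_layers = len(weights) // 2
x = rng.random((5, 64))                                # 5 placeholder input vectors
code = forward(x, 0, n_encoder_layers)                 # encoder: dimensionality reduction
recon = forward(code, n_encoder_layers, len(weights))  # decoder: dimensionality restoration
reconstruction_error = float(np.mean((x - recon) ** 2))
print(code.shape, recon.shape, reconstruction_error)
```

Training would adjust the weights so that `reconstruction_error` decreases, which is what makes the output approximate the input.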

A neural network may be trained by at least one of supervised learning, unsupervised learning, and semi-supervised learning. Learning of a neural network is intended to minimize errors in output. Neural network learning is the process of repeatedly inputting learning data into the neural network, calculating the error between the output of the neural network and the target for the learning data, and backpropagating the error from the output layer of the neural network toward the input layer in the direction of reducing the error, thereby updating the weight of each node in the neural network. In the case of supervised learning, learning data in which the correct answer is labeled for each item of learning data is used (i.e., labeled learning data), and in the case of unsupervised learning, the correct answer may not be labeled in each item of learning data. That is, for example, in the case of supervised learning on data classification, the learning data may be data in which each item of training data is labeled with a category. Labeled training data is input to the neural network, and the error can be calculated by comparing the output (category) of the neural network with the label of the training data. As another example, in the case of unsupervised learning on data classification, the error can be calculated by comparing the input training data with the neural network output. The calculated error is backpropagated in the reverse direction (i.e., from the output layer to the input layer) in the neural network, and the connection weight of each node in each layer of the neural network can be updated according to backpropagation. The amount of change in the connection weight of each updated node may be determined according to the learning rate. The neural network's calculation on input data and backpropagation of errors can constitute a learning cycle (epoch).
The learning rate may be applied differently depending on the number of repetitions of the learning cycle of the neural network. For example, in the early stages of neural network training, a high learning rate can be used to increase efficiency by allowing the neural network to quickly achieve a certain level of performance, and in the later stages of training, a low learning rate can be used to increase accuracy.
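A minimal sketch of such a learning cycle with a learning rate that decreases over training is given below. To keep the example short, a single linear layer stands in for the neural network, so backpropagation reduces to one gradient computation; the step-decay schedule, its constants, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def learning_rate(epoch, initial=0.2, decay=0.5, step=20):
    """Hypothetical step decay: higher rate early in training, lower rate later."""
    return initial * decay ** (epoch // step)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # synthetic training inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                           # targets (labels) for supervised learning

w = np.zeros(3)                          # connection weights to be learned
losses = []
for epoch in range(60):
    pred = X @ w                         # forward computation on the training data
    error = pred - y                     # error between output and target
    losses.append(float(np.mean(error ** 2)))
    grad = 2.0 * X.T @ error / len(X)    # gradient of the mean squared error w.r.t. w
    w -= learning_rate(epoch) * grad     # update size is set by the learning rate

print(losses[0], losses[-1], w)
```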

In the learning of neural networks, the training data can generally be a subset of real data (i.e., the data to be processed using the trained neural network), and thus there may be learning cycles in which the error for the training data decreases but the error for the real data increases. Overfitting is a phenomenon in which errors on real data increase due to excessive learning on training data. For example, a phenomenon in which a neural network that learned what a cat is by being shown a yellow cat fails to recognize a non-yellow cat as a cat may be a type of overfitting. Overfitting can cause errors in AI algorithms to increase. To prevent such overfitting, various optimization methods can be used. To prevent overfitting, methods such as increasing the learning data, regularization, and dropout, which omits some of the network nodes during the learning process, can be applied.
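As one illustration of dropout, the following sketch randomly omits node activations during training. It uses the common "inverted dropout" scaling so that no rescaling is needed at inference time; that scaling, the function name, and the drop probability are assumptions of this sketch rather than details given in the description.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Zero out each activation with probability drop_prob during training only."""
    if not training or drop_prob == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= drop_prob
    # Scale the kept activations so the expected value is unchanged.
    return activations * keep_mask / (1.0 - drop_prob)

rng = np.random.default_rng(0)
hidden = np.ones((4, 8))                          # placeholder hidden-layer outputs
train_out = dropout(hidden, drop_prob=0.5, rng=rng, training=True)
infer_out = dropout(hidden, drop_prob=0.5, rng=rng, training=False)
print(train_out)
```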

Throughout this specification, computational model, neural net, network function, and neural network may be used interchangeably. (Hereinafter, they are described collectively as a neural network.) The data structure may include a neural network, and the data structure including the neural network may be stored in a computer-readable medium. Data structures including neural networks may also include data input to the neural network, weights of the neural network, hyperparameters of the neural network, data obtained from the neural network, activation functions associated with each node or layer of the neural network, and loss functions for learning the neural network. A data structure containing a neural network may include any of the components disclosed above. In other words, the data structure including the neural network may be configured to include all of these components or any combination of them. In addition to the configurations described above, a data structure containing a neural network may include any other information that determines the characteristics of the neural network. Additionally, the data structure may include all types of data used or generated in the computational process of a neural network and is not limited to the above. Computer-readable media may include computer-readable recording media and/or computer-readable transmission media. A neural network can generally consist of a set of interconnected computational units, which can be referred to as nodes. These nodes may also be referred to as neurons. A neural network consists of at least one node.

Within a neural network, one or more nodes connected through a link may form a relative input node and output node relationship. The concepts of input node and output node are relative, and any node in an output node relationship with one node may be in an input node relationship with another node, and vice versa. As described above, input node to output node relationships can be created around links. One or more output nodes can be connected to one input node through a link, and vice versa.

In a relationship between an input node and an output node connected through one link, the value of the output node may be determined based on data input to the input node. Here, the link connecting the input node and the output node may have a weight. Weights may be variable and may be varied by the user or an algorithm in order for the neural network to perform the desired function. For example, when one or more input nodes are connected to one output node by respective links, the value of the output node can be determined based on the values input to the input nodes connected to the output node and the weights set on the links corresponding to each input node.
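One common way to realize this relationship, shown here only as an assumed example, is a weighted sum of the input node values followed by an activation function. The bias term and the choice of tanh are not taken from the description.

```python
import numpy as np

def output_node_value(input_values, link_weights, bias=0.0, activation=np.tanh):
    """Combine input node values with the weights of their links to the output node."""
    weighted_sum = np.dot(input_values, link_weights) + bias
    return activation(weighted_sum)

inputs = np.array([1.0, 2.0, 3.0])       # values at three input nodes
weights = np.array([0.5, -1.0, 2.0])     # one weight per link
linear_value = output_node_value(inputs, weights, activation=lambda z: z)
squashed_value = output_node_value(inputs, weights)
print(linear_value, squashed_value)
```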

As described above, in a neural network, one or more nodes are interconnected through one or more links to form an input node and output node relationship within the neural network. The characteristics of the neural network can be determined according to the number of nodes and links within the neural network, the correlation between the nodes and links, and the value of the weight assigned to each link. For example, if there are two neural networks with the same number of nodes and links and different weight values between the links, the two neural networks may be recognized as different from each other.

Some of the nodes constituting the neural network may form one layer based on their distances from the initial input node. For example, a set of nodes with a distance n from the initial input node may constitute the n-th layer. The distance from the initial input node can be defined by the minimum number of links that must be passed to reach the node from the initial input node. However, this definition of a layer is arbitrary for explanation purposes, and the order of a layer within a neural network may be defined in a different way than described above. For example, a layer of nodes may be defined by distance from the final output node.
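The distance-based definition of a layer can be sketched with a breadth-first search over the links, which finds the minimum number of links from the initial input nodes to every other node. The node names and the small example network below are hypothetical.

```python
from collections import deque

def layers_by_distance(links, input_nodes):
    """Group nodes by the minimum number of links from any initial input node."""
    adjacency = {}
    for src, dst in links:
        adjacency.setdefault(src, []).append(dst)

    distance = {node: 0 for node in input_nodes}
    queue = deque(input_nodes)
    while queue:                                   # breadth-first search
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in distance:                # first visit gives the shortest distance
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    layers = {}
    for node, d in distance.items():
        layers.setdefault(d, []).append(node)
    return layers

# Hypothetical network: two input nodes, two hidden nodes, one output node.
links = [("x1", "h1"), ("x2", "h1"), ("x2", "h2"), ("h1", "y"), ("h2", "y")]
layers = layers_by_distance(links, ["x1", "x2"])
print(layers)
```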

The initial input node may refer to one or more nodes in the neural network through which data is directly input without going through links in relationships with other nodes. Alternatively, in a neural network, in the relationship between nodes based on links, it may mean nodes that do not have other input nodes connected by links. Similarly, the final output node may refer to one or more nodes that do not have an output node in their relationship with other nodes among the nodes in the neural network. Additionally, hidden nodes may refer to nodes constituting a neural network other than the initial input node and the final output node. The neural network according to an embodiment of the present invention may have more nodes in the input layer than in the hidden layer close to the output layer, and may be a neural network in which the number of nodes decreases as it progresses from the input layer to the hidden layer.

A neural network may contain one or more hidden layers. The hidden node of the hidden layer can take the output of the previous layer and the output of surrounding hidden nodes as input. The number of hidden nodes for each hidden layer may be the same or different. The number of nodes in the input layer may be determined based on the number of data fields of the input data and may be the same as or different from the number of hidden nodes. Input data input to the input layer can be operated by the hidden node of the hidden layer and output by the fully connected layer (FCL), which is the output layer.

Feature extraction model and feature classification model

The sleep analysis model used in the present invention may include a feature extraction model that extracts one or more features for each predetermined epoch and a feature classification model that generates sleep state information by classifying each of the features extracted through the feature extraction model into one or more sleep stages. The feature extraction model can extract features related to breathing sounds, breathing patterns, and movement patterns by analyzing the time-series frequency pattern of the spectrogram (SP). In one embodiment, the feature extraction model may be constructed from part of a neural network model that has been pre-trained using a training data set.

The sleep analysis model used in the present invention may include a feature extraction model and a feature classification model. The feature extraction model may be a deep learning model based on a natural language processing model that can learn the time-series correlation of given data. The feature classification model may be a learning model based on a natural language processing model that can learn the time-series correlation of given data. Here, deep learning models based on natural language processing models that can learn time-series correlations may include Transformer, ViT, MobileViT, and MobileViT2, but are not limited thereto.

The learning data set according to an embodiment of the present invention may be composed of data in the frequency domain and a plurality of sleep state information corresponding to each data.

Alternatively, the learning data set according to an embodiment of the present invention may be composed of a plurality of spectrograms and a plurality of sleep state information corresponding to each spectrogram.

Alternatively, the learning data set according to an embodiment of the present invention may be composed of a plurality of Mel spectrograms and a plurality of sleep state information corresponding to each Mel spectrogram.

Below, for convenience of explanation, the configuration and performance of the sleep analysis model according to an embodiment of the present invention will be described in detail based on a data set of spectrograms. However, the learning data used in the sleep analysis model of the present invention is not limited to spectrograms, and information in the frequency domain, a spectrogram, or a Mel spectrogram can be used as learning data.

Within the sleep analysis model according to an embodiment of the present invention, the feature extraction model can be pre-trained by a one-to-one proxy task in which one spectrogram is input and the model learns to predict the sleep state information corresponding to that spectrogram. When adopting a CNN deep learning model as a feature extraction model according to an embodiment of the present invention, learning may be performed by adopting the structure of FC (Fully Connected Layer) or FCN (Fully Connected Neural Network). When using the MobileViTV2 deep learning model as a feature extraction model according to an embodiment of the present invention, learning may be performed by adopting the structure of the intermediate layer.

Among the sleep analysis models according to an embodiment of the present invention, the feature classification model inputs a plurality of consecutive spectrograms, predicts sleep state information of each spectrogram, and analyzes the sequence of the plurality of consecutive spectrograms. It can be learned to predict or classify overall sleep state information.

In addition, according to an embodiment of the present invention, pre-training may be performed on the feature extraction model through a one-to-one proxy task, and fine tuning may then be performed on the pre-trained feature extraction model and the feature classification model through a many-to-many task. For example, the sleep stage may be inferred by inputting a sequence of 40 consecutive spectrograms into a plurality of feature extraction models learned through a one-to-one proxy task and outputting 20 pieces of sleep state information. The above-described specific numerical descriptions regarding the number of spectrograms, the number of feature extraction models, and the number of pieces of sleep state information are merely examples, and the present invention is not limited thereto.

Hereinafter, a feature extraction model and a feature classification model generated or learned based on the converted spectrogram according to an embodiment of the present invention will be described in detail. Meanwhile, the sleep analysis model of the present invention is not limited to being generated or learned based on a spectrogram, and as described above, it may be generated or learned based on information including changes along the time axis of the frequency components of raw acoustic information, or on information in the frequency domain. Additionally, inference of sleep state information through a sleep analysis model can also be performed based on information including changes in the frequency components of raw acoustic information along the time axis, or on information converted to the frequency domain.

Figures 29a and 29b are diagrams for explaining the performance of determining sleep disorder and adding noise using a spectrogram in the sleep analysis method according to the present invention.

As shown in FIGS. 29A and 29B, according to the sleep analysis model according to an embodiment of the present invention, apnea can be detected with high reliability even when the spectrogram is damaged by noise.

Feature extraction model

The feature extraction model may be composed of an independent deep learning model learned through a training data set. The feature extraction model can be learned through supervised learning or unsupervised learning methods. A feature extraction model can be trained to output output data similar to input data through a learning data set. To explain in detail, only the core feature data (or features) of the input spectrogram can be learned through the hidden layer. In this case, during the decoding process through the decoder, the output data of the hidden layer may be an approximation of the input data (i.e., spectrogram) rather than a perfect copy value.

Each of the plurality of spectrograms included in the learning data set may be tagged with sleep state information. Each of the plurality of spectrograms may be input to a feature extraction model, and the output corresponding to each spectrogram may be stored by matching it with the tagged sleep state information. Specifically, when first learning data sets (i.e., multiple spectrograms) tagged with first sleep state information (e.g., light sleep) are used as input, features related to the output for the input may be stored by matching them with the first sleep state information. In embodiments, one or more features relevant to the output may be represented in a vector space. In this case, since the feature data output corresponding to each of the first learning data sets is output through a spectrogram related to the first sleep stage, they may be located at a relatively close distance in the vector space. That is, learning can be performed so that a plurality of spectrograms output similar features corresponding to each sleep stage.

When the feature extraction model through the above-described learning process receives a spectrogram (eg, a spectrogram converted in response to sleep sound information) as input, features corresponding to the spectrogram can be extracted.

In an embodiment, the processor 130 may extract features by processing the spectrogram (SP) generated in response to the sleep sound information (SS) as an input to a feature extraction model. Here, since the sleep sound information (SS) is time series data obtained sequentially during the user's sleep, the processor 130 may divide the spectrogram (SP) into predetermined epochs. For example, the processor 130 may obtain a plurality of spectrograms by dividing the spectrogram (SP) corresponding to the sleep sound information (SS) into 30-second increments. For example, if sleep sound information is acquired during the user's 7-hour (i.e., 420-minute) sleep, the processor 130 may obtain 840 spectrograms by dividing the spectrogram in 30-second increments. The detailed numerical description of the above-described sleep time, division time unit of the spectrogram, and number of divisions is only an example, and the present invention is not limited thereto.
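The division into 30-second epochs can be sketched as follows. The time resolution of 4 spectrogram steps per second and the 20 frequency bins are assumptions chosen to keep the example small; with them, a 7-hour recording yields the 840 epochs mentioned above.

```python
import numpy as np

def split_into_epochs(spec, steps_per_epoch):
    """Cut a (bins, steps) spectrogram into consecutive epochs of equal length.

    Returns an array of shape (n_epochs, bins, steps_per_epoch); leftover
    time steps at the end are dropped.
    """
    n_bins, n_steps = spec.shape
    n_epochs = n_steps // steps_per_epoch
    trimmed = spec[:, :n_epochs * steps_per_epoch]
    return trimmed.reshape(n_bins, n_epochs, steps_per_epoch).transpose(1, 0, 2)

steps_per_second = 4                         # hypothetical time resolution
sleep_seconds = 7 * 60 * 60                  # 7 hours of sleep sound
n_bins = 20
total_steps = sleep_seconds * steps_per_second

# Placeholder spectrogram whose values encode their own position.
spec = np.arange(n_bins * total_steps, dtype=np.float32).reshape(n_bins, total_steps)
epochs = split_into_epochs(spec, steps_per_epoch=30 * steps_per_second)
print(epochs.shape)
```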

The processor 130 may process each of the plurality of segmented spectrograms as input to a feature extraction model to extract a plurality of features corresponding to each of the plurality of spectrograms. For example, if the number of spectrograms is 840, the number of features extracted by the feature extraction model may also be 840. The above-described specific numerical description regarding the spectrogram and number of features is only an example, and the present invention is not limited thereto.

Meanwhile, the feature extraction model according to an embodiment of the present invention may be trained using a one-to-one proxy task. Additionally, in the process of learning to extract sleep state information for one spectrogram, it may be learned to extract sleep state information by combining a feature extraction model and another NN (Neural Network).

According to an embodiment of the present invention, if learning is performed through a simple pre-trained Neural Network, the learning time of the feature extraction model can be shortened or the learning efficiency can be increased.

For example, according to one embodiment of the present invention, one spectrogram divided in 30-second increments may be used as an input to a feature extraction model, and the model may be trained so that its output vector, used as an input to another NN, yields sleep state information.

Meanwhile, according to an embodiment of the present invention, the processor 130 may generate a feature extraction model using a plurality of source data. Additionally, the feature extraction model may include a dimensionality reduction network function (eg, Encoder).

In an embodiment, each of a plurality of frequency domain information or spectrograms (i.e., a plurality of converted information corresponding to a plurality of source data) used as learning data may be labeled with sleep stage information. For example, the plurality of source data is information about sleeping sounds acquired in a specific space (eg, hospital), and information about a plurality of sleep states (ie, sleep stages) may be pre-labeled.

Each of the plurality of pieces of converted information may be input to a dimensionality reduction network function, and the output corresponding to each piece of converted information may be matched with labeled sleep stage information. Specifically, when first learning data sets (e.g., a plurality of spectrograms related to source data) labeled with first sleep stage information (e.g., light sleep) are used as input to the dimensionality reduction network function, the dimensionality for the input Features related to the output of the reduction network function may be matched with first sleep stage information.

In embodiments, one or more features associated with the output of a dimensionality reduction network function may be represented in a vector space. In this case, the feature data output corresponding to each of the first learning data sets are output through spectrograms related to the first sleep stage (e.g., output through spectrograms corresponding to the same class), so they may be located at a relatively close distance in the vector space.

In other words, learning of the dimensionality reduction network function may be performed so that a plurality of spectrograms output similar features corresponding to each sleep stage, but the specific learning method of the dimensionality reduction network is not limited.

Through the above-described learning process, when the feature extraction model receives as input the converted information corresponding to sleep sound information, it can extract features corresponding to the converted information.

Feature classification model

According to an embodiment of the present invention, the processor 130 may obtain sleep state information by processing a plurality of features output through a feature extraction model as input to a feature classification model. In an embodiment, the feature classification model may be a neural network model modeled to predict sleep stages in response to features. For example, the feature classification model includes a fully connected layer and may be a model that classifies features into at least one of the sleep stages. For example, when the feature classification model receives as input a first feature corresponding to a first spectrogram, it may classify the first feature as light sleep. Or, for example, when the feature classification model receives as input a second feature corresponding to a second spectrogram, it may classify the second feature as deep sleep. Alternatively, when the feature classification model receives as input a third feature corresponding to a third spectrogram, it may classify the third feature as REM sleep. Or, for example, when the feature classification model receives as input a fourth feature corresponding to a fourth spectrogram, it may classify the fourth feature as wakefulness.
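A fully connected classification layer of this kind may be sketched as below. The weights are random and untrained, the feature dimension of 128 is hypothetical, and the softmax output is an assumed (though common) choice, so the predicted labels carry no meaning; the sketch only shows the shape of the computation from features to sleep stage labels.

```python
import numpy as np

SLEEP_STAGES = ["awake", "REM sleep", "light sleep", "deep sleep"]

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def classify_features(features, weight, bias):
    """Apply one fully connected layer and pick the most probable stage per feature."""
    probs = softmax(features @ weight + bias)
    labels = [SLEEP_STAGES[i] for i in probs.argmax(axis=-1)]
    return labels, probs

rng = np.random.default_rng(0)
feature_dim = 128                                    # hypothetical feature size
features = rng.normal(size=(840, feature_dim))       # one feature vector per epoch
weight = rng.normal(0.0, 0.1, (feature_dim, len(SLEEP_STAGES)))  # untrained weights
bias = np.zeros(len(SLEEP_STAGES))

labels, probs = classify_features(features, weight, bias)
print(labels[:5])
```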

Additionally, in one embodiment of the present invention, the feature classification model may be a neural network model modeled to predict events that occur during sleep in response to features. For example, the feature classification model includes a fully connected layer and may be a model that classifies a feature as at least one of the events that occur during sleep. For example, when the feature classification model receives as input a first feature corresponding to a first spectrogram, it may classify the first feature as a sleep apnea event. For example, when the feature classification model receives as input a second feature corresponding to a second spectrogram, it may classify the second feature as a sleep hypopnea event. For example, when the feature classification model receives as input a third feature corresponding to a third spectrogram, it may classify the third feature as a normal sleep state. For example, when the feature classification model receives as input a fourth feature corresponding to a fourth spectrogram, it may classify the fourth feature as a snoring event during sleep. For example, when the feature classification model receives as input a fifth feature corresponding to a fifth spectrogram, it may classify the fifth feature as a sleep talking event during sleep.

According to an embodiment of the present invention, a feature classification model can perform classification for a plurality of features. The feature classification model may classify each of a plurality of features into at least one of a plurality of sleep stages. In addition, according to an embodiment of the present invention, the feature extraction model may be trained to extract features so that the feature classification model can easily classify which sleep stage they correspond to, and the feature classification model may be trained to better classify the features delivered from the feature extraction model (i.e., to classify them well into specific sleep stages). In other words, through adversarial learning, the feature classification model can better classify the features delivered from the feature extraction model. In one embodiment, a feature classification model may be learned to facilitate class classification between features in order to perform sleep stage classification or sleep event classification well in response to features.

According to an embodiment of the present invention, the processor 130 may update the feature extraction model, according to the learning results of the feature extraction model and the feature classification model, through first learning information related to a first loss between the feature extraction model and the feature classification model. Accordingly, the updated feature extraction model can extract features that allow the feature classification model to classify the class (i.e., sleep stage) well. In other words, the updated feature extraction model can extract features so that the features related to the same sleep stage are clustered together.

According to one embodiment, the discriminator model may be a neural network model that distinguishes whether each of the plurality of features delivered from the feature extraction model is a feature related to source data or a feature related to target data. For example, the discriminator model can determine whether the feature related to the input is a feature corresponding to sleep sound information acquired in a hospital or a feature corresponding to sleep sound information acquired in the real life of an individual user. That is, the discriminator model can use at least one of a plurality of features as input to distinguish whether the feature related to the input is a feature related to source data or a feature related to target data. Learning of the sleep analysis model using the discriminator model will be described in detail below.

The feature classification model can perform multi-epoch classification to predict sleep stages of multiple epochs by using spectrograms related to multiple epochs as input. Multi-epoch classification does not provide one piece of sleep stage analysis information in response to the spectrogram of a single epoch (i.e., one spectrogram corresponding to 30 seconds); rather, it may be used to estimate several sleep stages (e.g., changes in sleep stages over time) at once by using spectrograms corresponding to multiple epochs (i.e., a combination of spectrograms each corresponding to 30 seconds) as input. For example, because breathing or movement patterns change more slowly than brain wave signals or other biological signals, accurate sleep stage estimation may be possible only by observing how the patterns change at past and future points in time. As a specific example, the feature classification model may receive 40 spectrograms (e.g., 40 spectrograms corresponding to 30 seconds each) as input and perform prediction for the 20 spectrograms located in the center. That is, all spectrograms from 1 to 40 are examined, but the sleep stage can be predicted through classification corresponding to the spectrograms corresponding to 10 to 20. The detailed numerical description of the number of spectrograms described above is only an example, and the present invention is not limited thereto.

That is, in the process of inferring sleep state information, rather than predicting sleep state information in response to each single spectrogram, spectrograms corresponding to multiple epochs are used as input so that both past and future information can be considered, and the accuracy of the output can thereby be improved. Meanwhile, the accuracy of the output can also be improved by performing inference using as input not only the spectrogram but also information in the frequency domain or information including changes along the time axis of frequency components corresponding to multiple epochs.
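The windowing behind multi-epoch classification can be sketched as follows. This is only an illustrative sketch: the function names `center_slice` and `multi_epoch_windows` are hypothetical, indices are 0-based, and it assumes that the predicted epochs are the ones centered in the input window and that the window advances by the number of predicted epochs, neither of which is fixed by the description above.

```python
def center_slice(n_input: int, n_predict: int) -> range:
    """Return the 0-based indices of the centered epochs that receive a prediction."""
    if not 0 < n_predict <= n_input:
        raise ValueError("n_predict must be between 1 and n_input")
    start = (n_input - n_predict) // 2
    return range(start, start + n_predict)


def multi_epoch_windows(n_epochs: int, n_input: int = 40, n_predict: int = 20):
    """Yield (input_indices, predicted_indices) pairs over a recording.

    Each window reads n_input consecutive 30-second epochs, but only the
    centered n_predict epochs are classified, so every predicted epoch has
    both past and future context. The stride (n_predict) is an assumption.
    """
    offsets = center_slice(n_input, n_predict)
    for start in range(0, n_epochs - n_input + 1, n_predict):
        inputs = range(start, start + n_input)
        targets = range(start + offsets.start, start + offsets.stop)
        yield inputs, targets


if __name__ == "__main__":
    for inputs, targets in multi_epoch_windows(80):
        print(f"read epochs {inputs.start}..{inputs.stop - 1}, "
              f"predict epochs {targets.start}..{targets.stop - 1}")
```

With this stride, consecutive windows produce contiguous, non-overlapping predictions while still sharing context epochs at their edges.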

According to one embodiment of the present invention, after the first sleep analysis based on actigraphy and HRV, the second analysis based on sleep sound information uses the sleep analysis model described above, and as shown in FIG. 8, when the user's sleep sound information is input, the corresponding sleep stage (Wake, REM, Light, Deep) can be immediately inferred. In addition, the second analysis based on sleep sound information can extract the time at which a sleep disorder (sleep apnea, hyperventilation) or snoring occurred through singularities of the Mel spectrum corresponding to the sleep stage.

As shown in FIG. 9, the breathing pattern is analyzed in a single piece of converted frequency domain information, spectrogram, or Mel spectrogram, and when characteristics corresponding to a sleep apnea or hyperpnea event are detected, the relevant point in time can be determined as the point in time at which the sleep event occurred. At this time, a process of classifying the event as snoring rather than sleep apnea or hyperpnea through frequency analysis may be further included.

As shown in Figure 10, the user's sleep image and sleep sound are acquired in real time, and the acquired sleep sound information is immediately converted into a spectrogram. At this time, a preprocessing process of sleep sound information may be performed. The spectrogram can be input into a sleep analysis model and sleep stages can be analyzed immediately.

Additionally, when a CNN or Transformer-based deep learning model is adopted as a feature classification model according to an embodiment of the present invention, the operation may be performed as follows.

According to one embodiment of the present invention, a spectrogram containing time series information can be used as an input to a CNN-based deep learning model, and a vector with reduced dimension can be output. By using this reduced-dimensional vector as an input to a Transformer-based deep learning model, a vector in which the time series information is condensed can be output.

According to an embodiment of the present invention, the output vector of the Transformer-based deep learning model is input to a 1D Convolutional Neural Network (1D CNN) so that the average pooling technique can be applied, and a process of converting it, through averaging over the time series information, into an N-dimensional vector in which the time series information is condensed can also be performed. In this case, the N-dimensional vector corresponds to data that still contains time series information and differs from the input data only in resolution.
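As a minimal illustration of how average pooling lowers the time resolution while preserving the order of the time steps, consider the sketch below. The function name `average_pool_time` and the non-overlapping window are assumptions made for illustration; the description above does not specify the pooling window or stride.

```python
def average_pool_time(sequence, window):
    """Average non-overlapping groups of `window` consecutive time steps.

    `sequence` is a list of equal-length feature vectors ordered in time.
    The result is shorter along the time axis but keeps the temporal order,
    so it still carries time series information at a coarser resolution.
    """
    if window <= 0 or len(sequence) % window != 0:
        raise ValueError("sequence length must be a positive multiple of window")
    pooled = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(vec[d] for vec in chunk) / window for d in range(dim)])
    return pooled


if __name__ == "__main__":
    steps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(average_pool_time(steps, 2))
```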

According to an embodiment of the present invention, prediction of various sleep stages can be performed by performing multi-epoch classification on a combination of the output N-dimensional vectors containing time series information. In this case, continuous prediction of sleep state information can be performed by using the output vectors of the Transformer-based deep learning model as input to a plurality of fully connected (FC) layers.

Additionally, when a deep learning model based on ViT or Mobile ViT is employed in the feature classification model according to an embodiment of the present invention, the operation can be performed as follows. Figure 24 is a diagram for explaining the structure of a sleep analysis model using a natural language processing model according to an embodiment of the present invention.

According to one embodiment of the present invention, a spectrogram containing time series information can be used as an input to a Mobile ViT-based deep learning model, and a vector with reduced dimension can be output.

Additionally, according to an embodiment of the present invention, features can be extracted from each spectrogram as the output of a Mobile ViT-based deep learning model.

According to an embodiment of the present invention, a vector containing time series information can be output by using a vector with a reduced dimension as an input to the intermediate layer. The intermediate layer model may include at least one of the following steps: a linearization step to condense vector information, a layer normalization step that takes the mean and variance as input, or a dropout step to disable some nodes.
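The three steps named for the intermediate layer can be sketched in plain Python as follows. This is an illustrative sketch only: the function names, the ordering (linear, then layer normalization, then dropout), the dropout rate, and the omission of learnable scale and shift parameters in the normalization are all assumptions rather than details given above.

```python
import math
import random


def linear(x, weights, bias):
    """Linearization step: project x with a weight matrix (list of rows) and a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def layer_norm(x, eps=1e-5):
    """Layer normalization step: standardize x using its own mean and variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, p, rng, training=True):
    """Dropout step: zero each node with probability p and rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def intermediate_layer(x, weights, bias, p=0.1, rng=None, training=True):
    """Apply linear -> layer norm -> dropout (this order is an assumption)."""
    rng = rng or random.Random(0)
    return dropout(layer_norm(linear(x, weights, bias)), p, rng, training)


if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(intermediate_layer([1.0, 2.0, 3.0], w, [0.0, 0.0], training=False))
```

Dropout is active only during training; at inference time the vector passes through unchanged, which is why the usage example sets `training=False`.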

According to an embodiment of the present invention, overfitting can be prevented by performing a process of outputting a vector containing time series information by using a reduced-dimensional vector as an input to the intermediate layer.

According to an embodiment of the present invention, sleep state information can be output by using the output vector of the intermediate layer as an input to a ViT-based deep learning model. In this case, sleep state information corresponding to information on the frequency domain containing time series information, a spectrogram, or a mel spectrogram can be output.

In addition, according to an embodiment of the present invention, sleep state information corresponding to a series of frequency domain information, spectrogram, or mel spectrogram containing time series information can be output.

Meanwhile, in the feature extraction model or feature classification model according to an embodiment of the present invention, various artificial intelligence models in addition to the above-mentioned AI models may be employed to perform learning or inference. The specific descriptions of the types of artificial intelligence models given above are merely examples, and the present invention is not limited thereto.

Unsupervised or semi-supervised learning method of sleep analysis model according to embodiments of the present invention

The data set according to an embodiment of the present invention may be composed of labeled data acquired in a specific environment (preferably, a polysomnography environment), or may be composed of unlabeled data acquired in a different environment (preferably, an environment other than polysomnography) or the like. The specific description of the environment described above is merely an example, and the present invention is not limited thereto.

Supervised learning is possible when learning uses labeled data, that is, training data labeled with the correct answer, but unsupervised learning is necessary when learning uses unlabeled data, for which the correct answer is not labeled. Hereinafter, unsupervised learning models and the like according to embodiments of the present invention will be described.

Consistency Training using noise in the target environment

Consistency Training is a type of semi-supervised learning model. Consistency Training according to an embodiment of the present invention may be a method of performing learning with data to which noise has been intentionally added and data to which noise has not been intentionally added.

Additionally, Consistency Training according to an embodiment of the present invention may be a method of performing learning by generating data of a virtual sleep environment using noise of the target environment.

Noise intentionally added according to an embodiment of the present invention may be noise of the target environment, where the noise of the target environment may be noise obtained in an environment other than polysomnography, for example.

Hereinafter, for convenience, data to which noise is intentionally added is referred to as corrupted data. Corrupted data may preferably refer to data to which noise of the target environment has been intentionally added.

In addition, for convenience hereinafter, data to which noise has not been intentionally added will be referred to as clean data. Here, no noise was intentionally added to the clean data, but noise may actually be included.

Clean data used for Consistency Training according to an embodiment of the present invention may be data acquired in a specific environment (preferably, a polysomnographic environment), and corrupted data may be data obtained in a different environment or target environment (preferably, an environment other than polysomnography).

Corrupted data according to an embodiment of the present invention may be data in which noise acquired in another environment or a target environment (preferably, an environment other than polysomnography) is intentionally added to clean data.

In Consistency Training, when clean data and corrupted data are input to the same deep learning model, a loss function or consistency loss is defined so that the two outputs are the same, and learning can be performed to achieve consistent predictions.
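One possible form of such a consistency loss is sketched below. The choice of the Kullback-Leibler divergence between the two softmax outputs is an assumption made for illustration (the description only requires that the two outputs be driven to agree), and the function names `softmax` and `consistency_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model outputs into class probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(clean_logits, corrupted_logits, eps=1e-12):
    """KL(p_clean || p_corrupted): zero when both predictions are identical.

    Both logit vectors are assumed to come from the same model, one for the
    clean input and one for its noise-corrupted counterpart.
    """
    p = softmax(clean_logits)
    q = softmax(corrupted_logits)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Four classes, e.g. Wake, REM, Light, Deep.
    print(consistency_loss([2.0, 0.5, 0.1, 0.0], [2.0, 0.5, 0.1, 0.0]))
    print(consistency_loss([2.0, 0.5, 0.1, 0.0], [0.0, 2.0, 0.1, 0.5]))
```

Minimizing this quantity pushes the prediction for the corrupted input toward the prediction for the clean input, which requires no sleep stage label for either input.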

In the process of adding noise to acquire corrupted data according to an embodiment of the present invention, a problem may arise in that the lengths of the acquired noises differ from one another. The noise sampling method that may be used in this case, when performing learning on multiple spectrograms, is described below.

According to an embodiment of the present invention, there may be at least 9 types of noise, and thousands of pieces of sound information may be applied to each type of noise.

When a plurality of spectrograms according to an embodiment of the present invention are input, the spectrograms may be divided into 30-second units, and 40 of them (data corresponding to a total time interval of 20 minutes) may be input to the deep learning model. In order to match the time interval of the input data, noises can be randomly sampled so as to correspond to a time interval of 20 minutes (e.g., 5 minutes, 9 minutes, 4 minutes, 7 minutes, etc.). If the total time interval of the sampled noises exceeds 20 minutes, the portion exceeding 20 minutes may be excluded.
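The sampling-and-truncation procedure described above can be sketched as follows. The function name `sample_noise_segments` is hypothetical, and sampling clips uniformly with replacement is an assumption; the description only states that noises are sampled randomly and that any portion beyond the target interval is excluded.

```python
import random


def sample_noise_segments(clip_lengths_s, target_s=1200.0, rng=None):
    """Randomly pick noise clips until they cover target_s seconds.

    Returns a list of (clip_index, seconds_used) pairs. The last clip is
    truncated so that the total duration equals target_s exactly.
    """
    if not clip_lengths_s or min(clip_lengths_s) <= 0:
        raise ValueError("clip lengths must be positive")
    rng = rng or random.Random()
    segments = []
    total = 0.0
    while total < target_s:
        idx = rng.randrange(len(clip_lengths_s))
        used = min(clip_lengths_s[idx], target_s - total)  # drop the excess
        segments.append((idx, used))
        total += used
    return segments


if __name__ == "__main__":
    # Clip lengths of 5, 9, 4, and 7 minutes, expressed in seconds.
    clips = [300, 540, 240, 420]
    for idx, used in sample_noise_segments(clips, rng=random.Random(0)):
        print(f"clip {idx}: use {used:.0f} s")
```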

A process of converting the noises into a spectrogram is also performed on noises corresponding to the same time interval (e.g., 20 minutes) as that of the clean data spectrogram according to an embodiment of the present invention, and corrupted data can be obtained by arbitrarily adding the result to the clean data spectrogram.

In the process of arbitrarily adding noise to a spectrogram according to an embodiment of the present invention, the spectrogram may be a Mel spectrogram to which a Mel scale is applied. Meanwhile, in the process of adding noise according to an embodiment of the present invention, the noise may be noise or spectrogram of the same domain as the acoustic information, or noise of the same domain as the Mel spectrogram to which the Mel scale is applied. Here, the domain of acoustic information may be a domain having amplitude, phase, and frequency.

Additionally, the domain of the spectrogram or mel spectrogram according to an embodiment of the present invention may be a domain having amplitude and frequency.

According to an embodiment of the present invention, when noise is converted to the same domain as the spectrogram or Mel spectrogram and added, the addition process can be performed by assigning a random phase to the noise. This makes inversion of the data in the Mel domain more difficult, so that individual privacy can be protected by keeping the data de-identified, while learning time can be shortened by lowering the computational cost of learning.
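One way to read the random-phase addition is sketched below. This is a hypothetical interpretation rather than the method of the invention: it assumes linear-magnitude spectrograms of equal shape, treats each clean bin as having phase zero so that only the relative phase of the noise is random, and the function name `add_noise_random_phase` is an illustration.

```python
import cmath
import math
import random


def add_noise_random_phase(clean_mag, noise_mag, rng=None):
    """Add a noise magnitude spectrogram to a clean one using random phases.

    Both inputs are lists of rows of non-negative magnitudes with the same
    shape. Each noise bin receives a phase drawn uniformly from [0, 2*pi),
    and the magnitude of the complex sum is returned for every bin.
    """
    rng = rng or random.Random()
    corrupted = []
    for clean_row, noise_row in zip(clean_mag, noise_mag):
        row = []
        for c, n in zip(clean_row, noise_row):
            phase = rng.uniform(0.0, 2.0 * math.pi)
            row.append(abs(c + cmath.rect(n, phase)))
        corrupted.append(row)
    return corrupted


if __name__ == "__main__":
    clean = [[1.0, 2.0], [0.5, 0.0]]
    noise = [[0.5, 0.5], [0.5, 1.0]]
    print(add_noise_random_phase(clean, noise, random.Random(0)))
```

Because the phase is random, each resulting bin lies somewhere between the difference and the sum of the two magnitudes, and no waveform-domain mixing is needed to produce the corrupted spectrogram.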

When the acquired corrupted data and clean data are used as inputs to the same deep learning model, training them so that the outputs are the same can be said to be a consistency training method. Meanwhile, the detailed description regarding the above-described time interval and number of spectrograms is only an example to aid understanding of the present invention, and the present invention is not limited thereto.

Unsupervised Domain Adaptation (UDA)

Figure 22 is a diagram for explaining Unsupervised Domain Adaptation (UDA) according to an embodiment of the present invention. According to one embodiment of the present invention, the processor 130 may perform learning on an artificial intelligence model including a feature extraction model, a feature classification model, and a discriminator model.

UDA according to an embodiment of the present invention can sufficiently learn an AI model through supervised learning using labeled data, and then conduct additional learning only with additional unlabeled data.

Alternatively, UDA according to an embodiment of the present invention may be configured and performed through primary learning and secondary learning.

In the primary learning of UDA according to an embodiment of the present invention, unlabeled data and labeled data can be used.

In the secondary learning of UDA according to an embodiment of the present invention, unlabeled data can be used.

The primary learning of UDA according to an embodiment of the present invention may include performing learning to extract commonalities between data by using data acquired in different environments as input to a sleep analysis model. In addition, it may include performing learning to distinguish and classify differences between the input data by using data acquired in different environments as input to one sleep analysis model and using the commonalities extracted between the data as input to the deep learning model. The primary learning of UDA according to an embodiment of the present invention may include learning to extract common data (e.g., human sleep sound information) among labeled data and unlabeled data using a feature extraction model.

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be data acquired in a specific environment (preferably, a polysomnographic environment), and unlabeled data may be data acquired in a different environment or target environment (preferably, an environment other than polysomnography).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a specific race (e.g., Korean), and unlabeled data may be sleep sound information obtained from people of other races (e.g., Asian, Black, White, Hispanic, etc.).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a specific gender (e.g., male), and unlabeled data may be sleep sound information obtained from a different gender (e.g., female).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a specific age group (e.g., people in their 20s), and unlabeled data may be sleep sound information obtained from other age groups (e.g., people in their teens, 30s, 40s, etc.).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a specific body composition index group (e.g., a group with a body mass index (BMI) of 25 or more), and unlabeled data may be sleep sound information obtained from a different body composition index group (e.g., a group with a BMI of less than 25).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a group with a sleep disease (e.g., a group of patients with sleep apnea), and unlabeled data may be sleep sound information obtained from a group without a sleep disease (e.g., a group without sleep apnea).

Labeled data used for primary learning of UDA according to an embodiment of the present invention may be sleep sound information obtained from a group with a respiratory disease (e.g., a group of asthma patients), and unlabeled data may be sleep sound information obtained from a group without a respiratory disease (e.g., a group without asthma).

Labeled data used for primary learning of UDA according to an embodiment of the present invention is not limited to applying each environment or characteristic described above individually, and may be acoustic information obtained from a combination of target groups representing one or more environments or characteristics.

In addition, the unlabeled data used for primary learning of UDA according to an embodiment of the present invention is not limited to applying each environment or characteristic described above individually, and may be acoustic information obtained from a combination of target groups representing one or more environments or characteristics.

In addition, the primary learning of UDA according to an embodiment of the present invention may include learning to classify whether the input data was obtained from a specific environment, another environment, or the target environment, by using data acquired in a specific environment and data acquired in a different environment or target environment as input to a feature extraction model and using the extracted common data as input to a discriminator model.

In this case, in the primary learning of UDA according to an embodiment of the present invention, the feature extraction model is trained to output only the commonalities between input data and can therefore play a role in weakening the distinction between data acquired from a specific environment and data acquired from a different environment or target environment, whereas the discriminator model can play a role in strengthening that distinction. For this purpose, the losses applied to the two models can be set in opposite directions.

In an embodiment of the present invention, unlike data acquired from a specific environment, data acquired from another environment or target environment may not have labeling related to sleep state information. Accordingly, learning of sleep state information may be separately performed through a feature extraction model or feature classification model, using data labeled with sleep state information obtained from a specific environment (e.g., a polysomnography environment). Here, the feature extraction model or feature classification model that learns sleep state information from the above labeled data will be referred to as a Classifier for convenience.

If such learning is properly performed, the feature extraction model, which takes data acquired from a specific environment and data acquired from another environment or target environment as input, is trained to extract only the commonalities between the input data. Therefore, sleep state information can be output even when, among the output data of the feature extraction model, the output values for unlabeled data obtained from another environment or target environment are input to the Classifier.

In summary, the primary learning of UDA according to an embodiment of the present invention extracts or outputs, through a feature extraction model, common features of data obtained from a specific environment (e.g., a polysomnography environment) and data obtained from another environment or target environment (e.g., an environment other than polysomnography), and inputs the output common features into a discriminator model to perform learning to classify the differences between data obtained from the specific environment and data obtained from the other environment or target environment. Meanwhile, by training a classifier that acquires sleep state information from data acquired from the specific environment, learning can be performed so that, when data acquired from another environment or target environment is input into the feature extraction model and the extracted information is input into the classifier, sleep state information is output even though the data is unlabeled.

In addition, the output data of the discriminator model according to an embodiment of the present invention may be recycled in the primary learning of UDA and may be used in various ways, such as being used as a correction value for the final output of the classifier.

During the primary learning of UDA according to an embodiment of the present invention, in the process in which the classifier learns sleep state information from data obtained from a specific environment, even if data acquired from another environment or target environment (e.g., a home environment, etc.) is input, learning using that data may not be performed, since data acquired from other environments or target environments do not have labels for sleep stages.

When the primary learning of UDA according to an embodiment of the present invention is performed ideally, the feature extraction model can extract commonalities between data, so the classifier that takes the output commonalities as input performs clustering when outputting sleep state information and can distinguish well whether the sleep stage corresponding to the input data is REM sleep, light sleep, the wake state, or deep sleep. Secondary learning can also be performed using techniques such as Conditional Entropy in order to better perform such clustering.

In secondary learning of UDA according to an embodiment of the present invention, learning can be performed using unlabeled data as input to a deep learning model.

When unlabeled data is input into a deep learning model, information such as a prediction and a confidence can be output. Here, a higher confidence may mean that the class information contained in the predicted value is more reliable.

In the secondary learning of UDA according to an embodiment of the present invention, learning is performed using a loss that makes the class information included in the predicted value of the classifier's sleep state information or sleep stage information more reliable. In this case, it can include a method of self-learning using the output data of unlabeled data.
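A loss of this kind can be illustrated with the entropy of the classifier's predicted distribution, which is low when the prediction is confident and requires no label. The sketch below is a hypothetical illustration: the function names are not from the description, and using the mean Shannon entropy over a batch is an assumption about how such a loss might be formed.

```python
import math


def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy of one predicted class distribution (in nats)."""
    return -sum(p * math.log(p + eps) for p in probs)


def conditional_entropy_loss(batch_probs):
    """Mean entropy over a batch of predictions for unlabeled inputs.

    Minimizing this value pushes each prediction toward a single class,
    that is, toward higher confidence, without using any labels.
    """
    return sum(prediction_entropy(p) for p in batch_probs) / len(batch_probs)


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]   # e.g. Wake, REM, Light, Deep
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(prediction_entropy(confident), prediction_entropy(uncertain))
    print(conditional_entropy_loss([confident, uncertain]))
```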

The specific description of the various environments or characteristics described above is merely an example, and the present invention is not limited thereto.

Additionally, according to an embodiment of the present invention, the processor 130 may utilize a feature extraction model to extract a plurality of features corresponding to each of a plurality of source data and a plurality of target data. The processor 130 may process, as input to the feature extraction model, a plurality of pieces of converted information (e.g., information on the frequency domain or a spectrogram) corresponding to a plurality of source data and a plurality of pieces of converted information (e.g., information on the frequency domain or a spectrogram) corresponding to a plurality of target data, to extract a plurality of features. That is, the plurality of features related to the output of the feature extraction model may include a plurality of features related to source data and a plurality of features related to target data.

In one embodiment, since the plurality of source data and the plurality of target data are data related to different domains, the features extracted for each epoch from the converted information (e.g., information in the frequency domain or a spectrogram) corresponding to the plurality of source data are labeled with information about the sleep stage, whereas the features extracted for each epoch from the converted information corresponding to the plurality of target data may not be labeled with information about the sleep stage. In other words, sleep stage classification may be difficult because features corresponding to target data containing various noises do not have labeled information.

Accordingly, according to an embodiment of the present invention, the processor 130 may arrange each of the plurality of first features corresponding to the plurality of source data and the plurality of second features corresponding to the plurality of target data adjacent to one another in the vector space, while clustering and arranging each of the plurality of first features and the plurality of second features by class.

According to an embodiment of the present invention, the first features corresponding to a plurality of source data and the second features corresponding to a plurality of target data may each include information about various features (e.g., features of sleep stages such as light sleep, REM sleep, etc.). Here, the second features corresponding to the plurality of target data do not have labeled information, but when they are well mixed with the first features corresponding to the plurality of source data, classification of the second features may become possible as the first features are classified, and thus analysis of sleep sound information (i.e., multiple target data) including various noises may also become possible. In other words, it may be important that the second features be mapped so that they blend well with the first features, for which labeled information exists, and so that classification between classes is facilitated.

Specifically, the second features may be mapped so that classification between classes is easy. For example, when the first features and the second features are far apart in the vector space and each class is classified based on the labeling information of the first features (e.g., when the classes of the first features are separated by an imaginary line), second features corresponding to different classes may be classified into the same class, or second features corresponding to the same class may be classified into different classes. In other words, in the present invention, the feature extraction model can be trained so that the first and second features are arranged adjacent to one another in the vector space and each feature is clustered and arranged by class.

To this end, the processor 130 may transfer a plurality of features to each of the feature classification model and the discriminator model.

According to one embodiment, the feature classification model may be a neural network model that classifies a plurality of features into each of one or more sleep stages. The feature classification model may be a neural network model trained to predict sleep stages in response to features. In an embodiment, the processor 130 may generate a feature classification model by performing learning on a neural network using label information matched to each feature. For example, the feature classification model may include a fully connected layer and may be a model that classifies features into at least one of the sleep stages. For example, when the feature classification model receives a feature corresponding to the first spectrogram as input, it may classify the feature as shallow sleep (e.g., the first sleep stage).

Meanwhile, according to an embodiment of the present invention, the processor 130 may update the feature extraction model using the first learning information related to the first loss between the feature extraction model and the feature classification model, according to the learning results of the feature extraction model and the feature classification model.

According to one embodiment, the processor 130 may obtain second learning information from the discriminator model. The second learning information may be related to adversarial learning between the feature extraction model and the discriminator model.

The processor 130 may perform learning through the second loss of the feature extraction model and the discriminator model. The second loss may refer to the loss related to adversarial learning between the feature extraction model and the discriminator model.

As a specific example, the discriminator model can be trained to output a probability value close to 1 when it receives the first feature corresponding to source data as input, and to output a probability value close to 0 when it receives the second feature corresponding to target data as input. The sum of the difference between the output value for the first feature and 1 and the difference between the output value for the second feature and 0 may be the loss (or loss function) of the discriminator model. The feature extraction model can be trained with the aim of deceiving the discriminator model (i.e., making it difficult to distinguish between the first and second features), so that when the features generated by the feature extraction model are input to the discriminator model, the output is close to 1. The error between the output value and 1 may be the loss of the feature extraction model. That is, each model can be trained by the processor 130 in a way that minimizes its loss. In other words, the processor 130 may perform training on the adversarial neural network by updating the parameters of the feature extraction model and discriminator model in a direction that minimizes adversarial loss.
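The two opposing losses described in this example can be written out as in the sketch below. It is an illustration under stated assumptions: the "difference" is taken as a mean absolute difference over a batch (a cross-entropy form would be an equally plausible reading), and the function names are hypothetical.

```python
def discriminator_loss(p_source, p_target):
    """Loss of the discriminator model.

    p_source: discriminator outputs for first features (source data, goal 1).
    p_target: discriminator outputs for second features (target data, goal 0).
    """
    source_term = sum(abs(1.0 - p) for p in p_source) / len(p_source)
    target_term = sum(abs(p - 0.0) for p in p_target) / len(p_target)
    return source_term + target_term


def extractor_adversarial_loss(p_features):
    """Loss of the feature extraction model.

    The extractor tries to make the discriminator output values close to 1
    for the features it generates, so the error is measured against 1.
    """
    return sum(abs(1.0 - p) for p in p_features) / len(p_features)


if __name__ == "__main__":
    p_src, p_tgt = [0.9, 0.8], [0.2, 0.1]
    print(discriminator_loss(p_src, p_tgt))
    print(extractor_adversarial_loss(p_tgt))
```

For target features the two objectives pull in opposite directions: the discriminator loss falls as its outputs approach 0, while the extractor loss falls as the same outputs approach 1.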

Additionally, in one embodiment of the present invention, the processor 130 may update the parameters of each model so that the feature extraction model generates second features as close as possible to the first features, while the discriminator model increases its probability of correctly determining that a second feature is related to target data.

That is, the processor 130 updates the parameters of the feature extraction model using the second (adversarial) loss between the feature extraction model and the discriminator model as the second learning information, so that the feature extraction model can output similar features (i.e., features whose positions are close in the vector space) for spectrograms corresponding to the two domains. In other words, the feature extraction model updated through the second learning information can extract the first features related to the source data and the second features related to the target data so that they are well mixed in the vector space, without distinction by domain.

As described above, when spectrograms related to the source data and the target data are input to the feature extraction model updated through the first learning information and the second learning information, the processor 130 can arrange the first features and the second features adjacent to each other in the vector space, while clustering the features by class.

According to an embodiment of the present invention, second features without labeling information are appropriately arranged based on first features with labeling information, so that once the first features are classified, the second features can be classified as well. Accordingly, analysis of sleep sound information (i.e., multiple target data) including various noises may also be possible.

According to an embodiment of the present invention, the processor 130 may generate a plurality of source sub-data and a plurality of target sub-data by dividing each of the plurality of source data and the plurality of target data into predetermined sample units.

Additionally, in one embodiment of the present invention, the processor 130 may generate one or more sample features by processing frequency-domain information or spectrograms corresponding to each of the plurality of source sub-data and the plurality of target sub-data as input to the feature extraction model.

Specifically, rather than processing the frequency-domain information or spectrogram corresponding to each of the source data and target data as a whole as input to the feature extraction model, the processor 130 may divide each data into samples and process the resulting plurality of frequency-domain information items or spectrograms as input to each of a plurality of feature extraction models. In an embodiment, the plurality of feature extraction models may share parameters. That is, the plurality of feature extraction models may be updated to have the same performance.

Spectrograms corresponding to each sample may generate features as they independently pass through each feature extraction model, and each of the generated features may be passed to the discriminator model. In this case, the discriminator model receives each feature corresponding to the sample unit.

Raw sleep sound information according to embodiments of the present invention is sequential (time-series) data and may be large in volume. If information in the frequency domain, or information including changes of the frequency components along the time axis (e.g., a spectrogram), is generated from such sequential data and passed to the discriminator model as a whole, the discriminator model must divide the received spectrogram into epoch units and make a decision for each epoch (e.g., whether a feature is related to the source data or to the target data), so the amount of information to be learned may become excessive. In other words, when features are extracted from an entire spectrogram that has not been divided into samples and are transmitted to the discriminator model, the learning efficiency of the discriminator model may be reduced.

Accordingly, the processor 130 may divide data (e.g., a spectrogram) into samples, extract features corresponding to each sample, and input the features for each sample into the discriminator model. Training sample by sample in this way makes it possible to train the discriminator model with less data and can improve overall model performance through efficient learning.
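The division into sample units described above can be sketched as follows. The function name, the representation of a spectrogram as a list of frames, and the choice to drop trailing frames that do not fill a whole sample are illustrative assumptions.

```python
def split_into_samples(frames, sample_len):
    """Split a sequence of spectrogram frames into consecutive fixed-length samples.

    Trailing frames that do not fill a whole sample are dropped
    (an assumption; padding would be another option).
    """
    if sample_len <= 0:
        raise ValueError("sample_len must be positive")
    n_full = len(frames) // sample_len
    return [frames[i * sample_len:(i + 1) * sample_len] for i in range(n_full)]


# Ten hypothetical frames split into samples of four frames each.
samples = split_into_samples(list(range(10)), 4)
print(len(samples), "samples of", len(samples[0]), "frames")
```

Each resulting sample would then be passed through a feature extraction model with shared parameters, and the per-sample features passed to the discriminator model.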

According to one embodiment of the present invention, the processor 130 may refine the feature classification model through decision boundary iterative refinement learning using a plurality of target data. Refining the feature classification model may mean using features corresponding to target data as a classification standard rather than features corresponding to source data in the process of classifying features into each class. In other words, this may mean transforming the decision boundary (i.e., classification boundary) of features corresponding to source data into a boundary based on features corresponding to target data. Processor 130 may gradually push the decision boundary out of the data density region by minimizing the target-side cluster assumption violation loss.

Specifically, the processor 130 may perform decision boundary iterative refinement learning using a teacher network. Decision boundary iterative refinement learning may be learning to improve the placement of decision boundaries based on minimizing conditional entropy related to the output of each teacher network and student network. In a specific embodiment, the processor 130 may input spectrograms corresponding to a plurality of target data into each of the student network and the teacher network. In this case, each of the student network and teacher network may be configured to include a feature extraction model and a feature classification model. The processor 130 can learn to improve the arrangement of the decision boundary through conditional entropy related to the output of the student network and the output of the teacher network.
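As a rough illustration of the quantity being minimized, the following sketch computes the mean conditional entropy of class probabilities predicted for target data. It shows only the entropy term; how the teacher network and student network outputs are combined is not specified here, and the function name is hypothetical.

```python
import math


def conditional_entropy(prob_rows, eps=1e-12):
    """Mean entropy of per-sample class probability vectors.

    Low values mean confident predictions, i.e., samples lying far from
    the decision boundary. `eps` guards against log(0).
    """
    total = 0.0
    for probs in prob_rows:
        total += -sum(p * math.log(p + eps) for p in probs)
    return total / len(prob_rows)


# Hypothetical four-class predictions for target samples.
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
print(conditional_entropy(confident) < conditional_entropy(uncertain))
```

Minimizing this quantity on target data discourages decision boundaries that pass through dense regions of target features.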

Accordingly, the feature classification model can be refined so that classification, which was previously performed on the basis of a decision boundary determined by features corresponding to the source data, is performed on the basis of a decision boundary determined by features corresponding to the target data. Since the sleep analysis model of the present invention must be able to provide analysis information on sounds containing a lot of noise in the real lives of general users, basing the decision boundary on the features corresponding to the target data, as described above, can improve accuracy in calculating sleep state information. In other words, through decision boundary iterative refinement learning, the feature classification model can output sleep state information with improved accuracy in response to features related to sleep acoustic information including various noises.

According to one embodiment of the present invention, the processor 130 may generate a sleep analysis model from the learning model when learning is completed. Specifically, a sleep analysis model can be generated from the trained feature extraction model and feature classification model of the learning model updated through adversarial learning between the feature extraction model and the feature classification model and adversarial learning between the feature extraction model and the discriminator model. That is, the sleep analysis model can be constructed from the feature extraction model and the feature classification model in the updated learning model, as shown in FIG. 17A or 17B.

The sleep analysis model according to an embodiment of the present invention is created through an adaptive learning process, that is, from a training model updated through the first learning information and the second learning information, thereby making it possible to predict the sleep state with improved accuracy even for acoustic data containing various noises.

In one embodiment of the present invention, the processor 130 may obtain sleep sound information from the user terminal 10 and provide sleep state information corresponding to the acquired sleep sound information. The processor 130 may generate sleep state information corresponding to sleep sound information using a sleep analysis model.

In this case, the sleep analysis model can make robust predictions even on acoustic data containing a variety of noises through the adaptive learning described above, which has the advantage that users can easily obtain analysis information on their sleep state in their everyday environment. In other words, analysis information on one's sleep state can be provided in a general home environment, without visiting a specialized medical institution, without cost, without special equipment other than a device capable of acquiring sound, and without creating a special sleep environment.

Semi-supervised learning using pseudo labels

Semi-supervised learning according to an embodiment of the present invention may mean training a deep learning model by using, as a pseudo label, the output that the model produces when unlabeled data is input.

When unlabeled data is input into a deep learning model, information such as a prediction and its confidence can be output. Here, a higher confidence may mean that the class information contained in the predicted value is more reliable.

In semi-supervised learning according to an embodiment of the present invention, when unlabeled data is input into a deep learning model, the prediction value output from the model is treated as a new label (pseudo label), and learning of sleep state information can be performed on that basis.

Semi-supervised learning according to an embodiment of the present invention can perform augmentation preprocessing on images to perform learning. Augmentation preprocessing can be weakly-augmented or strongly-augmented.

According to an embodiment of the present invention, the weakly-augmented method is a method that modulates the image relatively little, and one or more augmentation techniques can be used. Weakly-augmented augmentation techniques may include data augmentation or pitch shifting augmentation techniques (preferably, a technique in which the pitch is shifted in the range of 10% to 20%).

According to an embodiment of the present invention, the strongly-augmented method is a method of modulating an image relatively heavily, and one or more augmentation techniques may be used. The strongly-augmented augmentation technique may include one or more of data augmentation, TUT augmentation, or noise-added augmentation techniques.

The learning method according to an embodiment of the present invention may include performing augmentation preprocessing on an image and then training with the result as input to a deep learning model. It may include using weakly-augmented image information as input to the deep learning model, taking the output prediction as a pseudo label, and performing learning again on that basis. While learning proceeds with the weakly-augmented technique according to an embodiment of the present invention, a Moving Average technique, a Weighted Average technique, a Weighted Moving Average technique, or an Exponential Weighted Moving Average technique can be used to reflect intermediate learning information in the final learning model.
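One of the averaging techniques named above, the exponential weighted moving average, can be sketched as follows. The function name, the flat list of parameters, and the decay value are assumptions for illustration.

```python
def ema_update(avg_params, new_params, decay=0.99):
    """One exponential-moving-average step over a flat list of parameters.

    `decay` controls how much of the running average is retained;
    0.99 is an illustrative value, not one taken from the description.
    """
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg_params, new_params)]


# Hypothetical parameters: the running average drifts toward the latest values.
averaged = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    averaged = ema_update(averaged, step_params, decay=0.5)
print(averaged)
```

Applying such an update after each training step lets the final model reflect intermediate learning states rather than only the last one.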

According to an embodiment of the present invention, a highly reliable pseudo label can be obtained by performing inference with weakly-augmented data as input to a deep learning model.
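A minimal sketch of confidence-based pseudo-labeling is given below. The class order, the function names, and the confidence threshold of 0.9 are assumptions for illustration; the logits stand in for the model output on a weakly-augmented input.

```python
import math

# Class order is an illustrative assumption.
STAGES = ["wake", "REM", "light", "deep"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, threshold=0.9):
    """Return (stage, confidence) if the top probability reaches the
    threshold, otherwise None (the sample is not used as a pseudo label)."""
    probs = softmax(logits)
    confidence = max(probs)
    if confidence < threshold:
        return None
    return STAGES[probs.index(confidence)], confidence


print(pseudo_label([0.0, 0.0, 0.0, 5.0]))  # confident prediction is kept
print(pseudo_label([1.0, 1.1, 0.9, 1.0]))  # ambiguous prediction is rejected
```

Only predictions that pass the threshold would be used as labels for the subsequent supervised step on strongly-augmented data.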

In the learning method according to an embodiment of the present invention, supervised learning can be performed on strongly-augmented data by using, as data labels, the pseudo labels obtained by performing unsupervised learning through weakly-augmented data augmentation. By performing supervised learning with pseudo labels, more information can be learned from relatively heavily modulated images as input. In addition, because supervised learning can be performed without manual data labeling, more data can be learned.

In addition, the learning method according to an embodiment of the present invention may include performing tuning so that the distribution of sleep state predictions (e.g., predicted values of the REM sleep stage, the wake state, the light sleep stage, the deep sleep stage, etc.) output when data acquired from a target environment (e.g., an environment other than polysomnography) or a target group (e.g., a group without sleep disorders) is input to the deep learning model matches the distribution of sleep state predictions output when data acquired from a specific environment (e.g., a polysomnography environment) or a comparison group (e.g., a group of sleep disorder patients) is input to the deep learning model.

In this case, the distribution of sleep state predictions output for data acquired in the specific environment (e.g., a polysomnography environment) and the distribution of predictions obtained for the target environment (e.g., an environment other than polysomnography) may not match completely, but the method may include tuning so that the modeling data is formed in a consistent direction.

Un/Self-Supervised learning

Un-Supervised learning or Self-Supervised learning according to an embodiment of the present invention may refer to a method of pre-training a deep learning model so that it can increase the reliability of its predictions for image information even when the image information in the image domain has no label. In this case, it may include damaging part of the image information and then training the model to predict the damaged part.
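The step of damaging part of the input can be sketched as masking a contiguous span of spectrogram frames. The function name, the time-span masking strategy, and the fill value are assumptions for illustration; the model would then be trained to predict the frames that were removed.

```python
import random


def mask_time_span(spectrogram, span, rng, fill=0.0):
    """Return (masked_copy, start), where `span` consecutive frames starting
    at `start` are replaced by `fill`. The input is left unchanged."""
    n_frames = len(spectrogram)
    if not 0 < span <= n_frames:
        raise ValueError("span must be between 1 and the number of frames")
    start = rng.randrange(0, n_frames - span + 1)
    masked = [list(frame) for frame in spectrogram]
    for t in range(start, start + span):
        masked[t] = [fill] * len(masked[t])
    return masked, start


# Hypothetical spectrogram: 6 frames with 2 frequency bins each.
spec = [[1.0, 2.0] for _ in range(6)]
masked, start = mask_time_span(spec, span=2, rng=random.Random(0))
print("masked frames start at index", start)
```

The original frames at the masked positions serve as the prediction target, so no manual label is required.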

The learning method according to an embodiment of the present invention may include first performing pre-training using unlabeled data as input to a deep learning model, and then performing additional learning using labeled data as input.

When a deep learning model pre-trained by the Un-Supervised learning or Self-Supervised learning method according to an embodiment of the present invention is then trained on the originally intended task with a smaller amount of labeled data as input, the reliability of the predicted values of the pre-trained deep learning model can be further increased.

Meanwhile, analysis of sleep state information based on sound information may include a step of identifying patterns for sleep sounds, such as breathing and body movements. However, because the characteristics of sleep sound patterns are reflected over time, it may be difficult to fully understand them with only a short snapshot of sound data at a specific point in time. Therefore, in order to model acoustic information, analysis must be performed based on the time series characteristics of acoustic information, and there has been a demand for applying semi-supervised learning methods to such time series data.

In addition, unlabeled sound information may include environmental sensing information, which is sound information acquired by the user through the user terminal 10. Because this environmental sensing information is unlabeled data acquired in environments where quality control may be lacking (e.g., everyday noise, sounds from multiple people, music, sounds of nature, etc.), there has been a demand for new approaches to modeling such data.

Therefore, according to an embodiment of the present invention, a semi-supervised learning method based on sequential consistency loss that considers the time-series characteristics of sleep sound information can be provided.

Additionally, in an embodiment, a semi-supervised contrastive learning (SSCL) method may be provided to process out-of-distribution (OOD) data from unlabeled information.

When the semi-supervised learning and semi-supervised contrastive learning methods according to embodiments of the present invention were evaluated on various data sets, including labeled acoustic data sets and acoustic data sets from polysomnography (PSG), it was found that the performance of the sleep analysis model is robust even in real environments and that the sleep analysis model according to the embodiment of the present invention can generalize. Hereinafter, the semi-supervised learning and semi-supervised contrastive learning methods for processing sleep acoustic information in a real environment, which account for both noise characteristics and time-series characteristics, will be described in detail using drawings and formulas.

A sleep analysis method based on acoustic information, which requires analysis of temporal or time-series information, may be formulated as a sequence prediction task. Here, a series of Mel spectrogram samples, denoted by $x = \{x_1, \ldots, x_{N_s}\}$, is input, and the corresponding sequence of sleep stage labels $y = \{y_1, \ldots, y_{N_s}\}$ can be predicted or output. Here, x _i and y _i mean the i-th sample of x and y, which are a series of information, respectively, and N _s means the number of samples in the sequence.

y _i , the label of the sleep stage, belongs to one of four possible sleep stage classes (wake stage, REM sleep stage, light sleep stage, and deep sleep stage) and can be expressed as a one-hot label. Meanwhile, according to an embodiment of the present invention, a sequence-to-sequence model consisting of a backbone network for low-level feature extraction and a head for learning the time-series correlation between Mel spectrograms may be used.
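The one-hot representation of the four sleep stage classes can be sketched as follows. The ordering of the classes and the function name are assumptions for illustration.

```python
# Class order is an illustrative assumption.
STAGES = ("wake", "REM", "light", "deep")


def one_hot(stage):
    """Return the one-hot label vector for a sleep stage name."""
    if stage not in STAGES:
        raise ValueError("unknown sleep stage: " + repr(stage))
    return [1 if s == stage else 0 for s in STAGES]


# A hypothetical label sequence y for a sequence of three samples.
y = [one_hot(s) for s in ("wake", "light", "deep")]
print(y)
```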

According to the embodiment, the backbone network and the head network can be replaced with transformer-based artificial intelligence models such as MobileViTV2 and ViT, respectively, as shown in FIG. 17B. These models generate a sequence of predicted sleep-stage logits, and the supervised baseline is trained exclusively using cross-entropy loss (L _SUP ).

Learning method based on sequential consistency loss

To achieve high performance in real-life environments rather than laboratory or hospital environments, it is important to use unlabeled data. For consistency training according to an embodiment of the present invention, two different augmented samples are generated from each unlabeled sample u _i , and the sleep analysis model can output the sleep-stage logits corresponding to each augmented sample. Here, the consistency loss (L _C ) for each sample as shown in [Equation 1] below can be used. In [Equation 1], Jensen-Shannon divergence can be used, and B _u means the batch size of the unlabeled sequence.

[Equation 1]
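A per-sample consistency loss based on Jensen-Shannon divergence can be sketched as follows. Averaging over all samples and the function names are assumptions; the sketch is one form consistent with the description, not a reproduction of [Equation 1].

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) with a small epsilon for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def consistency_loss(logits_a, logits_b):
    """Mean JS divergence between predictions for two augmented views.
    Both arguments are lists of per-sample logit vectors (assumed layout)."""
    values = [js_divergence(softmax(a), softmax(b)) for a, b in zip(logits_a, logits_b)]
    return sum(values) / len(values)


# Hypothetical logits for two augmentations of the same two unlabeled samples.
view_a = [[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]]
view_b = [[1.5, 0.2, 0.1, 0.3], [0.1, 2.5, 0.4, 0.1]]
print("consistency loss:", consistency_loss(view_a, view_b))
```

The loss is zero when the two views yield identical predictions and grows as the predicted distributions diverge.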

On the other hand, the consistency loss (L _C ) can make the sleep analysis model generalize better when predicting the sleep stage of each sample, but it may have difficulty exploiting the temporal correlation or time-series information of the sleep acoustic information and the corresponding labels. Therefore, according to an embodiment of the present invention, a sequential consistency loss (L _SC ) that matches the similarity of the prediction sequences can be used, as shown in [Equation 2] below. In [Equation 2], ° denotes the Hadamard power, and ⊙ denotes the element-by-element product of two matrices, with the result averaged over its values.

[Equation 2]

According to one embodiment of the present invention, cosine similarity may be adopted between the logits of the i-th sample and the j-th sample in order to capture how the predicted sleep stage varies over time. Cosine similarity is calculated by dividing the dot product of two vectors by the product of their magnitudes, and indicates the degree of similarity between the vectors as the cosine of the angle between them in the inner product space. Two different augmented sequences are generated from the same sequence u _s , and a symmetric cosine similarity matrix is computed from the predictions for each of them. Additionally, a weighting mask matrix W that assigns a higher weight to a pair of nearby samples than to a pair of distant samples can be defined as shown in [Equation 3] below.

[Equation 3]

In [Equation 3], w _min means the minimum weight value, which is assigned to the furthest pair. Therefore, according to one embodiment of the present invention, the loss can force the predictions for two different augmentations of the same sequence to have similar sequential tendencies.
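The idea of matching the similarity structure of two prediction sequences can be sketched as follows. The squared difference, the averaging over upper-triangle pairs, and all function names are assumptions; the weighting mask is passed in as a plain matrix, and the sketch is not a reproduction of [Equation 2] or [Equation 3].

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of magnitudes (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_matrix(logits):
    """Symmetric matrix of cosine similarities between per-sample logits."""
    n = len(logits)
    return [[cosine_similarity(logits[i], logits[j]) for j in range(n)] for i in range(n)]


def sequential_consistency_loss(logits_a, logits_b, weights):
    """Weighted mean squared difference between the two similarity matrices,
    taken over pairs i < j (assumed form). Requires at least two samples."""
    c_a = similarity_matrix(logits_a)
    c_b = similarity_matrix(logits_b)
    n = len(c_a)
    terms = [
        weights[i][j] * (c_a[i][j] - c_b[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(terms) / len(terms)


# Hypothetical logits for two augmented views of a three-sample sequence.
view_a = [[2.0, 0.1], [1.8, 0.3], [0.2, 2.1]]
view_b = [[1.9, 0.2], [0.4, 1.7], [0.1, 2.0]]
uniform = [[1.0] * 3 for _ in range(3)]
print("sequential consistency loss:", sequential_consistency_loss(view_a, view_b, uniform))
```

Unlike the per-sample consistency loss, this term compares how predictions relate to one another within each sequence, so it is sensitive to the order and transitions of predicted stages.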

Figure 30 is a diagram illustrating an example of a learning method based on consistency loss or sequential consistency loss when the number of samples in the sequence is 6, according to an embodiment of the present invention. In Figure 30, the upper triangular matrices of C ^a and W are displayed to facilitate understanding of the two consistency losses.

The values in the sequence shown in the upper part of Figure 30 are the predicted values of the sleep analysis model for each epoch. Similarly, the values in the sequence shown in the lower part of Figure 30 are the predicted values of the sleep analysis model for each epoch. The two sequences shown in Figure 30 may be the results of performing prediction on the same unlabeled sequence. Training a model with the consistency loss means training the sleep analysis model with a loss set so that these two results remain consistent.

On the other hand, when data forms a time series sequence, relationships within the sequence may be important when analyzing characteristics for each epoch. Therefore, the relationship between the predicted values within one sequence can be expressed as C. For example, one element of C shows the relationship between the predicted values for two epochs. Applying this to every pair of predicted values within the first sequence yields an upper triangular matrix (which can be expressed as C ^a ). Likewise, applying it to every pair of predicted values within the second sequence yields a second upper triangular matrix.

For example, the predicted values for epoch 1 and epoch 2 may be related within the first sequence and likewise within the second sequence, and the sleep analysis model is trained with a loss set so that these relationships remain consistent between the two sequences.

Meanwhile, the upper triangular matrix represented by W in FIG. 30 represents weights according to importance. Even within the same sequence, the farther apart two samples are in order, the lower their correlation, so the weight can be set smaller. In one embodiment of the present invention, weights of 0.5, 0.625, 0.75, 0.875, and 1.0 are set from the range 0 to 1, as expressed in the upper triangular matrix of W, but this is only an example, and the present invention is not limited thereto. For example, the larger the number of samples included in the sequence, the smaller the difference between weight values can be set. Additionally, the minimum value of the weight may be set to a value other than 0.5, and the difference between the weight values need not be constant.
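The example weights above are consistent with a mask that decreases linearly with the distance between samples, from 1.0 for adjacent samples to the minimum weight for the furthest pair. The following sketch builds such a mask; the linear form and the function name are assumptions inferred from the listed example values.

```python
def weighting_mask(n_samples, w_min=0.5):
    """Upper triangular weighting mask: weights fall linearly from 1.0
    (adjacent samples) to w_min (furthest pair). Entries on and below the
    diagonal are left at 0.0 because only pairs i < j are used."""
    max_gap = n_samples - 1
    mask = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            gap = j - i
            frac = (gap - 1) / (max_gap - 1) if max_gap > 1 else 0.0
            mask[i][j] = 1.0 - (1.0 - w_min) * frac
    return mask


# Six samples with a minimum weight of 0.5, as in the example above.
for row in weighting_mask(6, w_min=0.5):
    print(row)
# First row: [0.0, 1.0, 0.875, 0.75, 0.625, 0.5]
```

With more samples in the sequence the step between consecutive weights shrinks, which matches the remark that the difference between weight values can be set smaller for longer sequences.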

Additionally, the number of samples in the sequence shown in FIG. 30 is merely an example, and the present invention is not limited thereto. For example, it may consist of 40 samples or 14 samples. This may vary depending on what sleep state information the sleep analysis model predicts, and is not absolute.

Learning method based on semi-supervised contrastive loss

Hereinafter, the semi-supervised contrastive learning method according to an embodiment of the present invention will be described in detail using mathematical equations and drawings. According to one embodiment of the present invention, in order to fully utilize unlabeled data that may contain out-of-distribution (OOD) samples, the Class-aware Contrastive Semi-Supervised Learning (CCSSL) method can be adopted.

Meanwhile, according to an embodiment of the present invention, push and pull can be performed not only between unlabeled data but also including labeled data. When this CCSSL method is performed, learning can proceed so that data determined to be out-of-distribution (OOD) is reliably pushed away. According to an embodiment, if labeled data is included in the pushing and pulling between data, the labeled data can serve as a fixed anchor that does not move, and only the unlabeled data moves. When the CCSSL method is performed using labeled data in this way, data judged to be in-distribution is pulled in better and data judged to be out-of-distribution (OOD) is pushed away better, so the learning performance of the sleep analysis model can be further improved.

Referring to FIG. 31, x1 is information labeled as a deep sleep stage and can serve as an anchor. In the semi-supervised contrastive learning method according to an embodiment of the present invention, since u1 has sufficiently high confidence for the class of the deep sleep stage due to its clear and regular breathing pattern, u1 can be pulled toward the anchor x1. Here, a certain threshold can be regarded as the in-distribution threshold of the pseudo label, and if the confidence exceeds that threshold, the sample can be judged to be pulled.

On the other hand, since there is no evidence that u3 belongs to the deep sleep stage class, it may belong to another class or be out-of-distribution (OOD) data, so u3 is pushed away from the anchor x1. Here, a certain threshold can be regarded as the in-distribution threshold of the pseudo label, and if the confidence does not exceed that threshold, the sample can be judged to be pushed away. In other words, through this method, data judged to be out-of-distribution (OOD) can be pushed away.

On the other hand, when the similarity with the anchor is judged to be neither high nor low, as with u2, neither pushing nor pulling may be performed, because there is a risk of pushing away data belonging to the same class or pulling in data belonging to a different class.
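The three cases illustrated by u1, u2, and u3 can be sketched as a simple decision rule. The function name and the default threshold values (0.9 and 0.2) are assumptions for illustration, and the assignment of the higher value to pulling and the lower value to pushing is likewise assumed.

```python
def contrast_action(similarity, pull_threshold=0.9, push_threshold=0.2):
    """Decide how an unlabeled sample is treated relative to a labeled anchor.

    Returns "pull" for confidently same-class samples, "push" for samples
    that are confidently different or out-of-distribution, and "ignore"
    for the ambiguous region in between.
    """
    if similarity > pull_threshold:
        return "pull"
    if similarity < push_threshold:
        return "push"
    return "ignore"


# Hypothetical similarities of u1, u2, u3 to the deep-sleep anchor x1.
for name, sim in (("u1", 0.95), ("u2", 0.50), ("u3", 0.05)):
    print(name, contrast_action(sim))
```

Leaving the middle region untouched avoids pushing away same-class data or pulling in different-class data when the evidence is weak.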

The CCSSL method uses labeled data or pseudo labels as described above to calculate a supervised contrastive loss between unlabeled data samples, and can exclude OOD samples from the clusters representing the features within each class.

The B _u unlabeled sequences contain a total of B _u × N _s samples, and CCSSL can apply two strong augmentation methods to each unlabeled sample u _i . According to one embodiment of the present invention, when the index of an arbitrary augmented sample is denoted by i, the contrastive loss of CCSSL can be defined over the anchors, where the loss of anchor i can be expressed as [Equation 4] below.

[Equation 4]

Here, referring to FIG. 17B, z _i is the embedding of anchor i obtained through a transformer-based artificial intelligence model (or ViT), and the embedding paired with it in [Equation 4] is that of another augmented sample derived from the same unlabeled sample. In other words, z _i means a feature corresponding to one piece of input data (e.g., image information), and the paired embedding may mean the feature obtained after data augmentation is performed on the same input. Meanwhile, · is the operator symbol representing the inner product, and τ represents the temperature. [Equation 4] also uses an index set for the augmented samples associated with the same pseudo label, and a re-weighting factor.

As can be seen from [Equation 4], when training lowers the loss value on the left side, the numerator of the argument of the logarithm in the first term on the right side represents samples being pulled together, and the denominator of that argument represents samples being pushed apart. That is, the numerator involves the embedding obtained by performing data augmentation on the same input as z _i , so it can be understood as being based on data (e.g., images) of the same class as z _i , and learning can be performed so that samples output from data of the same class are pulled together. Additionally, in the denominator, z _i and z _j can be understood as being based on data (e.g., images) of different classes, and learning can be performed so that samples output from data of different classes are pushed apart.

Meanwhile, in the second term on the right side of [Equation 4], the index k ranges over samples judged to be of the same class as the anchor. Since z _i and z _k , which are judged to be of the same class, are trained to attract each other, the part containing them is located in the numerator of the argument of the logarithm. In the denominator, as in the denominator of the first term, it can be understood that learning is performed so that z _i and z _j based on data of different classes repel each other.

As can be seen from [Equation 4], learning can be performed to pull for samples that are not labeled but pseudo-labeled according to the above method.
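The pull and push roles of the numerator and denominator can be illustrated with a generic contrastive loss for a single anchor. This sketch covers only one positive pair; the re-weighting factor and the pseudo-label positives of [Equation 4] are omitted, and the function names and temperature value are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def anchor_contrastive_loss(anchor, positive, others, temperature=0.1):
    """-log( exp(a.p / t) / (exp(a.p / t) + sum over others of exp(a.z / t)) ).

    The numerator pulls the positive toward the anchor; the denominator
    pushes the other embeddings away. Embeddings are assumed unit-normalized.
    """
    pos = math.exp(dot(anchor, positive) / temperature)
    denom = pos + sum(math.exp(dot(anchor, z) / temperature) for z in others)
    return -math.log(pos / denom)


# Hypothetical unit-length embeddings.
anchor = [1.0, 0.0]
aligned = anchor_contrastive_loss(anchor, [1.0, 0.0], [[0.0, 1.0]])
misaligned = anchor_contrastive_loss(anchor, [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # the loss is lower when the positive is close
```

Lowering this loss therefore moves embeddings of the same class together and embeddings of different classes apart, as described above.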

On the other hand, when the unlabeled data is heavily contaminated, CCSSL may be unreliable, because OOD samples in the unlabeled data may be sampled with high confidence and thereby disturb class clustering. Therefore, according to an embodiment of the present invention, to solve this problem, a semi-supervised contrastive learning (SSCL) method that uses reliable labeled data as anchor points for class clustering can be provided.

In semi-supervised contrastive learning (SSCL) according to an embodiment of the present invention, reliable positive and negative samples can be used by treating labeled samples as anchors. Here, a positive sample refers to a sample that exceeds the threshold for in-distribution data and is judged to be of the same class as the labeled sample. Learning can be performed so that positive samples are pulled toward labeled samples. A negative sample refers to a sample that does not exceed the threshold for in-distribution data and is judged to be of a different class from the labeled sample. Learning may be performed so that negative samples are pushed away from labeled samples. Accordingly, in the SSCL method according to an embodiment of the present invention, learning can be performed to push OOD samples out of the embedding cluster within each class. Meanwhile, if it cannot be clearly determined whether a sample is positive or negative with respect to the anchor, push and pull learning may not be performed for it.

Meanwhile, the contrastive loss for a labeled anchor m can be defined as in [Equation 5] below. Here, m represents the index of a sample in a batch of B _l labeled sequences, and the similarity term denotes the similarity between the i-th augmented unlabeled sample and the class y _m of the m-th anchor.

[Equation 5]

According to one embodiment of the present invention, positive and negative samples can be constructed using only pseudo labels with high reliability. In other words, the positive set and the negative set can be defined as the sets of augmented unlabeled samples that are positive and negative with respect to the anchor, respectively, each selected using its own filtering threshold (see Figure 31). As a result, the contrastive loss of SSCL can be obtained as L _SSCL in [Equation 6] below. In [Equation 6], B _l means the batch size of the labeled sequence.

[Equation 6]

According to an embodiment of the present invention, SSCL can detach the gradient of the labeled embedding z_m. This is because the goal of SSCL is not to learn the features of labeled samples, but to push OOD samples in the unlabeled sequence away from labeled samples.
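As a non-limiting illustration, the filtering of unlabeled samples into positive and negative sets relative to a labeled anchor can be sketched as follows. The function name and the sample similarity values are hypothetical; the thresholds 0.9 and 0.2 are the example filtering thresholds given in this description, and assigning 0.9 to the positive set and 0.2 to the negative set is an assumption.

```python
from typing import List, Tuple


def split_by_anchor_similarity(
    similarities: List[float],
    pos_threshold: float = 0.9,  # assumed to be the positive filtering threshold
    neg_threshold: float = 0.2,  # assumed to be the negative filtering threshold
) -> Tuple[List[int], List[int], List[int]]:
    """Partition unlabeled samples by similarity to a labeled anchor's class.

    Returns indices of positive samples (pulled toward the anchor), negative
    samples (pushed away from the anchor), and ambiguous samples (ignored).
    """
    positives, negatives, ignored = [], [], []
    for idx, sim in enumerate(similarities):
        if sim > pos_threshold:
            positives.append(idx)
        elif sim < neg_threshold:
            negatives.append(idx)
        else:
            # Neither clearly positive nor clearly negative: no push or pull.
            ignored.append(idx)
    return positives, negatives, ignored


# Hypothetical similarities between five unlabeled samples and one anchor class.
sims = [0.97, 0.55, 0.05, 0.91, 0.18]
pos, neg, skip = split_by_anchor_similarity(sims)
print(pos, neg, skip)  # [0, 3] [2, 4] [1]
```

Samples that fall between the two thresholds contribute to neither set, which corresponds to the case described above in which push and pull learning is not performed.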

Finally, the overall training loss according to an embodiment of the present invention can be expressed as [Equation 7]. Here,

denotes the weight value applied to the corresponding loss L_A.

[Equation 7]

Performance of sleep analysis model based on semi-supervised learning

The sleep analysis model according to one embodiment of the present invention was trained based on labeled data obtained from approximately 3,000 PSG tests in a laboratory environment and approximately 3,000 items of unlabeled data self-collected in home environments.

Additionally, the sleep analysis model according to an embodiment of the present invention was evaluated based on PSG in a laboratory environment, PSG in a home environment, and PSG audio data. Here, the generalization ability of the sleep analysis model was tested by evaluating its performance on PSG audio data (PSG-Audio), an open dataset mainly composed of apnea patient data.

Specifically, in order to evaluate the results of the sleep analysis model according to an embodiment of the present invention, the value N_S, which is the number of samples in a sequence, was set to 40. This reflects the fact that, in a PSG test performed in a hospital environment, a sleep technician assigns a label every 30 seconds and, when doing so, usually checks about ±10 minutes of surrounding data, so the model was evaluated in the same way. If each sample corresponds to 30 seconds, 40 samples span 20 minutes, which matches this ±10-minute context.

Additionally, the labeled batch size (B_l) and the unlabeled batch size (B_u) were each set to 4. The weight values of the unsupervised learning losses,

,

and

were set to 1.5, 0.1, 0.1, and 0.1, respectively, for unsupervised training. The filtering thresholds

and

were set to 0.9 and 0.2, respectively. Meanwhile, the description of the specific numerical values mentioned above is merely an example, and the present invention is not limited thereto.
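As a non-limiting illustration, the combination of a supervised loss with weighted unsupervised losses can be sketched as follows. This sketch assumes that the overall loss is the supervised loss plus a weighted sum of the unsupervised losses. The loss names, the assignment of each example weight (1.5, 0.1, 0.1, 0.1) to a particular loss, and the loss values in the usage example are all hypothetical.

```python
from typing import Dict

# Example weights from this description; mapping each weight to a named loss
# is an assumption made only for illustration.
LOSS_WEIGHTS: Dict[str, float] = {
    "consistency": 1.5,
    "sequential_consistency": 0.1,
    "ccssl": 0.1,
    "sscl": 0.1,
}


def total_loss(
    supervised_loss: float,
    unsupervised_losses: Dict[str, float],
    weights: Dict[str, float] = LOSS_WEIGHTS,
) -> float:
    """Return the supervised loss plus the weighted sum of unsupervised losses."""
    missing = set(weights) - set(unsupervised_losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    weighted = sum(weights[name] * unsupervised_losses[name] for name in weights)
    return supervised_loss + weighted


# Hypothetical per-batch loss values.
batch_losses = {
    "consistency": 0.2,
    "sequential_consistency": 0.5,
    "ccssl": 0.3,
    "sscl": 0.4,
}
overall = total_loss(0.8, batch_losses)
print(round(overall, 4))  # 1.22
```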

Figure 32 is a table comparing the analysis results of a sleep analysis model according to an embodiment of the present invention and the analysis results of a PSG test in a home environment. The SoundSleepNet row in Figure 32 shows the results of applying, to a PSG test in a home environment, a sequence-to-sequence sleep analysis model according to an embodiment of the present invention, which consists of a backbone network for low-level feature extraction and a head for learning the time-series correlation between mel spectrograms. Additionally, the SleepFormer row in FIG. 32 represents a sleep analysis model reflecting the unsupervised learning and/or semi-supervised learning method according to an embodiment of the present invention. Meanwhile, C, SC, CC, SS, and WA represent consistency, sequential consistency, CCSSL, SSCL, and weight averaging, respectively.

Additionally, Figure 33 is a table comparing sleep analysis results based on PSG audio data and analysis results of a sleep analysis model according to an embodiment of the present invention. The table at the top of Figure 33 is a table comparing sleep analysis results based on PSG-Audio, and the table at the bottom of Figure 33 is a table comparing sleep analysis results based on PSG data in a laboratory environment.

First, as shown in the comparison table in FIG. 32, changes in performance were evaluated by applying the semi-supervised learning methods according to an embodiment of the present invention one by one. The SleepFormer model, which is the supervised baseline, was constructed using a transformer-based artificial intelligence model and achieved an F1 score of 0.6332, an improvement of 0.0614 over SoundSleepNet. Additionally, adding the consistency loss (C) and the sequential consistency loss (SC) improved the F1 score to 0.6597 and 0.6751, respectively.

In addition, introducing SS (SSCL) according to an embodiment of the present invention improved the F1 score to 0.6780, and when the weights of three models trained with different seeds were averaged (WA), a final score of 0.6804 was achieved, a total improvement of 0.1085.
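As a non-limiting illustration, averaging the weights of several models trained with different seeds (WA) can be sketched as follows. Model parameters are represented here as plain dictionaries of lists; this representation, the parameter names, and the values are hypothetical and do not reflect the actual model.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Average each parameter element-wise across models with identical layouts."""
    if not models:
        raise ValueError("at least one model is required")
    keys = models[0].keys()
    for model in models:
        if model.keys() != keys:
            raise ValueError("all models must have the same parameter names")
    averaged: StateDict = {}
    for key in keys:
        lengths = {len(model[key]) for model in models}
        if len(lengths) != 1:
            raise ValueError(f"parameter {key!r} has inconsistent sizes")
        columns = zip(*(model[key] for model in models))
        averaged[key] = [sum(col) / len(models) for col in columns]
    return averaged


# Three hypothetical checkpoints trained with different seeds.
seed_a = {"layer.weight": [0.2, 0.4], "layer.bias": [0.0]}
seed_b = {"layer.weight": [0.4, 0.6], "layer.bias": [0.3]}
seed_c = {"layer.weight": [0.6, 0.8], "layer.bias": [0.6]}
avg = average_weights([seed_a, seed_b, seed_c])
print(avg)
```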

Meanwhile, as shown in the table in FIG. 33, the sleep analysis results based on PSG audio data were compared with the analysis results of the sleep analysis model according to an embodiment of the present invention. Supervised, the first row of each table in FIG. 33, is the result of predicting sleep state information by inputting PSG audio data into a sleep analysis model according to an embodiment of the present invention, and Ours, the second row, is the result of predicting sleep state information by inputting PSG audio data into a sleep analysis model based on semi-supervised learning according to an embodiment of the present invention.

Here, the PSG-Audio dataset that is input to the sleep analysis model in the table at the top of Figure 33 has a data distribution that the sleep analysis model did not encounter, that is, was not exposed to, during training. This dataset can therefore be used to evaluate the generalization performance of the sleep analysis model on new, unseen data. Meanwhile, the PSG-Audio dataset mainly consists of severe apnea patients, so the sleep stage class distribution may be unbalanced.

Additionally, in the table shown at the bottom of FIG. 33, the PSG data set in a laboratory environment that is input to the sleep analysis model may represent the distribution of labeled sources.

As shown in Figure 33, when comparing the accuracy of the results predicted by inputting the PSG-Audio dataset into the supervised model and the semi-supervised model, it was confirmed that the accuracy of the semi-supervised model was improved by 0.0437. Meanwhile, the degree of accuracy improvement on the PSG dataset in a laboratory environment was relatively small, because the supervised baseline according to an embodiment of the present invention already achieved good performance, with a score of 0.7000, on the labeled source distribution data.

Using this semi-supervised learning method, the sleep analysis model can be further improved through processing of time-series sleep sound information in an actual sleep environment. To briefly summarize, the sequential consistency loss improves the temporal correlation of the sleep analysis model, and the semi-supervised contrastive loss improves the clustering of feature representations around labeled samples, so that accuracy can be improved by effectively filtering out-of-distribution (OOD) samples. In addition, it was confirmed that the sleep analysis model according to the embodiment of the present invention can show significant and consistent improvement across all datasets: the home environment data, the unexposed data, and the labeled data.

Method for analyzing sleep state information using multimodal sleep information

One embodiment of a multimodal sleep state information analysis method (CONCEPT-A)

FIG. 26 is a flowchart illustrating a method for analyzing sleep state information including a process of combining sleep sound information and sleep environment information into multimodal data according to an embodiment of the present invention.

In order to achieve the purpose of the present invention, according to one embodiment, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may include a first information acquisition step of acquiring sound information in the time domain related to the user's sleep (S100), a step of preprocessing the first information (S102), a second information acquisition step of acquiring user sleep environment information related to the user's sleep (S110), a step of preprocessing the second information (S112), a combining step of combining multimodal data (S120), a step of inputting the multimodal data into a deep learning model (S130), and a step of obtaining sleep state information as the output of the deep learning model (S140).

According to an embodiment of the present invention, the first information acquisition step (S100) may acquire sound information in the time domain related to the user's sleep from the user terminal 10. Sound information in the time domain related to the user's sleep may include sound source information obtained from the sound source detection unit of the user terminal 10.

According to an embodiment of the present invention, in the step of performing data preprocessing of the first information (S102), sleep sound information in the time domain can be converted into information including changes in the frequency components along the time axis, or into information in the frequency domain. Additionally, information in the frequency domain may be expressed as a spectrogram, which may be a Mel spectrogram to which the Mel scale is applied. By converting to a spectrogram, user privacy can be protected and the amount of data processing can be reduced. In addition, the information converted from sleep sound information in the time domain can be visualized, and in this case, sleep state information can be obtained through image analysis by using it as input to an image processing-based artificial intelligence model.
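As a non-limiting illustration, the conversion of time-domain sound information into a Mel spectrogram can be sketched with NumPy as follows. The frame length, hop size, number of Mel bands, sampling rate, and the 440 Hz test tone are hypothetical values chosen only for this sketch, and the widely used formula mel = 2595 · log10(1 + f / 700) is assumed for the Mel scale.

```python
import numpy as np


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(
        float(hz_to_mel(0.0)), float(hz_to_mel(sample_rate / 2.0)), n_mels + 2
    )
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_edges[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_mels=20):
    """Return a log-power Mel spectrogram with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel_power = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return 10.0 * np.log10(mel_power + 1e-10).T


# One second of a synthetic 440 Hz tone stands in for recorded sleep sound.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
spec = mel_spectrogram(tone, sample_rate)
print(spec.shape)  # (20, 61)
```

The resulting two-dimensional array can be treated as an image, which is what allows it to be passed to an image processing-based model.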

According to an embodiment of the present invention, the step of performing data preprocessing of the first information (S102) may further include extracting features based on the acoustic information. For example, the user's sleep breathing pattern can be extracted based on the acquired acoustic information in the time domain. For example, the acquired acoustic information in the time domain can be converted into information including changes in frequency components along the time axis, and the user's breathing pattern can be extracted based on the converted information. Alternatively, acoustic information on the time domain may be converted into information on the frequency domain, and the user's sleep breathing pattern may be extracted based on the acoustic information on the frequency domain.

In this case, the converted information is visualized and can be used as input to an image processing-based artificial intelligence model to output information such as the user's breathing pattern.

According to an embodiment of the present invention, the step of performing data preprocessing of the first information (S102) may include a data augmentation process to obtain a sufficient amount of meaningful data for inputting sleep sound information into a deep learning model. Data augmentation techniques may include pitch shifting augmentation, TUT (Tile UnTile) augmentation, and noise-added augmentation. The above-described augmentation techniques are merely examples, and the present invention is not limited thereto.
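As a non-limiting illustration of one of the augmentation techniques listed above, noise-added augmentation can be sketched as follows. The use of Gaussian noise, the target signal-to-noise ratio (SNR), and the synthetic input signal are assumptions made only for this sketch; pitch shifting and TUT augmentation are not shown.

```python
import math
import random
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average power of a signal."""
    return sum(x * x for x in samples) / len(samples)


def add_noise(signal: List[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add Gaussian noise so that the result has roughly the requested SNR."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Estimate the SNR of a noisy signal given its clean counterpart."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


# A synthetic sine wave stands in for recorded sleep sound.
clean = [math.sin(2.0 * math.pi * 5.0 * n / 1000.0) for n in range(8000)]
noisy = add_noise(clean, snr_db=20.0)
print(round(measured_snr_db(clean, noisy), 1))
```

Varying the seed or the SNR yields multiple distinct training examples from a single recording.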

According to an embodiment of the present invention, the time required for hardware to process the data can be shortened by applying the Mel scale.

According to an embodiment of the present invention, the second information acquisition step (S110) of acquiring user sleep environment information related to the user's sleep may acquire user sleep environment information through the user terminal 10, an external server, or a network. The user's sleep environment information may refer to information related to sleep obtained in the space where the user is located. Sleep environment information may be sensing information obtained in a space where the user is located using a non-contact method. Sleep environment information may be breathing movement and body movement information measured through radar. Sleep environment information may be information related to the user's sleep obtained from a smart watch, smart home appliance, etc. Sleep environment information may be a photoplethysmography (PPG) signal. Sleep environment information may be heart rate variability (HRV) and heart rate obtained through photoplethysmography (PPG), and photoplethysmography signals can be measured by smart watches and smart rings. Sleep environment information may be an electroencephalography (EEG) signal. Sleep environment information may be an actigraphy signal measured during sleep.

According to an embodiment of the present invention, the step of preprocessing the second information (S112) may include a data augmentation process to obtain a sufficient amount of meaningful data for inputting the user's sleep environment information into a deep learning model.

According to an embodiment of the present invention, the preprocessing of the second information (S112) may include processing data of the user's sleep environment information to extract features. For example, when the second information is a photoplethysmography signal (PPG), heart rate variability (HRV) and heart rate can be extracted from the photoplethysmography signal.
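As a non-limiting illustration, the extraction of heart rate and heart rate variability from a photoplethysmography signal can be sketched as follows. The sketch assumes that pulse peak times have already been detected, and it uses SDNN and RMSSD as HRV measures; the choice of these measures and the peak times in the usage example are hypothetical.

```python
import math
from typing import Dict, List


def hrv_features(peak_times_s: List[float]) -> Dict[str, float]:
    """Compute heart rate and two common HRV measures from pulse peak times."""
    if len(peak_times_s) < 3:
        raise ValueError("at least three peaks are required")
    # Inter-beat intervals in milliseconds.
    ibi_ms = [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]
    mean_ibi = sum(ibi_ms) / len(ibi_ms)
    # SDNN: sample standard deviation of the inter-beat intervals.
    sdnn = math.sqrt(sum((x - mean_ibi) ** 2 for x in ibi_ms) / (len(ibi_ms) - 1))
    # RMSSD: root mean square of successive interval differences.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "heart_rate_bpm": 60000.0 / mean_ibi,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }


# Hypothetical peak times (seconds) detected in a PPG signal.
features = hrv_features([0.0, 1.0, 1.9, 2.9, 3.8])
print({name: round(value, 2) for name, value in features.items()})
```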

According to an embodiment of the present invention, in the step of preprocessing the second information (S112), when the data of the user's sleep environment information is obtained as image information, TUT (Tile UnTile) augmentation and noise-added augmentation may be applied to the image information. The above-described augmentation techniques are merely examples of augmentation techniques for image information, and the present invention is not limited thereto. The user's sleep environment information may be information in various storage formats. Various methods may be employed to augment the user's sleep environment information.

According to an embodiment of the present invention, the step of combining the first information and the second information that have undergone a data preprocessing process into multimodal data (S120) combines the data to input the multimodal data to the deep learning model.

According to an embodiment of the present invention, a method of combining multimodal data may be combining preprocessed first information and preprocessed second information into data of the same format. Specifically, the first information may be acoustic image information in the frequency domain, and the second information may be heart rate image information in the time domain obtained from a smart watch. At this time, since the domains of the first information and the second information are not the same, they can be converted to the same domain and combined.

According to an embodiment of the present invention, a method of combining multimodal data may be combining preprocessed first information and preprocessed second information into data of the same format. Specifically, the first information may be acoustic image information in the frequency domain, and the second information may be heart rate image information in the time domain obtained from a smart watch. At this time, since the domains of the first information and the second information are not the same, each item of data can be labeled as relating to the first information or the second information so that it can be used as input to a deep learning model.
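As a non-limiting illustration, combining acoustic information and heart rate information into data of the same format, with each item labeled by its modality, can be sketched as follows. The 30-second alignment unit, the dictionary layout, the field names, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def epoch_average(
    samples: List[Tuple[float, float]], n_epochs: int, epoch_seconds: float = 30.0
) -> List[float]:
    """Average time-stamped (time_s, value) samples within each epoch."""
    sums = [0.0] * n_epochs
    counts = [0] * n_epochs
    for time_s, value in samples:
        idx = int(time_s // epoch_seconds)
        if 0 <= idx < n_epochs:
            sums[idx] += value
            counts[idx] += 1
    # Epochs without any sample are marked as NaN rather than guessed.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def combine(
    acoustic: List[List[float]], heart_rate: List[float]
) -> List[Dict[str, object]]:
    """Pair per-epoch acoustic features with heart rate, labeled by modality."""
    if len(acoustic) != len(heart_rate):
        raise ValueError("modalities must cover the same number of epochs")
    return [
        {"first_information": a, "second_information": h}
        for a, h in zip(acoustic, heart_rate)
    ]


# Hypothetical inputs: two epochs of acoustic features and three heart rate samples.
acoustic_features = [[0.1, 0.2], [0.3, 0.4]]
hr_samples = [(5.0, 60.0), (20.0, 62.0), (40.0, 58.0)]
hr_per_epoch = epoch_average(hr_samples, n_epochs=2)
combined = combine(acoustic_features, hr_per_epoch)
print(combined[0])  # {'first_information': [0.1, 0.2], 'second_information': 61.0}
```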

According to an embodiment of the present invention, the step of combining multimodal data (S120) may be performed by performing first information augmentation, performing second information augmentation, and then combining the results. For example, the first information may be the user's acoustic information in the time domain, and the second information may be a photoplethysmography signal (PPG), and these may be combined into multimodal data. As another example, the first information may be the user's sound information in the time domain or a spectrogram converted from sound information in the time domain into sound information in the frequency domain, and the second information may be a photoplethysmography signal (PPG), and these may be combined into multimodal data.

According to an embodiment of the present invention, the step of combining multimodal data (S120) may be performed by performing first information augmentation, performing second information augmentation and feature extraction, and then combining the results. For example, the first information may be the user's sound information in the time domain or a spectrogram converted from the sound information in the time domain to sound information in the frequency domain, and the second information may be heart rate variability (HRV) or heart rate obtained from the photoplethysmography signal (PPG), and these may be combined into multimodal data. According to an embodiment of the present invention, the step of combining multimodal data (S120) may be performed by performing first information augmentation and feature extraction, and performing second information augmentation. For example, the first information may be the user's breathing pattern extracted based on the user's acoustic information, and the second information may be heart rate variability (HRV) or heart rate obtained from the photoplethysmography signal (PPG), and these may be combined into multimodal data.

According to an embodiment of the present invention, in the step of combining multimodal data (S120), first information augmentation and feature extraction may be performed, second information augmentation and feature extraction may be performed, and the results may be combined. For example, the first information may be a user's breathing pattern extracted based on the user's acoustic information, and the second information may be heart rate variability (HRV) or heart rate obtained from a photoplethysmography signal (PPG), and these may be combined into multimodal data.

According to an embodiment of the present invention, in the step of inputting multimodal combined data into a deep learning model (S130), the data can be processed into the form required as input by the deep learning model in order to input the multimodal combined data.

According to an embodiment of the present invention, in the step of acquiring sleep state information as an output of a deep learning model (S140), sleep state information can be inferred by using the multimodal combined data as an input to a deep learning model for inferring sleep state information. Sleep state information may be information about the user's sleep state.

According to an embodiment of the present invention, the user's sleep state information may include sleep stage information expressing the user's sleep as stages. Stages of sleep can be divided into NREM (non-REM) sleep and REM (rapid eye movement) sleep, and NREM sleep can be further divided into multiple stages (e.g., two stages of light and deep, or four stages of N1 to N4). The sleep stages may be defined as general sleep stages, but may also be arbitrarily set to various sleep stages depending on the designer.
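As a non-limiting illustration, a designer-defined grouping of sleep stages can be sketched as follows. Mapping N1 and N2 to light sleep and N3 and N4 to deep sleep, the 30-second unit per stage label, and the sample sequence are assumptions made only for this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed grouping of four NREM stages into two; a designer may choose otherwise.
COARSE_STAGE: Dict[str, str] = {
    "Wake": "Wake",
    "N1": "Light",
    "N2": "Light",
    "N3": "Deep",
    "N4": "Deep",
    "REM": "REM",
}


def summarize(
    stages: List[str], epoch_seconds: float = 30.0
) -> Tuple[List[str], Dict[str, float]]:
    """Map fine stages to coarse stages and total the minutes spent in each."""
    coarse = [COARSE_STAGE[stage] for stage in stages]
    counts = Counter(coarse)
    minutes = {stage: n * epoch_seconds / 60.0 for stage, n in counts.items()}
    return coarse, minutes


# Hypothetical per-epoch stage labels.
night = ["Wake", "N1", "N2", "N2", "N3", "N4", "REM", "REM"]
coarse_night, minutes_per_stage = summarize(night)
print(coarse_night)
print(minutes_per_stage)
```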

According to an embodiment of the present invention, the user's sleep state information may include sleep event information expressing sleep-related diseases that occur during the user's sleep or behavior during sleep. Specifically, sleep event information that occurs during the user's sleep may include sleep apnea and hypopnea information due to the user's sleep disease. Additionally, specifically, sleep event information that occurs during the user's sleep may include whether the user snores, the duration of snoring, whether the user talks in his sleep, the duration of the sleep talk, whether he tosses and turns, and the duration of the tossing and turning. The user's sleep event information described is only an example for expressing events that occur during the user's sleep, and is not limited thereto.
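As a non-limiting illustration, deriving whether an event such as snoring occurred, and for how long, from per-unit detection flags can be sketched as follows. The 30-second unit and the sample flags are hypothetical.

```python
from typing import List, Tuple


def event_runs(
    flags: List[bool], epoch_seconds: float = 30.0
) -> List[Tuple[float, float]]:
    """Return (start_time_s, duration_s) for each run of consecutive flagged epochs."""
    runs: List[Tuple[float, float]] = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start * epoch_seconds, (i - start) * epoch_seconds))
            start = None
    if start is not None:
        # The event was still ongoing at the end of the recording.
        runs.append((start * epoch_seconds, (len(flags) - start) * epoch_seconds))
    return runs


# Hypothetical per-epoch snoring detections.
snoring = [False, True, True, False, False, True]
runs = event_runs(snoring)
snored = bool(runs)
total_snoring_s = sum(duration for _, duration in runs)
print(runs)             # [(30.0, 60.0), (150.0, 30.0)]
print(total_snoring_s)  # 90.0
```

The same helper could be applied to flags for sleep talking or tossing and turning.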

An embodiment of a multimodal sleep state information analysis method (CONCEPT-B)

Figure 27 is a flowchart illustrating a method for analyzing sleep state information including the step of combining the inferred sleep sound information and sleep environment information into multimodal data according to an embodiment of the present invention.

In order to achieve the purpose of the present invention, according to one embodiment, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may include a first information acquisition step of acquiring sound information in the time domain related to the user's sleep (S200), a step of preprocessing the first information (S202), a step of inferring information about sleep by using the first information as input to the deep learning model (S204), a second information acquisition step of acquiring user sleep environment information related to the user's sleep (S210), a step of preprocessing the second information (S212), a step of inferring information about sleep by using the second information as input to the deep learning model (S214), a combining step of combining multimodal data (S220), and a step of obtaining sleep state information by combining multimodal data (S230).

According to an embodiment of the present invention, in the first information acquisition step (S200), sound information in the time domain related to the user's sleep may be acquired from the user terminal 10. Sound information in the time domain related to the user's sleep may include sound source information obtained from the sound source detection unit of the user terminal 10.

According to an embodiment of the present invention, in the step of performing data preprocessing of the first information (S202), acoustic information in the time domain can be converted into information including changes in the frequency components along the time axis, or into information in the frequency domain. Additionally, information in the frequency domain may be expressed as a spectrogram, which may be a Mel spectrogram to which the Mel scale is applied. By converting to a spectrogram, user privacy can be protected and the amount of data processing can be reduced.

According to an embodiment of the present invention, the step of performing data preprocessing of the first information (S202) may include a data augmentation process to obtain a sufficient amount of meaningful data for inputting sleep sound information into a deep learning model. Data augmentation techniques may include pitch shifting augmentation, TUT (Tile UnTile) augmentation, and noise-added augmentation. The above-described augmentation techniques are merely examples, and the present invention is not limited thereto.

According to an embodiment of the present invention, the second information acquisition step (S210) of acquiring user sleep environment information related to the user's sleep may acquire user sleep environment information through the user terminal 10, an external server, or a network. The user's sleep environment information may refer to information related to sleep obtained in the space where the user is located. Sleep environment information may be sensing information obtained in a space where the user is located using a non-contact method. Sleep environment information may be breathing movement and body movement information measured through radar. Sleep environment information may be information related to the user's sleep obtained from a smart watch, smart home appliance, etc. Sleep environment information may be heart rate variability (HRV) and heart rate obtained through photoplethysmography (PPG), and photoplethysmography signals can be measured by smart watches and smart rings. Sleep environment information may be an electroencephalography (EEG) signal. Sleep environment information may be an actigraphy signal measured during sleep.

According to an embodiment of the present invention, the preprocessing of the second information (S212) may include a data augmentation process to obtain a sufficient amount of meaningful data for inputting the user's sleep environment information into a deep learning model.

According to an embodiment of the present invention, in the step of preprocessing the second information (S212), when the data of the user's sleep environment information is obtained as image information, TUT (Tile UnTile) augmentation and noise-added augmentation may be applied to the image information. The above-described augmentation techniques are merely examples of augmentation techniques for image information, and the present invention is not limited thereto. The user's sleep environment information may be information in various storage formats. Various methods may be employed to augment the user's sleep environment information.

According to an embodiment of the present invention, in the step (S204) of inferring information about sleep by using the preprocessed first information as an input to a deep learning model, information about sleep can be inferred by inputting the preprocessed first information into the deep learning model.

According to an embodiment of the present invention, a pre-trained deep learning model can use the inferred data as input in order to perform self-learning.

According to an embodiment of the present invention, a deep learning sleep analysis model that infers information about sleep by using first information about sleep sounds as input may include a feature extraction model and a feature classification model.

Among the deep learning sleep analysis models according to an embodiment of the present invention, the feature extraction model can be pre-trained by a one-to-one proxy task in which one spectrogram is input and the model learns to predict the sleep state information corresponding to that one spectrogram. When adopting a CNN deep learning model as a feature extraction model according to an embodiment of the present invention, learning may be performed by adopting the structure of FC (Fully Connected Layer) or FCN (Fully Connected Neural Network). When using the MobileViTV2 deep learning model as a feature extraction model according to an embodiment of the present invention, learning may be performed by adopting the structure of the intermediate layer.

Among the deep learning sleep analysis models according to an embodiment of the present invention, the feature classification model receives a plurality of consecutive spectrograms as input, predicts the sleep state information of each spectrogram, and analyzes the sequence of the plurality of consecutive spectrograms, and can thus be trained to predict or classify overall sleep state information.
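As a non-limiting illustration, the data flow between a feature extraction model that handles one spectrogram at a time and a feature classification model that handles a sequence of consecutive spectrograms can be sketched as follows. Both functions are toy stand-ins, not neural networks; the mean-energy feature, the neighbor-averaging rule, the threshold, and the two labels are hypothetical and serve only to show the one-to-one and sequence-level stages.

```python
from typing import List

Spectrogram = List[List[float]]


def extract_feature(spectrogram: Spectrogram) -> float:
    """Toy stand-in for the feature extraction model: one spectrogram in, one feature out."""
    values = [v for row in spectrogram for v in row]
    return sum(values) / len(values)


def classify_sequence(features: List[float], threshold: float = 0.5) -> List[str]:
    """Toy stand-in for the feature classification model.

    Each prediction uses the neighboring features as context, so the label of
    one spectrogram depends on the sequence rather than on that spectrogram alone.
    """
    labels = []
    for i in range(len(features)):
        context = features[max(0, i - 1):i + 2]
        score = sum(context) / len(context)
        labels.append("Wake" if score > threshold else "Sleep")
    return labels


# Five hypothetical consecutive 2x2 spectrograms with different energy levels.
levels = [0.9, 0.8, 0.1, 0.2, 0.1]
spectrograms = [[[level, level], [level, level]] for level in levels]
features = [extract_feature(s) for s in spectrograms]
predictions = classify_sequence(features)
print(predictions)  # ['Wake', 'Wake', 'Sleep', 'Sleep', 'Sleep']
```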

According to an embodiment of the present invention, in the step (S214) of inferring information about sleep by using the preprocessed second information as an input to the inference model, information about sleep can be inferred by inputting the preprocessed second information into a pre-trained inference model. The pre-trained inference model may be the deep learning sleep analysis model described above, but is not limited thereto, and may be an inference model of various types suited to the purpose. A variety of methods can be used for the pre-trained inference model.

According to an embodiment of the present invention, the step of combining the first information and the second information that have undergone a data preprocessing process into multimodal data (S220) combines the information to determine sleep state information.

According to an embodiment of the present invention, a method of combining multimodal data may be to combine the sleep information inferred from the preprocessed first information and the sleep information inferred from the preprocessed second information into data of the same format.

According to an embodiment of the present invention, in the step of acquiring sleep state information by combining multimodal data (S230), the user's sleep state information can be determined by combining data obtained through multimodality. Sleep state information may be information about the user's sleep state.

According to an embodiment of the present invention, in the step of acquiring sleep state information by combining multimodal data (S230), the hypnogram of the user's sleep inferred in the step (S204) of inferring information about sleep by using the preprocessed first information as an input to a deep learning model may be combined with the hypnogram of the user's sleep inferred in the step (S214) of inferring information about sleep by using the preprocessed second information as input to the inference model. For example, the hypnograms may be overlaid on each other; for the parts where they match, the sleep stage information is adopted as is, and for the parts where they do not match, weights are applied to determine which sleep stage information to adopt, thereby obtaining sleep state information.
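As a non-limiting illustration, combining two hypnograms by adopting matching stages and resolving mismatches with weights can be sketched as follows. The per-stage weight tables, the tie-breaking rule in favor of the first hypnogram, and the sample hypnograms are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-stage weights expressing how much each source is trusted.
SOUND_WEIGHTS: Dict[str, float] = {"Wake": 0.7, "Light": 0.5, "Deep": 0.6, "REM": 0.4}
ENV_WEIGHTS: Dict[str, float] = {"Wake": 0.5, "Light": 0.5, "Deep": 0.5, "REM": 0.8}


def fuse_hypnograms(
    from_sound: List[str],
    from_env: List[str],
    sound_weights: Dict[str, float] = SOUND_WEIGHTS,
    env_weights: Dict[str, float] = ENV_WEIGHTS,
) -> List[str]:
    """Overlay two hypnograms: keep agreements, resolve disagreements by weight."""
    if len(from_sound) != len(from_env):
        raise ValueError("hypnograms must have the same length")
    fused = []
    for stage_sound, stage_env in zip(from_sound, from_env):
        if stage_sound == stage_env:
            fused.append(stage_sound)
        elif sound_weights[stage_sound] >= env_weights[stage_env]:
            fused.append(stage_sound)  # ties favor the sound-based hypnogram
        else:
            fused.append(stage_env)
    return fused


# Hypothetical hypnograms inferred from the first and second information.
sound_hypnogram = ["Wake", "Light", "Light", "REM", "Deep"]
env_hypnogram = ["Wake", "Light", "REM", "REM", "Light"]
fused_hypnogram = fuse_hypnograms(sound_hypnogram, env_hypnogram)
print(fused_hypnogram)  # ['Wake', 'Light', 'REM', 'REM', 'Deep']
```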

According to an embodiment of the present invention, in the step of acquiring sleep state information by combining multimodal data (S230), the hypnodensity graph of the user's sleep inferred in the step (S204) of inferring information about sleep by using the preprocessed first information as an input to a deep learning model may be combined with the hypnodensity graph of the user's sleep inferred in the step (S214) of inferring information about sleep by using the preprocessed second information as input to the inference model. For example, by substituting the probabilities of each hypnodensity graph into a formula, the sleep stage with the highest reliability at each time can be obtained as the user's sleep stage information. As another example, if the reliability at a given time in a hypnodensity graph exceeds a preset reliability threshold, the corresponding sleep stage is adopted as the user's sleep stage information, and if no sleep stage information has a reliability exceeding the preset reliability threshold, sleep stage information can be adopted through weighting, thereby obtaining sleep state information.
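As a non-limiting illustration, combining two hypnodensity graphs using a reliability threshold with a weighted fallback can be sketched as follows. The stage order, the threshold of 0.8, the equal weighting, and the sample probabilities are hypothetical.

```python
from typing import List

STAGES = ("Wake", "Light", "Deep", "REM")


def fuse_hypnodensity(
    density_a: List[List[float]],
    density_b: List[List[float]],
    threshold: float = 0.8,
    weight_a: float = 0.5,
) -> List[str]:
    """Pick one stage per time step from two per-stage probability sequences."""
    if len(density_a) != len(density_b):
        raise ValueError("hypnodensity graphs must have the same length")
    fused = []
    for probs_a, probs_b in zip(density_a, density_b):
        best_a, best_b = max(probs_a), max(probs_b)
        if max(best_a, best_b) > threshold:
            # At least one source is confident enough: adopt the more confident one.
            probs = probs_a if best_a >= best_b else probs_b
        else:
            # Otherwise fall back to a weighted average of the two sources.
            probs = [
                weight_a * pa + (1.0 - weight_a) * pb
                for pa, pb in zip(probs_a, probs_b)
            ]
        fused.append(STAGES[probs.index(max(probs))])
    return fused


# Hypothetical hypnodensity values for three time steps (Wake, Light, Deep, REM).
density_sound = [[0.9, 0.05, 0.03, 0.02], [0.4, 0.5, 0.05, 0.05], [0.1, 0.3, 0.2, 0.4]]
density_env = [[0.6, 0.3, 0.05, 0.05], [0.3, 0.3, 0.1, 0.3], [0.05, 0.15, 0.1, 0.7]]
fused_stages = fuse_hypnodensity(density_sound, density_env)
print(fused_stages)  # ['Wake', 'Light', 'REM']
```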

According to an embodiment of the present invention, in the step of acquiring sleep state information by combining multimodal data (S230), the hypnogram of the user's sleep inferred in the step (S204) of inferring information about sleep by using the preprocessed first information as an input to a deep learning model may be combined with the hypnodensity graph of the user's sleep inferred in the step (S214) of inferring information about sleep by using the preprocessed second information as input to the inference model. For example, if the reliability of the sleep stage indicated in the hypnogram and the hypnodensity graph exceeds a preset threshold, that stage can be adopted as the user's sleep stage to obtain the user's sleep state information. Conversely, if the reliability of the sleep stage indicated in the hypnogram and the hypnodensity graph does not exceed the preset threshold, a weighted calculation can be performed to determine the user's sleep stage, thereby obtaining highly reliable sleep state information.

According to an embodiment of the present invention, the user's sleep state information may include sleep stage information indicating the user's sleep as stages. Methods for displaying sleep stages may include a hypnogram, which displays sleep stages on a graph, and a hypnodensity graph, which displays the probability of each sleep stage on a graph, but the display method is not limited thereto.

According to an embodiment of the present invention, the user's sleep state information may include sleep event information expressing sleep-related diseases that occur during the user's sleep or behavior during sleep. Specifically, sleep event information that occurs during the user's sleep may include sleep apnea and hypopnea information due to the user's sleep disease. Additionally, specifically, the sleep event information that occurs during the user's sleep may include whether the user snores, the duration of snoring, whether the user talks in his sleep, the duration of the sleep talking, whether he tosses and turns, and the duration of the tossing and turning. The user's sleep event information described is only an example for expressing events that occur during the user's sleep, and is not limited thereto.

One embodiment of a multimodal sleep state information analysis method (CONCEPT-C)

Figure 28 is a flowchart illustrating a method for analyzing sleep state information, including the step of combining inferred sleep sound information with sleep environment information into multimodal data, according to an embodiment of the present invention.

In order to achieve the purpose of the present invention, according to one embodiment, a method for analyzing sleep state information using sleep sound information and sleep environment information in a multimodal manner may include: a first information acquisition step of acquiring sound information in the time domain related to the user's sleep (S300); a step of preprocessing the first information (S302); a step of inferring information about sleep by using the first information as input to a deep learning model (S304); a second information acquisition step of acquiring user sleep environment information related to the user's sleep (S310); a combining step of combining multimodal data (S320); and a step of acquiring sleep state information by combining the multimodal data (S330).

According to an embodiment of the present invention, the first information acquisition step (S300) may acquire sound information in the time domain related to the user's sleep from the user terminal 10. Sound information in the time domain related to the user's sleep may include sound source information obtained from the sound source detection unit of the user terminal 10.

According to an embodiment of the present invention, in step S302 of performing data preprocessing of the first information, sound information in the time domain may be converted into information in the frequency domain. Additionally, the information in the frequency domain may be expressed as a spectrogram, which may be a Mel spectrogram to which the Mel scale is applied. Converting to a spectrogram can protect user privacy and reduce the amount of data to be processed.
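The time-to-frequency conversion of step S302 can be sketched as below. This is an illustrative toy, not the preprocessing used by the invention: it uses a naive DFT so that it needs only the standard library (a real system would use an FFT routine), and the frame length, hop size, number of Mel bands, Hann window, and the common 2595·log10(1 + f/700) Mel formula are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels band energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    spec = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        spec.append([sum(w * p for w, p in zip(row, power)) for row in bank])
    return spec
```

The output keeps only band energies per frame rather than the waveform itself, which is the sense in which the conversion reduces the data volume.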

According to an embodiment of the present invention, the step of performing data preprocessing of the first information (S302) may include a data augmentation process to obtain a sufficient amount of meaningful data for inputting sleep sound information into a deep learning model. Data augmentation techniques may include pitch shifting augmentation, TUT (Tile UnTile) augmentation, and noise-added augmentation. The above-described augmentation techniques are merely examples, and the present invention is not limited thereto.

According to an embodiment of the present invention, the second information acquisition step (S310) of acquiring user sleep environment information related to the user's sleep may acquire the user sleep environment information through the user terminal 10, an external server, or a network. The user's sleep environment information may refer to information related to sleep obtained in the space where the user is located. Sleep environment information may be sensing information obtained in a non-contact manner in the space where the user is located. Sleep environment information may be breathing movement and body movement information measured through radar. Sleep environment information may be information related to the user's sleep obtained from a smart watch, a smart home appliance, or the like. Sleep environment information may be heart rate variability (HRV) and heart rate obtained through photoplethysmography (PPG), and the photoplethysmography signal can be measured by a smart watch or a smart ring. Sleep environment information may be an electroencephalography (EEG) signal. Sleep environment information may be an actigraphy signal measured during sleep. Sleep environment information may be labeling data representing user information. Specifically, the labeling data may include the user's age, disease status, physical condition, race, height, weight, and body mass index; this is only an example of labeling data representing the user's information, and the labeling data is not limited thereto. The above-described sleep environment information is only an example of information that may affect the user's sleep, and is not limited thereto.

According to an embodiment of the present invention, in step S304, information about sleep can be inferred by using the preprocessed first information as input to a deep learning model.

Among the deep learning sleep analysis models according to an embodiment of the present invention, the feature classification model can be trained to receive a plurality of consecutive spectrograms as input, predict the sleep state information of each spectrogram, and analyze the sequence of the consecutive spectrograms, thereby predicting or classifying time-series sleep state information.

According to an embodiment of the present invention, the step (S320) of combining the first information and the second information that have undergone data preprocessing into multimodal data combines the data so that the multimodal data can be input to the deep learning model.

According to an embodiment of the present invention, the step of acquiring sleep state information by combining multi-modal data (S330) can determine the user's sleep state information by combining data obtained through multi-modality. Sleep state information may be information about the user's sleep state.

Real-time sleep analysis according to an embodiment of the present invention

According to an embodiment of the present invention, analysis of sleep state information based on acoustic information may include a detection step for sleep events (e.g., apnea, hypopnea, snoring, sleep talking, etc.). However, because the characteristics of sleep acoustic patterns unfold over time, it may be difficult to identify them from a short segment of acoustic data at a single point in time. Therefore, in order to model acoustic information, analysis must be performed based on the time-series characteristics of the acoustic information.

Additionally, sleep events that occur during sleep (e.g., apnea, hypopnea, snoring, sleep talking, etc.) each have their own characteristics. For example, there is no sound during an apnea event, but when the apnea event ends, a loud sound may be generated as air passes again. Sleep events can therefore be detected by learning such characteristics in time series.

Deep neural network differences for real-time sleep event detection

In order to detect sleep events that occur during sleep according to an embodiment of the present invention, the deep neural network structure for analyzing sleep stages described above can be modified and used. Specifically, sleep stage analysis requires time-series learning of sleep sounds, whereas a sleep event lasts on average between 10 and 60 seconds, so one or two 30-second epochs are sufficient to detect it accurately. Therefore, the deep neural network structure for detecting sleep events according to an embodiment of the present invention can use a smaller amount of input and output than the deep neural network structure for analyzing sleep stages. For example, if the deep neural network structure for analyzing sleep stages processes 40 Mel spectrograms and outputs sleep stages for 20 epochs, the deep neural network structure for detecting sleep events may process 14 Mel spectrograms and output sleep event labels for 10 epochs. Here, the sleep event label may include, but is not limited to, no event, apnea, hypopnea, snoring, tossing and turning, etc.

Additionally, a deep neural network structure for detecting sleep events that occur during sleep according to an embodiment of the present invention may include a feature extraction model and a feature classification model. Specifically, the feature extraction model extracts the features of sleep events found in each Mel spectrogram, and the feature classification model examines multiple epochs, finds the epochs containing sleep events, and analyzes neighboring features so that the types of sleep events can be predicted and classified in time series.

Class Weights for real-time sleep event detection

According to an embodiment of the present invention, a method for detecting sleep events that occur during sleep may assign class weights to address the class imbalance among sleep events. Specifically, among the sleep event classes, “no event” may account for a dominant share of the total sleep duration, which reduces the efficiency of learning the other sleep events. Therefore, by assigning the other sleep events a higher weight than “no event,” learning efficiency and accuracy can be improved. For example, if the sleep event classes are “no event,” “apnea,” and “hypopnea,” then to reduce the impact of “no event” on learning, a weight of 1.0 can be assigned to “no event,” a weight of 1.3 to “apnea,” and a weight of 2.1 to “hypopnea.”
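One common way to apply such class weights is a weighted cross-entropy loss, sketched below with the example weights from the text (1.0, 1.3, 2.1). The text does not state which loss function is used, so the choice of cross-entropy, the normalization by total weight, and the names `CLASS_WEIGHTS` and `weighted_cross_entropy` are assumptions for illustration.

```python
import math

# Example weights taken from the description; class names are illustrative keys.
CLASS_WEIGHTS = {"no_event": 1.0, "apnea": 1.3, "hypopnea": 2.1}
CLASSES = list(CLASS_WEIGHTS)


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """Weighted average of -log p(true class) over epochs.

    probs:  per-epoch predicted probabilities, ordered as CLASSES.
    labels: per-epoch true class names.
    The sum is normalized by the total weight of the epochs (an assumed choice).
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, label in zip(probs, labels):
        w = weights[label]
        total_loss += -w * math.log(p[CLASSES.index(label)])
        total_weight += w
    return total_loss / total_weight
```

With these weights, an equally wrong prediction on a "hypopnea" epoch contributes 2.1 times as much to the loss as one on a "no event" epoch, which counteracts the dominance of "no event" epochs in a typical night.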

Consistency training for real-time sleep event detection

Figure 21 is a diagram for explaining consistency training according to an embodiment of the present invention. According to an embodiment of the present invention, in order to detect sleep events that occur during sleep in a home environment and a noisy environment, consistency training as shown in FIG. 21 can be used. Consistency training is a type of semi-supervised learning. Consistency training according to an embodiment of the present invention may be a method of performing learning with one piece of data to which noise has been intentionally added and another piece of data to which noise has not been intentionally added.

Noise intentionally added according to an embodiment of the present invention may be noise of the target environment, where the noise of the target environment may be, for example, noise obtained in an environment other than polysomnography. Specifically, when detecting a sleep event, various noises can be added by adjusting the SNR and the type of noise so that the data resembles the actual user's environment. Through this, the model can learn from the types of noise collected in various laboratories and from the noise that occurs in actual home environments.

According to embodiments of the present invention, for convenience, data to which noise is intentionally added is referred to as corrupted data. Corrupted data may preferably refer to data to which noise of the target environment has been intentionally added.

Additionally, for convenience, data to which no noise is intentionally added will be referred to as clean data. Here, no noise was intentionally added to the clean data, but noise may actually be included.

Home Noise Consistency Training

According to an embodiment of the present invention, detection of sleep events (e.g., apnea, hypopnea, snoring, sleep talking, etc.) that occur during sleep may include home noise consistency training. Consistency learning in the home environment allows the model to perform robustly even against noise at home. Specifically, the model becomes robust to noise by being trained to output similar predictions regardless of whether noise is present.

According to an embodiment of the present invention, detection of sleep events that occur during sleep can be trained with consistency learning in the home environment. Consistency learning in the home environment may involve a consistency loss function. For example, the consistency loss can be defined as the mean square error (MSE) between the prediction for a clean sleep breathing sound and the prediction for a corrupted version of that sound.
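The MSE-based consistency loss can be written out as a short sketch. The function name and the representation of predictions as per-epoch probability lists are assumptions; how this term is weighted against a supervised loss is not specified in the text and is left out.

```python
def mse_consistency_loss(pred_clean, pred_corrupted):
    """Mean squared error between two sets of model predictions.

    pred_clean:     per-epoch class probabilities for the clean sound.
    pred_corrupted: per-epoch class probabilities for the same sound
                    after noise was intentionally added.
    """
    total = 0.0
    count = 0
    for p_clean, p_corr in zip(pred_clean, pred_corrupted):
        for a, b in zip(p_clean, p_corr):
            total += (a - b) ** 2
            count += 1
    return total / count
```

The loss is zero when the model predicts the same probabilities for the clean and corrupted inputs, and grows as the two predictions diverge, which is what pushes the model toward noise-invariant outputs.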

According to one embodiment of the present invention, consistency learning in a home environment can generate corrupted sounds by randomly sampling data from the training noise and adding the noise to clean sleep breathing sounds at a random SNR between -20 and 5.
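Generating a corrupted sound at a chosen SNR can be sketched as follows. The sketch assumes the SNR range is expressed in decibels and that SNR is defined as the ratio of mean signal power to mean noise power; the function names, the noise bank structure, and the uniform sampling of the SNR are illustrative assumptions.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Solve p_clean / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def corrupt(clean, noise_bank, rng, snr_range=(-20.0, 5.0)):
    """Pick a random noise clip, a random segment of it, and a random SNR.

    Each clip in noise_bank is assumed to be at least as long as `clean`.
    Returns the corrupted signal and the SNR (in dB) that was used.
    """
    noise = rng.choice(noise_bank)
    start = rng.randrange(len(noise) - len(clean) + 1)
    snr_db = rng.uniform(*snr_range)
    segment = noise[start:start + len(clean)]
    return mix_at_snr(clean, segment, snr_db), snr_db
```

A negative SNR means the added noise carries more power than the breathing sound, so the lower end of the range represents very noisy conditions.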

According to an embodiment of the present invention, consistency learning in a home environment can be performed with an input sequence length of 14 epochs and a total sampled noise length of 7 minutes or more. In this way, sleep event detection according to the present invention works on a shorter time window than sleep stage analysis according to the present invention, and the accuracy of sleep event detection can be increased.

Regression analysis for estimating the AHI value from event detection

Figure 34 is a diagram illustrating a linear regression analysis function used to analyze AHI, a sleep apnea occurrence index, through sleep events that occur during sleep, according to an embodiment of the present invention.

According to one embodiment of the present invention, the AHI, which means the number of respiratory events that occur per unit time (e.g., 1 hour), can be analyzed separately from sleep stage analysis and independently of the epoch length used for sleep stage analysis. Specifically, two or three short sleep events may fall within one epoch, and one long sleep event may span multiple epochs. According to one embodiment of the present invention, a regression analysis function can be used to estimate the number of events that actually occurred from the number of epochs in which sleep events occurred. For example, a RANSAC (Random Sample Consensus) regression analysis model can be used. RANSAC regression is one of the methods for estimating the parameters of a fitting model: it repeatedly selects sample data at random and then chooses the model that agrees with the largest number of data points.
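The regression idea can be illustrated with a small hand-written RANSAC line fit that maps the number of event epochs to an estimated event count, followed by division by sleep time to obtain an AHI estimate. This is a sketch under assumptions: the linear form, the inlier tolerance, the iteration count, and all function names are hypothetical, and any data passed to it in an example would be synthetic rather than results from the invention.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, n_iter=200, tol=3.0, seed=0):
    """Fit a line that ignores outliers.

    Repeatedly draws two random points, counts how many points lie within
    `tol` of the line through them, keeps the largest such consensus set,
    and finally refits by least squares on that set.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical candidate cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        raise ValueError("no valid candidate line was found")
    return fit_line(best_inliers)


def estimate_ahi(event_epochs, sleep_hours, a, b):
    """Estimated events per hour from the number of epochs flagged as events."""
    estimated_events = a * event_epochs + b
    return estimated_events / sleep_hours
```

The pairs used for fitting would come from nights where both the flagged-epoch count and the scored event count are known; nights with unusual scoring then act as outliers that RANSAC leaves out of the final fit.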

Multi-task analysis through multi-head

A method for analyzing a sleep state according to an embodiment of the present invention may include analysis through a deep learning model. The deep learning model according to an embodiment of the present invention is capable of multi-task learning and/or multi-task analysis. Specifically, multi-task learning and multi-task analysis can simultaneously learn the tasks according to the above-described embodiments of the present invention (e.g., multimodal learning, real-time sleep event analysis, sleep stage analysis, etc.).

A deep learning model for analyzing sleep states according to an embodiment of the present invention is capable of multi-task learning and multi-task analysis. Specifically, for multi-task learning and analysis, a deep learning model may adopt a structure with multiple heads. Each of the plurality of heads may be responsible for a specific task (e.g., multimodal learning, real-time sleep event analysis, sleep stage analysis, etc.). For example, a deep learning model may have a structure with a total of three heads: a first head, a second head, and a third head. The first head may perform inference and/or classification of sleep stage information, the second head may perform detection and/or classification of sleep apnea and hypopnea among sleep events, and the third head may perform detection and classification of snoring among sleep events. The tasks assigned to the heads above are only an example for explaining the present invention, and the present invention is not limited thereto. The deep learning model according to the present invention can perform multi-task learning and analysis through a structure with multiple heads, and can optimize multiple tasks or specific tasks by increasing data efficiency.
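The three-head structure can be sketched as a shared feature layer followed by one output layer per task. This is a toy forward pass with untrained random weights, meant only to show the shape of the architecture; the layer sizes, the class counts per head (4, 3, and 2), the head names, and the class `MultiHeadSleepModel` are assumptions and not the model of the invention.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiHeadSleepModel:
    """Shared trunk with three task-specific heads (illustrative only)."""

    def __init__(self, n_features=8, n_hidden=6, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(n_hidden, n_features, rng)
        self.heads = {
            "sleep_stage": init_layer(4, n_hidden, rng),     # Wake, Light, Deep, REM
            "apnea_hypopnea": init_layer(3, n_hidden, rng),  # none, apnea, hypopnea
            "snoring": init_layer(2, n_hidden, rng),         # no snoring, snoring
        }

    def forward(self, x):
        # Shared representation computed once, then reused by every head.
        hidden = [max(0.0, v) for v in linear(x, *self.trunk)]
        return {name: softmax(linear(hidden, *params))
                for name, params in self.heads.items()}
```

Because all heads read the same shared representation, training signals from one task can shape features that the other tasks also use, which is the data-efficiency benefit the paragraph refers to.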

Effect of the sleep analysis method according to the present invention

Comparing with the results of polysomnography (PSG), it was confirmed that the sleep analysis model results using sleep acoustic information as input were very accurate.

Existing sleep analysis models predict sleep stages using ECG (electrocardiogram) or HRV (heart rate variability) as input, whereas the present invention converts sleep sound information into frequency-domain information, a spectrogram, or a Mel spectrogram and uses it as input for sleep stage analysis and inference. Therefore, unlike existing sleep analysis models, the sleep stage can be obtained in real time by analyzing the distinctive features of sleep patterns in the converted sleep sound information.

As shown in Figure 11, the sleep analysis results obtained according to the present invention are not only consistent with polysomnography, but also contain more precise and meaningful information related to sleep stages (Wake, Light, Deep, REM). The hypnogram shown at the bottom of Figure 10 indicates, in 30-second increments, the probability of belonging to each of the four classes (Wake, Light, Deep, REM) when the sleep stage is predicted from the user's sleep sound information. Here, the four classes refer to the awake state, light sleep state, deep sleep state, and REM sleep state, respectively.

Figure 12 is a graph verifying the performance of the sleep analysis method according to the present invention, comparing polysomnography (PSG) results for sleep apnea and hypopnea with the analysis results obtained using the AI algorithm according to the present invention (AI results). The hypnogram shown at the bottom of FIG. 12 indicates, in 30-second increments, the probability of belonging to each of the two sleep disorders (sleep apnea, hypopnea) when a sleep disorder is predicted from the user's sleep sound information. As shown in Figure 12, the sleep state information obtained according to the present invention not only closely matches polysomnography, but also includes more precise analysis information related to apnea and hypopnea.

The present invention can analyze the user's sleep in real time and identify the point where sleep disorders (sleep apnea, sleep hyperventilation, sleep hypopnea) occur. If stimulation (tactile, auditory, olfactory, etc.) is provided to the user at the moment the sleep disorder occurs, the sleep disorder may be temporarily alleviated. In other words, the present invention can stop the user's sleep disorder and reduce the frequency of sleep disorder based on accurate event detection related to the sleep disorder. In addition, according to the present invention, there is an effect that very accurate sleep analysis is possible by performing sleep analysis in a multimodal manner.

Post-processing of the inference step of the present invention

In the inference stage, in which the sleep stage is inferred by the trained model, the sleep being analyzed may span not only a fixed length of time like the training data (e.g., 30 seconds, 20 minutes, etc.) but also an entire sleep duration (e.g., 5 hours, 8 hours, etc.). In order to make accurate inferences about sleep stages, post-processing can be performed to increase the accuracy of inferences in light of sleep duration. The specific values related to the above-mentioned time intervals are merely examples, and the present invention is not limited thereto.

According to one embodiment of the present invention, inference regarding the depth of sleep according to sleep duration can be post-processed using medical information.

According to one embodiment of the present invention, post-processing can be performed through artificial intelligence learning using sleep stage information data according to sleep duration.

The steps of the method or algorithm described in connection with embodiments of the present invention may be implemented directly in hardware, as a software module executed by hardware, or as a combination thereof. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

The components of the present invention may be implemented as a program (or application) and stored in a medium in order to be executed in conjunction with a hardware computer. Components of the invention may be implemented as software programming or software elements, and similarly, embodiments of the invention may include various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs, and may be implemented in a programming or scripting language such as C, C++, Java, or assembler. Functional aspects may be implemented as algorithms running on one or more processors.

Those skilled in the art will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as “software”), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether this functionality is implemented as hardware or software depends on the specific application and the design constraints imposed on the overall system. A person skilled in the art may implement the described functionality in various ways for each specific application, but such implementation decisions should not be construed as departing from the scope of the present invention.

The various embodiments presented herein may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term “article of manufacture” includes a computer program, carrier, or media accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips, etc.), optical disks (e.g., CDs, DVDs, etc.), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives, etc.). Additionally, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term “machine-readable media” includes, but is not limited to, wireless channels and various other media capable of storing, retaining, and/or transmitting instruction(s) and/or data.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an example of illustrative approaches. It is to be understood that the specific order or hierarchy of steps in processes may be rearranged within the scope of the present invention, based on design priorities. The appended method claims present elements of the various steps in a sample order but are not meant to be limited to the particular order or hierarchy presented.

The description of the presented embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the invention. Thus, the present invention is not limited to the embodiments presented herein, but is to be construed in the broadest scope consistent with the principles and novel features presented herein.

[Explanation of symbols]

10: User terminal

100: computing device

110: Network unit

120: memory

130: processor

20: External server

11a: Area where object state information or environmental sensing information can be obtained

1a: Electronic devices connected to the network within area 11a

1b: Electronic devices connected to the network within area 11a

1c: Electronic devices not connected to the network within area 11a

1d: Electronic devices not connected to the network within area 11a

2a: Electronic devices outside the scope of area 11a

2b: Electronic devices outside the scope of area 11a

E: Acoustic information

P: Singularities related to the user’s sleep

SS: Sleep acoustic information

SP: spectrogram

Claims

In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The preprocessed information visualizes changes along the time axis in the frequency component of the acquired acoustic information,

The deep learning model is an artificial intelligence model based on natural language processing,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model performs learning based on consistency loss,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model performs semi-supervised learning based on sequential consistency loss to consider the time series characteristics of the acoustic information,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model performs learning based on semi-supervised contrast loss,

A method for analyzing user sleep state information through acoustic information.
According to claim 4,

Learning based on the semi-supervised contrast loss is,

setting a class confidence threshold; and

adjusting a position in a vector space based on anchor data according to the set class confidence threshold;

Including,

A method for analyzing user sleep state information through acoustic information.
According to claim 5,

Characterized in that the anchor data includes at least one of labeling data with a label for a sleep state or pseudo-label data with a pseudo label for a sleep state,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model performs learning based on the unsupervised domain adaptation (UDA) method,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model treats the predicted value for unlabeled acoustic information as a pseudo label and is learned based on the pseudo label,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model is trained to infer a sleep event based on the acoustic information, wherein, in the sleep event inference training, class weights are assigned to address the imbalance among sleep event classes,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

Obtaining acoustic information in the time domain related to the user's sleep;

performing preprocessing on the sound information; and

performing at least one of extracting or classifying sleep state information using the preprocessed information as input to a deep learning model;

Includes,

The deep learning model performs multi-task learning based on the acoustic information, wherein, for the multi-task learning, the deep learning model has a structure having a plurality of heads,

A method for analyzing user sleep state information through acoustic information.
According to claim 10,

Each head included in the plurality of heads performs one different task among the plurality of tasks,

Characterized in that the plurality of tasks include at least one of multimodal learning, sleep event analysis, and sleep stage analysis,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

A first information acquisition step of acquiring sound information in the time domain related to the user's sleep;

A second information acquisition step of acquiring user sleep environment information related to the user's sleep;

combining the first information and the second information into multimodal data;

Extracting features by using the multimodal data as input to a multimodal learned deep learning model; and

Obtaining information on the user's sleep state by using the extracted features as input to a deep learning model;

Including,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

A first information acquisition step of acquiring sleep sound information related to the user's sleep;

Inferring first sleep state information by using the obtained first information as input to a deep learning model;

A second information acquisition step of acquiring user sleep environment information related to the user's sleep;

Inferring second sleep state information by using the acquired second information as input to an inference model; and

Obtaining sleep state information of the user by combining the inferred first sleep state information and the inferred second sleep state information;

Including,

A method for analyzing user sleep state information through acoustic information.
In a method for analyzing the user's sleep state information through acoustic information,

A first information acquisition step of acquiring sleep sound information related to the user's sleep;

Inferring first sleep state information by using the obtained first information as input to a deep learning model;

A second information acquisition step of acquiring user sleep environment information related to the user's sleep; and

Obtaining sleep state information of the user by combining the inferred first sleep state information and the obtained second information;

Including,

A method for analyzing user sleep state information through acoustic information.