WO2021189903A1

WO2021189903A1 - Audio-based user state identification method and apparatus, and electronic device and storage medium

Info

Publication number: WO2021189903A1
Application number: PCT/CN2020/131983
Authority: WO
Inventors: 魏文琦; 王健宗; 贾雪丽; 张之勇; 程宁
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-10-09
Filing date: 2020-11-27
Publication date: 2021-09-30
Also published as: CN112233700A

Abstract

An audio-based user state identification method and apparatus, and an electronic device and a computer-readable storage medium. The method comprises: acquiring an audio training set, and performing feature conversion on each piece of audio in the audio training set so as to obtain a target sonogram set (S1); on the basis of an attention mechanism and small sample learning, training a pre-constructed deep learning network model by using the target sonogram set so as to obtain a user state identification model (S2); when audio of a user to be subjected to identification is received, performing feature conversion on the audio of said user so as to obtain a sonogram to be subjected to identification (S3); and identifying said sonogram by using the user state identification model so as to obtain a user state identification result (S4). In addition, the present application further relates to blockchain technology, and the audio training set can be stored in a blockchain. By using the method, the consumption of data resources is reduced, and the practicability of a model is enhanced.

Description

Audio-based user state recognition method, device, electronic equipment and storage medium

This application claims priority to the Chinese patent application No. CN202011074898.9, titled "Audio-based user status recognition method, device and storage medium", filed with the Chinese Patent Office on October 9, 2020, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence, and in particular to an audio-based user state recognition method, device, electronic equipment, and storage medium.

Background technique

With the gradual popularization of the concept of smart life, user status has become a core concern of smart life. Identifying user status, such as a user's current health status, has therefore become very important; when infectious diseases are spreading in particular, it is important to know everyone's health status at all times. Under normal circumstances, users need to go to a hospital and have a doctor perform a physical examination to understand their health. However, the hospital itself harbors various germs, so going to the hospital for an examination carries a risk of infection.

Technical problem

The inventor realized that, at present, a large number of users' medical images (such as chest X-rays) are usually used to train machine learning models that recognize user status and thereby determine users' health. However, a large number of medical images consumes a lot of data resources, and the high threshold for users to obtain medical images leads to poor practicability, so such approaches cannot be widely promoted.

Technical solutions

An audio-based user state recognition method provided in this application includes:

Acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set;

Based on the attention mechanism and small sample learning, using the target spectrogram set to train a pre-built deep learning network model to obtain a user state recognition model;

When the audio of the user to be identified is received, feature conversion is performed on the audio of the user to be identified to obtain the spectrogram to be identified;

The user state recognition model is used to recognize the to-be-recognized spectrogram to obtain a user state recognition result.

The present application also provides an audio-based user state recognition device, which includes:

The model generation module is used to obtain an audio training set, perform feature conversion on each audio in the audio training set to obtain a target spectrogram set, and, based on the attention mechanism and small sample learning, use the target spectrogram set to train a pre-built deep learning network model to obtain a user state recognition model;

The state recognition module is used to perform feature conversion on the audio of the user to be identified, when that audio is received, to obtain the spectrogram to be identified, and to use the user state recognition model to recognize the spectrogram to be identified to obtain a user state recognition result.

This application also provides an electronic device, which includes:

Memory, storing at least one instruction; and

The processor executes the instructions stored in the memory to implement the following steps:

The present application also provides a computer-readable storage medium in which at least one instruction is stored, and when the at least one instruction is executed by a processor in an electronic device, the following steps are implemented:

Description of the drawings

FIG. 1 is a schematic flowchart of an audio-based user state recognition method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of a detailed process of obtaining a target spectrogram set in an audio-based user state recognition method provided by an embodiment of this application;

FIG. 3 is a schematic diagram of modules of an audio-based user state recognition device provided by an embodiment of this application;

FIG. 4 is a schematic diagram of the internal structure of an electronic device that implements an audio-based user state recognition method provided by an embodiment of this application;

The realization of the purpose, the functional characteristics, and the advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Embodiments of the present invention

It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

This application provides an audio-based user status recognition method. Referring to FIG. 1, it is a schematic flowchart of an audio-based user state recognition method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.

In this embodiment, the audio-based user state recognition method includes:

S1. Obtain an audio training set, and perform feature conversion on each audio in the audio training set to obtain a target spectrogram set;

In the embodiment of the present application, the audio training set is a collection of audio clips with initial labels. Preferably, the initial labels are the user's disease conditions, such as acute bronchitis, chronic pharyngitis, pertussis, and fever. Further, since the user's cough audio has corresponding sound features under different disease conditions, the audio training set is preferably a collection of cough audio corresponding to different disease conditions, wherein the sound features are the frequency-domain features of the cough audio and can be represented by a spectrogram.

Further, in order to make the features of each audio in the audio training set more intuitive and clear for subsequent model training, the embodiment of the present application performs feature conversion on the audio training set to obtain the target spectrogram set, including:

S11. Resample each audio in the audio training set to obtain a corresponding digital voice signal;

In the embodiment of the present application, in order to facilitate data processing of each audio in the audio training set, each audio in the audio training set is resampled to obtain the corresponding digital voice signal. Preferably, the embodiment of the present application uses a digital-to-analog converter to resample each audio in the audio training set.
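As an illustration only, resampling an already digitized signal to a common rate could be sketched in software as follows. The patent describes a hardware converter and does not give a resampling procedure, so the function name `resample_linear` and the use of linear interpolation (which applies no anti-aliasing filter) are assumptions made for this sketch.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of floats from src_rate to dst_rate (Hz).

    Assumption: plain linear interpolation, chosen for brevity.
    A production pipeline would low-pass filter before downsampling.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in source-sample units
        lo = min(int(pos), last)           # sample at or before pos
        hi = min(lo + 1, last)             # sample after pos (clamped at the end)
        frac = pos - int(pos)
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Halving the rate keeps every second sample of a ramp.
downsampled = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=4, dst_rate=2)
```

Bringing every clip to one sampling rate means that the later frequency-domain features are comparable across clips.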

S12. Pre-emphasize the digital voice signal to obtain a standard digital voice signal;

S13. Summarize all the standard digital voice signals to obtain a voice signal set;

In the embodiment of the present application, in order to compensate for the loss of audio information caused during the acquisition of the audio training set, a pre-emphasis operation is performed on each audio in the audio training set.

In detail, in the embodiment of the present application, performing the pre-emphasis operation on each audio in the audio training set includes: resampling each audio in the audio training set to obtain the corresponding digital voice signal; pre-emphasizing the digital voice signal to obtain a standard digital voice signal; and aggregating all the standard digital voice signals to obtain a voice signal set.

In detail, the embodiment of the present application uses the following formula to perform the pre-emphasis operation:

y(t)=x(t)-μx(t-1)

Where x(t) is the digital voice signal, t is the time, y(t) is the standard digital voice signal, and μ is the preset adjustment value of the pre-emphasis operation. Preferably, the value range of μ is [0.9, 1.0].
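The pre-emphasis formula can be sketched directly in code. In this minimal example the function name `pre_emphasize` and the default μ = 0.97 are assumptions (the patent only gives the range [0.9, 1.0]), and the first sample is passed through unchanged, which corresponds to treating x(-1) as 0.

```python
def pre_emphasize(x, mu=0.97):
    """Apply y(t) = x(t) - mu * x(t-1) to a list of samples.

    Assumptions: mu defaults to 0.97 (any value in [0.9, 1.0] is allowed),
    and x(-1) is taken as 0 so that y(0) = x(0).
    """
    if not 0.9 <= mu <= 1.0:
        raise ValueError("mu is expected to lie in [0.9, 1.0]")
    if not x:
        return []
    y = [x[0]]
    for t in range(1, len(x)):
        y.append(x[t] - mu * x[t - 1])
    return y


# A constant signal is almost fully suppressed after the first sample,
# which shows that the filter attenuates low frequencies.
standard_signal = pre_emphasize([1.0, 1.0, 1.0], mu=1.0)
```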

S14. Perform feature conversion on each standard digital voice signal included in the voice signal set to obtain a target spectrogram set.

In the embodiment of the present application, the standard digital voice signals in the voice signal set only reflect how the audio changes in the time domain and cannot reflect the audio features of the signal. In order to reflect the audio features of each standard digital voice signal and make them more intuitive and clear, feature conversion is performed on each standard digital voice signal in the voice signal set.

In detail, in the embodiment of the present application, performing feature conversion on each standard digital voice signal in the voice signal set includes: using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set to the frequency domain to obtain the corresponding target spectrogram, and aggregating all the target spectrograms to obtain the target spectrogram set.

Preferably, the sound processing algorithm described in this application is the Mel filter algorithm.
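A minimal sketch of a Mel filter bank applied to framed audio is given below. The patent names only "the Mel filter algorithm", so everything else here is an assumption for illustration: the common mel formula m = 2595·log10(1 + f/700), triangular filters, a Hamming window, a naive DFT, and the parameter values and function names.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters over n_fft // 2 + 1 frequency bins."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of one frame (illustration only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_spectrogram(signal, sample_rate, frame_len=64, hop=32, n_filters=8):
    """Return a list of T frames, each holding n_filters log-mel energies."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        frames.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                       for row in bank])
    return frames


# A 440 Hz tone sampled at 8 kHz, 128 samples long.
tone = [math.sin(2 * math.pi * 440.0 * t / 8000.0) for t in range(128)]
target_spectrogram = mel_spectrogram(tone, sample_rate=8000)
```

The number of frames T grows with the audio duration, so clips of different lengths yield spectrograms of different widths.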

In the embodiment of this application, the above steps only perform feature conversion on each audio in the audio training set and do not affect the initial label corresponding to each audio, so each target spectrogram in the target spectrogram set has a corresponding initial label.

S2. Based on the attention mechanism and small sample learning, use the target spectrogram set to train the pre-built deep learning network model to obtain the user state recognition model;

In the embodiment of this application, since the number of samples in the audio training set is small, in order to ensure the training accuracy and robustness of the subsequent model, the target spectrogram set is used, based on the attention mechanism and small sample learning, to train the pre-built deep learning network model to obtain an audio-based user state recognition model.

In detail, in the embodiment of the present application, the training of the pre-built deep learning network model by using the target spectrogram atlas includes:

Step A: The target spectrogram set is divided into a training set and a test set;

In the embodiment of the present application, since the sample data in the target spectrogram set is small in quantity and difficult to obtain, directly using the whole target spectrogram set as the training set would result in poor robustness of the subsequent model. Therefore, the embodiment of this application divides the target spectrogram set into a training set and a test set, and enhances the robustness of the model by repeatedly testing the trained model with the test set. Dividing the target spectrogram set into a training set and a test set includes: classifying each target spectrogram in the target spectrogram set according to its initial label to obtain the corresponding classified target spectrogram sets; randomly taking a preset number of target spectrograms out of each classified target spectrogram set as a test subset, and using the complement of the test subset in that classified target spectrogram set as a training subset; and aggregating all the training subsets to obtain the training set and all the test subsets to obtain the test set. Preferably, the preset number in the embodiment of the present application is 1.
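The per-label split described in Step A can be sketched as follows. The function name `split_by_label`, the `(spectrogram, label)` tuple layout, and the optional `rng` argument are assumptions introduced for this example.

```python
import random
from collections import defaultdict


def split_by_label(samples, n_test_per_label=1, rng=None):
    """Split (spectrogram, label) pairs into (training set, test set).

    For every label, n_test_per_label items are drawn at random as the test
    subset; the rest of that label's items form the training subset.
    """
    if rng is None:
        rng = random.Random()
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    train_set, test_set = [], []
    for label, items in by_label.items():
        if len(items) <= n_test_per_label:
            raise ValueError(f"not enough samples for label {label!r}")
        chosen = set(rng.sample(range(len(items)), n_test_per_label))
        for index, item in enumerate(items):
            (test_set if index in chosen else train_set).append(item)
    return train_set, test_set


# Placeholder strings stand in for spectrograms.
samples = [
    ("s1", "fever"), ("s2", "fever"), ("s3", "fever"),
    ("s4", "chronic pharyngitis"), ("s5", "chronic pharyngitis"),
]
train_set, test_set = split_by_label(samples, rng=random.Random(0))
```

With the preferred preset number of 1, the test set holds exactly one spectrogram per disease label.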

Step B: Use the training set to train the deep learning network to obtain an initial recognition model, and test the initial recognition model with the test set to obtain a loss value. When the loss value is greater than a preset threshold, return to Step A; when the loss value is less than or equal to the preset threshold, use the initial recognition model as the user state recognition model.
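The control flow of Steps A and B can be sketched as a loop. All names here (`fit_until_threshold`, `split_fn`, `train_fn`, `loss_fn`) are hypothetical placeholders, and two details are assumptions that the patent does not state: the previous model is handed back to `train_fn` on each round, and `max_rounds` caps the number of rounds so the loop cannot run forever.

```python
def fit_until_threshold(samples, split_fn, train_fn, loss_fn, threshold,
                        max_rounds=100):
    """Repeat Step A (split) and Step B (train, test) until loss <= threshold.

    split_fn(samples)        -> (train_set, test_set)
    train_fn(train_set, old) -> model (old is None on the first round)
    loss_fn(model, test_set) -> float
    """
    model = None
    for round_index in range(1, max_rounds + 1):
        train_set, test_set = split_fn(samples)   # Step A
        model = train_fn(train_set, model)        # Step B: train
        loss = loss_fn(model, test_set)           # Step B: test
        if loss <= threshold:
            return model, loss, round_index
    raise RuntimeError("loss stayed above the threshold after max_rounds rounds")


# Toy stand-ins: the "model" is a round counter and the loss shrinks each round.
toy_losses = iter([0.9, 0.5, 0.1])
model, loss, rounds = fit_until_threshold(
    samples=[1, 2, 3],
    split_fn=lambda s: (s[:-1], s[-1:]),
    train_fn=lambda train_set, old: (old or 0) + 1,
    loss_fn=lambda m, test_set: next(toy_losses),
    threshold=0.2,
)
```

Because Step A is repeated, each round can hold out a different spectrogram per label, so over several rounds the model is tested on different samples.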

Preferably, the deep learning network in the embodiment of the present application is a convolutional neural network.

In the embodiment of this application, since the audio durations in the audio training set may be inconsistent, the images in the target spectrogram set may differ in size. As a result, the features that the deep learning network model extracts from the target spectrograms during training have different dimensions and cannot be trained uniformly. Therefore, in order to make better use of the data in the audio training set, before the training set is used to train the deep learning network, the embodiment of the present application adds an attention mechanism processing layer before the fully connected layer of the deep learning network model to align the image features, where the attention mechanism processing layer is a network that aligns features of different image feature dimensions. For example: the image feature a extracted by the deep learning network model from target spectrogram A is a D*T1-dimensional matrix, and the image feature b extracted by the deep learning network model from target spectrogram B is a D*T2-dimensional matrix. The attention mechanism processing layer multiplies image feature a by a preset T1*1 weight matrix to convert it into a D-dimensional matrix, and multiplies image feature b by a preset T2*1 weight matrix to convert it into a D-dimensional matrix, so that image feature a and image feature b are aligned.

Further, since the number of samples in the training set is small, the embodiment of the present application needs to test the initial recognition model to verify its recognition ability, which facilitates the training and adjustment of the model.
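The alignment example can be sketched as a matrix-vector product: a D*T feature multiplied by a T*1 weight vector gives a length-D vector regardless of T. The names `align_feature` and `uniform_weights` are assumptions, and the uniform weights are only a stand-in for the preset weight matrix, whose values the patent does not give.

```python
def align_feature(feature, weights):
    """Collapse a D x T feature (list of D rows) with a length-T weight vector."""
    t = len(weights)
    if any(len(row) != t for row in feature):
        raise ValueError("every feature row needs one value per time step")
    return [sum(v * w for v, w in zip(row, weights)) for row in feature]


def uniform_weights(t):
    """Hypothetical placeholder for the preset T x 1 weight matrix."""
    return [1.0 / t] * t


# Feature a is D x T1 = 2 x 3, feature b is D x T2 = 2 x 2.
feature_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
feature_b = [[1.0, 3.0], [5.0, 7.0]]

aligned_a = align_feature(feature_a, uniform_weights(3))
aligned_b = align_feature(feature_b, uniform_weights(2))
```

Both results have length D = 2, so spectrograms of different durations can feed the same fully connected layer.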

In detail, the recognition categories of the initial recognition model in the embodiment of the present application are the same as the categories of the initial labels in the target spectrogram set. For example, if there are two initial labels in the target spectrogram set, chronic pharyngitis and fever, then the initial recognition model has the same two recognition categories: chronic pharyngitis and fever. Further, in the embodiment of the present application, testing the initial recognition model with the test set to obtain a loss value includes: extracting the feature vector corresponding to each initial label in the initial recognition model to obtain a target feature vector; using the initial recognition model to perform feature extraction on each target spectrogram in the test subset to obtain a test feature vector; calculating the distance between the target feature vector corresponding to each initial label and the test feature vector to obtain a loss distance value; and calculating the average of all the loss distance values to obtain the loss value. Preferably, the embodiment of the present application uses the Euclidean distance to calculate the distance between the target feature vector corresponding to each initial label and the test feature vector.

Furthermore, those skilled in the art will know that the different recognition categories of the initial recognition model correspond to different fully connected layer nodes, and that the fully connected layer nodes have a corresponding order. The embodiment of the present application obtains the output values of the fully connected layer nodes corresponding to each recognition category of the initial recognition model and combines them in the order of the corresponding fully connected layer nodes to obtain the corresponding target feature vector. Further, the embodiment of the present application inputs each target spectrogram in the test subset into the initial recognition model, obtains, according to the initial label corresponding to each target spectrogram in the test subset, the output values of the fully connected layer nodes corresponding to that recognition category in the initial recognition model, and combines them in the order of the corresponding fully connected layer nodes to obtain the test feature vector.
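The loss calculation described above, the average Euclidean distance between each test feature vector and the target feature vector of its label, can be sketched as follows. The function names and the data layout (a dict from label to target vector, and a list of `(test_vector, label)` pairs) are assumptions for this example, and the vectors are made-up numbers.

```python
import math


def euclidean_distance(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_distance_loss(target_vectors, test_items):
    """Average the distances between each test vector and its label's target.

    target_vectors: dict mapping label -> target feature vector
    test_items:     list of (test feature vector, label) pairs
    """
    if not test_items:
        raise ValueError("the test set is empty")
    distances = [euclidean_distance(target_vectors[label], vector)
                 for vector, label in test_items]
    return sum(distances) / len(distances)


target_vectors = {"fever": [0.0, 0.0], "chronic pharyngitis": [1.0, 1.0]}
test_items = [([3.0, 4.0], "fever"), ([1.0, 1.0], "chronic pharyngitis")]
loss_value = mean_distance_loss(target_vectors, test_items)
```

Here the two loss distance values are 5.0 and 0.0, so the loss value is their average.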

In another embodiment of the present application, in order to ensure data privacy, the audio training set may be stored in a blockchain node.

S3. When the audio of the user to be identified is received, perform feature conversion on the audio of the user to be identified to obtain the spectrogram to be identified;

In the embodiment of the present application, the audio of the user to be identified is of the same category as the audio in the audio training set. Preferably, in the embodiment of the present application, the audio of the user to be identified is the user's cough audio.

Further, the method for performing feature conversion on the audio of the user to be identified in the embodiment of the present application is the same as the above-mentioned method for performing feature conversion on each audio of the audio training set.

S4. Recognizing the spectrogram to be recognized by using the user state recognition model to obtain a user state recognition result.

In the embodiment of the present application, the user status recognition result is the user's health status, such as acute bronchitis, chronic pharyngitis, pertussis, and fever.

In the embodiment of the present application, feature conversion is performed on each audio in the audio training set to obtain the target spectrogram set, which makes the features of the audio in the audio training set clearer and more intuitive and increases the accuracy of subsequent model training. Based on the attention mechanism and small sample learning, the target spectrogram set is used to train a pre-built deep learning network model to obtain a user state recognition model, which enhances the robustness and training accuracy of the model under a small-sample training set. Feature conversion is performed on the audio of the user to be identified to obtain the spectrogram to be identified, which makes the audio features of the user to be identified clearer and more intuitive and improves the recognition accuracy of the subsequent model. The user state recognition model is used to recognize the spectrogram to be identified to obtain the user state recognition result. Because a small amount of more easily available audio data is used to train the model, the data resource consumption of model training is reduced; and because the user state can be recognized from the user's audio alone, the practicality of the model is enhanced.

As shown in Fig. 3, it is a functional block diagram of the audio-based user state recognition device of the present application.

The audio-based user state recognition apparatus 100 described in this application can be installed in an electronic device. According to the implemented functions, the audio-based user state recognition device may include a model generation module 101 and a state recognition module 102. The module described in the present invention can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and are stored in the memory of the electronic device.

In this embodiment, the functions of each module/unit are as follows:

The model generation module 101 is used to obtain an audio training set, perform feature conversion on each audio in the audio training set to obtain a target spectrogram set, and, based on the attention mechanism and small sample learning, use the target spectrogram set to train the pre-built deep learning network model to obtain the user state recognition model.

Further, in order to make the features of each audio in the audio training set more intuitive and clear for subsequent model training, the model generation module 101 in this embodiment of the present application uses the following means to perform feature conversion on the audio training set to obtain the target spectrogram set, including:

Re-sampling each audio in the audio training set to obtain a corresponding digital voice signal;

Performing pre-emphasis on the digital voice signal to obtain a standard digital voice signal;

Summarize all the standard digital voice signals to obtain a voice signal set;

In detail, in the embodiment of the present application, the pre-emphasis operation on each audio in the audio training set includes: resampling each audio in the audio training set to obtain the corresponding digital voice signal; pre-emphasizing the digital voice signal to obtain a standard digital voice signal; and aggregating all the standard digital voice signals to obtain a voice signal set.

In detail, the model generation module 101 according to the embodiment of the present application uses the following formula to perform the pre-emphasis operation:

y(t)=x(t)-μx(t-1)

Perform feature conversion on each standard digital voice signal included in the voice signal set to obtain a target spectrogram set.

In detail, the model generation module 101 in the embodiment of the present application uses the following means to perform feature conversion on each standard digital voice signal in the voice signal set, including: using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set to the frequency domain to obtain the corresponding target spectrogram, and aggregating all the target spectrograms to obtain the target spectrogram set.

In detail, in the embodiment of the present application, the model generation module 101 uses the following methods to train the pre-built deep learning network model, including:

In the embodiment of the present application, since the sample data in the target spectrogram set is small in quantity and difficult to obtain, directly using the whole target spectrogram set as the training set would result in poor robustness of the subsequent model. Therefore, the embodiment of this application divides the target spectrogram set into a training set and a test set, and enhances the robustness of the model by repeatedly testing the trained model with the test set. Dividing the target spectrogram set into a training set and a test set includes: classifying each target spectrogram in the target spectrogram set according to its initial label to obtain the corresponding classified target spectrogram sets; randomly taking a preset number of target spectrograms out of each classified target spectrogram set as a test subset, and using the complement of the test subset in that classified target spectrogram set as a training subset; and aggregating all the training subsets to obtain the training set and all the test subsets to obtain the test set. Preferably, the preset number in the embodiment of the present application is 1.

In detail, the recognition categories of the initial recognition model in the embodiment of the present application are the same as the categories of the initial labels in the target spectrogram set. For example, if there are two initial labels in the target spectrogram set, chronic pharyngitis and fever, then the initial recognition model has the same two recognition categories: chronic pharyngitis and fever. Further, the model generation module 101 in the embodiment of the present application obtains the loss value by the following means, including: extracting the feature vector corresponding to each initial label in the initial recognition model to obtain the target feature vector; using the initial recognition model to perform feature extraction on each target spectrogram in the test subset to obtain a test feature vector; calculating the distance between the target feature vector corresponding to each initial label and the test feature vector to obtain a loss distance value; and calculating the average of all the loss distance values to obtain the loss value. Preferably, the embodiment of the present application uses the Euclidean distance to calculate the distance between the target feature vector corresponding to each initial label and the test feature vector.

Further, those skilled in the art will know that the different recognition categories of the initial recognition model correspond to different fully connected layer nodes, and that the fully connected layer nodes have a corresponding order. The model generation module 101 described in this embodiment of the application obtains the output values of the fully connected layer nodes corresponding to each recognition category of the initial recognition model and combines them in the order of the corresponding fully connected layer nodes to obtain the corresponding target feature vector. Further, the model generation module 101 inputs each target spectrogram in the test subset into the initial recognition model, obtains, according to the initial label corresponding to each target spectrogram in the test subset, the output values of the fully connected layer nodes corresponding to that recognition category in the initial recognition model, and combines them in the order of the corresponding fully connected layer nodes to obtain the test feature vector.

The state recognition module 102 is configured to, when receiving the audio of the user to be identified, perform feature conversion on that audio to obtain the spectrogram to be identified, and to use the user state recognition model to recognize the spectrogram to be identified to obtain the user state recognition result.

In the embodiment of the present application, the user status recognition result is the user's disease condition, such as acute bronchitis, chronic pharyngitis, whooping cough, and fever.

As shown in FIG. 4, it is a schematic structural diagram of an electronic device that implements an audio-based user state recognition method according to the present application.

The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and running on the processor 10, such as an audio-based user state recognition program.

The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, removable hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, optical disc, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example, a removable hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of an audio-based user state recognition program, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits with the same function or different functions, including one or more combinations of a central processing unit (CPU), a microprocessor, a digital processing chip, a graphics processor, and various control chips. The processor 10 is the control core (Control Unit) of the electronic device. It uses various interfaces and lines to connect the components of the entire electronic device, and performs the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the audio-based user state recognition program) and calling the data stored in the memory 11.

The bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.

FIG. 4 only shows an electronic device with some components. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown in the figure, a combination of certain components, or a different arrangement of components.

For example, although not shown, the electronic device 1 may also include a power source (such as a battery) for supplying power to various components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may also include a user interface. The user interface may be a display (Display) and an input unit (such as a keyboard (Keyboard)). Optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, and an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc. Among them, the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.

It should be understood that the embodiments are for illustrative purposes only, and that the scope of the patent application is not limited by this structure.

The audio-based user state recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions. When run by the processor 10, it can implement the steps of the audio-based user state recognition method.

Specifically, for the specific implementation method of the above-mentioned instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which will not be repeated here.

Further, if the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium, which may be volatile or non-volatile. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, and read-only memory (ROM).

The computer-readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the steps of the audio-based user state recognition method described in the foregoing embodiments.

Further, the computer-usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, etc.; the data storage area may store data created through the use of a blockchain node, etc.

In the several embodiments provided in this application, it should be understood that the disclosed equipment, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

For those skilled in the art, it is obvious that the present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the application.

Therefore, from every point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalent elements of the claims be included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.

In addition, it is obvious that the word "including" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims can also be implemented by one unit or device through software or hardware. Words such as "second" are used to indicate names and do not indicate any specific order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the application and not to limit them. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the application can be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present application.

Claims

1. An audio-based user state recognition method, wherein the method includes:

Acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set;

Based on an attention mechanism and small sample learning, training a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model;

When audio of a user to be recognized is received, performing feature conversion on the audio of the user to be recognized to obtain a spectrogram to be recognized;

Recognizing the spectrogram to be recognized by using the user state recognition model to obtain a user state recognition result.
2. The audio-based user state recognition method according to claim 1, wherein said performing feature conversion on each audio in said audio training set to obtain a target spectrogram set comprises:

Resampling each audio in the audio training set to obtain a corresponding digital voice signal;

Performing pre-emphasis on the digital voice signal to obtain a standard digital voice signal;

Aggregating all the standard digital voice signals to obtain a voice signal set;

Performing feature conversion on each standard digital voice signal included in the voice signal set to obtain the target spectrogram set.
3. The audio-based user state recognition method according to claim 2, wherein said performing feature conversion on each standard digital voice signal included in said voice signal set to obtain a target spectrogram set comprises:

Using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set into the frequency domain to obtain a corresponding target spectrogram;

Aggregating all the target spectrograms to obtain the target spectrogram set.
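As a minimal sketch of the pre-emphasis and frequency-domain mapping steps above, the Python fragment below filters a signal and then computes one magnitude spectrum per frame. The application does not name the preset sound processing algorithm; a short-time Fourier transform with a Hann window is assumed here. The coefficient 0.97, the frame length, the hop size, and the function names `pre_emphasize` and `magnitude_spectrogram` are illustrative choices, not values taken from the application. Resampling is omitted.

```python
import cmath
import math


def pre_emphasize(signal, alpha=0.97):
    """Apply the first-order filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (frame_len // 2 + 1 bins) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of bin k; an FFT routine would normally be used instead.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy input: a 1 kHz tone sampled at 8 kHz (values chosen only for illustration).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spectrogram = magnitude_spectrogram(pre_emphasize(tone))
```

Each row of `spectrogram` is one time frame and each column is one frequency bin, so the nested list can be treated as the image-like input that a recognition model would consume.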
4. The audio-based user state recognition method according to claim 1, wherein said training a pre-built deep learning network model by using said target spectrogram set to obtain a user state recognition model comprises:

Randomly dividing the target spectrogram set into a training set and a test set;

Training the deep learning network model by using the training set to obtain an initial recognition model;

Testing the initial recognition model according to the test set to obtain a loss value;

When the loss value is greater than a preset threshold, returning to the step of randomly dividing the target spectrogram set into a training set and a test set;

When the loss value is less than or equal to the preset threshold, using the initial recognition model as the user state recognition model.
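The split, train, and test loop above can be sketched as follows. This is only a control-flow outline under stated assumptions: `split_fn`, `train_fn`, and `loss_fn` are hypothetical callables standing in for the splitting, training, and testing steps, the model is assumed to carry over between rounds, and the `max_rounds` guard is an addition that the application does not mention.

```python
def train_until_threshold(model, spectrograms, split_fn, train_fn, loss_fn, threshold, max_rounds=100):
    """Repeat split -> train -> test until the test loss is at or below the threshold."""
    for round_index in range(1, max_rounds + 1):
        training_set, test_set = split_fn(spectrograms)
        model = train_fn(model, training_set)
        loss = loss_fn(model, test_set)
        if loss <= threshold:
            # The current model is accepted as the final recognition model.
            return model, loss, round_index
    raise RuntimeError("loss stayed above the threshold for max_rounds rounds")


# Toy stand-ins: the "model" is just a counter of completed training rounds,
# and the split is fixed rather than random to keep the example deterministic.
final_model, final_loss, rounds = train_until_threshold(
    model=0,
    spectrograms=list(range(10)),
    split_fn=lambda items: (items[:8], items[8:]),
    train_fn=lambda model, training_set: model + 1,
    loss_fn=lambda model, test_set: 1.0 / model,
    threshold=0.25,
)
```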
5. The audio-based user state recognition method according to claim 4, wherein said randomly dividing the target spectrogram set into a training set and a test set comprises:

Classifying each target spectrogram in the target spectrogram set according to its corresponding initial label to obtain corresponding classified target spectrogram sets;

Randomly taking a preset number of target spectrograms from each classified target spectrogram set as a test subset, and using the complement of the test subset in that classified target spectrogram set as a training subset;

Aggregating all the training subsets to obtain the training set;

Aggregating all the test subsets to obtain the test set.
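The per-label division above can be sketched in Python as follows. The function name `split_by_label`, the `(identifier, label)` sample layout, the placeholder labels, and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_label(samples, test_per_label, seed=None):
    """samples: list of (spectrogram_id, label) pairs. Returns (training_set, test_set)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)  # classify by initial label

    training_set, test_set = [], []
    for label in sorted(by_label):
        group = by_label[label]
        if test_per_label >= len(group):
            raise ValueError(f"label {label!r} has too few samples")
        # Draw the preset number of test items; the rest of the group is the complement.
        test_idx = set(rng.sample(range(len(group)), test_per_label))
        for i, item in enumerate(group):
            (test_set if i in test_idx else training_set).append(item)
    return training_set, test_set


# Hypothetical data: three placeholder labels with five spectrograms each.
samples = [(f"{label}-{i}", label) for label in ("state_a", "state_b", "state_c") for i in range(5)]
training_set, test_set = split_by_label(samples, test_per_label=2, seed=0)
```

Drawing the same number of test items from every label keeps each class represented in both sets, which matters when only a few samples per class are available.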
6. The audio-based user state recognition method according to claim 5, wherein said testing said initial recognition model according to said test set to obtain a loss value comprises:

Extracting a feature vector corresponding to each of the initial labels in the initial recognition model to obtain a target feature vector;

Using the initial recognition model to perform feature extraction on each target spectrogram in the test set to obtain a corresponding test feature vector;

Calculating the distance between the test feature vector and the target feature vector corresponding to each of the initial labels to obtain a loss distance value;

Calculating the average value of all the loss distance values to obtain the loss value.
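A minimal sketch of this loss calculation is given below. The application only refers to "the distance" between feature vectors; Euclidean distance is assumed here, and the function name `mean_distance_loss` and the toy vectors are hypothetical.

```python
import math


def mean_distance_loss(target_vectors, test_features):
    """target_vectors: {label: vector}; test_features: list of (label, vector) pairs."""
    # One loss distance per test spectrogram, measured against its own label's target vector.
    distances = [math.dist(target_vectors[label], vector) for label, vector in test_features]
    return sum(distances) / len(distances)


# Toy feature vectors chosen so the distances are easy to check by hand (5.0 and 0.0).
target_vectors = {"state_a": [0.0, 0.0], "state_b": [1.0, 1.0]}
test_features = [("state_a", [3.0, 4.0]), ("state_b", [1.0, 1.0])]
loss = mean_distance_loss(target_vectors, test_features)
```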
7. The audio-based user state recognition method according to any one of claims 1 to 6, wherein the audio training set is a set of cough audio corresponding to different disease conditions.
8. An audio-based user state recognition device, wherein the device includes:

A model generation module, used to acquire an audio training set, perform feature conversion on each audio in the audio training set to obtain a target spectrogram set, and, based on an attention mechanism and small sample learning, train a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model;

A state recognition module, used to perform feature conversion on audio of a user to be recognized when the audio is received to obtain a spectrogram to be recognized, and to recognize the spectrogram to be recognized by using the user state recognition model to obtain a user state recognition result.
9. An electronic device, wherein the electronic device includes:

At least one processor; and,

A memory communicatively connected with the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the following steps:

Acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set;

Based on an attention mechanism and small sample learning, training a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model;

When audio of a user to be recognized is received, performing feature conversion on the audio of the user to be recognized to obtain a spectrogram to be recognized;

Recognizing the spectrogram to be recognized by using the user state recognition model to obtain a user state recognition result.
10. The electronic device according to claim 9, wherein said performing feature conversion on each audio in said audio training set to obtain a target spectrogram set comprises:

Resampling each audio in the audio training set to obtain a corresponding digital voice signal;

Performing pre-emphasis on the digital voice signal to obtain a standard digital voice signal;

Aggregating all the standard digital voice signals to obtain a voice signal set;

Performing feature conversion on each standard digital voice signal included in the voice signal set to obtain the target spectrogram set.
11. The electronic device according to claim 10, wherein said performing feature conversion on each standard digital voice signal included in said voice signal set to obtain a target spectrogram set comprises:

Using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set into the frequency domain to obtain a corresponding target spectrogram;

Aggregating all the target spectrograms to obtain the target spectrogram set.
12. The electronic device according to claim 9, wherein said training a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model comprises:

Randomly dividing the target spectrogram set into a training set and a test set;

Training the deep learning network model by using the training set to obtain an initial recognition model;

Testing the initial recognition model according to the test set to obtain a loss value;

When the loss value is greater than a preset threshold, returning to the step of randomly dividing the target spectrogram set into a training set and a test set;

When the loss value is less than or equal to the preset threshold, using the initial recognition model as the user state recognition model.
13. The electronic device according to claim 12, wherein said randomly dividing the target spectrogram set into a training set and a test set comprises:

Classifying each target spectrogram in the target spectrogram set according to its corresponding initial label to obtain corresponding classified target spectrogram sets;

Randomly taking a preset number of target spectrograms from each classified target spectrogram set as a test subset, and using the complement of the test subset in that classified target spectrogram set as a training subset;

Aggregating all the training subsets to obtain the training set;

Aggregating all the test subsets to obtain the test set.
14. The electronic device according to claim 13, wherein said testing said initial recognition model according to said test set to obtain a loss value comprises:

Extracting a feature vector corresponding to each of the initial labels in the initial recognition model to obtain a target feature vector;

Using the initial recognition model to perform feature extraction on each target spectrogram in the test set to obtain a corresponding test feature vector;

Calculating the distance between the test feature vector and the target feature vector corresponding to each of the initial labels to obtain a loss distance value;

Calculating the average value of all the loss distance values to obtain the loss value.
15. The electronic device according to any one of claims 9 to 14, wherein the audio training set is a set of cough audio corresponding to different disease conditions.
16. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the following steps:

Acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set;

Based on an attention mechanism and small sample learning, training a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model;

When audio of a user to be recognized is received, performing feature conversion on the audio of the user to be recognized to obtain a spectrogram to be recognized;

Recognizing the spectrogram to be recognized by using the user state recognition model to obtain a user state recognition result.
17. The computer-readable storage medium according to claim 16, wherein said performing feature conversion on each audio in said audio training set to obtain a target spectrogram set comprises:

Resampling each audio in the audio training set to obtain a corresponding digital voice signal;

Performing pre-emphasis on the digital voice signal to obtain a standard digital voice signal;

Aggregating all the standard digital voice signals to obtain a voice signal set;

Performing feature conversion on each standard digital voice signal included in the voice signal set to obtain the target spectrogram set.
18. The computer-readable storage medium according to claim 17, wherein said performing feature conversion on each standard digital voice signal included in said voice signal set to obtain a target spectrogram set comprises:

Using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set into the frequency domain to obtain a corresponding target spectrogram;

Aggregating all the target spectrograms to obtain the target spectrogram set.
19. The computer-readable storage medium according to claim 16, wherein said training a pre-built deep learning network model by using the target spectrogram set to obtain a user state recognition model comprises:

Randomly dividing the target spectrogram set into a training set and a test set;

Training the deep learning network model by using the training set to obtain an initial recognition model;

Testing the initial recognition model according to the test set to obtain a loss value;

When the loss value is greater than a preset threshold, returning to the step of randomly dividing the target spectrogram set into a training set and a test set;

When the loss value is less than or equal to the preset threshold, using the initial recognition model as the user state recognition model.
20. The computer-readable storage medium according to claim 19, wherein said randomly dividing the target spectrogram set into a training set and a test set comprises:

Classifying each target spectrogram in the target spectrogram set according to its corresponding initial label to obtain corresponding classified target spectrogram sets;

Randomly taking a preset number of target spectrograms from each classified target spectrogram set as a test subset, and using the complement of the test subset in that classified target spectrogram set as a training subset;

Aggregating all the training subsets to obtain the training set;

Aggregating all the test subsets to obtain the test set.