WO2022178934A1 - Health testing method and apparatus, and device and storage medium - Google Patents

Health testing method and apparatus, and device and storage medium

Info

Publication number
WO2022178934A1
WO2022178934A1 (PCT/CN2021/082864)
Authority
WO
WIPO (PCT)
Prior art keywords
health
voice
data
training
tongue
Prior art date
Application number
PCT/CN2021/082864
Other languages
French (fr)
Chinese (zh)
Inventor
顾艳梅
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022178934A1 publication Critical patent/WO2022178934A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/254 - Fusion techniques of classification results, e.g. of results related to same input data
    • G06F 18/256 - Fusion techniques of classification results, e.g. of results related to same input data, of results relating to different input data, e.g. multimodal recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06V 40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use, for comparison or discrimination
    • G10L 25/66 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use, for comparison or discrimination, for extracting parameters related to health condition

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a health detection method, apparatus, electronic device, and computer-readable storage medium.
  • a health detection method provided by this application includes:
  • the voice health detection result and the tongue health detection result are fused to obtain a health detection result, and the health detection result is pushed to the user.
  • the present application also provides a health detection device, the device comprising:
  • the acquisition module is used to acquire the user's voice data and tongue image
  • a detection module configured to perform voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result
  • the detection module is used to perform image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result;
  • a push module is configured to fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and push the health detection result to the user.
  • the present application also provides an electronic device, the electronic device comprising:
  • the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to implement the following steps:
  • the voice health detection result and the tongue health detection result are fused to obtain a health detection result, and the health detection result is pushed to the user.
  • the present application also provides a computer-readable storage medium, where at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is executed by a processor in an electronic device to implement the following steps:
  • the voice health detection result and the tongue health detection result are fused to obtain a health detection result, and the health detection result is pushed to the user.
  • FIG. 1 is a schematic flowchart of a health detection method provided by an embodiment of the present application.
  • FIG. 2 is a detailed flowchart of one step of the health detection method provided in FIG. 1 in the first embodiment of the present application;
  • FIG. 3 is a schematic block diagram of a health detection device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device implementing a health detection method provided by an embodiment of the present application
  • the embodiments of the present application provide a health detection method.
  • the execution body of the health detection method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server and a terminal.
  • the health detection method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the health detection method includes:
  • The embodiment of the present application acquires the user's voice data and tongue image as the basis for subsequent health detection, which can help users better understand their own physical condition.
  • The voice data refers to the sound produced by the user.
  • The tongue image refers to a picture of the user's tongue.
  • The voice data may be acquired through a sound collection device, such as a mobile phone microphone.
  • The tongue image may be acquired through an image acquisition device, such as a mobile phone camera.
  • The voice health recognition model includes a voice classification module and a voice analysis module. The voice classification module extracts the user's voice from the voice data by separating it from the background sound; the voice analysis module performs a voice health analysis on the user's voice output by the voice classification module to obtain the user's voice health detection result.
  • Before it is used, the voice health recognition model needs to be trained to ensure its voice health detection accuracy.
  • the training of the voice health recognition model includes:
  • The human voice marking and health detection in S20 can be implemented by manual annotation, which ensures the accuracy of the generated standard human voice data and standard human voice health data and thus better supervises the learning of the subsequent model.
  • Using the voice classification module of the voice health recognition model to perform human voice segmentation on the training voice data to obtain training human voice data includes: using the sound frequency conversion algorithm in the voice classification module to convert the training voice data into corresponding voice frequencies, calculating the dimension parameters of the voice frequencies, and filtering out the human voice data in the training voice data according to the dimension parameters to obtain the training human voice data.
  • The dimension parameters include intonation, speech rate, and the like. For example, a user's voice is converted into a voice frequency in the range of 70-100 Hz, dimension parameters such as the user's intonation and speech rate are calculated from the voice frequency, and the human voice data in the training voice data is filtered out according to these dimension parameters.
  • the sound frequency conversion algorithm includes:
  • B(f) represents the speech frequency and f represents the expected frequency of the training speech data.
  • the following method is used to calculate the dimension parameter of the voice frequency:
  • d(n) represents the dimension parameter of the speech frequency
  • i represents the frame rate of the speech frequency
  • n represents the amplitude of the speech frequency
  • B(f) represents the speech frequency
  • k represents the linear combination of the current standard speech frame with the preceding and following standard speech frames; it usually takes the value 2, representing a linear combination of the current speech frame with the two preceding and two following speech frames.
  • Step S22 includes: using the voice analysis module to perform feature extraction on the training human voice data to obtain characteristic voice data, and performing health analysis on the characteristic voice data and outputting the result to obtain the training human voice health data.
  • The characteristic voice data refers to the characteristic voiceprint in the training human voice data, which represents the voice information of the training human voice data; the health analysis may be implemented by building convolution kernels from dimension information such as the speech rate, intonation and fundamental frequency of the characteristic voice data.
  • Step S23 includes: calculating a first loss value of the voice health recognition model according to the standard human voice data and the training human voice data; calculating a second loss value of the voice health recognition model according to the standard human voice health data and the training human voice health data; and calculating the loss value of the voice health recognition model according to the first loss value and the second loss value.
  • the following method is used to calculate the first loss value of the voice health recognition model:
  • L(s) represents the first loss value
  • k represents the number of training speech data
  • y_i represents the i-th training human voice data
  • y'_i represents the i-th standard human voice data.
  • the second loss value of the voice health recognition model is calculated by the following method:
  • L1 represents the second loss value
  • α_g represents the standard human voice health data
  • α_p represents the training human voice health data.
  • The preset condition includes that the loss value is less than a loss threshold; that is, when the loss value is less than the loss threshold, the loss value satisfies the preset condition, and when the loss value is greater than or equal to the loss threshold, it does not.
  • the loss threshold may be set to 0.1, or may be set according to actual scenarios.
  • the parameter adjustment of the voice health recognition model may be implemented by a currently known stochastic gradient descent algorithm, which will not be described further herein.
  • The tongue health recognition model includes an image classification module and an image analysis module. The image classification module performs background segmentation on the tongue image to output the tongue region; the image analysis module performs a tongue health analysis on the tongue region output by the image classification module and outputs the tongue health state.
  • Before it is used, the tongue health recognition model needs to be trained to ensure its tongue health detection accuracy.
  • The training of the tongue health recognition model includes: acquiring a training tongue image; using the image classification module in the tongue health recognition model to perform feature extraction on the training tongue image to obtain a characteristic tongue image; using the image analysis module in the tongue health recognition model to detect the health state of the characteristic tongue image to obtain a predicted health state; calculating the training loss between the predicted health state and the standard health state corresponding to the training tongue image; and adjusting the parameters of the tongue health recognition model according to the training loss until the training loss is less than a preset training loss, so as to obtain the trained tongue health recognition model.
  • the preset training loss is 0.1.
  • Performing feature extraction on the training tongue image by using the image classification module in the tongue health recognition model to obtain a characteristic tongue image includes: using the convolution layer in the image classification module to perform a convolution operation on the training tongue image to obtain an initial characteristic tongue image; using the pooling layer in the image classification module to perform a dimensionality reduction operation on the initial characteristic tongue image to obtain a dimension-reduced characteristic tongue image; and outputting the dimension-reduced characteristic tongue image by using the activation function in the image classification module to obtain the characteristic tongue image.
  • the activation function in the image classification module includes a relu activation function.
  • Detecting the health state of the characteristic tongue image by using the image analysis module in the tongue health recognition model to obtain the predicted health state includes: using the sampling layer in the image analysis module to up-sample the characteristic tongue image to obtain a sampled tongue image, and using the fully connected layer in the image analysis module to perform health detection on the sampled tongue image and output the result to obtain the predicted health state.
  • the training loss may be calculated by a currently known sigmoid function.
  • Before performing image health detection on the tongue image by using the trained tongue health recognition model, the method further includes: performing a preprocessing operation on the tongue image to improve its quality and ensure the accuracy of the tongue image analysis.
  • The preprocessing operation includes: performing a grayscale conversion operation on the tongue image by a proportional (weighted) method to obtain a grayscale tongue image; using Gaussian filtering to reduce noise in the grayscale tongue image; performing contrast enhancement on the noise-reduced grayscale tongue image; and performing a thresholding operation on the contrast-enhanced grayscale tongue image according to the OTSU algorithm.
  • f(x, a) represents the health detection result
  • k represents the number of fused health detection results
  • x represents the common vector of the voice health detection result and the tongue health detection result
  • a represents the preset weight parameter, with a ∈ (0, 1).
  • the division of the weight parameters can be implemented according to actual business scenarios, for example, the weight of the tongue health detection result is 60%, and the weight of the voice health detection result is 40%.
  • The health detection result can be pushed to the user through a mobile terminal, such as a mobile phone, so that the user can intuitively understand his or her physical state in real time.
  • the health detection results can also be stored in a blockchain node.
  • The embodiment of the present application first acquires the user's voice data and tongue image as the basis for subsequent health detection. Secondly, it uses the trained voice health recognition model to perform voice health detection on the voice data to obtain a voice health detection result, and uses the trained tongue health recognition model to perform image health detection on the tongue image to obtain a tongue health detection result, so that health detection on an online user's voice data and tongue data can be realized and the user's health detection becomes more comprehensive. Further, the embodiment fuses the voice health detection result and the tongue health detection result to obtain a health detection result and pushes it to the user, helping the user understand his or her physical state intuitively and in real time. Therefore, the present application can improve the convenience of user health detection.
  • FIG. 3 it is a functional block diagram of the health detection device of the present application.
  • the health detection apparatus 100 described in this application may be installed in an electronic device. According to the implemented functions, the health detection apparatus may include an acquisition module 101 , a detection module 102 and a push module 103 .
  • the modules described in the present invention can also be called units, which refer to a series of computer program segments that can be executed by the electronic device processor and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the acquisition module 101 is used to acquire the user's voice data and tongue image
  • the detection module 102 is configured to perform voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result;
  • the detection module 102 is configured to perform image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result;
  • the pushing module 103 is configured to fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and push the health detection result to the user.
  • The modules in the health detection device 100 of the embodiments of the present application use the same technical means as the health detection method described above with reference to FIG. 1 and FIG. 2, and can produce the same technical effects, which will not be repeated here.
  • FIG. 4 it is a schematic structural diagram of an electronic device implementing the health detection method of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a health detection program 12.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 .
  • The memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can not only be used to store application software installed in the electronic device 1 and various types of data, such as codes of the health detection program 12, etc., but also can be used to temporarily store data that has been output or will be output.
  • The processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • The processor 10 is the control unit of the electronic device; it connects the components of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the health detection program) and calling the data stored in the memory 11.
  • The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 only shows an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • The electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • The electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • The health detection program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple program instructions which, when run in the processor 10, can realize:
  • the voice health detection result and the tongue health detection result are fused to obtain a health detection result, and the health detection result is pushed to the user.
  • the modules/units integrated in the electronic device 1 may be stored in a non-volatile computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).
  • the present application also provides a computer-readable storage medium, where the readable storage medium stores a computer program, and when executed by a processor of an electronic device, the computer program can realize:
  • the voice health detection result and the tongue health detection result are fused to obtain a health detection result, and the health detection result is pushed to the user.
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database; it is a chain of data blocks associated with each other by cryptographic methods, and each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A health testing method, relating to the field of artificial intelligence. The method comprises: acquiring voice data and a tongue appearance image of a user (S1); performing voice health testing on the voice data using a completely trained voice health recognition model, so as to obtain a voice health testing result (S2); performing image health testing on the tongue appearance image using a completely trained tongue appearance health recognition model, so as to obtain a tongue appearance health testing result (S3); and combining the voice health testing result with the tongue appearance health testing result to obtain a health testing result, and pushing the health testing result to the user (S4). By means of the method, health testing accuracy can be improved.

Description

Health detection method, apparatus, device and storage medium
This application claims priority to the Chinese patent application No. CN202110214173.3, titled "Health detection method, apparatus, device and storage medium", filed with the China Patent Office on February 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a health detection method, apparatus, electronic device, and computer-readable storage medium.
Background
With the continuous improvement of people's living standards, people pay more and more attention to their physical health and expect medical technology to better serve ordinary people; however, China's current medical system is still at the stage where "seeing a doctor is difficult and expensive". The inventor realized that current user health detection mainly relies on face-recognition-based identification to pull the corresponding medical data of the user. In real scenarios, however, the available medical data are limited, the user's overall medical situation cannot be fully understood, and the accuracy of the health detection is therefore affected.
Summary of the Invention
To achieve the above purpose, a health detection method provided by this application includes:
acquiring the user's voice data and tongue image;
performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result; and
fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
The present application also provides a health detection apparatus, which includes:
an acquisition module, configured to acquire the user's voice data and tongue image;
a detection module, configured to perform voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result;
the detection module being further configured to perform image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result; and
a push module, configured to fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and to push the health detection result to the user.
The present application also provides an electronic device, which includes:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to implement the following steps:
acquiring the user's voice data and tongue image;
performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result; and
fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
The present application also provides a computer-readable storage medium in which at least one computer program is stored, the at least one computer program being executed by a processor in an electronic device to implement the following steps:
acquiring the user's voice data and tongue image;
performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result; and
fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below; other features and advantages of the application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a health detection method provided by an embodiment of the present application;
FIG. 2 is a detailed flowchart of one step of the health detection method of FIG. 1 according to the first embodiment of the present application;
FIG. 3 is a schematic block diagram of a health detection apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the internal structure of an electronic device implementing the health detection method provided by an embodiment of the present application.
The realization of the purpose, functional characteristics and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The embodiments of the present application provide a health detection method. The execution body of the health detection method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the health detection method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to FIG. 1, a schematic flowchart of a health detection method according to an embodiment of the present application is shown. In this embodiment, the health detection method includes:
S1. Acquire the user's voice data and tongue image.
It should be understood that, with the continuous improvement of living standards, people pay more and more attention to their physical health. The embodiment of the present application therefore acquires the user's voice data and tongue image as the basis for the subsequent health detection, which helps users better understand their own physical condition. The voice data refers to the sound produced by the user, and the tongue image refers to a picture of the user's tongue.
In an optional embodiment, the voice data may be acquired through a sound collection device, such as a mobile phone microphone.
In an optional embodiment, the tongue image may be acquired through an image acquisition device, such as a mobile phone camera.
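The filing does not prescribe any particular capture API. Purely as an illustration, the sketch below records a short clip and grabs a single camera frame using the sounddevice and OpenCV packages; the sample rate, recording length and camera index are arbitrary assumptions, not details from the patent.
```python
# Illustrative acquisition sketch; capture parameters are assumed, not specified by the filing.
import sounddevice as sd
import cv2

SAMPLE_RATE = 16000      # assumed sampling rate for the voice recording
RECORD_SECONDS = 5       # assumed recording length

def acquire_voice_data():
    """Record a short mono clip from the default microphone."""
    audio = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1)
    sd.wait()                # block until the recording is finished
    return audio.squeeze()   # 1-D numpy array of samples

def acquire_tongue_image(camera_index=0):
    """Grab a single frame from the default camera as the tongue image."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("failed to capture tongue image")
    return frame             # BGR image as a numpy array
```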
S2. Perform voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result.
In this embodiment, the voice health recognition model includes a voice classification module and a voice analysis module. The voice classification module extracts the user's voice from the voice data by separating it from the background sound; the voice analysis module performs a voice health analysis on the user's voice output by the voice classification module to obtain the user's voice health detection result.
Further, before the trained voice health recognition model is used to perform voice health detection on the voice data, the voice health recognition model needs to be trained to ensure its voice health detection accuracy.
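The filing describes the model only at the level of these two modules. The following PyTorch sketch shows one possible arrangement of a voice classification module and a voice analysis module in that spirit; the class name, layer sizes and the soft-masking step are illustrative assumptions, not the patented design.
```python
# Minimal two-module sketch, assuming a PyTorch implementation.
import torch
import torch.nn as nn

class VoiceHealthModel(nn.Module):
    def __init__(self, n_features=40, n_health_classes=2):
        super().__init__()
        # Voice classification module: scores each frame as user voice vs. background.
        self.voice_classifier = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
        # Voice analysis module: maps the extracted user-voice frames to a health result.
        self.voice_analyzer = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_health_classes))

    def forward(self, frames):                              # frames: (num_frames, n_features)
        voice_prob = self.voice_classifier(frames)          # (num_frames, 1)
        user_voice = frames * voice_prob                    # soft suppression of background frames
        return self.voice_analyzer(user_voice.mean(dim=0))  # health logits
```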
In detail, referring to FIG. 2, the training of the voice health recognition model includes:
S20. Acquire training voice data, mark the human voice data in the training voice data to obtain standard human voice data, and perform health detection on the standard human voice data to obtain standard human voice health data.
S21. Use the voice classification module of the voice health recognition model to perform human voice segmentation on the training voice data to obtain training human voice data.
S22. Use the voice analysis module of the voice health recognition model to perform health detection on the training human voice data to obtain training human voice health data.
S23. Calculate the loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data, and the training human voice health data.
If the loss value does not satisfy the preset condition, execute S24: adjust the parameters of the voice health recognition model and return to the step of using the voice classification module of the voice health recognition model to perform human voice segmentation on the training voice data to obtain training human voice data.
If the loss value satisfies the preset condition, execute S25: obtain the trained voice health recognition model.
In an optional embodiment of the present application, the human voice marking and health detection in S20 can be implemented by manual annotation, which ensures the accuracy of the generated standard human voice data and standard human voice health data and thus better supervises the learning of the subsequent model.
In an optional embodiment of the present application, using the voice classification module of the voice health recognition model to perform human voice segmentation on the training voice data to obtain training human voice data includes: using the sound frequency conversion algorithm in the voice classification module to convert the training voice data into corresponding voice frequencies, calculating the dimension parameters of the voice frequencies, and filtering out the human voice data in the training voice data according to the dimension parameters to obtain the training human voice data. The dimension parameters include intonation, speech rate, and the like. For example, a user's voice is converted into a voice frequency in the range of 70-100 Hz, dimension parameters such as the user's intonation and speech rate are calculated from the voice frequency, and the human voice data in the training voice data is filtered out according to these dimension parameters.
In an optional embodiment, the sound frequency conversion algorithm is given by a formula (reproduced only as an image in the original publication), where B(f) represents the voice frequency and f represents the expected frequency of the training voice data.
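Because the exact B(f) mapping is only available as an image, it cannot be reproduced here. A common perceptual frequency mapping used in speech processing, the Mel scale, is shown below purely as an illustrative stand-in for a function of the expected frequency f; it is not taken from the filing.
```python
# Illustrative stand-in for B(f): the Mel-scale mapping (an assumption, not the patented formula).
import numpy as np

def voice_frequency(f_hz):
    """Map an expected frequency f (Hz) to a perceptual voice-frequency value B(f)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

# Example: map the 70-100 Hz range mentioned above.
print(voice_frequency([70.0, 100.0]))
```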
In an optional embodiment, the dimension parameter of the voice frequency is calculated by a formula (also reproduced only as an image in the original publication), where d(n) represents the dimension parameter of the voice frequency, i represents the frame rate of the voice frequency, n represents the amplitude of the voice frequency, B(f) represents the voice frequency, and k represents the linear combination of the current standard speech frame with the preceding and following standard speech frames; k usually takes the value 2, representing a linear combination of the current speech frame with the two preceding and two following speech frames.
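The dimension-parameter formula is likewise only available as an image. Its description (a linear combination of the current frame with the k = 2 preceding and following frames) matches the standard delta-feature computation from speech processing, which is used below as an assumed reconstruction; the filtering threshold is hypothetical.
```python
# Assumed delta-style reconstruction of d(n); the 0.5 threshold is hypothetical.
import numpy as np

def dimension_parameters(voice_freq, k=2):
    """Delta-style dimension parameter d(n) for each frame of B(f)."""
    b = np.asarray(voice_freq, dtype=float)
    padded = np.pad(b, (k, k), mode="edge")          # repeat edge frames at the boundaries
    denom = 2.0 * sum(i * i for i in range(1, k + 1))
    d = np.zeros_like(b)
    for n in range(len(b)):
        d[n] = sum(i * (padded[n + k + i] - padded[n + k - i])
                   for i in range(1, k + 1)) / denom
    return d

def filter_human_voice(frames, d, threshold=0.5):
    """Keep frames whose dimension parameter exceeds a (hypothetical) threshold."""
    mask = np.abs(d) > threshold
    return frames[mask]
```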
In an optional embodiment of the present application, S22 includes: using the voice analysis module to perform feature extraction on the training human voice data to obtain characteristic voice data, and performing health analysis on the characteristic voice data and outputting the result to obtain the training human voice health data. The characteristic voice data refers to the characteristic voiceprint in the training human voice data, which represents the voice information of the training human voice data; the health analysis may be implemented by building convolution kernels from dimension information such as the speech rate, intonation and fundamental frequency of the characteristic voice data.
In an optional embodiment of the present application, S23 includes: calculating a first loss value of the voice health recognition model according to the standard human voice data and the training human voice data; calculating a second loss value of the voice health recognition model according to the standard human voice health data and the training human voice health data; and calculating the loss value of the voice health recognition model according to the first loss value and the second loss value.
In an optional embodiment, the first loss value of the voice health recognition model is calculated by a formula (reproduced only as an image in the original publication), where L(s) represents the first loss value, k represents the number of training voice data samples, y_i represents the i-th training human voice data, and y'_i represents the i-th standard human voice data.
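The first-loss formula is also given only as an image. Since it compares the i-th training human voice data y_i with the i-th standard human voice data y'_i over k samples, a mean-squared-error form is assumed below for illustration; the patent may define a different distance.
```python
# Assumed MSE form of the first loss L(s); not necessarily the patented definition.
import torch

def first_loss(training_vocal, standard_vocal):
    """L(s): assumed mean squared error between training and standard vocal data."""
    return torch.mean((training_vocal - standard_vocal) ** 2)
```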
In an optional embodiment, the second loss value of the voice health recognition model is calculated as L1 = |α_p - α_g|, where L1 represents the second loss value, α_g represents the standard human voice health data, and α_p represents the training human voice health data.
In an optional embodiment, calculating the loss value of the voice health recognition model according to the first loss value and the second loss value includes: adding the first loss value and the second loss value to obtain the loss value of the voice health recognition model, i.e. L = L(s) + L1.
In an optional embodiment of the present application, the preset condition includes that the loss value is less than a loss threshold; that is, when the loss value is less than the loss threshold, the loss value satisfies the preset condition, and when the loss value is greater than or equal to the loss threshold, it does not. The loss threshold may be set to 0.1, or set according to the actual scenario. Further, the parameter adjustment of the voice health recognition model may be implemented by the currently known stochastic gradient descent algorithm, which is not described further here.
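The sketch below ties the training condition together as described above: the two loss terms are combined, compared against the 0.1 threshold, and the parameters are adjusted with stochastic gradient descent. The model interface, data layout and learning rate are placeholders rather than details from the filing.
```python
# Hedged training-loop sketch; the model is assumed to return (vocal data, health prediction).
import torch

LOSS_THRESHOLD = 0.1

def train_voice_health_model(model, batches, lr=0.01, max_epochs=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for standard_vocal, standard_health, training_speech in batches:
            training_vocal, training_health = model(training_speech)
            loss_1 = torch.mean((training_vocal - standard_vocal) ** 2)   # first loss L(s), assumed MSE
            loss_2 = torch.abs(training_health - standard_health).mean()  # second loss L1 = |a_p - a_g|
            loss = loss_1 + loss_2                                        # L = L(s) + L1
            if loss.item() < LOSS_THRESHOLD:      # preset condition satisfied
                return model                      # training complete
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                      # stochastic gradient descent update
    return model
```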
S3. Perform image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result.
In this embodiment, the tongue health recognition model includes an image classification module and an image analysis module. The image classification module performs background segmentation on the tongue image to output the tongue region; the image analysis module performs a tongue health analysis on the tongue region output by the image classification module and outputs the tongue health state.
In this embodiment, before the trained tongue health recognition model is used to perform tongue health detection on the tongue image, the tongue health recognition model needs to be trained to ensure its tongue health detection accuracy.
In detail, the training of the tongue health recognition model includes: acquiring a training tongue image; using the image classification module in the tongue health recognition model to perform feature extraction on the training tongue image to obtain a characteristic tongue image; using the image analysis module in the tongue health recognition model to detect the health state of the characteristic tongue image to obtain a predicted health state; calculating the training loss between the predicted health state and the standard health state corresponding to the training tongue image; and adjusting the parameters of the tongue health recognition model according to the training loss until the training loss is less than a preset training loss, so as to obtain the trained tongue health recognition model. Optionally, the preset training loss is 0.1.
In an optional embodiment of the present application, using the image classification module in the tongue health recognition model to perform feature extraction on the training tongue image to obtain a characteristic tongue image includes: using the convolution layer in the image classification module to perform a convolution operation on the training tongue image to obtain an initial characteristic tongue image; using the pooling layer in the image classification module to perform a dimensionality reduction operation on the initial characteristic tongue image to obtain a dimension-reduced characteristic tongue image; and outputting the dimension-reduced characteristic tongue image by using the activation function in the image classification module to obtain the characteristic tongue image. The activation function in the image classification module includes a ReLU activation function.
In an optional embodiment of the present application, using the image analysis module in the tongue health recognition model to detect the health state of the characteristic tongue image to obtain the predicted health state includes: using the sampling layer in the image analysis module to up-sample the characteristic tongue image to obtain a sampled tongue image, and using the fully connected layer in the image analysis module to perform health detection on the sampled tongue image and output the result to obtain the predicted health state.
In an optional embodiment of the present application, the training loss may be calculated by the currently known sigmoid function.
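A minimal PyTorch sketch of the structure just described: a classification module built from a convolution layer, a pooling layer and a ReLU activation, followed by an analysis module built from an upsampling layer and a fully connected layer with a sigmoid output. Channel counts, kernel size and image size are illustrative assumptions, not values from the filing.
```python
# Illustrative tongue-model sketch under assumed layer sizes.
import torch
import torch.nn as nn

class TongueHealthModel(nn.Module):
    def __init__(self, image_size=64):
        super().__init__()
        # Image classification module: feature extraction from the tongue image.
        self.classifier = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution layer
            nn.MaxPool2d(2),                             # pooling layer (dimensionality reduction)
            nn.ReLU(),                                   # ReLU activation output
        )
        # Image analysis module: upsampling followed by a fully connected layer.
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.fc = nn.Linear(16 * image_size * image_size, 1)

    def forward(self, x):                    # x: (batch, 3, image_size, image_size)
        features = self.classifier(x)        # (batch, 16, image_size/2, image_size/2)
        sampled = self.upsample(features)    # back to (batch, 16, image_size, image_size)
        logits = self.fc(sampled.flatten(1))
        return torch.sigmoid(logits)         # predicted health state in (0, 1)
```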
Further, in this embodiment, before the trained tongue health recognition model is used to perform image health detection on the tongue image, the method further includes: performing a preprocessing operation on the tongue image to improve its quality and ensure the accuracy of the tongue image analysis. The preprocessing operation includes: performing a grayscale conversion operation on the tongue image by a proportional (weighted) method to obtain a grayscale tongue image; using Gaussian filtering to reduce noise in the grayscale tongue image; performing contrast enhancement on the noise-reduced grayscale tongue image; and performing a thresholding operation on the contrast-enhanced grayscale tongue image according to the OTSU algorithm.
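These preprocessing steps map directly onto standard OpenCV calls; the sketch below is one way to realise them. The Gaussian kernel size and the use of histogram equalisation for the contrast-enhancement step are assumptions, since the filing does not fix those details.
```python
# Preprocessing pipeline sketch with OpenCV; kernel size and equalisation choice are assumed.
import cv2

def preprocess_tongue_image(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)        # grayscale conversion
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)              # Gaussian noise reduction
    enhanced = cv2.equalizeHist(denoised)                     # contrast enhancement (assumed method)
    _, thresholded = cv2.threshold(                           # OTSU thresholding
        enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return thresholded
```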
S6. Fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and push the health detection result to the user.
In this embodiment, the voice health detection result and the tongue health detection result are fused by a formula (reproduced only as an image in the original publication), where f(x, a) represents the health detection result, k represents the number of fused health detection results, x represents the common vector of the voice health detection result and the tongue health detection result, two further symbols (also shown only as images) denote the voice health detection result and the tongue health detection result respectively, and a represents the preset weight parameter, with a ∈ (0, 1).
Further, it should be noted that the division of the weight parameters can be set according to the actual business scenario; for example, the weight of the tongue health detection result may be 60% and the weight of the voice health detection result 40%.
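The fusion formula itself is only given as an image, so only its weighted character can be inferred from the variable definitions and the example weights above. The sketch below is an assumed weighted combination using the 40%/60% split mentioned as an example; it is not the patented formula.
```python
# Assumed weighted-fusion sketch; the exact patented formula is not reproduced here.
import numpy as np

def fuse_health_results(voice_result, tongue_result, a=0.4):
    """Weighted fusion of two health detection vectors; a is the voice weight, a in (0, 1)."""
    voice_result = np.asarray(voice_result, dtype=float)
    tongue_result = np.asarray(tongue_result, dtype=float)
    return a * voice_result + (1.0 - a) * tongue_result

# Example: fuse scalar health scores with the 40%/60% split mentioned above.
print(fuse_health_results(0.8, 0.6))   # 0.4*0.8 + 0.6*0.6 = 0.68
```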
Further, in this embodiment, the health detection result can be pushed to the user through a mobile terminal, such as a mobile phone, so that the user can intuitively understand his or her physical state in real time.
Further, to ensure the privacy and security of the health detection result, the health detection result may also be stored in a blockchain node.
The embodiment of the present application first acquires the user's voice data and tongue image as the basis for subsequent health detection. Secondly, it uses the trained voice health recognition model to perform voice health detection on the voice data to obtain a voice health detection result, and uses the trained tongue health recognition model to perform image health detection on the tongue image to obtain a tongue health detection result, so that health detection on an online user's voice data and tongue data can be realized and the user's health detection becomes more comprehensive. Further, the embodiment fuses the voice health detection result and the tongue health detection result to obtain a health detection result and pushes it to the user, helping the user understand his or her physical state intuitively and in real time. Therefore, the present application can improve the convenience of user health detection.
FIG. 3 is a functional block diagram of the health detection apparatus of the present application.

The health detection apparatus 100 described in this application may be installed in an electronic device. According to the implemented functions, the health detection apparatus may include an acquisition module 101, a detection module 102 and a push module 103. The modules described in this application may also be called units, and refer to a series of computer program segments that can be executed by the processor of the electronic device and can perform fixed functions, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:

The acquisition module 101 is configured to acquire the user's voice data and tongue image.

The detection module 102 is configured to perform voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result.

The detection module 102 is further configured to perform image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result.

The push module 103 is configured to fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and push the health detection result to the user.
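As a structural sketch of these modules, the Python class below strings the three stages together. The class and method names, and the assumption that each trained model exposes a predict method returning per-class scores, are illustrative only; the patent does not prescribe a concrete API.

```python
class HealthDetectionApparatus:
    """Acquisition -> detection (voice + tongue) -> fusion and push."""

    def __init__(self, voice_model, tongue_model, push_fn, weight=0.6):
        self.voice_model = voice_model    # trained voice health recognition model (assumed API)
        self.tongue_model = tongue_model  # trained tongue health recognition model (assumed API)
        self.push_fn = push_fn            # e.g. a callable that sends the result to the user's phone
        self.weight = weight              # preset weight parameter in (0, 1)

    def run(self, voice_data, tongue_image):
        # Detection module: health detection on both modalities.
        voice_result = self.voice_model.predict(voice_data)
        tongue_result = self.tongue_model.predict(tongue_image)

        # Push module: fuse the two results and push the fused result to the user.
        fused = [self.weight * t + (1 - self.weight) * v
                 for v, t in zip(voice_result, tongue_result)]
        self.push_fn(fused)
        return fused
```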
In detail, when in use, the modules in the health detection apparatus 100 in the embodiment of the present application adopt the same technical means as the health detection method described above with reference to FIG. 1 and FIG. 2, and can produce the same technical effects, which will not be repeated here.
FIG. 4 is a schematic structural diagram of an electronic device implementing the health detection method of the present application.

The electronic device 1 may include a processor 10, a memory 11 and a bus, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a health detection program 12.

The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, an SD or DX memory), a magnetic memory, a magnetic disk, an optical disc and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 1. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the health detection program 12, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same function or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors and various control chips. The processor 10 is the control core (control unit) of the electronic device; it connects the components of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the health detection program) and calling the data stored in the memory 11.

The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. The bus is configured to implement connection and communication between the memory 11, the at least one processor 10 and the like.
FIG. 4 only shows an electronic device with some of its components. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, which may include fewer or more components than those shown, a combination of certain components, or a different arrangement of components.

For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management and power consumption management. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module and the like, which will not be repeated here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device or the like. The display may also be appropriately called a display screen or a display unit, and is used for displaying the information processed in the electronic device 1 and for displaying a visualized user interface.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The health detection program 12 stored in the memory 11 of the electronic device 1 is a combination of a plurality of programs which, when run by the processor 10, can implement:

acquiring the user's voice data and tongue image;

performing voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result;

performing image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result;

fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.

Specifically, for the specific implementation of the above program by the processor 10, reference may be made to the description of the relevant steps in the embodiments corresponding to FIG. 1 and FIG. 2, and details are not described herein.
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).
The present application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor of an electronic device, can implement:

acquiring the user's voice data and tongue image;

performing voice health detection on the voice data by using the trained voice health recognition model to obtain a voice health detection result;

performing image health detection on the tongue image by using the trained tongue health recognition model to obtain a tongue health detection result;

fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; the division of the modules is only a logical function division, and there may be other division manners in actual implementation.

The modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.
Therefore, the embodiments should be regarded in all respects as illustrative and not restrictive, and the scope of the application is defined by the appended claims rather than by the foregoing description; all changes falling within the meaning and scope of the equivalents of the claims are therefore intended to be embraced in this application. Any reference signs in the claims shall not be construed as limiting the claims involved.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another by cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and the like.

Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Several units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims (20)

  1. A health detection method, wherein the method comprises:
    acquiring a user's voice data and tongue image;
    performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
    performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result;
    fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
  2. The health detection method according to claim 1, wherein before the voice health detection is performed on the voice data by using the trained voice health recognition model, the method further comprises:
    obtaining training voice data, marking the human voice data in the training voice data to obtain standard human voice data, and performing health detection on the standard human voice data to obtain standard human voice health data;
    performing human voice segmentation on the training voice data by using a voice classification module of the voice health recognition model to obtain training human voice data;
    performing health detection on the training human voice data by using a voice analysis module in the voice health recognition model to obtain training human voice health data;
    calculating a loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data;
    if the loss value does not satisfy a preset condition, adjusting the parameters of the voice health recognition model, and returning to the step of performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data;
    if the loss value satisfies the preset condition, obtaining a trained voice health recognition model.
  3. The health detection method according to claim 2, wherein the performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data comprises:
    converting the training voice data into corresponding voice frequencies by using a sound frequency conversion algorithm in the voice classification module, and calculating dimension parameters of the voice frequencies;
    screening out the human voice data in the training voice data according to the dimension parameters to obtain the training human voice data.
  4. The health detection method according to claim 2, wherein the calculating the loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data comprises:
    calculating a first loss value of the voice health recognition model according to the standard human voice data and the training human voice data;
    calculating a second loss value of the voice health recognition model according to the standard human voice health data and the training human voice health data;
    calculating the loss value of the voice health recognition model according to the first loss value and the second loss value.
  5. The health detection method according to claim 1, wherein before the image health detection is performed on the tongue image by using the trained tongue health recognition model, the method further comprises:
    acquiring training tongue images;
    performing feature extraction on the training tongue images by using an image classification module in the tongue health recognition model to obtain characteristic tongue images;
    detecting the health state of the characteristic tongue images by using an image analysis module in the tongue health recognition model to obtain a predicted health state;
    calculating a training loss between the predicted health state and a standard health state corresponding to the training tongue images;
    adjusting the parameters of the tongue health recognition model according to the training loss until the training loss is less than a preset training loss, so as to obtain the trained tongue health recognition model.
  6. The health detection method according to claim 1, wherein before the image health detection is performed on the tongue image by using the trained tongue health recognition model, the method further comprises:
    performing a grayscale conversion operation on the tongue image to obtain a grayscale tongue image;
    performing noise reduction on the grayscale tongue image;
    performing contrast enhancement on the noise-reduced grayscale tongue image;
    performing a thresholding operation on the contrast-enhanced grayscale tongue image.
  7. The health detection method according to any one of claims 1 to 6, wherein the fusing the voice health detection result and the tongue health detection result comprises:
    fusing the voice health detection result and the tongue health detection result by using the following formula (reproduced as image PCTCN2021082864-appb-100001 in the original publication):
    [Formula image: PCTCN2021082864-appb-100001]
    wherein f(x, a) denotes the health detection result, k denotes the number of the fused health detection results, x denotes the same vector of the voice health detection result and the tongue health detection result, the symbol shown in image PCTCN2021082864-appb-100002 denotes the voice health detection result, the symbol shown in image PCTCN2021082864-appb-100003 denotes the tongue health detection result, and a denotes a preset weight parameter.
  8. A health detection apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire a user's voice data and tongue image;
    a detection module, configured to perform voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
    the detection module being further configured to perform image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result;
    a push module, configured to fuse the voice health detection result and the tongue health detection result to obtain a health detection result, and push the health detection result to the user.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the following steps:
    acquiring a user's voice data and tongue image;
    performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
    performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result;
    fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
  10. The electronic device according to claim 9, wherein before the voice health detection is performed on the voice data by using the trained voice health recognition model, the computer program, when executed by the at least one processor, further implements the following steps:
    obtaining training voice data, marking the human voice data in the training voice data to obtain standard human voice data, and performing health detection on the standard human voice data to obtain standard human voice health data;
    performing human voice segmentation on the training voice data by using a voice classification module of the voice health recognition model to obtain training human voice data;
    performing health detection on the training human voice data by using a voice analysis module in the voice health recognition model to obtain training human voice health data;
    calculating a loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data;
    if the loss value does not satisfy a preset condition, adjusting the parameters of the voice health recognition model, and returning to the step of performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data;
    if the loss value satisfies the preset condition, obtaining a trained voice health recognition model.
  11. The electronic device according to claim 10, wherein the performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data comprises:
    converting the training voice data into corresponding voice frequencies by using a sound frequency conversion algorithm in the voice classification module, and calculating dimension parameters of the voice frequencies;
    screening out the human voice data in the training voice data according to the dimension parameters to obtain the training human voice data.
  12. The electronic device according to claim 10, wherein the calculating the loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data comprises:
    calculating a first loss value of the voice health recognition model according to the standard human voice data and the training human voice data;
    calculating a second loss value of the voice health recognition model according to the standard human voice health data and the training human voice health data;
    calculating the loss value of the voice health recognition model according to the first loss value and the second loss value.
  13. The electronic device according to claim 9, wherein before the image health detection is performed on the tongue image by using the trained tongue health recognition model, the computer program, when executed by the at least one processor, further implements the following steps:
    acquiring training tongue images;
    performing feature extraction on the training tongue images by using an image classification module in the tongue health recognition model to obtain characteristic tongue images;
    detecting the health state of the characteristic tongue images by using an image analysis module in the tongue health recognition model to obtain a predicted health state;
    calculating a training loss between the predicted health state and a standard health state corresponding to the training tongue images;
    adjusting the parameters of the tongue health recognition model according to the training loss until the training loss is less than a preset training loss, so as to obtain the trained tongue health recognition model.
  14. The electronic device according to claim 9, wherein before the image health detection is performed on the tongue image by using the trained tongue health recognition model, the computer program, when executed by the at least one processor, further implements the following steps:
    performing a grayscale conversion operation on the tongue image to obtain a grayscale tongue image;
    performing noise reduction on the grayscale tongue image;
    performing contrast enhancement on the noise-reduced grayscale tongue image;
    performing a thresholding operation on the contrast-enhanced grayscale tongue image.
  15. The electronic device according to any one of claims 9 to 14, wherein the fusing the voice health detection result and the tongue health detection result comprises:
    fusing the voice health detection result and the tongue health detection result by using the following formula (reproduced as image PCTCN2021082864-appb-100004 in the original publication):
    [Formula image: PCTCN2021082864-appb-100004]
    wherein f(x, a) denotes the health detection result, k denotes the number of the fused health detection results, x denotes the same vector of the voice health detection result and the tongue health detection result, the symbol shown in image PCTCN2021082864-appb-100005 denotes the voice health detection result, the symbol shown in image PCTCN2021082864-appb-100006 denotes the tongue health detection result, and a denotes a preset weight parameter.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring a user's voice data and tongue image;
    performing voice health detection on the voice data by using a trained voice health recognition model to obtain a voice health detection result;
    performing image health detection on the tongue image by using a trained tongue health recognition model to obtain a tongue health detection result;
    fusing the voice health detection result and the tongue health detection result to obtain a health detection result, and pushing the health detection result to the user.
  17. The computer-readable storage medium according to claim 16, wherein before the voice health detection is performed on the voice data by using the trained voice health recognition model, the computer program, when executed by the processor, further implements the following steps:
    obtaining training voice data, marking the human voice data in the training voice data to obtain standard human voice data, and performing health detection on the standard human voice data to obtain standard human voice health data;
    performing human voice segmentation on the training voice data by using a voice classification module of the voice health recognition model to obtain training human voice data;
    performing health detection on the training human voice data by using a voice analysis module in the voice health recognition model to obtain training human voice health data;
    calculating a loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data;
    if the loss value does not satisfy a preset condition, adjusting the parameters of the voice health recognition model, and returning to the step of performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data;
    if the loss value satisfies the preset condition, obtaining a trained voice health recognition model.
  18. The computer-readable storage medium according to claim 17, wherein the performing human voice segmentation on the training voice data by using the voice classification module of the voice health recognition model to obtain training human voice data comprises:
    converting the training voice data into corresponding voice frequencies by using a sound frequency conversion algorithm in the voice classification module, and calculating dimension parameters of the voice frequencies;
    screening out the human voice data in the training voice data according to the dimension parameters to obtain the training human voice data.
  19. The computer-readable storage medium according to claim 17, wherein the calculating the loss value of the voice health recognition model according to the standard human voice data, the standard human voice health data, the training human voice data and the training human voice health data comprises:
    calculating a first loss value of the voice health recognition model according to the standard human voice data and the training human voice data;
    calculating a second loss value of the voice health recognition model according to the standard human voice health data and the training human voice health data;
    calculating the loss value of the voice health recognition model according to the first loss value and the second loss value.
  20. The computer-readable storage medium according to claim 16, wherein before the image health detection is performed on the tongue image by using the trained tongue health recognition model, the computer program, when executed by the processor, further implements the following steps:
    acquiring training tongue images;
    performing feature extraction on the training tongue images by using an image classification module in the tongue health recognition model to obtain characteristic tongue images;
    detecting the health state of the characteristic tongue images by using an image analysis module in the tongue health recognition model to obtain a predicted health state;
    calculating a training loss between the predicted health state and a standard health state corresponding to the training tongue images;
    adjusting the parameters of the tongue health recognition model according to the training loss until the training loss is less than a preset training loss, so as to obtain the trained tongue health recognition model.
PCT/CN2021/082864 2021-02-26 2021-03-25 Health testing method and apparatus, and device and storage medium WO2022178934A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110214173.3A CN112967806A (en) 2021-02-26 2021-02-26 Health detection method, device, equipment and storage medium
CN202110214173.3 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022178934A1 true WO2022178934A1 (en) 2022-09-01

Family

ID=76275668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082864 WO2022178934A1 (en) 2021-02-26 2021-03-25 Health testing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN112967806A (en)
WO (1) WO2022178934A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682427A (en) * 2016-12-29 2017-05-17 平安科技(深圳)有限公司 Personal health condition assessment method and device based position services
CN109431463A (en) * 2018-10-23 2019-03-08 南开大学 Deep learning Chinese medicine intelligence diagnosis and therapy system based on traditional Chinese and western medicine sample labeling
CN110703965A (en) * 2019-10-11 2020-01-17 上海中医药大学 Intelligent traditional Chinese medicine health state identification software and electronic equipment
US20200160512A1 (en) * 2018-11-16 2020-05-21 Boe Technology Group Co., Ltd. Method, client, server and system for detecting tongue image, and tongue imager
CN112382390A (en) * 2020-11-09 2021-02-19 北京沃东天骏信息技术有限公司 Method, system and storage medium for generating health assessment report

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899878B (en) * 2020-07-30 2023-06-02 平安科技(深圳)有限公司 Old person health detection system, method, computer device and readable storage medium

Also Published As

Publication number Publication date
CN112967806A (en) 2021-06-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21927368

Country of ref document: EP

Kind code of ref document: A1