CN117094966A - Tongue image identification method and device based on image amplification and computer equipment - Google Patents


Info

Publication number
CN117094966A
CN117094966A
Authority
CN
China
Prior art keywords
tongue
image
model
images
training
Prior art date
Legal status
Granted
Application number
CN202311051530.4A
Other languages
Chinese (zh)
Other versions
CN117094966B
Inventor
冯健
陈栋栋
赖永航
Current Assignee
Qingdao Medcare Digital Engineering Co ltd
Original Assignee
Qingdao Medcare Digital Engineering Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Medcare Digital Engineering Co ltd filed Critical Qingdao Medcare Digital Engineering Co ltd
Priority to CN202311051530.4A
Publication of CN117094966A
Application granted
Publication of CN117094966B
Active legal status
Anticipated expiration


Classifications

    • G06T7/0012 Biomedical image inspection
    • G06T7/11 Region-based segmentation
    • G06T7/13 Edge detection
    • G06T7/90 Determination of colour characteristics
    • G06V10/764 Image or video recognition using classification, e.g. of video objects
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/82 Image or video recognition using neural networks
    • G06T2207/20081 Training; Learning

Abstract

The invention relates to the technical field of image processing and provides a tongue image recognition method, device and computer equipment based on image augmentation. The method comprises: performing effective region segmentation on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image; generating, based on a preset image diffusion model, N first generated tongue images corresponding to the first tongue contour mask image, where N is greater than or equal to 1; performing color correction on the first tongue image based on each of the N first generated tongue images to generate N second tongue images in one-to-one correspondence with the N first generated tongue images; and taking the N second tongue images, or the set of the N second tongue images and the first tongue image, as target image data, and calling a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image. The method and device improve the accuracy of tongue image recognition.

Description

Tongue image identification method and device based on image amplification and computer equipment
Technical Field
The present invention relates to the field of image processing technology, and in particular to a tongue image recognition method, apparatus and computer device based on image augmentation.
Background
The tongue is one of the important organs of the human digestive tract; it consists of interlacing striated muscles covered by a specialized mucous membrane. In traditional Chinese medicine, the tongue is considered closely related to the internal organs, especially the spleen and stomach, so it is often used to infer digestive tract diseases.
When digestive tract diseases occur, features such as the tongue body, tongue coating and tooth marks often change in regular ways, and these changes can be captured and analyzed by the naked eye or from an image. In modern medicine, judging the type of digestive tract disease corresponding to a tongue image assists doctors in identifying such diseases. In remote consultation, however, a tongue image taken with a mobile phone is easily affected by the phone's imaging algorithm, lighting, background interference and other factors, so the image may not reflect the true appearance: a white tongue coating, for example, may appear reddish in the photograph, and this capture-induced deviation leads to errors in tongue image recognition.
Disclosure of Invention
The present invention has been made in view of the above problems, and its object is to provide a tongue image recognition method, apparatus and computer device based on image augmentation that overcome them.
In one aspect of the present invention, there is provided a tongue image recognition method based on image augmentation, the method comprising:
performing effective region segmentation on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image;
generating, based on a preset image diffusion model, N first generated tongue images corresponding to the first tongue contour mask image, where N is greater than or equal to 1;
performing color correction on the first tongue image based on each of the N first generated tongue images to generate N second tongue images in one-to-one correspondence with the N first generated tongue images;
and taking the N second tongue images, or the set of the N second tongue images and the first tongue image, as target image data, and calling a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image.
Further, performing color correction on the first tongue image based on the N first generated tongue images to generate N second tongue images in one-to-one correspondence with the N first generated tongue images comprises:
selecting one image at a time from the first generated tongue images as a target generated tongue image, and performing feature point matching between the first tongue image and the target generated tongue image to obtain feature point pairs;
generating a color correction matrix between the first tongue image and the target generated tongue image according to the color correspondence of the feature point pairs;
and performing color correction on the first tongue image based on the color correction matrix to obtain the second tongue image corresponding to the target generated tongue image.
Further, the image recognition model is a cascaded network model, in which the first network model is a text description prediction model and the second network model is a tongue image classification prediction model;
calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image comprises:
inputting each piece of target image data into the text description prediction model for recognition to obtain a text description corresponding to each piece of target image data, where the text description describes the tongue texture features in that target image data;
and inputting the text descriptions corresponding to the target image data into the tongue image classification prediction model for recognition to obtain the recognition result corresponding to the initial tongue image.
Further, before calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image, the method further comprises: pre-training the text description prediction model;
pre-training the text description prediction model specifically comprises:
performing effective region segmentation on a first initial tongue image sample to obtain a first tongue image sample;
acquiring a text description describing the tongue texture features of the first tongue image sample;
and inputting the first tongue image sample and its text description as training data into the input layer of a preset multimodal training model, and obtaining the text description prediction model through training.
Further, before calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image, the method further comprises: pre-training the tongue image classification prediction model;
pre-training the tongue image classification prediction model specifically comprises:
performing effective region segmentation on a second initial tongue image sample to obtain a second tongue image sample and a second tongue contour mask image sample, and labeling the second initial tongue image sample with a tongue image classification;
generating, based on the preset image diffusion model, N first generated tongue image samples corresponding to the second tongue contour mask image sample;
performing color correction on the second tongue image sample based on each of the N first generated tongue image samples to generate N third tongue image samples in one-to-one correspondence with the N first generated tongue image samples;
taking the N third tongue image samples, or the set of the N third tongue image samples and the second tongue image sample, as target image data samples, and calling the preset text description prediction model to recognize the target image data samples to obtain text descriptions corresponding to the target image data samples;
and inputting the tongue image classification result of the second initial tongue image sample and the text descriptions corresponding to the target image data samples as training data into the input layer of a preset tongue image classification prediction training model, and obtaining the tongue image classification prediction model through training.
Further, before calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image, the method further comprises: pre-training the tongue image classification prediction model;
pre-training the tongue image classification prediction model specifically comprises:
acquiring text descriptions of a plurality of tongue texture features corresponding to different tongue image classification results;
and inputting any target tongue image classification result and the text descriptions of the plurality of tongue texture features corresponding to that target tongue image classification result as training data into the input layer of a preset tongue image classification prediction training model, and obtaining the tongue image classification prediction model through training.
Further, the image recognition model is obtained by training in advance, and calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image comprises:
inputting each piece of target image data into the image recognition model for recognition to obtain the tongue image classification result corresponding to each piece of target image data;
and performing statistical calculation on the tongue image classification results corresponding to the pieces of target image data to obtain the final recognition result of the initial tongue image.
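A minimal sketch of one such statistical calculation, assuming a simple majority vote over the per-image tongue image classification results (the patent does not fix a particular aggregation rule, so the vote and the `aggregate_predictions` helper are illustrative assumptions):

```python
from collections import Counter

def aggregate_predictions(per_image_results):
    """Majority vote: return the most frequent tongue image classification result."""
    return Counter(per_image_results).most_common(1)[0][0]
```

For example, if two of the target images are classified as "thin white coating" and one as "red tongue", the aggregated recognition result for the initial tongue image would be "thin white coating".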
Further, before performing the effective region segmentation on the initial tongue image to be processed to obtain the first tongue image and the first tongue contour mask image, the method further comprises: pre-training a tongue segmentation model;
pre-training the tongue segmentation model comprises:
acquiring third initial tongue image samples containing a tongue effective area;
labeling the tongue effective area in each third initial tongue image sample to obtain semantic segmentation training samples;
and inputting the semantic segmentation training samples into the input layer of a preset semantic segmentation model, and obtaining the tongue segmentation model through training.
Further, before performing the effective region segmentation on the initial tongue image to be processed to obtain the first tongue image and the first tongue contour mask image, the method further comprises:
judging whether the initial tongue image contains a tongue effective area, and issuing an error prompt if it does not.
In another aspect of the present invention, there is provided a tongue image recognition apparatus based on image augmentation, the apparatus comprising:
an effective region segmentation module, configured to perform effective region segmentation on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image;
an image diffusion module, configured to generate, based on a preset image diffusion model, N first generated tongue images corresponding to the first tongue contour mask image, where N is greater than or equal to 1;
a color correction module, configured to perform color correction on the first tongue image based on each of the N first generated tongue images to generate N second tongue images in one-to-one correspondence with the N first generated tongue images;
and an image recognition module, configured to take the N second tongue images, or the set of the N second tongue images and the first tongue image, as target image data, and call a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image.
In another aspect of the application, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; the computer program, when executed by the processor, implements the steps of the tongue image recognition method based on image augmentation as described in any one of the above.
After the initial tongue image is obtained, effective region segmentation is performed on it to obtain the first tongue image and the first tongue contour mask image. N first generated tongue images corresponding to the first tongue contour mask image are then generated based on the preset image diffusion model, color correction is performed on the first tongue image based on each of the N first generated tongue images, and N second tongue images in one-to-one correspondence with the N first generated tongue images are generated; obtaining the N color-corrected second tongue images improves the generalization capability and robustness of the image recognition. Finally, the N second tongue images, or the set of the N second tongue images and the first tongue image, are taken as target image data, and the preset image recognition model is called to recognize the target image data to obtain the recognition result of the initial tongue image.
The foregoing is merely an overview of the technical solution of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented according to the contents of the specification, and in order that the above and other objects, features and advantages of the present invention may become more readily apparent, preferred embodiments are described in detail below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is a flowchart of a tongue image recognition method based on image augmentation according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a tongue effective area image and a tongue contour mask of a tongue image according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a tongue image recognition device based on image augmentation according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Example 1
The embodiment of the invention provides a tongue image recognition method based on image amplification, which is shown in fig. 1 and comprises the following steps:
S1, performing effective region segmentation on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image;
specifically, the outer contour of the tongue region can be used as an image boundary to carry out effective region segmentation, the region in the contour is used for obtaining an effective tongue image, the region outside the contour is filled with black pixels to generate a new image, the new image is cut off an ineffective region to obtain a tongue contour mask image, and part of effective characteristic information of the tongue is still reserved in the tongue contour mask image.
S2, generating, based on a preset image diffusion model, N first generated tongue images corresponding to the first tongue contour mask image, where N is greater than or equal to 1;
the image diffusion technology enables a model to learn the characteristics and the patterns of an image by training a large number of image data sets, and then generates a new generated image, wherein the generated image has the characteristics and the patterns of the learned image. The image diffusion model preset in the invention learns a large number of tongue images capable of truly reflecting tongue fur texture features, so that a first generated tongue image corresponding to a first tongue outline mask image can be generated according to the first tongue outline mask image. The first generated tongue image can truly reflect the texture characteristics of the tongue fur, and has the same structural characteristics as the first tongue image, so that the first tongue image can be used as the basis for color correction.
S3, performing color correction on the first tongue image based on each of the N first generated tongue images to generate N second tongue images in one-to-one correspondence with the N first generated tongue images;
S4, taking the N second tongue images, or the set of the N second tongue images and the first tongue image, as target image data, and calling a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image.
In the present invention, performing effective region segmentation on the initial tongue image to obtain the first tongue image and the first tongue contour mask image reduces the noise interference, during image recognition, from image content outside the tongue effective area. The image diffusion model and the first tongue contour mask image are used to generate several first generated tongue images that truly reflect the tongue texture features. Because each first generated tongue image is in the same positional relationship as the first tongue image, color-correcting the first tongue image against the first generated tongue images yields several color-corrected second tongue images that truly reflect the tongue texture features. Finally, learning and recognition are performed with the preset image recognition model to obtain the recognition result of the tongue coating image, which improves the accuracy of image recognition and avoids misdiagnosis caused by the color deviation of mobile phone photography.
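Steps S1 to S4 can be pictured as the following orchestration sketch. All four stage functions are hypothetical placeholders standing in for the segmentation, diffusion, correction and recognition models described above; only the data flow between them reflects the method:

```python
def recognize_tongue_image(initial_image, segment, diffuse, correct, recognize,
                           n=3, include_original=True):
    """Data-flow sketch of steps S1-S4; all callables are placeholders.

    segment:   S1, initial image -> (first tongue image, contour mask)
    diffuse:   S2, (mask, n) -> n first generated tongue images
    correct:   S3, (first image, generated image) -> one second tongue image
    recognize: S4, list of target images -> recognition result
    """
    first_image, contour_mask = segment(initial_image)          # S1
    generated = diffuse(contour_mask, n)                        # S2
    corrected = [correct(first_image, g) for g in generated]    # S3
    targets = corrected + ([first_image] if include_original else [])
    return recognize(targets)                                   # S4
```

The `include_original` flag mirrors the two choices of target image data: the N second tongue images alone, or together with the first tongue image.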
In an embodiment of the present invention, after the recognition result is obtained, the method further comprises: comparing the recognition result with a pre-stored initial result, and judging from the comparison whether the treatment has been effective; if not, pushing the initial tongue image and/or the recognition result to a preset doctor system. The tongue image recognition method based on image augmentation can thus assist doctors in follow-up tracking of digestive diseases.
Specifically, the initial tongue image uploaded by the user through a mobile phone undergoes image diffusion and color correction to obtain several color-corrected tongue images, which are input into the image recognition model together with the originally captured initial tongue image. This improves the robustness and generalization capability of the model's predictions, allows the patient's tongue image to be recognized accurately, and makes it convenient for a doctor to check the patient's treatment effect at regular intervals and adjust the treatment plan accordingly.
In addition, in the embodiment of the present invention, the patient can capture and upload tongue images through a condition follow-up APP on the mobile phone, and during the patient's medication period the APP can also remind the user at regular times to photograph the tongue with the phone and check in.
In step S1, the initial tongue image is segmented into its effective region by a tongue segmentation model obtained through pre-training. The tongue segmentation model takes the outer contour of the tongue region as the image boundary, fills the region outside the contour with black pixels to generate a new image, and crops the invalid region from the new image, retaining the effective feature information of the tongue. This reduces noise interference in subsequent image recognition and improves recognition accuracy.
Further, before the effective region segmentation is performed on the initial tongue image to obtain the first tongue image and the first tongue contour mask image, the method further comprises pre-training a tongue segmentation model, which comprises: acquiring third initial tongue image samples containing a tongue effective area; labeling the tongue effective area in each third initial tongue image sample to obtain semantic segmentation training samples; and inputting the semantic segmentation training samples into the input layer of a preset semantic segmentation model, and obtaining the tongue segmentation model through training.
In addition, the tongue segmentation model of the embodiment of the present invention may also be used to judge whether a tongue effective area exists in the initial tongue image. Specifically, before the effective region segmentation is performed on the initial tongue image to obtain the first tongue image and the first tongue contour mask image, the method further comprises: judging whether the initial tongue image contains a tongue effective area, and issuing an error prompt if it does not.
In step S2, the image diffusion model may be a ControlNet image diffusion model. Because it is trained on a large number of tongue images that truly reflect tongue coating texture features, a first generated tongue image produced by the image diffusion model can be regarded as a tongue image whose tongue coating texture features are faithful to a real tongue coating.
Further, the embodiment of the present invention also includes a training method for the image diffusion model. Before generating at least one first generated tongue image corresponding to the first tongue contour mask image based on the preset image diffusion model, the method further comprises: acquiring fourth initial tongue image samples that truly reflect tongue texture features; performing effective region segmentation on each fourth initial tongue image sample to obtain a fourth tongue image sample and a second tongue contour mask image sample; and inputting the fourth tongue image samples and the second tongue contour mask image samples as training data into the input layer of a preset image diffusion training model, and obtaining the image diffusion model through training.
Fig. 2 schematically illustrates a training pair consisting of a fourth initial tongue image sample and a second tongue contour mask image according to an embodiment of the present invention. Because the tongue image in the fourth initial tongue image sample truly reflects the tongue texture features, the generated tongue images produced by the trained image diffusion model are tongue effective area pictures that truly reflect the tongue texture features. It follows that the first generated tongue images in the embodiments of the present invention are tongue effective area images that truly reflect the tongue texture features, and they can serve as references for color-correcting the first tongue image: the color information of a first generated tongue image is used as the reference for correcting the colors of the first tongue image. The tongue texture features may be the texture features of the tongue body and the tongue coating.
In step S3, performing color correction on the first tongue image based on the N first generated tongue images to generate N second tongue images in one-to-one correspondence with them comprises: selecting one image at a time from the first generated tongue images as the target generated tongue image, and performing feature point matching between the first tongue image and the target generated tongue image to obtain feature point pairs; generating a color correction matrix between the first tongue image and the target generated tongue image according to the color correspondence of the feature point pairs; and performing color correction on the first tongue image based on the color correction matrix to obtain the second tongue image corresponding to the target generated tongue image. That is, in the embodiment of the present invention, the number of second tongue images equals the number of generated tongue images.
A color correction matrix is a technique for correcting the color deviation of a digital image by adjusting its colors so that they are truer and more accurate. The basic principle is to build a matrix from the RGB values of a series of standard colors and the RGB values of the same standard colors as photographed by the camera. This matrix encodes the color deviation of images captured by that camera, and applying it to the color of each pixel through matrix operations makes the image's colors truer and more accurate.
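Assuming the matched feature points provide pairs of RGB values for the same physical spots in the captured image and the generated reference, one common way to estimate such a matrix is a least-squares fit. The patent does not specify the solver or the linear model, so `fit_color_correction_matrix` and `apply_color_correction` below are an illustrative sketch, not the claimed implementation:

```python
import numpy as np

def fit_color_correction_matrix(src_rgb, ref_rgb):
    """Fit a 3x3 matrix M minimizing ||src_rgb @ M - ref_rgb||^2.

    src_rgb: N x 3 RGB values sampled at matched points of the captured (biased) image.
    ref_rgb: N x 3 RGB values at the same matched points of the generated reference.
    """
    M, *_ = np.linalg.lstsq(src_rgb.astype(float), ref_rgb.astype(float), rcond=None)
    return M

def apply_color_correction(image, M):
    """Correct every pixel with the fitted matrix and clip back to the 8-bit range."""
    corrected = image.reshape(-1, 3).astype(float) @ M
    return np.clip(corrected, 0, 255).reshape(image.shape).astype(np.uint8)
```

Richer models (a 3x4 affine matrix with an offset column, or polynomial terms) fit the same framework; the pure 3x3 linear map is the simplest instance of the principle described above.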
Further, a first generated tongue image in the embodiment of the present invention has RGB values that truly reflect the tongue texture features. A SIFT matching algorithm may be selected to match feature points between the target generated tongue image and the first tongue image, and the resulting feature point pairs are taken as the key matching points. Since the RGB values of the same object remain fairly consistent across different images, the color correction matrix can be computed from the RGB values of the matched points in the two images, and applying this matrix to the first effective area image completes the color correction.
This color correction method based on feature point matching achieves feature matching between two images with large differences, greatly improves the robustness and accuracy of color correction, and allows the corrected image to show the true colors of the tongue coating.
In one embodiment of the present invention, the image recognition model in step S4 may be a cascading network model, where the first network of the cascade is a text description prediction model and the second network is a tongue image classification model. In this case, calling the preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image includes: inputting each piece of target image data into the text description prediction model for recognition to obtain the text description corresponding to each piece of target image data, where the text description describes the tongue texture features in the target image data; and inputting the text descriptions corresponding to the target image data into the tongue image classification model for recognition to obtain the recognition result corresponding to the initial tongue image.
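The two-stage cascade can be expressed as a small wrapper. The model interfaces (`describe`, `classify`) are hypothetical stand-ins assumed for illustration, not APIs named in the patent:

```python
from typing import Callable, List, Sequence

def cascade_recognize(target_images: Sequence,
                      describe: Callable[[object], str],
                      classify: Callable[[List[str]], str]) -> str:
    """First network of the cascade: predict one text description of the
    tongue texture features per piece of target image data.  Second network:
    map the collected descriptions to a single recognition result."""
    descriptions = [describe(image) for image in target_images]
    return classify(descriptions)
```

Keeping the two models behind plain callables makes it easy to swap either stage (e.g. a different multi-modal captioner) without touching the cascade logic.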
In the embodiment of the invention, all second tongue images (i.e., the color-corrected tongue images) are used as target image data. Alternatively, both the second tongue images and the first tongue image are used as target image data: the original tongue image captured by a mobile phone serves as a tongue image to be learned, and if the original image has no color cast, it truly represents the texture features of the tongue body and tongue coating, so adding it to the model further improves the model's generalization capability. The tongue image classification in the embodiment of the invention can grade intestinal diseases of different severities from the tongue characterization; that is, based on the recognition result of the tongue image, doctors can be assisted in understanding the patient's condition. It can be appreciated that tongue images may also be classified according to other classification criteria, so as to accomplish different types of tongue image recognition.
Further, the embodiment of the invention also includes a training method for the text description prediction model and the tongue image classification model. Accordingly, before calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image, the method further includes: pre-training the text description prediction model. The pre-training specifically includes: performing effective area segmentation processing on a first initial tongue image sample to obtain a first tongue image sample; acquiring a text description describing the tongue texture features of the first tongue image sample; and inputting the first tongue image sample and its text description as training data into the input layer of a preset multi-modal training model, and obtaining the text description prediction model through training. A text description of tongue texture features in the embodiment of the invention may read, for example: the tongue is pale white, the tongue shape is old and tender, the tongue coating is thick and dry, and the coating is white and partially concentrated. This text description is only an optional embodiment of the invention and does not limit the protection scope of the invention.
Further, before calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image, the method further includes: pre-training the tongue image classification model. The pre-training specifically includes: performing effective area segmentation processing on a second initial tongue image sample to obtain a second tongue image sample and a first tongue outline mask image sample, and performing tongue image classification labeling on the second initial tongue image sample; generating N first generated tongue image samples corresponding to the first tongue outline mask image sample based on the preset image diffusion model; performing color correction on the second tongue image sample based on the N first generated tongue image samples to generate N third tongue image samples in one-to-one correspondence with the N first generated tongue image samples; taking the N third tongue image samples, or the set of the N third tongue image samples and the second tongue image sample, as target image data samples, and calling the preset text description prediction model to recognize the target image data samples to obtain the text description corresponding to each target image data sample; and inputting the tongue image classification result of the second initial tongue image sample and the text descriptions corresponding to the target image data samples as training data into the input layer of a preset tongue image classification prediction training model, and obtaining the tongue image classification model through training. The tongue image classification result in the embodiment of the invention may specifically be, for example, a tongue image belonging to class C1 of chronic atrophic gastritis, a tongue image belonging to class C2 of chronic atrophic gastritis, and so on.
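The assembly of (text description, label) training pairs for the classification model can be sketched as below. The callables (`segment`, `diffuse`, `correct`, `describe`) are hypothetical stand-ins for the pre-trained segmentation, diffusion, color correction, and text description models; they are assumptions for illustration, not names from the patent:

```python
def build_classification_training_data(labelled_samples, segment, diffuse,
                                       correct, describe, n_aug=3,
                                       include_original=True):
    """For each (second initial tongue image sample, label): segment the
    effective area, generate n_aug tongue images from the contour mask with
    the diffusion model, color-correct the segmented sample against each
    generated image, describe every resulting image, and pair each text
    description with the sample's classification label."""
    pairs = []
    for image, label in labelled_samples:
        tongue_img, contour_mask = segment(image)
        generated = [diffuse(contour_mask) for _ in range(n_aug)]
        third_samples = [correct(tongue_img, g) for g in generated]
        targets = third_samples + ([tongue_img] if include_original else [])
        pairs.extend((describe(t), label) for t in targets)
    return pairs
```

With `include_original=True` the set of third tongue image samples plus the second tongue image sample is used, matching the "set" variant of the target image data described above.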
Further, before invoking a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image, the method further includes: pre-training the tongue image classification model; the pre-training the tongue image classification model specifically comprises the following steps: acquiring text descriptions of a plurality of tongue texture features corresponding to different tongue image classification results; and inputting any target tongue image classification result and text description of a plurality of tongue texture features corresponding to the current target tongue image classification result as training data into an input layer of a preset tongue image classification prediction training model, and obtaining a tongue image classification model through training.
In one embodiment of the present invention, the image recognition model in step S4 may be a single image recognition model obtained by training in advance. In this case, calling the preset image recognition model to recognize the target image data to obtain the recognition result of the initial tongue image includes: inputting each piece of target image data into the image recognition model for recognition to obtain the tongue image classification result corresponding to each piece of target image data; and performing statistical calculation on the tongue image classification results corresponding to the target image data to obtain the final recognition result of the initial tongue image. In this embodiment, the target image data are directly learned and recognized by the image recognition model; after the tongue image classification result of each piece of target image data is obtained, the classification result with the largest number of occurrences is selected as the final recognition result of the tongue coating image.
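The statistical step of this single-model variant is a plain majority vote over the per-image results, which can be sketched as:

```python
from collections import Counter

def majority_vote(classification_results):
    """Return the tongue image classification result that occurs most often
    across the target image data (ties resolved by first appearance)."""
    return Counter(classification_results).most_common(1)[0][0]
```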
For simplicity of explanation, the methodologies are shown and described as a series of acts; it is to be understood and appreciated by one of ordinary skill in the art that the invention is not limited by the order of the acts, as some acts may occur in a different order or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the invention.
Example two
Fig. 3 schematically illustrates the structure of a tongue image recognition device based on image augmentation. Referring to Fig. 3, the tongue image recognition device based on image augmentation in the embodiment of the present invention specifically includes an effective area segmentation module 301, an image diffusion module 302, a color correction module 303, and an image recognition module 304, where:
an effective area segmentation module 301, configured to perform effective area segmentation processing on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image;
the image diffusion module 302 is configured to generate N first generated tongue images corresponding to the first tongue contour mask images based on a preset image diffusion model, where N is greater than or equal to 1;
The color correction module 303 is configured to perform color correction on the first tongue images based on N first generated tongue images, and generate N second tongue images corresponding to the N first generated tongue images one by one;
the image recognition module 304 is configured to take N second tongue images or a set of N second tongue images and the first tongue image as target image data, and call a preset image recognition model to recognize the target image data, so as to obtain a recognition result of the initial tongue image.
Further, the color correction module 303 specifically includes:
the feature point matching submodule is used for sequentially selecting one image from the first generated tongue images as a target generated tongue image, and performing feature point matching between the first tongue image and the target generated tongue image to obtain feature point pairs;
the matrix generation submodule is used for generating a color correction matrix between the first tongue image and the target generated tongue image according to the color correspondence of the feature point pairs;
and the correction calculation sub-module is used for carrying out color correction on the first tongue image based on the color correction matrix so as to obtain a second tongue image corresponding to the target generation type tongue image.
The image recognition module 304 of one embodiment of the present invention includes:
the text description prediction sub-module is used for inputting each piece of target image data into the text description prediction model for recognition so as to obtain text descriptions corresponding to each piece of target image data, wherein the text descriptions are used for describing tongue texture characteristics in the target image data;
and the tongue image classification prediction sub-module is used for inputting text descriptions corresponding to each target image data into the tongue image classification model for recognition so as to obtain a recognition result corresponding to the initial tongue image.
Further, the tongue image recognition device based on image augmentation further comprises a text description prediction model training module for training the text description prediction model in advance.
The text description prediction model training module specifically comprises:
the first acquisition submodule is used for carrying out effective area segmentation processing on the first initial tongue image sample to obtain a first tongue image sample;
a first obtaining sub-module for obtaining a text description describing tongue texture features of the first tongue image sample;
the text description training sub-module is used for inputting the first tongue image sample and the text description of the first tongue image sample as training data into an input layer of a preset multi-mode training model, and obtaining a text description prediction model through training.
Further, the tongue image recognition device based on image augmentation according to the embodiment of the invention further comprises a tongue image classification model training module for training the tongue image classification model in advance.
The tongue image classification model training module of one embodiment of the invention specifically comprises:
the second acquisition sub-module is used for carrying out effective area segmentation processing on a second initial tongue image sample to obtain a second tongue image sample and a first tongue outline mask image sample, and carrying out tongue image classification labeling on the second initial tongue image sample;
the image diffusion submodule is used for generating N first generation type tongue image samples corresponding to the second tongue outline mask image based on a preset image diffusion model;
the color correction sub-module is used for respectively carrying out color correction on the second tongue image samples based on N first generation type tongue image samples, and respectively generating N third tongue image samples which are in one-to-one correspondence with the N first generation type tongue image samples;
the text description acquisition submodule is used for taking each third tongue image sample, or the set of each third tongue image sample and the second tongue image sample, as target image data samples, and calling the preset text description prediction model to recognize the target image data samples to obtain the text description corresponding to each target image data sample;
The first tongue image classification training sub-module is used for inputting a tongue image classification result of the second initial tongue image sample and text descriptions corresponding to each target image data sample as training data into an input layer of a preset tongue image classification prediction training model, and obtaining a tongue image classification model through training.
The tongue image classification model training module of another embodiment of the invention specifically comprises:
the third acquisition sub-module is used for acquiring text descriptions of a plurality of tongue texture features corresponding to different tongue image classification results;
the second tongue image classification training sub-module is used for inputting any target tongue image classification result and text description of a plurality of tongue texture features corresponding to the current target tongue image classification result as training data into an input layer of a preset tongue image classification prediction training model, and obtaining a tongue image classification model through training.
The image recognition module 304 of another embodiment of the present invention includes:
the image recognition sub-module is used for inputting each piece of target image data into the image recognition model for recognition so as to obtain tongue image classification results corresponding to each piece of target image data;
and the statistical calculation sub-module is used for carrying out statistical calculation on tongue image classification results corresponding to each target image data so as to finally obtain an initial tongue image recognition result.
Further, the tongue image recognition device based on image augmentation according to the embodiment of the invention further comprises a tongue segmentation model training module for training the tongue segmentation model in advance.
The tongue segmentation model training module specifically comprises:
a fourth acquisition sub-module for acquiring a third initial tongue image sample comprising a tongue active area;
the labeling sub-module is used for labeling the tongue effective areas in each third initial tongue image sample to obtain semantic segmentation training samples;
the semantic segmentation training sub-module is used for inputting the semantic segmentation training sample into an input layer of a preset semantic segmentation model, and obtaining a tongue segmentation model through training.
Further, the effective area segmentation module 301 of the embodiment of the present invention is further configured to determine whether the initial tongue image includes a tongue effective area, and if not, send an error prompt.
Further, the tongue image recognition device based on image augmentation according to the embodiment of the invention further comprises an image diffusion model training module for training the image diffusion model in advance.
The image diffusion model training module specifically comprises:
a fifth acquisition sub-module, configured to acquire a fourth initial tongue image sample that can truly reflect the texture characteristics of the tongue;
The effective area segmentation submodule is used for carrying out effective area segmentation processing on the fourth initial tongue image sample to obtain a fourth tongue image sample and a second tongue outline mask image sample;
and the image diffusion training sub-module is used for inputting the fourth tongue image sample and the second tongue outline mask image sample as training data into an input layer of a preset image diffusion training model, and obtaining the image diffusion model through training.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; reference may be made to the description of the method embodiments for relevant details.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
After an initial tongue image is obtained, the device performs effective area segmentation processing on it to obtain a first tongue image and a first tongue contour mask image; generates N first generated tongue images corresponding to the first tongue contour mask image based on a preset image diffusion model; performs color correction on the first tongue image based on the N first generated tongue images to generate N second tongue images in one-to-one correspondence with them, where acquiring the N color-corrected second tongue images improves the generalization capability and robustness of the image recognition; and finally takes the N second tongue images, or the set of the N second tongue images and the first tongue image, as target image data and calls a preset image recognition model to recognize the target image data, thereby obtaining the recognition result of the initial tongue image.
Example III
The embodiment of the invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the above tongue image recognition method embodiments based on image augmentation, such as steps S1-S4 shown in Fig. 1. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above tongue image recognition device or gateway system embodiments based on image augmentation, such as the effective area segmentation module 301, the image diffusion module 302, the color correction module 303, and the image recognition module 304 shown in Fig. 3.
Furthermore, those skilled in the art will appreciate that although some embodiments herein include certain features that other embodiments do not, combinations of features of different embodiments are meant to be within the scope of the invention and to form further embodiments. For example, any of the claimed embodiments can be used in any combination.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A tongue image recognition method based on image augmentation, the method comprising:
carrying out effective region segmentation processing on an initial tongue image to be processed to obtain a first tongue image and a first tongue outline mask image;
generating N first generated tongue images corresponding to the first tongue outline mask images based on a preset image diffusion model, wherein N is greater than or equal to 1;
respectively carrying out color correction on the first tongue images based on N first generation type tongue images, and respectively generating N second tongue images corresponding to the N first generation type tongue images one by one;
and taking N second tongue images or a set of N second tongue images and the first tongue images as target image data, and calling a preset image recognition model to recognize the target image data so as to obtain a recognition result of the initial tongue image.
2. The method according to claim 1, wherein the performing color correction on the first tongue images based on the N first generated tongue images respectively, and generating N second tongue images corresponding to the N first generated tongue images one to one respectively includes:
sequentially selecting one image from the first generated tongue image as a target generated tongue image, and performing feature point matching on the first tongue image and the target generated tongue image to obtain feature point pairs;
Generating a color correction matrix between the first tongue image and the target generated tongue image according to the color corresponding relation of the feature points;
and performing color correction on the first tongue image based on the color correction matrix to obtain a second tongue image corresponding to the target generated tongue image.
3. The method of claim 1, wherein the image recognition model is a cascading network model, a first network model of the cascading network model is a text description prediction model, and a second network model of the cascading network model is a tongue image classification prediction model;
the step of calling a preset image recognition model to recognize the target image data so as to obtain a recognition result of the initial tongue image comprises the following steps:
inputting each piece of target image data into the text description prediction model for recognition so as to obtain text descriptions corresponding to each piece of target image data, wherein the text descriptions are used for describing tongue texture features in the target image data;
and inputting text descriptions corresponding to the target image data into the tongue image classification prediction model for recognition so as to obtain a recognition result corresponding to the initial tongue image.
4. A method according to claim 3, wherein before invoking a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image, the method further comprises: pre-training the text description prediction model;
the pre-training the text description prediction model specifically comprises the following steps:
carrying out effective area segmentation processing on the first initial tongue image sample to obtain a first tongue image sample;
acquiring a text description describing tongue texture features of the first tongue image sample;
and inputting the first tongue image sample and the text description of the first tongue image sample as training data into an input layer of a preset multi-mode training model, and obtaining a text description prediction model through training.
5. The method of claim 4, wherein before invoking a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image, the method further comprises: pre-training the tongue image classification prediction model;
the pre-training the tongue image classification prediction model specifically comprises the following steps:
carrying out effective region segmentation processing on a second initial tongue image sample to obtain a second tongue image sample and a first tongue outline mask image sample, and carrying out tongue image classification labeling on the second initial tongue image sample;
Generating N first generation type tongue image samples corresponding to the second tongue outline mask image based on a preset image diffusion model;
respectively carrying out color correction on the second tongue image samples based on N first generation type tongue image samples, and respectively generating N third tongue image samples which are in one-to-one correspondence with the N first generation type tongue image samples;
taking N third tongue image samples or a set of N third tongue image samples and second tongue image samples as target image data samples, and calling a preset text description prediction model to identify the target image data samples so as to obtain text descriptions corresponding to the target image data samples;
and inputting a tongue image classification result of the second initial tongue image sample and text descriptions corresponding to each target image data sample as training data into an input layer of a preset tongue image classification prediction training model, and obtaining the tongue image classification prediction model through training.
6. A method according to claim 3, wherein before invoking a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image, the method further comprises: pre-training the tongue image classification prediction model;
The pre-training the tongue image classification prediction model specifically comprises the following steps:
acquiring text descriptions of a plurality of tongue texture features corresponding to different tongue image classification results;
and inputting a target tongue image classification result and text descriptions of a plurality of tongue texture features corresponding to the current target tongue image classification as training data into an input layer of a preset tongue image classification prediction training model, and obtaining the tongue image classification prediction model through training.
7. The method according to claim 1, wherein the calling a preset image recognition model to recognize the target image data to obtain a recognition result of the initial tongue image includes:
inputting each target image data into the image recognition model for recognition so as to obtain tongue image classification results corresponding to each target image data;
and carrying out statistical calculation on tongue image classification results corresponding to each target image data so as to finally obtain an initial tongue image recognition result.
8. The method according to any one of claims 1-7, wherein before performing the active area segmentation process on the initial tongue image to be processed to obtain the first tongue image and the first tongue contour mask image, the method further comprises: pre-training a tongue segmentation model;
The pre-trained tongue segmentation model comprises:
acquiring a third initial tongue image sample containing a tongue effective area;
labeling the tongue effective areas in each third initial tongue image sample to obtain a semantic segmentation training sample;
inputting the semantic segmentation training sample into an input layer of a preset semantic segmentation model, and obtaining a tongue segmentation model through training.
9. The method according to any one of claims 1-7, wherein before performing an active area segmentation process on an initial tongue image to be processed to obtain a first tongue image and a first tongue contour mask image, the method further comprises:
judging whether the initial tongue image contains a tongue effective area, and if not, sending out an error prompt.
10. A tongue image recognition device based on image augmentation, the device comprising:
the effective area segmentation module is used for carrying out effective area segmentation processing on the initial tongue image to be processed to obtain a first tongue image and a first tongue outline mask image;
the image diffusion module is used for generating N first generated tongue images corresponding to the first tongue outline mask images based on a preset image diffusion model, wherein N is greater than or equal to 1;
The color correction module is used for respectively carrying out color correction on the first tongue images based on N first generation type tongue images and respectively generating N second tongue images which are in one-to-one correspondence with the N first generation type tongue images;
the image recognition module is used for taking N second tongue images or a set of N second tongue images and the first tongue images as target image data, and calling a preset image recognition model to recognize the target image data so as to obtain a recognition result of the initial tongue image.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; the computer program, when executed by the processor, implements the tongue image recognition method based on image augmentation as claimed in any one of claims 1-9.
CN202311051530.4A 2023-08-21 2023-08-21 Tongue image identification method and device based on image amplification and computer equipment Active CN117094966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311051530.4A CN117094966B (en) 2023-08-21 2023-08-21 Tongue image identification method and device based on image amplification and computer equipment


Publications (2)

Publication Number Publication Date
CN117094966A true CN117094966A (en) 2023-11-21
CN117094966B CN117094966B (en) 2024-04-05


Patent Citations (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001186365A (en) * 1999-12-27 2001-07-06 Canon Inc Picture processing method, picture processor and recording medium
KR20040059312A (en) * 2002-12-28 2004-07-05 삼성전자주식회사 Method of extracting tongue area from tongue image and health care service method and apparatus using tongue image
WO2010002070A1 (en) * 2008-06-30 2010-01-07 Korea Institute Of Oriental Medicine Method for grouping 3d models to classify constitution
CN102509312A (en) * 2011-09-20 2012-06-20 哈尔滨工业大学 Color range space of human body digital tongue image color and extraction method thereof
KR20130083613A (en) * 2012-01-13 2013-07-23 상지대학교산학협력단 Method and apparatus for processing image of tongue using three band image
KR101373471B1 (en) * 2012-10-15 2014-03-13 세종대학교산학협력단 Apparatus and method for compensation of stereo image
JP2016103759A (en) * 2014-11-28 2016-06-02 株式会社リコー Image processing apparatus, image processing method, and program
US20170118407A1 (en) * 2015-10-22 2017-04-27 Samsung Electronics Co., Ltd. Method and device for generating images
KR101731243B1 (en) * 2015-12-15 2017-04-28 군산대학교 산학협력단 A video surveillance apparatus for identification and tracking multiple moving objects with similar colors and method thereof
KR20170099066A (en) * 2016-02-23 2017-08-31 정종율 Method and system for tongue diagnosis based on image of tongue
US20180095533A1 (en) * 2016-09-30 2018-04-05 Samsung Electronics Co., Ltd. Method for displaying an image and an electronic device thereof
WO2019084919A1 (en) * 2017-11-03 2019-05-09 SZ DJI Technology Co., Ltd. Methods and system for infrared tracking
WO2019137131A1 (en) * 2018-01-10 2019-07-18 Oppo广东移动通信有限公司 Image processing method, apparatus, storage medium, and electronic device
WO2019218826A1 (en) * 2018-05-17 2019-11-21 腾讯科技(深圳)有限公司 Image processing method and device, computer apparatus, and storage medium
JP2020009162A (en) * 2018-07-09 2020-01-16 キヤノン株式会社 Image processing device, image processing method and program
CN109461128A (en) * 2018-10-24 2019-03-12 福州大学 A gradient-based, structure-preserving stereo image color correction method
KR20200092492A (en) * 2019-01-11 2020-08-04 연세대학교 산학협력단 Method and Apparatus for Image Adjustment Based on Semantics-Aware
WO2020188794A1 (en) * 2019-03-20 2020-09-24 株式会社日立国際電気 Video system, imaging device, and video processing device
CN109978873A (en) * 2019-03-31 2019-07-05 山西慧虎健康科技有限公司 An intelligent physical examination system and method based on traditional Chinese medicine image big data
CN110598533A (en) * 2019-07-31 2019-12-20 平安科技(深圳)有限公司 Tongue picture matching method, electronic device, computer device, and storage medium
WO2021017308A1 (en) * 2019-07-31 2021-02-04 平安科技(深圳)有限公司 Tongue image matching method, electronic apparatus, computer device, and storage medium
WO2021169325A1 (en) * 2020-02-25 2021-09-02 苏州科达科技股份有限公司 Gaze adjustment method and apparatus, and storage medium
CN111639647A (en) * 2020-05-22 2020-09-08 深圳市赛为智能股份有限公司 Indicating lamp state identification method and device, computer equipment and storage medium
WO2021139258A1 (en) * 2020-06-19 2021-07-15 平安科技(深圳)有限公司 Image recognition based cell recognition and counting method and apparatus, and computer device
WO2022033150A1 (en) * 2020-08-11 2022-02-17 Oppo广东移动通信有限公司 Image recognition method, apparatus, electronic device, and storage medium
WO2022047662A1 (en) * 2020-09-02 2022-03-10 Intel Corporation Method and system of neural network object recognition for warpable jerseys with multiple attributes
WO2022055037A1 (en) * 2020-09-14 2022-03-17 안치영 Tongue diagnostic kit for color correction and saliva examination, and non-face-to-face tongue diagnostic system using same
CN113837986A (en) * 2020-12-15 2021-12-24 京东科技控股股份有限公司 Method, apparatus, electronic device, and medium for recognizing tongue picture
CN112884682A (en) * 2021-01-08 2021-06-01 福州大学 Stereo image color correction method and system based on matching and fusion
CN112839216A (en) * 2021-01-13 2021-05-25 合肥埃科光电科技有限公司 Image color correction method and device
WO2022247005A1 (en) * 2021-05-27 2022-12-01 平安科技(深圳)有限公司 Method and apparatus for identifying target object in image, electronic device and storage medium
CN113781488A (en) * 2021-08-02 2021-12-10 横琴鲸准智慧医疗科技有限公司 Tongue picture image segmentation method, apparatus and medium
CN113724228A (en) * 2021-08-31 2021-11-30 平安科技(深圳)有限公司 Tongue color and coating color identification method and device, computer equipment and storage medium
CN113780444A (en) * 2021-09-16 2021-12-10 平安科技(深圳)有限公司 Tongue fur image classification model training method based on progressive learning
CN113781468A (en) * 2021-09-23 2021-12-10 河南科技大学 Tongue image segmentation method based on lightweight convolutional neural network
KR20230061757A (en) * 2021-10-29 2023-05-09 한국 한의학 연구원 Contact type measuring apparatus for papillas of tongue and coated tongue
CN114372926A (en) * 2021-12-21 2022-04-19 华东理工大学 Traditional Chinese medicine tongue tenderness identification method based on image restoration and convolutional neural network
WO2023143129A1 (en) * 2022-01-30 2023-08-03 北京字跳网络技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN114708493A (en) * 2022-02-26 2022-07-05 上海大学 Traditional Chinese medicine crack tongue diagnosis portable device and using method
CN114820603A (en) * 2022-06-27 2022-07-29 深圳中易健康科技有限公司 Intelligent health management method based on AI tongue diagnosis image processing and related device
CN115965607A (en) * 2022-12-31 2023-04-14 华东理工大学 Intelligent traditional Chinese medicine tongue diagnosis auxiliary analysis system
CN116187470A (en) * 2023-01-18 2023-05-30 智慧眼科技股份有限公司 Tongue diagnosis image color correction model training method, color correction method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHUANG Q et al.: "Human-computer interaction based health diagnostics using ResNet34 for tongue image classification", Computer Methods and Programs in Biomedicine, vol. 226, pages 1-16 *
李家炜 (LI Jiawei): "Research on deep-learning-based classification of traditional Chinese medicine tongue image features", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 1, pages 056-25 *

Also Published As

Publication number Publication date
CN117094966B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN110505459B (en) Image color correction method, device and storage medium suitable for endoscope
CN108960232A (en) Model training method, device, electronic equipment and computer readable storage medium
WO2021218119A1 (en) Image toning enhancement method and method for training image toning enhancement neural network
CN110502986A (en) Identify character positions method, apparatus, computer equipment and storage medium in image
CN108921161A (en) Model training method, device, electronic equipment and computer readable storage medium
CN108510560B (en) Image processing method, image processing device, storage medium and computer equipment
CN113743384B (en) Stomach picture identification method and device
CN111179252B (en) Cloud platform-based digestive tract disease focus auxiliary identification and positive feedback system
CN109360254A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111488912B (en) Laryngeal disease diagnosis system based on deep learning neural network
CN110956080A (en) Image processing method and device, electronic equipment and storage medium
CN112836653A (en) Face privacy method, device and apparatus and computer storage medium
CN110047059B (en) Image processing method and device, electronic equipment and readable storage medium
CN107920205A (en) Image processing method, device, storage medium and electronic equipment
CN112926508B (en) Training method and device of living body detection model
CN117094966B (en) Tongue image identification method and device based on image amplification and computer equipment
CN113052768B (en) Method, terminal and computer readable storage medium for processing image
CN110110750B (en) Original picture classification method and device
CN112381073A (en) IQ (in-phase/quadrature) adjustment method and adjustment module based on AI (Artificial Intelligence) face detection
CN110097080B (en) Construction method and device of classification label
CN114972065A (en) Training method and system of color difference correction model, electronic equipment and mobile equipment
CN112529002B (en) Tongue picture classification method and device, computer equipment and storage medium
CN107948619B (en) Image processing method, device, computer readable storage medium and mobile terminal
CN108881740A (en) Image method and device, electronic equipment, computer readable storage medium
CN111767829B (en) Living body detection method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant