WO2021004402A1 - Image recognition method and apparatus, storage medium, and processor - Google Patents


Info

Publication number
WO2021004402A1
WO2021004402A1 (PCT/CN2020/100247; CN2020100247W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
coordinates
pixel
value
Prior art date
Application number
PCT/CN2020/100247
Other languages
French (fr)
Chinese (zh)
Inventor
刘根
何炳塬
解春兰
孔甜
屈奇勋
沈凌浩
贡卓琳
张帆
郑汉城
Original Assignee
深圳数字生命研究院
深圳碳云智能数字生命健康管理有限公司
Priority date
Filing date
Publication date
Application filed by 深圳数字生命研究院 and 深圳碳云智能数字生命健康管理有限公司
Publication of WO2021004402A1 publication Critical patent/WO2021004402A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • This application relates to the field of image recognition, and specifically to an image recognition method and apparatus, a storage medium, and a processor.
  • Image recognition is an important field of artificial intelligence; it refers to identifying targets and objects of various patterns in images. Common recognition objects can be roughly divided into natural-scene objects and specific-scene objects. For natural-scene images, convolutional networks are used to train suitable models, whereas specific-scene objects require secondary development of particular network models and algorithms. For recognizing data in a picture, existing optical character recognition (OCR) technology is used, but OCR can only recognize digits displayed as characters in the image; values represented by dots, or by continuous or discontinuous curves, cannot be identified.
  • The embodiments of this application provide an image recognition method and apparatus, a storage medium, and a processor, which can at least solve the problem that current image recognition methods can only recognize numerical values in character format in an image and cannot automatically recognize a curve or discrete points as numerical values.
  • An image recognition method includes: obtaining a target image to be recognized; obtaining a target region in the target image, where the image in the target region is used to reflect information about a specified type of parameter; determining the coordinates of a selected pixel in the target region; and determining the parameter value corresponding to the selected pixel coordinates based on the correlation between the value of the specified type of parameter and pixel coordinates.
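  The mapping step above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the reference rows/values are hypothetical.

```python
# Minimal sketch (hypothetical names): once the linear relationship between
# pixel rows and parameter values is known, any selected pixel in the
# target region can be converted to a parameter value.

def pixel_to_value(pixel_y, y_ref_a, value_a, y_ref_b, value_b):
    """Map a pixel row to a parameter value using two reference points.

    y_ref_a / y_ref_b are the pixel rows of two known scale marks
    (e.g. gridlines of a blood-glucose axis) and value_a / value_b are
    their actual values; pixel_y is the selected pixel's row.
    """
    slope = (value_b - value_a) / (y_ref_b - y_ref_a)
    return value_a + slope * (pixel_y - y_ref_a)

# Example: axis marks at row 300 (4 mmol/L) and row 100 (12 mmol/L);
# a pixel halfway between them should map to 8 mmol/L.
print(pixel_to_value(200, 300, 4.0, 100, 12.0))  # 8.0
```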
  • The method further includes: separating a designated color channel from the target region, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the target region; performing image binarization on the image of the designated color channel, where binarization selects the set of pixels in that image whose values are greater than a preset threshold, to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the area where each pixel in the binarized image is located, and using the selected thresholds to segment the target region; performing reference-point pixel recognition on the segmented binarized image to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the correspondence between the actual values of the at least two reference points and their pixel coordinates, and, based on that correspondence, establishing a linear relationship between the values of the specified type of parameter and pixel coordinates, which serves as the correlation.
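  The channel-separation and binarization step can be sketched with NumPy. A single global threshold is used here for brevity (the patent selects per-region thresholds from a preset set); the synthetic image and threshold value are hypothetical.

```python
import numpy as np

def binarize_channel(img_rgb, channel=1, threshold=128):
    """Separate one color channel (0=R, 1=G, 2=B) and binarize it:
    pixels above the threshold become 1, all others become 0."""
    chan = img_rgb[..., channel]
    return (chan > threshold).astype(np.uint8)

# Tiny synthetic 2x2 RGB image; the green channel carries the signal.
img = np.array([[[10, 200, 0], [10, 50, 0]],
                [[10, 130, 0], [10, 120, 0]]], dtype=np.uint8)
binary = binarize_channel(img, channel=1, threshold=128)
print(binary.tolist())  # [[1, 0], [1, 0]]
```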
  • Determining the coordinates of a selected pixel in the target region includes: performing grayscale processing on the image in the target region to obtain a grayscale image; clustering the pixels in the grayscale image to obtain multiple clusters; and selecting a designated cluster from the multiple clusters and determining the selected pixel coordinates from the pixels in the designated cluster.
  • Selecting a designated cluster from the multiple clusters and determining the selected pixel coordinates from its pixels is specifically: selecting the cluster with the fewest pixels from the multiple clusters, and determining the selected pixel coordinates from that cluster.
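  The cluster-selection idea can be sketched with a simple 1-D k-means over grayscale intensities. The patent does not specify the clustering algorithm, so k-means is an assumption here, and the synthetic image is hypothetical; the premise is that the curve occupies far fewer pixels than the background.

```python
import numpy as np

def smallest_cluster_pixels(gray, k=2, iters=20):
    """Cluster grayscale intensities with 1-D k-means and return the
    (row, col) coordinates of pixels in the smallest non-empty cluster."""
    vals = gray.ravel().astype(float)
    centers = np.linspace(vals.min(), vals.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vals[labels == j].mean()
    counts = np.bincount(labels, minlength=k)
    counts = np.where(counts == 0, vals.size + 1, counts)  # skip empty clusters
    target = counts.argmin()
    ys, xs = np.unravel_index(np.where(labels == target)[0], gray.shape)
    return list(zip(ys.tolist(), xs.tolist()))

# Background is bright (200); the "curve" is two dark pixels.
gray = np.full((4, 4), 200, dtype=np.uint8)
gray[1, 2] = gray[2, 3] = 20
print(smallest_cluster_pixels(gray))  # [(1, 2), (2, 3)]
```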
  • The image in the target region includes a curve image in a coordinate system or a discrete-point image in a coordinate system, and the curve in the curve image, or the discrete points in the discrete-point image, are used to reflect the values of the specified type of parameter at different moments.
  • The method further includes: determining the target recording time corresponding to the pixel coordinates in the curve image or discrete-point image. Determining the parameter value corresponding to the selected pixel coordinates then includes: determining, based on the correlation between the value of the specified type of parameter and pixel coordinates, the parameter value of the selected pixel coordinates at the target recording time.
  • The target recording time is determined as follows: recognize the character information in the curve image and extract the time information of the specified type of parameter from it; divide the time span between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and determine, from those time points, the target recording time to which the selected pixel coordinates belong.
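  The equal-interval division between adjacent recorded times can be sketched as follows (times in minutes; the axis labels and pixel count are hypothetical):

```python
def pixel_times(t_start_min, t_end_min, n_pixels):
    """Divide the span between two adjacent recorded times into equal
    steps, one per pixel column, and return the time (in minutes)
    assigned to each column boundary."""
    step = (t_end_min - t_start_min) / n_pixels
    return [t_start_min + i * step for i in range(n_pixels + 1)]

# Axis labels at 8:00 (480 min) and 9:00 (540 min), 4 pixel columns between them.
print(pixel_times(480, 540, 4))  # [480.0, 495.0, 510.0, 525.0, 540.0]
```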
  • Acquiring the target region in the target image includes: semantically segmenting the target image to obtain a mask image and a foreground image of the target image; and determining a region of interest from the foreground image and taking the region of interest as the target region.
  • When the target image is of a first type, an efficient neural network model is used to perform semantic segmentation on the target image.
  • The efficient neural network model includes an initialization module and bottleneck modules, where each bottleneck module includes three convolutional layers: the first convolutional layer performs dimensionality reduction, the second performs dilated convolution, full convolution, or asymmetric convolution,
  • and the third convolutional layer performs dimensionality expansion.
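  The shape-preserving behavior of such a bottleneck can be checked with the standard convolution output-size formula; the kernel sizes and dilation rate below are illustrative choices, not values taken from the patent.

```python
def conv_out_size(n, kernel, stride=1, padding=1 - 1, dilation=1):
    """Spatial output size of a convolution (standard formula):
    floor((n + 2p - d*(k-1) - 1) / s) + 1."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# A bottleneck: 1x1 reduce -> 3x3 dilated (rate 2) -> 1x1 expand,
# with padding chosen so the spatial size is preserved throughout.
n = 64
n = conv_out_size(n, kernel=1)                          # 1x1 projection: 64
n = conv_out_size(n, kernel=3, padding=2, dilation=2)   # dilated 3x3:   64
n = conv_out_size(n, kernel=1)                          # 1x1 expansion: 64
print(n)  # 64
```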
  • When the target image is of a second type, a bilateral segmentation network model is adjusted, and the adjusted segmentation network model is used to perform semantic segmentation on the target image.
  • The adjustment of the segmentation network model is as follows: the bilateral segmentation network model includes a backbone network and an auxiliary network.
  • The backbone network is composed of two layers, each of which includes a convolutional layer, a batch normalization layer, and a nonlinear activation function, and the number of feature maps of the backbone network's output channels is reduced; the auxiliary network framework adopts a lightweight model.
  • The lightweight model includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet. The number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
  • Determining the region of interest from the foreground image includes: determining the characteristic region in the foreground image and the corner coordinates of the target geometric region, where the characteristic region is the region in the foreground image that contains the specified type of parameter information; calculating a projective transformation matrix from the corner coordinates; and applying the projective transformation to the pixels in the characteristic region to obtain the region of interest.
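  Computing a projective (homography) transform from four corner correspondences can be sketched with a direct linear transform solved via NumPy. The corner coordinates below are hypothetical, and this is a sketch of the general technique rather than the patent's exact procedure.

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 projective transform mapping four src corners to
    four dst corners (direct linear transform with h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply a homography to one point (homogeneous divide)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Rectify a skewed quadrilateral onto a 100x50 rectangle.
src = [(12, 8), (210, 15), (205, 118), (9, 110)]
dst = [(0, 0), (100, 0), (100, 50), (0, 50)]
H = homography(src, dst)
u, v = warp_point(H, 12, 8)              # first src corner
print(abs(u) < 1e-6, abs(v) < 1e-6)      # True True
```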
  • Before acquiring the target image to be recognized, the method further includes: determining whether the image to be recognized is the target image; and when it is, determining to perform semantic segmentation on the target image.
  • the target image is the image with the target area, and the image in the target area is used to reflect the specified type of parameter information.
  • One embodiment of the present application is a method for recognizing images from an Abbott continuous blood glucose meter.
  • An image from the continuous blood glucose meter that reflects continuous changes in blood glucose is the target image; of course, if the method is used to identify the values of other curve images or discrete-point images, then the image containing the curve or discrete points that reflect the corresponding values is the target image.
  • The above method further includes: dividing the region of interest into a preset number of non-overlapping sliders; determining the feature value of each of the sliders to obtain the preset number of feature values; combining the feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
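  The slider-feature step can be sketched as follows. Block-mean intensity is used as the feature value here as an assumption (the patent does not specify the feature), and the SVM classification stage is omitted; the feature vector produced would be the classifier's input.

```python
import numpy as np

def block_features(img, n_rows=2, n_cols=2):
    """Split an image into non-overlapping blocks ("sliders"), take each
    block's mean intensity as its feature value, and concatenate the
    means into a feature vector for a downstream classifier."""
    h, w = img.shape
    bh, bw = h // n_rows, w // n_cols
    feats = [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].mean()
             for r in range(n_rows) for c in range(n_cols)]
    return np.array(feats)

img = np.arange(16, dtype=float).reshape(4, 4)
print(block_features(img).tolist())  # [2.5, 4.5, 10.5, 12.5]
```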
  • The specified type of parameter information includes curve information reflecting the trend of blood glucose data over time, or value information of discrete points in a coordinate system reflecting the trend of blood glucose data over time.
  • the above method further includes: displaying the parameter value corresponding to the selected pixel point coordinate.
  • determining the coordinate of the selected pixel in the target area includes: receiving a user's instruction for the target image; and determining the coordinate of the selected pixel according to the instruction.
  • the instruction is determined based on one of the following information: receiving touch position information of the user on the human-computer interaction interface where the target image is located; or receiving query information input by the user.
  • The method further includes: before determining the selected pixel coordinates based on the touch-point position, judging whether the touch-point position is located in the target region; and when the judgment result indicates that it is, triggering determination of the selected pixel coordinates.
  • A data display method includes: displaying a target image to be recognized; displaying a region of interest in the target image, where the image in the region of interest reflects how the specified type of parameter changes over time; displaying the coordinates of a selected pixel in the region of interest and the target recording time corresponding to those coordinates; and displaying the parameter value of the selected pixel coordinates at the target recording time, where the parameter value is determined based on the correlation between the value of the specified type of parameter and pixel coordinates.
  • The correlation is determined as follows: a designated color channel is separated from the region of interest, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the region of interest; image binarization is performed on the image of the designated color channel to obtain a binarized image; the threshold corresponding to the area where each pixel in the binarized image is located is selected from a preset threshold set, and the selected thresholds are used to segment the region of interest; reference-point pixel recognition is performed on the segmented binarized image to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and the correspondence between the actual values of the at least two reference points and their pixel coordinates is determined, and, based on that correspondence, a linear relationship between the values of the specified type of parameter and pixel coordinates is established, with the linear relationship serving as the correlation.
  • An image recognition method includes: detecting the position of a user's touch point in a target image; determining, based on the touch-point position, the selected pixel coordinates and the target recording time corresponding to those coordinates; determining, based on the correlation between the value of the specified type of parameter and pixel coordinates, the parameter value of the selected pixel coordinates at the target recording time; and outputting the parameter value.
  • Before determining the selected pixel coordinates based on the touch-point position, the method further includes: judging whether the touch-point position is located in the region of interest of the target image, where the image in the region of interest reflects how the specified type of parameter changes over time; and when the judgment result indicates that the touch point is located in the region of interest, triggering determination of the selected pixel coordinates.
  • The method further includes: separating a designated color channel from the region of interest, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the region of interest; performing image binarization on the image of the designated color channel to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the area where each pixel in the binarized image is located, and using the selected thresholds to segment the region of interest; performing reference-point pixel recognition on the segmented binarized image to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the correspondence between the actual values of the at least two reference points and their pixel coordinates, establishing, based on that correspondence, a linear relationship between the value of the specified type of parameter and pixel coordinates, and using the linear relationship as the correlation.
  • An image recognition method includes: detecting query information input by a user; determining, based on the query information, the coordinates of a selected pixel in the target image and the target recording time corresponding to those coordinates; determining the parameter value of the selected pixel coordinates at the target recording time based on the correlation between the value of the specified type of parameter and pixel coordinates; and outputting the parameter value.
  • Before determining the coordinates of the selected pixel in the target image based on the touch-point position, the method further includes: judging whether the touch-point position is located in the region of interest of the target image, where the image in the region of interest reflects how the specified type of parameter changes over time; and when the judgment result indicates that the touch-point position is located in the region of interest, triggering determination of the selected pixel coordinates.
  • Before determining the parameter value of the selected pixel coordinates at the target recording time based on the correlation between the values of the specified type of parameter and pixel coordinates, the method further includes: separating a designated color channel from the region of interest, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the region of interest; performing image binarization on the image of the designated color channel, where binarization selects the set of pixels in that image greater than a preset threshold to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the area where each pixel in the binarized image is located, and using the selected thresholds to segment the region of interest; performing reference-point pixel recognition on the segmented binarized image to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the correspondence between the actual values of the at least two reference points and their pixel coordinates, establishing, based on that correspondence, a linear relationship between the value of the specified type of parameter and pixel coordinates, and taking the linear relationship as the correlation.
  • An image recognition apparatus includes: a first acquisition module configured to acquire a target image to be recognized; a second acquisition module configured to acquire a target region in the target image, where the image in the target region reflects the specified type of parameter information; a first determining module configured to determine the coordinates of a selected pixel in the target region; and a second determining module configured to determine the parameter value corresponding to the selected pixel coordinates based on the correlation between the value of the specified type of parameter and pixel coordinates.
  • The apparatus further includes: a separation module configured to separate a designated color channel from the target region, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the target region;
  • a processing module configured to perform image binarization on the image of the designated color channel, where binarization selects the set of pixels in that image greater than a preset threshold to obtain a binarized image;
  • a selection module configured to select, from a preset threshold set, the threshold corresponding to the area where each pixel in the binarized image is located, and to use the selected thresholds to segment the target region;
  • a fitting module configured to perform reference-point pixel recognition on the segmented binarized image to obtain the pixel coordinates of at least two reference points of the standard color band in the image;
  • and a building module configured to determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, and to establish, based on that correspondence, a linear relationship between the values of the specified type of parameter and pixel coordinates.
  • The first determining module includes: a grayscale processing unit configured to perform grayscale processing on the image in the target region to obtain a grayscale image; a clustering unit configured to cluster the pixels in the grayscale image to obtain multiple clusters; and a selection unit configured to select a designated cluster from the multiple clusters and determine the selected pixel coordinates from the pixels in the designated cluster.
  • the selection unit is configured to select a cluster with the least number of pixels from the plurality of clusters, and determine the coordinates of the selected pixel from the cluster with the least number of pixels.
  • the image in the target area includes: a curve image in a coordinate system or a discrete point image in the coordinate system, and the curve in the curve image or the discrete point in the discrete point image is used to reflect a specified type parameter Values at different moments.
  • The first determining module is further configured to determine the target recording time corresponding to the pixel coordinates in the curve image; the second determining module is further configured to determine, based on the correlation between the value of the specified type of parameter and pixel coordinates, the parameter value of the selected pixel coordinates at the target recording time.
  • The first determining module further includes: a first recognition unit configured to recognize character information in the curve image and extract the time information of the specified type of parameter from it; a first dividing unit configured to divide the time span between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and a first determining unit configured to determine, from the multiple time points, the target recording time to which the selected pixel coordinates belong.
  • The second acquisition module includes: a segmentation unit configured to perform semantic segmentation on the target image to obtain a mask image and a foreground image of the target image; and a second determining unit configured to determine the region of interest from the foreground image and take the region of interest as the target region.
  • The segmentation unit is configured to use the efficient neural network model to perform semantic segmentation on the target image when the target image is of the first type.
  • The efficient neural network model includes an initialization module and bottleneck modules, where each bottleneck module includes three convolutional layers: the first performs dimensionality reduction, the second performs dilated convolution, full convolution, or asymmetric convolution, and the third performs dimensionality expansion.
  • The segmentation unit is also configured to adjust the bilateral segmentation network model when the target image is of the second type and to use the adjusted segmentation network model to perform semantic segmentation on the target image, where the adjustment is as follows: the bilateral segmentation network model includes a backbone network and an auxiliary network.
  • The backbone network is composed of two layers, each of which includes a convolutional layer, a batch normalization layer, and a nonlinear activation function, and the number of feature maps of the backbone network's output channels is reduced.
  • The auxiliary network framework adopts a lightweight model, which includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet; the number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
  • the device further includes: a third determining module configured to determine whether the image to be recognized is a target image; and when the image to be recognized is a target image, determine to perform semantic segmentation on the target image.
  • The third determining module is further configured to determine the type of the region of interest by: dividing the region of interest into a preset number of non-overlapping sliders; determining the feature value of each slider to obtain the preset number of feature values; combining the feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
  • The specified type of parameter information includes curve information reflecting the trend of blood glucose data over time, or value information of discrete points in a coordinate system reflecting the trend of blood glucose data over time.
  • the device further includes: a display module for displaying the parameter values corresponding to the selected pixel coordinates.
  • the first determining module is further configured to receive a user's instruction for the target image; determine the coordinates of the selected pixel according to the instruction.
  • the instruction is determined based on one of the following information: receiving position information of the user's touch point on the human-computer interaction interface where the target image is located; or receiving query information input by the user.
  • The apparatus further includes: a judging module configured, when the instruction is the user's touch-point position information on the human-computer interaction interface where the target image is located, to judge whether the touch-point position is located in the target region before the selected pixel coordinates are determined from the touch-point position; and a trigger module configured to trigger determination of the selected pixel coordinates when the judgment result indicates that the touch-point position is located in the target region.
  • A non-volatile storage medium includes a stored program, where when the program runs, the device where the non-volatile storage medium is located is controlled to execute the image recognition method described above.
  • a processor configured to run a program, wherein the image recognition method described above is executed when the program is running.
  • By determining the parameter value corresponding to the selected pixel coordinates according to the correlation between pixel coordinates in the target image and the value of the specified type of parameter, the parameter value corresponding to any pixel in the image can be recognized. This realizes recognition of parameter values represented by non-character information in an image and achieves the purpose of automatically recognizing pixels in the image as corresponding parameter values, solving the technical problem that current image recognition methods can only recognize character-format values in an image and cannot automatically recognize a curve or discrete points as values.
  • FIG. 1 is a flowchart of an optional method for identifying blood glucose data in an embodiment of the present application
  • FIG. 2 is a flowchart of an image recognition method in another embodiment of the present application.
  • FIG. 3 is an example diagram of an optional ROI region extraction according to an embodiment of the application.
  • FIGS. 4a-4d are exemplary diagrams of an optional blood glucose curve detection and segmentation process according to an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of an optional BiSeNet-Xception39 simplified model according to an embodiment of the application.
  • FIG. 6 is an optional 8-hour image R-square error distribution statistical result according to an embodiment of the application.
  • FIG. 7 is an optional 8-hour image error value distribution diagram according to an embodiment of the application.
  • FIG. 8 is a schematic diagram of an optional 24-hour image R-square error distribution statistical result according to an embodiment of the application.
  • FIG. 9 is an optional 24-hour image error value distribution diagram according to an embodiment of the application.
  • FIG. 10 is a structural block diagram of an image recognition device according to an embodiment of the present application.
  • FIG. 11 is a structural block diagram of another optional image recognition device according to an embodiment of the present application.
  • FIG. 12 is a flowchart of a data display method according to an embodiment of the application.
  • FIG. 13 is a flowchart of another image recognition method according to an embodiment of the application.
  • FIG. 14 is a flowchart of another image recognition method according to an embodiment of the application.
  • the tester cannot grasp continuous blood glucose values at all times, and no basis can be provided for subsequent systematic blood glucose management and real-time pushing of intervention plans.
  • the existing OCR technology cannot identify the blood glucose value at any point in the blood glucose curve.
  • the correlation between pixel coordinates and actual parameter values is used to realize the recognition of the parameter value at any point in the curve, simplify the process of converting images into quantitative values, and store them in a database in a certain format, providing support for subsequent blood glucose analysis and intervention plan generation.
  • it can also facilitate the export and storage of blood glucose data, enable recording and managing blood glucose on a mobile phone, and improve the user experience.
  • the recognition of blood glucose data in a blood glucose image is taken as an example to illustrate how to identify corresponding parameter values at the pixel level. As shown in FIG. 1, the process includes the following steps:
  • Step S102: Receive an image uploaded by the user.
  • Step S104: Identify whether the image uploaded by the user is a blood glucose meter image to be processed.
  • the deep network model is used for image classification before calling the algorithm model, such as MobileNet, Xception, SqueezeNet and other image classification models.
  • the confidence level output by the classification model will be used to determine whether the current image is the image to be processed; for example, the confidence threshold is set to 0.85 to ensure the quality of the image uploaded by the user.
  • Step S106: Perform image segmentation using the semantic segmentation network. Specifically, segment the foreground information in the entire image, that is, the highlighted screen part of the blood glucose meter image.
  • the images uploaded by users contain various noises.
  • semantic segmentation models in deep learning are used in the image pre-segmentation part, such as BiSeNet, ICNet, PSPNet, ENet and other semantic segmentation models. Considering the data complexity and the speed requirements of the actual application, a network model with real-time segmentation characteristics is selected.
  • Step S108: Perform image correction. Perform quadrilateral fitting and corner detection according to the semantic segmentation results, and return the coordinate information of the four corners of the screen in an orderly manner. Use the corner information to calculate the projection transformation matrix and perform the image projection transformation. Image orientation judgment: rotate the image according to grayscale, color, texture and other information in the image, and return the positively oriented image of the blood glucose meter screen as the ROI (Region of Interest).
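As an illustrative sketch only (not part of the patent disclosure), returning the four screen corners "in an orderly manner" in Step S108 can be approximated with a common sum/difference heuristic; the helper name `order_corners` is hypothetical:

```python
def order_corners(pts):
    """Order four (x, y) corner points as top-left, top-right,
    bottom-right, bottom-left using a sum/difference heuristic."""
    tl = min(pts, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    tr = max(pts, key=lambda p: p[0] - p[1])  # largest x - y
    bl = min(pts, key=lambda p: p[0] - p[1])  # smallest x - y
    return [tl, tr, br, bl]

corners = [(10, 200), (15, 12), (220, 18), (210, 190)]
print(order_corners(corners))  # [(15, 12), (220, 18), (210, 190), (10, 200)]
```

The ordered corners could then be paired with the corners of the target rectangle to compute the projection transformation matrix, for example via OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`.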
  • Step S110: Extract local standard deviation and color features to judge the image type, where the image types include N-hour and 24-hour images, and N is less than 24.
  • Step S112: Use the SVM classifier to classify the image, and then use the standard color band to detect and perform the segmentation of the blood glucose curve.
  • For the 24-hour blood glucose image, OCR is used to identify the date information; for the 8-hour blood glucose image, OCR is used to identify the start time and end time of the blood glucose device scan.
  • Step S114: Determine the mapping relationship between the blood glucose level and the image pixels.
  • Step S116: Calculate the blood glucose value of a given pixel using the above mapping relationship, and output the calculated blood glucose value.
  • the embodiments of the present application provide a method embodiment of an image recognition method. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as a set of computer-executable instructions, and, although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order than here.
  • Fig. 2 is a flowchart of an image recognition method according to another embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
  • Step S202: Acquire a target image to be recognized;
  • Step S204: Acquire a target area in the target image, where the image in the target area is used to reflect specified type parameter information;
  • Step S206: Determine the coordinates of the selected pixel in the target area;
  • Step S208: Determine the value of the parameter corresponding to the selected pixel point coordinates based on the correlation between the value of the specified type parameter and the pixel point coordinates.
  • the correlation between the pixel coordinates in the target image and the value of the specified type parameter is used to identify the parameter value corresponding to any pixel coordinate in the image, so that the recognition of parameter values represented by non-character information achieves the purpose of automatically recognizing a pixel in the image as the corresponding parameter value, and thus solves the problem that the current image recognition method can only recognize values in character format in the image.
  • the above-mentioned association relationship can be expressed in many ways, for example, it can be expressed as a mapping relationship, or as a linear function relationship.
  • the former can be implemented in the following way: before determining the value of the parameter corresponding to the selected pixel point coordinates based on the correlation between the value of the specified type parameter and the pixel point coordinates, a designated color channel is separated from the target area, wherein the designated color channel is the one among the R, G and B color channels that is the same as the color channel corresponding to the standard color band of the target area; image binarization is performed on the image of the designated color channel, the binarization process being to select the parts of the designated-channel image greater than a preset threshold, to obtain a binarized image; a threshold corresponding to the area where each pixel in the binarized image is located is selected from a preset threshold set, and the selected thresholds are used to perform image segmentation on the target area; reference point pixel identification is performed on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image.
  • the above-mentioned association relationship is expressed as a mapping relationship.
  • a standard color band is required.
  • This detection process is a way to effectively establish the relationship between the actual blood glucose value and the pixel coordinates of the blood glucose image, and its purpose is to find the corresponding linear relationship between the pixel coordinates on the blood glucose curve and the actual blood glucose value.
  • the R, G and B color channels are separated in the ROI area of the blood glucose image. Because the standard color band presents blue characteristics, the B channel is extracted for image processing.
  • the channel image is grayscaled and the gray values are normalized; then adaptive threshold segmentation is applied to the image, and image morphology processing is used to denoise the segmented image, completing the standard color band image segmentation operation; finally, horizontal straight-line fitting is performed on the binary image from the above process, and the fitting result is the pixel-coordinate height of the upper and lower lines of the standard color band in the image.
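A minimal sketch of recovering the band-edge pixel heights from the segmented binary image, assuming the horizontal straight-line fit can be approximated by scanning row sums (the function name `band_edges` and the toy mask are illustrative only, not the patent's implementation):

```python
def band_edges(mask):
    """Given a binary image (list of rows of 0/1), return the row indices
    of the first and last rows whose foreground count exceeds half the
    image width -- a crude stand-in for horizontal straight-line fitting."""
    width = len(mask[0])
    rows = [i for i, row in enumerate(mask) if sum(row) > width // 2]
    return rows[0], rows[-1]  # upper and lower edge pixel heights

# A toy 8x6 mask with a "band" spanning rows 2..5
mask = [[0] * 6, [0] * 6,
        [1] * 6, [1] * 6, [1] * 6, [1] * 6,
        [0] * 6, [0] * 6]
print(band_edges(mask))  # (2, 5)
```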
  • the blood glucose values of the upper and lower edges of the standard color band can be 3.9 and 7.8, respectively.
  • the linear relationship is established by the known actual blood glucose value on the actual standard color band and the corresponding pixel coordinate height:
  • line_rho: pixel height of the blood glucose curve in the image;
  • std_upper: pixel height of the upper edge of the standard color band;
  • std_lower: pixel height of the lower edge of the standard color band.
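The linear relationship itself is not written out above; a straightforward interpolation consistent with these variables would be the following sketch, where assigning 7.8 to the upper edge and 3.9 to the lower edge is an illustrative assumption that depends on the image's row-coordinate convention:

```python
def glucose_from_pixel(line_rho, std_upper, std_lower,
                       val_upper=7.8, val_lower=3.9):
    """Linearly interpolate the blood glucose value for the curve's
    pixel height (line_rho) between the standard color band edges."""
    # fraction of the way from the lower edge toward the upper edge
    t = (line_rho - std_lower) / (std_upper - std_lower)
    return val_lower + t * (val_upper - val_lower)

# Curve halfway between the band edges -> midpoint of 3.9 and 7.8
print(glucose_from_pixel(line_rho=80, std_upper=40, std_lower=120))
```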
  • the pixels can be determined by clustering, or the area of interest can be used as the target area.
  • the following method can be used to process the image in the target area: perform gray-scale processing to obtain a gray-scale image; perform clustering processing on each pixel in the gray-scale image to obtain multiple clusters; select a specified cluster from the multiple clusters, and determine the coordinates of the selected pixel from all pixels in the specified cluster.
  • the target image in this embodiment may be an original image generated from blood glucose meter data that needs to be displayed to the customer, for example a graph of blood glucose content values against time values, where the graph is drawn in a coordinate system formed by the time axis and the blood glucose content axis; this graph is the original image to be processed for blood glucose meter data analysis, that is, the target image.
  • the principle for selecting the designated cluster can be flexibly determined according to the actual situation; for example, the cluster with the least number of pixels is selected from the multiple clusters, and the coordinates of the selected pixel are determined from that cluster.
  • the input image is intercepted by the local-area rect operation, and the parameters are set as shown in Table 1:
  • the blood glucose curve in the image can be formed by connecting black and red (colors are not distinguished in the figure). Therefore, the color channel of the image is separated and the R channel is extracted.
  • the grayscale distribution of the image presents three distribution trends of black, gray and white, so the number of clustering centers is set to 3 (as shown in Figure 4c) and image grayscale clustering is performed by KMeans; the blood glucose curve occupies the least area in the image, so the category with the fewest members in the clustering result is extracted as the category of the blood glucose curve (as shown in FIG. 4d), and image post-processing of the category image is performed to complete the blood glucose curve segmentation and detection process.
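The clustering step can be illustrated with a toy one-dimensional k-means over grayscale values; in practice a library implementation such as scikit-learn's `KMeans` would likely be used, so this pure-Python version is only a sketch:

```python
def kmeans_1d(values, k=3, iters=50):
    """Tiny 1-D k-means: cluster grayscale values into k groups."""
    # crude initialization: spread centers over the sorted values
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Three grayscale trends: black (curve), gray, white (background)
pixels = [10, 12, 15] + [120, 125, 130, 128] * 5 + [240, 245, 250] * 8
centers, groups = kmeans_1d(pixels, k=3)
curve_cluster = min(groups, key=len)  # the curve occupies the least area
print(sorted(curve_cluster))  # [10, 12, 15]
```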
  • the target area can be determined according to the region of interest, specifically: semantically segment the target image to obtain the mask image and foreground image of the target image; determine the region of interest from the foreground image and use the region of interest as target area.
  • the region of interest can be determined from the foreground image by the following methods: determine the feature region in the foreground image, and the corner coordinates of the target geometric region, where the feature region is the region in the foreground image that contains the specified type of parameter information; Point coordinates calculate the projection transformation matrix; perform projection transformation on the pixel points in the characteristic area to obtain the region of interest.
  • the image rotation processing may also be included to ensure that the region of interest can be correctly identified.
  • the image in the above-mentioned target area includes: a curve image in a coordinate system, and the curve in the curve image is used to reflect the value of the specified type parameter at different times.
  • step S206 is determining the coordinates of the selected pixel; determining the parameter value corresponding to the selected pixel can be expressed as the following process: determine, based on the correlation between the value of the specified type parameter and the pixel point coordinates, the parameter value corresponding to the selected pixel point coordinates at the target recording time.
  • the target recording time can be determined by the following method: identify the character information in the curve image, and extract the time information of the specified type parameter from the character information; divide the length of time between any two adjacent recording times in the time information at equal intervals according to the number of pixels to obtain multiple time points; from the multiple time points, determine the target recording time to which the coordinates of the selected pixel point belong. For example, if the number of pixels in the horizontal direction between T1 and T2 is N, the time corresponding to the M-th pixel in the horizontal direction between T1 and T2 is T1+[M*(T2-T1)/N].
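The interpolation formula T1+[M*(T2-T1)/N] can be sketched directly (times expressed in minutes for simplicity):

```python
def pixel_time(t1, t2, m, n):
    """Time for the m-th of n horizontal pixels between recording
    times t1 and t2 (in minutes), per T1 + M*(T2-T1)/N."""
    return t1 + m * (t2 - t1) / n

# 60 pixels between 08:00 (480 min) and 10:00 (600 min)
print(pixel_time(480, 600, 15, 60))  # pixel 15 -> 510.0 min, i.e. 08:30
```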
  • OCR technology can be used to recognize the above-mentioned character information, but it is not limited to this.
  • the local area RECT is intercepted (RECT is the abbreviation of rectangular frame, which represents the rectangular area intercepted from the image) for character recognition. The RECT area parameter is set to (235, 10, 20, 225) for the 8-hour blood glucose image and to (205, 85, 20, 225) for the 24-hour blood glucose image. Taking the 8-hour parameters as an example: 235 represents the vertical coordinate of the starting point of the captured area in the 256*256 input image, 10 is the horizontal coordinate of the starting point of the captured area in the 256*256 input image, 20 represents the height of the captured area, and 225 represents the width of the captured area. Specific examples are shown in Table 2:
  • an efficient neural network model is used to perform semantic segmentation on the target image.
  • the efficient neural network model includes an initialization module and a bottleneck module, where each bottleneck module includes three convolutional layers: the first convolutional layer is used for dimensionality reduction, the second convolutional layer is used for dilated (hole) convolution, full convolution and asymmetric convolution, and the third convolutional layer is used for dimension raising; when the target image is of the second type, adjust the bilateral segmentation network model, and use the adjusted segmentation network model to perform semantic segmentation on the target image, where adjusting the segmentation network model includes:
  • the bilateral segmentation network model includes a backbone network and an auxiliary network.
  • the backbone network is composed of two layers, each of which includes a convolutional layer, a batch normalization layer and a nonlinear activation function, so as to reduce the number of feature maps of the backbone network output channel; the auxiliary network model framework adopts a lightweight model to reduce the number of feature maps of the backbone network output channel.
  • the lightweight model includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet; wherein the number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
  • the above-mentioned first type and second type respectively correspond to the images in the first data set and the images in the second data set, that is, the images in the first data set are the images of the first type, and the images in the second data set are the second Type of image.
  • the number of images in the first data set is less than the number of images in the second data set.
  • an efficient neural network (that is, the segmentation model used for the first type) or a segmentation network model obtained by adjusting the bilateral segmentation network model (that is, the one used for the second type) performs semantic segmentation on the target image.
  • the segmentation network model obtained by adjusting the bilateral segmentation network model is used to perform semantic segmentation on the target image; when processing large-scale data, it has the advantages of fast processing speed and the ability to protect the size of the receptive field while retaining a certain richness of spatial information.
  • the function of the above-mentioned backbone network is to retain abundant spatial information, and the function of the auxiliary network is to protect the size of the receptive field.
  • due to the strong processing capability of the adjusted bilateral segmentation network model, it can not only process large-scale data but also support small-scale data processing.
  • an efficient neural network model can be used to process the image, and when the data scale is large, it can also be manually or automatically switched to the adjusted bilateral segmentation network model for processing.
  • the size of the data is determined based on the number of specific query requests in the context of specific hardware capabilities, or based on the statistical number of query requests within a time period.
  • the query request can be used to request image processing (for example, Recognize images uploaded by users).
  • the data set can be composed of images to be processed.
  • the second data set is a data set that satisfies the following condition: within a specific period of time, the number of target images that need to be processed in the data set (for example, images to be recognized) is greater than the preset threshold;
  • the first data set is a data set that meets the following conditions: the number of target images (for example, images to be recognized) that need to be processed in the data set within a specific time period is less than the preset threshold.
  • when the number of target images is counted, it may be counted once every preset time length, and any time period over which the number of target images is counted is taken as the above-mentioned specific time period.
  • the starting time point of this action is taken as the starting point, and the preset time between it and the midpoint or end point (for example, 3ms, 5ms, 30ms, 50ms, 100ms, 500ms, 1s or 2s, etc.) is used as the specific time period.
  • any one of the 12 time periods is the above specific time period.
  • the data set corresponding to the certain time period is the second data set.
  • when the number of received target images or target images to be processed is less than a preset threshold, the data set corresponding to that time period is the first data set.
  • the aforementioned specific time period may be a predetermined fixed time period, for example, every 1 s, 1 min, 1 h, and so on.
  • the number of images in the first data set and the second data set can be dynamically changed.
  • based on the number of images in the first data set and the second data set, it can be determined whether to use the model corresponding to the first type or the model corresponding to the second type for semantic segmentation.
  • the size of the data is determined based on the number of specific query network requests in the context of specific hardware capabilities.
  • a data set in which the number of target images is greater than or equal to 50,000 is regarded as the second data set, and one with fewer than 50,000 as the first data set; for example, when using a GPU server with computing power equivalent to 24 NVIDIA V100 GPUs, the response time is less than 3 seconds.
  • the size of the data is determined based on the statistical number of query requests within time periods in the context of specific hardware capabilities.
  • the model corresponding to the second type can be used for semantic segmentation, while there are fewer users querying data at night from 00:00 to 08:00, so the model corresponding to the first type can be used for semantic segmentation during that period.
  • receive the target image uploaded by the user; determine the upload time of the target image; determine the time period corresponding to the upload time; determine, according to the time period, the segmentation network model for semantic segmentation of the target image, and perform semantic segmentation on the target image with the determined segmentation network model.
  • the type of the target image can also be determined; when the type is the preset type, it is determined to perform semantic segmentation on the target image.
  • determining the type of the target image can be expressed as the following process: divide the target image into a preset number of non-overlapping sliders; determine the feature values of the preset number of non-overlapping sliders to obtain a preset number of feature values; combine the preset number of feature values into a feature vector; input the feature vector into a support vector machine classifier for analysis to obtain the target image type.
  • the types of blood glucose images of Abbott's continuous blood glucose device include two types: 8 hours and 24 hours.
  • the input image of size 256*256 is divided into 256 non-overlapping 16*16 sliders, and the local variance and the average blue-channel pixel value of each relatively independent slider are calculated; the variance and the average blue-channel pixel value are combined into the feature value of the independent slider, and the 256 slider features are then combined into a feature vector with a dimension of 512; finally, an SVM (support vector machine) classifier realizes the binary classification of images and completes the classification of blood glucose images.
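The feature construction described above can be sketched as follows on a synthetic image; `tile_features` is a hypothetical helper name, and the SVM training itself (e.g. with scikit-learn's `SVC`) is omitted:

```python
def tile_features(gray, blue, tile=16):
    """Build the feature vector: for each non-overlapping tile x tile
    block, append (local variance of gray, mean blue-channel value)."""
    size = len(gray)  # assume a square size x size image
    feats = []
    for r in range(0, size, tile):
        for c in range(0, size, tile):
            g = [gray[i][j] for i in range(r, r + tile)
                 for j in range(c, c + tile)]
            b = [blue[i][j] for i in range(r, r + tile)
                 for j in range(c, c + tile)]
            mean_g = sum(g) / len(g)
            feats.append(sum((v - mean_g) ** 2 for v in g) / len(g))
            feats.append(sum(b) / len(b))
    return feats

gray = [[100] * 256 for _ in range(256)]  # synthetic flat image
blue = [[50] * 256 for _ in range(256)]
vec = tile_features(gray, blue)
print(len(vec))  # 256 tiles x 2 features = 512
```

The resulting 512-dimensional vectors could then be fed to a binary classifier for the 8-hour/24-hour decision.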
  • image segmentation is performed on the image that is determined to be recognized, so as to accurately extract the highlight part of the screen.
  • the network has the characteristics of few parameters, small model and high accuracy.
  • the basic implementation units of the pre-segmentation network ENET are: (1) the initialization module, (2) the bottleneck module designed based on the ResNet idea. Each module contains three convolutional layers.
  • the first convolutional layer achieves dimensionality reduction, the second convolutional layer implements dilated (hole) convolution, full convolution, asymmetric convolution, etc., and the third convolutional layer implements the dimensionality-raising function.
  • Each convolution kernel includes Batch Normalization and PReLU.
  • training set: 515 images; validation set: 65 images; test set: 64 images.
  • All collected images cover multiple angles, and all photos have uniform illumination distribution.
  • the initial learning rate is 0.005, and the learning rate decays once every 30 iterations.
  • the total number of iterations epoch is 300, but not limited to 300. All specific network parameters can be adjusted according to the actual data.
  • the specific training and test performance are shown in Table 2.
  • the test environment is: memory 16G, CPU model Intel(R)Core(TM)i5-7500CPU@3.40GHz.
  • the model performance is shown in the following table.
  • IoU (Intersection over Union): the final result is the intersection of GT and PR divided by the union of GT and PR; it is a common metric in object detection and segmentation.
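The IoU metric can be sketched as (binary masks represented here as flat 0/1 lists for brevity):

```python
def iou(gt, pr):
    """Intersection over Union between two binary masks:
    |GT intersect PR| / |GT union PR|."""
    inter = sum(1 for g, p in zip(gt, pr) if g and p)
    union = sum(1 for g, p in zip(gt, pr) if g or p)
    return inter / union if union else 1.0

gt = [1, 1, 1, 0, 0, 0]
pr = [1, 1, 0, 1, 0, 0]
print(iou(gt, pr))  # 2 / 4 = 0.5
```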
  • the performance of the ENET semantic segmentation network model is shown in Table 3.
  • the model performance of the network on large data sets does not meet further application requirements.
  • the total number of samples is 4912, which are divided into 4104 training sets, 608 verification sets, and 200 test sets.
  • the initial learning rate is 0.01, and the learning rate decays once every 30 iterations.
  • the total number of iterations epoch is 300. All network parameters include but are not limited to the above values. The specific data can be adjusted according to the actual data.
  • the model performance is shown in Table 4:
  • the original segmentation model BiSeNet has certain advantages in speed and accuracy on public data sets (data set Cityscapes, data set CamVid, data set COCO-Stuff, etc.).
  • the samples are relatively clean and less complex than the data in the public data sets; therefore, the BiSeNet semantic segmentation network is appropriately adjusted and simplified.
  • the adjustment ideas are mainly divided into: (1) the spatial information processing layer (Spatial Path), (2) the receptive field processing layer (Context Path), (3) the number of input-output channels (feature maps) between network layers, and (4) compressing the input image size.
  • the specific content of the simplified modification is as follows: (1) the Spatial Path part of the backbone network is reduced from the original 3-layer network (where each layer includes a common convolution layer conv, a batch normalization layer Batch Normalization, and a nonlinear activation function ReLU) to a 2-layer network, as shown by Layer1 and Layer2 in Figure 2. At the same time, the output channels are reduced from 128 feature maps to 64 feature maps, which greatly reduces the network parameters and effectively compresses the model size.
  • the above-mentioned designated type parameter information includes curve information used to reflect the change trend of blood glucose data or value information of discrete points in the coordinate system, wherein each discrete point corresponds to a blood glucose value at each sampling time.
  • the value of the parameter corresponding to the selected pixel point coordinate at the target recording time can also be displayed.
  • the coordinates of selected pixels in the target area can be determined by the following methods: detecting a user's instruction for the target image; determining the coordinates of the selected pixel according to the instruction.
  • the instruction is determined based on one of the following information: the user's touch position on the human-computer interaction interface where the target image is located; and the query information input by the user.
  • the following processing can also be performed: determine whether the touch point position is located in the target area; when the determination result indicates that the touch point position is located in the target area, trigger the determination of the selected pixel coordinates.
  • data analysis and result statistics are performed on 100 8-hour blood glucose images and 100 24-hour blood glucose images.
  • For the 8-hour images, 98 blood glucose images can be effectively identified (the image trend of the blood glucose values recognized by this method is consistent with the image trend of the blood glucose values recognized by the scanner), and the error range is about plus or minus 0.4, which is in line with the actual application scenario.
  • For the 24-hour images, all 100 blood glucose images can be effectively identified (the image trend of the blood glucose values recognized by this method is consistent with the image trend of the blood glucose values recognized by the scanner), and the error range is about -0.6 to 0.4, which meets the need of supplementing missing blood glucose values.
  • the method error is measured based on the quantitative indicators of R-Square, mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE).
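These four indicators can be sketched as follows (the sample values are illustrative only, not the patent's experimental data):

```python
import math

def regression_errors(y_true, y_pred):
    """Compute R-Square, MSE, RMSE and MAE between recognized
    and reference blood glucose values."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - mse * n / ss_tot if ss_tot else 1.0
    return r2, mse, rmse, mae

truth = [5.0, 6.0, 7.0, 8.0]
pred = [5.2, 5.8, 7.1, 8.3]
r2, mse, rmse, mae = regression_errors(truth, pred)
```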
  • An embodiment of the present application also provides an image recognition device, which is used to implement the method shown in FIG. 2, and as shown in FIG. 10, the device includes:
  • the first obtaining module 10 is configured to obtain a target image to be recognized
  • the second acquisition module 12 is configured to acquire a target area in a target image, where the image in the target area is used to reflect the specified type parameter information;
  • the first determining module 14 is configured to determine the coordinates of the selected pixel in the target area
  • the second determining module 16 is configured to determine the value of the parameter corresponding to the coordinate of the selected pixel based on the correlation between the value of the specified type parameter and the coordinate of the pixel point.
  • Utilizing the functions implemented by the above modules can also realize the recognition of parameter values represented by non-character information in the image, achieving the purpose of automatically identifying pixels in the image as the corresponding parameter values, thereby solving the technical problem that the current image recognition method can only recognize values in character format in the image and cannot automatically recognize a curve or discrete points as values.
  • the device further includes: a separation module 11 configured to separate a designated color channel from the target area, wherein the designated color channel is the one among the R, G and B color channels that is the same as the color channel corresponding to the standard color band of the target area; a processing module 13 configured to perform image binarization processing on the image of the designated color channel to obtain a binarized image; a selection module 15 configured to select, from a preset threshold set, the threshold corresponding to the area of each pixel in the binarized image, and use the selected thresholds to segment the target area; a fitting module 17 configured to perform reference point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and an establishment module 19 configured to determine the actual values of the at least two reference points and their corresponding relationship with the pixel coordinates, establish the linear relationship between the value of the specified type parameter and the pixel coordinates based on the corresponding relationship, and use the linear relationship as the correlation.
  • the first determining module 14 includes: a grayscale processing unit 140, which is configured to perform grayscale processing on the image in the target area to obtain a grayscale image; and the clustering unit 142 is configured to perform grayscale Each pixel in the image is clustered to obtain multiple clusters; the selection unit 144 is configured to select a specified cluster from the multiple clusters, and determine the selected pixel coordinates from all pixels in the specified cluster.
  • the selection unit 144 is further configured to select the cluster with the fewest pixels from the multiple clusters, and to determine the selected pixel coordinates from that cluster.
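The clustering-and-selection step can be sketched as below. This is an illustrative assumption, not the patent's stated algorithm: a simple 1-D k-means over grayscale intensities, keeping the cluster with the fewest members, on the premise that the thin curve occupies far fewer pixels than the background.

```python
# Cluster grayscale intensities, then keep the smallest cluster's coordinates.

def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means; returns a cluster index for each value."""
    centers = [min(values) + i * (max(values) - min(values)) / (k - 1)
               for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign

def curve_pixel_coords(gray):
    """gray: 2-D list of intensities; returns (x, y) coords of the smallest cluster."""
    coords = [(x, y) for y, row in enumerate(gray) for x, _ in enumerate(row)]
    flat = [gray[y][x] for x, y in coords]
    assign = kmeans_1d(flat, k=2)
    counts = {c: assign.count(c) for c in set(assign)}
    smallest = min(counts, key=counts.get)
    return [xy for xy, a in zip(coords, assign) if a == smallest]

# 4x4 image: bright background (200) with two dark curve pixels (20).
gray = [[200] * 4 for _ in range(4)]
gray[1][2] = 20
gray[2][3] = 20
print(curve_pixel_coords(gray))  # -> [(2, 1), (3, 2)]
```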
  • the image in the target area includes a curve image in a coordinate system, where the curve reflects the values of the specified type parameter at different times.
  • the first determining module 14 is further configured to determine the target recording time corresponding to the pixel coordinates in the curve image; the second determining module 16 is further configured to determine, based on the correlation between the value of the specified type parameter and pixel coordinates, the parameter value corresponding to the selected pixel coordinates at the target recording time.
  • the above-mentioned target recording time is determined as follows: recognize the character information in the curve image and extract the time information of the specified type parameter from it; divide the span between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and determine, from these time points, the target recording time to which the selected pixel coordinates belong.
  • the first determining module further includes: a first recognition unit configured to recognize character information in the curve image and extract the time information of the specified type parameter from it; a first dividing unit configured to divide the span between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and a first determining unit configured to determine, from the multiple time points, the target recording time to which the selected pixel coordinates belong.
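The equal-interval time division can be sketched as below. This is a hedged illustration under assumed inputs: the two adjacent axis labels (read by the character-recognition step) and their pixel columns are given, and the span between them is split evenly across the intervening pixel columns.

```python
# Map a selected pixel column to a recording time by dividing the span
# between two adjacent recorded moments evenly across the pixel columns.

def pixel_to_time(px, x_left, x_right, t_left_min, t_right_min):
    """px: selected pixel column; x_left/x_right: columns of two adjacent
    axis labels; t_*_min: their times in minutes since midnight."""
    n = x_right - x_left                        # pixel columns between labels
    minutes_per_px = (t_right_min - t_left_min) / n
    t = t_left_min + (px - x_left) * minutes_per_px
    return "%02d:%02d" % divmod(round(t), 60)

# Axis labels 06:00 at column 100 and 09:00 at column 280.
print(pixel_to_time(190, 100, 280, 6 * 60, 9 * 60))  # -> 07:30
```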
  • the second acquisition module 12 includes: a segmentation unit 120 configured to perform semantic segmentation on the target image to obtain a mask image and a foreground image of the target image; and a second determining unit 122 configured to identify the target area in the foreground image.
  • the segmentation unit 120 is configured to use an efficient neural network model to perform semantic segmentation on the target image when the target image is of the first type.
  • the efficient neural network model includes an initialization module and bottleneck modules, where each bottleneck module includes three convolutional layers: the first is used for dimensionality reduction, the second for dilated, full, and asymmetric convolution, and the third for dimensionality restoration; the segmentation unit 120 is also configured, when the target image is of the second type, to adjust a bilateral segmentation network model and use the adjusted model to perform semantic segmentation on the target image, where adjusting the model includes at least one of the following: reducing the number of spatial-information processing layers in the segmentation network model; reducing the number of feature maps output by each network layer; compressing the input image of the bilateral segmentation network; and simplifying the receptive-field processing layer.
  • the segmentation unit 120 is further configured to simplify the receptive-field processing layer by replacing the residual neural network (ResNet) module in that layer with a channel-separation convolution (Xception39) module.
  • the above-mentioned apparatus may further include a third determining module 21 configured to determine the type of the target image and, when the type is a preset type, to decide to perform semantic segmentation on the target image.
  • the third determining module 21 is further configured to determine the type of the target image by: dividing the target image into a preset number of non-overlapping blocks; computing a feature value for each block, obtaining the preset number of feature values; combining the feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the target image.
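The feature-extraction step feeding the classifier can be sketched as below. The SVM itself is assumed to exist elsewhere (e.g. a pre-trained classifier), and using the mean intensity of each block as its feature value is an assumption of this sketch, not something the patent specifies.

```python
# Divide an image into non-overlapping blocks; each block contributes one
# component (here, its mean intensity) to the feature vector.

def block_feature_vector(gray, blocks_x, blocks_y):
    """gray: 2-D list of intensities; returns blocks_x*blocks_y features."""
    h, w = len(gray), len(gray[0])
    bh, bw = h // blocks_y, w // blocks_x
    features = []
    for by in range(blocks_y):
        for bx in range(blocks_x):
            vals = [gray[y][x]
                    for y in range(by * bh, (by + 1) * bh)
                    for x in range(bx * bw, (bx + 1) * bw)]
            features.append(sum(vals) / len(vals))
    return features

# 4x4 image split into 2x2 blocks -> a 4-component feature vector.
gray = [[0, 0, 8, 8],
        [0, 0, 8, 8],
        [4, 4, 2, 2],
        [4, 4, 2, 2]]
print(block_feature_vector(gray, 2, 2))  # -> [0.0, 8.0, 4.0, 2.0]
```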
  • the target area contains a curve image showing the change trend of blood glucose data.
  • the device further includes: a display module for displaying the parameter values corresponding to the selected pixel coordinates.
  • the first determining module is further configured to receive a user's instruction for the target image, and to determine the coordinates of the selected pixel according to the instruction.
  • the instruction is determined based on one of the following: position information of the user's touch point on the human-computer interaction interface where the target image is located; or query information input by the user.
  • the device further includes: a judging module configured, when the instruction is the user's touch point position information on the human-computer interaction interface where the target image is located, to judge whether the touch point position lies in the target area before the selected pixel coordinates are determined from the touch point position; and a trigger module configured to trigger determination of the selected pixel coordinates when the judgment result indicates that the touch point position lies in the target area.
  • the embodiment of the present application also provides a data display method. As shown in FIG. 12, the method includes:
  • Step S1202: display and acquire the target image to be recognized;
  • Step S1204: display the region of interest in the target image, where the image in the region of interest reflects how the specified type parameter changes over time;
  • Step S1206: display the coordinates of the selected pixel in the region of interest and the target recording time corresponding to those coordinates;
  • Step S1208: display the parameter value corresponding to the selected pixel coordinates at the target recording time, where the parameter value is determined based on the correlation between the value of the specified type parameter and pixel coordinates.
  • the execution subject of steps S1202 to S1208 includes but is not limited to a mobile terminal.
  • the above-mentioned association relationship is determined as follows: separate the designated color channel from the region of interest, where the designated color channel is the one of the R, G, B color channels that matches the color channel corresponding to the standard color band of the region of interest; perform image binarization on the image of the designated color channel to obtain a binarized image; select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image is located, and segment the region of interest using the selected thresholds; perform reference-point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the value of the specified type parameter and pixel coordinates based on that correspondence, and use the linear relationship as the association relationship.
  • the embodiment of the present application also provides an image recognition method.
  • the method can determine the selected pixel based on the user's touch operation, thereby determining the parameter value corresponding to the pixel.
  • the method includes the following steps:
  • Step S1302: detect the position of the user's touch point in the target image;
  • Step S1304: determine the coordinates of the selected pixel based on the touch point position, and the target recording time corresponding to those coordinates;
  • Step S1306: determine, based on the correlation between the value of the specified type parameter and pixel coordinates, the parameter value corresponding to the selected pixel coordinates at the target recording time;
  • Step S1308: output the parameter value.
  • outputting the parameter value includes, but is not limited to, displaying the value to the user or sending it to an external device.
  • before determining the selected pixel coordinates from the touch point position, and in order to filter out invalid touch operations, it may also be judged whether the touch point position lies in the region of interest of the target image, where the image in the region of interest reflects how the specified type parameter changes over time; when the judgment result indicates that the touch point lies in the region of interest, determination of the selected pixel coordinates is triggered.
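This validity check can be sketched minimally as below. The region is assumed here to be an axis-aligned bounding box, which is a simplification: in practice the segmentation mask may define an arbitrary shape.

```python
# Reject touches outside the region of interest before looking up a value.

def in_region(px, py, box):
    """box: (x0, y0, x1, y1) bounding box of the region of interest."""
    x0, y0, x1, y1 = box
    return x0 <= px <= x1 and y0 <= py <= y1

roi = (50, 80, 300, 240)
print(in_region(120, 100, roi))  # -> True  (touch inside: proceed)
print(in_region(10, 10, roi))    # -> False (invalid touch: ignore)
```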
  • the following processing may also be performed: separate the designated color channel from the region of interest, where the designated color channel is the one of the R, G, B color channels that matches the color channel corresponding to the standard color band of the region of interest; perform image binarization on the image of the designated color channel to obtain a binarized image; select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image is located, and segment the region of interest using the selected thresholds; perform reference-point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the value of the specified type parameter and pixel coordinates based on that correspondence, and use the linear relationship as the association relationship.
  • the embodiment of the present application also provides an image recognition method, which can determine the selected pixel based on user input, thereby determining the parameter value corresponding to the pixel. As shown in FIG. 14, the method includes:
  • Step S1402: detect query information input by the user, where the query information may be input through a human-computer interaction interface that includes a text input box for entering it;
  • Step S1404: determine, based on the query information, the coordinates of the selected pixel in the target image and the target recording time corresponding to those coordinates;
  • Step S1406: determine, based on the correlation between the value of the specified type parameter and pixel coordinates, the parameter value corresponding to the selected pixel coordinates at the target recording time;
  • Step S1408: output the parameter value.
  • the following process may also be performed: separate the designated color channel from the region of interest, where the designated color channel is the one of the R, G, B color channels that matches the color channel corresponding to the standard color band of the region of interest; perform image binarization on the image of the designated color channel to obtain a binarized image; select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image is located, and segment the region of interest using the selected thresholds; perform reference-point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the value of the specified type parameter and pixel coordinates based on that correspondence, and use the linear relationship as the association relationship.
  • the embodiment of the present application also provides a non-volatile storage medium, the storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the above image recognition method when the program runs.
  • the embodiment of the present application also provides a processor, which is configured to run a program, wherein the above image recognition method is executed when the program is running.
  • the described solutions determine the parameter value corresponding to the selected pixel coordinates according to the correlation between pixel coordinates in the target image and the value of the specified type parameter. This correlation makes it possible to recognize the parameter value corresponding to any pixel in the image, thereby recognizing parameter values represented by non-character information and automatically converting pixels in the image into the corresponding parameter values, which solves the technical problem that current image recognition methods can only recognize character-format values in an image and cannot automatically recognize a curve or discrete points as values.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units may be a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection of units or modules may be electrical or in other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, optical disks, and other media that can store program code.
  • the solutions provided in the embodiments of the present application can be applied to the image recognition process, for example, can be applied to the image recognition process of blood glucose data.
  • the correlation between pixel coordinates in the target image and the values of the specified type parameter is used to identify the parameter value corresponding to any pixel coordinates in the image. This achieves recognition of parameter values represented by non-character information and automatic conversion of pixels in the image into the corresponding parameter values, further solving the problem that current image recognition methods can only recognize character-format values in an image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an image recognition method and apparatus, a storage medium, and a processor. The image recognition method comprises: obtaining a target image to be recognized; obtaining a target region in said target image, wherein the image in the target region is used to reflect information of a parameter of a specified type; determining the coordinates of a selected pixel in the target region; and determining, on the basis of the association between the value of the specified-type parameter and pixel coordinates, the parameter value corresponding to the selected pixel coordinates.

Description

Image recognition method and apparatus, storage medium, and processor

Technical Field
This application relates to the field of image recognition, and in particular to an image recognition method and apparatus, a storage medium, and a processor.
Background Art
Image recognition technology is an important field of artificial intelligence; it refers to techniques that perform object recognition on images to identify targets and objects of various patterns. Common recognition objects can be roughly divided into natural-scene objects and specific-scene objects. For natural-scene images, a convolutional network is used to train a suitable model, while specific-scene objects require secondary development of network models and algorithms. For recognizing data in a picture, existing approaches use optical character recognition (OCR), but OCR can only recognize the digits displayed in the image; the values represented by points or by continuous or discontinuous curves cannot be recognized.
There are many blood glucose monitoring hardware devices, but blood glucose data can only be exported after the monitoring period (usually about 14 days) ends, by connecting the device to a computer with a data cable and exporting a blood glucose data table. Users therefore cannot follow changes in their personal blood glucose data in real time, let alone monitor how eating, exercise, and sleep in daily life affect their blood glucose.
It can be seen that current methods and systems for recording physiological monitoring parameters (such as fingertip blood glucose meter readings) based on image recognition rely on optical character recognition: they can only identify displayed numerical values and cannot automatically convert a physiological curve into values for subsequent use.
No effective solution to the above problems has yet been proposed.
Summary of the Invention
The embodiments of this application provide an image recognition method and apparatus, a storage medium, and a processor, to at least solve the technical problem that current image recognition methods can only recognize character-format values in an image and cannot automatically recognize a curve or discrete points as numerical values.
According to one aspect of the embodiments of this application, an image recognition method is provided, including: obtaining a target image to be recognized; obtaining a target area in the target image, where the image in the target area reflects specified type parameter information; determining the coordinates of a selected pixel in the target area; and determining the parameter value corresponding to the selected pixel coordinates based on the association relationship between the value of the specified type parameter and pixel coordinates.
Optionally, before determining the parameter value corresponding to the selected pixel coordinates based on the association relationship between the value of the specified type parameter and pixel coordinates, the method further includes: separating a designated color channel from the target area, where the designated color channel is the one of the R, G, B color channels that matches the color channel corresponding to the standard color band of the target area; performing image binarization on the image of the designated color channel, where binarization selects the set of pixels in that channel's image that exceed a preset threshold, to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image is located, and segmenting the target area using the selected thresholds; performing reference-point pixel recognition on the binarized image obtained after segmentation, to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the correspondence between the actual values of the at least two reference points and their pixel coordinates, establishing a linear relationship between the value of the specified type parameter and pixel coordinates based on that correspondence, and using the linear relationship as the association relationship.
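The binarization with region-specific thresholds can be sketched as follows. This is a minimal illustration under stated assumptions: the preset threshold set is modeled as a dictionary, and the mapping from a pixel to its region is passed in as a function, since the patent does not specify how regions are delimited.

```python
# Binarize one color channel using a per-region threshold looked up from a
# preset threshold set: pixels above the local threshold become 1, else 0.

def binarize(channel, thresholds, region_of):
    """channel: 2-D list of one color channel's values; region_of(x, y)
    returns the region index used to look up that pixel's threshold."""
    return [[1 if channel[y][x] > thresholds[region_of(x, y)] else 0
             for x in range(len(channel[0]))]
            for y in range(len(channel))]

# Two vertical regions with different thresholds (left: 100, right: 180).
ch = [[120, 120, 150, 150],
      [ 90,  90, 200, 200]]
out = binarize(ch, {0: 100, 1: 180}, lambda x, y: 0 if x < 2 else 1)
print(out)  # -> [[1, 1, 0, 0], [0, 0, 1, 1]]
```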
Optionally, determining the coordinates of the selected pixel in the target area includes: performing grayscale processing on the image in the target area to obtain a grayscale image; clustering each pixel in the grayscale image to obtain multiple clusters; and selecting a designated cluster from the multiple clusters and determining the selected pixel coordinates from all pixels in the designated cluster.
Optionally, selecting a designated cluster from the multiple clusters and determining the selected pixel coordinates from all pixels in it specifically means: selecting the cluster with the fewest pixels, and determining the selected pixel coordinates from that cluster.
Optionally, the image in the target area includes a curve image or a discrete-point image in a coordinate system, where the curve or the discrete points reflect the values of the specified type parameter at different moments.
Optionally, the method further includes: determining the target recording time corresponding to the pixel coordinates in the curve image or the discrete-point image. Determining the parameter value corresponding to the selected pixel coordinates based on the association relationship between the value of the specified type parameter and pixel coordinates then includes: determining, based on that association relationship, the parameter value corresponding to the selected pixel coordinates at the target recording time.
Optionally, the target recording time is determined as follows: recognize the character information in the curve image and extract the time information of the specified type parameter from it; divide the span between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and determine, from these time points, the target recording time to which the selected pixel coordinates belong.
Optionally, obtaining the target area in the target image includes: performing semantic segmentation on the target image to obtain a mask image and a foreground image of the target image; and determining a region of interest from the foreground image and using it as the target area.
Optionally, when the target image is of the first type, an efficient neural network model is used to perform semantic segmentation, where the model includes an initialization module and bottleneck modules, each bottleneck module containing three convolutional layers: the first for dimensionality reduction, the second for dilated, full, and asymmetric convolution, and the third for dimensionality restoration. When the target image is of the second type, a bilateral segmentation network model is adjusted, and the adjusted model is used to perform semantic segmentation on the target image. Adjusting the segmentation network model includes: the bilateral segmentation network consists of a backbone network and an auxiliary network; the backbone network has two layers, each comprising a convolutional layer, a batch normalization layer, and a nonlinear activation function, with the number of feature maps in its output channels reduced; the auxiliary network adopts a lightweight model framework, likewise reducing the number of output-channel feature maps, where the lightweight model is one of Xception39, SqueezeNet, Xception, MobileNet, and ShuffleNet. The first type corresponds to a first data set containing fewer images than the second data set corresponding to the second type.
Optionally, determining the region of interest from the foreground image includes: determining a characteristic region in the foreground image and the corner coordinates of a target geometric region, where the characteristic region is the region of the foreground image containing the specified type parameter information; computing a projection transformation matrix from the corner coordinates; and performing projection transformation on the pixels in the characteristic region to obtain the region of interest.
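The projection transformation applied to each pixel can be sketched as below. Estimating the 3x3 matrix from the four corner correspondences is assumed done elsewhere (e.g. by solving the standard eight-equation system); this sketch only shows how a pixel is mapped, in homogeneous coordinates, into the rectified region of interest.

```python
# Apply a 3x3 projection (homography) matrix H to a pixel (x, y).

def project(H, x, y):
    """Homogeneous-coordinate mapping: (x, y, 1) -> (xh, yh, w) -> (xh/w, yh/w)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# A pure scaling homography: doubles both coordinates.
H = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 1]]
print(project(H, 10, 5))  # -> (20.0, 10.0)
```

Transforming every pixel of the characteristic region with such a matrix yields the rectified region of interest described above.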
Optionally, before obtaining the target image to be recognized, the method further includes: determining whether the image to be recognized is a target image and, if so, deciding to perform semantic segmentation on it. Note that a target image is an image with a target area whose content reflects the specified type parameter information. For example, one embodiment of this application is a method for recognizing images of an Abbott continuous glucose meter; in that case, a meter image containing the continuous blood glucose changes is the target image. Likewise, for scenarios that recognize values from other curve or discrete-point images, the image containing the curve or discrete points reflecting the corresponding values is the target image.
Optionally, the above method further includes: dividing the region of interest evenly into a preset number of non-overlapping blocks; determining a feature value for each of the preset number of non-overlapping blocks to obtain the preset number of feature values; combining the preset number of feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
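The block-based feature vector above can be sketched as follows. The source does not fix which feature value is computed per block; the mean grayscale intensity used here is an assumption, and the resulting vector would then be fed to an SVM classifier (e.g. `sklearn.svm.SVC`, not shown). The function name `block_feature_vector` is illustrative.

```python
def block_feature_vector(image, n_rows, n_cols):
    """image: 2-D list of grayscale values; divides it into n_rows * n_cols
    non-overlapping blocks and returns one feature (mean intensity) per block,
    concatenated row by row into a flat feature vector."""
    h, w = len(image), len(image[0])
    bh, bw = h // n_rows, w // n_cols   # block size (dimensions assumed divisible)
    features = []
    for br in range(n_rows):
        for bc in range(n_cols):
            block = [image[r][c]
                     for r in range(br * bh, (br + 1) * bh)
                     for c in range(bc * bw, (bc + 1) * bw)]
            features.append(sum(block) / len(block))
    return features
```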
Optionally, the specified type of parameter information includes curve information reflecting the trend of blood glucose data over time, or value information of discrete points in a coordinate system reflecting the trend of blood glucose data over time.
Optionally, the above method further includes: displaying the parameter value corresponding to the coordinates of the selected pixel.
Optionally, determining the coordinates of the selected pixel in the target area includes: receiving a user's instruction for the target image; and determining the coordinates of the selected pixel according to the instruction.
Optionally, the instruction is determined based on one of the following: receiving the user's touch position information on the human-computer interaction interface where the target image is located; or receiving query information input by the user.
Optionally, when the instruction is to receive the user's touch point position information on the human-computer interaction interface where the target image is located, the method further includes: before determining the coordinates of the selected pixel based on the touch point position, judging whether the touch point position is located in the target area; and when the judgment result indicates that the touch point position is located in the target area, triggering determination of the coordinates of the selected pixel.
According to another aspect of the embodiments of the present application, a data display method is provided, including: displaying an acquired target image to be recognized; displaying a region of interest in the target image, where the image in the region of interest reflects how a specified type of parameter changes over time; displaying the coordinates of a selected pixel in the region of interest and the target recording time corresponding to the pixel coordinates; and displaying the parameter value corresponding to the coordinates of the selected pixel at the target recording time, where the parameter value is determined based on an association relationship between values of the specified type of parameter and pixel coordinates.
Optionally, the association relationship is determined as follows: separating a designated color channel from the region of interest, where the designated color channel is the channel among the R, G, and B color channels that corresponds to the standard color band of the region of interest; performing image binarization on the image of the designated color channel to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the region where each pixel of the binarized image is located, and performing image segmentation on the region of interest using the selected thresholds; performing reference point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the actual values of the at least two reference points and their correspondence with the pixel coordinates, establishing, based on this correspondence, a linear relationship between values of the specified type of parameter and pixel coordinates, and taking the linear relationship as the association relationship.
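The final step above, fitting the linear association from two reference points of the standard color band, can be sketched as follows. The reference values used in the usage note (4.0 and 10.0 mmol/L) are illustrative only; the function name `fit_value_from_pixels` is not from the source.

```python
def fit_value_from_pixels(p1, v1, p2, v2):
    """(p1, v1), (p2, v2): pixel y-coordinate / actual-value pairs for two
    reference points of the standard color band. Returns a function mapping
    any pixel y-coordinate to a parameter value via value = a * y + b."""
    a = (v2 - v1) / (p2 - p1)   # slope of the linear association
    b = v1 - a * p1             # intercept
    return lambda pixel_y: a * pixel_y + b
```

For example, if the 4.0 mmol/L reference line is detected at pixel row 300 and the 10.0 mmol/L reference line at row 120, a selected curve pixel at row 210 maps to 7.0 mmol/L.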
Optionally, according to another aspect of the embodiments of the present application, an image recognition method is provided, including: detecting the position of a user's touch point in a target image; determining the coordinates of the selected pixel based on the touch point position, and the target recording time corresponding to the pixel coordinates; determining the parameter value corresponding to the coordinates of the selected pixel at the target recording time based on an association relationship between values of a specified type of parameter and pixel coordinates; and outputting the parameter value.
Optionally, before determining the coordinates of the selected pixel based on the touch point position, the method further includes: judging whether the touch point position is located in a region of interest of the target image, where the image in the region of interest reflects how a specified type of parameter changes over time; and when the judgment result indicates that the touch point position is located in the region of interest, triggering determination of the coordinates of the selected pixel.
Optionally, before the parameter value corresponding to the coordinates of the selected pixel at the target recording time is determined based on the association relationship between values of the specified type of parameter and pixel coordinates, the method further includes: separating a designated color channel from the region of interest, where the designated color channel is the channel among the R, G, and B color channels that corresponds to the standard color band of the region of interest; performing image binarization on the image of the designated color channel to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the region where each pixel of the binarized image is located, and performing image segmentation on the region of interest using the selected thresholds; performing reference point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the actual values of the at least two reference points and their correspondence with the pixel coordinates, establishing a linear relationship between values of the specified type of parameter and pixel coordinates based on the correspondence, and taking the linear relationship as the association relationship.
According to another aspect of the embodiments of the present application, an image recognition method is provided, including: detecting query information input by a user; determining the coordinates of a selected pixel in a target image based on the query information, and the target recording time corresponding to the pixel coordinates; determining the parameter value corresponding to the coordinates of the selected pixel at the target recording time based on an association relationship between values of a specified type of parameter and pixel coordinates; and outputting the parameter value.
Optionally, before determining the coordinates of the selected pixel in the target image based on the touch point position, the method further includes: judging whether the touch point position is located in a region of interest of the target image, where the image in the region of interest reflects how a specified type of parameter changes over time; and when the judgment result indicates that the touch point position is located in the region of interest, triggering determination of the coordinates of the selected pixel.
Optionally, before the parameter value corresponding to the coordinates of the selected pixel at the target recording time is determined based on the association relationship between values of the specified type of parameter and pixel coordinates, the method further includes: separating a designated color channel from the region of interest, where the designated color channel is the channel among the R, G, and B color channels that corresponds to the standard color band of the region of interest; performing image binarization on the image of the designated color channel, the binarization being to select the set of pixels in the image of the designated color channel whose values are greater than a preset threshold, to obtain a binarized image; selecting, from a preset threshold set, the threshold corresponding to the region where each pixel of the binarized image is located, and performing image segmentation on the region of interest using the selected thresholds; performing reference point pixel recognition on the binarized image obtained after segmentation to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and determining the actual values of the at least two reference points and their correspondence with the pixel coordinates, establishing a linear relationship between values of the specified type of parameter and pixel coordinates based on the correspondence, and taking the linear relationship as the association relationship.
According to still another aspect of the embodiments of the present application, an image recognition apparatus is provided, including: a first acquisition module, configured to acquire a target image to be recognized; a second acquisition module, configured to acquire a target area in the target image, where the image in the target area reflects specified type parameter information; a first determining module, configured to determine the coordinates of a selected pixel in the target area; and a second determining module, configured to determine the parameter value corresponding to the coordinates of the selected pixel based on an association relationship between values of the specified type of parameter and pixel coordinates.
Optionally, the apparatus further includes: a separation module, configured to separate a designated color channel from the target area, where the designated color channel is the channel among the R, G, and B color channels that corresponds to the standard color band of the target area; a processing module, configured to perform image binarization on the image of the designated color channel, the binarization being to select the set of pixels in the image of the designated color channel whose values are greater than a preset threshold, to obtain a binarized image; a selection module, configured to select, from a preset threshold set, the threshold corresponding to the region where each pixel of the binarized image is located, and to perform image segmentation on the target area using the selected thresholds; a fitting module, configured to perform reference point pixel recognition on the binarized image obtained after segmentation, to obtain the pixel coordinates of at least two reference points of the standard color band in the image; and an establishing module, configured to determine the actual values of the at least two reference points and their correspondence with the pixel coordinates, to establish a linear relationship between values of the specified type of parameter and pixel coordinates based on the correspondence, and to take the linear relationship as the association relationship.
Optionally, the first determining module includes: a grayscale processing unit, configured to perform grayscale processing on the image in the target area to obtain a grayscale image; a clustering unit, configured to perform clustering on the pixels of the grayscale image to obtain multiple clusters; and a selection unit, configured to select a designated cluster from the multiple clusters and to determine the coordinates of the selected pixel from all pixels in the designated cluster.
Optionally, the selection unit is configured to select, from the multiple clusters, the cluster with the fewest pixels, and to determine the coordinates of the selected pixel from the cluster with the fewest pixels.
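The clustering and smallest-cluster selection above can be sketched as follows. The source specifies only clustering the grayscale pixels and picking the cluster with the fewest pixels (the curve typically occupies far fewer pixels than the background); the use of a minimal 1-D k-means with k=2, and the initialisation from the min/max intensities, are assumptions for illustration.

```python
def smallest_cluster_pixels(gray, k=2, iters=20):
    """gray: dict mapping (x, y) -> grayscale value; clusters pixels by
    intensity and returns the coordinates in the cluster with the fewest
    pixels (taken here as the curve pixels)."""
    values = list(gray.values())
    centers = [min(values), max(values)]          # simple k=2 initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for coord, v in gray.items():
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append((coord, v))
        # Recompute each center as the mean of its cluster (keep old if empty).
        centers = [sum(v for _, v in c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    smallest = min((c for c in clusters if c), key=len)
    return [coord for coord, _ in smallest]
```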
Optionally, the image in the target area includes: a curve image in a coordinate system, or a discrete point image in a coordinate system, where the curve in the curve image or the discrete points in the discrete point image reflect the values of the specified type of parameter at different times.
Optionally, the first determining module is further configured to determine the target recording time corresponding to the pixel coordinates in the curve image; and the second determining module is further configured to determine the parameter value corresponding to the coordinates of the selected pixel at the target recording time based on the association relationship between values of the specified type of parameter and pixel coordinates.
Optionally, the first determining module further includes: a first recognition unit, configured to recognize character information in the curve image and to extract time information of the specified type of parameter from the character information; a first dividing unit, configured to divide the duration between any two adjacent recording times in the time information into equal intervals according to the number of pixels, to obtain multiple time points; and a first determining unit, configured to determine, from the multiple time points, the target recording time to which the coordinates of the selected pixel belong.
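The equal-interval time division above can be sketched as follows: between two adjacent recorded times (as recovered from the character information), the pixel columns spanning the curve are mapped to evenly spaced time points, so a selected pixel column directly yields its target recording time. Handling times in minutes, and the function name `time_at_pixel`, are illustrative choices.

```python
def time_at_pixel(x, x_start, x_end, t_start_min, t_end_min):
    """Map pixel column x (x_start <= x <= x_end) to a time in minutes by
    dividing [t_start_min, t_end_min] evenly over the pixel columns."""
    if x_end == x_start:
        return t_start_min
    fraction = (x - x_start) / (x_end - x_start)
    return t_start_min + fraction * (t_end_min - t_start_min)
```

For example, with an 8-hour window (0 to 480 minutes) spanning pixel columns 40 to 520, a selected pixel at column 280 corresponds to the 240-minute mark.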
Optionally, the second acquisition module includes: a segmentation unit, configured to perform semantic segmentation on the target image to obtain a mask image and a foreground image of the target image; and a second determining unit, configured to determine a region of interest from the foreground image and to take the region of interest as the target area.
Optionally, the segmentation unit is configured to, when the target image is of a first type, perform semantic segmentation on the target image using an efficient neural network model, where the efficient neural network model includes an initialization module and bottleneck modules, each bottleneck module including three convolutional layers, of which the first convolutional layer performs dimensionality reduction, the second convolutional layer performs dilated convolution, full convolution, and asymmetric convolution, and the third convolutional layer performs dimensionality increase. The segmentation unit is further configured to, when the target image is of a second type, adjust a bilateral segmentation network model and perform semantic segmentation on the target image using the adjusted segmentation network model, where adjusting the segmentation network model includes: the bilateral segmentation network model includes a backbone network and an auxiliary network; the backbone network consists of two layers, each layer including a convolutional layer, a batch normalization layer, and a nonlinear activation function, reducing the number of feature maps of the backbone network's output channels; and the auxiliary network adopts a lightweight model framework, reducing the number of feature maps of the backbone network's output channels, the lightweight model including one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet; where the number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
Optionally, the apparatus further includes: a third determining module, configured to determine whether the image to be recognized is a target image, and, when the image to be recognized is a target image, to determine to perform semantic segmentation on the target image.
Optionally, the third determining module is further configured to determine the type of the region of interest in the following manner: dividing the region of interest evenly into a preset number of non-overlapping blocks; determining a feature value for each of the preset number of non-overlapping blocks to obtain the preset number of feature values; combining the preset number of feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
Optionally, the specified type of parameter information includes curve information reflecting the trend of blood glucose data over time, or value information of discrete points in a coordinate system reflecting the trend of blood glucose data over time.
Optionally, the apparatus further includes: a display module, configured to display the parameter value corresponding to the coordinates of the selected pixel.
Optionally, the first determining module is further configured to receive a user's instruction for the target image, and to determine the coordinates of the selected pixel according to the instruction.
Optionally, the instruction is determined based on one of the following: receiving the user's touch point position information on the human-computer interaction interface where the target image is located; or receiving query information input by the user.
Optionally, the apparatus further includes: a judging module, configured to, when the instruction is to receive the user's touch point position information on the human-computer interaction interface where the target image is located, judge, before the coordinates of the selected pixel are determined based on the touch point position, whether the touch point position is located in the target area; and a triggering module, configured to trigger determination of the coordinates of the selected pixel when the judgment result indicates that the touch point position is located in the target area.
According to yet another aspect of the embodiments of the present application, a non-volatile storage medium is provided. The non-volatile storage medium includes a stored program, where, when the program runs, the device on which the non-volatile storage medium is located is controlled to execute the image recognition method described above.
According to yet another aspect of the embodiments of the present application, a processor is provided. The processor is configured to run a program, where the image recognition method described above is executed when the program runs.
In the embodiments of the present application, the parameter value corresponding to the coordinates of a selected pixel is determined according to the association relationship between pixel coordinates in the target image and values of a specified type of parameter. Because this association relationship is used to recognize the parameter value corresponding to any pixel coordinates in the image, recognition of parameter values represented by non-character information in the image is achieved, and the goal of automatically recognizing pixels in an image as corresponding parameter values is reached. This solves the technical problem that current image recognition methods can only recognize numerical values in character format in an image and cannot automatically recognize curves or discrete points as numerical values.
Description of the drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The exemplary embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
Fig. 1 is a flowchart of an optional method for recognizing blood glucose data in an embodiment of the present application;
Fig. 2 is a flowchart of an image recognition method in another embodiment of the present application;
Fig. 3 is an example diagram of optional ROI region extraction according to an embodiment of the present application;
Figs. 4a-4d are example diagrams of an optional blood glucose curve detection and segmentation process according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an optional simplified BiSeNet-Xception39 model according to an embodiment of the present application;
Fig. 6 shows optional statistical results of the R-squared error distribution for 8-hour images according to an embodiment of the present application;
Fig. 7 is an optional error value distribution diagram for 8-hour images according to an embodiment of the present application;
Fig. 8 is a schematic diagram of optional statistical results of the R-squared error distribution for 24-hour images according to an embodiment of the present application;
Fig. 9 is an optional error value distribution diagram for 24-hour images according to an embodiment of the present application;
Fig. 10 is a structural block diagram of an image recognition apparatus according to an embodiment of the present application;
Fig. 11 is a structural block diagram of another optional image recognition apparatus according to an embodiment of the present application;
Fig. 12 is a flowchart of a data display method according to an embodiment of the present application;
Fig. 13 is a flowchart of another image recognition method according to an embodiment of the present application;
Fig. 14 is a flowchart of yet another image recognition method according to an embodiment of the present application.
Detailed description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", etc. in the description and claims of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application described here can be implemented in an order other than those illustrated or described here. In addition, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
In related technologies, when recognizing the data in an image, often only the parameter values corresponding to character information can be recognized, while the parameter values corresponding to non-character information cannot. Taking an Abbott continuous glucose monitoring device as an example: the flash scanner of the device can obtain a curve reflecting the change trend over an 8-hour period, but cannot accurately measure the precise blood glucose values within that 8-hour period. During a flash scan, only the blood glucose value at the scan time point can be obtained, or existing text recognition technology can recognize the fixed time points shown on the scanner and the blood glucose value at the current scan time. However, these recognition schemes do not allow a blood glucose tester to track continuous blood glucose values at every moment, let alone provide a basis for subsequent systematic blood glucose management and real-time pushing of intervention plans. In other words, existing OCR technology cannot recognize the blood glucose value at an arbitrary point on the blood glucose curve. The embodiments of the present application use the association relationship between pixel coordinates and actual parameter values to recognize the parameter value at any point on a curve, simplify the process of converting an image into quantitative values, and store the values in a database in a certain format, providing support for subsequent blood glucose analysis and intervention plan generation. At the same time, this also facilitates the export and storage of blood glucose data, enables blood glucose to be recorded and managed on a mobile phone, and improves the user experience.
In some embodiments of the present application, the recognition of blood glucose data in a blood glucose image is taken as an example to illustrate how to recognize corresponding parameter values at the pixel level. As shown in Fig. 1, the process includes the following steps:
Step S102: Receive an image uploaded by a user.
Step S104: Identify whether the image uploaded by the user is a blood glucose meter image to be processed. To ensure the completeness and standardization of the data for the blood glucose image recognition algorithm, a deep network model is used for image classification before the algorithm model is called, such as MobileNet, Xception, SqueezeNet, or another image classification model. The confidence output by the classification model is used to determine whether the current image is an image to be processed; for example, the confidence threshold is set to 0.85 to ensure the quality of images uploaded by users.
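The confidence gate in step S104 can be sketched as follows. The classifier itself (MobileNet, Xception, etc.) is stubbed out; the class name `glucose_meter` and the helper name `should_process` are illustrative, and only the 0.85 threshold comes from the example above.

```python
CONFIDENCE_THRESHOLD = 0.85  # example threshold from the embodiment

def should_process(class_probs, target_class="glucose_meter"):
    """class_probs: dict mapping class name -> predicted probability, as a
    classification model might return. The recognition pipeline only runs
    when the target class's confidence reaches the threshold."""
    return class_probs.get(target_class, 0.0) >= CONFIDENCE_THRESHOLD
```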
步骤S106,利用语义分割网络进行图像分割。具体地,分割整幅图像中前景信息,即血糖仪图像中高亮屏幕部分。用户上传图像包含各种噪声,为保证算法返回结果的精确性,在图像预分割部分选用深度学习中语义分割模型,如BiSeNet、ICNet、BSPNet、ENET等一切语义分割模型。从数据复杂程度及实际应用速度要求考虑,选择具有实时分割特性的网络模型。Step S106: Perform image segmentation using a semantic segmentation network. Specifically, the foreground information in the whole image, i.e., the highlighted screen portion of the blood glucose meter image, is segmented. Images uploaded by users contain various kinds of noise; to ensure the accuracy of the results returned by the algorithm, a deep-learning semantic segmentation model, such as BiSeNet, ICNet, BSPNet, ENET, or any other semantic segmentation model, is used in the image pre-segmentation stage. Considering data complexity and the speed requirements of practical applications, a network model with real-time segmentation characteristics is selected.
步骤S108,进行图像校正。根据语义分割结果进行四边形拟合、角点检测并有序返回屏幕四个角点坐标信息。利用角点信息计算投影变换矩阵并进行图像投影变换。图像方向判断——根据图像中灰度、颜色、纹理等信息,进行图像旋转,返回血糖仪屏幕正方向图像(ROI,Region of Interest,感兴趣区域)。Step S108: Perform image correction. Quadrilateral fitting and corner detection are performed according to the semantic segmentation result, and the coordinates of the four screen corners are returned in order. The corner information is used to compute the projection transformation matrix and perform the image projection transformation. Image orientation judgment: according to the grayscale, color, texture, and other information in the image, the image is rotated so that the upright image of the blood glucose meter screen (ROI, Region of Interest) is returned.
步骤S110,局部标准差及颜色特征提取——判断图像类型,其中图像类型包含N小时和24小时图像,其中,N小于24。Step S110: Extract local standard deviation and color features to judge the image type, where the image types include N-hour and 24-hour images, with N less than 24.
步骤S112,利用SVM分类器对图像进行分类,然后,利用标准色带检测,并进行血糖曲线的分割,其中,对于24小时的血糖图像利用OCR识别出其中的日期信息,对于8小时的血糖图像则利用OCR识别出血糖设备扫描的起始时间和终点时间。当然,识别图像中的具体数据的方案除了OCR之外,还可以采用数字图像处理(DIP,Digital Image Processing)技术。Step S112: Use an SVM classifier to classify the image, then detect the standard color band and segment the blood glucose curve. For a 24-hour blood glucose image, OCR is used to recognize the date information; for an 8-hour blood glucose image, OCR is used to recognize the start time and end time of the blood glucose device scan. Of course, besides OCR, digital image processing (DIP) techniques can also be adopted to recognize the specific data in the image.
步骤S114,确定血糖值与图像像素之间的映射关系。Step S114: Determine the mapping relationship between the blood glucose level and the image pixels.
步骤S116,利用上述映射关系计算某一像素点的血糖值,并输出计算得到的血糖值。Step S116: Calculate the blood glucose value of a certain pixel by using the above mapping relationship, and output the calculated blood glucose value.
基于上述实施例,本申请实施例提供了一种图像识别方法的方法实施例,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。Based on the above embodiments, the embodiments of the present application provide a method embodiment of an image recognition method. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here.
图2是根据本申请的另一个实施例的图像识别方法的流程图,如图2所示,该方法包括如下步骤:Fig. 2 is a flowchart of an image recognition method according to another embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
步骤S202,获取待识别的目标图像;Step S202, acquiring a target image to be recognized;
步骤S204,获取所述目标图像中的目标区域,其中,该目标区域中的图像用于反映指定类型参数信息;Step S204: Acquire a target area in the target image, where the image in the target area is used to reflect specified type parameter information;
步骤S206,确定所述目标区域中被选定的像素点坐标;Step S206: Determine the coordinates of the selected pixel in the target area;
步骤S208,基于指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值。Step S208: Determine the value of the parameter corresponding to the selected pixel point coordinate based on the correlation between the value of the specified type parameter and the pixel point coordinate.
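As an illustrative sketch of step S208 (not a limitation of the claims), the association between pixel coordinates and parameter values can be a linear map built from two reference points whose actual values are known; the Python function below is a hypothetical example, with names chosen here for illustration only.

```python
def linear_association(ref1, ref2):
    """Build a linear pixel-coordinate -> parameter-value map from two
    reference points given as (pixel_y, actual_value) pairs."""
    (y1, v1), (y2, v2) = ref1, ref2
    slope = (v2 - v1) / (y2 - y1)
    # returned callable implements step S208 for any selected pixel coordinate
    return lambda pixel_y: v1 + slope * (pixel_y - y1)
```

For instance, with the standard color band edges as the two reference points, the returned callable maps any pixel height on the curve to its parameter value.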
通过上述实施例提供的方案,由于采用目标图像中像素点坐标和指定类型参数的取值之间的关联关系识别图像中任意像素点坐标对应的参数取值,因此,实现了对图像中非字符信息所表示的参数取值的识别,达到了将图像中的像素点自动识别为相应的参数取值的目的,进而解决了当前的图像识别方式只能识别出图像中的字符格式的数值,不能将曲线或离散点自动识别为数值的技术问题。Through the solution provided by the above embodiment, since the association between pixel coordinates in the target image and values of the specified type parameter is used to recognize the parameter value corresponding to any pixel coordinate in the image, recognition of parameter values represented by non-character information in the image is realized, achieving the purpose of automatically recognizing pixels in the image as corresponding parameter values, and thereby solving the technical problem that current image recognition methods can only recognize numerical values in character format in an image and cannot automatically recognize curves or discrete points as numerical values.
对于上述关联关系,其表现方式有多种,例如,可以表现为映射关系,也可以表现为线性函数关系等,其中,对于前者,可以通过以下方式实现:在基于指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值之前,从所述目标区域中分离出指定颜色通道,其中,所述指定颜色通道为R、G、B颜色通道中与所述目标区域的标准色带所对应颜色通道相同的颜色通道;对所述指定颜色通道的图像进行图像二值化处理,二值化处理为选择指定颜色通道的图像中大于预设阈值的像素点集合,得到二值化图像;从预设阈值集合中选择所述二值化图像中每个像素点所在区域所对应的阈值,并利用选择的阈值对所述目标区域进行图像分割;对分割后得到的二值化图像进行参考点像素识别,得到所述标准色带的至少两个参考点在图像中的像素点坐标;确定所述至少两个参考点的实际取值以及所述像素点坐标之间的对应关系,并基于所述对应关系建立所述指定类型参数的取值与像素点坐标的线性关系,并将所述线性关系作为所述关联关系。The above association can be expressed in many ways; for example, it may be expressed as a mapping relationship or as a linear functional relationship. The former can be implemented as follows: before determining the parameter value corresponding to the selected pixel coordinates based on the association between values of the specified type parameter and pixel coordinates, a designated color channel is separated from the target area, where the designated color channel is the one among the R, G, and B color channels that matches the color channel corresponding to the standard color band of the target area; image binarization is performed on the image of the designated color channel, the binarization consisting of selecting the set of pixels in the designated-channel image that are greater than a preset threshold, to obtain a binarized image; a threshold corresponding to the region where each pixel of the binarized image is located is selected from a preset threshold set, and the selected threshold is used to segment the target area; reference-point pixel recognition is performed on the binarized image obtained after segmentation, yielding the pixel coordinates of at least two reference points of the standard color band in the image; the correspondence between the actual values of the at least two reference points and their pixel coordinates is determined, a linear relationship between values of the specified type parameter and pixel coordinates is established based on that correspondence, and the linear relationship is taken as the association.
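A minimal sketch of the binarization and reference-point steps just described, assuming the standard color band shows up as the bright region of the designated channel (hypothetical helpers in pure Python; a real pipeline would operate on image arrays):

```python
def binarize(channel, threshold):
    # keep pixels of the designated channel whose value exceeds the threshold
    return [[1 if p > threshold else 0 for p in row] for row in channel]

def band_edge_rows(mask):
    # rows containing at least one band pixel; their min/max give the upper
    # and lower edge heights (std_upper, std_lower) used as reference points
    rows = [y for y, row in enumerate(mask) if any(row)]
    return min(rows), max(rows)
```

The two returned row heights, paired with the known band values, are exactly the "at least two reference points" from which the linear relationship is built.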
具体地,以血糖曲线中血糖数据的识别为例,上述关联关系表现为映射关系,此时,需要用到标准色带。该检测过程是有效建立血糖实际值与血糖图像像素坐标之间的关系的一种途径,其目的就是寻找血糖曲线上像素坐标与实际血糖值的对应线性关系。检测过程中,对血糖图像ROI区域进行颜色通道R,G,B分离,因标准色带呈现蓝色特征,故提取B通道进行图像处理,首先,对该通道图像进行灰度拉伸以及灰度值归一化处理,然后,对图像进行自适应阈值分割,同时,利用图像形态学处理完成分割图像的噪声处理,完成标准色带图像分割操作,最后,对上述过程中的二值图像进行横向直线拟合,拟合结果为标准色带上下线在图像中的像素坐标高度。常见血糖仪扫描仪(即血糖设备)中,标准色带的上下线血糖值分别可以为3.9和7.8。由实际标准色带上的已知实际血糖值以及其对应的像素坐标高度建立其线性关系:Specifically, taking the recognition of blood glucose data in a blood glucose curve as an example, the above association is expressed as a mapping relationship, and a standard color band is required. This detection process is one way to effectively establish the relationship between actual blood glucose values and pixel coordinates in the blood glucose image; its purpose is to find the linear correspondence between pixel coordinates on the blood glucose curve and actual blood glucose values. During detection, the R, G, and B color channels of the ROI area of the blood glucose image are separated; because the standard color band exhibits blue characteristics, the B channel is extracted for image processing. First, grayscale stretching and grayscale value normalization are performed on this channel image; then adaptive threshold segmentation is applied to the image, while image morphology operations are used to denoise the segmented image, completing the standard color band segmentation. Finally, horizontal straight-line fitting is performed on the binary image from the above process, and the fitting result gives the pixel-coordinate heights of the upper and lower lines of the standard color band in the image. In a common blood glucose meter scanner (i.e., blood glucose device), the blood glucose values of the upper and lower lines of the standard color band may be 3.9 and 7.8, respectively. The linear relationship is established from the known actual blood glucose values on the standard color band and their corresponding pixel-coordinate heights:
sValue = 3.9/(std_lower-std_upper)
rValue = 5.85 + 0.5*sValue*(std_lower+std_upper) - sValue*line_rho
其中,sValue:图像像素与实际血糖值比例系数 where sValue is the ratio coefficient between image pixels and actual blood glucose values;
rValue:图像处理返回的实际血糖值 rValue is the actual blood glucose value returned by image processing;
line_rho:图像中血糖曲线像素高度 line_rho is the pixel height of the blood glucose curve in the image;
std_upper:标准色带上边缘像素高度 std_upper is the pixel height of the upper edge of the standard color band;
std_lower:标准色带下边缘像素高度 std_lower is the pixel height of the lower edge of the standard color band.
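The two formulas above can be transcribed directly into code and checked numerically; the sketch below uses the band values 3.9 and 7.8 given in the text (note that 5.85 is their midpoint, and that in image coordinates the upper band edge, the smaller row index, maps to the larger glucose value):

```python
def pixel_to_glucose(line_rho, std_upper, std_lower):
    # sValue: ratio coefficient between image pixels and actual glucose values
    s_value = 3.9 / (std_lower - std_upper)
    # rValue: actual blood glucose value returned by image processing
    return 5.85 + 0.5 * s_value * (std_lower + std_upper) - s_value * line_rho
```

With std_upper = 100 and std_lower = 200, a curve point at the upper band edge evaluates to 7.8 and one at the lower edge to 3.9, consistent with the band values stated above.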
上述目标区域的确定方式有多种,例如,可以对像素点进行聚类的方式确定,也可以将感兴趣区域作为目标区域,对于前者,可以通过以下方式进行处理:对目标区域中的图像进行灰度化处理,得到灰度图像;对灰度图像中的各个像素点进行聚类处理,得到多个簇;从多个簇中选择指定簇,并从指定簇中的所有像素点中确定被选定的像素点坐标。There are multiple ways to determine the above target area; for example, it can be determined by clustering pixels, or the region of interest can be taken as the target area. For the former, the processing can be as follows: perform grayscale conversion on the image in the target area to obtain a grayscale image; perform clustering on the pixels of the grayscale image to obtain multiple clusters; select a designated cluster from the multiple clusters, and determine the coordinates of the selected pixel from all pixels in the designated cluster.
另外,本实施例中的目标图像可以是针对血糖仪数据生成的需要展示给客户的原始图像,例如,分析血液中血糖含量值与时间值的曲线图,其中该曲线图处于由时间坐标轴与血糖含量坐标轴构成的坐标系之中,那么这个曲线图便是该血糖仪数据分析的原始待处理图像,即目标图像。In addition, the target image in this embodiment may be an original image generated from blood glucose meter data that needs to be displayed to the customer, for example, a graph of blood glucose level against time, where the graph lies in a coordinate system formed by a time axis and a blood glucose level axis; that graph is then the original image to be processed for the blood glucose meter data analysis, i.e., the target image.
在本申请的一些实施例中,在从多个簇中选择指定簇,并从指定簇中的所有像素点中确定被选定的像素点坐标时,选择指定簇所依据的原则根据实际情况灵活确定,例如,从多个簇中选择像素点数量最少的簇,并从像素点数量最少的簇中确定被选定的像素点坐标。In some embodiments of the present application, when a designated cluster is selected from multiple clusters and the coordinates of the selected pixel are determined from all pixels in the designated cluster, the principle for selecting the designated cluster can be flexibly determined according to the actual situation; for example, the cluster with the fewest pixels is selected from the multiple clusters, and the coordinates of the selected pixel are determined from that cluster.
具体地,在对上述目标区域进行检测时,可以表现为对图像中的曲线进行检测,以血糖曲线为例,为尽可能减少图像噪点的存在,对输入图像进行截取局部区域rect操作,参数设置如表1所示:Specifically, detecting the above target area can take the form of detecting a curve in the image. Taking the blood glucose curve as an example, to minimize image noise, a local-region rect cropping operation is applied to the input image, with the parameters set as shown in Table 1:
表1血糖曲线检测RECT参数详情Table 1 Details of RECT parameters for blood glucose curve detection
Figure PCTCN2020100247-appb-000001
如图4a所示,图像中血糖曲线可以由黑色和红色(图中未区分颜色)共同连接而成,因此,对图像进行颜色通道分离并提取出R通道,此时,如图4b所示,图像灰度分布呈现黑色、灰色以及白色三种分布趋势,故设置聚类中心个数为3(如图4c所示),利用KMeans对图像进行图像灰度聚类,而血糖曲线在图像中所占区域最少,因此,提取分类结果中类别数量最少的类别作为血糖曲线的类别(如图4d所示),对该类别图像进行图像后处理操作完成血糖曲线分割与检测过程。As shown in Figure 4a, the blood glucose curve in the image may be formed by connected black and red segments (colors are not distinguished in the figure); therefore, the color channels of the image are separated and the R channel is extracted. At this point, as shown in Figure 4b, the grayscale distribution of the image presents three trends, black, gray, and white, so the number of cluster centers is set to 3 (as shown in Figure 4c) and KMeans is used to cluster the image grayscale values. Since the blood glucose curve occupies the smallest area in the image, the category with the fewest members in the clustering result is extracted as the blood glucose curve category (as shown in Figure 4d), and image post-processing on that category completes the blood glucose curve segmentation and detection process.
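The clustering step (three grayscale clusters, keeping the smallest as the curve) can be sketched in pure Python; a production system would use a library implementation of KMeans (e.g. from scikit-learn), so the crude 1-D version below is only an illustrative stand-in:

```python
def kmeans_1d(values, k=3, iters=20):
    # crude 1-D k-means: centres initialised evenly over the value range
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[min(range(k), key=lambda i: abs(v - centres[i]))].append(v)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

def curve_mask(values, k=3):
    # the glucose curve occupies the fewest pixels, so keep the smallest cluster
    centres, clusters = kmeans_1d(values, k)
    target = min(range(k), key=lambda i: len(clusters[i]))
    return [min(range(k), key=lambda i: abs(v - centres[i])) == target
            for v in values]
```

For example, on a pixel population dominated by white background and gray text, the small population of dark curve pixels ends up in the smallest cluster and is selected.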
正如上面所述,可以依据感兴趣区域确定目标区域,具体地:对目标图像进行语义分割,得到目标图像的掩码图像和前景图像;从前景图像中确定感兴趣区域,并将感兴趣区域作为目标区域。As mentioned above, the target area can be determined according to the region of interest, specifically: perform semantic segmentation on the target image to obtain the mask image and foreground image of the target image; determine the region of interest from the foreground image, and take the region of interest as the target area.
其中,可以通过以下方式从前景图像中确定感兴趣区域:确定前景图像中的特征区域,以及目标几何区域的角点坐标,其中,特征区域为前景图像中包含指定类型参数信息的区域;基于角点坐标计算投影变换矩阵;对特征区域中的像素点进行投影变换,得到感兴趣区域。其中,在进行投影变换过程中,还可以包括对图像的旋转处理,以保证能够正确识别感兴趣区域。The region of interest can be determined from the foreground image as follows: determine the feature region in the foreground image and the corner coordinates of the target geometric region, where the feature region is the region in the foreground image that contains the specified type of parameter information; compute the projection transformation matrix based on the corner coordinates; and perform projection transformation on the pixels in the feature region to obtain the region of interest. The projection transformation process may also include rotating the image to ensure that the region of interest can be correctly recognized.
利用预分割掩码mask以及图像形态学处理进行四边形拟合,有序(逆时针)返回四边形角点坐标(左上,左下,右下,右上,包含但不局限于此顺序)。然后,由返回的角点坐标计算投影变换矩阵,对血糖图像的高亮区域进行投影变换得到ROI区域,其中,变换过程如图3所示。Use the pre-segmentation mask and image morphology processing to perform quadrilateral fitting, and return the quadrilateral corner coordinates (upper left, lower left, lower right, upper right, including but not limited to this order) in order (counterclockwise). Then, the projection transformation matrix is calculated from the returned corner coordinates, and the highlight area of the blood glucose image is projected and transformed to obtain the ROI area. The transformation process is shown in Figure 3.
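The projection-transform step can be illustrated by solving for the 3x3 homography that maps the four detected screen corners to an upright rectangle. The sketch below (pure Python, Gauss-Jordan elimination on the standard 8-equation system) is an assumed stand-in for library routines such as OpenCV's getPerspectiveTransform:

```python
def solve_homography(src, dst):
    # src, dst: four (x, y) corner pairs; returns H (3x3, H[2][2] = 1)
    # such that dst ~ H * src in homogeneous coordinates
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    n = 8
    M = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):                      # Gauss-Jordan with pivoting
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    h = [M[i][8] / M[i][i] for i in range(n)] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_homography(H, pt):
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Applying the resulting matrix to every pixel of the highlighted screen region yields the upright ROI; the corner ordering (upper-left, lower-left, lower-right, upper-right) follows the counterclockwise convention described above.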
在本申请的另一些可选的实施例中,上述目标区域中的图像包括:坐标系中的曲线图像,该曲线图像中的曲线用于反映指定类型参数在不同时刻的取值。In some other optional embodiments of the present application, the image in the above-mentioned target area includes: a curve image in a coordinate system, and the curve in the curve image is used to reflect the value of the specified type parameter at different times.
在对图像进行识别时,为方便用户查询具体参数值所对应的时刻,在确定像素点坐标时,还可以确定像素点坐标在曲线图像中对应的目标记录时间;此时,步骤S208在确定选择的像素点所对应的参数取值时,可以表现为以下处理过程:基于指定类型参数的取值与像素点坐标的关联关系确定被选定的像素点坐标在目标记录时间所对应的参数取值。When recognizing the image, to make it convenient for the user to query the moment corresponding to a specific parameter value, the target recording time corresponding to the pixel coordinates in the curve image can also be determined when determining the pixel coordinates. In this case, when step S208 determines the parameter value corresponding to the selected pixel, the processing can be as follows: based on the association between values of the specified type parameter and pixel coordinates, determine the parameter value corresponding to the selected pixel coordinates at the target recording time.
在本申请的一些实施例中,目标记录时间可以通过以下方式确定:识别曲线图像中的字符信息,从字符信息中提取指定类型参数的时间信息;对时间信息中任意的相邻两个记录时刻之间的时长按照像素点数量进行等间隔划分,得到多个时间点;从多个时间点中确定被选定的像素点坐标所属的目标记录时间,例如,在T1和T2时刻之间的横向像素数为N,则T1和T2之间横向上第M个像素点对应的时刻为T1+[M*(T2-T1)/N]。其中,可以采用OCR技术识别上述字符信息,但不限于此。In some embodiments of the present application, the target recording time can be determined as follows: recognize the character information in the curve image and extract the time information of the specified type parameter from the character information; divide the duration between any two adjacent recording moments in the time information into equal intervals according to the number of pixels, obtaining multiple time points; and determine, from the multiple time points, the target recording time to which the selected pixel coordinates belong. For example, if the number of horizontal pixels between moments T1 and T2 is N, the moment corresponding to the M-th horizontal pixel between T1 and T2 is T1+[M*(T2-T1)/N]. OCR technology can be used to recognize the above character information, but the recognition is not limited to OCR.
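The equal-interval interpolation T1+[M*(T2-T1)/N] can be written directly; the sketch below represents times as minutes since midnight, an assumption made here for illustration:

```python
def pixel_to_time(m, n, t1, t2):
    # moment of the m-th of n horizontal pixels between adjacent
    # recorded times t1 and t2 (times in minutes since midnight)
    return t1 + m * (t2 - t1) / n
```

For example, if T1 = 8:00 (480 min) and T2 = 16:00 (960 min) span 100 horizontal pixels, the 50th pixel corresponds to 12:00 (720 min).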
对血糖图像中的时间信息进行字符识别,准确记录瞬感血糖时间,为后续形成连续血糖数据的管理提供有力保障。对于应用程序接口而言,庞大的数据输入势必会增加数据传输时间,同时,接口处理数据时长也会随之增加。因此,为提升字符识别速率,识别过程中对输入数据进行一定的数据前处理,通过实验测试,调整图像尺寸可直接影响识别速率和准确度,在具体实验测试中,固定输入图像大小为256*256(包括但不仅仅局限于该图像尺寸大小,可根据数据复杂程度适当调整)时,字符识别准确率和速率可取得相对较优结果。在固定输入图像大小为256*256后,截取局部区域RECT(RECT为矩形框缩写,其代表在图像中所截取的矩形区域)进行字符识别,8小时RECT区域参数设置为(235,10,20,225),24小时血糖图像RECT区域参数设置为(205,85,20,225),具体参数说明以8小时示例,235代表截取区域图像在大小为256*256输入图像中起始点的纵坐标,10为截取区域图像在大小为256*256输入图像中起始点的横坐标,20代表该截取区域的高度,225则代表截取区域的宽度。具体实例如表2所示:Character recognition is performed on the time information in the blood glucose image to accurately record the instant-scan blood glucose time, providing a solid guarantee for the subsequent management of continuous blood glucose data. For the application program interface, large data inputs inevitably increase data transmission time, and the time the interface spends processing data increases accordingly. Therefore, to increase the character recognition rate, certain data pre-processing is applied to the input data during recognition. Experimental tests show that adjusting the image size directly affects recognition speed and accuracy; in the specific tests, when the input image size is fixed at 256*256 (including but not limited to this size, which can be adjusted appropriately according to data complexity), relatively good character recognition accuracy and speed are obtained. After the input image size is fixed at 256*256, a local region RECT (RECT is short for rectangle, representing the rectangular region cropped from the image) is extracted for character recognition. For the 8-hour image, the RECT region parameters are set to (235, 10, 20, 225); for the 24-hour blood glucose image, the RECT region parameters are set to (205, 85, 20, 225). Taking the 8-hour case as an example, 235 represents the vertical coordinate of the starting point of the cropped region in the 256*256 input image, 10 is the horizontal coordinate of that starting point, 20 represents the height of the cropped region, and 225 represents its width. Specific examples are shown in Table 2:
表2截取区域RECT参数详情及示例Table 2 Details and examples of RECT parameters in the interception area
Figure PCTCN2020100247-appb-000002
可选地,在目标图像为第一类型的情况下,采用高效神经网络模型对目标图像进行语义分割,其中,高效神经网络模型包括:初始化模块和瓶颈模块,其中,每个瓶颈模块包括三个卷积层,其中,三个卷积层中的第一卷积层用于进行降维处理,第二卷积层用于进行空洞卷积、全卷积和非对称卷积,第三卷积层用于进行升维处理;在目标图像为第二类型的情况下,对双边分割网络模型进行调整,并利用调整后的分割网络模型对目标图像进行语义分割,其中,对分割网络模型进行调整包括:所述双边分割网络模型包括主干网络和辅助网络,所述主干网络由两层构成,每层主干网络分别包括卷积层、批归一化层和非线性激活函数,降低主干网络输出通道特征图数;所述辅助网络模型框架采用轻量级模型,降低主干网络输出通道特征图数,所述轻量级模型包括以下之一:Xception39、SqueezeNet、Xception、MobileNet、ShuffleNet;其中,所述第一类型所对应第一数据集的图像数量小于第二类型所对应第二数据集中的图像数量。Optionally, when the target image is of the first type, an efficient neural network model is used for semantic segmentation of the target image, where the efficient neural network model includes an initialization module and bottleneck modules, each bottleneck module containing three convolutional layers: the first convolutional layer performs dimensionality reduction, the second performs dilated convolution, full convolution, and asymmetric convolution, and the third restores the dimensionality. When the target image is of the second type, the bilateral segmentation network model is adjusted, and the adjusted segmentation network model is used for semantic segmentation of the target image, where adjusting the segmentation network model includes: the bilateral segmentation network model comprises a backbone network and an auxiliary network; the backbone network consists of two layers, each containing a convolutional layer, a batch normalization layer, and a nonlinear activation function, and the number of feature maps in the backbone network's output channels is reduced; the auxiliary network adopts a lightweight model framework, reducing the number of feature maps in the backbone network's output channels, where the lightweight model includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet. The number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type. That is, the images in the first data set are of the first type and the images in the second data set are of the second type, and the first data set contains fewer images than the second. In one embodiment of this application, either the efficient neural network (i.e., the segmentation model used for the first type) or the segmentation network model obtained by adjusting the bilateral segmentation network model (i.e., the model used for the second type) may be chosen to perform semantic segmentation on the target image. In a preferred embodiment, the adjusted bilateral segmentation network model is used: when processing large-scale data it has the advantages of fast processing and of retaining a certain richness of spatial information while simultaneously preserving the size of the receptive field. In addition, the role of the above main network is to retain rich spatial information, while the role of the auxiliary network is to preserve the receptive field size.
在本申请的一些实施例中,由于调整后的双边分割网络模型处理能力强,因此,其不仅可处理大规模数据,也可以支持小规模数据的处理。同样,在数据规模较小时,可以采用高效神经网络模型对图像进行处理,在数据规模较大时,也可以手动或自动切换至调整后的双边分割网络模型进行处理。数据规模的大或小,在特定硬件能力的背景下,基于具体查询请求的数量,或基于时间段内的查询请求的统计数量决定,其中,查询请求可以用于请求对图像进行处理(例如,对用户上传的图像进行识别)。In some embodiments of the present application, due to the strong processing capability of the adjusted bilateral segmentation network model, it can process not only large-scale data but also small-scale data. Likewise, when the data scale is small, the efficient neural network model can be used to process images; when the data scale is large, processing can also be switched, manually or automatically, to the adjusted bilateral segmentation network model. Whether the data scale is large or small is decided, in the context of specific hardware capabilities, based on the number of specific query requests or on the statistical number of query requests within a time period, where a query request can be used to request image processing (for example, recognition of images uploaded by users).
正如上面所述,数据集可以由待处理的图像组成,在此基础上,上述第二数据集为满足以下条件的数据集:在特定时间段内数据集中需要处理的目标图像(例如待识别图像)的数量大于预设阈值;第一数据集为满足以下条件的数据集:在特定时间段内的数据集中需要处理的目标图像(例如待识别图像)的数量小于预设阈值。As mentioned above, a data set can consist of images to be processed. On this basis, the above second data set is a data set satisfying the following condition: the number of target images to be processed (for example, images to be recognized) in the data set within a specific time period is greater than a preset threshold. The first data set is a data set satisfying the following condition: the number of target images to be processed (for example, images to be recognized) in the data set within a specific time period is less than the preset threshold.
其中,上述特定时间段的确定方式有多种,例如:Among them, there are many ways to determine the above specific time period, for example:
在对目标图像的数量进行统计时,可以是每隔预设时长统计一次,将任意一个用于统计目标图像的数量的时间段作为上述特定时间段,例如,将在获取所述目标图像中的目标区域这一动作的起始时间点为起点、将其与中点或终点之间的预设时长(例如3ms、5ms、30ms、50ms、100ms、500ms、1s或2s等)作为所述特定时间段;例如,将每天的24小时平均划分为12个时间段,则12个时间段中的任意一个时间段为上述特定时间段。此时,如果某个时间段内接到的目标图像或待处理的目标图像的数量大于预设阈值,则该某个时间段对应的数据集为第二数据集,如果某个时间段内接到的目标图像或待处理的目标图像的数量小于预设阈值,则该某个时间段对应的数据集为第一数据集。When counting the number of target images, the count can be taken once every preset interval, and any time period used for counting the number of target images can serve as the above specific time period; for example, the specific time period can start at the time point when the action of acquiring the target area in the target image begins and last a preset duration (for example, 3 ms, 5 ms, 30 ms, 50 ms, 100 ms, 500 ms, 1 s, or 2 s). As another example, if the 24 hours of each day are evenly divided into 12 time periods, any one of the 12 time periods is the above specific time period. In this case, if the number of target images received or awaiting processing within a certain time period is greater than the preset threshold, the data set corresponding to that time period is the second data set; if the number of target images received or awaiting processing within a certain time period is less than the preset threshold, the data set corresponding to that time period is the first data set.
又例如:Another example:
上述特定时间段可以是预先设定的固定的时间段,例如,每1s、1min、1h等等。The aforementioned specific time period may be a predetermined fixed time period, for example, every 1 s, 1 min, 1 h, and so on.
本申请的一个实施方式中,第一数据集和第二数据集中的图像数量是可以动态变化的,在进行目标图像的语义分割时,可根据第一数据集和第二数据集中的目标图像的数量的多少确定选择采用第一类型对应的模型进行语义分割或采用第二类型对应的模型进行语义分割。本申请的一个具体实施例中,数据规模的大或小,在特定硬件能力的背景下,基于具体查询网络请求的数量决定,例如,如果使用CPU服务器,且其具备不少于10000个CPU核的计算能力,响应时间要求小于3秒,数据集中目标图像的数量大于等于50000被认为是第二数据集,小于50000被认为是第一数据集;又例如,使用GPU服务器,则其包含不少于等效于24张NVIDIA V100 GPU的计算能力,响应时间要求小于3秒,数据集中目标图像的数量大于等于50000被认为是第二数据集,小于50000被认为是第一数据集。本申请的另一个实施例中,数据规模的大或小,在特定硬件能力的背景下,基于时间段的统计数量决定。例如,在上午9-12点左右由于查询的用户比较多,可以采用第二类型对应的模型进行语义分割,在晚上00:00-08:00时段查询数据的用户比较少,可以采用第一类型对应的模型进行语义分割,在实际应用时:接收用户上传的目标图像;确定目标图像的上传时间;确定所述上传时间对应的时间段;依据该时间段确定对目标图像进行语义分割的分割网络模型,并依据确定的分割网络模型对目标图像进行语义分割。In an embodiment of the present application, the number of images in the first and second data sets can change dynamically; when performing semantic segmentation of the target image, whether to use the model corresponding to the first type or the model corresponding to the second type can be decided according to the number of target images in the first and second data sets. In a specific embodiment of this application, whether the data scale is large or small is decided, in the context of specific hardware capabilities, by the number of specific network query requests. For example, if a CPU server with computing capacity of no fewer than 10,000 CPU cores is used and the response time requirement is under 3 seconds, a data set containing 50,000 or more target images is regarded as the second data set and one with fewer than 50,000 as the first data set; likewise, with a GPU server containing computing capacity equivalent to no fewer than 24 NVIDIA V100 GPUs and a response-time requirement under 3 seconds, a data set with 50,000 or more target images is regarded as the second data set and one with fewer than 50,000 as the first data set. In another embodiment of the present application, whether the data scale is large or small is decided, in the context of specific hardware capabilities, by statistics over time periods. For example, because there are more querying users from about 9:00 to 12:00 in the morning, the model corresponding to the second type can be used for semantic segmentation; from 00:00 to 08:00 at night fewer users query data, so the model corresponding to the first type can be used. In actual application: receive the target image uploaded by the user; determine the upload time of the target image; determine the time period corresponding to the upload time; determine, according to that time period, the segmentation network model for semantic segmentation of the target image; and perform semantic segmentation on the target image according to the determined segmentation network model.
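Under the example figures given in the text (a 50,000-image threshold, given sufficient hardware and a sub-3-second response requirement), the model choice can be sketched as a simple gate; the string labels below are shorthand for the two segmentation models described above, not names from the source:

```python
def choose_segmentation_model(pending_images, threshold=50000):
    # second data set (large scale) -> adjusted bilateral network (BiSeNet-based)
    # first data set (small scale)  -> efficient network (ENet-based)
    return "adjusted-bisenet" if pending_images >= threshold else "enet"
```

The same gate could equally be keyed on the upload time period, as in the 9:00-12:00 versus 00:00-08:00 example above.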
其中,为避免运算资源的浪费,在对目标图像进行语义分割处理,得到目标图像的感兴趣区域之前,还可以确定目标图像的类型;在类型为预设类型时,确定对目标图像进行语义分割。在确定目标图像的类型时,可以表现为以下处理过程:将目标图像均分成预设数量个不重叠滑块;确定预设数量个不重叠滑块的特征值,得到预设数量个特征值;将预设数量个特征值组合成特征向量;将特征向量输入至支持向量机分类器进行分析,得到目标图像的类型。具体地,以血糖图像为例,比如雅培连续血糖设备的血糖图像的类型包括8小时以及24小时两种类型。首先提取图像局部方差特征以及局部颜色特征,具体而言,将大小为256*256的输入图像均分成256个16*16的不重叠滑块,计算各个相对独立滑块部分方差和图像蓝色通道像素值的平均值,将方差及蓝色通道平均像素值组合成该独立滑块的特征值,然后,将256个滑块特征组合成维度为512的特征向量,最后,结合SVM(support vector machine,支持向量机)分类器实现图像二分类,完成血糖图像分类。Among them, to avoid wasting computing resources, before performing semantic segmentation on the target image to obtain its region of interest, the type of the target image may also be determined; semantic segmentation of the target image is performed only when the type is a preset type. Determining the type of the target image can proceed as follows: divide the target image evenly into a preset number of non-overlapping blocks; determine the feature values of the preset number of non-overlapping blocks to obtain the preset number of feature values; combine the preset number of feature values into a feature vector; and input the feature vector into a support vector machine classifier for analysis to obtain the type of the target image. Specifically, taking blood glucose images as an example, the blood glucose images of the Abbott continuous blood glucose device include two types: 8-hour and 24-hour. First, local variance features and local color features of the image are extracted. Specifically, the 256*256 input image is divided evenly into 256 non-overlapping 16*16 blocks; the local variance of each relatively independent block and the average pixel value of the image's blue channel are computed, and the variance and the blue-channel average pixel value are combined into the feature values of that block. Then, the 256 block features are combined into a feature vector of dimension 512. Finally, an SVM (support vector machine) classifier performs binary classification of the image, completing the blood glucose image classification.
Taking a blood glucose image as an example, when recognizing the image data it contains, image segmentation is performed on the image to be recognized so that the highlighted screen area can be extracted accurately. First, an ENET network pre-segments the user-uploaded image and returns the foreground-area mask. This network has few parameters, a small model size, and high accuracy. The basic building units of the ENET pre-segmentation network are: (1) an initialization module, and (2) bottleneck modules designed following the ResNet idea. Each bottleneck module contains three convolutional layers: the first performs dimensionality reduction; the second implements dilated convolution, full convolution, asymmetric convolution, and the like; and the third restores the dimensionality. Each convolution is followed by Batch Normalization and PReLU. In the experiment, the total data set contained 644 samples, divided into 515 training images, 65 validation images, and 64 test images; the collected images covered multiple viewing angles, and all photos were evenly illuminated. During network training, the initial learning rate was 0.005, decayed once every 30 epochs, with a total of 300 epochs; the total is not limited to 300, and all network parameters can be adjusted according to the actual data. On this existing small data set, the trained blood glucose image segmentation model performed well; the training and test performance is shown in Table 2. The test environment was: 16 GB of memory, CPU model Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz. The model performance is shown in the table below, where IOU (Intersection over Union) is computed from the ground truth GT and the prediction PR: the final result is the intersection of GT and PR divided by the union of GT and PR, a metric commonly used in object detection and segmentation. The performance of the ENET semantic segmentation network model is shown in Table 3.
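The IOU metric described above can be computed directly from two binary masks. A minimal sketch (the toy masks are hypothetical):

```python
import numpy as np

def iou(gt_mask, pr_mask):
    """Intersection over Union between a ground-truth mask GT and a
    predicted mask PR: |GT ∩ PR| / |GT ∪ PR|."""
    gt = gt_mask.astype(bool)
    pr = pr_mask.astype(bool)
    inter = np.logical_and(gt, pr).sum()
    union = np.logical_or(gt, pr).sum()
    return inter / union if union else 1.0

gt = np.zeros((4, 4), dtype=int); gt[1:3, 1:3] = 1   # 4 foreground pixels
pr = np.zeros((4, 4), dtype=int); pr[1:3, 1:4] = 1   # 6 pixels, 4 overlap
print(iou(gt, pr))  # 4 / 6 ≈ 0.667
```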
Table 3 ENET semantic segmentation network model performance (small data set)
Figure PCTCN2020100247-appb-000003
As user data continued to grow, it was found during iteration of the segmentation network model that the ENET semantic segmentation network was no longer suitable for large data sets. As the data set grows and its complexity increases, ENET, in its pursuit of speed, cannot efficiently and reasonably balance the spatial information and the receptive field in the image; the network's performance on large data sets therefore no longer meets further application requirements. The new segmentation data set contained 4912 samples in total, divided into 4104 training images, 608 validation images, and 200 test images. During network training, the initial learning rate was 0.01, decayed once every 30 epochs, with a total of 300 epochs; all network parameters include but are not limited to the above values and can be adjusted according to the actual data. Under the same test environment, the model performance is shown in Table 4:
Table 4 ENET semantic segmentation network model performance (large data set)
Figure PCTCN2020100247-appb-000004
Therefore, to achieve adequate segmentation performance on the large training data set, a simplified BiSeNet model is proposed. The original BiSeNet segmentation model shows good speed and accuracy on public data sets (Cityscapes, CamVid, COCO-Stuff, etc.). The training data in the embodiments of this application are cleaner and less complex than those public data sets, so the BiSeNet semantic segmentation network is appropriately adjusted and simplified along four lines: (1) the spatial information processing layer (Spatial Path), (2) the receptive field processing layer (Context Path), (3) the number of input-output channels (feature maps) between network layers, and (4) the input image size. The specific simplifications are: (1) the Spatial Path of the backbone network is reduced from the original 3-layer network (each layer comprising the usual convolution layer conv, batch normalization layer Batch Normalization, and nonlinear activation function ReLU) to a 2-layer network, shown as Layer1 and Layer2 in Figure 2; at the same time, the output channels of this part are reduced from 128 feature maps to 64, which greatly reduces the network parameters and effectively compresses the model size, substantially increasing the segmentation speed while preserving segmentation accuracy; (2) the model framework of the auxiliary network Context Path is changed, replacing the original ResNet18/ResNet101 with the lighter Xception39 model, compressing the model size while effectively preserving the receptive field; (3) the number of feature maps (Feature Map) output by each network layer is reduced; (4) the model's input image size is compressed from the original 640*640 to 320*320; model training and testing showed that directly compressing the input image for segmentation still achieves adequate segmentation accuracy while markedly reducing the computational cost. The modified and simplified network structure is shown in Figure 5.
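The effect of trimming the Spatial Path to two conv-BN-ReLU layers with 64 output channels can be illustrated by tracing feature-map shapes. This is a sketch under assumptions: a 3x3 kernel, stride 2, and padding 1 per layer are assumed (the patent does not state these values), matching common BiSeNet implementations.

```python
def conv2d_out(size, kernel=3, stride=2, padding=1):
    """Output spatial size of one conv layer (square input assumed)."""
    return (size + 2 * padding - kernel) // stride + 1

def spatial_path_shapes(input_size=320, layers=2, channels=64):
    """Trace the feature-map shape through the trimmed Spatial Path:
    each layer is conv(3x3, stride 2) + BatchNorm + ReLU, 64 channels."""
    size = input_size
    shapes = []
    for _ in range(layers):
        size = conv2d_out(size)
        shapes.append((channels, size, size))
    return shapes

print(spatial_path_shapes())  # [(64, 160, 160), (64, 80, 80)]
```

With the compressed 320*320 input, two stride-2 layers leave an 80*80 map with 64 channels, versus three layers and 128 channels in the original design.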
Experimental results show that the segmentation model trained on the above data set of 4912 samples achieved better performance, meeting practical application requirements. In a test environment with 16 GB of memory and an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz, the model performance is shown in Table 5:
Table 5 BiSeNet-Xception39 simplified segmentation model performance
Figure PCTCN2020100247-appb-000005
Optionally, the above specified type parameter information includes curve information reflecting the change trend of blood glucose data, or value information of discrete points in a coordinate system, where each discrete point corresponds to the blood glucose value at a sampling time.
After the above parameter value is determined, the parameter value corresponding to the coordinates of the selected pixel point at the target recording time may also be displayed.
In some embodiments of the present application, the coordinates of the selected pixel point in the target area may be determined as follows: detect a user instruction directed at the target image, and determine the coordinates of the selected pixel point according to the instruction.
Optionally, the instruction is determined based on one of the following: the user's touch position on the human-computer interaction interface where the target image is displayed; or query information input by the user. For the former, before the coordinates of the selected pixel point are determined based on the touch point position, the following processing may also be performed: judge whether the touch point position lies within the target area; and trigger determination of the coordinates of the selected pixel point when the judgment result indicates that the touch point position lies within the target area.
Based on the above image recognition method, data analysis and result statistics were performed on 100 8-hour blood glucose images and 100 24-hour blood glucose images. For the 8-hour images, 98 of the 100 could be effectively recognized (the trend of the blood glucose values recognized by this method is consistent with the trend of the values read by the scanner), with an error range of roughly ±0.4, which fits the practical application scenario. For the 24-hour images, all 100 could be effectively recognized (again, the trend of the recognized values is consistent with the scanner readings), with an error range of roughly -0.6 to 0.4, satisfying the need to back-fill missing blood glucose values. In addition, the method's error was measured with the quantitative indicators R-Square, mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE); the specific values are shown in Table 6. For the 8-hour images, the error distribution statistics are shown in Figure 6; the error values follow a normal distribution, with the detailed distribution shown in Figure 7, and are concentrated within ±0.4. For the 24-hour images, the error distribution statistics are shown in Figure 8; their error values also follow a normal distribution, with the detailed distribution shown in Figure 9.
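The four quantitative indicators above are standard regression error metrics. A minimal sketch of how they are computed; the scanner and recognized values below are hypothetical, not data from Table 6.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """R-Square, MSE, RMSE and MAE between reference and recognized values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(resid))
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    return {"R2": r2, "MSE": mse, "RMSE": rmse, "MAE": mae}

# hypothetical scanner readings vs. values recognized from the image
scanner = [5.0, 6.2, 7.1, 8.4, 6.9]
recognized = [5.1, 6.0, 7.3, 8.2, 7.0]
metrics = error_metrics(scanner, recognized)
```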
Table 6 Quantitative indicator results
Figure PCTCN2020100247-appb-000006
An embodiment of the present application further provides an image recognition device for implementing the method shown in Figure 2. As shown in Figure 10, the device includes:
a first acquisition module 10, configured to acquire a target image to be recognized;
a second acquisition module 12, configured to acquire a target area in the target image, where the image in the target area reflects specified type parameter information;
a first determining module 14, configured to determine the coordinates of a selected pixel point in the target area;
a second determining module 16, configured to determine the parameter value corresponding to the coordinates of the selected pixel point based on the association between the values of the specified type parameter and pixel point coordinates.
The functions implemented by the above modules likewise realize recognition of the parameter values represented by non-character information in an image, achieving the purpose of automatically recognizing pixel points in the image as the corresponding parameter values, and thereby solving the technical problem that current image recognition methods can only recognize numerical values in character format in an image and cannot automatically recognize a curve or discrete points as numerical values.
In some embodiments of the present application, as shown in Figure 11, the device further includes: a separation module 11, configured to separate a specified color channel from the target area, where the specified color channel is the one among the R, G, B color channels that matches the color channel corresponding to the standard color band of the target area; a processing module 13, configured to binarize the image of the specified color channel to obtain a binarized image; a selection module 15, configured to select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image lies, and to segment the target area using the selected thresholds; a fitting module 17, configured to perform reference point pixel recognition on the binarized image obtained after segmentation, obtaining the pixel coordinates of at least two reference points of the standard color band in the image; and an establishment module 19, configured to determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the values of the specified type parameter and pixel coordinates based on this correspondence, and use the linear relationship as the association.
As shown in Figure 11, the first determining module 14 includes: a grayscale processing unit 140, configured to grayscale the image in the target area to obtain a grayscale image; a clustering unit 142, configured to cluster the pixel points of the grayscale image to obtain a plurality of clusters; and a selection unit 144, configured to select a specified cluster from the plurality of clusters and determine the coordinates of the selected pixel points from all pixel points in the specified cluster.
The selection unit 144 is further configured to select, from the plurality of clusters, the cluster with the fewest pixel points, and to determine the coordinates of the selected pixel points from that cluster.
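The cluster-and-select step can be sketched with a small 1-D k-means over grayscale values. This is an illustrative sketch under assumptions: centers are initialized from quantiles, and the smallest cluster is assumed to hold the sparse curve pixels; the toy image is hypothetical.

```python
import numpy as np

def smallest_cluster_coords(gray, k=2, iters=20):
    """Cluster grayscale pixel values with a tiny 1-D k-means and return
    the (row, col) coordinates of the cluster with the fewest pixels."""
    vals = gray.reshape(-1).astype(float)
    centers = np.quantile(vals, np.linspace(0, 1, k))  # spread initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vals[labels == j].mean()
    counts = np.bincount(labels, minlength=k)
    target = np.argmin(counts)          # fewest pixels -> candidate curve cluster
    rows, cols = np.divmod(np.flatnonzero(labels == target), gray.shape[1])
    return np.stack([rows, cols], axis=1)

# toy grayscale image: dark background with a few bright "curve" pixels
img = np.zeros((8, 8)); img[2, 1:5] = 200; img[0, 0] = 90
coords = smallest_cluster_coords(img, k=2)  # the four bright pixels on row 2
```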
Optionally, the image in the target area includes a curve image in a coordinate system, where the curve in the curve image reflects the values of the specified type parameter at different times.
In some embodiments of the present application, the first determining module 14 is further configured to determine the target recording time corresponding to the pixel point coordinates in the curve image; the second determining module 16 is further configured to determine, based on the association between the values of the specified type parameter and pixel point coordinates, the parameter value corresponding to the selected pixel point coordinates at the target recording time.
The above target recording time is determined as follows: recognize the character information in the curve image and extract the time information of the specified type parameter from it; divide the interval between any two adjacent recording moments in the time information into equal sub-intervals according to the number of pixel points, obtaining a plurality of time points; and determine, from the plurality of time points, the target recording time to which the coordinates of the selected pixel point belong.
Optionally, the first determining module further includes: a first recognition unit, configured to recognize character information in the curve image and extract the time information of the specified type parameter from the character information; a first division unit, configured to divide the interval between any two adjacent recording moments in the time information into equal sub-intervals according to the number of pixel points, obtaining a plurality of time points; and a first determining unit, configured to determine, from the plurality of time points, the target recording time to which the coordinates of the selected pixel point belong.
Optionally, as shown in Figure 11, the second acquisition module 12 includes: a segmentation unit 120, configured to perform semantic segmentation on the target image to obtain a mask image and a foreground image of the target image; and a second determining unit 122, configured to determine the target area from the foreground image. The segmentation unit 120 is configured to, when the target image is of a first type, perform semantic segmentation on the target image using an efficient neural network (ENET) model, where the model includes an initialization module and bottleneck modules, each bottleneck module comprising three convolutional layers: the first for dimensionality reduction; the second for dilated convolution, full convolution, and asymmetric convolution; and the third for dimensionality restoration. The segmentation unit 120 is further configured to, when the target image is of a second type, adjust a bilateral segmentation network (BiSeNet) model and perform semantic segmentation on the target image using the adjusted model, where adjusting the segmentation network model includes at least one of: reducing the number of spatial information processing layers in the model; reducing the number of feature maps output by each network layer; compressing the input image of the bilateral segmentation network model; and simplifying the receptive field processing layer.
The segmentation unit 120 is further configured to simplify the receptive field processing layer by replacing the residual neural network (ResNet) module in the receptive field processing layer with a channel-separated convolution (Xception39) module.
Optionally, as shown in Figure 11, the above device may further include: a third determining module 21, configured to determine the type of the target image, and, when the type is a preset type, to determine that semantic segmentation is to be performed on the target image. The third determining module 21 is further configured to determine the type of the target image as follows: divide the target image evenly into a preset number of non-overlapping blocks; determine the feature values of the preset number of non-overlapping blocks to obtain a preset number of feature values; combine the preset number of feature values into a feature vector; and input the feature vector into a support vector machine classifier for analysis to obtain the type of the target image.
Optionally, the target area contains a curve image of the change trend of blood glucose data.
Optionally, the device further includes: a display module, configured to display the parameter value corresponding to the coordinates of the selected pixel point.
Optionally, the first determining module is further configured to receive a user instruction directed at the target image, and to determine the coordinates of the selected pixel point according to the instruction.
Optionally, the instruction is determined based on one of the following: receiving position information of the user's touch point on the human-computer interaction interface where the target image is displayed; or receiving query information input by the user.
Optionally, the device further includes: a judgment module, configured to, when the instruction is receiving the position information of the user's touch point on the human-computer interaction interface where the target image is displayed, judge whether the touch point position lies within the target area before the coordinates of the selected pixel point are determined based on the touch point position; and a trigger module, configured to trigger determination of the coordinates of the selected pixel point when the judgment result indicates that the touch point position lies within the target area.
An embodiment of the present application further provides a data display method. As shown in Figure 12, the method includes:
Step S1202: display and acquire the target image to be recognized;
Step S1204: display the region of interest in the target image, where the image in the region of interest reflects how the specified type parameter changes over time;
Step S1206: display the coordinates of the selected pixel point in the region of interest, and the target recording time corresponding to those pixel coordinates;
Step S1208: display the parameter value corresponding to the coordinates of the selected pixel point at the target recording time, where the parameter value is determined based on the association between the values of the specified type parameter and pixel point coordinates.
It should be noted that the execution subject of steps S1202 to S1208 includes but is not limited to a mobile terminal.
In some embodiments of the present application, the above association is determined as follows: separate a specified color channel from the region of interest, where the specified color channel is the one among the R, G, B color channels that matches the color channel corresponding to the standard color band of the region of interest; binarize the image of the specified color channel to obtain a binarized image; select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image lies, and segment the region of interest using the selected thresholds; perform reference point pixel recognition on the binarized image obtained after segmentation, obtaining the pixel coordinates of at least two reference points of the standard color band in the image; determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the values of the specified type parameter and pixel coordinates based on this correspondence, and use the linear relationship as the association.
It should be noted that, for preferred implementations of the embodiment shown in Figure 12, reference may be made to the related description of the embodiments shown in Figures 2 to 9, which will not be repeated here.
An embodiment of the present application further provides an image recognition method that can determine the selected pixel point based on a user's touch operation and thereby determine the parameter value corresponding to that pixel point. Specifically, as shown in Figure 13, the method includes:
Step S1302: detect the position of the user's touch point in the target image;
Step S1304: determine the coordinates of the selected pixel point based on the touch point position, and the target recording time corresponding to those pixel coordinates;
Step S1306: determine, based on the association between the values of the specified type parameter and pixel point coordinates, the parameter value corresponding to the coordinates of the selected pixel point at the target recording time;
Step S1308: output the parameter value. Outputting the parameter value includes, but is not limited to, displaying it to the user or sending it to an external device.
In some optional embodiments of the present application, before the coordinates of the selected pixel point are determined based on the touch point position, in order to prevent interference from invalid touch operations, it may also be judged whether the touch point position lies within the region of interest of the target image, where the image in the region of interest reflects how the specified type parameter changes over time; determination of the coordinates of the selected pixel point is triggered when the judgment result indicates that the touch point position lies within the region of interest.
In other embodiments of the present application, before the parameter value corresponding to the coordinates of the selected pixel point at the target recording time is determined based on the association between the values of the specified type parameter and pixel point coordinates, the following processing may also be performed: separate a specified color channel from the region of interest, where the specified color channel is the one among the R, G, B color channels that matches the color channel corresponding to the standard color band of the region of interest; binarize the image of the specified color channel to obtain a binarized image; select, from a preset threshold set, the threshold corresponding to the region in which each pixel of the binarized image lies, and segment the region of interest using the selected thresholds; perform reference point pixel recognition on the binarized image obtained after segmentation, obtaining the pixel coordinates of at least two reference points of the standard color band in the image; determine the correspondence between the actual values of the at least two reference points and their pixel coordinates, establish a linear relationship between the values of the specified type parameter and pixel coordinates based on this correspondence, and use the linear relationship as the association.
It should be noted that, for preferred implementations of the embodiment shown in Figure 13, reference may be made to the related description of the embodiments shown in Figures 2 to 9, which will not be repeated here.
An embodiment of the present application further provides an image recognition method that can determine the selected pixel point based on user input and thereby determine the parameter value corresponding to that pixel point. As shown in Figure 14, the method includes:
Step S1402: detect query information input by the user, where the query information may be input through a human-computer interaction interface that includes a text input box for entering query information;
Step S1404: determine, based on the query information, the coordinates of the selected pixel point in the target image and the target recording time corresponding to those pixel coordinates;
Step S1406: determine, based on the association between the values of the specified type parameter and pixel point coordinates, the parameter value corresponding to the coordinates of the selected pixel point at the target recording time;
Step S1408: output the parameter value.
在本申请的一些实施例中,在基于指定类型参数的取值与像素点坐标的关联关系确定被选定的像素点坐标在目标记录时间所对应的参数取值之前,还可以执行以下处理过程:从感兴趣区域中分离出指定颜色通道,其中,指定颜色通道为R、G、B颜色通道中与感兴趣区域的标准色带所对应颜色通道相同的颜色通道;对指定颜色通道的图像进行图像二值化处理,得到二值化图像;从预设阈值集合中选择二值化图像中每个像素点所在区域所对应的阈值,并利用选择的阈值对感兴趣区域进行图像分割;对分割后得到的二值化图像进行参考点像素识别,得到标准色带的至少两个参考点在图像中的像素点坐标;确定至少两个参考点的实际取值以及像素点坐标之间的对应关系,并基于对应关系建立指定类型参数的取值与像素点坐标的线性关系,并将线性关系作为关联关系。In some embodiments of the present application, before determining the value of the parameter corresponding to the selected pixel point coordinate at the target recording time based on the correlation between the value of the specified type parameter and the pixel point coordinate, the following process may also be performed : Separate the designated color channel from the region of interest, where the designated color channel is the same color channel in the R, G, and B color channels as the color channel corresponding to the standard color band of the region of interest; Image binarization processing to obtain a binarized image; select the threshold corresponding to the area of each pixel in the binarized image from the preset threshold set, and use the selected threshold to segment the region of interest; segmentation The binarized image obtained afterwards performs reference point pixel identification to obtain the pixel coordinates of at least two reference points of the standard color band in the image; determine the actual values of at least two reference points and the corresponding relationship between the pixel coordinates , And establish the linear relationship between the value of the specified type parameter and the pixel coordinate based on the corresponding relationship, and use the linear relationship as the association relationship.
需要说明的是，图14所示实施例的优选实施方式可以参见图2至9中所示实施例的相关描述，此处不再赘述。It should be noted that, for preferred implementations of the embodiment shown in FIG. 14, reference may be made to the related descriptions of the embodiments shown in FIGS. 2 to 9, which will not be repeated here.
本申请实施例还提供了一种非易失性存储介质,存储介质包括存储的程序,其中,在程序运行时控制存储介质所在设备执行以上的图像识别方法。The embodiment of the present application also provides a non-volatile storage medium, the storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the above image recognition method when the program runs.
本申请实施例还提供了一种处理器,处理器设置为运行程序,其中,程序运行时执行以上的图像识别方法。The embodiment of the present application also provides a processor, which is configured to run a program, wherein the above image recognition method is executed when the program is running.
在本申请实施例中，采用依据目标图像中像素点坐标和指定类型参数的取值之间的关联关系确定被选定的像素点坐标所对应的参数取值的方式，由于采用目标图像中像素点坐标和指定类型参数的取值之间的关联关系识别图像中任意像素点坐标对应的参数取值，因此，实现了对图像中非字符信息所表示的参数取值的识别，达到了将图像中的像素点自动识别为相应的参数取值的目的，进而解决了当前的图像识别方式只能识别出图像中的字符格式的数值，不能将曲线或离散点自动识别为数值的技术问题。In the embodiments of the present application, the parameter value corresponding to the selected pixel coordinates is determined according to the association between pixel coordinates in the target image and the values of the specified type of parameter. Because this association is used to identify the parameter value corresponding to any pixel coordinates in the image, recognition of parameter values represented by non-character information in the image is realized, and the purpose of automatically recognizing pixels in the image as the corresponding parameter values is achieved, thereby solving the technical problem that current image recognition approaches can only recognize numerical values in character format in an image and cannot automatically recognize curves or discrete points as numerical values.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the above-mentioned embodiments of the present application, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.
在本申请所提供的几个实施例中，应该理解到，所揭露的技术内容，可通过其它的方式实现。其中，以上所描述的装置实施例仅仅是示意性的，例如所述单元的划分，可以为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，单元或模块的间接耦合或通信连接，可以是电性或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division of logical functions, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
以上所述仅是本申请的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本申请原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本申请的保护范围。The above are only preferred embodiments of the present application. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.
工业实用性Industrial applicability
本申请实施例中提供的方案，可以应用于图像识别过程中，例如，可以应用于血糖数据的图像识别过程中。基于本申请实施例提供的方案，由于采用目标图像中像素点坐标和指定类型参数的取值之间的关联关系识别图像中任意像素点坐标对应的参数取值，因此，实现了对图像中非字符信息所表示的参数取值的识别，达到了将图像中的像素点自动识别为相应的参数取值的目的，进而解决了当前的图像识别方式只能识别出图像中的字符格式的数值，不能将曲线或离散点自动识别为数值的技术问题。The solutions provided in the embodiments of the present application can be applied to image recognition processes, for example, to the image recognition of blood glucose data. Based on the solutions provided in the embodiments of the present application, because the association between pixel coordinates in the target image and the values of the specified type of parameter is used to identify the parameter value corresponding to any pixel coordinates in the image, recognition of parameter values represented by non-character information in the image is realized, and the purpose of automatically recognizing pixels in the image as the corresponding parameter values is achieved, thereby solving the technical problem that current image recognition approaches can only recognize numerical values in character format in an image and cannot automatically recognize curves or discrete points as numerical values.

Claims (34)

  1. 一种图像识别方法,包括:An image recognition method, including:
    获取待识别的目标图像;Obtain the target image to be recognized;
    获取所述目标图像中的目标区域,其中,该目标区域中的图像用于反映指定类型参数信息;Acquiring a target area in the target image, where the image in the target area is used to reflect specified type parameter information;
    确定所述目标区域中被选定的像素点坐标;Determining the coordinates of selected pixels in the target area;
    基于指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值。The value of the parameter corresponding to the selected pixel point coordinate is determined based on the correlation between the value of the specified type parameter and the pixel point coordinate.
  2. 根据权利要求1所述的方法,其中,基于指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值之前,所述方法还包括The method according to claim 1, wherein before determining the value of the parameter corresponding to the selected pixel point coordinate based on the correlation between the value of the specified type parameter and the pixel point coordinate, the method further comprises
    从所述目标区域中分离出指定颜色通道,其中,所述指定颜色通道为R、G、B颜色通道中与所述目标区域的标准色带所对应颜色通道相同的颜色通道;Separating a designated color channel from the target area, where the designated color channel is the same color channel as the color channel corresponding to the standard color band of the target area among the R, G, and B color channels;
    对所述指定颜色通道的图像进行图像二值化处理,所述二值化处理为选择所述指定颜色通道的图像中大于预设阈值的像素点集合,得到二值化图像;Performing image binarization processing on the image of the designated color channel, where the binarization process is to select a set of pixel points greater than a preset threshold in the image of the designated color channel to obtain a binary image;
    对得到的二值化图像进行参考点像素识别,得到所述标准色带的至少两个参考点在图像中的像素点坐标高度;Performing reference point pixel identification on the obtained binarized image to obtain the pixel point coordinate height of at least two reference points of the standard color band in the image;
    确定所述至少两个参考点的实际取值以及所述像素点坐标之间的对应关系，并基于所述对应关系建立所述指定类型参数的取值与像素点坐标的线性关系，并将所述线性关系作为所述关联关系。Determining the correspondence between the actual values of the at least two reference points and the pixel coordinates, establishing a linear relationship between the values of the specified type of parameter and pixel coordinates based on the correspondence, and taking the linear relationship as the association relationship.
  3. 根据权利要求1所述的方法,其中,确定所述目标区域中被选定的像素点坐标,包括:The method according to claim 1, wherein determining the coordinates of the selected pixel in the target area comprises:
    对所述目标区域中的图像进行灰度化处理,得到灰度图像;Performing grayscale processing on the image in the target area to obtain a grayscale image;
    对所述灰度图像中的各个像素点进行聚类处理,得到多个簇;Performing clustering processing on each pixel in the grayscale image to obtain multiple clusters;
    从所述多个簇中选择指定簇,并从所述指定簇中的所有像素点中确定被选定的像素点坐标。A designated cluster is selected from the plurality of clusters, and the coordinates of the selected pixel point are determined from all the pixels in the designated cluster.
  4. 根据权利要求3所述的方法,其中,从所述多个簇中选择指定簇,并从所述指定簇中的所有像素点中确定被选定的像素点坐标具体为:The method according to claim 3, wherein selecting a designated cluster from the plurality of clusters, and determining the coordinates of the selected pixel point from all pixels in the designated cluster is specifically:
    从所述多个簇中选择像素点数量最少的簇,并从所述像素点数量最少的簇中确定被选定的像素点坐标。Select the cluster with the least number of pixels from the plurality of clusters, and determine the selected pixel coordinates from the cluster with the least number of pixels.
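The grayscale-clustering selection of claims 3 and 4 can be sketched as below. The 1-D k-means, its quantile-based initialization, and the function name are illustrative assumptions — the claims do not prescribe a particular clustering algorithm — but the selection rule matches claim 4: the curve or discrete points occupy far fewer pixels than the background, so the smallest cluster is taken.

```python
import numpy as np

def smallest_cluster_pixels(gray, k=2, iters=10):
    """Cluster the grayscale intensities with a minimal 1-D k-means and
    return the (row, col) coordinates of the pixels belonging to the
    smallest cluster."""
    vals = gray.reshape(-1).astype(float)
    # deterministic initialization: spread centers over the value range
    centers = np.quantile(vals, np.linspace(0.0, 1.0, k))
    labels = np.zeros(vals.shape[0], dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vals[labels == j].mean()
    counts = np.bincount(labels, minlength=k)
    # claim 4: keep the cluster with the fewest pixels
    mask = labels.reshape(gray.shape) == counts.argmin()
    ys, xs = np.nonzero(mask)
    return np.column_stack([ys, xs])
```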
  5. 根据权利要求1所述的方法,其中,The method of claim 1, wherein:
    所述目标区域中的图像包括：坐标系中的曲线图像或坐标系中离散点图像，所述曲线图像中的曲线或所述离散点图像中的离散点用于反映指定类型参数在不同时刻的取值；The image in the target area comprises: a curve image in a coordinate system or a discrete-point image in a coordinate system, where the curve in the curve image or the discrete points in the discrete-point image are used to reflect the values of the specified type of parameter at different moments;
    所述方法还包括:确定所述像素点坐标在所述曲线图像或所述离散点图像中对应的目标记录时间;The method further includes: determining the target recording time corresponding to the pixel point coordinate in the curve image or the discrete point image;
    所述基于指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值，包括：基于所述指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标在所述目标记录时间所对应的参数取值。The determining of the parameter value corresponding to the selected pixel coordinates based on the association between the values of the specified type of parameter and pixel coordinates includes: determining, based on the association between the values of the specified type of parameter and pixel coordinates, the parameter value corresponding to the selected pixel coordinates at the target recording time.
  6. 根据权利要求5所述的方法,其中,所述目标记录时间通过以下方式确定:The method according to claim 5, wherein the target recording time is determined in the following manner:
    识别所述曲线图像或所述离散点图像中的字符信息,从所述字符信息中提取所述指定类型参数的时间信息;Identifying character information in the curved image or the discrete point image, and extracting the time information of the specified type parameter from the character information;
    对所述时间信息中任意的相邻两个记录时刻之间的时长按照像素点数量进行等间隔划分,得到多个时间点;Divide the time length between any two adjacent recording moments in the time information at equal intervals according to the number of pixels to obtain multiple time points;
    从所述多个时间点中确定所述被选定的像素点坐标所属的目标记录时间。The target recording time to which the coordinates of the selected pixel point belongs is determined from the multiple time points.
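The equal-interval division of claim 6 amounts to linear interpolation between two recognized recording times along the pixel columns that separate them. The sketch below illustrates this; the function name and signature are illustrative assumptions.

```python
from datetime import datetime

def pixel_time(x, x_left, x_right, t_left, t_right):
    """Map a pixel column x to a time point, given that the recognized
    recording time t_left sits at column x_left and t_right at column
    x_right: the span between the two times is divided into equal
    intervals, one per pixel column."""
    frac = (x - x_left) / (x_right - x_left)
    return t_left + (t_right - t_left) * frac
```

For a selected pixel, the nearest such time point is then taken as the target recording time.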
  7. 根据权利要求1所述的方法,其中,所述获取所述目标图像中的目标区域,包括:The method according to claim 1, wherein said obtaining the target area in the target image comprises:
    对所述目标图像进行语义分割,得到所述目标图像的掩码图像和前景图像;Performing semantic segmentation on the target image to obtain a mask image and a foreground image of the target image;
    从所述前景图像中确定感兴趣区域,并将所述感兴趣区域作为所述目标区域。Determine a region of interest from the foreground image, and use the region of interest as the target region.
  8. 根据权利要求7所述的方法,其中,The method according to claim 7, wherein:
    在所述目标图像为第一类型的情况下，采用高效神经网络模型对所述目标图像进行语义分割，其中，所述高效神经网络模型包括：初始化模块和瓶颈模块，其中，每个瓶颈模块包括三个卷积层，其中，所述三个卷积层中的第一卷积层用于进行降维处理，第二卷积层用于进行空洞卷积、全卷积和非对称卷积，第三卷积层用于进行升维处理；When the target image is of the first type, performing semantic segmentation on the target image using an efficient neural network model, where the efficient neural network model includes an initialization module and bottleneck modules, each bottleneck module includes three convolutional layers, the first of the three convolutional layers is used for dimensionality reduction, the second convolutional layer is used for dilated convolution, full convolution and asymmetric convolution, and the third convolutional layer is used for dimensionality increase;
    在所述目标图像为第二类型的情况下,对双边分割网络模型进行调整,并利用调整后的分割网络模型对所述目标图像进行语义分割,其中,所述对双边分割网络模型进行调整包括:When the target image is of the second type, adjusting the bilateral segmentation network model, and using the adjusted segmentation network model to perform semantic segmentation on the target image, wherein the adjusting the bilateral segmentation network model includes :
    所述双边分割网络模型包括主干网络和辅助网络，所述主干网络由两层构成，每层主干网络分别包括卷积层、批归一化层和非线性激活函数，降低主干网络输出通道特征图数；所述辅助网络模型框架采用轻量级模型，降低主干网络输出通道特征图数，所述轻量级模型包括以下之一：Xception39、SqueezeNet、Xception、MobileNet、ShuffleNet；The bilateral segmentation network model includes a backbone network and an auxiliary network. The backbone network consists of two layers, each of which includes a convolutional layer, a batch normalization layer and a nonlinear activation function, reducing the number of feature maps of the backbone network output channels; the auxiliary network adopts a lightweight model framework, which also reduces the number of feature maps of the backbone network output channels, and the lightweight model includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet;
    其中,所述第一类型所对应第一数据集的图像数量小于第二类型所对应第二数据集中的图像数量。Wherein, the number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
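The asymmetric convolution named in claim 8 refers to factoring a 2-D kernel into a column kernel followed by a row kernel. The pure-NumPy sketch below is illustrative only (the claimed model uses learned kernels inside a neural network); it shows that applying a 5x1 kernel and then a 1x5 kernel reproduces the corresponding separable 5x5 convolution with 10 weights instead of 25.

```python
import numpy as np

def conv2d(img, k):
    """Minimal 'valid' 2-D correlation, used here only to demonstrate
    kernel separability."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

# A separable 5x5 kernel factors into a 5x1 column kernel and a 1x5 row
# kernel (illustrative values); the full kernel is their outer product.
col = np.array([1.0, 4.0, 6.0, 4.0, 1.0]).reshape(5, 1)
row = np.array([1.0, 2.0, 1.0, 2.0, 1.0]).reshape(1, 5)
full = col @ row
```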
  9. 根据权利要求7所述的方法,其中,所述从所述前景图像中确定感兴趣区域,包括:The method according to claim 7, wherein said determining a region of interest from said foreground image comprises:
    确定所述前景图像中的特征区域,以及目标几何区域的角点坐标,其中,所述特征区域为所述前景图像中包含所述指定类型参数信息的区域;Determining the characteristic area in the foreground image and the corner point coordinates of the target geometric area, wherein the characteristic area is an area in the foreground image that contains the specified type parameter information;
    基于所述角点坐标计算投影变换矩阵;Calculating a projection transformation matrix based on the corner coordinates;
    对所述特征区域中的像素点进行投影变换,得到所述感兴趣区域。Performing projection transformation on the pixel points in the characteristic region to obtain the region of interest.
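The projection transformation of claim 9 can be sketched by solving for the 3x3 homography that maps the four corner coordinates of the target geometric area to the corners of the rectified region of interest. The direct linear solve below (with h33 fixed to 1) is an illustrative implementation, not one specified in the application.

```python
import numpy as np

def projection_matrix(src, dst):
    """Solve the 3x3 homography mapping four source corners to four
    destination corners (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one pixel coordinate."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

Warping every pixel of the characteristic region with `warp_point` (or an equivalent image-warping routine) yields the region of interest.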
  10. 根据权利要求7所述的方法,其中,所述获取待识别的目标图像之前,所述方法还包括:The method according to claim 7, wherein, before said acquiring the target image to be recognized, the method further comprises:
    确定待识别的图像是否为目标图像;Determine whether the image to be recognized is the target image;
    在所述待识别的图像为目标图像时,确定对所述目标图像进行语义分割。When the image to be recognized is a target image, it is determined to perform semantic segmentation on the target image.
  11. 根据权利要求9所述的方法,其中,所述方法还包括:The method according to claim 9, wherein the method further comprises:
    将所述感兴趣区域均分成预设数量个不重叠滑块；Dividing the region of interest evenly into a preset number of non-overlapping blocks;
    确定所述预设数量个不重叠滑块的特征值，得到所述预设数量个特征值；将所述预设数量个特征值组合成特征向量；将所述特征向量输入至支持向量机分类器进行分析，得到所述感兴趣区域的类型。Determining the feature values of the preset number of non-overlapping blocks to obtain the preset number of feature values; combining the preset number of feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
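The block-wise feature extraction of claim 11 can be sketched as below. Using the mean intensity of each block as its feature value is an illustrative assumption (the claim leaves the feature unspecified), and the resulting vector would then be passed to the SVM classifier, which is not shown here.

```python
import numpy as np

def block_feature_vector(roi, grid=(4, 4)):
    """Split the region of interest into grid[0] x grid[1] non-overlapping
    blocks, take the mean intensity of each block as its feature value,
    and concatenate the values into one feature vector."""
    h, w = roi.shape
    gh, gw = grid
    bh, bw = h // gh, w // gw
    feats = [roi[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
             for i in range(gh) for j in range(gw)]
    return np.array(feats)
```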
  12. 根据权利要求1所述的方法，其中，所述指定类型参数信息中包含有用于反映血糖数据随时间变化的趋势的曲线信息，或用于反映血糖数据随时间变化的趋势的坐标系中离散点的取值信息。The method according to claim 1, wherein the specified type of parameter information includes curve information used to reflect the trend of blood glucose data over time, or value information of discrete points in a coordinate system used to reflect the trend of blood glucose data over time.
  13. 根据权利要求1所述的方法,其中,所述方法还包括:The method of claim 1, wherein the method further comprises:
    展示所述被选定的像素点坐标所对应的参数取值。Display the parameter values corresponding to the selected pixel coordinates.
  14. 根据权利要求1所述的方法,其中,确定所述目标区域中被选定的像素点坐标,包括:The method according to claim 1, wherein determining the coordinates of the selected pixel in the target area comprises:
    接收用户针对目标图像的指令;依据所述指令确定被选定的像素点坐标。Receive a user's instruction for the target image; determine the selected pixel coordinates according to the instruction.
  15. 根据权利要求14所述的方法,其中,所述指令基于以下之一信息确定:接收所述用户在所述目标图像所在人机交互界面的触摸点位置信息;或接收所述用户输入的查询信息。The method according to claim 14, wherein the instruction is determined based on one of the following information: receiving position information of the user's touch point on the human-computer interaction interface where the target image is located; or receiving query information input by the user .
  16. 根据权利要求15所述的方法,其中,当所述指令为接收所述用户在所述目标图像所在人机交互界面的触摸点位置信息时,所述方法还包括:The method according to claim 15, wherein when the instruction is to receive the position information of the user's touch point on the human-computer interaction interface where the target image is located, the method further comprises:
    基于所述触摸点位置确定被选定的像素点坐标之前,判断所述触摸点位置是否位于所述目标区域;Before determining the coordinates of the selected pixel point based on the touch point position, determining whether the touch point position is located in the target area;
    在判断结果指示所述触摸点位置位于所述目标区域时,触发确定所述被选定的像素点坐标。When the determination result indicates that the position of the touch point is located in the target area, triggering the determination of the selected pixel point coordinates.
  17. 一种图像识别装置,包括:An image recognition device includes:
    第一获取模块,设置为获取待识别的目标图像;The first obtaining module is configured to obtain the target image to be recognized;
    第二获取模块,设置为获取所述目标图像中的目标区域,其中,该目标区域中的图像用于反映指定类型参数信息;The second acquisition module is configured to acquire a target area in the target image, wherein the image in the target area is used to reflect specified type parameter information;
    第一确定模块,设置为确定所述目标区域中被选定的像素点坐标;The first determining module is configured to determine the coordinates of the selected pixel in the target area;
    第二确定模块,设置为基于所述指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标所对应的参数取值。The second determining module is configured to determine the value of the parameter corresponding to the coordinate of the selected pixel based on the correlation between the value of the specified type parameter and the coordinate of the pixel point.
  18. 根据权利要求17所述的装置,其中,所述装置还包括:The device according to claim 17, wherein the device further comprises:
    分离模块,设置为从所述目标区域中分离出指定颜色通道,其中,所述指定颜色通道为R、G、B颜色通道中与所述目标区域的标准色带所对应颜色通道相同的颜色通道;The separation module is configured to separate a designated color channel from the target area, wherein the designated color channel is the same color channel as the color channel corresponding to the standard color band of the target area among the R, G, and B color channels ;
    处理模块,设置为对所述指定颜色通道的图像进行图像二值化处理,所述二值化处理为选择所述指定颜色通道的图像中大于预设阈值的像素点集合,得到二值化图像;The processing module is configured to perform image binarization processing on the image of the specified color channel, and the binarization process is to select a set of pixel points greater than a preset threshold in the image of the specified color channel to obtain a binarized image ;
    拟合模块,设置为对得到的二值化图像进行参考点像素识别,得到所述标准色带的至少两个参考点在图像中的像素点坐标;A fitting module, configured to perform reference point pixel recognition on the obtained binarized image, and obtain the pixel point coordinates of at least two reference points of the standard color band in the image;
    建立模块，设置为确定所述至少两个参考点的实际取值以及所述像素点坐标之间的对应关系，并基于所述对应关系建立所述指定类型参数的取值与像素点坐标的线性关系，并将所述线性关系作为所述关联关系。An establishment module, configured to determine the correspondence between the actual values of the at least two reference points and the pixel coordinates, establish a linear relationship between the values of the specified type of parameter and pixel coordinates based on the correspondence, and take the linear relationship as the association relationship.
  19. 根据权利要求17所述的装置,其中,所述第一确定模块,包括:The apparatus according to claim 17, wherein the first determining module comprises:
    灰度处理单元,设置为对所述目标区域中的图像进行灰度化处理,得到灰度图像;A grayscale processing unit, configured to perform grayscale processing on the image in the target area to obtain a grayscale image;
    聚类单元,设置为对所述灰度图像中的各个像素点进行聚类处理,得到多个簇;The clustering unit is configured to perform clustering processing on each pixel in the grayscale image to obtain multiple clusters;
    选择单元,设置为从所述多个簇中选择指定簇,并从所述指定簇中的所有像素点中确定被选定的像素点坐标。The selection unit is configured to select a designated cluster from the plurality of clusters, and determine the coordinates of the selected pixel point from all the pixels in the designated cluster.
  20. 根据权利要求19所述的装置，其中，所述选择单元，设置为从所述多个簇中选择像素点数量最少的簇，并从所述像素点数量最少的簇中确定被选定的像素点坐标。The device according to claim 19, wherein the selection unit is configured to select the cluster with the smallest number of pixels from the plurality of clusters, and determine the selected pixel coordinates from the cluster with the smallest number of pixels.
  21. 根据权利要求17所述的装置,其中,The device of claim 17, wherein:
    所述目标区域中的图像包括：坐标系中的曲线图像或坐标系中离散点图像，所述曲线图像中的曲线或所述离散点图像中的离散点，用于反映指定类型参数在不同时刻的取值；The image in the target area comprises: a curve image in a coordinate system or a discrete-point image in a coordinate system, where the curve in the curve image or the discrete points in the discrete-point image are used to reflect the values of the specified type of parameter at different moments;
    所述第一确定模块,还设置为确定所述像素点坐标在所述曲线图像中对应的目标记录时间;The first determining module is further configured to determine the target recording time corresponding to the pixel point coordinates in the curve image;
    所述第二确定模块,设置为基于所述指定类型参数的取值与像素点坐标的关联关系确定所述被选定的像素点坐标在所述目标记录时间所对应的参数取值。The second determining module is configured to determine the value of the parameter corresponding to the selected pixel point coordinate at the target recording time based on the correlation between the value of the specified type parameter and the pixel point coordinate.
  22. 根据权利要求21所述的装置,其中,所述第一确定模块还包括:The apparatus according to claim 21, wherein the first determining module further comprises:
    第一识别单元,设置为识别所述曲线图像或所述离散点图像中的字符信息,从所述字符信息中提取所述指定类型参数的时间信息;The first recognition unit is configured to recognize character information in the curve image or the discrete point image, and extract the time information of the specified type parameter from the character information;
    第一划分单元,设置为对所述时间信息中任意的相邻两个记录时刻之间的时长按照像素点数量进行等间隔划分,得到多个时间点;The first dividing unit is configured to divide the time length between any two adjacent recording moments in the time information at equal intervals according to the number of pixels to obtain multiple time points;
    第一确定单元,设置为从所述多个时间点中确定所述被选定的像素点坐标所属的目标记录时间。The first determining unit is configured to determine the target recording time to which the coordinates of the selected pixel point belong from the multiple time points.
  23. 根据权利要求17所述的装置,其中,所述第二获取模块,包括:The apparatus according to claim 17, wherein the second acquisition module comprises:
    分割单元,设置为对所述目标图像进行语义分割,得到所述目标图像的掩码图像和前景图像;A segmentation unit, configured to perform semantic segmentation on the target image to obtain a mask image and a foreground image of the target image;
    第二确定单元,设置为从所述前景图像中确定感兴趣区域,并将所述感兴趣区域作为所述目标区域。The second determining unit is configured to determine a region of interest from the foreground image, and use the region of interest as the target region.
  24. 根据权利要求23所述的装置,其中,The device according to claim 23, wherein:
    所述分割单元，设置为在所述目标图像为第一类型的情况下，采用高效神经网络模型对所述目标图像进行语义分割，其中，所述高效神经网络模型包括：初始化模块和瓶颈模块，其中，每个瓶颈模块包括三个卷积层，其中，所述三个卷积层中的第一卷积层用于进行降维处理，第二卷积层用于进行空洞卷积、全卷积和非对称卷积，第三卷积层用于进行升维处理；The segmentation unit is configured to, when the target image is of the first type, perform semantic segmentation on the target image using an efficient neural network model, where the efficient neural network model includes an initialization module and bottleneck modules, each bottleneck module includes three convolutional layers, the first of the three convolutional layers is used for dimensionality reduction, the second convolutional layer is used for dilated convolution, full convolution and asymmetric convolution, and the third convolutional layer is used for dimensionality increase;
    所述分割单元，还设置为在所述目标图像为第二类型的情况下，对双边分割网络模型进行调整，并利用调整后的分割网络模型对所述目标图像进行语义分割，其中，对所述分割网络模型进行调整包括：所述双边分割网络模型包括主干网络和辅助网络，所述主干网络由两层构成，每层主干网络分别包括卷积层、批归一化层和非线性激活函数，降低主干网络输出通道特征图数；所述辅助网络模型框架采用轻量级模型，降低主干网络输出通道特征图数，所述轻量级模型包括以下之一：Xception39、SqueezeNet、Xception、MobileNet、ShuffleNet；其中，所述第一类型所对应第一数据集的图像数量小于第二类型所对应第二数据集中的图像数量。The segmentation unit is further configured to, when the target image is of the second type, adjust a bilateral segmentation network model and perform semantic segmentation on the target image using the adjusted segmentation network model, where adjusting the segmentation network model includes: the bilateral segmentation network model includes a backbone network and an auxiliary network, the backbone network consists of two layers, each of which includes a convolutional layer, a batch normalization layer and a nonlinear activation function, reducing the number of feature maps of the backbone network output channels; the auxiliary network adopts a lightweight model framework, which also reduces the number of feature maps of the backbone network output channels, and the lightweight model includes one of the following: Xception39, SqueezeNet, Xception, MobileNet, ShuffleNet; wherein the number of images in the first data set corresponding to the first type is smaller than the number of images in the second data set corresponding to the second type.
  25. 根据权利要求23所述的装置，其中，所述第二确定单元，还用于确定所述前景图像中的特征区域，以及目标几何区域的角点坐标，其中，所述特征区域为所述前景图像中包含所述指定类型参数信息的区域；基于所述角点坐标计算投影变换矩阵；对所述特征区域中的像素点进行投影变换，得到所述感兴趣区域。The device according to claim 23, wherein the second determining unit is further configured to determine the characteristic area in the foreground image and the corner coordinates of a target geometric area, where the characteristic area is the area in the foreground image that contains the specified type of parameter information; calculate a projection transformation matrix based on the corner coordinates; and perform projection transformation on the pixels in the characteristic area to obtain the region of interest.
  26. 根据权利要求23所述的装置,其中,所述装置还包括:The device according to claim 23, wherein the device further comprises:
    第三确定模块,设置为确定待识别的图像是否为目标图像;以及在所述待识别的图像为目标图像时,确定对所述目标图像进行语义分割。The third determining module is configured to determine whether the image to be recognized is a target image; and when the image to be recognized is a target image, determine to perform semantic segmentation on the target image.
  27. 根据权利要求26所述的装置，其中，所述第三确定模块，还设置为通过以下方式确定感兴趣区域的类型：将所述感兴趣区域均分成预设数量个不重叠滑块；确定所述预设数量个不重叠滑块的特征值，得到所述预设数量个特征值；将所述预设数量个特征值组合成特征向量；将所述特征向量输入至支持向量机分类器进行分析，得到所述感兴趣区域的类型。The device according to claim 26, wherein the third determining module is further configured to determine the type of the region of interest in the following manner: dividing the region of interest evenly into a preset number of non-overlapping blocks; determining the feature values of the preset number of non-overlapping blocks to obtain the preset number of feature values; combining the preset number of feature values into a feature vector; and inputting the feature vector into a support vector machine classifier for analysis to obtain the type of the region of interest.
  28. 根据权利要求17至27中任意一项所述的装置，其中，所述指定类型参数信息中包含有用于反映血糖数据随时间变化的趋势的曲线信息，或用于反映血糖数据随时间变化的趋势的坐标系中离散点的取值信息。The device according to any one of claims 17 to 27, wherein the specified type of parameter information includes curve information used to reflect the trend of blood glucose data over time, or value information of discrete points in a coordinate system used to reflect the trend of blood glucose data over time.
  29. 根据权利要求17所述的装置,其中,所述装置还包括:The device according to claim 17, wherein the device further comprises:
    展示模块,用于展示所述被选定的像素点坐标所对应的参数取值。The display module is used to display the parameter values corresponding to the selected pixel coordinates.
  30. 根据权利要求17所述的装置,其中,所述第一确定模块,还用于接收用户针对目标图像的指令;依据所述指令确定被选定的像素点坐标。The device according to claim 17, wherein the first determining module is further configured to receive a user's instruction for the target image; determine the coordinates of the selected pixel according to the instruction.
  31. 根据权利要求30所述的装置,其中,所述指令基于以下之一信息确定:接收所述用户在所述目标图像所在人机交互界面的触摸点位置信息;或接收所述用户输入的查询信息。The device according to claim 30, wherein the instruction is determined based on one of the following information: receiving position information of the user's touch point on the human-computer interaction interface where the target image is located; or receiving query information input by the user .
  32. 根据权利要求31所述的装置,其中,所述装置还包括:The device according to claim 31, wherein the device further comprises:
    判断模块，用于在所述指令为接收所述用户在所述目标图像所在人机交互界面的触摸点位置信息时，基于所述触摸点位置确定被选定的像素点坐标之前，判断所述触摸点位置是否位于所述目标区域；A judgment module, configured to, when the instruction is to receive the position information of the user's touch point on the human-computer interaction interface where the target image is located, judge whether the touch point position is located in the target area before the selected pixel coordinates are determined based on the touch point position;
    触发模块,用于在判断结果指示所述触摸点位置位于所述目标区域时,触发确定所述被选定的像素点坐标。The triggering module is used for triggering the determination of the coordinates of the selected pixel when the judgment result indicates that the position of the touch point is located in the target area.
  33. 一种非易失性存储介质，其中，所述非易失性存储介质包括存储的程序，其中，在所述程序运行时控制所述非易失性存储介质所在设备执行权利要求1至16中任意一项所述的图像识别方法。A non-volatile storage medium, wherein the non-volatile storage medium includes a stored program, and when the program runs, a device where the non-volatile storage medium is located is controlled to execute the image recognition method according to any one of claims 1 to 16.
  34. 一种处理器,其中,所述处理器设置为运行程序,其中,所述程序运行时执行权利要求1至16中任意一项所述的图像识别方法。A processor, wherein the processor is configured to run a program, wherein the image recognition method according to any one of claims 1 to 16 is executed when the program is running.
PCT/CN2020/100247 2019-07-05 2020-07-03 Image recognition method and apparatus, storage medium, and processor WO2021004402A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910606437.2A CN111881913A (en) 2019-07-05 2019-07-05 Image recognition method and device, storage medium and processor
CN201910606437.2 2019-07-05

Publications (1)

Publication Number Publication Date
WO2021004402A1 true WO2021004402A1 (en) 2021-01-14

Family

ID=73153889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100247 WO2021004402A1 (en) 2019-07-05 2020-07-03 Image recognition method and apparatus, storage medium, and processor

Country Status (2)

Country Link
CN (1) CN111881913A (en)
WO (1) WO2021004402A1 (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767359A (en) * 2021-01-21 2021-05-07 中南大学 Steel plate corner detection method and system under complex background
CN112861885A (en) * 2021-03-25 2021-05-28 北京百度网讯科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN113077472A (en) * 2021-04-07 2021-07-06 华南理工大学 Paper electrocardiogram curve image segmentation method, system, device and medium
CN113221985A (en) * 2021-04-29 2021-08-06 大连海事大学 Method for extracting basic features based on fusion network of pyramid model
CN113239832A (en) * 2021-05-20 2021-08-10 河南中全科技有限公司 Hidden danger intelligent identification method and system based on image identification
CN113436171A (en) * 2021-06-28 2021-09-24 博奥生物集团有限公司 Processing method and device for canned image
CN113486898A (en) * 2021-07-08 2021-10-08 西安电子科技大学 Radar signal RD image interference identification method and system based on improved ShuffleNet
CN113533551A (en) * 2021-06-08 2021-10-22 广西科技大学 GC-IMS-based fragrant rice shared flavor fingerprint spectrum extraction method
CN113554008A (en) * 2021-09-18 2021-10-26 深圳市安软慧视科技有限公司 Method and device for detecting static object in area, electronic equipment and storage medium
CN113592889A (en) * 2021-07-22 2021-11-02 武汉工程大学 Method and system for detecting included angle of cotter pin and electronic equipment
CN113642609A (en) * 2021-07-15 2021-11-12 东华大学 Characterization method of dispersed phase morphology in polymer blend based on image recognition technology
CN113658132A (en) * 2021-08-16 2021-11-16 沭阳九鼎钢铁有限公司 Computer vision-based structural part weld joint detection method
CN113900418A (en) * 2021-09-30 2022-01-07 广西埃索凯循环科技有限公司 Intelligent production system of high-purity zinc sulfate monohydrate
CN114119976A (en) * 2021-11-30 2022-03-01 广州文远知行科技有限公司 Semantic segmentation model training method, semantic segmentation model training device, semantic segmentation method, semantic segmentation device and related equipment
CN114219813A (en) * 2021-12-16 2022-03-22 数坤(北京)网络科技股份有限公司 Image processing method, intelligent terminal and storage medium
CN114241407A (en) * 2021-12-10 2022-03-25 电子科技大学 Close-range screen monitoring method based on deep learning
CN114445483A (en) * 2022-01-28 2022-05-06 泗阳三江橡塑有限公司 Injection molding part quality analysis method based on image pyramid
CN114662594A (en) * 2022-03-25 2022-06-24 浙江省通信产业服务有限公司 Target feature recognition analysis system
CN114692202A (en) * 2022-03-31 2022-07-01 马上消费金融股份有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114694186A (en) * 2022-06-01 2022-07-01 南京优牧大数据服务有限公司 Method and device for processing cattle face identification data
CN115063578A (en) * 2022-08-18 2022-09-16 杭州长川科技股份有限公司 Method and device for detecting and positioning target object in chip image and storage medium
CN115272298A (en) * 2022-09-19 2022-11-01 江苏网进科技股份有限公司 Urban road maintenance and supervision method and system based on road monitoring
CN115588099A (en) * 2022-11-02 2023-01-10 北京鹰之眼智能健康科技有限公司 Region-of-interest display method, electronic device and storage medium
WO2023098487A1 (en) * 2021-11-30 2023-06-08 西门子股份公司 Target detection method and apparatus, electronic device, and computer storage medium
CN116385706A (en) * 2023-06-06 2023-07-04 山东外事职业大学 Signal detection method and system based on image recognition technology
CN116611503A (en) * 2023-07-21 2023-08-18 浙江双元科技股份有限公司 Lightweight model construction method and device for multi-category flaw real-time detection
CN116612287A (en) * 2023-07-17 2023-08-18 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium
CN116939906A (en) * 2023-07-26 2023-10-24 嘉兴市成泰镜业有限公司 Artificial intelligence-based LED mixed-color lamplight color calibration and adjustment method
CN117079147A (en) * 2023-10-17 2023-11-17 深圳市城市交通规划设计研究中心股份有限公司 Road interior disease identification method, electronic equipment and storage medium
CN117079218A (en) * 2023-09-20 2023-11-17 山东省地质矿产勘查开发局第一地质大队(山东省第一地质矿产勘查院) Dynamic monitoring method for rope position of passenger ropeway rope based on video monitoring
CN117437608A (en) * 2023-11-16 2024-01-23 元橡科技(北京)有限公司 All-terrain pavement type identification method and system
CN113486898B (en) * 2021-07-08 2024-05-31 西安电子科技大学 Radar signal RD image interference identification method and system based on improvement ShuffleNet

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528782B (en) * 2020-11-30 2024-02-23 北京农业信息技术研究中心 Underwater fish target detection method and device
CN114638774B (en) * 2020-12-01 2024-02-02 珠海碳云智能科技有限公司 Image data processing method and device and nonvolatile storage medium
CN112785640B (en) * 2020-12-28 2022-08-09 宁波江丰生物信息技术有限公司 Method and system for detecting position of internal slice of scanner
CN113065501B (en) * 2021-04-15 2024-03-22 黑龙江惠达科技股份有限公司 Seedling line identification method and device and agricultural machinery
CN113096119A (en) * 2021-04-30 2021-07-09 上海众壹云计算科技有限公司 Method and device for classifying wafer defects, electronic equipment and storage medium
CN113222963B (en) * 2021-05-27 2024-03-26 大连海事大学 Non-orthographic infrared monitoring sea surface oil spill area estimation method and system
CN113486892B (en) * 2021-07-02 2023-11-28 东北大学 Production information acquisition method and system based on smart phone image recognition
CN113379006B (en) * 2021-08-16 2021-11-02 北京国电通网络技术有限公司 Image recognition method and device, electronic equipment and computer readable medium
CN115578564B (en) * 2022-10-25 2023-05-23 北京医准智能科技有限公司 Training method and device for instance segmentation model, electronic equipment and storage medium
CN116664529A (en) * 2023-06-05 2023-08-29 青岛信驰电子科技有限公司 Electronic element flat cable calibration method based on feature recognition
CN116820561B (en) * 2023-08-29 2023-10-31 成都丰硕智能数字科技有限公司 Method for automatically generating interface codes based on interface design diagram

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5054091A (en) * 1989-04-18 1991-10-01 Sharp Kabushiki Kaisha Method for determining coordinates of circumscribed rectangular frame of each character for use in optical character reader
US5898795A (en) * 1995-12-08 1999-04-27 Ricoh Company, Ltd. Character recognition method using a method for deleting ruled lines
CN105938555A (en) * 2016-04-12 2016-09-14 常州市武进区半导体照明应用技术研究院 Extraction method for picture curve data
CN106228159A (en) * 2016-07-29 2016-12-14 深圳友讯达科技股份有限公司 A kind of gauge table meter copying device based on image recognition and method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766898A (en) * 2018-12-26 2019-05-17 平安科技(深圳)有限公司 Image character recognition method, device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONG, YAN ET AL.: "A New Method of Data Extracting of Image Curve Based on MATLAB", DEVELOPMENT & INNOVATION OF MACHINERY & ELECTRICAL PRODUCTS, vol. 27, no. 2, 31 March 2014 (2014-03-31), DOI: 20200925155307X *
TAN, YANZHENG ET AL.: "Method of data extraction of complex curve image", ELECTRONIC MEASUREMENT TECHNOLOGY, no. 12, 31 December 2016 (2016-12-31), DOI: 20200925154910X *
XU, BOHONG: "The Research on the Vector about the Scanned Curve Drawing", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 2, 15 February 2009 (2009-02-15), ISSN: 1674-024, DOI: 20200910121454X *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767359A (en) * 2021-01-21 2021-05-07 中南大学 Steel plate corner detection method and system under complex background
CN112767359B (en) * 2021-01-21 2023-10-24 中南大学 Method and system for detecting corner points of steel plate under complex background
CN112861885A (en) * 2021-03-25 2021-05-28 北京百度网讯科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN112861885B (en) * 2021-03-25 2023-09-22 北京百度网讯科技有限公司 Image recognition method, device, electronic equipment and storage medium
CN113077472A (en) * 2021-04-07 2021-07-06 华南理工大学 Paper electrocardiogram curve image segmentation method, system, device and medium
CN113221985A (en) * 2021-04-29 2021-08-06 大连海事大学 Method for extracting basic features based on fusion network of pyramid model
CN113221985B (en) * 2021-04-29 2024-04-05 大连海事大学 Method for extracting image basic features based on pyramid model fusion network
CN113239832A (en) * 2021-05-20 2021-08-10 河南中全科技有限公司 Hidden danger intelligent identification method and system based on image identification
CN113533551B (en) * 2021-06-08 2023-10-03 广西科技大学 GC-IMS-based extraction method of fragrant rice sharing flavor fingerprint spectrum
CN113533551A (en) * 2021-06-08 2021-10-22 广西科技大学 GC-IMS-based fragrant rice shared flavor fingerprint spectrum extraction method
CN113436171B (en) * 2021-06-28 2024-02-09 博奥生物集团有限公司 Processing method and device for can printing image
CN113436171A (en) * 2021-06-28 2021-09-24 博奥生物集团有限公司 Processing method and device for canned image
CN113486898B (en) * 2021-07-08 2024-05-31 西安电子科技大学 Radar signal RD image interference identification method and system based on improvement ShuffleNet
CN113486898A (en) * 2021-07-08 2021-10-08 西安电子科技大学 Radar signal RD image interference identification method and system based on improved ShuffleNet
CN113642609A (en) * 2021-07-15 2021-11-12 东华大学 Characterization method of dispersed phase morphology in polymer blend based on image recognition technology
CN113642609B (en) * 2021-07-15 2024-03-26 东华大学 Characterization method of dispersed phase morphology in polymer blend based on image recognition technology
CN113592889A (en) * 2021-07-22 2021-11-02 武汉工程大学 Method and system for detecting included angle of cotter pin and electronic equipment
CN113592889B (en) * 2021-07-22 2024-04-12 武汉工程大学 Method, system and electronic equipment for detecting included angle of cotter pin
CN113658132B (en) * 2021-08-16 2022-08-19 沭阳九鼎钢铁有限公司 Computer vision-based structural part weld joint detection method
CN113658132A (en) * 2021-08-16 2021-11-16 沭阳九鼎钢铁有限公司 Computer vision-based structural part weld joint detection method
CN113554008B (en) * 2021-09-18 2021-12-31 深圳市安软慧视科技有限公司 Method and device for detecting static object in area, electronic equipment and storage medium
CN113554008A (en) * 2021-09-18 2021-10-26 深圳市安软慧视科技有限公司 Method and device for detecting static object in area, electronic equipment and storage medium
CN113900418A (en) * 2021-09-30 2022-01-07 广西埃索凯循环科技有限公司 Intelligent production system of high-purity zinc sulfate monohydrate
CN113900418B (en) * 2021-09-30 2024-05-03 广西埃索凯循环科技有限公司 Intelligent production system of high-purity zinc sulfate monohydrate
WO2023098487A1 (en) * 2021-11-30 2023-06-08 西门子股份公司 Target detection method and apparatus, electronic device, and computer storage medium
CN114119976A (en) * 2021-11-30 2022-03-01 广州文远知行科技有限公司 Semantic segmentation model training method, semantic segmentation model training device, semantic segmentation method, semantic segmentation device and related equipment
CN114119976B (en) * 2021-11-30 2024-05-14 广州文远知行科技有限公司 Semantic segmentation model training method, semantic segmentation device and related equipment
CN114241407A (en) * 2021-12-10 2022-03-25 电子科技大学 Close-range screen monitoring method based on deep learning
CN114219813A (en) * 2021-12-16 2022-03-22 数坤(北京)网络科技股份有限公司 Image processing method, intelligent terminal and storage medium
CN114445483B (en) * 2022-01-28 2023-03-24 泗阳三江橡塑有限公司 Injection molding part quality analysis method based on image pyramid
CN114445483A (en) * 2022-01-28 2022-05-06 泗阳三江橡塑有限公司 Injection molding part quality analysis method based on image pyramid
CN114662594B (en) * 2022-03-25 2022-10-04 浙江省通信产业服务有限公司 Target feature recognition analysis system
CN114662594A (en) * 2022-03-25 2022-06-24 浙江省通信产业服务有限公司 Target feature recognition analysis system
CN114692202A (en) * 2022-03-31 2022-07-01 马上消费金融股份有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114694186B (en) * 2022-06-01 2022-08-26 南京优牧大数据服务有限公司 Method and device for processing cattle face identification data
CN114694186A (en) * 2022-06-01 2022-07-01 南京优牧大数据服务有限公司 Method and device for processing cattle face identification data
CN115063578A (en) * 2022-08-18 2022-09-16 杭州长川科技股份有限公司 Method and device for detecting and positioning target object in chip image and storage medium
CN115272298A (en) * 2022-09-19 2022-11-01 江苏网进科技股份有限公司 Urban road maintenance and supervision method and system based on road monitoring
CN115588099A (en) * 2022-11-02 2023-01-10 北京鹰之眼智能健康科技有限公司 Region-of-interest display method, electronic device and storage medium
CN115588099B (en) * 2022-11-02 2023-05-30 北京鹰之眼智能健康科技有限公司 Region of interest display method, electronic device and storage medium
CN116385706A (en) * 2023-06-06 2023-07-04 山东外事职业大学 Signal detection method and system based on image recognition technology
CN116385706B (en) * 2023-06-06 2023-08-25 山东外事职业大学 Signal detection method and system based on image recognition technology
CN116612287B (en) * 2023-07-17 2023-09-22 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium
CN116612287A (en) * 2023-07-17 2023-08-18 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium
CN116611503A (en) * 2023-07-21 2023-08-18 浙江双元科技股份有限公司 Lightweight model construction method and device for multi-category flaw real-time detection
CN116611503B (en) * 2023-07-21 2023-09-22 浙江双元科技股份有限公司 Lightweight model construction method and device for multi-category flaw real-time detection
CN116939906A (en) * 2023-07-26 2023-10-24 嘉兴市成泰镜业有限公司 Artificial intelligence-based LED mixed-color lamplight color calibration and adjustment method
CN116939906B (en) * 2023-07-26 2024-04-19 嘉兴市成泰镜业有限公司 Artificial intelligence-based LED mixed-color lamplight color calibration and adjustment method
CN117079218A (en) * 2023-09-20 2023-11-17 山东省地质矿产勘查开发局第一地质大队(山东省第一地质矿产勘查院) Dynamic monitoring method for rope position of passenger ropeway rope based on video monitoring
CN117079218B (en) * 2023-09-20 2024-03-08 山东省地质矿产勘查开发局第一地质大队(山东省第一地质矿产勘查院) Dynamic monitoring method for rope position of passenger ropeway rope based on video monitoring
CN117079147B (en) * 2023-10-17 2024-02-27 深圳市城市交通规划设计研究中心股份有限公司 Road interior disease identification method, electronic equipment and storage medium
CN117079147A (en) * 2023-10-17 2023-11-17 深圳市城市交通规划设计研究中心股份有限公司 Road interior disease identification method, electronic equipment and storage medium
CN117437608A (en) * 2023-11-16 2024-01-23 元橡科技(北京)有限公司 All-terrain pavement type identification method and system

Also Published As

Publication number Publication date
CN111881913A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021004402A1 (en) Image recognition method and apparatus, storage medium, and processor
CN109359575B (en) Face detection method, service processing method, device, terminal and medium
US11681418B2 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
WO2020199931A1 (en) Face key point detection method and apparatus, and storage medium and electronic device
CN109151501B (en) Video key frame extraction method and device, terminal equipment and storage medium
CN108288075B (en) A kind of lightweight small target detecting method improving SSD
JP7413400B2 (en) Skin quality measurement method, skin quality classification method, skin quality measurement device, electronic equipment and storage medium
CN108717524B (en) Gesture recognition system based on double-camera mobile phone and artificial intelligence system
CN112381775B (en) Image tampering detection method, terminal device and storage medium
TWI395145B (en) Hand gesture recognition system and method
Liu et al. Real-time robust vision-based hand gesture recognition using stereo images
CN110728255B (en) Image processing method, image processing device, electronic equipment and storage medium
WO2019114036A1 (en) Face detection method and device, computer device, and computer readable storage medium
US11886492B2 (en) Method of matching image and apparatus thereof, device, medium and program product
CN112052186B (en) Target detection method, device, equipment and storage medium
WO2020206850A1 (en) Image annotation method and device employing high-dimensional image
WO2022041830A1 (en) Pedestrian re-identification method and device
WO2019080203A1 (en) Gesture recognition method and system for robot, and robot
WO2021164550A1 (en) Image classification method and apparatus
WO2021103868A1 (en) Method for structuring pedestrian information, device, apparatus and storage medium
CN109033935B (en) Head-up line detection method and device
CN109190456B (en) Multi-feature fusion overlook pedestrian detection method based on aggregated channel features and gray level co-occurrence matrix
CN112036284A (en) Image processing method, device, equipment and storage medium
WO2023035558A1 (en) Anchor point cut-based image processing method and apparatus, device, and medium
CN115082776A (en) Electric energy meter automatic detection system and method based on image recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20836201

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.06.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20836201

Country of ref document: EP

Kind code of ref document: A1