CN114821572A - Deep learning oral pill identification method based on multiple views and data expansion - Google Patents
- Publication number
- CN114821572A (application CN202210242282.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- medicine
- model
- shooting
- pill
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
A deep learning oral pill identification method based on multiple views and data expansion. A database is built using multi-view photography and data augmentation, enriching the data set with images taken from multiple angles. A lightweight network is used to design a practical model that can be embedded in mobile devices and small and medium-sized equipment. The multiple views are combined with a two-dimensional model, and the practical model is completed through transfer learning. In addition, an identification channel for incomplete oral pills is established: incomplete pills are matched against templates, restored to complete pill pictures, and then identified. The method effectively classifies medicines whose shapes and colors are highly similar, assists medical staff in sorting medicines, and reduces or even eliminates threats to patient safety caused by misclassified medicines. Overfitting caused by the small data volume is addressed through multi-view database construction, data augmentation, and transfer learning. The lightweight MobileNetv2 model serves as the base framework and an attention module is introduced, so the number of model parameters is greatly reduced compared with a three-dimensional model, making the method convenient, practical, and easy to deploy widely.
Description
Technical Field
The invention belongs to the field of clinical medicine and nursing and relates to a method for correctly identifying medicines.
Background
Hospitals shoulder the demanding task of treating the sick and injured. The workload is heavy and staff are busy, so it is difficult to avoid cases in which medicines without outer packaging are hard to identify and are dispensed incorrectly.
Hospital medicine distribution is prone to human error. In a hospital or care-unit environment, administering drugs to patients is currently a manual process: 1) the correct medication and the correct number of pills are placed in a plastic cup; 2) the pills are delivered to the corresponding patient; 3) the pills are administered to the patient at the correct time (e.g., doses no more than 4 hours apart). Every step of this process is susceptible to human error, so absolute quality assurance is difficult to achieve.
Medication errors may also occur in a pharmacy. A filled prescription may be labeled with the wrong dosage or pill quantity, or may contain the wrong medication. When pharmacists are fatigued or distracted, they may confuse pills with similar names and physical appearances and dispense the wrong medication, quantity, or dose, causing serious injury or even death to the patient.
With the development of deep learning for image recognition, deep learning models are increasingly applied to medical image recognition. Computer vision has matured, and deep learning has achieved unprecedented results in areas such as image processing and speech recognition. Compared with judgments based on subjective human experience, using a deep learning model to recognize images and assist medical workers in distributing medicines can help avoid patient deaths caused by human error. Lishuai et al. extracted features of capsule medicines with traditional machine learning methods to automate pharmacy tasks. Shihuayu et al. extracted features of medicine packages with deep learning; their medicine package identification system achieved preliminary results, classifying 500 kinds of medicines with a validation accuracy of 96.4% during training. Zhangzhenjiang et al. used computer vision to automatically identify medicine categories and quantities, demonstrating the technical feasibility of using deep learning to assist outpatient medicine distribution. Their main technical steps were: collecting images of drug outer packaging, generating a training image set through preprocessing, training a 7-layer convolutional neural network (3C3P1F: three convolutional layers, three pooling layers, and one fully connected layer), and deploying the drug image recognition service behind a RESTful interface.
In practice, however, when drugs are delivered to inpatients, the outer packaging, which carries the most convenient identifying information, is removed, and single or multiple pills, sometimes as 1/2 or 1/4 tablets, are repackaged and dispensed at set times and in set quantities according to the doctor's prescription. The prior art therefore does not fully meet the need for drug identification in this setting.
Disclosure of Invention
The object of the present invention is to develop an automated method for quickly and accurately identifying individual pills and pill combinations, one that can correctly identify the dispensed pills a patient requires. Correct identification is difficult for several reasons. First, once removed from their packaging, pills differ little in size, color, and shape, and their fine features are hard to distinguish. Second, pills may be stacked or lie on their sides during distribution, which a single-angle image recognition model cannot handle. Third, doctors adjust dosages according to a patient's clinical condition, age, and physique, so "incomplete" pills, such as half or quarter tablets, occur in practice. To correctly identify all pills, including incomplete ones, the invention collects pill images by multi-angle, multi-view photography to cover cases such as side placement and stacking, and designs a pill picture recovery channel to handle incomplete pills. Furthermore, because the intended applications are mostly small and medium-sized equipment, such as medicine delivery robots and automatic pharmacy dispensing systems, the task is implemented with a lightweight neural network to ease later conversion into products.
The purpose of the invention is achieved as follows. A deep learning oral pill identification method based on multiple views and data expansion is characterized in that: a database is built by multi-view photography and data augmentation, enriching the data set from multiple angles; a lightweight network is used to design a practical model that can be embedded in mobile devices and small and medium-sized equipment; the multiple views are combined with a two-dimensional model and the practical model is completed through transfer learning, so that oral pills are correctly identified and distributed to the corresponding patients by the mobile devices or equipment; and an identification channel for incomplete oral pills is established to restore incomplete pills, improving the model's practicality;
the method comprises the following specific steps:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) building the convolutional neural network pill identification model through pre-training and transfer learning;
5) outputting a medicine classification result and medicine name information;
in constructing the multi-view database, photographed medicine pictures serve as the basic data of the data set; the picture-taking method specifies shooting rules and shooting angles, the latter comprising a flat-placement shooting angle, a non-axisymmetric medicine shooting angle, a vertical-face shooting angle, and special-case shooting angles;
incomplete pill recovery means that, for the half or even quarter incomplete pills that occur in practice, a template matching algorithm finds the complete pill picture corresponding to the incomplete pill, and the identified complete pill is fed into the constructed convolutional neural network medicine identification model.
Data augmentation crops the medicine pictures at the input resolution to generate different picture data, increasing the effective data volume without substantially collecting more data. Mathematically, augmentation is performed about the image center point by default: image translation, flipping, rotation, scaling, shearing, cropping, and combined transformations are all carried out about the center;
transfer learning pre-trains the network on the public ImageNet data set so that it learns shallow representations such as edge information; the best weights learned within the number of iterations set for ImageNet pre-training are then used as the initial weights of the drug identification network, allowing the model to converge quickly and achieve better identification.
The model uses the lightweight MobileNetv2 as its base framework. On top of this framework, the convolution kernel size is enlarged to increase the receptive field of the network, and a mixed attention mechanism combining channel attention and spatial attention is introduced to improve the network's feature extraction, extract pill features as fully as possible, and assist medicine identification.
Under the medicine picture shooting rules, the medicine should occupy 50-60% of the whole picture and must not be too small. The background should be a uniform solid color different from that of the medicine; if a medicine has the same or a similar color as the background, a different solid-color background is chosen for that medicine. Pictures must be of high quality and sharply focused.
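As an illustration only, the 50-60% framing rule could be checked automatically once the pill has been separated from the solid-color background. The sketch below interprets the rule as the share of pixels covered by the pill and assumes a binary mask (1 = pill pixel) is already available, for example from color thresholding against the background; `pill_area_fraction` and `meets_framing_rule` are hypothetical names, not part of the described method.

```python
def pill_area_fraction(mask):
    """Fraction of pixels marked as pill (1) in a binary mask given as a list of rows."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty mask")
    foreground = sum(sum(row) for row in mask)
    return foreground / total


def meets_framing_rule(mask, low=0.50, high=0.60):
    """True if the pill covers between `low` and `high` of the frame (inclusive)."""
    return low <= pill_area_fraction(mask) <= high


# A 10 x 10 frame in which a 7 x 8 block of pixels belongs to the pill.
frame = [[1 if r < 7 and c < 8 else 0 for c in range(10)] for r in range(10)]
print(pill_area_fraction(frame), meets_framing_rule(frame))  # 0.56 True
```

A bounding-box ratio would be an equally plausible reading of the rule; the pixel-share version is used here only because it needs no extra geometry.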
for flat-placement shooting, three pictures of a centrosymmetric medicine are taken with the camera at 180°, 60°, and 30° to the placement plane;
for non-axisymmetric medicines, the medicine is rotated through several angles and one picture is taken approximately every 30°;
for vertical-face shooting, the medicine is rotated through 180° and one picture is taken every 30°, giving 6 pictures in total;
special medicines without a vertical face, such as capsules, are shot only in the flat position while being rotated through 180° or 360°; if the front and back faces of a medicine differ, a set of pictures is taken of each face according to the flat-shooting rule.
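The angle rules above can be expanded into a shot list for each medicine. The sketch below is one possible reading, with its assumptions stated: rotated shots start at 0° and step by 30° up to but excluding the sweep, so the 180° vertical-face sweep yields 0°, 30°, ..., 150° (the six pictures mentioned); non-axisymmetric medicines default to a full 360° sweep; and two-faced medicines receive one set per face. The function `shot_plan` and its parameters are hypothetical.

```python
def shot_plan(kind, two_faced=False, sweep=360, step=30):
    """Return a list of (face, shot_type, angle_deg) tuples for one medicine.

    kind: "centrosymmetric", "non_axisymmetric", "vertical", or "capsule".
    sweep: in-plane rotation range in degrees for rotated shots (e.g. 180 or 360).
    """
    if kind == "centrosymmetric":
        # Three flat shots with the camera at 180, 60 and 30 degrees to the plane.
        shots = [("camera_angle", a) for a in (180, 60, 30)]
    elif kind in ("non_axisymmetric", "capsule"):
        # Rotate the medicine in the plane, one shot about every `step` degrees.
        shots = [("rotation", a) for a in range(0, sweep, step)]
    elif kind == "vertical":
        # Vertical face: rotate through 180 degrees, one shot every 30 degrees.
        shots = [("rotation", a) for a in range(0, 180, 30)]
    else:
        raise ValueError(f"unknown kind: {kind}")
    faces = ("front", "back") if two_faced else ("front",)
    return [(face, shot_type, angle) for face in faces for shot_type, angle in shots]


print(len(shot_plan("vertical")), len(shot_plan("capsule", sweep=180)))  # 6 6
```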
A template matching algorithm identifies the complete pill picture corresponding to an incomplete pill. Template matching searches an image for a specific target: it traverses every possible position in the image, compares the region at each position with the template, and considers the match successful when the similarity is high enough;
the algorithm comprises the following steps:
1) determine the length and width, x and y, of the current picture;
2) determine the length and width, w and h, of the template picture;
3) for each point (i, j), taken in order from (0, 0) to (x - w, y - h), compare the image region from (i, j) to (i + w, j + h) with the template and compute their similarity;
4) return the similarity at every point once all comparisons are complete.
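The four steps can be sketched in NumPy as follows. The text does not say which similarity measure is used; this sketch assumes zero-mean normalized cross-correlation (NCC), which lies in [-1, 1] and equals 1 for a perfect match up to brightness and contrast. Images are grayscale arrays indexed [row, column], so the x and w of the steps correspond to columns and y and h to rows; `ncc` and `match_template` are illustrative names.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_template(image, template):
    """Return the similarity between `template` and the image window at every offset.

    image has shape (y, x) and template has shape (h, w), i.e. rows then columns.
    Entry [j, i] of the result compares the template with image[j:j + h, i:i + w].
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    y, x = image.shape        # step 1: size of the current picture
    h, w = template.shape     # step 2: size of the template picture
    scores = np.zeros((y - h + 1, x - w + 1))
    # Step 3: visit every top-left offset from (0, 0) to (x - w, y - h) inclusive.
    for j in range(y - h + 1):
        for i in range(x - w + 1):
            scores[j, i] = ncc(image[j:j + h, i:i + w], template)
    return scores             # step 4: similarity at every point
```

`np.unravel_index(scores.argmax(), scores.shape)` then gives the best (row, column) offset. In practice, OpenCV's `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method computes the same kind of map much faster; the explicit double loop is kept here only to mirror the steps.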
In data augmentation, translation, flipping, rotation, scaling, shearing, cropping, and combined transformations are, mathematically, all performed about the image center point. The procedure is: 1) move the rotation point to the origin; 2) perform the rotation about the origin; 3) move the rotation point back to its original position;
assume the original coordinates of a pixel are $(x_0, y_0)$ and its transformed coordinates are $(x, y)$; in homogeneous coordinates they are related by $[x, y, 1]^{\top} = H\,[x_0, y_0, 1]^{\top}$, where $H$ is the transformation matrix;
image translation: every pixel is shifted in the x and y directions, and the translation matrix is $H = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix}$, where $d_x$ and $d_y$ are the distances moved in the x and y directions;
image flipping, i.e., mirroring, comprises horizontal flipping and vertical flipping, each described by its own transformation matrix;
image rotation rotates the image by an arbitrary angle $\theta$ about the image center point, which is the default center;
image scaling resizes the current image by arbitrary factors, with transformation matrix $H = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$, where $S_x$ and $S_y$ are the scale factors in the x and y directions;
image shearing is a non-perpendicular projection of a planar scene onto the projection plane, with a transformation matrix parameterized by $h_x$ and $h_y$, the shear angles in the x and y directions;
image cropping enlarges the picture to 1.1 times its original size and then crops the enlarged image;
combined transformations chain several augmentation modes; given a translation matrix $H_{shift}$, a rotation matrix $H_{rotate}$, and a scaling matrix $H_{scale}$:
for combined transformation one, the combined matrix is $M = H_{shift} H_{rotate} H_{scale}$;
for combined transformation two, the combined matrix is $N = H_{scale} H_{rotate} H_{shift}$.
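The flip, rotation, and shear matrices are not reproduced in this text. Assuming the usual 3 × 3 homogeneous forms acting on column vectors, the transforms and the move-to-origin procedure can be sketched as below. Two choices are assumptions: the rotation uses the counter-clockwise convention of a y-up frame (in image coordinates, where y points down, positive $\theta$ appears clockwise), and the shear is parameterized as $x' = x + h_x y,\ y' = y + h_y x$ with $h_x, h_y$ as factors (take the tangent if they are given as angles). All function names are illustrative.

```python
import numpy as np


def h_translate(dx, dy):
    """Shift by dx in x and dy in y."""
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def h_flip_horizontal():
    """Mirror left-right about the y axis: x -> -x."""
    return np.diag([-1.0, 1.0, 1.0])


def h_flip_vertical():
    """Mirror top-bottom about the x axis: y -> -y."""
    return np.diag([1.0, -1.0, 1.0])


def h_rotate(theta):
    """Rotate by theta radians about the origin (counter-clockwise in a y-up frame)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def h_scale(sx, sy):
    """Scale by sx in x and sy in y."""
    return np.diag([sx, sy, 1.0])


def h_shear(hx, hy):
    """Shear: x' = x + hx * y, y' = y + hy * x."""
    return np.array([[1.0, hx, 0.0], [hy, 1.0, 0.0], [0.0, 0.0, 1.0]])


def about_center(h, cx, cy):
    """Steps 1-3: move (cx, cy) to the origin, apply h, move back (rightmost acts first)."""
    return h_translate(cx, cy) @ h @ h_translate(-cx, -cy)


def apply(h, x, y):
    """Map the pixel coordinate (x, y) through the homogeneous matrix h."""
    xh, yh, w = h @ np.array([x, y, 1.0])
    return xh / w, yh / w
```

For a 224 × 224 image, `about_center(h_rotate(np.pi / 2), 112, 112)` leaves the center (112, 112) fixed and sends (224, 112) to (112, 224). Composing the three steps into one matrix means each pixel is transformed once, rather than three times.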
On top of the MobileNetv2 base architecture, the convolution kernel is enlarged from 3 × 3 to 5 × 5 or larger to increase the receptive field, and a Convolutional Block Attention Module (CBAM) attention mechanism is added to strengthen the model's feature extraction.
When the template matching algorithm is used to find the complete pill picture for an incomplete pill, the similarity threshold is set to a value above 90%; when the matching similarity reaches this threshold, the pill match is considered successful.
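The 90% threshold turns the similarity map into a yes/no decision. The sketch below assumes a small library of complete-pill pictures keyed by medicine name, uses the incomplete pill as the template that is searched for inside each complete picture, and again assumes zero-mean NCC as the similarity; it relies on NumPy's `sliding_window_view` (NumPy 1.20 or later) to score all offsets at once. The names `best_ncc`, `restore_complete_pill`, and `library` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def best_ncc(image, template):
    """Highest zero-mean NCC between `template` and any same-sized window of `image`."""
    t = template - template.mean()
    windows = sliding_window_view(image, template.shape)       # (rows, cols, h, w)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w * w).sum(axis=(-2, -1)) * (t * t).sum())
    scores = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(scores.max())


def restore_complete_pill(fragment, library, threshold=0.9):
    """Return (name, score) of the best-matching complete pill, or (None, score)."""
    fragment = np.asarray(fragment, dtype=float)
    best_name, best_score = None, -1.0
    for name, picture in library.items():
        picture = np.asarray(picture, dtype=float)
        if picture.shape[0] < fragment.shape[0] or picture.shape[1] < fragment.shape[1]:
            continue  # the fragment must fit inside the complete picture
        score = best_ncc(picture, fragment)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

Returning `None` below the threshold leaves the decision to a human operator instead of forcing a low-confidence label, which fits the safety goal of the method.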
The practical model for small and medium-sized mobile equipment combines the multiple views with the two-dimensional model, and the model code is integrated into the equipment's main controller. The equipment has an LCD screen and a camera: the camera scans the medicines and sends the captured pictures back to the main controller, where they are passed to the model for identification; the identified medicine name is then shown on the LCD screen alongside the captured picture for the operator to check.
The invention has the beneficial effects that:
1. Medicines with highly similar shapes and colors are classified effectively, helping medical staff sort medicines and reducing or even eliminating threats to patient safety caused by misclassification.
2. The multi-view database handles complex situations, such as pills lying on their sides or stacking, that arise when various medicines are mixed during distribution; combined with different data augmentation methods, it further enriches the angular coverage of the data set.
3. Data augmentation and transfer learning compensate for the overfitting caused by the small data volume, and combining them with a lightweight network yields a practical model that can be embedded in mobile devices and small and medium-sized equipment, making the method convenient, practical, and widely applicable.
4. Combining multiple views with a two-dimensional model achieves, to some extent, the identification performance of a three-dimensional model and can handle medicines placed randomly at multiple angles, while greatly reducing the number of parameters compared with a three-dimensional model, which eases conversion into real products.
5. An image recovery channel designed for the incomplete pills that occur in real pill distribution allows incomplete pills to be identified effectively.
6. The invention introduces CBAM, a mixed attention mechanism combining channel and spatial attention, which improves the network's feature extraction, extracts pill features as fully as possible, and assists medicine identification.
Drawings
Fig. 1 is a flow chart of the identification method of the present invention.
FIG. 2 is a schematic flow chart of a convolutional neural network for drug identification constructed by the present invention.
FIG. 3 is a schematic diagram of a correct drug proportion under the shooting rule of the present invention.
Fig. 4 is a schematic diagram of an incorrect drug proportion under the shooting rule of the present invention.
Fig. 5 is a schematic view of a multi-view shooting angle of a circular medicine. In the figure, 1 is a camera and 2 is a medicine.
FIGS. 6-8 are schematic views of flat shooting angles for non-axisymmetric drugs.
Fig. 9-10 are schematic diagrams of vertical shooting angles.
Fig. 11 is a schematic diagram of the template matching algorithm.
FIGS. 12-14 are schematic diagrams of template matching examples for centrosymmetric drugs.
FIGS. 15-17 are schematic diagrams of template matching examples for non-centrosymmetric drugs.
Fig. 18 is a schematic diagram of an image augmentation scheme.
Fig. 19 is the MobileNetv2 model framework.
FIG. 20 is a schematic view of the receptive field.
FIG. 21 is a schematic diagram of a CBAM attention mechanism model.
Fig. 22 shows the modified MobileNetv2 inverted residual block.
FIG. 23 shows the training loss versus the number of iterations, illustrating the training effect and accuracy in an embodiment of the present invention.
FIG. 24 shows the test loss versus the number of iterations, illustrating the training effect and accuracy in an embodiment of the present invention.
FIG. 25 shows the test accuracy versus the number of iterations, illustrating the training effect and accuracy in an embodiment of the present invention.
Detailed Description
Experimental Environment
The hardware environment and software environment used in this experimental example are shown in table 1:
TABLE 1
The invention uses the lightweight MobileNetv2 model as its base framework. MobileNetv2 was proposed by Google in January 2018; its two key innovations, inverted residuals and linear bottlenecks, aim to improve accuracy while reducing memory usage. The overall model structure is shown in Table 2.
TABLE 2
Experimental data
The experimental data comprise 753 pictures of individual pills in 93 categories. High-definition JPG pictures were taken according to the data set collection specification and shooting rules; the specific numbers and categories are shown in Table 3.
TABLE 3
See figure 1.
The pill recognition model is a CNN built on MobileNetv2, referred to here as the CNN medicine identification network.
Medicines are divided into complete pills and incomplete pills, both of which are ultimately identified by the CNN medicine identification network. Complete pills enter the network directly; incomplete pills first undergo template matching, are restored to images of complete pills, and then enter the identification system. After identification, the system outputs the medicine name and picture.
FIG. 2 is a schematic flow chart of a convolutional neural network for drug identification constructed by the present invention.
In constructing the multi-view database, photographed medicine pictures serve as the basic data of the data set. The picture-taking method specifies shooting rules and shooting angles; the shooting angles comprise a flat-placement shooting angle, a non-axisymmetric medicine shooting angle, a vertical-face shooting angle, and special-case shooting angles.
As Fig. 2 shows, after the multi-view database is built, the data are augmented before being used to train the MobileNetv2-based model. Augmentation includes image flipping, scaling, cropping, and rotation. The network is pre-trained on the public ImageNet data set, learning shallow representations such as edge information, and the best weights learned within the number of iterations set for ImageNet pre-training are then used as the initial weights of the drug identification network, so that the model converges quickly and identifies drugs better. The identification steps are:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) completing model building through pre-training transfer learning;
5) and outputting a medicine classification result and medicine name information.
The model uses the lightweight MobileNetv2 as its base framework. On top of this framework, the convolution kernel size is enlarged, increasing the receptive field of the network and improving its feature extraction. In addition, a CBAM (Convolutional Block Attention Module) attention mechanism is introduced into the model's inverted residual blocks. CBAM attends over both the spatial and the channel dimensions; compared with mechanisms that attend to only one of these, it further improves feature extraction, captures the fine features of pills as fully as possible, and aids subsequent pill recognition. The correct medicine classification result and medicine name are then output from the processed data.
Figs. 3-10 illustrate the shooting rules of the present invention. Fig. 3 shows a correct drug proportion and Fig. 4 an incorrect one. When photographing a drug, the drug should occupy 50-60% of the whole image and must not be too small; the background should be a uniform solid color different from the drug's color, and if a drug has the same or a similar color as the background, another solid-color background is chosen for it. Pictures must be of high quality and sharply focused.
See fig. 5-10.
For flat-placement shooting, three photos are taken of a centrosymmetric medicine, with the camera at angles such as 180°, 60°, and 30° to the plane. For non-axisymmetric medicines, the medicine is rotated through several angles and one picture is taken about every 30°; if the medicine is left-right symmetric, it is rotated through 180°, with one picture every 30°. For vertical-face shooting, the medicine is rotated through 180° with one picture every 30°, giving 6 pictures in total. Special medicines without a vertical face, such as capsules, are shot only in the flat position while being rotated through 180° or 360°. If the front and back faces of a medicine differ, a set of pictures is taken of each face according to the flat-shooting rule.
See fig. 12-17.
FIGS. 12-14 are schematic diagrams of template matching examples for centrosymmetric drugs.
Fig. 12 shows a complete drug and Fig. 13 a half of the same pill, which centrosymmetric template matching restores to a complete pill identical to Fig. 12.
FIGS. 15-17 are schematic diagrams of template matching examples for non-centrosymmetric drugs.
Fig. 15 shows a complete medicine and Fig. 16 a half of that pill; after template matching, Fig. 17 shows the complete pill restored from Fig. 16.
The data augmentation of this example is shown in Fig. 18.
Because medical data sets are small, data augmentation is incorporated into the data preprocessing stage. Data augmentation lets limited data yield value equivalent to more data without substantially collecting more. In this embodiment, if the input pictures are 256 × 256 and are randomly cropped to 224 × 224, a single picture can yield roughly 32 × 32 different crops (strictly 33 × 33 = 1,089, since the crop offset can take any of 33 values from 0 to 32 along each axis), increasing the data volume nearly 1000-fold. By default, all augmentation is performed about the image center: 1) first move the rotation point to the origin; 2) perform the rotation about the origin; 3) move the rotation point back to its original position.
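The crop count follows from counting offsets: a 224-pixel window inside 256 pixels can start at any of 256 - 224 + 1 = 33 positions per axis. The sketch below counts the distinct crops and draws one random crop box; `num_crops` and `random_crop_box` are hypothetical helpers, and the sizes are those of this embodiment.

```python
import random


def num_crops(src, out):
    """Number of distinct out x out crops that fit in a src x src image."""
    return (src - out + 1) ** 2


def random_crop_box(src_w, src_h, out_w, out_h, rng=random):
    """Pick a random crop window; return (left, top, right, bottom), right/bottom exclusive."""
    left = rng.randint(0, src_w - out_w)  # randint includes both end points
    top = rng.randint(0, src_h - out_h)
    return left, top, left + out_w, top + out_h


print(num_crops(256, 224))  # 1089
```

Passing a seeded `random.Random` instance as `rng` makes the crops reproducible between training runs.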
Assume the original coordinates of a pixel are $(x_0, y_0)$ and its transformed coordinates are $(x, y)$; in homogeneous coordinates they are related by $[x, y, 1]^{\top} = H\,[x_0, y_0, 1]^{\top}$, where $H$ is the transformation matrix;
image translation: every pixel is shifted in the x and y directions, and the translation matrix is $H = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix}$, where $d_x$ and $d_y$ are the distances moved in the x and y directions;
image flipping, i.e., mirroring, comprises horizontal flipping and vertical flipping, each described by its own transformation matrix;
image rotation rotates the image by an arbitrary angle $\theta$ about the image center point, which is the default center;
image scaling resizes the current image by arbitrary factors, with transformation matrix $H = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$, where $S_x$ and $S_y$ are the scale factors in the x and y directions;
image shearing is a non-perpendicular projection of a planar scene onto the projection plane, with a transformation matrix parameterized by $h_x$ and $h_y$, the shear angles in the x and y directions;
following common practice for cropping in deep learning, in this embodiment the image is first enlarged to 1.1 times its original size, and a crop of random scale is then taken from the enlarged image.
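The enlarge-then-crop step can be sketched as computing a crop box. The sketch assumes that "random scale" means the crop side is a random fraction of the enlarged side, between a hypothetical `min_frac` of 0.8 and 1.0 and applied equally to width and height, and that the crop is afterwards resized to the network input; neither detail is specified in the text, and `enlarge_then_crop_box` is an illustrative name.

```python
import random


def enlarge_then_crop_box(w, h, enlarge=1.1, min_frac=0.8, rng=random):
    """Enlarge a w x h image by `enlarge`, then choose a random-scale crop box.

    Returns the enlarged size (width, height) and the crop box
    (left, top, right, bottom) in enlarged-image pixel coordinates.
    min_frac (smallest crop side as a fraction of the enlarged side) is hypothetical.
    """
    big_w, big_h = round(w * enlarge), round(h * enlarge)
    frac = rng.uniform(min_frac, 1.0)
    crop_w, crop_h = max(1, round(big_w * frac)), max(1, round(big_h * frac))
    left = rng.randint(0, big_w - crop_w)
    top = rng.randint(0, big_h - crop_h)
    return (big_w, big_h), (left, top, left + crop_w, top + crop_h)


size, box = enlarge_then_crop_box(224, 224, rng=random.Random(3))
print(size)  # (246, 246)
```

Enlarging first means that even the largest crop still shifts the pill slightly within the frame, so the network does not always see it centered in the same place.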
Fig. 19 is the MobileNetv2 model framework.
The invention uses the lightweight MobileNetv2 model as its base framework. MobileNetv2, proposed by Google in 2018, introduces inverted residuals and linear bottlenecks with the aim of improving accuracy while reducing memory usage.
In this model, the number of channels increases with network depth while the spatial size decreases accordingly. Overall, however, the tensors remain relatively small because the bottleneck layers form the connections between blocks.
To improve the model's feature extraction, this work enlarges the convolution kernel on top of the base architecture, increasing the receptive field of the network and improving its feature extraction. This embodiment increases the kernel size from 3 × 3 to 5 × 5 or larger, as shown in Fig. 20.
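The effect of the larger kernel can be quantified with the standard receptive-field recurrence: each layer with kernel k adds (k - 1) times the current stride product ("jump") to the receptive field. The sketch below applies it to two illustrative four-layer stacks; the layer lists are examples, not the exact architecture of the embodiment.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of (kernel, stride) conv layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # spacing between adjacent outputs, in input pixels
    return rf


three = [(3, 2), (3, 1), (3, 2), (3, 1)]
five = [(5, 2), (5, 1), (5, 2), (5, 1)]
print(receptive_field(three), receptive_field(five))  # 19 37
```

With stride 1, one 5 × 5 layer sees as much as two stacked 3 × 3 layers (5 pixels); with the strides above, switching to 5 × 5 kernels nearly doubles the receptive field.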
A CBAM attention mechanism is added to further improve feature extraction, as shown in Fig. 21.
The Convolutional Block Attention Module (CBAM) is an attention module for convolutional networks that combines spatial and channel attention. Pill identification is a fine-grained recognition problem, so the network needs strong feature extraction to distinguish the subtle differences between pills. CBAM's attention lets the network focus on regions that are meaningful for pill identification rather than over-learning features of interfering background regions, so that the model extracts as many pill features as possible to tell similar pills apart. Moreover, CBAM is a lightweight, general-purpose module with negligible overhead that can be integrated seamlessly into any CNN architecture and trained end to end together with the base CNN. Adding it improves overall performance without significantly increasing model size, which favors later deployment on mobile devices.
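To make the mechanism concrete, the NumPy forward pass below follows the published CBAM design: a channel gate from a shared two-layer MLP applied to global average- and max-pooled features, followed by a spatial gate from a convolution over channel-wise average and max maps (7 × 7 is the usual CBAM kernel). Weights are random stand-ins for learned ones, biases and batch normalization are omitted for brevity, and where exactly the module sits inside the inverted residual block is not specified here; the function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(f, w1, w2):
    """Channel gate for f of shape (C, H, W); w1: (C // r, C), w2: (C, C // r)."""
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # FC -> ReLU -> FC, same weights for both pools
    avg = f.mean(axis=(1, 2))                # (C,) global average pooling
    mx = f.max(axis=(1, 2))                  # (C,) global max pooling
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]  # (C, 1, 1)


def spatial_attention(f, kernel):
    """Spatial gate for f of shape (C, H, W); kernel: (2, k, k) with odd k."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])          # (2, H, W) pooling over channels
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    _, h, w = pooled.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)  # 2 -> 1 channel conv
    return sigmoid(out)[None, :, :]                             # (1, H, W)


def cbam(f, w1, w2, kernel):
    """Refine features: channel attention first, then spatial attention."""
    f = f * channel_attention(f, w1, w2)
    return f * spatial_attention(f, kernel)
```

Because both gates lie strictly between 0 and 1, the module only re-weights existing features; its extra weights (two small MLP matrices and one 2 × k × k kernel) are why it adds little to model size.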
Fig. 22 is a schematic diagram of the modified MobileNetv2 inverted residual block.
As the figure shows, the block first expands the channels and then reduces them. A 1 × 1 "expansion" layer (pointwise convolution, PW) is placed before the depthwise convolution (DW) to increase the number of channels and obtain more features. The sequence is: "expand" (PW) → "extract features by convolution" (DW) → "compress" (PW).
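The expand → depthwise → compress sequence can be sketched in NumPy as a simplified forward pass under stated assumptions:

- stride 1 and no batch normalization;
- ReLU6 after the expansion and depthwise layers, with a linear projection, as in MobileNetv2;
- random placeholder weights and an expansion factor of 6 chosen for illustration.

It does not include CBAM or reproduce the exact modified block of Fig. 22.

```python
import numpy as np


def relu6(x):
    return np.clip(x, 0.0, 6.0)


def pointwise(x, w):
    """1 x 1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def depthwise(x, k):
    """Stride-1 depthwise convolution with zero 'same' padding; k is (C, ks, ks)."""
    _, height, width = x.shape
    ks = k.shape[-1]
    p = ks // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros(x.shape)
    for di in range(ks):
        for dj in range(ks):
            # each channel is filtered only by its own kernel
            out += xp[:, di:di + height, dj:dj + width] * k[:, di, dj][:, None, None]
    return out


def inverted_residual(x, w_expand, k_dw, w_project):
    h = relu6(pointwise(x, w_expand))  # "expand" (PW): raise the channel count
    h = relu6(depthwise(h, k_dw))      # "extract features" (DW)
    h = pointwise(h, w_project)        # "compress" (PW), linear: no activation
    return x + h if h.shape == x.shape else h  # shortcut only when shapes match


rng = np.random.default_rng(0)
c_in, t = 16, 6  # hypothetical input channels and expansion factor
x = rng.standard_normal((c_in, 8, 8))
w_expand = 0.1 * rng.standard_normal((c_in * t, c_in))
k_dw = 0.1 * rng.standard_normal((c_in * t, 5, 5))  # enlarged 5 x 5 depthwise kernel
w_project = 0.1 * rng.standard_normal((c_in, c_in * t))
y = inverted_residual(x, w_expand, k_dw, w_project)
print(y.shape)  # (16, 8, 8)
```

Setting the projection weights to zero returns the input unchanged, which shows the residual shortcut at work. The wide tensor exists only inside the block, so the tensors passed between blocks stay small.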
The combined transformation applies several augmentation modes together, which is common practice for data augmentation in deep learning. Because this involves matrix multiplication, the order of combination matters. Apart from special cases, matrix multiplication is not commutative, i.e., $AB \neq BA$. As an illustration, assume a translation matrix $H_{\text{shift}}$, a rotation matrix $H_{\text{rotate}}$, and a scaling matrix $H_{\text{scale}}$ are given. This embodiment presents two different combined transformations.
For combined transformation one, the combined matrix is: $M = H_{\text{shift}} \times H_{\text{rotate}} \times H_{\text{scale}}$;
For combined transformation two, the combined matrix is: $N = H_{\text{scale}} \times H_{\text{rotate}} \times H_{\text{shift}}$.
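To see why the order matters, the sketch below builds 3 × 3 homogeneous matrices for a translation, a rotation, and a uniform scaling. It then applies both combinations to one point. It assumes column vectors, so the rightmost matrix acts first. The parameter values are arbitrary examples.

```python
import numpy as np


def shift(dx, dy):
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def scale(sx, sy):
    return np.diag([sx, sy, 1.0])


# Hypothetical parameters: shift 10 px along x, rotate 90 degrees, scale by 2.
H_shift, H_rotate, H_scale = shift(10.0, 0.0), rotate(np.pi / 2), scale(2.0, 2.0)
M = H_shift @ H_rotate @ H_scale  # combined transformation one
N = H_scale @ H_rotate @ H_shift  # combined transformation two

p = np.array([1.0, 0.0, 1.0])  # the point (1, 0) in homogeneous coordinates
print(M @ p, N @ p)
```

With these values, M sends (1, 0) to (10, 2), whereas N sends it to (0, 22), up to floating-point rounding. The two combined transformations therefore produce different augmented images.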
This form of data augmentation helps prevent the model from overfitting. It also compensates for pictures at certain angles that were not collected because the collection method was incomplete during database construction, further improving the multi-dimensional database.
This embodiment uses MobileNetv2 as the base architecture for pill identification, with the number of epochs set to 500, batch_size set to 16, and an initial learning rate of 0.001. If the monitored training metric does not improve within 5 epochs, the learning rate is reduced by 10%.
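The learning-rate schedule can be sketched as a small plateau-based scheduler. Two points are assumptions: the monitored quantity is taken to be the training loss, and "reduced by 10%" is read as multiplying the learning rate by 0.9. Framework users would normally use a built-in equivalent such as PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`, whose `factor` and `patience` arguments serve a similar purpose.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` once the monitored loss has failed
    to improve for `patience` consecutive epochs (hypothetical helper)."""

    def __init__(self, lr=0.001, factor=0.9, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:  # improvement: remember it and reset the counter
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor  # "reduce by 10%"
                self.bad_epochs = 0
        return self.lr


sched = ReduceOnPlateau(lr=0.001, factor=0.9, patience=5)
for loss in [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]:  # hypothetical loss curve
    lr = sched.step(loss)
print(lr)
```

After two improving epochs and five flat ones, the learning rate drops once, from 0.001 to about 0.0009.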
The training loss versus the number of iterations is shown in Fig. 23. As the number of iterations increases, the training-set loss gradually decreases and converges within a certain range.
The test loss versus the number of iterations is shown in Fig. 24. As the number of iterations increases, the test-set loss gradually decreases and converges within a certain range.
The test accuracy versus the number of iterations is shown in Fig. 25. As the number of iterations increases, the test-set accuracy gradually rises and finally stabilizes within a certain range.
Claims (7)
1. A deep learning oral pill identification method based on multiple views and data expansion, characterized in that: a database is built using multi-view capture and data augmentation, completing the data set from multiple angles; a lightweight network is used to design a practical model that can be embedded in mobile devices and small and medium-sized equipment; the multiple views are combined with a two-dimensional model, and the practical model is built through transfer learning, so that oral pills are correctly identified and dispensed to the corresponding patients via mobile devices and small and medium-sized equipment; and an incomplete oral pill identification channel is established to restore incomplete pills, enhancing the practicability of the model;
the method comprises the following specific steps:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) completing the establishment of a convolutional neural network pill identification model through pre-training transfer learning;
5) outputting a medicine classification result and medicine name information;
in the construction of the multi-view database, photographed medicine pictures serve as the basic data of the data set; the medicine photographing method specifies shooting rules and shooting angles, the shooting angles comprising a flat-placement shooting angle, a non-axisymmetric medicine shooting angle, a vertical-surface shooting angle, and special-case shooting angles;
the incomplete pill recovery targets split pills encountered in practice, such as half or even quarter doses: a template matching algorithm identifies the complete pill picture corresponding to the incomplete pill, and the identified complete pill is fed into the constructed convolutional neural network medicine identification model;
the data augmentation generates different picture data from the existing medicine pictures, for example by cropping according to resolution, without substantially collecting new data, thereby enlarging the data volume; mathematically, augmentation is performed about the image center point by default, and image translation, image flipping, image rotation, image scaling, image shearing, image cropping, and combined transformation are all carried out about that center point;
the transfer learning pre-trains on the public ImageNet data set to learn shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initialization weights of the drug identification network, so that the model converges quickly and achieves better identification;
the model adopts the lightweight MobileNetv2 as its basic framework and, on top of this base architecture, enlarges the convolution kernel to increase the receptive field of the convolutional neural network; at the same time, a CBAM module mixing channel attention and spatial attention is introduced to improve the feature extraction capability of the network and extract pill features to the greatest extent, assisting medicine identification; the processed data is output, and the correct medicine classification result and medicine name are output as required.
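The initialization step of the transfer learning in claim 1 can be illustrated with plain dictionaries of arrays. The sketch assumes that the pretrained and target networks name their layers identically. Every pretrained tensor whose name and shape match is copied. Layers that differ keep their fresh initialization; typically this is the final classifier, which is sized for the 1000 ImageNet classes rather than the pill classes. The parameter names and the 50-class count are hypothetical.

```python
import numpy as np


def init_from_pretrained(model_params, pretrained_params):
    """Return (params, skipped): pretrained tensors are copied wherever the name
    and shape match; every other tensor keeps its fresh initialization."""
    params, skipped = {}, []
    for name, value in model_params.items():
        src = pretrained_params.get(name)
        if src is not None and src.shape == value.shape:
            params[name] = src.copy()
        else:
            params[name] = value
            skipped.append(name)
    return params, skipped


# Hypothetical parameter names; 1000 ImageNet classes vs. an assumed 50 pill classes.
pretrained = {
    "features.0.weight": np.ones((32, 3, 3, 3)),
    "classifier.weight": np.ones((1000, 1280)),
}
fresh = {
    "features.0.weight": np.zeros((32, 3, 3, 3)),
    "classifier.weight": np.zeros((50, 1280)),
}
params, skipped = init_from_pretrained(fresh, pretrained)
print(skipped)  # ['classifier.weight']
```

Only the classifier starts from scratch, so training on the pill images mainly adapts the head and fine-tunes the pretrained features. This is what lets the model converge quickly on a small data set.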
2. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: under the medicine photographing rules, the medicine occupies 50-60% of the whole picture and must not be too small; the background is a uniform solid color that differs from the medicine color, and if a medicine has the same or a similar color as the background, another solid-color background is chosen for that medicine; pictures must be of high quality and sharply focused;
when the shooting angle is flat placement, three pictures of a centrally symmetric medicine are taken with the camera at 180°, 60°, and 30° relative to the placement plane, respectively;
when the shooting angle is non-axisymmetric medicine shooting, the medicine is rotated through multiple angles and photographed roughly every 30°; if the medicine is left-right symmetric, it is rotated through 180° and photographed every 30°;
when the shooting angle is vertical-surface shooting, the medicine is rotated through 180° and photographed every 30°, giving 6 pictures in total;
special medicines without a vertical surface, such as capsules, are photographed only when placed flat, rotated through 180° or 360°;
if the front and back of a medicine differ, a group of pictures is taken of both the front and the back according to the flat-placement shooting rules.
3. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: a template matching algorithm is used to identify the complete pill picture corresponding to the incomplete pill; template matching is a method for finding a specific target in an image: it traverses every possible position in the image, compares each position with the template, and considers the match successful when the similarity is high enough;
the algorithm comprises the following steps:
1) determining the length and width x, y of the current picture;
2) determining the length and width w, h of the template picture;
3) for each point (i, j) in turn from (0, 0) to (x − w, y − h), comparing the image region from (i, j) to (i + w, j + h) with the template and calculating their similarity;
4) after the comparison is finished, returning the similarity at each point.
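The four steps above map directly onto a sliding-window loop. The claim does not specify the similarity measure. This NumPy sketch therefore assumes zero-mean normalized cross-correlation, the quantity OpenCV's `cv2.matchTemplate` computes with `TM_CCOEFF_NORMED`. It works on single-channel images and favors clarity over speed.

```python
import numpy as np


def match_template(image, template):
    """Slide `template` over `image` and return a similarity map.

    Similarity is zero-mean normalized cross-correlation, in [-1, 1];
    scores[j, i] compares the template with the patch whose top-left corner
    is at column i, row j.
    """
    y, x = image.shape      # step 1: size of the current picture (rows, cols)
    h, w = template.shape   # step 2: size of the template
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t * t))
    scores = np.zeros((y - h + 1, x - w + 1))
    for j in range(y - h + 1):      # step 3: every position up to (x - w, y - h)
        for i in range(x - w + 1):
            patch = image[j:j + h, i:i + w]
            p = patch - patch.mean()
            denom = np.sqrt(np.sum(p * p)) * t_norm
            scores[j, i] = np.sum(p * t) / denom if denom > 0 else 0.0
    return scores                   # step 4: similarity at every position


rng = np.random.default_rng(0)
image = rng.random((40, 50))
template = image[10:20, 15:30].copy()  # a patch cut from the image itself
scores = match_template(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best, scores.max())
```

The template was cut from row 10, column 15, so `best` is (10, 15) with a score of 1, up to floating-point rounding.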
4. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: in the data augmentation, image translation, image flipping, image rotation, image scaling, image shearing, image cropping, and combined transformation are mathematically performed about the center point, as follows: 1) first move the rotation point to the origin; 2) perform the rotation about the origin; 3) move the rotation point back to its original position;
assume the original coordinates of an image point are $(x_0, y_0)$ and the coordinates after the transformation are $(x, y)$; the relationship between the coordinates before and after the transformation is expressed as follows, where $H$ is the transformation matrix;
image translation: translation shifts all pixels by given amounts in the x and y directions; the mathematical matrix of the translation transformation is as follows, where $d_x$ and $d_y$ are the distances moved in the x and y directions:
image flipping, i.e., image mirroring, comprises horizontal flipping and vertical flipping; the transformation matrix of the horizontal flip is:
the transformation matrix of the vertical flip is:
image rotation rotates the image by an arbitrary angle $\theta$, taking the image center point as the default center; the transformation matrix is:
image scaling scales the current image by arbitrary factors; its transformation matrix is as follows, where $S_x$ and $S_y$ are the scale factors in the x and y directions;
image shearing is the non-perpendicular projection of a planar scene onto a projection plane; its transformation matrix is as follows, where $h_x$ and $h_y$ are the shear angles in the x and y directions;
image cropping enlarges the picture to 1.1 times its original size and then crops the enlarged image;
the combined transformation applies a combination of several augmentation modes; assume a translation matrix $H_{\text{shift}}$, a rotation matrix $H_{\text{rotate}}$, and a scaling matrix $H_{\text{scale}}$ are given;
for combined transformation one, the combined matrix is $M = H_{\text{shift}} \times H_{\text{rotate}} \times H_{\text{scale}}$;
for combined transformation two, the combined matrix is $N = H_{\text{scale}} \times H_{\text{rotate}} \times H_{\text{shift}}$.
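The matrices referred to in claim 4 appear as figures in the published document and are not reproduced in this text. The sketch below uses one common set of homogeneous-coordinate forms: column vectors (x, y, 1), a rotation angle θ in radians, and shear given by angles through their tangents. It combines them with the move to origin, transform, move back procedure and with the 1.1× zoom-then-crop step. These forms, the nearest-neighbour interpolation, and the random crop position are assumptions, not the patent's exact definitions.

```python
import numpy as np


# Homogeneous 3 x 3 matrices acting on column vectors (x, y, 1) (assumed convention).
def translate(dx, dy):
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def flip(horizontal=True):
    # Mirror about the origin: negate x (horizontal flip) or y (vertical flip).
    return np.diag([-1.0, 1.0, 1.0]) if horizontal else np.diag([1.0, -1.0, 1.0])


def rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def scale(sx, sy):
    return np.diag([sx, sy, 1.0])


def shear(ax, ay):
    # Shear angles ax, ay (radians) in the x and y directions.
    return np.array([[1.0, np.tan(ax), 0.0], [np.tan(ay), 1.0, 0.0], [0.0, 0.0, 1.0]])


def about_center(m, cx, cy):
    # 1) move the center to the origin, 2) apply m, 3) move the center back.
    return translate(cx, cy) @ m @ translate(-cx, -cy)


def zoom_and_crop(img, factor=1.1, rng=None):
    # Enlarge by `factor` (nearest neighbour), then crop back to the original
    # size at a random offset.
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    nh, nw = int(round(h * factor)), int(round(w * factor))
    rows = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    big = img[rows][:, cols]
    top = rng.integers(0, nh - h + 1)
    left = rng.integers(0, nw - w + 1)
    return big[top:top + h, left:left + w]
```

For a 100 × 60 image with center (50, 30), `about_center(rotate(np.pi / 2), 50, 30)` leaves the center fixed and maps (60, 30) to (50, 40). Image y-coordinates usually point down, so a positive θ appears clockwise on screen.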
5. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: on top of the base architecture of the lightweight MobileNetv2 model, the convolution kernel is enlarged from 3 × 3 to 5 × 5 to increase the receptive field; at the same time, a CBAM attention mechanism is added to the inverted residual block so that fine pill features are extracted to the greatest extent through channel attention and spatial attention, enhancing the feature extraction capability of the model.
6. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: in the practical model embedded in small and medium-sized mobile equipment, the multiple views are combined with the two-dimensional model and the model code is integrated into the main control unit of the equipment; the equipment has an LCD screen and a camera; the camera scans the medicine and collects medicine pictures, which are sent back to the main control unit and passed to the model for identification; the identified medicine name and the collected medicine picture are displayed together on the LCD screen for the operator to check.
7. The multi-view and data expansion based deep learning oral pill identification method of claim 3, wherein: when the template matching algorithm is used to identify the complete pill picture corresponding to the incomplete pill, the similarity threshold is set to a value above 90%, and the pill match is considered successful when the matching similarity reaches the threshold.
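Claim 7 turns the similarity scores into a yes/no decision. The minimal sketch below assumes that each candidate complete pill has already been scored by template matching, for example by taking the maximum of its similarity map. The pill names and the 0.92 threshold are hypothetical; the threshold was chosen only to satisfy the "above 90%" condition.

```python
def identify_fragment(scores, threshold):
    """Return the best-matching complete pill, or None if no match is good enough.

    scores: mapping from candidate pill name to the highest template-matching
    similarity (0..1) found against that pill's complete picture.
    threshold: similarity required for a successful match (claim 7: above 90%).
    """
    if not scores:
        return None
    name, best = max(scores.items(), key=lambda item: item[1])
    return name if best >= threshold else None


# Hypothetical candidates and threshold.
print(identify_fragment({"pill_a": 0.97, "pill_b": 0.64}, threshold=0.92))  # pill_a
print(identify_fragment({"pill_a": 0.85}, threshold=0.92))                  # None
```

Returning None instead of the closest candidate keeps weak matches out of the identification model. Such fragments can then be flagged for manual checking.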
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210242282.0A CN114821572B (en) | 2022-03-11 | 2022-03-11 | Deep learning oral pill identification method based on multi-view and data expansion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210242282.0A CN114821572B (en) | 2022-03-11 | 2022-03-11 | Deep learning oral pill identification method based on multi-view and data expansion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114821572A CN114821572A (en) | 2022-07-29 |
CN114821572B CN114821572B (en) | 2023-04-21 |
Family ID: 82529659
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210242282.0A Active CN114821572B (en) | 2022-03-11 | 2022-03-11 | Deep learning oral pill identification method based on multi-view and data expansion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821572B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107545150A (en) * | 2017-10-13 | 2018-01-05 | 张晨 | Medicine identifying system and its recognition methods based on deep learning |
CN109190643A (en) * | 2018-09-14 | 2019-01-11 | 华东交通大学 | Based on the recognition methods of convolutional neural networks Chinese medicine and electronic equipment |
CN111598130A (en) * | 2020-04-08 | 2020-08-28 | 天津大学 | Traditional Chinese medicine identification method based on multi-view convolutional neural network |
CN111914902A (en) * | 2020-07-08 | 2020-11-10 | 南京航空航天大学 | Traditional Chinese medicine identification and surface defect detection method based on deep neural network |
CN112927753A (en) * | 2021-02-22 | 2021-06-08 | 中南大学 | Method for identifying interface hot spot residues of protein and RNA (ribonucleic acid) compound based on transfer learning |
CN113449776A (en) * | 2021-06-04 | 2021-09-28 | 中南民族大学 | Chinese herbal medicine identification method and device based on deep learning and storage medium |
CN113989623A (en) * | 2021-12-03 | 2022-01-28 | 浙江中医药大学 | Automatic identification method for traditional Chinese medicine decoction piece image |
Non-Patent Citations (1)
Title |
---|
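Tan Wenjun (谭文军): "Research on Improved Convolutional Neural Network Algorithms and Their Application to Crop Leaf Disease Image Recognition", CNKI Master's Electronic Journals, Information Science and Technology Series * |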
Also Published As
Publication number | Publication date |
---|---|
CN114821572B (en) | 2023-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Raith et al. | Artificial Neural Networks as a powerful numerical tool to classify specific features of a tooth based on 3D scan data | |
US11972572B2 (en) | Intraoral scanning system with excess material removal based on machine learning | |
JP6409094B2 (en) | Automated pharmaceutical tablet identification | |
US10679344B2 (en) | Computerized device and method for processing image data | |
Cheung et al. | Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture | |
WO2023142956A1 (en) | Total hip replacement preoperative planning system based on deep learning | |
WO2020125498A1 (en) | Cardiac magnetic resonance image segmentation method and apparatus, terminal device and storage medium | |
WO2021155230A1 (en) | Teeth segmentation using neural networks | |
WO2023142781A1 (en) | Image three-dimensional reconstruction method and apparatus, electronic device, and storage medium | |
CN114663715B (en) | Medical image quality control and classification model training method and device and computer equipment | |
Chen et al. | Missing teeth and restoration detection using dental panoramic radiography based on transfer learning with CNNs | |
CN109215035A (en) | A kind of brain MRI hippocampus three-dimensional dividing method based on deep learning | |
CN113516017A (en) | Method and device for supervising medicine taking process, terminal equipment and storage medium | |
CN115661580A (en) | Convolutional neural network-based traditional Chinese medicine decoction piece image identification method and system | |
Ma et al. | Machine‐learning‐based approach for predicting postoperative skeletal changes for orthognathic surgical planning | |
US10699162B2 (en) | Method and system for sorting and identifying medication via its label and/or package | |
CN114821572A (en) | Deep learning oral pill identification method based on multiple views and data expansion | |
Yao et al. | Head CT image convolution feature segmentation and morphological filtering for densely matching points of IoTs | |
Schwartz et al. | Applications of computer graphics and image processing to 2D and 3D modeling of the functional architecture of visual cortex | |
Hnoohom et al. | Blister Package Classification Using ResNet-101 for Identification of Medication | |
TW202008981A (en) | Method of monitoring medication regimen complemented with portable apparatus | |
CN114937176A (en) | Medicine real-time identification method and system based on deep learning | |
CN111709389A (en) | Traditional Chinese medicine powder intelligent identification method and system based on microscopic image | |
García-García et al. | Automated location of orofacial landmarks to characterize airway morphology in anaesthesia via deep convolutional neural networks | |
CN114373040A (en) | Three-dimensional model reconstruction method and acquisition terminal |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |