CN114821572A - Deep learning oral pill identification method based on multiple views and data expansion - Google Patents

Publication number
CN114821572A
Authority
CN
China
Prior art keywords
image
medicine
model
shooting
pill
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210242282.0A
Other languages
Chinese (zh)
Other versions
CN114821572B (en
Inventor
向军莲
张俊然
李南欣
谢贤凯
刘云飞
李杨
黄玲
唐良友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deyang Construction Investment Medical Co ltd
Sichuan University
Peoples Hospital of Deyang City
Original Assignee
Deyang Construction Investment Medical Co ltd
Sichuan University
Peoples Hospital of Deyang City
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deyang Construction Investment Medical Co ltd, Sichuan University, Peoples Hospital of Deyang City filed Critical Deyang Construction Investment Medical Co ltd
Priority to CN202210242282.0A priority Critical patent/CN114821572B/en
Publication of CN114821572A publication Critical patent/CN114821572A/en
Application granted granted Critical
Publication of CN114821572B publication Critical patent/CN114821572B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A deep learning oral pill identification method based on multiple views and data expansion. A database is built using multi-view photography and data augmentation, so that the data set covers multiple angles. A lightweight network is used to design a practical model that can be embedded in mobile devices and small and medium-sized equipment. The multiple views are combined with a two-dimensional model, and the practical model is completed after transfer learning. In addition, an identification channel for incomplete oral pills is established: an incomplete pill is matched against templates, restored to a picture of the complete pill, and then identified. The method effectively classifies medicines that are highly similar in shape and color, assists medical staff in sorting medicines, and reduces or even avoids threats to patient safety caused by medicine classification errors. Overfitting caused by the small data volume is addressed through multi-view database construction, data augmentation and transfer learning. The lightweight model MobileNetv2 is used as the basic framework and an attention module is introduced; the number of model parameters is greatly reduced compared with a three-dimensional model, so the method is convenient, practical and easy to popularize.

Description

Deep learning oral pill identification method based on multiple views and data expansion
Technical Field
The invention belongs to the field of clinical medicine and nursing, and relates to a method for correctly identifying medicines.
Background
Hospitals shoulder the demanding task of healing the wounded and rescuing the dying. The workload is heavy and staff are busy, so it is hard to avoid situations in which medicines without outer packaging are difficult to identify and are dispensed incorrectly.
Medicine distribution in hospitals is prone to human error. Administering medicines to patients in a hospital or care unit is currently a manual process: 1) the correct medication and the correct number of pills are placed in a plastic cup; 2) the pills are delivered to the corresponding patient; 3) the pills are administered to the patient at the correct time (e.g., no more than 4 hours apart). Every step of this process is prone to human error, and absolute quality assurance is difficult to achieve.
Medication errors may also occur in a pharmacy environment. Filled prescriptions may be labeled with an incorrect dosage or quantity of pills, or with an incorrect medication. When pharmacists are fatigued or distracted, they may confuse pills with similar names and physical appearances and dispense the wrong medication, quantity or dose; such mistakes can cause serious injury or even death to the patient.
With the development of deep learning in image recognition, deep learning models are increasingly applied to medical image recognition. Computer vision technology is now mature and has achieved unprecedented results in image processing, speech recognition and related areas. Compared with subjective human judgment, using a deep learning model for image recognition to assist medical workers in dispensing medicine can help avoid patient deaths caused by human error. Lishuai et al. extracted the features of capsule medicines with traditional machine learning methods in order to automate pharmacy tasks. Shihuayu et al. extracted the features of medicine packages with deep learning; their medicine package identification system obtained preliminary results, classifying and identifying 500 kinds of medicines with a validation accuracy of 96.4% during training. Zhangzhenjiang et al. used computer vision to automatically identify medicine categories and quantities, demonstrating the technical feasibility of using deep learning to assist outpatient dispensing. Their main technical approach comprises: collecting images of outer medicine packaging, generating a training image set by preprocessing, building and training a 7-layer (3C3P1F) convolutional neural network model, and deploying a medicine image recognition service that follows the RESTful interface specification.
In practice, however, when medicines are delivered to inpatients, the outer packaging, which readily serves as identifying information, has already been removed. Single or multiple pills, or even 1/2 or 1/4 of a pill, are repackaged and dispensed at set times and in set quantities according to the doctor's prescription. The prior art therefore does not fully meet the need for medicine identification.
Disclosure of Invention
The object of the present invention is to develop a method for quickly and accurately identifying a pill or combination of pills in an automated manner; the method must correctly identify the dispensed pills required by the patient. Correct identification is difficult for three reasons. First, once removed from their packaging, pills differ little in size, color and shape, and their fine features are hard to distinguish. Second, pills may be stacked or lie on their side during dispensing, so a single-angle picture recognition model cannot handle this complex task. Third, depending on the patient's clinical condition, age, physical constitution and so on, doctors may increase or decrease the dose as appropriate, so "incomplete" pills such as half tablets or even quarter tablets occur in practice. To correctly identify all pills, including incomplete ones, the invention photographs the pills from multiple angles and views to handle situations such as pills lying on their side or stacked, and designs a pill picture recovery channel to solve the problem of identifying incomplete pills. In addition, because the intended application scenarios are mostly small and medium-sized equipment such as medicine delivery robots and automatic pharmacy dispensing systems, the task is implemented with a lightweight neural network, which facilitates later conversion into products.
The object of the invention is achieved as follows. A deep learning oral pill identification method based on multiple views and data expansion, characterized in that: a database is built using multi-view photography and data augmentation, so that the data set covers multiple angles; a lightweight network is used to design a practical model that can be embedded in mobile devices and small and medium-sized equipment; the multiple views are combined with a two-dimensional model, and after transfer learning the practical model is completed, so that oral pills are correctly identified and distributed to the corresponding patients via mobile devices and small and medium-sized equipment; an identification channel for incomplete oral pills is established to recover incomplete pills, enhancing the practicality of the model;
the method comprises the following specific steps:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) completing the establishment of a convolutional neural network pill identification model through pre-training transfer learning;
5) outputting a medicine classification result and medicine name information;
in the construction of the multi-view database, photographed medicine pictures serve as the basic data of the data set; the medicine photographing method specifies shooting rules and shooting angles, the shooting angles comprising a planar-placement shooting angle, a non-axisymmetric medicine shooting angle, a vertical-plane shooting angle and a special-case shooting angle;
the incomplete pill recovery addresses the incomplete pills, such as half or even quarter tablets, that occur in practice: a template matching algorithm identifies the complete pill picture corresponding to the incomplete pill, and the identified complete pill picture is fed into the constructed convolutional neural network medicine identification model.
The data augmentation (also called data enhancement) crops the medicine picture according to its resolution to generate different picture data, enlarging the data volume without substantially adding new data; mathematically, augmentation is performed about the image center point by default, and image translation, image flipping, image rotation, image scaling, image shearing, image cropping and combined transformation are all carried out about the center point;
the transfer learning pre-trains the model on the ImageNet public data set to learn shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initialization weights of the medicine identification network, so that the model converges quickly and achieves a better identification effect.
The model is built on the lightweight model MobileNetv2 as the basic framework; on the basis of this architecture, the convolution kernel is enlarged to increase the receptive field of the convolutional neural network; in addition, a hybrid attention module combining channel attention and spatial attention is introduced, improving the feature extraction capability of the network so that pill features are extracted as fully as possible to assist medicine identification.
Under the medicine picture shooting rules, the medicine occupies 50-60% of the whole picture and must not be too small; the background is a uniform solid color that differs from the color of the medicine, and for any medicine whose color is the same as or similar to the background, a different solid-color background is chosen; pictures must be of high quality and sharply focused.
For planar-placement shooting, 3 pictures of a centrosymmetric medicine are taken, with the camera at 180 degrees, 60 degrees and 30 degrees to the plane, respectively;
for non-axisymmetric medicine shooting, the medicine is rotated through several angles and one picture is taken about every 30 degrees;
for vertical-plane shooting, the medicine is rotated through 180 degrees and one picture is taken every 30 degrees, giving 6 pictures in total;
special medicines without a vertical face, such as capsules, are photographed only according to the planar rule, rotated through 180 degrees or 360 degrees; if the front and back of the medicine differ, a set of pictures is taken of each side according to the planar shooting rule.
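The rotation schedule in the shooting rules can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `shooting_angles` is hypothetical, and it assumes the first picture is taken at 0 degrees and the end angle itself is not repeated, which is one reading that reproduces the stated total of 6 pictures for a 180-degree rotation.

```python
def shooting_angles(total_degrees, step_degrees=30):
    """Rotation angles (in degrees) at which one picture is taken.

    Assumption: shooting starts at 0 degrees and stops before total_degrees.
    """
    return list(range(0, total_degrees, step_degrees))


print(shooting_angles(180))        # vertical-plane rule: 180 degrees in 30-degree steps
print(len(shooting_angles(360)))   # a full turn, e.g. for a non-axisymmetric medicine
```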
A template matching algorithm is used to identify the complete pill picture corresponding to the incomplete pill. Template matching is a method of searching for a specific target in an image: every possible position in the image is traversed and compared with the template, and when the similarity is high enough the match is considered successful;
the algorithm comprises the following steps:
1) determining the length and width x, y of the current picture;
2) determining the length and width w, h of the template picture;
3) for each point (i, j), in sequence from (0, 0) to (x - w, y - h), comparing the image region from (i, j) to (i + w, j + h) with the template and calculating their similarity;
4) returning the similarity at each point after the comparison is finished.
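The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the document does not name its similarity measure, so the sketch assumes one (1 minus the mean absolute grey-level difference, for pixel values in 0..255), and the function names `match_template` and `best_match` are hypothetical. The default threshold of 0.9 follows the "above 90%" threshold stated elsewhere in the description.

```python
def match_template(image, template):
    """Slide `template` over `image`; return a similarity score per position.

    Both arguments are 2-D lists of grey levels in 0..255 (rows of pixels).
    Assumed similarity: 1 - (mean absolute difference / 255).
    """
    y, x = len(image), len(image[0])          # picture height and width
    h, w = len(template), len(template[0])    # template height and width
    scores = {}
    for j in range(y - h + 1):                # window rows 0 .. y - h
        for i in range(x - w + 1):            # window columns 0 .. x - w
            diff = 0
            for dj in range(h):
                for di in range(w):
                    diff += abs(image[j + dj][i + di] - template[dj][di])
            scores[(i, j)] = 1.0 - diff / (255.0 * w * h)
    return scores


def best_match(image, template, threshold=0.9):
    """Return ((i, j), score) for the best position, or None below threshold."""
    scores = match_template(image, template)
    pos = max(scores, key=scores.get)
    return (pos, scores[pos]) if scores[pos] >= threshold else None


image = [
    [0,   0,   0, 0],
    [0, 200, 210, 0],
    [0, 220, 230, 0],
    [0,   0,   0, 0],
]
template = [[200, 210], [220, 230]]
print(best_match(image, template))
```

In practice a normalized correlation score from an image-processing library would usually replace the nested loops; the loops are kept here only to mirror the step-by-step description.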
In data augmentation, image translation, image flipping, image rotation, image scaling, image shearing, image cropping and combined transformation are, mathematically, all carried out about the center point; rotation about the center comprises the following steps: 1) first moving the rotation point to the origin; 2) rotating about the origin; 3) moving the rotation point back to its original position;
assume the original coordinates of a pixel are (x_0, y_0) and the coordinates after transformation are (x, y); the relationship between the coordinates before and after transformation is as follows, where H is the transformation matrix;
Figure BDA0003542872690000041
image translation: all pixels are shifted in the x direction and the y direction; the matrix corresponding to the translation is as follows, where d_x and d_y are the distances moved in the x and y directions:
Figure BDA0003542872690000042
image flipping, i.e. mirroring the image, comprises horizontal flipping and vertical flipping; the transformation matrix for horizontal flipping is:
Figure BDA0003542872690000043
the transformation matrix for vertical flipping is:
Figure BDA0003542872690000044
image rotation rotates the image by an arbitrary angle θ, by default about the image center point; the transformation matrix is:
Figure BDA0003542872690000045
image scaling scales the current image by an arbitrary factor; the transformation matrix is as follows, where S_x and S_y represent the scaling in the x and y directions;
Figure BDA0003542872690000046
image shearing refers to the non-perpendicular projection of a planar scene onto a projection plane; the transformation matrix is as follows, where h_x and h_y are the shear angles in the x and y directions;
Figure BDA0003542872690000047
image cropping scales the picture to 1.1 times the original and then crops the scaled image;
combined transformation combines several augmentation modes; assume a given translation matrix H_shift, rotation matrix H_rotate and scaling matrix H_scale.
For combined transform one, the combined matrix M is: M = H_shift × H_rotate × H_scale
For combined transform two, the combined matrix N is: N = H_scale × H_rotate × H_shift
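The following Python sketch shows why the two combined transforms differ and how the three-step rotation about the center can be written as a matrix product. The patent's matrices are given as figures that are not reproduced in this text, so the standard 3 × 3 homogeneous-coordinate forms used here, including the sign convention of the rotation matrix, are assumptions; the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def translation(dx, dy):
    return [[1, 0, dx], [0, 1, dy], [0, 0, 1]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]   # assumed sign convention


def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def about_center(h, cx, cy):
    """Apply h about (cx, cy): move the center to the origin, transform, move back."""
    return matmul(translation(cx, cy), matmul(h, translation(-cx, -cy)))


def apply(h, x0, y0):
    """Map the point (x0, y0) through h using homogeneous coordinates."""
    x = h[0][0] * x0 + h[0][1] * y0 + h[0][2]
    y = h[1][0] * x0 + h[1][1] * y0 + h[1][2]
    return (round(x, 6), round(y, 6))


H_shift = translation(10, 0)
H_rotate = rotation(math.pi / 2)
H_scale = scaling(2, 2)

M = matmul(matmul(H_shift, H_rotate), H_scale)   # scale first, then rotate, then shift
N = matmul(matmul(H_scale, H_rotate), H_shift)   # shift first, then rotate, then scale

print(apply(M, 1, 0))
print(apply(N, 1, 0))
print(apply(about_center(H_rotate, 112, 112), 112, 112))   # the center maps to itself
```

Because matrix multiplication is not commutative, M and N send the same pixel to different places, which is why the two orderings count as distinct augmentations.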
On the basis of the architecture of the lightweight basic model MobileNetv2, the convolution kernel is enlarged so that the receptive field increases from 3 × 3 to 5 × 5 or more, and a Convolutional Block Attention Module (CBAM) attention mechanism is added, enhancing the feature extraction capability of the model.
When the template matching algorithm is used to identify the complete pill picture corresponding to an incomplete pill, the similarity threshold is set to a value above 90%; when the matching similarity reaches the threshold, the pill is considered successfully matched.
A practical model is designed for embedding in small and medium-sized mobile equipment: the multiple views are combined with the two-dimensional model, and the model code is integrated into the main control unit of the equipment, which is fitted with an LCD screen and a camera. The camera scans the medicines; the collected medicine pictures are transmitted to the main control unit and fed to the model for identification; the identified medicine name is displayed on the LCD screen alongside the collected medicine picture for the operator to check.
The invention has the beneficial effects that:
1. Medicines that are highly similar in appearance and color are classified effectively, assisting medical staff in sorting medicines and reducing or even avoiding threats to patients' lives caused by medicine classification errors.
2. The multi-view database handles complex situations such as medicines lying on their side or stacked when several medicines are mixed during dispensing; combined with different data augmentation methods, it further improves the angular coverage of the data set.
3. Data augmentation and transfer learning compensate for the overfitting caused by the small amount of data; combined with a lightweight network, a practical model is designed that can be embedded in mobile devices and small and medium-sized equipment, so the method is convenient and practical and has broad application prospects.
4. Combining multiple views with a two-dimensional model achieves, to a certain extent, the identification effect of a three-dimensional model and can handle medicines placed randomly at multiple angles; at the same time, the number of model parameters is greatly reduced compared with a three-dimensional model, which eases conversion into actual products.
5. An image recovery channel is designed for the incomplete pills that occur in actual dispensing, so that incomplete pills can be identified effectively.
6. The invention introduces a hybrid CBAM attention module combining channel attention and spatial attention, improving the feature extraction capability of the network so that pill features are extracted as fully as possible to assist medicine identification.
Drawings
Fig. 1 is a flow chart of the identification method of the present invention.
FIG. 2 is a schematic flow chart of a convolutional neural network for drug identification constructed by the present invention.
FIG. 3 is a schematic diagram of the correct drug ratio in the shooting rule of the present invention.
Fig. 4 is a schematic diagram of the wrong drug proportion in the shooting rule of the present invention.
Fig. 5 is a schematic view of a multi-view shooting angle of a circular medicine. In the figure, 1 is a camera and 2 is a medicine.
FIGS. 6-8 are schematic views of non-axisymmetric drug plane shooting angles.
Fig. 9-10 are schematic diagrams of vertical shooting angles.
Fig. 11 is a schematic diagram of the template matching algorithm.
FIGS. 12-14 are schematic diagrams of examples of matching centrosymmetric drug templates.
FIGS. 15-17 are schematic diagrams of non-centrosymmetric drug template matching examples.
Fig. 18 is a schematic diagram of an image augmentation scheme.
Fig. 19 is the MobileNetv2 model framework.
FIG. 20 is a schematic view of the receptive field.
FIG. 21 is a schematic diagram of a CBAM attention mechanism model.
Fig. 22 shows a modification of the MobileNetv2 inverted residual block model.
FIG. 23 is a plot of training loss against the number of iterations, showing the training effect and accuracy in an embodiment of the present invention.
FIG. 24 is a plot of test loss against the number of iterations, showing the training effect and accuracy in an embodiment of the present invention.
FIG. 25 is a plot of test accuracy against the number of iterations, showing the training effect and accuracy in an embodiment of the present invention.
Detailed Description
Experimental Environment
The hardware environment and software environment used in this experimental example are shown in table 1:
TABLE 1
Figure BDA0003542872690000061
The invention adopts the lightweight model MobileNetv2 as the basic framework. MobileNetv2 was proposed by Google in January 2018; its two innovations are Inverted Residuals and Linear Bottlenecks. It aims to improve accuracy while reducing memory usage. The overall model structure is shown in Table 2.
TABLE 2
Figure BDA0003542872690000071
Experimental data
The experimental data consist of 753 pictures of individual pills covering 93 categories of pills. The pictures were taken as high-definition JPG images following the shooting rules of the data set collection specification; the specific numbers and categories are shown in Table 3.
TABLE 3
Figure BDA0003542872690000072
Figure BDA0003542872690000081
Figure BDA0003542872690000091
Figure BDA0003542872690000101
See figure 1.
The pill recognition model, referred to as the CNN medicine identification network, is a CNN model built on MobileNetv2.
The medicines are divided into complete pills and incomplete pills, both of which enter the CNN medicine identification network. Incomplete pills first undergo template matching and are restored to images of complete pills before entering the identification system. After identification is complete, the system outputs the medicine name and picture.
FIG. 2 is a schematic flow chart of a convolutional neural network for drug identification constructed by the present invention.
In the construction of the multi-view database, the shot medicine pictures are used as the basic data of the data set. In the medicine picture shooting method, a shooting rule and a shooting angle are specified, and the shooting angle comprises a plane placing shooting angle, a non-axisymmetric medicine shooting angle, a vertical plane shooting angle and a special condition shooting angle.
As shown in Fig. 2, after the multi-view database is built, data augmentation is applied to the data fed to the MobileNetv2-based model. Data augmentation includes image flipping, image scaling, image shearing, image rotation and image cropping. The model is first pre-trained on the ImageNet public data set for transfer learning, learning shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initialization weights of the medicine identification network, so that the model converges quickly and achieves a better identification effect. The identification steps comprise:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) completing model building through pre-training transfer learning;
5) outputting a medicine classification result and medicine name information.
The model is built on the lightweight model MobileNetv2 as the basic framework; on the basis of this architecture, the convolution kernel is enlarged to increase the receptive field of the convolutional neural network and improve the feature extraction capability of the network. In addition, a CBAM (Convolutional Block Attention Module) attention mechanism is introduced into the inverted residual block of the model. CBAM attends to both the spatial and the channel dimension; compared with a mechanism that attends to only one of them, it further improves the feature extraction capability of the model, extracts the fine features of the pills as fully as possible, and benefits subsequent pill recognition. The correct medicine classification result and medicine name are output according to the processed data.
Figs. 3-10 illustrate the shooting rules of the invention. Fig. 3 shows a correct medicine proportion and Fig. 4 an incorrect one. When photographing, the medicine should occupy 50-60% of the whole picture and must not be too small; the background is a uniform solid color that differs from the color of the medicine, and for any medicine whose color is the same as or similar to the background, a different solid-color background is chosen; pictures must be of high quality and sharply focused.
See fig. 5-10.
For planar-placement shooting, 3 pictures of a centrosymmetric medicine are taken, with the camera at several angles to the plane, such as 180 degrees, 60 degrees and 30 degrees. For non-axisymmetric medicine shooting, the medicine is rotated through several angles and one picture is taken about every 30 degrees; if the medicine is left-right symmetric, it is rotated through 180 degrees and one picture is taken every 30 degrees. For vertical-plane shooting, the medicine is rotated through 180 degrees and one picture is taken every 30 degrees, giving 6 pictures in total. Special medicines without a vertical face, such as capsules, are photographed only according to the planar rule, rotated through 180 degrees or 360 degrees. If the front and back of the medicine differ, a set of pictures is taken of each side according to the planar shooting rule.
See fig. 12-17.
FIGS. 12-14 are schematic diagrams of examples of matching centrosymmetric drug templates.
Fig. 12 shows a complete medicine and Fig. 13 shows a half pill of the medicine in Fig. 12; through centrosymmetric medicine template matching, it is restored to a complete pill identical to Fig. 12.
FIGS. 15-17 are schematic diagrams of non-centrosymmetric drug template matching examples.
Fig. 15 shows a complete medicine and Fig. 16 shows a half pill of the medicine in Fig. 15, which is matched by non-centrosymmetric medicine template matching; Fig. 17 shows the complete pill restored from Fig. 16.
The data augmentation of this example is seen in fig. 18.
Because medical data sets are small, data augmentation is incorporated into the data preprocessing stage. Data augmentation, also called data enhancement, lets limited data produce value equivalent to more data without substantially adding new data. In this embodiment, if the resolution of the network input picture is 256 × 256 and random cropping to 224 × 224 is used, one picture can generate at most 32 × 32 different pictures, enlarging the data volume nearly 1000 times. All data augmentation is performed about the image center by default. For rotation: 1) first move the rotation point to the origin; 2) rotate about the origin; 3) move the rotation point back to its original position.
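The crop arithmetic in the paragraph above can be checked with a short Python sketch. The helper names `count_crop_positions` and `random_crop` are hypothetical, and the image is represented as a nested list purely for illustration. Counting both end offsets on each axis gives 33 × 33 = 1089 distinct windows, the same order of magnitude as the 32 × 32 (roughly 1000-fold) figure quoted in the text.

```python
import random


def count_crop_positions(size, crop):
    """Number of distinct top-left offsets of a crop x crop window in a size x size image."""
    per_axis = size - crop + 1
    return per_axis * per_axis


def random_crop(image, crop, rng=random):
    """Return a random crop x crop window from a square image given as a 2-D list."""
    size = len(image)
    top = rng.randint(0, size - crop)     # randint includes both end points
    left = rng.randint(0, size - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]


print(count_crop_positions(256, 224))

image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop(image, 224)
print(len(patch), len(patch[0]))
```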
Assume the original coordinates of a pixel are (x_0, y_0) and the coordinates after transformation are (x, y); the relationship between the coordinates before and after transformation is as follows, where H is the transformation matrix;
Figure BDA0003542872690000121
image translation: all pixels are shifted in the x direction and the y direction; the matrix corresponding to the translation is as follows, where d_x and d_y are the distances moved in the x and y directions:
Figure BDA0003542872690000122
image flipping, i.e. mirroring the image, comprises horizontal flipping and vertical flipping; the transformation matrix for horizontal flipping is:
Figure BDA0003542872690000123
the transformation matrix for vertical flipping is:
Figure BDA0003542872690000124
image rotation rotates the image by an arbitrary angle θ, by default about the image center point; the transformation matrix is:
Figure BDA0003542872690000125
image scaling scales the current image by an arbitrary factor; the transformation matrix is as follows, where S_x and S_y represent the scaling in the x and y directions;
Figure BDA0003542872690000126
image shearing refers to the non-perpendicular projection of a planar scene onto a projection plane; the transformation matrix is as follows, where h_x and h_y are the shear angles in the x and y directions;
Figure BDA0003542872690000127
In accordance with the common practice for cropping in deep learning, in this embodiment the image is first enlarged to 1.1 times the original, and a random-scale crop is then taken from the enlarged image.
Fig. 19 is the MobileNetv2 model framework.
The invention adopts the lightweight model MobileNetv2 as its basic framework. MobileNetv2 was proposed by Google in 2018, and its two key innovations are Inverted Residuals and Linear Bottlenecks, which aim to improve accuracy while reducing memory usage.
In this model, the number of channels increases and the spatial size decreases correspondingly as the network deepens. Overall, however, the tensors remain relatively small because of the bottleneck layers that form the connections between blocks.
To improve the feature extraction capability of the model, this work enlarges the convolution kernel on the basis of the basic model architecture, which increases the receptive field of the convolutional neural network and improves the feature extraction capability of the network. In this embodiment the receptive field is increased from 3 × 3 to 5 × 5 or more, as shown in Fig. 20.
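The trade-off behind enlarging the kernel can be illustrated with a little arithmetic. The sketch below assumes stride-1 layers and a depth-wise convolution with a hypothetical 96 channels. It is not taken from the patent and only shows how the receptive field and the weight count grow when 3 × 3 kernels are replaced by 5 × 5 kernels.

```python
def depthwise_params(channels: int, k: int) -> int:
    """Weight count of a depth-wise k x k convolution (one k x k filter per channel, no bias)."""
    return channels * k * k

def stacked_receptive_field(kernel_sizes) -> int:
    """Receptive field of stacked stride-1 convolutions: each layer adds (k - 1) pixels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

channels = 96  # hypothetical channel count, for illustration only
params_3 = depthwise_params(channels, 3)   # 96 * 9
params_5 = depthwise_params(channels, 5)   # 96 * 25
rf_two_3x3 = stacked_receptive_field([3, 3])
rf_two_5x5 = stacked_receptive_field([5, 5])
```

Because the depth-wise layer holds only one small filter per channel, moving from 3 × 3 to 5 × 5 multiplies its weights by 25/9 while the 1 × 1 layers around it are unchanged, so the block stays lightweight.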
A CBAM attention mechanism is added to further improve the feature extraction capability, as shown in Fig. 21.
The Convolutional Block Attention Module (CBAM) is an attention module for convolutional blocks that combines spatial attention and channel attention. Pill identification is a fine-grained recognition problem, so the network needs relatively strong feature extraction capability to distinguish the subtle differences between pills. The attention modules in CBAM let the network focus on regions that are meaningful for pill identification rather than over-learning features from interfering regions such as the background, so that the model can extract pill features to the greatest extent and identify the subtle differences between pills. In addition, CBAM is a lightweight, general-purpose module whose overhead is negligible: it can be integrated seamlessly into any CNN architecture and trained end to end together with the base CNN. Adding this lightweight module improves the overall performance of the model without excessively increasing its size, which facilitates later deployment on mobile devices.
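How channel attention and spatial attention combine can be sketched in plain NumPy. This follows the commonly published CBAM layout (channel attention first, then spatial attention with a 7 × 7 kernel) and is not the patent's implementation. The weights are random placeholders, and the reduction ratio `r = 4` and tensor sizes are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). A shared two-layer MLP scores the avg- and max-pooled channel descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU between the two layers
    avg = x.mean(axis=(1, 2))                 # (C,)
    mx = x.max(axis=(1, 2))                   # (C,)
    return sigmoid(mlp(avg) + mlp(mx))        # one weight in (0, 1) per channel

def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k), convolved over the stacked [avg, max] channel maps."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])         # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                        # one weight in (0, 1) per pixel

def cbam(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)[:, None, None]   # reweight channels
    return x * spatial_attention(x, kernel)[None, :, :]   # reweight spatial positions

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4                      # hypothetical sizes and reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # placeholder weights, not trained
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
```

The output has the same shape as the input, which is why such a module can be dropped into an existing block without changing the layers around it.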
Fig. 22 is a schematic diagram of the modified MobileNetv2 inverted residual block.
In the figure, the channels are first expanded and then reduced: a 1 × 1 "expansion" layer (point-wise convolution, PW) is added before the depth-wise convolution (DW) in order to increase the number of channels and obtain more features, that is: "expansion" (PW) → "convolutional feature extraction" (DW) → "compression" (PW).
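The expansion → depth-wise convolution → compression sequence can be traced with a small NumPy sketch. It is a simplified illustration under stated assumptions: batch normalization is omitted, the expansion factor of 6 and the 5 × 5 depth-wise kernel are chosen for the example, and all weights are random placeholders.

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def pointwise(x, w):
    """1 x 1 convolution. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def depthwise(x, w):
    """Per-channel k x k convolution, stride 1, zero padding. w: (C, k, k)."""
    _, h, wd = x.shape
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def inverted_residual(x, w_expand, w_dw, w_project):
    h = relu6(pointwise(x, w_expand))   # "expansion" (PW): more channels
    h = relu6(depthwise(h, w_dw))       # feature extraction (DW)
    h = pointwise(h, w_project)         # "compression" (PW): linear, no activation
    return x + h if h.shape == x.shape else h   # skip connection when shapes match

rng = np.random.default_rng(0)
c_in, t, k = 8, 6, 5                    # hypothetical channels, expansion factor, kernel size
x = rng.standard_normal((c_in, 16, 16))
w_expand = rng.standard_normal((c_in * t, c_in)) * 0.1
w_dw = rng.standard_normal((c_in * t, k, k)) * 0.1
w_project = rng.standard_normal((c_in, c_in * t)) * 0.1
hidden = relu6(pointwise(x, w_expand))  # (48, 16, 16) inside the block
y = inverted_residual(x, w_expand, w_dw, w_project)
```

Only the intermediate tensor is wide (48 channels here), while the tensors passed between blocks keep the narrow 8-channel shape.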
The combined transformation applies several augmentation modes together. Data augmentation in deep learning generally combines multiple augmentation modes, which involves matrix multiplication; because matrix multiplication is not commutative (in linear algebra AB ≠ BA except in special cases), different combination orders give different results. For illustration, assume a translation matrix H_shift, a rotation matrix H_rotate and a scaling matrix H_scale are given. This embodiment presents two different combined transformations.
For combined transformation one, the combined matrix is: M = H_shift × H_rotate × H_scale
For combined transformation two, the combined matrix is: N = H_scale × H_rotate × H_shift
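The effect of the combination order can be checked numerically. The sketch below assumes column vectors in homogeneous coordinates (so the rightmost matrix is applied first), a counter-clockwise rotation convention, and arbitrary example values for the shift, angle and scale.

```python
import numpy as np

def h_shift(dx, dy):
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])

def h_rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def h_scale(sx, sy):
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])

# Example values (assumptions): shift by (10, 5), rotate 90 degrees, scale by 2.
M = h_shift(10, 5) @ h_rotate(np.pi / 2) @ h_scale(2, 2)   # scale, then rotate, then shift
N = h_scale(2, 2) @ h_rotate(np.pi / 2) @ h_shift(10, 5)   # shift, then rotate, then scale

p = np.array([1.0, 0.0, 1.0])   # the point (1, 0)
p_m = M @ p                     # (1,0) -> (2,0) -> (0,2) -> (10,7)
p_n = N @ p                     # (1,0) -> (11,5) -> (-5,11) -> (-10,22)
```

The same three elementary matrices send the point (1, 0) to (10, 7) in one order and to (-10, 22) in the other, so the order chosen for a combined augmentation is part of its definition.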
This data augmentation approach helps prevent overfitting of the model and compensates for cases where pictures at certain angles were not collected because of an incomplete acquisition procedure during database construction, thereby further completing the multi-dimensional database.
The present embodiment uses MobileNetv2 as the basic framework for pill identification; the number of epochs is set to 500, batch_size is set to 16, and the learning rate is set to 0.001. If training performance does not improve within 5 epochs, the learning rate is reduced by 10%.
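The learning-rate rule can be sketched as a small plateau scheduler. The interpretation here is an assumption: "no improvement within 5 epochs" is read as five consecutive epochs without a new best accuracy, and "reduced by 10%" is read as multiplying the rate by 0.9. The function name `plateau_schedule` and the accuracy values are hypothetical.

```python
def plateau_schedule(metrics, lr=0.001, patience=5, factor=0.9):
    """Return the learning rate used at each epoch, given the accuracy observed after it."""
    best = float("-inf")
    stale = 0                 # epochs since the last improvement
    history = []
    for m in metrics:
        history.append(lr)
        if m > best:
            best, stale = m, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # assumed meaning of "reduced by 10%"
                stale = 0
    return history

# Hypothetical accuracy curve: two improving epochs, a five-epoch plateau, then progress again.
accs = [0.50, 0.60] + [0.60] * 5 + [0.61]
lrs = plateau_schedule(accs)
```

With this curve the rate stays at 0.001 for the first seven epochs and drops to 0.0009 for the eighth, after the five stale epochs have elapsed.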
The training results are shown in Fig. 23. It can be seen that, as the number of iterations increases, the loss on the training set gradually decreases and converges to a stable range.
The test loss versus the number of iterations is shown in Fig. 24. It can be seen that, as the number of iterations increases, the loss on the test set gradually decreases and converges to a stable range.
The test accuracy versus the number of iterations is shown in Fig. 25. It can be seen that, as the number of iterations increases, the accuracy on the test set gradually rises and finally stabilizes within a certain range.

Claims (7)

1. A deep learning oral pill identification method based on multiple views and data expansion, characterized in that: a database is built using multi-view and data augmentation methods, so that the data set is completed from multiple angles; a practical model that can be embedded in small and medium-sized mobile devices is designed using a lightweight network; the multiple views are combined with a two-dimensional model, and the practical model is constructed after transfer learning, so that oral pills are correctly identified and dispensed to the corresponding patients through mobile devices and small and medium-sized devices; and an incomplete oral pill identification channel is established to recover incomplete pills, enhancing the practicability of the model;
the method comprises the following specific steps:
1) building a multi-view database;
2) recovery of incomplete pills;
3) data augmentation;
4) completing the establishment of a convolutional neural network pill identification model through pre-training transfer learning;
5) outputting a medicine classification result and medicine name information;
in the construction of the multi-view database, a shot medicine picture is used as data set basic data, and in the medicine picture shooting method, shooting rules and shooting angles are specified, wherein the shooting angles comprise a plane placing shooting angle, a non-axisymmetric medicine shooting angle, a vertical plane shooting angle and a special condition shooting angle;
the incomplete pill recovery addresses incomplete pills, such as half or even quarter doses, that occur in practice: a template matching algorithm is used to identify the complete pill picture corresponding to an incomplete pill, and the identified complete pill is fed into the constructed convolutional neural network medicine identification model;
the data augmentation crops the medicine picture according to the resolution without substantially adding new data, generating different picture data and thereby enlarging the data volume; mathematically, the augmentation operations use the image center point by default, and image translation, image flipping, image rotation, image scaling, image shearing, image cropping and combined transformation are completed about the center point;
the transfer learning pre-trains on the public ImageNet data set to learn shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initialization weights of the drug identification network, so that the model converges quickly and achieves a better identification effect;
the model building adopts the lightweight model MobileNetv2 as a basic framework and, on the basis of the architecture of the basic model, enlarges the convolution kernel to increase the receptive field of the convolutional neural network; meanwhile, a CBAM mixed attention module combining channel attention and spatial attention is introduced, so that the feature extraction capability of the network is improved, the pill features are extracted to the greatest extent, the medicine identification is assisted, the processed data is output, and the correct medicine classification result and medicine name are output as required.
2. The multiview and data expansion based deep learning oral pill identification method of claim 1, wherein: in the medicine picture shooting rule, the medicine accounts for 50-60% of the whole picture and cannot be too small, the shooting background adopts a pure color uniform color background which is different from the medicine color, and if part of the medicine has the same or similar color with the background, the part of the medicine independently selects other pure color backgrounds; high quality is required for shooting, and clear focusing is required;
when the shooting angle is planar placing shooting, 3 pictures of the centrally symmetrical medicine are shot at 180 degrees, 60 degrees and 30 degrees of the plane of the camera respectively;
when the shooting angle is non-axisymmetric medicine shooting, the shooting is carried out by rotating for a plurality of angles, and the shooting is carried out once at intervals of about 30 degrees; if the medicines are symmetrical left and right, the medicines are rotated by 180 degrees, and one medicine is shot every 30 degrees;
when the shooting angle is vertical shooting, one picture is shot every 30 degrees by rotating 180 degrees, and 6 pictures are obtained in total;
for special medicines without vertical surfaces, such as capsules, the special medicines are only shot according to a plane and rotated by 180 degrees or 360 degrees;
if the front and back surfaces of the medicine are not consistent, a group of pictures need to be shot on the front surface and the back surface according to the plane shooting rule.
3. The multiview and data expansion based deep learning oral pill identification method of claim 1, wherein: the complete pill picture corresponding to the incomplete pill is identified using a template matching algorithm; template matching is a method for searching for a specific target in an image, which traverses every possible position in the image, compares whether each position is similar to the template, and considers the match successful when the similarity is high enough;
the algorithm comprises the following steps:
1) determining the length and width x, y of the current picture;
2) determining the length and width w, h of the template picture;
3) traversing each point (i, j) in sequence from (0, 0) to (x-w, y-h), and calculating the similarity between the image region from (i, j) to (i + w, j + h) and the template;
4) returning the similarity at each point after the comparison is finished.
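The four steps above can be sketched as a sliding-window search. The claim does not name a similarity measure, so normalized cross-correlation is used here as an assumption; the function name `match_template`, the random test image and the 0.9 cut-off (echoing the threshold above 90% mentioned in claim 7) are illustrative only.

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Slide the template over a grayscale image and return a map of similarity scores.

    Similarity is zero-mean normalised cross-correlation, in [-1, 1] (assumed measure).
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for i in range(ih - th + 1):            # every valid top-left position
        for j in range(iw - tw + 1):
            patch = image[i:i + th, j:j + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            scores[i, j] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

rng = np.random.default_rng(0)
image = rng.random((40, 40))                # stand-in for a grayscale pill picture
template = image[12:22, 7:17].copy()        # template cut from a known position
scores = match_template(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
matched = scores[best] >= 0.9               # threshold decision on the best score
```

The score map has one entry per candidate position, and the position of its maximum is where the template fits best; the threshold then decides whether that best fit counts as a match.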
4. The multi-view and data expansion based deep learning oral pill identification method of claim 1, wherein: in data augmentation, image translation, image flipping, image rotation, image scaling, image shearing, image cropping and combined transformation are, mathematically, completed about the center point; the method comprises the following steps: 1) first moving the rotation point to the origin; 2) performing a rotation around the origin; 3) moving the rotation point back to the original position;
assume the original coordinates of an image point are (x_0, y_0) and the coordinates after transformation are (x, y); the relationship between the coordinates before and after transformation is as follows, wherein H is the transformation matrix;
Figure FDA0003542872680000021
image translation: all pixels are shifted in the x direction and the y direction, and the corresponding transformation matrix is as follows, wherein d_x and d_y are the distances moved in the x and y directions:
Figure FDA0003542872680000031
image flipping, namely image mirroring, comprises horizontal flipping and vertical flipping; the transformation matrix for horizontal flipping is:
Figure FDA0003542872680000032
the vertically flipped transform matrix is:
Figure FDA0003542872680000033
the image rotation is performed by taking an image central point as a default center and performing rotation of any angle theta, and a transformation matrix is as follows:
Figure FDA0003542872680000034
image scaling refers to scaling the current image by an arbitrary factor, and its transformation matrix is as follows, wherein S_x and S_y denote the scaling in the x and y directions;
Figure FDA0003542872680000035
the image shearing refers to the non-perpendicular projection of a planar scene onto a projection plane, and the transformation matrix is as follows, wherein h_x and h_y are the transformation angles for the x and y directions;
Figure FDA0003542872680000036
the image cropping enlarges the picture to 1.1 times the original image and then performs a cropping operation on the enlarged image;
the combined transformation applies several augmentation modes together; assume a translation matrix H_shift, a rotation matrix H_rotate and a scaling matrix H_scale are given;
for combined transformation one, the combined matrix M is as follows: M = H_shift × H_rotate × H_scale
for combined transformation two, the combined matrix N is as follows: N = H_scale × H_rotate × H_shift
5. The multiview and data expansion based deep learning oral pill identification method of claim 1, wherein: on the basis of the architecture of the lightweight MobileNetv2 basic model, the convolution kernel is enlarged, increasing the receptive field from 3 × 3 to 5 × 5; meanwhile, a CBAM attention mechanism is added to the inverted residual block, so that the fine features of pills are extracted to the greatest extent through channel attention and spatial attention, and the feature extraction capability of the model is enhanced.
6. The multiview and data expansion based deep learning oral pill identification method of claim 1, wherein: the practical model embedded into the small and medium-sized mobile equipment is designed, the multi-view and the two-dimensional model are combined, the code of the model is integrated into the main control end of the small and medium-sized equipment, the equipment is provided with an LCD screen and a camera, medicines are scanned through the camera, medicine pictures are collected and transmitted back to the main control end, and the medicine pictures are sent to the model identification; the identified medicine name and the collected medicine picture are correspondingly displayed on the LCD screen for an operator to check.
7. The multiview and data expansion based deep learning oral pill identification method of claim 3, wherein: when the template matching algorithm is used to identify the complete pill picture corresponding to the incomplete pill, the similarity threshold is set to a value above 90%, and the pill match is considered successful when the matching similarity reaches the threshold.
CN202210242282.0A 2022-03-11 2022-03-11 Deep learning oral pill identification method based on multi-view and data expansion Active CN114821572B (en)


Publications (2)

Publication Number    Publication Date
CN114821572A (en)     2022-07-29
CN114821572B (en)     2023-04-21

Family

ID=82529659


Country Status (1)

Country Link
CN (1) CN114821572B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545150A (en) * 2017-10-13 2018-01-05 张晨 Medicine identifying system and its recognition methods based on deep learning
CN109190643A (en) * 2018-09-14 2019-01-11 华东交通大学 Based on the recognition methods of convolutional neural networks Chinese medicine and electronic equipment
CN111598130A (en) * 2020-04-08 2020-08-28 天津大学 Traditional Chinese medicine identification method based on multi-view convolutional neural network
CN111914902A (en) * 2020-07-08 2020-11-10 南京航空航天大学 Traditional Chinese medicine identification and surface defect detection method based on deep neural network
CN112927753A (en) * 2021-02-22 2021-06-08 中南大学 Method for identifying interface hot spot residues of protein and RNA (ribonucleic acid) compound based on transfer learning
CN113449776A (en) * 2021-06-04 2021-09-28 中南民族大学 Chinese herbal medicine identification method and device based on deep learning and storage medium
CN113989623A (en) * 2021-12-03 2022-01-28 浙江中医药大学 Automatic identification method for traditional Chinese medicine decoction piece image


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
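谭文军 (Tan Wenjun): "Research on an improved convolutional neural network algorithm and its application to crop leaf disease image recognition", CNKI Master's Theses Electronic Journal, Information Science and Technology series *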



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant