CN114821572B - Deep learning oral pill identification method based on multi-view and data expansion - Google Patents

Deep learning oral pill identification method based on multi-view and data expansion

Info

Publication number
CN114821572B
Authority
CN
China
Prior art keywords
image
pill
medicine
model
shooting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210242282.0A
Other languages
Chinese (zh)
Other versions
CN114821572A (en
Inventor
向军莲
张俊然
李南欣
谢贤凯
刘云飞
李杨
黄玲
唐良友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deyang Construction Investment Medical Co ltd
Sichuan University
Peoples Hospital of Deyang City
Original Assignee
Deyang Construction Investment Medical Co ltd
Sichuan University
Peoples Hospital of Deyang City
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deyang Construction Investment Medical Co ltd, Sichuan University, Peoples Hospital of Deyang City
Priority to CN202210242282.0A
Publication of CN114821572A
Application granted
Publication of CN114821572B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A deep learning oral pill identification method based on multi-view and data expansion. A database is constructed using multi-view photography and data augmentation, so that the data set covers the pills from multiple angles. A practical model that can be embedded in mobile equipment and small and medium-sized equipment is designed using a lightweight network. The multiple views are combined with a two-dimensional model, and the practical model is completed after transfer learning. In addition, an identification channel for incomplete oral pills is established: an incomplete pill is template-matched, restored to a picture of the complete pill, and then identified. The method effectively classifies medicines of similar appearance and color, assists medical staff in sorting medicines, and reduces or even avoids the risk to patients' lives caused by medicine classification errors. The over-fitting caused by the small amount of data is addressed through multi-view database construction, data augmentation and transfer learning. The lightweight model MobileNetv2 is adopted as the basic framework and an attention module is introduced; the number of model parameters is greatly reduced compared with a three-dimensional model, so the method is convenient, practical and easy to popularize.

Description

Deep learning oral pill identification method based on multi-view and data expansion
Technical Field
The invention belongs to the field of clinical medicine and nursing, and relates to a method for correctly identifying medicines.
Background
Hospitals carry the demanding task of healing the wounded and rescuing the dying. The workload is heavy and staff are busy, so medicines without outer packaging are difficult to identify and may be dispensed incorrectly.
The work of dispensing medicines in hospitals is easily affected by human error. Administering drugs to patients in a hospital or care-ward environment is currently a manual process: 1) the correct medicaments, in the correct number, are placed into a plastic cup as a pill set; 2) the pill sets are delivered to the corresponding patients; 3) the pill sets are administered to the patients at the correct times (e.g., no more than 4 hours apart). This process is highly susceptible to human error, and absolute quality assurance is difficult to achieve.
Medication errors may also occur in a pharmacy environment. A filled prescription may be labeled with an incorrect dosage or quantity of pills, or with the wrong medication. An overworked pharmacist may confuse pills with similar medication names or physical appearances and dispense the wrong medication, amount or dosage, and such errors can cause serious injury or even death to the patient.
With the development of deep learning in image recognition, the use of deep learning models for medical image recognition has attracted growing attention, and computer vision technology is maturing, with unprecedented results in image processing, speech recognition and related areas. Compared with judgment based on subjective manual experience, image recognition based on a deep learning model can assist medical staff in classifying medicines and thereby help avoid patient deaths caused by human error. Li Shuai et al. used traditional machine learning methods to extract features of capsule medicines and automate pharmacy tasks. Shi Huayu et al. extracted features of medicine packaging by deep learning; their medicine package recognition system has obtained preliminary results, classifying and identifying 500 medicines with a validation accuracy of 96.4% during training. Zhang Zhenjiang et al. used computer vision to identify medicine type and quantity automatically, demonstrating the technical feasibility of using deep learning to assist outpatient dispensing. Their main technical approach was to acquire images of the outer medicine packaging, generate a training image set by preprocessing, build and train a 7-layer (3C 3P 1F) convolutional neural network model, and deploy a medicine image recognition service with a RESTful interface.
However, in practice, when medicines are dispensed to inpatients, the outer packaging, which is the easiest source of identifying information, is removed, and single pills, multiple pills and even fractional pills such as 1/2 or 1/4 tablets are repackaged and dispensed at fixed times and in fixed quantities according to the prescription list issued by the doctor. The prior art therefore does not fully meet the need for drug identification.
Disclosure of Invention
The object of the present invention is to develop a method for quickly and accurately identifying pills and pill combinations in an automated manner; the method must correctly identify the pills dispensed to each patient. Correct identification is difficult for three reasons. First, the repackaged pills differ little in size, color and shape, and their subtle features are hard to distinguish. Second, pills may be stacked or lie on their sides during dispensing, and a recognition model based on pictures from a single angle cannot handle this complexity. Third, doctors may increase or decrease the dose according to the patient's clinical condition, age, physical constitution and so on, so that in practice 'incomplete' pills such as half tablets or even 1/4 tablets appear. To identify all pills accurately, including incomplete ones, the invention photographs the pills from multiple angles and views to cover situations such as side placement and stacking, and designs a pill picture recovery channel to solve the problem of identifying incomplete pills. In addition, since the method is expected to be used mostly on small and medium-sized equipment, such as drug delivery robots and automatic pharmacy dispensing systems, a lightweight neural network is adopted, which facilitates later conversion into products.
The aim of the invention is achieved as follows. A deep learning oral pill identification method based on multi-view and data expansion is characterized in that: a database is constructed by multi-view photography and data augmentation, so that the data set covers multiple angles; a practical model that can be embedded in mobile equipment and small and medium-sized equipment is designed using a lightweight network; the multiple views are combined with a two-dimensional model, the practical model is completed after transfer learning, and the oral pills are correctly identified and issued to the corresponding patients through the mobile equipment and small and medium-sized equipment; and an identification channel for incomplete oral pills is established to restore incomplete pills, enhancing the practicability of the model;
the method comprises the following specific steps of:
1) Constructing a multi-view database;
2) Restoring the incomplete pill;
3) Data augmentation;
4) Building a convolutional neural network pill recognition model through pre-training transfer learning;
5) Outputting medicine classification results and medicine name information;
in the construction of the multi-view database, a shot medicine picture is taken as basic data of a data set, and in the medicine picture shooting method, shooting rules and shooting angles are specified, wherein the shooting angles comprise plane placement shooting angles, non-axisymmetric medicine shooting angles, vertical plane shooting angles and special condition shooting angles;
the recovery of the incomplete pills is to identify the complete pill picture corresponding to the incomplete pill by using a template matching algorithm aiming at half and even 1/4 metered incomplete pills appearing in the actual situation, and send the identified corresponding complete pill into a built convolutional neural network drug identification model.
Data augmentation crops the medicine pictures according to resolution, without adding substantially new data, to generate different picture data and so enlarge the data set; mathematically, all augmentation operations, namely image translation, flipping, rotation, scaling, shear (miscut), cropping and combined transformation, are by default performed about the image center point;
transfer learning pre-trains the network on the public ImageNet data set to learn shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initial weights of the drug identification network, so that the model converges quickly and achieves a better identification effect.
The model is built on the lightweight MobileNetv2 model as the basic framework; on this framework, the convolution kernel size is enlarged to increase the receptive field of the convolutional neural network, and a mixed attention module combining channel attention and spatial attention is introduced to improve the feature extraction capability of the network, so that pill features are extracted as fully as possible to assist medicine identification.
Under the medicine picture shooting rules, the medicine itself occupies 50-60% of the whole picture and must not be too small; the background is a single uniform solid color that differs from the medicine color, and if a medicine has the same or a similar color as the background, a different solid-color background is chosen for that medicine; the pictures must be of high quality and in sharp focus.
For plane placement shooting of a centrally symmetric medicine, 3 pictures are taken, at 180 degrees, 60 degrees and 30 degrees relative to the plane of the camera respectively;
for non-axisymmetric medicine shooting, the medicine is rotated through several angles and a picture is taken at intervals of about 30 degrees;
for vertical-plane shooting, the medicine is rotated through 180 degrees and one picture is taken every 30 degrees, 6 pictures in total;
special medicines without a vertical face, such as capsules, are photographed only in the plane, rotated through 180 degrees or 360 degrees; if the front and back of the medicine differ, a group of pictures is taken of each side according to the plane shooting rules.
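The angle schedule for a rotation sweep can be checked with a few lines of Python. This is only an illustrative sketch: the function name `shooting_angles` is hypothetical, the sweep is assumed to start at 0 degrees and to exclude its end point, and the full-turn sweep is an assumed example for a non-axisymmetric medicine.

```python
def shooting_angles(sweep_deg, step_deg=30):
    """Rotation angles for one sweep, starting at 0 and excluding the sweep end."""
    return list(range(0, sweep_deg, step_deg))

# Vertical-plane shooting: 180-degree sweep, one picture every 30 degrees.
vertical = shooting_angles(180)
# Assumed example: a non-axisymmetric medicine rotated through a full turn.
full_turn = shooting_angles(360)

print(vertical, len(vertical))  # 6 angles, matching the 6 pictures in the text
```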
The complete pill picture corresponding to an incomplete pill is identified using a template matching algorithm. Template matching is a method for finding a specific target in an image: every possible position in the image is traversed, the region at that position is compared with the template, and the match is considered successful when the similarity is high enough;
the algorithm comprises the following steps:
1) Determine the length and width, x and y, of the current picture;
2) Determine the length and width, w and h, of the template picture;
3) Starting from (0, 0) and proceeding in order to (x-w, y-h), calculate at each point (i, j) the similarity between the picture region (i, j) - (i+w, j+h) and the template;
4) Return the similarity at each point after the comparison is completed.
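The four steps above can be sketched in plain Python as follows. The source does not state which similarity measure is used, so the measure here (1 minus the mean absolute pixel difference, for grayscale values in 0..255) is an assumption, as are the function names `similarity` and `match_template`.

```python
def similarity(patch, template):
    """Assumed measure: 1 minus the mean absolute difference, for pixels in 0..255."""
    total = 0
    count = 0
    for patch_row, template_row in zip(patch, template):
        for p, t in zip(patch_row, template_row):
            total += abs(p - t)
            count += 1
    return 1.0 - total / (255.0 * count)

def match_template(image, template):
    """Slide the template over the image and return {(i, j): similarity}."""
    y, x = len(image), len(image[0])        # step 1: picture size
    h, w = len(template), len(template[0])  # step 2: template size
    scores = {}
    for j in range(y - h + 1):              # step 3: from (0, 0) to (x-w, y-h)
        for i in range(x - w + 1):
            patch = [row[i:i + w] for row in image[j:j + h]]
            scores[(i, j)] = similarity(patch, template)
    return scores                           # step 4: similarity at every point

# Tiny hypothetical example: the template sits at column 1, row 1 of the image.
image = [[0, 0, 0, 0],
         [0, 200, 100, 0],
         [0, 50, 255, 0],
         [0, 0, 0, 0]]
template = [[200, 100],
            [50, 255]]
scores = match_template(image, template)
best = max(scores, key=scores.get)
```

In practice a library routine would normally replace the nested loops; the loops are kept here only to mirror the numbered steps.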
In data augmentation, image translation, flipping, rotation, scaling, shear (miscut), cropping and combined transformation are, mathematically, performed about the image center point, in the following steps: 1) move the rotation point to the origin; 2) rotate about the origin; 3) move the rotation point back to its original position;
assume that the original coordinates of the image are (Figure SMS_1) and the coordinates after translation are (Figure SMS_2); the coordinate relationship before and after translation is as follows, where H is a transformation matrix:
Figure SMS_3
image translation: translation means that all pixels are shifted along the (Figure SMS_4) and (Figure SMS_5) directions; the mathematical matrix corresponding to the translation transformation is as follows, where (Figure SMS_6) and (Figure SMS_7) represent the distances moved in the (Figure SMS_8) directions:
Figure SMS_9
image flipping is image mirroring and includes horizontal flipping and vertical flipping; the transformation matrix for horizontal flipping is:
Figure SMS_10
the transformation matrix for vertical flipping is:
Figure SMS_11
image rotation rotates the image by any angle θ, by default about the center point of the image; the transformation matrix is:
Figure SMS_12
image scaling scales the current image by any ratio; the transformation matrix is as follows, where (Figure SMS_13) represents the scaling factors:
Figure SMS_14
image shear (miscut) is the non-perpendicular projection of a planar scene onto a projection plane; the transformation matrix is as follows, where (Figure SMS_15) are the shear angles in the x and y directions:
Figure SMS_16
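For orientation, the standard homogeneous-coordinate forms of the transformations described above can be written as follows. The symbols (x_0, y_0 for the original coordinates, x, y for the transformed coordinates, t_x, t_y, s_x, s_y, θ_x, θ_y) and the sign conventions are assumptions chosen for illustration and may differ from those in the referenced figures. The flips shown mirror about the axes through the origin; mirroring about the image center additionally requires a translation.

```latex
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
  = H \begin{pmatrix} x_0 \\ y_0 \\ 1 \end{pmatrix},
\qquad
T = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
F_h = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
F_v = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
Sh = \begin{pmatrix} 1 & \tan\theta_x & 0 \\ \tan\theta_y & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
```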
image cropping scales the picture to 1.1 times the original size and then crops the scaled image;
the combination transformation applies several augmentation modes together; given a translation matrix (Figure SMS_17), a rotation matrix (Figure SMS_18) and a scaling matrix (Figure SMS_19), the combined matrix M for combination transform one is:
Figure SMS_20
and the combined matrix N for combination transform two is:
Figure SMS_21
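The following sketch illustrates, in plain Python, how such 3×3 homogeneous matrices compose, including the three-step rotation about a chosen point (move to the origin, rotate, move back). The helper names and example values are hypothetical, and the two multiplication orders shown are illustrative only; the orders actually used for M and N are defined in the figures and are not reproduced here.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta_deg):
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def rotation_about(cx, cy, theta_deg):
    """Move the rotation point to the origin, rotate, then move it back."""
    return matmul(translation(cx, cy),
                  matmul(rotation(theta_deg), translation(-cx, -cy)))

def apply_h(h, x, y):
    """Map the point (x, y) through the homogeneous matrix h."""
    return (h[0][0] * x + h[0][1] * y + h[0][2],
            h[1][0] * x + h[1][1] * y + h[1][2])

# Hypothetical example values.
T, R, S = translation(10, 0), rotation(90), scaling(2, 2)
combo_a = matmul(T, matmul(R, S))  # scale, then rotate, then translate
combo_b = matmul(S, matmul(R, T))  # translate, then rotate, then scale
```

Because matrix multiplication is not commutative, `combo_a` and `combo_b` map the same point to different places, which is why the order of a combined transformation has to be fixed.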
On the framework of the lightweight MobileNetv2 basic model, the convolution kernel is enlarged from 3×3 to 5×5 or larger to increase the receptive field, and a Convolutional Block Attention Module (CBAM) attention mechanism is added, enhancing the feature extraction capability of the model.
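The effect of kernel size on the receptive field can be illustrated with the usual recurrence for stacked convolutions without dilation. This is a generic sketch, not the patent's network: the function name `receptive_field` and the layer lists are hypothetical examples.

```python
def receptive_field(layers):
    """Receptive-field side length for a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= stride             # stride multiplies the spacing between outputs
    return rf

single_3x3 = receptive_field([(3, 1)])
single_5x5 = receptive_field([(5, 1)])
two_3x3_stride2 = receptive_field([(3, 2), (3, 1)])
two_5x5_stride2 = receptive_field([(5, 2), (5, 1)])
```

Under these assumptions a single 5×5 layer sees a 5×5 input window instead of 3×3, and the gap widens as layers with stride are stacked.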
When a complete pill picture corresponding to the incomplete pill is identified by using a template matching algorithm, setting a similarity threshold to be a certain value above 90%, and when the matching similarity reaches the threshold, considering that the pill matching is successful.
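The threshold rule can be sketched as a small selection function. The function name `best_match`, the template names and the default threshold of 0.90 are illustrative assumptions; the text only requires a value above 90%.

```python
def best_match(scores_by_template, threshold=0.90):
    """Return the best-scoring complete-pill template name, or None if below threshold."""
    if not scores_by_template:
        return None
    name, score = max(scores_by_template.items(), key=lambda item: item[1])
    return name if score >= threshold else None

# Hypothetical similarity scores of one incomplete pill against two templates.
matched = best_match({"template_a": 0.95, "template_b": 0.70})
unmatched = best_match({"template_a": 0.85})
```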
A practical model embedded in small and medium-sized mobile equipment is designed by combining the multiple views with a two-dimensional model. The model code is integrated into the main control unit of the equipment, which is fitted with an LCD screen and a camera. The camera scans the medicines and captures medicine pictures, which are sent back to the main control unit and fed into the model for identification; the identified medicine name is displayed on the LCD screen together with the captured medicine picture for the operator to check.
The beneficial effects of the invention are as follows:
1. Medicines of similar appearance and color are effectively classified, medical staff are assisted in sorting medicines, and threats to patients' lives caused by medicine classification errors are reduced or even avoided.
2. A multi-view database is constructed to cope with complex situations in dispensing, such as pills lying on their sides or several pills stacked together, and is combined with different data augmentation methods to further broaden the angular coverage of the data set.
3. Data augmentation and transfer learning mitigate the overfitting caused by the small amount of data; combined with a lightweight network, they yield a practical model that can be embedded in mobile equipment and small and medium-sized equipment, which is convenient and practical and has wide application prospects.
4. Combining the multiple views with a two-dimensional model achieves, to a certain extent, the identification effect of a three-dimensional model and handles medicines placed randomly at multiple angles, while the number of model parameters is greatly reduced compared with a three-dimensional model, making conversion into actual products easy.
5. An image recovery channel is designed for incomplete pills in actual pill distribution, so that the incomplete pills can be effectively identified.
6. The invention introduces a mixed attention CBAM module mechanism of channel attention and space attention, improves the characteristic extraction capacity of the network, extracts pill characteristics to the maximum extent and assists in drug identification.
Drawings
FIG. 1 is a schematic flow chart of an identification method of the present invention.
Fig. 2 is a schematic flow chart of a medicine identification convolutional neural network constructed by the invention.
Fig. 3 is a schematic diagram of a correct proportion of the picture occupied by the medicine under the shooting rules of the present invention.
Fig. 4 is a schematic diagram of an incorrect proportion of the picture occupied by the medicine under the shooting rules of the present invention.
Fig. 5 is a schematic view of a circular medicine multi-view photographing view angle. In the figure, 1 is a camera, and 2 is a medicine.
Fig. 6 to 8 are schematic diagrams of the photographing angles of non-axisymmetric medicine planes.
Fig. 9-10 are schematic views of vertical plane shooting angles.
Fig. 11 is a schematic diagram of the principle of the template matching algorithm.
Fig. 12-14 are schematic diagrams of example centrally symmetric drug template matching.
Fig. 15-17 are schematic diagrams of non-centrosymmetric drug template matching examples.
Fig. 18 is a schematic diagram of an image augmentation scheme.
Fig. 19 is a MobileNetv2 model framework.
FIG. 20 is a schematic diagram of receptive fields.
FIG. 21 is a schematic diagram of a CBAM attention mechanism model.
Fig. 22 is a schematic diagram of the modification of the MobileNetv2 inverted residual block.
FIG. 23 shows the training loss versus the number of iterations in an embodiment of the present invention.
FIG. 24 shows the test loss versus the number of iterations in an embodiment of the present invention.
FIG. 25 shows the test accuracy versus the number of iterations in an embodiment of the present invention.
Detailed Description
Experimental environment
The hardware environment and the software environment used in this experimental example are shown in table 1:
TABLE 1
Figure SMS_22
The invention adopts the lightweight model MobileNet v2 as its basic framework. MobileNet v2 was proposed by Google in 2018; its innovations are Inverted Residuals and Linear Bottlenecks, which aim to improve accuracy while reducing memory usage. The overall model architecture is shown in Table 2.
TABLE 2
Figure SMS_23
Experimental data
The experimental data comprise 753 pictures of individual pills covering 93 pill categories. The pictures are high-definition JPG photographs taken according to the data set collection rules; the specific categories and numbers are shown in Table 3.
TABLE 3
Name Quantity
Chlorpheniramine maleate tablet (chlorphenamine 4 mg) 5
Metformin hydrochloride tablet (Guhua Zhi 1 g) 5
Folic acid 5mg 5
Bei Xi (acarbose 50 mg) 5
Red source up to 0.15g 5
Carbazochrome sodium sulfonate tablet (Lo Ye 5 mg) 4
Nifedipine tablets 10mg 5
Nifedipine controlled release tablet 30mg (Xinran) 5
Profen codeine sustained release tablet 5
Amlodipine besylate (pennies) 5mg 5
Rosuvastatin calcium tablet (clonidine) 10mg 4
Bai Liang capsule 0.5g 4
Finasteride tablet (quinine) 5mg 4
Irbesartan tablet (illida) 0.15g 5
Ibuprofen sustained release capsule 0.3g 4
Meng Tesi Lu Na tablet (Shunning) 10mg 5
Metformin sustained release tablet 0.5g 4
Nifedipine controlled release tablet (Baixin same) 30mg 4
Linagliptin tablet (Outangning) 5mg 5
Engliflozin tablet (European Tang Jing) 10mg 5
Calcitriol capsule (Luo Gaiquan) 0.25ug 4
Polyene phosphatidylcholine capsule (Yi Shanfu) 228mg 4
Quetiapine fumarate sustained release tablet 200mg 4
Irbesartan hydrochlorothiazide tablet 150mg 5
Tolvaptan tablet (Su Maika) 15mg 4
0.5g of adenosylmethionine butanesulfonate enteric-coated tablet (Simetate) 4
Eszopiclone tablet (itannin) 3mg 4
Perindopril tert-butylamine tablet (minoxidil) 4mg 4
Bacillus subtilis bigeminal live bacterium enteric capsule (metoprolol) 250mg 5
Valsartan amlodipine tablet (Baibang) 1 tablet 4
Dapagliflozin tablet (Andapong) 10mg 5
Fluoxetine hydrochloride capsule (omaren) 20mg 4
Fenofibrate capsule 200mg 4
Teprenone capsules (Shi Weishu) 50mg 4
Ivabradine hydrochloride tablet (colant) 5mg 4
Isosorbide mononitrate sustained release tablet (Emulation) 60mg 4
Dabigatran etexilate capsule (Tai Bi Quan) 150mg 4
Isosorbide mononitrate sustained release capsule 40mg 4
Olmesartan medoxomil hydrochlorothiazide tablet (compound ao tan) 1 tablet 4
Moxifloxacin hydrochloride tablet 0.4g 4
Alolol hydrochloride tablet (Almarol 10 mg) 4
Sertraline hydrochloride (vitamin D) 50mg 4
Doxycycline hydrochloride tablet 0.1g 4
Irbesartan hydrochlorothiazide tablet (clenbuterol) 1 tablet 4
Cefdinir capsule (tepu kang) 0.1g 4
Clopidogrel hydrogen sulfate tablet (Sichuang) 75mg 4
Sulindac tablet (Pivot force reaches) 0.1g 4
Lercanidipine hydrochloride tablet 10mg 4
Levofloxacin tablet (jidakang) 0.5g 4
Rivaroxaban tablet (Li Erban) 10mg 4
Acetylcysteine (Rich in application) 0.6g 5
Eucalyptus lemon pinus enteric soft capsule (Nao-cut) 0.3g 5
Voriconazole tablet (pinacol) 50mg 7
Methylprednisolone tablet (Mei Zhuo Le) 4mg 8
Silybin capsule (Water Lin Jia) 1 granule 5
Itraconazole capsule (spinornol) 1 granule 4
Bulleyaconitine A tablet (Sefomet) 0.4 mg 5
Furanolazine fumarate tablet (Wook) 20mg 6
Compound glycyrrhizin tablet (Mei Neng) 1 tablet 4
Sodium rosiglitazone tablet (tairo) 4mg 4
Rifampicin capsules 0.15g 5
Itraconazole capsule (Yi kang) 0.1g 5
Olopatadine hydrochloride tablet (ao Hui Da) 5mg 5
Sitagliptin metformin tablet (minoxidil) 50mg 8
Left thyroxine sodium tablet (Youjiale) 50ug 13
Nicergoline tablet (Le Xilin) 10mg 9
Dioseltamine tablet (Ge Tai) 0.45g 14
Rivaroxaban tablet (beritol) 10mg 13
Mycophenolate mofetil capsule (Ma Kexi) 0.25g 14
Cyclosporine soft capsule (Xinsaiping) 25mg 7
Bicalutamide tablet (Kangshide) 1 granule 11
Letrozole (Furui) 2.5mg 14
Sodium aescinate tablet (European style) 30mg 14
Estradiol valerate tablet (Bujiale) 1mg 3
Mesalazine enteric coated tablet (salafol) 0.5g 12
Paeonia total glycosides capsule (Pavlin) 0.3g 23
Prucarbide succinate tablet (Ralisheng) 2mg 8
Ondansetron hydrochloride tablet (European scallop) 4mg 22
Clarithromycin tablet (Clarithromycin) 1 granule 14
Sodium rabeprazole enteric-coated tablet (rebot) 10mg 7
Rebamipide tablet (moxibusida) 0.1g 13
Piwei ammonium bromide tablet (Naite's) 50mg 7
Omeprazole enteric-coated tablet (loxic) 10mg 26
Dydrogesterone tablet (Dafutong) 10mg 11
Esomeprazole enteric-coated tablet (anti-letter) 20mg 32
Gliclazide tablet (Meidakang) 80mg 25
Enteric-coated tablet of sulfasalazine (confidence) 0.25g 7
Levofloxacin tablet (colabi) 1 tablet 17
Glimepiride tablet (Limussel apple) 2mg 10
Lamivudine tablet 0.1g 19
Famciclovir tablet (cis-intravenous) 0.125g 17
Compound digestive enzyme capsule (Qian hong Yimei) 1 granule 22
Sitagliptin phosphate tablet (Mentha arvensis) 100mg 14
Amoxicillin and clavulanate potassium dispersible tablet (Junlqing) 0.22g 15
See fig. 1.
The pill identification model, referred to as the CNN medicine identification network, is a CNN model built on MobileNetv2.
Medicines are divided into complete pills and incomplete pills before entering the CNN medicine identification network. An incomplete pill is first template-matched and restored to an image of the complete pill, and then enters the recognition system. After the recognition system completes identification, the name and the picture of the medicine are output.
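The routing between the two channels can be sketched as a small dispatcher. All four names (`identify`, `is_incomplete`, `restore`, `classify`) are hypothetical placeholders for the recovery channel and the CNN described in the text, passed in as callables so that the sketch stays self-contained.

```python
def identify(image, is_incomplete, restore, classify):
    """Route a pill image through the optional recovery channel, then classify it."""
    if is_incomplete(image):
        restored = restore(image)   # template matching back to a complete pill picture
        if restored is None:
            return None             # assumed behavior: no template matched well enough
        image = restored
    return classify(image)          # CNN medicine identification network

# Stub usage with placeholder callables standing in for the real components.
label = identify("half_pill",
                 is_incomplete=lambda im: im == "half_pill",
                 restore=lambda im: "whole_pill",
                 classify=lambda im: "label:" + im)
```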
Fig. 2 is a schematic flow chart of a medicine identification convolutional neural network constructed by the invention.
In the construction of the multi-view database, the shot medicine picture is adopted as the basic data of the data set. In the medicine picture shooting method, shooting rules and shooting angles are specified, and the shooting angles comprise plane placement shooting angles, non-axisymmetric medicine shooting angles, vertical plane shooting angles and special condition shooting angles.
Fig. 2 shows that, after the multi-view database is built, data augmentation is performed for the MobileNetv2-based model. Data augmentation includes image flipping, image scaling, image shear (miscut), image rotation and image cropping. The network is first pre-trained on the public ImageNet data set to learn shallow semantic representations such as edge information; the best weights learned within the set number of ImageNet pre-training iterations are then used as the initial weights of the drug identification network, so that the model converges quickly and achieves a good identification effect. The identification steps are as follows:
1) Constructing a multi-view database;
2) Restoring the incomplete pill;
3) Data augmentation;
4) The model building is completed through pre-training transfer learning;
5) Outputting the medicine classification result and medicine name information.
The model building adopts a lightweight model MobileNetv2 as a basic framework, on the framework of the basic model, the size of a convolution kernel is enlarged, the receptive field of a convolution neural network is increased, and the feature extraction capability of the network is improved; in addition, in the inverted residual block of the model, a CBAM attention mechanism is introduced, and the CBAM attention mechanism can take into account two aspects of space (spatial) and channel (channel), so that compared with the mechanism focusing on only one aspect, the feature extraction capability of the model is further improved, the fine features of pills are extracted to the maximum extent, and the recognition of the pills in the later stage is facilitated. And outputting the processed data as required to obtain correct medicine classification result and medicine name.
Fig. 3 to 10 are schematic views of the shooting rules according to the present invention. Fig. 3 illustrates a correct medicine proportion and Fig. 4 an incorrect one. When medicine pictures are taken, the medicine itself occupies 50-60% of the whole picture and must not be too small; the background is a single uniform solid color that differs from the medicine color, and if a medicine has the same or a similar color as the background, a different solid-color background is chosen for that medicine; the pictures must be of high quality and in sharp focus.
See fig. 5-10.
For plane placement shooting of a centrally symmetric medicine, 3 pictures are taken, at angles such as 180 degrees, 60 degrees and 30 degrees relative to the plane of the camera. For non-axisymmetric medicine shooting, the medicine is rotated through several angles and a picture is taken at intervals of about 30 degrees; if the medicine is left-right symmetric, it is rotated through 180 degrees and one picture is taken every 30 degrees. For vertical-plane shooting, the medicine is rotated through 180 degrees and one picture is taken every 30 degrees, 6 pictures in total. Special medicines without a vertical face, such as capsules, are photographed only in the plane, rotated through 180 degrees or 360 degrees. If the front and back of the medicine differ, a group of pictures is taken of each side according to the plane shooting rules.
See fig. 12-17.
Fig. 12 to 14 are schematic diagrams of examples of centrally symmetric drug template matching.
Fig. 12 shows a complete pill, and fig. 13 shows an incomplete half of the pill of fig. 12; centrosymmetric drug template matching restores it to a complete pill consistent with fig. 12.
Fig. 15-17 are schematic diagrams of non-centrosymmetric drug template matching examples.
Fig. 15 shows a complete medicine, and fig. 16 shows an incomplete half of the pill of fig. 15; fig. 17 is a schematic picture of the complete pill restored from fig. 16 after center-symmetrical medicine template matching.
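The sliding-window search behind template matching (compare the template with every position from (0, 0) to (x-w, y-h), then accept the best position if its similarity reaches a threshold such as 90%) can be sketched in plain Python. The similarity measure used here, one minus the normalized mean absolute difference, is an assumption chosen for brevity, and the names `similarity`, `match_template` and `best_match` are hypothetical. Which picture plays the role of the template (the incomplete pill or the complete one) is also left open; the sketch only shows the search itself.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image, values 0..255, indexed [row][column]


def similarity(image: Image, template: Image, i: int, j: int) -> float:
    """Similarity in [0, 1] between the template and the image window whose
    top-left corner is at column i, row j (1.0 means identical)."""
    h, w = len(template), len(template[0])
    total = 0
    for dy in range(h):
        for dx in range(w):
            total += abs(image[j + dy][i + dx] - template[dy][dx])
    return 1.0 - total / (255.0 * w * h)


def match_template(image: Image, template: Image) -> List[Tuple[int, int, float]]:
    """Visit every position from (0, 0) to (x-w, y-h) and return
    (i, j, similarity) for each one."""
    y, x = len(image), len(image[0])
    h, w = len(template), len(template[0])
    results = []
    for j in range(y - h + 1):
        for i in range(x - w + 1):
            results.append((i, j, similarity(image, template, i, j)))
    return results


def best_match(image: Image, template: Image,
               threshold: float = 0.9) -> Optional[Tuple[int, int, float]]:
    """Best (i, j, score) if the score reaches the threshold, otherwise None."""
    best = max(match_template(image, template), key=lambda r: r[2])
    return best if best[2] >= threshold else None


image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 200, 100, 0],
         [0, 50, 250, 0]]
template = [[200, 100],
            [50, 250]]
print(best_match(image, template))
```

A production system would more likely use an optimized routine with a normalized correlation score; the nested loops here are only meant to make the traversal explicit.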
The data augmentation of this embodiment is shown in fig. 18.
Because medical data sets are typically small, the data preprocessing part of the model integrates data augmentation. Data augmentation, also called data enhancement, lets limited data produce value equivalent to more data without substantially adding new data. In this embodiment, if the resolution of the picture input to the network is 256×256, random cropping to 224×224 is used; one picture can then generate at most 32×32 different pictures, which expands the data size by approximately 1000 times. From a mathematical point of view, all data augmentation operations are performed about the image center point by default, in the following steps: 1) move the rotation point to the origin; 2) perform the rotation about the origin; 3) move the rotation point back to its original position.
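The cropping arithmetic can be checked with a short sketch. It assumes a square image stored as a list of rows; `crop_offsets` and `random_crop` are hypothetical names. Counting the offsets 0 to 32 inclusive gives 33 positions per axis, so the exact number of distinct windows is 1089, the same order of magnitude as the roughly 1000-fold expansion described.

```python
import random


def crop_offsets(size: int, crop: int) -> int:
    """Number of distinct top-left offsets along one axis (0 .. size - crop inclusive)."""
    return size - crop + 1


def random_crop(image, crop, rng):
    """Cut a crop x crop window at a random offset from a square image (list of rows)."""
    size = len(image)
    top = rng.randint(0, size - crop)   # randint includes both end points
    left = rng.randint(0, size - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]


per_axis = crop_offsets(256, 224)
print(per_axis, per_axis ** 2)

image = [[r * 256 + c for c in range(256)] for r in range(256)]
patch = random_crop(image, 224, random.Random(0))
print(len(patch), len(patch[0]))
```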
Assume that the original coordinates of the image are [Figure SMS_24] and that the coordinates after translation are [Figure SMS_25]. The coordinate relationship before and after translation is as follows, where H is the transformation matrix:
[Figure SMS_26]
Image translation: translation means that all pixels are translated in the [Figure SMS_27] and [Figure SMS_28] directions. The matrix corresponding to the translation transformation is as follows, where [Figure SMS_29] and [Figure SMS_30] represent the distances moved in the [Figure SMS_31] directions:
[Figure SMS_32]
Image flipping: image flipping, i.e. image mirroring, includes horizontal flipping and vertical flipping. The transformation matrix for horizontal flipping is:
[Figure SMS_33]
The transformation matrix for vertical flipping is:
[Figure SMS_34]
Image rotation: by default, the image is rotated by an arbitrary angle θ about its center point, and the transformation matrix is:
[Figure SMS_35]
Image scaling: image scaling refers to scaling the current image by an arbitrary ratio. Its transformation matrix is as follows, where [Figure SMS_36] represents the scaling size:
[Figure SMS_37]
Image shearing (miscut): image shearing refers to the non-perpendicular projection of a planar scene onto a projection plane. Its transformation matrix is as follows, where [Figure SMS_38] are the shear angles in the x and y directions:
[Figure SMS_39]
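The formula figures referenced above (Figure SMS_24 to Figure SMS_39) are images that are not reproduced in this text. For orientation only, the textbook homogeneous-coordinate forms of these transformations are given below. The symbols ($x_0, y_0$ for the original coordinates, $x, y$ for the transformed ones, $t_x, t_y$, $s_x, s_y$, $\varphi_x, \varphi_y$, and the center $(c_x, c_y)$) and the sign conventions are assumptions and may differ from those in the patent's figures.

```latex
\begin{aligned}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
  &= H \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}, \\[4pt]
H_{\text{translate}}
  &= \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix},
&
H_{\text{scale}}
  &= \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\[4pt]
H_{\text{hflip}}
  &= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
&
H_{\text{vflip}}
  &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\[4pt]
H_{\text{rotate}}(\theta)
  &= \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},
&
H_{\text{shear}}
  &= \begin{bmatrix} 1 & \tan\varphi_x & 0 \\ \tan\varphi_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\[4pt]
H_{\text{center}}(\theta)
  &= H_{\text{translate}}(c_x, c_y)\, H_{\text{rotate}}(\theta)\, H_{\text{translate}}(-c_x, -c_y).
\end{aligned}
```

The last line expresses the three-step rule (move the rotation point to the origin, rotate, move it back) as a single matrix product, read from right to left.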
Image cropping: consistent with common deep-learning practice, this embodiment first enlarges the image to 1.1 times its original size and then performs a random-scale cropping operation on the enlarged image.
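A minimal sketch of the enlarge-then-crop step follows. It assumes nearest-neighbour enlargement and a crop window equal to the original image size; the text only says that the crop is taken at random scale, so the fixed window size, like the function names, is an assumption made to keep the example short.

```python
import random


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of an image stored as a list of rows."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def enlarge_then_crop(image, scale=1.1, rng=None):
    """Enlarge the image by `scale`, then cut a random window of the original size."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    big_h, big_w = round(h * scale), round(w * scale)
    big = resize_nearest(image, big_h, big_w)
    top = rng.randint(0, big_h - h)     # random vertical offset inside the enlarged image
    left = rng.randint(0, big_w - w)    # random horizontal offset
    return [row[left:left + w] for row in big[top:top + h]]


img = [[r * 20 + c for c in range(20)] for r in range(20)]
out = enlarge_then_crop(img, 1.1, random.Random(1))
print(len(out), len(out[0]))
```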
Fig. 19 shows the MobileNetv2 model framework.
The invention adopts the lightweight model MobileNetv2 as its basic architecture. MobileNetv2 was proposed by Google in 2018, and its innovations are inverted residuals and linear bottlenecks, which aim to improve accuracy while reducing memory usage.
In this model, the number of channels increases with depth while the spatial dimensions decrease correspondingly. Overall, however, the tensors remain relatively small because of the bottleneck layers that form the connections between blocks.
To improve the feature extraction capability of the model, the method enlarges the convolution kernel on the framework of the basic model, which increases the receptive field of the convolutional neural network and improves the feature extraction capability of the network. This embodiment increases the receptive field from 3×3 to 5×5 or more, as shown in fig. 20.
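The effect of kernel size on the receptive field can be illustrated with the usual recurrence for a stack of convolutions: each layer adds (kernel - 1) times the product of the strides of the layers before it. The layer stacks below are hypothetical examples, not the layer configuration of the patented network.

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    layers: list of (kernel_size, stride) pairs ordered from input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth contributed by this layer
        jump *= stride              # spacing between neighbouring outputs, in input pixels
    return rf


print(receptive_field([(3, 1)]), receptive_field([(5, 1)]))          # single layer
print(receptive_field([(3, 1)] * 3), receptive_field([(5, 1)] * 3))  # three stacked layers
```

For a single layer the receptive field equals the kernel size, and the gap between 3×3 and 5×5 kernels widens as layers are stacked.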
A CBAM attention mechanism is added to further enhance the feature extraction capability, as shown in fig. 21.
The Convolutional Block Attention Module (CBAM) is an attention module for convolutional blocks that combines spatial and channel attention. Pill identification is a fine-grained recognition problem, so it requires a network with relatively strong feature extraction capability to distinguish subtle differences between pills. The attention module in CBAM lets the network focus on the regions that matter for pill identification without over-learning the features of irrelevant regions such as background interference, so that the model extracts pill features to the maximum extent and identifies the nuances between pills. At the same time, CBAM is a lightweight, general-purpose module: its overhead is negligible, it can be integrated seamlessly into any CNN architecture, and it can be trained end to end together with the base CNN. Adding this lightweight module improves the overall performance of the model without greatly increasing its size, which benefits later embedding in mobile devices.
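The two stages of CBAM, a per-channel gate followed by a per-pixel gate, can be sketched in dependency-free Python. This is an illustrative toy, not the patented network: the weights are hand-picked rather than learned, the spatial kernel is 3×3 to keep the example small (the original CBAM paper uses 7×7), and a real model would use a deep-learning framework.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, w1, w2):
    """Shared two-layer MLP without biases: w2 @ ReLU(w1 @ vec)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """One gate in (0, 1) per channel from average- and max-pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]


def spatial_attention(feat, kernel):
    """One gate per pixel: zero-padded convolution over the channel-wise average
    and max maps, then a sigmoid. kernel[0] weights the average map and
    kernel[1] the max map."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[n][y][x] for n in range(c)) / c for x in range(w)] for y in range(h)]
    mx = [[max(feat[n][y][x] for n in range(c)) for x in range(w)] for y in range(h)]
    ksize = len(kernel[0])
    pad = ksize // 2
    gate = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for m, plane in enumerate((avg, mx)):
                for dy in range(ksize):
                    for dx in range(ksize):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += kernel[m][dy][dx] * plane[yy][xx]
            row.append(sigmoid(acc))
        gate.append(row)
    return gate


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    cg = channel_attention(feat, w1, w2)
    refined = [[[v * g for v in row] for row in ch] for ch, g in zip(feat, cg)]
    sg = spatial_attention(refined, kernel)
    return [[[v * sg[y][x] for x, v in enumerate(row)] for y, row in enumerate(ch)]
            for ch in refined]


feat = [[[4.0, 8.0], [0.0, -4.0]],      # channel 0, 2 x 2
        [[1.0, 2.0], [3.0, 4.0]]]       # channel 1, 2 x 2
w1 = [[0.5, 0.5]]                        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]                     # 1 hidden unit -> 2 channels
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

The output has the same shape as the input, which is what allows the module to be dropped into an existing block such as the inverted residual block.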
Fig. 22 shows the modified MobileNetv2 inverted residual block.
As the figure shows, the channels are first expanded and then reduced: a 1×1 "expansion" layer (pointwise convolution, PW) is added before the depthwise convolution (DW) to increase the number of channels and obtain more features, namely: "expansion" (PW) → "convolutional feature extraction" (DW) → "compression" (PW).
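To see what the expansion (PW) → depthwise (DW) → compression (PW) structure costs, the weight counts of the three convolutions can be tallied. The sketch ignores biases, batch normalization and the attention module, and the expansion factor of 6 and the 32-channel block are assumptions for illustration rather than values stated here.

```python
def inverted_residual_params(c_in, c_out, expansion=6, kernel=3):
    """Weight counts of the three convolutions in an inverted residual block."""
    hidden = c_in * expansion
    expand = c_in * hidden                 # 1x1 pointwise "expansion"
    depthwise = kernel * kernel * hidden   # one k x k filter per expanded channel
    project = hidden * c_out               # 1x1 pointwise "compression"
    return {"expand": expand, "depthwise": depthwise, "project": project,
            "total": expand + depthwise + project}


p3 = inverted_residual_params(32, 32, kernel=3)
p5 = inverted_residual_params(32, 32, kernel=5)
print(p3["total"], p5["total"])
```

Because the depthwise layer holds only one filter per channel, moving from a 3×3 to a 5×5 kernel adds weights to that layer alone, which keeps the enlarged block comparatively light.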
Combination transformation: data augmentation in deep learning generally combines several augmentation modes, which involves matrix multiplication. By the rules of matrix multiplication, different orders of combination give different results, i.e. in linear algebra [Figure SMS_40], special cases excepted. For illustration, assume a given translation transformation matrix [Figure SMS_41], rotation matrix [Figure SMS_42] and scaling matrix [Figure SMS_43]. This embodiment presents two different combination transformations.
For combination transformation one, the combined matrix is as follows:
[Figure SMS_44]
For combination transformation two, the combined matrix is as follows:
[Figure SMS_45]
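That the order of combination matters can be checked numerically. The sketch below builds 3×3 homogeneous matrices in plain Python and compares two orderings; the particular orders chosen for `M` and `N`, the parameter values, and the counter-clockwise rotation convention are assumptions for illustration, since the patent's own combined matrices are given only in the figures.

```python
import math


def matmul(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def apply(h, x, y):
    """Transform the point (x, y) with the homogeneous matrix h."""
    return (h[0][0] * x + h[0][1] * y + h[0][2],
            h[1][0] * x + h[1][1] * y + h[1][2])


def rotate_about(cx, cy, theta):
    """Move the center to the origin, rotate, and move the center back."""
    return matmul(translation(cx, cy), matmul(rotation(theta), translation(-cx, -cy)))


T, R, S = translation(10, 0), rotation(math.pi / 2), scaling(2, 2)
M = matmul(T, matmul(R, S))   # scale first, then rotate, then translate
N = matmul(S, matmul(R, T))   # translate first, then rotate, then scale

print([round(v, 6) for v in apply(M, 1.0, 0.0)])
print([round(v, 6) for v in apply(N, 1.0, 0.0)])
```

The same point lands at roughly (10, 2) under `M` and roughly (0, 22) under `N`, so the two orderings are genuinely different augmentations.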
This data augmentation not only prevents the model from overfitting but also compensates for image data at certain angles that were not acquired because the acquisition method was not comprehensive enough when the database was built, further completing the multi-dimensional database.
In this example, MobileNetv2 was used as the basic framework for pill recognition, with the number of epochs set to 500, batch_size set to 16, and the learning rate set to 0.001. If the training accuracy is not reduced within 5 epochs, the learning rate is reduced by 10%.
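A plateau-based learning-rate rule of this kind can be sketched as follows. The wording above leaves two points open, so the sketch makes explicit assumptions: the monitored quantity is an accuracy (higher is better) that has stopped improving, and "reduced by 10%" means multiplying the rate by 0.9. `PlateauScheduler` is a hypothetical class written for this example, not a library API.

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` when the monitored metric has not
    improved for `patience` consecutive epochs."""

    def __init__(self, lr=0.001, factor=0.9, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0   # epochs since the last improvement

    def step(self, metric):
        """Record one epoch's metric and return the learning rate to use next."""
        if metric > self.best:
            self.best = metric
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr


sched = PlateauScheduler()
history = [sched.step(acc) for acc in [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]]
print(history)
```

Deep-learning frameworks ship equivalent schedulers; writing it out by hand only makes the patience and factor parameters explicit.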
The training effect and accuracy are shown in fig. 23. As can be seen from the figure, the loss on the training set gradually decreases as the number of iterations increases and converges to a certain range.
The test loss against the number of iterations is shown in fig. 24. As can be seen from the figure, the loss on the test set gradually decreases as the number of iterations increases and converges to a certain range.
The test accuracy against the number of iterations is shown in fig. 25. As can be seen from the figure, the accuracy on the test set gradually rises as the number of iterations increases and finally stabilizes within a certain range.

Claims (7)

1. A deep learning oral pill identification method based on multi-view and data expansion is characterized in that: constructing a database by adopting a multi-view and data augmentation method, and perfecting a data set at multiple angles; designing a practical model embedded in small and medium-sized mobile equipment by using a lightweight network; combining the multiple views with the two-dimensional model, completing the construction of a practical model after transfer learning, correctly identifying the oral pills and issuing the oral pills to corresponding patients through mobile equipment and small and medium-sized equipment; establishing an incomplete oral pill identification channel, recovering the incomplete pill, and enhancing the practicability of the model;
the method comprises the following specific steps of:
1) Constructing a multi-view database;
2) Restoring the incomplete pill;
3) Data augmentation;
4) Building a convolutional neural network pill recognition model through pre-training transfer learning;
5) Outputting medicine classification results and medicine name information;
in the construction of the multi-view database, a shot medicine picture is taken as basic data of a data set, and in the medicine picture shooting method, shooting rules and shooting angles are specified, wherein the shooting angles comprise plane placement shooting angles, non-axisymmetric medicine shooting angles, vertical plane shooting angles and special condition shooting angles;
the recovery of the incomplete pills is to identify the complete pill picture corresponding to the incomplete pill by using a template matching algorithm aiming at half pills or even 1/4 metered incomplete pills appearing in the actual situation, and send the identified corresponding complete pill into a built convolutional neural network pill identification model;
the data augmentation is to crop medicine pictures according to resolution, without substantially adding new data, so as to generate different picture data and enlarge the amount of data; in operation, from a mathematical perspective, image translation, image flipping, image rotation, image scaling, image shearing (miscut), image cropping and combination transformation are by default performed about the image center point;
the pre-training transfer learning is to first perform pre-training on the ImageNet public data set and learn shallow semantic representations, such as edge information, from the ImageNet public data set; finally, the optimal weights learned within the number of iterations set for the ImageNet pre-training are used to initialize the convolutional neural network pill recognition model, so that the model converges quickly and achieves a better recognition effect;
the model building adopts a lightweight model MobileNetv2 as a basic framework, and on the framework of the basic model, the size of a convolution kernel is enlarged, and the receptive field of the convolution neural network is increased; meanwhile, a mixed attention CBAM module mechanism of channel attention and space attention is introduced, the feature extraction capacity of a network is improved, pill features are extracted to the maximum extent, drug identification is assisted, processed data are output, and a correct drug classification result and a drug name are output according to requirements.
2. The multiple view and data expansion based deep learning oral pill identification method of claim 1, wherein: in the medicine picture shooting rules, the medicine itself accounts for 50-60% of the whole picture and cannot be too small; the shooting background is a uniform solid color that differs from the color of the medicine, and if a medicine has the same color as the background or a similar one, a different solid-color background is chosen for that medicine individually; shots must be of high quality and clearly focused;
when the shooting angle is plane placement shooting, 3 pictures are taken of a centrally symmetric medicine, at 180 degrees, 60 degrees and 30 degrees to the plane of the camera respectively;
when the shooting angle is non-axisymmetric medicine shooting, the medicine is rotated and shot at several angles, at intervals of about 30 degrees; if the medicine is bilaterally symmetric, it is rotated through 180 degrees and one picture is taken every 30 degrees;
when the shooting angle is vertical shooting, the medicine is rotated through 180 degrees and one picture is taken every 30 degrees, 6 pictures in total;
for special medicines without a vertical face, such as capsules, pictures are taken only in the plane placement, rotated through 180 degrees or 360 degrees; if the front and back of the medicine differ, a set of pictures must be taken of each side according to the plane shooting rules.
3. The multiple view and data expansion based deep learning oral pill identification method of claim 1, wherein: a template matching algorithm is used to identify the complete pill picture corresponding to an incomplete pill; template matching is a method of searching for a specific target in an image, which traverses every possible position in the image and compares whether each part is similar to the template, the match being considered successful when the similarity is high enough;
the algorithm comprises the following steps:
1) Determining the length and width x and y of the current picture;
2) Determining the length and width w and h of the template picture;
3) Starting from (0, 0) as the initial point and proceeding sequentially to (x-w, y-h), calculating at each point (i, j) the similarity between the picture region (i, j)-(i+w, j+h) and the template;
4) And returning the similarity of each point after the comparison is completed.
4. The multiple view and data expansion based deep learning oral pill identification method of claim 1, wherein: in the data augmentation, from a mathematical point of view, image translation, image flipping, image rotation, image scaling, image shearing (miscut), image cropping and combination transformation are performed about the center point, in the following steps: 1) moving the rotation point to the origin; 2) performing the rotation about the origin; 3) moving the rotation point back to its original position;
assume that the original coordinates of the image are [Figure QLYQS_1] and that the coordinates after translation are [Figure QLYQS_2]; the coordinate relationship before and after translation is as follows, where H is the transformation matrix:
[Figure QLYQS_3]
image translation: translation means that all pixels are translated in the [Figure QLYQS_4] and [Figure QLYQS_5] directions; the matrix corresponding to the translation transformation is as follows, where [Figure QLYQS_6] and [Figure QLYQS_7] represent the distances moved in the [Figure QLYQS_8] directions:
[Figure QLYQS_9]
image flipping, i.e. image mirroring, includes horizontal flipping and vertical flipping; the transformation matrix for horizontal flipping is:
[Figure QLYQS_10]
the transformation matrix for vertical flipping is:
[Figure QLYQS_11]
image rotation is by default a rotation by an arbitrary angle θ about the center point of the image, and the transformation matrix is:
[Figure QLYQS_12]
image scaling refers to scaling the current image by an arbitrary ratio; its transformation matrix is as follows, where [Figure QLYQS_13] represents the scaling size:
[Figure QLYQS_14]
image shearing (miscut) refers to the non-perpendicular projection of a planar scene onto a projection plane; its transformation matrix is as follows, where [Figure QLYQS_15] are the shear angles in the x and y directions:
[Figure QLYQS_16]
the image cropping is to scale the picture to 1.1 times the original picture and then perform the cropping operation on the scaled image;
the combination transformation adopts a combination of a plurality of augmentation modes; assuming a given translation transformation matrix [Figure QLYQS_17], rotation matrix [Figure QLYQS_18] and scaling matrix [Figure QLYQS_19], for combination transformation one, the combined matrix M is as follows:
[Figure QLYQS_20]
for combination transformation two, the combined matrix N is as follows:
[Figure QLYQS_21]
5. The multiple view and data expansion based deep learning oral pill identification method of claim 1, wherein: on the framework of the lightweight MobileNetv2 basic model, the convolution kernel is enlarged as follows: the receptive field is increased from 3×3 to 5×5; meanwhile, a CBAM attention mechanism is added in the inverted residual block, so that fine features of pills are extracted to the maximum extent through channel attention and spatial attention, and the feature extraction capability of the model is enhanced.
6. The multiple view and data expansion based deep learning oral pill identification method of claim 1, wherein: the method comprises the steps of designing a practical model embedded in small and medium-sized mobile equipment, combining multiple views with a two-dimensional model, integrating codes of the model into a main control end of the small and medium-sized mobile equipment, arranging an LCD screen and a camera on the equipment, scanning medicines through the camera, collecting medicine pictures, transmitting the medicine pictures back to the main control end, and sending the medicine pictures into the model for identification; the identified medicine name and the acquired medicine picture are correspondingly displayed on an LCD screen for an operator to check.
7. A multiple view and data extension based deep learning oral pill recognition method according to claim 3, wherein: when a complete pill picture corresponding to the incomplete pill is identified by using a template matching algorithm, setting a similarity threshold to be a certain value above 90%, and when the matching similarity reaches the threshold, considering that the pill matching is successful.
CN202210242282.0A 2022-03-11 2022-03-11 Deep learning oral pill identification method based on multi-view and data expansion Active CN114821572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210242282.0A CN114821572B (en) 2022-03-11 2022-03-11 Deep learning oral pill identification method based on multi-view and data expansion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210242282.0A CN114821572B (en) 2022-03-11 2022-03-11 Deep learning oral pill identification method based on multi-view and data expansion

Publications (2)

Publication Number Publication Date
CN114821572A CN114821572A (en) 2022-07-29
CN114821572B true CN114821572B (en) 2023-04-21

Family

ID=82529659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210242282.0A Active CN114821572B (en) 2022-03-11 2022-03-11 Deep learning oral pill identification method based on multi-view and data expansion

Country Status (1)

Country Link
CN (1) CN114821572B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598130A (en) * 2020-04-08 2020-08-28 天津大学 Traditional Chinese medicine identification method based on multi-view convolutional neural network
CN113989623A (en) * 2021-12-03 2022-01-28 浙江中医药大学 Automatic identification method for traditional Chinese medicine decoction piece image

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545150A (en) * 2017-10-13 2018-01-05 张晨 Medicine identifying system and its recognition methods based on deep learning
CN109190643A (en) * 2018-09-14 2019-01-11 华东交通大学 Based on the recognition methods of convolutional neural networks Chinese medicine and electronic equipment
CN111914902B (en) * 2020-07-08 2024-03-26 南京航空航天大学 Traditional Chinese medicine identification and surface defect detection method based on deep neural network
CN112927753A (en) * 2021-02-22 2021-06-08 中南大学 Method for identifying interface hot spot residues of protein and RNA (ribonucleic acid) compound based on transfer learning
CN113449776B (en) * 2021-06-04 2023-07-25 中南民族大学 Deep learning-based Chinese herbal medicine identification method, device and storage medium

Also Published As

Publication number Publication date
CN114821572A (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant