CN114821572B

CN114821572B - Deep learning oral pill identification method based on multi-view and data expansion

Info

Publication number: CN114821572B
Application number: CN202210242282.0A
Authority: CN
Inventors: 向军莲; 张俊然; 李南欣; 谢贤凯; 刘云飞; 李杨; 黄玲; 唐良友
Original assignee: Deyang Construction Investment Medical Co., Ltd.; Sichuan University; People's Hospital of Deyang City
Current assignee: Deyang Construction Investment Medical Co., Ltd.; Sichuan University; People's Hospital of Deyang City
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2023-04-21
Anticipated expiration: 2042-03-11
Also published as: CN114821572A

Abstract

A deep learning method for identifying oral pills based on multi-view imaging and data augmentation. A database is constructed with multi-view capture and data augmentation so that the data set covers many angles. A lightweight network is used to design a practical model that can be embedded in mobile devices and small or medium-sized equipment. Multi-view images are combined with a two-dimensional model, and the practical model is completed after transfer learning. In addition, a recognition channel for incomplete oral pills is established: an incomplete pill is template-matched, restored to a picture of the complete pill, and then identified. The method effectively classifies medicines with similar appearance and color, helps medical staff sort medicines, and reduces or even avoids threats to patient safety caused by medicine classification errors. Multi-view database construction, data augmentation and transfer learning counter the over-fitting caused by the small amount of data. The lightweight model MobileNetv2 serves as the base framework and an attention module is introduced, so the model has far fewer parameters than a three-dimensional model and is convenient, practical and easy to deploy widely.

Description

Deep learning oral pill identification method based on multi-view and data expansion

Technical Field

The invention belongs to the field of clinical medicine and nursing and relates to a method for correctly identifying medicines.

Background

Hospitals carry the demanding task of healing the sick and saving lives. Their workload is heavy and their staff are busy, and medicines that have been removed from their outer packaging are hard to identify and easy to dispense incorrectly.

Dispensing medicines in hospitals is prone to human error. Administering drugs to patients in a hospital or care ward is a manual procedure: 1) the correct medicine and the correct number of pills are placed into a plastic cup; 2) each set of pills is delivered to the corresponding patient; 3) the pills are administered to the patient at the correct times (e.g., no more than 4 hours apart). Every step of this process is highly susceptible to human error, so absolute quality assurance is difficult to achieve.

Medication errors can also occur in a pharmacy. A filled prescription may be labelled with the wrong dosage or pill quantity, or with the wrong medication. Pharmacists are often overworked and may confuse pills with similar names or appearances, so they may dispense the wrong medication, quantity or dosage. Such errors can cause serious injury or even death.

With the development of deep learning in image recognition, medical image recognition with deep learning models has become a very active research area. Computer vision technology has matured and achieved unprecedented results in image processing, speech recognition and related tasks. Compared with manual judgment based on subjective experience, a deep learning image recognition method that helps medical staff classify medicines can prevent patient deaths caused by human error. Li Shuai et al. used traditional machine learning to extract features of capsule medicines and automate pharmacy tasks. Shi Huayu et al. used deep learning to extract features of medicine packages. Their deep-learning medicine package recognition system achieved preliminary results, classifying 500 medicines with a verification accuracy of 96.4% during training. Zhang Zhenjiang et al. used computer vision to automatically identify the type and quantity of medicines, demonstrating that deep learning can feasibly assist outpatient dispensing. Their main technical approach has four parts:
- acquire images of the external medicine packaging;
- generate a training image set by preprocessing;
- train a 7-layer convolutional neural network (3C 3P 1F: three convolutional, three pooling and one fully connected layer);
- deploy a medicine image recognition service with a RESTful interface.
In practice, however, inpatient dispensing removes the outer packaging, which would otherwise serve as identification information. Single pills, multiple pills and even fractional pills (1/2, 1/4, etc.) are then repackaged and dispensed at fixed times and in fixed quantities according to the doctor's prescription list. The prior art therefore does not fully meet the need for identifying such medicines.

Disclosure of Invention

The object of the invention is to provide a method for quickly and accurately identifying individual pills and pill combinations automatically, so that the pills dispensed to each patient can be correctly verified. Correct identification is difficult for several reasons:
- Repackaged pills differ little in size, color and shape, and their fine features are hard to distinguish.
- Pills may be stacked or lie on their side during dispensing, which a model trained on pictures from a single angle cannot handle.
- Doctors may increase or decrease the dose according to the patient's clinical condition, age and physique, so "incomplete" pills such as half or even quarter tablets occur in practice.

To identify all pills accurately, including incomplete ones, the invention collects pill images with multi-angle, multi-view photography to cover cases such as side placement and stacking. It also designs a pill image recovery channel to identify incomplete pills. Later applications will mostly run on small and medium-sized equipment, such as medicine delivery robots and automatic pharmacy dispensing systems. A lightweight neural network is therefore used, which eases conversion into products.

The object of the invention is achieved as follows. A deep learning oral pill identification method based on multi-view imaging and data augmentation is characterized in that:
- a database is constructed with multi-view capture and data augmentation so that the data set covers many angles;
- a lightweight network is used to design a practical model that can be embedded in mobile devices and small or medium-sized equipment;
- multi-view images are combined with a two-dimensional model, and the practical model is completed after transfer learning, so that oral pills are correctly identified and dispensed to the corresponding patients through mobile and small or medium-sized equipment;
- a recognition channel for incomplete oral pills is established to restore incomplete pills, enhancing the practicality of the model.

The method comprises the following steps:

1) Constructing a multi-view database;

2) Restoring the incomplete pill;

3) Data augmentation;

4) Building a convolutional neural network pill recognition model through pre-training transfer learning;

5) Outputting medicine classification results and medicine name information;

In the construction of the multi-view database, photographed medicine pictures form the basic data of the data set. The photographing method specifies shooting rules and shooting angles. The shooting angles comprise flat-placement angles, angles for non-axisymmetric medicines, vertical-face angles and angles for special cases.

The recovery of incomplete pills targets the half and even quarter pills that occur in practice. A template matching algorithm identifies the complete pill picture corresponding to the incomplete pill, and the identified complete pill is fed into the convolutional neural network drug identification model.

Data augmentation generates new picture data from the existing medicine pictures, for example by cropping them at a given resolution, so the amount of data is expanded without collecting substantially more. Mathematically, every augmentation is performed about the image center point by default. The augmentations are image translation, flipping, rotation, scaling, shearing, cropping and combined transformations.

Transfer learning first pre-trains the network on the public ImageNet data set so that it learns shallow semantic representations such as edge information. The best weights learned within the number of iterations set for ImageNet pre-training then become the initial weights of the drug identification network. The model therefore converges quickly and achieves better recognition.
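The weight-initialisation step can be sketched without any framework as follows. This is an illustration under stated assumptions, not the authors' code. Parameters are held in a plain dictionary that maps a hypothetical layer name to a (shape, values) pair. Tensors whose name and shape match the ImageNet model are copied. Everything else is re-initialised; this is typically the final classifier, resized from ImageNet's 1000 classes to the number of pill classes.

```python
import random

def init_from_pretrained(pretrained, target_shapes, seed=0):
    """Build initial weights for the pill network from ImageNet-pretrained ones.

    pretrained:    name -> (shape tuple, flat list of values)   (hypothetical layout)
    target_shapes: name -> shape tuple of the pill network
    Parameters whose name and shape match are copied; the rest (e.g. a classifier
    resized from 1000 ImageNet classes to the pill classes) are re-initialised.
    """
    rng = random.Random(seed)
    weights, reused, fresh = {}, [], []
    for name, shape in target_shapes.items():
        src = pretrained.get(name)
        if src is not None and src[0] == shape:
            weights[name] = list(src[1])          # transfer the pretrained values
            reused.append(name)
        else:
            size = 1
            for dim in shape:
                size *= dim
            # small random values for layers that must be learned from scratch
            weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
            fresh.append(name)
    return weights, reused, fresh
```

In a real framework the same effect is usually obtained by loading the pretrained model and replacing only its final classification layer.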

The model uses the lightweight MobileNetv2 as its base framework. On this base model, the convolution kernel size is enlarged to increase the receptive field of the convolutional neural network. A hybrid attention module combining channel attention and spatial attention is also introduced. It improves the network's feature extraction, extracts pill features as fully as possible, and assists medicine identification.

Under the photographing rules, the medicine itself should occupy 50-60% of the whole picture and must not be too small. The background is a uniform solid color that differs from the medicine color. If a medicine's color is the same as or similar to the background, another solid-color background is chosen for that medicine. Pictures must be of high quality and sharply focused.
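One way to check the 50-60% framing rule automatically is to measure the pill's share of the frame from a binary mask. The solid, contrasting background makes such a mask easy to obtain by thresholding. This sketch assumes the rule refers to area; the mask format and function names are illustrative only.

```python
def pill_fill_ratio(mask):
    """Fraction of pixels that belong to the pill in a binary mask (1 = pill, 0 = background)."""
    total = sum(len(row) for row in mask)
    pill = sum(sum(row) for row in mask)
    return pill / total

def meets_framing_rule(mask, low=0.50, high=0.60):
    """True if the pill covers 50-60% of the frame (area-based reading of the rule)."""
    return low <= pill_fill_ratio(mask) <= high
```

A picture that fails the check would simply be retaken with the camera moved closer or farther away.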

For flat placement of a centrally symmetric medicine, 3 pictures are taken, at 180°, 60° and 30° relative to the camera plane respectively;

for a non-axisymmetric medicine, the medicine is rotated and photographed at several angles, at intervals of about 30°;

for vertical-face shooting, the medicine is rotated through 180° and one picture is taken every 30°, giving 6 pictures in total;

special medicines without a vertical face, such as capsules, are photographed only in flat placement while being rotated through 180° or 360°; if the front and back of a medicine differ, a group of pictures is taken of each side according to the flat-placement rules.
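The shooting rules can be read as a per-medicine capture plan. The sketch below encodes them under several assumptions:
- medicines fall into four hypothetical kinds;
- a mirror-symmetric medicine is rotated through 180° and a non-symmetric one through 360°, both in 30° steps;
- every kind except capsules also receives the six vertical-face shots;
- the three flat angles for centrally symmetric pills are taken verbatim from the rule.

```python
def rotation_angles(span_deg, step_deg=30):
    """Turntable angles 0, step, 2*step, ... below span_deg (180 -> 6 shots)."""
    return list(range(0, span_deg, step_deg))

def capture_plan(kind, faces_differ=False, capsule_span=180):
    """Return a list of (side, view, angle_deg) shots for one medicine.

    kind: "centrosymmetric", "mirror_symmetric", "non_axisymmetric" or "capsule"
    (hypothetical categories). Which kinds also get vertical-face shots is an
    assumption of this sketch.
    """
    if kind == "centrosymmetric":
        flat = [("flat", a) for a in (180, 60, 30)]   # the three angles named in the rule
    elif kind == "mirror_symmetric":
        flat = [("flat", a) for a in rotation_angles(180)]
    elif kind == "non_axisymmetric":
        flat = [("flat", a) for a in rotation_angles(360)]
    elif kind == "capsule":
        flat = [("flat", a) for a in rotation_angles(capsule_span)]
    else:
        raise ValueError(f"unknown medicine kind: {kind}")
    # repeat the flat group for the back side when the two faces differ
    sides = ("front", "back") if faces_differ else ("front",)
    plan = [(side, view, angle) for side in sides for view, angle in flat]
    if kind != "capsule":                  # capsules have no vertical face
        plan += [("edge", "vertical", a) for a in rotation_angles(180)]
    return plan
```

For example, a round tablet with identical faces gets 3 flat shots plus 6 vertical shots, 9 pictures in total.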

A template matching algorithm identifies the complete pill picture corresponding to an incomplete pill. Template matching searches an image for a specific target. It traverses every possible position in the image and checks whether the region at that position is similar to the template. A match is declared when the similarity is high enough.

The algorithm comprises the following steps:

1) Determine the length x and width y of the current picture;

2) Determine the length w and width h of the template picture;

3) Starting from (0, 0) and proceeding in order to (x - w, y - h), compute at each point (i, j) the similarity between the template and the region of the picture spanning (i, j) to (i + w, j + h);

4) After all comparisons are complete, return the similarity at each point.
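The four steps above can be sketched in Python as follows. The text does not name a similarity measure. This sketch therefore assumes zero-mean normalised cross-correlation, which scores in [-1, 1] with 1 for a perfect match. Pictures are assumed to be grayscale lists of rows, and the function names are illustrative. OpenCV provides an optimised equivalent: `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method.

```python
import math

def ncc(patch, template):
    """Zero-mean normalised cross-correlation between two equal-size 2-D arrays."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den else 0.0       # flat regions get a neutral score

def match_template(image, template):
    """Slide the template over the image and return {(i, j): similarity} for
    every top-left corner, i in 0..x-w (columns) and j in 0..y-h (rows)."""
    y, x = len(image), len(image[0])          # step 1: picture size
    h, w = len(template), len(template[0])    # step 2: template size
    scores = {}
    for j in range(y - h + 1):                # step 3: visit every position
        for i in range(x - w + 1):
            patch = [row[i:i + w] for row in image[j:j + h]]
            scores[(i, j)] = ncc(patch, template)
    return scores                             # step 4: similarity at each point
```

The best match is the position with the largest score, e.g. `max(scores, key=scores.get)`.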

In data augmentation, all the transformations (translation, flipping, rotation, scaling, shearing, cropping and combined transformations) are, mathematically, performed about the image center point. This takes three steps:
1) move the rotation point to the origin;
2) perform the rotation about the origin;
3) move the rotation point back to its original position.
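The three-step procedure amounts to multiplying homogeneous 3×3 matrices: move the center to the origin, transform, then move it back. The sketch below assumes that matrices act on column vectors [x, y, 1], and its helper names are hypothetical. The rightmost matrix is applied first, so the product is T(c)·H·T(-c).

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotation(theta):
    """Rotation by theta radians (counter-clockwise in a y-up frame)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def about_center(h, cx, cy):
    """Step 1: move (cx, cy) to the origin; step 2: apply h; step 3: move back."""
    return matmul(translation(cx, cy), matmul(h, translation(-cx, -cy)))

def apply(h, x, y):
    """Map the point (x, y) through the homogeneous matrix h."""
    return (h[0][0] * x + h[0][1] * y + h[0][2],
            h[1][0] * x + h[1][1] * y + h[1][2])
```

The center is a fixed point of the result. The same `about_center` wrapper works unchanged for flip, scaling or shear matrices.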

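Assume that the original coordinates of an image point are $(x_0, y_0)$ and that its coordinates after transformation are $(x, y)$. In homogeneous coordinates the relationship before and after the transformation is as follows, where $H$ is the transformation matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = H \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}$$

Image translation shifts all pixels in the $x$ and $y$ directions. The matrix of the translation transform is as follows, where $t_x$ and $t_y$ are the distances moved in the $x$ and $y$ directions:

$$H_{\mathrm{trans}} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

Image flipping is mirror processing of the image and comprises horizontal and vertical flipping. With coordinates measured from the image center, the horizontal flip matrix is:

$$H_{\mathrm{hflip}} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The vertical flip matrix is:

$$H_{\mathrm{vflip}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Image rotation rotates the image by an arbitrary angle $\theta$ about its center point by default. The transformation matrix is:

$$H_{\mathrm{rot}} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Image scaling scales the current image by an arbitrary factor. The transformation matrix is as follows, where $s_x$ and $s_y$ are the scale factors in the $x$ and $y$ directions:

$$H_{\mathrm{scale}} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Image shearing (miscut) is the non-perpendicular projection of a planar scene onto a projection plane. The transformation matrix is as follows, where $\theta_x$ and $\theta_y$ are the shear angles in the $x$ and $y$ directions:

$$H_{\mathrm{shear}} = \begin{bmatrix} 1 & \tan\theta_x & 0 \\ \tan\theta_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$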

Image cropping first scales the picture to 1.1 times its original size and then crops the scaled image.

The combined transformation chains several augmentation modes. Assume a translation matrix $T$, a rotation matrix $R$ and a scaling matrix $S$ are given. Combined transformation one yields a combined matrix $M$ and combined transformation two yields a combined matrix $N$. Each is the product of the component matrices in the order in which they are applied.
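Matrix multiplication is not commutative, so the order in which the component matrices are multiplied changes the result. The text does not preserve which order combined transformations one and two use. The two orders below are therefore illustrative assumptions only: scale, rotate, translate for M, and translate, rotate, scale for N. The concrete matrices are also arbitrary examples.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def compose(*mats):
    """Multiply left to right; with column vectors the LAST matrix acts first."""
    out = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for m in mats:
        out = matmul(out, m)
    return out

T = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]   # translate 10 px along x
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # rotate 90 degrees (cos = 0, sin = 1)
S = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]    # scale by 2

M = compose(T, R, S)   # hypothetical order: scale, then rotate, then translate
N = compose(S, R, T)   # hypothetical order: translate, then rotate, then scale
```

Here M keeps the 10 px shift unscaled, while N doubles it and rotates it onto the y axis. The two combined transformations are therefore genuinely different augmentations.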

On the MobileNetv2 base framework, the convolution kernel is enlarged from 3×3 to 5×5 or larger to increase the receptive field. A Convolutional Block Attention Module (CBAM) is added to strengthen the model's feature extraction.
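The effect of a larger kernel on the receptive field can be computed with the standard recurrence RF_l = RF_(l-1) + (k_l - 1) × (product of the strides of all earlier layers), starting from RF_0 = 1. The four-layer stacks in the usage example are hypothetical toy examples, not MobileNetv2's actual layer list.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs from input to output.
    Returns the receptive field, in input pixels, of one output unit."""
    rf, jump = 1, 1                 # jump = product of the strides seen so far
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

small = receptive_field([(3, 2), (3, 1), (3, 2), (3, 1)])   # 3x3 kernels
large = receptive_field([(5, 2), (5, 1), (5, 2), (5, 1)])   # same stack with 5x5 kernels
```

With the same strides, switching this toy stack from 3×3 to 5×5 kernels enlarges the receptive field from 19 to 37 pixels. As a check of the recurrence, two stacked 3×3 layers see the same 5-pixel window as a single 5×5 layer.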

When template matching identifies the complete pill picture corresponding to an incomplete pill, the similarity threshold is set to a value above 90%. The match is considered successful when the matching similarity reaches the threshold.
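A sketch of the acceptance rule follows. It assumes each stored complete-pill template has already been scored against the fragment, for example by taking the maximum of its template-matching score map. The text only says "a value above 90%", so the default threshold of 0.9 is an assumption. Returning `None` when no template reaches the threshold is also a design choice of this sketch.

```python
def restore_fragment(best_scores, threshold=0.9):
    """Pick the complete-pill template that best matches an incomplete pill.

    best_scores maps a complete-pill template name to the highest similarity
    found for it. Returns that template's name if its score reaches the
    threshold, otherwise None (no successful match).
    """
    if not best_scores:
        return None
    name = max(best_scores, key=best_scores.get)
    return name if best_scores[name] >= threshold else None
```

Raising the threshold trades fewer wrong restorations for more fragments left unmatched.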

To build a practical model embedded in small and medium-sized mobile equipment, the multi-view approach is combined with the two-dimensional model. The model code is integrated into the device's main controller. The device is fitted with an LCD screen and a camera. The camera scans the medicines and collects pictures, which are sent back to the main controller and fed into the model for identification. The identified medicine name is displayed on the LCD screen together with the captured picture for the operator to check.

The beneficial effects of the invention are as follows:

1. Medicines with similar appearance and color are classified effectively. This helps medical staff sort medicines and reduces or even eliminates threats to patient safety caused by classification errors.

2. A multi-view database handles complex dispensing situations, such as medicines lying on their side or several medicines stacked together. Combined with various data augmentation methods, it further broadens the range of angles covered by the data set.

3. Data augmentation and transfer learning counter the over-fitting caused by the small amount of data. Combined with a lightweight network, they yield a practical model that can be embedded in mobile devices and small or medium-sized equipment, so the method is convenient, practical and widely applicable.

4. Combining multiple views with a two-dimensional model achieves, to a certain extent, the recognition performance of a three-dimensional model. It can identify medicines placed randomly at many angles, while the model has far fewer parameters than a three-dimensional model, which eases conversion into real products.

5. An image recovery channel is designed for the incomplete pills encountered in real dispensing, so incomplete pills can be identified effectively.

6. The invention introduces a hybrid CBAM attention module combining channel attention and spatial attention. It improves the network's feature extraction, extracts pill features as fully as possible, and assists medicine identification.

Drawings

FIG. 1 is a schematic flow chart of an identification method of the present invention.

Fig. 2 is a schematic flow chart of a medicine identification convolutional neural network constructed by the invention.

Fig. 3 is a schematic diagram of a correct medicine proportion under the photographing rules of the invention.

Fig. 4 is a schematic diagram of an incorrect medicine proportion under the photographing rules of the invention.

Fig. 5 is a schematic view of the multi-view photographing angles for a round medicine. In the figure, 1 is the camera and 2 is the medicine.

Figs. 6 to 8 are schematic diagrams of the flat-placement photographing angles for non-axisymmetric medicines.

Figs. 9 and 10 are schematic views of the vertical-face photographing angles.

Fig. 11 is a schematic diagram of the principle of the template matching algorithm.

Figs. 12 to 14 are schematic diagrams of an example of template matching for a centrally symmetric medicine.

Figs. 15 to 17 are schematic diagrams of an example of template matching for a non-centrally-symmetric medicine.

Fig. 18 is a schematic diagram of the image augmentation scheme.

Fig. 19 shows the MobileNetv2 model framework.

FIG. 20 is a schematic diagram of receptive fields.

FIG. 21 is a schematic diagram of a CBAM attention mechanism model.

Fig. 22 is a schematic diagram of the modification of the MobileNetv2 inverted residual block.

FIG. 23 shows training accuracy and training loss versus the number of iterations in an embodiment of the invention.

FIG. 24 shows test loss versus the number of iterations in an embodiment of the invention.

FIG. 25 shows test accuracy versus the number of iterations in an embodiment of the invention.

Detailed Description

Experimental environment

The hardware and software environments used in this example are shown in Table 1:

TABLE 1

The invention uses the lightweight model MobileNetv2 as its base architecture. MobileNetv2 was proposed by Google in 2018. Its innovations are inverted residuals and linear bottlenecks, which aim to improve accuracy while reducing memory usage. The overall model architecture is shown in Table 2.

TABLE 2

Experimental data

The experimental data consist of pictures of 753 individual pills belonging to 93 pill types. High-definition JPG pictures were captured according to the data set collection rules. The names and quantities are listed in Table 3.

TABLE 3

Drug name (brand) and strength	Quantity (pills)
Chlorpheniramine maleate tablet (chlorphenamine 4 mg)	5
Metformin hydrochloride tablet (Guhua Zhi 1 g)	5
Folic acid 5 mg	5
Bei Xi (acarbose 50 mg)	5
Hongyuanda 0.15 g	5
Carbazochrome sodium sulfonate tablet (Lo Ye 5 mg)	4
Nifedipine tablet 10 mg	5
Nifedipine controlled-release tablet (Xinran) 30 mg	5
Ibuprofen and codeine sustained-release tablet	5
Amlodipine besylate (pennies) 5 mg	5
Rosuvastatin calcium tablet (clonidine) 10 mg	4
Bai Liang capsule 0.5 g	4
Finasteride tablet (quinine) 5 mg	4
Irbesartan tablet (illida) 0.15 g	5
Ibuprofen sustained-release capsule 0.3 g	4
Montelukast sodium tablet (Shunning) 10 mg	5
Metformin sustained-release tablet 0.5 g	4
Nifedipine controlled-release tablet (Baixintong) 30 mg	4
Linagliptin tablet (Outangning) 5 mg	5
Empagliflozin tablet (Outangjing) 10 mg	5
Calcitriol capsule (Luo Gaiquan) 0.25 µg	4
Polyene phosphatidylcholine capsule (Yi Shanfu) 228 mg	4
Quetiapine fumarate sustained-release tablet 200 mg	4
Irbesartan hydrochlorothiazide tablet 150 mg	5
Tolvaptan tablet (Su Maika) 15 mg	4
Ademetionine 1,4-butanedisulfonate enteric-coated tablet (Simetate) 0.5 g	4
Eszopiclone tablet (itannin) 3 mg	4
Perindopril tert-butylamine tablet (minoxidil) 4 mg	4
Bacillus subtilis bigeminal live bacteria enteric capsule (metoprolol) 250 mg	5
Valsartan amlodipine tablet (Baibang) 1 tablet	4
Dapagliflozin tablet (Andapong) 10 mg	5
Fluoxetine hydrochloride capsule (omaren) 20 mg	4
Fenofibrate capsule 200 mg	4
Teprenone capsule (Shi Weishu) 50 mg	4
Ivabradine hydrochloride tablet (colant) 5 mg	4
Isosorbide mononitrate sustained-release tablet (Emulation) 60 mg	4
Dabigatran etexilate capsule (Tai Bi Quan) 150 mg	4
Isosorbide mononitrate sustained-release capsule 40 mg	4
Olmesartan medoxomil hydrochlorothiazide tablet (compound Ao Tan) 1 tablet	4
Moxifloxacin hydrochloride tablet 0.4 g	4
Arotinolol hydrochloride tablet (Almarol 10 mg)	4
Sertraline hydrochloride (vitamin D) 50 mg	4
Doxycycline hydrochloride tablet 0.1 g	4
Irbesartan hydrochlorothiazide tablet (clenbuterol) 1 tablet	4
Cefdinir capsule (tepu kang) 0.1 g	4
Clopidogrel hydrogen sulfate tablet (Sichuang) 75 mg	4
Sulindac tablet (Pivot force reaches) 0.1 g	4
Lercanidipine hydrochloride tablet 10 mg	4
Levofloxacin tablet (jidakang) 0.5 g	4
Rivaroxaban tablet (Li Erban) 10 mg	4
Acetylcysteine (Rich in application) 0.6 g	5
Eucalyptol, limonene and pinene enteric soft capsule (Nao-cut) 0.3 g	5
Voriconazole tablet (pinacol) 50 mg	7
Methylprednisolone tablet (Mei Zhuo Le) 4 mg	8
Silybin capsule (Water Lin Jia) 1 capsule	5
Itraconazole capsule (spinornol) 1 capsule	4
Bulleyaconitine A tablet (Sefomet) 0.4 mg	5
Vonoprazan fumarate tablet (Wook) 20 mg	6
Compound glycyrrhizin tablet (Mei Neng) 1 tablet	4
Rosiglitazone sodium tablet (tairo) 4 mg	4
Rifampicin capsule 0.15 g	5
Itraconazole capsule (Yi kang) 0.1 g	5
Olopatadine hydrochloride tablet (Ao Hui Da) 5 mg	5
Sitagliptin metformin tablet (minoxidil) 50 mg	8
Levothyroxine sodium tablet (Youjiale) 50 µg	13
Nicergoline tablet (Le Xilin) 10 mg	9
Diosmin tablet (Ge Tai) 0.45 g	14
Rivaroxaban tablet (beritol) 10 mg	13
Mycophenolate mofetil capsule (Ma Kexi) 0.25 g	14
Cyclosporine soft capsule (Xinsaiping) 25 mg	7
Bicalutamide tablet (Kangshide) 1 tablet	11
Letrozole (Furui) 2.5 mg	14
Sodium aescinate tablet (European style) 30 mg	14
Estradiol valerate tablet (Bujiale) 1 mg	3
Mesalazine enteric-coated tablet (salafol) 0.5 g	12
Total glucosides of paeony capsule (Pavlin) 0.3 g	23
Prucalopride succinate tablet (Ralisheng) 2 mg	8
Ondansetron hydrochloride tablet (European scallop) 4 mg	22
Clarithromycin tablet (Clarithromycin) 1 tablet	14
Rabeprazole sodium enteric-coated tablet (rebot) 10 mg	7
Rebamipide tablet (moxibusida) 0.1 g	13
Pinaverium bromide tablet (Naite's) 50 mg	7
Omeprazole enteric-coated tablet (loxic) 10 mg	26
Dydrogesterone tablet (Dafutong) 10 mg	11
Esomeprazole enteric-coated tablet (anti-letter) 20 mg	32
Gliclazide tablet (Meidakang) 80 mg	25
Sulfasalazine enteric-coated tablet (confidence) 0.25 g	7
Levofloxacin tablet (colabi) 1 tablet	17
Glimepiride tablet (Limussel apple) 2 mg	10
Lamivudine tablet 0.1 g	19
Famciclovir tablet (cis-intravenous) 0.125 g	17
Compound digestive enzyme capsule (Qian hong Yimei) 1 capsule	22
Sitagliptin phosphate tablet (Mentha arvensis) 100 mg	14
Amoxicillin and clavulanate potassium dispersible tablet (Junlqing) 0.22 g	15

See Fig. 1.

The pill identification model is a CNN built on MobileNetv2, referred to below as the CNN medicine identification network.

Medicines are divided into complete pills and incomplete pills before entering the CNN medicine identification network. An incomplete pill first undergoes template matching and is restored to an image of the complete pill, which then enters the recognition system. After identification, the system outputs the name and picture of the medicine.
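The routing described above can be expressed as a short control-flow sketch. The three callables are hypothetical hooks. They stand in for the incompleteness check, the template-matching recovery channel and the MobileNetv2-based classifier. The text does not specify how the system decides that a pill is incomplete. Returning `None` when recovery fails is an assumption.

```python
from typing import Callable, Optional

def identify_pill(image, is_incomplete: Callable, restore: Callable,
                  classify: Callable) -> Optional[dict]:
    """Route one pill image through the flow of Fig. 1 (sketch with hypothetical hooks).

    is_incomplete(image) -> bool   flags broken (half, quarter) pills
    restore(image)       -> image of the matched complete pill, or None
    classify(image)      -> medicine name from the CNN
    """
    if is_incomplete(image):
        image = restore(image)          # template-matching recovery channel
        if image is None:
            return None                 # assumed: no template above threshold, leave for manual check
    name = classify(image)
    return {"name": name, "image": image}   # what the device would show on its screen
```

Keeping the recovery step as a separate hook means the classifier only ever sees complete-pill images, matching the flow in which restored pictures enter the same recognition system.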

In the construction of the multi-view database, photographed medicine pictures form the basic data of the data set. The photographing method specifies shooting rules and shooting angles. The shooting angles comprise flat-placement angles, angles for non-axisymmetric medicines, vertical-face angles and angles for special cases.

As shown in Fig. 2, after the multi-view database is built, data augmentation is applied before the MobileNetv2 model is trained. The augmentation includes image flipping, scaling, shearing, rotation and cropping. The network is first pre-trained on the public ImageNet data set for transfer learning, learning shallow semantic representations such as edge information. The best weights learned within the number of iterations set for ImageNet pre-training then serve as the initial weights of the drug identification network. The model therefore converges quickly and achieves good recognition. The identification comprises the following steps:

1) Constructing a multi-view database;

2) Restoring the incomplete pill;

3) Data augmentation;

4) Building the model through pre-training and transfer learning;

5) Outputting the medicine classification result and medicine name information.

The model uses the lightweight MobileNetv2 as its base framework. On this base, the convolution kernel is enlarged to increase the receptive field and improve feature extraction. A CBAM attention mechanism is also introduced into the model's inverted residual blocks. CBAM attends to both the spatial and the channel dimensions. It therefore strengthens feature extraction more than a mechanism that attends to only one of them. This extracts the fine features of pills as fully as possible and facilitates later recognition. The processed data are output as required to obtain the correct medicine classification result and medicine name.

Figs. 3 to 10 illustrate the photographing rules of the invention. Fig. 3 shows a correct medicine proportion and Fig. 4 an incorrect one. When photographing, the medicine itself should occupy 50-60% of the whole picture and must not be too small. The background is a uniform solid color that differs from the medicine color. If a medicine's color is the same as or similar to the background, another solid-color background is chosen for it. Pictures must be of high quality and sharply focused.

See fig. 5-10.

The shooting angles are as follows:
- Flat placement of a centrally symmetric medicine: 3 pictures are taken, at angles such as 180°, 60° and 30° relative to the camera plane.
- Non-axisymmetric medicine: the medicine is rotated and photographed at intervals of about 30°. If the medicine is left-right symmetric, it is rotated through 180° and one picture is taken every 30°.
- Vertical-face shooting: the medicine is rotated through 180° and one picture is taken every 30°, giving 6 pictures in total.
- Special medicines without a vertical face, such as capsules: photographed only in flat placement while being rotated through 180° or 360°.
- If the front and back of a medicine differ, a group of pictures is taken of each side according to the flat-placement rules.

See fig. 12-17.

Figs. 12 to 14 show an example of template matching for a centrally symmetric medicine.

Fig. 12 shows a complete pill and Fig. 13 a half pill broken from it. Through template matching for centrally symmetric medicines, the half pill is restored to the complete pill of Fig. 12.

Fig. 15 shows a complete medicine and Fig. 16 a half pill broken from it. Fig. 17 shows the picture of the complete pill restored from Fig. 16 by template matching for non-centrally-symmetric medicines.

The data augmentation of this embodiment is shown in fig. 18.

Medical data sets are small, so the data preprocessing stage incorporates data augmentation. Data augmentation, also called data expansion, lets limited data yield the value of more data without substantially increasing the data collected. In this embodiment the input pictures are 256×256, and 224×224 patches are cropped from them at random. One picture can therefore produce up to (256 - 224 + 1)² = 33 × 33 = 1089 different crops, expanding the data roughly 1000-fold. Mathematically, every augmentation is performed about the image center point by default, in three steps:
1) move the rotation point to the origin;
2) perform the rotation about the origin;
3) move the rotation point back to its original position.
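The crop count follows from the number of offsets per axis: a 224-pixel window fits at 256 - 224 + 1 = 33 positions along each side. The sketch below assumes square grayscale images stored as lists of rows, and its function names are illustrative. In practice a library transform would perform the cropping.

```python
import random

def crop_positions(src=256, crop=224):
    """Number of distinct top-left offsets for a crop x crop window in a src x src image."""
    per_axis = src - crop + 1
    return per_axis * per_axis

def random_crop(image, crop=224, rng=random):
    """Return a random crop x crop window of a square image given as a list of rows."""
    n = len(image)
    top = rng.randint(0, n - crop)     # randint includes both endpoints
    left = rng.randint(0, n - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]
```

`crop_positions()` gives 1089 for the 256/224 setting above. Passing a seeded `random.Random` as `rng` makes the crops reproducible.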

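Let $(x_0, y_0)$ be the original coordinates of an image point and $(x, y)$ its coordinates after transformation, so that $[x, y, 1]^T = H\,[x_0, y_0, 1]^T$ for a transformation matrix $H$. For image translation, $H$ shifts every pixel by $t_x$ in the $x$ direction and by $t_y$ in the $y$ direction. The flip, rotation, scaling and shear matrices take the same homogeneous $3 \times 3$ form. Scaling uses scale factors $s_x$ and $s_y$. Shearing uses angles $\theta_x$ and $\theta_y$ in the $x$ and $y$ directions.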

Consistent with common practice for cropping in deep learning, this embodiment enlarges the image to 1.1 times its original size. It then performs a random crop on the enlarged image.
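A sketch of the enlarge-then-crop step, under stated assumptions. Upscaling uses nearest-neighbour sampling, and the image is cropped back to its original size at a uniformly random position. The text does not specify the interpolation or how the random scale is chosen, so both choices, like the function name, are assumptions.

```python
import random

def enlarge_then_crop(image, factor=1.1, rng=random):
    """Upscale a grayscale image (list of rows) by `factor` with nearest-neighbour
    sampling, then cut a window of the ORIGINAL size at a random position."""
    h, w = len(image), len(image[0])
    nh, nw = round(h * factor), round(w * factor)
    # each output pixel copies the nearest source pixel
    big = [[image[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
            for c in range(nw)] for r in range(nh)]
    top = rng.randint(0, nh - h)
    left = rng.randint(0, nw - w)
    return [row[left:left + w] for row in big[top:top + h]]
```

With a 1.1 factor a 224-pixel side grows to 246 pixels, so the crop can shift by up to 22 pixels in each direction.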

Fig. 19 is a MobileNetv2 model framework.


In this model, the number of channels increases with network depth while the spatial dimensions decrease correspondingly. Overall, however, the tensors remain relatively small because bottleneck layers form the connections between blocks.

To improve the feature extraction capability of the model, the method enlarges the convolution kernel on top of the base model framework, which increases the receptive field of the convolutional neural network and strengthens its feature extraction. This embodiment enlarges the kernel, and hence the receptive field, from 3×3 to 5×5 or more, as shown in Fig. 20.
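Why enlarging the kernel stays cheap can be seen with a weight count. This sketch assumes the enlarged kernel is the depthwise convolution inside a MobileNetv2 inverted residual block; the block widths (16 in, 24 out, expansion factor 6) are hypothetical example values, and biases and BatchNorm are ignored.

```python
def block_params(c_in: int, c_out: int, t: int, k: int) -> dict:
    """Weight counts of one inverted residual block (biases and BatchNorm ignored)."""
    hidden = c_in * t
    expand = c_in * hidden       # 1x1 pointwise expansion
    depthwise = k * k * hidden   # one k x k filter per channel
    project = hidden * c_out     # 1x1 pointwise projection
    return {"expand": expand, "depthwise": depthwise, "project": project,
            "total": expand + depthwise + project}


for k in (3, 5):
    print(k, block_params(16, 24, t=6, k=k))
# k=3: total 4704; k=5: total 6240.
# A dense (non-depthwise) 5x5 conv on the same 96 channels would need 5*5*96*96 = 230400 weights.
```

Going from 3×3 to 5×5 grows the depthwise weights from 864 to 2400, but the block total rises only about a third, because the 1×1 layers dominate.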

A CBAM attention mechanism is added to further enhance feature extraction, as shown in Fig. 21.

The Convolutional Block Attention Module (CBAM) is an attention module for convolutional networks that combines spatial attention and channel attention. Pill identification is a fine-grained recognition problem, so a network with strong feature extraction capability is needed to distinguish the subtle differences between pills. The attention in CBAM lets the network focus on the regions that matter for pill identification instead of over-learning features of irrelevant regions such as background interference, so the model extracts pill features as fully as possible and can recognize nuances between pills. Moreover, CBAM is a lightweight, general-purpose module: its overhead is negligible, it integrates seamlessly into any CNN architecture, and it can be trained end-to-end together with the base CNN. Adding it improves the overall performance of the model without excessively increasing model size, which facilitates later embedding in mobile devices.
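A minimal NumPy sketch of a CBAM forward pass, following the published CBAM formulation: channel attention $\sigma(\mathrm{MLP}(\mathrm{AvgPool}) + \mathrm{MLP}(\mathrm{MaxPool}))$, then spatial attention $\sigma(f^{7\times7}([\mathrm{AvgPool}; \mathrm{MaxPool}]))$. The weights are random and untrained, and the sizes (32 channels, 14×14 map, reduction ratio 8) are hypothetical; a real model would use a framework layer inside the inverted residual block.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared MLP C -> C/r -> C on avg- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    return sigmoid(mlp(avg) + mlp(mx))  # (C,) weights in (0, 1)


def spatial_attention(x, kernel):
    """kernel: (2, k, k) conv over [channel-avg; channel-max], 'same' zero padding."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # (H, W) weights in (0, 1)


def cbam(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)[:, None, None]  # channel attention first
    return x * spatial_attention(x, kernel)[None, :, :]  # then spatial attention


rng = np.random.default_rng(0)
C, H, W, r = 32, 14, 14, 8
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)  # (32, 14, 14)
```

The output keeps the input shape, which is what lets CBAM drop into an existing block. The only extra weights are the small MLP and one 2×7×7 kernel.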

Fig. 22 shows the modified MobileNetv2 inverted residual block.

As the figure shows, the channels are first expanded and then reduced: a 1×1 "expansion" layer (PW, pointwise convolution) is placed before the depthwise convolution (DW) to increase the number of channels and obtain more features, i.e. "expansion" (PW) → "convolutional feature extraction" (DW) → "compression" (PW).
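The expand → depthwise → compress sequence can be traced as tensor shapes. This plain-Python sketch assumes standard MobileNetv2 conventions (expansion factor 6, ReLU6 after the first two layers, a linear projection, and a residual shortcut only when stride is 1 and input and output channels match); `inverted_residual_shapes` is a hypothetical helper, and the `k` argument reflects the enlarged kernel described earlier.

```python
def inverted_residual_shapes(h, w, c_in, c_out, t=6, stride=1, k=3):
    """Trace (H, W, C) through PW expansion -> k x k DW -> PW projection."""
    hidden = c_in * t
    shapes = [("input", (h, w, c_in)),
              ("expand 1x1 PW + ReLU6", (h, w, hidden))]
    # depthwise conv with padding k // 2 (odd k), so only the stride changes H and W
    h2 = (h + 2 * (k // 2) - k) // stride + 1
    w2 = (w + 2 * (k // 2) - k) // stride + 1
    shapes.append((f"{k}x{k} DW + ReLU6", (h2, w2, hidden)))
    shapes.append(("compress 1x1 PW (linear)", (h2, w2, c_out)))
    use_residual = stride == 1 and c_in == c_out
    return shapes, use_residual


shapes, residual = inverted_residual_shapes(56, 56, 24, 24, k=5)
for name, shape in shapes:
    print(name, shape)
print("residual:", residual)  # residual: True
```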

Combination transformation: several augmentation modes are combined. Data augmentation in deep learning generally combines several augmentation modes, which involves matrix multiplication, and by the rules of matrix multiplication different combination orders give different results; that is, in linear algebra

$$AB \neq BA$$

except in special cases. For a clearer explanation, assume a given translation transformation matrix $T$, rotation matrix $R$ and scaling matrix $S$, where $t_x, t_y$ are the translation distances, $\theta$ is the rotation angle and $s_x, s_y$ are the scaling factors:

$$T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}, \quad R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

In this embodiment we present two different combination transformations, each multiplying these matrices in a different order; by the rule above, the two combined matrices are in general not equal.
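A small numeric check that the order of combination matters. The two orders below are chosen only for illustration (scale → rotate → translate versus translate → rotate → scale) and are not claimed to be the patent's combination transforms one and two. Applying each product to the point (1, 0) gives clearly different results.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, x, y):
    """Apply a homogeneous 3x3 transform to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


T = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]                 # translate by (10, 0)
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]                  # rotate by 90 degrees
S = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]                   # scale by 2

M1 = matmul3(T, matmul3(R, S))  # scale, then rotate, then translate
M2 = matmul3(S, matmul3(R, T))  # translate, then rotate, then scale
print(apply(M1, 1, 0))  # approximately (10, 2)
print(apply(M2, 1, 0))  # approximately (0, 22)
```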

This data augmentation scheme not only helps prevent the model from overfitting, but also compensates for image angles that were not captured because the acquisition method was not comprehensive enough during database construction, further improving the multi-dimensional database.

In this example, MobileNetv2 is used as the base framework for pill recognition, with the number of epochs set to 500, batch_size set to 16 and the learning rate set to 0.001. If the training accuracy does not improve within 5 epochs, the learning rate is reduced by 10%.
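The learning-rate rule can be sketched as a small plateau scheduler. This is an illustrative sketch, not the authors' code. It assumes that "reduced by 10%" means multiplying the rate by 0.9, and that the monitored value is accuracy, where higher is better. `ReduceOnPlateau` is a hypothetical class. PyTorch offers a comparable built-in, `torch.optim.lr_scheduler.ReduceLROnPlateau` (with `mode="max"`, `factor`, `patience`), but its patience counting differs slightly: it reduces only after more than `patience` bad epochs.

```python
class ReduceOnPlateau:
    """Multiply lr by `factor` once the monitored accuracy has failed to
    improve for `patience` consecutive epochs."""

    def __init__(self, lr=0.001, patience=5, factor=0.9):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, accuracy):
        if accuracy > self.best:
            self.best = accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


sched = ReduceOnPlateau()
for acc in [0.50, 0.40, 0.40, 0.40, 0.40, 0.40]:
    lr = sched.step(acc)
print(lr)  # about 0.0009 after five epochs without improvement
```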

The training effect is shown in Fig. 23. As the figure shows, the training-set loss gradually decreases as the number of iterations increases and converges within a certain range.

Fig. 24 plots the test loss against the number of iterations. As the figure shows, the test-set loss gradually decreases as the number of iterations increases and converges within a certain range.

Fig. 25 plots the test accuracy against the number of iterations. As the figure shows, the test-set accuracy gradually rises as the number of iterations increases and finally stabilizes within a certain range.

Claims

1. A deep learning oral pill identification method based on multi-view and data expansion, characterized in that: a database is constructed using multi-view and data augmentation methods, completing the dataset from multiple angles; a practical model for embedding in small and medium-sized mobile devices is designed using a lightweight network; the multiple views are combined with a two-dimensional model, and construction of the practical model is completed after transfer learning, so that oral pills are correctly identified and dispensed to the corresponding patients through mobile and small and medium-sized devices; an incomplete oral pill identification channel is established to restore incomplete pills, enhancing the practicability of the model;

the method comprises the following specific steps:

1) Constructing a multi-view database;

2) Restoring the incomplete pill;

3) Data augmentation;

5) Outputting medicine classification results and medicine name information;

the restoration of incomplete pills: for half pills or even quarter-dose incomplete pills encountered in practice, a template matching algorithm identifies the complete pill picture corresponding to the incomplete pill, and the identified complete pill is fed into the constructed convolutional neural network pill identification model;

the pre-training transfer learning: the network is first pre-trained on the public ImageNet dataset to learn shallow-layer semantic representations such as edge information; the optimal weights learned within the set number of ImageNet pre-training iterations are then used to initialize the convolutional neural network pill identification model, so that the model converges quickly and achieves a better recognition effect;

the model building: the lightweight model MobileNetv2 is used as the base framework, and on this base framework the convolution kernel is enlarged to increase the receptive field of the convolutional neural network; meanwhile, a CBAM module mixing channel attention and spatial attention is introduced to improve the feature extraction capability of the network, extract pill features to the maximum extent and assist drug identification; the processed data are output, and the correct drug classification result and drug name are output as required.

2. The deep learning oral pill identification method based on multi-view and data expansion according to claim 1, wherein: under the medicine photographing rules, the medicine itself occupies 50-60% of the whole picture and the photographing background must not be too small; the background is a uniform solid color different from the medicine's color, and if a medicine's color is the same as or similar to the background color, another solid-color background is chosen for that medicine individually; photographs must be of high quality and sharply focused;

for the photographing angle, a non-axisymmetric medicine is rotated through multiple angles and photographed at intervals of about 30 degrees; if the medicine is bilaterally symmetric, it is rotated through 180 degrees and photographed every 30 degrees.
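The photographing schedule can be sketched as a list of rotation angles. Two points are assumptions, since the text does not state them: the full 360° sweep for non-axisymmetric pills, and excluding the end angle itself. `capture_angles` is a hypothetical helper.

```python
def capture_angles(bilaterally_symmetric: bool, step_deg: int = 30) -> list:
    """Rotation angles (degrees) at which a pill is photographed.

    Assumed: 360 degree sweep for non-axisymmetric pills, 180 for bilaterally
    symmetric ones, end angle excluded.
    """
    span = 180 if bilaterally_symmetric else 360
    return list(range(0, span, step_deg))


print(capture_angles(True))        # [0, 30, 60, 90, 120, 150]
print(len(capture_angles(False)))  # 12
```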

3. The deep learning oral pill identification method based on multi-view and data expansion according to claim 1, wherein: a template matching algorithm is used to identify the complete pill picture corresponding to an incomplete pill; template matching is a method for searching for a specific target in an image: every possible position in the image is traversed and the region at that position is compared with the template, and when the similarity is high enough the match is considered successful;

the algorithm comprises the following steps:

1) Determining the length and width x and y of the current picture;

2) Determining the length and width w and h of the template picture;
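A minimal NumPy sketch of the traversal described in claim 3. It slides the template over every position and scores each one with zero-mean normalized cross-correlation, a common similarity measure that is assumed here because the claim does not name one. The 0.9 threshold follows the "above 90%" value in claim 7. `match_template` is a hypothetical helper. OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same score far faster in practice.

```python
import numpy as np


def match_template(image, template, threshold=0.9):
    """Return (best (row, col), best score, matched?) for a grayscale image."""
    rows, cols = image.shape         # size of the current picture (x, y in claim 3)
    t_rows, t_cols = template.shape  # size of the template (w, h in claim 3)
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -1.0, (0, 0)
    for r in range(rows - t_rows + 1):          # traverse every possible position
        for c in range(cols - t_cols + 1):
            p = image[r:r + t_rows, c:c + t_cols]
            p = p - p.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            if denom == 0:
                continue  # flat patch or template: correlation undefined
            score = float((p * t).sum() / denom)  # in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score, best_score >= threshold


rng = np.random.default_rng(1)
image = rng.random((40, 50))
template = image[5:15, 8:20].copy()
print(match_template(image, template))  # best position (5, 8), score about 1.0, True
```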

4. The deep learning oral pill identification method based on multi-view and data expansion according to claim 1, wherein: in the data augmentation, image translation, image flipping, image rotation, image scaling, image shearing, image cropping and combined transformations are, mathematically, all performed about the image center point, in the following steps: 1) first move the rotation point to the origin; 2) perform the rotation about the origin; 3) move the rotation point back to its original position;

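assume that the original coordinates of an image pixel are $(x, y)$ and the coordinates after transformation are $(x', y')$;

image translation: translation means that all pixels are translated in the $x$ and $y$ directions, where $t_x$ and $t_y$ represent the distances moved in the $x$ and $y$ directions, and the mathematical matrix corresponding to the translation transformation is as follows:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

the vertically flipped transform matrix is:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$s_x$ and $s_y$ represent the scaling factors in the $x$ and $y$ directions;

$\theta_x$ and $\theta_y$ are the shear angles in the $x$ and $y$ directions, with shear matrix:

$$\begin{bmatrix} 1 & \tan\theta_x & 0 \\ \tan\theta_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

rotation matrix $R$ (rotation angle $\theta$) and scaling matrix $S$:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

for combination transform one, the combined matrix $M$ is the product of the above transformation matrices; because matrix multiplication is not commutative, $M$ depends on the order in which the transformations are applied.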

5. The deep learning oral pill identification method based on multi-view and data expansion according to claim 1, wherein: on the base framework of the lightweight model MobileNetv2, the convolution kernel is enlarged from 3×3 to 5×5 to increase the receptive field; meanwhile, a CBAM attention mechanism is added to the inverted residual block so that fine pill features are extracted to the maximum extent through channel attention and spatial attention, enhancing the feature extraction capability of the model.

6. The deep learning oral pill identification method based on multi-view and data expansion according to claim 1, wherein: to design the practical model embedded in small and medium-sized mobile devices and combine the multiple views with the two-dimensional model, the model code is integrated into the main control end of the device; the device is provided with an LCD screen and a camera; the camera scans the medicine, captures a medicine picture and transmits it back to the main control end, where it is fed into the model for identification; the identified medicine name and the captured medicine picture are displayed together on the LCD screen for the operator to check.

7. The deep learning oral pill identification method based on multi-view and data expansion according to claim 3, wherein: when the template matching algorithm is used to identify the complete pill picture corresponding to an incomplete pill, the similarity threshold is set to a value above 90%, and the pill match is considered successful when the matching similarity reaches the threshold.