CN114937176A - Medicine real-time identification method and system based on deep learning - Google Patents

Medicine real-time identification method and system based on deep learning

Info

Publication number
CN114937176A
CN114937176A
Authority
CN
China
Prior art keywords
medicine
image
character
training
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210560976.9A
Other languages
Chinese (zh)
Inventor
张锋
韩幸
李智勇
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Shenfan Technology Co ltd
Original Assignee
Hunan Shenfan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Shenfan Technology Co ltd filed Critical Hunan Shenfan Technology Co ltd
Priority to CN202210560976.9A
Publication of CN114937176A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Preparation Storing Or Oral Administration Devices (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method and system for real-time drug identification. A first image captured by a first camera and a second image captured by a second camera are loaded and processed separately to obtain drug and character detection results. The resulting drug classifications and drug regions are fused, yielding an appearance-based classification confidence and a fused drug region. Text regions are then segmented according to the detected text regions and the fused drug region, and character recognition is performed on the cropped text regions of both images. Finally, the appearance-based classification confidence and the recognized character information are fused, and a recognition result based on both appearance and text is output. The invention enables real-time checking of prescription drugs during dispensing, improves checking accuracy, and reduces the burden on staff.

Description

Medicine real-time identification method and system based on deep learning
Technical Field
The invention relates to the technical field of medical treatment, and particularly discloses a medicine real-time identification method and system based on deep learning.
Background
With the development of medical technology and the growth of urban populations, hospitals must prepare a huge variety and quantity of injectable drugs every day, placing great pressure on identification and checking during drug preparation. Existing drug identification relies mainly on medical staff manually identifying and checking drug types. Hospital staff already carry a heavy workload, and unreasonable site layout leads to disordered pharmacy management, greatly increasing the identification error rate. Identification is made harder still by the many drugs from different manufacturers that look similar yet have entirely different functions. As the final safeguard in medication administration, drug identification and checking is extremely important.
Deep learning is used as the core force of a new technological revolution and industrial change, the traditional industry is promoted to be upgraded and updated, the rapid development of 'unmanned economy' is driven, and positive effects are generated in the civil fields of intelligent transportation, intelligent home, intelligent medical treatment and the like. At present, a target detection network based on a deep neural network can accurately position and identify targets in thousands of different natural scenes, and meanwhile, a scene text detection and identification network based on deep learning can position and identify characters at a speed close to real time. By adopting the medicine real-time identification method based on deep learning, the work efficiency of medicine identification can be greatly improved, the work complexity is reduced, the work intensity is reduced, the reliability of medicine identification and checking links is improved, and the psychological pressure of medical staff is relieved.
At present, most intelligent drug identification methods rely on photographing drugs and then identifying the drug type from appearance information alone. For example, patent CN113837070 processes drug pictures mainly on the basis of appearance information to inspect and classify drugs. Such photograph-then-identify approaches suffer from two problems. First, an operator must arrange the drugs by hand and then take the photograph, which adds to the staff's burden, and the identification speed cannot meet real-time requirements. Second, identification based on appearance information alone easily confuses drugs with similar appearances, so the accuracy cannot meet practical requirements.
Therefore, the above-mentioned defects of the existing drug identification methods are a technical problem to be solved.
Disclosure of Invention
The invention provides a medicine real-time identification method and system based on deep learning, and aims to solve the technical problems of the defects of the existing medicine identification method.
One aspect of the invention relates to a medicine real-time identification method based on deep learning, which comprises the following steps:
respectively loading a first image collected by a first camera and a second image collected by a second camera;
respectively processing the loaded first image and the loaded second image by using a pre-trained model, and respectively obtaining medicine and character detection results in the first image and the second image, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area;
fusing the obtained medicine classification and the medicine area, and outputting a classification confidence based on appearance and the fused medicine area;
performing text region segmentation on the first image and the second image according to the acquired text region and the output fused medicine region;
respectively carrying out character recognition on the first image and the second image after the text region is cut by utilizing a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut;
and fusing the output classification confidence degree based on the appearance and the recognized character information, and finally outputting the recognition result based on the appearance and the character information.
Further, before the step of loading the first image collected by the first camera and the second image collected by the second camera, the method comprises:
training a model in a training stage, wherein the model comprises a medicine and character detection network model and a character recognition network model;
the steps of training the medicine and character detection network model comprise:
cutting an image to be trained into sample images with consistent sizes, and loading related labels;
performing horizontal flipping or rotation on the labelled sample images for data expansion;
loading a model pre-trained on the COCO data set;
in the training process, different data enhancement modes are used for increasing data quantity, and the medicine and character detection network model is trained on a training set until convergence;
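The flip-based data expansion described above can be illustrated with a minimal sketch; the (x1, y1, x2, y2) pixel box format is an assumption, since the patent does not specify a label layout. Flipping an image must mirror its box labels as well:

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Flip an image left-right and mirror its (x1, y1, x2, y2) box labels."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, flipped_boxes

img = np.zeros((4, 10, 3), dtype=np.uint8)
img[:, 0] = 255                       # mark the left edge
out, bxs = flip_horizontal(img, [(0, 0, 2, 4)])
# the marked column moves to the right edge and the box follows it
```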
the step of training the character recognition network model comprises:
synthesizing text pictures of common medicines by using a character synthesis program to increase the training data volume;
loading a pre-trained model trained on a large text recognition data set;
cutting out a text recognition training set in the real training set according to the text label, and combining the text recognition training set with the synthesized drug name data set;
and training the character recognition network model in the combined text recognition training set until convergence.
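The character synthesis step can be sketched as below; the drug-name vocabulary, the minimum visible length of three characters, and the truncation scheme are illustrative assumptions rather than details from the patent. Randomly truncating names mimics the partially visible text on a rotated vial:

```python
import random

# Illustrative drug-name vocabulary; the patent's actual list is not given.
DRUG_NAMES = ["Cefazolin Sodium", "Sodium Chloride Injection", "Dexamethasone"]

def synthesize_text_labels(n, seed=0):
    """Generate n text labels, each a random contiguous window of a drug
    name, mimicking the partially visible names on a rotated vial."""
    rng = random.Random(seed)
    labels = []
    for _ in range(n):
        name = rng.choice(DRUG_NAMES)
        # keep a random contiguous visible window of at least 3 characters
        start = rng.randrange(0, max(1, len(name) - 3))
        length = rng.randrange(3, len(name) - start + 1)
        labels.append(name[start:start + length])
    return labels

labels = synthesize_text_labels(5)
```

A real pipeline would render these strings onto background crops to form training images; only the label generation is sketched here.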
Further, the step of loading the first image collected by the first camera and the second image collected by the second camera respectively comprises:
acquiring a real-time drug identification video collected by a video acquisition box, wherein the inner wall of the video acquisition box is made of a non-reflective white material; the upper part of the box is provided with a strip light source for providing stable illumination and a shading cover fitted over the top of the box to prevent the light of the strip light source from shining into the operator's eyes and causing discomfort; the bottom of the box is provided with a drug placement area, and the first camera and the second camera are symmetrically installed on the two sides of the drug placement area at half the average height of common drugs.
Further, the step of fusing the acquired drug classifications and drug regions and outputting a classification confidence based on appearance and fused drug regions comprises:
calculating the intersection-over-union of all detection boxes to obtain the association between each drug in the first image and the second image;
if two detected objects have an intersection-over-union greater than a preset threshold and the same class, they are regarded as the same object, and the final appearance-based classification confidence is the product of the classification confidences of that drug in the first image and the second image.
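A minimal sketch of this fusion rule, assuming (x1, y1, x2, y2) pixel boxes and a (class, confidence, box) detection tuple (both layouts are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def fuse_detections(dets1, dets2, iou_thr=0.5):
    """Pairs with the same class and IoU above the threshold are treated as
    one object; the fused appearance confidence is the product of the two."""
    fused = []
    for cls1, conf1, box1 in dets1:
        for cls2, conf2, box2 in dets2:
            if cls1 == cls2 and iou(box1, box2) > iou_thr:
                fused.append((cls1, conf1 * conf2, box1))
    return fused

d1 = [("ceftriaxone", 0.9, (10, 10, 50, 90))]
d2 = [("ceftriaxone", 0.8, (12, 11, 52, 88))]
fused = fuse_detections(d1, d2)  # one match, confidence 0.9 * 0.8
```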
Further, the step of fusing the output classification confidence based on the appearance and the recognized character information and finally outputting the recognition result based on the appearance and the character information comprises the following steps:
confidence of detection appearance classification P i And a character recognition confidence level;
let p be i Confidence of i characters, if p i If the value is larger than alpha, the character prediction result is considered to be correct; wherein J is the correctly predicted character set, the total number of characters is n, the number of correct characters is m, and the final character recognition confidence coefficient is the product of the proportion of the correctly predicted characters and the confidence coefficients of all the correctly predicted characters, namely
Figure BDA0003653367330000041
Final result P ═ P t β ×P i 1-β Alpha is a preset character confidence coefficient threshold value, and beta is a confidence coefficient.
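The confidence fusion described in words above can be written out directly; the alpha and beta values used in the example call are placeholders, not values from the patent:

```python
def character_confidence(char_confs, alpha):
    """P_t: the fraction of characters predicted correctly (confidence above
    the threshold alpha) times the product of those confidences."""
    n = len(char_confs)
    correct = [p for p in char_confs if p > alpha]
    p_t = len(correct) / n
    for p in correct:
        p_t *= p
    return p_t

def final_confidence(p_t, p_i, beta=0.5):
    """P = P_t**beta * P_i**(1 - beta); beta weights text versus appearance."""
    return (p_t ** beta) * (p_i ** (1 - beta))

p_t = character_confidence([0.9, 0.8, 0.3], alpha=0.5)  # (2/3) * 0.9 * 0.8
p = final_confidence(0.49, 0.64)                        # 0.7 * 0.8
```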
Another aspect of the present invention relates to a deep learning-based drug real-time identification system, comprising:
the video acquisition module is used for respectively loading a first image acquired by the first camera and a second image acquired by the second camera;
the medicine and character detection module is used for processing the loaded first image and the loaded second image respectively by utilizing a pre-trained model and respectively acquiring medicine and character detection results in the first image and the second image, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area;
the detection result fusion module is used for fusing the acquired medicine classification and the medicine area and outputting a classification confidence degree based on appearance and the fused medicine area;
the text region cutting module is used for cutting the text regions of the first image and the second image according to the acquired text regions and the output fused medicine regions;
the character recognition module is used for respectively carrying out character recognition on the first image and the second image after the text region is cut by utilizing a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut;
and the recognition result fusion module is used for fusing the output classification confidence degree based on the appearance and the recognized character information and finally outputting a recognition result based on the appearance and the character information.
Further, the deep learning based drug real-time identification system further comprises:
the training module is used for training a model in a training stage, and the model comprises a medicine and character detection network model and a character recognition network model;
the training module comprises a medicine and character detection network model training module and a character recognition network model training module,
the medicine and character detection network model training module comprises:
the cutting unit is used for cutting the image to be trained into sample images with consistent sizes and loading related labels;
the data expansion unit is used for performing horizontal flipping or rotation on the labelled sample images for data expansion;
the first loading unit is used for loading a model pre-trained on the COCO data set;
the first training unit is used for increasing the data volume by using different data enhancement modes in the training process and training the medicine and character detection network model on a training set until convergence;
the character recognition network model training module comprises:
the synthesis unit is used for synthesizing the text pictures of the common medicines by using a character synthesis program so as to increase the training data volume;
the second loading unit is used for loading a pre-trained model trained on other large text recognition data sets;
the merging unit is used for cutting out a text recognition training set from the real training set according to the text label and merging the text recognition training set with the synthesized drug name data set;
and the second training unit is used for training the character recognition network model in the combined text recognition training set until convergence.
Further, the video capture module includes:
the video acquisition unit is used for acquiring a real-time drug identification video collected by a video acquisition box, wherein the inner wall of the video acquisition box is made of a non-reflective white material; the upper part of the box is provided with a strip light source for providing stable illumination and a shading cover fitted over the top of the box to prevent the light of the strip light source from shining into the operator's eyes and causing discomfort; the bottom of the box is provided with a drug placement area, and the first camera and the second camera are symmetrically installed on the two sides of the drug placement area at half the average height of common drugs.
Further, the detection result fusion module comprises:
the calculation unit is used for calculating the intersection-over-union of all detection boxes to obtain the association between each drug in the first image and the second image;
and the object identification unit is used for regarding two detected objects as the same object if their intersection-over-union is greater than a preset threshold and their classes are the same; the final appearance-based classification confidence is the product of the classification confidences of that drug in the first image and the second image.
Further, the recognition result fusion module comprises:
the detection unit, used for detecting the appearance classification confidence $P_i$ and the character recognition confidence;

the character recognition unit, used for letting $p_i$ denote the confidence of character $i$: if $p_i > \alpha$, the prediction of that character is considered correct; with $J$ the set of correctly predicted characters, $n$ the total number of characters and $m$ the number of correct characters, the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely

$$P_t = \frac{m}{n} \prod_{i \in J} p_i ;$$

the final result is $P = P_t^{\beta} \times P_i^{1-\beta}$, where $\alpha$ is a preset character confidence threshold and $\beta$ is a weighting coefficient.
The beneficial effects obtained by the invention are as follows:
the invention provides a medicine real-time identification method and a medicine real-time identification system based on deep learning, wherein a first image acquired by a first camera and a second image acquired by a second camera are loaded respectively; respectively preprocessing the loaded first image and the loaded second image by utilizing a pre-trained model, and respectively acquiring medicine and character detection results in the first image and the second image, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area; fusing the obtained medicine classification and the medicine area, and outputting a classification confidence based on the appearance and the fused medicine area; performing text region segmentation on the first image and the second image according to the acquired text region and the output fused medicine region; respectively carrying out character recognition on the first image and the second image after the text region is cut by utilizing a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut; and fusing the output classification confidence degree based on the appearance and the recognized character information, and finally outputting the recognition result based on the appearance and the character information. 
With the deep-learning-based real-time drug identification method and system provided by the invention, an operator only needs to place the drugs roughly in the designated area; real-time, accurate identification of drug types is then achieved automatically from appearance and character information, and multiple drugs can be identified at once. Combined with a prescription two-dimensional-code recognition system, prescriptions can be checked against drugs in real time during dispensing, increasing checking accuracy and relieving staff workload.
Drawings
FIG. 1 is a schematic flow chart of a first example of a deep learning-based real-time drug identification method provided by the present invention;
FIG. 2 is a schematic flow chart of a second example of a deep learning-based real-time drug identification method provided by the present invention;
FIG. 3 is a flowchart illustrating a refinement of a first example of the step of training the model during the training phase shown in FIG. 2;
FIG. 4 is a flowchart illustrating a refinement of a second example of the step of training the model during the training phase shown in FIG. 2;
FIG. 5 is a schematic structural diagram of an example of a target detection network in the deep learning-based real-time drug identification method according to the present invention;
FIG. 6 is a schematic perspective view of an example of a video capture box in the method for real-time drug identification based on deep learning according to the present invention;
FIG. 7 is a flowchart illustrating a detailed process of an example of the step shown in FIG. 1 of fusing the obtained drug classifications and drug regions and outputting an appearance-based classification confidence and fused drug region;
FIG. 8 is a functional block diagram of a first embodiment of a deep learning-based real-time drug identification system according to the present invention;
FIG. 9 is a functional block diagram of a second embodiment of a deep learning-based real-time drug identification system provided by the present invention;
FIG. 10 is a functional block diagram of a first embodiment of the training module shown in FIG. 9;
FIG. 11 is a functional block diagram of a second embodiment of the training module shown in FIG. 9;
FIG. 12 is a functional block diagram of an example of the test result fusion module shown in FIG. 8;
fig. 13 is a functional block diagram of an example of the recognition result fusion module shown in fig. 8.
The reference numbers illustrate:
10. a video acquisition module; 20. a medicine and character detection module; 30. a detection result fusion module; 40. a text region cutting module; 50. a character recognition module; 60. a recognition result fusion module; 70. a training module; 71. a cutting unit; 72. a data expansion unit; 73. a first loading unit; 74. a first training unit; 75. a synthesis unit; 76. a second loading unit; 77. a merging unit; 78. a second training unit; 31. a calculation unit; 32. an object recognition unit; 61. a detection unit; 62. a character recognition unit; 100. a strip light source; 200. a light-shielding cover; 300. a drug placement area; 400. a first camera; 500. a second camera.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
As shown in fig. 1, a first embodiment of the present invention provides a method for identifying a medicine in real time based on deep learning, comprising the following steps:
and S100, respectively loading a first image collected by a first camera and a second image collected by a second camera.
A real-time drug identification video is acquired from the video capture box. Referring to fig. 6, the video capture box adopts a tilted posture for convenient operation. The inner wall of the box is made of a non-reflective white material; the upper part carries a strip light source 100 providing stable illumination and a shading cover 200 that covers the top of the box and prevents the light of the strip light source 100 from shining directly into the operator's eyes and causing discomfort; the bottom of the box holds a drug placement area 300, and a first camera 400 and a second camera 500 are symmetrically installed on the two sides of the drug placement area 300 at half the average height of common drugs. On the premise of convenient operation, the video capture box keeps the capture environment relatively stable and thereby improves the robustness of drug identification.
Step S200, the loaded first image and the loaded second image are respectively processed by utilizing a pre-trained model, and medicine and character detection results in the first image and the second image are respectively obtained, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area.
The training data set preparation mainly comprises three aspects: firstly, collecting training pictures, secondly, marking the training pictures, and thirdly, segmenting the data set.
(1) Collecting training pictures. First, two high-definition cameras, the first camera 400 and the second camera 500, continuously record video while an operator places drugs in the placement area in different combinations, orientations and positions, arranged randomly and laid flat. Second, after all drugs have been captured, bottle-and-hand detection models trained on public data sets are run over the whole video, and frames that contain drug bottles but no hands are selected at fixed intervals as training images. This approach has two advantages. First, the training pictures are collected in exactly the same way as an operator uses the system for identification, avoiding any mismatch between training and inference. Second, video capture requires no manual intervention for shooting, so collection is fast and the workload is low.
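The frame-selection rule (bottles present, no hands, sampled at an interval) can be sketched as follows; the class names and the interval value are illustrative assumptions:

```python
def select_training_frames(per_frame_classes, interval=30):
    """Keep indices of frames that contain at least one drug bottle and no
    hand, chosen at least `interval` frames apart. per_frame_classes is a
    list of sets of detected class names, one set per video frame."""
    chosen, last = [], -interval
    for idx, classes in enumerate(per_frame_classes):
        if "bottle" in classes and "hand" not in classes and idx - last >= interval:
            chosen.append(idx)
            last = idx
    return chosen

dets = [{"bottle"}, {"bottle"}, {"bottle"}, {"bottle", "hand"}, {"bottle"}]
chosen = select_training_frames(dets, interval=2)
```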
(2) Labelling the training pictures. Each picture is annotated not only with the bounding box and class of every drug, but also with the text box and the characters of the visible part of the drug name. Each box should tightly enclose the whole drug region or the text region of the drug name, and the class name of a text box is the text of its visible characters.
(3) Segmenting the data set. The labelled data are divided into a training set and a validation set at a fixed ratio. The training set is used to train the model; the validation set verifies the model's performance and guides the tuning of related parameters.
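A plain random split can illustrate this step; the 80/20 ratio is an assumption, since the patent only says "a certain proportion":

```python
import random

def split_dataset(samples, val_ratio=0.2, seed=0):
    """Shuffle the labelled samples and split off a validation set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_ratio)
    return items[n_val:], items[:n_val]

train, val = split_dataset(range(100))
```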
Processing the loaded first and second images involves three tasks: first, detecting the location of each drug; second, identifying the drug type from appearance information and outputting an appearance classification confidence; third, outputting the position of the drug-name text box to provide a text slice for text recognition. In this example a general-purpose real-time object detection network is used, such as YOLOX (YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, 2021). The structure of the network is shown in fig. 5, and data expansion methods such as Mosaic and MixUp are applied during training to increase the model's expressive power. All text boxes are treated as a single target class so that the network locates them.
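Mosaic, one of the data expansion methods mentioned, tiles several training images into one; a simplified 2x2 version is sketched below (the real Mosaic augmentation also jitters the seam position, crops randomly, and remaps the box labels of each tile):

```python
import numpy as np

def mosaic(imgs):
    """Tile four equally sized images into one 2x2 mosaic. Box labels of
    each tile would need shifting by the tile's offset (omitted here)."""
    assert len(imgs) == 4
    top = np.concatenate(imgs[:2], axis=1)
    bottom = np.concatenate(imgs[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)

tiles = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(4)]
m = mosaic(tiles)  # a 128 x 128 composite
```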
And S300, fusing the acquired medicine classification and the medicine area, and outputting a classification confidence degree based on the appearance and the fused medicine area.
The detection and recognition results from the two cameras, the first camera 400 and the second camera 500, are fused, and an appearance-based classification confidence is output. During inference the first image and the second image are processed simultaneously. Because the two cameras are installed symmetrically, horizontally flipping the second image roughly aligns the drug positions in the two images; the association between the drugs in the first and second images is then obtained by computing the IOU (intersection over union) of all detection boxes. Objects whose IOU exceeds 0.5 and whose classes are the same are regarded as the same object, and the final appearance-based classification confidence is the product of the classification confidences of that drug in the two images.
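Flipping the second camera's view before computing the IOU amounts to mirroring box x-coordinates about the image width; a sketch assuming (x1, y1, x2, y2) pixel boxes:

```python
def mirror_boxes(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes about the vertical centre line, as if
    the second camera's image had been flipped horizontally."""
    return [(image_width - x2, y1, image_width - x1, y2)
            for (x1, y1, x2, y2) in boxes]

aligned = mirror_boxes([(100, 40, 180, 200)], image_width=640)
# (640 - 180, 40, 640 - 100, 200)
```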
And S400, performing text region segmentation on the first image and the second image according to the acquired text region and the output fused medicine region.
And cutting text regions in the first image and the second image according to the detection result to be used as input of a character recognition module. And finding out a text box inside the same medicine area in the first image and the second image according to the fused detection result, associating the text box with the medicine type, cutting out a text slice in the original image according to the position of the text box, and normalizing the height of the slice to 32 pixels.
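The slice-cutting step can be sketched as below: crop a text box out of the original image and rescale it to the fixed 32-pixel height the recognizer expects. This is an assumed minimal nearest-neighbour implementation in NumPy; the function name and box format are illustrative, not the patent's code.

```python
import numpy as np

def crop_text_slice(image, box, target_h=32):
    # image: H x W x C array; box: (x1, y1, x2, y2) text-box corners.
    # Crops the text region and rescales it to a fixed height of
    # target_h pixels (nearest-neighbour), preserving aspect ratio.
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2]
    h, w = patch.shape[:2]
    target_w = max(1, round(w * target_h / h))
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(target_w) * w / target_w).astype(int)
    return patch[rows][:, cols]
```

A 64×128 text box, for example, comes out as a 32×64 slice, so width scales with the box's aspect ratio while the height stays fixed.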
Step S500, respectively carrying out character recognition on the first image and the second image after the text region is cut by using a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut.
The characters of the medicine name are recognized, which further improves the accuracy of the whole medicine identification. For the text region slices output by the detection result fusion module, the heights of all slices are normalized to 32 pixels, and the recognized Chinese and English characters and their confidences are output. The character recognition module may employ the CRNN algorithm (An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017).
And S600, fusing the output classification confidence degree based on the appearance and the recognized character information, and finally outputting a recognition result based on the appearance and the character information.
The appearance and character recognition information are fused to output the final recognition result of the medicine. The detection result fusion module outputs the appearance classification confidence P_i, and the character recognition module outputs the character recognition confidence. For arbitrarily placed drugs, the drug name may appear in the images acquired by both cameras, so the character recognition results from the two pictures need to be fused. Let p_i be the confidence of the i-th character; if p_i is greater than α, the character prediction result is considered correct. Let J be the set of correctly predicted characters, where the total number of characters is n and the number of correct characters is m. The final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m / n) × ∏_{i∈J} p_i
Final result P = P_t^β × P_i^(1−β), where β is a confidence coefficient used to balance the impact of appearance and character recognition on the final confidence.
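The fusion rule above can be written out directly. This is a sketch under stated assumptions: `alpha` and `beta` defaults are illustrative, and the function name is not from the patent.

```python
from math import prod

def fuse_confidences(char_confs, appearance_conf, alpha=0.5, beta=0.5):
    # char_confs: per-character confidences p_i from the recognizer.
    # Characters with p_i > alpha form the correctly predicted set J;
    # P_t = (m / n) * product of confidences over J, and the final
    # score is P = P_t**beta * P_i**(1 - beta).
    n = len(char_confs)
    J = [p for p in char_confs if p > alpha]
    m = len(J)
    p_t = (m / n) * prod(J) if n else 0.0
    return p_t ** beta * appearance_conf ** (1 - beta)
```

For example, with character confidences [0.9, 0.8, 0.4] and α = 0.5, only two of three characters count, so P_t = (2/3) × 0.9 × 0.8 = 0.48; combined with an appearance confidence of 0.75 and β = 0.5 the final score is √(0.48 × 0.75) = 0.6.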
According to the medicine real-time identification method based on deep learning, a first image collected by a first camera and a second image collected by a second camera are loaded respectively; the loaded first image and second image are processed respectively by using a pre-trained model, and the medicine and character detection results in the first image and the second image are obtained respectively, wherein the medicine and character detection results comprise the medicine classification, the medicine region, and the text region; the obtained medicine classification and medicine region are fused, and a classification confidence based on appearance and a fused medicine region are output; text region segmentation is performed on the first image and the second image according to the obtained text region and the output fused medicine region; character recognition is performed on the first image and the second image after text region cutting by using a pre-trained model, and the character information therein is recognized; and the output classification confidence based on appearance and the recognized character information are fused, and finally the recognition result based on the appearance and character information is output. With the medicine real-time identification method based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Further, please refer to fig. 2, fig. 2 is a schematic flowchart of a second example of the real-time drug identification method based on deep learning according to the present invention, and on the basis of the first example, the real-time drug identification method based on deep learning according to the present example includes, before step S100:
step S100A, training a model in a training phase, wherein the model comprises a medicine and character detection network model and a character recognition network model.
The whole system is divided into two stages of training and reasoning. In the training phase, two networks are mainly trained, namely a medicine and character detection network and a character recognition network.
Referring to fig. 3, fig. 3 is a schematic flow chart of a network for training drug and text detection, and step S100A includes:
and step S110a, cutting the image to be trained into sample images with consistent sizes, and loading related labels.
And cutting the image into sample images with consistent sizes so as to facilitate network processing, and loading the images and related labels.
And step S120a, performing horizontal flipping and similar transformations on the sample image loaded with the related annotation to perform data expansion work.
Data expansion work such as horizontal flipping, Mosaic, and MixUp is performed on the images.
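The simplest of these expansions, horizontal flipping with matching box adjustment, can be sketched as follows (Mosaic and MixUp are omitted as they are considerably more involved). The box format (x1, y1, x2, y2) is an assumption.

```python
import numpy as np

def hflip_sample(image, boxes):
    # Horizontally flips an H x W (x C) image array and its
    # (x1, y1, x2, y2) annotation boxes, a basic augmentation used
    # alongside Mosaic / MixUp during detector training.
    w = image.shape[1]
    flipped = image[:, ::-1]
    flipped_boxes = [(w - x2, y1, w - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, flipped_boxes
```

Note that a box's x-coordinates swap roles under the mirror: the new left edge is `w - x2`, keeping x1 < x2 in the flipped annotation.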
Step S130a, load the pre-trained model on the COCO dataset.
Step S140a, increasing data volume by using different data enhancement modes in the training process, and training the medicine and character detection network model on the training set until convergence.
And training the medicine and character detection network model on the training set until convergence.
Referring to fig. 4, fig. 4 is a schematic flowchart of the process of training the character recognition network, and step S100A includes:
step S110b, synthesizing text pictures of the common medicines by using a text synthesis program to increase the amount of training data.
Text pictures of commonly used drugs are synthesized using a text synthesis program to increase the amount of training data.
Step S120b, load pre-trained models trained on other large text recognition data sets.
The pre-trained model trained on the Syn90k data set is loaded.
Step S130b, cutting out a text recognition training set in the real training set according to the text labels, and combining the text recognition training set with the synthesized drug name data set.
And cutting out a text recognition training set in the real training set according to the text label, and merging the data set with the synthesized drug name data set.
And step S140b, training the character recognition network model in the combined text recognition training set until convergence.
And training the character recognition network model until convergence.
In the medicine real-time identification method based on deep learning, by training the medicine and character detection network and the character recognition network, the loaded first image and second image are respectively processed by using the pre-trained models, and the medicine and character detection results in the first image and the second image are obtained respectively. With the medicine real-time identification method based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Preferably, please refer to fig. 7, fig. 7 is a detailed flowchart of an example of step S300 shown in fig. 1, in this example, step S300 includes:
step S310, calculating the intersection ratio of all the detection boxes to obtain the association relationship of each medicine between the first image and the second image.
The IoU (Intersection over Union) of all detection boxes is calculated to obtain the association relationship of each medicine between the first image and the second image.
Step S320, if the intersection ratio is greater than a preset intersection threshold and the objects belong to the same class, they are regarded as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the first image and the second image.
Objects whose IoU is greater than 0.5 and which belong to the same category are regarded as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the two images.
In the medicine real-time identification method based on deep learning provided by this example, the association relationship of each medicine between the first image and the second image is obtained by calculating the intersection ratio of all detection boxes; if the intersection ratio is greater than a preset intersection threshold and the objects belong to the same category, they are regarded as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the first image and the second image. With the medicine real-time identification method based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Further, in the method for identifying a medicine in real time based on deep learning provided by this example, step S600 includes:
step S610, detecting an appearance classification confidence coefficient and a character recognition confidence coefficient.
The appearance and character recognition information are fused to output the final recognition result of the medicine. The detection result fusion module outputs the appearance classification confidence P_i, and the character recognition module outputs the character recognition confidence. For arbitrarily placed drugs, the drug name may appear in the images acquired by both cameras, so the character recognition results from the two pictures need to be fused.
Step S620, let p_i be the confidence of the i-th character; if p_i is greater than α, the character prediction result is considered correct. Let J be the set of correctly predicted characters, where the total number of characters is n and the number of correct characters is m. The final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m / n) × ∏_{i∈J} p_i
Final result P = P_t^β × P_i^(1−β), where α is a preset character confidence threshold and β is a confidence coefficient.
Let p_i be the confidence of the i-th character; if p_i is greater than α, the character prediction result is considered correct. Let J be the set of correctly predicted characters, where the total number of characters is n and the number of correct characters is m. The final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m / n) × ∏_{i∈J} p_i
Final result P = P_t^β × P_i^(1−β), where β is a confidence coefficient used to balance the impact of appearance and character recognition on the final confidence.
The medicine real-time identification method based on deep learning provided by this example detects the appearance classification confidence and the character recognition confidence. Let p_i be the confidence of the i-th character; if p_i is greater than α, the character prediction result is considered correct. Let J be the set of correctly predicted characters, where the total number of characters is n and the number of correct characters is m. The final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m / n) × ∏_{i∈J} p_i
Final result P = P_t^β × P_i^(1−β), where α is a preset character confidence threshold and β is a confidence coefficient. With the medicine real-time identification method based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
As shown in fig. 8, fig. 8 is a functional block diagram of a first example of a medicine real-time identification system based on deep learning according to the present invention, in this example, the medicine real-time identification system based on deep learning includes a video capture module 10, a medicine and text detection module 20, a detection result fusion module 30, a text region cutting module 40, a text recognition module 50, and a recognition result fusion module 60, where the video capture module 10 is configured to load a first image captured by a first camera and a second image captured by a second camera, respectively; the medicine and character detection module 20 is configured to process the loaded first image and the loaded second image respectively by using a pre-trained model, and obtain medicine and character detection results in the first image and the second image respectively, where the medicine and character detection results include a medicine classification, a medicine region, and a text region; a detection result fusion module 30, configured to fuse the acquired drug classifications and drug regions, and output a classification confidence based on appearance and a fused drug region; a text region cutting module 40, configured to perform text region cutting on the first image and the second image according to the obtained text region and the output fused medicine region; the character recognition module 50 is configured to perform character recognition on the first image and the second image after the text region is cut by using a pre-trained model, and recognize character information in the first image and the second image after the text region is cut; and a recognition result fusion module 60, configured to fuse the output classification confidence based on the appearance and the recognized text information, and finally output a recognition result based on the appearance and the text information.
The video acquisition module 10 acquires the training samples collected by the video acquisition box and the input real-time drug identification video. Referring to fig. 6, the video acquisition box adopts a back-high, front-low posture for convenient operation. The inner wall of the video acquisition box is made of a non-reflective white material; the upper portion is provided with a strip-shaped light source 100 for providing stable illumination conditions and a shading cover 200 which covers the upper portion of the box and prevents light from the strip-shaped light source 100 from directly irradiating the operator's eyes and causing discomfort; the bottom is provided with a medicine placing area 300, and a first camera 400 and a second camera 500 are symmetrically installed on two sides of the medicine placing area 300 at a height of half the average height of common medicines. The video acquisition box controls the acquisition environment on the premise of convenient operation, so that the environment is relatively stable and the robustness of medicine identification is improved.
The training data preparation mainly comprises three aspects: firstly, collecting training pictures, secondly, marking the training pictures, and thirdly, segmenting the data set.
(1) Training pictures are collected. Firstly, two high-definition cameras, the first camera 400 and the second camera 500, continuously capture video of the operator placing medicines in the medicine placing area at will, in different combinations, different directions, and different positions. Secondly, after all the medicines are collected, medicine bottles and hands are detected and recognized over the whole video using medicine bottle and hand detection models trained on public data sets. Images containing medicine bottles but no hands are selected as training images at certain intervals. This approach has two advantages. Firstly, the collection mode of the training pictures is exactly the same as the way an operator uses the system to identify drugs, avoiding a mismatch between the pictures seen during training and during inference. Secondly, video acquisition requires no manual intervention for shooting, so the acquisition speed is high and the workload is low.
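The frame-selection rule above can be sketched as a simple filter over per-frame detection results. The metadata field names (`has_bottle`, `has_hand`) and the interval value are hypothetical, introduced here for illustration only.

```python
def select_training_frames(frames_meta, interval=30):
    # frames_meta: one dict per video frame, e.g.
    # {"has_bottle": bool, "has_hand": bool} (assumed field names).
    # Keeps frames that show medicine bottles but no hands, sampled
    # at a fixed minimum frame interval, mirroring how training
    # images are picked from the capture video.
    selected, last = [], -interval
    for idx, meta in enumerate(frames_meta):
        if meta["has_bottle"] and not meta["has_hand"] and idx - last >= interval:
            selected.append(idx)
            last = idx
    return selected
```

Frames where a hand is still visible are skipped entirely, so the selected images only show settled medicine arrangements.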
(2) Training pictures are labeled. Labeling covers not only the bounding box and category of each medicine, but also the text box and the characters of the visible part of the medicine name. Each label box should tightly surround the whole medicine region or the text region of the medicine name, and the class name of a text box is the characters of its visible part.
(3) The data set is segmented. The annotated data is divided into a training set and a validation set according to a certain proportion. The training set is mainly used to train the model, while the validation set verifies the model's effect and guides the adjustment of related parameters.
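A minimal split along these lines is sketched below; the 80/20 ratio and fixed seed are assumptions for reproducibility, not values stated in the patent.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    # Randomly partitions the annotated samples into a training set
    # and a validation set at a fixed ratio. A seeded RNG keeps the
    # split reproducible across runs.
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids ordering bias, e.g. all images of one medicine collected consecutively ending up in the same partition.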
The preprocessing of the loaded first image and the loaded second image by the medicine and character detection module 20 mainly comprises three tasks: the first is to detect the location of the medicine. The second is to identify the type of the medicine according to the appearance information and output an appearance classification confidence. The third is to output the position of the medicine name text box and provide a text slice for character recognition. In this example, a general-purpose real-time object detection network is used, such as YOLOX (YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, 2021.). The structure of the network is shown in fig. 5, and data expansion methods such as Mosaic and MixUp are adopted in the training process to increase the expression capability of the model. All text boxes are treated as one class of targets so that the network locates the text boxes.
The detection result fusion module 30 fuses the detection and recognition results from the two cameras, i.e., the first camera 400 and the second camera 500, and outputs the classification confidence based on appearance. The first image and the second image respectively acquired by the first camera 400 and the second camera 500 are processed simultaneously during inference. Because the first camera 400 and the second camera 500 are symmetrically installed, the positions of the medicines in the images can be roughly aligned by horizontally flipping the second image acquired by the second camera, and then the association relationship of each medicine between the first image and the second image is obtained by calculating the IoU (Intersection over Union) of all detection boxes. Objects whose IoU is greater than 0.5 and which belong to the same category are regarded as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the two images.
The text region cutting module 40 cuts the text region in the first image and the second image as an input of the character recognition module according to the detection result. And according to the fused detection result, finding out a text box inside the same medicine area in the first image and the second image, associating the text box with the medicine type, cutting out a text slice in the original image according to the position of the text box, and normalizing the height of the slice to 32 pixels.
The character recognition module 50 recognizes the characters of the medicine name, further improving the accuracy of the whole medicine identification. The input of the module is the text region slices output by the detection result fusion module; the heights of all slices are normalized to 32 pixels, and the recognized Chinese and English characters and their confidences are output. The character recognition module may employ the CRNN algorithm (An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017).
The recognition result fusion module 60 fuses the appearance and character recognition information to output the final recognition result of the medicine. The detection result fusion module outputs the appearance classification confidence P_i, and the character recognition module outputs the character recognition confidence. For arbitrarily placed drugs, the drug name may appear in the images acquired by both cameras, so the character recognition results from the two pictures need to be fused. Let p_i be the confidence of the i-th character; if p_i is greater than α, the character prediction result is considered correct. Let J be the set of correctly predicted characters, where the total number of characters is n and the number of correct characters is m. The final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m / n) × ∏_{i∈J} p_i
Final result P = P_t^β × P_i^(1−β), where β is a confidence coefficient used to balance the impact of appearance and character recognition on the final confidence.
The medicine real-time identification system based on deep learning provided by this example loads a first image collected by a first camera and a second image collected by a second camera respectively; processes the loaded first image and second image respectively by using a pre-trained model, and obtains the medicine and character detection results in the first image and the second image respectively, wherein the medicine and character detection results comprise the medicine classification, the medicine region, and the text region; fuses the obtained medicine classification and medicine region, and outputs a classification confidence based on appearance and a fused medicine region; performs text region segmentation on the first image and the second image according to the obtained text region and the output fused medicine region; performs character recognition on the first image and the second image after text region cutting by using a pre-trained model, and recognizes the character information therein; and fuses the output classification confidence based on appearance and the recognized character information, finally outputting the recognition result based on the appearance and character information. With the medicine real-time identification system based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Further, please refer to fig. 9, where fig. 9 is a functional block diagram of a second example of the drug real-time recognition system based on deep learning according to the present invention, and on the basis of the first implementation, the drug real-time recognition system based on deep learning further includes a training module 70, and the training module 70 is configured to train a model in a training phase, where the model includes a drug and character detection network model and a character recognition network model.
The training module 70 cuts the images into sample images of consistent size to facilitate processing by the network and loads the images and associated annotations.
Referring to fig. 10 and fig. 11, in the deep learning-based real-time drug identification system provided in this embodiment, the training module 70 includes a drug and character detection network model training module and a character recognition network model training module, and the drug and character detection network model training module includes a cutting unit 71, a data expansion unit 72, a first loading unit 73 and a first training unit 74, where the cutting unit 71 is configured to cut an image to be trained into sample images with consistent sizes and load related labels; a data expansion unit 72, configured to perform horizontal or turning actions on the sample image loaded with the relevant annotations to perform data expansion work; a first loading unit 73, configured to load a pre-training model on the COCO dataset; and the first training unit 74 is used for increasing the data volume by using different data enhancement modes in the training process, and training the medicine and character detection network model on the training set until convergence. The character recognition network model training module comprises a synthesis unit 75, a second loading unit 76, a merging unit 77 and a second training unit 78, wherein the synthesis unit 75 is used for synthesizing text pictures of common medicines by using a character synthesis program so as to increase the training data volume; a second loading unit 76 for loading pre-training models trained on other large text recognition data sets; a merging unit 77, configured to cut out a text recognition training set in the real training set according to the text label, and merge the text recognition training set with the synthesized drug name data set; a second training unit 78 for training the character recognition network model in the merged text recognition training set until convergence.
In the deep learning-based medicine real-time recognition system provided by this example, by training the medicine and character detection network and the character recognition network, the loaded first image and second image are respectively processed by using the pre-trained models, and the medicine and character detection results in the first image and the second image are obtained respectively. With the medicine real-time identification system based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Preferably, referring to fig. 12, fig. 12 is a functional module schematic diagram of an example of the detection result fusion module shown in fig. 8, and the detection result fusion module 30 includes a calculating unit 31 and an object identifying unit 32, where the calculating unit 31 is configured to calculate the intersection ratio of all detection boxes to obtain the association relationship of each medicine between the first image and the second image; and the object identifying unit 32 is configured to regard objects as the same object if their intersection ratio is greater than a preset intersection threshold and they belong to the same class, the final classification confidence based on appearance being the product of the classification confidences of the same medicine in the first image and the second image.
The calculating unit 31 calculates the IoU (Intersection over Union) of all detection boxes to obtain the association relationship of each medicine between the first image and the second image.
The object identifying unit 32 regards objects whose IoU is greater than 0.5 and which belong to the same category as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the two images.
In the medicine real-time identification system based on deep learning provided by this example, the association relationship of each medicine between the first image and the second image is obtained by calculating the intersection ratio of all detection boxes; if the intersection ratio is greater than a preset intersection threshold and the objects belong to the same category, they are regarded as the same object, and the final classification confidence based on appearance is the product of the classification confidences of the same medicine in the first image and the second image. With the medicine real-time identification system based on deep learning, an operator only needs to roughly place the medicines in the designated area, real-time accurate identification of the medicine types is realized automatically through the appearance information and character information, and multiple medicines can be identified at one time. Combined with a prescription two-dimensional code recognition system, the prescription drugs can be checked in real time in the dispensing link, which increases checking precision and reduces the burden on workers.
Further, referring to fig. 13, fig. 13 is a functional module schematic diagram of an example of the recognition result fusion module shown in fig. 8. In this example, the recognition result fusion module 60 includes a detection unit 61 and a character recognition unit 62. The detection unit 61 is used for detecting the appearance classification confidence P_i and the character recognition confidence. The character recognition unit 62 lets p_i denote the confidence of the i-th character; if p_i > α, the prediction of that character is considered correct. With J the set of correctly predicted characters, n the total number of characters, and m the number of correct characters, the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m/n) × ∏_{i∈J} p_i
The final result is P = P_t^β × P_i^(1−β), where α is a preset character confidence threshold and β is a confidence coefficient.
The detection unit 61 outputs the final identification result of the medicine by fusing the appearance and the character identification information: the appearance classification confidence P_i output by the detection result fusion module and the character recognition confidence output by the character recognition module. For arbitrarily placed drugs, the drug name may appear in the image acquired by either camera, so the character recognition results from the two pictures need to be fused.
The character recognition unit 62 lets p_i denote the confidence of the i-th character; if p_i > α, the prediction of that character is considered correct. With J the set of correctly predicted characters, n the total number of characters, and m the number of correct characters, the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m/n) × ∏_{i∈J} p_i
The final result is P = P_t^β × P_i^(1−β), where β is the confidence coefficient used to balance the influence of appearance and text recognition on the final confidence.
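As an illustrative sketch (not the patent's implementation), the character recognition confidence P_t and the final fused confidence P described above can be computed as follows; the default values α = 0.8 and β = 0.5 are hypothetical choices for the example, and the function names are assumptions. Note that the description uses lowercase p_i for per-character confidences and uppercase P_i for the appearance classification confidence.

```python
import math

def text_confidence(char_confs, alpha=0.8):
    """P_t = (m/n) * product of the confidences of the characters
    whose confidence exceeds the threshold alpha (the set J),
    where n is the total number of characters and m = |J|."""
    n = len(char_confs)
    correct = [p for p in char_confs if p > alpha]  # the set J
    m = len(correct)
    if n == 0 or m == 0:
        return 0.0
    return (m / n) * math.prod(correct)

def final_confidence(p_t, p_i, beta=0.5):
    """P = P_t^beta * P_i^(1-beta): beta balances the influence of
    text recognition (P_t) and appearance classification (P_i)."""
    return (p_t ** beta) * (p_i ** (1 - beta))
```

For instance, with per-character confidences [0.9, 0.95, 0.5] and α = 0.8, two of three characters count as correct, giving P_t = (2/3) × 0.9 × 0.95 = 0.57; the geometric weighting in `final_confidence` then blends P_t with the appearance confidence P_i.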
The deep-learning-based medicine real-time recognition system provided by this embodiment detects the appearance classification confidence and the character recognition confidence; it lets p_i denote the confidence of the i-th character and, if p_i > α, considers the prediction of that character correct. With J the set of correctly predicted characters, n the total number of characters, and m the number of correct characters, the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m/n) × ∏_{i∈J} p_i
The final result is P = P_t^β × P_i^(1−β), where α is a preset character confidence threshold and β is a confidence coefficient. With this system, an operator only needs to place the medicines roughly in the designated area, and the medicine types are identified accurately, automatically and in real time from the appearance information and the character information; multiple medicines can be identified at one time. Combined with the prescription two-dimensional code recognition system, prescription drugs can be checked in real time during the dispensing step, which increases checking precision and reduces the burden on the staff.
While preferred examples of the present invention have been described, additional variations and modifications in those examples may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A medicine real-time identification method based on deep learning is characterized by comprising the following steps:
respectively loading a first image collected by a first camera and a second image collected by a second camera;
respectively processing the loaded first image and the loaded second image by using a pre-trained model, and respectively obtaining medicine and character detection results in the first image and the second image, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area;
fusing the obtained medicine classification and the medicine area, and outputting a classification confidence based on the appearance and the fused medicine area;
performing text region segmentation on the first image and the second image according to the acquired text region and the output fused medicine region;
respectively carrying out character recognition on the first image and the second image after the text region is cut by utilizing a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut;
and fusing the output classification confidence degree based on the appearance and the recognized character information, and finally outputting the recognition result based on the appearance and the character information.
2. The method for real-time drug identification based on deep learning of claim 1, wherein the step of loading the first image collected by the first camera and the second image collected by the second camera respectively comprises:
training the model in a training stage, wherein the model comprises a medicine and text region detection network model and a character recognition network model;
the step of training the medicine and character detection network model comprises the following steps:
cutting an image to be trained into sample images with consistent sizes, and loading related labels;
performing horizontal flipping or rotation on the sample images loaded with the related labels to carry out data expansion;
loading a pre-training model on the COCO data set;
in the training process, different data enhancement modes are used for increasing the data volume, and the medicine and character detection network model is trained on a training set until convergence;
the step of training the character recognition network model comprises:
synthesizing text pictures of common medicines by using a character synthesis program to increase the training data volume;
loading a pre-training model trained on other large text recognition data sets;
cutting out a text recognition training set in the real training set according to the text label, and combining the text recognition training set with the synthesized drug name data set;
and training the character recognition network model in the combined text recognition training set until convergence.
3. The method for real-time drug identification based on deep learning of claim 1, wherein the step of loading the first image collected by the first camera and the second image collected by the second camera respectively comprises:
acquiring a real-time medicine video collected by a video acquisition box, wherein the inner wall of the video acquisition box is made of a non-reflective white material; a strip-shaped light source for providing stable illumination conditions and a shading cover covering the top of the video acquisition box for preventing the strip-shaped light source from shining directly into the operator's eyes and causing discomfort are arranged above the video acquisition box; a medicine placing area is arranged at the bottom of the video acquisition box; and a first camera and a second camera are symmetrically arranged on two sides of the medicine placing area and mounted at a height equal to half of the average height of common medicines.
4. The method for real-time drug identification based on deep learning of claim 1, wherein the step of fusing the acquired drug classifications and drug regions and outputting the appearance-based classification confidence and fused drug regions comprises:
calculating the intersection and combination ratio of all the detection frames to obtain the association relationship between the first image and the second image of each medicine;
if the intersection ratio is greater than a preset intersection threshold and the objects belong to the same class, regarding them as the same object, the final appearance-based classification confidence being the product of the classification confidences of the same medicine in the first image and the second image.
5. The method for real-time drug identification based on deep learning of claim 1, wherein the step of fusing the output appearance-based classification confidence and the identified text information and finally outputting the identification result based on the appearance and the text information comprises:
detecting the appearance classification confidence P_i and the character recognition confidence;
letting p_i be the confidence of the i-th character; if p_i > α, the prediction of that character is considered correct; wherein J is the set of correctly predicted characters, the total number of characters is n, the number of correct characters is m, and the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m/n) × ∏_{i∈J} p_i
the final result being P = P_t^β × P_i^(1−β), wherein α is a preset character confidence threshold and β is a confidence coefficient.
6. A medicine real-time identification system based on deep learning is characterized by comprising:
the video acquisition module (10) is used for respectively loading a first image acquired by the first camera and a second image acquired by the second camera;
the medicine and character detection module (20) is used for processing the loaded first image and the loaded second image respectively by utilizing a pre-trained model and respectively acquiring medicine and character detection results in the first image and the second image, wherein the medicine and character detection results comprise medicine classification, a medicine area and a text area;
a detection result fusion module (30) for fusing the acquired medicine classification and the medicine region and outputting a classification confidence based on appearance and a fused medicine region;
a text region cutting module (40) for cutting the text regions of the first image and the second image according to the obtained text regions and the output fused medicine regions;
the character recognition module (50) is used for respectively carrying out character recognition on the first image and the second image after the text region is cut by utilizing a pre-trained model, and recognizing character information in the first image and the second image after the text region is cut;
and the recognition result fusion module (60) is used for fusing the output classification confidence degree based on the appearance and the recognized character information and finally outputting a recognition result based on the appearance and the character information.
7. The deep learning based medicine real-time identification system according to claim 6, wherein the deep learning based medicine real-time identification system further comprises:
a training module (70) for training the model in a training phase, the model comprising a drug and text detection network model and a text recognition network model;
the training module (70) comprises a medicine and character detection network model training module and a character recognition network model training module,
the medicine and character detection network model training module comprises:
the cutting unit (71) is used for cutting the image to be trained into sample images with consistent sizes and loading related labels;
the data expansion unit (72) is used for performing horizontal flipping or rotation on the sample images loaded with the related labels to carry out data expansion;
a first loading unit (73) for loading the pre-trained model on the COCO data set;
the first training unit (74) is used for increasing the data volume by using different data enhancement modes in the training process, and training the medicine and character detection network model on a training set until convergence;
the character recognition network model training module comprises:
a synthesizing unit (75) for synthesizing a text picture of a usual medicine using a character synthesizing program to increase the amount of training data;
a second loading unit (76) for loading the pre-training model trained on other large text recognition data sets;
a merging unit (77) for cutting out a text recognition training set from the real training set according to the text label, and merging the text recognition training set with the synthesized drug name data set;
and a second training unit (78) for training the character recognition network model in the combined text recognition training set until convergence.
8. The deep learning based medicine real-time identification system according to claim 6, wherein the video capture module (10) comprises:
the video acquisition unit is used for acquiring a real-time medicine identification video collected by a video acquisition box, wherein the inner wall of the video acquisition box is made of a non-reflective white material; a strip-shaped light source for providing stable illumination conditions and a shading cover covering the top of the video acquisition box for preventing the strip-shaped light source from shining directly into the operator's eyes and causing discomfort are arranged above the video acquisition box; a medicine placing area is arranged at the bottom of the video acquisition box; and a first camera and a second camera are symmetrically arranged on two sides of the medicine placing area and mounted at a height equal to half of the average height of common medicines.
9. The deep learning based medicine real-time identification system according to claim 6, wherein the detection result fusion module (30) comprises:
the calculating unit (31) is used for calculating the intersection ratio of all the detection frames to acquire the association relation of each medicine in the first image and the second image;
and the object identification unit (32) is used for regarding two detections as the same object if their intersection ratio is greater than a preset intersection threshold and they belong to the same class, the final appearance-based classification confidence being the product of the classification confidences of the same medicine in the first image and the second image.
10. The deep learning based medicine real-time identification system according to claim 6, wherein the identification result fusion module (60) comprises:
a detection unit (61) for detecting the appearance classification confidence P_i and the character recognition confidence;
a character recognition unit (62) for letting p_i be the confidence of the i-th character; if p_i > α, the prediction of that character is considered correct; wherein J is the set of correctly predicted characters, the total number of characters is n, the number of correct characters is m, and the final character recognition confidence is the product of the proportion of correctly predicted characters and the confidences of all correctly predicted characters, namely
P_t = (m/n) × ∏_{i∈J} p_i
the final result being P = P_t^β × P_i^(1−β), wherein α is a preset character confidence threshold and β is a confidence coefficient.
CN202210560976.9A 2022-05-20 2022-05-20 Medicine real-time identification method and system based on deep learning Pending CN114937176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210560976.9A CN114937176A (en) 2022-05-20 2022-05-20 Medicine real-time identification method and system based on deep learning


Publications (1)

Publication Number Publication Date
CN114937176A true CN114937176A (en) 2022-08-23

Family

ID=82865137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210560976.9A Pending CN114937176A (en) 2022-05-20 2022-05-20 Medicine real-time identification method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN114937176A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422708A (en) * 2023-12-04 2024-01-19 广州方舟信息科技有限公司 Medicine boxing detection method, device, electronic equipment and storage medium
CN117422708B (en) * 2023-12-04 2024-06-18 广州方舟信息科技有限公司 Medicine boxing detection method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110298291B (en) Mask-RCNN-based cow face and cow face key point detection method
WO2018111940A1 (en) Segmenting ultrasound images
CN111353555A (en) Label detection method and device and computer readable storage medium
CN111985477B (en) Animal on-line core claim method, device and storage medium based on monocular camera
CN111382622A (en) Medicine identification system based on deep learning and implementation method thereof
CN114399672A (en) Railway wagon brake shoe fault detection method based on deep learning
CN112446852B (en) Tunnel imaging plane display method and defect intelligent recognition system
CN113673500A (en) Certificate image recognition method and device, electronic equipment and storage medium
CN114937176A (en) Medicine real-time identification method and system based on deep learning
Anisuzzaman et al. A mobile app for wound localization using deep learning
CN116092179A (en) Improved Yolox fall detection system
Vasavi et al. Medical assistive system for automatic identification of prescribed medicines by visually challenged from the medicine box using invariant feature extraction
CN116959099B (en) Abnormal behavior identification method based on space-time diagram convolutional neural network
CN113505629A (en) Intelligent storage article recognition device based on light weight network
US11430236B1 (en) Computer-implemented segmented numeral character recognition and reader
CN111753618A (en) Image recognition method and device, computer equipment and computer readable storage medium
Suksawatchon et al. Shape Recognition Using Unconstrained Pill Images Based on Deep Convolution Network
CN115601675A (en) Surgical instrument counting method based on Swin-transducer and yolov5 model integration
US11978197B2 (en) Inspection method for inspecting an object and machine vision system
CN113392833B (en) Industrial ray film image type number identification method
CN202795378U (en) Medicine rechecking device
CN112232272B (en) Pedestrian recognition method by fusing laser and visual image sensor
CN114638973A (en) Target image detection method and image detection model training method
CN113936781A (en) Efficient automatic medicine checking method
CN113239931A (en) Logistics station license plate recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination