WO2020103676A1 - Image recognition method, device, system, and storage medium
- Publication number
- WO2020103676A1 (PCT/CN2019/115117)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- area
- model
- target
- training
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
- G06T7/0014—Biomedical image inspection using an image reference approach
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30028—Colon; Small intestine
- G06T2207/30032—Colon polyp
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Description
- This application relates to the Internet field, specifically to image recognition technology.
- Embodiments of the present application provide an image recognition method, device, system, and storage medium, to at least solve the technical problem in the related art of low accuracy in detecting target objects.
- An image recognition method applied to an electronic device includes: acquiring a first image; dividing the first image into a plurality of first regions through a target model, and searching for a target region among the candidate regions centered on points in the first regions, where the target region is the candidate region in which the target object is located in the first image, and the target model is a neural network model pre-trained to identify the region where a target object is located in an image, trained using positive samples that identify the region where the target object is located and negative samples that identify the region where noise is located; and identifying the target region in the first image.
- An image recognition device includes: a first acquisition unit for acquiring a first image; a search unit for dividing the first image into a plurality of first regions through a target model and finding the target region among the candidate regions centered on points in the first regions, where the target region is the candidate region in which the target object is located in the first image, and the target model is a neural network model pre-trained to identify the region where a target object is located in an image; and an identification unit for identifying the target region in the first image.
- A storage medium is further provided, where the storage medium includes a stored program, and the above method is executed when the program runs.
- An electronic device is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the foregoing method through the computer program.
- The first image is divided into a plurality of first regions through the target model, the target region is searched among the candidate regions centered on points in the first regions, and the target region is identified in the first image, where the target region is the candidate region in which the target object is located. Because the target model is trained using training images carrying identification information, including positive samples that identify the area where the target object is located and negative samples that identify the area where noise is located, the technical solution of the present application can use the target model to accurately identify the target object even when noise is present in the first image, avoiding the information distortion caused by filtering noise out of the first image, which improves the accuracy of target object detection.
- FIG. 1 is a schematic diagram of a hardware environment of an image recognition method according to an embodiment of the present application
- FIG. 2 is a flowchart of an optional image recognition method according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of an optional data clustering according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of an optional data clustering according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of an optional abnormality identification solution according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of an optional abnormality identification solution according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of an optional abnormal recognition model according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of an optional abnormal candidate frame according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of an optional abnormal recognition model according to an embodiment of the present application.
- FIG. 10 is a schematic diagram of an optional image recognition device according to an embodiment of the present application.
- FIG. 11 is a structural block diagram of a terminal according to an embodiment of the present application.
- the above image recognition method may be applied to the hardware environment composed of the server 101 and / or the terminal 103 as shown in FIG. 1.
- The server 101 is connected to the terminal 103 through a network and can be used to provide services (such as application services and anomaly detection services) to the terminal 103 or to a client installed on the terminal 103. A database 105, which may be provided on the server or independent of it, provides a data storage service for the server 101.
- the above network includes but is not limited to: a wide area network, a metropolitan area network, or a local area network, and the terminal 103 is not limited to a PC, mobile phone, tablet computer, or the like.
- FIG. 2 is a flowchart of an optional image recognition method according to an embodiment of the present application. As shown in FIG. 2, the method may include the following steps:
- Step S202: the terminal acquires a first image.
- the first image may be a single image captured by a visible light camera, an infrared camera, X-ray, CT, fluoroscopy, etc.
- the first image may also be an image frame in a video stream captured by the above method;
- The first image may specifically be an image obtained by photographing an organism in the above manner; the organism may be, for example, a person, an animal, or a plant.
- Step S204: the terminal divides the first image into a plurality of first areas through the target model and searches for the target area among the candidate areas centered on points in the first areas, where the target area is the candidate area in the first image in which the target object is located.
- the target model is a neural network model that is pre-trained to identify the area where the target object is located from the image.
- The target model is trained using positive samples that identify the area where the target object is located and negative samples that identify the area where noise is located.
- The above target model may be pre-trained and can be used to identify abnormal parts of the organs and tissues of an organism. When the target model is used for such internal anomaly detection, it is assumed that the internal characteristics of an abnormal biological object differ from those of a normal biological object. Based on this idea, a "feature set" of normal biological objects and/or a "feature set" of abnormal biological objects is established; the characteristics of the biological object under examination are compared with these feature sets, and when the statistical rule is violated, for example when the characteristics match the "feature set" of abnormal biological objects, there may be an abnormality inside the biological object.
- The target model can be a deep neural network model, such as the YOLO (You Only Look Once) model, a deep-learning neural network for target detection.
- The above first image is divided into a plurality of first areas with regular shapes, such as squares or rectangles. The purpose of this division is to determine, taking the first areas as units, whether there is an abnormal part in the candidate areas centered on points in each first area. The candidate area may also be an area with a regular shape, such as a square, a rectangle, a circle, or a diamond; a loose sketch of this grid-style generation of candidate areas is given below.
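- The following minimal sketch (illustrative only, not code from the application) generates anchor-style candidate areas centered on the cells of a regular grid; the grid size and anchor dimensions are assumptions for the example:

```python
# For each cell of an s x s grid over the image, emit candidate areas centered
# on the cell center, one per assumed anchor size (width, height).
def candidate_areas(img_w, img_h, s=13, anchors=((32, 32), (96, 64))):
    cell_w, cell_h = img_w / s, img_h / s
    boxes = []
    for row in range(s):
        for col in range(s):
            cx, cy = (col + 0.5) * cell_w, (row + 0.5) * cell_h  # cell center
            for aw, ah in anchors:
                boxes.append((cx - aw / 2, cy - ah / 2, cx + aw / 2, cy + ah / 2))
    return boxes

print(len(candidate_areas(416, 416)))  # 13 * 13 cells * 2 anchors = 338 boxes
```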
- The trained target model has the ability to recognize noise in various scenes, which avoids identifying scene noise as the area where the target object is located and improves stability, robustness, and reliability across scenarios.
- In the case where the target model is used to identify abnormal parts of the organs and tissues of a living body, the abnormal part may be of one specific abnormal type or of one of multiple abnormal types.
- The first image is divided into multiple first areas through the target model, and the target area is found among the candidate areas centered on points in the first areas. In other words, a single model (i.e., the target model) completes both the positioning and the classification of abnormal parts, so that positioning and classification are no longer separated, which avoids the information distortion and error amplification that may exist in multi-module, multi-level pipelines. The model structure is also simpler, which makes it easier to deploy and maintain; a single model is more efficient, which keeps the video frame rate smooth and provides high real-time performance and high availability.
- Step S206: the terminal identifies the target area in the first image so that the user can see the area, which assists the user in judging the target object identified by the target model.
- The above terminal may be a medical device used for diagnosis, or a remote user device that assists diagnosis, such as a mobile terminal (e.g., a mobile phone or tablet) or a personal computer (PC) used by the user.
- When the target area is identified in the first image, if there is only one abnormal type, the target area can be framed directly. If there are multiple abnormal types, then in addition to framing the target area, the abnormal type of the abnormal part in each target area and the confidence of that abnormal type can also be indicated.
- The above target model thus outputs both the coordinate position of the target area and the abnormal type information. In other words, the target model handles the positioning and classification problems at the same time; in this application, positioning and classification are performed by the same model (i.e., the target model). A sketch of how such an output could be rendered on the image follows.
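- As a minimal illustration of the identification step (assuming OpenCV; the box format, type label, and confidence value are hypothetical inputs, not outputs defined by the application):

```python
import cv2

def draw_target_area(frame, box, abnormal_type, confidence):
    # Frame the target area and annotate it with its abnormal type and confidence.
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    label = f"{abnormal_type} {confidence:.2f}"
    cv2.putText(frame, label, (x1, max(y1 - 5, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    return frame
```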
- Steps S202 to S206 of the present application can be applied to anomaly detection in animal and plant biological objects, such as the identification of tumors in animals. For example, the above steps can be applied to video captured inside the human body to detect tumor abnormalities and to identify the area where the tumor is located in each image frame, which assists the doctor in diagnosis. Detection of abnormal parts inside animals, plants, and the like can likewise be realized through the above scheme.
- the above technical solution is explained by taking the image recognition method of the embodiment of the present application executed by the terminal 103 as an example.
- The target model may be integrated into the system of the terminal 103 offline, or installed on the terminal in the form of an application, so that the terminal can perform anomaly recognition offline.
- the image recognition method according to the embodiment of the present application may also be jointly executed by the server 101 and the terminal 103.
- Alternatively, the above target model may be set on the server 101, and the terminal 103 may call the server, through a public account (such as a public account in an instant messaging application), a web page, an application, an applet, or other forms, to use the target model inside the server to provide services. For example, a user may follow the public account that provides the service and, through the public account, send the video stream to the server 101, thereby providing the image frames that need to be processed; the server then returns the recognition results to the terminal to assist the user in making abnormality judgments.
- Through the above steps, the first image is divided into a plurality of first regions by the target model, the target region is searched among the candidate regions centered on points in the first regions, and the target region (the candidate region in the first image where the target object is located) is identified in the image.
- Because the target model is trained using positive samples that identify the area where the target object is located and negative samples that identify the area where noise is located, the technical solution can use the target model to accurately identify the target object in the first image even in the presence of noise. This solves the technical problem in the related art of low accuracy in detecting target objects and achieves the technical effect of improving detection accuracy.
- a user can download an application for assisting judgment on a mobile terminal, open the application when abnormality detection is required, and the application collects a video stream through the camera of the mobile terminal and transmits it to the server.
- A first image frame is obtained from the video stream, and the first image frame is an image of the organism's organ tissue obtained by photographing the organism. The first image frame is divided into a plurality of first regions through the target model, and the target region is searched among the candidate regions centered on points in the first regions, where the target region is the candidate region in the first image frame in which an abnormal part of the organism's organ tissue is located, and the target model is a neural network model pre-trained to identify, from image frames, the region where an abnormal part of organ tissue is located.
- An optional model training scheme includes the following steps 1-step 2:
- Step 1 Obtain a training image including a positive sample that identifies the area where the abnormal part is located and a negative sample that identifies the area where the noise is located.
- the noise is generated when shooting inside the biological object.
- Step 2 Use the training image to train the parameters in the original model to obtain a recognition model for identifying abnormal parts.
- the recognition model includes the target model.
- Optionally, step 2 (training the parameters in the original model using the training images to obtain a recognition model for abnormal part recognition) may include the following steps:
- Step 21: Input the color data of the training images into the original model, and use the color data of the training images to train the parameters in the original model.
- Optionally, using the color data of the training images to train the parameters in the original model includes: using the color data of the training images to determine the image features used to describe the abnormal part, where the image features may be color features, texture features, shape features, spatial relationship features, and the like.
- For example, if parts of a mountain with a loose structure are the abnormal parts, the texture formed by the loose structure in the captured image is the texture learned by the model. The image features are then taken as the input of the fully connected layer of the original model, and the area where the abnormal part is located in the training image as the output, to solve for the values of the parameters in the original model.
- Each layer inside the model can be understood as one or more functions to be fitted. The input of each layer's functions is the output of the previous layer (the input of the first layer is the color data X), and the output Y of the last layer is used to represent the information of the abnormal part.
- The original model includes two neural network parts: one is a neural network that performs abnormality recognition and positioning using image features, and the other is a neural network for image feature extraction. The two can be trained together, or the feature extraction network can be trained separately in advance.
- An optional training method is as follows. Using the color data of the training images to determine the image features used to describe the abnormal part includes: setting the values of the parameters of the convolutional layers in the original model to be the same as the values of the parameters of the convolutional layers in a pre-trained feature extraction model.
- An open data set may be used to train the feature extraction model, such as ImageNet, a computer vision recognition project that is currently the world's largest image recognition database and was established to simulate the human recognition system. The ImageNet data set is used to pre-train the first few layers of the feature extraction model (such as a YOLO model), and transfer learning is then used to initialize the target detection network (such as a YOLO detection network) with the pre-trained parameters. With this training method, the precision/recall and mAP (mean average precision, an evaluation metric) of the finally trained model both increase. The image features used to describe the abnormal parts are then extracted from the color data of the training images through the convolutional layers in the original model. A sketch of this initialization is given below.
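- A minimal sketch of this transfer-learning initialization, assuming a PyTorch-style detector whose backbone layers share parameter names with a classifier pre-trained on ImageNet; `build_detector` and the checkpoint file name are hypothetical:

```python
import torch

detector = build_detector()  # hypothetical YOLO-style detection network
# Assumed to be a state dict of an ImageNet-pre-trained classifier backbone.
pretrained = torch.load("imagenet_classifier.pt")

# Copy only the backbone parameters present in both models with matching shapes;
# the detection-specific layers keep their fresh initialization.
detector_state = detector.state_dict()
shared = {k: v for k, v in pretrained.items()
          if k in detector_state and v.shape == detector_state[k].shape}
detector_state.update(shared)
detector.load_state_dict(detector_state)
```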
- Optionally, when using the color data of the training images to train the parameters in the original model, in addition to fitting the weight parameters of each layer in the original model, the length and width parameters of the candidate regions are determined as follows:
- Step 211: Obtain multiple sets of regional parameters corresponding to multiple positive samples, where each set of regional parameters describes the third region in which the abnormal part identified by the identification information in one positive sample is located. The regional parameters include a first parameter representing the center point of the third region, a second parameter indicating the length of the third region, and a third parameter indicating the width of the third region. The length parameter of the candidate region can be obtained by fitting the multiple second parameters, and the width parameter of the candidate region can be obtained by fitting the multiple third parameters.
- Step 212: For the above multiple sets of regional parameters, to facilitate unified processing, the center points in all regional parameters can be translated to the same point (such as the origin) of a two-dimensional coordinate system; when the center point of a set of regional parameters is translated, its second and third parameters are translated in the same way (i.e., moved by the same distance and in the same direction along the X and Y axes as the first parameter). After translation, the multiple second parameters in the multiple sets of regional parameters are clustered into multiple first data sets (i.e., second parameters that are close to each other are clustered into the same data set); referring to FIG. 3, each point on the X axis corresponds to a second parameter and each dotted frame corresponds to a first data set. Similarly, the multiple third parameters in the multiple sets of regional parameters are clustered into multiple second data sets (i.e., third parameters that are close to each other are clustered into the same data set).
- Optionally, clustering the multiple second parameters in the multiple sets of regional parameters into multiple first data sets includes processing all second parameters as follows. First obtain a target parameter among the multiple second parameters, where the target parameter is an as-yet-unprocessed second parameter. When the target parameter is a core parameter, create a parameter set including the target parameter and the second parameters among the multiple second parameters associated with it, where a core parameter is one for which the number of second parameters whose distance to it is within the first threshold is not less than the second threshold; then find all second parameters that are density-reachable from the core parameter to form one first data set. If the target parameter is an edge parameter (a non-core parameter), skip it and look for the next second parameter, until all second parameters have been processed. In other words, if the number of parameters in the Eps neighborhood of a given target parameter is greater than or equal to MinPts (a preset parameter greater than 1, i.e., the second threshold), that parameter is called a core parameter. A sketch of this clustering loop is given below.
- Optionally, the third parameters may also be clustered in the above manner.
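- The following is a minimal from-scratch sketch of this density-based clustering applied to one-dimensional values such as the box lengths; the eps and min_pts values and the sample lengths are illustrative:

```python
def dbscan_1d(values, eps, min_pts):
    """Cluster scalar values by density; returns index -> cluster id (-1 = noise)."""
    labels = {}
    cluster_id = -1
    for i, v in enumerate(values):
        if i in labels:
            continue                        # already processed
        neighbors = [j for j, w in enumerate(values) if abs(w - v) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1                  # edge point; may later join a cluster
            continue
        cluster_id += 1                     # i is a core point: grow a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels.get(j, -1) != -1:
                continue                    # already assigned to some cluster
            labels[j] = cluster_id
            j_neighbors = [k for k, w in enumerate(values)
                           if abs(w - values[j]) <= eps]
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(k for k in j_neighbors if k not in labels)
    return labels

lengths = [30, 31, 33, 80, 82, 85, 240]      # illustrative translated box lengths
print(dbscan_1d(lengths, eps=5, min_pts=2))  # two clusters; 240 labeled noise (-1)
```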
- Step 213: Acquire a fourth parameter of each of the multiple first data sets and a fifth parameter of each of the multiple second data sets, where the fourth parameter indicates the center of the first data set and the fifth parameter indicates the center of the second data set.
- The fourth parameter is then used as the parameter in the original model that indicates the length of the region where an identified abnormal part is located, and the fifth parameter is used as the parameter that indicates the width of that region. That is, the value of the center of each first data set can be used as a value of the candidate-area length, and the value of the center of each second data set as a value of the candidate-area width, so that the number of candidate-area size combinations is the product of the number m of first data sets and the number n of second data sets, as sketched below.
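- Continuing the illustration, the candidate-area sizes can be formed as the Cartesian product of the clustered length centers and width centers (all values here are made up for the example):

```python
from itertools import product

length_centers = [31.0, 82.3]          # centers of the m first data sets
width_centers = [24.5, 60.1, 110.0]    # centers of the n second data sets

anchors = list(product(length_centers, width_centers))  # m * n = 6 size combinations
print(anchors)
```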
- Step 215: Obtain multiple sets of regional parameters corresponding to multiple positive samples, where each set of regional parameters describes the third region in which the abnormal part identified by the identification information in one positive sample is located. The regional parameters include a first parameter representing the center point of the third region, a second parameter indicating the length of the third region, and a third parameter indicating the width of the third region. The length parameter of the candidate region can be obtained by fitting the multiple second parameters, and the width parameter of the candidate region can be obtained by fitting the multiple third parameters.
- Step 216: For the above multiple sets of regional parameters, to facilitate unified processing, the center points in all regional parameters can be translated to the same point (such as the origin) of a two-dimensional coordinate system, with the second and third parameters translated in the same way (i.e., moved by the same distance and in the same direction along the X and Y axes as the first parameter). After translation, each set of regional parameters corresponds to a point to be fitted in the two-dimensional coordinate system, whose X coordinate is the translated second (or third) parameter and whose Y coordinate is the translated third (or second) parameter; all points to be fitted can then be clustered into multiple point sets. The clustering method is similar to the above method for the second parameters: define a first threshold indicating the separation distance and a second threshold indicating the minimum number of neighboring points, and then cluster according to the above steps. The results are shown in FIG. 4, where each dotted frame corresponds to one point set and the points inside it are the fitted points.
- Step 217: Acquire the center of each of the multiple point sets.
- Step 218: The larger of the X and Y coordinates of the center of each point set is used as the length of the region where an identified abnormal part is located (i.e., the length of the candidate area), and the smaller of the two is used as the width (i.e., the width of the candidate area). In this way, the center of each point set yields one combination of candidate-area length and width, so the number of size combinations equals the number of point sets.
- Step 22: Determine whether the number of training images used has reached a specified threshold.
- Step 23: If the number of training images used has not reached the specified threshold, continue to input the color data of training images into the original model and use it to train the parameters in the original model.
- Step 24: If the number of training images used has reached the specified threshold, use the color data of a verification image as the input of the trained original model to verify whether the model has acquired recognition capability.
- Step 25: When the color data of the verification image is used as the input of the trained original model and the second region identified by the model in the verification image matches the labeled region of the verification image, the trained original model is used as the recognition model, where the second region is the region identified by the trained model in the verification image as the region where the abnormal part inside the biological object is located, and the labeled region is the region where the abnormal part inside the biological object is actually located, as annotated.
- Step 26: When the color data of the verification image is used as the input of the trained original model and the second region identified by the model in the verification image does not match the labeled region, continue to train the parameters in the original model using the color data of the positive samples and the color data of the negative samples, until the second region identified by the model in the verification image matches the labeled region.
- Optionally, the training images used in training have the same resolution as the image frames in the video stream to be recognized. The above embodiment is described by taking image frames of a fixed resolution as an example, in which case only training images of that same resolution are used for training. Alternatively, training images of multiple resolutions can be used to train the parameters of multiple original models separately, to obtain a recognition model corresponding to each resolution. Each recognition model is trained using training images of a single resolution, and any two recognition models are trained with training images of different resolutions; in other words, each model is only used to recognize image frames of one resolution (the resolution of the training images used during its training).
- In this case, a target model whose resolution matches that of the first image (i.e., a recognition model trained at the same or the closest resolution) can be selected from the multiple recognition models, and the first image is divided into a plurality of first regions by that target model. When searching for the target area among the candidate areas centered on points in the first regions, a plurality of fourth areas found by the target model from all candidate areas are obtained, where a fourth area is an area identified by the target model from the first image as containing an abnormal part of the organism's organ tissue. A sketch of this resolution-based model selection follows.
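- A minimal sketch of picking the nearest-resolution model, with placeholder model objects and illustrative resolutions:

```python
# The model objects are placeholders; in practice they would be the networks
# trained at each resolution.
models_by_resolution = {320: "model_320", 608: "model_608", 800: "model_800"}

def pick_model(frame_width):
    # Choose the model trained at the resolution closest to the frame width.
    best = min(models_by_resolution, key=lambda r: abs(r - frame_width))
    return models_by_resolution[best]

print(pick_model(720))  # -> "model_800", the closest training resolution
```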
- For a region whose type is difficult to distinguish, the model tends to generate several different fourth areas. When, among the plurality of fourth areas, there are fourth areas whose center-to-center distance is not greater than a third threshold (a predetermined threshold), the fourth area with the highest confidence among them is taken as one target area; in other words, among adjacent or partially or fully overlapping fourth areas, only the one with the highest confidence is retained (confidence is a parameter output by the model indicating how certain the detection is). A fourth area whose center-to-center distance to every other fourth area is greater than the third threshold is taken as a target area on its own.
- The target area is then identified in the first image, together with the abnormal type of the abnormal part in the target area and the confidence of that abnormal type, so that the user can see the area, which assists the user in judging the abnormal part recognized by the target model.
- The following takes the application of the technical solution of the present application to the detection of abnormalities of the malignant tumor type as an example; abnormality detection for biological objects such as mountains, rivers, and plants is similar and is not repeated here.
- In the related art, the image can first be preprocessed by white light, NBI (Narrow Band Imaging), and other preprocessing modules; an auxiliary diagnostic system then detects the position of the polyp, that is, the polyp candidate frame, based on a sliding-window scheme or the like, after which the nature of the polyp is obtained through the classification method of a candidate-frame property classification module (such as a Support Vector Machine, SVM).
- More recently, CNN networks have largely replaced such methods for polyp property classification. A selective search algorithm (Selective Search) or a region proposal network (RPN, Region Proposal Network) scheme can also be used to generate the candidate areas.
- The above technical solution (one that handles the positioning and classification of abnormal parts separately) mainly has the following shortcomings: 1) the positioning and classification stages are forcibly separated, the effect of the classification stage largely depends on the recall of polyp positioning, and this separation prevents the classification stage from using the characteristics of the whole picture; 2) multi-level, multi-module pipelines amplify errors and distort information layer by layer: such solutions often divide polyp detection into several sub-modules, and the negative effects of information distortion and error amplification between sub-modules become more serious as the number of module levels increases; 3) processing efficiency is low and real-time performance is poor: most such solutions do not satisfy real-time requirements, and in particular the serial hand-off between multiple modules greatly increases processing time; 4) the model is more complicated, which increases the difficulty of tracing problems and is not convenient for routine deployment, maintenance, and evaluation; 5) performance and robustness in production environments are poor.
- an end-to-end colon polyp positioning and property discrimination method is proposed.
- The end-to-end network has the following advantages: 1) positioning and classification are no longer separated; they share one network and are no longer processed in stages; 2) a single network model avoids the information distortion and error amplification that may exist across multiple modules and levels, and the model is simpler and easier to deploy and maintain; 3) it provides high real-time performance and high availability; a single model is more efficient to run, which keeps the video frame rate smooth.
- In addition, the following optimizations are proposed: 1) anchor box optimization: an anchor box parameter initialization method based on DBSCAN (replacing K-means); 2) an adaptive multi-scale prediction network: according to the characteristics of the input picture, the appropriate pre-loaded model is selected for prediction.
- the method proposed in this application is an end-to-end method for localizing and characterizing colonic polyps.
- In the process of examining polyps with an endoscope, the doctor only needs to connect the video stream to the end-to-end network, which locates and discovers the positions of polyps in real time and, optionally, determines the nature of the polyps at the same time.
- the technology and method of this application can provide the following functions and benefits: 1) assisting doctors to locate and find polyps to prevent missed diagnosis; 2) assisting doctors to determine the nature of polyps and improve the accuracy of discrimination.
- The technology and method of the present application can effectively assist the doctor's diagnosis with a relatively simple, high-real-time, high-availability model (involving fewer modules) while ensuring good detection results. It is an end-to-end colorectal detection solution (covering positioning alone, or positioning together with nature discrimination) that can be applied directly in a hospital production environment.
- One of the main purposes of this application is to assist doctors in locating and discovering polyps with a relatively simple, high-real-time and high-availability network model, and to obtain the nature of polyps.
- The overall architecture of the technical solution of the present application, that is, the end-to-end target model, is shown in FIG. 6.
- When the nature is treated as a single category, the scheme acts as a pure polyp positioning scheme; when the nature is classified into multiple categories (such as three categories: non-adenoma, adenoma, and adenocarcinoma), the end-to-end scheme can simultaneously predict the positioning coordinates and the fine-grained polyp type.
- this module uses the optimized YOLO end-to-end model to locate and detect polyps.
- YOLO transforms the target detection problem into a regression problem over candidate-area (bounding box) coordinates and class probabilities. YOLO is a one-stage (single-step positioning) target detection algorithm: a single network directly produces the positioning coordinates and category probabilities of the bounding boxes, to which NMS (Non-Maximum Suppression) is applied to remove duplicate detections.
- the network structure used is shown in FIG. 7 and includes 32 layers.
- Each layer of the network may be a convolutional layer (conv), a pooling layer (max pooling), a fully connected layer (detection), and so on.
- this application proposes the following several optimization methods:
- An optional training optimization method is based on DBSCAN's anchor boxes parameter initialization method:
- Conventionally, anchor boxes are obtained by clustering with the K-means algorithm; in this application they can instead be obtained with the DBSCAN algorithm.
- Compared with DBSCAN, the K-means algorithm has the following disadvantages: 1) K-means requires the number of clusters to be specified in advance, whereas DBSCAN needs only the second threshold minPts and the first threshold eps to determine the number of clusters automatically; 2) K-means is easily affected by noise points, while DBSCAN can identify noise points and is not affected by them, as shown in FIG. 3 and FIG. 4; 3) K-means is strongly affected by the choice of initial points and by cluster shape, while DBSCAN has no such problems and is more adaptable; 4) since DBSCAN is a density-based clustering algorithm, the distance between points (neighborhood computation) must be calculated; for the polyp positioning problem this distance can be based on the IOU (Intersection over Union, a standard measure of how accurately corresponding objects are detected on a given data set) between polyp boxes, so the following distance metric formula can be used:

d(box, centroid) = 1 - IOU(box, centroid)
- The minPts and eps parameters of DBSCAN can be chosen manually or with the help of the k-nearest-neighbor (kNN) algorithm. In the formula above, box represents a candidate box, centroid represents the cluster center, and IOU() computes their intersection over union.
- The DBSCAN-based anchor box method proposed in this application can also be applied to other deep learning algorithms that use anchor boxes (such as Faster R-CNN). On some polyp data sets, especially when the annotated frame sizes are of low quality, this parameter initialization method performs better. An illustrative sketch of the initialization follows.
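- As an illustration only (not the application's code), the DBSCAN-based anchor initialization can be sketched with scikit-learn by feeding it a precomputed 1 - IOU distance matrix over the annotated box sizes; the eps and min_samples values stand in for the first and second thresholds, and the box data are made up:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def iou_wh(a, b):
    # IOU of two boxes aligned at a common center, given as (width, height).
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

boxes = np.array([[30, 28], [33, 30], [80, 75], [82, 78], [300, 10]], float)

# Pairwise 1 - IOU distances, used as the DBSCAN metric.
n = len(boxes)
dist = np.array([[1.0 - iou_wh(boxes[i], boxes[j]) for j in range(n)]
                 for i in range(n)])

labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(dist)
# Each anchor is the mean (w, h) of one cluster; label -1 marks noisy annotations.
anchors = [boxes[labels == c].mean(axis=0) for c in set(labels) if c != -1]
print(anchors)
```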
- An optional training optimization is pre-training and multi-scale training:
- ImageNet is a computer vision recognition project; it is currently the world's largest image recognition database, established to simulate the human recognition system and able to recognize objects in pictures. The first few layers of the YOLO network are first pre-trained on the ImageNet data set, and transfer learning is then used to initialize the YOLO target detection network with the pre-trained parameters. Both the precision/recall and the mAP of the final model increase.
- YOLO's default multi-scale training range is [320, 608]; combined with the actual characteristics of colon polyp endoscopy data, the range can be fine-tuned to [320, 800], making the model more adaptable to actual hospital instruments. Initializing the target detection network parameters with a model trained on a large, high-quality annotated data set such as ImageNet, combined with multi-scale training techniques, helps the model escape local optima and converge closer to the global optimum. A sketch of the multi-scale sampling follows.
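- A minimal sketch of sampling the training resolution over the fine-tuned [320, 800] range, assuming the YOLO convention of re-drawing the input size every few batches and keeping it a multiple of 32 (the network stride); the window length is illustrative:

```python
import random

SCALES = list(range(320, 801, 32))   # 320, 352, ..., 800 (multiples of 32)

def training_size(batch_index, redraw_every=10):
    # Re-draw the input resolution once per window of `redraw_every` batches;
    # seeding per window keeps the size stable within the window.
    rng = random.Random(batch_index // redraw_every)
    return rng.choice(SCALES)

print([training_size(i) for i in range(0, 40, 10)])  # one size per window
```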
- a pre-processing module is added before the polyp location and discovery module to filter noise.
- An optional serving-side optimization is NMS algorithm optimization:
- The original NMS algorithm sorts, deduplicates, and merges within each category and does not merge or re-rank across categories. Since the subdivision types in polyp data sets are easy to confuse, the model tends to generate several different polyp frames for the same polyp region whose type is difficult to distinguish. FIG. 8 shows an example in which there are multiple candidate frames for similar regions. To improve the doctor's actual experience, this application proposes the following NMS optimization: two frames are considered similar when their IOU is greater than a certain threshold or when a center point falls within a certain area, and among similar frames only the result with the highest confidence is finally output, regardless of category. A sketch of this cross-class suppression follows.
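- A minimal sketch of the cross-class suppression described above; the box format (x1, y1, x2, y2, confidence, class_id) and both thresholds are illustrative assumptions:

```python
def cross_class_nms(dets, iou_thresh=0.5, center_thresh=20.0):
    # Keep only the highest-confidence detection among "similar" boxes,
    # regardless of their class.
    dets = sorted(dets, key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(not similar(d, k, iou_thresh, center_thresh) for k in kept):
            kept.append(d)
    return kept

def similar(a, b, iou_thresh, center_thresh):
    return iou(a, b) > iou_thresh or center_dist(a, b) < center_thresh

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def center_dist(a, b):
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

dets = [(10, 10, 50, 50, 0.9, 0), (12, 12, 52, 52, 0.7, 1),
        (200, 200, 240, 240, 0.8, 0)]
print(cross_class_nms(dets))  # keeps the 0.9 and 0.8 boxes, drops the 0.7 duplicate
```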
- In summary, a new DBSCAN-based anchor initialization method (replacing K-means) is proposed to assist the initial parameter configuration of YOLO; an end-to-end cross-class NMS suppression algorithm applied to colon polyps is proposed to improve the user experience; and a multi-scale prediction network is proposed that selects the appropriate network according to the picture size, improving the final effect.
- an end-to-end colon polyp positioning and nature discrimination method is proposed.
- It solves the information distortion and error amplification that may exist between multiple modules and multiple levels, and is convenient to deploy and maintain. The model has high real-time performance and high processing efficiency, and can guarantee a smooth video frame rate (> 40 fps); it also has a degree of robustness and noise resistance and can adapt to the actual production environments of various hospitals.
- The technical solution of this application is an end-to-end colorectal detection solution that can be directly applied to a hospital production environment. In the current situation of scarce and unevenly distributed medical resources, it can help doctors locate and find polyps to prevent missed diagnoses, and assist doctors in judging the nature of polyps, improving the accuracy of discrimination.
- FIG. 10 is a schematic diagram of an optional abnormality recognition device for biological organ tissue images according to an embodiment of the present application. As shown in FIG. 10, the device may include: a first acquisition unit 1001, a search unit 1003, and an identification unit 1005.
- the first acquiring unit 1001 is configured to acquire the first image
- The searching unit 1003 is used to divide the first image into a plurality of first regions through the target model and to find the target region among the candidate regions centered on points in the first regions, where the target region is the candidate region in which the target object is located in the first image. The target model is a neural network model pre-trained to identify the region where the target object is located in an image, and is trained using positive samples that identify the region where the target object is located and negative samples that identify the region where noise is located;
- the identification unit 1005 is used to identify the target area in the first image.
- The first obtaining unit 1001 in this embodiment may be used to perform step S202 in the embodiments of the present application, the search unit 1003 may be used to perform step S204, and the identification unit 1005 may be used to perform step S206.
- the above-mentioned module can run in the hardware environment shown in FIG. 1, and can be implemented by software or hardware.
- Through the above modules, the first image is divided into multiple first areas by the target model, the target area is searched among the candidate areas centered on points in the first areas, and the target area (the candidate area in the first image where the target object is located) is identified in the first image. Since the target model is trained using positive samples identifying the area where the target object is located and negative samples identifying the area where noise is located, the technical solution of the present application can use the target model to accurately identify the target object in the first image even in the presence of noise, which solves the technical problem in the related art of low accuracy in detecting target objects and achieves the technical effect of improving the accuracy of target object detection.
- the device of the present application may further include:
- The second obtaining unit is used to obtain, before the first image is divided into a plurality of first areas through the target model and the target area is searched among the candidate areas centered on points in the first areas, training images that include positive samples identifying the area where the target object is located and negative samples identifying the area where noise is located;
- the training unit is used for training the parameters in the original model using the training image to obtain a recognition model for recognizing the target object, where the recognition model includes the target model.
- the training unit may include:
- the training module is used to input the color data of the training image to the original model, and use the color data of the training image to train the parameters in the original model;
- The first verification module is used to take the trained original model as the recognition model when the color data of a verification image is used as the input of the trained original model and the second region identified by the model in the verification image matches the labeled region of the verification image, where the second region is the region identified by the trained model in the verification image as the region where the target object is located, and the labeled region is the region where the target object is actually located;
- The second verification module is used to instruct the training module to continue training when the color data of the verification image is used as the input of the trained original model and the second region identified by the model in the verification image does not match the labeled region, until the identified second region matches the labeled region.
- The training module can also be used to: use the color data of the training images to determine the image features used to describe the target object; and take the image features as the input of the fully connected layer of the original model and the area where the target object is located in the training image as the output, to solve for the values of the parameters in the original model.
- The training module may also be used to: obtain multiple sets of regional parameters corresponding to multiple positive samples, where each set of regional parameters describes the third region in which the target object identified by the identification information in one positive sample is located, the regional parameters including a first parameter indicating the center point of the third region, a second parameter indicating the length of the third region, and a third parameter indicating the width of the third region; cluster the multiple second parameters in the multiple sets of regional parameters into multiple first data sets, and cluster the multiple third parameters into multiple second data sets; obtain a fourth parameter of each of the multiple first data sets and a fifth parameter of each of the multiple second data sets, where the fourth parameter indicates the center of the first data set and the fifth parameter indicates the center of the second data set; and use the fourth parameter as the parameter in the original model indicating the length of the region where an identified target object is located, and the fifth parameter as the parameter indicating the width of that region.
- The training module may also be used, when clustering the multiple second parameters in the multiple sets of regional parameters into multiple first data sets, to: obtain a target parameter among the multiple second parameters, where the target parameter is an as-yet-unprocessed second parameter; and, when the target parameter is a core parameter, create a parameter set including the target parameter and the second parameters associated with it, where a core parameter is one for which the number of second parameters whose distance to it is within the first threshold is not less than the second threshold.
- The training module can also be used, when using the color data of the training images to determine the image features used to describe the abnormal part, to: set the values of the parameters of the convolutional layers in the original model to be the same as the values of the parameters of the convolutional layers in the pre-trained feature extraction model; and extract, through the convolutional layers in the original model, the image features used to describe the target object from the color data of the training images.
- The training unit can also be used to separately train the parameters of original models using training images of multiple resolutions, to obtain a recognition model corresponding to each resolution, where each recognition model is trained using training images of a single resolution and the resolutions of the training images used by any two recognition models are different.
- The searching unit may also be used to: obtain a plurality of fourth regions found by the target model from all candidate regions, where a fourth region is a region identified by the target model from the first image as the region where the target object is located; when, among the plurality of fourth regions, there are fourth regions whose center-to-center distance is not greater than the third threshold, take the one with the highest confidence among them as a target region; and take a fourth region whose center-to-center distance to every other fourth region is greater than the third threshold as a target region on its own.
- the searching unit may be further configured to: select a target model matching the resolution of the first image from multiple recognition models, and divide the first image into multiple first regions by the target model.
- the above-mentioned module can run in the hardware environment shown in FIG. 1, and can be implemented by software or hardware.
- the hardware environment includes a network environment.
- According to an embodiment of the present application, a server or a terminal is further provided for implementing the above image recognition method.
- FIG. 11 is a structural block diagram of a terminal according to an embodiment of the present application.
- The terminal may include: one or more processors 1101 (only one is shown in FIG. 11), a memory 1103, and a transmission device 1105; as shown in FIG. 11, the terminal may further include an input/output device 1107.
- The memory 1103 may be used to store software programs and modules, such as the program instructions/modules corresponding to the image recognition method and device in the embodiments of the present application. The processor 1101 executes the software programs and modules stored in the memory 1103 to perform various functional applications and data processing, that is, to implement the above image recognition method.
- the memory 1103 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 1103 may further include memories remotely provided with respect to the processor 1101, and these remote memories may be connected to the terminal through a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.
- the above-mentioned transmission device 1105 is used for receiving or sending data via a network, and can also be used for data transmission between the processor and the memory.
- Specific examples of the aforementioned network may include a wired network and a wireless network.
- the transmission device 1105 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers through a network cable to communicate with the Internet or a local area network.
- Optionally, the transmission device 1105 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
- the memory 1103 is used to store an application program.
- the processor 1101 may call the application program stored in the memory 1103 through the transmission device 1105 to perform the following steps:
- the target model is a neural network model that is pre-trained to identify the area where the target object is located from the image.
- The target model is trained using positive samples that identify the area where the target object is located and negative samples that identify the area where noise is located;
- the target area is identified in the first image.
- FIG. 11 is only an illustration; the terminal may be terminal equipment such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
- FIG. 11 does not limit the structure of the above electronic device.
- the terminal may further include more or fewer components than those shown in FIG. 11 (such as a network interface, a display device, etc.), or have a configuration different from that shown in FIG. 11.
- all or part of the steps in the methods of the above embodiments may be completed by a program instructing hardware related to the terminal device; the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and the like.
- the embodiments of the present application also provide a storage medium.
- the above storage medium may be used to store the program code for executing the image recognition method.
- the above storage medium may be located on at least one network device among multiple network devices in the network shown in the above embodiment.
- the storage medium is set to store program code for performing the following steps:
- acquiring a first image;
- dividing the first image into multiple first regions by means of the target model, and searching for a target region among candidate regions in the first image that are centered on points within the first regions; where the target region is the candidate region in the first image in which the target object is located; the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located;
- identifying the target region in the first image.
- the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
- if the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer-readable storage medium.
- based on this understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium.
- several instructions are included to enable one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
- the disclosed client may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the unit is only a logical function division.
- in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above integrated unit may be implemented in the form of hardware or software functional unit.
Abstract
An image recognition method, apparatus, and system, and a storage medium. The method includes: acquiring a first image (S202); dividing the first image into multiple first regions by means of a target model, and searching for a target region among candidate regions in the first image that are centered on points within the first regions, where the target region is the candidate region in the first image in which a target object is located, the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, and the target model is trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located (S204); and identifying the target region in the first image (S206). The method effectively improves the accuracy of detecting the target object in an image.
Description
This application claims priority to Chinese Patent Application No. 2018114102210, entitled "Method, Apparatus and System for Recognizing Abnormalities in Images of Biological Organ Tissue", filed with the Chinese Patent Office on November 23, 2018, the entire contents of which are incorporated herein by reference.
This application relates to the field of the Internet, and in particular to image recognition technology.
With the development of science and technology, the exploration of nature is no longer limited to the surface of things and increasingly turns to their interior. For example, for various animal and plant objects in nature, there is a need to detect whether abnormalities exist inside them, such as detecting whether pathological changes have occurred inside an animal or plant.
At present, such detection is performed manually on images captured of the interior of the animal or plant object. Constrained by the personal experience of the professionals involved, this manual way of checking whether a target object is present in an image is inefficient and poorly accurate.
No effective solution to the above problem has yet been proposed.
Summary
The embodiments of the present application provide an image recognition method, apparatus and system, and a storage medium, so as to solve at least the technical problem in the related art of low accuracy in detecting a target object.
According to one aspect of the embodiments of the present application, an image recognition method applied to an electronic device is provided, including: acquiring a first image; dividing the first image into multiple first regions by means of a target model, and searching for a target region among candidate regions in the first image that are centered on points within the first regions; where the target region is the candidate region in the first image in which the target object is located, the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, and the target model is trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located; and identifying the target region in the first image.
According to another aspect of the embodiments of the present application, an image recognition apparatus is provided, including: a first acquiring unit configured to acquire a first image; a searching unit configured to divide the first image into multiple first regions by means of a target model and to search for a target region among candidate regions in the first image centered on points within the first regions, where the target region is the candidate region in the first image in which the target object is located and the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located; and a marking unit configured to identify the target region in the first image.
According to another aspect of the embodiments of the present application, a storage medium is also provided; the storage medium includes a stored program which, when run, performs the above method.
According to another aspect of the embodiments of the present application, an electronic apparatus is also provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; the processor performs the above method by means of the computer program.
In the embodiments of the present application, the first image is divided into multiple first regions by the target model, a target region is searched for among candidate regions in the first image centered on points within the first regions, and the target region is identified in the first image, the target region being the candidate region in the first image in which the target object is located. Because the target model is trained with training images carrying identification information, including positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located, the technical solution of the present application can use the target model to recognize the target object accurately even when noise is present in the first image, avoids the information distortion caused by noise-filtering the first image, and improves the accuracy of target object detection.
The drawings described here are provided for a further understanding of the present application and form a part of it; the exemplary embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the hardware environment of an image recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of an optional image recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of optional data clustering according to an embodiment of the present application;
FIG. 4 is a schematic diagram of optional data clustering according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an optional abnormality recognition solution according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an optional abnormality recognition solution according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an optional abnormality recognition model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of optional abnormality candidate boxes according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an optional abnormality recognition model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an optional image recognition apparatus according to an embodiment of the present application;
FIG. 11 is a structural block diagram of a terminal according to an embodiment of the present application.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to one aspect of the embodiments of the present application, a method embodiment of an image recognition method is provided.
Optionally, in this embodiment, the image recognition method can be applied to the hardware environment formed by the server 101 and/or the terminal 103 shown in FIG. 1. As shown in FIG. 1, the server 101 is connected to the terminal 103 through a network and can provide services (such as application services and abnormality detection services) for the terminal 103 or a client installed on it; a database 105 can be set up on the server or independently of it to provide data storage services for the server 101. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 103 is not limited to a PC, a mobile phone, a tablet computer, and the like.
The image recognition method of the embodiments of the present application can be executed by the terminal 103, either directly or by a client installed on it. FIG. 2 is a flowchart of an optional image recognition method according to an embodiment of the present application; as shown in FIG. 2, the method may include the following steps:
Step S202: the terminal acquires a first image.
The first image may be a single image captured with a visible-light camera, an infrared camera, X-ray, CT, fluoroscopy, or the like, or an image frame in a video stream captured in one of these ways. Specifically, the first image may be an image obtained by photographing a living organism, which may be an animal or a plant (such as a human, an animal, or a plant).
Step S204: the terminal divides the first image into multiple first regions by means of the target model and searches for a target region among candidate regions in the first image centered on points within the first regions; the target region is the region in the first image in which the target object is located, and the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located.
The target model may be pre-trained, and may specifically be used to recognize abnormal parts of the organ tissue of a living organism. When the target model is used this way, it is assumed in the target model that the internal features of an abnormal biological object differ from those of a normal one; on this basis a "feature set" of normal biological objects and/or a "feature set" of abnormal biological objects is established, and the features of the current biological object are compared against these sets. When the statistical regularity is violated, for example when the features match the abnormal "feature set", the interior of the biological object is considered possibly abnormal.
The target model may be a deep neural network model, such as a YOLO (You Only Look Once) neural network model, a deep learning network for target detection. The first image is divided into multiple regularly shaped first regions, such as squares or rectangles; the purpose of this division is to determine, for each first region, whether an abnormal part exists within candidate regions centered on points in that first region. The candidate regions may also be regularly shaped, such as squares, rectangles, circles, or rhombi. A minimal sketch of such a grid division is given after this paragraph.
Because the training images used to train the target model include negative samples containing noise data produced by the environment, the trained target model can recognize noise in various scenarios; that is, it avoids misidentifying scene noise as the region where the target object is located, which improves stability, robustness, and reliability across scenarios.
In the solution shown in step S204, when the target model is used to recognize abnormal parts of the organ tissue of a living organism, the abnormal part may be of one specific abnormality type or one of several abnormality types. Dividing the first image into multiple first regions by the target model and searching for the target region among the candidate regions centered on points within the first regions means, in other words, that a single model (the target model) completes both the localization and the classification of the abnormal part, so localization and classification are no longer separated. This resolves the information distortion and error amplification that multiple modules and levels can introduce; at the same time the model structure is simpler and easier to deploy and maintain, and a single model is more efficient, preserving a smooth video frame rate, with the advantages of high real-time performance and high availability.
Step S206: the terminal identifies the target region in the first image so that the user can see the region, helping the user judge the target object recognized by the target model.
The terminal may be medical equipment used for diagnosis or a remote user device assisting diagnosis, such as a mobile terminal used by the user (a mobile phone or tablet) or a personal computer (PC).
Optionally, when the target region is identified in the first image, if there is only one abnormality type, the target region can simply be boxed; if there are several abnormality types, then besides boxing the target region, the abnormality type of the abnormal part in each target region can also be indicated, together with the confidence of that abnormality type.
To box the target region, the target model effectively has to determine the coordinate position of the target region as well as the abnormality type information; in other words, the target model handles localization and classification simultaneously, that is, localization and classification in this application are realized by the same model (the target model).
As described above, the technical solution of steps S202 to S206 can be applied to abnormality detection of animal and plant biological objects, such as recognizing tumors inside an animal body. When a doctor makes a diagnosis, steps S202 to S206 can be used to detect tumor abnormalities in video scanned inside a human body and mark the region of the tumor in the image frame, helping the doctor diagnose; similarly, the solution can detect abnormal parts inside animals, plants, and so on.
The above description takes the case where the image recognition method of the embodiments of the present application is executed by the terminal 103 as an example; the target model can be integrated offline into the system of the terminal 103, or installed on the terminal as an application, and the terminal can perform abnormality recognition offline.
The image recognition method of the embodiments of the present application can also be executed jointly by the server 101 and the terminal 103; the target model can be set on the server 101, and the terminal 103 can call on the server's internally deployed target model to provide the service through an official account (such as one in an instant messaging application), a web page, an application, a mini-program, and so on. For example, the user can follow the official account providing the service and transmit the video stream captured after entering the account to the server 101, supplying the image frames to be processed; the server returns the recognition result to the terminal, assisting the user in judging abnormalities.
Through steps S202 to S206, the first image is divided into multiple first regions by the target model, a target region is searched for among candidate regions in the first image centered on points within the first regions, and the target region is identified in the first image, the target region being the candidate region in which the target object is located. Since the target model is trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located, the technical solution of the present application can use the target model to recognize the target object in the first image accurately even when noise is present, solving the technical problem of low detection accuracy in the related art and achieving the technical effect of improving the accuracy of target object detection.
Taking the method provided by the embodiments of the present application as applied to recognizing abnormal parts of the organ tissue of a living organism as an example, the technical solution of the present application is further detailed below with reference to the steps shown in FIG. 2.
In the solution of step S202, the user can download an auxiliary-judgment application on a mobile terminal and start it when abnormality detection is needed; the application collects a video stream through the camera of the mobile terminal and transmits it to the server, and the server acquires a first image frame from the video stream, the first image frame being an image of the organism's organ tissue obtained by photographing the organism.
In the solution of step S204, the first image frame is divided into multiple first regions by the target model, and a target region is searched for among candidate regions in the first image frame centered on points within the first regions; the target region is the region, among the candidate regions of the first image frame, where an abnormal part of the organism's organ tissue is located, and the target model is a pre-trained neural network model for identifying, from image frames, the region where an abnormal part of organ tissue is located.
The target model may be trained in advance or at the time of use. An optional model training scheme includes the following steps 1 and 2:
Step 1: acquire training images that include positive samples marking the region where an abnormal part is located and negative samples marking the region where noise is located, the noise being produced when photographing the interior of the biological object.
Step 2: train the parameters in an original model with the training images to obtain a recognition model for recognizing abnormal parts, the recognition model including the target model.
Optionally, in step 2, training the parameters in the original model with the training images to obtain the recognition model for recognizing abnormal parts includes:
Step 21: input the color data of the training images into the original model and train the parameters in the original model with the color data of the training images.
Optionally, training the parameters in the original model with the color data of the training images includes: using the color data of the training images to determine image features that describe the abnormal part. The image features here may be color features, texture features, shape features, spatial relationship features, and so on. For example, if a loosely structured part inside a mountain counts as an abnormal part, then, taking texture as an example, the texture formed by the loose interior when photographed is the texture the model learns; likewise, a cancerous part inside a human body often differs in color from the surrounding tissue, forming a distinctive texture, which is then the texture the model learns. Training also determines the values of the parameters in the original model when the fully connected layers of the original model take the image features as input and the region of the abnormal part in the training image as output. Each layer inside the model can be understood as one or more functions to be initialized, whose inputs are the outputs of the previous layer (the input of the first layer is the color data X) and whose final-layer output Y represents the information of the abnormal part; multiple training images amount to providing multiple pairs of X and Y, from which the function parameters are fitted, completing the training.
In the embodiments of the present application, the original model contains two neural network parts: one that uses image features for abnormality recognition and localization, and one that extracts the image features. The feature-extraction network and the recognition-and-localization network can be trained together, or the latter can be trained separately.
An optional training approach is as follows. Using the color data of the training images to determine image features that describe the abnormal part includes setting the values of the convolutional-layer parameters in the original model to be the same as those of a pre-trained feature extraction model. For example, some open-source projects can be used to train the feature extraction model, such as ImageNet, a computer vision recognition project and currently the world's largest image recognition database, built to simulate the human recognition system and able to recognize objects and other features from pictures. The first several layers of the feature extraction model (such as a YOLO model) can first be pre-trained with the ImageNet dataset, and then, applying transfer learning, the pre-trained network parameters initialize the original model (such as a YOLO target detection network); with this training approach, the precision/recall and mAP (an evaluation metric) of the finally trained model both rise. The convolutional layers in the original model then extract, from the color data of the training images, the image features that describe the abnormal part.
In an optional embodiment, when training the parameters in the original model with the color data of the training images, besides fitting the weight parameters in each layer of the original model, the length and width parameters of the candidate regions are determined as follows:
Step 211: acquire multiple sets of region parameters corresponding to multiple positive samples, each set describing the third region, marked by identification information in one positive sample, in which the abnormal part is located. The region parameters include a first parameter representing the center point of the third region, a second parameter representing the length of the third region, and a third parameter representing the width of the third region; the length parameter of the candidate regions can be fitted from the multiple second parameters, and the width parameter of the candidate regions from the multiple third parameters.
Step 212: for these sets of region parameters, for uniform processing all center points can be translated to the same point in a two-dimensional coordinate system (such as the origin); when the center point of a set is translated, its second and third parameters are translated identically (moved by the same distance and direction in X and Y as the first parameter). After translation, the multiple second parameters in the sets are clustered into multiple first data sets (that is, second parameters close in position are clustered into the same data set); referring to FIG. 3, each point on the X axis corresponds to one second parameter and each dashed box to one first data set. The multiple third parameters in the sets are likewise clustered into multiple second data sets (third parameters close in position are clustered into the same data set).
Optionally, clustering the multiple second parameters of the multiple sets into multiple first data sets includes processing all second parameters as follows: first, obtain a target parameter among the multiple second parameters, the target parameter being a second parameter that has not yet been processed. If the target parameter is a core parameter, create a parameter set including the target parameter and the second parameters associated with it, where the number of second parameters whose distance to a core parameter is within the first threshold is not less than the second threshold; find all second parameters density-reachable from the core parameter, forming one first data set. If the target parameter is an edge (non-core) parameter, skip this iteration and look for the next second parameter, until all second parameters have been processed.
If a second parameter a is within the E-neighborhood of another second parameter b and b is a core parameter, then a is directly density-reachable from b; given multiple second parameters b1, b2, ..., bn with b = b1 and a = bn, if each bi is directly density-reachable from bi-1, then a is density-reachable from b.
It should be noted that if the number of second parameters within the E-neighborhood of a given target parameter (the region within radius E of a parameter is called its E-neighborhood) is at least MinPts (a configured parameter greater than 1, namely the second threshold), the parameter is called a core parameter.
Similarly, the third parameters can be clustered in the same way.
Step 213: obtain a fourth parameter of each of the multiple first data sets and a fifth parameter of each of the multiple second data sets; the fourth parameter represents the center of a first data set and the fifth parameter represents the center of a second data set.
Step 214: use the fourth parameter as the parameter in the original model representing the length of the region recognized as containing the abnormal part, and the fifth parameter as the parameter representing its width. In other words, the center value of each first data set can serve as a candidate-region length and the center value of each second data set as a candidate-region width, so the number of candidate-region size combinations is the product of the number m of first data sets and the number n of second data sets.
In yet another optional embodiment, when training the parameters in the original model with the color data of the training images, besides fitting the weight parameters in each layer, the length and width parameters of the candidate regions can be determined as follows:
Step 215: acquire multiple sets of region parameters corresponding to multiple positive samples, each set describing the third region, marked by identification information in one positive sample, in which the abnormal part is located; the region parameters include a first parameter representing the center point of the third region, a second parameter representing the length of the third region (such as an X-axis value), and a third parameter representing the width of the third region (such as a Y-axis value); the length and width parameters of the candidate regions can be fitted from the multiple second and third parameters respectively.
Step 216: for these sets of region parameters, for uniform processing all center points can be translated to the same point in the two-dimensional coordinate system (such as the origin), the second and third parameters being translated identically (moved by the same distance and direction in X and Y as the first parameter). After translation, each set of region parameters corresponds to one point to be fitted in the two-dimensional coordinate system, whose X coordinate is the translated second (or third) parameter and whose Y coordinate is the translated third (or second) parameter; all the points to be fitted can then be fitted into multiple point sets.
The fitting is similar to the fitting of the second parameters above: define a first threshold representing the separation distance and a second threshold representing the minimum number of neighboring points, and fit according to the steps above. The result is shown in FIG. 4, where each dashed box corresponds to one point set and the points in a set are the points to be fitted.
Step 217: obtain the center of each of the multiple point sets.
Step 218: use the larger of the X and Y coordinates of a point set's center as the length of the region recognized as containing the abnormal part (that is, the candidate-region length), and the smaller as its width (the candidate-region width). In other words, the center of each point set yields one (length, width) pair for the candidate regions, so the number of candidate-region size combinations equals the number of point sets.
Step 22: judge whether the number of training images used has reached a specified threshold.
Step 23: if the number of training images used has not reached the specified threshold, continue inputting the color data of training images into the original model and training its parameters with that color data.
Step 24: if the number of training images used has reached the specified threshold, use the color data of verification images as input to the trained original model to verify whether the original model has acquired recognition capability.
Step 25: when the color data of a verification image is used as input to the trained original model and the second region the original model recognizes in the verification image matches the labeled region of the verification image, take the trained original model as the recognition model; the second region is the region, recognized by the trained original model in the verification image, where the abnormal part inside the biological object is located, and the labeled region is the region where the abnormal part inside the biological object actually is, as labeled in the training image.
Step 26: when the color data of the verification image is used as input to the trained original model and the second region recognized by the original model in the verification image does not match the labeled region, continue training the parameters in the original model with the color data of the positive samples and the color data of the negative samples, until the second region the original model recognizes in the verification image matches the labeled region when the color data of the verification image is used as input.
In the above embodiment, the training images and the image frames of the video stream to be recognized have the same resolution; the above embodiment is described taking video-stream frames of one fixed resolution as an example, in other words, training uses training images of only that single resolution. To improve the adaptability of the technical solution of the present application to images of different resolutions, when training the parameters in the original model with the training images to obtain the recognition model for recognizing abnormal parts, training images of multiple resolutions can be used to train the parameters of multiple original models separately, obtaining a recognition model corresponding to each resolution; each recognition model is trained with training images of one and the same resolution, and any two recognition models use training images of different resolutions. In other words, each model only recognizes image frames of one resolution (the resolution of the training images used for its training).
After training is completed in the above way, in the process of executing step S204, when dividing the first image into multiple first regions by the target model, a target model matching the resolution of the first image (for example, one with the same or the closest resolution) can be selected from the multiple recognition models and used to divide the first image into the multiple first regions. When searching for the target region among the candidate regions in the first image centered on points within the first regions, the multiple fourth regions found by the target model among all candidate regions are obtained; a fourth region is a region, recognized by the target model in the first image, where an abnormal part of the organism's organ tissue is located.
Because the abnormality types of abnormal parts are easily confused, for the same hard-to-distinguish region the model tends to generate several different fourth regions. To improve the actual user experience, when among the multiple fourth regions there are fourth regions whose centers are no more than a third threshold (a preset threshold) apart, the one with the highest confidence among those fourth regions is taken as a target region; in other words, among adjacent, partially or fully overlapping fourth regions, only the one with high confidence is kept (confidence being a reliability parameter output by the model), and a fourth region whose center is more than the third threshold away from the center of every other fourth region is taken as a target region.
In the solution of step S206, the target region is identified in the first image, together with the abnormality type of the abnormal part in the target region and the confidence of that abnormality type, so that the user can see the region, helping the user judge the abnormal part recognized by the target model.
As an optional embodiment, the application of the technical solution of the present application to detecting abnormalities of the malignant-tumor type is described below as an example; abnormality detection for biological objects such as mountains, rivers, and plants is similar and is not repeated.
At present, according to statistics from various organizations, among the high-incidence malignant tumor types in China, colon cancer frequently ranks in the top five for both incidence and mortality. However, on one hand, with China's growing and aging population, the load on China's medical and health system keeps increasing, and the limited number of doctors facing continuously produced large volumes of medical images can easily miss or misdiagnose cases; on the other hand, because regional development is uneven, the distribution of China's medical resources is also extremely uneven: highly skilled doctors are concentrated in large top-tier hospitals in big cities, while the level of hospitals elsewhere is uneven, so patients may not receive a correct diagnosis and treatment.
To solve these problems, in an optional implementation shown in FIG. 5, images can be preprocessed by modules such as white light/NBI (Narrow Band Imaging, an endoscopic narrow-band imaging technique); an image-assisted diagnosis system then detects the polyp position, i.e. the polyp candidate box, based on a sliding-window scheme, and a candidate-box classification module (using a classification method such as a support vector machine, SVM) determines the nature of the polyp. Where deep convolutional neural networks (CNN, Convolutional Neural Networks) are used, polyp nature classification has largely been taken over by CNN networks; for natural-image localization, candidate regions can also be generated by schemes such as the Selective Search algorithm or a Region Proposal Network (RPN).
This kind of technical solution (which handles abnormal-part localization and classification separately) has the following main drawbacks: 1) the localization and classification stages are forcibly separated, yet the effect of the classification stage depends heavily on the recall of polyp localization, and this separation prevents the classification stage from obtaining features of the whole image; 2) multiple levels and modules amplify errors layer by layer and distort information: such solutions often split the polyp detection process into several sub-modules, and the information distortion and error amplification between sub-modules grow worse as module levels increase; 3) processing efficiency is low and real-time performance poor: most such solutions are not real-time, and in particular the serial hand-off between modules greatly increases processing time; 4) the models are complicated and of high complexity: such solutions can make problems harder to trace and are inconvenient to deploy, maintain, and evaluate; 5) they perform poorly in production environments and lack robustness: most such solutions artificially avoid the irrelevant, low-quality, noisy data of many real scenarios, leading to many misjudgments and high false-positive rates when actually used in hospitals, and even to unusability.
In yet another optional implementation of the present application, an end-to-end colon polyp localization and nature-classification method is proposed. Thanks to the end-to-end network, it has the following advantages: 1) localization and classification are no longer separated; they share one network and are not processed in stages; 2) a single network model is used, resolving the information distortion and error amplification possible with multiple modules and levels; the model is simpler and easier to deploy and maintain; 3) it is highly real-time and highly available, with a more efficient single model that preserves a smooth video frame rate. The following special optimizations are also proposed: 1) anchor-box optimization, an anchor-box parameter initialization method based on DBSCAN (replacing K-means); 2) an adaptive multi-scale prediction network that selects a suitable preloaded model for prediction according to the characteristics of the input picture.
The method proposed in this application is an end-to-end colon polyp localization and nature-classification method: while a doctor inspects polyps with an endoscope, the video stream only needs to be fed into the proposed end-to-end network to locate and discover polyp positions in real time and, optionally, obtain the nature of the polyp at the same time. The technology and method of this application provide the following functions and benefits: 1) assisting the doctor to locate and find polyps, preventing missed diagnoses; 2) assisting the doctor to judge polyp nature, improving classification accuracy.
From the above analysis, compared with other schemes, the technology and method of this application can, while maintaining good detection performance, effectively assist the doctor's diagnosis with a concise, highly real-time, highly available model involving few modules; after some special optimizations it performs even better, and it is an end-to-end colorectal detection solution (covering localization, or localization plus nature classification) that can be applied directly in a hospital production environment.
The technical solution of the present application is elaborated below. One main purpose of this application is to assist the doctor to locate and find polyps with a concise, highly real-time, highly available network model while obtaining the nature type of the polyp.
The overall architecture of the technical solution, i.e. the end-to-end target model, is shown in FIG. 6. When the nature classification is single-class, the scheme acts as a polyp localization scheme; when it is multi-class (such as a three-way classification: non-adenoma, adenoma, adenocarcinoma), the end-to-end scheme can predict localization coordinates and the fine-grained polyp type simultaneously.
Considering the requirements of high availability and high real-time performance, this module uses an optimized end-to-end YOLO model for polyp localization and detection. YOLO turns the target detection problem into a regression over candidate bounding-box coordinates and class probabilities; compared with algorithms such as Faster R-CNN, YOLO is a one-stage target detection algorithm: one forward pass yields both box coordinates and class probabilities, so while maintaining adequate detection accuracy, YOLO's prediction performance and efficiency far exceed those of other two-stage detection algorithms. With an optimized NMS (Non-Maximum Suppression) module for non-maximum suppression, frame rates above 40 FPS can be reached, fully meeting the high real-time requirements of an actual hospital production environment.
In the technical solution of this application, the network structure used is shown in FIG. 7, comprising 32 layers; the layers can be convolutional layers (conv), pooling layers (max), fully connected/detection layers, and so on. Based on this network model, the following optimization methods are proposed:
An optional training optimization is the DBSCAN-based anchor-box parameter initialization method:
In the related art, anchor boxes are obtained by clustering with the K-means algorithm, whereas this application can use the DBSCAN algorithm. Compared with DBSCAN, the K-means algorithm has the following main drawbacks: 1) K-means requires the number of clusters to be determined in advance, while DBSCAN determines the number of clusters automatically from just the second threshold minPts and the first threshold eps; 2) K-means is easily affected by noise points, while DBSCAN identifies the noise points and is unaffected by them, as shown in FIGS. 3 and 4; 3) K-means is strongly affected by the choice of initial points and by cluster shape, while DBSCAN has no such problems and adapts better; 4) since DBSCAN is a density-based clustering algorithm, it needs to compute the distance between points (neighborhood computation), which for polyp localization can in fact be converted into the IOU (Intersection over Union, a standard measure of how accurately objects are detected on a given dataset) between polyp boxes. The following distance-metric formula can therefore be used:
d(box, centroid) = 1 - IOU(box, centroid)
Here the minPts and eps parameters of DBSCAN can be set manually or obtained with the k-nearest-neighbor (kNN) algorithm; box denotes a candidate box, centroid denotes a cluster center, and IOU() computes the intersection over union of the two.
The DBSCAN-based anchor-box method proposed in this application can also be applied to other deep learning algorithms that use anchor boxes (such as Faster R-CNN). On some polyp datasets, especially when the quality of box-size annotation is low, this parameter initialization method performs well.
An optional training optimization is pre-training and multi-scale training:
ImageNet is a computer vision recognition project and currently the world's largest image recognition database, built to simulate the human recognition system; it can recognize objects from pictures. The first several layers of the YOLO model are first pre-trained with the ImageNet dataset, and then transfer learning is applied to initialize the YOLO target detection network with the pre-trained network parameters; the precision/recall and mAP of the final model both rise.
YOLO's multi-scale training range is [320, 608]; combining the actual characteristics of colon-polyp endoscopy data, the multi-scale training range can be fine-tuned to [320, 800], making the model more adaptable to actual hospital instruments.
Initializing the target detection network parameters with a model trained on datasets containing large amounts of high-quality annotation such as ImageNet, combined with multi-scale training, lets the model escape local optima and converge better toward the global optimum.
An optional training optimization is robustness training.
Most related technical solutions add a preprocessing module before the polyp localization and discovery module to filter out noise.
Because adding a preprocessing module adds a sub-level and a sub-module, it introduces error propagation and amplification. The technical solution of this application adds the noise data (including overexposure, flash noise, random noise, and so on) directly to the training set and trains it end to end as negative samples, just like normal polyp images, reducing the number of levels to avoid error propagation and amplification; the final deployed results met expectations well.
An optional service optimization uses an optimized NMS algorithm:
The YOLO model outputs S*S candidate bounding boxes, which must be de-duplicated and merged by an NMS algorithm module.
The native NMS algorithm sorts, de-duplicates, and merges within each class, with no merging or de-duplication across classes. Because the fine-grained types in polyp datasets are easily confused, for the same hard-to-distinguish polyp region the model tends to generate boxes of several different polyp types; FIG. 8 shows one such example, where a similar region has multiple candidate boxes. To improve the doctor's actual user experience, this application proposes the following NMS optimization algorithm.
The similarity condition can be defined as: IOU greater than a certain threshold, or the center point falling within a certain region; based on the above algorithm, only the result with the highest confidence is finally output, which improves the doctor's experience in actual use.
Service optimization 2: the multi-scale model selection module
In a cloud service, pictures of different input sizes are usually all fed into a neural network input layer of one fixed size in related technical solutions. The applicant's practice shows that the closer the network's input-layer size is to the original input picture, the higher the final precision/recall; this application therefore designs a multi-scale model selection module, which introduces no propagated error and no error amplification and does not affect the end-to-end detection speed, as shown in FIG. 9.
These models are deployed according to the server's carrying capacity and must all be preloaded into GPU memory, otherwise real-time performance suffers.
The technical solution of this application provides: a complete end-to-end colorectal polyp localization and nature-classification solution that can actually be applied in hospitals; a new DBSCAN-based (replacing K-means) anchor-box parameter initialization method (an algorithm scheme); a cross-class NMS suppression algorithm module applied to colorectal polyps (an algorithm scheme); and a multi-scale preloaded prediction network adaptively selected by picture size (a service mode). The new DBSCAN-based anchor-box parameter initialization method can assist YOLO's initial parameter configuration; the end-to-end cross-class NMS suppression algorithm applied to colon polyps improves the user experience; and the multi-scale prediction network selects the appropriate network according to picture size, improving the final effect.
This application thus proposes an end-to-end colon polyp localization and nature-classification method that, thanks to the end-to-end network, resolves the information-distortion and error-amplification problems possible between multiple modules and levels and is easy to deploy and maintain; it is highly real-time, with efficient model processing that keeps the video frame rate smooth (>40 fps); and it has robustness and noise resistance, adapting to the actual production environments of different hospitals.
In summary, the technical solution of this application is an end-to-end colorectal detection solution directly applicable in hospital production environments; under today's shortage and uneven distribution of medical resources, it can assist doctors to locate and find polyps, preventing missed diagnoses, and to judge polyp nature, improving classification accuracy.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as series of action combinations, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or the part contributing to the existing technology, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc), including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
According to another aspect of the embodiments of the present application, an apparatus for recognizing abnormalities in images of biological organ tissue is further provided for implementing the above method of recognizing abnormalities in images of biological organ tissue. FIG. 10 is a schematic diagram of an optional such apparatus according to an embodiment of the present application; as shown in FIG. 10, the apparatus may include a first acquiring unit 1001, a searching unit 1003, and a marking unit 1005.
The first acquiring unit 1001 is configured to acquire a first image.
The searching unit 1003 is configured to divide the first image into multiple first regions by means of the target model and to search for a target region among candidate regions in the first image centered on points within the first regions; the target region is the candidate region in the first image in which the target object is located, and the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located.
The marking unit 1005 is configured to identify the target region in the first image.
It should be noted that the first acquiring unit 1001 in this embodiment can be used to perform step S202 of the embodiments of the present application, the searching unit 1003 step S204, and the marking unit 1005 step S206.
It should be noted here that the examples and application scenarios implemented by the above modules and their corresponding steps are the same, but are not limited to the disclosure of the above embodiments. The above modules, as part of the apparatus, can run in the hardware environment shown in FIG. 1 and can be implemented by software or by hardware.
Through the above modules, the first image is divided into multiple first regions by the target model, a target region is searched for among candidate regions in the first image centered on points within the first regions, and the target region is identified in the first image, the target region being the candidate region in the first image in which the target object is located. Since the target model is trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located, the technical solution of the present application can use the target model to recognize the target object in the first image accurately even when noise is present, solving the technical problem of low target-object detection accuracy in the related art and achieving the technical effect of improving that accuracy.
Optionally, the apparatus of the present application may further include:
a second acquiring unit, configured to acquire, before the first image is divided into multiple first regions by the target model and the target region is searched for among the candidate regions in the first image centered on points within the first regions, training images comprising positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located;
a training unit, configured to train the parameters in an original model with the training images to obtain a recognition model for recognizing the target object, where the recognition model includes the target model.
Optionally, the training unit may include:
a training module, configured to input the color data of the training images into the original model and train the parameters in the original model with the color data of the training images;
a first verification module, configured to take the trained original model as the recognition model when the color data of a verification image is used as input to the trained original model and the second region the original model recognizes in the verification image matches the labeled region of the verification image, where the second region is the region, recognized by the trained original model in the verification image, in which the target object is located, and the labeled region is the region where the target object actually is, as labeled in the training image;
a second verification module, configured to instruct the training module, when the color data of the verification image is used as input to the trained original model and the second region recognized by the original model in the verification image does not match the labeled region, to continue training the parameters in the original model with the color data of the positive samples and the color data of the negative samples, until the second region the original model recognizes in the verification image matches the labeled region when the color data of the verification image is used as input.
Optionally, the training module may further be used to: determine, from the color data of the training images, image features describing the target object; and determine the values of the parameters in the original model when the fully connected layers of the original model take the image features as input and the region of the target object in the training image as output.
Optionally, the training module may further be used to: acquire multiple sets of region parameters corresponding to multiple positive samples, where each set of region parameters describes the third region, marked by identification information in one positive sample, in which the target object is located, and the region parameters include a first parameter representing the center point of the third region, a second parameter representing its length, and a third parameter representing its width; cluster the multiple second parameters of the multiple sets into multiple first data sets and the multiple third parameters into multiple second data sets; obtain a fourth parameter of each first data set and a fifth parameter of each second data set, where the fourth parameter represents the center of the first data set and the fifth parameter represents the center of the second data set; and use the fourth parameter as the parameter in the original model representing the length of the region recognized as containing the target object, and the fifth parameter as the parameter representing the width of that region.
Optionally, the training module may further be used, when clustering the multiple second parameters into multiple first data sets, to: obtain a target parameter among the multiple second parameters, where the target parameter is a second parameter that has not been processed; and, when the target parameter is a core parameter, create a parameter set including the target parameter and the second parameters associated with the target parameter, where the number of second parameters whose distance from the core parameter is within the first threshold is not less than the second threshold.
Optionally, the training module may further be used, when determining image features describing the abnormal part from the color data of the training images, to: set the values of the convolutional-layer parameters in the original model to be the same as those of a pre-trained feature extraction model; and extract, through the convolutional layers of the original model, the image features describing the target object from the color data of the training images.
Optionally, the training unit may further be used to: train the parameters in the original model with training images of multiple resolutions separately, obtaining a recognition model corresponding to each resolution, where each recognition model is trained with training images of one resolution, and the resolutions of the training images used by any two recognition models are different.
Optionally, the searching unit may further be used to: obtain multiple fourth regions found by the target model among all candidate regions, where a fourth region is a region, recognized by the target model in the first image, in which the target object is located; when among the multiple fourth regions there are fourth regions whose centers are no more than the third threshold apart, take the one with the highest confidence among those fourth regions as a target region; and take any fourth region whose center is more than the third threshold away from the center of every other fourth region as a target region.
Optionally, the searching unit may further be used to: select, from multiple recognition models, the target model matching the resolution of the first image, and divide the first image into the multiple first regions by means of the target model.
It should be noted here that the examples and application scenarios implemented by the above modules and their corresponding steps are the same, but are not limited to the disclosure of the above embodiments. It should be noted that the above modules, as part of the apparatus, can run in the hardware environment shown in FIG. 1, which includes a network environment, and can be implemented by software or by hardware.
According to another aspect of the embodiments of the present application, a server or terminal is further provided for implementing the above method of recognizing abnormalities in images of biological organ tissue.
FIG. 11 is a structural block diagram of a terminal according to an embodiment of the present application. As shown in FIG. 11, the terminal may include one or more processors 1101 (only one is shown in FIG. 11), a memory 1103, and a transmission device 1105, and, as shown in FIG. 11, the terminal may further include an input/output device 1107.
The memory 1103 can be used to store software programs and modules, such as the program instructions/modules corresponding to the image recognition method and apparatus in the embodiments of the present application; by running the software programs and modules stored in the memory 1103, the processor 1101 executes various functional applications and data processing, realizing the above image recognition method. The memory 1103 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1103 may further include memories disposed remotely from the processor 1101, which can connect to the terminal through a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations of these.
The transmission device 1105 is used to receive or send data via a network and can also be used for data transmission between the processor and the memory. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 1105 includes a network interface controller (Network Interface Controller, NIC), which can connect to other network devices and routers via a network cable to communicate with the Internet or a local area network; in another example, the transmission device 1105 is a radio frequency (Radio Frequency, RF) module used to communicate with the Internet wirelessly.
Specifically, the memory 1103 is used to store an application program.
The processor 1101 can call the application program stored in the memory 1103 through the transmission device 1105 to perform the following steps:
acquiring a first image;
dividing the first image into multiple first regions by means of the target model, and searching for a target region among candidate regions in the first image centered on points within the first regions; where the target region is the candidate region in the first image in which the target object is located, and the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located;
identifying the target region in the first image.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, which are not repeated here.
A person of ordinary skill in the art can understand that the structure shown in FIG. 11 is only illustrative; the terminal may be a smartphone (such as an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), a PAD, or other terminal equipment. FIG. 11 does not limit the structure of the above electronic apparatus; for example, the terminal may include more or fewer components than shown in FIG. 11 (such as a network interface or a display device), or have a configuration different from that shown in FIG. 11.
A person of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the hardware related to the terminal device; the program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and so on.
The embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium can be used to store program code for executing the image recognition method.
Optionally, in this embodiment, the storage medium may be located on at least one of multiple network devices in the network shown in the above embodiments.
Optionally, in this embodiment, the storage medium is set to store program code for performing the following steps:
acquiring a first image;
dividing the first image into multiple first regions by means of the target model, and searching for a target region among candidate regions in the first image centered on points within the first regions; where the target region is the candidate region in the first image in which the target object is located, and the target model is a pre-trained neural network model for identifying, from an image, the region in which the target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located;
identifying the target region in the first image.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, which are not repeated here.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client can be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and in actual implementation there may be other division manners: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, units, or modules, and may be electrical or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred implementations of the present application. It should be pointed out that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.
Claims (16)
- An image recognition method, applied to an electronic device, the method comprising: acquiring a first image; dividing the first image into multiple first regions by means of a target model, and searching for a target region among candidate regions in the first image that are centered on points within the first regions; wherein the target region is the candidate region in the first image in which the target object is located; the target model is a pre-trained neural network model for identifying, from an image, the region in which a target object is located, the target model being trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located; and identifying the target region in the first image.
- The method according to claim 1, wherein before the first image is divided into multiple first regions by the target model and the target region is searched for among the candidate regions in the first image centered on points within the first regions, the method further comprises: acquiring training images comprising positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located; and training parameters in an original model with the training images to obtain a recognition model for recognizing the target object, wherein the recognition model comprises the target model.
- The method according to claim 2, wherein training the parameters in the original model with the training images to obtain the recognition model for recognizing the target object comprises: inputting the color data of the training images into the original model, and training the parameters in the original model with the color data of the training images; when the color data of a verification image is used as input to the trained original model and a second region recognized by the original model in the verification image matches a labeled region of the verification image, taking the trained original model as the recognition model, wherein the second region is the region in which the trained original model recognizes the target object in the verification image, and the labeled region is the region where the target object actually is, as labeled in the training image; and when the color data of the verification image is used as input to the trained original model and the second region recognized by the original model in the verification image does not match the labeled region, continuing to train the parameters in the original model with the color data of the positive samples and the color data of the negative samples until the second region recognized by the original model in the verification image matches the labeled region when the color data of the verification image is used as input.
- The method according to claim 3, wherein training the parameters in the original model with the color data of the training images comprises: determining, from the color data of the training images, image features describing the target object; and determining the values of the parameters in the original model when fully connected layers of the original model take the image features as input and the region of the target object in the training images as output.
- The method according to claim 3, wherein training the parameters in the original model with the color data of the training images comprises: acquiring multiple sets of region parameters corresponding to multiple positive samples, wherein each set of region parameters describes a third region, marked in one positive sample, in which the target object is located, and the region parameters comprise a first parameter representing the center point of the third region, a second parameter representing the length of the third region, and a third parameter representing the width of the third region; clustering the multiple second parameters of the multiple sets into multiple first data sets, and clustering the multiple third parameters of the multiple sets into multiple second data sets; obtaining a fourth parameter of each first data set and a fifth parameter of each second data set, wherein the fourth parameter represents the center of the first data set and the fifth parameter represents the center of the second data set; and using the fourth parameter as the parameter in the original model representing the length of the region recognized as containing the target object, and the fifth parameter as the parameter in the original model representing the width of that region.
- The method according to claim 5, wherein clustering the multiple second parameters of the multiple sets into multiple first data sets comprises: obtaining a target parameter among the multiple second parameters, wherein the target parameter is a second parameter that has not been processed; and when the target parameter is a core parameter, creating a parameter set comprising the target parameter and the second parameters associated with the target parameter, wherein the number of second parameters whose distance from the core parameter is within a first threshold is not less than a second threshold.
- The method according to claim 4, wherein determining image features describing the target object from the color data of the training images comprises: setting the values of convolutional-layer parameters in the original model to be the same as the values of convolutional-layer parameters in a pre-trained feature extraction model; and extracting, through the convolutional layers of the original model, the image features describing the target object from the color data of the training images.
- The method according to claim 2, wherein training the parameters in the original model with the training images to obtain the recognition model for recognizing the target object comprises: training the parameters in the original model with training images of multiple resolutions separately to obtain a recognition model corresponding to each resolution, wherein each recognition model is trained with training images of one resolution, and the resolutions of the training images used by any two recognition models during training are different.
- The method according to any one of claims 1 to 8, wherein searching for the target region among the candidate regions in the first image centered on points within the first regions comprises: obtaining multiple fourth regions found by the target model among all the candidate regions, wherein a fourth region is a region in which the target model recognizes the target object in the first image; and when among the multiple fourth regions there are fourth regions whose centers are no more than a third threshold apart, taking the one with the highest confidence among those fourth regions as a target region, and taking a fourth region whose center is more than the third threshold away from the center of every other fourth region as a target region.
- The method according to any one of claims 1 to 8, wherein dividing the first image into multiple first regions by the target model comprises: selecting, from multiple recognition models, the target model matching the resolution of the first image, and dividing the first image into the multiple first regions by the target model.
- An image recognition apparatus, the apparatus comprising: a first acquiring unit configured to acquire a first image; a searching unit configured to divide the first image into multiple first regions by means of a target model and to search for a target region among candidate regions in the first image centered on points within the first regions, wherein the target region is the candidate region in the first image in which the target object is located, and the target model is a pre-trained neural network model for identifying, from an image, the region in which a target object is located, trained with positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located; and a marking unit configured to identify the target region in the first image.
- The apparatus according to claim 11, further comprising: a second acquiring unit configured to acquire, before the first image is divided into multiple first regions by the target model and the target region is searched for among the candidate regions in the first image centered on points within the first regions, training images comprising positive samples that mark the region where the target object is located and negative samples that mark the region where noise is located; and a training unit configured to train parameters in an original model with the training images to obtain a recognition model for recognizing the target object, wherein the recognition model comprises the target model.
- The apparatus according to claim 12, wherein the training unit comprises: a training module configured to input the color data of the training images into the original model and train the parameters in the original model with the color data of the training images; a first verification module configured to take the trained original model as the recognition model when the color data of a verification image is used as input to the trained original model and a second region recognized by the original model in the verification image matches a labeled region of the verification image, wherein the second region is the region in which the trained original model recognizes the target object in the verification image, and the labeled region is the region where the target object actually is, as labeled in the training image; and a second verification module configured to instruct the training module, when the color data of the verification image is used as input to the trained original model and the second region recognized by the original model in the verification image does not match the labeled region, to continue training the parameters in the original model with the color data of the positive samples and the color data of the negative samples until the second region recognized by the original model in the verification image matches the labeled region when the color data of the verification image is used as input.
- An electronic apparatus comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor performs the method of any one of claims 1 to 10 by means of the computer program.
- An image recognition system comprising an image capture device and an image recognition device, wherein the image recognition device is configured to perform the method of any one of claims 1 to 10.
- A storage medium comprising a stored computer program, wherein the computer program, when run, performs the method of any one of claims 1 to 10.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19886675.8A EP3819822A4 (en) | 2018-11-23 | 2019-11-01 | IMAGE IDENTIFICATION PROCESS AND APPARATUS, TERMINAL AND STORAGE MEDIA |
US17/225,861 US11869227B2 (en) | 2018-11-23 | 2021-04-08 | Image recognition method, apparatus, and system and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811410221.0 | 2018-11-23 | ||
CN201811410221.0A CN109670532B (zh) | 2018-11-23 | 2018-11-23 | 生物体器官组织图像的异常识别方法、装置及系统 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/225,861 Continuation US11869227B2 (en) | 2018-11-23 | 2021-04-08 | Image recognition method, apparatus, and system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020103676A1 true WO2020103676A1 (zh) | 2020-05-28 |
Family
ID=66142236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/115117 WO2020103676A1 (zh) | 2018-11-23 | 2019-11-01 | 图像识别方法、装置、系统及存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11869227B2 (zh) |
EP (1) | EP3819822A4 (zh) |
CN (1) | CN109670532B (zh) |
WO (1) | WO2020103676A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553327A (zh) * | 2020-05-29 | 2020-08-18 | 上海依图网络科技有限公司 | 一种服饰识别方法、装置、设备和介质 |
CN112364715A (zh) * | 2020-10-23 | 2021-02-12 | 岭东核电有限公司 | 核电作业异常监控方法、装置、计算机设备和存储介质 |
CN112818865A (zh) * | 2021-02-02 | 2021-05-18 | 北京嘀嘀无限科技发展有限公司 | 车载领域图像识别方法、识别模型建立方法、装置、电子设备和可读存储介质 |
CN112818970A (zh) * | 2021-01-28 | 2021-05-18 | 北京科技大学设计研究院有限公司 | 一种钢卷喷码识别通用检测方法 |
CN113469057A (zh) * | 2021-07-02 | 2021-10-01 | 中南大学 | 火眼视频自适应检测方法、装置、设备及介质 |
CN113468938A (zh) * | 2020-07-31 | 2021-10-01 | 成都通甲优博科技有限责任公司 | 交通图像识别方法、装置、图像处理设备及可读存储介质 |
CN114417962A (zh) * | 2021-12-08 | 2022-04-29 | 航天科工网络信息发展有限公司 | 基于k近邻算法的异常数据检测方法、系统、设备、介质 |
CN118135484A (zh) * | 2024-03-04 | 2024-06-04 | 北京数原数字化城市研究中心 | 目标检测方法、装置及相关设备 |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670532B (zh) | 2018-11-23 | 2022-12-09 | 腾讯医疗健康(深圳)有限公司 | 生物体器官组织图像的异常识别方法、装置及系统 |
CN110377587B (zh) * | 2019-07-15 | 2023-02-10 | 腾讯科技(深圳)有限公司 | 基于机器学习的迁移数据确定方法、装置、设备及介质 |
KR20210067442A (ko) * | 2019-11-29 | 2021-06-08 | 엘지전자 주식회사 | 객체 인식을 위한 자동 레이블링 장치 및 방법 |
CN112712492B (zh) * | 2020-02-04 | 2021-12-24 | 首都医科大学附属北京友谊医院 | 确定设备质量的方法、装置、服务器和存储介质 |
CN112766481B (zh) * | 2020-03-13 | 2023-11-24 | 腾讯科技(深圳)有限公司 | 神经网络模型的训练方法、装置及图像检测的方法 |
CN111931424B (zh) * | 2020-08-12 | 2024-04-16 | 北京卫星环境工程研究所 | 异常数据的均衡化处理方法、装置、设备及存储介质 |
CN112329721B (zh) * | 2020-11-26 | 2023-04-25 | 上海电力大学 | 一种模型轻量化设计的遥感小目标检测方法 |
CN113205512B (zh) * | 2021-05-26 | 2023-10-24 | 北京市商汤科技开发有限公司 | 图像异常检测方法、装置、设备及计算机可读存储介质 |
CN113361413B (zh) * | 2021-06-08 | 2024-06-18 | 南京三百云信息科技有限公司 | 一种里程显示区域检测方法、装置、设备及存储介质 |
CN113610006B (zh) * | 2021-08-09 | 2023-09-08 | 中电科大数据研究院有限公司 | 一种基于目标检测模型的超时劳动判别方法 |
CN113537248B (zh) * | 2021-08-13 | 2024-06-07 | 珠海格力电器股份有限公司 | 图像识别方法和装置、电子设备和存储介质 |
CN113763389B (zh) * | 2021-08-24 | 2022-06-14 | 深圳前海爱客风信息技术有限公司 | 一种基于多主体检测分割的图像识别方法 |
CN113887638B (zh) * | 2021-10-09 | 2024-08-06 | 上海识装信息科技有限公司 | 图像数据扩增方法、装置、设备及存储介质 |
CN113642537B (zh) * | 2021-10-14 | 2022-01-04 | 武汉大学 | 一种医学图像识别方法、装置、计算机设备及存储介质 |
CN114187590A (zh) * | 2021-10-21 | 2022-03-15 | 山东师范大学 | 同色系背景下目标果实识别方法及系统 |
CN113920140B (zh) * | 2021-11-12 | 2022-04-19 | 哈尔滨市科佳通用机电股份有限公司 | 一种基于深度学习的铁路货车管盖脱落故障识别方法 |
CN114820459A (zh) * | 2022-03-31 | 2022-07-29 | 江苏本峰新材料科技有限公司 | 基于计算机辅助的铝单板打磨质量评估方法及系统 |
CN115100587B (zh) * | 2022-05-25 | 2023-06-23 | 水利部珠江水利委员会水文局 | 基于多元数据的区域乱采监测方法及装置 |
CN115100492B (zh) * | 2022-08-26 | 2023-04-07 | 摩尔线程智能科技(北京)有限责任公司 | Yolov3网络训练、pcb表面缺陷检测方法及装置 |
CN116391693B (zh) * | 2023-06-07 | 2023-09-19 | 北京市农林科学院智能装备技术研究中心 | 天牛灭杀方法及系统 |
CN116778138B (zh) * | 2023-06-27 | 2024-10-15 | 中国人民解放军陆军军医大学第二附属医院 | 一种肺结节定位系统 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609680A (zh) * | 2011-12-22 | 2012-07-25 | 中国科学院自动化研究所 | 一种基于三维深度图像信息的并行统计学习人体部位检测方法 |
US20170330054A1 (en) * | 2016-05-10 | 2017-11-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus |
CN107886110A (zh) * | 2017-10-23 | 2018-04-06 | 深圳云天励飞技术有限公司 | 人脸检测方法、装置及电子设备 |
CN109670532A (zh) * | 2018-11-23 | 2019-04-23 | 腾讯科技(深圳)有限公司 | 生物体器官组织图像的异常识别方法、装置及系统 |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3287177B2 (ja) * | 1995-05-17 | 2002-05-27 | 東洋インキ製造株式会社 | カラーイメージング方法および装置 |
JP5085573B2 (ja) * | 2009-01-13 | 2012-11-28 | 新日本製鐵株式会社 | 欠陥検査方法および欠陥検査装置 |
US8526723B2 (en) * | 2009-06-23 | 2013-09-03 | Los Alamos National Security, Llc | System and method for the detection of anomalies in an image |
CN102855478B (zh) * | 2011-06-30 | 2015-11-25 | 富士通株式会社 | 图像中文本区域定位方法和装置 |
US10055850B2 (en) * | 2014-09-19 | 2018-08-21 | Brain Corporation | Salient features tracking apparatus and methods using visual initialization |
WO2016086341A1 (en) * | 2014-12-01 | 2016-06-09 | Dongguan Zkteco Electronic Technology Co., Ltd | System and method for acquiring multimodal biometric information |
JP7026062B2 (ja) * | 2016-03-17 | 2022-02-25 | アビジロン コーポレイション | 機械学習によってオブジェクト分類器を訓練するためのシステム及び方法 |
KR102506459B1 (ko) * | 2016-05-20 | 2023-03-03 | 매직 립, 인코포레이티드 | 콘볼루셔널 이미지 변환 추정을 수행하기 위한 방법 및 시스템 |
CN106327502A (zh) * | 2016-09-06 | 2017-01-11 | 山东大学 | 一种安防视频中多场景多目标识别和跟踪方法 |
CN108304754A (zh) * | 2017-03-02 | 2018-07-20 | 腾讯科技(深圳)有限公司 | 车型的识别方法和装置 |
EP3392832A1 (en) * | 2017-04-21 | 2018-10-24 | General Electric Company | Automated organ risk segmentation machine learning methods and systems |
CN107122732B (zh) * | 2017-04-25 | 2019-12-31 | 福州大学 | 一种监控场景下高鲁棒性的快速车牌定位方法 |
CN108022238B (zh) * | 2017-08-09 | 2020-07-03 | 深圳科亚医疗科技有限公司 | 对3d图像中对象进行检测的方法、计算机存储介质和系统 |
CN107658028A (zh) * | 2017-10-25 | 2018-02-02 | 北京华信佳音医疗科技发展有限责任公司 | 一种获取病变数据的方法、识别病变方法及计算机设备 |
CN107644225A (zh) * | 2017-10-31 | 2018-01-30 | 北京青燕祥云科技有限公司 | 肺部病灶识别方法、装置和实现装置 |
CN110349156B (zh) * | 2017-11-30 | 2023-05-30 | 腾讯科技(深圳)有限公司 | 眼底图片中病变特征的识别方法和装置、存储介质 |
CN108509861B (zh) * | 2018-03-09 | 2020-06-30 | 山东师范大学 | 一种基于样本学习和目标检测结合的目标跟踪方法和装置 |
CN108564570A (zh) * | 2018-03-29 | 2018-09-21 | 哈尔滨工业大学(威海) | 一种智能化的病变组织定位的方法和装置 |
CN108764306B (zh) * | 2018-05-15 | 2022-04-22 | 深圳大学 | 图像分类方法、装置、计算机设备和存储介质 |
CN108765408B (zh) * | 2018-05-31 | 2021-09-10 | 杭州同绘科技有限公司 | 构建癌症病理图像虚拟病例库的方法以及基于卷积神经网络的多尺度癌症检测系统 |
CN109102543B (zh) * | 2018-08-17 | 2021-04-02 | 深圳蓝胖子机器智能有限公司 | 基于图像分割的物体定位方法、设备和存储介质 |
- 2018-11-23 CN CN201811410221.0A patent/CN109670532B/zh active Active
- 2019-11-01 WO PCT/CN2019/115117 patent/WO2020103676A1/zh unknown
- 2019-11-01 EP EP19886675.8A patent/EP3819822A4/en active Pending
- 2021-04-08 US US17/225,861 patent/US11869227B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609680A (zh) * | 2011-12-22 | 2012-07-25 | 中国科学院自动化研究所 | 一种基于三维深度图像信息的并行统计学习人体部位检测方法 |
US20170330054A1 (en) * | 2016-05-10 | 2017-11-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus |
CN107886110A (zh) * | 2017-10-23 | 2018-04-06 | 深圳云天励飞技术有限公司 | 人脸检测方法、装置及电子设备 |
CN109670532A (zh) * | 2018-11-23 | 2019-04-23 | 腾讯科技(深圳)有限公司 | 生物体器官组织图像的异常识别方法、装置及系统 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3819822A4 |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553327A (zh) * | 2020-05-29 | 2020-08-18 | 上海依图网络科技有限公司 | 一种服饰识别方法、装置、设备和介质 |
CN111553327B (zh) * | 2020-05-29 | 2023-10-27 | 上海依图网络科技有限公司 | 一种服饰识别方法、装置、设备和介质 |
CN113468938A (zh) * | 2020-07-31 | 2021-10-01 | 成都通甲优博科技有限责任公司 | 交通图像识别方法、装置、图像处理设备及可读存储介质 |
CN112364715A (zh) * | 2020-10-23 | 2021-02-12 | 岭东核电有限公司 | 核电作业异常监控方法、装置、计算机设备和存储介质 |
CN112364715B (zh) * | 2020-10-23 | 2024-05-24 | 岭东核电有限公司 | 核电作业异常监控方法、装置、计算机设备和存储介质 |
CN112818970A (zh) * | 2021-01-28 | 2021-05-18 | 北京科技大学设计研究院有限公司 | 一种钢卷喷码识别通用检测方法 |
CN112818970B (zh) * | 2021-01-28 | 2023-07-21 | 北京科技大学设计研究院有限公司 | 一种钢卷喷码识别通用检测方法 |
CN112818865A (zh) * | 2021-02-02 | 2021-05-18 | 北京嘀嘀无限科技发展有限公司 | 车载领域图像识别方法、识别模型建立方法、装置、电子设备和可读存储介质 |
CN113469057A (zh) * | 2021-07-02 | 2021-10-01 | 中南大学 | 火眼视频自适应检测方法、装置、设备及介质 |
CN114417962A (zh) * | 2021-12-08 | 2022-04-29 | 航天科工网络信息发展有限公司 | 基于k近邻算法的异常数据检测方法、系统、设备、介质 |
CN118135484A (zh) * | 2024-03-04 | 2024-06-04 | 北京数原数字化城市研究中心 | 目标检测方法、装置及相关设备 |
Also Published As
Publication number | Publication date |
---|---|
CN109670532B (zh) | 2022-12-09 |
US20210224998A1 (en) | 2021-07-22 |
CN109670532A (zh) | 2019-04-23 |
US11869227B2 (en) | 2024-01-09 |
EP3819822A1 (en) | 2021-05-12 |
EP3819822A4 (en) | 2021-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020103676A1 (zh) | 图像识别方法、装置、系统及存储介质 | |
US11612311B2 (en) | System and method of otoscopy image analysis to diagnose ear pathology | |
US11900647B2 (en) | Image classification method, apparatus, and device, storage medium, and medical electronic device | |
WO2020088288A1 (zh) | 内窥镜图像的处理方法、系统及计算机设备 | |
Tania et al. | Advances in automated tongue diagnosis techniques | |
CN110689025B (zh) | 图像识别方法、装置、系统及内窥镜图像识别方法、装置 | |
JP6458394B2 (ja) | 対象追跡方法及び対象追跡装置 | |
Walia et al. | Recent advances on multicue object tracking: a survey | |
CN112052831B (zh) | 人脸检测的方法、装置和计算机存储介质 | |
CN108292366A (zh) | 在内窥镜手术中检测可疑组织区域的系统和方法 | |
US11972571B2 (en) | Method for image segmentation, method for training image segmentation model | |
CN116188392B (zh) | 图像处理方法、计算机可读存储介质以及计算机终端 | |
WO2022089257A1 (zh) | 医学图像处理方法、装置、设备、存储介质及产品 | |
WO2019184851A1 (zh) | 图像处理方法和装置及神经网络模型的训练方法 | |
CN113261012B (zh) | 处理图像的方法、装置及系统 | |
US11721023B1 (en) | Distinguishing a disease state from a non-disease state in an image | |
CN111325709A (zh) | 无线胶囊内窥镜图像检测系统及检测方法 | |
Du et al. | Improving the classification performance of esophageal disease on small dataset by semi-supervised efficient contrastive learning | |
CN113658145B (zh) | 一种肝脏超声标准切面识别方法、装置、电子设备及存储介质 | |
TWM586599U (zh) | 人工智慧雲端膚質與皮膚病灶辨識系統 | |
CN116912154A (zh) | 一种皮损检测网络的相关方法、装置、设备和存储介质 | |
CN112288768B (zh) | 一种结肠镜图像序列肠息肉区域的跟踪初始化决策系统 | |
CA3205896A1 (en) | Machine learning enabled system for skin abnormality interventions | |
He et al. | Intestinal polyp recognition based on salient codebook locality-constrained linear coding with annular spatial pyramid matching | |
Nazir et al. | Deep Learning Techniques in Cervical Cancer Diagnosis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19886675 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019886675 Country of ref document: EP Effective date: 20210208 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |