WO2020253852A1 - Picture recognition method, recognition model training method, device and storage medium - Google Patents
Picture recognition method, recognition model training method, device and storage medium
- Publication number
- WO2020253852A1 (PCT/CN2020/097273)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- recognition model
- picture
- training
- cube
- Prior art date
Links
- 238000012549 training Methods 0.000 title claims abstract description 202
- 238000000034 method Methods 0.000 title claims abstract description 105
- 238000003860 storage Methods 0.000 title claims abstract description 23
- 230000006870 function Effects 0.000 claims description 40
- 230000015654 memory Effects 0.000 claims description 36
- 238000004590 computer program Methods 0.000 claims description 23
- 230000011218 segmentation Effects 0.000 claims description 10
- 238000012545 processing Methods 0.000 claims description 9
- 201000010099 disease Diseases 0.000 description 16
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 16
- 238000010586 diagram Methods 0.000 description 13
- 230000008569 process Effects 0.000 description 12
- 206010008111 Cerebral haemorrhage Diseases 0.000 description 6
- 230000005540 biological transmission Effects 0.000 description 6
- 238000013527 convolutional neural network Methods 0.000 description 6
- 238000002595 magnetic resonance imaging Methods 0.000 description 5
- 238000004891 communication Methods 0.000 description 4
- 230000002452 interceptive effect Effects 0.000 description 4
- 206010002329 Aneurysm Diseases 0.000 description 3
- 208000022211 Arteriovenous Malformations Diseases 0.000 description 3
- 206010020772 Hypertension Diseases 0.000 description 3
- 208000009433 Moyamoya Disease Diseases 0.000 description 3
- 230000009471 action Effects 0.000 description 3
- 230000005744 arteriovenous malformation Effects 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 3
- 238000002591 computed tomography Methods 0.000 description 2
- 238000002059 diagnostic imaging Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000003709 image segmentation Methods 0.000 description 2
- 238000010295 mobile communication Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000008707 rearrangement Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 208000003174 Brain Neoplasms Diseases 0.000 description 1
- 208000014644 Brain disease Diseases 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 201000007983 brain glioma Diseases 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000003325 tomography Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- This application relates to the field of computers, and in particular to a picture recognition method, recognition model training method, device and storage medium.
- the embodiments of the present application provide a picture recognition method, a recognition model training method, a device, and a storage medium, which can improve the accuracy of picture recognition while improving the efficiency of model training.
- the embodiment of the application provides a picture recognition method, including:
- the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
- the second recognition model is a model obtained by training the original recognition model using target training samples
- the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
- The embodiment of the present application provides a picture recognition method, executed by a terminal. The terminal includes one or more processors, a memory, and one or more programs stored in the memory; each program may include one or more units, each unit corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:
- the first recognition model is used to recognize the target 3D picture to obtain the picture type of the target 3D picture;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
- the second recognition model is a model obtained by training the original recognition model using target training samples
- the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
- the embodiment of the present application also provides a method for training a recognition model.
- the method is executed by a network device.
- The network device includes one or more processors, a memory, and one or more programs stored in the memory; each program may include one or more units, each unit corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:
- the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture.
- the embodiment of the application also provides a recognition model training method, including:
- the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the type of the target three-dimensional picture.
- An embodiment of the present application also provides a picture recognition device, including:
- the first obtaining unit is configured to obtain a three-dimensional picture of the target to be recognized;
- the first input unit is configured to input the three-dimensional picture of the target into the first recognition model;
- the first recognition model is used to recognize the target 3D picture to obtain the picture type of the target 3D picture;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
- the second recognition model is a model obtained by training the original recognition model using target training samples
- the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
- the second acquiring unit is configured to acquire the type of the target three-dimensional picture output by the first recognition model.
- the embodiment of the present application also provides a recognition model training device, including:
- a segmentation unit configured to obtain a three-dimensional sample picture, and segment N target cubes from the three-dimensional sample picture, where N is a natural number greater than one;
- a processing unit configured to rotate and sort the N target cubes to obtain target training samples
- a training unit configured to use the target training sample to train the original recognition model to obtain a second recognition model
- the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the type of the target three-dimensional picture.
- the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is set to execute the above-mentioned image recognition method when running.
- An embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the above-mentioned image recognition method by running the computer program.
- By training the first recognition model in this way before it is used, the training efficiency of the first recognition model is improved, while the accuracy of picture recognition is also improved.
- FIG. 1 is a schematic diagram of an application environment of a picture recognition method according to an embodiment of the present application
- FIG. 2 is a schematic flowchart of a picture recognition method according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of a picture recognition method according to an embodiment of the present application.
- FIG. 4 is a schematic diagram of another picture recognition method according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of another image recognition method according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of another image recognition method according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of another image recognition method according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of another image recognition method according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of another image recognition method according to an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a method for training a recognition model according to an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a picture recognition apparatus according to an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of a recognition model training device according to an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- Magnetic resonance imaging (MRI): a type of medical imaging.
- Computed tomography (CT): a type of medical imaging that can be used to examine a variety of diseases.
- Convolutional neural network (CNN).
- BRATS: Multimodal Brain Tumor Segmentation (benchmark dataset).
- Feature map: the output obtained by convolving an image with a filter. In practice, a feature map can itself be convolved with a filter to generate a new feature map.
- Siamese network: several convolutional neural networks with the same structure that share weight parameters.
- Hamming distance: the number of positions at which the corresponding characters of two strings differ.
- FCN: fully convolutional network.
- a picture recognition method is provided, and the picture recognition method can be, but is not limited to, applied to the environment shown in FIG. 1.
- the user 102 and the user equipment 104 may perform human-computer interaction.
- the user equipment 104 includes a memory 106 configured to store interactive data, and a processor 108 configured to process interactive data.
- the user equipment 104 can exchange data with the server 112 through the network 110.
- the server 112 includes a database 114 for storing interactive data, and a processing engine 116 for processing interactive data.
- the user equipment 104 includes a first recognition model.
- The user equipment 104 can obtain the target 3D picture 104-2 to be recognized, recognize it, and output the picture type 104-4 of the target 3D picture 104-2.
- The above image recognition method can be applied, but is not limited, to terminals capable of computation, such as mobile phones, tablets, laptops, and PCs.
- The above networks may include, but are not limited to, wireless networks and wired networks.
- the wireless network includes: Bluetooth, WIFI and other networks that realize wireless communication.
- the aforementioned wired network may include, but is not limited to: wide area network, metropolitan area network, and local area network.
- The aforementioned server may include, but is not limited to, any hardware device capable of computing, such as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
- the above-mentioned picture recognition method includes:
- S202 Acquire a 3D picture of a target to be identified.
- S204 Input the 3D image of the target to be recognized into the first recognition model.
- the first recognition model is used to recognize the target 3D picture to be recognized to obtain the picture type of the target 3D picture to be recognized;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
- the second recognition model is a model obtained by training the original recognition model using target training samples.
- The target training samples include: cubes obtained by rotating and sorting N target cubes taken from a 3D sample picture, where N is a natural number greater than 1.
- S206 Acquire the first type of the target 3D picture output by the first recognition model.
- the above-mentioned picture recognition method can be but not limited to be applied to the field of picture recognition.
- the above method is applied to the process of identifying the type of 3D picture.
- For example, in the process of recognizing the disease type in a 3D disease picture, such as recognizing the type of cerebral hemorrhage: after a 3D disease picture is obtained (it may be an MRI picture or a CT picture), the picture is input into the first recognition model, which recognizes it and outputs the first type of the 3D disease picture.
- the first type can be healthy, or aneurysm, arteriovenous malformation, moyamoya disease, high blood pressure, etc.
- The second recognition model is trained in advance using cubes extracted from 3D pictures, which improves its training efficiency; the convolution block of the second recognition model is then used as the convolution block of the first recognition model, and the first recognition model recognizes 3D pictures, which greatly improves the training efficiency of the first recognition model.
- Before the target 3D picture is acquired, the second recognition model needs to be trained. During training, 3D sample pictures are obtained first; these sample pictures are unlabeled. After a 3D sample picture is obtained, an original cube is extracted from it and split into N target cubes.
- When extracting the original cube, the geometric center of the 3D sample picture may be determined first. The geometric center is then taken as the geometric center of the original cube, and the original cube is determined; its side length is smaller than the smallest side of the 3D sample picture.
- the geometric center 304 of the 3D sample picture 302 is determined first, and then the original cube 306 with the geometric center 304 as the geometric center is determined.
- Alternatively, a radius r may be determined; with the geometric center of the 3D sample picture as the center, a sphere of radius r is constructed, and any point inside the sphere is used as the geometric center of the original cube. It should be noted that the determined original cube lies within the 3D sample picture and does not exceed its boundary.
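- The cube-sampling step above can be sketched as follows (a minimal numpy sketch; the function name, the fixed side length, and the uniform rejection sampling inside the sphere are illustrative assumptions, not the patent's exact procedure):

```python
import numpy as np

def sample_original_cube(volume, side, r, rng=None):
    """Sample a cube of `side` voxels whose centre lies within radius r
    of the volume's geometric centre, clipped to stay inside the volume."""
    rng = np.random.default_rng() if rng is None else rng
    assert side < min(volume.shape), "cube side must be smaller than the smallest volume side"
    centre = np.array(volume.shape) / 2.0
    # rejection-sample a random offset inside a sphere of radius r
    while True:
        offset = rng.uniform(-r, r, size=3)
        if np.linalg.norm(offset) <= r:
            break
    c = centre + offset
    start = np.round(c - side / 2.0).astype(int)
    # clip so the cube never leaves the volume
    start = np.clip(start, 0, np.array(volume.shape) - side)
    x, y, z = start
    return volume[x:x + side, y:y + side, z:z + side]
```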
- the original cube needs to be split to obtain N target cubes.
- Any method can be used: for example, randomly extracting N target cubes from the original cube, splitting part of the original cube to obtain N target cubes, or splitting the entire original cube into N target cubes, where N is the third power of a positive integer.
- An original cube 404 is split in the directions indicated by the arrows 402-1, 402-2, and 402-3 to obtain 8 target cubes (the split shown in Figure 4 is just an example).
- every two adjacent cubes are separated by M voxels.
- the original cube 502 is split into 8 target cubes 504.
- the side length of the original cube 502 is 10 voxels, and the side length of the target cube 504 is 4 voxels.
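- The split in the example above (an original cube of side 10 voxels cut into eight target cubes of side 4, with voxels left between adjacent cubes) can be sketched as follows (illustrative numpy sketch; the function name and the fixed gap are assumptions):

```python
import numpy as np

def split_into_target_cubes(original, k=2, gap=2):
    """Split a cubic array into k**3 target cubes of equal side,
    leaving `gap` voxels between every two adjacent cubes.
    With side 10, k=2, gap=2 this yields eight cubes of side 4."""
    side = original.shape[0]
    sub = (side - (k - 1) * gap) // k  # side length of each target cube
    cubes = []
    for i in range(k):
        for j in range(k):
            for l in range(k):
                x = i * (sub + gap)
                y = j * (sub + gap)
                z = l * (sub + gap)
                cubes.append(original[x:x + sub, y:y + sub, z:z + sub])
    return cubes
```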
- The first target cube among the N target cubes may be rotated by a first angle, such as 90 degrees or 180 degrees. There may be one or more first target cubes, and each may be rotated by the same or a different angle. The rotated first target cubes and the remaining unrotated target cubes are then sorted; the sorting may be random, and the target training sample is obtained after sorting.
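- A minimal sketch of this rotate-and-permute step (the per-cube random choice between no rotation, horizontal 180°, and vertical 180°, and the returned label layout, are illustrative assumptions):

```python
import numpy as np

def make_training_sample(cubes, rng=None):
    """Rotate a random subset of cubes by 180 degrees (horizontally or
    vertically), then randomly permute them; returns the shuffled cubes
    plus the permutation and rotation labels the network must predict."""
    rng = np.random.default_rng() if rng is None else rng
    hor = np.zeros(len(cubes), dtype=int)  # 1 if cube got a horizontal 180° turn
    ver = np.zeros(len(cubes), dtype=int)  # 1 if cube got a vertical 180° turn
    rotated = []
    for i, c in enumerate(cubes):
        choice = rng.integers(0, 3)  # 0: none, 1: horizontal, 2: vertical
        if choice == 1:
            c = np.rot90(c, 2, axes=(0, 1))
            hor[i] = 1
        elif choice == 2:
            c = np.rot90(c, 2, axes=(0, 2))
            ver[i] = 1
        rotated.append(c)
    order = rng.permutation(len(rotated))
    shuffled = [rotated[i] for i in order]
    return shuffled, order, hor[order], ver[order]
```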
- The original recognition model is trained using the target training sample; the model outputs the probability of which rotation was applied to each target cube and of the ordering of the cubes in the target training sample.
- the aforementioned probability may or may not satisfy the first objective function.
- The first objective function may be a loss function. If the above probability satisfies the first objective function, the recognition result of the original recognition model is correct; if it does not, the recognition result is incorrect.
- When the probability that the recognition result satisfies the first objective function is greater than a first threshold, the original recognition model is determined to be the second recognition model. This indicates that the accuracy of the second recognition model is greater than the first threshold, for example above 99.95%.
- After that, the convolution block of the second recognition model can be obtained and used as the convolution block of the first recognition model, and the first training sample is used to train the first recognition model.
- the first training sample is a 3D picture including picture types.
- the first recognition model can be put into use. Such as identifying the type of disease in 3D pictures.
- A selection button 602-1 is displayed on the display interface 602 of the terminal. The user can select the target 3D picture 604 to be recognized; the terminal recognizes it and outputs its first type 606.
- The BRATS-2018 dataset includes MRI images of 285 patients. Each patient's MRI includes 4 modalities: T1, T1Gd, T2, and FLAIR. The data of the different modalities are co-registered, and each image is 240×240×155.
- the cerebral hemorrhage dataset includes 1486 brain CT scan images of cerebral hemorrhage.
- the types of cerebral hemorrhage are aneurysm, arteriovenous malformation, moyamoya disease, and hypertension.
- the size of each CT image is 230x270x30.
- The above pictures are used as training data for the second recognition model.
- the original cube is extracted from the picture and the original cube is split into the target cube.
- For the specific method of selecting the original cube, refer to the example above; it is not repeated here.
- After the original cube is selected, in order to encourage the network to learn high-level semantic features through the proxy task of Rubik's-cube restoration, rather than low-level statistical features of the pixel distribution, a random interval of up to 10 voxels is left between every two adjacent target cubes when the original cube is cut. The voxels in each target cube are then normalized to [-1, 1], yielding the target training sample.
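- The per-cube normalization can be sketched as follows (minimal numpy sketch; the guard for a constant-intensity cube is an added assumption):

```python
import numpy as np

def normalize_cube(cube):
    """Linearly rescale voxel intensities into [-1, 1]."""
    lo, hi = cube.min(), cube.max()
    if hi == lo:
        # degenerate constant cube: map everything to 0
        return np.zeros_like(cube, dtype=float)
    return 2.0 * (cube - lo) / (hi - lo) - 1.0
```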
- The Siamese network includes X sub-networks that share weights with each other, where X is the number of target cubes.
- an 8-in-1 Siamese network with 8 target cube inputs is used.
- Each sub-network has the same network structure and shares weights with each other.
- the backbone structure of each sub-network can use various types of 3D CNN that currently exist, and the 3D VGG network is used in the experiment.
- The output feature maps of the last fully connected layer of all sub-networks are stacked together and then input into different branches, used respectively for the spatial-rearrangement task and the rotation-determination task of the target cubes.
- the above feature map is the content output by any network in the convolution model.
- the first step is to rearrange the target cube.
- Hamming distance is used as the metric, and K sequences that are maximally different from each other are selected in turn.
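- A greedy selection of this kind can be sketched as follows (illustrative; the candidate pool size, the seed, and the greedy max-min criterion are assumptions about details the text leaves unspecified):

```python
import random
import numpy as np

def select_permutations(n, k, pool=1000, seed=0):
    """Greedily select k permutations of n items that are mutually far
    apart in Hamming distance, starting from a random candidate pool."""
    rng = random.Random(seed)
    candidates = set()
    while len(candidates) < pool:
        candidates.add(tuple(rng.sample(range(n), n)))
    candidates = [np.array(c) for c in candidates]
    chosen = [candidates.pop()]
    while len(chosen) < k:
        # distance of each candidate to its nearest already-chosen sequence
        dists = [min(int(np.sum(c != s)) for s in chosen) for c in candidates]
        # keep the candidate that is farthest from everything chosen so far
        chosen.append(candidates.pop(int(np.argmax(dists))))
    return chosen
```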
- In the permutation loss, l_j represents the true one-hot label of the sequence, and p_j represents the predicted probability of each sequence output by the network.
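- The rendered formula is lost in this text; from the variable description above (one-hot labels l_j over the K candidate sequences, predicted probabilities p_j), the permutation loss is presumably the standard cross-entropy:

```latex
\mathcal{L}_{\mathrm{perm}} = -\sum_{j=1}^{K} l_j \log p_j
```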
- a new operation is added to the 3D Rubik's Cube restoration task, that is, the rotation of the target cube.
- the network can learn the rotation-invariant characteristics of the 3D image block.
- 3 rotation axes (x, y, z) × 2 rotation directions (clockwise, counterclockwise) × 4 rotations.
- the target cube can only be rotated by 180° in the horizontal or vertical direction.
- For example, cubes 3 and 4 are rotated by 180° horizontally, and cubes 5 and 7 are rotated by 180° vertically.
- The network needs to determine which rotation each target cube has undergone, so the loss function for the cube-rotation task is as follows:
- In the formula, M represents the number of target cubes, g_i^hor and g_i^ver represent the one-hot labels for horizontal and vertical rotation of target cube i, and r_i^hor and r_i^ver represent the network's predicted output probabilities in the horizontal and vertical directions.
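- The rotation-loss formula is also lost in this text; from the variable description (M cubes, one-hot rotation labels g_i^hor and g_i^ver, predicted probabilities r_i^hor and r_i^ver), it is presumably a sum of cross-entropies over the cubes, possibly normalized by M:

```latex
\mathcal{L}_{\mathrm{rot}} = -\sum_{i=1}^{M} \left( g_i^{\mathrm{hor}} \log r_i^{\mathrm{hor}} + g_i^{\mathrm{ver}} \log r_i^{\mathrm{ver}} \right)
```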
- The objective function of the model is the linear weighting of the permutation loss function and the rotation loss function.
- The overall loss function of the model is as follows:
- a and b are the weights of the two loss functions and control the degree of mutual influence between the two subtasks. In the experiments, setting both weights to 0.5 allowed the pre-training to achieve better results.
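- A minimal numpy sketch of this weighted objective (the function name and array shapes are illustrative, and the two component losses are the reconstructed cross-entropies, up to normalization):

```python
import numpy as np

def rubik_loss(p, l, r_hor, g_hor, r_ver, g_ver, a=0.5, b=0.5):
    """Linear weighting of the permutation cross-entropy and the
    per-cube rotation cross-entropies; a = b = 0.5 as in the text."""
    eps = 1e-12  # numerical safety for log
    perm_loss = -np.sum(l * np.log(p + eps))
    rot_loss = -np.sum(g_hor * np.log(r_hor + eps)
                       + g_ver * np.log(r_ver + eps))
    return a * perm_loss + b * rot_loss
```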
- the second recognition model can be obtained.
- the accuracy of the second recognition model is greater than the first threshold.
- The convolution block of the second recognition model can be extracted and, after fine-tuning, reused for other target tasks.
- The convolution block of the second recognition model is extracted and used by the first recognition model to recognize the type of a 3D picture.
- For classification tasks, only the fully connected layers after the CNN backbone need to be retrained; the convolutional layers before them can be fine-tuned with a smaller learning rate.
- the pre-training network can be used in a fully convolutional neural network (FCN) commonly used in image segmentation tasks, such as a 3D U-Net structure, as shown in Figure 8.
- the network parameters in the up-sampling stage of U-Net still need to be initialized randomly during training.
- DUC Dense Upsampling Convolution
- Because the cubes extracted from 3D pictures are used to train the second recognition model in advance, the training efficiency of the second recognition model is improved; using the convolution block of the second recognition model as the convolution block of the first recognition model, which then recognizes 3D pictures, greatly improves the training efficiency of the first recognition model.
- the obtaining the 3D picture of the target to be recognized may further include:
- The 3D sample picture and the target 3D picture may be the same picture. That is, after the second recognition model is trained using 3D sample pictures and its convolution block is used as the convolution block of the first recognition model, a 3D sample picture can be input into the first recognition model, and the first recognition model recognizes the type of the 3D sample picture. When the 3D sample picture is input to the second recognition model, its type does not need to be input.
- N target cubes are obtained to train the second recognition model, which improves the training efficiency of training the second recognition model and improves the training of the first recognition model effectiveness.
- N is the third power of a positive integer greater than 1.
- the splitting the original cube into the N target cubes includes:
- M voxels are left between every two adjacent target cubes, so that the second recognition model learns high-level semantic features rather than low-level statistical features of the pixel distribution, which improves the training efficiency of the second recognition model and, in turn, the training efficiency of the first recognition model.
- the following operations may also be included:
- S3 Sort the first target cube, rotated by the first angle, together with the other target cubes among the N target cubes to obtain the target training sample.
- the foregoing sorting may be randomly sorting the N target cubes.
- The above rotation may be applied to multiple first target cubes among the N target cubes, and the rotation may be by any angle.
- After the first target cube rotated by the first angle is sorted with the other target cubes among the N target cubes to obtain the target training sample, the method may further include:
- the target training sample is input into the original recognition model to train the original recognition model to obtain the second recognition model.
- Before the 3D picture of the target to be recognized is acquired, the method may further include:
- When the recognition accuracy of the second recognition model is greater than a given accuracy value, the second recognition model is considered to meet the requirements, and its training is stopped.
- Before the 3D picture of the target to be recognized is acquired, the method may further include:
- A first sample picture with a label may be input, and the first recognition model is trained until its recognition accuracy is greater than the second threshold; the first recognition model can then be put into use.
- The target training samples include cubes obtained by rotating and sorting N target cubes taken from three-dimensional sample pictures. Pre-training the second recognition model with cubes extracted from three-dimensional pictures improves the training efficiency of the second recognition model and at the same time improves the recognition accuracy for three-dimensional pictures;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model; that is, the convolution block trained by the second recognition model is used as the convolution block of the first recognition model. 1. The training efficiency of the recognition model;
- the target three-dimensional picture is recognized, which improves the recognition accuracy; by training the first recognition model before using the first recognition model, the recognition is improved.
- the training efficiency of the first recognition model for training is improved.
- the embodiment of the present application also provides a recognition model training method. As shown in Figure 10, the method includes:
- S1002 Obtain a 3D sample picture, and segment N target cubes from the 3D sample picture.
- N is a natural number greater than 1.
- S1004 Perform a predetermined operation on the N target cubes to obtain target training samples, where the predetermined operation includes rotating and sorting the N target cubes.
- S1006 Use the target training sample to train the original recognition model to obtain a second recognition model.
- the original recognition model is used to output the recognition result of the target training sample, and when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model.
- the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture.
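The steps S1002–S1006 above train the model against a "first objective function", which the text only says may be a loss function. As a minimal sketch, one common assumption for this kind of rotate-and-sort pretext task is a cross-entropy over the permutation class plus a per-cube binary cross-entropy over the rotation flags; the function name and exact form below are illustrative, not the patent's actual formulation:

```python
import numpy as np

def pretext_loss(perm_logits, perm_label, rot_logits, rot_labels):
    """Assumed form of the 'first objective function' (sketch only)."""
    # Cross-entropy over the permutation class (which ordering was applied).
    p = np.exp(perm_logits - perm_logits.max())
    p /= p.sum()
    ce = -np.log(p[perm_label] + 1e-12)
    # Per-cube binary cross-entropy over the rotation flags.
    r = 1.0 / (1.0 + np.exp(-rot_logits))  # sigmoid
    bce = -np.mean(rot_labels * np.log(r + 1e-12)
                   + (1 - rot_labels) * np.log(1 - r + 1e-12))
    return ce + bce
```

Training would then stop once the model's predicted probability for the true permutation and rotations exceeds the first threshold.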
- the above-mentioned method may but is not limited to be applied to the process of model training.
- N target cubes are extracted from a 3D sample picture, and the N target cubes are rotated and sorted to obtain N cubes as target training samples and input into the original recognition model.
- the extraction, rotation, and sorting methods can refer to the methods in the foregoing embodiment, and details are not described in this embodiment.
- the original recognition model outputs the probabilities of the rotation applied to each target cube and of the ordering of the target cubes in the target training sample.
- the aforementioned probability may or may not satisfy the first objective function.
- the first objective function may be a loss function.
- if the probability satisfies the first objective function, the current original recognition model is determined to be a well-trained model.
- the convolution block of the trained recognition model can then be extracted and, after a new fully connected layer is added, a new recognition model can be formed; the new recognition model can be used for other recognition tasks.
- the new recognition model can achieve high recognition accuracy after training with a small number of samples, for example when applied to recognizing the type of 3D pictures or to tasks such as 3D picture segmentation; details are not repeated here.
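The transfer step described above — reusing the trained convolution block and adding a new fully connected layer — can be sketched as follows. The `DownstreamClassifier` class and the `feature_fn` stand-in for the convolution block are illustrative assumptions, not an API from the source:

```python
import numpy as np

class DownstreamClassifier:
    """Reuse a pretrained feature extractor (standing in for the trained
    convolution block) and add a fresh fully connected layer (sketch)."""

    def __init__(self, feature_fn, n_features, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.feature_fn = feature_fn              # frozen, from the pretext model
        self.w = rng.normal(0.0, 0.01, (n_features, n_classes))
        self.b = np.zeros(n_classes)              # the new fully connected layer

    def predict(self, volume):
        f = np.asarray(self.feature_fn(volume))   # shared "convolution block"
        logits = f @ self.w + self.b
        return int(np.argmax(logits))
```

Only the new head needs fitting on labeled data, which is why few labeled samples suffice.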
- the embodiment of the present application also provides a picture recognition device for implementing the above picture recognition method.
- the device includes:
- the first obtaining unit 1102 is configured to obtain a target 3D picture to be recognized;
- the first input unit 1104 is configured to input the target 3D picture to be recognized into the first recognition model.
- the first recognition model is used to recognize the target 3D picture to be recognized to obtain the picture type of the target 3D picture to be recognized;
- the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
- the second recognition model is a model obtained by training the original recognition model using target training samples
- the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a 3D sample picture, where N is a natural number greater than 1;
- the second obtaining unit 1106 is configured to obtain the first type of the target 3D picture to be recognized and output by the first recognition model.
- the above-mentioned picture recognition apparatus may be but not limited to be applied in the field of picture recognition.
- the above method is applied to the process of identifying the type of 3D picture.
- for example, in the process of recognizing the type of disease in 3D disease pictures, such as recognizing the type of cerebral hemorrhage: after the 3D disease picture is obtained, it is input into the first recognition model, which recognizes it and outputs the first type of the 3D disease picture.
- the first type can be healthy, or aneurysm, arteriovenous malformation, moyamoya disease, high blood pressure, etc.
- the second recognition model is trained in advance with cubes extracted from the 3D picture, which improves the training efficiency of the second recognition model; the convolution block of the second recognition model is used as the convolution block of the first recognition model, and the first recognition model then recognizes 3D pictures, which greatly improves the training efficiency of the first recognition model.
- before the target 3D picture is acquired, the second recognition model needs to be trained. During training, 3D sample pictures are obtained first; these are unlabeled pictures. After a 3D sample picture is obtained, an original cube is extracted from it and split into N target cubes.
- when extracting the original cube, the geometric center of the 3D sample picture may be determined first. After the geometric center is determined, it is taken as the geometric center of the original cube, and the original cube is determined; the side length of the original cube is smaller than the minimum side length of the 3D sample picture.
- the geometric center 304 of the 3D sample picture 302 is first determined, and then the original cube 306 with the geometric center 304 as the geometric center is determined.
- alternatively, a radius r can be determined; taking the geometric center of the 3D sample picture as the center, a sphere of radius r is drawn, and any point within the sphere is selected as the geometric center of the original cube to determine the original cube. It should be noted that the determined original cube lies within the 3D sample picture and does not exceed its boundaries.
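The extraction step above can be sketched in a few lines of NumPy: take the volume's geometric center, optionally jitter it by up to r voxels per axis, and clamp so the cube stays inside the volume. The function name and the per-axis jitter (rather than a true spherical sample) are illustrative assumptions:

```python
import numpy as np

def extract_original_cube(volume, side, r=0, rng=None):
    """Extract an original cube of edge `side` centred on (or near)
    the geometric centre of a 3D volume (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    assert side <= min(volume.shape), "cube must fit inside the volume"
    center = np.array(volume.shape) // 2
    if r:
        # Jitter the centre by at most r voxels along each axis.
        center = center + rng.integers(-r, r + 1, size=3)
    half = side // 2
    # Clamp so the cube never leaves the volume.
    lo = np.clip(center - half, 0, np.array(volume.shape) - side)
    x, y, z = lo
    return volume[x:x + side, y:y + side, z:z + side]
```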
- the original cube needs to be split to obtain N target cubes.
- any method can be used, such as randomly digging out N target cubes from the original cube, or splitting a part of the original cube to obtain N target cubes. Or, split the original cube into N target cubes, where N is the third power of a positive integer.
- for example, an original cube 404 is split in the directions indicated by the arrows 402-1, 402-2, and 402-3 to obtain 8 target cubes (the split shown in Figure 4 is just an example).
- every two adjacent cubes are separated by M voxels.
- the original cube 502 is split into 8 target cubes 504.
- the side length of the original cube 502 is 10 voxels, and the side length of the target cube 504 is 4 voxels.
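The split with an M-voxel gap can be sketched as follows, using the example above (original side 10, target side 4, N = 2³ = 8). The function name and parameters are illustrative; the only constraint from the text is that adjacent target cubes are separated by M voxels, with 0 < M < J-1:

```python
import numpy as np

def split_into_target_cubes(cube, k=2, j=4, m=1):
    """Split a cube into k**3 target cubes of edge `j`, leaving a gap
    of `m` voxels between adjacent target cubes (illustrative sketch)."""
    span = k * j + (k - 1) * m
    assert span <= cube.shape[0], "target cubes plus gaps must fit"
    starts = [i * (j + m) for i in range(k)]   # e.g. [0, 5] for j=4, m=1
    targets = []
    for x in starts:
        for y in starts:
            for z in starts:
                targets.append(cube[x:x + j, y:y + j, z:z + j])
    return targets
```

Because the gap voxels are discarded, the model cannot solve the pretext task by matching pixel values along cube borders, which is the stated motivation for the M-voxel spacing.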
- the first target cube among the N target cubes may be rotated by a first angle, such as 90 degrees or 180 degrees. There may be one or more first target cubes, and the rotation angle of each first target cube may be the same or different. The rotated first target cube and the remaining unrotated target cubes are then sorted; the sorting can be random, and the target training sample is obtained after sorting.
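The rotate-and-sort step above can be sketched as follows. For simplicity this sketch rotates exactly one cube by a multiple of 90 degrees about one axis (the text allows several cubes and arbitrary angles), and returns the labels — permutation order and rotation flags — that the pretext model is trained to predict. All names are illustrative:

```python
import numpy as np

def make_training_sample(targets, rng=None):
    """Rotate one randomly chosen target cube, then randomly permute
    all cubes; return the sample plus its pretext labels (sketch)."""
    rng = rng or np.random.default_rng()
    cubes = [c.copy() for c in targets]
    idx = int(rng.integers(len(cubes)))
    quarter_turns = int(rng.integers(1, 4))           # 90 / 180 / 270 degrees
    cubes[idx] = np.rot90(cubes[idx], k=quarter_turns, axes=(0, 1))
    order = rng.permutation(len(cubes))               # random sorting
    sample = [cubes[i] for i in order]
    rotation = [int(i == idx) for i in order]         # flag the rotated slot
    return sample, order, rotation
```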
- the original recognition model is trained with the target training sample, and outputs the probabilities of the rotation applied to each target cube and of the ordering of the target cubes in the target training sample.
- the aforementioned probability may or may not satisfy the first objective function.
- the first objective function may be a loss function. If the above probability satisfies the first objective function, it means that the original recognition model is correct. If the above probability does not satisfy the first objective function, it means that the recognition result of the original recognition model is incorrect.
- when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model, which indicates that the accuracy of the second recognition model is greater than the first threshold, for example above 99.95%.
- after the second recognition model is obtained, its convolution block can be extracted and used as the convolution block of the first recognition model, and the first training sample is used to train the first recognition model.
- the first training sample is a 3D picture including picture types.
- the first recognition model can then be put into use, for example to identify the type of disease in 3D pictures.
- a selection button 602-1 is displayed on the display interface 602 of the terminal. The user can select the target 3D picture 604 to be recognized; the terminal recognizes it and outputs its first type 606.
- the second recognition model is trained in advance with cubes extracted from 3D pictures, which improves the training efficiency of the second recognition model; the convolution block of the second recognition model is used as the convolution block of the first recognition model, and the first recognition model then recognizes 3D pictures, which greatly improves the training efficiency of the first recognition model.
- the device further includes:
- the third acquiring unit is configured to acquire the 3D sample picture before the acquiring the target 3D picture to be identified;
- the first determining unit is configured to determine the original cube from the 3D sample picture
- the splitting unit is configured to split the original cube into the N target cubes.
- the 3D sample picture and the target 3D picture may be the same picture. That is, after the second recognition model is trained with 3D sample pictures and its convolution block is used as the convolution block of the first recognition model, the 3D sample pictures can be input into the first recognition model, and the first recognition model recognizes the type of the 3D sample picture. When a 3D sample picture is input into the second recognition model, its type does not need to be input.
- N is the third power of a positive integer greater than 1;
- the splitting unit includes:
- the splitting module is configured to split the N target cubes from the original cube while maintaining an interval of M voxels between two adjacent target cubes, where M is a positive integer greater than 0 and less than J-1, and J is the side length of the target cube.
- M voxels are kept between two adjacent target cubes, so that the second recognition model learns high-level semantic feature information rather than low-level pixel-distribution statistics; this improves the training efficiency of the second recognition model and, in turn, that of the first recognition model.
- the device further includes:
- the second determining unit is configured to determine a first target cube from the N target cubes before the acquiring the target 3D picture to be recognized;
- a rotating unit configured to rotate the first target cube by a first angle
- the sorting unit is configured to sort the first target cube, after it has been rotated by the first angle, with the other target cubes among the N target cubes to obtain the target training sample.
- the foregoing sorting may be randomly sorting the N target cubes.
- the above-mentioned rotation may be applied to a plurality of first target cubes among the N target cubes, and each rotation may be by any angle.
- the device further includes:
- the second input unit is configured to, after the first target cube rotated by the first angle is sorted with the other target cubes among the N target cubes to obtain the target training sample, input the target training sample into the original recognition model to train the original recognition model and obtain the second recognition model.
- the device further includes:
- the fourth obtaining unit is configured to obtain, before the target 3D picture to be recognized is obtained, the recognition result output after the original recognition model recognizes the target training sample, wherein the recognition result includes the probabilities of the various sorting orders of the target cubes in the target training sample and of the rotation angle of each target cube;
- the third determining unit is configured to determine the original recognition model as the second recognition model when the probability that the recognition result meets the first objective function is greater than a first threshold.
- the device further includes:
- a fourth determining unit configured to determine the convolution block of the second recognition model as the convolution block of the first recognition model before acquiring the target 3D picture to be recognized;
- the training unit is configured to train the first recognition model with a first training sample until the accuracy of the first recognition model is greater than a second threshold, wherein the first training sample includes a first 3D picture and the type of the first 3D picture.
- a labeled first sample picture may be input, and the first recognition model is trained until its recognition accuracy is greater than the second threshold, after which the first recognition model can be put into use.
- An embodiment of the present application also provides a recognition model training device for implementing the above recognition model training method.
- the device includes:
- the segmentation unit 1202 is configured to obtain 3D sample pictures, and segment N target cubes from the 3D sample pictures;
- the processing unit 1204 is configured to perform a predetermined operation on the N target cubes to obtain target training samples, where the predetermined operation includes rotating and sorting the N target cubes;
- the training unit 1206 is configured to train the original recognition model with the target training sample to obtain a second recognition model, where the original recognition model is used to output the recognition result of the target training sample, and when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model.
- the above-mentioned device can be, but not limited to, applied in the process of model training.
- N target cubes are extracted from a 3D sample picture, and the N target cubes are rotated and sorted to obtain N cubes as target training samples and input into the original recognition model.
- the original recognition model outputs the probabilities of the rotation applied to each target cube and of the ordering of the target cubes in the target training sample.
- the aforementioned probability may or may not satisfy the first objective function.
- the first objective function may be a loss function.
- if the probability satisfies the first objective function, the current original recognition model is determined to be a well-trained model.
- the convolution block of the trained recognition model can then be extracted and, after a new fully connected layer is added, a new recognition model can be formed; the new recognition model can be used for other recognition tasks.
- the new recognition model can have high recognition accuracy after training with a small number of samples. For example, applying the new recognition model to the process of recognizing the type of 3D pictures, or applying the new recognition model to tasks such as segmentation of 3D pictures, will not be repeated here.
- the embodiment of the present application also provides an electronic device for implementing the above-mentioned picture recognition method.
- the electronic device includes a memory 1302 and a processor 1304.
- the memory 1302 stores a computer program
- the processor 1304 is configured to execute the picture recognition method provided in the embodiments of the present application through the computer program.
- the above electronic device may be located in at least one network device among a plurality of network devices in a computer network.
- the aforementioned processor may be configured to execute the following steps through a computer program:
- S3: Acquire the first type, output by the first recognition model, of the target 3D picture to be recognized.
- the structure shown in FIG. 13 is only for illustration, and the electronic device may also be a smartphone (such as an Android mobile phone or an iOS mobile phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, or other terminal equipment.
- Figure 13 does not limit the structure of the above electronic device.
- the electronic device may also include more or fewer components (such as network interfaces) than shown in Figure 13, or have a different configuration from that shown in Figure 13.
- the memory 1302 can be used to store software programs and modules, such as program instructions/modules corresponding to the image recognition method and device in the embodiment of the present application.
- the processor 1304 runs the software programs and modules stored in the memory 1302, thereby performing various functional applications and data processing, that is, realizing the above-mentioned picture recognition method.
- the memory 1302 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 1302 may further include a memory remotely provided with respect to the processor 1304, and these remote memories may be connected to the terminal through a network.
- the memory 1302 may specifically, but is not limited to, storing information such as target 3D pictures to be recognized.
- the memory 1302 may include, but is not limited to, the first acquisition unit 1102, the first input unit 1104, and the second acquisition unit 1106 in the image recognition apparatus described above.
- it may also include, but is not limited to, other module units in the above-mentioned picture recognition device, which will not be repeated in this example.
- the aforementioned transmission device 1306 is used to receive or send data via a network.
- the above-mentioned specific examples of networks may include wired networks and wireless networks.
- the transmission device 1306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
- the transmission device 1306 is a radio frequency (RF) module, which can be configured to communicate with the Internet in a wireless manner.
- the above electronic device further includes: a display 1308 configured to display the first type of 3D picture to be recognized; and a connection bus 1310 configured to connect each module component in the above electronic device.
- the embodiment of the present application also provides an electronic device for implementing the above recognition model training method.
- the electronic device includes a memory 1402 and a processor 1404.
- the memory 1402 stores a computer program
- the processor 1404 is configured to execute the above-mentioned recognition model training method provided in the embodiment of the present application through a computer program.
- the above electronic device may be located in at least one network device among a plurality of network devices in a computer network.
- the aforementioned processor may be configured to execute the following steps through a computer program:
- the structure shown in FIG. 14 is only for illustration, and the electronic device may also be a smartphone (such as an Android mobile phone or an iOS mobile phone), a tablet computer, a handheld computer, a Mobile Internet Device (MID), a PAD, or other terminal equipment.
- Figure 14 does not limit the structure of the above electronic device.
- the electronic device may also include more or fewer components (such as network interfaces) than those shown in Figure 14, or have a different configuration from that shown in Figure 14.
- the memory 1402 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the recognition model training method and device in the embodiment of the present application.
- the processor 1404 runs the software programs and modules stored in the memory 1402, thereby performing various functional applications and data processing, that is, realizing the above-mentioned recognition model training method.
- the memory 1402 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 1402 may further include memory remotely provided with respect to the processor 1404, and these remote memories may be connected to the terminal through a network.
- the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- the memory 1402 can be specifically but not limited to storing information such as 3D sample pictures.
- the aforementioned memory 1402 may, but is not limited to, include the segmentation unit 1202, the processing unit 1204, and the training unit 1206 in the aforementioned recognition model training device.
- it may also include, but is not limited to, other module units in the above-mentioned recognition model training device, which will not be repeated in this example.
- the aforementioned transmission device 1406 is configured to receive or send data via a network.
- the above-mentioned specific examples of networks may include wired networks and wireless networks.
- the transmission device 1406 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
- the transmission device 1406 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
- the above electronic device further includes: a display 1408 configured to display the training accuracy of the original recognition model, etc.; and a connection bus 1410 configured to connect each module component in the above electronic device.
- the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the recognition model training method provided in the embodiment of the present application when the computer program is run.
- the aforementioned storage medium may be configured to store a computer program for executing the following steps:
- the second recognition model is a model obtained by training an original recognition model with a target training sample, and the target training sample includes cubes obtained by rotating and sorting N target cubes obtained from a 3D sample picture, where N is a natural number greater than 1;
- S3: Acquire the first type, output by the first recognition model, of the target 3D picture to be recognized.
- the storage medium may be configured to store a computer program for executing the following steps:
- the storage medium may include: flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
- if the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium.
- based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions to enable one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
- the disclosed client can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division.
- in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of units or modules, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- In the picture recognition method provided in the embodiments of the present application, a three-dimensional picture of a target to be recognized is acquired and input into a first recognition model; the first recognition model recognizes the target three-dimensional picture to obtain its picture type; the convolution block of the first recognition model is the same as the convolution block of the second recognition model and is used for the recognition of the target three-dimensional picture; the second recognition model is a model obtained by training the original recognition model with target training samples, where the target training samples include cubes obtained by rotating and sorting N target cubes extracted from a three-dimensional sample picture, N being a natural number greater than 1.
- In the above method, the second recognition model is a model obtained by training the original recognition model with the target training samples, and the target training samples include cubes obtained by rotating and sorting N target cubes extracted from a three-dimensional sample picture. As a result: 1) training the second recognition model on cubes extracted from 3D pictures improves its training efficiency while also improving the recognition accuracy for 3D pictures; 2) the convolution block of the first recognition model is the same as the convolution block of the second recognition model, that is, the convolution block trained in the second recognition model is used as the convolution block of the first recognition model, which improves the training efficiency of the first recognition model; 3) the first recognition model, which shares its convolution block with the second recognition model, recognizes the target three-dimensional picture, which improves the recognition accuracy.
Claims (17)
- 1. A picture recognition method, the method being executed by a terminal and comprising: acquiring a target three-dimensional picture to be recognized; inputting the target three-dimensional picture to be recognized into a first recognition model; wherein the first recognition model is used to recognize the target three-dimensional picture to obtain a picture type of the target three-dimensional picture; a convolution block of the first recognition model is the same as a convolution block of a second recognition model and is used for performing the recognition on the target three-dimensional picture; the second recognition model is a model obtained by training an original recognition model with target training samples; wherein the target training samples comprise: cubes obtained by rotating and sorting N target cubes acquired from a three-dimensional sample picture, N being a natural number greater than 1; and acquiring the type of the target three-dimensional picture output by the first recognition model.
- 2. The method according to claim 1, wherein before the acquiring the target three-dimensional picture to be recognized, the method further comprises: acquiring the three-dimensional sample picture; determining an original cube from the three-dimensional sample picture; and splitting the original cube into the N target cubes.
- 3. The method according to claim 2, wherein the value of N is the third power of a positive integer greater than 1, and the splitting the original cube into the N target cubes comprises: splitting the N target cubes from the original cube while keeping an interval of M voxels between two adjacent target cubes, M being a positive integer greater than 0 and less than J-1, and J being the side length of the target cube.
- 4. The method according to claim 1, wherein before the acquiring the target three-dimensional picture to be recognized, the method further comprises: determining a first target cube from the N target cubes; rotating the first target cube by a first angle; and sorting the first target cube rotated by the first angle with other target cubes among the N target cubes to obtain the target training samples.
- 5. The method according to claim 4, wherein after the obtaining the target training samples, the method further comprises: inputting the target training samples into the original recognition model to train the original recognition model to obtain the second recognition model.
- 6. The method according to claim 1, wherein before the acquiring the target three-dimensional picture to be recognized, the method further comprises: acquiring a recognition result output by the original recognition model after recognizing the target training samples; wherein the recognition result comprises: probabilities of the various sorting orders of the target cubes in the target training samples and of the rotation angle of each target cube; and determining the original recognition model as the second recognition model when the probability of the recognition result satisfying a first objective function is greater than a first threshold.
- 7. The method according to claim 1, wherein before the acquiring the target three-dimensional picture to be recognized, the method further comprises: determining the convolution block of the second recognition model as the convolution block of the first recognition model; and training the first recognition model with a first training sample until the accuracy of the first recognition model is greater than a second threshold, wherein the first training sample comprises a first three-dimensional picture and the type of the first three-dimensional picture.
- 8. A recognition model training method, the method being executed by a terminal and comprising: acquiring a three-dimensional sample picture and segmenting N target cubes from the three-dimensional sample picture, N being a natural number greater than 1; rotating and sorting the N target cubes to obtain target training samples; and training an original recognition model with the target training samples to obtain a second recognition model; wherein a convolution block of the second recognition model is the same as a convolution block of a first recognition model and is used for the first recognition model to perform the recognition on a target three-dimensional picture to obtain the type of the target three-dimensional picture.
- 9. A picture recognition apparatus, comprising: a first acquiring unit configured to acquire a target three-dimensional picture to be recognized; a first input unit configured to input the target three-dimensional picture into a first recognition model; wherein the first recognition model is used to recognize the target three-dimensional picture to obtain a picture type of the target three-dimensional picture; a convolution block of the first recognition model is the same as a convolution block of a second recognition model and is used for performing the recognition on the target three-dimensional picture; the second recognition model is a model obtained by training an original recognition model with target training samples; wherein the target training samples comprise: cubes obtained by rotating and sorting N target cubes acquired from a three-dimensional sample picture, N being a natural number greater than 1; and a second acquiring unit configured to acquire the type of the target three-dimensional picture output by the first recognition model.
- 10. The apparatus according to claim 9, further comprising: a third acquiring unit configured to acquire the three-dimensional sample picture before the target three-dimensional picture to be recognized is acquired; a first determining unit configured to determine an original cube from the three-dimensional sample picture; and a splitting unit configured to split the original cube into the N target cubes.
- 11. The apparatus according to claim 10, further comprising: a second determining unit configured to determine a first target cube from the N target cubes before the target three-dimensional picture to be recognized is acquired; a rotating unit configured to rotate the first target cube by a first angle; and a sorting unit configured to sort the first target cube rotated by the first angle with other target cubes among the N target cubes to obtain the target training samples.
- 12. The apparatus according to claim 11, further comprising: a second input unit configured to input the sorted target training samples into the original recognition model to train the original recognition model to obtain the second recognition model.
- 13. A recognition model training apparatus, comprising: a segmentation unit configured to acquire a three-dimensional sample picture and segment N target cubes from the three-dimensional sample picture, N being a natural number greater than 1; a processing unit configured to rotate and sort the N target cubes to obtain target training samples; and a training unit configured to train an original recognition model with the target training samples to obtain a second recognition model; wherein a convolution block of the second recognition model is the same as a convolution block of a first recognition model and is used for the first recognition model to perform the recognition on a target three-dimensional picture to obtain the type of the target three-dimensional picture.
- 14. A computer storage medium storing a computer program, wherein the computer program, when run, executes the picture recognition method according to any one of claims 1 to 7.
- 15. A computer storage medium storing a computer program, wherein the computer program, when run, executes the recognition model training method according to claim 8.
- 16. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute, when running the computer program, the picture recognition method according to any one of claims 1 to 7.
- 17. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute, when running the computer program, the recognition model training method according to claim 8.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021547115A JP7233555B2 (ja) | 2019-06-21 | 2020-06-20 | Image recognition method, recognition model training method and apparatus therefor, and computer program |
EP20825586.9A EP3989109A4 (en) | 2019-06-21 | 2020-06-20 | IMAGE IDENTIFICATION METHOD AND DEVICE, IDENTIFICATION PATTERN TRAINING METHOD AND DEVICE, AND STORAGE MEDIA |
KR1020217029414A KR102645533B1 (ko) | 2019-06-21 | 2020-06-20 | Image identification method and device, identification model training method and device, and storage medium |
US17/402,500 US12112556B2 (en) | 2019-06-21 | 2021-08-13 | Image recognition method and apparatus, recognition model training method and apparatus, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910544392.0 | 2019-06-21 | ||
CN201910544392.0A CN110263724A (zh) | 2019-06-21 | 2019-06-21 | Picture recognition method, recognition model training method, apparatus, and storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/402,500 Continuation US12112556B2 (en) | 2019-06-21 | 2021-08-13 | Image recognition method and apparatus, recognition model training method and apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020253852A1 true WO2020253852A1 (zh) | 2020-12-24 |
Family
ID=67920476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/097273 WO2020253852A1 (zh) | 2019-06-21 | 2020-06-20 | 图片识别方法、识别模型训练方法、装置及存储介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US12112556B2 (zh) |
EP (1) | EP3989109A4 (zh) |
JP (1) | JP7233555B2 (zh) |
KR (1) | KR102645533B1 (zh) |
CN (2) | CN110263724A (zh) |
WO (1) | WO2020253852A1 (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263724A (zh) * | 2019-06-21 | 2019-09-20 | Tencent Technology (Shenzhen) Company Limited | Picture recognition method, recognition model training method, apparatus, and storage medium |
CN110710986B (zh) * | 2019-10-25 | 2021-01-22 | Huayuan Data Technology (Shanghai) Co., Ltd. | CT-image-based method and system for detecting cerebral arteriovenous malformations |
CN111166070A (zh) * | 2019-12-17 | 2020-05-19 | Wuyi University | Finger-vein-authenticated medical storage cabinet and management method therefor |
CN111242952B (zh) * | 2020-01-15 | 2023-06-30 | Tencent Technology (Shenzhen) Company Limited | Image segmentation model training method, image segmentation method, apparatus, and computing device |
CN111353580B (zh) * | 2020-02-03 | 2023-06-20 | National University of Defense Technology, PLA | Training method for a target detection network, electronic device, and storage medium |
CN111723868B (zh) * | 2020-06-22 | 2023-07-21 | Haier Uplus Intelligent Technology (Beijing) Co., Ltd. | Method, apparatus, and server for removing homologous pictures |
CN112241764B (zh) * | 2020-10-23 | 2023-08-08 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image recognition method and apparatus, electronic device, and storage medium |
CN112686898B (zh) * | 2021-03-15 | 2021-08-13 | Sichuan University | Self-supervised-learning-based automatic segmentation method for radiotherapy target regions |
CN112949583A (zh) * | 2021-03-30 | 2021-06-11 | Jingke Hulian Technology (Shandong) Co., Ltd. | Target detection method, system, device, and storage medium for complex urban scenes |
CN113362313B (zh) * | 2021-06-18 | 2024-03-15 | Sichuan Qiruike Technology Co., Ltd. | Self-supervised-learning-based defect detection method and system |
CN114092446B (zh) * | 2021-11-23 | 2024-07-16 | Chinese PLA General Hospital | Method and apparatus for acquiring intracranial hemorrhage parameters based on self-supervised learning and M-Net |
CN114549904B (zh) * | 2022-02-25 | 2023-07-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Visual processing and model training method, device, storage medium, and program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154043A (zh) * | 2017-06-05 | 2017-09-12 | Hangzhou Jianpei Technology Co., Ltd. | 3D-CNN-based false-positive suppression method for pulmonary nodule samples |
CN107977963A (zh) * | 2017-11-30 | 2018-05-01 | Beijing Qingyan Xiangyun Technology Co., Ltd. | Pulmonary nodule determination method, apparatus, and implementation apparatus |
CN108389201A (zh) * | 2018-03-16 | 2018-08-10 | Beijing Infervision Technology Co., Ltd. | Benign/malignant pulmonary nodule classification method based on 3D convolutional neural networks and deep learning |
US20180260621A1 (en) * | 2017-03-10 | 2018-09-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Picture recognition method and apparatus, computer device and computer- readable medium |
CN110263724A (zh) * | 2019-06-21 | 2019-09-20 | Tencent Technology (Shenzhen) Company Limited | Picture recognition method, recognition model training method, apparatus, and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10467495B2 (en) * | 2015-05-11 | 2019-11-05 | Siemens Healthcare Gmbh | Method and system for landmark detection in medical images using deep neural networks |
CN107025642B (zh) * | 2016-01-27 | 2018-06-22 | Baidu Online Network Technology (Beijing) Co., Ltd. | Point-cloud-data-based vehicle contour detection method and apparatus |
AU2018313841B2 (en) * | 2017-08-09 | 2023-10-26 | Allen Institute | Systems, devices, and methods for image processing to generate an image having predictive tagging |
CN109147940B (zh) * | 2018-07-05 | 2021-05-25 | Keya Medical Technology Co., Ltd. | Apparatus and system for automatically predicting physiological conditions from a patient's medical images |
CN109063753B (zh) * | 2018-07-18 | 2021-09-14 | North Minzu University | Convolutional-neural-network-based three-dimensional point cloud model classification method |
CN109886933B (zh) * | 2019-01-25 | 2021-11-02 | Tencent Technology (Shenzhen) Company Limited | Medical image recognition method, apparatus, and storage medium |
2019
- 2019-06-21 CN CN201910544392.0A patent/CN110263724A/zh active Pending

2020
- 2020-01-15 CN CN202010043334.2A patent/CN111046855A/zh active Pending
- 2020-06-20 KR KR1020217029414A patent/KR102645533B1/ko active IP Right Grant
- 2020-06-20 WO PCT/CN2020/097273 patent/WO2020253852A1/zh active Application Filing
- 2020-06-20 EP EP20825586.9A patent/EP3989109A4/en active Pending
- 2020-06-20 JP JP2021547115A patent/JP7233555B2/ja active Active

2021
- 2021-08-13 US US17/402,500 patent/US12112556B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3989109A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN110263724A (zh) | 2019-09-20 |
EP3989109A4 (en) | 2022-07-20 |
JP7233555B2 (ja) | 2023-03-06 |
KR102645533B1 (ko) | 2024-03-07 |
JP2022520390A (ja) | 2022-03-30 |
US12112556B2 (en) | 2024-10-08 |
CN111046855A (zh) | 2020-04-21 |
US20210374475A1 (en) | 2021-12-02 |
EP3989109A1 (en) | 2022-04-27 |
KR20210119539A (ko) | 2021-10-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020253852A1 (zh) | Picture recognition method, recognition model training method, apparatus, and storage medium | |
EP3732619B1 (en) | Convolutional neural network-based image processing method and image processing apparatus | |
JP6993371B2 (ja) | Deep-learning-based computed tomography pulmonary nodule detection method | |
US10970520B1 (en) | Apparatus and method for image analysis using virtual three-dimensional deep neural network | |
CN109522874B (zh) | Human action recognition method, apparatus, terminal device, and storage medium | |
WO2020107847A1 (zh) | Skeleton-point-based fall detection method and fall detection apparatus | |
CN109685819B (zh) | Feature-enhancement-based three-dimensional medical image segmentation method | |
US20210158023A1 (en) | System and Method for Generating Image Landmarks | |
EP4002161A1 (en) | Image retrieval method and apparatus, storage medium, and device | |
WO2020125498A1 (zh) | Cardiac magnetic resonance image segmentation method, apparatus, terminal device, and storage medium | |
CN110838125A (zh) | Target detection method, apparatus, device, and storage medium for medical images | |
EP4404148A1 (en) | Image processing method and apparatus, and computer-readable storage medium | |
WO2022111387A1 (zh) | Data processing method and related apparatus | |
WO2023160157A1 (zh) | Three-dimensional medical image recognition method, apparatus, device, storage medium, and product | |
CN112529068A (zh) | Multi-view image classification method, system, computer device, and storage medium | |
TW202215303A (zh) | Processing images using self-attention-based neural networks | |
JP2021039758A (ja) | Method and system for emphasizing similar regions using inter-image similarity | |
CN110427870B (zh) | Eye picture recognition method, target recognition model training method, and apparatus | |
CN117710670A (zh) | Glioma image segmentation method and system based on multi-feature fusion | |
CN117373064A (zh) | Human pose estimation method based on adaptive cross-dimensional weighting, computer device, and storage medium | |
CN111709473A (zh) | Method and apparatus for clustering object features | |
CN113139490B (zh) | Image feature matching method, apparatus, computer device, and storage medium | |
CN115545085A (zh) | Fault type identification method, apparatus, device, and medium for weak fault currents | |
Qi et al. | An efficient deep learning hashing neural network for mobile visual search | |
JP7105918B2 (ja) | Region identification apparatus, method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 20825586 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2021547115 Country of ref document: JP Kind code of ref document: A |
ENP | Entry into the national phase |
Ref document number: 20217029414 Country of ref document: KR Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |
WWE | Wipo information: entry into national phase |
Ref document number: 2020825586 Country of ref document: EP |