WO2020253852A1 - Picture recognition method, recognition model training method, device and storage medium - Google Patents


Info

Publication number
WO2020253852A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
recognition model
picture
training
cube
Prior art date
Application number
PCT/CN2020/097273
Other languages
English (en)
French (fr)
Inventor
庄新瑞
李悦翔
郑冶枫
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to JP2021547115A priority Critical patent/JP7233555B2/ja
Priority to EP20825586.9A priority patent/EP3989109A4/en
Priority to KR1020217029414A priority patent/KR102645533B1/ko
Publication of WO2020253852A1 publication Critical patent/WO2020253852A1/zh
Priority to US17/402,500 priority patent/US12112556B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Definitions

  • This application relates to the field of computers, and in particular to a picture recognition method, recognition model training method, device and storage medium.
  • the embodiments of the present application provide a picture recognition method, a recognition model training method, a device, and a storage medium, which can improve the accuracy of picture recognition while improving the efficiency of model training.
  • the embodiment of the application provides a picture recognition method, including:
  • the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
  • the second recognition model is a model obtained by training the original recognition model using target training samples
  • the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
  • The embodiment of the present application provides a picture recognition method, which is executed by a terminal. The terminal includes one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory; each program may include one or more units, each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:
  • the first recognition model is used to recognize the target 3D picture to obtain the picture type of the target 3D picture;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
  • the second recognition model is a model obtained by training the original recognition model using target training samples
  • the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
  • the embodiment of the present application also provides a method for training a recognition model.
  • the method is executed by a network device.
  • The network device includes one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory; each program may include one or more units, each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:
  • the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture.
  • the embodiment of the application also provides a recognition model training method, including:
  • the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the type of the target three-dimensional picture.
  • An embodiment of the present application also provides a picture recognition device, including:
  • the first obtaining unit is configured to obtain a three-dimensional picture of the target to be recognized;
  • the first input unit is configured to input the three-dimensional picture of the target into the first recognition model;
  • the first recognition model is used to recognize the target 3D picture to obtain the picture type of the target 3D picture;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
  • the second recognition model is a model obtained by training the original recognition model using target training samples
  • the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, where N is a natural number greater than 1;
  • the second acquiring unit is configured to acquire the type of the target three-dimensional picture output by the first recognition model.
  • the embodiment of the present application also provides a recognition model training device, including:
  • a segmentation unit configured to obtain a three-dimensional sample picture, and segment N target cubes from the three-dimensional sample picture, where N is a natural number greater than one;
  • a processing unit configured to rotate and sort the N target cubes to obtain target training samples
  • a training unit configured to use the target training sample to train the original recognition model to obtain a second recognition model
  • the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the type of the target three-dimensional picture.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is set to execute the above-mentioned image recognition method when running.
  • An embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the above-mentioned image recognition method by running the computer program.
  • By training the first recognition model before it is put into use, the training efficiency of the first recognition model is improved, while the accuracy of picture recognition is also improved.
  • FIG. 1 is a schematic diagram of an application environment of a picture recognition method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a picture recognition method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a picture recognition method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another picture recognition method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another image recognition method according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another image recognition method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another image recognition method according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another image recognition method according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another image recognition method according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a method for training a recognition model according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a picture recognition apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a recognition model training device according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Magnetic resonance imaging (MRI): a type of medical imaging.
  • Computed tomography (CT): a type of medical imaging that can be used to examine a variety of diseases.
  • Convolutional neural network (CNN).
  • BRATS: Multimodal Brain Tumor Segmentation, a benchmark dataset.
  • Feature map: the map obtained after convolving an image with a filter. In practical applications, a feature map can itself be convolved with a filter to generate a new feature map.
  • Siamese network: contains several convolutional neural networks with the same structure, and the networks share weight parameters.
  • Hamming distance: measures the number of positions at which the corresponding characters of two strings differ.
  • FCN: fully convolutional network.
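The Hamming distance defined above can be sketched in a few lines; this is an illustrative helper, not code from the patent:

```python
# Hamming distance between two equal-length sequences: the number of
# positions at which the corresponding elements differ.  Used later in
# the document to compare cube-order permutations.
def hamming_distance(a, b):
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

It works on strings and on sequences of cube indices alike.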
  • a picture recognition method is provided, and the picture recognition method can be, but is not limited to, applied to the environment shown in FIG. 1.
  • the user 102 and the user equipment 104 may perform human-computer interaction.
  • the user equipment 104 includes a memory 106 configured to store interactive data, and a processor 108 configured to process interactive data.
  • the user equipment 104 can exchange data with the server 112 through the network 110.
  • the server 112 includes a database 114 for storing interactive data, and a processing engine 116 for processing interactive data.
  • the user equipment 104 includes a first recognition model.
  • The user equipment 104 can obtain the target 3D picture 104-2 to be recognized, recognize the target 3D picture 104-2, and output the picture type 104-4 of the target 3D picture 104-2.
  • The above-mentioned image recognition method can be, but is not limited to being, applied to terminals that can process data, such as mobile phones, tablets, laptops, and PCs.
  • The above-mentioned networks may include, but are not limited to, wireless networks or wired networks.
  • the wireless network includes: Bluetooth, WIFI and other networks that realize wireless communication.
  • the aforementioned wired network may include, but is not limited to: wide area network, metropolitan area network, and local area network.
  • The aforementioned server may include, but is not limited to, any hardware device that can perform computing, such as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
  • the above-mentioned picture recognition method includes:
  • S202 Acquire a 3D picture of a target to be identified.
  • S204 Input the 3D image of the target to be recognized into the first recognition model.
  • the first recognition model is used to recognize the target 3D picture to be recognized to obtain the picture type of the target 3D picture to be recognized;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
  • the second recognition model is a model obtained by training the original recognition model using target training samples.
  • the target training samples include: a cube obtained by rotating and sorting N target cubes obtained from a 3D sample picture, where N is greater than 1. Natural number.
  • S206 Acquire the first type of the target 3D picture output by the first recognition model.
  • the above-mentioned picture recognition method can be but not limited to be applied to the field of picture recognition.
  • the above method is applied to the process of identifying the type of 3D picture.
  • For example, in the process of recognizing the type of disease in 3D disease pictures, such as recognizing the type of cerebral hemorrhage: after a 3D disease picture is obtained (the 3D disease picture may be an MRI picture or a CT picture), the 3D disease picture is input into the first recognition model, which recognizes the 3D disease picture and outputs its first type.
  • The first type can be healthy, or aneurysm, arteriovenous malformation, moyamoya disease, hypertension, etc.
  • The second recognition model is trained in advance using cubes extracted from 3D pictures, which improves the training efficiency of the second recognition model; the convolution block of the second recognition model is then used as the convolution block of the first recognition model, and the first recognition model is used to recognize 3D pictures, which greatly improves the training efficiency of the first recognition model.
  • Before the target 3D picture is acquired, the second recognition model needs to be trained first. During training, 3D sample pictures, which are unlabeled pictures, are obtained first. After a 3D sample picture is obtained, an original cube is extracted from it, and the original cube is split into N target cubes.
  • When extracting the original cube, the geometric center of the 3D sample picture may be determined first. After the geometric center is determined, it is taken as the geometric center of the original cube, and the original cube is determined; the side length of the original cube is smaller than the smallest side length of the 3D sample picture.
  • the geometric center 304 of the 3D sample picture 302 is determined first, and then the original cube 306 with the geometric center 304 as the geometric center is determined.
  • Alternatively, a radius r can be determined. A sphere of radius r is then drawn around the geometric center of the 3D sample picture, and any point inside the sphere is selected as the geometric center of the original cube to determine the original cube. It should be noted that the determined original cube is located within the 3D sample picture and will not exceed its scope.
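The placement described above can be sketched as follows. This is an illustrative approximation, with function and parameter names of our own choosing (the patent does not specify an implementation): a candidate center is sampled inside the sphere of radius r, and the cube is clamped so it stays inside the volume.

```python
import random

# Sketch: choose the lower corner of an original cube of side `side`
# inside a 3D volume of shape `volume_shape`, with the cube's center
# drawn from a sphere of radius r around the volume's geometric center.
def pick_original_cube(volume_shape, side, r, rng=random):
    cx, cy, cz = (d / 2.0 for d in volume_shape)
    # Rejection-sample an offset inside the sphere of radius r.
    while True:
        ox, oy, oz = (rng.uniform(-r, r) for _ in range(3))
        if ox * ox + oy * oy + oz * oz <= r * r:
            break
    half = side / 2.0
    corner = []
    for center, offset, dim in zip((cx, cy, cz), (ox, oy, oz), volume_shape):
        lo = center + offset - half
        lo = max(0.0, min(lo, dim - side))  # clamp so the cube stays inside
        corner.append(int(lo))
    return tuple(corner)  # lower corner of the side*side*side cube
```

With the image size quoted later (240x240x155), any reasonable side and radius keep the cube in bounds.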
  • the original cube needs to be split to obtain N target cubes.
  • Any method can be used, such as randomly extracting N target cubes from the original cube, splitting a part of the original cube to obtain N target cubes, or splitting the entire original cube into N target cubes, where N is the third power of a positive integer.
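A minimal sketch of the regular split (N = 8 here, i.e. a 2x2x2 grid); names and the grid parameter are illustrative, not from the patent. Choosing a target side smaller than one grid cell leaves room for the voxel spacing between adjacent cubes:

```python
# Split an original cube of side S into grid**3 target cubes of side s,
# returning the lower-corner coordinates of each target cube.
def split_into_targets(S, s, grid=2):
    assert grid * s <= S, "target cubes must fit inside the original cube"
    step = S // grid  # one grid cell per target cube
    corners = []
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                corners.append((i * step, j * step, k * step))
    return corners
```

For example, with S = 10 and s = 4 (the sizes used in Figure 5 below), each 5-voxel cell holds a 4-voxel cube, leaving a gap.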
  • An original cube 404 is split in the directions indicated by the arrows 402-1, 402-2, and 402-3 to obtain 8 target cubes (the split method in Figure 4 is just an example).
  • every two adjacent cubes are separated by M voxels.
  • the original cube 502 is split into 8 target cubes 504.
  • the side length of the original cube 502 is 10 voxels, and the side length of the target cube 504 is 4 voxels.
  • A first target cube among the N target cubes may be rotated by a first angle, such as 90 degrees or 180 degrees. There may be one or more first target cubes, and the rotation angle of each first target cube may be the same or different. The rotated first target cubes and the remaining unrotated target cubes are then sorted; the sorting may be random, and the target training sample is obtained after sorting.
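The rotate-and-sort step above can be sketched as follows. This is a hedged stub (cube contents are placeholders and the actual voxel rotation is omitted); it only records the labels the model will later have to predict:

```python
import random

# Build one target training sample: mark a random subset of cubes as
# rotated 180 degrees horizontally or vertically, then shuffle the cube
# order, recording the permutation and rotation labels.
def make_training_sample(cubes, rng=random):
    n = len(cubes)
    hor = [0] * n  # 1 if cube i is rotated 180 degrees horizontally
    ver = [0] * n  # 1 if cube i is rotated 180 degrees vertically
    rotated = []
    for i, cube in enumerate(cubes):
        choice = rng.choice(("none", "hor", "ver"))
        if choice == "hor":
            hor[i] = 1
        elif choice == "ver":
            ver[i] = 1
        rotated.append(cube)  # actual voxel rotation omitted in this stub
    order = list(range(n))
    rng.shuffle(order)  # random permutation of the cube order
    sample = [rotated[i] for i in order]
    return sample, order, hor, ver
```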
  • The original recognition model is trained using the target training sample, and the original recognition model outputs the predicted probabilities for the type of rotation applied to each target cube and for the order of the target cubes in the target training sample.
  • the aforementioned probability may or may not satisfy the first objective function.
  • The first objective function may be a loss function. If the above probability satisfies the first objective function, the recognition result of the original recognition model is correct; if it does not, the recognition result of the original recognition model is incorrect.
  • When the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined to be the second recognition model. This shows that the accuracy of the second recognition model is greater than the first threshold, for example, above 99.95%.
  • The convolution block in the second recognition model can then be obtained and used as the convolution block of the first recognition model, and the first training sample is used to train the first recognition model.
  • the first training sample is a 3D picture including picture types.
  • After training, the first recognition model can be put into use, such as for identifying the type of disease in 3D pictures.
  • A selection button 602-1 is displayed on the display interface 602 of the terminal. The user can select the target 3D picture 604 to be recognized; the terminal recognizes the target 3D picture 604 and outputs its first type 606.
  • The BRATS-2018 dataset includes MRI images of 285 patients. Each patient's MRI image includes 4 different modalities, namely T1, T1Gd, T2, and FLAIR. The data of the different modalities are co-registered, and each image is 240x240x155 in size.
  • the cerebral hemorrhage dataset includes 1486 brain CT scan images of cerebral hemorrhage.
  • the types of cerebral hemorrhage are aneurysm, arteriovenous malformation, moyamoya disease, and hypertension.
  • the size of each CT image is 230x270x30.
  • The above pictures are used as training data for the second recognition model.
  • the original cube is extracted from the picture and the original cube is split into the target cube.
  • For the specific method of selecting the original cube, please refer to the above example; it will not be repeated here.
  • After the original cube is selected, in order to encourage the network to learn high-level semantic feature information through the agent task of Rubik's cube restoration, rather than low-level statistical feature information of the pixel distribution, a random interval of up to 10 voxels is reserved between adjacent target cubes when the original cube is cut into target cubes. The voxel values in each target cube are then normalized to [-1, 1], which yields the target training sample.
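The per-cube normalization mentioned above amounts to a linear rescaling of voxel intensities to [-1, 1]. A minimal sketch, with a flat Python list standing in for a 3D voxel array:

```python
# Linearly rescale a cube's voxel values to the range [-1, 1].
def normalize_cube(voxels):
    lo, hi = min(voxels), max(voxels)
    if hi == lo:  # constant cube: map every voxel to zero
        return [0.0 for _ in voxels]
    scale = 2.0 / (hi - lo)
    return [(v - lo) * scale - 1.0 for v in voxels]
```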
  • The Siamese network includes X sub-networks that share weights with each other, where X represents the number of target cubes.
  • an 8-in-1 Siamese network with 8 target cube inputs is used.
  • Each sub-network has the same network structure and shares weights with each other.
  • the backbone structure of each sub-network can use various types of 3D CNN that currently exist, and the 3D VGG network is used in the experiment.
  • The output feature maps of the last fully connected layers of all sub-networks are stacked and then input into different branches, which are used for the task of spatially rearranging the target cubes and the task of determining the rotation of the target cubes, respectively.
  • the above feature map is the content output by any network in the convolution model.
  • the first step is to rearrange the target cube.
  • The Hamming distance is used as the measurement index, and K sequences that differ as much as possible from one another are selected in turn. The permutation loss is the cross-entropy loss_perm = -Σ_j l_j · log(p_j), where l_j represents the one-hot label of the true sequence and p_j represents the predicted probability of each sequence output by the network.
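The selection of K mutually distant sequences described above can be sketched as a greedy farthest-first search over all permutations (feasible only for small cube counts; the function name and strategy are our illustration, not the patent's prescribed algorithm):

```python
import itertools

# Greedily select K cube-order permutations of n items that are mutually
# far apart in Hamming distance, starting from the identity permutation.
def select_permutations(n, K):
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    pool = list(itertools.permutations(range(n)))
    chosen = [pool.pop(0)]  # start from the identity permutation
    while len(chosen) < K:
        # Pick the candidate whose minimum distance to the chosen set is largest.
        best = max(pool, key=lambda p: min(hamming(p, c) for c in chosen))
        pool.remove(best)
        chosen.append(best)
    return chosen
```

For 8 cubes the full pool (8! permutations) is still enumerable, though a sampled pool would be the practical choice.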
  • a new operation is added to the 3D Rubik's Cube restoration task, that is, the rotation of the target cube.
  • the network can learn the rotation-invariant characteristics of the 3D image block.
  • In principle there are 3 rotation axes (x, y, z) x 2 rotation directions (clockwise, counterclockwise) x 4 rotation angles to choose from.
  • the target cube can only be rotated by 180° in the horizontal or vertical direction.
  • Target cubes 3 and 4 are rotated by 180° horizontally, and target cubes 5 and 7 are rotated by 180° vertically.
  • The network needs to determine what type of rotation each target cube has undergone, so the loss function for the cube rotation task is as follows: loss_rot = -(1/M) Σ_{i=1..M} (g_i^hor · log(r_i^hor) + g_i^ver · log(r_i^ver)). In the formula, M represents the number of target cubes, g_i^hor and g_i^ver represent the one-hot labels for the horizontal and vertical rotation of the i-th target cube, and r_i^hor and r_i^ver represent the network's predicted output probabilities in the horizontal and vertical directions.
  • the objective function of the model is the linear weighting of the permutation loss function and the rotation loss function.
  • The overall loss function of the model is as follows: loss = a · loss_permutation + b · loss_rotation, where a and b are the weights of the two loss functions, which control the degree of mutual influence between the two subtasks. In the experiments, setting both weights to 0.5 allowed the pre-training to achieve better results.
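The loss terms above can be sketched numerically. This is an illustrative reconstruction under the assumption that both terms are cross-entropies (the rotation term averaged over the M cubes, treating each rotation label as binary); the exact formula images are not reproduced in this text:

```python
import math

# Cross-entropy permutation loss: one-hot true label l, predicted
# sequence probabilities p.
def permutation_loss(l, p):
    return -sum(lj * math.log(pj) for lj, pj in zip(l, p) if lj > 0)

# Rotation loss: binary cross-entropy for the horizontal and vertical
# rotation labels of each of the M target cubes, averaged over cubes.
def rotation_loss(g_hor, g_ver, r_hor, r_ver):
    M = len(g_hor)
    total = 0.0
    for gh, gv, rh, rv in zip(g_hor, g_ver, r_hor, r_ver):
        total += gh * math.log(rh) + (1 - gh) * math.log(1 - rh)
        total += gv * math.log(rv) + (1 - gv) * math.log(1 - rv)
    return -total / M

# Overall objective: linear weighting of the two subtask losses,
# with a = b = 0.5 in the experiments described above.
def total_loss(loss_perm, loss_rot, a=0.5, b=0.5):
    return a * loss_perm + b * loss_rot
```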
  • the second recognition model can be obtained.
  • the accuracy of the second recognition model is greater than the first threshold.
  • The convolution block of the second recognition model can be extracted and, after fine-tuning, used for other target tasks.
  • the convolution block of the second recognition model is extracted and used for the first recognition model to recognize the type of 3D picture.
  • For classification tasks, only the fully connected layers behind the CNN need to be retrained, and the convolutional layers before the fully connected layers can be fine-tuned with a smaller learning rate.
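The fine-tuning recipe above can be expressed as per-layer learning rates. A hedged sketch (layer names, base rate, and scale factor are illustrative, not values from the patent):

```python
# Assign learning rates: pre-trained convolutional layers get a reduced
# rate for gentle fine-tuning, while the fully connected layers are
# retrained at the full base rate.
def build_lr_schedule(layers, base_lr=1e-3, conv_scale=0.1):
    schedule = {}
    for name in layers:
        if name.startswith("conv"):
            schedule[name] = base_lr * conv_scale  # fine-tune gently
        else:
            schedule[name] = base_lr  # retrain FC layers fully
    return schedule
```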
  • the pre-training network can be used in a fully convolutional neural network (FCN) commonly used in image segmentation tasks, such as a 3D U-Net structure, as shown in Figure 8.
  • the network parameters in the up-sampling stage of U-Net still need to be initialized randomly during training.
  • DUC Dense Upsampling Convolution
  • The cube extracted from the 3D picture is used to train the second recognition model in advance, which improves the training efficiency of the second recognition model; the convolution block of the second recognition model is then used as the convolution block of the first recognition model, and the first recognition model is used to recognize 3D pictures, which greatly improves the training efficiency of the first recognition model.
  • the obtaining the 3D picture of the target to be recognized may further include:
  • The 3D sample picture and the target 3D picture may be the same picture. That is, after the second recognition model is trained using 3D sample pictures and its convolution block is used as the convolution block of the first recognition model, the 3D sample pictures can be input into the first recognition model, and the first recognition model recognizes the type of the 3D sample picture. When the 3D sample picture is input into the second recognition model, the type of the 3D sample picture does not need to be provided.
  • N target cubes are obtained to train the second recognition model, which improves the training efficiency of the second recognition model and, in turn, the training efficiency of the first recognition model.
  • N is the third power of a positive integer greater than 1;
  • the splitting the original cube into the N target cubes includes:
  • M voxels are spaced between two adjacent target cubes, so that the second recognition model learns high-level semantic feature information instead of low-level statistical feature information of the pixel distribution, which improves the training efficiency of the second recognition model and, in turn, the training efficiency of the first recognition model.
  • the following operations may also be included:
  • S3 Sort the first target cube after rotating the first angle with other target cubes among the N target cubes to obtain the target training sample.
  • the foregoing sorting may be randomly sorting the N target cubes.
  • The above-mentioned rotation may be applied to multiple first target cubes among the N target cubes, and the rotation may be by any angle.
  • After the first target cubes rotated by the first angle are sorted together with the other target cubes among the N target cubes to obtain the target training sample, the method may further include:
  • the target training sample is input into the original recognition model to train the original recognition model to obtain the second recognition model.
  • Before the target 3D picture to be recognized is acquired, the method may further include:
  • If the recognition accuracy of the second recognition model is greater than an accuracy value, the second recognition model is considered to meet the requirements, and the training of the second recognition model is stopped.
  • Before the target 3D picture to be recognized is acquired, the method may further include:
  • A labeled first sample picture may be input, and the first recognition model is trained until its recognition accuracy is greater than the second threshold; the first recognition model can then be put into use.
  • The target training samples include cubes obtained by rotating and sorting N target cubes obtained from three-dimensional sample pictures; pre-training the second recognition model with cubes extracted from three-dimensional pictures improves the training efficiency of the second recognition model while also improving the recognition accuracy for three-dimensional pictures. The convolution block of the first recognition model is the same as the convolution block of the second recognition model; that is, the convolution block trained in the second recognition model is used as the convolution block of the first recognition model, which improves the training efficiency of the first recognition model. The target three-dimensional picture is then recognized with improved accuracy; by training the first recognition model before it is put into use, the training efficiency of the first recognition model is improved.
  • the embodiment of the present application also provides a recognition model training method. As shown in Figure 10, the method includes:
  • S1002 Obtain a 3D sample picture, and segment N target cubes from the 3D sample picture.
  • N is a natural number greater than 1.
  • S1004 Perform a predetermined operation on the N target cubes to obtain target training samples, where the predetermined operation includes rotating and sorting the N target cubes.
  • S1006 Use the target training sample to train the original recognition model to obtain a second recognition model.
  • the original recognition model is used to output the recognition result of the target training sample, and when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model.
  • the convolution block of the second recognition model is the same as the convolution block of the first recognition model, and is used for the first recognition model to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture.
  • the above-mentioned method may but is not limited to be applied to the process of model training.
  • N target cubes are extracted from a 3D sample picture, and the N target cubes are rotated and sorted to obtain N cubes as target training samples and input into the original recognition model.
  • the extraction, rotation, and sorting methods can refer to the methods in the foregoing embodiment, and details are not described in this embodiment.
  • the original recognition model outputs the probability of each possible rotation and ordering of the target cubes in the target training sample.
  • the aforementioned probability may or may not satisfy the first objective function.
  • the first objective function may be a loss function.
  • when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the current original recognition model is determined to be a well-trained model.
  • the convolution block of the original recognition model can then be extracted and, after a new fully connected layer is added, a new recognition model can be formed; the new recognition model can be used to recognize other features.
  • the new recognition model can have high recognition accuracy after training with a small number of samples. For example, applying the new recognition model to the process of recognizing the type of 3D pictures, or applying the new recognition model to tasks such as segmentation of 3D pictures, will not be repeated here.
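The transfer step described above (reusing the trained convolution block and attaching a new fully connected layer) can be sketched as follows. This is a minimal plain-Python illustration, not the application's actual implementation; `ToyModel` and `build_target_model` are hypothetical stand-ins for what a real deep-learning framework would provide.

```python
# A minimal sketch of the transfer step: the convolution blocks trained on
# the pretext task are copied into a new model, and only a freshly
# initialised fully connected head is trained from scratch.
class ToyModel:
    def __init__(self, conv_weights, head_weights):
        self.conv_weights = conv_weights    # shared feature extractor
        self.head_weights = head_weights    # task-specific layers

def build_target_model(pretrained, num_classes):
    """Copy the pretrained convolution blocks and attach a freshly
    initialised fully connected head for the new recognition task."""
    new_head = [0.0] * num_classes          # stand-in for fresh FC weights
    return ToyModel(list(pretrained.conv_weights), new_head)

second_model = ToyModel(conv_weights=[0.1, -0.3, 0.7], head_weights=[0.5, 0.5])
first_model = build_target_model(second_model, num_classes=5)
print(first_model.conv_weights == second_model.conv_weights)  # True: same conv blocks
print(len(first_model.head_weights))                          # 5: new head
```

Only the small new head then needs many labeled examples; the copied convolution blocks already encode the features learned from unlabeled data, which is why few labeled samples suffice.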
  • the embodiment of the present application also provides a picture recognition device for implementing the above picture recognition method.
  • the device includes:
  • the first obtaining unit 1102 is configured to obtain a 3D picture of the target to be recognized
  • the first input unit 1104 is configured to input the target 3D picture to be recognized into the first recognition model.
  • the first recognition model is used to recognize the target 3D picture to be recognized to obtain the picture type of the target 3D picture to be recognized;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used to perform the recognition on the target three-dimensional picture;
  • the second recognition model is a model obtained by training the original recognition model using target training samples
  • the target training sample includes: a cube obtained by rotating and sorting N target cubes obtained from a 3D sample picture, where N is a natural number greater than 1;
  • the second obtaining unit 1106 is configured to obtain the first type of the target 3D picture to be recognized and output by the first recognition model.
  • the above picture recognition apparatus may be applied, but is not limited, to the field of picture recognition.
  • for example, the above method is applied to the process of identifying the type of a 3D picture.
  • in the process of recognizing the type of disease in 3D disease pictures, for example when recognizing the type of a cerebral hemorrhage, after the 3D disease picture is obtained, it is input into the first recognition model, which recognizes the 3D disease picture and outputs its first type.
  • the first type can be healthy, or aneurysm, arteriovenous malformation, moyamoya disease, hypertension, and so on.
  • the second recognition model is trained in advance with cubes extracted from the 3D picture, thereby improving the training efficiency of the second recognition model; the convolution block of the second recognition model is used as the convolution block of the first recognition model, and the first recognition model is used to recognize 3D pictures, which greatly improves the training efficiency of the first recognition model.
  • before the target 3D picture is acquired, the second recognition model needs to be trained first. During training, 3D sample pictures first need to be obtained; the 3D sample pictures are unlabeled pictures. After a 3D sample picture is obtained, an original cube needs to be extracted from it and split into N target cubes.
  • when extracting the original cube, the geometric center of the 3D sample picture may be determined first. After the geometric center is determined, it is taken as the geometric center of the original cube, and the original cube is determined. The side length of the original cube is smaller than the length of the shortest side of the 3D sample picture.
  • the geometric center 304 of the 3D sample picture 302 is first determined, and then the original cube 306 with the geometric center 304 as the geometric center is determined.
  • a radius r can also be determined; a sphere is then drawn with the geometric center of the 3D sample picture as its center and r as its radius, and any point in the sphere is selected as the geometric center of the original cube to determine the original cube. It should be noted that the determined original cube lies within the 3D sample picture and does not exceed its boundaries.
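The center-selection procedure just described can be sketched in a few lines. This is an illustrative snippet only (the function name and picture shape are hypothetical), assuming a uniform pick inside the sphere via rejection sampling:

```python
import math
import random

def sample_cube_center(shape, radius, seed=None):
    """Pick the geometric center of the original cube: a random point
    inside a sphere of the given radius centered on the geometric
    center of the 3D sample picture (rejection sampling)."""
    rng = random.Random(seed)
    center = [s / 2.0 for s in shape]
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(d * d for d in offset) <= radius * radius:
            return tuple(c + d for c, d in zip(center, offset))

# The chosen center stays within `radius` voxels of the picture's center,
# so a cube of suitable side length remains inside the sample picture.
center = sample_cube_center((64, 64, 64), radius=4.0, seed=1)
```

In practice the radius and cube side length would be chosen together so the resulting cube cannot cross the picture boundary, matching the note above.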
  • the original cube needs to be split to obtain N target cubes.
  • any method can be used, such as randomly extracting N target cubes from the original cube, splitting part of the original cube into N target cubes, or splitting the original cube evenly into N target cubes, where N is the third power of a positive integer.
  • an original cube 404 is split in the directions indicated by the arrows 402-1, 402-2, and 402-3 to obtain 8 target cubes (the split shown in Figure 4 is just an example).
  • every two adjacent cubes are separated by M voxels.
  • the original cube 502 is split into 8 target cubes 504.
  • the side length of the original cube 502 is 10 voxels, and the side length of the target cube 504 is 4 voxels.
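The Figure 5 example (a 10-voxel original cube split into eight 4-voxel target cubes with 2-voxel gaps) can be reproduced with a short sketch. The volume is represented as plain nested lists purely for illustration; a real implementation would use array slicing on the image data.

```python
def split_cube(volume, n_per_axis=2, side=4, gap=2):
    """Split a cubic volume (nested lists indexed [z][y][x]) into
    n_per_axis**3 target cubes of the given side length, leaving
    `gap` voxels between adjacent target cubes."""
    step = side + gap
    cubes = []
    for zi in range(n_per_axis):
        for yi in range(n_per_axis):
            for xi in range(n_per_axis):
                z0, y0, x0 = zi * step, yi * step, xi * step
                cubes.append([[[volume[z0 + z][y0 + y][x0 + x]
                                for x in range(side)]
                               for y in range(side)]
                              for z in range(side)])
    return cubes

# A 10x10x10 volume whose voxel value records its own coordinates.
volume = [[[(z, y, x) for x in range(10)] for y in range(10)] for z in range(10)]
cubes = split_cube(volume)
print(len(cubes))         # 8 target cubes
print(cubes[7][0][0][0])  # (6, 6, 6): the last cube starts past the 2-voxel gap
```

Note how 4 + 2 + 4 = 10: two 4-voxel cubes plus the 2-voxel gap exactly fill the 10-voxel original cube along each axis.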
  • the first target cube among the N target cubes may be rotated by a first angle, such as 90 degrees or 180 degrees. There may be one or more first target cubes, and the rotation angle of each first target cube may be the same or different. The rotated first target cubes and the remaining unrotated target cubes are then sorted; the sorting can be random, and the target training samples are obtained after sorting.
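A sketch of this sample construction: one or more cubes receive a 90-degree rotation and the whole set is randomly reordered, with the permutation and the rotated indices kept as free supervision labels. This is an illustrative pure-Python version; the helper names are hypothetical and only one rotation axis is shown.

```python
import random

def rotate_90_z(cube):
    """Rotate a cubic array (indexed [z][y][x]) 90 degrees about the z axis."""
    n = len(cube)
    return [[[cube[z][n - 1 - x][y] for x in range(n)]
             for y in range(n)]
            for z in range(n)]

def make_pretext_sample(cubes, rotate_count=1, seed=None):
    """Rotate `rotate_count` randomly chosen target cubes and shuffle the
    order; the permutation and the rotated indices serve as labels, so no
    manual annotation of the 3D sample picture is required."""
    rng = random.Random(seed)
    rotated = set(rng.sample(range(len(cubes)), rotate_count))
    processed = [rotate_90_z(c) if i in rotated else c
                 for i, c in enumerate(cubes)]
    order = list(range(len(cubes)))
    rng.shuffle(order)
    sample = [processed[i] for i in order]
    return sample, {"order": order, "rotated": sorted(rotated)}

# Eight tiny 2x2x2 cubes standing in for the target cubes of Figure 5.
cubes = [[[[(k, z, y, x) for x in range(2)] for y in range(2)] for z in range(2)]
         for k in range(8)]
sample, labels = make_pretext_sample(cubes, rotate_count=2, seed=0)
print(sorted(labels["order"]))   # a permutation of 0..7
print(len(labels["rotated"]))    # 2 cubes were rotated
```

The model is then trained to predict `labels` from `sample`, which is the self-supervised pretext task the text describes.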
  • the original recognition model is trained using the target training sample, and outputs the probability of each possible rotation and ordering of the target cubes in the target training sample.
  • the aforementioned probability may or may not satisfy the first objective function.
  • the first objective function may be a loss function. If the above probability satisfies the first objective function, it means that the original recognition model is correct. If the above probability does not satisfy the first objective function, it means that the recognition result of the original recognition model is incorrect.
  • when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model. This shows that the accuracy of the second recognition model is greater than the first threshold, for example above 99.95%.
  • the convolution block in the second recognition model can then be obtained and used as the convolution block of the first recognition model, and the first training sample is used to train the first recognition model.
  • the first training sample is a 3D picture including picture types.
  • the first recognition model can then be put into use, for example to identify the type of disease in 3D pictures.
  • a selection button 602-1 is displayed on the display interface 602 of the terminal. The user can select the target 3D picture 604 to be recognized. The terminal recognizes the target 3D picture 604 and outputs its first type 606.
  • the second recognition model is trained in advance using cubes extracted from 3D pictures, thereby improving the training efficiency of the second recognition model; the convolution block of the second recognition model is used as the first recognition model The convolution block of, uses the first recognition model to recognize 3D pictures, which greatly improves the training efficiency of the first recognition model.
  • the device further includes:
  • the third acquiring unit is configured to acquire the 3D sample picture before the acquiring the target 3D picture to be identified;
  • the first determining unit is configured to determine the original cube from the 3D sample picture
  • the splitting unit is configured to split the original cube into the N target cubes.
  • the 3D sample picture and the target 3D picture may be the same picture. That is, after the second recognition model is trained using 3D sample pictures and its convolution block is used as the convolution block of the first recognition model, the 3D sample pictures can be input into the first recognition model, and the first recognition model recognizes the type of the 3D sample picture. When a 3D sample picture is input to the second recognition model, its type does not need to be input.
  • N is the third power of a positive integer, and is greater than 1;
  • the splitting unit includes:
  • the splitting module is configured to maintain an interval of M voxels between two adjacent target cubes and split the N target cubes from the original cube, where M is a positive integer greater than 0 and less than J-1, and J is the side length of the target cube.
  • M voxels are spaced between two adjacent target cubes, so that the second recognition model learns high-level semantic feature information rather than low-level statistics of the pixel distribution, which improves the training efficiency of the second recognition model and in turn the training efficiency of the first recognition model.
  • the device further includes:
  • the second determining unit is configured to determine a first target cube from the N target cubes before the acquiring the target 3D picture to be recognized;
  • a rotating unit configured to rotate the first target cube by a first angle
  • the sorting unit is configured to sort the first target cube rotated by the first angle and the other target cubes among the N target cubes, to obtain the target training sample.
  • the foregoing sorting may be randomly sorting the N target cubes.
  • the above rotation may rotate multiple first target cubes among the N target cubes, and the rotation may be by any angle.
  • the device further includes:
  • the second input unit is configured to, after the first target cube rotated by the first angle and the other target cubes among the N target cubes are sorted to obtain the target training sample, input the target training sample into the original recognition model to train the original recognition model to obtain the second recognition model.
  • the device further includes:
  • the fourth obtaining unit is configured to obtain, before the target 3D picture to be recognized is obtained, the recognition result output after the original recognition model recognizes the target training sample, where the recognition result includes the probabilities of the various sorting orders of the target cubes in the target training sample and of the rotation angle of each target cube;
  • the third determining unit is configured to determine the original recognition model as the second recognition model when the probability that the recognition result meets the first objective function is greater than a first threshold.
  • the device further includes:
  • a fourth determining unit configured to determine the convolution block of the second recognition model as the convolution block of the first recognition model before acquiring the target 3D picture to be recognized;
  • the training unit is configured to use a first training sample to train the first recognition model until the accuracy of the first recognition model is greater than a second threshold, where the first training sample includes a first 3D picture and the type of the first 3D picture.
  • a labeled first sample picture may be input, and the first recognition model is trained until its recognition accuracy is greater than the second threshold, after which the first recognition model can be put into use.
  • An embodiment of the present application also provides a recognition model training device for implementing the above recognition model training method.
  • the device includes:
  • the segmentation unit 1202 is configured to obtain 3D sample pictures, and segment N target cubes from the 3D sample pictures;
  • the processing unit 1204 is configured to perform a predetermined operation on the N target cubes to obtain target training samples, where the predetermined operation includes rotating and sorting the N target cubes;
  • the training unit 1206 is configured to use the target training sample to train the original recognition model to obtain a second recognition model, where the original recognition model is used to output the recognition result of the target training sample, and when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the original recognition model is determined as the second recognition model.
  • the above device may be applied, but is not limited, to the process of model training.
  • N target cubes are extracted from a 3D sample picture, and the N target cubes are rotated and sorted to obtain N cubes as target training samples and input into the original recognition model.
  • the original recognition model outputs the probability of each possible rotation and ordering of the target cubes in the target training sample.
  • the aforementioned probability may or may not satisfy the first objective function.
  • the first objective function may be a loss function.
  • when the probability of the recognition result satisfying the first objective function is greater than the first threshold, the current original recognition model is determined to be a well-trained model.
  • the convolution block of the original recognition model can then be extracted and, after a new fully connected layer is added, a new recognition model can be formed; the new recognition model can be used to recognize other features.
  • the new recognition model can have high recognition accuracy after training with a small number of samples. For example, applying the new recognition model to the process of recognizing the type of 3D pictures, or applying the new recognition model to tasks such as segmentation of 3D pictures, will not be repeated here.
  • the embodiment of the present application also provides an electronic device for implementing the above-mentioned picture recognition method.
  • the electronic device includes a memory 1302 and a processor 1304.
  • the memory 1302 stores a computer program
  • the processor 1304 is configured to execute the picture recognition method provided in the embodiments of the present application through a computer program.
  • the above electronic device may be located in at least one network device among a plurality of network devices in a computer network.
  • the aforementioned processor may be configured to execute the following steps through a computer program:
  • S3 Acquire the first type of the target 3D picture to be recognized and output by the first recognition model.
  • the structure shown in Figure 13 is only for illustration, and the electronic device may also be a terminal device such as a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID) or PAD.
  • Figure 13 does not limit the structure of the above electronic device.
  • the electronic device may also include more or fewer components (such as network interfaces) than shown in Figure 13, or have a configuration different from that shown in Figure 13.
  • the memory 1302 can be used to store software programs and modules, such as program instructions/modules corresponding to the image recognition method and device in the embodiment of the present application.
  • the processor 1304 runs the software programs and modules stored in the memory 1302, thereby performing various functional applications and data processing, that is, realizing the above picture recognition method.
  • the memory 1302 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 1302 may further include a memory remotely provided with respect to the processor 1304, and these remote memories may be connected to the terminal through a network.
  • the memory 1302 may specifically, but is not limited to, storing information such as target 3D pictures to be recognized.
  • the memory 1302 may include, but is not limited to, the first acquisition unit 1102, the first input unit 1104, and the second acquisition unit 1106 in the image recognition apparatus described above.
  • it may also include, but is not limited to, other module units in the above-mentioned picture recognition device, which will not be repeated in this example.
  • the aforementioned transmission device 1306 is used to receive or send data via a network.
  • the above-mentioned specific examples of networks may include wired networks and wireless networks.
  • the transmission device 1306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
  • the transmission device 1306 is a radio frequency (RF) module, which can be configured to communicate with the Internet in a wireless manner.
  • the above electronic device further includes: a display 1308 configured to display the first type of 3D picture to be recognized; and a connection bus 1310 configured to connect each module component in the above electronic device.
  • the embodiment of the present application also provides an electronic device for implementing the above recognition model training method.
  • the electronic device includes a memory 1402 and a processor 1404.
  • the memory 1402 stores a computer program
  • the processor 1404 is configured to execute the above-mentioned recognition model training method provided in the embodiment of the present application through a computer program.
  • the above electronic device may be located in at least one network device among a plurality of network devices in a computer network.
  • the aforementioned processor may be configured to execute the following steps through a computer program:
  • the structure shown in Figure 14 is only for illustration, and the electronic device may also be a terminal device such as a smart phone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, or a Mobile Internet Device (MID) or PAD.
  • Figure 14 does not limit the structure of the above electronic device.
  • the electronic device may also include more or fewer components (such as network interfaces) than those shown in Figure 14, or have a configuration different from that shown in Figure 14.
  • the memory 1402 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the recognition model training method and device in the embodiment of the present application.
  • the processor 1404 runs the software programs and modules stored in the memory 1402, thereby Perform various functional applications and data processing, that is, realize the above-mentioned recognition model training method.
  • the memory 1402 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 1402 may further include memory remotely provided with respect to the processor 1404, and these remote memories may be connected to the terminal through a network.
  • the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the memory 1402 can be specifically but not limited to storing information such as 3D sample pictures.
  • the aforementioned memory 1402 may, but is not limited to, include the segmentation unit 1202, the processing unit 1204, and the training unit 1206 in the aforementioned recognition model training device.
  • it may also include, but is not limited to, other module units in the above-mentioned recognition model training device, which will not be repeated in this example.
  • the aforementioned transmission device 1406 is configured to receive or send data via a network.
  • the above-mentioned specific examples of networks may include wired networks and wireless networks.
  • the transmission device 1406 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network.
  • the transmission device 1406 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • the above electronic device further includes: a display 1408 configured to display the training accuracy of the original recognition model, etc.; and a connection bus 1410 configured to connect each module component in the above electronic device.
  • the embodiment of the present application also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the recognition model training method provided in the embodiment of the present application when the computer program is run.
  • the aforementioned storage medium may be configured to store a computer program for executing the following steps:
  • the second recognition model is a model obtained by training an original recognition model using a target training sample, and the target training sample includes cubes obtained by rotating and sorting N target cubes obtained from a 3D sample picture, where N is a natural number greater than 1;
  • S3 Acquire the first type of the target 3D picture to be recognized and output by the first recognition model.
  • the storage medium may be configured to store a computer program for executing the following steps:
  • the storage medium may include: flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
  • the integrated unit in the foregoing embodiment is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium.
  • the technical solution of this application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions to enable one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the disclosed client can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • a three-dimensional picture of a target to be recognized is acquired and input into a first recognition model, where the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture; the convolution block of the first recognition model is the same as the convolution block of the second recognition model, and is used for the recognition of the target three-dimensional picture; the second recognition model is a model obtained by training the original recognition model using the target training sample, where the target training sample includes cubes obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture, and N is a natural number greater than 1.
  • the second recognition model is a model obtained by training the original recognition model using the target training sample.
  • the target training sample includes a cube obtained by rotating and sorting N target cubes obtained from a three-dimensional sample picture;
  • the cubes extracted from the 3D picture train the second recognition model, which improves the training efficiency of the second recognition model and at the same time improves the recognition accuracy for 3D pictures;
  • the convolution block of the first recognition model is the same as the convolution block of the second recognition model; that is, the convolution block trained in the second recognition model is used as the convolution block of the first recognition model, so that the training efficiency of the first recognition model is improved; 3) the first recognition model, which shares its convolution block with the second recognition model, recognizes the target three-dimensional picture, which improves recognition accuracy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Image Analysis (AREA)

Abstract

A picture recognition method, a recognition model training method, a device, and a storage medium. The picture recognition method includes: acquiring a target three-dimensional picture to be recognized (S202); inputting the target three-dimensional picture to be recognized into a first recognition model (S204), where the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture, the convolution block of the first recognition model is the same as the convolution block of a second recognition model and is used to perform the recognition on the target three-dimensional picture, and the second recognition model is a model obtained by training an original recognition model with target training samples, the target training samples including cubes obtained by rotating and sorting N target cubes taken from a three-dimensional sample picture, N being a natural number greater than 1; and acquiring the picture type of the target three-dimensional picture output by the first recognition model (S206).

Description

Picture recognition method, recognition model training method, device, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, the Chinese patent application with application number 201910544392.0 filed on June 21, 2019, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computers, and in particular to a picture recognition method, a recognition model training method, a device, and a storage medium.
Background
In the related art, when recognizing the type of a 3D image, a large number of 3D picture samples are usually needed to train a 3D model, and only then can the trained 3D model be used to recognize the type of the 3D image.
However, this approach consumes a great deal of time on model training, and the training efficiency of the model is low.
Summary
The embodiments of this application provide a picture recognition method, a recognition model training method, a device, and a storage medium, which can improve picture recognition accuracy while improving model training efficiency.
An embodiment of this application provides a picture recognition method, including:
acquiring a target three-dimensional picture to be recognized;
inputting the target three-dimensional picture to be recognized into a first recognition model;
where the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture;
the convolution block of the first recognition model is the same as the convolution block of a second recognition model, and is used to perform the recognition on the target three-dimensional picture;
the second recognition model is a model obtained by training an original recognition model with target training samples;
where the target training samples include cubes obtained by rotating and sorting N target cubes taken from a three-dimensional sample picture, N being a natural number greater than 1; and
acquiring the type of the target three-dimensional picture output by the first recognition model.
An embodiment of this application provides a picture recognition method, the method being executed by a terminal, the terminal including one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method includes:
acquiring a target three-dimensional picture to be recognized;
inputting the target three-dimensional picture to be recognized into a first recognition model;
where the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture;
the convolution block of the first recognition model is the same as the convolution block of a second recognition model, and is used to perform the recognition on the target three-dimensional picture;
the second recognition model is a model obtained by training an original recognition model with target training samples;
where the target training samples include cubes obtained by rotating and sorting N target cubes taken from a three-dimensional sample picture, N being a natural number greater than 1; and
acquiring the type of the target three-dimensional picture output by the first recognition model.
An embodiment of this application further provides a recognition model training method, the method being executed by a network device, the network device including one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method includes:
acquiring a three-dimensional sample picture, and segmenting N target cubes from the three-dimensional sample picture, N being a natural number greater than 1;
rotating and sorting the N target cubes to obtain target training samples; and
training an original recognition model with the target training samples to obtain a second recognition model;
where the convolution block of the second recognition model is the same as the convolution block of a first recognition model, and is used by the first recognition model to perform the recognition on a target three-dimensional picture to obtain the picture type of the target three-dimensional picture.
An embodiment of this application further provides a recognition model training method, including:
acquiring a three-dimensional sample picture, and segmenting N target cubes from the three-dimensional sample picture, N being a natural number greater than 1;
rotating and sorting the N target cubes to obtain target training samples, and training an original recognition model with the target training samples to obtain a second recognition model;
where the convolution block of the second recognition model is the same as the convolution block of a first recognition model, and is used by the first recognition model to perform the recognition on a target three-dimensional picture to obtain the type of the target three-dimensional picture.
An embodiment of this application further provides a picture recognition device, including:
a first acquiring unit, configured to acquire a target three-dimensional picture to be recognized; a first input unit, configured to input the target three-dimensional picture into a first recognition model;
where the first recognition model is used to recognize the target three-dimensional picture to obtain the picture type of the target three-dimensional picture;
the convolution block of the first recognition model is the same as the convolution block of a second recognition model, and is used to perform the recognition on the target three-dimensional picture;
the second recognition model is a model obtained by training an original recognition model with target training samples;
where the target training samples include cubes obtained by rotating and sorting N target cubes taken from a three-dimensional sample picture, N being a natural number greater than 1; and
a second acquiring unit, configured to acquire the type of the target three-dimensional picture output by the first recognition model.
An embodiment of this application further provides a recognition model training device, including:
a segmentation unit, configured to acquire a three-dimensional sample picture and segment N target cubes from the three-dimensional sample picture, N being a natural number greater than 1;
a processing unit, configured to rotate and sort the N target cubes to obtain target training samples; and
a training unit, configured to train an original recognition model with the target training samples to obtain a second recognition model;
where the convolution block of the second recognition model is the same as the convolution block of a first recognition model, and is used by the first recognition model to perform the recognition on a target three-dimensional picture to obtain the type of the target three-dimensional picture.
An embodiment of this application further provides a storage medium storing a computer program, where the computer program is configured to execute the above picture recognition method when run.
An embodiment of this application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor executes the above picture recognition method by running the computer program.
With the picture recognition method, recognition model training method, device, and storage medium provided in the embodiments of this application, the first recognition model is trained before being used, which improves the training efficiency of the first recognition model while improving picture recognition accuracy.
Brief description of the drawings
The drawings described here are provided for a further understanding of this application and form a part of this application; the exemplary embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of an application environment of a picture recognition method according to an embodiment of this application;
Figure 2 is a schematic flowchart of a picture recognition method according to an embodiment of this application;
Figure 3 is a schematic diagram of a picture recognition method according to an embodiment of this application;
Figure 4 is a schematic diagram of another picture recognition method according to an embodiment of this application;
Figure 5 is a schematic diagram of yet another picture recognition method according to an embodiment of this application;
Figure 6 is a schematic diagram of yet another picture recognition method according to an embodiment of this application;
Figure 7 is a schematic diagram of yet another picture recognition method according to an embodiment of this application;
Figure 8 is a schematic diagram of yet another picture recognition method according to an embodiment of this application;
Figure 9 is a schematic diagram of yet another picture recognition method according to an embodiment of this application;
Figure 10 is a schematic diagram of a recognition model training method according to an embodiment of this application;
Figure 11 is a schematic structural diagram of a picture recognition device according to an embodiment of this application;
Figure 12 is a schematic structural diagram of a recognition model training device according to an embodiment of this application;
Figure 13 is a schematic structural diagram of an electronic device according to an embodiment of this application;
Figure 14 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the clearly listed steps or units, and may include other steps or units that are not clearly listed or are inherent to the process, method, product, or device.
磁共振成像(Magnetic Resonance Imagin,MRI):医学影像的一种。
电子计算机断层扫描(Computed Tomography,CT):医学影像的一种,可用于多种疾病的检查。
卷积神经网络(convolution neural network,CNN)
多模态脑部肿瘤分割(Multimodal Brain Tumor Segmentation,BRATS)
特征图(Feature map):图像和滤波器进行卷积后得到的特征图,在实际应用中,Feature map可以和滤波器进行卷积生成新的feature map。
孪生网络(Siamese网络):包含若干个结构相同的卷积神经网络,各个子网络之间共享权重参数。
汉明距离(Hamming distance):用于衡量两个字符串对应位置的不同字符的数目。
全卷积网络(Fully Convolutional Network,FCN):用于图像分割的一种卷积网络,完全由卷积层和池化层组成。
本申请实施例提供了一种图片识别方法,该图片识别方法可以但不限于应用于如图1所示的环境中。
图1中,用户102与用户设备104之间可以进行人机交互。用户设备104中包含有存储器106和处理器108,存储器106配置为存储交互数据,处理器108配置为处理交互数据。用户设备104可以通过网络110与服务器112之间进行数据交互。服务器112中包含有数据库114和处理引擎116,数据库114用于存储交互数据,处理引擎116用于处理交互数据。用户设备104中包括有第一识别模型,用户设备104可以获取待识别的目标3D图片104-2,对目标3D图片104-2进行识别,并输出目标3D图片104-2的图片类型104-4。
在一些实施例中,上述图片识别方法可以但不限于应用于可以计算数据的终端上,例如手机、平板电脑、笔记本电脑、PC机等终端上,上述网络可以包括但不限于无线网络或有线网络。其中,该无线网络包括:蓝牙、WIFI及其他实现无线通信的网络。上述有线网络可以包括但不限于:广域网、城域网、局域网。上述服务器可以包括但不限于任何可以进行计算的硬件设备,如可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、以及大数据和人工智能平台等基础云计算服务的云服务器。
在一些实施例中,如图2所示,上述图片识别方法包括:
S202,获取待识别的目标3D图片。
S204,将所述待识别的目标3D图片输入到第一识别模型中。
其中,所述第一识别模型,用于对所述待识别的目标3D图片进行识别,得到所述待识别的目标3D图片的图片类型;
第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;
第二识别模型为使用目标训练样本对原始识别模型进行训练得到的模型,目标训练样本包括:对从3D样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数。
S206,获取第一识别模型输出的目标3D图片的第一类型。
在一些实施例中,上述图片识别方法可以但不限于应用于图片识别领域。例如,将上述方法应用到识别3D图片的类型的过程中,如识别3D病症图片中病症类型的过程中。举例说明,在识别脑出血类型时,在获取到3D病症图片后(3D病症图片可以为MRI图片或者CT图片),将3D病症图片输入到第一识别模型中,使用第一识别模型对3D病症图片进行识别并输出3D病症图片的第一类型。如第一类型可以为健康,或者动脉瘤、动静脉畸形、烟雾病、高血压等。
在上述方法中,由于预先使用从3D图片中提取的正方体对第二识别模型进行训练,从而提高了第二识别模型的训练效率,将第二识别模型的卷积块作为第一识别模型的卷积块,使用第一识别模型识别3D图片,实现了大大提高第一识别模型的训练效率的效果。
在一些实施例中,在获取到目标3D图片之前,需要先对第二识别模型进行训练。在训练时,首先需要获取到3D样本图片。3D样本图片为未标注标签的图片。在获取到3D样本图片之后,需要从3D样本图片中提取原始正方体,并将原始正方体拆分为N个目标正方体。
在一些实施例中,在提取原始正方体时,可以先确定3D样本图片的几何中心。在确定几何中心后,以该几何中心为上述原始正方体的几何中心,并确定出原始正方体。上述原始正方体的边长小于3D样本图片的最小边的长度。
例如,如图3所示,对于一张3D样本图片302,首先确定出3D样本图片302的几何中心304,然后确定出以几何中心304为几何中心的原 始正方体306。
在一些实施例中,在确定出3D样本图片的几何中心之后,还可以确定出一个半径r,然后以3D样本图片的几何中心为圆心,以半径r为半径做球,再从球中选择任意一点作为上述原始正方体的几何中心,确定上述原始正方体。需要说明的是,确定出的原始正方体是位于3D样本图片中的,不会超出3D样本图片的范围。
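上述"以几何中心为球心、在半径r的球内任取一点作为原始正方体几何中心"的取点过程,可用如下numpy草图示意(函数名与参数均为本文为说明而假设,非专利原文):

```python
import numpy as np

def random_center(shape, r, rng):
    # 以3D样本图片的几何中心为球心,在半径r的球内取一点,
    # 作为原始正方体的几何中心(拒绝采样,示意实现)
    center = np.asarray(shape, dtype=float) / 2.0
    while True:
        offset = rng.uniform(-r, r, size=3)
        if np.linalg.norm(offset) <= r:
            return center + offset

rng = np.random.default_rng(0)
c = random_center((100, 100, 100), 5.0, rng)
```

实际使用时还需保证由该中心确定的原始正方体不超出3D样本图片的范围,例如对r与正方体边长加以约束。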
在一些实施例中,在确定出原始正方体之后,需要对原始正方体进行拆分,拆分得到N个目标正方体。在拆分时,可以使用任意方法,如从原始正方体中随机挖出N个目标正方体,或者将原始正方体的一部分拆分得到N个目标正方体。或者,将原始正方体平均拆分成N个目标正方体,N为正整数的三次方。以N为8为例,如图4所示,将一个原始正方体404沿402-1、402-2、402-3的箭头所示的方向进行拆分,得到8个目标正方体(图4中的拆分方法仅为示例)。或者,在拆分时,每两个相邻的正方体之间间隔M个体素。例如,以M为2为例,如图5所示,将原始正方体502拆分为8个目标正方体504。原始正方体502的边长为10个体素,则目标正方体504的边长为4个体素。
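按上述"相邻目标正方体间隔M个体素"的拆分方式(以N=8为例),一个numpy示意实现如下(实现细节为本文假设):

```python
import numpy as np

def split_cube(volume, m):
    # 将边长为S的原始正方体平均拆分为8个目标正方体,
    # 相邻目标正方体之间间隔m个体素;目标正方体边长 j = (S - m) // 2
    s = volume.shape[0]
    j = (s - m) // 2
    offsets = (0, j + m)
    return [volume[x:x + j, y:y + j, z:z + j]
            for x in offsets for y in offsets for z in offsets]

# 与正文示例一致:边长10、间隔2,得到8个边长为4的目标正方体
cubes = split_cube(np.zeros((10, 10, 10)), m=2)
```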
在一些实施例中,在获取到N个目标正方体之后,还可以对N个目标正方体中的第一目标正方体旋转第一角度,如旋转90度、旋转180度等。第一目标正方体可以有一个或多个,每一个第一目标正方体的旋转角度可以相同或不同。将旋转后的第一目标正方体与剩余未旋转的目标正方体进行排序,排序可以为随机排序,排序后得到目标训练样本。
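对目标正方体做随机旋转并随机排序、得到目标训练样本的过程,可用如下numpy草图示意(此处仅示意水平/垂直180°两种旋转,旋转轴向的选取为本文假设):

```python
import numpy as np

def make_training_sample(cubes, rng):
    # 对每个目标正方体随机选择:不旋转 / 水平180° / 垂直180°,
    # 再对旋转后的正方体随机排序,返回样本、旋转标签与排列顺序
    rotated, rot_labels = [], []
    for c in cubes:
        k = int(rng.integers(0, 3))
        if k == 1:
            c = np.rot90(c, 2, axes=(0, 1))   # 视为水平180°旋转(轴向为假设)
        elif k == 2:
            c = np.rot90(c, 2, axes=(0, 2))   # 视为垂直180°旋转(轴向为假设)
        rotated.append(c)
        rot_labels.append(k)
    order = rng.permutation(len(rotated))
    sample = [rotated[i] for i in order]
    labels = [rot_labels[i] for i in order]
    return sample, labels, order

rng = np.random.default_rng(0)
cubes = [np.arange(64.0).reshape(4, 4, 4) for _ in range(8)]
sample, labels, order = make_training_sample(cubes, rng)
```

返回的旋转标签与排列顺序即为后续自监督训练中网络需要预测的监督信号。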
在获取到目标训练样本后,使用目标训练样本对原始识别模型进行训练,由原始识别模型输出目标训练样本中的目标正方体进行了哪种旋转以及排列的顺序的概率。上述概率可能满足第一目标函数,也可能不满足第一目标函数。第一目标函数可以为损失函数。若是上述概率满足第一目标函数,则说明原始识别模型识别结果正确;若是上述概率不满足第一目标函数,则说明原始识别模型的识别结果不正确。在所述识别结果满足第一目标函数的概率大于第一阈值时,将所述原始识别模型确定为所述第二识别模型,说明第二识别模型的准确度大于第一阈值,如准确度达到99.95%以上。
使用上述训练方法大大提高了训练第二识别模型的效率。
在一些实施例中,在训练得到第二识别模型之后,可以获取第二识别模型中的卷积块,并将卷积块作为第一识别模型的卷积块,并使用第一训练样本对第一识别模型进行训练。第一训练样本为包括图片类型的3D图片。在第一识别模型的识别准确度大于第二阈值后,可以将第一识别模型投入到使用之中。如识别3D图片的病症类型。如图6所示,终端的显示界面602上显示有选择按钮602-1,用户可以选择待识别的目标3D图片604,终端对待识别的目标3D图片604进行识别,并输出待识别的目标3D图片的第一类型606。
举例说明,识别脑部病症时,获取公开的BRATS-2018脑部神经胶质瘤分割数据集和从合作医院采集的脑出血分类数据集,上述数据作为实验数据。
BRATS-2018数据集包括285个病人的MRI影像,每个病人MRI影像包括4个不同的模态,分别是T1,T1Gd,T2,FLAIR,不同模态的数据均经过共同配准,每幅图像的大小为240x240x155。
脑出血数据集包括1486个脑出血的脑部CT扫描图像,脑出血类型分别为动脉瘤,动静脉畸形,烟雾病,高血压。每幅CT图像的大小均为230x270x30。
将上述图片用作第二识别模型的训练。如图7所示,对于一幅图,从图中提取出原始正方体并将原始正方体拆分为目标正方体。具体选择原始正方体的方法请参见上述示例,在此不再赘述。在选择出原始正方体之后,为了鼓励网络通过魔方复原的代理任务学习到高级别的语义特征信息而非低级别的像素分布的统计特征信息,我们在切割原始正方体得到目标正方体时,在相邻的两个目标正方体之间预留10个体素以内的随机间隔,之后对每个目标正方体内的体素进行[-1,1]归一化操作,得到目标训练样本。
在得到目标训练样本之后,需要对第二识别模型进行训练。如图7所示,孪生网络(Siamese网络)中包括X个互相共享权重的子网络,其中X表示目标正方体的数目。在实验中使用了有8个目标正方体输入的八合一Siamese网络,各个子网络具有相同的网络结构且互相共享权重。每个子网络的主干结构可以使用目前存在的各个类型的3D CNN,在实验中使用了3D VGG网络。将所有子网络最后一个全连接层的输出特征图feature map进行叠加然后输入到不同的分支中,分别用于目标正方体的空间重排任务和目标正方体旋转判断任务。上述feature map为卷积模型中任意一个网络所输出的内容。
1、目标正方体的重排
对于本方案所提出的魔方复原任务,其第一步就是对目标正方体进行重排。以二阶魔方为例,如图7所示,其总共具有2x2x2=8个目标正方体,我们首先要生成8个目标正方体的所有排列组合序列P=(P1,P2,…,P8!),这些排列序列控制着魔方复原任务的复杂程度,如果两个排列序列相互之间过于相似,那么网络的学习过程就会变得非常简单,很难学习到复杂的特征信息。为了保证学习的有效性,使用Hamming distance作为衡量指标,依次选取相互之间差别更大的K个序列。对于每次魔方复原的训练输入数据,从K个序列中随机抽取一个,例如(2,5,8,4,1,7,3,6),然后将裁切好的8个目标正方体按照该序列的顺序进行重新排列,之后将重新排列好的目标正方体依次输入到网络中,最终网络要学习的目标就是要判断输入序列属于这K个序列中的哪一个,因此对于目标正方体重排,其损失函数如下:
loss_P = -\sum_{j=1}^{K} l_j \log(p_j)   (1)
上式中的l_j表示序列的真实one-hot标签,p_j表示网络输出的对于各个序列的预测概率。
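上文"以Hamming distance为衡量指标、依次选取相互之间差别更大的K个序列"的做法,可用如下贪心草图示意(为便于演示取n=4的全排列;实际n=8时排列数达8!,通常需先随机采样候选排列,具体策略为本文假设):

```python
import itertools

def hamming(a, b):
    # 两个等长序列对应位置不同元素的个数
    return sum(x != y for x, y in zip(a, b))

def select_permutations(n, k):
    # 贪心选取:每次从候选排列中选出与已选序列的最小Hamming距离最大的排列
    perms = list(itertools.permutations(range(1, n + 1)))
    chosen = [perms[0]]
    while len(chosen) < k:
        best = max(perms, key=lambda p: min(hamming(p, c) for c in chosen))
        chosen.append(best)
    return chosen

seqs = select_permutations(n=4, k=5)
```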
2、目标正方体的旋转
在3D魔方复原任务中增加一个新的操作,即目标正方体的旋转,通过这个操作能够让网络学习到3D图像块的旋转不变的特征。
目标正方体通常均为立方体结构,如果让一个目标正方体在空间中自由地旋转,会有3(旋转轴:x,y,z轴)x 2(旋转方向:顺时针,逆时针)x 4(旋转角度:0°,90°,180°,270°)=24种不同的可能。为了降低任务的复杂性,限制目标正方体的旋转选择,规定目标正方体只能沿水平或者垂直方向进行180°的旋转。如图7所示,魔方块3、4进行了水平180°旋转,魔方块5、7进行了垂直方向180°旋转。旋转之后的魔方块输入网络后,网络要判断每个目标正方体进行了何种形式的旋转,因此对于魔方块旋转任务,其损失函数如下:
loss_R = -\frac{1}{M} \sum_{i=1}^{M} \left( g_i^{hor} \log(r_i^{hor}) + g_i^{ver} \log(r_i^{ver}) \right)   (2)
公式中,M表示目标正方体的数目,g_i^hor表示目标正方体水平方向旋转的one-hot标签,g_i^ver表示目标正方体竖直方向旋转的one-hot标签,r_i^hor、r_i^ver分别表示网络在水平、竖直方向的预测输出概率。
根据前面的定义,模型的目标函数为排列损失函数和旋转损失函数的线性加权,模型的整体损失函数如下:
loss = a*loss_P + b*loss_R   (3)
其中a和b分别为两个损失函数的权重,控制了两个子任务之间的互相影响程度。在实验中,将两个权重值均设置成0.5能够使预训练达到更好的效果。
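整体损失为排列损失与旋转损失的线性加权(a=b=0.5为正文所述设置),可用如下numpy草图示意(变量形状与命名为本文假设):

```python
import numpy as np

def cross_entropy(one_hot, prob, eps=1e-12):
    # 交叉熵:-sum(标签 * log(预测概率)),eps防止log(0)
    return -np.sum(one_hot * np.log(prob + eps))

def total_loss(l, p, g_hor, r_hor, g_ver, r_ver, a=0.5, b=0.5):
    # l/p: 排列任务的one-hot标签与预测概率
    # g_*/r_*: M个目标正方体旋转任务的标签与预测概率
    loss_p = cross_entropy(l, p)
    m = len(g_hor)
    loss_r = (cross_entropy(g_hor, r_hor) + cross_entropy(g_ver, r_ver)) / m
    return a * loss_p + b * loss_r  # 两个子任务损失的线性加权

# 预测完全正确时,整体损失接近0
l = np.array([0.0, 1.0, 0.0])
g = np.array([1.0, 0.0, 1.0, 0.0])
loss = total_loss(l, l, g, g, g, g)
```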
经过上述训练后,可以得到第二识别模型,第二识别模型的准确度大于第一阈值。
此时,可以将第二识别模型的卷积块提取出来,微调后用作其他目标任务。
例如,将第二识别模型的卷积块提取出来,用于第一识别模型识别3D图片的类型。对于分类任务,只需要对CNN网络后面的全连接层进行重新训练,对于全连接层之前的卷积层可以使用更小的学习率进行微调。
或者将上述第二识别模型的卷积块用于分割任务。对于分割任务,预训练网络可以使用在图像分割任务上较为常用的全卷积神经网络(FCN),例如3D U-Net结构,如图8所示。但是,由于前期魔方复原式的预训练只能针对U-Net的下采样阶段,对于U-Net上采样阶段的网络参数,在训练时仍需要进行随机初始化。为了避免大量参数随机初始化对前期预训练效果带来的影响,使用了密集上采样卷积(Dense Upsampling Convolution,DUC)模块来替代原有的转置卷积,对特征图进行上采样,恢复到图像原始输入大小。DUC模块的结构如图9所示,其中,C表示通道数量,d表示扩大倍数,H为特征图的长,W为特征图的宽。
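DUC的核心是把通道维上的C×d²组特征重排到空间维(即pixel shuffle),从而实现d倍上采样。以下用numpy给出二维情形的草图(三维情形同理;实现细节为本文假设):

```python
import numpy as np

def duc_upsample(feat, d):
    # feat: (C*d*d, H, W) -> (C, H*d, W*d)
    # 将通道维中的 d*d 个子平面交错重排到空间维,完成 d 倍上采样
    c2, h, w = feat.shape
    c = c2 // (d * d)
    x = feat.reshape(c, d, d, h, w)   # (C, d, d, H, W)
    x = x.transpose(0, 3, 1, 4, 2)    # (C, H, d, W, d)
    return x.reshape(c, h * d, w * d)

feat = np.arange(2 * 2 * 2 * 3 * 3, dtype=float).reshape(8, 3, 3)
out = duc_upsample(feat, d=2)
```

这种重排不引入新参数,上采样所需的信息全部来自其前的卷积层,因而预训练得到的特征可以被更充分地利用。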
通过本实施例,由于预先使用从3D图片中提取的正方体对第二识别模型进行训练,从而提高了第二识别模型的训练效率,将第二识别模型的卷积块作为第一识别模型的卷积块,使用第一识别模型识别3D图片,实现了大大提高第一识别模型的训练效率的效果。
在一些实施例中,在所述获取待识别的目标3D图片之前,还可以包 括:
S1,获取所述3D样本图片;
S2,从所述3D样本图片中确定出原始正方体;
S3,将所述原始正方体拆分为所述N个目标正方体。
在一些实施例中,3D样本图片与目标3D图片可以为相同的图片。即,在使用3D样本图片对第二识别模型进行训练,并将第二识别模型的卷积块用作第一识别模型的卷积块之后,可以将3D样本图片输入到第一识别模型中,由第一识别模型识别3D样本图片的类型。3D样本图片在输入第二识别模型时,不需要输入3D样本图片的类型。
应用本申请上述实施例,在使用第一识别模型之前,获取N个目标正方体对第二识别模型进行训练,提高了对第二识别模型进行训练的训练效率,以及提高了第一识别模型的训练效率。
在一些实施例中,所述N为大于1的正整数的3次方,所述将所述原始正方体拆分为所述N个目标正方体,包括:
S1,保持相邻的两个所述目标正方体之间间隔M个体素,从所述原始正方体中拆分出所述N个目标正方体,所述M为大于0且小于J-1的正整数,所述J为所述目标正方体的边长。
在一些实施例中,在确定N个目标正方体时,两个相邻的目标正方体之间间隔M个体素,可以使第二识别模型学习到高级别的语义特征信息而非低级别的像素分布的统计特征信息,提高了第二识别模型的训练效率,以及提高了第一识别模型的训练效率。
在一些实施例中,在所述获取待识别的目标3D图片之前,还可以包括如下操作:
S1,从所述N个目标正方体中确定出第一目标正方体;
S2,将所述第一目标正方体旋转第一角度;
S3,对旋转所述第一角度之后的第一目标正方体,与所述N个目标正方体中其他目标正方体进行排序,得到所述目标训练样本。
在一些实施例中,上述排序可以为随机对N个目标正方体进行排序。上述旋转可以对N个目标正方体中的多个第一目标正方体进行旋转,旋转可以旋转任意角度。
应用本申请上述实施例,在使用第一识别模型之前,在获取N个目标正方体之后,对N个目标正方体中的第一目标正方体进行旋转,提高了对第二识别模型进行训练的训练效率,提高了第一识别模型的训练效率。
在一些实施例中,对旋转所述第一角度之后的第一目标正方体,与所述N个目标正方体中其他目标正方体进行排序,得到所述目标训练样本之后,还可以包括:
将所述目标训练样本输入到所述原始识别模型中,以对所述原始识别模型进行训练,得到所述第二识别模型。
应用本申请上述实施例,提高了对第二识别模型进行训练的训练效率。
在一些实施例中,在获取待识别的目标3D图片之前,还可以包括:
S1,获取所述原始识别模型对所述目标训练样本进行识别后输出的识别结果,其中,所述识别结果中包括所述目标训练样本中所述目标正方体的各种排序顺序与每一个所述目标正方体的旋转角度的概率;
S2,在所述识别结果满足第一目标函数的概率大于第一阈值时,将所述原始识别模型确定为所述第二识别模型。
在一些实施例中,当第二识别模型的识别准确度大于一个准确度值时,则认为第二识别模型符合要求,从而停止对第二识别模型的训练。
这里,通过设置一个跳出条件来停止对第二识别模型的训练,提高了对第二识别模型进行训练的训练效率。
在一些实施例中,获取待识别的目标3D图片之前,还可以包括:
S1,将所述第二识别模型的卷积块确定为所述第一识别模型的卷积块;
S2,使用第一训练样本对所述第一识别模型进行训练,直到所述第一识别模型的准确度大于第二阈值,其中,所述第一训练样本包括第一3D图片与所述第一3D图片的类型。
在一些实施例中,在对第一识别模型进行训练时,可以输入带有标签的第一样本图片。然后对第一识别模型进行训练,直到第一识别模型的识别准确度大于第二阈值,则第一识别模型可以投入到使用中。
应用本申请上述实施例,由于第二识别模型为使用目标训练样本对原始识别模型进行训练得到的模型,上述目标训练样本包括对从三维样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体;预先使用从三维图片中提取的正方体对第二识别模型进行训练,提高了第二识别模型的训练效率,同时提高了对三维图片的识别准确度;
由于第一识别模型的卷积块与第二识别模型的卷积块相同;也即,使用通过第二识别模型训练好的卷积块作为第一识别模型的卷积块,如此,提高了第一识别模型的训练效率;
通过第一识别模型中与第二识别模型相同的卷积块,对目标三维图片进行识别,提高了识别准确度;通过在使用第一识别模型之前对第一识别模型进行训练,从而提高了对第一识别模型进行训练的训练效率。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
本申请实施例还提供了一种识别模型训练方法。如图10所示,该方法包括:
S1002,获取3D样本图片,从3D样本图片中分割出N个目标正方体。
这里,N为大于1的自然数。
S1004,对N个目标正方体执行预定操作,得到目标训练样本,其中,预定操作包括对N个目标正方体进行旋转和排序。
S1006,使用目标训练样本对原始识别模型进行训练,得到第二识别模型。
其中,原始识别模型用于输出对目标训练样本的识别结果,在识别结果满足第一目标函数的概率大于第一阈值时,将原始识别模型确定为第二识别模型。
第二识别模型的卷积块与第一识别模型的卷积块相同,用于所述第一识别模型对目标三维图片进行所述识别,得到所述目标三维图片的图片类型。
在一些实施例中,上述方法可以但不限于应用于模型训练的过程中。在训练原始识别模型时,从一张3D样本图片中提取出N个目标正方体,将对N个目标正方体进行旋转与排序后得到的N个正方体作为目标训练样本输入到原始识别模型中。提取、旋转、排序方法可以参见上述实施例中的方法,本实施例不再赘述。在训练原始识别模型时,由原始识别模型输出目标训练样本中的目标正方体进行了哪种旋转以及排列的顺序的概率。上述概率可能满足第一目标函数,也可能不满足第一目标函数。第一目标函数可以为损失函数。若是上述概率满足第一目标函数,则说明原始识别模型识别结果正确;若是上述概率不满足第一目标函数,则说明原始识别模型的识别结果不正确。在所述识别结果满足第一目标函数的概率大于第一阈值时,将当前的原始识别模型确定为训练成熟的模型。
通过上述方法,可以大大提高对原始识别模型的训练效率。
在一些实施例中,在训练得到成熟的原始识别模型之后,可以将原始识别模型的卷积块提取出来,添加新的全连接层之后,形成新的识别模型,并可以使用新的识别模型执行其他任务。新的识别模型经过少量样本的训练即可具备较高的识别准确度。例如,将新的识别模型应用到识别3D图片的类型的过程中,或者将新的识别模型应用到3D图片的分割等任务中,在此不再赘述。
本申请实施例还提供了一种用于实施上述图片识别方法的图片识别装置。如图11所示,该装置包括:
第一获取单元1102,配置为获取待识别的目标3D图片;
第一输入单元1104,配置为将所述待识别的目标3D图片输入到第一识别模型中。
其中,所述第一识别模型,用于对所述待识别的目标3D图片进行识别得到所述待识别的目标3D图片的图片类型;
所述第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;
所述第二识别模型为,使用目标训练样本对原始识别模型进行训练得到的模型;
所述目标训练样本包括:对从3D样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;
第二获取单元1106,配置为获取所述第一识别模型输出的所述待识别的目标3D图片的第一类型。
在一些实施例中,上述图片识别装置可以但不限于应用于图片识别领域。例如,将上述方法应用到识别3D图片的类型的过程中,如识别3D病症图片中病症类型的过程中。举例说明,在识别脑出血类型时,在获取到3D病症图片后,将3D病症图片输入到第一识别模型中,使用第一识别模型对3D病症图片进行识别并输出3D病症图片的第一类型。如第一类型可以为健康,或者动脉瘤、动静脉畸形、烟雾病、高血压等。
在上述方法中,由于预先使用从3D图片中提取的正方体对第二识别模型进行训练,从而提高了第二识别模型的训练效率;将第二识别模型的卷积块作为第一识别模型的卷积块,使用第一识别模型识别3D图片,实现了大大提高第一识别模型的训练效率的效果。
在一些实施例中,在获取到目标3D图片之前,需要先对第二识别模型进行训练。在训练时,首先需要获取到3D样本图片。3D样本图片为未标签标注的图片。在获取到3D样本图片之后,需要从3D样本图片中提取原始正方体,并将原始正方体拆分为N个目标正方体。
在一些实施例中,在提取原始正方体时,可以先确定3D样本图片的几何中心。在确定几何中心后,以该几何中心为上述原始正方体的几何中心,并确定出原始正方体。上述原始正方体的边长小于3D样本图片的最小边的长度。
例如,如图3所示,对于一张3D样本图片302,首先确定出3D样本图片302的几何中心304,然后确定出以几何中心304为几何中心的原始正方体306。
在一些实施例中,在确定出3D样本图片的几何中心之后,还可以确定出一个半径r,然后以3D样本图片的几何中心为圆心,以半径r为半径做球,再从球中选择任意一点作为上述原始正方体的几何中心,确定上述原始正方体。需要说明的是,确定出的原始正方体是位于3D样本图片中的,不会超出3D样本图片的范围。
在一些实施例中,在确定出原始正方体之后,需要对原始正方体进行拆分,拆分得到N个目标正方体。在拆分时,可以使用任意方法,如从原始正方体中随机挖出N个目标正方体,或者将原始正方体的一部分拆分得到N个目标正方体。或者,将原始正方体平均拆分成N个目标正方体,N为正整数的三次方。以N为8为例,如图4所示,将一个原始正方体404沿402-1、402-2、402-3的箭头所示的方向进行拆分,得到8个目标正方体(图4中的拆分方法仅为示例)。或者,在拆分时,每两个相邻的正方体之间间隔M个体素。例如,以M为2为例,如图5所示,将原始正方体502拆分为8个目标正方体504。原始正方体502的边长为10个体素,则目标正方体504的边长为4个体素。
在一些实施例中,在获取到N个目标正方体之后,还可以对N个目标正方体中的第一目标正方体旋转第一角度,如旋转90度、旋转180度等。第一目标正方体可以有一个或多个,每一个第一目标正方体的旋转角度可以相同或不同。将旋转后的第一目标正方体与剩余未旋转的目标正方体进行排序,排序可以为随机排序,排序后得到目标训练样本。
在获取到目标训练样本后,使用目标训练样本对原始识别模型进行训练,由原始识别模型输出目标训练样本中的目标正方体进行了哪种旋转以及排列的顺序的概率。上述概率可能满足第一目标函数,也可能不满足第一目标函数。第一目标函数可以为损失函数。若是上述概率满足第一目标函数,则说明原始识别模型识别结果正确;若是上述概率不满足第一目标函数,则说明原始识别模型的识别结果不正确。在所述识别结果满足第一目标函数的概率大于第一阈值时,将所述原始识别模型确定为所述第二识别模型,说明第二识别模型的准确度大于第一阈值,如准确度达到99.95%以上。
使用上述训练方法大大提高了训练第二识别模型的效率。
在一些实施例中,在训练得到第二识别模型之后,可以获取第二识别模型中的卷积块,并将卷积块作为第一识别模型的卷积块,并使用第一训练样本对第一识别模型进行训练。第一训练样本为包括图片类型的3D图片。在第一识别模型的识别准确度大于第二阈值后,可以将第一识别模型投入到使用之中。如识别3D图片的病症类型。如图6所示,终端的显示界面602上显示有选择按钮602-1,用户可以选择待识别的目标3D图片604,终端对待识别的目标3D图片604进行识别,并输出待识别的目标3D图片的第一类型606。
应用本申请上述实施例,由于预先使用从3D图片中提取的正方体对第二识别模型进行训练,从而提高了第二识别模型的训练效率;将第二识别模型的卷积块作为第一识别模型的卷积块,使用第一识别模型识别3D图片,实现了大大提高第一识别模型的训练效率的效果。
在一些实施例中,所述装置还包括:
第三获取单元,配置为在所述获取待识别的目标3D图片之前,获取所述3D样本图片;
第一确定单元,配置为从所述3D样本图片中确定出原始正方体;
拆分单元,配置为将所述原始正方体拆分为所述N个目标正方体。
在一些实施例中,3D样本图片与目标3D图片可以为相同的图片。即,在使用3D样本图片对第二识别模型进行训练之后,并将第二卷积块用作第一识别模型的卷积块之后,可以将3D样本图片输入到第一识别模型中,由第一识别模型识别3D样本图片的类型。3D样本图片在输入第二识别模型时,不需要输入3D样本图片的类型。
应用本申请上述实施例,在使用第一识别模型之前,获取N个目标正方体对第二识别模型进行训练,提高了对第二识别模型进行训练的训练效率,提高了第一识别模型的训练效率。
在一些实施例中,所述N为大于1的正整数的3次方,所述拆分单元包括:
拆分模块,配置为保持相邻的两个所述目标正方体之间间隔M个体素,从所述原始正方体中拆分出所述N个目标正方体,所述M为大于0且小于J-1的正整数,所述J为所述目标正方体的边长。
在一些实施例中,在确定N个目标正方体时,两个相邻的目标正方体之间间隔M个体素,可以使第二识别模型学习到高级别的语义特征信息而非低级别的像素分布的统计特征信息,提高了第二识别模型的训练效率,提高了第一识别模型的训练效率。
在一些实施例中,所述装置还包括:
第二确定单元,配置为在所述获取待识别的目标3D图片之前,从所述N个目标正方体中确定出第一目标正方体;
旋转单元,配置为将所述第一目标正方体旋转第一角度;
排序单元,配置为将所述N个目标正方体中,旋转所述第一角度之后的所述第一目标正方体与其他目标正方体进行排序,得到所述目标训练样本。
在一些实施例中,上述排序可以为随机对N个目标正方体进行排序。上述旋转可以对N个目标正方体中的多个第一目标正方体进行旋转。旋转可以旋转任意角度。
应用本申请上述实施例,从而在使用第一识别模型之前,在获取N个目标正方体之后,对N个目标正方体中的第一目标正方体进行旋转,提高了对第二识别模型进行训练的训练效率,提高了第一识别模型的训练效率。
在一些实施例中,所述装置还包括:
第二输入单元,配置为在所述将所述N个目标正方体中,旋转所述第一角度之后的所述第一目标正方体与其他目标正方体进行排序,得到所述目标训练样本之后,将所述目标训练样本输入到所述原始识别模型中,以对所述原始识别模型进行训练,得到所述第二识别模型。
在一些实施例中,所述装置还包括:
第四获取单元,配置为所述获取待识别的目标3D图片之前,获取所述原始识别模型对所述目标训练样本进行识别后输出的识别结果,其中,所述识别结果中包括所述目标训练样本中所述目标正方体的各种排序顺序与每一个所述目标正方体的旋转角度的概率;
第三确定单元,配置为在所述识别结果满足第一目标函数的概率大于第一阈值时,将所述原始识别模型确定为所述第二识别模型。
在一些实施例中,所述装置还包括:
第四确定单元,配置为所述获取待识别的目标3D图片之前,将所述第二识别模型的卷积块确定为所述第一识别模型的卷积块;
训练单元,配置为使用第一训练样本对所述第一识别模型进行训练,直到所述第一识别模型的准确度大于第二阈值,其中,所述第一训练样本包括第一3D图片与所述第一3D图片的类型。
在一些实施例中,在对第一识别模型进行训练时,可以输入带有标签的第一样本图片。然后对第一识别模型进行训练,直到第一识别模型的识别准确度大于第二阈值,则第一识别模型可以投入到使用中。
通过在使用第一识别模型之前对第一识别模型进行训练,从而提高了对第一识别模型进行训练的训练效率。
本申请实施例还提供了一种用于实施上述识别模型训练方法的识别模型训练装置。如图12所示,该装置包括:
分割单元1202,配置为获取3D样本图片,从3D样本图片中分割出N个目标正方体;
处理单元1204,配置为对N个目标正方体执行预定操作,得到目标训练样本,其中,预定操作包括对N个目标正方体进行旋转和排序;
训练单元1206,配置为使用目标训练样本对原始识别模型进行训练,得到第二识别模型,其中,原始识别模型用于输出对目标训练样本的识别结果,在识别结果满足第一目标函数的概率大于第一阈值时,将原始识别模型确定为第二识别模型。
在一些实施例中,上述装置可以但不限于应用于模型训练的过程中。在训练原始识别模型时,从一张3D样本图片中提取出N个目标正方体,将对N个目标正方体进行旋转与排序后得到的N个正方体作为目标训练样本输入到原始识别模型中。具体提取、旋转、排序方法可以参见上述实施例中的方法,本实施例不再赘述。在训练原始识别模型时,由原始识别模型输出目标训练样本中的目标正方体进行了哪种旋转以及排列的顺序的概率。上述概率可能满足第一目标函数,也可能不满足第一目标函数。第一目标函数可以为损失函数。若是上述概率满足第一目标函数,则说明原始识别模型识别结果正确;若是上述概率不满足第一目标函数,则说明原始识别模型的识别结果不正确。在所述识别结果满足第一目标函数的概率大于第一阈值时,将当前的原始识别模型确定为训练成熟的模型。
在一些实施例中,在训练得到成熟的原始识别模型之后,可以将原始识别模型的卷积块提取出来,添加新的全连接层之后,形成新的识别模型,并可以使用新的识别模型执行其他任务。新的识别模型经过少量样本的训练即可具备较高的识别准确度。例如,将新的识别模型应用到识别3D图片的类型的过程中,或者将新的识别模型应用到3D图片的分割等任务中,在此不再赘述。
本申请实施例还提供了一种用于实施上述图片识别方法的电子装置,如图13所示,该电子装置包括存储器1302和处理器1304,该存储器1302中存储有计算机程序,该处理器1304被设置为通过计算机程序执行本申请实施例提供的图片识别方法。
在一些实施例中,上述电子装置可以位于计算机网络的多个网络设备中的至少一个网络设备。
在一些实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:
S1,获取待识别的目标3D图片;
S2,将所述待识别的目标3D图片输入到第一识别模型中,其中,所述第一识别模型用于对所述待识别的目标3D图片进行识别得到所述待识别的目标3D图片的图片类型,所述第一识别模型的卷积块与第二识别模型的卷积块相同,所述第二识别模型为使用目标训练样本对原始识别模型进行训练得到的模型,所述目标训练样本包括对从3D样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;
S3,获取所述第一识别模型输出的所述待识别的目标3D图片的第一类型。
在一些实施例中,本领域普通技术人员可以理解,图13所示的结构仅为示意,电子装置也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图13并不对上述电子装置的结构造成限定。例如,电子装置还可包括比图13中所示更多或者更少的组件(如网络接口等),或者具有与图13所示不同的配置。
其中,存储器1302可用于存储软件程序以及模块,如本申请实施例中的图片识别方法和装置对应的程序指令/模块,处理器1304通过运行存储在存储器1302内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的图片识别方法。存储器1302可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器1302还可包括相对于处理器1304远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。其中,存储器1302具体可以但不限于用于存储待识别的目标3D图片等信息。作为一种示例,如图13所示,上述存储器1302中可以但不限于包括上述图片识别装置中的第一获取单元1102、第一输入单元1104与第二获取单元1106。此外,还可以包括但不限于上述图片识别装置中的其他模块单元,本示例中不再赘述。
在一些实施例中,上述的传输装置1306用于经由一个网络接收或者发送数据。上述的网络具体实例可包括有线网络及无线网络。在一个实例中,传输装置1306包括一个网络适配器(Network Interface Controller,NIC),其可通过网线与其他网络设备和路由器相连,从而可与互联网或局域网进行通讯。在一个实例中,传输装置1306为射频(Radio Frequency,RF)模块,可配置为通过无线方式与互联网进行通讯。
此外,上述电子装置还包括:显示器1308,配置为显示待识别的3D图片的第一类型;和连接总线1310,配置为连接上述电子装置中的各个模块部件。
本申请实施例还提供了一种用于实施上述识别模型训练方法的电子装置,如图14所示,该电子装置包括存储器1402和处理器1404,该存储器1402中存储有计算机程序,该处理器1404被设置为通过计算机程序执行本申请实施例提供的上述识别模型训练方法。
在一些实施例中,上述电子装置可以位于计算机网络的多个网络设备中的至少一个网络设备。
在一些实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:
S1,获取3D样本图片,从3D样本图片中分割出N个目标正方体;
S2,对N个目标正方体执行预定操作,得到目标训练样本,其中,预定操作包括对N个目标正方体进行旋转和排序;
S3,使用目标训练样本对原始识别模型进行训练,得到第二识别模型,其中,原始识别模型用于输出对目标训练样本的识别结果,在识别结果满足第一目标函数的概率大于第一阈值时,将原始识别模型确定为第二识别模型。
在一些实施例中,本领域普通技术人员可以理解,图14所示的结构仅为示意,电子装置也可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图14并不对上述电子装置的结构造成限定。例如,电子装置还可包括比图14中所示更多或者更少的组件(如网络接口等),或者具有与图14所示不同的配置。
其中,存储器1402可配置为存储软件程序以及模块,如本申请实施例中的识别模型训练方法和装置对应的程序指令/模块,处理器1404通过运行存储在存储器1402内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的识别模型训练方法。存储器1402可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器1402可进一步包括相对于处理器1404远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。其中,存储器1402具体可以但不限于用于存储3D样本图片等信息。作为一种示例,如图14所示,上述存储器1402中可以但不限于包括上述识别模型训练装置中的分割单元1202、处理单元1204与训练单元1206。此外,还可以包括但不限于上述识别模型训练装置中的其他模块单元,本示例中不再赘述。
在一些实施例中,上述的传输装置1406配置为经由一个网络接收或者发送数据。上述的网络具体实例可包括有线网络及无线网络。在一个实例中,传输装置1406包括一个网络适配器(Network Interface Controller,NIC),其可通过网线与其他网络设备和路由器相连,从而可与互联网或局域网进行通讯。在一个实例中,传输装置1406为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通讯。
此外,上述电子装置还包括:显示器1408,配置为显示原始识别模型的训练准确度等;和连接总线1410,配置为连接上述电子装置中的各个模块部件。
本申请的实施例还提供了一种存储介质,该存储介质中存储有计算机程序,其中,该计算机程序被设置为运行时执行本申请实施例提供的识别模型训练方法。
在一些实施例中,上述存储介质可以被设置为存储用于执行以下步骤的计算机程序:
S1,获取待识别的目标3D图片;
S2,将所述待识别的目标3D图片输入到第一识别模型中,其中,所述第一识别模型用于对所述待识别的目标3D图片进行识别得到所述待识别的目标3D图片的图片类型,所述第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;所述第二识别模型为使用目标训练样本对原始识别模型进行训练得到的模型,所述目标训练样本包括对从3D样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;
S3,获取所述第一识别模型输出的所述待识别的目标3D图片的第一类型。
或者,在一些实施例中,存储介质可以被设置为存储用于执行以下步骤的计算机程序:
S1,获取3D样本图片,从3D样本图片中分割出N个目标正方体;
S2,对N个目标正方体执行预定操作,得到目标训练样本,其中,预定操作包括对N个目标正方体进行旋转和排序;
S3,使用目标训练样本对原始识别模型进行训练,得到第二识别模型,其中,原始识别模型用于输出对目标训练样本的识别结果,在识别结果满足第一目标函数的概率大于第一阈值时,将原始识别模型确定为第二识别模型。
在一些实施例中,本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁盘或光盘等。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
上述实施例中的集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在上述计算机可读取的存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在存储介质中,包括若干指令用以使得一台或多台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的客户端,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
以上所述仅是本申请的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。
工业实用性
本申请实施例中,获取待识别的目标三维图片;将所述待识别的目标三维图片输入到第一识别模型中;其中,所述第一识别模型,用于对所述目标三维图片进行识别,得到所述目标三维图片的图片类型;所述第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;所述第二识别模型为,使用目标训练样本对原始识别模型进行训练得到的模型;其中,所述目标训练样本包括:对从三维样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;获取所述第一识别模型输出的所述目标三维图片的类型。如此,第二识别模型为使用目标训练样本对原始识别模型进行训练得到的模型,上述目标训练样本包括对从三维样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体;预先使用从三维图片中提取的正方体对第二识别模型进行训练,提高了第二识别模型的训练效率,同时提高了对三维图片的识别准确度;第一识别模型的卷积块与第二识别模型的卷积块相同,也即,使用通过第二识别模型训练好的卷积块作为第一识别模型的卷积块,如此,提高了第一识别模型的训练效率;通过第一识别模型中与第二识别模型相同的卷积块,对目标三维图片进行识别,提高了识别准确度。

Claims (17)

  1. 一种图片识别方法,所述方法由终端执行,所述方法包括:
    获取待识别的目标三维图片;
    将所述待识别的目标三维图片输入到第一识别模型中;
    其中,所述第一识别模型,用于对所述目标三维图片进行识别,得到所述目标三维图片的图片类型;
    所述第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;
    所述第二识别模型为,使用目标训练样本对原始识别模型进行训练得到的模型;
    其中,所述目标训练样本包括:对从三维样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;
    获取所述第一识别模型输出的所述目标三维图片的类型。
  2. 根据权利要求1所述的方法,其中,所述获取待识别的目标三维图片之前,所述方法还包括:
    获取所述三维样本图片;
    从所述三维样本图片中确定出原始正方体;
    将所述原始正方体拆分为所述N个目标正方体。
  3. 根据权利要求2所述的方法,其中,所述N的值为大于1的正整数的3次方,所述将所述原始正方体拆分为所述N个目标正方体,包括:
    保持相邻的两个所述目标正方体之间间隔M个体素,从所述原始正方体中拆分出所述N个目标正方体,所述M为大于0且小于J-1的正整数,所述J为所述目标正方体的边长。
  4. 根据权利要求1所述的方法,其中,所述获取待识别的目标三维图片之前,所述方法还包括:
    从所述N个目标正方体中确定出第一目标正方体;
    将所述第一目标正方体旋转第一角度;
    对旋转所述第一角度之后的第一目标正方体,与所述N个目标正方体中其他目标正方体进行排序,得到所述目标训练样本。
  5. 根据权利要求4所述的方法,其中,所述得到所述目标训练样本之后,所述方法还包括:
    将所述目标训练样本输入到所述原始识别模型中,以对所述原始识别模型进行训练,得到所述第二识别模型。
  6. 根据权利要求1所述的方法,其中,所述获取待识别的目标三维图片之前,所述方法还包括:
    获取所述原始识别模型对所述目标训练样本进行识别后输出的识别结果;
    其中,所述识别结果包括:所述目标训练样本中所述目标正方体的各种排序顺序与每一个所述目标正方体的旋转角度的概率;
    在所述识别结果满足第一目标函数的概率大于第一阈值时,将所述原始识别模型确定为所述第二识别模型。
  7. 根据权利要求1所述的方法,其中,所述获取待识别的目标三维图片之前,还包括:
    将所述第二识别模型的卷积块确定为所述第一识别模型的卷积块;
    使用第一训练样本对所述第一识别模型进行训练,直到所述第一识别模型的准确度大于第二阈值,其中,所述第一训练样本包括第一三维图片与所述第一三维图片的类型。
  8. 一种识别模型训练方法,所述方法由终端执行,所述方法包括:
    获取三维样本图片,并从所述三维样本图片中分割出N个目标正方体,N为大于1的自然数;
    对所述N个目标正方体进行旋转和排序,得到目标训练样本;
    使用所述目标训练样本对原始识别模型进行训练,得到第二识别模型;
    其中,所述第二识别模型的卷积块与第一识别模型的卷积块相同,用于所述第一识别模型对目标三维图片进行所述识别,得到所述目标三维图片的类型。
  9. 一种图片识别装置,包括:
    第一获取单元,配置为获取待识别的目标三维图片;
    第一输入单元,配置为将所述目标三维图片输入到第一识别模型中;
    其中,所述第一识别模型,用于对所述目标三维图片进行识别,得到所述目标三维图片的图片类型;
    所述第一识别模型的卷积块与第二识别模型的卷积块相同,用于对所述目标三维图片进行所述识别;
    所述第二识别模型为,使用目标训练样本对原始识别模型进行训练得到的模型;
    其中,所述目标训练样本包括:对从三维样本图片中获取的N个目标正方体进行旋转与排序后得到的正方体,N为大于1的自然数;
    第二获取单元,配置为获取所述第一识别模型输出的所述目标三维图片的类型。
  10. 根据权利要求9所述的装置,其中,所述装置还包括:
    第三获取单元,配置为在所述获取待识别的目标三维图片之前,获取所述三维样本图片;
    第一确定单元,配置为从所述三维样本图片中确定出原始正方体;
    拆分单元,配置为将所述原始正方体拆分为所述N个目标正方体。
  11. 根据权利要求10所述的装置,其中,所述装置还包括:
    第二确定单元,配置为在所述获取待识别的目标三维图片之前,从所述N个目标正方体中确定出第一目标正方体;
    旋转单元,配置为将所述第一目标正方体旋转第一角度;
    排序单元,配置为对旋转所述第一角度之后的第一目标正方体,与所述N个目标正方体中其他目标正方体进行排序,得到所述目标训练样本。
  12. 根据权利要求11所述的装置,其中,所述装置还包括:
    第二输入单元,配置为将排序得到的所述目标训练样本输入到所述原始识别模型中,以对所述原始识别模型进行训练,得到所述第二识别模型。
  13. 一种识别模型训练装置,包括:
    分割单元,配置为获取三维样本图片,并从所述三维样本图片中分割出N个目标正方体,N为大于1的自然数;
    处理单元,配置为对所述N个目标正方体进行旋转和排序,得到目标训练样本;
    训练单元,配置为使用所述目标训练样本对原始识别模型进行训练,得到第二识别模型;
    其中,所述第二识别模型的卷积块与第一识别模型的卷积块相同,用于所述第一识别模型对目标三维图片进行所述识别,得到所述目标三维图片的类型。
  14. 一种计算机存储介质,所述存储介质存储有计算机程序,所述计算机程序运行时执行所述权利要求1至7中任一项所述的图片识别方法。
  15. 一种计算机存储介质,所述存储介质存储有计算机程序,所述计算机程序运行时执行所述权利要求8所述的识别模型训练方法。
  16. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为运行所述计算机程序时,执行所述权利要求1至7中任一项所述的图片识别方法。
  17. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为运行所述计算机程序时,执行所述权利要求8所述的识别模型训练方法。
PCT/CN2020/097273 2019-06-21 2020-06-20 图片识别方法、识别模型训练方法、装置及存储介质 WO2020253852A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021547115A JP7233555B2 (ja) 2019-06-21 2020-06-20 画像認識方法、認識モデルのトレーニング方法及びその装置、並びにコンピュータプログラム
EP20825586.9A EP3989109A4 (en) 2019-06-21 2020-06-20 IMAGE IDENTIFICATION METHOD AND DEVICE, IDENTIFICATION PATTERN TRAINING METHOD AND DEVICE, AND STORAGE MEDIA
KR1020217029414A KR102645533B1 (ko) 2019-06-21 2020-06-20 이미지 식별 방법 및 기기, 식별 모델 훈련 방법 및 기기, 그리고 저장 매체
US17/402,500 US12112556B2 (en) 2019-06-21 2021-08-13 Image recognition method and apparatus, recognition model training method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910544392.0 2019-06-21
CN201910544392.0A CN110263724A (zh) 2019-06-21 2019-06-21 图片识别方法、识别模型训练方法、装置及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/402,500 Continuation US12112556B2 (en) 2019-06-21 2021-08-13 Image recognition method and apparatus, recognition model training method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2020253852A1 true WO2020253852A1 (zh) 2020-12-24

Family

ID=67920476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097273 WO2020253852A1 (zh) 2019-06-21 2020-06-20 图片识别方法、识别模型训练方法、装置及存储介质

Country Status (6)

Country Link
US (1) US12112556B2 (zh)
EP (1) EP3989109A4 (zh)
JP (1) JP7233555B2 (zh)
KR (1) KR102645533B1 (zh)
CN (2) CN110263724A (zh)
WO (1) WO2020253852A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263724A (zh) * 2019-06-21 2019-09-20 腾讯科技(深圳)有限公司 图片识别方法、识别模型训练方法、装置及存储介质
CN110710986B (zh) * 2019-10-25 2021-01-22 华院数据技术(上海)有限公司 一种基于ct图像的脑部动静脉畸形检测方法及检测系统
CN111166070A (zh) * 2019-12-17 2020-05-19 五邑大学 一种基于指静脉认证的医疗储物柜及其管理方法
CN111242952B (zh) * 2020-01-15 2023-06-30 腾讯科技(深圳)有限公司 图像分割模型训练方法、图像分割方法、装置及计算设备
CN111353580B (zh) * 2020-02-03 2023-06-20 中国人民解放军国防科技大学 目标检测网络的训练方法、电子设备及存储介质
CN111723868B (zh) * 2020-06-22 2023-07-21 海尔优家智能科技(北京)有限公司 用于去除同源图片的方法、装置及服务器
CN112241764B (zh) * 2020-10-23 2023-08-08 北京百度网讯科技有限公司 图像识别方法、装置、电子设备及存储介质
CN112686898B (zh) * 2021-03-15 2021-08-13 四川大学 一种基于自监督学习的放疗靶区自动分割方法
CN112949583A (zh) * 2021-03-30 2021-06-11 京科互联科技(山东)有限公司 复杂城市场景的目标检测方法、系统、设备及存储介质
CN113362313B (zh) * 2021-06-18 2024-03-15 四川启睿克科技有限公司 一种基于自监督学习的缺陷检测方法及系统
CN114092446B (zh) * 2021-11-23 2024-07-16 中国人民解放军总医院 基于自监督学习与M-Net的颅内出血参数获取方法及装置
CN114549904B (zh) * 2022-02-25 2023-07-07 北京百度网讯科技有限公司 视觉处理及模型训练方法、设备、存储介质及程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154043A (zh) * 2017-06-05 2017-09-12 杭州健培科技有限公司 一种基于3dcnn的肺结节假阳性样本抑制方法
CN107977963A (zh) * 2017-11-30 2018-05-01 北京青燕祥云科技有限公司 肺结节的判定方法、装置和实现装置
CN108389201A (zh) * 2018-03-16 2018-08-10 北京推想科技有限公司 基于3d卷积神经网络与深度学习的肺结节良恶性分类方法
US20180260621A1 (en) * 2017-03-10 2018-09-13 Baidu Online Network Technology (Beijing) Co., Ltd. Picture recognition method and apparatus, computer device and computer- readable medium
CN110263724A (zh) * 2019-06-21 2019-09-20 腾讯科技(深圳)有限公司 图片识别方法、识别模型训练方法、装置及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467495B2 (en) * 2015-05-11 2019-11-05 Siemens Healthcare Gmbh Method and system for landmark detection in medical images using deep neural networks
CN107025642B (zh) * 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 基于点云数据的车辆轮廓检测方法和装置
AU2018313841B2 (en) * 2017-08-09 2023-10-26 Allen Institute Systems, devices, and methods for image processing to generate an image having predictive tagging
CN109147940B (zh) * 2018-07-05 2021-05-25 科亚医疗科技股份有限公司 从患者的医学图像自动预测生理状况的装置和系统
CN109063753B (zh) * 2018-07-18 2021-09-14 北方民族大学 一种基于卷积神经网络的三维点云模型分类方法
CN109886933B (zh) * 2019-01-25 2021-11-02 腾讯科技(深圳)有限公司 一种医学图像识别方法、装置和存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260621A1 (en) * 2017-03-10 2018-09-13 Baidu Online Network Technology (Beijing) Co., Ltd. Picture recognition method and apparatus, computer device and computer- readable medium
CN107154043A (zh) * 2017-06-05 2017-09-12 杭州健培科技有限公司 一种基于3dcnn的肺结节假阳性样本抑制方法
CN107977963A (zh) * 2017-11-30 2018-05-01 北京青燕祥云科技有限公司 肺结节的判定方法、装置和实现装置
CN108389201A (zh) * 2018-03-16 2018-08-10 北京推想科技有限公司 基于3d卷积神经网络与深度学习的肺结节良恶性分类方法
CN110263724A (zh) * 2019-06-21 2019-09-20 腾讯科技(深圳)有限公司 图片识别方法、识别模型训练方法、装置及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3989109A4 *

Also Published As

Publication number Publication date
CN110263724A (zh) 2019-09-20
EP3989109A4 (en) 2022-07-20
JP7233555B2 (ja) 2023-03-06
KR102645533B1 (ko) 2024-03-07
JP2022520390A (ja) 2022-03-30
US12112556B2 (en) 2024-10-08
CN111046855A (zh) 2020-04-21
US20210374475A1 (en) 2021-12-02
EP3989109A1 (en) 2022-04-27
KR20210119539A (ko) 2021-10-05

Legal Events

- 121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20825586; Country of ref document: EP; Kind code of ref document: A1.
- ENP: Entry into the national phase. Ref document number: 2021547115; Country of ref document: JP; Kind code of ref document: A.
- ENP: Entry into the national phase. Ref document number: 20217029414; Country of ref document: KR; Kind code of ref document: A.
- NENP: Non-entry into the national phase. Ref country code: DE.
- WWE: WIPO information: entry into national phase. Ref document number: 2020825586; Country of ref document: EP.