WO2019101021A1 - Image recognition method, apparatus, and electronic device - Google Patents

Image recognition method, apparatus, and electronic device

Info

Publication number
WO2019101021A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
candidate region
layer
target candidate
Prior art date
Application number
PCT/CN2018/116044
Other languages
French (fr)
Chinese (zh)
Inventor
李峰
左小祥
陈家君
李昊沅
曾维亿
Original Assignee
腾讯科技(深圳)有限公司
Priority date
2017-11-23
Filing date
Publication date
Priority to CN201711180320.X (CN109829456A)
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2019101021A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20 Image acquisition
    • G06K9/32 Aligning or centering of the image pick-up or image-field
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means

Abstract

An image recognition method, apparatus, and electronic device. The method includes: detecting a target candidate region in a target image by using an image detection model (101); when the target candidate region is detected in the target image, extracting the target candidate region (102); and performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image (103). The method first uses the image detection model to make a preliminary detection of a target candidate region that may contain a target, and then uses the image recognition model to perform recognition on the detected region. Combining the two models allows the target to be recognized accurately even when it occupies only a small proportion of the image, improving the accuracy of image recognition.

Description

Image recognition method, apparatus, and electronic device
This application claims priority to Chinese Patent Application No. 201711180320.X, entitled "Image recognition method, apparatus, and terminal" and filed on November 23, 2017, which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of machine learning and, in particular, to an image recognition method, apparatus, and electronic device.
Background
Image recognition is the technique of identifying the objects contained in an image, and it is a common form of image processing.
In the related art, a terminal first trains a convolutional neural network (CNN) on a sample set to obtain an image recognition model, then feeds the image to be recognized into the trained model, which recognizes the image and outputs a recognition result.
In the related art, when the object to be recognized occupies only a small proportion of the image, it may be misrecognized or not recognized at all.
Summary
Embodiments of this application provide an image recognition method, apparatus, and electronic device that address the problem in the related art that an object occupying only a small proportion of an image may be misrecognized or not recognized. The technical solutions are as follows:
In one aspect, an embodiment of this application provides an image recognition method applied to an electronic device. The method includes:
detecting, by using an image detection model, a target candidate region in a target image, the target candidate region being an image block that contains a target object;
extracting the target candidate region when the target candidate region is detected in the target image; and
performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
In another aspect, an embodiment of this application provides an image recognition apparatus applied to an electronic device. The apparatus includes:
an image detection module, configured to detect, by using an image detection model, a target candidate region in a target image, the target candidate region being an image block that contains a target object;
a region extraction module, configured to extract the target candidate region when the target candidate region is detected in the target image; and
an image recognition module, configured to perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
Optionally, the image detection module is configured to:
obtain, by using the image detection model, the probability that each pixel in the target image belongs to the target object; and
determine the target candidate region according to the probabilities of the pixels, the target candidate region including pixels whose probability is greater than a preset threshold.
Optionally, the image detection module is configured to:
obtain, according to the probabilities of the pixels, an image block that meets a first preset condition and determine it as a target image block, the first preset condition being that the image block contains more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than the preset threshold; and
determine, as the target candidate region, a rectangular region that contains the target image block and meets a second preset condition, the second preset condition being that the target image block occupies more than a preset proportion of the rectangular region.
Optionally, the image recognition module is configured to:
perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
Optionally, the image recognition module is configured to:
preprocess the target candidate region to obtain a processed target candidate region whose resolution reaches a preset resolution;
perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
Optionally, the image detection model includes an input layer, a convolutional layer, a pooling layer, an up-convolution layer, a concatenation layer, a normalization layer, and an output layer. The input layer receives the target image; the convolutional layer converts the target image into a feature map; the pooling layer pools the feature map output by the convolutional layer to reduce the number of features in it; the up-convolution layer performs an up-convolution on the feature map output by the convolutional layer; the concatenation layer concatenates the feature maps produced by the pooling layer and the up-convolution layer to obtain a concatenated feature map; the normalization layer normalizes the concatenated feature map to obtain position information of the target candidate region; and the output layer outputs the position information of the target candidate region.
Optionally, the image recognition model includes an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer. The input layer receives the target candidate region; the convolutional layer converts the target candidate region into a feature map; the pooling layer pools the feature map to reduce the number of features in it; the normalization layer normalizes the feature map processed by the convolutional and pooling layers to obtain the recognition result; and the output layer outputs the recognition result.
Optionally, the apparatus further includes:
a proportion acquisition module, configured to obtain the proportion of the target image occupied by the target candidate region;
the image recognition module being further configured to, if the proportion is greater than a preset limit, directly perform the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
Optionally, the apparatus further includes:
a first acquisition module, configured to acquire a first training sample set containing a plurality of first training samples, each first training sample being annotated with a region that contains the target and/or a region that does not contain the target; and
a first training module, configured to train a convolutional neural network (CNN) with the first training sample set to obtain the image detection model.
Optionally, the apparatus further includes:
a second acquisition module, configured to acquire a second training sample set containing a plurality of second training samples, each second training sample corresponding to a recognition result; and
a second training module, configured to train a CNN with the second training sample set to obtain the image recognition model.
In yet another aspect, an embodiment of this application provides an electronic device including a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the image recognition method according to the first aspect.
In a further aspect, an embodiment of this application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the image recognition method according to the first aspect.
In still another aspect, an embodiment of this application provides a computer program product that, when executed, performs the image recognition method according to the first aspect.
The technical solutions provided in the embodiments of this application can bring the following beneficial effects:
An image detection model first performs a preliminary detection of a target candidate region in the image that may contain the target object, and that region is extracted; an image recognition model then performs recognition on the extracted region to obtain the recognition result. When the target object occupies only a small proportion of the image, the electronic device has extracted the candidate region containing it, so the target object occupies a much larger proportion of the region the recognition model sees. This avoids the failed or incorrect recognition that occurs in the related art for small targets and improves the success rate of image recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an image recognition method according to an exemplary embodiment of this application;
FIG. 2 is a schematic diagram related to the embodiment shown in FIG. 1;
FIG. 3 is a schematic diagram of a first training sample according to an exemplary embodiment of this application;
FIG. 4 is a schematic diagram of a detection process according to an exemplary embodiment of this application;
FIG. 5 is a schematic diagram of a second training sample set according to an exemplary embodiment of this application;
FIG. 6 is a schematic diagram of a recognition process according to an exemplary embodiment of this application;
FIG. 7 is a flowchart of an image recognition method according to another exemplary embodiment of this application;
FIG. 8 is a schematic diagram of a pattern recognition interface according to an exemplary embodiment of this application;
FIG. 9 is a schematic diagram of a pattern recognition interface according to an exemplary embodiment of this application;
FIG. 10 is a structural block diagram of an image recognition apparatus according to an exemplary embodiment of this application;
FIG. 11 is a structural block diagram of an image recognition apparatus according to another embodiment of this application;
FIG. 12 is a structural block diagram of an electronic device according to an exemplary embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
In the related art, when a model is used for image recognition, it typically divides an image into multiple regions according to how interesting each region is, learns features from the regions of higher interest, and then determines the recognition result from the learned features. When the object to be recognized occupies a small proportion of the image, the region containing it is unlikely to be selected as a region of interest, so the model bases its recognition on other parts of the image and the object may be misrecognized or not recognized.
On this basis, embodiments of this application provide an image recognition method, apparatus, and electronic device. In these embodiments, an image detection model first performs a preliminary detection of a target candidate region in the image that may contain the target object, and the region is extracted; an image recognition model then performs recognition on the extracted region to obtain the recognition result. When the target object occupies a small proportion of the image, the electronic device has already extracted the candidate region containing it, so the target object occupies a larger proportion of that region. Recognizing the region with the image recognition model therefore avoids the failed or incorrect recognition that occurs in the related art when the target object is small relative to the image, and improves the success rate of image recognition.
Each step of the method provided in the embodiments of this application may be performed by an electronic device with image processing capability. Optionally, the electronic device may be a terminal such as a personal computer, mobile phone, or tablet computer, or it may be a server.
Refer to FIG. 1, which is a flowchart of an image recognition method according to an embodiment of this application. The method may include the following steps:
Step 101: Detect a target candidate region in a target image by using an image detection model.
The target candidate region is an image block that contains the target object. The target object is the object to be recognized in the target image, such as a face, an object, or a gesture; this is not limited in the embodiments of this application. The target image is the image to be examined; it may be a still picture or a frame of a video.
The image detection model is used to detect whether the target image contains the target object. Optionally, it is further used to detect the approximate area of the target object in the target image, that is, the target candidate region. Optionally, the image detection model is obtained by training a CNN. The training process and network architecture of the image detection model are described in later embodiments.
Optionally, step 101 may include the following sub-steps:
Step 101a: Obtain, by using the image detection model, the probability that each pixel in the target image belongs to the target object.
The image detection model extracts features for each pixel in the target image and matches each pixel's feature-extraction result against preset image features. The degree of match measures the probability that the pixel belongs to the target object: the better the match, the higher the probability; the worse the match, the lower the probability. The preset image features may be the image features of the pixels that make up the target, and they can be obtained once the image detection model has been trained.
After the probability that each pixel of the target image belongs to the target object is obtained, the probabilities may be represented as a probability matrix whose entries correspond one-to-one to the pixels of the target image. For example, the value in row 4, column 3 of the probability matrix indicates the probability for the pixel in row 4, column 3 of the target image.
Step 101b: Determine the target candidate region according to the probabilities of the pixels.
The target candidate region includes pixels whose probability is greater than a preset threshold. The preset threshold may be set according to how large a proportion of the image the image recognition model requires the target to occupy: the larger the required proportion, the larger the preset threshold. For example, the preset threshold is 0.7. Optionally, the terminal binarizes the probability matrix, setting probabilities greater than or equal to the preset threshold to 1 and probabilities less than the preset threshold to 0, which separates the two groups of probabilities.
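A minimal sketch of the binarization in step 101b, assuming the probability matrix is stored as a list of rows in which prob[r][c] holds the probability for the pixel in row r, column c (zero-based, so "row 4, column 3" in the text is prob[3][2]). The function name binarize and the sample values are hypothetical; only the example threshold of 0.7 comes from the text.

```python
def binarize(prob, threshold=0.7):
    """Return a 0/1 mask: 1 where the pixel probability >= threshold, else 0.

    `prob` is a hypothetical list-of-rows probability matrix; the default
    threshold 0.7 is the example value given in the description.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


prob = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.8],
    [0.2, 0.75, 0.4],
]
print(binarize(prob))  # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
```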
Optionally, the target candidate region may be determined as follows. According to the probabilities of the pixels, obtain an image block that meets a first preset condition and determine it as a target image block, where the first preset condition is that the block contains more than a preset number of contiguous target pixels, a target pixel being one whose probability is greater than the preset threshold. Then determine, as the target candidate region, a rectangular region that contains the target image block and meets a second preset condition, namely that the target image block occupies more than a preset proportion of the rectangular region. The preset number, preset threshold, and preset proportion may all be set according to actual requirements; this is not limited in the embodiments of this application.
Further, the second preset condition may be that the target image block's share of the rectangle is maximized, that is, the rectangular region is the smallest rectangle that contains the target image block. In this way the target occupies as large a share of the target candidate region as possible, which improves both the efficiency and the accuracy of subsequent recognition by the image recognition model.
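The two preset conditions can be sketched as connected-component grouping followed by a bounding-box check. This is an illustration under stated assumptions, not the patent's actual implementation: "contiguous" is taken to mean 4-connected, the input is the 0/1 mask obtained by binarizing the probability matrix, boxes are inclusive (top, left, bottom, right) tuples, and the names find_candidate_regions, min_pixels, and min_ratio are hypothetical stand-ins for the preset number and preset proportion. Following the refinement in the text, the rectangle is the smallest one enclosing each block.

```python
from collections import deque


def find_candidate_regions(mask, min_pixels, min_ratio):
    """Return inclusive (top, left, bottom, right) boxes of 4-connected groups
    of target pixels that satisfy both (hypothetical) preset conditions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one contiguous target image block.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # First preset condition: more than min_pixels contiguous target pixels.
            if len(pixels) <= min_pixels:
                continue
            # Smallest enclosing rectangle maximizes the block's share of its area.
            top = min(y for y, _ in pixels)
            bottom = max(y for y, _ in pixels)
            left = min(x for _, x in pixels)
            right = max(x for _, x in pixels)
            area = (bottom - top + 1) * (right - left + 1)
            # Second preset condition: block covers more than min_ratio of the box.
            if len(pixels) / area > min_ratio:
                boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
print(find_candidate_regions(mask, min_pixels=2, min_ratio=0.5))  # [(1, 1, 2, 2)]
```

The isolated pixel at row 2, column 4 is discarded by the first condition; the 2x2 block fills its rectangle completely and passes the second.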
Referring also to FIG. 2, which is a schematic diagram related to the embodiment shown in FIG. 1: after the image detection model 11 performs detection on the input target image 10, it outputs the target image 10 with the target candidate region 12 marked.
Step 102: When the target candidate region is detected in the target image, extract the target candidate region.
Extracting the target candidate region means cropping it out of the target image. Referring also to FIG. 2, the terminal extracts the target candidate region 12 from the target image 10.
If no target candidate region is detected in the target image, the target image does not contain the target object, and the process can end.
In addition, when the target object already occupies a large proportion of the target image, the terminal can recognize the target image directly without performing step 102, that is, without extracting the target candidate region. Therefore, before step 102, the terminal may obtain the proportion of the target image occupied by the target candidate region: if the proportion is greater than a preset limit, step 103 is performed directly; if it is less than or equal to the preset limit, step 102 is performed. The preset limit may be determined according to the recognition accuracy of the image recognition model; for example, it is 30%. This saves the time needed to extract the target candidate region and improves the efficiency of image recognition.
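A sketch of the proportion check before step 102, assuming the candidate region is an inclusive (top, left, bottom, right) box and the image is a list of pixel rows. Following the detailed description, when the region covers more than the preset limit (30% in the example) the whole target image is passed to recognition and extraction is skipped; otherwise the region is cropped out. The helper names crop and region_for_recognition are hypothetical.

```python
def crop(image, box):
    """Cut the inclusive rectangle (top, left, bottom, right) out of a 2-D image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def region_for_recognition(image, box, max_share=0.30):
    """Choose what is fed to the image recognition model in step 103.

    If the candidate box already covers more than `max_share` of the image,
    the whole image is returned (step 102 is skipped); otherwise the box is
    cropped out (step 102). The 0.30 default is the example preset limit.
    """
    top, left, bottom, right = box
    region_area = (bottom - top + 1) * (right - left + 1)
    image_area = len(image) * len(image[0])
    if region_area / image_area > max_share:
        return image
    return crop(image, box)


image = [[r * 10 + c for c in range(10)] for r in range(10)]
print(region_for_recognition(image, (0, 0, 1, 1)))  # [[0, 1], [10, 11]]
```

A 2x2 box covers 4% of this 10x10 image, so it is cropped; a 6x6 box (36%) would exceed the limit and the whole image would be returned unchanged.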
步骤103,采用图像识别模型基于目标候选区域进行图像识别,得到目标图像的识别结果。Step 103: Perform image recognition based on the target candidate region by using an image recognition model to obtain a recognition result of the target image.
目标图像的识别结果是指目标图像中所包括的目标对象所属的分类。例如,图标图像为包括一手势的图像,该目标图像的识别结果是指该手势所属的分类。图像识别模型用于识别目标并对目标进行分类。可选地,图像识别模型也是对CNN进行训练得到的模型。对于图像识别模型的训练过程以及网络架构,将在下文实施例进行解释说明。The recognition result of the target image refers to the classification to which the target object included in the target image belongs. For example, the icon image is an image including a gesture, and the recognition result of the target image refers to the classification to which the gesture belongs. Image recognition models are used to identify targets and classify them. Optionally, the image recognition model is also a model obtained by training the CNN. The training process and network architecture for the image recognition model will be explained in the following examples.
另外,终端获取目标候选区域之后,可以直接对目标候选区域进行识别,也可以在对目标候选区域进行预处理之后,再对处理后的目标候选区域进行识别。下面将分别对上述两种方式进行讲解。In addition, after the terminal acquires the target candidate region, the target candidate region may be directly identified, or the processed target candidate region may be identified after the target candidate region is preprocessed. The above two methods will be explained separately below.
In a first possible implementation, the terminal recognizes the target candidate region directly, and step 103 may include the following sub-steps:
Step 103a: Use the image recognition model to extract features from the target candidate region, obtaining the image features of the target candidate region;
Step 103b: According to the image features of the target candidate region, determine a first probability distribution of the target object in the target candidate region over a plurality of recognition results;
Step 103c: Determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
The first probability distribution of the target object over the plurality of recognition results is the probability that the target object belongs to each of those recognition results. For example, if the probability that the target object is the gesture "Good" is 0.95 and the probability that it is the gesture "Yeah" is 0.05, the electronic device determines the gesture "Good" as the recognition result of the target image.
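Steps 103b and 103c amount to normalizing per-class scores into a probability distribution and taking its maximum. A minimal sketch, assuming the recognition model's last layer outputs one raw score per class and that softmax is the normalization used (the document does not name the function):

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_result(scores: Sequence[float], labels: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Steps 103b-103c: build the distribution, then return the most probable label."""
    if len(scores) != len(labels):
        raise ValueError("one score per label is required")
    probs = softmax(scores)
    distribution = dict(zip(labels, probs))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], distribution
```

With scores `[math.log(19), 0.0]` for `["Good", "Yeah"]`, the distribution is 0.95 / 0.05, matching the example above, and "Good" is returned.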
In a second possible implementation, the terminal first preprocesses the target candidate region and then recognizes the processed region. In this case, step 103 may include the following sub-steps:
Step 103d: Preprocess the target candidate region to obtain a processed target candidate region whose resolution reaches a preset resolution;
The preset resolution is the resolution that the image recognition model requires of the images it recognizes; for example, the preset resolution is 440*360. If the input resolution does not meet this requirement, the image recognition model has to handle resolution conversion during recognition, which requires more computation and takes longer. In this example, the resolution of the image to be recognized is converted in advance to the resolution required by the image recognition model, which reduces the workload of the subsequent recognition, saves time, and improves recognition efficiency. The terminal first obtains the resolution of the target candidate region and then upscales it so that the processed target candidate region reaches the preset resolution. The upscaling algorithm may be nearest-neighbor interpolation, bilinear interpolation, cubic convolution interpolation, or the like, which is not limited in the embodiments of this application.
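Of the interpolation methods listed, nearest-neighbor is the simplest to show. The sketch below assumes a single-channel image held as a list of rows and reads 440*360 as width 440 by height 360, which the document does not specify. Bilinear or cubic interpolation would replace the single index lookup with a weighted average of neighboring pixels.

```python
from typing import List, Sequence


def resize_nearest(pixels: Sequence[Sequence[int]], out_w: int, out_h: int) -> List[List[int]]:
    """Nearest-neighbor resize of a single-channel image stored as rows of pixels."""
    in_h = len(pixels)
    in_w = len(pixels[0])
    resized = []
    for y in range(out_h):
        # Map each output row/column back to a source index by floor scaling.
        src_y = min(in_h - 1, (y * in_h) // out_h)
        row = pixels[src_y]
        resized.append([row[min(in_w - 1, (x * in_w) // out_w)] for x in range(out_w)])
    return resized
```

A cropped region would then be brought to the preset size with `resize_nearest(region, 440, 360)`.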
Step 103e: Use the image recognition model to extract features from the processed target candidate region, obtaining the image features of the processed target candidate region;
Step 103f: According to the image features of the processed target candidate region, determine a second probability distribution of the target object in the target candidate region over the plurality of recognition results;
Step 103g: Determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
Steps 103e and 103f are performed in the same way as steps 103a and 103b, and are not described again here.
Referring to FIG. 2, the image recognition model 13 recognizes the target candidate region 12 and outputs the recognition result 14 of the target image 10. The recognition result 14 is the gesture "Good" shown in the figure, that is, a thumbs-up gesture.
In summary, in the method provided by the embodiments of this application, an image detection model first preliminarily detects a target candidate region in the image that may contain the target object and extracts it, and an image recognition model then performs recognition on the extracted region to obtain the recognition result. When the target object occupies only a small proportion of the image, the electronic device extracts the target candidate region containing it, in which the target object occupies a much larger proportion. Recognizing this region with the image recognition model avoids the failures, or even misrecognitions, that occur in the related art when the target object occupies a small proportion of the image, and improves the success rate of image recognition.
In addition, in the embodiments of this application, the sub-networks of the cascaded network (that is, the image detection model and the image recognition model) are independent of and decoupled from each other, so each sub-network can be flexibly reused or replaced, making it easy to provide different users with model combinations that have different optimization preferences. For example, for users who require higher accuracy, the image recognition model can be optimized to obtain more accurate image recognition results.
The training process and network architecture of the image detection model are described below.
The image detection model is trained as follows: a first training sample set is obtained, and a CNN is trained with the first training sample set to obtain the image detection model.
The first training sample set contains multiple first training samples, and their number can be determined according to actual needs. Each first training sample is annotated with the region that contains the target object and/or the region that does not. The annotation can be done manually. Referring to FIG. 3, which shows a schematic diagram of a first training sample 20 according to an exemplary embodiment of this application, the first training sample 20 includes a contour 21 drawn in black lines; the area inside contour 21 contains the target object, and the area outside contour 21 does not.
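A manual contour annotation can be turned into a per-pixel training label by testing which pixels fall inside the contour. A minimal sketch, assuming the contour is stored as a closed polygon of (x, y) vertices (a hypothetical format; the document does not specify one) and using the even-odd ray-casting rule on pixel centers:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count how many polygon edges a ray toward +x crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y):  # edge straddles the ray's height
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def contour_to_mask(contour: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """1 = pixel center inside the contour (target object), 0 = outside."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, contour) else 0 for x in range(width)]
        for y in range(height)
    ]
```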
It should be noted that the proportion of the first training sample occupied by the target object may be the same or different across first training samples. For example, the target object occupies 0.3 of first training sample A and 0.6 of first training sample B. Likewise, the types of target objects in the first training samples may be the same or different. For example, the target object in first training sample A is the gesture "Good", and the target object in first training sample B is the gesture "Yeah".
In addition, the CNN may be an AlexNet network, a VGG-16 network, or the like. The algorithm used to train the CNN into the image detection model may be the region-based convolutional neural network (R-CNN) algorithm, the Faster R-CNN algorithm, or the like. The embodiments of this application do not specifically limit the CNN or the algorithm used to train it.
In addition, after the image detection model has been trained, it can be tested with a first test sample set. The first test sample set includes multiple first test samples, each with a corresponding test result. After feeding a first test sample into the image detection model, the terminal checks whether the detection result output by the model matches the test result for that sample, thereby determining whether the image detection model has been trained to the set accuracy.
The network architecture of the image detection model is described below.
The image detection model includes an input layer, convolutional layers, pooling layers, up-convolution layers, concatenation layers, a normalization layer, and an output layer. The embodiments of this application do not limit the number of each type of layer. Generally, the more layers the image detection model has, the better its performance but the longer its computation time; in practice, a model with an appropriate number of layers can be designed according to the required detection accuracy and efficiency.
The input layer is used to input the target image.
The convolutional layers convert the target image into feature maps. In the embodiments of this application, a convolutional layer performs a convolution operation on the target image or on the output of an activation layer, a pooling layer, or a concatenation layer. Convolution extracts image features and maps the input data into a feature space. Each convolutional layer performs one or more convolution operations. The input of each convolutional layer depends on its position in the image detection model: when it is the first layer of the model, its input is the target image; when it follows an activation layer, its input is that activation layer's output; when it follows a pooling layer, its input is that pooling layer's output; and when it follows a concatenation layer, its input is that concatenation layer's output.
The pooling layers pool the feature maps output by the convolutional layers to reduce the number of features in those maps. Pooling may be max pooling or average pooling. Max pooling reduces the size of the feature map and enlarges the receptive field of the next layer. The receptive field is the size of the region in the original image onto which a pixel of the feature map output by a given layer of the image detection model maps. The input of a pooling layer is usually the output of a convolutional layer, and its output is usually the input of a convolutional layer.
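The effect of pooling on the receptive field can be made concrete with the standard recurrence r_l = r_{l-1} + (k_l - 1)·j_{l-1} and j_l = j_{l-1}·s_l, where k is the kernel size, s the stride, and j the spacing between adjacent outputs measured in input pixels. The layer list below (3×3 convolutions with stride 1, 2×2 max pooling with stride 2, no dilation) is illustrative; the document does not give kernel sizes.

```python
from typing import List, Sequence, Tuple


def receptive_fields(layers: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """layers: (name, kernel_size, stride); returns each layer's receptive field in input pixels."""
    rf, jump = 1, 1  # a single input pixel; adjacent outputs are 1 input pixel apart
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result.append((name, rf))
    return result
```

For conv, conv, pool, conv this gives 3, 5, 6, 10: after pooling, the next 3×3 convolution grows the receptive field by 4 input pixels instead of 2.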
The up-convolution layers perform up-convolution on the feature maps output by the convolutional layers. Up-convolution enlarges the feature maps, mapping the learned features onto a larger size. The input of an up-convolution layer is usually the output of an activation layer, and its output is usually the input of a concatenation layer.
The concatenation layers concatenate the feature maps produced by the pooling layers and the up-convolution layers to obtain concatenated feature maps. Concatenation joins different feature maps so that information from different feature dimensions can be fused, which helps the model learn more robust features. The input of a concatenation layer is usually the output of a pooling layer together with the output of an up-convolution layer, and its output is usually the input of a convolutional layer.
The normalization layer normalizes the concatenated feature map to obtain the location information of the target candidate region. Normalization yields, for each pixel of the concatenated feature map, the probability that the pixel belongs to the target object, and the location of the target candidate region is determined from these probabilities.
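One way to go from per-pixel probabilities to a region location is sketched below. It assumes a single-channel score map, a sigmoid as the normalization, a fixed 0.5 threshold, and the bounding box of all above-threshold pixels as the candidate region; none of these choices is specified in the document.

```python
import math
from typing import Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_box(
    logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> Optional[Tuple[int, int, int, int]]:
    """Per-pixel scores -> (x_min, y_min, x_max, y_max) box, max exclusive, or None."""
    xs, ys = [], []
    for y, row in enumerate(logits):
        for x, z in enumerate(row):
            if sigmoid(z) > threshold:  # pixel judged to belong to the target object
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no target candidate region detected
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```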
Optionally, the image detection model may also include activation layers. An activation layer may be placed after a convolutional layer and before a pooling layer and/or an up-convolution layer; it performs an activation operation on the output of the convolutional layer. Because the feature space obtained by convolution is limited, the activation operation processes the feature space so that it can represent more features. The input of an activation layer is usually the output of a convolutional layer. Its output depends on its position in the image detection model: when the activation layer is the last layer of the model, its output is the target image with the target candidate region marked.
The detection process of the image detection model is explained below with reference to its network architecture. Referring to FIG. 4, which shows a schematic diagram of the detection process according to an exemplary embodiment of this application (only the convolutional, activation, pooling, up-convolution, and concatenation layers are shown): ① denotes a convolution operation, ② an activation operation, ③ a max-pooling operation, ④ an up-convolution operation, and ⑤ a concatenation operation. The leftmost rectangle represents the target image, the rightmost rectangle represents the target image with the target candidate region marked, and the other rectangles represent multi-channel feature maps. The height of a rectangle indicates the size of the feature map (the larger the feature map, the taller the rectangle), and its thickness indicates the number of channels (the more channels, the thicker the rectangle). A black rectangle represents a copy of an activation layer's output, and the rectangle concatenated to a black rectangle represents the output of an up-convolution layer.
In the embodiments of this application, for ease of explanation, each layer of the image detection model performs only one operation. In FIG. 4, the image detection model performs a total of 15 convolution operations, 15 activation operations, 3 max-pooling operations, 3 up-convolution operations, and 3 concatenation operations; that is, the image detection model includes 15 convolutional layers, 15 activation layers, 3 pooling layers, 3 up-convolution layers, and 3 concatenation layers. The layers are connected in sequence from left to right in the order in which the operations in FIG. 4 are executed, and the input of each concatenation layer is connected to both an up-convolution layer and an activation layer. The input of the first convolutional layer is the target image, and each subsequent layer takes the output of the previous layer as its input, except that a concatenation layer takes the outputs of an activation layer and an up-convolution layer; the output of the last activation layer is the target image with the target candidate region marked.
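The encoder-decoder structure described above (pooling shrinks the feature maps, up-convolution enlarges them, and concatenation joins a copied activation output with an up-convolution output) can be checked by tracing tensor shapes. The sketch below is a hypothetical U-Net-style layout, not the exact network of FIG. 4: it assumes 3×3 "same" convolutions, 2×2 max pooling with stride 2, 2×2 up-convolutions with stride 2 that halve the channel count, one convolution per level, and made-up channel widths (`base_channels=16`).

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def trace_detection_shapes(in_shape: Shape, base_channels: int = 16, depth: int = 3) -> List[Tuple[str, Shape]]:
    c, h, w = in_shape
    if h % (2 ** depth) or w % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")
    log: List[Tuple[str, Shape]] = [("input", (c, h, w))]
    skips: List[Shape] = []
    ch = base_channels
    for level in range(depth):
        c = ch  # 3x3 'same' convolution + activation: size kept, channels set
        log.append((f"conv{level}", (c, h, w)))
        skips.append((c, h, w))  # copy kept for concatenation (black boxes in FIG. 4)
        h, w = h // 2, w // 2  # 2x2 max pooling, stride 2
        log.append((f"pool{level}", (c, h, w)))
        ch *= 2
    c = ch
    log.append(("bottleneck", (c, h, w)))
    for level in reversed(range(depth)):
        h, w, c = h * 2, w * 2, c // 2  # 2x2 up-convolution, stride 2
        log.append((f"upconv{level}", (c, h, w)))
        skip_c = skips[level][0]
        c += skip_c  # concatenation along the channel axis
        log.append((f"concat{level}", (c, h, w)))
        c = skip_c  # convolution after concatenation reduces channels again
        log.append((f"conv_up{level}", (c, h, w)))
    log.append(("output", (1, h, w)))  # one per-pixel score map
    return log
```

Because every pooling step is undone by one up-convolution, the output score map has the same height and width as the input image, which is what allows each pixel to be assigned a probability of belonging to the target object.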
The training process of the image recognition model is described below. The image recognition model is trained as follows: a second training sample set is obtained, and a convolutional neural network (CNN) is trained with the second training sample set to obtain the image recognition model.
The second training sample set contains multiple second training samples, and their number can be determined according to actual needs. Generally, the more second training samples there are, the higher the recognition accuracy of the image recognition model; the fewer there are, the lower its accuracy.
Each second training sample corresponds to a recognition result, which can be determined according to the type of target object the sample contains. The terminal can also classify the training samples by their recognition results. Referring to FIG. 5, which shows a schematic diagram of a second training sample set according to an embodiment of this application, the second training sample set covers two recognition results: the gesture "Good" 31 and the gesture "Yeah" 32. The gesture "Good" 31 corresponds to multiple second training samples 311 containing a thumbs-up gesture, and the gesture "Yeah" 32 corresponds to multiple second training samples 321 containing a gesture with the index and middle fingers raised.
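Organizing the second training sample set by recognition result, as in FIG. 5, amounts to grouping samples by their label. A minimal sketch, assuming each sample is an (image path, recognition result) pair; the file names in the usage example are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_result(samples: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """samples: (image_path, recognition_result) pairs -> {result: [image_path, ...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for image_path, result in samples:
        groups[result].append(image_path)
    return dict(groups)
```

For example, `group_by_result([("a.jpg", "Good"), ("b.jpg", "Yeah"), ("c.jpg", "Good")])` places `a.jpg` and `c.jpg` under "Good" and `b.jpg` under "Yeah".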
In addition, the CNN may be an AlexNet network, a VGG-16 network, or the like. The algorithm used to train the CNN into the image recognition model may be the Faster R-CNN algorithm, the R-CNN algorithm, or the like. The embodiments of this application do not specifically limit the CNN or the algorithm used to train it.
In addition, after the image recognition model has been trained, it can be tested with a second test sample set. The second test sample set includes multiple second test samples, each with a corresponding recognition result. After feeding a second test sample into the image recognition model, the terminal checks whether the recognition result output by the model matches the recognition result for that sample, thereby determining whether the image recognition model has been trained to the set accuracy.
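The test described above can be summarized as an accuracy figure that is compared with the set accuracy. A sketch, assuming exact label matching and a hypothetical `target_accuracy` of 0.95; the document states neither the metric nor the value:

```python
from typing import Callable, Hashable, Iterable, Tuple, TypeVar

S = TypeVar("S")


def accuracy(model: Callable[[S], Hashable], test_set: Iterable[Tuple[S, Hashable]]) -> float:
    """Fraction of test samples whose model output equals the expected recognition result."""
    total = correct = 0
    for sample, expected in test_set:
        total += 1
        correct += int(model(sample) == expected)
    if total == 0:
        raise ValueError("empty test set")
    return correct / total


def reached_target(model: Callable[[S], Hashable], test_set: Iterable[Tuple[S, Hashable]],
                   target_accuracy: float = 0.95) -> bool:
    """True once the trained model meets the (hypothetical) set accuracy."""
    return accuracy(model, test_set) >= target_accuracy
```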
The network architecture of the image recognition model is described below.
Optionally, the image recognition model includes an input layer, convolutional layers, pooling layers, and an output layer. The embodiments of this application do not limit the number of each type of layer. Generally, the more layers the image recognition model has, the better its performance but the longer its computation time; in practice, a model with an appropriate number of layers can be designed according to the required accuracy and efficiency.
The input layer is used to input the target candidate region.
The convolutional layers convert the target candidate region into feature maps. In the embodiments of this application, a convolutional layer performs a convolution operation on the target candidate region or on the output of a pooling layer. Convolution extracts image features and maps the input data into a feature space. Each convolutional layer performs one or more convolution operations. The input of each convolutional layer depends on its position in the image recognition model: when it is the first layer of the model, its input is the target candidate region or the processed target candidate region; when it follows an activation layer, its input is that activation layer's output; and when it follows a pooling layer, its input is that pooling layer's output.
The pooling layers pool the feature maps output by the convolutional layers to reduce the number of features in those maps. Pooling may be max pooling or average pooling. Max pooling reduces the size of the feature map and enlarges the receptive field of the next layer. The receptive field is the size of the region in the original image onto which a pixel of the feature map output by a given layer of the image recognition model maps. The input of a pooling layer is usually the output of an activation layer, and its output is usually the input of a convolutional layer.
The normalization layer normalizes the feature maps processed by the convolutional and pooling layers to obtain the recognition result. In this embodiment, normalization yields the probability distribution of the target object over the plurality of recognition results, and the recognition result is determined from this distribution.
Optionally, the image recognition model may also include activation layers. An activation layer may be placed after a convolutional layer and before a pooling layer, and performs an activation operation on the output of the convolutional layer. Because the feature space obtained by convolution is limited, the activation operation processes the feature space so that it can represent more features. The input of an activation layer is usually the output of a convolutional layer. Its output depends on its position in the image recognition model: when the activation layer is the last layer of the model, its output is the recognition result of the target image.
The recognition process of the image recognition model is explained below with reference to its network architecture. Referring to FIG. 6, which shows a schematic diagram of the recognition process according to an exemplary embodiment of this application (only the convolutional, activation, and pooling layers are shown): ① denotes a convolution operation, ② an activation operation, and ③ a max-pooling operation. The leftmost rectangle represents the target candidate region or the processed target candidate region, the rightmost rectangle represents the recognition result of the target image, and the other rectangles represent multi-channel feature maps. The height of a rectangle indicates the size of the feature map (the larger the feature map, the taller the rectangle), and its thickness indicates the number of channels (the more channels, the thicker the rectangle).
In the embodiments of this application, for ease of explanation, each layer of the image recognition model performs only one operation. In FIG. 6, the image recognition model performs a total of 9 convolution operations, 9 activation operations, and 3 max-pooling operations; that is, it includes 9 convolutional layers, 9 activation layers, and 3 pooling layers. The layers are connected in sequence from left to right in the order in which the operations in FIG. 6 are executed. The input of the first convolutional layer is the target candidate region, each subsequent layer takes the output of the previous layer as its input, and the output of the last activation layer is the recognition result of the target image.
Referring to FIG. 7, which shows a flowchart of an image recognition method according to another embodiment of this application, the method may include the following steps:
Step 401: Obtain a first training sample set.
The first training sample set contains multiple first training samples, each annotated with the region that contains the target object and/or the region that does not.
Step 402: Train a CNN with the first training sample set to obtain the image detection model.
Step 403: Obtain a second training sample set.
The second training sample set contains multiple second training samples, each corresponding to a recognition result.
Step 404: Train a CNN with the second training sample set to obtain the image recognition model.
The embodiments of this application do not limit the order in which the image detection model and the image recognition model are trained. That is, the terminal may perform steps 401 and 402 first and then steps 403 and 404, or perform steps 403 and 404 first and then steps 401 and 402.
Step 405: Use the image detection model to detect the target candidate region in the target image.
The target candidate region is an image block containing the target object.
Step 406: Obtain the ratio of the target candidate region to the target image.
If the ratio is less than or equal to the preset threshold, step 407 is performed; if the ratio is greater than the preset threshold, step 408 is performed.
Step 407: When the target candidate region is detected in the target image, extract the target candidate region.
Step 408: Use the image recognition model to recognize the target candidate region and obtain the recognition result of the target image.
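Steps 405 to 408 can be wired together as in the sketch below. `detect` and `classify` are hypothetical callables standing in for the two trained models, and the box format is an assumption. Following the earlier description of the ratio test, when the ratio exceeds the threshold the whole target image is passed to the recognition model; returning `None` when no region is detected is also an assumption, since the document does not cover that case.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical: (x_min, y_min, x_max, y_max), max exclusive
R = TypeVar("R")


def crop(image: Image, box: Box) -> Image:
    """Step 407: cut the target candidate region out of the target image."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], Optional[Box]],
    classify: Callable[[Image], R],
    threshold: float = 0.30,
) -> Optional[R]:
    box = detect(image)  # step 405
    if box is None:
        return None  # assumption: nothing to recognize
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    ratio = (x_max - x_min) * (y_max - y_min) / float(width * height)  # step 406
    if ratio <= threshold:
        image = crop(image, box)  # step 407: small target, so extract the region first
    return classify(image)  # step 408
```

Keeping both models behind plain callables mirrors the decoupling described earlier: either model can be swapped without touching the control flow.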
综上所述,本申请实施例提供的方法,先通过图像检测模型初步检测出图像中可能包括目标的目标候选区域,之后采用图像识别模型基于检测出的目标候选区域进行识别,将上述两种模型结合,从而在目标在图像中所占的比例较小的情况下,也能准确地识别出图像中的目标,提高了图像识别的准确性。In summary, the method provided by the embodiment of the present application firstly detects a target candidate region in the image that may include a target by using an image detection model, and then uses the image recognition model to identify the target candidate region based on the detected target. The model is combined so that the target in the image can be accurately recognized when the proportion of the target in the image is small, and the accuracy of image recognition is improved.
In practical applications, the terminal may need to verify the user's identity. For example, the terminal asks the user to perform a specified action, such as making the "Good" gesture or the "Yeah" gesture. The terminal captures an image through the camera, recognizes the captured image to obtain a recognition result, and compares the recognition result with the requested action. If they match, identity verification succeeds; if they do not match, identity verification fails.
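The verification flow just described reduces to issuing a challenge and comparing it with the recognition result. A minimal sketch follows; the two gesture labels come from the example above, while the function names and the use of a random challenge are assumptions made for illustration.

```python
import random
from typing import Optional

GESTURES = ("Good", "Yeah")  # gesture labels taken from the example above


def issue_challenge(rng: random.Random) -> str:
    """Pick the gesture the user is asked to perform (random choice is an assumption)."""
    return rng.choice(GESTURES)


def verify(requested: str, recognized: Optional[str]) -> bool:
    """Verification succeeds only if the recognized gesture matches the request."""
    return recognized is not None and recognized == requested
```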
Referring to FIG. 8, which shows a schematic diagram of an image recognition interface according to an embodiment of the present application. In the figure, the terminal recognizes the image captured by the electronic device, and the recognition result is the "Good" gesture shown, that is, a thumbs-up gesture.
Referring to FIG. 9, which shows a schematic diagram of an image recognition interface according to another embodiment of the present application. In the figure, the terminal recognizes the image captured by the electronic device, and the recognition result is the "Yeah" gesture shown, that is, the gesture of raising the index finger and middle finger.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Referring to FIG. 10, which shows a block diagram of an image recognition apparatus according to an embodiment of the present application. The apparatus is applied to an electronic device and implements the functions in the above method examples; these functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include an image detection module 501, a region extraction module 502, and an image recognition module 503.
The image detection module 501 is configured to detect, using an image detection model, a target candidate region in a target image, the target candidate region being an image block containing the target object.
The region extraction module 502 is configured to extract the target candidate region when the target candidate region is detected in the target image.
The image recognition module 503 is configured to perform image recognition based on the target candidate region using an image recognition model, to obtain a recognition result of the target image.
In an optional embodiment based on the embodiment shown in FIG. 10, the image detection module 501 is configured to:
obtain, using the image detection model, the probability that each pixel in the target image belongs to the target object; and
determine the target candidate region according to the probability of each pixel, where the target candidate region includes the pixels whose probability is greater than a preset threshold.
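The per-pixel rule above can be sketched with NumPy. The sketch assumes the detection model returns an H x W probability map, uses a hypothetical threshold of 0.5, and takes the smallest rectangle enclosing all above-threshold pixels as the candidate region; the next embodiment refines this by keeping only contiguous blocks.

```python
from typing import Optional, Tuple

import numpy as np


def target_pixel_mask(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Mark pixels whose probability of belonging to the target exceeds the threshold."""
    if prob_map.ndim != 2:
        raise ValueError("expected an H x W probability map")
    return prob_map > threshold


def enclosing_box(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (x, y, width, height) rectangle containing every target pixel."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # no pixel passed the threshold: no candidate region
    x0, y0 = int(xs.min()), int(ys.min())
    return x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1
```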
In another optional embodiment based on the embodiment shown in FIG. 10, the image detection module 501 is configured to:
obtain, according to the probability of each pixel, an image block that meets a first preset condition, and determine that image block as the target image block, where the first preset condition is that the block contains more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than the preset threshold; and
determine, as the target candidate region, a rectangular area that contains the target image block and meets a second preset condition, where the second preset condition is that the target image block occupies more than a preset proportion of the rectangular area.
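The two preset conditions can be sketched as a connected-component pass over the target-pixel mask followed by an occupancy test. The 4-connectivity, the `(x, y, width, height)` box format, and the default values `min_pixels=4` and `min_fill=0.5` are assumptions for illustration. Only the tight bounding rectangle is tested, because no larger rectangle containing the block can have a higher occupancy.

```python
from collections import deque
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height), a hypothetical format


def connected_blocks(mask: np.ndarray) -> List[List[Tuple[int, int]]]:
    """Group True pixels into 4-connected blocks using breadth-first search."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    blocks = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            queue, block = deque([(sy, sx)]), []
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                block.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            blocks.append(block)
    return blocks


def candidate_regions(mask: np.ndarray, min_pixels: int = 4,
                      min_fill: float = 0.5) -> List[Box]:
    """Apply the first and second preset conditions to a boolean target-pixel mask."""
    boxes = []
    for block in connected_blocks(mask):
        if len(block) <= min_pixels:          # first condition: more than a preset number
            continue
        ys = [p[0] for p in block]
        xs = [p[1] for p in block]
        x0, y0 = min(xs), min(ys)
        bw, bh = max(xs) - x0 + 1, max(ys) - y0 + 1
        if len(block) / (bw * bh) > min_fill:  # second condition: occupancy of the rectangle
            boxes.append((x0, y0, bw, bh))
    return boxes
```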
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition module 503 is configured to:
perform feature extraction on the target candidate region using the image recognition model to obtain image features of the target candidate region;
determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over multiple recognition results; and
determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
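The decision rule above amounts to an argmax over a probability distribution. The sketch assumes the model's last layer emits one raw score (logit) per candidate result and that a softmax turns these scores into the first probability distribution; the label set is hypothetical.

```python
import numpy as np

LABELS = ("Good", "Yeah", "Other")  # hypothetical recognition results


def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw scores into a probability distribution."""
    shifted = logits - logits.max()   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify(logits: np.ndarray) -> str:
    """Return the recognition result with the highest probability."""
    probs = softmax(logits)
    return LABELS[int(np.argmax(probs))]
```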
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition module 503 is configured to:
preprocess the target candidate region to obtain a processed target candidate region whose resolution reaches a preset resolution;
perform feature extraction on the processed target candidate region using the image recognition model to obtain image features of the processed target candidate region;
determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over multiple recognition results; and
determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
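The source does not specify how the target candidate region is brought to the preset resolution. One simple possibility, assumed here, is nearest-neighbour resampling, sketched below with NumPy integer indexing; bilinear interpolation is a common smoother alternative.

```python
import numpy as np


def resize_nearest(region: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (or H x W x C) crop to out_h x out_w."""
    in_h, in_w = region.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return region[rows[:, None], cols[None, :]]
```

For example, a small 10 x 7 crop would be enlarged with `resize_nearest(crop, 64, 64)` if the preset resolution were 64 x 64 (a hypothetical value).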
In another optional embodiment based on the embodiment shown in FIG. 10, the image detection model includes an input layer, a convolution layer, a pooling layer, an up-convolution layer, a concatenation layer, a normalization layer, and an output layer. The input layer receives the target image; the convolution layer converts the target image into a feature map; the pooling layer pools the feature map output by the convolution layer to reduce the number of features in it; the up-convolution layer performs an up-convolution operation on the feature map output by the convolution layer; the concatenation layer concatenates the feature maps processed by the pooling layer and the up-convolution layer to obtain a concatenated feature map; the normalization layer normalizes the concatenated feature map to obtain the position information of the target candidate region; and the output layer outputs the position information of the target candidate region.
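A forward pass through such a layer stack can be sketched in NumPy. The source does not disclose kernel sizes, channel counts, or how the up-convolution is realised, so every such detail below is an assumption: 3x3 kernels with ReLU, 2x2 max pooling, nearest-neighbour upsampling followed by a convolution as the up-convolution, a U-Net-style concatenation of the first convolution output with the up-convolution output, and a per-pixel sigmoid as the normalization. The output is the per-pixel probability map of the earlier optional embodiment, from which position information can be derived. Weights are untrained.

```python
import numpy as np


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same'-padded 3x3 convolution + ReLU. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + wd]               # shifted input
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return np.maximum(out, 0.0)


def max_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling, stride 2 (H and W must be even); reduces the feature count."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def up_conv2(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Up-convolution sketched as 2x nearest-neighbour upsampling then a 3x3 conv."""
    return conv3x3(x.repeat(2, axis=1).repeat(2, axis=2), w)


def detect_forward(image: np.ndarray, params: dict) -> np.ndarray:
    """image: (C, H, W) with even H and W. Returns an H x W map of target probabilities."""
    f1 = conv3x3(image, params["conv1"])       # convolution layer: image -> feature map
    pooled = max_pool2(f1)                     # pooling layer
    f2 = conv3x3(pooled, params["conv2"])      # convolution at the coarser scale
    up = up_conv2(f2, params["up"])            # up-convolution layer: back to H x W
    cat = np.concatenate([f1, up], axis=0)     # concatenation layer (U-Net-style skip)
    logits = np.einsum("c,chw->hw", params["head"], cat)
    return 1.0 / (1.0 + np.exp(-logits))       # normalization layer: per-pixel sigmoid
```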
In another optional embodiment based on the embodiment shown in FIG. 10, the image recognition model includes an input layer, a convolution layer, a pooling layer, a normalization layer, and an output layer. The input layer receives the target candidate region; the convolution layer converts the target candidate region into a feature map; the pooling layer pools the feature map to reduce the number of features in it; the normalization layer normalizes the feature map processed by the convolution layer and the pooling layer to obtain the recognition result; and the output layer outputs the recognition result.
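The recognition model's layer stack can be sketched the same way, again with assumptions the source does not state: a single 'valid' 3x3 convolution with ReLU, 2x2 max pooling, a linear scoring of the flattened pooled features (folded into the normalization step here), and a softmax as the normalization. It relies on `numpy.lib.stride_tricks.sliding_window_view`, available in NumPy 1.20 and later. The result is a probability distribution of the kind consumed by the argmax rule in the earlier embodiment.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Valid' 3x3 convolution + ReLU. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    windows = sliding_window_view(x, (3, 3), axis=(1, 2))   # (C_in, H-2, W-2, 3, 3)
    return np.maximum(np.einsum("chwij,ocij->ohw", windows, w), 0.0)


def max_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 (an odd trailing row or column is dropped)."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def recognize_forward(region: np.ndarray, params: dict) -> np.ndarray:
    """Return a probability distribution over recognition results for one region."""
    features = conv3x3_valid(region, params["conv"])   # convolution layer
    pooled = max_pool2(features)                        # pooling layer
    scores = params["fc"] @ pooled.ravel()              # linear scoring (assumption)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                              # normalization layer: softmax
```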
In another optional embodiment based on the embodiment shown in FIG. 10, referring to FIG. 11, the apparatus further includes a proportion acquisition module 504 (not shown in the figure).
The proportion acquisition module 504 is configured to obtain the proportion of the target image occupied by the target candidate region.
The image recognition module 503 is further configured to, if the proportion is greater than the preset threshold, directly perform the step of recognizing the target candidate region using the image recognition model to obtain the recognition result of the target image.
In another optional embodiment based on the embodiment shown in FIG. 8, referring to FIG. 9, the apparatus further includes a first acquisition module 505 and a first training module 506 (not shown in the figure).
The first acquisition module 505 is configured to obtain a first training sample set containing multiple first training samples, each of which is labeled with regions that contain the target and/or regions that do not contain the target.
The first training module 506 is configured to train a convolutional neural network (CNN) with the first training sample set to obtain the image detection model.
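One way to turn the region labels of the first training sample set into a training signal for a per-pixel detector is sketched below, under assumptions not stated in the source: annotations are axis-aligned rectangles in a hypothetical `(x, y, width, height)` format, every pixel outside them counts as not containing the target, and the detector is trained with a pixel-wise binary cross-entropy loss.

```python
from typing import Iterable, Tuple

import numpy as np


def mask_from_boxes(h: int, w: int, boxes: Iterable[Tuple[int, int, int, int]]) -> np.ndarray:
    """Per-pixel labels: 1 inside any annotated target rectangle, 0 elsewhere."""
    mask = np.zeros((h, w), dtype=np.float64)
    for x, y, bw, bh in boxes:
        mask[y:y + bh, x:x + bw] = 1.0
    return mask


def pixel_bce(prob_map: np.ndarray, mask: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and pixel labels."""
    p = np.clip(prob_map, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))
```

A sample labeled only with regions that do not contain the target simply yields an all-zero mask.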
In another optional embodiment based on the embodiment shown in FIG. 8, referring to FIG. 11, the apparatus further includes a second acquisition module 507 and a second training module 508 (not shown in the figure).
The second acquisition module 507 is configured to obtain a second training sample set containing multiple second training samples, each of which corresponds to a recognition result.
The second training module 508 is configured to train a convolutional neural network (CNN) with the second training sample set to obtain the image recognition model.
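Because each second training sample is paired with a recognition result, a standard softmax cross-entropy objective fits this training step. The sketch below performs gradient descent on that objective for a linear classifier over precomputed feature vectors; the linear model is a deliberate simplification standing in for the CNN, and the learning rate is hypothetical.

```python
from typing import Tuple

import numpy as np


def softmax_rows(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_step(W: np.ndarray, X: np.ndarray, y: np.ndarray,
               lr: float = 0.1) -> Tuple[np.ndarray, float]:
    """One gradient-descent step of softmax cross-entropy.

    X: (N, D) features, y: (N,) integer labels, W: (D, K) weights.
    Returns the updated weights and the loss measured before the update.
    """
    n = X.shape[0]
    probs = softmax_rows(X @ W)
    loss = float(-np.log(probs[np.arange(n), y]).mean())
    grad_logits = probs.copy()
    grad_logits[np.arange(n), y] -= 1.0        # d(loss)/d(logits) = p - one_hot(y)
    W = W - lr * (X.T @ grad_logits) / n
    return W, loss
```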
In summary, in the apparatus provided by this embodiment of the present application, the image detection model first performs a preliminary detection of the target candidate region in the image that may contain the target, and the image recognition model then performs recognition based on the detected target candidate region. Combining the two models allows the target to be recognized accurately even when it occupies only a small proportion of the image, which improves the accuracy of image recognition.
FIG. 11 is a structural block diagram of an electronic device 600 according to an exemplary embodiment of the present application. The electronic device 600 may be a terminal such as a smartphone, tablet computer, laptop, or desktop computer, or it may be a server. In the embodiments of the present application, the electronic device 600 is described using a terminal as an example.
Generally, the electronic device 600 includes a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 601 may be implemented in at least one hardware form among a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which renders and draws the content to be shown on the display screen. In some embodiments, the processor 601 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may further include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 602 stores at least one instruction, which is executed by the processor 601 to implement the image recognition method provided by the method embodiments of the present application.
In some embodiments, the electronic device 600 may optionally further include a peripheral interface 603 and at least one peripheral. The processor 601, the memory 602, and the peripheral interface 603 may be connected by a bus or signal lines, and each peripheral may be connected to the peripheral interface 603 by a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 604, a touch display screen 605, a camera assembly 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one I/O (Input/Output)-related peripheral to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of them may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 receives and transmits RF (Radio Frequency) signals, also called electromagnetic signals, and communicates with communication networks and other communication devices through them. The radio frequency circuit 604 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 604 can communicate with other electronic devices through at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication)-related circuits, which is not limited in this application.
The display screen 605 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, it can also capture touch signals on or above its surface; these touch signals may be input to the processor 601 as control signals for processing. In this case, the display screen 605 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 605, disposed on the front panel of the electronic device 600; in other embodiments, there are at least two display screens 605, disposed on different surfaces of the electronic device 600 or in a folding design; in still other embodiments, the display screen 605 may be a flexible display disposed on a curved or folding surface of the electronic device 600. The display screen 605 may even have a non-rectangular, irregular shape, that is, an irregularly shaped screen. The display screen 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the electronic device and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined for background blurring, the main camera and the wide-angle camera can be combined for panoramic and VR (Virtual Reality) shooting, or other combined shooting functions can be achieved. In some embodiments, the camera assembly 606 may further include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cool-light flash and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment and converts them into electrical signals, which are input to the processor 601 for processing or to the radio frequency circuit 604 for voice communication. For stereo capture or noise reduction, there may be multiple microphones disposed at different parts of the electronic device 600. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional film speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 607 may further include a headphone jack.
The positioning component 608 determines the current geographic location of the electronic device 600 to enable navigation or LBS (Location Based Service). The positioning component 608 may be based on the GPS (Global Positioning System) of the United States, China's BeiDou system, or the European Union's Galileo system.
The power supply 609 supplies power to the components of the electronic device 600. The power supply 609 may be an alternating-current supply, a direct-current supply, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the electronic device 600 further includes one or more sensors 610, including but not limited to an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitude of acceleration along the three axes of the coordinate system established with respect to the electronic device 600; for example, it can detect the components of gravitational acceleration along the three axes. Based on the gravitational acceleration signal collected by the acceleration sensor 611, the processor 601 may control the touch display screen 605 to display the user interface in landscape or portrait view. The acceleration sensor 611 may also be used to collect motion data for games or of the user.
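A minimal sketch of the landscape/portrait decision, assuming the device's x axis runs along the short edge of the screen and the y axis along the long edge (an axis convention assumed here, not given in the source): whichever axis carries more of the gravity vector is taken to point downward.

```python
def orientation_from_gravity(gx: float, gy: float) -> str:
    """Choose portrait or landscape from gravity components along the x and y axes.

    Assumes x runs along the screen's short edge and y along its long edge.
    """
    return "portrait" if abs(gy) >= abs(gx) else "landscape"
```

A real implementation would typically add hysteresis so the view does not flip back and forth when the device is held near 45 degrees.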
The gyroscope sensor 612 can detect the body orientation and rotation angle of the electronic device 600 and can work with the acceleration sensor 611 to capture the user's 3D motion of the electronic device 600. Based on the data collected by the gyroscope sensor 612, the processor 601 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the electronic device 600 and/or beneath the touch display screen 605. When the pressure sensor 613 is disposed on the side frame, it can detect the user's grip on the electronic device 600, and the processor 601 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed beneath the touch display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 605. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 collects the user's fingerprint, and either the processor 601 or the fingerprint sensor 614 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 614 may be disposed on the front, back, or side of the electronic device 600. When a physical button or manufacturer logo is provided on the electronic device 600, the fingerprint sensor 614 may be integrated with the physical button or manufacturer logo.
The optical sensor 615 collects ambient light intensity. In one embodiment, the processor 601 may control the display brightness of the touch display screen 605 according to the ambient light intensity collected by the optical sensor 615: when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
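A sketch of the ambient-light rule in the first embodiment above. The source states only that brightness rises with ambient light; the logarithmic curve, the clamping range, and the constants below are hypothetical choices, picked so that a given change in illuminance has more effect in dim surroundings than in bright ones.

```python
import math


def display_brightness(lux: float, min_level: float = 0.05,
                       max_level: float = 1.0, max_lux: float = 10_000.0) -> float:
    """Map ambient illuminance (lux) to a brightness level in [min_level, max_level]."""
    lux = min(max(lux, 0.0), max_lux)                   # clamp the sensor reading
    fraction = math.log1p(lux) / math.log1p(max_lux)    # 0 in darkness, 1 at max_lux
    return min_level + (max_level - min_level) * fraction
```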
The proximity sensor 616, also called a distance sensor, is usually disposed on the front panel of the electronic device 600 and measures the distance between the user and the front of the electronic device 600. In one embodiment, when the proximity sensor 616 detects that this distance is gradually decreasing, the processor 601 switches the touch display screen 605 from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance is gradually increasing, the processor 601 switches the touch display screen 605 from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 11 does not limit the electronic device 600, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is further provided, storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor of a terminal to implement the image recognition method in the above method embodiments.
Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that "multiple" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The words "first", "second", and similar terms used herein do not denote any order, quantity, or importance; they are used only to distinguish different components.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The foregoing are merely exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (18)

  1. An image recognition method, applied to an electronic device, the method comprising:
    detecting, using an image detection model, a target candidate region in a target image, the target candidate region being an image block containing a target object;
    extracting the target candidate region when the target candidate region is detected in the target image; and
    performing image recognition based on the target candidate region using an image recognition model to obtain a recognition result of the target image.
  2. The method according to claim 1, wherein the detecting, by using an image detection model, a target candidate region in a target image comprises:
    obtaining, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
    determining the target candidate region according to the probability corresponding to each pixel, the target candidate region comprising pixels whose probability is greater than a preset threshold.
  3. The method according to claim 2, wherein the determining the target candidate region according to the probability corresponding to each pixel comprises:
    obtaining, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determining the image block that meets the first preset condition as a target image block, wherein the first preset condition is that the image block contains more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
    determining, as the target candidate region, a rectangular region that contains the target image block and meets a second preset condition, the second preset condition being that the proportion of the rectangular region occupied by the target image block is greater than a preset proportion.
  4. The method according to claim 1, wherein the performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image comprises:
    performing feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
    determining, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determining the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  5. The method according to claim 1, wherein the performing image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image comprises:
    preprocessing the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
    performing feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
    determining, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determining the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  6. The method according to claim 1, wherein the image detection model comprises an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, a normalization layer, and an output layer;
    the input layer is configured to input the target image;
    the convolutional layer is configured to convert the target image into a feature map;
    the pooling layer is configured to pool the feature map output by the convolutional layer, to reduce the number of features in the feature map;
    the up-convolutional layer is configured to perform an up-convolution operation on the feature map output by the convolutional layer;
    the concatenation layer is configured to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain a concatenated feature map;
    the normalization layer is configured to normalize the concatenated feature map, to obtain position information of the target candidate region; and
    the output layer is configured to output the position information of the target candidate region.
  7. The method according to claim 1, wherein the image recognition model comprises an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer;
    the input layer is configured to input the target candidate region;
    the convolutional layer is configured to convert the target candidate region into a feature map;
    the pooling layer is configured to pool the feature map, to reduce the number of features in the feature map;
    the normalization layer is configured to normalize the feature map processed by the convolutional layer and the pooling layer, to obtain the recognition result; and
    the output layer is configured to output the recognition result.
  8. The method according to any one of claims 1 to 7, before the extracting the target candidate region, further comprising:
    obtaining a proportion of the target image occupied by the target candidate region; and
    if the proportion is greater than a preset threshold, directly performing the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
  9. An image recognition apparatus, applied to an electronic device, the apparatus comprising:
    an image detection module, configured to detect, by using an image detection model, a target candidate region in a target image, the target candidate region being an image block that contains a target object;
    a region extraction module, configured to extract the target candidate region when the target candidate region is detected in the target image; and
    an image recognition module, configured to perform image recognition based on the target candidate region by using an image recognition model, to obtain a recognition result of the target image.
  10. The apparatus according to claim 9, wherein the image detection module is configured to:
    obtain, by using the image detection model, a probability that each pixel in the target image belongs to the target object; and
    determine the target candidate region according to the probability corresponding to each pixel, the target candidate region comprising pixels whose probability is greater than a preset threshold.
  11. The apparatus according to claim 10, wherein the image detection module is configured to:
    obtain, according to the probability corresponding to each pixel, an image block that meets a first preset condition, and determine the image block that meets the first preset condition as a target image block, wherein the first preset condition is that the image block contains more than a preset number of contiguous target pixels, a target pixel being a pixel whose probability is greater than a preset threshold; and
    determine, as the target candidate region, a rectangular region that contains the target image block and meets a second preset condition, the second preset condition being that the proportion of the rectangular region occupied by the target image block is greater than a preset proportion.
  12. The apparatus according to claim 9, wherein the image recognition module is configured to:
    perform feature extraction on the target candidate region by using the image recognition model, to obtain image features of the target candidate region;
    determine, according to the image features of the target candidate region, a first probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determine the recognition result corresponding to the maximum value in the first probability distribution as the recognition result of the target image.
  13. The apparatus according to claim 9, wherein the image recognition module is configured to:
    preprocess the target candidate region to obtain a processed target candidate region, a resolution of the processed target candidate region reaching a preset resolution;
    perform feature extraction on the processed target candidate region by using the image recognition model, to obtain image features of the processed target candidate region;
    determine, according to the image features of the processed target candidate region, a second probability distribution of the target object in the target candidate region over a plurality of recognition results; and
    determine the recognition result corresponding to the maximum value in the second probability distribution as the recognition result of the target image.
  14. The apparatus according to claim 9, wherein the image detection model comprises an input layer, a convolutional layer, a pooling layer, an up-convolutional layer, a concatenation layer, and an output layer;
    the input layer is configured to input the target image;
    the convolutional layer is configured to convert the target image into a feature map;
    the pooling layer is configured to pool the feature map output by the convolutional layer, to reduce the number of features in the feature map;
    the up-convolutional layer is configured to perform an up-convolution operation on the feature map output by the convolutional layer;
    the concatenation layer is configured to concatenate the feature maps processed by the pooling layer and the up-convolutional layer, to obtain a concatenated feature map;
    the normalization layer is configured to normalize the concatenated feature map, to obtain position information of the target candidate region; and
    the output layer is configured to output the position information of the target candidate region.
  15. The apparatus according to claim 9, wherein the image recognition model comprises an input layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer;
    the input layer is configured to input the target candidate region;
    the convolutional layer is configured to convert the target candidate region into a feature map;
    the pooling layer is configured to pool the feature map, to reduce the number of features in the feature map;
    the normalization layer is configured to normalize the feature map processed by the convolutional layer and the pooling layer, to obtain the recognition result; and
    the output layer is configured to output the recognition result.
  16. The apparatus according to any one of claims 9 to 15, further comprising:
    a proportion obtaining module, configured to obtain a proportion of the target image occupied by the target candidate region;
    wherein the image recognition module is further configured to, if the proportion is greater than a preset threshold, directly perform the step of recognizing the target candidate region by using the image recognition model to obtain the recognition result of the target image.
  17. An electronic device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the image recognition method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the image recognition method according to any one of claims 1 to 8.
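Claims 2 and 3 describe turning a per-pixel probability map into a rectangular candidate region. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation: the probability map is a list of rows of floats, "contiguous" is taken to mean 4-connected, the rectangle is taken to be the tightest bounding box of the block (any larger rectangle containing the block can only lower the block's share of it), and the names `find_candidate_regions`, `prob_threshold`, `min_pixels`, and `min_fill_ratio`, together with their default values, are hypothetical stand-ins for the preset threshold, preset number, and preset proportion.

```python
from collections import deque


def find_candidate_regions(prob, prob_threshold=0.5, min_pixels=4, min_fill_ratio=0.5):
    """Return candidate rectangles as (top, left, bottom, right), inclusive.

    prob is a list of rows holding each pixel's probability of belonging to
    the target object. All parameter names and defaults are hypothetical.
    """
    h = len(prob)
    w = len(prob[0]) if h else 0
    # A target pixel is one whose probability exceeds the preset threshold.
    mask = [[p > prob_threshold for p in row] for row in prob]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one 4-connected block.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # First preset condition: more than min_pixels contiguous target pixels.
            if len(pixels) <= min_pixels:
                continue
            ys = [py for py, _ in pixels]
            xs = [px for _, px in pixels]
            top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
            area = (bottom - top + 1) * (right - left + 1)
            # Second preset condition: the block fills more than min_fill_ratio
            # of its tightest enclosing rectangle.
            if len(pixels) / area > min_fill_ratio:
                boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    prob = [
        [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
        [0.1, 0.9, 0.8, 0.1, 0.1, 0.1],
        [0.1, 0.9, 0.9, 0.1, 0.1, 0.7],
        [0.1, 0.1, 0.7, 0.1, 0.1, 0.1],
    ]
    print(find_candidate_regions(prob))  # [(1, 1, 3, 2)]
```

In the example, the five-pixel block on the left passes both conditions (it fills 5 of the 6 pixels of its bounding box), while the isolated pixel on the right is discarded by the first condition.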
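Claims 4, 5, and 8 describe the recognition stage: optionally skipping extraction when the candidate region already fills most of the image, bringing the region to a preset resolution, turning image features into a probability distribution over recognition results, and taking the result with the maximum probability. The sketch below is hypothetical throughout: it reads claim 8 as "recognize the whole image when the region's share exceeds the threshold", uses nearest-neighbour resizing as the preprocessing step, applies a softmax to produce the distribution, and treats the recognition model as an arbitrary callable `classify` that returns one score per label. Box coordinates are (top, left, bottom, right), inclusive; `recognize`, `toy_classifier`, `resolution`, and `ratio_threshold` are illustrative names, not from the source.

```python
import math


def crop(image, box):
    """Extract the (top, left, bottom, right) box, inclusive, from a 2-D list."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(image, box, classify, labels, resolution=(4, 4), ratio_threshold=0.8):
    """Return (label, probability) for the target inside box."""
    img_h, img_w = len(image), len(image[0])
    top, left, bottom, right = box
    ratio = (bottom - top + 1) * (right - left + 1) / (img_h * img_w)
    # Assumed reading of claim 8: if the region already dominates the image,
    # skip extraction and recognize the whole image directly.
    region = image if ratio > ratio_threshold else crop(image, box)
    # Preprocessing (claim 5): bring the region to a preset resolution.
    region = resize_nearest(region, *resolution)
    # Claims 4 and 5: probability distribution over recognition results,
    # then the result with the maximum probability.
    probs = softmax(classify(region))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def toy_classifier(region):
    """Hypothetical stand-in model: brighter regions score higher for label 0."""
    mean = sum(map(sum, region)) / (len(region) * len(region[0]))
    return [mean, 1.0 - mean]


if __name__ == "__main__":
    image = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
    print(recognize(image, (0, 0, 3, 1), toy_classifier, ["target", "background"]))
```

Because the model is passed in as a callable, the same pipeline works whether the first probability distribution (claim 4, no resizing) or the second (claim 5, after resizing) is wanted; dropping the `resize_nearest` call gives the claim 4 variant.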
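Claim 6 lists the layers of the detection model. The toy, single-channel Python sketch below traces that data flow: a 3x3 convolution, 2x2 max pooling, a stride-2 transposed convolution as the up-convolution, channel concatenation, and a sigmoid as the normalization step that yields a per-pixel probability map, which claims 2 and 3 then turn into a candidate rectangle. It assumes a U-Net-style arrangement in which the up-convolved map is concatenated with the full-resolution convolution output so that the two maps have matching sizes. The claim does not specify kernel sizes, channel counts, or the normalization function, so every weight and name here (`detect`, `weights`, `conv3x3`, and so on) is hypothetical and untrained.

```python
import math


def conv3x3(fmap, kernel):
    """'Same' 3x3 cross-correlation with zero padding, followed by ReLU."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = max(acc, 0.0)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: halves height and width."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def up_conv2x2(fmap, kernel):
    """Transposed convolution, 2x2 kernel, stride 2: doubles height and width."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for i in range(2):
                for j in range(2):
                    out[2 * y + i][2 * x + j] = fmap[y][x] * kernel[i][j]
    return out


def detect(image, weights):
    """Return a per-pixel probability map; image height and width must be even."""
    skip = conv3x3(image, weights["conv"])              # full-resolution features
    deep = conv3x3(max_pool2x2(skip), weights["conv"])  # half-resolution features
    up = up_conv2x2(deep, weights["up"])                # back to full resolution
    # Concatenation: pair the two maps as two channels at every pixel.
    stacked = [list(zip(srow, urow)) for srow, urow in zip(skip, up)]
    a, b, bias = weights["head"]
    # Normalization: a sigmoid maps each pixel's combined score into (0, 1).
    return [[1.0 / (1.0 + math.exp(-(a * s + b * u + bias))) for s, u in row]
            for row in stacked]


if __name__ == "__main__":
    image = [[0.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 1.0, 0.0],
             [0.0, 1.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
    weights = {"conv": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
               "up": [[1, 1], [1, 1]],
               "head": (1.0, 1.0, -1.0)}
    for row in detect(image, weights):
        print([round(p, 3) for p in row])
```

With these hand-picked weights, the bright 2x2 block in the middle receives a higher probability than the background, which is the behaviour the thresholding in claim 2 relies on; a real model would learn its weights and use many channels per layer.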
PCT/CN2018/116044 2017-11-23 2018-11-16 Image recognition method, apparatus, and electronic device WO2019101021A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711180320.XA CN109829456A (en) 2017-11-23 2017-11-23 Image-recognizing method, device and terminal
CN201711180320.X 2017-11-23

Publications (1)

Publication Number Publication Date
WO2019101021A1 true WO2019101021A1 (en) 2019-05-31

Family

ID=66631339

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116044 WO2019101021A1 (en) 2017-11-23 2018-11-16 Image recognition method, apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN109829456A (en)
WO (1) WO2019101021A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390261A (en) * 2019-06-13 2019-10-29 北京汽车集团有限公司 Object detection method, device, computer readable storage medium and electronic equipment
CN112990387B (en) * 2021-05-17 2021-07-20 腾讯科技(深圳)有限公司 Model optimization method, related device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104850845A (en) * 2015-05-30 2015-08-19 大连理工大学 Traffic sign recognition method based on asymmetric convolution neural network
CN105320945A (en) * 2015-10-30 2016-02-10 小米科技有限责任公司 Image classification method and apparatus
CN106446784A (en) * 2016-08-30 2017-02-22 东软集团股份有限公司 Image detection method and apparatus
CN107194393A * 2016-03-15 2017-09-22 杭州海康威视数字技术股份有限公司 Method and device for detecting temporary license plates

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9514381B1 (en) * 2013-03-15 2016-12-06 Pandoodle Corporation Method of identifying and replacing an object or area in a digital image with another object or area

Also Published As

Publication number Publication date
CN109829456A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
WO2019105285A1 (en) Facial attribute recognition method, electronic device, and storage medium
WO2019101021A1 (en) Image recognition method, apparatus, and electronic device
EP3779883A1 (en) Method and device for repositioning in camera orientation tracking process, and storage medium
CN108594997B (en) Gesture skeleton construction method, device, equipment and storage medium
CN110647865A (en) Face gesture recognition method, device, equipment and storage medium
CN111079576A (en) Living body detection method, living body detection device, living body detection equipment and storage medium
WO2020108041A1 (en) Detection method and device for key points of ear region and storage medium
CN109360222B (en) Image segmentation method, device and storage medium
US20210158021A1 (en) Method for processing images and electronic device
US20210134022A1 (en) Method and electronic device for adding virtual item
CN111354378B (en) Voice endpoint detection method, device, equipment and computer storage medium
WO2020019873A1 (en) Image processing method and apparatus, terminal and computer-readable storage medium
CN111242090A (en) Human face recognition method, device, equipment and medium based on artificial intelligence
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
WO2019219065A1 (en) Video analysis method and device
CN109886208B (en) Object detection method and device, computer equipment and storage medium
CN111027490A (en) Face attribute recognition method and device and storage medium
CN110991457A (en) Two-dimensional code processing method and device, electronic equipment and storage medium
CN110795019A (en) Key identification method and device of soft keyboard and storage medium
CN110991445A (en) Method, device, equipment and medium for identifying vertically arranged characters
CN112990424A (en) Method and device for training neural network model
CN110929675A (en) Image processing method, image processing device, computer equipment and computer readable storage medium
CN113129221A (en) Image processing method, device, equipment and storage medium
CN111757146A (en) Video splicing method, system and storage medium
CN112308104A (en) Abnormity identification method and device and computer storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 18881277
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 18881277
    Country of ref document: EP
    Kind code of ref document: A1