WO2019119919A1

WO2019119919A1 - Image recognition method and electronic device

Info

Publication number: WO2019119919A1
Application number: PCT/CN2018/108229
Authority: WO
Inventors: 张子敬; 颜奉丽; 王星晨; 朱涛
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2017-12-19
Filing date: 2018-09-28
Publication date: 2019-06-27
Also published as: CN109934077B; CN109934077A

Abstract

Disclosed are an image recognition method and an electronic device. The method is applied to a co-processor in an electronic device, and the electronic device further comprises a CPU. The method comprises: receiving an image to be recognized sent by a CPU; inputting the image to be recognized into a pre-built content recognition neural network to obtain a content recognition result, wherein the content recognition result comprises: the category and position area of an object contained in the image to be recognized; inputting an image block corresponding to each obtained position area into a pre-built attribute recognition neural network, to obtain an attribute of each object; and sending the obtained category and attribute of each object to the CPU, so that the CPU takes the received category and attribute of the object as an image recognition result of the image to be recognized. Applying the embodiments of the present application, the category and attribute of an object contained in an image can be accurately recognized via a content recognition neural network and an attribute recognition neural network, and the pressure of computation of a CPU can be reduced.

Description

Image recognition method and electronic device

The present application claims priority to a Chinese patent application filed on December 19, 2017.

Technical field

The present application relates to the field of image processing technologies, and in particular, to an image recognition method and an electronic device.

Background

Currently, images captured by surveillance cameras often need to be recognized in order to identify the category of each object contained in an image, as well as the attributes of each object. For example, the category of an object in a road monitoring image is recognized as a vehicle, and the vehicle type, color, and the like of that vehicle are identified.

Because a camera collects image data continuously, the number of images to be recognized is very large. In the related art, these images are usually processed by the central processing unit (CPU) to identify the categories of the objects they contain and the attributes of those objects.

However, when there are many images to be recognized, having the CPU perform the recognition places a heavy computational load on the CPU.

Summary of the invention

An object of the embodiments of the present application is to provide an image recognition method and an electronic device that accurately identify the categories and attributes of objects contained in an image and reduce the computational load on the CPU.

In a first aspect, an embodiment of the present application provides an image recognition method, which is applied to a coprocessor in an electronic device, where the electronic device further includes a central processing unit CPU, and the method may include:

Receiving an image to be recognized sent by the CPU; inputting the image to be recognized into a pre-built content recognition neural network to obtain a content recognition result, where the content recognition result includes the category and location area of each object contained in the image to be recognized; inputting the image block corresponding to each obtained location area into a pre-built attribute recognition neural network to obtain the attributes of each object; and sending the obtained category and attributes of each object to the CPU, so that the CPU takes the received category and attributes of each object as the image recognition result of the image to be recognized.

Optionally, the step of inputting the image block corresponding to each location area to the pre-built attribute recognition neural network to obtain the attribute of each object may include:

Determining the attribute recognition neural network corresponding to each object based on a preset mapping relationship and the category of each object contained in the image to be recognized, where the preset mapping relationship includes the correspondence between preset categories and pre-built attribute recognition neural networks; and inputting the image block corresponding to each obtained location area into the attribute recognition neural network corresponding to that object, to obtain the attributes of each object.

Optionally, the step of inputting the image block corresponding to each obtained location area into the attribute recognition neural network corresponding to the object, to obtain the attributes of each object, may include:

Dividing the objects contained in the image to be recognized into two groups to obtain a first group of objects and a second group of objects; inputting the image block corresponding to the location area of each object in the first group into the attribute recognition neural network corresponding to that object, to obtain the attributes of each object in the first group; and sending the location area of each object in the second group to the CPU, so that the CPU inputs the image block corresponding to the location area of each object in the second group into the attribute recognition neural network corresponding to that object, to obtain the attributes of each object in the second group;

Correspondingly, the step of sending the obtained category and attribute of each object to the CPU may include:

Sending the category and attributes of each object in the first group, and the category of each object in the second group, to the CPU, so that the CPU takes the category and attributes of each object in the first group and the category and attributes of each object in the second group as the image recognition result of the image to be recognized.

Optionally, the step of dividing the objects included in the image to be identified into two groups, and obtaining the first group of objects and the second group of objects may include:

Selecting a preset number of objects from the objects contained in the image to be recognized as the first group of objects, and taking the remaining objects as the second group of objects; or taking the objects of a first preset category in the image to be recognized as the first group of objects, and taking the objects not of the first preset category as the second group of objects.
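The two grouping strategies described above can be sketched as follows. This is only an illustrative sketch: the `DetectedObject` type, the box encoding, taking the first objects in list order as the "preset number of objects", and allowing a set of first preset categories are all assumptions that the application does not specify.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class DetectedObject:
    category: str
    box: Tuple[int, int, int, int]  # (x, y, width, height); hypothetical encoding


def split_by_count(objects: List[DetectedObject], preset_number: int):
    """Keep the first `preset_number` objects as group 1; the rest form group 2."""
    return objects[:preset_number], objects[preset_number:]


def split_by_category(objects: List[DetectedObject], first_preset: Set[str]):
    """Group 1 holds objects of the first preset categories; group 2 holds the others."""
    first = [o for o in objects if o.category in first_preset]
    second = [o for o in objects if o.category not in first_preset]
    return first, second
```

In either case, the first group would be handled by the coprocessor and the location areas of the second group would be sent to the CPU.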

Sending to the CPU the location area of each first object, a first object being an object of the second preset category among the objects contained in the image to be recognized, so that the CPU inputs the image block corresponding to the location area of each first object into the first-type attribute recognition neural network among the attribute recognition neural networks corresponding to that first object, to obtain the first-type attribute of each first object; inputting the image block corresponding to the location area of each first object into the second-type attribute recognition neural network among the attribute recognition neural networks corresponding to that first object, to obtain the second-type attribute of each first object; and inputting the image block corresponding to the location area of each second object, a second object being an object not of the second preset category among the objects contained in the image to be recognized, into the second-type attribute recognition neural network among the attribute recognition neural networks corresponding to that second object, to obtain the second-type attribute of each second object;

Sending the second-type attribute and category of each first object, and the second-type attribute and category of each second object, to the CPU, so that the CPU takes the first-type attribute, second-type attribute, and category of each first object, and the second-type attribute and category of each second object, as the image recognition result of the image to be recognized.

Optionally, the step of sending the obtained category and the attribute of each object to the CPU, so that the CPU uses the category and the attribute of the received object as the image recognition result of the image to be identified, may include:

The obtained location area, category, and attribute of each object are sent to the CPU, so that the CPU determines the location area, category, and attribute of the received object as the image recognition result of the image to be recognized.

Optionally, the content recognition neural network is further configured to identify a confidence level corresponding to a category of the object included in the image; and the content recognition result further includes: a confidence level corresponding to the category of the object included in the image;

Before the image blocks corresponding to each of the obtained location areas are input to the pre-built attribute recognition neural network to obtain the attributes of each object, the method may further include:

Determining whether each obtained confidence is greater than a preset threshold; if so, taking the object whose confidence is greater than the preset threshold as a filtered object;

Correspondingly, the step of inputting the image block corresponding to each location area to the pre-built attribute recognition neural network to obtain the attribute of each object may include:

Inputting the image block corresponding to the location area of each filtered object into the pre-built attribute recognition neural network for attribute recognition, to obtain the attributes of each filtered object;

Sending the obtained category and attributes of each filtered object to the CPU.

Optionally, the coprocessor includes at least one of a graphics processor GPU, a digital signal processor DSP, and a field programmable gate array processor FPGA.

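Optionally, the step of inputting the image block corresponding to each obtained location area into the pre-built attribute recognition neural network to obtain the attributes of each object may include:

Scaling the image block corresponding to each obtained location area;

Inputting each scaled image block into the pre-built attribute recognition neural network to obtain the attributes of each object.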

Optionally, the image to be identified is obtained by the CPU performing image format conversion and scaling processing on the original image.

In a second aspect, an embodiment of the present application further provides an electronic device, where the electronic device may include a coprocessor and a central processing unit CPU;

The CPU is configured to send the image to be recognized to the coprocessor;

The coprocessor is configured to receive an image to be recognized sent by the CPU;

The coprocessor is further configured to: input the image to be recognized into the pre-constructed content recognition neural network, and obtain a content recognition result, where the content recognition result includes: a category and a location area of the object included in the image to be identified;

The coprocessor is further configured to: input the obtained image block corresponding to each location area to a pre-built attribute recognition neural network, and obtain an attribute of each object;

The coprocessor is further configured to: send the obtained category and attribute of each object to the CPU;

The CPU is further configured to: receive a category and an attribute of each object sent by the coprocessor, and use the category and attribute of the received object as an image recognition result of the image to be identified.

Optionally, in the embodiment of the present application, the coprocessor may be specifically configured to:

Divide the objects contained in the image to be recognized into two groups to obtain a first group of objects and a second group of objects; input the image block corresponding to the location area of each object in the first group into the attribute recognition neural network corresponding to that object, to obtain the attributes of each object in the first group; send the location area of each object in the second group to the CPU; and send the category and attributes of each object in the first group, and the category of each object in the second group, to the CPU;

The CPU is specifically configured to: input the image block corresponding to the location area of each object in the second group into the attribute recognition neural network corresponding to that object, to obtain the attributes of each object in the second group; and take the category and attributes of each object in the first group, and the category and attributes of each object in the second group, as the image recognition result of the image to be recognized.

Send to the CPU the location area of each first object, a first object being an object of the second preset category among the objects contained in the image to be recognized;

Input the image block corresponding to the location area of each first object into the second-type attribute recognition neural network among the attribute recognition neural networks corresponding to that first object, to obtain the second-type attribute of each first object; input the image block corresponding to the location area of each second object, a second object being an object not of the second preset category among the objects contained in the image to be recognized, into the second-type attribute recognition neural network among the attribute recognition neural networks corresponding to that second object, to obtain the second-type attribute of each second object; and send the second-type attribute and category of each first object, and the second-type attribute and category of each second object, to the CPU;

The CPU is specifically configured to: input the image block corresponding to the location area of each first object into the first-type attribute recognition neural network among the attribute recognition neural networks corresponding to that first object, to obtain the first-type attribute of each first object; and take the first-type attribute, second-type attribute, and category of each first object, and the second-type attribute and category of each second object, as the image recognition result of the image to be recognized.

Optionally, in the embodiment of the present application, the coprocessor may be specifically configured to: send the obtained location area, category, and attribute of each object to the CPU;

The CPU may be specifically configured to: use the location area, the category, and the attribute of the received object as the image recognition result of the image to be identified.

Correspondingly, the coprocessor may be further configured to: before inputting the image block corresponding to each obtained location area into the pre-built attribute recognition neural network to obtain the attributes of each object, determine whether each obtained confidence is greater than a preset threshold; if so, take the object whose confidence is greater than the preset threshold as a filtered object; input the image block corresponding to the location area of each filtered object into the pre-built attribute recognition neural network for attribute recognition, to obtain the attributes of each filtered object; and send the category and attributes of each filtered object to the CPU.

Optionally, the coprocessor may include at least one of a graphics processor GPU, a digital signal processor DSP, and a field programmable gate array processor FPGA.

Optionally, in the embodiment of the present application, the coprocessor may be specifically configured to:

Scale the image block corresponding to each obtained location area; and input each scaled image block into the pre-built attribute recognition neural network to obtain the attributes of each object.

Optionally, in the embodiment of the present application, the CPU is further configured to:

Image format conversion and scaling processing is performed on the original image to obtain an image to be recognized.

In a third aspect, an embodiment of the present application further provides a readable storage medium, which is a storage medium in an electronic device including a coprocessor and a central processing unit (CPU), where the readable storage medium stores a computer program, and the computer program, when executed by the coprocessor, implements the following steps:

In a fourth aspect, an embodiment of the present application further provides an application that, when running on an electronic device including a coprocessor and a central processing unit CPU, causes the coprocessor to execute:

Optionally, the image to be recognized is obtained by the CPU performing image format conversion and scaling on the original image.

In the embodiments of the present application, the coprocessor in the electronic device may receive the image to be recognized sent by the CPU of the electronic device and input it into the pre-built content recognition neural network, thereby obtaining the category and location area of each object contained in the image to be recognized. The coprocessor then inputs the image block corresponding to each obtained location area into the pre-built attribute recognition neural network to obtain the attributes of each object. The coprocessor can further send the obtained category and attributes of each object to the CPU, so that the CPU takes the received category and attributes as the image recognition result of the image to be recognized. In this way, the coprocessor identifies the category and attributes of the objects contained in the image by means of the content recognition neural network and the attribute recognition neural network, taking over the image recognition computation from the CPU and thereby reducing the computational load on the CPU.

Brief description of the drawings

In order to illustrate the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in the embodiments and in the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an image recognition method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an image recognition method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of another image recognition method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of still another image recognition method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of yet another image recognition method according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

To solve the problems in the prior art, the embodiment of the present application provides an image recognition method and an electronic device.

The image recognition method provided by the embodiment of the present application is first described below.

The image recognition method provided by the embodiments of the present application is applied to a coprocessor in an electronic device. The coprocessor may be a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field-Programmable Gate Array); it may also be any combination of GPU, DSP, and FPGA. In addition, the electronic device further includes a CPU (Central Processing Unit).

The electronic device may be a front-end device, such as a video camera, or a back-end device, such as a server. Specifically, when the electronic device is a front-end device, the coprocessor may be a low-power DSP and/or FPGA; when the electronic device is a back-end device, the coprocessor may be a GPU, which consumes more power but is easier to develop for. The choice is of course not limited to these options. In the embodiments of the present application, the coprocessor can support complex floating-point calculation.

Referring to FIG. 1 , an image recognition method provided by an embodiment of the present application includes the following steps:

S101: Receive an image to be identified sent by the CPU;

S102: input the image to be identified to the pre-constructed content recognition neural network, and obtain a content recognition result, where the content recognition result includes: a category and a location area of the object included in the image to be identified;

S103: input the obtained image block corresponding to each location area to a pre-built attribute recognition neural network, and obtain an attribute of each object;

S104: Send the obtained category and attribute of each object to the CPU, so that the CPU uses the category and attribute of the received object as the image recognition result of the image to be recognized.

It can be understood that the coprocessor in the electronic device can receive the image to be recognized sent by the CPU of the electronic device and input it into the pre-built content recognition neural network, thereby obtaining the category and location area of each object contained in the image to be recognized. The coprocessor then inputs the image block corresponding to each obtained location area into the pre-built attribute recognition neural network to obtain the attributes of each object. The coprocessor can further send the obtained category and attributes of each object to the CPU, so that the CPU takes the received category and attributes as the image recognition result of the image to be recognized. In this way, the coprocessor identifies the category and attributes of the objects contained in the image by means of the content recognition neural network and the attribute recognition neural network, taking over the image recognition computation from the CPU and thereby reducing the computational load on the CPU.
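Steps S102 to S104 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the image is a nested list of gray values, a location area is an `(x, y, width, height)` tuple, and `toy_content_net` and `toy_attribute_net` are hypothetical stand-ins for the trained neural networks.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height); hypothetical encoding
Image = List[List[int]]           # rows of gray values; stand-in for real image data


def crop(image: Image, box: Box) -> Image:
    """Cut out the image block that corresponds to one location area."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize(image: Image, content_net, attribute_net) -> List[Dict]:
    """Coprocessor side: detect objects, then recognize attributes per image block."""
    results = []
    for category, box in content_net(image):             # S102: category + location area
        attributes = attribute_net(crop(image, box))      # S103: attributes of the block
        results.append({"category": category, "attributes": attributes})
    return results                                        # S104: returned to the CPU


def toy_content_net(image: Image):
    """Hypothetical detector that always reports two fixed objects."""
    return [("person", (0, 0, 2, 2)), ("vehicle", (2, 0, 2, 2))]


def toy_attribute_net(block: Image):
    """Hypothetical attribute model: labels a block by its mean gray value."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return {"brightness": "dark" if mean < 128 else "light"}
```

Passing the networks in as callables keeps the control flow of the method separate from whichever trained models are actually deployed on the coprocessor.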

For example, the image to be recognized may be a road monitoring image that contains a person and a vehicle. The coprocessor can then recognize that the category of one object in the road monitoring image is person and that the category of another object is vehicle. It can also identify that the gender attribute of the person is female and the clothing color attribute is blue, and that the color attribute of the vehicle is black and the vehicle type attribute is car, and so on.

The image to be recognized may be obtained by the CPU preprocessing the original image. Of course, the image to be recognized may also be the original image itself. The preprocessing operations may include image format conversion, image scaling, and the like. Image format conversion converts the original image into an image format that the content recognition neural network can accept, and image scaling converts the original image to a resolution that the content recognition neural network can accept, so that the resulting image to be recognized satisfies the image format and resolution requirements of the content recognition neural network. The preprocessing operations may further include extracting a region of interest from the original image, denoising the original image to improve image quality, and the like, thereby improving the subsequent recognition of the image to be recognized.

That is to say, the CPU preprocesses the original image to obtain the image to be recognized, and the coprocessor then recognizes that image to obtain the category and attributes of each object it contains. In this way, when the number of images to be recognized is large, the CPU can preprocess an image and send the resulting image to be recognized to the coprocessor for processing. The CPU, now idle, can start preprocessing the next frame, so that the CPU and the coprocessor compute in parallel. This avoids the problem that arises when images must queue for the CPU alone to perform recognition, namely that image recognition results are obtained slowly.
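The overlap between CPU preprocessing and coprocessor recognition can be illustrated with a small producer and consumer sketch. It only models the scheduling idea: a Python thread stands in for the coprocessor, and `preprocess` and `recognize` are hypothetical callables supplied by the caller.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker there are no more images


def run_pipeline(frames, preprocess, recognize):
    """Preprocess frame N+1 while the worker is still recognizing frame N."""
    to_coprocessor = queue.Queue(maxsize=1)  # hand-off slot between the two stages
    results = []

    def coprocessor_worker():
        while True:
            image = to_coprocessor.get()
            if image is _STOP:
                break
            results.append(recognize(image))

    worker = threading.Thread(target=coprocessor_worker)
    worker.start()
    for frame in frames:
        # "CPU" stage: preprocess, hand over, then move on to the next frame.
        to_coprocessor.put(preprocess(frame))
    to_coprocessor.put(_STOP)
    worker.join()
    return results
```

Because there is a single worker and a first-in, first-out queue, results come back in frame order.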

The pre-built content recognition neural network may be obtained by training based on an artificial neural network algorithm such as Faster R-CNN (Faster Region-based Convolutional Network), YOLO (You Only Look Once), or SSD (Single Shot MultiBox Detector). During training, a large number of image samples are used, and the content recognition neural network is trained with the category and location area of the objects contained in each image sample. The trained content recognition neural network can therefore identify the category and location area of the objects contained in an image. Moreover, the inventors have found through a large number of experiments that, compared with the traditional SVM (Support Vector Machine) algorithm for identifying the categories of objects contained in an image, the content recognition neural network obtained by neural network training yields more accurate category and location area recognition results.

In addition, in the prior art, the attributes of an object contained in an image are determined using manually set attribute features. For example, the range of color values corresponding to red is set manually, that is, the color feature of red is set manually. When the color of an object is determined to fall within the range of color values corresponding to red, the color attribute of the object is determined to be red. However, when the range of color values corresponding to red is not set accurately, the color attribute is judged inaccurately. The accuracy of this way of determining attributes is therefore strongly affected by human factors, and the attribute recognition effect is unstable.

In the embodiments of the present application, the attribute recognition neural network may be obtained by training based on a convolutional neural network algorithm such as LeNet, AlexNet, or GoogLeNet. Because the attribute recognition neural network is trained with a large number of object samples and the attributes of each object sample, the trained attribute recognition neural network can identify the attributes of the objects contained in an image without relying on features set from human experience. As the number of training samples increases, the recognition accuracy of the attribute recognition neural network becomes higher and the recognition effect more stable.

Of course, before the image block of an object is input to the attribute recognition neural network for recognition, the image block may be scaled, and the scaled image block is then input to the attribute recognition neural network. When the image block is reduced (that is, downsampled), the amount of data the attribute recognition neural network has to process is reduced, which improves the processing speed. It is also possible to enlarge the image block so that its size matches the size of the object samples used to train the attribute recognition neural network, thereby obtaining a better attribute recognition result.
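As an illustration of the scaling step, the sketch below resizes an image block with nearest-neighbor sampling. Nearest-neighbor interpolation and the nested-list image representation are assumptions made for brevity; the application does not state which scaling method is used.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; stand-in for real image data


def scale_nearest(block: Image, out_w: int, out_h: int) -> Image:
    """Resize `block` to out_w x out_h by picking the nearest source pixel."""
    in_h, in_w = len(block), len(block[0])
    return [
        [block[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

The same function covers both cases mentioned above: a smaller target size downsamples the block, and a larger one enlarges it to match the training sample size.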

The image recognition method provided by the embodiment of the present application will be described in detail below with reference to FIG. 2 .

Referring to FIG. 2, in the embodiment of the present application, after the CPU performs pre-processing on the original image, the image to be identified can be obtained. Thereafter, the CPU can send the image to be identified to the coprocessor. After receiving the image to be identified, the coprocessor may input the image to be recognized into a pre-built content recognition neural network.

When the content recognition neural network can, in addition to the category and location area of each object contained in an image, also compute a confidence for the category of each object, the content recognition neural network can output: the categories of object 1, object 2, object 3, ..., object N-1, and object N contained in the image to be recognized, the location areas of these N objects in the image to be recognized, and the confidence corresponding to the category of each of the N objects. Here, the confidence refers to the credibility of the identified category.

In this way, the coprocessor can filter the objects according to their confidence. Specifically, it determines whether the confidence corresponding to the category of an object is greater than a preset threshold. If it is greater than the preset threshold, the identified category of the object is highly credible, and the object can continue to be input to the pre-built attribute recognition neural network to identify its attributes. If it is less than the preset threshold, the identified category of the object is not very credible, and the object is not input to the attribute recognition neural network for subsequent attribute recognition. The coprocessor therefore does not go on to identify the attributes of objects whose identified categories have low credibility; that is, some untrustworthy objects are removed. This improves the accuracy of the image recognition result and reduces the load on the coprocessor when identifying object attributes.
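The confidence filter amounts to a single comparison per object, as the following sketch shows. The dictionary layout of a detection is hypothetical, and treating a confidence exactly equal to the threshold as rejected is an assumption, since the text only covers the greater-than and less-than cases.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], threshold: float) -> List[Dict]:
    """Keep only detections whose category confidence exceeds the preset threshold."""
    return [d for d in detections if d["confidence"] > threshold]
```

Only the detections returned here would have their image blocks passed on to attribute recognition.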

It should be noted that the attribute recognition neural networks shown in FIG. 2 may be one and the same attribute recognition neural network, used to identify the same attribute feature (for example, a color feature) for every object. They may also be several different attribute recognition neural networks, each used to identify the attributes of objects of one category. For example, when the category of object 1 in FIG. 2 is vehicle, the attribute recognition neural network corresponding to object 1 may be one that identifies the color feature of a vehicle. When the category of object 2 is person, the attribute recognition neural network corresponding to object 2 may be one that identifies the gender feature of a person.

Of course, the attribute identification network corresponding to the object 1 and the object 2 may also be multiple. For example, the attribute recognition neural network corresponding to the object 1 may be an attribute recognition neural network that recognizes the color characteristics of the vehicle, and an attribute recognition neural network that identifies the vehicle type characteristics of the vehicle, and is of course not limited thereto.

In this way, different attribute recognition networks can be set for different categories of objects, and each object can be set to have multiple attribute recognition networks, so that multiple attributes of the object can be identified, so that richer attribute information can be obtained.

The attribute recognition neural network corresponding to the object 1 can be determined as follows: after determining that the category of the object 1 is a vehicle, the attribute recognition neural networks corresponding to the object 1 are determined based on the correspondence, recorded in the preset mapping relationship, between the category "vehicle" and both the attribute recognition neural network for identifying the color feature of the vehicle and the attribute recognition neural network for identifying the vehicle type feature of the vehicle. The attribute recognition neural networks can be set by a person skilled in the art according to actual needs, and are not enumerated here.
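As a minimal sketch of such a preset mapping relationship, the following uses a dictionary from category to a list of attribute recognition networks. The network functions here are hypothetical stand-ins that return fixed labels; the names `PRESET_MAPPING`, `networks_for` and `recognize_attributes` are illustrative assumptions, not part of the present application.

```python
from typing import Callable, Dict, List

AttributeNet = Callable[[object], str]


# Hypothetical stand-ins for pre-built attribute recognition neural networks.
def vehicle_color_net(image_block: object) -> str:
    return "vehicle-color"


def vehicle_type_net(image_block: object) -> str:
    return "vehicle-type"


def person_gender_net(image_block: object) -> str:
    return "person-gender"


# Preset mapping relationship: category -> attribute recognition networks.
PRESET_MAPPING: Dict[str, List[AttributeNet]] = {
    "vehicle": [vehicle_color_net, vehicle_type_net],
    "person": [person_gender_net],
}


def networks_for(category: str) -> List[AttributeNet]:
    """Look up the attribute recognition networks recorded for a category."""
    return PRESET_MAPPING.get(category, [])


def recognize_attributes(category: str, image_block: object) -> List[str]:
    """Run every network mapped to the category on the object's image block."""
    return [net(image_block) for net in networks_for(category)]
```

A category with no entry in the mapping simply yields no attribute networks in this sketch; how such objects should be handled is a design choice left open here.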

Of course, the image recognition method provided by the embodiment of the present application may also be described in conjunction with the schematic diagram shown in FIG. 3 .

Referring to FIG. 3, it is assumed that the camera continuously transmits image frames to be identified to the CPU in the electronic device. After preprocessing the received (N-1)th frame image, the CPU obtains an image to be recognized corresponding to the (N-1)th frame image. Afterwards, the CPU transmits the image to be recognized to the coprocessor. The coprocessor recognizes the image to be recognized corresponding to the (N-1)th frame image, identifies the location area, category, and attribute of each object included in the image to be recognized, and returns the identified location area, category, and attribute of each object to the CPU, so that the CPU uses the received location area, category, and attribute of each object as the image recognition result of the image to be recognized.

After transmitting the image to be recognized to the coprocessor, the CPU may continue to preprocess the received Nth frame image, and send the image to be recognized corresponding to the Nth frame image to the coprocessor, so that the coprocessor recognizes the image to be recognized corresponding to the Nth frame image. In this way, the CPU and the coprocessor can process images asynchronously and cooperatively, thereby improving the speed at which the electronic device recognizes images.
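The asynchronous cooperation between the CPU and the coprocessor can be modeled with a simple producer/consumer sketch. Here a worker thread stands in for the coprocessor, and `preprocess` and `recognize` are hypothetical placeholders for the CPU preprocessing and the coprocessor recognition; the sketch only illustrates the overlap of the two stages, not an actual device implementation.

```python
import queue
import threading
from typing import List


def preprocess(frame: str) -> str:
    # Stand-in for CPU preprocessing of a camera frame.
    return f"pre({frame})"


def recognize(image: str) -> str:
    # Stand-in for coprocessor recognition (location area, category, attribute).
    return f"result({image})"


def run_pipeline(frames: List[str]) -> List[str]:
    to_coprocessor: queue.Queue = queue.Queue(maxsize=1)
    results: List[str] = []

    def coprocessor_worker() -> None:
        while True:
            image = to_coprocessor.get()
            if image is None:  # sentinel: no more frames
                break
            results.append(recognize(image))

    worker = threading.Thread(target=coprocessor_worker)
    worker.start()
    for frame in frames:
        # While the worker handles frame N-1, the main thread preprocesses frame N.
        to_coprocessor.put(preprocess(frame))
    to_coprocessor.put(None)
    worker.join()
    return results
```

Because a single worker consumes the queue in order, results are returned in frame order in this sketch.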

In addition, when the coprocessor needs to identify the attributes of more objects, the image recognition method as shown in FIG. 4 can be used to improve the speed of image recognition.

Referring to FIG. 4, it is assumed that the coprocessor receives the image to be recognized corresponding to the (N-1)th frame image transmitted by the CPU. At this time, the coprocessor can input the image to be recognized corresponding to the (N-1)th frame image into the pre-built content recognition neural network, and identify the category and the location area of each object included in the image to be recognized. It is also assumed that, after the coprocessor recognizes the category and location area of each object included in the image to be recognized, it also needs to identify the attributes of a relatively large number of the identified objects. At this time, the objects included in the image to be recognized may be divided into two groups, to obtain a first group of objects and a second group of objects. Then, the coprocessor can identify the attributes of the first group of objects, which involves the larger amount of computation. Specifically, the coprocessor can input the image block corresponding to the location area of each object in the first group of objects to the attribute recognition neural network corresponding to that object, to obtain the attribute of each object in the first group of objects.

Moreover, the coprocessor can migrate the attribute recognition task of the second group of objects, which involves the smaller amount of computation, to the CPU. Specifically, the coprocessor sends the location area of each object in the second group of objects to the CPU, so that the CPU inputs the image block corresponding to the location area of each object in the second group of objects to the attribute recognition neural network corresponding to that object, to obtain the attribute of each object in the second group of objects.

In this way, the computing power of the CPU and the coprocessor can be fully utilized, and the attribute recognition speed is high. Moreover, when the attribute recognition pressure of the coprocessor is large, a part of the attribute recognition task can be sent to the CPU for processing, which avoids the situation that the coprocessor calculation pressure is large and the CPU waits.

Then, the coprocessor can send the category and the calculated attribute of each object in the first group of objects, and the category of each object in the second group of objects, to the CPU, so that the CPU combines the category and attribute of each object in the first group of objects with the category and attribute of each object in the second group of objects to obtain the image recognition result of the image to be recognized.
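The division of attribute recognition between the coprocessor and the CPU, followed by summarization, can be sketched as follows. The two groups are assumed to have been formed already, two threads stand in for the two processors, and the attribute networks are passed in as hypothetical callables; none of these names come from the present application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Obj = Tuple[str, str]                   # (category, image block identifier); assumed layout
AttributeNet = Callable[[str, str], str]


def recognize_group(objects: List[Obj], attribute_net: AttributeNet) -> List[Dict[str, str]]:
    """Identify the attribute of every object in one group."""
    return [{"category": c, "attribute": attribute_net(c, b)} for c, b in objects]


def split_recognition(
    first_group: List[Obj],
    second_group: List[Obj],
    coprocessor_net: AttributeNet,
    cpu_net: AttributeNet,
) -> List[Dict[str, str]]:
    """Process the two groups concurrently, then summarize both sets of results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(recognize_group, first_group, coprocessor_net)
        second = pool.submit(recognize_group, second_group, cpu_net)
        return first.result() + second.result()
```

The summarized list plays the role of the image recognition result: first-group entries followed by second-group entries.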

The objects included in the image to be recognized may be divided into two groups in either of the following ways: selecting a preset number of objects from the objects included in the image to be recognized as the first group of objects, and using the remaining objects as the second group of objects; or, using the objects of a first preset category (for example, the category "vehicle") among the objects included in the image to be recognized as the first group of objects, and using the objects that are not of the first preset category as the second group of objects. Either way is reasonable.
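The two grouping manners can be sketched as follows. How the preset number of objects is selected is not specified in the text; taking the first objects in list order is an assumption of this sketch, as are the function names and the dictionary representation of an object.

```python
from typing import Dict, List, Tuple

Obj = Dict[str, str]  # e.g. {"category": "vehicle"}; assumed representation


def split_by_count(objects: List[Obj], preset_number: int) -> Tuple[List[Obj], List[Obj]]:
    """First group: a preset number of objects (here, the first ones); second group: the rest."""
    return objects[:preset_number], objects[preset_number:]


def split_by_category(objects: List[Obj], first_preset_category: str) -> Tuple[List[Obj], List[Obj]]:
    """First group: objects of the first preset category; second group: all other objects."""
    first = [o for o in objects if o["category"] == first_preset_category]
    second = [o for o in objects if o["category"] != first_preset_category]
    return first, second
```

For three objects of categories vehicle, person, vehicle, `split_by_category(objects, "vehicle")` places both vehicles in the first group and the person in the second.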

In addition, when the coprocessor needs to recognize multiple attributes of an object, an image recognition method as shown in FIG. 5 can also be used to improve the speed of image recognition.

Referring to FIG. 5, it is assumed that, after the coprocessor recognizes the category and location area of each object included in the image to be recognized corresponding to the (N-1)th frame image, multiple attributes of the identified objects of a second preset category also need to be identified.

For example, when the second preset category is a vehicle, after an object is identified as a vehicle and the location area of the vehicle is identified, multiple attributes of the object, such as its color and vehicle type, also need to be identified. For objects that are not of the second preset category (for example, objects whose category is a person), only the color attribute needs to be identified. In this case, the vehicle type attribute can be used as the first type attribute and the color attribute as the second type attribute.

In the attribute recognition process, each object whose category is a vehicle (i.e., an object of the second preset category) can be regarded as a first object. The coprocessor sends the identified location area of each first object to the CPU, so that the CPU inputs the image block corresponding to the location area of each first object to the first type of attribute recognition neural network among the attribute recognition neural networks corresponding to the first object (i.e., the attribute recognition neural network for identifying the vehicle type of the vehicle), and obtains the first type attribute of each first object. In this way, when the attribute recognition load of the coprocessor is high, part of the attribute recognition task can be sent to the CPU for processing, which avoids the situation in which the coprocessor is heavily loaded while the CPU waits, and improves the image recognition speed.

The coprocessor, for its part, may input the image block corresponding to the location area of each first object to the second type of attribute recognition neural network among the attribute recognition neural networks corresponding to the first object (i.e., the attribute recognition neural network for identifying the color of the vehicle), to obtain the second type attribute of each first object. Meanwhile, the coprocessor may treat each object whose category is not a vehicle (i.e., an object that is not of the second preset category) as a second object, and input the image block corresponding to the location area of each second object (for example, an object whose category is a person) to the second type of attribute recognition neural network among the attribute recognition neural networks corresponding to the second object (i.e., the attribute recognition neural network for identifying the hair color of the person), to obtain the second type attribute of each second object.

Then, the coprocessor can send the identified second type attribute and category of each first object, and the second type attribute and category of each second object, to the CPU, so that the CPU combines the first type attribute, second type attribute, and category of each first object with the second type attribute and category of each second object to obtain the image recognition result of the image to be recognized.
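The assignment of first type and second type attribute tasks, and the final summarization by the CPU, can be sketched as follows. The task tuples, the attribute type labels `"first_type"` and `"second_type"`, and the dictionary layout of the result are illustrative assumptions rather than structures defined by the present application.

```python
from typing import Dict, List, Tuple

Task = Tuple[int, str]  # (object index, attribute type); assumed layout


def plan_attribute_tasks(
    categories: List[str], second_preset_category: str
) -> Tuple[List[Task], List[Task]]:
    """Assign first type attributes of first objects to the CPU and all second type attributes to the coprocessor."""
    cpu_tasks: List[Task] = []
    coprocessor_tasks: List[Task] = []
    for index, category in enumerate(categories):
        if category == second_preset_category:
            cpu_tasks.append((index, "first_type"))
        coprocessor_tasks.append((index, "second_type"))
    return cpu_tasks, coprocessor_tasks


def summarize(categories: List[str], attributes: Dict[Task, str]) -> List[dict]:
    """Combine the category with every attribute obtained for each object."""
    results = []
    for index, category in enumerate(categories):
        attrs = {kind: value for (i, kind), value in attributes.items() if i == index}
        results.append({"category": category, "attributes": attrs})
    return results
```

With categories vehicle and person and the second preset category set to vehicle, the CPU receives one first type task for the vehicle, while the coprocessor receives a second type task for each object.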

The first type of attribute recognition neural network and the second type of attribute recognition neural network can be trained according to actual needs. Illustratively, the first type of attribute recognition neural network may include a first number of attribute recognition neural networks, each of which is used to identify a different attribute of objects of the same category. The second type of attribute recognition neural network may include a second number of attribute recognition neural networks, each of which is used to identify a different attribute of objects of the same category. The attributes identified by the first type of attribute recognition neural network are not the same as the attributes identified by the second type of attribute recognition neural network.

In addition, the second preset category may also be set according to actual conditions, and is not limited herein.

In summary, the embodiment of the present application can identify the location area, category, and attributes of the object included in the image, and can reduce the calculation pressure of the CPU, and can improve the image recognition effect and the image recognition speed.

Corresponding to the above method embodiment, the embodiment of the present application further provides an electronic device, as shown in FIG. 6, the electronic device 600 includes a coprocessor 601 and a central processing unit CPU 602;

The CPU 602 is configured to send the image to be identified to the coprocessor 601.

The coprocessor 601 is configured to receive an image to be identified sent by the CPU 602.

The coprocessor 601 is further configured to input the image to be recognized to the pre-constructed content recognition neural network to obtain a content recognition result, where the content recognition result includes: a category and a location area of the object included in the image to be identified;

The coprocessor 601 is further configured to input the obtained image block corresponding to each location area to a pre-built attribute recognition neural network to obtain an attribute of each object;

The coprocessor 601 is further configured to send the obtained category and attribute of each object to the CPU 602;

The CPU 602 is further configured to receive the category and attribute of each object sent by the coprocessor 601, and use the category and attribute of the received object as the image recognition result of the image to be identified.

In the embodiment of the present application, the coprocessor in the electronic device may receive the image to be recognized sent by the CPU in the electronic device, and input the image to be recognized into the pre-built content recognition neural network, thereby obtaining the category and location area of each object included in the image to be recognized. Then, the coprocessor inputs the obtained image block corresponding to each location area into the pre-built attribute recognition neural network, so that the attribute of each object can be obtained. Further, the coprocessor can send the obtained category and attribute of each object to the CPU, so that the CPU can use the received category and attribute of each object as the image recognition result of the image to be recognized. In this manner, the coprocessor can identify the category and attribute of each object included in the image to be recognized by using the content recognition neural network and the attribute recognition neural network, and takes over part of the computation of image recognition from the CPU, thereby reducing the computational load on the CPU.

Optionally, the coprocessor 601 can be specifically configured to:

More specifically, the coprocessor 601 can be specifically used to:

Dividing the objects included in the image to be recognized into two groups, to obtain a first group of objects and a second group of objects; inputting the image block corresponding to the location area of each object in the first group of objects to the attribute recognition neural network corresponding to the object, to obtain the attribute of each object in the first group of objects; and sending the location area of each object in the second group of objects to the CPU;

Correspondingly, the CPU 602 may be configured to: input the image block corresponding to the location area of each object in the second group of objects to the attribute recognition neural network corresponding to the object, to obtain the attribute of each object in the second group of objects; and use the category and attribute of each object in the first group of objects, and the category and attribute of each object in the second group of objects, as the image recognition result of the image to be recognized.

Optionally, the coprocessor 601 can be specifically configured to:

Sending, to the CPU, the location area of each first object of the second preset category among the objects included in the image to be recognized; inputting the image block corresponding to the location area of each first object to the second type of attribute recognition neural network among the attribute recognition neural networks corresponding to the first object, to obtain the second type attribute of each first object; inputting the image block corresponding to the location area of each second object that is not of the second preset category among the objects included in the image to be recognized to the second type of attribute recognition neural network among the attribute recognition neural networks corresponding to the second object, to obtain the second type attribute of each second object; and sending the second type attribute and category of each first object, and the second type attribute and category of each second object, to the CPU;

Correspondingly, the CPU 602 is specifically configured to: input the image block corresponding to the location area of each first object to the first type of attribute recognition neural network among the attribute recognition neural networks corresponding to the first object, to obtain the first type attribute of each first object; and use the first type attribute, the second type attribute, and the category of each first object, and the second type attribute and category of each second object, as the image recognition result of the image to be recognized.

Optionally, the coprocessor 601 can be specifically configured to:

Sending the obtained location area, category, and attribute of each object to the CPU;

Correspondingly, the CPU 602 may be specifically configured to: use the location area, the category, and the attribute of the received object as the image recognition result of the image to be identified.

The coprocessor 601 is further configured to: before inputting the image block corresponding to each obtained location area to the pre-built attribute recognition neural network to obtain the attribute of each object, determine whether each obtained confidence level is greater than a preset threshold; if yes, use the object corresponding to the confidence level greater than the preset threshold as a filtered object; input the image block corresponding to the location area of each filtered object to the pre-built attribute recognition neural network for attribute recognition, to obtain the attribute of each filtered object; and send the category and attribute of each filtered object to the CPU.

Optionally, the coprocessor 601 includes at least one of a graphics processor GPU, a digital signal processor DSP, and a field programmable gate array processor FPGA.

Optionally, in the embodiment of the present application, the coprocessor 601 may be specifically configured to:

Optionally, in the embodiment of the present application, the CPU 602 is further configured to:

Corresponding to the above method embodiment, the embodiment of the present application further provides a readable storage medium, which is a storage medium in an electronic device including a coprocessor and a central processing unit CPU, and the readable storage medium A computer program is stored, and when the computer program is executed by the coprocessor, the following steps are implemented:

By applying the embodiments of the present application, the categories and attributes of the objects included in the image can be accurately identified, and the calculation pressure of the CPU can be reduced, and the image recognition effect and the image recognition speed are improved.

Corresponding to the above method embodiment, the embodiment of the present application further provides an application that, when running on an electronic device including a coprocessor and a central processing unit CPU, causes the coprocessor to execute:

It should be noted that, in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that any such actual relationship or order exists between these entities or operations. Furthermore, the terms "comprise", "include", or any other variations thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements not expressly listed, or elements that are inherent to such a process, method, article, or device. An element that is defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The various embodiments in the present specification are described in a related manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for an electronic device embodiment and a readable storage medium embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The above description is only the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are included in the scope of the present application.

Claims

An image recognition method is characterized in that it is applied to a coprocessor in an electronic device, and the electronic device further includes a central processing unit CPU, and the method includes:

Receiving an image to be recognized transmitted by the CPU;

Inputting the to-be-identified image into a pre-constructed content recognition neural network, and obtaining a content recognition result, where the content recognition result includes: a category and a location area of the object included in the image to be recognized;

Inputting the obtained image blocks corresponding to each location area into a pre-built attribute recognition neural network to obtain attributes of each object;

Sending the obtained category and attribute of each object to the CPU, so that the CPU uses the category and attribute of the received object as the image recognition result of the image to be recognized.
The method according to claim 1, wherein the step of inputting the obtained image block corresponding to each location area to a pre-built attribute recognition neural network to obtain attributes of each object comprises:

Determining an attribute recognition neural network corresponding to each object according to a preset mapping relationship and a category of each object included in the image to be identified; wherein the preset mapping relationship comprises: a correspondence between preset categories and pre-built attribute recognition neural networks;

Inputting the obtained image block corresponding to each location area to: the attribute recognition neural network corresponding to the object, and obtaining the attribute of each object.
The method according to claim 2, wherein the step of inputting the image block corresponding to each location area to the attribute recognition neural network corresponding to the object, and obtaining the attribute of each object comprises:

Dividing the objects included in the image to be identified into two groups, and obtaining a first group of objects and a second group of objects;

Inputting an image block corresponding to a location area of each object in the first group of objects to: an attribute recognition neural network corresponding to the object, and obtaining an attribute of each object in the first group of objects;

Sending a location area of each object in the second group of objects to the CPU, so that the CPU inputs an image block corresponding to the location area of each object in the second group of objects to: an attribute recognition neural network corresponding to the object, and obtains an attribute of each object in the second group of objects;

The step of sending the obtained category and attribute of each object to the CPU includes:

Sending a category and an attribute of each object in the first group of objects, and a category of each object in the second group of objects, to the CPU, so that the CPU uses the category and attribute of each object in the first group of objects, and the category and attribute of each object in the second group of objects, as the image recognition result of the image to be recognized.
The method according to claim 3, wherein the step of dividing the objects included in the image to be identified into two groups, and obtaining the first group of objects and the second group of objects comprises:

Selecting, from the objects included in the image to be identified, a preset number of objects as the first group of objects, and the remaining objects as the second group of objects;

or,

The object of the first preset category in the image to be identified is used as the first group of objects, and the object that is not the first preset category in the image to be identified is used as the second group of objects.
The method according to claim 2, wherein the step of inputting the image block corresponding to each location area to the attribute recognition neural network corresponding to the object, and obtaining the attribute of each object comprises:

Sending, to the CPU, a location area of each first object of the second preset category among the objects included in the image to be identified, so that the CPU inputs the image block corresponding to the location area of each first object to: a first type of attribute recognition neural network in the attribute recognition neural network corresponding to the first object, and obtains a first type attribute of each first object;

Inputting an image block corresponding to the location area of each of the first objects to: a second type of attribute recognition neural network in the attribute recognition neural network corresponding to the first object, and obtaining a second type attribute of each first object;

Inputting an image block corresponding to a location area of each second object that is not of the second preset category, among the objects included in the image to be recognized, to: a second type of attribute recognition neural network in the attribute recognition neural network corresponding to the second object, and obtaining a second type attribute of each second object;

The step of sending the obtained category and attribute of each object to the CPU includes:

Sending a second type attribute and category of each first object, and a second type attribute and category of each second object, to the CPU, so that the CPU uses the first type attribute, the second type attribute and the category of each first object, and the second type attribute and category of each second object, as the image recognition result of the image to be identified.
The method according to any one of claims 1 to 5, wherein the step of sending the obtained category and attribute of each object to the CPU, so that the CPU uses the category and attribute of the received object as the image recognition result of the image to be identified, comprises:

Sending the obtained location area, category, and attribute of each object to the CPU, so that the CPU uses the location area, category, and attribute of the received object as the image recognition result of the image to be recognized.
The method according to any one of claims 1 to 5, wherein the content recognition neural network is further configured to identify a confidence level corresponding to a category of the object included in the image; the content recognition result further includes: a confidence level corresponding to a category of an object included in the image;

Before the image blocks corresponding to each of the obtained location regions are input to the pre-built attribute recognition neural network to obtain the attributes of each object, the method further includes:

Determining whether the obtained confidence is greater than a preset threshold;

If yes, the object corresponding to the confidence level greater than the preset threshold is used as the filtered object;

The step of inputting the obtained image block corresponding to each location area to the pre-built attribute recognition neural network to obtain the attributes of each object includes:

Inputting the image block corresponding to the location area of each filtered object to the pre-built attribute recognition neural network for attribute recognition, and obtaining the attribute of each filtered object;

The step of sending the obtained category and attribute of each object to the CPU includes:

Sending the obtained category and attribute of each filtered object to the CPU.
The method of claim 1 wherein the coprocessor comprises at least one of a graphics processor GPU, a digital signal processor DSP, and a field programmable gate array processor FPGA.
The method according to claim 1, wherein the step of inputting the obtained image block corresponding to each location area to a pre-built attribute recognition neural network to obtain attributes of each object comprises:

Scaling the image block corresponding to each of the obtained location areas;

Each image block obtained after the scaling process is input to a pre-built attribute recognition neural network to obtain attributes of each object.
The method according to claim 1, wherein the image to be identified is obtained by the CPU performing image format conversion and scaling processing on the original image.
An electronic device, comprising: a coprocessor and a central processing unit CPU;

The CPU is configured to send an image to be identified to the coprocessor;

The coprocessor is configured to receive an image to be identified sent by the CPU;

The coprocessor is further configured to input the image to be recognized to a pre-constructed content recognition neural network, and obtain a content recognition result, where the content recognition result includes: a category and a location area of the object included in the image to be recognized;

The coprocessor is further configured to input the obtained image block corresponding to each location area to a pre-built attribute recognition neural network to obtain an attribute of each object;

The coprocessor is further configured to send the obtained category and attribute of each object to the CPU;

The CPU is further configured to: receive a category and an attribute of each object sent by the coprocessor, and use a category and an attribute of the received object as an image recognition result of the image to be identified.
The electronic device according to claim 11, wherein the coprocessor is specifically configured to:

Determining an attribute recognition neural network corresponding to each object according to a preset mapping relationship and a category of each object included in the image to be identified; wherein the preset mapping relationship comprises: a correspondence between preset categories and pre-built attribute recognition neural networks;

Inputting the obtained image block corresponding to each location area to: the attribute recognition neural network corresponding to the object, and obtaining the attribute of each object.
The electronic device according to claim 12, wherein the coprocessor is specifically configured to:

Dividing the objects included in the image to be identified into two groups, and obtaining a first group of objects and a second group of objects;

Inputting an image block corresponding to a location area of each object in the first group of objects to: an attribute recognition neural network corresponding to the object, and obtaining an attribute of each object in the first group of objects;

Sending a location area of each object in the second group of objects to the CPU; and sending a category and an attribute of each object in the first group of objects, and a category of each object in the second group of objects, to the CPU;

The CPU is specifically configured to: input an image block corresponding to a location area of each object in the second group of objects to: an attribute recognition neural network corresponding to the object, and obtain an attribute of each object in the second group of objects; and use a category and an attribute of each object in the first group of objects, and a category and an attribute of each object in the second group of objects, as the image recognition result of the image to be recognized.
The electronic device according to claim 13, wherein the coprocessor is specifically configured to:

Selecting, from the objects included in the image to be identified, a preset number of objects as the first group of objects, and the remaining objects as the second group of objects;

Alternatively, using the objects of a first preset category in the image to be identified as the first group of objects, and the objects not of the first preset category in the image to be identified as the second group of objects.
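The two grouping strategies in this claim could be sketched as follows. The claim does not say which objects make up the "preset number"; taking the first ones in detection order is an assumption of this sketch, as are the function names `split_by_count` and `split_by_category`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]      # assumed layout: (x, y, width, height)
Detection = Tuple[str, Box]          # (category, location area)


def split_by_count(objects: List[Detection], preset_number: int):
    """First strategy: a preset number of objects form the first group.

    Assumption: the first `preset_number` objects in list order are chosen.
    """
    return objects[:preset_number], objects[preset_number:]


def split_by_category(objects: List[Detection], first_preset_category: str):
    """Second strategy: objects of the first preset category form the first group."""
    first = [o for o in objects if o[0] == first_preset_category]
    second = [o for o in objects if o[0] != first_preset_category]
    return first, second
```

Splitting by count bounds the coprocessor's workload per image, while splitting by category keeps all objects of one kind on the same processor.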
The electronic device according to claim 12, wherein the coprocessor is specifically configured to:

Among the objects included in the image to be identified, sending the location area of each first object of a first preset category to the CPU, and inputting the image block corresponding to the location area of each first object to: the second type of attribute recognition neural network in the attribute recognition neural network corresponding to the first object, to obtain a second type of attribute of each first object;

Inputting, among the objects included in the image to be recognized, an image block corresponding to the location area of each second object of a second preset category to: the second type of attribute recognition neural network in the attribute recognition neural network corresponding to the second object, to obtain a second type of attribute of each second object;

Sending the second type of attribute and the category of each first object, and the second type of attribute and the category of each second object, to the CPU;

The CPU is specifically configured to: input an image block corresponding to the location area of each first object to: the first type of attribute recognition neural network in the attribute recognition neural network corresponding to the first object, and obtain a first type of attribute of each first object;

And use the first type of attribute, the second type of attribute and the category of each first object, and the second type of attribute and the category of each second object, as the image recognition result of the image to be recognized.
The electronic device according to any one of claims 11 to 15, wherein the coprocessor is specifically configured to: send the obtained location area, category and attribute of each object to the CPU;

The CPU is specifically configured to: use the received location area, category, and attribute of each object as the image recognition result of the image to be identified.
The electronic device according to any one of claims 11 to 15, wherein the content recognition neural network is further configured to identify a confidence level corresponding to the category of each object included in the image; the content recognition result further includes: a confidence level corresponding to the category of each object included in the image;

The coprocessor is further configured to: before inputting the image block corresponding to each obtained location area to the pre-built attribute recognition neural network to obtain the attribute of each object, determine whether each obtained confidence level is greater than a preset threshold; if yes, use the object corresponding to a confidence level greater than the preset threshold as a filtered object; input the image block corresponding to the location area of each filtered object to the pre-built attribute recognition neural network for attribute recognition, to obtain the attribute of each filtered object; and send the obtained category and attribute of each filtered object to the CPU.
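The confidence filtering step might look like the sketch below. The dictionary keys `category`, `box` and `confidence`, and the function name `filter_by_confidence`, are assumptions made for illustration; the comparison is strictly "greater than", as the claim is worded.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], preset_threshold: float) -> List[Dict]:
    """Keep only objects whose category confidence exceeds the preset threshold.

    Each detection is assumed to be a dict with "category", "box" and
    "confidence" keys; only the kept objects would go on to attribute
    recognition.
    """
    return [d for d in detections if d["confidence"] > preset_threshold]
```

Filtering before attribute recognition means low-confidence detections never reach the attribute recognition neural network, which saves computation on objects that are likely false detections.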
The electronic device of claim 11, wherein the coprocessor comprises at least one of a graphics processor (GPU), a digital signal processor (DSP), and a field programmable gate array (FPGA).
The electronic device according to claim 11, wherein the coprocessor is specifically configured to:

Scaling the image block corresponding to each obtained location area;

Inputting each image block obtained after the scaling processing to the pre-built attribute recognition neural network, to obtain the attribute of each object.
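The scaling step can be illustrated with a simple resize. The claim does not name an interpolation method; nearest-neighbour sampling is used here purely as an assumption because it is short to write, and the function name `scale_block` and the list-of-rows image representation are likewise illustrative.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image block: rows of pixel values


def scale_block(block: Image, out_width: int, out_height: int) -> Image:
    """Resize an image block to a fixed size by nearest-neighbour sampling."""
    in_height, in_width = len(block), len(block[0])
    return [
        [
            # Map each output pixel back to its nearest source pixel.
            block[r * in_height // out_height][c * in_width // out_width]
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]
```

Bringing every image block to the same size lets blocks cropped from location areas of different sizes share one network input shape.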
The electronic device according to claim 11, wherein the CPU is further configured to:

Perform image format conversion and scaling processing on an original image to obtain the image to be recognized.
A readable storage medium, characterized in that the readable storage medium is a storage medium in an electronic device including a coprocessor and a central processing unit (CPU), and the readable storage medium stores a computer program; the image recognition method according to any one of claims 1 to 10 is implemented when the computer program is executed by the coprocessor.
An application program, characterized in that, when run on an electronic device comprising a coprocessor and a central processing unit (CPU), it causes the coprocessor to perform the image recognition method of any one of claims 1 to 10.