CN114495039A - Object identification method and device, electronic equipment and storage medium

Object identification method and device, electronic equipment and storage medium

Info

Publication number
CN114495039A
CN114495039A (application CN202210072098.6A)
Authority
CN
China
Prior art keywords
neural network
target image
recognition
target
feature map
Prior art date
Legal status
Pending
Application number
CN202210072098.6A
Other languages
Chinese (zh)
Inventor
陈杜煜
Current Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202210072098.6A
Publication of CN114495039A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an object recognition method and apparatus, an electronic device, and a storage medium. The object recognition method includes: acquiring a target image to be recognized; determining a recognition task corresponding to the target image, where the recognition task indicates an object to be recognized; performing feature extraction on the target image with a first neural network to obtain a first feature map of the target image; and performing object recognition on the first feature map with a target second neural network to obtain a recognition result of the target image for the recognition task, where the target second neural network is the second neural network corresponding to both the size of the first feature map and the recognition task. With the method and apparatus of the application, the number of neural network models to be built and trained can be reduced, the time spent training them can be shortened, the utilization of computing resources can be improved, and the range of application of the neural network models can be widened.

Description

Object identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to an object identification method, apparatus, electronic device, and storage medium.
Background
With the development of computer vision and automatic driving technology, multiple cameras are generally used to recognize the environment and objects around a vehicle. In the related art, an image is input into a neural network model, and the model outputs a recognition result. The neural network model used for object recognition is tied not only to the recognition task but also to the input size, so a corresponding neural network model must be constructed in advance for each recognition task and each input size. Consequently, in scenarios with many recognition tasks and images of different sizes, many neural network models need to be built and trained, and training them takes a long time.
Disclosure of Invention
In view of the above, embodiments of the present application provide an object identification method, an object identification apparatus, an electronic device, and a storage medium to solve the above problem.
According to an aspect of the embodiments of the present application, there is provided an object recognition method including: acquiring a target image to be recognized; determining a recognition task corresponding to the target image, where the recognition task indicates an object to be recognized; performing feature extraction on the target image with a first neural network to obtain a first feature map of the target image; and performing object recognition on the first feature map with a target second neural network to obtain a recognition result of the target image for the recognition task, where the target second neural network is the second neural network corresponding to both the size of the first feature map and the recognition task.
According to an aspect of the embodiments of the present application, there is provided an object recognition apparatus including: an acquisition module, configured to acquire a target image to be recognized; a determining module, configured to determine a recognition task corresponding to the target image, where the recognition task indicates an object to be recognized; a first feature extraction module, configured to perform feature extraction on the target image with a first neural network to obtain a first feature map of the target image; and a first recognition module, configured to perform object recognition on the first feature map with a target second neural network to obtain a recognition result of the target image for the recognition task, where the target second neural network is the second neural network corresponding to both the size of the first feature map and the recognition task.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the object recognition method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement an object recognition method as described above.
In the solution of the application, the neural network model used for object recognition is split into a first neural network, which has no input size requirement and no association with any recognition task, and a second neural network, which does have an input size requirement and is tied to a recognition task. The first neural network performs feature extraction on an input image; the second neural network performs object recognition based on the feature map of the image and outputs a recognition result. On this basis, the target image to be recognized is input into the first neural network for feature extraction to obtain a first feature map; the corresponding target second neural network is determined according to the recognition task of the target image and the size of the first feature map, and the target second neural network then performs object recognition on the first feature map to obtain a recognition result of the target image for the recognition task.
Because the first neural network is not associated with any recognition task and has no input size requirement, it is applicable to images of different recognition tasks and of various sizes. Only the part that has a size requirement and is tied to a recognition task is treated as a second neural network; therefore, for images of different sizes and different recognition tasks, only the matching second neural networks need to be constructed in advance, and all such images can reuse the same first neural network.
The solution of the application thus decouples the neural network used for feature extraction from the neural network used for object recognition: only the latter is associated with a recognition task and imposes an input size requirement, so the feature extraction network can serve images of different sizes and different recognition tasks. This reduces the number of neural network models to be built and trained, shortens training time, improves the utilization of computing resources, and widens the range of application of the neural network models.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart illustrating an object recognition method according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating specific steps of step 120 according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating specific steps of step 140 according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating a display of a detection result of a reachable space around a vehicle according to an embodiment of the present application.
Fig. 5 is a flow diagram illustrating object recognition of a target image according to another embodiment of the application.
FIG. 6 is a flowchart illustrating specific steps prior to step 140 according to one embodiment of the present application.
Fig. 7 is a flowchart illustrating specific steps following step 620 according to an embodiment of the present application.
Fig. 8 is a flow diagram illustrating object recognition of a target image according to another embodiment of the present application.
Fig. 9 is a flow diagram illustrating object recognition of a target image according to another embodiment of the present application.
Fig. 10 is a block diagram illustrating an object recognition apparatus according to an embodiment of the present application.
Fig. 11 is a hardware block diagram of an electronic device shown in accordance with an exemplary embodiment of the present application.
While specific embodiments of the invention have been shown by way of example in the drawings and will be described in detail hereinafter, such drawings and description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by way of specific embodiments.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that an embodiment of the application can be practiced without one or more of the specific details, or with other methods, apparatus, steps, etc. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means. The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the perception tasks of automatic driving, multiple cameras are often needed to recognize the environment and objects around the vehicle. In the prior art, because different cameras have different pixel imaging characteristics and the technology is limited, images collected by a given camera can only be recognized by one fixed neural network, so the utilization of computing resources is low.
Therefore, to overcome the above drawbacks, the present application provides an object recognition method in which a first neural network for extracting shallow features is shared to obtain a first feature map, and the target second neural network used for object recognition is determined from a model set according to the recognition task of each image and the size of its first feature map. In this way, the shallow features of every target image to be recognized can be extracted by the same first neural network across different recognition tasks, which improves the utilization of computing resources.
Fig. 1 is a flowchart illustrating an object recognition method according to an embodiment of the present application, which may be performed by an electronic device with processing capability, such as a server, a cloud server, a vehicle-mounted terminal, and the like, and is not limited in detail herein. As shown in fig. 1, the method specifically includes the following steps:
step 110, a target image to be identified is obtained.
The target image is the image to be recognized, acquired by an image acquisition device; for example, if the image acquisition device is installed on a vehicle, the target image may be an image acquired by a device on the vehicle. In a specific embodiment, multiple image acquisition devices may be installed on the vehicle, for example, a left front view camera for capturing images of the front left side of the vehicle, a right front view camera for capturing images of the front right side of the vehicle, a front view camera for capturing images directly in front of the vehicle, and the like; an image captured by any camera on the vehicle that needs to be recognized may be used as a target image in the present application.
Step 120, determining an identification task corresponding to the target image, wherein the identification task is used for indicating an object to be identified.
The recognition task may be set according to the actual application scenario. For example, in the field of intelligent driving, the recognition task may be a task of recognizing pedestrians, a task of recognizing lane lines, a task of recognizing fixed obstacles, a task of recognizing vehicle lines, a task of recognizing traffic signals, a task of recognizing the reachable space of the vehicle (i.e., the space in which the vehicle can travel; driving space, DS), and the like. The recognition task indicates the object to be recognized; for example, the object to be recognized for the pedestrian recognition task is a pedestrian, and the object to be recognized for the lane line recognition task is a lane line.
In some embodiments, before step 120, the target image may be marked with a recognition task identifier, where one recognition task identifier indicates one recognition task; after the target image is acquired, the recognition task corresponding to the target image can be determined from the recognition task identifier with which it is marked.
In other embodiments, the identification task corresponding to each image capturing device may be set, so that after the target image is acquired, the image capturing device from which the target image originates is determined, and the identification task corresponding to the image capturing device from which the target image originates is used as the identification task corresponding to the target image.
In other embodiments, as shown in FIG. 2, step 120 includes:
step 210, obtaining the installation position information of the image acquisition device from which the target image comes.
The image acquisition device can be an on-vehicle camera, a camera, or other equipment (such as a vehicle event recorder) with an integrated image acquisition function.
The installation position information indicates the installation position of the corresponding image acquisition device. For example, on a vehicle the image acquisition device may be an on-vehicle camera, including but not limited to cameras at various positions such as the front, rear, and sides of the vehicle, and the installation position information corresponding to each on-vehicle camera indicates its installation position on the vehicle.
Step 220, based on the corresponding relationship between the installation position and the recognition task, determining the recognition task corresponding to the installation position indicated by the installation position information as the recognition task corresponding to the target image.
The corresponding relationship between the installation position and the identification task may be preset, wherein one installation position may correspond to one identification task or to multiple identification tasks, and is not specifically limited herein, and may be specifically set according to actual needs.
Specifically, the correspondence between installation position and recognition task may be as follows: for an image acquisition device installed at the front of the vehicle, the recognition task may be to recognize traffic lights, obstacles, and the like in the target image; for an image acquisition device installed on the left side of the vehicle, the recognition task may be to recognize the lane lines around the vehicle. The specific correspondence between installation position and recognition task can be set according to actual needs and is not specifically limited here.
In other embodiments, different identifiers may be prepared in advance for image capturing devices with different mounting positions, a mapping relationship is established between each identifier and a corresponding recognition task, and after the mounting position information of the image capturing device from which the target image originates is determined, the recognition task corresponding to the image captured by the image capturing device may be determined according to the identifier of the image capturing device.
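To make the correspondence concrete, the following is a minimal sketch of how an installation-position-to-task lookup could be kept, assuming Python; the position names, task names, and the plain-dictionary layout are illustrative assumptions, not structures taken from the patent.

```python
# Preset correspondence between installation position and recognition task(s).
# One position may map to one task or to several tasks, as step 220 allows.
INSTALL_POSITION_TO_TASKS = {
    "front":      ["traffic_light", "obstacle"],
    "left_side":  ["lane_line"],
    "right_side": ["lane_line"],
}

def recognition_tasks_for(install_position: str) -> list[str]:
    """Return the recognition task(s) for the camera at this installation position."""
    return INSTALL_POSITION_TO_TASKS.get(install_position, [])

# Usage: a target image from the front camera is routed to the traffic-light
# and obstacle recognition tasks.
print(recognition_tasks_for("front"))  # ['traffic_light', 'obstacle']
```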
Referring to fig. 1, in step 130, feature extraction is performed on the target image by the first neural network to obtain a first feature map of the target image.
The first feature map is a feature map output after feature extraction is performed on the target image by the first neural network.
In some embodiments, the first neural network may include one or more networks for extracting features of the target image, such as a convolutional neural network, a deep residual shrinkage network, a feedforward network, a Hopfield network, an inverse image network, an AlexNet network, a VGG network, a GoogLeNet network, a ResNet network, and the like. Specifically, in the embodiments of the present application, a ResNet network is used to perform feature extraction on the target image. The ResNet network is internally a purely convolutional structure, whose characteristic is that it imposes no requirement on the input size. For example, if the input of the first neural network is a picture of size H × W (where H is the length of the picture and W is the width), the output is a feature map of size C × H × W, where C is the number of channels (i.e., the depth of the image).
Specifically, the feature extraction performed on the target image may extract shallow features of the target image, for example, low-level visual features of the image such as texture, color, and shape. In the embodiments of the present application, the first neural network may be used to extract feature points, feature lines, object contours, and the like in the target image.
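The size-agnostic behavior described above can be illustrated with a minimal sketch, assuming PyTorch; the two-layer network below stands in for the ResNet backbone named in the patent, and all names and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ShallowBackbone(nn.Module):
    """First neural network: purely convolutional, so it accepts any H x W."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input (N, 3, H, W) -> output (N, C, H, W): 3x3 kernels with padding=1
        # preserve the spatial size, matching the C x H x W claim above.
        return self.features(x)

backbone = ShallowBackbone()
for h, w in [(300, 400), (600, 800)]:
    fmap = backbone(torch.randn(1, 3, h, w))
    print(fmap.shape)  # torch.Size([1, 64, 300, 400]) / torch.Size([1, 64, 600, 800])
```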
In some embodiments, since the target image captured by the image acquisition device may suffer from exposure problems, noise, and the like that degrade its quality, the target image may be preprocessed in advance to ensure the accuracy of the recognition result. Preprocessing the target image may include denoising it, white balance processing, and cropping. Compared with the target image before preprocessing, the preprocessed target image has less noise, a more appropriate white balance, higher definition, and higher image quality.
Step 140, object recognition is performed on the first feature map by a target second neural network to obtain a recognition result of the target image for the recognition task, where the target second neural network is the second neural network corresponding to both the size of the first feature map and the recognition task.
In the scheme of the application, a plurality of second neural networks can be preset according to the recognition task and the input size requirement, and the second neural networks are trained so as to be convenient for subsequent online application.
After the identification task and the size of the first feature map corresponding to the target image are determined, the corresponding second neural network can be determined by combining the identification task and the size of the first feature map, namely the target second neural network.
In some embodiments, the second neural network may be a fully-connected network for classification, which may include one or more fully-connected layers.
In other embodiments, the second neural network may include a second feature extraction network and a fully connected network in cascade, where the second feature extraction network is used to perform deep feature extraction on the first feature map output by the first neural network and may be a convolutional neural network, a recurrent neural network, a long short-term memory network, or the like; the fully connected network is used to classify the feature map output by the second feature extraction network and to output a classification recognition result.
For fully connected networks, different networks have different size requirements on their input: a fully connected network of a given structure fits an input of one size. Therefore, a second neural network that contains a fully connected network also has a size requirement on its input, i.e., the second neural network has a required input size indicating the size of input it accepts, and this required input size is determined by the structure of the fully connected network inside the second neural network.
The recognition result may be used to indicate whether an object indicated by the corresponding recognition task exists in the target image, and if it is determined that the object exists, the recognition result may further indicate a position of the object indicated by the recognition task in the target image.
Fig. 3 is a flowchart showing the specific steps of step 140 according to an embodiment of the present application. In some embodiments, the target second neural network comprises a cascade of a target convolutional neural network and a fully connected layer, and step 140 comprises:
Step 310, deep feature extraction is performed on the first feature map by the target convolutional neural network to obtain a second feature map.
The target convolutional neural network is the convolutional neural network inside the target second neural network; in particular, it may have multiple layers, which is not limited here. In this embodiment, although both the target convolutional neural network and the first neural network perform feature extraction, the first neural network extracts shallow features, which amounts to coarse-grained feature extraction, while the target convolutional neural network extracts deep features, i.e., fine-grained feature extraction.
Step 320, the second feature map is classified by the fully connected layer to obtain the recognition result of the target image for the recognition task.
In some embodiments, the second neural network may be set up with N classes, and the last fully connected layer is given N neurons accordingly, so that it outputs a 1 × N vector in which each dimension corresponds to one class. After a softmax activation function, each dimension of the vector is normalized to the range 0 to 1 and represents the probability that the second feature map belongs to the corresponding class; the class with the highest probability value is then output, and this output is the recognition result of the target image for the recognition task.
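As an illustration of such a head, the following is a minimal sketch, assuming PyTorch; it cascades one convolutional layer (step 310) with a fully connected classifier (step 320), and the fully connected layer's fixed in_features is what ties the head to one required input size. The layer sizes and class count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    """Second neural network: conv stage for deep features + FC classifier."""
    def __init__(self, in_channels: int, height: int, width: int, num_classes: int):
        super().__init__()
        self.deep_features = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        # in_features is fixed at construction time, so this head only accepts
        # feature maps of exactly (in_channels, height, width).
        self.classifier = nn.Linear(in_channels * height * width, num_classes)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        second_fmap = torch.relu(self.deep_features(fmap))         # second feature map (step 310)
        return self.classifier(second_fmap.flatten(start_dim=1))   # 1 x N logits (step 320)

# Small illustrative sizes; softmax turns the 1 x N logits into class probabilities.
head = RecognitionHead(in_channels=8, height=30, width=40, num_classes=5)
probs = torch.softmax(head(torch.randn(1, 8, 30, 40)), dim=1)
print(probs.argmax(dim=1))  # the most probable class is the recognition result
```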
In the field of intelligent driving, the recognition result can be sent to the vehicle, the recognition result can be displayed by the vehicle, and further, the vehicle can be combined with the recognition result to make a driving decision, such as determining whether to adjust the driving direction, the driving speed, whether to stop the vehicle, and the like.
Fig. 4 is a diagram illustrating the result of recognizing the vehicle-reachable space according to an embodiment of the present application. The dashed black lines in Fig. 4 indicate the boundary lines of the space reachable by the vehicle (that is, the boundary lines of obstacles).
Assume the target image has size H × W, where H is the length of the target image and W is its width. The first neural network first performs shallow feature extraction on the image, for example extracting feature lines, and outputs a first feature map of size H × W for the target image. The target second neural network corresponding to the recognition task of the target image and the size of the first feature map is then determined, and it recognizes the reachable space of the vehicle from the first feature map: the target convolutional neural network inside it extracts boundary-line features from the first feature map of the target image to obtain a second feature map, and the fully connected layer then classifies according to the second feature map and outputs a recognition result, which may indicate the coordinates, in the target image, of W points on the boundary line of the vehicle-reachable space.
Further, the server may send the coordinates of the W points to the vehicle end, which marks them on the target image and displays it on the vehicle's screen. Optionally, the server may also send the coordinates of the W points to an electronic device with a display function that is communicatively connected to the server, such as a smartphone, a smart watch, or a tablet computer, which is not specifically limited here; the communication connection may be over the same local area network, Bluetooth, a ZigBee network, or the like, which is likewise not limited here.
Fig. 5 is a flowchart illustrating object recognition of a target image according to another embodiment of the application. As shown in Fig. 5, the target images to be recognized include a first target image 510 with a size of 400 × 300 and a second target image 520 with a size of 800 × 600; feature extraction is performed on the first target image 510 and the second target image 520 by a backbone network (i.e., a first neural network) 530, and the corresponding first feature maps are output. In this embodiment, the backbone is a ResNet network.
The backbone imposes no requirement on the size of the input target image, and the output feature map has the same spatial size as the input image. Thus, the backbone outputs a first feature map of size 400 × 300 for the first target image 510 and a first feature map of size 800 × 600 for the second target image 520.
Then, the recognition task corresponding to the first target image 510 and the recognition task corresponding to the second target image 520 are determined according to the installation information of the image capturing apparatus from which the first target image 510 and the second target image 520 are derived.
In the present embodiment, assume that a second neural network for recognizing object I (head α 540 in Fig. 5) and a second neural network for recognizing object II (head β 550 in Fig. 5) are provided, where the required input size of head α 540 is 400 × 300 and the required input size of head β 550 is 800 × 600.
If the recognition task of the first target image 510 includes a task for recognizing the object I (assumed as the first recognition task for convenience of description), and the first feature map of the first target image 510 satisfies the input size requirement of the head α 540, the first feature map of the first target image 510 may be input into the head α 540, and the recognition result corresponding to the first recognition task may be output by the head α 540 based on the first feature map of the first target image 510.
If the recognition task of the second target image 520 includes a task for recognizing the object II (assumed as the second recognition task for convenience of description), and the first feature map of the second target image 520 satisfies the input size requirement of the head β 550, the first feature map of the second target image 520 may be input into the head β 550, and the recognition result corresponding to the second recognition task may be output by the head β 550 based on the first feature map of the second target image 520.
In the solution of the application, the neural network model used for object recognition is split into a first neural network, which has no input size requirement and no association with any recognition task, and a second neural network, which does have an input size requirement and is tied to a recognition task. The first neural network performs feature extraction on an input image; the second neural network performs object recognition based on the feature map of the image and outputs a recognition result. On this basis, the target image to be recognized is input into the first neural network for feature extraction to obtain a first feature map; the corresponding target second neural network is determined according to the recognition task of the target image and the size of the first feature map, and the target second neural network then performs object recognition on the first feature map to obtain a recognition result of the target image for the recognition task.
Because the first neural network is not associated with any recognition task and has no input size requirement, it is applicable to images of different recognition tasks and of various sizes. Only the part that has a size requirement and is tied to a recognition task is treated as a second neural network; therefore, for images of different sizes and different recognition tasks, only the matching second neural networks need to be constructed in advance, and all such images can reuse the same first neural network.
The solution of the application thus decouples the neural network used for feature extraction from the neural network used for object recognition: only the latter is associated with a recognition task and imposes an input size requirement, so the feature extraction network can serve images of different sizes and different recognition tasks. This reduces the number of neural network models to be built and trained, shortens training time, improves the utilization of computing resources, and widens the range of application of the neural network models.
In some embodiments, as shown in fig. 6, prior to step 140, the method further comprises:
Step 610, recognition object information corresponding to each second neural network in a model set is obtained, where the recognition object information indicates the object that the corresponding second neural network is used to recognize.
The identification object information is used to indicate the object for identification by the corresponding second neural network, that is, the identification object information may be a main feature characterizing the object for identification by the second neural network, for example, when the identification object is a traffic light, the identification object information may be color, shape, position information in a target image, and the like of the traffic light.
In other embodiments, different identifiers may further be set for different recognition objects, and the recognition object information may be the identifier set for a recognition object; once the identifier corresponding to each second neural network in the model set is obtained, the object that the second neural network is used to recognize can be determined from its identifier.
Step 620, the recognition task is matched against the recognition object information corresponding to each second neural network, and the candidate second neural networks in the model set whose recognition object information indicates an object matching the object indicated by the recognition task are determined.
In some embodiments, when the object indicated by a recognition task is the same as the object indicated by the recognition object information corresponding to a second neural network in the model set, the object indicated by the recognition task may be determined to match the object indicated by the recognition object information corresponding to the second neural network, thereby determining the second neural network as a candidate second neural network.
In other embodiments, when the object indicated by the identification object information corresponding to a second neural network includes an object indicated by an identification task, it may be determined that the object indicated by the identification task matches the object indicated by the identification object information corresponding to the second neural network.
Step 630, the size of the first feature map is matched against the required input size corresponding to each candidate second neural network, and the candidate second neural network whose required input size is the same as the size of the first feature map is determined; this candidate second neural network is taken as the target second neural network.
In this embodiment, the recognition task is first matched with the recognition object information of each second neural network to find the candidate second neural networks whose recognition object is the same as that indicated by the recognition task; the target second neural network, whose required input size equals the size of the first feature map, is then determined among the candidates, so that the target second neural network can recognize the first feature map.
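A minimal sketch of this two-stage matching follows, assuming Python and a simple list-based model set (an illustrative layout, not the patent's concrete data structure); it also anticipates the fallback of step 710 below by reporting whether a resize is still needed.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class HeadEntry:
    recognition_object: str                # what this second neural network recognizes
    required_input_size: tuple[int, int]   # (H, W) it accepts
    head: Any                              # the network itself, e.g. a RecognitionHead

def select_target_head(model_set: list[HeadEntry], task_object: str,
                       fmap_size: tuple[int, int]) -> tuple[HeadEntry, bool]:
    # Step 620: keep only heads whose recognition object matches the task.
    candidates = [e for e in model_set if e.recognition_object == task_object]
    if not candidates:
        raise LookupError(f"no second neural network recognizes {task_object!r}")
    # Step 630: prefer a candidate whose required input size equals the size
    # of the first feature map; the bool tells the caller whether a resize
    # (step 720 below) is still needed.
    for entry in candidates:
        if entry.required_input_size == fmap_size:
            return entry, False
    # Step 710 below: no exact size match, pick one candidate and resize later.
    return candidates[0], True
```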
In some embodiments, after step 620, the method further comprises:
Step 710, if it is determined that no candidate second neural network has a required input size equal to the size of the first feature map, one candidate second neural network is selected as the target second neural network.
Step 720, the first feature map is resized according to the required input size corresponding to the target second neural network, so that the size of the adjusted first feature map is the same as that required input size.
In particular embodiments, in order for the target second neural network to be able to recognize the first feature map, the size of the first feature map may be adjusted with an interpolation algorithm; the adjustment may be an enlargement or a reduction, as the case requires. Interpolation algorithms are commonly used in the image field to adjust image size: the points of the new image matrix are calculated from the points of the old image matrix and inserted, thereby generating an image of a different size.
Alternatively, a nearest neighbor interpolation algorithm, a bilinear interpolation algorithm, or a bicubic interpolation algorithm may be used. In a specific embodiment, the first feature map is resized with a bilinear interpolation algorithm, also called first-order interpolation, which computes the gray value of a sample point from the coordinates and gray values of the four points surrounding it (points of the old image matrix): first, linear interpolation is performed in the horizontal x direction (two first-order linear interpolations are needed), and then in the vertical y direction (only one first-order linear interpolation is needed). For example, let P(x, y) be a sample point whose four surrounding points are Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2). First, the first-order linear interpolations of point P in the x direction are computed, giving two temporary points R1(x, y1) and R2(x, y2):

f(R1) = ((x2 − x) / (x2 − x1)) · f(Q11) + ((x − x1) / (x2 − x1)) · f(Q21) (formula 1)

f(R2) = ((x2 − x) / (x2 − x1)) · f(Q12) + ((x − x1) / (x2 − x1)) · f(Q22) (formula 2)

where f(R1) and f(R2) denote the interpolated gray values at the temporary points R1 and R2, and f(Q11), f(Q21), f(Q12), and f(Q22) denote the gray values of the points Q11, Q21, Q12, and Q22.

Then, the first-order linear interpolation of point P in the y direction is computed as follows:

f(P) = ((y2 − y) / (y2 − y1)) · f(R1) + ((y − y1) / (y2 − y1)) · f(R2) (formula 3)

where f(P) denotes the gray value of point P.

Substituting formula 1 and formula 2 into formula 3 gives the gray value of point P:

f(P) = [f(Q11)(x2 − x)(y2 − y) + f(Q21)(x − x1)(y2 − y) + f(Q12)(x2 − x)(y − y1) + f(Q22)(x − x1)(y − y1)] / ((x2 − x1)(y2 − y1))

Then, for each point of the new image, the corresponding coordinate in the original image is determined from the coordinate of the point in the new image, the size of the original image, and the required size; specifically, it can be calculated with the following formulas:

srcX = dstX × (srcWidth / dstWidth) (formula 4)

srcY = dstY × (srcHeight / dstHeight) (formula 5)

where (dstX, dstY) is the coordinate of the point in the new image, srcWidth and srcHeight are the width and length of the original image, dstWidth and dstHeight are the width and length of the new image, and (srcX, srcY) is the corresponding coordinate in the original image. When formulas 4 and 5 are used, the geometric center of the original image does not coincide with the geometric center of the new image, and the pixel points near the border of the original image cannot be bilinearly interpolated; to solve these problems, formulas 4 and 5 are optimized as follows:

srcX = (dstX + 0.5) × (srcWidth / dstWidth) − 0.5 (formula 6)

srcY = (dstY + 0.5) × (srcHeight / dstHeight) − 0.5 (formula 7)

The gray value of each pixel in the new image is calculated with the formulas above at its mapped coordinate, thereby realizing the enlargement or reduction of the original image.
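In practice, the resizing of step 720 can be done with a library call. The following minimal sketch assumes PyTorch, whose F.interpolate in bilinear mode follows the scheme of formulas 1 to 3 and, with align_corners=False, the center-aligned coordinate mapping of formulas 6 and 7; the tensor sizes mirror the running example and are illustrative.

```python
import torch
import torch.nn.functional as F

fmap = torch.randn(1, 64, 600, 800)    # first feature map, shape (N, C, H, W)
required = (300, 400)                  # required input size of the target head

# Bilinear resizing of the first feature map to the head's required input size.
resized = F.interpolate(fmap, size=required, mode="bilinear", align_corners=False)
print(resized.shape)                   # torch.Size([1, 64, 300, 400])
```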
Step 730, object recognition is performed on the adjusted first feature map by the target second neural network to obtain the recognition result of the target image for the recognition task.
Since the size of the adjusted first feature map meets the required input size of the target second neural network, the adjusted first feature map is input into the target second neural network, which can then perform recognition based on it.
In this embodiment, when no candidate second neural network has a required input size equal to the size of the first feature map, one candidate second neural network is selected as the target second neural network and the first feature map is resized according to its required input size, so that the adjusted first feature map matches that required input size; the adjusted first feature map is then recognized to obtain the recognition result of the target image. This widens the range of application of the neural network models; in the training process, images acquired by image acquisition devices at multiple positions can be combined for training.
In order to ensure the accuracy of the recognition result, the first neural network and the second neural network need to be trained in advance. Specifically, a sample set is constructed in advance, and the sample set comprises a plurality of sample images and label information of the sample images, wherein the label information of the sample images is used for indicating position information of an object indicated by an identification task corresponding to the sample images in the sample images. The sample set comprises sample images with various sizes and sample images corresponding to various identification tasks.
In the training process, each sample image is respectively input into a first neural network, shallow layer features are extracted to obtain a first feature map of each sample image, then a target second neural network corresponding to the size of the first feature map of the sample image and an identification task corresponding to the sample image is determined, and a sample identification result corresponding to the identification task corresponding to the sample image is output by the target second neural network according to the first feature map of the sample image.
And then calculating a loss value of the loss function based on the label information of the sample image and the sample identification result corresponding to the sample image, if the loss value is not converged, reversely adjusting the parameters of the first neural network and the target second neural network, outputting the sample identification result for the sample image again through the first neural network and the target second neural network after the parameters are adjusted, and calculating the loss value of the loss function again until the loss value is converged.
And repeating the process for each sample image, and ending the training of the first neural network and the second neural network when the training ending condition is reached. And then, the first neural network and the second neural network are used for carrying out object identification on line, so that the accuracy of object identification can be ensured.
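The following is a minimal single-step training sketch, assuming PyTorch and the illustrative ShallowBackbone and RecognitionHead modules sketched earlier; the cross-entropy loss, the optimizer handling, and the dictionary of heads keyed by task and feature-map size are assumptions for illustration, not choices stated in the patent.

```python
import torch
import torch.nn as nn

def train_step(backbone, heads, sample, optimizer, loss_fn=nn.CrossEntropyLoss()):
    """One training step; the backbone and the matching head are adjusted jointly.

    `heads` maps (task_object, (H, W)) to a head; `optimizer` must have been
    built over the parameters of the backbone and of all heads.
    """
    image, task_object, label = sample          # label: class-index tensor
    optimizer.zero_grad()
    fmap = backbone(image)                      # shared shallow features
    head = heads[(task_object, tuple(fmap.shape[-2:]))]  # target second network
    loss = loss_fn(head(fmap), label)           # compare with the label information
    loss.backward()                             # "reversely adjust" both parts
    optimizer.step()
    return loss.item()
```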
Fig. 8 is a flowchart illustrating object recognition of a target image according to another embodiment of the application. As shown in Fig. 8, the target images to be recognized include a first target image 510 with a size of 400 × 300 and a second target image 520 with a size of 800 × 600; feature extraction is performed on the first target image 510 and the second target image 520 by a backbone α 810 (the first neural network), and the corresponding first feature maps are output. In this embodiment, backbone α 810 is a ResNet network.
The backbone α 810 imposes no requirement on the size of the input target image, and the output feature map has the same spatial size as the input image. Thus, the backbone α 810 outputs a feature map of size 400 × 300 for the first target image 510 and a feature map of size 800 × 600 for the second target image 520.
Then, the recognition task corresponding to the first target image 510 and the recognition task corresponding to the second target image 520 are determined according to the installation information of the image capturing apparatus from which the first target image 510 and the second target image 520 are derived.
In the present embodiment, assume that a second neural network for recognizing object I and a second neural network for recognizing object II are provided, where the required input size of head α 540 is 400 × 300 and the required input size of head β 550 is 800 × 600.
If the recognition task of the first target image 510 includes a task for recognizing the object I (assumed as the first recognition task for convenience of description), and the first feature map of the first target image 510 satisfies the input size requirement of the head α 540, the first feature map of the first target image 510 may be directly input into the head α 540, and the recognition result corresponding to the first recognition task may be output by the head α 540 based on the first feature map of the first target image 510.
If the recognition task of the first target image includes a task for recognizing object II (for convenience of description, the second recognition task), but the first feature map of the first target image 510 does not satisfy the input size requirement of head β 550, the first feature map of the first target image 510 is first resized by resize 820 so that its adjusted size is the same as the required input size of head β 550 (i.e., 800 × 600); the adjusted first feature map of the first target image is then input into head β 550, and head β 550 outputs the recognition result corresponding to the second recognition task based on it.
It can be understood that if the recognition tasks of the second target image 520 include the first recognition task, but the first feature map of the second target image 520 does not satisfy the input size requirement of head α 540, the first feature map of the second target image 520 is first resized by resize 820 so that its adjusted size is the same as the required input size of head α 540 (i.e., 400 × 300); the adjusted first feature map of the second target image is then input into head α 540, and head α 540 outputs the recognition result corresponding to the first recognition task based on it.
If the recognition task of the second target image 520 includes the second recognition task and the first feature map of the second target image 520 satisfies the input size requirement of the head β 550, the first feature map of the second target image 520 may be directly input into the head β 550, and the head β 550 outputs the recognition result corresponding to the second recognition task based on the first feature map of the second target image 520.
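Putting the pieces together, the following minimal sketch mirrors the flow of Fig. 8, assuming PyTorch and assuming the illustrative helpers from the earlier sketches (ShallowBackbone, RecognitionHead, select_target_head) are in scope; it is an assumed composition, not the patent's reference implementation.

```python
import torch
import torch.nn.functional as F

def recognize(image: torch.Tensor, task_object: str, backbone, model_set):
    fmap = backbone(image)                                    # step 130
    entry, needs_resize = select_target_head(                 # steps 620/630, 710
        model_set, task_object, tuple(fmap.shape[-2:]))
    if needs_resize:                                          # step 720: resize fmap
        fmap = F.interpolate(fmap, size=entry.required_input_size,
                             mode="bilinear", align_corners=False)
    logits = entry.head(fmap)                                 # steps 140 / 730
    return torch.softmax(logits, dim=1)                      # recognition result
```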
In some embodiments, the size of the feature map output by the first neural network is the same as the size of the image input to the first neural network. In a specific embodiment, before step 130, the method further comprises:
and carrying out size adjustment on the target image so that the size of the adjusted target image is the same as the required input size corresponding to the target second neural network.
Specifically, the first neural network may adopt a ResNet (residual neural network), which is internally a purely convolutional structure; a characteristic of the ResNet network is that it imposes no requirement on the size of the input image, and the output has the same spatial size as the input. After the target image to be recognized has been through feature extraction by the first neural network, the number of channels of the output first feature map changes, but its spatial size is the same as that of the target image. For example, for an input target image of size H × W (where H is the length of the target image and W is its width), the first neural network outputs a first feature map of size C × H × W, where C is the number of channels.
In some embodiments, the target image may be adjusted according to a required input size corresponding to the target second neural network, and the adjusted target image may be input into the first neural network.
Fig. 9 is a flowchart illustrating object recognition of a target image according to another embodiment of the application. As shown in Fig. 9, the target images to be recognized include a first target image 510 with a size of 400 × 300 and a second target image 520 with a size of 800 × 600, and the recognition task corresponding to the first target image 510 and the recognition task corresponding to the second target image 520 are determined according to the installation information of the image acquisition devices from which they originate.
In the present embodiment, assume that a second neural network for recognizing object I (head α 540 in Fig. 9) is provided, with a required input size of 400 × 300.
If the recognition task of the first target image includes the task of recognizing object I (for convenience of description, the first recognition task), and the first target image meets the input size requirement of head α 540, the first target image may be directly input into backbone β 910 for feature extraction to obtain a first feature map of the first target image. In this embodiment, backbone β 910 is a ResNet network; it imposes no requirement on the size of the input target image, and the output feature map has the same spatial size as the input. Therefore, backbone β 910 outputs a first feature map of size 400 × 300 for the first target image 510; this first feature map is then input into head α 540, which outputs the recognition result corresponding to the first recognition task based on the first feature map of the first target image 510.
If the recognition task of the second target image 520 includes the first recognition task, but the second target image 520 does not meet the input size requirement of head α 540, the second target image 520 is first resized by resize 820 so that its adjusted size is the same as the required input size of head α 540 (i.e., 400 × 300). The adjusted second target image is then input into backbone β 910 for feature extraction, and backbone β 910 outputs a first feature map of size 400 × 300 for the adjusted second target image; this first feature map is then input into head α 540, which outputs the recognition result corresponding to the first recognition task based on the first feature map of the adjusted second target image.
Fig. 10 is a block diagram illustrating an object recognition apparatus according to an embodiment of the present application. As shown in fig. 10, the object recognition apparatus 1000 includes: an acquisition module 1010, a determination module 1020, a first feature extraction module 1030, and a first recognition module 1040.
The acquisition module 1010 is configured to acquire a target image to be recognized; the determination module 1020 is configured to determine a recognition task corresponding to the target image, where the recognition task is used to indicate an object to be recognized; the first feature extraction module 1030 is configured to perform feature extraction on the target image by a first neural network to obtain a first feature map of the target image; and the first recognition module 1040 is configured to perform object recognition on the first feature map by a target second neural network to obtain a recognition result of the target image corresponding to the recognition task, where the target second neural network is the second neural network corresponding to the size of the first feature map and the recognition task.
In some embodiments, the object recognition apparatus 1000 further comprises: a recognition object information acquisition module, configured to acquire the recognition object information corresponding to each second neural network in the model set, where the recognition object information indicates the object that the corresponding second neural network is used to identify; a candidate second neural network determination module, configured to match the recognition task against the recognition object information corresponding to each second neural network and determine, in the model set, candidate second neural networks whose indicated objects match the object indicated by the recognition task; and a target second neural network determination module, configured to match the size of the first feature map against the required input size corresponding to each candidate second neural network, determine the candidate second neural network whose required input size is the same as the size of the first feature map, and take that candidate second neural network as the target second neural network.
In some embodiments, the object recognition apparatus 1000 further comprises: a selection module, configured to select one candidate second neural network as the target second neural network if no candidate second neural network has a required input size that is the same as the size of the first feature map; a first size adjustment module, configured to resize the first feature map according to the required input size corresponding to the target second neural network, so that the size of the adjusted first feature map is the same as that required input size; and a second recognition module, configured to perform object recognition on the adjusted first feature map by the target second neural network to obtain a recognition result of the target image corresponding to the recognition task.
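Purely as an illustrative sketch of this selection logic and its fallback (the Head record and its fields are assumed data structures, not ones defined by the patent):

from dataclasses import dataclass

@dataclass
class Head:
    module: object      # the second neural network itself
    objects: set        # recognition object information, e.g. {"object I"}
    input_size: tuple   # required input size as (H, W)

def select_head(model_set, task_objects, feature_size):
    # Match the recognition task against each head's recognition object information;
    # this assumes at least one head in the model set matches the task.
    candidates = [h for h in model_set if h.objects & task_objects]
    # Prefer a candidate whose required input size equals the feature map size.
    for h in candidates:
        if h.input_size == feature_size:
            return h, False  # the first feature map can be used as-is
    # Fallback: pick any candidate; the caller must resize the first feature map.
    return candidates[0], True

When the returned flag is True, the first feature map would be resized (for example with F.interpolate, as in the earlier sketch) to the selected head's required input size before recognition.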
In some embodiments, the size of the feature map output by the first neural network is the same as the size of the image input to the first neural network. The object recognition apparatus 1000 further includes a second size adjustment module, configured to resize the target image so that the size of the adjusted target image is the same as the required input size corresponding to the target second neural network.
In some embodiments, the target image corresponds to at least two recognition tasks.
In some embodiments, the determination module 1020 includes: an installation position information acquisition unit, configured to acquire the installation position information of the image acquisition device from which the target image comes; and a recognition task determination unit, configured to determine, based on the correspondence between installation positions and recognition tasks, the recognition task corresponding to the installation position indicated by the installation position information as the recognition task corresponding to the target image.
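A hedged sketch of this correspondence lookup; the installation positions and task names below are invented for illustration and are not values given by the patent:

# Assumed correspondence between installation positions and recognition tasks.
POSITION_TO_TASKS = {
    "front_windshield": {"traffic light recognition", "lane line recognition"},
    "rear_bumper": {"obstacle recognition"},
}

def tasks_for_image(installation_position: str) -> set:
    # The recognition task(s) for a target image follow from the installation
    # position of the image acquisition device that produced it.
    return POSITION_TO_TASKS.get(installation_position, set())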
In some embodiments, the target second neural network comprises a target convolutional neural network and a fully connected layer in cascade. The first recognition module 1040 includes: a feature extraction unit, configured to perform deep feature extraction on the first feature map by the target convolutional neural network to obtain a second feature map; and a classification unit, configured to classify the second feature map by the fully connected layer to obtain the recognition result of the target image corresponding to the recognition task.
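One way such a head could be structured, offered only as an assumption-laden sketch (the layer sizes and the global pooling before the fully connected layer are design choices not specified by the patent):

import torch
import torch.nn as nn

class HeadAlpha(nn.Module):
    # Cascade: a convolutional deep-feature extractor followed by a fully connected classifier.
    def __init__(self, in_channels: int = 64, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),  # collapse H x W (assumed design choice)
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, first_feature_map):                  # (N, 64, H, W)
        second_feature_map = self.conv(first_feature_map)  # deep feature extraction
        return self.fc(second_feature_map.flatten(1))      # classification result

# e.g. HeadAlpha()(torch.randn(1, 64, 300, 400)) -> logits of shape (1, 10)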
According to an aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of any of the above embodiments.
According to an aspect of the embodiments of the present application, there is also provided an electronic device. As shown in fig. 11, the electronic device 1100 includes a processor 1110 and one or more memories 1120, where the one or more memories 1120 are used to store program instructions executed by the processor 1110, and the processor 1110 implements the object recognition method described above when executing the program instructions.
Further, the processor 1110 may include one or more processing cores. The processor 1110 executes instructions, programs, code sets, or instruction sets stored in the memory 1120 and invokes data stored in the memory 1120. Optionally, the processor 1110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1110 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also be implemented by a separate communication chip without being integrated into the processor 1110.
According to an aspect of the present application, there is also provided a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the embodiments described above.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, and the like, or any suitable combination of the foregoing.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor, where the names of the units do not, in certain cases, constitute a limitation on the units themselves.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to the embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. An object recognition method, characterized in that the method comprises:
acquiring a target image to be identified;
determining a recognition task corresponding to the target image, wherein the recognition task is used for indicating an object to be identified;
performing feature extraction on the target image by using a first neural network to obtain a first feature map of the target image;
and carrying out object recognition on the first feature map by a target second neural network to obtain a recognition result of the target image corresponding to the recognition task, wherein the target second neural network is the second neural network corresponding to the size of the first feature map and the recognition task.
2. The method of claim 1, wherein before the object recognition of the first feature map by the target second neural network and obtaining the recognition result of the target image corresponding to the recognition task, the method further comprises:
obtaining recognition object information corresponding to each second neural network in a model set, wherein the recognition object information is used for indicating the object that the corresponding second neural network is used to identify;
matching the recognition task with the recognition object information corresponding to each second neural network, and determining candidate second neural networks in the model set whose corresponding recognition object information indicates an object matching the object indicated by the recognition task;
matching the size of the first feature map with the required input size corresponding to each candidate second neural network, and determining the candidate second neural network whose corresponding required input size is the same as the size of the first feature map; and taking the determined candidate second neural network whose required input size is the same as the size of the first feature map as the target second neural network.
3. The method of claim 2, wherein after matching the recognition task with the recognition object information corresponding to each second neural network and determining candidate second neural networks in the set of models for which the object indicated by the corresponding recognition object information matches the object indicated by the recognition task, the method further comprises:
if it is determined that no candidate second neural network has a corresponding required input size that is the same as the size of the first feature map, selecting one candidate second neural network as the target second neural network;
resizing the first feature map according to the required input size corresponding to the target second neural network, so that the size of the adjusted first feature map is the same as the required input size corresponding to the target second neural network;
and performing object recognition on the adjusted first feature map by the target second neural network to obtain a recognition result of the target image corresponding to the recognition task.
4. The method of claim 1, wherein the first neural network outputs a feature map having the same size as the image input to the first neural network;
before the feature extraction is performed on the target image by the first neural network and the first feature map of the target image is obtained, the method further includes:
and carrying out size adjustment on the target image so that the size of the adjusted target image is the same as the required input size corresponding to the target second neural network.
5. The method of claim 1, wherein the target image corresponds to at least two recognition tasks.
6. The method of claim 1, wherein the determining the recognition task corresponding to the target image comprises:
acquiring installation position information of an image acquisition device from which the target image comes;
and determining, based on the correspondence between installation positions and recognition tasks, the recognition task corresponding to the installation position indicated by the installation position information as the recognition task corresponding to the target image.
7. The method of claim 1, wherein the target second neural network comprises a cascade of a target convolutional neural network and a fully-connected layer;
the performing object recognition on the first feature map by the target second neural network to obtain a recognition result of the target image corresponding to the recognition task comprises:
deep feature extraction is carried out on the first feature map by the target convolutional neural network to obtain a second feature map;
and classifying the second feature map by the full connection layer to obtain an identification result of the target image corresponding to the identification task.
8. An object recognition apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image to be identified;
the determining module is used for determining a recognition task corresponding to the target image, and the recognition task is used for indicating an object to be recognized;
the first feature extraction module is used for performing feature extraction on the target image through a first neural network to obtain a first feature map of the target image;
and the first recognition module is used for carrying out object recognition on the first feature map by a target second neural network to obtain a recognition result of the target image corresponding to the recognition task, wherein the target second neural network refers to the second neural network corresponding to the size of the first feature map and the recognition task.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210072098.6A CN114495039A (en) 2022-01-21 2022-01-21 Object identification method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114495039A 2022-05-13

Family

ID=81471712



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035260A (en) * 2018-07-27 2018-12-18 京东方科技集团股份有限公司 A kind of sky areas dividing method, device and convolutional neural networks
CN109241880A (en) * 2018-08-22 2019-01-18 北京旷视科技有限公司 Image processing method, image processing apparatus, computer readable storage medium
US20200143204A1 (en) * 2018-11-01 2020-05-07 International Business Machines Corporation Image classification using a mask image and neural networks
US20200242153A1 (en) * 2019-01-29 2020-07-30 Samsung Electronics Co., Ltd. Method, apparatus, electronic device and computer readable storage medium for image searching
CN110288049A (en) * 2019-07-02 2019-09-27 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model
CN111046973A (en) * 2019-12-26 2020-04-21 北京市商汤科技开发有限公司 Multitask detection method and device and storage medium
CN111444365A (en) * 2020-03-27 2020-07-24 Oppo广东移动通信有限公司 Image classification method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CAO, Z. et al., "Learning Multi-Scale Features and Batch-Normalized Global Features for Person Re-Identification", IEEE Access, 17 November 2020, pages 184644-184655 *
YAN Fangfang; WU Qin, "Crowd counting algorithm using a multi-channel fusion grouped convolutional neural network", Journal of Chinese Computer Systems, no. 10, 15 October 2020, pages 186-191 *
SHUANG Kai, "Computer Vision", 31 January 2020, page 117 *
ZHENG Wenbo; LIN Yamei, "Multi-neural-network recognition of unconstrained handwritten digits", Journal of Fuzhou University (Natural Science Edition), no. 02, 1 April 1998, pages 18-21 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination