CN111104945A - Object identification method and related product - Google Patents

Object identification method and related product

Info

Publication number
CN111104945A
Authority
CN
China
Prior art keywords
pixel point
image
pixel
recognized
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911302438.4A
Other languages
Chinese (zh)
Inventor
贾书军
程帅
杨春阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Original Assignee
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pateo Electronic Equipment Manufacturing Co Ltd filed Critical Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority to CN201911302438.4A priority Critical patent/CN111104945A/en
Publication of CN111104945A publication Critical patent/CN111104945A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an object identification method and a related product. The method comprises the following steps: processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises the spatial relationship between the pixel point and its adjacent pixel points; determining a target pixel point set according to the first feature vector of each pixel point in the image to be recognized; and identifying the object to be identified in the image to be recognized according to the target pixel point set. By adopting the embodiment of the application, the object identification accuracy can be improved.

Description

Object identification method and related product
Technical Field
The present application relates to the field of image recognition technology, and in particular, to an object recognition method and related products.
Background
With the development of artificial intelligence technology, neural networks are more and more widely applied. For example, a pre-trained neural network is used for identifying people in a surveillance video, identifying lane lines or identifying tumors in a nuclear magnetic resonance image.
Although neural networks perform well at image recognition, recognizing an image with a neural network mainly relies on semantic segmentation, and semantic segmentation considers only the characteristics of the objects themselves without considering the connections between objects, which leads to a poor segmentation effect. For example, when a lane line is identified, the pixels belonging to the lane line are found mainly on the basis of per-pixel characteristics, so pixels whose characteristics are similar to those of the lane line but which do not belong to the lane line are mistakenly identified as lane-line pixels, and the identification accuracy of the lane line is low.
Therefore, object recognition that relies on semantic segmentation alone has low recognition accuracy.
Disclosure of Invention
The embodiment of the application provides an object identification method and a related product, and the accuracy of object identification is improved by obtaining the spatial relationship between each pixel point and adjacent pixel points.
In a first aspect, an embodiment of the present application provides an object identification method, including:
processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises a spatial relation between each pixel point and an adjacent pixel point;
determining a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
and identifying the object to be identified in the image to be identified according to the target pixel point set.
In a second aspect, an embodiment of the present application provides an object identification apparatus, including:
the processing unit is used for processing the image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises the spatial relationship between each pixel point and an adjacent pixel point;
the determining unit is used for determining a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
and the identification unit is used for identifying the object to be identified in the image to be identified according to the target pixel point set.
In a third aspect, embodiments of the present application provide an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for performing the steps in the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, where the computer program makes a computer execute the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The embodiment of the application has the following beneficial effects:
it can be seen that, in the embodiment of the present application, the first feature vector output by the neural network model includes the spatial relationship between each pixel point and its adjacent pixel points, a target pixel point set is determined using this spatial relationship, and the object to be identified is identified based on the target pixel point set. Therefore, when the object is identified, the spatial relationship of each pixel point is taken into account rather than only the independent characteristics of each pixel point, and the constraint imposed by the spatial relationship improves the object identification accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an object identification method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of constructing an entry vector according to an embodiment of the present application;
FIG. 4 is a schematic diagram of constructing a departure vector according to an embodiment of the present application;
fig. 5 is a schematic diagram of constructing a second feature vector of each pixel point according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of padding a sample image according to an embodiment of the present application;
Fig. 7 is a schematic flowchart illustrating a process of fusing pixel points according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of another object identification method according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an object recognition device according to an embodiment of the present application;
fig. 10 is a block diagram illustrating functional units of an object recognition apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The object recognition device in the present application may include a smart phone (such as an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palm computer, a notebook computer, a mobile internet device (MID), a wearable device, etc. These devices are merely examples and are not exhaustive; the object recognition device includes, but is not limited to, the devices listed above. In practical applications, the object recognition apparatus may further include: an intelligent vehicle-mounted terminal, computer equipment and the like.
First, it should be noted that the identification of an object to be identified mentioned in the present application is described by taking the identification of a lane line as a specific example; the identification and fitting processes of other objects to be identified (for example, the identification of a zebra crossing, the identification of an address texture, and the like) are similar to the identification and fitting processes of the lane line and are not described again.
Referring to fig. 1, fig. 1 is a schematic flow chart of an object identification method according to an embodiment of the present application. The method is applied to an object recognition device. The method of the embodiments of the present application includes, but is not limited to, the following steps:
101: the object recognition device processes an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized.
The first feature vector of each pixel point comprises the spatial relationship between the pixel point and its corresponding adjacent pixel points and the instance segmentation result of the pixel point, wherein the instance segmentation result indicates whether the pixel point is a pixel point on a lane line or a background pixel point, and the spatial relationship is the spatial relationship, in the spatial direction, between each pixel point and its corresponding adjacent pixel points, namely the spatial position relationship of the pixel point.
102: and the object identification device determines a target pixel point set according to the first feature vector of each pixel point in the image to be identified.
And determining a target pixel point set in the image to be identified according to the spatial relationship of each pixel point, wherein the target pixel point set is a pixel point set formed by pixel points belonging to the same lane line, and the lane line can be any lane line in the image to be identified.
103: and the object recognition device fits the object to be recognized on the image to be recognized according to the target pixel point set.
Optionally, curve fitting is performed on the pixels in the target pixel set to obtain a lane line corresponding to the target pixel set. Wherein the curve may be a quadratic curve, a cubic curve or a cubic spiral, etc. The process of fitting the lane lines is prior art and will not be described.
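As an illustration only, the following is a minimal sketch of such a fit using NumPy; the quadratic-polynomial choice, the pixel-coordinate convention (row, column) and the function name are assumptions for this example, not part of the claimed method.

    import numpy as np

    def fit_lane_line(target_pixel_set, degree=2):
        """Fit one lane line to a set of (row, col) pixel coordinates.

        target_pixel_set: iterable of (row, col) pixels believed to lie on
        the same lane line. Returns a polynomial col = f(row), which suits
        near-vertical lane lines in a forward-facing road image.
        """
        pts = np.asarray(list(target_pixel_set), dtype=float)
        rows, cols = pts[:, 0], pts[:, 1]
        coeffs = np.polyfit(rows, cols, deg=degree)   # least-squares fit
        return np.poly1d(coeffs)                      # callable lane curve

    # Example: evaluate the fitted lane line at a few image rows.
    lane = fit_lane_line([(400, 210), (380, 216), (360, 223), (340, 231)])
    sample_rows = np.array([350, 370, 390])
    print(lane(sample_rows))   # estimated column positions on those rows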
It can be seen that, in the embodiment of the present application, the first feature vector output by the neural network model includes the spatial relationship between each pixel point and its adjacent pixel points, a target pixel point set is determined using this spatial relationship, and the object to be identified is identified based on the target pixel point set. Therefore, when the object is identified, the spatial relationship of each pixel point is taken into account rather than only the independent characteristics of each pixel point, and the constraint imposed by the spatial relationship improves the object identification accuracy.
Referring to fig. 2, fig. 2 shows a process of training the neural network model according to an embodiment of the present application, which includes but is not limited to the following steps:
201: a sample image is acquired.
An image containing lane lines is acquired from a database as the sample image, and the lane lines in the sample image are labeled; the labeling process is prior art.
202: and constructing a second feature vector of each pixel point on the sample image to obtain a training sample.
First, according to the labeling result, a label is added to each pixel point, i.e., each pixel point is identified as either a background pixel point or a pixel point on a lane line. Then, a target vector of each pixel point is constructed, the target vector being used to represent the spatial relationship between the pixel point and its corresponding adjacent pixel points. Finally, the label of each pixel point and its target vector are spliced to obtain the second feature vector of the pixel point. The second feature vector of each pixel point is taken as the supervision information of that pixel point, and the supervision information of every pixel point together with the sample image forms a training sample.
Wherein the spatial relationship comprises a first spatial relationship and/or a second spatial relationship.
Referring to fig. 3, the pixel point P is any one pixel point in the sample image. The target vector of the pixel point is constructed from the directions in which the adjacent pixel points enter the pixel point P. The target vector may therefore also be referred to as an entry vector, and the first spatial relationship can be represented by constructing the entry vector.
Specifically, if the adjacent pixel point is a pixel point on the lane line, the corresponding dimension in the entering vector is set to 1, otherwise, the corresponding dimension is set to 0, wherein the corresponding dimension is determined by the adjacent relationship between the adjacent pixel point and the pixel point P.
For example, if the adjacent pixel is a left adjacent pixel, dimension 1 entering the vector is the corresponding dimension; if the adjacent pixel point is the upper adjacent pixel point, the dimension 2 entering the vector is the corresponding dimension; if the adjacent pixel point is a right adjacent pixel point, the dimension 3 entering the vector is a corresponding dimension; if the adjacent pixel point is the lower adjacent pixel point, the dimension 4 entering the vector is the corresponding dimension.
Further, different values may be assigned in the entry vector to represent the adjacency relationship between the adjacent pixel point and the pixel point P. As shown in fig. 3, entry into the pixel point P from the left adjacent pixel point is represented by the value 1, entry from the upper adjacent pixel point by the value 2, entry from the right adjacent pixel point by the value 3, and entry from the lower adjacent pixel point by the value 4.
For example, with this construction of the entry vector, if only the right adjacent pixel point of the pixel point P is a pixel point on the lane line and the other adjacent pixel points are not, the entry vector of the pixel point P is determined to be [0010] (or [0030] when the assignment form above is used).
It should be noted that, in the present application, a target vector of each pixel is constructed by setting the corresponding dimension to 1, but the form of constructing the target vector is not limited uniquely.
Referring to fig. 4, a second spatial relationship is constructed from the directions from the pixel point P to each of its adjacent pixel points. The target vector of each pixel point can therefore also be called a departure vector, so that the second spatial relationship can be represented by the departure vector.
Specifically, if the adjacent pixel point to the pixel point P is a pixel point on the lane line, the corresponding dimension in the departure vector is set to 1, and if not, the corresponding dimension is set to 0, where the corresponding dimension is determined by the adjacent relationship between the adjacent pixel point and the pixel point P.
For example, if the adjacent pixel is a left adjacent pixel, the dimension 1 in the departure vector is the corresponding dimension; if the adjacent pixel point is the upper adjacent pixel point, the dimension 2 in the departure vector is the corresponding dimension; if the adjacent pixel point is a right adjacent pixel point, the dimension 3 in the departure vector is a corresponding dimension; if the adjacent pixel point is the lower adjacent pixel point, the dimension 4 in the departure vector is the corresponding dimension.
The corresponding relationship between the vector dimension and the adjacent relationship is only an example, and is not limited.
Of course, similar to the construction of the first spatial relationship, the adjacency relationship between the adjacent pixel point and the pixel point P can also be represented by assigning different values in the departure vector. Referring to fig. 4, departure from the pixel point P to the left adjacent pixel point is represented by the value 1, departure to the upper adjacent pixel point by the value 2, departure to the right adjacent pixel point by the value 3, and departure to the lower adjacent pixel point by the value 4.
Since both the entry vector and the departure vector are constructed with respect to a certain direction, the direction is preset before the entry vector or the departure vector is constructed, and the second feature vector of each pixel point is constructed according to the preset direction.
Specifically, when an entry vector is constructed, a connecting line is drawn in the direction from the adjacent pixel point to the pixel point to obtain a connecting-line direction, and when the included angle between the connecting-line direction and the preset direction is smaller than a threshold value, the adjacent pixel point is determined to be a target adjacent pixel point; when a departure vector is constructed, the connecting line is drawn in the direction from the pixel point to its adjacent pixel point to obtain the connecting-line direction, and when the included angle between the connecting-line direction and the preset direction is smaller than the threshold value, the adjacent pixel point is determined to be a target adjacent pixel point. The threshold value may be 90°.
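A minimal sketch of this construction is given below. It treats the preset direction as a 2-D vector, tests "included angle below 90°" as a positive dot product, and uses the left/up/right/down dimension order from the examples above; the default preset direction and the function names are illustrative assumptions only.

    import numpy as np

    # dim 1 = left, dim 2 = up, dim 3 = right, dim 4 = down, as (row, col) offsets
    OFFSETS = [(0, -1), (-1, 0), (0, 1), (1, 0)]

    def spatial_vectors(label_mask, r, c, preset_dir=(-1.0, 0.0)):
        """Return (entry_vector, departure_vector) for pixel (r, c).

        label_mask: 2-D array, 1 for lane-line pixels, 0 for background.
        preset_dir: preset direction as (d_row, d_col); (-1, 0) means "up".
        A neighbour is a target adjacent pixel point when the angle between
        the connecting-line direction and the preset direction is below 90°,
        i.e. their dot product is positive.
        """
        entry = np.zeros(4, dtype=int)
        depart = np.zeros(4, dtype=int)
        if label_mask[r, c] == 0:                  # background: all zeros
            return entry, depart
        h, w = label_mask.shape
        d = np.asarray(preset_dir, dtype=float)
        for k, (dr, dc) in enumerate(OFFSETS):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w):
                continue
            into_p = np.array([-dr, -dc], float)   # neighbour -> P direction
            out_of_p = np.array([dr, dc], float)   # P -> neighbour direction
            if into_p @ d > 0 and label_mask[rr, cc] == 1:
                entry[k] = 1                       # first spatial relationship
            if out_of_p @ d > 0 and label_mask[rr, cc] == 1:
                depart[k] = 1                      # second spatial relationship
        return entry, depart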
Referring to fig. 5, the arrow indicates the preset direction, and the pixel points P0-P6 are the pixel points on the lane line. When the entry vectors (first spatial relationship) are constructed, P0 has no corresponding target adjacent pixel point, so the entry vector of P0 is [0000]; when the entry vector of the pixel point P1 is constructed, the target adjacent pixel points of P1 are found to be P0 and P7, where P0 is a pixel point on the lane line and P7 is not, so the entry vector of the pixel point P1 is [0001]; similarly, the entry vector of the pixel point P2 is found to be [1000]. For pixel points not on a lane line, the corresponding entry vectors are all [0000].
When the departure vectors (second spatial relationship) of the pixel points are constructed, according to the above method for finding target adjacent pixel points, the departure vector of the pixel point P0 is [0100], the departure vector of the pixel point P1 is [0010], the departure vector of the pixel point P2 is [1000], and so on. For pixel points not on a lane line, the departure vectors are all [0000].
In addition, the label of each pixel point can be represented by a 0/1 vector. For example, when the pixel point is a pixel point on the lane line, the corresponding vector is [01], and when the pixel point is not a pixel point on the lane line, the corresponding vector is [10].
Optionally, in the case where the first spatial relationship is constructed for each pixel point, the entry vector of each pixel point is spliced with the label information of that pixel point. Referring to fig. 5, the second feature vector of the pixel point P0 is [010000], the second feature vector of P1 is [010001], the second feature vector of P2 is [011000], and so on; the second feature vectors of the pixel points not on a lane line are all [100000].
Optionally, in the case where the second spatial relationship is constructed for each pixel point, the departure vector of each pixel point is spliced with the label information of that pixel point. Referring to fig. 5, the second feature vector of the pixel point P0 is [010100], the second feature vector of P1 is [010010], the second feature vector of P2 is [011000], and so on; the second feature vectors of the pixel points not on a lane line are all [100000].
Optionally, in the case where the first spatial relationship and the second spatial relationship are constructed for each pixel point simultaneously, the entry vector and the departure vector of each pixel point are spliced with the label information of that pixel point. Referring to fig. 5, the second feature vector of the pixel point P0 is [0100000100], the second feature vector of P1 is [0100010010], the second feature vector of P2 is [0110001000], and so on; the second feature vectors of the pixel points not on a lane line are all [1000000000].
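Concretely, the splicing step can be sketched as follows; the one-hot label convention ([10] background, [01] lane line) follows the text above, while the function name is an illustrative assumption.

    import numpy as np

    def second_feature_vector(is_lane, entry=None, depart=None):
        """Concatenate the pixel label with whichever spatial vectors are used."""
        label = np.array([0, 1]) if is_lane else np.array([1, 0])
        parts = [label]
        if entry is not None:     # first spatial relationship
            parts.append(entry)
        if depart is not None:    # second spatial relationship
            parts.append(depart)
        return np.concatenate(parts)

    # Pixel P1 from fig. 5: entry vector [0001], departure vector [0010]
    print(second_feature_vector(True, entry=np.array([0, 0, 0, 1])))
    print(second_feature_vector(True, depart=np.array([0, 0, 1, 0])))
    print(second_feature_vector(True, np.array([0, 0, 0, 1]), np.array([0, 0, 1, 0])))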
Finally, the second feature vector of each pixel point is taken as the supervision information of that pixel point, and the supervision information and the sample image are combined to form a training sample.
203: and performing model training by using the sample image and the supervision information to obtain the neural network model.
Input data corresponding to the sample image are acquired and used for training. The model performs forward computation on the input data based on its parameters to obtain a predicted feature vector for each pixel point, and the predicted feature vector of each pixel point is compared with the supervision information of that pixel point to obtain the loss result of this round of training; the parameters in the model are then updated based on gradient descent and the loss result, until the difference between the predicted feature vectors and the supervision information is smaller than a threshold, at which point training of the model is complete and the neural network model is obtained.
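The patent does not tie this training to any particular framework. As one possible realization, the sketch below uses PyTorch with a per-dimension binary-cross-entropy loss and plain SGD (gradient descent); the model architecture, learning rate, stopping threshold and epoch count are placeholders, not values from the application.

    import torch
    import torch.nn as nn

    def train(model, images, supervision, lr=0.01, threshold=0.05, max_epochs=100):
        """images:      float tensor (N, 3, H, W) of sample images.
        supervision: float tensor (N, D, H', W') of per-pixel second feature
        vectors (label + spatial relationship), matching the model output shape.
        """
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # gradient descent
        criterion = nn.BCEWithLogitsLoss()
        for epoch in range(max_epochs):
            optimizer.zero_grad()
            predicted = model(images)              # predicted feature vectors
            loss = criterion(predicted, supervision)
            loss.backward()                        # back-propagate the loss result
            optimizer.step()                       # update model parameters
            if loss.item() < threshold:            # difference small enough: stop
                break
        return model

    # Example placeholder model: any nn.Module mapping (N, 3, H, W) -> (N, D, H, W)
    # model = nn.Conv2d(3, 10, kernel_size=3, padding=1)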
It can be understood that, based on the above manner of constructing the spatial relationship, if the training samples used are those constructed with the first spatial relationship, then when the image to be recognized is processed by the neural network model, the output first feature vector of each pixel point also contains the first spatial relationship between that pixel point and its corresponding adjacent pixel points. In this case, the process of determining a target pixel point set according to the first feature vector of each pixel point in the image to be recognized may be implemented as follows:
obtaining a plurality of candidate pixel points according to the first feature vector of each pixel point, wherein any one of the candidate pixel points is a pixel point on an object to be identified, namely obtaining a plurality of pixel points belonging to the object to be identified according to the classification label of each pixel point;
and obtaining a plurality of candidate pixel point sets according to the plurality of candidate pixel points, wherein the distance between pixel points within each of the candidate pixel point sets is smaller than a threshold value. For pixel points on the same lane line, the distance between adjacent pixel points is short. Therefore, through threshold segmentation, candidate pixel points whose mutual distance is smaller than the threshold are found and form one candidate pixel point set, which corresponds to the pixel points on one lane line. However, threshold segmentation will also place some pixel points that are close to the lane line but do not belong to the same lane line into the candidate set. Moreover, the farther away two lane lines are in the scene, the closer together they appear in the image because of the imaging geometry, and the more obvious this segmentation problem becomes. In addition, some pixel points whose characteristics are similar to those of the lane line and which are close to the lane line are also divided into the same pixel point set. For example, when there is a zebra crossing beside a lane line, the pixel points corresponding to the zebra crossing and the pixel points corresponding to the lane line are divided into one pixel point set.
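One simple way to realize this threshold segmentation is to group candidate pixel points that are connected by chains of nearby pixels, e.g. with the breadth-first grouping sketched below; the Euclidean distance, the threshold value and the function name are assumptions for illustration.

    from collections import deque

    def candidate_sets(candidate_pixels, dist_threshold=3.0):
        """Group candidate (row, col) pixels: two pixels end up in the same
        candidate pixel point set when they are connected by a chain of
        candidate pixels whose pairwise distance is below dist_threshold."""
        pixels = list(candidate_pixels)
        unvisited = set(range(len(pixels)))
        sets = []
        while unvisited:
            seed = unvisited.pop()
            queue, group = deque([seed]), [pixels[seed]]
            while queue:
                i = queue.popleft()
                close = [j for j in unvisited
                         if (pixels[i][0] - pixels[j][0]) ** 2
                          + (pixels[i][1] - pixels[j][1]) ** 2
                          < dist_threshold ** 2]
                for j in close:
                    unvisited.remove(j)
                    group.append(pixels[j])
                    queue.append(j)
            sets.append(group)
        return sets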
Optionally, in order to solve the above segmentation problem, in the present application, after the threshold segmentation is performed and in the case where each pixel point contains the first spatial relationship, the pixel points of each candidate pixel point set are clustered according to the first spatial relationship corresponding to each pixel point in that set, so as to obtain a target pixel point set; that is, the pixel points of each candidate pixel point set are clustered using the first spatial relationship, and the pixel points whose first spatial relationships are consistent are screened out of each candidate pixel point set to form a target pixel point set. Because a lane line is a line extending forwards, when the first spatial relationships are consistent it can be concluded that each pixel point in the target pixel point set is entered from the previous pixel point on the lane line and that the spatial directions of these pixel points are consistent, so the pixel points belong to the same lane line. For two pixel points in a candidate set that do not belong to the same lane line (for example, pixel points on two different lane lines, or pixel points on a zebra crossing), although the distance between them is short, their spatial directions are necessarily inconsistent, so the wrongly segmented pixel points can be screened out and the accuracy of the subsequent lane line fitting is improved.
Optionally, under the condition that the spatial relationship of each pixel includes the second spatial relationship, after a plurality of candidate pixel sets are obtained, clustering the pixels of each candidate pixel set according to the second spatial relationship corresponding to each pixel in each candidate pixel set, so as to obtain a target pixel set. And screening out the pixels with the consistent second spatial relationship from each candidate pixel set to form a target pixel set.
Further, in the case where the spatial relationship of each pixel point includes both the first spatial relationship and the second spatial relationship, after the plurality of candidate pixel point sets are obtained, the pixel points of each candidate pixel point set are clustered according to the first spatial relationship and the second spatial relationship corresponding to each pixel point in that set, so as to obtain a target pixel point set; that is, the pixel points whose first spatial relationships are consistent and whose second spatial relationships are consistent are screened out of each candidate pixel point set to form a target pixel point set.
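As a rough illustration of this screening step, the sketch below keeps, inside one candidate pixel point set, only the pixels whose spatial relationship agrees with the most frequent spatial relationship in that set; this majority-vote reading of "consistent" is an assumption, since the application does not fix a particular clustering algorithm, and the names are illustrative.

    from collections import Counter

    def screen_by_spatial_relationship(candidate_set, spatial_vectors):
        """candidate_set:   list of (row, col) pixels in one candidate set.
        spatial_vectors: dict mapping pixel -> its spatial relationship as a
        flat sequence of 0/1 values (entry vector, departure vector, or both).
        Returns the target pixel point set: pixels whose spatial relationship
        matches the dominant one in the candidate set."""
        relations = [tuple(spatial_vectors[p]) for p in candidate_set]
        dominant, _ = Counter(relations).most_common(1)[0]
        return [p for p, rel in zip(candidate_set, relations) if rel == dominant]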
In a possible implementation manner, in order to reduce the computational load of the network model, before the second feature vector of each pixel point is constructed, the pixel points in the sample image are fused to reduce the dimensionality of the training sample.
Referring to fig. 6-7, in order to fuse each pixel in the sample image, a padding operation is performed on the sample image, wherein the dimension of the padding operation is determined according to the number of pixels to be fused and the dimension of the sample image.
For example, as shown in fig. 5, the dimension of the sample image is 7 × 7; if every four pixel points are fused into one pixel point, the sample image needs to be padded by one row and one column, so as to obtain the 8 × 8 pixel matrix shown in fig. 6. Then every four pixel points are fused into one pixel point; specifically, when the four pixel points include at least one pixel point on a lane line, the fused pixel point is set as a pixel point on the lane line, and the fused pixel matrix shown in fig. 7 is obtained.
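A sketch of this padding-and-fusion step on a binary label mask is given below; the pad-with-background choice and the 2 × 2 fusion window follow the example above, while NumPy and the function name are illustrative assumptions.

    import numpy as np

    def fuse_labels(label_mask, window=2):
        """Pad an (H, W) 0/1 label mask so H and W are divisible by `window`,
        then fuse each window x window block into one pixel: the fused pixel
        is a lane-line pixel (1) if any pixel in the block is."""
        h, w = label_mask.shape
        pad_h = (-h) % window
        pad_w = (-w) % window
        padded = np.pad(label_mask, ((0, pad_h), (0, pad_w)), constant_values=0)
        hh, ww = padded.shape
        blocks = padded.reshape(hh // window, window, ww // window, window)
        return blocks.max(axis=(1, 3))   # 1 if at least one lane-line pixel

    mask = np.zeros((7, 7), dtype=int)
    mask[3, 2:5] = 1                     # a short horizontal lane segment
    print(fuse_labels(mask).shape)       # (4, 4) fused pixel matrix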
It should be noted that the process of constructing the second eigenvector for each pixel in the pixel matrix shown in fig. 7 is the same as that described above, and will not be described again. However, since one pixel point in the pixel matrix shown in fig. 7 represents four pixel points, in order to not lose data information of the original four pixel points, the second eigenvectors of the original four pixel points and the second eigenvector of the fused pixel point need to be spliced to obtain the final second eigenvector of the fused pixel point.
The following describes the pixel fusion process in detail using the pixel point PP0 as an example. As shown in fig. 6-7, PP0 is obtained by fusing the pixel points P01, P12, P23 and P40.
In the case where the first spatial relationship is constructed for each pixel point, the second feature vector of PP0 is [010000], the second feature vector of P01 is [010000], the second feature vector of P12 is [010001], the second feature vector of P23 is [011000], and the second feature vector of P40 is [100000]. Therefore, the second feature vectors of P01, P12, P23 and P40 are respectively spliced with the second feature vector of PP0 to obtain the final second feature vector of PP0 as [010000 010000 010001 011000 100000].
In the case where the second spatial relationship is constructed for each pixel point, the second feature vector of PP0 is [010100], the second feature vector of P01 is [010100], the second feature vector of P12 is [010010], the second feature vector of P23 is [010100], and the second feature vector of P40 is [100000]. Therefore, the second feature vectors of P01, P12, P23 and P40 are respectively spliced with the second feature vector of PP0 to obtain the final second feature vector of PP0 as [010100 010100 010010 010100 100000].
In the case where the first spatial relationship and the second spatial relationship are both constructed for each pixel point, the second feature vector of PP0 is [0100000100], the second feature vector of P01 is [0100000100], the second feature vector of P12 is [0100010010], the second feature vector of P23 is [0110000100], and the second feature vector of P40 is [1000000000]. Therefore, the second feature vectors of P01, P12, P23 and P40 are respectively spliced with the second feature vector of PP0 to obtain the final second feature vector of PP0 as [0100000100 0100000100 0100010010 0110000100 1000000000].
In this example, after the pixel points are fused, when the second feature vector of each pixel point is constructed, the data information (label information and spatial relationship) of the previous, higher dimension of each pixel point is spliced onto the current, lower dimension, so that the resulting final second feature vector contains both the data information of the current dimension and the data information of the previous dimension. Therefore, data compression is achieved and the amount of computation is reduced, while the data information of each dimension is retained, so the computation accuracy is not reduced.
Therefore, when the model is trained with the pixel points in the sample image compressed in this way, and the trained neural network model then processes an image to be recognized, each pixel point in the output first feature map already contains the data information of the higher dimension. The first feature map therefore does not need to be up-sampled to the same dimension as the image to be recognized; it can be used directly to classify each pixel point to obtain the first feature vector of each pixel point in the first feature map, and the first feature vectors of the corresponding pixel points in each dimension (i.e., of each pixel point of the image to be recognized) are obtained synchronously.
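To illustrate why no up-sampling is needed, the sketch below takes a fused feature map in which each spatial location packs one fused vector followed by the per-pixel vectors of its window × window original pixels, and redistributes the per-pixel vectors to full resolution by reshaping; this channel layout, including the row-major ordering of the sub-pixels, is an assumption mirroring the training-time splicing.

    import numpy as np

    def unpack_feature_map(feature_map, per_pixel_dim, window=2):
        """feature_map: (H', W', C) network output where each location packs
        one fused vector followed by window*window per-pixel vectors, each of
        length per_pixel_dim, so C = (1 + window*window) * per_pixel_dim.
        Returns an (H'*window, W'*window, per_pixel_dim) array holding the
        first feature vector of every original pixel, without up-sampling."""
        hh, ww, c = feature_map.shape
        assert c == (1 + window * window) * per_pixel_dim
        # drop the fused-pixel part, keep the window*window per-pixel parts
        per_pixel = feature_map[:, :, per_pixel_dim:]
        per_pixel = per_pixel.reshape(hh, ww, window, window, per_pixel_dim)
        # scatter the window x window sub-pixels back to full resolution
        return per_pixel.transpose(0, 2, 1, 3, 4).reshape(
            hh * window, ww * window, per_pixel_dim)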
Referring to fig. 8, fig. 8 is a schematic flowchart of another object identification method according to an embodiment of the present disclosure. The method is applied to an object recognition device. The method of the embodiment comprises the following steps:
801: an object recognition device acquires a sample image.
802: and the object identification device constructs a second feature vector of each pixel point on the sample image to obtain a training sample.
803: and the object recognition device performs model training by using the training sample to obtain a neural network model.
804: the object recognition device processes an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises a spatial relation between each pixel point and an adjacent pixel point.
805: and the object identification device determines a target pixel point set according to the first feature vector of each pixel point in the image to be identified.
806: and the object identification device identifies the object to be identified in the image to be identified according to the target pixel point set.
It should be noted that, for the specific implementation of the steps of the method shown in fig. 8, reference may be made to the specific implementation of the method described in fig. 1, and a description thereof is omitted here.
It can be seen that, in the embodiment of the present application, the first feature vector output by the neural network model includes the spatial relationship between each pixel point and its adjacent pixel points, a target pixel point set is determined using this spatial relationship, and the object to be identified is identified based on the target pixel point set. Therefore, when the object is identified, the spatial relationship of each pixel point is taken into account rather than only the independent characteristics of each pixel point, and the constraint imposed by the spatial relationship improves the object identification accuracy.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an object recognition apparatus 900 according to an embodiment of the present disclosure, as shown in fig. 9, the object recognition apparatus 900 includes a processor, a memory, a communication interface, and one or more programs, and the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the following steps:
processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises a spatial relation between each pixel point and an adjacent pixel point;
determining a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
and identifying the object to be identified in the image to be identified according to the target pixel point set.
In a possible embodiment, before processing the image to be recognized using the neural network model, the program further comprises instructions for performing the following steps:
acquiring a sample image;
constructing a second feature vector of each pixel point on the sample image to obtain a training sample;
and carrying out model training by using the training sample to obtain the neural network model.
In a possible embodiment, in constructing the second feature vector of each pixel point on the sample image to obtain the training sample, the program is specifically configured to execute the following instructions:
determining a target adjacent pixel point in adjacent pixel points of each pixel point in the sample image according to a preset direction;
determining a second feature vector of each pixel point according to the label of each pixel point, the label of a target adjacent pixel point corresponding to each pixel point and the preset direction, wherein the second feature vector comprises the label of each pixel point and the spatial relationship between each pixel point and the adjacent pixel point;
and splicing the second feature vectors of each pixel point in the sample image to obtain a training sample.
In a possible embodiment, in determining a target neighboring pixel point among neighboring pixel points of each pixel point in the sample image according to a preset direction, the program is specifically configured to execute the following instructions:
under the condition that a first spatial relationship is constructed for each pixel point, connecting lines according to the direction from the adjacent pixel point of each pixel point to the pixel point to obtain a connecting line direction, and when the included angle between the connecting line direction and the preset direction is smaller than a threshold value, determining the adjacent pixel point as a target adjacent pixel point;
under the condition that a second spatial relationship is constructed for each pixel point, connecting lines according to the direction from each pixel point to the adjacent pixel point of each pixel point to obtain the connecting line direction, and when the included angle between the connecting line direction and the preset direction is smaller than a threshold value, determining that the adjacent pixel point is the target adjacent pixel point.
In a possible implementation manner, in terms of processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, the above-mentioned program is specifically configured to execute the following instructions:
processing an image to be recognized by using a neural network model to obtain a first feature map, wherein the first feature map is the feature map obtained before any up-sampling operation is carried out;
and classifying each pixel point in the image to be identified according to the first feature map to obtain a first feature vector of each pixel point in the image to be identified.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the first spatial relationship, in terms of determining a target pixel set according to the first feature vector of each pixel in the image to be recognized, the above-mentioned program is specifically configured to execute the following instructions:
obtaining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the candidate pixel points is a pixel point on the object to be recognized;
obtaining a plurality of candidate pixel point sets according to the plurality of candidate pixel points, wherein the distance between adjacent pixel points in each of the plurality of candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to the first spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the second spatial relationship, in terms of determining a target pixel set according to the first feature vector of each pixel in the image to be recognized, the above-mentioned program is specifically configured to execute the following instructions:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the plurality of candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to a second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the first spatial relationship and the second spatial relationship, in terms of determining a target pixel set according to the first feature vector of each pixel in the image to be recognized, the above-mentioned program is specifically configured to execute the following instructions:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the plurality of candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to the first spatial relationship and the second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the object to be recognized is a lane line, the target pixel point set is a set of pixel points belonging to the same lane line, and the lane line is any one lane line in the image to be recognized, and in terms of fitting the object to be recognized according to the target pixel point set, the above-mentioned program is specifically configured to execute the following instructions:
and performing lane line fitting according to the target pixel point set to obtain the lane line.
Referring to fig. 10, fig. 10 is a block diagram of functional units of an object recognition apparatus 1000 according to an embodiment of the present application, where the object recognition apparatus 1000 includes: a processing unit 1001, a determining unit 1002, and an identifying unit 1003, wherein:
the processing unit 1001 is configured to process an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, where the first feature vector of each pixel point includes a spatial relationship between each pixel point and an adjacent pixel point;
a determining unit 1002, configured to determine a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
the identifying unit 1003 is configured to identify an object to be identified in the image to be identified according to the target pixel point set.
In a possible implementation, the object recognition apparatus 1000 further includes a training unit 1004, before processing the image to be recognized using the neural network model, the training unit 1004 is configured to:
acquiring a sample image;
constructing a second feature vector of each pixel point on the sample image to obtain a training sample;
and carrying out model training by using the training sample to obtain the neural network model.
In a possible implementation manner, in constructing the second feature vector of each pixel point on the sample image to obtain a training sample, the training unit 1004 is specifically configured to:
determining a target adjacent pixel point in adjacent pixel points of each pixel point in the sample image according to a preset direction;
determining a second feature vector of each pixel point according to the label of each pixel point, the label of a target adjacent pixel point corresponding to each pixel point and the preset direction, wherein the second feature vector comprises the label of each pixel point and the spatial relationship between each pixel point and the adjacent pixel point;
and splicing the second feature vectors of each pixel point in the sample image to obtain a training sample.
In a possible implementation manner, in determining a target neighboring pixel point among neighboring pixel points of each pixel point in the sample image according to a preset direction, the training unit 1004 is specifically configured to:
under the condition that a first spatial relationship is constructed for each pixel point, connecting lines according to the direction from the adjacent pixel point of each pixel point to the pixel point to obtain a connecting line direction, and when the included angle between the connecting line direction and the preset direction is smaller than a threshold value, determining the adjacent pixel point as a target adjacent pixel point;
under the condition that a second spatial relationship is constructed for each pixel point, connecting lines according to the direction from each pixel point to the adjacent pixel point of each pixel point to obtain the connecting line direction, and when the included angle between the connecting line direction and the preset direction is smaller than a threshold value, determining that the adjacent pixel point is the target adjacent pixel point.
In a possible implementation manner, in terms of processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, the processing unit 1001 is specifically configured to:
processing an image to be recognized by using a neural network model to obtain a first feature map, wherein the first feature map is the feature map obtained before any up-sampling operation is carried out;
and classifying each pixel point in the image to be identified according to the first feature map to obtain a first feature vector of each pixel point in the image to be identified.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the first spatial relationship, in terms of determining the target pixel point set according to the first feature vector of each pixel in the image to be recognized, the determining unit 1002 is specifically configured to:
obtaining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the candidate pixel points is a pixel point on the object to be recognized;
obtaining a plurality of candidate pixel point sets according to the plurality of candidate pixel points, wherein the distance between adjacent pixel points in each of the plurality of candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to the first spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the second spatial relationship, in terms of determining a target pixel set according to the first feature vector of each pixel in the image to be recognized, the determining unit 1002 is specifically configured to:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the plurality of candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to a second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the spatial relationship of each pixel includes the first spatial relationship and the second spatial relationship, in terms of determining a target pixel set according to the first feature vector of each pixel in the image to be identified, the determining unit 1002 is specifically configured to:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the plurality of candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixels of each candidate pixel point set according to the first spatial relationship and the second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
In a possible implementation manner, in a case that the object to be recognized is a lane line, the target pixel point set is a pixel point set belonging to the same lane line, and the lane line is any one lane line in the image to be recognized, and in terms of fitting the object to be recognized according to the target pixel point set, the recognizing unit 1003 is specifically configured to:
and performing lane line fitting according to the target pixel point set to obtain the lane line.
Embodiments of the present application also provide a computer storage medium, which stores a computer program, where the computer program is executed by a processor to implement part or all of the steps of any one of the object identification methods as described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the object identification methods as recited in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a program, and the program may be stored in a computer-readable memory, which may include a flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has illustrated the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, based on the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. An object recognition method, comprising:
processing an image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises a spatial relation between each pixel point and an adjacent pixel point;
determining a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
and identifying the object to be identified in the image to be identified according to the target pixel point set.
2. The method of claim 1, wherein prior to processing the image to be recognized using the neural network model, the method further comprises:
acquiring a sample image;
constructing a second feature vector of each pixel point on the sample image to obtain a training sample;
and carrying out model training by using the training sample to obtain the neural network model.
3. The method of claim 2, wherein constructing a second feature vector for each pixel point on the sample image to obtain a training sample comprises:
determining a target adjacent pixel point in adjacent pixel points of each pixel point in the sample image according to a preset direction;
determining a second feature vector of each pixel point according to the label of each pixel point, the label of a target adjacent pixel point corresponding to each pixel point and the preset direction, wherein the second feature vector comprises the label of each pixel point and the spatial relationship between each pixel point and the adjacent pixel point;
and splicing the second feature vectors of each pixel point in the sample image to obtain a training sample.
4. The method of claim 3, wherein the determining the target neighboring pixel point among the neighboring pixel points of each pixel point in the sample image according to the preset direction comprises:
under the condition that a first spatial relationship is constructed for each pixel point, drawing a connecting line in the direction from the adjacent pixel point of each pixel point to the pixel point to obtain a connecting line direction, and determining the adjacent pixel point as a target adjacent pixel point when an included angle between the connecting line direction and the preset direction is smaller than a threshold value;
under the condition that a second spatial relationship is constructed for each pixel point, drawing a connecting line in the direction from each pixel point to the adjacent pixel point of each pixel point to obtain a connecting line direction, and determining the adjacent pixel point as the target adjacent pixel point when an included angle between the connecting line direction and the preset direction is smaller than the threshold value.
5. The method according to any one of claims 1 to 4, wherein the processing the image to be recognized by using the neural network model to obtain the first feature vector of each pixel point in the image to be recognized comprises:
processing an image to be recognized by using a neural network model to obtain a first feature map, wherein the first feature map is a feature map obtained before an up-sampling operation is performed;
and classifying each pixel point in the image to be identified according to the first feature map to obtain a first feature vector of each pixel point in the image to be identified.
6. The method according to claim 4 or 5, wherein in a case that the spatial relationship of each pixel point includes the first spatial relationship, the determining a target pixel point set according to the first feature vector of each pixel point in the image to be recognized includes:
obtaining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the candidate pixel points is a pixel point on the object to be recognized;
obtaining a plurality of candidate pixel point sets according to the plurality of candidate pixel points, wherein the distance between adjacent pixel points in each of the plurality of candidate pixel point sets is smaller than a threshold value;
and clustering the pixel points of each candidate pixel point set according to the first spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
7. The method according to claim 4 or 5, wherein in a case that the spatial relationship of each pixel point includes the second spatial relationship, the determining a target pixel point set according to the first feature vector of each pixel point in the image to be recognized includes:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixel points of each candidate pixel point set according to the second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
8. The method according to claim 4 or 5, wherein in a case that the spatial relationship of each pixel includes the first spatial relationship and the second spatial relationship, the determining a target pixel set according to the first feature vector of each pixel in the image to be recognized includes:
determining a plurality of candidate pixel points according to the first feature vector of each pixel point in the image to be recognized, wherein any one pixel point in the candidate pixel points is a pixel point on the object to be recognized;
forming a plurality of candidate pixel point sets by the candidate pixel points, wherein the distance between the pixel points in each pixel point set in the candidate pixel point sets is smaller than a threshold value;
and clustering the pixel points of each candidate pixel point set according to the first spatial relationship and the second spatial relationship corresponding to each pixel point in each candidate pixel point set to obtain a target pixel point set.
9. The method according to any one of claims 1 to 8, wherein, in a case where the object to be recognized is a lane line, the target pixel point set is a set of pixel points belonging to the same lane line, the lane line being any one lane line in the image to be recognized, and fitting the object to be recognized according to the target pixel point set comprises:
and performing lane line fitting according to the target pixel point set to obtain the lane line.
10. An object recognition device, comprising:
the processing unit is used for processing the image to be recognized by using a neural network model to obtain a first feature vector of each pixel point in the image to be recognized, wherein the first feature vector of each pixel point comprises the spatial relationship between each pixel point and an adjacent pixel point;
the determining unit is used for determining a target pixel point set according to the first feature vector of each pixel point in the image to be identified;
and the identification unit is used for identifying the object to be identified in the image to be identified according to the target pixel point set.
11. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-9.
CN201911302438.4A 2019-12-17 2019-12-17 Object identification method and related product Pending CN111104945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911302438.4A CN111104945A (en) 2019-12-17 2019-12-17 Object identification method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911302438.4A CN111104945A (en) 2019-12-17 2019-12-17 Object identification method and related product

Publications (1)

Publication Number Publication Date
CN111104945A true CN111104945A (en) 2020-05-05

Family

ID=70422535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911302438.4A Pending CN111104945A (en) 2019-12-17 2019-12-17 Object identification method and related product

Country Status (1)

Country Link
CN (1) CN111104945A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101479225B1 (en) * 2014-02-26 2015-01-05 연세대학교 산학협력단 Method and apparatus for generating feature vector, and method and apparatus for Image Recognition using the same
CN105528610A (en) * 2014-09-30 2016-04-27 阿里巴巴集团控股有限公司 Character recognition method and device
CN104408435A (en) * 2014-12-05 2015-03-11 浙江大学 Face identification method based on random pooling convolutional neural network
CN106257495A (en) * 2015-06-19 2016-12-28 阿里巴巴集团控股有限公司 A kind of digit recognition method and device
CN105224951A (en) * 2015-09-30 2016-01-06 深圳市华尊科技股份有限公司 A kind of vehicle type classification method and sorter
CN105913074A (en) * 2016-04-05 2016-08-31 西安电子科技大学 Combined SAR image moving target clustering method based on amplitude and radial speed
CN110147785A (en) * 2018-03-29 2019-08-20 腾讯科技(深圳)有限公司 Image-recognizing method, relevant apparatus and equipment
CN108764159A (en) * 2018-05-30 2018-11-06 北京农业信息技术研究中心 Animal face recognition methods under condition of small sample and system
CN108960207A (en) * 2018-08-08 2018-12-07 广东工业大学 A kind of method of image recognition, system and associated component
CN110032964A (en) * 2019-04-08 2019-07-19 腾讯科技(成都)有限公司 Image processing method, method, apparatus, equipment and the storage medium for identifying visual angle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU LONGSHENG et al.: "Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks", 《TRANSACTIONS OF THE CHINESE SOCIETY OF AGRICULTURAL ENGINEERING》, vol. 34, no. 2, pages 205 - 211 *
罗斌: "Research on marking line and ball recognition in a middle-size league soccer robot competition system" (in Chinese), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology), no. 10, pages 140 - 259 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114804550A (en) * 2022-06-10 2022-07-29 安徽新宇环保科技股份有限公司 Sewage treatment regulation and control system based on neural network model

Similar Documents

Publication Publication Date Title
JP7058669B2 (en) Vehicle appearance feature identification and vehicle search methods, devices, storage media, electronic devices
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
EP3136292A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN109034086B (en) Vehicle weight identification method, device and system
CN110853033A (en) Video detection method and device based on inter-frame similarity
CN111523413B (en) Method and device for generating face image
CN111126514A (en) Image multi-label classification method, device, equipment and medium
CN111062964A (en) Image segmentation method and related device
CN111539903B (en) Method and device for training face image synthesis model
CN111104954A (en) Object classification method and device
CN111046759A (en) Face recognition method and related device
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN111652181A (en) Target tracking method and device and electronic equipment
CN111859002A (en) Method and device for generating interest point name, electronic equipment and medium
CN114882314A (en) Model training method and related product, image processing method and related product
CN115393606A (en) Method and system for image recognition
CN111104945A (en) Object identification method and related product
CN114842411A (en) Group behavior identification method based on complementary space-time information modeling
CN111522988B (en) Image positioning model obtaining method and related device
CN114463552A (en) Transfer learning and pedestrian re-identification method and related equipment
CN113537249A (en) Image determination method and device, storage medium and electronic device
KR102060110B1 (en) Method, apparatus and computer program for classifying object in contents
CN113221922A (en) Image processing method and related device
CN115147434A (en) Image processing method, device, terminal equipment and computer readable storage medium
CN112214639A (en) Video screening method, video screening device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201821 room 208, building 4, No. 1411, Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai

Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.

Address before: Room 208, building 4, 1411 Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai, 201800

Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.