CN113989598A - Image-based identification method and device and vehicle

Image-based identification method and device and vehicle

Info

Publication number
CN113989598A
Authority
CN
China
Prior art keywords
image
detection
camera
information
identification information
Prior art date
Legal status
Pending
Application number
CN202010660939.6A
Other languages
Chinese (zh)
Inventor
夏晗
杨臻
张维
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010660939.6A
Priority to PCT/CN2021/105045 (WO2022007858A1)
Publication of CN113989598A

Classifications

    • G06F 18/00 Pattern recognition (G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING)
    • G06F 18/25 Pattern recognition; Analysing; Fusion techniques
    • G06T 3/40 Geometric image transformation in the plane of the image; Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 7/73 Image analysis; Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80 Image analysis; Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 10/80 Image or video recognition or understanding; Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 20/56 Scenes; Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Abstract

The embodiments of the present application provide an image-based identification method, an identification device, an electronic device, a storage medium, a program product, a chip, and a vehicle. The method comprises: acquiring images around a vehicle respectively captured by a plurality of image acquisition devices; identifying an object to be identified in the multiple frames of images and generating first identification information for each frame; determining a detection area of a target image according to the first identification information of a source image; and generating attribute information of the object to be identified according to the first identification information and each detection area. Determining the detection area of the target image from the first identification information of the source image is equivalent to a process in which the detection areas supplement one another. This mutual supplementation makes the identification complete and comprehensive, and when the attribute information of the object to be identified is then generated from the first identification information and each detection area, the attribute information is reliable and accurate.

Description

Image-based identification method and device and vehicle
Technical Field
The present application relates to the field of image recognition technology, and in particular, to the field of computer vision technology, and more particularly, to an image-based recognition method and apparatus, an electronic device, a storage medium, a chip, and a vehicle.
Background
With the development of automatic driving technology, how to improve the safety of automatic driving has become a focus of attention, and the recognition of traffic lights, vehicles, pedestrians, and the like is one of the important factors for improving that safety.
In the prior art, multi-camera fusion can be used to recognize traffic lights, vehicles, pedestrians, and the like. Specifically, a plurality of cameras (for example, two or three) may be mounted on the vehicle; each camera captures images around the vehicle and sends them to a processor; the processor then generates, from the individual images, an image containing the elements of each image and identifies that combined image to obtain attribute information of the traffic lights, vehicles, pedestrians, and so on, such as the color of a traffic light or the position of a pedestrian.
However, the inventors found that the prior art has at least the following problem: the accuracy of the attribute information obtained by the processor from identifying such a combined image is low.
Disclosure of Invention
In order to solve the technical problem, embodiments of the present application provide an image-based recognition method, an image-based recognition apparatus, an electronic device, a storage medium, a program product, a chip, and a vehicle.
According to an aspect of an embodiment of the present application, an embodiment of the present application provides an image-based recognition method, including:
acquiring images around a vehicle acquired by a plurality of image acquisition devices respectively;
identifying objects to be identified in a plurality of frames of images, and generating respective first identification information of the plurality of frames of images;
determining a detection area of a target image according to first identification information of a source image, wherein the source image is used for representing any frame image in each image, and the target image is used for representing other frame images except the source image in each image;
and generating attribute information of the object to be identified according to the first identification information and each detection area.
In the embodiment of the application, the images respectively acquired by the plurality of image acquisition devices are introduced so as to supplement the detection area of each image, and the comprehensiveness and reliability of the detection area are fully considered, so that the technical effect of identification accuracy is realized.
Specifically, determining the detection region of the target image based on the first identification information of the source image corresponds to a process in which the detection regions supplement one another. By mutually supplementing detection regions that may affect the identification result of the object to be identified, identification becomes complete and comprehensive; when the attribute information of the object to be identified is then generated from the first identification information and each detection region, that attribute information is reliable and accurate.
That is to say, according to the embodiment of the present application, the respective detection regions are complemented and improved based on the respective frame images, so that a relatively complete and comprehensive detection region can be obtained, and a technical effect of high reliability and high accuracy of the attribute information of the object to be identified is achieved.
In some embodiments, the first identification information includes a first detection frame, and determining the detection area of the target image according to the first identification information of the source image includes:
and determining the detection area according to the position information of the first detection frame.
In some embodiments, the determining the detection area according to the position information of the first detection frame comprises:
and determining the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image and the external reference between the camera of the source image and the camera of the target image.
In some embodiments, the determining the detection region according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image, and the external reference between the camera of the target image and the camera of the source image comprises:
determining the position information of the first detection frame in the coordinate system of the camera of the source image according to the position information of the first detection frame and the internal reference of the camera of the source image;
determining the position information of the first detection frame in the coordinate system of the camera of the target image according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external parameters;
and determining the detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
In the embodiment of the application, the first detection frame of the source image can be accurately mapped to the target image based on a plurality of times of conversion of the position information.
In some embodiments, before the determining the detection region according to the position information of the first detection frame, the method includes:
determining position conversion information between the target image and the source image;
and the determining the detection area according to the position information of the first detection frame comprises: and performing position conversion on the position information of the first detection frame according to the position conversion information to generate the detection area.
In the embodiment of the application, position conversion information between the target image and the source image can be determined firstly, and position conversion is performed on the position information of the first detection frame according to the position conversion information to generate the detection area.
Based on the above analysis, in some embodiments, the detection region may be determined based on multiple conversions of the position information, and in other embodiments, the position conversion information between the source image and the target image may be calibrated in advance, and after the calibration, a conversion calculation may be performed based on the position conversion information to obtain the detection region.
In some embodiments, the detection region is a region subjected to image super-resolution enhancement processing.
In the embodiment of the application, the image super-resolution enhancement processing can be performed on the detection area, so that the resolution of the detection area is improved, and the technical effects of the accuracy and the reliability of identification are improved in the subsequent identification process.
In some embodiments, the generating attribute information of the object to be identified according to the first identification information and each of the detection regions includes:
generating second identification information of each detection area;
and generating attribute information of the object to be identified according to the first identification information and the second identification information.
In some embodiments, the first identification information includes a first detection frame, the second identification information includes a second detection frame, and the generating the attribute information of the object to be identified according to the first identification information and the second identification information includes:
merging the first detection frame and the second detection frame;
and performing fusion processing on the combined detection frames to generate attribute information of the object to be identified.
In the embodiment of the application, a first detection frame and a second detection frame of a source image are combined (hereinafter referred to as a first combined detection frame), the first detection frame and the second detection frame of a target image are combined (hereinafter referred to as a second combined detection frame), and the first combined detection frame and the second combined detection frame are subjected to fusion processing to obtain attribute information of an object to be recognized.
In some embodiments, the fusing the merged detection frames, and generating the attribute information of the object to be recognized includes:
extracting the feature vector of the combined detection frame;
and outputting the attribute information of the object to be identified according to a preset fusion model and the feature vector.
In some embodiments, before the outputting the attribute information of the object to be recognized according to the preset fusion model and the feature vector, the method further includes:
acquiring a sample image, wherein the sample image is acquired by the plurality of image acquisition devices, and the sample image comprises the object to be identified;
and generating the fusion model according to the sample image and a preset truth value, wherein the truth value is determined according to at least one image in a plurality of images acquired by the plurality of image acquisition devices at the same time, and the at least one image comprises the object to be identified.
In the embodiment of the application, a true value can be labeled on the basis of one image in a plurality of images acquired by a plurality of image acquisition devices at the same time, and true values of other images can be determined on the basis of the true value of the image, for example, the true values are determined on the basis of position conversion information (which can be labeled in advance), so that the labeling cost is saved, and the efficiency of training the fusion model is improved.
In some embodiments, the plurality of image capture devices includes at least two of a long-focus camera, a medium-focus camera, and a short-focus camera.
In the embodiment of the application, the plurality of image acquisition devices are formed by cameras with different focal lengths, so that multiple frames of images can be acquired from different fields of view, the problems of missed identification and false identification are avoided, and the reliability and accuracy of identification are improved.
In some embodiments, the number of the plurality of image capture devices is three, and the three image capture devices comprise: one long-focus camera, one medium-focus camera, and one short-focus camera.
According to another aspect of the embodiments of the present application, there is also provided an apparatus for recognizing an object to be recognized, the apparatus including:
the first acquisition module is used for acquiring images around the vehicle acquired by the plurality of image acquisition devices;
the identification module is used for identifying objects to be identified in the images of multiple frames and generating respective first identification information of the images of the multiple frames;
the device comprises a determining module, a detecting module and a judging module, wherein the determining module is used for determining a detection area of a target image according to first identification information of a source image, the source image is used for representing any frame image in each image, and the target image is used for representing other frame images except the source image in each image;
and the first generation module is used for generating attribute information of the object to be identified according to the first identification information and each detection area.
In some embodiments, the first identification information includes a first detection frame, and the determination module is configured to determine the detection area according to position information of the first detection frame.
In some embodiments, the determination module is configured to determine the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image, and the external reference between the camera of the source image and the camera of the target image.
In some embodiments, the determining module is configured to determine, according to the position information of the first detection frame and the internal reference of the camera of the source image, the position information of the first detection frame in the coordinate system of the camera of the source image, determine, according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external reference, the position information of the first detection frame in the coordinate system of the camera of the target image, and determine the detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
In some embodiments, the determining module is configured to determine position conversion information between the target image and the source image, and perform position conversion on the position information of the first detection frame according to the position conversion information to generate the detection region.
In some embodiments, the detection region is a region subjected to image super-resolution enhancement processing.
In some embodiments, the first generating module is configured to generate second identification information of each detection area, and generate attribute information of the object to be identified according to the first identification information and the second identification information.
In some embodiments, the first identification information includes a first detection frame, the second identification information includes a second detection frame, and the first generation module is configured to combine the first detection frame and the second detection frame, perform fusion processing on the combined detection frames, and generate the attribute information of the object to be identified.
In some embodiments, the first generating module is configured to extract a feature vector of the combined detection frame, and output attribute information of the object to be recognized according to a preset fusion model and the feature vector.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a sample image, wherein the sample image is acquired by the plurality of image acquisition devices, and the sample image comprises the object to be identified;
and the second generation module is configured to generate the fusion model according to the sample image and a preset true value, where the true value is determined according to at least one image of multiple frames of images acquired by the multiple image acquisition devices at the same time, and the at least one image includes the object to be identified.
In some embodiments, the plurality of image capture devices includes at least two of a long-focus camera, a medium-focus camera, and a short-focus camera.
In some embodiments, the number of the plurality of image capture devices is three, and the three image capture devices comprise: one long-focus camera, one medium-focus camera, and one short-focus camera.
According to another aspect of the embodiments of the present application, there is also provided a computer storage medium having stored thereon computer instructions, which, when executed by a processor, cause the method of any of the above embodiments to be performed.
According to another aspect of embodiments of the present application, there is also provided a computer program product, which when run on a processor, causes the method of any of the above embodiments to be performed.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the method of any of the above embodiments to be performed.
According to another aspect of the embodiments of the present application, there is also provided a chip, including:
an input interface for receiving the images around the vehicle acquired by each of the plurality of image acquisition devices;
logic circuitry for performing the method of any of the above embodiments to generate attribute information of an object to be identified;
and an output interface for outputting the attribute information of the object to be identified, the attribute information being used for controlling the driving strategy of the vehicle.
According to another aspect of the embodiments of the present application, there is also provided a vehicle including the apparatus of any of the embodiments described above; alternatively, the electronic device of any of the above embodiments; alternatively, a chip as described in any of the above embodiments.
In some embodiments, the vehicle further comprises:
and the image acquisition devices are used for acquiring images.
In some embodiments, the plurality of image capture devices includes at least two of a long-focus camera, a medium-focus camera, and a short-focus camera.
In some embodiments, the number of the plurality of image capture devices is three, and the three image capture devices comprise: one long-focus camera, one medium-focus camera, and one short-focus camera.
Drawings
The drawings are included to provide a further understanding of the embodiments of the application and are not intended to limit the application. In the drawings:
fig. 1 is a schematic view of an application scenario of an image-based recognition method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of an image-based recognition method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating an image-based recognition method according to another embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating an image-based recognition method according to another embodiment of the present application;
FIG. 5 is a schematic diagram of an image-based recognition apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image-based recognition apparatus according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a chip according to an embodiment of the present application;
fig. 9 is a schematic view of a vehicle according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The image-based identification method can be applied to a scene in which a vehicle identifies objects (people and/or objects) to be identified around the vehicle.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of an image-based identification method according to an embodiment of the present application, and in the application scenario shown in fig. 1, an object to be identified is a traffic light.
In the application scenario shown in fig. 1, a plurality of image capturing devices (not shown) are provided on a vehicle 100 traveling on a road, and each image capturing device is used to capture an image of the surroundings of the vehicle 100 based on a Field of View (FOV).
In the application scenario shown in fig. 1, the image capturing device may capture an image including the traffic light 200 and transmit the image to a server (not shown in the figure) provided in the vehicle 100.
After receiving the images sent by the image acquisition devices, the server executes the image-based identification method of the embodiment of the application to generate the attribute information of the traffic lights.
It should be noted that the server above is merely an example of an execution subject and is not to be construed as limiting the execution subject of the embodiments of the present application.
For example, the execution subject of the embodiments of the present application may also be a processor provided in the vehicle 100, a vehicle-mounted box (T-BOX), a domain controller (DC), a multi-domain controller (MDC), an on-board unit (OBU), a chip, and the like.
In the related art, the method for identifying the object to be identified is as follows: a plurality of cameras are arranged on a vehicle, images are acquired by the cameras respectively, images including elements in the images are generated and recognized, and attribute information of an object to be recognized is generated.
However, performance parameters such as the FOV of the image capturing devices and/or the environment in which the vehicle is located (such as the weather or traffic congestion) may cause the accuracy of the generated attribute information of the object to be identified to be low.
In order to solve the above problems, the inventors of the present application, after having conducted creative work, have arrived at the inventive concept of the embodiments of the present application: images acquired by a plurality of image acquisition devices are acquired, and a plurality of frames of images complement and verify each other, thereby generating attribute information of an object to be identified.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
According to an aspect of the embodiments of the present application, the embodiments of the present application provide an image-based recognition method, which may be applied to an application scenario as shown in fig. 1.
Referring to fig. 2, fig. 2 is a flowchart illustrating an image-based recognition method according to an embodiment of the present application.
As shown in fig. 2, the method includes:
s101: images of the surroundings of the vehicle acquired by each of the plurality of image acquisition devices are acquired.
In addition, as can be seen from the above examples, in some embodiments, the image-based recognition device may be a server, a processor, an in-vehicle box, a domain controller, a multi-domain controller, an in-vehicle unit, a chip, and so on. In the embodiments of the present application, the execution subject is exemplarily described as a server.
The image acquisition device refers to an electronic device with an image acquisition function, such as a camera or a video camera, and its position can be set based on requirements, experience, and experiments.
Preferably, the position of the image capturing device may be set with the goal of maximizing the field of view of the captured images. Field-of-view maximization here can be understood as maximizing the field of view of the images acquired by a single image acquisition device and/or the combined field of view of the images acquired by all image acquisition devices.
That is to say, in this application embodiment, a plurality of image acquisition devices may be disposed on the vehicle, and each image acquisition device is connected to the server, and each image acquisition device sends the image acquired by each image acquisition device to the server.
In order to make the reader understand the scheme of the embodiment more clearly, the embodiment of the present application is described by taking the image capturing device as a camera and taking 3 image capturing devices as examples.
Specifically, the server receives images respectively transmitted by 3 cameras, that is, the server can receive 3 frames of images, and the 3 frames of images are from different 3 cameras.
In some embodiments, the plurality of image capture devices includes at least two of a long-focus camera, a medium-focus camera, and a short-focus camera.
For example, the plurality of image capturing devices may include two types of cameras: a long-focus camera and a medium-focus camera; as another example, a long-focus camera and a short-focus camera; as another example, a medium-focus camera and a short-focus camera; as yet another example, all three types: a long-focus camera, a medium-focus camera, and a short-focus camera.
Long focus, medium focus, and short focus are defined relative to a standard lens. For example, for a camera using 135 film (with a frame of 36 mm by 24 mm), a 50 mm lens is considered a standard lens; a lens shorter than 50 mm is short focus, a lens between 50 mm and 135 mm is medium focus, and a lens longer than 135 mm is long focus.
In some embodiments, the number of the plurality of image capturing devices is three, and the three image capturing devices comprise: one long-focus camera, one medium-focus camera, and one short-focus camera.
Considering the cost of the image acquisition devices and the completeness and comprehensiveness of the images acquired by each device, the number of image acquisition devices may be set to 3, with the 3 devices using cameras of different focal lengths, namely one long-focus camera, one medium-focus camera, and one short-focus camera.
S102: and identifying the object to be identified in the multi-frame images to generate respective first identification information of the multi-frame images.
As can be seen from the above examples, the object to be recognized includes people and/or objects around the vehicle, such as pedestrians, other vehicles, traffic lights, signs, and so on.
The first identification information refers to related information obtained by identifying the image, such as a detection frame.
In some embodiments, the recognition network model may be generated by model training, and the first recognition information may be generated according to the recognition network model.
It should be noted that the above description is only an exemplary description of one of the possible ways of generating the first identification information, and is not to be construed as limiting the way of generating the first identification information according to the embodiment of the present application.
As can be seen from the above example, in this step, after the server receives the 3 frames of images, the first identification information of each of the 3 frames of images, that is, the 3 pieces of first identification information, can be generated.
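Purely as an illustrative sketch (not part of the claimed method), the per-frame generation of first identification information might look as follows in Python; the function name and the detector interface detect(...) are assumptions standing in for whatever trained recognition network is used.

    # Illustrative sketch only: generating first identification information
    # (detection frames) for every frame received from the cameras.
    # "detect" stands in for any trained recognition network; its interface
    # is an assumption made here for illustration.
    from typing import Callable, Dict, List, Tuple

    Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    Detection = dict                          # e.g. {"box": Box, "label": "traffic_light", "score": 0.9}

    def generate_first_identification_info(
            images: Dict[str, object],                       # camera id -> image
            detect: Callable[[object], List[Detection]]):
        # Run the recognition network independently on each frame; the per-frame
        # results constitute the first identification information.
        return {cam_id: detect(img) for cam_id, img in images.items()}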
S103: and determining a detection area of a target image according to the first identification information of the source image, wherein the source image is used for representing any frame image in the images, and the target image is used for representing other frame images except the source image in the images.
The detection area refers to the position on the target image to which the first identification information of the source image corresponds when that information is mapped onto the target image.
This step can be understood as follows: for any one frame among the multiple frames of images, the detection areas of the other frames can be determined from that frame.
Based on the above example, if the 3 frames of images are image A, image B, and image C, and the source image is image A, then the target image may be image B or image C. When the target image is image B, a detection area is determined on image B based on the first identification information of image A; when the target image is image C, a detection area is determined on image C based on the first identification information of image A. By analogy, at least two detection regions can be determined on image A (at least one determined from the first identification information of image B and at least one from that of image C), at least two on image B (at least one from image A and at least one from image C), and at least two on image C (at least one from image A and at least one from image B).
In the embodiment of the application, the process of determining the detection area of the target image according to the first identification information of the source image is equivalent to the process of mutually supplementing the detection area, and the technical effects of completeness and comprehensiveness of identification can be realized by mutually supplementing the detection areas which may influence the identification result of the object to be identified, so that the technical effects of reliability and accuracy of subsequently generated attribute information of the object to be identified are realized.
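As a rough sketch of this mutual supplementation (the helper map_box, which projects a detection frame from a source camera into a target camera using the calibration of that camera pair, is an assumption; one possible geometric implementation of such a helper is sketched later in this description):

    # Illustrative sketch only: every frame's first detection frames are mapped
    # into every other frame to form detection areas there. "map_box" is an
    # assumed helper that projects a box from the source camera to the target
    # camera.
    def supplement_detection_areas(first_info, map_box):
        # first_info: camera id -> list of detections, e.g. {"box": (x1, y1, x2, y2), ...}
        detection_areas = {cam_id: [] for cam_id in first_info}
        for src_cam, detections in first_info.items():
            for tgt_cam in first_info:
                if tgt_cam == src_cam:
                    continue
                for det in detections:
                    detection_areas[tgt_cam].append(map_box(det["box"], src_cam, tgt_cam))
        return detection_areas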
S104: and generating attribute information of the object to be identified according to the first identification information and each detection area.
The attribute information of the object to be identified refers to information related to the position and the category of the object to be identified.
For example, if the object to be identified is a traffic light, the attribute information of the traffic light may be a distance between the traffic light and the vehicle, a color category of the traffic light (such as a red light, a green light, and a yellow light), a shape category of the traffic light (such as a left turn, a straight turn, and a right turn), and the like.
For another example, if the object to be identified is a pedestrian, the attribute information of the pedestrian may be a distance between the pedestrian and the vehicle, a category of the pedestrian (such as an adult, a child, and an elderly person), and the like.
Based on the above analysis, an embodiment of the present application provides an image-based recognition method, comprising: acquiring images around a vehicle respectively captured by a plurality of image acquisition devices; identifying an object to be identified in the multiple frames of images and generating first identification information for each frame; determining a detection area of a target image according to the first identification information of a source image, where the source image denotes any one frame among the images and the target image denotes any other frame; and generating attribute information of the object to be identified according to the first identification information and each detection area. Determining the detection area of the target image from the first identification information of the source image is equivalent to a process in which the detection areas supplement one another. By mutually supplementing detection areas that may affect the identification result of the object to be identified, identification becomes complete and comprehensive, and when the attribute information of the object to be identified is generated from the first identification information and each detection area, the attribute information is reliable and accurate.
For the reader to understand the determination method of the detection region more clearly, the image-based identification method according to the embodiment of the present application will be described in detail with reference to fig. 3. Fig. 3 is a schematic flowchart of an image-based recognition method according to another embodiment of the present application.
As shown in fig. 3, the method includes:
s201: images of the surroundings of the vehicle acquired by each of the plurality of image acquisition devices are acquired.
For the description of S201, reference may be made to S101, which is not described herein again.
S202: and identifying the object to be identified in the multi-frame images to generate respective first identification information of the multi-frame images.
For the description of S202, reference may be made to S102, which is not described herein again.
S203: and determining a detection area according to the position information of the first detection frame, wherein the first identification information comprises the first detection frame.
In some embodiments, S203 may include: and determining a detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image and the external reference between the camera of the source image and the camera of the target image.
In some embodiments, S203 may specifically comprise:
s2031: and determining the position information of the first detection frame in the coordinate system of the camera of the source image according to the position information of the first detection frame and the internal reference of the camera of the source image.
In some embodiments, the position information P_c1 of the first detection frame in the coordinate system of the camera of the source image may be determined according to Formula 1:
Formula 1: P_c1 = K1^{-1} · p_1
where K1 is the internal reference (intrinsic parameter matrix) of the camera of the source image and p_1 is the position information of the first detection frame in the source image.
S2032: and determining the position information of the first detection frame in the coordinate system of the camera of the target image according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external parameters.
In some embodiments, the position information P_c2 of the first detection frame in the coordinate system of the camera of the target image may be determined according to Formula 2:
Formula 2: P_c2 = T · P_c1
where T is the external reference (extrinsic transformation) between the camera of the source image and the camera of the target image.
S2033: and determining a detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
In some embodiments, the detection region p_2 may be determined according to Formula 3:
Formula 3: p_2 = K2 · P_c2
where K2 is the internal reference (intrinsic parameter matrix) of the camera of the target image.
In the embodiment of the application, the first detection frame of the source image can be accurately mapped to the target image through conversion among the position information, so that the technical effects of accuracy and reliability of the determined detection area are achieved.
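A minimal numeric sketch of the three conversions above is given below, with K1 and K2 as 3x3 intrinsic matrices and T as a 4x4 extrinsic transform; the fixed depth used to lift the detection-frame corners into 3D is an illustrative assumption, since the description above does not specify how depth is obtained.

    # Illustrative sketch only: mapping the corners of a first detection frame
    # from the source camera to the target image using Formulas 1-3.
    import numpy as np

    def map_box_between_cameras(box, K1, K2, T, depth=10.0):
        x_min, y_min, x_max, y_max = box
        corners = np.array([[x_min, y_min, 1.0],
                            [x_max, y_max, 1.0]]).T            # homogeneous pixels, shape (3, 2)

        # Formula 1: back-project into the source camera coordinate system.
        pts_cam1 = depth * (np.linalg.inv(K1) @ corners)

        # Formula 2: transform into the target camera coordinate system.
        pts_h = np.vstack([pts_cam1, np.ones((1, 2))])         # homogeneous 3D points
        pts_cam2 = (T @ pts_h)[:3]

        # Formula 3: project onto the target image plane.
        pix = K2 @ pts_cam2
        pix = pix[:2] / pix[2]
        (u1, v1), (u2, v2) = pix.T
        return (min(u1, u2), min(v1, v2), max(u1, u2), max(v1, v2))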
S204: and generating attribute information of the object to be identified according to the first identification information and each detection area.
For the description of S204, reference may be made to S104, which is not described herein again.
In the above example, the first detection frame of the source image is accurately mapped to the target image based on a plurality of conversions of the position information, and in other embodiments, the position conversion information between the target image and the source image may be determined first, and the position information of the first detection frame is subjected to position conversion according to the position conversion information to generate the detection area.
That is, in some embodiments, the detection region may be determined based on a plurality of conversions of the position information, and in other embodiments, the position conversion information between the source image and the target image may be calibrated in advance, and after the calibration, a conversion calculation may be performed based on the position conversion information to obtain the detection region.
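If the position conversion information is calibrated in advance, the per-frame computation becomes a single matrix operation. The sketch below assumes the calibrated conversion takes the form of a 3x3 planar homography H from source-image pixels to target-image pixels, which is only one possible form of such conversion information.

    # Illustrative sketch only: applying pre-calibrated position conversion
    # information, assumed here to be a 3x3 homography H.
    import numpy as np

    def convert_box_with_homography(box, H):
        x_min, y_min, x_max, y_max = box
        pts = np.array([[x_min, y_min, 1.0],
                        [x_max, y_min, 1.0],
                        [x_max, y_max, 1.0],
                        [x_min, y_max, 1.0]]).T    # four corners, shape (3, 4)
        mapped = H @ pts
        mapped = mapped[:2] / mapped[2]            # perspective division
        xs, ys = mapped
        return (xs.min(), ys.min(), xs.max(), ys.max())   # enclosing detection area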
In some embodiments, after determining the attribute information of the object to be identified, the driving strategy of the vehicle may be controlled based on the attribute information of the object to be identified.
The driving strategy refers to information related to the driving state of the vehicle, such as accelerating, decelerating, braking, turning left, turning right, and the like.
As can be seen from the above example, if the object to be identified is a traffic light, the attribute information of the traffic light may include a color category of the traffic light, and the like, and when the color category is a red light, the server may control the vehicle to brake or to decelerate (where the driving policy may be determined to be braking or decelerating based on the distance between the vehicle and the stop line), and the like.
Therefore, when the driving strategy of the vehicle is controlled based on the attribute information of the object to be recognized, the safety and reliability of vehicle driving can be improved.
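For illustration only, such a decision could be sketched as follows; the 10-metre threshold and the strategy names are assumptions, not values given in this application.

    # Illustrative sketch only: choosing a driving strategy from traffic-light
    # attribute information. The threshold and strategy names are arbitrary.
    def choose_driving_strategy(light_color: str, distance_to_stop_line_m: float) -> str:
        if light_color == "red":
            return "brake" if distance_to_stop_line_m < 10.0 else "decelerate"
        if light_color == "yellow":
            return "decelerate"
        return "keep_driving"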
In order to make the reader understand the method for generating the attribute information of the object to be recognized according to the first identification information and each detection area more deeply, the image-based recognition method according to the embodiment of the present application will be described in detail with reference to fig. 4. Fig. 4 is a schematic flowchart of an image-based recognition method according to another embodiment of the present application.
As shown in fig. 4, the method includes:
s301: images of the surroundings of the vehicle acquired by each of the plurality of image acquisition devices are acquired.
For the description of S301, reference may be made to S101, which is not described herein again.
S302: and identifying the object to be identified in the multi-frame images to generate respective first identification information of the multi-frame images.
For the description of S302, reference may be made to S102, which is not described herein again.
S303: and determining a detection area of a target image according to the first identification information of the source image, wherein the source image is used for representing any frame image in the images, and the target image is used for representing other frame images except the source image in the images.
For the description of S303, reference may be made to S103, or reference may be made to S203, which is not described herein again.
S304: second identification information of each detection area is generated.
Similarly, the second identification information refers to related information, such as a detection frame, obtained by identification.
In some embodiments, the recognition network model may be generated by model training, and the second recognition information may be generated according to the recognition network model.
It should be noted that the above description is only an exemplary description of one of the possible ways of generating the second identification information, and is not to be construed as a limitation on the way of generating the second identification information according to the embodiment of the present application.
Based on the above example, suppose image A has exactly 2 detection regions: one determined from the first identification information of image B and one determined from the first identification information of image C. The 2 detection regions are identified separately to obtain their respective second identification information, and the same is done for the detection regions on image B and image C. In effect, image A includes one piece of first identification information and two pieces of second identification information; image B also includes one piece of first identification information and two pieces of second identification information; and so does image C.
In some embodiments, the detection region may be subjected to image super-resolution enhancement processing to improve the resolution of the detection region, thereby achieving a technical effect of improving the accuracy and reliability of the generated second identification information.
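A sketch of such enhancement is shown below; plain bicubic upscaling with OpenCV is used here only as a stand-in, since the application does not specify which image super-resolution method is applied.

    # Illustrative sketch only: cropping a detection area and upscaling it
    # before the second identification pass. Bicubic interpolation is a
    # stand-in for a real super-resolution model.
    import cv2

    def enhance_detection_area(image, box, scale=2):
        x_min, y_min, x_max, y_max = [int(v) for v in box]
        crop = image[y_min:y_max, x_min:x_max]
        return cv2.resize(crop, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_CUBIC)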
S305: and merging the first detection frame and the second detection frame, wherein the first identification information comprises the first detection frame, and the second identification information comprises the second detection frame.
The merging process refers to taking the union of the first detection frame(s) and the second detection frame(s) that correspond to any one image acquisition device.
Based on the above example, the image capturing device for image A corresponds to one piece of first identification information and two pieces of second identification information, that is, one first detection frame and two second detection frames. These are merged, and the merged set of detection frames contains both the content of the one first detection frame and the content of the two second detection frames. By analogy, 3 merged sets of detection frames can be obtained.
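A sketch of this merging step, under the assumption that the first and second identification information of each image capture device have already been reduced to lists of detection frames:

    # Illustrative sketch only: merging (taking the union of) the first and
    # second detection frames of each image capture device.
    def merge_detection_frames(first_frames, second_frames):
        # first_frames / second_frames: camera id -> list of detection frames
        cameras = set(first_frames) | set(second_frames)
        return {cam: list(first_frames.get(cam, [])) + list(second_frames.get(cam, []))
                for cam in cameras}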
S306: and performing fusion processing on the combined detection frames to generate attribute information of the object to be identified.
The fusion processing refers to determining the attribute information of the object to be identified according to the merged detection frames.
In some embodiments, S306 comprises:
s3061: and extracting the feature vector of the combined detection frame.
S3062: and outputting attribute information of the object to be identified according to the preset fusion model and the feature vector.
The fusion model is generated based on a network model framework and sample images; the embodiments of the present application do not limit the choice of the network model framework.
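For illustration, the fusion step could be sketched as below; the feature extractor and the fusion model are placeholders for whatever networks are actually used, and their interfaces are assumptions.

    # Illustrative sketch only: extracting a feature vector for every merged
    # detection frame and feeding the stacked vectors to a fusion model that
    # outputs attribute information of the object to be identified.
    import numpy as np

    def fuse_detection_frames(merged_frames, extract_feature, fusion_model):
        # merged_frames: camera id -> list of detection frames
        features = [extract_feature(cam_id, box)
                    for cam_id, boxes in merged_frames.items()
                    for box in boxes]
        if not features:
            return None
        return fusion_model(np.stack(features))   # -> attribute information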
In some embodiments, a method of generating a fusion model may include:
s01: and acquiring a sample image, wherein the sample image is acquired by a plurality of image acquisition devices, and the sample image comprises the object to be identified.
The number of sample images may be set based on requirements, experience, and experimentation. For example, in general, for application scenarios that require higher image-based recognition accuracy, relatively more sample images may be selected, while for application scenarios with lower accuracy requirements, relatively fewer sample images may be selected.
S02: and generating a fusion model according to the sample image and a preset truth value, wherein the truth value is determined according to at least one image in a plurality of frames of images acquired by a plurality of image acquisition devices at the same time, and the at least one image comprises an object to be identified.
Specifically, the loss may be calculated according to the output result and the true value for the sample image, and the fusion model may be updated and iterated according to the loss until the loss reaches a preset loss threshold, or the number of iterations reaches a preset number threshold.
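A training-loop sketch consistent with this description is given below; the model, loss function, and update step are abstract placeholders, and the stopping thresholds are illustrative values.

    # Illustrative sketch only: iterating the fusion model against preset
    # truth values until the loss or the iteration count reaches a threshold.
    def train_fusion_model(model, samples, truths, loss_fn, update_step,
                           loss_threshold=1e-3, max_iterations=10000):
        for _ in range(max_iterations):
            total_loss = 0.0
            for sample, truth in zip(samples, truths):
                prediction = model(sample)
                loss = loss_fn(prediction, truth)
                update_step(model, loss)          # e.g. one gradient-descent update
                total_loss += float(loss)
            if total_loss / max(len(samples), 1) <= loss_threshold:
                break
        return model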
According to another aspect of the embodiments of the present application, there is also provided an image-based recognition apparatus for performing the method according to any of the above embodiments, such as the method shown in any of fig. 2 to 4.
Referring to fig. 5, fig. 5 is a schematic diagram of an image-based recognition apparatus according to an embodiment of the present application.
As shown in fig. 5, the apparatus includes:
a first acquiring module 11, configured to acquire images around a vehicle acquired by each of the plurality of image acquiring devices;
the identification module 12 is configured to identify an object to be identified in multiple frames of the images, and generate respective first identification information of the multiple frames of the images;
a determining module 13, configured to determine a detection region of a target image according to first identification information of a source image, where the source image is used to represent any one frame image in each image, and the target image is used to represent other frame images in each image except for the source image;
and a first generating module 14, configured to generate attribute information of the object to be identified according to the first identification information and each detection area.
In some embodiments, the first identification information includes a first detection frame, and the determining module 13 is configured to determine the detection area according to position information of the first detection frame.
In some embodiments, the determining module 13 is configured to determine the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image, and the external reference between the camera of the source image and the camera of the target image.
In some embodiments, the determining module 13 is configured to determine, according to the position information of the first detection frame and the internal reference of the camera of the source image, the position information of the first detection frame in the coordinate system of the camera of the source image, determine, according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external reference, the position information of the first detection frame in the coordinate system of the camera of the target image, and determine the detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
In some embodiments, the determining module 13 is configured to determine position conversion information between the target image and the source image, and perform position conversion on the position information of the first detection frame according to the position conversion information to generate the detection area.
In some embodiments, the detection region is a region subjected to image super-resolution enhancement processing.
In some embodiments, the first generating module 14 is configured to generate second identification information of each of the detection regions, and generate attribute information of the object to be identified according to the first identification information and the second identification information.
In some embodiments, the first identification information includes a first detection frame, the second identification information includes a second detection frame, and the first generating module 14 is configured to combine the first detection frame and the second detection frame, perform fusion processing on the combined detection frames, and generate the attribute information of the object to be identified.
In some embodiments, the first generating module 14 is configured to extract a feature vector of the combined detection frame, and output attribute information of the object to be identified according to a preset fusion model and the feature vector.
As can be seen in conjunction with fig. 6, in some embodiments, the apparatus further comprises:
a second obtaining module 15, configured to obtain a sample image, where the sample image is obtained by the multiple image obtaining devices, and the sample image includes the object to be identified;
a second generating module 16, configured to generate the fusion model according to the sample image and a preset true value, where the true value is determined according to at least one image of multiple frames of images acquired by the multiple image acquisition devices at the same time, and the at least one image includes the object to be identified.
In some embodiments, the plurality of image capture devices includes at least two of a tele camera, a mid camera, and a short camera.
In some embodiments, there are three image capture devices: one long-focus camera, one medium-focus camera, and one short-focus camera.
According to another aspect of the embodiments of the present disclosure, there is also provided an electronic device including at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the method of any of the above embodiments to be performed, such as the method of any of the embodiments of fig. 2 to 4.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
The electronic device may represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other suitable computers. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
For example, the electronic device may be a vehicle BOX (T-BOX) provided in a vehicle, a Domain Controller (DC), a Multi-domain Controller (MDC), an On Board Unit (OBU), a chip, or the like.
In particular, the electronic device may include at least one processor 101, a communication bus 102, and at least one communication interface 103.
The processor 101 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present disclosure. The processor 101 may be connected to the memory 104 through at least one communication interface 103, and the memory 104 may be disposed inside or outside the electronic device. For example, the memory 104 may be a register, a cache, or the like inside the electronic device, or the memory 104 may be a storage device external to the electronic device.
For example, if the electronic device is a vehicle-mounted box, the vehicle-mounted box includes at least one processor, a communication bus, and at least one communication interface. A processor in the electronic device may be connected, through the communication interface, to a storage device disposed outside the electronic device, obtain instructions from that storage device via the communication interface, and execute the instructions to implement the method according to any one of the embodiments in fig. 2 to fig. 4.
Of course, in other embodiments, the electronic device may be provided with a memory therein for storing instructions, and the processor may retrieve the instructions from the memory through the communication bus, and when executing the instructions, the processor may implement the method according to any one of fig. 2 to 4.
It should be noted that, the vehicle-mounted box is only exemplarily described here, and the electronic device may also be any one of a domain controller, a multi-domain controller, a vehicle-mounted unit and a chip, and the principle is the same as that of the vehicle-mounted box.
In some embodiments, if the memory 104 is a storage device disposed outside the electronic device, the processor 101 may be connected to the external storage device through the communication interface 103 and obtain instructions from it through the communication interface 103; when the processor 101 executes the instructions, the method as shown in any one of fig. 2 to fig. 4 may be implemented.
In some embodiments, if memory 104 is disposed in the electronic device, memory 104 may be a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 104 may be self-contained and coupled to the processor 101 via the communication bus 102. The memory 104 may also be integrated with the processor 101.
The memory 104 may be a computer storage medium provided in the present disclosure, and the memory 104 stores instructions executable by the at least one processor 101 to cause the at least one processor 101 to execute the method according to any one of fig. 2 to 4.
Memory 104, which is a type of computer storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. The processor 101 executes various functional applications and data processing of the electronic device by running non-transitory software programs, instructions and modules stored in the memory 104, i.e. implements the method as shown in any of the embodiments of fig. 2 to 4.
The memory 104 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the stored data area may store data created according to use of the vehicle-end device, and the like. Further, the memory 104 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 104 may optionally include memory located remotely from the processor 101, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, car networking, intranets, local area networks, blockchain networks, mobile communication networks, and combinations thereof.
The communication bus 102 may include a path that conveys information between the aforementioned components.
The communication interface 103 may be any transceiver, IP port, bus interface, or the like, and is used for communicating with internal or external devices, other electronic devices, or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN). If the electronic device is a functional unit integrated in the vehicle, the communication interface 103 includes one or more of a transceiver for communication with a network outside the vehicle, a bus interface for communication with other internal units in the vehicle (e.g., a Controller Area Network (CAN) bus interface), and the like.
In a particular implementation, as an embodiment, the processor 101 may include one or more CPUs, such as CPU0 and CPU1 in fig. 7.
In a particular implementation, as an embodiment, the electronic device may include multiple processors, such as the processor 101 and the processor 107 in fig. 7. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, the electronic device may further include an output device 105 and an input device 106, as an embodiment. The output device 105 is in communication with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 106 is in communication with the processor 101 and can accept user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
When the electronic device shown in fig. 7 is a chip, the function/implementation process of the communication interface 103 may also be implemented by pins or circuits.
According to another aspect of the embodiments of the present application, there is also provided a chip.
Referring to fig. 8, fig. 8 is a schematic diagram of a chip according to an embodiment of the disclosure.
As shown in fig. 8, the chip includes:
an input interface 21, configured to receive images around the vehicle acquired by each of the plurality of image acquisition devices;
a logic circuit 22, configured to execute the method according to any of the above embodiments, for example, execute the method shown in any of fig. 2 to 4, and generate attribute information of an object to be identified;
and an output interface 23, used for controlling the driving strategy of the vehicle according to the attribute information of the object to be identified.
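Purely as a software-level illustration of how these three interfaces cooperate (the class and method names are assumptions; in the actual chip they are realized by pins, circuits, and logic circuitry):

```python
class RecognitionChip:
    """Software model of the chip in fig. 8; names are illustrative assumptions."""

    def __init__(self, recognize_fn):
        # recognize_fn implements the image-based recognition method (fig. 2-4).
        self.recognize_fn = recognize_fn
        self.images = None

    def input_interface(self, images_per_camera):
        # One frame per image acquisition device, captured around the vehicle.
        self.images = images_per_camera

    def logic_circuit(self):
        # Runs the recognition method and produces attribute information.
        return self.recognize_fn(self.images)

    def output_interface(self, attributes):
        # The attribute information is handed to the driving-strategy controller.
        return attributes
```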
According to another aspect of the embodiments of the present application, there is also provided a vehicle including the image-based recognition apparatus according to any one of the embodiments, such as the image-based recognition apparatus shown in fig. 5 or 6; or, an electronic device as described in the above example, such as the electronic device shown in fig. 7; alternatively, a chip as described in the above example, such as the chip shown in fig. 8.
In some embodiments, the vehicle further comprises: a plurality of image acquisition devices, which are used for acquiring images.
Specifically, when the vehicle includes the image-based recognition device shown in fig. 5 or fig. 6, the plurality of image capturing devices are respectively connected to the image-based recognition device and transmit their captured images to it; when the vehicle includes the electronic device shown in fig. 7, the plurality of image capturing devices are respectively connected to the electronic device and transmit their captured images to it; when the vehicle includes the chip shown in fig. 8, the plurality of image capturing devices are respectively connected to the chip and transmit their captured images to it.
The interior components of the vehicle are now exemplarily described with reference to fig. 9.
As shown in fig. 9, the vehicle includes: the system comprises a processor 201, an external memory interface 202, an internal memory 203, a Universal Serial Bus (USB) interface 204, a power management module 205, an antenna 1, an antenna 2, a mobile communication module 206, a wireless communication module 207, a sensor 208, an image acquisition device 209 and a vehicle-mounted box 210. It is to be understood that the structure illustrated in the present embodiment does not constitute a specific limitation of the vehicle.
Wherein the vehicle may interact with external devices through the wireless communication module 207.
The sensors 208 include, among other things, radar as described in fig. 9.
In other embodiments of the present disclosure, the vehicle may include more or fewer components than illustrated, or some components may be combined, some components may be split, or a different arrangement of components. Also, the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 201 may include one or more processing units, such as: the processor 201 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. In some embodiments, the vehicle may also include one or more processors 201. The controller may be a neural center and a command center of the vehicle, among others. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution. A memory may also be provided in the processor 201 for storing instructions and data. In some embodiments, the memory in the processor 201 is a cache memory.
In some embodiments, the processor 201 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a universal serial bus (USB) interface, etc. The USB interface 204 is an interface conforming to the USB standard specification and can be used to connect a charger to charge the vehicle.
It should be understood that the illustrated interface connection relationship between the modules in the embodiments of the present disclosure is only an exemplary illustration, and does not constitute a structural limitation for the vehicle. In other embodiments of the present disclosure, the vehicle may also adopt different interface connection manners or a combination of a plurality of interface connection manners in the above embodiments.
It should be noted that, when the vehicle includes an image-based recognition device, the image-based recognition device may be the processor 201 as described in fig. 9, or may be the vehicle-mounted box 210 as described in fig. 9; when the vehicle includes an electronic device, then the electronic device may be the processor 201 as described in fig. 9, or may be the in-vehicle box 210 as described in fig. 9; when the vehicle includes a chip, then the chip may be a chip in the processor 201 as described in fig. 9, or may be a chip in the on-board box 210 as described in fig. 9.
It is clear to a person skilled in the art that the descriptions of the embodiments provided in the present application may refer to one another. For convenience and brevity of the description, the functions and steps performed by the devices and apparatuses provided in the embodiments of the present application may refer to the relevant descriptions of the method embodiments of the present application, and the method embodiments and the device embodiments may likewise be cross-referenced.
Those skilled in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium; when executed, the program performs all or part of the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways without departing from the scope of the application. For example, the above-described embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Additionally, the apparatus and methods described, as well as the illustrations of various embodiments, may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present application. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electronic, mechanical or other form.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. An image-based recognition method, the method comprising:
acquiring images around a vehicle acquired by a plurality of image acquisition devices respectively;
identifying objects to be identified in a plurality of frames of images, and generating respective first identification information of the plurality of frames of images;
determining a detection area of a target image according to first identification information of a source image, wherein the source image is used for representing any frame image in each image, and the target image is used for representing other frame images except the source image in each image;
and generating attribute information of the object to be identified according to the first identification information and each detection area.
2. The method of claim 1, wherein the first identification information comprises a first detection frame, and wherein the determining the detection area of the target image according to the first identification information of the source image comprises:
and determining the detection area according to the position information of the first detection frame.
3. The method of claim 2, wherein the determining the detection area according to the position information of the first detection frame comprises:
and determining the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image and the external reference between the camera of the source image and the camera of the target image.
4. The method of claim 3, wherein the determining the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image, and the external reference between the camera of the source image and the camera of the target image comprises:
determining the position information of the first detection frame in the coordinate system of the camera of the source image according to the position information of the first detection frame and the internal reference of the camera of the source image;
determining the position information of the first detection frame in the coordinate system of the camera of the target image according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external parameters;
and determining the detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
5. The method according to claim 2, wherein before the determining the detection area according to the position information of the first detection frame, the method comprises:
determining position conversion information between the target image and the source image;
the determining the detection area according to the position information of the first detection frame comprises: performing position conversion on the position information of the first detection frame according to the position conversion information to generate the detection area.
6. The method according to any one of claims 1 to 5, wherein the generating attribute information of the object to be identified according to the first identification information and each detection area comprises:
generating second identification information of each detection area;
and generating attribute information of the object to be identified according to the first identification information and the second identification information.
7. The method according to claim 6, wherein the first identification information includes a first detection frame, the second identification information includes a second detection frame, and the generating the attribute information of the object to be identified according to the first identification information and the second identification information includes:
merging the first detection frame and the second detection frame;
and performing fusion processing on the combined detection frames to generate attribute information of the object to be identified.
8. The method according to claim 7, wherein the fusing the merged detection frames to generate the attribute information of the object to be identified comprises:
extracting the feature vector of the combined detection frame;
and outputting the attribute information of the object to be identified according to a preset fusion model and the feature vector.
9. The method according to claim 8, wherein before the outputting the attribute information of the object to be recognized according to the preset fusion model and the feature vector, the method further comprises:
acquiring a sample image, wherein the sample image is acquired by the plurality of image acquisition devices, and the sample image comprises the object to be identified;
and generating the fusion model according to the sample image and a preset truth value, wherein the truth value is determined according to at least one image in a plurality of images acquired by the plurality of image acquisition devices at the same time, and the at least one image comprises the object to be identified.
10. An image-based recognition apparatus, the apparatus comprising:
the first acquisition module is used for acquiring images around the vehicle acquired by the plurality of image acquisition devices;
the identification module is used for identifying objects to be identified in the images of multiple frames and generating respective first identification information of the images of the multiple frames;
the determining module is used for determining a detection area of a target image according to first identification information of a source image, wherein the source image is used for representing any frame image in each image, and the target image is used for representing other frame images except the source image in each image;
and the first generation module is used for generating attribute information of the object to be identified according to the first identification information and each detection area.
11. The apparatus of claim 10, wherein the first identification information comprises a first detection frame, and the determining module is configured to determine the detection area according to position information of the first detection frame.
12. The apparatus of claim 11, wherein the determining module is configured to determine the detection area according to the position information of the first detection frame, the internal reference of the camera of the source image, the internal reference of the camera of the target image, and the external reference between the camera of the source image and the camera of the target image.
13. The apparatus according to claim 12, wherein the determining module is configured to determine the position information of the first detection frame in the coordinate system of the camera of the source image according to the position information of the first detection frame and the internal reference of the camera of the source image, determine the position information of the first detection frame in the coordinate system of the camera of the target image according to the position information of the first detection frame in the coordinate system of the camera of the source image and the external reference, and determine the detection area according to the position information of the first detection frame in the coordinate system of the camera of the target image and the internal reference of the camera of the target image.
14. The apparatus according to claim 11, wherein the determining module is configured to determine position conversion information between the target image and the source image, and perform position conversion on the position information of the first detection frame according to the position conversion information to generate the detection area.
15. The apparatus according to any one of claims 10 to 14, wherein the first generating module is configured to generate second identification information for each detection area, and generate attribute information of the object to be identified according to the first identification information and the second identification information.
16. The apparatus according to claim 15, wherein the first identification information includes a first detection frame, the second identification information includes a second detection frame, and the first generating module is configured to perform merging processing on the first detection frame and the second detection frame, perform fusion processing on the merged detection frames, and generate the attribute information of the object to be identified.
17. The apparatus according to claim 16, wherein the first generating module is configured to extract a feature vector of the merged detection frames, and output the attribute information of the object to be identified according to a preset fusion model and the feature vector.
18. The apparatus of claim 17, further comprising:
the second acquisition module is used for acquiring a sample image, wherein the sample image is acquired by the plurality of image acquisition devices, and the sample image comprises the object to be identified;
and the second generation module is configured to generate the fusion model according to the sample image and a preset true value, where the true value is determined according to at least one image of multiple frames of images acquired by the multiple image acquisition devices at the same time, and the at least one image includes the object to be identified.
19. A computer storage medium having stored thereon computer instructions which, when executed by a processor, cause the method of any of claims 1 to 9 to be performed.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the method of any of claims 1-9 to be performed.
21. A chip, comprising:
an input interface, configured to receive images around the vehicle acquired by each of the plurality of image acquisition devices;
logic circuitry, configured to perform the method of any one of claims 1 to 9 to generate attribute information of an object to be identified;
and an output interface, used for controlling the driving strategy of the vehicle according to the attribute information of the object to be identified.
22. A vehicle, characterized in that the vehicle comprises a device according to any one of claims 10 to 18; or the electronic device of claim 20; or the chip of claim 21.
CN202010660939.6A 2020-07-10 2020-07-10 Image-based identification method and device and vehicle Pending CN113989598A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010660939.6A CN113989598A (en) 2020-07-10 2020-07-10 Image-based identification method and device and vehicle
PCT/CN2021/105045 WO2022007858A1 (en) 2020-07-10 2021-07-07 Image-based recognition method and apparatus, and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010660939.6A CN113989598A (en) 2020-07-10 2020-07-10 Image-based identification method and device and vehicle

Publications (1)

Publication Number Publication Date
CN113989598A true CN113989598A (en) 2022-01-28

Family

ID=79552272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010660939.6A Pending CN113989598A (en) 2020-07-10 2020-07-10 Image-based identification method and device and vehicle

Country Status (2)

Country Link
CN (1) CN113989598A (en)
WO (1) WO2022007858A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996976B2 (en) * 2014-05-05 2018-06-12 Avigilon Fortress Corporation System and method for real-time overlay of map features onto a video feed
CN109116397B (en) * 2018-07-25 2022-12-30 吉林大学 Vehicle-mounted multi-camera visual positioning method, device, equipment and storage medium
CN110532997B (en) * 2019-09-05 2022-04-12 杭州视在科技有限公司 Method for automatically acquiring complete information of station level through multi-camera fusion for airport
CN114663528A (en) * 2019-10-09 2022-06-24 阿波罗智能技术(北京)有限公司 Multi-phase external parameter combined calibration method, device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626462A (en) * 2022-03-16 2022-06-14 小米汽车科技有限公司 Pavement mark recognition method, device, equipment and storage medium
CN114626462B (en) * 2022-03-16 2023-03-24 小米汽车科技有限公司 Pavement mark recognition method, device, equipment and storage medium
CN115131825A (en) * 2022-07-14 2022-09-30 北京百度网讯科技有限公司 Human body attribute identification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022007858A1 (en) 2022-01-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination