CN113705279B - Method and device for identifying position of target object
- Publication number
- CN113705279B (application CN202010435949.XA)
- Authority
- CN
- China
- Prior art keywords
- target object
- feature map
- feature
- Prior art date
- 2020-05-21
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The embodiments of the present application provide a method and device for identifying the position of a target object, relating to the field of intelligent transportation. The method includes: inputting an image captured by a front camera of a vehicle into a detection model, where the detection model includes a feature extraction network layer and a detection head; extracting features of a target object in the image with the feature extraction network layer to obtain a first feature map; obtaining a second feature map from the first feature map, where the second feature map contains part or all of the features of the target object and is smaller than the first feature map; and computing on the second feature map with the detection head to obtain position information of the target object. Because the feature map is cropped rather than the input image, the drop in target-feature-extraction accuracy caused by cropping the image is avoided, and the amount of computation in the subsequent calculation on the cropped feature map containing the target object's features is reduced, making the method suitable for low-compute chips.
Description
Technical Field
The present application relates to intelligent transportation in the field of data processing technologies, and in particular, to a method and an apparatus for identifying a position of a target object.
Background
An advanced driver-assistance system (ADAS) is an important technology in the field of automatic driving. An ADAS collects data from sensors mounted on the vehicle while it is running and combines the data with navigation map data and the like for computation and analysis in order to control the vehicle. For example, pedestrian collision warning is a core ADAS function: it performs pedestrian detection and issues warning information in time when a pedestrian is detected. In pedestrian detection, accurate but complex algorithms achieve higher detection accuracy. However, some chips built into vehicles are low-compute chips and cannot run such algorithms. It is therefore worth discussing how to detect pedestrians when the built-in chip is a low-compute chip.
In the prior art, a pedestrian detection method suitable for low-compute chips crops the image captured by the vehicle-mounted camera and inputs the cropped image into a deep-learning network, which performs feature extraction and regresses the coordinates of the target center to obtain a pedestrian detection result.
However, this prior-art method may result in low pedestrian-detection accuracy, because cropping the image can remove part of a pedestrian before any features are extracted.
Disclosure of Invention
The embodiments of the present application provide a method and device for identifying the position of a target object, aiming to solve the technical problem of low pedestrian-detection accuracy in the prior art.
A first aspect of an embodiment of the present application provides a method for identifying a position of a target object, including:
inputting an image captured by a front camera of a vehicle into a detection model, the detection model comprising a feature extraction network layer and a detection head; extracting features of a target object in the image using the feature extraction network layer to obtain a first feature map; obtaining a second feature map from the first feature map, where the second feature map comprises part or all of the features of the target object and is smaller than the first feature map; and computing on the second feature map using the detection head to obtain the position information of the target object. In this way, the feature map extracted from the image, rather than the image itself, is cropped, which avoids the drop in target-feature-extraction accuracy caused by cropping the image; and because the feature map is cropped, the amount of computation is reduced in the subsequent calculation on the cropped feature map containing the target object's features, making the method suitable for low-compute chips.
In a possible implementation manner, the obtaining of the second feature map from the first feature map includes: extracting the lower-half region of the first feature map to obtain the second feature map. This preserves detection accuracy while greatly reducing computation overhead.
In a possible implementation manner, the obtaining of the second feature map from the first feature map includes: extracting the region inside the lane lines in the first feature map to obtain the second feature map. This reduces computation while still providing the vehicle with accurate warnings.
In a possible implementation manner, the obtaining of the second feature map from the first feature map includes: extracting the region that is in the lower half of the first feature map and inside the lane lines to obtain the second feature map. This greatly reduces computation while still providing the vehicle with accurate warnings.
In a possible implementation manner, the extracting of the features of the target object in the image using the feature extraction network layer to obtain a feature map includes: extracting the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, where each layer's first feature map has a different downsampling factor; and the obtaining of the second feature map from the first feature map includes: obtaining one second feature map from each of the three first feature maps using the detection head.
In a possible implementation manner, the computing on the second feature map using the detection head to obtain the position information of the target object includes: fusing the three second feature maps using the detection head and outputting the position information of the target object. It can be understood that in a typical detection network the backbone and the detection head each account for roughly half of the total computation, so splitting after the backbone halves the detection head's computation and cuts the total network computation by about 25%; in this way, the small detection model can run on a low-performance in-vehicle unit.
In one possible implementation, a warning indicating that the target object is too close is issued when the distance between the target object and the vehicle is below a threshold, thereby better assisting the driver.
In a possible implementation manner, the downsampling factors of the three first feature maps are 8, 16, and 32, respectively.
A second aspect of the embodiments of the present application provides an apparatus for identifying a position of a target object, including:
an input module, configured to input the image captured by the front camera of the vehicle into the detection model, the detection model comprising a feature extraction network layer and a detection head;
a feature extraction module, configured to extract the features of the target object in the image using the feature extraction network layer to obtain a first feature map;
an obtaining module, configured to obtain a second feature map from the first feature map, where the second feature map comprises part or all of the features of the target object and is smaller than the first feature map;
and a calculation module, configured to compute on the second feature map using the detection head to obtain the position information of the target object.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the lower-half region of the first feature map to obtain the second feature map.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the region inside the lane lines in the first feature map to obtain the second feature map.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the region that is in the lower half of the first feature map and inside the lane lines to obtain the second feature map.
In a possible implementation manner, the feature extraction module is specifically configured to:
extract the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, where each layer's first feature map has a different downsampling factor;
and the obtaining module is specifically configured to:
obtain one second feature map from each of the three first feature maps using the detection head.
In a possible implementation manner, the calculation module is specifically configured to:
fuse the three second feature maps using the detection head and output the position information of the target object.
In a possible implementation manner, the apparatus further includes:
a warning module, configured to issue a warning indicating that the target object is too close when the distance between the target object and the vehicle is below a threshold.
In a possible implementation manner, the downsampling factors of the three first feature maps are 8, 16, and 32, respectively.
A third aspect of the embodiments of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the preceding first aspects.
A fourth aspect of embodiments of the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of the preceding first aspects.
In summary, compared with the prior art, the embodiments of the present application have the following beneficial effects:
The embodiments provide a method and device for identifying the position of a target object that crop the feature map of the target object after it is extracted from the image, rather than cropping the image itself. This avoids the drop in target-feature-extraction accuracy caused by cropping the image, and because the feature map is cropped, the amount of computation is reduced in the subsequent calculation on the cropped feature map containing the target object's features, making the approach suitable for low-compute chips. Specifically, an image captured by a front camera of the vehicle is input into the detection model, which comprises a feature extraction network layer and a detection head; features of a target object in the image are extracted with the feature extraction network layer to obtain a first feature map; a second feature map, comprising part or all of the features of the target object and smaller than the first feature map, is obtained from the first feature map; and the second feature map is computed on with the detection head to obtain the position information of the target object.
Drawings
Fig. 1 is a schematic diagram of a system architecture to which a method for identifying a location of a target object according to an embodiment of the present application is applied;
fig. 2 is a schematic flowchart of a method for identifying the position of a target object according to an embodiment of the present application;
fig. 3 is a schematic diagram of an architecture of a detection model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an architecture of another detection model provided in the embodiments of the present application;
fig. 5 is a schematic structural diagram of a device for identifying a position of a target object according to an embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for location identification of a target object according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are likewise omitted below for clarity and conciseness. The embodiments below, and the features within them, can be combined with each other as long as they do not conflict.
The method for identifying the position of a target object in the embodiments of the present application can be applied to vehicles, which may be unmanned vehicles or manned vehicles with a driver-assistance function, including cars, off-road vehicles, trucks, and the like; the embodiments do not specifically limit the type of vehicle.
A front camera can be arranged in the vehicle, for example between the rearview mirror and the windshield, to capture images in front of the vehicle; it may be a fisheye camera or any other camera.
The detection model described in the embodiment of the present application may be a neural network model based on deep learning, and the detection model may be trained based on a sample image including a target object, and the detection model may be used to identify the target object and output position information of the target object, where the position information of the target object may be, for example, coordinates of a center point of the target object.
The detection model of the embodiments can run on low-cost chips such as the vehicle-mounted terminals common in vehicles; it is easy to deploy and places low demands on chip computing power, so even low-end vehicles can identify the position of a target object.
The target object described in the embodiments of the present application may be any object that may affect the driving of a vehicle, such as a pedestrian, a bicycle, an electric vehicle, or a motorcycle. When the target object is a pedestrian, the function may correspond to a pedestrian collision warning function in the ADAS, for example, when the position of the pedestrian is detected to be closer to the position of the vehicle, the driver may be warned by using voice, text, pictures, images, or the like, so as to prompt the driver to pay attention to the pedestrian.
The second feature map described in the embodiments may be obtained by cropping the first feature map. When the first feature map is cropped, a second feature map containing part or all of the features of the target object may be cropped out based on the target object itself; alternatively, it may be obtained by a fixed rule rather than by reference to the target object, for example by cropping away the upper half of the first feature map in the height direction. The specific cropping methods are described in detail in the embodiments below and are not repeated here. Because the second feature map is smaller than the first feature map, the amount of computation is reduced when the second feature map is later used to compute the position information of the target object.
As shown in fig. 1, fig. 1 is a schematic diagram of an application scenario architecture to which the method provided in the embodiment of the present application is applied.
The vehicle 1 shown in fig. 1 may be provided with a vehicle-mounted terminal (also called a vehicle-to-everything (V2X) communication device) through which it can communicate with other vehicles (vehicle-to-vehicle, V2V), with pedestrians (vehicle-to-pedestrian, V2P), with roadside infrastructure (vehicle-to-infrastructure, V2I), or with the network (vehicle-to-network, V2N). The vehicle-mounted terminal enables all-round connection and efficient information interaction between the vehicle and pedestrians, other vehicles, roadside equipment, and networks, supporting vehicle-networking functions such as information services, traffic safety, and traffic efficiency. For example, after the vehicle-mounted terminal is connected to the driver's mobile phone, the driver can use it to play music, navigate maps, make and receive calls, and so on.
With the development of electronics and intelligent-vehicle technology, the functions a vehicle-mounted terminal can provide have become increasingly rich, and the terminal presents these functions to the driver through a visual display page: by selecting an icon control on the page, the driver makes the terminal carry out the corresponding function. For example, the in-vehicle terminal 2 shown in fig. 1 is arranged on the center console of the vehicle 1 and presents its available functions through icon controls on the page shown on the display interface 21. As an example, the page in fig. 1 contains icon controls for making calls, messaging, map navigation, news, television, and settings. The driver of the vehicle 1 can select a function simply by rotating a knob on the center console 22 or pressing a key on the center console 22.
The vehicle-mounted terminal of the embodiment of the application can be provided with a detection model. A front camera 11 may also be provided in the vehicle 1, for example, as shown in fig. 1, the front camera 11 may be provided between a rear view mirror and a front window in the vehicle.
While the vehicle 1 is running, the front camera 11 captures images; the vehicle-mounted terminal inputs each image into the detection model, identifies the target object with the model, and outputs the target object's position. When the target object is too close to the vehicle, the driver can be alerted through a warning image or audio-visual information on the display interface 21, thereby assisting the driver.
As shown in fig. 2, fig. 2 is a schematic flowchart of a method for identifying a position of a target object according to an embodiment of the present application. The method specifically comprises the following steps:
s101: inputting an image collected by a front camera of a vehicle into a detection model; the detection model comprises a feature extraction network layer (backbone) and a detection head (detection head).
In the embodiment of the application, the backbone is used for automatically extracting the complex features of the image, and the detection head is used for regressing the coordinate point of the center of the target object.
For example, the vehicle can periodically capture images with the front camera while driving, and the vehicle-mounted terminal can input these images into the detection model.
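To make the split concrete, a minimal PyTorch sketch of such a two-part detection model is given below. The patent does not disclose the network internals, so every layer choice, channel width, and name here is an assumption, not the patented architecture:

```python
import torch
import torch.nn as nn

class DetectionModel(nn.Module):
    """Minimal sketch of the two-part detection model: a backbone that
    extracts a feature map and a head that scores target-center points.
    All layer choices are illustrative assumptions."""

    def __init__(self, num_classes: int = 1):
        super().__init__()
        # Backbone: a small convolutional feature extractor (8x downsampling).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: 1x1 conv producing a center-point heatmap.
        self.head = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        first_feature_map = self.backbone(image)
        return self.head(first_feature_map)  # per-cell center-point scores

model = DetectionModel()
out = model(torch.randn(1, 3, 256, 448))  # a 448x256 image, as in fig. 3
print(out.shape)  # torch.Size([1, 1, 32, 56])
```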
S102: extracting the features of the target object in the image using the feature extraction network layer to obtain a first feature map.
In the embodiments of the present application, the feature extraction network layer of the detection model may extract features of the target object in the image using any common method.
For example, taking a pedestrian as the target object, the feature extraction network layer may extract the pedestrian's features from the image to obtain a first feature map. There may be multiple first feature maps, the number depending on the specific configuration of the detection model; the embodiments do not specifically limit this number.
S103: obtaining a second feature map from the first feature map, where the second feature map comprises part or all of the features of the target object and is smaller than the first feature map.
In this embodiment, in one possible implementation, the second feature map containing part or all of the features of the target object may be obtained from the first feature map based on the target object itself.
In another possible implementation, the second feature map containing part or all of the features of the target object may be obtained by a fixed rule rather than by reference to the target object, for example by cropping away the upper half of the first feature map in the height direction.
When there are multiple first feature maps, a second feature map may be obtained from each of them. Because the second feature map is smaller than the first feature map, the amount of computation is reduced when the second feature map is later used to compute the position information of the target object.
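As an illustration of the fixed-rule variant, a sketch of cropping away the upper half of a first feature map is shown below; the NCHW tensor layout and the tunable keep-fraction are assumptions, since the patent fixes neither:

```python
import torch

def crop_lower_region(first_feature_map: torch.Tensor,
                      keep_fraction: float = 0.5) -> torch.Tensor:
    """Return a second feature map: the bottom `keep_fraction` of the
    first feature map along the height axis (NCHW layout assumed)."""
    height = first_feature_map.shape[2]
    start_row = height - int(height * keep_fraction)
    return first_feature_map[:, :, start_row:, :]

first = torch.randn(1, 128, 32, 56)      # e.g. an 8x-downsampled feature map
second = crop_lower_region(first, 0.5)
print(second.shape)                      # torch.Size([1, 128, 16, 56])
```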
In addition, because the feature extraction in S102 is performed on the complete image, the features of the target object can be extracted accurately and the target object identified accurately; when the second feature map is subsequently obtained by cropping the first feature map, the target object will not be misidentified even if the cropped second feature map contains only part of its features.
S104: computing on the second feature map using the detection head to obtain the position information of the target object.
In the embodiments of the present application, the detection head may compute on the second feature map in any common manner to obtain the position information of the target object; the embodiments do not specifically limit this.
In summary, the embodiments of the present application provide a method and device for identifying the position of a target object that crop the feature map of the target object after it is extracted from the image, rather than cropping the image, thereby avoiding the drop in target-feature-extraction accuracy caused by cropping the image and reducing the amount of computation in the subsequent calculation on the cropped feature map containing the target object's features; the approach is therefore suitable for low-compute chips. Specifically, an image captured by a front camera of the vehicle is input into the detection model, which comprises a feature extraction network layer and a detection head; features of a target object in the image are extracted with the feature extraction network layer to obtain a first feature map; a second feature map, comprising part or all of the features of the target object and smaller than the first feature map, is obtained from the first feature map; and the second feature map is computed on with the detection head to obtain the position information of the target object.
On the basis of the embodiment corresponding to fig. 2, in a possible implementation manner, the extracting of the features of the target object in the image using the feature extraction network layer to obtain a feature map includes: extracting the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, where each layer's first feature map has a different downsampling factor. The obtaining of the second feature map from the first feature map includes: obtaining one second feature map from each of the three first feature maps using the detection head. The computing on the second feature map using the detection head to obtain the position information of the target object includes: fusing the three second feature maps using the detection head and outputting the position information of the target object.
By way of example, fig. 3 shows an architecture diagram of a detection model according to an embodiment of the present application.
For example, a 448×256 image may be input into the detection model. After the backbone, three first feature maps are output: 56×32 at 8× downsampling, 28×16 at 16× downsampling, and 14×8 at 32× downsampling. The three first feature maps are split in the height direction (this may be called post-splitting), keeping the lower-half feature region responsible for pedestrian center points; the resulting second feature maps are 56×16, 28×8, and 14×4. The three second feature maps are input into the detection head, fused (ADD), and the pedestrian's target-center-point coordinates are output as a 28×16×n map.
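A sketch of this post-split head under the sizes above is given below. The text specifies only the lower-half crops and the ADD fusion; the channel widths, the 1x1 projections, and the bilinear resize to the common 28x16 grid are assumptions needed to make the addition well-defined:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PostSplitHead(nn.Module):
    """Sketch of the post-split detection head: crop each first feature
    map to its lower half, project to a shared channel width, resize to
    a common 28x16 grid, sum (ADD), and map to n output channels of
    center-point scores."""

    def __init__(self, in_channels=(128, 256, 512), mid_channels=64, n_out=1):
        super().__init__()
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, mid_channels, kernel_size=1) for c in in_channels]
        )
        self.out_conv = nn.Conv2d(mid_channels, n_out, kernel_size=1)

    def forward(self, feature_maps):
        fused = None
        for fmap, proj in zip(feature_maps, self.projections):
            height = fmap.shape[2]
            lower = fmap[:, :, height // 2:, :]        # post-split crop
            x = proj(lower)
            x = F.interpolate(x, size=(16, 28), mode="bilinear",
                              align_corners=False)     # common 28x16 grid
            fused = x if fused is None else fused + x  # ADD fusion
        return self.out_conv(fused)                    # 28x16 x n output

head = PostSplitHead()
maps = [torch.randn(1, 128, 32, 56),   # 8x:  56x32
        torch.randn(1, 256, 16, 28),   # 16x: 28x16
        torch.randn(1, 512, 8, 14)]    # 32x: 14x8
print(head(maps).shape)  # torch.Size([1, 1, 16, 28])
```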
It should be noted that the post-splitting in the embodiments of the present application differs from pre-splitting the original image, shown in fig. 4. Pre-splitting directly crops the original image down to a half region and keeps it; because the crop may directly remove the upper body of a pedestrian, it seriously harms pedestrian feature extraction. Post-splitting does not affect the acquisition of the overall pedestrian features, since the feature region for the pedestrian center point is extracted only after the backbone; accurate pedestrian center coordinates can therefore be obtained, and the amount of computation is reduced without changing detection accuracy.
It can be understood that in a typical detection network the backbone and the detection head each account for roughly half of the total computation, so splitting after the backbone halves the detection head's computation and cuts the total network computation by about 25%; in this way, the small detection model can run on a low-performance in-vehicle unit.
On the basis of the embodiment corresponding to fig. 2, in a possible implementation manner, the obtaining of the second feature map from the first feature map in S103 includes: extracting the lower-half region of the first feature map to obtain the second feature map.
In the embodiments of the present application, the following facts were observed. In a driving scene captured by a vehicle-mounted front camera, the camera is generally leveled, and the horizon through the road vanishing point lies roughly one half to two thirds of the way down from the top of the frame; the center of a pedestrian is almost never above the vertical midpoint of the frame. The in-vehicle camera is about 1.5 meters above the ground, and if the camera is parallel to the ground, the center point of even a 2-meter-tall pedestrian lies below 1.5 meters. Based on the fact that pedestrians are imaged on the road surface, the feature region in the detection model responsible for estimating the pedestrian center point can be considered to lie in the lower half of the feature map.
It is understood that the above also applies to target objects whose height is similar to a pedestrian's, such as a bicycle, electric bicycle, or motorcycle.
Based on these facts, the embodiments of the present application use the detection model to extract the lower-half region of the first feature map for pedestrian position detection, which both preserves detection accuracy and greatly reduces computation overhead.
In the present embodiments, the "lower half" may be any region covering the bottom 1/3 to 3/4 of the first feature map; the embodiments do not specifically limit it.
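This geometric fact can be sanity-checked with a level pinhole-camera model; the focal length, principal row, and depths below are illustrative assumptions:

```python
def image_row(f_pixels: float, cam_height_m: float, point_height_m: float,
              depth_m: float, principal_row: float) -> float:
    """Row at which a 3D point projects for a level pinhole camera.
    Image rows grow downward, so points below the camera height land
    below the principal row for any positive depth."""
    return principal_row + f_pixels * (cam_height_m - point_height_m) / depth_m

# Camera 1.5 m above ground (as stated); a 2 m pedestrian's center is ~1.0 m up.
for depth in (5.0, 10.0, 30.0):
    row = image_row(f_pixels=400.0, cam_height_m=1.5, point_height_m=1.0,
                    depth_m=depth, principal_row=128.0)
    print(depth, row)  # 168.0, 148.0, ~134.7 -- all > 128, below the midline
```

For a 256-row image whose principal point sits on row 128, the pedestrian's center therefore always projects into the lower half of the frame, matching the observation above.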
On the basis of the embodiment corresponding to fig. 2, in a possible implementation manner, the obtaining of the second feature map from the first feature map in S103 includes: extracting the region inside the lane lines in the first feature map to obtain the second feature map.
In the embodiments of the present application, it was found that target objects outside the lane lines have little influence on the vehicle's driving. The part of the first feature map outside the lane lines can therefore be cropped away, keeping the region inside the lane lines as the second feature map: for example, the lane lines can be recognized in the first feature map and the portion outside them cropped off to obtain the second feature map. This reduces computation while still providing the vehicle with accurate warnings.
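A sketch of this lane-line crop follows; representing the lane region by a pair of feature-map column bounds (an axis-aligned crop) is a simplifying assumption, since the patent does not fix how the lane-line region is represented:

```python
import torch

def crop_between_lane_lines(first_feature_map: torch.Tensor,
                            left_col: int, right_col: int) -> torch.Tensor:
    """Keep only the feature-map columns lying between the detected left
    and right lane lines (NCHW layout assumed)."""
    return first_feature_map[:, :, :, left_col:right_col]

first = torch.randn(1, 128, 32, 56)
# Suppose a lane-line detector mapped the ego lane to columns 14..42.
second = crop_between_lane_lines(first, left_col=14, right_col=42)
print(second.shape)  # torch.Size([1, 128, 32, 28])
```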
On the basis of the embodiment corresponding to fig. 2, in a possible implementation manner, the obtaining of the second feature map from the first feature map in S103 includes: extracting the region that is in the lower half of the first feature map and inside the lane lines to obtain the second feature map.
In the embodiments of the present application, further extracting the region inside the lane lines from the lower half of the first feature map shrinks the second feature map further and thus further reduces computation.
On the basis of the embodiment corresponding to fig. 2, in a possible implementation manner, a warning indicating that the target object is too close is issued when the distance between the target object and the vehicle is below a threshold.
In the embodiments of the present application, the threshold may be set according to the actual application scenario and may, for example, be any value between 1 meter and 50 meters; the embodiments do not specifically limit it.
When the distance between the target object and the vehicle is below the threshold, a collision may occur if the driver has not noticed the target object; a warning indicating that the target object is too close may therefore be issued. For example, the driver's attention can be drawn through a warning image or audio-video on the vehicle's center-console screen, thereby better assisting the driver.
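A minimal sketch of this check is shown below; the 20 m default is an arbitrary illustrative value inside the 1-50 m range mentioned above:

```python
def maybe_warn(distance_to_target_m: float, threshold_m: float = 20.0) -> bool:
    """Issue a too-close warning when the target is nearer than the
    threshold; returns True if a warning was issued."""
    if distance_to_target_m < threshold_m:
        print("Warning: target object too close "
              f"({distance_to_target_m:.1f} m < {threshold_m:.1f} m)")
        return True
    return False

maybe_warn(12.3)   # triggers a warning
maybe_warn(35.0)   # no warning
```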
Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for identifying a position of a target object according to the present application. As shown in fig. 5, the apparatus for identifying a position of a target object according to this embodiment includes:
an input module 31, configured to input the image captured by the front camera of the vehicle into the detection model, the detection model comprising a feature extraction network layer and a detection head;
a feature extraction module 32, configured to extract the features of the target object in the image using the feature extraction network layer to obtain a first feature map;
an obtaining module 33, configured to obtain a second feature map from the first feature map, where the second feature map comprises part or all of the features of the target object and is smaller than the first feature map;
and a calculation module 34, configured to compute on the second feature map using the detection head to obtain the position information of the target object.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the lower-half region of the first feature map to obtain the second feature map.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the region inside the lane lines in the first feature map to obtain the second feature map.
In a possible implementation manner, the obtaining module is specifically configured to:
extract the region that is in the lower half of the first feature map and inside the lane lines to obtain the second feature map.
In a possible implementation manner, the feature extraction module is specifically configured to:
extract the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, where each layer's first feature map has a different downsampling factor;
and the obtaining module is specifically configured to:
obtain one second feature map from each of the three first feature maps using the detection head.
In a possible implementation manner, the calculation module is specifically configured to:
fuse the three second feature maps using the detection head and output the position information of the target object.
In a possible implementation manner, the apparatus further includes:
a warning module, configured to issue a warning indicating that the target object is too close when the distance between the target object and the vehicle is below a threshold.
In a possible implementation manner, the downsampling factors of the three first feature maps are 8, 16, and 32, respectively.
The embodiments of the present application provide a method and device for identifying the position of a target object that crop the feature map of the target object after it is extracted from the image, rather than cropping the image itself, which avoids the drop in target-feature-extraction accuracy caused by cropping the image and, because the feature map is cropped, reduces the amount of computation in the subsequent calculation on the cropped feature map containing the target object's features; the approach is therefore suitable for low-compute chips. Specifically, an image captured by a front camera of the vehicle is input into the detection model, which comprises a feature extraction network layer and a detection head; features of a target object in the image are extracted with the feature extraction network layer to obtain a first feature map; a second feature map, comprising part or all of the features of the target object and smaller than the first feature map, is obtained from the first feature map; and the second feature map is computed on with the detection head to obtain the position information of the target object.
The device for identifying the position of the target object provided in the embodiments of the present application may be used to execute the method shown in the corresponding embodiments, and the implementation manner and the principle thereof are the same and will not be described again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the method of identifying the position of a target object according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 601 is illustrated in fig. 6.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of location identification of a target object provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of position identification of a target object provided herein.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the input module 31, the feature extraction module 32, the acquisition module 33, and the calculation module 34 shown in fig. 5) corresponding to the method of identifying the position of the target object in the embodiment of the present application. The processor 601 executes various functional applications of the server and data processing, i.e., a method of implementing location identification of a target object in the above method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a program storage area and a data storage area; the program storage area may store an operating system and the application program required for at least one function, and the data storage area may store data created through use of the electronic device for identifying the position of the target object, and the like. Further, the memory 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, connected over a network to the electronic device for identifying the position of the target object. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of location identification of a target object may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus for position recognition of the target object, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the present application, the feature map of the target object extracted from the image, rather than the image itself, is cropped, which avoids the drop in target-feature-extraction accuracy caused by cropping the image; because the feature map is cropped, the amount of computation is reduced in the subsequent calculation on the cropped feature map containing the target object's features, making the approach suitable for low-compute chips. Specifically, an image captured by a front camera of the vehicle is input into the detection model, which comprises a feature extraction network layer and a detection head; features of a target object in the image are extracted with the feature extraction network layer to obtain a first feature map; a second feature map, comprising part or all of the features of the target object and smaller than the first feature map, is obtained from the first feature map; and the second feature map is computed on with the detection head to obtain the position information of the target object.
It should be understood that the flows shown above may be reordered, or steps added or deleted, in various forms. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; this is not limited herein.
The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (13)
1. A method of position identification of a target object, the method comprising:
inputting an image collected by a front camera of a vehicle into a detection model; the detection model comprises a feature extraction network layer and a detection head;
extracting the features of a target object in the image using the feature extraction network layer to obtain a first feature map;
extracting the lower-half region of the first feature map to obtain a second feature map; or,
extracting the region inside the lane lines in the first feature map to obtain a second feature map; or,
extracting the region that is in the lower half of the first feature map and inside the lane lines to obtain a second feature map; the second feature map comprises part or all of the features of the target object, and the second feature map is smaller than the first feature map;
and computing on the second feature map using the detection head to obtain the position information of the target object.
2. The method according to claim 1, wherein the extracting features of the target object in the image by using the feature extraction network layer to obtain a feature map comprises:
extracting the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, wherein each layer's first feature map has a different downsampling factor;
the obtaining of the second feature map from the first feature map comprises:
obtaining one second feature map from each of the three first feature maps using the detection head.
3. The method of claim 2, wherein the computing on the second feature map using the detection head to obtain the position information of the target object comprises:
fusing the three second feature maps using the detection head and outputting the position information of the target object.
4. The method of claim 3, further comprising:
issuing a warning indicating that the target object is too close if the distance between the target object and the vehicle is below a threshold.
5. The method of claim 4, wherein the downsampling factors of the three first feature maps are 8, 16, and 32, respectively.
6. An apparatus for location identification of a target object, comprising:
an input module, configured to input the image captured by the front camera of the vehicle into the detection model, the detection model comprising a feature extraction network layer and a detection head;
a feature extraction module, configured to extract the features of the target object in the image using the feature extraction network layer to obtain a first feature map;
an obtaining module, configured to extract the lower-half region of the first feature map to obtain a second feature map; or,
extract the region inside the lane lines in the first feature map to obtain a second feature map; or,
extract the region that is in the lower half of the first feature map and inside the lane lines to obtain a second feature map; the second feature map comprises part or all of the features of the target object, and the second feature map is smaller than the first feature map;
and a calculation module, configured to compute on the second feature map using the detection head to obtain the position information of the target object.
7. The apparatus of claim 6, wherein the feature extraction module is specifically configured to:
extract the features of the target object in the image using the feature extraction network layer to obtain three first feature maps, wherein each layer's first feature map has a different downsampling factor;
the obtaining module is specifically configured to:
obtain one second feature map from each of the three first feature maps using the detection head.
8. The apparatus of claim 7, wherein the computing module is specifically configured to:
fuse the three second feature maps using the detection head and output the position information of the target object.
9. The apparatus of claim 8, further comprising:
and a warning module, configured to issue a warning indicating that the target object is too close when the distance between the target object and the vehicle is below a threshold.
10. The apparatus of claim 7, wherein the downsampling factors of the three first feature maps are 8, 16, and 32, respectively.
11. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010435949.XA | 2020-05-21 | 2020-05-21 | Method and device for identifying position of target object
Publications (2)
Publication Number | Publication Date
---|---
CN113705279A | 2021-11-26
CN113705279B | 2022-07-08
Family
ID=78645813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202010435949.XA | Method and device for identifying position of target object | 2020-05-21 | 2020-05-21
Country Status (1)
Country | Link
---|---
CN | CN113705279B (en)
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |