CN112257668A - Main and auxiliary road judging method and device, electronic equipment and storage medium - Google Patents

Main and auxiliary road judging method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN112257668A
CN112257668A (application number CN202011264064.4A)
Authority
CN
China
Prior art keywords
road
pixel points
coordinate system
width
driving area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011264064.4A
Other languages
Chinese (zh)
Inventor
孙中阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011264064.4A priority Critical patent/CN112257668A/en
Publication of CN112257668A publication Critical patent/CN112257668A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a method and a device for judging main and auxiliary roads based on artificial intelligence, an electronic device and a storage medium, relating to the technical field of navigation. The method comprises the following steps: acquiring a road condition video; determining a driving area from a video frame of the road condition video; projecting the driving area to a world coordinate system of the real world, and obtaining the width of the road to be identified according to the distance, in the world coordinate system, between the pixel points at the left and right ends of each row of pixel points in the driving area; and matching the width of the road to be identified against the road widths in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result. The embodiment of the application is suitable for various types of vehicles, and overcomes two problems of the prior art: the large error incurred when the road surface width is measured from sensor data alone, and the inaccurate judgment of main and auxiliary road information obtained from positioning data combined with base map data.

Description

Main and auxiliary road judging method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of navigation technologies, and in particular, to a method and an apparatus for judging a main road and a side road, an electronic device, and a storage medium.
Background
The existing main and auxiliary road judging methods mainly comprise two methods:
Firstly, GPS or GPS-plus-inertial-navigation multi-sensor fusion positioning is used to compare how close the current vehicle position is to the true position of the main road or the auxiliary road; some variants also feed current GPS and other sensor signals into a machine-learning model to classify main versus auxiliary road, but neither variant uses camera data or the main and auxiliary road width information in the base map. This approach suffers from low accuracy because the measurement error of the sensor data is large and unavoidable. For example, a GPS sensor typically has an error of about 15 meters, which can reach 30 meters or more under complex conditions such as occlusion by trees or buildings; yet it is precisely in such situations, for instance near elevated roads, that main and auxiliary roads most often need to be distinguished.
The second existing method combines camera data with vehicle motion data: images captured by the camera are used to judge whether the current vehicle is preparing to change lanes, lane lines and vehicle motion data are extracted in the area approaching a road fork, and if both deflect, the main and auxiliary road situation is judged to have changed.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for determining a primary road and a secondary road, which overcome the above problems or at least partially solve the above problems.
In a first aspect, a method for judging a main road and a secondary road is provided, the method comprising:
acquiring a road condition video;
determining a driving area from a video frame of a road condition video, wherein the driving area is the imaging content of a road to be identified in the real world in the video frame;
projecting the driving area to a world coordinate system of the real world, and obtaining the width of a road to be identified according to the distance between pixel points at the left end and the right end of the pixel points in the same row in the driving area in the world coordinate system;
and matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result.
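The width-measurement step above can be sketched in outline. The helper below is an illustrative reading of that step only: it walks the rows of a driving-area mask and measures, in world coordinates, the distance between the leftmost and rightmost road pixels of each row. The mask layout and the `px_to_world` callback are assumptions made for the example, not the patent's implementation.

```python
import math

def measure_row_widths(drive_mask, px_to_world):
    """For each image row of the driving-area mask (nested lists of bools),
    project the leftmost and rightmost road pixels to world coordinates and
    record the distance between them, keyed by row index."""
    widths = {}
    for v, row in enumerate(drive_mask):
        cols = [u for u, on in enumerate(row) if on]
        if len(cols) >= 2:
            xl, yl = px_to_world(cols[0], v)   # leftmost road pixel
            xr, yr = px_to_world(cols[-1], v)  # rightmost road pixel
            widths[v] = math.hypot(xr - xl, yr - yl)
    return widths
```

With a real projection, `px_to_world` would encapsulate the intrinsic and extrinsic matrices and the depth; here any callable mapping pixel (u, v) to a ground-plane point suffices.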
In one possible implementation, determining a driving region from the video frame includes:
and inputting the video frame into a pre-trained image segmentation model, and obtaining a classification result of each pixel point in the video frame output by the image segmentation model, wherein the classification result is used for representing whether the pixel point is in a driving area.
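As a hedged illustration of how a segmentation model's per-pixel output might become the driving-area mask described above, the sketch below takes per-pixel class scores and marks the pixels whose highest-scoring class is the driving-area class; the class index and score layout are invented for the example, not taken from the patent.

```python
def driving_area_mask(scores, drive_class=1):
    """Convert per-pixel class scores (H x W x C nested lists) into a boolean
    mask that is True where the arg-max class equals the (assumed)
    driving-area class index."""
    return [[max(range(len(px)), key=px.__getitem__) == drive_class
             for px in row]
            for row in scores]
```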
In one possible implementation, determining a driving region from the video frame includes:
inputting the video frame into a pre-trained image classification model, and obtaining a driving region and/or a moving object region in the video frame output by the image classification model, wherein the moving object region is imaging content of a moving object in the real world in the video frame;
in one possible implementation, projecting the driving region to a real world coordinate system further includes:
and if a moving object region exists in the video frame and the moving object region is determined to meet the interference condition, acquiring the next video frame from the road condition video to determine the driving region.
Determining that the moving object region meets the interference condition, including:
and if the number of the moving object regions is larger than a preset threshold value, or pixel points on the edges of at least one of the left side and the right side in the video frame are located in the moving object regions, determining that the moving object regions meet the interference condition.
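A minimal sketch of this interference test, assuming the moving-object output is a connected-component label map (0 for background, 1..N for object regions); the region-count threshold is a placeholder value, not one given by the patent:

```python
def interferes(object_mask, max_regions=3):
    """True when moving-object regions would corrupt the width measurement:
    either the number of regions exceeds a preset threshold, or a labelled
    pixel lies on the left or right image edge (so the road edge may be
    occluded). `object_mask` is an H x W label map as nested lists."""
    n_regions = max((label for row in object_mask for label in row), default=0)
    if n_regions > max_regions:
        return True
    # Check the left and right image borders for moving-object pixels.
    return any(row[0] > 0 or row[-1] > 0 for row in object_mask)
```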
In one possible implementation, projecting the driving region to a real world coordinate system includes:
and determining the coordinates of each pixel point in the driving area in a pixel coordinate system, and combining an internal reference matrix, an external reference matrix and the depth of the video acquisition equipment which are determined in advance to obtain the coordinates of the pixel points in a world coordinate system of the real world.
In a possible implementation manner, obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the same row of pixel points in the world coordinate system in the driving area includes:
determining target row pixel points for calculating width from the driving area according to the distribution condition of the distances of the pixel points at the left end and the right end in all the row pixel points in the driving area;
and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the target row pixel points in the world coordinate system.
In a possible implementation manner, determining target row pixel points for calculating width from a driving area according to a distribution condition of distances of pixel points at left and right ends in a world coordinate system among all row pixel points in the driving area includes:
sorting the distances of the pixels at the left end and the right end of all the rows of pixels in the running area in the world coordinate system from large to small;
and taking the row pixel points with the preset number which are sorted in the front as target row pixel points.
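These two operations, sorting the per-row distances from large to small and keeping a preset number of leading rows, can be sketched as follows. How the target rows are aggregated into a single width is not spelled out at this point, so the mean is used here as one plausible, clearly assumed, choice:

```python
def select_target_rows(row_widths, top_n=3):
    """Sort row indices by their left-to-right distance, from large to small,
    and keep the first `top_n` as target rows (the preset number is assumed)."""
    ranked = sorted(row_widths, key=row_widths.get, reverse=True)
    return ranked[:top_n]

def width_from_target_rows(row_widths, top_n=3):
    """Aggregate the target-row distances into one road width; averaging is
    an assumption, not stated by the claim."""
    targets = select_target_rows(row_widths, top_n)
    return sum(row_widths[r] for r in targets) / len(targets)
```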
In one possible implementation manner, matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result includes:
determining a road in an area with the current driving position as the center of a circle and the preset length as the radius from the base map data as a road to be matched;
and determining the main and auxiliary road information of the target road recorded in the map data as the main and auxiliary road information of the road to be identified by taking the road to be matched, of which the width similarity with the road to be identified is greater than a preset threshold value, as the target road.
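A hedged sketch of this matching step, assuming a flat local coordinate frame for the base-map road positions and a simple relative-difference similarity; the radius, the threshold value, and the similarity formula are all assumptions, since the text specifies only that the width similarity must exceed a preset threshold:

```python
import math

def main_or_side(base_map, position, measured_width,
                 radius_m=100.0, min_similarity=0.8):
    """Filter base-map roads to those within `radius_m` of the current
    driving position, then return the main/auxiliary label of the candidate
    whose width is most similar to the measured width, provided the
    similarity exceeds the threshold."""
    px, py = position
    best_sim, best = min_similarity, None
    for road in base_map:
        if math.hypot(road["x"] - px, road["y"] - py) > radius_m:
            continue  # outside the search circle around the vehicle
        sim = 1.0 - (abs(road["width"] - measured_width)
                     / max(road["width"], measured_width))
        if sim > best_sim:
            best_sim, best = sim, road
    return best["kind"] if best else None
```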
In a second aspect, there is provided a primary and secondary road judging device, including:
the video acquisition module is used for acquiring road condition videos;
the driving area identification module is used for determining a driving area from a video frame of the road condition video, and the driving area is the imaging content of a road to be identified in the real world in the video frame;
the road width identification module is used for projecting the driving area to a world coordinate system of the real world and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the same row of the pixel points in the driving area in the world coordinate system;
and the base map matching module is used for matching the width of the road to be identified with the width of the road in the base map data and determining the main and auxiliary road information of the road to be identified according to the matching result.
In one possible implementation, the driving area identifying module includes:
and the image segmentation submodule is used for inputting the video frame into a pre-trained image segmentation model, obtaining a classification result of each pixel point in the video frame output by the image segmentation model, and the classification result is used for representing whether the pixel point is in the driving area.
In one possible implementation, the driving area identifying module includes:
the image classification submodule is used for inputting the video frame into a pre-trained image classification model and obtaining a driving region and/or a moving object region in the video frame output by the image classification model, wherein the moving object region is the imaging content of a moving object in the real world in the video frame;
the main and auxiliary road judging device further comprises:
and the screening module is used for acquiring the next video frame from the road condition video to determine the driving area if a moving object area exists in the video frame and the moving object area meets the interference condition before projecting the driving area to a world coordinate system of the real world.
In one possible implementation manner, the determining, by the screening module, that the moving object region satisfies the interference condition includes: and if the number of the moving object regions is larger than a preset threshold value, or pixel points on the edges of at least one of the left side and the right side in the video frame are located in the moving object regions, determining that the moving object regions meet the interference condition.
In one possible implementation, the road width identifying module includes:
and the projection submodule is used for determining the coordinates of each pixel point in the driving area in a pixel coordinate system, and obtaining the coordinates of the pixel points in a world coordinate system of the real world by combining an internal reference matrix, an external reference matrix and the depth of the video acquisition equipment which are determined in advance.
In one possible implementation, the road width identifying module includes:
the target row determining submodule is used for determining target row pixel points for calculating the width from the driving area according to the distribution condition of the distances between pixel points at the left end and the right end in all the row pixel points in the driving area;
and the width calculation submodule is used for obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the pixel points at the right end in the target row pixel points in the world coordinate system.
In one possible implementation, the target row determination sub-module includes:
the sorting unit is used for sorting, from large to small, the distances in the world coordinate system between the pixel points at the left end and the right end among all the row pixel points in the driving area;
and the selection unit is used for taking the row pixel points with the preset number in the front sequence as target row pixel points.
In one possible implementation, the base map matching module includes:
the road to be matched acquisition submodule is used for determining a road in an area which takes the current driving position as the center of a circle and takes the preset length as the radius from the base map data as a road to be matched;
and the matching sub-module is used for taking the road to be matched, the width similarity of which with the road to be identified is greater than a preset threshold value, as a target road, and determining the main and auxiliary road information of the target road recorded in the map data as the main and auxiliary road information of the road to be identified.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method as provided in the first aspect.
The method, the device, the electronic equipment and the storage medium for judging the main road and the auxiliary road provided by the embodiments of the invention acquire a road condition video, determine a driving area from a video frame of the video, project the driving area to a world coordinate system of the real world to determine the road width, and finally match that width against the base map data, determining the main and auxiliary road information of the road to be identified according to the matching result. Because main and auxiliary road widths differ considerably in practice, determining the road width through image recognition and coordinate-system conversion does not require high accuracy to recognize the main and auxiliary road information well. This overcomes the large error of measuring the road surface width from sensor data alone, as well as the inaccurate judgment of main and auxiliary road information from positioning data combined with base map data, in the prior art.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic diagram of a transformation between an image coordinate system and a world coordinate system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a transformation between a camera coordinate system and an image coordinate system according to an embodiment of the present disclosure;
fig. 3 is an application scenario diagram of a main road and auxiliary road judging method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a main road and auxiliary road judging method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a primary and secondary road judgment method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a video frame according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a determination of a width of a road to be identified according to a video frame according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image segmentation model according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of another video frame according to an embodiment of the present application;
FIG. 10 is a flowchart illustrating a method for determining a primary road and a secondary road according to yet another embodiment of the present application;
fig. 11 is a schematic structural diagram of a main/auxiliary road judging device according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
The application provides a main and auxiliary road judging method, a main and auxiliary road judging device, an electronic device and a storage medium, and aims to solve the technical problems in the prior art.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
To facilitate understanding of the methods provided by the embodiments of the present application, first, terms referred to in the embodiments of the present application will be described:
AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
CV (Computer Vision), Computer Vision is a science for researching how to make a machine "see", and further, it means that a camera and a Computer are used to replace human eyes to perform machine Vision such as recognition, tracking and measurement on a target, and further, graphics processing is performed, so that the Computer processing becomes an image more suitable for human eyes to observe or to transmit to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. The computer vision technology generally includes technologies such as image processing, image Recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior Recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, synchronous positioning, map construction, and the like, and also includes common biometric technologies such as face Recognition, fingerprint Recognition, and the like.
ML (Machine Learning) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the technologies of artificial intelligence CV, ML and the like, and provides a main and auxiliary road judgment method which can be applied to the fields of AGV (automatic Guided Vehicle), automatic driving of automobiles, auxiliary driving of automobiles and the like.
Image coordinate system: the image coordinate system is a coordinate system with the upper left vertex of the image collected by the camera as the origin of coordinates. The x-axis and the y-axis of the image coordinate system are the length and width directions of the acquired image.
Camera coordinate system: the camera coordinate system is a three-dimensional rectangular coordinate system established by taking the focusing center of the camera as an origin and taking the optical axis as the z axis. Wherein the x-axis of the camera coordinate system is parallel to the x-axis of the image coordinate system of the acquired image, and the y-axis of the camera coordinate system is parallel to the y-axis of the image coordinate system of the acquired image.
World coordinate system: the world coordinate system can describe the position of the camera in the real world and can also describe the position of an object in the real world in an image captured by the camera. The camera coordinate system can be converted into the world coordinate system through the pose of the camera in the world coordinate system. Typically, the world coordinate system has the x-axis pointing horizontally in the eastern direction, the y-axis pointing horizontally in the northern direction, and the z-axis pointing vertically upward.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the conversion between an image coordinate system and a world coordinate system provided in the embodiment of the present application, and as shown in fig. 1, directions of image processing, stereoscopic vision, and the like often involve four coordinate systems: world coordinate system, camera coordinate system, image coordinate system, pixel coordinate system.
Wherein, O_W-X_W Y_W Z_W: the world coordinate system, describing the camera position, in meters (m);
O_C-X_C Y_C Z_C: the camera coordinate system, with the optical center as its origin, in meters (m);
o-xy: the image coordinate system, with the optical center as the image midpoint, in millimeters (mm);
u-v: the pixel coordinate system, with the origin at the upper-left corner of the image, in pixels;
P: a point in the world coordinate system, i.e., a point in the real world;
p: the imaged point of P in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system;
f: the focal length of the camera, equal to the distance between o and O_C, i.e., f = ||o - O_C||.
To convert from the pixel coordinate system to the world coordinate system, three steps are typically required: the first step converts the pixel coordinate system to the image coordinate system, the second converts the image coordinate system to the camera coordinate system, and the third converts the camera coordinate system to the world coordinate system.
For the first step, both the pixel coordinate system and the image coordinate system are on the imaging plane, except that the respective origin and measurement unit are different. The origin of the image coordinate system is the intersection of the camera optical axis and the imaging plane, typically the midpoint of the imaging plane (principal point). The unit of an image coordinate system is mm and belongs to a physical unit, while the unit of a pixel coordinate system is pixel, and we commonly describe that a pixel point is several rows and several columns. The transition between the two is as follows:
\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
where dx and dy denote, respectively, how many millimeters each column and each row occupy, i.e., 1 pixel = dx mm along the u direction; u_0 and v_0 are the coordinates of the center of the image plane and belong to the camera intrinsics.
For the second step, referring to fig. 2, the transformation relationship between the camera coordinate system and the image coordinate system belongs to the perspective projection relationship from the camera coordinate system to the image coordinate system, and can be expressed as:
\[
Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
\]
where (X_c, Y_c, Z_c) represents the coordinates of point P in the camera coordinate system; by this step, the units have already been converted to meters.
For the third step, from the camera coordinate system to the world coordinate system, rotation and translation are involved (in fact all movements can also be described by rotation matrices and translation vectors). Rotating different angles around different coordinate axes to obtain corresponding rotation matrixes, and then the conversion relationship from the camera coordinate system to the world coordinate system can be expressed as:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
where R denotes a rotation matrix and T denotes an offset vector.
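The per-axis rotation matrices and the rigid transform of the relation above can be sketched as follows (helper names are illustrative):

```python
import numpy as np

def rot_x(a):
    """Rotation matrix for angle a (radians) about the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    """Rotation matrix about the Y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    """Rotation matrix about the Z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def world_to_camera(p_world, R, T):
    """Matches the relation above: P_cam = R @ P_world + T."""
    return R @ p_world + T

def camera_to_world(p_cam, R, T):
    """Inverse rigid transform (R is orthonormal, so R^-1 = R^T)."""
    return R.T @ (p_cam - T)
```

Composing `rot_x`, `rot_y` and `rot_z` for different angles yields the rotation matrix R mentioned in the text.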
Through the conversions among the above four coordinate systems, the conversion relationship from a point in the pixel coordinate system to the world coordinate system can then be obtained:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & 0 \\ 0 & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
where f_x, f_y, u_0 and v_0 belong to the camera intrinsic parameters, and the matrix formed by them is called the intrinsic parameter matrix; R and T belong to the camera extrinsic parameters, and the matrix formed by them is called the extrinsic parameter matrix.
According to the above conversion relationship, a coordinate point in the three-dimensional world can find its corresponding pixel point in the two-dimensional image; conversely, finding the corresponding three-dimensional point from a point in the two-dimensional image requires knowing Z_c on the left of the equation, i.e., the depth information.
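A minimal sketch of this back-projection with known depth; the intrinsic values and the identity extrinsics below are illustrative assumptions, not calibration data from this application:

```python
import numpy as np

# Assumed intrinsics (fx, fy, u0, v0) and extrinsics (R, T).
fx, fy, u0, v0 = 800.0, 800.0, 640.0, 360.0
K = np.array([[fx, 0, u0, 0],
              [0, fy, v0, 0],
              [0,  0,  1, 0]], dtype=float)

R = np.eye(3)                    # camera axes aligned with world axes
T = np.array([0.0, 0.0, 0.0])    # camera at the world origin
ext = np.eye(4)
ext[:3, :3] = R
ext[:3, 3] = T

P = K @ ext                      # full 3x4 projection matrix

def world_to_pixel(Xw, Yw, Zw):
    """Forward projection: world point to pixel coordinates."""
    x = P @ np.array([Xw, Yw, Zw, 1.0])
    return x[0] / x[2], x[1] / x[2]

def pixel_to_world(u, v, Zc):
    """Back-project pixel (u, v) with known depth Zc to world coords.

    Solves  Zc * [u, v, 1]^T = P @ [Xw, Yw, Zw, 1]^T  for the point.
    """
    A = P[:, :3]                           # part acting on (Xw, Yw, Zw)
    b = Zc * np.array([u, v, 1.0]) - P[:, 3]
    return np.linalg.solve(A, b)
```

Without the depth Z_c the linear system is underdetermined, which is exactly the point made above: a 2D pixel alone does not fix a 3D position.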
Fig. 3 shows an application scenario of the main and auxiliary road judging method according to an embodiment of the present application. As shown in fig. 3, when the vehicle 10 is traveling in a road segment 20 where a main road and an auxiliary road need to be distinguished, the road condition acquisition device 11 in the vehicle 10 acquires a road condition video and sends it to the server 30, in which base map data is stored in advance. The server 30 determines the main and auxiliary road information of the road segment 20 according to a video frame of the road condition video and the base map data, and sends it to the vehicle-mounted terminal 12 for display, so that relevant people can view the main and auxiliary road information of the road segment 20 through the vehicle-mounted terminal 12.
The following describes an execution subject and an implementation environment of an embodiment of the present application:
According to the main and auxiliary road judging method provided by the embodiment of the application, the execution subject of each step may be a vehicle, and the vehicle may include a road condition acquisition device. The road condition acquisition device may include a camera device for shooting the driving road in front of the vehicle to obtain the road condition video.
Optionally, the image capturing apparatus may be any electronic apparatus with an image capturing function, such as a camera, a video camera, a still camera, or the like, which is not limited in this embodiment of the application.
Optionally, the road condition collecting device may further include a processing unit, where the processing unit is configured to process the road condition video to execute the main/auxiliary road judging method, and obtain the main/auxiliary road information of the vehicle driving road section. The processing unit may be an electronic device, such as a processor, having image and data processing functionality.
It should be noted that the processing unit may be integrated on the vision module, or may be independently present on the vehicle to form a processing module, and the processing module and the vision module may be electrically connected.
Optionally, in the main/auxiliary road judging method provided in the embodiment of the present application, the execution subject of each step may be a vehicle-mounted terminal installed on a vehicle. The vehicle-mounted terminal has image acquisition and image processing functions. The vehicle-mounted terminal may comprise a camera device and a processing device; after the camera device collects the road image, the processing device may execute the main and auxiliary road judging method based on the road image to acquire the main and auxiliary road information of the road on which the vehicle travels.
In some other embodiments, as shown in fig. 3, the main and auxiliary road judging method described above may also be performed by a server. After the server obtains the road condition video, the judgment result of the main road and the auxiliary road can be sent to the vehicle.
It should be noted that the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content delivery network), and a big data and artificial intelligence platform, which is not limited in this embodiment of the present application.
The execution method of the server in the embodiment of the present application may be implemented in the form of cloud computing, a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, the resources in the "cloud" appear infinitely expandable, available at any time, used on demand, and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (called an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use.
According to the logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS is the variety of business software, such as web portals and mass SMS services. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 4 is a schematic flow chart of a main road and auxiliary road judging method according to an embodiment of the present disclosure, as shown in fig. 4, a road condition collecting device 110 is installed on a vehicle 100, a target vehicle 100 can shoot an image in front of the vehicle through the camera device 110 to obtain a road condition video, and a driving area 210 is determined from a video frame 200 of the road condition video, where the driving area 210 is an imaging content of a road to be identified in the real world in the video frame; projecting the driving area to a world coordinate system of the real world, obtaining the width d of the road to be identified according to the distance between the pixel points at the left end and the right end in the same row of pixel points in the driving area in the world coordinate system, matching the width d of the road to be identified with the width of the road in the base map data 200, and determining the main and auxiliary road information of the road to be identified according to the matching result.
The technical solution of the present application will be described below by means of several embodiments.
Referring to fig. 5, a schematic flow chart of a main/auxiliary road judging method provided in another embodiment of the present application is shown, in the embodiment of the present application, the method is exemplified by being applied to the vehicle or the vehicle-mounted terminal described above. The method may include the steps of:
s101, collecting road condition videos.
The road condition video of the embodiment of the application is a video containing the road to be identified, acquired through the camera equipment installed on the target vehicle.
The target vehicle can be provided with the camera shooting device, and when the camera shooting device is arranged on the front windshield of the target vehicle, the road condition video in front of the target vehicle can be shot through the camera shooting device.
Optionally, the image capturing apparatus may be any electronic apparatus with an image capturing function, such as a camera, a video camera, a still camera, or the like, which is not limited in this embodiment.
S102, determining a driving area from the video frame of the road condition video, wherein the driving area is the imaging content of the road to be identified in the real world in the video frame.
The video frame in the road condition video is an image containing a driving area, and the driving area is imaging content of a road to be identified in the real world in the video frame.
Illustratively, as shown in fig. 6, a schematic diagram of a video frame is exemplarily shown. The video frame 200 includes 4 lanes.
S103, projecting the driving area to a world coordinate system of the real world, and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end of the pixel points in the same row in the driving area in the world coordinate system.
From the foregoing, a point has a transformation relationship from a pixel coordinate system to a world coordinate system:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & 0 \\ 0 & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

Therefore, the embodiment of the present application can determine the coordinates (u, v) of each pixel point of the driving area in the pixel coordinate system and, combining the predetermined intrinsic parameter matrix and extrinsic parameter matrix of the video acquisition device with the depth Z_c, obtain the coordinates (X_w, Y_w, Z_w) of the pixel point in the world coordinate system of the real world.
Optionally, the depth information of the pixel points may be obtained by a laser radar or by a neural network model for estimating pixel depth. Laser radar (LiDAR, Light Detection and Ranging) is used for obtaining data and generating an accurate digital elevation model, and can position the light spot of a laser beam on an object with high accuracy, that is, locate the position in the world coordinate system of each pixel point of the image within the range of the camera.
For the neural network model for estimating pixel depth, it should be understood that it may be trained as follows. First, a certain number of sample images are collected, the RGB information and gradient information of each pixel point in each sample image are acquired, and the depth of each pixel point is labeled. Then, based on the RGB information and gradient information of each pixel point in the sample images, with the pixel depth as the sample label, an initial neural network is trained to convergence, obtaining a neural network model capable of estimating pixel depth.
For example, fig. 7 exemplarily shows a schematic diagram of determining the width of the road to be identified from a video frame in the embodiment of the present application. Assuming it is known in advance from the intrinsic and extrinsic matrices of the camera that the perpendicular side of the triangle in fig. 7 is 150 pixels long, corresponding to 1.5 meters in the real world, and that the base side is 1500 pixels long, the actual length of the longest side of the triangle can be found to be 15 meters. That is, the distance between the pixel points at the left and right ends of the driving area in the world coordinate system is 15 meters, and the width of the road to be identified is 15 meters.
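The arithmetic of this example can be sketched as follows; the function name is illustrative, and the numbers are the ones assumed in the fig. 7 description (a 150-pixel reference corresponding to 1.5 meters, applied to a 1500-pixel base):

```python
def width_from_reference(ref_pixels, ref_meters, base_pixels):
    """Recover a real-world length from a known pixel/metre reference.

    The known reference fixes the scale (metres per pixel) on that part
    of the image; the same scale is then applied to the base side.
    """
    meters_per_pixel = ref_meters / ref_pixels
    return base_pixels * meters_per_pixel
```

`width_from_reference(150, 1.5, 1500)` gives the 15-meter road width of the example.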
And S104, matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result.
The base map data is the basic data of a map, for example, basic roads, water systems, and the like; it indicates information such as the width, direction, length, and main/auxiliary road attribute of a road. In the embodiment of the application, after the width of the road to be identified is determined, the width can be matched in the base map data. If a road whose width is consistent with that of the road to be identified, or whose width similarity is greater than a preset threshold, is matched, the main and auxiliary road information of the road to be identified is determined according to the main and auxiliary road information of the matched road; for example, if the matched road is a main road, the road to be identified is also determined to be a main road.
According to the method for judging the main and auxiliary roads provided by the embodiment of the application, the road condition video is acquired, the driving area is determined from a video frame of the road condition video, the driving area is then projected to the world coordinate system of the real world so that the road width can be determined, and finally the width is matched against the base map data, with the main and auxiliary road information of the road to be identified determined according to the matching result. Because the road widths of main and auxiliary roads differ greatly in reality, determining the road width by means of image recognition and coordinate system conversion in this scheme does not require high accuracy to identify the main and auxiliary road information well. This solves the problem in the prior art of large errors caused by measuring the road width purely from sensor data, as well as the problem of inaccurate main and auxiliary road judgment when combining positioning data with base map data.
On the basis of the foregoing embodiments, as an alternative embodiment, the embodiment of the present application may obtain, in a machine learning manner, the identification of the driving area at a pixel level in the video frame, so as to calculate the road width more accurately, and specifically, determine the driving area from the video frame, including:
and inputting the video frame into a pre-trained image segmentation model, and obtaining a classification result of each pixel point in the video frame output by the image segmentation model, wherein the classification result is used for representing whether the pixel point is in a driving area.
It can be understood that the image segmentation model in the embodiment of the present application is a binary model, that is, the output result of the image segmentation model is that for each pixel point, it is determined as either a driving region or a non-driving region, and pixels belonging to the same region are assigned with the same number, so that the image is divided into mutually disjoint regions.
The image segmentation model of the embodiment of the application can adopt any one of the following: UNet, ENet (Efficient Neural Network), PSPNet (Pyramid Scene Parsing Network), FCN (Fully Convolutional Network), Mask R-CNN (MRCNN), and the like, which are not limited in the embodiments of the present application.
After the video frame is input into the image segmentation model, a road segmentation map corresponding to the video frame can be obtained, wherein the road segmentation map is an image segmented with a driving area, the driving area and a background can be displayed in a distinguishing manner in the road segmentation map, and each lane can be further displayed in a distinguishing manner.
Alternatively, the road segmentation map may be a gray scale map, where the driving area and the background may be represented by different gray scale values, may be represented by different colors, or may be represented by different brightness values, which is not limited in this embodiment of the present application.
Illustratively, fig. 8 shows a schematic structural diagram of an image segmentation model, which includes two parts, an encoder and a decoder. The encoder includes a backbone network architecture, which serves as an image classification architecture for feature extraction. It has been verified that not only the type of backbone but also the output stride (defined as the ratio between the input image size and the final feature map of the encoder) can control the performance of the network. Placing a DenseASPP (Dense Atrous Spatial Pyramid Pooling) module, a depthwise separable convolution module, and a connection layer behind the backbone network architecture allows the network to learn the global context features of the entire image to refine the full-resolution prediction. The main idea of depthwise separable convolution is to split the input and the kernel into channels (they share the same number of channels), with each input channel convolved with the corresponding kernel channel. A 1 × 1 convolution is then used to perform a pointwise convolution that projects the output of the depthwise convolution into a new channel space. Experiments have shown that depthwise separable convolutions reduce computational cost with similar or better performance.
The decoder includes 3 × 3 convolutional layers (2 in total) and upsampling layers. In addition, the output of the encoder can be enhanced with low-level features from earlier layers of the backbone network through long-range residual connections; since these low-level features may contain a large number of feature maps, the number of channels of the low-level features is reduced with a 1 × 1 convolution to address this.
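As an illustration of depthwise separable convolution as described above, a naive NumPy sketch (valid padding, stride 1; a real network would use an optimized operator):

```python
import numpy as np

def depthwise_separable_conv(x, depth_kernels, point_kernels):
    """Depthwise convolution followed by a 1x1 pointwise projection.

    x: (H, W, C) input; depth_kernels: (k, k, C), one spatial kernel
    per input channel; point_kernels: (C, C_out) 1x1 projection.
    """
    k = depth_kernels.shape[0]
    H, W, C = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    depth_out = np.zeros((out_h, out_w, C))
    for c in range(C):                 # each channel is convolved
        for i in range(out_h):         # with its own kernel only
            for j in range(out_w):
                patch = x[i:i + k, j:j + k, c]
                depth_out[i, j, c] = np.sum(patch * depth_kernels[:, :, c])
    # Pointwise 1x1 convolution mixes channels into the new space.
    return depth_out @ point_kernels
```

The split into per-channel spatial filtering plus a 1 × 1 channel mix is what yields the computational saving mentioned in the text.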
It should be understood that in real scenes, the situation shown in fig. 6, with no other vehicles on the road, is rare. Exemplarily, fig. 9 shows a schematic diagram of another video frame; as shown, the road in the video frame also carries other vehicles, which are marked in the form of rectangular frames by the image classification model. The positions of these vehicles prevent the road from being completely imaged in the video frame, so a width taken directly from the driving area of the video frame is not the real width of the road.
In order to solve the above problem, on the basis of the above embodiments, as an alternative embodiment, the driving region may be determined by using an image classification model, which can recognize more kinds of objects in the picture than the image segmentation model, such as pedestrians, vehicles, animals, plants, and the like. According to the method and the device, the image classification model is utilized, the complexity of the road condition can be determined while the driving area is determined, and further preparation is made for screening out the video frames which are more utilized to determine the road width.
Fig. 10 is a flowchart illustrating a main/auxiliary road judging method according to still another embodiment of the present application, and as shown in fig. 10, the method includes:
s201, collecting road condition videos.
This step is the same as or similar to the step S101 in the embodiment of fig. 5, and is not described here again.
S202, selecting a frame of video frame from the road condition video.
Optionally, the unselected video frames are sequentially selected according to the time sequence of the video frames in the traffic video.
S203, inputting the video frame into a pre-trained image classification model, and obtaining a driving region and/or a moving object region in the video frame output by the image classification model, wherein the moving object region is imaging content of a moving object in the real world in the video frame.
The image classification model of the embodiment of the application can identify the road to be identified and moving objects appearing in the real world; the moving objects of the embodiment of the application include but are not limited to pedestrians, animals, motor vehicles, non-motor vehicles, and the like. It is understood that before step S203 is performed, a process of training the image classification model is also involved. First, a certain number of sample images are obtained, each containing at least a driving area. For each sample image, it is judged whether a moving object region exists; if not, the driving area in the sample image is marked, otherwise the moving object region is marked as well. The initial model is then trained on the marked sample images to convergence, obtaining an image classification model capable of performing image classification.
It should be noted that, when the sample image is marked, the class to which each pixel point in the sample image belongs, that is, the class belongs to the moving object region, or the class belongs to the driving region, is marked, so that the image classification model can output the classification result at the pixel level.
For example, the pixel values of the pixels belonging to the background may be set to 0, the pixel values of the pixels belonging to the driving area may be set to 1, the pixel values of the pixels belonging to the pedestrian may be set to 2, and the pixel values of the pixels belonging to the vehicle may be set to 3. In some other examples, the pixel values of the pixels of the background, the driving area, and the moving object area may also be set to other values, respectively, which is not limited in this embodiment.
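Under the assumed label encoding of the example above (0 = background, 1 = driving area, 2 = pedestrian, 3 = vehicle), extracting the driving area and checking for moving objects from a pixel-level classification map can be sketched as:

```python
import numpy as np

BACKGROUND, DRIVING, PEDESTRIAN, VEHICLE = 0, 1, 2, 3

# A toy 4 x 6 classification map standing in for the model output:
# mostly driving area, with a vehicle occupying part of one column.
mask = np.array([[0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 3, 1, 1],
                 [1, 1, 1, 3, 1, 1],
                 [1, 1, 1, 1, 1, 1]])

driving = (mask == DRIVING)                        # driving-area pixels
moving = (mask == PEDESTRIAN) | (mask == VEHICLE)  # moving-object pixels
has_moving_object = bool(moving.any())
```

Boolean masks like these feed directly into the interference checks of steps S204/S204'.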
S204, if a moving object region exists in the video frame and the moving object region is determined to meet the interference condition, returning to the step S202;
s204', if no moving object region exists in the video frame or the moving object region exists in the video frame and the moving object region is determined not to meet the interference condition, executing the step S205;
according to the method and the device, after the video frame is identified by the image classification model, whether the video frame contains the moving object region is judged firstly, if the moving object region does not exist, the fact that the road where the vehicle runs at present is clear is indicated, the road width can be calculated by using the video frame, and if the video frame contains the moving object region, whether the moving object region meets the interference condition needs to be further judged. If the interference condition is met, the current video frame is not suitable for calculating the road width, and one video frame needs to be selected again. If the interference condition is not met, the current video frame is suitable for calculating the road width, and the video frame does not need to be reselected.
Optionally, in this embodiment of the present application, if the number of the moving object regions is greater than the preset threshold, it is determined that the moving object regions satisfy the interference condition.
For example, when the number of moving object regions is greater than 2, it is determined that the moving object regions satisfy the interference condition; the embodiment of the present application does not specifically limit the threshold on the number of moving object regions.
For example, if the ratio of the number of the pixels in the moving object region to the total number of the pixels in the video frame is greater than the preset threshold, it is determined that the moving object region satisfies the interference condition.
For example, if a video frame has 1000 pixels in total, the number of pixels in the moving object region is 200, and the preset threshold is 10%, then 200/1000 = 20%, which is greater than the preset threshold, so it is determined that the moving object region satisfies the interference condition.
For example, if the pixel points on at least one of the left and right edges in the video frame are located in the moving object region, it is determined that the moving object region satisfies the interference condition.
When the pixel points at the edge of at least one of the left side and the right side in the video frame are positioned in the moving object region, it is indicated that at least one side of the road is shielded by the moving object region, and the accuracy of the calculated road width is low, so that the moving object region is determined to meet the interference condition.
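The three interference conditions can be sketched together as follows; the label encoding, the thresholds and the simplified region count are all assumptions for illustration:

```python
import numpy as np

def satisfies_interference(mask, moving_labels=(2, 3),
                           max_regions=2, area_ratio=0.10):
    """Check the interference conditions described above.

    `mask` is a per-pixel classification map; `moving_labels` marks the
    moving-object classes (assumed encoding). Returns True when the
    frame should be discarded and the next one selected instead.
    """
    moving = np.isin(mask, moving_labels)
    # Condition 1: too many moving-object regions. A full implementation
    # would label connected components; here each distinct moving label
    # present is counted as one region for simplicity.
    n_regions = sum(1 for lbl in moving_labels if (mask == lbl).any())
    if n_regions > max_regions:
        return True
    # Condition 2: moving pixels exceed the allowed share of the frame.
    if moving.mean() > area_ratio:
        return True
    # Condition 3: a moving object touches the left or right frame edge,
    # i.e. it may be occluding one side of the road.
    if moving[:, 0].any() or moving[:, -1].any():
        return True
    return False
```

A frame passing all three checks is kept for the width computation of step S205.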
S205, projecting the driving area to a world coordinate system of the real world, and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end of the pixel points in the same row in the driving area in the world coordinate system.
This step is the same as or similar to the step S103 in the embodiment of fig. 5, and is not described here again.
And S206, matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result.
This step is the same as or similar to the step S104 in the embodiment of fig. 5, and is not repeated here.
According to the main and auxiliary road judging method, the driving area and the moving object area in the video frame are identified through the image classification model, whether the moving object area meets the interference condition is judged, whether the video frame needs to be selected again is determined according to the judgment result, the width of the road to be identified is calculated more accurately, and the road is matched from the base map more accurately.
On the basis of the foregoing embodiments, as an optional embodiment, the obtaining, in step S103 and step S205, the width of the road to be identified according to the distance between the pixels at the left end and the right end in the same row of pixels in the driving area in the world coordinate system includes:
s301, determining target row pixel points for calculating width from the driving area according to the distribution condition of distances of pixel points at the left end and the right end in all the row pixel points in the driving area in the world coordinate system.
Taking the video frame shown in fig. 9 as an example, since there is vehicle interference in the video frame, distances between pixels at left and right ends in each line of pixels in the driving area in the world coordinate system are different, and at this time, the target line of pixels used for calculating the width may be determined according to the distribution of the distances.
Optionally, the target row pixel points used to calculate the width should satisfy: in the video frame, only the pixel points of the driving region exist in the target line, and the pixel points of the moving object region do not exist.
In addition, the distances in the world coordinate system between the pixel points at the left and right ends of each row of pixel points in the driving area can be sorted from largest to smallest, and a preset number of the top-ranked rows of pixel points are taken as the target row pixel points.
By sequencing the distances, a part of line pixels with the farthest distance of the pixels at the left end and the right end in the lines of pixels in the video frame in the world coordinate system can be determined, and the accuracy rate of using the part of line pixels as the target line pixels is high.
S302, obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the target line pixel points in the world coordinate system.
Optionally, when the number of the target rows is one, the distance between the pixel points at the left end and the right end of the pixel points of the target rows in the world coordinate system can be directly used as the width of the road to be identified;
when the number of the target rows is multiple, the average value or the weighted average value of the distances between the pixel points at the left end and the right end of the target rows in the world coordinate system can be used as the width of the road to be identified.
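The row selection and averaging described above can be sketched as follows; `row_distances` is an assumed, already-computed list of per-row left-to-right distances in metres, and `top_k` is the preset number of target rows:

```python
def road_width(row_distances, top_k=3):
    """Estimate road width from per-row left-right distances (metres).

    The rows with the largest distances are taken as target rows, and
    their mean is returned. With top_k=1 this reduces to the single
    target-row case described above.
    """
    ordered = sorted(row_distances, reverse=True)
    target = ordered[:top_k]
    return sum(target) / len(target)
```

A weighted average, as also mentioned above, would replace the plain mean with per-row weights.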
On the basis of the foregoing embodiments, as an optional embodiment, matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result includes:
determining a road in an area with the current driving position as the center of a circle and the preset distance as the radius from the base map data as a road to be matched;
and determining the main and auxiliary road information of the target road recorded in the map data as the main and auxiliary road information of the road to be identified by taking the road to be matched, of which the width similarity with the road to be identified is greater than a preset threshold value, as the target road.
Since the embodiment of the application does not impose high requirements on positioning precision, the roads in the area centered on the current driving position with the preset length as the radius are taken as the roads to be matched.
Optionally, the preset length may be 1 km, and the preset length is not specifically limited in this embodiment of the application.
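A minimal sketch of this matching step, assuming a hypothetical base-map structure (a list of roads with a centre position, a width in metres and a main/auxiliary tag) and a width-ratio similarity measure; names, radius and threshold are all illustrative:

```python
import math

def match_main_auxiliary(width, position, base_map,
                         radius_m=1000.0, similarity=0.9):
    """Match a measured road width against base-map roads near `position`.

    Roads within `radius_m` of the current driving position are the
    candidates; among those whose width similarity exceeds the
    threshold, the best match decides main vs. auxiliary.
    """
    best = None
    for road in base_map:
        if math.dist(position, road["center"]) > radius_m:
            continue  # outside the search circle
        sim = min(width, road["width"]) / max(width, road["width"])
        if sim > similarity and (best is None or sim > best[0]):
            best = (sim, road["kind"])
    return best[1] if best else None
```

Because main and auxiliary road widths differ greatly in practice, even a coarse width estimate separates the candidates cleanly.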
An embodiment of the present application provides a primary and secondary road judging device, as shown in fig. 11, the device may include: the system comprises a video acquisition module 101, a driving area identification module 102, a road width identification module 103 and a base map matching module 104, and specifically comprises:
the video acquisition module 101 is used for acquiring a road condition video;
the driving area identification module 102 is configured to determine a driving area from a video frame of a road condition video, where the driving area is imaging content of a road to be identified in the real world in the video frame;
the road width identification module 103 is used for projecting the driving area to a world coordinate system of the real world, and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the same row of pixel points in the driving area in the world coordinate system;
and the base map matching module 104 is configured to match the width of the road to be identified with the width of the road in the base map data, and determine the main and auxiliary road information of the road to be identified according to the matching result.
The main and auxiliary road judging device provided in the embodiment of the present invention specifically executes the processes of the above method embodiments; for details, please refer to the contents of the above main and auxiliary road judging method embodiments, which are not described herein again. According to the main and auxiliary road judging device provided by the embodiment of the invention, the road condition video is acquired, the driving area is determined from a video frame of the road condition video, the driving area is then projected to the world coordinate system of the real world so that the road width can be determined, and finally the width is matched against the base map data, with the main and auxiliary road information of the road to be identified determined according to the matching result. Because the road widths of main and auxiliary roads differ greatly in reality, determining the road width by means of image recognition and coordinate system conversion in this scheme does not require high accuracy to identify the main and auxiliary road information well. This solves the problem in the prior art of large errors caused by measuring the road width purely from sensor data, as well as the problem of inaccurate main and auxiliary road judgment when combining positioning data with base map data.
On the basis of the above embodiments, as an alternative embodiment, the driving area identification module includes:
the image segmentation submodule is used for inputting the video frame into a pre-trained image segmentation model to obtain a classification result for each pixel point in the video frame output by the image segmentation model, the classification result being used to represent whether the pixel point is in the driving area.
On the basis of the above embodiments, as an alternative embodiment, the driving area identification module includes:
the image classification submodule is used for inputting the video frame into a pre-trained image classification model and obtaining a driving region and/or a moving object region in the video frame output by the image classification model, wherein the moving object region is the imaging content of a moving object in the real world in the video frame;
the main and auxiliary road judging device further comprises:
the screening module is used for, before the driving area is projected to the world coordinate system of the real world, acquiring the next video frame from the road condition video to determine the driving area if a moving object area exists in the video frame and the moving object area meets the interference condition.
On the basis of the foregoing embodiments, as an optional embodiment, the determining, by the screening module, that the moving object region satisfies the interference condition includes: if the number of moving object regions is larger than a preset threshold value, or if pixel points on at least one of the left and right edges of the video frame are located in a moving object region, determining that the moving object region satisfies the interference condition.
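As a minimal illustrative sketch (not part of the claimed embodiments), the interference test described above can be expressed as a function over bounding boxes of moving-object regions; the box format, the `max_regions` threshold, and the function name are all assumptions:

```python
def is_interfered(regions, frame_width, max_regions=3):
    """Return True when the frame should be skipped.

    regions: list of (x_min, y_min, x_max, y_max) boxes for moving objects
    detected in the video frame; frame_width: width of the frame in pixels.
    """
    # Too many moving objects: the drivable area is likely occluded.
    if len(regions) > max_regions:
        return True
    # A region touching the left or right edge may clip the road boundary.
    for x_min, _, x_max, _ in regions:
        if x_min <= 0 or x_max >= frame_width - 1:
            return True
    return False
```

When the test returns True, the device would fetch the next frame from the road condition video and re-run the driving-area identification.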
On the basis of the foregoing embodiments, as an optional embodiment, the road width identifying module includes:
and the projection submodule is used for determining the coordinates of each pixel point in the driving area in a pixel coordinate system, and obtaining the coordinates of the pixel points in the world coordinate system of the real world by combining the predetermined internal reference matrix, external reference matrix and depth of the video acquisition equipment.
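Under a standard pinhole camera model, the projection performed by this submodule can be sketched as follows; the matrix conventions (R and t mapping world coordinates to camera coordinates) are assumptions, since the patent does not fix them:

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    """Back-project pixel (u, v) with known depth to world coordinates.

    K: 3x3 internal reference (intrinsic) matrix.
    R, t: external reference (extrinsic) rotation and translation assumed to
    map world coordinates to camera coordinates: X_cam = R @ X_world + t.
    """
    pixel_h = np.array([u, v, 1.0])              # homogeneous pixel coords
    cam = depth * (np.linalg.inv(K) @ pixel_h)   # point in the camera frame
    return np.linalg.inv(R) @ (cam - t)          # undo the extrinsic transform
```

Applying this to the left-end and right-end pixel points of a row yields two world-coordinate points whose Euclidean distance is that row's width.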
On the basis of the foregoing embodiments, as an optional embodiment, the road width identifying module includes:
the target row determining submodule is used for determining target row pixel points for calculating the width from the driving area according to the distribution condition of the distances between pixel points at the left end and the right end in all the row pixel points in the driving area;
and the width calculation submodule is used for obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the pixel points at the right end in the target row pixel points in the world coordinate system.
On the basis of the foregoing embodiments, as an alternative embodiment, the target row determining submodule includes:
the sorting unit is used for sorting, from large to small, the distances in the world coordinate system between the pixel points at the left end and the right end of each row of pixel points in the driving area;
and the selection unit is used for taking a preset number of the top-ranked rows of pixel points as the target row pixel points.
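The sorting and selection units can be sketched together as below; averaging the top-ranked row widths is an assumption, since the patent only states that the width is obtained from the target rows:

```python
import numpy as np

def road_width(rows, top_n=5):
    """rows: list of (left_point, right_point) pairs in world coordinates,
    one pair per row of pixel points in the driving area.

    Sort the per-row left-right distances from large to small, keep the
    top_n rows as target rows, and average their widths.
    """
    dists = sorted(
        (float(np.linalg.norm(np.asarray(r) - np.asarray(l))) for l, r in rows),
        reverse=True,
    )
    target = dists[:top_n]          # the preset number of top-ranked rows
    return sum(target) / len(target)
```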
On the basis of the above embodiments, as an alternative embodiment, the base map matching module includes:
the road-to-be-matched acquisition submodule is used for determining, from the base map data, a road in an area centered on the current driving position with a preset length as the radius, as a road to be matched;
and the matching submodule is used for taking a road to be matched whose width similarity with the road to be identified is greater than a preset threshold value as a target road, and determining the main and auxiliary road information of the target road recorded in the base map data as the main and auxiliary road information of the road to be identified.
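The base map matching module might be sketched as below; the road record fields, the min/max width-ratio similarity metric, and the default radius are all illustrative assumptions not fixed by the patent:

```python
import math

def match_roads(measured_width, position, base_map, radius=200.0, sim_threshold=0.8):
    """base_map: list of road records, each a dict with 'center' (x, y),
    'width' in meters, and 'is_main_road'. Returns the target roads whose
    width similarity with the measured width exceeds the threshold.
    """
    matches = []
    for road in base_map:
        # Only consider roads within `radius` of the current driving position.
        if math.dist(position, road["center"]) > radius:
            continue
        # Width similarity as the ratio of the smaller to the larger width.
        sim = min(measured_width, road["width"]) / max(measured_width, road["width"])
        if sim > sim_threshold:
            matches.append(road)
    return matches
```

The main and auxiliary road information recorded for a matched target road would then be taken as the information of the road to be identified.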
An embodiment of the present application provides an electronic device, including a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program implements the following: acquiring a road condition video, determining a driving area according to a video frame in the road condition video, projecting the driving area to the world coordinate system of the real world to determine the road width, and finally matching the width against the base map data to determine the main and auxiliary road information of the road to be identified according to the matching result. Because the widths of main and auxiliary roads differ greatly in reality, the accuracy achievable by determining the road width through image identification and coordinate system conversion is sufficient to identify the main and auxiliary road information well. This solves the problem in the prior art of large errors caused by measuring road width from sensor data alone, as well as the problem of inaccurate main and auxiliary road judgment when positioning data is combined with base map data.
In an alternative embodiment, an electronic device is provided. As shown in fig. 12, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
The present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the road width can be determined by acquiring a road condition video, determining a driving area according to a video frame in the road condition video, and projecting the driving area to the world coordinate system of the real world; the width is then matched against the base map data, and the main and auxiliary road information of the road to be identified is determined according to the matching result.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not subject to a strict order restriction and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turns or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.

Claims (11)

1. A main and auxiliary road judging method is characterized by comprising the following steps:
acquiring a road condition video;
determining a driving area from the video frame of the road condition video, wherein the driving area is the imaging content of the road to be identified in the real world in the video frame;
projecting the driving area to a world coordinate system of the real world, and obtaining the width of the road to be identified according to the distance between pixel points at the left end and the right end in the same row of pixel points in the driving area in the world coordinate system;
and matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result.
2. The method for judging a main road and a secondary road according to claim 1, wherein the determining a driving area from the video frame comprises:
inputting the video frame into a pre-trained image segmentation model, and obtaining a classification result of each pixel point in the video frame output by the image segmentation model, wherein the classification result is used for representing whether the pixel point is in the driving area.
3. The method for judging a main road and a secondary road according to claim 1 or 2, wherein the determining a driving area from the video frame comprises:
inputting the video frame into a pre-trained image classification model, and obtaining a driving region and/or a moving object region in the video frame output by the image classification model, wherein the moving object region is imaging content of a moving object in the real world in the video frame;
before the projecting the driving area to the world coordinate system of the real world, the method further comprises:
and if a moving object region exists in the video frame and the moving object region is determined to meet the interference condition, acquiring the next video frame from the road condition video to determine a driving region.
4. The method for judging a main road and a secondary road according to claim 3, wherein the determining that the moving object region satisfies an interference condition comprises:
and if the number of the moving object regions is larger than a preset threshold value, or pixel points on the edges of at least one of the left side and the right side in the video frame are located in the moving object regions, determining that the moving object regions meet interference conditions.
6. The method according to any one of claims 1 to 4, wherein the projecting the driving area to the world coordinate system of the real world includes:
and determining the coordinates of each pixel point in the driving area in a pixel coordinate system, and combining the predetermined internal reference matrix, external reference matrix and depth of the video acquisition equipment to obtain the coordinates of the pixel points in a world coordinate system of the real world.
6. The method for judging the main road and the auxiliary road according to any one of claims 1 to 4, wherein the obtaining the width of the road to be identified according to the distance in the world coordinate system between the pixel points at the left end and the right end in the same row of pixel points in the driving area comprises:
determining target row pixel points for calculating width from the driving area according to the distribution condition of the distances between the pixel points at the left end and the right end in all the row pixel points in the driving area;
and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the pixel points at the right end in the target line pixel points in the world coordinate system.
7. The method for judging the main road and the auxiliary road according to claim 6, wherein the determining the target row pixel for calculating the width from the driving area according to the distribution of the distances of the pixel points at the left end and the right end in all the row pixel points in the driving area in the world coordinate system comprises:
sorting, from large to small, the distances in the world coordinate system between the pixel points at the left end and the right end in all the row pixel points in the driving area;
and taking a preset number of the top-ranked rows of pixel points as the target row pixel points.
8. The main and auxiliary road judgment method according to claim 1, wherein the matching the width of the road to be identified with the width of the road in the base map data, and determining the main and auxiliary road information of the road to be identified according to the matching result comprises:
determining, from the base map data, a road in an area centered on the current driving position with a preset length as the radius, as a road to be matched;
and taking a road to be matched whose width similarity with the road to be identified is greater than a preset threshold value as a target road, and determining the main and auxiliary road information of the target road recorded in the base map data as the main and auxiliary road information of the road to be identified.
9. A primary and secondary road determination device, comprising:
the video acquisition module is used for acquiring road condition videos;
the driving area identification module is used for determining a driving area from a video frame of the road condition video, wherein the driving area is the imaging content of a road to be identified in the real world in the video frame;
the road width identification module is used for projecting the driving area to a world coordinate system of the real world and obtaining the width of the road to be identified according to the distance between the pixel points at the left end and the right end in the same row of pixel points in the driving area in the world coordinate system;
and the base map matching module is used for matching the width of the road to be identified with the width of the road in the base map data and determining the main and auxiliary road information of the road to be identified according to the matching result.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of determining a primary and secondary road as claimed in any one of claims 1 to 8 when executing the program.
11. A computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the primary and secondary road judgment method according to any one of claims 1 to 8.
CN202011264064.4A 2020-11-12 2020-11-12 Main and auxiliary road judging method and device, electronic equipment and storage medium Pending CN112257668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011264064.4A CN112257668A (en) 2020-11-12 2020-11-12 Main and auxiliary road judging method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011264064.4A CN112257668A (en) 2020-11-12 2020-11-12 Main and auxiliary road judging method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112257668A true CN112257668A (en) 2021-01-22

Family

ID=74265493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011264064.4A Pending CN112257668A (en) 2020-11-12 2020-11-12 Main and auxiliary road judging method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112257668A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139454A (en) * 2021-04-19 2021-07-20 国交空间信息技术(北京)有限公司 Road width extraction method and device based on single image
CN113902714A (en) * 2021-10-12 2022-01-07 江阴仟亿日化包装有限公司 Information receiving and transmitting platform using paint color discrimination
CN114969880A (en) * 2021-02-26 2022-08-30 阿里巴巴集团控股有限公司 Road model construction method and device
CN117745793A (en) * 2024-01-30 2024-03-22 北京交通发展研究院 Method, system and equipment for measuring width of slow-going road

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657641A (en) * 2018-12-29 2019-04-19 北京经纬恒润科技有限公司 A kind of vehicle main and side road judgment method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657641A (en) * 2018-12-29 2019-04-19 北京经纬恒润科技有限公司 A kind of vehicle main and side road judgment method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ji Qingchang et al., "Modern Sensor Technology and Practical Engineering Applications", Jilin University Press, 31 March 2019 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969880A (en) * 2021-02-26 2022-08-30 阿里巴巴集团控股有限公司 Road model construction method and device
CN113139454A (en) * 2021-04-19 2021-07-20 国交空间信息技术(北京)有限公司 Road width extraction method and device based on single image
CN113139454B (en) * 2021-04-19 2024-04-23 国交空间信息技术(北京)有限公司 Road width extraction method and device based on single image
CN113902714A (en) * 2021-10-12 2022-01-07 江阴仟亿日化包装有限公司 Information receiving and transmitting platform using paint color discrimination
CN113902714B (en) * 2021-10-12 2022-05-31 江阴仟亿日化包装有限公司 Information receiving and transmitting platform using paint color discrimination
CN117745793A (en) * 2024-01-30 2024-03-22 北京交通发展研究院 Method, system and equipment for measuring width of slow-going road

Similar Documents

Publication Publication Date Title
CN113819890B (en) Distance measuring method, distance measuring device, electronic equipment and storage medium
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
US11094112B2 (en) Intelligent capturing of a dynamic physical environment
US20190051056A1 (en) Augmenting reality using semantic segmentation
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN110622213A (en) System and method for depth localization and segmentation using 3D semantic maps
JP2022003508A (en) Trajectory planing model training method and device, electronic apparatus, computer-readable storage medium, and computer program
CN115049700A (en) Target detection method and device
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN114299464A (en) Lane positioning method, device and equipment
CN111931683A (en) Image recognition method, image recognition device and computer-readable storage medium
CN117315424A (en) Multisource fusion bird's eye view perception target detection method, device, equipment and medium
CN116229224A (en) Fusion perception method and device, electronic equipment and storage medium
CN114802261A (en) Parking control method, obstacle recognition model training method and device
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
CN113902047B (en) Image element matching method, device, equipment and storage medium
CN116259040A (en) Method and device for identifying traffic sign and electronic equipment
CN111754388A (en) Picture construction method and vehicle-mounted terminal
CN117011481A (en) Method and device for constructing three-dimensional map, electronic equipment and storage medium
CN114119757A (en) Image processing method, apparatus, device, medium, and computer program product
CN116917936A (en) External parameter calibration method and device for binocular camera
CN112528918A (en) Road element identification method, map marking method and device and vehicle
CN112233079A (en) Method and system for fusing images of multiple sensors
CN111338336A (en) Automatic driving method and device
CN114723955B (en) Image processing method, apparatus, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40037350

Country of ref document: HK