WO2020048487A1 - Image data processing method and system

Info

Publication number: WO2020048487A1
Application number: PCT/CN2019/104415
Authority: WO - WIPO (PCT)
Prior art keywords: image, layer, lane line, traffic
Other languages: French (fr), Chinese (zh)
Inventors: 李友增 (Li Youzeng), 李国镇 (Li Guozhen), 李佩伦 (Li Peilun), 赵震 (Zhao Zhen)
Applicant: 北京嘀嘀无限科技发展有限公司 (Beijing Didi Infinity Technology and Development Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/09 Recognition of logos

Definitions

  • The present application relates to the field of intelligent assisted driving and unmanned driving, and in particular to an image data processing method and system.
  • Automobiles have become an increasingly important means of transportation in people's lives. To improve the safety and comfort of drivers, much progress has been made in vehicle intelligence.
  • Lane detection by vehicle-mounted equipment is mainly used in unmanned driving and lane-keeping systems. It detects the lane lines in front of the vehicle from images collected by a camera, and when the vehicle deviates, the system can automatically adjust the steering to return the vehicle to the center of the lane.
  • Lane detection by vehicle-mounted equipment is also gradually being applied to the construction of high-precision maps. It is therefore necessary to propose an image data processing method and system that improve the accuracy of road traffic marking recognition.
  • Embodiments of the present application provide an image data processing method, system, device, and computer-readable storage medium, which specifically include the following.
  • The present application discloses a method for processing image data.
  • The method includes: obtaining an image containing a road; obtaining at least one layer of the image based on the image and a trained traffic marking recognition model, the layer containing at least one traffic marking in the image and reflecting the area of the at least one traffic marking in the image; and processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
  • Obtaining at least one layer of the image based on the image and the trained traffic marking recognition model further includes: obtaining multiple sub-images based on the image, where each sub-image contains a part of the image and at least two sub-images jointly contain a common part of the image; for each sub-image, obtaining at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model, where the at least one layer of the sub-image contains at least a portion of at least one traffic marking in the sub-image and reflects the area of that portion in the sub-image; and jointly determining at least one layer of the image based on the layers of the at least two sub-images.
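The tile-and-merge procedure described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the tile height and overlap are assumed values, and each tile here stands in for the binary layer a recognition model would return for it.

```python
import numpy as np

def split_into_subimages(image, tile_h=256, overlap=32):
    """Cut a tall top-view image into overlapping horizontal tiles
    (tile height and overlap are illustrative assumptions)."""
    tiles, offsets = [], []
    step = tile_h - overlap
    for y0 in range(0, image.shape[0], step):
        tiles.append(image[y0:y0 + tile_h])
        offsets.append(y0)
        if y0 + tile_h >= image.shape[0]:
            break
    return tiles, offsets

def merge_layers(tile_layers, offsets, full_h, width):
    """Union the per-tile binary layers back into one full-image layer."""
    layer = np.zeros((full_h, width), dtype=np.uint8)
    for tile, y0 in zip(tile_layers, offsets):
        layer[y0:y0 + tile.shape[0]] |= tile
    return layer

# Hypothetical 600 x 100 binary layer with one marked region; each tile
# doubles as its own "layer" in this sketch.
image = np.zeros((600, 100), dtype=np.uint8)
image[300:310, 10:20] = 1
tiles, offsets = split_into_subimages(image)
merged = merge_layers(tiles, offsets, 600, 100)
```

Overlapping the tiles means a marking cut at one tile boundary is still seen whole in the neighboring tile, and the union in the merge step reconciles the duplicated region.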
  • The trained traffic marking recognition model is a trained Mask R-CNN model.
  • The size of the layer is the same as the size of the image, and the layer is a binary image.
  • Processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes performing one or a combination of erosion, dilation, and smoothing operations on the at least one layer.
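Erosion removes small spurious foreground pixels from a binary layer and dilation restores the surviving regions, so erosion followed by dilation (a morphological opening) is a common way to denoise such masks. The sketch below is an illustrative pure-NumPy version with an assumed 3 x 3 square structuring element, not the application's own implementation:

```python
import numpy as np

def erode(mask, k=3):
    """Binary erosion with a k x k square structuring element: a pixel
    survives only if its whole neighborhood is foreground."""
    pad = k // 2
    padded = np.pad(mask.astype(bool), pad, constant_values=False)
    out = np.ones(mask.shape, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out.astype(mask.dtype)

def dilate(mask, k=3):
    """Binary dilation: a pixel is set if any neighbor is foreground."""
    pad = k // 2
    padded = np.pad(mask.astype(bool), pad, constant_values=False)
    out = np.zeros(mask.shape, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out.astype(mask.dtype)

# Opening (erosion then dilation) removes isolated noise pixels while
# largely preserving the lane line blob.
layer = np.zeros((7, 7), dtype=np.uint8)
layer[1:6, 2:5] = 1          # a small lane line blob
layer[0, 6] = 1              # a stray noise pixel
opened = dilate(erode(layer))
```

In practice a library routine (e.g. OpenCV's `cv2.erode`/`cv2.dilate`) would replace these loops; the explicit version only makes the neighborhood logic visible.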
  • The image includes a top-view image of a road.
  • Obtaining a top-view image containing a road further includes: obtaining a road video taken by a vehicle-mounted device; obtaining multiple images based on the road video; obtaining the image data at the same position in each image; and stitching the image data at the same position in the multiple images to obtain a top-view image of the road.
  • The image data at the same position in each image includes the image data of at least one line that is identical in each image.
  • The layer also reflects a category of the at least one traffic marking.
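One way to realize this stitching, sketched under the assumption that the vehicle advances roughly one scan line per frame, is to take the same row from every video frame and stack the rows vertically:

```python
import numpy as np

def stitch_top_view(frames, row=100):
    """Take the same scan line (here row 100, an assumed position) from
    each frame and stack the lines vertically; as the vehicle moves
    forward, the stacked lines form a top-view strip of the road."""
    lines = [frame[row] for frame in frames]   # one width-length line per frame
    return np.stack(lines, axis=0)             # shape: (num_frames, width)

# Hypothetical 120 x 200 grayscale frames from a road video.
frames = [np.full((120, 200), i, dtype=np.uint8) for i in range(50)]
top_view = stitch_top_view(frames, row=100)
```

A real pipeline would first rectify each frame to a bird's-eye perspective and compensate for vehicle speed; those steps are omitted here.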
  • In response to the layer of the image being a lane line layer, processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: determining the position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer based on the lane line layer and the pixel values of its pixels; determining the position coordinates of the center line of the at least one lane line in the lane line layer based on the position coordinates of the left and right edge points; and determining the position coordinates of the at least one lane line in the image based on a preset lane line width and the position coordinates of the center line of the at least one lane line.
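The edge-point and center-line computation can be illustrated per row of the binary layer; the lane width of 4 pixels below is an arbitrary placeholder for the preset width mentioned above:

```python
import numpy as np

def lane_center_and_extent(layer, lane_width=4.0):
    """For each row of a binary lane line layer, find the left and right
    edge columns of the marking, take their midpoint as the center line,
    then rebuild the line around the center with a preset width
    (lane_width is an illustrative placeholder)."""
    centers, extent = {}, {}
    for y in range(layer.shape[0]):
        cols = np.flatnonzero(layer[y])
        if cols.size == 0:
            continue                           # no marking in this row
        center = (cols[0] + cols[-1]) / 2.0    # midpoint of the edge points
        centers[y] = center
        extent[y] = (center - lane_width / 2.0, center + lane_width / 2.0)
    return centers, extent

# A 3-pixel-wide vertical lane line in a 5 x 10 layer.
layer = np.zeros((5, 10), dtype=np.uint8)
layer[:, 3:6] = 1
centers, extent = lane_center_and_extent(layer)
```

Rebuilding the line from its center and a preset width gives a uniform marking even where the detected mask is ragged.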
  • In response to the layer of the image being a lane line layer, processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: judging whether the horizontal distance between an end point of at least one lane line and an end point of another lane line in the lane line layer is smaller than a first threshold, and whether the vertical distance is smaller than a second threshold; and, when the horizontal distance is smaller than the first threshold and the vertical distance is smaller than the second threshold, splicing the end point of the at least one lane line with the end point of the other lane line.
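The two-threshold test for splicing broken lane line segments might look like the following; the threshold values are illustrative, not values from the application:

```python
def should_splice(end_a, end_b, t_horizontal=10, t_vertical=30):
    """Decide whether two lane line end points belong to one broken line:
    splice only when both the horizontal gap and the vertical gap fall
    below their thresholds (threshold values here are illustrative)."""
    dx = abs(end_a[0] - end_b[0])   # horizontal distance, pixels
    dy = abs(end_a[1] - end_b[1])   # vertical distance, pixels
    return dx < t_horizontal and dy < t_vertical

near = should_splice((100, 50), (104, 70))   # small gap: likely one line
far = should_splice((100, 50), (150, 70))    # too far apart horizontally
```

Using a larger vertical than horizontal threshold reflects that dashed lane lines leave long gaps along the direction of travel but almost none sideways.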
  • In response to the layer of the image being a lane line layer, processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: judging whether the length of a lane line in the lane line layer is less than a third threshold; and, in response to the length of the lane line being less than the third threshold, removing the lane line from the image.
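Filtering out lane lines shorter than the third threshold can be sketched as follows, representing each detected segment by its two end points (a representation assumed for illustration):

```python
def filter_short_lines(lines, min_length=20.0):
    """Drop detected lane line segments shorter than the threshold
    (min_length is an illustrative stand-in for the third threshold);
    short fragments are often false positives of the recognition model."""
    def length(line):
        (x0, y0), (x1, y1) = line
        return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5

    return [line for line in lines if length(line) >= min_length]

lines = [((0, 0), (0, 100)), ((5, 5), (8, 9))]   # one long line, one stub
kept = filter_short_lines(lines)
```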
  • Processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: determining, based on the at least one layer and the pixel values of the pixels in the layer, the maximum and minimum coordinate values of a traffic marking in the layer in the horizontal direction and the maximum and minimum coordinate values in the vertical direction; and determining the position coordinates of the traffic marking in the top-view image based on these maximum and minimum coordinate values.
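Extracting the extreme horizontal and vertical coordinates of a marking's foreground pixels amounts to a bounding-box computation, which might be sketched as:

```python
import numpy as np

def landmark_bbox(layer):
    """Locate a marking in a binary layer by the extreme coordinates of
    its foreground pixels, returned as (x_min, x_max, y_min, y_max)."""
    ys, xs = np.nonzero(layer)   # rows (vertical) and columns (horizontal)
    return int(xs.min()), int(xs.max()), int(ys.min()), int(ys.max())

# A hypothetical landmark occupying rows 5-8 and columns 3-11.
layer = np.zeros((20, 20), dtype=np.uint8)
layer[5:9, 3:12] = 1
bbox = landmark_bbox(layer)
```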
  • The present application discloses an image data processing system.
  • The system includes at least one memory storing computer instructions and at least one processor in communication with the memory. When the at least one processor executes the computer instructions, it causes the system to: obtain an image containing a road; obtain at least one layer of the image based on the image and a trained traffic marking recognition model, the layer containing at least one traffic marking in the image and reflecting the area of the at least one traffic marking in the image; and process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
  • The present application further discloses an image data processing system.
  • The system includes: an image acquisition module for acquiring an image containing a road; a layer acquisition module for acquiring at least one layer of the image based on the image and a trained traffic marking recognition model, the layer containing at least one traffic marking in the image; and a layer processing module configured to process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
  • The layer acquisition module further includes: a sub-image acquisition unit configured to acquire multiple sub-images based on the image, where each sub-image contains a part of the image and at least two sub-images jointly contain a common part of the image; a sub-layer acquisition unit configured to obtain, for each sub-image, at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model, where the at least one layer of the sub-image contains at least a portion of at least one traffic marking in the sub-image and reflects the area of that portion in the sub-image; and a joint determination unit configured to jointly determine at least one layer of the image based on the layers of the at least two sub-images.
  • The trained traffic marking recognition model is a trained Mask R-CNN model.
  • The size of the layer is the same as the size of the image, and the layer is a binary image.
  • The layer processing module is further configured to perform one or more of erosion, dilation, or smoothing operations on the at least one layer.
  • The image includes a top-view image of a road.
  • The image acquisition module further includes: a video acquisition unit for acquiring a road video taken by the vehicle-mounted device; an image acquisition unit for obtaining multiple images based on the road video; an image data extraction unit for obtaining the image data at the same position in each image; and an image data stitching unit for stitching the image data at the same position in the multiple images to obtain a top-view image of the road.
  • The image data at the same position in each image includes the image data of at least one line that is identical in each image.
  • The layer also reflects a category of the at least one traffic marking.
  • In response to the layer of the image being a lane line layer, the layer processing module is further configured to: determine the position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer based on the lane line layer and the pixel values of its pixels; determine the position coordinates of the center line of the at least one lane line in the lane line layer based on the position coordinates of the left and right edge points; and determine the position coordinates of the at least one lane line in the image based on a preset lane line width and the position coordinates of the center line of the at least one lane line.
  • In response to the layer of the image being a lane line layer, the layer processing module is further configured to: determine whether the horizontal distance between an end point of at least one lane line and an end point of another lane line in the lane line layer is smaller than a first threshold, and whether the vertical distance is smaller than a second threshold; and, when the horizontal distance is smaller than the first threshold and the vertical distance is smaller than the second threshold, splice the end point of the at least one lane line with the end point of the other lane line.
  • In response to the layer of the image being a lane line layer, the layer processing module is further configured to: determine whether the length of a lane line in the lane line layer is less than a third threshold; and, in response to the length of the lane line being less than the third threshold, remove the lane line from the image.
  • The layer processing module is further configured to: determine, based on the at least one layer and the pixel values of the pixels in the layer, the maximum and minimum coordinate values of a traffic marking in the layer in the horizontal direction and the maximum and minimum coordinate values in the vertical direction; and determine the position coordinates of the traffic marking in the image based on these maximum and minimum coordinate values.
  • The present application discloses a computer-readable storage medium.
  • The storage medium stores computer instructions; when a computer reads the computer instructions in the storage medium, the computer executes the image data processing method.
  • FIG. 1 is a schematic diagram of an on-demand service system according to some embodiments of the present application.
  • FIG. 2 is a block diagram of an exemplary computing device for a dedicated system for implementing the technical solutions of the present application.
  • FIG. 3 is a block diagram of an exemplary mobile device of a dedicated system for implementing the technical solution of the present application.
  • FIG. 4 is a flowchart of an exemplary image data processing method according to some embodiments of the present application.
  • FIG. 5 is a flowchart of a method for acquiring a top-view image of a road according to some embodiments of the present application.
  • FIG. 6 is a top view image of an exemplary road according to some embodiments of the present application.
  • FIG. 7 is a flowchart of a method for obtaining at least one layer of an image according to some embodiments of the present application.
  • FIG. 8 is an original image of a lane line layer before processing according to some embodiments of the present application.
  • FIG. 9 is a result diagram of the lane line layer after erosion according to some embodiments of the present application.
  • FIG. 10 is a result diagram of the lane line layer after erosion followed by dilation according to some embodiments of the present application.
  • FIG. 11 is a flowchart of a method for determining the position coordinates of a lane line in a top-view image according to some embodiments of the present application.
  • FIG. 12 is a result diagram after the center line of a lane line is determined according to some embodiments of the present application.
  • FIG. 13 is a result diagram of extracting the lane line center line after the lane line layer is eroded and dilated according to some embodiments of the present application.
  • FIG. 14 is a flowchart of an exemplary method for processing a multi-segment lane line according to some embodiments of the present application.
  • FIG. 15 is a flowchart of a method for determining position coordinates of at least one traffic line in a top-view image according to some embodiments of the present application.
  • FIG. 16 is a schematic diagram of processing a triangle landmark according to some embodiments of the present application.
  • FIG. 17 is a block diagram of an image data processing apparatus according to some embodiments of the present application.
  • FIG. 18 is a block diagram of a top-view image acquisition module 1710 according to some embodiments of the present application.
  • FIG. 19 is a block diagram of a layer acquisition module 1720 according to some embodiments of the present application.
  • A flowchart is used in the present application to explain the operations performed by the system according to the embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in sequence; the various steps may instead be processed in reverse order or simultaneously. Other operations may be added to these processes, or one or more steps removed from them.
  • The embodiments of the present application can be applied to different transportation systems.
  • Different transportation systems include, but are not limited to, one or a combination of land, ocean, aviation, and aerospace transportation.
  • For example, taxis, chauffeured cars, hitch rides, buses, trains, high-speed rail, driverless vehicles, and delivery/courier services may all apply the management and/or distribution of such a transportation system.
  • This application can accurately detect road images and determine the position coordinates of traffic markings in the images, so as to be applied to the construction of high-precision maps.
  • The present application may also provide services for unmanned driving, for example, achieving high-precision navigation based on the processing results of this application so that unmanned vehicles drive according to traffic markings.
  • Application scenarios of different embodiments of the present application include, but are not limited to, one or a combination of a webpage, a browser plug-in, a client, a custom system, an enterprise internal analysis system, an artificial intelligence robot, and the like. It should be understood that the application scenarios described here are merely some examples or embodiments of the present application. Those of ordinary skill in the art can also apply this application to other similar scenarios according to these drawings without creative effort.
  • The traffic markings may include at least two types of markings.
  • The first type is lane lines, such as the lane lines 601 and 602 shown in FIG. 6; the second type is non-lane-line markings (also known as landmarks or landmark lines), such as the U-turn landmark 603 and the go-straight landmark 604 shown in FIG. 6.
  • The lane line may include a two-way two-lane road surface center line, which separates opposing traffic flows and indicates that, under the principle of ensuring safety, vehicles are allowed to cross it to overtake; it usually instructs motorists to drive to the right, and the line is a yellow dashed line.
  • The lane line may include a lane dividing line, which separates traffic flows in the same direction and indicates that, under the principle of ensuring safety, vehicles are allowed to cross the line to overtake or change lanes; the line is white.
  • The lane line may include a lane edge line, which indicates the edge of a motor-vehicle lane or divides the boundary between motor-vehicle and non-motor-vehicle lanes.
  • The line is a solid white line.
  • The lane line may include a highway distance confirmation marking, which provides a reference for drivers to maintain a safe following distance.
  • The line is a thick white parallel solid line.
  • The lane line may include a no-overtaking line, which indicates that vehicles are strictly prohibited from crossing the line to overtake; it is used on roads with two or more motor lanes in each direction and no central divider, for example a center double yellow solid line, a center yellow solid line, or a three-lane marking line, across which changing lanes is forbidden. The lane line may also include a diversion line, which indicates that vehicles must travel along a prescribed route and must not run on or cross the line.
  • The line is white.
  • The landmark line may include a left-turn waiting area line, which indicates that a left-turning vehicle may enter the waiting area during the straight-ahead signal phase and wait for the left turn; after the left-turn phase ends, vehicles are prohibited from remaining in the waiting area.
  • The line is a white dashed line.
  • The landmark line may include a left-turn guide line, which indicates the boundary between left-turning motor vehicles and non-motor vehicles; motor vehicles travel to the left of the line, and non-motor vehicles travel to the right of the line.
  • The lines are white dashed lines.
  • The landmark line may include a pedestrian crossing line, which indicates that pedestrians are allowed to cross the lane; the line is a white parallel thick solid line (orthogonal or oblique).
  • The landmark line may include highway entrance and exit markings, which guide vehicles entering or exiting a ramp safely and reduce collisions with protruding edges; they include horizontal markings at entrances and exits as well as triangular-area markings.
  • The landmark line may include a parking space line, which indicates the location where a vehicle is parked; the line is a solid white line.
  • The landmark line may include a harbour-style stop line, which indicates that public buses pass through a dedicated separated approach and stop location.
  • The landmark line may include a toll island marking, which indicates the location of the toll island and provides clear markings for vehicles entering the toll lane.
  • The landmark line may include a guide arrow, which indicates the driving direction of the vehicle and is mainly used in guide lanes at intersections, near exit ramps, and to guide channelized traffic.
  • The landmark line may include a road surface text line, which uses text on the road surface to indicate or restrict a vehicle's travel, for example, maximum speed, large car, small car, or passing lane.
  • The landmark line may include a prohibition line, for example, a line prohibiting parking on the roadside.
  • The landmark line may include a stop line, for example, a solid white stop line at a signalized intersection, which indicates the position where a vehicle waits to be released.
  • The landmark line may include a stop-and-yield line, which means that vehicles must stop at the intersection or slow down to give way to the main road.
  • The landmark line may include a slow-and-yield line, which means that vehicles must decelerate or stop at the intersection to allow vehicles on the main road to go first.
  • The landmark line may include a non-motor-vehicle prohibited zone marking, which notifies cyclists of the prohibited entry area at an intersection.
  • The landmark line may include a mesh (box junction) line, which informs drivers that temporary parking is not allowed at the intersection.
  • The landmark line may include a center circle, which distinguishes between large and small turning vehicles.
  • The landmark line may include a dedicated lane line, which indicates a lane restricted to a certain vehicle type; other vehicles and pedestrians may not enter.
  • The landmark line may include a U-turn prohibition marking.
  • The landmark line may include a warning line and a lane width gradient line, which warn drivers that the road width or the number of lanes is reduced, so they should drive with caution and not overtake.
  • The landmark line may include deceleration lines, which warn drivers that they should slow down.
  • "Passenger", "passenger user", "passenger device", "driver", "driver user", "driver device", "client", "client device", "client user", "user", and "user terminal" are used interchangeably and refer to the party that needs or subscribes to the service, which may be an individual or a tool.
  • The "user" described in this application may be a party that requires or subscribes to a service, or a party that provides or assists in providing a service.
  • FIG. 1 is a schematic diagram of an on-demand service system according to some embodiments of the present application.
  • The image data processing system 100 may include a server 110, a network 120, a user terminal 130, a vehicle-mounted device 140, and a memory 150.
  • The server 110 may be local or remote.
  • The server 110 may process information and/or data.
  • The server 110 may be a system for analyzing and processing the collected information to generate an analysis result.
  • The server 110 may be a terminal device, a server, or a server group.
  • The server group may be centralized, such as a data center.
  • The server group may also be distributed, such as a distributed system.
  • The server 110 may be implemented on a cloud platform.
  • The cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-layer cloud, or the like, or any combination thereof.
  • The server 110 may execute on the computing device 200 described in FIG. 2, which includes one or more components.
  • The server 110 may include a processing engine 112.
  • The processing engine 112 may process information and/or data to perform one or more functions described in this application.
  • The processing engine 112 may acquire an image containing a road.
  • The processing engine 112 may obtain a trained traffic marking recognition model.
  • The processing engine 112 may obtain at least one layer of the image based on the image and the trained traffic marking recognition model; the layer contains at least one traffic marking in the image and reflects the area of the at least one traffic marking in the image.
  • The processing engine 112 may process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
  • The processing engine 112 may include one or more processing engines (e.g., a single-core processing engine or a multi-core processor).
  • The processing engine 112 may include one or more hardware processors, such as a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or any combination thereof.
  • The user terminal 130 may be a passenger terminal or a driver terminal, and may also refer to an individual, a tool, or another entity that issues a service order.
  • The user terminal 130 may take a road image through a camera it carries and upload the road image to the server 110.
  • The user terminal 130 includes, but is not limited to, one or more of a mobile device 130-1, a built-in vehicle device 130-2, a notebook computer 130-3, a desktop computer 130-4, and the like.
  • The mobile device 130-1 may include a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • The wearable device may include a bracelet, footwear, glasses, a helmet, a watch, clothing, a backpack, a smart accessory, etc., or any combination thereof.
  • The mobile device may include a mobile phone, a personal digital assistant (PDA), a gaming device, a navigation device, a point-of-sale (POS) device, a laptop computer, a desktop computer, etc., or any combination thereof.
  • The virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality eye mask, an augmented reality helmet, augmented reality glasses, an augmented reality eye mask, etc., or any combination thereof.
  • The virtual reality device and/or augmented reality device may include Google Glass TM, RiftCon TM, Fragments TM, Gear VR TM, and the like.
  • The user terminal 130 may process information and/or data.
  • The user terminal 130 can be used to analyze and process the collected information to generate a system analysis result.
  • The user terminal 130 may perform preprocessing such as filtering, denoising, and distortion correction on the acquired image.
  • The user terminal 130 may also receive the processing result of the image from the server 110 and output the processing result to the user.
  • The in-vehicle device 140 may be a mobile device 140-1, a navigation device 140-2, a notebook computer 140-3, or a driving recorder 140-4.
  • The in-vehicle device 140 may include, but is not limited to, a driving recorder, a vehicle-mounted camera, a vehicle-mounted video recorder, a car rear-view mirror with a camera function, and other similar devices having a camera function.
  • The in-vehicle device 140 may capture a road image to complete the road image acquisition process.
  • The server 110 may acquire the above-mentioned road image and process it.
  • The server 110 may directly read and access the data information stored in the memory 150, and may also read and access the information of the user terminal 130 through the network 120.
  • the memory 150 may generally refer to a device having a storage function.
  • the memory 150 is mainly used to store data collected from the user terminal 130 and various data generated during the operation of the image data processing system 100.
  • the storage device 150 may store images acquired from the user terminal 130 and / or the in-vehicle device 140.
  • the storage device 150 may store instructions executed by the processing engine 112 or used for image data processing.
  • the storage device 150 may store training samples for training a traffic marking recognition model. Specifically, the training sample may be multiple original images and their label information.
  • the original image may include a top-view image of the road or an image in a non-top view of the road, and the original image may also include a processed top-view image of the image in a non-top view of the road.
  • the annotation information reflects the area and/or category of the traffic markings in each original image.
  • the memory 150 may be local or remote.
  • the connection or communication between the system database and other modules of the system can be wired or wireless.
  • the storage device 150 may include mass storage, removable memory, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof.
  • Exemplary mass storage may include magnetic disks, optical disks, solid-state disks, and the like.
  • Exemplary removable memories may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tapes, and the like.
  • Exemplary volatile read-write memory may include random access memory (RAM).
  • Exemplary RAMs may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and the like.
  • Exemplary ROMs may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electronically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, etc.
  • the storage device 150 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-layer cloud, or the like, or any combination thereof.
  • the storage device 150 may be connected to the network 120 to communicate with one or more components in the image data processing system 100 (eg, the server 110, the in-vehicle device 140, etc.). One or more components in the image data processing system 100 may access data or instructions stored in the storage device 150 via the network 120. In some embodiments, the storage device 150 may be directly connected to or in communication with one or more components in the image data processing system 100 (eg, the server 110, the in-vehicle device 140). In some embodiments, the storage device 150 may be part of the server 110.
  • the network 120 may provide a channel for information and / or data exchange.
  • one or more components in the image data processing system 100 may send information to other components in the image data processing system 100 through the network 120 And / or data.
  • the processing engine 112 may obtain a top-view image of the road from the storage device 150 and / or the vehicle-mounted device 140 through the network 120.
  • the network 120 may be a single network or a combination of multiple networks.
  • the network 120 may include, but is not limited to, a cable network, a wired network, a fiber optic network, a telecommunications network, an internal network, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, etc., or any combination thereof.
  • the network 120 may include one or more network access points, such as wired or wireless access points, base stations (such as 120-1, 120-2), or network exchange points. Through these access points, data sources can connect to the network 120 and send information over the network.
  • the image data processing system 100 is provided for illustrative purposes only and is not intended to limit the scope of the application.
  • the image data processing system 100 may further include a database, an information source, and the like.
  • the image data processing system 100 may be implemented on other devices to implement similar or different functions. However, such modifications and changes do not depart from the scope of this application.
  • FIG. 2 is a block diagram of an exemplary computing device 200 for a dedicated system for implementing the technical solutions of the present application.
  • the computing device 200 may include a processor 210, a memory 220, an input/output interface 230, and a communication port 240.
  • the processor 210 may execute calculation instructions (program code) and perform functions of the image data processing system 100 described in this application.
  • the calculation instructions may include programs, objects, components, data structures, procedures, modules, and functions (the functions refer to specific functions described in this application).
  • the processor 210 may process image or text data obtained from any other component of the image data processing system 100.
  • the processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application specific integrated circuit (ASIC), an application specific instruction set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physical processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, any circuit or processor capable of performing one or more functions, etc., or any combination thereof.
  • the memory 220 may store data / information obtained from any other components of the image data processing system 100.
  • the memory 220 may include mass storage, removable memory, volatile read and write memory, read-only memory (ROM), or the like, or any combination thereof.
  • Exemplary mass storage may include magnetic disks, optical disks, solid-state drives, and the like.
  • Removable storage may include flash drives, floppy disks, optical disks, memory cards, compact disks, and magnetic tapes.
  • Volatile read and write memory may include random access memory (RAM).
  • RAM can include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and so on.
  • ROM can include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, etc.
  • An input / output (I / O) interface 230 may be used to input or output signals, data, or information.
  • the input / output interface 230 may enable a user to communicate with the image data processing system 100.
  • the input / output interface 230 may include an input device and an output device.
  • Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, and the like, or any combination thereof.
  • Exemplary output devices may include display devices, speakers, printers, projectors, etc., or any combination thereof.
  • Exemplary display devices may include a liquid crystal display (LCD), a light emitting diode (LED) based display, a flat panel display, a curved display, a television device, a cathode ray tube (CRT), etc., or any combination thereof.
  • the communication port 240 may be connected to a network for data communication.
  • the connection may be a wired connection, a wireless connection, or a combination of both.
  • the wired connection may include a cable, an optical cable or a telephone line, or any combination thereof.
  • the wireless connection may include Bluetooth, Wi-Fi, WiMax, WLAN, ZigBee, a mobile network (for example, 3G, 4G, or 5G, etc.), etc., or any combination thereof.
  • the communication port 240 may be a standardized port, such as RS232, RS485, and the like.
  • the communication port 240 may be a specially designed port.
  • FIG. 3 is a block diagram of an exemplary mobile device 300 of a dedicated system for implementing the technical solutions of the present application. According to some embodiments of the present application, the user terminal 130 may be implemented on the mobile device 300.
  • the mobile device 300 may include a communication platform 310, a display 320, a graphics processor (GPU) 330, a central processing unit (CPU) 340, an input / output interface 350, a memory 360, a memory 370, and the like.
  • the CPU 340 may include an interface circuit and a processing circuit similar to the processor 210.
  • any other suitable components including but not limited to a system bus or controller (not shown), may also be included within the mobile device 300.
  • the operating system 361 (e.g., iOS, Android, Windows Phone, etc.) and application programs 362 may be loaded from the memory 370 into the memory 360 for execution by the CPU 340.
  • the application programs 362 may include a browser or an application program for receiving imaging, graphics processing, audio, or other related information from the image data processing system 100.
  • the user's interaction with the information flow can be achieved via the input/output interface 350 and provided to the processing engine 112 and/or other components of the image data processing system 100 via the network 120.
  • a computing device or mobile device may be used as a hardware platform for one or more components described in this application.
  • the hardware elements, operating systems, and programming languages of these computers or mobile devices are conventional in nature, and those skilled in the art can adapt these technologies to the on-demand service system described in this application after being familiar with these technologies.
  • a computer with user interface elements can be used to implement a personal computer (PC) or other type of workstation or terminal device, and the computer can also act as a server if properly programmed. It is believed that those skilled in the art are familiar with the structure, programs, and general operation of this type of computer equipment; therefore, no additional explanation of the drawings is provided.
  • FIG. 4 is a flowchart of an exemplary image data processing method according to some embodiments of the present application.
  • the image data processing method 400 is executed by a device having processing and computing capabilities, such as the server 110 or the computing device 200.
  • Step 401 Acquire an image including a road.
  • the image of the road may include a top view image of the road.
  • a top-view image of a road may refer to a road image taken from above the road, looking down (a bird's-eye view).
  • the road image may also include road images taken from other angles.
  • Road images can be obtained in many ways. For example, road images can be obtained by aerial photography or by surveillance cameras installed along the road, and road images can also be obtained from images collected by the user terminal or in-vehicle equipment (for example, a driving recorder).
  • the road image captured by the user terminal or the in-vehicle device is typically a road image taken from a non-top-view angle.
  • For example, a user sitting in a vehicle may capture a road image through a user terminal; as another example, a vehicle-mounted device placed at the rear-view mirror of a vehicle may capture a road image through the vehicle's front windshield.
  • the road image at a non-top view angle may be processed and converted into a corresponding top view image.
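In practice, the conversion from a non-top-view road image to a top-view image is commonly modeled with a planar homography. The sketch below shows only the point-mapping step in NumPy, with the 3×3 homography matrix assumed to be already known (it would normally be estimated from road-plane correspondences); `warp_points` and the sample matrices are illustrative, not part of the application.

```python
import numpy as np

def warp_points(H, pts):
    """Map (x, y) pixel points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T   # lift to homogeneous coordinates
    return homog[:, :2] / homog[:, 2:3]    # divide out the projective scale

corners = [(0.0, 0.0), (100.0, 0.0), (0.0, 50.0)]

# The identity homography leaves points unchanged (sanity check).
mapped = warp_points(np.eye(3), corners)

# A pure-translation homography shifts every point by (10, 5).
H_shift = np.array([[1.0, 0.0, 10.0],
                    [0.0, 1.0, 5.0],
                    [0.0, 0.0, 1.0]])
shifted = warp_points(H_shift, corners)
```

A full top-view conversion would additionally resample every pixel of the output grid through the inverse homography.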
  • Step 402 Obtain a trained traffic marking recognition model.
  • the server 110 may obtain the trained traffic marking recognition model through the network 120.
  • the trained traffic marking recognition model can identify the category of traffic markings in the road image and/or their locations in the image, such as the positions of lane lines in the road image and the locations of various landmarks in the road image.
  • the trained traffic marking recognition model can process a road image, output the processed image, and label different types of traffic markings in different forms on the image.
  • For example, each traffic marking may be circled with a dotted frame, and/or different categories of traffic markings may be marked with text.
  • a trained traffic marking recognition model can output an image and use different pixel values to represent different categories of traffic markings on the image. For example, the pixel value "1" represents a lane line, and the pixel value "2" represents a landmark (for example, a sidewalk).
  • the preferred trained traffic lane recognition model in this embodiment is a trained MASK-RCNN model.
  • the input when training the MASK-RCNN model is multiple original images and their label information.
  • the original image may include a top-view image of a road or an image taken at a non-top-down angle of the road, and the original image may also include a processed top-view image of an image taken at a non-top-down angle of the road.
  • the labeling information may be layer information labeled with various traffic marking categories.
  • the layer can be an image the same size as the original image.
  • one original image corresponds to multiple layers, and each layer contains one category of traffic markings.
  • one original image may correspond to only one layer, and the layer includes all traffic markings in the original image.
  • the pixel values of the layer may be limited to a limited number of values.
  • the layer may be a binary image, in which the pixel value of the traffic markings is "1" and the pixel values of the remaining pixels are "0". As another example, by marking different categories of traffic markings on the original image, different pixel values can be used to represent different categories of traffic markings on the same image (for example, the pixel value "1" represents lane lines, the pixel value "2" represents sidewalks, and the pixel value "3" represents straight-ahead markings), and the pixel values of the remaining pixels are "4".
  • Based on the labeled original images, the server 110 can obtain layer information corresponding to different categories of traffic markings. Using the original images and the layers of their various traffic markings, the MASK-RCNN model learns to make its predicted traffic markings consistent with the labeled traffic markings, yielding a trained MASK-RCNN model.
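As a rough illustration of the per-category label layers described above, the sketch below splits a hypothetical multi-class annotation raster (pixel value 1 = lane line, 2 = sidewalk, following the example) into one binary layer per category; the function and category names are assumed for illustration only.

```python
import numpy as np

# Hypothetical category codes used in the annotation raster.
CATEGORIES = {"lane_line": 1, "sidewalk": 2}

def split_into_layers(label_raster, categories=CATEGORIES):
    """Return one binary layer, the same size as the image, per category."""
    label_raster = np.asarray(label_raster)
    return {name: (label_raster == code).astype(np.uint8)
            for name, code in categories.items()}

labels = np.array([[0, 1, 1],
                   [2, 0, 1],
                   [2, 2, 0]])
layers = split_into_layers(labels)   # {"lane_line": ..., "sidewalk": ...}
```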
  • the trained traffic marking recognition model may also be a traffic marking recognition model based on a convolutional neural network, a deep-learning-based traffic marking recognition model, a MATLAB-based traffic marking recognition model, a traffic marking recognition model based on intelligent cognition, a traffic marking recognition model based on shape features, etc. When different types of original images (such as road top-view images, road images taken at non-top-view angles, or processed top-view images) are input into the trained MASK-RCNN model, the model can process the road image to obtain one or more layers of the corresponding image, and each layer may include the traffic markings obtained by model recognition.
  • Step 403 Obtain at least one layer of the image based on the image and the trained traffic marking recognition model; the layer includes at least one traffic marking in the image, and reflects at least the An area of at least one traffic line in the image.
  • the server 110 can obtain at least one layer of the image. Specifically, at least one layer may be output according to the input image type (for example, a road top-view image, an image at a non-top-view angle of the road, or a processed top-view image). In some embodiments, the traffic markings in the output layers are located at the same positions as the corresponding traffic markings in the image.
  • a layer can include one or more traffic lines. In some embodiments, a layer may include only one type of traffic line (for example, only lane lines), or a plurality of types of traffic line (for example, lane lines and landmarks).
  • a layer may include only one type of landmark (for example, a sidewalk), or may include multiple types of landmarks (for example, a sidewalk, a U-turn, and a slow down).
  • one layer may only include lane lines, and each other layer may include only one type of landmark (for example, one layer may only include U-turns, one layer may include only sidewalks, and one layer may include only right turns).
  • the layer can be a binary image (the pixel value of each pixel is 0 or 1).
  • a pixel value of 1 indicates that the point is a traffic marking, and a pixel value of 0 indicates that the point is not a traffic marking.
  • the layer can also reflect the category of at least one traffic line.
  • a layer only contains information about lane lines, which is also called a lane line layer, where a pixel value of 1 indicates that the point is a lane line.
  • a certain layer only includes information of a landmark (for example, a sidewalk), which is also referred to as a landmark layer, and a pixel value of 1 indicates that the point is a landmark.
  • the layer may have more types of pixel values.
  • the MASK-RCNN model can output mask layer information of different types of traffic markings, and can obtain pixel-level location information of traffic markings.
  • the mask layer information of each type of traffic markings is two-dimensional 0/1 raster data (the grid can include one or more pixels), and the pixel value is 1 in the area with traffic markings, and the value is 0 otherwise.
  • the traffic markings on the output layer may be framed by a wire frame, and the area within the wire frame is the area of the traffic markings in the image.
  • the wireframe can be one or more combinations of rectangular, circular, oval, polygonal, or irregular wireframes. Different wireframes can have different colors.
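One simple way to derive such a rectangular wire frame from a binary layer is to take the bounding box of the layer's nonzero region; the helper below is an assumed minimal implementation, not the application's own method.

```python
import numpy as np

def mask_bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the nonzero region
    of a binary layer -- one way to place a rectangular wire frame
    around a recognized traffic marking. Returns None for an empty mask."""
    rows, cols = np.nonzero(np.asarray(mask))
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 1:5] = 1                 # a small marking region
box = mask_bounding_box(mask)      # (2, 1, 3, 4)
```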
  • Step 404 Process the at least one layer to obtain position coordinates of the at least one traffic line in the image.
  • the layer reflects the position of at least one traffic line in the image.
  • the traffic markings in the output layers are located at the same positions as the corresponding traffic markings in the image.
  • the server 110 can obtain more precise position coordinates of at least one traffic line in the image.
  • the processing methods include, but are not limited to, image denoising, image transformation, edge detection, edge thinning, image analysis, image compression, image enhancement, image blur processing, image interpolation, binary transformation, threshold transformation, Fourier transformation, Discrete cosine transform, etc.
  • FIG. 5 illustrates steps in which the server 110 acquires a top-view image of a road in some embodiments.
  • Step 501 Obtain a road video taken by a vehicle-mounted device.
  • Vehicle-mounted devices include, but are not limited to, driving recorders, vehicle-mounted cameras, vehicle-mounted video cameras, vehicle-mounted recording devices, connected car rear-view mirrors, and other vehicle recording devices.
  • the server 110 may obtain a road video shot by the vehicle-mounted device through the network 120.
  • Step 502 Acquire multiple images based on the road video.
  • the server 110 may divide the road video into multiple frames of data to obtain multiple images. For example, the server 110 may extract several consecutive frames of images, obtain an image every other frame, or obtain an image every several frames.
  • Step 503 Obtain image data at the same position in each image.
  • the server 110 may extract image data of the same line in each image. In still other embodiments, the server 110 may extract image data of the same multiple lines in each image. For example, the server 110 may extract three lines (i.e., the first line, the second line, and the third line) of image data from top to bottom of each image. In some embodiments, the server 110 may extract one or more lines of data with less distortion from each image. For example, it may extract one or more lines of data at the top or bottom of the image, or one or more lines with less distortion in the middle part of the image. In some embodiments, the server 110 may extract image data from other fixed positions in each image. The fixed position is not limited to image data of a fixed line or multiple lines; it may also be local image data, such as some columns within the image data of a fixed line or lines.
  • Step 504 stitch the image data of the same position among the multiple images to obtain a top-view image of the road.
  • the server 110 can stitch the image data of the same line in each image in chronological order to form the final overhead image of the road.
  • the server 110 may form multiple top-view images of the road by stitching, based on image data of the same multiple lines extracted from each image. For example, after the server 110 extracts three rows of image data (line 100, line 320, and line 480 from the top) from each image, it may stitch the line-100 image data of each image to form a first top-view image, stitch the line-320 image data of each image to form a second top-view image, and stitch the line-480 image data of each image to form a third top-view image.
  • After the server 110 obtains the three top-view images, it can determine a suitable top-view image by comparing the quality (e.g., sharpness) of the three images.
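The stitching of steps 503-504 can be sketched as taking the same row from each frame, in time order, and stacking those rows into a new image. The frames below are synthetic and the function name is illustrative.

```python
import numpy as np

def stitch_row(frames, row):
    """Stack the same image row from consecutive frames (in time order)
    into a new image -- the row-stitching idea of steps 503-504."""
    return np.stack([np.asarray(f)[row] for f in frames], axis=0)

# Three tiny 4x5 "frames"; each frame's row 2 is filled with the frame index.
frames = []
for t in range(3):
    f = np.zeros((4, 5), dtype=np.uint8)
    f[2, :] = t
    frames.append(f)

top_view = stitch_row(frames, row=2)   # one stitched row per frame
```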
  • In some embodiments, on-vehicle equipment is installed in the car and obtains road images through the front windshield. In such images, the road may appear wide up close and narrow in the distance. After the process shown in FIG. 5, the widths of the road across the different stitched lines are approximately equal, which can be understood as converting the road images acquired by the in-vehicle equipment into images with a shooting effect similar to a bird's-eye view.
  • FIG. 7 shows that in other embodiments, the server 110 obtains at least one layer of the image based on the image and the trained traffic marking recognition model, including:
  • Step 701 Obtain multiple sub-images based on the image; each of the multiple sub-images includes a portion of the image, and at least two of the sub-images jointly include the same portion of the image.
  • the image may include a top-view image of the road, and may also include an image in a non-top-view perspective of the road or an image in a non-top-view perspective after processing.
  • For details, refer to FIG. 4 and the related descriptions, which are not repeated herein.
  • Step 702 For each sub-image, obtain at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model; the layer includes at least one of the sub-images At least a portion of a traffic line, and at least an area in the sub-image that reflects at least a portion of the at least one traffic line.
  • Step 703 Jointly determine at least one layer of the images based on at least one layer of the at least two sub-images.
  • the server 110 may capture multiple sub-images of the image through a window smaller than the image.
  • For example, if the pixel size of an image is 1024 × 1024 and the size of the window is 64 × 64, the image can be traversed by the window to obtain 1024 × 1024 sub-images, each having a size of 64 × 64.
  • where the window extends beyond the image, the positions not covered by the image are filled with a pixel value of 0.
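A minimal sketch of the windowed sub-image extraction with zero padding described above; the function name and the way the window is anchored are assumptions.

```python
import numpy as np

def window_at(image, top, left, size):
    """Return a size x size sub-image anchored at (top, left); positions
    outside the image are filled with 0, as in the padding described above."""
    image = np.asarray(image)
    out = np.zeros((size, size), dtype=image.dtype)
    r0, c0 = max(top, 0), max(left, 0)
    r1 = min(top + size, image.shape[0])
    c1 = min(left + size, image.shape[1])
    if r1 > r0 and c1 > c0:
        out[r0 - top:r1 - top, c0 - left:c1 - left] = image[r0:r1, c0:c1]
    return out

img = np.arange(16).reshape(4, 4)
inside = window_at(img, 1, 1, 2)   # fully inside the image
edge = window_at(img, 3, 3, 2)     # bottom-right corner, zero-padded
```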
  • At least one layer of each sub-image can be identified by inputting each sub-image into a trained traffic marking recognition model (for example, a MASK-RCNN model), where each layer can include only one type of traffic Marking.
  • the pixel value of the pixel point of the layer is 0 or 1, where 1 indicates that the pixel point is the traffic marking.
  • a pixel value of 1 indicates that the point is a lane line.
  • a pixel value of 1 at a pixel point indicates that the point is a sidewalk.
  • For each pixel point of the image, the server 110 can determine the final pixel value of that point in the corresponding traffic marking layer of the image by combining the sub-image layers that contain it. For example, suppose the traffic marking recognition model (for example, a MASK-RCNN model) outputs 100 sub-image lane line layers, and the 100 sub-image lane line layers all include the same point A of the image. If the pixel value of point A is 1 in a sufficient number of the 100 sub-image lane line layers, the server 110 may determine that the pixel value of point A in the final lane line layer of the image is 1; otherwise, it determines that the final pixel value of point A in the final lane line layer is 0. Similarly, the server 110 may determine the final pixel value of each pixel point in each layer of the image. Finally, the server 110 may determine at least one layer of the image. For example, the server 110 may finally determine a lane line layer and a layer corresponding to each landmark (for example, a sidewalk layer, a U-turn layer, a left-turn layer, etc.).
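The combination of per-sub-image layers can be sketched as pixel-wise voting: a point is kept in the final layer only if enough sub-image layers mark it as a traffic marking. The exact fusion rule is not specified here, so the vote threshold below is an assumption.

```python
import numpy as np

def fuse_layers(sub_layers, min_votes):
    """Fuse per-sub-image binary layers: a pixel is 1 in the final layer
    only if at least `min_votes` sub-image layers mark it as a marking.
    The voting threshold is an assumed fusion rule."""
    votes = np.sum(np.asarray(sub_layers), axis=0)
    return (votes >= min_votes).astype(np.uint8)

a = np.array([[1, 0], [1, 1]])
b = np.array([[1, 0], [0, 1]])
c = np.array([[1, 1], [0, 1]])
fused = fuse_layers([a, b, c], min_votes=2)
```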
  • the sub-regions of the image can be identified multiple times, and the results of the multiple recognitions can be combined (e.g., fused) to determine the pixel value at the corresponding pixel position in the image layer, which can effectively reduce recognition errors and improve image recognition precision.
  • the processing performed by the server 110 on the layer may include, but is not limited to, an erosion operation, a dilation operation, or a smoothing operation.
  • Figure 8 is the original image before processing the lane line layer.
  • White represents the lane line and black represents the non-lane line.
  • the server 110 may perform morphological processing on the lane line layer, mainly using a relatively small rectangular kernel (for example, kernel(2,2), kernel(3,3), or kernel(5,5)).
  • The erosion operation means that when the pixel values of the pixels within the kernel are not all 1, the pixel value of the pixel at the kernel position is changed from 1 to 0 (that is, from white to black); when the values are all 1, no change is made. This operation can largely filter out invalid (or isolated) noise.
  • FIG. 9 is a result diagram of the lane line layer after erosion. As can be seen from FIG. 9, the eroded lane lines are narrower than the original lane lines.
  • FIG. 10 is a result diagram of the lane line layer after erosion followed by dilation.
  • Dilation can partly offset the loss of precision introduced by the erosion operation, and can partly resolve lane line disconnections caused by partial occlusion (in region 901 of FIG. 9, the two lane lines are disconnected; after dilation, the two lane lines in region 1001 of FIG. 10 are connected).
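A hand-rolled NumPy sketch of erosion and dilation with a 2×2 kernel on a toy lane line layer (a real pipeline would likely use an image-processing library; this version only illustrates the morphology described above):

```python
import numpy as np

def erode2x2(a):
    """2x2-kernel erosion: a pixel stays 1 only if the whole 2x2
    neighborhood anchored at it is all 1 (outside the image counts as 0)."""
    p = np.pad(a, ((0, 1), (0, 1)))
    return (p[:-1, :-1] & p[1:, :-1] & p[:-1, 1:] & p[1:, 1:]).astype(np.uint8)

def dilate2x2(a):
    """2x2-kernel dilation: a pixel becomes 1 if any pixel in the 2x2
    neighborhood anchored at it is 1."""
    p = np.pad(a, ((0, 1), (0, 1)))
    return (p[:-1, :-1] | p[1:, :-1] | p[:-1, 1:] | p[1:, 1:]).astype(np.uint8)

layer = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 1]], dtype=np.uint8)  # (4, 4) is isolated noise

eroded = erode2x2(layer)     # noise removed, solid block narrowed
opened = dilate2x2(eroded)   # block grown back (shifted by the kernel anchor)
```

Here erosion removes the isolated noise pixel and narrows the solid region, and dilation grows the surviving region back toward its original extent, as in FIG. 9 and FIG. 10.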
  • FIG. 11 illustrates a method in which, in response to the layer being a lane line layer, the server 110 may perform the following operations on the lane line layer to obtain the position coordinates of the lane lines in the image.
  • the server 110 may perform one or more of an erosion operation, a dilation operation, or a smoothing operation on the lane line layer.
  • Step 1101 Determine the position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer based on the pixel values of the pixel points in the lane line layer and the lane line layer.
  • the server 110 may perform left and right lane line edge extraction on the layer from top to bottom. For example, the server 110 may read one line of data at a time to determine the first left edge point and the first right edge point of the first lane line in the line, where the first left edge point is a pixel value from 0 to 1 The first pixel point (from black to white), and the first right edge point is the first pixel point whose pixel value changes from 1 to 0 (from white to black). The server 110 continues to scan the line of data to obtain the second left edge point and the second right edge point.
  • the server 110 scans data of different lines and compares the position of a pixel point (for example, the left edge point) of a lane line in the current line with the position of the corresponding pixel point (for example, the left edge point) of the lane line in the previous line. If the difference is less than a certain threshold, the two are regarded as the same lane line, so that the second line of information is merged with the first line of information; and so on, merging multiple lines of information and finally connecting them to obtain multiple lane lines.
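The per-row edge scan of step 1101 can be sketched as follows: within one row of the layer, each 0→1 transition gives a left edge point and the following 1→0 transition gives the matching right edge point (the function name is assumed):

```python
import numpy as np

def row_edges(row):
    """Scan one layer row and return (left, right) column pairs:
    left = first pixel of a 0->1 transition, right = last pixel
    before the following 1->0 transition."""
    row = np.asarray(row)
    padded = np.concatenate([[0], row, [0]])
    diff = np.diff(padded)
    lefts = np.where(diff == 1)[0]         # 0 -> 1 transitions
    rights = np.where(diff == -1)[0] - 1   # 1 -> 0 transitions
    return list(zip(lefts.tolist(), rights.tolist()))

row = [0, 1, 1, 1, 0, 0, 1, 1, 0]          # two lane-line segments
edges = row_edges(row)
```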
  • Step 1102 Determine position coordinates of a center line of the at least one lane line in the lane line layer based on position coordinates of a left edge point and a right edge point of the at least one lane line in the lane line layer.
  • the server 110 may perform a certain operation on the coordinates of the left edge point and the right edge point of each lane line in the lane line layer, and use the operation result as the coordinates of the center line point of the lane line. For example, the coordinates of the left edge point and the right edge point of each lane line in the lane line layer are averaged, and the result is taken as the coordinate position of the lane line center line.
  • the server 110 forms the center line of the lane line by connecting the midpoints between the left edge points and the right edge points of the lane line.
  • FIG. 12 is a result diagram after the center line of the lane line is determined for the lane line layer.
  • Step 1103 Determine position coordinates of the at least one lane line in the image based on a preset width of the lane line and position coordinates of a center line of the at least one lane line.
  • the server 110 may determine the pixel width of the lane line in the lane line layer based on a preset width of the lane line (for example, 10 mm, 20 mm, 30 mm, 50 mm, 100 mm, 200 mm, etc.). For example, the server 110 may determine that the pixel width of the lane lines in the lane line layer is 30 pixels. In some embodiments, the server 110 may widen a certain pixel area from the center line of the lane line to the left and right to determine the position coordinates of the lane line in the image. For example, the server 110 may use the center line of the lane line as a reference to widen 15 pixels to the left and right sides to determine the position coordinates of the lane line in the image.
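The widening step can be sketched as painting a fixed pixel width around a per-row center line; the 3-pixel width, the one-column-per-row centerline representation, and the helper name below are illustrative, not the preset values of the application.

```python
import numpy as np

def widen_centerline(shape, centerline_cols, width_px):
    """Paint a lane line of `width_px` pixels around a per-row centerline
    (one column index per row), clipping at the layer borders."""
    half = width_px // 2
    layer = np.zeros(shape, dtype=np.uint8)
    for r, c in enumerate(centerline_cols):
        lo = max(c - half, 0)
        hi = min(c + half + 1, shape[1])
        layer[r, lo:hi] = 1
    return layer

lane = widen_centerline((3, 9), centerline_cols=[4, 4, 5], width_px=3)
```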
  • the server 110 may determine whether the distance in the horizontal direction between the end of the center line of one lane line and the head of the center line of another lane line is less than the first threshold, and whether the distance in the vertical direction is less than the second threshold.
  • the first threshold may be set to 15 pixels
  • the second threshold may be set to 100 pixels.
  • If both conditions are met, the server 110 may connect the end of the center line of the at least one lane line and the head of the center line of the other lane line by a linear interpolation method. Further, the server 110 may determine the position coordinates of the at least one lane line in the top-view image by widening several pixels to the left and right sides, using the spliced center line as a reference.
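The threshold check and linear-interpolation splicing described above might look like the following sketch, using the example thresholds of 15 and 100 pixels; the (x, y) point representation and function names are assumptions.

```python
import numpy as np

def can_splice(end_pt, head_pt, max_dx=15, max_dy=100):
    """Check whether two centerline segments are close enough (in pixels)
    to belong to the same lane line, per the example thresholds above."""
    dx = abs(end_pt[0] - head_pt[0])
    dy = abs(head_pt[1] - end_pt[1])
    return dx < max_dx and dy < max_dy

def splice(end_pt, head_pt):
    """Fill the vertical gap between two segment endpoints by linear
    interpolation, returning the (x, y) points that bridge the gap."""
    y0, y1 = end_pt[1], head_pt[1]
    ys = np.arange(y0 + 1, y1)
    xs = np.interp(ys, [y0, y1], [end_pt[0], head_pt[0]])
    return list(zip(np.rint(xs).astype(int).tolist(), ys.tolist()))

ok = can_splice((100, 50), (104, 54))   # dx = 4 < 15, dy = 4 < 100
bridge = splice((100, 50), (104, 54))   # interpolated points at y = 51..53
```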
  • the server 110 may perform morphological processing (including erosion and dilation) on the lane line layer before performing the operations in FIG. 11. As shown in FIG. 13, the lane line center lines are extracted after the lane line layer has been eroded and dilated.
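A minimal pure-Python sketch of such morphological processing (erosion followed by dilation, i.e. an opening that removes small specks from a binary layer), using a 3x3 neighbourhood that is truncated at the borders; the helper names are illustrative and not from the disclosure:

```python
def _apply_3x3(img, op):
    """Apply op (min = erosion, max = dilation) over each pixel's
    3x3 neighbourhood in a binary image given as a 2D list of 0/1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbourhood = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = op(neighbourhood)
    return out

def erode(img):
    return _apply_3x3(img, min)

def dilate(img):
    return _apply_3x3(img, max)

# An isolated one-pixel speck is removed by erosion and not restored by dilation.
speck = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
opened = dilate(erode(speck))
```
In a production pipeline one would typically use library routines (for example OpenCV's erode/dilate) rather than this nested-loop sketch, but the effect on the binary lane line layer is the same.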
  • a detected lane line may be divided into several segments due to occlusion by pedestrians or other objects.
  • the server 110 may perform splicing, filtering, and smoothing on the lane line.
  • FIG. 14 shows a flowchart of an exemplary method for processing a multi-segment lane line.
  • Step 1401: Determine whether the distance in the horizontal direction between an end point of at least one lane line and an end point of another lane line in the lane line layer is less than a first threshold, and whether the distance in the vertical direction is less than a second threshold.
  • the server 110 may determine the position coordinates of the left edge point and the right edge point at each end of each lane line in the lane line layer by the method of step 1101. Based on these position coordinates, the server 110 may determine the end points of each lane line in the lane line layer. It should be noted that an end is the region where a lane line starts or stops in the vertical direction of the lane line layer, such as its top row or bottom row of pixels; because the lane line has width, an end may include multiple pixels in a row. An end point is a point within the end region that is used to represent the end of the lane line.
  • the server 110 may determine the distance in the horizontal direction (also referred to as the horizontal distance) and the distance in the vertical direction (also referred to as the vertical distance) between an end point of each lane line and an end point of another lane line in the lane line layer.
  • the server 110 may calculate the horizontal distance and the vertical distance between an end point at the tail end of a certain lane line (which may include multiple pixels) and an end point at the head end of another lane line (which may include multiple pixels). For example, the server 110 may calculate the horizontal distance and the vertical distance between the end of the center line of a certain lane line and the beginning of the center line of another lane line.
  • the server 110 may calculate the horizontal distance and the vertical distance between the left edge point at the tail end of a certain lane line and the left edge point at the head end of another lane line. Further, the server 110 determines whether the distance in the horizontal direction between an end point of at least one lane line and an end point of another lane line in the lane line layer is less than the first threshold, and whether the distance in the vertical direction is less than the second threshold.
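The endpoint test of step 1401 can be sketched as follows, using the illustrative thresholds of 15 pixels (horizontal) and 100 pixels (vertical) mentioned in the disclosure; end points are assumed to be (x, y) pixel pairs, and the function name is hypothetical:

```python
def should_splice(tail_pt, head_pt, h_thresh=15, v_thresh=100):
    """Return True when the tail end point of one lane line and the head
    end point of another are close enough (per the two thresholds)
    to be treated as segments of the same lane line."""
    dx = abs(tail_pt[0] - head_pt[0])  # horizontal distance in pixels
    dy = abs(tail_pt[1] - head_pt[1])  # vertical distance in pixels
    return dx < h_thresh and dy < v_thresh

# 5 px apart horizontally and 60 px vertically: splice.
# 30 px apart horizontally: the gap is too wide, do not splice.
```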
  • Step 1402: In response to the distance in the horizontal direction between an end point of the at least one lane line and an end point of the other lane line being smaller than the first threshold and the distance in the vertical direction being smaller than the second threshold, splice the end point of the at least one lane line with the end point of the other lane line.
  • For example, the first threshold may be set to 15 pixels, and the second threshold may be set to 100 pixels.
  • the server 110 may set to 1 the values of the pixels in the region formed by the left and right edge points of the tail end of the at least one lane line and the left and right edge points of the head end of the other lane line, so as to form a spliced segment of the two lane lines.
  • the server 110 may connect the end of the center line of the at least one lane line and the head of the center line of the other lane line by a linear interpolation method. Further, the server 110 may widen the spliced center line by a number of pixels to the left and right to realize the splicing of the two lane lines.
  • the server 110 may connect the left edge point of the tail end of the at least one lane line and the left edge point of the head end of the other lane line by a linear interpolation method. Further, the server 110 may realize the splicing of the two lane lines by widening the spliced left edge line by a number of pixels to the right.
  • the server 110 may connect the right edge point at the tail end of the at least one lane line and the right edge point at the head end of the other lane line by a linear interpolation method. Further, the server 110 may realize the splicing of the two lane lines by widening the spliced right edge line by a number of pixels to the left.
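The linear-interpolation splice described above can be sketched as follows, assuming the tail point of one center line lies above the head point of the next (smaller y means an earlier row); the function name is illustrative, not from the disclosure:

```python
def interpolate_gap(tail, head):
    """Fill the vertical gap between the tail (x1, y1) of one centerline
    and the head (x2, y2) of the next with linearly interpolated
    integer pixel coordinates, one point per intervening row."""
    (x1, y1), (x2, y2) = tail, head
    points = []
    for y in range(y1 + 1, y2):       # empty when there is no row gap
        t = (y - y1) / (y2 - y1)      # fraction of the way down the gap
        points.append((round(x1 + t * (x2 - x1)), y))
    return points

# A 4-row gap between (10, 0) and (14, 4) is bridged by three points.
bridge = interpolate_gap((10, 0), (14, 4))
# bridge == [(11, 1), (12, 2), (13, 3)]
```
The same routine applies unchanged whether the interpolated line is a center line, a left edge line, or a right edge line; the subsequent widening step differs as described above.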
  • Step 1403: Determine whether the length of a certain lane line in the lane line layer is less than a third threshold.
  • the server 110 may determine the positions of the head and end of a lane line in the lane line layer.
  • the server 110 may determine the length of the lane line based on the position of the head and end of the lane line in the lane line layer. Further, the server 110 may determine whether the length of a certain lane line is less than a third threshold.
  • the third threshold may be a value set in advance by the server 110, and for example, may be a distance of 100 pixels.
  • Step 1404: In response to the length of a certain lane line being less than the third threshold, remove the lane line from the image.
  • the server 110 may change the pixel value of the pixel point of the lane line in the lane line layer from 1 to 0 (that is, from white to black).
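The short-line filter of steps 1403 and 1404 can be sketched as follows, representing each detected lane line by the row (vertical) coordinates of its head and tail and using the illustrative third threshold of 100 pixels; the representation is an assumption for the sketch, not the disclosure's data structure:

```python
def filter_short_lines(lines, min_len_px=100):
    """Keep only lane lines whose head-to-tail vertical extent reaches
    min_len_px; shorter detections (likely noise or ground marks)
    are dropped, mirroring steps 1403-1404.

    lines: list of (head_y, tail_y) row-coordinate pairs, head_y <= tail_y.
    """
    return [(head_y, tail_y)
            for (head_y, tail_y) in lines
            if tail_y - head_y >= min_len_px]

# A 250-pixel lane line survives; a 40-pixel fragment is removed.
kept = filter_short_lines([(0, 250), (300, 340)])
# kept == [(0, 250)]
```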
  • Step 1405: Perform window smoothing on the processed lane line layer.
  • the short-term bumps of the vehicle may cause a very small number of lane lines to have “spikes” after being extracted.
  • the server 110 may smooth the lane line layer. Smoothing algorithms include, but are not limited to, mean filtering, median filtering, Gaussian filtering, bilateral filtering, and so on.
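Window smoothing of the kind described here can be sketched with a simple median filter over the per-row x-coordinates of a center line, which suppresses the isolated "spikes" caused by short-term vehicle bumps; the window size of 5 is illustrative:

```python
def median_smooth(xs, window=5):
    """Median-filter a sequence of per-row centerline x-coordinates.

    The window is truncated at the sequence ends; the middle element of
    the sorted window is taken as the smoothed value.
    """
    half = window // 2
    out = []
    for i in range(len(xs)):
        neighbourhood = sorted(xs[max(0, i - half): i + half + 1])
        out.append(neighbourhood[len(neighbourhood) // 2])
    return out

# A single 40-pixel spike in an otherwise straight centerline is removed.
smoothed = median_smooth([10, 10, 10, 40, 10, 10, 10])
# smoothed == [10, 10, 10, 10, 10, 10, 10]
```
Mean, Gaussian, or bilateral filtering mentioned above would follow the same sliding-window shape with a different reduction of the neighbourhood.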
  • the processing sequence of the method 1400 for splicing, filtering, and smoothing lane lines is only exemplary, and the application does not limit the processing sequence.
  • the server 110 may perform filtering processing on the lane lines before performing stitching and smoothing processing.
  • certain steps in method 1400 may be used independently.
  • steps 1401 and 1402 can be used independently to perform lane line splicing.
  • steps 1403 to 1404 can be used in combination to remove other traces or objects on the ground that interfere with traffic marking recognition.
  • step 1405 can be used independently to smooth the traffic markings. Such variations are all within the protection scope of this application.
  • method 1500 is another exemplary method for processing a layer to obtain the position coordinates of at least one traffic marking in an image according to some embodiments of the present application.
  • the method 1500 can be used to process a layer that is a landmark layer.
  • the server 110 may perform the following operations on the landmark layer to obtain the position coordinates of the landmark in the image.
  • Step 1501: Based on the at least one layer and the pixel values of the pixels in the layer, determine the maximum and minimum coordinate values of the traffic marking in the layer in the horizontal direction, and the maximum and minimum coordinate values in the vertical direction.
  • the server 110 may determine that, for the triangular landmark in the layer, the maximum coordinate value in the horizontal direction is x2, the minimum coordinate value is x1, the maximum coordinate value in the vertical direction is y2, and the minimum coordinate value is y1.
  • Step 1502: Determine the position coordinates of the landmark in the image based on the maximum and minimum coordinate values of the traffic marking in the horizontal direction and the maximum and minimum coordinate values in the vertical direction.
  • the server 110 may determine a rectangular area of the triangular landmark.
  • the server 110 may determine the rectangular area ABCD of the triangular landmark based on the coordinates A (x1, y1), B (x2, y1), C (x1, y2), and D (x2, y2).
  • the server 110 may frame the region containing a landmark with a rectangular box, and then label it as a guide arrow (for example, going straight or turning left), a zebra crossing, or the like.
  • the server 110 can also identify the rectangular area where the landmark is located by using the method 1500.
  • It should be noted that the method 1500 is also applicable to the processing of lane lines.
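Steps 1501 and 1502 amount to taking the bounding box of the non-zero pixels in a binary layer. A minimal sketch (helper name illustrative), returning (x1, y1, x2, y2) in the layer's pixel coordinates:

```python
def bounding_box(layer):
    """Return (x1, y1, x2, y2): the minimum and maximum column (x) and
    row (y) indices of the 1-valued pixels in a binary layer given
    as a 2D list of 0/1."""
    xs = [x for row in layer for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(layer) if any(row)]
    return min(xs), min(ys), max(xs), max(ys)

# A small landmark occupying rows 1-2 and columns 1-2 of a 3x3 layer.
layer = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
box = bounding_box(layer)
# box == (1, 1, 2, 2)
```
The four rectangle corners A, B, C, D described above follow directly from this tuple as (x1, y1), (x2, y1), (x1, y2), and (x2, y2).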
  • FIG. 17 is a block diagram of an image data processing apparatus according to some embodiments of the present application.
  • the image data processing apparatus 1700 may be implemented by the server 110.
  • the image data processing apparatus 1700 may also be referred to as an image data processing system.
  • the image data processing device 1700 includes an image acquisition module 1710, a layer acquisition module 1720, and a layer processing module 1730.
  • the image acquisition module 1710 is configured to acquire an image including a road.
  • a layer acquisition module 1720 is configured to obtain at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image and reflects a region of the at least one traffic marking in the image.
  • a layer processing module 1730 is configured to process the at least one layer to obtain position coordinates of the at least one traffic line in the image.
  • the trained traffic marking recognition model is a trained MASK-RCNN model.
  • the image size of the layer is the same as the image size of the image, and the layer is a binary image.
  • the layer also reflects a category of the at least one traffic line.
  • the layer processing module 1730 is further configured to perform one or a combination of an erosion process, a dilation process, or a smoothing process on the at least one layer.
  • the image includes a top-down image of the road.
  • the layer processing module 1730 is further configured to: determine, based on the lane line layer and the pixel values of the pixels in the lane line layer, the position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer; determine, based on the position coordinates of the left edge point and the right edge point of the at least one lane line in the lane line layer, the position coordinates of the center line of the at least one lane line in the lane line layer; and determine, based on a preset width of the lane line and the position coordinates of the center line of the at least one lane line, the position coordinates of the at least one lane line in the image.
  • in response to the layer of the image being a lane line layer, the layer processing module 1730 is further configured to: determine whether the distance in the horizontal direction between an end point of at least one lane line and an end point of another lane line in the lane line layer is less than a first threshold, and whether the distance in the vertical direction is less than a second threshold; and, in response to the distance in the horizontal direction between the end point of the at least one lane line and the end point of the other lane line being smaller than the first threshold and the distance in the vertical direction being smaller than the second threshold, splice the end point of the at least one lane line with the end point of the other lane line.
  • in response to the layer of the image being a lane line layer, the layer processing module 1730 is further configured to: determine whether the length of a certain lane line in the lane line layer is less than a third threshold; and, in response to the length of the certain lane line being less than the third threshold, remove the lane line from the image.
  • the layer processing module 1730 is further configured to: determine, based on the at least one layer and the pixel values of the pixels in the layer, the maximum and minimum coordinate values of the traffic marking in the layer in the horizontal direction and the maximum and minimum coordinate values in the vertical direction; and determine, based on these maximum and minimum coordinate values, the position coordinates of the traffic marking in the image.
  • FIG. 18 is a block diagram of the image acquisition module 1710 according to some embodiments of the present application.
  • the image acquisition module 1710 includes a video acquisition unit 1810, an image acquisition unit 1820, an image data extraction unit 1830, and an image data splicing unit 1840.
  • the video acquiring unit 1810 is configured to acquire a road video captured by a vehicle-mounted device.
  • the image acquisition unit 1820 is configured to acquire multiple images based on the road video.
  • the image data extraction unit 1830 is configured to acquire image data of a same position in each image.
  • the image data splicing unit 1840 is configured to splice image data of the same position among the multiple images to obtain a top-view image of the road.
  • the image data at the same position in each image includes at least one line that is the same in each image.
  • FIG. 19 is a block diagram of a layer acquisition module 1720 according to some embodiments of the present application.
  • the layer acquisition module 1720 includes a sub-image acquisition unit 1910, a sub-layer acquisition unit 1920, and a joint determination unit 1930.
  • the sub-image obtaining unit 1910 is configured to obtain multiple sub-images based on the image; the multiple sub-images each include a part of the image, and at least two sub-images collectively include a certain part of the image;
  • the sub-layer obtaining unit 1920 is configured to obtain, for each sub-image, at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model; the layer includes at least a portion of at least one traffic marking in the sub-image, and at least reflects a region of the at least a portion of the at least one traffic marking in the sub-image;
  • the joint determining unit 1930 is configured to jointly determine at least one layer of the image based on at least one layer of the at least two sub-images.
  • the present application discloses an image data processing system.
  • the system includes at least one processor and at least one memory; the at least one memory is configured to store computer instructions; the at least one processor is configured to execute at least a part of the computer instructions to implement any image data processing method.
  • the present disclosure discloses a computer-readable storage medium.
  • the storage medium stores computer instructions, and when the computer instructions are executed by a processor, any image data processing method is implemented.
  • each module may be software modules implemented by computer instructions.
  • the above description of the image data processing system and its modules is for convenience of description only, and does not limit the application to the scope of the illustrated embodiments. It can be understood that, after understanding the principle of the system, those skilled in the art may arbitrarily combine the various modules, or form a subsystem to connect with other modules, without departing from this principle. For example, the modules may share one storage module, or each module may have its own storage module. Such variations are all within the protection scope of this application.
  • the beneficial effects that the embodiments of the present application may bring include, but are not limited to: (1) different types of traffic markings can be identified through a traffic marking recognition model; (2) the splicing, filtering, and smoothing of the extracted lane lines solves, to a certain extent, the problem of traffic markings being partially occluded and discontinuous; (3) extracting the same position from multiple images in the road video to stitch together a top view of the road solves, to a certain extent, the problem of image distortion; (4) the Mask-RCNN model can identify the pixel-level positions of different types of traffic markings in a layer, meeting the requirements of higher-precision traffic marking recognition. It should be noted that different embodiments may have different beneficial effects. In different embodiments, the possible beneficial effects may be any one or a combination of the foregoing, or any other beneficial effects that may be obtained.
  • aspects of this application can be illustrated and described through several patentable categories or situations, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of the present application can be executed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software.
  • the above hardware or software can be called “data block”, “module”, “engine”, “unit”, “component” or “system”.
  • aspects of the present application may manifest as a computer product located in one or more computer-readable media, the product including a computer-readable program code.
  • A computer storage medium may contain a propagated data signal containing computer program code, for example, in baseband or as part of a carrier wave.
  • the propagated signal may take multiple forms, including electromagnetic form, optical form, etc., or a suitable combination thereof.
  • a computer storage medium may be any computer-readable medium other than a computer-readable storage medium that can be connected to an instruction execution system, apparatus, or device to enable communication, propagation, or transmission of a program for use.
  • Program code on a computer storage medium may be transmitted through any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the foregoing.
  • the computer program code required for the operation of each part of this application can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages.
  • the program code may run entirely on the user's computer, or as a stand-alone software package on the user's computer, or partly on the user's computer, partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any network form, such as a local area network (LAN) or a wide area network (WAN), connected to an external computer (for example, via the Internet), used in a cloud computing environment, or used as a service, such as software as a service (SaaS).
  • numbers describing quantities of components and attributes are used. It should be understood that such numbers used in the description of the embodiments are, in some examples, modified by the terms "about", "approximately", or "substantially". Unless stated otherwise, "about", "approximately", or "substantially" indicates that the number allows for a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximate values, and the approximate values may change according to the characteristics required by individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and adopt a general digit-retention method. Although the numerical ranges and parameters used to confirm the breadth of their ranges in some embodiments of this application are approximate values, in specific embodiments such values are set as accurately as possible within the feasible range.
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Traffic Control Systems (AREA)

Abstract

Disclosed are an image data processing method and system. The method comprises: obtaining an image containing a road; obtaining at least one image layer of the image on the basis of the image and a trained traffic marking identification model, the image layer comprising at least one traffic marking in the image and reflecting a region of the at least one traffic marking in the image; and processing the at least one image layer to obtain position coordinates of the at least one traffic marking in the image. By means of the traffic marking identification model, image layers for different categories of traffic markings can be identified, and the image layers, after further processing, can be used for constructing a high-precision map.

Description

Image data processing method and system

Priority information

This application claims priority to Chinese Application No. 201811031784.9, filed on September 5, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of intelligent assisted driving or unmanned driving, and in particular, to an image data processing method and system.

Background

Cars have become an increasingly important means of transportation in people's lives, and many achievements have been made in automobile intelligence to improve safety and driver comfort during driving. Lane line detection by vehicle-mounted devices is mainly applied in unmanned driving and lane keeping systems: lane lines in front of the vehicle are detected from images collected by a camera, and when the vehicle deviates from its lane, the system can automatically adjust the steering to return the vehicle to the center of the lane. With the development of automobile intelligence, lane line detection by vehicle-mounted devices is gradually being applied to the construction of high-precision maps. It is therefore necessary to propose an image data processing method and system that improve the accuracy of road traffic marking recognition.
Summary of the invention

Embodiments of the present application provide an image data processing method, system, apparatus, and computer-readable storage medium, specifically including the following.

In a first aspect, the present application discloses an image data processing method. The method includes: obtaining an image including a road; obtaining at least one layer of the image based on the image and a trained traffic marking recognition model, the layer including at least one traffic marking in the image and reflecting a region of the at least one traffic marking in the overhead image; and processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image.

In some embodiments, obtaining the at least one layer of the image based on the image and the trained traffic marking recognition model further includes: obtaining multiple sub-images based on the image, where the multiple sub-images each include a part of the image and at least two sub-images jointly include a certain part of the image; for each sub-image, obtaining at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model, where the at least one layer of the sub-image includes at least a part of at least one traffic marking in the sub-image and at least reflects a region of the at least a part of the at least one traffic marking in the sub-image; and jointly determining the at least one layer of the image based on the at least one layer of at least two sub-images.

In some embodiments, the trained traffic marking recognition model is a trained MASK-RCNN model.

In some embodiments, the image size of the layer is the same as the image size of the overhead image, and the layer is a binary image.

In some embodiments, processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: performing one or a combination of erosion processing, dilation processing, or smoothing processing on the at least one layer.

In some embodiments, the image includes a top-down image of a road.
在一些实施例中,所述获取包含道路的俯视图像还包括:获取车载设备拍摄的道路视频;基于所述道路视频,获取多张图像;获取每张图像中同一位置的图像数据;拼接所述多张图像中同一位置的图像数据,获得所述道路的俯视图像。In some embodiments, the obtaining a top-view image including a road further includes: obtaining a road video taken by a vehicle-mounted device; obtaining multiple images based on the road video; obtaining image data of a same position in each image; and stitching the Image data of the same position in a plurality of images is used to obtain a top-view image of the road.
在一些实施例中,所述每张图像中同一位置的图像数据包括每张图像中相同的至少一行的图像数据。In some embodiments, the image data of the same position in each image includes image data of at least one line that is the same in each image.
在一些实施例中,所述图层还反映所述至少一个交通标线的类别。In some embodiments, the layer also reflects a category of the at least one traffic line.
在一些实施例中,响应于图像的图层为车道线图层,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:基于所述车道线图层和所述车道线图层中像素点的像素值,确定至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标;基于所述至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标,确定所述至少一条车道线的中心线在所述车道线图层中的位置坐标;基于车道线的预设宽度以及所述至少一条车道线的中心线的位置坐标,确定所述至少一条车道线在所述图像中的位置坐标。In some embodiments, the layer responsive to the image is a lane line layer, and processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: based on The pixel values of the lane line layer and the pixel points in the lane line layer determine the position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer; based on the at least one Position coordinates of a left edge point and a right edge point of a lane line in the lane line layer, determining position coordinates of a center line of the at least one lane line in the lane line layer; based on a preset lane line A width and a position coordinate of a center line of the at least one lane line determine a position coordinate of the at least one lane line in the image.
在一些实施例中,响应于图像的图层为车道线图层,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:判断车道线图层中至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离是否小于第一阈值,并且在竖直方向上的距离是否小于第二阈值;响应于所述至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离小于第一阈值,并且在竖直方向上的距离小于第二阈值时,则将所述至少一条道路的所述某一端点和所述另一条车道线的所述某一端点进行拼接。In some embodiments, the layer in response to the image is a lane line layer, and processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: judging Whether the distance in the horizontal direction between the end of at least one lane line and the end of another lane line in the lane line layer is smaller than the first threshold, and whether the distance in the vertical direction is smaller than the second threshold; When the distance between the end point of at least one lane line and the end point of another lane line in the horizontal direction is less than the first threshold value, and the distance in the vertical direction is less than the second threshold value, the The certain end point is spliced with the certain end point of the other lane line.
在一些实施例中,响应于图像的图层为车道线图层,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:判断车道线图层中某条车道线的长度是否小于第三阈值;响应于所述某条车道线的长度小于第三阈值,将所述车道线从所述图像中去除。In some embodiments, the layer in response to the image is a lane line layer, and processing the at least one layer to obtain the position coordinates of the at least one traffic marking in the image further includes: judging Whether the length of a lane line in the lane line layer is less than a third threshold; and in response to the length of the certain lane line being less than a third threshold, removing the lane line from the image.
在一些实施例中,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:基于所述至少一个图层和该图层中像素点的像素值,确定该图层中交通标线在水平方向上的最大坐标值和最小坐标值,以及在竖直方向上的最大坐标值和最小坐标值;基于所述交通标线在水平方向上的最大坐标值和最小坐标值以及在竖直方向上的最大坐标值和最小坐标值,确定该交通标线在所述俯视图像中的位置坐标。In some embodiments, the processing the at least one layer to obtain position coordinates of the at least one traffic line in the image further includes: based on the at least one layer and pixels in the layer The pixel value of the point, determining the maximum coordinate value and the minimum coordinate value of the traffic marking line in the horizontal direction in the layer, and the maximum coordinate value and the minimum coordinate value in the vertical direction; based on the traffic marking line in the horizontal direction The maximum coordinate value and the minimum coordinate value on the vertical direction and the maximum coordinate value and the minimum coordinate value in the vertical direction determine the position coordinates of the traffic marking in the overhead image.
In a second aspect, the present application discloses an image data processing system. The system includes at least one memory storing computer instructions and at least one processor in communication with the memory. When the at least one processor executes the computer instructions, it causes the system to: acquire an image containing a road; obtain at least one layer of the image based on the image and a trained traffic marking recognition model, the layer including at least one traffic marking in the image and reflecting the region of the at least one traffic marking in the image; and process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
In a third aspect, the present application discloses an image data processing system. The system includes: an image acquisition module configured to acquire an image containing a road; a layer acquisition module configured to obtain at least one layer of the image based on the image and a trained traffic marking recognition model, the layer including at least one traffic marking in the image; and a layer processing module configured to process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
In some embodiments, the layer acquisition module further includes: a sub-image acquisition unit configured to acquire multiple sub-images based on the image, each of the sub-images including a part of the image, and at least two of the sub-images jointly covering a certain part of the overhead image; a sub-layer acquisition unit configured to obtain, for each sub-image, at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model, the at least one layer of the sub-image including at least a part of at least one traffic marking in the sub-image and at least reflecting the region of that part in the sub-image; and a joint determination unit configured to jointly determine the at least one layer of the image based on the layers of at least two of the sub-images.
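One way to read the sub-image scheme above: split the (often very tall) overhead image into overlapping strips, run the recognition model on each strip, and OR the per-strip masks back together into one full-size layer. The sketch below assumes a `model` callable that returns a binary mask for a strip; the strip geometry, overlap handling, and all names are illustrative assumptions, not the application's actual pipeline.

```python
# Hypothetical sketch of sub-image processing: each strip is a sub-image,
# `model` stands in for the trained traffic-marking recognition model, and the
# per-strip masks are jointly combined into one layer for the whole image.
import numpy as np

def layer_from_subimages(image, model, strip_height, overlap):
    h = image.shape[0]
    full_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    step = strip_height - overlap
    for top in range(0, h, step):
        bottom = min(top + strip_height, h)
        strip_mask = model(image[top:bottom])   # layer of one sub-image
        full_mask[top:bottom] |= strip_mask     # jointly determine the layer
        if bottom == h:
            break
    return full_mask
```

The overlap ensures that a marking cut by a strip boundary appears, at least in part, in two adjacent sub-images, which is what makes the joint determination possible.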
In some embodiments, the trained traffic marking recognition model is a trained MASK-RCNN model.
In some embodiments, the layer has the same image size as the image, and the layer is a binarized image.
In some embodiments, the layer processing module is further configured to perform one or more of erosion, dilation, or smoothing on the at least one layer.
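Erosion and dilation are standard binary morphology operations; in practice library routines such as OpenCV's `cv2.erode`/`cv2.dilate` would normally be used. The minimal NumPy versions below, with an implicit 3×3 square structuring element, are only meant to show the effect of an erode-then-dilate (opening) pass on a binarized layer: isolated noise pixels disappear while a solid marking survives.

```python
# Minimal 3x3 binary morphology on a 0/1 uint8 mask (illustrative only; a real
# pipeline would use optimized routines such as cv2.erode / cv2.dilate).
import numpy as np

def dilate(mask):
    # a pixel is set if ANY pixel in its 3x3 neighbourhood is set
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy : 1 + dy + mask.shape[0],
                          1 + dx : 1 + dx + mask.shape[1]]
    return out

def erode(mask):
    # a pixel survives only if its WHOLE 3x3 neighbourhood is set
    padded = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= padded[1 + dy : 1 + dy + mask.shape[0],
                          1 + dx : 1 + dx + mask.shape[1]]
    return out
```

This is why FIG. 9 (after erosion) and FIG. 10 (erosion followed by dilation) look the way they do: erosion thins markings and removes speckle noise, and the subsequent dilation restores the surviving markings to roughly their original extent.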
In some embodiments, the image includes an overhead (top-view) image of the road.
In some embodiments, the image acquisition module further includes: a video acquisition unit configured to acquire a road video captured by a vehicle-mounted device; an image acquisition unit configured to acquire multiple images based on the road video; an image data extraction unit configured to acquire the image data at the same position in each of the images; and an image data stitching unit configured to stitch the image data at the same position in the multiple images to obtain the overhead image of the road.
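The stitching idea above — take the image data at the same position (e.g. the same row, or band of rows) from each frame and concatenate across frames — can be sketched as a push-broom composition. The row index, band height, and stacking order below are illustrative assumptions; frame sampling from the video and any perspective correction are omitted.

```python
# Hypothetical sketch: extract the same band of rows from every frame and
# stack the bands in time order, yielding a top-view strip of the road.
import numpy as np

def stitch_overhead(frames, row, band_height=1):
    bands = [frame[row : row + band_height] for frame in frames]
    return np.concatenate(bands, axis=0)    # one band per frame, in order
```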
In some embodiments, the image data at the same position in each image includes the image data of at least one identical row in each image.
In some embodiments, the layer further reflects the category of the at least one traffic marking.
In some embodiments, when a layer of the overhead image is a lane line layer, the layer processing module is further configured to: determine, based on the lane line layer and the pixel values of the pixels in the lane line layer, the position coordinates of the left edge points and right edge points of at least one lane line in the lane line layer; determine the position coordinates of the center line of the at least one lane line in the lane line layer based on the position coordinates of its left edge points and right edge points; and determine the position coordinates of the at least one lane line in the image based on a preset lane line width and the position coordinates of the center line of the at least one lane line.
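The center-line step above can be sketched per image row: the leftmost and rightmost nonzero pixels of the lane line layer serve as the left and right edge points, their mean gives the center line, and a preset lane line width expands the center line back into a lane line region. The function names and the width value are illustrative assumptions; a real layer would first be split into individual lane lines.

```python
# Illustrative sketch of center-line extraction from a binarized lane line
# layer containing a single lane line.
import numpy as np

def lane_centerline(layer):
    centers = []
    for y in range(layer.shape[0]):
        xs = np.nonzero(layer[y])[0]                     # marking pixels in row y
        if xs.size:
            left, right = int(xs.min()), int(xs.max())   # left/right edge points
            centers.append((y, (left + right) / 2.0))    # center point of row y
    return centers

def expand_centerline(centers, preset_width):
    # rebuild the lane line as a band of the preset width around the center line
    half = preset_width / 2.0
    return [(y, cx - half, cx + half) for (y, cx) in centers]
```

Working from the center line plus a preset width, rather than the raw mask edges, smooths out the ragged boundaries left by segmentation and morphology.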
In some embodiments, when a layer of the overhead image is a lane line layer, the layer processing module is further configured to: determine whether the distance in the horizontal direction between an end point of one lane line in the lane line layer and an end point of another lane line is less than a first threshold, and whether the distance in the vertical direction between the two end points is less than a second threshold; and, in response to the horizontal distance being less than the first threshold and the vertical distance being less than the second threshold, splice the end point of the one lane line to the end point of the other lane line.
In some embodiments, when a layer of the overhead image is a lane line layer, the layer processing module is further configured to: determine whether the length of a lane line in the lane line layer is less than a third threshold; and, in response to the length of that lane line being less than the third threshold, remove the lane line from the image.
In some embodiments, the layer processing module is further configured to: determine, based on the at least one layer and the pixel values of the pixels in that layer, the maximum and minimum coordinate values of a traffic marking in the horizontal direction and its maximum and minimum coordinate values in the vertical direction; and determine the position coordinates of the traffic marking in the image based on those maximum and minimum coordinate values in the horizontal and vertical directions.
In a fourth aspect, the present application discloses a computer-readable storage medium. The storage medium stores computer instructions, and when a computer reads the computer instructions in the storage medium, the computer executes the image data processing method.
Some additional features of the present application are set forth in the following description. They will become apparent to those skilled in the art upon examination of the following description and the corresponding drawings, or upon learning about the production or operation of the embodiments. The features of the present application may be realized and attained by practicing or using the methods, means, and combinations set forth in the various aspects of the specific embodiments described below.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an on-demand service system according to some embodiments of the present application.
FIG. 2 is a block diagram of an exemplary computing device of a dedicated system for implementing the technical solutions of the present application.
FIG. 3 is a block diagram of an exemplary mobile device of a dedicated system for implementing the technical solutions of the present application.
FIG. 4 is a flowchart of an exemplary image data processing method according to some embodiments of the present application.
FIG. 5 is a flowchart of a method for acquiring an overhead image of a road according to some embodiments of the present application.
FIG. 6 is an exemplary overhead image of a road according to some embodiments of the present application.
FIG. 7 is a flowchart of a method for obtaining at least one layer of an image according to some embodiments of the present application.
FIG. 8 is an original image of a lane line layer before processing according to some embodiments of the present application.
FIG. 9 is a resulting image of a lane line layer after erosion according to some embodiments of the present application.
FIG. 10 is a resulting image of a lane line layer after erosion followed by dilation according to some embodiments of the present application.
FIG. 11 illustrates a method for determining the position coordinates of a lane line in an overhead image according to some embodiments of the present application.
FIG. 12 is a resulting image after the center line of a lane line is determined according to some embodiments of the present application.
FIG. 13 is a resulting image of extracting the center line of a lane line after the lane line layer has been eroded and dilated according to some embodiments of the present application.
FIG. 14 is a flowchart of an exemplary method for processing a multi-segment lane line according to some embodiments of the present application.
FIG. 15 is a flowchart of a method for determining the position coordinates of at least one traffic marking in an overhead image according to some embodiments of the present application.
FIG. 16 is a schematic diagram of processing a triangular landmark according to some embodiments of the present application.
FIG. 17 is a block diagram of an image data processing apparatus according to some embodiments of the present application.
FIG. 18 is a block diagram of an overhead image acquisition module 1710 according to some embodiments of the present application.
FIG. 19 is a block diagram of a layer acquisition module 1720 according to some embodiments of the present application.
DETAILED DESCRIPTION
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some examples or embodiments of the present application; those of ordinary skill in the art may, without creative effort, apply the present application to other similar scenarios according to these drawings. Unless apparent from the context or otherwise stated, the same reference numerals in the figures represent the same structure or operation.
As used in the present application and the claims, unless the context clearly indicates otherwise, the words "a", "an", and/or "the" do not specifically refer to the singular and may include the plural. In general, the terms "include" and "comprise" only indicate that the clearly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and a method or device may also include other steps or elements.
Although the present application makes various references to certain modules or units in the system according to the embodiments of the present application, any number of different modules or units may be used and run on a client and/or a server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to illustrate the operations performed by the system according to the embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in order; the various steps may instead be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.
The embodiments of the present application can be applied to different transportation systems, including but not limited to one or a combination of land, ocean, aviation, aerospace, and the like, for example, transportation systems with management and/or distribution, such as taxis, chauffeured cars, hitch rides, buses, designated driving, trains, bullet trains, high-speed rail, driverless vehicles, and express pickup/delivery. The present application can accurately detect road images and determine the position coordinates of the traffic markings in the images, so as to be applied to the construction of high-precision maps. In some embodiments, the present application can also provide services for unmanned driving, for example, realizing high-precision navigation based on the processing results of the present application so that driverless vehicles travel according to the traffic markings. Application scenarios of different embodiments of the present application include, but are not limited to, one or a combination of web pages, browser plug-ins, clients, custom systems, enterprise internal analysis systems, artificial intelligence robots, and the like. It should be understood that the application scenarios of the systems and methods of the present application are merely some examples or embodiments of the present application; those of ordinary skill in the art may, without creative effort, apply the present application to other similar scenarios.
In some embodiments, the traffic markings may include at least two categories of markings. The first category is lane lines, such as the lane lines 601 and 602 shown in FIG. 6; the second category is non-lane-line markings (also referred to as landmarks or landmark markings), such as the U-turn landmark 603 and the go-straight landmark 604 shown in FIG. 6.
The lane lines may include the center line of a two-way two-lane road, which separates opposing traffic flows and indicates that, on the premise of ensuring safety, vehicles are permitted to cross the line to overtake; it usually instructs motor vehicle drivers to keep to the right, and the line is a yellow dashed line. The lane lines may include lane dividing lines, which separate traffic flows traveling in the same direction and indicate that, on the premise of ensuring safety, vehicles are permitted to cross the line to overtake or change lanes; the line is white. The lane lines may include roadway edge lines, which indicate the edge of the motorway or mark the boundary between motorized and non-motorized lanes; the line is a solid white line. The lane lines may include expressway distance confirmation markings, which provide a reference for drivers to maintain a safe following distance; the lines are thick parallel solid white lines. The lane lines may include no-overtaking lines, which indicate that vehicles are strictly prohibited from crossing the line to overtake or from driving on the line; they are used on roads that have two or more motorized lanes in each direction but no central divider, for example, a double solid yellow center line, a center yellow solid-and-dashed line, three-lane markings, or no-lane-change lines. The lane lines may include channelizing lines, which indicate that vehicles must travel along the prescribed route and must not drive on or across the line; the line is white.
The landmark markings may include a left-turn waiting area line, which indicates that turning vehicles may enter the waiting area during the straight-ahead phase and wait to turn left; when the left-turn phase ends, vehicles are prohibited from remaining in the waiting area, and the line is a white dashed line. The landmark markings may include a left-turn guide line, which marks the boundary between left-turning motor vehicles and non-motor vehicles: motor vehicles travel on the left side of the line and non-motor vehicles on the right side, and the line is a white dashed line. The landmark markings may include pedestrian crossing lines, which indicate where pedestrians are permitted to cross the roadway; the lines are thick parallel solid white lines (orthogonal or oblique). The landmark markings may include expressway entrance and exit markings, which provide safe merging for vehicles entering or exiting a ramp and reduce collisions with the curb of the gore area, including the transverse markings at entrances and exits and the markings of the triangular (gore) area.
The landmark markings may include parking space markings, which indicate where vehicles are parked; the line is a solid white line. The landmark markings may include bus bay (harbor-style) stop markings, which indicate that public buses stop via a dedicated separated approach and stopping position. The landmark markings may include toll island markings, which indicate the position of a toll island and provide clear marking for vehicles entering a toll lane. The landmark markings may include guide arrows, which indicate the direction of travel and are mainly used in guide lanes at intersections, near exit ramps, and for guiding channelized traffic. The landmark markings may include pavement text markings, which use text on the road surface to direct or restrict vehicle travel, for example, maximum speed, large vehicles, small vehicles, or overtaking lane. The landmark markings may include prohibition markings, for example, a no-roadside-parking line, which indicates that parking at the roadside is prohibited. The landmark markings may include stop lines, for example, the stop line at a signalized intersection, which is a solid white line indicating where vehicles wait to proceed. The landmark markings may include stop-and-yield lines, which indicate that vehicles must stop at the intersection or slow down to let vehicles on the main road go first. The landmark markings may include slow-and-yield lines, which indicate that vehicles must decelerate or stop at the intersection to let vehicles on the main road go first. The landmark markings may include non-motor-vehicle exclusion zone markings, which inform cyclists of the area at an intersection they are prohibited from entering. The landmark markings may include grid (box junction) markings, which inform drivers that temporary stopping at the intersection is prohibited. The landmark markings may include a center circle, which distinguishes the paths of large and small turning vehicles; vehicles must not drive on the line.
The landmark markings may include dedicated lane markings, which indicate a lane reserved for a certain type of vehicle; other vehicles and pedestrians must not enter. The landmark markings may include no-U-turn markings. The landmark markings may include warning markings such as lane-width transition markings, which warn drivers that the road width or the number of lanes is decreasing and that they should drive carefully and not overtake. The landmark markings may include deceleration markings, which warn drivers to slow down ahead.
As used in the present application, the terms "passenger", "passenger-side user", "passenger-side device", "driver", "driver-side user", "driver-side device", "client", "client device", "client user", "user", and "user terminal" are interchangeable and refer to a party that needs or subscribes to a service, which may be an individual or a tool. In addition, a "user" described in the present application may be a party that needs or subscribes to a service, or a party that provides or assists in providing a service.
FIG. 1 is a schematic diagram of an on-demand service system according to some embodiments of the present application.
The image data processing system 100 may include a server 110, a network 120, a user terminal 130, a vehicle-mounted device 140, and a storage device 150.
The server 110 may be local or remote. The server 110 may process information and/or data. In some embodiments, the server 110 may be a system for analyzing and processing collected information to generate analysis results. The server 110 may be a terminal device, a server, or a server group. The server group may be centralized, such as a data center, or distributed, such as a distributed system. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 may be executed on the computing device 200 described in FIG. 2, which includes one or more components.
In some embodiments, the server 110 may include a processing engine 112. The processing engine 112 may process information and/or data to perform one or more functions described in the present application. For example, the processing engine 112 may acquire an image containing a road, and may acquire a trained traffic marking recognition model. The processing engine 112 may obtain at least one layer of the overhead image based on the image and the trained traffic marking recognition model; the layer includes at least one traffic marking in the overhead image and at least reflects the region of the at least one traffic marking in the overhead image. The processing engine 112 may process the at least one layer to obtain the position coordinates of the at least one traffic marking in the overhead image. In some embodiments, the processing engine 112 may include one or more processing engines (e.g., a single-core processing engine or a multi-core processor). Merely by way of example, the processing engine 112 may include one or more hardware processors, such as a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination thereof.
The user terminal 130 may be a passenger terminal or a driver terminal, and may also refer to an individual, a tool, or another entity that issues a service order. The user terminal 130 may capture road images with its camera and upload them to the server 110. In some embodiments, the user terminal 130 includes, but is not limited to, one or a combination of a mobile device 130-1, a built-in vehicle device 130-2, a laptop computer 130-3, a desktop computer 130-4, and the like. In some embodiments, the mobile device 130-1 may include a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the wearable device may include a bracelet, footwear, glasses, a helmet, a watch, clothing, a backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the mobile device may include a mobile phone, a personal digital assistant (PDA), a gaming device, a navigation device, a point-of-sale (POS) device, a laptop computer, a desktop computer, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality eye mask, an augmented reality helmet, augmented reality glasses, an augmented reality eye mask, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include Google Glass™, RiftCon™, Fragments™, Gear VR™, and the like. In some embodiments, the user terminal 130 may process information and/or data; it may be a system for analyzing and processing collected information to generate analysis results. For example, the user terminal 130 may perform preprocessing such as filtering, denoising, and distortion correction on the captured images. In some embodiments, the user terminal 130 may also receive the image processing results from the server 110 and output the processing results to the user.
The vehicle-mounted device 140 may be a mobile device 140-1, a navigation device 140-2, a laptop computer 140-3, or a driving recorder 140-4. For example, the vehicle-mounted device 140 may include, but is not limited to, a dashboard camera, a vehicle-mounted video camera, a vehicle-mounted still camera, a vehicle-mounted recorder, a vehicle-mounted recording device, a driving data recorder, a connected rear-view mirror, or another device with an image-capturing function. In some embodiments, the vehicle-mounted device 140 may capture road images to complete the road image acquisition process. The server 110 may acquire the above road images and process them.
The server 110 may directly access the data information stored in the storage device 150, or may directly access the information of the user terminal 130 through the network 120.
The storage device 150 may generally refer to a device having a storage function. The storage device 150 is mainly used to store data collected from the user terminal 130 and various data generated during the operation of the image data processing system 100. For example, the storage device 150 may store the images acquired from the user terminal 130 and/or the vehicle-mounted device 140. As another example, the storage device 150 may store instructions that the processing engine 112 executes or uses for image data processing. As another example, the storage device 150 may store training samples for training the traffic marking recognition model. Specifically, the training samples may be multiple original images and their annotation information. An original image may include an overhead image of a road or an image of a road from a non-overhead viewing angle; it may also include an overhead image obtained by processing an image of a road from a non-overhead viewing angle. The annotation information reflects the region, and/or the category, of the traffic markings in each original image.
The storage 150 may be local or remote. The connection or communication between the system database and other modules of the system may be wired or wireless. In some embodiments, the storage device 150 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof. Exemplary mass storage may include magnetic disks, optical discs, solid-state drives, and the like. Exemplary removable storage may include flash drives, floppy disks, optical discs, memory cards, compact discs, magnetic tapes, and the like. Exemplary volatile read-write memory may include random access memory (RAM). Exemplary RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and the like. Exemplary ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM (DVD-ROM), and the like. In some embodiments, the storage device 150 may be implemented on a cloud platform. For example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-layer cloud, or the like, or any combination thereof.
In some embodiments, the storage device 150 may be connected to the network 120 to communicate with one or more components of the image data processing system 100 (e.g., the server 110, the in-vehicle device 140). One or more components of the image data processing system 100 may access data or instructions stored in the storage device 150 via the network 120. In some embodiments, the storage device 150 may be directly connected to, or communicate with, one or more components of the image data processing system 100 (e.g., the server 110, the in-vehicle device 140). In some embodiments, the storage device 150 may be part of the server 110.
The network 120 may provide a channel for exchanging information and/or data. In some embodiments, one or more components of the image data processing system 100 (e.g., the server 110, the user terminal 130, the in-vehicle device 140, and the storage 150) may send information and/or data to other components of the image data processing system 100 through the network 120. For example, the processing engine 112 may obtain a top-view image of a road from the storage device 150 and/or the in-vehicle device 140 through the network 120. The network 120 may be a single network or a combination of multiple networks. The network 120 may include, but is not limited to, a cable network, a wired network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, etc., or any combination thereof. The network 120 may include one or more network access points, such as wired or wireless access points, base stations (e.g., 120-1, 120-2), or network switching points, through which data sources connect to the network 120 and send information over the network.
It should be noted that the image data processing system 100 is provided for illustrative purposes only and is not intended to limit the scope of the present application. Those of ordinary skill in the art may make various modifications and changes according to the description of the present application. For example, the image data processing system 100 may further include a database, an information source, and the like. As another example, the image data processing system 100 may be implemented on other devices to achieve similar or different functions. However, such modifications and changes do not depart from the scope of the present application.
FIG. 2 is a block diagram of an exemplary computing device 200 of a dedicated system for implementing the technical solutions of the present application.
As shown in FIG. 2, the computing device 200 may include a processor 210, a memory 220, an input/output interface 230, and a communication port 240.
The processor 210 may execute computing instructions (program code) and perform the functions of the image data processing system 100 described in the present application. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (the functions refer to the specific functions described in the present application). For example, the processor 210 may process image or text data obtained from any other component of the image data processing system 100. In some embodiments, the processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, any circuit or processor capable of performing one or more functions, etc., or any combination thereof. For illustration only, the computing device 200 in FIG. 2 is described with a single processor, but it should be noted that the computing device 200 in the present application may also include multiple processors.
The memory 220 may store data/information obtained from any other component of the image data processing system 100. In some embodiments, the memory 220 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof. Exemplary mass storage may include magnetic disks, optical discs, solid-state drives, and the like. Removable storage may include flash drives, floppy disks, optical discs, memory cards, compact discs, magnetic tapes, and the like. Volatile read-write memory may include random access memory (RAM). RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and the like. ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM (DVD-ROM), and the like.
The input/output (I/O) interface 230 may be used to input or output signals, data, or information. In some embodiments, the I/O interface 230 may enable a user to interact with the image data processing system 100. In some embodiments, the I/O interface 230 may include an input device and an output device. Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, etc., or any combination thereof. Exemplary output devices may include a display device, a speaker, a printer, a projector, etc., or any combination thereof. Exemplary display devices may include a liquid crystal display (LCD), a light-emitting diode (LED)-based display, a flat panel display, a curved display, a television device, a cathode ray tube (CRT), etc., or any combination thereof. The communication port 240 may be connected to a network for data communication. The connection may be a wired connection, a wireless connection, or a combination of both. A wired connection may include a cable, an optical cable, a telephone line, etc., or any combination thereof. A wireless connection may include Bluetooth, Wi-Fi, WiMax, WLAN, ZigBee, a mobile network (e.g., 3G, 4G, or 5G), etc., or any combination thereof. In some embodiments, the communication port 240 may be a standardized port, such as RS232 or RS485. In some embodiments, the communication port 240 may be a specially designed port.
FIG. 3 is a block diagram of an exemplary mobile device 300 of a dedicated system for implementing the technical solutions of the present application. According to some embodiments of the present application, the user terminal 130 may be implemented on the mobile device 300.
As shown in FIG. 3, the mobile device 300 may include a communication platform 310, a display 320, a graphics processing unit (GPU) 330, a central processing unit (CPU) 340, an input/output interface 350, an internal memory 360, a storage 370, and the like. The CPU 340 may include interface circuits and processing circuits similar to those of the processor 210. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in the mobile device 300. In some embodiments, an operating system 361 (e.g., iOS, Android, Windows Phone) and application programs 362 may be loaded from the storage 370 into the internal memory 360 for execution by the CPU 340. The application programs 362 may include a browser or an application program for receiving imaging, graphics processing, audio, or other related information from the image data processing system 100. User interaction with the information flow may be achieved via the input/output interface 350 and provided to the processing engine 112 and/or other components of the image data processing system 100 via the network 120.
To implement the various modules, units, and functions described in the present application, a computing device or a mobile device may serve as a hardware platform for one or more of the components described herein. The hardware elements, operating systems, and programming languages of such computers or mobile devices are conventional in nature, and after becoming familiar with these technologies, those skilled in the art can adapt them to the on-demand service system described in the present application. A computer with user interface elements can be used to implement a personal computer (PC) or another type of workstation or terminal device, and, if properly programmed, a computer can also act as a server. It is believed that those skilled in the art are familiar with the structure, programs, and general operation of this type of computer equipment; therefore, no additional explanation is provided for the drawings.
FIG. 4 is a flowchart of an exemplary image data processing method according to some embodiments of the present application.
In some embodiments, the image data processing method 400 is executed by a device with processing and computing capabilities, such as the server 110 or the computing device 200.
Step 401: acquire an image containing a road.
In some embodiments, the image of the road may include a top-view image of the road. A top-view image of a road may refer to a road image taken from an angle overlooking the road from above. In some embodiments, the image of the road may also include road images taken from other angles. The road image may be obtained in a variety of ways. For example, the image of the road may be obtained by aerial photography, by a surveillance camera installed along the road, or by processing a road video collected by a user terminal or an in-vehicle device (e.g., a driving recorder).
In some embodiments, road images captured by a user terminal or an in-vehicle device include road images taken from non-top-view angles. For example, a user sitting in a vehicle may capture a road image with a user terminal; as another example, an in-vehicle device mounted at the vehicle's interior rear-view mirror may capture a road image through the front windshield. In some embodiments, a road image taken from a non-top-view angle may be processed and converted into a corresponding top-view image. For a detailed description of acquiring a top-view image of a road based on a road video collected by a user terminal or an in-vehicle device, see FIG. 5 and its related description.
Step 402: obtain a trained traffic marking recognition model.
The server 110 may obtain the trained traffic marking recognition model through the network 120. The trained traffic marking recognition model can identify the category of traffic markings in a road image and/or their positions in the image, for example, the position of lane lines in the road image and the positions of various landmarks in the road image.
In some embodiments, the trained traffic marking recognition model may process a road image and output the processed image with different types of traffic markings annotated in different forms. For example, each traffic marking may be enclosed in a dotted frame and/or the category of each traffic marking may be labeled with text. As another example, the trained traffic marking recognition model may output an image in which different pixel values represent different categories of traffic markings; for example, the digit "1" may represent a lane line and the digit "2" may represent a type of landmark (e.g., a crosswalk).
For illustration only, the preferred trained traffic marking recognition model in this embodiment is a trained MASK-RCNN model. The input for training the MASK-RCNN model consists of multiple original images and their annotation information. An original image may include a top-view image of a road or an image of a road taken from a non-top-view angle; an original image may also include a top-view image obtained by processing an image taken from a non-top-view angle. The annotation information reflects the pixel coordinates or pixel positions, and/or the categories, of the traffic markings in each original image. The annotation information may be layer information in which various categories of traffic markings are annotated. A layer may be an image of the same size as the original image. In some embodiments, one original image corresponds to multiple layers, each containing one category of traffic marking, or one traffic marking. In some embodiments, one original image may correspond to only one layer that contains all the traffic markings in the original image. To facilitate subsequent image processing, in still other embodiments, the pixel values of a layer may be limited to a small set of values. For example, a layer may be a binary image in which traffic-marking pixels have the value "1" and all other pixels have the value "0". As another example, by annotating different categories of traffic markings on the original image, different pixel values may represent different categories of traffic markings in the same image (e.g., the pixel value "1" represents a lane line, "2" represents a crosswalk, and "3" represents a straight-ahead arrow), with the remaining pixels taking the value "4". The server 110 may obtain the layer information corresponding to the different categories of traffic markings. Based on the original images and the layers of their various traffic markings, the MASK-RCNN model learns so that the layers of traffic markings it predicts become consistent with the annotated layers, yielding a trained MASK-RCNN model. In some embodiments, the trained traffic marking recognition model may also be a trained convolutional-neural-network-based traffic marking model, a deep-learning-based traffic marking recognition model, a MATLAB-based traffic marking recognition model, an intelligent-cognition-based traffic marking recognition model, a shape-feature-based traffic marking recognition model, or the like. When different types of original images (e.g., top-view images of roads, images taken from non-top-view angles, or top-view images obtained by processing) are input into the MASK-RCNN model, the trained MASK-RCNN model can process the corresponding road image to obtain one or more layers of that image, which may include the traffic markings recognized by the model.
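The two annotation encodings described above can be sketched as follows; this is a minimal illustration only, in which the tiny layer size, the layer contents, and the exact class ids (1 = lane line, 2 = crosswalk, 3 = straight-ahead arrow, 4 = background) are assumptions taken from the example values in the text, not a definitive implementation:

```python
import numpy as np

# Hypothetical 6x6 multi-class annotation layer: 1 = lane line,
# 2 = crosswalk, 3 = straight-ahead arrow, 4 = background.
label_layer = np.full((6, 6), 4, dtype=np.uint8)
label_layer[:, 1] = 1       # a vertical lane line in column 1
label_layer[0, 3:6] = 2     # part of a crosswalk in row 0
label_layer[4:6, 4] = 3     # part of a straight-ahead arrow

def split_into_binary_layers(layer, class_ids):
    """Convert one multi-class layer into per-class binary (0/1) layers,
    one layer per traffic-marking category."""
    return {cid: (layer == cid).astype(np.uint8) for cid in class_ids}

binary_layers = split_into_binary_layers(label_layer, class_ids=[1, 2, 3])
```

Either representation (one multi-class layer, or several binary layers) carries the same annotation information; the conversion above shows that they are interchangeable.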
Step 403: obtain at least one layer of the image based on the image and the trained traffic marking recognition model, where the layer includes at least one traffic marking in the image and at least reflects the region of the at least one traffic marking in the image.
By inputting the image into the trained traffic marking recognition model, the server 110 may obtain at least one layer of the image. Specifically, depending on the type of the input image (e.g., a top-view image of a road, an image of a road from a non-top-view angle, or a top-view image obtained by processing), at least one layer corresponding to that image type may be output. In some embodiments, an output layer has the same traffic markings at the same positions as in the image. A layer may include one or more types of traffic markings. In some embodiments, a layer may include only one type of traffic marking (e.g., only lane lines) or multiple types of traffic markings (e.g., lane lines and landmarks). As another example, a layer may include only one type of landmark (e.g., crosswalks) or multiple types of landmarks (e.g., crosswalks, U-turn markings, and slow-down markings). As yet another example, one layer may include only lane lines while each of the other layers includes only one type of landmark (e.g., one layer includes only U-turn markings, one includes only crosswalks, and one includes only right-turn markings).
A layer may be a binarized image (the pixel value of each pixel is 0 or 1), where a pixel value of 1 indicates that the pixel belongs to a traffic marking and a pixel value of 0 indicates that it does not. A layer may also reflect the category of at least one traffic marking. For example, a layer that includes only lane-line information is also called a lane-line layer; in it, a pixel value of 1 indicates that the pixel belongs to a lane line. As another example, a layer that includes only landmark information (e.g., crosswalks) is also called a landmark layer; in it, a pixel value of 1 indicates that the pixel belongs to a landmark. In other embodiments, a layer may have more kinds of pixel values: different traffic markings are represented by different pixel values, pixels of the same traffic marking share the same value, and background pixels take a value distinct from the traffic-marking pixels. Taking the MASK-RCNN model as an example, the MASK-RCNN model can output mask layer information for different categories of traffic markings and thereby obtain pixel-level position information of the traffic markings. The mask layer information of each type of traffic marking is two-dimensional 0/1 raster data (a raster cell may include one or more pixels), with the value 1 in regions containing the traffic marking and 0 elsewhere.
In other embodiments, the traffic markings on an output layer may be enclosed by wireframes, where the region inside a wireframe is the region of the traffic marking in the image. A wireframe may be one or a combination of rectangular, circular, elliptical, polygonal, or irregular shapes. Different wireframes may have different colors.
Step 404: process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
The layer reflects the position of at least one traffic marking in the image. In some embodiments, an output layer has the same traffic markings at the same positions as in the image. By further processing the layer, the server 110 can obtain more precise position coordinates of the at least one traffic marking in the image. The processing methods include, but are not limited to, image denoising, image transformation, edge detection, edge thinning, image analysis, image compression, image enhancement, image blurring, image interpolation, binarization, threshold transformation, Fourier transform, discrete cosine transform, and the like. For details about layer processing, see the detailed description below.
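As a minimal sketch of reading pixel-level position coordinates off a binary mask layer of the kind described above (the toy mask and its marking segment are illustrative assumptions; a real pipeline would first apply the denoising and edge-processing steps listed in the text):

```python
import numpy as np

# Toy 5x5 binary mask layer: 1 = traffic-marking pixel, 0 = background.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 1:4] = 1  # a short horizontal marking segment

# Pixel-level (row, col) position coordinates of the marking in the image.
rows, cols = np.nonzero(mask)
coords = list(zip(rows.tolist(), cols.tolist()))
# coords == [(2, 1), (2, 2), (2, 3)]
```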
For illustration only, FIG. 5 shows the steps by which, in some embodiments, the server 110 acquires a top-view image of a road.
Step 501: obtain a road video taken by an in-vehicle device.
An in-vehicle device can continuously shoot road video while the vehicle is driving. In-vehicle devices include, but are not limited to, driving recorders, in-vehicle video cameras, in-vehicle still cameras, in-vehicle recording devices, connected-car rear-view mirrors, and other devices with an image-capture function. The server 110 may obtain the road video shot by the in-vehicle device through the network 120.
Step 502: acquire multiple images based on the road video.
The server 110 may divide the road video into multiple frames to acquire multiple images. For example, the server 110 may extract several consecutive frames, obtain one image every other frame, or obtain one image every several frames.
Step 503: obtain image data at the same position in each image.
In some embodiments, the server 110 may extract the image data of the same row from each image. In still other embodiments, the server 110 may extract the image data of the same multiple rows from each image. For example, the server 110 may extract three rows of image data (i.e., the first, second, and third rows) from the top of each image. In some embodiments, the server 110 may extract one or more rows with less distortion from each image. For example, one or more rows at the very top or bottom of the image may be extracted, or one or more rows in the middle of the image where distortion is smaller. In some embodiments, the server 110 may extract image data from other fixed positions in each image; the fixed position is not limited to one or more fixed rows of image data, and may also be local image data, for example, a subset of the columns within one or more fixed rows.
Step 504: stitch the image data at the same position from the multiple images to obtain a top-view image of the road.
For example, after the server 110 extracts the image data of the same row from each image, the server 110 may stitch those rows in chronological order to form the final top-view image of the road. In still other embodiments, based on the image data of the same multiple rows extracted from each image, the server 110 may form multiple top-view images of the road by stitching. For example, after the server 110 extracts three rows of image data, rows 100, 320, and 480 from the top of each image, the server 110 may stitch the row-100 image data of every image to form a first top-view image, the row-320 image data to form a second top-view image, and the row-480 image data to form a third top-view image. After obtaining the three top-view images, the server 110 may compare the quality (e.g., sharpness) of the three images to finally determine a suitable top-view image.
Generally, an in-vehicle device is installed inside the vehicle and captures road images through the front windshield, so the road in the image may appear wider in the near field and narrower in the distance. After the process shown in FIG. 5, the width of the road across different rows of the resulting image is approximately equal, which can be understood as converting the road images acquired by the in-vehicle device into images with an effect similar to shooting from a top-view angle.
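The stitching of steps 503 and 504 can be sketched as follows; the frame contents, frame size, and chosen row index here are illustrative assumptions, standing in for frames decoded from a real road video:

```python
import numpy as np

def stitch_top_view(frames, row_index):
    """Stack the same row taken from consecutive frames, in time order,
    to approximate a top-view image of the road."""
    return np.stack([frame[row_index] for frame in frames], axis=0)

# Toy example: 4 consecutive 8x8 grayscale frames.
frames = [np.full((8, 8), t, dtype=np.uint8) for t in range(4)]
top_view = stitch_top_view(frames, row_index=5)
# top_view has one row per frame: shape (4, 8)
```

Because each stitched row was captured at the same (near-constant) distance ahead of the camera, the road width is roughly uniform across the stacked rows, which is what yields the top-view effect described above.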
For illustration only, FIG. 7 shows that, in some other embodiments, the steps by which the server 110 obtains at least one layer of the image based on the image and the trained traffic marking recognition model include:
Step 701: obtain multiple sub-images based on the image, where each of the multiple sub-images includes a portion of the image, and at least two sub-images jointly include a certain portion of the image. In some embodiments, the image may include a top-view image of a road, an image of a road from a non-top-view angle, or a top-view image obtained by processing an image from a non-top-view angle. For a detailed description of the image, see FIG. 4 and its related description, which is not repeated here.
Step 702: for each sub-image, obtain at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model, where the layer includes at least a part of at least one traffic marking in the sub-image and at least reflects the region of that part of the at least one traffic marking in the sub-image.
Step 703: jointly determine at least one layer of the image based on the at least one layer of each of at least two sub-images.
仅作为示例说明,服务器110可以通过小于图像的窗口截取多张图像的子图像。例如,图像的像素大小1024×1024,该窗口的大小为64×64。通过将窗口的第32行、第32列的像素点对准图像的每个像素点对图像进行截取,可以获得1024×1024张大小均为64×64的子图像。窗口内未覆盖的图像的像素点的值用像素值0代替。For illustration only, the server 110 may capture sub-images of multiple images through a window smaller than the image. For example, the pixel size of an image is 1024 × 1024, and the size of the window is 64 × 64. By aligning the pixels of the 32nd row and the 32nd column of the window with each pixel of the image, the image can be captured to obtain 1024 × 1024 sub-images each having a size of 64 × 64. The pixel value of the uncovered image in the window is replaced with a pixel value of 0.
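The windowed cropping above can be sketched as follows. This is a minimal NumPy illustration, not the claimed implementation; the function name and the choice of a grayscale (2-D) image are assumptions for the example, and the half-window offset `c` plays the role of the (32, 32) anchor pixel described in the text.

```python
import numpy as np

def crop_subimages(image: np.ndarray, win: int = 64):
    """Yield one win x win sub-image anchored on each pixel of `image`.

    Pixels of the window that fall outside the image are zero-filled,
    matching the zero-padding described in the text.
    """
    h, w = image.shape
    c = win // 2  # anchor offset, e.g. (32, 32) for a 64 x 64 window
    # Pad with zeros so a full window exists around every pixel.
    padded = np.zeros((h + win, w + win), dtype=image.dtype)
    padded[c:c + h, c:c + w] = image
    for i in range(h):
        for j in range(w):
            yield padded[i:i + win, j:j + win]
```

For a 1024 × 1024 image with a 64 × 64 window this yields 1024 × 1024 sub-images, one per pixel, as in the example.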
By inputting each sub-image into the trained traffic marking recognition model (e.g., a MASK-RCNN model), at least one layer of each sub-image can be identified, where each layer may include only one type of traffic marking. For a single layer, the pixel value of each pixel is 0 or 1, where 1 indicates that the pixel belongs to the traffic marking. For example, in a lane line layer, a pixel value of 1 indicates that the pixel belongs to a lane line. As another example, in a sidewalk layer, a pixel value of 1 indicates that the pixel belongs to a sidewalk. Further, for a pixel at the same position of the top-down image across the multiple layers of the same type of traffic marking, when the count of layers in which that pixel's value is 1 is greater than a certain threshold, the server 110 may determine that the final pixel value of that point in the image's layer for that traffic marking is 1. For example, suppose 100 sub-images are input into the traffic marking recognition model (e.g., a MASK-RCNN model) and 100 sub-image lane line layers are output, all of which include the same point A of the image.
If, among the 100 sub-image lane line layers, the number in which the pixel value at point A is 1 is greater than 50, the server 110 may determine that the pixel value of point A in the image's lane line layer is 1; otherwise, it determines that the final pixel value of point A in the final lane line layer is 0. Similarly, the server 110 may determine the final pixel value of each pixel in each layer of the image. Finally, the server 110 may determine at least one layer of the image. For example, the server 110 may finally determine a lane line layer and a layer corresponding to each type of landmark (e.g., a sidewalk layer, a U-turn layer, a left-turn layer, etc.). With this method, each sub-region of the image is recognized multiple times, and the multiple recognition results are combined (e.g., fused) to determine the pixel value at each position of the image's layer, which can effectively reduce recognition errors and improve recognition accuracy.
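The per-pixel voting rule can be sketched as a simple majority vote. This is an illustrative simplification: it assumes each sub-image layer has already been written back to its position in a full-size array (0 elsewhere), and the function name and threshold parameter are introduced for the example only.

```python
import numpy as np

def fuse_layers(sub_layers, vote_threshold):
    """Fuse overlapping binary sub-image layers by per-pixel voting.

    A pixel is set to 1 in the fused layer only if strictly more than
    `vote_threshold` sub-layers mark it as 1 (e.g. more than 50 of 100
    in the text's example).
    """
    votes = np.sum(np.stack(sub_layers), axis=0)
    return (votes > vote_threshold).astype(np.uint8)
```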
In some implementations, the processing that the server 110 performs on a layer may include, but is not limited to, erosion, dilation, or smoothing.
Taking the lane line layer as an example, FIG. 8 shows the original lane line layer before processing, where white represents lane lines and black represents non-lane-line regions.
In some embodiments, before extracting lane lines, the server 110 may perform morphological processing on the lane line layer, mainly erosion with a relatively small rectangular kernel (e.g., a (2, 2), (3, 3), or (5, 5) kernel). The erosion operation means that when the pixel values within the kernel are not all 1, the pixels whose value is 1 within the kernel are changed from 1 to 0 (i.e., from white to black); when the pixel values within the kernel are all 1, no change is made. This operation can filter out most of the invalid (or isolated) noise. FIG. 9 shows the lane line layer after erosion; as can be seen from FIG. 9, the eroded lane lines are narrower than the original ones.
Erosion filters out most of the invalid (or isolated) noise, but it also loses some valid information. A dilation operation with a relatively large rectangular kernel (e.g., a (20, 3), (20, 5), or (5, 3) kernel) can then be performed. The dilation operation means that when the pixel values within the kernel are not all 0, the pixels whose value is 0 within the kernel are changed from 0 to 1 (i.e., from black to white); when the pixel values within the kernel are all 0, no change is made. FIG. 10 shows the lane line layer after erosion followed by dilation. Dilation partly offsets the precision loss introduced by erosion and partly repairs lane line breaks caused by partial occlusion (e.g., the two lane lines in part 901 of FIG. 9 are disconnected, and after dilation the two lane lines in part 1001 of FIG. 10 are connected).
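The two morphological operations can be sketched with standard binary erosion and dilation. This is a plain-NumPy illustration under the usual anchor-centered formulation; in practice a library routine such as `cv2.erode`/`cv2.dilate` would be used, and the padding convention is an assumption of the sketch.

```python
import numpy as np

def _windows(layer: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """All kh x kw windows of `layer`, zero-padded at the borders."""
    ph, pw = kh // 2, kw // 2
    padded = np.pad(layer, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)),
                    constant_values=0)
    return np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))

def erode(layer: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """A pixel stays white (1) only if the whole kernel window around
    it is white; isolated noise pixels are removed."""
    return _windows(layer, kh, kw).all(axis=(2, 3)).astype(np.uint8)

def dilate(layer: np.ndarray, kh: int, kw: int) -> np.ndarray:
    """A pixel becomes white if any pixel in the kernel window around
    it is white; small breaks in lane lines are bridged."""
    return _windows(layer, kh, kw).any(axis=(2, 3)).astype(np.uint8)
```

A small erosion kernel removes isolated noise at the cost of thinning the lines; a larger dilation kernel then restores the width and reconnects segments broken by occlusion, as described above.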
In some implementations, FIG. 11 shows a method in which, in response to the layer of the image being a lane line layer, the server 110 may perform the following operations on the lane line layer to obtain position coordinates in the image. In some embodiments, before performing these operations, the server 110 may apply one or more of erosion, dilation, or smoothing to the lane line layer.
Step 1101: determine the position coordinates of the left edge points and right edge points of at least one lane line in the lane line layer, based on the lane line layer and the pixel values of the pixels in it.
The server 110 may extract the left and right lane line edges from the layer row by row, from top to bottom. For example, the server 110 may read one row of data at a time and determine the first left edge point and first right edge point of the first lane line in that row, where the first left edge point is the first pixel whose value changes from 0 to 1 (black to white), and the first right edge point is the first pixel whose value changes from 1 to 0 (white to black). The server 110 continues scanning the row to obtain a second left edge point and a second right edge point; if the distance between the second left edge point and the first left edge point is greater than a certain threshold, two lane lines are present. By analogy, the left and right edge points of a third or fourth lane line can be found. The server 110 then scans the other rows and compares the position of a pixel (e.g., the left edge point) of a lane line in a row with the position of the corresponding pixel (e.g., the left edge point) of a lane line in the previous row; if the distance between the two points is smaller than a certain threshold, they are treated as belonging to the same lane line, so that the information of the second row is combined with that of the first row. By analogy, the information of multiple rows is combined, and multiple lane lines are finally obtained by connection.
Step 1102: determine the position coordinates of the center line of the at least one lane line in the lane line layer, based on the position coordinates of its left edge points and right edge points in the lane line layer.
In some embodiments, the server 110 may perform a certain operation on the coordinates of the left edge point and the right edge point of each row of a lane line in the lane line layer and use the result as the coordinates of a point on the lane line's center line. For example, the coordinates of the left edge point and the right edge point in each row may be averaged, and the result taken as the coordinate of the center line in that row. The server 110 forms the center line of the lane line by connecting the midpoints of its left and right edge points. FIG. 12 shows the result after the center lines of the lane lines are determined from the lane line layer.
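The per-row edge extraction of step 1101 and the midpoint averaging of step 1102 can be sketched as follows; the function names are introduced for the example, and the cross-row grouping of edge points into whole lane lines is omitted for brevity.

```python
import numpy as np

def row_edges(row: np.ndarray):
    """Return (left, right) column indices of each lane line segment in
    one binary row: a left edge is a 0 -> 1 transition, a right edge a
    1 -> 0 transition, as in step 1101."""
    padded = np.concatenate(([0], row, [0]))
    diff = np.diff(padded)
    lefts = np.where(diff == 1)[0]
    rights = np.where(diff == -1)[0] - 1
    return list(zip(lefts, rights))

def row_centers(row: np.ndarray):
    """Midpoint of each (left, right) edge pair -- one center-line
    point per lane line in the row, as in step 1102."""
    return [(l + r) / 2 for l, r in row_edges(row)]
```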
Step 1103: determine the position coordinates of the at least one lane line in the image, based on a preset lane line width and the position coordinates of the center line of the at least one lane line.
Based on a preset lane line width (e.g., 10 mm, 20 mm, 30 mm, 50 mm, 100 mm, 200 mm, etc.), the server 110 may determine the pixel width of the lane lines in the lane line layer. For example, the server 110 may determine that the pixel width of the lane lines in the lane line layer is 30 pixels. In some embodiments, the server 110 may widen a certain pixel region to the left and right of the lane line's center line to determine the position coordinates of the lane line in the image. For example, the server 110 may widen the region by 15 pixels on each side of the center line, thereby determining the position coordinates of the lane line in the image.
In some embodiments, the server 110 may determine whether the horizontal distance between the end of one lane line's center line and the head of another lane line's center line is smaller than a first threshold and whether the vertical distance is smaller than a second threshold. For example, the first threshold may be set to 15 pixels and the second threshold to 100 pixels. When the horizontal distance between the end of one lane line's center line and the head of another lane line's center line is smaller than the first threshold and the vertical distance is smaller than the second threshold, the server 110 may connect the end of the one center line and the head of the other center line by linear interpolation. Further, the server 110 may determine the position coordinates of the at least one lane line in the top-down image by widening the spliced center line by a number of pixels on each side.
In some embodiments, the server 110 may first perform morphological processing (including erosion and dilation) on the lane line layer and then perform the operations of FIG. 11. FIG. 13 shows the result of extracting the lane line center lines after the lane line layer has been eroded and dilated.
In some embodiments, a detected lane line may be divided into several segments due to pedestrian occlusion or other occlusion. For such broken lane lines, the server 110 may perform splicing, filtering, and smoothing on the lane lines. FIG. 14 shows the flow of an exemplary method for processing multi-segment lane lines.
Step 1401: determine whether the horizontal distance between an end point of at least one lane line and an end point of another lane line in the lane line layer is smaller than a first threshold, and whether the vertical distance is smaller than a second threshold.
The server 110 may determine the position coordinates of the left edge point and the right edge point at a given end of each lane line in the lane line layer using the method of step 1101. Based on these coordinates, the server 110 may determine the end points of each lane line in the lane line layer. It should be noted that an end is the region where a lane line starts or terminates in the vertical direction of the lane line layer, such as the topmost or bottommost row of pixels of the lane line; since a lane line has a certain width, an end may include multiple pixels of a row. An end point is a point within the end region that represents the end of the lane line. The server 110 may determine the horizontal distance and the vertical distance between an end point of one lane line and an end point of another lane line in the lane line layer. The server 110 may compute the horizontal and vertical distances between one end point at the tail of one lane line (which may include multiple pixels) and one end point at the head of another lane line (which may include multiple pixels).
For example, the server 110 may calculate the horizontal and vertical distances between the end of one lane line's center line and the head of another lane line's center line. As another example, the server 110 may calculate the horizontal and vertical distances between the left edge point at the tail of one lane line and the left edge point at the head of another lane line. Further, the server 110 determines whether the horizontal distance between an end point of the at least one lane line and an end point of the other lane line is smaller than the first threshold, and whether the vertical distance is smaller than the second threshold.
Step 1402: in response to the horizontal distance between the end point of the at least one lane line and the end point of the other lane line being smaller than the first threshold and the vertical distance being smaller than the second threshold, splice the end point of the at least one lane line with the end point of the other lane line.
For example, the first threshold may be set to 15 pixels and the second threshold to 100 pixels. In some embodiments, the server 110 may set to 1 the pixel values of the region bounded by the left and right edge points at the tail of the at least one lane line and the left and right edge points at the head of the other lane line, thereby forming a splice segment between the two lane lines.
In some embodiments, the server 110 may connect the end of the center line of the at least one lane line and the head of the center line of the other lane line by linear interpolation. Further, the server 110 may widen the spliced center line by a number of pixels on each side to splice the two lane lines.
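The threshold check of step 1401 and the linear-interpolation splicing of step 1402 can be sketched as follows. This is an illustrative simplification under stated assumptions: a centerline is represented as a list of (x, y) points with y increasing downward, and the 15 / 100 pixel thresholds follow the example in the text.

```python
import numpy as np

def splice_centerlines(line_a, line_b, thresh_x=15, thresh_y=100):
    """Splice two centerlines when the tail of `line_a` and the head of
    `line_b` are close enough: horizontal gap < thresh_x and vertical
    gap < thresh_y. The gap is bridged by linear interpolation, one
    point per missing row; returns None if the segments are too far
    apart to belong to the same lane line."""
    (x1, y1), (x2, y2) = line_a[-1], line_b[0]
    if abs(x2 - x1) >= thresh_x or abs(y2 - y1) >= thresh_y:
        return None
    ys = np.arange(y1 + 1, y2)
    xs = x1 + (x2 - x1) * (ys - y1) / (y2 - y1)  # linear interpolation
    bridge = list(zip(xs, ys))
    return list(line_a) + bridge + list(line_b)
```

The spliced centerline would then be widened by a number of pixels on each side, as described above, to recover the lane line region.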
In some embodiments, the server 110 may connect the left edge point at the tail of the at least one lane line and the left edge point at the head of the other lane line by linear interpolation. Further, the server 110 may widen the spliced left edge line by a number of pixels to the right to splice the two lane lines.
In some embodiments, the server 110 may connect the right edge point at the tail of the at least one lane line and the right edge point at the head of the other lane line by linear interpolation. Further, the server 110 may widen the spliced right edge line by a number of pixels to the left to splice the two lane lines.
Step 1403: determine whether the length of a lane line in the lane line layer is smaller than a third threshold.
The server 110 may determine the positions of the head and tail of a lane line in the lane line layer and, based on those positions, determine the length of the lane line. Further, the server 110 may determine whether the length of the lane line is smaller than a third threshold. The third threshold may be a value preset by the server 110, for example, a distance of 100 pixels.
Step 1404: in response to the length of the lane line being smaller than the third threshold, remove the lane line from the image.
When the length of a lane line is smaller than the third threshold, the server 110 may change the pixel values of that lane line's pixels in the lane line layer from 1 to 0 (i.e., from white to black).
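The short-segment filter of steps 1403-1404 can be sketched as follows. The representation of a detected line as a pair of (rows, cols) index arrays is an assumption of the example, and length is measured here as the head-to-tail vertical extent.

```python
import numpy as np

def remove_short_lines(layer, lines, min_len=100):
    """Zero out (white -> black) any lane line shorter than `min_len`
    pixels. `lines` is a list of (rows, cols) index arrays, one per
    detected lane line in `layer`."""
    out = layer.copy()
    for rows, cols in lines:
        if rows.max() - rows.min() < min_len:
            out[rows, cols] = 0
    return out
```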
Step 1405: perform window smoothing on the processed lane line layer.
In some embodiments, short-term jolting of the vehicle may cause a very small number of extracted lane lines to exhibit "spikes". Without loss of accuracy, the server 110 may smooth the lane line layer. Smoothing algorithms include, but are not limited to, mean filtering, median filtering, Gaussian filtering, bilateral filtering, etc.
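Median filtering, one of the smoothing options listed above, can be sketched over a centerline's x-coordinates; the odd window size and edge padding are assumptions of the example.

```python
import numpy as np

def median_smooth(xs, win=5):
    """Sliding-window median filter over a centerline's x-coordinates;
    suppresses isolated 'spike' points without shifting the line as a
    whole. `win` is an odd window size."""
    xs = np.asarray(xs, dtype=float)
    half = win // 2
    padded = np.pad(xs, half, mode='edge')
    return np.array([np.median(padded[i:i + win]) for i in range(len(xs))])
```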
It should be noted that the order in which method 1400 splices, filters, and smooths lane lines is only exemplary; this application does not limit the processing order. For example, the server 110 may first filter the lane lines and then perform splicing and smoothing. In addition, some steps of method 1400 may be used independently. For example, steps 1401 and 1402 may be used on their own to splice lane lines. Steps 1403 and 1404 may be used together to remove other traces or objects on the ground that interfere with traffic marking recognition. Step 1405 may be used on its own to smooth traffic markings. Such variations all fall within the protection scope of this application.
FIG. 15 is a flowchart of another method, according to some embodiments of this application, for processing a layer to obtain the position coordinates of at least one traffic marking in the image. The method 1500 may be used when the layer of the image being processed is a landmark layer. The server 110 may perform the following operations on the landmark layer to obtain the position coordinates of a landmark in the image.
Step 1501: based on the at least one layer and the pixel values of the pixels in the layer, determine the maximum and minimum coordinate values of the traffic marking in the layer in the horizontal direction, and the maximum and minimum coordinate values in the vertical direction.
Taking a triangular landmark representing going straight as an example, as shown in FIG. 16, the server 110 may determine that the maximum and minimum coordinate values of the triangular landmark in the horizontal direction of the layer are x2 and x1, and that its maximum and minimum coordinate values in the vertical direction are y2 and y1.
Step 1502: determine the position coordinates of the landmark in the image based on the traffic marking's maximum and minimum coordinate values in the horizontal direction and its maximum and minimum coordinate values in the vertical direction.
As shown in FIG. 16, based on the maximum and minimum coordinate values of the triangular landmark in the horizontal and vertical directions, the server 110 may determine a rectangular region for the triangular landmark. The server 110 may determine the rectangular region ABCD of the triangular landmark based on the coordinates A(x1, y1), B(x2, y1), C(x1, y2), and D(x2, y2). When drawing a map, the server 110 may frame the region containing the landmark with a rectangular box and then label it with the corresponding guide marking, such as going straight, turning left, or a zebra crossing.
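The bounding-box computation of steps 1501-1502 can be sketched in a few lines; the function name and the (x1, y1, x2, y2) return order are assumptions of the example.

```python
import numpy as np

def landmark_bbox(layer):
    """Axis-aligned bounding box of the marked (value 1) pixels of a
    landmark layer. Returns (x1, y1, x2, y2): the minimum and maximum
    column and row coordinates, i.e. the corners A and D of the
    rectangular region ABCD."""
    ys, xs = np.nonzero(layer)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```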
For other landmark markings, the server 110 may likewise identify the rectangular region where the landmark is located using the method 1500. When drawing a map, the server 110 frames the region containing the landmark with a rectangular box and then labels it with the corresponding guide marking, such as going straight, turning left, or a zebra crossing. It should be noted that the method 1500 is also applicable to the processing of lane lines.
FIG. 17 is a block diagram of an image data processing apparatus according to some embodiments of this application.
The image data processing apparatus 1700 may be implemented by the server 110. For convenience of description, the image data processing apparatus 1700 may also be referred to as an image data processing system.
The image data processing apparatus 1700 includes an image acquisition module 1710, a layer acquisition module 1720, and a layer processing module 1730.
The image acquisition module 1710 is configured to acquire an image containing a road.
The layer acquisition module 1720 is configured to obtain at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image and reflects at least the region of the at least one traffic marking in the image.
The layer processing module 1730 is configured to process the at least one layer to obtain the position coordinates of the at least one traffic marking in the image.
In some embodiments, the trained traffic marking recognition model is a trained MASK-RCNN model.
In some embodiments, the image size of a layer is the same as the image size of the image, and the layer is a binarized image.
In some embodiments, a layer also reflects the category of the at least one traffic marking.
In some embodiments, the layer processing module 1730 is further configured to perform one or a combination of erosion, dilation, or smoothing on the at least one layer.
In some embodiments, the image includes a top-down image of a road.
In some embodiments, in response to the layer of the image being a lane line layer, the layer processing module 1730 is further configured to: determine the position coordinates of the left edge points and right edge points of at least one lane line in the lane line layer based on the lane line layer and the pixel values of its pixels; determine the position coordinates of the center line of the at least one lane line in the lane line layer based on the position coordinates of the left and right edge points; and determine the position coordinates of the at least one lane line in the image based on a preset lane line width and the position coordinates of the center line.
In some embodiments, in response to the layer of the image being a lane line layer, the layer processing module 1730 is further configured to: determine whether the horizontal distance between an end point of at least one lane line and an end point of another lane line in the lane line layer is smaller than a first threshold and whether the vertical distance is smaller than a second threshold; and, in response to the horizontal distance being smaller than the first threshold and the vertical distance being smaller than the second threshold, splice the end point of the at least one lane line with the end point of the other lane line.
In some embodiments, in response to the layer of the image being a lane line layer, the layer processing module 1730 is further configured to: determine whether the length of a lane line in the lane line layer is smaller than a third threshold; and, in response to the length of the lane line being smaller than the third threshold, remove the lane line from the image.
In some embodiments, the layer processing module 1730 is further configured to: determine, based on the at least one layer and the pixel values of its pixels, the maximum and minimum coordinate values of a traffic marking in the layer in the horizontal direction and the maximum and minimum coordinate values in the vertical direction; and determine the position coordinates of the traffic marking in the image based on those maximum and minimum coordinate values.
FIG. 18 is a block diagram of a top-down image acquisition module 1710 according to some embodiments of this application.
The top-down image acquisition module 1710 includes a video acquisition unit 1810, an image acquisition unit 1820, an image data extraction unit 1830, and an image data splicing unit 1840.
The video acquisition unit 1810 is configured to acquire a road video captured by an in-vehicle device.
The image acquisition unit 1820 is configured to acquire multiple images based on the road video.
The image data extraction unit 1830 is configured to acquire image data at the same position in each image.
The image data splicing unit 1840 is configured to splice the image data at the same position in the multiple images to obtain a top-down image of the road.
In some embodiments, the image data at the same position in each image includes the same at least one row of each image.
图19为根据本申请的一些实施例所示的一种图层获取模块1720的框图。FIG. 19 is a block diagram of a layer acquisition module 1720 according to some embodiments of the present application.
图层获取模块1720包括子图像获取单元1910、子图层获取单元1920和联合确定单元1930。The layer acquisition module 1720 includes a sub-image acquisition unit 1910, a sub-layer acquisition unit 1920, and a joint determination unit 1930.
子图像获取单元1910用于基于图像,获取多张子图像;所述多张子图像分别包括所述图像的一部分,且至少两张子图像共同包括所述图像中的某一部分;The sub-image obtaining unit 1910 is configured to obtain multiple sub-images based on the image; the multiple sub-images each include a part of the image, and at least two sub-images collectively include a certain part of the image;
子图层获取单元1920用于针对每张子图像,基于所述子图像和所述经训练的交通标线识别模型,获得所述子图像的至少一个图层;所述图层包括所述子图像中的至少一个交通标线的至少一部分,以及至少反映所述至少一个交通标线的至少一部分在所述子图像中的区域;The sub-layer obtaining unit 1920 is configured to obtain, for each sub-image, at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model; the layer includes at least a portion of at least one traffic marking in the sub-image, and at least reflects the region of the at least a portion of the at least one traffic marking in the sub-image;
联合确定单元1930用于基于至少两张子图像的至少一个图层,联合确定出所述图像的至少一个图层。The joint determination unit 1930 is configured to jointly determine at least one layer of the image based on at least one layer of each of at least two sub-images.
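One plausible realization of units 1910–1930 — splitting the image into overlapping tiles, running the recognition model per tile, and OR-ing the per-tile binary layers back into one full-image layer — is sketched below. The tile size, the overlap width, and the function names are assumptions made for illustration:

```python
import numpy as np

def split_overlapping(img, tile=256, overlap=64):
    """Yield (x0, y0, sub_image) tiles; neighbouring tiles share
    `overlap` pixels, so every region of the image appears in at
    least one tile and border regions appear in two."""
    step = tile - overlap
    h, w = img.shape[:2]
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, img[y0:y0 + tile, x0:x0 + tile]

def merge_layers(shape, tile_masks):
    """Jointly determine one full-image layer by logical OR of the
    per-tile binary masks over their (possibly overlapping) regions."""
    full = np.zeros(shape, dtype=np.uint8)
    for x0, y0, mask in tile_masks:
        h, w = mask.shape
        full[y0:y0 + h, x0:x0 + w] |= mask
    return full

img = np.zeros((512, 512), dtype=np.uint8)
tiles = list(split_overlapping(img))
print(len(tiles))  # 9 (a 3x3 grid of overlapping tiles)

# Pretend the model marked every pixel of every tile:
masks = [(x0, y0, np.ones(sub.shape, dtype=np.uint8)) for x0, y0, sub in tiles]
layer = merge_layers(img.shape, masks)
print(int(layer.min()))  # 1 -- every image pixel is covered by some tile
```

The overlap is what lets at least two sub-images "collectively include a certain part of the image": a marking cut by one tile boundary is still seen whole in the neighbouring tile, and the OR merge keeps it intact.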
本申请披露了一种图像数据处理系统。该系统包括至少一个处理器以及至少一个存储器;所述至少一个存储器用于存储计算机指令;所述至少一个处理器用于执行所述计算机指令中的至少部分指令以实现任意一种图像数据处理方法。The present application discloses an image data processing system. The system includes at least one processor and at least one memory; the at least one memory is configured to store computer instructions; the at least one processor is configured to execute at least a part of the computer instructions to implement any image data processing method.
本申请披露了一种计算机可读存储介质。该存储介质存储计算机指令,当所述计算机指令被处理器执行时实现任意一种图像数据处理方法。The present application discloses a computer-readable storage medium. The storage medium stores computer instructions; when the computer instructions are executed by a processor, any one of the image data processing methods is implemented.
需要说明的是，上述各个模块可以是通过计算机指令实现的软件模块。以上对于图像数据处理系统及其模块的描述，仅为描述方便，并不能把本申请限制在所举实施例范围之内。可以理解，对于本领域的技术人员来说，在了解该系统的原理后，可能在不背离这一原理的情况下，对各个模块进行任意组合，或者构成子系统与其他模块连接。例如，各个模块可以共用一个存储模块，各个模块也可以分别具有各自的存储模块。诸如此类的变形，均在本申请的保护范围之内。It should be noted that the foregoing modules may be software modules implemented by computer instructions. The above description of the image data processing system and its modules is for convenience of description only, and does not limit the application to the scope of the illustrated embodiments. It can be understood that, after understanding the principle of the system, those skilled in the art may arbitrarily combine the various modules, or form a subsystem connected with other modules, without departing from this principle. For example, the modules may share one storage module, or each module may have its own storage module. Such variations are all within the protection scope of this application.
本申请实施例可能带来的有益效果包括但不限于:(1)通过交通标线识别模型可以识别出不同类型的交通标线;(2)对提取的车道线进行拼接处理、滤波处理、平滑处理在一定程度上解决了交通标线被部分遮挡而不连续的问题;(3)提取道路视频中的多张图像的同一位置拼接成道路的俯视图像一定程度上解决了图像畸变的问题;(4)通过Mask-RCNN模型可以识别出不同类型的交通标线在图层中的像素级位置,满足更高精度交通标线识别的要求。需要说明的是,不同实施例可能产生的有益效果不同,在不同的实施例里,可能产生的有益效果可以是以上任意一种或几种的组合,也可以是其他任何可能获得的有益效果。The beneficial effects that the embodiments of the present application may bring include, but are not limited to: (1) different types of traffic markings can be identified through the traffic marking recognition model; (2) splicing, filtering, and smoothing the extracted lane lines solves, to a certain extent, the problem that traffic markings are discontinuous because they are partially occluded; (3) extracting the image data at the same position from multiple images of a road video and splicing it into a top-view image of the road solves, to a certain extent, the problem of image distortion; (4) the Mask-RCNN model can identify the pixel-level positions of different types of traffic markings in a layer, meeting the requirements of higher-precision traffic marking recognition. It should be noted that different embodiments may have different beneficial effects; in different embodiments, the beneficial effects may be any one or a combination of the above, or any other beneficial effects that may be obtained.
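The gap-filling effect mentioned in (2) can be illustrated with a pure-NumPy binary dilation — a deliberately simplified stand-in for the erosion/dilation/smoothing processing the application describes; note that `np.roll` wraps at the image borders, which a real implementation would handle with padding instead:

```python
import numpy as np

def dilate(mask, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element,
    built by OR-ing shifted copies of the mask."""
    out = mask.copy()
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

# A one-pixel break in a lane-line mask (e.g. caused by occlusion)...
line = np.zeros((1, 7), dtype=np.uint8)
line[0, [1, 2, 4, 5]] = 1
closed = dilate(line)
print(int(closed[0, 3]))  # 1 -- the gap at column 3 is bridged
```

Dilation followed by erosion (a morphological closing) would restore the original line width while keeping the gap bridged; this is the usual pairing when reconnecting partially occluded markings.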
上文已对基本概念做了描述,显然,对于本领域技术人员来说,上述详细披露仅仅作为示例,而并不构成对本申请的限定。虽然此处并没有明确说明,本领域技术人员可能会对本申请进行各种修改、改进和修正。该类修改、改进和修正在本申请中被建议,所以该类修改、改进、修正仍属于本申请示范实施例的精神和范围。The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is merely an example, and does not constitute a limitation on the present application. Although it is not explicitly stated here, those skilled in the art may make various modifications, improvements and amendments to this application. Such modifications, improvements and amendments are suggested in this application, so such modifications, improvements and amendments still belong to the spirit and scope of the exemplary embodiments of this application.
同时，本申请使用了特定词语来描述本申请的实施例。如“一个实施例”、“一实施例”、和/或“一些实施例”意指与本申请至少一个实施例相关的某一特征、结构或特点。因此，应强调并注意的是，本说明书中在不同位置两次或多次提及的“一实施例”或“一个实施例”或“一个替代性实施例”并不一定是指同一实施例。此外，本申请的一个或多个实施例中的某些特征、结构或特点可以进行适当的组合。Meanwhile, specific terms are used in this application to describe the embodiments of this application. For example, "one embodiment", "an embodiment", and/or "some embodiments" mean a certain feature, structure, or characteristic related to at least one embodiment of the present application. Therefore, it should be emphasized and noted that "an embodiment" or "one embodiment" or "an alternative embodiment" mentioned two or more times in different places in this specification does not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics in one or more embodiments of the present application may be appropriately combined.
此外,本领域技术人员可以理解,本申请的各方面可以通过若干具有可专利性的种类或情况进行说明和描述,包括任何新的和有用的工序、机器、产品或物质的组合,或对他们的任何新的和有用的改进。相应地,本申请的各个方面可以完全由硬件执行、可以完全由软件(包括固件、常驻软件、微码等)执行、也可以由硬件和软件组合执行。以上硬件或软件均可被称为“数据块”、“模块”、“引擎”、“单元”、“组件”或“系统”。此外,本申请的各方面可能表现为位于一个或多个计算机可读介质中的计算机产品,该产品包括计算机可读程序编码。In addition, those skilled in the art can understand that aspects of this application can be illustrated and described through several patentable categories or situations, including any new and useful processes, machines, products, or combinations of materials, or their Any new and useful improvements. Accordingly, various aspects of the present application can be executed entirely by hardware, can be executed entirely by software (including firmware, resident software, microcode, etc.), and can also be executed by a combination of hardware and software. The above hardware or software can be called "data block", "module", "engine", "unit", "component" or "system". In addition, aspects of the present application may manifest as a computer product located in one or more computer-readable media, the product including a computer-readable program code.
计算机存储介质可能包含一个内含有计算机程序编码的传播数据信号,例如在基带上或作为载波的一部分。该传播信号可能有多种表现形式,包括电磁形式、光形式等,或合适的组合形式。计算机存储介质可以是除计算机可读存储介质之外的任何计算机可读介质,该介质可以通过连接至一个指令执行系统、装置或设备以实现通讯、传播或传输供使用的程序。位于计算机存储介质上的程序编码可以通过任何合适的介质进行传播,包括无线电、电缆、光纤电缆、RF、或类似介质,或任何上述介质的组合。Computer storage media may contain a transmitted data signal containing a computer program code, such as on baseband or as part of a carrier wave. The propagation signal may have multiple manifestations, including electromagnetic form, optical form, etc., or a suitable combination form. A computer storage medium may be any computer-readable medium other than a computer-readable storage medium, which may be connected to an instruction execution system, apparatus, or device to enable communication, propagation, or transmission of a program for use. Program code on a computer storage medium may be transmitted through any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the foregoing.
本申请各部分操作所需的计算机程序编码可以用任意一种或多种程序语言编写,包括面向对象编程语言如Java、Scala、Smalltalk、Eiffel、JADE、Emerald、C++、C#、VB.NET、Python等,常规程序化编程语言如C语言、Visual Basic、Fortran 2003、Perl、COBOL 2002、PHP、ABAP,动态编程语言如Python、Ruby和Groovy,或其他编程语言等。该程序编码可以完全在用户计算机上运行、或作为独立的软件包在用户计算机上运行、或部分在用户计算机上运行部分在远程计算机运行、或完全在远程计算机或服务器上运行。在后种情况下,远程计算机可以通过任何网络形式与用户计算机连接,比如局域网(LAN)或广域网(WAN),或连接至外部计算机(例如通过因特网),或在云计算环境中,或作为服务使用如软件即服务(SaaS)。The computer program code required for the operation of each part of this application may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, etc., conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may run entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, via the Internet), or run in a cloud computing environment, or used as a service, such as Software as a Service (SaaS).
此外,除非权利要求中明确说明,本申请所述处理元素和序列的顺序、数字字母的使用、或其他名称的使用,并非用于限定本申请流程和方法的顺序。尽管上述披露中通过各种示例讨论了一些目前认为有用的实施例,但应当理解的是,该类细节仅起到说明的目的,附加的权利要求并不仅限于披露的实施例,相反,权利要求旨在覆盖所有符合本申请实施例实质和范围的修正和等价组合。例如,虽然以上所描述的系统组件可以通过硬件设备实现,但是也可以只通过软件的解决方案得以实现,如在现有的服务器或移动设备上安装所描述的系统。In addition, unless explicitly stated in the claims, the order of processing elements and sequences described in this application, the use of alphanumeric characters, or the use of other names is not intended to limit the order of the processes and methods of this application. Although the above disclosure discusses some embodiments that are currently considered useful through various examples, it should be understood that this type of detail is for illustration purposes only, and the appended claims are not limited to the disclosed embodiments. Instead, the claims It is intended to cover all modifications and equivalent combinations that conform to the spirit and scope of the embodiments of the present application. For example, although the system components described above can be implemented by hardware devices, they can also be implemented only by software solutions, such as installing the described system on an existing server or mobile device.
同理，应当注意的是，为了简化本申请披露的表述，从而帮助对一个或多个实施例的理解，前文对本申请实施例的描述中，有时会将多种特征归并至一个实施例、附图或对其的描述中。但是，这种披露方法并不意味着本申请对象所需要的特征比权利要求中提及的特征多。实际上，实施例的特征要少于上述披露的单个实施例的全部特征。Similarly, it should be noted that, in order to simplify the presentation of this disclosure and thereby aid the understanding of one or more embodiments, the foregoing description of the embodiments of this application sometimes groups multiple features into a single embodiment, figure, or description thereof. However, this method of disclosure does not mean that the subject matter of this application requires more features than those mentioned in the claims. Indeed, the features of an embodiment may be fewer than all the features of a single embodiment disclosed above.
一些实施例中使用了描述成分、属性数量的数字，应当理解的是，此类用于实施例描述的数字，在一些示例中使用了修饰词“大约”、“近似”或“大体上”来修饰。除非另外说明，“大约”、“近似”或“大体上”表明所述数字允许有±20%的变化。相应地，在一些实施例中，说明书和权利要求中使用的数值参数均为近似值，该近似值根据个别实施例所需特点可以发生改变。在一些实施例中，数值参数应考虑规定的有效数位并采用一般位数保留的方法。尽管本申请一些实施例中用于确认其范围广度的数值域和参数为近似值，在具体实施例中，此类数值的设定在可行范围内尽可能精确。Some embodiments use numbers to describe quantities of components and attributes. It should be understood that such numbers used in the description of the embodiments are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that a variation of ±20% is allowed in the stated number. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations, which may change according to the characteristics required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and adopt a general method of retaining digits. Although the numerical ranges and parameters used to confirm the breadth of their scope in some embodiments of this application are approximations, in specific embodiments such values are set as precisely as practicable.
针对本申请引用的每个专利、专利申请、专利申请公开物和其他材料，如文章、书籍、说明书、出版物、文档等，特此将其全部内容并入本申请作为参考。与本申请内容不一致或产生冲突的申请历史文件除外，对本申请权利要求最广范围有限制的文件（当前或之后附加于本申请中的）也除外。需要说明的是，如果本申请附属材料中的描述、定义、和/或术语的使用与本申请所述内容有不一致或冲突的地方，以本申请的描述、定义和/或术语的使用为准。For each patent, patent application, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, documents, etc., the entire contents thereof are hereby incorporated into this application by reference. Application history documents that are inconsistent with or conflict with the content of this application are excluded, as are documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It should be noted that if there is any inconsistency or conflict between the descriptions, definitions, and/or use of terms in the materials accompanying this application and the content described in this application, the descriptions, definitions, and/or use of terms in this application shall prevail.
最后,应当理解的是,本申请中所述实施例仅用以说明本申请实施例的原则。其他的变形也可能属于本申请的范围。因此,作为示例而非限制,本申请实施例的替代配置可视为 与本申请的教导一致。相应地,本申请的实施例不仅限于本申请明确介绍和描述的实施例。Finally, it should be understood that the embodiments described in this application are only used to illustrate the principles of the embodiments of this application. Other variations may also fall within the scope of this application. Therefore, by way of example and not limitation, alternative configurations of embodiments of the present application may be considered consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to the embodiments explicitly introduced and described in the present application.
本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
以上所述为本申请的基本构思,仅以实施例形式呈现,显而易见地,本领域的技术人员依据本申请作出相应变化、改进或修正。这些变化、改进和修正已被本申请所暗示或间接提出,均包含在本申请实施例的精神或范围之内。The above is the basic idea of the present application, which is only presented in the form of an embodiment. Obviously, those skilled in the art can make corresponding changes, improvements or amendments according to the present application. These changes, improvements, and amendments have been implicitly or indirectly proposed by this application, and are all included within the spirit or scope of the embodiments of this application.

Claims (28)

  1. 一种图像数据处理方法,其特征在于,包括:An image data processing method, comprising:
    获取包含道路的图像；acquiring an image containing a road;
    基于所述图像和经训练的交通标线识别模型,获得所述图像的至少一个图层;所述图层包括所述图像中的至少一个交通标线,以及反映所述至少一个交通标线在所述图像中的区域;obtaining at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image, and reflects the region of the at least one traffic marking in the image;
    对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标。processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image.
  2. 如权利要求1所述的方法,其特征在于,所述基于所述图像和所述经训练的交通标线识别模型,获得所述图像的至少一个图层还包括:The method of claim 1, wherein obtaining at least one layer of the image based on the image and the trained traffic marking recognition model further comprises:
    基于所述图像,获取多张子图像;所述多张子图像分别包括所述图像的一部分,且至少两张子图像共同包括所述图像中的某一部分;Obtaining a plurality of sub-images based on the image; each of the plurality of sub-images including a part of the image, and at least two of the sub-images collectively including a part of the image;
    针对每张子图像,For each sub-image,
    基于所述子图像和所述经训练的交通标线识别模型,获得所述子图像的至少一个图层;所述子图像的至少一个图层包括所述子图像中的至少一个交通标线的至少一部分,以及至少反映所述至少一个交通标线的至少一部分在所述子图像中的区域;obtaining at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model; the at least one layer of the sub-image includes at least a portion of at least one traffic marking in the sub-image, and at least reflects the region of the at least a portion of the at least one traffic marking in the sub-image;
    基于至少两张子图像的至少一个图层，联合确定出所述图像的至少一个图层。jointly determining at least one layer of the image based on at least one layer of each of at least two sub-images.
  3. 如权利要求1所述的方法,其特征在于,所述经训练的交通标线识别模型为经训练的MASK-RCNN模型。The method of claim 1, wherein the trained traffic marking recognition model is a trained MASK-RCNN model.
  4. 如权利要求1所述的方法,其特征在于,所述图层的图像大小与所述图像的图像大小相同,且所述图层为二值化图像。The method according to claim 1, wherein the image size of the layer is the same as the image size of the image, and the layer is a binary image.
  5. 如权利要求1所述的方法,其特征在于,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:The method of claim 1, wherein processing the at least one layer to obtain position coordinates of the at least one traffic line in the image further comprises:
    对所述至少一个图层进行腐蚀处理、膨胀处理或平滑处理中的一种或多种的组合。performing one of, or a combination of more than one of, erosion processing, dilation processing, or smoothing processing on the at least one layer.
  6. 如权利要求1所述的方法,其特征在于,所述图像为包括道路的俯视图像。The method of claim 1, wherein the image is a top-down image including a road.
  7. 如权利要求6所述的方法,其特征在于,所述获取包含道路的图像还包括:The method according to claim 6, wherein the acquiring an image including a road further comprises:
    获取车载设备拍摄的道路视频；acquiring a road video captured by a vehicle-mounted device;
    基于所述道路视频，获取多张图像；acquiring a plurality of images based on the road video;
    获取每张图像中同一位置的图像数据；acquiring image data at the same position in each image;
    拼接所述多张图像中同一位置的图像数据，获得所述道路的俯视图像。splicing the image data at the same position in the plurality of images to obtain a top-view image of the road.
  8. 如权利要求7所述的方法,其特征在于,所述每张图像中同一位置的图像数据包括每张图像中相同的至少一行的图像数据。The method according to claim 7, wherein the image data of the same position in each image includes image data of at least one line that is the same in each image.
  9. 如权利要求1所述的方法,其特征在于,所述图层还反映所述至少一个交通标线的类别。The method of claim 1, wherein the layer further reflects a category of the at least one traffic line.
  10. 如权利要求9所述的方法，其特征在于，响应于图像的图层为车道线图层，所述对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标还包括：The method of claim 9, wherein, in response to the layer of the image being a lane line layer, the processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image further comprises:
    基于所述车道线图层和所述车道线图层中像素点的像素值,确定至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标;Determining position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer based on pixel values of the lane line layer and pixel points in the lane line layer;
    基于所述至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标,确定所述至少一条车道线的中心线在所述车道线图层中的位置坐标;Determining position coordinates of a center line of the at least one lane line in the lane line layer based on position coordinates of a left edge point and a right edge point of the at least one lane line in the lane line layer;
    基于车道线的预设宽度以及所述至少一条车道线的中心线的位置坐标,确定所述至少一条车道线在所述图像中的位置坐标。A position coordinate of the at least one lane line in the image is determined based on a preset width of the lane line and a position coordinate of a center line of the at least one lane line.
  11. 如权利要求9所述的方法，其特征在于，响应于图像的图层为车道线图层，所述对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标还包括：The method of claim 9, wherein, in response to the layer of the image being a lane line layer, the processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image further comprises:
    判断车道线图层中至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离是否小于第一阈值,并且在竖直方向上的距离是否小于第二阈值;determining whether a distance in the horizontal direction between an end point of at least one lane line in the lane line layer and an end point of another lane line is less than a first threshold, and whether a distance in the vertical direction is less than a second threshold;
    响应于所述至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离小于第一阈值,并且在竖直方向上的距离小于第二阈值时,则将所述至少一条车道线的所述某一端点和所述另一条车道线的所述某一端点进行拼接。in response to the distance in the horizontal direction between the end point of the at least one lane line and the end point of the other lane line being less than the first threshold and the distance in the vertical direction being less than the second threshold, splicing the end point of the at least one lane line with the end point of the other lane line.
  12. 如权利要求9或11所述的方法，其特征在于，响应于图像的图层为车道线图层，所述对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标还包括：The method of claim 9 or 11, wherein, in response to the layer of the image being a lane line layer, the processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image further comprises:
    判断车道线图层中某条车道线的长度是否小于第三阈值;Determine whether the length of a lane line in the lane line layer is less than a third threshold;
    响应于所述某条车道线的长度小于第三阈值,将所述车道线从所述图像中去除。In response to the length of a certain lane line being less than a third threshold, the lane line is removed from the image.
  13. 如权利要求1所述的方法,其特征在于,所述对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标还包括:The method of claim 1, wherein processing the at least one layer to obtain position coordinates of the at least one traffic line in the image further comprises:
    基于所述至少一个图层和该图层中像素点的像素值,确定该图层中交通标线在水平方向上的最大坐标值和最小坐标值,以及在竖直方向上的最大坐标值和最小坐标值;determining, based on the at least one layer and the pixel values of the pixels in the layer, the maximum and minimum coordinate values of the traffic marking in the layer in the horizontal direction, and the maximum and minimum coordinate values in the vertical direction;
    基于所述交通标线在水平方向上的最大坐标值和最小坐标值以及在竖直方向上的最大坐标值和最小坐标值,确定该交通标线在所述图像中的位置坐标。determining the position coordinates of the traffic marking in the image based on the maximum and minimum coordinate values of the traffic marking in the horizontal direction and the maximum and minimum coordinate values in the vertical direction.
  14. 一种图像数据处理系统,其特征在于,所述系统包括:An image data processing system, characterized in that the system includes:
    用于存储计算机指令的至少一个存储器;At least one memory for storing computer instructions;
    与所述存储器通讯的至少一个处理器,其中当所述至少一个处理器执行所述计算机指令时,所述至少一个处理器使所述系统执行:At least one processor in communication with the memory, wherein when the at least one processor executes the computer instructions, the at least one processor causes the system to execute:
    获取包含道路的图像；acquiring an image containing a road;
    基于所述图像和经训练的交通标线识别模型,获得所述图像的至少一个图层;所述图层包括所述图像中的至少一个交通标线,以及反映所述至少一个交通标线在所述图像中的区域;obtaining at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image, and reflects the region of the at least one traffic marking in the image;
    对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标。processing the at least one layer to obtain position coordinates of the at least one traffic marking in the image.
  15. 如权利要求14所述的系统，其特征在于，为基于所述图像和所述经训练的交通标线识别模型，获得所述图像的至少一个图层，所述至少一个处理器使所述系统进一步执行：The system of claim 14, wherein, to obtain at least one layer of the image based on the image and the trained traffic marking recognition model, the at least one processor causes the system to further execute:
    基于所述图像，获取多张子图像；所述多张子图像分别包括所述图像的一部分，且至少两张子图像共同包括所述图像中的某一部分；obtaining a plurality of sub-images based on the image; each of the plurality of sub-images includes a part of the image, and at least two of the sub-images collectively include a certain part of the image;
    针对每张子图像,For each sub-image,
    基于所述子图像和所述经训练的交通标线识别模型,获得所述子图像的至少一个图层;所述子图像的至少一个图层包括所述子图像中的至少一个交通标线的至少一部分,以及至少反映所述至少一个交通标线的至少一部分在所述子图像中的区域;obtaining at least one layer of the sub-image based on the sub-image and the trained traffic marking recognition model; the at least one layer of the sub-image includes at least a portion of at least one traffic marking in the sub-image, and at least reflects the region of the at least a portion of the at least one traffic marking in the sub-image;
    基于至少两张子图像的至少一个图层，联合确定出所述图像的至少一个图层。jointly determining at least one layer of the image based on at least one layer of each of at least two sub-images.
  16. 如权利要求14所述的系统,其特征在于,所述经训练的交通标线识别模型为经训练的MASK-RCNN模型。The system according to claim 14, wherein the trained traffic marking recognition model is a trained MASK-RCNN model.
  17. 如权利要求14所述的系统,其特征在于,所述图层的图像大小与所述图像的图像大小相同,且所述图层为二值化图像。The system according to claim 14, wherein the image size of the layer is the same as the image size of the image, and the layer is a binary image.
  18. 如权利要求14所述的系统，其特征在于，为对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标，所述至少一个处理器使所述系统进一步执行：The system of claim 14, wherein, to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image, the at least one processor causes the system to further execute:
    对所述至少一个图层进行腐蚀处理、膨胀处理或平滑处理中的一种或多种的组合。performing one of, or a combination of more than one of, erosion processing, dilation processing, or smoothing processing on the at least one layer.
  19. 如权利要求14所述的系统,其特征在于,所述图像包括道路的俯视图像。The system of claim 14, wherein the image comprises a top-view image of a road.
  20. 如权利要求19所述的系统,其特征在于,为获取包含道路的图像,所述至少一个处理器使所述系统进一步执行:The system of claim 19, wherein in order to obtain an image containing a road, the at least one processor causes the system to further execute:
    获取车载设备拍摄的道路视频；acquiring a road video captured by a vehicle-mounted device;
    基于所述道路视频，获取多张图像；acquiring a plurality of images based on the road video;
    获取每张图像中同一位置的图像数据；acquiring image data at the same position in each image;
    拼接所述多张图像中同一位置的图像数据，获得所述道路的俯视图像。splicing the image data at the same position in the plurality of images to obtain a top-view image of the road.
  21. 如权利要求20所述的系统,其特征在于,所述每张图像中同一位置的图像数据包括每张图像中相同的至少一行的图像数据。The system according to claim 20, wherein the image data of the same position in each image includes image data of at least one line that is the same in each image.
  22. 如权利要求14所述的系统,其特征在于,所述图层还反映所述至少一个交通标线的类别。The system of claim 14, wherein the layer further reflects a category of the at least one traffic line.
  23. 如权利要求22所述的系统，其特征在于，响应于图像的图层为车道线图层，为对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标，所述至少一个处理器使所述系统进一步执行：The system of claim 22, wherein, in response to the layer of the image being a lane line layer, to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image, the at least one processor causes the system to further execute:
    基于所述车道线图层和所述车道线图层中像素点的像素值,确定至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标;Determining position coordinates of the left edge point and the right edge point of at least one lane line in the lane line layer based on the pixel values of the lane line layer and the pixel points in the lane line layer;
    基于所述至少一条车道线的左边缘点和右边缘点在所述车道线图层中的位置坐标,确定所述至少一条车道线的中心线在所述车道线图层中的位置坐标;Determining position coordinates of a center line of the at least one lane line in the lane line layer based on position coordinates of a left edge point and a right edge point of the at least one lane line in the lane line layer;
    基于车道线的预设宽度以及所述至少一条车道线的中心线的位置坐标,确定所述至少一条车道线在所述图像中的位置坐标。A position coordinate of the at least one lane line in the image is determined based on a preset width of the lane line and a position coordinate of a center line of the at least one lane line.
  24. 如权利要求22所述的系统，其特征在于，响应于图像的图层为车道线图层，为对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标，所述至少一个处理器使所述系统进一步执行：The system of claim 22, wherein, in response to the layer of the image being a lane line layer, to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image, the at least one processor causes the system to further execute:
    判断车道线图层中至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离是否小于第一阈值,并且在竖直方向上的距离是否小于第二阈值;determining whether a distance in the horizontal direction between an end point of at least one lane line in the lane line layer and an end point of another lane line is less than a first threshold, and whether a distance in the vertical direction is less than a second threshold;
    响应于所述至少一条车道线的某一端点和另一条车道线的某一端点在水平方向的距离小于第一阈值,并且在竖直方向上的距离小于第二阈值时,则将所述至少一条车道线的所述某一端点和所述另一条车道线的所述某一端点进行拼接。in response to the distance in the horizontal direction between the end point of the at least one lane line and the end point of the other lane line being less than the first threshold and the distance in the vertical direction being less than the second threshold, splicing the end point of the at least one lane line with the end point of the other lane line.
  25. 如权利要求22或24所述的系统，其特征在于，响应于图像的图层为车道线图层，为对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标，所述至少一个处理器使所述系统进一步执行：The system of claim 22 or 24, wherein, in response to the layer of the image being a lane line layer, to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image, the at least one processor causes the system to further execute:
    判断车道线图层中某条车道线的长度是否小于第三阈值;Determine whether the length of a lane line in the lane line layer is less than a third threshold;
    响应于所述某条车道线的长度小于第三阈值,将所述车道线从所述图像中去除。In response to the length of a certain lane line being less than a third threshold, the lane line is removed from the image.
  26. 如权利要求14所述的系统，其特征在于，为对所述至少一个图层进行处理，以获得所述至少一个交通标线在所述图像的位置坐标，所述至少一个处理器使所述系统进一步执行：The system of claim 14, wherein, to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image, the at least one processor causes the system to further execute:
    基于所述至少一个图层和该图层中像素点的像素值,确定该图层中交通标线在水平方向上的最大坐标值和最小坐标值,以及在竖直方向上的最大坐标值和最小坐标值;determining, based on the at least one layer and the pixel values of the pixels in the layer, the maximum and minimum coordinate values of the traffic marking in the layer in the horizontal direction, and the maximum and minimum coordinate values in the vertical direction;
    基于所述交通标线在水平方向上的最大坐标值和最小坐标值以及在竖直方向上的最大坐标值和最小坐标值,确定该交通标线在所述图像中的位置坐标。determining the position coordinates of the traffic marking in the image based on the maximum and minimum coordinate values of the traffic marking in the horizontal direction and the maximum and minimum coordinate values in the vertical direction.
  27. 一种图像数据处理系统,其特征在于,包括:An image data processing system, comprising:
    图像获取模块,用于获取包含道路的图像;An image acquisition module, for acquiring an image containing a road;
    图层获取模块，用于基于所述图像和经训练的交通标线识别模型，获得所述图像的至少一个图层；所述图层包括所述图像中的至少一个交通标线，以及反映所述至少一个交通标线在所述图像中的区域；a layer acquisition module, configured to obtain at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image, and reflects the region of the at least one traffic marking in the image;
    图层处理模块,用于对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标。A layer processing module is configured to process the at least one layer to obtain position coordinates of the at least one traffic marking in the image.
  28. 一种计算机可读存储介质,其特征在于,所述存储介质存储计算机指令,当计算机读取存储介质中的计算机指令后,计算机执行方法,所述方法包括:A computer-readable storage medium, characterized in that the storage medium stores computer instructions, and after the computer reads the computer instructions in the storage medium, the computer executes the method, and the method includes:
    获取包含道路的图像；acquiring an image containing a road;
    基于所述图像和经训练的交通标线识别模型,获得所述图像的至少一个图层;所述图层包括所述图像中的至少一个交通标线,以及反映所述至少一个交通标线在所述图像中的区域;obtaining at least one layer of the image based on the image and a trained traffic marking recognition model; the layer includes at least one traffic marking in the image, and reflects the region of the at least one traffic marking in the image;
    对所述至少一个图层进行处理,以获得所述至少一个交通标线在所述图像的位置坐标。And processing the at least one layer to obtain position coordinates of the at least one traffic line in the image.
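The final method claim above reduces a marking's position to its extreme coordinate values in each direction, i.e. the bounding box of the marking's pixels within a layer. A minimal sketch of that step, assuming the layer is represented as a 2-D grid in which nonzero pixel values belong to the marking (the function name and this representation are illustrative, not taken from the patent):

```python
def marking_position(layer):
    """Return the bounding coordinates of a traffic marking in a layer.

    `layer` is a 2-D grid of pixel values; nonzero pixels belong to the
    marking.  Returns (x_min, x_max, y_min, y_max) -- the horizontal
    minimum/maximum and vertical minimum/maximum coordinate values --
    or None if the layer contains no marking pixels.
    """
    xs, ys = [], []
    for y, row in enumerate(layer):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), max(xs), min(ys), max(ys)


# Toy 6x6 layer: the marking occupies columns 1-3 of rows 2-4.
layer = [[0] * 6 for _ in range(6)]
for y in range(2, 5):
    for x in range(1, 4):
        layer[y][x] = 255

print(marking_position(layer))  # (1, 3, 2, 4)
```

The four values returned correspond to the claim's horizontal and vertical extrema; the position coordinates of the marking in the image follow directly from them.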
PCT/CN2019/104415 2018-09-05 2019-09-04 Image data processing method and system WO2020048487A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811031784.9A CN110879943B (en) 2018-09-05 2018-09-05 Image data processing method and system
CN201811031784.9 2018-09-05

Publications (1)

Publication Number Publication Date
WO2020048487A1 true WO2020048487A1 (en) 2020-03-12

Family

ID=69721537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/104415 WO2020048487A1 (en) 2018-09-05 2019-09-04 Image data processing method and system

Country Status (2)

Country Link
CN (1) CN110879943B (en)
WO (1) WO2020048487A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815740A (en) * 2020-07-27 2020-10-23 城云科技(中国)有限公司 Map drawing method, system, terminal and storage medium
CN112163475A (en) * 2020-09-15 2021-01-01 北京三快在线科技有限公司 Method and device for determining lane line direction
US20210319237A1 (en) * 2020-04-10 2021-10-14 Thinkware Corporation Method, apparatus, electronic device, computer program and computer-readable recording medium for detecting lane marking based on vehicle image
CN113780067A (en) * 2021-07-30 2021-12-10 武汉中海庭数据技术有限公司 Lane linear marker detection method and system based on semantic segmentation
CN115797377A (en) * 2023-02-09 2023-03-14 山东省鲁南地质工程勘察院(山东省地质矿产勘查开发局第二地质大队) Data processing system of satellite remote sensing image
CN116071654A (en) * 2023-02-16 2023-05-05 阿里巴巴(中国)有限公司 Map data processing method, device, equipment and medium
GB2616597A (en) * 2022-03-10 2023-09-20 Continental Automotive Tech Gmbh Method and system for identifying rows of objects in images

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN111814651B (en) * 2020-07-02 2024-01-12 阿波罗智能技术(北京)有限公司 Lane line generation method, device and equipment
CN114489829B (en) * 2021-12-22 2023-04-18 北京市遥感信息研究所 Remote sensing image sample labeling method based on ArcMap

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105069415A (en) * 2015-07-24 2015-11-18 深圳市佳信捷技术股份有限公司 Lane line detection method and device
CN105260699A (en) * 2015-09-10 2016-01-20 百度在线网络技术(北京)有限公司 Lane line data processing method and lane line data processing device
CN105718870A (en) * 2016-01-15 2016-06-29 武汉光庭科技有限公司 Road marking line extracting method based on forward camera head in automatic driving
CN108470159A (en) * 2018-03-09 2018-08-31 腾讯科技(深圳)有限公司 Lane line data processing method, device, computer equipment and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
CN103700081B (en) * 2013-12-17 2016-08-17 河海大学 A kind of shredder crushes the restoration methods of English document
CN103942959B (en) * 2014-04-22 2016-08-24 深圳市宏电技术股份有限公司 A kind of lane detection method and device
CN104281841A (en) * 2014-09-30 2015-01-14 深圳市汇顶科技股份有限公司 Fingerprint identification system and fingerprint processing method and device thereof
CN104766058B (en) * 2015-03-31 2018-04-27 百度在线网络技术(北京)有限公司 A kind of method and apparatus for obtaining lane line
CN105512657B (en) * 2015-08-20 2019-04-30 北京旷视科技有限公司 Character identifying method and equipment
CN105426861B (en) * 2015-12-02 2019-05-21 百度在线网络技术(北京)有限公司 Lane line determines method and device
CN106919915B (en) * 2017-02-22 2020-06-12 武汉极目智能技术有限公司 Map road marking and road quality acquisition device and method based on ADAS system
CN107169415B (en) * 2017-04-13 2019-10-11 西安电子科技大学 Human motion recognition method based on convolutional neural networks feature coding
CN107292229A (en) * 2017-05-08 2017-10-24 北京三快在线科技有限公司 A kind of image-recognizing method and device
CN108009524B (en) * 2017-12-25 2021-07-09 西北工业大学 Lane line detection method based on full convolution network
CN108297867B (en) * 2018-02-11 2019-12-03 江苏金羿智芯科技有限公司 A kind of lane departure warning method and system based on artificial intelligence


Cited By (10)

Publication number Priority date Publication date Assignee Title
US20210319237A1 (en) * 2020-04-10 2021-10-14 Thinkware Corporation Method, apparatus, electronic device, computer program and computer-readable recording medium for detecting lane marking based on vehicle image
US11727693B2 (en) * 2020-04-10 2023-08-15 Thinkware Corporation Method, apparatus, electronic device, computer program and computer-readable recording medium for detecting lane marking based on vehicle image
CN111815740A (en) * 2020-07-27 2020-10-23 城云科技(中国)有限公司 Map drawing method, system, terminal and storage medium
CN111815740B (en) * 2020-07-27 2024-01-30 城云科技(中国)有限公司 Map drawing method, system, terminal and storage medium
CN112163475A (en) * 2020-09-15 2021-01-01 北京三快在线科技有限公司 Method and device for determining lane line direction
CN113780067A (en) * 2021-07-30 2021-12-10 武汉中海庭数据技术有限公司 Lane linear marker detection method and system based on semantic segmentation
GB2616597A (en) * 2022-03-10 2023-09-20 Continental Automotive Tech Gmbh Method and system for identifying rows of objects in images
CN115797377A (en) * 2023-02-09 2023-03-14 山东省鲁南地质工程勘察院(山东省地质矿产勘查开发局第二地质大队) Data processing system of satellite remote sensing image
CN116071654A (en) * 2023-02-16 2023-05-05 阿里巴巴(中国)有限公司 Map data processing method, device, equipment and medium
CN116071654B (en) * 2023-02-16 2023-08-08 阿里巴巴(中国)有限公司 Map data processing method, device, equipment and medium

Also Published As

Publication number Publication date
CN110879943B (en) 2022-08-19
CN110879943A (en) 2020-03-13

Similar Documents

Publication Publication Date Title
WO2020048487A1 (en) Image data processing method and system
EP3462377B1 (en) Method and apparatus for identifying driving lane
US10599930B2 (en) Method and apparatus of detecting object of interest
US11688183B2 (en) System and method of determining a curve
US20200293058A1 (en) Data processing method, apparatus and terminal
JP6855350B2 (en) Head-up display device, navigation device, display method
CN111874006B (en) Route planning processing method and device
KR102675523B1 (en) Method and apparatus of determining lane
US20200064855A1 (en) Method and apparatus for determining road line
CN106064587B (en) Multi-lane vehicle distance recognition method and device based on license plate recognition
JP2012185011A (en) Mobile position measuring apparatus
WO2013018708A1 (en) Vehicle detection device and vehicle detection method
CN111091037A (en) Method and device for determining driving information
JP2009199284A (en) Road object recognition method
CN113868356A (en) Rendering method, rendering apparatus, storage medium, and computer program
KR102564856B1 (en) Method and apparatus of detecting road line
JP2017130155A (en) Object recognition device and object recognition method
US20110069087A1 (en) Method of and computer implemented system for generating a junction view image
CN111931683A (en) Image recognition method, image recognition device and computer-readable storage medium
CN110733416A (en) lane departure early warning method based on inverse perspective transformation
WO2023179030A1 (en) Road boundary detection method and apparatus, and electronic device, storage medium and computer program product
WO2020093351A1 (en) Systems and methods for identifying a road feature
Irshad et al. Real-time lane departure warning system on a lower resource platform
Al Noman et al. A Computer Vision-Based Lane Detection Approach for an Autonomous Vehicle Using the Image Hough Transformation and the Edge Features
JP7405643B2 (en) Sign identification system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19858600; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19858600; Country of ref document: EP; Kind code of ref document: A1)