CN117612140B - Road scene identification method and device, storage medium and electronic equipment - Google Patents

Road scene identification method and device, storage medium and electronic equipment

Info

Publication number
CN117612140B
Authority
CN
China
Prior art keywords
road
image
scene
target
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410081782.XA
Other languages
Chinese (zh)
Other versions
CN117612140A (en)
Inventor
丁宇
王明明
朱子凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foss Hangzhou Intelligent Technology Co Ltd
Original Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foss Hangzhou Intelligent Technology Co Ltd
Priority to CN202410081782.XA
Publication of CN117612140A
Application granted
Publication of CN117612140B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/94: Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95: Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures

Abstract

The application discloses a road scene identification method and device, a storage medium, and electronic equipment. The method comprises the following steps: acquiring a road image captured by a vehicle-mounted terminal on a target road; inputting the road image into a target scene recognition network to obtain at least one first scene recognition result for the road scene indicated by the road image, where the first scene recognition result comprises the matching probabilities between the road scene and a plurality of classification labels under a target classification mode; acquiring a road object set associated with the road image, where the road object set comprises a plurality of road objects determined from an object identification result of the road image; and verifying the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result. The application thereby solves the technical problem of inaccurate road scene identification in the related art.

Description

Road scene identification method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of intelligent driving, in particular to a method and a device for identifying a road scene, a storage medium and electronic equipment.
Background
Scene recognition at the vehicle terminal has become a popular research direction in the field of vehicle-mounted intelligence. Through scene recognition, a vehicle can automatically identify surrounding roads, traffic signs, pedestrians, other vehicles, and similar elements, supporting functions such as intelligent driving assistance, automatic driving, path planning, and ground-truth construction. For example, training an automatic driving model requires a large amount of object data collected in such scenes, but training on all collected data without screening wastes resources and is inefficient. By recognizing road scenes, high-value fragments with training significance can be selected; for example, data with different scene recognition labels can be screened out as training data, improving the training efficiency of the automatic driving model.
Existing scene recognition generally works directly on the environment information received by sensor devices such as a vehicle-mounted camera, radar, and lidar: the vehicle terminal recognizes different scenes according to the sensor data, map information, and information about the vehicle's surroundings. However, the data obtained from existing vehicle-mounted terminal sensors cannot accurately reflect the real environment in which the vehicle is located, so the related art suffers from inaccurate road scene recognition.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a method and a device for identifying a road scene, a storage medium and electronic equipment, which are used for at least solving the technical problem that the identification of the road scene in the related technology is inaccurate.
According to one aspect of an embodiment of the present application, there is provided a method for identifying a road scene, comprising: acquiring a road image captured by a vehicle-mounted terminal on a target road; inputting the road image into a target scene recognition network to obtain at least one first scene recognition result for the road scene indicated by the road image, where the first scene recognition result comprises the matching probabilities between the road scene and each of a plurality of classification labels under a target classification mode; acquiring a road object set associated with the road image, where the road object set comprises a plurality of road objects determined from an object recognition result of the road image; and verifying the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
According to another aspect of the embodiment of the present application, there is also provided a device for identifying a road scene, including: the first acquisition unit is used for acquiring a road image acquired by the vehicle-mounted terminal on a target road; the first determining unit is used for inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of the road scene indicated by the road image, wherein the first scene recognition result comprises matching probabilities of respectively matching the road scene with a plurality of classification labels in a target classification mode; a second obtaining unit, configured to obtain a road object set associated with the road image, where the road object set includes a plurality of road objects determined according to an object recognition result of the road image; and the second determining unit is used for checking the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
As an optional solution, the above-mentioned road scene recognition device further includes: a third obtaining unit, configured to obtain a reference object set, where the reference object set includes a plurality of road objects for performing scene recognition; determining an object feature vector matched with the road image according to the set relation between the reference object set and the road object set; inputting the object feature vector and the label vector of the classification label into a target priori scene model to obtain label association probability, wherein the label association probability indicates the association relationship between the road scene matched with the classification label and the road object set; and under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag to obtain the second scene recognition result.
As an alternative, the third obtaining unit includes: the updating module is used for adjusting the matching probability corresponding to the classified label to 0 under the condition that the label association probability is smaller than or equal to a target probability threshold value; or updating the matching probability according to the product result of the tag association probability and the matching probability.
As an alternative, the third determining unit includes: the third determining module is used for acquiring a plurality of sample road images and the sample road images comprise sample road objects; training a priori scene model in a training stage according to the label vector of the classification label corresponding to the sample road image and the object vector corresponding to the sample road object, wherein the priori scene model in the training stage is a prediction model constructed according to a gradient lifting decision tree; and under the condition that the prior scene model meets the convergence condition, determining the prior scene model as the target prior scene model.
As an alternative, the third determining unit includes: a fourth determining module, configured to obtain, when the road image is a road image of one frame in a target image sequence, the second scene recognition result of each frame of the road image in the target image sequence, where the target image sequence includes a plurality of frames of road images in a road video collected by the vehicle terminal for the target road; and determining a target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and the image weight corresponding to each road image of each frame.
As an alternative, the third determining unit further includes a module configured to: obtain one classification label from the plurality of classification labels as the current classification label; obtain the current matching probability corresponding to the current classification label from the second scene recognition result of each frame of the road image; obtain the weighted summation result of the matching probabilities and their corresponding image weights, as the sequence matching probability corresponding to the current classification label; repeat the above steps until the plurality of classification labels have been traversed, obtaining the sequence matching probabilities respectively corresponding to the classification labels; and determine the classification label corresponding to the target matching probability with the highest probability among the sequence matching probabilities as the target scene recognition result of the road scene indicated by the target image sequence.
As an alternative, the apparatus further includes: a verification unit, configured to determine a reference scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and an image weight corresponding to each of the road images of each frame, where the reference scene recognition result includes the matching probabilities that the road scene indicated by the target image sequence is matched with a plurality of classification labels in the target classification mode; and verifying the reference scene recognition result according to the prior association relation between the road object set and each classification label to obtain the target scene recognition result.
As an alternative, the third determining unit is further configured to: configuring an equal initial image weight for each frame of the road image in the target image sequence; when a key image frame is determined from the target image sequence, the initial image weight of the key image frame is increased to a first image weight, wherein the first image weight is higher than the initial image weight; and in the case that the reference image frame is determined from the target image sequence, reducing the initial image weight of the reference image frame to a second image weight, wherein the second image weight is lower than the initial image weight.
As an alternative, the first determining unit is further configured to: acquiring a plurality of sample road images and corresponding scene labels; performing image enhancement processing on the sample road image to obtain a reference sample image, and taking the scene tag corresponding to the sample road image as the scene tag of the reference sample image; and training the scene recognition network in a training state according to the sample road image and the reference sample image to obtain the target scene recognition network meeting the convergence condition.
According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the recognition method of the road scene as above.
According to still another aspect of the embodiment of the present application, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above-mentioned road scene recognition method through the computer program.
In the above embodiment of the present application, a road image acquired by a vehicle-mounted terminal for a target road is acquired; inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of a road scene indicated by the road image, wherein the first scene recognition result comprises matching probabilities of the road scene being respectively matched with a plurality of classification labels in a target classification mode; acquiring a road object set associated with the road image, wherein the road object set comprises a plurality of road objects determined according to an object identification result of the road image; and verifying the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
According to the embodiments of the present application, at least one scene recognition result for the road scene is obtained from the target scene recognition network, the road object set associated with the road image is acquired, and the first scene recognition result is verified against the prior association relation between the road object set and each classification label to obtain the second scene recognition result. The prior association relation allows the rationality and accuracy of the first scene recognition result to be further judged, yielding a more accurate second result; the road scene recognition method is thus highly adapted to complex road driving scenes, achieves the technical effect of accurately recognizing road scenes, and solves the technical problem that data obtained by existing vehicle-mounted terminal sensors cannot accurately reflect the real environment in which the vehicle is located, making road scene recognition inaccurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of an alternative road scene recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method of identifying road scenes in accordance with an embodiment of the application;
FIG. 3 is a schematic diagram of a classification tag output of an alternative road scene according to an embodiment of the application;
FIG. 4 is a schematic diagram of a class label output for another alternative road scene according to an embodiment of the application;
FIG. 5 is a schematic diagram of a classification tag output for a further alternative road scene in accordance with an embodiment of the application;
FIG. 6 is a schematic diagram of a classification tag output for a further alternative road scene in accordance with an embodiment of the application;
FIG. 7 is a schematic diagram of an alternative object prior model training process in accordance with embodiments of the application;
FIG. 8 is a schematic diagram of an alternative road scene recognition method model according to an embodiment of the application;
FIG. 9 is a schematic diagram of a classification tag output for a further alternative road scene in accordance with an embodiment of the application;
FIG. 10 is a flow chart of another alternative road scene recognition method according to an embodiment of the application;
FIG. 11 is a flow chart of yet another alternative road scene recognition method according to an embodiment of the application;
FIG. 12 is a schematic diagram of an alternative road scene recognition device according to an embodiment of the application;
FIG. 13 is a schematic structural view of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiment of the present application, there is provided a method for identifying a road scene, alternatively, the method for identifying a road scene may be applied to, but not limited to, a hardware environment as shown in fig. 1. Optionally, the method for identifying a road scene provided by the application can be applied to a vehicle terminal. Fig. 1 shows a side view of a vehicle terminal 101, which vehicle terminal 101 can travel on a running surface 113. The vehicle terminal 101 includes a memory 102 storing an on-board navigation system 103, a digitized road map 104, a space monitoring system 117, a vehicle controller 109, a GPS (global positioning system) sensor 110, an HMI (human/machine interface) device 111, and also includes an autonomous controller 112 and a telematics controller 114.
In one embodiment, the space monitoring system 117 includes one or more space sensors and systems for monitoring the viewable area 105 in front of the vehicle terminal 101, and a space monitoring controller 118 is also included in the space monitoring system 117; the spatial sensors for monitoring the visible area 105 include a lidar sensor 106, a radar sensor 107, a camera 108, and the like. The spatial monitoring controller 118 may be used to generate data related to the viewable area 105 based on data input from the spatial sensor. The space monitoring controller 118 may determine the linear range, relative speed, and trajectory of the vehicle terminal 101 based on inputs from the space sensors, for example, determining the current travel speed of the vehicle and the road condition information at which the vehicle is currently located. The space sensor of the vehicle terminal space monitoring system 117 may include an object location sensing device, which may include a range sensor that may be used to locate a front object, such as a front vehicle object.
The camera 108 is advantageously mounted and positioned on the vehicle terminal 101 in a position allowing capture of an image of the viewable area 105, where at least part of the viewable area 105 comprises a portion of the travel surface 113 in front of the vehicle terminal 101, including the trajectory of the vehicle terminal 101. The viewable area 105 may also include the surrounding environment. Other cameras may also be employed, including, for example, a second camera disposed on a rear or side portion of the vehicle terminal 101 to monitor the rear of the vehicle terminal 101 or its right or left side.
The autonomous controller 112 is configured to implement autonomous driving or Advanced Driver Assistance System (ADAS) vehicle terminal functionality. Such functionality may include a vehicle terminal onboard control system capable of providing a level of driving automation. The driving automation may include a series of dynamic driving and vehicle end operations. Driving automation may include some level of automatic control or intervention involving a single vehicle end function (e.g., steering, acceleration, and/or braking). For example, the autonomous controller may be configured to generate the recognition result of the road scene by performing the steps of:
S102, acquiring a road image acquired by a vehicle-mounted terminal on a target road;
S104, inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of the road scene indicated by the road image, wherein the first scene recognition result comprises the matching probability of respectively matching the road scene with a plurality of classification labels in a target classification mode;
S106, acquiring a road object set associated with the road image, wherein the road object set comprises a plurality of road objects determined according to the object identification result of the road image;
s108, checking the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
HMI device 111 provides man-machine interaction for the purpose of directing infotainment systems, GPS (global positioning system) sensors 110, on-board navigation system 103, and similar operations, and includes a controller. HMI device 111 monitors operator requests and provides status, service, and maintenance information of the vehicle terminal system to the operator. HMI device 111 communicates with and/or controls operation of a plurality of operator interface devices. HMI device 111 may also communicate with one or more devices that monitor biometric data associated with the vehicle terminal operator. For simplicity of description, HMI device 111 is depicted as a single device, but in embodiments of the systems described herein may be configured as multiple controllers and associated sensing devices.
Operator controls may be included in the passenger compartment of the vehicle terminal 101 and may include, by way of non-limiting example, a steering wheel, an accelerator pedal, a brake pedal, and operator input devices that are elements of the HMI device 111. The operator controls enable a vehicle terminal operator to interact with the running vehicle terminal 101 and direct operation of the vehicle terminal 101 to provide passenger transport.
The on-board navigation system 103 employs a digitized road map 104 for the purpose of providing navigation support and information to the vehicle terminal operator. The autonomous controller 112 employs the digitized road map 104 for the purpose of controlling autonomous vehicle terminal operations or ADAS vehicle terminal functions.
The vehicle terminal 101 may include a telematics controller 114, with the telematics controller 114 including a wireless telematics communication system capable of off-vehicle terminal communication, including communication with a communication network 115 having wireless and wired communication capabilities. Included in the wireless telematics communication system is an off-board server 116 that is capable of short-range wireless communication with mobile terminals.
According to the embodiments of the present application, at least one scene recognition result for the road scene is obtained from the target scene recognition network, the road object set associated with the road image is acquired, and the first scene recognition result is verified against the prior association relation between the road object set and each classification label to obtain the second scene recognition result. The evaluation standard is thus highly adapted to complex road driving scenes, the technical effect of accurately recognizing road scenes is achieved, and the technical problem that data obtained by existing vehicle-mounted terminal sensors cannot accurately reflect the real environment in which the vehicle is located, making road scene recognition inaccurate, is solved.
As an alternative embodiment, as shown in fig. 2, the method for identifying a road scene may be performed by an electronic device, and specific steps include:
s202, acquiring a road image acquired by a vehicle-mounted terminal on a target road;
S204, inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of the road scene indicated by the road image, wherein the first scene recognition result comprises the matching probability of respectively matching the road scene with a plurality of classification labels in a target classification mode;
S206, acquiring a road object set associated with the road image, wherein the road object set comprises a plurality of road objects determined according to the object identification result of the road image;
And S208, checking the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
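For intuition, the overall flow of S202-S208 can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation; the network, detector, and prior-model objects, their method names, and the 0.3 threshold are all assumptions.

```python
def recognize_road_scene(road_image, scene_net, object_detector,
                         prior_model, reference_set, labels):
    """S202-S208: recognize the scene, then verify it against the object prior."""
    # S204: first scene recognition result, one matching probability per classification label
    first_result = scene_net.predict(road_image)        # e.g. {"expressway": 0.3, ...}

    # S206: road object set determined from the object recognition result of the image
    road_objects = set(object_detector.detect(road_image))

    # S208: verify each label's matching probability via the prior association relation
    second_result = {}
    for label in labels:
        assoc = prior_model.association_probability(reference_set, road_objects, label)
        second_result[label] = first_result[label] if assoc > 0.3 else 0.0
    return second_result
```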
It can be understood that the road image acquired by the vehicle-mounted terminal in S202 may be used for automated driving training, road scene generalization model training, and the like, in different scenes in the unmanned-driving field. The road image may include, but is not limited to, a road image obtained in real time by the vehicle terminal and used alone for road scene recognition, or a road image within an image sequence obtained by the vehicle terminal; it is understood that such an image sequence may also be used to perform the road scene recognition operation.
Further, in step S204, the recognition result output by the target scene recognition network may be as shown in fig. 3. For example, the target classification mode may be classification by road type, and the classification labels may be expressway, country road, urban road, and so on, with a probability of 0.3 that the road type is expressway, 0.4 that it is country road, and 0.1 that it is urban road.
It should be noted that the target classification mode may involve one or more types. For example, as shown in fig. 3, a schematic diagram of a classification-label output result for a road scene, the road type alone may be taken as the target classification mode; or, as shown in fig. 4, road type, illumination type, and weather type may together be taken as the target classification mode. Under each type in the target classification mode, the scene may be further divided into a plurality of classification labels, and the matching probabilities of the road scene against these labels are obtained, as shown in fig. 4.
In step S206, from the characteristics of the road objects, the road object set may include a dynamic object set and a static object set. A dynamic road object is an object with changing characteristics on the road: it is mobile and changes its position and state over time and space; such objects may be monitored and identified by sensors, cameras, and the like. A static road object has no mobility: it is usually fixed at a specific location on the road and does not change position or state over time and space; its location and attributes may be obtained through map, GPS, and similar information. Dynamic objects may be, but are not limited to, pedestrians, bicycles, motorcycles, buses, trains, etc.; static objects may be, but are not limited to, traffic signs, road guardrails, street lamps, road markings, road signs, etc. In terms of acquisition, the road object set may include road objects acquired through a map API, a road management system, a traffic management department, satellite image recognition, geographic information, and so on. In terms of use, the road object set may include road objects on a roadway, on a pedestrian path, on a mountain path, on a vehicle-specific path, and the like; this embodiment does not specifically limit the manner of acquiring road objects.
As an embodiment, objects in the road environment can be extracted and identified more precisely in the following manner. First, camera images may be acquired and preprocessed: from camera images acquired over a short time, the position and attitude (i.e., the pose) of the camera in the mobile-robot coordinate system are calculated, and a local map is established. The main front-end methods for constructing the local map are the feature-point method (indirect method) and the direct method. A feature point consists of a key point and a descriptor, where the key point is the position of the feature point in the image; it can be detected across multiple frames, pairing relations are established by comparing descriptors, and the camera pose is optimized by minimizing the reprojection error. ORB-SLAM2 is a commonly used indirect method. The direct method has no feature extraction step: it directly uses the gray-level information of pixels and optimizes the pose by minimizing the photometric error; DSO is a commonly used direct method. On top of the indirect and direct methods, prior constraints can be added to the existing theoretical framework, such as scale constraints, planar feature constraints, and parallel-line feature constraints. According to the camera poses, three-dimensional map points, and loop-closure detection information calculated by the visual odometer at different times, the poses and three-dimensional map points are globally optimized; the back end handles the noise problem of the SLAM process while the local map is constructed, modeling it mathematically as maximum a posteriori estimation, with the main methods being filter methods, represented by the extended Kalman filter, and nonlinear optimization methods. A map corresponding to the task requirements is then built from the estimated trajectory, so that road images and road objects on the road map can be obtained accurately.
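As an illustration of the pose optimization mentioned above, the following minimal sketch (not part of the patent; the pinhole model and all names are assumptions) shows the reprojection-error objective that an indirect method such as ORB-SLAM2 minimizes over the camera pose:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of a 3D map point X into pixel coordinates (assumed model)."""
    x_cam = R @ X + t           # world frame -> camera frame
    x_img = K @ x_cam           # camera frame -> homogeneous pixel coordinates
    return x_img[:2] / x_img[2]

def reprojection_error(K, R, t, points_3d, observations):
    """Sum of squared distances between observed key points and projected map points.

    Indirect (feature-point) methods optimize R and t to minimize this quantity;
    the direct method instead minimizes a photometric error over pixel intensities.
    """
    residuals = [project(K, R, t, X) - u for X, u in zip(points_3d, observations)]
    return sum(float(r @ r) for r in residuals)
```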
As another alternative, the true values (road objects) may be obtained by a truth-construction module that performs truth construction over consecutive frames of a segment. The true values comprise the position, size, and motion information of dynamic obstacles (vehicles, pedestrians, two-wheelers, etc.) in each frame, accurate generation of target-ID information across consecutive frames, and the position labels of static targets such as lane lines, road edges, traffic signs, and traffic lights.
The truth model mainly performs target detection, tracking, and trajectory optimization on the dynamic and static obstacles detected by sensors such as cameras and lidar over consecutive-frame segments, finally generating the ground truth needed by all downstream algorithms for those segments. The true values generated by this module are refined in two stages: coarse optimization and trajectory matching. In the coarse-optimization stage, the results generated by the large model and by tracking are compared with key frames carrying a small amount of ground truth: if a frame is not a key frame, the model result is used directly; if it is a key frame, a manual check is performed to post-process missed and false detections. After the coarse stage comes the fine-optimization stage, i.e., trajectory optimization: the large segment is divided by truth frames into small segments, then target tracking, recall, and trajectory optimization are performed within each small segment, and the small segments are concatenated. Through trajectory optimization, the trajectory, heading angle, and size of the moving targets can be better optimized in 3D space, and target truth information beyond the model's capability can be supplemented.
After the corresponding true values are obtained, different test tasks need truth information at different frequencies, because images, point clouds, and so on are acquired at different frequencies. By upsampling or downsampling the generated true values, more accurate truth information can be obtained. Further, since the road object set and the classification labels have a prior association relation, the obtained first scene recognition result can be verified to yield the second scene recognition result; through steps S204 and S206 the accuracy of the recognized scene can be further judged.
It should be noted that, as an alternative embodiment, scene classification labels include, but are not limited to, urban road, viaduct, expressway, tunnel, village, tollgate, parking lot, and other labels. Taking 8 labels as an example, the model takes a 3840×1980 image matrix as input, a 224×224 response map is computed through a ResNet backbone, and the response map is passed through a fully connected neural network to output an 8×1 matrix; a softmax over it gives the score of each category, i.e., the matching probabilities corresponding to each of the plurality of classification labels. The output labels together with their probabilities are called soft labels (the corresponding non-soft label is the single label with the highest output probability).
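A minimal sketch of such a soft-label classifier is given below. It is illustrative only: the concrete backbone variant, input resizing, and feature width are assumptions, since the text specifies only a ResNet backbone, a fully connected head over 8 labels, and a softmax.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SceneClassifier(nn.Module):
    """Scene recognition network: ResNet backbone + fully connected head + softmax."""

    def __init__(self, num_labels: int = 8):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop final fc
        self.head = nn.Linear(2048, num_labels)  # one score per scene label

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.features(image).flatten(1)
        return torch.softmax(self.head(feats), dim=1)  # soft labels, rows sum to 1

# Usage: probabilities over {urban, viaduct, expressway, tunnel, village, tollgate, parking, other}
model = SceneClassifier()
probs = model(torch.randn(1, 3, 224, 224))  # shape (1, 8)
```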
According to the embodiments of the present application, at least one scene recognition result for the road scene is obtained from the target scene recognition network, the road object set associated with the road image is acquired, and the first scene recognition result is verified against the prior association relation between the road object set and each classification label to obtain the second scene recognition result. The prior association relation allows the rationality and accuracy of the first scene recognition result to be further judged, yielding a more accurate second result; the road scene recognition method is thus highly adapted to complex road driving scenes, achieves the technical effect of accurately recognizing road scenes, and solves the technical problem that data obtained by existing vehicle-mounted terminal sensors cannot accurately reflect the real environment in which the vehicle is located, making road scene recognition inaccurate.
In an optional implementation manner, verifying the first scene recognition result according to the prior association relationship between the road object set and each classification label, and obtaining the second scene recognition result includes:
s1, acquiring a reference object set, wherein the reference object set comprises a plurality of road objects for scene recognition;
S2, determining an object feature vector matched with the road image according to a set relation between the reference object set and the road object set;
S3, inputting the object feature vector and the label vector of the classification label into a target priori scene model to obtain label association probability, wherein the label association probability indicates the association relationship between the road scene matched with the classification label and the road object set;
And S4, checking the matching probability corresponding to the classified label under the condition that the label association probability meets the label checking condition, and obtaining a second scene recognition result.
It should be understood that, in this embodiment of the application, the reference object set may be a preset set of objects, identical to or different from the recognized road objects, used for scene recognition. It avoids inaccurate road scene recognition caused by misjudging the road objects obtained from the road image, or by failing to recognize road objects because the road image is unclear. Note that the reference object set may include road objects beyond those recognized from the road image, for example dynamic road objects such as pedestrians and automobiles, and static building objects such as residential buildings, markets, and schools.
It should be noted that the set relation between the reference object set and the road object set may include, but is not limited to, inclusion, intersection, equality, and so on. From it, an object feature vector matched with the road image is obtained as the input of the scene recognition model. For example, if pedestrians are present in both the reference object set and the road object set, the feature vector corresponding to that road object (the pedestrian) is input into the target prior scene model, and the label vector of the corresponding classification label is evaluated against it to obtain the label association probability.
Further, the obtained object feature vector and the label vector of the classification label are input into the target prior scene model to obtain the label association probability, which expresses the association relation between the road scene matched by the classification label and the road object set; this relation may be understood as, but is not limited to, the degree of reasonableness between the two. For example, if the label matching gives the highest probability to the expressway scene while pedestrians appear in the road object set, the matched road scene and the road object set are unrelated (unreasonable), i.e., the scene recognition is abnormal.
It should be noted that the target prior scene model may be trained with the GBDT algorithm. GBDT is an iterative decision-tree algorithm that constructs a group of weak learners (trees) and accumulates the results of multiple decision trees as the final prediction output, effectively combining decision trees with the ensemble idea. Ensemble learning completes a learning task by constructing and combining multiple learners; common ensemble categories include Bagging and Boosting. The idea of the GBDT algorithm is that the sum of the results of all weak classifiers equals the predicted value: each round takes the current prediction as a reference, and the next weak classifier fits the residual of the error function with respect to the predicted value, i.e., the error between the predicted value and the true value (each tree learns the residual of the preceding trees, a global optimum; each regression tree learns the gradient-descent value of the preceding round, a local optimum).
According to this embodiment of the application, adding the reference object set avoids inaccurate road scene recognition caused by misjudged road objects or by an unclear road image. The object feature vector matched with the road image is obtained from the relation between the reference object set and the road object set associated with the road image, and is input into the target prior scene model to obtain the association relation, so that the first scene recognition result is verified and the second scene recognition result is obtained, improving the accuracy of the second scene recognition result and thus of the whole road scene recognition method.
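Under assumed encodings (a binary presence vector over the reference object set, and a trained prior model exposing a scikit-learn-style predict_proba), this verification input could be built as follows; the function names are illustrative, not the patent's API:

```python
import numpy as np

def object_feature_vector(reference_set, road_objects):
    """Binary presence vector over the reference object set (assumed encoding)."""
    return np.array([1.0 if obj in road_objects else 0.0 for obj in sorted(reference_set)])

def label_association_probability(prior_model, feature_vec, label_vec):
    """Query the trained prior scene model for one classification label.

    Assumes a binary classifier whose positive class means "this road object set
    is consistent with the scene matched by this label".
    """
    x = np.concatenate([feature_vec, label_vec]).reshape(1, -1)
    return float(prior_model.predict_proba(x)[0, 1])
```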
It can be understood that the label verification condition may be, but is not limited to, requiring the label association probability to be greater than a certain preset value; the second scene recognition result is then obtained by verifying the matching probability corresponding to the classification label.
In an optional embodiment, when the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag includes one of the following:
In a first mode, when the label association probability is less than or equal to a target probability threshold, the matching probability corresponding to the classification label is adjusted to 0; or,
in a second mode, the matching probability is updated according to the product of the label association probability and the matching probability.
The method for verifying the matching probability corresponding to the classification tag will be described below with reference to specific examples.
In the first mode, if the label association probability is less than or equal to the target threshold, the scene is ruled out. For example, with a target threshold of 0.3, if pedestrians and bicycles appear in the obtained road image and the label association probability for the expressway label is 0.1, which is below the threshold, an expressway scene is impossible, so the matching probability corresponding to that label is adjusted to 0; the adjusted result is shown in fig. 5.
In the second mode, as an alternative implementation, the label association probability may be multiplied by the matching probability of the classification label, and the product replaces the original matching probability. If the original matching probability is low and the obtained label association probability is also low, the product is lower still; updating the matching probability to this value makes it more evident that, in the real scene corresponding to the road image, the scene corresponding to that label is extremely unlikely. For example, with a label association probability of 0.1 for the expressway label and a matching probability of 0.3, 0.1 × 0.3 = 0.03, so the matching probability is updated to 0.03, i.e., the probability that the scene is an expressway is extremely low; the adjusted result is shown in fig. 6.
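The two verification modes reduce to a few lines; the sketch below reproduces the worked examples above (the 0.3 threshold and the numbers are taken from this description, everything else is illustrative):

```python
def verify_matching_probability(match_prob, assoc_prob, threshold=0.3, mode="zero"):
    """Mode one ("zero"): discard the label when the association probability is too low.
    Mode two ("scale"): rescale the matching probability by the association probability."""
    if mode == "zero":
        return 0.0 if assoc_prob <= threshold else match_prob
    return assoc_prob * match_prob

# Worked example: expressway matched at 0.3, label association probability 0.1
print(verify_matching_probability(0.3, 0.1, mode="zero"))   # 0.0  (fig. 5)
print(verify_matching_probability(0.3, 0.1, mode="scale"))  # 0.03 (fig. 6), up to float rounding
```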
According to this embodiment of the application, the obtained label association probability is further processed and the algorithm of the scene recognition model is optimized: the matching probability of classification labels that are abnormal given the road objects in the scene is reduced, while the matching probability of labels consistent with those objects is effectively raised, improving the reliability of the label matching probabilities.
In an alternative embodiment, before inputting the object feature vector and the label vector of the classification label into the target prior scene model, the method includes:
S1, acquiring a plurality of sample road images, wherein the sample road images comprise sample road objects;
S2, training the prior scene model in a training stage according to the label vector of the classification label corresponding to the sample road image and the object vector corresponding to the sample road object, wherein the prior scene model in the training stage is a prediction model constructed from a gradient boosting decision tree;
And S3, determining the prior scene model as a target prior scene model under the condition that the prior scene model meets the convergence condition.
It can be understood that, in this embodiment, the label vector of the classification label corresponding to the obtained sample road image and the object vector corresponding to the road objects in the sample road image are input into the prior scene model in the training stage for training; when the prior scene model converges, the target prior scene model is obtained. The specific training process is explained as follows:
It should be noted that the prior scene model in the training stage is a prediction model constructed from a gradient boosting decision tree, understood as an iterative decision-tree algorithm in which the results of multiple decision trees are accumulated as the final prediction output: a group of weak learners (trees) is constructed to predict, for the obtained target image sequence, the corresponding scene recognition result. The target prior scene model is trained here with the GBDT algorithm.
Specifically, GBDT is an iterative decision-tree algorithm based on the Boosting idea: it classifies or regresses data using an additive model (a linear combination of basis functions) while continuously reducing the residual produced during training. Intuitively, each round of prediction leaves a residual with respect to the actual value; the next round predicts that residual, and finally all predictions are summed to obtain the result. GBDT iterates many times, each iteration producing a weak classifier trained on the residuals of the previous one. Weak classifiers are generally required to be simple enough, with low variance and high bias, since training increases the accuracy of the final classifier by reducing bias. In a regression task, GBDT produces a predicted value for each sample in every round of iteration; the loss function is then the mean-squared-error loss, and when this loss is chosen, the value fitted in each round is (true value minus predicted value), i.e., the residual. In other words, the sum of the results of all weak classifiers equals the predicted value; each round takes the current prediction as a reference, and the next weak classifier fits the residual of the error function with respect to the predicted value (the error between the predicted and true values).
The GBDT training process is explained below with a concrete example, as shown in fig. 7. Suppose the true illumination intensity is 50%. The first tree fits 30%, leaving a loss of 20%; the second round fits 10% of the remaining loss, leaving a difference of 10%; later rounds fit progressively smaller corrections (1%, then 0%), and if the iteration process is not yet finished it continues, the fitting error of the illumination intensity decreasing with every iteration. The prediction of the illumination intensity in fig. 7 accumulates to 30% + 10% + 1% + 0% = 41%, i.e., the illumination-type scene identified from the current road image has an illumination intensity of 41%, so the illumination type (e.g., daytime or noon) can be determined according to the weather judgment standard. Fig. 8 is a schematic diagram of the road scene recognition method model; there, the real road scene is assumed to be an expressway scene.
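The residual-fitting loop itself is short; a minimal regression sketch using scikit-learn's DecisionTreeRegressor as the weak learner is shown below (an illustration of the principle, not the patent's training code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_rounds=10, learning_rate=1.0):
    """Gradient boosting for regression: each tree fits the current residuals."""
    trees, pred = [], np.zeros(len(y))
    for _ in range(n_rounds):
        residual = y - pred                       # with squared error, the negative gradient
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += learning_rate * tree.predict(X)   # sum of all trees = current prediction
        trees.append(tree)
    return trees, pred

# Toy data: the residual shrinks each round, as in the illumination example above
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.3, 0.5, 0.7])
_, pred = gbdt_fit(X, y, n_rounds=4)
print(pred)  # close to y after a few rounds
```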
In the technical solution of the present application, the specific steps may be adjusted according to the actual situation. For example, in the road scene recognition method described herein, the GBDT model may be trained on data organized in rows or columns, where each row or column contains a set of data; the row/column form is only an example, and the input data may be, but is not limited to, a combination of multiple label vectors.
In an optional implementation, taking a first object vector corresponding to a pedestrian, a second object vector corresponding to a pavement, and a third object vector corresponding to a traffic sign (e.g., a "watch for pedestrians" sign) as examples, the probabilities that the three object vectors correspond respectively to scene A, scene B, and scene C can be obtained; all recognition results can then be input into the training model to calculate which of the three scene recognition results is the most accurate.
Alternatively, in another implementation, after the three object vectors are input, a scene recognition result and its probability are obtained according to the association relation among the three object vectors, and the obtained scene recognition result is then input into the training model to calculate the probability that it is correct.
It should be noted that the input label vectors are not limited to object vectors corresponding to the recognized classification labels; they also include object vectors corresponding to labels that were not recognized but may exist. On this basis, the probability (possibility) that the scene recognition result is A, B, C, etc. can be obtained from the GBDT model. Predicting the accuracy of an input scene recognition result with the GBDT model is a continuously iterated process, and with each iteration the fitting error decreases; the fitting process is exactly the process of determining the accuracy of the scene recognition result. In other words, in this embodiment the error in determining the scene recognition result via the GBDT model is reduced and the accuracy of the result is improved.
According to this embodiment of the application, the target prior scene model is trained with the GBDT algorithm and fixed once it converges. That is, during training, the error range between the predicted value and the true value shrinks in every round (the residual decreases), approaching true value = predicted value; a prior scene model trained in this way yields small scene recognition errors and more accurate recognition results.
In an optional implementation, after verifying the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain the second scene recognition result, the method further includes:
S1, under the condition that the road image is one frame of road image in a target image sequence, acquiring the second scene recognition result of each frame of road image in the target image sequence, wherein the target image sequence comprises a plurality of frames of road images in a road video acquired by the vehicle-mounted terminal on the target road;
S2, determining a target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of each frame of road image and the image weight corresponding to each frame of road image.
It can be understood that, in the above embodiment of the present application, each frame in the acquired target image sequence is used to identify the road scene, each frame corresponds to a second scene recognition result, and a weight is obtained for each frame of road image, so as to determine the target scene recognition result for the target image sequence. As shown in fig. 9, a schematic diagram of the classification tag output results of a road scene, the target image sequence includes three frames, and a corresponding scene recognition result is obtained for each frame.
Specifically, a certain frame in the target image sequence may be designated as a key frame, which may be selected manually or by taking, for example, the frame with the highest definition or the frame containing the most identified road objects in the sequence; the weight of the key frame is set to more than 0.5, and the other image frames of the sequence, with lower definition or fewer identified road objects, are set to less than 0.5.
According to the embodiment of the application, each frame of image in the target image sequence can be obtained, and the target scene recognition result of the road scene indicated by the target image sequence can then be determined according to the second scene recognition result corresponding to each frame of road image and its weight, thereby achieving the technical effect of recognizing the scene corresponding to the target image sequence.
In an optional embodiment, determining the target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of each frame of road image and the image weight corresponding to each frame of road image includes:
S1, acquiring one classification label from the plurality of classification labels as the current classification label;
S2, acquiring the current matching probability corresponding to the current classification label from the second scene recognition result of each frame of road image;
S3, acquiring the weighted summation result between the plurality of current matching probabilities and the corresponding image weights, and taking the weighted summation result as the sequence matching probability corresponding to the current classification label;
S4, repeating the above steps until all of the classification labels are traversed, obtaining the sequence matching probability corresponding to each classification label;
S5, determining the classification label corresponding to the target matching probability with the highest probability among the sequence matching probabilities as the target scene recognition result of the road scene indicated by the target image sequence.
It should be noted that, in an alternative embodiment, taking three sequential frames as an example to identify the road type of the road image, as shown in fig. 9, a schematic diagram of the classification tag output results of a road scene: the probability that the road type is an expressway is 0.3×0.6+0.1×0.2+0.2×0.3=0.26; the probability that the road type is a country road is 0.2×0.6+0.1×0.2+0.4×0.3=0.26; and the probability that the road type is an urban road is 0.6×0.6+0.4×0.2+0.4×0.3=0.56. From these results, the urban road has the highest sequence matching probability, so the target scene recognition result of the road scene indicated by the target image sequence is an urban road.
Repeating the above steps, the weather type and the illumination type can be calculated in the same way, obtaining the classification label with the largest matching probability for each of them, so that a detailed target scene recognition result for the road image sequence can be determined. As shown in fig. 9, the road objects with the largest corresponding probability under each tag are obtained, for example "urban road", "sunny day" and "noon", so that based on these tag results the road scene corresponding to the road image is known to be an urban scene at noon on a sunny day. In this way, a more accurate and detailed scene recognition result is obtained, and the accuracy of road scene recognition in a complex road environment is improved.
According to the embodiment of the application, the matching probabilities of the plurality of classification labels in each frame of the target image sequence are weighted and summed with the image weights, so that the classification label with the highest sequence matching probability in each classification mode is obtained, and the target scene recognition result of the road scene indicated by the target image sequence can be determined.
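A minimal sketch of steps S1-S5 above that reproduces the fig. 9 numbers; the frame weights and per-frame probabilities are the example values from the text, and the label names (including "urban road") are reconstructed assumptions:

```python
# Weighted multi-frame voting over the road-type classification labels.
# All names and numbers are illustrative, taken from the fig. 9 example.
frame_weights = [0.6, 0.2, 0.3]

# Per-frame matching probabilities for each road-type label (3 frames).
road_type_probs = {
    "expressway":   [0.3, 0.1, 0.2],
    "country road": [0.2, 0.1, 0.4],
    "urban road":   [0.6, 0.4, 0.4],
}

def sequence_match_probability(per_frame_probs, weights):
    """Weighted sum of one label's per-frame probabilities (steps S2-S3)."""
    return sum(p * w for p, w in zip(per_frame_probs, weights))

scores = {
    label: sequence_match_probability(probs, frame_weights)
    for label, probs in road_type_probs.items()
}
# expressway: 0.26, country road: 0.26, urban road: 0.56
best_label = max(scores, key=scores.get)  # step S5 -> "urban road"
print(scores, best_label)
```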
In an optional embodiment, determining the target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of each frame of road image and the image weight corresponding to each frame of road image includes:
S1, determining a reference scene recognition result of a road scene indicated by a target image sequence according to a second scene recognition result of each frame of road image and image weights corresponding to each frame of road image, wherein the reference scene recognition result comprises matching probabilities of the road scene indicated by the target image sequence and a plurality of classification labels respectively in a target classification mode;
And S2, verifying the reference scene recognition result according to the prior association relation between the road object set and each classification label to obtain a target scene recognition result.
In this embodiment, after the recognition result of the target image sequence is obtained, the tag probability of each matched classification tag is further checked and adjusted; according to the second scene recognition result corresponding to each frame of image and the corresponding weight, the reference scene recognition result of the road scene indicated by the target image sequence can first be determined.
For example, in an optional implementation manner, when the scene recognition result obtained from the GBDT model includes a passenger car tag, a freight car tag and a traffic sign (service area) tag together with the matching probabilities corresponding to these classification tags, this indicates that the scene may be a highway scene; further, when the truth result of the real road scene includes a toll gate, which likewise indicates a highway scene, it can be judged that the classification tags and their probabilities are normal and the scene recognition result is normal, that is, the obtained classification tags and corresponding matching probabilities have high credibility. Through this embodiment, the verification of the reference scene recognition is realized: the tag probabilities of highly credible classification tags are increased and the tag probabilities of less credible classification tags are reduced, thereby improving the accuracy of the classification tags and of the target scene recognition result.
According to the embodiment of the application, the reference scene recognition result is further checked against the prior association relationship, so that after the recognition result of the image sequence is obtained, the matching probability of each classification tag can be further checked and adjusted. When the check shows that the reliability of the reference scene recognition result is high, it can essentially be confirmed as the target scene recognition result. Based on this embodiment, further verification of the scene recognition result yields a more accurate target scene recognition result and addresses the technical problem of inaccurate road scene recognition in the related art.
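A minimal sketch of this verification step, using the adjustment rule stated later in the claims (zero the matching probability when the tag association probability is at or below a threshold, otherwise scale it by that probability); the 0.5 threshold value is an assumed example:

```python
def verify_matching_probability(match_prob: float,
                                assoc_prob: float,
                                threshold: float = 0.5) -> float:
    """Suppress a classification tag whose prior association probability is
    too low; otherwise update the matching probability by the product rule."""
    if assoc_prob <= threshold:
        return 0.0                     # tag contradicts the prior -> zero out
    return match_prob * assoc_prob     # tag is plausible -> rescale

# Example: a 'highway' tag with match 0.45 but low prior association 0.2
print(verify_matching_probability(0.45, 0.2))   # -> 0.0
print(verify_matching_probability(0.45, 0.8))   # -> 0.36
```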
In an optional embodiment, before determining the target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of each frame of road image and the image weight corresponding to each frame of road image, the method further includes:
S1, configuring an equal initial image weight for each frame of road image in the target image sequence;
S2, under the condition that the key image frame is determined from the target image sequence, the initial image weight of the key image frame is increased to a first image weight, wherein the first image weight is higher than the initial image weight;
and S3, under the condition that the reference image frame is determined from the target image sequence, reducing the initial image weight of the reference image frame to a second image weight, wherein the second image weight is lower than the initial image weight.
It will be appreciated that, in the initial state, each frame of road image in the target image sequence has an equal initial image weight, and the image weight corresponding to the key frame may be adjusted after the key frame is determined. Methods for determining the key frame include, but are not limited to: defining evaluation indexes according to image quality evaluation criteria, such as sharpness, stability, color and contrast, and selecting the frame with the best image quality as the key image frame; or selecting the frame with the most identified road objects as the key frame; the initial image weight of the key image frame is then increased. In another embodiment, after the reference image frame of the target image sequence is determined, its initial image weight is adjusted downwards to the second image weight, i.e. the initial image weight of the reference image frame is reduced.
Specifically, the initial image weight of each frame of road image in the target image sequence may be 0.3; after the key frame in the target image sequence is determined, its weight may be adjusted to 0.5, and after the reference image frame is determined, its weight may be adjusted to 0.2, which is not specifically limited here.
According to the embodiment of the application, optimizing the weights of the key image frames and the reference image frames of the target image sequence in the scene recognition model improves the reliability and accuracy of the scene recognition algorithm, thereby improving the accuracy of road scene recognition.
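A minimal sketch of the weight configuration in steps S1-S3 above; the concrete values 0.3 / 0.5 / 0.2 mirror the example in the text, and the function and parameter names are hypothetical:

```python
def configure_frame_weights(num_frames: int,
                            key_idx: int | None = None,
                            ref_idx: int | None = None,
                            initial: float = 0.3,
                            key_weight: float = 0.5,
                            ref_weight: float = 0.2) -> list[float]:
    """S1: equal initial weights; S2: raise the key image frame;
    S3: lower the reference image frame."""
    weights = [initial] * num_frames
    if key_idx is not None:
        weights[key_idx] = key_weight   # first image weight > initial
    if ref_idx is not None:
        weights[ref_idx] = ref_weight   # second image weight < initial
    return weights

# e.g. 3 frames, frame 0 is the key frame, frame 2 is a reference frame
print(configure_frame_weights(3, key_idx=0, ref_idx=2))  # [0.5, 0.3, 0.2]
```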
In an optional embodiment, before the inputting the road image into the target scene recognition network to obtain at least one first scene recognition result matched with the road scene in the road image, the method further includes:
S1, acquiring a plurality of sample road images and corresponding scene labels;
S2, performing image enhancement processing on the sample road image to obtain a reference sample image, and taking the scene tag corresponding to the sample road image as the scene tag of the reference sample image;
And S3, training the scene recognition network in a training state according to the sample road image and the reference sample image to obtain the target scene recognition network meeting the convergence condition.
In the data preprocessing stage, in order to enhance the adaptability of the target scene recognition model to different types of training data, color conversion, exposure conversion, contrast conversion and the like may be performed on the road images, which improves the training capacity and recognition quality of the target scene recognition model. Through this implementation, complex scenes can be recognized better when facing data from different sensors, the overall recognition quality is improved, and the robustness of the scene recognition algorithm is increased.
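A hedged sketch of such an enhancement stage using torchvision; the specific jitter ranges are illustrative assumptions, not values given in the application:

```python
from torchvision import transforms

# Colour, exposure (brightness) and contrast perturbations of a sample
# road image; the reference sample keeps the original scene tag.
augment = transforms.Compose([
    transforms.ColorJitter(
        brightness=0.4,   # exposure conversion
        contrast=0.4,     # contrast conversion
        saturation=0.4,   # colour conversion
        hue=0.1,
    ),
    transforms.ToTensor(),
])

# reference_sample = augment(sample_road_image)  # same scene tag as source
```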
A complete process of the present application is described below in conjunction with fig. 10:
S1002, acquiring the collected road images; specifically, frames may be extracted from consecutive frame segments at equal intervals of a certain number of frames, as shown in fig. 11 (a); S1004, inputting the extracted road image frames into the scene recognition model to recognize each item of scene information, as shown in fig. 11 (b); the identified tags include weather, road type, illumination type, road topology, etc. Scene recognition is a multi-task classification model based on deep learning that outputs a corresponding type result for each frame. Here the soft labels of each task's output, i.e. the probability value inferred by the model for each type, are retained; the scene recognition result in fig. 11 (c) is taken as an example;
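As an illustration of the equal-interval frame extraction in S1002, a minimal sketch using OpenCV; the 10-frame interval is an assumed example, not a value fixed by the application:

```python
import cv2

def extract_frames(video_path: str, step: int = 10):
    """Read a road video and keep every `step`-th frame (equal intervals)."""
    frames, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:                # end of video
            break
        if idx % step == 0:
            frames.append(frame)  # candidate input for the scene model
        idx += 1
    cap.release()
    return frames
```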
S1006, extracting the truth information of each extracted frame of the road image and inputting it into the prior experience model for inference; taking the scene recognition process of fig. 11 (d) as an example, the dynamic scene truth contains pedestrians, vehicles and tricycles, and the static scene truth contains traffic lights and road signs (speed limit signs);
In dynamic scene reconstruction, a model is constructed by combining a small number of key frame annotations with true values, so that the motion trajectories of dynamic obstacles in the scene can be obtained more accurately; the annotation data help the truth model improve its algorithm, forming an evolutionary truth construction system from which accurate dynamic scene truth can be obtained. Specifically, the dynamic scene truth is generated jointly by the truth model and the annotation data. The truth model mainly performs target detection, tracking, trajectory optimization and the like on the dynamic and static obstacles detected by sensors such as cameras and lidar over continuous frame segments, and finally generates the truth required by all downstream algorithms for those segments. The annotation data are generated by extracting frames from the corresponding segment data and labeling them; after manual labeling, the annotation data carry truth information that is more accurate than the algorithm's, and can be inserted back into the original segment through segment matching and mapping. During truth construction, the labeled truth of the key frames assists the truth algorithm, yielding scene truth information that is more accurate than pure algorithmic inference. The accuracy of the truth can subsequently be further improved through a trajectory optimization strategy, realizing better dynamic obstacle truth;
In the static scene reconstruction process, image semantic segmentation results assist in optimizing the road identification information in 3D space. When facing scenes that are difficult for a lidar, such as worn or unclear lane lines or rainy days, the image can provide more information to help obtain more accurate static scene information. For example, the 3D point cloud collected by the lidar over continuous frame segments is first processed by a laser SLAM algorithm to obtain the static target point cloud of the scene segment, and static targets of the SLAM scene, such as pavement markings and elevated signs, are clustered and extracted by a deep learning model according to the reflection intensity and shape of the point cloud. Image data synchronized with the point cloud are then acquired, and corresponding semantic information such as lane lines and road edges is obtained through semantic segmentation. The detected 3D lane lines are transformed into the image coordinate system, lane line curves covering 150 meters ahead and behind are intercepted and projected into the images at the corresponding time. Finally, the 3D lane lines are optimized through the semantic segmentation mask on the image. For example, if the confidence of a 3D detected lane line is low but corresponding semantic information exists on the image, the lane line is retained; if a lane line detected in 3D space has no match on the image and its detection confidence is not high, it is deleted. Through the semantic segmentation information of the image, the truth quality of statically detected targets can be further optimized.
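A simplified sketch of the projection-and-filter step, assuming a pinhole camera with intrinsics K, a lidar-to-camera transform, and a binary lane-line segmentation mask; all names and the keep/delete ratio are illustrative assumptions:

```python
import numpy as np

def keep_lane_line(lane_points_3d: np.ndarray,   # (N, 3) points in lidar frame
                   K: np.ndarray,                # (3, 3) camera intrinsics
                   T_cam_from_lidar: np.ndarray, # (4, 4) extrinsic transform
                   seg_mask: np.ndarray,         # (H, W) binary lane-line mask
                   min_hit_ratio: float = 0.5) -> bool:
    """Project a 3D lane line into the image and keep it only if enough of
    its points land on lane-line pixels of the semantic segmentation mask."""
    pts = np.hstack([lane_points_3d, np.ones((len(lane_points_3d), 1))])
    cam = (T_cam_from_lidar @ pts.T)[:3]          # lidar -> camera frame
    cam = cam[:, cam[2] > 0]                      # keep points in front of camera
    if cam.shape[1] == 0:
        return False
    proj = K @ cam
    uv = proj[:2] / proj[2]                       # pinhole projection
    h, w = seg_mask.shape
    u = np.round(uv[0]).astype(int)
    v = np.round(uv[1]).astype(int)
    inside = (0 <= u) & (u < w) & (0 <= v) & (v < h)
    if not inside.any():
        return False                              # not visible: cannot confirm
    hit_ratio = seg_mask[v[inside], u[inside]].mean()
    return hit_ratio >= min_hit_ratio             # keep or delete the lane line
```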
Further, the prior experience model summarizes, based on conventional experience, the dynamic/static targets that may or may not occur within certain scenes, and supports judgment on that basis. For example, if pedestrians appear in the scene, it is basically not a highway scene; when a traffic light appears, the scene is with high probability an intersection; if a lane line separation point occurs, a ramp scene is possible.
S1008, judging whether the obtained classification labels are reasonable; specifically, the soft labels are screened once based on the prior experience model, unreasonable classification results are removed, and the classification weight of some scene types is increased. In this example the judgment result is that a highway scene is unlikely to occur, so the scene recognition result is as shown in fig. 11 (e), i.e. the entry "high-speed: 0.45" in fig. 11 (e) is deleted (struck out);
As an alternative, the prior model is based on the GBDT method, and the overall flow is as follows:
First, the data of the obtained road objects are converted; specifically, the real scene labels are one-hot encoded, with existing label combinations set as positive samples and the others as negative samples (1 and 0 respectively);
Then the prior experience model is trained; specifically, the tabular (row-and-column) data are input into the GBDT to obtain an optimized regression tree, which outputs a binary classification result for a given input: when the result is greater than 0.5 the label result is reasonable, otherwise it is abnormal;
Finally, the prior experience model is invoked; specifically, the availability of the frame extraction results is judged by the regression tree according to the classification results of the extracted frames, and a result is judged to be problem data if its score is less than 0.5. Taking a highway scene as an example, as shown in fig. 8, the obtained classification labels are input into the prior experience model: the combination of a pedestrian and a 40 km/h speed limit sign obtains a score of 0, i.e. it is problem data; while the combination of no pedestrian, a rainy day, a 120 km/h speed limit sign and no traffic light obtains a score of 0.8, indicating that the labels conform to a highway scene, i.e. the data are reasonable.
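A minimal sketch of this flow with scikit-learn, assuming GradientBoostingClassifier as the regression-tree ensemble; the label vocabulary and sample rows are invented placeholders mirroring the example above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# One-hot columns over an assumed label vocabulary.
vocab = ["pedestrian", "speed_limit_40", "speed_limit_120",
         "rainy_day", "no_traffic_light"]

# Rows: label combinations that occur in real highway scenes are positive
# samples (1); combinations that should not co-occur are negative (0).
X = np.array([
    [0, 0, 1, 1, 1],   # no pedestrian, 120 sign, rainy, no traffic light
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],   # pedestrian + 40 km/h sign
    [1, 0, 0, 1, 0],
])
y = np.array([1, 1, 0, 0])

prior = GradientBoostingClassifier(n_estimators=50).fit(X, y)

scores = prior.predict_proba(X)[:, 1]
reasonable = scores > 0.5   # below 0.5 -> problem data, as described above
print(scores, reasonable)
```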
If the classification labels are determined to be reasonable, S1010 is executed: for the different recognition tasks, the multi-frame results are jointly computed (the computation process is represented by "voting" in fig. 11), and for each task the category with the largest weighted sum over the frames is selected as the final label of the scene. As shown in fig. 11 (f), the computation result is: the recognition result for the weather type is sunny, the result for the illumination type is daytime, and the result for the road type is intersection;
Specifically, for tasks such as scene recognition and weather recognition, frames are extracted from a 10-second video segment and classification model inference is performed on each frame. For each task, the model outputs the label corresponding to that frame and its probability; the probability of the highest-scoring label is multiplied by the frame weight, the products are summed per label to obtain the score of each label of that task for the video segment, and the label with the highest score is selected as the label of that task for the video segment;
Otherwise, S1012 is executed and the abnormal label is discarded; as shown in fig. 11 (e), the "high-speed" label in the road type is discarded;
According to the above embodiment, the first scene recognition result and the classification label matching probabilities can be obtained from the acquired road image, and the first scene recognition result is then verified based on the prior association relationship between the road object set associated with the road image and the classification labels. The rationality and accuracy of the first scene recognition result can thus be further judged through the prior association relationship, so that a more accurate second scene recognition result is obtained. The road scene recognition method is therefore well adapted to complex road driving scenes, and the scene recognition results in complex environments are more accurate.
Furthermore, according to the different recognized scenes, the motion relationship between each target vehicle and the ego vehicle in the scene is semantically classified through traffic flow recognition, and the semantics are defined by parameters. In this way the motion pattern of a target vehicle can be described better, the generalization of the parameters within a reasonable range can be ensured, the generalized scene remains controllable and consistent with physical-world logic, and the algorithm can be evaluated accurately.
Specifically, the method may include the following steps, with a parameterization sketch given after this list:
(1) Traffic flow semantic recognition: first, the motion trajectory and time information of each dynamic obstacle target in the dynamic truth information are analyzed together with the motion trajectory and time information of the ego vehicle, combined with static environment information such as lane information, to obtain the corresponding traffic flow semantics, such as cut-in, cut-out, following and overtaking. For example, if a vehicle's trajectory cuts in ahead from the left side of the ego vehicle, the traffic flow semantic of that target is judged over the whole sequence to be cut-in behavior.
(2) Traffic flow semantic parameterization: based on the identified traffic flow semantics and combined with the trajectory information, the semantics are described by preset parameters. Different traffic flows have different parameter information; cut-in behavior, for example, is defined by parameters such as start time, start position, cut-in time, cut-in angle, cut-in speed, speed change, completion time and completion position. Through these parameters, the original trajectory points can be converted into parameterized motion semantic information.
(3) Semantic generalization: a reasonable generalization range is set for each semantic, combining the dynamics principles and the current values of the vehicle motion. After generalization, brand-new vehicle driving trajectories are generated; unreasonable trajectories are removed based on empirical information, and reasonable generalized scenes are retained. Meanwhile, the target can be generalized to other target types, such as a car to a truck; information such as weather and illumination can likewise be generalized, making the approach suitable for complete evaluation including the sensor perception model. In this way, a large number of similar scenes conforming to physical logic can be generated, helping to verify the model quickly and at scale and to obtain its evaluation results.
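A hedged sketch of steps (2) and (3): parameterizing a cut-in semantic and generalizing it within assumed ranges. All field names, ranges and the plausibility check are illustrative, not the application's fixed definitions:

```python
import random
from dataclasses import dataclass

@dataclass
class CutInSemantics:
    """A subset of the cut-in parameters named in step (2)."""
    start_time: float        # s, when the maneuver begins
    cut_in_duration: float   # s, time to complete the cut-in
    cut_in_angle: float      # deg, heading change relative to the lane
    cut_in_speed: float      # m/s, speed during the cut-in

def generalize(base: CutInSemantics, n: int = 10) -> list[CutInSemantics]:
    """Step (3): sample new variants within a bounded range and discard
    physically unreasonable ones (here: non-positive speed)."""
    variants = []
    for _ in range(n):
        v = CutInSemantics(
            start_time=base.start_time,
            cut_in_duration=max(0.5, base.cut_in_duration + random.uniform(-0.5, 0.5)),
            cut_in_angle=base.cut_in_angle + random.uniform(-3.0, 3.0),
            cut_in_speed=base.cut_in_speed + random.uniform(-2.0, 2.0),
        )
        if v.cut_in_speed > 0:
            variants.append(v)
    return variants

base = CutInSemantics(start_time=2.0, cut_in_duration=3.0,
                      cut_in_angle=12.0, cut_in_speed=18.0)
print(len(generalize(base)))
```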
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
According to another aspect of the embodiment of the present application, there is also provided a road scene recognition device for implementing the above road scene recognition method. As shown in fig. 12, the apparatus includes:
a first acquiring unit 1202, configured to acquire a road image acquired by a vehicle-mounted terminal for a target road;
The first determining unit 1204 is configured to input a road image into the target scene recognition network, and obtain at least one first scene recognition result of the road scene indicated by the road image, where the first scene recognition result includes matching probabilities that the road scene is respectively matched with the plurality of classification labels in the target classification mode;
A second obtaining unit 1206 configured to obtain a road object set associated with the road image, where the road object set includes a plurality of road objects determined according to an object recognition result of the road image;
The second determining unit 1208 performs verification on the first scene recognition result according to the prior association relationship between the road object set and each classification label, to obtain a second scene recognition result.
Optionally, the device for identifying a road scene further includes: a third obtaining unit, configured to obtain a reference object set, where the reference object set includes a plurality of road objects for performing scene recognition; determining an object feature vector matched with the road image according to the set relation between the reference object set and the road object set; inputting the object feature vector and the label vector of the classification label into a target priori scene model to obtain label association probability, wherein the label association probability indicates the association relationship between the road scene matched with the classification label and the road object set; and under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag to obtain the second scene recognition result.
Optionally, the third obtaining unit includes: the updating module is used for adjusting the matching probability corresponding to the classified label to 0 under the condition that the label association probability is smaller than or equal to a target probability threshold value; or updating the matching probability according to the product result of the tag association probability and the matching probability.
Optionally, the third determining unit includes: a third determining module, configured to acquire a plurality of sample road images, where the sample road images include sample road objects; train the prior scene model in the training stage according to the label vector of the classification label corresponding to the sample road image and the object vector corresponding to the sample road object, where the prior scene model in the training stage is a prediction model constructed according to a gradient boosting decision tree; and determine the prior scene model as the target prior scene model under the condition that the prior scene model meets the convergence condition.
Optionally, the third determining unit includes: a fourth determining module, configured to obtain, when the road image is a road image of one frame in a target image sequence, the second scene recognition result of each frame of the road image in the target image sequence, where the target image sequence includes a plurality of frames of road images in a road video collected by the vehicle terminal for the target road; and determining a target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and the image weight corresponding to each road image of each frame.
Optionally, the third determining unit further includes: a module configured to obtain one classification label from the plurality of classification labels as the current classification label; acquire the current matching probability corresponding to the current classification label from the second scene recognition result of each frame of the road image; acquire the weighted summation result between the plurality of matching probabilities and the corresponding image weights, and take the weighted summation result as the sequence matching probability corresponding to the current classification label; repeat the above steps until all of the classification labels are traversed, obtaining the sequence matching probabilities respectively corresponding to the classification labels; and determine the classification label corresponding to the target matching probability with the highest probability among the sequence matching probabilities as the target scene recognition result of the road scene indicated by the target image sequence.
Optionally, the apparatus further includes: a verification unit, configured to determine a reference scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and an image weight corresponding to each of the road images of each frame, where the reference scene recognition result includes the matching probabilities that the road scene indicated by the target image sequence is matched with a plurality of classification labels in the target classification mode; and verifying the reference scene recognition result according to the prior association relation between the road object set and each classification label to obtain the target scene recognition result.
Optionally, the third determining unit is further configured to: configuring an equal initial image weight for each frame of the road image in the target image sequence; when a key image frame is determined from the target image sequence, the initial image weight of the key image frame is increased to a first image weight, wherein the first image weight is higher than the initial image weight; and in the case that the reference image frame is determined from the target image sequence, reducing the initial image weight of the reference image frame to a second image weight, wherein the second image weight is lower than the initial image weight.
Optionally, the first determining unit 1204 is further configured to: acquiring a plurality of sample road images and corresponding scene labels; performing image enhancement processing on the sample road image to obtain a reference sample image, and taking the scene tag corresponding to the sample road image as the scene tag of the reference sample image; and training the scene recognition network in a training state according to the sample road image and the reference sample image to obtain the target scene recognition network meeting the convergence condition.
Specific embodiments may refer to examples shown in the above-mentioned road scene recognition method, and in this example, details are not repeated here.
According to still another aspect of the embodiment of the present application, there is also provided an electronic device for implementing the above-mentioned method for identifying a road scene, where the electronic device may be a terminal device or a server as shown in fig. 13. The present embodiment is described taking the electronic device as an example. As shown in fig. 13, the electronic device comprises a memory 1302 and a processor 1304, the memory 1302 having stored therein a computer program, the processor 1304 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, it will be appreciated by those of ordinary skill in the art that the structure shown in FIG. 13 is illustrative only, and FIG. 13 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 13, or have a different configuration than shown in FIG. 13.
The memory 1302 may be configured to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for identifying a road scene in the embodiment of the present invention, and the processor 1304 executes the software programs and modules stored in the memory 1302, thereby performing various functional applications and data processing, that is, implementing the method for identifying a road scene. Memory 1302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 1302 may further include memory located remotely from processor 1304, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1302 may be used to store, but is not limited to, file information such as a target logical file. As an example, as shown in fig. 13, the above memory 1302 may include, but is not limited to, a first acquisition unit 1202, a first determination unit 1204, a second acquisition unit 1206, and a second determination unit 1208 in the above-described road scene recognition apparatus. In addition, other module units in the above-mentioned road scene recognition device may be further included, but are not limited thereto, and are not described in detail in this example.
Optionally, the transmission device 1306 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1306 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 1306 is a radio frequency (Radio Frequency, RF) module that is used to communicate wirelessly with the internet.
In addition, the electronic device further includes: a display 1308, and a connection bus 1310 for connecting the various modular components in the electronic device described above.
According to one aspect of the present application, there is provided a computer program product comprising a computer program/instruction containing program code for executing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. When executed by a central processing unit, performs various functions provided by embodiments of the present application.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that the computer system of the electronic device is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The computer program, when executed by a central processing unit, performs the various functions defined in the system of the application.
According to one aspect of the present application, there is provided a computer-readable storage medium, from which a processor of a computer device reads the computer instructions, the processor executing the computer instructions, causing the computer device to perform the methods provided in the various alternative implementations described above.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a road image acquired by a vehicle-mounted terminal on a target road;
S2, inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of the road scene indicated by the road image, wherein the first scene recognition result comprises matching probabilities of the road scene being respectively matched with a plurality of classification labels in a target classification mode;
S3, acquiring a road object set associated with the road image, wherein the road object set comprises a plurality of road objects determined according to an object identification result of the road image;
And S4, checking the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing the relevant hardware of an electronic device, and the program may be stored in a computer-readable storage medium, where the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method of the various embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided in the present application, it should be understood that the disclosed user equipment may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of units is merely a logical functional division, and there may be other ways of dividing in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

1. A method for identifying a road scene, comprising:
Acquiring a road image acquired by a vehicle-mounted terminal on a target road;
Inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of a road scene indicated by the road image, wherein the first scene recognition result comprises matching probabilities of the road scene being respectively matched with a plurality of classification labels in a target classification mode;
Acquiring a road object set associated with the road image, wherein the road object set comprises a plurality of road objects determined according to an object identification result of the road image;
And verifying the first scene recognition result according to the prior association relation between the road object set and each classification label to obtain a second scene recognition result, wherein the method comprises the following steps: acquiring a reference object set, wherein the reference object set comprises a plurality of road objects for scene recognition; determining an object feature vector matched with the road image according to the set relation between the reference object set and the road object set; inputting the object feature vector and the label vector of the classification label into a target priori scene model to obtain label association probability, wherein the label association probability indicates the association relationship between the road scene matched with the classification label and the road object set; under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag to obtain the second scene recognition result;
And under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag, wherein the matching probability comprises one of the following steps: when the tag association probability is smaller than or equal to a target probability threshold value, the matching probability corresponding to the classified tag is adjusted to be 0; or updating the matching probability according to the product result of the tag association probability and the matching probability.
2. The method of claim 1, wherein before inputting the object feature vector and the tag vector of the classification tag into the object prior scene model, comprising:
acquiring a plurality of sample road images, wherein the sample road images comprise sample road objects;
Training a priori scene model in a training stage according to the label vector of the classification label corresponding to the sample road image and the object vector corresponding to the sample road object, wherein the priori scene model in the training stage is a prediction model constructed according to a gradient lifting decision tree;
and under the condition that the prior scene model meets the convergence condition, determining the prior scene model as the target prior scene model.
3. The method according to claim 1, wherein the verifying the first scene recognition result according to the prior association relationship between the road object set and each of the classification tags, after obtaining a second scene recognition result, further comprises:
Acquiring the second scene recognition result of each frame of road image in a target image sequence under the condition that the road image is one frame of road image in the target image sequence, wherein the target image sequence comprises a plurality of frames of road images in a road video acquired by the vehicle-mounted terminal on the target road;
and determining a target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and the image weight corresponding to each road image of each frame.
4. A method according to claim 3, wherein said determining the target scene recognition result of the road scene indicated by the target image sequence based on the second scene recognition result of the road image of each frame and the respective image weights corresponding to the road images of each frame comprises:
acquiring one classification label from a plurality of classification labels as a current classification label;
Acquiring a current matching probability corresponding to the current classification tag from the second scene recognition result of the road image of each frame;
Acquiring the weighted summation result between the plurality of current matching probabilities and the corresponding image weights, and taking the weighted summation result as the sequence matching probability corresponding to the current classification label;
Repeating the steps until a plurality of classification labels are traversed, and obtaining the sequence matching probabilities respectively corresponding to the classification labels;
And determining the classification label corresponding to the target matching probability with the highest probability in the sequence matching probabilities as a target scene recognition result of the road scene indicated by the target image sequence.
5. A method according to claim 3, wherein said determining the target scene recognition result of the road scene indicated by the target image sequence based on the second scene recognition result of the road image of each frame and the respective image weights corresponding to the road images of each frame comprises:
Determining a reference scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and the image weight corresponding to each road image of each frame, wherein the reference scene recognition result comprises the matching probability that the road scene indicated by the target image sequence is matched with a plurality of classification labels respectively in the target classification mode;
And verifying the reference scene recognition result according to the prior association relation between the road object set and each classification label to obtain the target scene recognition result.
6. A method according to claim 3, wherein before determining the target scene recognition result of the road scene indicated by the target image sequence according to the second scene recognition result of the road image of each frame and the respective image weights corresponding to the road images of each frame, the method further comprises:
configuring an equal initial image weight for each frame of the road image in the target image sequence;
In the case of determining a key image frame from the target image sequence, raising the initial image weight of the key image frame to a first image weight, wherein the first image weight is higher than the initial image weight;
In case a reference image frame is determined from the target image sequence, the initial image weight of the reference image frame is reduced to a second image weight, wherein the second image weight is lower than the initial image weight.
7. The method of claim 1, wherein before inputting the road image into a target scene recognition network to obtain at least one first scene recognition result that matches a road scene in the road image, further comprising:
acquiring a plurality of sample road images and corresponding scene labels;
Performing image enhancement processing on the sample road image to obtain a reference sample image, and taking the scene tag corresponding to the sample road image as the scene tag of the reference sample image;
and training the scene recognition network in a training state according to the sample road image and the reference sample image to obtain the target scene recognition network meeting the convergence condition.
8. A road scene recognition device, characterized by comprising:
The first acquisition unit is used for acquiring a road image acquired by the vehicle-mounted terminal on a target road;
the first determining unit is used for inputting the road image into a target scene recognition network to obtain at least one first scene recognition result of a road scene indicated by the road image, wherein the first scene recognition result comprises matching probabilities of the road scene being respectively matched with a plurality of classification labels in a target classification mode;
A second acquisition unit configured to acquire a road object set associated with the road image, where the road object set includes a plurality of road objects determined according to an object recognition result of the road image;
The second determining unit, according to the prior association relationship between the road object set and each classification label, checks the first scene recognition result to obtain a second scene recognition result, including: acquiring a reference object set, wherein the reference object set comprises a plurality of road objects for scene recognition; determining an object feature vector matched with the road image according to the set relation between the reference object set and the road object set; inputting the object feature vector and the label vector of the classification label into a target priori scene model to obtain label association probability, wherein the label association probability indicates the association relationship between the road scene matched with the classification label and the road object set; under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag to obtain the second scene recognition result;
And under the condition that the tag association probability meets a tag verification condition, verifying the matching probability corresponding to the classified tag, wherein the matching probability comprises one of the following steps: when the tag association probability is smaller than or equal to a target probability threshold value, the matching probability corresponding to the classified tag is adjusted to be 0; or updating the matching probability according to the product result of the tag association probability and the matching probability.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program, when run by an electronic device, performs the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 7 by means of the computer program.
CN202410081782.XA 2024-01-19 2024-01-19 Road scene identification method and device, storage medium and electronic equipment Active CN117612140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410081782.XA CN117612140B (en) 2024-01-19 2024-01-19 Road scene identification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410081782.XA CN117612140B (en) 2024-01-19 2024-01-19 Road scene identification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117612140A CN117612140A (en) 2024-02-27
CN117612140B true CN117612140B (en) 2024-04-19

Family

ID=89946603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410081782.XA Active CN117612140B (en) 2024-01-19 2024-01-19 Road scene identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117612140B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9821813B2 (en) * 2014-11-13 2017-11-21 Nec Corporation Continuous occlusion models for road scene understanding
US11222238B2 (en) * 2019-11-14 2022-01-11 Nec Corporation Object detection with training from multiple datasets

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951830A (en) * 2017-02-23 2017-07-14 北京联合大学 A kind of many object marking methods of image scene constrained based on priori conditions
CN111191654A (en) * 2019-12-30 2020-05-22 重庆紫光华山智安科技有限公司 Road data generation method and device, electronic equipment and storage medium
CN116311015A (en) * 2021-12-21 2023-06-23 北京嘀嘀无限科技发展有限公司 Road scene recognition method, device, server, storage medium and program product
CN115294544A (en) * 2022-09-01 2022-11-04 亿咖通(湖北)技术有限公司 Driving scene classification method, device, equipment and storage medium
CN115830399A (en) * 2022-12-30 2023-03-21 广州沃芽科技有限公司 Classification model training method, apparatus, device, storage medium, and program product
CN116128360A (en) * 2023-02-01 2023-05-16 公安部道路交通安全研究中心 Road traffic congestion level evaluation method and device, electronic equipment and storage medium
CN116597405A (en) * 2023-05-09 2023-08-15 中国第一汽车股份有限公司 Image tag relation model training method and device for vehicle-mounted image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Spatio-temporal Feature Learning Approach for Dynamic Scene Recognition;Ihsan Ullah等;《PReMI 2017: Pattern Recognition and Machine Intelligence》;20171101;第591-598页 *
Research on Intelligent Vehicle Driving Scene Understanding and Modeling Based on Multi-Sensor Fusion; Wang Kewei; Engineering Science and Technology II; 20230215 (No. 02); pp. C035-15 *

Also Published As

Publication number Publication date
CN117612140A (en) 2024-02-27

Similar Documents

Publication Publication Date Title
US11217012B2 (en) System and method for identifying travel way features for autonomous vehicle motion control
US11794785B2 (en) Multi-task machine-learned models for object intention determination in autonomous driving
US11726477B2 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
US10803328B1 (en) Semantic and instance segmentation
US20210341927A1 (en) Verifying Predicted Trajectories Using A Grid-Based Approach
CN110562258B (en) Method for vehicle automatic lane change decision, vehicle-mounted equipment and storage medium
CN110356412B (en) Method and apparatus for automatic rule learning for autonomous driving
US11537127B2 (en) Systems and methods for vehicle motion planning based on uncertainty
US11597406B2 (en) Systems and methods for detecting actors with respect to an autonomous vehicle
KR102539942B1 (en) Method and apparatus for training trajectory planning model, electronic device, storage medium and program
US20220261601A1 (en) Multiple Stage Image Based Object Detection and Recognition
CN116685874A (en) Camera-laser radar fusion object detection system and method
CN110807412B (en) Vehicle laser positioning method, vehicle-mounted equipment and storage medium
WO2021178513A1 (en) Systems and methods for integrating radar data for improved object detection in autonomous vehicles
US11620838B2 (en) Systems and methods for answering region specific questions
US11820397B2 (en) Localization with diverse dataset for autonomous vehicles
Yuan et al. Harnessing machine learning for next-generation intelligent transportation systems: a survey
CN117576652B (en) Road object identification method and device, storage medium and electronic equipment
CN110765224A (en) Processing method of electronic map, vehicle vision repositioning method and vehicle-mounted equipment
CN117612140B (en) Road scene identification method and device, storage medium and electronic equipment
KR102382219B1 (en) The settlement method and system of expressing FOD on the road for artificial intelligence autonomous driving and the information usage charge by inferring the information value according to the risk of each abnormal object
US20220343763A1 (en) Identifying parkable areas for autonomous vehicles
Sharma et al. Deep Learning-Based Object Detection and Classification for Autonomous Vehicles in Different Weather Scenarios of Quebec, Canada
CN110446106B (en) Method for identifying front camera file, electronic equipment and storage medium
CN117612127B (en) Scene generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant