CN117173513A - Method, system, equipment and roadside unit for detecting and positioning pavement construction - Google Patents


Info

Publication number
CN117173513A
CN117173513A (application CN202311160552.4A)
Authority
CN
China
Prior art keywords
construction
text
image
feature vector
feature
Prior art date
Legal status
Pending
Application number
CN202311160552.4A
Other languages
Chinese (zh)
Inventor
康晓
郭振华
曹思奇
Current Assignee
Tianyi Transportation Technology Co ltd
Original Assignee
Tianyi Transportation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Transportation Technology Co ltd
Priority claimed from CN202311160552.4A
Publication of CN117173513A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a method, a system, an electronic device and a roadside unit for detecting and positioning pavement construction. The method comprises the following steps: constructing a training set of image features and text features, and training with the training set to obtain a classifier model; acquiring a video stream with a roadside camera, and obtaining picture frames from the video stream; inputting a picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction; and in response to the classification label being pavement construction, determining a warning area according to the construction marker. The application can detect things that have non-uniform physical forms but share certain commonalities, such as construction sites, and, combined with a detection model for the common objects, can accurately locate the place and scale of an event, thereby providing an effective road traffic safety guarantee for the running of automatic driving vehicles.

Description

Method, system, equipment and roadside unit for detecting and positioning pavement construction
Technical Field
The application relates to the field of automatic driving, and in particular to a method, a system, an electronic device and a roadside unit for detecting and positioning pavement construction.
Background
An automatic driving system needs to sense the road environment in real time. When temporary construction takes place on a road, if the automatic driving system cannot receive a road construction prompt from the traffic system in real time and detour around the construction road, it is likely to enter the construction road according to the planned route, and a safety accident may occur. If the existence of road pavement construction can be detected immediately, however, the automatic driving vehicle can take corresponding evasive measures, such as avoiding obstacles or changing the driving route, so as to ensure driving safety.
The main purpose of pavement construction detection is to detect construction targets on a pavement and determine the range of the construction road. Existing methods have low robustness and precision. The main problems are that construction activity is not an object with a fixed form and is therefore difficult to detect by ordinary target detection, and temporary pavement construction is not always updated into the high-precision map of an automatic driving system in real time. As a result, awareness of current pavement construction depends largely on manual reports submitted by drivers and passengers, with construction early warnings issued at terminals after the summarized information is verified. Therefore, a precise and instant pavement construction detection method is urgently needed for locating construction sites.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a method, a system, an electronic device and a roadside unit for detecting and positioning road construction, which analyze and predict collected video stream data through a trained CLIP model to determine whether road construction is taking place. The scheme can detect things with non-uniform physical forms but certain commonalities, such as construction sites, and, combined with a detection model for the common objects, can accurately locate the place and scale of an event, thereby providing an effective road traffic safety guarantee for the driving of automatic driving vehicles.
Based on the above objects, an aspect of the embodiments of the present application provides a method for detecting and locating road construction, including the following steps: constructing a training set of image features and text features, and training with the training set to obtain a classifier model; acquiring a video stream with a roadside camera, and obtaining picture frames from the video stream; inputting a picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction; and in response to the classification label being pavement construction, determining a warning area according to the construction marker.
In some embodiments, the step of constructing a training set of image features and text features and training using the training set to obtain a classifier model comprises: inputting the training pictures into an image encoder to obtain corresponding image feature vectors; constructing description text of each category according to the classification label of the classification task, and inputting the description text into a text encoder to obtain a corresponding text feature vector; and establishing an association between the image feature vector and the corresponding text feature vector.
In some embodiments, the step of establishing an association between the image feature vector and the corresponding text feature vector comprises: combining the plurality of image feature vectors and the plurality of text feature vectors in different ways to form a plurality of feature matrices, and calculating the cosine similarity between the image feature vector and the text feature vector in each feature matrix; and establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity.
In some embodiments, the step of calculating cosine similarity between the image feature vector and the text feature vector in each feature matrix includes: and calculating the first cosine similarity of diagonal elements of the feature matrix, and calculating the second cosine similarity of elements except the diagonal elements in the feature matrix.
In some embodiments, the step of establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with optimal cosine similarity includes: and determining the feature matrix with the maximum first cosine similarity and the minimum second cosine similarity as an optimal feature matrix, and establishing a corresponding relation between the image and the label according to the optimal feature matrix.
In some embodiments, the step of inputting the picture frame into the classifier model to obtain a classification label includes: inputting the picture frame into the classifier model to obtain a predicted feature vector, obtaining the prediction probability of each category according to the predicted feature vector and the normalized exponential function, and taking the category with the highest prediction probability as a classification label corresponding to the picture.
In some embodiments, the step of determining the warning area from the construction marker comprises: and taking the minimum circumscribed rectangle of the detected construction marker as a warning area.
In some embodiments, the method further comprises: calculating the position coordinate information of the warning area in the map according to the position coordinate information of the warning area in the picture, the height information of the camera in physical space, and the position coordinate information of the camera in the map.
In another aspect of the embodiments of the present application, there is provided a system for detecting and locating road construction, including: a training module configured to construct a training set of image features and text features and train with the training set to obtain a classifier model; an acquisition module configured to acquire a video stream with a roadside camera and obtain picture frames from the video stream; a label module configured to input a picture frame into the classifier model to obtain a classification label and judge whether the classification label is pavement construction; and a warning module configured to, in response to the classification label being pavement construction, determine a warning area according to the construction marker.
In still another aspect of the embodiment of the present application, there is also provided an electronic device, including: at least one processor; and a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method as above.
In a further aspect of the embodiments of the present application, there is also provided a roadside unit storing a computer program which, when executed by a processor, implements the above method steps.
The application has the following beneficial technical effects: the feature description of a construction site can be improved through a strong image encoder; combined with the classifier, a construction-site scene classification can be assigned to an image; and after the core objects of the scene are located, the place and scale of the event can be accurately determined by combining a specific target detection method, thereby providing an effective road traffic safety guarantee for the running of automatic driving vehicles.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an embodiment of a method for detecting and locating pavement construction according to the present application;
FIG. 2 is a schematic diagram of the warning area location of pavement construction provided by the application;
FIG. 3 is a flow chart of an embodiment of a method for detecting and locating pavement construction provided by the present application;
FIG. 4 is a schematic diagram of an embodiment of a system for detecting and locating pavement construction provided by the present application;
fig. 5 is a schematic hardware structure diagram of an embodiment of an electronic device for detecting and positioning pavement construction according to the present application;
fig. 6 is a schematic diagram of an embodiment of a computer roadside unit for detecting and positioning pavement construction according to the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the following embodiments of the present application will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present application, the expressions "first" and "second" are used to distinguish two entities or parameters with the same name. The terms "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application; the following embodiments will not explain this point one by one.
In a first aspect of the embodiment of the application, an embodiment of a method for detecting and positioning pavement construction is provided. Fig. 1 is a schematic diagram of an embodiment of a method for detecting and positioning pavement construction provided by the application. As shown in fig. 1, the embodiment of the present application includes the following steps:
s1, constructing an image feature and text feature training set, and training by using the training set to obtain a classifier model;
s2, acquiring a video stream by using a road-side camera, and acquiring a picture frame according to the video stream;
s3, inputting the picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction or not; and
s4, responding to the classification label to construct the road surface, and determining a warning area according to the construction marker.
The CLIP model is a multi-modal model based on contrastive learning. It mainly comprises a Text Encoder and an Image Encoder, which extract text and image features respectively, and it learns the text-image matching relationship through contrastive learning. For example, it can be trained on large-scale data (400 million text-image pairs); based on such massive data, the CLIP model can learn more general visual semantic information that benefits downstream tasks.
A training set of image features and text features is constructed, and the training set is used to obtain a classifier model. In the embodiment of the application, CLIP model training is divided into two stages: the first stage is image feature extraction and text feature extraction, and the second stage is classifier training. The vocabulary and the number of text-image pairs covered by the first-stage pre-trained model are extremely large, so the feature extractors of this stage can be used as a transferable visual model without fine-tuning their parameters; only the model parameters of the second stage are trained.
In some embodiments, the step of constructing a training set of image features and text features and training with the training set to obtain a classifier model comprises: inputting training pictures into an image encoder to obtain corresponding image feature vectors; constructing a description text for each category according to the classification labels of the classification task, and inputting the description text into a text encoder to obtain corresponding text feature vectors; and establishing an association between the image feature vector and the corresponding text feature vector. The embodiment of the application mainly transfers the CLIP model for training and prediction of the pavement construction classifier. Picture frames are input into the image encoder module to obtain the feature embedding vectors of the images, where the image encoder module reuses an image encoder based on the ViT architecture. A description text for each category is constructed according to the classification labels of the classification task in the form "A photo of {label}", and the text is then sent to the text encoder to obtain the corresponding text features; the text encoder reuses the encoder of a text Transformer model. A set of images with classification labels and fixed text feature pairs are produced by their respective encoders.
Workflow of the image encoder of the ViT architecture: dividing a picture into patches; flattening the patches; linearly mapping the flattened patches to a lower-dimensional space; adding position embedding information; sending the resulting sequence into a standard Transformer encoder; pre-training on a larger dataset; and fine-tuning on the downstream dataset for image classification.
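The patch-embedding steps above can be sketched as follows. This is an illustrative NumPy sketch, not the patent's code; the patch size, token dimension, and random projection weights are assumptions (in a real ViT the projection and position embeddings are learned).

```python
import numpy as np

def patch_embed(image, patch=16, d_model=64, rng=np.random.default_rng(0)):
    """Split an HxWxC image into patches, flatten, project, add position codes."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    # 1) divide the picture into patches, 2) flatten each patch
    patches = (image[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch * patch * c))
    # 3) linearly map the flattened patches to a lower-dimensional space
    proj = rng.standard_normal((patch * patch * c, d_model)) * 0.02
    tokens = patches @ proj
    # 4) add position embedding information (random here, learned in practice)
    pos = rng.standard_normal((gh * gw, d_model)) * 0.02
    return tokens + pos  # 5) ready for a standard Transformer encoder

seq = patch_embed(np.zeros((224, 224, 3)))
print(seq.shape)  # (196, 64): a 14x14 grid of patches, each a 64-d token
```

A 224x224 input with 16x16 patches yields the familiar 196-token sequence that the Transformer encoder consumes.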
In some embodiments, the step of establishing an association between the image feature vector and the corresponding text feature vector comprises: combining the plurality of image feature vectors and the plurality of text feature vectors in different ways to form a plurality of feature matrices, and calculating the cosine similarity between the image feature vector and the text feature vector in each feature matrix; and establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity. If one batch has N text-image pairs, the N text features and N image features are combined pairwise to form a feature matrix of size N×N, and the CLIP model predicts the similarity of the N² possible text-image pairs, which is obtained by computing the cosine similarity of the text features and the image features.
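The N×N similarity matrix described above can be computed in a few lines. This is a generic sketch of cosine similarity between feature batches, not code from the patent; the feature dimension and batch size are arbitrary assumptions.

```python
import numpy as np

def cosine_sim_matrix(img_feats, txt_feats):
    """N image features x N text features -> NxN cosine-similarity matrix."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    return img @ txt.T  # entry (i, j) = cosine similarity of image i and text j

rng = np.random.default_rng(0)
sims = cosine_sim_matrix(rng.standard_normal((4, 8)),
                         rng.standard_normal((4, 8)))
print(sims.shape)  # (4, 4); the diagonal holds the matched (positive) pairs
```

Normalizing each row to unit length first means the matrix product directly yields cosine similarities in [-1, 1].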
In some embodiments, the step of calculating the cosine similarity between the image feature vector and the text feature vector in each feature matrix includes: calculating a first cosine similarity for the diagonal elements of the feature matrix, and calculating a second cosine similarity for the elements other than the diagonal elements. One batch has N positive samples, namely the truly one-to-one text and image pairs (the diagonal elements of the matrix); the remaining N² - N text-image pairs are negative samples. The goal of training is that the cosine similarity of the positive samples is the largest and the cosine similarity of the negative samples is the smallest.
In some embodiments, the step of establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with optimal cosine similarity includes: determining the feature matrix with the maximum first cosine similarity and the minimum second cosine similarity as the optimal feature matrix, and establishing the correspondence between the image and the label according to the optimal feature matrix. When the positive-sample cosine similarity is maximal and the negative-sample cosine similarity is minimal, training is complete, and the association between the image and the label is established according to the relationship at that point.
The training of the CLIP model can be compared with a general classification model. A general classification model inputs images and enumerated labels in the training stage and optimizes model parameters by minimizing the classification difference between the enumerated labels and the model predictions for the images. The first stage of CLIP instead inputs images and texts containing the labels ("A photo of {label}") and outputs image feature vectors and label (text) feature vectors; the second training stage inputs these image feature vectors and label (text) feature vectors and optimizes the model parameters by maximizing the cosine similarity of positive samples and minimizing the cosine similarity of negative samples.
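The second-stage objective just described (maximize diagonal positive-pair similarity, minimize off-diagonal negative-pair similarity) is conventionally realized as a symmetric cross-entropy over the N×N similarity matrix. The sketch below is an assumed, simplified NumPy version of that contrastive objective, not the patent's implementation; the temperature value is an assumption.

```python
import numpy as np

def contrastive_loss(sims, temperature=0.07):
    """Symmetric cross-entropy over an NxN similarity matrix whose
    diagonal entries are the positive (matched) text-image pairs."""
    n = sims.shape[0]
    logits = sims / temperature
    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()
    # image->text uses the rows, text->image uses the columns (transpose)
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

aligned = np.eye(4)           # positives most similar: where training converges
misaligned = 1.0 - np.eye(4)  # positives least similar: heavily penalized
print(contrastive_loss(aligned) < contrastive_loss(misaligned))  # True
```

The loss is low exactly when the diagonal dominates each row and column, which is the "maximum first cosine similarity, minimum second cosine similarity" condition stated above.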
A video stream is acquired with a roadside camera, and picture frames are obtained from the video stream. The roadside camera acquires the video stream of each corresponding region, and the image frames in the video stream are obtained for picture classification.
And inputting the picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction or not.
In some embodiments, the step of inputting the picture frame into the classifier model to obtain a classification label includes: inputting the picture frame into the classifier model to obtain a predicted feature vector, obtaining the prediction probability of each category according to the predicted feature vector and the normalized exponential function, and taking the category with the highest prediction probability as a classification label corresponding to the picture.
Picture frames are input into the classification model to obtain the feature vectors predicted by the model, and softmax is calculated to obtain the prediction probability (category confidence) of each category; the category with the highest confidence corresponds to the label text ("A photo of {label}") of the image. Softmax can "compress" a K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ(z) such that each element lies in the range (0, 1) and all elements sum to 1. For example, the value of the softmax function for the input vector [1, 2, 3, 4, 1, 2, 3] is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The entry with the greatest weight in the output vector corresponds to the maximum value "4" in the input vector. Thus, the label category corresponding to the image can be confirmed according to softmax.
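The softmax example above can be reproduced directly; this short sketch verifies the quoted numbers and the argmax-as-label rule (the function itself is standard, not code from the patent).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([1, 2, 3, 4, 1, 2, 3], dtype=float))
print(np.round(probs, 3))   # [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
print(int(np.argmax(probs)))  # 3: the index of the input maximum "4",
                              # i.e. the classification label assigned
```

The probabilities sum to 1, and the highest entry (0.475) picks out the category used as the picture's classification label.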
In response to the classification label being pavement construction, a warning area is determined according to the construction marker. If the output classification label is pavement construction, the frame is a corresponding pavement construction image. Construction marker detection is then performed on images classified as pavement construction, using trained detection models for construction markers such as traffic cones and water-filled barriers, and the construction position is accurately located based on the detection result.
In some embodiments, the step of determining the warning area from the construction marker comprises: and taking the minimum circumscribed rectangle of the detected construction marker as a warning area.
If no construction marker is detected, an early warning of a possible pavement construction condition can be sent to the automatic driving vehicle; if construction markers are detected, the minimum circumscribed rectangle of all the detected construction markers is taken as the warning area, the warning area is sent to the map of the automatic driving vehicle, and the path planning scheme is re-formulated.
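The minimum circumscribed rectangle of the detected markers is simply the tightest axis-aligned box enclosing all detection boxes. A minimal sketch (the box format and example coordinates are assumptions, not values from the patent):

```python
def warning_area(boxes):
    """Minimum axis-aligned rectangle enclosing all detected marker boxes.
    Each box is (x1, y1, x2, y2) in pixel coordinates; returns None when
    no marker was detected (the early-warning fallback case)."""
    if not boxes:
        return None
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)

# e.g. three hypothetical traffic-cone detections along a lane
print(warning_area([(10, 40, 30, 80), (50, 42, 70, 85), (90, 38, 110, 83)]))
# (10, 38, 110, 85)
```

The resulting rectangle is what gets projected into the high-precision map and sent to the vehicle for re-planning.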
In some embodiments, the method further comprises: calculating the position coordinate information of the warning area in the map according to the position coordinate information of the warning area in the picture, the height information of the camera in physical space, and the position coordinate information of the camera in the map. Fig. 2 is a schematic diagram of positioning the warning area of pavement construction. As shown in Fig. 2, after the coordinate position information in the picture of the minimum circumscribed rectangle detection frame of the detected construction markers is determined, the map coordinates of the pavement construction detection frame in the high-precision map can be calculated based on the height information of the camera in physical space and the position information of the camera in the high-precision map, thereby realizing positioning in the high-precision map.
Specifically, the map coordinates of the detection frame in the high-precision map can be obtained through the transformation between the pixel coordinate system of the picture and the world coordinate system of the physical space. The transformation formula between the pixel coordinate system and the world coordinate system is:
P_uv = K T P_w
where P_uv denotes coordinates in the pixel coordinate system, P_w denotes coordinates in the world coordinate system, K denotes the camera intrinsic parameter matrix, and T denotes the camera extrinsic parameter matrix.
The intrinsic parameter matrix K represents the transformation of the pixel coordinate system with respect to the camera coordinate system (related to the camera and the lens), and the extrinsic parameter matrix T represents the transformation of the camera coordinate system with respect to the world coordinate system.
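The formula P_uv = K T P_w can be exercised numerically. The sketch below projects one world point into pixel coordinates; the intrinsic matrix K, extrinsic matrix T, and the world point are made-up assumptions for illustration only (locating the warning area actually requires the inverse mapping, pixel to world, which additionally uses the camera height and a ground-plane constraint).

```python
import numpy as np

# Assumed intrinsics: focal lengths 800 px, principal point at (640, 360)
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
# Assumed extrinsics [R|t]: identity rotation, camera 5 m from the scene
T = np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])

Pw = np.array([1.0, 0.5, 0.0, 1.0])  # a world point, homogeneous coordinates
pc = T @ Pw                          # world -> camera coordinates
uv_h = K @ pc                        # camera -> homogeneous pixel coordinates
uv = uv_h[:2] / uv_h[2]              # divide by depth -> pixel coordinates
print(uv)  # [800. 440.]
```

Each step mirrors one matrix in the formula: T moves the point into the camera frame, K projects it onto the sensor, and the division by depth yields the final pixel location.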
Fig. 3 is a flowchart of an embodiment of the method for detecting and positioning pavement construction provided by the application. As shown in Fig. 3, an image frame is acquired and input into the image encoder to obtain a feature embedding vector; the feature embedding vector is input into the classifier to obtain a classification result, and whether the classification result is pavement construction is judged. If yes, construction marker detection is performed: if construction markers exist, path planning is performed again; if no construction markers exist, a pavement construction early warning is issued. If the classification result is not pavement construction, the process ends directly.
According to the embodiment of the application, the feature description of a construction site can be improved through a strong image encoder; combined with the classifier, a construction-site scene classification can be assigned to the image; and after the core objects of the scene are located, the position and scale of the event can be accurately determined by combining a specific target detection method, thereby providing an effective road traffic safety guarantee for the running of automatic driving vehicles.
It should be noted that in the above embodiments of the method for detecting and positioning pavement construction, the steps may be interchanged, replaced, added or deleted, and methods for detecting and positioning pavement construction obtained through such reasonable permutations and combinations should also fall within the protection scope of the present application; the protection scope of the present application should not be limited to the embodiments.
Based on the above objects, a second aspect of the embodiments of the present application provides a system for detecting and positioning road construction. As shown in Fig. 4, the system 200 includes the following modules: a training module configured to construct a training set of image features and text features and train with the training set to obtain a classifier model; an acquisition module configured to acquire a video stream with a roadside camera and obtain picture frames from the video stream; a label module configured to input a picture frame into the classifier model to obtain a classification label and judge whether the classification label is pavement construction; and a warning module configured to, in response to the classification label being pavement construction, determine a warning area according to the construction marker.
In some embodiments, the training module is further configured to: inputting the training pictures into an image encoder to obtain corresponding image feature vectors; constructing description text of each category according to the classification label of the classification task, and inputting the description text into a text encoder to obtain a corresponding text feature vector; and establishing an association between the image feature vector and the corresponding text feature vector.
In some embodiments, the training module is further configured to: different combinations are carried out on the plurality of image feature vectors and the plurality of text feature vectors to form a plurality of feature matrixes, and cosine similarity between the image feature vectors and the text feature vectors in each feature matrix is calculated; and establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity.
In some embodiments, the training module is further configured to: and calculating the first cosine similarity of diagonal elements of the feature matrix, and calculating the second cosine similarity of elements except the diagonal elements in the feature matrix.
In some embodiments, the training module is further configured to: and determining the feature matrix with the maximum first cosine similarity and the minimum second cosine similarity as an optimal feature matrix, and establishing a corresponding relation between the image and the label according to the optimal feature matrix.
In some embodiments, the tag module is further configured to: inputting the picture frame into the classifier model to obtain a predicted feature vector, obtaining the prediction probability of each category according to the predicted feature vector and the normalized exponential function, and taking the category with the highest prediction probability as a classification label corresponding to the picture.
In some embodiments, the alert module is further configured to: and taking the minimum circumscribed rectangle of the detected construction marker as a warning area.
In some embodiments, the system further comprises a positioning module configured to: calculate the position coordinate information of the warning area in the map according to the position coordinate information of the warning area in the picture, the height information of the camera in physical space, and the position coordinate information of the camera in the map.
In view of the above object, a third aspect of an embodiment of the present application provides an electronic device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executable by the processor to perform the following steps: S1, constructing a training set of image features and text features, and training with the training set to obtain a classifier model; S2, acquiring a video stream with a roadside camera, and obtaining picture frames from the video stream; S3, inputting a picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction; and S4, in response to the classification label being pavement construction, determining a warning area according to the construction marker.
In some embodiments, the step of constructing a training set of image features and text features and training using the training set to obtain a classifier model comprises: inputting the training pictures into an image encoder to obtain corresponding image feature vectors; constructing description text of each category according to the classification label of the classification task, and inputting the description text into a text encoder to obtain a corresponding text feature vector; and establishing an association between the image feature vector and the corresponding text feature vector.
In some embodiments, the step of establishing an association between the image feature vector and the corresponding text feature vector comprises: different combinations are carried out on the plurality of image feature vectors and the plurality of text feature vectors to form a plurality of feature matrixes, and cosine similarity between the image feature vectors and the text feature vectors in each feature matrix is calculated; and establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity.
In some embodiments, the step of calculating the cosine similarity between the image feature vectors and the text feature vectors in each feature matrix includes: calculating a first cosine similarity over the diagonal elements of the feature matrix, and calculating a second cosine similarity over the elements other than the diagonal elements.
In some embodiments, the step of establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity includes: determining the feature matrix with the largest first cosine similarity and the smallest second cosine similarity as the optimal feature matrix, and establishing the correspondence between images and labels according to the optimal feature matrix.
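The diagonal/off-diagonal comparison above can be sketched as follows, assuming (as in contrastive image-text training) that each feature matrix is an image-text cosine-similarity matrix in which row i and column i form a matched pair:

```python
import numpy as np

def cosine_similarity_matrix(img_feats: np.ndarray, txt_feats: np.ndarray) -> np.ndarray:
    """Entry (i, j) is the cosine similarity between image i and text j."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    return img @ txt.T

def split_similarities(sim: np.ndarray):
    """First cosine similarity: diagonal entries (matched image/text pairs).
    Second cosine similarity: off-diagonal entries (mismatched pairs)."""
    diag = np.diag(sim)
    off = sim[~np.eye(sim.shape[0], dtype=bool)]
    return diag, off

# The optimal feature matrix is the one whose diagonal (matched-pair)
# similarities are largest and whose off-diagonal similarities are smallest.
```

Under this reading, picking the matrix that maximises the first similarity and minimises the second is the same selection criterion a contrastive loss optimises during training.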
In some embodiments, the step of inputting the picture frame into the classifier model to obtain a classification label includes: inputting the picture frame into the classifier model to obtain a predicted feature vector, obtaining the prediction probability of each category from the predicted feature vector and a normalized exponential function, and taking the category with the highest prediction probability as the classification label of the picture.
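The normalized exponential function here is the standard softmax. A sketch of the inference step, with hand-made vectors standing in for the encoder outputs (the feature values and class texts are illustrative assumptions):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def classify(pred_feature: np.ndarray, text_features: np.ndarray, labels):
    """Score the predicted feature vector against each class text vector,
    turn the scores into probabilities, and pick the most probable label."""
    sims = text_features @ pred_feature
    probs = softmax(sims)
    return labels[int(np.argmax(probs))], probs

labels = ["pavement construction", "normal traffic"]
text_features = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
label, probs = classify(np.array([0.9, 0.1]), text_features, labels)
```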
In some embodiments, the step of determining the warning area according to the construction marker includes: taking the minimum circumscribed rectangle of the detected construction markers as the warning area.
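For axis-aligned detection boxes, the minimum circumscribed rectangle of all detected construction markers (cones, signs, barriers) reduces to a min/max over the box corners. A sketch; the `(x1, y1, x2, y2)` box format is an assumption:

```python
import numpy as np

def warning_area(marker_boxes: np.ndarray) -> tuple:
    """marker_boxes: N x 4 array of (x1, y1, x2, y2) detections of
    construction markers. Returns the smallest axis-aligned rectangle
    enclosing all of them, used as the warning area."""
    x1 = marker_boxes[:, 0].min()
    y1 = marker_boxes[:, 1].min()
    x2 = marker_boxes[:, 2].max()
    y2 = marker_boxes[:, 3].max()
    return (x1, y1, x2, y2)

boxes = np.array([[10, 20, 30, 40],
                  [25, 15, 60, 55]])
area = warning_area(boxes)  # (10, 15, 60, 55)
```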
In some embodiments, the method further includes: calculating the position coordinates of the warning area in the map according to the position coordinates of the warning area in the picture, the height of the camera in physical space, and the camera's position coordinates in the map.
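The patent does not spell out the projection formula. One common realisation, assumed here, is a pre-calibrated planar homography from the image plane to the map (ground) plane, which can be derived offline from the camera's height in physical space and its map coordinates:

```python
import numpy as np

def image_to_map(points_px: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Project N x 2 pixel coordinates to map coordinates with a 3x3
    homography H (assumed calibrated from the camera's height and pose)."""
    ones = np.ones((points_px.shape[0], 1))
    homo = np.hstack([points_px, ones]) @ H.T
    return homo[:, :2] / homo[:, 2:3]  # divide out the projective scale

# Identity homography for illustration: map coords equal pixel coords.
H = np.eye(3)
corners_px = np.array([[100.0, 200.0], [300.0, 400.0]])
corners_map = image_to_map(corners_px, H)
```

With real calibration data, `H` would be estimated from known ground-point correspondences rather than set to the identity; the warning-area corners would then be projected the same way.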
Fig. 5 is a schematic hardware structure diagram of an embodiment of the electronic device for detecting and positioning pavement construction according to the present application.
Taking the device shown in fig. 5 as an example, the device includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or by other means; fig. 5 takes a bus connection as an example.
The memory 302, as a non-volatile computer-readable storage medium, is used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for detecting and locating road surface construction in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 302, the processor 301 executes the various functional applications and data processing of the server, i.e., implements the method for detecting and locating road surface construction.
Memory 302 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the method of detection and positioning of road surface construction, and the like. In addition, memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 302 may optionally include memory located remotely from processor 301, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Computer instructions 303 corresponding to one or more methods for detecting and locating road surface construction are stored in the memory 302; when executed by the processor 301, they perform the method for detecting and locating road surface construction in any of the method embodiments described above.
Any one of the embodiments of the electronic device that performs the above-described method of detecting and locating road surface construction may achieve the same or similar effects as any of the embodiments of the method described above.
The application also provides a road side unit storing a computer program which, when executed by a processor, performs the method for detecting and locating road surface construction.
Fig. 6 is a schematic diagram of an embodiment of the road side unit for detecting and positioning road surface construction according to the present application. Taking the roadside unit shown in fig. 6 as an example, the roadside unit 401 stores a computer program 402 that, when executed by a processor, performs the above method.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the procedures in the methods of the embodiments described above may be implemented by a computer program to instruct related hardware, and the program of the method for detecting and locating road surface construction may be stored in a computer readable storage medium, and the program may include the procedures of the embodiments of the methods described above when executed. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the method embodiments described above.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the foregoing embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the application, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the application, and many other variations of the different aspects of the embodiments of the application as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present application.

Claims (11)

1. A method for detecting and positioning pavement construction, characterized by comprising the following steps:
constructing an image feature and text feature training set, and training by using the training set to obtain a classifier model;
acquiring a video stream by using a road-side camera, and acquiring a picture frame according to the video stream;
inputting the picture frame into the classifier model to obtain a classification label, and judging whether the classification label is pavement construction or not; and
in response to the classification label being pavement construction, determining a warning area according to the construction marker.
2. The method of claim 1, wherein the steps of constructing a training set of image features and text features and training using the training set to obtain a classifier model comprise:
inputting the training pictures into an image encoder to obtain corresponding image feature vectors;
constructing description text of each category according to the classification label of the classification task, and inputting the description text into a text encoder to obtain a corresponding text feature vector; and
establishing an association between the image feature vector and the corresponding text feature vector.
3. The method of detecting and locating road construction according to claim 2, wherein the step of establishing an association between the image feature vector and the corresponding text feature vector comprises:
combining the plurality of image feature vectors with the plurality of text feature vectors in different ways to form a plurality of feature matrices, and calculating the cosine similarity between the image feature vectors and the text feature vectors in each feature matrix; and
establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity.
4. A method of detecting and locating road construction according to claim 3, wherein the step of calculating cosine similarity between the image feature vector and the text feature vector in each feature matrix comprises:
calculating a first cosine similarity over the diagonal elements of the feature matrix, and calculating a second cosine similarity over the elements other than the diagonal elements of the feature matrix.
5. The method for detecting and locating pavement construction according to claim 4, wherein the step of establishing the association between the image feature vector and the corresponding text feature vector according to the feature matrix with the optimal cosine similarity comprises:
determining the feature matrix with the largest first cosine similarity and the smallest second cosine similarity as the optimal feature matrix, and establishing a correspondence between images and labels according to the optimal feature matrix.
6. The method of detecting and locating road construction according to claim 1, wherein the step of inputting the picture frame into the classifier model to obtain a classification tag comprises:
inputting the picture frame into the classifier model to obtain a predicted feature vector, obtaining the prediction probability of each category from the predicted feature vector and a normalized exponential function, and taking the category with the highest prediction probability as the classification label of the picture.
7. The method for detecting and locating road surface construction according to claim 1, wherein the step of determining the warning area based on the construction marker comprises:
taking the minimum circumscribed rectangle of the detected construction markers as the warning area.
8. The method of detecting and locating road construction according to claim 7, further comprising:
calculating the position coordinates of the warning area in the map according to the position coordinates of the warning area in the picture, the height of the camera in physical space, and the camera's position coordinates in the map.
9. A system for detecting and locating pavement construction, comprising:
the training module is configured to construct a training set of image features and text features, and to train with the training set to obtain a classifier model;
the acquisition module is configured to acquire a video stream with a road-side camera, and to obtain picture frames from the video stream;
the label module is configured to input the picture frame into the classifier model to obtain a classification label, and to judge whether the classification label is pavement construction; and
the warning module is configured to, in response to the classification label being pavement construction, determine a warning area according to the construction marker.
10. An electronic device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, performing the steps of the method of any one of claims 1-8.
11. A roadside unit storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1-8.
CN202311160552.4A 2023-09-08 2023-09-08 Method, system, equipment and roadside unit for detecting and positioning pavement construction Pending CN117173513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311160552.4A CN117173513A (en) 2023-09-08 2023-09-08 Method, system, equipment and roadside unit for detecting and positioning pavement construction

Publications (1)

Publication Number Publication Date
CN117173513A 2023-12-05

Family

ID=88940910

Country Status (1)

Country Link
CN (1) CN117173513A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination