CN116363628A - Mark detection method and device, nonvolatile storage medium and computer equipment - Google Patents


Publication number
CN116363628A
CN116363628A (application CN202310287042.7A)
Authority
CN
China
Prior art keywords
frame
target
detection
history
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310287042.7A
Other languages
Chinese (zh)
Inventor
王佑星
陈博
尹荣彬
张伟伟
徐名源
邱璆
张达明
宋楠楠
薛鸿
许际晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202310287042.7A
Publication of CN116363628A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 - Recognition of traffic signs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements using pattern recognition or machine learning, using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Abstract

The invention discloses a sign detection method and apparatus, a nonvolatile storage medium, and computer equipment. The method includes: acquiring a history frame, a target frame, and motion information of a vehicle, where the history frame and the target frame are images captured by a camera of the vehicle; determining a target area according to the motion information and a history detection frame in the history frame, where the history detection frame marks a traffic sign in the history frame; and generating a target detection frame according to the target area and the target frame, where the target detection frame marks a target traffic sign located in the target area of the target frame. The invention solves the technical problem that schemes for recognizing traffic signs from video images in real time while driving require a large amount of computing power.

Description

Sign detection method and apparatus, nonvolatile storage medium, and computer equipment
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a sign detection method and apparatus, a nonvolatile storage medium, and a computer device.
Background
With the development of artificial intelligence and the growing demand for autonomous driving, traffic-sign detection has become a necessary capability of autonomous vehicles. The detection result not only provides road-regulation input for autonomous-driving functions but also serves as a key input to localization algorithms, so traffic-sign detection is of great significance. Several traffic-sign detection schemes have been proposed, but they still suffer from poor robustness, low accuracy, and insufficient information. For example, schemes in the related art are either computationally heavy, so they cannot meet the real-time requirements of sign recognition, or are prone to missed and false detections.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present invention provide a sign detection method and apparatus, a nonvolatile storage medium, and a computer device, to at least solve the technical problem that schemes for recognizing traffic signs from video images in real time while driving require a large amount of computing power.
According to one aspect of the embodiments of the present invention, a sign detection method is provided, including: acquiring a history frame, a target frame, and motion information of a vehicle, where the history frame and the target frame are images captured by a camera of the vehicle; determining a target area according to the motion information and a history detection frame in the history frame, where the history detection frame marks a traffic sign in the history frame; and generating a target detection frame according to the target area and the target frame, where the target detection frame marks a target traffic sign located in the target area of the target frame.
Optionally, the method further comprises: identifying the target traffic sign to obtain a classification result of the target traffic sign; identifying traffic auxiliary sign information corresponding to the target traffic sign according to the target area and the target detection frame; and outputting a sign detection result corresponding to the target traffic sign according to the classification result and the traffic auxiliary sign information.
Optionally, determining the target area according to the motion information and the history detection frame in the history frame includes: determining, according to the history frame, a first coordinate of the traffic sign marked by the history detection frame in a world coordinate system; predicting, according to the motion information, a second coordinate of that traffic sign in the world coordinate system at the time corresponding to the target frame; generating, according to the second coordinate, a prediction frame corresponding to the history detection frame in the target frame; and generating, according to the prediction frame, the target area comprising the prediction frame.
Optionally, generating the target detection frame according to the target area and the target frame includes: cropping the image corresponding to the target area out of the target frame to obtain a target image; and performing image recognition on the target image to obtain the target detection frame.
Optionally, performing image recognition on the target image to obtain the target detection frame includes: performing image recognition on the target image to obtain initial detection frames; establishing an MHT tree structure based on the history detection frames and the initial detection frames, where the MHT tree structure represents a weak matching relationship between the history detection frames and the initial detection frames; and screening the target detection frame out of the initial detection frames based on the MHT tree structure.
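As an illustrative sketch only (not the patented implementation), the weak matching relationship of the MHT tree structure can be pictured as each history detection frame keeping every initial detection frame that clears a loose overlap gate, so one history frame may have several candidate children that are resolved later by global assignment. The box layout, the IoU gate value, and all names below are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def build_mht_tree(history_boxes, initial_boxes, gate=0.1):
    """Weak matching: each history box keeps every initial box whose
    IoU clears a loose gate, so a history box may have several
    candidate children (resolved later by one-to-one assignment)."""
    return {
        h: [i for i, box in enumerate(initial_boxes) if iou(hbox, box) > gate]
        for h, hbox in enumerate(history_boxes)
    }
```

With one history box and two candidates, only the overlapping candidate survives the gate, leaving a single child hypothesis for that branch of the tree.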
Optionally, screening the target detection frame out of the initial detection frames based on the MHT tree structure includes: screening the target detection frame out of the initial detection frames based on the Hungarian algorithm, where the Hungarian algorithm corrects the weak matching relationship in the MHT tree structure into a one-to-one correspondence between the history detection frames and the target detection frames.
Optionally, screening the target detection frame out of the initial detection frames based on the Hungarian algorithm includes: acquiring a first texture corresponding to the history detection frame and a second texture corresponding to each initial detection frame; and matching the globally optimal detection frames among the initial detection frames one-to-one with the history detection frames according to the first texture, the second texture, and a cost function constructed based on the Hungarian algorithm, to obtain the target detection frame, where the cost function is constructed from texture features and the feature similarity of the textures.
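The one-to-one correction described above can be sketched as a globally optimal assignment under a texture-similarity cost. The cosine-similarity cost and the exhaustive search below are stand-ins for the unspecified cost function and for the Hungarian algorithm itself (which, e.g. via scipy.optimize.linear_sum_assignment, solves the same assignment in polynomial time); all names are illustrative:

```python
from itertools import permutations

def cosine_sim(u, v):
    """Feature similarity between two texture descriptors (assumed cost basis)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def assign_globally(hist_feats, init_feats):
    """Globally optimal one-to-one matching by exhaustive search (a
    stand-in for the Hungarian algorithm on small problems): minimise
    total cost, where cost = 1 - texture similarity."""
    n = len(hist_feats)
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(init_feats)), n):
        cost = sum(1.0 - cosine_sim(hist_feats[h], init_feats[perm[h]])
                   for h in range(n))
        if cost < best_cost:
            best, best_cost = dict(enumerate(perm)), cost
    return best
```

With two orthogonal history textures, the assignment pairs each history frame with the candidate whose texture matches it, even if a greedy first-come pairing would have chosen otherwise; that global optimality is the point of the correction step.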
According to another aspect of the embodiments of the present invention, a sign detection apparatus is provided, including: an acquisition module, configured to acquire a history frame, a target frame, and motion information of a vehicle, where the history frame and the target frame are images captured by a camera of the vehicle; a determining module, configured to determine a target area according to the motion information and a history detection frame in the history frame, where the history detection frame marks a traffic sign in the history frame; and a generating module, configured to generate a target detection frame according to the target area and the target frame, where the target detection frame marks a traffic sign in the target frame that is located in the target area.
According to still another aspect of the embodiments of the present invention, a nonvolatile storage medium is further provided, which includes a stored program; when the program runs, it controls the device where the nonvolatile storage medium is located to execute any one of the above sign detection methods.
According to still another aspect of the embodiments of the present invention, a computer device is further provided, which includes a memory and a processor, where the memory is configured to store a program and the processor is configured to run the program stored in the memory; when run, the program performs any one of the above sign detection methods.
In the embodiments of the present invention, a sign-related association is established between successively captured frames: the history frame and the target frame are linked by the motion information of the vehicle, so that the target detection frame marking the traffic sign in the target frame can be predicted from the history detection frame in the history frame. This achieves efficient and accurate recognition of traffic signs from the real-time video images of the vehicle and solves the technical problem that schemes for recognizing traffic signs from video images in real time while driving require a large amount of computing power.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a computer terminal for implementing a sign detection method;
FIG. 2 is a flowchart of a sign detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a coordinate system conversion relationship provided in accordance with an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of detection frame position estimation in a target frame provided in accordance with an alternative embodiment of the present invention;
FIG. 5 is a schematic illustration of traffic sign detection results provided in accordance with an alternative embodiment of the present invention;
FIG. 6 is a block diagram of a sign detection apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an embodiment of the present invention, an embodiment of a sign detection method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, such as with a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The method embodiment provided in the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a block diagram of the hardware structure of a computer terminal for implementing a sign detection method. As shown in Fig. 1, the computer terminal 10 may include one or more processors (102a, 102b, …, 102n, which may include but are not limited to processing devices such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data. In addition, it may further include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be one of the I/O ports), a network interface, a power supply, and/or a camera. Those of ordinary skill in the art will appreciate that the configuration shown in Fig. 1 is merely illustrative and does not limit the configuration of the electronic device; for example, the computer terminal 10 may include more or fewer components than shown in Fig. 1, or have a different configuration.
It should be noted that the one or more processors and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". A data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module or incorporated, in whole or in part, into any of the other elements in the computer terminal 10. As referred to in the embodiments of the present application, the data processing circuit acts as a kind of processor control (for example, selecting the path of a variable-resistance termination to interface with).
The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the sign detection method in the embodiments of the present invention; the processor executes the software programs and modules stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the sign detection method of the application program. The memory 104 may include high-speed random access memory, and may also include nonvolatile memory, such as one or more magnetic storage devices, flash memory, or other nonvolatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10.
The traffic-sign detection schemes given in the related art have problems such as poor robustness, low accuracy, and insufficient information, for example: 1) Convert the image color gamut to HSV, filter by the admissible color range, and then judge traffic-sign regions and features with a connected-component algorithm. Such methods rely heavily on the uniformity of traffic-sign features, and false detections become very likely if elements resembling traffic-sign features appear in the image. 2) Detect traffic signs directly in images with a deep-learning model, which improves detection accuracy. Although accuracy improves, traffic signs in successive frames are not associated, so a detected sign cannot be judged to be a new target or one that has existed for some time; in addition, large deep-learning networks demand high computing power and often cannot run in real time on vehicle-side domain controllers.
Fig. 2 is a flowchart of a sign detection method according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
step S202, acquiring a history frame, a target frame and motion information of a vehicle, wherein the history frame and the target frame are images shot by a camera of the vehicle.
The target frame may be an image captured by a camera of the vehicle at the current time, and the history frame may be an image captured by the same camera at a historical time (i.e., a time before the current time); for example, the history frame may be the frame immediately preceding the target frame. The motion information describes the motion state of the vehicle from the time the history frame was captured to the time the target frame was captured.
Step S204: determine a target area according to the motion information and a history detection frame in the history frame, where the history detection frame marks a traffic sign in the history frame.
In order to quickly detect traffic signs in the target frame, this embodiment may use the traffic-sign detection results of the history frame to assist detection in the target frame. Optionally, this embodiment may acquire the current frame image information, the kinematic information of the vehicle, and the final traffic-sign detection result of the previous frame (i.e., the history detection frame in the history frame), screen the history detection frames against a preset confidence threshold, and directly discard results that do not meet it, for example detection frames whose confidence is too low.
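The confidence screening step can be sketched as a one-line filter; the box layout (last element is the confidence) and the threshold value are assumptions, not values given in the text:

```python
def filter_history_boxes(history_boxes, conf_threshold=0.5):
    """Discard history detections whose confidence is below a preset
    threshold. Each box is (x1, y1, x2, y2, confidence); both the
    tuple layout and the default threshold are illustrative."""
    return [b for b in history_boxes if b[4] >= conf_threshold]
```

Only the surviving boxes are carried forward into the target-area prediction that follows.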
As an alternative embodiment, determining the target area according to the motion information and the history detection frame in the history frame may include the steps of: determining a first coordinate of a traffic sign marked by a history detection frame in a world coordinate system according to the history frame; predicting a second coordinate of the traffic sign marked by the history detection frame in the world coordinate system at the moment corresponding to the target frame according to the motion information; generating a prediction frame corresponding to the history detection frame in the target frame according to the second coordinate; according to the prediction frame, a target area including the prediction frame is generated.
The possible position of the detection frame in the frame following the history frame (i.e., the target frame) can then be predicted according to the similar-triangle principle. The movement distance and heading-angle change of the vehicle between the two frames are predicted from the vehicle kinematic information (such as vehicle speed and yaw-angle change rate) and converted into the camera coordinate system via the camera extrinsics. Fig. 3 is a schematic diagram of a coordinate-system conversion relationship provided according to an alternative embodiment of the present invention; the relationship of the world, camera, and image coordinate systems is shown in Fig. 3, where O is the origin of the camera coordinate system and p is a point in the image coordinate system. After the position change of the vehicle in the camera coordinate system is obtained from the kinematic information, and because a vehicle camera usually follows the pinhole model, the coordinate systems are related by pinhole imaging, so the approximate position of the detection frame in the current frame can be computed from the detection result of the previous frame using similar triangles.
Fig. 4 is a schematic diagram of detection-frame position estimation in the target frame provided according to an alternative embodiment of the present invention. As shown in Fig. 4, the movement distance and heading-angle change of the vehicle between the history frame and the target frame are first estimated from the vehicle kinematic information, and the distance between the traffic sign detected in the history frame and the vehicle body in the VCS (vehicle coordinate system) is then estimated from the similar-triangle principle:
x_vcs = f_v * s_H / s_imH
where x_vcs is the distance from the traffic sign to the vehicle body in the VCS coordinate system, f_v is the image distance of the camera, s_H is the height of the traffic-sign frame in the VCS coordinate system, and s_imH is the height of the traffic-sign frame in the image coordinate system.
Similarly, the lateral position of the traffic sign in the VCS coordinate system may be calculated by acquiring information of the vanishing point of the camera, the yaw rate of the vehicle, the lateral position of the traffic sign in the image, and the like.
After the position information of the traffic sign in the history frame in the VCS coordinate system is obtained, the possible position of the same traffic sign frame in the target frame in the image coordinate system can be estimated through the kinematic information of the vehicle and the similar triangle principle again. The specific implementation scheme is as follows:
y_im = y_vp + (y_im_hist - y_vp) * x_vcs_hist / (x_vcs_hist - L)
where y_im is the predicted ordinate, in the image coordinate system, of the center point of the target traffic-sign frame in the target frame; y_vp is the ordinate of the camera vanishing point; y_im_hist is the ordinate, in the image coordinate system, of the center point of the traffic-sign detection frame in the history frame; x_vcs_hist is the position of the history-frame traffic sign on the x-axis of the VCS coordinate system; L is the distance travelled by the vehicle between the two frames; and yaw_rate is the change rate of the vehicle yaw angle between the two frames (it enters the analogous computation of the abscissa).
Similarly, the abscissa, in the image coordinate system, of the center point of the predicted current-frame traffic-sign frame can be computed, finally yielding the possible position, in the target frame, of the detection frame predicted from the history detection frame, i.e., the predicted target area in the target frame.
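The similar-triangle estimates described above can be sketched as follows. The formulas are reconstructed under the pinhole assumption from the variable definitions in the text (the original equations are garbled image placeholders), the yaw-rate correction for the abscissa is omitted, and all names are illustrative:

```python
def sign_distance_vcs(f_v, s_h, s_im_h):
    """Pinhole similar triangles: longitudinal range of the sign in
    the VCS equals image distance times real height over image height."""
    return f_v * s_h / s_im_h

def predict_ordinate(y_vp, y_im_hist, x_vcs_hist, travelled):
    """Predicted image-row of the sign centre in the target frame: the
    height above the vanishing point scales inversely with range, so
    moving `travelled` metres closer magnifies the offset."""
    return y_vp + (y_im_hist - y_vp) * x_vcs_hist / (x_vcs_hist - travelled)
```

For example, a sign 50 m away whose centre sits 100 px below the vanishing point appears 200 px below it after the vehicle covers half the distance, consistent with the inverse-range scaling.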
Step S206, generating a target detection frame according to the target area and the target frame, wherein the target detection frame is used for marking a target traffic sign in the target area in the target frame.
Through the above steps, a sign-related association is established between successively captured frames: the history frame and the target frame are linked by the motion information of the vehicle, so that the target detection frame marking the traffic sign in the target frame can be predicted from the history detection frame in the history frame. This achieves efficient and accurate recognition of traffic signs from the real-time video images of the vehicle and solves the technical problem that schemes for recognizing traffic signs from video images in real time while driving require a large amount of computing power.
As an alternative embodiment, the method further includes the following steps: recognizing the target traffic sign to obtain a classification result of the target traffic sign; recognizing auxiliary-sign information corresponding to the target traffic sign according to the target area and the target detection frame; and outputting a sign detection result for the target traffic sign according to the classification result and the auxiliary-sign information. This alternative embodiment provides a way to refine the recognized target traffic sign. Fig. 5 is a schematic diagram of a traffic-sign detection result provided according to an alternative embodiment of the invention. As shown in Fig. 5, a traffic sign usually consists of a traffic-sign graphic and an auxiliary sign: the round sign circled by the upper box, indicating a speed limit of 40 km/h, is the traffic-sign graphic, and the part circled by the lower box, indicating a ramp, is the auxiliary sign. In this alternative embodiment, the classification result of the target traffic sign may be obtained by image recognition of the traffic-sign graphic, and the auxiliary-sign information by recognition of the auxiliary sign. Once the type of the traffic sign has been recognized, combining the graphic with the auxiliary-sign information yields an accurate traffic-sign detection result and improves the accuracy of the detection result.
As an alternative embodiment, four deep learning models may be used to detect traffic signs, the four models being a full-map detection model, a rapid detection model, an auxiliary sign detection model, and a classification model, respectively. Four models are briefly described in turn:
full-view detection model: the input of the model is a complete image of a frame, for example, a history frame or a target frame, and the model detects traffic signs in the whole image area of the frame image, but does not classify the content. The model can judge whether to set a frame-separation calling scheme or not based on the design and the calculation condition of the operation platform, and if the frame-separation calling is adopted, the quick detection model is required to be called in the image frames without calling the full-image detection model for supplementation. For example, if the calculation power of the vehicle platform system is sufficient, each frame of image captured by the camera on the vehicle may be input into the full-image detection model, and the full-image detection model may sequentially detect the traffic sign from each frame of image. If the calculation force of the vehicle platform system is calculated, the full-image detection model can be called once every 1 frame or any multi-frame, and the rapid detection model is called at the middle interval frame to detect the area of the traffic sign board.
Fast detection model: its input is a region-of-interest (ROI) image; the ROI range is selected around the predicted position, in the current frame, of the detection frame from the previous frame, and traffic signs are detected within that range without classifying their content. Its function is essentially the same as that of the full-image detection model, but because the input is smaller, a model with far fewer parameters can be used, which markedly improves operating efficiency. Optionally, the input to the fast detection model may be the target frame of the foregoing embodiments, and the detection frame of the previous frame may be the history detection frame of the history frame; the fast detection model divides the target area in the target frame according to the history detection frame and the motion information of the vehicle, generates the target detection frame from the target area, and detects the target traffic sign in the image range of the target frame corresponding to the target detection frame. That is, if the target traffic sign lies inside the history detection frame of the history frame, it is highly likely to lie inside the target detection frame of the target frame. Because the fast detection model does not need to run complex full-image detection on the target frame, it saves substantial computing power and speeds up traffic-sign detection.
Classification model: the input is the image inside a frame produced by the full-image detection model or the fast detection model. The model classifies the content of that image and outputs the sign detection result corresponding to the target traffic sign, such as a speed limit with its specific value, cancellation of a speed limit with its specific value, no stopping, and so on.
Auxiliary sign detection model: the input is an ROI image obtained from the position of the traffic sign already detected in the current target frame, and auxiliary signs attached to the traffic sign are detected within that range, for example the auxiliary sign in fig. 5, i.e., the part circled by the lower box of fig. 5 representing "ramp". This model mainly extends the traffic-sign detection information and helps determine which lane the sign content applies to.
In a practical application scenario, after the position of the prediction frame is obtained, either the full-image detection model or the fast detection model is called according to the scheme preset in the application: if the full-image detection model is called, the detection frame is obtained directly and, after de-distortion, subsequent processing continues; if the fast detection model is called, the prediction-frame position is expanded to obtain the ROI, detection is performed within that region, and de-distortion is applied after the target detection frame is obtained. The fast detection model may be a lightweight deep learning model that meets both accuracy and real-time requirements.
Using the fast detection model effectively reduces the miss rate. To further improve efficiency and suppress false detections, the fast detection model may also apply the following processing:
The detection frames in the image fed into the fast detection model are sorted by size, and larger frames are selected for recognition first, because a larger detection frame means the traffic sign inside it is closer to the vehicle and therefore has higher detection priority. Meanwhile, if a small frame lies within the range of a large frame, the small frame is not detected.
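The size-priority ordering and the suppression of small frames nested inside large ones can be sketched as follows; this is an illustrative sketch, with function names chosen here rather than taken from the patent:

```python
def order_boxes_for_recognition(boxes):
    """Sort boxes (x1, y1, x2, y2) by area, largest first, and drop any
    box fully contained in an already-selected larger box."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def contains(outer, inner):
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and outer[2] >= inner[2] and outer[3] >= inner[3])

    kept = []
    for box in sorted(boxes, key=area, reverse=True):
        # skip a small frame lying entirely inside a kept larger frame
        if not any(contains(big, box) for big in kept):
            kept.append(box)
    return kept
```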
All fast-detection results for traffic signs in the image are collected, non-overlapping detection frames are obtained with the NMS (Non-Maximum Suppression) algorithm, and the confidence result of each detection frame is returned.
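Greedy NMS as referenced above can be sketched in a few lines. This is the standard textbook form of the algorithm, not code from the patent; the IoU threshold of 0.5 is an assumed default:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-confidence box, drop boxes overlapping it above
    iou_thresh, and repeat; returns (box, confidence) pairs."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return [(boxes[i], scores[i]) for i in kept]
```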
After the fast detection model finishes, all detection frames are preliminarily screened by the confidence output by the model, and frames below the confidence threshold are deleted directly. An MHT tree structure is then built over the detection frames and prediction frames to establish an initial matching relationship. A prediction frame is the detection frame in the target frame predicted from a history detection frame in the history frame. Each prediction frame can be reasonably expanded to obtain its corresponding ROI. Because the ROI may be large or small, it may contain several initial detection frames (i.e., candidate frames within the ROI), and it must be determined which of them corresponds one-to-one with the history detection frame, i.e., which is the target detection frame (the one that marks the same real-world traffic sign as the history detection frame). For each prediction frame, detection frames whose distance is below a threshold are stored as candidate frames to be matched, yielding a preliminary one-to-many weak matching relationship: one history detection frame corresponds to several candidate frames.
As an optional embodiment, performing image recognition on the target image to obtain a target detection frame, including the following steps: performing image recognition on the target image to obtain an initial detection frame; based on the history detection frame and the initial detection frame, establishing an MHT tree structure, wherein the MHT tree structure is used for representing a weak matching relationship between the history detection frame and the initial detection frame; and screening the initial detection frame based on the MHT tree structure to obtain a target detection frame.
Further, the candidate frames output by the model can be screened against preset width and height thresholds. If a candidate frame lies at the edge of the image it may be discarded: the image edge suffers more distortion and is prone to false detection, and a traffic sign at the edge means the vehicle is about to pass it, so its information is usually no longer important. Likewise, a candidate frame located below the vanishing point of the image is discarded, since, as a matter of common knowledge, traffic signs do not appear below the horizon. After this screening, the candidate frames retaining a weak association with prediction frames are added to an array to be classified. If the number of elements in that array is below a threshold, it is judged later whether the detection-frame results need to be supplemented in the current frame. Finally, all detection frames to be recognised are returned, completing the detection stage.
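The prior-based screening above (size thresholds, image edge, vanishing point) can be sketched as a single filter; all threshold values here are illustrative assumptions:

```python
def filter_candidates(boxes, img_w, img_h, min_wh, max_wh,
                      edge_margin, vanish_y):
    """Discard candidate boxes that fail the prior checks: implausible
    width/height, touching the image edge (distortion, sign almost
    passed), or starting below the vanishing point (no signs appear
    under the horizon)."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if not (min_wh <= w <= max_wh and min_wh <= h <= max_wh):
            continue
        if x1 < edge_margin or x2 > img_w - edge_margin:
            continue  # at the image edge
        if y1 > vanish_y:
            continue  # entirely below the vanishing point
        kept.append((x1, y1, x2, y2))
    return kept
```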
As an alternative embodiment, generating a target detection frame according to the target area and the target frame includes: cutting off an image corresponding to a target area in a target frame to obtain a target image; and carrying out image recognition on the target image to obtain a target detection frame.
As an alternative embodiment, obtaining the target detection frame by screening from the initial detection frames based on the MHT tree structure includes the following step: screening the initial detection frames to obtain the target detection frame based on the Hungarian algorithm, where the Hungarian algorithm is used to correct the weak matching relationship in the MHT tree structure into a one-to-one correspondence between the history detection frame and the target detection frame.
As an alternative embodiment, obtaining the target detection frame by screening from the initial detection frames based on the Hungarian algorithm includes: acquiring a first texture corresponding to the history detection frame and a second texture corresponding to each initial detection frame; and matching the globally optimal detection frames among the initial detection frames one-to-one with the history detection frames according to the first texture, the second texture, and a cost function constructed on the Hungarian algorithm, to obtain the target detection frame, where the cost function is constructed from texture features and the feature similarity of those textures.
The optional embodiments above include a method for determining the target detection frame; specifically, the following optional embodiment may be used for processing:
A post-processing module is adopted, whose main function is to establish one-to-one matches between prediction frames and detection frames; in the process it completes traffic-sign classification and auxiliary-sign detection, adjusts the size of the detection result through a filter, and outputs the final detection result.
First, the classification model may be invoked on all retained detection-frame results (candidate frames) to obtain the attributes of the traffic sign in each frame. The one-to-many association relationship is then corrected into a one-to-one association through the Hungarian algorithm. The Hungarian algorithm is a combinatorial optimisation algorithm that solves the assignment problem in polynomial time; together with a suitable cost function it produces a globally optimal one-to-one matching between prediction frames and the target detection frames among the initial detection frames. The cost function may consider, for any pair of prediction frame and candidate frame, their distance, aspect ratio, area difference, feature difference and so on, helping to decide which of the currently detected frames matches the traffic sign in the history detection frame of the history frame. Specifically, matching can be performed on texture features from a CNN, using texture differences between targets to distinguish different objects: texture features yield feature vectors, and the difference between feature vectors can be computed via Euclidean distance, cosine distance and the like. The more similar two objects are, the smaller their feature distance, and vice versa, so the feature distance can judge whether objects in consecutive frames are similar enough to be associated. In this way, the one-to-many matches in the MHT tree structure are initially optimised with the Hungarian algorithm.
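The Hungarian step with a cosine-distance cost over CNN appearance features might look like the sketch below. It uses SciPy's assignment solver; the function name and the `max_cost` gating threshold are assumptions, not details from the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_features(pred_feats, det_feats, max_cost=0.5):
    """Globally optimal one-to-one matching of prediction frames to
    candidate detections, using cosine distance between appearance
    (texture) feature vectors as the assignment cost."""
    pred = np.asarray(pred_feats, dtype=float)
    det = np.asarray(det_feats, dtype=float)
    pn = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    dn = det / np.linalg.norm(det, axis=1, keepdims=True)
    cost = 1.0 - pn @ dn.T          # cosine-distance matrix
    rows, cols = linear_sum_assignment(cost)
    # drop pairs whose cost is too high to plausibly be the same sign
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if cost[r, c] <= max_cost]
```

Pairs rejected by the `max_cost` gate would fall through to the secondary IoU/centre-distance matching described next in the text.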
For detection results that cannot be matched via CNN features, a cost function may be rebuilt from the IoU (Intersection over Union) and the centre distance between the detection result and the prediction frame, and a second round of Hungarian matching is performed, finally yielding the globally optimal one-to-one matching relationship between history detection frames and target detection frames.
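The secondary cost combining IoU and centre distance can be sketched as below; the blend weight `alpha` and the normalisation by the image diagonal are assumed choices for illustration:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def secondary_cost(pred, det, diag, alpha=0.5):
    """Blend of (1 - IoU) and centre distance normalised by the image
    diagonal; lower cost means a more plausible match. This matrix
    would feed a second Hungarian pass."""
    pc = ((pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0)
    dc = ((det[0] + det[2]) / 2.0, (det[1] + det[3]) / 2.0)
    dist = ((pc[0] - dc[0]) ** 2 + (pc[1] - dc[1]) ** 2) ** 0.5
    return alpha * (1.0 - iou(pred, det)) + (1.0 - alpha) * dist / diag
```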
According to the matching result, the information of each successfully matched target detection frame can be updated into the matched prediction-frame structure, preserving the continuity of inter-frame relationships. If a target detection frame fails to match and the candidate frame is not at the image edge, it is added to the prediction-frame array and enters the processing sequence of the next frame; this indicates that the traffic sign in the candidate frame has appeared for the first time in the target frame, and the frame is used to predict that sign in the images after the target frame.
Through the established one-to-one matching relationship, the attribute information of the history detection frame in the history frame is obtained; the current attribute and the history attributes are then voted on, the detection-result attribute with the most votes is selected, and comparison with a threshold determines whether it meets the output standard.
The voting queue can be designed as a fixed-length sliding window, which reduces false detections and improves the accuracy of the output. If a voting queue of length 5 is created, every traffic-sign object requires matched detection results in at least 5 consecutive frames, and the final attribute output is the attribute with the most votes over those 5 rounds of voting. For example, if the voting results of a detected traffic sign over consecutive frames are: no stopping, limit 70, limit 60, limit 60, then limit 60 is output as the final detection result.
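The sliding-window majority vote can be sketched with standard-library containers; the class name is an assumption made for this illustration:

```python
from collections import Counter, deque

class AttributeVoter:
    """Fixed-length sliding-window majority vote over per-frame
    classification results, as in the length-5 voting queue above."""
    def __init__(self, window=5):
        self.votes = deque(maxlen=window)  # oldest vote drops out

    def update(self, attribute):
        """Record this frame's attribute and return the current winner."""
        self.votes.append(attribute)
        return Counter(self.votes).most_common(1)[0][0]
```

A separate threshold check on the winner's vote count would then decide whether the result meets the output standard.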
After classification is complete, for speed-limit traffic signs of larger overall size, the auxiliary sign recognition model is called to check whether the recognition result contains auxiliary information such as "ramp", so as to determine the lanes the traffic sign applies to. The model may be configured to be called at intervals to save compute, and need not be called if the total vote count of the attribute already exceeds a threshold. The auxiliary sign recognition logic is as follows:
As shown in fig. 5, the upper frame (the one enclosing the "40" sign) is the detection frame in the current frame; it is expanded to obtain the outermost frame (the ROI frame) enclosing the whole signboard, and the auxiliary sign detection model is called within the ROI to obtain the lower frame (the one enclosing the "ramp" sign) and the auxiliary-sign information inside it.
The prediction-frame array is then maintained: objects that fail to establish a match over several consecutive frames are deleted, and pairs of prediction frames with high overlap are merged. Finally, a Kalman filtering algorithm is invoked on all objects in the array, with the predicted width and height for the current frame as input and the width and height of the current detection frame as observations; the Kalman gain combines the two to obtain the final detection-frame size. Invoking the Kalman filter makes the detection-frame width and height more stable: it weights the prediction-frame size against the current detection-frame size and fine-tunes the boundary of the current detection frame.
The specific Kalman filtering flow is as follows:

$\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k$ (predicted state estimate)

$P_{k|k-1} = F_k P_{k-1|k-1} F_k^{\mathrm{T}} + Q_k$ (predicted estimate covariance)

$\tilde{y}_k = z_k - H_k \hat{x}_{k|k-1}$ (measurement residual)

$S_k = H_k P_{k|k-1} H_k^{\mathrm{T}} + R_k$ (measurement residual covariance)

$K_k = P_{k|k-1} H_k^{\mathrm{T}} S_k^{-1}$ (optimal Kalman gain)

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k$ (updated state estimate)

$P_{k|k} = (I - K_k H_k) P_{k|k-1}$ (updated covariance estimate)
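The patent gives no concrete noise parameters, so the sketch below is a minimal 1-D Kalman filter specialised to smoothing a single box dimension (width or height) with F = H = 1; the process noise `q` and measurement noise `r` values are assumptions:

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter for smoothing a detection-frame
    dimension. With F = H = 1, prediction carries the previous estimate
    forward and the gain blends in the new measurement."""
    def __init__(self, x0, p0=1.0, q=0.01, r=0.25):
        self.x, self.p = x0, p0   # state estimate and its covariance
        self.q, self.r = q, r     # process and measurement noise

    def update(self, z):
        # predict
        x_pred = self.x
        p_pred = self.p + self.q
        # correct
        k = p_pred / (p_pred + self.r)       # optimal Kalman gain
        self.x = x_pred + k * (z - x_pred)   # updated state estimate
        self.p = (1.0 - k) * p_pred          # updated covariance
        return self.x
```

Feeding the predicted width as the state and the detected width as the observation `z` yields a smoothed width between the two, which matches the weighting behaviour described for the detection-frame boundary.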
At this point the whole traffic-sign processing flow is complete, and the detection result of the traffic sign is finally output.
The whole detection flow is as follows:
Step S1, predict the position of the history detection frame from the history frame, obtaining its predicted position in the current target frame and hence the ROI in the target frame;

Step S2, expand the ROI, call the detection model, and first remove some invalid detection frames according to prior information, for example frames located at the image edge of the target frame;

Step S3, establish a rough match between prediction frames and detection frames using the MHT tree structure;

Step S4, select high-priority detection frames as recognition candidates;

Step S5, call the classification model to recognise the content in the detection frames;

Step S6, establish a one-to-one matching relationship between prediction frames and detection frames through the Hungarian algorithm;

Step S7, establish each unmatched detection frame as a new prediction-frame target, since the traffic sign in it is likely one that has not appeared in the history frames;

Step S8, call the auxiliary sign model for detection;

Step S9, correct the position and size of the traffic-sign detection frame through Kalman filtering;

Step S10, output the position of the traffic-sign detection frame.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the mark detection method according to the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by hardware, though in many cases the former is preferred. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method of the various embodiments of the present invention.
According to an embodiment of the present invention, there is further provided a mark detection device for implementing the above mark detection method. Fig. 6 is a structural block diagram of the mark detection device provided according to an embodiment of the present invention; as shown in fig. 6, the mark detection device includes: an acquisition module 62, a determination module 64 and a generation module 66, which are described below.
The acquiring module 62 is configured to acquire a history frame, a target frame, and motion information of a vehicle, where the history frame and the target frame are images captured by a camera of the vehicle;
a determining module 64, configured to determine a target area according to the motion information and a history detection frame in the history frame, where the history detection frame is used to mark a traffic sign in the history frame;
the generating module 66 is configured to generate a target detection frame according to the target area and the target frame, where the target detection frame is configured to label traffic signs located in the target area in the target frame.
Here, the above acquiring module 62, determining module 64 and generating module 66 correspond to steps S202 to S206 in the embodiment; the examples and application scenarios implemented by the three modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the above embodiment. It should be noted that the above modules may run as part of the apparatus in the computer terminal 10 provided in the embodiment.
Embodiments of the present invention may provide a computer device. Optionally, in this embodiment, the computer device may be located in at least one of a plurality of network devices in a computer network. The computer device includes a memory and a processor.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the mark detection method and apparatus in the embodiments of the present invention; the processor executes the software programs and modules stored in the memory, thereby performing various functional applications and data processing, that is, implementing the mark detection method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring a history frame, a target frame and motion information of a vehicle, wherein the history frame and the target frame are images obtained by shooting a camera of the vehicle; determining a target area according to the motion information and a history detection frame in a history frame, wherein the history detection frame is used for marking traffic signs in the history frame; and generating a target detection frame according to the target area and the target frame, wherein the target detection frame is used for marking a target traffic sign in the target area in the target frame.
Optionally, the above processor may further execute program code for: identifying a target traffic sign to obtain a classification result of the target traffic sign; identifying traffic auxiliary sign information corresponding to the target traffic sign according to the target area and the target detection frame; and outputting a sign detection result corresponding to the target traffic sign according to the classification result and the traffic auxiliary sign information.
Optionally, the above processor may further execute program code for: determining a target area according to the motion information and a history detection frame in a history frame, wherein the method comprises the following steps: determining a first coordinate of a traffic sign marked by a history detection frame in a world coordinate system according to the history frame; predicting a second coordinate of the traffic sign marked by the history detection frame in the world coordinate system at the moment corresponding to the target frame according to the motion information; generating a prediction frame corresponding to the history detection frame in the target frame according to the second coordinate; according to the prediction frame, a target area including the prediction frame is generated.
Optionally, the above processor may further execute program code for: generating a target detection frame according to the target area and the target frame, including: cutting off an image corresponding to a target area in a target frame to obtain a target image; and carrying out image recognition on the target image to obtain a target detection frame.
Optionally, the above processor may further execute program code for: image recognition is carried out on the target image to obtain a target detection frame, which comprises the following steps: performing image recognition on the target image to obtain an initial detection frame; based on the history detection frame and the initial detection frame, establishing an MHT tree structure, wherein the MHT tree structure is used for representing a weak matching relationship between the history detection frame and the initial detection frame; and screening the initial detection frame based on the MHT tree structure to obtain a target detection frame.
Optionally, the above processor may further execute program code for: obtaining the target detection frame by screening from the initial detection frames based on the MHT tree structure, including the following step: screening the initial detection frames to obtain the target detection frame based on the Hungarian algorithm, where the Hungarian algorithm is used to correct the weak matching relationship in the MHT tree structure into a one-to-one correspondence between the history detection frame and the target detection frame.
Optionally, the above processor may further execute program code for: obtaining the target detection frame by screening from the initial detection frames based on the Hungarian algorithm, including: acquiring a first texture corresponding to the history detection frame and a second texture corresponding to each initial detection frame; and matching the globally optimal detection frames among the initial detection frames one-to-one with the history detection frames according to the first texture, the second texture, and a cost function constructed on the Hungarian algorithm, to obtain the target detection frame, where the cost function is constructed from texture features and the feature similarity of those textures.
Those skilled in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device; the program may be stored in a non-volatile storage medium, and the storage medium may include: a flash disk, Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Embodiments of the present invention also provide a non-volatile storage medium. Optionally, in the present embodiment, the above non-volatile storage medium may be used to store program code for executing the mark detection method provided in the above embodiment.
Alternatively, in this embodiment, the above-mentioned nonvolatile storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: acquiring a history frame, a target frame and motion information of a vehicle, wherein the history frame and the target frame are images obtained by shooting a camera of the vehicle; determining a target area according to the motion information and a history detection frame in a history frame, wherein the history detection frame is used for marking traffic signs in the history frame; and generating a target detection frame according to the target area and the target frame, wherein the target detection frame is used for marking a target traffic sign in the target area in the target frame.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: identifying a target traffic sign to obtain a classification result of the target traffic sign; identifying traffic auxiliary sign information corresponding to the target traffic sign according to the target area and the target detection frame; and outputting a sign detection result corresponding to the target traffic sign according to the classification result and the traffic auxiliary sign information.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: determining a target area according to the motion information and a history detection frame in a history frame, wherein the method comprises the following steps: determining a first coordinate of a traffic sign marked by a history detection frame in a world coordinate system according to the history frame; predicting a second coordinate of the traffic sign marked by the history detection frame in the world coordinate system at the moment corresponding to the target frame according to the motion information; generating a prediction frame corresponding to the history detection frame in the target frame according to the second coordinate; according to the prediction frame, a target area including the prediction frame is generated.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: generating a target detection frame according to the target area and the target frame, including: cutting off an image corresponding to a target area in a target frame to obtain a target image; and carrying out image recognition on the target image to obtain a target detection frame.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: image recognition is carried out on the target image to obtain a target detection frame, which comprises the following steps: performing image recognition on the target image to obtain an initial detection frame; based on the history detection frame and the initial detection frame, establishing an MHT tree structure, wherein the MHT tree structure is used for representing a weak matching relationship between the history detection frame and the initial detection frame; and screening the initial detection frame based on the MHT tree structure to obtain a target detection frame.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: obtaining the target detection frame by screening from the initial detection frames based on the MHT tree structure, including the following step: screening the initial detection frames to obtain the target detection frame based on the Hungarian algorithm, where the Hungarian algorithm is used to correct the weak matching relationship in the MHT tree structure into a one-to-one correspondence between the history detection frame and the target detection frame.
Optionally, in the present embodiment, the non-volatile storage medium is arranged to store program code for performing the steps of: obtaining the target detection frame by screening from the initial detection frames based on the Hungarian algorithm, including: acquiring a first texture corresponding to the history detection frame and a second texture corresponding to each initial detection frame; and matching the globally optimal detection frames among the initial detection frames one-to-one with the history detection frames according to the first texture, the second texture, and a cost function constructed on the Hungarian algorithm, to obtain the target detection frame, where the cost function is constructed from texture features and the feature similarity of those textures.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of units may be a logic function division, and there may be another division manner in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a non-volatile storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of detecting a marker, comprising:
acquiring a history frame, a target frame and motion information of a vehicle, wherein the history frame and the target frame are images obtained by shooting a camera of the vehicle;
determining a target area according to the motion information and a history detection frame in the history frame, wherein the history detection frame is used for marking traffic signs in the history frame;
and generating a target detection frame according to the target area and the target frame, wherein the target detection frame is used for marking a target traffic sign in the target area in the target frame.
2. The method according to claim 1, wherein the method further comprises:
identifying the target traffic sign to obtain a classification result of the target traffic sign;
identifying traffic auxiliary sign information corresponding to the target traffic sign according to the target area and the target detection frame;
And outputting a sign detection result corresponding to the target traffic sign according to the classification result and the traffic auxiliary sign information.
3. The method of claim 1, wherein the determining the target area based on the motion information and a history detection box in the history frame comprises:
according to the history frame, determining a first coordinate of a traffic sign marked by the history detection frame in a world coordinate system;
predicting a second coordinate of the traffic sign marked by the history detection frame in the world coordinate system at the time corresponding to the target frame according to the motion information;
generating a prediction frame corresponding to the history detection frame in the target frame according to the second coordinate;
and generating the target area comprising the prediction frame according to the prediction frame.
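The prediction step of claim 3 can be illustrated with a simplified geometric sketch. The following assumes a static traffic sign (so its world coordinate at the target-frame time equals the first coordinate), a planar vehicle pose `(x, y, yaw)` derived from the motion information, and an idealised pinhole camera looking along the vehicle's forward axis; all function names, the camera model, and the axis conventions are illustrative assumptions, not taken from the patent.

```python
import math

def predict_box_center(sign_world, ego_pose, focal, cx, cy):
    """Predict the pixel centre of a previously detected (static) traffic
    sign in the target frame.

    sign_world: (x, y, z) of the sign in the world frame (metres),
                with z the height above the ground plane.
    ego_pose:   (x, y, yaw) vehicle pose at the target-frame time,
                predicted from the vehicle's motion information.
    focal, cx, cy: pinhole focal length (pixels) and principal point.
    """
    ex, ey, yaw = ego_pose
    dx, dy = sign_world[0] - ex, sign_world[1] - ey
    # World -> vehicle frame: x_v forward, y_v left, z_v up.
    x_v = math.cos(yaw) * dx + math.sin(yaw) * dy
    y_v = -math.sin(yaw) * dx + math.cos(yaw) * dy
    z_v = sign_world[2]
    # Pinhole projection; the forward distance x_v acts as the depth.
    u = cx - focal * y_v / x_v
    v = cy - focal * z_v / x_v
    return (u, v)
```

A prediction frame around this centre (scaled inversely with the depth `x_v`) then defines the target area; as the vehicle approaches the sign, the predicted centre rises in the image and the frame grows, which is the behaviour the search region must accommodate.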
4. The method of claim 1, wherein generating a target detection box from the target region and the target frame comprises:
cropping an image corresponding to the target area from the target frame to obtain a target image;
and carrying out image recognition on the target image to obtain the target detection frame.
5. The method of claim 4, wherein the performing image recognition on the target image to obtain the target detection frame includes:
performing image recognition on the target image to obtain an initial detection frame;
establishing an MHT tree structure based on the history detection frame and the initial detection frame, wherein the MHT tree structure is used for representing a weak matching relationship between the history detection frame and the initial detection frame;
and screening the target detection frame from the initial detection frame based on the MHT tree structure.
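The weak matching relationship of claim 5 can be sketched as one level of an MHT (multiple hypothesis tracking) tree: each history box keeps every initial box it plausibly corresponds to, and the later Hungarian step prunes these hypotheses down to a one-to-one matching. The overlap criterion (IoU with a low threshold) and the flat dictionary representation below are illustrative choices, not details from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def build_hypothesis_level(history_boxes, initial_boxes, min_iou=0.1):
    """One level of an MHT-style hypothesis tree.

    Maps each history-box index to the list of initial-box indices that
    overlap it at all (a weak, one-to-many matching to be resolved later).
    """
    return {h: [i for i, box in enumerate(initial_boxes)
                if iou(history_boxes[h], box) >= min_iou]
            for h in range(len(history_boxes))}
```

Keeping the matching deliberately weak at this stage means a history box is not lost when the detector returns several nearby candidates; the cost-function matching then selects the globally optimal child of each node.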
6. The method of claim 5, wherein the screening the target detection box from the initial detection box based on the MHT tree structure comprises:
and screening the target detection frame from the initial detection frame based on the Hungarian algorithm, wherein the Hungarian algorithm is used for correcting the weak matching relationship in the MHT tree structure into a one-to-one correspondence relationship between the history detection frame and the target detection frame.
7. The method according to claim 6, wherein the screening the target detection frame from the initial detection frames based on the Hungarian algorithm comprises:
acquiring a first texture corresponding to the history detection frame and a second texture corresponding to the initial detection frame, respectively;
and matching the globally optimal detection frame among the initial detection frames with the history detection frame in a one-to-one correspondence according to the first texture, the second texture, and a cost function constructed based on the Hungarian algorithm, to obtain the target detection frame, wherein the cost function is constructed based on texture features and the feature similarity of the textures.
8. A sign detection apparatus, comprising:
the acquisition module is used for acquiring historical frames, target frames and motion information of a vehicle, wherein the historical frames and the target frames are images obtained by shooting of a camera of the vehicle;
the determining module is used for determining a target area according to the motion information and a history detection frame in the history frame, wherein the history detection frame is used for marking traffic marks in the history frame;
the generation module is used for generating a target detection frame according to the target area and the target frame, wherein the target detection frame is used for marking traffic signs in the target frame, and the traffic signs are located in the target area.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein the program, when run, controls a device in which the non-volatile storage medium is located to perform the method for detecting a flag according to any one of claims 1 to 7.
10. A computer device comprising a memory for storing a program and a processor for executing the program stored in the memory, wherein the program is operative to perform the method of any one of claims 1 to 7.
CN202310287042.7A 2023-03-22 2023-03-22 Mark detection method and device, nonvolatile storage medium and computer equipment Pending CN116363628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310287042.7A CN116363628A (en) 2023-03-22 2023-03-22 Mark detection method and device, nonvolatile storage medium and computer equipment


Publications (1)

Publication Number Publication Date
CN116363628A true CN116363628A (en) 2023-06-30

Family

ID=86928177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310287042.7A Pending CN116363628A (en) 2023-03-22 2023-03-22 Mark detection method and device, nonvolatile storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN116363628A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523535A (en) * 2024-01-08 2024-02-06 浙江零跑科技股份有限公司 Traffic sign recognition method, terminal equipment and storage medium
CN117523535B (en) * 2024-01-08 2024-04-12 浙江零跑科技股份有限公司 Traffic sign recognition method, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
EP3806064B1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
EP2858008A2 (en) Target detecting method and system
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN112613387A (en) Traffic sign detection method based on YOLOv3
CN111553302B (en) Key frame selection method, device, equipment and computer readable storage medium
CN116363628A (en) Mark detection method and device, nonvolatile storage medium and computer equipment
CN112837404B (en) Method and device for constructing three-dimensional information of planar object
CN112241963A (en) Lane line identification method and system based on vehicle-mounted video and electronic equipment
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN111368728A (en) Safety monitoring method and device, computer equipment and storage medium
CN109960990B (en) Method for evaluating reliability of obstacle detection
CN116665179A (en) Data processing method, device, domain controller and storage medium
CN113721240B (en) Target association method, device, electronic equipment and storage medium
CN115761668A (en) Camera stain recognition method and device, vehicle and storage medium
CN115346184A (en) Lane information detection method, terminal and computer storage medium
CN114898321A (en) Method, device, equipment, medium and system for detecting road travelable area
CN115063594B (en) Feature extraction method and device based on automatic driving
CN115359460B (en) Image recognition method and device for vehicle, vehicle and storage medium
CN116228834B (en) Image depth acquisition method and device, electronic equipment and storage medium
CN115761616B (en) Control method and system based on storage space self-adaption
CN115049895B (en) Image attribute identification method, attribute identification model training method and device
CN114782284B (en) Motion data correction method, device, equipment and storage medium
CN115994934B (en) Data time alignment method and device and domain controller
CN117611800A (en) YOLO-based target grounding point detection and ranging method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination