Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For a better understanding of the embodiments of the present application, some of the terms or expressions appearing in the course of describing the embodiments of the present application are to be interpreted as follows:
Optical Character Recognition (OCR): a process in which electronic equipment examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer characters by a character recognition method. For printed characters, it is a technology that optically converts the characters in a paper document into an image file of a black-and-white dot matrix, and then converts the characters in the image into a text format through recognition software for further editing and processing by word processing software.
Classification algorithm: a classifier is trained from a group of samples whose class labels are known, so that it can classify unknown samples. Classification belongs to supervised learning; the classification process establishes a classification model to describe a predetermined data set or concept set, constructing the model by analyzing database tuples described by attributes. The aim is to partition new data sets using the classifier, and the main concerns include the accuracy of the classification rules, overfitting, and trade-offs among conflicting partitions. Commonly used classification algorithms include: Bayesian classification algorithms, logistic regression algorithms, decision tree algorithms, artificial neural network algorithms, etc.
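As a purely illustrative sketch (not part of the claimed method), the train-then-classify pattern described above can be shown with a minimal nearest-centroid classifier in Python; all names and the toy feature vectors are hypothetical:

```python
from statistics import mean

def train_nearest_centroid(samples):
    """Train a classifier from (feature_vector, label) pairs by
    computing one centroid per class label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    # Centroid = component-wise mean of the class's feature vectors.
    return {label: tuple(mean(col) for col in zip(*vecs))
            for label, vecs in by_label.items()}

def classify(centroids, features):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

# Toy example: two classes separable in a 2-D feature space.
model = train_nearest_centroid([
    ((0.0, 1.0), "non-target"), ((0.2, 0.8), "non-target"),
    ((1.0, 0.1), "target"), ((0.9, 0.2), "target"),
])
print(classify(model, (0.95, 0.1)))  # nearest to the "target" centroid
```

Any of the named algorithm families (Bayesian, logistic regression, decision trees, neural networks) could stand in for this toy rule; only the train/classify interface matters here.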
Example 1
In accordance with an embodiment of the present application, there is provided an embodiment of a sign detection method, it should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than that illustrated herein.
Fig. 1 is a schematic flow chart of an alternative sign detection method according to an embodiment of the present application, as shown in fig. 1, the method at least includes steps S102-S106, wherein:
step S102, a first image set to be detected is obtained.
Specifically, the first image set may be a plurality of sets of road information images captured by a vehicle-mounted camera, where a part of the images may include traffic prohibition signboards, such as weight-limiting, height-limiting, width-limiting, and speed-limiting signboards, and because the weight-limiting, width-limiting, and height-limiting signboards belong to secret-related information and need to be desensitized, a final objective of the present application is to identify images including three types of target signboards, i.e., weight-limiting, width-limiting, and height-limiting, from the first image set and then perform desensitization on the target signboards therein.
It should be noted that, when detecting a video, the actual detection process still detects frame-by-frame images in the video, and at this time, all frame images in the video may be used as the first image set.
And step S104, inputting the first image set into a target detection model for detection, and outputting to obtain a second image set, wherein the target detection model is used for detecting the first signboard image in the input image, and the second image set comprises the first signboard image.
The first signboard mainly refers to a traffic prohibition signboard, and the first signboard image is an image including the traffic prohibition signboard, which includes but is not limited to: weight-limiting signboard images, height-limiting signboard images, width-limiting signboard images, speed-limiting signboard images, bearing weight-limiting signboard images and the like.
In some optional embodiments of the present application, in order to preferentially ensure a high recall rate of the target signboard, the images in the first image set may be initially screened by the target detection model to find all the first signboard images including the traffic prohibition signboard, so as to obtain the second image set.
The target detection model can be obtained through the following training process: firstly, acquiring a sample image comprising a first signboard and marking information corresponding to the sample image; and training the initial detection model according to the sample image and the labeling information to obtain a target detection model.
Specifically, the sample image may be an image or a video including a traffic prohibition signboard collected by the vehicle-mounted camera, and the image or the video carries the label information of the corresponding traffic prohibition signboard; the initial detection model can adopt detection models such as FasterRCNN, RetinaNet, CenterNet, Yolo series and the like commonly used in deep learning, and the initial detection model is trained by using the sample image and the labeling information, so that the target detection model capable of detecting the traffic prohibition signboard in the image can be obtained.
In a specific detection process, any first image in the first image set is input into the target detection model for detection; when the target detection model detects that the first image is a first signboard image, a detection box (bounding box) can be added to the first image to mark the first signboard; the target detection model then outputs the first image as one image in the second image set. In this way, the target detection model can generally reach a recall rate of more than 95% for the target signboard.
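The primary-screening stage just described can be sketched as follows; the detector itself is stubbed out (a trained Faster R-CNN, RetinaNet, CenterNet, or YOLO model would go in its place), and all names here are hypothetical:

```python
from dataclasses import dataclass, field

# Stubbed detector outputs: image name -> list of (x, y, w, h) boxes.
FAKE_DETECTIONS = {"a.jpg": [(10, 20, 64, 64)], "c.jpg": [(5, 5, 32, 32)]}

@dataclass
class Image:
    name: str
    boxes: list = field(default_factory=list)  # detection boxes added by the model

def detect_prohibition_signs(image):
    """Stand-in for the trained target detection model: returns detection
    boxes for traffic prohibition signboards, or [] if none are found."""
    return FAKE_DETECTIONS.get(image.name, [])

def primary_screen(first_image_set):
    """Step S104: keep only images in which at least one prohibition
    signboard is detected, attaching the boxes for later recognition."""
    second_image_set = []
    for img in first_image_set:
        boxes = detect_prohibition_signs(img)
        if boxes:
            img.boxes = boxes
            second_image_set.append(img)
    return second_image_set

kept = primary_screen([Image("a.jpg"), Image("b.jpg"), Image("c.jpg")])
print([img.name for img in kept])  # only images with a detected signboard
```

Attaching the boxes to the surviving images is what lets the later recognition stage operate on the detection frame alone rather than the whole image.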
And S106, inputting the second image set into a target recognition model for recognition, and outputting to obtain a third image set, wherein the target recognition model is used for recognizing the target signboard image in the input image according to an optical character recognition algorithm and/or a classification algorithm, and the third image set comprises the target signboard image.
The target signboard mainly refers to three kinds of signboards, weight-limit, width-limit, and height-limit; that is, the target signboard image includes a weight-limit signboard image, a height-limit signboard image, and/or a width-limit signboard image. A schematic diagram is shown in fig. 2, in which the weight limit of the weight-limit signboard is 15 t, the height limit of the height-limit signboard is 4.0 m, and the width limit of the width-limit signboard is 3 m.
After the images in the first image set are primarily screened through the target detection model to obtain a second image set, the images in the second image set can be precisely screened again through the target identification model, all target signboard images including the limited weight, limited width or limited height signboards are found, and a third image set is obtained.
In order to further improve the accuracy of the image recognition result, in some alternative embodiments of the present application, the target recognition model generally includes two parts, an OCR algorithm module and a classification algorithm module, but may include only one of them, at the cost of reduced recognition accuracy. In a specific detection process, for any second image in the second image set, the second image is input into the OCR algorithm module to obtain a first recognition result, and/or the second image is input into the classification algorithm module to obtain a second recognition result; whether the second image is the target signboard image is determined according to the first recognition result and/or the second recognition result; and when the second image is the target signboard image, the second image is output as one image in the third image set.
When the image is detected through the OCR algorithm, the second image is input into the OCR algorithm module, the OCR algorithm module recognizes character information in a detection frame in the second image, judges whether the second image is the target signboard image or not according to the character information, and outputs a first recognition result corresponding to the judgment result.
Specifically, after the second image is input into the OCR algorithm module, the module performs OCR on the detection frame in the second image (i.e., the traffic prohibition signboard part of the image) and determines the character information therein. For example, if it recognizes that the detection frame includes the characters "1", "5", and "t", it determines that the second image is a target signboard image and outputs a first recognition result of 1; if the recognition determines that the second image is not a target signboard image, the first recognition result is output as 0.
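A minimal sketch of the OCR decision rule above: since weight limits carry the unit "t" and height/width limits the unit "m", a hypothetical rule (the exact rule is not specified in the text) treats OCR text containing a digit plus such a unit as a target signboard:

```python
def ocr_first_result(text):
    """Hypothetical OCR decision rule: a signboard whose recognized text
    contains a digit together with a unit character 't' (weight) or 'm'
    (height/width) is treated as a target signboard -> 1; otherwise -> 0."""
    has_digit = any(ch.isdigit() for ch in text)
    has_unit = ("t" in text) or ("m" in text)
    return 1 if (has_digit and has_unit) else 0

print(ocr_first_result("15t"))   # weight-limit sign -> 1
print(ocr_first_result("4.0m"))  # height-limit sign -> 1
print(ocr_first_result("60"))    # speed-limit sign, no unit -> 0
```

Note that a speed-limit sign ("60") is correctly rejected by this rule, which matches the voting example discussed later in the text.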
When the image is detected through the classification algorithm, the second image is input into the classification algorithm module, and the classification algorithm module is used for identifying the type of the signboard in the detection frame in the second image and outputting a second identification result corresponding to the type of the signboard.
The classification algorithm module is preset with a category index, which includes a plurality of signboard types and a plurality of second recognition results corresponding to the signboard types one to one, the signboard types at least including: target signboard and non-target signboard. For example, a category index may be built as: non-target signboard: 0; target signboard: 1. Optionally, the target signboard may be further subdivided, and a category index established as: non-target signboard: 0; weight-limit signboard: 1; width-limit signboard: 2; height-limit signboard: 3. Here 0, 1, 2, and 3 represent the second recognition result corresponding to each signboard type and can be set by the user; this is merely an example and is not limited in detail.
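The category index above maps naturally onto a lookup table; this sketch uses the subdivided example codes from the text (the key names themselves are hypothetical):

```python
# Category index as described: each signboard type maps to a
# user-settable second recognition result.
CATEGORY_INDEX = {
    "non-target": 0,
    "weight-limit": 1,
    "width-limit": 2,
    "height-limit": 3,
}

def second_result(sign_type):
    """Look up the second recognition result for a classified sign type;
    unknown types fall back to the non-target code 0."""
    return CATEGORY_INDEX.get(sign_type, CATEGORY_INDEX["non-target"])

print(second_result("width-limit"))  # -> 2
print(second_result("speed-limit"))  # not in the index -> 0
```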
In order to improve the accuracy of the recognition result, after the second image is detected by the OCR algorithm and the classification algorithm, respectively, it may be determined whether the second image is the target signboard image by a voting mechanism.
In some optional embodiments of the present application, the specific flow of the voting mechanism is as follows:
First, the first recognition result and the second recognition result are weighted and averaged according to preset weight coefficients to obtain a third recognition result; usually, the weight coefficient of the second recognition result is greater than that of the first recognition result. Assume the first recognition result of the OCR algorithm is n1 with weight w1, and the second recognition result of the classification algorithm is n2 with weight w2. Considering that the recognition result of the classification algorithm is usually more accurate than that of the OCR algorithm, w2 > w1, where w1 + w2 = 1, and the third recognition result is res = w1 × n1 + w2 × n2.
Then the third recognition result is compared with a preset threshold θ, which is usually set by the user according to engineering experience. When the third recognition result is greater than the preset threshold, i.e., res > θ, the second image is determined to be the target signboard image; when it is not greater than the preset threshold, i.e., res ≤ θ, the second image is determined not to be the target signboard image.
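The voting mechanism can be sketched in a few lines; the particular weights and threshold below are illustrative assumptions (the text only requires w2 > w1, w1 + w2 = 1, and a user-set θ):

```python
def vote(n1, n2, w1=0.4, w2=0.6, theta=0.7):
    """Weighted vote of the OCR result n1 and the classification result n2
    (each 0 or 1). Weights satisfy w1 + w2 = 1 with w2 > w1, since the
    classifier is usually the more accurate of the two; theta is the
    user-set threshold. The numeric values here are assumptions."""
    res = w1 * n1 + w2 * n2
    return res > theta  # True: the image is a target signboard image

# Speed-limit 60 sign: OCR says no (0), classifier mistakenly says yes (1).
print(vote(0, 1))  # 0.6 <= 0.7 -> False, final output "no"
# Weight-limit 15 t sign: both modules say yes.
print(vote(1, 1))  # 1.0 > 0.7 -> True, final output "yes"
```

With these particular values, a single mistaken "yes" from either module is not enough to cross the threshold, which reproduces the two worked examples given for Fig. 3.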
Optionally, if the target recognition model includes only the OCR algorithm module so that only the first recognition result is obtained, or includes only the classification algorithm module so that only the second recognition result is obtained, the first or second recognition result is directly compared with the preset threshold, and whether the second image is the target signboard image is determined according to the comparison result. This process can also be regarded as setting, in the voting mechanism, the weight coefficient of the first recognition result w1 = 0 or the weight coefficient of the second recognition result w2 = 0.
Fig. 3 shows the process of identifying a target signboard in an image: an image is first captured by a camera and input into the target detection model for detection; when a traffic prohibition signboard is confirmed to be included, a candidate detection frame containing the signboard is output; then OCR recognition and classification recognition are performed on the detection frame respectively, the two results vote, and the final recognition result is output.
For example, assume the sign in the image is actually a speed-limit 60 sign: the OCR algorithm gives a "no" decision because no "t" or "m" character is recognized, while the classification algorithm, perhaps because the signboard is too small to distinguish, gives a "yes" decision; the two vote, and after the weight calculation the final output is "no". Assume instead that the sign in the image is actually a weight-limit 15 t signboard: the OCR algorithm and the classification algorithm both give a "yes" decision; the two vote, and after the weight calculation the final output is "yes". In practical application, the user can flexibly adjust the voting weights of the OCR algorithm and the classification algorithm according to their respective performance.
In the embodiment of the application, after a first image set to be detected is obtained, the first image set is firstly input into a target detection model for detection, and a second image set is obtained through output, wherein the target detection model is used for detecting a first signboard image in the input image, and the second image set comprises the first signboard image; and then, inputting the second image set into a target recognition model for recognition, and outputting to obtain a third image set, wherein the target recognition model is used for recognizing a target signboard image in the input image according to an optical character recognition algorithm and/or a classification algorithm, and the third image set comprises the target signboard image. In the detection process, the target detection model and the target identification model are used for cascade detection, the target detection model is used for primary screening, high recall rate of the target signboard can be preferentially ensured, then the target identification model is used for precise screening, and the image is identified through an optical character identification algorithm and/or a classification algorithm, so that the accuracy of the identification result can be improved, and the technical problems that the accuracy of the detection identification result of the traffic prohibition signboard in the related technology is not high and the desensitization requirement is difficult to meet are solved.
Example 2
According to an embodiment of the present application, there is also provided a signboard detecting apparatus for implementing the above-mentioned signboard detecting method, as shown in fig. 4, the apparatus at least includes an obtaining module 40, a detecting module 42 and an identifying module 44, wherein:
an obtaining module 40, configured to obtain a first image set to be detected.
Specifically, the first image set may be a plurality of sets of road information images captured by a vehicle-mounted camera, where a part of the images may include traffic prohibition signboards, such as weight-limiting, height-limiting, width-limiting, and speed-limiting signboards, and because the weight-limiting, width-limiting, and height-limiting signboards belong to secret-related information and need to be desensitized, a final objective of the present application is to identify images including three types of target signboards, i.e., weight-limiting, width-limiting, and height-limiting, from the first image set and then perform desensitization on the target signboards therein.
It should be noted that, when detecting a video, the actual detection process still detects frame-by-frame images in the video, and at this time, all frame images in the video may be used as the first image set.
The detection module 42 is configured to input the first image set into a target detection model for detection and output a second image set, where the target detection model is used to detect a first signboard image in the input image, and the second image set includes the first signboard image.
The first signboard mainly refers to a traffic prohibition signboard, and the first signboard image is an image including the traffic prohibition signboard, which includes but is not limited to: weight-limiting signboard images, height-limiting signboard images, width-limiting signboard images, speed-limiting signboard images, bearing weight-limiting signboard images and the like.
In some optional embodiments of the present application, in order to preferentially ensure a high recall rate of the target signboard, the images in the first image set may be initially screened by the target detection model to find all the first signboard images including the traffic prohibition signboard, so as to obtain the second image set.
The target detection model can be obtained through the following training process: firstly, acquiring a sample image comprising a first signboard and marking information corresponding to the sample image; and training the initial detection model according to the sample image and the labeling information to obtain a target detection model.
Specifically, the sample image may be an image or a video including a traffic prohibition signboard collected by the vehicle-mounted camera, and the image or the video carries the label information of the corresponding traffic prohibition signboard; the initial detection model can adopt detection models such as FasterRCNN, RetinaNet, CenterNet, Yolo series and the like commonly used in deep learning, and the initial detection model is trained by using the sample image and the labeling information, so that the target detection model capable of detecting the traffic prohibition signboard in the image can be obtained.
In a specific detection process, any first image in the first image set is input into the target detection model for detection; when the target detection model detects that the first image is a first signboard image, a detection box (bounding box) can be added to the first image to mark the first signboard; the target detection model then outputs the first image as one image in the second image set. In this way, the target detection model can generally reach a recall rate of more than 95% for the target signboard.
And the recognition module 44 is configured to input the second image set into a target recognition model for recognition and output a third image set, where the target recognition model is configured to recognize a target signboard image in the input image according to an optical character recognition algorithm and/or a classification algorithm, and the third image set includes the target signboard image.
The target signboard mainly refers to three kinds of signboards, weight-limit, width-limit, and height-limit; that is, the target signboard image includes a weight-limit signboard image, a height-limit signboard image, and/or a width-limit signboard image. A schematic diagram is shown in fig. 2, in which the weight limit of the weight-limit signboard is 15 t, the height limit of the height-limit signboard is 4.0 m, and the width limit of the width-limit signboard is 3 m.
After the images in the first image set are primarily screened through the target detection model to obtain a second image set, the images in the second image set can be precisely screened again through the target identification model, all target signboard images including the limited weight, limited width or limited height signboards are found, and a third image set is obtained.
In order to further improve the accuracy of the image recognition result, in some alternative embodiments of the present application, the target recognition model generally includes two parts, an OCR algorithm module and a classification algorithm module, but may include only one of them, at the cost of reduced recognition accuracy. In a specific detection process, for any second image in the second image set, the second image is input into the OCR algorithm module to obtain a first recognition result, and/or the second image is input into the classification algorithm module to obtain a second recognition result; whether the second image is the target signboard image is determined according to the first recognition result and/or the second recognition result; and when the second image is the target signboard image, the second image is output as one image in the third image set.
When the image is detected through the OCR algorithm, the second image is input into the OCR algorithm module, the OCR algorithm module recognizes character information in a detection frame in the second image, judges whether the second image is the target signboard image or not according to the character information, and outputs a first recognition result corresponding to the judgment result.
Specifically, after the second image is input into the OCR algorithm module, the module performs OCR on the detection frame in the second image (i.e., the traffic prohibition signboard part of the image) and determines the character information therein. For example, if it recognizes that the detection frame includes the characters "1", "5", and "t", it determines that the second image is a target signboard image and outputs a first recognition result of 1; if the recognition determines that the second image is not a target signboard image, the first recognition result is output as 0.
When the image is detected through the classification algorithm, the second image is input into the classification algorithm module, and the classification algorithm module is used for identifying the type of the signboard in the detection frame in the second image and outputting a second identification result corresponding to the type of the signboard.
The classification algorithm module is preset with a category index, which includes a plurality of signboard types and a plurality of second recognition results corresponding to the signboard types one to one, the signboard types at least including: target signboard and non-target signboard. For example, a category index may be built as: non-target signboard: 0; target signboard: 1. Optionally, the target signboard may be further subdivided, and a category index established as: non-target signboard: 0; weight-limit signboard: 1; width-limit signboard: 2; height-limit signboard: 3. Here 0, 1, 2, and 3 represent the second recognition result corresponding to each signboard type and can be set by the user; this is merely an example and is not limited in detail.
In order to improve the accuracy of the recognition result, after the second image is detected by the OCR algorithm and the classification algorithm, respectively, it may be determined whether the second image is the target signboard image by a voting mechanism.
In some optional embodiments of the present application, the specific flow of the voting mechanism is as follows:
First, the first recognition result and the second recognition result are weighted and averaged according to preset weight coefficients to obtain a third recognition result; usually, the weight coefficient of the second recognition result is greater than that of the first recognition result. Assume the first recognition result of the OCR algorithm is n1 with weight w1, and the second recognition result of the classification algorithm is n2 with weight w2. Considering that the recognition result of the classification algorithm is usually more accurate than that of the OCR algorithm, w2 > w1, where w1 + w2 = 1, and the third recognition result is res = w1 × n1 + w2 × n2.
Then the third recognition result is compared with a preset threshold θ, which is usually set by the user according to engineering experience. When the third recognition result is greater than the preset threshold, i.e., res > θ, the second image is determined to be the target signboard image; when it is not greater than the preset threshold, i.e., res ≤ θ, the second image is determined not to be the target signboard image.
Optionally, if the target recognition model includes only the OCR algorithm module so that only the first recognition result is obtained, or includes only the classification algorithm module so that only the second recognition result is obtained, the first or second recognition result is directly compared with the preset threshold, and whether the second image is the target signboard image is determined according to the comparison result. This process can also be regarded as setting, in the voting mechanism, the weight coefficient of the first recognition result w1 = 0 or the weight coefficient of the second recognition result w2 = 0.
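The apparatus of this embodiment, with its obtaining module 40, detection module 42, and recognition module 44, amounts to a three-stage pipeline; a hedged structural sketch (all module implementations below are toy stand-ins, not the claimed models):

```python
class SignDetectionApparatus:
    """Sketch of the apparatus: the obtaining, detection, and recognition
    modules are injected as callables; the names here are hypothetical."""

    def __init__(self, obtain, detect, recognize):
        self.obtain = obtain        # module 40: -> first image set
        self.detect = detect        # module 42: first set -> second set
        self.recognize = recognize  # module 44: second set -> third set

    def run(self):
        first = self.obtain()
        second = self.detect(first)    # primary screen by detection model
        return self.recognize(second)  # precise screen by recognition model

# Toy modules: keep images flagged as prohibition signs, then as targets.
apparatus = SignDetectionApparatus(
    obtain=lambda: ["limit15t.jpg", "speed60.jpg", "street.jpg"],
    detect=lambda imgs: [i for i in imgs if "limit" in i or "speed" in i],
    recognize=lambda imgs: [i for i in imgs if "limit" in i],
)
print(apparatus.run())  # only the weight-limit sign survives both stages
```

The injected-callable structure mirrors the one-to-one correspondence between the apparatus modules and the method steps S102–S106 of embodiment 1.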
It should be noted that, each module in the signboard detection apparatus in the embodiment of the present application corresponds to the implementation step of the signboard detection method in embodiment 1 one to one, and since the detailed description is already performed in embodiment 1, some details that are not shown in this embodiment may refer to embodiment 1, and are not described herein again.
Example 3
According to an embodiment of the present application, there is also provided a nonvolatile storage medium including a stored program, wherein the apparatus in which the nonvolatile storage medium is located is controlled to execute the signboard detection method in embodiment 1 when the program is executed.
Specifically, when the program runs, the device in which the nonvolatile storage medium is located is controlled to execute the following steps: acquiring a first image set to be detected; inputting the first image set into a target detection model for detection, and outputting to obtain a second image set, wherein the target detection model is used for detecting a first signboard image in the input image, and the second image set comprises the first signboard image; and inputting the second image set into a target recognition model for recognition, and outputting to obtain a third image set, wherein the target recognition model is used for recognizing a target signboard image in the input image according to an optical character recognition algorithm and/or a classification algorithm, and the third image set comprises the target signboard image.
Example 4
According to an embodiment of the present application, there is also provided a vehicle, as shown in fig. 5, mainly including: the vehicle-mounted camera system comprises a vehicle body 50, a vehicle-mounted camera 52 and an electronic device 54, wherein the vehicle-mounted camera 52 is used for collecting a signboard image, and the electronic device 54 comprises: a memory 540 and a processor 542, wherein the memory 540 stores therein a computer program, and the processor 542 is configured to execute the signboard detecting method in embodiment 1 by the computer program.
Alternatively, the electronic device 54 may be any vehicle-mounted information interaction terminal, such as a vehicle-mounted central computer, a central domain controller, an integrated ECU (Electronic Control Unit), an IVI (In-Vehicle Infotainment) system, an SPB (Super Brain), an IHU (Infotainment Head Unit), or a DHU (Driver Head Unit). The SPB is a central domain controller defined as the brain of the automobile. The IHU is a vehicle-mounted integrated information processing device built on the vehicle body bus system and internet services using a dedicated vehicle-mounted central processing unit; it can realize a series of applications including three-dimensional navigation, real-time road conditions, IPTV, driving assistance, fault detection, vehicle information, vehicle body control, mobile office, wireless communication, online entertainment functions, TSP (Telematics Service Provider) services, and the like, greatly improving the level of vehicle electronics, networking, and intelligence. The DHU is a combination of an IHU and a DIM (Driver Information Module, or Dash Integration Module); the DIM is generally a display screen for showing information related to the driving and functions of the vehicle, usually positioned behind the steering wheel where it is most easily seen by the driver.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit may be a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.