CN111832554A - Image detection method, device and storage medium - Google Patents
- Publication number
- CN111832554A CN201910300190.1A
- Authority
- CN
- China
- Prior art keywords
- character
- detected
- determining
- image
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/625—License plates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application disclose an image detection method, an image detection apparatus, and a storage medium. The image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each of the images to obtain a detection result corresponding to each image, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix of multiple images to be detected, the detection combines information from several images, which can improve the accuracy of image recognition.
Description
Technical Field
The present application relates to the field of image recognition, and in particular, to an image detection method, an image detection apparatus, and a storage medium.
Background
Recognition of objects such as license plates, house plates, etc. is a popular application in the field of Optical Character Recognition (OCR).
Many parking lots, toll stations, and similar facilities already use OCR for license plate recognition. In these scenarios, however, the license plate needs to be roughly 1 meter from the camera, so the imaging requirements are relatively strict. When the camera is farther from the plate (for example, in the logistics industry, when recognizing the plate of a vehicle at a loading/unloading dock, the camera may be more than 3 meters away), or when environmental conditions degrade the imaging quality of the object, applying this method to image recognition yields relatively low accuracy.
Disclosure of Invention
The embodiment of the application provides an image detection method, an image detection device and a storage medium, which can improve the accuracy of image identification.
In one aspect, the present application provides an image detection method, including:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
Optionally, in some embodiments, the determining, according to the character string, an edit distance matrix corresponding to the multiple images to be detected includes:
calculating an edit distance between every two of the character strings to obtain the edit distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
Optionally, in some embodiments, the determining, according to the edit distance matrix, a target character string corresponding to the detection object includes:
determining the number of zero elements in each row of the edit distance matrix;
determining the maximum zero-element count, namely the largest among the per-row zero-element counts;
determining whether the maximum zero-element count is greater than a number threshold;
and if the maximum zero-element count is greater than the number threshold, determining the character string corresponding to the row having that count as the target character string.
Optionally, in some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and after determining whether the maximum zero element number is greater than a number threshold, the method further includes:
if the maximum zero-element count is not greater than the number threshold, determining a target character corresponding to each character position according to the characters, the frames corresponding to the characters, and the confidences of the characters;
and determining the target character string according to the target character corresponding to each character position.
Optionally, in some embodiments, the determining, according to the character, a frame corresponding to the character, and the confidence level of the character, a target character corresponding to each character position includes:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
Optionally, in some embodiments, the acquiring a plurality of images to be detected of the detection object includes:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
Optionally, in some embodiments, the acquiring the plurality of images to be detected from the video to be detected includes:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
Correspondingly, the present application further provides an image detection apparatus, specifically including:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of images to be detected of a detection object;
the processing unit is used for respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, and the detection result comprises a character string;
the first determining unit is used for determining an editing distance matrix corresponding to the multiple images to be detected according to the character strings;
and the second determining unit is used for determining a target character string corresponding to the detection object according to the editing distance matrix.
Optionally, in some embodiments, the first determining unit is specifically configured to:
calculate an edit distance between every two of the character strings to obtain the edit distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
Optionally, in some embodiments, the second determining unit includes:
the first determining subunit is used for determining the number of zero elements in each row of the edit distance matrix;
the second determining subunit is used for determining the maximum zero-element count, namely the largest among the per-row zero-element counts;
a third determining subunit, configured to determine whether the maximum zero-element count is greater than a number threshold;
and a fourth determining subunit, configured to determine, when the maximum zero-element count is greater than the number threshold, the character string corresponding to the row having that count as the target character string.
Optionally, in some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence level of the character, and the apparatus further includes:
a third determining unit, configured to determine, when the maximum zero element number is not greater than the number threshold, a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence of the character;
and the fourth determining unit is used for determining the target character string according to the target character corresponding to each character position.
Optionally, in some embodiments, the third determining unit is specifically configured to:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
Optionally, in some embodiments, the obtaining unit is specifically configured to:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
Optionally, in some embodiments, the obtaining unit is further specifically configured to:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
Yet another aspect of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the above-described aspects.
In addition, a storage medium is further provided, where multiple instructions are stored, and the instructions are suitable for being loaded by a processor to perform the steps in any one of the image detection methods provided in the embodiments of the present application.
In the embodiments of the present application, the image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each image to obtain a corresponding detection result, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix of multiple images to be detected, the detection combines information from several images, which can improve the accuracy of image recognition.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of an image detection method provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of another image detection method provided in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of an image detection apparatus provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description that follows, specific embodiments of the present application will be described with reference to steps and symbols executed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations will be referred to, several times, as being performed by a computer, the computer performing operations involving a processing unit of the computer in electronic signals representing data in a structured form. This operation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data maintains a data structure that is a physical location of the memory that has particular characteristics defined by the data format. However, while the principles of the application have been described in language specific to above, it is not intended to be limited to the specific embodiments shown, and it will be recognized by those of ordinary skill in the art that various of the steps and operations described below may be implemented in hardware.
The principles of the present application may be employed in numerous other general-purpose or special-purpose computing, communication environments or configurations. Examples of well known computing systems, environments, and configurations that may be suitable for use with the application include, but are not limited to, hand-held telephones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe-based computers, and distributed computing environments that include any of the above systems or devices.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The embodiment of the application provides an image detection method, an image detection device and a storage medium.
The image detection device can be integrated in the server, and the accuracy of image identification can be improved by the image detection device.
In some embodiments, the image detection apparatus of the present application may be deployed at a loading/unloading dock in logistics transit to identify the license plate numbers of vehicles at the dock. As shown in fig. 1, which is a schematic diagram of an application scenario of the image detection method in the embodiments of the present application, the camera in fig. 1 may be installed at a position of the loading/unloading dock close to the ceiling, with its lens facing the direction of incoming vehicles. The image detection apparatus of the present application obtains surveillance video of the truck's license plate region through the camera and extracts a plurality of images to be detected from the surveillance video. It then performs image detection processing on the plurality of images to obtain a detection result corresponding to each image, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. In this application scenario, the target character string is the license plate character string.
The image detection apparatus in the present application may also be used to detect other objects, such as a doorplate, and the specific application scenario and the detection object are not limited herein.
Referring to fig. 2, fig. 2 is a schematic flowchart of an image detection method according to an embodiment of the present disclosure. The method comprises the following specific processes:
201. and acquiring a plurality of images to be detected of the detection object.
The object to be detected may be a license plate, a house plate, or any other object whose character string needs to be recognized from images.
In some embodiments, acquiring a plurality of images to be detected of a detection object includes: acquiring a video to be detected of a detected object; and then acquiring a plurality of images to be detected from the video to be detected.
In some embodiments, a camera may be used to capture the video to be detected of the detection object, a multi-object detection network such as SSD (Single Shot MultiBox Detector) may be used to continuously detect the license plate in the video, and each detected license plate image together with its confidence may be stored in a queue.
More specifically, acquiring a plurality of images to be detected from a video to be detected includes: determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected; and then determining a plurality of images to be detected from each frame of images to be detected according to the confidence coefficient of each frame of images to be detected and the target pixel area of each frame of images to be detected.
After the confidence and the target pixel area of each frame of image to be detected are determined from the queue, the confidence of each frame is multiplied by the target pixel area of that frame to serve as the frame's evaluation score. The frames are then ranked by this score, and the N highest-scoring license plate images are taken as the plurality of images to be detected in the embodiments of the present application and normalized to a uniform size, where N may be any number from 6 to 10.
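The frame-selection step above can be sketched as follows; the `Frame` structure, function names, and the sample values are illustrative assumptions, not taken from the patent text.

```python
# Sketch of the frame-selection step: score each candidate license-plate crop
# by (detection confidence * plate pixel area), then keep the N highest-scoring
# crops from the queue.
from dataclasses import dataclass

@dataclass
class Frame:
    image_id: int
    confidence: float   # detector confidence for the plate region
    pixel_area: int     # width * height of the detected plate region, in pixels

def select_top_n(frames, n=8):
    """Return the n frames with the highest confidence * pixel_area score."""
    scored = sorted(frames, key=lambda f: f.confidence * f.pixel_area, reverse=True)
    return scored[:n]

frames = [Frame(0, 0.90, 1200), Frame(1, 0.60, 3000), Frame(2, 0.95, 800)]
best = select_top_n(frames, n=2)
print([f.image_id for f in best])  # [1, 0]: scores 1800 and 1080 beat 760
```

Multiplying confidence by area favors frames where the plate is both confidently detected and large enough in the image to resolve individual characters, which matches the motivation of the far-camera scenario.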
202. And respectively carrying out image detection processing on the plurality of images to be detected to obtain a detection result corresponding to each image to be detected.
In the embodiments of the present application, image detection processing may be performed on each of the images to be detected based on a deep-learning character string recognition method, and the detailed detection results are stored, where the detection results include the character string of each image to be detected.
In some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence level of the character.
203. And determining an edit distance matrix corresponding to the plurality of images to be detected according to the character strings.
Specifically, in some embodiments, the edit distance between any two character strings in the character strings is calculated respectively to obtain the edit distance between any two character strings; and then determining the edit distance matrix according to the edit distance.
That is, the edit distance (Levenshtein distance) is computed between the character strings of every pair of the images to be detected. For N images to be detected with character strings plateStr_i (i = 0, 1, …, N−1), the edit distance between string i and string j is L(i, j) = levenshteinDistance(plateStr_i, plateStr_j).
Here the edit distance quantifies the difference between two character strings, and is defined as the minimum number of single-character edit operations (substitution, insertion, deletion) required to transform string A into string B.
After the editing distance calculation is carried out on every two character strings in the N character strings, the editing distance between every two character strings can be combined into an editing distance matrix.
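A minimal sketch of the pairwise edit-distance computation and matrix assembly described in this step; the function names are illustrative, and a standard dynamic-programming Levenshtein implementation is assumed.

```python
# Sketch of step 203: pairwise Levenshtein distances between the N detected
# strings, assembled into a symmetric N x N edit-distance matrix whose main
# diagonal is zero.
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = curr
    return prev[-1]

def edit_distance_matrix(strings):
    n = len(strings)
    return [[levenshtein(strings[i], strings[j]) for j in range(n)] for i in range(n)]

plates = ["B8866", "B8866", "B8B66"]
L = edit_distance_matrix(plates)
print(L)  # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

Note that identical recognition results produce zero entries, which is what the row-wise zero count in the next step exploits.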
204. And determining a target character string corresponding to the detection object according to the editing distance matrix.
In some embodiments, the number of zero elements in each row of the edit distance matrix is determined first; the maximum zero-element count, that is, the largest among the per-row counts, is then determined; and it is checked whether this maximum count is greater than a number threshold. If it is, the character string corresponding to the row with that count is determined as the target character string. If it is not, a target character is determined for each character position according to the characters, the frames corresponding to the characters, and the confidences of the characters, and the target character string is then assembled from the target characters at each position. The number threshold may be the ceiling of N/2: for example, if N is 8 the number threshold is 4, and if N is 9 the number threshold is 5.
The edit distance matrix is a symmetric square matrix whose main diagonal is 0, and the number of zero elements in a row indicates how many times that row's character string appears among the N detection results (character strings). If all detection results are identical, every row is all zeros, and the edit distance matrix L is a zero matrix.
If L is a zero matrix, all detection results are identical, so the result is output directly, reducing computation. Otherwise, the number of zero elements in each row of L is counted, the rows are sorted by that count, and the row with the most zero elements is selected; its zero-element count represents the number n of identical detection results.
It is then judged whether n > ceil(N/2) holds. If so, the fused detection result is those n identical detection results, and the character string corresponding to the n zero elements is determined as the target character string.
If n > ceil(N/2) does not hold, the target character at each character position must be determined separately, and the target character string is then assembled from the target characters at each position.
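The majority check described above can be sketched as follows; the function name and the `None` fallback (signalling that character-level voting is needed) are illustrative assumptions.

```python
# Sketch of the majority check: count zero elements per row of the edit-distance
# matrix; if the best row has more than ceil(N/2) zeros, its string is the fused
# result. Otherwise return None to signal the per-character voting fallback.
import math

def fuse_by_majority(strings, matrix):
    n = len(strings)
    zero_counts = [row.count(0) for row in matrix]
    best_row = max(range(n), key=lambda i: zero_counts[i])
    if zero_counts[best_row] > math.ceil(n / 2):
        return strings[best_row]   # a majority of detection results agree
    return None                    # no majority: fall back to character voting

plates = ["B8866", "B8866", "B8866", "B8B66"]
L = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
print(fuse_by_majority(plates, L))  # B8866
```

Because the diagonal entry of each row is itself zero, a row's zero count directly equals the number of detection results identical to that row's string, as the text notes.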
Specifically, because the N images to be detected have been normalized, and the detection result also includes every character in each image to be detected, each character's frame (i.e., bounding box), and each character's confidence, the characters are first aligned based on this information, and the characters at each position are then voted on and ranked to obtain the best character (the target character) for that position. The alignment step is as follows:
according to the bounding-box of each character in the detection result, a vertex coordinate x and a character width can be obtained, and then the horizontal coordinate of the center point of the corresponding character is calculated according to the vertex coordinate x and the character width:
then, one detection result is taken from the N detection results (each detection result includes all characters of one image to be detected) as a reference, for example, if the detection result of one image to be detected includes 8 characters, the 8 characters are taken as the reference, then each character in the remaining N-1 detection results is compared with the reference once from left to right, specifically, a threshold is set, and if the minimum value of the distance between the center point of one character and the horizontal coordinate of the center point of one reference character is smaller than the threshold, the character is classified to the corresponding reference character.
When all characters in the N−1 detection results have been assigned to the reference detection result (that is, once the characters corresponding to each character position have been determined), character voting is performed position by position, from left to right, over the character positions of the reference result. Suppose a character position holds m candidate characters (C_1, C_2, …, C_m), and candidate C_i (i = 1, 2, …, m) occurs l times; its score is:

score(C_i) = confidence_1 + confidence_2 + … + confidence_l

That is, the confidences of all occurrences of character C_i are summed to obtain its final score. The character with the highest score at each position is then taken as the optimal character (target character) for that position, and the optimal characters of all character positions, ordered from left to right, form the target character string (the fused detection result).
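The per-position confidence-sum vote can be sketched as follows; the function names and data layout are illustrative assumptions.

```python
# Sketch of the per-position vote: sum the confidences of each candidate
# character at one position and keep the highest-scoring one. Concatenating
# the winners from left to right yields the fused target string.
from collections import defaultdict

def vote_position(candidates):
    """candidates: list of (char, confidence) pairs observed at one position."""
    scores = defaultdict(float)
    for ch, conf in candidates:
        scores[ch] += conf        # score(C_i) = sum of confidences of C_i
    return max(scores, key=scores.get)

def fuse_string(positions):
    """positions: one candidate list per character position, left to right."""
    return "".join(vote_position(p) for p in positions)

position = [("8", 0.9), ("B", 0.4), ("8", 0.7), ("B", 0.8)]
print(vote_position(position))    # "8": score 1.6 beats "B": score 1.2
```

Summing confidences rather than simply counting occurrences lets one sharply recognized character outweigh several blurry disagreements, which suits the poor-imaging scenario the patent targets.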
In the embodiments of the present application, the image detection apparatus acquires a plurality of images to be detected of a detection object; performs image detection processing on each image to obtain a corresponding detection result, where the detection result includes a character string; determines an edit distance matrix corresponding to the plurality of images according to the character strings; and finally determines a target character string corresponding to the detection object according to the edit distance matrix. Because the target character string is determined from the edit distance matrix of multiple images to be detected, the detection combines information from several images, which can improve the accuracy of image recognition.
The image detection method above can effectively address the low accuracy of image character recognition caused by poor imaging quality when the camera is far from the detection object or the illumination of the shooting environment is complex. The image detection technique can therefore be applied in complex scenes such as logistics loading/unloading docks to automatically detect vehicle license plates and improve vehicle scheduling and management.
Referring to fig. 3, fig. 3 is another schematic flow chart of an image detection method according to an embodiment of the present application, and a specific flow of the method may be as follows:
301. the image detection equipment acquires a video to be detected of the license plate.
The object to be detected may be a license plate, a house plate, or any other object whose character string needs to be recognized from images.
In some embodiments, a camera may be used to collect a video to be detected of a detected object, an SSD is used to continuously detect a license plate in the video to be detected, and the detected license plate image and a corresponding confidence thereof are stored in a queue.
302. The image detection equipment acquires a plurality of images to be detected from a video to be detected.
Specifically, in some embodiments, the confidence of each frame of the image to be detected in the video to be detected and the target pixel area of each frame of the image to be detected can be determined; and then determining a plurality of images to be detected from each frame of images to be detected according to the confidence coefficient of each frame of images to be detected and the target pixel area of each frame of images to be detected.
After the confidence and target pixel area of each frame of image to be detected are determined from the queue, the confidence of each frame is multiplied by the target pixel area of the corresponding image to serve as an evaluation index of that image. Each frame is then sorted by this index, and the N license plate images with the highest scores are taken as the multiple images to be detected of the object to be detected in the embodiment of the application and are normalized to a uniform size, where the value of N can be any number from 6 to 10.
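The frame-selection rule above (score each candidate frame by confidence multiplied by target pixel area, then keep the N highest scorers) can be sketched as follows; the frame identifiers, confidences, and areas are made-up illustrative values, not data from the patent:

```python
# Hypothetical sketch of step 302's frame selection: each candidate frame is
# scored by confidence * target pixel area, and the N best frames are kept.
def select_top_frames(frames, n):
    """frames: list of (frame_id, confidence, target_pixel_area) tuples."""
    return sorted(frames, key=lambda f: f[1] * f[2], reverse=True)[:n]

# Illustrative candidate frames from the detection queue (made-up values).
frames = [("f0", 0.90, 1200), ("f1", 0.60, 3000), ("f2", 0.95, 800), ("f3", 0.70, 2500)]
best = select_top_frames(frames, n=2)  # scores: 1080, 1800, 760, 1750
```

With these values, "f1" and "f3" score highest and would be kept for normalization.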
303. And the image detection equipment respectively carries out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected.
The detection result comprises a character string and a character corresponding to each image to be detected, a frame corresponding to the character and the confidence coefficient of the character.
In this embodiment, the character string is a specific license plate number, for example, the character string of a certain image to be detected is "yue B8866 x"; a character is the character at each character position in that license plate number, for example "B", "8", "6" and "x"; the frame corresponding to a character is the bounding-box of that character; and the confidence of a character reflects the clarity or reliability of that character.
Specifically, the embodiment of the application can perform image detection processing on the multiple images to be detected respectively based on a deep learning character string recognition method, and store detailed detection results.
304. The image detection device performs edit distance calculation between each pair of the multiple license plates to obtain the edit distance between any two license plates.
That is, the edit distance (Levenshtein distance) is calculated between the character strings (license plates) of every two of the images to be detected. The plurality of images to be detected in the embodiment of the application may be N images, whose character strings are plateStr_i (i = 0, 1, …, N-1); the edit distance between string i and string j is then L_{i,j} = levenshteinDistance(plateStr_i, plateStr_j).
305. The image detection device determines an edit distance matrix from the edit distance.
After the editing distance calculation is carried out on every two character strings in the N character strings, the editing distance between every two character strings can be combined into an editing distance matrix.
The edit distance matrix is a symmetric square matrix with a main diagonal line of 0, and the number of zero elements in each row represents the number of times that the corresponding character string in the row appears in the N detection results (character strings).
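Steps 304 and 305 can be sketched with a standard dynamic-programming Levenshtein distance; the plate strings below are made-up examples, and the matrix illustrates the symmetric, zero-diagonal property described above:

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence (rolling row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution or match
    return dp[-1]

def edit_distance_matrix(strings):
    """Symmetric matrix L with L[i][j] = levenshtein(strings[i], strings[j])."""
    n = len(strings)
    return [[levenshtein(strings[i], strings[j]) for j in range(n)] for i in range(n)]

plates = ["B8866", "B8866", "B8966"]  # made-up detection results
L = edit_distance_matrix(plates)      # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

Note that the main diagonal is all zeros and the number of zeros in a row counts how often that row's string appears among the N results.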
306. The image detection device determines the number of zero elements in each row in the edit distance matrix.
After the edit distance matrix is determined, the number of zero elements in each row in the edit distance matrix is counted.
If all the detection results (license plates) are the same, every row is all zeros, and the edit distance matrix L is a zero matrix.
If L is a zero matrix, all detection results are identical; the result is output directly, reducing the amount of calculation. Otherwise, the number of zero elements in each row of the matrix L needs to be counted.
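The zero-matrix shortcut and the per-row zero counting of step 306 can be sketched as follows, continuing with a small illustrative matrix:

```python
def zero_counts(L):
    """Count the zero elements in each row of the edit distance matrix."""
    return [row.count(0) for row in L]

def is_zero_matrix(L):
    """True when all detection results are identical (the step-306 shortcut)."""
    return all(v == 0 for row in L for v in row)

L = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]  # example matrix from three plates
counts = zero_counts(L)                # [2, 2, 1]
shortcut = is_zero_matrix(L)           # False: per-row counts are needed
```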
307. The image detection device determines the maximum number of zero elements, i.e., the largest of the per-row zero element counts.
After the number of zero elements in each row in the edit distance matrix is determined, the rows of the edit distance matrix L are sorted according to the number of zero elements to obtain a row with the most zero elements, and the number of zero elements in the row (i.e., the maximum number of zero elements) is determined, where the number of zero elements in the row represents the number of the same detection results, for example, the number of zero elements in the row with the most zero elements is n (i.e., the maximum number of zero elements is n).
308. The image detection device determines whether the maximum number of zero elements is greater than a number threshold; if yes, step 309 is executed, and if not, steps 310 and 311 are executed.
After the maximum number of zero elements n of the edit distance matrix L is obtained, it is determined whether n is greater than a number threshold, where the number threshold may be ceil(N/2), i.e., N/2 rounded up to the nearest integer.
309. And the image detection equipment determines the character strings corresponding to the zero elements in the maximum zero element number as the target license plate.
If n is greater than the number threshold, most elements of the row with the most zero elements are zeros, and the character string corresponding to that row can be directly determined as the target license plate (the finally confirmed license plate).
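Steps 307 to 309 amount to a majority test: find the row with the most zeros and accept its string if the count exceeds ceil(N/2). A minimal sketch, with made-up plate strings and a precomputed matrix:

```python
import math

def majority_string(strings, L):
    """Return the string of the row with the most zeros in L if that count
    exceeds ceil(N / 2); otherwise None (fall back to character voting)."""
    counts = [row.count(0) for row in L]
    best = max(range(len(strings)), key=counts.__getitem__)
    if counts[best] > math.ceil(len(strings) / 2):
        return strings[best]
    return None

plates = ["B8866", "B8866", "B8866", "B8966"]  # made-up detection results
L = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
plate = majority_string(plates, L)  # "B8866": 3 zeros > ceil(4 / 2) = 2
```

Returning None here models the branch into steps 310 and 311.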
310. And the image detection equipment determines a target character corresponding to each character position according to the characters, the frames corresponding to the characters and the confidence degrees of the characters.
If n is not greater than the number threshold, determining the target license plate according to each character in the picture to be detected, specifically:
Because the N images to be detected have been normalized, and the detection result further includes each character in each image to be detected, the frame (i.e., bounding-box) of each character, and the confidence of each character, the characters are first aligned based on this information; voting and scoring are then performed at each character position to obtain the optimal character (i.e., the target character) for that position. The step of aligning the characters is as follows:
According to the bounding-box of each character in the detection result, a vertex coordinate x and a character width w can be obtained; the horizontal coordinate of the center point of the corresponding character is then calculated as x_center = x + w / 2.
Then, one of the N detection results (each detection result includes all characters of one image to be detected) is taken as a reference; for example, if the detection result of one image to be detected contains 8 characters, those 8 characters are taken as the reference. Each character in the remaining N-1 detection results is then compared with the reference, one by one from left to right. Specifically, a threshold is set; if the minimum distance between the horizontal coordinate of a character's center point and that of a reference character is smaller than the threshold, the character is assigned to that reference character.
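The alignment step can be sketched as below; the box coordinates and the threshold value are illustrative assumptions, since the patent does not fix concrete numbers:

```python
def center_x(box):
    """box = (x, width) of a character bounding-box; returns the center abscissa."""
    return box[0] + box[1] / 2.0

def align_to_reference(ref_boxes, boxes, threshold):
    """Assign each character box to the reference position whose center is
    nearest, provided the distance is below the threshold (else None)."""
    ref_centers = [center_x(b) for b in ref_boxes]
    out = []
    for b in boxes:
        dists = [abs(center_x(b) - rc) for rc in ref_centers]
        k = min(range(len(dists)), key=dists.__getitem__)
        out.append(k if dists[k] < threshold else None)
    return out

ref = [(0, 10), (20, 10), (40, 10)]   # reference centers at x = 5, 25, 45
obs = [(1, 10), (21, 8), (80, 10)]    # observed centers at x = 6, 25, 85
idx = align_to_reference(ref, obs, threshold=10.0)  # [0, 1, None]
```

The stray box at x = 85 is too far from every reference center and stays unassigned.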
When all characters in the N-1 detection results have been assigned to the reference detection result (i.e., when the characters corresponding to each character position are determined), character voting is carried out from left to right over the character positions of the reference. Suppose m characters (C_1, C_2, …, C_m) appear at a certain character position; for a character C_i (i = 1, 2, …, m) that occurs l times, the character score is:
score(C_i) = confidence_1 + confidence_2 + … + confidence_l;
that is, the confidences of character C_i are added up to obtain its final score, and the character with the highest score at that position is then taken as the optimal character (target character) for that position.
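The confidence-weighted vote at one character position can be sketched as follows; the observed characters and confidences are made-up values:

```python
def vote_position(candidates):
    """candidates: (char, confidence) pairs observed at one character position.
    Each character's score is the sum of its confidences; the best score wins."""
    scores = {}
    for ch, conf in candidates:
        scores[ch] = scores.get(ch, 0.0) + conf
    return max(scores, key=scores.get)

# Made-up observations for one character position across several frames.
position = [("8", 0.9), ("B", 0.4), ("8", 0.8), ("6", 0.5)]
best_char = vote_position(position)  # "8" with score 0.9 + 0.8 = 1.7
```

Summing confidences rather than counting occurrences lets two sharp detections outweigh several blurry ones.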
311. And the image detection equipment determines a target license plate according to the target character corresponding to each character position.
The optimal characters of all character positions are arranged from left to right to obtain the target license plate (i.e., the fused detection result).
In the embodiment of the application, the image detection equipment acquires a plurality of images to be detected of a detection object; then, image detection processing is carried out on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; determining an edit distance matrix corresponding to a plurality of images to be detected according to the character string; and finally, determining a target license plate corresponding to the detection object according to the editing distance matrix. In the embodiment of the application, the target license plate corresponding to the detection object is determined by using the edit distance matrix corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
By using the image detection method, the problem of low image character recognition accuracy caused by poor image imaging quality due to the fact that the camera is far away from a detection object and the illumination of the shooting environment is complex can be effectively solved, the image detection technology can be applied to complex scenes such as a logistics loading and unloading port, the license plate of a vehicle can be automatically detected, and the scheduling management of the vehicle can be improved.
That is, in the image detection method of the embodiment of the present application, the character strings corresponding to the detection objects are first compared pairwise to obtain edit distances, from which an edit distance matrix is constructed. The zero-element properties of the matrix are then used to simplify the calculation in specific cases and directly obtain a fused detection result. If the edit distance matrix consists entirely of zero elements, the detection result is output directly. If it does not, the detection result is determined from the row with the most zero elements: if the number of zero elements in that row is greater than ceil(N/2), the character strings corresponding to those zero elements are fused and the detection result is output; if it is not greater than ceil(N/2), a character-position voting strategy is applied to the normalized images to obtain the optimal character for each character position, and the final detection result is then assembled from these optimal characters.
In order to better implement the image detection method provided by the embodiment of the present application, an embodiment of the present application further provides an image detection device, and the image detection device may be specifically integrated in a server. The meanings of the terms are the same as those in the image detection method above, and specific implementation details can refer to the description in the method embodiment.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application, where the image detection apparatus includes: the acquisition unit 401, the processing unit 402, the first determination unit 403, and the second determination unit 404 are as follows:
an acquiring unit 401, configured to acquire a plurality of images to be detected of a detection object;
the processing unit 402 is configured to perform image detection processing on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, where the detection result includes a character string;
a first determining unit 403, configured to determine, according to the character string, an edit distance matrix corresponding to the multiple images to be detected;
a second determining unit 404, configured to determine, according to the edit distance matrix, a target character string corresponding to the detection object.
In some embodiments, the first determining unit 403 is specifically configured to:
respectively carrying out editing distance calculation on any two character strings in the character strings to obtain the editing distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
Referring to fig. 5, in some embodiments, the second determining unit 404 includes:
a first determining subunit 4041, configured to determine the number of zero elements in each row in the edit distance matrix;
a second determining subunit 4042, configured to determine, from the number of zero elements in each row, the maximum number of zero elements with the largest numerical value;
a third determining subunit 4043, configured to determine whether the maximum number of zero elements is greater than a number threshold;
a fourth determining subunit 4044, configured to determine, when the maximum number of zero elements is greater than the number threshold, a character string corresponding to a zero element in the maximum number of zero elements as the target character string.
In some embodiments, the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and the apparatus further includes:
a third determining unit 405, configured to determine, when the maximum number of zero elements is not greater than the number threshold, a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence of the character;
a fourth determining unit 406, configured to determine the target character string according to the target character corresponding to each character position.
In some embodiments, the third determining unit 405 is specifically configured to:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
In some embodiments, the obtaining unit 401 is specifically configured to:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
In some embodiments, the obtaining unit 401 is further specifically configured to:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
In the embodiment of the present application, the obtaining unit 401 obtains a plurality of images to be detected of a detection object; then, the processing unit 402 performs image detection processing on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; the first determining unit 403 determines an edit distance matrix corresponding to a plurality of images to be detected according to the character string; finally, the second determining unit 404 determines a target character string corresponding to the detection object according to the edit distance matrix. In the embodiment of the application, the target character strings corresponding to the detection objects are determined by using the edit distance matrixes corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
Referring to fig. 6, embodiments of the present application provide a server 600, which may include one or more processors 601 with one or more processing cores, a memory 602 including one or more computer-readable storage media, a Radio Frequency (RF) circuit 603, a power supply 604, an input unit 605, and a display unit 606. Those skilled in the art will appreciate that the server structure shown in FIG. 6 is not meant to be limiting, and the server may include more or fewer components than those shown, combine some components, or arrange the components differently. Wherein:
the processor 601 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the server. Optionally, processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 executes various functional applications and data processing by operating the software programs and modules stored in the memory 602.
The RF circuitry 603 may be used for receiving and transmitting signals during the process of transmitting and receiving information.
The server also includes a power supply 604 (e.g., a battery) for powering the various components. Preferably, the power supply is logically coupled to the processor 601 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.
The server may also include an input unit 605, and the input unit 605 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The server may also include a display unit 606, and the display unit 606 may be used to display information input by the user or provided to the user, as well as various graphical user interfaces of the server, which may be made up of graphics, text, icons, video, and any combination thereof. Specifically, in this embodiment, the processor 601 in the server loads the executable file corresponding to the process of one or more application programs into the memory 602 according to the following instructions, and the processor 601 runs the application programs stored in the memory 602, thereby implementing various functions as follows:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
As can be seen from the above, in the embodiment of the present application, the image detection device obtains a plurality of images to be detected of the detection object; then, image detection processing is carried out on the multiple images to be detected respectively to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string; determining an edit distance matrix corresponding to a plurality of images to be detected according to the character string; and finally, determining a target character string corresponding to the detection object according to the editing distance matrix. In the embodiment of the application, the target character strings corresponding to the detection objects are determined by using the edit distance matrixes corresponding to the multiple images to be detected, and the image detection is performed by combining the multiple images to be detected, so that the accuracy of image identification can be improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the image detection methods provided in the present application. For example, the instructions may perform the steps of:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any image detection method provided in the embodiments of the present application, beneficial effects that can be achieved by any image detection method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing detailed description is directed to an image detection method, an image detection apparatus, and a storage medium provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the methods and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. An image detection method, comprising:
acquiring a plurality of images to be detected of a detection object;
respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, wherein the detection result comprises a character string;
determining an editing distance matrix corresponding to the plurality of images to be detected according to the character string;
and determining a target character string corresponding to the detection object according to the editing distance matrix.
2. The method according to claim 1, wherein the determining the edit distance matrix corresponding to the plurality of images to be detected according to the character string comprises:
respectively carrying out editing distance calculation on any two character strings in the character strings to obtain the editing distance between any two character strings;
and determining the editing distance matrix according to the editing distance.
3. The method according to claim 1, wherein the determining a target character string corresponding to the detection object according to the edit distance matrix comprises:
determining the number of zero elements of each row in the editing distance matrix;
determining the maximum zero element number with the maximum numerical value from the zero element numbers of each line;
determining whether the maximum number of zero elements is greater than a number threshold;
and if the number of the zero elements is larger than the number threshold, determining the character string corresponding to the zero element in the maximum zero element number as the target character string.
4. The method of claim 3, wherein the detection result further includes a character, a frame corresponding to the character, and a confidence of the character, and after determining whether the maximum number of zero elements is greater than a number threshold, the method further includes:
if the number of the characters is not larger than the number threshold, determining a target character corresponding to each character position according to the characters, the frames corresponding to the characters and the confidence degrees of the characters;
and determining the target character string according to the target character corresponding to each character position.
5. The method according to claim 4, wherein the determining a target character corresponding to each character position according to the character, a frame corresponding to the character, and a confidence level of the character comprises:
determining a character position corresponding to each character according to the frame corresponding to the character;
determining characters corresponding to each character position;
determining the score of each character in each character position according to the character corresponding to each character position and the confidence coefficient of the character;
and determining the character with the highest score in each character position as the target character corresponding to each character position.
6. The method according to any one of claims 1 to 5, wherein the acquiring a plurality of images to be detected of the detection object comprises:
acquiring a video to be detected of a detected object;
and acquiring the plurality of images to be detected from the video to be detected.
7. The method according to claim 6, wherein the obtaining the plurality of images to be detected from the video to be detected comprises:
determining the confidence coefficient of each frame of image to be detected in the video to be detected and the target pixel area of each frame of image to be detected;
and determining the multiple images to be detected from each frame of image to be detected according to the confidence coefficient of each frame of image to be detected and the target pixel area of each frame of image to be detected.
8. An image detection apparatus, characterized by comprising:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of images to be detected of a detection object;
the processing unit is used for respectively carrying out image detection processing on the multiple images to be detected to obtain a detection result corresponding to each image to be detected, and the detection result comprises a character string;
the first determining unit is used for determining an editing distance matrix corresponding to the multiple images to be detected according to the character strings;
and the second determining unit is used for determining a target character string corresponding to the detection object according to the editing distance matrix.
9. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps in the image detection method according to any one of claims 1 to 7.
10. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image detection method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910300190.1A CN111832554B (en) | 2019-04-15 | 2019-04-15 | Image detection method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111832554A true CN111832554A (en) | 2020-10-27 |
CN111832554B CN111832554B (en) | 2024-10-15 |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003223610A (en) * | 2002-01-28 | 2003-08-08 | Toshiba Corp | Character recognizing device and character recognizing method |
JP2011065646A (en) * | 2009-09-18 | 2011-03-31 | Fujitsu Ltd | Apparatus and method for recognizing character string |
CN102750379A (en) * | 2012-06-25 | 2012-10-24 | 华南理工大学 | Fast character string matching method based on filtering type |
CN103399907A (en) * | 2013-07-31 | 2013-11-20 | 深圳市华傲数据技术有限公司 | Method and device for calculating similarity of Chinese character strings on the basis of edit distance |
US20130314755A1 (en) * | 2012-05-23 | 2013-11-28 | Andrew C. Blose | Image capture device for extracting textual information |
CN103428307A (en) * | 2013-08-09 | 2013-12-04 | 中国科学院计算机网络信息中心 | Method and equipment for detecting counterfeit domain names |
CN103493067A (en) * | 2011-12-26 | 2014-01-01 | 华为技术有限公司 | Method and apparatus for recognizing a character of a video |
CN103996021A (en) * | 2014-05-08 | 2014-08-20 | 华东师范大学 | Fusion method of multiple character identification results |
CN104464736A (en) * | 2014-12-15 | 2015-03-25 | 北京百度网讯科技有限公司 | Error correction method and device for voice recognition text |
CN105930836A (en) * | 2016-04-19 | 2016-09-07 | 北京奇艺世纪科技有限公司 | Identification method and device of video text |
CN106203425A (en) * | 2016-07-01 | 2016-12-07 | 北京旷视科技有限公司 | Character identifying method and device |
CN106847288A (en) * | 2017-02-17 | 2017-06-13 | 上海创米科技有限公司 | The error correction method and device of speech recognition text |
CN107220639A (en) * | 2017-04-14 | 2017-09-29 | 北京捷通华声科技股份有限公司 | The correcting method and device of OCR recognition results |
RU2673015C1 (en) * | 2017-12-22 | 2018-11-21 | Общество с ограниченной ответственностью "Аби Продакшн" | Methods and systems of optical recognition of image series characters |
CN108920580A (en) * | 2018-06-25 | 2018-11-30 | 腾讯科技(深圳)有限公司 | Image matching method, device, storage medium and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN111832554B (en) | 2024-10-15 |
Similar Documents
Publication | Title |
---|---|
CN108898086B (en) | Video image processing method and device, computer readable medium and electronic equipment |
US8792722B2 (en) | Hand gesture detection |
US20120027263A1 (en) | Hand gesture detection |
CN112052186B (en) | Target detection method, device, equipment and storage medium |
CN112381104B (en) | Image recognition method, device, computer equipment and storage medium |
CN110378287B (en) | Document direction recognition method, device and storage medium |
CN111160202A (en) | AR equipment-based identity verification method, device, equipment and storage medium |
CN111767908B (en) | Character detection method, device, detection equipment and storage medium |
EP4113376A1 (en) | Image classification model training method and apparatus, computer device, and storage medium |
CN111461105A (en) | Text recognition method and device |
CN113902944A (en) | Model training and scene recognition method, device, equipment and medium |
CN117671548A (en) | Abnormal sorting detection method and device, electronic equipment and storage medium |
CN113192081B (en) | Image recognition method, image recognition device, electronic device and computer-readable storage medium |
CN114299546A (en) | Method and device for identifying pet identity, storage medium and electronic equipment |
WO2021138893A1 (en) | Vehicle license plate recognition method and apparatus, electronic device, and storage medium |
CN112532884A (en) | Identification method and device and electronic equipment |
CN112949672A (en) | Commodity identification method, device, equipment and computer readable storage medium |
CN111179218A (en) | Conveyor belt material detection method and device, storage medium and terminal equipment |
EP4332910A1 (en) | Behavior detection method, electronic device, and computer readable storage medium |
CN111832554B (en) | Image detection method, device and storage medium |
CN112214639B (en) | Video screening method, video screening device and terminal equipment |
CN114170576A (en) | Method and device for detecting repeated images |
CN113269730A (en) | Image processing method, image processing device, computer equipment and storage medium |
CN114255321A (en) | Method and device for collecting pet nose print, storage medium and electronic equipment |
CN113111888A (en) | Picture distinguishing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |