CN115410105A - Container mark identification method, device, computer equipment and storage medium

Container mark identification method, device, computer equipment and storage medium

Info

Publication number
CN115410105A
Authority
CN
China
Prior art keywords
target
video frame
code
candidate
information
Prior art date
Legal status
Pending
Application number
CN202110511830.0A
Other languages
Chinese (zh)
Inventor
毛钺铖
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202110511830.0A
Publication of CN115410105A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target mark identification method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring a video stream obtained by monitoring a target; performing image detection processing on the video frames in the video stream, and screening out the video frames that include the target as target video frames; performing target tracking processing based on the target video frames to obtain a travel track of the target; and, when a current video frame among the target video frames is determined, based on the travel track, to meet a preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information. The method improves mark identification accuracy.

Description

Container mark identification method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method and an apparatus for identifying a target mark, a computer device, and a storage medium.
Background
As a major aggregation point for cargo in logistics transportation, the port plays an important role in both international and domestic logistics. Containers are the most common way of storing and packaging goods and are widely used in port cargo management and transportation. Since large numbers of container trucks enter and leave a port every day, entry and exit registration of containers is essential for port logistics and for cargo and security management.
Each container carries a corresponding tag (located on the side of the container's end door), from which the origin of the container and other information can be identified. Registering containers by manual observation is time-consuming, labor-intensive, and costly. The current conventional approach is to identify and register the container tag intelligently through a camera: traditional computer vision is used to locate the container, common morphological transformations such as binarization, dilation, and erosion are applied, and the container tag is then extracted using statistical principles or template character matching. However, traditional computer vision methods require hand-designed features, are unstable in complex scenes, and suffer from inaccurate tag identification.
Disclosure of Invention
In view of the foregoing, there is a need for a target mark identification method, apparatus, computer device, and storage medium capable of improving mark identification accuracy.
A container tag identification method, the method comprising:
acquiring a video stream obtained by monitoring a target;
performing image detection processing on the video frames in the video stream, and screening out the video frames that include the target as target video frames;
performing target tracking processing based on the target video frames to obtain a travel track of the target; and
when a current video frame among the target video frames is determined, based on the travel track, to meet a preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information.
A container tag identification apparatus, the apparatus comprising:
an acquisition module, configured to acquire a video stream obtained by monitoring a target;
a screening module, configured to perform image detection processing on the video frames in the video stream and screen out the video frames that include the target as target video frames;
a target tracking processing module, configured to perform target tracking processing based on the target video frames to obtain a travel track of the target; and
a mark identification module, configured to perform mark identification based on a current video frame, when the current video frame among the target video frames is determined, based on the travel track, to meet a preset target condition, to obtain corresponding target mark information.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a video stream obtained by monitoring a target;
performing image detection processing on the video frames in the video stream, and screening out the video frames that include the target as target video frames;
performing target tracking processing based on the target video frames to obtain a travel track of the target; and
when a current video frame among the target video frames is determined, based on the travel track, to meet a preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a video stream obtained by monitoring a target;
performing image detection processing on the video frames in the video stream, and screening out the video frames that include the target as target video frames;
performing target tracking processing based on the target video frames to obtain a travel track of the target; and
when a current video frame among the target video frames is determined, based on the travel track, to meet a preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information.
According to the above target mark identification method, apparatus, computer device, and storage medium, image detection processing is performed on the video stream obtained by monitoring the target, and the target video frames that include the target are screened out. The target is then accurately tracked based on the screened target video frames. Based on the tracking result, a clear frame containing the complete target can be extracted from the video stream automatically and reliably, text recognition is performed on it, and the target mark information is extracted. In the scene of a monitoring video stream, a video frame meeting the preset target condition can thus be captured automatically, without manually designed features, so the target mark is identified accurately from that frame. Moreover, the whole process is automatic detection and identification, which greatly improves mark identification efficiency.
Drawings
FIG. 1 is a diagram of the application environment of the target mark identification method in one embodiment;
FIG. 2 is a schematic flow chart of a target mark identification method in one embodiment;
FIG. 3 is a flowchart of the step of performing image detection processing on the video frames in a video stream and screening out the video frames that include the target as target video frames, in one embodiment;
FIG. 4 is a schematic comparison of a video frame containing an incomplete container and a video frame containing a complete container in one embodiment;
FIG. 5 is a schematic diagram of the three-point snapshot scheme in one embodiment;
FIG. 6 is a flow diagram of the container tag information extraction step based on the current video frame in one embodiment;
FIG. 7 is a schematic illustration of the possible positions of box type codes in a container tag in one embodiment;
FIG. 8 is a schematic flow chart of a container tag identification method in one embodiment;
FIG. 9 is a block diagram of a target mark identification apparatus in one embodiment;
FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target mark identification method provided by this application can be applied in the environment shown in FIG. 1, where the monitoring device 102 communicates with the computer device 104 over a network. The monitoring device 102 is deployed at a preset position, such as a road intersection, and monitors in real time the targets passing that position. The monitoring device 102 transmits the captured video stream to the computer device 104. The computer device 104 performs image detection processing on the video frames in the video stream and screens out the video frames that include the target as target video frames. The computer device 104 then performs target tracking processing based on the target video frames to obtain the travel track of the target; when a current video frame among the target video frames is determined, based on the travel track, to meet the preset target condition, mark identification is performed based on the current video frame to obtain corresponding target mark information.
The monitoring device 102 may specifically be a monitoring camera. The computer device 104 may specifically be a terminal or a server. The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server can be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, there is provided a container tag identification method, which is illustrated by applying the method to the computer device in fig. 1, and includes the following steps:
step S202, a video stream obtained by monitoring the target is obtained.
Specifically, a monitoring device may be deployed in advance at a location the target passes. For example, when the target is a transport vehicle, a monitoring device may be installed beforehand at a road gate used for cargo transport, where it monitors the vehicles passing through the gate in real time. The computer device can then obtain the video stream collected and transmitted by the monitoring device.
It should be noted that one application scenario of this application is port logistics, where goods are transported in containers loaded on vehicles. When a vehicle passes through a road gate, the containers entering and exiting the gate need to be registered and managed; a monitoring device is therefore provided at the road gate to monitor and manage the vehicles passing through it.
Step S204: perform image detection processing on the video frames in the video stream, and screen out the video frames that include the target as target video frames.
Specifically, the computer device may perform image detection processing on each video frame in the video stream, respectively, to obtain an image detection result. The image detection result may specifically be whether a target is detected in the video frame. Further, the computer device may screen out a video frame including the target from the video frames as a target video frame based on the image detection result.
In one embodiment, the target may be a transportation vehicle, and accordingly, the computer device may filter out video frames including the transportation vehicle from the video frames as the target video frame. In one embodiment, the object may be a container vehicle on which a container is loaded. The computer device, upon detecting a container vehicle, further determines whether a complete container is included on the container vehicle. Further, the computer device may screen out video frames from the video frames that include container vehicles and full containers as target video frames.
In one embodiment, the computer device may feed the video stream frame by frame into the algorithm input interface and first run an object detection algorithm for vehicle detection. Because vehicles at the gate are moving, a detected vehicle does not necessarily carry a complete container in a given frame; container integrity must therefore also be judged on the detected vehicle image block (patch). In further embodiments, only some of the frames may be fed into the algorithm input interface, for example every few frames, or frames randomly sampled from the video stream.
In one embodiment, the computer device may feed the video frames into a pre-trained image detection model, which processes each frame and outputs the target bounding box in it, the box's associated attributes, and whether the box contains a complete target. The attributes of the bounding box may specifically include its coordinates, length, width, and confidence, where the confidence represents how reliably the bounding box encloses the target vehicle.
In one embodiment, the image detection model may be trained using images that contain complete targets as positive samples and images that contain incomplete targets, or no target, as negative samples. When the target is a container vehicle, the model is trained with vehicle images carrying a complete container as positive samples and vehicle images carrying an incomplete container as negative samples: in a positive sample, the vehicle region and the complete container it contains are annotated; in a negative sample, the vehicle region and the incomplete container are annotated. Images containing no vehicle at all may also be used as negative samples. Trained on such positive and negative samples, the image detection model can both detect the vehicle and identify whether the vehicle carries a complete container.
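To make the screening step concrete, the following is a minimal sketch of how such detection output could be filtered into target video frames. The Detection fields mirror the bounding-box attributes listed above (coordinates, length, width, confidence, completeness flag); the `detect` callable stands in for the trained image detection model and is an assumption for illustration, not an interface defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Detection:
    x: float           # top-left x of the bounding box
    y: float           # top-left y of the bounding box
    w: float           # bounding-box width
    h: float           # bounding-box height
    confidence: float  # how reliably the box encloses the target vehicle
    complete: bool     # whether the box contains a complete container

def screen_target_frames(frames, detect: Callable) -> List[Tuple[int, object, Detection]]:
    """Keep only the frames whose detected target carries a complete container."""
    target_frames = []
    for idx, frame in enumerate(frames):
        det: Optional[Detection] = detect(frame)   # hypothetical trained model
        if det is not None and det.complete:
            target_frames.append((idx, frame, det))
    return target_frames
```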
Step S206: perform target tracking processing based on the target video frames to obtain the travel track of the target.
Target tracking processing means tracking and labeling the same target as it appears in different frames. In this application, the main goal is to ensure that the targets appearing in consecutive frames are the same object, so that the target's travel track can be recovered.
In one embodiment, step S206, performing target tracking processing based on the target video frames to obtain the travel track of the target, includes: acquiring the position information of the bounding box that locates the target in each target video frame; performing target tracking processing based on the position information of the bounding boxes to obtain the object identifier corresponding to each bounding box; and determining the travel track of the target based on the position information of the bounding boxes that correspond to the same object identifier across the target video frames.
In some embodiments, the computer device may feed the target video frames that contain the target's bounding box into the tracking algorithm, while video frames determined not to include the target, or not to include the complete target, are not input into it. The bounding box containing the target may be the bounding box that locates the target, obtained when target detection is performed on the video frame.
In some embodiments, when the target is a container vehicle, the computer device may feed the target video frames containing the bounding box of a complete container into the tracking algorithm, while video frames identified as containing an incomplete container are not input into it. The bounding box containing the complete container may be the bounding box of the target vehicle obtained during target detection on the video frame.
Further, the computer device performs target tracking processing on the input target video frames and returns the object identifier (also called the tracking identifier) of the target in each frame. If the tracked objects in different target video frames are the same object, the same track id is returned. In this embodiment, the computer device may use the SORT (Simple Online and Realtime Tracking) algorithm, which is based on Kalman filtering; other tracking algorithms may also be used, for example a multi-target tracker built on multi-threaded single-target trackers, or an end-to-end deep-learning multi-target tracker. This embodiment does not limit the choice.
Further, the computer device may treat the bounding boxes that carry the same object identifier across the target video frames as the same target, and may then determine the travel track of the target from the position information of those bounding boxes.
In some embodiments, to prevent losing the tracked target (i.e., assigning the same physical target different object identifiers), frames in which the target is missed are not used for tracking. Specifically: for the detection frame currently being processed in the video stream, if the target is detected, the frame is fed to the tracking algorithm, which updates the currently tracked target (for example, a container vehicle). If the object identifier assigned in this frame matches that of the previous frame, the two frames show the same target; if instead the detection result is empty, the tracked target is simply not updated, which prevents the track from being lost.
Further, if the currently tracked object identifier is not detected for a preset number of consecutive frames (T_miss frames), the tracked target object can be discarded: the target is considered to have left the camera area, and an object appearing later is treated as a new target with a new object identifier. The preset frame count here is counted in original video frames; for example, T_miss may be set to 100.
In one embodiment, the step of performing target tracking processing based on the position information of the bounding boxes to obtain the object identifier corresponding to each bounding box specifically includes: for the current tracking frame, when the object identifier determined from the position information of the bounding box in the current frame is the same as that of the previous frame, keeping it as the object identifier of the current frame; otherwise, treating the tracking result as empty and not updating the tracked target. When the currently tracked object identifier has not been detected for the preset number of consecutive frames, tracking of that object identifier stops; when a bounding box appears in a subsequent target video frame, tracking restarts under a new object identifier.
In this way, by tracking consecutive target video frames, the travel track of the same target, i.e., how the position coordinates of the tracked target change across the target video frames, can be acquired in real time.
In the above embodiment, by tracking the target's bounding boxes across the target video frames, the object identifier corresponding to each bounding box can be determined. Bounding boxes that share the same object identifier can be treated as the same target, so the target is tracked accurately and its travel track obtained.
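As an illustration of the update-and-discard logic above, here is a simplified single-lane tracking sketch, assuming at most one vehicle in view at a time, as in the gate scenario. It is not the SORT implementation itself; a production tracker would add Kalman-filter prediction and IoU-based association. The T_miss value of 100 follows the example above.

```python
def track_stream(boxes_per_frame, t_miss=100):
    """Assign object identifiers across frames.

    `boxes_per_frame` yields one bounding-box position per frame, or None
    when no complete target was detected in that frame (the tracker is then
    simply not updated, which prevents losing the track).
    """
    next_id = 0
    current_id = None
    missed = 0
    trajectories = {}   # object identifier -> [(frame index, box), ...]

    for frame_idx, box in enumerate(boxes_per_frame):
        if box is None:
            missed += 1
            if current_id is not None and missed >= t_miss:
                current_id = None   # target has left the camera area
            continue
        missed = 0
        if current_id is None:
            current_id = next_id    # a later object gets a fresh identifier
            next_id += 1
            trajectories[current_id] = []
        trajectories[current_id].append((frame_idx, box))
    return trajectories
```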
Step S208: when a current video frame among the target video frames is determined, based on the travel track, to meet the preset target condition, perform mark identification based on the current video frame to obtain corresponding target mark information.
The preset target condition may be understood as a preset image clarity condition, which may consist of one or more sub-conditions. Specifically, for the same target, the computer device tracks the target across the target video frames in real time based on its travel track and judges each target video frame in the stream as it arrives; once the current video frame meets the preset image clarity condition, that frame can be captured directly, and mark identification is performed on it to obtain the target mark information.
In one embodiment, the target may be a container vehicle and the target mark information may be container tag information. For the currently tracked container vehicle, the computer device can screen one high-quality frame out of the target video frames for container tag identification, obtaining the tag information of the container carried by the vehicle passing through the lane gate.
In one embodiment, after the computer device has captured one high-quality current video frame for tag identification, the subsequent video frames under the same object identifier need not be processed and can be discarded directly until the object identifier switches, after which step S208 is repeated; this reduces the processor's computational load.
In one embodiment, the current video frame meeting the preset target condition may specifically be the sharpest frame within a preset number of preceding frames; it may also be the currently most stable frame as determined from the preceding frames; it may also be a frame in which the area ratio of the bounding box to the video frame exceeds a preset ratio threshold, or in which the distance from the bounding box border to the video frame border is below a preset distance, and so on.
Further, the computer device may perform OCR (Optical Character Recognition) on the current video frame that meets the preset target condition, recognizing the character information in the frame and detecting the position of each text line, and thereby identify the container tag information present on the container in the current video frame. In one embodiment, the computer device may use a DB (Differentiable Binarization) network together with a CRNN + CTC (convolutional recurrent neural network with connectionist temporal classification) network for the OCR.
In one embodiment, after OCR on the current video frame yields the corresponding text information, the computer device may take as target mark information the text whose length, content, or position matches the target mark characteristics.
According to the above target mark identification method, image detection processing is performed on the video stream obtained by monitoring the target, and the target video frames that include the target are screened out. The target is then accurately tracked based on the screened target video frames. Based on the tracking result, a clear frame containing the complete target can be extracted from the video stream automatically and reliably, text recognition is performed on it, and the target mark information is extracted. In the scene of a monitoring video stream, a video frame meeting the preset target condition can thus be captured automatically, without manually designed features, so the target mark is identified accurately from that frame. Moreover, the whole process is automatic detection and identification, which greatly improves mark identification efficiency.
In one embodiment, the step S204, namely, performing image detection processing on the video frames in the video stream, and screening out a video frame including a target from the video frames as a target video frame, includes:
step S302, each video frame in the video stream is respectively subjected to target detection to obtain candidate video frames comprising targets.
In some embodiments, the computer device may pre-process each video frame in the video stream (e.g., resize it to a particular size) and input it into the target detection model. The target detection model may specifically be a model such as YOLO (a target detection algorithm), YOLOv2, YOLOv3, SSD (Single Shot MultiBox Detector), Faster R-CNN (a neural-network-based target detection algorithm), or CenterNet (an object detection algorithm). The model then detects whether the video frame includes the target.
In some embodiments, the computer device may resize each video frame, input it into the target detection model, and obtain one or more detection boxes plus each box's attribute information, such as coordinates, length, width, and confidence. For each video frame, one detection box is then selected as the bounding box according to a specific criterion; in this embodiment, the criterion may be the detection box with the largest area, which serves as the bounding box locating the target in the frame. In other embodiments, the detection box with the highest confidence may be selected instead. The embodiments of this application do not limit this.
For example, the computer device may use a target detection model built on the YOLOv3 detection network and feed it video frames resized to 416 × 416. The confidence threshold is set to 0.5 (it gauges how reliably a detection belongs to a class; detections below the threshold are discarded) and the non-maximum suppression threshold to 0.3 (it handles overlapping detection boxes; boxes whose overlap exceeds the value are judged to be the same object and the redundant ones are discarded). The model outputs the detection boxes that contain the target together with each box's attributes, such as coordinates, length, width, and confidence. The computer device then filters this result: for each video frame, one of the detection boxes is selected as the subject target according to a specific criterion, i.e., the bounding box is selected. In this embodiment, the criterion may be the box with the largest area, which is taken as the subject target of the video frame.
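The post-processing just described (confidence threshold 0.5, NMS threshold 0.3, then keeping the largest-area box as the subject target) might look like the following sketch; the (x1, y1, x2, y2) box format and the function names are assumptions for illustration.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def pick_subject_box(boxes, scores, conf_th=0.5, nms_th=0.3):
    """Drop low-confidence detections, suppress overlapping duplicates,
    and return the index of the largest surviving box (the subject target)."""
    keep = []
    for i in np.argsort(scores)[::-1]:
        if scores[i] < conf_th:
            break                    # scores are sorted, the rest are lower
        if all(iou(boxes[i], boxes[j]) < nms_th for j in keep):
            keep.append(i)
    if not keep:
        return None
    return max(keep, key=lambda i: (boxes[i][2] - boxes[i][0]) *
                                   (boxes[i][3] - boxes[i][1]))
```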
It should be noted that in a road-gate scene where container vehicles are to be identified, a gate generally has only one lane's width, allowing a single vehicle through at a time. In this case the subject target in a video frame can be assumed by default to be the target vehicle to detect; even if the subject target is not the target vehicle, there is no harm, because the target integrity judgment in the subsequent steps will also reject such subjects.
Step S304: input the candidate video frames into the target integrity discrimination model, process them through the model, and output the target integrity judgment result.
The target integrity discrimination model judges the candidate video frames detected in the preceding step and determines whether a frame in which the target was detected includes the complete target object. In one embodiment, when the target is a container vehicle and the target mark information is container tag information, the target integrity discrimination model may specifically be a container integrity discrimination model, and the judgment is whether a candidate video frame detected in the preceding step includes a complete container, i.e., a container showing its complete tag information.
In one embodiment, the computer device may build the target integrity discrimination model on a classification network such as DenseNet (densely connected network), VGGNet, or ResNet (residual network), train it on the training samples, and obtain the trained model. The candidate video frames are then classified by the target integrity discrimination model, which outputs the target integrity judgment result.
In one embodiment, when the target is a container vehicle, the corresponding model may specifically be a container integrity discrimination model, built on the same kinds of classification networks (DenseNet, VGGNet, ResNet, and so on), trained on the training samples, and then used to classify the candidate video frames and output the container integrity judgment result. The container integrity judgment result may specifically indicate either that a complete container is present or that an incomplete container is present.
Referring to FIG. 4: the left side shows a video frame containing an incomplete container, and the right side shows a video frame containing a complete container.
In one embodiment, the computer device may resize the bounding box region of a candidate video frame to a preset size (e.g., 224 × 224) and input the image block corresponding to the bounding box into the target integrity discrimination model, which outputs complete or incomplete together with a target-integrity confidence. Cropping to the bounding box removes the interference of unrelated information elsewhere in the frame and improves judgment accuracy.
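A minimal inference sketch for this step, assuming a PyTorch classifier with a two-way head ordered [incomplete, complete]; the head order and the helper names are assumptions, and trained weights would be loaded in practice.

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(num_classes=2)   # e.g. the ResNet-50 mentioned below
model.eval()                             # in practice, load trained weights first

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),       # the preset size mentioned above
    transforms.ToTensor(),
])

def judge_integrity(frame: Image.Image, box):
    """Crop the bounding box, resize to 224x224, classify complete/incomplete."""
    x1, y1, x2, y2 = box
    patch = frame.crop((x1, y1, x2, y2))     # discard unrelated background
    inp = preprocess(patch).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(inp), dim=1)[0]
    is_complete = bool(probs[1] > probs[0])
    return is_complete, float(probs.max())  # verdict and its confidence
```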
In some embodiments, the computer device may train the container integrity discrimination model as follows. First, collect data (training samples and validation/test data): collect video data at a gate, detect the subject target (the vehicle) with the target detection network, and store the vehicle image blocks to disk. Because the vehicle moves continuously through the gate, the container enters the camera's view progressively, from incomplete to complete and then back to incomplete, both on entry and on exit; the boundary between video frames with an incomplete (abnormal) container and a complete (normal) container is therefore easy to determine, and the frames can be labeled complete/incomplete quickly in batches (for the purposes of this application, a frame counts as complete when the side bearing the complete container tag information is fully visible). In addition, complete images of other container vehicles may be collected and added to the training set as positive samples. For negative samples, besides the collected incomplete-container images, the positive-sample images may be randomly cropped: a cropped positive sample (with the container cut through) becomes a negative sample, which augments the data and enlarges the sample space of the training set.
Further, the computer device may use ResNet-50, train it with the positive and negative samples, test the trained model on the validation data, and end training when a termination condition is met. The termination condition includes any one of the following: 1) the accuracy of the trained container integrity discrimination model exceeds a threshold, e.g., 98%; 2) the difference in accuracy between two or more consecutive training rounds is below a threshold, e.g., below 0.1%; 3) the difference in the loss function between two or more consecutive training rounds is below a threshold, e.g., below 0.1%; 4) training stops after the full training set has been used a certain number of times, and the best-performing model, or the last trained model, is selected.
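The four termination conditions can be collected into one check that runs after each validation pass. The sketch below uses the thresholds quoted above (98% accuracy, 0.1% deltas), while `max_rounds` is an assumed stand-in for the count in condition 4.

```python
def should_stop(acc_history, loss_history, round_idx, max_rounds,
                acc_th=0.98, delta_th=0.001):
    """Return True once any of the four termination conditions holds."""
    if acc_history and acc_history[-1] >= acc_th:                # condition 1
        return True
    if len(acc_history) >= 2 and \
            abs(acc_history[-1] - acc_history[-2]) < delta_th:   # condition 2
        return True
    if len(loss_history) >= 2 and \
            abs(loss_history[-1] - loss_history[-2]) < delta_th: # condition 3
        return True
    return round_idx >= max_rounds                               # condition 4
```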
Step S306: screen the candidate video frames whose target integrity judgment result indicates a complete target, and use them as the target video frames.
Specifically, the computer device may select as target video frames those candidate video frames whose target integrity judgment result indicates that the complete target is present.
In the above embodiment, candidate video frames that include the target are screened out of the video stream by target detection. Because a moving target does not necessarily show a complete mark pattern when detected, integrity is then judged on the detected target, and only the video frames containing the complete target are kept for subsequent tracking. This ensures that the later mark identification operates on a complete mark pattern, improving identification accuracy.
In one embodiment, when it is determined based on the travel track that a current video frame among the target video frames meets the preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information includes: determining, according to the travel track, the current video frame among the target video frames and the preceding frames before it, the preceding frames including a first video frame before the current video frame and a second video frame before the first video frame; determining whether the current video frame meets the preset target condition according to the target-related information corresponding to each of the first video frame, the second video frame, and the current video frame, the target-related information including at least one of the position information corresponding to the target and the target-complete confidence; and, when the current video frame meets the preset target condition, performing mark identification based on the current video frame to obtain the corresponding target mark information.
Specifically, the computer device may determine the current video frame among the target video frames, and the preceding frames before it, according to the travel track of the target. The preceding frames specifically include a first video frame before the current video frame and a second video frame before the first video frame. The first video frame may be the frame immediately before the current frame, or N frames before it; likewise the second video frame may be the frame immediately before the first video frame, or N frames before it; this embodiment does not limit this.
Furthermore, the computer device can determine whether the current video frame meets the preset target condition according to the target-related information corresponding to each of the first video frame, the second video frame, and the current video frame, the target-related information including at least one of the position information corresponding to the target and the target-complete confidence. When the current video frame meets the preset target condition, mark identification is performed based on the current video frame to obtain the corresponding target mark information.
In one embodiment, when the target is a container vehicle, the computer device may determine whether the current video frame meets the preset target condition according to the container-related information corresponding to each of the first video frame, the second video frame, and the current video frame, the container-related information including at least one of the position information corresponding to the container and the container-integrity confidence; when the current video frame meets the preset target condition, container tag identification is performed based on the current video frame to obtain the corresponding container tag information.
In one embodiment, the position information corresponding to the target is a preset position point of the bounding box that locates the target vehicle, and the preset target condition includes at least one of the following: (condition 1) the included angle between a first vector, formed from the first preset position point of the first video frame and the second preset position point of the second video frame, and a second vector, formed from the second preset position point and the third preset position point of the current video frame, is at most a preset angle threshold; (condition 2) the projection distance from the third preset position point to the second preset position point is greater than the distance from the first preset position point to the second preset position point; (condition 3) the target-complete confidence of the current video frame is at least a preset confidence threshold.
It should be noted that while the target is moving, its travel track can be obtained by tracking; but to capture a stable, high-quality frame in real time, the system must decide during tracking whether to capture, i.e., it must judge the current video frame rather than wait for the whole track to be driven.
For the container identification scenario, the aim is to identify the tag on the container, and the camera of the monitoring device at the road gate usually shoots the vehicle from behind; once the vehicle is fully visible in the camera area, the earlier the frame, the closer the vehicle is to the camera, so the vehicle should be captured as early as possible rather than after it has moved farther away.
To obtain a more stable snapshot, whether the current video frame should be captured can be decided by testing it against at least one of the following three conditions. FIG. 5 shows a schematic of the three-point snapshot scheme: A, B, C, D, E are the tracked center-point positions over five frames, with trajectory A->B->C->D->E, i.e., A occurs earlier than B, B earlier than C, and so on. The center point may specifically be the position coordinate of the center of the bounding box that locates the target in the video frame, as in the preceding embodiments, or another feature point of the bounding box, such as a corner; the embodiments of this application do not limit this.
Assume the current video frame is the moment C occurs, so the trajectory A->B->C exists, while D and E belong to future moments and have not yet occurred. Point C is then the third preset position point corresponding to the current video frame; point B is the first preset position point corresponding to the first video frame; point A is the second preset position point corresponding to the second video frame. Take the position information of A, B, and C and compute the included angle between the first vector AB and the second vector AC; the angle must be smaller than the threshold θ_th (condition 1). Draw the perpendicular to the line AB through A (the vertical line in FIG. 5) and compute the projection distance from point C to that line, denoted d_CA, and the distance between A and B, denoted d_AB; d_CA must be greater than d_AB (condition 2). The target-complete confidence at point C in FIG. 5 must be greater than the preset confidence threshold c_th (alternatively, confidence thresholds may be set for A, B, and C together) (condition 3). If at least one of conditions 1, 2, and 3 is met, the current video frame is captured; understandably, when all three hold simultaneously, the image quality of the current video frame is better. The current video frame is the frame where C lies; if it fails the test, the frame corresponding to D arrives at the next moment, and conditions 1, 2, and 3 are recomputed on the three-point trajectory B->C->D. With time (or the frame index) as the sequence of independent variables, this repeats until some frame satisfies the conditions and is captured. Once a qualifying current video frame has been captured, later frames under the same tracked object identifier are no longer captured and are discarded directly, guaranteeing that each tracked object identifier is captured exactly once. In one embodiment, θ_th is set to 15° and c_th to 0.9, but other values may be used; this application is not limited in this respect.
In one embodiment, conditions 1 and 2 above ensure that the currently tracked video frame is stable and sharp, which facilitates the subsequent container tag identification.
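The three conditions can be evaluated directly from the three tracked center points. The sketch below follows the geometry of FIG. 5 with θ_th = 15° and c_th = 0.9 as in the example above; since the patent captures the frame when at least one condition holds, the caller can combine the results with `any(...)`, or with `all(...)` for the highest-quality case.

```python
import math

def snapshot_conditions(a, b, c, conf_c, theta_th=15.0, conf_th=0.9):
    """Evaluate the three capture conditions for center points A, B, C.

    a, b, c are (x, y) center points of the tracked bounding box, oldest
    first (A, then B, then the current frame C); conf_c is the current
    frame's target-complete confidence.
    """
    ab = (b[0] - a[0], b[1] - a[1])
    ac = (c[0] - a[0], c[1] - a[1])
    d_ab = math.hypot(*ab)
    d_ac = math.hypot(*ac)
    cond3 = conf_c >= conf_th                      # condition 3
    if d_ab == 0.0 or d_ac == 0.0:
        return False, False, cond3                 # degenerate trajectory
    cos_angle = (ab[0] * ac[0] + ab[1] * ac[1]) / (d_ab * d_ac)
    cos_angle = max(-1.0, min(1.0, cos_angle))
    cond1 = math.degrees(math.acos(cos_angle)) <= theta_th  # near-straight track
    d_ca = d_ac * cos_angle        # projection of AC onto AB's direction
    cond2 = d_ca > d_ab            # C lies beyond B along the motion direction
    return cond1, cond2, cond3
```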
In one embodiment, for a scene shot by the camera from the front, the capture must instead happen as late as possible to extract the information; the three-point snapshot method can still be applied (counting three frames from the front once the next tracked object identifier appears), so that a qualifying video frame is extracted as quickly and accurately as possible.
In the above embodiment, whether the current video frame meets the preset target condition can be determined accurately and quickly from the target-related information corresponding to each of the first video frame, the second video frame, and the current video frame. Because the target-related information includes at least one of the target's position information and the target-complete confidence, a high-quality current video frame can be screened out of the video frames for target mark identification, which greatly improves mark identification accuracy.
The following gives a detailed description, taking the target to be a container vehicle and the tag information to be the container number. In one embodiment, the container tag information includes information of at least one information category, and performing container tag identification based on the current video frame to obtain the corresponding container tag information includes: performing image character recognition on the current video frame to obtain a plurality of candidate text lines; determining, for each recognized candidate text line, its text length; matching the text length of each candidate text line against the preset lengths, and adding the candidate text line to the candidate set of the information category corresponding to the matched preset length; and screening, from the candidate sets of the information categories, the target text lines that match the container tag characteristics, then obtaining the corresponding container tag information from the screened target text lines.
The information categories may specifically include the check code, the owner code (box owner code), the registration code, and the box type code. The preset length corresponding to the check code is a first value (e.g., 1); the preset length corresponding to the owner code and the box type code is a second value (e.g., 4); the preset length corresponding to the registration code is a third value (e.g., 6); the preset lengths further include a fourth value (e.g., 7), which is the sum of the first and third values.
Specifically, the computer device may perform image character recognition on the current video frame to obtain a plurality of candidate text lines and the content of each. For each recognized candidate text line, its text length is determined. When the length is the first value, the line is added to the candidate check code set. When the length is the second value, it is looked up in the owner code table and the box type code table: if it matches in the owner code table, it is added to the candidate owner code set; if it matches in the box type code table, it is added to the candidate box type code set. When the length is the third value and the content is all digits, the line is added to the candidate registration code set. When the length is the fourth value, the line is split and the candidate-set steps for the first and third values are applied to the parts. When the length matches no preset length, the line is discarded. Each candidate text line can be added to its candidate set in this manner.
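A sketch of the partitioning rules above, assuming simple exact-match lookup tables for the owner and box type codes (the edit-distance error correction described later is omitted here); the (text, line_box) tuple layout of an OCR line is an assumption.

```python
def partition_text_lines(text_lines, owner_codes, box_type_codes):
    """Sort OCR text lines into candidate sets by the length rules above.

    `text_lines` is a list of (text, line_box) pairs; `owner_codes` and
    `box_type_codes` are the lookup tables mentioned in the text.
    """
    checks, owners, types, regs = [], [], [], []
    for text, line_box in text_lines:
        text = text.replace(" ", "")
        n = len(text)
        if n == 1 and text.isdigit():
            checks.append((text, line_box))          # candidate check code
        elif n == 4:
            if text in owner_codes:
                owners.append((text, line_box))      # candidate owner code
            elif text in box_type_codes:
                types.append((text, line_box))       # candidate box type code
        elif n == 6 and text.isdigit():
            regs.append((text, line_box))            # candidate registration code
        elif n == 7:
            reg, chk = text[:6], text[6:]            # split 6-1 and re-apply rules
            if reg.isdigit():
                regs.append((reg, line_box))
            if chk.isdigit():
                checks.append((chk, line_box))
        # any other length is discarded as interference
    return checks, owners, types, regs
```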
Furthermore, the computer device can screen, from the candidate sets of the information categories, the target text lines that match the container tag characteristics, and obtain the corresponding container tag information from them.
In one embodiment, this screening includes: performing a nested traversal over the candidate owner code set and the candidate registration code set; for each traversed combination, computing the corresponding test check code and looking it up in the candidate check code set; when the test check code is found there, and the candidate text lines holding the owner code and registration code of the current combination, together with the candidate text line of the test check code, satisfy the collinearity condition, taking the owner code of the current combination as the target owner code, its registration code as the target registration code, and the test check code as the target check code; taking, from the candidate box type code set, the box type code that satisfies the position correspondence with the target owner code or the target registration code as the target box type code; and taking the target owner code, target registration code, target check code, and target box type code together as the container tag information of the container.
It should be noted that the collinearity condition on the candidate text lines of the three codes may specifically require the center points of the three lines to be collinear. To test this, the area of the triangle formed by connecting the three center points can be computed directly; if the area is smaller than a threshold A_th, the points are treated as collinear.
The box type code satisfies the position correspondence with the target owner code or the target registration code specifically when the candidate text line of the box type code lies below the text line of the target owner code, or below the text line of the target registration code.
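Both geometric tests reduce to a few lines, as sketched below; the triangle-area threshold A_th and the use of pixel coordinates are assumptions, while the 0.7 ratio follows the example given in step i) later.

```python
import math

def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by three points, via the cross product."""
    return abs((p2[0] - p1[0]) * (p3[1] - p1[1]) -
               (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0

def roughly_collinear(c_owner, c_reg, c_check, area_th=100.0):
    """Center points count as collinear when their triangle is tiny (< A_th)."""
    return triangle_area(c_owner, c_reg, c_check) < area_th

def type_code_below(anchor_tl, anchor_bl, type_tl, ratio_th=0.7):
    """True when the box type code sits just below the anchor (owner or
    registration code): the anchor's own height is a large fraction of the
    distance down to the type code's top-left corner."""
    d_height = math.dist(anchor_tl, anchor_bl)   # AB or DE in FIG. 7
    d_to_type = math.dist(anchor_tl, type_tl)    # AC or DF in FIG. 7
    return d_to_type > 0 and d_height / d_to_type > ratio_th
```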
Therefore, the target text lines which accord with the container marking characteristics can be accurately and quickly screened out from the candidate sets respectively corresponding to the information categories, and the corresponding container marking information can be obtained based on the screened target text lines.
In a specific embodiment, referring to FIG. 6, a schematic flow chart of the container tag information extraction step based on the current video frame, the extraction may specifically proceed as follows:
a) Select one candidate text line from the OCR recognition results; if its confidence is greater than a threshold (e.g., 0.4), continue to the next step; otherwise discard it.
b) Remove the spaces from the recognized text content.
c) If the text length of the candidate line equals 1, 4, 6, or 7, continue to the next step; otherwise discard the line.
d) If the text length equals 1 and the content is a digit, add the line to the candidate check code set.
e) If the text length equals 4, look the text up in the owner code table; if it is not found there, continue searching the box type code table; if it is still not found, search again while computing an edit distance: if the number of substitution errors in the edit distance is at most a preset value, e.g., at most 1, correct the text against a preset error-correction table, and treat the match as successful if the edit distance is zero after correction. If an owner code is matched, add the line to the candidate owner code set; if a box type code is matched, add it to the candidate box type code set; otherwise discard the line.
f) If the text length equals 6 and the content contains no letters, add the line to the candidate registration code set.
g) If the text length equals 7, split the text 6-1 and apply steps d) and f) to the parts respectively.
h) Perform a nested traversal over the candidate owner code set and the candidate registration code set; for each combination, compute the test check code; if the computed test check code is found in the candidate check code set and the center points of the three codes' text lines are collinear, the correct target owner code, target registration code, and target check code have been found (a sketch of the check-code calculation follows this list).
i) Check the position of the box type code in the current traversal: the box type code may appear below the owner code (a minority of cases) or below the registration code (the majority), as shown in FIG. 7, a schematic of the possible positions of box type codes in a container tag. Therefore check the ratio of the registration code's top-left-to-bottom-left distance (AB) to the distance from the registration code's top-left corner to the box type code's top-left corner (AC), or the ratio of the owner code's top-left-to-bottom-left distance (DE) to the distance from the owner code's top-left corner to the box type code's top-left corner (DF); if either ratio is greater than a threshold, e.g., 0.7, the target box type code has been found.
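The patent does not spell out how the test check code is computed from the owner code and registration code; container numbers, however, follow ISO 6346, whose standard check-digit calculation is sketched below as an assumption. The nested traversal of step h) then only needs to compare this value against the candidate check code set.

```python
def char_value(ch: str) -> int:
    """ISO 6346 character values: digits map to themselves; letters start at
    A=10 and count upward, skipping multiples of 11 (so B=12, L=23, V=34)."""
    if ch.isdigit():
        return int(ch)
    v = 10
    for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if v % 11 == 0:
            v += 1          # skip 11, 22, 33
        if c == ch:
            return v
        v += 1
    raise ValueError(f"unexpected character: {ch!r}")

def test_check_code(owner_code: str, registration_code: str) -> str:
    """Check digit over the 4-letter owner code + 6-digit registration code:
    weight each character value by 2**position, sum, take mod 11, then mod 10."""
    chars = owner_code + registration_code          # 10 characters in total
    total = sum(char_value(ch) << pos for pos, ch in enumerate(chars))
    return str((total % 11) % 10)

# e.g. test_check_code("CSQU", "305438") == "3" for container CSQU 305438 3
```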
In one embodiment, the algorithm above for matching the owner code and the box type code is described in more detail in Table 1 (logic for matching the owner code and the box type code), which is reproduced only as an image in the original publication.
In the above embodiment, matching the text length of each candidate text line in the current video frame against the preset length of each information category places the candidate lines into their corresponding candidate sets, which quickly and accurately filters out interference whose length does not match; the target text lines matching the container tag characteristics are then screened from these candidate sets, improving both screening efficiency and screening accuracy.
In a specific embodiment, the target is a container vehicle and the tag information is container tag information. Referring to FIG. 8, a schematic flow chart of a container tag identification method in one embodiment: the scheme provided by this application automatically identifies container tags at a gate. The input is the video stream (frames) of an ordinary monitoring camera (not a snapshot camera), and the output, for each passing vehicle, is the identified tag information of the container it carries, so the container tag information is acquired automatically. As shown in FIG. 8, taking a truck as the target vehicle: the video stream is fed frame by frame into the algorithm input interface; a target detection algorithm first detects the truck; container integrity is then judged on the detected truck image block (patch), because trucks move through the gate and a detected truck does not necessarily show a complete container, so whether it does must be determined. When the truck has fully entered the camera's monitoring range, single-target tracking is applied, and an optimal frame is selected from the tracking results as the snapshot of the container truck, i.e., a truck image block containing the complete container. The image block is then passed to the OCR detection and recognition algorithm for character localization and content recognition, and finally a verification and filtering algorithm extracts the container tag information. Under a video stream, this application can track the container truck, automatically and reliably extract one sharp frame containing the complete container, run OCR on it, and extract the tag, which improves efficiency; the OCR-based tag recognition also suits more complex scenes, such as cameras monitoring at a tilt, and has an error-correction capability.
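Stitching the stages of FIG. 8 together, the overall loop might look like the sketch below. The callables passed in correspond to the hypothetical pieces sketched in the earlier sections; none of them is an API defined by the patent.

```python
def recognize_container_tag(video_stream, detect, judge, snapshot, read_lines, verify):
    """End-to-end sketch: detect, judge integrity, track, snapshot, OCR, verify.

    Assumed callables:
      detect(frame)        -> bounding box (x1, y1, x2, y2) or None
      judge(frame, box)    -> (is_complete, confidence)
      snapshot(a, b, c, conf) -> (cond1, cond2, cond3)
      read_lines(frame)    -> OCR text lines with positions
      verify(lines)        -> container tag information or None
    """
    track = []                                   # center points of the tracked box
    for frame in video_stream:
        box = detect(frame)
        if box is None:
            continue                             # tracker simply not updated
        complete, conf = judge(frame, box)
        if not complete:
            continue                             # incomplete container, skip frame
        x1, y1, x2, y2 = box
        track.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
        if len(track) < 3:
            continue
        a, b, c = track[-3:]                     # three-point snapshot test
        if any(snapshot(a, b, c, conf)):
            return verify(read_lines(frame))     # OCR + verification/filtering
    return None
```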
It should be understood that although the steps in the flowcharts of fig. 2, 3, 6 and 8 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2, 3, 6 and 8 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed in sequence but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 9, there is provided a target mark identification device 900, comprising: an acquisition module 901, a screening module 902, a target tracking processing module 903, and a mark identification module 904, wherein:

the acquisition module 901 is configured to acquire a video stream obtained by monitoring a target;

the screening module 902 is configured to perform image detection processing on the video frames in the video stream, and screen out the video frames that include the target as target video frames;

the target tracking processing module 903 is configured to perform target tracking processing based on the target video frames to obtain a running track of the target; and

the mark identification module 904 is configured to perform mark identification based on a current video frame to obtain corresponding target mark information, when it is determined, based on the running track, that the current video frame among the target video frames meets a preset target condition.
In one embodiment, the screening module 902 is further configured to perform target detection on each video frame in the video stream to obtain candidate video frames that include the target; input the candidate video frames into a target integrity discrimination model, process them through the model, and output target integrity discrimination results; and screen out, as target video frames, the candidate video frames whose discrimination result indicates that a complete target is included.
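The patent does not fix an architecture for the integrity discrimination model; under that caveat, one illustrative stand-in is a binary classifier head over an arbitrary feature backbone, returning the target-complete confidence used later in the capture condition. Everything in this sketch (class name, backbone, input size) is an assumption:

```python
import torch
import torch.nn as nn

class IntegrityDiscriminator(nn.Module):
    """Hypothetical target integrity discrimination model: a binary
    classifier (incomplete vs. complete) over a feature backbone."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, 2)

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(patch)
        # probability that the patch contains a complete target
        return torch.softmax(self.classifier(feat), dim=-1)[..., 1]

# Example with a trivial backbone over flattened 64x64 RGB patches:
model = IntegrityDiscriminator(
    backbone=nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU()),
    feat_dim=128,
)
confidence = model(torch.rand(1, 3, 64, 64))  # target-complete confidence
```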
In one embodiment, the target tracking processing module 903 is further configured to acquire position information of the bounding box used to frame the target in each target video frame; perform target tracking processing based on the bounding-box position information to obtain the object identifier corresponding to each bounding box; and determine the running track of the target based on the position information of the bounding boxes that correspond to the same object identifier across the target video frames.
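The patent does not commit to a particular tracking algorithm. One simple possibility, sketched below under that caveat, is greedy frame-to-frame association of bounding boxes by intersection-over-union (IoU), where each object identifier accumulates the positions that form one object's running track:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def build_trajectories(frames_boxes, iou_thresh=0.3):
    """Greedy IoU association across frames (no occlusion handling).

    frames_boxes: list over frames; each entry is a list of detected boxes.
    Returns {object_id: [box, box, ...]}, i.e. one running track per id.
    """
    last_box = {}   # object id -> most recent box
    tracks = {}     # object id -> accumulated trajectory
    next_id = 0
    for boxes in frames_boxes:
        for box in boxes:
            best_id, best_iou = None, iou_thresh
            for tid, prev in last_box.items():
                overlap = iou(box, prev)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:        # no match: start a new track
                best_id = next_id
                next_id += 1
            last_box[best_id] = box
            tracks.setdefault(best_id, []).append(box)
    return tracks
```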
In one embodiment, the mark identification module 904 is further configured to determine, according to the running track, a current video frame among the target video frames and the preceding frames of the current video frame, where the preceding frames include a first video frame preceding the current video frame and a second video frame preceding the first video frame; determine whether the current video frame meets the preset target condition according to target-related information corresponding to the first video frame, the second video frame and the current video frame, the target-related information including at least one of position information corresponding to the target and a target-completeness confidence; and, when the current video frame meets the preset target condition, perform mark identification based on the current video frame to obtain the corresponding target mark information.
In one embodiment, the position information corresponding to the target includes a preset position point of the bounding box used to frame the target, and the preset target condition includes at least one of the following conditions: the included angle between a first vector, formed by a first preset position point corresponding to the first video frame and a second preset position point corresponding to the second video frame, and a second vector, formed by the second preset position point and a third preset position point corresponding to the current video frame, is less than or equal to a preset included-angle threshold; the projection distance from the third preset position point to the second preset position point is greater than the distance from the first preset position point to the second preset position point; and the target-completeness confidence corresponding to the current video frame is greater than or equal to a preset confidence threshold.
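These conditions can be checked directly from the three preset points. The sketch below reads "first vector" and "second vector" literally from the description; the 20-degree angle and 0.9 confidence thresholds are illustrative placeholders rather than values from the patent, and the three sub-conditions are combined with a logical AND here even though the text requires only at least one of them:

```python
import math

def meets_preset_target_condition(
    p1,                    # preset point in the first video frame
    p2,                    # preset point in the second video frame
    p3,                    # preset point in the current video frame
    complete_conf,         # target-completeness confidence of the current frame
    angle_thresh_deg=20.0, # illustrative threshold
    conf_thresh=0.9,       # illustrative threshold
):
    v1 = (p2[0] - p1[0], p2[1] - p1[1])  # first vector: p1 -> p2
    v2 = (p3[0] - p2[0], p3[1] - p2[1])  # second vector: p2 -> p3
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return False
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))
    angle_ok = math.degrees(math.acos(cos_a)) <= angle_thresh_deg
    proj_ok = dot / n1 > n1   # projection of the second vector exceeds |v1|
    conf_ok = complete_conf >= conf_thresh
    return angle_ok and proj_ok and conf_ok
```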
In one embodiment, the target is a container vehicle, the mark information is container mark information, and the container mark information comprises information of at least one information category. The mark identification module 904 is further configured to perform image character recognition on the current video frame to obtain a plurality of candidate text lines; determine, for each recognized candidate text line, the text length of that line; match the text length of each candidate text line against the preset lengths, and add the candidate text line to the candidate set of the information category corresponding to the matched preset length; and screen out, from the candidate sets of the respective information categories, the target text lines that conform to the container mark characteristics, and obtain the corresponding container mark information based on the screened target text lines.
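By way of illustration, such length-based grouping might look as follows. The per-category lengths used here follow ISO 6346 container-number conventions (4-letter owner code, 6-digit serial, 1-digit check code, 4-character box-type code) and are assumptions, since the patent leaves the preset lengths to configuration:

```python
from collections import defaultdict

def build_candidate_sets(text_lines):
    """Group OCR'd text lines into per-category candidate sets by length.

    text_lines: iterable of (text, box) pairs from image character
    recognition; lines matching no category are dropped as interference.
    """
    candidates = defaultdict(list)
    for text, box in text_lines:
        t = text.replace(" ", "").upper()
        if len(t) == 4 and t.isalpha():
            candidates["box_owner_code"].append((t, box))     # e.g. "CSQU"
        elif len(t) == 4 and t.isalnum():
            candidates["box_type_code"].append((t, box))      # e.g. "22G1"
        elif len(t) == 6 and t.isdigit():
            candidates["registration_code"].append((t, box))
        elif len(t) == 1 and t.isdigit():
            candidates["check_code"].append((t, box))
        # everything else is treated as interference information
    return candidates
```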
In one embodiment, the information categories comprise a check code, a box owner code, a registration code and a box-type code; the mark identification module 904 is further configured to perform nested traversal on a candidate box owner code set corresponding to the box owner code and a candidate registration code set corresponding to the registration code; calculate a corresponding test check code based on each traversed combination, and search for the test check code in a candidate check code set corresponding to the check code; when the test check code is found in the candidate check code set, and the candidate text lines where the box owner code and the registration code in the currently traversed combination are located and the candidate text line corresponding to the test check code satisfy the collinearity condition, take the box owner code in the current combination as the target box owner code, take the registration code in the current combination as the target registration code, and take the test check code as the target check code; take, as the target box-type code, a box-type code in the candidate box-type code set that satisfies the positional correspondence with the target box owner code or the target registration code; and take the target box owner code, the target registration code, the target check code and the target box-type code together as the container mark information corresponding to the container.
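The patent text does not spell out how the test check code is computed; however, standard container numbers follow ISO 6346, under which the check digit is derived from the 4-letter owner code and the 6-digit serial using position weights of powers of two, modulo 11. A sketch of that computation and of the nested traversal (the collinearity and box-type checks are omitted here), assuming the candidate sets hold plain strings:

```python
import string
from itertools import product

# ISO 6346 letter values: A=10 upward, skipping multiples of 11 (11, 22, 33).
LETTER_VALUES, _v = {}, 10
for _c in string.ascii_uppercase:
    if _v % 11 == 0:
        _v += 1
    LETTER_VALUES[_c] = _v
    _v += 1

def check_digit(owner_code: str, serial: str) -> int:
    """ISO 6346 check digit for a 4-letter owner code + 6-digit serial."""
    total = sum(
        (LETTER_VALUES[ch] if ch.isalpha() else int(ch)) * (2 ** i)
        for i, ch in enumerate(owner_code + serial)
    )
    return total % 11 % 10

def match_marks(owner_codes, registration_codes, check_codes):
    """Nested traversal over the candidate sets; returns the first
    combination whose computed test check code appears among the
    check-code candidates (collinearity test omitted)."""
    for owner, reg in product(owner_codes, registration_codes):
        test = str(check_digit(owner, reg))
        if test in check_codes:
            return owner, reg, test
    return None

# Standard ISO 6346 example: CSQU3054383 has check digit 3.
assert check_digit("CSQU", "305438") == 3
```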
The target mark identification device screens out the target video frames that include the target by performing image detection processing on a video stream obtained by monitoring the target, and then accurately tracks the target based on the screened target video frames. Based on the tracking result, a clear frame containing the complete target can be extracted from the video stream automatically and reliably, and text recognition is performed on it to extract the target mark information. A video frame meeting the preset target condition can thus be captured automatically in a monitoring-video-stream scene without manually designed features, so that the target mark can be identified accurately based on that frame. Moreover, the whole process is automatic detection and identification, which greatly improves target mark identification efficiency.
For the specific definition of the target mark identification device, reference may be made to the definition of the target mark identification method above, which is not repeated here. Each module in the target mark identification device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided, which may be a terminal or a server, and whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store container mark information. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a container mark identification method.
Those skilled in the art will appreciate that the structure shown in fig. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the above method embodiments.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of target mark identification, the method comprising:
acquiring a video stream obtained by monitoring a target;
performing image detection processing on video frames in the video stream, and screening out video frames comprising targets from the video frames to serve as target video frames;
carrying out target tracking processing based on the target video frame to obtain a running track of the target;
and when it is determined, based on the running track, that a current video frame among the target video frames meets a preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information.
2. The method according to claim 1, wherein the performing image detection processing on the video frames in the video stream, and screening out a video frame including a target from the video frames as a target video frame comprises:
respectively carrying out target detection on each video frame in the video stream to obtain candidate video frames comprising targets;
inputting the candidate video frames into a target integrity discrimination model, processing them through the target integrity discrimination model, and outputting target integrity discrimination results;
and screening out, as target video frames, the candidate video frames whose target integrity discrimination result indicates that a complete target is included.
3. The method according to claim 1, wherein the performing target tracking processing based on the target video frame to obtain a running track of the target comprises:
acquiring position information of a bounding box used to frame the target in the target video frame;
performing target tracking processing based on the position information of the bounding boxes to obtain object identifiers corresponding to the bounding boxes respectively;
and determining the running track of the target based on the position information of the bounding boxes corresponding to the same object identifier in each target video frame.
4. The method according to claim 1, wherein when it is determined that a current video frame of the target video frames meets a preset target condition based on the running track, performing mark identification based on the current video frame to obtain corresponding target mark information comprises:
determining a current video frame among the target video frames and the preceding frames of the current video frame according to the running track; the preceding frames comprise a first video frame preceding the current video frame and a second video frame preceding the first video frame;
determining whether the current video frame meets a preset target condition according to target-related information corresponding to the first video frame, the second video frame and the current video frame; wherein the target-related information comprises at least one of position information corresponding to the target and a target-completeness confidence;
and when the current video frame meets the preset target condition, performing mark identification based on the current video frame to obtain corresponding target mark information.
5. The method according to claim 4, wherein the position information corresponding to the target comprises a preset position point of a bounding box used to frame the target, and the preset target condition comprises at least one of the following conditions:
an included angle between a first vector formed by a first preset position point corresponding to a first video frame and a second preset position point corresponding to a second video frame and a second vector formed by the second preset position point and a third preset position point corresponding to the current video frame is smaller than or equal to a preset included angle threshold value;
the projection distance from the third preset position point to the second preset position point is greater than the distance from the first preset position point to the second preset position point; and
the target-completeness confidence corresponding to the current video frame is greater than or equal to a preset confidence threshold.
6. The method of claim 1, wherein the target comprises a container vehicle, the mark information comprises container mark information, the container mark information comprises information of at least one information category, and performing mark identification based on the current video frame to obtain corresponding target mark information comprises:
performing image character recognition on the current video frame to obtain a plurality of candidate text lines;
for each identified candidate text line, determining the text length corresponding to the corresponding candidate text line;
matching the text length corresponding to each candidate text line with a preset length, and adding the candidate text line to a candidate set corresponding to an information category based on the information category corresponding to the matched preset length;
and screening out, from the candidate sets respectively corresponding to the information categories, target text lines which accord with the container mark characteristics, and obtaining corresponding container mark information based on the screened target text lines.
7. The method of claim 6, wherein the information categories comprise a check code, a box owner code, a registration code, and a box-type code; and screening out, from the candidate sets respectively corresponding to the information categories, target text lines which accord with the container mark characteristics, and obtaining corresponding container mark information based on the screened target text lines, comprises:
performing nested traversal on a candidate box owner code set corresponding to the box owner code and a candidate registration code set corresponding to the registration code;
calculating a corresponding test check code based on each traversed combination, and searching for the test check code in a candidate check code set corresponding to the check code;
when the test check code is found in the candidate check code set, and the candidate text lines where the box owner code and the registration code in the currently traversed combination are located and the candidate text line corresponding to the test check code satisfy the collinearity condition, taking the box owner code in the current combination as a target box owner code, taking the registration code in the current combination as a target registration code, and taking the test check code as a target check code;
taking, as a target box-type code, a box-type code in the candidate box-type code set corresponding to the box-type code that satisfies the positional correspondence with the target box owner code or the target registration code;
and taking the target box owner code, the target registration code, the target check code and the target box-type code together as the container mark information corresponding to the container carried by the container vehicle.
8. A target mark identification apparatus, the apparatus comprising:
the acquisition module is used for acquiring a video stream obtained by monitoring a target;
the screening module is used for carrying out image detection processing on the video frames in the video stream and screening out the video frames comprising the targets from the video frames as target video frames;
the target tracking processing module is used for carrying out target tracking processing based on the target video frame to obtain a running track of the target;
and a mark identification module, configured to perform mark identification based on a current video frame to obtain corresponding target mark information when it is determined, based on the running track, that the current video frame among the target video frames meets a preset target condition.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110511830.0A 2021-05-11 2021-05-11 Container mark identification method, device, computer equipment and storage medium Pending CN115410105A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110511830.0A CN115410105A (en) 2021-05-11 2021-05-11 Container mark identification method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110511830.0A CN115410105A (en) 2021-05-11 2021-05-11 Container mark identification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115410105A true CN115410105A (en) 2022-11-29

Family

ID=84154782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110511830.0A Pending CN115410105A (en) 2021-05-11 2021-05-11 Container mark identification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115410105A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953726A (en) * 2023-03-14 2023-04-11 深圳中集智能科技有限公司 Machine vision container surface damage detection method and system
CN115953726B (en) * 2023-03-14 2024-02-27 深圳中集智能科技有限公司 Machine vision container face damage detection method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination