CN115359471A - Image processing and joint detection model training method, device, equipment and storage medium


Info

Publication number: CN115359471A
Application number: CN202210835360.8A
Authority: CN (China)
Prior art keywords: joint detection, area, detection model, model, vehicle
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 陈科桦, 倪子涵, 孙逸鹏, 姚锟
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to: CN202210835360.8A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 - License plates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/146 - Aligning or centring of the image pick-up or image-field
    • G06V30/147 - Determination of region of interest
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

The disclosure provides an image processing and joint detection model training method, apparatus, device, and storage medium, relating to the field of artificial intelligence, specifically to deep learning, image processing, and computer vision, and applicable to scenes such as optical character recognition (OCR) and license plate desensitization. The image processing method includes: inputting an image to be processed into a joint detection model for image processing, and synchronously obtaining a first area where a vehicle is located and a second area of a vehicle target component; acquiring, from the second area, a third area located within the first area; and determining a fourth area including the license plate according to the third area. The method can improve the accuracy of license plate positioning.

Description

Image processing and joint detection model training method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning, image processing, and computer vision, which may be applied to scenes such as Optical Character Recognition (OCR) and license plate desensitization, and specifically to an image processing and joint detection model training method, apparatus, device, and storage medium.
Background
Vehicle-mounted data is an important data source, for example for making and updating maps. It generally includes license plate images, and in order to protect data privacy and security, license plate desensitization needs to be performed on such images.
License plate desensitization requires first locating the license plate, so that its text can be desensitized based on the located position.
Disclosure of Invention
The disclosure provides an image processing and model training method, device, equipment and storage medium.
According to an aspect of the present disclosure, there is provided an image processing method including: inputting an image to be processed into a joint detection model for image processing, and synchronously obtaining a first area where a vehicle is located and a second area of a vehicle target component; acquiring a third area positioned in the first area from the second area; and determining a fourth area comprising the license plate according to the third area.
According to another aspect of the present disclosure, there is provided a joint detection model training method, including: inputting an image sample into a joint detection model for image processing, and synchronously obtaining first prediction region information of the vehicle and second prediction region information of a vehicle target component; constructing a loss function based on the first prediction region information and first real region information of the vehicle, and the second prediction region information and second real region information of the vehicle target component; and adjusting model parameters of the joint detection model based on the loss function.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the first determination module is used for inputting the image to be processed into the joint detection model for image processing, and synchronously obtaining a first area where the vehicle is located and a second area of the vehicle target component; the second determining module is used for acquiring a third area positioned in the first area from the second area; and the third determining module is used for determining a fourth area comprising the license plate according to the third area.
According to another aspect of the present disclosure, there is provided a joint detection model training apparatus, including: a determining module for inputting the image sample into the joint detection model for image processing and synchronously obtaining first prediction region information of the vehicle and second prediction region information of the vehicle target component; a building module for building a loss function based on the first prediction region information and the first real region information of the vehicle, and the second prediction region information and the second real region information of the vehicle target component; and an adjusting module for adjusting the model parameters of the joint detection model based on the loss function.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical solutions of the present disclosure, the accuracy of license plate positioning can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an application scenario for implementing an image processing method or a model training method of an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a detection result of a license plate vehicle joint detection model provided by an embodiment of the present disclosure;
Fig. 4 is a flowchart of another image processing method provided by an embodiment of the present disclosure;
Fig. 5 is a flowchart of a model training method provided by an embodiment of the present disclosure;
Fig. 6 is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 7 is a block diagram of a model training apparatus provided by an embodiment of the present disclosure;
Fig. 8 is a schematic diagram of an electronic device for implementing an image processing method or a model training method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings; various details of the embodiments are included to assist understanding and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
In the related art, existing license plate locating schemes usually consider only the single license plate factor, and their accuracy needs to be improved.
In order to improve the accuracy of license plate positioning, the present disclosure provides the following embodiments.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure, and as shown in fig. 1, the image processing method according to the embodiment includes:
101. Input the image to be processed into the joint detection model for image processing, and synchronously obtain a first area where the vehicle is located and a second area of the vehicle target component.
102. Acquire, from the second area, a third area located within the first area.
103. Determine a fourth area including the license plate according to the third area.
The image to be processed may be an image acquired by the vehicle using a camera or similar device, or a frame extracted from a video captured by the camera; the image to be processed may include a license plate image and a vehicle image.
The joint detection model is a multi-target detection model: its input is the image to be processed, and its output is the detection result of multiple kinds of targets in the image. In this embodiment, the multiple targets include a vehicle and a vehicle target component.
The area in which the vehicle is located may be referred to as a first area, and the area in which the vehicle target component is located may be referred to as a second area.
Synchronously determining the first area and the second area in the image means processing the whole image once to determine the first area and the second area at the same time. By contrast, determining the first region and the second region asynchronously would mean, for example, first processing the whole image to determine the first region; then cropping the whole image based on the first region to obtain a partial image corresponding to the first region; and then processing the partial image to determine the second region.
Therefore, compared with the asynchronous mode, synchronously determining the first area and the second area can significantly increase the processing speed and improve the detection efficiency.
Normally, for a given vehicle, taking the vehicle target component as a license plate as an example, the second area where the license plate is located lies within the first area where the vehicle is located. However, when the joint detection model actually performs target detection, the license plate position information it produces may contain errors; for example, the image to be processed may contain traffic lights, and the joint detection model may falsely detect a traffic light as a license plate. Therefore, the license plate position can be located based on the relationship between the first area and the second area, improving the accuracy of license plate positioning.
Because of this false detection problem, a second region obtained with the joint detection model is not necessarily located within a first region. The false detection regions can be filtered out based on the first region, and the second regions located within the first region are retained as third regions.
Each third region may be used directly as a final license plate region, or the third regions may be further filtered and the remaining ones used as the final license plate regions. The final license plate region may be referred to as the fourth region.
In this embodiment, a third region located within the first region is obtained from the second regions, and a fourth region including the license plate is determined based on the third region. Because the relationship between the first region and the second region is considered, the final license plate region is determined only from second regions located within first regions; that is, both license plate and vehicle factors are taken into account, which improves the accuracy of license plate positioning compared with considering a single license plate factor. In addition, since the first area and the second area are determined synchronously, the detection speed increases, further improving the license plate positioning efficiency.
For better understanding of the embodiments of the present disclosure, the following describes application scenarios to which the embodiments of the present disclosure are applicable. The present embodiment takes an image captured by a vehicle as an example.
As shown in fig. 2, a vehicle 201 may collect images of the environment around it and transmit them to a server 202 through a communication network, and the server performs license plate desensitization on the images. Alternatively, after acquiring an image, the vehicle 201 may perform license plate desensitization locally. This embodiment takes the case where the vehicle transmits the image to the server and the server performs the desensitization processing as an example.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
Taking image processing performed by the server as an example, the server may obtain the license plate and vehicle joint detection model in advance; the model may be trained by the server itself or obtained from another device.
The joint detection model may specifically be referred to as a license plate vehicle joint detection model. It is a deep neural network model, such as a Convolutional Neural Network (CNN) model, and may specifically be a You Only Look Once version 3 (YOLOv3) model.
The YOLO (You Only Look Once) model is an object detection model whose input is an image and whose output is the detection result of all objects in the image. The detection result includes bounding box information and category information of each object; the bounding box information may include the following 5 parameters: center position (x, y), height (h), width (w), and confidence. The category information indicates the category of the object, such as pedestrian, car, or bicycle. In this embodiment, the categories of the targets include: vehicle and license plate.
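As a concrete illustration, a single detection produced by such a model can be represented as follows. This is a minimal Python sketch; the field names and category labels are assumptions for illustration, not an interface defined by the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One target detected by the joint detection model."""
    cx: float          # bounding-box center x
    cy: float          # bounding-box center y
    w: float           # bounding-box width
    h: float           # bounding-box height
    confidence: float  # objectness confidence in [0, 1]
    category: str      # in this embodiment: "vehicle" or "license_plate"
```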
When the YOLO model is applied to the joint detection of license plates and vehicles, the input of the license plate vehicle joint detection model is an image; the detection result of the vehicle includes the bounding box information and category information of the vehicle, and the detection result of the license plate includes the bounding box information and category information of the license plate.
If the bounding box of the vehicle is called a first bounding box and the bounding box of the license plate is called a second bounding box, the first area is an area surrounded by the first bounding box, and the second area is an area surrounded by the second bounding box.
As shown in fig. 3, the first bounding boxes are denoted by 301a to 301d, respectively, and the second bounding boxes are denoted by 302b to 302d, respectively.
After the first region and the second region are obtained, the second region located within the first region may be taken as a third region.
For example, the second areas corresponding to the second bounding boxes 302b to 302d are located within the first areas corresponding to the first bounding boxes 301b to 301d, respectively, and thus the second areas corresponding to the second bounding boxes 302b to 302d are all the third areas.
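The containment test that keeps only second regions lying inside some first region can be sketched as follows, assuming regions are given as (x1, y1, x2, y2) corner tuples; strict containment is used here, although the disclosure does not fix a tolerance.

```python
def is_inside(inner, outer):
    """True if region `inner` lies entirely within region `outer`.

    Regions are (x1, y1, x2, y2) corner tuples.
    """
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def select_third_regions(first_regions, second_regions):
    """Keep the second (plate) regions located within some first (vehicle) region."""
    return [s for s in second_regions
            if any(is_inside(s, f) for f in first_regions)]
```

With the regions of fig. 3, the plate regions 302b to 302d would pass this test, while a traffic light falsely detected outside every vehicle box would be dropped.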
The above takes detecting the vehicle position and the license plate position with the license plate vehicle joint detection model as an example. Alternatively, two models may be trained separately, called a license plate detection model and a vehicle detection model. In that case, the image may be input into both the license plate detection model and the vehicle detection model; the license plate detection model outputs the second bounding box of the license plate, the vehicle detection model outputs the first bounding box of the vehicle, and the candidate license plate region is then determined based on the relationship between the second region enclosed by the second bounding box and the first region enclosed by the first bounding box.
For each third region, the third region may be used as the final license plate region, that is, the fourth region including the license plate. Alternatively, the fourth region may be determined by additionally considering the characters in the third region: for example, the characters in each third region are obtained using OCR, and if the characters in a third region include preset characters, that third region is determined to be the fourth region. In other words, the characters in the fourth region include the preset characters, where the preset characters are characters indicating a region, for example 京 (jing), 津 (jin), and 冀 (ji).
After the fourth area is determined, desensitization processing can be performed on the characters in the fourth area to obtain desensitized license plate data.
Desensitization processing includes, for example, blurring the characters.
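A minimal sketch of such blurring, assuming OpenCV is available and the region is given as pixel corner coordinates; Gaussian blurring is one common fuzzification choice, and the kernel size here is an illustrative assumption.

```python
import cv2  # OpenCV

def desensitize_region(image, region, ksize=31):
    """Blur the license plate characters inside `region` = (x1, y1, x2, y2).

    `ksize` must be odd; larger values blur more strongly.
    """
    x1, y1, x2, y2 = (int(v) for v in region)
    roi = image[y1:y2, x1:x2]
    image[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (ksize, ksize), 0)
    return image
```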
In combination with the application scenario, the present disclosure further provides an image processing method.
Fig. 4 is a flowchart of another image processing method provided by an embodiment of the present disclosure. This embodiment takes the case where the joint detection model is specifically a license plate vehicle joint detection model as an example. The image processing method provided by this embodiment includes:
401. Acquire an image to be processed.
For example, the vehicle captures images or video of the surrounding environment using a device such as an on-board camera and sends them to the server, so the server obtains the images or video captured by the vehicle. For a video, each frame can be extracted and processed as an image.
402. Input the image to be processed into the joint detection model for image processing to obtain the first bounding box information of the vehicle and the second bounding box information of the vehicle target component that the model outputs.
The license plate vehicle joint detection model is a deep neural network model; its input is an image, and its output is the license plate detection result and the vehicle detection result in the image.
The backbone network of the license plate vehicle joint detection model may be a YOLOv3 model. The whole image is input into the license plate vehicle joint detection model for target detection processing, and the model can synchronously output the first bounding box information and the second bounding box information. For example, as shown in fig. 3, the detection result of the vehicle includes the first bounding box information of the vehicle, such as 301a to 301d, and the detection result of the license plate includes the second bounding box information of the license plate, such as 302b to 302d.
403. Determine the first area according to the first bounding box information, and determine the second area according to the second bounding box information.
Taking the first bounding box as an example, the first bounding box information may include: the vehicle's center position (x, y), height (h), width (w), and confidence.
Based on the center position (x, y), height (h), and width (w), a region can be outlined as the first region. For example, the area enclosed by the bounding box 301b is taken as the first area 301b.
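Outlining the region from the bounding box parameters amounts to converting the center representation to corner coordinates, as in this sketch (the function name and the numbers are illustrative only):

```python
def box_to_region(cx, cy, w, h):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2) corners."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

# A vehicle box centered at (320, 240) with width 200 and height 120
# outlines the region (220.0, 180.0, 420.0, 300.0).
```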
In this embodiment, the first bounding box information and the second bounding box information are obtained with the license plate vehicle joint detection model, a single model that can synchronously determine information of multiple targets (i.e., license plates and vehicles). In addition, license plate information and vehicle information are considered jointly; compared with considering each of them in isolation, target detection can combine multiple kinds of information, which improves the accuracy of the license plate and vehicle detection results.
Further, since YOLOv3 detects a large number of bounding boxes, the model may be compressed to reduce the amount of computation, reduce memory overhead, and so on.
Compression may mean that the model parameters of the license plate vehicle joint detection model are quantized model parameters and/or pruned model parameters.
Further, for pruning, the joint detection model may be the candidate joint detection model with the smallest test runtime among a plurality of candidate joint detection models, where the number of channels of each candidate joint detection model is smaller than that of an existing pre-trained model.
For the training process of the license plate vehicle joint detection model and the details of quantization and pruning, reference may be made to the following embodiments.
In this embodiment, the model parameters are quantized model parameters and/or pruned model parameters, which reduces the parameter count and thereby reduces the amount of computation, lowers storage overhead, and improves detection efficiency.
404. Determine whether any second area is located within the first area; if so, execute 405; otherwise, execute 410.
Whether a first area corresponding to the first bounding box includes a second area corresponding to the second bounding box may be determined based on the first bounding box information and the second bounding box information.
For example, for the first region corresponding to 301a, there is no second region therein; for the first region corresponding to 301b, a second region exists therein.
405. Acquire, from the second areas, the third areas located within the first areas.
In the detection process, a first region may be identified that contains no second region, such as 301a in fig. 3; in that case the license plate may be considered too small, and that position can be filtered out. It is also possible to identify a second region outside of which no first region exists; since the absence of a vehicle suggests the license plate is a false detection, that position is filtered out as well.
Where both a first area and a second area exist, the second area located within the first area is taken as a third area. For example, referring to fig. 3, each of the second areas enclosed by the second bounding boxes 302b to 302d is a third area.
406. Acquire the characters in the third area.
OCR may be used to recognize the text within each third region.
For example, the image corresponding to the third area may be cropped out as a license plate image and input into a character recognition model, which performs character recognition on the input license plate image and outputs the characters in the third area.
The character recognition model is a deep neural network model for recognizing characters in images. It may be a Connectionist Temporal Classification (CTC) model, and its backbone network may be chosen as ResNet-34.
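For reference, the greedy decoding rule used with CTC-trained recognizers (merge consecutive repeats, then drop blanks) can be sketched as follows. This is the standard CTC rule rather than a procedure specified by the disclosure, and a real recognizer would first take the per-frame argmax of the model's output probabilities.

```python
def ctc_greedy_decode(frame_labels, blank=0, id_to_char=None):
    """Collapse a per-frame label sequence: merge repeats, remove blanks."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != blank and label != prev:
            decoded.append(label)
        prev = label
    if id_to_char is not None:
        return "".join(id_to_char[i] for i in decoded)
    return decoded

# e.g. per-frame labels [0, 5, 5, 0, 3, 3, 0] decode to [5, 3].
```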
407. Determine whether the characters in the third area contain the preset characters; if so, execute 408; otherwise, execute 410.
The preset characters are characters indicating a region, such as 京 (jing), 津 (jin), and 冀 (ji).
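The check in step 407 can be sketched as below; the character set shown is an illustrative subset, and in practice it would hold all regional abbreviations printed on plates.

```python
REGION_CHARS = {"京", "津", "冀"}  # illustrative subset: jing, jin, ji

def contains_region_char(text: str) -> bool:
    """True if the recognized text includes a preset region character."""
    return any(ch in REGION_CHARS for ch in text)
```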
408. Determine the third region to be the fourth region including the license plate.
In this embodiment, the fourth region is determined by combining the characters, so that the accuracy of license plate positioning can be further improved.
409. Desensitize the text in the fourth region.
Desensitization processing includes, for example, blurring the characters in the fourth area.
In this embodiment, since the fourth region is the accurate license plate region obtained by the above method, desensitizing the characters in the fourth region improves the accuracy of the desensitization processing and thus better protects user privacy data.
410. End.
In this embodiment, the license plate vehicle joint detection model is used to obtain the first areas and the second areas, the second areas located within first areas are taken as third areas, and the fourth areas are determined with reference to the characters in the third areas. By jointly referring to license plate, vehicle, and character information, the accuracy of license plate positioning can be improved; and by desensitizing the characters in the fourth areas, the accuracy of license plate desensitization can be improved. In the related art, which considers only the single license plate factor, the false detection rate is high: traffic lights, railings, lane lines, and the like may be falsely detected as license plates. This enlarges the range of data to be desensitized, wastes the resources required for desensitization, and, by desensitizing data that should not be desensitized, leaves the data incomplete, harming its integrity and accuracy.
The above describes a license plate locating process, which involves a license plate vehicle joint detection model. The embodiment of the disclosure also provides a training method of the license plate vehicle joint detection model.
Fig. 5 is a flowchart of a joint detection model training method provided in the embodiment of the present disclosure, and as shown in fig. 5, the joint detection model training method provided in the embodiment includes:
501. Input the image sample into the joint detection model for image processing, and synchronously obtain the first prediction region information of the vehicle and the second prediction region information of the vehicle target component.
502. Construct a loss function based on the first prediction region information and the first real region information of the vehicle, and the second prediction region information and the second real region information of the vehicle target component.
503. Adjust the model parameters of the joint detection model based on the loss function.
The image samples may be obtained from an existing sample set and manually labeled to obtain tag data. Taking the vehicle target component as a license plate, the tag data may include the tag data of the license plate in the image sample and the tag data of the vehicle; the image samples and the tag data can then be used as training data to train the joint detection model.
A typical license plate detection model is trained on license plate data alone; for example, each set of training data may be represented as <image sample, tag data of the license plate in the image sample>.
In this embodiment, vehicle data is considered as well as the license plate data. For example, each set of training data may be represented as <image sample, tag data of the license plate in the image sample, tag data of the vehicle in the image sample>.
The tag data of the vehicle may include: the first real area information and first category information of the vehicle; the tag data of the license plate may include: the second real area information and second category information of the license plate.
For example, for the vehicle 301a in fig. 3, the license plate may be too small for its region to be marked, so 301a may carry no license plate label.
After the image sample and the label data thereof are obtained, the image sample can be input into a joint detection model to be trained, and the output information of the model comprises first prediction region information where a vehicle is located in the image sample and second prediction region information where a license plate is located.
The area information may be specifically bounding box information.
After the prediction information is obtained, a loss function can be constructed from the prediction information and the real information and used to adjust the model parameters until a preset number of iterations is reached, yielding the final model.
The specific loss function may be determined by the loss function of the model's backbone network. For example, the backbone network of the license plate vehicle joint detection model may be a YOLOv3 model, in which case the YOLOv3 loss function may be adopted; for its specific formula, refer to existing descriptions of the YOLOv3 model.
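One training iteration can be sketched as below in PyTorch-style code. The split of the model output into a vehicle part and a plate part, and the simple sum of the two losses, are assumptions of this sketch; in an actual YOLOv3 head both categories share one output tensor and the standard YOLOv3 loss.

```python
import torch

def train_step(model, optimizer, yolo_loss, image, vehicle_target, plate_target):
    """One joint-training iteration: a single forward pass supervises both targets."""
    optimizer.zero_grad()
    vehicle_pred, plate_pred = model(image)            # synchronous two-target output
    loss = (yolo_loss(vehicle_pred, vehicle_target)    # first predicted vs. real vehicle region
            + yolo_loss(plate_pred, plate_target))     # second predicted vs. real plate region
    loss.backward()                                    # backpropagate the joint loss
    optimizer.step()                                   # adjust the model parameters
    return loss.item()
```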
In this embodiment, a loss function is constructed based on the first prediction region information and the first real region information of the vehicle in the image sample, and the second prediction region information and the second real region information of the license plate, and the model parameters are adjusted based on this loss function. The model training process thus considers not only the vehicle target component but also the vehicle itself; compared with considering a single vehicle target component factor alone, this improves the accuracy of the model and, in turn, the accuracy of license plate positioning when the license plate is located with the model.
The joint detection model is a deep neural network model whose backbone network is, for example, a YOLOv3 model.
The basic idea of the YOLO algorithm is: divide the image into a preset number of grid cells, for example 13 × 13. For each grid cell, a fixed number of bounding boxes is predicted for objects whose center points fall within the cell, together with the confidence of the object for each category. The fixed number may be denoted B, where B = 2 in YOLO v1, B = 5 in YOLO v2, and B = 3 in YOLOv3; the number of categories may be denoted C; B and C are both positive integers. Since each box is represented by 5 parameters (center position, width, height, and confidence), the dimension output by the YOLO model per grid cell is B × (5 + C).
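As a quick check of the output dimension in this embodiment (B = 3 boxes per cell; C = 2 categories, vehicle and license plate):

```python
B, C = 3, 2                  # YOLOv3 boxes per cell; categories: vehicle, license plate
S = 13                       # coarsest grid size
per_cell = B * (5 + C)       # 5 box parameters + C class scores per box -> 21
print((S, S, per_cell))      # coarsest-scale output shape: (13, 13, 21)
```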
YOLOv3 fuses 3 scales: in addition to the 13 × 13 feature map, the other two scales are 26 × 26 and 52 × 52. Detection is performed on the feature maps of all these scales, which noticeably improves the detection of small targets. Because YOLOv3 adopts multi-scale feature fusion, the number of bounding boxes it predicts is large.
The model is generally deployed on a Central Processing Unit (CPU) and may be compressed to maintain performance such as operating efficiency on the CPU.
The compression processing may include pruning and/or quantization.
Quantization may include: quantizing the adjusted model parameters to obtain the quantized model parameters.
Pruning may include: obtaining the test runtime of each of a plurality of candidate joint detection models on the target hardware, and taking the candidate joint detection model with the smallest test runtime as the final joint detection model, where the number of channels of each candidate joint detection model is smaller than the number of channels of an existing pre-trained model.
The existing pre-trained model is, for example, a general YOLOv3 model.
The target hardware refers to the hardware, such as a CPU, on which the joint detection model will ultimately run.
The pruning processing may specifically reduce the number of channels of the model. For example, the channel counts involved in the YOLOv3 model are 64, 128, 256, and 512; the channel count at one or more of these points can be reduced, and the means adopted may be to reduce the number of convolution kernels in the convolution operation. Taking 64 channels as an example: where the existing YOLOv3 model processes 64 channels with 64 convolution kernels, this embodiment may process them with 32 convolution kernels, obtaining output features of 32 channels and thus reducing the channel count relative to 64. The specific channel count may be a fixed preset value, such as 32, or the channel count with the best performance may be selected from several candidate channel counts by network search according to the model's test results. With pruning, the model finally applied to target detection is the pruned model; for example, where the channel counts of a general YOLOv3 model are 64, 128, 256, and 512, the corresponding channel counts of the pruned model may be 32, 64, 128, and 256, respectively.
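Selecting, among several pruned candidates, the one with the smallest test runtime on the target hardware can be sketched as follows; the warm-up and averaging details are assumptions of this sketch.

```python
import time

def pick_fastest(candidates, dummy_input, warmup=5, runs=50):
    """Return the name of the candidate model with the smallest mean runtime.

    `candidates` maps a name to a callable model taking `dummy_input`.
    """
    best_name, best_time = None, float("inf")
    for name, model in candidates.items():
        for _ in range(warmup):            # warm up caches before timing
            model(dummy_input)
        start = time.perf_counter()
        for _ in range(runs):
            model(dummy_input)
        mean_time = (time.perf_counter() - start) / runs
        if mean_time < best_time:
            best_name, best_time = name, mean_time
    return best_name
```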
For quantization, the model parameters are typically 32-bit floating point numbers and can be quantized to 8-bit fixed point numbers, using any suitable floating-point-to-fixed-point quantization scheme. The quantization may be applied either to the updated model parameters in each iteration, or to the final floating point parameters once the preset number of iterations is reached. With quantization, the model parameters finally applied to target detection are the quantized parameters.
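A minimal sketch of quantizing 32-bit floating point parameters to 8-bit values using symmetric linear quantization; this is one common scheme, and the disclosure does not prescribe a particular one.

```python
import numpy as np

def quantize_int8(weights):
    """Quantize float32 weights to int8 with a per-tensor scale."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale
```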
In this embodiment, pruning the existing pre-trained model and/or quantizing the model parameters reduces the parameter count of the license plate vehicle joint detection model and saves storage resources; when the model is applied, the computation required for detection is reduced and the detection efficiency is improved.
Fig. 6 is a block diagram of an image processing apparatus according to an embodiment of the disclosure, and as shown in fig. 6, the apparatus 600 includes: a first determination module 601, a second determination module 602, and a third determination module 603.
The first determining module 601 is configured to input the image to be processed into the joint detection model for image processing and synchronously obtain a first region where the vehicle is located and a second region of the vehicle target component; the second determining module 602 is configured to acquire, from the second region, a third region located within the first region; the third determining module 603 is configured to determine a fourth region including the license plate according to the third region.
In this embodiment, the second region located within the first region is taken as the third region, and the fourth region is determined based on the third region. Because the relationship between the first region and the second region is considered, the fourth region is determined only from second regions located within first regions; that is, both license plate and vehicle factors are considered, which improves the accuracy of license plate positioning compared with considering a single license plate factor. In addition, since the first area and the second area are determined synchronously, the detection speed increases, further improving the license plate positioning efficiency.
In some embodiments, the text in the fourth area includes a predetermined text.
In this embodiment, the fourth region is determined by combining the characters, so that the accuracy of license plate positioning can be further improved.
In some embodiments, the first determining module 601 is further configured to: input an image to be processed into the joint detection model for image processing to obtain the first bounding box information of the vehicle and the second bounding box information of the vehicle target component; determine the first area according to the first bounding box information; and determine the second area according to the second bounding box information.
In this embodiment, the first bounding box information and the second bounding box information are obtained with the license plate vehicle joint detection model, a single model that can synchronously determine information of multiple targets (i.e., license plates and vehicles). In addition, license plate information and vehicle information are considered jointly; compared with considering each of them in isolation, target detection can combine multiple kinds of information, which improves the accuracy of the license plate and vehicle detection results.
In some embodiments, the joint detection model satisfies at least one of: the joint detection model is a candidate joint detection model with the minimum test running time in a plurality of candidate joint detection models, and the number of channels of any candidate joint detection model in the plurality of candidate joint detection models is smaller than that of the channels of the existing pre-training model; the model parameters of the joint detection model are quantized model parameters.
In this embodiment, the model parameters are quantized model parameters and/or pruned model parameters, which can reduce the number of parameters, thereby reducing the amount of computation, reducing the storage overhead, and improving the detection efficiency.
In some embodiments, the apparatus 600 further comprises a desensitization module, configured to desensitize the characters in the fourth area.
In this embodiment, since the fourth region is the accurate license plate region obtained by the above method, by performing desensitization processing on the characters in the fourth region, accuracy of the desensitization processing can be improved, and thus user privacy data can be better protected.
Fig. 7 is a block diagram of a model training apparatus according to an embodiment of the present disclosure, and as shown in fig. 7, the apparatus 700 includes: a determination module 701, a construction module 702 and an adjustment module 703.
The determining module 701 is configured to input the image sample into the joint detection model for image processing and synchronously obtain the first predicted region information of the vehicle and the second predicted region information of the vehicle target component; the construction module 702 is configured to construct a loss function based on the first predicted region information and the first real region information of the vehicle, and the second predicted region information and the second real region information of the vehicle target component; the adjusting module 703 is configured to adjust the model parameters of the joint detection model based on the loss function.
In this embodiment, a loss function is constructed based on the first prediction region information and the first real region information of the vehicle in the image sample, and the second prediction region information and the second real region information of the license plate, and the model parameters are adjusted based on this loss function. The model training process thus considers not only license plate factors but also vehicle factors; compared with considering a single license plate factor alone, this improves the accuracy of the model and, in turn, the accuracy of license plate positioning when the license plate is located with the model.
In some embodiments, the apparatus 700 further comprises: a compression module to perform at least one of:
obtaining the test runtime of each of a plurality of candidate joint detection models on the target hardware, and taking the candidate joint detection model with the smallest test runtime as the final joint detection model, where the number of channels of each candidate joint detection model is smaller than the number of channels of an existing pre-trained model;
and carrying out quantization processing on the adjusted model parameters.
In this embodiment, pruning the existing pre-trained model and/or quantizing the model parameters reduces the parameter count of the license plate vehicle joint detection model and saves storage resources; when the model is applied, the computation required for detection is reduced and the detection efficiency is improved.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are used for distinction only, and do not indicate the degree of importance, the order of timing, and the like.
It is to be understood that in the disclosed embodiments, the same or similar elements in different embodiments may be referenced.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the model training method or the image processing method. For example, in some embodiments, the model training method or the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the model training method or the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the model training method or the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in cloud computing service systems that remedies the difficulty of management and the weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (17)

1. An image processing method comprising:
inputting an image to be processed into a joint detection model for image processing, and synchronously obtaining a first area where a vehicle is located and a second area of a vehicle target component;
acquiring a third area positioned in the first area from the second area;
and determining a fourth area comprising the license plate according to the third area.
2. The method of claim 1, wherein the text in the fourth region comprises a predetermined text.
3. The method according to claim 1, wherein the step of inputting the image to be processed into the joint detection model for image processing to synchronously obtain a first area where the vehicle is located and a second area of the vehicle target component comprises the steps of:
inputting an image to be processed into a joint detection model for image processing to obtain first boundary frame information of the vehicle and second boundary frame information of the vehicle target component;
determining the first area according to the first bounding box information;
and determining the second area according to the second bounding box information.
4. The method of claim 1, wherein the joint detection model satisfies at least one of:
the joint detection model is a candidate joint detection model with the minimum test running time in a plurality of candidate joint detection models, and the number of channels of any candidate joint detection model in the plurality of candidate joint detection models is smaller than that of the channels of the existing pre-training model;
the model parameters of the joint detection model are quantized model parameters.
5. The method of any one of claims 1-4, wherein, after determining the fourth region, the method further comprises:
desensitizing the text in the fourth region.
6. A joint detection model training method, comprising:
inputting the image sample into a joint detection model for image processing, and synchronously obtaining first prediction region information of the vehicle and second prediction region information of a vehicle target component;
constructing a loss function based on the first prediction region information and first real region information of the vehicle, and the second prediction region information and second real region information of the vehicle target component;
based on the loss function, model parameters of the joint detection model are adjusted.
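Illustrative note (not part of the claims): the patent does not fix a concrete loss form, so the sketch below assumes smooth L1 regression on box coordinates, a common choice for detection; joint_detection_loss and component_weight are hypothetical names.

import torch
import torch.nn.functional as F

def joint_detection_loss(pred_vehicle: torch.Tensor,
                         true_vehicle: torch.Tensor,
                         pred_component: torch.Tensor,
                         true_component: torch.Tensor,
                         component_weight: float = 1.0) -> torch.Tensor:
    # One regression term per detection branch, combined into a single
    # scalar so both branches are trained jointly.
    vehicle_loss = F.smooth_l1_loss(pred_vehicle, true_vehicle)
    component_loss = F.smooth_l1_loss(pred_component, true_component)
    return vehicle_loss + component_weight * component_loss

The parameter-adjustment step would then be the usual backward pass and optimizer update on this combined loss.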
7. The method of claim 6, further comprising at least one of:
obtaining the test running time of each candidate joint detection model among a plurality of candidate joint detection models on target hardware, and using the candidate joint detection model with the minimum test running time as the final joint detection model, wherein the number of channels of any candidate joint detection model among the plurality of candidate joint detection models is smaller than the number of channels of an existing pre-trained model;
and performing quantization processing on the adjusted model parameters.
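Illustrative note (not part of the claims): a hedged sketch of the model-selection branch of claim 7, timing each candidate on the target hardware and keeping the fastest; candidates, select_fastest, warmup, and runs are illustrative names, not from the patent.

import time
from typing import Callable, Dict

def select_fastest(candidates: Dict[str, Callable[[], None]],
                   warmup: int = 3, runs: int = 20) -> str:
    """Return the name of the candidate with the smallest mean run time."""
    timings: Dict[str, float] = {}
    for name, run_once in candidates.items():
        for _ in range(warmup):        # discard cold-start effects
            run_once()
        start = time.perf_counter()
        for _ in range(runs):
            run_once()
        timings[name] = (time.perf_counter() - start) / runs
    return min(timings, key=timings.get)

The quantization branch would then convert the selected model's adjusted parameters to lower precision (for example int8), which deep learning frameworks typically expose through their quantization toolkits.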
8. An image processing apparatus, comprising:
a first determining module configured to input an image to be processed into a joint detection model for image processing, and synchronously obtain a first area where a vehicle is located and a second area where a vehicle target component is located;
a second determining module configured to acquire, from the second area, a third area located within the first area;
and a third determining module configured to determine a fourth area comprising the license plate according to the third area.
9. The apparatus of claim 8, wherein the text in the fourth area comprises predetermined text.
10. The apparatus of claim 8, wherein the first determining module is further configured to:
input the image to be processed into the joint detection model for image processing to obtain first bounding box information of the vehicle and second bounding box information of the vehicle target component;
determine the first area according to the first bounding box information;
and determine the second area according to the second bounding box information.
11. The apparatus of claim 8, wherein the joint detection model satisfies at least one of:
the joint detection model is the candidate joint detection model with the minimum test running time among a plurality of candidate joint detection models, and the number of channels of any candidate joint detection model among the plurality of candidate joint detection models is smaller than the number of channels of an existing pre-trained model;
the model parameters of the joint detection model are quantized model parameters.
12. The apparatus according to any one of claims 8-11, further comprising:
a desensitization module configured to desensitize the text in the fourth area.
13. A joint detection model training apparatus, comprising:
a determining module configured to input an image sample into a joint detection model for image processing, and synchronously obtain first prediction region information of a vehicle and second prediction region information of a vehicle target component;
a construction module configured to construct a loss function based on the first prediction region information and first real region information of the vehicle, and on the second prediction region information and second real region information of the vehicle target component;
and an adjusting module configured to adjust model parameters of the joint detection model based on the loss function.
14. The apparatus of claim 13, further comprising:
a compression module configured to perform at least one of:
obtaining the test running time of each candidate joint detection model among a plurality of candidate joint detection models on target hardware, and using the candidate joint detection model with the minimum test running time as the final joint detection model, wherein the number of channels of any candidate joint detection model among the plurality of candidate joint detection models is smaller than the number of channels of an existing pre-trained model;
and performing quantization processing on the adjusted model parameters.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210835360.8A 2022-07-15 2022-07-15 Image processing and joint detection model training method, device, equipment and storage medium Pending CN115359471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210835360.8A CN115359471A (en) 2022-07-15 2022-07-15 Image processing and joint detection model training method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115359471A true CN115359471A (en) 2022-11-18

Family

ID=84031906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210835360.8A Pending CN115359471A (en) 2022-07-15 2022-07-15 Image processing and joint detection model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115359471A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681693A (en) * 2018-04-12 2018-10-19 南昌大学 Licence plate recognition method based on trusted area
CN110490186A (en) * 2018-05-15 2019-11-22 杭州海康威视数字技术股份有限公司 Licence plate recognition method, device and storage medium
CN114283357A (en) * 2021-12-08 2022-04-05 北京三快在线科技有限公司 Vehicle detection method and device, storage medium and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719465A (en) * 2022-11-24 2023-02-28 北京百度网讯科技有限公司 Vehicle detection method, apparatus, device, storage medium, and program product
CN115719465B (en) * 2022-11-24 2023-11-03 北京百度网讯科技有限公司 Vehicle detection method, device, apparatus, storage medium, and program product
CN116894937A (en) * 2023-06-25 2023-10-17 德联易控科技(北京)有限公司 Method, system and electronic equipment for acquiring parameters of wheel aligner
CN116894937B (en) * 2023-06-25 2024-02-06 德联易控科技(北京)有限公司 Method, system and electronic equipment for acquiring parameters of wheel aligner

Similar Documents

Publication Publication Date Title
CN115359471A (en) Image processing and joint detection model training method, device, equipment and storage medium
CN112863187B (en) Detection method of perception model, electronic equipment, road side equipment and cloud control platform
CN112560862A (en) Text recognition method and device and electronic equipment
CN113191261B (en) Image category identification method and device and electronic equipment
CN113705716A (en) Image recognition model training method and device, cloud control platform and automatic driving vehicle
CN114724113B (en) Road sign recognition method, automatic driving method, device and equipment
CN115761698A (en) Target detection method, device, equipment and storage medium
CN115908816A (en) Accumulated water identification method, device, equipment and storage medium based on artificial intelligence
CN112818972B (en) Method and device for detecting interest point image, electronic equipment and storage medium
CN115526837A (en) Abnormal driving detection method and device, electronic equipment and medium
CN114708498A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN115063765A (en) Road side boundary determining method, device, equipment and storage medium
CN113989720A (en) Target detection method, training method, device, electronic equipment and storage medium
CN114429631A (en) Three-dimensional object detection method, device, equipment and storage medium
CN113989300A (en) Lane line segmentation method and device, electronic equipment and storage medium
CN114495049A (en) Method and device for identifying lane line
CN113902898A (en) Training of target detection model, target detection method, device, equipment and medium
CN113887414A (en) Target detection method, target detection device, electronic equipment and storage medium
CN113706705A (en) Image processing method, device and equipment for high-precision map and storage medium
CN112861701A (en) Illegal parking identification method and device, electronic equipment and computer readable medium
CN114529768B (en) Method, device, electronic equipment and storage medium for determining object category
CN114445711B (en) Image detection method, image detection device, electronic equipment and storage medium
CN112784789B (en) Method, device, electronic equipment and medium for identifying traffic flow of road
CN113806361B (en) Method, device and storage medium for associating electronic monitoring equipment with road
CN113012439B (en) Vehicle detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination