CN112801048A - Optimal target image identification method, device, equipment and storage medium - Google Patents

Optimal target image identification method, device, equipment and storage medium

Info

Publication number
CN112801048A
Authority
CN
China
Prior art keywords: image, target, sub-image, target sub-image, optimal
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202110318221.3A
Other languages
Chinese (zh)
Inventor
高子翔 (Gao Zixiang)
肖潇 (Xiao Xiao)
李冰 (Li Bing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202110318221.3A priority Critical patent/CN112801048A/en
Publication of CN112801048A publication Critical patent/CN112801048A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/46 — Scenes; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06N 3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 20/41 — Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 2201/07 — Indexing scheme relating to image or video recognition or understanding; target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method, device, equipment and storage medium for identifying an optimal target image, and belongs to the technical field of image processing. The method comprises the following steps: acquiring a video image sequence of a monitored target; extracting a target sub-image from each frame of video image in the sequence to obtain a target sub-image sequence; and inputting the target sub-image sequence into a target image recognition model to obtain the deep semantic features of each target sub-image, then performing combination and discrimination based on these features to output the optimal target sub-image. The method greatly improves the selection of the optimal target image, can quickly and accurately find the best image in a captured video image sequence in the field of video surveillance, and improves the effectiveness and performance of subsequent functional analysis modules. It solves the prior-art problem that the optimal image selected by an image quality evaluation algorithm may contain a target in poor form, which impairs subsequent analysis.

Description

Optimal target image identification method, device, equipment and storage medium
Technical Field
The application relates to a method, device, equipment and storage medium for identifying an optimal target image, and belongs to the technical field of image processing.
Background
As cities grow in scale and population, supervising the people, motor vehicles and non-motor vehicles passing through each road checkpoint becomes more difficult, and the pressure on law enforcement departments to maintain urban safety increases daily; the spread of surveillance technology has effectively eased this problem. When monitoring people, motor vehicles and non-motor vehicles at road checkpoints, however, the quality of the multiple video frames that the surveillance equipment captures of the same target is uneven, and many low-quality images, such as blurry frames or frames in which the target is occluded, are passed to subsequent functional modules, wasting hardware capacity and processing time.
To address this problem, the prior art generally applies an image quality evaluation algorithm to the video image sequence captured by the monitoring device, selects the best image in the sequence, and feeds it to the subsequent functional module. Existing image quality evaluation algorithms fall into three types: full-reference, reduced-reference (partial-reference) and no-reference. Full-reference image quality evaluation compares an image to be evaluated against an ideal image chosen as a reference, analyzes the degree of distortion of the image under evaluation, derives its quality score, and finally determines which image is better.
When the video images captured by the monitoring device differ little in brightness, contrast and definition but differ greatly in the form of the target they contain, existing image quality evaluation algorithms, which consider only index information such as image brightness, contrast, noise and structure, may select a target image whose target is poorly angled, incomplete in appearance or occluded, while discarding a target image whose target is frontally angled, complete in appearance and unoccluded. This greatly impairs the analysis performed by the subsequent functional modules of the monitoring system.
Disclosure of Invention
The application provides an optimal target image identification method, device, equipment and storage medium to solve the prior-art problem that the optimal image selected by an image quality evaluation algorithm may contain a target in poor form, which impairs subsequent analysis.
In order to solve the technical problem, the application provides the following technical scheme:
in a first aspect of the embodiments of the present application, a method for identifying an optimal target image is provided, where the method includes:
acquiring a video image sequence of a monitored target;
extracting a target sub-image of each frame of video image in the video image sequence to obtain a target sub-image sequence;
and inputting the target sub-image sequence into a target image recognition model to obtain the deep semantic features corresponding to each target sub-image, and performing combination and discrimination based on the deep semantic features to output an optimal target sub-image.
In a second aspect of the embodiments of the present application, there is provided an apparatus for identifying an optimal target image, the apparatus including:
the video image acquisition module is used for acquiring a video image sequence of a monitored target;
the subimage extraction module is used for extracting a target subimage of each frame of video image in the video image sequence to obtain a target subimage sequence;
and the optimal image recognition module is used for inputting the target sub-image sequence into a target image recognition model to obtain the deep semantic features corresponding to each target sub-image, and performing combination and discrimination based on the deep semantic features to output the optimal target sub-image.
In a third aspect of the embodiments of the present application, there is provided an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program is loaded and executed by the processor to implement the steps of the method for identifying an optimal target image according to the first aspect of the embodiments of the present application.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which a computer program is stored; the computer program is loaded and executed by a processor to implement the steps of the method for identifying an optimal target image according to the first aspect of the embodiments of the present application.
The beneficial effects of the application are as follows: by extracting discriminative deep semantic features from each target sub-image to characterize the target form it contains, the method identifies the optimal target sub-image according to the differences in target form between images. This greatly improves the selection of the optimal target image and makes the selection better suited to the scene; at the same time, the optimal image can be found quickly and accurately in a captured video image sequence in the field of video surveillance, which also improves the effectiveness and performance of subsequent functional analysis modules.
The foregoing description is only an overview of the technical solutions of the present application. In order to make these solutions clearer and to implement them according to the content of the description, the following detailed description is given with reference to the preferred embodiments of the present application and the accompanying drawings.
Drawings
FIG. 1 is a network architecture diagram of a method and apparatus for identifying an optimal target image according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for identifying an optimal target image according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying a better target image in a current image pair according to an embodiment of the present application;
FIG. 4 is a sequence of target sub-images obtained in one embodiment of the present application;
FIG. 5 is a block diagram of a target image recognition model provided in one embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for identifying an optimal target image according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Fig. 1 is a schematic diagram of a network architecture for implementing the method and apparatus for identifying an optimal target image according to an embodiment of the present application. As shown in fig. 1, the network architecture at least includes an image acquisition device 101 and an electronic device 102, where the electronic device 102 may be a PC, a mobile terminal or the like, or a server, and the image acquisition device 101 may be a camera, a video camera, or the camera of the electronic device.
The electronic device 102 acquires, in real time, the video image sequence of a monitored target captured by the image acquisition device, extracts a target sub-image from each frame of video image in the sequence, and stores the target sub-images in a database. The electronic device 102 may then obtain from the database the target sub-image sequence of the monitored target within a predetermined time period and identify the optimal target sub-image from it through a pre-trained deep learning model.
For example, in an application scenario of non-motor vehicle monitoring, if the electronic device 102 is a server or a PC, a non-motor vehicle checkpoint monitoring and analysis system may be installed. After logging in to the web interface of the system, image acquisition devices (e.g., cameras) are connected to it; the camera at each non-motor vehicle checkpoint captures, in real time, the non-motor vehicles appearing within its monitoring range, and the video image sequence of the same non-motor vehicle captured in real time by each camera is sent to the monitoring and analysis system.
If the electronic device 102 is an intelligent mobile device such as a mobile phone, a non-motor vehicle checkpoint monitoring and analysis APP can be installed. After logging in to the APP, the camera of the mobile phone can capture real-time non-motor vehicle video images; the extracted non-motor vehicle sub-images are stored in the storage space of the phone, and the local image cache of the phone is cleaned periodically.
The electronic device 102 acquires the video image sequence of the non-motor vehicle through the checkpoint monitoring and analysis system or APP, identifies the optimal non-motor vehicle sub-image, and provides it as the input image for the attribute analysis module or the feature retrieval module of the checkpoint monitoring system, so as to obtain attribute information of the currently monitored non-motor vehicle such as its color, texture and style and whether it carries a license plate, or to search a local database or the phone's storage for similar vehicles; finally, the analysis result for the optimal non-motor vehicle image is displayed to the user.
Fig. 2 is a flowchart of a method for identifying an optimal target image according to an embodiment of the present application. The method may be applied to the electronic device 102 of the network architecture shown in fig. 1, which may be, for example, a PC, a mobile terminal or the like, or a server. In an embodiment of the present application, the optimal target image identification method includes the following steps:
s201, acquiring a video image sequence of a monitored target.
In the embodiment of the application, the video image sequence of the monitored target may be captured by a camera of the electronic device itself, or captured by another device and sent to the electronic device.
For example, in a non-motor vehicle monitoring scene, in order to perform target retrieval, vehicle violation detection and the like, image acquisition devices (e.g., cameras) are erected at the non-motor vehicle checkpoints of different monitoring areas to capture, in real time, the non-motor vehicles appearing within the monitoring range; each image acquisition device thereby collects a video image sequence of the same non-motor vehicle entering the monitoring range in the current scene and time period.
The video image sequence collected at a checkpoint includes frames in which the non-motor vehicle is just entering the monitoring area, has completely entered it, and is leaving it; in different scenes, the collected frames may also show the non-motor vehicle occluded by roadside trees, pedestrians and other objects.
S202, extracting a target sub-image of each frame of video image in the video image sequence to obtain a target sub-image sequence.
After the electronic device acquires the video image sequence of the monitored target, in order to select the optimal image, the target sub-image corresponding to the monitored target can be extracted from each frame of video image.
Fig. 4 shows the target sub-image sequence extracted for a non-motor vehicle in a non-motor vehicle monitoring scene. The extracted sequence may be stored in a database; if the electronic device acquiring the video images is a mobile phone, the sequence can be stored in the phone's storage space.
In a possible implementation, each frame of video image to be evaluated is input into a pre-trained target detection model, which determines the target sub-image contained in the frame. The obtained target sub-image may be the sub-image corresponding to the target frame (bounding box) in which the monitored target lies; for example, in a non-motor vehicle scene the monitored target is a non-motor vehicle, and the obtained target sub-image is the sub-image within the non-motor vehicle's target frame.
For example, the target detection model of this embodiment may be a YOLO detection model or a Faster R-CNN-family deep learning detection model, selected flexibly according to the requirements of the practical application; this embodiment is not specifically limited here. Detecting the target in a video image with a detection model to determine the target sub-image is prior art well known to those skilled in the art and is not described in detail here.
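As an illustration of this step only, the following is a minimal sketch that crops target sub-images with a pretrained torchvision Faster R-CNN detector standing in for the unspecified detection model; the score threshold, the take-the-highest-scoring-box rule and the PIL-based cropping are assumptions for the example, not the patent's method.

```python
# A sketch of S202, assuming torchvision (>= 0.13); the 0.7 score threshold
# and "take the highest-scoring box" rule are illustrative assumptions.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def extract_target_subimages(frames, score_thresh=0.7):
    """Crop one target sub-image per frame; frames are PIL images."""
    subimages = []
    with torch.no_grad():
        for frame in frames:
            det = detector([to_tensor(frame)])[0]    # dict with boxes/labels/scores
            keep = det["scores"] > score_thresh
            if keep.any():                           # boxes come sorted by score
                x1, y1, x2, y2 = det["boxes"][keep][0].int().tolist()
                subimages.append(frame.crop((x1, y1, x2, y2)))
    return subimages                                 # the target sub-image sequence
```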
S203, inputting the target sub-image sequence into a target image recognition model to obtain the deep semantic features corresponding to each target sub-image, and performing combination and discrimination based on the deep semantic features to output an optimal target sub-image.
The implementation of S203 is described in detail below. Fig. 3 is a flowchart of identifying the optimal target sub-image according to an embodiment of the present application; referring to fig. 3, S203 specifically includes the following steps:
and S101, inputting two frames of target sub-images in the target sub-image sequence into the target image recognition model as a current image pair.
The target image recognition model is a trained network model that identifies the better target sub-image of the two input frames.
Specifically, as a possible implementation, for the target sub-image sequence obtained in S202, the first two frames of target sub-images may be taken in the order in which they appear in the sequence and input to the target recognition model as the current image pair.
In another possible implementation, any two frames of target sub-images from the target sub-image sequence may be selected as the current image pair and input to the target recognition model. In practical applications, the way of obtaining the current image pair can be chosen as needed; this embodiment is not limited here.
For convenience of explanation, suppose the first two frames of the target sub-image sequence are acquired in order; the first frame is referred to as the first sub-image, the second frame as the second sub-image, and the two are input into the target recognition model as the current image pair.
In one embodiment, the first sub-image and the second sub-image are input directly into the target recognition model; in another embodiment, the first sub-image of the pair is first preset as the current optimal target sub-image, and the two sub-images are then input into the corresponding modules of the target recognition model for processing, as discussed in detail later.
S102, extracting the deep semantic features corresponding to the current image pair through the target image recognition model, and, based on these features, performing combination and discrimination to output the better target sub-image of the current pair as the current optimal target sub-image.
Specifically, the video images captured by the image acquisition device of this embodiment differ very little in brightness, contrast, definition and the like, but differ greatly in the form of the target they contain.
In this embodiment, the quality of a target sub-image is judged from the target form it contains; for example, the deep semantic features of the target sub-image, extracted by a target image recognition model obtained through deep learning, can be used to characterize that target form.
The target form in this embodiment refers to the visual state of the monitored target contained in the video image. For example, in a non-motor vehicle monitoring scene the monitored target is a non-motor vehicle, and its target form covers, for example, the angle of the vehicle, whether its appearance is complete, and whether it is occluded.
For the first sub-image and the second sub-image of the acquired current image pair, the pre-trained target image recognition model extracts from each a deep semantic feature carrying discriminative information, judges the quality of the two sub-images, and identifies the better target sub-image of the pair.
Illustratively, in a non-motor vehicle monitoring scene, the extracted deep semantic features may encode, for example, the orientation of the non-motor vehicle, whether its license plate is exposed, whether the vehicle and the plate are clear and complete, whether there is occlusion, the vehicle angle, and the like.
Finally, according to the recognition result of the target image recognition model, the better image is updated to be the current optimal target sub-image.
This embodiment can adaptively extract discriminative deep semantic features for different target sub-images. For example, if in a certain non-motor vehicle monitoring scene a target sub-image with an exposed license plate is required in order to analyze the plate number, the method of the embodiment of the present application extracts the deep semantic features of the target sub-images and adaptively selects an optimal target sub-image that meets this requirement. Compared with the prior-art practice of evaluating image quality against artificially fixed indexes such as vehicle angle and vehicle definition, the scheme of the application has a wider application range and greater universality.
In one embodiment, the target image recognition model includes a feature extraction module including a first feature extraction module and a second feature extraction module.
In this embodiment, taking the first two frames of the acquired target sub-image sequence as the current image pair and presetting the first sub-image as the current optimal target sub-image, steps S101 and S102 are specifically implemented as follows: the first sub-image of the current pair is input into the first feature extraction module, which performs feature extraction and outputs the deep semantic features of the first sub-image; the second sub-image of the current pair is input into the second feature extraction module, which performs feature extraction and outputs the deep semantic features of the second sub-image; the target forms contained in the two sub-images are then discriminated according to their respective deep semantic features, the first or second sub-image whose target form meets the preset condition is determined to be the better target sub-image of the current pair, and this better sub-image is taken as the current optimal target sub-image.
Optionally, the first feature extraction module and the second feature extraction module have the same structure. Optionally, each consists of 5 convolutional layers and 4 max-pooling layers, connected in sequence as convolutional layer conv1, max-pooling layer pool1, convolutional layer conv2, max-pooling layer pool2, convolutional layer conv3, max-pooling layer pool3, convolutional layer conv4, max-pooling layer pool4, and convolutional layer conv5.
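The following is a minimal sketch of such a feature extraction module, assuming PyTorch; the patent fixes only the conv/pool layout, so the channel widths, 3×3 kernels and ReLU activations here are illustrative assumptions.

```python
# A sketch of one feature extraction module: conv1-pool1-...-pool4-conv5,
# as specified above. Channel widths, kernel sizes and ReLU activations
# are assumptions; the patent does not fix them.
import torch.nn as nn

def make_feature_extractor(in_ch=3, widths=(32, 64, 128, 256, 256)):
    layers, ch = [], in_ch
    for i, w in enumerate(widths):
        layers += [nn.Conv2d(ch, w, kernel_size=3, padding=1),  # conv1..conv5
                   nn.ReLU(inplace=True)]
        if i < len(widths) - 1:                                 # pool1..pool4
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        ch = w
    return nn.Sequential(*layers)
```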
The deep semantic features extracted in the embodiment of the application are deep image features that carry semantic information. Illustratively, in the field of non-motor vehicle monitoring, they may comprise comprehensive feature information such as the orientation of the non-motor vehicle, whether its license plate is exposed, whether the vehicle and the plate are clear and complete, whether there is occlusion, and the vehicle angle. The deep semantic features extracted in this embodiment are discriminative and are the key to judging the quality of the corresponding sub-images.
In another embodiment, as shown in fig. 5, the target image recognition model includes a first feature extraction module, a second feature extraction module, a feature fusion module and a feature discrimination module. On the basis of the above embodiment, after the first and second feature extraction modules extract the deep semantic features of the first and second sub-images respectively, the method further includes the following steps: inputting the deep semantic features of the two sub-images into the feature fusion module to obtain a fused deep semantic feature vector; inputting the fused vector into the feature discrimination module, which outputs the probabilities that the target forms contained in the first and second sub-images of the current pair meet the preset condition; and judging the first or second sub-image corresponding to the larger output probability to be the better target sub-image.
Optionally, the feature fusion module is constructed by a concat function. The deep semantic features of the first sub-image extracted by the first feature extraction module and the deep semantic features of the second sub-image extracted by the second feature extraction module are input into the feature fusion module, and after concat feature fusion, the obtained fused deep semantic feature vector is input into the feature discrimination module.
Optionally, the feature discrimination module consists of two fully-connected layers and a SoftMax layer, connected in sequence as the first fully-connected layer FC1, the second fully-connected layer FC2 and the SoftMax layer.
The fused deep semantic feature vector passes through the two fully-connected layers and then enters the SoftMax layer, where the SoftMax function computes two probability values: the probability (ranging from 0 to 1) that the target form contained in the first sub-image meets the preset condition, and the probability that the target form contained in the second sub-image meets it. The preset condition may be, for example, that the non-motor vehicle is unoccluded and its license plate is exposed; in practical applications, different preset conditions can be chosen as required.
If the probability value corresponding to the second sub-image is larger, the second sub-image is judged to be the better target sub-image; if the probability value corresponding to the first sub-image is larger, the first sub-image is judged to be the better target sub-image.
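A minimal sketch of the fusion and discrimination head follows, again assuming PyTorch; the feature dimension, hidden width, and the assumption that the extractor outputs are already pooled to flat vectors are all illustrative.

```python
# A sketch of the feature fusion (concat) and feature discrimination
# (FC1 -> FC2 -> SoftMax) modules. feat_dim and hidden_dim are assumptions,
# and the two extractor outputs are assumed already pooled to flat vectors.
import torch
import torch.nn as nn

class PairDiscriminator(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(2 * feat_dim, hidden_dim)  # first fully-connected layer
        self.fc2 = nn.Linear(hidden_dim, 2)             # second fully-connected layer
        self.softmax = nn.Softmax(dim=1)

    def forward(self, feat_first, feat_second):
        fused = torch.cat([feat_first, feat_second], dim=1)   # concat fusion
        probs = self.softmax(self.fc2(torch.relu(self.fc1(fused))))
        # probs[:, 0] / probs[:, 1]: probability that the first / second
        # sub-image's target form meets the preset condition
        return probs[:, 0], probs[:, 1]
```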
After the better target sub-image is determined, it is updated to be the current optimal target sub-image.
S103: and taking the current optimal target sub-image and one frame of target sub-image which is not input into the target image recognition model in the target sub-image sequence as a current image pair, entering S102 until all target sub-images in the target sub-image sequence are input into the target image recognition model, and outputting the current optimal target sub-image as an optimal target sub-image.
Wherein, S103 specifically includes:
S1031, judging whether all target sub-images in the target sub-image sequence have been input into the target image recognition model; if so, executing S1033; otherwise, executing S1032.
S1032, taking the current optimal target sub-image and one frame of target sub-image that has not been input into the target image recognition model as the current image pair, and returning to S102;
Specifically, if the first two frames were initially taken in sequence to determine the better image, the next frame of the target sub-image sequence may continue to be acquired in order; this next frame serves as the second sub-image and the current optimal target sub-image as the first sub-image, forming the current image pair, and S102 is executed.
If instead any two frames were initially selected to form the current image pair, then any not-yet-evaluated target sub-image may again be taken from the sequence to form the current image pair with the current optimal target sub-image, and S102 is executed.
And S1033, outputting the current optimal target sub-image as the optimal target sub-image.
Specifically, in this embodiment a current image pair is formed from one target sub-image selected from the sequence and the current optimal target sub-image, the better image of the pair is identified, and it becomes the new current optimal target sub-image. Once all target sub-images in the sequence have been traversed, the current optimal target sub-image is the optimal target sub-image among all target sub-images in the sequence.
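The traversal just described reduces to a simple tournament loop; the sketch below assumes `model` wraps the whole recognition network and returns the pair of probabilities for its two input sub-images.

```python
# A sketch of S101-S103: keep a running optimal sub-image and compare it
# against each remaining frame; `model(a, b)` is assumed to return the two
# probabilities produced by the feature discrimination module.
def find_optimal_subimage(subimages, model):
    best = subimages[0]                # preset the first frame as current optimal
    for candidate in subimages[1:]:
        p_best, p_cand = model(best, candidate)
        if p_cand > p_best:            # candidate's target form scores higher
            best = candidate           # update the current optimal target sub-image
    return best                        # the optimal target sub-image of the sequence
```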
In another embodiment, the target image recognition model includes a first feature extraction module, a second feature extraction module, a first keypoint detection module, and a second keypoint detection module.
Specifically, when the deep semantic features of the first and second sub-images extracted by the first and second feature extraction modules of the target recognition model both satisfy the preset condition, the better target sub-image cannot be determined from those features alone.
For this situation, as a further optimized implementation manner, as shown in fig. 5, the target image recognition model according to the embodiment of the present application further includes a first keypoint detection module and a second keypoint detection module.
Specifically, the deep semantic features output by the first feature extraction module are input to the first keypoint detection module, which performs keypoint target feature extraction on the feature map extracted by the first feature extraction module, yielding the keypoint target features contained in the first sub-image of the current pair.
Likewise, the deep semantic features output by the second feature extraction module are input to the second keypoint detection module, which performs keypoint target feature extraction on that feature map, yielding the keypoint target features contained in the second sub-image of the current pair.
This embodiment further judges the better target sub-image of the current pair by combining the keypoint target features contained in the first and second sub-images. Keypoint target features are the features of the key parts of the monitored target in the video image.
Illustratively, in the non-motor vehicle monitoring scenario, the extracted keypoint targets of the non-motor vehicle (9 in total) are: the handlebar, the front of the vehicle, the front wheel, the roof, the middle of the vehicle body, the rear wheel, the rear of the vehicle, the trunk, and the tricycle hopper (including a covered rear carriage).
The first and second keypoint detection modules of this embodiment have the same structure. Optionally, each comprises 6 convolutional layers and a SoftMax layer, where the sixth convolutional layer outputs nine feature maps corresponding respectively to the nine keypoint targets of the non-motor vehicle.
The feature maps of the nine keypoint targets are input into the SoftMax layer, which identifies the keypoint targets contained in the first and second sub-images and thereby yields the number of keypoint targets each contains.
When the better image cannot be determined from the deep semantic features alone, if the first sub-image contains more keypoint targets than the second, the first sub-image is judged to be the better image; otherwise, the second sub-image is.
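A minimal sketch of such a keypoint detection head follows; the intermediate channel width, the ReLU activations, and the visibility decision (a sigmoid on each map's peak response, used here as a simplified stand-in for the SoftMax layer named above) are assumptions.

```python
# A sketch of a keypoint detection module: six convolutional layers whose
# last layer emits one feature map per keypoint (9 for the non-motor
# vehicle). The sigmoid-on-peak visibility test below is a simplified
# stand-in for the SoftMax layer described in the patent.
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    def __init__(self, in_ch=256, hidden=128, num_keypoints=9):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(5):                               # conv layers 1..5
            layers += [nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(inplace=True)]
            ch = hidden
        layers.append(nn.Conv2d(ch, num_keypoints, 3, padding=1))  # 6th conv layer
        self.convs = nn.Sequential(*layers)

    def count_keypoints(self, feat_map, thresh=0.5):
        maps = self.convs(feat_map)                      # (B, 9, H, W)
        peak = maps.amax(dim=(-2, -1))                   # peak response per keypoint
        present = torch.sigmoid(peak) > thresh           # visibility decision
        return present.sum(dim=1)                        # keypoints per sub-image
```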
In another embodiment, as shown in fig. 5, the target image recognition model includes a first feature extraction module, a second feature extraction module, a feature fusion module, a feature identification module, a first keypoint detection module, and a second keypoint detection module.
In this embodiment, the deep semantic features extracted by the first and second feature extraction modules are fused by the feature fusion module and passed to the feature discrimination module, which yields the probabilities that the first and second sub-images meet the preset condition.
If the difference between the two probability values obtained by the feature discrimination module of the target image recognition model is smaller than a preset value (for example, 0.001), the image quality of the first sub-image and that of the second are essentially the same, and the better image cannot be reliably determined from the probabilities alone.
The feature discrimination module of this embodiment therefore further includes a logic discrimination module for determining the better image of the current pair from the number of extracted keypoint targets, specifically:
the logic discrimination module is constructed by a linear discrimination function, and the obtained probability that the first sub-image and the second sub-image meet the preset condition and the number of the key point targets contained in the first sub-image and the second sub-image are output as the input of the logic discrimination module.
If the logic discrimination module finds that the difference between the probability values of the first and second sub-images meeting the preset condition is smaller than the preset value, the sub-image containing more keypoint targets is judged to be the better image.
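This decision rule reduces to a few lines; the sketch below assumes scalar probabilities and keypoint counts per sub-image, with the 0.001 margin taken from the example above and the tie-break toward the first sub-image being an assumption.

```python
# A sketch of the logic discrimination: fall back to keypoint counts only
# when the two probabilities are within the preset margin (0.001 in the
# example above); otherwise the probability decides.
def choose_better(p_first, p_second, kp_first, kp_second, margin=0.001):
    if abs(p_first - p_second) < margin:       # image quality essentially equal
        return "first" if kp_first >= kp_second else "second"
    return "first" if p_first > p_second else "second"
```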
By additionally detecting keypoint targets, this embodiment makes the judgement of the better image more accurate. Of course, keypoint detection may also be omitted; when the image quality of the first and second sub-images differs little, either frame may be updated to be the current optimal target sub-image.
As shown in fig. 4, the sub-image selected in the box is the optimal target sub-image obtained by the method of the embodiment of the present application. As can be seen from the whole target sub-image sequence, the body of the non-motor vehicle in the optimal target sub-image is complete, and the image is clear, unoccluded and free of interference from other objects.
After the optimal non-motor vehicle sub-image of the sequence is obtained, it can be input into a subsequent attribute analysis module to obtain attribute information of the currently monitored non-motor vehicle, such as its color, texture and style and whether it carries a license plate; or into a subsequent feature retrieval module to search a local database for similar vehicles; the final analysis result is then displayed to the user.
In summary, by inputting two frames of target sub-images at a time, the application extracts discriminative deep semantic features that characterize the form of the non-motor vehicle target in each of the two input sub-images, identifies which of the two is better, updates the current optimal target sub-image accordingly, and continues the comparison until the last image of the target sub-image sequence. The embodiment of the application greatly improves the selection of the optimal target image, can quickly and accurately find the optimal image in a captured video image sequence in the field of video surveillance, and also improves the effectiveness and performance of subsequent functional analysis modules.
Fig. 6 is a block diagram of an apparatus for identifying an optimal target image according to an embodiment of the present application, and this embodiment takes an example of applying the apparatus to an electronic device of a network architecture shown in fig. 1. The device at least comprises the following modules:
the video image acquisition module is used for acquiring a video image sequence of a monitored target;
the subimage extraction module is used for extracting a target subimage of each frame of video image in the video image sequence to obtain a target subimage sequence;
and the optimal image recognition module is used for inputting the target sub-image sequence into a target image recognition model to obtain the deep semantic features corresponding to each target sub-image, and performing combination and discrimination based on the deep semantic features to output the optimal target sub-image.
Further, the optimal image recognition module includes:
the sub-image input unit is used for inputting two frames of target sub-images in the target sub-image sequence into the target image recognition model as a current image pair;
the optimal image judging unit is used for obtaining deep semantic features corresponding to the current image pair through the target image recognition model, and combining and judging and outputting an optimal target sub-image in the current image pair as a current optimal target sub-image based on the deep semantic features;
and the cycle identification unit is used for taking the current optimal target sub-image and one frame of target sub-image which is not input into the target image identification model in the target sub-image sequence as a current image pair, executing the step of the optimal image judgment unit until all target sub-images in the target sub-image sequence are input into the target image identification model, and outputting the current optimal target sub-image as an optimal target sub-image.
For the optimal target image recognition device provided in the embodiment of the present application, the above method embodiment is referred to for relevant details, and the implementation principle and technical effect are similar, which are not described herein again.
It should be noted that: the recognition device for an optimal target image provided in the above embodiment only exemplifies the division of the above functional modules when recognizing the optimal target image, and in practical applications, the above function allocation may be completed by different functional modules according to needs, that is, the internal structure of the recognition device for an optimal target image is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the recognition apparatus for an optimal target image and the recognition method for an optimal target image provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application. The electronic device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server, and may include, but is not limited to, a processor and a memory. The electronic device of this embodiment at least includes a processor and a memory; the memory stores a computer program executable on the processor, and when the processor executes the computer program it implements the steps of the above embodiment of the method for identifying an optimal target image, for example the steps of the method shown in fig. 2. Alternatively, when executing the computer program, the processor implements the functions of the modules of the above-described apparatus for identifying an optimal target image.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the electronic device.
The processor may include one or more processing cores, for example a 4-core or 6-core processor. The processor may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content the display screen needs to display. In some embodiments, the processor may further include an AI (Artificial Intelligence) processor for handling machine learning computations. The processor is the control center of the electronic device and connects all parts of the device through various interfaces and lines.
The memory may be used to store the computer programs and/or modules; the processor implements the various functions of the apparatus for identifying an optimal target image by running or executing the computer programs and/or modules stored in the memory and calling the data stored in it. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function); the data storage area may store data created through use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, or another non-volatile solid-state storage device.
It is understood by those skilled in the art that the device described in this embodiment is only an example of an apparatus for identifying an optimal target image and does not constitute a limitation on it; in other embodiments, more or fewer components may be included, some components may be combined, or different components may be used. For example, the apparatus may further include input/output devices, network access devices, buses and the like. The processor, memory and peripheral interface may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface via a bus, signal line or circuit board. Illustratively, peripheral devices include, but are not limited to: a radio frequency circuit, a touch display screen, an audio circuit, a power supply, and the like.
Of course, the electronic device of the embodiment may further include fewer or more components, and the embodiment is not limited thereto.
Optionally, the present application further provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the steps of the above-mentioned optimal target image identification method when being executed by a processor.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the steps of the above-mentioned embodiment of the optimal target image identification method.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for identifying an optimal target image, the method comprising:
acquiring a video image sequence of a monitored target;
extracting a target sub-image of each frame of video image in the video image sequence to obtain a target sub-image sequence;
and inputting the target sub-image sequence into a target image recognition model to obtain deep semantic features corresponding to the target sub-images, and combining and judging to output an optimal target sub-image based on the deep semantic features.
2. The method of claim 1, wherein inputting the sequence of target sub-images into a target image recognition model to obtain deep semantic features corresponding to each target sub-image, and performing combination and discrimination based on the deep semantic features to output an optimal target sub-image comprises:
s101: inputting two frames of target sub-images in the target sub-image sequence into the target image recognition model as a current image pair;
s102: obtaining deep semantic features corresponding to the current image pair through the target image recognition model, and performing combination and discrimination based on the deep semantic features to output a better target sub-image in the current image pair as a current optimal target sub-image;
s103: and taking the current optimal target sub-image and one frame of target sub-image which is not input into the target image recognition model in the target sub-image sequence as a current image pair, entering S102 until all target sub-images in the target sub-image sequence are input into the target image recognition model, and outputting the current optimal target sub-image as an optimal target sub-image.
3. The method according to claim 2, wherein S101 further comprises selecting a frame of target sub-image from the current image pair as a current optimal target sub-image; the target image recognition model comprises a first feature extraction module and a second feature extraction module; the obtaining of the deep semantic features corresponding to the current image pair through the target image recognition model, and the combining and judging output of the better target sub-image in the current image pair as the current optimal target sub-image based on the deep semantic features include:
inputting a first sub-image in the current image pair into the first feature extraction module to obtain deep semantic features of the first sub-image; inputting a second sub-image in the current image pair into the second feature extraction module to obtain deep semantic features of the second sub-image, wherein the first sub-image is a current optimal target sub-image;
and judging the target forms contained in the first sub-image and the second sub-image according to the deep semantic features corresponding to the first sub-image and the second sub-image, determining the first sub-image or the second sub-image with the target forms meeting preset conditions as a better target sub-image in the current image pair, and taking the better target sub-image as the current optimal target sub-image.
4. The method according to claim 3, wherein the target recognition model further includes a feature fusion module and a feature discrimination module, and the discriminating, according to the deep semantic features corresponding to the first sub-image and the second sub-image, the target morphology contained in the first sub-image and the second sub-image, and determining the first sub-image or the second sub-image whose target morphology meets a preset condition as a better target sub-image in the current image pair includes:
inputting the deep semantic features output by the first feature extraction module and the second feature extraction module into the feature fusion module to obtain fused feature vectors;
inputting the fused feature vector to the feature discrimination module, and outputting the probability that the target forms contained in the first sub-image and the second sub-image corresponding to the current image pair meet the preset condition through SoftMax calculation;
and judging the first sub-image or the second sub-image corresponding to the output larger probability value as a better target sub-image.
5. The method of claim 4, wherein the target image recognition model further comprises a first keypoint detection module and a second keypoint detection module, the method further comprising:
inputting the deep semantic features output by the first feature extraction module into the first key point detection module, inputting the deep semantic features output by the second feature extraction module into the second key point detection module, and respectively extracting key point target features to obtain key point targets contained in a first sub-image and a second sub-image corresponding to the current image pair;
correspondingly, the step of determining the first sub-image or the second sub-image corresponding to the output value with the larger probability as a better target sub-image comprises the following steps:
and combining the key point target characteristics with the probability to judge a better target sub-image in the current image pair.
6. The method of claim 3, wherein the target image recognition model further comprises a first keypoint detection module and a second keypoint detection module, the method further comprising:
inputting the deep semantic features output by the first feature extraction module into the first key point detection module, inputting the deep semantic features output by the second feature extraction module into the second key point detection module, and respectively extracting key point target features to obtain key point target features contained in a first sub-image and a second sub-image corresponding to the current image pair;
and combining the key point target features with the corresponding deep semantic features to judge a better target sub-image in the current image pair.
7. The method of claim 6, wherein the combining the keypoint target features with the corresponding deep semantic features to discriminate a superior target sub-image in a current image pair comprises:
and if the target forms contained in the first sub-image and the second sub-image corresponding to the current image pair both meet the preset condition, determining the target sub-image containing more key point target features as a better target sub-image in the current image pair.
8. An apparatus for identifying an optimal target image, the apparatus comprising:
the video image acquisition module is used for acquiring a video image sequence of a monitored target;
the subimage extraction module is used for extracting a target subimage of each frame of video image in the video image sequence to obtain a target subimage sequence;
and the optimal image recognition module is used for inputting the target sub-image sequence into a target image recognition model to obtain deep semantic features corresponding to the target sub-images, and combining and distinguishing the deep semantic features to output optimal target sub-images.
9. An electronic device comprising a processor, a memory and a computer program stored in the memory and executable on the processor, characterized in that the computer program is loaded and executed by the processor to implement the steps of the method for identifying an optimal target image according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method for identifying an optimal target image according to any one of claims 1 to 7.
CN202110318221.3A 2021-03-25 2021-03-25 Optimal target image identification method, device, equipment and storage medium Pending CN112801048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110318221.3A CN112801048A (en) 2021-03-25 2021-03-25 Optimal target image identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112801048A true CN112801048A (en) 2021-05-14

Family

ID=75817379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110318221.3A Pending CN112801048A (en) 2021-03-25 2021-03-25 Optimal target image identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112801048A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027347A (en) * 2018-10-09 2020-04-17 杭州海康威视数字技术股份有限公司 Video identification method and device and computer equipment
CN111222487A (en) * 2020-01-15 2020-06-02 浙江大学 Video target behavior identification method and electronic equipment
CN111553945A (en) * 2020-04-13 2020-08-18 东风柳州汽车有限公司 Vehicle positioning method
CN111881741A (en) * 2020-06-22 2020-11-03 浙江大华技术股份有限公司 License plate recognition method and device, computer equipment and computer-readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113873144A (en) * 2021-08-25 2021-12-31 浙江大华技术股份有限公司 Image capturing method, image capturing apparatus, and computer-readable storage medium
CN113873144B (en) * 2021-08-25 2023-03-24 浙江大华技术股份有限公司 Image capturing method, image capturing apparatus, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
Henry et al. Multinational license plate recognition using generalized character sequence detection
CN110427905B (en) Pedestrian tracking method, device and terminal
CN110795595B (en) Video structured storage method, device, equipment and medium based on edge calculation
CN110533950A (en) Detection method, device, electronic equipment and the storage medium of parking stall behaviour in service
CN108830254B (en) Fine-grained vehicle type detection and identification method based on data balance strategy and intensive attention network
CN112052815B (en) Behavior detection method and device and electronic equipment
CN111723773B (en) Method and device for detecting carryover, electronic equipment and readable storage medium
CN108197544B (en) Face analysis method, face filtering method, face analysis device, face filtering device, embedded equipment, medium and integrated circuit
CN107704797B (en) Real-time detection method, system and equipment based on pedestrians and vehicles in security video
CN109615904A (en) Parking management method, device, computer equipment and storage medium
CN112651293B (en) Video detection method for road illegal spreading event
CN112905824A (en) Target vehicle tracking method and device, computer equipment and storage medium
CN111078946A (en) Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation
CN112784724A (en) Vehicle lane change detection method, device, equipment and storage medium
CN113269091A (en) Personnel trajectory analysis method, equipment and medium for intelligent park
CN112507860A (en) Video annotation method, device, equipment and storage medium
CN111274886A (en) Deep learning-based pedestrian red light violation analysis method and system
CN113971821A (en) Driver information determination method and device, terminal device and storage medium
CN109872541A (en) A kind of information of vehicles analysis method and device
Jin et al. A deep-learning-based scheme for detecting driver cell-phone use
CN114049572A (en) Detection method for identifying small target
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN115620090A (en) Model training method, low-illumination target re-recognition method and device and terminal equipment
CN112801048A (en) Optimal target image identification method, device, equipment and storage medium
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
RJ01 — Rejection of invention patent application after publication (application publication date: 20210514)