CN114092739B - Image processing method, apparatus, device, storage medium, and program product - Google Patents


Info

Publication number
CN114092739B
CN114092739B (Application CN202111289604.9A)
Authority
CN
China
Prior art keywords
frames
frame
anchor
candidate
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111289604.9A
Other languages
Chinese (zh)
Other versions
CN114092739A (en)
Inventor
陈子亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111289604.9A priority Critical patent/CN114092739B/en
Publication of CN114092739A publication Critical patent/CN114092739A/en
Application granted granted Critical
Publication of CN114092739B publication Critical patent/CN114092739B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/24 Classification techniques
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/02 Neural networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The disclosure provides an image processing method, apparatus, device, storage medium, and program product, relating to the technical field of artificial intelligence, in particular to computer vision and deep learning, and applicable to scenarios such as image processing and image recognition. The specific implementation scheme is as follows: acquire a sample image containing at least one real frame; for each real frame, obtain K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame; then, according to the intersection ratios between the real frame and the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames, take N second candidate anchor frames out of the K first candidate anchor frames as positive sample anchor frames in the sample image, where N is smaller than K.

Description

Image processing method, apparatus, device, storage medium, and program product
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as image processing and image recognition.
Background
In recent years, with the development of computer software and hardware, the fields of artificial intelligence and machine learning have advanced greatly. Computer vision, which uses computers and related equipment to emulate biological vision, is an important part of artificial intelligence, and the technology is widely applied in scenarios such as image processing and image recognition.
Object detection is a research hotspot in computer vision and machine learning, and how to efficiently improve the detection effect and performance has become an important research direction.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring a sample image containing at least one real frame; for each of the at least one real frame, obtaining K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame; and, according to the intersection ratios between the real frame and the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames, taking N second candidate anchor frames out of the K first candidate anchor frames as positive sample anchor frames in the sample image, where N is smaller than K.
According to another aspect of the present disclosure, there is provided an image processing apparatus including a sample image acquisition module, a candidate anchor frame acquisition module, and a positive sample anchor frame determination module. The sample image acquisition module is used to acquire a sample image containing at least one real frame. The candidate anchor frame acquisition module is used to obtain, for each of the at least one real frame, K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame. The positive sample anchor frame determination module is used to take N second candidate anchor frames out of the K first candidate anchor frames, according to the intersection ratios between the real frame and the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames, as positive sample anchor frames in the sample image, where N is smaller than K.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 illustrates a system architecture suitable for the image processing methods and apparatus of embodiments of the present disclosure;
FIG. 2 illustrates a flow chart of an image processing method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an image processing method according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of an intersection ratio between an anchor block and a real block according to an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of determining positive and negative sample anchor blocks according to an embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of determining positive and negative sample anchor blocks according to another embodiment of the present disclosure;
fig. 7 exemplarily shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 8 illustrates a block diagram of an electronic device for implementing the image processing method and apparatus of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Object detection refers to a technique for identifying and classifying objects in an image using a deep learning model. It typically includes target identification, target classification, target localization, and the like. At present, object detection is widely applied in fields such as automatic driving and security monitoring. Before a deep learning model is used for object detection, a large number of image samples are usually required for model training, and the model can be put into use once it reaches a preset precision.
The image processing method provided by the embodiment of the disclosure can accurately distinguish positive and negative anchor blocks in the sample image, so that a high-quality training sample set can be provided for model optimization.
It should be appreciated that in the sample image, the anchor block may be a plurality of differently sized, differently aspect ratio bounding boxes generated centered at different pixels. The anchor block can also be understood as a priori block of different size, different aspect ratio, pre-set on the sample image.
In some embodiments, anchor-based sample division compares the Intersection over Union (IoU) between each anchor frame and the real frame corresponding to each target in the image. Anchor frames whose intersection ratio exceeds a fixed threshold are used as positive sample anchor frames, to optimize the classification of positive samples and the regression of the position, size, and shape of the target frame; anchor frames whose intersection ratio falls below the threshold are used as negative sample anchor frames, to optimize the classification of negative samples. It should be understood that the intersection ratio is the ratio of the intersection to the union of the anchor frame and the real frame, and represents the degree of overlap between them.
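As an illustration of the overlap measure described above, the intersection ratio between two boxes might be computed as in the following sketch (the (x1, y1, x2, y2) corner format and the function name are assumptions, since the patent does not fix a box representation):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) corner coordinates; this convention is an
    assumption for illustration only.
    """
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    inter = max(0.0, inter_w) * max(0.0, inter_h)  # clamp: disjoint boxes give 0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

The value is 1 for identical boxes and 0 for disjoint ones, matching the coincidence-ratio interpretation above.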
For example, a fixed intersection-ratio threshold such as 0.5 or 0.7 can be preset to distinguish positive sample anchor frames from negative ones.
In other embodiments, the anchor boxes of the positive and negative samples may also be distinguished based on the center region of the real box.
Because targets in an image may be occluded, deformed, or differ in shape, and neither of the two embodiments above accounts for these factors, the two implementations cannot accurately distinguish positive sample anchor frames from negative ones, and a model trained on such samples cannot deliver a good prediction effect.
The disclosure will be described in detail below with reference to the drawings and specific examples.
A system architecture suitable for the image processing method and apparatus of the embodiments of the present disclosure is presented below.
Fig. 1 illustrates a system architecture suitable for the image processing method and apparatus of the embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other environments or scenarios.
As shown in fig. 1, a system architecture 100 in an embodiment of the present disclosure may include: a terminal 101 for acquiring training samples, a terminal 102 for model training and a terminal 103 for target detection.
In the disclosed embodiments, the target detection model may be trained based on Fast R-CNN (a fast region-based convolutional network). It should be appreciated that the target recognition steps of Fast R-CNN include: determining a plurality of candidate frames in an image; inputting the whole image into a convolutional backbone network to obtain a feature map; finding the mapping region of each candidate frame on the feature map; inputting that region, as the convolutional feature of the candidate frame, into the spatial pyramid pooling layer and the layers behind it; judging with a classifier whether the features extracted from a candidate frame belong to a specific class; and further adjusting the positions of the candidate frames that do belong to a class with a bounding-box regressor.
In the disclosed embodiment, the terminal 101 may be configured to perform an image processing method to obtain a sample set for model training. The terminal 102 may perform a corresponding model training method according to the sample set obtained by the terminal 101 to implement a corresponding model training. The terminal 103 may perform object detection on the specified image based on the model obtained by the terminal 102.
It should be noted that, the image processing and the model training may be implemented on the same terminal or may be implemented on different terminals.
Terminals 101, 102 and 103 may be servers or a server cluster.
It should be understood that the number of terminals 101, 102, and 103 in fig. 1 is merely illustrative. There may be any number of terminals 101, 102, and 103, as desired for implementation.
According to an embodiment of the present disclosure, the present disclosure provides an image processing method. The image processing method of the embodiment of the present disclosure may be performed by the terminal 101 in fig. 1 that acquires a sample image.
Fig. 2 shows a flowchart of an image processing method 200 according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method 200 according to the embodiment of the present disclosure includes operations S210 to S230.
In operation S210, a sample image including at least one real frame is acquired.
In operation S220, for each of the at least one real frame, K first candidate anchor frames are acquired according to the intersection ratio between each anchor frame in the sample image and the real frame (hereinafter referred to as the first intersection ratio), where K is an integer.
In operation S230, according to the intersection ratios between the real frame and the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames (hereinafter referred to as the second intersection ratios), N second candidate anchor frames are taken out of the K first candidate anchor frames as positive sample anchor frames in the sample image, where N is an integer smaller than K.
With regard to operation S220, it should be understood that each sample image may contain a plurality of targets, each target corresponding to one real frame. Whether a sample image contains one real frame or several, the method provided by the embodiments of the present disclosure, for example operations S210 to S230, may be performed for each real frame to distinguish positive from negative sample anchor frames in the sample image.
In addition, in operation S220, an anchor frame can be understood as a priori frame with a given size and aspect ratio; the sizes and aspect ratios of anchor frames can be adjusted to accommodate different targets in the sample image. It should be appreciated that in the sample image the real frame corresponding to each target is fixed. For each real frame, the anchor frames with a higher intersection ratio with that real frame may be selected from the anchor frames in the sample image as first candidate anchor frames, from which the positive sample anchor frames are then chosen.
It should be appreciated that the higher the intersection ratio between an anchor frame and the real frame, the greater their overlap; and in general, the greater the overlap, the more of the target's effective area the anchor frame contains. Special cases are possible, however: an anchor frame may have a relatively high intersection ratio with the real frame while containing only a small effective area of the target.
To select higher-quality positive sample anchor frames, that is, anchor frames containing a larger effective area of the target, the embodiments of the present disclosure also introduce prediction frames. A prediction frame is obtained by prediction from its corresponding anchor frame. First, anchor frames with a higher intersection ratio to the real frame are screened out as first candidate anchor frames. A second screening is then performed on the first candidate anchor frames based on the intersection ratio between their corresponding prediction frames and the real frame: the anchor frames whose prediction frames have a relatively high intersection ratio with the real frame are selected as second candidate anchor frames and serve as the finally selected positive sample anchor frames. Since a prediction frame is predicted from its anchor frame, a higher intersection ratio between the prediction frame and the real frame means a greater overlap with the real frame, and a greater overlap means that the prediction frame contains more of the target's effective area. Selecting the anchor frames corresponding to such prediction frames as positive sample anchor frames therefore ensures that high-quality positive samples are obtained.
In the technical solution of the embodiments of the present disclosure, K first candidate anchor frames are selected from the anchor frames according to the first intersection ratio between each anchor frame and the real frame, and N second candidate anchor frames are then selected from the K first candidate anchor frames according to the second intersection ratio, to serve as positive sample anchor frames in the sample image. In other words, the positive sample anchor frames are chosen according to the effective overlap between the prediction frames and the real frame, which improves the accuracy of determining positive sample anchor frames. Moreover, the first intersection ratio, the second intersection ratio, the number of candidate anchor frames, and the number of positive sample anchor frames can all vary to adapt to different sample images.
Fig. 3 shows a schematic diagram of an image processing method 300 according to an embodiment of the disclosure.
As shown in fig. 3, the K first candidate anchor frames may include: the top-K anchor frames when all anchor frames are ranked in descending order of their intersection ratio with the real frame (referred to as the first intersection ratio, denoted IOU1).
According to this technical scheme, selecting the top-K anchor frames in descending order of the first intersection ratio preliminarily screens out the anchor frames with a higher overlap with the real frame, that is, the K first candidate anchor frames containing more of the target, which improves the accuracy of positive sample anchor frame selection.
For example, the intersection ratio between each anchor frame and the real frame may be computed in advance and stored as a matrix, where each element is the intersection ratio between the real frame and the anchor frame centered at the corresponding pixel position.
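This preliminary screening can be sketched as follows (hypothetical helpers, assuming anchors stored as an (A, 4) array of (x1, y1, x2, y2) corners and non-degenerate boxes; names are not from the patent):

```python
import numpy as np

def iou_matrix(anchors, gt):
    """Vectorized intersection ratio between each anchor (A, 4) and one real frame (4,)."""
    x1 = np.maximum(anchors[:, 0], gt[0])
    y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2])
    y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def top_k_candidates(anchors, gt, k):
    """Indices of the K anchors with the highest first intersection ratio (IOU1), descending."""
    iou1 = iou_matrix(anchors, gt)
    return np.argsort(-iou1)[:k], iou1
```

The returned indices are the K first candidate anchor frames for one real frame; the routine would be repeated for every real frame in the sample image.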
As shown in fig. 3, in operation S330, according to the intersection ratio between the real frame and each of the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames (referred to as the second intersection ratio, denoted IOU2), N second candidate anchor frames may be extracted from the K first candidate anchor frames. This may include operations S31 to S33.
In operation S31, the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames are determined.
In operation S32, the intersection ratio IOU2 between each of the K prediction frames and the real frame is acquired, and these values are summed to obtain a summation result.
In operation S33, N second candidate anchor frames are taken out of the K first candidate anchor frames based on the summation result.
In the technical solution of the embodiments of the present disclosure, the prediction frames may serve as a medium for selecting the positive sample anchor frames from the first candidate anchor frames. The summation result obtained in operation S32, that is, the sum of the intersection ratios IOU2 between each of the K prediction frames and the real frame, characterizes the overall overlap of the K prediction frames with the real frame and may be used as a reference for selecting the N second candidate anchor frames, improving the accuracy of positive sample anchor frame selection.
It should be understood that the intersection ratio between each prediction frame and the real frame lies between 0 and 1, so the sum of the IOU2 values over the K prediction frames lies between 0 and K.
As shown in fig. 3, in operation S33, extracting N second candidate anchor frames from the K first candidate anchor frames based on the summation result may include the following operations.
First, a calculation is performed based on the summation result and the number of first candidate anchor frames, yielding a calculation result whose value is N.
Second, the K prediction frames are ranked in descending order of their intersection ratio IOU2 with the real frame, giving an ordering of the K prediction frames.
Then, based on the calculation result, the anchor frames corresponding to the top-N prediction frames in this ordering are taken as the N second candidate anchor frames.
According to this technical scheme, using the descending order of IOU2 between each of the K prediction frames and the real frame as the selection criterion, a smaller number N of second candidate anchor frames are selected from the K candidates as positive sample anchor frames, which optimizes the selection of positive sample anchor frames.
As shown in fig. 3, the value N of the calculation result can be computed by the formula N = K + 1 - [M], where [M] denotes M rounded down to an integer, N is the number of second candidate anchor frames, K is the number of first candidate anchor frames, M is the value of the summation result, and 0 < M < K.
In the early stage of model training the parameters are not yet well optimized, M is small, and the number of second candidate anchor frames given by the formula is large (the value of N is large): in the early stage each real frame needs many anchor frames to optimize the model parameters. In the later stage the parameters are close to optimal, M is large, and the value of N is small: each real frame then needs only a few anchor frames to optimize the model, and, since the anchor frames matched to each real frame contain some noise, fewer anchor frames should be used than in the early stage.
In the technical solution of the embodiments of the present disclosure, through the mapping of this formula, the number of second candidate anchor frames is selected based on the actual state of the model in the earlier and later stages of training, making the selection of positive sample anchor frames more reasonable and accurate.
It should further be understood that the value of N must be a positive integer. Because the summation result M is the sum of the intersection ratios IOU2 between each of the K prediction frames and the real frame, its value lies in the interval (0, K) and may be fractional. Accordingly, M may first be rounded down and N then computed as N = K + 1 - [M]; equivalently, the whole expression may be rounded down, N = [K + 1 - M]. Either way, the value of N is guaranteed to be a positive integer.
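The dynamic selection described by this formula might look like the following sketch (function and variable names are hypothetical; boxes are assumed to be (x1, y1, x2, y2) corners, and `pred_boxes[i]` is assumed to be the model's current prediction for anchor i):

```python
import math
import numpy as np

def _iou(a, b):
    # Intersection ratio of two (x1, y1, x2, y2) boxes; the format is assumed.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_positives(cand_idx, pred_boxes, gt, k):
    """From the K first candidate anchors, keep the top-N ranked by IOU2,
    where N = [K + 1 - M] and M is the sum of the K IOU2 values."""
    iou2 = np.array([_iou(pred_boxes[i], gt) for i in cand_idx])
    m = float(iou2.sum())              # 0 < M < K, so N is at least 1
    n = math.floor(k + 1 - m)
    order = np.argsort(-iou2)          # descending IOU2
    return [int(cand_idx[j]) for j in order[:n]]
```

Because 0 < M < K, the computed N satisfies 1 <= N <= K, matching the behavior described above: an early, poorly optimized model (small M) yields many positives, a well optimized one (large M) yields few.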
Illustratively, the image processing method according to an embodiment of the present disclosure may further include: taking the anchor frames other than the positive sample anchor frames as negative sample anchor frames in the sample image.
It should be understood that the intersection ratio between each positive sample anchor frame selected by the above process and the real frame is larger than that of each negative sample anchor frame. A larger intersection ratio indicates a larger overlap between the anchor frame and the real frame, making the anchor frame more suitable for learning target features during training; the negative sample anchor frames can be used as negative samples for optimizing negative-sample classification.
It should also be understood that a negative sample anchor frame merely has a smaller intersection ratio with the real frame relative to the positive sample anchor frames; it does not mean that a negative sample anchor frame has no overlap with the real frame.
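The complementary split into positive and negative samples can be sketched as follows (a hypothetical helper over anchor indices, not a function named in the patent):

```python
def split_samples(num_anchors, positive_idx):
    """Anchors selected as positives keep that label; every other anchor
    in the sample image is treated as a negative sample anchor frame."""
    pos = set(positive_idx)
    negatives = [i for i in range(num_anchors) if i not in pos]
    return sorted(pos), negatives
```

The negatives are used only for negative-sample classification; as noted above, they may still partially overlap the real frame.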
The image processing method of the embodiment of the present disclosure will be described below by way of example.
Fig. 4 shows an example sample image with a person as the target, the corresponding real frame 40, and two anchor frames 41 and 42. In this image, the intersection ratio between anchor frame 41 and real frame 40 is higher than that between anchor frame 42 and real frame 40; yet, because of the person's shape, the effective overlap between anchor frame 42 and real frame 40 is in fact higher than that between anchor frame 41 and real frame 40.
It follows that if the intersection ratio between anchor frame and real frame is the sole criterion for distinguishing positive from negative sample anchor frames, it is difficult to obtain high-quality positive and negative samples in such special cases.
As shown in fig. 5, the sample image includes a plurality of anchor frames 51, with the intersection ratio between each anchor frame and the real frame 50 as indicated. In one embodiment, positive and negative sample anchor frames are distinguished with a fixed threshold, for example 0.5: anchor frames whose intersection ratio exceeds 0.5 (such as the anchor frame with an intersection ratio of 0.52) are taken as positive sample anchor frames, and the rest as negative sample anchor frames.
As shown in fig. 6, the sample image includes a plurality of anchor frames 61. In this embodiment, positive and negative sample anchor frames are distinguished based on the central area of the real frame 60 (the gray area in the figure): the anchor frames falling in that central area (shown in gray) are taken as positive sample anchor frames, and the rest as negative sample anchor frames.
Compared with the embodiments shown in fig. 5 and fig. 6, the technical solution of the embodiments of the present disclosure first selects K first candidate anchor frames according to the first intersection ratio between each anchor frame and the real frame, then performs a second screening according to the second intersection ratio between the real frame and the prediction frames corresponding one-to-one to the K first candidate anchor frames, obtaining N second candidate anchor frames, and finally selects a reasonable number of them as positive sample anchor frames. The selection of positive and negative sample anchor frames is thus made from a more comprehensive perspective. In particular, the number of positive sample anchor frames can vary with each sample image; the selection principle is not limited to a fixed intersection-ratio value or the central area of the real frame, but dynamically selects a reasonable number of positive sample anchor frames according to the intersection ratios between the prediction frames and the real frame and the mapping that yields the value N. As a result, the number of positive sample anchor frames is more reasonable: anchor frames that coincide with the real frame containing the target are not omitted, and anchor frames that do not coincide with it are not taken as positive sample anchor frames.
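Putting the two screenings together, an end-to-end sketch of the dynamic assignment might read as follows (names and array layouts are assumptions; `preds` is assumed to hold one predicted box per anchor, and all boxes are (x1, y1, x2, y2) corners):

```python
import math
import numpy as np

def iou_1_to_many(boxes, gt):
    # Vectorized intersection ratio between boxes (A, 4) and one real frame (4,).
    ix = np.clip(np.minimum(boxes[:, 2], gt[2]) - np.maximum(boxes[:, 0], gt[0]), 0, None)
    iy = np.clip(np.minimum(boxes[:, 3], gt[3]) - np.maximum(boxes[:, 1], gt[1]), 0, None)
    inter = ix * iy
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (areas + gt_area - inter)

def assign_samples(anchors, preds, gt, k):
    """Top-K anchors by IOU1, then top-N of those by IOU2 with N = [K + 1 - M];
    all remaining anchors become negative samples."""
    iou1 = iou_1_to_many(anchors, gt)
    cand = np.argsort(-iou1)[:k]                # K first candidate anchor frames
    iou2 = iou_1_to_many(preds[cand], gt)       # predictions for those anchors
    n = math.floor(k + 1 - float(iou2.sum()))   # dynamic N
    pos = cand[np.argsort(-iou2)[:n]]
    neg = np.setdiff1d(np.arange(len(anchors)), pos)
    return pos, neg
```

With several real frames, this routine would be run per real frame and the per-frame positives merged; that merging policy is not specified here and is left out of the sketch.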
According to an embodiment of the present disclosure, the present disclosure also provides an image processing apparatus, applied to a sample image containing at least one real frame.
as shown in fig. 7, the image processing apparatus 700 of the embodiment of the present disclosure includes a sample image acquisition module 710, a candidate anchor block acquisition module 720, and a positive sample anchor block determination module 730.
The sample image acquisition module 710 may be used to acquire a sample image containing at least one real frame. In an embodiment, the sample image obtaining module 710 may perform the above operation S210, which is not described herein.
The candidate anchor frame acquisition module 720 may be configured to obtain, for each real frame of the at least one real frame, K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame. In an embodiment, the candidate anchor frame acquisition module 720 may perform the above operation S220, which is not described herein.
The positive sample anchor frame determination module 730 may be configured to extract, according to the intersection ratios between the real frame and the K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames, N second candidate anchor frames from the K first candidate anchor frames to serve as positive sample anchor frames in the sample image, where N is smaller than K. In an embodiment, the positive sample anchor frame determination module 730 may perform the above operation S230, which is not described herein.
According to an image processing apparatus of an embodiment of the present disclosure, the K first candidate anchor frames include the top K anchor frames obtained by arranging the anchor frames in descending order of the intersection ratio between each anchor frame and the real frame.
According to an image processing apparatus of an embodiment of the present disclosure, the positive sample anchor frame determination module includes a prediction frame determination sub-module, a calculation sub-module, and a candidate anchor frame selection sub-module. The prediction frame determination sub-module may be configured to determine K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames. The calculation sub-module may be configured to obtain the intersection ratio between each of the K prediction frames and the real frame, and to sum the obtained intersection ratios to obtain an intersection-ratio sum. The candidate anchor frame selection sub-module may be configured to extract N second candidate anchor frames from the K first candidate anchor frames based on the intersection-ratio sum.
According to an embodiment of the present disclosure, the candidate anchor frame selection sub-module may include a calculation unit, a prediction frame determination unit, and a candidate anchor frame determination unit. The calculation unit may be configured to calculate, based on the intersection-ratio sum and the number of first candidate anchor frames, a corresponding calculation result whose value is N. The prediction frame determination unit may be configured to determine, among the K prediction frames, the top N prediction frames in descending order of the intersection ratio between each prediction frame and the real frame. The candidate anchor frame determination unit may be configured to take, based on the calculation result, the anchor frames corresponding one-to-one to the top N prediction frames as the N second candidate anchor frames.
According to the image processing apparatus of the embodiment of the present disclosure, the calculation unit calculates the value N of the calculation result by the formula N = K + 1 - [M], where K represents the number of first candidate anchor frames, M represents the value of the intersection-ratio sum with 0 < M < K, and [M] denotes M rounded to an integer.
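A worked instance of this formula, under the assumption that [M] denotes the floor (integer part) of M:

```python
import math

def num_positive_samples(k, m):
    """N = K + 1 - [M]; [M] is taken here as floor(M), an assumption
    about the bracket notation in the formula above."""
    assert 0 < m < k, "the formula presumes 0 < M < K"
    return k + 1 - math.floor(m)
```

For example, with K = 9 first candidate anchor frames and an intersection-ratio sum M = 3.7, the formula gives N = 9 + 1 - 3 = 7, so seven second candidate anchor frames are selected; a larger M yields a smaller N.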
The image processing apparatus according to the embodiment of the present disclosure may further include a negative sample anchor frame determination module, which may be configured to take the anchor frames other than the positive sample anchor frames as negative sample anchor frames in the sample image.
It should be understood that the embodiments of the apparatus portion of the present disclosure correspond to the same or similar embodiments of the method portion of the present disclosure, and the technical problems to be solved and the technical effects to be achieved also correspond to the same or similar embodiments, which are not described herein in detail.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the respective methods and processes described above, for example, the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
In the technical solution of the present disclosure, the recording, storage, and application of the sample image data all comply with relevant laws and regulations, and do not violate public order and good morals.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. An image processing method, comprising:
acquiring a sample image containing at least one real frame;
for each real frame in the at least one real frame, obtaining K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame;
determining K prediction frames in the sample image, wherein the K prediction frames are in one-to-one correspondence with the K first candidate anchor frames;
acquiring the intersection ratio between each of the K prediction frames and the real frame, and summing the acquired intersection ratios to obtain an intersection-ratio sum;
and taking, in descending order of the intersection ratio between each of the K prediction frames and the real frame, the anchor frames corresponding one-to-one to the top N prediction frames as N second candidate anchor frames, and taking the N second candidate anchor frames as positive sample anchor frames in the sample image, wherein N is determined using the intersection-ratio sum, and N is smaller than K.
2. The method of claim 1, wherein the K first candidate anchor frames comprise: the top K anchor frames arranged in descending order of the intersection ratio between each anchor frame and the real frame.
3. The method of claim 1, wherein extracting N second candidate anchor frames from the K first candidate anchor frames based on the intersection-ratio sum comprises:
calculating, based on the intersection-ratio sum and the number of first candidate anchor frames, a corresponding calculation result, wherein the value of the calculation result is N;
arranging the K prediction frames in descending order of the intersection ratio between each prediction frame and the real frame to obtain an ordering of the K prediction frames; and
taking, based on the calculation result, the anchor frames corresponding to the top N prediction frames as the N second candidate anchor frames.
4. The method according to claim 3, wherein the value N of the calculation result is calculated by the following formula:
N=K+1-[M];
wherein K represents the number of first candidate anchor frames, M represents the value of the intersection-ratio sum, and 0 < M < K.
5. The method of any one of claims 1 to 4, further comprising:
taking the anchor frames other than the positive sample anchor frames as negative sample anchor frames in the sample image.
6. An image processing apparatus comprising:
a sample image acquisition module for acquiring a sample image containing at least one real frame;
a candidate anchor frame acquisition module, configured to obtain, for each real frame in the at least one real frame, K first candidate anchor frames according to the intersection ratio between each anchor frame in the sample image and the real frame; and
a positive sample anchor block determination module, the positive sample anchor block determination module comprising:
a prediction frame determination sub-module, configured to determine K prediction frames in the sample image that correspond one-to-one to the K first candidate anchor frames;
a calculation sub-module, configured to acquire the intersection ratio between each of the K prediction frames and the real frame, and to sum the acquired intersection ratios to obtain an intersection-ratio sum; and
a candidate anchor frame selection sub-module, configured to take, in descending order of the intersection ratio between each of the K prediction frames and the real frame, the anchor frames corresponding one-to-one to the top N prediction frames as N second candidate anchor frames, and to take the N second candidate anchor frames as positive sample anchor frames in the sample image, wherein N is determined using the intersection-ratio sum and is smaller than K.
7. The apparatus of claim 6, wherein the K first candidate anchor frames comprise: the top K anchor frames arranged in descending order of the intersection ratio between each anchor frame and the real frame.
8. The apparatus of claim 6, wherein the candidate anchor selection submodule comprises:
a calculation unit, configured to calculate, based on the intersection-ratio sum and the number of first candidate anchor frames, a corresponding calculation result, wherein the value of the calculation result is N;
a prediction frame determination unit, configured to determine, among the K prediction frames, the top N prediction frames in descending order of the intersection ratio between each prediction frame and the real frame; and
a candidate anchor frame determination unit, configured to take, based on the calculation result, the anchor frames corresponding one-to-one to the top N prediction frames as the N second candidate anchor frames.
9. The apparatus according to claim 8, wherein the calculation unit calculates the value N of the calculation result by the following formula:
N=K+1-[M];
wherein K represents the number of first candidate anchor frames, M represents the value of the intersection-ratio sum, and 0 < M < K.
10. The apparatus of any of claims 6 to 9, further comprising:
a negative sample anchor frame determination module, configured to take the anchor frames other than the positive sample anchor frames as negative sample anchor frames in the sample image.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202111289604.9A 2021-11-02 2021-11-02 Image processing method, apparatus, device, storage medium, and program product Active CN114092739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111289604.9A CN114092739B (en) 2021-11-02 2021-11-02 Image processing method, apparatus, device, storage medium, and program product


Publications (2)

Publication Number Publication Date
CN114092739A CN114092739A (en) 2022-02-25
CN114092739B true CN114092739B (en) 2023-06-30

Family

ID=80298607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111289604.9A Active CN114092739B (en) 2021-11-02 2021-11-02 Image processing method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN114092739B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368769A (en) * 2020-03-10 2020-07-03 大连东软信息学院 Ship multi-target detection method based on improved anchor point frame generation model
CN113052217A (en) * 2021-03-15 2021-06-29 上海云从汇临人工智能科技有限公司 Prediction result identification and model training method and device thereof, and computer storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443212B (en) * 2019-08-12 2022-03-11 睿魔智能科技(深圳)有限公司 Positive sample acquisition method, device, equipment and storage medium for target detection
CN112183667B (en) * 2020-10-31 2022-06-14 哈尔滨理工大学 Insulator fault detection method in cooperation with deep learning
CN113313128B (en) * 2021-06-02 2022-10-28 东南大学 SAR image target detection method based on improved YOLOv3 network




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant