WO2020042126A1 - Focusing apparatus, method and related device - Google Patents


Info

Publication number
WO2020042126A1
WO2020042126A1 (PCT/CN2018/103370; CN 2018103370 W)
Authority
WO
WIPO (PCT)
Prior art keywords
roi
image
target
effective
information
Prior art date
Application number
PCT/CN2018/103370
Other languages
French (fr)
Chinese (zh)
Inventor
马彦鹏 (Ma Yanpeng)
宋永福 (Song Yongfu)
杨琪 (Yang Qi)
王军 (Wang Jun)
陈聪 (Chen Cong)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880096896.4A priority Critical patent/CN112602319B/en
Priority to PCT/CN2018/103370 priority patent/WO2020042126A1/en
Publication of WO2020042126A1 publication Critical patent/WO2020042126A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/02Constructional features of telephone sets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • the present application relates to the field of image processing technologies, and in particular, to a focusing device, method, and related equipment.
  • Smartphone camera photography technology is moving toward SLR-level quality.
  • SLR: single-lens reflex camera.
  • Many smartphone cameras have already surpassed traditional compact cameras in imaging capability.
  • High-quality photography relies on high-precision focusing technology.
  • In the shooting of static scenes, existing focusing technology generally places the focus point at the center of the frame. This focusing method meets the needs of most consumers.
  • When the shooting target is not at the center, however, center focusing will often cause the target to be blurred.
  • When shooting dynamic scenes, especially when the target is moving fast, this fixed center focus cannot meet users' needs, so there is an urgent need for high-precision motion-tracking focus technology.
  • Embodiments of the present invention provide a focusing device, method, and related equipment to improve focusing accuracy.
  • an embodiment of the present invention provides a focusing device, including a processor, a neural network processor and an image signal processor coupled to the processor; the image signal processor is configured to generate a first image
  • The neural network processor is configured to obtain a first region of interest (ROI) set in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes one shooting object. The processor is configured to: obtain a second ROI set in the first image, where the second ROI set includes one or more second ROIs, and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; identify, according to the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, where the first image is located before the second image in the time domain; and perform focusing according to the position information and size information.
  • One or more candidate shooting objects are obtained by using the NPU to perform AI object detection on image frames generated by the ISP in the focusing device, and one or more candidate motion areas are obtained by using the processor to perform moving-object detection.
  • The detected shooting objects and motion areas are then combined to determine the target ROI to be focused, and subsequent tracking and focusing are performed based on the characteristic information of the target ROI. That is, AI target detection and moving-target detection are used to automatically identify the target ROI in the field of view (FOV); a target-ROI tracking algorithm then accurately calculates the real-time motion trajectory and size of the target ROI, and finally the autofocus (AF) algorithm follows the calculated movement track to perform motion focus tracking.
  • The entire process requires no manual intervention by the user, and the tracking focus is accurate, which greatly improves the shooting experience and results.
  • The processor is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI is within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI is within a second preset region of the first image; and, in a case where the intersection-over-union of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.
  • The first ROI set and the second ROI set are filtered to improve the recognition accuracy of the target ROI. When the overlapping area between the effective first ROI and the effective second ROI is large, it indicates that the subject detection and the motion-area detection at this time likely both cover the effective first ROI, so the effective first ROI can be used as the target ROI.
  • The processor is further specifically configured to: when the intersection-over-union of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • When the overlapping area between the effective first ROI and the effective second ROI is small, this may indicate that the detection at this time is incorrect or that the target ROI is drifting, so the ROI closer to the center point may be selected as the target ROI.
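The fusion rule described above can be sketched in a few lines. This is an illustrative sketch only: boxes are assumed to be axis-aligned tuples (x, y, w, h), and the 0.5 threshold and all function names are placeholders, not values fixed by this application.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def dist_to_center(box, center):
    """Distance from a box's center to the image center point."""
    x, y, w, h = box
    return ((x + w / 2 - center[0]) ** 2 + (y + h / 2 - center[1]) ** 2) ** 0.5

def pick_target_roi(eff_first, eff_second, image_center, threshold=0.5):
    """IoU >= threshold: trust the detected subject box; otherwise fall
    back to whichever ROI lies closer to the image center."""
    if iou(eff_first, eff_second) >= threshold:
        return eff_first
    return min(eff_first, eff_second,
               key=lambda b: dist_to_center(b, image_center))
```

For example, two boxes that barely overlap fall through to the center-distance tie-break, while a well-overlapping pair returns the subject-detection box.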
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • When multiple candidate ROIs still remain after the processor filters by the preset region, they are evaluated by the area of the ROI, the distance from the center point of the first image, and the priority of the category to which the subject belongs, and the ROI most likely to warrant tracking and focusing is selected.
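As a rough illustration of such an evaluation score, the sketch below combines the three factors linearly. The weights, the linear form, and all names are assumptions for illustration, not the application's actual scoring rule.

```python
def evaluation_score(box, image_center, category_priority,
                     w_area=1.0, w_dist=1.0, w_prio=1.0):
    """Toy score: grows with ROI area and category priority, shrinks as
    the ROI's center moves away from the image center."""
    x, y, w, h = box
    area = w * h
    cx, cy = x + w / 2, y + h / 2
    dist = ((cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2) ** 0.5
    return w_area * area + w_prio * category_priority - w_dist * dist

def best_roi(rois, image_center, priorities):
    """rois: list of (box, category); priorities: category -> priority."""
    return max(rois,
               key=lambda rc: evaluation_score(rc[0], image_center,
                                               priorities[rc[1]]))[0]
```

A large, centered, high-priority ROI (e.g., a person's face) thus wins over a small, off-center, low-priority one.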
  • the processor is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • the processor not only needs to determine the initial value of the target ROI, but also needs to update the feature information in real time based on the motion tracking situation of the target ROI to more accurately track the focus.
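The update rule itself is not specified here; a common choice in tracking pipelines is a running exponential average of the template features, sketched below. The linear-interpolation form and the 0.02 learning rate are assumptions, not taken from this application.

```python
def update_features(current, new, rate=0.02):
    """Blend newly extracted ROI features into the running template.
    `rate` controls how quickly old appearance information is forgotten."""
    return [(1 - rate) * c + rate * n for c, n in zip(current, new)]

# Track a feature template across a few frames (toy 3-element features).
template = [1.0, 0.0, 0.5]
for frame_features in ([1.0, 0.2, 0.5], [0.9, 0.3, 0.6]):
    template = update_features(template, frame_features)
```

A small rate keeps the template stable against occlusion; a large rate adapts faster to appearance change.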
  • The processor is further configured to: recalculate the target ROI after a first preset time period; or recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The processor not only needs to update the feature information in real time based on the tracking situation of the target ROI so as to track the focus more accurately, but the updated feature information also needs to remain timely.
  • When the confidence level of the target ROI is low, it is necessary to consider reinitializing the related parameters to perform a new round of target-ROI confirmation and tracking.
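The two re-initialization triggers can be captured in one predicate. The 2-second age limit and 0.3 confidence threshold are illustrative values, not taken from this application.

```python
def should_redetect(elapsed_s, confidence, max_age_s=2.0, conf_threshold=0.3):
    """Re-run the full target-ROI determination when the current template
    is stale (older than max_age_s) or tracking confidence has collapsed."""
    return elapsed_s >= max_age_s or confidence < conf_threshold
```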
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • the embodiments of the present invention provide multiple extraction methods of feature information to meet the requirements for extracting feature information in different images or different scenes.
  • an embodiment of the present invention provides a focusing method, which may include:
  • determining a first ROI set and a second ROI set, the first ROI set being an ROI set obtained from a first image generated by an image signal processor and including one or more first ROIs, each first ROI including a photographic subject; the second ROI set being an ROI set obtained from the first image and including one or more second ROIs, each second ROI being a motion region; determining a target ROI in the first image based on the first ROI set and the second ROI set; determining characteristic information of the target ROI; identifying, based on the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image being located before the second image in the time domain; and focusing based on the position information and size information.
  • The determining a target ROI in the first image based on the first ROI set and the second ROI set includes: determining an effective first ROI from the one or more first ROIs in the first ROI set, the effective first ROI being within a first preset region of the first image; determining an effective second ROI from the one or more second ROIs in the second ROI set, the effective second ROI being within a second preset region of the first image; and, when the intersection-over-union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determining the effective first ROI as the target ROI.
  • The method further includes: when the intersection-over-union (IoU) of the effective first ROI and the effective second ROI is less than the preset threshold, determining, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • the method further includes: updating the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • The method further includes: recalculating the target ROI after a first preset time period; or recalculating the target ROI when the tracking confidence is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • an embodiment of the present invention provides a focusing device, which may include:
  • a first processing unit, configured to determine a first ROI set and a second ROI set, where the first ROI set is an ROI set obtained from a first image generated by an image signal processor and includes one or more first ROIs, each of which includes a photographic subject, and the second ROI set is an ROI set obtained from the first image and includes one or more second ROIs, each of which is a motion region; a second processing unit, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set; a third processing unit, configured to determine feature information of the target ROI; a recognition unit, configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the feature information of the target ROI, where the first image is located before the second image in the time domain; and a focusing unit, configured to focus according to the position information and size information.
  • The second processing unit is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI is within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI is within a second preset region of the first image; and, in a case where the intersection-over-union of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.
  • The second processing unit is further configured to: when the intersection-over-union of the effective second ROI and the effective first ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • the third processing unit is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • the apparatus further includes:
  • a first initialization unit, configured to recalculate the target ROI after a first preset time period; and
  • a second initialization unit, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • an embodiment of the present invention provides an electronic device, including an image sensor and the focusing device according to any one of the foregoing first aspects; wherein
  • the image sensor is used to collect image data
  • the image signal processor is configured to generate the first image based on the image data.
  • the electronic device further includes: a memory for storing program instructions; and the program instructions are executed by the processor.
  • the present application provides a focusing device having the function of implementing any of the above-mentioned focusing methods.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a terminal.
  • the terminal includes a processor, and the processor is configured to support the terminal to perform a corresponding function in a focusing method provided in the second aspect.
  • the terminal may further include a memory, which is used for coupling with the processor, and stores the program instructions and data necessary for the terminal.
  • the terminal may further include a communication interface for the terminal to communicate with other devices or a communication network.
  • the present application provides a computer storage medium that stores a computer program that, when executed by a processor, implements the focusing method flow described in any one of the second aspects.
  • an embodiment of the present invention provides a computer program.
  • the computer program includes instructions.
  • When the computer program is executed, the focusing method process according to any one of the second aspects can be performed.
  • the present application provides a chip system that includes a processor, and is configured to implement functions involved in the focusing method process in any one of the foregoing second aspects.
  • the chip system further includes a memory, and the memory is configured to store program instructions and data necessary for the focusing method.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a functional principle of a focusing device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an SSD network implementation process provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of screening a target ROI provided by an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of determining a target ROI according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a target ROI tracking process according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of target ROI tracking provided by an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention.
  • FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
  • FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
  • an embodiment herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive with other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and / or a computer.
  • an application running on a computing device and a computing device can be components.
  • One or more components can reside within a process and / or thread of execution, and a component can be localized on one computer and / or distributed between 2 or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may, for example, communicate via local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
  • ROI Region of interest
  • In image processing, the area to be processed is outlined from the image in the form of a box, circle, ellipse, or irregular polygon; this area is called the region of interest.
  • AI Artificial Intelligence
  • AI is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic theories of AI.
  • Convolutional Neural Network is a multi-layer neural network. Each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. The neurons share weights, and the number of parameters in the neural network can be reduced by weight sharing.
  • a processor performing a convolution operation usually converts a convolution of an input signal feature and a weight into a matrix multiplication operation between a signal matrix and a weight matrix.
  • The signal matrix and the weight matrix are divided into blocks to obtain multiple fractal signal matrices and fractal weight matrices, and matrix multiplication and accumulation are then performed on these fractal signal matrices and fractal weight matrices.
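The conversion of a convolution into a matrix multiplication described above is commonly implemented via im2col; the sketch below shows the idea for a single-channel image in the cross-correlation form used by most CNN frameworks. Block partitioning into fractal sub-matrices is omitted for brevity, and all names here are illustrative.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image (nested lists) into a row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d_as_matmul(image, kernel):
    """Valid 2-D convolution expressed as (patch matrix) x (flat kernel)."""
    k = len(kernel)
    flat = [kernel[di][dj] for di in range(k) for dj in range(k)]
    return [sum(p * w for p, w in zip(row, flat)) for row in im2col(image, k)]
```

Each output element is one row of the patch matrix dotted with the flattened weights, which is exactly the matrix-multiplication form a processor can accelerate.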
  • Image Signal Processor (ISP): a unit mainly used to process the output signal of the front-end image sensor so as to match image sensors from different manufacturers.
  • In effect, an image processor for cameras.
  • Its pipelined image-processing engine can process image signals at high speed, and it is also equipped with dedicated circuits for the evaluation of auto exposure / auto focus / auto white balance.
  • Intersection-over-union (IoU), a concept used in object detection, is the overlap rate between a generated candidate box and the ground-truth box, that is, the ratio of their intersection to their union. Ideally the boxes overlap completely, i.e., the ratio is 1.
  • a fixed center position is set in advance as the focus area.
  • the AF algorithm needs to reconfigure the focus point, which lengthens the focusing time and the user's photo taking time.
  • the focus cannot follow the target movement in real time.
  • Focus-tracking method based on feature-point detection: this method detects the feature points in the picture in real time and then sets the focus on those feature points.
  • Target-tracking method based on motion detection: through content changes between consecutive frames, moving objects in the shooting scene are quickly identified and the moving area is output to the AF algorithm in real time; the focus point is then adjusted to the moving area in real time to achieve moving-target focus tracking.
  • an artificial intelligence servo autofocus function is implemented in the prior art.
  • In a high-speed continuous focusing mode for a moving subject, half-pressing the shutter captures the subject in the viewfinder and detects its movement track.
  • the built-in autofocus sensor in the SLR can identify whether the object is stationary or moving, and identify its moving direction, so that it can achieve accurate focus when shooting sports, children or animals.
  • the problems and application scenarios that the embodiments of the present invention mainly solve include the following:
  • AI object detection algorithm is used to detect the main object in the picture, and then the main object area is input to the target tracking algorithm to monitor the status of the target in real time
  • the AF algorithm directly sets the focus on the main target object to stabilize the focus.
  • the tracking algorithm will follow the target's movement in real time, and the AF algorithm will do the tracking focus in real time.
  • The AI object detection algorithm, combined with the moving-target detection algorithm, comprehensively outputs the main object in the current picture, and the target tracking algorithm then monitors the position, area, and size of the output moving target in real time, to solve problems such as misidentification of the moving target, lack of smoothness, unstable target tracking, and discontinuous focus.
  • FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention.
  • The focusing device 10 may include a processor 101, and a neural network processor 102 and an image signal processor 103 coupled to the processor 101; wherein,
  • the Image Signal Processor (ISP) 103 is used to generate the first image; it can match image sensors from different manufacturers, processing the image data output by the front-end image sensor and generating corresponding image signals based on that data.
  • A neural network processor (Neural Processing Unit, NPU) 102 is configured to obtain a first region of interest (ROI) set in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes a subject.
  • the subject can be any object, such as a person, an animal, a building, a plant, etc.
  • For example, if the neural network processor 102 recognizes that there are a flower, a person, and a dog in the first image, the first ROI set includes three first ROIs, corresponding to the plant, the person, and the animal.
  • FIG. 2 is a schematic diagram of a first image provided by an embodiment of the present invention.
  • The NPU recognizes a human face (area 1), a dog face (area 3), a flower (area 4), and a table (area 5) as first ROIs.
  • A processor (Central Processing Unit, CPU) 101 is configured to: obtain a second ROI set in the first image; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; identify position information and size information of the target ROI in the second image generated by the image signal processor 103 according to the characteristic information of the target ROI; and focus according to the position information and size information.
  • The second ROI set includes one or more second ROIs, and each second ROI is a motion region. For example, if a puppy is found to be moving, based on one or more frames before the first image together with the first image, then the area where the puppy is located in the first image is determined as a second ROI.
  • the first image is located before the second image in the time domain, that is, the feature information of the target ROI determined by integrating AI recognition and motion detection in the previously collected and generated image is used as a basis for subsequent tracking of the target ROI.
  • It can be understood that if no object movement is detected in the first image, the second ROI set may also be an empty set, which is equivalent to a static shooting scene.
  • In FIG. 2, the CPU detects through motion detection that the person is moving, and thus recognizes region 2, where the person is located, as a motion region, that is, a second ROI.
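Moving-object detection itself is not detailed in this passage; a minimal frame-differencing sketch, with grayscale frames as nested lists and an assumed change threshold, might look like the following.

```python
def motion_roi(prev, curr, threshold=25):
    """Return the bounding box (x, y, w, h) of pixels that changed by more
    than `threshold` between two frames, or None if nothing moved."""
    changed = [(x, y)
               for y, (row_p, row_c) in enumerate(zip(prev, curr))
               for x, (p, c) in enumerate(zip(row_p, row_c))
               if abs(c - p) > threshold]
    if not changed:
        return None  # static scene: the second ROI set is empty
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

Real implementations typically add background modeling and morphological filtering; this sketch only shows the core differencing idea behind a second ROI.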
  • The processor 101 is further configured to, for example, run general operating system software and, under that software, control the neural network processor 102 and the image signal processor 103 to perform focusing.
  • the first image generated by the image signal processor 103 is sent to the neural network processor 102 to obtain a first ROI set, and the first ROI set obtained by the neural network processor 102 is received.
  • the processor 101 is further configured to complete calculation processing and control related to the focusing process.
  • The aforementioned neural network processor may also be integrated in the processor 101 as a part of it, or may be another functional chip coupled to the processor 101 and capable of obtaining the first ROI set; similarly, the functions performed by the processor 101 may be distributed across multiple different functional chips, which is not specifically limited in the embodiments of the present invention.
  • FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention
  • FIG. 4 is a functional principle schematic diagram of a focusing device according to an embodiment of the present invention.
  • The focusing device 10 may include a processor 101, a neural network processor 102 and an image signal processor 103 coupled to the processor 101, and a lens 104, an image sensor 105, and a focus motor, such as a Voice Coil Motor (VCM), 106 coupled to the image signal processor 103.
  • the lens 104 is configured to focus the optical information of the real world on the image sensor through the principle of optical imaging.
  • the lens 104 may be a rear camera, a front camera, a rotary camera, etc. of a terminal (such as a smart phone).
  • The image sensor 105 is configured to output image data based on the optical information collected by the lens 104, so as to provide the image data to the image signal processor 103 to generate a corresponding image signal.
  • the focus motor 106 may include a mechanical structure for performing static or dynamic focusing based on the position information and size information of the target ROI determined by the processor 101. For example, if the processor 101 recognizes that the target ROI is in a stationary state, the processor 101 controls the focus motor 106 to perform static focusing; if the processor 101 recognizes that the target ROI is in a moving state, the processor 101 controls the focus motor 106 to perform dynamic focusing .
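The static/dynamic decision above can be sketched as a displacement test on the ROI center between frames; the pixel tolerance `eps` and the function name are assumed values for illustration.

```python
def focus_mode(prev_box, curr_box, eps=2.0):
    """Classify the target ROI as stationary or moving by the displacement
    of its center between two frames; boxes are (x, y, w, h)."""
    px, py = prev_box[0] + prev_box[2] / 2, prev_box[1] + prev_box[3] / 2
    cx, cy = curr_box[0] + curr_box[2] / 2, curr_box[1] + curr_box[3] / 2
    moved = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 > eps
    return "dynamic" if moved else "static"
```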
• the focusing device in FIG. 1 or FIG. 3 may be located in a terminal (such as a smart phone, a tablet, or a smart wearable device), a smart camera device (such as a smart camera or a smart tracking device), a smart monitoring device, an aerial drone, and so on; this application will not list them one by one.
• in the focusing device of FIG. 1 or FIG. 3 described above, one or more candidate shooting objects are obtained by the NPU through AI object detection on the image frames generated by the ISP, and one or more candidate motion regions are obtained by the processor through moving object detection. The detected shooting objects and motion regions are then combined to determine the target ROI to be finally focused, and subsequent tracking and focusing are performed based on the characteristic information of the target ROI. That is, AI object detection and moving object detection are used to automatically and comprehensively identify the target ROI in the field of view (FOV), and the target ROI tracking algorithm is then used to accurately calculate the real-time motion trajectory and size of the target ROI. The auto-focus (AF) algorithm performs motion follow-focus based on the real-time motion trajectory of the target ROI. The entire process requires no manual intervention by the user and the tracking focus is accurate, which greatly improves the shooting experience and effect.
• the neural network processor 102 obtains the first ROI set in the first image; a specific implementation may be as follows:
• the neural network processor 102 uses an AI object detection algorithm to obtain the target objects in the picture (the first image), that is, the first ROIs. The algorithm uses a general structure (such as the first few layers of resnet18, resnet26, etc.) as the base network, and then adds other layers on this basis as the detection structure.
• the classification base model extracts the low-level features of the image and ensures that they are discriminative; adding a classifier over shallow features helps improve classification performance.
  • the detection part makes it possible to output a series of discretized bounding boxes on feature maps at different levels and the probability that each box contains an object instance. Finally, a non-maximum suppression (NMS) algorithm is performed to obtain the final object prediction result.
  • the detection model algorithm may adopt a single shot detection (SSD) framework. Please refer to FIG. 5.
• the main body adopts a one-stage detection structure, which avoids feeding a large number of candidate target positions into a second stage as in faster-rcnn, thereby greatly improving the detection speed.
• features at each layer have different receptive fields, so the detector can adapt to targets of different sizes and achieve better performance.
• the default boxes determine the initial position of the final prediction box. Through different sizes and aspect ratios, they can adapt to main objects of different sizes and shapes, and provide a good initial value that makes the prediction more accurate.
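For context on the NMS step used to obtain the final object prediction results, a minimal greedy non-maximum suppression sketch (the standard formulation, not necessarily the patent's exact implementation; the 0.5 threshold is an assumed value):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Returns the indices of the boxes that are kept."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]   # drop heavily overlapping boxes
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The two overlapping boxes collapse to the higher-scoring one, while the distant box survives.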
• since the AI object detection algorithm runs on the NPU, considering power-consumption and performance constraints, it may output detection results once every 10 frames.
• the types of objects that can be detected include: flowers, people, cats, dogs, birds, bicycles, buses, motorcycles, trucks, cars, trains, boats, horses, kites, balloons, vases, bowls, plates, cups, and classic handbags.
• the priority of the object category to which the shooting object belongs can be divided into four levels: the first priority is people, the second priority is flowers, the third priority is cats and dogs, and the fourth priority is the rest.
  • the specific implementation manner of the processor 101 in the focusing device 10 acquiring the second ROI set in the first image may be as follows:
  • the processor 101 may obtain a second ROI set by using a moving target detection algorithm.
  • the moving object detection algorithm is performed once every two frames, that is, the moving area in the current image is output every two frames.
  • the speed of the movement and the direction of the movement can be further output.
• in the figure, region 2 is the second ROI, i.e. the motion region output by the moving object detection algorithm, and region 1 is the final target ROI.
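One simple way to obtain such motion regions is frame differencing between consecutive frames; the sketch below is illustrative only and is not necessarily the patent's moving object detection algorithm (the threshold of 25 is an assumed value):

```python
import numpy as np

def motion_bbox(prev_gray, cur_gray, thresh=25):
    """Return the bounding box (x, y, w, h) of changed pixels between two
    grayscale frames, or None if nothing moved. A crude stand-in for the
    moving-object detection that produces the second ROI set."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    mask = diff > thresh
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    x, y = xs.min(), ys.min()
    return (int(x), int(y), int(xs.max() - x + 1), int(ys.max() - y + 1))

prev = np.zeros((120, 160), np.uint8)
cur = prev.copy()
cur[40:60, 70:100] = 200          # a bright object appears in the new frame
print(motion_bbox(prev, cur))     # (70, 40, 30, 20)
```

A real implementation would add denoising and connected-component analysis so that several disjoint motion regions can be reported separately.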
• the specific implementation in which the processor 101 in the focusing device 10 determines the target ROI in the first image based on the first ROI set and the second ROI set may be as follows: the processor 101 determines an effective first ROI from the one or more first ROIs in the first ROI set, and determines an effective second ROI from the one or more second ROIs in the second ROI set; and when the intersection-over-union ratio (IoU) between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determines the effective first ROI as the target ROI; wherein the effective first ROI is within a first preset region of the first image, and the effective second ROI is within a second preset region of the first image.
• when the IoU between the effective first ROI and the effective second ROI is less than the preset threshold, the processor 101 determines, of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI. That is, when the overlapping area between the effective first ROI and the effective second ROI is large, it indicates that the subject detection and the motion-region detection are consistent at this time, so the effective first ROI can be taken as the target ROI; when the overlapping area is small, the detection may be wrong or the target ROI may be drifting, so the ROI closer to the center point is selected as the target ROI.
  • the target ROI may also be selected according to other calculation rules, such as combining a valid first ROI and a valid second ROI to obtain a new ROI, which is not enumerated in this application.
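The IoU comparison and the center-distance fallback described above can be sketched as follows (boxes given as (x, y, w, h); the 0.5 threshold is an assumed value, since the patent does not fix it):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = min(ax2, bx2) - max(a[0], b[0])
    ih = min(ay2, by2) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def pick_target_roi(first_roi, second_roi, center, iou_thresh=0.5):
    """If the two effective ROIs overlap enough, trust the object detection;
    otherwise fall back to whichever ROI is closer to the image center."""
    if iou(first_roi, second_roi) >= iou_thresh:
        return first_roi
    def center_dist2(r):
        cx, cy = r[0] + r[2] / 2, r[1] + r[3] / 2
        return (cx - center[0]) ** 2 + (cy - center[1]) ** 2
    return min((first_roi, second_roi), key=center_dist2)

print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # 50/150 ≈ 0.333
```

For example, with almost no overlap, `pick_target_roi((0, 0, 4, 4), (8, 8, 4, 4), (10, 10))` selects the second ROI, whose center is closer to (10, 10).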
  • FIG. 6 is a schematic diagram of screening a target ROI provided by an embodiment of the present invention.
  • a first image (field of view of a camera) displayed on a mobile phone screen in FIG. 6 has a width of width and a height of height.
  • the second ROI is valid within the second preset region.
• the width w2 of the invalid region is w2 = min(width, height) × 0.1; at this time, ROI1 and ROI2 are valid, and ROI0 is invalid.
• the effective first ROI has the highest evaluation score among the one or more first ROIs in the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs in the second preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs. That is, when multiple ROIs may still exist after filtering through the corresponding preset regions, the area of each ROI, its distance from the center point of the first image, and the priority of the category to which its subject belongs can be comprehensively considered to determine the effective ROI.
  • the priority of different object categories can also be set according to the current shooting mode. For example, in portrait mode, people have the highest priority, and in landscape mode, plants or buildings have the highest priority.
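An illustrative evaluation score combining the three factors above (area, distance from center, category priority). The normalization, the weights, and the category list are assumptions for illustration; the patent does not specify a formula:

```python
import math

# Assumed category priorities per the text: people > flowers > cats/dogs > rest.
PRIORITY = {"person": 4, "flower": 3, "cat": 2, "dog": 2}

def evaluation_score(roi, category, image_wh, w_area=1.0, w_dist=1.0, w_prio=1.0):
    """Toy evaluation score: grows with ROI area and category priority,
    shrinks with distance from the image center. Weights are illustrative."""
    x, y, w, h = roi
    iw, ih = image_wh
    area = (w * h) / (iw * ih)                      # normalized area
    cx, cy = x + w / 2, y + h / 2
    dist = math.hypot(cx - iw / 2, cy - ih / 2)
    dist /= math.hypot(iw / 2, ih / 2)              # normalized distance, 0..1
    prio = PRIORITY.get(category, 1) / 4
    return w_area * area - w_dist * dist + w_prio * prio

# A centered person beats an off-center cat of the same size.
rois = [((860, 440, 200, 200), "person"), ((40, 40, 200, 200), "cat")]
best = max(rois, key=lambda rc: evaluation_score(rc[0], rc[1], (1920, 1080)))
print(best[1])  # person
```

In a shooting-mode-aware variant, the `PRIORITY` table would simply be swapped per mode (e.g. plants ranked highest in landscape mode).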
  • FIG. 7 is a schematic flowchart of determining a target ROI according to an embodiment of the present invention.
• AI object detection is performed by the NPU to obtain a first ROI set.
  • moving object detection is performed by the CPU to obtain a second ROI set.
• the processor 101 detects whether the first ROI in the first ROI set and the second ROI in the second ROI set are valid.
  • the focusing device 10 in the embodiment of the present invention may also combine other preset strategies to provide different methods for determining the target ROI in different scenarios.
  • the preset strategy may include: 1) user-specified priority; 2) AI object detection priority; 3) motion detection priority; 4) joint selection of object detection and motion detection.
  • the feature information of the target ROI determined by the processor 101 in the above-mentioned focusing device 10 includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
• it may include only the color lab information extracted by the processor 101, only the directional gradient hog information extracted by the processor 101, or only the CNN information extracted by the neural network processor 102, or any two or all three of the above types of information.
• the above-mentioned directional gradient hog information and color lab information can be extracted by the processor 101, and the CNN information can be extracted by the neural network processor 102 and then sent to the processor 101.
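As a rough illustration of what the directional gradient hog information captures, a toy orientation histogram over an ROI patch (an illustrative sketch only, not an actual hog implementation with cells, blocks, and block normalization):

```python
import numpy as np

def hog_like_descriptor(patch, bins=9):
    """A toy orientation histogram of gradients over an ROI patch --
    a rough stand-in for the 'directional gradient hog information'."""
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation, 0..pi
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist              # L1-normalized

patch = np.tile(np.arange(16, dtype=float), (16, 1))  # pure horizontal ramp
d = hog_like_descriptor(patch)
print(d.argmax())  # 0 (all gradient energy falls in the bin containing 0 rad)
```

A real hog feature computes such histograms per cell and normalizes over blocks, but the orientation-binning idea is the same.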
  • the processor 101 further updates the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
• the feature information of the target ROI is determined according to the feature information of the first image corresponding to the target ROI and the feature information of at least one third image, where the at least one third image is located between the first image and the second image in the time domain. That is, the processor 101 in the focusing device 10 updates the feature information of the target ROI during the process of identifying the position information and the size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI.
• the processor 101 recalculates the target ROI after the first preset time period; or recalculates the target ROI when the tracking confidence of the target ROI is less than the confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to the tracking accuracy.
• the processor 101 not only needs to update the feature information in real time based on the tracking condition of the target ROI to track focus more accurately, but the updated feature information is also time-limited: after a long period of time, or when the confidence of the currently tracked target ROI is low, initializing the related parameters should be considered so as to perform a new round of target ROI confirmation and tracking.
  • FIG. 8 is a schematic diagram of a target ROI tracking process according to an embodiment of the present invention.
• the processor 101 selects a certain feature or a combination of multiple features according to a preset rule to determine the feature information, and determines whether to initialize the tracker after a rule judgment. If the tracker does not need to be initialized, it directly enters the tracking calculation, outputs the position and size information of the target ROI together with a response map of the target's possible position, and finally updates the feature information based on the new position and size of the target ROI. This can mainly include the following steps:
  • This part can choose different feature combinations according to different needs, such as using the hog feature alone, or a combination of hog + lab + cnn;
• the tracking calculation algorithm uses correlation filtering algorithms, such as KCF (Kernelized Correlation Filter), ECO (Efficient Convolution Operators), etc.
• the response map output for each image frame is a w × h floating-point two-dimensional array F[w][h], also written F_{w,h}, which has been normalized to the range 0 to 1.0.
• the response map reflects the possible distribution of the target ROI in the picture, and the position of its largest value is where the target ROI is located. The confidence of target ROI tracking can be reflected through the response map.
• the average peak-to-correlation energy index (APCE) is defined as APCE = |F_max − F_min|² / mean_{w,h}((F_{w,h} − F_min)²), where:
• F_max is max(F[w][h]), the maximum value of F[w][h];
• F_min is min(F[w][h]), the minimum value of F[w][h];
• Σ_{w,h}(F_{w,h} − F_min)² means traversing each value of F_{w,h}, subtracting the minimum value, squaring, and summing; dividing this sum by w × h gives the mean in the denominator.
• This indicator can be used as follows: when its calculated value drops sharply compared with the historical average, the position and size of the target ROI in the current frame are not reliable, for example, the target ROI is blocked or lost.
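A direct implementation of the APCE index defined above (the response maps below are synthetic examples):

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a tracker response map."""
    f_max = response.max()
    f_min = response.min()
    energy = np.mean((response - f_min) ** 2)
    return (f_max - f_min) ** 2 / energy

rng = np.random.default_rng(0)
sharp = np.full((17, 17), 0.05)
sharp[8, 8] = 1.0                        # clean single peak: reliable tracking
noisy = rng.uniform(0.0, 1.0, (17, 17))  # multi-modal map: occlusion / loss
print(apce(sharp) > apce(noisy))  # True
```

A sharp, unimodal peak yields a large APCE; when the target is occluded the response flattens and APCE collapses, which is exactly the drop the text uses to veto unreliable frames.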
  • FIG. 9 is a schematic diagram of target ROI tracking provided by an embodiment of the present invention.
• the initial position of the target ROI is position 1, and the target moves through positions 1 to 6 in the picture; the target tracking algorithm module outputs the position and size of the target in each frame in real time. At this time, the tracking confidence is high, and the feature information of the target ROI needs to be updated in real time.
• the processor 101 uses the target ROI determined from the first image as the initial ROI input. After feature extraction, feature selection, and tracking calculation, the position and size of the target ROI in each subsequent image frame (including the first image) are calculated in real time. The basis for judging whether the feature information is updated is as follows:
• if the feature information update condition is satisfied, the feature information is updated;
• otherwise, the target ROI feature information is not updated; that is, the feature information of the current image frame will not participate in the update of the target ROI feature information, so as to stabilize the tracking system and avoid target ROI tracking drift;
• the processor 101 may be triggered to re-determine the target ROI (including the NPU reacquiring the first ROI set and the CPU reacquiring the second ROI set), that is, the initialization update of the tracking is completed again.
  • the position information and size information of the target ROI are output in real time.
• the position is constrained: the green frame is the effective range when the target is stationary, in which case the output is provided to the AF algorithm for stable focusing; the other frame is the effective range when the target is moving, in which case the real-time output is provided to the AF algorithm for motion follow-focus.
  • FIG. 10 is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention.
  • the image signal processor 103 generates n frames of images in the first preset time period.
• the feature information of the target ROI is extracted from the first frame image, that is, feature information A in FIG. 10, which is also the initial identifying feature information of the target ROI. When the image signal processor generates the second frame image, feature information B of the second frame image is first obtained; feature information B may be obtained by extracting, based on the position and size of the target ROI in the first frame image, the feature information of the corresponding region in the second frame image.
• the feature information of the target ROI in subsequent image frames is extracted in the same way, and the principle will not be repeated here.
• the processor 101 compares feature information B with feature information A to determine the position and size, in the second frame image, of the target ROI determined in the first frame image; at the same time, it determines, according to feature information A and feature information B, whether the second frame satisfies the feature information update condition.
• the most recently updated feature information is used as the comparison model; when it is determined that the initialization restart conditions are met but the specified time point has not been reached (that is, the time point at which the processor 101 outputs a new target ROI), the most recently updated feature information also continues to be used as the comparison model. However, if it is determined that the initialization restart conditions are met and the specified time point has been reached, the target ROI re-output by the processor 101 can be used, and a new round of ROI tracking calculation is performed.
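One common way to maintain such a comparison model, in the spirit of KCF/ECO-style trackers, is an exponential moving average gated by the update condition. The learning rate below is an assumed illustrative value, not one specified by the text:

```python
import numpy as np

def update_model(model, new_feat, update_ok, lr=0.02):
    """Blend the newly extracted ROI features into the comparison model only
    when the frame passed the update condition; otherwise keep the old model.
    The learning rate 0.02 is an illustrative value."""
    if not update_ok:
        return model                      # e.g. frame 4 above: occlusion, skip
    return (1.0 - lr) * model + lr * new_feat

model = np.ones(4)
feat = np.full(4, 2.0)
model = update_model(model, feat, update_ok=True)
print(model)  # [1.02 1.02 1.02 1.02]
model = update_model(model, feat, update_ok=False)
print(model)  # unchanged
```

Skipping the blend on rejected frames is what keeps an occluded frame's features from contaminating the model and causing tracking drift.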
  • the application does not specifically limit the conditions for updating the characteristic information and the update formula.
• the feature information D of the target ROI is determined in the fourth frame image, and after the feature information A″ obtained from the update of the third frame is correlated with feature information D, it is determined that the current fourth frame image does not meet the feature information update condition (for example, the target ROI is blocked or drifts greatly in the fourth frame). Therefore, feature information D of the fourth frame does not participate in the subsequent update of the feature information, and the feature information updated in the third frame continues to be used; that is, after feature information E is determined in the fifth frame, it is still correlated with the feature information updated in the third frame. Further, it is assumed that feature information E is updated with feature information A″ updated in the third frame.
• the feature information is recalculated at the 11th frame image.
  • tracking and focusing can be performed based on the embodiment of the invention described above, and feature information is updated, which is not exhaustive here.
• after the processor 101 enters the target ROI tracking and focusing process, the current state of the target ROI is determined according to the real-time target ROI information.
  • the target ROI is tracked and focused.
• the combination of target detection algorithm + motion detection algorithm + tracking algorithm can solve the two major problems of missing ROI information when the target moves and ROI loss after the target becomes stationary.
• the AF algorithm can directly follow the ROI window for motion follow-focus, and can perform stable focusing when the moving target is stationary, which solves the focus selection problem when the target is not in the center.
  • FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
• the neural network processor NPU 102 is mounted as a coprocessor on the CPU (the host CPU), and the host CPU assigns tasks.
  • the core part of the NPU is an arithmetic circuit 1203.
  • the controller 1204 controls the arithmetic circuit 1203 to extract matrix data in the memory and perform multiplication operations.
• the arithmetic circuit 1203 includes multiple processing engines (Process Engines, PEs). In some implementations, the arithmetic circuit 1203 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1203 is a general-purpose matrix processor.
• the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1202 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches matrix A data from the input memory 1201, performs matrix operations with matrix B, and stores the partial or final results of the matrix in the accumulator 1208.
  • the unified memory 1206 is used to store input data and output data.
• the weight data is transferred to the weight memory 1202 through the storage unit access controller (Direct Memory Access Controller, DMAC) 12012.
  • the input data is also transferred to the unified memory 1206 through the DMAC.
  • BIU stands for Bus Interface Unit, that is, the bus interface unit 1210, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch memory 1209.
• the bus interface unit 1210 (Bus Interface Unit, BIU) is used for the instruction fetch memory 1209 to obtain instructions from the external memory, and is also used for the storage unit access controller 12012 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
• the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1206, transfer the weight data to the weight memory 1202, or transfer the input data to the input memory 1201.
• the vector calculation unit 1207 includes a plurality of arithmetic processing units and, if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and size comparison. It is mainly used for non-convolutional/FC layer network calculations in a neural network, such as pooling, batch normalization, local normalization, and so on.
• the vector calculation unit 1207 can store the processed output vector into the unified buffer 1206.
  • the vector calculation unit 1207 may apply a non-linear function to the output of the arithmetic circuit 1203, such as a vector of accumulated values, to generate an activation value.
  • the vector calculation unit 1207 generates a normalized value, a merged value, or both.
  • a vector of the processed output can be used as an activation input to the arithmetic circuit 1203, for example for use in subsequent layers in a neural network.
  • An instruction fetch memory 1209 connected to the controller 1204 is used to store instructions used by the controller 1204;
• the unified memory 1206, the input memory 1201, the weight memory 1202, and the instruction fetch memory 1209 are all on-chip memories; the external memory is a memory outside the NPU hardware architecture.
  • FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention.
  • the focusing method is applicable to any one of the focusing devices in FIG. 1 and FIG. 3 and a device including the focusing device.
  • the method may include the following steps S201-S205.
• Step S201: determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI includes a photographic subject; the second ROI set is a ROI set obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion area;
• Step S202: determine a target ROI in the first image based on the first ROI set and the second ROI set;
  • the determining a target ROI in the first image based on the first ROI set and the second ROI set includes:
• when the IoU between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
  • the method further includes:
• of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image is determined as the target ROI.
• the effective first ROI has the highest evaluation score among one or more first ROIs within a preset area of the first image; and/or the effective second ROI has the highest evaluation score among one or more second ROIs within the preset area of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
• Step S203: determine the characteristic information of the target ROI.
  • the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
  • the feature information of the target ROI is also updated based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
• the at least one third image is located between the first image and the second image in the time domain.
• Step S204: identify the position information and size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located before the second image in the time domain.
• Step S205: focus according to the position information and the size information.
• when the tracking confidence of the target ROI is less than a confidence threshold, the target ROI is recalculated, where the tracking confidence is used to indicate the tracking accuracy of the target ROI, and the tracking confidence is directly proportional to the tracking accuracy.
  • FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
• the focusing device 30 may include a first processing unit 301, a second processing unit 302, a third processing unit 303, a recognition unit 304, and a focusing unit 305, where:
• the first processing unit 301 is configured to determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI includes a photographic subject; the second ROI set is a ROI set obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion area;
  • a second processing unit 302 configured to determine a target ROI in the first image based on the first ROI set and the second ROI set;
• a third processing unit 303, configured to determine feature information of the target ROI;
  • a recognition unit 304 configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located in the time domain Before the second image;
  • the focusing unit 305 is configured to perform focusing according to the position information and the size information.
  • the second processing unit 302 is specifically configured to:
• when the IoU between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
  • the second processing unit 302 is further configured to:
• of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image is determined as the target ROI.
• the effective first ROI has the highest evaluation score among one or more first ROIs within a preset area of the first image; and/or the effective second ROI has the highest evaluation score among one or more second ROIs within the preset area of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  • the third processing unit 303 is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
• the at least one third image is located between the first image and the second image in the time domain.
  • the apparatus further includes:
• a first initialization unit 306, configured to recalculate the target ROI after a first preset time period;
• a second initialization unit 307, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI, and the tracking confidence is directly proportional to the tracking accuracy.
  • the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
  • Each unit in FIG. 13 may be implemented in software, hardware, or a combination thereof.
• Units implemented in hardware may include circuits, such as logic circuits, algorithm circuits, or analog circuits.
• a unit implemented in software may include program instructions, which may be regarded as a software product, stored in a memory, and run by a processor to implement the related functions; for details, refer to the previous description.
  • An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes part or all of the steps described in any of the foregoing method embodiments.
  • An embodiment of the present invention further provides a computer program.
  • the computer program includes instructions.
• when the computer program is executed by a computer, the computer can perform part or all of the steps of any one of the focusing methods described above.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the above units is only a logical function division.
  • multiple units or components may be combined or integrated.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, which may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
• the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, and specifically a processor in a computer device) to perform all or part of the steps of the foregoing methods in each embodiment of the present application.
  • The foregoing storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).

Abstract

Disclosed are a focusing apparatus, method, and related device. The focusing apparatus includes a processor, and a neural network processing unit (NPU) and an image signal processor (ISP) coupled to the processor. The ISP is used to generate a first image. The NPU is used to acquire a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs and each first ROI contains a photographic subject. The processor is used to: acquire a second ROI set in the first image, where the second ROI set includes one or more second ROIs and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; and, according to characteristic information of the target ROI, identify position information and size information of the target ROI in a second image and perform focusing, where the first image precedes the second image in the time domain. By means of the present application, the accuracy of focusing can be improved.

Description

Focusing device, method and related equipment

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a focusing device, a focusing method, and related equipment.

Background

Smartphone camera technology is advancing toward SLR-level quality, and many smartphone cameras already surpass traditional compact cameras in imaging capability. High-quality photography relies on high-precision focusing. For static scenes, existing focusing techniques generally place the focus point at the center of the frame; this satisfies most consumers, but when the subject is not at the center of the field of view, center focusing often leaves the subject blurred. For dynamic scenes, especially a fast-moving subject, fixed center focusing cannot meet the demand, so high-precision motion focus tracking is urgently needed.

Summary of the Invention

Embodiments of the present invention provide a focusing device, a focusing method, and related equipment, to improve focusing accuracy.
In a first aspect, an embodiment of the present invention provides a focusing device, including a processor, and a neural network processor and an image signal processor coupled to the processor. The image signal processor is configured to generate a first image. The neural network processor is configured to obtain a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs and each first ROI contains a photographic subject. The processor is configured to: obtain a second ROI set in the first image, where the second ROI set includes one or more second ROIs and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; according to the characteristic information of the target ROI, identify position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and perform focusing according to the position information and the size information.

In this embodiment of the present invention, on an image frame generated by the ISP in the focusing device, the NPU performs AI object detection to obtain one or more candidate photographic subjects, and the processor performs moving-object detection to obtain one or more candidate motion regions. The detected subjects and motion regions are then combined to determine the target ROI to be focused on, and subsequent tracking and focusing are performed based on the characteristic information of that target ROI. That is, AI object detection and moving-object detection are used to automatically identify the target ROI in the field of view (FOV); a target-ROI tracking algorithm then accurately computes the real-time motion trajectory and size of the target ROI; finally, an autofocus (AF) algorithm performs motion focus tracking along that trajectory. The entire process requires no manual selection by the user, and the tracking focus is accurate, which greatly improves the shooting experience and results.
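As a sketch only, the per-frame control flow described above (object detection on the NPU, motion detection on the CPU, fusion into a target ROI, then tracking-driven autofocus) might be organized as follows. Every hook name here is a hypothetical placeholder standing in for a hardware stage, not an API from the disclosure:

```python
CONF_THRESHOLD = 0.5  # illustrative value, not taken from the disclosure

def process_frame(frame, state, hooks):
    """One iteration of the sketched focus pipeline.

    `hooks` bundles the stages described above as callables (all
    hypothetical placeholders): object detection on the NPU, motion
    detection on the CPU, ROI fusion, feature extraction, tracking,
    and the autofocus actuator.
    """
    if state.get("target_roi") is None:
        first_rois = hooks["detect_objects"](frame)   # NPU: candidate subjects
        second_rois = hooks["detect_motion"](frame)   # CPU: candidate motion regions
        state["target_roi"] = hooks["fuse"](first_rois, second_rois)
        state["features"] = hooks["extract"](frame, state["target_roi"])
    else:
        # Later frames: locate the target ROI by its feature signature
        pos, size, conf = hooks["track"](frame, state["features"])
        hooks["focus"](pos, size)                     # AF follows the tracked ROI
        if conf < CONF_THRESHOLD:
            state["target_roi"] = None                # force re-detection
    return state
```

Clearing `target_roi` when the tracking confidence drops mirrors the re-initialization behavior described later in this summary.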
In a possible implementation manner, the processor is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.

In this embodiment of the present invention, the first ROI set and the second ROI set are filtered to improve the recognition accuracy of the target ROI. When the overlap between the effective first ROI and the effective second ROI is large, both subject detection and motion detection are likely to cover the effective first ROI, so the effective first ROI can be taken as the target ROI.

In a possible implementation manner, the processor is further specifically configured to: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In this embodiment of the present invention, when the overlap between the effective first ROI and the effective second ROI is small, this may indicate a detection error or a drifting target ROI; the ROI closer to the center point is therefore selected as the target ROI.
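The fusion rule above can be sketched directly in code. The threshold value and the squared-distance tiebreak are illustrative assumptions; boxes are given as (x, y, w, h):

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def center_dist2(box, img_w, img_h):
    """Squared distance from the box centre to the image centre."""
    cx, cy = box[0] + box[2] / 2, box[1] + box[3] / 2
    return (cx - img_w / 2) ** 2 + (cy - img_h / 2) ** 2

def pick_target_roi(valid_first, valid_second, img_w, img_h, thr=0.5):
    """If the subject ROI and the motion ROI largely overlap, trust the
    subject ROI; otherwise fall back to whichever ROI lies closer to the
    image centre. `thr=0.5` is an illustrative threshold, not a value
    taken from the disclosure."""
    if iou(valid_first, valid_second) >= thr:
        return valid_first
    return min((valid_first, valid_second),
               key=lambda b: center_dist2(b, img_w, img_h))
```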
In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In this embodiment of the present invention, when multiple ROIs remain after the processor filters by preset region, the ROI area, the distance to the center point of the first image, and the priority of the category of the photographic subject can be used to select the ROI most likely to be tracked and focused.
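One way such an evaluation score could be combined is a weighted sum of the three factors named above. The weights, the normalization, and the additive combination are assumptions for illustration; the disclosure only states the proportionality relations:

```python
import math

def roi_score(box, cls_priority, img_w, img_h,
              w_area=1.0, w_dist=1.0, w_cls=1.0):
    """Illustrative scoring: larger area, smaller distance to the image
    centre, and higher class priority all raise the score. The weights
    and the exact combination are assumptions, not from the disclosure."""
    x, y, w, h = box
    area = (w * h) / (img_w * img_h)                  # normalised area
    cx, cy = x + w / 2, y + h / 2
    dist = math.hypot(cx - img_w / 2, cy - img_h / 2)
    max_dist = math.hypot(img_w / 2, img_h / 2)
    closeness = 1.0 - dist / max_dist                 # 1 at centre, 0 at corner
    return w_area * area + w_dist * closeness + w_cls * cls_priority

def pick_valid_roi(rois, img_w, img_h):
    """rois: list of (box, class_priority); returns the highest-scoring box."""
    return max(rois, key=lambda r: roi_score(r[0], r[1], img_w, img_h))[0]
```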
In a possible implementation manner, the processor is further configured to update the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In this embodiment of the present invention, the processor not only determines the initial value of the target ROI, but also updates the characteristic information in real time based on the motion-tracking status of the target ROI, so as to track and focus more accurately.

In a possible implementation manner, the processor is further configured to: recalculate the target ROI after a first preset time period; or recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In this embodiment of the present invention, the processor not only updates the characteristic information in real time based on the tracking status of the target ROI for more accurate focus tracking, but the updated characteristic information must also remain timely: after a long period, or when the confidence of the currently tracked target ROI is low, the relevant parameters should be re-initialized and a new round of target-ROI confirmation and tracking performed.
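The update-and-reinitialize policy above can be sketched as a small tracker state object. The exponential-moving-average blend, the time limit, and the confidence threshold are illustrative choices, not values from the disclosure:

```python
class TargetTracker:
    """Sketch of the feature-update / re-initialisation policy described
    above. The EMA learning rate, time limit, and confidence threshold
    are illustrative values, not taken from the disclosure."""

    def __init__(self, features, now, lr=0.1, max_age=5.0, conf_thr=0.3):
        self.features = features      # per-frame feature vector (list of floats)
        self.born = now               # time the target ROI was (re)computed
        self.lr = lr
        self.max_age = max_age
        self.conf_thr = conf_thr

    def update(self, new_features, confidence, now):
        """Blend in the latest observation; returns True if the target
        ROI must be recomputed from scratch (stale or unreliable)."""
        if now - self.born > self.max_age or confidence < self.conf_thr:
            return True               # trigger a new round of detection
        # Exponential moving average over the historical features
        self.features = [(1 - self.lr) * f + self.lr * g
                         for f, g in zip(self.features, new_features)]
        return False
```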
In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.

This embodiment of the present invention provides multiple feature-extraction approaches to meet the feature-extraction requirements of different images or scenes.
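As a loose illustration of the first option, a HOG-style descriptor can be sketched with NumPy alone. This computes a single global orientation histogram weighted by gradient magnitude, not the full cell-and-block-normalized HOG used in practice:

```python
import numpy as np

def hog_descriptor(gray, n_bins=9):
    """Simplified HOG-style descriptor: one global histogram of unsigned
    gradient orientations, weighted by gradient magnitude and normalised
    to sum to 1. A sketch, not the full block-normalised HOG."""
    gy, gx = np.gradient(gray.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist
```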
In a second aspect, an embodiment of the present invention provides a focusing method, which may include:

determining a first set of regions of interest (ROIs) and a second ROI set, where the first ROI set is obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI contains a photographic subject, and where the second ROI set is obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion region; determining a target ROI in the first image based on the first ROI set and the second ROI set; determining characteristic information of the target ROI; according to the characteristic information of the target ROI, identifying position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and performing focusing according to the position information and the size information.
In a possible implementation manner, the determining a target ROI in the first image based on the first ROI set and the second ROI set includes: determining an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determining an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determining the effective first ROI as the target ROI.

In a possible implementation manner, the method further includes: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determining, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In a possible implementation manner, the method further includes: updating the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In a possible implementation manner, the method further includes: recalculating the target ROI after a first preset time period; or recalculating the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.
In a third aspect, an embodiment of the present invention provides a focusing device, which may include:

a first processing unit, configured to determine a first set of regions of interest (ROIs) and a second ROI set, where the first ROI set is obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI contains a photographic subject, and where the second ROI set is obtained from the first image, includes one or more second ROIs, and each second ROI is a motion region; a second processing unit, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set; a third processing unit, configured to determine characteristic information of the target ROI; a recognition unit, configured to identify, according to the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and a focusing unit, configured to perform focusing according to the position information and the size information.
In a possible implementation manner, the second processing unit is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.

In a possible implementation manner, the second processing unit is further configured to: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In a possible implementation manner, the third processing unit is further configured to update the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In a possible implementation manner, the device further includes: a first initialization unit, configured to recalculate the target ROI after a first preset time period; or a second initialization unit, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including an image sensor and the focusing device according to any implementation of the first aspect, where the image sensor is configured to collect image data, and the image signal processor is configured to generate the first image based on the image data.

In a possible implementation manner, the electronic device further includes a memory configured to store program instructions, and the program instructions are executed by the processor.
In a fifth aspect, the present application provides a focusing device that has the function of implementing any one of the above focusing methods. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

In a sixth aspect, the present application provides a terminal. The terminal includes a processor configured to support the terminal in performing the corresponding functions of the focusing method provided in the second aspect. The terminal may further include a memory coupled to the processor, which stores the program instructions and data necessary for the terminal, and may further include a communication interface for the terminal to communicate with other devices or a communication network.

In a seventh aspect, the present application provides a computer storage medium storing a computer program which, when executed by a processor, implements the focusing method flow of any implementation of the second aspect.

In an eighth aspect, an embodiment of the present invention provides a computer program including instructions which, when the computer program is executed by a computer, enable the computer to perform the focusing method flow of any implementation of the second aspect.

In a ninth aspect, the present application provides a chip system including a processor configured to implement the functions involved in the focusing method flow of any implementation of the second aspect. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the focusing method. The chip system may consist of a chip, or may include a chip and other discrete devices.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the functional principle of a focusing device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an SSD network implementation process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of target ROI screening according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of target ROI determination according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart of target ROI tracking according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of target ROI tracking according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of updating the characteristic information of a target ROI according to an embodiment of the present invention;
FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention;
FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of still another focusing device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings.

The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.

Reference to "an embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
First, some terms in this application are explained to facilitate understanding by those skilled in the art.
(1)感兴趣区域(region of interested,ROI),机器视觉、图像处理中,从被处理的图像以方框、圆、椭圆、不规则多边形等方式勾勒出需要处理的区域,称为感兴趣区域。(1) Region of interest (ROI). In machine vision and image processing, the area to be processed is outlined from the processed image in the form of boxes, circles, ellipses, and irregular polygons. It is called interest. region.
(2)人工智能(Artificial Intelligence,AI),是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。(2) Artificial Intelligence (AI) is a theory, method, technology, and method that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. operating system. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic theories of AI.
(3) A convolutional neural network (CNN) is a multi-layer neural network in which each layer is composed of multiple two-dimensional planes, and each plane is composed of multiple independent neurons. The neurons of a plane share weights, and weight sharing reduces the number of parameters in the neural network. At present, in a convolutional neural network, a processor usually performs a convolution operation by converting the convolution of an input signal feature with a weight into a matrix multiplication between a signal matrix and a weight matrix. In the matrix multiplication, the signal matrix and the weight matrix are partitioned into blocks to obtain multiple fractal signal matrices and fractal weight matrices, and matrix multiplication and accumulation operations are then performed on the fractal signal matrices and fractal weight matrices.
(4) Image signal processing (ISP) refers to a unit mainly used to process the signal output by a front-end image sensor, so as to match image sensors from different manufacturers. A camera uses an image signal processor (Image Signal Processor, ISP); its pipelined, dedicated image-processing engine can process image signals at high speed, and it is also equipped with dedicated circuits for auto exposure, auto focus, and auto white balance evaluation.

(5) Intersection over union (IoU), a concept used in object detection, is the overlap rate between a generated candidate bound and the ground truth bound, that is, the ratio of their intersection to their union. Ideally, the two overlap completely, that is, the ratio is 1.
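The following is an illustrative sketch (not part of the original disclosure) of how the IoU of two axis-aligned boxes, each given as (x, y, width, height), can be computed:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap completely, so the ratio is 1.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # → 0.3333...
```

Disjoint boxes yield 0, and the value grows toward 1 as the overlap approaches the union.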
To facilitate understanding of the embodiments of the present invention, the technical problems solved by the embodiments of the present invention and the corresponding practical application scenarios are listed below by way of example. Common shooting scenarios and the corresponding focusing methods include the following.
Scenario 1: technical solutions for static scenes:

1) The center focusing method: a fixed center position is set in advance as the focus area.

2) The user manually touches a target position on the screen, which is used as the focus area.

Disadvantages of the focusing solutions in the above static scene:

1) The center focus area is limited. When the subject deviates from the center, the focus cannot be placed on the subject.

2) When the user manually selects the focus target, the AF algorithm needs to reconfigure the focus point, which lengthens both the focusing time and the time the user spends taking the photo; and when the target starts to move, the focus cannot follow the target's movement in real time.
Scenario 2: technical solutions for shooting dynamic scenes:

1) A target focus-tracking method based on feature-point detection: feature points in the picture are detected in real time, and the focus is then set on the feature points.

2) A target focus-tracking method based on motion detection: moving objects in the shooting scene are quickly identified from the content change between two successive frames, the motion region is output to the AF algorithm in real time, and the focus point is then adjusted to the motion region in real time to track the moving target. In addition, the prior art implements an artificial-intelligence servo autofocus function: in a mode of high-speed continuous focusing on a moving subject, the shutter is half-pressed to capture the subject in the viewfinder and detect its motion trajectory. The autofocus sensor built into an SLR can identify whether the subject is stationary or moving and determine its direction of movement, thereby achieving accurate focusing when shooting subjects such as sports, children, or animals.

Disadvantages of the focusing solutions in the above dynamic scene:

1) The focus-tracking method based on feature-point detection tends to detect richly textured background regions, so the focus cannot truly be placed on the target.

2) The automatic focus-tracking method based on moving-target detection: when the background around the moving target changes, a motion region is easily detected, which readily causes false triggering and misfocusing; the trajectory of the moving target is not smooth and jumps severely, resulting in discontinuous focusing; and when the camera is moving or unstable, moving objects are easily detected in the picture even though the shooting target is actually stationary, which readily leads to misfocusing.
Therefore, for the above two scenarios, the problems and application scenarios mainly addressed by the embodiments of the present invention include the following.

1. When shooting a static scene, the problem is selecting the focus area when the target object is not at the center. An AI object detection algorithm is used to detect the subject in the picture, and the subject region is then input to a target tracking algorithm to monitor the state of the target in real time. When the target is stationary, the AF algorithm directly sets the focus point on the subject for stable focusing; when the target starts to move, the tracking algorithm follows the target's movement in real time, and the AF algorithm performs tracking focus in real time.

2. When shooting a dynamic scene, the AI object detection algorithm is combined with the moving-target detection algorithm to jointly output the subject in the current picture, and the target tracking algorithm then monitors and outputs the position region and size of the moving target in real time, solving problems such as misidentification of moving targets, non-smooth target motion, unstable target tracking, and discontinuous focusing.

It can be understood that the above application scenarios are only a few exemplary implementations of the embodiments of the present invention; the application scenarios of the embodiments of the present invention include but are not limited to the above.
Based on the above, the focusing apparatus and related devices provided by the embodiments of the present invention are described below. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a focusing apparatus according to an embodiment of the present invention. The focusing apparatus 10 may include a processor 101, and a neural network processor 102 and an image signal processor 103 coupled to the processor 101, where:

The image signal processor (ISP) 103 is configured to generate a first image. It can match image sensors from different manufacturers, process the image data output by a front-end image sensor, and generate a corresponding image signal based on the image data.

The neural network processor (NPU) 102 is configured to obtain a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes one subject. The subject may be any object, such as a person, an animal, a building, or a plant. For example, when the neural network processor 102 recognizes a flower, a person, and a dog in the first image, the first ROI set includes three first ROIs: the plant, the person, and the animal. As shown in FIG. 2, which is a schematic diagram of a first image according to an embodiment of the present invention, the NPU recognizes a human face (region 1), a dog face (region 3), a flower (region 4), and a table (region 5), all of which are first ROIs.
The processor (CPU) 101 is configured to: obtain a second ROI set in the first image; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine feature information of the target ROI; identify, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor 103; and perform focusing according to the position information and size information. The second ROI set includes one or more second ROIs, each of which is a motion region. For example, if a puppy is detected to be moving from the first image together with one or several frames preceding it, the region where the puppy is located in the first image is determined as a second ROI. It can be understood that when multiple objects in the field of view are detected to be moving, multiple second ROIs may likewise be determined. The first image precedes the second image in the time domain; that is, the feature information of the target ROI, determined by combining AI recognition and motion detection on a previously captured image, serves as the basis for subsequently tracking that target ROI for real-time tracking focus. It can also be understood that if no object motion is detected in the first image, the second ROI set may be an empty set, which corresponds to a static shooting scene. As shown in FIG. 2, the CPU detects through motion detection that the person is moving, and therefore identifies region 2, where the person is located, as a motion region, that is, a second ROI.
It can be understood that the processor 101 is further configured to, for example, run general operating system software and, under the control of the general operating system software, control the neural network processor 102 and the image signal processor 103 to perform focusing, for example, sending the first image generated by the image signal processor 103 to the neural network processor 102 to obtain the first ROI set, and receiving the first ROI set obtained by the neural network processor 102. Further, the processor 101 is also configured to complete the computation and control related to the focusing process.

Optionally, the above neural network processor may be integrated into the processor 101 as a part of the processor 101, or may be another functional chip coupled to the processor 101 and capable of obtaining the first ROI set. Similarly, the functions performed by the processor 101 may also be distributed across multiple different functional chips; this is not specifically limited in the embodiments of the present invention.
Referring to FIG. 3 and FIG. 4, FIG. 3 is a schematic structural diagram of another focusing apparatus according to an embodiment of the present invention, and FIG. 4 is a schematic diagram of the functional principle of a focusing apparatus according to an embodiment of the present invention. The focusing apparatus 10 may include a processor 101, a neural network processor 102 and an image signal processor 103 coupled to the processor 101, and a lens 104, an image sensor 105, and a voice coil motor (VCM) 106 for focusing coupled to the image signal processor 103, where:

The lens 104 is configured to focus optical information of the real world onto the image sensor through the principle of optical imaging. For example, the lens 104 may be a rear camera, a front camera, a rotating camera, or the like of a terminal (such as a smartphone).

The image sensor 105 is configured to output image data based on the optical information collected by the lens 104, and provide the image data to the image signal processor 103 to generate a corresponding image signal.

The focus motor 106 may include a mechanical structure and is configured to perform static or dynamic focusing based on the position information and size information of the target ROI determined by the processor 101. For example, if the processor 101 recognizes that the target ROI is stationary, the processor 101 controls the focus motor 106 to perform static focusing; if the processor 101 recognizes that the target ROI is moving, the processor 101 controls the focus motor 106 to perform dynamic focusing.
It can be understood that, for the functions of the processor 101, the neural network processor 102, and the image signal processor 103, reference may be made to the related description of FIG. 1 above; details are not repeated here.

Optionally, the focusing apparatus in FIG. 1 or FIG. 3 may be located in a terminal (such as a smartphone, a tablet, or a smart wearable device), a smart photographing device (a smart camera, a smart video camera, or a smart tracking device), a smart surveillance device, an aerial drone, or the like; this application does not enumerate them one by one.

In the embodiments of the present invention, on an image frame generated by the ISP in the focusing apparatus of FIG. 1 or FIG. 3, the NPU performs AI object detection to obtain one or more candidate subjects, and the processor performs moving-object detection to obtain one or more candidate motion regions; the detected subjects and motion regions are then combined to determine the target ROI to be focused on, and subsequent tracking focus is performed based on the feature information of that target ROI. That is, AI target detection and moving-target detection are used to automatically identify the target ROI in the field of view (FOV); a target ROI tracking algorithm then accurately computes the real-time motion trajectory and size of the target ROI; and finally the autofocus (AF) algorithm performs motion focus tracking according to the real-time motion trajectory of the target ROI. The entire process requires no manual user intervention, and the tracking focus is accurate, which greatly improves the shooting experience and results.
In a possible implementation, in the above focusing apparatus 10 (including the focusing apparatus in FIG. 1 and FIG. 3, not repeated below), the neural network processor 102 obtains the first ROI set in the first image in the following manner:

The neural network processor 102 uses an AI object detection algorithm to obtain the target object, i.e., the target ROI, in the picture (the first image). A general-purpose structure (such as the first several layers of ResNet18 or ResNet26) is used as the base network, and additional layers are added on top of it as the detection structure. The classification base model extracts low-level features of the image, ensuring that the low-level features are sufficiently discriminative; adding classifiers on shallow features can help improve classification performance. The detection part outputs, on feature maps at different levels, a series of discretized bounding boxes and, for each box, the probability (score) that it contains an object instance. Finally, a non-maximum suppression (NMS) algorithm is applied to obtain the final object prediction result. Further, the detection model may adopt the single-shot detection (SSD) framework. Referring to FIG. 5, which is a schematic diagram of an SSD network implementation process according to an embodiment of the present invention, the process may include the following main steps:
1. The main body adopts a one-stage detection structure, avoiding the large number of candidate target positions that enter the second stage in two-stage detectors such as Faster R-CNN, thereby greatly improving detection speed.

2. Multi-scale feature maps are used. With multi-scale features, each layer has a different receptive field, so the network can adapt to detecting targets of different sizes and achieve better performance.

3. Default boxes of different sizes and aspect ratios are used. The default boxes determine the initial positions of the final prediction boxes; by using different sizes and ratios, the network can adapt to subjects of different scales and shapes and provide optimal initial values, making predictions more accurate.
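The following is an illustrative sketch (not part of the original disclosure) of the greedy non-maximum suppression step that produces the final object predictions; the box format (x, y, w, h) and the IoU threshold of 0.5 are assumptions for illustration:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes: list of (x, y, w, h); scores: matching confidence scores.
    Returns the indices of the boxes that are kept.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the best box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping detections collapse to the higher-scoring one.
print(nms([(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 5, 5)], [0.9, 0.8, 0.7]))  # → [0, 2]
```

Each surviving index corresponds to one predicted object instance; all lower-scoring boxes that overlap it above the threshold are suppressed.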
Since the AI object detection algorithm runs on the NPU, considering power and performance constraints, a detection result may be output once every 10 frames. The object categories that can be detected and recognized include: flowers, people, cats, dogs, birds, bicycles, buses, motorcycles, trucks, cars, trains, boats, horses, kites, balloons, vases, bowls, plates, cups, and classic handbags. The object categories to which the subjects belong may be divided into four priority levels: people have the first priority, flowers the second, cats and dogs the third, and the remaining categories the fourth.
In a possible implementation, the processor 101 of the focusing apparatus 10 obtains the second ROI set in the first image in the following manner:

The processor 101 may use a moving-target detection algorithm to obtain the second ROI set. For example, the moving-target detection algorithm is run once every two frames, that is, the motion region in the current image is output every two frames; optionally, a motion speed level, a motion direction, and the like may further be output. As shown in FIG. 2, region 2 is the motion region output by the motion detection algorithm, i.e., the second ROI, and region 1 is the finally determined target ROI.
In a possible implementation, the processor 101 of the focusing apparatus 10 determines the target ROI in the first image based on the first ROI set and the second ROI set in the following manner: the processor 101 determines an effective first ROI from the one or more first ROIs in the first ROI set, and determines an effective second ROI from the one or more second ROIs in the second ROI set; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determines the effective first ROI as the target ROI, where the effective first ROI is within a first preset region of the first image, and the effective second ROI is within a second preset region of the first image. Further optionally, when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, the processor 101 determines, as the target ROI, whichever of the effective second ROI and the effective first ROI is closer to the center point of the first image. That is, when the overlapping area between the effective first ROI and the effective second ROI is large, both the subject detection and the motion-region detection likely contain the effective first region, so the effective first region can be used as the target ROI; when the overlapping area is small, the detection may be erroneous or the target ROI may have drifted, so the ROI closer to the center point can be selected as the target ROI. Optionally, the target ROI may also be selected according to other calculation rules, for example, combining the effective first ROI and the effective second ROI to obtain a new ROI; this application does not enumerate them one by one.
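The decision rule described above can be expressed as the following illustrative sketch (not part of the original disclosure); the IoU helper, the box format (x, y, w, h), and the threshold of 0.5 are assumptions for illustration:

```python
import math

def select_target_roi(roi_ai, roi_motion, image_center, iou_thresh=0.5):
    """Pick the target ROI from the effective AI ROI and effective motion ROI."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def dist_to_center(roi):
        cx, cy = roi[0] + roi[2] / 2, roi[1] + roi[3] / 2
        return math.hypot(cx - image_center[0], cy - image_center[1])

    if iou(roi_ai, roi_motion) >= iou_thresh:
        # Large overlap: both detectors agree, so trust the AI subject region.
        return roi_ai
    # Small overlap: possible misdetection or drift; prefer the ROI nearer the center.
    return min((roi_ai, roi_motion), key=dist_to_center)
```

For example, two boxes that mostly coincide return the AI box, while disjoint boxes return whichever is closer to the image center.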
Referring to FIG. 6, FIG. 6 is a schematic diagram of target ROI screening according to an embodiment of the present invention. For example, the first image (the field of view of the camera) displayed on the mobile phone screen in FIG. 6 has width "width" and height "height". For subject recognition, a first ROI is effective when it is within the first preset region; for example, the border width of the ineffective region for the first preset region is w1 = min(width, height) × 0.2, in which case ROI2 is effective while ROI0 and ROI1 are not. For motion-region recognition, a second ROI is effective when it is within the second preset region; for example, the border width of the ineffective region for the second preset region is w2 = min(width, height) × 0.1, in which case ROI1 and ROI2 are effective while ROI0 is not.
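The border-based validity check above can be written as the following illustrative sketch (not part of the original disclosure); treating an ROI as effective only when it lies entirely inside the central region is an assumption, since the containment rule is not spelled out here:

```python
def is_effective(roi, width, height, border_ratio):
    """Check whether an ROI (x, y, w, h) lies inside the central valid region.

    border_ratio is 0.2 for AI-detected first ROIs and 0.1 for motion ROIs,
    matching w = min(width, height) * ratio from the description above.
    """
    border = min(width, height) * border_ratio
    x, y, w, h = roi
    return (x >= border and y >= border and
            x + w <= width - border and y + h <= height - border)

# 1080x1920 portrait frame: the border is min(1080, 1920) * 0.2 = 216 px.
print(is_effective((300, 400, 200, 200), 1080, 1920, 0.2))  # → True
print(is_effective((100, 400, 200, 200), 1080, 1920, 0.2))  # → False
```

The stricter 0.2 ratio for subject detection discards boxes near the edges, where detections are less likely to be the intended focus target.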
Further optionally, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs. That is, when multiple ROIs remain after screening by the corresponding preset region, the ROI most likely to be the tracking-focus target can be selected according to its area, its distance from the center of the first image, and the priority of the category of its subject. For example, an ROI that is closer to the center, larger in area, and of category "person" is more likely to be chosen as the tracking target ROI. As another example, the priorities of different object categories may be set according to the current shooting mode: in portrait mode, people have the highest priority; in landscape mode, plants or buildings have the highest priority; and so on.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of target ROI determination according to an embodiment of the present invention. In FIG. 7, AI object detection is performed by the NPU to obtain the first ROI set, and moving-target detection is performed by the CPU to obtain the second ROI set. Since multiple first ROIs and second ROIs may be detected at this stage, and the recognition precision and accuracy are relatively low, some ROIs do not need to be focused on (for example, flowers in the background of the shot, or a moving object that unintentionally strays into the background). Therefore, screening by the CPU is required. First, the processor 101 checks whether the first ROIs in the first ROI set and the second ROIs in the second set are effective. For the AI object detection branch and/or the motion-region detection branch: when there is only one ROI, that ROI is output directly; when there are multiple targets, the different targets can be comprehensively scored according to the following factors: 1. the priority (priority) of the object category of the subject in each ROI; 2. the size (area) of each ROI; 3. the distance (dist) of each ROI from the center of the picture. The comprehensive score is Score = 0.4 × priority + 0.4 × area + 0.2 / dist, and the ROI with the highest score is selected as the effective ROI of that branch. Finally, the target ROI is determined according to the IoU between the effective first ROI and the effective second ROI.
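The branch scoring above can be written as the following illustrative sketch (not part of the original disclosure); the numeric priority encoding and the use of raw pixel area are assumptions, since only the weighting formula is given (in practice, priority, area, and dist would typically be normalized to comparable ranges):

```python
import math

def pick_effective_roi(rois, image_center):
    """Select a branch's effective ROI by Score = 0.4*priority + 0.4*area + 0.2/dist.

    Each ROI is a dict with 'box' = (x, y, w, h) and 'priority'
    (assumed encoding: 4 for person down to 1 for the lowest class).
    """
    if len(rois) == 1:
        return rois[0]  # A single candidate is output directly.

    def score(roi):
        x, y, w, h = roi["box"]
        cx, cy = x + w / 2, y + h / 2
        dist = math.hypot(cx - image_center[0], cy - image_center[1])
        area = w * h
        # Guard against a box sitting exactly on the center point.
        return 0.4 * roi["priority"] + 0.4 * area + 0.2 / max(dist, 1e-6)

    return max(rois, key=score)
```

With this encoding, a larger, higher-priority box nearer the center wins the branch and is passed on to the IoU comparison between the two branches.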
Optionally, in addition to the above method of determining the target ROI, the focusing apparatus 10 in the embodiments of the present invention may also combine other preset strategies to provide different target ROI determination methods in different scenarios. For example, the preset strategies may include: 1) user designation first; 2) AI object detection first; 3) motion detection first; 4) joint selection by object detection and motion detection; and so on.
In a possible implementation, the feature information of the target ROI determined by the processor 101 of the focusing apparatus 10 includes one or more of histogram-of-oriented-gradients (HOG) information, color Lab information, and convolutional neural network (CNN) information: for example, only the HOG information extracted by the processor 101, only the color Lab information extracted by the processor 101, or only the CNN information extracted by the neural network processor 102, or any two of the three, or a combination of all three. It should be emphasized that the HOG information and color Lab information can be extracted by the processor 101, whereas the CNN information is extracted by the neural network processor 102 and then sent by the neural network processor 102 to the processor 101.
在一种可能的实现方式中,所述处理器101还基于所述目标ROI在历史图像中的位置和大小所对应的特征信息更新所述目标ROI的特征信息。在另一种可能的实现方式中,所述目标ROI的特征信息是根据所述目标ROI对应的第一图像的特征信息和至少一个第三图像的特征信息确定的,所述至少一个第三图像在时域上位于第一图像和第二图像之间。也即是上述对焦装置10中的处理器10在根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息的过程中,将目标ROI在第一图像中的特征信息作为初始的特征信息,后续还基于所述目标ROI在跟踪过程中每一帧图像中的位置和大小所对应的特征信息更新所述初始的特征信息,以保证跟踪目标ROI 的精准性。进一步地,处理器101在第一预设时间段后,重新计算所述目标ROI;或者当所述目标ROI的跟踪置信度小于置信度阈值的情况下,重新计算所述目标ROI,其中,所述跟踪置信度用于指示所述目标ROI的跟踪精确度,所述跟踪置信度与跟踪精确度成正比。本发明实施例中,处理器101不仅要基于目标ROI的跟踪情况实时的更新特征信息,以更加精准的跟踪对焦,而且更新的特征信息具有时效性,当较长一段时间之后,或者当前跟踪的目标ROI置信度低的时候,就需要考虑初始化相关参数,进行新一轮的目标ROI的确认及跟踪。In a possible implementation manner, the processor 101 further updates the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image. In another possible implementation manner, the feature information of the target ROI is determined according to the feature information of the first image corresponding to the target ROI and the feature information of at least one third image, the at least one third image Located between the first image and the second image in the time domain. That is, the processor 10 in the focusing device 10 is in the process of identifying the position information and the size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI. Taking the feature information of the target ROI in the first image as initial feature information, and subsequently updating the initial feature information based on the feature information corresponding to the position and size of the target ROI in each frame of the image during the tracking process, To ensure the accuracy of tracking the target ROI. 
Further, the processor 101 recalculates the target ROI after a first preset time period, or recalculates the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is directly proportional to it. In this embodiment of the present invention, the processor 101 not only updates the feature information in real time based on how the target ROI is being tracked, so as to track and focus more accurately, but the updated feature information also ages: after a long period of time, or when the confidence of the currently tracked target ROI is low, the relevant parameters need to be reinitialized and a new round of target-ROI determination and tracking performed.
请参见图8,图8为本发明实施例所提供的目标ROI跟踪流程示意图。在目标ROI的特征提取完成后,处理器101根据预设规则选择某一种特征或者多种特征的组合以确定特征信息,经过规则判断后确定是否初始化跟踪器,如果不需要初始化跟踪器则直接进入跟踪计算,输出目标ROI的位置信息和大小信息,并输出目标的位置可能的响应图,最后基于目标ROI新的位置和大小更新特征信息等,主要可以包括如下几个步骤:Please refer to FIG. 8, which is a schematic diagram of a target ROI tracking process according to an embodiment of the present invention. After the feature extraction of the target ROI is completed, the processor 101 selects a certain feature or a combination of multiple features to determine the feature information according to a preset rule, and determines whether to initialize the tracker after the rule judgment. If the tracker does not need to be initialized, directly Enter the tracking calculation, output the position and size information of the target ROI, and output a possible response map of the target's position, and finally update the feature information based on the new position and size of the target ROI, which can mainly include the following steps:
1、特征选择:这部分可以根据不同需求选择不同的特征组合,例如单独采用hog特征,或者hog+lab+cnn组合使用;1. Feature selection: This part can choose different feature combinations according to different needs, such as using the hog feature alone, or a combination of hog + lab + cnn;
2、是否初始化?:2. Is it initialized? :
1)开始,启动跟踪系统,初始化跟踪器;1) Start, start the tracking system, and initialize the tracker;
2)基于跟踪后处理得到的置信度，当mConfidence<0.2，并且主体目标选择模块输出新的ROI时，需要重新初始化跟踪器；2) Based on the confidence obtained from the tracking post-processing, when mConfidence < 0.2 and the subject-target selection module outputs a new ROI, the tracker needs to be re-initialized;
3、跟踪后处理:3. Post-processing after tracking:
1)通过跟踪计算模块后，跟踪计算算法采用相关滤波算法，例如KCF(Kernel Correlation Filter)、ECO(Efficient Convolution Operators)等，针对每一帧图像输出的响应图为w×h的浮点二维数组F[w][h]，可以记为F_{w,h}，已归一化到0到1.0范围内；其中，响应图反映目标ROI在画面中位置的可能分布，最大点即为目标ROI所在的位置，通过响应图可以反映目标ROI跟踪的置信度水平。1) After the tracking calculation module, the tracking algorithm uses a correlation-filter algorithm such as KCF (Kernel Correlation Filter) or ECO (Efficient Convolution Operators). For each frame it outputs a w×h floating-point two-dimensional response map F[w][h], denoted F_{w,h}, normalized to the range 0 to 1.0. The response map reflects the likely distribution of the target ROI's position in the frame; the maximum point is where the target ROI is located, and the response map therefore reflects the confidence level of the target-ROI tracking.
2)置信度分析:2) Confidence analysis:
(a)依据响应图计算最大值Fmax作为当前帧的跟踪置信度;(a) Calculate the maximum value Fmax according to the response graph as the tracking confidence of the current frame;
Confidence=max(F[w][h]);Confidence = max (F [w] [h]);
(b)平均相关峰能量指标为average peak-to-correlation energy(APCE)，其中(b) The average peak-to-correlation energy indicator is APCE, where
APCE = |F_max - F_min|^2 / mean( Σ_{w,h} (F_{w,h} - F_min)^2 )
其中，F_max为max(F[w][h])，即为F[w][h]的最大取值；F_min为min(F[w][h])，即为F[w][h]的最小取值；Σ_{w,h}(F_{w,h}-F_min)^2表示遍历F_{w,h}的每一个值，与最小值相减再做平方运算，最终求和。该指标可用于表征：当计算出的该指标的值与历史平均值相比急剧下降时，就代表当前帧的目标ROI的位置和大小不可信，例如目标ROI被遮挡或者丢失等。Here F_max is max(F[w][h]), i.e. the maximum value of F[w][h]; F_min is min(F[w][h]), i.e. its minimum value; Σ_{w,h}(F_{w,h}-F_min)^2 means traversing every value of F_{w,h}, subtracting the minimum, squaring, and finally summing. This indicator can be used as follows: when its computed value drops sharply compared with the historical average, the position and size of the target ROI in the current frame are unreliable, for example because the target ROI is occluded or lost.
(c)计算每一次跟踪过程中的平均置信度AverageConfidence和平均相关峰能量AverageApce；假设当前帧为第N帧，则当前帧的AverageConfidence和AverageApce为：(c) Calculate the average confidence AverageConfidence and the average correlation-peak energy AverageApce during each tracking process; assuming the current frame is the Nth frame, the AverageConfidence and AverageApce of the current frame are:
AverageConfidence_N = (1/N) Σ_{i=1}^{N} Confidence_i

AverageApce_N = (1/N) Σ_{i=1}^{N} APCE_i
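The per-frame confidence, the APCE indicator, and their running averages described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names are assumptions.

```python
def confidence(F):
    # Confidence = max(F[w][h]); F is already normalized to [0, 1.0]
    return max(max(row) for row in F)

def apce(F):
    # APCE = |F_max - F_min|^2 / mean_{w,h}((F_{w,h} - F_min)^2)
    flat = [v for row in F for v in row]
    f_max, f_min = max(flat), min(flat)
    denom = sum((v - f_min) ** 2 for v in flat) / len(flat)
    return (f_max - f_min) ** 2 / denom if denom else 0.0

def running_average(values):
    # AverageConfidence / AverageApce over the first N frames
    return sum(values) / len(values)
```

A sharp drop of `apce(F)` relative to its running average would then flag an occluded or lost target, as the text describes.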
3)目标ROI特征信息更新策略:3) Target ROI feature information update strategy:
请参见图9,图9为本发明实施例所提供的一种目标ROI跟踪示意图,其中,如图9的a部分所示,目标ROI初始位置在1,在画面中从1到6的运动过程中,目标跟踪算法模块实时输出目标在每一帧中的位置和大小。这时候跟踪置信度较高,需要实时更新目标ROI的特征信息。Please refer to FIG. 9, which is a schematic diagram of target ROI tracking provided by an embodiment of the present invention. As shown in part a of FIG. 9, the initial position of the target ROI is 1, and the movement process from 1 to 6 in the picture The target tracking algorithm module outputs the position and size of the target in each frame in real time. At this time, the tracking confidence is high, and the feature information of the target ROI needs to be updated in real time.
如图9的b部分所示，目标ROI在2和4的位置发生遮挡丢失时，算法输出置信度较低，不满足特征信息更新条件，这时候不能更新目标ROI的特征信息，否则特征信息会学习到背景或其它干扰物的特征，因此需要等到目标ROI重新出现时才能继续更新特征信息。As shown in part b of Fig. 9, when the target ROI is occluded or lost at positions 2 and 4, the confidence output by the algorithm is low and the feature-update condition is not met. The feature information of the target ROI must not be updated at this point, otherwise the model would learn the features of the background or of other distractors; updating can only resume once the target ROI reappears.
本发明实施例中，处理器101根据第一图像所确定的目标ROI作为初始ROI输入，通过特征提取、特征选择、跟踪计算后，实时计算目标ROI在后续每一帧图像(包括第一图像)中的位置和大小。其中，判断特征信息是否更新的依据如下：In this embodiment of the present invention, the processor 101 uses the target ROI determined from the first image as the initial ROI input and, after feature extraction, feature selection, and tracking calculation, computes in real time the position and size of the target ROI in each subsequent frame (including the first image). The basis for deciding whether to update the feature information is as follows:
计算当前帧的跟踪置信度为:mConfidence;Calculate the tracking confidence of the current frame as: mConfidence;
计算历史平均置信度:mHistoryAverageConfidence;Calculate historical average confidence: mHistoryAverageConfidence;
计算当前帧的相关峰能量:mApce;Calculate the correlation peak energy of the current frame: mApce;
计算历史平均相关峰能量:mHistoryAverageApce;Calculate the historical average correlation peak energy: mHistoryAverageApce;
①如果满足以下条件公式,则为满足特征信息更新条件,更新特征信息:① If the following conditional formula is satisfied, the feature information is updated in order to satisfy the feature information update condition:
mConfidence>0.7×mHistoryAverageConfidence且mApce>0.45×mHistoryAverageApce；mConfidence > 0.7 × mHistoryAverageConfidence and mApce > 0.45 × mHistoryAverageApce;
②如果不满足上述条件公式,且mConfidence>0.2,则为满足目标ROI特征信息不更新条件,即当前图像帧的特征信息不会参与到目标ROI特征信息的更新,以优化跟踪系统,避免目标ROI跟踪漂移;② If the above conditional formula is not satisfied, and mConfidence> 0.2, then the target ROI feature information is not updated, that is, the feature information of the current image frame will not participate in the update of the target ROI feature information to optimize the tracking system and avoid the target ROI Tracking drift
③如果mConfidence<0.2，并且处理器101输出新的ROI时(例如当处理器101每10帧输出一次新的目标ROI)，则此时可以触发处理器101重新确定目标ROI(包括NPU重新获取第一ROI集合以及CPU重新获取第二ROI集合)，也即是重新完成跟踪的初始化更新。③ If mConfidence < 0.2 and the processor 101 outputs a new ROI (for example, when the processor 101 outputs a new target ROI every 10 frames), the processor 101 can be triggered to re-determine the target ROI (including the NPU reacquiring the first ROI set and the CPU reacquiring the second ROI set), i.e. the tracking initialization is performed again.
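The three-way decision in ①-③ above can be sketched as one function. The thresholds (0.7, 0.45, 0.2) are those stated in the text; the function and return labels are illustrative assumptions, not the patent's API.

```python
def update_decision(m_conf, m_apce, hist_avg_conf, hist_avg_apce):
    # Condition (1): both confidence and APCE are high enough -> update features
    if m_conf > 0.7 * hist_avg_conf and m_apce > 0.45 * hist_avg_apce:
        return "update"
    # Condition (2): update condition failed but confidence is acceptable
    # -> keep the last feature model, skip this frame's update
    if m_conf > 0.2:
        return "keep"
    # Condition (3): low confidence -> re-determine the target ROI (reinitialize)
    return "reinitialize"
```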
4)实时目标信息输出:4) Real-time target information output:
通过跟踪算法模块后实时输出目标ROI的位置信息和大小信息，如下图的主体目标，对位置做约束处理：绿框为目标静止时的有效范围，这时候输出给AF算法做稳定对焦；红色虚线框为目标运动时的有效范围，这时候实时输出给AF算法做运动追焦。After the tracking algorithm module, the position and size information of the target ROI is output in real time, as for the subject target in the figure below, and the position is constrained: the green box is the effective range when the target is stationary, in which case the output is fed to the AF algorithm for stable focusing; the red dotted box is the effective range when the target is moving, in which case the output is fed in real time to the AF algorithm for motion focus tracking.
请参见图10,图10为本发明实施例提供的一种目标ROI的特征信息更新示意图。假设第一预设时间段内图像信号处理器103生成n帧图像,图10中以n=10为例,其中第1帧则可以对应本申请中的第一图像,第二图像则可以为后续第2、3、4……10帧图像中的任意一帧。具体地,Please refer to FIG. 10, which is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention. Assume that the image signal processor 103 generates n frames of images in the first preset time period. In FIG. 10, n = 10 is taken as an example, where the first frame may correspond to the first image in this application, and the second image may be a subsequent image. Any one of the 2nd, 3rd, 4th ... 10th image. specifically,
图10中，第1帧(第一图像)经过处理器101确定第一ROI集合和第二ROI集合，再确定目标ROI之后，提取该目标ROI的特征信息，即为图10中的特征信息A，也是作为目标ROI的初始识别特征信息；当图像信号处理器生成第2帧图像时，先获取该第2帧图像的特征信息B；其中，获取特征信息B的方式可以是，基于目标ROI在第1帧图像中的位置和大小，提取该位置和大小在第2帧图像中所对应区域的特征信息，即为特征信息B，后续图像帧提取对应帧的目标ROI特征信息的原理相同，不再赘述。然后处理器101将特征信息B与特征信息A进行关联比对，从而确定第1帧图像中所确定的目标ROI在第2帧图像中的位置和大小；与此同时根据特征信息A和特征信息B确定第2帧是否满足特征信息更新条件。In FIG. 10, for the 1st frame (the first image), the processor 101 determines the first ROI set and the second ROI set, then determines the target ROI and extracts its feature information, namely feature information A in FIG. 10, which also serves as the initial identifying feature information of the target ROI. When the image signal processor generates the 2nd frame, the feature information B of that frame is first obtained; one way to obtain it is to take the position and size of the target ROI in the 1st frame and extract the feature information of the corresponding region in the 2nd frame, which is feature information B. Subsequent frames extract the target-ROI feature information of their own frame on the same principle, which is not repeated here. The processor 101 then correlates feature information B with feature information A to determine the position and size, in the 2nd frame, of the target ROI determined in the 1st frame, and at the same time determines from feature information A and feature information B whether the 2nd frame meets the feature-update condition.
If the feature-update condition is met, the feature information is updated using the formula A' = (k1 × A + k2 × B). If the update condition is not met but the initialization-restart condition is not met either, the most recently updated feature information continues to be used as the comparison model; likewise, when the initialization-restart condition is met but the specified time point (i.e. the time at which the processor 101 outputs a new target ROI) has not yet been reached, the most recently updated feature information also continues to be used as the comparison model. However, if the initialization-restart condition is met and the specified time point is reached, the target ROI newly output by the processor 101 can be used to start a new round of target-ROI tracking calculation. Optionally, k1 = 0.988 and k2 = 0.012 in the feature-update formula. This application does not specifically limit the condition for updating the feature information or the update formula.
例如，图10中，第4帧图像中确定目标ROI的特征信息D，经过将第3帧更新得到的特征信息A”与特征信息D进行关联计算之后，判断出当前第4帧图像不满足特征信息更新条件(例如，此时目标ROI在第4帧被遮挡或漂移较大)。因此，第4帧的特征信息D不参与后续的特征信息更新，需要沿用第3帧所更新的特征信息，也即是在第5帧确定了特征信息E之后，仍然与第3帧更新的特征信息进行关联计算。进一步地，假设特征信息E与第3帧更新的特征信息A”进行关联计算之后，判断出满足初始化重启条件，则需要进一步判断处理器101是否输出新的目标ROI(也可以认为判断是否达到第一预设时间段)，直到处理器101输出新的目标ROI，再进行初始化。例如图10中，需要等到第11帧再重新进行目标ROI的确定，也相当于初始化了特征信息。以下为图10中每一帧图像的特征信息更新的流程：For example, in FIG. 10, the feature information D of the target ROI is determined in the 4th frame; after correlating the feature information A'' obtained from the 3rd-frame update with feature information D, it is determined that the current 4th frame does not meet the feature-update condition (for example, the target ROI is occluded or drifts significantly in the 4th frame). The feature information D of the 4th frame therefore takes no part in subsequent feature updates, and the feature information updated in the 3rd frame continues to be used; that is, after the feature information E is determined in the 5th frame, it is still correlated against the feature information updated in the 3rd frame. Further, assuming that correlating feature information E with the 3rd-frame feature information A'' shows that the initialization-restart condition is met, it is then necessary to judge whether the processor 101 outputs a new target ROI (which can also be regarded as judging whether the first preset time period has been reached), and initialization takes place only once the processor 101 outputs a new target ROI. For example, in FIG. 10, the target ROI is re-determined only at the 11th frame, which is also equivalent to initializing the feature information. The following is the feature-update flow for each frame in FIG. 10:
第1帧图像:特征信息AFirst frame image: Feature information A
第2帧图像:特征信息B→更新→特征信息A'=(k1×A+k2×B)Image of the second frame: feature information B → update → feature information A '= (k1 × A + k2 × B)
第3帧图像:特征信息C→更新→特征信息A”=(k1×A'+k2×C)Image of the third frame: feature information C → update → feature information A ”= (k1 × A ′ + k2 × C)
第4帧图像:特征信息D→未更新→特征信息A”=(k1×A'+k2×C)Image of the fourth frame: feature information D → not updated → feature information A ”= (k1 × A ′ + k2 × C)
第5帧图像:特征信息E→未更新(满足初始化重启条件)→特征信息A”=(k1×A'+k2×C)5th frame image: feature information E → not updated (initial restart conditions are met) → feature information A ”= (k1 × A '+ k2 × C)
第6帧图像:…… Frame 6 image: ...
第7帧图像:…… Frame 7 image: ...
第8帧图像:……Frame 8 image: ...
第9帧图像:……Frame 9 image: ...
第10帧图像:……Frame 10 image: ...
第11帧图像:重新计算特征信息AFrame 11 image: recalculate feature information A
……...
可以理解的是,针对图像信号处理器103生成的任意一帧图像均可以基于上述发明实施例进行跟踪对焦,并且进行特征信息的更新,在此不再穷举。It can be understood that, for any one frame image generated by the image signal processor 103, tracking and focusing can be performed based on the embodiment of the invention described above, and feature information is updated, which is not exhaustive here.
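The frame-by-frame model update listed above can be sketched as a simple blend: whenever a frame passes the update check, the model becomes A' = k1×A + k2×B with k1 = 0.988, k2 = 0.012; otherwise the last model is kept. A scalar stands in for the feature vector here, and the function names are hypothetical.

```python
K1, K2 = 0.988, 0.012  # blend weights from the text

def track_features(initial, frames, passes_check):
    # initial: feature info A from the 1st frame
    # frames: per-frame feature info (B, C, D, ...) as scalars
    # passes_check(feat, model): True if the frame meets the update condition
    model = initial
    for feat in frames:
        if passes_check(feat, model):
            model = K1 * model + K2 * feat  # A' = k1*A + k2*B
        # otherwise the most recent model is carried forward unchanged
    return model
```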
在一种可能的实现方式中，处理器101进入目标ROI跟踪对焦流程时，依据实时的目标ROI信息，判断当前目标ROI的运动状态，当目标处于静止状态时，进入稳定的目标ROI对焦，当目标ROI处于运动状态时，进入目标ROI跟踪对焦。例如，对于AF算法而言，使用目标检测算法+运动检测算法+Tracking算法可以解决跟踪目标运动时没有ROI信息以及目标静止后ROI丢失这两大问题。在利用Tracking算法实时处理每帧图像输出ROI信息的情况下，AF算法可以直接根据ROI窗进行运动追焦，而当运动目标静止时，可以进行稳定对焦，从而解决目标不在中心时的焦点选择问题。In a possible implementation, when the processor 101 enters the target-ROI tracking-and-focusing flow, it judges the motion state of the current target ROI from the real-time target-ROI information: when the target is stationary it enters stable target-ROI focusing, and when the target ROI is moving it enters target-ROI tracking focus. For example, for the AF algorithm, combining a target detection algorithm, a motion detection algorithm, and a tracking algorithm solves two major problems: the absence of ROI information while the target is moving, and the loss of the ROI once the target becomes stationary. With the tracking algorithm processing every frame and outputting ROI information in real time, the AF algorithm can perform motion focus tracking directly from the ROI window, and when the moving target comes to rest it can perform stable focusing, solving the focus-selection problem when the target is not at the center.
基于图1和图3中对对焦装置10的结构描述,图11是本发明实施例提供的一种神经网络处理器硬件结构图,其中,Based on the structural description of the focusing device 10 in FIG. 1 and FIG. 3, FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
神经网络处理器NPU 102作为协处理器挂载到CPU(如Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1203,通过控制器1204控制运算电路1203提取存储器中的矩阵数据并进行乘法运算。The neural network processor NPU 102 is mounted on the CPU (such as Host CPU) as a coprocessor, and the Host CPU assigns tasks. The core part of the NPU is an arithmetic circuit 1203. The controller 1204 controls the arithmetic circuit 1203 to extract matrix data in the memory and perform multiplication operations.
在一些实现中,运算电路1203内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1203是二维脉动阵列。运算电路1203还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1203是通用的矩阵处理器。In some implementations, the arithmetic circuit 1203 includes multiple processing units (Process Engines, PEs). In some implementations, the arithmetic circuit 1203 is a two-dimensional pulsating array. The arithmetic circuit 1203 may also be a one-dimensional pulsation array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1203 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1202中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1201中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器1208 accumulator中。For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the data corresponding to the matrix B from the weight memory 1202, and buffers it on each PE in the operation circuit. The arithmetic circuit takes matrix A data from the input memory 1201 and performs matrix operations on the matrix B. Partial or final results of the obtained matrix are stored in the accumulator 1208 accumulator.
统一存储器1206用于存放输入数据以及输出数据。权重数据通过存储单元访问控制器12012(Direct Memory Access Controller，DMAC)直接被搬运到权重存储器1202中。输入数据也通过DMAC被搬运到统一存储器1206中。The unified memory 1206 stores input data and output data. Weight data is moved into the weight memory 1202 directly by the memory-unit access controller 12012 (Direct Memory Access Controller, DMAC). Input data is likewise moved into the unified memory 1206 by the DMAC.
BIU为Bus Interface Unit即,总线接口单元1210,用于AXI总线与DMAC和取指存储器1209 Instruction Fetch Buffer的交互。BIU stands for Bus Interface Unit, that is, the bus interface unit 1210, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch memory 1209.
总线接口单元1210(Bus Interface Unit,简称BIU),用于取指存储器1209从外部存储器获取指令,还用于存储单元访问控制器12012从外部存储器获取输入矩阵A或者权重矩阵B的原数据。The bus interface unit 1210 (Bus Interface Unit, referred to as BIU) is used to fetch the instruction memory 1209 to obtain instructions from external memory, and is also used for the storage unit access controller 12012 to obtain the original data of the input matrix A or weight matrix B from the external memory.
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1206或将权重数据搬运到权重存储器1202中或将输入数据数据搬运到输入存储器1201中。The DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1206 or the weight data to the weight memory 1202 or the input data data to the input memory 1201.
向量计算单元1207包括多个运算处理单元，在需要的情况下，对运算电路的输出做进一步处理，如向量乘、向量加、指数运算、对数运算、大小比较等等。主要用于神经网络中非卷积/FC层网络计算，如Pooling(池化)、Batch Normalization(批归一化)、Local Response Normalization(局部响应归一化)等。The vector calculation unit 1207 includes multiple arithmetic processing units and, when needed, further processes the output of the arithmetic circuit, e.g. vector multiplication, vector addition, exponentiation, logarithm, and magnitude comparison. It is mainly used for non-convolution/FC-layer computation in neural networks, such as Pooling, Batch Normalization, and Local Response Normalization.
在一些实现中，向量计算单元1207能将经处理的输出的向量存储到统一存储器1206。例如，向量计算单元1207可以将非线性函数应用到运算电路1203的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元1207生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路1203的激活输入，例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 1207 can store the processed output vector into the unified memory 1206. For example, the vector calculation unit 1207 may apply a non-linear function to the output of the arithmetic circuit 1203, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1207 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1203, for example for use in subsequent layers of the neural network.
控制器1204连接的取指存储器(instruction fetch buffer)1209,用于存储控制器1204使用的指令;An instruction fetch memory 1209 connected to the controller 1204 is used to store instructions used by the controller 1204;
统一存储器1206,输入存储器1201,权重存储器1202以及取指存储器1209均为On-Chip存储器。外部存储器私有于该NPU硬件架构。The unified memory 1206, the input memory 1201, the weight memory 1202, and the fetch memory 1209 are all On-Chip memories. External memory is private to the NPU hardware architecture.
可以理解的是，图1和图3所述的关于NPU获取第一ROI集合，以及目标ROI的CNN特征提取等相关功能，均由上述NPU中相关的功能单元进行实现，在此不再赘述。It can be understood that the related functions described in FIG. 1 and FIG. 3, such as the NPU obtaining the first ROI set and the CNN feature extraction of the target ROI, are implemented by the corresponding functional units in the NPU described above, and are not repeated here.
请参见图12,图12是本发明实施例提供的一种对焦方法的流程示意图,该对焦方法,适用于上述图1和图3中的任意一种对焦装置以及包含所述对焦装置的设备。该方法可以包括以下步骤S201-步骤S205。Please refer to FIG. 12. FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention. The focusing method is applicable to any one of the focusing devices in FIG. 1 and FIG. 3 and a device including the focusing device. The method may include the following steps S201-S205.
步骤S201:确定第一感兴趣区域ROI集合和第二ROI集合,所述第一ROI集合为从图像信号处理器生成的第一图像中获取的ROI集合,所述第一ROI集合包括一个或者多个第一ROI,每个第一ROI中包括一个拍摄对象;所述第二ROI集合为从所述第一图像中获取的ROI集合,所述第二ROI集合包括一个或多个第二ROI,每个第二ROI为运动区域;Step S201: Determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, and the first ROI set includes one or more First ROIs, each of which includes a photographic subject; the second ROI set is a ROI set obtained from the first image, and the second ROI set includes one or more second ROIs, Each second ROI is a motion area;
步骤S202:基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI;Step S202: determine a target ROI in the first image based on the first ROI set and the second ROI set;
在一种可能的实现方式中,所述基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI,包括:In a possible implementation manner, the determining a target ROI in the first image based on the first ROI set and the second ROI set includes:
从所述第一ROI集合中的一个或者多个第一ROI中确定有效第一ROI,所述有效第一ROI在所述第一图像的预设区域内;Determining a valid first ROI from one or more first ROIs in the first ROI set, where the valid first ROI is within a preset area of the first image;
从所述第二ROI集合中的一个或者多个第二ROI中确定有效第二ROI,所述有效第二ROI在所述第一图像的预设区域内;Determining an effective second ROI from one or more second ROIs in the second ROI set, where the effective second ROI is within a preset area of the first image;
在所述有效第一ROI与所述有效第二ROI的交并比IoU大于或者等于预设阈值的情况下,将所述有效第一ROI确定为目标ROI。In a case where the intersection ratio of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
在一种可能的实现方式中,所述方法还包括:In a possible implementation manner, the method further includes:
在所述有效第一ROI与所述有效第二ROI的交并比IoU小于预设阈值的情况下,将所述有效第二ROI与所述有效第一ROI中距离所述第一图像中心点更近的ROI确定为目标ROI。In a case where the intersection ratio IoU of the effective first ROI and the effective second ROI is less than a preset threshold, the effective second ROI and the effective first ROI are distanced from the first image center point. The more recent ROI is determined as the target ROI.
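The two selection rules above (IoU at or above the threshold → take the effective first ROI; otherwise → take whichever effective ROI is closer to the image center) can be sketched as follows. ROIs are assumed to be (x, y, w, h) boxes and the threshold value is an illustrative placeholder; none of these names come from the patent.

```python
def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def select_target_roi(roi1, roi2, image_center, threshold=0.5):
    # roi1: effective first ROI (subject box); roi2: effective second ROI (motion box)
    if iou(roi1, roi2) >= threshold:
        return roi1
    def dist2(roi):
        cx, cy = roi[0] + roi[2] / 2, roi[1] + roi[3] / 2
        return (cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2
    return min((roi1, roi2), key=dist2)  # closer to the image center wins
```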
在一种可能的实现方式中,所述有效第一ROI在所述第一图像的预设区域内的一个或者多个第一ROI中具有最高评估分值;和/或所述有效第二ROI在所述第一图像的预设区域内的一个或者多个第二ROI中具有最高评估分值;其中,每个ROI的评估分值满足如下至少一项:与该ROI的面积成正比,与该ROI距所述第一图像的中心点的距离成反比,与该ROI所属的物体类别的优先级成正比。In a possible implementation manner, the effective first ROI has a highest evaluation score in one or more first ROIs within a preset area of the first image; and / or the effective second ROI The one or more second ROIs within the preset region of the first image have the highest evaluation score; wherein the evaluation score of each ROI satisfies at least one of the following: proportional to the area of the ROI, and The distance of the ROI from the center point of the first image is inversely proportional to the priority of the object category to which the ROI belongs.
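The text only states that the evaluation score is proportional to the ROI's area and class priority and inversely proportional to its distance from the image center; the concrete weighting below is therefore an assumption, shown only to make the scoring rule concrete.

```python
def roi_score(roi, image_center, class_priority):
    # roi: (x, y, w, h); class_priority: larger = higher-priority object class
    x, y, w, h = roi
    area = w * h
    cx, cy = x + w / 2, y + h / 2
    dist = ((cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2) ** 0.5
    # proportional to area and priority, inversely proportional to center distance
    return area * class_priority / (1.0 + dist)
```

The effective first (or second) ROI would then be the `argmax` of this score over the candidate ROIs inside the preset region.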
步骤S203:确定所述目标ROI的特征信息;Step S203: determine the characteristic information of the target ROI;
在一种可能的实现方式中,所述特征信息包括方向梯度hog信息、颜色lab信息、卷积神经网络CNN信息中的一项或者多项。In a possible implementation manner, the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
在一种可能的实现方式中,还基于所述目标ROI在历史图像中的位置和大小所对应的特征信息更新所述目标ROI的特征信息。In a possible implementation manner, the feature information of the target ROI is also updated based on the feature information corresponding to the position and size of the target ROI in the historical image.
在一种可能的实现方式中,所述目标ROI的特征信息是根据所述目标ROI对应的第一图像的特征信息和至少一个第三图像的特征信息确定的,所述至少一个第三图像在时域上位于第一图像和第二图像之间。In a possible implementation manner, the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image. The at least one third image is Time domain is located between the first image and the second image.
步骤S204:根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息,所述第一图像在时域上位于所述第二图像之前;Step S204: Identify the position information and size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI, and the first image is located in the third region in the time domain. Before two images
步骤S205:根据所述位置信息和大小信息进行对焦。Step S205: Focus according to the position information and size information.
在一种可能的实现方式中,在第一预设时间段后,重新计算所述目标ROI;或者In a possible implementation manner, after the first preset time period, recalculate the target ROI; or
在一种可能的实现方式中,当所述目标ROI的跟踪置信度小于置信度阈值的情况下,重新计算所述目标ROI,其中,所述跟踪置信度用于指示所述目标ROI的跟踪精确度,所述跟踪置信度与跟踪精确度成正比。In a possible implementation manner, when the tracking confidence of the target ROI is less than a confidence threshold, the target ROI is recalculated, where the tracking confidence is used to indicate that the tracking of the target ROI is accurate The tracking confidence is directly proportional to the tracking accuracy.
需要说明的是，本发明实施例中所描述的对焦方法中的具体流程，可参见上述图1-图11中所述的发明实施例中的相关描述，此处不再赘述。It should be noted that, for the specific flow of the focusing method described in this embodiment of the present invention, reference may be made to the related descriptions in the embodiments described above in FIG. 1 to FIG. 11, which are not repeated here.
请参见图13,图13是本发明实施例提供的又一种对焦装置的结构示意图,该对焦装置30可包括第一处理单元301、第二处理单元302、第三处理单元303、识别单元304和对焦单元305,其中,Please refer to FIG. 13. FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention. The focusing device 30 may include a first processing unit 301, a second processing unit 302, a third processing unit 303, and a recognition unit 304. And focusing unit 305,
第一处理单元301,用于确定第一感兴趣区域ROI集合和第二ROI集合,所述第一ROI集合为从图像信号处理器生成的第一图像中获取的ROI集合,所述第一ROI集合包括一个或者多个第一ROI,每个第一ROI中包括一个拍摄对象;所述第二ROI集合为从所述第一图像中获取的ROI集合,所述第二ROI集合包括一个或多个第二ROI,每个第二ROI为运动区域;The first processing unit 301 is configured to determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, and the first ROI The set includes one or more first ROIs, and each first ROI includes a subject; the second ROI set is a ROI set obtained from the first image, and the second ROI set includes one or more Second ROIs, each second ROI is a motion area;
第二处理单元302,用于基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI;A second processing unit 302, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set;
第三处理单元303,用于确定所述目标ROI的特征信息;A third processing unit 303, configured to determine feature information of the target ROI;
识别单元304,用于根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息,所述第一图像在时域上位于所述第二图像之前;A recognition unit 304, configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located in the time domain Before the second image;
对焦单元305,用于根据所述位置信息和大小信息进行对焦。The focusing unit 305 is configured to perform focusing according to the position information and the size information.
在一种可能的实现方式中,第二处理单元302,具体用于:In a possible implementation manner, the second processing unit 302 is specifically configured to:
从所述第一ROI集合中的一个或者多个第一ROI中确定有效第一ROI,所述有效第一ROI在所述第一图像的预设区域内;Determining a valid first ROI from one or more first ROIs in the first ROI set, where the valid first ROI is within a preset area of the first image;
从所述第二ROI集合中的一个或者多个第二ROI中确定有效第二ROI,所述有效第二ROI在所述第一图像的预设区域内;Determining an effective second ROI from one or more second ROIs in the second ROI set, where the effective second ROI is within a preset area of the first image;
在所述有效第一ROI与所述有效第二ROI的交并比IoU大于或者等于预设阈值的情况下,将所述有效第一ROI确定为目标ROI。In a case where the intersection ratio of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
在一种可能的实现方式中,第二处理单元302还用于:In a possible implementation manner, the second processing unit 302 is further configured to:
在所述有效第一ROI与所述有效第二ROI的交并比IoU小于预设阈值的情况下,将所述有效第二ROI与所述有效第一ROI中距离所述第一图像中心点更近的ROI确定为目标ROI。In a case where the intersection ratio IoU of the effective first ROI and the effective second ROI is less than a preset threshold, the effective second ROI and the effective first ROI are distanced from the first image center point. The more recent ROI is determined as the target ROI.
在一种可能的实现方式中,所述有效第一ROI在所述第一图像的预设区域内的一个或者多个第一ROI中具有最高评估分值;和/或所述有效第二ROI在所述第一图像的预设区域内的一个或者多个第二ROI中具有最高评估分值;其中,每个ROI的评估分值满足如下至少一项:与该ROI的面积成正比,与该ROI距所述第一图像的中心点的距离成反比,与该ROI所属的物体类别的优先级成正比。In a possible implementation manner, the effective first ROI has a highest evaluation score in one or more first ROIs within a preset area of the first image; and / or the effective second ROI The one or more second ROIs within the preset region of the first image have the highest evaluation score; wherein the evaluation score of each ROI satisfies at least one of the following: proportional to the area of the ROI, and The distance of the ROI from the center point of the first image is inversely proportional to the priority of the object category to which the ROI belongs.
In a possible implementation, the third processing unit 303 is further configured to update the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
In a possible implementation, the feature information of the target ROI is determined according to feature information of the first image corresponding to the target ROI and feature information of at least one third image, the at least one third image being located between the first image and the second image in the time domain.
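A common way to carry the target ROI's feature template from the first image through the intermediate (third) images is a running, exponentially weighted update, as used in correlation-filter style trackers. This is a sketch of that general technique, not the patented method; the learning rate value is an assumption, since the disclosure does not fix one.

```python
import numpy as np

def update_template(template, new_features, learning_rate=0.02):
    """Blend the stored feature template with features from a newer frame.

    With a small learning rate the template changes slowly, so features from
    earlier frames still dominate after a few intermediate updates.
    """
    return (1.0 - learning_rate) * template + learning_rate * new_features

# Running the update over several intermediate frames:
template = np.zeros(4)
for frame_features in [np.ones(4), np.ones(4), np.ones(4)]:
    template = update_template(template, frame_features)
```

The slow blend gives the tracker some robustness to momentary occlusion or blur in a single intermediate frame while still adapting to gradual appearance change.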
In a possible implementation, the apparatus further includes: a first initialization unit 306, configured to recalculate the target ROI after a first preset time period; or a second initialization unit 307, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, the tracking confidence indicating the tracking accuracy of the target ROI and being proportional to the tracking accuracy.
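The two re-initialization triggers above (a periodic timer and a confidence floor) can be combined in a small guard such as the following sketch. The class name, the five-second period, and the 0.3 confidence floor are illustrative assumptions, not values from the disclosure.

```python
import time

class RoiTracker:
    def __init__(self, redetect_period_s=5.0, confidence_floor=0.3):
        self.redetect_period_s = redetect_period_s
        self.confidence_floor = confidence_floor
        self.last_detect_time = time.monotonic()

    def needs_redetection(self, tracking_confidence, now=None):
        """True when the target ROI should be recomputed from scratch."""
        now = time.monotonic() if now is None else now
        # Trigger 1: the first preset time period has elapsed.
        expired = (now - self.last_detect_time) >= self.redetect_period_s
        # Trigger 2: tracking confidence fell below the confidence threshold.
        unreliable = tracking_confidence < self.confidence_floor
        return expired or unreliable

    def mark_redetected(self, now=None):
        self.last_detect_time = time.monotonic() if now is None else now
```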
In a possible implementation, the feature information includes one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
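As a toy illustration of HOG-style features, the sketch below builds a coarse orientation histogram of image gradients with plain NumPy. Real implementations are considerably more involved (cell/block normalization for a full HOG descriptor, CIELAB conversion for Lab features, learned embeddings for CNN features); the nine-bin count and the whole-patch histogram are simplifying assumptions.

```python
import numpy as np

def orientation_histogram(gray, n_bins=9):
    """Coarse HOG-like feature: histogram of gradient orientations,
    weighted by gradient magnitude, over the whole patch."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in the classic HOG setup.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A patch with a purely horizontal intensity ramp, for instance, puts nearly all of its weight into the first orientation bin.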
It should be noted that, for the functions of the relevant units in the focusing apparatus 30 described in this embodiment of the present invention, reference may be made to the related apparatus embodiments described in FIG. 1 to FIG. 11 and the related descriptions in the method embodiment described in FIG. 12; details are not repeated here.
Each unit in FIG. 13 may be implemented in software, hardware, or a combination thereof. A unit implemented in hardware may include circuits, such as algorithm circuits or analog circuits. A unit implemented in software may include program instructions, regarded as a software product, stored in a memory, and runnable by a processor to implement the related functions; for details, refer to the foregoing description.
An embodiment of the present invention further provides a computer storage medium that may store a program; when the program is executed, some or all of the steps described in any of the foregoing method embodiments are performed.
An embodiment of the present invention further provides a computer program including instructions; when the computer program is executed by a computer, the computer can perform some or all of the steps of any of the foregoing focusing methods.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of action combinations. However, a person skilled in the art should appreciate that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into the foregoing units is merely a logical function division, and in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and specifically a processor in the computer device) to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some technical features thereof; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (18)

  1. A focusing apparatus, comprising a processor, a neural network processor coupled to the processor, and an image signal processor coupled to the processor; wherein
    the image signal processor is configured to generate a first image;
    the neural network processor is configured to obtain a first region of interest (ROI) set in the first image, the first ROI set comprising one or more first ROIs, each first ROI comprising one photographed object; and
    the processor is configured to:
    obtain a second ROI set in the first image, the second ROI set comprising one or more second ROIs, each second ROI being a motion region;
    determine a target ROI in the first image based on the first ROI set and the second ROI set;
    determine feature information of the target ROI;
    identify, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image preceding the second image in the time domain; and
    perform focusing according to the position information and the size information.
  2. The apparatus according to claim 1, wherein the processor is specifically configured to:
    determine a valid first ROI from the one or more first ROIs in the first ROI set, the valid first ROI lying within a first preset region of the first image;
    determine a valid second ROI from the one or more second ROIs in the second ROI set, the valid second ROI lying within a second preset region of the first image; and
    when the intersection over union (IoU) of the valid first ROI and the valid second ROI is greater than or equal to a preset threshold, determine the valid first ROI as the target ROI.
  3. The apparatus according to claim 2, wherein the processor is further configured to:
    when the IoU of the valid first ROI and the valid second ROI is less than the preset threshold, determine, from the valid second ROI and the valid first ROI, the ROI closer to the center point of the first image as the target ROI.
  4. The apparatus according to claim 2 or 3, wherein the valid first ROI has the highest evaluation score among the one or more first ROIs within the preset region of the first image; and/or the valid second ROI has the highest evaluation score among the one or more second ROIs within the preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  5. The apparatus according to any one of claims 1 to 4, wherein the processor is further configured to update the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
  6. The apparatus according to any one of claims 1 to 5, wherein the processor is further configured to:
    recalculate the target ROI after a first preset time period; or
    recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, wherein the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.
  7. The apparatus according to any one of claims 1 to 6, wherein the feature information comprises one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
  8. A focusing method, comprising:
    determining a first region of interest (ROI) set and a second ROI set, the first ROI set being a ROI set obtained from a first image generated by an image signal processor, the first ROI set comprising one or more first ROIs, each first ROI comprising one photographed object; the second ROI set being a ROI set obtained from the first image, the second ROI set comprising one or more second ROIs, each second ROI being a motion region;
    determining a target ROI in the first image based on the first ROI set and the second ROI set;
    determining feature information of the target ROI;
    identifying, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image preceding the second image in the time domain; and
    performing focusing according to the position information and the size information.
  9. The method according to claim 8, wherein determining the target ROI in the first image based on the first ROI set and the second ROI set comprises:
    determining a valid first ROI from the one or more first ROIs in the first ROI set, the valid first ROI lying within a first preset region of the first image;
    determining a valid second ROI from the one or more second ROIs in the second ROI set, the valid second ROI lying within a second preset region of the first image; and
    when the intersection over union (IoU) of the valid first ROI and the valid second ROI is greater than or equal to a preset threshold, determining the valid first ROI as the target ROI.
  10. The method according to claim 9, further comprising:
    when the IoU of the valid first ROI and the valid second ROI is less than the preset threshold, determining, from the valid second ROI and the valid first ROI, the ROI closer to the center point of the first image as the target ROI.
  11. The method according to claim 9 or 10, wherein the valid first ROI has the highest evaluation score among the one or more first ROIs within the preset region of the first image; and/or the valid second ROI has the highest evaluation score among the one or more second ROIs within the preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  12. The method according to any one of claims 8 to 11, further comprising: updating the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
  13. The method according to any one of claims 8 to 12, further comprising:
    recalculating the target ROI after a first preset time period; or
    recalculating the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, wherein the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.
  14. The method according to any one of claims 8 to 13, wherein the feature information comprises one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
  15. An electronic device, comprising an image sensor and the focusing apparatus according to any one of claims 1 to 7; wherein
    the image sensor is configured to collect image data; and
    the image signal processor is configured to generate the first image based on the image data.
  16. The electronic device according to claim 15, further comprising a memory configured to store program instructions, the program instructions being executed by the processor.
  17. A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 8 to 14 is implemented.
  18. A computer program, wherein the computer program comprises instructions which, when the computer program is executed by a computer, cause the computer to perform the method according to any one of claims 8 to 14.
PCT/CN2018/103370 2018-08-30 2018-08-30 Focusing apparatus, method and related device WO2020042126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880096896.4A CN112602319B (en) 2018-08-30 2018-08-30 Focusing device, method and related equipment
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Publications (1)

Publication Number Publication Date
WO2020042126A1 true WO2020042126A1 (en) 2020-03-05

Family

ID=69644764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Country Status (2)

Country Link
CN (1) CN112602319B (en)
WO (1) WO2020042126A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626916A (en) * 2020-06-01 2020-09-04 上海商汤智能科技有限公司 Information processing method, device and equipment
CN112132162A (en) * 2020-09-08 2020-12-25 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN115735226A (en) * 2020-12-01 2023-03-03 华为技术有限公司 Image processing method and apparatus
CN116055866A (en) * 2022-05-30 2023-05-02 荣耀终端有限公司 Shooting method and related electronic equipment

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114827481B (en) * 2022-06-29 2022-10-25 深圳思谋信息科技有限公司 Focusing method and device, zooming equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2007077283A1 (en) * 2005-12-30 2007-07-12 Nokia Corporation Method and device for controlling auto focusing of a video camera by tracking a region-of-interest
KR20110007437A (en) * 2009-07-16 2011-01-24 삼성전기주식회사 System for automatically tracking of moving subjects and method of same
CN106060407A (en) * 2016-07-29 2016-10-26 努比亚技术有限公司 Focusing method and terminal
CN108024065A (en) * 2017-12-28 2018-05-11 努比亚技术有限公司 A kind of method of terminal taking, terminal and computer-readable recording medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JP5493789B2 (en) * 2009-12-07 2014-05-14 株式会社リコー Imaging apparatus and imaging method
JP2013191011A (en) * 2012-03-14 2013-09-26 Casio Comput Co Ltd Image processing apparatus, image processing method and program
US9538065B2 (en) * 2014-04-03 2017-01-03 Qualcomm Incorporated System and method for multi-focus imaging
CN106324945A (en) * 2015-06-30 2017-01-11 中兴通讯股份有限公司 Non-contact automatic focusing method and device
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN106254780A (en) * 2016-08-31 2016-12-21 宇龙计算机通信科技(深圳)有限公司 A kind of dual camera camera control method, photographing control device and terminal
CN107302658B (en) * 2017-06-16 2019-08-02 Oppo广东移动通信有限公司 Realize face clearly focusing method, device and computer equipment


Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111626916A (en) * 2020-06-01 2020-09-04 上海商汤智能科技有限公司 Information processing method, device and equipment
CN111626916B (en) * 2020-06-01 2024-03-22 上海商汤智能科技有限公司 Information processing method, device and equipment
CN112132162A (en) * 2020-09-08 2020-12-25 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN112132162B (en) * 2020-09-08 2024-04-02 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN115735226A (en) * 2020-12-01 2023-03-03 华为技术有限公司 Image processing method and apparatus
CN115735226B (en) * 2020-12-01 2023-08-22 华为技术有限公司 Image processing method and chip
CN116055866A (en) * 2022-05-30 2023-05-02 荣耀终端有限公司 Shooting method and related electronic equipment
CN116055866B (en) * 2022-05-30 2023-09-12 荣耀终端有限公司 Shooting method and related electronic equipment

Also Published As

Publication number Publication date
CN112602319A (en) 2021-04-02
CN112602319B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
WO2020042126A1 (en) Focusing apparatus, method and related device
WO2020259179A1 (en) Focusing method, electronic device, and computer readable storage medium
US11847826B2 (en) System and method for providing dominant scene classification by semantic segmentation
US11410038B2 (en) Frame selection based on a trained neural network
CN108447091B (en) Target positioning method and device, electronic equipment and storage medium
WO2021043273A1 (en) Image enhancement method and apparatus
WO2020103110A1 (en) Image boundary acquisition method and device based on point cloud map and aircraft
WO2019228196A1 (en) Method for tracking target in panoramic video, and panoramic camera
CN110866480A (en) Object tracking method and device, storage medium and electronic device
WO2020103108A1 (en) Semantic generation method and device, drone and storage medium
US20170054897A1 (en) Method of automatically focusing on region of interest by an electronic device
US20220223153A1 (en) Voice controlled camera with ai scene detection for precise focusing
WO2022076116A1 (en) Segmentation for image effects
WO2021104124A1 (en) Method, apparatus and system for determining confinement pen information, and storage medium
WO2019144263A1 (en) Control method and device for mobile platform, and computer readable storage medium
WO2023138403A1 (en) Method and apparatus for determining trigger gesture, and device
CN111291646A (en) People flow statistical method, device, equipment and storage medium
CN114339054A (en) Photographing mode generation method and device and computer readable storage medium
CN106922181A (en) Directional perception is focused on automatically
JP2020021368A (en) Image analysis system, image analysis method and image analysis program
CN113056907A (en) Imaging method, imaging device, and storage medium
CN112655021A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115457666A (en) Method and system for identifying moving gravity center of living body object and computer readable storage medium
CN115223135A (en) Parking space tracking method and device, vehicle and storage medium
CN114677620A (en) Focusing method, electronic device and computer readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18931886; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18931886; Country of ref document: EP; Kind code of ref document: A1)