WO2020224479A1 - Target position acquisition method and apparatus, computer device, and storage medium - Google Patents
Target position acquisition method and apparatus, computer device, and storage medium
- Publication number
- WO2020224479A1 (PCT/CN2020/087361, priority CN2020087361W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- sample image
- target
- sample
- model
Classifications
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
- G06N20/00—Machine learning
- G06N3/045—Combinations of networks
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06T7/97—Determining parameters from multiple pictures
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The present invention relates to the field of computer technology, and in particular to target position acquisition technology.
- In the related technology, the method for obtaining the position of a target is usually to specify a target in one frame of image and then process multiple frames of images with a target tracking algorithm to obtain the position of the target in each frame.
- During training, the target tracking algorithm computes a predicted position of the target for each frame of sample image, and the algorithm is then trained based on the predicted position and the annotated real position of the target.
- However, the real position of the target must be manually annotated in every frame of sample image, which incurs high labor cost and makes the image processing process cumbersome. The above method for acquiring the position of a target is therefore inefficient.
- The embodiments of the present invention provide a target position acquisition method, device, computer device, and storage medium, which can solve the problems of high labor cost, cumbersome processing, and low efficiency in the related technology.
- the technical solution is as follows:
- In one aspect, a method for obtaining the position of a target includes:
- acquiring multiple frames of images, where a first image among the multiple frames includes a target to be detected, and the first image is any one of the multiple frames;
- calling a position acquisition model, where the model parameters of the position acquisition model are obtained through training based on a first position and a second position of a selected target in a first sample image among multiple frames of sample images; the second position is predicted based on a third position of the selected target in a second sample image among the multiple frames of sample images, and the third position is predicted based on the first position;
- the selected target is randomly selected from the first sample image;
- the second sample image is a sample image different from the first sample image among the multiple frames of sample images;
- determining, through the position acquisition model, the position of the target to be detected in a second image based on the model parameters and the position of the target to be detected in the first image, where the second image is an image different from the first image among the multiple frames of images.
- In one aspect, a method for obtaining the position of a target includes:
- acquiring multiple frames of sample images;
- calling an initial model, and according to the initial model: obtaining a third position of a selected target in a second sample image based on a first position of the selected target in a first sample image among the multiple frames of sample images; obtaining a second position of the selected target in the first sample image based on the third position; and adjusting the model parameters of the initial model based on the first position and the second position to obtain a position acquisition model;
- when multiple frames of images are acquired, calling the position acquisition model and determining the position of a target to be detected in the multiple frames of images according to the position acquisition model.
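The training loop claimed above (forward prediction of the third position, reverse prediction of the second position, parameter adjustment from the first/second position pair) can be sketched as follows. The method names on the model object and the stub model are illustrative assumptions, not an interface defined by the patent:

```python
class StubModel:
    """Minimal stand-in for the initial model (hypothetical interface)."""
    def __init__(self):
        self.updates = 0
    def select_target(self, image):
        return (0, 0)                      # stand-in for random target selection
    def predict(self, src, dst, pos):
        return (pos[0] + 1, pos[1] + 1)    # stand-in for position prediction
    def update(self, first_pos, second_pos):
        self.updates += 1                  # stand-in for parameter adjustment

def train_position_model(model, sample_sets, epochs=1):
    """Skeleton of the claimed training method: for each sample image set,
    predict the third position in the second sample image (forward process),
    predict the second position back in the first sample image (reverse
    process), then adjust model parameters from the first/second positions."""
    for _ in range(epochs):
        for s in sample_sets:
            first_pos = model.select_target(s["first"])
            third_pos = model.predict(s["first"], s["second"], first_pos)   # forward
            second_pos = model.predict(s["second"], s["first"], third_pos)  # reverse
            model.update(first_pos, second_pos)
    return model

model = train_position_model(StubModel(), [{"first": None, "second": None}] * 4)
# One parameter update per sample image set.
```

The stub exists only to make the skeleton runnable; a real initial model would carry trainable parameters inside `predict` and `update`.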
- In one aspect, a device for acquiring the position of a target includes:
- an image acquisition module, configured to acquire multiple frames of images, where a first image among the multiple frames includes a target to be detected, and the first image is any one of the multiple frames;
- a model calling module, configured to call a position acquisition model, where the model parameters of the position acquisition model are obtained through training based on a first position and a second position of a selected target in a first sample image among multiple frames of sample images; the second position is predicted based on a third position of the selected target in a second sample image among the multiple frames of sample images, and the third position is predicted based on the first position; the selected target is randomly selected from the first sample image; and the second sample image is a sample image different from the first sample image among the multiple frames of sample images;
- a position acquisition module, configured to determine, through the position acquisition model, the position of the target to be detected in a second image based on the model parameters and the position of the target to be detected in the first image, where the second image is an image different from the first image among the multiple frames of images.
- In one aspect, a device for acquiring the position of a target includes:
- an image acquisition module, configured to acquire multiple frames of sample images;
- a model training module, configured to call an initial model, obtain a third position of a selected target in a second sample image based on a first position of the selected target in a first sample image among the multiple frames of sample images according to the initial model, obtain a second position of the selected target in the first sample image based on the third position, and adjust the model parameters of the initial model based on the first position and the second position to obtain a position acquisition model;
- a position acquisition module, configured to call the position acquisition model when multiple frames of images are acquired, and determine the position of a target to be detected in the multiple frames of images according to the position acquisition model.
- In one aspect, a computer device includes one or more processors and one or more memories, and at least one instruction is stored in the one or more memories; the instruction is loaded and executed by the one or more processors to realize the operations performed by the above target position acquisition method.
- In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored; the instruction is loaded and executed by a processor to realize the operations performed by the above target position acquisition method.
- In the embodiments of the present invention, the trained position acquisition model processes multiple frames of images to obtain the position of a target in each frame.
- The position acquisition model is obtained through forward and reverse training: the forward process predicts the third position of a selected target in a second sample image based on the first position of the selected target in a first sample image, and the reverse process predicts the second position of the selected target back in the first sample image based on the third position.
- Because the selected target is randomly selected from the first sample image, the first position is the real position of the selected target.
- Given the first position and the second position of the selected target in the first sample image, the error value between them reflects the accuracy of the model parameters of the initial model, so the initial model can be trained according to the first position and the second position without manual annotation by technical personnel. This effectively reduces labor cost, improves the efficiency of model training, and keeps the image processing process simple, which improves the efficiency of the entire target position acquisition process.
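The training signal described above is the discrepancy between the real first position and the second position recovered by the reverse process. As a minimal sketch, assuming a squared-distance error (the error form and the coordinate values are illustrative, not fixed by the patent):

```python
import numpy as np

def consistency_error(first_position, second_position):
    """Error between the real first position and the second position
    predicted by the reverse process (assumed squared-distance form)."""
    first = np.asarray(first_position, dtype=float)
    second = np.asarray(second_position, dtype=float)
    return float(np.sum((first - second) ** 2))

# Forward: first position -> predicted third position in the second sample image.
# Reverse: third position -> predicted second position back in the first sample image.
first_position = np.array([40.0, 25.0])   # real position of the selected target
second_position = np.array([42.0, 23.5])  # position recovered by the reverse process

loss = consistency_error(first_position, second_position)
# loss == 6.25; the smaller the error, the more accurate the model parameters.
```

Minimizing this error over many sample image sets is what removes the need for manual annotation: the first position is known by construction, so no human label is required.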
- FIG. 1 is a schematic diagram of an implementation environment of a method for obtaining the position of a target according to an embodiment of the present invention;
- FIG. 2 is a flowchart of a method for training a position acquisition model provided by an embodiment of the present invention;
- FIG. 3 is a schematic diagram of the acquisition process of multiple frames of sample images provided by an embodiment of the present invention;
- FIG. 4 is a schematic diagram of training data provided by an embodiment of the present invention;
- FIG. 5 is a training flowchart of a position acquisition model provided by an embodiment of the present invention;
- FIG. 6 is a comparison diagram of different sample image sets acquired according to an embodiment of the present invention;
- FIG. 7 is a flowchart of a method for obtaining the position of a target according to an embodiment of the present invention;
- FIG. 8 is a flowchart of a method for acquiring the position of a target according to an embodiment of the present invention;
- FIG. 9 is a schematic structural diagram of an apparatus for acquiring the position of a target according to an embodiment of the present invention;
- FIG. 10 is a schematic structural diagram of an apparatus for acquiring the position of a target according to an embodiment of the present invention;
- FIG. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present invention;
- FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present invention.
- FIG. 1 is an implementation environment of a method for obtaining a target location according to an embodiment of the present invention.
- The implementation environment may include at least one computer device; multiple computer devices may realize data interaction through a wired connection or through a network connection, which is not limited in the embodiment of the present invention.
- The at least one computer device may include a computer device 101 and a computer device 102, where the computer device 101 may be used to process multiple frames of images to obtain the position of a target in those images.
- The computer device 102 may be used to collect multiple frames of images or to shoot videos and send the collected images or videos to the computer device 101, which processes them to track the target.
- Alternatively, the at least one computer device may include only the computer device 101, which itself collects multiple frames of images or shoots videos.
- In that case, the computer device 101 processes the collected images, the images extracted from a captured video, downloaded images, or images extracted from a downloaded video, determining the position of the target in each frame so as to achieve target tracking.
- the embodiment of the present invention does not limit the application scenario of the method for acquiring the position of the target.
- The target position acquisition method can be applied to various target tracking scenes, for example, analyzing scenes in images or videos, tracking a target through monitoring equipment, or human-machine interaction scenes.
- Of course, the method for obtaining the position of a target provided in the embodiment of the present invention is not limited to these scenarios; other scenarios exist and are not listed here.
- The target can be a person or a thing, and in different application scenarios the target may be different.
- the computer device 101 and the computer device 102 may be provided as a terminal or a server, which is not limited in the embodiment of the present invention.
- Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
- Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech technology, natural language processing technology, and machine learning/deep learning.
- a position acquisition model is trained through machine learning, and then the position acquisition model obtained by training is used to determine the position of the target to be detected in multiple frames of images.
- Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
- Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
- Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
- Computer vision technology may also be involved in the process of position acquisition model training or target position acquisition.
- Computer Vision (CV) is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to perform machine vision tasks such as identifying, tracking, and measuring targets, and further performing graphics processing so that the result is more suitable for human eyes to observe or for transmission to an instrument for detection.
- As a scientific discipline, computer vision studies related theories and technologies, attempting to establish artificial intelligence systems that can obtain information from images or multi-dimensional data.
- The embodiments of this application involve, for example, image processing and image semantic understanding in computer vision technology: after obtaining an image such as an image to be recognized or a training sample, image processing such as selecting a target is performed; image semantic understanding technology is used, for example, to perform image feature extraction.
- Figure 2 is a flowchart of a method for training a location acquisition model provided by an embodiment of the present invention.
- the training method for a location acquisition model can be applied to a computer device.
- the computer device can be provided as a terminal or a server. The embodiment does not limit this.
- the method may include the following steps:
- Step 201 The computer device obtains multiple frames of sample images.
- In the embodiment of the present invention, the computer device may obtain multiple frames of sample images and train the initial model based on them to obtain a position acquisition model.
- The position acquisition model may, based on a target to be detected determined in one frame of image, process the multiple frames of images to obtain the position of the target to be detected in each frame.
- the computer device can obtain a multi-frame sample image, use the multi-frame sample image as a training sample, and train the initial model.
- The multiple frames of sample images do not need to be manually annotated by technical personnel; the computer device can directly process them to train the initial model, thereby realizing unsupervised learning, reducing labor cost, and improving the efficiency of model training.
- The multiple frames of sample images include a plurality of sample image sets, and each sample image set includes a first sample image and at least one second sample image, where the second sample image is a sample image different from the first sample image.
- The first sample image can be used as a template image, that is, the sample image from which a selected target is obtained.
- The second sample image can be used as a search image, that is, the sample image in which the position of the selected target is searched for.
- In other words, the position of the selected target in the search image can be obtained based on the selected target selected in the template image.
- Each sample image set is a training sample set. The sample images in each set (one first sample image and at least one second sample image) include the same selected target, and the computer device can track the selected target and obtain its position in each frame of sample image.
- For example, each sample image set may include one frame of first sample image and two frames of second sample image.
- Specifically, three frames can be selected from 10 adjacent frames of a video file, with one frame used as the first sample image and the other two frames used as second sample images. That is, it is assumed that the selected target will not move out of a certain area within the short time spanned by those 10 frames.
- Obtaining multiple frames of second sample images avoids a problem that arises when processing is based on only one first sample image and one second sample image: the result may happen to have a small error value even though the intermediate data in the processing is actually wrong.
- By increasing the number of training samples, this accidental situation can be reduced; the accumulated error is amplified so that it can be corrected, which improves the stability and accuracy of the position acquisition model.
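The sample-set construction described above (three frames drawn from each window of 10 adjacent frames, one serving as the template and two as search images) can be sketched as follows; the window size and set size follow the example in the text, while the function name and uniform random choice are illustrative assumptions:

```python
import random

def sample_image_sets(num_frames, window=10, set_size=3, seed=0):
    """Split a frame index sequence into windows of `window` adjacent frames
    and randomly pick `set_size` frames from each window. The first picked
    frame is the template (first sample image); the rest are search images
    (second sample images)."""
    rng = random.Random(seed)
    sets = []
    for start in range(0, num_frames - window + 1, window):
        frames = rng.sample(range(start, start + window), set_size)
        sets.append({"template": frames[0], "search": frames[1:]})
    return sets

sets = sample_image_sets(num_frames=50)
# Each set holds one template frame index and two search frame indices,
# all drawn from the same 10-frame window.
```

Keeping all three frames inside one short window is what justifies the assumption that the selected target stays within a certain area across the set.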
- The process by which the computer device obtains the multiple frames of sample images may be implemented in various ways.
- The multiple frames of sample images can be stored in the computer device or in another computer device; the computer device can obtain them from a local storage file, or send an image acquisition request to the other computer device.
- The other computer device then sends the multiple frames of sample images to the computer device based on the image acquisition request, so that the computer device acquires them.
- the embodiment does not limit this.
- the computer device can directly obtain the multi-frame sample image, or extract the multi-frame sample image from the video file.
- the multi-frame sample image may be stored in an image database, and the computer device may obtain the multi-frame sample image from the image database.
- the video file in which the multi-frame sample image is located can be stored in a video database, and the computer device can obtain at least one video file from the video database, so as to extract the multi-frame sample image from the at least one video file.
- the embodiment of the present invention does not limit this.
- the multi-frame sample image may be from ILSVRC 2015, and ILSVRC 2015 is a data set for visual recognition.
- The computer device can also download video files from the network and extract images from them. Precisely because the sample images of the present invention do not need to carry label data and do not need to be manually labeled, acquiring the multiple frames of sample images is very convenient.
- the embodiment does not limit which method is adopted.
- The multiple frames of sample images may also be cropped from the extracted or obtained images; that is, the multiple frames of images can be cropped to obtain the sample images.
- In one possible implementation, the center of each frame may be used as the criterion: a target area centered on the image center is cropped from each frame to obtain the sample image.
- For example, the computer device can extract three frames of images from the image sequence of an unlabeled video and crop the central area of the three frames (for example, the area identified by the rectangular box in Figure 3) to obtain three sample images.
- The three sample images include a template image and search image patches, where the template image is the first sample image and the search image patches are the search images, that is, the second sample images.
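The center-cropping step can be sketched as follows; the crop size and the use of NumPy array slicing are illustrative assumptions, since the text fixes only that the target area is centered on the image center:

```python
import numpy as np

def crop_center(image, crop_h, crop_w):
    """Crop a target area centered on the image center, as in the
    center-cropping step described above."""
    h, w = image.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

frame = np.arange(100 * 120).reshape(100, 120)  # stand-in for one video frame
patch = crop_center(frame, 64, 64)
# patch.shape == (64, 64), taken from the center of the 100x120 frame
```

Applying this crop to each of the three extracted frames yields one template image and two search image patches per sample image set.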
- Figure 3 only shows the process of obtaining a sample image set.
- the computer device can obtain a large number of sample images in the same way, so as to train the initial model.
- Figure 4 shows some randomly collected training data.
- Figure 4 includes 28 images, each an example of a frame collected for a certain target. These images used as training data include selected targets; the target can be a person or a thing, and each image is a piece of training data. For example, consider the image identified by the dashed box in FIG. 4.
- In that image the selected target is a sheep; the other images will not be described one by one here. The selected targets are relatively close to the central area of each image, which helps ensure that a selected target will not move out of a certain area in a short time. The subsequent image processing process contains related designs for this case, which are not detailed here.
- Step 202 The computer device calls the initial model, and randomly selects the target area in the first sample image in the multi-frame sample image as the selected target according to the initial model.
- After the computer device obtains the multiple frames of sample images, it can call the initial model and train it based on those sample images.
- The model parameters of the initial model are initial values.
- The initial model can process the multiple frames of sample images based on these model parameters and predict the position of a target in them, but the prediction result is not accurate at first. The computer device therefore adjusts the model parameters of the initial model during the training process to reduce the error value of the initial model's image processing.
- In this way, the finally trained position acquisition model can perform image processing with a low error value.
- Specifically, in this step 202 the computer device inputs the multiple frames of sample images into the initial model. Since the sample images have not been manually annotated and do not include a given target, the initial model can randomly select a target area from the first sample image as the selected target, then continue to obtain the position of the selected target in the second sample image by prediction and perform the subsequent training process.
- the process of randomly selecting the target area by the computer device can be implemented based on a random algorithm, and the random algorithm can be set by relevant technicians according to requirements, which is not limited in the embodiment of the present invention.
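The random selection of a target area can be sketched as below. The text leaves the random algorithm to the implementer, so the uniform random choice, the fixed box size, and the function name here are all illustrative assumptions:

```python
import random

def random_target_area(image_h, image_w, box_h, box_w, seed=None):
    """Randomly select a target area (the 'selected target') fully inside
    the first sample image. Returns (top, left, height, width)."""
    rng = random.Random(seed)
    top = rng.randint(0, image_h - box_h)    # inclusive bounds keep the box in-frame
    left = rng.randint(0, image_w - box_w)
    return top, left, box_h, box_w

box = random_target_area(128, 128, 32, 32, seed=42)
# The box always lies fully inside the 128x128 sample image.
```

Because the model itself chooses this box, its position in the first sample image is known exactly, which is what lets it serve as the ground-truth first position in the later steps.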
- Step 203 The initial model in the computer device acquires the third position of the selected target in the second sample image based on the first position of the selected target in the first sample image, the first sample image, and the second sample image.
- After the computer device determines the selected target in the first sample image, it can continue to obtain the position of the selected target in the second sample image, that is, the third position. Understandably, since the computer device itself determines the selected target in the first sample image, the first position of the selected target in the first sample image is the real position. Therefore, the computer device can regard it as real data used to determine the error value of the subsequent prediction data. For details, refer to the following step 203 to step 205, which are not repeated here.
- the initial model in the computer device can process the first sample image and the second sample image based on the first position of the selected target in the first sample image to obtain the third position of the selected target in the second sample image; the third position is a predicted position.
- the prediction process may be a forward process: the computer device predicts the third position of the selected target in the second sample image based on the first position of the selected target in the first sample image, so as to realize the target tracking process.
- the prediction process can be implemented through the following steps 1 and 2:
- Step 1 The initial model in the computer device obtains first image processing parameters based on the first position of the target in the first sample image and the first sample image.
- the initial model in the computer device can determine the first image processing parameter under the condition that the data before processing and the processing result are known; the first image processing parameter is used to indicate how to process the first sample image to obtain the first position of the selected target in the first sample image.
- the first image processing parameters obtained in this way can be used to perform similar processing on the second sample image, thereby obtaining the third position of the selected target in the second sample image.
- the initial model in the computer device may first extract the image features of the first sample image, and then process the image features.
- the initial model in the computer device can perform feature extraction on the first sample image based on the model parameters of the initial model to obtain the image features of the first sample image.
- the initial model in the computer device obtains the first image processing parameter based on the image feature of the first sample image and the first position of the selected target in the first sample image.
- the initial model in the computer device processes the image features of the first sample image based on the first image processing parameters, and the result that should be obtained is the first position of the selected target in the first sample image.
- Step 2 The initial model in the computer device processes the second sample image based on the first image processing parameter to obtain the third position of the selected target in the second sample image.
- the initial model in the computer device knows how to process the sample images after determining the first image processing parameter, so the second sample image can be processed in the same way to predict the third position of the selected target in the second sample image.
- the initial model in the computer device can first extract image features and then process the image features.
- the initial model in the computer device can perform feature extraction on the second sample image based on the model parameters of the initial model to obtain the image features of the second sample image.
- the computer device processes the image feature of the second sample image based on the first image processing parameter to obtain the third position of the selected target in the second sample image.
- the first position of the selected target in the first sample image may be expressed in the form of position indication information. Therefore, in step 203, the initial model in the computer device may generate, based on the first position of the selected target in the first sample image, first position indication information corresponding to the first sample image; the first position indication information is used to indicate the first position of the selected target in the first sample image. Then the initial model in the computer device can obtain the position indication information corresponding to the second sample image based on the first position indication information, the first sample image, and the second sample image; the position indication information corresponding to the second sample image is used to indicate the third position of the selected target in the second sample image.
- when the initial model in the computer device processes the image features of the second sample image based on the first image processing parameter, the position indication information corresponding to the second sample image can be obtained.
- the initial model may convolve the first image processing parameter and the image feature of the second sample image to obtain the position indication information corresponding to the second sample image.
- the first position indication information and the position indication information corresponding to the second sample image may each be a response map, and the location of the peak of the response map is the location of the selected target.
- the response graph can be a matrix, and each value in the matrix can be used to represent one or more pixels.
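- As a hedged illustration of reading such a response map (the function name is ours, not the patent's), locating the peak with NumPy might look like:

```python
import numpy as np

# Sketch: the peak of a response map gives the predicted target location.
# Each matrix entry may represent one or more pixels of the image.
def peak_position(response_map):
    """Return the (row, col) index of the maximum of the response map."""
    return np.unravel_index(np.argmax(response_map), response_map.shape)

response = np.zeros((5, 5))
response[2, 3] = 1.0  # simulated peak
assert peak_position(response) == (2, 3)
```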
- the above process may be: after the initial model in the computer device obtains the selected target, the first position indication information may be generated based on the first sample image and the first position of the selected target in the first sample image; the first position indication information is the true label of the first sample image. The initial model in the computer device then performs feature extraction on the first sample image based on the model parameters to obtain the image features of the first sample image.
- the computer device processes the image features of the first sample image based on the first image processing parameters, and the result that should be obtained is the first position indication information (the response map, that is, the real label). Since the image features of the first sample image are known, the first image processing parameter can be solved. Feature extraction is then performed on the second sample image to obtain the image features of the second sample image, and those image features are processed to obtain the position indication information corresponding to the second sample image, which is also a response map.
- the first position indication information may be a Gaussian-shaped response graph.
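- A hedged sketch of generating such a Gaussian-shaped response map as the initial label (the function name and the sigma parameter are our own, illustrative choices):

```python
import numpy as np

# Illustrative sketch: a Gaussian-shaped response map used as the first
# position indication information Y_T, with its peak at the selected
# target's first position. `sigma` is a made-up parameter.
def gaussian_label(shape, center, sigma=2.0):
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

label = gaussian_label((9, 9), (4, 4))
assert label[4, 4] == 1.0   # peak exactly at the labelled position
assert label[0, 0] < 0.1    # values decay away from the center
```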
- the position indication information corresponding to the second sample image may be irregular instead of a Gaussian-shaped response map.
- the initial model may include a two-branch network: one branch is used to process the first sample image, and the other is used to process the second sample image.
- the above-mentioned first image processing parameter can be the coefficient in the correlation filter.
- the process in step 203 can be as shown in Figure 5 (a) and (b): the first sample image serves as the template image (template image block), the second sample image serves as the search image (search image block), the initial label is the first position indication information, and the response map is the position indication information corresponding to the second sample image. For example, the initial model can first determine the selected target in the template image and generate the initial label, perform feature extraction on the template image based on a Convolutional Neural Network (CNN) to obtain its feature expression, and solve the coefficients of the correlation filter based on the initial label and the template image features. The initial model can then extract the features of the search image and convolve the correlation filter coefficients with the image features of the search image to obtain the response map.
- the embodiment of the present invention does not limit the sequence of the steps of feature extraction of the template image and the search image by the initial model, and may be performed simultaneously or sequentially.
- the initial model and the final position acquisition model can be extremely lightweight. For example, they may include only two convolutional layers, and the sizes of the CNN filters may be 3x3x32x32 and 3x3x32x32. Local response normalization can also be performed on the last layer.
- This lightweight network structure can make the tracking efficiency of the target extremely high.
- the unsupervised model based on forward and backward tracking can also learn general feature expressions and achieve good target tracking after training.
- the process of acquiring the first image processing parameters by the initial model can be implemented based on the following formula 1:
- W_T is the first image processing parameter, that is, the coefficients of the correlation filter in the example; λ is the regularization parameter; ⊙ is the element-wise dot product operation; F(·) denotes the discrete Fourier transform; F⁻¹(·) denotes the inverse discrete Fourier transform; ★ denotes the complex conjugate. This calculation process is performed in the Fourier domain. T is used to identify the first sample image.
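- The formula image itself is not reproduced in this text. Matching the symbol description above, formula 1 plausibly corresponds to the standard closed-form correlation filter solution (a hedged reconstruction, not a verbatim copy of the patent's formula):

```latex
W_T = \mathcal{F}^{-1}\!\left(
  \frac{\mathcal{F}(Y_T) \odot \mathcal{F}^{\star}\big(\varphi(T)\big)}
       {\mathcal{F}^{\star}\big(\varphi(T)\big) \odot \mathcal{F}\big(\varphi(T)\big) + \lambda}
\right)
```

where Y_T is the first position indication information (the initial label) and φ(·) is the CNN feature extraction.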
- the second sample image can be processed, and the processing process can be implemented based on the following formula 2:
- R_S is the position indication information corresponding to the second sample image, that is, the response map corresponding to the second sample image in the above example
- W_T is the first image processing parameter, that is, the coefficients of the correlation filter in the example; F(·) denotes the discrete Fourier transform; F⁻¹(·) denotes the inverse discrete Fourier transform; ★ represents the complex conjugate; ⊙ is the element-wise dot product operation.
- T is used to identify the first sample image
- S is used to identify the second sample image, and φ(·) represents the feature extraction operation of the CNN.
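- Likewise, the missing formula 2 image plausibly corresponds to the standard correlation-filter response (a hedged reconstruction; the exact placement of the complex conjugate in the patent's original may differ):

```latex
R_S = \mathcal{F}^{-1}\!\left(
  \mathcal{F}^{\star}(W_T) \odot \mathcal{F}\big(\varphi(S)\big)
\right)
```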
- Step 204 The initial model in the computer device acquires the second position of the selected target in the first sample image based on the third position of the selected target in the second sample image, the first sample image, and the second sample image.
- the second sample image is a sample image different from the first sample image in the multi-frame sample image.
- after the computer device obtains the third position of the selected target in the second sample image through the forward process based on the first position of the selected target in the first sample image, the third position of the selected target in the second sample image is used as the pseudo label of the second sample image. That is, the third position of the selected target in the second sample image is not real data, but it can be assumed to be real data and used to perform the reverse process, so as to obtain the second position of the selected target in the first sample image.
- the reverse process uses the same image processing as the forward process, except that the roles of the first sample image and the second sample image are exchanged: the second sample image is used as the template image and the first sample image is used as the search image, for reverse prediction.
- this step 204 can also be implemented through the following steps 1 and 2:
- Step 1 The initial model in the computer device obtains the second image processing parameter based on the third position of the selected target in the second sample image and the second sample image.
- This step one is the same as step one in step 203 above, except that the roles of the first sample image and the second sample image are exchanged: the second sample image is used as the template image and the first sample image is used as the search image, and the same processing process is performed. The second image processing parameter is used to indicate how to process the second sample image to obtain the third position of the selected target in the second sample image.
- the initial model in the computer device may also first extract image features, and then further process the image features. Specifically, the initial model in the computer device may perform feature extraction on the second sample image based on the model parameters of the initial model to obtain the image features of the second sample image. The initial model in the computer device acquires the second image processing parameters based on the image characteristics of the second sample image and the third position of the selected target in the second sample image.
- Step 2 The initial model in the computer device processes the first sample image based on the second image processing parameter to obtain the second position of the target in the first sample image.
- this step two is the same as step two in step 203, except that the roles of the first sample image and the second sample image are exchanged: the second sample image is used as the template image and the first sample image is used as the search image, and the same processing process is performed.
- the initial model in the computer device can also perform feature extraction on the first sample image based on the model parameters of the initial model to obtain the image features of the first sample image.
- the computer device processes the image feature of the first sample image based on the second image processing parameter to obtain the second position of the selected target in the first sample image.
- the position of the selected target in the image can be indicated by position indication information.
- the initial model in the computer device can also obtain second position indication information corresponding to the first sample image based on the position indication information corresponding to the second sample image, the first sample image, and the second sample image; the second position indication information is used to indicate the second position of the selected target in the first sample image.
- step 204 may be: the initial model in the computer device performs feature extraction on the second sample image based on the model parameters to obtain the image features of the second sample image, and acquires the second image processing parameters based on those image features and the position indication information corresponding to the second sample image (the third position of the selected target in the second sample image); it then performs feature extraction on the first sample image to obtain the image features of the first sample image, and processes the image features of the first sample image based on the second image processing parameters to obtain the second position indication information corresponding to the first sample image (the second position of the selected target in the first sample image).
- step 203 is a forward process
- step 204 is a reverse process.
- through the forward plus reverse process, the second position (predicted position) of the selected target in the first sample image is obtained from the first position (real position) of the selected target in the first sample image, so that the error value of the initial model's image processing can be determined based on the first position and the second position.
- step 203 corresponds to the forward tracking process
- step 204 corresponds to the backward tracking process.
- the template image and the search image are exchanged; that is, the template image becomes the second sample image, and the search image becomes the first sample image.
- the processing of the template image and the search image is the same as the forward tracking process.
- the response map obtained during the backward tracking process corresponds to the first sample image.
- #1 in Figure 5 is used to identify the first sample image, and #2 is used to identify the second sample image. As can be seen from Figure 5, for the selected target determined in #1 (in picture (a), the position identified by the white rectangular box in #1 of the template image block), the third position of the selected target in #2 is first tracked forward; then, based on that third position, the second position of the selected target in #1 is tracked backward (in picture (a), the position identified by the gray rectangular box in #1 of the search image block). Based on the first position (the position identified by the white rectangular box) and the second position (the position identified by the gray rectangular box) of the selected target in #1, it is determined whether the model parameters need to be adjusted.
- when performing step 204, the initial model in the computer device can also use formulas of the same form as formula 1 and formula 2 above: replace T in formula 1 with S and replace Y_T with Y_S, where Y_S is R_S, or a Gaussian-shaped response map generated based on R_S (that is, the position indication information corresponding to the second sample image); and replace S in formula 2 with T and W_T with W_S. It should be noted that, in both the forward and reverse tracking processes, the CNN model parameters are fixed.
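- A self-contained, hedged sketch of the forward-backward idea in steps 203 and 204 (function names are ours; the helpers follow the standard discrete-correlation-filter closed form that formulas 1 and 2 appear to describe):

```python
import numpy as np

# Hedged sketch: forward tracking learns a filter on the template and
# responds on the search image; backward tracking swaps the roles and
# uses the forward response as a pseudo label.
def solve_filter(feat, label, lam=1e-3):
    """Formula-1-style step: solve filter coefficients in the Fourier domain."""
    F = np.fft.fft2(feat)
    return np.fft.fft2(label) * np.conj(F) / (F * np.conj(F) + lam)

def respond(filt, feat):
    """Formula-2-style step: correlate the filter with search-image features."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(feat)))

def forward_backward(feat_t, y_t, feat_s):
    r_s = respond(solve_filter(feat_t, y_t), feat_s)   # forward (step 203)
    return respond(solve_filter(feat_s, r_s), feat_t)  # backward (step 204)

rng = np.random.default_rng(0)
feat = rng.random((16, 16))
y_t = np.zeros((16, 16)); y_t[8, 8] = 1.0
r_t = forward_backward(feat, y_t, feat)  # identical frames: the cycle should close
assert np.unravel_index(np.argmax(r_t), r_t.shape) == (8, 8)
```

With real, distinct frames the backward response generally deviates from Y_T, and that deviation is exactly the training signal used in the following steps.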
- Step 205 The computer device obtains an error value of the second position relative to the first position based on the first position and the second position of the selected target in the first sample image.
- After the computer device obtains the first position and the second position of the selected target in the first sample image, it can evaluate the error value predicted by the initial model, so as to determine, based on the error value of the second position relative to the first position of the selected target in the first sample image, whether the model parameters of the initial model need to be adjusted.
- the process can also be implemented through a reward mechanism.
- in a reward mechanism, the larger the reward value, the more appropriate the model parameters of the initial model; with an error value, the smaller the error value, the more appropriate the model parameters, and the latter case is described here as an example. Based on this principle, the following step 206 can be performed to train the initial model to obtain a position acquisition model with a small prediction error value.
- the multi-frame sample image may include a plurality of sample image sets, and each sample image set corresponds to an error value of the predicted position.
- the computer device can obtain at least one error value based on the first sample image and at least one frame of the second sample image included in the sample image set; that is, each frame of the second sample image can correspond to one error value, and the error value corresponding to the sample image set may be determined based on the at least one error value.
- the computer device may obtain an average value of the at least one error value, and use the average value as the error value corresponding to the sample image set.
- the computer device may perform a weighted summation of the at least one error value to obtain the error value corresponding to the set of sample images.
- the embodiment of the present invention does not limit which implementation manner is adopted.
- Step 206 The computer device adjusts the model parameters of the initial model based on the error value, and stops when the target condition is met, to obtain a position acquisition model.
- After the computer device obtains the error value predicted by the initial model, it can adjust the model parameters based on the error value until the error value is small, and the position acquisition model is obtained.
- the accuracy of the position acquisition model prediction is relatively high.
- the target condition may be that the error value converges or the number of iterations reaches the target number.
- the position acquisition model obtained when training stops upon meeting the target condition has better image processing capability and can achieve a target tracking process with a small error value.
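- A minimal, hypothetical sketch of step 206's stopping logic (all names and the fake error decay are illustrative, not the patent's):

```python
# Keep adjusting model parameters while the target condition is unmet:
# stop when the error value converges, or when the number of iterations
# reaches the target number.
def train_until_target(step_fn, max_iters=100, tol=1e-3):
    prev_err = float("inf")
    for it in range(max_iters):
        err = step_fn()                  # one round of parameter adjustment
        if abs(prev_err - err) < tol:    # error value has converged
            return it + 1, err
        prev_err = err
    return max_iters, prev_err           # iteration count reached the target

# Fake training step whose error halves each round (for illustration only).
state = {"err": 1.0}
def fake_step():
    state["err"] *= 0.5
    return state["err"]

iters, final_err = train_until_target(fake_step)
assert iters < 100 and final_err < 0.01
```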
- the multi-frame sample image may include a plurality of sample image sets, and each sample image set corresponds to an error value of the predicted position.
- the computer equipment can adjust the model parameters of the initial model according to the error value corresponding to each sample image set.
- the computer device can also divide the training samples into multiple batches, each batch including a target number of sample image sets, and adjust the model parameters of the initial model based on the error values corresponding to each batch. For example, for each target number of sample image sets in the multiple sample image sets, the computer device may adjust the model parameters of the initial model based on the multiple error values corresponding to the target number of sample image sets.
- the target number can be set by relevant technical personnel according to requirements, which is not limited in the embodiment of the present invention.
- the multiple sample image sets may also include poor sample images. For example, if, among the multiple frames of sample images in a sample image set, the movement displacement of the selected target is large, or the selected target even moves out of the range of the image, then the error value corresponding to that sample image set does not contribute much to the training of the initial model, and the influence of this part of the samples should be weakened. Such samples can be called difficult samples.
- the computer device can also perform any of the following methods:
- Method 1: Based on the multiple error values corresponding to the target number of sample image sets, the computer device removes the error values that satisfy the error value condition from the multiple error values, and adjusts the model parameters of the initial model based on the remaining error values.
- Method 2: Based on the multiple error values corresponding to the target number of sample image sets, the computer device determines the first weight of each of the multiple error values, and adjusts the model parameters of the initial model based on the multiple error values and their first weights, where the first weight of any error value that satisfies the error value condition is zero.
- Both Method 1 and Method 2 reduce to zero the effect of the error values satisfying the error value condition on model parameter adjustment: in Method 1, this part of the error values is directly removed, and in Method 2, a first weight of zero is set for them.
- the error value condition may be that an error value belongs to the target ratio of error values with the largest magnitudes. Both the error value condition and the target ratio can be set by relevant technicians according to requirements, which is not limited in the embodiment of the present invention.
- the target ratio can be 10%
- the computer device can remove 10% of the training samples in a batch, that is, the 10% with the largest error values, or reset the weights of that 10% of error values to zero.
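- A hedged sketch of the 10% example above (the function name and the sample error values are made up):

```python
import numpy as np

# Within a batch, the target-ratio fraction of error values with the
# largest magnitudes is removed, or equivalently given a first weight
# a_drop of zero (Method 1 / Method 2 above).
def a_drop_weights(errors, drop_ratio=0.10):
    errors = np.asarray(errors, dtype=float)
    k = int(np.ceil(drop_ratio * len(errors)))  # number of error values to drop
    weights = np.ones_like(errors)
    weights[np.argsort(errors)[-k:]] = 0.0      # zero out the k largest
    return weights

batch_errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
w = a_drop_weights(batch_errors)
assert w[-1] == 0.0     # the largest error value is dropped
assert w.sum() == 9.0   # the remaining nine error values keep weight 1
```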
- each sample image set may correspond to a second weight, and the second weight is used to represent the displacement of the selected target in the multi-frame sample images of the sample image set.
- if the movement displacement of the selected target in the multi-frame sample images of a sample image set is very small, or even zero, the error value obtained when tracking the selected target does not reflect the predictive ability of the initial model. Therefore, this part of the error values should also be weakened when adjusting the model parameters.
- the computer device may obtain the second weight of the error value of each sample image set, and the second weight is positively correlated with the displacement of the selected target in the multi-frame sample images of the sample image set.
- after the computer device obtains the second weights, it can adjust the model parameters of the initial model based on the multiple error values and the multiple second weights corresponding to the target number of sample image sets. For example, the computer device may obtain the total error value corresponding to the target number of sample image sets based on the multiple error values and the multiple second weights, and adjust the model parameters of the initial model based on the total error value.
- the second weight a_motion can be introduced, and the computer device can obtain it through the following formula 3:
- a_motion is the second weight
- i is the identifier of the sample image set
- R_S is the position indication information corresponding to the second sample image
- Y_T is the first position indication information corresponding to the first sample image
- Y_S is the position indication information corresponding to the second sample image, that is, R_S or a Gaussian-shaped response map generated based on R_S
- as an example, the sample image set includes one frame of the first sample image and two frames of the second sample image. T is used to represent the first sample image, S is used to represent a second sample image, S1 is used to represent one frame of the second sample image, and S2 is used to represent the other frame of the second sample image.
- the case of using one frame of the first sample image (template image block) and one frame of the second sample image (search image block) is shown in #1 and #2 in the left figure, which may coincidentally lead to success.
- the case of using one frame of the first sample image and two frames of the second sample image is shown in #1, #2, and #3 in the right figure; #2 in the right figure can also be called search image block #1, and #3 in the right figure can also be called search image block #2.
- the computer device can combine the above first weight and second weight to adjust the model parameters, that is, consider both the sample error values and the displacement. Specifically, for the multiple error values corresponding to the target number of sample image sets, the computer device may obtain the total weight of each error value based on its first weight and second weight, perform a weighted summation of the multiple error values based on their total weights to obtain the total error value of the multiple error values, and adjust the model parameters of the initial model based on the total error value.
- a_drop is the first weight
- a_motion is the second weight
- n is the target number and is a positive integer greater than 1
- i is the identifier of the sample image set, and the remaining symbol denotes the total weight
- the total error value can be represented by a reconstruction error.
- the process of obtaining the total error value can be implemented by the following formula five:
- the total error value corresponding to the sample image set is only an exemplary description, and the total error value may also be represented by other errors or reward values, which is not limited in the embodiment of the present invention.
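- As a hedged, runnable illustration of the weighted-summation idea above (the normalization of the total weights over the batch is our own assumption, and the numbers are made up):

```python
import numpy as np

# The total weight of each sample image set multiplies a_drop and a_motion;
# the total error value is the weighted sum of the per-set error values.
def total_error(errors, a_drop, a_motion):
    w = np.asarray(a_drop, dtype=float) * np.asarray(a_motion, dtype=float)
    w = w / w.sum()   # normalize total weights over the batch (our assumption)
    return float((w * np.asarray(errors, dtype=float)).sum())

errs = [1.0, 2.0, 4.0]
loss = total_error(errs, a_drop=[1, 1, 0], a_motion=[0.5, 0.5, 0.9])
assert abs(loss - 1.5) < 1e-9  # third set dropped; remaining two weighted equally
```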
- the model parameters are adjusted, and the accuracy of the image processing of the obtained position acquisition model is also improved.
- the process of obtaining the total error value can be implemented by the following formula six:
- Y_T is the first position indication information corresponding to the first sample image (indicating the first position of the selected target in the first sample image), and the remaining symbol is the total error value corresponding to the target number of sample image sets.
- the model parameter adjustment process can be implemented by means of gradient back-propagation.
- for example, through formula 7, which is only an exemplary description and does not limit the adjustment process:
- the position acquisition model can be called a tracker, which can track forward and backward; that is, given an initial tracking target, the tracker can track the target forward, and, taking the position where the last tracking ends as the starting point, the tracker should be able to trace back to the initially specified position.
- the position acquisition model obtained by training processes the multi-frame image to obtain the position of the target in the multi-frame image.
- the position acquisition model can be obtained through forward and reverse process training. Through the forward process, the third position of the selected target in the second sample image can be predicted based on the first position of the selected target in the first sample image; through the reverse process, the second position of the selected target in the first sample image can be predicted based on the third position. Because the selected target is randomly selected in the first sample image, its position is determined, and the first position is the real position of the selected target. The error value between the first position and the second position of the selected target in the first sample image can therefore reflect the accuracy of the model parameters of the initial model, so the initial model can be trained according to the first position and the second position without manual annotation by relevant technical personnel. This can effectively reduce labor costs and improve the efficiency of model training; the image processing process is also simple, which effectively improves the efficiency of the entire target position acquisition process.
- FIG. 7 is a flowchart of a method for obtaining the location of a target according to an embodiment of the present invention.
- the method for obtaining the location of the target can be applied to a computer device.
- the computer device can be provided as a terminal or a server.
- the embodiment of the invention does not limit this. Referring to Figure 7, the method may include the following steps:
- Step 701 The computer device acquires a multi-frame image, a first image in the multi-frame image includes a target to be detected, and the first image is any one of the multi-frame images.
- the computer device can obtain a multi-frame image, and process the multi-frame image to determine the first position of the target to be detected in the multi-frame image.
- the computer device may obtain the multi-frame image in multiple ways.
- in different application scenarios, the computer device may obtain the multi-frame images in different ways.
- the computer device may have an image acquisition function, and the computer device may take an image, and perform the following image processing process on the captured multi-frame image to track the target to be detected in the multi-frame image.
- the computer device can also receive multi-frame images sent by the image acquisition device, and execute the following image processing process to track the target to be detected in the multi-frame image.
- the computer device can also obtain a video captured in real time or a video stored at a target address, extract multiple frames of images from the video, and perform the following image processing process to track the target to be detected in the multiple frames of images.
- the embodiment of the present invention does not limit the application scenario and the manner in which the computer device obtains the multi-frame image.
- In a possible implementation, the computer device may also crop the acquired or extracted multi-frame images to obtain the multi-frame images to be processed. Specifically, from each obtained or extracted frame, the computer device may crop a target area centered on the center point of the image, to obtain the multi-frame images to be processed.
- the embodiments of the present invention will not be repeated here.
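- As an illustrative sketch of the cropping step above (the frame sizes, the crop dimensions, and the NumPy array representation are assumptions for illustration, not part of the embodiment), each frame can be cropped to a target area centered on the frame's center point:

```python
import numpy as np

def center_crop(frame, crop_h, crop_w):
    # Crop a (H, W, C) frame to (crop_h, crop_w, C) around the frame's center point.
    h, w = frame.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return frame[top:top + crop_h, left:left + crop_w]

# Hypothetical multi-frame image: four 480x640 RGB frames.
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]
cropped = [center_crop(f, 240, 320) for f in frames]
```

- The cropped frames, rather than the full frames, would then be passed to the image processing process described below.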
- Step 702: The computer device calls the position acquisition model.
- The model parameters of the position acquisition model are obtained by training based on the first position (real position) of a selected target in a first sample image among the multi-frame sample images and the second position (predicted position) of the selected target in the same first sample image; the second position is predicted based on the third position of the selected target in a second sample image among the multi-frame sample images.
- the position acquisition model can be obtained through the model training process shown in FIG. 2 above.
- The computer device shown in FIG. 7 may be the computer device shown in FIG. 2 above; that is, the computer device can call the position acquisition model from locally stored data. The computer device shown in FIG. 7 and the computer device shown in FIG. 2 may also be different computer devices. For example, the computer device shown in FIG. 2 may encapsulate the position acquisition model obtained by training and send it to the computer device shown in FIG. 7, and the latter performs processing such as decompression and calls the position acquisition model when image processing is required. Alternatively, the computer device shown in FIG. 7 may call, in real time, the position acquisition model trained in the computer device shown in FIG. 2 when image processing is required, which is not limited in the embodiment of the present invention.
- Step 703: The computer device processes the second image through the position acquisition model based on the model parameters of the position acquisition model and the position of the target to be detected in the first image, and outputs the position of the target to be detected in the second image.
- the second image is another image in the multi-frame image that is different from the first image.
- The position of the target to be detected in the first image may be manually annotated by relevant technicians, or may be obtained by the computer device scanning the first image according to scan settings.
- For example, a technician may mark a target area in the first image according to requirements and use it as the target to be detected. As another example, the computer device may be set to track a person; the computer device can therefore scan the first image for face recognition to determine the position of the person and use it as the target to be detected. Of course, the method for obtaining the position of the target to be detected can also be applied to other application scenarios, and the computer device can also determine the position of the target to be detected in the first image in other ways; the embodiment of the present invention does not limit this.
- This step 703 is the same as the above step 203.
- Specifically, the computer device can obtain the position of the target to be detected in the second image through the following steps 1 and 2.
- Step 1: The position acquisition model in the computer device acquires image processing parameters based on the position of the target to be detected in the first image, the first image, and the model parameters.
- In a possible implementation, the position acquisition model in the computer device can generate, based on the position of the target to be detected in the first image, position indication information corresponding to the first image; the position indication information is used to indicate the position of the target to be detected in the first image.
- the location acquisition model in the computer device may acquire image processing parameters based on the location indication information corresponding to the first image, the first image, and the model parameters.
- In a possible implementation, the position indication information is a response map,
- the location of the peak of the response graph is the location of the target to be detected.
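- A minimal sketch of such a response map (assuming a 2-D Gaussian-shaped map, a common but not mandated choice): the map is generated with its peak at a known position, and the position is then recovered as the argmax of the map:

```python
import numpy as np

def position_indication_map(shape, center, sigma=2.0):
    # Position indication information as a 2-D Gaussian response map
    # whose peak marks the target's position.
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def peak_location(response):
    # The location of the peak of the response map is the target's position.
    return np.unravel_index(np.argmax(response), response.shape)

# Hypothetical 17x17 map with the target at row 4, column 12.
resp = position_indication_map((17, 17), center=(4, 12))
```

- Reading the position back as an argmax makes the map a reversible encoding of the position, which is what both the forward and reverse processes rely on.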
- Specifically, the position acquisition model in the computer device can perform feature extraction on the first image based on the model parameters to obtain the image features of the first image, and then acquire the image processing parameters based on the image features of the first image and the position indication information corresponding to the first image.
- Step 2: The position acquisition model in the computer device processes the second image based on the image processing parameters, and outputs the position of the target to be detected in the second image.
- the position acquisition model in the computer device can process the second image based on the image processing parameters, and output the position indication information corresponding to the second image.
- the position indication information is used to indicate the position of the target to be detected in the second image.
- Specifically, the position acquisition model in the computer device can perform feature extraction on the second image based on the model parameters to obtain the image features of the second image, and then process the image features of the second image based on the image processing parameters and output the position indication information corresponding to the second image.
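- One common instantiation of these two steps (a sketch only; the patent does not fix this operator) is a correlation filter: the image processing parameters are solved in the Fourier domain from the first image's features and its position indication map, and applying them to the second image's features outputs the second image's response map:

```python
import numpy as np

def learn_filter(feat1, indication1, lam=1e-2):
    # Image processing parameters W solved in the Fourier domain so that
    # correlating W with the first image's features reproduces the first
    # image's position indication map (ridge-regression closed form).
    F = np.fft.fft2(feat1)
    Y = np.fft.fft2(indication1)
    return np.conj(F) * Y / (F * np.conj(F) + lam)

def apply_filter(W, feat2):
    # Apply the parameters to the second image's features to output the
    # position indication map corresponding to the second image.
    return np.real(np.fft.ifft2(W * np.fft.fft2(feat2)))

rng = np.random.default_rng(0)
feat1 = rng.standard_normal((16, 16))          # stand-in for extracted image features
y1 = np.zeros((16, 16)); y1[8, 8] = 1.0        # indication map: target at (8, 8)
W = learn_filter(feat1, y1)
feat2 = np.roll(feat1, shift=(2, 3), axis=(0, 1))  # target moved by (2, 3)
y2 = apply_filter(W, feat2)
```

- Because the Fourier transform is shift-equivariant, the peak of `y2` moves with the target, so the predicted position in the second image falls at (10, 11).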
- In the method provided by the embodiment of the present invention, the position acquisition model obtained by training processes the multi-frame images to obtain the position of the target to be detected in the multi-frame images. Because the position acquisition model is trained through forward and backward processes, using the real position and the predicted position of the selected target in the first sample image, the initial model can be trained without manual annotation by relevant technical personnel, which effectively reduces labor costs and improves the efficiency of model training. Moreover, the image processing process is simple, which effectively improves the efficiency of the entire process of obtaining the position of the target to be detected.
- FIG. 8 is a flowchart of a method for obtaining the position of a target provided by an embodiment of the present invention. Referring to FIG. 8, the method may include the following steps:
- Step 801: The computer device obtains multiple frames of sample images.
- Step 802: The computer device calls the initial model; according to the initial model, based on the first position of the selected target in the first sample image among the multi-frame sample images, the third position of the selected target in the second sample image is obtained; based on the third position of the selected target in the second sample image, the second position of the selected target in the first sample image is obtained; and based on the first position and the second position, the model parameters of the initial model are adjusted to obtain the position acquisition model. The selected target is obtained by the initial model randomly selecting a target area in the first sample image, and the second sample image is a sample image among the multi-frame sample images that is different from the first sample image.
- The content of steps 801 and 802 is the same as that of the embodiment shown in FIG. 2, and will not be repeated here in the embodiment of the present invention.
- Step 803: The computer device calls the position acquisition model, and determines the position of the target to be detected in the multiple frames of images according to the position acquisition model.
- This step 803 is the same as the content of the above-mentioned embodiment shown in FIG. 7, and the embodiment of the present invention will not be repeated here.
- In the method provided by the embodiment of the present invention, the selected target in the first sample image is randomly selected through the initial model, the second sample image is used as a transition, and the predicted position of the selected target in the first sample image is obtained through the forward and reverse processes, so that the initial model can be trained without manual labeling by relevant technical personnel, which effectively reduces labor costs and improves the efficiency of model training. The obtained position acquisition model then processes the images to acquire the position of the target to be detected; the image processing process is simple, which effectively improves the efficiency of the entire process of acquiring the position of the target to be detected.
- FIG. 9 is a schematic structural diagram of an apparatus for acquiring a target position according to an embodiment of the present invention.
- the apparatus may include:
- the image acquisition module 901 is configured to acquire a multi-frame image, a first image in the multi-frame image includes a target to be detected, and the first image is any one of the multi-frame images;
- The model calling module 902 is configured to call a position acquisition model. The model parameters of the position acquisition model are obtained by training based on the first position of the selected target in the first sample image among the multi-frame sample images and the second position of the selected target in the first sample image; the second position is predicted based on the third position of the selected target in the second sample image among the multi-frame sample images, and the third position is predicted based on the first position. The selected target is randomly selected from the first sample image, and the second sample image is a sample image among the multi-frame sample images that is different from the first sample image.
- the position acquisition module 903 is configured to determine the position of the target to be detected in the second image based on the model parameters and the position of the target to be detected in the first image through the position acquisition model.
- the second image is an image that is different from the first image among the multi-frame images.
- In a possible implementation, the position acquisition module 903 is used to: acquire image processing parameters based on the position of the target to be detected in the first image, the first image, and the model parameters; and based on the image processing parameters, process the second image and output the position of the target to be detected in the second image.
- In a possible implementation, the position acquisition module 903 is used to: based on the position of the target to be detected in the first image, generate position indication information corresponding to the first image, where the position indication information corresponding to the first image is used to indicate the position of the target to be detected in the first image.
- In a possible implementation, the position acquisition module 903 is used to: based on the image processing parameters, process the second image and output position indication information corresponding to the second image, where the position indication information corresponding to the second image is used to indicate the predicted position of the target to be detected in the second image.
- In a possible implementation, the position acquisition module 903 is used to: based on the model parameters, perform feature extraction on the first image to obtain the image features of the first image, and acquire the image processing parameters based on the image features of the first image and the position indication information corresponding to the first image.
- In a possible implementation, the position acquisition module 903 is used to: based on the model parameters, perform feature extraction on the second image to obtain the image features of the second image, and based on the image processing parameters, process the image features of the second image and output the position indication information corresponding to the second image.
- the device further includes a model training module, and the model training module is used for:
- The initial model is called, and a target area in the first sample image among the multi-frame sample images is randomly selected through the initial model as the selected target. Based on the first position of the selected target in the first sample image, the first sample image, and the second sample image, the third position of the selected target in the second sample image is acquired; based on the third position of the selected target in the second sample image, the first sample image, and the second sample image, the second position of the selected target in the first sample image is acquired. Based on the first position and the second position, the model parameters of the initial model are adjusted until the target condition is met, to obtain the position acquisition model.
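- The adjustment loop above can be sketched as follows (a toy illustration, not the embodiment's network: `forward` and `backward` stand in for the forward and reverse tracking processes, `theta` stands for the model parameters, and a finite-difference step stands in for backpropagation):

```python
import numpy as np

def cycle_error(first_pos, second_pos):
    # Error value of the backward-predicted second position relative to the
    # real first position (here: squared Euclidean distance).
    d = np.asarray(first_pos, float) - np.asarray(second_pos, float)
    return float(d @ d)

def train_step(theta, first_pos, forward, backward, lr=0.1, eps=1e-4):
    # One self-supervised update: forward process, then reverse process,
    # then adjust theta to shrink the error between first and second position.
    def loss(t):
        third_pos = forward(t, first_pos)      # position in the second sample image
        second_pos = backward(t, third_pos)    # back in the first sample image
        return cycle_error(first_pos, second_pos)
    grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                     for e in np.eye(len(theta))])
    return theta - lr * grad

# Toy processes: forward shifts by the learned displacement theta; backward
# shifts by a hidden true displacement. Training drives theta toward it,
# using only the randomly selected first position as supervision.
TRUE_SHIFT = np.array([2.0, 1.0])
forward = lambda t, p: np.asarray(p, float) + t
backward = lambda t, p: np.asarray(p, float) - TRUE_SHIFT

theta = np.zeros(2)
for _ in range(60):
    theta = train_step(theta, first_pos=(5.0, 5.0), forward=forward, backward=backward)
```

- The key property illustrated is that no manually annotated label is needed: the randomly selected first position serves as its own supervision through the forward-then-reverse cycle.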
- In a possible implementation, the model training module is used to: based on the third position of the selected target in the second sample image, process the first sample image to obtain the second position.
- In a possible implementation, the model training module is used to: generate first position indication information corresponding to the first sample image, where the first position indication information is used to indicate the position of the selected target in the first sample image; and acquire the position indication information corresponding to the second sample image, where the position indication information corresponding to the second sample image is used to indicate the predicted position of the selected target in the second sample image.
- In a possible implementation, the model training module is used to: acquire second position indication information corresponding to the first sample image, where the second position indication information is used to indicate the predicted position of the selected target in the first sample image.
- In a possible implementation, the multi-frame sample images include a plurality of sample image sets, each sample image set includes one frame of the first sample image and at least one frame of the second sample image, and each sample image set corresponds to at least one error value.
- In a possible implementation, the model training module is used to: adjust the model parameters of the initial model based on multiple error values corresponding to a target number of sample image sets.
- In a possible implementation, each sample image set corresponds to a second weight, and the adjusting of the model parameters of the initial model based on the multiple error values corresponding to the target number of sample image sets includes: weighting the multiple error values by the second weights corresponding to their sample image sets, and adjusting the model parameters of the initial model based on the weighted error values.
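- A sketch of this aggregation of error values (the unweighted mean and the normalized weighted sum are assumed aggregation choices for illustration; the embodiment only requires that the second weights act on the error values):

```python
import numpy as np

def aggregate_errors(error_values, second_weights=None):
    # Combine the error values of the target number of sample image sets into
    # a single quantity used to adjust the model parameters; if each set has a
    # second weight, use a normalized weighted sum.
    errors = np.asarray(error_values, dtype=float)
    if second_weights is None:
        return float(errors.mean())
    w = np.asarray(second_weights, dtype=float)
    return float((w * errors).sum() / w.sum())
```

- For example, `aggregate_errors([1.0, 3.0])` gives the unweighted mean 2.0, while second weights `[3.0, 1.0]` emphasize the first sample image set and give 1.5.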
- the device provided by the embodiment of the present invention processes a multi-frame image through a position acquisition model obtained by training to obtain the position of a target in the multi-frame image.
- The position acquisition model can be obtained through forward and reverse process training: the forward process can predict the third position of the selected target in the second sample image according to the first position of the selected target in the first sample image, and the reverse process can predict the second position of the selected target in the first sample image according to the third position. Since the selected target is randomly selected in the first sample image, the first position is the real position of the selected target. Given the first position and the second position in the first sample image, the error value between them can reflect the accuracy of the model parameters of the initial model, so the initial model can be trained according to the first position and the second position without manual annotation by relevant technical personnel, which effectively reduces labor costs and improves the efficiency of model training. The image processing process is simple, and the efficiency of the entire target position acquisition process is effectively improved.
- FIG. 10 is a schematic structural diagram of a device for acquiring a target position according to an embodiment of the present invention.
- the device may include:
- the image acquisition module 1001 is used to acquire multiple frames of sample images
- The model training module 1002 is configured to call an initial model; according to the initial model, based on the first position of the selected target in the first sample image among the multi-frame sample images, acquire the third position of the selected target in the second sample image; based on the third position of the selected target in the second sample image, acquire the second position of the selected target in the first sample image; and based on the first position and the second position, adjust the model parameters of the initial model to obtain a position acquisition model.
- the position acquisition module 1003 is configured to call the position acquisition model when multiple frames of images are acquired, and determine the position of the target to be detected in the multiple frames of images according to the position acquisition model.
- The device provided by the embodiment of the present invention randomly selects the selected target in the first sample image through the initial model, uses the second sample image as a transition, and trains the initial model through forward and reverse processes: the forward process predicts the third position of the selected target in the second sample image according to the first position of the selected target in the first sample image, and the reverse process predicts the second position of the selected target in the first sample image according to the third position. Since the selected target is randomly selected in the first sample image, the first position is the real position of the selected target. Given the first position and the second position in the first sample image, the error value between them can reflect the accuracy of the model parameters of the initial model, so the initial model can be trained according to the first position and the second position without manual annotation by relevant technical personnel, which effectively reduces labor costs and improves the efficiency of model training. The image processing process is simple, and the efficiency of the entire target position acquisition process is effectively improved.
- the foregoing computer equipment may be provided as the terminal shown in FIG. 11 below, or may be provided as the server shown in FIG. 12 below, which is not limited in the embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present invention.
- The terminal 1100 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
- the terminal 1100 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
- the terminal 1100 includes: one or more processors 1101 and one or more memories 1102.
- the processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
- The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array).
- The processor 1101 may also include a main processor and a coprocessor. The main processor is a processor used to process data in the wake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
- In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
- In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
- the memory 1102 may include one or more computer-readable storage media, which may be non-transitory.
- the memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
- The non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, and the at least one instruction is executed by the processor 1101 to implement the target position acquisition method provided by the method embodiments of the present invention.
- the terminal 1100 may optionally further include: a peripheral device interface 1103 and at least one peripheral device.
- the processor 1101, the memory 1102, and the peripheral device interface 1103 may be connected by a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 1103 through a bus, a signal line, or a circuit board.
- the peripheral device includes: at least one of a radio frequency circuit 1104, a display screen 1105, a camera 1106, an audio circuit 1107, a positioning component 1108, and a power supply 1109.
- the peripheral device interface 1103 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1101 and the memory 1102.
- In some embodiments, the processor 1101, the memory 1102, and the peripheral device interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral device interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
- the radio frequency circuit 1104 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
- the radio frequency circuit 1104 communicates with a communication network and other communication devices through electromagnetic signals.
- the radio frequency circuit 1104 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
- the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
- the radio frequency circuit 1104 can communicate with other terminals through at least one wireless communication protocol.
- the wireless communication protocol includes but is not limited to: metropolitan area network, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network.
- the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuits, which is not limited in the present invention.
- the display screen 1105 is used to display UI (User Interface).
- the UI can include graphics, text, icons, videos, and any combination thereof.
- the display screen 1105 also has the ability to collect touch signals on or above the surface of the display screen 1105.
- the touch signal may be input to the processor 1101 as a control signal for processing.
- the display screen 1105 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
- In some embodiments, there may be one display screen 1105, which is provided on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, which are respectively arranged on different surfaces of the terminal 1100 or adopt a folded design; in still other embodiments, the display screen 1105 may be a flexible display screen, which is disposed on a curved surface or a folding surface of the terminal 1100. Moreover, the display screen 1105 can also be set to a non-rectangular irregular pattern, that is, a special-shaped screen.
- The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
- the camera assembly 1106 is used to capture images or videos.
- the camera assembly 1106 includes a front camera and a rear camera.
- the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
- the camera assembly 1106 may also include a flash.
- The flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, which can be used for light compensation under different color temperatures.
- the audio circuit 1107 may include a microphone and a speaker.
- The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1101 for processing, or input them to the radio frequency circuit 1104 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively arranged at different parts of the terminal 1100.
- the microphone can also be an array microphone or an omnidirectional acquisition microphone.
- the speaker is used to convert the electrical signal from the processor 1101 or the radio frequency circuit 1104 into sound waves.
- the speaker can be a traditional membrane speaker or a piezoelectric ceramic speaker.
- the audio circuit 1107 may also include a headphone jack.
- the positioning component 1108 is used to locate the current geographic location of the terminal 1100 to implement navigation or LBS (Location Based Service, location-based service).
- The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
- the power supply 1109 is used to supply power to various components in the terminal 1100.
- the power source 1109 may be alternating current, direct current, disposable batteries or rechargeable batteries.
- the rechargeable battery may support wired charging or wireless charging.
- the rechargeable battery can also be used to support fast charging technology.
- the terminal 1100 further includes one or more sensors 1110.
- the one or more sensors 1110 include, but are not limited to: an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.
- the acceleration sensor 1111 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1100.
- the acceleration sensor 1111 can be used to detect the components of gravitational acceleration on three coordinate axes.
- the processor 1101 may control the display screen 1105 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1111.
- the acceleration sensor 1111 may also be used for the collection of game or user motion data.
- the gyroscope sensor 1112 can detect the body direction and rotation angle of the terminal 1100, and the gyroscope sensor 1112 can cooperate with the acceleration sensor 1111 to collect the user's 3D actions on the terminal 1100.
- the processor 1101 can implement the following functions according to the data collected by the gyroscope sensor 1112: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
- the pressure sensor 1113 may be arranged on the side frame of the terminal 1100 and/or the lower layer of the display screen 1105.
- When the pressure sensor 1113 is arranged on the side frame of the terminal 1100, the user's holding signal of the terminal 1100 can be detected, and the processor 1101 performs left-hand and right-hand recognition or a quick operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is arranged on the lower layer of the display screen 1105, the processor 1101 controls the operability controls on the UI according to the user's pressure operation on the display screen 1105.
- the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
- the fingerprint sensor 1114 is used to collect the user's fingerprint.
- The processor 1101 can identify the user's identity based on the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 can identify the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1101 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
- the fingerprint sensor 1114 may be provided on the front, back or side of the terminal 1100. When a physical button or a manufacturer logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical button or the manufacturer logo.
- the optical sensor 1115 is used to collect the ambient light intensity.
- the processor 1101 may control the display brightness of the display screen 1105 according to the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is decreased.
- the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 according to the ambient light intensity collected by the optical sensor 1115.
- the proximity sensor 1116 also called a distance sensor, is usually arranged on the front panel of the terminal 1100.
- The proximity sensor 1116 is used to collect the distance between the user and the front of the terminal 1100. When the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the on-screen state to the off-screen state; when the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the off-screen state to the on-screen state.
- Those skilled in the art can understand that the structure shown in FIG. 11 does not constitute a limitation on the terminal 1100, and the terminal may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
- FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present invention.
- The server 1200 may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 1201 and one or more memories 1202, where at least one instruction is stored in the one or more memories 1202, and the at least one instruction is loaded and executed by the one or more processors 1201 to implement the target position acquisition method provided by the foregoing method embodiments.
- the server 1200 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 1200 may also include other components for implementing device functions, which will not be repeated here.
- In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions, and the instructions can be executed by a processor to complete the target position acquisition method in the foregoing embodiments.
- For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
- A person of ordinary skill in the art can understand that all or part of the steps of the foregoing embodiments can be completed by a program instructing relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium can be a read-only memory, a magnetic disk, an optical disc, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (16)
- A method for acquiring a position of a target, applied to a computer device, the method comprising: acquiring a plurality of frames of images, wherein a first image in the plurality of frames of images comprises a target to be detected, the first image being any one of the plurality of frames of images; invoking a position acquisition model, wherein model parameters of the position acquisition model are obtained by training based on a first position of a selected target in a first sample image among a plurality of frames of sample images and a second position of the selected target in the first sample image, the second position being predicted based on a third position of the selected target in a second sample image among the plurality of frames of sample images, and the third position being predicted based on the first position; the selected target being randomly selected from the first sample image; the second sample image being a sample image, among the plurality of frames of sample images, that is different from the first sample image; and determining, by the position acquisition model based on the model parameters and a position of the target to be detected in the first image, a position of the target to be detected in a second image, the second image being an image, among the plurality of frames of images, that is different from the first image.
- The method according to claim 1, wherein the determining, by the position acquisition model based on the model parameters and the position of the target to be detected in the first image, the position of the target to be detected in the second image comprises: acquiring image processing parameters based on the position of the target to be detected in the first image, the first image, and the model parameters; and processing the second image based on the image processing parameters, and outputting the position of the target to be detected in the second image.
- The method according to claim 2, wherein the acquiring image processing parameters based on the position of the target to be detected in the first image, the first image, and the model parameters comprises: generating, based on the position of the target to be detected in the first image, position indication information corresponding to the first image, the position indication information corresponding to the first image being used for representing a selected position of the target to be detected in the first image; and acquiring the image processing parameters based on the position indication information corresponding to the first image, the first image, and the model parameters; and the processing the second image based on the image processing parameters and outputting the position of the target to be detected in the second image comprises: processing the second image based on the image processing parameters, and outputting position indication information corresponding to the second image, the position indication information corresponding to the second image being used for representing a predicted position of the target to be detected in the second image.
- The method according to claim 3, wherein the acquiring the image processing parameters based on the position indication information corresponding to the first image, the first image, and the model parameters comprises: performing feature extraction on the first image based on the model parameters to obtain an image feature of the first image; and acquiring the image processing parameters based on the image feature of the first image and the position indication information corresponding to the first image; and the processing the second image based on the image processing parameters and outputting the position indication information corresponding to the second image comprises: performing feature extraction on the second image based on the model parameters to obtain an image feature of the second image; and processing the image feature of the second image based on the image processing parameters, and outputting the position indication information corresponding to the second image.
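The pipeline of claims 2 to 4 — extract features from the first image, combine them with the position indication information into "image processing parameters", then process the second image's features with those parameters to output the predicted position — can be sketched in a correlation-filter style. This is a minimal NumPy illustration, not the patent's actual model: the gradient-based feature extractor, the Gaussian position map, and the FFT cross-correlation are all illustrative assumptions standing in for the learned components.

```python
import numpy as np

def gaussian_map(shape, center, sigma=3.0):
    # Position indication information: a response map marking the target's
    # position (a Gaussian is one common choice; the claims only require
    # information "representing the selected position").
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
                  / (2 * sigma ** 2))

def extract_features(image):
    # Stand-in for the model's learned feature extraction (a CNN in
    # practice): image gradients stacked as a 2-channel feature map.
    gy, gx = np.gradient(image.astype(float))
    return np.stack([gy, gx])

def processing_params(first_image, first_pos):
    # "Image processing parameters": here, a template formed by weighting
    # the first image's features with its position indication map.
    return extract_features(first_image) * gaussian_map(first_image.shape,
                                                        first_pos)

def predict_shift(template, second_image):
    # Process the second image: cross-correlate its features with the
    # template (via FFT, so shifts wrap circularly) and take the peak,
    # which gives the target's displacement between the two frames.
    feat = extract_features(second_image)
    score = np.zeros(second_image.shape)
    for t, f in zip(template, feat):
        score += np.real(np.fft.ifft2(np.conj(np.fft.fft2(t))
                                      * np.fft.fft2(f)))
    dy, dx = np.unravel_index(np.argmax(score), score.shape)
    h, w = score.shape
    return (dy if dy <= h // 2 else dy - h,
            dx if dx <= w // 2 else dx - w)
```

Adding the returned shift to the target's position in the first image yields its predicted position in the second image, i.e. the output position indication of claim 3.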
- The method according to claim 1, wherein a training process of the position acquisition model comprises: acquiring a plurality of frames of sample images; invoking an initial model, randomly selecting, by the initial model, a target region in a first sample image among the plurality of frames of sample images as the selected target, acquiring a third position of the selected target in a second sample image based on a first position of the selected target in the first sample image, the first sample image, and the second sample image, and acquiring a second position of the selected target in the first sample image based on the third position of the selected target in the second sample image, the first sample image, and the second sample image; acquiring an error value of the second position relative to the first position based on the first position and the second position of the selected target in the first sample image; and adjusting model parameters of the initial model based on the error value until a target condition is met, to obtain the position acquisition model.
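The training process of claim 5 is a forward-backward (cycle-consistency) loop: track the randomly selected target forward into the second sample image, track it back into the first, and use the round-trip deviation as the training error, so no manual labels are needed. A skeleton of that loop, with `track` and `update` as hypothetical placeholders for the model's prediction and optimizer steps (the claim does not pin these to a specific network), might look like:

```python
import random

def forward_backward_train(sample_sets, track, update, params, stop_error=1.0):
    """Sketch of the claim-5 training cycle.

    track(params, src_img, src_pos, dst_img) -> predicted position in dst_img
    update(params, error)                    -> adjusted parameters
    Both callables are assumptions standing in for the real model.
    """
    for first_img, second_img in sample_sets:
        # Randomly select a target region in the first sample image.
        h, w = len(first_img), len(first_img[0])
        first_pos = (random.randrange(h), random.randrange(w))
        # Forward pass: third position of the target in the second image.
        third_pos = track(params, first_img, first_pos, second_img)
        # Backward pass: second position back in the first sample image.
        second_pos = track(params, second_img, third_pos, first_img)
        # Error value: deviation of the second position from the first --
        # the cycle supervises itself.
        error = (abs(second_pos[0] - first_pos[0])
                 + abs(second_pos[1] - first_pos[1]))
        params = update(params, error)
        if error < stop_error:  # "until the target condition is met"
            break
    return params
```

A perfect tracker closes the cycle with zero error; an imperfect one accumulates a penalty that `update` can use to adjust the model parameters.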
- The method according to claim 5, wherein the acquiring the third position of the selected target in the second sample image based on the first position of the selected target in the first sample image, the first sample image, and the second sample image comprises: acquiring first image processing parameters based on the first position and the first sample image; and processing the second sample image based on the first image processing parameters to obtain the third position; and the acquiring the second position of the selected target in the first sample image based on the third position of the selected target in the second sample image, the first sample image, and the second sample image comprises: acquiring second image processing parameters based on the third position and the second sample image; and processing the first sample image based on the second image processing parameters to obtain the second position.
- The method according to claim 6, wherein the acquiring first image processing parameters based on the first position and the first sample image comprises: performing feature extraction on the first sample image based on the model parameters of the initial model to obtain an image feature of the first sample image; and acquiring the first image processing parameters based on the image feature of the first sample image and the first position; and the processing the second sample image based on the first image processing parameters to obtain the third position comprises: performing feature extraction on the second sample image based on the model parameters of the initial model to obtain an image feature of the second sample image; and processing the image feature of the second sample image based on the first image processing parameters to obtain the third position.
- The method according to any one of claims 5 to 7, wherein the acquiring the third position of the selected target in the second sample image based on the first position of the selected target in the first sample image, the first sample image, and the second sample image comprises: generating, based on the first position, first position indication information corresponding to the first sample image, the first position indication information being used for indicating a selected position of the selected target in the first sample image; and acquiring, based on the first position indication information, the first sample image, and the second sample image, position indication information corresponding to the second sample image, the position indication information corresponding to the second sample image being used for representing a predicted position of the selected target in the second sample image; and the acquiring the second position of the selected target in the first sample image based on the third position of the selected target in the second sample image, the first sample image, and the second sample image comprises: acquiring, based on the position indication information corresponding to the second sample image, the first sample image, and the second sample image, second position indication information corresponding to the first sample image, the second position indication information being used for representing a predicted position of the target in the first sample image.
- The method according to claim 5, wherein the plurality of frames of sample images comprise a plurality of sample image sets, each sample image set comprising one frame of first sample image and at least one frame of second sample image, and each sample image set corresponding to one of the error values; and the adjusting the model parameters of the initial model based on the error value comprises: for every target number of sample image sets in the plurality of sample image sets, adjusting the model parameters of the initial model based on a plurality of error values corresponding to the target number of sample image sets.
- The method according to claim 9, wherein the adjusting the model parameters of the initial model based on the plurality of error values corresponding to the target number of sample image sets comprises either of the following: removing, based on the plurality of error values corresponding to the target number of sample image sets, error values among the plurality of error values that meet an error value condition, and adjusting the model parameters of the initial model based on the remaining error values; or determining, based on the plurality of error values corresponding to the target number of sample image sets, first weights of the plurality of error values, and adjusting the model parameters of the initial model based on the first weights of the plurality of error values and the plurality of error values, wherein the first weight of an error value, among the plurality of error values, that meets the error value condition is zero.
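The two options of claim 10 are equivalent ways to discount unreliable cycles: either drop the error values meeting the error value condition before updating, or keep them all but assign them a first weight of zero. A small sketch, assuming hypothetically that the condition is "the largest fraction of errors in the batch" (the claim leaves the condition open):

```python
def filtered_mean_error(errors, drop_ratio=0.25):
    # Option 1: remove error values meeting the condition (here, the
    # largest drop_ratio fraction, which typically correspond to failed
    # forward-backward cycles), then average the remaining error values.
    n_drop = int(len(errors) * drop_ratio)
    kept = sorted(errors)[:len(errors) - n_drop]
    return sum(kept) / len(kept)

def weighted_mean_error(errors, drop_ratio=0.25):
    # Option 2: same effect expressed through first weights -- zero for
    # error values meeting the condition, uniform for the rest.
    n_drop = int(len(errors) * drop_ratio)
    threshold = sorted(errors)[len(errors) - n_drop - 1]
    weights = [0.0 if e > threshold else 1.0 for e in errors]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, errors)) / total
```

With both formulations the resulting aggregate error is the same, which is why the claim lists them as interchangeable alternatives.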
- The method according to claim 9 or 10, wherein each sample image set corresponds to one second weight; and the adjusting the model parameters of the initial model based on the plurality of error values corresponding to the target number of sample image sets comprises: acquiring the second weight of the error value of each sample image set, the second weight being positively correlated with a displacement of the selected target across the plurality of frames of sample images in each sample image set; and adjusting the model parameters of the initial model based on the plurality of error values and a plurality of second weights corresponding to the target number of sample image sets.
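Claim 11 only requires the second weight to be positively correlated with the selected target's displacement (larger motion makes a harder, more informative sample); the exact mapping is unspecified. One simple normalized-linear choice, purely as an illustration:

```python
def second_weights(displacements):
    # Second weights, positively correlated with each sample image set's
    # target displacement; linear normalization is one option among many.
    total = sum(displacements)
    if total == 0:
        return [1.0 / len(displacements)] * len(displacements)
    return [d / total for d in displacements]

def weighted_error(errors, displacements):
    # Aggregate error used to adjust the model parameters: each sample
    # image set's error value scaled by its second weight.
    ws = second_weights(displacements)
    return sum(w * e for w, e in zip(ws, errors))
```

Under this choice, a sample set in which the target barely moves contributes little to the parameter update, while a fast-moving target dominates it.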
- A method for acquiring a position of a target, applied to a computer device, the method comprising: acquiring a plurality of frames of sample images; invoking an initial model, acquiring, by the initial model based on a first position of a selected target in a first sample image among the plurality of frames of sample images, a third position of the selected target in a second sample image, acquiring, based on the third position of the selected target in the second sample image, a second position of the selected target in the first sample image, and adjusting model parameters of the initial model based on the first position and the second position to obtain a position acquisition model; the selected target being obtained by the initial model randomly selecting a target region in the first sample image; the second sample image being a sample image, among the plurality of frames of sample images, that is different from the first sample image; and when a plurality of frames of images are acquired, invoking the position acquisition model, and determining positions of a target to be detected in the plurality of frames of images according to the position acquisition model.
- An apparatus for acquiring a position of a target, the apparatus comprising: an image acquisition module, configured to acquire a plurality of frames of images, wherein a first image in the plurality of frames of images comprises a target to be detected, the first image being any one of the plurality of frames of images; a model invoking module, configured to invoke a position acquisition model, wherein model parameters of the position acquisition model are obtained by training based on a first position of a selected target in a first sample image among a plurality of frames of sample images and a second position of the selected target in the first sample image, the second position being predicted based on a third position of the selected target in a second sample image among the plurality of frames of sample images, and the third position being predicted based on the first position; the selected target being randomly selected from the first sample image; the second sample image being a sample image, among the plurality of frames of sample images, that is different from the first sample image; and a position acquisition module, configured to determine, by the position acquisition model based on the model parameters and a position of the target to be detected in the first image, a position of the target to be detected in a second image, the second image being an image, among the plurality of frames of images, that is different from the first image.
- An apparatus for acquiring a position of a target, the apparatus comprising: an image acquisition module, configured to acquire a plurality of frames of sample images; a model training module, configured to invoke an initial model, acquire, by the initial model based on a first position of a selected target in a first sample image among the plurality of frames of sample images, a third position of the selected target in a second sample image, acquire, based on the third position of the selected target in the second sample image, a second position of the selected target in the first sample image, and adjust model parameters of the initial model based on the first position and the second position to obtain a position acquisition model; and a position acquisition module, configured to, when a plurality of frames of images are acquired, invoke the position acquisition model and determine positions of a target to be detected in the plurality of frames of images according to the position acquisition model.
- A computer device, comprising one or more processors and one or more memories, the one or more memories storing at least one instruction, the instruction being loaded and executed by the one or more processors to implement the operations performed in the method for acquiring a position of a target according to any one of claims 1 to 12.
- A computer-readable storage medium, storing at least one instruction, the instruction being loaded and executed by a processor to implement the operations performed in the method for acquiring a position of a target according to any one of claims 1 to 12.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021542180A JP7154678B2 (ja) | 2019-05-06 | 2020-04-28 | 目標の位置取得方法、装置、コンピュータ機器及びコンピュータプログラム |
KR1020217025054A KR20210111833A (ko) | 2019-05-06 | 2020-04-28 | 타겟의 위치들을 취득하기 위한 방법 및 장치와, 컴퓨터 디바이스 및 저장 매체 |
EP20802490.1A EP3968223A4 (en) | 2019-05-06 | 2020-04-28 | METHOD AND APPARATUS FOR ACQUIRING TARGET POSITIONS, COMPUTER DEVICE, AND INFORMATION MEDIA |
US17/377,302 US20210343041A1 (en) | 2019-05-06 | 2021-07-15 | Method and apparatus for obtaining position of target, computer device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910371250.9 | 2019-05-06 | ||
CN201910371250.9A CN110110787A (zh) | 2019-05-06 | 2019-05-06 | 目标的位置获取方法、装置、计算机设备及存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/377,302 Continuation US20210343041A1 (en) | 2019-05-06 | 2021-07-15 | Method and apparatus for obtaining position of target, computer device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020224479A1 true WO2020224479A1 (zh) | 2020-11-12 |
Family
ID=67488282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/087361 WO2020224479A1 (zh) | 2019-05-06 | 2020-04-28 | 目标的位置获取方法、装置、计算机设备及存储介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210343041A1 (zh) |
EP (1) | EP3968223A4 (zh) |
JP (1) | JP7154678B2 (zh) |
KR (1) | KR20210111833A (zh) |
CN (1) | CN110110787A (zh) |
WO (1) | WO2020224479A1 (zh) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110787A (zh) * | 2019-05-06 | 2019-08-09 | 腾讯科技(深圳)有限公司 | 目标的位置获取方法、装置、计算机设备及存储介质 |
CN110717593B (zh) * | 2019-10-14 | 2022-04-19 | 上海商汤临港智能科技有限公司 | 神经网络训练、移动信息测量、关键帧检测的方法及装置 |
CN110705510B (zh) * | 2019-10-16 | 2023-09-05 | 杭州优频科技有限公司 | 一种动作确定方法、装置、服务器和存储介质 |
CN111127539B (zh) * | 2019-12-17 | 2022-11-15 | 苏州智加科技有限公司 | 视差确定方法、装置、计算机设备及存储介质 |
TWI727628B (zh) * | 2020-01-22 | 2021-05-11 | 台達電子工業股份有限公司 | 具有位姿補償功能的動態追蹤系統及其位姿補償方法 |
CN111369585B (zh) * | 2020-02-28 | 2023-09-29 | 上海顺久电子科技有限公司 | 一种图像处理方法及设备 |
CN111414948B (zh) * | 2020-03-13 | 2023-10-13 | 腾讯科技(深圳)有限公司 | 目标对象检测方法和相关装置 |
CN113469172B (zh) * | 2020-03-30 | 2022-07-01 | 阿里巴巴集团控股有限公司 | 目标定位、模型训练、界面交互方法及设备 |
CN112115777A (zh) * | 2020-08-10 | 2020-12-22 | 杭州优行科技有限公司 | 一种交通标志类别的检测识别方法、装置和设备 |
CN112016514B (zh) * | 2020-09-09 | 2024-05-14 | 平安科技(深圳)有限公司 | 一种交通标志识别方法、装置、设备及储存介质 |
CN113590877B (zh) * | 2021-08-05 | 2024-06-14 | 杭州海康威视数字技术股份有限公司 | 获取标注数据的方法及装置 |
CN114608555B (zh) * | 2022-02-28 | 2024-08-06 | 珠海云洲智能科技股份有限公司 | 目标定位方法、系统及存储介质 |
CN114419471B (zh) * | 2022-03-29 | 2022-08-30 | 北京云迹科技股份有限公司 | 一种楼层识别方法、装置、电子设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107492113A (zh) * | 2017-06-01 | 2017-12-19 | 南京行者易智能交通科技有限公司 | 一种视频图像中运动目标位置预测模型训练方法、位置预测方法及轨迹预测方法 |
WO2018121841A1 (en) * | 2016-12-27 | 2018-07-05 | Telecom Italia S.P.A. | Method and system for identifying targets in scenes shot by a camera |
CN108734109A (zh) * | 2018-04-24 | 2018-11-02 | 中南民族大学 | 一种面向图像序列的视觉目标跟踪方法及系统 |
CN109584276A (zh) * | 2018-12-04 | 2019-04-05 | 北京字节跳动网络技术有限公司 | 关键点检测方法、装置、设备及可读介质 |
CN110110787A (zh) * | 2019-05-06 | 2019-08-09 | 腾讯科技(深圳)有限公司 | 目标的位置获取方法、装置、计算机设备及存储介质 |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7756296B2 (en) * | 2007-03-27 | 2010-07-13 | Mitsubishi Electric Research Laboratories, Inc. | Method for tracking objects in videos using forward and backward tracking |
US9141196B2 (en) * | 2012-04-16 | 2015-09-22 | Qualcomm Incorporated | Robust and efficient learning object tracker |
US9911197B1 (en) * | 2013-03-14 | 2018-03-06 | Hrl Laboratories, Llc | Moving object spotting by forward-backward motion history accumulation |
US10474921B2 (en) * | 2013-06-14 | 2019-11-12 | Qualcomm Incorporated | Tracker assisted image capture |
JP6344953B2 (ja) * | 2014-04-07 | 2018-06-20 | パナソニック株式会社 | 軌跡解析装置および軌跡解析方法 |
US9646389B2 (en) * | 2014-08-26 | 2017-05-09 | Qualcomm Incorporated | Systems and methods for image scanning |
US20160132728A1 (en) * | 2014-11-12 | 2016-05-12 | Nec Laboratories America, Inc. | Near Online Multi-Target Tracking with Aggregated Local Flow Descriptor (ALFD) |
US9811732B2 (en) * | 2015-03-12 | 2017-11-07 | Qualcomm Incorporated | Systems and methods for object tracking |
US9613273B2 (en) * | 2015-05-19 | 2017-04-04 | Toyota Motor Engineering & Manufacturing North America, Inc. | Apparatus and method for object tracking |
US10586102B2 (en) * | 2015-08-18 | 2020-03-10 | Qualcomm Incorporated | Systems and methods for object tracking |
US10019631B2 (en) * | 2015-11-05 | 2018-07-10 | Qualcomm Incorporated | Adapting to appearance variations when tracking a target object in video sequence |
EP3403216B1 (en) * | 2016-01-11 | 2023-11-01 | Mobileye Vision Technologies Ltd. | Systems and methods for augmenting upright object detection |
US10255505B2 (en) * | 2016-09-21 | 2019-04-09 | GumGum, Inc. | Augmenting video data to present real-time sponsor metrics |
US10339671B2 (en) * | 2016-11-14 | 2019-07-02 | Nec Corporation | Action recognition using accurate object proposals by tracking detections |
KR20240005161A (ko) * | 2016-12-09 | 2024-01-11 | 톰톰 글로벌 콘텐트 비.브이. | 비디오 기반 위치결정 및 매핑을 위한 방법 및 시스템 |
GB2561892A (en) * | 2017-04-28 | 2018-10-31 | Nokia Technologies Oy | A Method, an apparatus and a computer program product for object detection |
WO2019064375A1 (ja) * | 2017-09-27 | 2019-04-04 | 日本電気株式会社 | 情報処理装置、制御方法、及びプログラム |
CN109584265B (zh) * | 2017-09-28 | 2020-10-02 | 杭州海康威视数字技术股份有限公司 | 一种目标跟踪方法及装置 |
JP7346401B2 (ja) * | 2017-11-10 | 2023-09-19 | エヌビディア コーポレーション | 安全で信頼できる自動運転車両のためのシステム及び方法 |
CN108062525B (zh) * | 2017-12-14 | 2021-04-23 | 中国科学技术大学 | 一种基于手部区域预测的深度学习手部检测方法 |
WO2019136479A1 (en) * | 2018-01-08 | 2019-07-11 | The Regents On The University Of California | Surround vehicle tracking and motion prediction |
US10963700B2 (en) * | 2018-09-15 | 2021-03-30 | Accenture Global Solutions Limited | Character recognition |
CN109635657B (zh) * | 2018-11-12 | 2023-01-06 | 平安科技(深圳)有限公司 | 目标跟踪方法、装置、设备及存储介质 |
WO2020194664A1 (ja) * | 2019-03-28 | 2020-10-01 | オリンパス株式会社 | トラッキング装置、学習済モデル、内視鏡システム及びトラッキング方法 |
-
2019
- 2019-05-06 CN CN201910371250.9A patent/CN110110787A/zh not_active Withdrawn
-
2020
- 2020-04-28 JP JP2021542180A patent/JP7154678B2/ja active Active
- 2020-04-28 EP EP20802490.1A patent/EP3968223A4/en active Pending
- 2020-04-28 WO PCT/CN2020/087361 patent/WO2020224479A1/zh unknown
- 2020-04-28 KR KR1020217025054A patent/KR20210111833A/ko not_active Application Discontinuation
-
2021
- 2021-07-15 US US17/377,302 patent/US20210343041A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018121841A1 (en) * | 2016-12-27 | 2018-07-05 | Telecom Italia S.P.A. | Method and system for identifying targets in scenes shot by a camera |
CN107492113A (zh) * | 2017-06-01 | 2017-12-19 | 南京行者易智能交通科技有限公司 | 一种视频图像中运动目标位置预测模型训练方法、位置预测方法及轨迹预测方法 |
CN108734109A (zh) * | 2018-04-24 | 2018-11-02 | 中南民族大学 | 一种面向图像序列的视觉目标跟踪方法及系统 |
CN109584276A (zh) * | 2018-12-04 | 2019-04-05 | 北京字节跳动网络技术有限公司 | 关键点检测方法、装置、设备及可读介质 |
CN110110787A (zh) * | 2019-05-06 | 2019-08-09 | 腾讯科技(深圳)有限公司 | 目标的位置获取方法、装置、计算机设备及存储介质 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3968223A4 |
Also Published As
Publication number | Publication date |
---|---|
US20210343041A1 (en) | 2021-11-04 |
EP3968223A1 (en) | 2022-03-16 |
JP7154678B2 (ja) | 2022-10-18 |
EP3968223A4 (en) | 2022-10-26 |
CN110110787A (zh) | 2019-08-09 |
JP2022518745A (ja) | 2022-03-16 |
KR20210111833A (ko) | 2021-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020224479A1 (zh) | 目标的位置获取方法、装置、计算机设备及存储介质 | |
CN110210571B (zh) | 图像识别方法、装置、计算机设备及计算机可读存储介质 | |
CN110121118B (zh) | 视频片段定位方法、装置、计算机设备及存储介质 | |
CN110544272B (zh) | 脸部跟踪方法、装置、计算机设备及存储介质 | |
CN110807361B (zh) | 人体识别方法、装置、计算机设备及存储介质 | |
CN111325726A (zh) | 模型训练方法、图像处理方法、装置、设备及存储介质 | |
CN110555839A (zh) | 缺陷检测识别方法、装置、计算机设备及存储介质 | |
CN110570460B (zh) | 目标跟踪方法、装置、计算机设备及计算机可读存储介质 | |
CN112749613B (zh) | 视频数据处理方法、装置、计算机设备及存储介质 | |
CN111091166A (zh) | 图像处理模型训练方法、图像处理方法、设备及存储介质 | |
CN114332530A (zh) | 图像分类方法、装置、计算机设备及存储介质 | |
CN111062981A (zh) | 图像处理方法、装置及存储介质 | |
CN111192262A (zh) | 基于人工智能的产品缺陷分类方法、装置、设备及介质 | |
CN112733970B (zh) | 图像分类模型处理方法、图像分类方法及装置 | |
CN112581358B (zh) | 图像处理模型的训练方法、图像处理方法及装置 | |
CN108288032A (zh) | 动作特征获取方法、装置及存储介质 | |
CN111178343A (zh) | 基于人工智能的多媒体资源检测方法、装置、设备及介质 | |
CN113705302A (zh) | 图像生成模型的训练方法、装置、计算机设备及存储介质 | |
CN113918767A (zh) | 视频片段定位方法、装置、设备及存储介质 | |
CN111589138A (zh) | 动作预测方法、装置、设备及存储介质 | |
CN113821658A (zh) | 对编码器进行训练的方法、装置、设备及存储介质 | |
CN110232417B (zh) | 图像识别方法、装置、计算机设备及计算机可读存储介质 | |
CN111982293B (zh) | 体温测量方法、装置、电子设备及存储介质 | |
CN111310701B (zh) | 手势识别方法、装置、设备及存储介质 | |
CN114996515A (zh) | 视频特征提取模型的训练方法、文本生成方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20802490 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021542180 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20217025054 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2020802490 Country of ref document: EP Effective date: 20211206 |