WO2022205632A1 - Target detection method and apparatus, device and storage medium - Google Patents

Target detection method and apparatus, device and storage medium

Info

Publication number
WO2022205632A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
target
information
position change
Prior art date
Application number
PCT/CN2021/102202
Other languages
French (fr)
Chinese (zh)
Inventor
韩志伟
刘诗男
杨昆霖
侯军
伊帅
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Publication of WO2022205632A1

Classifications

    • G06V 20/52: Scenes; scene-specific elements; context or environment of the image; surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30232: Subject of image; surveillance

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to a target detection method, apparatus, device, and storage medium.
  • the present disclosure provides a target detection method and apparatus, device and storage medium to solve the deficiencies in the related art.
  • a target detection method, including: acquiring position change information of at least one pixel in a first image relative to a corresponding pixel in a previous frame of image, where the first image is a frame in a video to be detected; acquiring an image feature of the first image as a first feature; acquiring a second feature based on the position change information; performing enhancement processing on the first feature based on the second feature to generate a fusion feature; and determining a detection result of a target object in the first image according to the fusion feature.
  • a target detection apparatus, including: a first acquisition module configured to acquire position change information of at least one pixel in a first image relative to a corresponding pixel in a previous frame of image, the first image being a frame in a video to be detected; a second acquisition module configured to acquire an image feature of the first image as a first feature and to acquire a second feature based on the position change information; a fusion module configured to perform enhancement processing on the first feature based on the second feature to generate a fusion feature; and a detection module configured to determine a detection result of a target object in the first image according to the fusion feature.
  • an electronic device including a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method of the first aspect when executing the computer instructions.
  • a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method of the first aspect.
  • FIG. 1 is a flowchart of a target detection method shown in an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a first image and a previous frame image thereof shown in an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of position change information of a first image shown in an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a process of target detection shown in an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a target detection apparatus shown in an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various kinds of information, such information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another.
  • for example, first information may also be referred to as second information and, similarly, second information as first information, without departing from the scope of the present disclosure.
  • depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
  • FIG. 1 shows the flow of the method, which includes steps S101 to S104.
  • the object to be detected targeted by the target detection method may be an image or a video.
  • each frame of the video can be processed in batches, or each frame of the video can be processed sequentially.
  • for ease of description, this embodiment takes a certain frame of the video as the object to be detected.
  • the purpose of target detection is to detect the target object in the object to be detected to obtain the detection result, and the detection result can represent one or more aspects of the information of the target object (for example, the position, number, density and other information of the target object).
  • step S101 the position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image is acquired, and the first image is a frame of image in the video to be detected.
  • the first image is a frame of image in the video to be detected.
  • at least one pixel in the first image corresponds to the same object as the corresponding pixel in the previous frame of image.
  • the video to be detected may be a video recorded for a specific space, and the space may contain the target object and other objects at the same time.
  • the first image and its previous frame may be as shown in FIG. 2. The first image may be any frame from the second frame of the video to be detected onwards (the second frame included), because the first frame may have no previous frame.
  • the video to be detected may be a surveillance video or a drone video, that is, the video to be detected may be a video captured by a fixed surveillance camera, or a video captured by a flying drone.
  • the to-be-detected video to which the first image shown in FIG. 2 and its previous frame image belong is a street view video captured by a drone.
  • image regions containing target objects such as crowds are often large in surveillance video, so detection tasks such as counting people are relatively simple; in drone video, however, the regions containing target objects such as people are often very small, and detection by manual observation is prone to errors. Such errors can be avoided by using the detection method provided in this embodiment.
  • the target object may be at least one of the following: a person, a vehicle, and an animal.
  • the position change between corresponding pixels of the two frames that belong to the same object may be caused by the actual movement of the object in the space captured by the video to be detected, by the movement of the video capture device (for example, a drone), or by a combination of the two.
  • since the position change information represents the position change of corresponding pixels across the two frames, and each corresponding object in the two frames consists of several contiguous pixels, the position change information of all pixels of the same object can be the same.
  • the position change information of the pixel point in the first image shown in FIG. 2 relative to the corresponding pixel point in the previous frame image is shown in FIG. 3 .
  • a pre-trained neural network may be used to obtain position change information.
  • when training the neural network, a large number of video frames can be collected as samples, with the position change information of the corresponding pixels in these frames used as labels. The samples are fed into the neural network to be trained, the output position change information (predicted value) is compared with the labelled position change information (ground truth) to obtain a network loss value, and the network parameters are adjusted according to the loss value. After repeated iterations and continuous optimization, a trained neural network that meets the accuracy requirements is obtained, as sketched below.
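  • purely as a non-limiting illustration of the training loop described above, a minimal PyTorch-style sketch follows; the network, the data loader and the hyper-parameters are placeholders assumed for the example and are not specified by the present disclosure.

```python
import torch
import torch.nn as nn

def train_position_change_net(model, data_loader, num_epochs=10, lr=1e-4, device="cuda"):
    """Train a network that predicts per-pixel position change (e.g. optical flow)
    from two consecutive frames, using labelled position change as ground truth."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # difference between predicted value and ground truth

    for _ in range(num_epochs):
        for prev_frame, cur_frame, change_gt in data_loader:
            prev_frame = prev_frame.to(device)
            cur_frame = cur_frame.to(device)
            change_gt = change_gt.to(device)
            change_pred = model(prev_frame, cur_frame)  # predicted position change, (B, 2, H, W)
            loss = criterion(change_pred, change_gt)    # network loss value
            optimizer.zero_grad()
            loss.backward()                             # adjust network parameters
            optimizer.step()
    return model
```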
  • step S102 an image feature of the first image is acquired as a first feature; and a second feature is acquired based on the position change information.
  • the first feature and the second feature may be acquired in any order: the first feature may be acquired before the second feature, the second feature before the first feature, or both may be acquired simultaneously.
  • a pre-trained neural network may be used to obtain the image feature of the first image as the first feature, for example, the VGG16_bn model may be used to extract the first feature.
  • a pre-trained neural network may be used to obtain the second feature based on the position change information, for example, a backbone model may be used to extract the second feature. It should be understood by those skilled in the art that the above specific manner for obtaining the second feature is only for illustration, which is not limited by the embodiments of the present disclosure.
  • the first feature and the second feature may correspond to feature maps of the same size.
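  • as an illustrative sketch only, the two feature extractors could look as follows; the VGG16_bn truncation point and the small convolutional backbone for the optical-flow branch are assumptions made for the example, not requirements of the disclosure.

```python
import torch.nn as nn
from torchvision.models import vgg16_bn

class ImageFeatureExtractor(nn.Module):
    """First feature: image feature of the first image (VGG16_bn trunk)."""
    def __init__(self):
        super().__init__()
        # keep only the convolutional layers up to (and excluding) the fourth pooling layer,
        # giving a 512-channel map at 1/8 of the input resolution (assumed cut-off)
        self.trunk = vgg16_bn(weights=None).features[:33]

    def forward(self, image):                 # image: (B, 3, H, W)
        return self.trunk(image)              # (B, 512, H/8, W/8)

class FlowFeatureExtractor(nn.Module):
    """Second feature: feature of the position change (optical flow) information."""
    def __init__(self, out_channels=512):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(2, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, flow):                  # flow: (B, 2, H, W)
        return self.trunk(flow)               # (B, 512, H/8, W/8), same size as the first feature
```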
  • step S103 the first feature is enhanced based on the second feature to generate a fusion feature.
  • the objects in the first image differ from one another in one or more aspects (for example, the crowds, buildings and vehicles in the first image differ in size and appearance), and these differences are reflected in the first feature of the first image. The position change information, in turn, represents differences in motion between the objects (for example, if a person is located at point A in the first image and at point B in the previous frame, the person's position change information in the first image can be determined from the change of point A relative to point B; if a building is located at point C in both the first image and the previous frame, its position change information can be determined from the change of point C relative to point C, that is, the building is static), and these motion differences are reflected in the second feature derived from the position change information. Therefore, using the second feature to enhance the first feature and generate the fusion feature further strengthens the differences between objects that are reflected in the first feature; in other words, the differences between objects become more distinct and refined in the fusion feature.
  • common methods of feature fusion are concatenating two features, which increases the number of channels, or adding two features, which keeps the number of channels unchanged after fusion.
  • in one example, the second feature may be used as a mask and multiplied with the first feature to obtain the fusion feature, as sketched below.
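  • a minimal sketch of such mask-based fusion, assuming both feature maps have the same shape; the sigmoid squashing is an assumption of the example, not part of the disclosure.

```python
import torch

def fuse_features(image_feat: torch.Tensor, flow_feat: torch.Tensor) -> torch.Tensor:
    """Enhance the first feature using the second feature as a mask (element-wise product)."""
    mask = torch.sigmoid(flow_feat)  # map the second feature to (0, 1); illustrative choice
    return image_feat * mask         # fusion feature, same shape and channel count as image_feat
```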
  • step S104 the detection result of the target object in the first image is determined according to the fusion feature.
  • the target object may be one type of object in the first image (for example, a crowd) or several types of objects (for example, crowds and vehicles, or cattle, horses and sheep); the target object can be determined according to the user's selection or automatically according to a preset rule.
  • the detection result can represent the information of the target object in one or more aspects (for example, the location, quantity, density, etc. of the target object), and the coverage of the detection result can be determined according to the user's choice, or can be automatically determined according to preset rules.
  • those skilled in the art should understand that the above definitions of the target object and the detection result are merely illustrative and do not limit the embodiments of the present disclosure.
  • in the embodiments of the present disclosure, the position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame is acquired; the first feature of the first image and the second feature of the position change information are acquired respectively; the first feature is enhanced based on the second feature to generate a fusion feature; and the detection result of the target object in the first image is finally determined according to the fusion feature. Because the position change information between corresponding pixels of two adjacent frames is used, the temporal information of the video is exploited, which increases the accuracy of the detection result.
  • moreover, in videos such as drone footage the target objects are small and errors are hard to avoid even with manual observation; since the detection method of this embodiment uses the position change information and enhances the first feature when generating the fusion feature, the accuracy of the detection result is increased, that is, a relatively accurate detection result can be obtained.
  • the position change information includes optical flow information.
  • optical flow information represents the instantaneous velocity of pixel motion of spatially moving objects on the observation imaging plane. The optical flow information of the first image may be obtained with the Lucas-Kanade (LK) algorithm.
  • however, the LK algorithm places strong constraints on the video, such as constant brightness, a short interval between adjacent frames and similar motion of adjacent pixels, so its accuracy and efficiency are limited.
  • to obtain optical flow information more efficiently and accurately, deep learning methods may also be used, for example the FlowNet or FlowNet2 model.
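  • for illustration, dense optical flow between the previous frame and the first image can be computed with a classical one-call method such as OpenCV's Farneback algorithm (used here only because it is compact; the disclosure itself refers to LK and FlowNet/FlowNet2).

```python
import cv2
import numpy as np

def dense_optical_flow(prev_frame: np.ndarray, cur_frame: np.ndarray) -> np.ndarray:
    """Per-pixel position change (dx, dy) of the current frame relative to the previous one."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    # positional arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    # (typical default values, not prescribed by the disclosure)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return flow  # shape (H, W, 2)
```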
  • the first feature of the first image and the second feature of the position change information may be acquired in the following manner: the image feature of the first image is acquired as the first feature, and the optical flow feature obtained from the optical flow information is used as the second feature.
  • the image feature can represent at least one dimension of the pixels of the first image, and the optical flow feature can represent the position change rate of the pixels of the first image.
  • the first feature may be enhanced based on the second feature in the following manner to generate the fusion feature: first, the position change rate of at least one pixel of the first image is determined according to the second feature; next, for each pixel of the at least one pixel, an enhancement parameter of a target feature element is determined according to the position change rate of that pixel, where the target feature element is the feature element corresponding to that pixel in the first feature; finally, based on each enhancement parameter, differentiated enhancement processing is performed on the corresponding target feature elements of the first feature to generate the fusion feature.
  • the position change information can represent the difference in movement speed between the objects in the first image, and this difference is reflected in the second feature of the position change information; the difference in movement speed between the target object and the other objects is therefore reflected in the second feature. For example, if the target object is a pedestrian, its movement speed is higher than that of other objects such as buildings.
  • the pixels of the first image can be divided into different sets of regions, each set constituting an object, and different objects move at different speeds, that is, the pixels of different objects have different position change rates. The position change rate of each pixel can therefore be determined from the second feature, and pixels with different position change rates represent different objects. Accordingly, the enhancement parameter of a target feature element can be determined according to the position change rate of its pixel, and the target feature element is then enhanced to obtain a fusion sub-feature of the fusion feature; in other words, a fusion sub-feature for that target feature element is obtained.
  • in this way, different feature elements are enhanced to different degrees; that is, differentiated enhancement processing is performed on the feature elements of the first feature as a whole.
  • the enhanced first feature forms the fusion feature; equivalently, all fusion sub-features together constitute the fusion feature.
  • the enhancement parameter can indicate whether to enhance or the degree of enhancement; that is, the pixels of the target object and the pixels of other objects can be distinguished by whether, or how strongly, their feature elements are enhanced, so as to strengthen the difference between the target object and the other objects in the first feature. For example, only the feature elements corresponding to pixels of the target object may be enhanced, or those elements may be enhanced to a higher degree while the elements of other pixels are enhanced to a lower degree. Further, the movement speed of the target object is larger than that of the other objects, and accordingly the position change rate of the pixels of the target object is also larger.
  • the enhancement parameter of the target feature element may be determined according to the position change rate of the pixel point and a preset standard change rate. For example, if the standard change rate is a threshold, the feature elements corresponding to the pixels whose position change rate is greater than the threshold value are enhanced, and the feature elements corresponding to the pixels whose position change rate is less than or equal to the threshold value are not enhanced.
  • alternatively, the standard change rate can be used as a reference value, and the degree of enhancement of a feature element is determined by the relationship between the position change rate of the pixel and this reference value: in response to the position change rate of the pixel being equal to the standard change rate, the enhancement parameter of the target feature element is determined to be a preset standard enhancement parameter; in response to the position change rate of the pixel being greater than the standard change rate, the enhancement parameter of the target feature element is determined to be greater than the standard enhancement parameter; and in response to the position change rate of the pixel being smaller than the standard change rate, the enhancement parameter of the target feature element is determined to be smaller than the standard enhancement parameter.
  • in this way, the position change rate of each pixel is determined from the second feature of the position change information; according to the differences in position change rate, the enhancement parameters of the feature elements corresponding to different pixels are determined, so that the feature elements of only some pixels are enhanced, or all feature elements are enhanced to different degrees. This further strengthens the difference between the target object and the other objects in the first feature, thereby increasing the accuracy and efficiency of the detection result.
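  • a minimal sketch of such differentiated enhancement, assuming the flow has been resized to the feature-map resolution and that the enhancement parameter grows linearly with the position change rate; the linear relation and the default values are assumptions of the example, not part of the disclosure.

```python
import torch

def differential_enhancement(image_feat: torch.Tensor,
                             flow: torch.Tensor,
                             standard_rate: float = 1.0,
                             standard_gain: float = 1.0) -> torch.Tensor:
    """Enhance each feature element according to the position change rate of its pixel.

    image_feat: first feature, (B, C, H, W); flow: position change, (B, 2, H, W)."""
    rate = torch.linalg.norm(flow, dim=1, keepdim=True)  # position change rate per pixel, (B, 1, H, W)
    # enhancement parameter: equal to the standard parameter when rate == standard_rate,
    # greater when the rate is greater, smaller when the rate is smaller
    enhancement = standard_gain * rate / standard_rate
    return image_feat * enhancement                      # fusion feature (per-element fusion sub-features)
```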
  • the detection result of the target object in the first image may be determined according to the fusion feature in the following manner: first, a density map of the target object is generated according to the fusion feature; next, the number of target objects in the first image is determined based on the number of density points in the density map that refer to the target object (for example, by summing the density points).
  • the density map is used to indicate information such as the position, quantity and density of the target object in the first image; it has density points that refer to the target object, and its size may be equal to the size of the feature maps corresponding to the first feature and the second feature. The number of target objects can therefore be determined according to the number of density points that refer to the target object in the density map, that is, by summing the density points.
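  • counting by summation over the density map can be sketched as follows; the tensor layout is an assumption of the example.

```python
import torch

def count_from_density_map(density_map: torch.Tensor) -> float:
    """Estimated number of target objects = sum of the density points in the map.

    density_map: (B, 1, H, W); each pixel holds the estimated object density at that location."""
    return density_map.sum().item()
```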
  • a pre-trained neural network can be used to determine the density map.
  • a decoder model such as the SFA (Stochastic Frontier Approach) decoder can be used to determine the density map. Such a model can take multiple feature maps as input to extract information at different scales, so the determined density map is more accurate.
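  • the decoder is not detailed beyond taking feature maps and regressing a density map; a minimal single-scale upsampling decoder, given purely as an assumed illustration (the SFA decoder referred to above may differ), could be:

```python
import torch.nn as nn

class DensityMapDecoder(nn.Module):
    """Minimal decoder: maps a fusion feature to a non-negative density map."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 1), nn.ReLU(inplace=True),  # non-negative density values
        )

    def forward(self, fusion_feat):   # (B, C, H/8, W/8)
        return self.head(fusion_feat)  # density map, (B, 1, H/2, W/2) with these factors
```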
  • for example, when the video to be detected is the street view video to which the first image shown in FIG. 2 belongs and the target object is a person in the street view, the number of pedestrians in the first image, that is, the number of pedestrians at the moment corresponding to the first image, can be determined with the above target detection method.
  • corresponding actions can then be taken according to the number of pedestrians. For example, when the number of pedestrians exceeds a preset number threshold, an alarm message can be issued to alert pedestrians and managers that the street is currently too crowded.
  • the accuracy and efficiency of the detection result can be further improved.
  • in some embodiments, quantity change information of the target objects in the video to be detected may also be generated in the following manner: first, obtain first quantity information of the target objects in the first image and second quantity information of the target objects in a second image, where the first image and the second image are each a frame of the video to be detected; next, obtain first time information of the first image and second time information of the second image, where the first time information is the time of the first image in the video to be detected and the second time information is the time of the second image in the video to be detected (the first time information may be earlier or later than the second time information); finally, determine the quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, where the quantity change information is used to indicate how the number of target objects in the video to be detected changes over time.
  • the number of second images is not limited and may be one or more; that is, the number of target objects may be obtained for a single frame or for multiple frames. Accordingly, the second time information acquired subsequently may also be one or more, and the quantity change information generated subsequently may concern two images (a first image and one second image) or multiple images (a first image and at least two second images).
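  • a simple sketch of deriving quantity change information from per-frame counts and their time information; the record layout is an illustrative assumption.

```python
from typing import List, Tuple

def quantity_change_info(counts_and_times: List[Tuple[float, float]]) -> List[dict]:
    """Change in the number of target objects over time.

    counts_and_times: (count, time_in_seconds) for the first image and one or
    more second images, in any order."""
    records = sorted(counts_and_times, key=lambda item: item[1])  # order by time
    changes = []
    for (prev_count, prev_t), (cur_count, cur_t) in zip(records, records[1:]):
        changes.append({
            "from_time": prev_t,
            "to_time": cur_t,
            "count_change": cur_count - prev_count,  # how the number changed between the two moments
        })
    return changes
```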
  • the method of acquiring the number of target objects in the second image (that is, the second quantity information) may be the same as or different from the above-mentioned method of acquiring the number of target objects in the first image (that is, the first quantity information); this is not specifically limited in this embodiment.
  • the time in the video to be detected can be a relative time, that is, a time measured from the moment the video starts: for example, if the total duration of the video is 25 minutes, the time of the start of the video is 00:00 and the time of the end of the video is 00:25. The time can also be an absolute time, that is, the actual recording time: for the same 25-minute video, the start time might be 2020.11.13 8:00 and the end time 2020.11.13 8:25.
  • when the video to be detected is the street view video to which the first image shown in FIG. 2 belongs and the target object is a person in the street view, the number of pedestrians in the first image and in at least one second image can therefore be determined, that is, the change in the number of pedestrians in the street view video can be determined.
  • in the present disclosure, by acquiring the number of target objects in other frames of the video to be detected and combining it with the time information of each frame to generate the quantity change information, the changes and trends in the number of target objects within the time period covered by the video can be obtained, which further increases the comprehensiveness of the detection results.
  • for example, the change trend of the number of people over the twelve months of a year can be obtained, so that consumption habits can be analysed and the peak months and quarters of consumption (that is, the peak consumption season) and the trough months and quarters (that is, the consumption off-season) can be identified; the change trend of the number of people during business hours can also be obtained, giving the daily peak and trough times of consumption.
  • the information obtained above can be used as guidance data for business operation or property management, so as to achieve the purpose of scientific management.
  • the change trend of traffic flow before and after holidays can be obtained, so that travel data can be counted, which can then be used as guidance data for expressway management.
  • in some embodiments, the detection result of the target object in the first image may also be determined according to the fusion feature in the following manner: first, a density map of the target object is generated according to the fusion feature; next, the number of target objects within a preset area of the first image is determined according to the position of each target object indicated in the density map and the preset area in the first image.
  • the density map is used to indicate information such as the position, quantity, density, etc. of the target object in the first image, and the size of the density map may be equal to the size of the feature maps corresponding to the first feature and the second feature.
  • the density map may contain the target objects of the first image together with label information such as a position and/or count mark for each target object. The number of target objects can therefore be determined according to the positions of the target objects in the density map, that is, by summing the target objects in the density map.
  • a pre-trained neural network can be used to determine the density map.
  • a decoder model such as the SFA (Stochastic Frontier Approach) decoder can be used to determine the density map. Such a model can take multiple feature maps as input to extract information at different scales, so the determined density map is more accurate.
  • the preset area can be an area where the flow of people is controlled, such as a place with limited capacity into which only a certain number of people are allowed, or a dangerous area such as a construction site from which pedestrians are prohibited, that is, where the flow of people needs to be kept at zero.
  • prompt information may be generated in response to the number of target objects in the preset area being greater than a preset number threshold. For example, if the flow of people in a restricted area exceeds the maximum allowed, an alarm is issued and further entry is prohibited; if pedestrians enter a construction area, an alarm is issued to remind them to leave in time; and in outdoor live games or sports such as football and basketball, the activity area of the players can be monitored and an alarm issued if they enter a foul area.
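  • a sketch of counting target objects inside a preset area and deciding whether to generate prompt information; the rectangular area and the position format are assumptions of the example.

```python
from typing import List, Tuple

def count_in_preset_area(positions: List[Tuple[float, float]],
                         area: Tuple[float, float, float, float],
                         max_allowed: int) -> Tuple[int, bool]:
    """Count target objects whose density-map positions fall inside a preset area.

    positions: (x, y) of each target object taken from the density map;
    area: (x_min, y_min, x_max, y_max); max_allowed: preset number threshold.
    Returns the count and whether prompt information (e.g. an alarm) should be generated."""
    x_min, y_min, x_max, y_max = area
    count = sum(1 for x, y in positions if x_min <= x <= x_max and y_min <= y <= y_max)
    return count, count > max_allowed
```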
  • using the number of target objects in the preset area as the detection result makes it possible to detect and control the flow of people in a specific area, which increases the pertinence and accuracy of the detection and broadens the range of applications of the detection method.
  • FIG. 4 shows a process of object detection according to an embodiment of the present disclosure.
  • the position change information is optical flow information
  • the target detection result is a density map.
  • the process is as follows: first perform optical flow prediction, then perform optical flow feature extraction and image feature extraction respectively, then perform feature fusion with optical flow features and image features to obtain fusion features, and finally use fusion features for density map prediction.
  • specifically, optical flow prediction is performed first: an optical flow extraction network is used to extract optical flow information from the first image and the frame preceding it. A neural network is then used to extract optical flow features from the extracted optical flow information, and another neural network (such as VGG16_bn) is used to extract image features from the first image. The optical flow features are multiplied, as a mask, with the image features to obtain the fusion features. Finally, the fusion features are sent to the decoder (e.g. SFA) to predict the density map.
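  • putting the steps of FIG. 4 together, an end-to-end sketch might look as follows; every module name is a placeholder for the corresponding network described above, and the sigmoid masking is an assumption of the example rather than part of the disclosure.

```python
import torch

def detect_targets(prev_frame, cur_frame,
                   flow_net, flow_backbone, image_backbone, decoder):
    """Optical flow prediction -> feature extraction -> mask fusion -> density map prediction."""
    with torch.no_grad():
        flow = flow_net(prev_frame, cur_frame)               # optical flow information
        flow_feat = flow_backbone(flow)                      # second feature (optical flow feature)
        image_feat = image_backbone(cur_frame)               # first feature (e.g. VGG16_bn)
        fusion_feat = image_feat * torch.sigmoid(flow_feat)  # feature fusion by mask multiplication
        density_map = decoder(fusion_feat)                   # e.g. an SFA-style decoder
    count = density_map.sum().item()                         # number of target objects
    return density_map, count
```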
  • a target detection apparatus is provided.
  • FIG. 5 shows a schematic structural diagram of the apparatus, which includes: a first acquisition module 501 for acquiring position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame; a second acquisition module 502 for acquiring the image feature of the first image as the first feature and obtaining the second feature based on the position change information;
  • a fusion module 503 for performing enhancement processing on the first feature based on the second feature to generate a fusion feature; and a detection module 504 for determining the detection result of the target object in the first image according to the fusion feature.
  • the position change information includes optical flow information
  • the second obtaining module is configured to: use the optical flow feature obtained from the optical flow information as the second feature.
  • the fusion module is configured to: determine the position change rate of at least one pixel of the first image according to the second feature; for each pixel of the at least one pixel, determine the enhancement parameter of the target feature element according to the position change rate of that pixel, where the target feature element is the feature element corresponding to that pixel in the first feature; and, based on each enhancement parameter, perform differentiated enhancement processing on the corresponding target feature elements of the first feature to generate the fusion feature.
  • the fusion module is further configured to: determine the enhancement parameter of the target feature element according to the position change rate of the pixel point and a preset standard change rate.
  • the fusion module is further configured to: in response to the position change rate of the pixel being equal to the standard change rate, determine that the enhancement parameter of the target feature element is a preset standard enhancement parameter; in response to the position change rate of the pixel being greater than the standard change rate, determine that the enhancement parameter of the target feature element is greater than the standard enhancement parameter; or, in response to the position change rate of the pixel being smaller than the standard change rate, determine that the enhancement parameter of the target feature element is smaller than the standard enhancement parameter.
  • the detection module is configured to: generate a density map of the target object according to the fusion feature; and determine first quantity information of the target objects in the first image based on the number of density points in the density map that refer to the target object.
  • the detection module is further configured to: acquire second quantity information of the target object in a second image, where the second image is a frame of the video to be detected; acquire first time information and second time information, where the first time information is the time of the first image in the video to be detected and the second time information is the time of the second image in the video to be detected; and generate quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, where the quantity change information is used to indicate how the number of target objects in the video to be detected changes over time.
  • the detection module is configured to: generate a density map of the target object according to the fusion feature; and determine the number of target objects within the preset area of the first image according to the position of each target object indicated in the density map.
  • the detection module is further configured to generate prompt information in response to the number of target objects in the preset area being greater than a preset number threshold.
  • a third aspect of the embodiments of the present disclosure provides an electronic device; referring to FIG. 6, which shows the structure of the device, the device includes a memory and a processor, the memory being used for storing computer instructions that can be run on the processor, and the processor being configured to perform target detection based on the method of the first aspect when executing the computer instructions.
  • a fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method described in the first aspect.
  • first and second are used for descriptive purposes only, and should not be construed as indicating or implying relative importance.
  • the term “plurality” refers to two or more, unless expressly limited otherwise.

Abstract

The present disclosure relates to a target detection method and apparatus, a device and a storage medium. The target detection method comprises: obtaining position change information of at least one pixel in a first image with respect to a corresponding pixel in a previous image frame, the first image being an image frame in a video to be detected; obtaining an image feature of the first image as a first feature; obtaining a second feature on the basis of the position change information; performing enhancement processing on the first feature on the basis of the second feature to generate a fusion feature; and determining the detection result of a target object in the first image according to the fusion feature.

Description

Target detection method, apparatus, device and storage medium
Cross-reference to related applications
This application is based on and claims priority to the Chinese patent application with application No. 202110352206.0, filed on March 31, 2021, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of image processing, and in particular, to a target detection method, apparatus, device, and storage medium.
Background
With the development of artificial intelligence technology, objects in images can be detected automatically, reducing labor costs and improving efficiency and accuracy.
Summary
The present disclosure provides a target detection method and apparatus, a device and a storage medium to address the deficiencies in the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a target detection method, including: acquiring position change information of at least one pixel in a first image relative to a corresponding pixel in a previous frame of image, where the first image is a frame in a video to be detected; acquiring an image feature of the first image as a first feature; acquiring a second feature based on the position change information; performing enhancement processing on the first feature based on the second feature to generate a fusion feature; and determining a detection result of a target object in the first image according to the fusion feature.
According to a second aspect of the embodiments of the present disclosure, there is provided a target detection apparatus, including: a first acquisition module configured to acquire position change information of at least one pixel in a first image relative to a corresponding pixel in a previous frame of image, the first image being a frame in a video to be detected; a second acquisition module configured to acquire an image feature of the first image as a first feature and to acquire a second feature based on the position change information; a fusion module configured to perform enhancement processing on the first feature based on the second feature to generate a fusion feature; and a detection module configured to determine a detection result of a target object in the first image according to the fusion feature.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device including a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method of the first aspect when executing the computer instructions.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method of the first aspect.
It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
图1是本公开实施例示出的目标检测方法的流程图;1 is a flowchart of a target detection method shown in an embodiment of the present disclosure;
图2是本公开实施例示出的第一图像及其前一帧图像的示意图;FIG. 2 is a schematic diagram of a first image and a previous frame image thereof shown in an embodiment of the present disclosure;
图3是本公开实施例示出的第一图像的位置变化信息的示意图;3 is a schematic diagram of position change information of a first image shown in an embodiment of the present disclosure;
图4是本公开实施例示出的目标检测的过程示意图;4 is a schematic diagram of a process of target detection shown in an embodiment of the present disclosure;
图5是本公开实施例示出的目标检测装置的结构示意图;5 is a schematic structural diagram of a target detection apparatus shown in an embodiment of the present disclosure;
图6是本公开实施例示出的电子设备的结构示意图。FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "the," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但 这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, these information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure. Depending on the context, the word "if" as used herein can be interpreted as "at the time of" or "when" or "in response to determining."
随着人工智能技术的发展,图像中的目标可以自动检测,降低了人工成本,提高了效率和准确率。相关技术中,针对视频的图像帧进行检测时,与普通图像的目标检测完全一致,然而其未对视频的特征进行充分利用,导致检测结果不准确。With the development of artificial intelligence technology, objects in images can be automatically detected, reducing labor costs and improving efficiency and accuracy. In the related art, when detecting an image frame of a video, it is completely consistent with the target detection of an ordinary image, but it does not fully utilize the features of the video, resulting in inaccurate detection results.
基于此,本公开实施例的第一方面提供了一种目标检测方法,请参照附图1,其示出了该方法的流程,包括步骤S101至步骤S104。Based on this, a first aspect of the embodiments of the present disclosure provides a target detection method. Please refer to FIG. 1 , which shows a flow of the method, including steps S101 to S104 .
其中,该目标检测方法所针对的待检测对象可以是图像,也可以是视频。当待检测对象是视频时,可以批量处理视频的每帧图像,或依次处理视频的每帧图像。为方便描述,本实施例以视频的某一帧图像作为待检测对象进行描述。目标检测的目的是对待检测对象中的目标对象进行检测,以获得检测结果,检测结果可以表示目标对象一方面或多方面的信息(例如,目标对象的位置、数量、密度等信息)。The object to be detected targeted by the target detection method may be an image or a video. When the object to be detected is a video, each frame of the video can be processed in batches, or each frame of the video can be processed sequentially. For convenience of description, this embodiment takes a certain frame of video as the object to be detected for description. The purpose of target detection is to detect the target object in the object to be detected to obtain the detection result, and the detection result can represent one or more aspects of the information of the target object (for example, the position, number, density and other information of the target object).
在步骤S101中,获取第一图像中的至少一个像素点相对前一帧图像中的对应像素点的位置变化信息,所述第一图像为待检测视频中的一帧图像。其中,所述第一图像中的至少一个像素点与所述前一帧图像中的对应像素点对应于同一对象。In step S101, the position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image is acquired, and the first image is a frame of image in the video to be detected. Wherein, at least one pixel in the first image corresponds to the same object as the corresponding pixel in the previous frame of image.
其中,待检测视频可以为针对特定的空间录制的视频,该空间内包含目标对象,同时还可以包含其他对象。第一图像和其前一帧图像可如图2所示,第一图像可以为待检测视频中的第二帧图像之后(包括第二帧图像)的任意一帧图像,因为第一帧图像可能会没有前一帧图像。Wherein, the video to be detected may be a video recorded for a specific space, and the space may contain the target object and other objects at the same time. The first image and its previous frame can be as shown in Figure 2, and the first image can be any frame after the second frame image in the video to be detected (including the second frame image), because the first frame image may There will be no previous frame image.
在一个示例中,待检测视频可以为监控视频或无人机视频,也就是说,待检测视频可以为固定的监控摄像头拍摄的视频,或是通过飞行的无人机拍摄的视频。例如,图2中所示出的第一图像和其前一帧图像所属的待检测视频就是通过无人机拍摄的街景视频。监控视频中的包含人群等目标对象的图块往往尺寸较大,对于人群等目标对象的检测任务(例如计数人物)较为简单;无人机视频中的包含人群等目标对象的图块往往尺寸很小,依靠人工观察进行检测容易发生错误,使用本实施例提供的检测方法能够避免上述错误。In one example, the video to be detected may be a surveillance video or a drone video, that is, the video to be detected may be a video captured by a fixed surveillance camera, or a video captured by a flying drone. For example, the to-be-detected video to which the first image shown in FIG. 2 and its previous frame image belong is a street view video captured by a drone. The tiles containing target objects such as crowds in surveillance videos are often large in size, and the detection task of target objects such as crowds (such as counting people) is relatively simple; the tiles containing target objects such as people in drone videos are often very large in size. It is small, and detection by manual observation is prone to errors, and the above errors can be avoided by using the detection method provided in this embodiment.
在一个示例中,目标对象可以为下述至少一种:人物、车辆和动物。In one example, the target object may be at least one of the following: a person, a vehicle, and an animal.
其中,两帧图像对应于同一对象的对应像素点之间的位置变化,可能由于待检测视频对应的空间中的对象的客观移动造成的,也可能由于无人机等视频采集设备的运动造成的,还可能是上述两方面原因共同造成的。由于位置变化信息可以表示两帧图像中的对应像素点的位置变化,而两帧图像中的各个相对应的对象均是由若干连续像素点构成的,因此同一对象的所有像素点的位置变化信息可以是相同的。例如,图2所示出的第一图像中的像素点相对前一帧图像中的对应像素点的位置变化信息如图3所示。本领域技术人员应当理解,以上位置变化信息的具体释义仅为示意,本公开实施例对此不进行限制。Among them, the position change between the corresponding pixels of the two frames of images corresponding to the same object may be caused by the objective movement of the object in the space corresponding to the video to be detected, or may be caused by the movement of video capture devices such as drones. , may also be caused by a combination of the above two reasons. Since the position change information can represent the position change of the corresponding pixels in the two frames of images, and each corresponding object in the two frames of images is composed of several consecutive pixels, the position change information of all the pixels of the same object can be the same. For example, the position change information of the pixel point in the first image shown in FIG. 2 relative to the corresponding pixel point in the previous frame image is shown in FIG. 3 . Those skilled in the art should understand that the above specific interpretation of the location change information is only for illustration, which is not limited by the embodiments of the present disclosure.
本步骤中,可以采用预先训练的神经网络获取位置变化信息。训练神经网络时,可以采集大量的视频帧作为样本,将这些视频帧中的对应像素点的位置变化信息作为标签,然后通过将样本输入待训练的神经网络,比较输出的位置变化信息(预测值)与作为标签的位置变化信息(真值)间的差异,得出网络损失值,并进一步通过网络损失值调整待训练神经网络的网络参数,然后通过反复迭代,不断优化,最终得到符合精度要求的训练完成的神经网络。本领域技术人员应当理解,以上获取位置变化信息的具体方式仅为示意,本公开实施例对此不进行限制。In this step, a pre-trained neural network may be used to obtain position change information. When training a neural network, a large number of video frames can be collected as samples, and the position change information of the corresponding pixels in these video frames can be used as labels, and then the output position change information (predicted value) can be compared by inputting the samples into the neural network to be trained. ) and the position change information (true value) as the label, the network loss value is obtained, and the network parameters of the neural network to be trained are further adjusted by the network loss value, and then through repeated iterations, continuous optimization, and finally the accuracy requirements are obtained. The trained neural network is completed. Those skilled in the art should understand that the above specific manner of acquiring the location change information is only for illustration, which is not limited in the embodiments of the present disclosure.
在步骤S102中,获取所述第一图像的图像特征作为第一特征;基于所述位置变化信息获取第二特征。In step S102, an image feature of the first image is acquired as a first feature; and a second feature is acquired based on the position change information.
其中,获取第一特征和获取第二特征的顺序并无限制,也就是说,可以先获取第一特征,再获取第二特征,也可以先获取第二特征,再获取第一特征,还可以同时获取第一特征和第二特征。There is no restriction on the order of acquiring the first feature and acquiring the second feature, that is, the first feature may be acquired first, and then the second feature may be acquired, or the second feature may be acquired first, and then the first feature may be acquired, or The first feature and the second feature are acquired simultaneously.
本步骤中,可以采用预先训练的神经网络获取所述第一图像的图像特征作为第一特征,例如采用VGG16_bn模型提取第一特征。本领域技术人员应当理解,以上获取第一图像的图像特征的具体方式仅为示意,本公开实施例对此不进行限制。In this step, a pre-trained neural network may be used to obtain the image feature of the first image as the first feature, for example, the VGG16_bn model may be used to extract the first feature. Those skilled in the art should understand that the above specific manner of acquiring the image feature of the first image is only for illustration, which is not limited in the embodiments of the present disclosure.
本步骤中,可以采用预先训练的神经网络基于所述位置变化信息获取第二特征,例如采用backbone模型提取第二特征。本领域技术人员应当理解,以上获取第二特征的具体方式仅为示意,本公开实施例对此不进行限制。In this step, a pre-trained neural network may be used to obtain the second feature based on the position change information, for example, a backbone model may be used to extract the second feature. It should be understood by those skilled in the art that the above specific manner for obtaining the second feature is only for illustration, which is not limited by the embodiments of the present disclosure.
另外,第一特征和第二特征可以对应相同尺寸的特征图。In addition, the first feature and the second feature may correspond to feature maps of the same size.
在步骤S103中,基于所述第二特征对所述第一特征进行增强处理,生成融合特征。In step S103, the first feature is enhanced based on the second feature to generate a fusion feature.
其中,第一图像内的各个对象在一个方面或多个方面存在差异(例如,第一图像 内的人群、建筑物、车辆在外形尺寸等上存在差异),这些差异会体现在第一图像的第一特征中,而位置变化信息可以表示各个对象在运动方面的差异(例如,某个人在第一图像中的位置为A点,该人在前一帧图像中的位置为B点,该人在第一图像中的位置变化信息可以通过A点相对B点的位置变化信息确定;再例如,某个建筑物在第一图像中的位置为C点,该建筑物在前一帧图像中的位置也为C点,该建筑物在第一图像中的位置变化信息可以通过C点相对C点的位置变化信息确定,即该建筑物的运动是静止的),上述运动方面的差异会体现在位置变化信息的第二特征中。因此利用第二特征对第一特征进行增强处理,生成融合特征能够进一步强化各个对象体现在第一特征中的差异,也就是说,体现在融合特征中的各个对象的差异会更加明显和细化。Wherein, each object in the first image is different in one or more aspects (for example, the crowds, buildings, vehicles in the first image are different in external dimensions, etc.), and these differences will be reflected in the first image. In the first feature, the position change information can represent the difference in motion of each object (for example, the position of a person in the first image is point A, the position of the person in the previous frame image is point B, the person The position change information in the first image can be determined by the position change information of point A relative to point B; for another example, the position of a certain building in the first image is point C, and the position of the building in the previous frame of image The position is also point C, the position change information of the building in the first image can be determined by the position change information of point C relative to point C, that is, the movement of the building is static), the difference in the above movement will be reflected in in the second feature of the position change information. Therefore, using the second feature to enhance the first feature and generating the fusion feature can further strengthen the difference of each object reflected in the first feature, that is to say, the difference of each object embodied in the fusion feature will be more obvious and refined. .
特征融合的常用方法是对两个特征拼接从而增加通道数，或者对两个特征做加法维持融合后的通道数不变。在一个示例中，可以将第二特征作为掩膜（mask）与第一特征相乘，得到融合特征。Common methods of feature fusion are to concatenate two features so as to increase the number of channels, or to add two features so that the number of channels remains unchanged after fusion. In one example, the second feature may be used as a mask and multiplied with the first feature to obtain the fusion feature.
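As a purely illustrative sketch of the fusion options mentioned above (channel concatenation, element-wise addition, and mask-style multiplication), assuming first_feature and second_feature are feature maps of the same size; the sigmoid normalisation is an assumption made only so that the mask lies in (0, 1):

    import torch

    first_feature = torch.randn(1, 512, 32, 32)
    second_feature = torch.randn(1, 512, 32, 32)

    # Option 1: concatenation along the channel dimension (channel count doubles).
    fused_concat = torch.cat([first_feature, second_feature], dim=1)   # (1, 1024, 32, 32)

    # Option 2: element-wise addition (channel count unchanged).
    fused_add = first_feature + second_feature                         # (1, 512, 32, 32)

    # Option 3 (the example given in the description): treat the second feature
    # as a mask and multiply it element-wise with the first feature.
    mask = torch.sigmoid(second_feature)       # assumed squashing to (0, 1)
    fusion_feature = first_feature * mask      # (1, 512, 32, 32)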
在步骤S104中,根据所述融合特征确定所述第一图像中目标对象的检测结果。In step S104, the detection result of the target object in the first image is determined according to the fusion feature.
其中，目标对象可以是第一图像中的一种对象（例如，人群），目标对象还可以是第一图像中的多种对象（例如，人群和车流，或者牛、马、羊）；目标对象可以根据用户的选择进行确定，也可以根据预设规则自动确定。检测结果可以表示目标对象在一个方面或多个方面的信息（例如，目标对象的位置、数量、密度等信息），检测结果的涵盖范围可以根据用户的选择进行确定，也可以根据预设规则自动确定。本领域技术人员应当理解，以上目标对象、检测结果的具体释义仅为示意，本公开实施例对此不进行限制。The target object may be one type of object in the first image (for example, a crowd), or multiple types of objects in the first image (for example, crowds and vehicle flows, or cattle, horses and sheep); the target object may be determined according to the user's selection, or automatically determined according to a preset rule. The detection result may represent information of the target object in one or more aspects (for example, the position, quantity, density and other information of the target object), and the coverage of the detection result may be determined according to the user's selection or automatically determined according to a preset rule. Those skilled in the art should understand that the above explanations of the target object and the detection result are merely illustrative and are not limited by the embodiments of the present disclosure.
本公开的实施例中，通过获取第一图像中的至少一个像素点相对于前一帧图像中的对应像素点的位置变化信息，并分别获取第一图像的第一特征和上述位置变化信息的第二特征，以基于第二特征对第一特征进行增强处理，生成融合特征，最后根据融合特征确定第一图像中目标对象的检测结果。由于利用了相邻两帧图像的对应像素点间的位置变化信息，因此利用了视频的时域信息，可以增加检测结果的准确性。In the embodiments of the present disclosure, the position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image is obtained, the first feature of the first image and the second feature of the above position change information are obtained respectively, the first feature is enhanced based on the second feature to generate a fusion feature, and finally the detection result of the target object in the first image is determined according to the fusion feature. Since the position change information between corresponding pixels of two adjacent frames of images is used, the temporal information of the video is utilized, which can increase the accuracy of the detection result.
而且，无人机视频等待检测视频中，目标对象的尺寸较小，即使人工观察，都难以避免发生错误，但是本实施例中的检测方法，由于利用了位置变化信息，而且生成融合特征时对第一特征进行了增强处理，因此增加了检测结果的准确性，即能够获取较为准确的检测结果。Moreover, in videos to be detected such as UAV videos, the size of the target object is small, and errors are difficult to avoid even with manual observation. The detection method in this embodiment, however, uses the position change information and enhances the first feature when generating the fusion feature, so the accuracy of the detection result is increased, that is, a relatively accurate detection result can be obtained.
本公开的一些实施例中，所述位置变化信息包括光流信息。其中，光流信息表示空间运动物体在观察成像平面上的像素运动的瞬时速度。因此在获取第一图像的光流信息时，可以采用LK算法（Lucas Kanade算法）获取，LK算法对视频有较大约束，例如亮度恒定、需要相邻帧时间很短以及需要相邻像素有相似的运动等约束，因此LK算法精度和效率都较低。为了更加高效且高精度地获取光流信息，也可以利用深度学习的方法获取，例如，采用FlowNet模型或FlowNet2模型获取光流信息。In some embodiments of the present disclosure, the position change information includes optical flow information. The optical flow information represents the instantaneous velocity of pixel motion of a spatially moving object on the observation imaging plane. Therefore, when acquiring the optical flow information of the first image, the LK algorithm (Lucas-Kanade algorithm) may be used. The LK algorithm imposes strong constraints on the video, such as constant brightness, a very short time between adjacent frames, and similar motion of adjacent pixels, so its accuracy and efficiency are relatively low. In order to obtain optical flow information more efficiently and with higher accuracy, a deep learning method may also be used, for example, a FlowNet model or a FlowNet2 model may be used to obtain the optical flow information.
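For illustration only, dense optical flow between the previous frame and the first image can be estimated with a classical method from OpenCV; the Farneback algorithm is shown below as a readily available stand-in (a FlowNet/FlowNet2 network would be used for the learned variant described above), and the file names are assumptions.

    import cv2

    # Assumed input files: the previous frame and the current (first) image.
    prev_gray = cv2.cvtColor(cv2.imread("frame_t-1.jpg"), cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(cv2.imread("frame_t.jpg"), cv2.COLOR_BGR2GRAY)

    # Dense optical flow: one 2-vector (dx, dy) per pixel, i.e. the per-pixel
    # position change information between the two adjacent frames.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # flow has shape (H, W, 2); its magnitude approximates each pixel's
    # instantaneous speed on the imaging plane.
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])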
基于此，可以按照下述方式获取所述第一图像的第一特征以及所述位置变化信息的第二特征：获取所述第一图像中的图像特征作为所述第一特征，以及基于从所述光流信息中获取的光流特征作为所述第二特征。Based on this, the first feature of the first image and the second feature of the position change information may be acquired in the following manner: acquiring an image feature of the first image as the first feature, and using an optical flow feature obtained from the optical flow information as the second feature.
图像特征能够表征第一图像的像素点的至少一个维度的特征,光流特征能够表征第一图像的像素点的位置变化率。The image feature can represent the feature of at least one dimension of the pixel point of the first image, and the optical flow feature can represent the position change rate of the pixel point of the first image.
本公开的一些实施例中，可以按照下述方式基于所述第二特征对所述第一特征进行增强处理，生成融合特征：首先，根据所述第二特征确定所述第一图像的至少一个像素点的位置变化率；接下来，针对所述至少一个像素点中的每个像素点，根据所述像素点的位置变化率确定目标特征元素的增强参数，其中，所述目标特征元素为所述第一特征中与所述像素点对应的特征元素；最后，基于每个所述增强参数，对所述第一特征中对应的所述目标特征元素进行差别化增强处理，生成融合特征。In some embodiments of the present disclosure, the first feature may be enhanced based on the second feature in the following manner to generate the fusion feature: first, the position change rate of at least one pixel of the first image is determined according to the second feature; next, for each pixel of the at least one pixel, an enhancement parameter of a target feature element is determined according to the position change rate of the pixel, where the target feature element is the feature element in the first feature corresponding to the pixel; finally, based on each enhancement parameter, differential enhancement processing is performed on the corresponding target feature element in the first feature to generate the fusion feature.
其中，位置变化信息可以表示第一图像中各个对象在运动速度上的差异，且运动速度的差异会体现在位置变化信息的第二特征中，因此目标对象与其他对象在运动速度上的差异会体现在第二特征中，例如，目标对象为行人，则目标对象的运动速度大于其他对象，例如建筑物。The position change information can represent the difference in movement speed of each object in the first image, and this difference in movement speed is reflected in the second feature of the position change information; therefore, the difference in movement speed between the target object and other objects is reflected in the second feature. For example, if the target object is a pedestrian, the movement speed of the target object is greater than that of other objects, such as buildings.
在一个示例中，第一图像中的像素点被划分为不同的区域集合，每个区域集合构成一个对象，不同对象的运动速度不同，也就是不同对象包含的像素点的位置变化率不同。因此，通过第二特征能够确定出不同的像素点的位置变化率，且位置变化率不同的像素点代表的对象不同，因此可以根据像素点的位置变化率确定目标特征元素的增强参数，并进一步对目标特征元素进行增强，以得到融合特征的融合子特征，换言之，得到针对目标特征元素的融合子特征。由于不同对象所包含的像素点对应的特征元素的增强参数不同，因此对不同特征元素的增强程度不同，即从整体上呈现出对第一特征中特征元素进行差别化增强处理的现象，差别化增强处理后的第一特征形成融合特征，或者说全部的融合子特征则可构成融合特征。In one example, the pixels in the first image are divided into different sets of regions, each set of regions constituting an object; different objects move at different speeds, that is, the pixels contained in different objects have different position change rates. Therefore, the position change rates of different pixels can be determined through the second feature, and pixels with different position change rates represent different objects, so the enhancement parameter of the target feature element can be determined according to the position change rate of the pixel, and the target feature element can be further enhanced to obtain a fusion sub-feature of the fusion feature, in other words, a fusion sub-feature for the target feature element. Since the enhancement parameters of the feature elements corresponding to pixels contained in different objects are different, the degrees of enhancement of different feature elements are different; that is, on the whole, differential enhancement processing is performed on the feature elements of the first feature. The first feature after differential enhancement processing forms the fusion feature, or in other words, all the fusion sub-features together constitute the fusion feature.
其中，增强参数可以表示增强与否或增强程度，也就是说，针对目标对象的像素点和其他对象的像素点，可以通过增强与否或增强程度进行区分，以强化目标对象与其他对象体现在第一特征中的区别。例如，可以只增强目标对象的像素点对应的特征元素，或者还可以较高程度的增强目标对象的像素点对应的特征元素，较低程度的增强其他像素点对应的特征元素。进一步来说，目标对象的运动速度较之其他对象更大，相应的，目标对象中像素点的位置变化率较之其他对象中像素点的位置变化率也更大。因此可以只增强位置变化率较大的像素点对应的特征元素，或较大程度增强位置变化率较大的像素点对应的特征元素，较低程度增强其他像素点对应的特征元素。The enhancement parameter may indicate whether to enhance or the degree of enhancement; that is, the pixels of the target object and the pixels of other objects can be distinguished by whether they are enhanced or by the degree of enhancement, so as to strengthen the difference between the target object and other objects reflected in the first feature. For example, only the feature elements corresponding to the pixels of the target object may be enhanced, or the feature elements corresponding to the pixels of the target object may be enhanced to a higher degree while the feature elements corresponding to other pixels are enhanced to a lower degree. Furthermore, the movement speed of the target object is greater than that of other objects; correspondingly, the position change rate of the pixels in the target object is also greater than that of the pixels in other objects. Therefore, only the feature elements corresponding to pixels with a larger position change rate may be enhanced, or the feature elements corresponding to pixels with a larger position change rate may be enhanced to a greater degree while the feature elements corresponding to other pixels are enhanced to a lesser degree.
在一个示例中，可以根据所述像素点的位置变化率和预设的标准变化率，确定所述目标特征元素的增强参数。例如，标准变化率为一阈值，增强位置变化率大于该阈值的像素点对应的特征元素，不增强位置变化率小于或等于该阈值的像素点对应的特征元素。再例如，标准变化率可以作为一个参考值，根据像素点的位置变化率与该参考值的大小关系确定特征元素的增强程度：响应于所述像素点的位置变化率与所述标准变化率相等，确定所述目标特征元素的增强参数为预设的标准增强参数；或响应于所述像素点的位置变化率大于所述标准变化率，确定所述目标特征元素的增强参数大于所述标准增强参数；或响应于所述像素点的位置变化率小于所述标准变化率，确定所述目标特征元素的增强参数小于所述标准增强参数。In one example, the enhancement parameter of the target feature element may be determined according to the position change rate of the pixel and a preset standard change rate. For example, the standard change rate may be a threshold: the feature elements corresponding to pixels whose position change rate is greater than the threshold are enhanced, and the feature elements corresponding to pixels whose position change rate is less than or equal to the threshold are not enhanced. For another example, the standard change rate may serve as a reference value, and the degree of enhancement of the feature element is determined according to the relationship between the position change rate of the pixel and this reference value: in response to the position change rate of the pixel being equal to the standard change rate, the enhancement parameter of the target feature element is determined to be a preset standard enhancement parameter; or in response to the position change rate of the pixel being greater than the standard change rate, the enhancement parameter of the target feature element is determined to be greater than the standard enhancement parameter; or in response to the position change rate of the pixel being less than the standard change rate, the enhancement parameter of the target feature element is determined to be less than the standard enhancement parameter.
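A minimal numerical sketch of the two strategies described above (thresholding against a standard change rate, or scaling the enhancement relative to it) follows; the specific threshold value and scaling rule are assumptions made only for illustration.

    import torch

    first_feature = torch.randn(1, 512, 32, 32)
    # Per-pixel position change rate derived from the second feature, assumed here
    # to already be aligned with the feature-map resolution.
    change_rate = torch.rand(1, 1, 32, 32)
    standard_rate = 0.5                      # assumed preset standard change rate

    # Strategy 1: binary enhancement - only elements whose pixels move faster
    # than the standard rate are enhanced; the others are left unchanged.
    enhance_mask = (change_rate > standard_rate).float()
    fusion_binary = first_feature * (1.0 + enhance_mask)

    # Strategy 2: graded enhancement - the enhancement parameter grows with the
    # ratio of the pixel's change rate to the standard rate (an equal rate maps
    # to the assumed standard enhancement parameter of 1.0).
    enhance_param = change_rate / standard_rate
    fusion_graded = first_feature * enhance_param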
本公开的实施例中，通过位置变化信息的第二特征确定像素点的位置变化率，并根据像素点的位置变化率的不同，确定不同的像素点对应的特征元素的增强参数，进而对部分特征元素进行增强，或对全部特征元素进行不同程度的增强，从而进一步强化了目标对象与其他对象体现在第一特征中的差异，进而增加了目标对象检测结果的准确性和效率。In the embodiments of the present disclosure, the position change rate of each pixel is determined through the second feature of the position change information, and according to the different position change rates of the pixels, different enhancement parameters are determined for the feature elements corresponding to the pixels, so that some of the feature elements are enhanced, or all feature elements are enhanced to different degrees. This further strengthens the difference between the target object and other objects reflected in the first feature, and in turn increases the accuracy and efficiency of the detection result of the target object.
本公开的一些实施例中，可以按照下述方式根据融合特征确定所述第一图像中目标对象的检测结果：首先，根据所述融合特征生成目标对象的密度图；接下来，基于所述密度图中指代目标对象的密度点的数量（例如对密度点进行求和），确定所述第一图像中的目标对象的数量。In some embodiments of the present disclosure, the detection result of the target object in the first image may be determined according to the fusion feature in the following manner: first, a density map of the target object is generated according to the fusion feature; next, the number of target objects in the first image is determined based on the number of density points in the density map that refer to the target object (for example, by summing the density points).
其中，所述密度图用于指示所述第一图像中的目标对象的位置、数量、密度等信息，密度图中具有指代目标对象的密度点，密度图的尺寸可以和第一特征以及第二特征对应的特征图的尺寸相等。因此可以根据密度图中指代目标对象的密度点的数量确定目标对象的数量，即通过对密度点进行求和便可确定目标对象的数量。The density map is used to indicate information such as the position, quantity and density of the target object in the first image; the density map has density points that refer to the target object, and the size of the density map may be equal to the size of the feature maps corresponding to the first feature and the second feature. Therefore, the number of target objects can be determined according to the number of density points in the density map that refer to the target object, that is, the number of target objects can be determined by summing the density points.
其中，可以采用预先训练的神经网络确定密度图，例如采用诸如随机前沿方法（Stochastic Frontier Approach，SFA）的decoder模型确定密度图，这种模型可以使用多个特征图作为输入，从而提取不同尺度的特征，因此确定的密度图较为准确。本领域技术人员应当理解，以上生成密度图的具体方式仅为示意，本公开实施例对此不进行限制。A pre-trained neural network may be used to determine the density map, for example, a decoder model such as the Stochastic Frontier Approach (SFA) may be used. Such a model can use multiple feature maps as input so as to extract features at different scales, so the determined density map is relatively accurate. Those skilled in the art should understand that the above specific manner of generating the density map is only illustrative and is not limited by the embodiments of the present disclosure.
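As an illustrative sketch only, once a decoder has produced a density map from the fusion feature, the number of target objects in the first image can be estimated by summing the map; the decoder below is a placeholder, not the SFA decoder itself.

    import torch
    import torch.nn as nn

    fusion_feature = torch.randn(1, 512, 32, 32)

    # Placeholder decoder: upsampling convolutions ending in a single-channel,
    # non-negative density map (a real implementation could use an SFA-style
    # decoder fed with feature maps at several scales).
    decoder = nn.Sequential(
        nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 1), nn.ReLU(),
    )

    density_map = decoder(fusion_feature)          # (1, 1, 64, 64)
    estimated_count = density_map.sum().item()     # summing density points gives the count
    print(f"estimated number of target objects: {estimated_count:.1f}")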
在一个示例中，待检测视频为图2所示出的第一图像所属的街景视频，目标对象为街景中的人物，可以基于上述目标检测方法确定出第一图像中的行人数量，也就是能够确定出第一图像对应的时间的行人数量。具体应用时，可以根据行人数量做出相应动作，例如当行人数量过多，超过预设的数量阈值时，可以发出警报信息进行报警，以提示行人和管理人员目前街道过于拥挤。In one example, the video to be detected is the street view video to which the first image shown in FIG. 2 belongs, and the target object is a person in the street view. The number of pedestrians in the first image, that is, the number of pedestrians at the time corresponding to the first image, can be determined based on the above target detection method. In a specific application, corresponding actions may be taken according to the number of pedestrians; for example, when the number of pedestrians is too large and exceeds a preset number threshold, alarm information may be issued to remind pedestrians and managers that the street is currently too crowded.
由于经济的发展，目前人群聚集越来越频繁，因此将人群计数作为检测结果，进而进行报警等操作，能够防止由于人群密集发生踩踏等危险事件。Due to economic development, crowds gather more and more frequently at present. Therefore, using the crowd count as the detection result and then performing operations such as alarming can prevent dangerous events such as stampedes caused by dense crowds.
本公开的实施例中,通过生成密度图,进而确定目标对象的数量,也就是以目标对象的数量作为检测结果,能够进一步提高检测结果的准确性和效率。In the embodiment of the present disclosure, by generating a density map and then determining the number of target objects, that is, taking the number of target objects as the detection result, the accuracy and efficiency of the detection result can be further improved.
本公开的一些实施例中，还可以按照下述方式生成待检测视频中的目标对象的数量变化信息：首先，获取第一图像中的目标对象的第一数量信息，获取第二图像中的目标对象的第二数量信息，其中，所述第一图像和所述第二图像分别为所述待检测视频中的一帧图像；接下来，获取第一图像的第一时间信息和第二图像的第二时间信息，其中，所述第一时间信息为所述第一图像在所述待检测视频中的时间，所述第二时间信息为所述第二图像在所述待检测视频中的时间（例如，第一时间信息可以早于或晚于第二时间信息）；最后，根据所述第一数量信息、第一时间信息、第二数量信息和第二时间信息，确定数量变化信息，其中，所述数量变化信息用于表示待检测视频中的目标对象在不同时刻的数量变化。In some embodiments of the present disclosure, quantity change information of the target object in the video to be detected may also be generated in the following manner: first, obtaining first quantity information of the target object in the first image and second quantity information of the target object in a second image, where the first image and the second image are each a frame of image in the video to be detected; next, obtaining first time information of the first image and second time information of the second image, where the first time information is the time of the first image in the video to be detected and the second time information is the time of the second image in the video to be detected (for example, the first time information may be earlier or later than the second time information); finally, determining the quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, where the quantity change information is used to indicate the change in the quantity of the target object in the video to be detected at different moments.
其中，第二图像的数量不做限制，可以是一个，也可以是多个，也就是说，可以获取一帧图像的目标对象的数量，也可以获取多帧图像的目标对象的数量。相对应的，后续获取的第二时间信息也可以是一个或多个，进而后续生成的数量变化信息可以是针对两个图像（第一图像和一个第二图像），也可以是针对多个图像（第一图像和至少两个第二图像）。The number of second images is not limited and may be one or more; that is, the number of target objects in one frame of image may be obtained, or the numbers of target objects in multiple frames of images may be obtained. Correspondingly, the subsequently obtained second time information may also be one or more, and the subsequently generated quantity change information may be for two images (the first image and one second image) or for multiple images (the first image and at least two second images).
其中，获取第二图像中目标对象的数量（即，第二数量信息）的方式可以与上述获取第一图像中目标对象的数量（即，第一数量信息）的方式相同，也可以与上述获取第一图像中目标对象的数量的方式不同，本实施例对此无意进行具体限制。The manner of obtaining the number of target objects in the second image (that is, the second quantity information) may be the same as or different from the above manner of obtaining the number of target objects in the first image (that is, the first quantity information), which is not specifically limited in this embodiment.
其中，待检测视频的时间，可以是相对时间，也就是相对于视频开始的时刻的时间，例如，视频的总时长为25min，则视频的起始时刻的时间为0:00，视频的结束时刻的时间为00:25；待检测视频的时间，还可以是绝对时间，也就是视频录制时的绝对时间，例如，视频的总时长仍为25min，视频从2020.11.13.8:00开始录制，则视频的起始时刻的时间为2020.11.13.8:00，视频的结束时刻的时间为2020.11.13.8:25。The time of the video to be detected may be a relative time, that is, the time relative to the moment at which the video starts; for example, if the total duration of the video is 25 minutes, the time of the start moment of the video is 00:00 and the time of the end moment of the video is 00:25. The time of the video to be detected may also be an absolute time, that is, the absolute time at which the video was recorded; for example, if the total duration of the video is still 25 minutes and the video was recorded starting from 8:00 on 2020.11.13, the time of the start moment of the video is 2020.11.13 8:00 and the time of the end moment of the video is 2020.11.13 8:25.
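A simple hedged sketch of assembling the quantity change information: given per-frame counts produced by the detection method above and each frame's (relative or absolute) time in the video, the change in the number of target objects over time can be tabulated. The count_objects helper below is a hypothetical stand-in for the density-map based counting described earlier.

    from datetime import datetime, timedelta

    def count_objects(frame_index):
        # Placeholder: a real implementation would run the detection pipeline
        # described above on the given frame and sum its density map.
        return 0.0

    def quantity_change_info(frame_indices, fps=25.0, start_time=None):
        # Return a list of (time, count) pairs for the requested frames.
        records = []
        for idx in frame_indices:
            seconds = idx / fps
            if start_time is None:                   # relative time, e.g. 0:00 .. 0:25
                time_info = timedelta(seconds=seconds)
            else:                                    # absolute recording time
                time_info = start_time + timedelta(seconds=seconds)
            records.append((time_info, count_objects(idx)))
        return records

    # Example: a first image at frame 0 and second images at frames 750 and 1500
    # of a video assumed to have been recorded from 2020-11-13 08:00.
    info = quantity_change_info([0, 750, 1500],
                                start_time=datetime(2020, 11, 13, 8, 0))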
在一个示例中，待检测视频为图2所示出的第一图像所属的街景视频，目标对象为街景中的人物，因此可以确定出第一图像和至少一个第二图像中的行人数量，也就是能够确定出街景视频中的行人数量的变化。In one example, the video to be detected is the street view video to which the first image shown in FIG. 2 belongs, and the target object is a person in the street view; therefore, the numbers of pedestrians in the first image and in at least one second image can be determined, that is, the change in the number of pedestrians in the street view video can be determined.
本公开的实施例中，通过获取待检测视频中的其他帧的图像中目标对象的数量，进一步结合每帧图像的时间信息生成待检测视频的数量变化信息，因此可以在待检测视频对应的时间段内，获得目标对象的数量变化及趋势，从而进一步增加检测结果的全面性。In the embodiments of the present disclosure, by obtaining the numbers of target objects in images of other frames of the video to be detected, and further combining the time information of each frame of image to generate the quantity change information of the video to be detected, the change and trend in the number of target objects can be obtained within the time period corresponding to the video to be detected, thereby further increasing the comprehensiveness of the detection result.
例如，针对一个商业街区，可以获取一年中12个月的人流数量变化趋势，从而可以分析人们的消费习惯，进而得出消费的高峰月份、季度（即消费旺季），和消费的低谷月份、季度（即消费淡季）；同理，针对该商业街区，还可以获取每天营业的时间内的人流数量变化趋势，从而得出每天消费的高峰时间和低谷时间。上述得出的这些信息可以作为商业经营或物业管理的指导数据，从而能够达到科学管理的目的。For example, for a commercial block, the trend of the change in the number of visitors over the 12 months of a year can be obtained, so that people's consumption habits can be analyzed and the peak months and quarters of consumption (that is, the peak consumption season) and the trough months and quarters of consumption (that is, the off-season) can be derived; similarly, for the commercial block, the trend of the change in the number of visitors during business hours each day can also be obtained, so as to derive the peak and trough times of daily consumption. The information obtained above can serve as guidance data for business operation or property management, so as to achieve the purpose of scientific management.
再例如,针对高速公路,可以获取节假日前后的车流量变化趋势,从而可以统计出行数据,进而作为高速管理的指导数据。For another example, for expressways, the change trend of traffic flow before and after holidays can be obtained, so that travel data can be counted, which can then be used as guidance data for expressway management.
本公开的一些实施例中，还可以按照下述方式根据融合特征确定所述第一图像中目标对象的检测结果，包括：首先，根据所述融合特征生成目标对象的密度图；接下来，根据所述密度图中指示的每个目标对象的位置以及所述第一图像中的预设区域，确定所述第一图像中的预设区域内的目标对象的数量。In some embodiments of the present disclosure, the detection result of the target object in the first image may also be determined according to the fusion feature in the following manner: first, a density map of the target object is generated according to the fusion feature; next, the number of target objects in a preset area of the first image is determined according to the position of each target object indicated in the density map and the preset area in the first image.
其中，所述密度图用于指示所述第一图像中的目标对象的位置、数量、密度等信息，密度图的尺寸可以和第一特征以及第二特征对应的特征图的尺寸相等。例如，密度图中可以具有第一图像中的目标对象，且为每个目标对象标注位置和/或计数标志等标注信息。因此可以根据密度图中目标对象的位置确定目标对象的数量，即通过对密度图中的目标对象进行求和便可确定目标对象的数量。The density map is used to indicate information such as the position, quantity and density of the target object in the first image, and the size of the density map may be equal to the size of the feature maps corresponding to the first feature and the second feature. For example, the density map may contain the target objects in the first image, and each target object may be annotated with annotation information such as a position and/or a counting mark. Therefore, the number of target objects can be determined according to the positions of the target objects in the density map, that is, the number of target objects can be determined by summing the target objects in the density map.
其中，可以采用预先训练的神经网络确定密度图，例如采用诸如随机前沿方法（Stochastic Frontier Approach，SFA）的decoder模型确定密度图，这种模型可以使用多个特征图作为输入，从而提取不同尺度的特征，因此确定的密度图较为准确。本领域技术人员应当理解，以上生成密度图的具体方式仅为示意，本公开实施例对此不进行限制。A pre-trained neural network may be used to determine the density map, for example, a decoder model such as the Stochastic Frontier Approach (SFA) may be used. Such a model can use multiple feature maps as input so as to extract features at different scales, so the determined density map is relatively accurate. Those skilled in the art should understand that the above specific manner of generating the density map is only illustrative and is not limited by the embodiments of the present disclosure.
其中，预设区域可以是控制人流量的区域，例如某些限流场所，只允许一定数量的人进入，再例如，施工区域等某些危险区域，禁止行人进入，即人流量需要控制为0。The preset area may be an area in which the flow of people is controlled, for example, certain flow-restricted places where only a certain number of people are allowed to enter, or certain dangerous areas such as construction areas where pedestrians are prohibited from entering, that is, the flow of people needs to be controlled to 0.
在确定预设区域内的目标对象的数量后，可以响应于所述预设区域内的目标对象的数量大于预设的数量阈值，生成提示信息。例如，限流场所的人流量超过了要求的最高人流量，进行报警，以禁止行人继续进入；再例如，施工区域进入行人后，进行报警，并提示行人及时离开；再例如，在一些户外的真人游戏中，可以对游戏人员的活动区域进行监视，若进入犯规区域，则进行报警；再例如，在足球、篮球等运动项目中，可以对运动员的活动区域进行监视，若进入犯规区域，则进行报警。After the number of target objects in the preset area is determined, prompt information may be generated in response to the number of target objects in the preset area being greater than a preset number threshold. For example, when the flow of people in a flow-restricted place exceeds the required maximum, an alarm is issued to prevent pedestrians from continuing to enter; for another example, when a pedestrian enters a construction area, an alarm is issued and the pedestrian is prompted to leave in time; for another example, in some outdoor live-action games, the activity area of the players may be monitored, and an alarm is issued if a player enters a foul area; for another example, in sports such as football and basketball, the activity area of the athletes may be monitored, and an alarm is issued if an athlete enters a foul area.
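The following is an illustrative sketch (not the disclosed implementation) of counting target objects inside a preset area of the density map and generating prompt information when a preset number threshold is exceeded; the region coordinates and the threshold value are assumptions.

    import numpy as np

    density_map = np.random.rand(64, 64).astype(np.float32)   # stand-in density map

    # Assumed preset area given as (top, bottom, left, right) in density-map
    # coordinates, and an assumed preset number threshold for that area.
    preset_area = (10, 40, 20, 60)
    number_threshold = 5.0

    top, bottom, left, right = preset_area
    count_in_area = float(density_map[top:bottom, left:right].sum())

    if count_in_area > number_threshold:
        # Prompt information: in practice this could trigger an audible alarm,
        # a push notification to managers, etc.
        print(f"ALERT: {count_in_area:.1f} target objects in the preset area "
              f"(threshold {number_threshold}).")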
本公开的实施例中，将预设区域的目标对象的数量作为检测结果，能够实现对特定区域的人流检测和人流控制，增加了检测的针对性和准确性，从而使该检测方法的应用范围更加广泛。In the embodiments of the present disclosure, using the number of target objects in the preset area as the detection result can realize people-flow detection and people-flow control for a specific area, which increases the pertinence and accuracy of the detection, so that the detection method has a wider range of applications.
请参照附图4，其示出了根据本公开一个实施例的目标检测的过程。其中，位置变化信息为光流信息，目标检测结果为密度图。该过程为：首先进行光流预测，接下来分别进行光流特征提取和图像特征提取，然后将光流特征与图像特征进行特征融合以获得融合特征，最后利用融合特征进行密度图预测。在一个实施例中，首先进行光流预测，即利用光流提取网络从第一图像和第一图像的前一帧图像中提取光流信息；接下来从提取的光流信息中，利用神经网络提取光流特征，以及从第一图像中利用神经网络（例如VGG16_bn）提取图像特征，然后，将光流特征作为掩膜与图像特征相乘，以获得融合特征；最后把融合特征送入到decoder（例如，SFA）来预测密度图。Please refer to FIG. 4, which shows a target detection process according to an embodiment of the present disclosure, in which the position change information is optical flow information and the target detection result is a density map. The process is as follows: first, optical flow prediction is performed; next, optical flow feature extraction and image feature extraction are performed respectively; then, the optical flow feature and the image feature are fused to obtain a fusion feature; finally, density map prediction is performed using the fusion feature. In one embodiment, optical flow prediction is performed first, that is, an optical flow extraction network is used to extract optical flow information from the first image and the previous frame of the first image; next, an optical flow feature is extracted from the extracted optical flow information using a neural network, and an image feature is extracted from the first image using a neural network (for example, VGG16_bn); then, the optical flow feature is used as a mask and multiplied with the image feature to obtain the fusion feature; finally, the fusion feature is fed into a decoder (for example, SFA) to predict the density map.
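Purely as a schematic summary of FIG. 4, the pipeline can be sketched as below, with each stage represented by a placeholder module; flow_net, image_net, flow_feature_net and decoder stand for the optical flow extraction network, the VGG16_bn-style image encoder, the flow feature encoder and the SFA-style decoder respectively, and are assumptions rather than concrete implementations.

    import torch
    import torch.nn as nn

    class TargetDetector(nn.Module):
        # Illustrative end-to-end sketch of the FIG. 4 pipeline.

        def __init__(self, flow_net, image_net, flow_feature_net, decoder):
            super().__init__()
            self.flow_net = flow_net                  # optical flow prediction
            self.image_net = image_net                # image feature extraction (e.g. VGG16_bn)
            self.flow_feature_net = flow_feature_net  # optical flow feature extraction
            self.decoder = decoder                    # density map prediction (e.g. SFA)

        def forward(self, prev_frame, first_image):
            flow = self.flow_net(prev_frame, first_image)      # optical flow prediction
            first_feature = self.image_net(first_image)        # first feature
            second_feature = self.flow_feature_net(flow)       # second feature
            mask = torch.sigmoid(second_feature)               # assumed normalisation
            fusion_feature = first_feature * mask              # mask-style fusion
            density_map = self.decoder(fusion_feature)         # density map
            count = density_map.sum(dim=(1, 2, 3))             # number of target objects
            return density_map, count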
根据本公开实施例的第二方面，提供一种目标检测装置，请参照附图5，其示出了该装置的结构示意图，包括：第一获取模块501，用于获取第一图像中的至少一个像素点相对前一帧图像中的对应像素点的位置变化信息；第二获取模块502，用于获取所述第一图像的图像特征作为第一特征以及基于所述位置变化信息获取第二特征；融合模块503，用于基于所述第二特征对所述第一特征进行增强处理，生成融合特征；检测模块504，用于根据所述融合特征确定所述第一图像中目标对象的检测结果。According to a second aspect of the embodiments of the present disclosure, a target detection apparatus is provided. Please refer to FIG. 5, which shows a schematic structural diagram of the apparatus, including: a first acquisition module 501, configured to acquire position change information of at least one pixel in a first image relative to the corresponding pixel in the previous frame of image; a second acquisition module 502, configured to acquire an image feature of the first image as a first feature and acquire a second feature based on the position change information; a fusion module 503, configured to perform enhancement processing on the first feature based on the second feature to generate a fusion feature; and a detection module 504, configured to determine a detection result of the target object in the first image according to the fusion feature.
在一个实施例中,所述位置变化信息包括光流信息,所述第二获取模块用于:将从所述光流信息中获取的光流特征作为所述第二特征。In one embodiment, the position change information includes optical flow information, and the second obtaining module is configured to: use the optical flow feature obtained from the optical flow information as the second feature.
在一个实施例中，所述融合模块用于：根据所述第二特征确定所述第一图像的至少一个像素点的位置变化率；针对所述至少一个像素点中的每个像素点，根据所述像素点的位置变化率确定目标特征元素的增强参数，其中，所述目标特征元素为所述第一特征中与所述像素点对应的特征元素；基于每个所述增强参数，对所述第一特征中对应的所述目标特征元素进行差别化增强处理，生成所述融合特征。In one embodiment, the fusion module is configured to: determine the position change rate of at least one pixel of the first image according to the second feature; for each pixel of the at least one pixel, determine an enhancement parameter of a target feature element according to the position change rate of the pixel, where the target feature element is the feature element in the first feature corresponding to the pixel; and based on each enhancement parameter, perform differential enhancement processing on the corresponding target feature element in the first feature to generate the fusion feature.
在一个实施例中,所述融合模块还用于:根据所述像素点的位置变化率和预设的标准变化率,确定所述目标特征元素的增强参数。In one embodiment, the fusion module is further configured to: determine the enhancement parameter of the target feature element according to the position change rate of the pixel point and a preset standard change rate.
在一个实施例中，所述融合模块还用于：响应于所述像素点的位置变化率与所述标准变化率相等，确定所述目标特征元素的增强参数为预设的标准增强参数；或响应于所述像素点的位置变化率大于所述标准变化率，确定所述目标特征元素的增强参数大于所述标准增强参数；或响应于所述像素点的位置变化率小于所述标准变化率，确定所述目标特征元素的增强参数小于所述标准增强参数。In one embodiment, the fusion module is further configured to: in response to the position change rate of the pixel being equal to the standard change rate, determine that the enhancement parameter of the target feature element is a preset standard enhancement parameter; or in response to the position change rate of the pixel being greater than the standard change rate, determine that the enhancement parameter of the target feature element is greater than the standard enhancement parameter; or in response to the position change rate of the pixel being less than the standard change rate, determine that the enhancement parameter of the target feature element is less than the standard enhancement parameter.
在一个实施例中，所述检测模块用于：根据所述融合特征生成所述目标对象的密度图；基于所述密度图中指代所述目标对象的密度点的数量，确定所述第一图像中的所述目标对象的第一数量信息。In one embodiment, the detection module is configured to: generate a density map of the target object according to the fusion feature; and determine first quantity information of the target object in the first image based on the number of density points in the density map that refer to the target object.
在一个实施例中，所述检测模块还用于：获取第二图像中的所述目标对象的第二数量信息，其中，所述第二图像为所述待检测视频中的一帧图像；获取第一时间信息和第二时间信息，其中，所述第一时间信息为所述第一图像在所述待检测视频中的时间，所述第二时间信息为所述第二图像在所述待检测视频中的时间；根据所述第一数量信息、所述第一时间信息、所述第二数量信息和所述第二时间信息，生成数量变化信息，其中，所述数量变化信息用于表示待检测视频中的目标对象在不同时刻的数量变化。In one embodiment, the detection module is further configured to: acquire second quantity information of the target object in a second image, where the second image is a frame of image in the video to be detected; acquire first time information and second time information, where the first time information is the time of the first image in the video to be detected and the second time information is the time of the second image in the video to be detected; and generate quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, where the quantity change information is used to indicate the change in the quantity of the target object in the video to be detected at different moments.
在一个实施例中，所述检测模块用于：根据所述融合特征生成所述目标对象的密度图；根据所述密度图中指示的每个所述目标对象的位置，确定所述第一图像中的预设区域内的所述目标对象的数量。In one embodiment, the detection module is configured to: generate a density map of the target object according to the fusion feature; and determine the number of target objects in a preset area of the first image according to the position of each target object indicated in the density map.
在一个实施例中,所述检测模块还用于:响应于所述预设区域内的目标对象的数 量大于预设的数量阈值,生成提示信息。In one embodiment, the detection module is further configured to generate prompt information in response to the number of target objects in the preset area being greater than a preset number threshold.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在第一方面有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment of the method related to the first aspect, and will not be described in detail here.
本公开实施例的第三方面提供了一种电子设备，请参照附图6，其示出了该设备的结构，所述设备包括存储器、处理器，所述存储器用于存储可在处理器上运行的计算机指令，所述处理器用于在执行所述计算机指令时基于第一方面所述的方法对目标进行检测。A third aspect of the embodiments of the present disclosure provides an electronic device. Please refer to FIG. 6, which shows the structure of the device. The device includes a memory and a processor, the memory is configured to store computer instructions executable on the processor, and the processor is configured to detect a target based on the method described in the first aspect when executing the computer instructions.
本公开实施例的第四方面提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现第一方面所述的方法。A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method described in the first aspect.
在本公开中,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性。术语“多个”指两个或两个以上,除非另有明确的限定。In the present disclosure, the terms "first" and "second" are used for descriptive purposes only, and should not be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless expressly limited otherwise.
本领域技术人员在考虑说明书及实践这里公开的公开后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common general knowledge or techniques in the technical field not disclosed by this disclosure . The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (21)

  1. 一种目标检测方法,包括:A target detection method, comprising:
    获取第一图像中的至少一个像素点相对前一帧图像中的对应像素点的位置变化信息,所述第一图像为待检测视频中的一帧图像;Acquiring position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image, where the first image is a frame of image in the video to be detected;
    获取所述第一图像的图像特征作为第一特征;acquiring the image feature of the first image as the first feature;
    基于所述位置变化信息获取第二特征;obtaining a second feature based on the position change information;
    基于所述第二特征对所述第一特征进行增强处理，生成融合特征；以及performing enhancement processing on the first feature based on the second feature to generate a fusion feature; and
    根据所述融合特征确定所述第一图像中目标对象的检测结果。determining a detection result of the target object in the first image according to the fusion feature.
  2. 根据权利要求1所述的目标检测方法,其中,所述位置变化信息包括光流信息,基于所述位置变化信息获取第二特征,包括:The target detection method according to claim 1, wherein the position change information includes optical flow information, and obtaining the second feature based on the position change information includes:
    将从所述光流信息中获取的光流特征作为所述第二特征。The optical flow feature obtained from the optical flow information is used as the second feature.
  3. 根据权利要求1或2所述的目标检测方法,其中,基于所述第二特征对所述第一特征进行增强处理,生成所述融合特征,包括:The target detection method according to claim 1 or 2, wherein the first feature is enhanced based on the second feature to generate the fusion feature, comprising:
    根据所述第二特征确定所述第一图像的至少一个像素点的位置变化率;determining a position change rate of at least one pixel of the first image according to the second feature;
    针对所述至少一个像素点中的每个像素点，根据所述像素点的位置变化率确定目标特征元素的增强参数，其中，所述目标特征元素为所述第一特征中与所述像素点对应的特征元素；for each pixel of the at least one pixel, determining an enhancement parameter of a target feature element according to the position change rate of the pixel, wherein the target feature element is a feature element in the first feature corresponding to the pixel;
    基于每个所述增强参数,对所述第一特征中对应的所述目标特征元素进行差别化增强处理,生成所述融合特征。Based on each of the enhancement parameters, differential enhancement processing is performed on the corresponding target feature elements in the first feature to generate the fusion feature.
  4. 根据权利要求3所述的目标检测方法,其中,根据所述像素点的位置变化率确定目标特征元素的增强参数,包括:The target detection method according to claim 3, wherein determining the enhancement parameter of the target feature element according to the position change rate of the pixel, comprising:
    根据所述像素点的位置变化率和预设的标准变化率,确定所述目标特征元素的增强参数。The enhancement parameter of the target feature element is determined according to the position change rate of the pixel point and the preset standard change rate.
  5. 根据权利要求4所述的目标检测方法,其中,根据所述像素点的位置变化率和预设的标准变化率,确定所述目标特征元素的增强参数,包括:The target detection method according to claim 4, wherein determining the enhancement parameter of the target feature element according to the position change rate of the pixel point and a preset standard change rate, comprising:
    响应于所述像素点的位置变化率与所述标准变化率相等,确定所述目标特征元素的增强参数为预设的标准增强参数;或In response to the position change rate of the pixel being equal to the standard change rate, determining that the enhancement parameter of the target feature element is a preset standard enhancement parameter; or
    响应于所述像素点的位置变化率大于所述标准变化率,确定所述目标特征元素的增强参数大于所述标准增强参数;或In response to the position change rate of the pixel point being greater than the standard change rate, determining that the enhancement parameter of the target feature element is greater than the standard enhancement parameter; or
    响应于所述像素点的位置变化率小于所述标准变化率,确定所述目标特征元素的增强参数小于所述标准增强参数。In response to the position change rate of the pixel point being smaller than the standard change rate, it is determined that the enhancement parameter of the target feature element is smaller than the standard enhancement parameter.
  6. 根据权利要求1至5任意一项所述的目标检测方法,其中,根据所述融合特征确定所述第一图像中目标对象的检测结果,包括:The target detection method according to any one of claims 1 to 5, wherein determining the detection result of the target object in the first image according to the fusion feature comprises:
    根据所述融合特征生成所述目标对象的密度图;generating a density map of the target object according to the fusion feature;
    基于所述密度图中指代所述目标对象的密度点的数量,确定所述第一图像中的所述目标对象的第一数量信息。First quantity information of the target object in the first image is determined based on the number of density points that refer to the target object in the density map.
  7. 根据权利要求6所述的目标检测方法,还包括:The target detection method according to claim 6, further comprising:
    获取第二图像中的所述目标对象的第二数量信息,其中,所述第二图像为所述待检测视频中的一帧图像;acquiring second quantity information of the target object in a second image, wherein the second image is a frame of image in the video to be detected;
    获取第一时间信息和第二时间信息，其中，所述第一时间信息为所述第一图像在所述待检测视频中的时间，所述第二时间信息为所述第二图像在所述待检测视频中的时间；acquiring first time information and second time information, wherein the first time information is the time of the first image in the video to be detected, and the second time information is the time of the second image in the video to be detected;
    根据所述第一数量信息、所述第一时间信息、所述第二数量信息和所述第二时间信息，生成数量变化信息，其中，所述数量变化信息用于表示所述待检测视频中的所述目标对象在不同时刻的数量变化。generating quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, wherein the quantity change information is used to indicate a change in the quantity of the target object in the video to be detected at different moments.
  8. 根据权利要求1至5任意一项所述的目标检测方法,其中,根据所述融合特征确定所述第一图像中目标对象的检测结果,包括:The target detection method according to any one of claims 1 to 5, wherein determining the detection result of the target object in the first image according to the fusion feature comprises:
    根据所述融合特征生成所述目标对象的密度图;generating a density map of the target object according to the fusion feature;
    根据所述密度图中指示的每个所述目标对象的位置,确定所述第一图像中的预设区域内的所述目标对象的数量。According to the position of each of the target objects indicated in the density map, the number of the target objects in the preset area in the first image is determined.
  9. 根据权利要求8所述的目标检测方法,还包括:The target detection method according to claim 8, further comprising:
    响应于所述预设区域内的所述目标对象的数量大于预设的数量阈值,生成提示信息。In response to the number of the target objects in the preset area being greater than a preset number threshold, prompt information is generated.
  10. 一种目标检测装置,包括:A target detection device, comprising:
    第一获取模块,用于获取第一图像中的至少一个像素点相对前一帧图像中的对应像素点的位置变化信息,所述第一图像为待检测视频中的一帧图像;a first acquisition module, configured to acquire the position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image, where the first image is a frame of image in the video to be detected;
    第二获取模块,用于获取所述第一图像的图像特征作为第一特征以及基于所述位置变化信息获取第二特征;a second acquisition module, configured to acquire an image feature of the first image as a first feature and acquire a second feature based on the position change information;
    融合模块,用于基于所述第二特征对所述第一特征进行增强处理,生成融合特征;a fusion module, configured to perform enhancement processing on the first feature based on the second feature to generate a fusion feature;
    检测模块,用于根据所述融合特征确定所述第一图像中目标对象的检测结果。A detection module, configured to determine the detection result of the target object in the first image according to the fusion feature.
  11. 一种电子设备,所述设备包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现:An electronic device, the device comprising a memory and a processor, the memory for storing computer instructions that can be executed on the processor, the processor for implementing when executing the computer instructions:
    获取第一图像中的至少一个像素点相对前一帧图像中的对应像素点的位置变化信息,所述第一图像为待检测视频中的一帧图像;Acquiring position change information of at least one pixel in the first image relative to the corresponding pixel in the previous frame of image, where the first image is a frame of image in the video to be detected;
    获取所述第一图像的图像特征作为第一特征;acquiring the image feature of the first image as the first feature;
    基于所述位置变化信息获取第二特征;obtaining a second feature based on the position change information;
    基于所述第二特征对所述第一特征进行增强处理，生成融合特征；performing enhancement processing on the first feature based on the second feature to generate a fusion feature;
    根据所述融合特征确定所述第一图像中目标对象的检测结果。determining a detection result of the target object in the first image according to the fusion feature.
  12. 根据权利要求11所述的电子设备,其中,所述位置变化信息包括光流信息,基于所述位置变化信息获取第二特征时,所述处理器在执行所述计算机指令时实现:The electronic device according to claim 11, wherein the position change information includes optical flow information, and when acquiring the second feature based on the position change information, the processor implements when executing the computer instructions:
    将从所述光流信息中获取的光流特征作为所述第二特征。The optical flow feature obtained from the optical flow information is used as the second feature.
  13. 根据权利要求11或12所述的电子设备,其中,基于所述第二特征对所述第一特征进行增强处理,生成所述融合特征时,所述处理器在执行所述计算机指令时实现:The electronic device according to claim 11 or 12, wherein, when the first feature is enhanced based on the second feature, and the fusion feature is generated, the processor implements when executing the computer instructions:
    针对所述至少一个像素点中的每个像素点，根据所述像素点的位置变化率确定目标特征元素的增强参数，其中，所述目标特征元素为所述第一特征中与所述像素点对应的特征元素；for each pixel of the at least one pixel, determining an enhancement parameter of a target feature element according to the position change rate of the pixel, wherein the target feature element is a feature element in the first feature corresponding to the pixel;
    基于每个所述增强参数,对所述第一特征中对应的所述目标特征元素进行差别化增强处理,生成所述融合特征。Based on each of the enhancement parameters, differential enhancement processing is performed on the corresponding target feature elements in the first feature to generate the fusion feature.
  14. 根据权利要求13所述的电子设备,其中,根据所述像素点的位置变化率确定目标特征元素的增强参数时,所述处理器在执行所述计算机指令时实现:The electronic device according to claim 13, wherein, when determining the enhancement parameter of the target feature element according to the position change rate of the pixel point, the processor implements when executing the computer instruction:
    根据所述像素点的位置变化率和预设的标准变化率,确定所述目标特征元素的增强参数。The enhancement parameter of the target feature element is determined according to the position change rate of the pixel point and the preset standard change rate.
  15. 根据权利要求14所述的电子设备，其中，根据所述像素点的位置变化率和预设的标准变化率，确定所述目标特征元素的增强参数时，所述处理器在执行所述计算机指令时实现：The electronic device according to claim 14, wherein, when determining the enhancement parameter of the target feature element according to the position change rate of the pixel and the preset standard change rate, the processor implements, when executing the computer instructions:
    响应于所述像素点的位置变化率与所述标准变化率相等,确定所述目标特征元素的增强参数为预设的标准增强参数;或In response to the position change rate of the pixel being equal to the standard change rate, determining that the enhancement parameter of the target feature element is a preset standard enhancement parameter; or
    响应于所述像素点的位置变化率大于所述标准变化率,确定所述目标特征元素的增强参数大于所述标准增强参数;或In response to the position change rate of the pixel point being greater than the standard change rate, determining that the enhancement parameter of the target feature element is greater than the standard enhancement parameter; or
    响应于所述像素点的位置变化率小于所述标准变化率,确定所述目标特征元素的增强参数小于所述标准增强参数。In response to the position change rate of the pixel point being smaller than the standard change rate, it is determined that the enhancement parameter of the target feature element is smaller than the standard enhancement parameter.
  16. 根据权利要求11至15任意一项所述的电子设备,其中,根据所述融合特征确定所述第一图像中目标对象的检测结果时,所述处理器在执行所述计算机指令时实现:The electronic device according to any one of claims 11 to 15, wherein, when the detection result of the target object in the first image is determined according to the fusion feature, the processor implements when executing the computer instructions:
    根据所述融合特征生成所述目标对象的密度图;generating a density map of the target object according to the fusion feature;
    基于所述密度图中指代所述目标对象的密度点的数量,确定所述第一图像中的所述 目标对象的第一数量信息。First quantity information of the target object in the first image is determined based on the number of density points that refer to the target object in the density map.
  17. 根据权利要求16所述的电子设备,所述处理器在执行所述计算机指令时还实现:The electronic device of claim 16, the processor, when executing the computer instructions, further implements:
    获取第二图像中的所述目标对象的第二数量信息,其中,所述第二图像为所述待检测视频中的一帧图像;acquiring second quantity information of the target object in a second image, wherein the second image is a frame of image in the video to be detected;
    获取第一时间信息和第二时间信息，其中，所述第一时间信息为所述第一图像在所述待检测视频中的时间，所述第二时间信息为所述第二图像在所述待检测视频中的时间；acquiring first time information and second time information, wherein the first time information is the time of the first image in the video to be detected, and the second time information is the time of the second image in the video to be detected;
    根据所述第一数量信息、所述第一时间信息、所述第二数量信息和所述第二时间信息，生成数量变化信息，其中，所述数量变化信息用于表示所述待检测视频中的所述目标对象在不同时刻的数量变化。generating quantity change information according to the first quantity information, the first time information, the second quantity information and the second time information, wherein the quantity change information is used to indicate a change in the quantity of the target object in the video to be detected at different moments.
  18. 根据权利要求11至15任意一项所述的电子设备,其中,根据所述融合特征确定所述第一图像中目标对象的检测结果时,所述处理器在执行所述计算机指令时实现:The electronic device according to any one of claims 11 to 15, wherein, when the detection result of the target object in the first image is determined according to the fusion feature, the processor implements when executing the computer instructions:
    根据所述融合特征生成所述目标对象的密度图;generating a density map of the target object according to the fusion feature;
    根据所述密度图中指示的每个所述目标对象的位置,确定所述第一图像中的预设区域内的所述目标对象的数量。According to the position of each of the target objects indicated in the density map, the number of the target objects in the preset area in the first image is determined.
  19. 根据权利要求18所述的电子设备,所述处理器在执行所述计算机指令时还实现:19. The electronic device of claim 18, the processor, when executing the computer instructions, further implements:
    响应于所述预设区域内的所述目标对象的数量大于预设的数量阈值,生成提示信息。In response to the number of the target objects in the preset area being greater than a preset number threshold, prompt information is generated.
  20. 一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现权利要求1至9任一所述的方法。A computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method according to any one of claims 1 to 9.
  21. 一种计算机程序,所述计算机程序存储在计算机可读介质上,其中当所述计算机程序被处理器执行时实现权利要求1至9任一所述的方法。A computer program stored on a computer-readable medium, wherein the method of any one of claims 1 to 9 is implemented when the computer program is executed by a processor.
PCT/CN2021/102202 2021-03-31 2021-06-24 Target detection method and apparatus, device and storage medium WO2022205632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110352206.0A CN113011371A (en) 2021-03-31 2021-03-31 Target detection method, device, equipment and storage medium
CN202110352206.0 2021-03-31

Publications (1)

Publication Number Publication Date
WO2022205632A1 true WO2022205632A1 (en) 2022-10-06

Family

ID=76387771

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102202 WO2022205632A1 (en) 2021-03-31 2021-06-24 Target detection method and apparatus, device and storage medium

Country Status (3)

Country Link
CN (1) CN113011371A (en)
TW (1) TW202240471A (en)
WO (1) WO2022205632A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011371A (en) * 2021-03-31 2021-06-22 北京市商汤科技开发有限公司 Target detection method, device, equipment and storage medium
CN113901909B (en) * 2021-09-30 2023-10-27 北京百度网讯科技有限公司 Video-based target detection method and device, electronic equipment and storage medium
CN114528923B (en) * 2022-01-25 2023-09-26 山东浪潮科学研究院有限公司 Video target detection method, device, equipment and medium based on time domain context

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709447A (en) * 2016-12-21 2017-05-24 华南理工大学 Abnormal behavior detection method in video based on target positioning and characteristic fusion
US20190120955A1 (en) * 2017-10-20 2019-04-25 Texas Instruments Incorporated System and method for camera radar fusion
CN110874853A (en) * 2019-11-15 2020-03-10 上海思岚科技有限公司 Method, device and equipment for determining target motion and storage medium
CN111695627A (en) * 2020-06-11 2020-09-22 腾讯科技(深圳)有限公司 Road condition detection method and device, electronic equipment and readable storage medium
CN111784735A (en) * 2020-04-15 2020-10-16 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN112580545A (en) * 2020-12-24 2021-03-30 山东师范大学 Crowd counting method and system based on multi-scale self-adaptive context network
CN113011371A (en) * 2021-03-31 2021-06-22 北京市商汤科技开发有限公司 Target detection method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229336B (en) * 2017-12-13 2021-06-04 北京市商汤科技开发有限公司 Video recognition and training method and apparatus, electronic device, program, and medium
CN110176027B (en) * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN111428551B (en) * 2019-12-30 2023-06-16 杭州海康威视数字技术股份有限公司 Density detection method, density detection model training method and device

Also Published As

Publication number Publication date
TW202240471A (en) 2022-10-16
CN113011371A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2022205632A1 (en) Target detection method and apparatus, device and storage medium
KR102189262B1 (en) Apparatus and method for collecting traffic information using edge computing
JP7036863B2 (en) Systems and methods for activity monitoring using video data
Tu et al. Automatic behaviour analysis system for honeybees using computer vision
US10997428B2 (en) Automated detection of building entrances
Bjerge et al. Real‐time insect tracking and monitoring with computer vision and deep learning
Lim et al. Automated classroom monitoring with connected visioning system
Dujon et al. Machine learning to detect marine animals in UAV imagery: Effect of morphology, spacing, behaviour and habitat
US20160314353A1 (en) Virtual turnstile system and method
Chang et al. Video analytics in smart transportation for the AIC'18 challenge
JP6789876B2 (en) Devices, programs and methods for tracking objects using pixel change processed images
CN112733690A (en) High-altitude parabolic detection method and device and electronic equipment
CN114721403B (en) Automatic driving control method and device based on OpenCV and storage medium
CN115100732A (en) Fishing detection method and device, computer equipment and storage medium
CN113935395A (en) Training of object recognition neural networks
CN109063790A (en) Object identifying model optimization method, apparatus and electronic equipment
CN111797831A (en) BIM and artificial intelligence based parallel abnormality detection method for poultry feeding
CN104077571B (en) A kind of crowd's anomaly detection method that model is serialized using single class
Kay et al. The Caltech Fish Counting dataset: a benchmark for multiple-object tracking and counting
CN108960165A (en) A kind of stadiums population surveillance method based on intelligent video identification technology
Alghyaline A real-time street actions detection
Follmann et al. Detecting animals in infrared images from camera-traps
CN113822367B (en) Regional behavior analysis method, system and medium based on human face
Hu et al. Multi-level trajectory learning for traffic behavior detection and analysis
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934281

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21934281

Country of ref document: EP

Kind code of ref document: A1