CN116433767B - Target object detection method, target object detection device, electronic equipment and storage medium - Google Patents

Target object detection method, target object detection device, electronic equipment and storage medium

Info

Publication number
CN116433767B
CN116433767B (application number CN202310413991.5A)
Authority
CN
China
Prior art keywords
target
determining
detected
imaging
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310413991.5A
Other languages
Chinese (zh)
Other versions
CN116433767A (en)
Inventor
吕以豪
卢飞翔
李龙腾
张良俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310413991.5A priority Critical patent/CN116433767B/en
Publication of CN116433767A publication Critical patent/CN116433767A/en
Application granted granted Critical
Publication of CN116433767B publication Critical patent/CN116433767B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30221Sports video; Sports image
    • G06T2207/30224Ball; Puck
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target object detection method, a target object detection device, electronic equipment and a storage medium, and relates to the technical field of image processing, in particular to the technical field of artificial intelligence and the technical field of computer vision. The specific implementation scheme is as follows: determining object pixels representing a target object in images to be detected, wherein the images to be detected comprise N images to be detected, the N images to be detected are respectively shot by corresponding shooting devices, and N is an integer larger than 1; determining imaging rays passing through the object pixels and the shooting devices corresponding to the object pixels according to the device geographic positions and the device attribute parameters of the shooting devices; obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N shooting devices; and determining the target geographic position of the target object according to the candidate geographic position of the candidate point element.

Description

Target object detection method, target object detection device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technology, and in particular, to the field of artificial intelligence technology and the field of computer vision technology.
Background
With the rapid development of science and technology, in scenarios such as football match rebroadcasting, detection can be performed on collected match rebroadcast videos to generate the positions of target objects, such as football players, in the rebroadcast video images, thereby providing a rich viewing experience for users watching the match. Meanwhile, relevant match officials, such as assistant referees, can assist match adjudication based on the positions of the target objects in the rebroadcast video images.
Disclosure of Invention
The present disclosure provides a target object detection method, apparatus, electronic device, storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided a target object detection method including: determining object pixels representing a target object in images to be detected, wherein the images to be detected comprise N images to be detected, the N images to be detected are respectively shot by corresponding shooting devices, and N is an integer larger than 1; determining imaging rays passing through the object pixels and the shooting devices corresponding to the object pixels according to the device geographic positions and the device attribute parameters of the shooting devices; obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N shooting devices; and determining the target geographic position of the target object according to the candidate geographic position of the candidate point element.
According to another aspect of the present disclosure, there is provided a target object detection apparatus including: the object pixel determining module is used for determining object pixels representing target objects in images to be detected, the images to be detected comprise N images to be detected, the N images to be detected are respectively obtained by shooting through shooting devices corresponding to the N images to be detected, and N is an integer larger than 1; the imaging ray determining module is used for determining imaging rays passing through the object pixels and the shooting devices corresponding to the object pixels according to the device geographic positions and the device attribute parameters of the shooting devices; the candidate point element determining module is used for obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N shooting devices; and the target geographic position determining module is used for determining the target geographic position of the target object according to the candidate geographic positions of the candidate point elements.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method provided according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which target object detection methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a target object detection method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates an application scenario diagram of a target object detection method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates an application scenario diagram of a target object detection method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a target object detection method according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a target object detection apparatus according to an embodiment of the disclosure; and
fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solutions of the present disclosure, the collection, storage, and application of the personal information involved all comply with the provisions of relevant laws and regulations, necessary security measures have been taken, and public order and good morals are not violated.
With the rapid development of science and technology, in scenarios such as football match rebroadcasting, detection can be performed on collected match rebroadcast videos to generate the positions of target objects, such as football players, in the rebroadcast video images, thereby providing a rich viewing experience for users watching the match. Meanwhile, relevant match officials, such as assistant referees, can assist match adjudication based on the positions of the target objects in the rebroadcast video images. However, the inventors have found that it is difficult for a general target object detection method to accurately characterize the actual position of a target object. In related application scenarios, for example, when an assistant referee performs off-field adjudication according to the position of the target object in the match video image, such as analyzing whether a football is out of bounds, it is difficult to obtain the accurate positions of target objects such as the football and the players in the match video image, and it is therefore difficult to make accurate adjudication decisions based on those positions.
Meanwhile, relevant sports technology analysts need important information such as the positions, the motion trails and the like of target objects such as athletes, football and the like in the match replay video images to carry out sports professional analysis work so as to help the athletes to promote the competitive level.
Embodiments of the present disclosure provide a target object detection method, apparatus, electronic device, storage medium, and computer program product. The target object detection method comprises the following steps: determining object pixels representing a target object in images to be detected, wherein the images to be detected comprise N images to be detected, the N images to be detected are respectively shot by corresponding shooting devices, and N is an integer larger than 1; determining imaging rays passing through the object pixels and the shooting devices corresponding to the object pixels according to the device geographic positions and the device attribute parameters of the shooting devices; obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N shooting devices; and determining the target geographic position of the target object according to the candidate geographic position of the candidate point element.
According to the embodiments of the present disclosure, object pixels that respectively represent the target object are determined from the images to be detected acquired by the photographing devices, and candidate point elements are obtained according to the N imaging rays passing through the device geographic positions and the object pixels. The candidate point elements can thus fuse the images to be detected from multiple viewing angles to represent the geographic position of the target object. Determining the target geographic position according to the candidate geographic positions of the candidate point elements can improve the detection precision of the geographic position of the target object, so that the movement of the target object can be accurately analyzed according to its geographic position, improving the efficiency and accuracy of analyzing the position or movement of the target object in application scenarios such as video rebroadcasting.
Fig. 1 schematically illustrates an exemplary system architecture to which target object detection methods and apparatus may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the target object detection method and apparatus may be applied may include a terminal device, but the terminal device may implement the target object detection method and apparatus provided by the embodiments of the present disclosure without interaction with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the target object detection method provided by the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the target object detection apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the target object detection method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the target object detection apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The target object detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the target object detection apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of a target object detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the target object detection method includes operations S210 to S240.
In operation S210, object pixels representing a target object in images to be detected are determined, where the images to be detected include N images to be detected, the N images to be detected are respectively captured by corresponding photographing devices, and N is an integer greater than 1.
In operation S220, an imaging ray passing through the object pixel and the photographing device corresponding to the object pixel is determined according to the device geographical location and the device attribute parameter of the photographing device.
In operation S230, candidate point elements characterizing the target object are obtained according to the imaging rays corresponding to each of the N photographing devices.
In operation S240, a target geographic location of the target object is determined based on the candidate geographic locations of the candidate point elements.
According to the embodiment of the disclosure, the photographing device may include an image capturing device such as a video camera or a still camera, and the N photographing devices may each capture an image to be detected related to the target object. The N cameras may be disposed at respective device geographical locations, which may be world coordinate locations of the respective cameras in a world coordinate system, so that the N cameras may form multi-view shots for the target object.
According to an embodiment of the present disclosure, the images to be detected obtained by the N photographing devices may be images to be detected aligned in time sequence; for example, the images to be detected obtained by the N photographing devices may be acquired at the same image acquisition time.
It is understood that the device geographic position may be the geographic position of the lens center of the photographing device. Alternatively, the device geographic position may be the geographic position of any point on the photographing device; after the photographing device is fixed, the geographic position of its lens center can be calculated from the dimensions of the fixed device.
According to an embodiment of the present disclosure, the object pixels representing the target object in the image to be detected may be obtained by processing the image to be detected with a target detection algorithm, but are not limited thereto; the object pixels in the image to be detected may also be determined in other ways, such as manual annotation. The specific manner of determining the object pixels is not limited here, and those skilled in the art can choose according to actual requirements.
According to the embodiment of the present disclosure, the device attribute parameters may include any type of attribute parameters such as an in-device parameter, an out-device parameter, a focal length of the photographing device, etc., and the specific type of the device attribute parameters in the embodiment of the present disclosure is not limited, and may be selected by those skilled in the art according to actual needs.
According to an embodiment of the present disclosure, the imaging ray may be a ray passing through the lens center of the photographing device and the object pixel. Because the photographing device arranged at the device geographic position acquires the image of the target object to be detected, at least part of the target object is projected onto the object pixels of the image to be detected, and it can be preliminarily determined that the imaging ray passes through the lens center of the photographing device in the world coordinate system and through the position of the target object in the world coordinate system.
According to an embodiment of the present disclosure, the candidate point element may be a point element in an area surrounded by imaging rays corresponding to each of the N photographing devices. Alternatively, the candidate point element may be a point element on the imaging ray corresponding to each of the N cameras. Alternatively still, the candidate point element may be derived from one or more imaging rays based on other means, as long as the candidate point element can be determined using the positional relationship between the imaging rays and the world coordinate system.
According to the embodiments of the present disclosure, candidate point elements are obtained according to the imaging rays corresponding to the N photographing devices, so that the candidate point elements can fuse the images to be detected from multiple viewing angles to represent the geographic position of the target object. Determining the target geographic position according to the candidate geographic positions of the candidate point elements can then improve the detection precision of the geographic position of the target object, so that the movement of the target object can be accurately analyzed according to its geographic position, improving the efficiency and accuracy of analyzing the position or movement of the target object in application scenarios such as video rebroadcasting.
The method shown in fig. 2 is further described below with reference to fig. 3-5 in conjunction with specific embodiments.
According to an embodiment of the present disclosure, determining object pixels representing a target object in an image to be detected includes: performing target detection on the image to be detected to obtain a target detection frame corresponding to the target object; and determining the object pixels from the image to be detected according to the target detection frame.
According to the embodiment of the disclosure, the image to be detected can be subjected to target detection based on the target detection model, for example, the image to be detected can be subjected to target detection based on the target detection model constructed by the fast RCNN (Regions with CNN features) algorithm, and a target detection frame corresponding to the target object in the image to be detected is obtained.
According to an embodiment of the disclosure, the image to be detected may include one or more pixels characterizing the target object in the region image in the target detection frame. Thus, object pixels characterizing the position of the target object can be determined from the region image of the image to be detected in the target detection frame.
According to an embodiment of the present disclosure, determining the object pixel from the image to be detected according to the target detection frame may be determining a pixel of the image to be detected within the target detection frame as the object pixel, for example, determining the pixel corresponding to the center point of the target detection frame as the object pixel. However, this is not limiting; a pixel on the boundary shared by the target detection frame and the image to be detected may also be determined as the object pixel, for example, the pixel in the image to be detected corresponding to one vertex of the target detection frame. The embodiments of the present disclosure do not limit the specific manner of determining the object pixel, and those skilled in the art may choose according to actual requirements.
According to an embodiment of the present disclosure, determining the object pixel from the image to be detected according to the target detection frame may include: determining the pixel corresponding to the center point of the target detection frame in the image to be detected as the object pixel; and/or determining the pixel corresponding to the midpoint of the bottom edge of the target detection frame in the image to be detected as the object pixel.
In one embodiment of the present disclosure, in the case where the target detection frame characterizes the target object as a first type of target object having a smaller volume, for example a soccer ball, the pixel corresponding to the center point of the target detection frame in the image to be detected may be determined as the object pixel. In this way, the position of a target object with a smaller volume can be represented more precisely by the obtained object pixel.
In one embodiment of the present disclosure, in the case where the target detection frame characterizes the target object as a second type of target object having a larger volume, for example an athlete, the pixel corresponding to the midpoint of the bottom edge of the target detection frame in the image to be detected may be determined as the object pixel. In this way, the position of a target object with a larger volume can be represented more accurately by the obtained object pixel.
According to the embodiments of the present disclosure, in the case where two types of target objects with large volume differences exist simultaneously in the image to be detected, for example, where a football player and a football exist simultaneously in a football relay video frame, for the target detection frame corresponding to the football player, the pixel corresponding to the midpoint of the bottom edge of that target detection frame may be determined as the object pixel corresponding to the player. Accordingly, for the target detection frame corresponding to the football, the pixel corresponding to the center point of that target detection frame may be determined as the object pixel corresponding to the football.
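A minimal sketch of this object-pixel selection rule in Python; the function name, the (x_min, y_min, x_max, y_max) box format, and the example coordinates are assumptions of the sketch, not values from the disclosure.

```python
import numpy as np

def object_pixel(box, small_object=True):
    """Pick the object pixel from a detection box (x_min, y_min, x_max, y_max).

    For a small object such as a ball, use the box center; for a large
    object such as a player, use the midpoint of the bottom edge, which
    lies close to the ground plane.
    """
    x_min, y_min, x_max, y_max = box
    if small_object:
        return np.array([(x_min + x_max) / 2.0, (y_min + y_max) / 2.0])
    return np.array([(x_min + x_max) / 2.0, float(y_max)])

# A ball box and a player box from the same frame (illustrative values).
ball_pixel = object_pixel((500, 300, 520, 320), small_object=True)     # (510, 310)
player_pixel = object_pixel((100, 200, 160, 380), small_object=False)  # (130, 380)
```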
According to an embodiment of the present disclosure, the device attribute parameters include device intrinsic information and device extrinsic information.
According to an embodiment of the present disclosure, determining the imaging ray passing through the object pixel and the photographing device corresponding to the object pixel, according to the device geographic position and the device attribute parameters of the photographing device, may include:
processing the object pixel position of the object pixel according to the device internal reference information to obtain a first imaging point geographic position; and obtaining imaging rays corresponding to the shooting device according to the geographic position of the first imaging point, the geographic position of the device and the device external parameter information.
According to an embodiment of the present disclosure, in the case where the target object is a soccer ball, the pixel corresponding to the center point of the target detection frame in the image to be detected may be determined as the object pixel. The coordinates of the object pixel in the image to be detected (i.e., the object pixel position) can be expressed as (u_f, v_f). According to the object pixel position (u_f, v_f) and the device intrinsic information (such as the intrinsic matrix of the photographing device), the geographic position of the first imaging point can be obtained as K^(-1)·(u_f, v_f, 1)^T, where K represents the device intrinsic information.
Accordingly, the device geographic position may be represented by the device extrinsic information; for example, the device geographic position may be represented as -T, where T represents the translation matrix in the device extrinsic information. Meanwhile, the imaging ray may be represented by formula (1).
L=-T+nV; (1)
In formula (1), L represents the imaging ray; -T represents the device geographic position, which may be, for example, the coordinate position of the lens center of the photographing device in the world coordinate system; V may represent the vector from the device geographic position to the first imaging point geographic position, and may be represented by the difference between the device geographic position and the first imaging point geographic position; n may represent the imaging ray parameter.
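A sketch of constructing a ray of the form L = -T + nV from an object pixel, assuming the pinhole model with the extrinsic rotation folded into the simplified notation above (an assumption of this sketch); the function name and the illustrative K and T values are likewise assumptions.

```python
import numpy as np

def imaging_ray(pixel, K, T):
    """Return (origin, direction) of the ray L(n) = -T + n*V for a pixel.

    The camera center is taken as -T, and V points from the camera center
    toward the back-projected first imaging point K^{-1} [u, v, 1]^T.
    """
    u, v = pixel
    origin = -T                                               # device geographic position, -T
    imaging_point = np.linalg.inv(K) @ np.array([u, v, 1.0])  # first imaging point
    V = imaging_point - origin                                # camera center -> imaging point
    return origin, V

K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])    # illustrative intrinsic matrix
T = np.array([0.0, 0.0, -50.0])   # illustrative translation
origin, V = imaging_ray((510.0, 310.0), K, T)
```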
According to an embodiment of the present disclosure, obtaining the candidate point elements characterizing the target object according to the imaging rays corresponding to the N photographing devices may include: determining an i-th imaging ray and a j-th imaging ray from the imaging rays corresponding to the N photographing devices, where i and j are both integers greater than 0 and less than or equal to N, and i is not equal to j; determining a vertical vector perpendicular to both the i-th imaging ray and the j-th imaging ray; and determining the candidate point element according to a first intersection element where the vertical vector intersects the i-th imaging ray and a second intersection element where the vertical vector intersects the j-th imaging ray.
According to an embodiment of the present disclosure, the ith imaging ray and the jth imaging ray may be any two different rays of the N imaging rays. The ith imaging ray and the jth imaging ray in the world coordinate system can be expressed by the following formulas (2) and (3), respectively.
L_i = -T_i + nV_i; (2)
L_j = -T_j + nV_j; (3)
In formulas (2) and (3), L_i is the i-th imaging ray and L_j is the j-th imaging ray; -T_i represents the device geographic position of the photographing device corresponding to the i-th imaging ray, and -T_j represents the device geographic position of the photographing device corresponding to the j-th imaging ray; V_i may represent the vector from the device geographic position for the i-th imaging ray to its first imaging point geographic position, and V_j may represent the vector from the device geographic position for the j-th imaging ray to its first imaging point geographic position.
According to embodiments of the present disclosure, the first intersection element may be denoted as -T_i + n_i·V_i, and the second intersection element as -T_j + n_j·V_j. The perpendicular vector E between the first intersection element and the second intersection element may satisfy the constraint condition of formula (4), so that the vector E in the world coordinate system is perpendicular to both the i-th imaging ray and the j-th imaging ray:
E·V_i = 0, E·V_j = 0; (4)
In formula (4), E is the vertical vector between the first intersection element and the second intersection element, i.e., E = (-T_j + n_j·V_j) - (-T_i + n_i·V_i).
From formulas (2) to (4), n_i and n_j can be obtained by solving, and the coordinate positions of the first intersection element and the second intersection element in the world coordinate system are thereby obtained. The midpoint of the line segment formed by the first intersection element and the second intersection element in the world coordinate system can serve as the candidate point element; correspondingly, the geographic position of the candidate point element can be obtained from the respective geographic positions of the first intersection element and the second intersection element, that is, the candidate geographic position is obtained by solving.
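The solution of formulas (2) to (4) admits a closed form: substituting E = (-T_j + n_j·V_j) - (-T_i + n_i·V_i) into E·V_i = 0 and E·V_j = 0 yields a 2x2 linear system in n_i and n_j. A sketch under that derivation (function name assumed):

```python
import numpy as np

def candidate_point(O_i, V_i, O_j, V_j):
    """Midpoint of the common perpendicular between rays O + n*V.

    Solves the 2x2 system given by E.V_i = 0 and E.V_j = 0, where
    E = (O_j + n_j*V_j) - (O_i + n_i*V_i), i.e. formula (4).
    """
    a = V_i @ V_i
    b = V_i @ V_j
    c = V_j @ V_j
    d = O_j - O_i
    # E.V_i = 0  ->  a*n_i - b*n_j = d.V_i
    # E.V_j = 0  ->  b*n_i - c*n_j = d.V_j
    A = np.array([[a, -b], [b, -c]])
    rhs = np.array([d @ V_i, d @ V_j])
    n_i, n_j = np.linalg.solve(A, rhs)  # singular only if the rays are parallel
    P_i = O_i + n_i * V_i  # first intersection element
    P_j = O_j + n_j * V_j  # second intersection element
    return (P_i + P_j) / 2.0
```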
Fig. 3 schematically illustrates an application scenario diagram of a target object detection method according to an embodiment of the present disclosure.
As shown in fig. 3, the 1 st camera 311, the 2 nd camera 312, and the soccer field 320 may be included in the application scene 300. The 1 st photographing device 311 and the 2 nd photographing device 312 can perform image acquisition on the football field 320 at the same image acquisition time, and respectively obtain a 1 st image 331 to be detected and a 2 nd image 332 to be detected.
The 1 st target detection frame 3311 corresponding to the 1 st image 331 to be detected and the 2 nd target detection frame 3321 corresponding to the 2 nd image 332 to be detected can be obtained by performing target detection on the 1 st image 331 to be detected and the 2 nd image 332 to be detected respectively. The 1 st target detection box 3311 and the 2 nd target detection box 3321 may characterize a soccer ball in the soccer field 320.
In the 1st to-be-detected image 331, the pixel corresponding to the center point of the 1st target detection frame 3311 is the 1st object pixel. In the 2nd to-be-detected image 332, the pixel corresponding to the center point of the 2nd target detection frame 3321 is the 2nd object pixel.
As shown in fig. 3, a first imaging point geographic position 341 corresponding to the 1st camera 311 can be obtained from the device attribute parameters of the 1st camera 311 and the world coordinate position (device geographic position) of the lens of the 1st camera 311. From the device attribute parameters of the 2nd camera 312 and the world coordinate position (device geographic position) of the lens of the 2nd camera 312, a first imaging point geographic position 342 corresponding to the 2nd camera 312 can be obtained.
As shown in fig. 3, the 1st imaging ray 351 passing through the device geographic position of the 1st camera 311 and the first imaging point geographic position 341 corresponding to the 1st camera 311 may be represented as -T_1 + nV_1. The 2nd imaging ray 352 passing through the device geographic position of the 2nd camera 312 and the first imaging point geographic position 342 corresponding to the 2nd camera 312 may be represented as -T_2 + nV_2. The first intersection element 351A and the second intersection element 352B are set to lie on the 1st imaging ray 351 and the 2nd imaging ray 352, respectively. The coordinate position of the first intersection element 351A in the world coordinate system may be expressed as -T_1 + n_1·V_1; the coordinate position of the second intersection element 352B in the world coordinate system may be expressed as -T_2 + n_2·V_2. The vertical vector between the first intersection element 351A and the second intersection element 352B may be represented as E_AB. For the vertical vector E_AB to be perpendicular to both the 1st imaging ray 351 and the 2nd imaging ray 352, E_AB may satisfy the constraint condition of the following formula (5):
E_AB·V_1 = 0, E_AB·V_2 = 0; (5)
Therefore, the coordinate position of the first intersection element 351A in the world coordinate system can be calculated, and the coordinate position of the second intersection element 352B in the world coordinate system can be calculated. After determining the first intersection element 351A and the second intersection element 352B, a midpoint of a line segment constituted by the first intersection element 351A and the second intersection element 352B may be determined as the candidate point element 361C. The coordinate position of the candidate point element 361C in the world coordinate system, i.e., the geographic position of the candidate point element 361C, may also be calculated based on the coordinate positions of the first intersection element 351A and the second intersection element 352B, respectively, in the world coordinate system.
It should be noted that the inventors creatively found that the device attribute parameters of each of the 1st camera 311 and the 2nd camera 312 are calibrated by a calibration method and may therefore carry errors. Likewise, there may be errors in the 1st object pixel position of the 1st object pixel and in the 2nd object pixel position of the 2nd object pixel. The accumulation of these errors may result in the 1st imaging ray 351 through the first imaging point geographic position 341 corresponding to the 1st camera 311 and the 2nd imaging ray 352 through the first imaging point geographic position 342 corresponding to the 2nd camera 312 not converging to the same geographic position in the world coordinate system. The target object detection method provided by the embodiments of the present disclosure determines the first intersection element 351A and the second intersection element 352B based on the vertical vector E_AB, and determines the candidate point element 361C as the midpoint of the line segment connecting the first intersection element 351A and the second intersection element 352B, so that the candidate point element 361C can balance the error between the first imaging point geographic position 341 and the first imaging point geographic position 342. Therefore, the target geographic position obtained from the candidate geographic position of the candidate point element (i.e., the coordinate position of the candidate point element 361C in the world coordinate system) can at least partially reduce the error between the target geographic position and the real geographic position of the target object, improving the accuracy of position detection of the target object in the world coordinate system.
Note that the 1st imaging ray 351 passes through the lens center of the 1st camera 311 and through the 1st object pixel corresponding to the 1st target detection frame 3311. In order to clearly depict the position conversion relationship between the 1st object pixel corresponding to the 1st target detection frame 3311 and the first imaging point geographic position 341, the 1st imaging ray 351 as drawn in fig. 3 does not pass through the 1st object pixel corresponding to the 1st target detection frame 3311; the drawing is not intended to define the positional relationship between the 1st imaging ray 351 and the 1st target detection frame 3311.
Accordingly, for the same or similar reasons, the 2nd imaging ray 352 passes through the 2nd object pixel corresponding to the 2nd target detection frame 3321; the 2nd imaging ray 352 as drawn in fig. 3 does not pass through the 2nd object pixel corresponding to the 2nd target detection frame 3321, and the drawing is not intended to define the positional relationship between the 2nd imaging ray 352 and the 2nd target detection frame 3321.
According to an embodiment of the present disclosure, there may be a plurality of candidate point elements.
For example, consider the case where there are 3 images to be detected, i.e., N=3. Each image to be detected may correspond to a respective photographing device: the 1st photographing device corresponds to the 1st image to be detected, the 2nd photographing device to the 2nd image, and the 3rd photographing device to the 3rd image. According to the target object detection method provided by the embodiments of the present disclosure, the 1st imaging ray corresponding to the 1st photographing device, the 2nd imaging ray corresponding to the 2nd photographing device, and the 3rd imaging ray corresponding to the 3rd photographing device may be obtained. A first vertical vector E1 perpendicular to both the 1st imaging ray and the 2nd imaging ray is determined, and the midpoint P1 of the vertical segment cut by E1 between the 1st imaging ray and the 2nd imaging ray may be used as the first candidate point element P1.
In the same or a corresponding manner, a second candidate point element P2 corresponding to the 2nd imaging ray and the 3rd imaging ray, and a third candidate point element P3 corresponding to the 1st imaging ray and the 3rd imaging ray, can also be obtained.
According to an embodiment of the present disclosure, determining a target geographic location of a target object from candidate geographic locations of candidate point elements includes: and obtaining the target geographic position according to the candidate geographic positions of each of the candidate point elements.
According to an embodiment of the present disclosure, in the case where there are N images to be detected, N(N-1)/2 candidate point elements can be generated. Obtaining the target geographic position according to the candidate geographic positions of the candidate point elements may be determining the weighted average of the candidate geographic positions of the N(N-1)/2 candidate point elements as the target geographic position.
According to the embodiment of the disclosure, the candidate geographic positions of each of the plurality of candidate point elements may also be processed according to other algorithms, for example, the candidate geographic positions of each of the plurality of candidate point elements may also be processed based on a clustering algorithm, and the obtained candidate point element corresponding to the clustering center may be the target point element, so that the target geographic position is determined according to the target point element. The embodiment of the present disclosure does not limit the specific algorithm type for processing the candidate geographic locations of each of the plurality of candidate point elements, and those skilled in the art may select according to actual requirements.
According to the embodiment of the disclosure, by obtaining the target geographic position according to the candidate geographic positions of each of the candidate point elements, the device attribute parameters of each of the plurality of photographing devices and the angles of view of the plurality of photographing devices can be further integrated to detect the position of the target object in the world coordinate system, so that the negative influence of the device attribute parameter errors of the photographing devices and the imaging errors of the object pixels in the image to be detected on the geographic position of the detected target object is reduced, and the detection precision of the geographic position of the target object is improved.
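A sketch of this aggregation step, assuming equal weights for the N(N-1)/2 pairwise candidates (the disclosure leaves the weighting scheme open); it reuses candidate_point() from the earlier sketch.

```python
import numpy as np
from itertools import combinations

def target_position(rays):
    """Average the N*(N-1)/2 pairwise candidate points into one target position.

    `rays` is a list of (origin, direction) tuples, one per camera; equal
    weights are an assumption of this sketch.
    """
    candidates = [candidate_point(Oi, Vi, Oj, Vj)
                  for (Oi, Vi), (Oj, Vj) in combinations(rays, 2)]
    return np.mean(candidates, axis=0)
```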
The target object detection method provided by the embodiments of the present disclosure can be used to detect the accurate position of the football during the rebroadcast of a football match; for example, whether the football has crossed the goal line or has gone out of bounds can be detected according to the target object detection method provided by the embodiments of the present disclosure. However, this is not limiting; the method can also be applied to other application scenarios, for example, position detection for unmanned vehicle driving, and the embodiments of the present disclosure do not limit the application scenario of the target object detection method.
According to an embodiment of the present disclosure, obtaining candidate point elements characterizing the target object according to imaging rays corresponding to each of the N photographing devices may further include: and determining the intersection point of the imaging ray and the reference surface as a candidate point element, wherein the target object moves on the reference surface.
According to embodiments of the present disclosure, the reference surface may be a surface that constrains the movement of the target object, for example, the football field that constrains the football to move on the field plane during a match. By determining the coordinate position of the reference plane in the world coordinate system and determining the imaging ray according to the target object detection method provided by the embodiments of the present disclosure, the intersection point of the imaging ray and the reference plane can be calculated.
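A sketch of this reference-plane variant, assuming the field is modelled as the plane z = 0 in the world coordinate system; the function name and tolerance are assumptions.

```python
import numpy as np

def ray_plane_intersection(origin, V, plane_point, plane_normal):
    """Intersection of the ray origin + n*V with a reference plane.

    The plane is given by a point on it and its normal; for a football
    field modelled as z = 0, plane_point = (0,0,0) and plane_normal = (0,0,1).
    Returns None if the ray is parallel to the plane.
    """
    denom = V @ plane_normal
    if abs(denom) < 1e-9:
        return None
    n = ((plane_point - origin) @ plane_normal) / denom
    return origin + n * V
```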
Fig. 4 schematically illustrates an application scenario diagram of a target object detection method according to another embodiment of the present disclosure.
As shown in fig. 4, the application scene 400 may include a 3rd photographing device 413, a 4th photographing device 414, and a soccer field 420 (the reference surface). The 3rd photographing device 413 and the 4th photographing device 414 can perform image acquisition on the soccer field 420 at the same image acquisition time, respectively obtaining a 3rd image to be detected 433 and a 4th image to be detected 434.
Target detection is performed on the 3 rd to-be-detected image 433 and the 4 th to-be-detected image 434, respectively, so that a 3 rd target detection frame 4331 corresponding to the 3 rd to-be-detected image 433 and a 4 th target detection frame 4341 corresponding to the 4 th to-be-detected image 434 can be obtained. The 3 rd target detection box 4331 and the 4 th target detection box 4341 may characterize soccer in the soccer field 420.
In the 3rd to-be-detected image 433, the pixel corresponding to the center point of the 3rd target detection frame 4331 is the 3rd object pixel. In the 4th to-be-detected image 434, the pixel corresponding to the center point of the 4th target detection frame 4341 is the 4th object pixel.
As shown in fig. 4, from the device attribute parameter of the 3 rd photographing device 413, the world coordinate position (device geographical position) of the lens of the 3 rd photographing device 413, and the 3 rd object pixel, the intersection point of the 3 rd imaging ray 453 and the soccer field 420 can be obtained, that is, the reference plane intersection point 441 where the 3 rd imaging ray 453 and the soccer field 420 intersect. According to the device attribute parameter of the 4 th photographing device 414, the world coordinate position (device geographical position) of the lens of the 4 th photographing device 414, and the 4 th object pixel, the intersection point of the 4 th imaging ray 454 and the football field 420 can be obtained, that is, the intersection point 442 of the reference plane where the 4 th imaging ray 454 and the football field 420 intersect.
Accordingly, reference surface intersection point 441 and reference surface intersection point 442 may be determined as candidate point elements, respectively.
According to an embodiment of the present disclosure, determining the target geographic location of the target object from the candidate geographic locations of the candidate point elements may further include: determining candidate point elements corresponding to the N shooting devices respectively; and obtaining the target geographic position according to the candidate geographic positions of the N candidate point elements.
As shown in fig. 4, in the case where N=2, the average of the geographic positions of the reference plane intersection point 441 and the reference plane intersection point 442, that is, of the candidate geographic positions of the two different candidate point elements, may be calculated to obtain the target geographic position of the target point 461.
According to the embodiments of the present disclosure, obtaining the target geographic position representing the target object in the world coordinate system according to the candidate geographic positions of the candidate point elements can reduce the error between the target geographic position and the real geographic position of the target object, improving the accuracy of position detection of the target object in the world coordinate system.
Note that the 3rd imaging ray 453 passes through the lens center of the 3rd photographing device 413 and through the 3rd object pixel corresponding to the 3rd target detection frame 4331. In order to clearly depict the position conversion relationship between the 3rd object pixel corresponding to the 3rd target detection frame 4331 and the reference plane intersection point 441, the 3rd imaging ray 453 as drawn in fig. 4 does not pass through the 3rd object pixel corresponding to the 3rd target detection frame 4331; the drawing is not intended to define the positional relationship between the 3rd imaging ray 453 and the 3rd target detection frame 4331.
Accordingly, for the same or similar reasons, the 4th imaging ray 454 passes through the 4th object pixel corresponding to the 4th target detection frame 4341; the 4th imaging ray 454 as drawn in fig. 4 does not pass through the 4th object pixel corresponding to the 4th target detection frame 4341, and the drawing is not intended to define the positional relationship between the 4th imaging ray 454 and the 4th target detection frame 4341.
According to an embodiment of the present disclosure, there are a plurality of target geographic positions, and each target geographic position is associated with the image acquisition time at which the corresponding images to be detected were captured.
Fig. 5 schematically illustrates a flowchart of a target object detection method according to another embodiment of the present disclosure.
As shown in fig. 5, the target object detection method may further include operations S510 to S520.
In operation S510, the plurality of target geographic locations are ordered according to the image acquisition time corresponding to each of the plurality of target geographic locations, to obtain a target geographic location sequence.
In operation S520, the target geographic position sequence is smoothed to obtain a motion trajectory of the target object.
According to the embodiment of the disclosure, the plurality of target geographic positions can be respectively associated with the image acquisition time, so that the plurality of target geographic positions can be arranged according to the time sequence relation of the respectively associated image acquisition time to obtain target geographic position sequences corresponding to different image acquisition times in the same space. The target geographic position sequence can initially represent the motion trail condition of the target object in a time period formed by a plurality of image acquisition moments.
According to the embodiments of the present disclosure, the target geographic position sequence can be smoothed based on a fitting algorithm; for example, the target geographic position sequence can be smoothed based on the least squares method to obtain the motion trajectory. However, this is not limiting, and the target geographic position sequence may also be smoothed based on other types of algorithms, for example, a five-point cubic smoothing algorithm or a spline interpolation smoothing algorithm. The embodiments of the present disclosure do not limit the type of algorithm that performs the smoothing.
According to the embodiments of the present disclosure, by smoothing the target geographic position sequence, the obtained motion trajectory can accurately represent the development of the target object's movement in the world coordinate system, and a three-dimensional motion trajectory representing the motion of the target object in space can be obtained. This helps relevant technicians to clearly and accurately analyze sports technique and evaluate the technical ability of personnel such as athletes, improving the efficiency and accuracy of the subsequent technical analysis process.
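A sketch of operations S510 to S520, assuming a per-coordinate least-squares polynomial fit as the smoothing algorithm (the disclosure does not fix the algorithm; a five-point cubic or spline-based smoother would fit the same interface).

```python
import numpy as np

def smooth_trajectory(times, positions, degree=3):
    """Sort positions by acquisition time, then smooth them (S510 + S520).

    times: (M,) image acquisition timestamps; positions: (M, 3) target
    geographic positions. Returns the sorted times and the smoothed
    three-dimensional trajectory sampled at those times.
    """
    order = np.argsort(times)                      # S510: order by acquisition time
    t = np.asarray(times, dtype=float)[order]
    P = np.asarray(positions, dtype=float)[order]
    smoothed = np.column_stack([                   # S520: least-squares fit per axis
        np.polyval(np.polyfit(t, P[:, k], degree), t)
        for k in range(P.shape[1])
    ])
    return t, smoothed
```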
According to an embodiment of the present disclosure, there is also provided a calibration method of a photographing apparatus, including:
Performing target detection on the initial image to obtain an identification object thermodynamic diagram corresponding to the target identification object; determining an object homography matrix according to the identification object pixels of the identification object, which represent the target identification object, and the geographic position of the target identification object; and determining device attribute parameters of the shooting device according to the object homography matrix, wherein the initial image is obtained after the shooting device acquires the image of the area containing the target identification object.
According to an embodiment of the present disclosure, the object homography matrix is adapted to convert object pixel locations of the identification object pixels in the pixel coordinate system to target identification object geographic locations of the target identification object in the world coordinate system.
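A sketch of estimating the object homography matrix from landmark correspondences, here using OpenCV's cv2.findHomography as one possible solver; the landmark pixel positions and pitch coordinates below are illustrative assumptions, not values from the disclosure.

```python
import numpy as np
import cv2

# Pixel positions of field landmarks detected in the initial image, and the
# known geographic positions of the same landmarks on the pitch plane
# (illustrative values; in practice these would come from the heatmap peaks).
identification_pixels = np.array(
    [[212, 515], [705, 530], [1180, 540], [300, 220], [960, 230], [1540, 250]],
    dtype=np.float32)
geographic_positions = np.array(
    [[0.0, 0.0], [26.25, 0.0], [52.5, 0.0], [0.0, 34.0], [26.25, 34.0], [52.5, 34.0]],
    dtype=np.float32)

# The object homography matrix maps pixel coordinates to world coordinates
# on the ground plane; RANSAC rejects mislocalized landmarks.
H, mask = cv2.findHomography(identification_pixels, geographic_positions, cv2.RANSAC)
```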
According to an embodiment of the present disclosure, the target identification object comprises any one or more of an identification line, an identification point, an identification pattern of the ground in the initial image.
For example, the target identification object may be a mark point such as a tee point, a ball point in a soccer field, or the like. For another example, the target identification object may also be an identification line such as a border line, a forbidden zone line, a goal line, or the like in a soccer field. For another example, the target identification object may be a middle graph arc pattern, a forbidden zone top arc pattern, a goal pattern, or the like in the soccer field.
According to an embodiment of the present disclosure, performing target detection on the initial image may include inputting the initial image into a semantic segmentation network model. The semantic segmentation network model may include an image feature extraction network, a feature fusion network, and a thermodynamic diagram output network.
According to embodiments of the present disclosure, the image feature extraction network may be constructed based on a convolutional neural network algorithm, the feature fusion network may be constructed based on Atrous Spatial Pyramid Pooling (ASPP), and the thermodynamic diagram output network may be constructed based on a multi-layer perceptron algorithm. In the identification object thermodynamic diagram output by the semantic segmentation network model, the target identification object located on the ground in the initial image can serve as the foreground; for example, a Gaussian distribution around the target identification object is used to generate foreground image pixels, so that for field areas with many target identification objects, such as football pitches and basketball courts, the foreground image area can be enlarged and foreground image pixels (identification object pixels) with a larger data volume can be obtained. Therefore, the amount of data available for calibrating the attribute parameters of the photographing device can be increased, improving the calibration precision of the photographing device.
According to the embodiment of the disclosure, the photographing device is calibrated from an initial image it captures of the target identification object. This enables calibration under non-standard conditions, avoids the inefficiency of calibrating with auxiliary components such as calibration plates, and improves the calibration efficiency of the photographing device.
In accordance with an embodiment of the present disclosure, in the case where the target identification object includes a plurality of target identification objects, each target identification object may correspond to its own identification object thermodynamic diagram. The value at the (x, y) pixel location in the identification object thermodynamic diagram may be represented by the following equation (6).
$$h(x, y) = \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2} - \frac{(y-\mu_2)^2}{2\sigma_2^2}\right) \tag{6}$$

In equation (6), (x, y) is a pixel position in the initial image; (μ₁, μ₂) is the pixel coordinate position of the target identification object; σ₁ and σ₂ are constants representing the variance of the Gaussian distribution in the x and y directions, respectively; and h is the value at the (x, y) pixel location in the identification object thermodynamic diagram.
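Equation (6) can be generated directly over an image grid; the sketch below is illustrative, with the `sigma` defaults chosen arbitrarily:

```python
import numpy as np

def gaussian_heatmap(height, width, mu, sigma=(4.0, 4.0)):
    """Evaluate equation (6) over a full image grid (illustrative sketch).

    `mu` = (mu1, mu2) is the identifier's pixel coordinate position; the
    `sigma` defaults are arbitrary assumptions for illustration.
    """
    x, y = np.meshgrid(np.arange(width), np.arange(height))
    return np.exp(
        -((x - mu[0]) ** 2 / (2 * sigma[0] ** 2)
          + (y - mu[1]) ** 2 / (2 * sigma[1] ** 2))
    )
```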
Fig. 6 schematically illustrates a block diagram of a target object detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the target object detection apparatus 600 may include: an object pixel determination module 610, an imaging ray determination module 620, a candidate point element determination module 630, and a target geographic location determination module 640.
The object pixel determining module 610 is configured to determine object pixels representing a target object in images to be detected, where the number of images to be detected is N, N is an integer greater than 1, and the N images to be detected are respectively captured by their corresponding photographing devices.
The imaging ray determination module 620 is configured to determine, according to the device geographic position and the device attribute parameters of the photographing device, an imaging ray passing through the object pixel and the photographing device corresponding to the object pixel.
The candidate point element determining module 630 is configured to obtain candidate point elements representing the target object according to the imaging rays corresponding to the N photographing devices.
The target geographic position determining module 640 is configured to determine a target geographic position of the target object according to the candidate geographic positions of the candidate point elements.
According to an embodiment of the present disclosure, the device attribute parameters include device intrinsic information and device extrinsic information.
The imaging ray determination module includes a first imaging point geographic position obtaining unit and a first imaging ray obtaining unit.
The first imaging point geographic position obtaining unit is configured to process the object pixel position of the object pixel according to the device intrinsic information to obtain a first imaging point geographic position.
The first imaging ray obtaining unit is configured to obtain the imaging ray corresponding to the photographing device according to the first imaging point geographic position, the device geographic position, and the device extrinsic information.
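The two units above can be pictured as a standard pinhole back-projection, sketched below under the assumption that the extrinsics map world coordinates to camera coordinates as x_cam = R·x_world + t; the function name is illustrative:

```python
import numpy as np

def imaging_ray(K, R, t, pixel_xy):
    """Back-project an object pixel into a world-space imaging ray.

    A pinhole-model sketch: K holds the device intrinsic information,
    (R, t) the device extrinsic information (world-to-camera convention).
    Returns (origin, direction) defining the imaging ray.
    """
    u, v = pixel_xy
    # First imaging point: the pixel lifted through the intrinsics.
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    origin = -R.T @ t          # device geographic position (camera center)
    d_world = R.T @ d_cam      # ray direction rotated into the world frame
    return origin, d_world / np.linalg.norm(d_world)
```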
According to an embodiment of the present disclosure, the candidate point element determination module includes a second imaging ray obtaining unit, a vertical vector obtaining unit, and a first candidate point element obtaining unit.
The second imaging ray obtaining unit is configured to determine an ith imaging ray and a jth imaging ray from the imaging rays corresponding to the N photographing devices, where i and j are integers greater than 0 and less than or equal to N, and i is not equal to j.
The vertical vector obtaining unit is configured to determine a vertical vector perpendicular to both the ith imaging ray and the jth imaging ray.
The first candidate point element obtaining unit is configured to determine the candidate point element according to a first intersection point element where the vertical vector intersects the ith imaging ray and a second intersection point element where the vertical vector intersects the jth imaging ray.
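These three units correspond to the classic closest-point construction between two skew rays. In the sketch below (one plausible reading, with illustrative names), the feet of the common perpendicular play the roles of the first and second intersection point elements, and their midpoint is taken as the candidate point element:

```python
import numpy as np

def candidate_from_two_rays(o1, d1, o2, d2):
    """Candidate point element from the ith and jth imaging rays.

    o1/o2 are ray origins, d1/d2 unit directions. Solves for the common
    perpendicular, then returns the midpoint of its feet on the two rays.
    """
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = o1 - o2
    denom = a * c - b * b
    if abs(denom) < 1e-12:      # rays nearly parallel: use o1's foot on ray 2
        s, t = 0.0, (d2 @ w) / c
    else:
        s = (b * (d2 @ w) - c * (d1 @ w)) / denom
        t = (a * (d2 @ w) - b * (d1 @ w)) / denom
    p1 = o1 + s * d1            # first intersection point element
    p2 = o2 + t * d2            # second intersection point element
    return (p1 + p2) / 2.0
```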
According to an embodiment of the present disclosure, there are a plurality of candidate point elements.
The target geographic position determining module includes a first target geographic position obtaining unit.
The first target geographic position obtaining unit is configured to obtain the target geographic position according to the candidate geographic positions of the plurality of candidate point elements.
According to an embodiment of the present disclosure, the candidate point element determination module may further include a second candidate point element obtaining unit.
The second candidate point element obtaining unit is configured to determine the intersection point of the imaging ray and a reference plane as the candidate point element, where the target object moves on the reference plane.
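A sketch of the second candidate point element obtaining unit's computation, under the assumption that the reference plane is a horizontal plane z = plane_z in the world coordinate system:

```python
import numpy as np

def candidate_on_reference_plane(origin, direction, plane_z=0.0):
    """Candidate point element as a ray / reference-plane intersection.

    Assumes the target object moves on the horizontal plane z = plane_z;
    returns None when the imaging ray is parallel to the plane.
    """
    if abs(direction[2]) < 1e-12:
        return None
    s = (plane_z - origin[2]) / direction[2]
    return origin + s * direction
```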
According to an embodiment of the present disclosure, the target geographic position determining module may further include a third candidate point element obtaining unit and a second target geographic position obtaining unit.
The third candidate point element obtaining unit is configured to determine the candidate point elements corresponding to each of the N photographing devices.
The second target geographic position obtaining unit is configured to obtain the target geographic position according to the candidate geographic positions of the N candidate point elements.
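One simple realization of the second target geographic position obtaining unit is to average the N candidate geographic positions; the disclosure does not mandate this particular statistic:

```python
import numpy as np

def fuse_candidates(candidate_positions):
    """Fuse the N candidate geographic positions into one target position.

    A plain mean is used for illustration; a robust statistic such as the
    median would be an equally reasonable choice.
    """
    return np.mean(np.asarray(candidate_positions, dtype=float), axis=0)
```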
According to an embodiment of the disclosure, there are a plurality of target geographic positions, each associated with the image acquisition moment at which the corresponding image to be detected was captured.
The target object detection device further includes a target geographic position sequence obtaining module and a motion trail obtaining module.
The target geographic position sequence obtaining module is configured to sort the plurality of target geographic positions according to their respective image acquisition moments to obtain a target geographic position sequence.
The motion trail obtaining module is configured to smooth the target geographic position sequence to obtain the motion trail of the target object.
According to an embodiment of the present disclosure, the object pixel determination module includes a target detection frame obtaining unit and an object pixel determining unit.
The target detection frame obtaining unit is configured to perform target detection on the image to be detected to obtain a target detection frame corresponding to the target object.
The object pixel determining unit is configured to determine the object pixels from the image to be detected according to the target detection frame.
According to an embodiment of the present disclosure, the object pixel determining unit includes a first object pixel determining subunit and/or a second object pixel determining subunit.
The first object pixel determining subunit is configured to determine, as an object pixel, the pixel corresponding to the center point of the target detection frame in the image to be detected.
The second object pixel determining subunit is configured to determine, as an object pixel, the pixel corresponding to the midpoint of the bottom edge of the target detection frame in the image to be detected.
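The two subunits amount to two ways of reading a pixel off the target detection frame, sketched below with an illustrative flag name not taken from the disclosure:

```python
def object_pixel_from_box(x1, y1, x2, y2, use_bottom_mid=True):
    """Read the object pixel off a target detection frame (x1, y1, x2, y2).

    The bottom-edge midpoint suits targets in contact with the ground
    (e.g., players); the box center suits airborne targets (e.g., a ball).
    """
    cx = (x1 + x2) / 2.0
    cy = y2 if use_bottom_mid else (y1 + y2) / 2.0
    return cx, cy
```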
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described above.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a target object detection method. For example, in some embodiments, the target object detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the target object detection method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the target object detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A target object detection method, comprising:
determining object pixels representing a target object in images to be detected, wherein the number of the images to be detected is N, the N images to be detected are respectively captured by corresponding photographing devices, and N is an integer greater than 1;
determining, according to the device geographic positions and the device attribute parameters of the photographing devices, imaging rays passing through the object pixels and the photographing devices corresponding to the object pixels;
obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N photographing devices; and
determining a target geographic position of the target object according to the candidate geographic position of the candidate point element,
wherein the obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N photographing devices includes:
determining an ith imaging ray and a jth imaging ray from the imaging rays corresponding to the N photographing devices, wherein i and j are both integers greater than 0 and less than or equal to N, and i is not equal to j;
determining a vertical vector perpendicular to the ith imaging ray and perpendicular to the jth imaging ray; and
determining the candidate point element according to a first intersection point element where the vertical vector intersects the ith imaging ray and a second intersection point element where the vertical vector intersects the jth imaging ray.
2. The method of claim 1, wherein the device attribute parameters include device intrinsic information and device extrinsic information;
wherein determining, according to the device geographical location and the device attribute parameter of the photographing device, an imaging ray passing through the object pixel and the photographing device corresponding to the object pixel includes:
processing the object pixel position of the object pixel according to the device intrinsic information to obtain a first imaging point geographic position; and
obtaining the imaging ray corresponding to the photographing device according to the first imaging point geographic position, the device geographic position, and the device extrinsic information.
3. The method of claim 1, wherein there are a plurality of candidate point elements;
wherein, the determining the target geographic position of the target object according to the candidate geographic position of the candidate point element includes:
obtaining the target geographic position according to the candidate geographic positions of each of the plurality of candidate point elements.
4. The method of claim 1, wherein there are a plurality of target geographic positions, each associated with an image acquisition moment at which the corresponding image to be detected was captured;
the target object detection method further comprises the following steps:
sorting the plurality of target geographic positions according to their respective image acquisition moments to obtain a target geographic position sequence; and
smoothing the target geographic position sequence to obtain a motion trail of the target object.
5. The method of claim 1, wherein the determining object pixels representing a target object in the image to be detected comprises:
performing target detection on the image to be detected to obtain a target detection frame corresponding to the target object; and
determining the object pixels from the image to be detected according to the target detection frame.
6. The method of claim 5, wherein the determining the object pixels from the image to be detected according to the target detection frame comprises:
determining a pixel corresponding to the center point of the target detection frame in the image to be detected as the object pixel; and/or
determining a pixel corresponding to the midpoint of the bottom edge of the target detection frame in the image to be detected as the object pixel.
7. A target object detection apparatus comprising:
the object pixel determining module is used for determining object pixels representing a target object in images to be detected, wherein the number of the images to be detected is N, the N images to be detected are respectively captured by corresponding photographing devices, and N is an integer greater than 1;
the imaging ray determination module is used for determining, according to a device geographic position and device attribute parameters of the photographing device, an imaging ray passing through the object pixel and the photographing device corresponding to the object pixel;
the candidate point element determining module is used for obtaining candidate point elements representing the target object according to the imaging rays corresponding to the N photographing devices; and
a target geographic position determining module for determining a target geographic position of the target object according to the candidate geographic positions of the candidate point elements,
wherein the candidate point element determining module includes:
the second imaging ray obtaining unit is used for determining an ith imaging ray and a jth imaging ray from the imaging rays corresponding to the N photographing devices, wherein i and j are both integers greater than 0 and less than or equal to N, and i is not equal to j;
a vertical vector obtaining unit configured to determine a vertical vector perpendicular to the i-th imaging ray and perpendicular to the j-th imaging ray; and
a first candidate point element obtaining unit, configured to determine the candidate point element according to a first intersection point element where the vertical vector intersects with the ith imaging ray and a second intersection point element where the vertical vector intersects with the jth imaging ray.
8. The apparatus of claim 7, wherein the apparatus attribute parameters include apparatus intrinsic information and apparatus extrinsic information;
wherein the imaging ray determination module comprises:
A first imaging point geographical position obtaining unit, configured to process the object pixel position of the object pixel according to the device internal parameter information, so as to obtain a first imaging point geographical position; and
the first imaging ray obtaining unit is used for obtaining imaging rays corresponding to the shooting device according to the geographic position of the first imaging point, the geographic position of the device and the device external parameter information.
9. The apparatus of claim 7, wherein there are a plurality of candidate point elements;
wherein the target geographic location determination module comprises:
the first target geographic position obtaining unit is used for obtaining the target geographic position according to the candidate geographic positions of the candidate point elements.
10. The apparatus of claim 7, wherein there are a plurality of target geographic positions, each associated with an image acquisition moment at which the corresponding image to be detected was captured;
the target object detection apparatus further includes:
the target geographic position sequence obtaining module is used for sorting the plurality of target geographic positions according to their respective image acquisition moments to obtain a target geographic position sequence; and
the motion trail obtaining module is used for smoothing the target geographic position sequence to obtain the motion trail of the target object.
11. The apparatus of claim 7, wherein the object pixel determination module comprises:
the target detection frame obtaining unit is used for carrying out target detection on the image to be detected to obtain a target detection frame corresponding to the target object; and
the object pixel determining unit is used for determining the object pixels from the image to be detected according to the target detection frame.
12. The apparatus of claim 11, wherein the object pixel determination unit comprises:
a first object pixel determining subunit, configured to determine, as the object pixel, a pixel corresponding to the center point of the target detection frame in the image to be detected; and/or
a second object pixel determining subunit, configured to determine, as the object pixel, a pixel corresponding to the midpoint of the bottom edge of the target detection frame in the image to be detected.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 6.
CN202310413991.5A 2023-04-18 2023-04-18 Target object detection method, target object detection device, electronic equipment and storage medium Active CN116433767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310413991.5A CN116433767B (en) 2023-04-18 2023-04-18 Target object detection method, target object detection device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116433767A CN116433767A (en) 2023-07-14
CN116433767B true CN116433767B (en) 2024-02-20

Family

ID=87082968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310413991.5A Active CN116433767B (en) 2023-04-18 2023-04-18 Target object detection method, target object detection device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116433767B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101970985A (en) * 2008-02-29 2011-02-09 特林布尔公司 Determining coordinates of a target in relation to a survey instrument having at least two cameras
CN112465911A (en) * 2020-11-06 2021-03-09 北京迈格威科技有限公司 Image processing method and device
CN114898044A (en) * 2022-05-19 2022-08-12 同方威视技术股份有限公司 Method, apparatus, device and medium for imaging detection object
CN115578486A (en) * 2022-10-19 2023-01-06 北京百度网讯科技有限公司 Image generation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102010040963A1 (en) * 2010-09-17 2012-03-22 Siemens Aktiengesellschaft Method and X-ray machine for generating an X-ray projection image

Also Published As

Publication number Publication date
CN116433767A (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant