CN117459687B - Multi-scene information on-screen display system and method - Google Patents


Info

Publication number: CN117459687B (granted publication of CN117459687A)
Application number: CN202311585481.2A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: video, processed, videos, target, determining
Legal status: Active
Inventor: 杨冬
Assignee (current and original): Beijing Rongzhi Yunwang Technology Co ltd
Classifications

    • H04N 7/181 — Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast, for receiving images from a plurality of remote sources
    • H04N 21/4312 — Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/441 — Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-scene information on-screen display system and method, relating to the technical field of computers. The system comprises a camera, a processor, and a display. The processor is configured to: receive a display mode selected by a user; receive user input information; determine a target video, alternative videos, and thumbnail videos according to the display mode and the user input information; determine the positions and sizes of a main display window and of alternative display windows according to at least one of the display mode, the user input information, and the videos to be processed; and control the display to show the target video in the main display window, the alternative videos in the alternative display windows, and the thumbnail videos in the thumbnail display window. The target video can thus be displayed in the main display window while the alternative videos, showing scenes the target object may move to in the future, are displayed in the alternative display windows. The target object and the scenes in which it may appear can be presented intuitively without expending substantial manpower on searching, improving the display effect.

Description

Multi-scene information on-screen display system and method
Technical Field
The invention relates to the technical field of computers, in particular to a multi-scene information on-screen display system and method.
Background
In the related art, videos or images from different signal sources can be displayed on a display at the same time; for example, surveillance videos shot by a plurality of cameras can be shown simultaneously, and the videos to be displayed can be selected manually. However, when a target object moves between scenes shot by different cameras, manually searching for the object across multiple videos is inefficient. Moreover, while the video of the scene where the target object is currently located is being displayed, the scene the object will move to next cannot be predicted, so after the object moves on to another scene, substantial manpower must again be expended on searching. This reduces working efficiency and degrades the display effect.
The information disclosed in the background section of the application is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention provides a multi-scene information on-screen display system and method that address the low efficiency of manually searching for a target object across a plurality of videos.
According to a first aspect of the present invention, there is provided a multi-scene information on-screen display system, comprising: a plurality of cameras, a processor, and a display;
the plurality of cameras are respectively used for shooting a plurality of scenes to obtain videos to be processed of each scene;
The display is used for displaying the video to be processed selected by the processor according to the display mode set by the processor;
The processor is configured to:
Receiving a display mode selected by a user, wherein the display mode comprises a manual selection display mode, a target tracking display mode, and a key scene display mode; the manual selection display mode is used for receiving a selection instruction of the user for a video to be processed and controlling the display to display the selected video to be processed according to the selection instruction; the target tracking display mode is used for determining the scene where a target object appears according to the tracked action route of the target object and controlling the display to display the video to be processed corresponding to that scene; and the key scene display mode is used for determining the importance of each scene and controlling the display to display the videos to be processed according to the importance;
receiving user input information matched with the display mode;
Determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
and controlling the display to display the target video in a main display window, displaying the alternative video in an alternative display window, and displaying at least a part of the thumbnail video in a thumbnail display window.
According to the present invention, the user input information matched with the manual selection display mode includes a selection instruction for a video to be processed, a position setting instruction and a size setting instruction of the main display window, and a position setting instruction, a size setting instruction and a number setting instruction of the alternative display window.
According to the present invention, the user input information matched with the target tracking display mode includes reference information of the target object, a position setting instruction and a size setting instruction of the main display window, and a position setting instruction, a size setting instruction and a number setting instruction of the alternative display window.
According to the present invention, according to the display mode and the user input information, determining a target video, an alternative video and a thumbnail video in videos to be processed of a plurality of scenes, includes:
Searching for the reference information of the target object in the video frames at the current time of the videos to be processed of the plurality of scenes, determining the video to be processed to which a video frame containing the target object belongs as the target video, and determining the scene corresponding to the target video as the target scene;
According to the design information of the scenes, determining alternative scenes with a communication relation with the target scene in the multiple scenes;
Determining the image position information of the target object in a plurality of video frames of the historical moment of the target video, and determining the image position information of the target object in the video frame of the current moment of the target video;
Determining geographic position information of the target object at a plurality of moments according to the image position information;
Determining the orientation information of the target object in a video frame of the current moment of the target video;
Determining the occurrence probability of the target object in each alternative scene in the future according to the geographic position information, the orientation information of the target object and the design information of the alternative scene;
selecting a number of alternative videos corresponding to the number setting instruction from the videos to be processed corresponding to the alternative scenes according to the occurrence probability;
And determining the videos to be processed except the target video and the alternative video as the thumbnail video.
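The selection steps above can be sketched as follows. This is an illustrative outline, not the patent's implementation: `detections` and `probabilities` are hypothetical inputs standing in for the target search and the occurrence-probability computation described in the claims.

```python
# Illustrative sketch: split per-scene streams into target / alternative /
# thumbnail videos, given which stream currently shows the target and the
# future-appearance probabilities of the candidate scenes (both assumed inputs).

def classify_streams(stream_ids, detections, probabilities, n_alternatives):
    """Return (target, alternatives, thumbnails) stream ids.

    detections:    {stream_id: bool}   -- target seen in the current frame?
    probabilities: {stream_id: float}  -- future-appearance probability,
                                          candidate scenes only.
    """
    # The stream whose current frame contains the target is the target video.
    target = next(s for s in stream_ids if detections.get(s, False))
    # Rank candidate streams by predicted appearance probability.
    candidates = sorted(
        (s for s in probabilities if s != target),
        key=lambda s: probabilities[s],
        reverse=True,
    )
    alternatives = candidates[:n_alternatives]
    # Everything else is shown as a thumbnail.
    thumbnails = [s for s in stream_ids
                  if s != target and s not in alternatives]
    return target, alternatives, thumbnails
```

The number of alternative windows (`n_alternatives`) corresponds to the user's number setting instruction.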
According to the present invention, determining the occurrence probability of the target object in each candidate scene in the future according to the geographic position information, the orientation information of the target object, and the design information of the candidate scenes, includes:
Fitting the geographic position information at the historical times and at the current time to obtain a first predicted route;
Obtaining a second predicted route according to the orientation information of the target object and the geographic position information of the target object at the current moment, wherein the second predicted route is a straight line route;
According to the formula

$$P_i = \iint_{\Omega} B_i(x, y)\,\mathrm{d}s$$

determining the occurrence probability $P_i$ of the target object in the $i$-th alternative scene in the future, wherein $L_1$ is the first predicted route, $L_2$ is the second predicted route, $\Omega$ is the region enclosed by the first predicted route and the second predicted route, $B_i(x, y)$ is the surface equation of the $i$-th alternative scene, and $\mathrm{d}s$ is the surface infinitesimal.
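A numerical sketch of the two predicted routes and the probability integral, under stated assumptions: geographic positions are 2-D points, the first route is a polynomial fit over historical positions, the second is the straight line along the current orientation, and the integral over the enclosed region is estimated by Monte Carlo sampling between the two curves. The fit degree, sampling scheme, and surface function `b_i` are illustrative choices, not from the patent.

```python
import numpy as np

def first_predicted_route(positions, deg=2):
    """Fit y = f(x) to historical geographic positions (first route L1).
    `positions` is an (n, 2) array of (x, y) at successive times."""
    x, y = positions[:, 0], positions[:, 1]
    return np.poly1d(np.polyfit(x, y, deg))

def second_predicted_route(current_pos, heading_rad):
    """Straight-line route L2 from the current position along the object's
    orientation: y = y0 + tan(heading) * (x - x0)."""
    x0, y0 = current_pos
    slope = np.tan(heading_rad)
    return lambda x: y0 + slope * (x - x0)

def appearance_probability(f1, f2, b_i, x_start, x_end, n=10_000):
    """Monte Carlo estimate of the integral of the scene's surface function
    b_i(x, y) over the region enclosed between the two routes on [x_start, x_end]."""
    rng = np.random.default_rng(0)
    xs = rng.uniform(x_start, x_end, n)
    lo = np.minimum(f1(xs), f2(xs))   # lower route at each sampled x
    hi = np.maximum(f1(xs), f2(xs))   # upper route at each sampled x
    ys = rng.uniform(lo, hi)          # sample uniformly between the routes
    strip = hi - lo                   # local width of the enclosed region
    # average integrand times local width, integrated over the x-range
    return np.mean(b_i(xs, ys) * strip) * (x_end - x_start)
```

With `b_i` equal to 1 everywhere, the estimate reduces to the area of the enclosed region, which is a useful sanity check.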
According to the invention, the user input information matched with the key scene display mode comprises a number setting instruction of the alternative display windows.
According to the present invention, according to the display mode and the user input information, determining a target video, an alternative video and a thumbnail video in videos to be processed of a plurality of scenes, includes:
respectively selecting video frames at a plurality of historical moments from videos to be processed of a plurality of scenes;
For a video frame at a historical time that precedes the current time by a preset period and the video frame at the current time, both selected from the same video to be processed, determining the movement direction angles and the movement rates of a plurality of target objects in the video frame at the current time;
determining human body key points of the target objects;
Determining importance scores of all videos to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects;
and determining the video to be processed with the highest importance score as a target video, determining the videos to be processed with the importance score ranking from 2 to n+1 as candidate videos, and determining the videos to be processed with the importance score ranking from n+2 and later as thumbnail videos, wherein n is the number of candidate display windows corresponding to the number setting instruction.
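The ranking rule above (rank 1 to the main window, ranks 2 to n+1 to the alternative windows, the rest to thumbnails) can be sketched as:

```python
def assign_windows(scores, n_alternatives):
    """Rank streams by importance score and assign display windows.
    `scores` maps stream id -> importance score; rank 1 goes to the main
    window, ranks 2..n+1 to the alternative windows, the rest to thumbnails."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    target = ranked[0]
    alternatives = ranked[1:1 + n_alternatives]
    thumbnails = ranked[1 + n_alternatives:]
    return target, alternatives, thumbnails
```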
According to the invention, determining the importance scores of the videos to be processed according to the human body key points, the movement direction angles of the plurality of target objects, and the movement rates of the plurality of target objects comprises the following steps:
According to the formula

$$S_j = \min_{1 \le k \le N} \frac{1}{1 + \sum_{t=1}^{m} \left| \frac{x_{j,k,t}}{y_{j,k,t}} - \frac{\bar{x}_{j,t}}{\bar{y}_{j,t}} \right|}$$

determining the motion consistency score $S_j$ of the $j$-th video to be processed, wherein $x_{j,k,t}$ is the abscissa of the vector of the $t$-th human body key point of the $k$-th target object in the $j$-th video to be processed relative to the centroid of the selection frame of the $k$-th target object, $y_{j,k,t}$ is the corresponding ordinate, $\bar{x}_{j,t}$ is the mean of the abscissas of the vectors of the $t$-th human body key point of the plurality of target objects relative to the centroids of their respective selection frames, $\bar{y}_{j,t}$ is the corresponding mean ordinate, $m$ is the number of human body key points, $N$ is the number of target objects, $k \le N$, $t \le m$, $k$, $t$, $N$ and $m$ are positive integers, and $\min$ is the minimum function;
According to the formula

$$D_j = \frac{1}{1 + \sqrt{D(\theta_{j,k})}\,/\,E(\theta_{j,k})}$$

determining the motion direction consistency score $D_j$ of the $j$-th video to be processed, wherein $\theta_{j,k}$ is the movement direction angle of the $k$-th target object in the $j$-th video to be processed, $D(\theta_{j,k})$ is the variance of the movement direction angles of the plurality of target objects in the $j$-th video to be processed, and $E(\theta_{j,k})$ is the expectation of the movement direction angles of the plurality of target objects in the $j$-th video to be processed;
According to the formula

$$V_j = \frac{1}{1 + \sqrt{D(v_{j,k})}\,/\,E(v_{j,k})}$$

determining the motion rate consistency score $V_j$ of the $j$-th video to be processed, wherein $v_{j,k}$ is the movement rate of the $k$-th target object in the $j$-th video to be processed, $D(v_{j,k})$ is the variance of the movement rates of the plurality of target objects in the $j$-th video to be processed, and $E(v_{j,k})$ is the expectation of the movement rates of the plurality of target objects in the $j$-th video to be processed;
and carrying out weighted summation on the motion consistency score of the jth video to be processed, the motion direction consistency score of the jth video to be processed and the motion rate consistency score of the jth video to be processed to obtain the importance score of the jth video to be processed.
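A hedged sketch of the weighted combination above. The printed formulas are not reproduced in this text, so `consistency` uses 1 / (1 + std/mean) as one plausible reading of the variance/expectation-based scores, and the equal weights are an assumption (the patent leaves the weights unspecified).

```python
import math

def consistency(values):
    """Consistency score of a sample: 1 / (1 + std/mean).
    Assumed reading of the patent's variance/expectation scores; requires a
    nonzero mean. Equals 1.0 when all values are identical."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return 1.0 / (1.0 + math.sqrt(var) / mean)

def importance_score(motion_score, angles, rates, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of the action, direction, and rate consistency scores
    for one video; `weights` are an illustrative assumption."""
    d = consistency(angles)   # movement-direction consistency D_j
    v = consistency(rates)    # movement-rate consistency V_j
    w1, w2, w3 = weights
    return w1 * motion_score + w2 * d + w3 * v
```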
According to a second aspect of the present invention, there is provided a multi-scene information on-screen display method, including:
Receiving a display mode selected by a user, wherein the display mode comprises a manual selection display mode, a target tracking display mode, and a key scene display mode; the manual selection display mode is used for receiving a selection instruction of the user for a video to be processed and controlling a display to display the selected video to be processed according to the selection instruction; the target tracking display mode is used for determining the scene where a target object appears according to the tracked action route of the target object and controlling the display to display the video to be processed corresponding to that scene; and the key scene display mode is used for determining the importance of each scene and controlling the display to display the videos to be processed according to the importance;
receiving user input information matched with the display mode;
Determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
and controlling the display to display the target video in a main display window, displaying the alternative video in an alternative display window, and displaying at least a part of the thumbnail video in a thumbnail display window.
According to a third aspect of the present invention, there is provided a multi-scene information on-screen display apparatus, comprising: a processor; a memory for storing processor-executable instructions; the processor is configured to call the instructions stored in the memory to execute the multi-scene information on-screen display method.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the multi-scene information co-screen presentation method.
The technical effects are as follows. According to the invention, the videos to be displayed can be screened based on the selected display mode: the target video, the alternative videos, and the thumbnail videos are determined, along with the position and size of the main display window and of the alternative display windows. The target video of the target object is shown in the main display window, while the alternative videos, covering scenes the target object may move to in the future, are shown in the alternative display windows. The target object and the scenes in which it may appear can thus be presented intuitively without expending substantial manpower on searching, improving the display effect. When determining the occurrence probability of the target object in the alternative scenes, a first predicted route can be determined based on the geographic position information at historical times, and a second predicted route can be determined based on the orientation information and the geographic position information at the current time. The area enclosed by the two predicted routes can then be treated as the set of routes the target object may take in the future, so that the positions where it may appear and its occurrence probability in each alternative scene can be determined. Because both the current time and the historical times contribute, the accuracy of the occurrence probability is improved.
When determining the importance scores of the videos to be processed, the action consistency, the movement direction consistency, and the movement rate consistency of a plurality of target objects can be used jointly, improving the accuracy and objectivity of the scores. When determining the action consistency score, using vectors formed from the abscissa-to-ordinate ratios of the human body key point vectors removes the influence of the target objects' viewing angles, distances, heights, and the like, improving the accuracy of the action consistency score.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed. Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the invention or the solutions of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 schematically illustrates a multi-scene information on-screen display system according to an embodiment of the invention;
Fig. 2 exemplarily shows a flowchart of a multi-scene information on-screen display method according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 schematically illustrates a multi-scene information on-screen display system according to an embodiment of the invention, the system comprising: a plurality of cameras, a processor, and a display;
the plurality of cameras are respectively used for shooting a plurality of scenes to obtain videos to be processed of each scene;
The display is used for displaying the video to be processed selected by the processor according to the display mode set by the processor;
The processor is configured to:
Receiving a display mode selected by a user, wherein the display mode comprises a manual selection display mode, a target tracking display mode, and a key scene display mode; the manual selection display mode is used for receiving a selection instruction of the user for a video to be processed and controlling the display to display the selected video to be processed according to the selection instruction; the target tracking display mode is used for determining the scene where a target object appears according to the tracked action route of the target object and controlling the display to display the video to be processed corresponding to that scene; and the key scene display mode is used for determining the importance of each scene and controlling the display to display the videos to be processed according to the importance;
receiving user input information matched with the display mode;
Determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
and controlling the display to display the target video in a main display window, displaying the alternative video in an alternative display window, and displaying at least a part of the thumbnail video in a thumbnail display window.
According to the multi-scene information on-screen display system provided by the embodiment of the invention, the videos to be displayed can be screened based on the selected display mode: the target video, the alternative videos, and the thumbnail videos are determined, along with the position and size of the main display window and of the alternative display windows. The target video of the target object is displayed in the main display window, while the alternative videos, showing scenes the target object may move to, are displayed in the alternative display windows. The target object and the scenes in which it may appear in the future can thus be presented intuitively without expending substantial manpower on searching, improving the display effect.
According to one embodiment of the invention, the cameras can be arranged in various areas such as stadiums, shopping malls, streets, and office areas, and are respectively used for shooting a plurality of scenes in the area to obtain the video to be processed of each scene.
According to one embodiment of the invention, the processor may perform analysis, filtering, etc. among a plurality of videos to be processed and control the display to display one or more of the videos to be processed.
According to one embodiment of the invention, the user may issue instructions to the processor to select the display mode. The display modes include a manual selection display mode, a target tracking display mode, and a key scene display mode. In the manual selection display mode, the user can manually select a video to be processed to be displayed on the display and, when a plurality of videos are displayed simultaneously, can manually set the position and size of each video's display window. In the target tracking display mode, the user can input a target object to be tracked; the processor searches the plurality of videos to be processed for the target object and controls the display to show the video of the scene where the target object is located. In the key scene display mode, the processor may determine the importance of the content in each video to be processed and display the videos according to their importance.
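The three display modes can be sketched as a dispatch over a mode enum. This is an architectural sketch only: the handler logic shown for each branch is a labeled placeholder, not the patent's selection algorithm.

```python
from enum import Enum, auto

class DisplayMode(Enum):
    MANUAL = auto()           # user picks the videos to show
    TARGET_TRACKING = auto()  # follow a tracked object across scenes
    KEY_SCENE = auto()        # rank scenes by computed importance

def select_videos(mode, streams, user_input):
    """Route to the per-mode selection logic (placeholder bodies)."""
    if mode is DisplayMode.MANUAL:
        # honor the user's explicit selection instruction
        return [s for s in streams if s in user_input["selected"]]
    if mode is DisplayMode.TARGET_TRACKING:
        # placeholder: a real system would search frames for the target;
        # here we just match the target tag against stream names
        return [s for s in streams if user_input["target"] in s]
    # KEY_SCENE: placeholder importance ranking (real scoring is separate)
    return sorted(streams, key=len, reverse=True)
```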
According to one embodiment of the present invention, the user may also input user input information to the processor that matches the display mode, such that the processor can control the display to display a corresponding video to be processed based on the user input information. The user input information matched with the manual selection display mode includes a selection instruction for a video to be processed, a position setting instruction and a size setting instruction of the main display window, and a position setting instruction, a size setting instruction and a number setting instruction of the alternative display window.
According to one embodiment of the present invention, the selection instruction for the video to be processed may be used to manually select the videos to be displayed; for example, one or more videos may be selected from the videos to be processed of the multiple scenes, and the processor controls the display to show them. The position setting instruction and the size setting instruction of the main display window set its position and size, respectively; for example, the user can drag the main display window with the mouse to set its position and zoom it in or out with the mouse to set its size. The user may set the position and size of the alternative display windows in a similar manner, and may also set their number. For example, there may be a single main display window and a plurality of alternative display windows, used respectively to display the one video of greatest interest to the user and several videos of secondary interest.
According to one embodiment of the present invention, in the manual selection display mode, the processor may display the videos to be processed according to the above user input information and may show several of them simultaneously: the target video selected by the user in the main display window, the alternative videos selected by the user in the alternative display windows, and at least a part of the other videos in the thumbnail display window. The positions and sizes of the main display window and the alternative display windows can be set by the user and adjusted at any time.
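One possible data model for the user-adjustable window geometry described above. This is a sketch under assumptions: the patent does not specify a representation, so pixel coordinates and these method names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Window:
    """Geometry of one display window, in display pixels (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int

    def move(self, dx, dy):
        """Drag the window, e.g. with the mouse."""
        self.x += dx
        self.y += dy

    def scale(self, factor):
        """Zoom the window in (factor > 1) or out (factor < 1)."""
        self.width = int(self.width * factor)
        self.height = int(self.height * factor)
```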
According to one embodiment of the present invention, in the target tracking display mode, the user input information matched with the mode includes reference information of the target object, a position setting instruction and a size setting instruction of the main display window, and a position setting instruction, a size setting instruction and a number setting instruction of the alternative display window. The reference information of the target object may include an image of the target object input by the user to the processor, or a target object selected by the user in a certain video to be processed. The positions and sizes of the main display window and of the alternative display windows are set in a manner similar to that described above, and are not described in detail here.
According to one embodiment of the present invention, after the reference information of the target object is input, the processor may select the target video and the alternative videos among the plurality of videos to be processed and display them in the main display window and the alternative display windows, respectively. Determining the target video, the alternative videos, and the thumbnail videos in the videos to be processed of the plurality of scenes according to the display mode and the user input information includes: searching for the reference information of the target object in the video frames at the current time of the videos to be processed of the plurality of scenes, determining the video to be processed to which a video frame containing the target object belongs as the target video, and determining the scene corresponding to the target video as the target scene; determining, according to the design information of the scenes, alternative scenes having a communication relation with the target scene among the multiple scenes; determining the image position information of the target object in a plurality of video frames at historical times of the target video, and in the video frame at the current time of the target video; determining the geographic position information of the target object at a plurality of times according to the image position information; determining the orientation information of the target object in the video frame at the current time of the target video; determining the occurrence probability of the target object in each alternative scene in the future according to the geographic position information, the orientation information of the target object, and the design information of the alternative scenes; selecting a number of alternative videos corresponding to the number setting instruction from the videos to be processed corresponding to the alternative scenes according to the occurrence probability; and determining the videos to be processed other than the target video and the alternative videos as the thumbnail videos.
According to one embodiment of the present invention, if the reference information of the target object is an image input by the user, the processor may search for it in the plurality of videos to be processed. If the reference information is an image block selected by the user in a video frame at the current moment of a video to be processed, no search is needed. If the reference information is an image block selected in a video frame at a historical moment, a search is still required to determine in which video frame of which scene's video to be processed the target object appears at the current moment. The search may be performed with a deep learning neural network model; the invention does not limit the search method. After the video frame in which the target object is located is determined, the video to be processed to which that frame belongs can be determined as the target video, and the scene corresponding to the target video is the target scene.
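As a minimal illustration of such a search, the sketch below matches a reference appearance feature against per-video detections by cosine similarity. The feature vectors are assumed to come from some re-identification or detection network, which is not part of this sketch; all names and the threshold value are hypothetical.

```python
import numpy as np

def find_target_video(ref_feature, detections_per_video, threshold=0.8):
    """Return the index of the video whose current frame contains the
    detection most similar to the reference feature, or None if no
    detection exceeds the threshold.

    detections_per_video: one entry per video, each an array of shape
    (num_detections, feature_dim).
    """
    best_video, best_score = None, threshold
    for idx, feats in enumerate(detections_per_video):
        for f in feats:
            score = np.dot(ref_feature, f) / (
                np.linalg.norm(ref_feature) * np.linalg.norm(f))
            if score > best_score:
                best_video, best_score = idx, score
    return best_video
```

The video returned here plays the role of the target video, and its scene the target scene.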
According to one embodiment of the invention, communication relations may exist between scenes. For example, suppose the region photographed by the plurality of cameras is a shopping mall in which region A and region B are connected, while region A and region C are not directly connected and region C can only be reached from region A through region B. The alternative scenes having a communication relation with the target scene can be determined from the design information of the scenes. Since the target object appears in the target scene at the current moment, it may appear at a future moment in an alternative scene connected to the target scene. Therefore, one or more videos to be processed in which the target object is most likely to appear in the future can be selected from the videos to be processed corresponding to the alternative scenes to serve as the alternative videos.
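The communication relations derived from the design information can be held in a simple adjacency structure. The region names below are illustrative only, mirroring the mall example above:

```python
# Scene connectivity derived from design information (e.g. a floor plan).
# A-B connected; C reachable from A only through B.
CONNECTIVITY = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}

def alternative_scenes(target_scene):
    """Scenes having a direct communication relation with the target scene."""
    return sorted(CONNECTIVITY.get(target_scene, set()))
```

If the target object currently appears in scene A, only scene B's video qualifies as an alternative video; scene C's video would go to the thumbnail windows.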
According to one embodiment of the present invention, the possibility that the target object will appear in each alternative scene in the future may be determined based on the path of the target object, and the alternative videos may be selected from the videos to be processed of the plurality of alternative scenes accordingly.
According to one embodiment of the invention, the image position information of the target object can be determined in a plurality of video frames at historical moments and in the video frame at the current moment, and converted into geographic position information through the calibration parameters of the camera. Further, the orientation information of the target object in the video frame at the current moment can be determined, this orientation being expressed in the geographic coordinate system.
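A minimal sketch of the image-to-geographic conversion, assuming the calibration parameters take the common form of a ground-plane homography (a 3x3 matrix mapping image points to ground-plane coordinates); other calibration models are equally possible, and this form is an assumption, not the patent's stated method:

```python
import numpy as np

def image_to_geo(pixel_xy, H):
    """Map an image point to ground-plane (geographic) coordinates with a
    planar homography H (3x3), applying the projective division."""
    p = np.array([pixel_xy[0], pixel_xy[1], 1.0])
    q = H @ p
    return q[:2] / q[2]
```

Applying this to the target object's foot point in each frame yields the geographic position information at the historical moments and the current moment.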
According to one embodiment of the present invention, the path of the target object may be estimated from the geographic position information and orientation information obtained above, so that the occurrence probability of the target object in each alternative scene in the future can be determined and the alternative videos can be selected from the videos to be processed of the plurality of alternative scenes.
According to one embodiment of the present invention, determining the occurrence probability of the target object in each alternative scene in the future according to the geographic position information, the orientation information of the target object and the design information of the alternative scenes includes: fitting the geographic position information against the historical moments and the current moment to obtain a first predicted route; obtaining a second predicted route, which is a straight-line route, from the orientation information of the target object and the geographic position information of the target object at the current moment; and determining the occurrence probability $P_i$ of the target object in the $i$-th alternative scene in the future according to equation (1),

$$P_i = \frac{\displaystyle\iint_{S(L_1, L_2)\,\cap\,B_i} \mathrm{d}s}{\displaystyle\iint_{B_i} \mathrm{d}s} \tag{1}$$

where $L_1$ is the first predicted route, $L_2$ is the second predicted route, $S(L_1, L_2)$ is the region enclosed by the first predicted route and the second predicted route, $B_i(x, y)$ is the surface equation of the $i$-th alternative scene, and $\mathrm{d}s$ is a surface element.
According to one embodiment of the present invention, the geographic position information is fitted against the respective moments to obtain the first predicted route, which is a prediction based on the geographic position information at the historical moments. The second predicted route may be determined from the geographic position information given by the video frame at the current moment and the orientation information of the target object, and is therefore a prediction based on the current moment alone. Each route lying in the region enclosed by the first and second predicted routes can be regarded as a weighted combination of the two; varying the weights yields the family of routes filling the enclosed region, each representing a different relative importance assigned to the historical moments and the current moment. All of these routes are routes the target object may take at a future moment.
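The weighted combination of the two predicted routes can be sketched as below, assuming both routes are sampled as point sequences at the same parameter values (the sampling convention is an assumption of this sketch):

```python
import numpy as np

def blended_routes(route1, route2, weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Candidate future routes as convex combinations of the first and
    second predicted routes. route1/route2: arrays of shape (num_points, 2)
    sampled at matching parameter values. Each weight w gives one route
    w * route1 + (1 - w) * route2 inside the enclosed region."""
    r1, r2 = np.asarray(route1, dtype=float), np.asarray(route2, dtype=float)
    return [w * r1 + (1.0 - w) * r2 for w in weights]
```

Sweeping the weight from 0 to 1 traces out the family of possible future routes between the two predictions.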
According to one embodiment of the present invention, the numerator of equation (1) integrates over the intersection of the region enclosed by the first and second predicted routes with the $i$-th alternative scene, i.e. the set of positions in the $i$-th alternative scene where the target object may appear in the future. The denominator is the total area of the $i$-th alternative scene, i.e. the set of all positions in that scene. Comparing the two yields the occurrence probability of the target object in the $i$-th alternative scene in the future.
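A simplified illustration of equation (1): approximating both the enclosed region and the alternative scene by axis-aligned rectangles, the area ratio reduces to a rectangle-overlap computation. Real deployments would need general polygon intersection; the rectangle form is purely an assumption to keep the sketch short.

```python
def rect_overlap_area(r1, r2):
    """Overlap area of two axis-aligned rectangles given as (x0, y0, x1, y1)."""
    w = min(r1[2], r2[2]) - max(r1[0], r2[0])
    h = min(r1[3], r2[3]) - max(r1[1], r2[1])
    return max(w, 0.0) * max(h, 0.0)

def occurrence_probability(predicted_region, scene_region):
    """Equation (1) specialised to rectangles: the share of the alternative
    scene covered by the region enclosed by the two predicted routes."""
    scene_area = ((scene_region[2] - scene_region[0])
                  * (scene_region[3] - scene_region[1]))
    return rect_overlap_area(predicted_region, scene_region) / scene_area
```

For example, a predicted region covering a quarter of a scene gives a probability of 0.25 for that scene, and scenes the region does not touch get probability 0.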
According to the embodiment of the invention, the alternative scenes with the highest occurrence probabilities can be selected, the videos to be processed of those scenes being the alternative videos, with the number of selections matching the number set by the number setting instruction. The unselected videos to be processed may be displayed as thumbnail videos in the thumbnail display windows. After the target object moves, it is most likely to appear in one of the alternative videos; the alternative video in which it appears may then be determined as the new target video, and new alternative videos may be determined again in the manner above.
In this way, the first predicted route can be determined from the geographic position information at the historical moments, and the second predicted route from the orientation information and geographic position information at the current moment. The region enclosed by the two routes can then be treated as the set of routes the target object may take in the future, from which the positions where the target object may appear and its occurrence probability in each alternative scene can be determined. Because the occurrence probability is based on both the current moment and the historical moments, its accuracy is improved.
According to one embodiment of the present invention, the key scene display mode may select the videos to be displayed according to importance. For example, if the event occurring in a certain scene has the highest importance, the video to be processed of that scene may be displayed as the target video in the main display window, a certain number of videos to be processed of high importance may be displayed as alternative videos in the alternative display windows, and the other videos to be processed may be displayed in the thumbnail display windows. The main display window and the alternative display windows may be sized by the user or by the processor based on importance; the invention is not limited in this regard.
According to one embodiment of the present invention, determining the target video, the alternative videos and the thumbnail videos among the videos to be processed of the plurality of scenes according to the display mode and the user input information includes: selecting video frames at a plurality of historical moments from the videos to be processed of the plurality of scenes respectively; for a video frame at a historical moment selected from a video to be processed and separated from the current moment by a preset time period, and the video frame at the current moment of the same video, determining the movement direction angles and the movement rates of a plurality of target objects in the video frame at the current moment; determining the human body key points of the target objects; determining the importance score of each video to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects; and determining the video to be processed with the highest importance score as the target video, determining the videos to be processed ranked 2nd to (n+1)-th by importance score as the alternative videos, and determining the videos to be processed ranked (n+2)-th and later as the thumbnail videos, wherein n is the number of alternative display windows corresponding to the number setting instruction.
According to one embodiment of the present invention, in some scenes the videos to be processed in which the actions of a plurality of target objects are consistent have high importance. For example, in a stadium scene, the action consistency of the athletes is high and their speeds are similar: in sports such as swimming or track and field, the athletes' actions, postures and speeds resemble one another, while the postures and actions of the spectators vary. In such a scene, the video to be processed showing the athletes has high importance and the video showing the spectators has low importance. The same rule holds in other scenarios. In a street scene, if pedestrians walk in different directions at different speeds, it is usually an ordinary street scene with no special event occurring; but if many pedestrians walk in similar directions (many people heading the same way), or their actions are highly consistent (for example, many people running at the same time), a special event may be occurring, or the place they are heading toward may be important (an office building, a train station, and so on). Therefore, when the actions of many people are highly consistent, and/or their movement rates are highly consistent, the importance of the video to be processed is high.
According to one embodiment of the invention, the action consistency, movement rate consistency and movement direction consistency of the plurality of target objects in each video to be processed can be determined from parameters such as the movement rates, human body key points and movement direction angles of the plurality of target objects.
According to one embodiment of the present invention, determining the importance scores of the respective videos to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects includes: determining an action consistency score $S_j$ for the $j$-th video to be processed according to equation (2),

$$S_j = \min_{1 \le k \le N} \cos\!\left( \left(\frac{x_{j,k,t}}{y_{j,k,t}}\right)_{t=1}^{m},\ \left(\frac{\bar{x}_{j,t}}{\bar{y}_{j,t}}\right)_{t=1}^{m} \right) \tag{2}$$

where $x_{j,k,t}$ and $y_{j,k,t}$ are the abscissa and ordinate of the vector from the centroid of the selection frame of the $k$-th target object in the $j$-th video to be processed to its $t$-th human body key point, $\bar{x}_{j,t}$ and $\bar{y}_{j,t}$ are the averages over the plurality of target objects of those abscissas and ordinates, $m$ is the number of human body key points, $N$ is the number of target objects, $k \le N$, $t \le m$, $k$, $t$, $N$ and $m$ are positive integers, $\cos$ denotes cosine similarity and $\min$ is the minimum function; determining a movement direction consistency score $D_j$ for the $j$-th video to be processed according to equation (3),

$$D_j = 1 - \frac{D(\theta_{j,k})}{E(\theta_{j,k})^2} \tag{3}$$

where $\theta_{j,k}$ is the movement direction angle of the $k$-th target object in the $j$-th video to be processed, $D(\theta_{j,k})$ is the variance of the movement direction angles of the plurality of target objects in the $j$-th video to be processed, and $E(\theta_{j,k})$ is their expected value; determining a movement rate consistency score $V_j$ for the $j$-th video to be processed according to equation (4),

$$V_j = 1 - \frac{D(v_{j,k})}{E(v_{j,k})^2} \tag{4}$$

where $v_{j,k}$ is the movement rate of the $k$-th target object in the $j$-th video to be processed, $D(v_{j,k})$ is the variance of the movement rates of the plurality of target objects in the $j$-th video to be processed, and $E(v_{j,k})$ is their expected value; and carrying out a weighted summation of the action consistency score, the movement direction consistency score and the movement rate consistency score of the $j$-th video to be processed to obtain the importance score of the $j$-th video to be processed.
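The three scores and their weighted sum can be sketched as below. Note two assumptions: the ratio in equations (3) and (4) is taken here to be the variance divided by the squared mean (the exact form in the original formula images is not recoverable), and the summation weights are placeholders.

```python
import numpy as np

def action_consistency(ratio_vectors):
    """Equation (2): minimum cosine similarity between each target object's
    keypoint abscissa-ordinate ratio vector and the mean ratio vector.
    ratio_vectors: shape (num_objects, num_keypoints)."""
    v = np.asarray(ratio_vectors, dtype=float)
    mean = v.mean(axis=0)
    sims = [np.dot(u, mean) / (np.linalg.norm(u) * np.linalg.norm(mean))
            for u in v]
    return min(sims)

def dispersion_consistency(values):
    """Equations (3)/(4) as reconstructed above: one minus the ratio of the
    variance to the squared mean of the direction angles or rates."""
    values = np.asarray(values, dtype=float)
    return 1.0 - values.var() / values.mean() ** 2

def importance_score(ratio_vectors, angles, rates, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three consistency scores for one video."""
    scores = (action_consistency(ratio_vectors),
              dispersion_consistency(angles),
              dispersion_consistency(rates))
    return sum(w * s for w, s in zip(weights, scores))
```

With identical actions, directions and rates across target objects, each score reaches its maximum of 1, so the unweighted importance score is 3.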
According to one embodiment of the present invention, in equation (2), $(x_{j,k,t}, y_{j,k,t})$ represents the vector of the $t$-th human body key point of the $k$-th target object in the $j$-th video to be processed. Since the target objects differ in distance from the camera, viewing angle and height, it is difficult to determine action consistency by comparing these vectors directly. Instead, the similarity of the actions can be determined from the ratios of abscissa to ordinate: if another target object performs exactly the same action as the $k$-th target object and differs only in distance, height and the like, the vector formed by the abscissa-ordinate ratios of its key-point vectors coincides with that of the $k$-th target object. Comparing the vectors formed by the abscissa-ordinate ratios of the key-point vectors therefore removes differences of distance, height and the like, characterizes the motion amplitude of each target object, and allows the comparison to reflect action consistency alone.
According to one embodiment of the invention, the vector formed by the ratios of the average abscissas to the average ordinates of the key-point vectors, $(\bar{x}_{j,t}/\bar{y}_{j,t})_{t=1}^{m}$, represents the average motion amplitude of the plurality of target objects. Among the ratio vectors of the individual target objects, the one with the lowest cosine similarity to this average vector is found, and that lowest similarity is taken as the action consistency score of the $j$-th video to be processed. Cosine similarity here measures how close each target object's motion amplitude is to the average; the higher this minimum is, the closer even the least consistent target object is to the average, and the higher the overall similarity of the plurality of target objects' motion amplitudes. A higher score therefore represents higher action consistency among the plurality of target objects.
According to one embodiment of the present invention, in equation (3), $D(\theta_{j,k})$ is the variance of the movement direction angles of the plurality of target objects: the smaller the variance, the higher the consistency of the movement directions. The ratio $D(\theta_{j,k})/E(\theta_{j,k})^2$ is therefore subtracted from 1 to obtain the movement direction consistency score, so that the higher the score, the more consistent the movement directions of the plurality of target objects.
According to one embodiment of the present invention, in equation (4), $D(v_{j,k})$ is the variance of the movement rates of the plurality of target objects: the smaller the variance, the higher the consistency of the movement rates. The ratio $D(v_{j,k})/E(v_{j,k})^2$ is therefore subtracted from 1 to obtain the movement rate consistency score, so that the higher the score, the more consistent the movement rates of the plurality of target objects.
According to one embodiment of the invention, the action consistency score, the movement direction consistency score and the movement rate consistency score can be summed with weights to obtain the importance score of a video to be processed. The importance score represents the consistency of the movement directions, movement rates and actions of the plurality of target objects in the video, and hence the importance of the video. The video to be processed with the highest importance score (i.e. the most important) may be determined as the target video, the videos ranked 2nd to (n+1)-th may be determined as the alternative videos (n being the number of alternative display windows corresponding to the number setting instruction), and the videos ranked (n+2)-th and later may be determined as the thumbnail videos.
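The ranking-based partition described above can be sketched as:

```python
def partition_videos(importance_scores, n_alternative):
    """Rank videos by importance score and partition them: return the index
    of the target video (rank 1), the indices of the alternative videos
    (ranks 2..n+1), and the indices of the thumbnail videos (rank n+2 on)."""
    order = sorted(range(len(importance_scores)),
                   key=lambda i: importance_scores[i], reverse=True)
    return order[0], order[1:1 + n_alternative], order[1 + n_alternative:]
```

For instance, with scores [0.2, 0.9, 0.5, 0.1] and one alternative window per the number setting instruction plus one, the highest-scoring video becomes the target and the lowest-scoring videos fall to the thumbnail windows.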
According to one embodiment of the present invention, the target video may be displayed in the main display window, the alternative videos in the alternative display windows, and the thumbnail videos in the thumbnail display windows. The sizes of the main display window and the alternative display windows may be set automatically by the processor, for example in proportion to the importance scores of the target video and the alternative videos, or they may be set by the user; the invention is not limited in this regard.
In this way, the importance score of a video to be processed is determined jointly from the action consistency, movement direction consistency and movement rate consistency of the plurality of target objects, improving the accuracy and objectivity of the importance score; and when determining the action consistency score, the vectors formed by the abscissa-ordinate ratios of the human body key-point vectors remove the influence of the viewing angle, distance and height of the target objects, improving the accuracy of the action consistency score.
According to the multi-scene information on-screen display system provided by the embodiment of the invention, the videos to be processed can be screened for display based on the selected display mode: the target video, the alternative videos and the thumbnail videos are determined, as are the position and size of the main display window and of the alternative display windows. The target video showing the target object is displayed in the main display window, and the videos of the scenes where the target object may appear in the future are displayed in the alternative display windows, so that the target object and its possible future scenes are presented intuitively without spending a large amount of manpower on searching, improving the display effect. When determining the occurrence probability of the target object in the alternative scenes, a first predicted route is determined from the geographic position information at the historical moments and a second predicted route from the orientation information and geographic position information at the current moment; the region enclosed by the two routes is treated as the set of routes the target object may take in the future, from which the positions where the target object may appear and its occurrence probability in each alternative scene are determined. Because the occurrence probability is based on both the current moment and the historical moments, its accuracy is improved.
When determining the importance scores of the videos to be processed, the action consistency, movement direction consistency and movement rate consistency of the plurality of target objects are used jointly, improving the accuracy and objectivity of the importance scores; and when determining the action consistency score, the vectors formed by the abscissa-ordinate ratios of the human body key-point vectors remove the influence of the viewing angle, distance and height of the target objects, improving the accuracy of the action consistency score.
Fig. 2 exemplarily shows a flowchart of a multi-scene information on-screen presentation method according to an embodiment of the present invention. The method includes the following steps:
step S101, receiving a display mode selected by a user, wherein the display mode includes a manual selection display mode, a target tracking display mode and a key scene display mode; the manual selection display mode is used for receiving a user's selection instruction for the videos to be processed and controlling a display to display the selected videos to be processed according to the selection instruction; the target tracking display mode is used for determining the scene where the target object appears according to the tracked action route of the target object and controlling the display to display the video to be processed corresponding to that scene; and the key scene display mode is used for determining the importance of each scene and controlling the display to display the videos to be processed according to the importance;
step S102, receiving user input information matched with the display mode;
Step S103, determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
Step S104, determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
step S105, controlling the display to display the target video in the main display window, displaying the candidate video in the candidate display window, and displaying at least a part of the thumbnail video in the thumbnail display window.
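The ordering of steps S103 to S105 can be sketched as a skeleton, with the selection and layout logic of S103/S104 left as placeholder callables; all names here are hypothetical:

```python
class Display:
    """Stand-in for the physical display: records what it is asked to show."""
    def show(self, target, alternatives, thumbnails, windows):
        self.last_shown = (target, alternatives, thumbnails, windows)

def present(display_mode, user_input, videos, select_fn, layout_fn, display):
    # S103: determine target, alternative and thumbnail videos
    target, alternatives, thumbnails = select_fn(display_mode, user_input, videos)
    # S104: determine positions and sizes of the display windows
    windows = layout_fn(display_mode, user_input, videos)
    # S105: control the display accordingly
    display.show(target, alternatives, thumbnails, windows)
```

Steps S101 and S102 (receiving the display mode and the matching user input) would populate `display_mode` and `user_input` before this routine runs.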
According to an embodiment of the present invention, there is provided a multi-scene information on-screen display apparatus including: a processor; a memory for storing processor-executable instructions; the processor is configured to call the instructions stored in the memory to execute the multi-scene information on-screen display method.
According to one embodiment of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, which when executed by a processor implement the multi-scene information on-screen presentation method.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples and embodiments of the invention may be modified or practiced without departing from the principles described.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (6)

1. A multi-scene information on-screen display system, comprising: a plurality of cameras, a processor, and a display;
the plurality of cameras are respectively used for shooting a plurality of scenes to obtain videos to be processed of each scene;
The display is used for displaying the video to be processed selected by the processor according to the display mode set by the processor;
The processor is configured to:
Receiving a display mode selected by a user, wherein the display mode comprises a manual selection display mode, a target tracking display mode and an important scene display mode, the manual selection display mode is used for receiving a selection instruction of the user for the video to be processed and controlling the display to display the selected video to be processed according to the selection instruction, the target tracking display mode is used for determining a scene where the target object appears according to a tracked action route of the target object and controlling the display to display the video to be processed corresponding to the scene, and the important scene display mode is used for determining the importance of each scene and controlling the display to display the video to be processed according to the importance;
receiving user input information matched with the display mode;
Determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
controlling the display to display the target video in a main display window, displaying the alternative video in an alternative display window, and displaying at least a part of the thumbnail video in a thumbnail display window;
In the target tracking display mode, the user input information matched with the target tracking display mode comprises reference information of a target object, a position setting instruction and a size setting instruction of a main display window, and a position setting instruction, a size setting instruction and a quantity setting instruction of an alternative display window;
According to the display mode and the user input information, determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes, wherein the target videos, the alternative videos and the thumbnail videos comprise:
Searching the reference information of the target object in the video frames of the current moments of the videos to be processed of a plurality of scenes, determining the video to be processed, which belongs to the video frames comprising the target object, as the target video, and determining the scene corresponding to the target video as the target scene;
According to the design information of the scenes, determining alternative scenes with a communication relation with the target scene in the multiple scenes;
Determining the image position information of the target object in a plurality of video frames of the historical moment of the target video, and determining the image position information of the target object in the video frame of the current moment of the target video;
Determining geographic position information of the target object at a plurality of moments according to the image position information;
Determining the orientation information of the target object in a video frame of the current moment of the target video;
Determining the occurrence probability of the target object in each alternative scene in the future according to the geographic position information, the orientation information of the target object and the design information of the alternative scene;
selecting a number of alternative videos corresponding to the number setting instruction from the videos to be processed corresponding to the alternative scenes according to the occurrence probability;
determining the videos to be processed except the target video and the alternative video as the thumbnail video;
In the key scene display mode, user input information matched with the key scene display mode comprises a number setting instruction of alternative display windows;
According to the display mode and the user input information, determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes, wherein the target videos, the alternative videos and the thumbnail videos comprise:
respectively selecting video frames at a plurality of historical moments from videos to be processed of a plurality of scenes;
For a video frame of a history time which is selected from the same video to be processed and is separated from the current time by a preset time period and a video frame of the current time, determining the movement direction angles of a plurality of target objects in the video frame of the current time and the movement rates of the plurality of target objects;
determining human body key points of the target objects;
Determining importance scores of all videos to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects;
and determining the video to be processed with the highest importance score as the target video, determining the videos to be processed with importance score ranks 2 to n+1 as the alternative videos, and determining the videos to be processed with importance score ranks n+2 and later as the thumbnail videos, wherein n is the number of alternative display windows corresponding to the number setting instruction.
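The ranking step above — the top-scoring video to the main window, ranks 2 to n+1 to the alternative windows, the rest to thumbnails — can be sketched as follows (Python; the score dictionary and video identifiers are illustrative assumptions, not part of the claim):

```python
def split_by_importance(scores, n):
    """Rank videos by importance score (descending) and split them into
    the target video, n alternative videos, and thumbnail videos.
    `scores` maps a video id to its importance score (assumed format)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    target = ranked[0]            # highest importance score
    alternatives = ranked[1:1 + n]  # ranks 2 .. n+1
    thumbnails = ranked[1 + n:]     # ranks n+2 and later
    return target, alternatives, thumbnails
```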
2. The multi-scene information on-screen presentation system of claim 1, wherein the user input information matched with the manual selection display mode comprises a selection instruction for the video to be processed, a position setting instruction and a size setting instruction for the main display window, and a position setting instruction, a size setting instruction and a number setting instruction for the alternative display window.
3. The multi-scene information on-screen presentation system according to claim 1, wherein determining the occurrence probability of the target object in the future in each of the alternative scenes according to the geographic position information, the orientation information of the target object and the design information of the alternative scenes comprises:
Fitting the geographical position information with the historical time and the current time to obtain a first predicted route;
Obtaining a second predicted route according to the orientation information of the target object and the geographic position information of the target object at the current moment, wherein the second predicted route is a straight line route;
According to the formula

P i = ∬ Ω i B i (x, y) ds,

determining the occurrence probability P i of the target object in the i-th alternative scene in the future, wherein L 1 is the first predicted route, L 2 is the second predicted route, Ω i is the area enclosed by the first predicted route L 1 and the second predicted route L 2, B i (x, y) is the surface equation of the i-th alternative scene, and ds is a surface infinitesimal, the integration being taken over the area enclosed by the first predicted route and the second predicted route.
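The two predicted routes of claim 3 — a route fitted through the historical and current geographic positions, and a straight-line route along the current orientation — might be computed as in this sketch (the quadratic fit degree, prediction horizon, and sampled-point representation of a route are all assumptions):

```python
import numpy as np

def predicted_routes(times, xs, ys, heading_rad, horizon=10.0, steps=50):
    """Sketch of the two predicted routes as sampled (x, y) points.
    First route: least-squares quadratic fit of past geographic
    positions over time, extrapolated forward. Second route: straight
    line from the current position along the current orientation."""
    t_future = np.linspace(times[-1], times[-1] + horizon, steps)
    fx = np.polyfit(times, xs, 2)          # x(t) quadratic fit
    fy = np.polyfit(times, ys, 2)          # y(t) quadratic fit
    route1 = np.stack([np.polyval(fx, t_future),
                       np.polyval(fy, t_future)], axis=1)
    s = np.linspace(0.0, horizon, steps)   # distance along the ray
    route2 = np.stack([xs[-1] + s * np.cos(heading_rad),
                       ys[-1] + s * np.sin(heading_rad)], axis=1)
    return route1, route2
```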
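The surface integral of B i over the region enclosed by the two predicted routes can be approximated numerically; this sketch uses simple grid quadrature with a ray-casting point-in-polygon test (the sampled-route representation, grid resolution, and closing of the region boundary are assumptions):

```python
import numpy as np

def occurrence_probability(route1, route2, b_i, grid=200):
    """Approximate P_i = double integral of B_i(x, y) ds over the
    region enclosed by the two predicted routes (sampled (x, y)
    arrays). b_i is the scene's surface function."""
    poly = np.vstack([route1, route2[::-1]])   # closed region boundary
    xmin, ymin = poly.min(axis=0)
    xmax, ymax = poly.max(axis=0)
    xs = np.linspace(xmin, xmax, grid)
    ys = np.linspace(ymin, ymax, grid)
    dx = (xmax - xmin) / grid
    dy = (ymax - ymin) / grid
    total = 0.0
    for x in xs:
        for y in ys:
            if _inside(poly, x, y):
                total += b_i(x, y) * dx * dy   # ds approximated by dx*dy
    return total

def _inside(poly, x, y):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside
```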
4. The multi-scene information on-screen presentation system of claim 1, wherein determining the importance score of each video to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects comprises:
According to the formula
Determining a motion consistency score S j of the j-th video to be processed, wherein x j,k,t is the abscissa and y j,k,t is the ordinate of the vector of the t-th human body key point of the k-th target object in the j-th video to be processed relative to the centroid of the selection frame of the k-th target object, x̄ j,t and ȳ j,t are the averages, over the plurality of target objects in the j-th video to be processed, of the abscissa and the ordinate of the vector of the t-th human body key point relative to the centroid of the respective selection frame, m is the number of human body key points, N is the number of target objects, k is less than or equal to N, t is less than or equal to m, k, t, N and m are positive integers, and min is the minimum function;
According to the formula
Determining a motion direction consistency score D j of the j-th video to be processed, wherein θ j,k is the movement direction angle of the k-th target object in the j-th video to be processed, D(θ j,k) is the variance of the movement direction angles of the plurality of target objects in the j-th video to be processed, and E(θ j,k) is the expected value of the movement direction angles of the plurality of target objects in the j-th video to be processed;
According to the formula
Determining a motion rate consistency score of the j-th video to be processed, wherein v j,k is the movement rate of the k-th target object in the j-th video to be processed, D(v j,k) is the variance of the movement rates of the plurality of target objects in the j-th video to be processed, and E(v j,k) is the expected value of the movement rates of the plurality of target objects in the j-th video to be processed;
and carrying out weighted summation on the motion consistency score of the jth video to be processed, the motion direction consistency score of the jth video to be processed and the motion rate consistency score of the jth video to be processed to obtain the importance score of the jth video to be processed.
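A minimal sketch of the weighted summation in claim 4, assuming equal weights; each consistency term is illustrated here as 1/(1 + D/E²), i.e. higher when the variance is small relative to the squared mean — the patent's exact score formulas are given as images and are not reproduced:

```python
import numpy as np

def consistency(values):
    """Illustrative consistency score from a sample: approaches 1 as
    the variance D shrinks relative to the squared mean E^2 (assumed
    functional form, not the patent's)."""
    e, d = np.mean(values), np.var(values)
    return 1.0 / (1.0 + d / (e * e + 1e-12))

def importance_score(motion_score, direction_angles, rates, w=(1/3, 1/3, 1/3)):
    """Weighted sum of the motion, direction-consistency and
    rate-consistency scores of one video (equal weights assumed)."""
    d_score = consistency(direction_angles)  # movement direction angles
    v_score = consistency(rates)             # movement rates
    return w[0] * motion_score + w[1] * d_score + w[2] * v_score
```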
5. A multi-scene information on-screen display method, characterized by comprising the following steps:
Receiving a display mode selected by a user, wherein the display mode comprises a manual selection display mode, a target tracking display mode and an important scene display mode, the manual selection display mode is used for receiving a selection instruction of the user for a video to be processed and controlling a display to display the selected video to be processed according to the selection instruction, the target tracking display mode is used for determining a scene where a target object appears according to a tracked action route of the target object and controlling the display to display the video to be processed corresponding to the scene, and the important scene display mode is used for determining importance of each scene and controlling the display to display the video to be processed according to the importance;
receiving user input information matched with the display mode;
Determining target videos, alternative videos and thumbnail videos in videos to be processed of a plurality of scenes according to the display mode and the user input information;
determining the position and the size of a main display window and the position and the size of an alternative display window according to at least one of the display mode, the user input information and the videos to be processed of a plurality of scenes;
controlling the display to display the target video in a main display window, displaying the alternative video in an alternative display window, and displaying at least a part of the thumbnail video in a thumbnail display window;
In the target tracking display mode, the user input information matched with the target tracking display mode comprises reference information of a target object, a position setting instruction and a size setting instruction of a main display window, and a position setting instruction, a size setting instruction and a quantity setting instruction of an alternative display window;
According to the display mode and the user input information, determining the target video, the alternative videos and the thumbnail videos in the videos to be processed of the plurality of scenes comprises:
Searching for the reference information of the target object in the video frames at the current moment of the videos to be processed of the plurality of scenes, determining the video to be processed to which a video frame containing the target object belongs as the target video, and determining the scene corresponding to the target video as the target scene;
According to the design information of the scenes, determining alternative scenes with a communication relation with the target scene in the multiple scenes;
Determining image position information of the target object in a plurality of video frames at historical moments of the target video, and determining image position information of the target object in the video frame at the current moment of the target video;
Determining geographic position information of the target object at a plurality of moments according to the image position information;
Determining the orientation information of the target object in a video frame of the current moment of the target video;
Determining the occurrence probability of the target object in each alternative scene in the future according to the geographic position information, the orientation information of the target object and the design information of the alternative scene;
selecting, from the videos to be processed corresponding to the alternative scenes and according to the occurrence probability, a number of alternative videos equal to the number specified by the number setting instruction;
determining the videos to be processed except the target video and the alternative video as the thumbnail video;
In the important scene display mode, the user input information matched with the important scene display mode comprises a number setting instruction for the alternative display windows;
According to the display mode and the user input information, determining the target video, the alternative videos and the thumbnail videos in the videos to be processed of the plurality of scenes comprises:
respectively selecting video frames at a plurality of historical moments from videos to be processed of a plurality of scenes;
For the video frame at a historical moment separated from the current moment by a preset time period and the video frame at the current moment, both selected from the same video to be processed, determining the movement direction angles of a plurality of target objects and the movement rates of the plurality of target objects in the video frame at the current moment;
determining human body key points of the target objects;
Determining importance scores of all videos to be processed according to the human body key points, the movement direction angles of the plurality of target objects and the movement rates of the plurality of target objects;
and determining the video to be processed with the highest importance score as the target video, determining the videos to be processed with importance score ranks 2 to n+1 as the alternative videos, and determining the videos to be processed with importance score ranks n+2 and later as the thumbnail videos, wherein n is the number of alternative display windows corresponding to the number setting instruction.
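The movement direction angles and movement rates used in the importance scoring can be derived from the selection-frame centroids of each target object in two frames a preset period apart, for example (the centroid-list input format is an assumption):

```python
import math

def motion_angle_and_rate(prev_centroids, curr_centroids, dt):
    """Per-target movement direction angle (radians) and movement rate
    from selection-frame centroids in two frames dt seconds apart.
    Inputs are parallel lists of (x, y) centroids per target object."""
    angles, rates = [], []
    for (x0, y0), (x1, y1) in zip(prev_centroids, curr_centroids):
        dx, dy = x1 - x0, y1 - y0
        angles.append(math.atan2(dy, dx))      # movement direction angle
        rates.append(math.hypot(dx, dy) / dt)  # displacement / time
    return angles, rates
```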
6. A computer readable storage medium, having stored thereon computer program instructions which, when executed by a processor, implement the method of claim 5.
CN202311585481.2A 2023-11-24 2023-11-24 Multi-scene information on-screen display system and method Active CN117459687B (en)


Publications (2)

Publication Number Publication Date
CN117459687A CN117459687A (en) 2024-01-26
CN117459687B true CN117459687B (en) 2024-04-30

Family

ID=89579997


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102891992A (en) * 2012-10-30 2013-01-23 浙江宇视科技有限公司 Moving target tracking method and device
CN111010547A (en) * 2019-12-23 2020-04-14 浙江大华技术股份有限公司 Target object tracking method and device, storage medium and electronic device
CN111768433A (en) * 2020-06-30 2020-10-13 杭州海康威视数字技术股份有限公司 Method and device for realizing tracking of moving target and electronic equipment
CN114567728A (en) * 2022-03-10 2022-05-31 上海市政工程设计研究总院(集团)有限公司 Video tracking method, system, electronic device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023065024A (en) * 2021-10-27 2023-05-12 株式会社東芝 Retrieval processing device, retrieval processing method and program




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant