CN107122746B

CN107122746B - Video analysis apparatus, method, and computer-readable storage medium

Info

Publication number: CN107122746B
Application number: CN201710294505.7A
Authority: CN
Inventors: 谷源涛; 侯奇; 车向前
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-04-28
Filing date: 2017-04-28
Publication date: 2020-01-21
Anticipated expiration: 2037-04-28
Also published as: CN107122746A

Abstract

The present disclosure relates to a video analysis apparatus, a video analysis method, and a computer-readable storage medium. The video analysis apparatus comprises: an ideal position determination unit configured to determine ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects; a geometric transformation establishing unit configured to establish a geometric transformation from initial positions of the plurality of objects to the ideal positions using a coordinate transformation method; and a precise position obtaining unit configured to obtain precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation. The video analysis apparatus, method, and computer-readable storage medium according to aspects of the present disclosure can improve the accuracy of positioning objects in a video by using a predetermined arrangement pattern of a plurality of objects in the video.

Description

Video analysis apparatus, method, and computer-readable storage medium

Technical Field

The present disclosure relates to the field of video analytics, and more particularly to locating multiple objects in a video through video analytics.

Background

Video analysis, and in particular the positioning of objects appearing in video, is widely applied across disciplines. For example, in biology, animal behavior is an important research direction, and algorithms for automatically analyzing animal behavior videos are increasingly applied in this field. Automatically extracting and segmenting the objects in a video is an important step of such automatic analysis methods. However, conventional extraction algorithms generally suffer from problems such as poor positioning accuracy and missed detection of object positions, which seriously hinders automation of the process.

Disclosure of Invention

The present inventors have appreciated that existing video analysis algorithms do not take full advantage of the inherent information between objects in the video. In view of the above, the present disclosure proposes a video analysis apparatus, method and computer-readable storage medium.

According to an aspect of the present disclosure, there is provided a video analysis apparatus, comprising: an ideal position determination unit configured to determine ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects; a geometric transformation establishing unit configured to establish a geometric transformation from initial positions of the plurality of objects to the ideal positions using a coordinate transformation method; and a precise position obtaining unit configured to obtain precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation. The video analysis apparatus, method, and computer-readable storage medium according to aspects of the present disclosure can improve the accuracy of positioning objects in a video by using a predetermined arrangement pattern of a plurality of objects in the video.

In one possible implementation, the predetermined arrangement pattern includes a substantially centrosymmetric arrangement.

In one possible implementation, the ideal position determining unit is configured to specify, for the predetermined arrangement pattern, a direction, a scale, and a coordinate origin of arrangement thereof to determine ideal positions of the plurality of objects in an ideal case.

In one possible implementation, the geometric transformation comprises an affine transformation or a perspective transformation; the precise position obtaining unit is configured to solve an inverse transformation of the mapping matrix of the affine transformation or the perspective transformation, apply the inverse transformation to ideal positions of the plurality of objects, to obtain precise positions of the plurality of objects.

In one possible implementation, the plurality of objects are a plurality of insect platforms at which insects can move, and the predetermined arrangement pattern is a regular polygonal arrangement.

In one possible implementation, the video analysis device further includes: an initial position obtaining unit configured to preliminarily obtain positions of the plurality of objects as the initial positions using shape information of the plurality of objects, wherein the preliminarily obtaining the positions of the plurality of objects includes: extracting a plurality of frames from the video, detecting positions of the shapes of the plurality of objects in the plurality of frames, clustering the positions into the number of the objects, and obtaining the initial position.

According to another aspect of the present disclosure, there is provided a video analysis method, including: determining ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects; establishing a geometric transformation from initial positions of the plurality of objects to the ideal positions using a coordinate transformation method; and obtaining precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation.

In one possible implementation, the video analysis method further includes: preliminarily obtaining positions of the plurality of objects as the initial positions using shape information of the plurality of objects; wherein preliminarily obtaining the locations of the plurality of objects comprises: extracting a plurality of frames from the video, detecting positions of the shapes of the plurality of objects in the plurality of frames, clustering the positions into the number of the objects, and obtaining the initial position.

In one possible implementation, determining the ideal positions of the plurality of objects in the video using the predetermined arrangement pattern of the plurality of objects comprises: specifying, for the predetermined arrangement pattern, the direction, scale, and coordinate origin of the arrangement, so as to determine the ideal positions of the plurality of objects in an ideal case.

In one possible implementation, the geometric transformation comprises an affine transformation or a perspective transformation; and obtaining the precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation comprises: solving an inverse transformation of the mapping matrix of the affine transformation or perspective transformation, and applying the inverse transformation to the ideal positions of the plurality of objects to obtain the precise positions of the plurality of objects.

According to another aspect of the present disclosure, there is provided a video analysis apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.

According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method.

The video analysis apparatus, method, and computer-readable storage medium according to aspects of the present disclosure can improve the accuracy of positioning objects in a video by using a predetermined arrangement pattern of a plurality of objects in the video.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a schematic device block diagram of a video analysis apparatus according to an exemplary embodiment.

Fig. 2 shows a schematic detailed block diagram of a video analysis device according to a specific implementation example.

Fig. 3 shows a schematic flow diagram of a video analysis method according to an exemplary embodiment.

Fig. 4 shows a schematic detailed flow diagram of a video analysis method according to a specific implementation example.

Fig. 5 is a schematic diagram showing a plurality of objects A to F arranged in a regular hexagon according to a specific implementation example.

Fig. 6 is a block diagram of a video analysis device 1900 according to an example embodiment.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Although the present disclosure is described using, as an example, insect platforms (on which insects can move) as the objects to be positioned in a video, it will be readily understood by those skilled in the art that the present disclosure is equally applicable to positioning various types of objects in videos from various different fields.

Fig. 1 shows a schematic device block diagram of a video analysis apparatus 100 according to an exemplary embodiment. The video analysis apparatus 100 comprises an ideal position determination unit 102, a geometric transformation establishing unit 104, and a precise position obtaining unit 106. The ideal position determination unit 102 is configured to determine the ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects. Here, the predetermined arrangement pattern may be any arrangement pattern whose arrangement rule is known to the video analysis apparatus 100. In one specific example, the predetermined arrangement pattern includes a substantially centrosymmetric arrangement. Those skilled in the art will appreciate that, in practice, it is difficult to arrange objects with geometrically strict central symmetry. "Substantially centrosymmetric" means that the objects are arranged in a centrosymmetric manner within tolerances that are known or generally accepted by those skilled in the art. In one specific example, the predetermined arrangement pattern may preferably include a checkered arrangement, a square arrangement, a rectangular arrangement (the square and rectangular arrangements may be collectively referred to as a rectangular arrangement), or a regular hexagonal arrangement. Of course, as will be appreciated by those skilled in the art, the predetermined arrangement pattern may also include other regular polygonal arrangements, such as a regular octagonal arrangement. The geometric transformation establishing unit 104 is configured to establish a geometric transformation from the initial positions of the plurality of objects to the ideal positions using a coordinate transformation method. The precise position obtaining unit 106 is configured to obtain the precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation.

Fig. 2 shows a schematic detailed block diagram of a video analysis apparatus 200 according to a specific implementation example. In addition to the ideal position determination unit 102, the geometric transformation establishing unit 104, and the precise position obtaining unit 106 of the video analysis apparatus 100, the video analysis apparatus 200 may further include an initial position obtaining unit 202. In one particular implementation, the video analysis apparatus 200 receives an input video. The initial position obtaining unit 202 extracts a plurality of frames from the input video, detects the positions of the shapes of the plurality of objects in those frames using shape information of the objects, and clusters the detected positions into as many clusters as there are objects to obtain the initial positions of the plurality of objects. In this way, the video analysis apparatus 200 can preliminarily obtain the initial positions of the plurality of objects.

Fig. 3 shows a schematic flow diagram of a video analysis method according to an exemplary embodiment. The video analysis method may include the following steps. In step S302, ideal positions of a plurality of objects in a video are determined using a predetermined arrangement pattern of the plurality of objects. In step S304, a geometric transformation from the initial positions of the plurality of objects to the ideal positions is established using a coordinate transformation method. In step S306, the precise positions of the plurality of objects are obtained according to the inverse transformation of the mapping matrix of the geometric transformation.

Fig. 4 shows a schematic detailed flow diagram of a video analysis method according to a specific implementation example. In one specific implementation example, the video analysis method of the present disclosure may be applied to analyze insect platforms as the objects in a video. It will be apparent to those skilled in the art that the objects that can be analyzed by the present disclosure are not limited to insect platforms. As shown in Fig. 4, the following steps may be performed.

● Preliminary extraction of the positions of the insect platforms

A number of frames are extracted from the input video, and the positions of the insect platform shapes in those frames are detected. For example, for circular insect platforms, a Hough circle transform algorithm may be used. For other shapes, known shape detection algorithms corresponding to those shapes may be employed. The detected positions of the insect platforms are then clustered, for example using an algorithm such as K-means or a Gaussian mixture model (GMM), to obtain N cluster centers, namely the initial positions P₁, P₂, …, P_N of the insect platforms, where N is the number of insect platforms. P₁, P₂, …, P_N approximately indicate the locations of the insect platforms.
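The clustering step can be illustrated with a minimal sketch. It assumes the shape detector (for example, a Hough circle transform) has already produced a list of center coordinates pooled over several frames. The function name `kmeans`, the farthest-point seeding, and the sample detections are illustrative assumptions, not part of the disclosure.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on 2-D points; returns k cluster centers."""
    # Farthest-point seeding (an assumption of this sketch): deterministic,
    # and it spreads the initial centers across the detections.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: attach every detection to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        updated = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

# Hypothetical circle centers detected in three frames for N = 2 platforms.
detections = [(10.0, 10.0), (50.0, 12.0), (11.0, 9.0),
              (49.0, 11.0), (9.0, 11.0), (51.0, 13.0)]
initial_positions = kmeans(detections, k=2)
```

Averaging detections over several frames in this way reduces the effect of per-frame detection noise, but the resulting centers still need not lie on a regular pattern, which is what the later steps correct.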

For the known arrangement of the insect platforms, the direction, scale, and coordinate origin of the arrangement are specified, and the ideal positions of the insect platforms are determined. For example, in one particular example, the insect platforms may be arranged in a checkered pattern. In this case, the ideal positions of the insect platforms can be taken to be Q₁, Q₂, …, Q_N, where Q_i = (m_i, n_i), and m_i and n_i are respectively the row number and the column number of the i-th platform in the checkered arrangement. Of course, other scales, directions, and coordinate origins may be specified; Q₁, Q₂, …, Q_N have different representations in different coordinate systems. For example, in another specific example, the insect platforms may be arranged in a regular hexagonal arrangement. Fig. 5 is a schematic diagram illustrating a plurality of objects A to F arranged in a regular hexagon according to a specific implementation example, with the center of symmetry of the regular hexagon shown as O. In this case, the coordinates of the insect platforms, as an example of the objects, may be expressed as

where m and n correspond to the row number and the column number of the insect platform in the OC direction and the OB direction, respectively. It will be appreciated that the scale represents a measure of the size of the predetermined arrangement pattern of the objects, for example the insect platforms, in the ideal coordinate system. In one example, where the predetermined arrangement pattern is a regular hexagonal or rectangular arrangement, the scale may be the side length of the regular hexagon or rectangle in the ideal coordinate system. In other examples, where the predetermined arrangement pattern is a circular arrangement, the scale may be the radius of the circle.
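As an illustration of how ideal positions might be generated once a direction, scale, and origin are chosen, the sketch below builds a checkered grid and a hexagonal lattice. The hexagonal parameterization is an assumption of this sketch (OC along the x-axis, OB rotated 60 degrees from it, positions given by m steps along OC plus n steps along OB). It is not the coordinate expression of the disclosure, and the function names are hypothetical.

```python
import math

def grid_ideal_positions(rows, cols, scale=1.0, origin=(0.0, 0.0)):
    """Ideal positions Q_i = (m_i, n_i) of a checkered arrangement, scaled and shifted."""
    ox, oy = origin
    return [(ox + scale * m, oy + scale * n) for m in range(rows) for n in range(cols)]

def hex_ideal_positions(indices, scale=1.0, origin=(0.0, 0.0)):
    """Assumed hexagonal lattice: Q = origin + scale * (m * u_OC + n * u_OB)."""
    # Assumption: OC lies along the x-axis and OB is rotated 60 degrees from it.
    u_oc = (1.0, 0.0)
    u_ob = (math.cos(math.pi / 3), math.sin(math.pi / 3))
    ox, oy = origin
    return [
        (ox + scale * (m * u_oc[0] + n * u_ob[0]),
         oy + scale * (m * u_oc[1] + n * u_ob[1]))
        for m, n in indices
    ]

grid = grid_ideal_positions(rows=2, cols=3)
# Six vertices of a regular hexagon centered on O, given as (m, n) lattice indices.
hexagon = hex_ideal_positions(
    [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)], scale=2.0
)
```

With `scale=2.0` every vertex lies at distance 2 from the center and adjacent vertices are also 2 apart, so the scale plays the role of the hexagon side length in the ideal coordinate system.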

A mapping relationship is established between the initial positions P₁, P₂, …, P_N of the insect platforms and the ideal positions Q₁, Q₂, …, Q_N. Suppose that P₁, P₂, …, P_N are in one-to-one correspondence with Q_k1, Q_k2, …, Q_kN; then there exists a mapping A that maps the points from the initial coordinates P₁, P₂, …, P_N to the ideal coordinates Q_k1, Q_k2, …, Q_kN. The mapping matrix A may represent a perspective transformation or an affine transformation. It should be noted that rotation, shearing, translation, scaling, flipping, and similar transformations can all be implemented by an affine transformation and are not described further here.

Setting mapping matrix

Obtain an over-determined equation
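The mapping matrix and the over-determined equation are not reproduced in this text. As a hedged sketch only, assuming A is an affine transformation written in homogeneous coordinates, with P_i = (x_i, y_i) and Q_ki = (u_i, v_i) (the symbols a_11 … a_23, x_i, y_i, u_i, v_i are notation introduced here for illustration), they could take the following form:

```latex
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix}
= A \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix},
\quad i = 1, \dots, N
\]
\[
\begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & y_1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_N & y_N & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & x_N & y_N & 1
\end{pmatrix}
\begin{pmatrix} a_{11} \\ a_{12} \\ a_{13} \\ a_{21} \\ a_{22} \\ a_{23} \end{pmatrix}
=
\begin{pmatrix} u_1 \\ v_1 \\ \vdots \\ u_N \\ v_N \end{pmatrix}
\]
```

Under this assumption there are 2N equations in 6 unknowns, so the system is over-determined whenever N > 3. A perspective transformation would instead leave the last row of A free, giving eight unknowns once the overall scale is fixed.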

In one specific example, the mapping matrix A may be solved, for example, by the least squares method, and the inverse transformation B of the mapping matrix A may then be obtained.
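A minimal sketch of the least-squares step follows, assuming the affine case and a known one-to-one correspondence between initial and ideal positions. The helper names (`solve_linear`, `fit_affine`, `invert_affine`) and the sample coordinates are hypothetical. The normal-equation approach shown is one common way to solve a least-squares problem, not necessarily the one used in the disclosure.

```python
def solve_linear(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_affine(src, dst):
    """Least-squares affine map A (3x3, last row 0 0 1) with dst ~= A @ src."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (R^T R) a = R^T t, solved once per output coordinate.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    a = []
    for axis in range(2):
        rhs = [sum(r[i] * q[axis] for r, q in zip(rows, dst)) for i in range(3)]
        a.append(solve_linear(gram, rhs))
    return a + [[0.0, 0.0, 1.0]]

def invert_affine(a):
    """Inverse B of an affine matrix A, so that B @ (A @ p) == p."""
    (a11, a12, a13), (a21, a22, a23) = a[0], a[1]
    det = a11 * a22 - a12 * a21
    b11, b12, b21, b22 = a22 / det, -a12 / det, -a21 / det, a11 / det
    return [[b11, b12, -(b11 * a13 + b12 * a23)],
            [b21, b22, -(b21 * a13 + b22 * a23)],
            [0.0, 0.0, 1.0]]

# Hypothetical noisy initial positions P_i of a 2 x 2 grid and their ideal positions Q_ki.
P = [(100.5, 50.0), (199.5, 50.0), (100.0, 150.5), (200.0, 149.5)]
Q = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
A = fit_affine(P, Q)
B = invert_affine(A)
```

The sketch fails with a division by zero if the initial positions are collinear, since an affine map cannot be determined from such points.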

The precise positions P′₁, P′₂, …, P′_N of the insect platforms are obtained by applying the inverse transformation B to the ideal positions Q_k1, Q_k2, …, Q_kN of the insect platforms, that is, P′_i = B(Q_ki), i = 1, 2, …, N.
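Applying the inverse transformation can be sketched as follows for a general 3x3 mapping matrix, which covers both the affine and the perspective case. The function names and the example matrix are illustrative assumptions. For a perspective transformation the result is divided by the homogeneous coordinate.

```python
def invert_3x3(a):
    """Inverse of a 3x3 matrix via the adjugate; raises ZeroDivisionError if singular."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    adj = [
        [a22 * a33 - a23 * a32, a13 * a32 - a12 * a33, a12 * a23 - a13 * a22],
        [a23 * a31 - a21 * a33, a11 * a33 - a13 * a31, a13 * a21 - a11 * a23],
        [a21 * a32 - a22 * a31, a12 * a31 - a11 * a32, a11 * a22 - a12 * a21],
    ]
    det = a11 * adj[0][0] + a12 * adj[1][0] + a13 * adj[2][0]
    return [[c / det for c in row] for row in adj]

def apply_homography(h, points):
    """Map 2-D points through a 3x3 matrix, dividing by the homogeneous coordinate."""
    out = []
    for x, y in points:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append((u / w, v / w))
    return out

# Hypothetical mapping A from image coordinates to ideal grid coordinates.
A = [[0.01, 0.0, -1.0], [0.0, 0.01, -0.5], [0.0, 0.0001, 1.0]]
B = invert_3x3(A)
Q = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
precise_positions = apply_homography(B, Q)
```

Because the precise positions are images of an exactly regular pattern under a single fitted transformation, they are consistent with one another in a way that independently detected centers are not.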

It should be noted that the videos to which the present disclosure applies and the objects analyzed in those videos are not limited to insect videos and insect platforms. The user can flexibly configure these according to the actual application scenario, as long as the plurality of objects in the video are arranged in a predetermined arrangement pattern.

Fig. 6 is a block diagram of a video analysis device 1900 according to an example embodiment. For example, the device 1900 may be provided as a server. Referring to Fig. 6, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules, each module corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the video analysis method described above.

The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 1932 that includes instructions, which are executable by the processing component 1922 of the apparatus 1900 to perform the above-described method.

The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

In this way, the video analysis apparatus, method, and computer-readable storage medium according to the above-described embodiments of the present disclosure can improve the accuracy of object location in a video by making full use of predetermined arrangement pattern information.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A video analysis apparatus, comprising:

an ideal position determination unit configured to determine ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects;

a geometric transformation establishing unit configured to establish a geometric transformation of initial positions of the plurality of objects into ideal positions using a coordinate transformation method; and

a precise position obtaining unit configured to apply an inverse transformation of a mapping matrix of the geometric transformation to the ideal positions of the plurality of objects to obtain precise positions of the plurality of objects.

2. The video analysis device of claim 1, wherein the predetermined arrangement pattern comprises a substantially centrosymmetric arrangement.

3. The video analysis apparatus according to claim 1, wherein the ideal position determination unit is configured to specify, for the predetermined arrangement pattern, a direction, a scale, and a coordinate origin of arrangement thereof to determine ideal positions of the plurality of objects in an ideal case.

4. The video analysis device of claim 1, wherein the geometric transformation comprises an affine transformation or a perspective transformation; and

the precise position obtaining unit is configured to solve an inverse transformation of the mapping matrix of the affine transformation or the perspective transformation, apply the inverse transformation to ideal positions of the plurality of objects, to obtain precise positions of the plurality of objects.

5. The video analysis apparatus according to claim 1, wherein the plurality of objects are a plurality of insect moving stages where insects can move, and the predetermined arrangement pattern is a regular hexagonal or rectangular arrangement.

6. The video analysis device of claim 1, further comprising:

an initial position obtaining unit configured to preliminarily obtain positions of the plurality of objects as the initial positions using shape information of the plurality of objects,

wherein preliminarily obtaining the locations of the plurality of objects comprises: extracting a plurality of frames from the video, detecting positions of the shapes of the plurality of objects in the plurality of frames, clustering the positions into the number of the objects, and obtaining the initial position.

7. A method of video analysis, comprising:

determining ideal positions of a plurality of objects in a video using a predetermined arrangement pattern of the plurality of objects;

establishing a geometric transformation from the initial positions of the plurality of objects to ideal positions by using a coordinate transformation method;

applying an inverse transformation of a mapping matrix of the geometric transformation to the ideal positions of the plurality of objects to obtain precise positions of the plurality of objects.

8. The video analytics method of claim 7, further comprising:

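preliminarily obtaining positions of the plurality of objects as the initial positions using shape information of the plurality of objects,

wherein preliminarily obtaining the positions of the plurality of objects comprises: extracting a plurality of frames from the video, detecting positions of the shapes of the plurality of objects in the plurality of frames, clustering the positions into the number of the objects, and obtaining the initial positions.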

9. The video analytics method of claim 7, wherein the predetermined arrangement pattern comprises a substantially centrosymmetric arrangement.

10. The video analysis method of claim 7, wherein determining ideal positions of a plurality of objects in the video using a predetermined arrangement pattern of the plurality of objects comprises: specifying, for the predetermined arrangement pattern, the direction, scale, and coordinate origin of the arrangement, so as to determine the ideal positions of the plurality of objects in an ideal case.

11. The video analysis method of claim 7, wherein the geometric transformation comprises an affine transformation or a perspective transformation; and

obtaining the precise positions of the plurality of objects according to an inverse transformation of the mapping matrix of the geometric transformation comprises: solving an inverse transformation of the mapping matrix of the affine transformation or perspective transformation, and applying the inverse transformation to the ideal positions of the plurality of objects to obtain the precise positions of the plurality of objects.

12. The video analysis method according to claim 7, wherein the plurality of objects are a plurality of insect moving stages where insects can move, and the predetermined arrangement pattern is a regular hexagonal or rectangular arrangement.

13. A video analysis apparatus, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method of any one of claims 7-12.

14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 7 to 12.