CN115294529A - Data enhancement method and system for distinguishing crowd activity properties - Google Patents

Data enhancement method and system for distinguishing crowd activity properties

Info

Publication number
CN115294529A
Authority
CN
China
Prior art keywords
enhancement
image
data
data set
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210968789.4A
Other languages
Chinese (zh)
Inventor
高志鹏
吴俊毅
赵建强
张辉极
杜新胜
Current Assignee
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202210968789.4A
Publication of CN115294529A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Abstract

The invention provides a data enhancement method and system for crowd activity property discrimination, which comprise the steps of preparing a crowd activity training data set and a pre-training model for discriminating the crowd activity property, and generating a thermodynamic diagram; randomly extracting a data pair from the crowd activity training data set and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; stitching the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy and mixing the labels according to the area ratio; and extracting an output class activation heat map through an enhanced class gradient activation visualization strategy, performing secondary mixed enhancement of image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set. The method and system effectively and specifically expand the related sample library, and both the expansion process and the expansion result have a markedly positive influence on crowd activity property discrimination algorithms.

Description

Data enhancement method and system for distinguishing crowd activity properties
Technical Field
The invention relates to the technical field of computer vision, and in particular to a data enhancement method and system for crowd activity property discrimination.
Background
Crowd activity property discrimination means summarizing the semantic property of public scenes in which crowds and many people appear in a picture, so as to identify the name and characteristics of the human activity shown; such samples generally have undesirable properties such as dispersed semantics and unclear emphasis. Although algorithms combining human target detection and semantic segmentation can count and locate human individuals in a picture, they cannot well integrate the cross-correlation information between individuals for overall analysis. Crowd counting algorithms can count the number of human bodies in a picture fairly well, but still cannot reflect the human activity category being expressed. There are auxiliary schemes that analyze secondary semantics in pictures, such as capturing slogans, fluorescent bars, and overlapping human body arrangement features, but the overall process is too complex, the heuristic traces are obvious, the resource consumption is extremely high, and the effect is poor under multiple compounding influences. A classification model has the characteristics of direct inference, simple use, low resource consumption and strong generality, and is the preferred scheme for this problem from an application perspective; however, it demands a large sample size, its direct training precision is often unsatisfactory, and it easily forms scene overfitting. For example, a high-density crowd on a road is often a scene characteristic of a tourist show, but tourist shows can also appear in venues such as stadiums and indoor halls; a large number of slogans and billboards may be an auxiliary feature of a large prize-giving event, but may also be a feature of a witness line. Scene overfitting binds the category property to a specific scene, thereby seriously misleading the classification results for such pictures.
At present, no enhancement scheme specifically targeting crowd activity property discrimination algorithms has been found.
Disclosure of Invention
In order to solve the technical problem that the prior art lacks an enhancement scheme specifically targeting crowd activity property discrimination algorithms, the invention provides a data enhancement method and system for crowd activity property discrimination.
According to a first aspect of the present invention, a data enhancement method for crowd activity property discrimination is provided, which comprises:
S1: preparing a crowd activity training data set and a pre-training model for discriminating the crowd activity property, for generating a thermodynamic diagram;
S2: randomly extracting a data pair from the crowd activity training data set, and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
S3: stitching the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy, and mixing the labels according to the area ratio;
S4: extracting an output class activation heat map through an enhanced class gradient activation visualization strategy, performing secondary mixed enhancement of image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
In some specific embodiments, the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i) | i = 0, 1, ..., N−1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is the corresponding image label.
In some specific embodiments, S2 specifically comprises: randomly extracting a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; setting two parameters b_1, b_2 and drawing two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); mixing the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, S3 is specifically expressed as: I_M2 = (1−B_γ2) ⊙ T_s(I_1) + B_γ2 ⊙ T_s(I_2), where B_γ2 is a binary box mask of area ratio γ_2 and ⊙ denotes element-wise multiplication; Q_a = 1−γ_2, Q_b = γ_2; Y_M2 = Q_a·Y_1 + Q_b·Y_2, where I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, by the enhanced class gradient activation visualization strategy in S4, the extraction of the output class activation heat map is specifically expressed as: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map derived for the c-th class, i and j represent pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map; L^c is up-sampled so that its size is consistent with that of the input image, obtaining L̃^c, which is mapped to a semantic map so that the sum of its pixels is 1.
In some specific embodiments, the image secondary mixing enhancement in S4 is specifically: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1; the label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where C_a, C_b are the semantic weights of the secondary mixed label.
In some specific embodiments, the original data set in S4 is expanded such that 35% of the generated data comes from the pixel-level linear mixing enhancement strategy, 35% from the region-level affine stitching enhancement strategy, and 30% from the image secondary mixing.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, a data enhancement system for crowd activity property discrimination is provided, the system comprising:
a preparation unit: configured to prepare a crowd activity training data set and a pre-training model for discriminating the crowd activity property, for generating a thermodynamic diagram;
a pixel-level linear mixing enhancement unit: configured to randomly extract a data pair from the crowd activity training data set, and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
a region-level affine stitching enhancement unit: configured to stitch the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy, and mix the labels according to the area ratio;
a data set expansion unit: configured to extract an output class activation heat map through an enhanced class gradient activation visualization strategy, perform secondary mixed enhancement of image and label fusion, form a secondary mixed image enhancement data set, and expand the original data set.
In some specific embodiments, the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i) | i = 0, 1, ..., N−1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is the corresponding image label.
In some specific embodiments, the pixel-level linear mixing enhancement unit is specifically configured to: randomly extract a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; set two parameters b_1, b_2 and draw two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mix the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, the region-level affine stitching enhancement unit is specifically expressed as: I_M2 = (1−B_γ2) ⊙ T_s(I_1) + B_γ2 ⊙ T_s(I_2), where B_γ2 is a binary box mask of area ratio γ_2 and ⊙ denotes element-wise multiplication; Q_a = 1−γ_2, Q_b = γ_2; Y_M2 = Q_a·Y_1 + Q_b·Y_2, where I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some embodiments, the data set expansion unit is specifically configured to: extract the output class activation heat map through the enhanced class gradient activation visualization strategy, specifically expressed as: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map derived for the c-th class, i and j represent pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map; L^c is up-sampled so that its size is consistent with that of the input image, obtaining L̃^c, which is mapped to a semantic map so that the sum of its pixels is 1. The image secondary mixing enhancement is specifically: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1; the label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where C_a, C_b are the semantic weights of the secondary mixed label.
In some specific embodiments, the original data set is expanded such that 35% of the generated data comes from the pixel-level linear mixing enhancement strategy, 35% from the region-level affine stitching enhancement strategy, and 30% from the image secondary mixing.
The invention provides a data enhancement method and system for crowd activity property discrimination, and proposes a novel crowd scene sample synthesis scheme, thereby effectively and specifically expanding the related sample library. Both the expansion process and its result have a markedly positive influence on crowd activity property discrimination algorithms. The method addresses crowd activity property discrimination from the perspective of scene overfitting for the first time, focuses on data enhancement and on rationalizing and standardizing the sample set distribution, achieves a clear effect of solving the problem at its root, and can be adapted to any framework and any algorithm model.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flow chart of a data enhancement method for crowd activity nature discrimination according to an embodiment of the present application;
FIG. 3 is a block diagram of a data enhancement system for crowd activity nature discrimination according to one embodiment of the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which a data enhancement method for crowd activity nature discrimination according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications, such as a data processing application, a data visualization application, a web browser application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background information processing server providing support for mapping table data presented on the terminal devices 101, 102, 103. The background information processing server can process the acquired logical address and generate a processing result.
It should be noted that the method provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and the corresponding apparatus is generally disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Fig. 2 shows a flow chart of a data enhancement method for the determination of the crowd activity property according to an embodiment of the present application. As shown in fig. 2, the method includes:
s201: and preparing a crowd activity training data set and a pre-training model for distinguishing the crowd activity property for generating the thermodynamic diagram. Xception, S may be selectedAnd the enet and other models with strong classifying effect on fine-grained images. Preparing a population activity training data set defined as { (I) i ,Y i ) I =0,1,.. N-1}, where I = I i ∈R 3 xWxH is a standard RGB image, Y i Is an image label.
S202: randomly extract a data pair from the crowd activity training data set, and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy.
In a specific embodiment, a data pair {(I_1, Y_1), (I_2, Y_2)} is randomly extracted from the crowd activity training data set; two parameters b_1, b_2 are set, and two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) are drawn from a Beta distribution Beta(b_1, b_2).
In a specific embodiment, the pixel-level linear mixing enhancement strategy mixes the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function (i.e., random execution of rotation, translation, cropping, noise addition, scaling, quality transformation, etc.) satisfying the fusion form and scale requirements. This strategy improves overall generalization, introduces an additional regularization effect, and yields a clear gain on the crowd activity problem.
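The pixel-level linear mixing step can be sketched as follows. This is a minimal illustration assuming one-hot label vectors and images that have already passed through the same-type enhancement function T_s; the function name `mixup_pair` and the default Beta parameters are illustrative, not taken from the patent:

```python
import numpy as np

def mixup_pair(img1, label1, img2, label2, b1=1.0, b2=1.0, rng=None):
    """Pixel-level linear mixing: I_M1 = g*I_1 + (1-g)*I_2 with
    g drawn from Beta(b1, b2); the label is mixed with the same weights."""
    rng = np.random.default_rng() if rng is None else rng
    gamma1 = rng.beta(b1, b2)
    mixed_img = gamma1 * img1 + (1.0 - gamma1) * img2        # I_M1
    mixed_label = gamma1 * label1 + (1.0 - gamma1) * label2  # Y_M1 = U_a*Y_1 + U_b*Y_2
    return mixed_img, mixed_label, gamma1
```

Because the same proportion weights both the image and the label, the mixed label remains a valid probability vector whenever the inputs are.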
S203: and splicing the images by using a region-level affine splicing enhancement strategy through a shearing and pasting operation, and mixing the labels according to the area ratio. The concrete expression is as follows:
Figure BDA0003795694340000061
Q a =1-γ 2 ,Q b =γ 2 ;Y M2 =Q a ×Y 1 +Q b ×Y 2 (ii) a WhereinI M2 For the stitched image, Y M2 For corresponding hybrid labels, T s And randomly enhancing functions of the same type of data to meet the requirement of fusion form and scale. The method has the advantages of integrating scene semantics, enriching data set contents, breaking the capability of general experience characteristics of crowd activities, and effectively relieving scene overfitting.
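The region-level cut-and-paste stitching can be sketched as below, under the assumptions of an axis-aligned box and labels mixed by the realized area ratio; `cutmix_pair` is an illustrative name, and choosing box edges proportional to the square root of the area ratio is an implementation choice, not stated in the patent:

```python
import numpy as np

def cutmix_pair(img1, label1, img2, label2, gamma2, rng=None):
    """Region-level stitching: paste a random box of area ratio ~gamma2
    cut from img2 onto img1; mix the labels by the realized area ratio.
    Images are (C, H, W) arrays; labels are one-hot vectors."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = img1.shape
    side = np.sqrt(gamma2)                       # box edges scale with sqrt of area
    bh, bw = max(1, int(h * side)), max(1, int(w * side))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    stitched = img1.copy()
    stitched[:, top:top + bh, left:left + bw] = img2[:, top:top + bh, left:left + bw]
    area = (bh * bw) / (h * w)                   # realized area ratio
    mixed_label = (1.0 - area) * label1 + area * label2   # Q_a = 1 - area, Q_b = area
    return stitched, mixed_label
```

Using the realized box area (rather than the sampled γ_2) keeps the label weights exactly consistent with the pixels actually pasted.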
S204: and extracting an output class activation heat map through an enhancement class gradient activation visualization strategy, executing secondary mixed enhancement of the image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
In a specific embodiment, the enhanced class gradient activation visualization strategy is used to extract the output class activation heat map as follows: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map extracted for the c-th class, i and j denote pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map. L^c is up-sampled so that its size is consistent with the input image, denoted L̃^c, and L̃^c is mapped to a semantic map so that the sum of its pixels is 1.
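Assuming the forward feature maps A^k and the gradients ∂y^c/∂A^k for the target class have already been obtained from the pre-training model, the heat-map extraction and normalization can be sketched as follows; nearest-neighbour up-sampling stands in for whatever interpolation the actual pipeline uses, and `class_activation_heatmap` is an illustrative name:

```python
import numpy as np

def class_activation_heatmap(feature_maps, gradients, out_hw):
    """Grad-CAM-style extraction: the adaptation coefficients alpha_k are the
    global average of d y^c / d A^k; L^c = ReLU(sum_k alpha_k * A^k) is
    up-sampled to the input size and normalized so its pixels sum to 1.
    feature_maps, gradients: (K, h, w) arrays; out_hw: (H, W)."""
    alphas = gradients.mean(axis=(1, 2))                 # alpha_k^c = (1/Z) * sum_ij grad
    heat = np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)  # ReLU mask
    H, W = out_hw
    h, w = heat.shape
    rows = np.arange(H) * h // H                         # nearest-neighbour up-sampling
    cols = np.arange(W) * w // W
    up = heat[np.ix_(rows, cols)]
    total = up.sum()
    return up / total if total > 0 else np.full((H, W), 1.0 / (H * W))
```

The uniform fallback when the heat map is all zero is an assumption added so that the result is always a valid semantic map summing to 1.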
In a specific embodiment, the final image mixing strategy, i.e. the image secondary mixing enhancement, is performed as: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1. The label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where K_I1 and K_I2 denote the corresponding class activation heat maps semantically mapped so that the sum of their pixels is 1, C_a is derived from the K_I1 mass retained from I_M1 and C_b from the K_I2 mass pasted from I_M2, and C_a, C_b are the semantic weights of the secondary mixed label.
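The secondary mixing with heat-map-weighted label fusion can be sketched as below. This sketch simplifies TR_θ by pasting the box in place (same coordinates in both images), and it assumes the semantic weights C_a, C_b are the normalized heat-map masses each source contributes; both simplifications are illustrative assumptions, as is the name `secondary_mix`:

```python
import numpy as np

def secondary_mix(img_m1, y_m1, k1, img_m2, y_m2, k2, box):
    """Heat-map-guided secondary mixing: paste a box cut from img_m2 onto
    img_m1 and weight the fused label by the normalized heat-map mass each
    source keeps in the result. k1, k2 are heat maps summing to 1;
    box = (top, left, bh, bw); images are (C, H, W)."""
    top, left, bh, bw = box
    mask = np.zeros(img_m1.shape[-2:], dtype=bool)
    mask[top:top + bh, left:left + bw] = True
    mixed = np.where(mask[None], img_m2, img_m1)   # box from I_M2, rest from I_M1
    c_a = k1[~mask].sum()                          # semantic mass retained from I_M1
    c_b = k2[mask].sum()                           # semantic mass pasted from I_M2
    total = c_a + c_b                              # normalize so weights sum to 1
    c_a, c_b = (c_a / total, c_b / total) if total > 0 else (0.5, 0.5)
    y_mix = c_a * y_m1 + c_b * y_m2                # Y_Mix = C_a*Y_M1 + C_b*Y_M2
    return mixed, y_mix
```

With uniform heat maps the weights reduce to the plain area ratio; non-uniform heat maps shift the label toward whichever source contributes the more semantically active pixels.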
In a specific embodiment, based on the above method, the secondary mixed image enhancement data set is formed to expand the original data set. The ratio is to generate 35% of the data using the pixel-level linear mixing enhancement strategy, 35% using the region-level affine stitching enhancement strategy, and 30% using the secondary mixing approach.
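The 35% / 35% / 30% allocation can be expressed as a small helper; `expansion_counts` is an illustrative name, and rounding to whole samples with the remainder absorbed by the last strategy is an assumption:

```python
def expansion_counts(n_new):
    """Split n_new synthetic samples across the three strategies in a
    35% / 35% / 30% ratio; the last share absorbs any rounding remainder."""
    n_mix = round(n_new * 0.35)             # pixel-level linear mixing
    n_stitch = round(n_new * 0.35)          # region-level affine stitching
    n_secondary = n_new - n_mix - n_stitch  # secondary mixing (~30%)
    return n_mix, n_stitch, n_secondary
```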
The above data enhancement method for crowd activity property discrimination provides a novel crowd scene sample synthesis scheme, thereby effectively and specifically expanding the related sample library. Both the expansion process and its result have a markedly positive influence on crowd activity property discrimination algorithms. The method addresses crowd activity property discrimination from the perspective of scene overfitting for the first time, focuses on data enhancement and on rationalizing and standardizing the sample set distribution, achieves a clear effect of solving the problem at its root, and can be adapted to any framework and any algorithm model.
With continued reference to fig. 3, fig. 3 illustrates a block diagram of a data enhancement system for crowd activity property discrimination according to an embodiment of the present application. The system specifically comprises a preparation unit 301, a pixel-level linear mixing enhancement unit 302, a region-level affine stitching enhancement unit 303 and a data set expansion unit 304. The preparation unit 301 is configured to prepare a crowd activity training data set and a pre-training model for crowd activity property discrimination, so as to generate a thermodynamic diagram; the pixel-level linear mixing enhancement unit 302 is configured to randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; the region-level affine stitching enhancement unit 303 is configured to stitch the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy and mix the labels according to the area ratio; the data set expansion unit 304 is configured to extract the output class activation heat map through the enhanced class gradient activation visualization strategy, perform secondary mixed enhancement of image and label fusion, form a secondary mixed image enhancement data set, and expand the original data set.
Referring now to FIG. 4, shown is a block diagram of a computer system 400 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable storage medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being incorporated into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: prepare a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map; randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; stitch images through cut-and-paste operations using a region-level affine stitching enhancement strategy and mix labels according to area ratio; and extract an output class activation heat map through an enhanced class-gradient-activation visualization strategy, perform secondary mixing enhancement with image and label fusion, and form a secondary mixed image enhancement data set for expanding the original data set.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A data enhancement method for discriminating the nature of crowd activity, characterized by comprising the following steps:
S1: preparing a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map;
S2: randomly extracting a data pair from the crowd activity training data set, and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
S3: stitching images through cut-and-paste operations using a region-level affine stitching enhancement strategy, and mixing labels according to area ratio;
S4: extracting an output class activation heat map through an enhanced class-gradient-activation visualization strategy, performing secondary mixing enhancement with image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
2. The method of claim 1, wherein the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i), i = 0, 1, ..., N-1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is an image label.
3. The method of claim 2, wherein S2 specifically comprises: randomly extracting a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; setting two parameters b_1, b_2 and drawing two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mixing the image and the label by linear combination: I_M1 = γ_1 × T_s(I_1) + (1 − γ_1) × T_s(I_2); U_a = γ_1, U_b = 1 − γ_1; Y_M1 = U_a × Y_1 + U_b × Y_2; wherein I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
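The pixel-level linear mixing of claim 3 can be sketched as follows (illustrative only, not part of the claim language; it assumes C×H×W NumPy image arrays and one-hot label vectors, and omits the same-type augmentation T_s):

```python
import numpy as np

def pixel_level_linear_mix(img1, y1, img2, y2, b1=1.0, b2=1.0, rng=None):
    """Mixup-style pixel-level linear blend of two (image, label) pairs.

    gamma_1 is drawn from Beta(b1, b2); the same weight mixes image and
    label, mirroring I_M1 and Y_M1 in claim 3.  T_s (the same-type random
    augmentation that equalizes scale) is omitted for brevity.
    """
    rng = rng or np.random.default_rng()
    gamma1 = rng.beta(b1, b2)
    # I_M1 = gamma_1 * I_1 + (1 - gamma_1) * I_2
    mixed_img = gamma1 * img1 + (1.0 - gamma1) * img2
    # Y_M1 = U_a * Y_1 + U_b * Y_2 with U_a = gamma_1, U_b = 1 - gamma_1
    mixed_lbl = gamma1 * y1 + (1.0 - gamma1) * y2
    return mixed_img, mixed_lbl
```

Because the same weight is applied to image and label, mixing two one-hot labels always yields a valid soft label whose entries sum to 1.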
4. The method of claim 3, wherein S3 is specifically expressed as:
[equation image FDA0003795694330000011]
Q_a = 1 − γ_2, Q_b = γ_2; Y_M2 = Q_a × Y_1 + Q_b × Y_2; wherein I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
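The region-level affine stitching of claim 4 can be sketched as a CutMix-style cut-and-paste (illustrative only, since the stitched-image formula itself is in the equation image; box side lengths are scaled by √γ_2 so the pasted area ratio is approximately γ_2, and T_s is again omitted):

```python
import numpy as np

def region_affine_stitch(img1, y1, img2, y2, gamma2, rng=None):
    """Cut a random box from img2 and paste it onto img1.

    gamma2 is the target area ratio of the pasted box; labels mix by
    area: Y_M2 = Q_a * Y_1 + Q_b * Y_2 with Q_a = 1 - gamma2, Q_b = gamma2.
    """
    rng = rng or np.random.default_rng()
    _, H, W = img1.shape
    # side lengths scaled by sqrt(gamma2) so bh*bw / (H*W) ~= gamma2
    bh, bw = int(H * np.sqrt(gamma2)), int(W * np.sqrt(gamma2))
    top = rng.integers(0, H - bh + 1)
    left = rng.integers(0, W - bw + 1)
    out = img1.copy()
    out[:, top:top + bh, left:left + bw] = img2[:, top:top + bh, left:left + bw]
    qa, qb = 1.0 - gamma2, gamma2
    return out, qa * y1 + qb * y2
```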
5. The data enhancement method for discriminating the nature of crowd activity according to claim 4, wherein in S4 the enhanced class-gradient-activation visualization strategy extracts the output class activation heat map, specifically expressed as:
[equation images FDA0003795694330000012 and FDA0003795694330000019]
wherein [FDA0003795694330000013] is the class activation heat map derived for the c-th class and i, j denote the pixel coordinates; [FDA0003795694330000014] is the activation attention mask; [FDA0003795694330000015] are the adaptive coefficients; and [FDA0003795694330000016] is the k-th feature map. L^c is up-sampled so that its size is consistent with that of the input image, yielding [FDA0003795694330000017]; [FDA0003795694330000018] is then mapped to a semantic map whose pixels sum to 1.
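A hedged sketch of the claim-5 heat-map extraction (Grad-CAM flavour): the exact weighting in the patented formula is in the equation images, so the ReLU-weighted sum and nearest-neighbour up-sampling below are assumptions, as is the requirement that the output size divide evenly by the feature-map size:

```python
import numpy as np

def class_activation_semantic_map(feature_maps, alphas, out_hw):
    """Combine feature maps A^k with adaptive coefficients alpha^c_k into a
    class activation map, up-sample it to the input-image size, and
    normalize it to a semantic map whose pixels sum to 1.

    feature_maps: (K, h, w) array; alphas: (K,) coefficients;
    out_hw: (H, W) with H % h == 0 and W % w == 0 (assumption).
    """
    # assumed form: L^c_ij = ReLU(sum_k alpha^c_k * A^k_ij)
    cam = np.maximum((alphas[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # nearest-neighbour up-sampling to the input-image size
    H, W = out_hw
    h, w = cam.shape
    cam_up = cam[np.repeat(np.arange(h), H // h)][:, np.repeat(np.arange(w), W // w)]
    # map to a semantic map whose pixels sum to 1 (uniform fallback if all-zero)
    total = cam_up.sum()
    return cam_up / total if total > 0 else np.full((H, W), 1.0 / (H * W))
```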
6. The method of claim 5, wherein the image secondary mixing enhancement in S4 is specifically:
[equation image FDA0003795694330000021]
wherein [FDA0003795694330000022] and [FDA0003795694330000023] are two binary masks, comprising a random frame region with area ratio γ_3 and a random frame region with area ratio γ_4; TR_θ is a transformation function that converts the frame region of I_M2 to match the frame region of I_M1. The label fusion is: Y_Mix = C_a × Y_M1 + C_b × Y_M2, wherein C_a, C_b are the semantic weights of the secondary mixed label.
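A sketch of the claim-6 secondary mixing (illustrative only: the semantic weights C_a, C_b would come from the class activation heat maps, so the area-based weights below are a stand-in, and nearest-neighbour resizing stands in for the transformation TR_θ):

```python
import numpy as np

def secondary_mix(img_m1, y_m1, img_m2, y_m2, gamma3, gamma4, rng=None):
    """Cut a gamma4-ratio random box from img_m2, resize it (stand-in for
    TR_theta) to a gamma3-ratio random box of img_m1, and paste it there.
    """
    rng = rng or np.random.default_rng()
    _, H, W = img_m1.shape

    def rand_box(ratio):
        bh = max(1, int(H * np.sqrt(ratio)))
        bw = max(1, int(W * np.sqrt(ratio)))
        return rng.integers(0, H - bh + 1), rng.integers(0, W - bw + 1), bh, bw

    t1, l1, h1, w1 = rand_box(gamma3)   # target box in I_M1
    t2, l2, h2, w2 = rand_box(gamma4)   # source box in I_M2
    patch = img_m2[:, t2:t2 + h2, l2:l2 + w2]
    # nearest-neighbour resize of the source box to the target box
    ridx = (np.arange(h1) * h2) // h1
    cidx = (np.arange(w1) * w2) // w1
    patch = patch[:, ridx][:, :, cidx]
    out = img_m1.copy()
    out[:, t1:t1 + h1, l1:l1 + w1] = patch
    # area-based stand-in for the semantic weights C_a, C_b
    ca, cb = 1.0 - gamma3, gamma3
    return out, ca * y_m1 + cb * y_m2
```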
7. The method of claim 1, wherein the expanded data set in S4 consists of 35% data generated by the pixel-level linear mixing enhancement strategy, 35% data generated by the region-level affine stitching enhancement strategy, and 30% data generated by the image secondary mixing.
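The 35/35/30 split of claim 7 can be computed, for a desired number of augmented samples, as (illustrative; rounding the remainder into the secondary-mixing share is an assumption):

```python
def expansion_counts(n_new):
    """Split n_new augmented samples per the claimed 35/35/30 ratio."""
    n_mix = round(n_new * 0.35)           # pixel-level linear mixing
    n_stitch = round(n_new * 0.35)        # region-level affine stitching
    n_second = n_new - n_mix - n_stitch   # image secondary mixing (~30%)
    return n_mix, n_stitch, n_second
```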
8. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 7.
9. A data enhancement system for discriminating the nature of crowd activity, the system comprising:
a preparation unit, configured to prepare a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map;
a pixel-level linear mixing enhancement unit, configured to randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
a region-level affine stitching enhancement unit, configured to stitch images through cut-and-paste operations using a region-level affine stitching enhancement strategy and mix labels according to area ratio;
a data set expansion unit, configured to extract an output class activation heat map through an enhanced class-gradient-activation visualization strategy, perform secondary mixing enhancement with image and label fusion, and form a secondary mixed image enhancement data set for expanding the original data set.
10. The data enhancement system for discriminating the nature of crowd activity according to claim 9, wherein the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i), i = 0, 1, ..., N-1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is an image label.
11. The data enhancement system for crowd activity property discrimination according to claim 10, wherein the pixel-level linear mixing enhancement unit is specifically configured to: randomly extract a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; set two parameters b_1, b_2 and draw two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mix the image and the label by linear combination: I_M1 = γ_1 × T_s(I_1) + (1 − γ_1) × T_s(I_2); U_a = γ_1, U_b = 1 − γ_1; Y_M1 = U_a × Y_1 + U_b × Y_2; wherein I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
12. The data enhancement system for crowd activity property discrimination according to claim 11, wherein the region-level affine stitching enhancement unit is specifically expressed as:
[equation image FDA0003795694330000031]
Q_a = 1 − γ_2, Q_b = γ_2; Y_M2 = Q_a × Y_1 + Q_b × Y_2; wherein I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
13. The data enhancement system for crowd activity property discrimination according to claim 12, wherein the data set expansion unit is specifically configured to: through the enhanced class-gradient-activation visualization strategy, extract the output class activation heat map, specifically expressed as:
[equation image FDA0003795694330000032]
wherein [FDA0003795694330000033] is the class activation heat map derived for the c-th class and i, j denote the pixel coordinates; [FDA0003795694330000034] is the activation attention mask; [FDA0003795694330000035] are the adaptive coefficients; and [FDA0003795694330000036] is the k-th feature map. L^c is up-sampled so that its size is consistent with that of the input image, yielding [FDA0003795694330000037]; [FDA0003795694330000038] is then mapped to a semantic map whose pixels sum to 1. The image secondary mixing enhancement is specifically:
[equation images FDA0003795694330000039 and FDA00037956943300000310]
wherein [FDA00037956943300000311] and [FDA00037956943300000312] are two binary masks, comprising a random frame region with area ratio γ_3 and a random frame region with area ratio γ_4; TR_θ is a transformation function that converts the frame region of I_M2 to match the frame region of I_M1. The label fusion is: Y_Mix = C_a × Y_M1 + C_b × Y_M2, wherein C_a, C_b are the semantic weights of the secondary mixed label.
14. The data enhancement system for crowd activity property discrimination according to claim 9, wherein the expanded data set consists of 35% data generated by the pixel-level linear mixing enhancement strategy, 35% data generated by the region-level affine stitching enhancement strategy, and 30% data generated by the image secondary mixing.
CN202210968789.4A 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties Pending CN115294529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210968789.4A CN115294529A (en) 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties


Publications (1)

Publication Number Publication Date
CN115294529A true CN115294529A (en) 2022-11-04

Family

ID=83830056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210968789.4A Pending CN115294529A (en) 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties

Country Status (1)

Country Link
CN (1) CN115294529A (en)

Similar Documents

Publication Publication Date Title
CN110458918B (en) Method and device for outputting information
CN109618222A (en) A kind of splicing video generation method, device, terminal device and storage medium
US20140204119A1 (en) Generating augmented reality exemplars
WO2020233166A1 (en) Comment data provision and display method, apparatus, electronic device, and storage medium
CN111275784B (en) Method and device for generating image
CN111681177B (en) Video processing method and device, computer readable storage medium and electronic equipment
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN112017257B (en) Image processing method, apparatus and storage medium
CN112839223B (en) Image compression method, image compression device, storage medium and electronic equipment
CN112102445B (en) Building poster manufacturing method, device, equipment and computer readable storage medium
CN113569740B (en) Video recognition model training method and device, and video recognition method and device
CN112651475B (en) Two-dimensional code display method, device, equipment and medium
US20130182943A1 (en) Systems and methods for depth map generation
CN113409188A (en) Image background replacing method, system, electronic equipment and storage medium
TW201020968A (en) System, method, and computer program product for preventing display of unwanted content stored in a frame buffer
CN115294529A (en) Data enhancement method and system for distinguishing crowd activity properties
CN112954452B (en) Video generation method, device, terminal and storage medium
CN111914850B (en) Picture feature extraction method, device, server and medium
US10997365B2 (en) Dynamically generating a visually enhanced document
Chang et al. Distortion-free data embedding scheme for high dynamic range images
Gibin et al. Collaborative mapping of London using google maps: the LondonProfiler
Ngo et al. Pixel-Wise Information in Fake Image Detection
CN107742096A (en) Obtain method and device, electronic equipment, the storage medium of characteristic chart information
CN113360797B (en) Information processing method, apparatus, device, storage medium, and computer program product
US9165339B2 (en) Blending map data with additional imagery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination