CN115294529A - Data enhancement method and system for distinguishing crowd activity properties - Google Patents

Data enhancement method and system for distinguishing crowd activity properties

Info

Publication number
CN115294529A
Authority
CN
China
Prior art keywords
enhancement
image
data
data set
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210968789.4A
Other languages
Chinese (zh)
Inventor
高志鹏
吴俊毅
赵建强
张辉极
杜新胜
Current Assignee
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202210968789.4A
Publication of CN115294529A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Abstract

The invention provides a data enhancement method and system for crowd activity property discrimination, which comprise the steps of preparing a crowd activity training data set and a pre-training model for discriminating the crowd activity property, and generating a thermodynamic diagram; randomly extracting a data pair from the crowd activity training data set and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; stitching the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy and mixing the labels according to the area ratio; and extracting an output class activation heat map through an enhanced class gradient activation visualization strategy, performing secondary mixed enhancement of image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set. The method and system effectively and specifically expand the related sample library, and both the expansion process and the expansion result have a markedly positive influence on crowd activity property discrimination algorithms.

Description

Data enhancement method and system for distinguishing crowd activity properties
Technical Field
The invention relates to the technical field of computer vision, and in particular to a data enhancement method and system for crowd activity property discrimination.
Background
Crowd activity property discrimination means summarizing the semantic property of public scenes in which crowds and many people appear in a picture, so as to identify the name and characteristics of the human activity shown; such samples generally have undesirable properties such as dispersed semantics and unclear emphasis. Although algorithms combining human target detection and semantic segmentation can count and locate human individuals in a picture, they cannot well integrate the cross-correlation information between individuals for overall analysis. Crowd counting algorithms can count the number of human bodies in a picture fairly well, but still cannot reflect the human activity category being expressed. There are auxiliary schemes that analyze secondary semantics in pictures, such as capturing slogans, fluorescent bars, and overlapping human body arrangement features, but the overall process is too complex, the heuristic traces are obvious, the resource consumption is extremely high, and the effect is poor under multiple compounding influences. A classification model has the characteristics of direct inference, simple use, low resource consumption and strong generality, and is the preferred scheme for this problem from an application perspective; however, it demands a large sample size, its direct training precision is often unsatisfactory, and it easily forms scene overfitting. For example, a high-density crowd on a road is often a scene characteristic of a tourist show, but tourist shows can also appear in venues such as stadiums and indoor halls; a large number of slogans and billboards may be an auxiliary feature of a large prize-giving event, but may also be a feature of a witness line. Scene overfitting binds the category property to a specific scene, thereby seriously misleading the classification results for such pictures.
At present, no enhancement scheme specifically targeting crowd activity property discrimination algorithms has been found.
Disclosure of Invention
In order to solve the technical problem that the prior art lacks an enhancement scheme specifically targeting crowd activity property discrimination algorithms, the invention provides a data enhancement method and system for crowd activity property discrimination.
According to a first aspect of the present invention, a data enhancement method for crowd activity property discrimination is provided, which comprises:
S1: preparing a crowd activity training data set and a pre-training model for discriminating the crowd activity property, for generating a thermodynamic diagram;
S2: randomly extracting a data pair from the crowd activity training data set, and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
S3: stitching the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy, and mixing the labels according to the area ratio;
S4: extracting an output class activation heat map through an enhanced class gradient activation visualization strategy, performing secondary mixed enhancement of image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
In some specific embodiments, the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i) | i = 0, 1, ..., N−1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is the corresponding image label.
In some specific embodiments, S2 specifically comprises: randomly extracting a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; setting two parameters b_1, b_2 and drawing two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); mixing the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, S3 is specifically expressed as: I_M2 = (1−B_γ2) ⊙ T_s(I_1) + B_γ2 ⊙ T_s(I_2), where B_γ2 is a binary box mask of area ratio γ_2 and ⊙ denotes element-wise multiplication; Q_a = 1−γ_2, Q_b = γ_2; Y_M2 = Q_a·Y_1 + Q_b·Y_2, where I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, by the enhanced class gradient activation visualization strategy in S4, the extraction of the output class activation heat map is specifically expressed as: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map derived for the c-th class, i and j represent pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map; L^c is up-sampled so that its size is consistent with that of the input image, obtaining L̃^c, which is mapped to a semantic map so that the sum of its pixels is 1.
In some specific embodiments, the image secondary mixing enhancement in S4 is specifically: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1; the label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where C_a, C_b are the semantic weights of the secondary mixed label.
In some specific embodiments, the original data set in S4 is expanded such that 35% of the generated data comes from the pixel-level linear mixing enhancement strategy, 35% from the region-level affine stitching enhancement strategy, and 30% from the image secondary mixing.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, a data enhancement system for crowd activity property discrimination is provided, the system comprising:
a preparation unit: configured to prepare a crowd activity training data set and a pre-training model for discriminating the crowd activity property, for generating a thermodynamic diagram;
a pixel-level linear mixing enhancement unit: configured to randomly extract a data pair from the crowd activity training data set, and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
a region-level affine stitching enhancement unit: configured to stitch the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy, and mix the labels according to the area ratio;
a data set expansion unit: configured to extract an output class activation heat map through an enhanced class gradient activation visualization strategy, perform secondary mixed enhancement of image and label fusion, form a secondary mixed image enhancement data set, and expand the original data set.
In some specific embodiments, the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i) | i = 0, 1, ..., N−1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is the corresponding image label.
In some specific embodiments, the pixel-level linear mixing enhancement unit is specifically configured to: randomly extract a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; set two parameters b_1, b_2 and draw two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mix the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some specific embodiments, the region-level affine stitching enhancement unit is specifically expressed as: I_M2 = (1−B_γ2) ⊙ T_s(I_1) + B_γ2 ⊙ T_s(I_2), where B_γ2 is a binary box mask of area ratio γ_2 and ⊙ denotes element-wise multiplication; Q_a = 1−γ_2, Q_b = γ_2; Y_M2 = Q_a·Y_1 + Q_b·Y_2, where I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion form and scale requirements.
In some embodiments, the data set expansion unit is specifically configured to: extract the output class activation heat map through the enhanced class gradient activation visualization strategy, specifically expressed as: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map derived for the c-th class, i and j represent pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map; L^c is up-sampled so that its size is consistent with that of the input image, obtaining L̃^c, which is mapped to a semantic map so that the sum of its pixels is 1. The image secondary mixing enhancement is specifically: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1; the label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where C_a, C_b are the semantic weights of the secondary mixed label.
In some specific embodiments, the original data set is expanded such that 35% of the generated data comes from the pixel-level linear mixing enhancement strategy, 35% from the region-level affine stitching enhancement strategy, and 30% from the image secondary mixing.
The invention provides a data enhancement method and system for crowd activity property discrimination, and proposes a novel crowd scene sample synthesis scheme, thereby effectively and specifically expanding the related sample library. Both the expansion process and its result have a markedly positive influence on crowd activity property discrimination algorithms. The method addresses crowd activity property discrimination from the perspective of scene overfitting for the first time, focuses on data enhancement and on rationalizing and standardizing the sample set distribution, achieves a clear effect of solving the problem at its root, and can be adapted to any framework and any algorithm model.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flow chart of a data enhancement method for crowd activity nature discrimination according to an embodiment of the present application;
FIG. 3 is a block diagram of a data enhancement system for crowd activity nature discrimination according to one embodiment of the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which a data enhancement method for crowd activity nature discrimination according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications, such as a data processing application, a data visualization application, a web browser application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background information processing server providing support for mapping table data presented on the terminal devices 101, 102, 103. The background information processing server can process the acquired logical address and generate a processing result.
It should be noted that the method provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and the corresponding apparatus is generally disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Fig. 2 shows a flow chart of a data enhancement method for the determination of the crowd activity property according to an embodiment of the present application. As shown in fig. 2, the method includes:
s201: and preparing a crowd activity training data set and a pre-training model for distinguishing the crowd activity property for generating the thermodynamic diagram. Xception, S may be selectedAnd the enet and other models with strong classifying effect on fine-grained images. Preparing a population activity training data set defined as { (I) i ,Y i ) I =0,1,.. N-1}, where I = I i ∈R 3 xWxH is a standard RGB image, Y i Is an image label.
S202: randomly extract a data pair from the crowd activity training data set, and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy.
In a specific embodiment, a data pair {(I_1, Y_1), (I_2, Y_2)} is randomly extracted from the crowd activity training data set; two parameters b_1, b_2 are set, and two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) are drawn from a Beta distribution Beta(b_1, b_2).
In a specific embodiment, the pixel-level linear mixing enhancement strategy mixes the image and the label by linear combination: I_M1 = γ_1·T_s(I_1) + (1−γ_1)·T_s(I_2); U_a = γ_1, U_b = 1−γ_1; Y_M1 = U_a·Y_1 + U_b·Y_2, where I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function (i.e., random execution of rotation, translation, cropping, noise addition, scaling, quality transformation, etc.) satisfying the fusion form and scale requirements. This strategy improves overall generalization, introduces an additional regularization effect, and yields a clear gain on the crowd activity problem.
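The pixel-level linear mixing step can be sketched as follows. This is a minimal illustration assuming one-hot label vectors and images that have already passed through the same-type enhancement function T_s; the function name `mixup_pair` and the default Beta parameters are illustrative, not taken from the patent:

```python
import numpy as np

def mixup_pair(img1, label1, img2, label2, b1=1.0, b2=1.0, rng=None):
    """Pixel-level linear mixing: I_M1 = g*I_1 + (1-g)*I_2 with
    g drawn from Beta(b1, b2); the label is mixed with the same weights."""
    rng = np.random.default_rng() if rng is None else rng
    gamma1 = rng.beta(b1, b2)
    mixed_img = gamma1 * img1 + (1.0 - gamma1) * img2        # I_M1
    mixed_label = gamma1 * label1 + (1.0 - gamma1) * label2  # Y_M1 = U_a*Y_1 + U_b*Y_2
    return mixed_img, mixed_label, gamma1
```

Because the same proportion weights both the image and the label, the mixed label remains a valid probability vector whenever the inputs are.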
S203: and splicing the images by using a region-level affine splicing enhancement strategy through a shearing and pasting operation, and mixing the labels according to the area ratio. The concrete expression is as follows:
Figure BDA0003795694340000061
Q a =1-γ 2 ,Q b =γ 2 ;Y M2 =Q a ×Y 1 +Q b ×Y 2 (ii) a WhereinI M2 For the stitched image, Y M2 For corresponding hybrid labels, T s And randomly enhancing functions of the same type of data to meet the requirement of fusion form and scale. The method has the advantages of integrating scene semantics, enriching data set contents, breaking the capability of general experience characteristics of crowd activities, and effectively relieving scene overfitting.
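The region-level cut-and-paste stitching can be sketched as below, under the assumptions of an axis-aligned box and labels mixed by the realized area ratio; `cutmix_pair` is an illustrative name, and choosing box edges proportional to the square root of the area ratio is an implementation choice, not stated in the patent:

```python
import numpy as np

def cutmix_pair(img1, label1, img2, label2, gamma2, rng=None):
    """Region-level stitching: paste a random box of area ratio ~gamma2
    cut from img2 onto img1; mix the labels by the realized area ratio.
    Images are (C, H, W) arrays; labels are one-hot vectors."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = img1.shape
    side = np.sqrt(gamma2)                       # box edges scale with sqrt of area
    bh, bw = max(1, int(h * side)), max(1, int(w * side))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    stitched = img1.copy()
    stitched[:, top:top + bh, left:left + bw] = img2[:, top:top + bh, left:left + bw]
    area = (bh * bw) / (h * w)                   # realized area ratio
    mixed_label = (1.0 - area) * label1 + area * label2   # Q_a = 1 - area, Q_b = area
    return stitched, mixed_label
```

Using the realized box area (rather than the sampled γ_2) keeps the label weights exactly consistent with the pixels actually pasted.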
S204: and extracting an output class activation heat map through an enhancement class gradient activation visualization strategy, executing secondary mixed enhancement of the image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
In a specific embodiment, the enhanced class gradient activation visualization strategy is used to extract the output class activation heat map as follows: L^c = ReLU(Σ_k α_k^c · A^k), where L^c is the class activation heat map extracted for the c-th class, i and j denote pixel coordinates, ReLU(·) serves as the activation attention mask, α_k^c = (1/Z)·Σ_i Σ_j ∂y^c/∂A^k_ij are the adaptation coefficients (Z being the number of feature-map pixels), and A^k is the k-th feature map. L^c is up-sampled so that its size is consistent with the input image, denoted L̃^c, and L̃^c is mapped to a semantic map so that the sum of its pixels is 1.
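Assuming the forward feature maps A^k and the gradients ∂y^c/∂A^k for the target class have already been obtained from the pre-training model, the heat-map extraction and normalization can be sketched as follows; nearest-neighbour up-sampling stands in for whatever interpolation the actual pipeline uses, and `class_activation_heatmap` is an illustrative name:

```python
import numpy as np

def class_activation_heatmap(feature_maps, gradients, out_hw):
    """Grad-CAM-style extraction: the adaptation coefficients alpha_k are the
    global average of d y^c / d A^k; L^c = ReLU(sum_k alpha_k * A^k) is
    up-sampled to the input size and normalized so its pixels sum to 1.
    feature_maps, gradients: (K, h, w) arrays; out_hw: (H, W)."""
    alphas = gradients.mean(axis=(1, 2))                 # alpha_k^c = (1/Z) * sum_ij grad
    heat = np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)  # ReLU mask
    H, W = out_hw
    h, w = heat.shape
    rows = np.arange(H) * h // H                         # nearest-neighbour up-sampling
    cols = np.arange(W) * w // W
    up = heat[np.ix_(rows, cols)]
    total = up.sum()
    return up / total if total > 0 else np.full((H, W), 1.0 / (H * W))
```

The uniform fallback when the heat map is all zero is an assumption added so that the result is always a valid semantic map summing to 1.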
In a specific embodiment, the final image mixing strategy, i.e. the image secondary mixing enhancement, is performed as: I_Mix = (1−M_γ3) ⊙ I_M1 + TR_θ(M_γ4 ⊙ I_M2), where M_γ3 and M_γ4 are two binary masks containing a random box region of area ratio γ_3 and a random box region of area ratio γ_4 respectively, and TR_θ is a transformation function that converts the box region cut from I_M2 to match the box region of I_M1. The label fusion method is: Y_Mix = C_a·Y_M1 + C_b·Y_M2, where K_I1 and K_I2 denote the corresponding class activation heat maps semantically mapped so that the sum of their pixels is 1, C_a is derived from the K_I1 mass retained from I_M1 and C_b from the K_I2 mass pasted from I_M2, and C_a, C_b are the semantic weights of the secondary mixed label.
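The secondary mixing with heat-map-weighted label fusion can be sketched as below. This sketch simplifies TR_θ by pasting the box in place (same coordinates in both images), and it assumes the semantic weights C_a, C_b are the normalized heat-map masses each source contributes; both simplifications are illustrative assumptions, as is the name `secondary_mix`:

```python
import numpy as np

def secondary_mix(img_m1, y_m1, k1, img_m2, y_m2, k2, box):
    """Heat-map-guided secondary mixing: paste a box cut from img_m2 onto
    img_m1 and weight the fused label by the normalized heat-map mass each
    source keeps in the result. k1, k2 are heat maps summing to 1;
    box = (top, left, bh, bw); images are (C, H, W)."""
    top, left, bh, bw = box
    mask = np.zeros(img_m1.shape[-2:], dtype=bool)
    mask[top:top + bh, left:left + bw] = True
    mixed = np.where(mask[None], img_m2, img_m1)   # box from I_M2, rest from I_M1
    c_a = k1[~mask].sum()                          # semantic mass retained from I_M1
    c_b = k2[mask].sum()                           # semantic mass pasted from I_M2
    total = c_a + c_b                              # normalize so weights sum to 1
    c_a, c_b = (c_a / total, c_b / total) if total > 0 else (0.5, 0.5)
    y_mix = c_a * y_m1 + c_b * y_m2                # Y_Mix = C_a*Y_M1 + C_b*Y_M2
    return mixed, y_mix
```

With uniform heat maps the weights reduce to the plain area ratio; non-uniform heat maps shift the label toward whichever source contributes the more semantically active pixels.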
In a specific embodiment, based on the above method, the secondary mixed image enhancement data set is formed to expand the original data set. The ratio is to generate 35% of the data using the pixel-level linear mixing enhancement strategy, 35% using the region-level affine stitching enhancement strategy, and 30% using the secondary mixing approach.
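The 35% / 35% / 30% allocation can be expressed as a small helper; `expansion_counts` is an illustrative name, and rounding to whole samples with the remainder absorbed by the last strategy is an assumption:

```python
def expansion_counts(n_new):
    """Split n_new synthetic samples across the three strategies in a
    35% / 35% / 30% ratio; the last share absorbs any rounding remainder."""
    n_mix = round(n_new * 0.35)             # pixel-level linear mixing
    n_stitch = round(n_new * 0.35)          # region-level affine stitching
    n_secondary = n_new - n_mix - n_stitch  # secondary mixing (~30%)
    return n_mix, n_stitch, n_secondary
```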
The above data enhancement method for crowd activity property discrimination provides a novel crowd scene sample synthesis scheme, thereby effectively and specifically expanding the related sample library. Both the expansion process and its result have a markedly positive influence on crowd activity property discrimination algorithms. The method addresses crowd activity property discrimination from the perspective of scene overfitting for the first time, focuses on data enhancement and on rationalizing and standardizing the sample set distribution, achieves a clear effect of solving the problem at its root, and can be adapted to any framework and any algorithm model.
With continued reference to fig. 3, fig. 3 illustrates a block diagram of a data enhancement system for crowd activity property discrimination according to an embodiment of the present application. The system specifically comprises a preparation unit 301, a pixel-level linear mixing enhancement unit 302, a region-level affine stitching enhancement unit 303 and a data set expansion unit 304. The preparation unit 301 is configured to prepare a crowd activity training data set and a pre-training model for crowd activity property discrimination, so as to generate a thermodynamic diagram; the pixel-level linear mixing enhancement unit 302 is configured to randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; the region-level affine stitching enhancement unit 303 is configured to stitch the images through a cut-and-paste operation using a region-level affine stitching enhancement strategy and mix the labels according to the area ratio; the data set expansion unit 304 is configured to extract the output class activation heat map through the enhanced class gradient activation visualization strategy, perform secondary mixed enhancement of image and label fusion, form a secondary mixed image enhancement data set, and expand the original data set.
Referring now to FIG. 4, shown is a block diagram of a computer system 400 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable storage medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being incorporated into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: prepare a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map; randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy; stitch images through cut-and-paste operations using a region-level affine stitching enhancement strategy and mix labels according to area ratio; and extract an output class activation heat map through an enhanced class-gradient-activation visualization strategy, perform secondary mixing enhancement with image and label fusion, and form a secondary mixed image enhancement data set for expanding the original data set.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A data enhancement method for discriminating the nature of crowd activity, characterized by comprising the following steps:
S1: preparing a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map;
S2: randomly extracting a data pair from the crowd activity training data set, and mixing the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
S3: stitching images through cut-and-paste operations using a region-level affine stitching enhancement strategy, and mixing labels according to area ratio;
S4: extracting an output class activation heat map through an enhanced class-gradient-activation visualization strategy, performing secondary mixing enhancement with image and label fusion, and forming a secondary mixed image enhancement data set for expanding the original data set.
2. The method of claim 1, wherein the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i), i = 0, 1, ..., N-1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is an image label.
3. The method of claim 2, wherein S2 specifically comprises: randomly extracting a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; setting two parameters b_1, b_2 and drawing two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mixing the image and the label by linear combination: I_M1 = γ_1 × T_s(I_1) + (1 − γ_1) × T_s(I_2); U_a = γ_1, U_b = 1 − γ_1; Y_M1 = U_a × Y_1 + U_b × Y_2; wherein I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
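The pixel-level linear mixing of claim 3 can be sketched as follows (illustrative only, not part of the claim language; it assumes C×H×W NumPy image arrays and one-hot label vectors, and omits the same-type augmentation T_s):

```python
import numpy as np

def pixel_level_linear_mix(img1, y1, img2, y2, b1=1.0, b2=1.0, rng=None):
    """Mixup-style pixel-level linear blend of two (image, label) pairs.

    gamma_1 is drawn from Beta(b1, b2); the same weight mixes image and
    label, mirroring I_M1 and Y_M1 in claim 3.  T_s (the same-type random
    augmentation that equalizes scale) is omitted for brevity.
    """
    rng = rng or np.random.default_rng()
    gamma1 = rng.beta(b1, b2)
    # I_M1 = gamma_1 * I_1 + (1 - gamma_1) * I_2
    mixed_img = gamma1 * img1 + (1.0 - gamma1) * img2
    # Y_M1 = U_a * Y_1 + U_b * Y_2 with U_a = gamma_1, U_b = 1 - gamma_1
    mixed_lbl = gamma1 * y1 + (1.0 - gamma1) * y2
    return mixed_img, mixed_lbl
```

Because the same weight is applied to image and label, mixing two one-hot labels always yields a valid soft label whose entries sum to 1.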
4. The method of claim 3, wherein S3 is specifically expressed as:
[equation image FDA0003795694330000011]
Q_a = 1 − γ_2, Q_b = γ_2; Y_M2 = Q_a × Y_1 + Q_b × Y_2; wherein I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
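The region-level affine stitching of claim 4 can be sketched as a CutMix-style cut-and-paste (illustrative only, since the stitched-image formula itself is in the equation image; box side lengths are scaled by √γ_2 so the pasted area ratio is approximately γ_2, and T_s is again omitted):

```python
import numpy as np

def region_affine_stitch(img1, y1, img2, y2, gamma2, rng=None):
    """Cut a random box from img2 and paste it onto img1.

    gamma2 is the target area ratio of the pasted box; labels mix by
    area: Y_M2 = Q_a * Y_1 + Q_b * Y_2 with Q_a = 1 - gamma2, Q_b = gamma2.
    """
    rng = rng or np.random.default_rng()
    _, H, W = img1.shape
    # side lengths scaled by sqrt(gamma2) so bh*bw / (H*W) ~= gamma2
    bh, bw = int(H * np.sqrt(gamma2)), int(W * np.sqrt(gamma2))
    top = rng.integers(0, H - bh + 1)
    left = rng.integers(0, W - bw + 1)
    out = img1.copy()
    out[:, top:top + bh, left:left + bw] = img2[:, top:top + bh, left:left + bw]
    qa, qb = 1.0 - gamma2, gamma2
    return out, qa * y1 + qb * y2
```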
5. The data enhancement method for discriminating the nature of crowd activity according to claim 4, wherein in S4 the enhanced class-gradient-activation visualization strategy extracts the output class activation heat map, specifically expressed as:
[equation images FDA0003795694330000012 and FDA0003795694330000019]
wherein [FDA0003795694330000013] is the class activation heat map derived for the c-th class and i, j denote the pixel coordinates; [FDA0003795694330000014] is the activation attention mask; [FDA0003795694330000015] are the adaptive coefficients; and [FDA0003795694330000016] is the k-th feature map. L^c is up-sampled so that its size is consistent with that of the input image, yielding [FDA0003795694330000017]; [FDA0003795694330000018] is then mapped to a semantic map whose pixels sum to 1.
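A hedged sketch of the claim-5 heat-map extraction (Grad-CAM flavour): the exact weighting in the patented formula is in the equation images, so the ReLU-weighted sum and nearest-neighbour up-sampling below are assumptions, as is the requirement that the output size divide evenly by the feature-map size:

```python
import numpy as np

def class_activation_semantic_map(feature_maps, alphas, out_hw):
    """Combine feature maps A^k with adaptive coefficients alpha^c_k into a
    class activation map, up-sample it to the input-image size, and
    normalize it to a semantic map whose pixels sum to 1.

    feature_maps: (K, h, w) array; alphas: (K,) coefficients;
    out_hw: (H, W) with H % h == 0 and W % w == 0 (assumption).
    """
    # assumed form: L^c_ij = ReLU(sum_k alpha^c_k * A^k_ij)
    cam = np.maximum((alphas[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # nearest-neighbour up-sampling to the input-image size
    H, W = out_hw
    h, w = cam.shape
    cam_up = cam[np.repeat(np.arange(h), H // h)][:, np.repeat(np.arange(w), W // w)]
    # map to a semantic map whose pixels sum to 1 (uniform fallback if all-zero)
    total = cam_up.sum()
    return cam_up / total if total > 0 else np.full((H, W), 1.0 / (H * W))
```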
6. The method of claim 5, wherein the image secondary mixing enhancement in S4 is specifically:
[equation image FDA0003795694330000021]
wherein [FDA0003795694330000022] and [FDA0003795694330000023] are two binary masks, comprising a random frame region with area ratio γ_3 and a random frame region with area ratio γ_4; TR_θ is a transformation function that converts the frame region of I_M2 to match the frame region of I_M1. The label fusion is: Y_Mix = C_a × Y_M1 + C_b × Y_M2, wherein C_a, C_b are the semantic weights of the secondary mixed label.
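A sketch of the claim-6 secondary mixing (illustrative only: the semantic weights C_a, C_b would come from the class activation heat maps, so the area-based weights below are a stand-in, and nearest-neighbour resizing stands in for the transformation TR_θ):

```python
import numpy as np

def secondary_mix(img_m1, y_m1, img_m2, y_m2, gamma3, gamma4, rng=None):
    """Cut a gamma4-ratio random box from img_m2, resize it (stand-in for
    TR_theta) to a gamma3-ratio random box of img_m1, and paste it there.
    """
    rng = rng or np.random.default_rng()
    _, H, W = img_m1.shape

    def rand_box(ratio):
        bh = max(1, int(H * np.sqrt(ratio)))
        bw = max(1, int(W * np.sqrt(ratio)))
        return rng.integers(0, H - bh + 1), rng.integers(0, W - bw + 1), bh, bw

    t1, l1, h1, w1 = rand_box(gamma3)   # target box in I_M1
    t2, l2, h2, w2 = rand_box(gamma4)   # source box in I_M2
    patch = img_m2[:, t2:t2 + h2, l2:l2 + w2]
    # nearest-neighbour resize of the source box to the target box
    ridx = (np.arange(h1) * h2) // h1
    cidx = (np.arange(w1) * w2) // w1
    patch = patch[:, ridx][:, :, cidx]
    out = img_m1.copy()
    out[:, t1:t1 + h1, l1:l1 + w1] = patch
    # area-based stand-in for the semantic weights C_a, C_b
    ca, cb = 1.0 - gamma3, gamma3
    return out, ca * y_m1 + cb * y_m2
```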
7. The method of claim 1, wherein the expanded data set in S4 consists of 35% data generated by the pixel-level linear mixing enhancement strategy, 35% data generated by the region-level affine stitching enhancement strategy, and 30% data generated by the image secondary mixing.
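The 35/35/30 split of claim 7 can be computed, for a desired number of augmented samples, as (illustrative; rounding the remainder into the secondary-mixing share is an assumption):

```python
def expansion_counts(n_new):
    """Split n_new augmented samples per the claimed 35/35/30 ratio."""
    n_mix = round(n_new * 0.35)           # pixel-level linear mixing
    n_stitch = round(n_new * 0.35)        # region-level affine stitching
    n_second = n_new - n_mix - n_stitch   # image secondary mixing (~30%)
    return n_mix, n_stitch, n_second
```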
8. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 7.
9. A data enhancement system for discriminating the nature of crowd activity, the system comprising:
a preparation unit, configured to prepare a crowd activity training data set and a pre-training model for discriminating crowd activity properties to generate a heat map;
a pixel-level linear mixing enhancement unit, configured to randomly extract a data pair from the crowd activity training data set and mix the image and the label by linear combination using a pixel-level linear mixing enhancement strategy;
a region-level affine stitching enhancement unit, configured to stitch images through cut-and-paste operations using a region-level affine stitching enhancement strategy and mix labels according to area ratio;
a data set expansion unit, configured to extract an output class activation heat map through an enhanced class-gradient-activation visualization strategy, perform secondary mixing enhancement with image and label fusion, and form a secondary mixed image enhancement data set for expanding the original data set.
10. The data enhancement system for discriminating the nature of crowd activity according to claim 9, wherein the pre-training model comprises Xception or SENet, and the crowd activity training data set is defined as {(I_i, Y_i), i = 0, 1, ..., N-1}, where I_i ∈ R^(3×W×H) is a standard RGB image and Y_i is an image label.
11. The data enhancement system for crowd activity property discrimination according to claim 10, wherein the pixel-level linear mixing enhancement unit is specifically configured to: randomly extract a data pair {(I_1, Y_1), (I_2, Y_2)} from the crowd activity training data set; set two parameters b_1, b_2 and draw two pairs of proportion parameters (γ_1, γ_2), (γ_3, γ_4) from a Beta distribution Beta(b_1, b_2); and mix the image and the label by linear combination: I_M1 = γ_1 × T_s(I_1) + (1 − γ_1) × T_s(I_2); U_a = γ_1, U_b = 1 − γ_1; Y_M1 = U_a × Y_1 + U_b × Y_2; wherein I_M1 is the mixed image, Y_M1 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
12. The data enhancement system for crowd activity property discrimination according to claim 11, wherein the region-level affine stitching enhancement unit is specifically expressed as:
[equation image FDA0003795694330000031]
Q_a = 1 − γ_2, Q_b = γ_2; Y_M2 = Q_a × Y_1 + Q_b × Y_2; wherein I_M2 is the stitched image, Y_M2 is the corresponding mixed label, and T_s is a random same-type data enhancement function satisfying the fusion-form scale requirement.
13. The data enhancement system for crowd activity property discrimination according to claim 12, wherein the data set expansion unit is specifically configured to: through the enhanced class-gradient-activation visualization strategy, extract the output class activation heat map, specifically expressed as:
[equation image FDA0003795694330000032]
wherein [FDA0003795694330000033] is the class activation heat map derived for the c-th class and i, j denote the pixel coordinates; [FDA0003795694330000034] is the activation attention mask; [FDA0003795694330000035] are the adaptive coefficients; and [FDA0003795694330000036] is the k-th feature map. L^c is up-sampled so that its size is consistent with that of the input image, yielding [FDA0003795694330000037]; [FDA0003795694330000038] is then mapped to a semantic map whose pixels sum to 1. The image secondary mixing enhancement is specifically:
[equation images FDA0003795694330000039 and FDA00037956943300000310]
wherein [FDA00037956943300000311] and [FDA00037956943300000312] are two binary masks, comprising a random frame region with area ratio γ_3 and a random frame region with area ratio γ_4; TR_θ is a transformation function that converts the frame region of I_M2 to match the frame region of I_M1. The label fusion is: Y_Mix = C_a × Y_M1 + C_b × Y_M2, wherein C_a, C_b are the semantic weights of the secondary mixed label.
14. The data enhancement system for crowd activity property discrimination according to claim 9, wherein the expanded data set consists of 35% data generated by the pixel-level linear mixing enhancement strategy, 35% data generated by the region-level affine stitching enhancement strategy, and 30% data generated by the image secondary mixing.
CN202210968789.4A 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties Pending CN115294529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210968789.4A CN115294529A (en) 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties


Publications (1)

Publication Number Publication Date
CN115294529A true CN115294529A (en) 2022-11-04

Family

ID=83830056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210968789.4A Pending CN115294529A (en) 2022-08-12 2022-08-12 Data enhancement method and system for distinguishing crowd activity properties

Country Status (1)

Country Link
CN (1) CN115294529A (en)

Similar Documents

Publication Publication Date Title
CN110458918B (en) Method and device for outputting information
CN109618222A (en) A kind of splicing video generation method, device, terminal device and storage medium
US20140204119A1 (en) Generating augmented reality exemplars
WO2020233166A1 (en) Comment data provision and display method, apparatus, electronic device, and storage medium
CN111275784B (en) Method and device for generating image
CN111681177B (en) Video processing method and device, computer readable storage medium and electronic equipment
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN112017257B (en) Image processing method, apparatus and storage medium
CN112839223B (en) Image compression method, image compression device, storage medium and electronic equipment
CN112102445B (en) Building poster manufacturing method, device, equipment and computer readable storage medium
CN113569740B (en) Video recognition model training method and device, and video recognition method and device
CN112651475B (en) Two-dimensional code display method, device, equipment and medium
US20130182943A1 (en) Systems and methods for depth map generation
CN113409188A (en) Image background replacing method, system, electronic equipment and storage medium
TW201020968A (en) System, method, and computer program product for preventing display of unwanted content stored in a frame buffer
CN115294529A (en) Data enhancement method and system for distinguishing crowd activity properties
CN112954452B (en) Video generation method, device, terminal and storage medium
CN111914850B (en) Picture feature extraction method, device, server and medium
US10997365B2 (en) Dynamically generating a visually enhanced document
Chang et al. Distortion-free data embedding scheme for high dynamic range images
Gibin et al. Collaborative mapping of London using google maps: the LondonProfiler
Ngo et al. Pixel-Wise Information in Fake Image Detection
CN107742096A (en) Obtain method and device, electronic equipment, the storage medium of characteristic chart information
CN113360797B (en) Information processing method, apparatus, device, storage medium, and computer program product
US9165339B2 (en) Blending map data with additional imagery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination