CN113010572B - Public digital life scene rule model prediction early warning method based on deep Bayesian network
- Publication number: CN113010572B
- Application number: CN202110292515.3A
- Authority: CN (China)
- Prior art keywords: data, user, information, early warning, network
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/285: Clustering or classification
- G06F40/216: Parsing using statistical methods
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
- G06Q10/0635: Risk analysis of enterprise or organisation activities
- G06Q50/26: Government or public services
- G06V40/168: Feature extraction; Face representation
- G06V40/174: Facial expression recognition
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a prediction and early warning method for public digital life scene rule models based on a deep Bayesian network. Multi-source heterogeneous data from key life scenes in public digital life are analyzed and extracted to generate a feature library of information elements and behavior elements. This feature library is combined with user digital portraits to construct a personalized rule mechanism, so that prediction and early warning can be carried out promptly and accurately for different key life scenes, providing strong support for intervention in advance. The method can be applied to public safety and health early warning, mental health early warning, campus cheating event early warning, and the like.
Description
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to a public digital life scene rule model prediction early warning method based on a deep Bayesian network.
Background
With the rapid iteration of Internet technologies such as cloud computing and big data, and with the continuous improvement of living standards, public demand for services such as basic education, public health, public transportation and elderly care keeps growing. Government departments at all levels are also considering how to innovate public service models in the "Internet Plus" era, promote the digitization of public life, and make daily life more convenient. In public digital life, once a problem occurs in a key life scene, such as an economic dispute or a fire, the interests of the people and social stability are seriously affected. Prediction and early warning in such key life scenes allow problems to be discovered in advance and heavy losses to be avoided. Accurate analysis and prediction in other life scenes, such as route planning and intelligent recommendation, bring great convenience and improve people's sense of well-being. How to carry out prediction and early warning promptly and accurately for different key life scenes, and thereby provide strong support for intervention in advance, is therefore a problem that urgently needs to be solved.
Existing prediction and early warning technologies still analyze and predict people's behavior from features of a single dimension or only a few dimensions, and therefore suffer from incomplete feature analysis and low prediction accuracy.
Chinese patent publication No. CN106709606A provides a personalized scene prediction method and apparatus. It first obtains the geographic location information of a user through a location service, where the geographic location information includes time-associated POI information. It then performs cluster analysis on all geographic location information of the user within a preset period to obtain a lifestyle trajectory vector sequence, and constructs a Markov transition matrix from that sequence. Finally, it obtains the current scene of the user and looks up the corresponding predicted scene in the Markov transition matrix.
Chinese patent publication No. CN107967578A provides a public safety big data early warning platform for a smart city, comprising an early warning system, a communication module, a cloud data platform and an information receiving terminal. The early warning system comprises a natural disaster early warning system, an accident disaster early warning system, a public health event early warning system and a social safety event early warning system. The information receiving terminal is a PC terminal or a mobile terminal, each of which displays early warning information through an early warning application program interface.
Chinese patent publication No. CN109711613A provides an early warning method and system based on a personnel relationship model and an event association model. The method extracts model information data from public safety big data and filters it; performs statistical analysis on the model information data according to personnel identity data, and extracts the persons who have reported events multiple times to create a personnel relationship model; extracts semantic elements from the model information data according to event data, and extracts the events reported multiple times to create an event relationship model; sets a personnel early warning threshold according to the number of times one person reports an event; and sets an event early warning threshold according to the number of times several people report the same event, issuing warnings for the persons and events that exceed the thresholds.
In conclusion, a high-quality early warning system should make accurate and timely predictions and warnings for different key life scenes. It should also fuse the multi-dimensional attributes of users, break through the limitation of analyzing only one or a few dimensions, associate the attributes of the different dimensions, and apply a processing method suited to the characteristics of each dimension, so that early warning becomes more timely and accurate.
Disclosure of Invention
In view of the above, the invention provides a public digital life scene rule model prediction and early warning method based on a deep Bayesian network, which can make accurate and timely predictions and warnings for different key life scenes and provide strong support for intervention in advance.
A public digital life scene rule model prediction and early warning method based on a deep Bayesian network comprises the following steps:
(1) Obtaining massive multi-source heterogeneous data through three access channels, namely the Internet of Things, application terminals and business systems, and establishing a database;
(2) Layering the database, and constructing subject databases for five basic elements, namely people, enterprises, places, events and things;
(3) Processing the multi-source heterogeneous data with a batch-stream big data real-time processing technology;
(4) Combining the five basic element subject databases with a specific application scene to construct the five dimensions of the user digital portrait in that scene: demographic attributes, life attributes, social attributes, consumption characteristics and psychological attributes;
(5) Constructing the user digital portrait from the processed multi-source heterogeneous data by mining and analyzing user labels;
(6) For the specific application scene, training a deep Bayesian network with the user digital portrait information to obtain an event risk prediction model for that scene, and then using the model to predict and warn of the risks present in a target event.
Further, the multi-source heterogeneous data in step (1) include structured data and unstructured data. The structured data comprise basic data, such as house and address information, and extended data, such as vehicle entry and exit information and Internet of Things perception information. The unstructured data comprise life event information collected by personnel, and video monitoring data, audio data and image data collected by devices such as cameras.
Further, the batch-stream big data real-time processing technology in step (3) comprises five functional modules: data acquisition, data loading, data bus, data analysis and business service. The data acquisition module accesses stream data in real time through Internet of Things acquisition and application-side acquisition. The data loading module loads historical offline data and accesses stream data from the business system. The data bus module puts the various data into specified channels for transmission in a uniform format. The data analysis module extracts and processes real-time data and pushes product data; when it receives a real-time query request from the business system, it uses its internal analysis and processing model to calculate and evaluate the corresponding indicators on the complete big data set in real time, and the result is fed back to the business system through the business service module.
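The data bus module's role of placing heterogeneous records into specified channels in a uniform format can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the envelope fields (`source`, `kind`, `ts`, `payload`), the channel naming convention and the `DataBus` class are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict


@dataclass
class Envelope:
    """Uniform record format (hypothetical field names)."""
    source: str   # e.g. "iot", "app", "business_system"
    kind: str     # "stream" for real-time data, "batch" for historical data
    ts: float     # event timestamp in seconds
    payload: Dict[str, Any] = field(default_factory=dict)


class DataBus:
    """Routes envelopes into named channels, one FIFO queue per channel."""

    def __init__(self) -> None:
        self.channels: Dict[str, Deque[Envelope]] = defaultdict(deque)

    def publish(self, env: Envelope) -> str:
        # The channel name is derived from kind and source (an assumed convention).
        channel = f"{env.kind}.{env.source}"
        self.channels[channel].append(env)
        return channel

    def consume(self, channel: str) -> Envelope:
        # Pops the oldest envelope; raises IndexError if the channel is empty.
        return self.channels[channel].popleft()


bus = DataBus()
bus.publish(Envelope("iot", "stream", 1.0, {"gate": "north", "card_id": "A1"}))
bus.publish(Envelope("business_system", "batch", 0.5, {"table": "consumption"}))
first = bus.consume("stream.iot")
```

Keeping stream and batch records in the same envelope lets a downstream analysis step treat both paths identically, which is one plausible reading of the "uniform format" requirement.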
Further, the demographic attributes in step (4) describe the basic social characteristics of the user and help each application scene focused on daily life to understand the user's basic situation (specifically name, gender, grade and major, student number, dormitory number, height, age, marital status, contact information, occupation and the like). The life attributes capture the user's living conditions, including the range of daily activities (dining halls, teaching buildings, dormitory buildings, shopping malls, bus stations, railway stations and the like) and travel modes (bicycles, shared bicycles, electric vehicles, buses, private cars and the like), so that accurate services can be provided later. The social attributes describe the user's social graph, family members, circle of friends and interests (specifically roommates, classmates, students, teachers, intimacy, a liking for the library and the like). This information usually represents the user's social relationship network, and social information makes it possible to know the user as completely as possible and thus to provide personalized services. The consumption characteristics describe the user's main consumption habits and preferences (car ownership, shopping types, purchase cycles, brand preferences and the like); they are used to mine potential users of related consumption services and to recommend related products and services according to the user's consumption characteristics, improving the recommendation conversion rate. The psychological attributes focus on the user's psychological condition (personality, abilities, temperament, values, emotions, thinking and the like). The psychological condition is obtained through anonymous questionnaires or by clustering similar users, and corresponding psychological services or closer attention are provided accordingly.
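One way to hold the five portrait dimensions described above is a small record type. The sketch below is an assumption made for illustration: the class name `UserPortrait`, the choice of plain string dictionaries and the example values are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five dimensions named in step (4).
DIMENSIONS = ("demographic", "life", "social", "consumption", "psychological")


@dataclass
class UserPortrait:
    """User digital portrait with one label dictionary per dimension."""
    user_id: str
    demographic: Dict[str, str] = field(default_factory=dict)
    life: Dict[str, str] = field(default_factory=dict)
    social: Dict[str, str] = field(default_factory=dict)
    consumption: Dict[str, str] = field(default_factory=dict)
    psychological: Dict[str, str] = field(default_factory=dict)

    def missing_dimensions(self) -> List[str]:
        # A dimension counts as missing when it has no labels at all.
        return [d for d in DIMENSIONS if not getattr(self, d)]


portrait = UserPortrait(
    user_id="s001",
    demographic={"grade": "2", "major": "computer science"},
    life={"travel_mode": "shared bicycle"},
)
gaps = portrait.missing_dimensions()
```

Listing the empty dimensions explicitly makes it easy to see which parts of a portrait still have to be completed from other data.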
Further, in step (5), non-video data and video data in the multi-source heterogeneous data are handled by two different user label construction modes: one based on raw data mining and one based on video structuring technology. For non-video data, the mode based on raw data mining fuses five methods: natural language processing, user intention recognition, association rules, cluster analysis and trajectory similarity. Where data for a specific dimension of a specific user are missing, a collaborative filtering algorithm completes the features by analyzing other similar users, which ensures the completeness of the user labels. For video data, the mode based on video structuring technology integrates three methods: target detection, OpenCV + CNN emotion recognition and GaitSet gait recognition.
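The completion of a missing label from similar users can be sketched with a simple user-based collaborative filter. The patent does not specify the similarity measure or voting scheme, so Jaccard similarity over known tags and a similarity-weighted vote among the `k` nearest users are assumptions here, as are the function names and the example tags.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fill_missing(
    target_tags: Set[str],
    neighbours: List[Tuple[Set[str], str]],
    k: int = 2,
) -> Optional[str]:
    """Predict the target's missing label from the k most similar users.

    Each neighbour is (known tags, value of the label the target lacks).
    Returns None when no neighbour has any similarity to the target.
    """
    scored = sorted(
        ((jaccard(target_tags, tags), label) for tags, label in neighbours),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes: Counter = Counter()
    for similarity, label in scored:
        votes[label] += similarity
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]


target = {"library", "canteen", "shared_bicycle"}
others = [
    ({"library", "canteen", "bus"}, "study-oriented"),
    ({"library", "shared_bicycle", "canteen"}, "study-oriented"),
    ({"mall", "bus"}, "shopping-oriented"),
]
predicted = fill_missing(target, others)
```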
Furthermore, the natural language processing uses the TF-IDF algorithm to calculate the similarity between texts, then uses a fastText classifier to classify the texts according to that similarity. Finally, word vectors are extracted from the texts with Word2Vec, fused into sentence vectors with an LSTM, and input into a pre-trained recurrent neural network or recursive neural network, so as to predict and analyze the emotion expressed by similar texts.
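The first stage, TF-IDF text similarity, can be sketched in plain Python. The patent does not state which TF-IDF weighting is used, so the smoothed IDF below is one common variant chosen as an assumption; the pre-tokenized example texts are made up, and the fastText, Word2Vec and LSTM stages are not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors for tokenized documents.

    Uses relative term frequency and the smoothed IDF
    ln((1 + n) / (1 + df)) + 1, an assumed variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


posts = [
    ["exam", "stress", "tired"],
    ["exam", "stress", "anxious"],
    ["canteen", "lunch", "good"],
]
vecs = tfidf_vectors(posts)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

The two exam-related posts share weighted terms and score above zero, while the canteen post shares none with the first and scores exactly zero.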
Furthermore, user intention recognition judges the behavioral intention of the user from the user's search records or from previously analyzed user labels. In the specific implementation, the data are vectorized with the TF-IDF algorithm, features are selected by word frequency, chi-square and mutual information, and the behavioral intention is finally judged with a pre-trained CART (Classification and Regression Trees) decision tree, a random forest containing several decision trees, logistic regression or a Bayesian model.
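The chi-square feature selection step can be illustrated with the standard 2x2 contingency statistic between a term and an intent class. This is a sketch under assumptions: the search records, the intent labels `seek_help` and `daily`, and the function names are hypothetical, and the downstream classifier is not shown.

```python
from typing import List, Set


def chi_square(docs: List[Set[str]], labels: List[str], term: str, target: str) -> float:
    """Chi-square statistic between presence of `term` and membership in `target`."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1          # term present, target class
        elif has_term:
            b += 1          # term present, other class
        elif in_class:
            c += 1          # term absent, target class
        else:
            d += 1          # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs: List[Set[str]], labels: List[str], target: str, top_k: int) -> List[str]:
    """Return the top_k terms ranked by chi-square score for the target class."""
    vocab = sorted({term for doc in docs for term in doc})
    return sorted(vocab, key=lambda t: chi_square(docs, labels, t, target), reverse=True)[:top_k]


searches = [
    {"psychological", "counselling"},
    {"insomnia", "counselling"},
    {"canteen", "menu"},
    {"bus", "timetable"},
]
intents = ["seek_help", "seek_help", "daily", "daily"]
score = chi_square(searches, intents, "counselling", "seek_help")
best = select_features(searches, intents, "seek_help", top_k=1)
```

Here "counselling" appears in every `seek_help` record and in no other, so it receives the highest score and is selected first.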
Furthermore, association rules are used to discover associations among data that appear irregular on the surface, and thereby to find regularities and development trends in the data; in the specific implementation, the Apriori algorithm or the FP-Growth algorithm is used. Cluster analysis groups similar data into one class so that, in principle, the similarity within each class is maximal; as an unsupervised algorithm, clustering is suitable for analyzing high-dimensional data. Trajectory similarity analyzes behavior trajectories in the time domain and the space domain, mines the user's daily behavior patterns and preferences from historical trajectories, and labels them.
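The Apriori option for association rules can be sketched as a level-wise search for frequent itemsets. The behavior items in the example transactions are hypothetical, and rule generation from the frequent itemsets is left out for brevity.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Keep only candidates that meet the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


records = [
    {"late_return", "skip_class", "night_net"},
    {"late_return", "night_net"},
    {"late_return", "skip_class"},
    {"canteen"},
]
frequent_sets = apriori(records, min_support=0.5)
```

With these four records, the pair `{late_return, night_net}` has support 0.5, and since `night_net` alone also has support 0.5, the rule "night_net implies late_return" would have confidence 1.0.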
Further, the OpenCV + CNN emotion recognition detects the expression state of faces in video images. The specific implementation first detects and locates the face, then extracts facial expression features, and finally uses a pre-trained convolutional neural network (CNN) to classify and judge the facial expression.
Further, the GaitSet gait recognition detects the walking posture of a person in video images. In the specific implementation, the images are first input into a convolutional neural network (CNN) to extract features; a combination of several pooling modes then aggregates the image features into one feature vector, and Horizontal Pyramid Pooling (HPP) is used to make the features more discriminative. A two-stream method is used in the prediction calculation, that is, two channels are adopted: one is an RGB image channel that models spatial information, and the other is an optical flow channel that models temporal information with an RNN. The two channels are trained jointly and their information is fused. Finally, the features are input into the trained model to realize gait recognition.
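The two pooling steps, aggregating per-frame features into one set-level feature and then applying Horizontal Pyramid Pooling, can be sketched on toy data. Several simplifications are assumed here: the CNN and the two-stream fusion are omitted, each frame's feature map is reduced to rows by channels, set pooling is an element-wise maximum, and each horizontal strip is summarized as max plus mean per channel. The type alias and function names are hypothetical.

```python
from typing import List, Sequence

# H rows x C channels; the width axis is assumed already collapsed.
FeatureMap = List[List[float]]


def set_pooling(frames: List[FeatureMap]) -> FeatureMap:
    """Element-wise max over all frames, so frame order does not matter."""
    rows, channels = len(frames[0]), len(frames[0][0])
    return [
        [max(frame[i][j] for frame in frames) for j in range(channels)]
        for i in range(rows)
    ]


def horizontal_pyramid_pooling(fmap: FeatureMap, scales: Sequence[int] = (1, 2)) -> List[List[float]]:
    """Split the map into s horizontal strips per scale and pool each strip.

    Each strip yields one vector: per-channel max plus per-channel mean.
    Assumes the number of rows is divisible by every scale.
    """
    rows, channels = len(fmap), len(fmap[0])
    pooled = []
    for s in scales:
        step = rows // s
        for k in range(s):
            strip = fmap[k * step:(k + 1) * step]
            pooled.append([
                max(row[j] for row in strip) + sum(row[j] for row in strip) / len(strip)
                for j in range(channels)
            ])
    return pooled


frames = [
    [[1, 0], [0, 2]],   # frame 1: 2 rows x 2 channels
    [[0, 3], [4, 0]],   # frame 2
]
aggregated = set_pooling(frames)
strips = horizontal_pyramid_pooling(aggregated)
```

Because the maximum is taken over the whole set of frames, reversing or shuffling the frames gives the same aggregated feature, and the strip-wise vectors keep coarse information about which body region a response came from.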
Further, the training and prediction process of the deep Bayesian network in step (6) is as follows. First, the user digital portrait information in the specific application scene is analyzed to obtain the information elements and behavior elements related to the event and to understand the association relationships among these elements, and a feature sample library is established from the information elements and behavior elements of the event. The feature samples are then combined with expert opinions (which serve as the ground truth) to determine the prior probability of each network node, that is, the initial evidence of the risk probability. The feature samples and the initial evidence are input into the network structure, and the conditional probability distributions of the non-root nodes are inferred with the EM (expectation-maximization) algorithm. Finally, based on Bayes' rule, the prior probability and the conditional probabilities are converted into a posterior probability, which is the predicted probability that the target event is at risk of occurring.
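The final conversion from prior and conditional probabilities to a posterior can be sketched for the simplest case: one risk node whose observed child elements are conditionally independent given the risk. This structure, the node names and all probability values are assumptions for illustration; in the method described above the conditional probabilities would come from the EM step rather than being set by hand.

```python
from typing import Dict, Tuple


def posterior_risk(
    prior: float,
    cpts: Dict[str, Tuple[float, float]],
    evidence: Dict[str, bool],
) -> float:
    """Posterior P(risk | evidence) by Bayes' rule.

    prior:    P(risk = True)
    cpts:     node -> (P(node = True | risk), P(node = True | no risk))
    evidence: node -> observed value; unobserved nodes are simply left out.
    Assumes the evidence has non-zero probability under the model.
    """
    joint_risk, joint_safe = prior, 1.0 - prior
    for node, observed in evidence.items():
        p_given_risk, p_given_safe = cpts[node]
        joint_risk *= p_given_risk if observed else 1.0 - p_given_risk
        joint_safe *= p_given_safe if observed else 1.0 - p_given_safe
    return joint_risk / (joint_risk + joint_safe)


# Hypothetical numbers: the expert prior and the conditional tables are made up.
cpts = {
    "late_return": (0.8, 0.2),
    "consumption_drop": (0.6, 0.1),
}
p = posterior_risk(0.1, cpts, {"late_return": True, "consumption_drop": True})
```

With both elements observed, the joint weights are 0.1 x 0.8 x 0.6 = 0.048 for the risk case and 0.9 x 0.2 x 0.1 = 0.018 otherwise, so the posterior is 0.048 / 0.066, roughly 0.73, compared with a prior of 0.1.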
In the public digital life scene rule model prediction and early warning method of the invention, multi-source heterogeneous data from key life scenes in public digital life are analyzed and extracted to generate a feature library of information elements and behavior elements, which is combined with user digital portraits to construct a personalized rule mechanism. Prediction and early warning can thus be carried out promptly and accurately for different key life scenes, providing strong support for intervention in advance.
Drawings
Fig. 1 is a schematic flow diagram of the public digital life scene rule model prediction and early warning method of the invention.
Fig. 2 is a schematic diagram of the basic element subject databases for public digital life data.
Fig. 3 is a schematic diagram of the data processing flow of the batch-stream big data real-time processing module of the invention.
Fig. 4 is a diagram of the user portrait construction framework of the invention.
Fig. 5 is a schematic diagram of the personalized feature model construction framework of the invention.
Fig. 6 is a schematic diagram of the technical route of the event anomaly prediction and early warning model of the invention.
Fig. 7 is a schematic diagram of the risk assessment process for various events according to the invention.
Fig. 8 is a schematic diagram of the technical route of the public safety early warning of the invention.
Fig. 9 is a diagram of the Bayesian network structure of the invention.
Fig. 10 (a) is a diagram of the Bayesian network for social interaction by class according to the invention.
Fig. 10 (b) is a diagram of the Bayesian network for social interaction by gender according to the invention.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
The general process of the present invention is shown in Fig. 1, and can be applied to scenes such as campuses, residential districts, parks, and rural areas. The public digital life scene rule model and the prediction and early-warning method based on the deep Bayesian network are introduced below, taking a campus scene as a specific example. The specific process is as follows:
(1) Accessing multi-source heterogeneous data. Multi-source heterogeneous data has two main characteristics: first, the data comes from multiple sources, such as images collected by cameras, pedestrian gates, and vehicle gates, and data accessed from the systems of government departments; second, the data types and forms are complex, that is, heterogeneous. The data falls into two types, structured and unstructured. Structured data takes basic information such as houses and addresses as its base data, extended with face data, vehicle access data, and Internet of Things sensing data. Unstructured data includes life event information collected by personnel, and video surveillance data, audio data, and image data collected by devices such as cameras. In the campus scene, this embodiment accesses massive multi-source heterogeneous data from Internet of Things devices such as cameras, pedestrian gates, and vehicle gates; from mobile sources such as WeChat, microblog, and GPS; and from business systems, including campus one-card data, student registration data, access records, consumption records, and campus Wi-Fi access logs.
(2) Constructing the basic element subject library. The data is decomposed by dimension to build a subject database of five basic elements: people, enterprises, places, affairs, and things, as shown in Fig. 2. In the campus scene, "people" can be refined into students, teaching staff, parents, visitors, and so on; "enterprises" into supermarkets, canteens, print shops, glasses shops, and so on; "affairs" into student entry and exit records, stranger access records, infectious disease conditions, and so on; and "places" into the library, dining halls, teaching buildings, and so on.
(3) Data processing. This embodiment combines a batch big data computing framework with a streaming big data computing framework to build a batch-stream big data real-time processing module, so that massive data files can be processed in parallel in real time.
The data processing flow of the batch-stream big data real-time processing module is shown in Fig. 3. Internally the module is divided into submodules for data acquisition, data loading, data bus, data analysis, and business service. The data acquisition submodule accesses stream data in real time through Internet of Things acquisition, application-side acquisition, and similar means; the data loading submodule loads historical offline data and accesses stream data from specific business systems; the data bus submodule places all data into designated channels for transmission in a uniform format; the data analysis submodule extracts and processes real-time data and pushes the resulting data. When the module receives a real-time query request from a business system, it computes the corresponding index over the complete data set in real time according to the analysis and processing model in the data analysis submodule, evaluates the index, and returns the result to the business system through the business service submodule.
(4) Constructing the dimensions of the user portrait. The data in the basic element subject library is combined closely with the campus scene and, as shown in Fig. 4, five dimensions of the user portrait in the campus scene are proposed: demographic attributes, life attributes, social attributes, consumption characteristics, and psychological attributes. Specifically:
The demographic attributes describe the user's basic social-level characteristics and help each key life application scene understand the user's basic situation. They include: name, gender, grade and major, student number, dormitory number, height, age, marital status, contact information, occupation, and so on.
The life attributes capture the user's living conditions so that accurate services can be provided later. They include the range of daily activity and the travel mode. The range of daily activity includes dining halls, teaching buildings, dormitory buildings, shopping malls, bus stations, railway stations, and so on; travel modes include bicycles, shared bicycles, electric vehicles, buses, self-driving, and so on.
The social attributes describe the user's social graph, family members, circle of friends, interests and hobbies, and so on. This information usually represents the user's social relationship network and allows the user to be understood as completely as possible, so that personalized services can be provided. They include: roommates, classmates, students, teachers, close relationships, a liking for the library, and so on.
The consumption characteristics describe the user's main consumption habits and preferences. Recommending related products and services to potential users according to their consumption characteristics gives a high conversion rate. They include: car ownership, shopping types, purchase cycles, brand preferences, and so on.
The psychological attributes focus on the user's psychological condition, such as character, ability, temperament, values, emotion, and thinking. The user's psychological condition is obtained through anonymous questionnaires or by clustering similar users, and corresponding psychological services or special attention are provided accordingly.
(5) Constructing the user digital portrait. Depending on whether the data is non-video data or video data, two ways of constructing user portrait labels are proposed: label construction based on mining the original data, and label construction based on video structuring technology, as shown in Fig. 4.
For non-video data, comprehensive analysis and calculation are carried out on data of the five element topic libraries by using Natural Language Processing (NLP), clustering, classifying and association rule algorithms in a data mining algorithm, differences of behavior rules of different user groups are mined, and tags are marked for users.
Through the non-video data, detailed information of the user trip, such as behavior mode and dressing information, cannot be directly acquired. Therefore, to address this issue, the present example employs a video structuring technique that combines both traditional algorithms and deep learning algorithms.
The video structuring technology is that the video is extracted to obtain key information of different levels through algorithms in the fields of video image processing technology, text analysis technology and the like, corresponding semantic description is carried out on the key information of different levels, and finally the key video image information and the corresponding semantic information are structurally stored through video standardized description, so that the key information of the video can be conveniently recorded and retrieved. The method mainly relates to the technologies of target detection, behavior recognition, emotion recognition and the like, so that the information in the video image can be effectively expressed, and a corresponding descriptive sentence, namely a text label, can be generated for each image; for the attributes which are insufficient in data and difficult to determine, the embodiment performs complementation according to the corresponding attributes of similar users through a collaborative filtering algorithm.
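The completion of hard-to-determine attributes from similar users can be illustrated with a minimal user-based collaborative filtering sketch. The patent does not specify the similarity measure or neighborhood size; cosine similarity, `k=2`, and the numeric attribute encoding below are assumptions made only for illustration, and the names `cosine` and `impute` are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity computed over the attributes both users have."""
    shared = [key for key in u if key in v]
    if not shared:
        return 0.0
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = sqrt(sum(u[key] ** 2 for key in shared))
    norm_v = sqrt(sum(v[key] ** 2 for key in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def impute(target, others, attr, k=2):
    """Similarity-weighted average of `attr` over the k most similar users that have it."""
    scored = [(cosine(target, other), other[attr]) for other in others if attr in other]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(score for score, _ in top)
    if total <= 0:
        return None  # no similar user carries the attribute
    return sum(score * value for score, value in top) / total

# Hypothetical label scores in [0, 1]; the target lacks the "diligent" label.
target = {"library": 1.0, "sports": 0.0}
others = [
    {"library": 1.0, "sports": 0.0, "diligent": 1.0},
    {"library": 0.0, "sports": 1.0, "diligent": 0.0},
]
estimate = impute(target, others, "diligent")
```

A weighted average keeps the imputed value inside the range of the neighbors' values, which suits label scores; categorical labels would need a weighted vote instead.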
Working mainly from the student's perspective, this example constructs rich and varied student portraits, with labels such as "top student", "struggling student", "athlete", "diligent", and "extroverted".
(6) Constructing the deep Bayesian network rule model based on event characteristics. First, the user digital portrait information in the campus scene is analyzed to obtain the information elements and behavior elements related to an event, which support the construction of an event feature model, as shown in Fig. 5. The information elements include time information, place information, track information, character information, learning achievement, and so on; the behavior elements include purchase, travel, communication, stay, and so on. For each type of key life scene, ontology analysis of the events extracts, as far as possible, the information elements and behavior elements specific to that type of event in virtual space, real space, and even the space of thought; the common characteristics and common behaviors of that type of event are generalized from the analysis of many similar events; and a feature library of the information and behavior elements specific to that type of event is formed, supporting risk prediction and early-warning analysis of campus life scenes.
(7) Prediction and early-warning analysis. The behaviors of event objects at different stages show abnormal characteristics: on the one hand, they are abnormal compared with the behavior of most ordinary people; on the other hand, they are abnormal compared with the object's own daily behavior. The data of the target object in virtual and real space is analyzed, including basic information, communication behavior, network behavior, economic behavior, consumption traces, accommodation traces, and so on. As shown in Fig. 6, this embodiment analyzes the behavior habits of the target object, compares its actual behavior with its own daily behavior and with the behavior of other ordinary people, and uses a deep Bayesian network for comprehensive judgment to identify abnormal behaviors and support the perception of abnormal events.
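The two comparisons described above (against the object's own history and against other people) can be sketched with a simple z-score check. This is only a stand-in for the comparison step, not the deep Bayesian judgment itself; the threshold of 2.0, the function names, and the sample counts are assumptions for illustration.

```python
from statistics import mean, pstdev

def z_score(value, reference):
    """Distance of `value` from the reference mean, in standard deviations."""
    spread = pstdev(reference)
    return 0.0 if spread == 0 else (value - mean(reference)) / spread

def abnormal_flags(today, own_history, peers_today, threshold=2.0):
    """Flag a behavior count as abnormal versus the person's own past and versus peers."""
    return {
        "vs_self": abs(z_score(today, own_history)) > threshold,
        "vs_peers": abs(z_score(today, peers_today)) > threshold,
    }

# Hypothetical daily canteen visits: the student usually goes 3 to 4 times, today 0.
flags = abnormal_flags(today=0, own_history=[3, 3, 4, 3, 3, 4, 3], peers_today=[0, 1, 0, 1, 0])
```

In this toy data the value is unusual for the student but ordinary among peers, which is why the text treats the two comparisons as separate signals.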
In constructing the deep Bayesian network rule model, attention is focused on several types of events with a high probability of occurrence and a serious negative impact, such as public safety and health anomalies, campus deception events, and mental health anomalies. Prediction and early-warning analysis is carried out with a deep Bayesian network. The basic principle is that, given the prior probability and the form of the conditional probability density, the conditional probability density function is inferred through statistical learning on samples to address the uncertainty of the various event risks, and Bayes' rule is then used to convert it into a posterior probability.
The deep Bayesian network is a description of the probabilistic relationships of uncertain knowledge. It combines classical probability theory and graph theory, so it has both the solid mathematical foundation of probability theory and the intuitive expressiveness of graph theory. In a deep Bayesian network, once the state of any node is determined, Bayes' rule can be used for forward or backward reasoning in the network to obtain the posterior probability of any other node; this is the key mechanism for building a prediction and early-warning system on the deep Bayesian network.
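Backward reasoning with Bayes' rule can be shown on the smallest possible network, a root node with one observed child. The node names and probability values below are invented for illustration and do not come from the patent.

```python
def posterior(prior, likelihood, evidence):
    """P(cause | evidence) for a root node with a single observed child node."""
    joint = {cause: prior[cause] * likelihood[cause][evidence] for cause in prior}
    total = sum(joint.values())
    return {cause: value / total for cause, value in joint.items()}

# Hypothetical prior over the root node and conditional table of the child node.
prior = {"risk": 0.1, "no_risk": 0.9}
likelihood = {
    "risk": {"abnormal": 0.8, "normal": 0.2},
    "no_risk": {"abnormal": 0.1, "normal": 0.9},
}
post = posterior(prior, likelihood, "abnormal")
```

Observing the child state "abnormal" raises the risk probability from the prior 0.1 to 8/17, because the evidence is eight times more likely under "risk" than under "no_risk".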
The construction of the prediction and early-warning model based on the deep Bayesian network comprises four steps, as shown in Fig. 7: (1) Based on the feature library of the information elements and behavior elements of the event, identify the association relationships among the event elements and construct the structure model of the deep Bayesian network. (2) Combine historical sample data with expert opinions to determine the prior probability of the network nodes, that is, the initial evidence of the risk probability. (3) Input the sample data and the initial evidence into the network structure model and infer the conditional probability distribution of the non-root nodes using a parameter learning algorithm. Because events occur dynamically and with uncertainty, the sample data often contains hidden variables that cannot be observed; this example therefore uses the expectation-maximization (EM) algorithm, an iterative algorithm for samples with missing values, for parameter learning, so that over multiple iterations the model parameters approach the maximum likelihood estimate and the conditional probability distribution is finally obtained. (4) Based on Bayes' rule, convert the prior probability and the conditional probability into the posterior probability, that is, the risk probability of the target event in the model.
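Step (3) can be illustrated with a minimal EM sketch on a two-node binary network H -> X in which some records lack the value of H. The network shape, the starting values, and the records are assumptions chosen to keep the example small; a real model would run the same E-step and M-step over every node with unobserved values.

```python
from math import log

def lik(a, h, x):
    """P(X = x | H = h), where a[h] = P(X = 1 | H = h)."""
    return a[h] if x == 1 else 1.0 - a[h]

def log_likelihood(records, p, a):
    """Observed-data log-likelihood; p = P(H = 1)."""
    total = 0.0
    for h, x in records:
        p1 = p * lik(a, 1, x)
        p0 = (1.0 - p) * lik(a, 0, x)
        total += log(p1 + p0) if h is None else log(p1 if h == 1 else p0)
    return total

def em(records, p=0.5, a=None, iterations=20):
    a = dict(a) if a else {0: 0.4, 1: 0.6}   # assumed starting values
    history = [log_likelihood(records, p, a)]
    xs = [x for _h, x in records]
    for _ in range(iterations):
        # E-step: w = P(H = 1 | record) under the current parameters
        w = []
        for h, x in records:
            if h is None:
                p1 = p * lik(a, 1, x)
                p0 = (1.0 - p) * lik(a, 0, x)
                w.append(p1 / (p1 + p0))
            else:
                w.append(float(h))
        # M-step: re-estimate the parameters from the expected counts
        p = sum(w) / len(w)
        a = {
            1: sum(wi * x for wi, x in zip(w, xs)) / sum(w),
            0: sum((1.0 - wi) * x for wi, x in zip(w, xs)) / sum(1.0 - wi for wi in w),
        }
        history.append(log_likelihood(records, p, a))
    return p, a, history

# Hypothetical records (h, x); None marks an unobserved value of H.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
           (None, 1), (None, 1), (None, 0), (None, 0)]
p_hat, a_hat, history = em(records)
```

The `history` list holds the log-likelihood after each iteration; it should never decrease, which is the property that lets the parameters approach the maximum likelihood estimate.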
Based on the prediction and early-warning model built on the deep Bayesian network, the anomaly early-warning function module in the campus scene displays students who may be at risk according to the judgment of the background model on the big data, and gives the key factors behind each anomaly through the graph model, which helps education administrators manage students in a timely and effective way. The anomalies are mainly divided into public safety and health anomalies, mental health anomalies, and event anomalies, for which public safety and health early warning, mental health early warning, and campus deception event early warning are carried out respectively.
Example 1 Public health and safety early warning
1.1 technical route
The traditional infectious disease outbreak risk prediction mainly comprises the following four aspects: (1) selecting infection types and regions of interest; (2) Selecting pathological, environmental and climatic factors related to the onset of infectious diseases; (3) Selecting a proper model to establish an infectious disease outbreak risk evaluation model; (4) And predicting the probability of the epidemic situation of the infectious disease under various conditions and verifying the accuracy of the established model. The embodiment is modified appropriately, and the specific technical route is shown in fig. 8.
Outbreak levels are classified mainly with the moving percentile method, and the selected risk factors mainly include meteorological factors, economic factors, and population density. Building the Bayesian model comprises four steps: data discretization, Bayesian structure learning, parameter learning, and network verification; when the verification result is unsatisfactory, the process returns to structure learning and the Bayesian network structure is rebuilt. Finally, an uncertainty analysis of the adopted method is carried out, covering the uncertainty of data processing, of the panel data cluster analysis, of the moving percentile method in classifying infection outbreak levels, and of the process of building the early-warning model based on the Bayesian network.
1.2 clustering algorithm based on spatio-temporal panel model
Panel data, also called mixed time-series and cross-section data, refers to sample data obtained by taking multiple cross sections along a time series. Panel data therefore has both time-series features and cross-sectional features, that is, features in both the temporal and spatial dimensions.
A general linear panel data regression model is:
$$y_{it} = X_{it}\beta + \mu_i + \varepsilon_{it}$$
where $i \in [1, 2, \ldots, N]$ indexes the N different spatial individuals, $t \in [1, 2, \ldots, T]$ indexes time, $y_{it}$ is the observed value of the dependent variable, $X_{it}$ is a K-dimensional row vector of explanatory variables, $\beta$ is a K-dimensional column vector of coefficients, $\mu_i$ represents the individual effect of the spatial unit, and $\varepsilon_{it}$ is a random error term.
If a phenomenon or attribute of one spatial unit is highly similar to that of another spatial unit, the two spatial units are spatially correlated. According to the number of indexes, spatial panel data is divided into single-index spatial panel data and multi-index spatial panel data. Single-index panel data is represented by a two-dimensional table or matrix whose entries are $X_i(t)$, where the population contains N samples, X is the characteristic index of each sample, T is the time length, and $X_i(t)$ is the index value of the i-th sample at time t.
Because real situations are complex, the object of actual research is often multi-index panel data, whose structure is more complex than that of traditional panel data; its temporal and spatial characteristics are usually represented by a three-dimensional table and can sometimes be represented in matrix form.
Assume a population sample X comprising N samples, each with P characteristic indexes, over a time length T. The multi-index panel sample X is then a three-dimensional array with entries $X_{ij}(t)$.
The sample X thus contains data in three dimensions: space (the N samples), time, and index. In the spatial dimension it can be represented as a group of spatial samples, that is, the three-dimensional table is unfolded along space into two-dimensional tables, $X_S = [X_1, \ldots, X_i, \ldots, X_N]^T$, where each spatial sample $X_i$ is the matrix of the values $X_{ij}(t)$ for that fixed i,
where $1 \le i \le N$, $1 \le j \le P$, $1 \le t \le T$, and $X_{ij}(t)$ is the value of the j-th index of the i-th sample at time t.
In the index dimension, the sample X can be represented as a group of indexes, that is, the three-dimensional table is unfolded into two-dimensional tables in index order, $X_V = [X_1, \ldots, X_j, \ldots, X_P]$, where each index $X_j$ is the matrix of the values $X_{ij}(t)$ for that fixed j.
In the time dimension, the sample X can be represented as a group of "ordered samples", that is, the three-dimensional table is unfolded chronologically into two-dimensional tables:
$$X_O = [X(1), \ldots, X(t), \ldots, X(T)]$$
where each ordered sample $X(t)$ is the matrix of the values $X_{ij}(t)$ at that fixed t.
The main numerical characteristics of the panel data are:
(1) the mean of the j-th index at time t;
(2) the mean of the j-th index over all samples and times;
(3) the variance of the j-th index at time t;
(4) the variance of the j-th index over all samples and times.
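The four numerical characteristics can be computed directly from a nested list `panel[i][j][t]`. The formulas are not reproduced in the text, so this sketch assumes the usual definitions: averages over the N samples (and over time for the overall versions) and sample variances with an n - 1 denominator. The function names and the toy panel are hypothetical.

```python
from statistics import mean, variance

# panel[i][j][t]: value of index j for sample i at time t
# (N samples, P indexes, T time points)

def mean_jt(panel, j, t):
    """Mean of index j at time t, taken over the N samples."""
    return mean([sample[j][t] for sample in panel])

def mean_j(panel, j):
    """Mean of index j over all samples and all time points."""
    return mean([value for sample in panel for value in sample[j]])

def var_jt(panel, j, t):
    """Sample variance (n - 1 denominator, an assumption) of index j at time t."""
    return variance([sample[j][t] for sample in panel])

def var_j(panel, j):
    """Sample variance of index j over all samples and all time points."""
    return variance([value for sample in panel for value in sample[j]])

# Toy panel with N = 2 samples, P = 2 indexes, T = 2 time points.
panel = [
    [[1.0, 2.0], [10.0, 20.0]],
    [[3.0, 4.0], [30.0, 40.0]],
]
```

If the source intended population variances, `statistics.pvariance` would replace `variance` without any other change.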
Compared with traditional time-series data or cross-sectional data alone, spatio-temporal panel data allows the situation in a future period to be predicted more accurately and quickly, and combining it with a Bayesian network can improve the accuracy of prediction and early warning under uncertainty.
1.3 Bayesian network-based space-time early warning algorithm
An infectious disease early-warning model based on the Bayesian network is established using existing knowledge. It mainly comprises data preprocessing, construction of the Bayesian network for infectious disease outbreak risk, calculation of the outbreak risk probability, and network verification. Constructing the Bayesian network is the crucial step and the key to the success of the early-warning model; once the network structure that best fits the actual incidence is found, the joint probability distribution of the nodes is calculated and the outbreak risk of the infectious disease is predicted.
An infectious disease is not caused by a single factor; many related epidemiological, economic, meteorological, or environmental factors usually act together. When these factors cannot all be obtained, only part of the data can be considered, so the factors most closely related to the outbreak and spread of the infectious disease must be identified and analyzed. Because the Bayesian model can only process categorical and discrete data, influencing factors that are continuous variables must be discretized. The equal-width method is adopted: the number of intervals is specified, and the value range is divided into that many sub-intervals of equal width to obtain the discretization result.
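A minimal sketch of equal-width discretization follows. The function name and the handling of the maximum value (clamped into the last interval) are choices made here, not details given in the text.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to an interval index in 0 .. n_bins - 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]   # a constant variable has a single interval
    width = (high - low) / n_bins
    # The maximum lies exactly on the upper edge, so clamp it into the last interval.
    return [min(int((value - low) / width), n_bins - 1) for value in values]

# Hypothetical average temperatures split into four intervals of width 2.5.
bins = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4)
```

Equal-width intervals are simple to specify but sensitive to outliers, since one extreme value stretches every interval.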
Network structure learning is then carried out using an algorithm based on independence tests, which mainly comprises the following steps:
(1) Initialize a graph structure $G = \langle V, E \rangle$, where V = {all attribute fields of the data set} is the node set, $E = \varnothing$, $S = \varnothing$, and $R = \varnothing$;
(2) For each node pair $(v_i, v_j)$ with $v_i, v_j \in V$ and $i \ne j$, calculate the mutual information $I(v_i, v_j)$; when the value is larger than a fixed threshold, add the pair to the set S, ordered by the size of the mutual information;
(3) Take the first node pair out of S and put the corresponding edge into the edge set E;
(4) Take the first node pair from the remainder of S; if there is no connecting path between its two nodes, add the corresponding edge to the edge set E, otherwise put the node pair into the set R;
(5) Repeat (4) until S is empty;
(6) Mark the first node pair in R;
(7) Take out that node pair and carry out a conditional independence test on it; if the two nodes are still dependent, add the corresponding edge to the edge set E;
(8) Repeat (6) and (7) until R is empty;
(9) For each edge in E, if another path besides this edge exists between its two nodes, temporarily delete the edge from E; then use a conditional independence test to check whether the two nodes are conditionally independent; if so, delete the edge permanently, otherwise add it back to E.
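Steps (1) to (5) above can be sketched as follows. The sketch covers only the first phase: the conditional independence tests of steps (6) to (9) are omitted, pairs are assumed to be processed from largest to smallest mutual information, and the threshold and sample columns are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def draft_edges(columns, threshold):
    """Return (E, R): edges added in steps (3)-(5) and pairs deferred to step (6)."""
    scored = [(mutual_information(columns[a], columns[b]), a, b)
              for a, b in combinations(sorted(columns), 2)]
    S = sorted((item for item in scored if item[0] > threshold), reverse=True)
    parent = {name: name for name in columns}   # union-find over connected nodes

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    E, R = [], []
    for _score, a, b in S:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:        # no connecting path yet: add the edge
            parent[root_a] = root_b
            E.append((a, b))
        else:                       # already connected: defer to the independence tests
            R.append((a, b))
    return E, R

# Hypothetical discretized columns; B copies A, C is correlated with both.
columns = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],
    "C": [0, 0, 0, 1, 1, 1, 1, 1],
}
E, R = draft_edges(columns, threshold=0.1)
```

Because an edge is added only when its endpoints are not yet connected, the draft graph is a forest, so a union-find structure is enough to answer the "is there a connecting path" question in step (4).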
Friedman theoretically proves that the learning algorithm based on independence tests has the semantic characteristics of the network and achieves effective results in practical application. As shown in Fig. 9, a Bayesian network is a graphical structure in which each variable is a node carrying information represented by one or more probability distributions. A variable with no arcs attached has no dependency on other variables; a variable with arcs has child or parent nodes and a probability distribution associated with them.
1.4 infectious disease outbreak risk probability estimation
Once the structure of the Bayesian network early-warning model has been constructed, the next step is to calculate the conditional probability tables of the nodes in the network structure. In this example, the parameters of the Bayesian network are learned mainly by the Bayesian formula method, which assumes that all variables in the data set are discrete with no missing values and that the nodes in the network are independent of each other. The method mainly includes the following steps:
(1) First, a variable set N and a data set D are defined. N has n variables $X_1, \ldots, X_n$, and each variable $X_i$ has $r_i$ possible values $v_{i1}, \ldots, v_{ir_i}$. The data set D has m records and records the infectious disease outbreak risk levels; each record in D contains the values of all variables in N. A Bayesian network structure $B_G$ is defined, which contains all the variables in N.
(2) In structure $B_G$, each node $X_i$ has a set of parent nodes $\pi_i$. Let $w_{ij}$ denote the j-th ($j = 1, 2, \ldots, q_i$) value of $\pi_i$, and let $N_{ijk}$ denote the number of records in D in which variable $X_i$ takes the value $v_{ik}$ and its parent set $\pi_i$ takes the value $w_{ij}$; then $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$.
(3) Define the network conditional probability $\theta_{ijk}$ as the conditional probability $P(X_i = v_{ik} \mid \pi_i = w_{ij})$, that is, the probability that node $X_i$ takes the value $v_{ik}$, $k \in [1, r_i]$, when its parent set $\pi_i$ takes the value $w_{ij}$.
(4) Given a data set D and a Bayesian network structure $B_G$, the expected value of $\theta_{ijk}$ is calculated as:
The variance of $\theta_{ijk}$ is calculated as:
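The two formulas are not reproduced in the text. Under the stated assumptions (discrete variables, no missing values) and an additional assumption of a uniform Dirichlet prior over each conditional distribution, the standard results would be $E[\theta_{ijk} \mid D, B_G] = \frac{N_{ijk} + 1}{N_{ij} + r_i}$ and $\mathrm{Var}[\theta_{ijk} \mid D, B_G] = \frac{(N_{ijk} + 1)(N_{ij} + r_i - N_{ijk} - 1)}{(N_{ij} + r_i)^2 (N_{ij} + r_i + 1)}$, with $N_{ij} = \sum_k N_{ijk}$. The sketch below evaluates these expressions exactly; the function name and the counts are hypothetical.

```python
from fractions import Fraction

def theta_posterior(n_ijk, n_ij, r_i):
    """Posterior mean and variance of theta_ijk, assuming a uniform Dirichlet prior."""
    expected = Fraction(n_ijk + 1, n_ij + r_i)
    spread = Fraction((n_ijk + 1) * (n_ij + r_i - n_ijk - 1),
                      (n_ij + r_i) ** 2 * (n_ij + r_i + 1))
    return expected, spread

# Hypothetical counts: 3 of the 8 records with this parent value show risk level k,
# and the risk-level variable has r_i = 2 possible values.
expected, spread = theta_posterior(n_ijk=3, n_ij=8, r_i=2)
```

The added 1 in the numerator keeps every conditional probability strictly positive even for combinations that never appear in D, which matters when the table is later multiplied through the network.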
In parameter learning, it is usually necessary to calculate $P(N_1 \mid N_2)$ to infer the probability of an event occurring, where $N_1$ and $N_2$ are two different sets of variables: $N_1$ is the infectious disease outbreak risk level, and $N_2$ is the set of environmental, climate, and economic factor variables associated with the outbreak. That is, the probability of each outbreak risk level is calculated given the values of the associated factor variables. If $N_2$ is known, the expected value $E[P(N_1 \mid N_2)]$ depends only on the likelihood of $N_1$; then, given a data set D and a Bayesian network structure $B_G$, $E[P(N_1 \mid N_2)]$ is calculated as follows:
$$E[P(N_1 \mid N_2) \mid D, B_G] = P(N_1 \mid N_2, D, B_G)$$
where $P(N_1 \mid N_2, D, B_G)$ can be calculated using the Bayesian formula and the iterated product-and-sum formula of the Bayesian network. The probability estimate of each node (variable) in the network can be obtained in the same way, and each estimate is the expected value of the corresponding probability.
1.5 introduction of related data
(1) Etiological indexes: data such as the virus detection rate and the incidence of severe cases and deaths, which generally need to be provided by professional institutions.
(2) Demographic indexes: the population density of the susceptible population (total susceptible population / area), which can be adjusted by region according to the population flow of the specific region.
(3) Meteorological indexes: indexes such as sunshine days, temperature difference, average temperature, and average wind speed are studied. The data mainly comes from the China Meteorological Data Sharing Service Network and is obtained by inverse distance weighted interpolation from the data of 756 stations nationwide.
(4) Economic indexes: the economy reflects the development of a region and also affects the prevalence and spread of disease to some extent. This example mainly considers the urbanization level (urban population / total population) as the economic index; the data comes from the China economic statistics database.
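The inverse distance weighted interpolation mentioned for the meteorological indexes can be sketched as below. The text does not give the distance exponent or coordinate system; planar coordinates, a power of 2, and the station values are assumptions for illustration.

```python
from math import hypot

def idw(stations, x, y, power=2.0):
    """Interpolate a value at (x, y) from (station_x, station_y, value) triples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for station_x, station_y, value in stations:
        distance = hypot(x - station_x, y - station_y)
        if distance == 0.0:
            return value            # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two hypothetical stations reporting average temperature.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(stations, 1.0, 0.0)
```

A larger `power` makes the estimate follow the nearest station more closely; real station data would also need great-circle rather than planar distances.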
1.6 spatial aggregative predictor indices
The incidence of hand-foot-and-mouth disease differs across months and regions, so spatial clustering detection is required. Two indexes are considered together, the incidence rate S and the severe-case rate Q. This embodiment uses the clustering method for multi-index spatial panel data, carried out in the SPSS analysis software, and considers the following three kinds of information:
(1) The incidence and severe-case rate data themselves, that is, the actual condition of hand-foot-and-mouth disease.
(2) The changes in the incidence and severe-case rates over time, that is, the incremental indexes.
(3) The rate or speed of change of the increments of the incidence and severe-case rates, that is, how the increments change. The time series of the level index, the incremental index, and the incremental rate-of-change index of the incidence and severe-case rates are considered together, with the following main formulas:
The level indexes, that is, the data S and Q themselves:
The incremental indexes:
The incremental rate-of-change indexes:
The Euclidean distance over these indexes is then calculated for hierarchical clustering to obtain regions with similar risk levels, and the risk level of each region is calculated in combination with the meteorological indexes and population flow.
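The three kinds of indexes and the distance between two regions can be sketched as follows. The formulas are not reproduced in the text, so the definitions here are assumptions: the increment is taken as the first difference X(t) - X(t-1), and the rate of change as that increment divided by the previous level. The clustering step itself (done in SPSS in the text) is not shown.

```python
from math import sqrt

def level_increment_rate(series):
    """Level, first difference, and relative change (assumed definitions) of one index."""
    level = list(series)
    increment = [later - earlier for earlier, later in zip(level, level[1:])]
    rate = [delta / earlier if earlier else 0.0
            for earlier, delta in zip(level, increment)]
    return level, increment, rate

def features(incidence, severe_rate):
    """Concatenate the three kinds of indexes for S and Q into one vector."""
    vector = []
    for series in (incidence, severe_rate):
        for part in level_increment_rate(series):
            vector.extend(part)
    return vector

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical monthly incidence S and severe-case rate Q for two regions.
region_a = features([2.0, 4.0, 6.0], [0.1, 0.1, 0.2])
region_b = features([2.0, 4.0, 6.0], [0.1, 0.1, 0.2])
distance = euclidean(region_a, region_b)
```

In practice the three kinds of indexes have different scales, so they would normally be standardized or weighted before the distance is taken.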
Example 2 Mental health early warning
Online questionnaires can be used to screen students for depression: students' self-assessment data can be collected online using the Patient Health Questionnaire depression scale (PHQ-9). However, online questionnaires are time-consuming and labor-intensive, lack timeliness and reliability, and yield data of limited quality and quantity. Research by psychologists shows that real-time screening for depression using data from social media such as WeChat and microblog is feasible and accurate.
Therefore, taking the characteristics of students into account, this example uses social media data to construct student word clouds; on this basis, it combines one-card data, internet data, mobile terminal data, access records, consumption records, video surveillance, GPS, campus Wi-Fi access logs, and other data to obtain spatio-temporal information and analyze students' behavior tracks; and it constructs student portraits and information and behavior elements from the word clouds and behavior tracks.
Finally, early warning is carried out with the deep Bayesian model according to the students' social networks, word clouds, information and behavior elements, and other data. Students whose early-warning value exceeds a threshold are displayed as objects of attention for the school, so that psychological or behavioral abnormalities can be found in advance and counseling and prevention work can be carried out.
2.1 building word clouds
1) Emotion dictionary construction
Starting from an existing, relatively complete general-purpose emotion dictionary, an emotion dictionary related to depression is constructed and divided into a positive dictionary and a negative dictionary.
Posts from depression-related super topics on the microblog platform are crawled as a candidate negative dictionary, and randomly selected microblog posts are crawled as a candidate positive dictionary. Both candidate dictionaries are then cleaned, with emoticons retained so as to improve the ability to analyze microblog emoticons and internet buzzwords. The cleaned data are compared with the entries of the emotion dictionary using the TF-IDF algorithm, and words with high similarity are added to the corresponding dictionary.
For the text part, the student's registered basic information is first retrieved, and the student's microblog posts and WeChat friend circle posts are crawled. The data are then preprocessed: microblog topics, friend circle advertisements, links and similar information are removed, and pictures are stored in a picture library. Finally, the microblog and friend circle text is segmented using a word segmentation technique from natural language processing and compared against the emotion dictionary using the TF-IDF algorithm, in order to refine the negative and positive dictionaries.
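The TF-IDF comparison between a segmented text and a dictionary can be illustrated with a minimal sketch. The tokens, the smoothed IDF variant and the use of cosine similarity as the comparison measure are assumptions made for illustration; the patent does not specify these details.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical, already-segmented texts (tokens are illustrative only).
negative_dictionary = ["sad", "tired", "alone", "hopeless"]
post_a = ["feel", "sad", "and", "alone", "tonight"]
post_b = ["great", "lunch", "with", "friends"]

vec_dict, vec_a, vec_b = tfidf_vectors([negative_dictionary, post_a, post_b])
sim_a = cosine(vec_dict, vec_a)  # shares "sad" and "alone" with the dictionary
sim_b = cosine(vec_dict, vec_b)  # shares no term with the dictionary
```

In this toy input, `post_a` scores higher against the negative dictionary than `post_b`, which is the kind of signal that could be used to decide which candidate words to add to a dictionary.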
2) Text sentiment analysis based on LSTM
This embodiment uses the open-source Word2Vec framework, which represents words as high-dimensional vectors so that words with similar meanings lie close together. Words with similar meanings are then identified by Euclidean distance or cosine similarity, which addresses the problem of one meaning being expressed by several different words.
The word vectors obtained after segmentation are combined sentence by sentence into a matrix. A recurrent neural network (RNN) or a recursive neural network then encodes this matrix input into a lower-dimensional one-dimensional vector that retains most of the useful information, and text emotion analysis is performed in combination with the emotion dictionary.
3) Image emotion analysis
The data in the picture library are labeled manually as negative or positive, and the image classification model VGGNet from computer vision is then trained on these data to obtain a picture emotion classification model.
In the embodiment, an emotion dictionary and a picture library are divided into a training set and a testing set according to the proportion of 7.
Based on the method, sentiment analysis is carried out on the student friend circle and the microblog content by combining the sentiment dictionary and the picture library, and word cloud is constructed.
4) Emotion value calculation method
For the word cloud of the student, the example calculates the emotion values of a friend circle and a microblog of the student by using a weighted average method:
wherein: $N_p$ and $N_n$ denote the numbers of positive and negative words, respectively; $wp_i$ and $wp_j$ denote the weights of the positive and negative words; $M_p$ and $M_n$ denote the numbers of positive and negative words, respectively; and $wp_a$ and $wp_b$ denote the weights of the positive and negative vocabulary, respectively.
2.2 student trajectories
Based on the entry and exit records, consumption records and video monitoring of students or faculty and staff, their movement tracks are analyzed using data such as mobile terminal GPS, campus Wi-Fi access logs and the all-purpose card. Track similarity is calculated from the Hausdorff distance; in general, the higher the similarity, the closer the relationship. The movement track sequences of all users are compared pairwise to obtain an intimacy value between users, and density clustering with an intimacy threshold of 0.4 then separates the users into groups with social relations. The groups are labeled and a student digital portrait is constructed, representing the students' behavior patterns, such as behavior habits, lifestyle, consumption level, network behavior and learning state.
The similarity measure between tracks is the basis of track data mining and querying. For any two tracks $T_a$ and $T_b$, let the distance between them be $Dist(T_a, T_b)$. A distance of 0 means that the two tracks are identical, and a larger distance means that the two tracks have a lower similarity or a higher dissimilarity. CPD (Closest-Pair Distance) measures the distance between two tracks as the minimum distance between position points in the two tracks. The CPD value between $T_a$ and $T_b$ is calculated as follows:
$$CPD(T_a, T_b) = \min_{loc \in T_a,\ loc' \in T_b} dist(loc, loc')$$
wherein: $dist(loc, loc')$ represents the Euclidean distance between two location points $loc$ and $loc'$.
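The two track distances named in this section, the Closest-Pair Distance and the Hausdorff distance, can be sketched as follows. This is a minimal illustration that assumes each track is a list of planar (x, y) points; the coordinates are hypothetical, and the way a distance is turned into the similarity or intimacy value is not specified in the text and is therefore left out.

```python
import math

def closest_pair_distance(track_a, track_b):
    """CPD: minimum Euclidean distance over all pairs of location points."""
    return min(math.dist(p, q) for p in track_a for q in track_b)

def hausdorff_distance(track_a, track_b):
    """Symmetric Hausdorff distance between two point sequences."""
    def directed(src, dst):
        # Largest distance from any point of src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(track_a, track_b), directed(track_b, track_a))

# Hypothetical tracks in planar coordinates (illustrative values only).
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
track_c = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 5.0)]

cpd_ab = closest_pair_distance(track_a, track_b)  # parallel tracks, one unit apart
hd_ab = hausdorff_distance(track_a, track_b)
cpd_ac = closest_pair_distance(track_a, track_c)  # tracks share location points
hd_ac = hausdorff_distance(track_a, track_c)      # detour to (2, 5) dominates
```

For `track_a` and `track_c` the CPD is 0 because the tracks share location points, even though `track_c` continues to a distant point; the Hausdorff distance reflects that detour, which is one reason it can be the more informative basis for a similarity score.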
2.3 social networking
Students are taken as the nodes of the social network. A connection is established between two nodes when the track similarity between the two students exceeds 0.5, and the weight of the connection is that track similarity. The resulting social network of all students is shown in fig. 10 (a): each node represents a student, the shade and color of a node indicate the student's class, and the size of a node reflects its degree, namely the number of nodes connected to it. It should be noted that the figure shows the network topology of the student social network, not a mapping of the student vectors onto a two-dimensional plane. Most students are clearly distributed in cluster-like networks organized by class, but there are also a number of relatively isolated students. The size distribution of the nodes shows large differences in individual sociability: there are large nodes at the centers of clusters as well as small, isolated nodes that are hard to spot. Fig. 10 (b) shows the social network distinguished by student gender: the social circles of boys and girls are basically separate, and apart from campus romantic relationships, boys and girls basically cluster separately.
Taken together with common sense, fig. 10 (a) and fig. 10 (b) indirectly verify the accuracy of the student vector calculation. The student social network reveals how isolated a student is, and in this example the calculated degree of isolation feeds into the mental health early warning based on the deep Bayesian network.
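The network construction described in this section can be sketched as follows. The student identifiers and similarity values are hypothetical, and the plain adjacency-dictionary representation is an assumption chosen to keep the example free of dependencies.

```python
from collections import defaultdict

def build_social_network(similarity, threshold=0.5):
    """Build a weighted undirected graph from pairwise track similarities.

    similarity maps (student_a, student_b) to a track similarity; an edge
    is created only when the similarity exceeds the threshold.
    """
    graph = defaultdict(dict)
    students = set()
    for (a, b), sim in similarity.items():
        students.update((a, b))
        if sim > threshold:
            graph[a][b] = sim  # edge weight is the track similarity
            graph[b][a] = sim
    # Include students without any edge so that isolated nodes stay visible.
    return {s: dict(graph[s]) for s in students}

# Hypothetical pairwise similarities (illustrative values only).
similarity = {
    ("s1", "s2"): 0.8,
    ("s1", "s3"): 0.6,
    ("s2", "s3"): 0.7,
    ("s3", "s4"): 0.3,
    ("s1", "s4"): 0.5,  # equal to the threshold, so no edge is created
}

network = build_social_network(similarity)
degree = {s: len(neighbors) for s, neighbors in network.items()}
isolated = sorted(s for s, d in degree.items() if d == 0)
```

Here `degree` corresponds to the node size in fig. 10 (a), and `isolated` lists the students with no connection above the threshold, which is the isolation signal the example goes on to use.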
2.4 mental health early warning method
A deep Bayesian network is established following the risk probability estimation method of Section 1.4. Different weights are set for the students' word cloud emotion, social network and user portrait, and these weighted quantities are used as input features of the deep Bayesian network to train the model. The mental health early warning value lies between 0 and 1, and a warning is issued when it exceeds 0.6.
Example 3 Campus fraud early warning
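How a prior and conditional probabilities combine into an early warning value can be illustrated with a deliberately simplified sketch. It applies Bayes' rule to a single binary risk variable and assumes the pieces of evidence are conditionally independent; it is not the deep Bayesian network itself, and all probabilities are hypothetical. Only the 0.6 threshold is taken from the text.

```python
def posterior_risk(prior, likelihoods_risk, likelihoods_no_risk):
    """Posterior P(risk | evidence) under a conditional-independence assumption.

    prior               -- P(risk) before any evidence is seen
    likelihoods_risk    -- P(evidence_k | risk) for each piece of evidence
    likelihoods_no_risk -- P(evidence_k | no risk) for each piece of evidence
    """
    joint_risk = prior
    joint_no_risk = 1.0 - prior
    for p_risk, p_no_risk in zip(likelihoods_risk, likelihoods_no_risk):
        joint_risk *= p_risk
        joint_no_risk *= p_no_risk
    # Bayes' rule: normalize the joint probability of risk and evidence.
    return joint_risk / (joint_risk + joint_no_risk)

WARNING_THRESHOLD = 0.6  # threshold stated in this example

# Hypothetical evidence: negative word cloud emotion, isolated network node.
warning_value = posterior_risk(
    prior=0.1,
    likelihoods_risk=[0.8, 0.7],
    likelihoods_no_risk=[0.2, 0.1],
)
issue_warning = warning_value > WARNING_THRESHOLD
```

With these illustrative numbers the posterior is 0.056 / 0.074, roughly 0.76, so a warning would be issued; in the method described here the probabilities would come from the trained network rather than from hand-set values.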
The behavior elements and information elements of past campus fraud events are obtained and analyzed following the methods of Sections 2.1 to 2.3. Combined with student information, the personality, consumption, behavior habits, learning state, psychological condition and so on of the students involved are analyzed, and a user portrait of students involved in fraud is constructed.
A deep Bayesian network is constructed following the risk probability estimation method of Section 1.4, feature vectors are built from the student user portraits and the features of their behavior elements and information elements, and a campus fraud early warning model is obtained by training. An alarm is raised when the risk value exceeds 0.5; the students concerned then receive appropriate attention and, if necessary, psychological counselling, a family visit or disciplinary action.
The foregoing description of the embodiments is provided to enable those of ordinary skill in the art to understand and use the invention. It will be readily apparent to those skilled in the art that various modifications can be made to these embodiments and that the generic principles described herein can be applied to other embodiments without inventive effort. The present invention is therefore not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure fall within the protection scope of the present invention.
Claims (2)
1. A public digital life scene rule model prediction early warning method based on a deep Bayesian network comprises the following steps:
(1) Obtaining mass multi-source heterogeneous data through three access ways of an Internet of things, an application terminal and a service system, and establishing a database;
(2) Layering the database, and constructing a subject database of five basic elements, namely people, enterprises, places, matters and things;
(3) Processing multi-source heterogeneous data by adopting a batch-flow type big data real-time processing technology;
the batch-flow type big data real-time processing technology comprises five functional modules of data acquisition, data loading, a data bus, data analysis and business service, wherein the data acquisition module is responsible for accessing streaming data in real time in a mode of internet of things acquisition and application end acquisition; the data loading module is responsible for loading historical offline data and access stream data from the service system; the data bus module is responsible for putting various data into an appointed channel for transmission according to a uniform format; the data analysis module is responsible for extracting and processing real-time data and pushing product data; when a real-time query request sent by a service system is received, the data analysis module can utilize an internal analysis processing model to calculate a corresponding index on a complete big data set in real time and judge the index, and the result is fed back to the service system through the service module;
(4) Combining the five basic element subject libraries with a specific application scene to construct five dimensions of the user digital portrait under the specific application scene: demographic attributes, life attributes, social attributes, consumption characteristics, psychological attributes;
the population attributes are used for describing the basic characteristic information of the user social level and helping each key life application scene to know the basic situation of the user; the life attributes are used for knowing the life conditions of the user, including the life activity range and the travel mode, so that accurate services can be provided for the user in the following process; the social attributes are used for describing a social graph, family members, a friend circle and interests of the user, the information usually represents a social relationship network of the user, and the user can be known as completely as possible through the social information so as to provide personalized services for the user; the consumption characteristics are used for describing main consumption habits and consumption preferences of the users, mining potential users of related consumption services, recommending related products and services according to the consumption characteristics of the users and improving the recommendation conversion rate; the psychological attributes are used for paying attention to the psychological condition information of the user, acquiring the psychological condition of the user through anonymous questionnaire survey or a similar user clustering mode, and providing corresponding psychological service or paying attention to the psychological condition;
(5) According to the processed multi-source heterogeneous data, constructing a user digital portrait by data mining and analyzing a user label;
aiming at non-video data and video data in multi-source heterogeneous data, a user tag construction mode based on original data mining and a user tag construction mode based on a video structuring technology are respectively adopted; for non-video data, five methods of natural language processing, user intention identification, association rules, cluster analysis and track similarity are fused in the user tag construction mode based on original data mining; where data of a specific dimension is missing for a specific user, a collaborative filtering algorithm is used to complete the missing features from the analysis of other similar users, so as to ensure the completeness of the user label; for video data, the user tag construction mode based on the video structuring technology integrates three methods of target detection, OpenCV + CNN emotion recognition and GaitSet gait recognition;
the natural language processing process adopts the TF-IDF algorithm to calculate the similarity between texts, then adopts a fastText classifier to classify the texts according to the similarity, and finally adopts Word2Vec to extract word vectors from the texts, uses an LSTM to fuse the word vectors into sentence vectors, and inputs the sentence vectors into a pre-trained recurrent neural network or recursive neural network, thereby predicting and analyzing the emotion expressed by similar texts;
the user intention identification is to judge the behavior intention of the user according to the search record of the user or the analyzed user label, particularly, a TF-IDF algorithm is adopted to carry out vectorization on data in the implementation process, the characteristic selection is carried out by utilizing a word frequency, chi-square and mutual information mode, and finally, a pre-trained decision tree CART, a random forest containing a plurality of decision trees, a logistic regression or Bayesian model is adopted to judge the behavior intention of the user;
the association rule is used for discovering associations between data that appear irregular on the surface, so as to find the regularities and development trends among the data, and an Apriori algorithm or an FP-Growth algorithm is adopted in the specific implementation process; the cluster analysis is used for grouping similar data into one class, such that in principle the similarity of the data within each class is maximal, and clustering, as an unsupervised algorithm, is suitable for analyzing high-dimensional data; the track similarity analyzes behavior tracks in the time domain and the space domain, mines the daily behavior rules and preferences of the user from historical behavior tracks, and labels them;
the OpenCV + CNN emotion recognition is used for detecting the expression state of a face in a video image, and the specific implementation process comprises first detecting and locating the face, then extracting facial expression features, and finally classifying the facial expression with a pre-trained convolutional neural network (CNN); the GaitSet gait recognition is used for detecting the walking posture of a person in a video image, and in the specific implementation process the image is first input into a convolutional neural network (CNN) to extract features, the features in the image are then aggregated into a feature vector by an integrated multi-feature pooling scheme, a Horizontal Pyramid Pooling method is adopted at the same time to make the features more discriminative, and a two-stream method is adopted in prediction, namely a method comprising two channels: one is an RGB image channel used for modeling spatial information, and the other is an optical flow channel in which an RNN models temporal information; the two channels are trained jointly and their information is fused, and finally the features are input into the trained model to realize gait recognition;
(6) Aiming at a specific application scene, training a deep Bayesian network by using user digital portrait information to obtain an event risk prediction model under the scene, and then predicting and early warning risks existing in a target event by using the model, specifically:
firstly, analyzing the user digital portrait information in the specific application scene, acquiring the information elements and behavior elements related to an event, determining the association relationships among the elements of the event, and establishing a feature sample library based on the information elements and behavior elements of the event; then combining the feature samples with expert opinion to determine the prior probabilities of the network nodes, namely the initial evidence of the risk probability; inputting the feature samples and the initial evidence into the network structure, and inferring the conditional probability distributions of the non-root nodes in the network by using the EM (expectation-maximization) algorithm; and finally, based on Bayes' rule, converting the prior probabilities and the conditional probabilities into a posterior probability, namely the probability prediction result for the risk of occurrence of the target event.
2. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: the multi-source heterogeneous data in the step (1) comprises structured data and unstructured data, the structured data comprises basic data including basic information such as houses and addresses and extended data including vehicle access information and internet of things perception information, and the unstructured data comprises life event information acquired by personnel and video monitoring data, audio data and image data acquired by equipment such as a camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110292515.3A CN113010572B (en) | 2021-03-18 | 2021-03-18 | Public digital life scene rule model prediction early warning method based on deep Bayesian network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113010572A (en) | 2021-06-22 |
CN113010572B (en) | 2023-04-18 |
Family
ID=76402593
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110292515.3A Active CN113010572B (en) | 2021-03-18 | 2021-03-18 | Public digital life scene rule model prediction early warning method based on deep Bayesian network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113010572B (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704371B (en) * | 2021-07-16 | 2024-06-28 | 重庆工商大学 | Method for adaptively detecting and dividing subareas in geographic information network |
CN113642986B (en) * | 2021-08-02 | 2024-04-16 | 上海示右智能科技有限公司 | Method for constructing digital notarization |
CN113610309B (en) * | 2021-08-13 | 2022-06-03 | 清华大学 | Fire station site selection method and device based on big data and artificial intelligence |
CN113641831B (en) * | 2021-08-16 | 2022-04-15 | 中国科学院空天信息创新研究院 | Knowledge graph-based forest fire spreading trend prediction method oriented to multi-source discrete data |
CN113609360B (en) * | 2021-08-19 | 2024-07-05 | 武汉东湖大数据科技股份有限公司 | Method and system based on scenerization multi-source data fusion analysis |
CN113778802B (en) * | 2021-09-15 | 2024-09-24 | 深圳前海微众银行股份有限公司 | Abnormality prediction method and device |
CN114092132A (en) * | 2021-11-01 | 2022-02-25 | 常州工学院 | User-oriented shared bicycle prediction method |
CN114358984A (en) * | 2021-12-31 | 2022-04-15 | 城云科技(中国)有限公司 | Dispute management method and device, readable storage medium and electronic device |
CN114972938B (en) * | 2022-02-21 | 2024-09-24 | 上海应用技术大学 | Indoor strange scene recognition system integrating knowledge graph and space semantic topological graph |
CN115456843A (en) * | 2022-09-14 | 2022-12-09 | 北京易思汇商务服务有限公司 | Intelligent wind control system and method based on study-keeping big data analysis |
CN115545758B (en) * | 2022-09-26 | 2024-09-10 | 苏州大学 | Method and system for self-adaptive incremental site selection of urban service facilities |
CN115577289B (en) * | 2022-12-08 | 2023-03-10 | 工福(北京)科技发展有限公司 | Aggregation access management system and method for digital workshop administration system |
CN116340619B (en) * | 2023-03-01 | 2023-12-12 | 复旦大学 | Role mining analysis method for online community network spoofing |
CN116071077B (en) * | 2023-03-06 | 2023-06-27 | 深圳市迪博企业风险管理技术有限公司 | Risk assessment and identification method and device for illegal account |
CN116151494A (en) * | 2023-04-24 | 2023-05-23 | 中国科学院地理科学与资源研究所 | Data processing method, device, equipment and computer readable storage medium |
CN116662606B (en) * | 2023-04-28 | 2024-06-18 | 青岛尘元科技信息有限公司 | Method and system for determining new video event, storage medium and electronic device |
CN116362549B (en) * | 2023-05-22 | 2023-08-04 | 北京航天常兴科技发展股份有限公司 | Fire disaster prevention and control method based on data information mining technology |
CN116823511B (en) * | 2023-08-30 | 2024-01-09 | 北京中科心研科技有限公司 | Method and device for identifying social isolation state of user and wearable device |
CN117131944B (en) * | 2023-10-24 | 2024-01-12 | 中国电子科技集团公司第十研究所 | Multi-field-oriented interactive crisis event dynamic early warning method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||