CN113010572A - Public digital life scene rule model prediction early warning method based on deep Bayesian network

Publication number
CN113010572A
Authority
CN
China
Prior art keywords
data
user
early warning
information
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110292515.3A
Other languages
Chinese (zh)
Other versions
CN113010572B (en)
Inventor
马汉杰
董慧
许永恩
刘烈宏
李柏睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Maquan Information Technology Co ltd
Original Assignee
Hangzhou Maquan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Maquan Information Technology Co ltd
Priority to CN202110292515.3A
Publication of CN113010572A
Application granted
Publication of CN113010572B
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/285 Clustering or classification
    • G06F40/216 Parsing using statistical methods
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q50/26 Government or public services
    • G06V40/168 Feature extraction; Face representation
    • G06V40/174 Facial expression recognition
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a public digital life scene rule model prediction and early warning method based on a deep Bayesian network. Multi-source heterogeneous data from key life scenes in public digital life are analyzed and extracted to generate a feature library of information elements and behavior elements, which is combined with user digital portraits to construct a personalized rule mechanism. The method delivers timely and accurate prediction and early warning for different key life scenes and provides strong support for prior intervention. It can be applied to public safety and health early warning, mental health early warning, campus cheating event early warning and the like.

Description

Public digital life scene rule model prediction early warning method based on deep Bayesian network
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to a public digital life scene rule model prediction early warning method based on a deep Bayesian network.
Background
With the continuing iteration of Internet technologies such as cloud computing and big data and the steady improvement of living standards, public demand for services such as basic education, public health, public transportation and elderly care keeps growing. Government departments at all levels are also considering and attending to the innovation of public service modes in the Internet Plus era, so as to promote the digitization of public life and make life more convenient. In public digital life, once problems occur in key life scenes, such as economic dispute events or fires, the interests of the people and social stability are seriously affected; making prediction and early warning in these key life scenes allows problems to be found in advance and great losses to be avoided. Accurate analysis and prediction in other life scenes, such as route planning and intelligent recommendation, bring great convenience and improve people's sense of well-being. How to carry out timely and accurate prediction and early warning for different key life scenes, and thereby provide strong support for prior intervention, is therefore a problem that urgently needs to be solved.
Existing prediction and early warning technologies still analyze and predict people's behavior from features of a single dimension or only a few dimensions, and suffer from incomplete feature analysis and low prediction accuracy. Chinese patent publication No. CN106709606A provides a personalized scene prediction method and apparatus. It first obtains geographical location information of a user based on location services, where the geographical location information includes time-associated POI information; it then performs cluster analysis on all the geographical location information of the user within a preset period to obtain a lifestyle habit trajectory vector sequence, and constructs a Markov transition matrix from that sequence; finally it obtains the current scene of the user and looks up the corresponding predicted scene in the Markov transition matrix. Chinese patent publication No. CN107967578A provides a public safety big data early warning platform for smart cities, which comprises an early warning system, a communication module, a cloud data platform and an information receiving terminal. The early warning system comprises a natural disaster early warning system, an accident disaster early warning system, a public health event early warning system and a social safety event early warning system, and the information receiving terminal is a PC terminal or a mobile terminal. The early warning information monitored by the four early warning systems is transmitted through the communication module to the cloud data platform and then to the PC terminal or mobile terminal, where it is displayed through an early warning application program interface. The public is thus better connected with the early warning systems and can learn of early warning information in time through a single early warning application, which is convenient and fast.
Chinese patent publication No. CN109711613A provides an early warning method and system based on a personnel relationship model and an event correlation model. The method extracts model information data from public safety big data and filters it; performs statistical analysis on the model information data according to personnel identity data, and extracts persons who have reported events multiple times to create a personnel relationship model; extracts semantic elements from the model information data according to event data, and extracts events reported by persons multiple times to create an event relationship model; sets a personnel early warning threshold according to the number of times one person reports an event; and sets an event early warning threshold according to the number of times multiple people report an event, issuing early warnings for the persons and events that exceed the thresholds.
In summary, a high-quality early warning system should make accurate and timely prediction and early warning for different key life scenes. It should also fuse the multi-dimensional attributes of users, break the limitation of single-dimension analysis, associate the attributes of the various dimensions, and apply a processing method suited to the characteristics of each dimension, so that early warning becomes more timely and accurate.
Disclosure of Invention
In view of the above, the invention provides a public digital life scene rule model prediction and early warning method based on a deep Bayesian network, which can make accurate and timely prediction and early warning for different key life scenes and provide strong support for prior intervention.
A public digital life scene rule model prediction early warning method based on a deep Bayesian network comprises the following steps:
(1) obtaining massive multi-source heterogeneous data through three access channels, namely the Internet of Things, application terminals and business systems, and establishing a database;
(2) layering the database, and constructing subject libraries for five basic elements, namely people, enterprises, places, matters and things;
(3) processing the multi-source heterogeneous data by adopting a batch-stream big data real-time processing technology;
(4) combining the five basic element subject libraries with a specific application scene to construct the five dimensions of the user digital portrait under that scene: demographic attributes, life attributes, social attributes, consumption characteristics and psychological attributes;
(5) constructing the user digital portrait from the processed multi-source heterogeneous data by mining and analyzing user labels;
(6) for a specific application scene, training a deep Bayesian network with the user digital portrait information to obtain an event risk prediction model for that scene, and then using the model to predict and give early warning of the risks of a target event.
Further, the multi-source heterogeneous data in step (1) includes structured data and unstructured data. The structured data includes basic data, such as house and address information, and extended data, such as vehicle entry and exit information and Internet of Things perception information. The unstructured data includes life event information collected by personnel, and video monitoring data, audio data and image data collected by devices such as cameras.
Further, the batch-stream big data real-time processing technology in step (3) comprises five functional modules: data acquisition, data loading, data bus, data analysis and business service. The data acquisition module accesses stream data in real time through Internet of Things acquisition and application-side acquisition. The data loading module loads historical offline data and accesses stream data from the business system. The data bus module places the various data into designated channels for transmission in a uniform format. The data analysis module extracts and processes real-time data and pushes the finished data. When a real-time query request from the business system is received, the data analysis module uses its internal analysis and processing model to compute the corresponding indicators on the complete big data set in real time and evaluate them, and the result is fed back to the business system through the business service module.
Further, the five dimensions in step (4) are defined as follows. The demographic attributes describe the basic social characteristics of the user and help each key life application scene understand the user's basic situation (specifically including name, gender, grade and major, student number, dormitory number, height, age, marital status, contact information, occupation and the like). The life attributes describe the user's living conditions, including the range of daily activity (canteens, teaching buildings, dormitory buildings, shopping malls, bus stations, railway stations and the like) and travel modes (bicycles, shared bicycles, electric vehicles, buses, self-driving and the like), so that accurate services can subsequently be provided to the user. The social attributes describe the user's social graph, family members, circle of friends and interests (specifically including roommates, classmates, teachers and students, degree of intimacy, a liking for going to the library and the like); this information usually represents the user's social relationship network, and social information allows the user to be understood as completely as possible so that personalized services can be provided. The consumption characteristics describe the user's main consumption habits and preferences (including car ownership, shopping types, purchase cycle, brand preferences and the like); they are used to mine potential users of related consumption services and to recommend related products and services according to the user's consumption characteristics, improving the recommendation conversion rate. The psychological attributes concern the user's psychological condition (such as character, ability, temperament, values, emotion and thinking); the psychological condition is obtained through anonymous questionnaires or by clustering similar users, and corresponding psychological services or focused attention are provided accordingly.
Further, in step (5), the non-video data and the video data in the multi-source heterogeneous data are handled separately: user labels for non-video data are constructed by mining the original data, and user labels for video data are constructed with video structuring technology. For non-video data, the label construction based on original data mining fuses five methods: natural language processing, user intention recognition, association rules, cluster analysis and trajectory similarity. Where data for a specific dimension of a specific user is missing, a collaborative filtering algorithm completes the missing features from the analysis of other similar users, ensuring the completeness of the user labels. For video data, the label construction based on video structuring technology integrates three methods: target detection, OpenCV + CNN emotion recognition and GaitSet gait recognition.
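The completion of a missing label dimension from similar users can be sketched as follows. This is a minimal illustration of user-based collaborative filtering, not the patent's implementation: the label names (`library_visits`, `canteen_spend`, `night_outings`), the numeric scores and the helper `fill_missing` are all hypothetical, and cosine similarity is an assumed choice of similarity measure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {label: score} profiles."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def fill_missing(target, others, label):
    """Estimate target[label] as a similarity-weighted mean over users who have it.

    Returns None when no similar user carries the label.
    """
    weighted_sum = total_weight = 0.0
    for other in others:
        if label not in other:
            continue
        sim = cosine(target, other)
        if sim > 0:
            weighted_sum += sim * other[label]
            total_weight += sim
    return weighted_sum / total_weight if total_weight else None

# Hypothetical label scores in [0, 1]; the target user lacks "night_outings".
target = {"library_visits": 0.9, "canteen_spend": 0.2}
others = [
    {"library_visits": 0.8, "canteen_spend": 0.3, "night_outings": 0.1},
    {"library_visits": 0.1, "canteen_spend": 0.9, "night_outings": 0.8},
]
estimate = fill_missing(target, others, "night_outings")
```

Because the first user's profile is much closer to the target's, the estimate lands nearer that user's score than the second user's.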
Furthermore, the natural language processing first uses the TF-IDF algorithm to compute the similarity between texts, then uses a fastText classifier to classify the texts according to that similarity, and finally extracts the word vectors of the texts with Word2Vec, fuses the word vectors into sentence vectors with an LSTM, and inputs them into a pre-trained recurrent neural network or recursive neural network, so that the emotion expressed by similar texts is predicted and analyzed.
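The first stage of this pipeline, TF-IDF text similarity, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the smoothed IDF formula and cosine similarity are common choices that the patent does not specify, the sample texts are invented, and the fastText, Word2Vec and LSTM stages are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency times smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented example texts, already tokenized by whitespace.
docs = [
    "exam stress cannot sleep".split(),
    "cannot sleep before exam".split(),
    "canteen lunch menu today".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # texts sharing several terms
sim_unrelated = cosine(vecs[0], vecs[2])  # texts sharing no terms
```

Texts with overlapping vocabulary score higher than texts with none, which is the signal the subsequent classification step would rely on.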
Furthermore, user intention recognition judges the behavioral intention of the user from the user's search records or from the user labels already derived. In the specific implementation, the TF-IDF algorithm is used to vectorize the data, feature selection is performed using term frequency, chi-square and mutual information, and finally a pre-trained CART (Classification and Regression Trees) decision tree, a random forest comprising a plurality of decision trees, logistic regression or a Bayesian model is used to judge the behavioral intention of the user.
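The chi-square feature selection step can be illustrated with the usual 2x2 contingency statistic for one term against one intent class. This is a sketch only: the search records, the intent labels (`study`, `dining`) and the function name `chi_square` are hypothetical, and the term-frequency and mutual-information criteria are not shown.

```python
def chi_square(docs, labels, term, target):
    """Chi-square score of `term` for class `target` from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_target = label == target
        if has_term and is_target:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif is_target:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Invented search records with hypothetical intent labels.
docs = [
    ["borrow", "book", "library"],
    ["renew", "book"],
    ["canteen", "menu"],
    ["canteen", "hours"],
]
labels = ["study", "study", "dining", "dining"]

score_book = chi_square(docs, labels, "book", "study")
score_menu = chi_square(docs, labels, "menu", "study")
```

A term that separates the classes cleanly ("book") scores higher than one that appears in only a single record ("menu"), so it would be kept as a feature first.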
Furthermore, the association rules are used to discover associations among data that appear irregular on the surface, so as to find the regularities and development trends in the data; the Apriori algorithm or the FP-Growth algorithm is used in the specific implementation. Cluster analysis groups similar data into one class such that, in principle, the similarity within each class is maximized; as an unsupervised algorithm, clustering is suitable for analyzing high-dimensional data. Trajectory similarity analyzes behavior trajectories in both the time domain and the space domain, mines the user's daily behavior patterns and preferences from historical trajectories, and labels them.
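A compact sketch of the Apriori idea, level-wise generation of frequent itemsets followed by a rule confidence, is given below. The daily-visit transactions and the 0.6 support threshold are invented for illustration; this is a minimal version of the algorithm, and FP-Growth, which the text names as an alternative, is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        current = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (both must be frequent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Invented example: places one student visited on each of five days.
days = [
    {"library", "canteen"},
    {"library", "canteen", "gym"},
    {"library", "canteen"},
    {"dormitory", "canteen"},
    {"library", "gym"},
]
frequent = apriori(days, min_support=0.6)
rule_conf = confidence(frequent, {"library"}, {"canteen"})
```

Here the rule "library -> canteen" holds on three of the four library days, which is the kind of regularity that could then be turned into a user label.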
Further, the OpenCV + CNN emotion recognition is used to detect the expression state of faces in video images. The specific implementation first performs face detection and positioning, then extracts facial expression features, and finally uses a pre-trained convolutional neural network (CNN) to classify and judge the facial expression.
Further, the GaitSet gait recognition is used to detect the walking posture of persons in video images. In the specific implementation, the images are first input into a convolutional neural network (CNN) to extract features; multiple feature pooling methods are then combined to aggregate the features of the images into a single feature vector, and Horizontal Pyramid Pooling (HPP) is applied to make the features more discriminative. A two-stream method is used in the prediction calculation, that is, two channels are adopted: one is an RGB image channel used to model spatial information, and the other is an optical flow channel used with an RNN to model temporal information. The two channels are trained jointly and their information is fused, and finally the features are input into the trained model to realize gait recognition.
Further, the training and prediction process of the deep Bayesian network in step (6) is as follows. First, the user digital portrait information in the specific application scene is analyzed to obtain the information elements and behavior elements related to an event and the association relationships among them, and a feature sample library is established from these information elements and behavior elements. The feature samples are then combined with expert opinions (taken as ground truth) to determine the prior probabilities of the network nodes, that is, the initial evidence of risk probability. The feature samples and the initial evidence are input into the network structure, and the conditional probability distributions of the non-root nodes in the network are inferred using the EM (expectation-maximization) algorithm. Finally, based on Bayes' rule, the prior probabilities and conditional probabilities are converted into a posterior probability, which is the predicted probability that the target event is at risk.
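The final step, turning a prior and conditional probabilities into a posterior risk with Bayes' rule, can be sketched as follows. The sketch assumes a deliberately simple structure that the patent does not prescribe: one root "risk" node whose child evidence nodes are conditionally independent given the risk state. The node names and every probability are hypothetical, and the conditional probability table is supplied directly rather than learned with EM.

```python
def posterior_risk(prior, cpt, evidence):
    """P(risk = 1 | evidence) for a root risk node with independent children.

    prior:    P(risk = 1), e.g. derived from expert opinion.
    cpt:      {node: (P(node = 1 | risk = 1), P(node = 1 | risk = 0))}
    evidence: {node: True or False} for the observed child nodes.
    Assumes the evidence has non-zero probability under the model.
    """
    joint_risk, joint_safe = prior, 1.0 - prior
    for node, observed in evidence.items():
        p_given_risk, p_given_safe = cpt[node]
        joint_risk *= p_given_risk if observed else 1.0 - p_given_risk
        joint_safe *= p_given_safe if observed else 1.0 - p_given_safe
    # Bayes' rule: normalize the joint probability over both risk states.
    return joint_risk / (joint_risk + joint_safe)

# Hypothetical numbers for two behavior elements in a campus scene.
prior = 0.1
cpt = {
    "late_return": (0.7, 0.2),
    "low_mood": (0.6, 0.1),
}
risk = posterior_risk(prior, cpt, {"late_return": True, "low_mood": True})
```

With both behavior elements observed, the joint weights are 0.1 x 0.7 x 0.6 = 0.042 for risk and 0.9 x 0.2 x 0.1 = 0.018 for no risk, giving a posterior of 0.042 / 0.060 = 0.7. An early warning would then be raised by comparing this value with a scene-specific threshold.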
According to the public digital life scene rule model prediction and early warning method of the invention, multi-source heterogeneous data from key life scenes in public digital life are analyzed and extracted to generate a feature library of information elements and behavior elements, which is combined with user digital portraits to construct a personalized rule mechanism. Prediction and early warning can thus be carried out in a timely and accurate manner for different key life scenes, providing strong support for prior intervention.
Drawings
Fig. 1 is a flow diagram of the public digital life scene rule model prediction early warning method of the invention.
Fig. 2 is a schematic diagram of the basic element subject library of public digital life data.
Fig. 3 is a schematic diagram of the specific data processing flow of the batch-stream big data real-time processing module of the invention.
Fig. 4 is a diagram of the user portrait construction framework of the invention.
Fig. 5 is a schematic diagram of the personalized feature model construction framework of the invention.
Fig. 6 is a schematic diagram of the event anomaly prediction early warning model route of the invention.
Fig. 7 is a schematic view of the risk assessment process for various events of the invention.
Fig. 8 is a schematic diagram of the public safety early warning technology route of the invention.
Fig. 9 is a diagram of the Bayesian network structure of the invention.
Fig. 10(a) is a diagram of a Bayesian network for class social interaction according to the invention.
Fig. 10(b) is a diagram of a Bayesian network for gender-specific social interaction according to the invention.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
The general process of the invention is shown in Fig. 1 and can be applied to scenes such as campuses, residential districts, parks and rural areas. Taking a campus scene as a specific example, the public digital life scene rule model prediction and early warning method based on a deep Bayesian network is described below. The specific process is as follows:
(1) Accessing multi-source heterogeneous data. Multi-source heterogeneous data has two main characteristics. First, the data comes from multiple sources, such as images collected by cameras, pedestrian gates and vehicle gates, and system data accessed from various government departments. Second, the data types and forms are complex, that is, heterogeneous. The data falls into two main types, structured and unstructured. Structured data takes basic information such as houses and addresses as basic data, with extended data including face data, vehicle access data and Internet of Things perception data. Unstructured data includes life event information collected by personnel, and video monitoring data, audio data and image data collected by devices such as cameras. In the campus scene, this embodiment accesses massive multi-source heterogeneous data from Internet of Things devices such as cameras, pedestrian gates and vehicle gates, from mobile terminals such as WeChat, microblog and GPS, and from business systems, including campus one-card data, student registration data, access records, consumption records and campus wifi access logs.
(2) Constructing the basic element subject libraries. The data is decomposed by dimension to construct subject libraries for the basic elements of people, enterprises, places, matters and things, as shown in Fig. 2. In a campus scene, "people" can be refined into students, teaching staff, parents, visitors and the like; "enterprises" can be refined into supermarkets, canteens, print shops, glasses shops and the like; "matters" can be refined into student entry and exit records, stranger access records, infectious disease conditions and the like; and "places" can be refined into libraries, canteens, teaching buildings and the like.
(3) Data processing. In this embodiment, a batch-stream big data real-time processing module is built by combining a batch big data computing framework with a stream big data computing framework, so that massive data files can be processed in parallel in real time.
The specific data processing flow of the batch-stream big data real-time processing module is shown in Fig. 3. Internally the module is divided into sub-modules for data acquisition, data loading, data bus, data analysis, business service and the like. The data acquisition module accesses stream data in real time through Internet of Things acquisition, application-side acquisition and similar means; the data loading module loads historical offline data and accesses stream data from a specific business system; the data bus module places the various data into designated channels for transmission in a uniform format; and the data analysis module extracts and processes real-time data and pushes the finished data. When the batch-stream big data real-time processing module receives a real-time query request from the business system, it computes the corresponding indicators on the complete big data set in real time using the analysis and processing model in the data analysis sub-module, evaluates them, and feeds the results back to the business system through the business service module.
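The data bus's role, wrapping records from different sources in one uniform format and routing them to designated channels, can be sketched in memory as follows. Everything here is a hypothetical illustration: the `Envelope` fields, the channel name and the sample records are assumptions, and a real deployment would use a message queue or stream platform rather than Python lists.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    """Uniform wrapper applied to every record before it enters the bus."""
    source: str       # e.g. "iot", "app" or "business_system"
    channel: str      # designated channel the record is routed to
    timestamp: float  # event time as a Unix timestamp
    payload: dict     # source-specific content, left untouched

class DataBus:
    """In-memory stand-in for a message bus with named channels."""

    def __init__(self):
        self._channels = defaultdict(list)

    def publish(self, envelope):
        # Serialize to one common wire format regardless of source.
        message = json.dumps(asdict(envelope), sort_keys=True)
        self._channels[envelope.channel].append(message)

    def consume(self, channel):
        # Drain the channel and hand decoded messages to the analysis module.
        messages, self._channels[channel] = self._channels[channel], []
        return [json.loads(m) for m in messages]

bus = DataBus()
bus.publish(Envelope("iot", "access_records", 1700000000.0, {"gate": "north"}))
bus.publish(Envelope("business_system", "access_records", 1700000060.0,
                     {"card_id": "A1", "door": "library"}))
records = bus.consume("access_records")
```

Downstream analysis code then only needs to understand the envelope fields, whatever the original source of a record was.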
(4) Constructing the dimensions of the user portrait. Closely combining the data in the basic element subject library with the campus scene, five dimensions of the user portrait in the campus scene are proposed, as shown in FIG. 4: demographic attributes, life attributes, social attributes, consumption characteristics and psychological attributes. Specifically:
The demographic attributes describe the basic characteristic information of the user at the social level and help each important life application scene understand the user's basic situation. They specifically include: name, gender, grade and major, student number, dormitory number, height, age, marital status, contact information, occupation and the like.
The life attributes describe the user's living conditions, such as the range of daily activity and the travel mode, so that accurate services can be provided to the user later. The range of daily activity includes canteens, teaching buildings, dormitory buildings, shopping malls, bus stations, railway stations and the like; the travel modes include bicycles, shared bicycles, electric vehicles, buses, self-driving and the like.
The social attributes describe the user's social graph, family members, circle of friends, interests and hobbies and the like. This information usually represents the user's social relationship network and allows the user to be understood as completely as possible, so that personalized services can be provided. They specifically include: roommates, classmates, students, teachers, relatively close relationships, liking to go to the library and the like.
The consumption characteristics describe the user's main consumption habits and preferences. Recommending related products and services to potential users according to their consumption characteristics can achieve a high conversion rate. The consumption characteristics include: car ownership, shopping type, purchase period, brand preference and the like.
The psychological attributes focus on the user's psychological condition, such as character, ability, temperament, values, emotion and thinking. The psychological condition is obtained through anonymous questionnaires or by clustering similar users, and corresponding psychological services or special attention are provided accordingly.
(5) Constructing the user digital portrait. Depending on whether the data is non-video data or video data, two ways of constructing user portrait labels are proposed: user label construction based on raw data mining and user label construction based on video structuring technology, as shown in FIG. 4.
For non-video data, the data of the five element subject libraries is comprehensively analyzed and computed with natural language processing (NLP) and with the clustering, classification and association rule algorithms of data mining; the differences in behavior patterns between user groups are mined, and users are labeled accordingly.
Detailed information about a user's trips, such as behavior patterns and clothing, cannot be obtained directly from non-video data. To address this, this example employs a video structuring technique that combines traditional algorithms with deep learning algorithms.
Video structuring technology applies algorithms from video image processing, text analysis and related fields to extract key information at different levels from the video, describes that key information semantically, and finally stores the key video image information and the corresponding semantic information in structured form through a standardized video description, so that the key information of the video is easy to record and retrieve. It mainly involves target detection, behavior recognition, emotion recognition and similar techniques, so that the information in a video image can be expressed effectively and a corresponding descriptive sentence, i.e. a text label, can be generated for each image. For attributes that are hard to determine because of insufficient data, this embodiment fills them in from the corresponding attributes of similar users through a collaborative filtering algorithm.
This example constructs rich and diverse student portraits, with labels such as "super", "weak", "sports", "hard" and "outer in nature", primarily from the perspective of the individual student.
(6) Constructing the deep Bayesian network rule model based on event features. First, the user digital portrait information in the campus scene is analyzed to obtain the information elements and behavior elements related to an event, which support the construction of an event feature model, as shown in FIG. 5. The information elements specifically include time information, place information, track information, character information, learning achievement and the like; the behavior elements include purchase, travel, communication, stay and the like. For each type of key life scene, ontology analysis of the events is used to extract, as far as possible, the information elements and behavior elements that are specific to that type of event in the virtual and real spaces, and even in the thought space. On the basis of analyzing multiple similar events, the common features and common behaviors of that type of event are generalized, and a feature library of the information elements and behavior elements specific to that type of event is built to support risk prediction and early warning analysis of campus life scenes.
(7) Prediction and early warning analysis. The behavior of event objects at different stages shows abnormal characteristics: on the one hand it is abnormal compared with the behavior of most ordinary people, and on the other hand it is abnormal compared with the object's own daily behavior. Data from the virtual space and the real space of the target object is analyzed, including basic information, communication behavior, network behavior, economic behavior, consumption traces, accommodation traces and the like. As shown in FIG. 6, this embodiment analyzes the behavior habits of the target object, compares its actual situation with its own daily behavior and with the behavior of other ordinary people, and uses a deep Bayesian network for comprehensive judgment, so as to identify abnormal behaviors and support the perception of abnormal events.
In constructing the deep Bayesian network rule model, attention is focused on several types of events with a high probability of occurrence and an adverse impact, such as public safety and health abnormalities, campus deception events and mental health abnormalities. The prediction and early warning analysis uses a deep Bayesian network. The basic principle is that, given the prior probability and the form of the conditional probability density, the conditional probability density function is inferred through statistical learning on samples to address the uncertainty of the various event risks, and Bayes' rule is then used to convert it into the posterior probability.
A deep Bayesian network is a description of the probabilistic relationships in uncertain knowledge. It combines classical probability theory and graph theory, so it has both the solid mathematical foundation of probability theory and the intuitive graphical expression of graph theory. In a deep Bayesian network, once the state of any node is determined, Bayes' rule can be used to reason forward or backward through the network and obtain the posterior probability of any other node; this is the key mechanism for building a prediction and early warning system on a deep Bayesian network.
The construction of the prediction and early warning model based on the deep Bayesian network comprises four steps, as shown in FIG. 7. ① Based on the feature library of the information elements and behavior elements of an event, the association relationships among the elements of the event are identified, and a deep Bayesian network structure model is constructed. ② Historical sample data is combined with expert opinions to determine the prior probability of the network nodes, i.e. the initial evidence of the risk probability. ③ The sample data and the initial evidence are input into the network structure model, and the conditional probability distributions of the non-root nodes of the network are inferred with a parameter learning algorithm. Because events occur dynamically and with uncertainty, the sample data often contains hidden variables that cannot be observed; this example therefore uses the expectation-maximization (EM) algorithm, an iterative algorithm for samples with missing values, for parameter learning. Over multiple iterations the model parameters approach the maximum likelihood estimate, finally yielding the conditional probability distributions. ④ Based on Bayes' rule, the prior probability and the conditional probabilities are converted into the posterior probability, i.e. the risk probability of the target event in the model.
Based on this prediction and early warning model, the abnormality early warning function module in the campus scene displays students who may be abnormal according to the big data judgment of the background model, and gives the key factors causing the abnormality through the graph model, which is vital for education supervisors to manage students in a timely and effective way. The abnormalities are mainly divided into public safety and health abnormalities, mental health abnormalities and event abnormalities, corresponding to public safety and health early warning, mental health early warning and campus deception event early warning.
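The conversion of a prior and a conditional probability into a posterior in the fourth step can be illustrated with a minimal sketch. The two-state example, the function name `posterior` and all numbers are hypothetical and chosen only for illustration; the embodiment's actual network is larger.

```python
def posterior(prior, likelihood, evidence):
    """Return P(state | evidence) for each state via Bayes' rule.

    prior:      {state: P(state)}
    likelihood: {state: {evidence_value: P(evidence_value | state)}}
    """
    joint = {s: prior[s] * likelihood[s][evidence] for s in prior}
    total = sum(joint.values())  # P(evidence), the normalising constant
    return {s: p / total for s, p in joint.items()}


# Hypothetical numbers: prior risk of an event and the chance of observing
# abnormal behaviour under each risk state.
prior = {"risk": 0.1, "no_risk": 0.9}
likelihood = {
    "risk": {"abnormal": 0.8, "normal": 0.2},
    "no_risk": {"abnormal": 0.1, "normal": 0.9},
}
post = posterior(prior, likelihood, "abnormal")
print(round(post["risk"], 3))
```

Observing the abnormal behaviour raises the risk probability well above its prior in this toy case, which is the effect the early warning relies on.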
Example 1 Public health and safety early warning
1.1 technical route
Traditional infectious disease outbreak risk prediction mainly comprises the following four aspects: (1) selecting the infection types and regions to study; (2) selecting the pathological, environmental and climatic factors related to the onset of infectious diseases; (3) selecting a suitable model to establish an infectious disease outbreak risk evaluation model; (4) predicting the probability of an epidemic of the infectious disease under various conditions and verifying the accuracy of the established model. This embodiment adapts that procedure; the specific technical route is shown in FIG. 8.
The outbreak grades are classified mainly with the moving percentile method, and the selected risk factors mainly comprise meteorological factors, economic factors, population density factors and the like. Establishing the Bayesian model mainly comprises four steps: discretization of data, Bayesian structure learning, parameter learning and network verification. When the verification result is unsatisfactory, the process returns to structure learning and the Bayesian network structure is reconstructed. Finally, uncertainty analysis is carried out on the adopted method, mainly covering the uncertainty of data processing, the uncertainty of panel data clustering analysis, the uncertainty of the moving percentile method in classifying infection outbreak grades, and the uncertainty in building the early warning model based on the Bayesian network.
1.2 clustering algorithm based on spatio-temporal panel model
Panel data, also called time series-cross section mixed data, mainly refers to sample data with a time series, obtained by taking multiple cross sections along the time series; panel data typically has both time-series and cross-sectional features, i.e. features in both the spatial and temporal dimensions.
A general linear panel data regression model is:
$y_{it} = X_{it}\beta + \mu_i + \varepsilon_{it}$
where $i \in [1, 2, \ldots, N]$ indexes the N different spatial individuals, $t \in [1, 2, \ldots, T]$ indexes time, $y_{it}$ is the observed value of the dependent variable, $X_{it}$ is a K-dimensional row vector of explanatory variables, $\beta$ is a K-dimensional column vector of coefficients, $\mu_i$ is the individual effect of the spatial unit, and $\varepsilon_{it}$ is the random error term.
If a phenomenon or attribute of one spatial unit is highly similar to that of another spatial unit, the two spatial units are spatially correlated. According to the number of indexes, spatial panel data is divided into single-index spatial panel data and multi-index spatial panel data. Single-index panel data is represented by a two-dimensional table or matrix, as follows:
Figure BDA0002982864260000111
where the population contains N samples, X denotes the characteristic index of each sample, T is the time length, and $x_i(t)$ is the index value of the i-th sample at time t.
Because real situations are complex, the object of study in practice is often multi-index panel data, whose structure is more complex than that of traditional panel data; its temporal and spatial characteristics are usually represented by a three-dimensional table and can sometimes be represented in matrix form.
Assuming an overall sample X comprising N samples, each having P characteristic indexes, with T being the time length, the matrix of the multi-index panel sample X is represented as:
Figure BDA0002982864260000112
The overall sample X contains data in three dimensions: space (the samples), time and the indexes. It can be reduced along the spatial dimension, i.e. expressed as a group of "spatial samples" by unfolding the three-dimensional table into two-dimensional tables by space, $X_S = [X_1, \ldots, X_i, \ldots, X_N]^T$. The matrix of one spatial sample $X_i$ of the sample X is:
Figure BDA0002982864260000113
where $1 \le i \le N$, $1 \le j \le P$, $1 \le t \le T$, and $x_{ij}(t)$ is the value of the j-th index of the i-th sample at time t.
Sample X can also be expressed in the index dimension as a group of indexes, i.e. the three-dimensional table is unfolded into two-dimensional tables by index, $X_V = [X_1, \ldots, X_j, \ldots, X_P]$. The matrix of one index $X_j$ of the sample X is:
Figure BDA0002982864260000115
Sample X can be represented in the time dimension as a set of "ordered samples", that is, the three-dimensional table is unfolded chronologically into two-dimensional tables, i.e.:
$X_O = [X(1), \ldots, X(t), \ldots, X(T)]$
The matrix of one ordered sample $X(t)$ of the sample X is:
Figure BDA0002982864260000121
The main numerical features of the sample are:
① Mean of the j-th index at time t:
Figure BDA0002982864260000122
② Mean of the j-th index:
Figure BDA0002982864260000123
③ Variance of the j-th index at time t:
Figure BDA0002982864260000124
④ Variance of the j-th index:
Figure BDA0002982864260000125
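The three unfoldings and the four numerical features described above can be sketched with a three-dimensional NumPy array. The array layout `X[i, j, t]`, the toy sizes and the random values are assumptions for illustration; the normalisation of the variance (`ddof=1`, the sample variance) is also an assumption, because the source formulas are available only as images.

```python
import numpy as np

# Hypothetical multi-index panel: N samples, P indexes, T time points.
N, P, T = 4, 2, 3
rng = np.random.default_rng(0)
X = rng.random((N, P, T))            # X[i, j, t]: index j of sample i at time t

spatial_sample = X[0]                # X_i : P x T matrix of one spatial sample
index_slice = X[:, 0, :]             # X_j : N x T matrix of one index
ordered_sample = X[:, :, 0]          # X(t): N x P matrix at one time point

mean_jt = X.mean(axis=0)             # (P, T): mean of index j at time t
mean_j = X.mean(axis=(0, 2))         # (P,):   mean of index j over samples and time
# ddof=1 (sample variance) is an assumption; ddof=0 gives the population variance.
var_jt = X.var(axis=0, ddof=1)       # (P, T): variance of index j at time t
var_j = X.var(axis=(0, 2), ddof=1)   # (P,):   variance of index j
```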
Compared with traditional time series and cross-section data, spatio-temporal panel data can predict the situation of a future period more accurately and quickly, and combining it with a Bayesian network can improve the accuracy of prediction and early warning in fields with uncertainty.
1.3 Bayesian network-based space-time early warning algorithm
An infectious disease early warning model based on the Bayesian network is established using existing knowledge. It mainly comprises data preprocessing, establishment of the Bayesian network for infectious disease outbreak risk, calculation of the outbreak risk probability, network verification and the like. Establishing the Bayesian network is the crucial step and determines whether the early warning model succeeds; once the network structure that best fits the actual disease situation is found, the joint probability distribution of the nodes is calculated to predict the outbreak risk of the infectious disease.
An infectious disease is caused not by a single factor but by many related epidemiological, economic, meteorological or environmental factors acting together. When these factors cannot all be acquired, only part of the data can be used, so the factors most relevant to the outbreak and spread of the infectious disease are identified and analyzed. Because the Bayesian model can only process categorical and discrete data, the influencing factors that are continuous variables must be discretized. The equal-width method is used: the number of intervals is specified, and the value range is then divided into that many sub-intervals of equal width to obtain the discretization result.
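A minimal sketch of the equal-width discretization described above follows. The function name `equal_width_discretize`, the choice of three intervals and the temperature values are hypothetical.

```python
import numpy as np

def equal_width_discretize(values, n_bins):
    """Map continuous values to bin labels 0..n_bins-1 using equal-width bins."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Only the interior edges are passed on, so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1]), edges

temps = [2.0, 5.5, 11.0, 14.9, 20.0, 26.0]   # hypothetical mean air temperatures
labels, edges = equal_width_discretize(temps, n_bins=3)
print(labels.tolist(), edges.tolist())
```

With these values the range 2 to 26 is split at 10 and 18, so each interval is 8 units wide.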
The network structure is then learned with an algorithm based on independence tests, which mainly comprises the following steps:
① Initialize the graph structure G = <V, E>, where V = {all attribute fields (nodes) of the data set}, E = { }, S = { }, R = { };
② For each node pair $(v_i, v_j)$, where $v_i, v_j \in V$ and $i \ne j$, calculate the mutual information $I(v_i, v_j)$; pairs whose mutual information exceeds a fixed threshold are added to the set S in order of magnitude;
③ Mark and remove the first node pair in the set S, and put the two corresponding edges into the edge set A;
④ Select the first node pair remaining in the set S; if there is no connecting path between its two nodes, add the node pair to the edge set A, otherwise put the node pair into R;
⑤ Repeat step ④ until S is empty;
⑥ Mark the first node pair in R;
⑦ Take out that node pair and test its conditional independence; if the two nodes are still dependent on each other, add the node pair to the edge set A;
⑧ Repeat steps ⑥ and ⑦ until R is empty;
⑨ For any edge in E, if a path other than that edge exists between its two nodes, temporarily delete the edge from E; then use a conditional independence test to check whether the two nodes are conditionally independent; if so, delete the edge permanently, otherwise add it back to E.
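The first five steps above (the drafting phase) can be sketched as follows. This sketch covers only the mutual information ranking and the path check; the conditional independence tests of the later steps are omitted. The function names, the union-find path check, the descending sort order and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def draft_edges(data, threshold):
    """Drafting phase: data maps a variable name to its list of discrete values."""
    scored = [(mutual_information(data[a], data[b]), a, b)
              for a, b in combinations(sorted(data), 2)]
    # Keep pairs above the threshold, largest first (the order is an assumption).
    S = sorted((s for s in scored if s[0] > threshold), reverse=True)

    parent = {v: v for v in data}           # union-find used as the path check
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges, R = [], []
    for _, a, b in S:
        ra, rb = find(a), find(b)
        if ra != rb:                        # no connecting path yet: add the edge
            edges.append((a, b))
            parent[ra] = rb
        else:                               # already connected: defer the pair to R
            R.append((a, b))
    return edges, R

# Hypothetical discretised variables: B copies A, C is a noisy copy, D is unrelated.
data = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],
    "C": [0, 0, 0, 1, 1, 1, 1, 0],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges, R = draft_edges(data, threshold=0.05)
```

In this toy case the unrelated variable D stays disconnected, and the pair that would close a cycle among A, B and C is deferred to R for the later conditional independence test.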
Friedman proved theoretically that the learning algorithm based on independence tests has the semantic characteristics of the network, and it achieves effective results in practical applications. As shown in FIG. 9, a Bayesian network is a graph structure in which each variable is a node carrying information represented by one or more probability distributions. A variable with no arcs attached has no dependency on other variables; a variable with an associated child or parent node has a probability distribution associated with it.
1.4 infectious disease outbreak risk probability estimation
Once the structure of the Bayesian network early warning model has been constructed, the next step is to calculate the conditional probability table of each node in the network structure. In this example, the parameters of the Bayesian network are learned mainly with a Bayesian formula method, under the assumptions that all variables in the data set are discrete and have no missing values and that the nodes in the network are independent of each other. It mainly includes the following steps:
① Define a variable set N and a data set D, where N has n variables and each variable $X_i$ has $r_i$ possible values, i.e.
Figure BDA0002982864260000141
The data set D has m records and records the infectious disease outbreak risk levels; each record in D contains the values of all variables in N. A Bayesian network structure $B_G$ containing all the variables in N is also defined.
② In the structure $B_G$, each node $X_i$ has a set of parent nodes $\pi_i$. Define $w_{ij}$ as the j-th ($j = 1, 2, \ldots, q_i$) value assignment of $\pi_i$, and $N_{ijk}$ as the number of records in D in which variable $X_i$ takes the value $v_{ik}$ and its parent set $\pi_i$ takes the value $w_{ij}$; then
Figure BDA0002982864260000142
Define the network conditional probability $\theta_{ijk}$ as the conditional probability $P(X_i = v_{ik} \mid \pi_i = w_{ij})$, i.e. the probability that $X_i$ takes the value $v_{ik}$, $k \in [1, r_i]$, when the parent set $\pi_i$ of node $X_i$ takes the value $w_{ij}$.
Given the data set D and the Bayesian network structure $B_G$, the expected value of $\theta_{ijk}$ is calculated as:
Figure BDA0002982864260000143
The variance of $\theta_{ijk}$ is calculated as:
Figure BDA0002982864260000144
In parameter learning, it is usually necessary to calculate $P(N_1 \mid N_2)$ to infer the probability of an event occurring, where $N_1$ and $N_2$ are two different sets of variables: $N_1$ is the infectious disease outbreak risk level, and $N_2$ is the set of environmental, climatic and economic factor variables associated with the outbreak. In other words, the probability of each risk level of the infectious disease is calculated given the values of the related factor variables. If $N_2$ is known, the expected value $E[P(N_1 \mid N_2)]$ of this probability is calculated; it depends only on the likelihood value of $N_1$. Then, given the data set D and the Bayesian network structure $B_G$, $E[P(N_1 \mid N_2)]$ is calculated as follows:
$E[P(N_1 \mid N_2) \mid D, B_G] = P(N_1 \mid N_2, D, B_G)$
where $P(N_1 \mid N_2, D, B_G)$ can be obtained through Bayes' formula and repeated sum-of-products computations over the Bayesian network. The same calculation also yields the probability estimate of each node (variable) in the network, and this estimate is its expected value.
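The parameter estimation of this section can be sketched under one explicit assumption: a uniform Dirichlet prior over each conditional distribution, for which the standard results are $E[\theta_{ijk}] = (N_{ijk} + 1)/(N_{ij} + r_i)$ and $\mathrm{Var}[\theta_{ijk}] = (N_{ijk} + 1)(N_{ij} + r_i - N_{ijk} - 1) / ((N_{ij} + r_i)^2 (N_{ij} + r_i + 1))$, with $N_{ij} = \sum_k N_{ijk}$. The source formulas are available only as images, so this may differ from the exact form used there. The function name, variable names and records are hypothetical.

```python
from collections import Counter

def estimate_cpt(records, child, parents, child_values):
    """Posterior mean and variance of theta_ijk under a uniform Dirichlet prior."""
    r = len(child_values)                                   # r_i
    n_ijk = Counter((tuple(rec[p] for p in parents), rec[child]) for rec in records)
    n_ij = Counter(tuple(rec[p] for p in parents) for rec in records)
    mean, var = {}, {}
    for w, total in n_ij.items():                           # w is one parent assignment
        for v in child_values:
            a = n_ijk[(w, v)] + 1                           # N_ijk + 1
            s = total + r                                   # N_ij + r_i
            mean[(w, v)] = a / s
            var[(w, v)] = a * (s - a) / (s * s * (s + 1))
    return mean, var

# Hypothetical discretised records: one parent ("temp") and one child ("risk").
records = (
    [{"temp": "high", "risk": "high"}] * 3
    + [{"temp": "high", "risk": "low"}]
    + [{"temp": "low", "risk": "low"}] * 2
)
mean, var = estimate_cpt(records, "risk", ["temp"], ["high", "low"])
print(round(mean[(("high",), "high")], 3))
```

The added pseudo-count keeps value combinations that never occur in D from receiving a probability of exactly zero.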
1.5 introduction of related data
① Pathogenic indexes: data such as the virus detection rate and the severe-case and death rates, which generally need to be provided by professional institutions.
② Demographic indexes: the population density of the susceptible population (total number of susceptible people / area), which can be adjusted by region according to the population flow of the specific region.
③ Meteorological indexes: indexes such as sunshine days, air temperature difference, average air temperature and average wind speed are studied. The data mainly comes from the China weather data sharing service network and is obtained by inverse distance weighting interpolation from the data of 756 stations nationwide.
④ Economic condition indexes: the economy reflects regional development and also affects the prevalence and spread of disease to some extent. This example mainly takes the urbanization level (town population / total population) as the economic index, with data from the Chinese economic statistics database.
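The inverse distance weighting interpolation mentioned for the meteorological indexes can be sketched as follows. The function name `idw`, the power parameter of 2 and the planar coordinates are assumptions; real station coordinates given as latitude and longitude would need a geographic distance instead.

```python
import numpy as np

def idw(stations, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from station observations."""
    stations = np.asarray(stations, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(stations - np.asarray(target, dtype=float), axis=1)
    if np.any(d == 0):                      # the target coincides with a station
        return float(values[d == 0][0])
    w = 1.0 / d ** power                    # nearer stations get larger weights
    return float(np.sum(w * values) / np.sum(w))

# Hypothetical stations on a plane with their average air temperatures.
stations = [(0.0, 0.0), (2.0, 0.0)]
temps = [10.0, 20.0]
midpoint = idw(stations, temps, (1.0, 0.0))
at_station = idw(stations, temps, (0.0, 0.0))
```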
1.6 spatial aggregative predictor indices
The incidence of hand-foot-and-mouth disease differs by region and by month, so spatial clustering detection is required. Considering the two indexes incidence S and severe-case rate Q together, this embodiment uses the clustering method for multi-index spatial panels, performs the clustering in the SPSS analysis software, and considers the following three kinds of information:
① The incidence and severe-case rate data themselves, i.e. the actual incidence of hand-foot-and-mouth disease.
② The change of the incidence and severe-case rate over time, i.e. the increment index.
③ The rate or speed of change of the increments of the incidence and severe-case rate, i.e. the increment change rate index.
The time series of the level index, the increment index and the increment change rate index of the incidence and severe-case rate are considered together. The main formulas are as follows:
The level indicators, i.e. the data S and Q themselves:
Figure BDA0002982864260000151
The increment indicators, namely:
Figure BDA0002982864260000161
The increment change rate indicators, namely:
Figure BDA0002982864260000162
The Euclidean distances between regions are then calculated on these indicators to perform hierarchical (system) clustering, yielding regions with similar risk levels; the risk level of the disease is then calculated according to the meteorological indexes and population flow conditions.
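The three kinds of indicators and the distance used for clustering can be sketched as follows. The source formulas are available only as images, so the definitions here are assumptions: the increment is taken as the first difference, the increment change rate as the increment divided by the previous level, and the three Euclidean distances are summed without weights. The function names and the toy incidence values are hypothetical.

```python
import numpy as np

def panel_features(series):
    """series: array of shape (regions, T) for one index, e.g. the incidence S."""
    series = np.asarray(series, dtype=float)
    level = series                                  # the data itself
    increment = np.diff(series, axis=1)             # x(t) - x(t-1)
    # Increment relative to the previous level; this definition is an assumption
    # and requires non-zero levels.
    rate = increment / series[:, :-1]
    return level, increment, rate

def pairwise_distance(features):
    """Unweighted sum of Euclidean distances over the feature blocks."""
    n = features[0].shape[0]
    dist = np.zeros((n, n))
    for f in features:
        diff = f[:, None, :] - f[None, :, :]
        dist += np.sqrt((diff ** 2).sum(axis=2))
    return dist

# Hypothetical monthly incidence for three regions.
S = np.array([[1.0, 2.0, 4.0],
              [1.0, 2.0, 4.0],
              [2.0, 2.0, 2.0]])
dist = pairwise_distance(panel_features(S))
```

The resulting distance matrix is the input a hierarchical clustering routine would take; regions 0 and 1 have distance zero here and would be merged first.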
Example 2 Mental health early warning
Online questionnaires are an effective way to screen for depression, and students' self-assessment data can be collected online with the Patient Health Questionnaire depression scale (PHQ-9). However, this is time-consuming and labor-intensive, lacks timeliness and reliability, and yields data of limited quality and quantity. Psychological research shows that real-time screening for depression using data from social media such as WeChat and microblogs is feasible and accurate.
Therefore, this example takes the characteristics of students into account and uses social media data to construct student word clouds. On this basis, spatio-temporal information is obtained by combining campus card data, internet data, mobile terminal data, access records, consumption records, video monitoring, GPS (Global Positioning System) data, campus Wi-Fi access logs and the like; the behavior tracks of the students are analyzed; and student portraits and information and behavior elements are constructed from the word clouds and behavior tracks.
Finally, the deep Bayesian model is used for early warning based on the students' social network, word cloud, information and behavior elements and other data. Students whose early warning value exceeds a threshold are displayed as objects of attention for the school, so that psychological or behavioral abnormalities can be found in advance and counseling and prevention work can be carried out.
2.1 building word clouds
1) Emotion dictionary construction
On the basis of an existing, relatively complete general emotion dictionary, an emotion dictionary related to depression is constructed and divided into a positive dictionary and a negative dictionary.
The depression topic pages and the contents posted in them are crawled as the candidate negative dictionary, and randomly selected microblog contents are crawled as the candidate positive dictionary. Both candidate dictionaries are then cleaned, with emoticon characters retained to improve the ability to analyze microblog emoticons and internet buzzwords. The cleaned data is compared with the data in the emotion dictionary using the TF-IDF algorithm, and words with high similarity are added to the corresponding dictionary.
For the text part, the student's registered basic information is first retrieved, and the student's microblog content and WeChat friend circle content are crawled. Data preprocessing is then performed: microblog topics, friend circle advertisements, links and similar information are removed, and pictures are put into a picture library. Finally, the microblog and friend circle text is segmented with a natural language processing word segmentation technique and compared with the emotion dictionary using the TF-IDF algorithm, in order to optimize the negative and positive dictionaries.
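The TF-IDF comparison described above can be sketched as follows. The source does not specify the TF-IDF variant or the similarity measure; this sketch assumes the unsmoothed form tf × log(n / df) and cosine similarity, and treats each segmented post or dictionary as one token list. The function names and the English tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical segmented texts: a post, a positive text and a negative text.
docs = [["sad", "tired", "sad"], ["happy", "sunny"], ["sad", "lonely"]]
vectors = tf_idf(docs)
sim_to_positive = cosine(vectors[0], vectors[1])
sim_to_negative = cosine(vectors[0], vectors[2])
```

In this toy case the post shares a term only with the negative text, so only that similarity is above zero.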
2) Text sentiment analysis based on LSTM
This embodiment uses the open-source framework Word2Vec to represent words as high-dimensional vectors, so that words with similar meanings are placed at nearby positions; words with similar meanings are then found using Euclidean distance or cosine similarity, which addresses the problem of one meaning being expressed by multiple words.
The word vectors of a segmented sentence are combined into a matrix, and a recurrent neural network or a recursive neural network (both abbreviated RNN) encodes this matrix input into a lower-dimensional one-dimensional vector while retaining most of the useful information; text emotion analysis is then carried out in combination with the emotion dictionary.
3) Image emotion analysis
The data in the picture library is labeled manually as negative or positive, and the image classification model VGGNet from computer vision is then trained on it to obtain a picture emotion classification model.
In this embodiment, the emotion dictionary and the picture library are each divided into a training set and a test set in a 7:3 ratio, and the text emotion analysis model and the picture emotion analysis model are obtained through training.
On this basis, emotion analysis is performed on students' friend circle and microblog content by combining the emotion dictionary and the picture library, and the word cloud is constructed.
4) Emotion value calculation method
For the word cloud of the student, the example calculates the emotion values of a friend circle and a microblog of the student by using a weighted average method:
Figure BDA0002982864260000171
where $N_p$ and $N_n$ denote the numbers of positive and negative words respectively, $wp_i$ and $wp_j$ the weights of the positive and negative words, $M_p$ and $M_n$ the numbers of positive and negative words respectively, and $wp_a$ and $wp_b$ the weights of the positive and negative vocabulary respectively.
2.2 student trajectories
The action tracks of students or teaching staff are analyzed from their entry and exit records, consumption records and video monitoring, together with data such as mobile terminal GPS, campus Wi-Fi access logs and campus cards. Track similarity is calculated from the Hausdorff distance; generally, the higher the similarity, the closer the relationship. The movement track sequences of the users are compared pairwise to obtain an intimacy value between each pair of users, density clustering is then performed with an intimacy threshold of 0.4 to obtain user groups with social relationships, and the groups are labeled. A student digital portrait is thus constructed that represents the students' behavior patterns, such as behavior habits, lifestyle, consumption level, network behavior and learning state.
The similarity measure between tracks is the basis of track data mining and querying. For any two tracks $T_a$ and $T_b$, let the distance between them be $Dist(T_a, T_b)$; a distance of 0 means that the two tracks are identical, and a larger distance means lower similarity (higher dissimilarity). The closest-pair distance (CPD) measures the distance between two tracks as the minimum distance between position points of the two tracks. The CPD value between $T_a$ and $T_b$ is calculated as follows:
$$\mathrm{CPD}(T_a, T_b) = \min_{loc \in T_a,\; loc' \in T_b} dist(loc, loc')$$
where $dist(loc, loc')$ denotes the Euclidean distance between two location points $loc$ and $loc'$.
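For illustration only, the closest-pair distance defined above can be computed by brute force as sketched below. The function name and the representation of a trajectory as a list of coordinate tuples are assumptions of the sketch.

```python
import math


def closest_pair_distance(track_a, track_b):
    """Minimum Euclidean distance over all point pairs (loc in A, loc' in B)."""
    return min(math.dist(loc, loc2) for loc in track_a for loc2 in track_b)
```

Because only the single closest pair of points is considered, this measure evaluates to 0 for any two trajectories that share at least one location point, so it is best read as a coarse proximity indicator. The brute-force form costs one distance evaluation per point pair, which is acceptable for short trajectories.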
2.3 social network
Students serve as the nodes of the social network. A connection is established between two nodes when the trajectory similarity between the two students exceeds 0.5, and the weight of the connection is that trajectory similarity. The resulting social network of all students is shown in fig. 10(a): each node represents a student, the shade and color of a node indicate the student's class, and the size of a node reflects its degree, that is, the number of nodes connected to it. Note that the figure shows the network topology of the student social network, not a mapping of the student vectors onto a two-dimensional plane. Most students are clearly distributed in clusters organized by class, but there are also a number of isolated students. The distribution of node sizes shows that individual sociability differs greatly among students: there are large nodes at the centers of clusters and small, isolated nodes that are hard to spot. Fig. 10(b) shows the social network distinguished by student gender. The social circles of male and female students are largely separate: apart from campus romantic relationships, male and female students essentially form their own clusters.
Combined with common sense, figs. 10(a) and 10(b) indirectly verify the accuracy of the student vector calculation. The student social network can reveal how isolated a student is, and in this example the computed student isolation is converted into an input for the mental health early warning based on the deep Bayesian network.
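The network construction of this section can be sketched as follows, purely as an illustration. The 0.5 threshold comes from the description; the function names, the dictionary-based graph representation and the use of "degree 0" as the isolation criterion are assumptions of the sketch.

```python
def build_social_graph(pair_similarity, threshold=0.5):
    """Build a weighted undirected graph from pairwise trajectory similarities.

    pair_similarity maps (student_u, student_v) tuples to a similarity value.
    An edge is created only when the similarity exceeds the threshold, and
    the edge weight is the similarity itself.
    """
    graph = {}
    for (u, v), sim in pair_similarity.items():
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        if sim > threshold:
            graph[u][v] = sim
            graph[v][u] = sim
    return graph


def degrees(graph):
    """Degree of each node, i.e. the number of nodes connected to it."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def isolated_students(graph):
    """Students with no connection at all (assumed isolation criterion)."""
    return sorted(node for node, neighbors in graph.items() if not neighbors)
```

The degree returned by `degrees` corresponds to the node size in fig. 10(a). A graded isolation score, for example one based on the sum of edge weights, would be an equally plausible choice for the downstream early warning input.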
2.4 mental health early warning method
A deep Bayesian network is established with reference to the risk probability estimation method of section 1.4. Different weights are set for the students' word-cloud emotion, social network and user portrait, which are then used as input features of the deep Bayesian network to train the model. The mental health early warning value lies between 0 and 1, and an early warning is issued when it exceeds 0.6.
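To make the thresholding step concrete, the following sketch converts a prior and evidence likelihoods into a posterior risk value in [0, 1] and compares it with the 0.6 threshold. It is a deliberately simplified stand-in that assumes conditionally independent evidence (a naive Bayes structure), not the deep Bayesian network of this example; the function names and all numeric values are hypothetical.

```python
def posterior_risk(prior, likelihoods):
    """Posterior P(risk | evidence) by Bayes' rule.

    prior       -- P(risk) before any evidence is observed
    likelihoods -- list of (P(e | risk), P(e | no risk)) pairs, one per
                   observed piece of evidence, assumed conditionally
                   independent given the risk state (a simplification)
    """
    p_risk, p_safe = prior, 1.0 - prior
    for given_risk, given_safe in likelihoods:
        p_risk *= given_risk
        p_safe *= given_safe
    return p_risk / (p_risk + p_safe)


def should_alert(score, threshold=0.6):
    """Issue an early warning when the score exceeds the threshold."""
    return score > threshold


# Hypothetical inputs: a 10% prior and two pieces of evidence, for example
# negative word-cloud emotion and an isolated position in the social network.
score = posterior_risk(0.1, [(0.8, 0.2), (0.7, 0.1)])
alert = should_alert(score)
```

With these hypothetical numbers the first piece of evidence alone leaves the posterior below the threshold, while the two together push it above, which illustrates why several feature sources are combined before the warning decision.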
Example 3 Campus deception incident early warning
Behavior elements and information elements of past campus deception events are obtained and analyzed according to the methods of sections 2.1 to 2.3. Combined with student information, the personality, consumption, behavioral habits, learning state, psychological condition and so on of the students involved are analyzed to construct a user portrait of deception-involved students.
A deep Bayesian network is constructed with reference to the risk probability estimation method of section 1.4, feature vectors are constructed from the student user portraits, behavior elements and information element characteristics, and a campus deception early warning model is obtained by training. An alarm is raised when the risk value exceeds 0.5; the students concerned are given appropriate attention, and psychological counseling, home visits or disciplinary action are carried out if necessary.
The foregoing description of the embodiments is provided to enable a person of ordinary skill in the art to understand and apply the invention. Those skilled in the art can readily make various modifications to these embodiments and apply the general principles described herein to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure shall fall within the protection scope of the present invention.

Claims (10)

1. A public digital life scene rule model prediction early warning method based on a deep Bayesian network, comprising the following steps:
(1) obtaining massive multi-source heterogeneous data through three access channels, namely the Internet of Things, application terminals and service systems, and establishing a database;
(2) layering the database, and constructing a subject database of five basic elements, namely people, enterprises, places, matters and things;
(3) processing multi-source heterogeneous data by adopting a batch-flow type big data real-time processing technology;
(4) combining the five basic element subject libraries with a specific application scene to construct five dimensions of the user digital portrait under the specific application scene: demographic attributes, life attributes, social attributes, consumption characteristics, psychological attributes;
(5) according to the processed multi-source heterogeneous data, constructing a user digital portrait by data mining and analyzing a user label;
(6) aiming at a specific application scene, training a deep Bayesian network by using user digital portrait information to obtain an event risk prediction model under the scene, and then predicting and early warning risks existing in a target event by using the model.
2. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: the multi-source heterogeneous data in step (1) comprises structured data and unstructured data; the structured data comprises basic data, including basic information such as houses and addresses, and extended data, including vehicle access information and Internet of Things perception information; and the unstructured data comprises life event information collected by personnel and video surveillance data, audio data and image data collected by equipment such as cameras.
3. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: the batch-flow type big data real-time processing technology in step (3) comprises five functional modules: data acquisition, data loading, data bus, data analysis and business service; the data acquisition module is responsible for accessing stream data in real time through Internet of Things acquisition and application-end acquisition; the data loading module is responsible for loading historical offline data and accessing stream data from the service system; the data bus module is responsible for placing the various data into designated channels for transmission in a uniform format; the data analysis module is responsible for extracting and processing real-time data and pushing product data; and when a real-time query request sent by the service system is received, the data analysis module uses an internal analysis processing model to calculate and evaluate the corresponding index in real time over the complete big data set, and the result is fed back to the service system through the business service module.
4. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: the demographic attributes in step (4) describe the basic characteristic information of the user at the social level and help each important life application scene understand the user's basic situation; the life attributes are used to understand the user's living conditions, including the range of daily activity and modes of travel, so that accurate services can subsequently be provided to the user; the social attributes describe the user's social graph, family members, friend circle and interests, which generally represent the user's social relationship network, so that the user can be understood as completely as possible and provided with personalized services; the consumption characteristics describe the user's main consumption habits and preferences and are used to mine potential users of related consumption services and to recommend related products and services according to the user's consumption characteristics, thereby improving the recommendation conversion rate; and the psychological attributes concern the user's psychological condition, which is obtained through anonymous questionnaires or by clustering similar users, so that corresponding psychological services or attention can be provided according to that condition.
5. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: in step (5), a user label construction mode based on raw data mining and a user label construction mode based on video structuring technology are adopted for the non-video data and the video data in the multi-source heterogeneous data, respectively; for non-video data, the mode based on raw data mining combines five methods: natural language processing, user intention identification, association rules, cluster analysis and track similarity; where data of a specific dimension is missing for a specific user, a collaborative filtering algorithm completes the missing features by analyzing other similar users, thereby ensuring the completeness of the user labels; and for video data, the mode based on video structuring technology combines three methods: target detection, OpenCV + CNN emotion recognition and GaitSet gait recognition.
6. The public digital life scene rule model prediction early warning method as claimed in claim 5, wherein: the natural language processing uses the TF-IDF algorithm to calculate the similarity between texts, then uses a fastText classifier to classify the texts according to the similarity, and finally uses Word2Vec to extract the word vectors of the texts, uses an LSTM to fuse the word vectors into sentence vectors, and inputs the sentence vectors into a pre-trained recurrent neural network or recursive neural network, thereby predicting and analyzing the emotion expressed by similar texts.
7. The public digital life scene rule model prediction early warning method as claimed in claim 5, wherein: the user intention identification judges the behavior intention of the user from the user's search records or the analyzed user labels; in the specific implementation, the TF-IDF algorithm is used to vectorize the data, feature selection is performed using word frequency, chi-square and mutual information, and finally a pre-trained CART decision tree, a random forest containing a plurality of decision trees, a logistic regression or a Bayesian model is used to judge the behavior intention of the user.
8. The public digital life scene rule model prediction early warning method as claimed in claim 5, wherein: the association rules are used to discover associations between data that appear unrelated on the surface, so as to find the regularities and development trends among the data, and an Apriori algorithm or an FP-Growth algorithm is adopted in the specific implementation; the cluster analysis groups similar data into the same class such that, in principle, the similarity within each class is maximized, and clustering, as an unsupervised algorithm, is suitable for analyzing high-dimensional data; and the track similarity analyzes behavior tracks in the time domain and the space domain, mines the user's daily behavior patterns and preferences from historical behavior tracks, and labels them.
9. The public digital life scene rule model prediction early warning method as claimed in claim 5, wherein: the OpenCV + CNN emotion recognition is used to detect the expression state of a face in a video image, and the specific implementation comprises first detecting and locating the face, then extracting facial expression features, and finally using a pre-trained convolutional neural network (CNN) to classify and judge the facial expression; the GaitSet gait recognition is used to detect the walking posture of a person in a video image, and in the specific implementation the images are first input into a convolutional neural network (CNN) to extract features, a pooling scheme that integrates multiple features then aggregates the features of the images into a feature vector, and Horizontal Pyramid Pooling is adopted to make the features more discriminative; a two-stream method is adopted in the prediction calculation, comprising two channels: an RGB image channel used for modeling spatial information and an optical-flow channel in which an RNN models temporal information; the two channels are jointly trained and their information is fused, and finally the features are input into the trained model to realize gait recognition.
10. The public digital life scene rule model prediction early warning method as claimed in claim 1, wherein: the training and prediction process of the deep Bayesian network in step (6) is as follows: first, the user digital portrait information in the specific application scene is analyzed, the various information elements and behavior elements related to an event are obtained, the association relationships among the elements of the event are determined, and a feature sample library is established based on the information elements and behavior elements of the event; the feature samples are then combined with expert opinion to determine the prior probabilities of the network nodes, namely the initial evidence of the risk probability; the feature samples and the initial evidence are input into the network structure, and the conditional probability distributions of the non-root nodes in the network are inferred using the EM (expectation-maximization) algorithm; and finally, based on the Bayesian criterion, the prior probabilities and the conditional probabilities are converted into a posterior probability, namely the predicted probability that the target event risk occurs.
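By way of illustration only, the TF-IDF text-similarity step recited in claim 6 can be sketched as follows. The tokenized inputs, the function names and the particular smoothed IDF formula are assumptions of this sketch and are not specified by the claims; fastText, Word2Vec and the LSTM stage are outside its scope.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    docs is a list of token lists. A smoothed IDF,
    log((1 + N) / (1 + df)) + 1, is assumed so that terms occurring in
    every document still receive a non-zero weight.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Texts with identical token distributions score close to 1 and texts with no terms in common score 0, so the resulting similarity can serve as the grouping signal that precedes classification.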
CN202110292515.3A 2021-03-18 2021-03-18 Public digital life scene rule model prediction early warning method based on deep Bayesian network Active CN113010572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292515.3A CN113010572B (en) 2021-03-18 2021-03-18 Public digital life scene rule model prediction early warning method based on deep Bayesian network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110292515.3A CN113010572B (en) 2021-03-18 2021-03-18 Public digital life scene rule model prediction early warning method based on deep Bayesian network

Publications (2)

Publication Number Publication Date
CN113010572A true CN113010572A (en) 2021-06-22
CN113010572B CN113010572B (en) 2023-04-18

Family

ID=76402593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292515.3A Active CN113010572B (en) 2021-03-18 2021-03-18 Public digital life scene rule model prediction early warning method based on deep Bayesian network

Country Status (1)

Country Link
CN (1) CN113010572B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610309A (en) * 2021-08-13 2021-11-05 清华大学 Fire station site selection method and device based on big data and artificial intelligence
CN113641831A (en) * 2021-08-16 2021-11-12 中国科学院空天信息创新研究院 Knowledge graph-based forest fire spreading trend prediction method oriented to multi-source discrete data
CN113642986A (en) * 2021-08-02 2021-11-12 上海示右智能科技有限公司 Method for constructing digital notarization
CN115456843A (en) * 2022-09-14 2022-12-09 北京易思汇商务服务有限公司 Intelligent wind control system and method based on study-keeping big data analysis
CN115577289A (en) * 2022-12-08 2023-01-06 工福(北京)科技发展有限公司 Aggregation access management system and method for digital work administration system
CN116071077A (en) * 2023-03-06 2023-05-05 深圳市迪博企业风险管理技术有限公司 Risk assessment and identification method and device for illegal account
CN116151494A (en) * 2023-04-24 2023-05-23 中国科学院地理科学与资源研究所 Data processing method, device, equipment and computer readable storage medium
CN116340619A (en) * 2023-03-01 2023-06-27 复旦大学 Role mining analysis method for online community network spoofing
CN116362549A (en) * 2023-05-22 2023-06-30 北京航天常兴科技发展股份有限公司 Fire disaster prevention and control method based on data information mining technology
CN116823511A (en) * 2023-08-30 2023-09-29 北京中科心研科技有限公司 Method and device for identifying social isolation state of user and wearable device
CN117131944A (en) * 2023-10-24 2023-11-28 中国电子科技集团公司第十研究所 Multi-field-oriented interactive crisis event dynamic early warning method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573411A (en) * 2018-04-17 2018-09-25 重庆理工大学 Depth sentiment analysis and multi-source based on user comment recommend the mixing of view fusion to recommend method
US20190019193A1 (en) * 2017-07-13 2019-01-17 Zeek Mobile Ltd. Systems and methods for detection of online payment mechanism fraud
CN109902216A (en) * 2019-03-04 2019-06-18 桂林电子科技大学 A kind of data collection and analysis method based on social networks
CN110290120A (en) * 2019-06-12 2019-09-27 西安邮电大学 A kind of timing evolved network safe early warning method of cloud platform
CN108234463B (en) * 2017-12-22 2021-02-02 杭州安恒信息技术股份有限公司 User risk assessment and analysis method based on multi-dimensional behavior model
CN112434814A (en) * 2020-12-07 2021-03-02 中国人民解放军国防科技大学 Method for analyzing shipping economic potential based on multi-source heterogeneous information fusion algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HANQING CHAO: "GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition", 《HTTPS://ARXIV.ORG》 *
WATERSINK: "步态识别之GaitSet", 《CSDN博客》 *
何福贵: "《Python深度学习 逻辑、算法与编程实战》", 30 September 2020 *
刘峡壁: "《人工智能——机器学习与神经网络》", 31 August 2020 *
张秀伟: "Web服务个性化推荐研究综述", 《计算机工程与科学》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642986A (en) * 2021-08-02 2021-11-12 上海示右智能科技有限公司 Method for constructing digital notarization
CN113642986B (en) * 2021-08-02 2024-04-16 上海示右智能科技有限公司 Method for constructing digital notarization
CN113610309A (en) * 2021-08-13 2021-11-05 清华大学 Fire station site selection method and device based on big data and artificial intelligence
CN113641831A (en) * 2021-08-16 2021-11-12 中国科学院空天信息创新研究院 Knowledge graph-based forest fire spreading trend prediction method oriented to multi-source discrete data
CN113641831B (en) * 2021-08-16 2022-04-15 中国科学院空天信息创新研究院 Knowledge graph-based forest fire spreading trend prediction method oriented to multi-source discrete data
CN115456843A (en) * 2022-09-14 2022-12-09 北京易思汇商务服务有限公司 Intelligent wind control system and method based on study-keeping big data analysis
CN115577289A (en) * 2022-12-08 2023-01-06 工福(北京)科技发展有限公司 Aggregation access management system and method for digital work administration system
CN116340619A (en) * 2023-03-01 2023-06-27 复旦大学 Role mining analysis method for online community network spoofing
CN116340619B (en) * 2023-03-01 2023-12-12 复旦大学 Role mining analysis method for online community network spoofing
CN116071077A (en) * 2023-03-06 2023-05-05 深圳市迪博企业风险管理技术有限公司 Risk assessment and identification method and device for illegal account
CN116151494A (en) * 2023-04-24 2023-05-23 中国科学院地理科学与资源研究所 Data processing method, device, equipment and computer readable storage medium
CN116362549A (en) * 2023-05-22 2023-06-30 北京航天常兴科技发展股份有限公司 Fire disaster prevention and control method based on data information mining technology
CN116362549B (en) * 2023-05-22 2023-08-04 北京航天常兴科技发展股份有限公司 Fire disaster prevention and control method based on data information mining technology
CN116823511A (en) * 2023-08-30 2023-09-29 北京中科心研科技有限公司 Method and device for identifying social isolation state of user and wearable device
CN116823511B (en) * 2023-08-30 2024-01-09 北京中科心研科技有限公司 Method and device for identifying social isolation state of user and wearable device
CN117131944A (en) * 2023-10-24 2023-11-28 中国电子科技集团公司第十研究所 Multi-field-oriented interactive crisis event dynamic early warning method and system
CN117131944B (en) * 2023-10-24 2024-01-12 中国电子科技集团公司第十研究所 Multi-field-oriented interactive crisis event dynamic early warning method and system

Also Published As

Publication number Publication date
CN113010572B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant