CN113822498A - Social contradiction index prediction method based on big data - Google Patents

Social contradiction index prediction method based on big data

Info

Publication number
CN113822498A
Authority
CN
China
Prior art keywords
index
events
word
level
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111273135.1A
Other languages
Chinese (zh)
Other versions
CN113822498B (en)
Inventor
陈鹏
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co Ltd
Priority to CN202111273135.1A
Publication of CN113822498A
Application granted
Publication of CN113822498B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a big-data-based social contradiction index prediction method comprising the following steps: step 1, constructing a social contradiction index system, designing the number of levels of the index system and the indexes contained in each level, and determining the score weight of each child index within its parent index; step 2, classifying the collected social contradiction events under the N3 third-level indexes; step 3, calculating the score of each third-level index based on the social contradiction events corresponding to it, and predicting the social contradiction index; and step 4, iteratively optimizing the model. By constructing an index system for the social contradiction index, the severity of contradictions in each social field can be viewed systematically and as a whole, so that fields with serious contradictions can be handled in a targeted manner, greatly saving manpower and material resources.

Description

Social contradiction index prediction method based on big data
Technical Field
The invention relates to the field of big data and social contradiction research, in particular to a social contradiction index prediction method based on big data.
Background
With the continuous development of society and the growth of the population, interactions between people are increasingly frequent and conflicts of various kinds are becoming more prominent. If the development of a social conflict cannot be predicted once it has arisen, it may eventually escalate into a larger contradiction event. In the process of implementing the invention, the inventors found at least the following problems in the prior art: at present, when social-governance personnel judge the likelihood of major contradiction events in a given field, the judgment is made mainly by hand. This approach depends heavily on individual expertise, requires people with different professional backgrounds to handle different social fields, and consumes a large amount of manpower and material resources. In addition, human judgment of which field a social contradiction event belongs to is to some degree subjective and arbitrary, so events cannot be classified accurately and the contradiction index of each social field cannot be evaluated scientifically.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a social contradiction index prediction method based on big data, and by constructing a social contradiction index system, the severity of contradictions in each social field can be seen systematically on the whole, so that the serious contradiction field can be focused and processed in a targeted manner, and manpower and material resources are greatly saved.
The technical scheme is as follows: the invention provides a social contradiction index calculation method based on big data, which comprises the following steps:
step 1, constructing a social contradiction index system, designing the number of levels of the index system and the indexes contained in each level, and determining the score weight of each child index within its parent index; the index system of the social contradiction index comprises N1 first-level indexes, N2 second-level indexes under the N1 first-level indexes, and N3 third-level indexes under the N2 second-level indexes; the score weight of each third-level index within its second-level index and the score weight of each second-level index within its first-level index are determined.
Step 2, classifying the collected social contradiction events under the N3 third-level indexes, wherein each social contradiction event mainly comprises the content, title and classification of the event; text cleaning is performed on the collected social contradiction events to remove invalid information.
Aiming at each three-level index, screening out the social contradiction events containing the three-level index from the collected social contradiction events, and combining the title, classification and content of each event in the screened social contradiction events to form new text content of the event; and performing word segmentation processing on the new text content by using a Baidu LAC model, and obtaining a word segmentation result through part of speech screening and stop word removal.
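The text-merging and filtering step above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the segmenter (for example Baidu LAC) has already produced (word, part-of-speech tag) pairs, and the retained tag set `KEEP_POS` and the stop-word list `STOP_WORDS` are hypothetical, since the text does not list them.

```python
# Hypothetical settings: the source does not say which tags or stop words are used.
KEEP_POS = {"n", "v", "vn", "a"}
STOP_WORDS = {"的", "了", "是"}

def build_event_text(title, category, content):
    """Merge title, classification and content into one text for segmentation."""
    return " ".join(part for part in (title, category, content) if part)

def filter_tokens(tagged_tokens, keep_pos=KEEP_POS, stop_words=STOP_WORDS):
    """Keep words whose tag is retained and that are not stop words."""
    return [word for word, pos in tagged_tokens
            if pos in keep_pos and word not in stop_words]

text = build_event_text("学校收费", "教育", "学校的收费是过高")
# Pretend output of a segmenter for the content part, as (word, tag) pairs.
tagged = [("学校", "n"), ("的", "u"), ("收费", "vn"), ("是", "v"), ("过高", "a")]
tokens = filter_tokens(tagged)
```

Keeping the filter separate from the segmenter makes it possible to change the tag set or stop-word list without re-running segmentation.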
For each third-level index, calculating the weight w_t of each word in the word segmentation results under that third-level index, and selecting the top k words with the largest weights as the classification keywords of the third-level index to construct a keyword dictionary.
According to the keyword dictionary, similarity calculation is carried out on all the collected social contradiction events, and a third-level index with the maximum similarity is selected as a third-level index of the social contradiction events, and the specific method comprises the following steps:
first, performing word segmentation on the content of each social contradiction event using the Baidu LAC model, and filtering by part of speech and removing stop words to obtain a new word segmentation result; de-duplicating all words in the new word segmentation result together with all words in the third-level index keyword dictionary to form a bag of words, and numbering each word in the bag; combining the words in the new word segmentation result and the words in the third-level index keyword dictionary into a word set, and converting the words in the word set into word vectors using the numbered bag of words, in the following form:
[(N1,C1),(N2,C2),...(Nn,Cn)];
where N_i is the number of the word in the bag of words, and C_i is the number of times the word occurs in the word set.
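The bag-of-words conversion can be illustrated with a short sketch. The function names are hypothetical, numbering from 1 is an assumption, and the sketch builds one vector for the event and one for the keyword dictionary, which is one reading of the word-set wording above.

```python
from collections import Counter

def build_bag(event_tokens, dictionary_words):
    """De-duplicate all words and give each a number (starting at 1, an assumption)."""
    bag = {}
    for word in list(event_tokens) + list(dictionary_words):
        if word not in bag:
            bag[word] = len(bag) + 1
    return bag

def to_sparse_vector(tokens, bag):
    """Return [(N_i, C_i), ...]: word number and occurrence count, sorted by number."""
    counts = Counter(tokens)
    return sorted((bag[word], count) for word, count in counts.items() if word in bag)

event_tokens = ["fee", "school", "fee"]
keywords = ["fee", "tuition"]
bag = build_bag(event_tokens, keywords)
event_vec = to_sparse_vector(event_tokens, bag)   # [(1, 2), (2, 1)]
index_vec = to_sparse_vector(keywords, bag)       # [(1, 1), (3, 1)]
```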
Based on the converted word vectors, the weight TF_IDF_{t,e} of each component of each word vector is calculated using the TF-IDF algorithm, generating a weighted normalized vector.
Calculating the cosine similarity cos(X, Y) between the normalized vector of each event and the normalized vector of each third-level index:
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where X_i is the weight of the i-th word of the normalized vector X of the event, and Y_i is the weight of the i-th word of the normalized vector Y of the third-level index.
The third-level index with the greatest similarity is selected as the third-level index to which the event belongs.
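The similarity-based assignment can be sketched as below. Vectors are represented as word-to-weight dictionaries for readability; the helper names and the example weights are illustrative assumptions, and returning 0 for an all-zero vector is a choice made here, not something the text specifies.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(weight * y.get(word, 0.0) for word, weight in x.items())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (norm_x * norm_y)

def classify(event_vec, index_vecs):
    """Return the name of the third-level index with the greatest similarity."""
    return max(index_vecs, key=lambda name: cosine_similarity(event_vec, index_vecs[name]))

event = {"fee": 0.8, "school": 0.6}
indexes = {
    "education fees": {"fee": 1.0},
    "education fairness": {"fairness": 1.0},
}
best = classify(event, indexes)
similarity = cosine_similarity(event, indexes["education fees"])
```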
Step 3, calculating the score of each third-level index based on the social contradiction events corresponding to it, and predicting the social contradiction index.
Training the extremely severe event risk model Model1: according to the severity of the contradiction events, the extremely severe events in the most recent period are selected from the social contradiction events corresponding to each third-level index, and the features of all extremely severe events under that third-level index in the period before the events occurred are computed as positive examples; to avoid class imbalance, non-severe events from the same period, twice as many as the extremely severe events, are randomly selected, and the features of all non-severe events under the third-level index in the period before the events occurred are computed as negative examples, thereby obtaining the training samples.
According to the severity of the contradiction events, extremely severe events in the most recent period are selected from the social contradiction events corresponding to each third-level index as positive examples, and twice as many non-severe events from the same period are randomly selected as negative examples to avoid class imbalance; these serve as the sample labels. For each selected event, the features of all social contradiction events under the third-level index in the period before the event occurred are computed and used as the sample features.
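The 2:1 sampling of negative examples can be sketched as follows. The feature function `window_features` and the field names `prior_count` and `prior_severe` are hypothetical placeholders, because the text does not specify which statistics of the preceding period are used.

```python
import random

def window_features(event):
    """Hypothetical features: statistics of the period before the event occurred."""
    return [event["prior_count"], event["prior_severe"]]

def build_training_samples(severe, non_severe, ratio=2, seed=0):
    """Label severe events 1 and a random sample of `ratio` times as many non-severe events 0."""
    rng = random.Random(seed)
    k = min(len(non_severe), ratio * len(severe))
    negatives = rng.sample(non_severe, k)
    features = [window_features(e) for e in severe + negatives]
    labels = [1] * len(severe) + [0] * len(negatives)
    return features, labels

severe_events = [{"prior_count": 12, "prior_severe": 3},
                 {"prior_count": 9, "prior_severe": 2}]
other_events = [{"prior_count": i, "prior_severe": 0} for i in range(10)]
X, y = build_training_samples(severe_events, other_events)
```

Fixing the random seed keeps the sampled negative set reproducible between runs, which helps when comparing retrained models.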
An LR (logistic regression) model is trained to obtain the extremely severe event risk model Model1; the weight coefficient of each feature of the model is then adjusted as appropriate so that the model is better suited to predicting contradiction events.
Using the trained Model1 and the data of the social contradiction events under each third-level index in the most recent period, the probability that an extremely severe event will occur is predicted and used as the score of the corresponding third-level index.
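A minimal sketch of training a logistic regression model and using its predicted probability as an index score is given below. It uses plain batch gradient descent in pure Python so that it is self-contained; the learning rate, epoch count and the single toy feature are assumptions, and the manual adjustment of feature weights mentioned above is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Predicted probability of an extremely severe event, used as the index score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: one feature already scaled to [0, 1] (an assumption for this sketch).
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_lr(X, y)
score_low = predict_proba([0.1], w, b)
score_high = predict_proba([0.9], w, b)
```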
Calculating the scores of the indexes at all levels: the score of each second-level index is calculated from the scores of its third-level indexes and the score weight of each child index within its parent index, and the score of each first-level index is then calculated in the same way.
The social contradiction index is treated as the zero-level index and is predicted from the weights and scores of the first-level indexes.
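The level-by-level aggregation can be illustrated as a recursive weighted sum over the index tree. The weights, scores and the name "medical fees" below are made-up values for illustration, and treating each parent score as a plain weighted sum of its children is an assumption consistent with, but not spelled out in, the text.

```python
def node_score(node):
    """Leaf: its own score. Parent: weighted sum of its children's scores."""
    if "children" not in node:
        return node["score"]
    return sum(child["weight"] * node_score(child) for child in node["children"])

index_tree = {
    "name": "social contradiction index",  # zero-level index
    "children": [{
        "name": "material contradictions", "weight": 1.0,  # first level
        "children": [
            {"name": "education problems", "weight": 0.7, "children": [
                {"name": "education fees", "weight": 0.5, "score": 0.2},
                {"name": "education fairness", "weight": 0.5, "score": 0.6},
            ]},
            {"name": "medical problems", "weight": 0.3, "children": [
                {"name": "medical fees", "weight": 1.0, "score": 0.1},
            ]},
        ],
    }],
}
social_index = node_score(index_tree)
education_score = node_score(index_tree["children"][0]["children"][0])
```

With these illustrative numbers, the education score is 0.5 × 0.2 + 0.5 × 0.6 = 0.4 and the overall index is 0.7 × 0.4 + 0.3 × 0.1 = 0.31.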
Step 4, model iteration optimization
The Model1 is optimized iteratively at regular intervals;
Updating the keyword dictionary of the third-level indexes: after the initial version of the keyword dictionary has been obtained with the method of step 2, each day's newly added social contradiction events are added to the full set of social contradiction events, the existing keyword dictionary is used to match events to third-level indexes, word segmentation and weight calculation are performed on the matched events, and the k keywords with the largest weights (for example, k = 100) under each third-level index are selected to build a new keyword dictionary, thereby updating the keyword dictionary.
Iterative training of the extremely severe event risk model Model1: each day's newly added social contradiction events are added to the full set of social contradiction events, the training samples are updated, and a new Model1 is obtained after iterative training.
Preferably, the index system of the social contradiction index in step 1 comprises 2 first-level indexes, 13 second-level indexes under the 2 first-level indexes, and 36 third-level indexes under the 13 second-level indexes.
Preferably, when the social contradiction index system is constructed in step 1, the reasonableness and completeness of the index system and the score weights are determined through expert review.
Preferably, the first k words with the largest weight are selected as the classification keywords of the third-level index, and k takes a value of 100.
Preferably, in step 2, the weight w_t of each word in the word segmentation results under the third-level index is calculated as follows:
[Formula image: w_t expressed in terms of count(t), |e_j| and n]
where count(t) is the frequency with which word t occurs in the word segmentation result of event e_j, |e_j| is the number of words in the word segmentation result of event e_j, and n is the total number of events under the third-level index.
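Because the formula image is not available in this text, the sketch below assumes one plausible form that is consistent with the definitions of count(t), |e_j| and n: the average over the n events of the relative frequency of word t, that is, w_t = (1/n) Σ_j count(t)/|e_j|. This is an assumption, not the confirmed formula, and the function names are illustrative.

```python
from collections import Counter, defaultdict

def keyword_weights(events):
    """Assumed form: w_t = (1/n) * sum over events of count(t) / |e_j|."""
    n = len(events)
    weights = defaultdict(float)
    for tokens in events:
        for word, count in Counter(tokens).items():
            weights[word] += count / len(tokens) / n
    return dict(weights)

def top_k_keywords(events, k=100):
    """Return the k words with the largest weights (ties broken alphabetically)."""
    weights = keyword_weights(events)
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]

# Segmented events under one third-level index (toy data).
events = [["fee", "school", "fee", "high"], ["fee", "refund"]]
weights = keyword_weights(events)
top2 = top_k_keywords(events, k=2)
```

The same `top_k_keywords` call could be re-run on the enlarged event set to rebuild the dictionary during the periodic update.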
Further, in step 2, the weight TF_IDF_{t,e} of each component of each word vector is calculated using the TF-IDF algorithm as follows:
[Formula images: definitions of the TF term, the IDF term and TF_IDF_{t,e}]
where count(t) is the frequency with which word t occurs in the word segmentation result of event e_j, |e_j| is the number of words in the word segmentation result of event e_j, m is the total number of all events and all third-level indexes, and I(t, e_j) indicates whether the word segmentation result of event e_j contains word t: it is 1 if the word is contained and 0 otherwise;
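The three formula images are missing from this text, so the sketch below assumes the textbook forms TF = count(t)/|e_j|, IDF = log(m / Σ_j I(t, e_j)) and TF_IDF = TF × IDF, followed by scaling each vector to unit length. Whether the original IDF includes smoothing is not recoverable here. Each "document" is either an event's word list or a third-level index's keyword list, matching the definition of m.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one unit-length {word: weight} vector per document (assumed TF-IDF form)."""
    m = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # sum over j of I(t, e_j)
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        raw = {t: (c / len(tokens)) * math.log(m / doc_freq[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({t: (v / norm if norm else 0.0) for t, v in raw.items()})
    return vectors

docs = [
    ["fee", "school", "fee"],   # an event
    ["fee", "tuition"],         # keywords of one third-level index
    ["fairness", "exam"],       # keywords of another third-level index
]
vecs = tf_idf_vectors(docs)
```

Under this unsmoothed form, a word that appears in every document receives weight 0, so it contributes nothing to the cosine similarity.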
Compared with the prior art, one of the above technical solutions has the following beneficial effects: first, a social contradiction index system is constructed that divides social contradictions into three levels with corresponding indexes, so that each social field can be evaluated scientifically; second, the collected social contradiction events are classified under the third-level indexes by a text algorithm, achieving accurate classification and avoiding the problems of manual classification; third, for each third-level index, the corresponding index score is calculated from the classified social contradiction events; finally, from the third-level index scores and the weights of the index system constructed in the first step, the second-level index scores, the first-level index scores and the final social contradiction index score are calculated in turn.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail below. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
In a first aspect: the embodiment of the disclosure provides a social contradiction index calculation method based on big data, which comprises the following steps:
step 1, constructing a social contradiction index system, designing the number of levels of the index system and the indexes contained in each level, and determining the score weight of each child index within its parent index; the index system of the social contradiction index comprises 2 first-level indexes, 13 second-level indexes under the 2 first-level indexes, and 36 third-level indexes under the 13 second-level indexes; the score weight of each third-level index within its second-level index and the score weight of each second-level index within its first-level index are determined.
For example, one first-level index is material contradictions; the second-level indexes under it include education problems, medical problems and so on; and the third-level indexes under education problems include education fees, education fairness and so on.
Preferably, when the social contradiction index system is constructed, the reasonableness and completeness of the index system and the score weights are determined through expert review, so that the classification of social contradictions can be determined without delay.
Step 2, classifying the collected social contradiction events under the N3 third-level indexes, wherein each social contradiction event mainly comprises the content, title and classification of the event.
Text cleaning is performed on the collected social contradiction events to remove invalid information and prevent it from interfering with subsequent operations.
Aiming at each three-level index, such as an education fairness index, screening out social contradiction events containing the three-level index (such as screening out the social contradiction events containing the education fairness) from the collected social contradiction events, and combining the title, classification and content of each event in the screened social contradiction events to form new text content of the event; and performing word segmentation processing on the new text content by using a Baidu LAC model, and obtaining a word segmentation result through part of speech screening and stop word removal.
For each third-level index, calculating the weight w_t of each word in the word segmentation results under that third-level index, and selecting the top k words with the largest weights (for example, k = 100) as the classification keywords of the third-level index to construct a keyword dictionary.
Preferably, in step 2, the weight w_t of each word in the word segmentation results under the third-level index is calculated as follows:
[Formula image: w_t expressed in terms of count(t), |e_j| and n]
where count(t) is the frequency with which word t occurs in the word segmentation result of event e_j, |e_j| is the number of words in the word segmentation result of event e_j, and n is the total number of events under the third-level index.
According to the keyword dictionary, similarity calculation is carried out on all the collected social contradiction events, and a third-level index with the maximum similarity is selected as a third-level index of the social contradiction events, and the specific method comprises the following steps:
first, performing word segmentation on the content of each social contradiction event using the Baidu LAC model (only the content is used, to prevent the title and the original classification from wrongly interfering with the similarity calculation), and filtering by part of speech and removing stop words to obtain a new word segmentation result; de-duplicating all words in the new word segmentation result together with all words in the third-level index keyword dictionary to form a bag of words, and numbering each word in the bag; combining the words in the new word segmentation result and the words in the third-level index keyword dictionary into a word set, and converting the words in the word set into word vectors using the numbered bag of words, in the following form:
[(N1,C1),(N2,C2),...(Nn,Cn)]
where N_i is the number of the word in the bag of words, and C_i is the number of times the word occurs in the word set.
Based on the converted word vectors, the weight TF_IDF_{t,e} of each component of each word vector is calculated using the TF-IDF algorithm, generating a weighted normalized vector.
Preferably, in step 2, the weight TF_IDF_{t,e} of each component of each word vector is calculated using the TF-IDF algorithm as follows:
[Formula images: definitions of the TF term, the IDF term and TF_IDF_{t,e}]
where count(t) is the frequency with which word t occurs in the word segmentation result of event e_j, |e_j| is the number of words in the word segmentation result of event e_j, m is the total number of all events and all third-level indexes, and I(t, e_j) indicates whether the word segmentation result of event e_j contains word t: it is 1 if the word is contained and 0 otherwise.
Calculating the cosine similarity cos(X, Y) between the normalized vector of each event and the normalized vector of each third-level index:
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}}$$
where X_i is the weight of the i-th word of the normalized vector X of the event, and Y_i is the weight of the i-th word of the normalized vector Y of the third-level index.
The third-level index with the greatest similarity is selected as the third-level index to which the event belongs.
Step 3, calculating the score of each three-level index based on the social contradiction event corresponding to each three-level index, and predicting the social contradiction index;
training the extremely severe event risk model Model1:
according to the severity of the contradiction events, the extremely severe events in the most recent period are selected from the social contradiction events corresponding to each third-level index, and the features of all extremely severe events under that third-level index in the period before the events occurred are computed as positive examples; to avoid class imbalance, non-severe events from the same period, twice as many as the extremely severe events, are randomly selected, and the features of all non-severe events under the third-level index in the period before the events occurred are computed as negative examples, thereby obtaining the training samples.
According to the severity of the contradiction events, extremely severe events in the most recent period are selected from the social contradiction events corresponding to each third-level index as positive examples, and twice as many non-severe events from the same period are randomly selected as negative examples to avoid class imbalance; these serve as the sample labels. For each selected event, the features of all social contradiction events under the third-level index in the period before the event occurred are computed and used as the sample features.
An LR (logistic regression) model is trained to obtain the extremely severe event risk model Model1; the weight coefficient of each feature of the model is then adjusted as appropriate so that the model is better suited to predicting contradiction events.
Using the trained Model1 and the data of the social contradiction events under each third-level index in the most recent period, the probability that an extremely severe event will occur is predicted and used as the score of the corresponding third-level index.
Calculating the scores of the indexes at all levels: the score of each second-level index is calculated from the scores of its third-level indexes and the score weight of each child index within its parent index, and the score of each first-level index is then calculated in the same way.
The social contradiction index is treated as the zero-level index and is predicted from the weights and scores of the first-level indexes.
Preferably, the method also comprises a step 4 of model iterative optimization
And setting fixed time every day, and performing iterative optimization on the model.
Updating the keyword dictionary of the third-level indexes: after the initial version of the keyword dictionary has been obtained with the method of step 2, each day's newly added social contradiction events are added to the full set of social contradiction events, the existing keyword dictionary is used to match events to third-level indexes, word segmentation and weight calculation are performed on the matched events, and the k keywords with the largest weights (for example, k = 100) under each third-level index are selected to build a new keyword dictionary, thereby updating the keyword dictionary.
Iterative training of the extremely severe event risk model Model1: each day's newly added social contradiction events are added to the full set of social contradiction events, the training samples are updated, and the model is retrained to obtain a new Model1, improving the accuracy of model prediction.
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above; insubstantial modifications made using the method concept and technical solution of the invention, or direct application of that concept and technical solution to other settings without modification, all fall within the protection scope of the invention.

Claims (6)

1. A social contradiction index calculation method based on big data is characterized by comprising the following steps:
step 1, constructing a social contradiction index system, designing the number of levels of the index system and the indexes contained in each level, and determining the score weight of each child index within its parent index; the index system of the social contradiction index comprises N1 first-level indexes, N2 second-level indexes under the N1 first-level indexes, and N3 third-level indexes under the N2 second-level indexes; the score weight of each third-level index within its second-level index and the score weight of each second-level index within its first-level index are determined;
step 2, classifying the collected social contradiction events under the N3 third-level indexes, wherein each social contradiction event mainly comprises the content, title and classification of the event; performing text cleaning on the collected social contradiction events to remove invalid information;
aiming at each three-level index, screening out the social contradiction events containing the three-level index from the collected social contradiction events, and combining the title, classification and content of each event in the screened social contradiction events to form new text content of the event; performing word segmentation processing on new text contents by using a Baidu LAC model, and obtaining word segmentation results through part of speech screening and stop word removal;
for each third-level index, calculating the weight w_t of each word in the word segmentation results under that third-level index, selecting the top k words with the largest weights as the classification keywords of the third-level index, and constructing a keyword dictionary;
according to the keyword dictionary, similarity calculation is carried out on all the collected social contradiction events, and a third-level index with the maximum similarity is selected as a third-level index of the social contradiction events, and the specific method comprises the following steps:
first, performing word segmentation on the content of each social contradiction event using the Baidu LAC model, and filtering by part of speech and removing stop words to obtain a new word segmentation result; de-duplicating all words in the new word segmentation result together with all words in the third-level index keyword dictionary to form a bag of words, and numbering each word in the bag; combining the words in the new word segmentation result and the words in the third-level index keyword dictionary into a word set, and converting the words in the word set into word vectors using the numbered bag of words, in the following form:
[(N1,C1),(N2,C2),...(Nn,Cn)];
wherein N isiNumber indicating the word in the pocket, CiRepresenting the number of times the word appears in the set of words;
based on the converted word vectors, calculating the weight TF_IDF_{t,e} of each component in each word vector using the TF-IDF algorithm, and generating a weighted normalized vector;
calculating the cosine similarity cos_{X,Y} between the normalized vector of each event and the normalized vector of each third-level index:
$$\cos_{X,Y} = \frac{\sum_{i} X_i Y_i}{\sqrt{\sum_{i} X_i^{2}}\,\sqrt{\sum_{i} Y_i^{2}}}$$
wherein X_i is the weight of the i-th word of the normalized vector X of each event, and Y_i is the weight of the i-th word of the normalized vector Y of each third-level index;
selecting the third-level index with the maximum similarity as the third-level index to which the event belongs;
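The similarity-based assignment can be illustrated with the standard cosine similarity. The sketch below represents each weighted vector as a word-to-weight mapping; the index names, weights, and function names are invented for the example.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # word -> weight


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(w * y.get(word, 0.0) for word, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def assign_index(event_vec: Vector, index_vecs: Dict[str, Vector]) -> str:
    """Pick the third-level index whose vector is most similar to the event."""
    return max(index_vecs,
               key=lambda name: cosine_similarity(event_vec, index_vecs[name]))


# Made-up weighted vectors.
event = {"noise": 1.0, "neighbor": 1.0}
indexes = {
    "neighborhood dispute": {"noise": 0.8, "neighbor": 0.6},
    "labor dispute": {"wage": 1.0},
}
best = assign_index(event, indexes)
```

Words missing from one of the two vectors contribute zero to the dot product, which matches treating both vectors as defined over the same bag of words.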
step 3, calculating the score of each third-level index based on the social contradiction events corresponding to that index, and predicting the social contradiction index;
training the extra-severe event risk Model1: according to the severity of the contradiction events, screening out the extra-severe events in the most recent period from the social contradiction events corresponding to each third-level index, and computing, as positive examples, the features of all extra-severe events under the third-level index in a period before each event occurred; to avoid class imbalance, randomly selecting twice as many non-severe events from the same period, and computing, as negative examples, the features of all non-severe events under the third-level index in a period before each event occurred, thereby obtaining the training samples;
according to the severity of the contradiction events, selecting, from the social contradiction events corresponding to each third-level index, the extra-severe events in the most recent period as positive examples and, to avoid class imbalance, randomly selecting twice as many non-severe events from the same period as negative examples, these serving as the sample labels; computing the features of all social contradiction events under the third-level index in a period before each event occurred, and taking them as the sample features;
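The sample construction above can be sketched as follows. The claim does not specify the features or the length of the look-back period, so the single feature used here (the number of events in the preceding 30 days), the event record layout, and the function names are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def window_features(history_days: List[int], event_day: int, window: int = 30) -> List[float]:
    """Hypothetical feature: count of events in the `window` days before event_day."""
    prior = [d for d in history_days if event_day - window <= d < event_day]
    return [float(len(prior))]


def build_training_samples(events: List[Dict], window: int = 30,
                           seed: int = 0) -> List[Tuple[List[float], int]]:
    """Positives are extra-severe events; negatives are sampled at a 2:1 ratio."""
    days = [e["day"] for e in events]
    positives = [e for e in events if e["severe"]]
    pool = [e for e in events if not e["severe"]]
    rng = random.Random(seed)
    negatives = rng.sample(pool, min(len(pool), 2 * len(positives)))
    labeled = [(e, 1) for e in positives] + [(e, 0) for e in negatives]
    return [(window_features(days, e["day"], window), label) for e, label in labeled]


# Made-up events under one third-level index: one every 10 days, two extra-severe.
events = [{"day": d, "severe": d in (50, 90)} for d in range(10, 100, 10)]
samples = build_training_samples(events)
```

Capping the negative sample at the size of the pool keeps the sketch from failing when fewer than twice as many non-severe events exist.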
training with an LR model to obtain the extra-severe event risk Model1; appropriately adjusting the weight coefficient of each feature of the model so that the model is better suited to predicting contradiction events;
based on the social contradiction event data of each third-level index in the most recent period, predicting the probability of an extra-severe event occurring using the trained extra-severe event risk Model1, and taking the probability as the score of the corresponding third-level index;
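As a rough illustration of the train-then-score flow, the sketch below assumes that "LR" means logistic regression and fits it with plain batch gradient descent. The learning rate, epoch count, toy data, and function names are hypothetical, and the manual adjustment of feature weights mentioned in the claim is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_lr(X: Sequence[Sequence[float]], y: Sequence[int],
             lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of an extra-severe event; used as the third-level index score."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Toy data: one feature, larger values precede extra-severe events.
X = [[0.0], [1.0], [2.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_lr(X, y)
score_recent = predict_proba(weights, bias, [8.0])
score_quiet = predict_proba(weights, bias, [0.0])
```

Using the predicted probability directly as the score keeps every third-level index on the same 0 to 1 scale, which simplifies the weighted aggregation that follows.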
calculating the scores of the indexes at each level: calculating the score of each second-level index from the scores of its third-level indexes and the score weight of each child index within its parent index, and then calculating the score of each first-level index in the same way;
regarding the social contradiction index as a zero-level index, and predicting the social contradiction index from the weights and scores of the first-level indexes;
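The level-by-level aggregation can be sketched as a repeated weighted sum. The claim only says that parent scores are computed from child scores and score weights, so the weighted-sum form, the requirement that weights within a parent sum to 1, and all names and numbers below are assumptions.

```python
from typing import Dict

Weights = Dict[str, Dict[str, float]]  # parent -> {child: weight}


def aggregate(child_scores: Dict[str, float], weights: Weights) -> Dict[str, float]:
    """Parent score = sum of (child weight * child score) over its children."""
    return {
        parent: sum(w * child_scores[child] for child, w in children.items())
        for parent, children in weights.items()
    }


# Made-up scores and weights for a tiny index system.
third_level = {"t1": 0.2, "t2": 0.6, "t3": 0.9}
second_from_third: Weights = {"s1": {"t1": 0.5, "t2": 0.5}, "s2": {"t3": 1.0}}
first_from_second: Weights = {"f1": {"s1": 1.0}, "f2": {"s2": 1.0}}
zero_from_first: Weights = {"social_contradiction_index": {"f1": 0.6, "f2": 0.4}}

second_level = aggregate(third_level, second_from_third)
first_level = aggregate(second_level, first_from_second)
index = aggregate(first_level, zero_from_first)["social_contradiction_index"]
```

The same function serves every level because the zero-level index is treated as just another parent of the first-level indexes.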
step 4, model iterative optimization:
Model1 is iteratively optimized at regular intervals;
updating the keyword dictionary of the third-level indexes: after the initial version of the keyword dictionary is obtained in the initial state using the method of step 2, the social contradiction events newly added each day are added to the set of all social contradiction events, the existing keyword dictionary is used to match the events to the third-level indexes, word segmentation and weight calculation are performed on the matched events, and the k keywords with the largest weights under each third-level index are selected to construct a new keyword dictionary, thereby updating the keyword dictionary;
iteratively training the extra-severe event risk Model1: adding the social contradiction events newly added each day to the set of all social contradiction events, updating the training samples, and obtaining a new extra-severe event risk Model1 after iterative training.
2. The big data-based social contradiction index calculation method according to claim 1, wherein the social contradiction index system in step 1 comprises 2 first-level indexes, 13 second-level indexes under the 2 first-level indexes, and 36 third-level indexes under the 13 second-level indexes.
3. The big data-based social contradiction index calculation method according to claim 1, wherein the social contradiction index system is constructed in step 1, and the rationality and completeness of the index system and the score weights are determined by expert review.
4. The big data-based social contradiction index calculation method according to claim 1, wherein the k words with the largest weights are selected as the classification keywords of the third-level index, and k is 100.
5. The big data-based social contradiction index calculation method according to any one of claims 1-4, wherein the weight w_t of each word in the word segmentation results under the third-level index is calculated in step 2 as follows:
Figure FDA0003329319290000031
wherein count(t) denotes the frequency of the word t in the word segmentation result of event e_j, |e_j| denotes the number of words in the word segmentation result of event e_j, and n denotes the total number of events under the third-level index.
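Since the formula itself is only available as an image, the sketch below assumes one plausible reading of the definitions above: w_t is the per-event term frequency count(t)/|e_j| averaged over the n events under the index. That averaging, the tie-breaking rule, and the function names are assumptions, not the claimed formula.

```python
from collections import Counter
from typing import Dict, List


def keyword_weights(events: List[List[str]]) -> Dict[str, float]:
    """Assumed form: w_t = (1/n) * sum_j count_j(t) / |e_j| over n events."""
    n = len(events)
    totals: Dict[str, float] = {}
    for tokens in events:
        if not tokens:
            continue  # an empty segmentation result contributes nothing
        for word, count in Counter(tokens).items():
            totals[word] = totals.get(word, 0.0) + count / len(tokens)
    return {word: total / n for word, total in totals.items()}


def top_k_keywords(weights: Dict[str, float], k: int = 100) -> List[str]:
    """Return the k words with the largest weights (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Made-up segmentation results for two events under one third-level index.
events = [["noise", "noise", "neighbor"], ["noise", "fence"]]
weights = keyword_weights(events)
keywords = top_k_keywords(weights, k=2)
```

The default `k=100` follows claim 4; the example passes `k=2` only to keep the output small.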
6. The big data-based social contradiction index calculation method according to claim 5, wherein the weight TF_IDF_{t,e} of each component in each word vector is calculated in step 2 using the TF-IDF algorithm as follows:
Figure FDA0003329319290000032
Figure FDA0003329319290000033
Figure FDA0003329319290000034
wherein count(t) is the frequency of the word t in the word segmentation result of event e_j, |e_j| is the number of words in the word segmentation result of event e_j, m is the total number of all events and all third-level indexes, and I(t, e_j) indicates whether the word segmentation result of event e_j contains the word t, taking the value 1 if it does and 0 otherwise.
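The three formulas are only available as images, so the following sketch assumes the textbook forms suggested by the definitions above: TF = count(t)/|e_j|, IDF = log(m / sum_j I(t, e_j)) with no smoothing, and TF_IDF = TF * IDF, followed by L2 normalization. The claimed formulas may differ, for example in how the IDF denominator is smoothed; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (word -> weight) per document.

    `docs` holds the word lists of all events and all third-level index
    keyword lists together, so m = len(docs).
    """
    m = len(docs)
    # Document frequency: sum over j of I(t, e_j).
    doc_freq = Counter(word for tokens in docs for word in set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        raw = {t: (c / len(tokens)) * math.log(m / doc_freq[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({t: (v / norm if norm else 0.0) for t, v in raw.items()})
    return vectors


# Made-up word lists: two events and one index keyword list.
docs = [["noise", "neighbor"], ["noise", "wage"], ["wage", "contract"]]
vectors = tf_idf_vectors(docs)
```

With this unsmoothed IDF, a word that appears in every document receives weight 0, which is one reason practical implementations often add smoothing.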
CN202111273135.1A 2021-10-29 2021-10-29 Social contradiction index prediction method based on big data Active CN113822498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111273135.1A CN113822498B (en) 2021-10-29 2021-10-29 Social contradiction index prediction method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111273135.1A CN113822498B (en) 2021-10-29 2021-10-29 Social contradiction index prediction method based on big data

Publications (2)

Publication Number Publication Date
CN113822498A true CN113822498A (en) 2021-12-21
CN113822498B CN113822498B (en) 2023-07-18

Family

ID=78917586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111273135.1A Active CN113822498B (en) 2021-10-29 2021-10-29 Social contradiction index prediction method based on big data

Country Status (1)

Country Link
CN (1) CN113822498B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062414A (en) * 2017-12-31 2018-05-22 郑州玄机器人有限公司 A kind of contradiction and disputes public safety index statistical method
CN109711627A (en) * 2018-12-28 2019-05-03 大庆市嘉华科技有限公司 A kind of data processing method and device
WO2019227710A1 (en) * 2018-05-31 2019-12-05 平安科技(深圳)有限公司 Network public opinion analysis method and apparatus, and computer-readable storage medium
CN111798073A (en) * 2019-04-08 2020-10-20 郑州大学 Medical equipment evaluation method and index weight determination method and device
CN112883169A (en) * 2021-04-29 2021-06-01 南京视察者智能科技有限公司 Contradiction evolution analysis method and device based on big data
CN113450026A (en) * 2021-08-06 2021-09-28 智绿(福建)科技有限公司 Method for evaluating social influence index of environmental risk

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062414A (en) * 2017-12-31 2018-05-22 郑州玄机器人有限公司 A kind of contradiction and disputes public safety index statistical method
WO2019227710A1 (en) * 2018-05-31 2019-12-05 平安科技(深圳)有限公司 Network public opinion analysis method and apparatus, and computer-readable storage medium
CN109711627A (en) * 2018-12-28 2019-05-03 大庆市嘉华科技有限公司 A kind of data processing method and device
CN111798073A (en) * 2019-04-08 2020-10-20 郑州大学 Medical equipment evaluation method and index weight determination method and device
CN112883169A (en) * 2021-04-29 2021-06-01 南京视察者智能科技有限公司 Contradiction evolution analysis method and device based on big data
CN113450026A (en) * 2021-08-06 2021-09-28 智绿(福建)科技有限公司 Method for evaluating social influence index of environmental risk

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
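徐建国; 刘梦凡; 刘泳慧: "突发事件网络舆情风险预警模型研究" [Research on a risk early-warning model for online public opinion in emergencies], 软件导刊 [Software Guide], no. 07 *
魏洁: "突发事件社交网络舆情演化分析研究" [Research on the evolution of social-network public opinion in emergencies], 硕士论文库 [master's thesis database], no. 2 *
黄微: "网络舆情衍进指数构建与实证分析" [Construction and empirical analysis of an online public opinion evolution index], 图书情报工作 [Library and Information Service], vol. 63, no. 20 *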

Also Published As

Publication number Publication date
CN113822498B (en) 2023-07-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant