CN115098794A - Public opinion manufacturing group identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115098794A
Authority
CN
China
Prior art keywords
public opinion
organization
manufacturing
initial
members
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210677303.1A
Other languages
Chinese (zh)
Inventor
罗霞
吴海林
王超君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC Potevio Science and Technology Co Ltd
Original Assignee
CETC Potevio Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC Potevio Science and Technology Co Ltd
Priority to CN202210677303.1A
Publication of CN115098794A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a public opinion manufacturing group identification method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring social network relations among all organization members in an organization to be identified based on a social network platform, and constructing a social network relation graph; acquiring interactive data among all organization members, preprocessing the interactive data to acquire public opinion interactive data, and then extracting features to acquire public opinion interactive data feature vectors; determining a plurality of public opinion initial members based on a social network relationship graph and public opinion interaction data, and performing breadth-first search by taking the public opinion initial members as starting points to obtain a search member set; performing initial identification on public opinion manufacturing members on the set by using a cosine similarity algorithm, and determining an initial public opinion manufacturing member set; and based on the characteristic information of the organization members, carrying out public opinion manufacturing member re-identification on the initial public opinion manufacturing member set through a naive Bayes model, and determining a public opinion manufacturing group. The invention can accurately identify public opinion manufacturing groups in the organization.

Description

Public opinion manufacturing group identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a public opinion manufacturing group identification method, a public opinion manufacturing group identification device, public opinion manufacturing group identification equipment and a computer readable storage medium.
Background
With the development of the internet era, networks have become important media for people to transmit information, but negative public opinions inevitably exist in complicated network information, so a method for identifying public opinion manufacturing groups is urgently needed to monitor the public opinion manufacturing groups in time so as to avoid the public opinions from being continuously disseminated on the networks.
At present, public opinion manufacturing groups generally disseminate negative public opinions in organizations, but the existing public opinion manufacturing group identification method is difficult to accurately identify the public opinion manufacturing groups in the organizations, so that the public opinion manufacturing groups in the organizations are difficult to monitor in time.
Disclosure of Invention
The invention provides a public opinion manufacturing group identification method, device, equipment and storage medium, which aim to solve the technical problem that existing public opinion manufacturing group identification methods identify public opinion manufacturing groups within an organization with low accuracy. Based on the social network relationship graph of the members in the organization, the invention first performs initial identification of public opinion manufacturing members by using a cosine similarity algorithm and determines an initial public opinion manufacturing member set. Then, for the organization members in the initial public opinion manufacturing member set, a second identification of public opinion manufacturing members is performed through a naive Bayes model based on the feature information of each organization member, and the public opinion manufacturing group in the organization is determined. Because the feature information of each organization member is fully considered during identification, the public opinion manufacturing group in the organization can be identified accurately, which facilitates timely monitoring of that group.
In order to solve the above technical problems, a first aspect of the embodiments of the present invention provides a public opinion manufacturing group identification method, including the following steps:
acquiring social network relationships among all organization members in an organization to be identified based on a social network platform, and constructing a social network relationship graph of the organization to be identified according to the social network relationships;
acquiring interaction data among all organization members based on the social network platform, and preprocessing the interaction data to acquire public opinion interaction data;
carrying out feature extraction on the public opinion interactive data to obtain public opinion interactive data feature vectors;
determining a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data among the organization members, performing breadth-first search with the plurality of public opinion initial members as starting points to determine a plurality of organization members participating in public opinion interaction, and constructing a search member set from the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
utilizing a cosine similarity algorithm to obtain the similarity between the public opinion interactive data characteristic vector and the initial public opinion interactive data characteristic vector between each organization member in the search member set, and carrying out public opinion manufacturing member primary identification on the organization members in the search member set according to the comparison result of the similarity and a preset similarity threshold value to determine the initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member interacting with the public opinion initial member;
and based on the preset characteristic information of each organization member in the organization to be recognized, performing public opinion manufacturing member re-recognition on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model, and determining a public opinion manufacturing group in the organization to be recognized.
As a preferred scheme, obtaining, by using a cosine similarity algorithm, the similarity between the public opinion interaction data feature vector between each pair of organization members in the search member set and the initial public opinion interaction data feature vector, performing initial identification of public opinion manufacturing members on the organization members in the search member set according to the result of comparing the similarity with a preset similarity threshold, and determining the initial public opinion manufacturing member set specifically includes the following steps:
and obtaining the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member in the search member set by using the cosine similarity algorithm through the following expression:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
wherein $\cos\theta$ represents the similarity between the public opinion interaction data feature vector $B$ and the initial public opinion interaction data feature vector $A$, $n$ represents the number of public opinion interaction data features contained in one public opinion interaction data feature vector, $A_i$ represents the $i$-th public opinion interaction data feature value in the initial public opinion interaction data feature vector $A$, and $B_i$ represents the $i$-th public opinion interaction data feature value in the public opinion interaction data feature vector $B$;
and determining the initial public opinion manufacturing member set according to organization members corresponding to the public opinion interactive data feature vectors with the similarity larger than the preset similarity threshold.
As a preferred scheme, the method for identifying again the public opinion manufacturing members of the organization members in the initial public opinion manufacturing member set through a preset naive bayes model based on the preset feature information of each organization member in the organization to be identified specifically includes the following steps:
determining organization members which receive public opinions finally in the initial public opinion manufacturing member set based on public opinion interaction data among all organization members in the initial public opinion manufacturing member set;
based on the characteristic information of each organization member in the organization to be identified, the probability that any organization member receiving the public opinion at last is a public opinion manufacturing member is obtained by adopting the following expression through the naive Bayes model:
$$P(x_1, x_2, \ldots, x_m \mid c) = \prod_{i=1}^{m} P(x_i \mid c)$$
wherein $P(x_1, x_2, \ldots, x_m \mid c)$ represents the probability that the organization member who currently received the public opinion last is a public opinion manufacturing member, $c$ represents a preset condition, $x_i$ represents the $i$-th feature information of the organization member who currently received the public opinion last, $m$ represents the number of feature information items, and $P(x_i \mid c)$ represents the probability that an organization member who has the feature information $x_i$ and received the public opinion is a public opinion manufacturing member;
and identifying the organization member which receives the public opinion at the last time and has the probability larger than a preset probability threshold value as the public opinion manufacturing member.
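The scoring and thresholding steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes categorical feature values, a hypothetical lookup table `cond_prob` holding the conditional probabilities P(x_i | c), and a small floor value for unseen feature values so that one missing entry does not zero the whole product. None of these details are specified in the source.

```python
from math import prod

def member_probability(features, cond_prob, floor=1e-6):
    """Return the product of P(x_i | c) over a member's feature values.

    `features` maps feature name -> observed value; `cond_prob` maps
    (feature name, value) -> P(x_i | c). Unseen pairs fall back to `floor`
    (an assumption made for this sketch).
    """
    return prod(cond_prob.get((name, value), floor)
                for name, value in features.items())

def reidentify(last_receivers, cond_prob, threshold):
    """Keep the last receivers whose probability exceeds the threshold."""
    return {
        member
        for member, features in last_receivers.items()
        if member_probability(features, cond_prob) > threshold
    }

# Hypothetical conditional probabilities, e.g. estimated from labelled history.
cond_prob = {
    ("position", "clerk"): 0.6,
    ("position", "manager"): 0.2,
    ("social_activity", "high"): 0.7,
    ("social_activity", "low"): 0.1,
}
last_receivers = {
    "b": {"position": "clerk", "social_activity": "high"},   # 0.6 * 0.7
    "d": {"position": "manager", "social_activity": "low"},  # 0.2 * 0.1
}
flagged = reidentify(last_receivers, cond_prob, threshold=0.3)
```

With the hypothetical numbers above, only member `b` exceeds the threshold of 0.3 and would be identified as a public opinion manufacturing member.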
As a preferred scheme, the interactive data at least comprises a timestamp, a sender, a receiver, text data and picture data;
then, the pre-processing the interactive data to obtain public opinion interactive data specifically includes the following steps:
extracting text data from the picture data by an OCR method to obtain extracted text data;
based on a preset word bank, segmenting the text data and the extracted text data respectively by using the jieba word segmentation method to obtain a first segmentation result and a second segmentation result;
respectively cleaning the first word segmentation result and the second word segmentation result;
respectively carrying out standardization processing on a first segmentation result after cleaning processing and a second segmentation result after cleaning processing to obtain a first keyword of the text data and a second keyword of the extracted text data, and taking the timestamp, the sender, the receiver, the first keyword and the second keyword as the public opinion interaction data.
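The preprocessing steps above can be sketched as a small pipeline. This is a sketch under stated assumptions: the regular-expression tokenizer is only a stand-in for the jieba segmenter named in the text, the OCR step is passed in as a hypothetical callable `ocr`, and the cleaning rule (drop stop words and numeric tokens) and the normalization rule (lower-case and de-duplicate) are illustrative choices, since the source does not define them.

```python
import re

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop-word list

def segment(text):
    """Stand-in tokenizer; a real pipeline would call a jieba segmenter here."""
    return re.findall(r"\w+", text)

def clean(tokens):
    """Drop stop words and purely numeric tokens (assumed cleaning rule)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS and not t.isdigit()]

def normalize(tokens):
    """Lower-case and de-duplicate while preserving first-seen order."""
    return list(dict.fromkeys(t.lower() for t in tokens))

def preprocess(message, ocr=lambda picture: ""):
    """Turn one raw interaction record into a public opinion interaction record."""
    extracted = ocr(message["picture"])  # text recovered from the picture data
    first = normalize(clean(segment(message["text"])))
    second = normalize(clean(segment(extracted)))
    return {
        "timestamp": message["timestamp"],
        "sender": message["sender"],
        "receiver": message["receiver"],
        "first_keywords": first,
        "second_keywords": second,
    }

raw = {
    "timestamp": 1655000000,
    "sender": "a",
    "receiver": "b",
    "text": "The Layoff rumor is real, layoff 2022",
    "picture": b"<image bytes>",
}
record = preprocess(raw, ocr=lambda picture: "Rumor of a pay cut")
```

Keeping the tokenizer and the OCR step behind plain function boundaries means either can be replaced by a production component without touching the rest of the pipeline.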
As a preferred scheme, the acquiring interaction data between each organization member based on the social network platform specifically includes the following steps:
and acquiring interaction data between each organization member based on the social network platform under the condition of acquiring preset authorization permission information.
Preferably, the method further comprises the following steps:
marking public opinion making members in the public opinion making group on the social network relation graph based on the determined public opinion making group.
As a preferred scheme, the feature information of each organization member in the organization to be identified at least comprises sex, age, position category, position, length of service, technical level, reward and punishment records, education level and social activity level.
A second aspect of the embodiments of the present invention provides a public opinion manufacturing group identification apparatus, including:
the social network relationship graph building module is used for obtaining the social network relationship between each organization member in the organization to be identified based on a social network platform and building the social network relationship graph of the organization to be identified according to the social network relationship;
the public opinion interactive data acquisition module is used for acquiring interactive data among all organization members based on the social network platform, and preprocessing the interactive data to acquire public opinion interactive data;
the public opinion interaction data feature extraction module is used for extracting features of the public opinion interaction data to obtain public opinion interaction data feature vectors;
the search member set construction module is used for determining a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data among the organization members, performing breadth-first search with the plurality of public opinion initial members as starting points to determine a plurality of organization members participating in public opinion interaction, and constructing a search member set from the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
the public opinion manufacturing member initial identification module is used for obtaining the similarity between public opinion interaction data feature vectors and initial public opinion interaction data feature vectors among all organization members in the search member set by utilizing a cosine similarity algorithm, carrying out public opinion manufacturing member initial identification on the organization members in the search member set according to a comparison result of the similarity and a preset similarity threshold, and determining the initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member interacting with the public opinion initial member;
and the public opinion manufacturing member re-identification module is used for carrying out public opinion manufacturing member re-identification on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model based on the preset characteristic information of each organization member in the organization to be identified, and determining a public opinion manufacturing group in the organization to be identified.
A third aspect of the embodiments of the present invention provides a public opinion manufacturing group identification apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the public opinion manufacturing group identification method according to any one of the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, a device in which the computer-readable storage medium is located is controlled to execute the public opinion manufacturing group identification method according to any one of the first aspect.
Compared with the prior art, the embodiments of the invention have the following advantages. Based on the social network relationship graph of the members in the organization, public opinion manufacturing members are first identified by using a cosine similarity algorithm, and an initial public opinion manufacturing member set is determined. Then, for the organization members in the initial public opinion manufacturing member set, public opinion manufacturing members are identified a second time through a naive Bayes model based on the feature information of each organization member, and the public opinion manufacturing group in the organization is determined. Because the feature information of each organization member is fully considered during identification, the public opinion manufacturing group in the organization can be identified accurately, which facilitates timely monitoring of that group.
Drawings
Fig. 1 is a flow chart illustrating a public opinion manufacturing group identification method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a public opinion manufacturing group identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a first aspect of the embodiments of the present invention provides a public opinion manufacturing group identification method, including the following steps S1 to S6:
step S1, acquiring social network relations among all organization members in the organization to be identified based on a social network platform, and constructing a social network relation graph of the organization to be identified according to the social network relations;
step S2, acquiring interaction data among each organization member based on the social network platform, and preprocessing the interaction data to acquire public opinion interaction data;
step S3, extracting the characteristics of the public opinion interactive data to obtain public opinion interactive data characteristic vectors;
step S4, determining a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data among the organization members, performing breadth-first search with the plurality of public opinion initial members as starting points to determine a plurality of organization members participating in public opinion interaction, and constructing a search member set from the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
step S5, utilizing a cosine similarity algorithm to obtain the similarity between the public opinion interactive data feature vector and the initial public opinion interactive data feature vector between each organization member in the search member set, and carrying out public opinion manufacturing member primary identification on the organization members in the search member set according to the comparison result of the similarity and a preset similarity threshold value to determine an initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member performing public opinion interaction with the public opinion initial member;
and step S6, based on the preset feature information of each organization member in the organization to be identified, carrying out public opinion manufacturing member re-identification on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model, and determining a public opinion manufacturing group in the organization to be identified.
It should be noted that the social network platform includes, but is not limited to, third-party social network platforms such as WeChat, QQ and Weibo (microblog). In the embodiment of the invention, the social network relationships among all organization members in the organization to be identified are acquired based on the social network platform, and the social network relationship graph of the organization to be identified is constructed from these relationships by using software such as NetworkX or Gephi.
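As an illustration of the graph construction described above, the following minimal sketch builds an undirected adjacency map from member pairs using only the Python standard library. It is a dependency-free stand-in, not the embodiment's code: with NetworkX, `Graph.add_edges_from` would play the same role. The input format (a list of member-pair tuples) and the function name `build_relationship_graph` are assumptions made for this example.

```python
from collections import defaultdict

def build_relationship_graph(relations):
    """Build an undirected adjacency map from (member, member) pairs."""
    graph = defaultdict(set)
    for u, v in relations:
        if u == v:
            continue  # ignore self-relations
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

# Hypothetical relationships; the duplicate pair is absorbed by the sets.
relations = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "b")]
graph = build_relationship_graph(relations)
```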
Further, interaction data among all organization members are acquired based on the social network platform. Because the interaction data contain tags, special symbols, stop words and the like that are irrelevant to the public opinion currently under investigation, the interaction data need to be preprocessed to obtain the public opinion interaction data.
Further, in the embodiment of the invention, feature extraction is performed on the public opinion interaction data by using methods such as TF-IDF (term frequency-inverse document frequency), bigram (two-word sequence model) and word2vec, so as to obtain the public opinion interaction data feature vector. Taking the TF-IDF method as an example, feature extraction on the public opinion interaction data mainly obtains the word frequency and the TF-IDF value of each keyword; the TF-IDF value of each keyword is calculated with the TfidfVectorizer class in the sklearn library, and the calculation expressions are as follows:
$$\mathrm{IDF}(x) = \log\frac{N}{N(x)}$$
$$\mathrm{TF\text{-}IDF}(x) = \mathrm{TF}(x) \cdot \mathrm{IDF}(x)$$
wherein TF(x) represents the word frequency of the word x in the current text, N represents the total number of texts in the corpus, and N(x) represents the total number of texts in the corpus that contain the word x.
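The TF-IDF calculation can be sketched in plain Python as follows. This is a minimal sketch under two assumptions that the source does not confirm: TF(x) is taken as the relative frequency of x in the current text, and IDF(x) is taken as the unsmoothed log(N / N(x)). Note that sklearn's TfidfVectorizer by default applies a smoothed IDF and L2 normalization, so its output differs numerically from this sketch.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {word: TF-IDF} dict per tokenized text in `corpus`."""
    n_texts = len(corpus)
    # N(x): number of texts that contain word x at least once.
    doc_freq = Counter(word for text in corpus for word in set(text))
    scores = []
    for text in corpus:
        counts = Counter(text)
        scores.append({
            # TF(x) is the relative frequency of x in the current text.
            word: (count / len(text)) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical tokenized texts.
corpus = [
    ["layoff", "rumor", "layoff"],
    ["rumor", "lunch"],
    ["lunch", "menu"],
]
scores = tf_idf(corpus)
```

In the first text, "layoff" occurs twice and appears in no other text, so it receives a higher TF-IDF value than "rumor", which is shared with the second text.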
Further, the embodiment of the invention determines a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data among the organization members, performs breadth-first search with the plurality of public opinion initial members as starting points, determines a plurality of organization members participating in public opinion interaction, and constructs a search member set from the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction. It can be understood that every public opinion is first raised by someone and then spread. Therefore, from the public opinion interaction data among the organization members, the interaction starting point at which a public opinion keyword or public opinion keyword group appears earliest can be obtained, and the initiator corresponding to that interaction starting point is a public opinion initial member. There may be more than one interaction starting point; the set of interaction starting points is recorded as VI = {V_1, V_2, V_3, ..., V_i}, and all interaction starting points are marked in the social network relationship graph.
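One possible way to locate the interaction starting points is sketched below. It assumes, as an interpretation rather than something the source states, that each public opinion keyword has its own starting point: the sender of the earliest record that mentions it. The record layout and the function name `find_initial_members` are hypothetical.

```python
def find_initial_members(records, keywords):
    """Return the senders of the earliest record mentioning each keyword."""
    first_seen = {}
    for record in sorted(records, key=lambda r: r["timestamp"]):
        for word in keywords.intersection(record["keywords"]):
            # setdefault keeps the sender of the earliest mention only.
            first_seen.setdefault(word, record["sender"])
    return set(first_seen.values())

# Hypothetical public opinion interaction records.
records = [
    {"timestamp": 3, "sender": "b", "receiver": "c", "keywords": ["layoff"]},
    {"timestamp": 1, "sender": "a", "receiver": "b", "keywords": ["layoff", "rumor"]},
    {"timestamp": 2, "sender": "e", "receiver": "f", "keywords": ["paycut"]},
]
initial_members = find_initial_members(records, {"layoff", "paycut"})
```

Here member `a` first mentions "layoff" and member `e` first mentions "paycut", so both are treated as public opinion initial members.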
Then, carrying out breadth-first search by taking the plurality of public opinion initial members (namely interactive starting points) as starting points, searching out organization members carrying out public opinion interaction with any one public opinion initial member, and all organization members participating in public opinion interaction in the subsequent public opinion transmission process, and constructing a search member set according to the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction.
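The multi-source breadth-first search described above can be sketched as follows. The sketch assumes the public opinion interactions are available as a hypothetical mapping from each sender to the members it interacted with; the function name `breadth_first_members` is likewise illustrative.

```python
from collections import deque

def breadth_first_members(interactions, initial_members):
    """Collect everyone reachable from the initial members along interaction edges."""
    visited = set(initial_members)
    queue = deque(initial_members)
    while queue:
        member = queue.popleft()
        for neighbor in interactions.get(member, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

# Hypothetical interactions: sender -> members the sender interacted with.
interactions = {"a": ["b"], "b": ["c", "d"], "e": ["f"], "g": ["h"]}
search_set = breadth_first_members(interactions, {"a", "e"})
```

Members `g` and `h` are not reachable from either starting point, so they stay outside the search member set.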
Furthermore, the embodiment of the invention obtains the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member in the search member set by using a cosine similarity algorithm, and performs public opinion manufacturing member initial identification on the organization members in the search member set according to the comparison result of the similarity and a preset similarity threshold value to determine the initial public opinion manufacturing member set. It can be understood that, in the search member set, not all organization members participate in public opinion propagation, and therefore, it is necessary to analyze the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member by using a cosine similarity algorithm, and then perform public opinion manufacturing member initial identification by using a comparison result of the similarity and a preset similarity threshold to determine the initial public opinion manufacturing member set. For example, the feature vector of the initial public opinion interaction data between the public opinion initiating member a and the organization member b is a1, the feature vector of the public opinion interaction data between the organization member b and the organization member c is b1, the similarity between a1 and b1 is calculated by using cosine similarity algorithm, if the similarity is greater than the preset similarity threshold, the organization member b is determined to transmit public opinion to the organization member c, and therefore the public opinion initiating member a, the organization member b and the organization member c are all included in the initial public opinion manufacturing member set.
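The a, b, c example above can be sketched as follows. The feature vectors, the threshold of 0.8 and the edge-keyed dictionary layout are hypothetical values chosen for illustration; the source only states that a preset similarity threshold is used.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def initial_identification(initial_edge, edge_vectors, threshold):
    """Return members on edges whose feature vector resembles the initial edge's."""
    reference = edge_vectors[initial_edge]
    members = set(initial_edge)
    for (sender, receiver), vector in edge_vectors.items():
        if cosine(reference, vector) > threshold:
            members.update((sender, receiver))
    return members

edge_vectors = {
    ("a", "b"): [0.9, 0.1, 0.0],  # a1: initial public opinion interaction
    ("b", "c"): [0.8, 0.2, 0.0],  # b1: similar content, so b passed it on to c
    ("c", "d"): [0.0, 0.1, 0.9],  # unrelated conversation
}
initial_set = initial_identification(("a", "b"), edge_vectors, threshold=0.8)
```

Because b1 is close to a1, members a, b and c enter the initial public opinion manufacturing member set, while the unrelated interaction between c and d does not add d.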
Further, in the embodiment of the present invention, based on the preset feature information of each organization member in the organization to be identified, the organization members in the initial public opinion manufacturing member set are identified again through a preset naive Bayes model, and the public opinion manufacturing group in the organization to be identified is determined.
It can be understood that the public opinion initial member a may propagate public opinion to the organization member b while the organization member b has no public-opinion-related interaction with any organization member other than the public opinion initial member a. In this case, the organization member b may or may not be a public opinion manufacturing member. Therefore, the organization member b needs to be identified again through the preset naive Bayes model based on the feature information of the organization member b, so as to analyze the probability that the organization member b is a public opinion manufacturing member.
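The members that need re-identification in this situation, namely those who received public opinion but did not pass it on, can be found with a simple set difference. The sketch assumes the public opinion interactions are given as hypothetical directed (sender, receiver) pairs.

```python
def last_receivers(public_opinion_edges):
    """Return members who received public opinion but never forwarded it."""
    senders = {sender for sender, _ in public_opinion_edges}
    receivers = {receiver for _, receiver in public_opinion_edges}
    return receivers - senders

# Hypothetical directed public opinion interactions.
edges = [("a", "b"), ("a", "c"), ("c", "d")]
terminal = last_receivers(edges)
```

In this example b and d received public opinion without forwarding it, so they are the candidates passed to the naive Bayes model.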
The public opinion manufacturing group identification method provided by the embodiment of the invention can be used for firstly carrying out primary identification on public opinion manufacturing members by utilizing a cosine similarity calculation method based on a social network relation map of the members in the organization, determining an initial public opinion manufacturing member set, then carrying out secondary identification on the public opinion manufacturing members by utilizing a naive Bayes model aiming at the organization members in the initial public opinion manufacturing member set based on the characteristic information of each organization member, and determining the public opinion manufacturing group in the organization.
As a preferred scheme, obtaining, by using a cosine similarity algorithm, the similarity between the public opinion interaction data feature vector between each pair of organization members in the search member set and the initial public opinion interaction data feature vector, performing initial identification of public opinion manufacturing members on the organization members in the search member set according to the result of comparing the similarity with a preset similarity threshold, and determining the initial public opinion manufacturing member set specifically includes the following steps:
and obtaining the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member in the search member set by using the cosine similarity algorithm through the following expression:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein cos θ represents the similarity between the public opinion interaction data feature vector B and the initial public opinion interaction data feature vector A, n represents the number of public opinion interaction data features contained in one public opinion interaction data feature vector, A_i represents the i-th public opinion interaction data feature value in the initial public opinion interaction data feature vector A, and B_i represents the i-th public opinion interaction data feature value in the public opinion interaction data feature vector B;
and determining the initial public opinion manufacturing member set according to organization members corresponding to public opinion interaction data feature vectors with the similarity greater than the preset similarity threshold.
It can be understood that the closer the angle θ between the initial public opinion interaction data feature vector A and the public opinion interaction data feature vector B is to 0°, the more similar the two feature vectors are; when the two feature vectors are identical, cos θ equals 1.
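The similarity comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the member identifiers, the example feature values, and the threshold of 0.8 are all hypothetical, and treating a zero vector as dissimilar is an assumption the source does not address.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)

def initial_manufacturing_members(initial_vector, member_vectors, threshold):
    """Keep members whose vector similarity to the initial vector exceeds the threshold."""
    return {
        member
        for member, vector in member_vectors.items()
        if cosine_similarity(initial_vector, vector) > threshold
    }

# Hypothetical feature vectors: A is the initial vector, the others belong
# to interactions involving members "b", "c" and "d".
A = [3.0, 1.0, 0.0, 2.0]
vectors = {
    "b": [3.0, 1.0, 0.0, 2.0],
    "c": [0.0, 0.0, 5.0, 0.0],
    "d": [2.0, 1.0, 1.0, 2.0],
}
selected = initial_manufacturing_members(A, vectors, threshold=0.8)
```

With these illustrative values, members "b" and "d" exceed the threshold, while "c", whose vector is orthogonal to A, does not.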
As a preferred scheme, performing re-identification of public opinion manufacturing members on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model, based on the preset feature information of each organization member in the organization to be identified, specifically includes the following steps:
determining, based on the public opinion interaction data between the organization members in the initial public opinion manufacturing member set, the organization members in the initial public opinion manufacturing member set that received the public opinion last;
obtaining, based on the feature information of each organization member in the organization to be identified, the probability that any organization member that received the public opinion last is a public opinion manufacturing member through the naive Bayes model, using the following expression:
$$P(x_1, x_2, \ldots, x_m \mid c) = \prod_{i=1}^{m} P(x_i \mid c)$$
wherein P(x_1, x_2, ..., x_m | c) represents the probability that the organization member currently under consideration that received the public opinion last is a public opinion manufacturing member, c represents a preset condition, x_i represents the i-th feature information of that organization member, m represents the number of feature information items, and P(x_i | c) represents the probability that an organization member having the feature information x_i and receiving public opinion is a public opinion manufacturing member;
and identifying each organization member that received the public opinion last and whose probability is greater than a preset probability threshold as a public opinion manufacturing member.
It can be understood that the public opinion manufacturing probability corresponding to each piece of feature information can be analyzed based on the feature information of an organization member, such as gender, age, position category, position, length of service, technical level, reward and punishment records, education level, social platform attention, number of social platform posts, post topic category, social activity, and the like. For example, an organization member with a high education level and low social activity generally has a low probability of being judged a public opinion manufacturing member. The public opinion manufacturing probability corresponding to each piece of feature information is obtained by training the original naive Bayes model on a pre-built training set, in which the data are labeled with the public opinion manufacturing probability for each piece of feature information.
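The re-identification step can be sketched in Python as a product of per-feature conditional probabilities followed by a threshold test. This is only an illustrative sketch: the feature names, the probability table, the member identifiers, and the threshold of 0.5 are hypothetical placeholder values, not data from the source, and in practice the table would come from the trained naive Bayes model.

```python
import math

# Hypothetical conditional probabilities P(x_i | c); placeholder values
# for illustration only, standing in for a trained model.
CONDITIONAL_PROBS = {
    "education": {"high": 0.3, "low": 0.7},
    "social_activity": {"low": 0.2, "high": 0.8},
}

def manufacturing_probability(features, conditional_probs=CONDITIONAL_PROBS):
    """Multiply P(x_i | c) over all feature values of one organization member."""
    return math.prod(
        conditional_probs[name][value] for name, value in features.items()
    )

def reidentify(last_receivers, threshold):
    """Return the last receivers whose probability exceeds the threshold."""
    return {
        member
        for member, features in last_receivers.items()
        if manufacturing_probability(features) > threshold
    }

# Hypothetical members that received the public opinion last.
last_receivers = {
    "b": {"education": "high", "social_activity": "low"},
    "e": {"education": "low", "social_activity": "high"},
}
group = reidentify(last_receivers, threshold=0.5)
```

Under these placeholder values, member "b" (high education, low social activity) scores about 0.06 and is not selected, which matches the tendency described above, while member "e" scores about 0.56 and is selected.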
As a preferred scheme, the interactive data at least comprises a timestamp, a sender, a receiver, text data and picture data;
then, the pre-processing the interactive data to obtain public opinion interactive data specifically includes the following steps:
extracting text data in the picture data by an OCR method to obtain extracted text data;
based on a preset word bank, utilizing the jieba word segmentation method to segment the text data and the extracted text data respectively to obtain a first word segmentation result and a second word segmentation result;
respectively cleaning the first word segmentation result and the second word segmentation result;
respectively carrying out standardization processing on the first word segmentation result after cleaning processing and the second word segmentation result after cleaning processing to obtain a first keyword of the text data and a second keyword of the extracted text data, and taking the timestamp, the sender, the receiver, the first keyword and the second keyword as the public opinion interaction data.
Specifically, the embodiment of the invention prepares a word bank in advance from keywords likely to appear in the public opinion under investigation and words commonly used by the organization members in the organization to be identified, and then segments the text data and the extracted text data respectively using the jieba word segmentation method to obtain a first word segmentation result and a second word segmentation result. It should be noted that the jieba word segmentation method adopts an HMM model based on the word-forming capability of Chinese characters and segments the input text using the Viterbi algorithm.
It is worth noting that, during word segmentation, the word bank can be modified dynamically through the Python calls add_word(word, freq=None, tag=None) and del_word(word).
Further, the first segmentation result and the second segmentation result are respectively cleaned so as to remove text data such as labels, special symbols and stop words which are irrelevant to the current public opinion to be investigated.
Further, the first word segmentation result after cleaning and the second word segmentation result after cleaning are each standardized. The purpose of the standardization is to unify terminology and entity names. For example, when variant names of the same entity, such as "Electronics Technology Group", "CETC", and "China Electronics Technology", appear in the first word segmentation result and the second word segmentation result, they are unified to "CETC" through the standardization.
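The cleaning and standardization steps can be sketched in Python as follows, operating on a token list that is assumed to have already been produced by the word segmentation step. The stop-word set, the alias table, the regular expressions, and the function names are hypothetical choices made for illustration; the source does not specify them, and removing duplicate keywords is an added assumption.

```python
import re

# Hypothetical resources; a real system would load these from the word bank.
STOP_WORDS = {"the", "a", "of"}
ALIASES = {"Acme Group": "Acme", "Acme Corp": "Acme"}  # variant name -> unified name

TAG_RE = re.compile(r"<[^>]+>")   # markup tags such as <b>
SYMBOL_RE = re.compile(r"[\W_]+")  # tokens made only of symbols or punctuation

def clean(tokens):
    """Drop empty tokens, tags, special symbols and stop words."""
    cleaned = []
    for token in tokens:
        token = token.strip()
        if not token or token in STOP_WORDS:
            continue
        if TAG_RE.fullmatch(token) or SYMBOL_RE.fullmatch(token):
            continue
        cleaned.append(token)
    return cleaned

def standardize(tokens):
    """Map variant entity names to one unified name and drop repeats, keeping order."""
    keywords = []
    for token in tokens:
        unified = ALIASES.get(token, token)
        if unified not in keywords:
            keywords.append(unified)
    return keywords

# Hypothetical segmentation result for one message.
segmented = ["<b>", "Acme Group", "the", "#", "layoffs", "Acme Corp", "rumor"]
keywords = standardize(clean(segmented))
```

Here the two variant names collapse into a single keyword, so `keywords` holds "Acme", "layoffs" and "rumor" in that order.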
As a preferred scheme, the acquiring interaction data between each organization member based on the social network platform specifically includes the following steps:
and acquiring interaction data between each organization member based on the social network platform under the condition of acquiring preset authorization permission information.
Specifically, the authorization information includes, but is not limited to, a court law enforcement document number, a law enforcement CA certificate, and a lawyer investigation order number.
Preferably, the method further comprises the following steps:
marking public opinion making members in the public opinion making group on the social network relation map based on the determined public opinion making group.
Specifically, by marking the public opinion manufacturing members of the public opinion manufacturing group on the social network relationship graph, the embodiment of the invention visualizes the public opinion manufacturing members and thus displays the public opinion manufacturing group in the organization to be identified more intuitively.
Preferably, the feature information of each organization member in the organization to be identified at least comprises gender, age, position category, position, length of service, technical level, reward and punishment records, education level and social activity.
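One simple way to carry such a mark is to attach a flag to each node of the graph, which a plotting layer could later use for highlighting. The sketch below assumes the graph nodes are held as a dictionary of attribute dictionaries; that representation, the attribute name `is_manufacturing_member`, and the sample data are all hypothetical.

```python
def mark_manufacturing_members(node_attributes, manufacturing_group):
    """Return a copy of the node attributes with a manufacturing-member flag added."""
    marked = {}
    for member, attrs in node_attributes.items():
        # dict(attrs, ...) copies the attributes so the input graph is not modified
        marked[member] = dict(attrs, is_manufacturing_member=member in manufacturing_group)
    return marked

# Hypothetical nodes of a social network relationship graph.
nodes = {"a": {"position": "engineer"}, "b": {"position": "manager"}}
marked = mark_manufacturing_members(nodes, manufacturing_group={"a"})
```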
Referring to fig. 2, a second aspect of the embodiment of the present invention provides a public opinion manufacturing group identification device, including:
the social network relationship graph building module 201 is configured to obtain a social network relationship between each organization member in an organization to be identified based on a social network platform, and build a social network relationship graph of the organization to be identified according to the social network relationship;
a public opinion interaction data obtaining module 202, configured to obtain interaction data between each organization member based on the social network platform, and preprocess the interaction data to obtain public opinion interaction data;
a feature vector obtaining module 203, configured to perform feature extraction on the public opinion interaction data to obtain a public opinion interaction data feature vector;
a search member set construction module 204, configured to determine a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data between the organization members, perform a breadth-first search using the plurality of public opinion initial members as starting points, determine a plurality of organization members participating in public opinion interaction, and construct a search member set from the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
a public opinion manufacturing member primary identification module 205, configured to obtain similarity between a public opinion interaction data feature vector and an initial public opinion interaction data feature vector between each organization member in the search member set by using a cosine similarity algorithm, perform public opinion manufacturing member primary identification on the organization members in the search member set according to a comparison result between the similarity and a preset similarity threshold, and determine an initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member performing public opinion interaction with the public opinion initial member;
and a public opinion manufacturing member re-identifying module 206, configured to perform public opinion manufacturing member re-identification on the organization members in the initial public opinion manufacturing member set through a preset naive bayes model based on preset feature information of each organization member in the to-be-identified organization, so as to determine a public opinion manufacturing group in the to-be-identified organization.
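The breadth-first search performed by the search member set construction module can be sketched in Python as follows. This is a minimal illustration, assuming the public opinion interactions are available as an adjacency mapping from each member to the members it propagated public opinion to; the function name, the graph representation, and the sample data are hypothetical.

```python
from collections import deque

def build_search_member_set(interaction_graph, initial_members):
    """Breadth-first search from the initial members over public opinion interactions."""
    visited = set(initial_members)
    queue = deque(initial_members)
    while queue:
        member = queue.popleft()
        for neighbor in interaction_graph.get(member, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    # The set holds the initial members plus every member reached through interaction.
    return visited

# Hypothetical interactions: "a" propagated to "b" and "c", "b" to "d";
# "e" and "f" interact only with each other and are never reached from "a".
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "e": ["f"]}
search_set = build_search_member_set(graph, ["a"])
```

Starting from member "a", the search reaches "b", "c" and "d", while "e" and "f" stay outside the search member set.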
Preferably, the public opinion manufacturing member initial identification module 205 is configured to obtain similarity between a public opinion interaction data feature vector and an initial public opinion interaction data feature vector between each organization member in the search member set by using a cosine similarity algorithm, perform public opinion manufacturing member initial identification on the organization members in the search member set according to a comparison result between the similarity and a preset similarity threshold, and determine an initial public opinion manufacturing member set, including the following steps:
and by utilizing the cosine similarity algorithm, obtaining the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member in the search member set through the following expression:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein cos θ represents the similarity between the public opinion interaction data feature vector B and the initial public opinion interaction data feature vector A, n represents the number of public opinion interaction data features contained in one public opinion interaction data feature vector, A_i represents the i-th public opinion interaction data feature value in the initial public opinion interaction data feature vector A, and B_i represents the i-th public opinion interaction data feature value in the public opinion interaction data feature vector B;
and determining the initial public opinion manufacturing member set according to organization members corresponding to public opinion interaction data feature vectors with the similarity greater than the preset similarity threshold.
Preferably, the public opinion manufacturing member re-recognition module 206 is configured to perform public opinion manufacturing member re-recognition on the organization members in the initial public opinion manufacturing member set through a preset naive bayes model based on preset feature information of each organization member in the organization to be recognized, and specifically includes the following steps:
determining organization members which receive public opinions finally in the initial public opinion manufacturing member set based on public opinion interaction data among all organization members in the initial public opinion manufacturing member set;
based on the characteristic information of each organization member in the organization to be identified, the probability that any organization member receiving public opinion at last is a public opinion manufacturing member is obtained by the naive Bayesian model by adopting the following expression:
$$P(x_1, x_2, \ldots, x_m \mid c) = \prod_{i=1}^{m} P(x_i \mid c)$$
wherein P(x_1, x_2, ..., x_m | c) represents the probability that the organization member currently under consideration that received the public opinion last is a public opinion manufacturing member, c represents a preset condition, x_i represents the i-th feature information of that organization member, m represents the number of feature information items, and P(x_i | c) represents the probability that an organization member having the feature information x_i and receiving public opinion is a public opinion manufacturing member;
and identifying the organization member which receives the public opinion at the last time and has the probability larger than a preset probability threshold value as the public opinion manufacturing member.
As a preferred scheme, the interactive data at least comprises a timestamp, a sender, a receiver, text data and picture data;
then, the public opinion interactive data obtaining module 202 is configured to preprocess the interactive data to obtain the public opinion interactive data, and specifically includes the following steps:
extracting text data in the picture data by an OCR method to obtain extracted text data;
based on a preset word bank, utilizing the jieba word segmentation method to segment the text data and the extracted text data respectively to obtain a first word segmentation result and a second word segmentation result;
respectively cleaning the first word segmentation result and the second word segmentation result;
respectively carrying out standardization processing on the first word segmentation result after cleaning processing and the second word segmentation result after cleaning processing to obtain a first keyword of the text data and a second keyword of the extracted text data, and taking the timestamp, the sender, the receiver, the first keyword and the second keyword as the public opinion interaction data.
As a preferred scheme, the public opinion interaction data obtaining module 202 is configured to obtain interaction data between each organization member based on the social network platform, and specifically includes the following steps:
and acquiring interaction data between each organization member based on the social network platform under the condition of acquiring preset authorization permission information.
Preferably, the apparatus further comprises a marking module configured to:
marking public opinion making members in the public opinion making group on the social network relation map based on the determined public opinion making group.
Preferably, the feature information of each organization member in the organization to be identified at least comprises gender, age, position category, position, length of service, technical level, reward and punishment records, education level and social activity.
It should be noted that the public opinion manufacturing group identification device provided in the embodiment of the present invention can implement all the processes of the public opinion manufacturing group identification method described in any one of the above embodiments. The functions of the modules in the device and the technical effects they achieve are respectively the same as those of the public opinion manufacturing group identification method described in the above embodiments, and are not described herein again.
A third aspect of the embodiments of the present invention provides public opinion manufacturing group identification equipment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the public opinion manufacturing group identification method according to any one of the embodiments of the first aspect when executing the computer program.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory. The terminal device may also include input and output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is the control center of the terminal device and connects the various parts of the entire terminal device using various interfaces and lines.
The memory may be used for storing the computer programs and/or modules, and the processor may implement various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, a device in which the computer-readable storage medium is located is controlled to execute the public opinion manufacturing group identification method according to any of the embodiments of the first aspect.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by hardware entirely. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background art may be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A public opinion manufacturing group identification method is characterized by comprising the following steps:
acquiring social network relationships among all organization members in an organization to be identified based on a social network platform, and constructing a social network relationship graph of the organization to be identified according to the social network relationships;
acquiring interaction data among all organization members based on the social network platform, and preprocessing the interaction data to acquire public opinion interaction data;
carrying out feature extraction on the public opinion interactive data to obtain public opinion interactive data feature vectors;
determining a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data between the organization members, performing breadth-first search by taking the plurality of public opinion initial members as starting points, determining a plurality of organization members participating in public opinion interaction, and constructing a search member set according to the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
obtaining the similarity between public opinion interaction data characteristic vectors among all organization members in the search member set and initial public opinion interaction data characteristic vectors by using a cosine similarity algorithm, and carrying out public opinion manufacturing member primary identification on the organization members in the search member set according to a comparison result of the similarity and a preset similarity threshold value to determine an initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member interacting with the public opinion initial member;
and carrying out public opinion manufacturing member re-identification on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model based on preset characteristic information of each organization member in the organization to be identified, and determining a public opinion manufacturing group in the organization to be identified.
2. The public opinion manufacturing group identification method as claimed in claim 1, wherein the method for obtaining similarity between public opinion interaction data feature vector and initial public opinion interaction data feature vector between each organization member in the search member set by using cosine similarity algorithm, and performing public opinion manufacturing member initial identification on the organization members in the search member set according to the comparison result of the similarity and a preset similarity threshold to determine an initial public opinion manufacturing member set specifically comprises the following steps:
and by utilizing the cosine similarity algorithm, obtaining the similarity between the public opinion interaction data feature vector and the initial public opinion interaction data feature vector between each organization member in the search member set through the following expression:
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein cos θ represents the similarity between the public opinion interaction data feature vector B and the initial public opinion interaction data feature vector A, n represents the number of public opinion interaction data features contained in one public opinion interaction data feature vector, A_i represents the i-th public opinion interaction data feature value in the initial public opinion interaction data feature vector A, and B_i represents the i-th public opinion interaction data feature value in the public opinion interaction data feature vector B;
and determining the initial public opinion manufacturing member set according to organization members corresponding to public opinion interaction data feature vectors with the similarity greater than the preset similarity threshold.
3. The method as claimed in claim 2, wherein the method for identifying again public opinion manufacturing members from the organization members in the initial public opinion manufacturing member set through a preset naive bayes model based on the preset feature information of each organization member in the organization to be identified comprises the following steps:
determining organization members which receive public opinions finally in the initial public opinion manufacturing member set based on public opinion interaction data among all organization members in the initial public opinion manufacturing member set;
based on the characteristic information of each organization member in the organization to be identified, the probability that any organization member receiving public opinion at last is a public opinion manufacturing member is obtained by the naive Bayesian model by adopting the following expression:
$$P(x_1, x_2, \ldots, x_m \mid c) = \prod_{i=1}^{m} P(x_i \mid c)$$
wherein P(x_1, x_2, ..., x_m | c) represents the probability that the organization member currently under consideration that received the public opinion last is a public opinion manufacturing member, c represents a preset condition, x_i represents the i-th feature information of that organization member, m represents the number of feature information items, and P(x_i | c) represents the probability that an organization member having the feature information x_i and receiving public opinion is a public opinion manufacturing member;
and identifying the organization member which receives the public opinion at the last time and has the probability larger than a preset probability threshold value as the public opinion manufacturing member.
4. The public opinion manufacturing group identification method as claimed in claim 3, wherein the interactive data at least includes time stamp, sender, receiver, text data and picture data;
then, the pre-processing the interactive data to obtain public opinion interactive data specifically includes the following steps:
extracting text data in the picture data by an OCR method to obtain extracted text data;
based on a preset word bank, utilizing the jieba word segmentation method to segment the text data and the extracted text data respectively to obtain a first word segmentation result and a second word segmentation result;
respectively cleaning the first segmentation result and the second segmentation result;
respectively carrying out standardization processing on the first word segmentation result after cleaning processing and the second word segmentation result after cleaning processing to obtain a first keyword of the text data and a second keyword of the extracted text data, and taking the timestamp, the sender, the receiver, the first keyword and the second keyword as the public opinion interaction data.
5. The method as claimed in claim 4, wherein the step of obtaining interaction data between each organization member based on the social network platform comprises the following steps:
and acquiring interaction data between each organization member based on the social network platform under the condition of acquiring preset authorization permission information.
6. The public opinion manufacturing group identification method as claimed in claim 5, wherein the method further comprises the steps of:
marking public opinion making members in the public opinion making group on the social network relation map based on the determined public opinion making group.
7. The public opinion manufacturing group identification method according to claim 6, wherein the feature information of each organization member in the organization to be identified at least comprises gender, age, position category, position, length of service, technical level, reward and punishment records, education level and social activity.
8. A public opinion manufacturing group identification device, comprising:
the social network relationship graph building module is used for obtaining the social network relationship between each organization member in the organization to be identified based on a social network platform and building the social network relationship graph of the organization to be identified according to the social network relationship;
the public opinion interactive data acquisition module is used for acquiring interactive data among all organization members based on the social network platform, and preprocessing the interactive data to acquire public opinion interactive data;
the public opinion interaction data feature extraction module is used for extracting features of the public opinion interaction data to obtain public opinion interaction data feature vectors;
a searching member set building module, which is used for determining a plurality of public opinion initial members based on the social network relationship graph and the public opinion interaction data between the organization members, carrying out breadth-first searching by taking the plurality of public opinion initial members as starting points, determining a plurality of organization members participating in public opinion interaction, and building a searching member set according to the plurality of public opinion initial members and the plurality of organization members participating in public opinion interaction;
the public opinion manufacturing member initial identification module is used for obtaining the similarity between public opinion interaction data feature vectors and initial public opinion interaction data feature vectors among all organization members in the search member set by utilizing a cosine similarity algorithm, carrying out public opinion manufacturing member initial identification on the organization members in the search member set according to a comparison result of the similarity and a preset similarity threshold, and determining the initial public opinion manufacturing member set; the initial public opinion interaction data feature vector is a public opinion interaction data feature vector between any public opinion initial member in the search member set and an organization member performing public opinion interaction with the public opinion initial member;
and the public opinion manufacturing member re-identification module is used for carrying out public opinion manufacturing member re-identification on the organization members in the initial public opinion manufacturing member set through a preset naive Bayes model based on the preset characteristic information of each organization member in the organization to be identified, and determining a public opinion manufacturing group in the organization to be identified.
9. Public opinion manufacturing group identification equipment, characterized by comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor implements the public opinion manufacturing group identification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program, and wherein when the computer program runs, a device where the computer-readable storage medium is located is controlled to execute the public opinion manufacturing group identification method according to any one of claims 1 to 7.
CN202210677303.1A 2022-06-15 2022-06-15 Public opinion manufacturing group identification method, device, equipment and storage medium Pending CN115098794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210677303.1A CN115098794A (en) 2022-06-15 2022-06-15 Public opinion manufacturing group identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210677303.1A CN115098794A (en) 2022-06-15 2022-06-15 Public opinion manufacturing group identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115098794A (en) 2022-09-23

Family

ID=83291767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210677303.1A Pending CN115098794A (en) 2022-06-15 2022-06-15 Public opinion manufacturing group identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115098794A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965137A (en) * 2022-12-26 2023-04-14 北京码牛科技股份有限公司 Method, system, terminal and storage medium for predicting tendency of specific organization development object
CN115965137B (en) * 2022-12-26 2023-11-14 北京码牛科技股份有限公司 Specific object relevance prediction method, system, terminal and storage medium

Similar Documents

Publication Publication Date Title
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN112164391B (en) Statement processing method, device, electronic equipment and storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
US20180336193A1 (en) Artificial Intelligence Based Method and Apparatus for Generating Article
CN110263248B (en) Information pushing method, device, storage medium and server
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
WO2021218028A1 (en) Artificial intelligence-based interview content refining method, apparatus and device, and medium
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
WO2015021937A1 (en) Method and device for user recommendation
CN111522915A (en) Extraction method, device and equipment of Chinese event and storage medium
CN112559747B (en) Event classification processing method, device, electronic equipment and storage medium
US10970488B2 (en) Finding of asymmetric relation between words
CN111444387A (en) Video classification method and device, computer equipment and storage medium
CN112395391B (en) Concept graph construction method, device, computer equipment and storage medium
CN110990533A (en) Method and device for determining standard text corresponding to query text
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN112613293A (en) Abstract generation method and device, electronic equipment and storage medium
CN112686051A (en) Semantic recognition model training method, recognition method, electronic device, and storage medium
CN112395421A (en) Course label generation method and device, computer equipment and medium
CN115840808A (en) Scientific and technological project consultation method, device, server and computer-readable storage medium
CN115098794A (en) Public opinion manufacturing group identification method, device, equipment and storage medium
CN112632255B (en) Method and device for obtaining question and answer results
CN112163415A (en) User intention identification method and device for feedback content and electronic equipment
CN113505293B (en) Information pushing method and device, electronic equipment and storage medium
CN114579876A (en) False information detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination