CN116522901B - Method, device, equipment and medium for analyzing attention information of IT community - Google Patents

Publication number
CN116522901B
Authority
CN
China
Prior art keywords
vertex
vocabulary
weight
node
vocabularies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310778213.6A
Other languages
Chinese (zh)
Other versions
CN116522901A (en
Inventor
董方
金宏伟
闫锋
常星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinrui Tongchuang Beijing Technology Co ltd
Original Assignee
Jinrui Tongchuang Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinrui Tongchuang Beijing Technology Co ltd filed Critical Jinrui Tongchuang Beijing Technology Co ltd
Priority to CN202310778213.6A priority Critical patent/CN116522901B/en
Publication of CN116522901A publication Critical patent/CN116522901A/en
Application granted granted Critical
Publication of CN116522901B publication Critical patent/CN116522901B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method, a device, equipment and a medium for analyzing attention information of an IT community, and relates to the field of data analysis. The method comprises: acquiring text data published in an IT community and performing sentence segmentation to obtain a plurality of keyword sentences containing IT technical vocabulary; obtaining vertex vocabulary from the keyword sentences through a grammar filter; determining the weight of the vertex vocabulary, selecting a preset number of vertex vocabulary according to the weight values, and taking the selected vertex vocabulary as keywords; determining emotion color information of the keywords according to a pre-stored correspondence between keywords and emotion color information; calculating the high-occurrence frequency of the keywords, which represents how often the keywords are published; and analyzing the attention degree of different IT technologies in the IT community according to the emotion color information and the high-occurrence frequency of the keywords. By extracting keywords from the IT community, the scheme makes it possible to analyze the attention paid to IT technologies.

Description

Method, device, equipment and medium for analyzing attention information of IT community
Technical Field
The present application relates to the field of data analysis technologies, and in particular, to a method, an apparatus, a device, and a medium for analyzing attention information of an IT community.
Background
With the rapid development of the internet and information technology, the amount of information on the internet has grown at an unprecedented rate. More and more institutions and individuals publish their discussions of and attitudes toward the latest IT technologies on social media such as news websites, microblogs and community forums. Understanding the attention dynamics of IT communities has therefore become an important way to study popular technical problems and keep track of industry trends.
Currently, text analysis is rarely applied to IT communities, for several reasons. Evaluating the attention information of an IT community requires a large amount of data, so data scale, data quality and data processing methods all have to be addressed. IT technical expressions are also time-sensitive, so the information an IT community is paying attention to must be identified promptly. In addition, this information usually uses specialized vocabulary, which, compared with other words, is difficult to relate correctly to traditional emotion words. Automatically evaluating community attention information as positive or negative requires text emotion analysis, so how to quickly and accurately turn traditional emotion analysis into real-time analysis of community attention is an important problem.
Disclosure of Invention
In view of the above, the embodiment of the application provides a method for analyzing attention information of an IT community, so as to solve the technical problem that in the prior art, the attention information of the IT technology in the IT community is difficult to analyze. The method comprises the following steps:
text data of published data of an IT community is obtained, sentence segmentation is carried out on the text data, and a plurality of keyword sentences containing IT technical vocabulary are obtained;
acquiring IT technology related words from the keyword sentences through a grammar filter, wherein the IT technology related words are called as vertex words;
determining the weight of vertex vocabulary, selecting a preset number of vertex vocabulary according to the magnitude of the weight value, and taking the selected vertex vocabulary as a keyword, wherein the magnitude of the weight value is in direct proportion to the attention;
determining emotion color information of the keywords according to pre-stored corresponding relations between the keywords and emotion color information;
calculating the high-occurrence frequency of the keywords, wherein the high-occurrence frequency represents the publishing frequency of the keywords;
and analyzing the attention degree of different IT technologies in the IT community according to the emotion color information and the high-occurrence frequency of the keywords.
The embodiment of the application also provides an analysis device for the attention information of the IT community, which is used for solving the technical problem that the attention information of the IT technology in the IT community is difficult to analyze in the prior art. The device comprises:
the data preprocessing module is used for acquiring text data of published data of the IT community, and performing sentence segmentation on the text data to obtain a plurality of keyword sentences containing IT technical vocabulary;
the vertex vocabulary generating module is used for acquiring IT technology related vocabularies from the keyword sentences through a grammar filter, and the IT technology related vocabularies are called vertex vocabularies;
the weight calculation module is used for determining the weight of the vertex vocabulary, selecting a preset number of vertex vocabulary according to the magnitude of the weight value, and taking the selected vertex vocabulary as a keyword, wherein the magnitude of the weight value is in direct proportion to the attention;
the emotion assignment module is used for determining emotion color information of the keywords according to pre-stored corresponding relations between the keywords and the emotion color information;
the high-occurrence frequency calculation module is used for calculating the high-occurrence frequency of the keywords, wherein the high-occurrence frequency represents the publishing frequency of the keywords;
and the attention degree analysis module is used for analyzing the attention degree of different IT technologies in the IT community according to the emotion color information and the high-occurrence frequency of the keywords.
The embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the analysis method of the attention information of any IT community when executing the computer program so as to solve the technical problem of difficult analysis of the attention information of the IT technology in the IT community in the prior art.
The embodiment of the application also provides a computer readable storage medium which stores a computer program for executing the analysis method of the attention information of any IT community, so as to solve the technical problem of difficult analysis of the attention information of the IT technology in the IT community in the prior art.
Compared with the prior art, the beneficial effects achievable by at least one of the technical schemes adopted in the embodiments of this description include at least the following:
Text data published in the IT community is extracted, and vocabulary related to IT technology is obtained through segmentation and filtering, so that IT-related information is accurately extracted from community information. By calculating the weight of each vertex vocabulary, keywords with higher attention can be selected according to the weight, so that the high-occurrence frequency of the keywords can be calculated effectively and more accurately. By comparing the keywords with the emotion color information, the emotion color (for example, positive or negative emotion) of each keyword of the IT community can be marked, and thus the emotion color of the community's attention information can be determined. The attention degree of different IT technologies in the IT community (for example, which IT information is receiving widespread attention) can then be analyzed quickly and accurately from the emotion color information and the high-occurrence frequency of the keywords, so that the attention dynamics and attention information of the IT community can be analyzed effectively and conveniently.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for analyzing information of interest of an IT community according to an embodiment of the present application;
FIG. 2 is a block diagram of a computer device according to an embodiment of the present application;
FIG. 3 is a block diagram of an analysis apparatus for information of interest of an IT community according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The described embodiments are only some, not all, of the embodiments of the application. The application may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
In an embodiment of the present application, there is provided a method for analyzing information of interest of an IT community, as shown in fig. 1, the method including:
step S101: text data of published data of an IT community is obtained, sentence segmentation is carried out on the text data, and a plurality of keyword sentences containing IT technical vocabulary are obtained;
step S102: acquiring IT technology related words from the keyword sentences through a grammar filter, wherein the IT technology related words are called as vertex words;
step S103: determining the weight of vertex vocabulary, selecting a preset number of vertex vocabulary according to the magnitude of the weight value, and taking the selected vertex vocabulary as a keyword, wherein the magnitude of the weight value is in direct proportion to the attention;
step S104: determining emotion color information of the keywords according to pre-stored corresponding relations between the keywords and emotion color information;
step S105: calculating the high-occurrence frequency of the keywords, wherein the high-occurrence frequency represents the publishing frequency of the keywords;
step S106: and analyzing the attention degree of different IT technologies in the IT community according to the emotion color information and the high-occurrence frequency of the keywords.
Specifically, an IT community may be any platform on which users can post, including web news, blogs, forums, SNS, and the like. To improve the accuracy of the analysis, after the text data published in the IT community is obtained, the text data may first be cleaned and then segmented into sentences, and a word stock may also be built. Data cleaning mainly removes HTML tags, special symbols, line feeds, punctuation marks, and the like. Sentence segmentation divides the text into separate sentences according to a certain rule; the sentences containing IT technical vocabulary are the keyword sentences. Building a word stock stores the segmentation results in a database or a file, which facilitates subsequent processing.
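The cleaning and segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the term set `IT_TERMS`, the terminator characters, and the function names are all assumptions, and punctuation is only consumed at sentence boundaries so that the split remains possible.

```python
import html
import re

# Hypothetical set of IT technical terms; the patent does not list one.
IT_TERMS = {"plug-in", "address space", "kernel"}

def clean_text(raw: str) -> str:
    """Remove HTML tags and collapse line feeds and other whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # drop HTML tags
    text = html.unescape(text)            # decode entities such as &amp;
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text: str) -> list[str]:
    """Split on common Chinese and English sentence terminators (assumed rule)."""
    parts = re.split(r"[。！？!?.;；]+", text)
    return [p.strip() for p in parts if p.strip()]

def keyword_sentences(raw: str, terms: set[str] = IT_TERMS) -> list[str]:
    """Keep only the sentences that mention at least one IT term."""
    sentences = split_sentences(clean_text(raw))
    return [s for s in sentences if any(t in s.lower() for t in terms)]
```

Splitting on every period is crude (it would break version numbers such as "1.2"), so a production segmenter would likely need a more careful rule.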
Specifically, after the keyword sentences are obtained, a grammar filter may be used to filter the words in each keyword sentence. For example, the nouns and verbs related to IT technical vocabulary are retained to obtain the IT technology related vocabulary (for example, words such as access, address space, and plug-in), and the everyday vocabulary in the keyword sentence is deleted.
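A grammar filter of this kind might look like the sketch below. It assumes the sentence has already been tokenized and part-of-speech tagged by some external tagger; the tag names, the lexicon `IT_LEXICON`, and the function name are hypothetical and not taken from the patent.

```python
# Part-of-speech tags kept by the filter; the tag names are an assumption.
KEPT_POS = {"NOUN", "VERB"}

# Hypothetical IT lexicon used to separate technical words from everyday ones.
IT_LEXICON = {"access", "address space", "plug-in"}

def grammar_filter(tagged_tokens, lexicon=IT_LEXICON, kept_pos=KEPT_POS):
    """Return the vertex vocabulary: IT-related nouns and verbs, in sentence order."""
    return [
        word
        for word, pos in tagged_tokens
        if pos in kept_pos and word.lower() in lexicon
    ]
```

For instance, `grammar_filter([("plug-in", "NOUN"), ("lunch", "NOUN")])` keeps "plug-in" and drops "lunch", because only the former is in the assumed lexicon.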
In a specific implementation, in order to find a vocabulary with high attention in text data, it is proposed to calculate the weight of a vertex vocabulary, and then select the vertex vocabulary with high attention (i.e. a larger weight value) as a keyword based on the weight, for example, the process of calculating the weight of the vertex vocabulary may be implemented by the following steps:
constructing an undirected graph according to the vertex vocabulary, wherein each node in the undirected graph corresponds to one vertex vocabulary; determining the relevance among different vertex vocabularies according to the relevance of each node in the undirected graph; and calculating the weight of each vertex word according to the relevance among different vertex words.
In specific implementation, according to the relevance among different vertex vocabularies, the weight of each vertex vocabulary is calculated by the following steps:
the weight of each node is calculated through the following formula, and the weight of each node is determined as the weight of the vertex vocabulary corresponding to the node:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j) \qquad (1)$$
wherein d is the damping coefficient, $In(V_i)$ represents the set of all nodes pointing to node $V_i$, $Out(V_j)$ represents the set of nodes connected by edges starting from node $V_j$, $w_{ji}$ represents the weight between node $V_j$ and node $V_i$, $WS(V_i)$ represents the weight of node $V_i$, and $WS(V_j)$ represents the weight of node $V_j$; in particular, the value of d may lie in the interval 0.84 to 0.85.
Specifically, if a node appears after many other nodes, that node (i.e., the vertex vocabulary) is important; and if a vertex word follows a vertex word with a high weight value, the weight value of the following vertex word increases accordingly.
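The iterative weight calculation can be sketched as below. The update rule follows the description of formula (1); how edges are built is not specified in the text, so the co-occurrence window in `build_graph`, the iteration count, and the tolerance are assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(sentences, window=2):
    """Undirected co-occurrence graph: graph[a][b] == graph[b][a] == edge weight."""
    graph = defaultdict(lambda: defaultdict(float))
    for words in sentences:
        for i, a in enumerate(words):
            # Link each word to the words that follow it inside the window.
            for b in words[i + 1 : i + window]:
                if a != b:
                    graph[a][b] += 1.0
                    graph[b][a] += 1.0
    return graph

def textrank(graph, d=0.85, iterations=100, tol=1e-6):
    """Iterate the weighted update until the scores stop changing."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for i in graph:
            total = 0.0
            for j, w_ji in graph[i].items():
                out_sum = sum(graph[j].values())  # total edge weight leaving j
                total += w_ji / out_sum * scores[j]
            new_scores[i] = (1 - d) + d * total
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores
```

In an undirected graph the in-neighbours and out-neighbours of a node coincide, which is why a single adjacency map serves for both sets in the update.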
In particular, the weight of the vertex vocabulary can be determined based on the mode of constructing the undirected graph, and in order to further improve the accuracy and reliability of the weight of the vertex vocabulary, the final weight of the vertex vocabulary can be determined by combining the mode of constructing the undirected graph and the weight of the pre-stored IT technical vocabulary, for example,
constructing an undirected graph according to the vertex vocabularies, determining the relevance between different vertex vocabularies according to the relevance of each node in the undirected graph, and calculating the first weight (namely the weight calculated by the formula 1) of each vertex vocabulary according to the relevance between different vertex vocabularies; determining a weight of a pre-stored IT technical vocabulary, wherein the weight is called a second weight; consistency matching is carried out on the vertex vocabulary and the prestored IT technical vocabulary, and aiming at the vertex vocabulary which is successfully matched, the first weight and the second weight are overlapped according to respective proportionality coefficients, and the overlapped result is used as the final weight of the vertex vocabulary which is successfully matched; and aiming at the vertex vocabulary with failed matching, taking the first weight as the final weight of the vertex vocabulary.
Specifically, the pre-stored IT technical vocabulary may be currently known popular IT vocabulary. The weight calculated from the undirected graph (the first weight) is superimposed with the weight of the pre-stored vocabulary (the second weight): each of the two weights has a corresponding coefficient, the two coefficients sum to 1, each weight is multiplied by its coefficient, and the two products are added. This can increase the accuracy of the calculated weight and provide more accurate data for the subsequent judgment of attention to IT technologies.
Specifically, the weight can thus be calculated in two ways. In the first, the weight calculated from the vertex vocabulary alone (the first weight) is used directly as the final weight. In the second, the final weight is obtained by superimposing the pre-stored vocabulary weight (the second weight) on the weight calculated from the vertex vocabulary. Using both ways makes it possible to handle different situations and optimize the weight calculation result.
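The superposition of the two weights can be illustrated with a short sketch. The coefficient `alpha` and its default value are assumptions; the text only requires that the two coefficients sum to 1.

```python
def final_weights(first, second, alpha=0.7):
    """Blend graph-based (first) and pre-stored (second) weights.

    alpha is the coefficient of the first weight and (1 - alpha) that of the
    second, so the two coefficients always sum to 1. The default is illustrative.
    """
    result = {}
    for word, w1 in first.items():
        if word in second:
            # Matched vertex word: weighted superposition of both weights.
            result[word] = alpha * w1 + (1 - alpha) * second[word]
        else:
            # Unmatched vertex word: the first weight is the final weight.
            result[word] = w1
    return result
```

With `alpha=0.5`, a vertex word with first weight 1.0 and second weight 2.0 ends up at 1.5, while a word absent from the pre-stored vocabulary keeps its first weight unchanged.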
In the specific implementation, in order to screen out the important and high-attention IT technical vocabulary, the weight of the pre-stored IT technical vocabulary can be determined by the following steps:
performing topic modeling according to the pre-stored IT technical vocabulary to obtain topic distribution of the pre-stored IT technical vocabulary, wherein each node in the topic distribution corresponds to one pre-stored IT technical vocabulary; determining an association node of each node according to association relations among nodes in the topic distribution, and calculating topic similarity between a pre-stored IT technical vocabulary corresponding to the node and a pre-stored IT technical vocabulary corresponding to the association node of the node aiming at each node to obtain the topic similarity of each node; and determining the weight of the pre-stored IT technical vocabulary corresponding to each node according to the topic similarity of each node, wherein the magnitude of the weight value is in direct proportion to the topic similarity of the node.
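One possible reading of these steps is sketched below. It assumes that a topic model has already produced a topic distribution vector for each pre-stored term and that the associated nodes are given as an adjacency map; using the mean cosine similarity as the weight is an illustrative choice, since the text only requires the weight to grow with topic similarity. The example terms are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def second_weights(topic_dist, neighbours):
    """Weight each pre-stored term by its mean topic similarity to associated terms."""
    weights = {}
    for term, dist in topic_dist.items():
        linked = neighbours.get(term, [])
        sims = [cosine(dist, topic_dist[n]) for n in linked]
        weights[term] = sum(sims) / len(sims) if sims else 0.0
    return weights

# Hypothetical two-topic distributions for three pre-stored terms.
topic_dist = {"docker": [1.0, 0.0], "kubernetes": [1.0, 0.0], "css": [0.0, 1.0]}
neighbours = {"docker": ["kubernetes"], "kubernetes": ["docker", "css"], "css": ["kubernetes"]}
weights = second_weights(topic_dist, neighbours)
```

Here "docker" scores highest because its only associated term shares its topic, while "css" scores lowest because its associated term belongs to a different topic.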
In the specific implementation, after a preset number of vertex vocabularies are selected according to the size of the weight value, the selected vertex vocabularies can be directly used as keywords; in order to further increase the probability that the keywords are popular words, so that popular attention information can be accurately determined, it is proposed that a phrase formed by the selected vertex words can also be used as the keywords, for example,
the process of using the selected vertex vocabulary as a keyword may be implemented by:
sorting the vertex vocabulary in descending order of weight; selecting the top-ranked vertex vocabulary, up to the preset number, as candidate keywords; and taking phrases formed by adjacent candidate keywords in the keyword sentences as keywords.
Specifically, after the vertex vocabulary is sorted, vertex vocabulary with relatively low weight is removed and only vertex vocabulary with relatively high weight is retained (for example, the vertex vocabulary ranked within the preset number is kept, and the vertex vocabulary ranked after it is treated as having relatively low weight). The retained vertex vocabulary and the vocabulary associated with it are then used as key points; that is, keywords are generated with high-weight vertex vocabulary at the center. Removing low-weight vertex vocabulary helps improve the accuracy of the high-occurrence frequency calculation, and generating keywords centered on high-weight vertex vocabulary helps find popular vocabulary more accurately and determine the associated emotion colors.
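The selection and phrase-forming steps can be sketched as follows. The function names are hypothetical, and joining adjacent candidates with a space is an assumption that suits English tokens; Chinese tokens would more likely be concatenated directly.

```python
def candidate_keywords(weights, top_n):
    """Sort vertex words by descending weight and keep the first top_n."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:top_n])

def merge_adjacent(sentence_tokens, candidates):
    """Join runs of adjacent candidate keywords in a sentence into phrases."""
    phrases, run = [], []
    for token in sentence_tokens:
        if token in candidates:
            run.append(token)
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:  # flush a run that reaches the end of the sentence
        phrases.append(" ".join(run))
    return phrases
```

A candidate with no candidate neighbour is returned on its own, so single words and multi-word phrases both come out of the same pass.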
Specifically, the emotion colors of the keywords include positive emotion, negative emotion and neutral emotion. For example, an emotion word bank dictionary (the emotion word bank dictionary includes a pre-stored correspondence between keywords and emotion color information) may be established, and the emotion colors of the keywords may be determined by comparing the keywords with the content of the emotion word bank dictionary.
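The dictionary comparison amounts to a lookup, as in the sketch below. The lexicon entries are invented for illustration, and returning "neutral" for keywords missing from the dictionary is an assumption that the text does not state.

```python
# Hypothetical emotion word bank: keyword -> emotion color.
EMOTION_LEXICON = {
    "memory leak": "negative",
    "hot reload": "positive",
    "address space": "neutral",
}

def emotion_of(keyword, lexicon=EMOTION_LEXICON):
    """Look up the emotion color of a keyword; unknown keywords default to neutral."""
    return lexicon.get(keyword.lower(), "neutral")

def label_keywords(keywords, lexicon=EMOTION_LEXICON):
    """Map each keyword to its emotion color."""
    return {k: emotion_of(k, lexicon) for k in keywords}
```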
In a specific implementation, the high-occurrence frequency of the keywords may be calculated as follows:
calculating the occurrence frequency of each keyword in the text data, and taking the calculation result as the high-occurrence frequency.
Specifically, the keywords, their emotion colors and their high-occurrence frequencies may be displayed on each terminal through a visual interface. By analyzing the keywords, their emotion colors and their high-occurrence frequencies, the attention paid to different IT technologies in the IT community can be analyzed and tracked: for example, which technologies are currently receiving attention, the attitudes toward those technologies (for example, whether the words used to describe a technology carry a positive or a negative emotion color), and how often usage experience is shared.
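The frequency count and a simple attention listing can be sketched as below. The text gives no formula for combining emotion color and frequency, so this sketch only pairs the two values and sorts by frequency; the function names and the case-insensitive substring count are assumptions.

```python
def occurrence_frequency(text, keywords):
    """Count non-overlapping, case-insensitive occurrences of each keyword."""
    lowered = text.lower()
    return {k: lowered.count(k.lower()) for k in keywords}

def attention_report(text, keyword_emotions):
    """Return (keyword, emotion color, frequency) rows, most frequent first."""
    freq = occurrence_frequency(text, keyword_emotions)
    rows = [(k, keyword_emotions[k], freq[k]) for k in keyword_emotions]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The resulting rows are the kind of data a visual interface could display directly: each keyword alongside its emotion color and how often it appears.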
In this embodiment, as shown in fig. 2, a computer device is provided, which includes a memory 201, a processor 202, and a computer program stored in the memory and executable on the processor, where the processor implements the method for analyzing the attention information of any IT community when executing the computer program.
In particular, the computer device may be a computer terminal, a server or similar computing means.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program for executing the analysis method of the information of interest of any of the above-described IT communities.
In particular, computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable storage media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Based on the same inventive concept, the embodiment of the application also provides an analysis device for the attention information of the IT community, as described in the following embodiment. Because the principle of the analysis device for the attention information of the IT community for solving the problem is similar to that of the analysis method for the attention information of the IT community, the implementation of the analysis device for the attention information of the IT community can refer to the implementation of the analysis method for the attention information of the IT community, and repeated parts are not repeated. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a block diagram of an analysis apparatus for information of interest of an IT community according to an embodiment of the present application. As shown in FIG. 3, the apparatus includes a data preprocessing module 301, a vertex vocabulary generating module 302, a weight calculation module 303, an emotion assignment module 304, a high-occurrence frequency calculation module 305, and an attention analysis module 306. The structure is described below.
The data preprocessing module 301 is configured to obtain text data of published data of an IT community, and segment the text data to obtain a plurality of keyword sentences including an IT technical vocabulary;
the vertex vocabulary generating module 302 is configured to obtain an IT technology related vocabulary from the keyword sentences through a grammar filter, and the IT technology related vocabulary is called a vertex vocabulary;
the weight calculation module 303 is configured to determine the weight of the vertex vocabulary, select a preset number of vertex vocabularies according to the magnitude of the weight value, and use the selected vertex vocabularies as keywords, where the magnitude of the weight value is proportional to the attention;
the emotion assignment module 304 is configured to determine emotion color information of the keyword according to a pre-stored correspondence between the keyword and emotion color information;
the high-occurrence frequency calculation module 305 is configured to calculate the high-occurrence frequency of the keywords, where the high-occurrence frequency represents the publishing frequency of the keywords;
and the attention analysis module 306 is used for analyzing the attention degree of different IT technologies in the IT community according to the emotion color information and the high-occurrence frequency of the keywords.
In one embodiment, the weight calculation module includes:
the undirected graph constructing unit is used for constructing an undirected graph according to the vertex vocabularies, and each node in the undirected graph corresponds to one vertex vocabularies;
the vertex vocabulary relevance determining unit is used for determining relevance among different vertex vocabularies according to relevance of each node in the undirected graph;
and the weight calculation unit is used for calculating the weight of each vertex word according to the relevance among different vertex words.
In one embodiment, the weight calculation unit is configured to calculate the weight of each node by the following formula, and determine the weight of each node as the weight of the vertex vocabulary corresponding to the node:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
wherein d is the damping coefficient, $In(V_i)$ represents the set of all nodes pointing to node $V_i$, $Out(V_j)$ represents the set of nodes connected by edges starting from node $V_j$, $w_{ji}$ represents the weight between node $V_j$ and node $V_i$, $WS(V_i)$ represents the weight of node $V_i$, and $WS(V_j)$ represents the weight of node $V_j$.
In one embodiment, the weight calculation module further comprises:
the first weight generating unit is used for constructing an undirected graph according to the vertex vocabularies, determining the relevance between different vertex vocabularies according to the relevance of each node in the undirected graph, and calculating the first weight of each vertex vocabulary according to the relevance between different vertex vocabularies;
a second weight generating unit, configured to determine a weight of a pre-stored IT technical vocabulary, where the weight is called a second weight;
the final weight calculation unit is used for carrying out consistency matching between the vertex vocabularies and the pre-stored IT technical vocabularies; for a vertex vocabulary that is successfully matched, the first weight and the second weight are superimposed according to their respective proportionality coefficients, and the superimposed result is used as the final weight of that vertex vocabulary; for a vertex vocabulary that fails to match, the first weight is used as the final weight of that vertex vocabulary.
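The combination of the two weights can be illustrated as follows. This is a sketch under stated assumptions: "consistency matching" is approximated by an exact dictionary lookup, and the proportionality coefficients `alpha` and `beta` are hypothetical values, since the text does not specify them.

```python
def final_weights(first, second, alpha=0.7, beta=0.3):
    """Combine graph-based first weights with pre-stored second weights.

    first:  dict of vertex word -> first weight
    second: dict of pre-stored IT technical word -> second weight
    alpha, beta: assumed proportionality coefficients (not given in the text)
    """
    result = {}
    for word, w1 in first.items():
        if word in second:
            # Matched word: superimpose both weights by their coefficients.
            result[word] = alpha * w1 + beta * second[word]
        else:
            # Unmatched word: the first weight is used unchanged.
            result[word] = w1
    return result
```

Words that appear only in the pre-stored vocabulary are ignored, because only vertex vocabularies receive a final weight.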
In one embodiment, the second weight generating unit is configured to perform topic modeling according to a pre-stored IT technical vocabulary to obtain a topic distribution of the pre-stored IT technical vocabulary, where each node in the topic distribution corresponds to one pre-stored IT technical vocabulary; determining an association node of each node according to association relations among nodes in the topic distribution, and calculating topic similarity between a pre-stored IT technical vocabulary corresponding to the node and a pre-stored IT technical vocabulary corresponding to the association node of the node aiming at each node to obtain the topic similarity of each node; and determining the weight of the pre-stored IT technical vocabulary corresponding to each node according to the topic similarity of each node, wherein the magnitude of the weight value is in direct proportion to the topic similarity of the node.
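A possible reading of the second weight calculation is sketched below. The text does not name the topic model or the similarity measure, so this sketch assumes that each pre-stored word already has a topic vector (for example from a topic model such as LDA), that association relations are given as adjacency lists, and that topic similarity is the mean cosine similarity to the associated nodes. All names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def second_weights(topic_vectors, associations, scale=1.0):
    """Weight each pre-stored word in proportion to its topic similarity.

    topic_vectors: dict of word -> topic distribution (assumed input)
    associations:  dict of word -> list of associated words (assumed input)
    scale:         assumed proportionality constant
    """
    weights = {}
    for word, linked in associations.items():
        sims = [cosine(topic_vectors[word], topic_vectors[other]) for other in linked]
        similarity = sum(sims) / len(sims) if sims else 0.0
        weights[word] = scale * similarity
    return weights
```

With this choice, a word whose associated words share its topics receives a larger second weight than a word whose associated words belong to unrelated topics.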
In one embodiment, the weight calculation module includes:
the vertex vocabulary ordering unit is used for sorting the vertex vocabularies in descending order of weight;
the candidate keyword generation unit is used for selecting, from the ranking, the top-ranked vertex vocabularies up to the preset number, the selected vertex vocabularies being called candidate keywords;
and the keyword generation unit is used for taking a phrase formed by adjacent candidate keywords in the keyword sentences as keywords.
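The three units above can be sketched as one function. This is a minimal sketch, assuming each keyword sentence is already tokenized into a list of words and that a run of adjacent candidate keywords (including a run of length one) is joined with a space; the function name and the joining rule are illustrative assumptions.

```python
def extract_keywords(sentences, weights, top_n):
    """Rank vertex words, keep the top_n as candidates, merge adjacent ones.

    sentences: list of token lists (tokenized keyword sentences, assumed input)
    weights:   dict of vertex word -> final weight
    top_n:     the preset number of candidate keywords
    """
    # Sort vertex words by weight, largest first, and keep the top_n.
    ranked = sorted(weights, key=weights.get, reverse=True)
    candidates = set(ranked[:top_n])

    keywords = []
    for tokens in sentences:
        phrase = []
        # The trailing None is a sentinel that flushes the last phrase.
        for token in list(tokens) + [None]:
            if token in candidates:
                phrase.append(token)
            elif phrase:
                keyword = " ".join(phrase)
                if keyword not in keywords:
                    keywords.append(keyword)
                phrase = []
    return keywords
```

For example, with the candidates "docker", "container" and "kubernetes", the sentence "deploy docker container on kubernetes" would yield the keywords "docker container" and "kubernetes".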
In one embodiment, the good sending frequency calculation module includes:
and the good sending frequency calculation unit is used for calculating the frequency of each keyword in the text data, and taking the calculation result as the good sending frequency.
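The frequency calculation can be illustrated with a simple count. This sketch assumes the text data is a list of post strings and that frequency means the raw number of case-insensitive, non-overlapping occurrences; the text does not say whether the count is normalized, so no normalization is applied here.

```python
def keyword_frequency(posts, keywords):
    """Count how often each keyword occurs across all posts.

    posts:    list of post texts (assumed representation of the text data)
    keywords: list of keyword strings
    """
    counts = {}
    for keyword in keywords:
        needle = keyword.lower()
        # str.count returns the number of non-overlapping matches.
        counts[keyword] = sum(post.lower().count(needle) for post in posts)
    return counts
```

The resulting counts would serve as the good sending frequency of each keyword.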
The embodiment of the application realizes the following technical effects: firstly, text data of IT community published data is extracted, and words related to the IT technology are obtained through segmentation and filtering, so that information related to the IT technology is accurately extracted from community information; by calculating the weight of each vertex vocabulary and removing vertex vocabularies with relatively low weights, the good sending frequency can be effectively calculated and the accuracy of the good sending frequency calculation can be improved; generating keywords by taking vertex vocabulary with higher weight and vocabulary related to the vertex vocabulary as key points, namely taking the vertex vocabulary with higher weight as the center; the weight is calculated by adopting two methods, so that various conditions can be effectively dealt with, and the calculation result of the weight is optimized; and comparing the keywords with emotion color information, and labeling emotion colors of the attention information of each IT community, so as to determine emotion colors of the community attention information.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than what is shown or described, or they may be separately fabricated into individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated into a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. A method for analyzing information of interest of an IT community, comprising:
acquiring text data of published data of an IT community, and performing sentence segmentation on the text data to obtain a plurality of keyword sentences containing IT technical vocabulary;
acquiring IT technology related words from the keyword sentences through a grammar filter, wherein the IT technology related words are called as vertex words;
determining the weight of the vertex vocabulary, selecting a preset number of vertex vocabularies according to the magnitude of the weight value, and taking the selected vertex vocabularies as key words, wherein the magnitude of the weight value is in direct proportion to the attention;
determining the weight of the vertex vocabulary comprises the following steps:
constructing an undirected graph according to the vertex vocabularies, determining the relevance between different vertex vocabularies according to the relevance of each node in the undirected graph, and calculating the first weight of each vertex vocabulary according to the relevance between different vertex vocabularies; determining a weight of a pre-stored IT technical vocabulary, wherein the weight is called a second weight; performing consistency matching between the vertex vocabulary and the pre-stored IT technical vocabulary, and, for the vertex vocabulary that is successfully matched, superimposing the first weight and the second weight according to their respective proportionality coefficients, wherein the superimposed result is used as the final weight of the vertex vocabulary that is successfully matched; for the vertex vocabulary that fails to match, taking the first weight as the final weight of the vertex vocabulary;
calculating the first weight of each vertex vocabulary according to the relevance among different vertex vocabularies comprises calculating:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
wherein $d$ is the damping coefficient, $In(V_i)$ represents the set of all nodes pointing to node $V_i$, $Out(V_j)$ represents the set of nodes connected by the edges starting from node $V_j$, $w_{ji}$ represents the weight between node $V_j$ and node $V_i$, $WS(V_i)$ represents the weight of node $V_i$, and $WS(V_j)$ represents the weight of node $V_j$;
determining weights of pre-stored IT technical words, comprising:
performing topic modeling according to the pre-stored IT technical vocabulary to obtain topic distribution of the pre-stored IT technical vocabulary, wherein each node in the topic distribution corresponds to one pre-stored IT technical vocabulary; determining an association node of each node according to the association relation among the nodes in the topic distribution, and calculating topic similarity between a pre-stored IT technical vocabulary corresponding to each node and a pre-stored IT technical vocabulary corresponding to the association node of the node aiming at each node to obtain the topic similarity of each node; determining the weight of a pre-stored IT technical vocabulary corresponding to each node according to the topic similarity of each node, wherein the magnitude of the weight value is in direct proportion to the topic similarity of the node;
determining emotion color information of a keyword according to a pre-stored corresponding relation between the keyword and emotion color information;
calculating the good sending frequency of the keywords, wherein the good sending frequency represents the publishing frequency of the keywords;
and analyzing the attention degree of different IT technologies in the IT communities according to the emotion color information and the good sending frequency of the keywords.
2. The method of claim 1, wherein determining the weights of the vertex words comprises:
constructing an undirected graph according to the vertex vocabulary, wherein each node in the undirected graph corresponds to one vertex vocabulary;
determining the relevance among different vertex vocabularies according to the relevance of each node in the undirected graph;
and calculating the weight of each vertex word according to the relevance among different vertex words.
3. The analysis method of attention information of an IT community according to any one of claims 1 to 2, wherein selecting a preset number of the vertex vocabularies according to the magnitude of the weight value, using the selected vertex vocabularies as keywords, comprises:
sorting the vertex vocabularies in descending order of weight;
selecting, from the ranking, the top-ranked vertex vocabularies up to the preset number, and taking the selected vertex vocabularies as candidate keywords;
and taking a phrase formed by adjacent candidate keywords in the keyword sentences as the keywords.
4. The analysis method of attention information of an IT community according to any one of claims 1 to 2, wherein calculating the good sending frequency of the keywords includes:
calculating the occurrence frequency of each keyword in the text data, and taking the calculation result as the good sending frequency.
5. An analysis device for information of interest of an IT community, comprising:
the data preprocessing module is used for acquiring text data of published data of the IT community, and performing sentence segmentation on the text data to obtain a plurality of keyword sentences containing IT technical vocabulary;
the vertex vocabulary generating module is used for acquiring IT technology related vocabularies from the keyword sentences through a grammar filter, and the IT technology related vocabularies are called vertex vocabularies;
the weight calculation module is used for determining the weight of the vertex vocabulary, selecting a preset number of vertex vocabularies according to the magnitude of the weight value, and taking the selected vertex vocabularies as key words, wherein the magnitude of the weight value is in direct proportion to the attention; determining the weight of the vertex vocabulary comprises the following steps: constructing an undirected graph according to the vertex vocabularies, determining the relevance between different vertex vocabularies according to the relevance of each node in the undirected graph, and calculating the first weight of each vertex vocabulary according to the relevance between different vertex vocabularies; determining a weight of a pre-stored IT technical vocabulary, wherein the weight is called a second weight; performing consistency matching between the vertex vocabulary and the pre-stored IT technical vocabulary, and, for the vertex vocabulary that is successfully matched, superimposing the first weight and the second weight according to their respective proportionality coefficients, wherein the superimposed result is used as the final weight of the vertex vocabulary that is successfully matched; for the vertex vocabulary that fails to match, taking the first weight as the final weight of the vertex vocabulary; calculating the first weight of each vertex vocabulary according to the relevance among different vertex vocabularies comprises calculating:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$
wherein $d$ is the damping coefficient, $In(V_i)$ represents the set of all nodes pointing to node $V_i$, $Out(V_j)$ represents the set of nodes connected by the edges starting from node $V_j$, $w_{ji}$ represents the weight between node $V_j$ and node $V_i$, $WS(V_i)$ represents the weight of node $V_i$, and $WS(V_j)$ represents the weight of node $V_j$; determining weights of pre-stored IT technical words, comprising: performing topic modeling according to the pre-stored IT technical vocabulary to obtain topic distribution of the pre-stored IT technical vocabulary, wherein each node in the topic distribution corresponds to one pre-stored IT technical vocabulary; determining an association node of each node according to the association relation among the nodes in the topic distribution, and calculating topic similarity between a pre-stored IT technical vocabulary corresponding to each node and a pre-stored IT technical vocabulary corresponding to the association node of the node aiming at each node to obtain the topic similarity of each node; determining the weight of a pre-stored IT technical vocabulary corresponding to each node according to the topic similarity of each node, wherein the magnitude of the weight value is in direct proportion to the topic similarity of the node;
the emotion assignment module is used for determining emotion color information of the keywords according to pre-stored corresponding relations between the keywords and the emotion color information;
the good sending frequency calculation module is used for calculating the good sending frequency of the keywords, wherein the good sending frequency represents the publishing frequency of the keywords;
and the attention degree analysis module is used for analyzing the attention degree of different IT technologies in the IT communities according to the emotion color information and the good sending frequency of the keywords.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of analyzing information of interest of the IT community of any one of claims 1 to 4.
7. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that performs the method of analyzing information of interest of the IT community of any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310778213.6A CN116522901B (en) 2023-06-29 2023-06-29 Method, device, equipment and medium for analyzing attention information of IT community

Publications (2)

Publication Number Publication Date
CN116522901A CN116522901A (en) 2023-08-01
CN116522901B true CN116522901B (en) 2023-09-15

Family

ID=87406636

Country Status (1)

Country Link
CN (1) CN116522901B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant