CN113961694A - Conference-based auxiliary analysis method and system for operation condition of each company unit

Info

Publication number: CN113961694A
Application number: CN202111105581.1A
Authority: CN
Inventors: 杨梦琳; 周峰; 杨迪; 梁懿; 彭放; 陈红; 赵鹏; 闫崇峰; 陈雪萍; 翁贞
Original assignee: Big Data Center Of State Grid Corp Of China; State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Shandong Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd; State Grid Shanghai Electric Power Co Ltd; Weifang Power Supply Co of State Grid Shandong Electric Power Co Ltd; Fujian Yirong Information Technology Co Ltd
Current assignee: Big Data Center Of State Grid Corp Of China; State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Shandong Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd; State Grid Shanghai Electric Power Co Ltd; Weifang Power Supply Co of State Grid Shandong Electric Power Co Ltd; Fujian Yirong Information Technology Co Ltd
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2022-01-21

Abstract

The invention discloses a conference-based auxiliary analysis method and system for the operation condition of each company unit, and relates to the technical field of document analysis. The method comprises a word2vec model training process, a subject word library construction process, a subject word-task association process, a subject word-conference association process and a task-conference association process. Based on the conference data of each unit, the embodiment of the invention analyzes how each unit's work is progressing by combining work tasks with subject labels, thereby revealing the actual business development of subordinate units and supporting comprehensive oversight of their specific working conditions and process tracking of the execution of key work. The embodiment mainly applies text mining, natural language processing, machine learning and deep learning, analyzes the completion of each unit's key work tasks and special work tasks on the basis of the conference data, and improves the ability of conference data to support company decision-making.

Description

Conference-based auxiliary analysis method and system for operation condition of each company unit

Technical Field

The invention relates to the technical field of document analysis, in particular to a conference-based auxiliary analysis method and system for the operation condition of each company unit.

Background

Conferences are an important vehicle for driving government affairs and enterprise business activities, and serve as a hub for important business activities. The subject information and related content of a conference provide an important basis for understanding how an enterprise implements higher-level policies and carries out the company's special work.

In the prior art, intelligent association analysis of conference data mainly analyzes traditional conference information by keyword extraction and clustering. It lacks association relations among multiple related topics and hierarchy levels, and cannot reflect, from a global level, the association between the company's key work tasks and subject labels.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a conference-based auxiliary analysis method and system for the operation condition of each company unit, which analyze conference information in combination with the company's key work tasks and subject labels and mine the internal associations among conferences, key work tasks and subject labels, thereby improving the application value of conference data analysis.

In a first aspect, the present invention provides a conference-based method for assisting in analyzing the operation conditions of each company unit, including:

word2vec model training process: filtering office system data, segmenting it into words, merging in a user-defined term library, and training with the word2vec algorithm to obtain a word2vec model;

subject word library construction process: first extracting subject words from headquarters task data to form a subject word set, then obtaining the keywords associated with each subject word by combining the trained word2vec model with manual curation to form a keyword set, and finally merging the subject word set and the keyword set to obtain the subject word library;

subject word-task association process: importing task data, wherein the task data comprises headquarters task data and network province task data, and performing association analysis between the subject word library and the task data to obtain subject word-task association data;

subject word-conference association process: importing network province conference data, and performing association analysis between the subject word library and the network province conference data to obtain subject word-conference association data;

task-conference association process: performing association analysis between the subject word-task association data and the subject word-conference association data, and gathering the task data and conference data associated with the same subject word into task-conference association data; where the subject word of a conference is associated with both a headquarters task and a network province task, classifying the corresponding tasks as headquarters key tasks; and where the subject word of a conference is associated only with a network province task, classifying the corresponding tasks as network province special tasks.

Further, the word2vec model training process specifically includes: first filtering the office system data against a user-defined stop-word library, then performing jieba word segmentation and merging in the user-defined term library to obtain segmented office system text, and training on the segmented text with the word2vec algorithm to obtain the word2vec model.

Further, in the subject word library construction process, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.

In a second aspect, the present invention provides a conference-based auxiliary analysis system for the operation condition of each company unit, including: a word2vec model training module, a subject word library construction module, a subject word-task association module, a subject word-conference association module and a task-conference association module;

the word2vec model training module is used for filtering office system data, segmenting it into words, merging in a user-defined term library, and training with the word2vec algorithm to obtain a word2vec model;

the subject word library construction module is used for first extracting subject words from headquarters task data to form a subject word set, then obtaining the keywords associated with each subject word by combining the trained word2vec model with manual curation to form a keyword set, and finally merging the subject word set and the keyword set to obtain the subject word library;

the subject word-task association module is used for importing task data, wherein the task data comprises headquarters task data and network province task data, and performing association analysis between the subject word library and the task data to obtain subject word-task association data;

the subject word-conference association module is used for importing network province conference data and performing association analysis between the subject word library and the network province conference data to obtain subject word-conference association data;

the task-conference association module is used for performing association analysis between the subject word-task association data and the subject word-conference association data, and gathering the task data and conference data associated with the same subject word into task-conference association data; where the subject word of a conference is associated with both a headquarters task and a network province task, classifying the corresponding tasks as headquarters key tasks; and where the subject word of a conference is associated only with a network province task, classifying the corresponding tasks as network province special tasks.

Further, the word2vec model training module is specifically configured to: first filter the office system data against a user-defined stop-word library, then perform jieba word segmentation and merge in the user-defined term library to obtain segmented office system text, and train on the segmented text with the word2vec algorithm to obtain the word2vec model.

Further, in the subject word library construction module, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.

The technical solution provided in the embodiments of the invention has at least the following technical effects or advantages:

by constructing a subject word library and mining the internal associations between subject words and tasks (headquarters tasks and network province tasks), between subject words and conferences, and between tasks and conferences, the association between the company's key work tasks and subject labels is reflected from a global level. This provides an important basis for supporting the company's decision-making and a reference for other similar data analysis scenarios, thereby improving the application value of conference data analysis.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

The invention is further described below with reference to the embodiments and the accompanying drawings.

FIG. 1 is a flow chart of the method according to the first embodiment of the present invention;

FIG. 2 is a flow chart of the task-conference association technical route according to the first embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the system according to the second embodiment of the present invention.

Detailed Description

The technical scheme in the embodiment of the invention has the following general idea:

Based on the conference data of each unit, the embodiment of the invention analyzes how each unit's work is progressing by combining work tasks with subject labels, thereby revealing the actual business development of subordinate units and supporting comprehensive oversight of their specific working conditions and process tracking of the execution of key work. The embodiment mainly applies text mining, natural language processing, machine learning and deep learning, analyzes the completion of each unit's key work tasks and special work tasks on the basis of the conference data, and improves the ability of conference data to support company decision-making.

The embodiment of the invention takes conference information as its basis, analyzes it in combination with the company's key work tasks and subject labels, and applies machine learning and deep learning algorithms to mine the internal associations among conferences, key work tasks and subject labels, thereby improving the application value of conference data analysis.

The technique mainly involves constructing a subject word library and mining the internal associations between subject words and tasks (headquarters tasks and network province tasks), between subject words and conferences, and between tasks and conferences. Manual curation and the word2vec algorithm are used to extract the subject words and keywords (attributes of the subject words) from headquarters task data. The associations between subject words and tasks, between subject words and conferences, and between tasks and conferences are then analyzed to obtain the conference data associated with each task, the headquarters key tasks and the network province special tasks.

Embodiment One

Referring to FIG. 1 and FIG. 2, the present embodiment provides a conference-based auxiliary analysis method for the operation condition of each company unit, including:

S1, word2vec model training process: filtering office system data, segmenting it into words, merging in a user-defined term library, and training with the word2vec algorithm to obtain a word2vec model;

word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and predicts the words in adjacent positions; under the bag-of-words assumption in word2vec, the order of the context words is unimportant. After training is completed, the word2vec model can be used to map each word to a vector that represents word-to-word relationships; this vector is taken from the hidden layer of the neural network.
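As a rough illustration of the prediction task described above, the following sketch lists the (input word, adjacent word) pairs on which a skip-gram style word2vec model would be trained. The function name `skipgram_pairs`, the `window` parameter and the sample tokens are assumptions made for illustration; the source does not state which word2vec variant or window size is used.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions.

    Hypothetical helper: it only builds training pairs and does not train a network.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Illustrative segmented sentence (not taken from the source data).
sample = ["grid", "safety", "inspection", "meeting"]
pairs = skipgram_pairs(sample, window=1)
```

Each pair treats the first word as the network input and the second as the word to be predicted, which is why words that share contexts end up with similar vectors.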

In a specific embodiment, the word2vec model training process further includes: first filtering the office system data against a user-defined stop-word library, then performing jieba word segmentation and merging in the user-defined term library to obtain segmented office system text, and training on the segmented text with the word2vec algorithm to obtain the word2vec model.
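The preprocessing order described above (stop-word filtering, then segmentation with user-defined terms) can be sketched as follows. This is a minimal standard-library sketch under stated assumptions: a simple forward-maximum-matching segmenter stands in for jieba, stop words are removed as raw substrings, and all names (`remove_stop_words`, `segment`, `preprocess`) and sample data are hypothetical. The resulting token lists are the kind of input a word2vec trainer would consume; the training step itself is not shown.

```python
def remove_stop_words(text, stop_words):
    """Delete every stop-word occurrence from raw text, longest stop words first."""
    for word in sorted(stop_words, key=len, reverse=True):
        text = text.replace(word, "")
    return text


def segment(text, lexicon):
    """Forward maximum matching (a stand-in for jieba, not jieba's algorithm).

    At each position take the longest lexicon entry, else a single character.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(documents, stop_words, base_lexicon, custom_terms):
    """Filter, then segment with the base lexicon merged with user-defined terms."""
    lexicon = set(base_lexicon) | set(custom_terms)
    return [segment(remove_stop_words(doc, stop_words), lexicon) for doc in documents]


# Illustrative data only.
segmented = preprocess(
    documents=["关于推进数字化转型的会议"],
    stop_words={"关于", "的"},
    base_lexicon={"推进", "会议", "转型"},
    custom_terms={"数字化转型"},
)
```

Merging the user-defined term keeps "数字化转型" as one token instead of splitting it, which is the effect the user-defined term library is meant to have on the training text.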

The user-defined stop-word library is used to filter out useless information. The user-defined term library is a library of specialized subject terms defined by the user; it makes the subject library larger and its subject words more diverse, so that the subsequent association operations are more accurate.

S2, subject word library construction process: first extracting subject words from headquarters task data to form a subject word set, then obtaining the keywords associated with each subject word by combining the trained word2vec model with manual curation to form a keyword set, and finally merging the subject word set and the keyword set to obtain the subject word library;

In a specific embodiment, in the subject word library construction process, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.

The TF-IDF (term frequency-inverse document frequency) algorithm is a commonly used weighting technique in information retrieval and data mining. TF is the term frequency, and IDF is the inverse document frequency. The main idea of TF-IDF is: if a word or phrase appears with a high frequency (TF) in one article and rarely appears in other articles, the word or phrase is considered to discriminate well between categories and is suitable for classification.

TF-IDF is a statistical method for evaluating the importance of a word to a document within a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the corpus. Various forms of TF-IDF weighting are often applied by search engines as a measure or rating of the relevance between a document and a user query.
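The weighting described above can be sketched in a few lines. The version below uses one common formulation, tf = count / document length and idf = log(N / document frequency); the source does not say which TF-IDF variant is used, so the formula, the function names `tf_idf` and `top_terms`, and the sample task texts are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_map, k=1):
    """Highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weight_map.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Illustrative segmented task texts (not from the source).
tasks = [
    ["grid", "safety", "safety", "inspection"],
    ["grid", "digital", "transformation"],
    ["grid", "safety", "training"],
]
weights = tf_idf(tasks)
candidates = top_terms(weights[0], k=2)
```

A term such as "grid" that occurs in every document gets weight zero, so it would not be proposed as a subject word, while terms concentrated in a few task texts rank highest.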

On the basis of the word2vec model and the TF-IDF algorithm, a hot-word library and manual curation are introduced to supplement and correct the word sets, which helps ensure the correctness of the subsequent association steps.
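One way to picture the construction of the subject word library is sketched below: each subject word is expanded with its nearest neighbours by cosine similarity over word vectors, then supplemented with hot words and manual corrections. The toy vectors, the thresholds `top_n` and `min_similarity`, and the parameter names `hot_words`, `manual_keywords` and `rejected` are all hypothetical; the source does not specify how similarity is thresholded or how manual curation is recorded.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_keywords(word, vectors, top_n=2, min_similarity=0.5):
    """Most similar words to `word` among the known vectors."""
    if word not in vectors:
        return []
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


def build_subject_library(subject_words, vectors, hot_words=(),
                          manual_keywords=None, rejected=()):
    """Map each subject word to the set of itself plus its associated keywords."""
    manual_keywords = manual_keywords or {}
    library = {}
    for word in list(subject_words) + list(hot_words):
        terms = {word}
        terms.update(related_keywords(word, vectors))   # model suggestions
        terms.update(manual_keywords.get(word, ()))     # manual additions
        library[word] = terms - set(rejected)           # manual removals
    return library


# Toy two-dimensional vectors standing in for trained word2vec vectors.
vectors = {
    "safety": [1.0, 0.0],
    "security": [0.9, 0.1],
    "inspection": [0.8, 0.6],
    "finance": [0.0, 1.0],
}
library = build_subject_library(
    ["safety"], vectors,
    hot_words=["finance"],
    manual_keywords={"safety": ["hazard"]},
    rejected={"inspection"},
)
```

Keeping the keywords grouped under their subject word, rather than in one flat set, preserves the "keyword as attribute of a subject word" relation that the later association steps rely on.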

S3-1, subject word-task association process: importing task data, wherein the task data comprises headquarters task data and network province task data, and performing association analysis (for example, taking the intersection) between the subject word library and the task data to obtain subject word-task association data (for example, displayed as an association tree diagram);

S3-2, subject word-conference association process: importing network province conference data, and performing association analysis (for example, taking the intersection) between the subject word library and the network province conference data to obtain subject word-conference association data (for example, displayed as an association tree diagram);
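The intersection-based association mentioned in steps S3-1 and S3-2 might look like the sketch below, which links a record (a task or a conference) to a subject word whenever the record's tokens intersect the subject word's term set. The function name `associate`, the record identifiers and the sample data are hypothetical, and the source does not say whether any additional scoring is applied.

```python
def associate(library, records):
    """Link subject words to records whose tokens intersect the word's term set.

    library: subject word -> set of terms (the subject word plus its keywords)
    records: record id -> iterable of tokens from the segmented record text
    """
    links = {}
    for word, terms in library.items():
        hits = sorted(
            record_id
            for record_id, tokens in records.items()
            if terms & set(tokens)  # non-empty intersection means "associated"
        )
        if hits:
            links[word] = hits
    return links


# Illustrative data only.
library = {
    "safety": {"safety", "security", "hazard"},
    "digital": {"digital", "data"},
}
tasks = {
    "HQ-1": ["strengthen", "grid", "safety"],
    "P-1": ["hazard", "inspection"],
    "P-2": ["tourism", "promotion"],
}
meetings = {
    "M1": ["security", "review"],
    "M2": ["data", "platform"],
}
word_tasks = associate(library, tasks)        # subject word-task association
word_meetings = associate(library, meetings)  # subject word-conference association
```

The same function serves both steps because only the imported records differ; the nested dictionaries it returns map naturally onto a tree with subject words as parent nodes.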

S4, task-conference association process: performing association analysis between the subject word-task association data and the subject word-conference association data, and gathering the task data and conference data associated with the same subject word into task-conference association data (the conference data corresponding to each task); where the subject word of a conference is associated with both a headquarters task and a network province task, classifying the corresponding tasks as headquarters key tasks; and where the subject word of a conference is associated only with a network province task, classifying the corresponding tasks as network province special tasks.

The task-conference association data comprises the conference data corresponding to each task. For any task in the task-conference association data, whether it is a headquarters key task or a network province special task can be determined by checking whether the subject word of its associated conference is associated with both a headquarters task and a network province task. The content of the conferences associated with the task can then be retrieved and analyzed to obtain the progress of the task.
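The joining and classification rule of step S4 can be sketched as follows. The data shapes, the level labels `"headquarters"` and `"province"`, and the function name `link_tasks_to_meetings` are assumptions for illustration. The source also does not say what happens when one task is reached through several subject words with different outcomes, so the precedence used here (a headquarters key task stays one) is an assumption as well.

```python
def link_tasks_to_meetings(word_tasks, word_meetings, task_level):
    """Join tasks and conferences that share a subject word, then classify tasks.

    word_tasks:    subject word -> list of task ids
    word_meetings: subject word -> list of conference ids
    task_level:    task id -> "headquarters" or "province" (hypothetical labels)
    """
    task_meetings = {}
    category = {}
    for word, meetings in word_meetings.items():
        tasks = word_tasks.get(word, [])
        if not tasks or not meetings:
            continue
        levels = {task_level[task] for task in tasks}
        if levels == {"headquarters", "province"}:
            label = "headquarters key task"
        elif levels == {"province"}:
            label = "network province special task"
        else:
            label = None  # headquarters-only case is not classified in the source
        for task in tasks:
            task_meetings.setdefault(task, set()).update(meetings)
            # Assumption: a headquarters key task is never downgraded.
            if label and category.get(task) != "headquarters key task":
                category[task] = label
    return task_meetings, category


# Illustrative data only.
word_tasks = {"safety": ["HQ-1", "P-1"], "tourism": ["P-2"], "audit": ["HQ-2"]}
word_meetings = {"safety": ["M1", "M2"], "tourism": ["M3"]}
task_level = {"HQ-1": "headquarters", "HQ-2": "headquarters",
              "P-1": "province", "P-2": "province"}
task_meetings, category = link_tasks_to_meetings(word_tasks, word_meetings, task_level)
```

With this output, the conferences in `task_meetings[task]` are the ones whose content would be read to assess the progress of that task.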

Based on the same inventive concept, the application also provides a device corresponding to the method in the first embodiment, which is detailed in the second embodiment.

Embodiment Two

In this embodiment, a conference-based auxiliary analysis system for the operation condition of each company unit is provided, as shown in FIG. 3, including: a word2vec model training module, a subject word library construction module, a subject word-task association module, a subject word-conference association module and a task-conference association module.

In a specific embodiment, the word2vec model training module is specifically configured to: first filter the office system data against a user-defined stop-word library, then perform jieba word segmentation and merge in the user-defined term library to obtain segmented office system text, and train on the segmented text with the word2vec algorithm to obtain the word2vec model.

In a specific embodiment, in the subject word library construction module, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.

Since the apparatus described in the second embodiment of the present invention is the apparatus used to implement the method of the first embodiment, a person skilled in the art can, based on the method described in the first embodiment, understand the specific structure and variations of the apparatus, and the details are therefore not repeated here. All apparatuses used to carry out the method of the first embodiment fall within the protection scope of the present invention.

Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims

1. A conference-based auxiliary analysis method for operation conditions of each company unit is characterized by comprising the following steps:

2. The method of claim 1, wherein the word2vec model training process specifically comprises: first filtering the office system data against a user-defined stop-word library, then performing jieba word segmentation and merging in the user-defined term library to obtain segmented office system text, and training on the segmented text with the word2vec algorithm to obtain the word2vec model.

3. The method according to claim 1 or 2, characterized in that: in the subject word library construction process, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.

4. A conference-based auxiliary analysis system for the operation condition of each company unit, characterized by comprising: a word2vec model training module, a subject word library construction module, a subject word-task association module, a subject word-conference association module and a task-conference association module;

5. The system of claim 4, wherein the word2vec model training module is specifically configured to: first filter the office system data against a user-defined stop-word library, then perform jieba word segmentation and merge in the user-defined term library to obtain segmented office system text, and train on the segmented text with the word2vec algorithm to obtain the word2vec model.

6. The system according to claim 4 or 5, characterized in that: in the subject word library construction module, extracting the subject words from the headquarters task data to form the subject word set is specifically implemented by means of the TF-IDF algorithm, an imported hot-word library and manual curation.