CN117150245B

CN117150245B - Enterprise intelligent diagnosis information generation method, device, equipment and storage medium

Info

Publication number: CN117150245B
Application number: CN202311412939.4A
Authority: CN
Inventors: 魏炜; 刘红瑜
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2023-10-30
Filing date: 2023-10-30
Publication date: 2024-02-13
Anticipated expiration: 2043-10-30
Also published as: CN117150245A

Abstract

The invention is applicable to the field of intelligent diagnosis of enterprise information, and provides an enterprise intelligent diagnosis information generation method, device, equipment and storage medium, wherein the method comprises the following steps: acquiring source data and generating preprocessed data; grouping objects in the preprocessed data by adopting a clustering model, a classification model and topic mining modeling respectively, to generate corresponding topic sets and probability distributions; normalizing the generated topic sets, and respectively configuring weight values of the corresponding models; calculating the weighted score of the prediction probability of each object in all models according to the probability distributions and the weight values of the corresponding models; and extracting information from the grouping result and then analyzing the information to generate a diagnosis result abstract. The preprocessed data are grouped by three models at the same time, the weighted score is calculated according to the weight of each model, and the topic category with the highest score is selected as the final grouping result, so that the accuracy of the grouping result is improved and the diagnosis result is more reliable.

Description

Enterprise intelligent diagnosis information generation method, device, equipment and storage medium

Technical Field

The invention relates to the field of intelligent diagnosis of enterprise information, in particular to an intelligent diagnosis information generation method, device, equipment and storage medium for enterprises.

Background

Enterprise diagnosis comprehensively analyzes and evaluates the operating condition, operating problems and potential risks of an enterprise, so as to understand its current operating condition, find problems and opportunities, and propose corresponding improvement measures and strategic suggestions. It is generally used to assist the senior management of an enterprise, financial institutions or management training companies in evaluating and diagnosing enterprise strategy and risk.

At present, enterprise diagnosis relies on specialized analysts who review a large amount of data, organize it manually and make judgments based on experience. Because the amount of information is large and the diagnosis depends on the experience of the analyst, query and diagnosis efficiency is poor and the experience-based results are not fully reliable.

In order to solve these problems, the prior art collects enterprise internal data and analyzes it by clustering and classification methods. Clustering is unsupervised learning, in which the machine groups mass texts by itself; classification is supervised learning, in which the user defines some categories and the machine then classifies the texts according to the categories given by the user. However, the clustering and classification methods of the prior art leave omissions in information mining and cannot meet the requirements of users, so the comprehensiveness of information mining is difficult to ensure and the data are difficult to group accurately. In addition, the enterprise diagnosis methods of the prior art perform only general analysis on the acquired data without a targeted data analysis object, so the data analysis results are unreliable.

In view of the foregoing, there is a need for an enterprise intelligent diagnosis information generation method to solve the problems of incomplete information mining, difficulty in realizing accurate grouping and unreliable data analysis results in the existing enterprise diagnosis methods.

Disclosure of Invention

The embodiment of the invention provides an enterprise intelligent diagnosis information generation method, device, equipment and storage medium, which aim to solve the problems that the information mining is incomplete, accurate grouping is difficult to realize and the data analysis result is unreliable in the existing enterprise diagnosis method.

The embodiment of the invention is realized in such a way that an enterprise intelligent diagnosis information generation method is provided, which comprises the following steps:

acquiring source data, preprocessing the source data, and generating preprocessed data, wherein the preprocessed data comprises a plurality of objects, and the objects are words and/or phrases;

grouping the objects in the preprocessed data by adopting a clustering model, a classification model and topic mining modeling respectively, to generate corresponding topic sets and probability distributions, wherein each topic set comprises one or more topics, each topic comprises one or more objects, and the probability distribution is the probability, predicted by each model, that an object belongs to each of the different topics;

carrying out normalization processing on the corresponding topic sets, and respectively configuring weight values of the corresponding models for the topic sets subjected to the normalization processing;

calculating weighted scores of the prediction probabilities of all the objects in all the models according to the probability distribution and the weight values of the corresponding models, and selecting the topic category with the highest weighted score as a final grouping result;

and extracting and analyzing the information of the grouping result to generate a diagnosis result abstract.

Further, the step of extracting and analyzing information according to the grouping result to generate a diagnosis result abstract specifically comprises the following steps:

extracting keywords from the grouping result to generate a core text of the grouping result;

carrying out emotion analysis on the core text to identify emotion polarity of the core text;

and generating an enterprise evaluation diagnosis abstract according to the emotion analysis result.

Still further, the keywords include entity information, positive information, or negative information.

Further, the step of extracting and analyzing information according to the grouping result to generate a diagnosis result abstract specifically comprises the following steps:

extracting named entity information from the grouping result to generate entity information of the grouping result;

performing time sequence analysis on the entity information to identify the change trend of events over time;

and generating an enterprise event combing abstract according to the time sequence analysis result.

Still further, the entity information is one of a name, a place, a time, or an event, or any combination thereof.

Further, the step of extracting and analyzing information according to the grouping result to generate a diagnosis result abstract specifically comprises the following steps:

extracting keywords from the grouping result to generate a core text of the grouping result, and extracting named entity information from the grouping result to generate entity information of the grouping result;

carrying out emotion analysis on the core text and, at the same time, carrying out time sequence analysis on the entity information;

and generating an enterprise comprehensive diagnosis result abstract according to the emotion analysis result and the time sequence analysis result.

Still further, the topic mining modeling is business topic modeling using an LDA model.

The embodiment of the invention also provides an enterprise intelligent diagnosis information generating device, which comprises:

the data preprocessing unit is used for acquiring source data, preprocessing the source data and generating preprocessed data, wherein the preprocessed data comprises a plurality of objects, and the objects are words and/or phrases;

the multi-model grouping unit is used for grouping the objects in the preprocessed data by adopting a clustering model, a classification model and topic mining modeling respectively, to generate corresponding topic sets and probability distributions, wherein each topic set comprises one or more topics, each topic comprises one or more objects, and the probability distribution is the probability, predicted by each model, that an object belongs to each of the different topics;

the normalization processing unit is used for carrying out normalization processing on the corresponding topic sets and respectively configuring weight values of the corresponding models for the topic sets after normalization processing;

the grouping result calculation unit is used for calculating the weighted score of the prediction probability of each object in all models according to the probability distribution and the weight value of the corresponding model, and selecting the subject category with the highest weighted score as a final grouping result;

and the abstract output unit is used for extracting and analyzing the information of the grouping result and generating a diagnosis result abstract.

The embodiment of the invention also provides equipment for generating the intelligent diagnosis information of the enterprise, which comprises: a memory and a processor;

the memory is used for storing programs;

The processor is used for executing the program to realize each step of the enterprise intelligent diagnosis information generation method.

The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, realizes the steps of the enterprise intelligent diagnosis information generation method.

The method has the advantages that, after the source data is preprocessed, potential associations in the preprocessed data can be mined through topic clustering, topic classification and latent topic mining, and the grouping basis is obtained from these potential associations, so that the data can be grouped more accurately; by standardizing or normalizing the outputs of the three models, calculating the weighted score according to the weight of each model, and selecting the topic category with the highest score as the final grouping result, the accuracy of the grouping result and the reliability of the enterprise diagnosis result are improved.

Meanwhile, the grouping result is handled by first extracting information and then performing focused analysis: a keyword extraction algorithm and a named entity information extraction algorithm are respectively applied to the text data of different topic categories to extract information; emotion analysis, time sequence analysis and comprehensive analysis are then performed on the extraction results; and diagnosis results are finally generated for different objects of the enterprise. The data analysis thus has targeted objects, and the diagnosis results are more focused and more reliable.

Drawings

FIG. 1 is a flow chart of one embodiment of an enterprise intelligent diagnostic information generation method provided by an embodiment of the present invention;

FIG. 2 is a flow chart of one embodiment of the invention for information extraction and analysis of grouping results to generate a diagnosis result abstract;

FIG. 3 is a flow chart of another embodiment of the invention for information extraction and analysis of grouping results to generate a diagnosis result abstract;

FIG. 4 is a flow chart of yet another embodiment of the invention for information extraction and analysis of grouping results to generate a diagnosis result abstract;

FIG. 5 is a schematic structural diagram of an enterprise intelligent diagnosis information generating apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an enterprise intelligent diagnosis information generating apparatus according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The existing methods collect enterprise internal data and apply clustering and classification, so that a machine either groups mass texts by itself or classifies them according to categories defined by the user; these methods cannot meet the requirements of users, and suffer from incomplete information mining, difficulty in achieving accurate grouping and unreliable data analysis results. The embodiments of the present invention aim to solve these problems.

Example 1

Referring to fig. 1, fig. 1 is a flowchart of an embodiment of an enterprise intelligent diagnosis information generating method according to an embodiment of the present invention.

In step S1, source data is acquired, and preprocessing is performed on the source data to generate preprocessed data, where the preprocessed data includes a plurality of objects, and the objects are words and/or phrases.

It may be understood that in the embodiment of the present invention, text source data of an enterprise may be obtained, which may be text source data inside the enterprise or text source data outside the enterprise, and after the source data is obtained, the source data is preprocessed, where the preprocessed source data may generate a plurality of objects, where the objects may be vocabularies, phrases, or both vocabularies and phrases.

Specifically, in this embodiment, the text source data inside the enterprise may include: internal files, reports, memos, conference records and the like, which record important information such as the operating conditions, decision-making processes, business plans and financial data of the enterprise; organization structure and human resource data, such as organization charts, post responsibilities, employee lists, salary and benefit information and training records, which record the personnel configuration, management levels and human resource strategies of the enterprise; sales reports, customer information, market research data and competitor analyses, which record information such as sales performance, market trends and customer requirements; financial statements, income statements, balance sheets and cash flow statements, which record the financial status, operating performance and fund flows of the enterprise; and project plans, progress reports, production data, inventory records and the like, which record project progress, production efficiency and supply chain management.

The text source data outside the enterprise may include: industry reports, market research, industry analyses and the like, which record industry trends, market scale, the competitive landscape and consumer behavior; news reports, industry updates, company announcements and the like, which record news events, market performance, product releases and partners of the enterprise; customer surveys, market surveys, supplier assessments and the like, which record customer demand, market competition and supply chain management; and user comments, scores, feedback and the like, which record product reputation, user experience and market response.

In another embodiment, the text source data is preprocessed by removing noise, errors and redundant information from the data, and the resulting objects may be words, phrases, or both words and phrases.

Specifically, HTML tags, URL links, special characters, stop words and the like are removed from the source data, wherein the stop words may be common words that carry little meaning, such as 'yes' and 'in'; low-frequency words, i.e., words that occur only a few times in the whole text set, may also be removed; part-of-speech tags, such as noun, verb and adjective, may also be used; the text data is then segmented into independent words and/or phrases; features that are significant to the task are then selected by a rule-based method or a machine learning method; and finally the passages of the source data are segmented into the words and/or phrases that are significant for the intelligent diagnosis of the enterprise.
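The cleaning steps described above can be illustrated with a minimal sketch. It is not the patented implementation: the stop-word list, the `min_count` threshold and the function names `clean_text` and `preprocess` are assumptions introduced for illustration, and whitespace splitting stands in for a real word segmenter.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"the", "is", "in", "a", "an", "of", "and", "to", "yes"}

def clean_text(raw: str) -> str:
    """Strip HTML tags, URL links and special characters from one document."""
    text = re.sub(r"<[^>]+>", " ", raw)          # HTML tags
    text = re.sub(r"https?://\S+", " ", text)    # URL links
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # special characters
    return text.lower()

def preprocess(docs, min_count=2):
    """Return, per document, the tokens left after stop-word and low-frequency filtering."""
    tokenized = [clean_text(d).split() for d in docs]
    # Frequency over the whole text set, used to drop low-frequency words.
    freq = Counter(tok for doc in tokenized for tok in doc)
    return [
        [t for t in doc if t not in STOP_WORDS and freq[t] >= min_count]
        for doc in tokenized
    ]

docs = [
    "<p>Sales of the phone grew in Q3. See https://example.com/report</p>",
    "The phone sales team is hiring!",
]
objects = preprocess(docs)
print(objects)
```

With only two short documents, the low-frequency filter keeps just the words shared by both; on a realistic corpus the threshold would be tuned to the data.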

In step S2, the objects in the preprocessed data are grouped by adopting a clustering model, a classification model and topic mining modeling respectively, so as to generate corresponding topic sets and probability distributions, wherein each topic set contains one or more topics, each topic contains one or more objects, and the probability distribution is the probability, predicted by each model, that an object belongs to each of the different topics.

In this embodiment, after the objects pass through the clustering model, the classification model and the topic mining model, corresponding topic sets and probability distributions are generated; that is, the words and/or phrases obtained by preprocessing the source data are grouped, and the corresponding topic sets and probability distributions are generated.

In one embodiment, the clustering model is a technique for grouping text data and is used for mining text data with similarity and relevance. One or more of K-means clustering, hierarchical clustering, density-based clustering or model-based clustering may be adopted, and the clustering algorithm may be selected according to actual requirements.
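The text does not state how a clustering model yields a probability distribution over topics. One possible approach, shown here purely as an assumption, is to turn the distances between an object's vector and the cluster centres into probabilities with a softmax; the centroids, the `temperature` parameter and the function name `soft_assign` are hypothetical.

```python
import math

def soft_assign(point, centroids, temperature=1.0):
    """Softmax over negative Euclidean distances: closer centroids get higher probability."""
    dists = [math.dist(point, c) for c in centroids]
    scores = [math.exp(-d / temperature) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical cluster centres, e.g. as produced by K-means on object vectors.
centroids = [(0.0, 0.0), (4.0, 0.0)]
probs = soft_assign((1.0, 0.0), centroids)
print(probs)  # the first cluster is closer, so it receives the larger probability
```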

In another embodiment, the classification model is a machine learning model for classifying input data into different predefined categories or labels, and classification predictions may be made for new unlabeled data by learning the association between the input data and its corresponding label.

Classification models are typically trained based on labeled training data sets, where each sample has a known class label, and the model adjusts parameters or weights of the model by learning the relationships between input features and labels to maximize the accuracy of predicting the class of the unknown sample.

Specifically, in this embodiment, the classification model may be logistic regression (Logistic Regression), decision Tree (Decision Tree), random Forest (Random Forest), support vector machine (Support Vector Machine, SVM), or Neural Network (Neural Network).

In yet another embodiment, topic mining modeling is a text analysis technique that aims to find hidden topics in large-scale text data, understand the main content and key information behind the text, and extract useful knowledge from it. The topic mining modeling first converts the preprocessed objects into numerical representations, then applies a topic modeling algorithm to find topics in the text data, and finally analyzes the topic distribution output by the model, interprets the meaning of each topic, and presents the result visually.

In this embodiment, each topic set includes one or more topics, each topic includes one or more objects, and the probability distribution is the probability, predicted by each model, that an object belongs to each of the different topics. It is easy to understand that the topic set in this embodiment may include a plurality of different business topics, for example: product and service topics such as electronic products, food and beverage, and healthcare; market trend topics such as digital transformation, sustainable development and artificial intelligence applications; consumer opinion topics such as customer satisfaction, product improvement advice and complaint handling; competitive analysis topics such as competitor strategies, market share and differentiation advantages; and brand reputation topics such as brand image, word-of-mouth management and social media impact.
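The first step named above, converting the preprocessed objects into numerical representations, could for instance use a bag-of-words count matrix, which is the usual input to topic models such as LDA. The sketch below is an assumption about that step only; it does not implement the topic model itself, and `bag_of_words` is a hypothetical helper.

```python
def bag_of_words(docs):
    """Build a vocabulary and a document-term count matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1  # count occurrences of each vocabulary entry
        matrix.append(row)
    return vocab, matrix

docs = [["phone", "sales", "phone"], ["brand", "sales"]]
vocab, matrix = bag_of_words(docs)
print(vocab)
print(matrix)
```

Each row of `matrix` is one document and each column one vocabulary entry; a topic modeling algorithm would then be fitted on this matrix.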

In this embodiment, it may be understood that each theme includes one or more objects, for example, an electronic product theme includes a plurality of words or phrases of a mobile phone, a computer, an earphone, a watch, etc., where the words or phrases are objects forming one theme, and each theme includes one or more objects. Based on the fact that the probabilities of each object being predicted and divided into different topic sets by the model are different, each object is predicted to be affiliated to a certain topic, a certain probability exists, and the probabilities of all objects affiliated to different topics respectively form probability distribution of the objects corresponding to different topics.

The probability distribution is obtained by counting the topics to which each object is assigned and computing the corresponding proportions; from the probability distribution, the probability with which each model predicts that each object belongs to each of the different topics can be obtained.

It can be understood that the objects generated by preprocessing the source data are distributed to different topics. For example, when the objects contain a time word, the time word may be assigned to a time object in the text set, such as a time-sentence object of a core phrase topic, or to the time topic in the entity set; in this case, the probabilities that the time word is assigned to the text set and to the entity set are generated. Using the probability distribution, the clustering model, the classification model and the topic mining modeling can each predict whether an object belongs to a topic such as a name topic, a place topic or a time topic.
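One possible reading of the counting step is that, for a given model, each occurrence of an object is assigned to a topic and the per-object proportions form its probability distribution. The following sketch illustrates that reading only; the `(object, topic)` input format and the name `topic_distribution` are assumptions.

```python
from collections import Counter, defaultdict

def topic_distribution(assignments):
    """assignments: iterable of (object, topic) pairs produced by one model."""
    counts = defaultdict(Counter)
    for obj, topic in assignments:
        counts[obj][topic] += 1
    # Normalize the counts of each object so that they sum to 1.
    return {
        obj: {t: n / sum(c.values()) for t, n in c.items()}
        for obj, c in counts.items()
    }

assignments = [
    ("phone", "product"), ("phone", "product"), ("phone", "market"),
    ("share", "market"),
]
dist = topic_distribution(assignments)
print(dist)
```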

In step S3, the generated topic sets are normalized, and weight values of the corresponding models are respectively configured for the normalized topic sets.

In this embodiment, it is easy to understand that the normalization is performed on the text set and the entity set, that is, on the objects in the text set and the entity set, and that weight values of the corresponding models are then configured for the normalized topic sets. For example, the clustering model is given a weight coefficient A₁, the classification model a weight coefficient A₂, and the topic mining modeling a weight coefficient A₃, wherein the weight values A₁, A₂ and A₃ may be determined based on domain knowledge, model performance, cross-validation and the like: the weights may be assigned to each model manually according to expertise; the performance of each model may be determined by cross-validation and the weights assigned according to that performance; or the weights of each model may be learned by an optimization algorithm, such as gradient descent or a genetic algorithm, so as to maximize the performance of the overall result.
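As a hedged illustration of the performance-based option, the weights could be made proportional to each model's cross-validated score. The accuracy figures below are invented for the example, and proportional scaling is only one of several reasonable choices; the text does not prescribe it.

```python
def weights_from_scores(scores):
    """Scale per-model validation scores so that the resulting weights sum to 1."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical cross-validation accuracies for the three models.
cv_accuracy = {"clustering": 0.60, "classification": 0.90, "topic_mining": 0.50}
weights = weights_from_scores(cv_accuracy)
print(weights)
```

Here the classification model, which scored best in the made-up validation run, receives the largest weight.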

In another embodiment, the normalization processing may specifically proceed as follows: first, the target of normalization for the data set is determined; then one or more of Min-Max normalization, Z-score normalization (standardization) and decimal scaling are selected; the parameters required by the selected method are then calculated, for example the minimum and maximum values for Min-Max normalization, or the mean and standard deviation for Z-score normalization; the calculated parameters are then applied to each feature in the data set; and finally the normalization effect is verified, for example by visualizing or statistically analyzing the normalized data and checking statistical indicators such as the range, mean and variance, to confirm that the effect is in line with expectations.
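The three normalization methods named above can be sketched in plain Python as follows. The function names are illustrative, and the handling of constant inputs (returning zeros) is an assumption the text does not address.

```python
import statistics

def min_max(values):
    """Rescale to [0, 1] using the minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre on the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |v| below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

raw = [2.0, 4.0, 6.0]
print(min_max(raw))
print(z_score(raw))
print(decimal_scaling(raw))
```

Checking that the Z-scored values have mean 0 and standard deviation 1, or that the Min-Max values span 0 to 1, corresponds to the verification step described in the text.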

In step S4, according to the probability distribution and the weight value of the corresponding model, the weighted score of the prediction probability of each object in all models is calculated, and the topic class with the highest weighted score is selected as the final grouping result.

According to the probability distribution and the weight value of the corresponding model, the weighted score of the prediction probability of each object in all models can be calculated. It can be understood that the specific probability values and the specific weight values are obtained from the probability distribution and the normalized topic sets. Specifically, the category with the highest score is selected as the final voting result by

$$\hat{c} = \arg\max_{c} \sum_{i=1}^{n} A_i \, P_i(c)$$

where Σ represents the sum over the models, from i = 1 to n;

the model weight A_i is the weight of the i-th model;

and the model prediction score P_i(c) is the prediction score or probability of the i-th model for each category c.

The prediction results of each model are multiplied by their weights, the weighted results of all models are added, and finally the class with the highest weighted score is selected as the final prediction result by the argmax function.

The weighted score of the prediction probability of each object in all models is thus calculated according to the weight values and the specific probability values, and the topic category with the highest weighted score is selected as the final grouping result.
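The weighted voting described above can be sketched for a single object as follows. The probabilities and weights are made-up values, and `weighted_vote` is a hypothetical helper rather than a function defined by the invention.

```python
def weighted_vote(predictions, weights):
    """predictions: {model: {topic: probability}}; weights: {model: weight}.

    Returns the topic with the highest weighted score and all weighted scores.
    """
    scores = {}
    for model, dist in predictions.items():
        for topic, p in dist.items():
            # Multiply each model's probability by that model's weight and accumulate.
            scores[topic] = scores.get(topic, 0.0) + weights[model] * p
    best = max(scores, key=scores.get)  # argmax over topics
    return best, scores

# Hypothetical per-model probabilities for one object.
predictions = {
    "clustering":     {"product": 0.6, "market": 0.4},
    "classification": {"product": 0.3, "market": 0.7},
    "topic_mining":   {"product": 0.5, "market": 0.5},
}
weights = {"clustering": 0.3, "classification": 0.45, "topic_mining": 0.25}
best, scores = weighted_vote(predictions, weights)
print(best, scores)
```

In this example the clustering model alone would pick "product", but the higher-weighted classification model tips the weighted sum toward "market".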

In this embodiment, the outputs of the three models are preprocessed so that the output of each model is standardized or normalized and the outputs can be compared on the same scale; this can increase the convergence speed of the model and allows the results of the model to be better analyzed and explained.

The weighted score comprehensively considers the importance of the different indexes and yields a more comprehensive and objective evaluation result. It is calculated by multiplying the value of each component or index by its corresponding weight and summing the products to obtain the final weighted score. The prediction probability refers to the likelihood that a given word or phrase is assigned to a particular topic.

In step S5, information extraction and analysis are performed on the grouping result to generate a diagnosis result abstract.

Specifically, in this embodiment, information extraction is the process of automatically extracting structured information from unstructured or semi-structured text. The information extraction may be based on core text extraction, so that emotion analysis is performed on the core, key text, for example to identify emotion polarities such as positive, negative and neutral, so as to understand the emotional tendencies of employees, clients and the market, further diagnose how the enterprise is evaluated, and generate an enterprise evaluation abstract.

The information extraction may also be based on named entity recognition and extraction: for different types of data, named entity information such as person names, place names and company names in the text is identified, and time series analysis is further performed on the extracted content, so as to obtain how events and trends change over time and to analyze market fluctuations and the influence of events.
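The time series analysis is not specified further in the text. As one simple possibility, mentions of an extracted entity can be counted per month and a least-squares slope used as a trend indicator; the sketch below assumes ISO-formatted dates and invented event data, and it treats the observed months as equally spaced without filling in months that have no events.

```python
from collections import Counter

def monthly_counts(events):
    """events: list of (ISO date string, entity) pairs; returns a sorted (month, count) list."""
    counts = Counter(date[:7] for date, _ in events)
    return sorted(counts.items())

def trend_slope(series):
    """Least-squares slope of the counts against their position in time."""
    n = len(series)
    xs = range(n)
    ys = [count for _, count in series]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / denom

# Hypothetical extracted (time, event) pairs for one entity.
events = [
    ("2023-01-15", "product recall"),
    ("2023-02-03", "product recall"),
    ("2023-02-20", "product recall"),
    ("2023-03-01", "product recall"),
    ("2023-03-11", "product recall"),
    ("2023-03-28", "product recall"),
]
series = monthly_counts(events)
slope = trend_slope(series)
print(series, slope)
```

A positive slope indicates that mentions of the event are increasing over time, and a negative slope that they are fading.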

The information extraction can be based on rule information extraction, pattern matching information extraction or statistics information extraction, and the result of the corresponding strategy is further generated according to different information extraction strategies.

For example, when information of the enterprise evaluation category is analyzed and diagnosed, the enterprise evaluation information and the corresponding time are extracted to obtain a data set of enterprise evaluations and times; this data set is then analyzed to generate results grouped by enterprise evaluation and by time, yielding a database that maps customers' enterprise evaluations to times. From this database, the evaluations given by different users at different times can be obtained through computation, which ensures the comprehensiveness of information mining.

Similarly, when information of the enterprise event combing category is analyzed and diagnosed, the enterprise events and the corresponding times are extracted to obtain a data set of enterprise events and times; this data set is then analyzed to generate results grouped by enterprise event and by time, yielding a database that maps enterprise events to times, from which the development potential of the enterprise events can be predicted through calculation.

It can be understood that a large model is used to automatically generate the enterprise diagnosis result abstract from the analysis results, integrating the problems and solutions of the enterprise across different industry fields, different business developments and different industry analyses.

In summary, three models are adopted at the same time to group the preprocessed data, the weighted score is calculated according to the weight of each model, and the topic category with the highest score is selected as the final grouping result, which improves the accuracy of the grouping result. The grouping result is then handled by first extracting information and then performing focused analysis: a keyword extraction algorithm and a named entity information extraction algorithm are applied to the texts of different topic categories to extract information, and emotion analysis, time sequence analysis and comprehensive analysis are performed on the extraction results. Diagnosis results are finally generated for different objects of the enterprise, so that the data analysis has targeted objects and the diagnosis results are focused and more reliable.

By assigning a weight to each model, the importance of certain features in the topic may be more emphasized, thereby increasing the accuracy of the model.

Example two

In the embodiment of the present application, FIG. 2 is a flowchart of one embodiment, provided by the present invention, of performing information extraction and analysis on the grouping result to generate a diagnosis result abstract.

In step S511, keyword extraction is performed on the grouping result to generate a core text of the grouping result;

in step S512, emotion analysis is performed on the core text to identify emotion polarities of the core text;

in step S513, an enterprise evaluation diagnosis digest is generated based on the emotion analysis result.

It can be understood that, when extracting keywords from the grouping result, specifically, core texts of each category can be extracted by using a keyword extraction algorithm such as TF-IDF, etc., to generate core texts of the grouping result, and then emotion analysis is performed on the core texts to identify emotion polarities of the core texts, so as to know emotion tendencies of employees, clients and markets, and an enterprise evaluation diagnosis abstract is generated according to emotion analysis results, so that problems, opportunities and potential threats are revealed.
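A minimal TF-IDF sketch follows, treating each topic group as one document. The token lists are invented, and the plain `tf * log(N / df)` weighting used here is one common variant; libraries often apply smoothing, so their scores will differ.

```python
import math
from collections import Counter

def tf_idf_keywords(groups, top_k=2):
    """groups: {topic: list of tokens}. Returns the top_k tokens per topic by TF-IDF."""
    n_groups = len(groups)
    # Document frequency: in how many groups each token appears.
    doc_freq = Counter(tok for tokens in groups.values() for tok in set(tokens))
    keywords = {}
    for topic, tokens in groups.items():
        tf = Counter(tokens)
        scores = {
            tok: (count / len(tokens)) * math.log(n_groups / doc_freq[tok])
            for tok, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[topic] = ranked[:top_k]
    return keywords

# Hypothetical grouping result: tokens assigned to two topic categories.
groups = {
    "employee_feedback": ["pressure", "overtime", "pressure", "company"],
    "product_feedback":  ["battery", "screen", "battery", "company"],
}
keywords = tf_idf_keywords(groups)
print(keywords)
```

The word "company" appears in both groups, so its inverse document frequency is zero and it is not selected as a keyword for either topic.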

In this embodiment, keyword extraction may be performed on the grouping result by using a bag-of-words model (Bag-of-Words), word embedding (Word Embedding), or other models, so as to extract keywords such as entity information, positive information and negative information from the grouping result.

For example, when keywords are extracted from the original sentence "The working environment of XX enterprise is good and the relationships between staff are very harmonious, but the working time is long and the working pressure is large", punctuation and adjectives in the sentence are first deleted, and the keywords in the original sentence are then extracted. The extracted keywords are: XX enterprise, good working environment, good relationship, long working time and large working pressure. Emotion analysis is then carried out on this core text: the keywords are divided into positive emotion and negative emotion, with a positive emotion scoring 2 points and a negative emotion scoring -1 point, and the emotion polarity of the core text is identified. An enterprise evaluation diagnosis abstract is then generated according to the emotion analysis result, that is, the enterprise evaluation diagnosis abstract of the original sentence for "XX enterprise work".
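The scoring in this example can be sketched as follows. The point values (2 for positive, -1 for negative) come from the example; summing the points and reading the polarity from the sign of the total are assumptions, as are the function name `polarity_score` and the hand-assigned labels. The entity keyword "XX enterprise" carries no polarity and is left out of the scoring.

```python
from typing import List, Tuple

POSITIVE_POINTS = 2    # score of a positive-emotion keyword (from the example)
NEGATIVE_POINTS = -1   # score of a negative-emotion keyword (from the example)


def polarity_score(keywords: List[Tuple[str, str]]) -> Tuple[int, str]:
    """Sum the keyword points and derive an overall polarity (assumed rule)."""
    total = sum(
        POSITIVE_POINTS if label == "positive" else NEGATIVE_POINTS
        for _, label in keywords
    )
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    return total, polarity


# Keywords of the example sentence with hand-assigned emotion labels.
keywords = [
    ("good working environment", "positive"),
    ("good relationship", "positive"),
    ("long working time", "negative"),
    ("large working pressure", "negative"),
]
total, polarity = polarity_score(keywords)
print(total, polarity)  # 2 + 2 - 1 - 1 = 2, positive
```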

Extracting keywords from the grouping result to generate a core text, and only then performing emotion analysis on that core text, improves efficiency: the text is reduced to its most important part, which shortens the computation and processing time required by the emotion analysis model and reduces model complexity. The analysis is also more targeted, which improves the reliability of the data analysis result.

Example three

Keywords include entity information, positive information, or negative information.

Because the keywords include entity information, positive information and negative information, the emotion associated with each entity is easier to understand, the result can be calculated and analyzed more accurately and quickly, and it becomes clear which keywords influence the emotion score.

Example four

In the embodiment of the present application, fig. 3 is a flowchart of another embodiment, provided by the present invention, of extracting information from the grouping result and analyzing it to generate a diagnosis result abstract.

In step S521, named entity information extraction is performed on the grouping result to generate entity information of the grouping result;

in step S522, the entity information is subjected to time-series analysis to identify the trend of the event over time;

in step S523, an enterprise event combing abstract is generated according to the time-series analysis result.

By performing named entity information extraction on the grouping result, named entities such as person names, place names and company names can be identified in the text for different types of data, and the entity information of the grouping result is extracted, so that the people, places and partners involved can be known. Time-series analysis is then performed on the entity information to identify how events change over time, so that market fluctuations, event impacts and the like can be analyzed. Finally, an enterprise event combing abstract is generated according to the time-series analysis result, integrating the problems and solutions of the enterprise across different industrial fields, business developments and industry analyses.

For example, when event category analysis is performed on the original sentence "In nearly 10 days, the stock of company A with stock code X dropped three times and rose twice, with a total drop of twenty percent, a total rise of eighteen percent, and an overall drop of two percent", punctuation and adjectives in the sentence are first deleted, named entity information extraction is then performed on the grouping result to generate the entity information of the grouping result, and the keywords in the original sentence are extracted. The extracted keywords are: company A with stock code X, stock, nearly 10 days, twenty percent total drop, eighteen percent total rise, and two percent total drop.

The keywords are then divided by vocabulary type as the grouping basis, specifically: event name: company A with stock code X; event vocabulary: stock, twenty percent total drop, eighteen percent total rise, two percent total drop; time vocabulary: nearly 10 days. The keywords are then analyzed in time sequence to generate an event combing abstract for the stock of company A.
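The grouping and arithmetic of this example can be sketched as follows. The dictionary layout and the helper names `net_change` and `trend_label` are hypothetical; the percentages are treated as additive percentage points, as the example's own figures are (-20 + 18 = -2), and no compounding is modelled.

```python
from typing import Dict

# Entity information of the example, grouped by vocabulary type.
event = {
    "event_name": "company A with stock code X",
    "time": "nearly 10 days",
    # Signed percentage points taken from the event vocabulary.
    "changes_percent": {"total drop": -20, "total rise": 18},
}


def net_change(changes: Dict[str, int]) -> int:
    """Add up the signed percentage-point changes over the period."""
    return sum(changes.values())


def trend_label(net: int) -> str:
    """Label the overall trend from the sign of the net change."""
    if net > 0:
        return "upward"
    if net < 0:
        return "downward"
    return "flat"


net = net_change(event["changes_percent"])
summary = (f"{event['event_name']}: {trend_label(net)} trend over "
           f"{event['time']} (net change {net} percent)")
print(summary)
```

The computed net change of -2 percent agrees with the "two percent total drop" keyword extracted from the sentence.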

It is readily understood that positive potential is the potential for an event to have a positive progression, and negative potential is the potential for an event to have a negative progression.

By identifying named entities and then analyzing the time sequence, accurate entity names and information used in the time analysis process can be ensured, and data analysis deviation caused by named entity errors can be avoided.

Example five

The entity information is one of a name, a place, a time, or an event or any combination thereof.

It is readily understood that in event categories, the keywords of entity information may include a single vocabulary of names, places, times, etc., or any combination of vocabularies.

Named entity extraction helps integrate information scattered across different data sources into one complete data set, which makes time-series analysis easier.

Example six

In the embodiment of the present application, fig. 4 is a flowchart of still another embodiment, provided by the present invention, of extracting information from the grouping result and analyzing it to generate a diagnosis result abstract.

In step S531, keyword extraction is performed on the grouping result to generate a core text of the grouping result, and named entity information extraction is performed on the grouping result to generate entity information of the grouping result;

in step S532, emotion analysis is performed on the core text, and time-series analysis is performed on the entity information;

in step S533, an enterprise comprehensive diagnosis result abstract is generated according to the emotion analysis result and the time-series analysis result.

In this embodiment, keyword extraction is performed on the grouping result to generate the core text of the grouping result, and named entity information extraction is performed on the grouping result to generate the entity information of the grouping result. A grouping basis related to the core text and to named entity recognition is first generated, and the potential association between the core text and the recognized named entities is established. A core text extraction strategy for emotion analysis and a named entity recognition extraction strategy for the event category are then configured respectively, and a public opinion diagnosis abstract and an enterprise entity/event combing abstract are generated through emotion analysis and time-series analysis respectively.

For example, when emotion category and event category analysis is performed on the original sentence "I attended the XX mobile phone release meeting today and was very happy; the XX mobile phone chip runs very fast, and the endurance and touch feeling are also very good, but the delay of the XX mobile phone release meeting caused the reputation of company B to decline", punctuation and adjectives in the sentence are first deleted, and the keywords in the original sentence are then extracted. The extracted keywords are: mobile phone release meeting, fast chip operation, good endurance and touch feeling, and high price. The keywords are then divided by vocabulary type as the grouping basis, specifically:

entity vocabulary: mobile phone; positive vocabulary: fast chip operation, good endurance and touch feeling; negative vocabulary: high price;

name vocabulary: company B; negative vocabulary: reputation declines;

The keywords are then divided into positive emotion and negative emotion and, according to the emotion data set extraction strategy, a positive emotion scores 2 points and a negative emotion scores -1 point. At the same time, the keywords are classified into approval and disapproval and, according to the event data set extraction strategy, a positive event scores 2 points and a negative event scores -1 point. The extraction results are calculated and further analyzed, so as to generate the evaluation diagnosis abstract and the entity abstract of the original sentence for the XX mobile phone release meeting and company B.

By dividing the data set into an emotion category, an event category, and a combined emotion-and-event category, documents can be automatically mapped to a topic space in which each topic represents a latent theme or concept, so that potential associations between documents can be found. Analysis on this basis makes the resulting enterprise comprehensive diagnosis result abstract more accurate.

Example seven

In the embodiment of the application, modeling is to perform business topic modeling by adopting an LDA model.

In this embodiment, the LDA (Latent Dirichlet Allocation) model is preferably used for business topic modeling. First, each object in the source data is divided among a plurality of topics, the proportion of each topic in a document is determined, the keywords of each topic are split out, and the distribution of each word within the topic is recorded. A topic is then randomly assigned to each word in each document. Next, for each word in each document, the probability that the word belongs to each topic is calculated from the current topic assignments and the topic assignments of the other words, and the topic of each word is reassigned according to the calculated probabilities. These steps are repeated until a specified number of iterations or a convergence condition is reached.

By adding LDA modeling on top of the clustering model, the classification model and the topic mining modeling, the word distribution of each topic and the topic distribution of each document can be learned from the text data through iterative inference and topic reassignment.
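The iterative procedure described above (random initial topic per word, per-word topic probabilities, reassignment, repetition) corresponds closely to collapsed Gibbs sampling for LDA, which can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the integer word ids and the toy documents are assumptions, and only the fixed-iteration stopping rule is shown, not a convergence test.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              n_iter: int = 200, alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA over documents of integer word ids.

    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # words of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                               # words assigned to topic k
    assign: List[List[int]] = []

    # Randomly assign a topic to every word of every document.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this word from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Probability of each topic given all other assignments.
                probs = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Reassign the topic according to those probabilities.
                k = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
             for d, doc in enumerate(docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return theta, phi


# Toy corpus: word ids 0-2 and 3-4 tend to occur in different documents.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 4], [0, 1, 1, 2, 0], [3, 3, 4, 4, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=5)
```

Each row of `theta` is the topic distribution of one document and each row of `phi` is the word distribution of one topic; the rows of `theta` could serve as the probability distribution contributed by the topic mining model to the weighted score.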

Example eight

Referring to fig. 5, a schematic structural diagram of an enterprise intelligent diagnosis information generating apparatus 100 according to an embodiment of the present invention includes:

a data preprocessing unit 110, configured to obtain source data, preprocess the source data, and generate preprocessed data, where the preprocessed data includes a plurality of objects, and the objects are words and/or phrases;

the multi-model grouping unit 120 is configured to group the objects in the preprocessed data by adopting a clustering model, a classification model and a topic mining modeling manner respectively, so as to generate corresponding topic sets and probability distributions, where each topic set contains one or more topics, each topic contains one or more objects, and the probability distribution gives, for each model, the predicted probability that an object belongs to each of the different topics;

the normalization processing unit 130 is configured to normalize the corresponding topic sets, and configure weight values of the corresponding models for the normalized topic sets respectively;

The grouping result calculating unit 140 is configured to calculate weighted scores of prediction probabilities of the objects in all models according to the probability distribution and the weight values of the corresponding models, and select a topic class with the highest weighted score as a final grouping result;

and a summary output unit 150, configured to extract and analyze the information of the grouping result, and generate a summary of the diagnosis result.

The beneficial effects of the enterprise intelligent diagnosis information generating apparatus 100 in the embodiment of the present invention are identical to the beneficial effects of the method for generating enterprise intelligent diagnosis information described above, and will not be described herein.

Example nine

Referring to fig. 6, a schematic structural diagram of an enterprise intelligent diagnosis information generating apparatus 200 according to an embodiment of the present invention includes: a memory 210 and a processor 220;

a memory 210 for storing a program;

a processor 220 for executing the program to implement the steps of the enterprise intelligent diagnosis information generation method described above.

As will be appreciated by those skilled in the art, the memory 210 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.

In some embodiments, the memory 210 may be an internal storage unit of the enterprise intelligent diagnosis information generating apparatus 200, such as a hard disk or internal memory of the apparatus.

In other embodiments, the memory 210 may also be an external storage device of the enterprise intelligent diagnosis information generating apparatus 200, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (Flash Card) or the like provided on the apparatus.

Of course, the memory 210 may also include both an internal storage unit of the enterprise intelligent diagnosis information generating apparatus 200 and an external storage device thereof.

In this embodiment, the memory 210 is generally used to store an operating system and various types of application software installed in the enterprise intelligent diagnosis information generating apparatus.

In addition, the memory 210 may also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microprocessor, or another data processing chip. The processor 220 is generally used to control the overall operation of the enterprise intelligent diagnosis information generating apparatus 200.

In this embodiment, the processor 220 is configured to execute the computer-readable instructions stored in the memory 210 or to process data, for example to execute the computer-readable instructions of the enterprise intelligent diagnosis information generation method.

Example ten

A computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the steps of the method for generating intelligent diagnostic information for an enterprise as described above.

The beneficial effects of the storage medium of the invention are the same as those of the enterprise intelligent diagnosis information generation method described above, and are not described again here.

The invention is operational with numerous general purpose or special purpose computer system environments or configurations.

For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.

Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by computer readable instructions stored in a computer readable storage medium that, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk or an optical disk, or a random access memory (Random Access Memory, RAM).

In summary, the embodiment of the invention automatically diagnoses an enterprise by text mining over massive text data, without manual review of research material, and assists senior enterprise management, financial institutions or management training companies in evaluating and diagnosing enterprise strategy and risk. Because texts are classified before they are processed, and each category of text is analyzed independently to obtain its characteristics, problems, opportunities and potential threats can be identified more comprehensively, providing deep business insight.

Because information is extracted before it is analyzed, the emotion and time-series analysis results each have a specific corresponding object, and topic modeling can mine implicit topics. This greatly improves the efficiency and accuracy of enterprise problem discovery and trend prediction, and supports the accuracy of the diagnosis results and the reliability of the data analysis.

It is understood that, under the teachings of the above embodiments, those skilled in the art can combine the technical features of the above embodiments to obtain further technical solutions.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. The intelligent diagnosis information generation method for the enterprise is characterized by comprising the following steps of:

acquiring source data, wherein the source data is text data source data of an enterprise, and comprises text data source data inside the enterprise and text data source data outside the enterprise, preprocessing the source data to generate preprocessed data, wherein the preprocessed data comprises a plurality of objects, and the objects are words and/or phrases;

Grouping objects in the preprocessing data by adopting a clustering model, a classification model and a topic mining modeling mode respectively to generate a corresponding topic set and probability distribution, wherein the topic set comprises different business topics, each topic set comprises one or more topics, each topic comprises one or more objects, and the probability distribution predicts the probability that each model predicts that the object is respectively affiliated to different topics;

according to the probability distribution and the weight value of the corresponding model, calculating the weighted score of the prediction probability of each object in all models, and selecting the subject category with the highest weighted score as a final grouping result, wherein the method specifically comprises the following steps:

selecting the category with the highest score as the final grouping result by argmax(Σ(model weight × model prediction score));

wherein Σ represents a summation, from i=1 to n, the model weight is the weight of the ith model, the model prediction score is the prediction score or probability of the ith model for each class, each of the model prediction scores is multiplied by the model weight, the weighted results of all models are added, and the class with the highest weighted score is selected as the final grouping result by the argmax function;

Extracting and analyzing information of the grouping result to generate a diagnosis result abstract, wherein the method comprises the following steps of:

generating an enterprise evaluation diagnosis abstract according to the emotion analysis result;

or generating an enterprise event combing abstract according to the time sequence analysis result;

or generating an enterprise comprehensive diagnosis result abstract according to the emotion analysis result and the time sequence analysis result.

2. The method for generating intelligent diagnostic information of enterprise according to claim 1, wherein the step of extracting and analyzing information according to the grouping result to generate the diagnostic result abstract comprises the following steps:

and carrying out emotion analysis on the core text to identify emotion polarity of the core text.

3. The method for generating intelligent diagnostic information for an enterprise according to claim 2, wherein the keyword comprises entity information, positive information or negative information.

4. The method for generating intelligent diagnostic information of enterprise according to claim 1, wherein the step of extracting and analyzing information according to the grouping result to generate the diagnostic result abstract comprises the following steps:

and carrying out time sequence analysis on the entity information to identify the change trend of the event along with time.

5. The method for generating intelligent diagnostic information for an enterprise of claim 4, wherein the entity information is one of a name, a place, a time, or an event or any combination thereof.

6. The method for generating intelligent diagnostic information for an enterprise of claim 1, wherein the modeling is modeling of business topics using an LDA model.

7. An enterprise intelligent diagnosis information generating apparatus, characterized by comprising:

the data preprocessing unit is used for acquiring source data, wherein the source data is text data source data of an enterprise, the source data comprises text data source data inside the enterprise and text data source data outside the enterprise, preprocessing is carried out on the source data, preprocessed data is generated, the preprocessed data comprises a plurality of objects, and the objects are words and/or phrases;

the multi-model grouping unit is used for respectively grouping objects in the preprocessing data by adopting a clustering model, a classifying model and a topic mining modeling mode to generate a corresponding topic set and probability distribution, wherein the topic set comprises different business topics, each topic set comprises one or more topics, each topic comprises one or more objects, and the probability distribution predicts the probability that each model predicts that the object is respectively affiliated to different topics;

the grouping result calculation unit is configured to calculate a weighted score of the prediction probability of each object in all models according to the probability distribution and the weight value of the corresponding model, and select a topic class with the highest weighted score as a final grouping result, where the weighted score is specifically: the method comprises the following steps:

the summary output unit is used for extracting and analyzing the information of the grouping result and generating a diagnosis result summary, and comprises the following steps: extracting keywords from the grouping result to generate a core text of the grouping result, and extracting named entity information from the grouping result to generate entity information of the grouping result;

8. An enterprise intelligent diagnosis information generation apparatus, characterized by comprising: a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program to implement the respective steps of the enterprise intelligent diagnosis information generation method according to any one of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the respective steps of the enterprise intelligent diagnosis information generation method according to any one of claims 1 to 6.