CN112434889A - Expert industry analysis method, device, equipment and storage medium - Google Patents

Publication number
CN112434889A
Authority
CN
China
Prior art keywords
industry
information
expert
expert information
emerging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011504242.6A
Other languages
Chinese (zh)
Inventor
杨婉琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Shenzhen Saiante Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Saiante Technology Service Co Ltd
Priority to CN202011504242.6A
Publication of CN112434889A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention relates to artificial intelligence and provides an expert industry analysis method, device, equipment and storage medium. The method comprises the following steps: acquiring expert information; performing industry category prediction on the expert information through a pre-constructed multi-label multi-class classification prediction model; acquiring text information in the expert information and performing named entity recognition through a BiLSTM-CRF model; preprocessing the expert information and then performing emerging industry judgment through a pre-constructed word vector model; and generating the expert industry analysis corresponding to the expert information from the industry category prediction, the named entity recognition and the emerging industry judgment. Labor and time costs are reduced, the classification accuracy of industry categories is improved through intelligent recognition, losses caused by manual misjudgment and omission are reduced, and the historical industry categories, all industry categories and emerging industries of an expert can be analyzed and predicted. The invention further relates to blockchain technology: the expert information can be stored in a blockchain.

Description

Expert industry analysis method, device, equipment and storage medium
Technical Field
The invention relates to artificial intelligence, in particular to an expert industry analysis method, device, equipment and storage medium.
Background
Industry classification refers to the detailed division of organizational structures, consisting of business units or individuals engaged in production of the same nature within the national economy or another economic society, for example forestry, the automobile industry and banking. Industry classification can explain the development stage of an industry and its position in the national economy. On that basis, the factors influencing the development of the industry and the strength of their influence can be analyzed, the future development trend of the industry can be predicted and guided, and its investment value can be judged, which reveals where the industry is heading and provides a basis for the investment decisions of various organizations.
At present, industry classification is based on the national standard for the classification of national economic industries, GB/T 4754-2011, which specifies the classification and codes of socio-economic activities and divides them into four levels: section, major class, middle class and minor class. The finest level (minor class) contains 1094 classes. Expert industry classification means that a designated person judges the industry category corresponding to an expert (for example, engineering cost or equipment power supply) from the expert's résumé, scientific research projects, published papers, academic awards and similar information. Because this judgment is subjective, the classification or analysis performed by the designated person may be wrong or inaccurate.
Manual industry classification of experts is highly subjective, and classifiers differ in how well they understand the national economic industry classification standard, so misclassification occurs easily. Manual classification is also inefficient and requires substantial manpower, material and financial resources, which greatly hinders practical application. Moreover, the same expert may correspond to different industry categories or belong to several categories at once, so manual judgment is prone to errors and omissions and its accuracy is low. Finally, it is difficult to judge and analyze the emerging industries to which an expert belongs, and the emerging industries in which the expert may become involved in the future cannot be predicted.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide an expert industry analysis method, device, equipment and storage medium that overcome the above problems or at least partially solve them.
In order to solve the above problems, an embodiment of the present invention discloses an expert industry analysis method, including:
acquiring expert information, wherein the expert information at least comprises personal information, scientific research project information, paper information and academic award information;
performing industry category prediction on the expert information through a pre-constructed multi-label multi-class classification prediction model;
acquiring text information in the expert information, and performing named entity recognition through a BiLSTM-CRF model;
preprocessing the expert information and then performing emerging industry judgment through a pre-constructed word vector model;
and generating the expert industry analysis corresponding to the expert information through the industry category prediction, the named entity recognition and the emerging industry judgment.
Further, performing industry category prediction on the expert information through the pre-constructed multi-label multi-class classification prediction model comprises the following steps:
acquiring N historical industry categories, where N is greater than or equal to 1, wherein the N historical industry categories comprise collected historical data samples, and the historical data samples comprise positive samples, which are historical data of expert information belonging to the corresponding industry category, and negative samples, which are historical data of expert information not belonging to the corresponding industry category;
performing classification processing according to the N historical industry categories to obtain the multi-label multi-class classification prediction model;
and performing industry category prediction on the expert information through the multi-label multi-class classification prediction model, and obtaining the historical industry category of the expert information according to the industry category prediction.
Further, acquiring the text information in the expert information and performing named entity recognition through the BiLSTM-CRF model comprises the following steps:
acquiring text information in the expert information, wherein the text information comprises the personal information text, the scientific research project information text, the paper information text and the academic award information text in the expert information;
and performing named entity recognition on the text information through the BiLSTM-CRF model.
Further, performing named entity recognition on the text information through the BiLSTM-CRF model comprises:
obtaining context information features in the text information and calculating a probability distribution from them;
generating a classification result according to the probability distribution and outputting the classification result;
and applying validity constraint processing to the output classification result.
Further, preprocessing the expert information and then performing emerging industry judgment through the pre-constructed word vector model comprises the following steps:
performing word segmentation on the expert information and then forming a word vector model through deep learning processing;
and initializing the word vector model and outputting the emerging industry category probability through convolution processing.
Further, initializing the word vector model and outputting the emerging industry category probability through convolution processing comprises:
initializing the word vector model and inputting the word vector model into a sparse matrix;
performing convolution calculation in the sparse matrix to output a convolution kernel, wherein the convolution calculation adopts cosine similarity calculation;
performing pooling processing on the convolution kernel to obtain convolution features of the same dimensionality;
and passing the convolution features through a fully connected layer to output the emerging industry category probability.
Further, generating the expert industry analysis corresponding to the expert information through the industry category prediction, the named entity recognition and the emerging industry judgment comprises:
obtaining the historical industry categories corresponding to the expert information through the industry category prediction;
obtaining all industry categories corresponding to the expert information through the named entity recognition;
obtaining the emerging industry categories corresponding to the expert information through the emerging industry judgment;
and integrating the historical industry categories, all the industry categories and the emerging industry categories to form the industry analysis corresponding to the expert information.
The embodiment of the invention also discloses an expert industry analysis device, which comprises:
an acquisition module, used for acquiring expert information, wherein the expert information at least comprises personal information, scientific research project information, paper information and academic award information;
a prediction module, used for performing industry category prediction on the expert information through a pre-constructed multi-label multi-class classification prediction model;
a recognition module, used for acquiring text information in the expert information and performing named entity recognition through a BiLSTM-CRF model;
a judgment module, used for preprocessing the expert information and then performing emerging industry judgment through a pre-constructed word vector model;
and a generation module, used for generating the expert industry analysis corresponding to the expert information through the industry category prediction, the named entity recognition and the emerging industry judgment.
The embodiment of the present invention also discloses a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method described in any one of the above embodiments.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the program is executed by a processor, the steps of the method are realized.
The embodiment of the invention has the following advantages: expert information is acquired, wherein the expert information at least comprises personal information, scientific research project information, paper information and academic award information; industry category prediction is performed on the expert information through a pre-constructed multi-label multi-class classification prediction model; text information in the expert information is acquired and named entity recognition is performed through a BiLSTM-CRF model; the expert information is preprocessed and emerging industry judgment is then performed through a pre-constructed word vector model; and the expert industry analysis corresponding to the expert information is generated from the industry category prediction, the named entity recognition and the emerging industry judgment. Labor and time costs are reduced, the classification accuracy of industry categories is improved through intelligent recognition, losses caused by manual misjudgment and omission are reduced, emerging industries can be identified, analyzed and predicted for an expert, and the historical industry categories, all industry categories and emerging industries corresponding to the expert give business personnel information of reference value.
Drawings
FIG. 1 is a flow chart of the steps of a first embodiment of an expert industry analysis method of the present invention;
FIG. 2 is a flow chart of the steps of a second embodiment of an expert industry analysis method of the present invention;
FIG. 3 is a flow chart of the steps of a third embodiment of an expert industry analysis method of the present invention;
FIG. 4 is a flow chart of the steps of a fourth embodiment of an expert industry analysis method of the present invention;
FIG. 5 is a flow chart of the steps of a fifth embodiment of an expert industry analysis method of the present invention;
FIG. 6 is a flow chart of the steps of a sixth embodiment of an expert industry analysis method of the present invention;
FIG. 7 is a flow chart of the steps of a seventh embodiment of an expert industry analysis method of the present invention;
FIG. 8 is a block diagram of an embodiment of an expert industry analysis device of the present invention;
FIG. 9 illustrates a computer device for an expert industry analysis method of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
One of the core ideas of the embodiment of the invention is as follows: expert information is acquired, wherein the expert information at least comprises personal information, scientific research project information, paper information and academic award information; industry category prediction is performed on the expert information through a pre-constructed multi-label multi-class classification prediction model; text information in the expert information is acquired and named entity recognition is performed through a BiLSTM-CRF model; the expert information is preprocessed and emerging industry judgment is then performed through a pre-constructed word vector model; and the expert industry analysis corresponding to the expert information is generated from the industry category prediction, the named entity recognition and the emerging industry judgment. Labor and time costs are reduced, the classification accuracy of industry categories is improved through intelligent recognition, losses caused by manual misjudgment and omission are reduced, emerging industries can be identified, analyzed and predicted for an expert, and the historical industry categories, all industry categories and emerging industries corresponding to the expert give business personnel information of reference value.
Referring to fig. 1, a flowchart illustrating steps of a first embodiment of an expert industry analysis method according to the present invention is shown, which may specifically include the following steps:
step S10, acquiring expert information, wherein the expert information at least comprises personal information, scientific research project information, paper information and academic award information;
it should be emphasized that, in order to further ensure the privacy and security of the expert information, the expert information may also be stored in a node of a blockchain;
step S20, performing industry category prediction on the expert information through a pre-constructed multi-label multi-class classification prediction model;
step S30, acquiring text information in the expert information, and performing named entity recognition through a BiLSTM-CRF model;
step S40, preprocessing the expert information and then performing emerging industry judgment through a pre-constructed word vector model;
and step S50, generating the expert industry analysis corresponding to the expert information through the industry category prediction, the named entity recognition and the emerging industry judgment.
In the embodiment of the invention, expert information is acquired, wherein the expert information comprises personal information, scientific research project information, paper information and academic award information. The industry category of the expert information is predicted by text classification. Text classification means that a computer automatically classifies and labels a text set (or other entities or objects) according to a certain classification system or standard; in this method, the expert information is classified and labeled automatically in this way. From a labeled training document set, a relation model between document features and document categories is learned, and the learned relation model is then used to judge the category of a new document. Common text classification methods include decision trees, Rocchio, naive Bayes, neural networks, support vector machines, linear least squares fitting, kNN, genetic algorithms, maximum entropy, Generalized Instance Set, multi-label classification algorithms and the like.
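As an illustration of learning a relation model from labeled documents and then judging a new document, the following is a minimal sketch using naive Bayes, one of the methods listed above. It is not the model of the embodiment; the function names, the toy training documents and the Laplace smoothing are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Learn per-class document and word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the label with the highest log-posterior for a new document."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(model["word_counts"][label].values())
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][tok]
            # Laplace (add-one) smoothing, an assumption of this sketch
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training set; categories borrowed from the Background.
TRAINING = [
    (["power", "grid", "transformer", "equipment"], "equipment power supply"),
    (["substation", "power", "supply", "equipment"], "equipment power supply"),
    (["project", "cost", "budget", "estimate"], "engineering cost"),
    (["construction", "cost", "quota", "budget"], "engineering cost"),
]
MODEL = train_naive_bayes(TRAINING)
print(predict(MODEL, ["power", "transformer"]))
print(predict(MODEL, ["budget", "cost"]))
```

A single-label classifier like this one assigns exactly one category per document, which is why the embodiment described later stacks several classifiers to allow multiple labels per expert.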
Further, all industry categories are obtained by performing named entity recognition on the expert information. Named Entity Recognition (NER), also called "proper name recognition", identifies entities with specific meaning in a text, mainly names of persons, places and organizations, proper nouns and the like. The task of named entity recognition is to identify named entities of three major classes (entity, time and number) and seven minor classes (person name, organization name, place name, time, date, currency and percentage) in the text to be processed. It generally comprises two parts: (1) identifying entity boundaries; (2) determining entity categories (person name, place name, organization name, or others). Commonly used models for named entity recognition include the LSTM + CRF model and the ID-CNN + CRF model. The embodiment of the invention adopts a BiLSTM-CRF model, that is, a bidirectional LSTM combined with a CRF, and the industry categories suitable for the expert are obtained by analysis and calculation that combines the expert information with all industry categories.
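The two parts of the task, boundary identification and category determination, can be illustrated by decoding a tagged token sequence into entities. The sketch below assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity), which the source does not specify; the tokens, tags and the `extract_entities` name are hypothetical.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag marks a left boundary: close any open entity first.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # O tag, or an I- tag that does not continue the open entity:
            # close the open entity and drop the stray token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical tagged sentence.
TOKENS = ["Zhang", "San", "works", "at", "Example", "Institute", "in", "Shenzhen"]
TAGS = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
ENTITIES = extract_entities(TOKENS, TAGS)
print(ENTITIES)
```

The tag prefix carries the boundary information and the tag suffix carries the category, so one tag per token is enough to express both parts of the task.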
Further, the expert information is preprocessed, where the preprocessing is deep learning processing. Deep learning is a general term for a class of pattern analysis methods and mainly involves three kinds of methods: (1) neural network systems based on convolution operations, namely convolutional neural networks (CNNs); (2) self-encoding neural networks based on multiple layers of neurons, including autoencoders and sparse coding, which has received much attention in recent years; (3) deep belief networks (DBNs), which are pre-trained as multilayer self-encoding neural networks and whose weights are then further optimized using discriminative information. Through multilayer processing, the initial low-level feature representation is gradually converted into a high-level feature representation, after which complex learning tasks such as classification can be completed with a simple model. After the deep learning processing, the emerging industries to which the expert belongs are identified and judged through Word2vec, which is the pre-constructed word vector model.
By performing industry category prediction on the expert information, the method obtains the historical industry categories to which the expert belongs; by named entity recognition, it obtains all industry categories; and by emerging industry judgment, it obtains the emerging industries. Labor and time costs are reduced, the classification accuracy of industry categories is improved through intelligent recognition, losses caused by manual misjudgment and omission are reduced, emerging industries can be identified, analyzed and predicted for an expert, and integrating the historical industry categories, all industry categories and emerging industries corresponding to the expert gives business personnel information of reference value about the expert.
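The way steps S10 to S50 fit together can be sketched as a small orchestration function. This is only an outline under stated assumptions: `ExpertInfo`, `generate_analysis` and `keyword_matcher` are hypothetical names, and the three models are replaced by trivial keyword lookups so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExpertInfo:
    """Expert information acquired in step S10 (field names are assumptions)."""
    personal: str
    projects: str
    papers: str
    awards: str

    def text(self) -> str:
        return " ".join([self.personal, self.projects, self.papers, self.awards])


Model = Callable[[str], List[str]]


def generate_analysis(info: ExpertInfo, predict_categories: Model,
                      recognize_industries: Model,
                      judge_emerging: Model) -> Dict[str, List[str]]:
    """Step S50: integrate the outputs of steps S20, S30 and S40."""
    text = info.text()
    return {
        "historical_industry_categories": predict_categories(text),  # S20
        "all_industry_categories": recognize_industries(text),       # S30
        "emerging_industries": judge_emerging(text),                  # S40
    }


def keyword_matcher(keyword_to_label: Dict[str, str]) -> Model:
    """Stand-in for a trained model: label a text by keyword lookup."""
    def match(text: str) -> List[str]:
        lowered = text.lower()
        return sorted({label for kw, label in keyword_to_label.items() if kw in lowered})
    return match


INFO = ExpertInfo(
    personal="senior engineer",
    projects="smart grid transformer monitoring",
    papers="cost estimation for substation construction",
    awards="provincial science and technology award",
)
ANALYSIS = generate_analysis(
    INFO,
    predict_categories=keyword_matcher({"transformer": "equipment power supply"}),
    recognize_industries=keyword_matcher({"transformer": "equipment power supply",
                                          "cost": "engineering cost"}),
    judge_emerging=keyword_matcher({"smart grid": "smart grid"}),
)
print(ANALYSIS)
```

Passing the three models in as callables keeps the integration step independent of how each model is built, which mirrors the separation between steps S20, S30, S40 and S50.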
Referring to fig. 2, a flowchart illustrating steps of a second embodiment of the expert industry analysis method according to the present invention is shown, where performing industry category prediction on the expert information through the pre-constructed multi-label multi-class classification prediction model specifically includes the following steps:
step S201, acquiring N historical industry categories, where N is greater than or equal to 1, wherein the N historical industry categories comprise collected historical data samples, and the historical data samples comprise positive samples, which are historical data of expert information belonging to the corresponding industry category, and negative samples, which are historical data of expert information not belonging to the corresponding industry category;
step S202, performing classification processing according to the N historical industry categories to obtain the multi-label multi-class classification prediction model;
and step S203, performing industry category prediction on the expert information through the multi-label multi-class classification prediction model, and obtaining the historical industry category of the expert information according to the industry category prediction.
In the embodiment of the invention, N historical industry categories are acquired, where N is greater than or equal to 1. Historical data are collected for the N historical industry categories and used as samples, which are divided into positive samples and negative samples: the positive samples are historical data of expert information belonging to the corresponding industry category, and the negative samples are historical data of expert information not belonging to it. Once the N historical industry categories have historical data samples, classification processing is performed. In the classification processing, binary classification is performed for each of the N historical industry categories; binary classification, also called dichotomous classification, sorts items into two classes according to two labels. The binary classification yields a binary classification model, from which N binary classification results are obtained. Multi-class classification processing is then performed on the N binary classification results, that is, the N binary classification results are added to the training of a multi-class classifier so that the classifiers are stacked to obtain the multi-label multi-class classification prediction model. Industry category prediction is then performed on the expert information to be predicted through the multi-label multi-class classification prediction model, and the historical industry category of the expert information is obtained from the prediction. In this way, manual analysis of the historical industry categories of expert information is avoided, labor cost is reduced, the accuracy of the historical industry category analysis is improved, and the analysis is obtained through intelligent recognition of the expert information.
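One common reading of this construction is the "binary relevance" scheme: one binary classifier is trained per category from its positive and negative samples, and the N binary decisions together form the multi-label prediction. The sketch below follows that reading as an assumption; the word-count scorer stands in for whatever binary classifier the embodiment uses, and all names and sample data are hypothetical.

```python
from collections import Counter


def train_binary(positives, negatives):
    """Toy binary classifier: word weight = positive count minus negative count."""
    weights = Counter()
    for tokens in positives:
        weights.update(tokens)
    for tokens in negatives:
        weights.subtract(tokens)
    return weights


def predict_binary(weights, tokens):
    """Positive if the summed word weights are greater than zero."""
    return sum(weights[t] for t in tokens) > 0


def train_multilabel(samples_by_category):
    """Train one binary classifier per historical industry category."""
    return {category: train_binary(pos, neg)
            for category, (pos, neg) in samples_by_category.items()}


def predict_multilabel(model, tokens):
    """Return every category whose binary classifier fires."""
    return sorted(c for c, w in model.items() if predict_binary(w, tokens))


# Hypothetical positive and negative samples for N = 2 categories.
FORESTRY = [["forest", "timber"], ["timber", "planting"]]
SAMPLES = {
    "engineering cost": ([["cost", "budget"], ["construction", "cost"]], FORESTRY),
    "equipment power supply": ([["power", "grid"], ["transformer", "power"]], FORESTRY),
}
MODEL = train_multilabel(SAMPLES)
print(predict_multilabel(MODEL, ["power", "grid", "cost"]))
print(predict_multilabel(MODEL, ["cost", "budget"]))
print(predict_multilabel(MODEL, ["timber", "forest"]))
```

Because each category is decided independently, one expert can receive several categories or none, which addresses the case raised in the Background where the same expert belongs to several industry categories.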
Referring to fig. 3, a flowchart illustrating steps of a third embodiment of the expert industry analysis method according to the present invention is shown, where the step of obtaining text information in the expert information and performing named entity recognition through a BiLSTM-CRF model specifically includes the following steps:
step S301, acquiring text information in the expert information, wherein the text information comprises the personal information text, the scientific research project information text, the paper information text and the academic award information text in the expert information;
and step S302, performing named entity recognition on the text information through the BiLSTM-CRF model.
In the embodiment of the invention, named entity recognition is performed on the text information through the BiLSTM-CRF model, which is divided into a BiLSTM part and a CRF part. In the BiLSTM part, context information features are acquired from the text information of the expert information, a probability distribution is calculated from these features, and a classification result is output. BiLSTM is the abbreviation of Bi-directional Long Short-Term Memory, a bidirectional long short-term memory network formed by combining a forward LSTM and a backward LSTM. The CRF part takes the classification result output by the BiLSTM part and constrains it through the CRF model to ensure that the output classification result is legal, that is, a valid label sequence. The CRF model is a conditional random field, which is often used for sequence labeling tasks such as part-of-speech tagging, word segmentation and named entity recognition.
Referring to fig. 4, a flowchart illustrating steps of a fourth embodiment of the expert industry analysis method according to the present invention is shown, where performing named entity recognition on the text information through the BiLSTM-CRF model specifically includes the following steps:
step S3021, obtaining context information features in the text information and calculating a probability distribution from them;
step S3022, generating a classification result according to the probability distribution and outputting the classification result;
and step S3023, applying validity constraint processing to the output classification result.
In the embodiment of the invention, text information is obtained from the expert information; the text information may be a word, a sentence or a paragraph. Information features are obtained from the context of the word or of the whole sentence, and from these features the potential distribution (probability distribution) of each word over the possible labels is obtained; that is, the BiLSTM calculates and outputs this distribution, and the classification result is obtained from the output. The output classification result is then constrained by the CRF model so that the industry category finally obtained is legal: the CRF model applies reasonable constraints to the industry categories output by the BiLSTM and takes the consistency among the predicted labels into account. The BiLSTM recognizes quickly and the CRF improves the reasonableness of the recognition, which ensures that all industry categories are obtained accurately from the expert information and reduces losses caused by manual misjudgment and omission.
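The effect of the constraint step can be illustrated with Viterbi decoding over per-word label scores. This is a simplified sketch: a trained CRF learns transition scores, whereas here the transitions are hard rules under an assumed BIO scheme (an I- label may only follow a B- or I- label of the same type). The label set, the scores and the function names are hypothetical.

```python
import math

TAGS = ["O", "B-IND", "I-IND"]  # hypothetical label set for industry entities


def is_legal(prev_tag, tag):
    """An I-X label is only valid directly after B-X or I-X."""
    if tag.startswith("I-"):
        return prev_tag in ("B-" + tag[2:], "I-" + tag[2:])
    return True


def greedy_decode(emissions):
    """Pick the best label per word independently, with no constraints."""
    return [max(scores, key=scores.get) for scores in emissions]


def viterbi_decode(emissions):
    """Best-scoring label sequence among the legal sequences only."""
    best = {t: (emissions[0][t] if is_legal("START", t) else -math.inf) for t in TAGS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for tag in TAGS:
            candidates = [(best[p] if is_legal(p, tag) else -math.inf, p) for p in TAGS]
            prev_score, prev_tag = max(candidates)
            new_best[tag] = prev_score + scores[tag]
            pointers[tag] = prev_tag
        best = new_best
        backpointers.append(pointers)
    path = [max(best, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical per-word log-scores, standing in for BiLSTM outputs.
EMISSIONS = [
    {"O": -2.0, "B-IND": -1.0, "I-IND": -0.5},
    {"O": -2.0, "B-IND": -1.5, "I-IND": -0.3},
    {"O": -0.1, "B-IND": -2.0, "I-IND": -2.5},
]
print(greedy_decode(EMISSIONS))   # may start with I-IND, which is not legal
print(viterbi_decode(EMISSIONS))  # legal sequence
```

In this example the unconstrained per-word choice begins an entity with an I- label, while the constrained decoding returns a sequence that starts the entity with a B- label, which is the kind of correction the CRF part is described as providing.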
Referring to fig. 5, a flowchart illustrating a fifth step of the expert industry analysis method according to the embodiment of the present invention is shown, where the preprocessing of the expert information and the judgment of the emerging industry by the pre-constructed word vector model specifically include the following steps:
step S401, performing word segmentation processing on the expert information, and forming a word vector model through deep learning processing;
and step S402, initializing the word vector model and outputting the emerging industry class probability through convolution processing.
In the embodiment of the invention, word segmentation is performed on the expert information. Word segmentation is the basis of natural language processing, and its accuracy directly determines the quality of subsequent part-of-speech tagging, syntactic analysis, word vectors, and text analysis. Deep learning processing is performed after word segmentation, in which a word vector model is formed through Word2vec. Word2vec is a group of related models used to generate word vectors; these models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and must guess the words at adjacent positions, and under the bag-of-words assumption in Word2vec the order of those words is unimportant. After training is completed, the Word2vec model can be used to map each word to a vector that represents word-to-word relationships; this vector is the hidden layer of the neural network. The word vector model is then initialized, and the emerging industry category probability is output through convolutional neural network processing. A Convolutional Neural Network (CNN) is a feedforward neural network that contains convolution calculations and has a deep structure, and is one of the representative algorithms of deep learning.
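The "guess the words at adjacent positions" objective can be made concrete with a small sketch of how training pairs might be generated from already segmented text. The function name, the window size, and the sample tokens are assumptions for illustration; the embodiment does not specify them, and the neural network training itself is omitted.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (input word, adjacent word) pairs within the given window.

    Every neighbour inside the window is treated alike, which reflects
    the bag-of-words assumption: position within the window is ignored.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical output of the word segmentation step.
tokens = ["deep", "learning", "for", "drug", "discovery"]
pairs = skipgram_pairs(tokens, window=1)
```

A model trained on such pairs learns one vector per word, and those vectors are what the later convolution step would consume.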
Referring to fig. 6, which is a flowchart illustrating a sixth step of the expert industry analysis method according to the present invention, the initializing the word vector model and outputting the emerging industry category probability through convolution processing may specifically include the following steps:
step S4021, initializing the word vector model and inputting the word vector model into a sparse matrix;
step S4022, performing convolution calculation in the sparse matrix to output a convolution kernel, wherein the convolution calculation adopts cosine similarity calculation;
step S4023, performing pooling treatment on the convolution kernel to obtain convolution characteristics with the same dimensionality;
and step S4024, passing the convolution characteristics through a full connection layer to output the emerging industry category probability.
In the embodiment of the invention, the word vector model is initialized and input into a sparse matrix; that is, after the expert information is segmented into words, the word vectors trained by Word2vec are initialized and combined into a sparse matrix as input. Convolutional layer calculation is performed on the sparse matrix through the convolutional neural network, where the convolutional layer calculation adopts cosine similarity calculation and outputs a convolution kernel. Pooling layer processing is then performed on the convolution kernel to form convolution characteristics with the same dimensionality, that is, articles of different lengths are converted into fixed-length representations. Finally, the emerging industry category probability is output through a full connection layer. In this way, emerging industry categories can be judged and identified according to the expert information, providing business personnel with information of reference value.
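The flow of steps S4021 to S4024 can be sketched as follows. This is an illustrative reading rather than the embodiment's implementation: it assumes each filter is compared with a sliding window of word vectors by cosine similarity, that pooling is max pooling, and that the full connection layer is followed by a softmax. All function names, filters, and weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conv_cosine(embeddings: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide the filter over consecutive word vectors, scoring each window."""
    width = len(kernel)
    flat_kernel = [x for row in kernel for x in row]
    responses = []
    for i in range(len(embeddings) - width + 1):
        window = [x for row in embeddings[i:i + width] for x in row]
        responses.append(cosine(window, flat_kernel))
    return responses


def extract_features(embeddings: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """Max-pool each filter's responses: one feature per filter, so the
    result has the same length for any document at least as long as the filter."""
    return [max(conv_cosine(embeddings, k)) for k in kernels]


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, kernels, weights, bias) -> List[float]:
    """Full connection layer over pooled features, then class probabilities."""
    features = extract_features(embeddings, kernels)
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy 2-dimensional word vectors for a short and a longer document.
short_doc = [[1.0, 0.0], [0.0, 1.0]]
long_doc = short_doc + [[1.0, 1.0], [0.5, 0.5], [0.0, 1.0]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
weights = [[2.0, -1.0], [-1.0, 2.0]]  # two hypothetical categories
bias = [0.0, 0.0]

probs = classify(long_doc, kernels, weights, bias)
```

Because pooling keeps a single value per filter, the short and the long document both yield a feature vector of length two, which is what allows a fixed-size full connection layer to follow.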
Referring to fig. 7, a flowchart illustrating a seventh step of an expert industry analysis method according to an embodiment of the present invention is shown, where the generating of the expert industry analysis corresponding to the expert information through the industry category prediction, the named entity identification, and the emerging industry judgment specifically includes the following steps:
step S501, obtaining the historical industry category corresponding to the expert information through the industry category prediction;
step S502, all industry categories corresponding to the expert information are obtained through the named entity identification;
step S503, obtaining the emerging industry category corresponding to the expert information through the emerging industry judgment;
and step S504, integrating the historical industry categories, all the industry categories and the emerging industry categories to form industry analysis corresponding to the expert information.
In the embodiment of the invention, the historical industry category, all industry categories, and the emerging industry categories of the expert information are identified, and the identification results are integrated to obtain the industry analysis of the expert. By integrating the historical industry category, all industry categories, and the emerging industry categories corresponding to the expert, information of reference value about the expert can be provided to business personnel.
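The integration in steps S501 to S504 can be sketched as a simple merge of the three results. The record layout, the field names, and the probability threshold used to decide which emerging categories to keep are assumptions for illustration; the embodiment does not state how the results are combined beyond integrating them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndustryAnalysis:
    """Hypothetical container for the integrated result of step S504."""
    expert_id: str
    historical_categories: List[str]
    all_categories: List[str]
    emerging_categories: List[str]


def dedupe(items: List[str]) -> List[str]:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def integrate(expert_id: str,
              predicted: List[str],
              recognized: List[str],
              emerging_probs: Dict[str, float],
              threshold: float = 0.5) -> IndustryAnalysis:
    # S501: categories from the multi-label prediction.
    historical = dedupe(predicted)
    # S502: categories from named entity recognition.
    everything = dedupe(recognized)
    # S503: keep emerging categories whose probability reaches the
    # (assumed) threshold, most probable first.
    emerging = sorted(
        (c for c, p in emerging_probs.items() if p >= threshold),
        key=lambda c: emerging_probs[c],
        reverse=True,
    )
    # S504: integrate the three results into one analysis record.
    return IndustryAnalysis(expert_id, historical, everything, emerging)


analysis = integrate(
    "expert-001",
    predicted=["biomedicine", "biomedicine"],
    recognized=["biomedicine", "artificial intelligence"],
    emerging_probs={"quantum computing": 0.8, "synthetic biology": 0.2},
)
```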
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily all required by the embodiments of the invention.
For the device embodiment, since it is basically similar to the method embodiment, the description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
Referring to fig. 8, a block diagram of an embodiment of the expert industry analysis device according to the present invention is shown, and may specifically include the following modules:
an obtaining module 1001, configured to obtain expert information, where the expert information at least includes character information, scientific research project information, thesis information, and academic winning information; it should be emphasized that, in order to further ensure the privacy and security of the expert information, the expert information may also be stored in a node of a block chain;
the prediction module 1002 is configured to perform industry category prediction on the expert information through a pre-constructed multi-label multi-element classification prediction model;
the identification module 1003 is used for acquiring text information in the expert information and carrying out named entity identification through a BiLSTM-CRF model;
the judging module 1004 is used for carrying out emerging industry judgment through a pre-constructed word vector model after preprocessing the expert information;
a generating module 1005, configured to generate an expert industry analysis corresponding to the expert information according to the industry category prediction, the named entity identification, and the emerging industry judgment.
In a preferred embodiment of the present invention, the predicting module 1002 is configured to perform industry category prediction on the expert information through a pre-constructed multi-label multivariate classification prediction model, and includes:
the first acquisition unit is used for acquiring N historical industry categories, where N is greater than or equal to 1, wherein the N historical industry categories comprise collected historical data samples, the historical data samples comprise positive samples and negative samples, the positive samples are historical data of expert information belonging to the corresponding industry category, and the negative samples are historical data of expert information not belonging to the corresponding industry category;
the classification processing unit is used for performing classification processing according to the N historical industry categories to obtain a multi-label multi-element classification prediction model;
and the prediction subunit is used for performing industry category prediction on the expert information through the multi-label multi-element classification prediction model and obtaining the historical industry category of the expert information according to the industry category prediction.
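The three units above can be read as a one-classifier-per-category scheme, sketched below. The specification does not name the underlying classifier, so a deliberately simple token-vote scorer is swapped in as a stand-in; the function names, the sample data, and the decision rule are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = List[str]  # one tokenized piece of historical expert information


def train_binary(positives: List[Sample], negatives: List[Sample]) -> Counter:
    """Stand-in binary model: +1 per positive sample containing a token,
    -1 per negative sample containing it."""
    weights: Counter = Counter()
    for sample in positives:
        for token in set(sample):
            weights[token] += 1
    for sample in negatives:
        for token in set(sample):
            weights[token] -= 1
    return weights


def train_multilabel(
    data: Dict[str, Tuple[List[Sample], List[Sample]]]
) -> Dict[str, Counter]:
    """Train one binary model per historical industry category."""
    return {label: train_binary(pos, neg) for label, (pos, neg) in data.items()}


def predict(models: Dict[str, Counter], tokens: Sample) -> List[str]:
    """Return every category whose model scores the text above zero,
    so one expert can receive several labels."""
    return [
        label
        for label, weights in models.items()
        if sum(weights[t] for t in set(tokens)) > 0
    ]


# Hypothetical positive and negative samples for N = 2 categories.
training_data = {
    "biomedicine": (
        [["gene", "therapy"], ["protein", "drug"]],
        [["neural", "network"]],
    ),
    "artificial intelligence": (
        [["neural", "network"], ["deep", "learning"]],
        [["gene", "therapy"]],
    ),
}

models = train_multilabel(training_data)
labels_mixed = predict(models, ["deep", "learning", "drug"])
labels_single = predict(models, ["gene", "sequencing"])
```

Training each category against its own positive and negative samples is what lets the prediction return zero, one, or several categories for the same expert.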
In a preferred embodiment of the present invention, the identifying module 1003 is configured to obtain text information in the expert information and perform named entity identification through a BiLSTM-CRF model, and includes:
the second acquisition unit is used for acquiring text information in the expert information, wherein the text information comprises a character information text, a scientific research project information text, a thesis information text and a academic winning information text in the expert information;
and the identification subunit is used for carrying out named entity identification on the text information through a BiLSTM-CRF model.
In a preferred embodiment of the present invention, the identifying subunit is configured to perform named entity identification on the text information through a BiLSTM-CRF model, and includes:
the first calculation unit is used for acquiring context information characteristics in the text information and calculating to obtain probability distribution;
the first generation unit is used for generating a classification result according to the probability distribution and outputting the classification result;
and the first output unit is used for carrying out legal constraint processing on the classification result output.
In a preferred embodiment of the present invention, the determining module 1004 is configured to perform the determination of emerging industry through a pre-constructed word vector model after preprocessing the expert information, and includes:
the deep learning processing unit is used for performing word segmentation processing on the expert information and then forming a word vector model through deep learning processing;
and the probability unit is used for initializing the word vector model and outputting the category probability of the emerging industry through convolution processing.
In a preferred embodiment of the present invention, the probability unit is configured to output the emerging industry class probability through convolution processing after initializing the word vector model, and includes:
the initialization unit is used for initializing the word vector model and inputting the word vector model into a sparse matrix;
the second calculation unit is used for performing convolution calculation in the sparse matrix and outputting a convolution kernel, wherein the convolution calculation adopts cosine similarity calculation;
the pooling processing unit is used for pooling the convolution kernels to obtain convolution characteristics with the same dimensionality;
and the second output unit is used for passing the convolution characteristics through a full connection layer to output the emerging industry category probability.
In a preferred embodiment of the present invention, the generating module 1005 is configured to generate the expert industry analysis corresponding to the expert information according to the industry category prediction, the named entity identification, and the emerging industry judgment, and includes:
the historical industry category unit is used for obtaining the historical industry category corresponding to the expert information through the industry category prediction;
the all-industry category unit is used for obtaining all industry categories corresponding to the expert information through the named entity identification;
the emerging industry category unit is used for obtaining the emerging industry category corresponding to the expert information through the emerging industry judgment;
and the industry analysis unit is used for integrating the historical industry categories, all the industry categories and the emerging industry categories to form industry analysis corresponding to the expert information.
Referring to fig. 9, in an embodiment of the present invention, the present invention further provides a computer device, where the computer device 12 is represented in a form of a general-purpose computing device, and components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)31 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as "hard drives"). Although not shown in FIG. 9, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, with the program modules 42 configured to carry out the functions of embodiments of the invention.
A program/utility 41 having a set (at least one) of program modules 42 may be stored, for example, in memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 9, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, and data backup storage systems 34, etc.
The processing unit 16 executes various functional applications and data processing, such as implementing the expert industry analysis method provided by embodiments of the present invention, by running programs stored in the system memory 28.
That is, the processing unit 16 implements, when executing the program: acquiring expert information, wherein the expert information at least comprises character information, scientific research project information, thesis information and academic winning information; it should be emphasized that, in order to further ensure the privacy and security of the expert information, the expert information may also be stored in a node of a block chain; carrying out industry category prediction on the expert information through a pre-constructed multi-label multi-element classification prediction model; acquiring text information in the expert information, and carrying out named entity recognition through a BiLSTM-CRF model; preprocessing the expert information and then judging emerging industries through a pre-constructed word vector model; and generating expert industry analysis corresponding to the expert information through the industry category prediction, the named entity identification and the emerging industry judgment.
In an embodiment of the present invention, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the expert industry analysis method as provided in all embodiments of the present application.
That is, the program when executed by the processor implements: acquiring expert information, wherein the expert information at least comprises character information, scientific research project information, thesis information and academic winning information; it should be emphasized that, in order to further ensure the privacy and security of the expert information, the expert information may also be stored in a node of a block chain; carrying out industry category prediction on the expert information through a pre-constructed multi-label multi-element classification prediction model; acquiring text information in the expert information, and carrying out named entity recognition through a BiLSTM-CRF model; preprocessing the expert information and then judging emerging industries through a pre-constructed word vector model; and generating expert industry analysis corresponding to the expert information through the industry category prediction, the named entity identification and the emerging industry judgment.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer-readable storage medium or a computer-readable signal medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
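The idea of data blocks associated by a cryptographic method can be shown with a minimal hash-chain sketch. It covers only the linking and validity check; consensus, point-to-point transmission, and encryption are omitted. The block layout, the function names, and the choice of SHA-256 are assumptions for illustration and are not taken from the embodiment.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder previous hash for the first block


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], payload: str) -> None:
    """Generate the next block, linked to the current last block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain: List[Dict]) -> bool:
    """Check every link and every stored hash; any edit breaks the chain."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(
            block["index"], block["prev_hash"], block["payload"]
        ):
            return False
    return True


chain: List[Dict] = []
append_block(chain, "expert record A")  # hypothetical stored expert information
append_block(chain, "expert record B")

# Copy the chain and alter one payload to simulate tampering.
tampered = [dict(block) for block in chain]
tampered[0]["payload"] = "forged record"
```

Checking `is_valid(chain)` on the untouched chain succeeds, while the tampered copy fails because the stored hash no longer matches the altered payload.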
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The expert industry analysis method, device, equipment, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An expert industry analysis method, comprising:
acquiring expert information, wherein the expert information at least comprises character information, scientific research project information, thesis information and academic winning information;
carrying out industry category prediction on the expert information through a pre-constructed multi-label multi-element classification prediction model;
acquiring text information in the expert information, and carrying out named entity recognition through a BiLSTM-CRF model;
preprocessing the expert information and then judging emerging industries through a pre-constructed word vector model;
and generating expert industry analysis corresponding to the expert information through the industry category prediction, the named entity identification and the emerging industry judgment.
2. The method of claim 1, wherein the performing industry category prediction on the expert information through a pre-constructed multi-label multivariate classification prediction model comprises:
acquiring N historical industry categories, where N is greater than or equal to 1, wherein the N historical industry categories comprise collected historical data samples, and the historical data samples comprise positive samples, which are historical data of expert information belonging to the corresponding industry category, and negative samples, which are historical data of expert information not belonging to the corresponding industry category;
classifying according to the N historical industry categories to obtain a multi-label multi-element classification prediction model;
and performing industry category prediction on the expert information through the multi-label multi-element classification prediction model, and obtaining the historical industry category of the expert information according to the industry category prediction.
3. The method of claim 1, wherein the acquiring text information in the expert information and carrying out named entity recognition through a BiLSTM-CRF model comprises:
acquiring text information in the expert information, wherein the text information comprises a character information text, a scientific research project information text, a thesis information text and an academic prize winning information text in the expert information;
and carrying out named entity recognition on the text information through a BiLSTM-CRF model.
4. The method of claim 3, wherein the carrying out named entity recognition on the text information through a BiLSTM-CRF model comprises:
obtaining context information characteristics in the text information and calculating to obtain probability distribution;
generating a classification result according to the probability distribution and outputting the classification result;
and carrying out legal constraint processing on the classification result output.
5. The method of claim 1, wherein the preprocessing the expert information followed by the emerging industry judgment via the pre-constructed word vector model comprises:
performing word segmentation processing on the expert information and then forming a word vector model through deep learning processing;
and initializing the word vector model and outputting the category probability of the emerging industry through convolution processing.
6. The method of claim 5, wherein the initializing the word vector model and outputting the emerging industry class probability by convolution processing comprises:
initializing the word vector model and inputting the word vector model into a sparse matrix;
performing convolution calculation in the sparse matrix to output a convolution kernel, wherein the convolution calculation adopts cosine similarity calculation;
performing pooling processing on the convolution kernel to obtain convolution characteristics with the same dimensionality;
and passing the convolution characteristics through a full connection layer to output the emerging industry category probability.
7. The method of claim 1, wherein generating an expert industry analysis corresponding to the expert information from the industry category prediction, the named entity identification, and the emerging industry judgment comprises:
obtaining the historical industry category corresponding to the expert information through the industry category prediction;
obtaining all industry categories corresponding to the expert information through the named entity identification;
obtaining the emerging industry category corresponding to the expert information through the emerging industry judgment;
and integrating the historical industry categories, all the industry categories and the emerging industry categories to form industry analysis corresponding to the expert information.
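The integration step of claim 7 can be sketched as a small data structure that collects the three results. The class name, field names, and the probability threshold used to decide which emerging industries to keep are assumptions made for illustration; the claim does not specify how the results are combined.

```python
from dataclasses import dataclass


@dataclass
class IndustryAnalysis:
    """Hypothetical container for the combined analysis of one expert."""
    historical: list       # from the industry category prediction
    all_categories: list   # from the named entity identification
    emerging: list         # from the emerging industry judgment


def _unique(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def integrate(predicted, entity_categories, emerging_probs, threshold=0.5):
    """Combine the three model outputs into one analysis record.

    emerging_probs maps an emerging industry name to its predicted probability;
    names at or above the (assumed) threshold are kept.
    """
    emerging = [name for name, p in emerging_probs.items() if p >= threshold]
    return IndustryAnalysis(_unique(predicted), _unique(entity_categories), emerging)
```

For example, `integrate(["biomedicine"], ["biomedicine", "materials"], {"quantum computing": 0.8, "blockchain": 0.2})` keeps only `quantum computing` as an emerging category under the default threshold.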
8. An expert industry analysis device, comprising:
the acquisition module is used for acquiring expert information, wherein the expert information at least comprises character information, scientific research project information, thesis information and academic winning information;
the prediction module is used for carrying out industry category prediction on the expert information through a pre-constructed multi-label multi-element classification prediction model;
the recognition module is used for acquiring text information in the expert information and carrying out named entity recognition through a BiLSTM-CRF model;
the judgment module is used for carrying out judgment on emerging industries through a pre-constructed word vector model after preprocessing the expert information;
and the generation module is used for generating expert industry analysis corresponding to the expert information through the industry category prediction, the named entity identification and the emerging industry judgment.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202011504242.6A 2020-12-18 2020-12-18 Expert industry analysis method, device, equipment and storage medium Pending CN112434889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504242.6A CN112434889A (en) 2020-12-18 2020-12-18 Expert industry analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011504242.6A CN112434889A (en) 2020-12-18 2020-12-18 Expert industry analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434889A true CN112434889A (en) 2021-03-02

Family

ID=74696728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011504242.6A Pending CN112434889A (en) 2020-12-18 2020-12-18 Expert industry analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434889A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298352A (en) * 2021-04-28 2021-08-24 北京网核精策科技管理中心(有限合伙) Enterprise industry information processing method and device, electronic equipment and readable storage medium
CN114997262A (en) * 2022-04-20 2022-09-02 企知道网络技术有限公司 Industry category identification method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US10824815B2 (en) Document classification using attention networks
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN112015859A (en) Text knowledge hierarchy extraction method and device, computer equipment and readable medium
US11016740B2 (en) Systems and methods for virtual programming by artificial intelligence
CN113486178B (en) Text recognition model training method, text recognition method, device and medium
CN110490304B (en) Data processing method and device
CN112906398B (en) Sentence semantic matching method, sentence semantic matching system, storage medium and electronic equipment
Sharp et al. Toward Semi-autonomous Information: Extraction for Unstructured Maintenance Data in Root Cause Analysis
CN112434889A (en) Expert industry analysis method, device, equipment and storage medium
CN114896386A (en) Film comment semantic emotion analysis method and system based on BiLSTM
Ciaburro et al. Python Machine Learning Cookbook: Over 100 recipes to progress from smart data analytics to deep learning using real-world datasets
CN117807482B (en) Method, device, equipment and storage medium for classifying customs clearance notes
CN116150367A (en) Emotion analysis method and system based on aspects
CN113902569A (en) Method for identifying the proportion of green assets in digital assets and related products
CN117251777A (en) Data processing method, device, computer equipment and storage medium
CN114970467B (en) Method, device, equipment and medium for generating composition manuscript based on artificial intelligence
CN115827865A (en) Method and system for classifying objectionable texts by fusing multi-feature map attention mechanism
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN115238077A (en) Text analysis method, device and equipment based on artificial intelligence and storage medium
CN114817537A (en) Classification method based on policy file data
Abassi et al. Crowd label aggregation under a belief function framework
CN114238641A (en) Method, device, equipment, storage medium and program product for mining operation and maintenance knowledge
Oswal Identifying and categorizing offensive language in social media
Kinger et al. Towards smarter hiring: resume parsing and ranking with YOLOv5 and DistilBERT
US20240281741A1 (en) System and method for generating an action strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination