CN117333037A - Industrial brain construction method and device for publishing big data - Google Patents

Publication number
CN117333037A
Authority
CN
China
Prior art keywords
data
analysis
publishing
industry
tool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311336925.9A
Other languages
Chinese (zh)
Inventor
范波
贾广胜
范林海
裴恒利
何宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Publishing Digital Fusion Industry Research Institute Co ltd
Original Assignee
Shandong Publishing Digital Fusion Industry Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Tourism & Hospitality (AREA)
  • Finance (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an industrial brain construction method and device for publishing big data. The method comprises the following steps: collecting various publishing industry data from publishing-related data inside and outside the industry to obtain a publishing industry brain data set; analyzing and managing the publishing industry brain data set by using data analysis methods and natural language processing methods to obtain a common key technical tool set covering the publishing industry; analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry, and visually displaying the analysis results to obtain a publishing industry brain decision analysis platform; and accurately analyzing specific problems in the microscopic business processes of the publishing industry to obtain publishing industry brain characteristic analysis tools. The advantages are that the relative absence of artificial intelligence technology in meso- and macro-level analysis applications in the publishing industry is remedied, and support is provided for decision-making and technical innovation in the publishing industry.

Description

Industrial brain construction method and device for publishing big data
Technical Field
The invention relates to the technical field of publishing industry, in particular to an industrial brain construction method and device for publishing big data.
Background
At present, publishing industry brain construction consists of collecting and organizing the content resource data, product information data, product circulation data, product consumption data, consumption feedback data and the like generated in the publishing production process. Its main aim is to gather and integrate the data resources of the publishing industry, improve the industry's informatization and digitalization infrastructure, and build a full-process digital production platform for the publishing industry. However, existing publishing industry brain construction has the following drawbacks:
Publishing industry brain construction has remained at the stage of improving databases. Because a sound and reasonable mechanism for data exchange and sharing between the upstream and downstream of the industry is lacking, existing artificial intelligence applications are concentrated in microscopic fields such as typesetting, printing and product recommendation, and the rich data sources and analysis means provided by artificial intelligence technology have not been directly applied to meso- and macro-level management and analysis of the publishing industry.
Disclosure of Invention
The invention aims to provide an industrial brain construction method and device for publishing big data, so as to solve the problems in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
An industrial brain construction method for publishing big data comprises the following steps:
S1, acquiring a publishing industry brain data set:
collecting various publishing industry data from publishing-related data inside and outside the industry to obtain the publishing industry brain data set;
S2, acquiring a common key technical tool set covering the publishing industry:
analyzing and managing the publishing industry brain data set by using data analysis methods and natural language processing methods to obtain the common key technical tool set covering the publishing industry;
S3, acquiring a publishing industry brain decision analysis platform:
analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry, and visually displaying the analysis results to obtain the publishing industry brain decision analysis platform;
S4, acquiring publishing industry brain characteristic analysis tools:
accurately analyzing specific problems in the microscopic business processes of the publishing industry to obtain the publishing industry brain characteristic analysis tools.
Preferably, step S2 specifically comprises:
S21, according to the common requirements of publishing industry data information analysis, analyzing and managing the data information in the publishing industry brain data set by using data analysis methods to form data analysis and management tools;
S22, according to the common requirements of publishing industry text information analysis, analyzing the text information in the publishing industry brain data set by using natural language processing methods to form natural language processing and application tools.
Preferably, the data analysis and management tools comprise one or more of a statistical model and marketing analysis tool, a network analysis and industrial structure analysis tool, a scientific metering modeling and industrial technology analysis tool, a system modeling and enterprise innovation assessment tool, a self-defined index evaluation tool and a space-time analysis and industry evolution tool;
the statistical model and marketing analysis tool performs statistical analysis, trend prediction and related-information recommendation on the data by means of statistical analysis, time series analysis and collaborative recommendation algorithms, based on the multi-dimensional, multi-level industry time series data of the digital publishing industry;
the network analysis and industrial structure analysis tool builds entity portraits of the different nodes in the industry chain based on the papers, patents, publications and marketing data in the publishing big data, identifies node relations according to the portrait information, and constructs a complex relation network;
the scientific metering modeling and industrial technology analysis tool takes paper and patent data as analysis objects, and uses bibliometric, patent-metric, knowledge graph and hypernetwork model analysis methods to analyze the overall profile, development trends, core institutions, core talents and the current state of industry-university-research cooperation in the publishing industry;
the system modeling and enterprise innovation assessment tool takes enterprise competition theory as a guide and constructs a comprehensive competitiveness evaluation index system for marketing enterprises;
the self-defined index evaluation tool relies on publishing big data and is oriented to the decision-making needs of governments and enterprises at different levels; it formulates evaluation indexes dynamically and flexibly according to the actual application scenario, designs index weights by the entropy weight method and the analytic hierarchy process, and constructs an index system tree;
the space-time analysis and industry evolution tool applies geospatial analysis technology: according to the spatial fields of the publishing big data, it analyzes the spatial information of publishing data of different dimensions with the k-means and DBSCAN clustering algorithms, visualizes the spatial layout of the publishing industry, and reveals the spatial distribution and association patterns of the publishing industry.
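By way of example only, the entropy weight method named for the self-defined index evaluation tool can be sketched as follows. The source names the method but gives no formula, so the sketch assumes the commonly used formulation: each index column is normalized to proportions, its information entropy is computed, and weights are proportional to one minus the entropy. The function name `entropy_weights` and the sample matrix are hypothetical.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Return one weight per index (column) using the entropy weight method.

    matrix[i][j] is the non-negative value of index j for evaluated object i.
    Assumed formulation: p_ij = x_ij / sum_i x_ij,
    e_j = -sum_i p_ij * ln(p_ij) / ln(n), w_j = (1 - e_j) / sum_j (1 - e_j).
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("at least two evaluated objects are required")
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each index column must have a positive sum")
        probs = [v / total for v in column]
        # Terms with p == 0 contribute nothing to the entropy.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum == 0:
        # Every column is uniform, so no index discriminates: use equal weights.
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]


# Hypothetical data: two enterprises, two indexes.
# The first index is identical for both, so it receives (almost) no weight.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

An index on which all evaluated objects score the same carries no discriminating information, which is why its weight tends to zero; in practice such weights are often combined with expert weights from the analytic hierarchy process.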
Preferably, in the network analysis and industrial structure analysis tool,
the entity portraits are constructed by: extracting technical feature words from paper and patent texts with the TF-IDF and TextRank algorithms to construct a technology portrait of the publishing industry; extracting and analyzing the data of paper authors, patentees and institutions to construct expert and institution portraits of the publishing industry; performing semantic analysis and keyword extraction on publication text data with the BERT semantic analysis algorithm and the k-means and DBSCAN clustering algorithms to construct publication classification and semantic portraits; and constructing a market portrait of the publishing industry in combination with the extraction results for nodes such as enterprises in the publishing big data;
the node relations are identified by: based on the entity portraits, identifying cooperative relations according to information such as joint publication of papers, joint patentees of patents and joint publication recorded in the publication information, and identifying competitive and potential cooperative relations by applying a similarity discovery algorithm;
the complex relation network is constructed by: based on the entity portraits and the relation identification results, optimizing keyword clustering with the t-SNE dimensionality reduction algorithm and optimizing the layout of the relation network with a force-directed layout algorithm, so as to realize complex network analysis of publishing industry entities.
Preferably, the natural language processing and application tools comprise one or more of a web crawler and open source data integration tool, a machine translation and international industry dynamic monitoring tool, an entity identification and industry key role analysis tool, a semantic annotation and industry public opinion analysis tool and an information retrieval and personalized knowledge service tool;
the web crawler and open source data integration tool first acquires web page data comprehensively based on a traversal strategy, and then performs duplicate checking and removal, text classification, entity extraction and sensitive word recognition on the initial data to obtain the valid data required by the user and construct a related data set;
the machine translation and international industry dynamic monitoring tool crawls information on major overseas publications by using crawler technology and presents the crawled content in Chinese by using machine translation technology, thereby supporting analysis of book copyright introduction and analysis of the overseas cultural export of book products;
the entity identification and industry key role analysis tool performs entity recognition on the text resources of the publishing industry data, assigns role labels to the recognition results, and thereby determines the role of each entity in the publishing industry chain;
the semantic annotation and industry public opinion analysis tool automatically sorts out the deeper connotations of publishing content resources through keyword labeling, association labeling, classification labeling and public opinion analysis, so that publishing content resources are managed and applied in depth at the semantic level and publishing content is mined in depth;
the information retrieval and personalized knowledge service tool forms reusable data retrieval rules based on keywords and information resources, periodically maintains the index structure, provides information retrieval support for users, and also provides content recommendations to users by using a recommendation algorithm.
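By way of example only, the duplicate checking and removal step of the web crawler and open source data integration tool can be sketched as exact-duplicate detection on normalized text. The source does not say how duplicates are detected, so hashing of whitespace- and case-normalized page text is an assumption here, and the names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lower-case the text before hashing."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(pages: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each page whose normalized text is new."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


# Hypothetical crawled snippets: the second differs from the first only in
# spacing and letter case, so it is dropped.
pages = ["New  Book Released", "new book released ", "Copyright fair opens"]
unique_pages = deduplicate(pages)
```

Exact hashing only catches pages that are identical after normalization; near-duplicate detection (for example, shingling with MinHash) would be a natural extension but is not described in the source.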
Preferably, in the semantic annotation and industry public opinion analysis tool,
the keyword labeling takes word vectors and dictionaries as technical support: a vector constructor performs word segmentation, stop-word removal and feature construction operations on the text resources to complete the feature construction of the text to be indexed; an association finder performs direct matching and association matching to obtain a keyword list for the text; and finally a calculation layer computes the relevance between the keywords and the text, thereby realizing keyword labeling of the text;
the association labeling first extracts keywords from the text with the TextRank algorithm as the feature description of the text, and then obtains topic labels describing the text through word frequency statistics and label matching operations, thereby realizing topic association labeling of the text resources;
the classification labeling selects a corresponding deep-learning-based classification model in view of the training data, the available computing resources and the algorithm performance requirements, and trains the selected model to realize classification prediction for input text;
the public opinion analysis is based on the HowNet sentiment dictionary, and extracts, analyzes, processes, summarizes and reasons about the subjective information in book review texts.
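By way of example only, dictionary-based public opinion analysis of book reviews can be sketched as summing word polarities from a sentiment lexicon. The tiny English lexicon below is a hypothetical stand-in and is not the HowNet sentiment dictionary; the negation handling and the names `sentiment_score` and `polarity` are likewise assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity strength (not HowNet data).
LEXICON: Dict[str, float] = {
    "excellent": 1.0,
    "moving": 0.5,
    "boring": -1.0,
    "confusing": -0.5,
}
NEGATORS = {"not", "never"}


def sentiment_score(tokens: List[str], lexicon: Dict[str, float] = LEXICON) -> float:
    """Sum the polarities of lexicon words in an already tokenized review.

    A negator flips the polarity of the token that immediately follows it.
    """
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        if token in lexicon:
            score += -lexicon[token] if negate else lexicon[token]
        negate = False  # negation only reaches the very next token
    return score


def polarity(tokens: List[str]) -> str:
    """Map the numeric score to a coarse polarity label."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "the plot is excellent and moving".split()
label = polarity(review)
```

For Chinese book reviews the tokens would come from a word segmenter, and the lexicon entries from the sentiment dictionary actually used.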
Preferably, in the information retrieval and personalized knowledge service tool,
the retrieval rules are established by building, from the existing resources of the publishing institution and the semantic annotation results, an association index library between retrieval rules and resources;
the index structure maintenance maintains the index library established for the retrieval rules and updates the result sets pointed to by the indexes in the library, thereby ensuring that retrieved data is up to date;
the personalized knowledge service builds a portrait of the user from the user's natural attributes and dynamic behavior, matches the keywords and weights of the user portrait against those of the data labels, filters out data already recommended to the user while also taking into account hot data and recently updated data in the database, and then performs fine-ranking recommendation on the candidate data according to the matching weights and the specific business conditions; when the numbers of users and content items are large, a recall strategy is adopted to reduce the recommended content.
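By way of example only, the matching, filtering and ranking steps of the personalized knowledge service can be sketched as a weighted keyword match between a user portrait and labeled content. The scoring rule (sum of products of matching keyword weights), the names `recommend`, `user_profile` and `items`, and the sample data are all assumptions; the handling of hot data, fresh data and the recall strategy is omitted.

```python
from typing import Dict, List, Set


def recommend(
    user_profile: Dict[str, float],
    items: Dict[str, Dict[str, float]],
    already_recommended: Set[str],
    top_n: int = 3,
) -> List[str]:
    """Rank unseen items by weighted keyword overlap with the user portrait.

    user_profile maps keyword -> interest weight; items maps item id ->
    {keyword: label weight}. Items with no matching keyword are skipped.
    """
    scores: Dict[str, float] = {}
    for item_id, labels in items.items():
        if item_id in already_recommended:
            continue  # filter out data already recommended to the user
        score = sum(w * labels[kw] for kw, w in user_profile.items() if kw in labels)
        if score > 0:
            scores[item_id] = score
    # Highest score first; ties are broken by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical portrait and labeled publications.
profile = {"history": 0.8, "poetry": 0.2}
catalog = {
    "b1": {"history": 1.0},
    "b2": {"poetry": 1.0},
    "b3": {"history": 0.5, "poetry": 0.5},
    "b4": {"cooking": 1.0},
}
ranked = recommend(profile, catalog, already_recommended={"b1"})
```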
Preferably, step S3 specifically comprises:
S31, based on macroscopic policy, Internet and enterprise data, carrying out data analysis and visual display at the overall level of the publishing industry to depict the macroscopic layout of the publishing industry;
S32, based on enterprise, technology and product data and in combination with social network analysis methods, carrying out data analysis and visual display at the whole-chain level of the publishing industry to depict the static spatial layout and the dynamic development patterns of the publishing industry;
S33, based on data on specific enterprises and related industries, carrying out data analysis and visual display at the level of leading entities in the publishing industry to present the development situation of the enterprises and the relations between the enterprises and industry fields.
Preferably, the specific problems in the microscopic business processes of the publishing industry comprise one or more of auxiliary content analysis for publication topic selection, intelligent author query, publication popularity analysis and publication marketing data analysis.
The invention also aims to provide an industrial brain construction device for publishing big data, which comprises:
a publishing industry brain data set acquisition module, used for collecting various publishing industry data from publishing-related data inside and outside the industry to obtain a publishing industry brain data set;
a common key technical tool set acquisition module covering the publishing industry, used for analyzing and managing the publishing industry brain data set by using data analysis methods and natural language processing methods to obtain a common key technical tool set covering the publishing industry;
a publishing industry brain decision analysis platform acquisition module, used for analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry, and visually displaying the analysis results to obtain a publishing industry brain decision analysis platform;
a publishing industry brain characteristic analysis tool acquisition module, used for accurately analyzing specific problems in the microscopic business processes of the publishing industry to obtain publishing industry brain characteristic analysis tools.
The beneficial effects of the invention are as follows: to meet the needs of innovation monitoring and evaluation based on publishing big data, the invention collects, extracts and correlates various publishing industry data, builds a common technical tool set that applies multiple data analysis methods and natural language processing methods to the data information of the publishing industry chain, and presents the industry data analysis results visually in industry application scenarios at different levels, so that the meso- and macro-level distribution of the industry chain can be presented for management decision-making, and accurate analysis and service for specific problems of the industry chain can be realized at the micro level. The relative absence of artificial intelligence technology in meso- and macro-level analysis applications in the publishing industry is thereby remedied, and support is provided for macro-level management decision-making, meso- and macro-level situation analysis and micro-level technical innovation in the publishing industry.
Drawings
FIG. 1 is a flow chart of a method in an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
As shown in FIG. 1, this embodiment provides an industrial brain construction method for publishing big data. By collecting, extracting and correlating multiple kinds of publishing industry data, multiple common key technical tools based on data analysis and natural language processing are constructed for the data information of the publishing industry chain; the meso- and macro-level distribution of the industry chain is presented for management decision-making, and accurate analysis and service for specific problems of the industry chain are realized at the micro level. The method comprises the following steps:
1. Acquiring a publishing industry brain data set:
Various publishing industry data are collected from publishing-related data inside and outside the industry to obtain the publishing industry brain data set.
The publishing industry data comprise one or more of policy data, enterprise data, talent data, publication data, marketing data, Internet data, paper data, patent data and software copyright data, and can be selected according to actual conditions.
2. Acquiring a common key technical tool set covering the publishing industry:
The publishing industry brain data set is analyzed and managed by using data analysis methods and natural language processing methods to obtain the common key technical tool set covering the publishing industry.
Specifically,
1. According to the common requirements of publishing industry data information analysis, data analysis methods are used to analyze and manage the data information in the publishing industry brain data set, forming data analysis and management tools. The data analysis and management tools comprise one or more of a statistical model and marketing analysis tool, a network analysis and industrial structure analysis tool, a scientific metering modeling and industrial technology analysis tool, a system modeling and enterprise innovation assessment tool, a self-defined index evaluation tool and a space-time analysis and industry evolution tool, which can be selected according to actual conditions.
(1) The statistical model and marketing analysis tool achieves statistical analysis, trend prediction, related-information recommendation and similar goals by means of simple statistical analysis, time series analysis and collaborative recommendation algorithms, based on the multi-dimensional, multi-level industry time series data of the digital publishing industry.
In this embodiment, three time series analysis algorithms, that is, an autoregressive moving average model, a growth curve model, and a time series decomposition model, are used to analyze and predict publishing industry data.
The autoregressive moving average model is a 'mixed' model based on an autoregressive model and a moving average model, and can reflect the influence of historical factors of a data sequence and the self-variation law at the same time. The invention adopts an autoregressive moving average model to express trend changes of publishing industry data such as digital publishing industry scale, publication sales and the like, and the formula is as follows:
$$Y_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_p x_{t-p} + \alpha_0 + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \dots + \alpha_p \varepsilon_{t-p} + \varepsilon_t$$
where $t$ is the current time, $x_t$ is the data value at the current time (e.g., digital publishing industry scale, publication sales), $\varepsilon_t$ is the error at the current time, $Y_t$ is the observed value of the predicted data (e.g., digital publishing industry scale, publication sales), and $\alpha$ and $\beta$ are the model coefficients.
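A minimal sketch of a one-step prediction with such a model is given below. It assumes the model has the form $Y_t = \beta_0 + \sum_{i=1}^{p} \beta_i x_{t-i} + \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i} + \varepsilon_t$ and that the unknown current error $\varepsilon_t$ is replaced by its expected value of zero. The coefficients are taken as given (estimating them is outside the sketch), and the function name `arma_predict` and all numbers are hypothetical.

```python
from typing import Sequence


def arma_predict(
    x_hist: Sequence[float],
    eps_hist: Sequence[float],
    beta: Sequence[float],
    alpha: Sequence[float],
) -> float:
    """One-step prediction of Y_t with the current error term set to zero.

    x_hist[-1] is x_{t-1}, x_hist[-2] is x_{t-2}, and so on; eps_hist is
    indexed the same way. beta = [beta_0, ..., beta_p] and
    alpha = [alpha_0, ..., alpha_p].
    """
    p = len(beta) - 1
    if len(alpha) != p + 1 or len(x_hist) < p or len(eps_hist) < p:
        raise ValueError("need p lagged values and p + 1 coefficients of each kind")
    ar_part = beta[0] + sum(beta[i] * x_hist[-i] for i in range(1, p + 1))
    ma_part = alpha[0] + sum(alpha[i] * eps_hist[-i] for i in range(1, p + 1))
    return ar_part + ma_part


# Hypothetical example with p = 2:
# 1 + 0.5 * 12 + 0.25 * 10 + 0 + 0.2 * (-1.0) + 0.1 * 0.5, about 9.35
prediction = arma_predict(
    x_hist=[10.0, 12.0],
    eps_hist=[0.5, -1.0],
    beta=[1.0, 0.5, 0.25],
    alpha=[0.0, 0.2, 0.1],
)
```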
The growth curve model is used for describing the curve of the quantity indexes of various social and natural phenomena showing certain regularity according to time change, and can better describe the whole process of occurrence, development and maturation of things. The invention adopts a growth curve model to predict the life cycle of publishing industry data (such as the industrial scale of electronic books), and the formula is as follows:
$$y = k a^{b^{t}}$$
where $k > 0$, $0 < a < 1$, $0 < b < 1$, and $y$ is the predicted life-cycle value of the publishing industry data (e.g., e-book industry scale). The coefficients $k$, $a$ and $b$ are calculated by means of qualitative analysis.
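A minimal sketch of evaluating such a growth curve follows. It assumes the curve has the Gompertz form $y = k a^{b^t}$, which matches the constraints $k > 0$, $0 < a < 1$, $0 < b < 1$ stated above; the function name `growth_curve` and the parameter values are hypothetical.

```python
def growth_curve(t: float, k: float, a: float, b: float) -> float:
    """Evaluate y = k * a ** (b ** t) under k > 0, 0 < a < 1, 0 < b < 1."""
    if not (k > 0 and 0 < a < 1 and 0 < b < 1):
        raise ValueError("require k > 0, 0 < a < 1 and 0 < b < 1")
    return k * a ** (b ** t)


# Hypothetical parameters: saturation level k = 100, starting value k * a = 10.
# The curve rises from k * a at t = 0 towards k as t grows.
trajectory = [growth_curve(t, k=100.0, a=0.1, b=0.5) for t in range(6)]
```

Under these constraints $b^t$ shrinks towards zero as $t$ increases, so $a^{b^t}$ rises towards 1 and $y$ approaches the saturation level $k$, which is what makes the curve suitable for describing emergence, growth and maturity.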
The time series decomposition is to separate different factors constituting fluctuation on the time series, and analyze each factor separately, so as to help to explain the component cause of the fluctuation of the quantity index. The invention adopts a time sequence decomposition model to analyze certain publishing industry data (such as enterprise income and publication sales) with periodic characteristics.
Let $Y_t$ denote periodic time series data of the publishing industry (e.g., publication sales), and let $T_t$, $C_t$, $S_t$ and $I_t$ denote the trend, cyclical, seasonal and irregular variation factors respectively. The periodic time series data of the publishing industry can then be decomposed in the following three modes:
Addition mode:
$$Y_t = T_t + C_t + S_t + I_t$$
Multiplication mode:
$$Y_t = T_t \, C_t \, S_t \, I_t$$
Hybrid mode:
$$Y_t = T_t \, C_t \, S_t + I_t$$
For publishing industry data that shows seasonal variation but no obvious trend, the average of the data for the same season over past years is compared with the overall average of all seasons to obtain a seasonal coefficient, and the average over all seasons of the most recent year is multiplied by the seasonal coefficient to obtain the predicted publishing industry data for each season of the next year. For publishing industry data that shows both trend and seasonal variation, a trend model is established by analyzing the trend and the seasonal fluctuation pattern of the data, the seasonal coefficients are obtained, and the trend prediction is then corrected by the seasonal coefficients to predict the publishing industry data for each season.
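The seasonal-coefficient prediction for data without an obvious trend can be sketched as follows. The sketch follows the description directly: seasonal coefficient = same-season average over past years divided by the overall average, and forecast = average of the most recent year multiplied by that coefficient. The function name `seasonal_forecast` and the sales figures are hypothetical.

```python
from statistics import mean
from typing import List


def seasonal_forecast(history: List[List[float]]) -> List[float]:
    """Forecast each season of the next year for data without a clear trend.

    history[y][s] is the observed value for season s in year y; every year
    must contain the same number of seasons.
    """
    if not history:
        raise ValueError("history must contain at least one year")
    n_seasons = len(history[0])
    if any(len(year) != n_seasons for year in history):
        raise ValueError("every year must have the same number of seasons")
    overall = mean(v for year in history for v in year)
    # Seasonal coefficient: same-season average relative to the overall average.
    coefficients = [
        mean(year[s] for year in history) / overall for s in range(n_seasons)
    ]
    last_year_mean = mean(history[-1])
    return [last_year_mean * c for c in coefficients]


# Hypothetical quarterly publication sales for two years.
sales = [
    [80.0, 120.0, 100.0, 100.0],
    [90.0, 110.0, 100.0, 100.0],
]
forecast = seasonal_forecast(sales)  # approximately [85, 115, 100, 100]
```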
In this embodiment, the recommendation algorithm based on collaborative filtering achieves content recommendation according to users' purchase data, and mainly includes user-based collaborative filtering and publication-based collaborative filtering.
The user-based collaborative filtering algorithm makes recommendations according to similar users. It first builds a user-item scoring matrix from the users' historical behavior information, then calculates the similarity between users (for example, cosine similarity) from the scoring matrix, and finally obtains the k users most similar to the target user, either by setting a similarity threshold or by sorting the similarities in descending order. The user's interest score for a publication is then predicted according to the following formula, and the recommendation is completed:
where $w_{uv}$ denotes the similarity between user $u$ and user $v$, and $r_{vi}$ denotes the interest of user $v$ in publication $i$.
Similarly, the publication-based collaborative filtering algorithm makes recommendations according to similar products, by constructing a similarity matrix between publications.
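A minimal sketch of user-based collaborative filtering is given below. The prediction formula is not reproduced in the text, so the sketch assumes a commonly used weighted sum over the k most similar users, $p(u, i) = \sum_{v} w_{uv} \, r_{vi}$, restricted to neighbors who have rated publication $i$; cosine similarity is used for $w_{uv}$ as the text suggests. The names `cosine`, `predict_interest` and the sample ratings are hypothetical.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {publication: score}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    numerator = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    denominator = norm_a * norm_b
    return numerator / denominator if denominator else 0.0


def predict_interest(ratings: Ratings, user: str, item: str, k: int = 2) -> float:
    """Assumed rule: sum of w_uv * r_vi over the k users most similar to `user`."""
    similarities = [
        (cosine(ratings[user], ratings[v]), v) for v in ratings if v != user
    ]
    top_k = sorted(similarities, reverse=True)[:k]
    # Neighbors who have not rated the publication contribute nothing.
    return sum(w * ratings[v][item] for w, v in top_k if item in ratings[v])


# Hypothetical ratings: v1 shares publication "a" with u, v2 shares nothing.
ratings = {
    "u": {"a": 3.0},
    "v1": {"a": 3.0, "c": 4.0},
    "v2": {"b": 2.0, "c": 5.0},
}
score = predict_interest(ratings, "u", "c")  # 0.6 * 4 + 0 * 5, about 2.4
```

Publications can then be ranked by predicted score and the highest-scoring ones the user has not yet purchased recommended.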
(2) The network analysis and industry structure analysis tool builds entity portraits of different nodes in an industry chain based on papers, patents and publications in published big data and marketing data, and performs node relation recognition according to portrait information to build a complex relation network.
The construction of the entity portrait comprises the following steps: extracting technical feature words of papers and patent texts by adopting a TF-IDF algorithm and a textRank algorithm to construct a technical image of a publishing industry; extracting and analyzing data of paper authors, patentees and institutions, and constructing publishing industry experts and institution portraits; and carrying out semantic analysis and keyword extraction on the publication text data by adopting a BERT semantic analysis algorithm, a k-means and a DBSCAN clustering algorithm to construct publication classification and semantic portraits, and constructing a market portraits of the publishing industry by combining node extraction results of enterprises and the like in the large publication data.
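The TF-IDF step for extracting technical feature words can be sketched as follows. The sketch uses the basic definition (term frequency multiplied by the logarithm of inverse document frequency) on already tokenized texts; the source does not state which TF-IDF variant is used, so this choice, the function name `tfidf_keywords` and the sample documents are assumptions.

```python
import math
from collections import Counter
from typing import List


def tfidf_keywords(docs: List[List[str]], top_n: int = 3) -> List[List[str]]:
    """Return the top_n highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_n])
    return keywords


# Hypothetical tokenized abstracts of papers or patents.
docs = [
    ["digital", "publishing", "platform"],
    ["digital", "printing", "ink"],
    ["digital", "copyright", "publishing"],
]
top_terms = tfidf_keywords(docs, top_n=1)
```

A term such as "digital" that occurs in every document scores zero, so the extracted feature words are those that distinguish one text from the rest of the corpus.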
Relation recognition is based on the entity portraits: cooperative relations are recognized from information such as joint publication of papers and patents, shared patentees and joint publication in the publication information, and competitive and potential cooperative relations are recognized with a similarity discovery algorithm.
The complex relation network is constructed from the entity portraits and the relation recognition results: keyword clusters are optimized with the t-SNE dimension reduction algorithm and the network layout is optimized with a force-directed layout algorithm, realizing complex network analysis of publishing industry entities.
(3) The scientometric modeling and industrial technology analysis tool takes paper and patent data as analysis objects and uses analysis methods such as bibliometrics, patentometrics, knowledge graphs and hypernetwork models to analyze the overall profile, development trend, core institutions, core talents and current state of industry-university-research collaboration in the publishing industry.
The overall profile analysis is based on bibliometrics and patentometrics. It selects quantity indicators such as the numbers of papers, internationally co-authored papers and patents, and quality indicators such as total citation count, impact factor, invention patents and national invention patents, and compares the publishing industry output of China with that of the major countries of the world.
The development trend analysis uses methods such as bibliometrics, patentometrics, topic evolution analysis and patent technology topic identification. Paper data are analyzed by publication time, country and region, journal and research topic; patent data are analyzed by application trend, geographic layout, source country, main applicants, patent type and legal status, patent technology composition and IPC classification number. The analysis yields the frontier research hotspots of the publishing industry and the evolution trends of its research topics and key technologies.
Core institution identification constructs a scientific research cooperation network between enterprises from the paper co-authorship relations between enterprises and the joint application relations between patentees, and determines the core institutions of the publishing industry from network node indicators such as each enterprise's number of papers, number of patents, betweenness centrality and closeness centrality.
Core talent identification constructs a scientific research cooperation network between talents from the paper co-authorship relations between authors and the joint application relations between patent applicants, and determines the key talents of the publishing industry from network node indicators such as each author's number of papers, each applicant's number of patent applications, betweenness centrality and closeness centrality.
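As a non-limiting illustration of one of the node indicators named above, the sketch below builds an unweighted co-authorship network and computes closeness centrality by breadth-first search. It assumes the convention (number of reachable nodes) / (sum of shortest-path lengths), computed within each node's connected component; the function names and the author names are hypothetical.

```python
from collections import deque
from itertools import combinations

def coauthor_graph(papers):
    """Build an undirected co-authorship graph {author: set(co-authors)}."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def closeness(graph):
    """Closeness centrality of every node, via BFS shortest paths."""
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Isolated nodes get 0.0; others use reachable nodes only.
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical author lists of two papers.
papers = [["Li", "Wang"], ["Wang", "Zhao"]]
scores = closeness(coauthor_graph(papers))
```

In this sample, "Wang" bridges the two papers and therefore obtains the highest closeness, which is the kind of signal used to flag core talents or core institutions.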
The analysis of the current state of industry-university-research collaboration is based on jointly filed patent data from industry-university-research collaboration in China's publishing industry. The universities, research institutions and enterprises taking part in the collaboration are taken as nodes and the patent applications as hyperedges, and a hypernetwork topology of jointly filed patents in the publishing industry is constructed on a weighted hypergraph, expressed as $H = (V, E, W)$. Here $H$ is the weighted hypergraph. The finite set $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is the set of all nodes in the hypernetwork, where each $v_i$ ($i = 1, 2, \ldots, n$) represents one node, that is, one institution taking part in jointly filed patents. $E = \{E_1, E_2, E_3, \ldots, E_m\}$ is the set of hyperedges of the hypernetwork, that is, all teams taking part in jointly filed patents, where each $E_j$ ($j = 1, 2, \ldots, m$) represents one combination of institutions that jointly file patents. Each hyperedge carries a weight $w(E_j)$, which represents the number of patents jointly filed by that combination of institutions, and $W = \{w(E_1), w(E_2), w(E_3), \ldots, w(E_m)\}$ is the set of hyperedge weights in the hypernetwork.
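As a non-limiting illustration, the weighted hypergraph H = (V, E, W) could be assembled from a list of patent applicant lists as sketched below. The sketch assumes that a patent with a single applicant is not a joint application and is skipped; the function name `build_hypernetwork` and the institution names are hypothetical.

```python
from collections import Counter

def build_hypernetwork(patents):
    """Build H = (V, E, W) from a list of co-applicant lists."""
    # Each distinct combination of institutions is one hyperedge;
    # its weight is the number of patents filed by that combination.
    weights = Counter(frozenset(p) for p in patents if len(set(p)) > 1)
    V = sorted({inst for edge in weights for inst in edge})
    E = list(weights)
    W = [weights[e] for e in E]
    return V, E, W

# Hypothetical applicant lists, one per patent application.
patents = [
    ["Univ A", "Press B"],
    ["Press B", "Univ A"],
    ["Univ A", "Institute C", "Press B"],
    ["Press B"],  # single applicant, not a joint application
]
V, E, W = build_hypernetwork(patents)
```

Using `frozenset` makes the order of applicants irrelevant, so the first two patents fall on the same hyperedge with weight 2.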
(4) The system modeling and enterprise innovation assessment tool is guided by enterprise competition theory and constructs a comprehensive competitiveness assessment index system for marketing enterprises. The indices comprise technological innovation, market competitiveness, talent mobilization, asset support, social impact and the like.
Technological innovation is the factor that best reflects an enterprise's competitiveness and future development potential. The invention evaluates it in terms of the enterprise's innovation investment capability, basic research capability, applied research capability, technical competitive strength and the like. Innovation investment capability mainly covers research funding and the composition of research personnel, and reflects how much importance the enterprise attaches to technological innovation. Basic research capability mainly reflects the quality and quantity of the papers the enterprise publishes, and is measured by the number of core papers published and the number of core papers per capita. Applied research capability is measured by the numbers of patent applications, core patent applications, invention patent applications and core invention patent applications, and by the number of invention patent applications per capita. Technical competitive strength is evaluated by calculating the technical similarity between the enterprise and its technical competitors.
Market competitiveness is the result of an enterprise using its own resources, advantages and capabilities and integrating the corresponding external resources in its overall operation. Profitability is analyzed in terms of return on total assets, operating profit margin, return on net assets and the like; debt-paying ability in terms of liquidity, asset-liability ratio, capital-liability ratio and the like; development capability in terms of the year-on-year growth rate of operating income, the year-on-year growth rate of total assets, the capital accumulation rate and the like; operating capability in terms of the operation of total assets, current assets and fixed assets; and sales capability in terms of net sales margin, main business revenue, number of sales personnel, proportion of sales personnel and the like.
Talent mobilization analyzes enterprise personnel from the two angles of educational background distribution and functional distribution, focuses on highly educated personnel, senior management and technical staff, and judges the position, strengths and weaknesses of the enterprise's high-quality talent within the industry.
Asset support evaluates the asset backing of an enterprise from the perspective of total assets, intangible assets and the like, and explores its potential competitiveness within its industry and region. Total assets are all assets owned or controlled by an economic entity that can bring economic benefits; intangible assets are evaluated, based on the enterprise's intangible asset data, in three respects: informatization capability, innovation assets and economic competitiveness.
Social impact is evaluated in three respects: publication network popularity, publication network evaluation and author social impact. Publication network popularity is based on the number of online reviews of the publication; publication network evaluation is based on the online rating results together with a sentiment index computed from the online review texts by sentiment analysis; author social impact is evaluated from the level and cumulative number of the author's awards.
(5) The self-defined index evaluation tool serves the decision requirements of different levels of government, enterprises and the like. Relying on publishing big data, it formulates evaluation indices dynamically and flexibly according to the actual application scene, designs the index weights with the entropy weight method and the analytic hierarchy process, and constructs an index system tree.
The entropy weight method determines objective weights according to the variability of each index. The smaller the information entropy $E_j$ of an index, the greater the variation of its values, the more information it provides, the greater the role it can play in the comprehensive evaluation, and the greater its weight. Conversely, the larger the information entropy of an index, the smaller the variation of its values, the less information it provides, the smaller the role it plays in the comprehensive evaluation, and the smaller its weight. In the present invention, the information entropy of an index is expressed as:
$$E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$
where $p_{ij} = Y_{ij} / \sum_{i=1}^{n} Y_{ij}$ and $Y_{ij}$ is the normalized value of the index data $X_{ij}$, namely:
$$Y_{ij} = \frac{X_{ij} - \min_i X_{ij}}{\max_i X_{ij} - \min_i X_{ij}}$$
If $p_{ij} = 0$, define $\lim_{p_{ij} \to 0} p_{ij} \ln p_{ij} = 0$. After the information entropy $E_j$ of each index is obtained, the index weight is calculated from the information entropy as follows:
$$w_j = \frac{1 - E_j}{k - \sum_{j=1}^{k} E_j}$$
where n is the number of samples and k is the number of indices.
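As a non-limiting illustration, the entropy weight calculation could be sketched as follows. The sketch assumes min-max normalization of each index column, treats p·ln p as 0 when p = 0, assigns entropy 1 to a constant column, and falls back to equal weights when no index shows any variation; the function name `entropy_weights` and the sample matrix are hypothetical.

```python
import math

def entropy_weights(X):
    """X: n samples (rows) by k indices (columns). Returns k weights."""
    n, k = len(X), len(X[0])
    entropies = []
    for j in range(k):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        # Min-max normalization; a constant column is mapped to all ones
        # (assumption), which yields the maximum entropy of 1.
        Y = [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in col]
        total = sum(Y)
        p = [y / total for y in Y]
        # Terms with p == 0 are skipped, i.e. p * ln(p) is taken as 0.
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        entropies.append(e)
    # Clamp tiny negative values caused by floating-point rounding.
    d = [max(0.0, 1.0 - e) for e in entropies]
    s = sum(d)
    if s == 0:
        return [1.0 / k] * k  # no index carries information
    return [di / s for di in d]

# Hypothetical data: the first index varies, the second is constant.
X = [
    [1.0, 10.0],
    [2.0, 10.0],
    [3.0, 10.0],
]
weights = entropy_weights(X)
```

In this sample the constant second index receives a weight of (almost exactly) zero, in line with the rule that an index with larger entropy provides less information.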
The analytic hierarchy process uses a tree-like hierarchical structure to decompose a complex decision problem into several simple sub-problems within each level, so that every sub-problem can be analyzed independently. It divides the evaluation scale into five grades (equally, slightly, strongly, very strongly and absolutely important), assigns them the nominal scale values 1, 3, 5, 7 and 9, and places four intermediate grades between the five basic ones with the values 2, 4, 6 and 8. The meaning of each scale value is shown in the following table.
Table 1 Description of the analytic hierarchy process evaluation scale
| Evaluation scale | Definition | Description |
| --- | --- | --- |
| 1 | Equally important | The two elements contribute with equal importance |
| 3 | Slightly important | Experience and judgment slightly favor one element |
| 5 | Strongly important | Experience and judgment strongly favor one element |
| 7 | Very strongly important | Practice shows a very strong preference for one element |
| 9 | Absolutely important | There is sufficient evidence to affirm an absolute preference for one element |
| 2, 4, 6, 8 | Intermediate values between adjacent scales | Between two adjacent judgments |
In the invention, the analytic hierarchy process constructs a judgment matrix A (a positive reciprocal matrix) over all features taking part in the index calculation, in which $a_{ij}$ denotes the result of comparing the i-th factor with the j-th factor:
$$A = (a_{ij})_{n \times n}, \quad a_{ij} > 0, \quad a_{ji} = \frac{1}{a_{ij}}$$
The geometric mean of each row vector of matrix A is taken (the root method) and the results are normalized, giving the weight of each evaluation index, that is, the eigenvector w:
$$w_i = \frac{\left(\prod_{j=1}^{n} a_{ij}\right)^{1/n}}{\sum_{k=1}^{n} \left(\prod_{j=1}^{n} a_{kj}\right)^{1/n}}, \quad i = 1, 2, \ldots, n$$
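As a non-limiting illustration, the row geometric mean (root method) weighting could be sketched as follows. The judgment matrix below is a hypothetical example using scale values from Table 1, and the function name `ahp_weights` is likewise hypothetical; consistency checking of the matrix is not shown.

```python
import math

def ahp_weights(A):
    """Index weights from a pairwise judgment matrix by the root method."""
    n = len(A)
    # Geometric mean of each row of the judgment matrix.
    gm = [math.prod(row) ** (1.0 / n) for row in A]
    total = sum(gm)
    # Normalize so that the weights sum to 1.
    return [g / total for g in gm]

# Hypothetical judgment matrix: factor 1 is slightly more important
# than factor 2 (3) and strongly more important than factor 3 (5).
A = [
    [1, 3, 5],
    [1 / 3, 1, 3],
    [1 / 5, 1 / 3, 1],
]
w = ahp_weights(A)
```

The resulting weights preserve the ordering implied by the pairwise judgments, with the first factor receiving the largest weight.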
(6) The space-time analysis and industry evolution tool applies geospatial analysis technology: based on the spatial fields of the publishing big data, it analyzes the spatial information of publishing data of different dimensions with the k-means and DBSCAN clustering algorithms, visualizes the spatial layout of the publishing industry, and reveals its spatial distribution and association patterns.
The k-means algorithm clusters publishing industry data in space by Euclidean distance. The publishing industry data $x_i$ are grouped into k spatial categories; after clustering, each data point belongs to a spatial category $t_i$, and the k spatial cluster centers are $u_1, u_2, \ldots, u_k$. The loss function of k-means clustering is defined as:
$$L = \sum_{i} \lVert x_i - u_{t_i} \rVert^2$$
The best spatial assignment $t_i$ of the publishing data is sought so that the loss function L is minimized, and the spatial cluster centers $u$ are then solved for directly, giving the spatial analysis result of the publishing industry data.
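As a non-limiting illustration, the alternation between assigning each point to its nearest center and recomputing the centers could be sketched as follows. The sketch assumes Lloyd's iteration with a naive initialization from the first k points and a fixed number of iterations; the function names and the coordinates are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; returns (labels, centers, loss)."""
    centers = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, t in zip(points, labels) if t == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Sum of squared distances from each point to its assigned center.
    loss = sum(sq_dist(p, centers[t]) for p, t in zip(points, labels))
    return labels, centers, loss

# Hypothetical publisher locations in two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers, loss = kmeans(points, k=2)
```

In practice a better initialization and a convergence test would normally be used; the fixed iteration count here only keeps the sketch short.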
The DBSCAN algorithm performs density-based spatial clustering of publishing industry data: it looks for high-density regions separated by low-density regions and treats each high-density region as a cluster. If the ε-neighborhood of a publishing industry data sample $x_i$ contains at least MinPts data samples, that is, $|N_\varepsilon(x_i)| \geq \mathrm{MinPts}$, then the data point $x_i$ is called a core point. All core objects of the spatial clustering of the publishing data are determined from the given neighborhood parameters ε and MinPts; for each unprocessed core object, all samples density-reachable from it are found and form one cluster, which realizes the spatial clustering of the analyzed publishing industry data.
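As a non-limiting illustration, the core-point test and the expansion over density-reachable samples could be sketched as follows. The sketch assumes Euclidean distance, counts a point as a member of its own neighborhood, and labels noise as -1; the function names and the coordinates are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nb = region_query(points, j, eps)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels

# Hypothetical locations: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike k-means, the number of clusters is not given in advance; it follows from ε and MinPts, and the isolated point is reported as noise.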
2. According to the common requirements of text information analysis of publishing industry data, a natural language processing method is used to analyze the text information of the publishing industry brain data set, forming a natural language processing and application tool. The natural language processing and application tool comprises one or more of a web crawler and open source data integration tool, a machine translation and international industry dynamic monitoring tool, an entity identification and industry key role analysis tool, a semantic annotation and industry public opinion analysis tool, and an information retrieval and personalized knowledge service tool; the tools can be selected according to actual conditions.
(1) The web crawler and open source data integration tool first obtains web page data as comprehensively as possible with a traversal strategy, and then applies duplicate checking and removal, text classification, entity extraction, sensitive word recognition and similar processing to the crawled raw data, so as to obtain the effective data the user needs and construct the related data set.
Duplicate checking and removal first encodes the web page text into fingerprint information with the MD5 hash algorithm, so that all words are uniformly distributed over the whole space, and then identifies similar texts by calculating the similarity of two text fingerprints through the Hamming distance. The similarity of two hash-coded web pages is calculated as:
$$d(S_1, S_2) = \sum_{k=1}^{64} S_{1k} \oplus S_{2k}$$
where $S_1$ and $S_2$ are the hash values of the two web pages, $S_{1k}$ and $S_{2k}$ are their respective k-th bits, and 64 is the number of bits in the binary string obtained after the hash function processes a text. When the Hamming distance of two texts is less than or equal to a specified threshold, the two texts are considered duplicates; the duplicates are removed and only one copy is kept in the final data.
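As a non-limiting illustration, fingerprinting and Hamming-distance deduplication could be sketched as follows. The text does not specify how the per-word hashes are combined into one 64-bit fingerprint, so the sketch assumes a SimHash-style bitwise vote over whitespace-separated tokens, each hashed with MD5 and truncated to 64 bits; the function names and the threshold value are hypothetical.

```python
import hashlib

def fingerprint(text, bits=64):
    """SimHash-style fingerprint (assumption); tokens are hashed with MD5."""
    votes = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # keep the first 64 bits
        for k in range(bits):
            votes[k] += 1 if (h >> k) & 1 else -1
    # Bit k of the fingerprint is set when the vote for bit k is positive.
    return sum(1 << k for k in range(bits) if votes[k] > 0)

def hamming(s1, s2):
    """Number of bit positions in which two fingerprints differ."""
    return bin(s1 ^ s2).count("1")

def deduplicate(texts, threshold=3):
    """Keep the first text of every group of near-duplicates."""
    kept, prints = [], []
    for t in texts:
        fp = fingerprint(t)
        if all(hamming(fp, other) > threshold for other in prints):
            kept.append(t)
            prints.append(fp)
    return kept

# Hypothetical crawled page texts containing an exact repeat.
pages = ["digital publishing report 2023", "digital publishing report 2023"]
kept = deduplicate(pages)
```

Because similar texts share most tokens, their fingerprints differ in only a few bits, which is why a small Hamming threshold can catch near-duplicates as well as exact repeats.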
Text classification uses a deep convolutional neural network as the classification model. A classification system is set up for network texts from different sources, such as news, product introductions and technical introductions. According to that classification system, a text representation model is determined after word segmentation and stop-word removal, and dimension reduction and feature extraction are performed on the text matrix of the training set. After text classifiers are trained with the classification model and classification algorithm, the best-performing classifier is selected to classify the texts to be classified.
Entity extraction uses an information extraction strategy based on iteration and combination. It first extracts basic entity attribute information of the publishing field, such as author names, organization names, place names (country, province, city, county, village and the like), product names and technical names, from the existing network text, and uses a search engine to find all web pages that may contain the specific attribute information. It then segments all the web pages to be extracted into blocks, extracts from the blocks iteratively over several rounds, and integrates the extracted data incrementally, finally forming complete information entities.
Sensitive word recognition is based on a sensitive word list and on variants of the sensitive words; relevant sensitive words are recognized in the crawled web page text and marked.
(2) The machine translation and international industry dynamic monitoring tool uses crawler technology to crawl information on major overseas publications and uses machine translation technology to display the crawled content in Chinese, thereby supporting analysis of book copyright introduction and analysis of the overseas expansion of book products and culture.
Machine translation is implemented with the attention-based Transformer model. The model adopts an encoder-decoder structure based entirely on self-attention. The encoder is a stack of 6 layers of identical structure, each with two sublayers: the first is a multi-head self-attention layer and the second is a conventional feed-forward layer. The decoder is also a stack of 6 identical layers, but each layer has three sublayers: the first is a masked multi-head self-attention layer, which prevents the model from attending to subsequent words; the second is a multi-head attention layer; and the third is a feed-forward layer. The attention mechanism can be viewed as follows: given a series of queries Q and a series of key-value pairs K, V, the weights of V are computed from Q and K, and V is then summed with those weights:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(f(Q, K))\,V$$
where K = V = Q. Q, K and V each undergo h different linear transformations, giving $Q_1, Q_2, \ldots, Q_h$, $K_1, K_2, \ldots, K_h$ and $V_1, V_2, \ldots, V_h$. The attention mechanism is applied to $Q_i, K_i, V_i$ for every $i \in \{1, 2, \ldots, h\}$, the results of the h attention heads are concatenated, and the source-language sentence is decoded word by word into target-language words that form the target sentence.
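As a non-limiting illustration of a single attention head, the sketch below assumes that the scoring function f(Q, K) is the scaled dot product QKᵀ/√d_k used in the standard Transformer; the text itself leaves f unspecified. Matrices are plain lists of rows, and the function names and sample values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(f(Q, K)) V with f assumed to be the scaled dot product."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return out

# Self-attention on two hypothetical token vectors (K = V = Q).
X = [[1.0, 0.0], [0.0, 1.0]]
out = attention(X, X, X)
```

Multi-head attention would apply this function to h linearly transformed copies of Q, K and V and concatenate the h outputs; the learned projection matrices are omitted here.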
(3) The entity identification and industry key role analysis tool performs entity identification on the publishing industry data text resource, provides a role label for an identification result, and further determines roles of the entity in a publishing industry chain.
Entity identification uses sequence labeling. After the text sentence sequence is segmented and preprocessed, a deep neural network is combined with a conditional random field: a BiLSTM predicts the relation between the text sequence and the labels, and the transition matrix of the conditional random field models the dependencies between labels. This assigns category labels, such as person name, place name, organization name and term, to the words or phrases in the text, which are used to construct the knowledge graph of the publishing industry.
Industry role analysis first builds a role keyword library for the publishing industry (for example, the role keywords of a publication include publishing unit, author unit, partner unit and sales unit) and then builds, for each keyword, the feature words that describe the role. Based on the entity identification results, important information such as the passage in which each entity appears is collected, a conditional probability distribution model of the random variables is generated with a machine-learned conditional random field, the entity is tagged with entries from the feature library, and the label of the corresponding role is obtained by statistical means, completing the role analysis.
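As a non-limiting illustration of how per-token scores and the transition matrix are combined at prediction time, the sketch below uses Viterbi decoding, the usual decoding method for a linear-chain conditional random field. The emission scores stand in for BiLSTM outputs and, like the transition scores and the two-label scheme, are hypothetical values.

```python
def viterbi(emissions, transitions):
    """Best label path given per-token and label-to-label scores.

    emissions[t][y]: score of label y at position t (e.g. from a BiLSTM).
    transitions[a][b]: score of moving from label a to label b.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev = score
        score, ptr = [], []
        for b in range(n_labels):
            # Best previous label for ending in label b at this position.
            best_a = max(range(n_labels), key=lambda a: prev[a] + transitions[a][b])
            ptr.append(best_a)
            score.append(prev[best_a] + transitions[best_a][b] + em[b])
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    y = max(range(n_labels), key=lambda a: score[a])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]

# Hypothetical labels: 0 = outside an entity, 1 = inside an entity.
emissions = [[1.0, 0.9], [0.0, 1.0]]
transitions = [[0.0, -5.0], [0.0, 0.0]]  # 0 -> 1 is heavily penalized
path = viterbi(emissions, transitions)
```

Taking the best label at each position independently would give [0, 1] here; the transition penalty changes the decoded path to [1, 1], which shows the role of the transition matrix.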
(4) The semantic annotation and industry public opinion analysis tool comprises keyword annotation, association annotation, classification annotation and public opinion analysis modules. It uses an automated system to sort out the deeper meaning of publication content resources and manages and applies them at the semantic level, achieving deep mining of publication content.
Keyword labeling is supported by word vectors and dictionaries. A vector constructor performs word segmentation, stop-word removal, feature framework construction and similar operations on the text resource to build the features of the text to be indexed; an association finder performs direct matching and association matching to obtain the keyword list of the text; and a calculation layer finally computes the relevance between the keywords and the text, completing the keyword labeling of the text.
The association labeling firstly uses a TextRank algorithm to extract keywords from a text as feature description of the text, and then uses word frequency statistics, label matching and other operations to obtain a theme label describing the text, so as to realize the theme association labeling of the text resource.
Classification labeling first selects a classification algorithm according to the training data, the available computing resources, the performance required of the algorithm and similar considerations. Traditional classification algorithms mainly include Bayesian networks, hidden Markov models and support vector machines; deep-learning-based classification algorithms include recurrent neural networks, convolutional neural networks, graph neural networks and their variants. The selected model is then trained to predict the class of the input text.
Public opinion analysis is based on a universal HowNet emotion dictionary, and subjective information (such as views, emotions, attitudes, evaluations, moods and the like) in book comment texts is extracted, analyzed, processed, induced and inferred.
(5) The information retrieval and personalized knowledge service tool forms a reusable data retrieval rule based on the keywords and the information resources, periodically maintains an index structure, provides information retrieval support for users, and simultaneously provides content recommendation for the users by using a recommendation algorithm.
The establishment of retrieval rules and the maintenance of the index structure are implemented on the Apache Lucene architecture. Retrieval rules are established by using the semantic annotation results to build an association index library between retrieval rules and resources, such as terms, policies, enterprises and talents, from the existing resources of publishing institutions. Index structure maintenance mainly targets the index library built for the retrieval rules: it updates the result sets that the indexes in the library point to, ensuring that the data are fresh at retrieval time.
The personalized knowledge service builds a portrait of each user's natural attributes and dynamic behavior and matches the keywords and weights of the user portrait against those of the data annotations. It filters out data already recommended to the user, also takes into account hot data and more recent data in the database, and then produces a fine-ranked recommendation from the candidate data according to the matching weights and the specific business conditions. When the numbers of users and content items are large, a recall strategy is used to narrow down the recommended content.
3. Acquiring a publishing industry brain decision analysis platform:
The publishing industry brain data set is analyzed with respect to the macroscopic development situation of the publishing industry, and the analysis results are visually displayed, to obtain the publishing industry brain decision analysis platform.
Specifically:
1. based on macroscopic policy, internet and enterprise data, data analysis and visual display are carried out at the overall level of the publishing industry, depicting the macroscopic layout of the publishing industry;
2. based on enterprise, technology and product data, and in combination with social network analysis methods, data analysis and visual display are carried out at the full-chain level of the publishing industry, depicting the static spatial layout and the dynamic development patterns of the publishing industry;
3. based on industrial data related to specific enterprises, data analysis and visual display are carried out at the level of the leading players of the publishing industry, expressing the development situation of each enterprise and its relationship with the industry field.
4. Acquiring a publication industry brain characteristic analysis tool:
Specific problems in the microscopic business processes of the publishing industry are analyzed accurately to obtain the publishing industry brain characteristic analysis tool.
The specific problems in the microscopic business flow of the publishing industry comprise one or more of publishing topic auxiliary content analysis, author intelligent inquiry, publication heat analysis and publication marketing data analysis, and the specific problems can be specifically selected according to actual conditions.
In this embodiment, there is also provided an industrial brain construction device for big data of publications, which is capable of implementing the above-mentioned method, the device comprising,
1. publishing industry brain dataset acquisition module: the method is used for collecting various publishing industry data from related data of publishing industry inside and outside industries to obtain a publishing industry brain data set;
2. a common key technology tool set acquisition module covering the publishing industry: the method is used for analyzing and managing the brain data set of the publishing industry by using a data analysis method and a natural language processing method to obtain a common key technical tool set covering the publishing industry;
3. the publishing industry brain decision analysis platform acquisition module: used for analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry and visually displaying the analysis results to obtain the publishing industry brain decision analysis platform;
4. the publishing industry brain characteristic analysis tool acquisition module: the method is used for accurately analyzing the specific problems in the microscopic business process of the publishing industry and obtaining the brain characteristic analysis tool of the publishing industry.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
The invention provides a method and a device for constructing an industrial brain oriented to publishing big data. To meet the need for innovation monitoring and evaluation based on publishing big data, various kinds of publishing industry data are collected, extracted and associated; a common technical tool set applying a range of data analysis methods and natural language processing methods is built for the data of the publishing industry chain; and the analysis results are presented and visualized for industry application scenes at different levels. The results can thus present the meso- and macro-level distribution of the industry chain for management decisions, and support accurate analysis and services for specific problems of the industry chain at the micro level. This solves the problem that artificial intelligence technology is relatively lacking in meso- and macro-level analysis applications in the publishing industry, and provides support for macro-level management decisions, meso-level situation analysis and micro-level technical innovation in the publishing industry.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which is also intended to be covered by the present invention.

Claims (10)

1. An industrial brain construction method for publishing big data, characterized by comprising the following steps:
S1, acquiring a publishing industry brain data set:
collecting various publishing industry data from data related to the publishing industry inside and outside the industry to obtain a publishing industry brain data set;
S2, acquiring a common key technical tool set covering the publishing industry:
analyzing and managing the publishing industry brain data set by using a data analysis method and a natural language processing method to obtain a common key technical tool set covering the publishing industry;
S3, acquiring a publishing industry brain decision analysis platform:
analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry, and visually displaying the analysis results to obtain a publishing industry brain decision analysis platform;
S4, acquiring a publishing industry brain characteristic analysis tool:
carrying out accurate analysis on specific problems in the microscopic business processes of the publishing industry to obtain a publishing industry brain characteristic analysis tool.
2. The industrial brain construction method for publishing big data according to claim 1, wherein: step S2 specifically includes the following,
S21, analyzing and managing data information in a brain data set of the publishing industry by using a data analysis method according to the commonality requirement of the data information analysis of the publishing industry to form a data analysis and management tool;
S22, analyzing the text information of the brain data set of the publishing industry by using a natural language processing method according to the common requirements of the text information analysis of the publishing industry data to form a natural language processing and application tool.
3. The industrial brain construction method for publishing big data according to claim 2, characterized in that: the data analysis and management tool comprises one or more of a statistical model and marketing analysis tool, a network analysis and industry structure analysis tool, a scientometric modeling and industrial technology analysis tool, a system modeling and enterprise innovation assessment tool, a self-defined index evaluation tool and a space-time analysis and industry evolution tool;
the statistical model and the marketing analysis tool realize statistical analysis, trend prediction and related information recommendation of data by means of statistical analysis, time sequence analysis and collaborative recommendation algorithm based on multi-dimensional and multi-level industry time sequence data of the digital publishing industry;
the network analysis and industry structure analysis tool builds entity portraits of the different nodes in the industry chain based on the papers, patents, publications and marketing data in the publishing big data, identifies the relations between nodes from the portrait information, and builds a complex relation network;
the scientometric modeling and industrial technology analysis tool takes paper and patent data as analysis objects, and uses bibliometric, patentometric, knowledge graph and hypernetwork model analysis methods to analyze the overall profile, development trend, core institutions, core talents and current state of industry-university-research collaboration in the publishing industry;
the system modeling and enterprise innovation assessment tool takes an enterprise competition theory as a guide to construct a comprehensive competitive assessment index system of a marketing enterprise;
the self-defined index evaluation tool dynamically and flexibly formulates evaluation indexes according to actual application scenes and designs index weights by an entropy weight method and an analytic hierarchy process to construct an index system tree depending on publishing big data facing to decision requirements of different levels of governments and enterprises;
the space-time analysis and industry evolution tool applies a geospatial analysis technology, analyzes spatial information of publishing data with different dimensions by adopting a k-means and DBSCAN clustering algorithm according to spatial fields of publishing big data, realizes visual expression of spatial layout of the publishing industry, and reveals spatial distribution and association modes of the publishing industry.
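As an illustration of the index-weight design named in claim 3, the following is a minimal sketch of the entropy weight method only (the analytic hierarchy process is not shown). The function name `entropy_weights` and the sample indicator values are hypothetical and are not taken from the application; the sketch assumes non-negative, benefit-type indicators.

```python
import math


def entropy_weights(rows):
    """Return one weight per indicator column using the entropy weight method.

    rows: list of equal-length lists of non-negative indicator values,
    one row per evaluated entity (hypothetical data layout).
    """
    n = len(rows)
    if n < 2:
        raise ValueError("at least two evaluated entities are required")
    m = len(rows[0])
    k = 1.0 / math.log(n)  # scales each column entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total > 0:
            entropy = 0.0
            for value in column:
                p = value / total
                if p > 0:  # the term 0 * ln(0) is treated as 0
                    entropy -= k * p * math.log(p)
        else:
            entropy = 1.0  # an all-zero column carries no information
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / m] * m  # every indicator is equally uninformative
    return [d / total_divergence for d in divergences]


# Hypothetical example: three enterprises, two indicators.
# The first indicator is identical everywhere, so it receives almost no weight.
sample = [[1.0, 10.0], [1.0, 20.0], [1.0, 70.0]]
weights = entropy_weights(sample)
```

An indicator whose values barely differ across entities has entropy close to 1 and therefore a weight close to 0, which is the property that makes the method usable for data-driven weighting alongside expert-driven weights.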
4. The industrial brain construction method for publishing big data according to claim 3, wherein, in the network analysis and industrial structure analysis tool:
entity portrait construction extracts technical feature words from paper and patent texts using the TF-IDF algorithm and the TextRank algorithm to construct a technology portrait of the publishing industry; extracts and analyzes data on paper authors, patentees and institutions to construct expert and institution portraits of the publishing industry; performs semantic analysis and keyword extraction on publication text data using a BERT semantic analysis algorithm and the k-means and DBSCAN clustering algorithms to construct publication classification and semantic portraits; and constructs a market portrait of the publishing industry by combining the extraction results for nodes such as enterprises in the publishing big data;
node relation identification, based on the entity portraits, identifies cooperative relations from information such as joint publication of papers and patents, joint publication among patentees, and publication information, and identifies competitive and potential cooperative relations by applying a similarity discovery algorithm;
complex relation network construction, based on the entity portraits and the relation identification results, optimizes keyword clusters using the t-SNE dimensionality reduction algorithm and optimizes the relation network layout using a force-directed layout algorithm, thereby realizing complex network analysis of publishing industry entities.
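The feature-word extraction and similarity discovery described in claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions: texts are already tokenized, the unsmoothed weighting tf * ln(N / df) stands in for whichever TF-IDF variant the applicant uses, cosine similarity stands in for the unspecified "similarity discovery algorithm", and the entity names and tokens are hypothetical. TextRank, BERT, t-SNE and layout are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict mapping an entity name to a non-empty list of tokens.

    Returns a dict mapping each name to a {term: tf-idf weight} dict.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def top_keywords(vector, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical patent texts for three enterprises, already tokenized.
docs = {
    "press_a": ["digital", "publishing", "ebook", "typesetting"],
    "press_b": ["digital", "publishing", "ebook", "rights"],
    "printer_c": ["digital", "printing", "ink", "paper"],
}
vectors = tfidf_vectors(docs)
similarity_ab = cosine(vectors["press_a"], vectors["press_b"])
similarity_ac = cosine(vectors["press_a"], vectors["printer_c"])
```

Entity pairs with high similarity but no recorded joint publication would be candidates for the "potential cooperative relation" (or competitive relation) edges in the relation network; the threshold for that decision is not given in the source.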
5. The industrial brain construction method for publishing big data according to claim 2, characterized in that: the natural language processing and application tool comprises one or more of a web crawler and open source data integration tool, a machine translation and international industry dynamic monitoring tool, an entity identification and industry key role analysis tool, a semantic annotation and industry public opinion analysis tool, and an information retrieval and personalized knowledge service tool;
the web crawler and open source data integration tool first comprehensively acquires web page data based on a traversal strategy, and then performs duplicate checking and removal, text classification, entity extraction and sensitive word recognition on the initial data to obtain the effective data required by the user and construct a related data set;
the machine translation and international industry dynamic monitoring tool crawls information on major overseas publications using crawler technology and displays the crawled content in Chinese using machine translation technology, thereby realizing book copyright introduction analysis and analysis of book products and culture going overseas;
the entity identification and industry key role analysis tool performs entity identification on the text resources of publishing industry data, assigns a role label to each identification result, and thereby determines the role of each entity in the publishing industry chain;
the semantic annotation and industry public opinion analysis tool automatically sorts out the deeper meaning of publishing content resources through keyword annotation, association annotation, classification annotation and public opinion analysis, so that publishing content resources are managed, applied and mined in depth at the semantic level;
the information retrieval and personalized knowledge service tool forms reusable data retrieval rules based on keywords and information resources, periodically maintains an index structure, provides information retrieval support for users, and at the same time provides content recommendations for users using a recommendation algorithm.
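The post-crawl cleaning described in claim 5 (duplicate removal followed by sensitive word recognition) can be sketched as below. This is an assumption-laden illustration: duplicates are detected by exact match after whitespace and case normalization using a SHA-256 digest, and sensitive words are found by substring matching; the application does not state which techniques it uses. The function names, URLs and word list are hypothetical. Text classification and entity extraction are not shown.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_pages(pages, sensitive_words):
    """Drop duplicate pages and flag sensitive words in the remaining ones.

    pages: iterable of (url, text) pairs in crawl order.
    sensitive_words: iterable of words to flag (hypothetical list).
    Returns a list of dicts for the first copy of each distinct page.
    """
    seen = set()
    kept = []
    for url, text in pages:
        norm = normalize(text)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content was already kept under another URL
        seen.add(digest)
        hits = sorted(w for w in sensitive_words if w.lower() in norm)
        kept.append({"url": url, "text": text, "sensitive_hits": hits})
    return kept


# Hypothetical crawl result: the second page repeats the first.
crawled = [
    ("https://example.org/a", "New  title announced"),
    ("https://example.org/b", "new title announced"),
    ("https://example.org/c", "Banned term appears here"),
]
cleaned = clean_pages(crawled, {"banned"})
```

Exact hashing only catches copies that are identical after normalization; near-duplicate detection (for example with shingling and a similarity threshold) would be a separate design choice that the source does not specify.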
6. The industrial brain construction method for publishing big data according to claim 5, wherein, in the semantic annotation and industry public opinion analysis tool:
the keyword annotation takes word vectors and dictionaries as its technical support, uses a vector constructor to perform word segmentation, stop-word removal and framework feature operations on text resources to complete the feature construction of the text to be indexed, uses a correlation finder to perform direct matching and correlation matching to obtain a keyword list for the text, and finally calculates the degree of correlation between the keywords and the text through a calculation layer to realize keyword annotation of the text;
the association annotation first extracts keywords from the text using the TextRank algorithm as the feature description of the text, and then obtains topic labels describing the text through word frequency statistics and label matching operations, thereby realizing topic association annotation of text resources;
the classification annotation selects a corresponding deep-learning-based classification algorithm model in view of the training data, available computing resources and algorithm performance requirements, and trains the selected model to realize classification prediction for an input text;
the public opinion analysis, based on the HowNet sentiment dictionary, extracts, analyzes, processes, summarizes and draws inferences from subjective information in book review texts.
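Dictionary-based public opinion analysis of the kind named in claim 6 can be illustrated with a small lexicon scorer. This is only a sketch: the tiny English word sets below are hypothetical stand-ins for the HowNet sentiment dictionary, tokens are assumed to be pre-segmented, and the one-token negation rule is an assumption added for illustration rather than something stated in the application.

```python
def sentiment_score(tokens, positive, negative,
                    negations=frozenset({"not", "never", "no"})):
    """Score a tokenized review with a sentiment lexicon.

    Each positive word adds 1 and each negative word subtracts 1.
    A negation word flips the polarity of the token that directly
    follows it and has no effect on later tokens.
    """
    score = 0
    negate = False
    for token in tokens:
        word = token.lower()
        if word in negations:
            negate = True
            continue
        if word in positive:
            polarity = 1
        elif word in negative:
            polarity = -1
        else:
            polarity = 0
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # negation only reaches the next token
    return score


# Hypothetical lexicon entries and book review.
POSITIVE = {"wonderful", "moving"}
NEGATIVE = {"boring", "tedious"}
review = "the plot is not boring and the prose is wonderful".split()
review_score = sentiment_score(review, POSITIVE, NEGATIVE)
```

A positive total suggests a favorable review and a negative total an unfavorable one; aggregating scores per title or per period would give the industry-level view the claim refers to.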
7. The industrial brain construction method for publishing big data according to claim 5, wherein, in the information retrieval and personalized knowledge service tool:
retrieval rule establishment builds an association index library between retrieval rules and resources by using the semantic annotation results, according to the existing resources of the current publishing institution;
index structure maintenance maintains the index library established for the retrieval rules and updates the result sets that the indexes in the library point to, ensuring that the data are fresh at retrieval time;
the personalized knowledge service builds a portrait of the user's natural attributes and dynamic behavior, matches the keywords and weights of the user portrait against those of the data annotations, filters out data that have already been recommended to the user, also takes into account hot data and newly added, timely data in the database, and then performs fine-ranking recommendation on the candidate data according to the matching weights and the specific service conditions; when the numbers of users and content items are large, a recall strategy is adopted to narrow the recommended content.
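The matching, filtering and ranking steps of the personalized knowledge service in claim 7 can be sketched as follows. The scoring formula (a weighted keyword dot product plus fixed boosts for hot and fresh items), the boost values, and all names and data are assumptions for illustration; the application does not give a formula. The recall stage for large user and content volumes is not shown.

```python
def recommend(user_profile, items, already_recommended,
              top_n=3, hot_boost=0.2, fresh_boost=0.1):
    """Rank items for one user and return the ids of the best top_n.

    user_profile: {keyword: weight} taken from the user portrait.
    items: list of dicts with keys "id", "keywords" ({keyword: weight}),
           and optional boolean flags "hot" and "fresh".
    already_recommended: set of item ids to filter out.
    """
    scored = []
    for item in items:
        if item["id"] in already_recommended:
            continue  # never repeat an earlier recommendation
        match = sum(weight * item["keywords"].get(keyword, 0.0)
                    for keyword, weight in user_profile.items())
        score = match
        if item.get("hot"):
            score += hot_boost
        if item.get("fresh"):
            score += fresh_boost
        scored.append((score, item["id"]))
    # Highest score first; ties are broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical user portrait and annotated catalogue.
profile = {"history": 0.8, "poetry": 0.2}
catalogue = [
    {"id": "b1", "keywords": {"history": 1.0}},
    {"id": "b2", "keywords": {"poetry": 1.0}, "hot": True},
    {"id": "b3", "keywords": {"history": 0.5}, "fresh": True},
    {"id": "b4", "keywords": {"history": 1.0, "poetry": 1.0}},
]
picks = recommend(profile, catalogue, already_recommended={"b4"}, top_n=2)
```

Keeping the hot and fresh boosts small relative to the keyword match means popularity and timeliness act as tie-breakers rather than overriding the user's interests; other trade-offs are equally plausible.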
8. The industrial brain construction method for publishing big data according to claim 1, wherein step S3 specifically comprises:
S31, based on macroscopic policy, Internet and enterprise data, performing data analysis and visual display at the large-scale level of the publishing industry to depict the macroscopic layout of the publishing industry;
S32, based on enterprise, technology and product data and in combination with a social network analysis method, performing data analysis and visual display at the full-chain level of the publishing industry to depict the static spatial layout and the dynamic development patterns of the publishing industry;
S33, based on data on specific enterprises and related industries, performing data analysis and visual display at the level of the strong main bodies of the publishing industry to present the development situation of each enterprise and the relations between the enterprise and the industry field.
9. The industrial brain construction method for publishing big data according to claim 1, wherein: specific problems in the microscopic business process of the publishing industry include one or more of publishing topic-selection auxiliary content analysis, author intelligent query, publication heat analysis and publication marketing data analysis.
10. An industrial brain construction device for publishing big data, characterized by comprising:
a publishing industry brain data set acquisition module, used for collecting various types of publishing industry data from publishing-related data inside and outside the industry to obtain a publishing industry brain data set;
a common key technology tool set acquisition module covering the publishing industry, used for analyzing and managing the publishing industry brain data set by using a data analysis method and a natural language processing method to obtain a common key technology tool set covering the publishing industry;
a publishing industry brain decision analysis platform acquisition module, used for analyzing the publishing industry brain data set with respect to the macroscopic development situation of the publishing industry and visually displaying the analysis results to obtain a publishing industry brain decision analysis platform;
a publishing industry brain characteristic analysis tool acquisition module, used for accurately analyzing specific problems in the microscopic business process of the publishing industry to obtain a publishing industry brain characteristic analysis tool.
CN202311336925.9A 2023-10-16 2023-10-16 Industrial brain construction method and device for publishing big data Pending CN117333037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311336925.9A CN117333037A (en) 2023-10-16 2023-10-16 Industrial brain construction method and device for publishing big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311336925.9A CN117333037A (en) 2023-10-16 2023-10-16 Industrial brain construction method and device for publishing big data

Publications (1)

Publication Number Publication Date
CN117333037A true CN117333037A (en) 2024-01-02

Family

ID=89278984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311336925.9A Pending CN117333037A (en) 2023-10-16 2023-10-16 Industrial brain construction method and device for publishing big data

Country Status (1)

Country Link
CN (1) CN117333037A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540747A (en) * 2024-01-09 2024-02-09 《全国新书目》杂志有限责任公司 Book publishing intelligent question selecting system based on artificial intelligence
CN117540747B (en) * 2024-01-09 2024-04-16 《全国新书目》杂志有限责任公司 Book publishing intelligent question selecting system based on artificial intelligence
CN117591676A (en) * 2024-01-19 2024-02-23 数据空间研究院 Method for identifying enterprise on industrial chain of Coarse-to-fine
CN117591676B (en) * 2024-01-19 2024-04-05 数据空间研究院 Method for identifying enterprise on industrial chain of Coarse-to-fine
CN117786131A (en) * 2024-02-23 2024-03-29 广东省投资和信用中心(广东省发展和改革事务中心) Industrial chain safety monitoring analysis method, medium and equipment
CN117951357A (en) * 2024-03-25 2024-04-30 中国标准化研究院 Dynamic scientific and technological standard monitoring method and system based on big data

Similar Documents

Publication Publication Date Title
CN111737495B (en) Middle-high-end talent intelligent recommendation system and method based on domain self-classification
Li et al. DeepPatent: patent classification with convolutional neural networks and word embedding
CN110633373B (en) Automobile public opinion analysis method based on knowledge graph and deep learning
Buber et al. Web page classification using RNN
CN117333037A (en) Industrial brain construction method and device for publishing big data
Benabderrahmane et al. On the predictive analysis of behavioral massive job data using embedded clustering and deep recurrent neural networks
CN112632228A (en) Text mining-based auxiliary bid evaluation method and system
CN107895303B (en) Personalized recommendation method based on OCEAN model
CN114048305B (en) Class case recommendation method of administrative punishment document based on graph convolution neural network
CN114254201A (en) Recommendation method for science and technology project review experts
Rithish et al. Automated assessment of question quality on online community forums
Ozcan et al. Human resources mining for examination of R&D progress and requirements
Lin Sentiment analysis of e-commerce customer reviews based on natural language processing
Barberá et al. Methodological challenges in estimating tone: Application to news coverage of the US economy
Ayoobkhan et al. Web page recommendation system by integrating ontology and stemming algorithm
CN110717089A (en) User behavior analysis system and method based on weblog
Sadesh et al. Automatic Clustering of User Behaviour Profiles for Web Recommendation System.
Melba Rosalind et al. Predicting students’ satisfaction towards online courses using aspect-based sentiment analysis
Du et al. An iterative reinforcement approach for fine-grained opinion mining
Qian et al. Multi-hop interactive attention based classification network for expert recommendation
Kim et al. High-quality train data generation for deep learning-based web page classification models
Wang et al. SOTagRec: A combined tag recommendation approach for stack overflow
Santos et al. Do papers (really) match journals’“aims and scope”? A computational assessment of innovation studies
Chebil et al. Clustering social media data for marketing strategies: Literature review using topic modelling techniques
Roy et al. Automated resume classification using machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination