CN117009622A - Training method of public opinion root cause analysis model and public opinion root cause analysis method

Info

Publication number: CN117009622A
Application number: CN202310893016.9A
Authority: CN
Inventors: 刘琳琅; 朱治潮; 黄修添; 许文浩; 姚信威; 李强; 邢伟伟; 陆琦超; 袁知恒
Original assignee: Zhejiang University of Technology ZJUT; Alipay Hangzhou Information Technology Co Ltd
Current assignee: Zhejiang University of Technology ZJUT; Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-07-19
Filing date: 2023-07-19
Publication date: 2023-11-07

Abstract

The embodiments of the specification disclose a training method for a public opinion root cause analysis model and a public opinion root cause analysis method, and relate to the field of natural language processing. In the specification, multimodal mixed features of a plurality of training data are extracted, the plurality of training data are clustered, and a knowledge graph that corresponds to the plurality of training data and relates to public opinion root causes is constructed, so that a public opinion root cause analysis model is obtained through training. Further, when public opinion root cause analysis is performed on data to be analyzed, the multimodal mixed features of the data to be analyzed are extracted based on the public opinion root cause analysis model, a knowledge graph of the data to be analyzed is constructed, and graph completion is performed on that knowledge graph to predict the public opinion root cause of the data to be analyzed.

Description

Training method of public opinion root cause analysis model and public opinion root cause analysis method

Technical Field

The present disclosure relates to the field of natural language processing, and in particular, to a training method for a public opinion root cause analysis model and a public opinion root cause analysis method.

Background

Public opinion refers to the social attitudes that the public, as the subject, forms and holds toward social managers, enterprises, individuals and other organizations, as objects, and toward their social, moral and other orientations, within a certain social space and around the occurrence, development and change of intermediary social events. It is the sum of the beliefs, attitudes, opinions and emotions that many people express about various phenomena and problems in society. In the context of digital commercialization and an open ecosystem, more and more service providers are beginning to serve users through applets and/or HTML5 technology. Behind this prosperous digital ecosystem, the number of public opinion issues that users feed back about applets and/or HTML5 is also growing. Public opinion is an effective mechanism for discovering spam online. Performing fast and accurate root cause analysis on the public opinion fed back by users, and then handling that public opinion according to its root cause, helps to control the scope and degree of a problem's impact.

However, no satisfactory intelligent scheme currently exists for analyzing and locating the root cause of public opinion. Root cause analysis that relies on manual intervention is inefficient and time-consuming.

Disclosure of Invention

The embodiment of the specification provides a training method and a root cause analysis method of a public opinion root cause analysis model, which can enhance the accuracy of the public opinion root cause analysis and improve the public opinion root cause analysis efficiency. The technical scheme is as follows:

In a first aspect, embodiments of the present disclosure provide a training method for a public opinion root cause analysis model, where the method includes:

extracting features of a plurality of training data to obtain mixed features corresponding to each training data;

performing multiple clustering processing on the plurality of training data according to the mixed characteristics corresponding to the training data to obtain at least one clustering result, wherein the clustering result is a data set comprising at least one training data;

according to the mixed features respectively corresponding to the at least one training data included in each clustering result, obtaining a knowledge graph corresponding to each clustering result and related to the public opinion root cause;

and training the public opinion root cause analysis model to be trained according to the at least one clustering result and the knowledge graph corresponding to each clustering result until the public opinion root cause analysis model is obtained.

In a second aspect, embodiments of the present disclosure provide a public opinion root analysis method, the method including:

acquiring data to be analyzed;

extracting features of the data to be analyzed according to a public opinion root analysis model to obtain mixed features corresponding to the data to be analyzed, wherein the public opinion root analysis model is a model trained by adopting the training method of the public opinion root analysis model according to the first aspect;

according to the mixed characteristics corresponding to the data to be analyzed, performing characteristic matching on the data to be analyzed and at least one clustering result included in the public opinion root cause analysis model to obtain a target clustering result corresponding to the data to be analyzed;

and obtaining a knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause according to the public opinion root cause analysis model and the knowledge graph corresponding to the target clustering result.
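As a minimal illustrative sketch of the feature matching step of the second aspect, and not as a limitation of the embodiments, the following Python code selects a target clustering result by comparing the mixed feature of the data to be analyzed with one representative vector per clustering result. The use of cosine similarity, the per-cluster centroid vectors, and the names `cosine_similarity`, `match_target_cluster`, `cluster_a` and `cluster_b` are assumptions made for illustration; the specification does not prescribe a particular matching measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def match_target_cluster(mixed_feature, cluster_centroids):
    """Return the id of the clustering result whose centroid is most similar.

    cluster_centroids maps a (hypothetical) cluster id to a representative
    mixed-feature vector stored with the trained model.
    """
    return max(
        cluster_centroids,
        key=lambda cid: cosine_similarity(mixed_feature, cluster_centroids[cid]),
    )


# Hypothetical centroids and a hypothetical mixed feature of the data to be analyzed.
centroids = {"cluster_a": [1.0, 0.0, 0.0], "cluster_b": [0.0, 1.0, 1.0]}
target = match_target_cluster([0.1, 0.9, 0.8], centroids)
```

The knowledge graph associated with the selected clustering result would then serve as the starting point for the graph completion described in the last step of the second aspect.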

In a third aspect, embodiments of the present disclosure provide a training apparatus for a public opinion root analysis model, where the apparatus includes:

the feature extraction module is used for extracting features of the plurality of training data to obtain mixed features corresponding to the training data;

the multi-clustering module is used for carrying out multi-clustering processing on the plurality of training data according to the mixed characteristics corresponding to the training data to obtain at least one clustering result, wherein the clustering result is a data set comprising at least one training data;

the graph construction module is used for obtaining a knowledge graph corresponding to each clustering result and related to the public opinion root cause according to the mixed features respectively corresponding to the at least one training data included in each clustering result;

and the model training module is used for training the public opinion root cause analysis model to be trained according to the at least one clustering result and the knowledge graph corresponding to each clustering result until the public opinion root cause analysis model is obtained.

In a fourth aspect, embodiments of the present disclosure provide a public opinion root analysis device, the device including:

the data acquisition module is used for acquiring data to be analyzed;

the extraction feature module is used for carrying out feature extraction on the data to be analyzed according to a public opinion root cause analysis model to obtain mixed features corresponding to the data to be analyzed, wherein the public opinion root cause analysis model is a model trained by adopting the training method of the public opinion root cause analysis model according to the first aspect;

the feature matching module is used for carrying out feature matching on the data to be analyzed and at least one clustering result included in the public opinion root analysis model according to the mixed features corresponding to the data to be analyzed to obtain a target clustering result corresponding to the data to be analyzed;

and the root cause analysis module is used for obtaining a knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause according to the public opinion root cause analysis model and the knowledge graph corresponding to the target clustering result.

In a fifth aspect, the present description provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.

In a sixth aspect, the present description provides a computer program product storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.

In a seventh aspect, embodiments of the present disclosure provide an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.

The technical scheme provided by some embodiments of the present specification has the following beneficial effects:

In this specification, multimodal mixed features of a plurality of training data are extracted, the plurality of training data are clustered, and a knowledge graph that corresponds to the plurality of training data and relates to public opinion root causes is constructed, so that a public opinion root cause analysis model is obtained through training. Further, when public opinion root cause analysis is performed on data to be analyzed, the multimodal mixed features of the data to be analyzed are extracted based on the public opinion root cause analysis model, a knowledge graph of the data to be analyzed is constructed, and graph completion is performed on that knowledge graph to predict the public opinion root cause of the data to be analyzed. If the plurality of training data contain noise, the noise is added to the knowledge graph as ordinary nodes when the knowledge graph is constructed, which reduces the accuracy of predicting the public opinion root cause of the data to be analyzed by graph completion. The present specification therefore uses clustering to find and remove noise in the plurality of training data effectively, which improves the accuracy of knowledge graph construction and, in turn, the accuracy of public opinion root cause prediction. Compared with a method that predicts the public opinion root cause through clustering alone, the method of this specification, which builds on the clustering, offers higher prediction efficiency and richer prediction results.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present description, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a training method of a public opinion root cause analysis model according to an embodiment of the present disclosure;

Fig. 2 is a flow chart of a training method of a public opinion root cause analysis model according to an embodiment of the present disclosure;

Fig. 3 is a schematic flow chart of mixed feature extraction provided in an embodiment of the present disclosure;

Fig. 4 is a schematic flow chart of multiple clustering processing provided in an embodiment of the present disclosure;

Fig. 5 is a schematic flow chart of knowledge graph construction according to an embodiment of the present disclosure;

Fig. 6 is a flow chart of a training method of a public opinion root cause analysis model according to an embodiment of the present disclosure;

Fig. 7 is a schematic flow chart of training picture processing provided in an embodiment of the present disclosure;

Fig. 8 is a flow chart of a public opinion root cause analysis method according to an embodiment of the present disclosure;

Fig. 9 is a flow chart of a public opinion root cause analysis method according to an embodiment of the present disclosure;

Fig. 10 is a schematic structural diagram of a training device for a public opinion root cause analysis model according to an embodiment of the present disclosure;

Fig. 11 is a schematic structural diagram of a public opinion root cause analysis device according to an embodiment of the present disclosure;

Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

In the description of the present specification, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include other steps or elements that are not listed or that are inherent to such a process, method, article, or apparatus. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art in light of the specific circumstances. In addition, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The present specification is described in detail below with reference to specific examples.

It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, object features, interactive behavior features, user information, and the like referred to in this specification are all acquired with sufficient authorization.

Public opinion refers to the social attitudes that the public, as the subject, forms and holds toward social managers, enterprises, individuals and other organizations, as objects, and toward their social, moral and other orientations, within a certain social space and around the occurrence, development and change of intermediary social events. It is the sum of the beliefs, attitudes, opinions and emotions that many people express about various phenomena and problems in society. The root cause of public opinion can be understood as the cause that gave rise to the public opinion about a product.

In one embodiment, Fig. 1 shows an architecture diagram of the training method of the public opinion root cause analysis model provided in the present specification. The architecture includes the public opinion root cause analysis server 101 and a plurality of electronic devices that transmit training data or data to be analyzed related to public opinion, where the plurality of electronic devices include at least an electronic device 1021, an electronic device 1022 and an electronic device 1023. It should be understood that the numbers of public opinion root cause analysis servers 101 and electronic devices shown in Fig. 1 are only illustrative, and the embodiments of the present disclosure are not limited in this respect.

The public opinion root cause analysis server 101 may be understood as a single server or a cluster of a plurality of servers. The public opinion root cause analysis server 101 is configured to receive requests or information through a plurality of interfaces and to provide corresponding data or services based on the content of each request. The plurality of servers may be a plurality of physical servers that are independent of one another in hardware; or the plurality of servers may be a plurality of virtual servers deployed in the same hardware resource pool, where the deployment modes of the virtual servers include, but are not limited to, VMware, VirtualBox, and Virtual PC.

Electronic devices include, but are not limited to, physical or virtual servers, mobile stations (MS), mobile terminals, mobile telephones, handsets, portable equipment, Bluetooth headsets, smart watches, and the like. An electronic device may communicate with one or more core networks via a radio access network (RAN). It is understood that the embodiments of the present disclosure do not limit the type of electronic device.

In the embodiments of the present specification, the electronic device may further be provided with a display device, which may be any device capable of implementing a display function. For example, the display device may be a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), or the like. Using the display device on the electronic device, the user may view the error information reported when an application or program fails, and may send an instruction to the electronic device through the display device, for example by a long-press, click or double-click operation on the display device. The instruction causes the electronic device to send public opinion data, indicating that the target application has encountered a running error, to the public opinion root cause analysis server 101.

The public opinion root cause analysis server 101, the electronic device 1021, the electronic device 1022, and the electronic device 1023 may communicate through a communication link established based on a communication protocol, for example the gRPC protocol. gRPC is a high-performance, general-purpose, open-source remote procedure call (RPC) framework that is designed mainly for mobile applications, is based on the HTTP/2 protocol standard and the Protocol Buffers (PB) serialization protocol, and supports many development languages. In addition, the communication link may be a wired or wireless communication link. For example, the wired communication link may include an optical fiber, a twisted pair or a coaxial cable, and the wireless communication link may include a Bluetooth communication link, a Wireless Fidelity (Wi-Fi) communication link, a microwave communication link, or the like.

The public opinion root cause analysis server 101 is configured to receive public opinion data sent by a user through an electronic device, and to execute the training method of the public opinion root cause analysis model or the public opinion root cause analysis method according to the public opinion data. Public opinion data is data that includes error information from the running of an application or program. In the embodiments of the present disclosure, both training data and data to be analyzed are public opinion data; the difference is that the training data used to train the public opinion root cause analysis model includes labeling information, which relates to the public opinion root cause of the training data. It can be understood that the public opinion root cause analysis server 101, the electronic device 1021, the electronic device 1022, and the electronic device 1023 also have other service capabilities and functions for accomplishing the tasks in the following embodiments. For example, the public opinion root cause analysis server 101 also provides portal services, resource management services, CI/CD services, and the like.

In one embodiment, Fig. 2 shows a flow chart of the training method of the public opinion root cause analysis model provided in the present disclosure. The method may be implemented by a computer program and may run on a training apparatus for the public opinion root cause analysis model that is based on the von Neumann architecture. The computer program may be integrated into an application or may run as a stand-alone tool application.

Specifically, the training method of the public opinion root cause analysis model comprises the following steps:

S102, extracting features of the plurality of training data to obtain mixed features corresponding to each training data.

The training data is public opinion data that includes labeling information related to the public opinion root cause. The labeling information corresponding to the training data is labeled manually; that is, the training data is public opinion data whose public opinion root cause analysis result is known. In this specification, the public opinion root cause analysis model is trained on training data with known root cause analysis results so that it can perform public opinion root cause analysis on data to be analyzed whose public opinion root cause is unknown. It will be appreciated that the amount of training data in this specification is large, for example 50,000 training data.

In this specification, a mixed feature is feature information extracted by different feature extraction means or from different types of information. Because a single kind of feature information can characterize only one aspect of the training data, one-sided feature information alone often cannot fully characterize the training data. A public opinion root cause analysis model trained on a single kind of feature information therefore has low accuracy and may even exhibit serious weaknesses. In the embodiments of the specification, extracting the mixed features corresponding to each training data strengthens the representation of the training data and thereby improves the accuracy with which the public opinion root cause analysis model analyzes the public opinion root cause of the data to be analyzed.

Specifically, in this embodiment, feature extraction is performed on training data that includes labeling information to obtain mixed features that include the picture features and the text features of the training data. Fig. 3 is a schematic flow chart of mixed feature extraction provided in an embodiment of the present disclosure. Taking the training data (Train data) 101 as an example, the training data 101 includes a training picture (Image) 1011 and a training text (Content) 1012, and the training data 101 corresponds to the labeling information 102. The labeling information 102 corresponding to the training data 101 includes at least an error result (error msg) 1021, an error code (error Code) 1022 and an error type (error Type) 1023. The error type 1023 characterizes the type of the application or program running error corresponding to the training data 101; for example, the error type 1023 is Js error. The error result 1021 characterizes the result of the application or program running error corresponding to the training data 101; for example, the error result 1021 is "Cannot read property", indicating that a property cannot be read. The error code 1022 is the code that corresponds to a preset error type when the application or program encounters a running error; for example, the error code 1022 is h5_al_page_jserror.

It is to be understood that the training data 101 comprising training pictures 1011 and training words 1012 is only schematic, and that the present description also comprises training data comprising training pictures only and training data comprising training words only. The method for extracting the mixed features for the training data may refer to the process of extracting the picture features 103 and the process of extracting the text features 104, respectively.
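Purely as an illustrative sketch, the training data and labeling information described above could be held in a structure such as the following. The class names `LabelInfo` and `TrainingSample`, the field names, the example file path and the example feedback text are hypothetical and are not taken from the specification; only the three label values are the examples given above. Both modality fields are optional so that picture-only and text-only training data can be represented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LabelInfo:
    """Labeling information 102 (field names are illustrative assumptions)."""
    error_result: str  # corresponds to the error result 1021
    error_code: str    # corresponds to the error code 1022
    error_type: str    # corresponds to the error type 1023


@dataclass
class TrainingSample:
    """Training data 101 with an optional picture and an optional text."""
    image_path: Optional[str]  # training picture 1011, if any
    content: Optional[str]     # training text 1012, if any
    label: LabelInfo

    def modalities(self) -> List[str]:
        """List which modalities this sample actually carries."""
        present = []
        if self.image_path is not None:
            present.append("picture")
        if self.content is not None:
            present.append("text")
        return present


sample = TrainingSample(
    image_path="screenshots/example.png",             # hypothetical path
    content="The applet home page cannot be opened",  # hypothetical feedback text
    label=LabelInfo(
        error_result="Cannot read property",
        error_code="h5_al_page_jserror",
        error_type="Js error",
    ),
)
text_only = TrainingSample(image_path=None, content="page is blank", label=sample.label)
```

A feature extraction pipeline could call `modalities()` to decide whether to run the picture branch, the text branch, or both.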

The training picture 1011 and the training text 1012 included in the training data 101 are preprocessed, and features are then extracted from the preprocessed training picture 1011 and the preprocessed training text 1012 respectively, to obtain the picture features (Vision features) 103 corresponding to the training picture 1011 and the text features (Text features) 104 corresponding to the training text 1012. The picture features 103 and the text features 104 are then fused to obtain the mixed feature (Fusion) 105 corresponding to the training data 101.

In this embodiment, the picture feature 103 and the text feature 104 are extracted by a plurality of classifiers. For example, the picture feature 103 is extracted by the residual network ResNet-152 model, and the text feature 104 is extracted by the bidirectional encoder BERT model. As another example, the text feature 104 and the picture feature 103 are each extracted by an XGBoost classifier, a LightGBM classifier, or a GBDT classifier. As a further example, the picture feature 103 and the text feature 104 are extracted by a combination of classifiers; for instance, feature extraction is performed on the training text 1012 by an XGBoost classifier together with a LightGBM classifier, where the XGBoost classifier adopts a level-wise splitting strategy and the LightGBM classifier adopts a leaf-wise strategy. In this embodiment, extracting the picture features and the text features with different classifiers or classifier combinations improves the accuracy of the extracted picture features and text features, and thereby improves how well the mixed features represent the training data.

After the picture feature 103 and the text feature 104 are obtained, they are fused to obtain the mixed feature 105 corresponding to the training data 101. For example, after the assembled vector representation of the picture feature 103 and the assembled vector representation of the text feature 104 are obtained, the two vector representations are fused to obtain a fusion vector, that is, the mixed feature 105.
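One simple way to fuse two vector representations is sketched below, under the assumption that fusion is done by normalizing each vector and concatenating the results. The specification does not state the fusion operator, so the concatenation, the L2 normalization, and the names `l2_normalize` and `fuse_features` are illustrative assumptions; the example vectors are made up.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def fuse_features(picture_feature, text_feature):
    """Fuse two modality vectors by normalizing each and concatenating them.

    Normalizing first keeps one modality from dominating the fused vector
    only because its raw values are larger.
    """
    return l2_normalize(picture_feature) + l2_normalize(text_feature)


picture_feature = [3.0, 4.0]   # hypothetical picture feature 103
text_feature = [0.0, 2.0, 0.0]  # hypothetical text feature 104
mixed_feature = fuse_features(picture_feature, text_feature)  # hypothetical mixed feature 105
```

The fused vector has as many dimensions as the two inputs combined, so downstream clustering sees both modalities in a single feature space.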

In one embodiment, the preprocessing shown in Fig. 3 includes picture preprocessing and text preprocessing. The picture preprocessing includes processing steps such as noise reduction, graying, threshold segmentation and character recognition applied to the training picture 1011. For example, a contrastive learning model such as SimCLR is trained on the training pictures so that, in the picture features 103 finally obtained, similar pictures are pulled closer together and dissimilar pictures are pushed further apart.
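The graying and threshold segmentation steps can be sketched as follows. This is a minimal illustration that assumes an image is given as rows of (r, g, b) tuples, that graying uses the common ITU-R BT.601 luma weights, and that segmentation uses a fixed cutoff of 128; none of these choices is specified in this description, and the noise reduction and character recognition steps are omitted.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) pixels to grayscale with BT.601 luma weights."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def threshold_segment(gray_image, cutoff=128):
    """Binarize a grayscale image: 1 for pixels at or above the cutoff, else 0."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in gray_image]


# A tiny made-up 2x2 image: white, black, dark red, dark blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 30, 30), (20, 20, 220)],
]
gray = to_gray(image)
mask = threshold_segment(gray)
```

In this toy image only the white pixel survives the threshold, which is the kind of separation between bright text or background and darker regions that a later character recognition step would rely on.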

Text preprocessing includes processing steps such as text cleaning and semantic analysis applied to the training text 1012. Specifically, semantic analysis is a technique for summarizing and analyzing text information that extracts the core content of the text. Semantic analysis is performed on the training text 1012 to determine the keywords corresponding to the training text 1012, and feature extraction is then performed on those keywords to obtain the text features 104, which improves how well the extracted text features 104 represent the training text 1012.

For example, in the training text 1012 the same word can express different meanings in different contexts, so entity disambiguation is required. The purpose of entity disambiguation is to map the same word to different entities according to its context; for example, "JMS" may be determined to refer to an NBA basketball star when it appears in a basketball-related context and to a film director when it appears in a film-related context. Similarly, in practical applications two words may correspond to the same entity, such as "Beijing" and "the capital of the country": the two are literally different expressions but actually refer to the same entity, so entity normalization (entity resolution) needs to be performed on the multiple candidate entities in the training text 1012. Coreference resolution is also an important preprocessing step. The training text 1012 typically contains many referring expressions such as "he", "it" and "they", and preprocessing is required to determine the entity to which each of them refers. For example, if the training text 1012 is "the home page of the applet cannot be opened; it is a black screen", coreference resolution determines that "it" refers specifically to the "web page" rather than to the applet. Preprocessing such as entity disambiguation, entity normalization and coreference resolution improves how well the extracted text features represent the training text, and thereby how well the resulting mixed features represent the training data.
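Text cleaning and a very simple form of entity normalization can be sketched as follows. The alias table `ALIASES`, the function names and the example sentence are hypothetical; a practical system would derive aliases from a knowledge base and would also need the context-dependent disambiguation and coreference resolution described above, which this sketch does not attempt.

```python
import re

# Hypothetical alias table: each surface form maps to one canonical entity name.
ALIASES = {
    "mini program": "applet",
    "mini-program": "applet",
    "h5 page": "html5 page",
}


def clean_text(text):
    """Lowercase, replace punctuation (except hyphens) with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_entities(text, aliases=ALIASES):
    """Replace every known alias with its canonical entity name.

    Longer aliases are applied first so that a short alias cannot split a
    longer one that contains it.
    """
    for alias in sorted(aliases, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(alias) + r"\b", aliases[alias], text)
    return text


raw = "The Mini-Program home page is BLACK!!!  The H5 page cannot open."
normalized = normalize_entities(clean_text(raw))
```

After this step, feedback that mentions "Mini-Program" and feedback that mentions "applet" share the same token, so they are more likely to yield similar text features.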

S104, performing multiple clustering processing on the plurality of training data according to the mixed features corresponding to each training data to obtain at least one clustering result.

The clustering result is a dataset comprising at least one training data. In other words, the plurality of training data are clustered into at least one data set by a plurality of clustering processes, and there is a similarity between the mixed features of the plurality of training data in each data set. Because the embodiment of the specification adopts the mixed characteristic at least comprising the picture characteristic and the text characteristic, the training data is clustered for multiple times according to the mixed characteristic, so that the clustering effect of clustering the training data can be improved, and the accuracy of at least one obtained clustering result is improved.

Specifically, as shown in fig. 4, fig. 4 is a schematic flow chart of a multiple clustering process provided in the embodiment of the present disclosure. The plurality of training data includes at least training data 2011, training data 2012, training data 2013, and the like. And the mixed feature corresponding to each training data is obtained through S102, as shown in fig. 4, the training data 2011 corresponds to the mixed feature 2021, the training data 2012 corresponds to the mixed feature 2022, and the training data 2013 corresponds to the mixed feature 2023.

And clustering the training data according to the mixed characteristics corresponding to the training data to obtain at least one preliminary clustering result, wherein the at least one preliminary clustering result at least comprises a preliminary clustering result 2031 and a preliminary clustering result 2032. Each preliminary clustering result includes at least one training data, and as shown in fig. 4, the preliminary clustering result 2031 includes at least training data 2011, training data 2013, training data 2015, training data 2016, and training data 2018. It will be understood that the number of preliminary clustering results and the number of training data included in each preliminary clustering result in fig. 4 are only schematic, and the present specification does not limit this.

Further, for each of the plurality of preliminary clustering results, clustering processing is performed on the at least one training data it includes, and at least one clustering result corresponding to each preliminary clustering result is obtained. For example, clustering processing is performed on the training data 2011, the training data 2013, the training data 2015, the training data 2016, and the training data 2018 included in the preliminary clustering result 2031, so as to obtain a plurality of clustering results corresponding to the preliminary clustering result 2031. These clustering results include at least a clustering result 2041, a clustering result 2042, and a clustering result 2043; the clustering result 2042 includes training data 2013 and training data 2015, and the clustering result 2043 includes training data 2018. Each clustering result also includes the labeling information corresponding to each of its training data. As shown in fig. 4, the clustering result 2041 includes training data 2011 and training data 2016, the training data 2011 corresponds to the labeling information 2051, the training data 2016 corresponds to labeling information, the clustering result 2042 includes at least the labeling information 2052, and the clustering result 2043 includes at least the labeling information 2053. It will be understood that the number of clustering results and the number of training data included in each clustering result in fig. 4 are only schematic, and the present specification does not limit this.

In this embodiment, the multiple clustering processing may use a clustering algorithm or a classification algorithm in machine learning, so that the plurality of training data are clustered to obtain a plurality of clustering results. The clustering processing may use any suitable clustering algorithm conceivable by those skilled in the art, such as K-means clustering, DBSCAN clustering, or quantum clustering, or any suitable classification algorithm conceivable by those skilled in the art, such as linear regression, logistic regression, KNN, or a support vector machine.

S106, obtaining a knowledge graph corresponding to each clustering result and related to the public opinion root cause according to the mixed characteristics respectively corresponding to at least one training data included in each clustering result.

The knowledge graph is a semantic network for revealing the relationship between labels. In the specification, the knowledge graph corresponding to each clustering result comprises at least one label related to the public opinion root cause, and the label related to the public opinion root cause is obtained through mixed features and labeling information respectively corresponding to at least one training data included in each clustering result.

As shown in fig. 5, fig. 5 is a schematic flow chart of knowledge graph construction according to an embodiment of the present disclosure. Fig. 5 takes a clustering result 301 among the plurality of clustering results as an example; the method for constructing the knowledge graphs corresponding to the other clustering results may refer to fig. 5. The clustering result 301 at least comprises training data 3011, training data 3012 and the like; the labeling information 302 corresponding to the clustering result 301 at least comprises labeling information 3021, labeling information 3022 and the like; and the mixed feature 303 corresponding to the clustering result 301 at least comprises mixed feature 3031 and mixed feature 3032. The training data 3011, the labeling information 3021 and the mixed feature 3031 are in one-to-one correspondence, as are the training data 3012, the labeling information 3022 and the mixed feature 3032. It will be understood that the number of training data included in the clustering result in fig. 5 is merely illustrative and is not limited by the present disclosure; each training data in the clustering result corresponds to its own labeling information and mixed feature.

Further, knowledge graph construction is performed according to the labeling information and the mixed features corresponding to the clustering result 301, and a knowledge graph (Knowledge graph) 304 corresponding to the clustering result 301 is obtained. The knowledge graph 304 includes a plurality of labels related to the public opinion root cause. As shown in fig. 5, the knowledge graph 304 includes a feedback intention Cls points 3041 corresponding to the clustering result 301, an application identifier App id 3042, an error Type 3044, an error Code 3045, a picture feature Image 3046, a text feature Content 3047, and a feedback subject Dm subject 3048, and the labels have a hierarchical relationship. It should be understood that the knowledge graph 304 corresponding to the clustering result in fig. 5 is only schematic, and the present disclosure is not limited thereto.

Specifically, first, the plurality of mixed features corresponding to the clustering result 301 are fused to obtain a fusion feature corresponding to the clustering result 301. The fusion may be performed by calculating the mean of the plurality of mixed features, each represented as a vector, so as to obtain the fusion feature corresponding to the clustering result 301.
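The mean-based fusion described above can be sketched as follows. This is a minimal illustration only: the function name `fuse_mean`, the three-dimensional vectors and their values are assumptions made for the example and are not taken from the specification.

```python
def fuse_mean(mixed_features):
    """Element-wise mean of equally sized feature vectors (hypothetical helper)."""
    if not mixed_features:
        raise ValueError("a clustering result must contain at least one mixed feature")
    dim = len(mixed_features[0])
    if any(len(vec) != dim for vec in mixed_features):
        raise ValueError("all mixed features must have the same dimension")
    count = len(mixed_features)
    return [sum(vec[d] for vec in mixed_features) / count for d in range(dim)]


# Toy stand-ins for mixed features 3031 and 3032 (values are made up).
mixed_3031 = [1.0, 0.0, 4.0]
mixed_3032 = [3.0, 2.0, 0.0]

fusion_301 = fuse_mean([mixed_3031, mixed_3032])
print(fusion_301)  # [2.0, 1.0, 2.0]
```

Averaging keeps the fusion feature in the same vector space as the individual mixed features, so it can later be compared directly with the mixed feature of new data.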

Secondly, after the fusion feature and the labeling information 302 corresponding to the clustering result 301 are obtained, entity extraction is performed on them to generate the corresponding labels and the hierarchical relations among the labels. Entity extraction, also known as named entity recognition (NER), refers to automatically recognizing named entities from the fusion feature and the labeling information. For example, for the labeling information 3021, the error result (error msg), the error Code and the error Type included in the labeling information 3021 may be used directly as entities to generate labels; for the text "the first page of the applet is blacked out and cannot be opened", entity extraction yields the entities "applet identification ID", "first page" and "black screen" as labels. Because the labels obtained in this way are a series of discrete named entities, the association relations among the labels are determined through relation extraction, and the labels are connected through these relations to form a knowledge graph with a network structure. For example, for the text "the first page of the applet is blacked out and cannot be opened", the three labels "applet identification ID", "first page" and "black screen" are in a sequentially progressive relation.

Finally, the knowledge graph corresponding to the clustering result is generated based on the association relations among the labels. The labeling information 302 is also converted into vector representations similar to the fusion feature, the similarity between the vector representations is calculated, and vectors whose similarity is greater than a threshold are determined to be similar vectors. Combining the association relations among the labels with the similarities between the vectors, the labels corresponding to the vectors are connected and laid out, thereby generating the knowledge graph 304 corresponding to the clustering result 301 shown in fig. 5.
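One possible way to combine extracted relations with thresholded vector similarity is sketched below. It is an assumption-laden illustration: the specification does not name the similarity measure, so cosine similarity is assumed here, and `build_graph`, the label names, the two-dimensional vectors and the threshold of 0.9 are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(label_vectors, relations, threshold):
    """Return {(label, label): edge kind} from relations plus similarity links."""
    # Edges that come from relation extraction are kept as they are.
    edges = {(src, dst): "relation" for src, dst in relations}
    names = sorted(label_vectors)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if (first, second) in edges or (second, first) in edges:
                continue  # already connected by an extracted relation
            if cosine(label_vectors[first], label_vectors[second]) > threshold:
                edges[(first, second)] = "similar"
    return edges


# Toy label vectors (made up for illustration).
label_vectors = {
    "applet ID": [1.0, 0.0],
    "first page": [0.0, 1.0],
    "black screen": [0.1, 1.0],
    "error Code": [1.0, 0.1],
}
relations = [("applet ID", "first page"), ("first page", "black screen")]

graph = build_graph(label_vectors, relations, threshold=0.9)
for edge, kind in graph.items():
    print(edge, kind)
```

In this toy input only "applet ID" and "error Code" exceed the threshold, so one similarity edge is added on top of the two extracted relations.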

It can be understood that the present specification may also use other methods to obtain the knowledge graph according to the mixed features respectively corresponding to the at least one training data included in the clustering result, for example, by using a known knowledge graph construction model.

S108, training the to-be-trained public opinion root cause analysis model according to at least one clustering result and the knowledge graph corresponding to the clustering result until the public opinion root cause analysis model is obtained.

In this embodiment, a knowledge base may be constructed from the knowledge graphs corresponding to the clustering results. The knowledge base can be understood as a knowledge-based intelligent system that consists of a plurality of knowledge graphs and forms a structured, comprehensive, well-organized knowledge cluster that is easy to operate and use. The constructed knowledge base can assist both the training of the public opinion root cause analysis model to be trained and the public opinion root cause analysis of the data to be analyzed.

The public opinion root cause analysis model to be trained at least comprises a plurality of classifiers for extracting features from each training data, a clustering algorithm or clustering model for performing multiple clustering processing on the training data, an algorithm or model for constructing the knowledge graph, and the like. The model to be trained is trained with the at least one clustering result and the knowledge graph corresponding to each clustering result until training reaches a preset number of rounds or the training effect meets expectations, thereby obtaining the public opinion root cause analysis model, which is then used to perform public opinion root cause analysis on data to be analyzed.

In one embodiment, fig. 6 provides a flow chart of a training method of a public opinion root cause analysis model according to the present disclosure. The method may be implemented by a computer program and may run on a training device for the public opinion root cause analysis model based on the von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool application.

S202, acquiring a plurality of training data, wherein each training data comprises a training picture and/or training text.

A plurality of training data are acquired; the acquisition mode includes any active or passive lawful acquisition mode. For example, public opinion data uploaded by users through electronic devices is annotated by a technician to obtain training data that includes labeling information. Each training data includes a training picture and/or training text. In other words, the training data falls into three types: first training data including both a training picture and training text, second training data including only a training picture, and third training data including only training text.
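The three types of training data can be told apart with a simple check, sketched below. The function name `training_data_type` and the way a picture and text are represented (raw bytes and a string) are assumptions made for this example only.

```python
def training_data_type(picture=None, text=None):
    """Classify one training sample as 'first', 'second' or 'third' (hypothetical helper).

    picture: raw picture bytes or None; text: feedback text or None/empty.
    """
    has_picture = picture is not None
    has_text = bool(text)
    if has_picture and has_text:
        return "first"   # training picture and training text
    if has_picture:
        return "second"  # training picture only
    if has_text:
        return "third"   # training text only
    raise ValueError("training data needs a picture, text, or both")


print(training_data_type(picture=b"\x89PNG", text="home page is black"))  # first
print(training_data_type(picture=b"\x89PNG"))                             # second
print(training_data_type(text="home page is black"))                      # third
```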

S204, extracting picture features and text features corresponding to the training data according to the training data.

In one embodiment, according to target training data at least comprising training pictures, threshold segmentation processing is carried out on the training pictures in the target training data to obtain segmented pictures corresponding to the training pictures in the target training data; and extracting picture features and text features corresponding to the target training data at least comprising the segmented pictures according to the target training data at least comprising the segmented pictures.

As shown in fig. 7, fig. 7 is a schematic flow chart of training picture processing provided in an embodiment of the present disclosure. Gray-scale processing is performed on the training picture 401 included in the target training data to obtain a gray-scale picture 402 corresponding to the training picture 401. Threshold segmentation processing is performed on the gray-scale picture 402 to obtain a binary picture 403. Contours are drawn on the binary picture 403 to obtain a contour picture 404 in which the contours mark the corresponding text content. The contour picture 404 is segmented to obtain a segmented picture 405. Optical character recognition (OCR) is performed on the segmented picture 405 to obtain a text region 406 corresponding to the training picture 401. The picture features and the text features corresponding to the training data are then obtained according to the training picture 401 and its corresponding text region 406.
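The gray-scale, threshold segmentation and cropping steps can be illustrated with a small pure-Python sketch. Several simplifications are assumed: the picture is a nested list of RGB tuples, the threshold is fixed at 128 rather than chosen adaptively, a single bounding box around all dark pixels stands in for contour drawing and segmentation, and OCR is left out entirely. All function names are hypothetical.

```python
def to_gray(rgb_rows):
    """Convert rows of (R, G, B) tuples to gray values with common luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def binarize(gray_rows, threshold=128):
    """Threshold segmentation: dark pixels (candidate text) -> 1, background -> 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


def bounding_box(binary_rows):
    """Smallest (top, left, bottom, right) box containing every foreground pixel."""
    coords = [(y, x) for y, row in enumerate(binary_rows)
              for x, value in enumerate(row) if value]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


def crop(rows, box):
    """Cut the boxed region out of a picture given as nested lists."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in rows[top:bottom + 1]]


W, K = (255, 255, 255), (0, 0, 0)  # white background, black "text" pixels
picture = [
    [W, W, W, W, W],
    [W, K, K, W, W],
    [W, W, K, W, W],
    [W, W, W, W, W],
]

binary = binarize(to_gray(picture))
box = bounding_box(binary)
segmented = crop(binary, box)
print(box)        # (1, 1, 2, 2)
print(segmented)  # [[1, 1], [0, 1]]
```

In practice an image library would typically handle these steps, and the cropped region would be passed to an OCR engine to obtain the text region.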

In this embodiment, the training pictures in the training data are preprocessed and text recognition is performed, so that a text recognition result effectively reflecting the features of the pictures is obtained, interference of a large amount of background information in the training pictures on the text recognition result is removed, and the extraction accuracy of the text features and the picture features is improved.

In one embodiment, the picture features and the text features are obtained as follows: according to the target training data at least comprising the segmented picture, the picture features and text features corresponding to that target training data are extracted through a plurality of feature classifiers. For example, the picture feature 103 is extracted by a residual network ResNet152 model, and the text feature 104 is extracted by a bidirectional encoder BERT model. For another example, the text feature 104 and the picture feature 103 are extracted by an XGBoost feature classifier, a LightGBM feature classifier, or a GBDT classifier, respectively. For another example, the picture feature 103 and the text feature 104 are extracted by a combination of several feature classifiers; for instance, features are extracted from the training text 1012 by both an XGBoost feature classifier and a LightGBM feature classifier. The XGBoost feature classifier adopts a level-wise splitting strategy, and the LightGBM feature classifier adopts a leaf-wise strategy. In this embodiment, the picture features and the text features are extracted by different feature classifiers or combinations of feature classifiers, which improves the extraction accuracy of the picture features and the text features and improves how well the mixed features express the training data.

Because the training data are divided into different types, the feature extraction methods for the different types of training data are not entirely the same. Specifically, for first training data comprising a training picture and training text, the picture features and text features of the training picture and the text features of the training text in the first training data are extracted; the mixed feature of the first training data is then obtained according to the picture features and text features of the training picture and the text features of the training text in the first training data.

For another example, according to second training data comprising training pictures, extracting picture features and text features of the training pictures in the second training data; and obtaining the mixed characteristics of the second training data based on the picture characteristics and the text characteristics of the training pictures in the second training data. For the method steps of extracting the picture features and the text features, refer to S102 and fig. 3, which are not described herein again.

In this embodiment, feature extraction of different processes is performed for different types of training data, so that picture features and text features corresponding to the training data can be ensured to be obtained, and therefore, hybrid features corresponding to the training data are obtained according to the picture features and the text features, and the expression capability of the hybrid features on the training data is improved.

S206, obtaining the mixed characteristics of the training data based on the picture characteristics and the text characteristics corresponding to the training data.

Referring to S102 and fig. 3, details are not repeated here.

S208, performing first clustering processing according to the picture features in the mixed features corresponding to the training data to obtain at least one preliminary clustering result.

The preliminary clustering result is a dataset comprising at least one training data. First clustering processing is performed on the picture features in the mixed features corresponding to the training data to obtain at least one preliminary clustering result, as shown in fig. 4. In fig. 4, the first clustering processing is performed on the plurality of training data to obtain at least one preliminary clustering result, which includes at least a preliminary clustering result 2031 and a preliminary clustering result 2032. Each preliminary clustering result includes at least one training data; as shown in fig. 4, the preliminary clustering result 2031 includes at least training data 2011, training data 2013, training data 2015, training data 2016, and training data 2018.

S210, performing second clustering processing according to the text features in the mixed features respectively corresponding to at least one training data included in each preliminary clustering result to obtain at least one clustering result corresponding to each preliminary clustering result.

Further, according to the plurality of preliminary clustering results and the text features in the mixed features respectively corresponding to the at least one training data included in each preliminary clustering result, clustering processing is performed on the at least one training data included in each preliminary clustering result, and at least one clustering result corresponding to each preliminary clustering result is obtained. For example, as shown in fig. 4, second clustering processing is performed on the preliminary clustering result 2031 to obtain a plurality of clustering results corresponding to the preliminary clustering result 2031. The plurality of clustering results corresponding to the preliminary clustering result 2031 include at least a clustering result 2041, a clustering result 2042, and a clustering result 2043.

In one embodiment, the first clustering process corresponds to a DBSCAN cluster and the second clustering process corresponds to a K-means cluster. In this embodiment, the training data is clustered for multiple times according to the hybrid features, so that a clustering effect of clustering the multiple training data can be improved, and accuracy of at least one obtained clustering result is improved.
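The two-stage scheme (DBSCAN on picture features, then K-means on text features inside each preliminary cluster) can be sketched in plain Python as follows. This is an illustrative toy, not the implementation of the embodiment: the minimal `dbscan` and `kmeans` helpers, the deterministic K-means initialization, the parameters `eps`, `min_pts` and `k`, and every feature value are assumptions. The sample ids merely reuse the reference numerals of fig. 4 for readability.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = [m for m in range(len(points)) if dist(points[j], points[m]) <= eps]
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def kmeans(points, k, iters=20):
    """Minimal K-means; the first k points are used as initial centers."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return assign


def two_stage_cluster(samples, eps, min_pts, k):
    """First cluster on picture features, then on text features per preliminary cluster."""
    picture_labels = dbscan([s["picture"] for s in samples], eps, min_pts)
    clusters = {}
    for pre in sorted(set(picture_labels)):  # noise (-1) is kept as its own group here
        idx = [i for i, label in enumerate(picture_labels) if label == pre]
        text_labels = kmeans([samples[i]["text"] for i in idx], min(k, len(idx)))
        for i, sub in zip(idx, text_labels):
            clusters.setdefault((pre, sub), []).append(samples[i]["id"])
    return clusters


samples = [
    {"id": 2011, "picture": [0.0, 0.0], "text": [0.0]},
    {"id": 2013, "picture": [0.1, 0.0], "text": [5.0]},
    {"id": 2015, "picture": [0.0, 0.1], "text": [5.1]},
    {"id": 2016, "picture": [0.1, 0.1], "text": [0.1]},
    {"id": 2012, "picture": [10.0, 10.0], "text": [0.0]},
    {"id": 2014, "picture": [10.1, 10.0], "text": [0.2]},
]

clusters = two_stage_cluster(samples, eps=0.5, min_pts=2, k=2)
for key, ids in sorted(clusters.items()):
    print(key, ids)
```

With these toy values, samples 2011 and 2016 end up in one final cluster and samples 2013 and 2015 in another, both inside the same preliminary cluster.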

S212, obtaining a plurality of labels corresponding to the clustering results and related to the public opinion root cause according to the mixed characteristics respectively corresponding to at least one training data included in the clustering results.

As shown in fig. 5, a knowledge graph is constructed according to the labeling information and the mixed features corresponding to the clustering result 301, so as to obtain a knowledge graph (Knowledge graph) 304 corresponding to the clustering result 301. First, the plurality of mixed features corresponding to the clustering result 301 are fused to obtain the fusion feature corresponding to the clustering result 301. Secondly, after the fusion feature and the labeling information 302 corresponding to the clustering result 301 are obtained, entity extraction is performed on them to generate the corresponding labels and the hierarchical relations among the labels. Refer to fig. 5 for details, which are not repeated here.

S214, obtaining a knowledge graph corresponding to each clustering result according to the labels corresponding to each clustering result.

Finally, the knowledge graph corresponding to the clustering result is generated based on the association relations among the labels. As shown in fig. 5, the knowledge graph 304 includes a plurality of labels related to the public opinion root cause, including at least the feedback intention Cls points 3041 corresponding to the clustering result 301, the application identifier App id 3042, the error Type 3044, the error Code 3045, the picture feature Image 3046, the text feature Content 3047, and the feedback subject Dm subject 3048, and the labels have a hierarchical relationship.

In one embodiment, fig. 8 provides a flow chart of a public opinion root cause analysis method according to the present specification. The method may be implemented by a computer program and may run on a public opinion root cause analysis device based on the von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool application.

Specifically, the public opinion root cause analysis method comprises the following steps:

S302, acquiring data to be analyzed.

The data to be analyzed is acquired, and the acquisition mode comprises any active or passive legal acquisition mode. For example, when a user finds that an applet has an operation error while using the applet, public opinion data corresponding to the operation error is uploaded through the electronic device.

S304, extracting features of the data to be analyzed according to the public opinion root cause analysis model to obtain mixed features corresponding to the data to be analyzed.

The public opinion root cause analysis model is obtained by training the public opinion root cause analysis model to be trained through the steps shown in figs. 2-6. The processor performs feature extraction on the data to be analyzed based on the public opinion root cause analysis model to obtain the mixed feature corresponding to the data to be analyzed. Fig. 8 is a flow chart of a public opinion root cause analysis method according to an embodiment of the present disclosure. A mixed feature 502 corresponding to the data 501 to be analyzed is obtained through the public opinion root cause analysis model, and the mixed feature 502 comprises the picture features and text features corresponding to the data 501 to be analyzed.

S306, performing feature matching between the data to be analyzed and at least one clustering result included in the public opinion root cause analysis model according to the mixed features corresponding to the data to be analyzed, and obtaining a target clustering result corresponding to the data to be analyzed.

The public opinion root cause analysis model comprises at least one clustering result, and each clustering result corresponds to a fusion feature. The fusion feature of each clustering result is obtained as follows: each of the at least one training data included in the clustering result corresponds to a mixed feature, and the mean of these mixed features is calculated to obtain the fusion feature of the clustering result, namely its central feature.

As shown in fig. 9, the public opinion root cause analysis model at least includes a clustering result 5031, a clustering result 5032, and the like. The clustering result 5031 at least includes training data 5041, training data 5042, and the like, and the fusion feature corresponding to the clustering result 5031 is obtained from the mixed features respectively corresponding to the training data 5041 and the training data 5042. Feature matching is performed between the data 501 to be analyzed and each of the clustering result 5031 and the clustering result 5032 to obtain the target clustering result 5031 that is most similar to the data to be analyzed. For example, the feature matching may be performed as follows: the cosine similarity between the mixed feature 502 and the fusion feature corresponding to the clustering result 5031, and between the mixed feature 502 and the fusion feature corresponding to the clustering result 5032, is calculated, and the clustering result with the highest similarity (that is, the smallest cosine distance) is taken as the target clustering result corresponding to the data 501 to be analyzed. It will be appreciated that showing the clustering result 5031 as the target clustering result of the data 501 to be analyzed in fig. 9 is merely illustrative.
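A minimal sketch of this matching step is given below. It selects the most similar clustering result, that is, the one whose fusion feature has the highest cosine similarity (equivalently the smallest cosine distance) to the mixed feature of the data to be analyzed. The function names and all vector values are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_cluster(mixed_feature, fusion_features):
    """Return the id of the clustering result whose fusion feature is most similar."""
    return max(fusion_features,
               key=lambda cid: cosine_similarity(mixed_feature, fusion_features[cid]))


# Toy fusion (central) features keyed by clustering result; values are made up.
fusion_features = {
    5031: [0.9, 0.1, 0.0],
    5032: [0.0, 0.2, 0.9],
}
mixed_502 = [1.0, 0.2, 0.1]

target = match_cluster(mixed_502, fusion_features)
print(target)  # 5031
```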

S308, obtaining a knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause according to the public opinion root cause analysis model and the knowledge graph corresponding to the target clustering result.

Specifically, an initial knowledge graph corresponding to the data to be analyzed is obtained according to the public opinion root cause analysis model and the mixed features corresponding to the data to be analyzed; the initial knowledge graph is then predicted and completed according to the knowledge graph corresponding to the target clustering result to obtain the knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause.

The public opinion root cause analysis model constructs a knowledge graph from the mixed features corresponding to the data to be analyzed to obtain the initial knowledge graph corresponding to the data to be analyzed. The initial knowledge graph is incomplete: labels related to the public opinion root cause are missing, or association relations among a plurality of labels are missing. In this embodiment, the initial knowledge graph is completed by graph completion, so that a complete knowledge graph corresponding to the data to be analyzed is obtained, and the public opinion root cause corresponding to the data to be analyzed is obtained through this knowledge graph.

Graph completion may also be referred to as link prediction. Specifically, in this embodiment, the plurality of labels generated from the mixed features in the initial knowledge graph, which lacks the public opinion root cause, are taken as Target nodes. Feature matching is performed between the initial knowledge graph corresponding to the data to be analyzed and the plurality of labels in the knowledge graph corresponding to the target clustering result, and Matched nodes similar to the Target nodes are screened out of the knowledge graph corresponding to the target clustering result. Similarity is then calculated between the Target nodes and the Matched nodes, and the Matched nodes ranked in the top N by similarity in the knowledge graph corresponding to the target clustering result are taken. Finally, the initial knowledge graph is predicted and completed according to these N nodes. N is a value set as required; for example, N is 5.
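The top-N matching idea can be illustrated with the sketch below. The specification does not state how the N matched nodes are used to complete the graph, so the rule used here is an assumption: each edge leaving a top-ranked Matched node is copied onto the Target node it matched. Cosine similarity, the function names, the node vectors and the example edges are likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_graph(target_nodes, initial_edges, cluster_nodes, cluster_edges, n=5):
    """Copy edges of the top-n Matched nodes onto the Target nodes they match."""
    scored = [(cosine(t_vec, m_vec), t_name, m_name)
              for t_name, t_vec in target_nodes.items()
              for m_name, m_vec in cluster_nodes.items()]
    scored.sort(key=lambda item: item[0], reverse=True)  # most similar pairs first
    completed = set(initial_edges)
    for _, t_name, m_name in scored[:n]:
        for src, dst in cluster_edges:
            if src == m_name:
                completed.add((t_name, dst))  # assumed completion rule
    return completed


# Toy data: one Target node from the initial graph, three nodes from the
# knowledge graph of the target clustering result (all values made up).
target_nodes = {"black screen": [0.0, 1.0]}
cluster_nodes = {
    "black screen": [0.1, 1.0],
    "first page": [0.7, 0.7],
    "error Code": [1.0, 0.0],
}
cluster_edges = [
    ("black screen", "applet is being updated"),
    ("error Code", "hypothetical other cause"),
]

completed = complete_graph(target_nodes, set(), cluster_nodes, cluster_edges, n=1)
print(completed)
```

With n=1 only the best match is used, so the single predicted link attaches the cause recorded for the matched "black screen" node to the Target node.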

As shown in fig. 9, after the target clustering result 5031 corresponding to the data 501 to be analyzed is obtained, an initial knowledge graph 506 corresponding to the data 501 to be analyzed is obtained according to the public opinion root cause analysis model. Graph completion is then performed on the initial knowledge graph 506 according to the knowledge graph 5051 corresponding to the target clustering result, thereby obtaining a knowledge graph 507 corresponding to the data 501 to be analyzed. The public opinion root cause corresponding to the data 501 to be analyzed can be obtained through the knowledge graph 507. For example, if the data 501 to be analyzed includes the text "the first page of the applet is blacked out and cannot be opened", the root cause finally obtained for the data 501 to be analyzed through the steps shown in fig. 9 is "the applet version does not match the mobile phone model" or "the applet is being updated".

The following are device embodiments of the present specification that may be used to perform method embodiments of the present specification. For details not disclosed in the device embodiments of the present specification, please refer to the method embodiments of the present specification.

Referring to fig. 10, a schematic structural diagram of a training device for a public opinion root cause analysis model according to an exemplary embodiment of the present disclosure is shown. The training device of the public opinion root cause analysis model may be implemented as all or part of a device by software, hardware, or a combination of both. The training device of the public opinion root cause analysis model comprises a feature extraction module 1001, a multiple clustering module 1002, a graph construction module 1003 and a model training module 1004.

The feature extraction module 1001 is configured to perform feature extraction on a plurality of training data, so as to obtain a hybrid feature corresponding to each training data;

the multiple clustering module 1002 is configured to perform multiple clustering processing on the plurality of training data according to the hybrid features corresponding to each training data, so as to obtain at least one clustering result, where the clustering result is a dataset including at least one training data;

the graph construction module 1003 is configured to obtain a knowledge graph corresponding to each clustering result and related to the public opinion root cause according to the mixed features respectively corresponding to the at least one training data included in each clustering result;

The model training module 1004 is configured to train the public opinion root cause analysis model to be trained according to the at least one clustering result and the knowledge graph corresponding to each clustering result until the public opinion root cause analysis model is obtained.

In one embodiment, the feature extraction module 1001 includes:

the data acquisition unit is used for acquiring the plurality of training data, and each training data comprises a training picture and/or training text;

the feature extraction unit is used for extracting picture features and text features corresponding to the training data according to the training data;

and the mixed extraction unit is used for obtaining the mixed characteristics of the training data based on the picture characteristics and the text characteristics corresponding to the training data.

In one embodiment, the mixed extraction unit includes:

the threshold segmentation subunit is used for carrying out threshold segmentation processing on the training pictures in the target training data according to the target training data at least comprising the training pictures to obtain segmented pictures corresponding to the training pictures in the target training data;

and the mixed extraction subunit is used for extracting the picture features and the text features corresponding to the target training data at least comprising the segmented pictures according to the target training data at least comprising the segmented pictures.

In one embodiment, the mixed extraction subunit is specifically configured to extract, according to target training data at least including the segmented picture, the picture features and text features corresponding to the target training data at least including the segmented picture through a plurality of feature classifiers.

In one embodiment, the feature extraction unit includes:

the first extraction subunit is used for extracting the picture features and text features of the training picture and the text features of the training text in the first training data according to the first training data comprising the training picture and the training text;

and the first mixing subunit is used for obtaining the mixed features of the first training data according to the picture features and text features of the training picture and the text features of the training text in the first training data.

In one embodiment, the feature extraction unit includes:

the second extraction subunit is used for extracting, according to second training data comprising a training picture, the picture features and text features of the training picture in the second training data;

and the second mixing subunit is used for obtaining the mixed features of the second training data based on the picture features and text features of the training picture in the second training data.
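As a non-limiting illustration, the two cases above (a training picture with accompanying training text, and a training picture alone) may be sketched as follows. The container `MixedFeature`, the function `mix_features`, and the use of simple concatenation for the text part are assumptions made for illustration only; the specification does not fix how the features are combined.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MixedFeature:
    """Hypothetical container: a mixed feature keeps both feature kinds."""
    picture: List[float]  # picture features of the training picture
    text: List[float]     # text features (from the picture and, if any, the training text)


def mix_features(
    picture_features: Sequence[float],
    picture_text_features: Sequence[float],
    training_text_features: Optional[Sequence[float]] = None,
) -> MixedFeature:
    """Combine picture and text features into one mixed feature.

    Concatenating the text parts is an assumption, not the patented rule.
    """
    text = list(picture_text_features)
    if training_text_features is not None:
        # First training data: the picture comes with separate training text.
        text += list(training_text_features)
    # Second training data: only the text found in the picture is available.
    return MixedFeature(picture=list(picture_features), text=text)
```

Keeping the picture part and the text part separate inside the mixed feature allows a later step to use each part on its own.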

In one embodiment, the mixed features corresponding to the training data include picture features and text features corresponding to the training data;

a multiple clustering module 1002, comprising:

the first clustering unit is used for carrying out first clustering processing according to the picture features in the mixed features corresponding to the training data to obtain at least one preliminary clustering result, wherein the preliminary clustering result is a data set comprising at least one training data;

and the second clustering unit is used for performing second clustering processing according to the text features in the mixed features respectively corresponding to the training data included in each preliminary clustering result to obtain at least one clustering result corresponding to each preliminary clustering result.
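As a non-limiting illustration, the two clustering passes may be sketched as follows: a first pass over picture features produces preliminary clustering results, and a second pass over text features refines each preliminary result. The greedy "leader" clustering used here, the Euclidean distance, and the two radius parameters are assumptions chosen to keep the sketch short; the specification does not name a clustering algorithm.

```python
import math
from typing import Iterable, List, Sequence


def leader_cluster(
    vectors: Sequence[Sequence[float]],
    indices: Iterable[int],
    radius: float,
) -> List[List[int]]:
    """Greedy clustering: an item joins the first cluster whose first
    member (its leader) lies within `radius`, otherwise it starts a new one."""
    clusters: List[List[int]] = []
    for i in indices:
        for cluster in clusters:
            if math.dist(vectors[i], vectors[cluster[0]]) <= radius:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def two_stage_cluster(
    picture_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
    picture_radius: float,
    text_radius: float,
) -> List[List[int]]:
    """Return clustering results as lists of training-data indices."""
    results: List[List[int]] = []
    # First clustering: picture features only, over all training data.
    preliminary = leader_cluster(picture_feats, range(len(picture_feats)), picture_radius)
    for group in preliminary:
        # Second clustering: text features, restricted to one preliminary result.
        results.extend(leader_cluster(text_feats, group, text_radius))
    return results
```

Each preliminary clustering result therefore yields at least one final clustering result, and no final result spans two preliminary results.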

In one embodiment, the map construction module 1003 includes:

the label generating unit is used for obtaining, according to the mixed features respectively corresponding to at least one training data included in each clustering result, a plurality of labels corresponding to each clustering result and related to the public opinion root cause;

and the map construction unit is used for obtaining the knowledge graph corresponding to each clustering result according to the plurality of labels corresponding to each clustering result.
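As a non-limiting illustration, a knowledge graph for one clustering result may be represented as a set of (head, relation, tail) triples built from its labels. The triple representation, the function `build_knowledge_graph`, and the example relation names are assumptions made for illustration; how the labels themselves are derived from the mixed features is not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one graph edge is a (head, relation, tail) triple.
Triple = Tuple[str, str, str]


def build_knowledge_graph(cluster_id: str, labels: Dict[str, List[str]]) -> Set[Triple]:
    """Link a clustering result to each of its root-cause-related labels.

    `labels` maps a relation name (for example "symptom" or "root_cause",
    both illustrative) to the label values found for the clustering result.
    """
    graph: Set[Triple] = set()
    for relation, values in labels.items():
        for value in values:
            graph.add((cluster_id, relation, value))
    return graph
```

For example, calling `build_knowledge_graph("cluster_0", {"root_cause": ["expired certificate"]})` yields a graph containing the single triple `("cluster_0", "root_cause", "expired certificate")`.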

It should be noted that, when the training device for the public opinion root cause analysis model provided in the above embodiment performs the training method of the public opinion root cause analysis model, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the training device for the public opinion root cause analysis model provided in the above embodiment and the embodiments of the training method of the public opinion root cause analysis model belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.

Referring to fig. 11, a schematic structural diagram of a public opinion root analysis device according to an exemplary embodiment of the present disclosure is shown. The public opinion root analysis device may be implemented as all or part of the device by software, hardware, or a combination of both. The public opinion root cause analysis device comprises a data acquisition module 1101, an extracted feature module 1102, a feature matching module 1103 and a root cause analysis module 1104.

A data acquisition module 1101, configured to acquire data to be analyzed;

The extracted feature module 1102 is configured to perform feature extraction on the data to be analyzed according to a public opinion root cause analysis model, so as to obtain a mixed feature corresponding to the data to be analyzed, where the public opinion root cause analysis model is a model trained by the training method of the public opinion root cause analysis model according to any one of the above embodiments;

the feature matching module 1103 is configured to perform feature matching on the data to be analyzed and at least one clustering result included in the public opinion root analysis model according to the mixed feature corresponding to the data to be analyzed, so as to obtain a target clustering result corresponding to the data to be analyzed;

and the root cause analysis module 1104 is configured to obtain a knowledge graph related to the public opinion root cause corresponding to the data to be analyzed according to the public opinion root cause analysis model and the knowledge graph corresponding to the target clustering result.
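As a non-limiting illustration, the feature matching performed by the feature matching module 1103 may be sketched as a nearest-centroid lookup. Representing each clustering result by a centroid vector, the use of Euclidean distance, and the function name `match_cluster` are assumptions; the specification does not state how the match is computed.

```python
import math
from typing import Dict, Sequence


def match_cluster(
    mixed_feature: Sequence[float],
    cluster_centroids: Dict[str, Sequence[float]],
) -> str:
    """Return the id of the clustering result whose (assumed) centroid
    is closest to the mixed feature of the data to be analyzed."""
    if not cluster_centroids:
        raise ValueError("the model contains no clustering results")
    return min(
        cluster_centroids,
        key=lambda cid: math.dist(mixed_feature, cluster_centroids[cid]),
    )
```

The returned identifier designates the target clustering result, whose knowledge graph is then used by the root cause analysis module 1104.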

In one embodiment, root cause analysis module 1104 includes:

the first analysis unit is used for obtaining an initial knowledge graph corresponding to the data to be analyzed according to the public opinion root cause analysis model and the mixed features corresponding to the data to be analyzed;

and the second analysis unit is used for carrying out prediction complementation on the initial knowledge graph according to the knowledge graph corresponding to the target clustering result to obtain the knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause.
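As a non-limiting illustration, the completion of the initial knowledge graph may be sketched as follows. Here a simple rule stands in for the prediction step: any relation present in the knowledge graph of the target clustering result but absent from the initial knowledge graph is copied over. The triple representation, the function `complete_graph`, and this copying rule are assumptions made for illustration and are not the prediction method of the specification.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail), an assumed representation


def complete_graph(
    initial: Set[Triple],
    cluster_graph: Set[Triple],
    data_id: str,
    cluster_id: str,
) -> Set[Triple]:
    """Fill relations missing from the initial graph using the cluster graph."""
    # Relations already known for the data to be analyzed are kept as they are.
    known_relations = {rel for head, rel, _ in initial if head == data_id}
    completed = set(initial)
    for head, rel, tail in cluster_graph:
        if head == cluster_id and rel not in known_relations:
            # Attach the cluster-level fact to the data to be analyzed.
            completed.add((data_id, rel, tail))
    return completed
```

In this sketch, data whose initial knowledge graph contains only a symptom would receive the root cause recorded for its target clustering result, while the symptom it already carries is left unchanged.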

It should be noted that, when the public opinion root cause analysis device provided in the above embodiment performs the public opinion root cause analysis method, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the public opinion root cause analysis device and the public opinion root cause analysis method provided in the above embodiments belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.

The foregoing embodiment numbers of the present specification are merely for description, and do not represent advantages or disadvantages of the embodiments.

The embodiment of the present disclosure further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded and executed by a processor, and the specific execution process may refer to the specific description of the embodiment shown in fig. 1 to 9 and will not be repeated herein.

The present disclosure further provides a computer program product storing at least one instruction, where the at least one instruction is loaded and executed by a processor to implement the training method of the public opinion root cause analysis model or the public opinion root cause analysis method according to the embodiments shown in fig. 1-9; for the specific implementation process, refer to the description of the embodiments shown in fig. 1-9, which is not repeated herein.

Referring to fig. 12, a schematic structural diagram of an electronic device is provided in an embodiment of the present disclosure. As shown in fig. 12, the electronic device 1200 may include: at least one processor 1201, at least one network interface 1204, a user interface 1203, a memory 1205, at least one communication bus 1202.

The communication bus 1202 is used to enable communication among these components.

The user interface 1203 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1203 may further include a standard wired interface and a standard wireless interface.

The network interface 1204 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).

The processor 1201 may include one or more processing cores. The processor 1201 connects the various parts of the electronic device 1200 using various interfaces and lines, and performs the functions of the electronic device 1200 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1205 and by invoking data stored in the memory 1205. Optionally, the processor 1201 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1201 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 1201 and may instead be implemented as a separate chip.

The memory 1205 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1205 includes a non-transitory computer-readable storage medium. The memory 1205 may be used to store instructions, programs, code sets, or instruction sets. The memory 1205 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 1205 may also be at least one storage device located remotely from the processor 1201. As shown in fig. 12, the memory 1205, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for training the public opinion root cause analysis model or for public opinion root cause analysis.

In the electronic device 1200 shown in fig. 12, the user interface 1203 is mainly used as an interface for providing input for a user, and obtains data input by the user; and the processor 1201 may be configured to invoke a training application of the public opinion root analysis model stored in the memory 1205 and specifically perform the following operations:

Extracting features of the plurality of training data to obtain mixed features corresponding to the training data;

In one embodiment, when performing the feature extraction on the plurality of training data to obtain the mixed features corresponding to each of the training data, the processor 1201 specifically performs:

acquiring a plurality of training data, wherein each training data comprises a training picture and/or training text;

extracting picture features and text features corresponding to the training data according to the training data;

and obtaining the mixed characteristic of the training data based on the picture characteristic and the text characteristic corresponding to each training data.

In one embodiment, when extracting, according to the plurality of training data, the picture features and the text features corresponding to each of the training data, the processor 1201 specifically performs:

according to target training data at least comprising training pictures, threshold segmentation processing is carried out on the training pictures in the target training data, and segmented pictures corresponding to the training pictures in the target training data are obtained;

and extracting picture features and text features corresponding to the target training data at least comprising the segmented pictures according to the target training data at least comprising the segmented pictures.
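As a non-limiting illustration, the threshold segmentation of a training picture may be sketched as follows. The grayscale picture is represented as a list of pixel rows with values from 0 to 255, and a fixed global threshold of 128 is used; both the representation and the threshold value are assumptions, since the specification does not state how the threshold is chosen.

```python
from typing import List


def threshold_segment(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Binarize a grayscale picture: pixels at or above `threshold`
    become 255 (foreground), all others become 0 (background)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]
```

The segmented picture has the same dimensions as the training picture and can then be passed to the feature extraction step in its place.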

In one embodiment, when extracting, according to the target training data including at least the segmented picture, the picture features and the text features corresponding to that target training data, the processor 1201 specifically performs:

and extracting picture features and text features corresponding to the target training data at least comprising the segmented pictures through a plurality of feature classifiers according to the target training data at least comprising the segmented pictures.

according to first training data comprising a training picture and training text, extracting the picture features and text features of the training picture and the text features of the training text in the first training data;

and obtaining the mixed features of the first training data according to the picture features and text features of the training picture and the text features of the training text in the first training data.

extracting picture features and text features of training pictures in second training data according to the second training data comprising the training pictures;

and obtaining the mixed characteristic of the second training data based on the picture characteristic and the text characteristic of the training picture in the second training data.

In one embodiment, when performing the multiple clustering processing on the plurality of training data according to the mixed features corresponding to the respective training data to obtain at least one clustering result, the processor 1201 specifically performs:

Performing first clustering according to the picture features in the mixed features corresponding to the training data to obtain at least one preliminary clustering result, wherein the preliminary clustering result is a data set comprising at least one training data;

and performing second clustering processing according to the text features in the mixed features respectively corresponding to at least one training data included in each preliminary clustering result to obtain at least one clustering result corresponding to each preliminary clustering result.

In one embodiment, when obtaining, according to the mixed features respectively corresponding to at least one training data included in each of the clustering results, a knowledge graph corresponding to each of the clustering results and related to a public opinion root cause, the processor 1201 specifically performs the following steps:

obtaining a plurality of labels corresponding to the clustering results and related to the public opinion root cause according to at least one mixed characteristic respectively corresponding to the training data included in the clustering results;

and obtaining a knowledge graph corresponding to each clustering result according to the plurality of labels corresponding to each clustering result.

In one embodiment, the processor 1201 may be configured to invoke the public opinion root analysis application stored in the memory 1205 and specifically perform the following operations:

Acquiring data to be analyzed;

extracting features of the data to be analyzed according to a public opinion root analysis model to obtain mixed features corresponding to the data to be analyzed, wherein the public opinion root analysis model is a model trained by adopting the training method of any one of the public opinion root analysis models;

In one embodiment, when obtaining, according to the public opinion root cause analysis model and the knowledge graph corresponding to the target clustering result, the knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause, the processor 1201 specifically performs the following steps:

obtaining an initial knowledge graph corresponding to the data to be analyzed according to the public opinion root cause analysis model and the mixed features corresponding to the data to be analyzed;

and carrying out prediction completion on the initial knowledge graph according to the knowledge graph corresponding to the target clustering result to obtain the knowledge graph corresponding to the data to be analyzed and related to the public opinion root cause.

Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.

The foregoing discloses only preferred embodiments of the present specification and is not to be construed as limiting the scope of the claims; equivalent changes made in accordance with the claims of the present specification still fall within the scope covered by the present specification.

Claims

1. A training method of a public opinion root cause analysis model, the method comprising:

2. The training method of the public opinion root analysis model of claim 1, wherein the feature extraction is performed on the plurality of training data to obtain the hybrid features corresponding to the training data, and the training method comprises:

3. The training method of the public opinion root analysis model of claim 2, wherein the extracting, according to the plurality of training data, the picture features and the text features corresponding to the training data includes:

4. The training method of the public opinion root analysis model of claim 3, wherein the extracting, according to the target training data at least including the segmented picture, the picture features and the text features corresponding to the target training data at least including the segmented picture includes:

5. The training method of the public opinion root analysis model of claim 2, wherein the extracting, according to the plurality of training data, the picture features and the text features corresponding to the training data includes:

6. The training method of the public opinion root analysis model of claim 2 or 5, wherein the extracting, according to the plurality of training data, the picture feature and the text feature corresponding to each training data includes:

7. The training method of the public opinion root analysis model of claim 1, wherein the mixed features corresponding to the training data comprise picture features and text features corresponding to the training data;

and performing multiple clustering processing on the training data according to the mixed characteristics corresponding to the training data to obtain at least one clustering result, wherein the clustering processing comprises the following steps:

8. The training method of the public opinion root analysis model of claim 1, wherein the obtaining the knowledge graph corresponding to each clustering result and related to the public opinion root according to the mixed features respectively corresponding to at least one training data included in each clustering result comprises:

9. A method of public opinion root cause analysis, the method comprising:

acquiring data to be analyzed;

extracting features of the data to be analyzed according to a public opinion root analysis model to obtain mixed features corresponding to the data to be analyzed, wherein the public opinion root analysis model is a model trained by the training method of the public opinion root analysis model according to any one of claims 1-8;

10. The public opinion root cause analysis method of claim 9, wherein the obtaining the knowledge graph related to the public opinion root cause corresponding to the data to be analyzed according to the knowledge graph corresponding to the public opinion root cause analysis model and the target clustering result comprises:

11. A training device for a public opinion root analysis model, the device comprising:

12. A public opinion root cause analysis device, the device comprising:

the data acquisition module is used for acquiring data to be analyzed;

the extraction feature module is used for carrying out feature extraction on the data to be analyzed according to a public opinion root analysis model to obtain mixed features corresponding to the data to be analyzed, wherein the public opinion root analysis model is a model trained by the training method of the public opinion root analysis model according to any one of claims 1-8;

13. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 10.

14. A computer program product storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any of claims 1 to 10.

15. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-10.