CN113157866B

CN113157866B - Data analysis method, device, computer equipment and storage medium

Info

Publication number: CN113157866B
Application number: CN202110459121.2A
Authority: CN
Inventors: 黄振宇; 陈思业; 吴文哲; 王磊; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2024-05-14
Anticipated expiration: 2041-04-27
Also published as: CN113157866A; WO2022227196A1

Abstract

Embodiments of the application provide a data analysis method, a data analysis device, computer equipment, and a storage medium. The method is applied to the technical field of big data and may comprise the following steps: obtaining public opinion data; performing entity extraction on the public opinion data to obtain a plurality of entities; performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs; determining the standard name corresponding to each entity included in each of the plurality of relationship pairs; and mapping the relationship among the entities included in each relationship pair to the relationship among the standard names corresponding to those entities. With the application, effective information can be extracted from public opinion data to discover potential relationships among things. The application also relates to blockchain technology; for example, digest information of the public opinion data can be obtained from a blockchain, and the public opinion data can be queried based on the digest information.

Description

Data analysis method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of data analysis technologies, and in particular, to a data analysis method, a data analysis device, a computer device, and a storage medium.

Background

With the globalization of information, media such as the internet have become an indispensable part of people's daily lives. Public opinion data such as internet public opinion has become a main channel through which people express their views. Network public opinion is social public opinion expressed through the internet. As network public opinion develops and spreads, it can have various effects, positive or negative, on individuals, businesses, industries, and even society. In practice, factors such as the emergence of new things and a lack of background knowledge make it harder to extract effective information from public opinion data, and therefore harder to discover potential connections between things. How to extract effective information from public opinion data in order to discover such potential connections is thus an important problem.

Disclosure of Invention

The embodiments of the application provide a data analysis method, a data analysis device, computer equipment, and a storage medium, which can extract effective information from public opinion data to discover potential connections between things.

In a first aspect, an embodiment of the present application provides a data analysis method, including:

obtaining public opinion data;

performing entity extraction on the public opinion data to obtain a plurality of entities;

performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs;

determining the standard name corresponding to each entity included in each of the plurality of relationship pairs;

and mapping the relationship among the entities included in each relationship pair to the relationship among the standard names corresponding to the entities included in each relationship pair.

Optionally, the entity extraction of the public opinion data to obtain a plurality of entities includes:

Encoding a plurality of words included in the public opinion data to obtain a first word vector set, wherein the first word vector set comprises word vectors of each word in the plurality of words;

vocabulary enhancement is carried out on the first word vector set to obtain a second word vector set;

and carrying out entity recognition based on the second word vector set to obtain a plurality of entities.

Optionally, the extracting relationships between the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs includes:

obtaining a target entity pair according to the plurality of entities;

determining, from the public opinion data, a target sentence that includes the target entity pair, and labeling the position information of each entity of the target entity pair in the target sentence;

inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, so as to obtain the relationship between the entities of the target entity pair;

and constructing a target relationship pair according to the target entity pair and the relationship between its entities, so as to obtain a plurality of relationship pairs including the target relationship pair.

Optionally, the inputting the target sentence and the position information of each entity in the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, to obtain a relationship between each entity in the target entity pair, includes:

performing encoding, using an encoding layer included in the relationship prediction model, according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result of each entity of the target entity pair;

performing pooling on the encoding result of each entity of the target entity pair, using a pooling layer included in the relationship prediction model, to obtain a pooling result of each entity of the target entity pair;

and performing a classification operation on the pooling result of each entity of the target entity pair, using a classification layer included in the relationship prediction model, to obtain the relationship between the entities of the target entity pair.

Optionally, the determining the standard naming corresponding to each entity included in each of the plurality of relationship pairs includes:

Matching each first type entity in the plurality of relation pairs with each standard name included in a database to determine the standard name corresponding to each first type entity from the database;

and determining the standard naming corresponding to each entity of the second type in the plurality of relation pairs according to the corresponding relation between the entity of the second type and the standard naming, wherein the first type is different from the second type.

Optionally, the matching the entity of each first type in the plurality of relationship pairs with each standard name included in the database to determine, from the database, the standard name corresponding to each entity of the first type includes:

calculating, through a short text matching model, a relationship coefficient between each first-type entity in the plurality of relationship pairs and each standard name included in the database;

and determining from the database, according to these relationship coefficients, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.

Optionally, the method further comprises:

performing emotion polarity analysis on a target sentence in the public opinion data to obtain a target entity included in the target sentence and an emotion polarity label of the target entity;

determining a target standard name corresponding to the target entity and other standard names associated with the target standard name;

and determining, according to the emotion polarity label of the target entity, the influence of the public opinion data on the target standard name corresponding to the target entity and the influence of the public opinion data on the other standard names.

In a second aspect, an embodiment of the present application provides a data analysis apparatus, including:

an acquisition module, used for obtaining public opinion data;

an entity extraction module, used for performing entity extraction on the public opinion data to obtain a plurality of entities;

a relationship extraction module, used for performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs;

a determining module, used for determining the standard name corresponding to each entity included in each of the plurality of relationship pairs;

and a mapping module, used for mapping the relationship among the entities included in each relationship pair to the relationship among the standard names corresponding to the entities included in each relationship pair.

In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory, the processor and the memory being connected to each other, wherein the memory is configured to store a computer program, the computer program including program instructions, the processor being configured to invoke the program instructions to perform the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program for execution by a processor to implement the method of the first aspect.

In summary, the computer device may obtain public opinion data and perform entity extraction on it to obtain a plurality of entities. The computer device then performs relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs, and determines the standard name corresponding to each entity included in each of the relationship pairs, so that the relationship among the entities included in each relationship pair is mapped to the relationship among the corresponding standard names. This process can extract effective information from the public opinion data to discover potential relationships among things.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a data analysis method according to an embodiment of the present application;

FIG. 2 is a flow chart of another data analysis method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a data analysis device according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

Fig. 1 is a schematic flow chart of a data analysis method according to an embodiment of the application. The method can be applied to computer equipment, and the computer equipment can be a server or an intelligent terminal. Specifically, the method may comprise the steps of:

S101, public opinion data is obtained.

S102, entity extraction is carried out on the public opinion data to obtain a plurality of entities.

Wherein public opinion data includes, but is not limited to, news, web talk, articles published by individuals/authorities, etc. The plurality of entities may include at least one type of entity of: a first type of entity (e.g., an industrial entity), a second type of entity (e.g., a business entity), a time, a place, a person. In one embodiment, the plurality of entities may also include other types of entities, not listed herein.

In one embodiment, the computer device may perform entity extraction on the public opinion data to obtain the plurality of entities as follows: the computer device encodes a plurality of words included in the public opinion data to obtain a first word vector set, where the first word vector set comprises the word vector of each of the plurality of words; the computer device then performs vocabulary enhancement on the first word vector set to obtain a second word vector set, and performs entity recognition based on the second word vector set to obtain the plurality of entities. In one embodiment, the computer device may encode the plurality of words included in the public opinion data through a first BERT (Bidirectional Encoder Representations from Transformers) model to obtain the first word vector set. In one embodiment, the computer device may perform vocabulary enhancement on the first word vector set through a lexicon augmentation (Lexicon Augment) method, such as the Soft Lexicon method, to obtain the second word vector set. In one embodiment, the computer device may perform entity recognition on the second word vector set through an LSTM+CRF model to obtain the plurality of entities.

In one embodiment, the computer device may perform vocabulary enhancement on the first word vector set to obtain the second word vector set as follows: the computer device obtains a target word encoding set for a target word among the plurality of words, where the target word is any one of the plurality of words and the target word encoding set comprises the word encodings of the words corresponding to each of a plurality of position tags; the computer device concatenates the target word encoding set with the word vector of the target word in the first word vector set to obtain a concatenated word vector corresponding to the target word, and generates the second word vector set from the concatenated word vectors. The word vector of the target word is its basic vector representation, and the concatenated word vector is its final vector representation; in other words, the target word encoding set is used to enhance the vector representation of the target word.

In one embodiment, the target word encoding set may be a BMES word encoding set, and the plurality of position tags may include tag B, tag M, tag E, and tag S, where B denotes a start position, M denotes a middle position, E denotes an end position, and S denotes a single (standalone) position. The BMES word encoding set can be obtained by formula 1.1:

$e^{s}(B, M, E, S) = [v^{s}(B); v^{s}(M); v^{s}(E); v^{s}(S)]$ (formula 1.1)

The concatenation of the BMES word encoding set obtained from formula 1.1 with the word vector of the target word can be represented by formula 1.2:

$x^{c} \leftarrow [x^{c}; e^{s}(B, M, E, S)]$ (formula 1.2)

In formulas 1.1 and 1.2, $e^{s}$ denotes the BMES word encoding set, $v^{s}$ denotes a word encoding, and $x^{c}$ denotes the word vector of the target word. According to formula 1.2, $x^{c}$ is concatenated with the $v^{s}$ of the words corresponding to tag B, tag M, tag E, and tag S, respectively, to obtain the concatenated word vector corresponding to the target word.
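Formulas 1.1 and 1.2 can be illustrated with a minimal sketch. It assumes, as in the published Soft Lexicon method, that the unit being enhanced is a single character of the sentence (called a "word" above) and that the B, M, E, and S sets hold the lexicon words in which that character appears at the start, middle, end, or alone. The function names, the toy lexicon, and the use of a plain mean to compute each $v^{s}$ are assumptions made for illustration and are not taken from the application.

```python
def bmes_word_sets(sentence, lexicon):
    """For each position, collect the lexicon words covering it, grouped by tag."""
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].append(word)
            else:
                sets[start]["B"].append(word)
                sets[end - 1]["E"].append(word)
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].append(word)
    return sets


def mean_vector(vectors, dim):
    """Assumed pooling for v_s: element-wise mean, or zeros for an empty set."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def enhance_vectors(sentence, unit_vectors, lexicon_vectors):
    """Formula 1.2: x_c <- [x_c; e_s(B, M, E, S)] for every position."""
    dim = len(next(iter(lexicon_vectors.values())))
    enhanced = []
    tag_sets_per_position = bmes_word_sets(sentence, lexicon_vectors)
    for x_c, tag_sets in zip(unit_vectors, tag_sets_per_position):
        e_s = []
        for tag in "BMES":  # formula 1.1: [v_s(B); v_s(M); v_s(E); v_s(S)]
            e_s += mean_vector([lexicon_vectors[w] for w in tag_sets[tag]], dim)
        enhanced.append(list(x_c) + e_s)
    return enhanced
```

With base vectors of dimension d and lexicon vectors of dimension k, every enhanced vector has dimension d + 4k, which is the input size the downstream entity recognizer would have to accept.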

And S103, extracting the relationships of the entities according to the public opinion data to obtain a plurality of relationship pairs.

In one embodiment, the computer device may perform relationship extraction on the plurality of entities according to the public opinion data as follows: the computer device uses a relationship extraction tool to perform relationship extraction on the plurality of entities according to the public opinion data, so as to obtain the plurality of relationship pairs.

In one embodiment, the computer device may also perform relationship extraction on the plurality of entities according to the public opinion data as follows: the computer device obtains a target entity pair according to the plurality of entities, determines from the public opinion data a target sentence that includes the target entity pair, and labels the position information of each entity of the target entity pair in the target sentence; the computer device then inputs the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, obtaining the relationship between the entities of the target entity pair, constructs a target relationship pair according to the target entity pair and the relationship between its entities, and thereby obtains a plurality of relationship pairs including the target relationship pair. In one embodiment, the computer device obtains the target entity pair by determining it from the plurality of entities. The target entity pair may consist of two first-type entities, two second-type entities, or one first-type entity and one second-type entity. The target sentence is a sentence that includes the target entity pair. In general, a sentence may correspond to one or more entity pairs; in most cases, a sentence corresponds to one entity pair. In one embodiment, the position information may be start position information. The relationship prediction model may be, for example, a second BERT model. The target entity pair may be expressed as (entity x, entity y), and the target relationship pair may be expressed, for example, as (relationship r, entity x, entity y).
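The preparation of inputs for the relationship prediction model can be sketched as follows. This is only an illustration under stated assumptions: entities are matched by exact substring search, every pair of entities that co-occur in a sentence is treated as a candidate target entity pair, and the position information is the start offset of the first occurrence. The function name and the sample sentences are hypothetical.

```python
from itertools import combinations


def candidate_pairs(sentences, entities):
    """Return (sentence, (entity_x, start_x), (entity_y, start_y)) tuples.

    Assumption: two entities form a candidate target entity pair whenever
    both occur in the same sentence; str.find gives the start position.
    """
    results = []
    for sentence in sentences:
        present = [e for e in entities if e in sentence]
        for x, y in combinations(present, 2):
            results.append((sentence, (x, sentence.find(x)), (y, sentence.find(y))))
    return results


# Usage with made-up public opinion sentences and entities.
pairs = candidate_pairs(
    ["Acme supplies batteries to Globex.", "Weather is fine."],
    ["Acme", "batteries", "Globex"],
)
```

Each returned tuple carries the target sentence plus the start position of both entities, which is the information the text above says is fed to the relationship prediction model.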

In one embodiment, the computer device may input the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model to perform relationship prediction as follows: the computer device uses an encoding layer included in the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, obtaining an encoding result of each entity of the target entity pair; the computer device uses a pooling layer included in the relationship prediction model to pool the encoding result of each entity, obtaining a pooling result of each entity of the target entity pair, and uses a classification layer included in the relationship prediction model to perform a classification operation on the pooling results, obtaining the relationship between the entities of the target entity pair. In this way, the relationship between entities can be predicted by the relationship prediction model.

In one embodiment, the classification layer included in the relationship prediction model may perform the classification operation on the pooling result of each entity of the target entity pair as follows: the computer device substitutes the pooling results into formula 1.3 to calculate, for each of a plurality of relationships, the probability that the entities of the target entity pair have that relationship, and selects the relationship with the largest probability value as the relationship between the entities of the target entity pair.

$P(r_{ij} \mid x, e_i, e_j) = \mathrm{softmax}(W[o_i : o_j] + b)$ (formula 1.3)

Here, x denotes the target sentence, and $r_{ij}$ denotes the relationship between the entities included in the target entity pair. $e_i$ and $e_j$ denote entity i and entity j, which make up the target entity pair. $o_i$ and $o_j$ denote the pooling result of entity i and the pooling result of entity j, respectively. W (the weight) and b are parameters of the classification layer.
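The pooling and classification steps around formula 1.3 can be sketched in a few lines of plain Python. The sketch assumes mean pooling over the token encodings of each entity span (the text does not say which pooling is used), treats $[o_i : o_j]$ as vector concatenation, and uses hypothetical relation labels and hand-picked parameters rather than trained ones.

```python
import math


def mean_pool(vectors):
    """Assumed pooling layer: element-wise mean over an entity's token encodings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(token_encodings, span_i, span_j, W, b, relations):
    """Formula 1.3: P(r | x, e_i, e_j) = softmax(W [o_i : o_j] + b)."""
    o_i = mean_pool(token_encodings[span_i[0]:span_i[1]])
    o_j = mean_pool(token_encodings[span_j[0]:span_j[1]])
    features = o_i + o_j  # list concatenation stands in for [o_i : o_j]
    scores = [
        sum(w * f for w, f in zip(row, features)) + bias
        for row, bias in zip(W, b)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return relations[best], probs
```

The returned label is the relationship with the largest probability value, matching the selection rule described above; the full probability list is returned as well so a caller could apply its own threshold.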

In one embodiment, the loss function used in the process of training the relational predictive model is a logarithmic loss function.

S104, determining standard naming corresponding to each entity included in each relation pair of the relation pairs.

In the embodiment of the application, the computer device can have two different ways of determining standard naming for the first type of entity and the second type of entity. Two different ways of determining standard nomenclature will be set forth below.

In one embodiment, the manner in which the computer device determines the standard naming for each entity included in each of the plurality of relationship pairs may be: the computer device matches each of the first type of entity in the plurality of relationship pairs with each of the standard designations included in the database to determine from the database the standard designation corresponding to each of the first type of entity. In one embodiment, the method by which the computer determines the standard naming corresponding to the first type of entity may be referred to as a short text matching algorithm. It should be noted that, in the embodiment of the present application, not every relationship pair necessarily includes the first type of entity. Also, not every relationship pair necessarily includes an entity of the second type.

In one embodiment, the computer device may match each first-type entity in the plurality of relationship pairs with each standard name included in the database as follows: the computer device calculates, through a short text matching model, a relationship coefficient between each first-type entity in the plurality of relationship pairs and each standard name included in the database, and, according to these relationship coefficients, determines from the database the standard name whose relationship coefficient with a given first-type entity is greater than or equal to a preset value, taking it as the standard name corresponding to that first-type entity. In one embodiment, the short text matching model may be an ESIM model, which is a model capable of performing short text matching.

For example, assume that the plurality of relationship pairs includes relationship pair 1, that relationship pair 1 includes entity 1 and entity 2, and that entity 1 and entity 2 are both first-type entities. The database includes standard name 1 and standard name 2. The computer device can calculate, through the short text matching model, the relationship coefficient between entity 1 and standard name 1 and the relationship coefficient between entity 1 and standard name 2, and then select, from standard name 1 and standard name 2, the one with the largest relationship coefficient as the standard name corresponding to entity 1. Likewise, the computer device can calculate the relationship coefficient between entity 2 and standard name 1 and the relationship coefficient between entity 2 and standard name 2, and then select the one with the largest relationship coefficient as the standard name corresponding to entity 2.
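The selection logic in this example, combined with the preset-value check described earlier, can be sketched as below. Note the loud assumption: `difflib.SequenceMatcher` from the Python standard library is used purely as a stand-in scoring function and is not the ESIM model; the function names, the preset value of 0.6, and the sample standard names are likewise hypothetical.

```python
from difflib import SequenceMatcher


def relationship_coefficient(entity, standard_name):
    """Stand-in scorer (NOT the ESIM model): character-level similarity in [0, 1]."""
    return SequenceMatcher(None, entity.lower(), standard_name.lower()).ratio()


def match_standard_name(entity, standard_names, preset_value=0.6):
    """Pick the standard name with the largest coefficient, if it meets the preset value."""
    scored = [(relationship_coefficient(entity, name), name) for name in standard_names]
    best_score, best_name = max(scored)
    return best_name if best_score >= preset_value else None


# Usage with made-up industry names.
DATABASE = ["New Energy Vehicles", "Commercial Banking"]
matched = match_standard_name("new energy cars", DATABASE)
```

Returning `None` when no coefficient reaches the preset value keeps unmatched entities explicit instead of forcing them onto the nearest standard name.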

In one embodiment, the computer device calculates, through a short text matching model, a relationship coefficient between each entity of the first type in the plurality of relationship pairs and each standard name included in the database, by:

① Using a BiLSTM algorithm, encode one first-type entity among the first-type entities and one standard name selected from the database, respectively, to obtain an encoding result of the first-type entity and an encoding result of the standard name. The encoding result of the first-type entity comprises the encoding result of each word included in the first-type entity, and the encoding result of the standard name comprises the encoding result of each word included in the standard name. The encoding of each word included in the first-type entity and of each word included in the standard name follows formula 1.4 and formula 1.5, which give, respectively, the encoding result of the i-th word included in the first-type entity and the encoding result of the i-th word included in the standard name. $l_a$ denotes the length of the first-type entity, and $l_b$ denotes the length of the standard name.

② Input the encoding result of the first-type entity and the encoding result of the standard name into a Local Inference Modeling layer. The Local Inference Modeling layer calculates the similarity between each word included in the first-type entity and each word included in the selected standard name, and performs local inference on the first-type entity and the standard name according to the calculated similarities, to obtain local inference information of the first-type entity and local inference information of the standard name. The local inference information of the first-type entity may include local inference information of each word included in the first-type entity, and the local inference information of the standard name may include local inference information of each word included in the standard name. The local inference process follows formula 1.6 and formula 1.7, which give, respectively, the local inference information of the i-th word of the first-type entity and the local inference information of the j-th word of the standard name. $e_{ij}$ denotes the similarity between the i-th word of the first-type entity and the j-th word of the standard name, $e_{ik}$ denotes the similarity between the i-th word of the first-type entity and the k-th word of the standard name, and $e_{kj}$ denotes the similarity between the k-th word of the first-type entity and the j-th word of the standard name.

③ Calculate the enhanced local inference information (enhancement of local inference information) of the first-type entity according to the encoding result of the first-type entity and the local inference information of the first-type entity, and calculate the enhanced local inference information of the standard name according to the encoding result of the standard name and the local inference information of the standard name. The calculation follows a formula in which the enhanced local inference information is denoted by m.

④ Input the enhanced local inference information into a max pooling layer and a fully connected layer, and output a similarity coefficient between the first-type entity and the standard name as the relationship coefficient between the first-type entity and the standard name.
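Steps ② to ④ can be sketched in plain Python. Because the formulas themselves are not reproduced in this text, the sketch falls back on the forms used in the published ESIM model and assumes they correspond to formulas 1.6 and 1.7 and to the enhancement formula: similarity as a dot product, local inference as softmax-weighted sums over $e_{ik}$ and $e_{kj}$, and enhancement as the concatenation of the encoding, the local inference information, their difference, and their element-wise product. The inputs `a_bar` and `b_bar` stand for the BiLSTM encodings from step ①, and the single sigmoid unit standing in for the fully connected layer is also an assumption.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def local_inference(a_bar, b_bar):
    """Step 2: e[i][j] is the similarity of entity word i and standard-name word j."""
    e = [[dot(a_i, b_j) for b_j in b_bar] for a_i in a_bar]
    # Entity word i attends over all standard-name words (normalised over k in e_ik).
    a_tilde = [weighted_sum(softmax(row), b_bar) for row in e]
    # Standard-name word j attends over all entity words (normalised over k in e_kj).
    b_tilde = [
        weighted_sum(softmax([e[i][j] for i in range(len(a_bar))]), a_bar)
        for j in range(len(b_bar))
    ]
    return a_tilde, b_tilde


def enhance(x_bar, x_tilde):
    """Step 3 (assumed form): m = [x_bar; x_tilde; x_bar - x_tilde; x_bar * x_tilde]."""
    return [
        xb + xt + [p - q for p, q in zip(xb, xt)] + [p * q for p, q in zip(xb, xt)]
        for xb, xt in zip(x_bar, x_tilde)
    ]


def max_pool(rows):
    return [max(col) for col in zip(*rows)]


def relationship_coefficient(a_bar, b_bar, weights, bias):
    """Step 4: max pooling, then one sigmoid unit as a stand-in fully connected layer."""
    a_tilde, b_tilde = local_inference(a_bar, b_bar)
    pooled = max_pool(enhance(a_bar, a_tilde)) + max_pool(enhance(b_bar, b_tilde))
    return 1.0 / (1.0 + math.exp(-(dot(weights, pooled) + bias)))
```

For encodings of dimension d, the pooled vector has dimension 8d (4d for each side), so `weights` must have that length; in a trained model these weights would be learned rather than supplied by hand.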

In one embodiment, the computer device may also determine the standard name corresponding to each entity included in each of the plurality of relationship pairs as follows: the computer device determines the standard name corresponding to each second-type entity in the plurality of relationship pairs according to the correspondence between second-type entities and standard names, where the first type is different from the second type. In one embodiment, the method by which the computer device determines the standard name corresponding to a second-type entity may be referred to as a full-name/abbreviation matching algorithm. In one embodiment, the computer device determines the standard name corresponding to each second-type entity from another database, according to each second-type entity in the plurality of relationship pairs and the correspondence between second-type entities and standard names recorded in that other database.
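A lookup of this kind can be sketched with a dictionary. The sketch assumes the recorded correspondence lists, for each standard name, the full names and abbreviations that refer to it; the company name, its aliases, and the function names are invented for illustration.

```python
def build_alias_index(correspondence):
    """Map every recorded full name or abbreviation to its standard name."""
    index = {}
    for standard_name, aliases in correspondence.items():
        index[standard_name] = standard_name
        for alias in aliases:
            index[alias] = standard_name
    return index


def standard_name_for(entity, index):
    """Exact-match lookup; returns None when the entity is not recorded."""
    return index.get(entity.strip())


# Hypothetical correspondence table for second-type (enterprise) entities.
CORRESPONDENCE = {
    "Example Battery Technology Co., Ltd.": ["Example Battery", "EBT"],
}
INDEX = build_alias_index(CORRESPONDENCE)
```

Unlike the short text matching used for first-type entities, this lookup is exact, so an alias that is missing from the correspondence table simply yields no standard name.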

S105, mapping the relation among the entities included in each relation pair into the relation among standard naming corresponding to the entities included in each relation pair.

In the embodiment of the application, the computer equipment can determine the relation among the entities included in each relation pair as the relation among the standard names corresponding to the entities included in each relation pair. The process can map the relationship between the entities extracted according to the public opinion data to the corresponding standard naming.
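The mapping in S105 amounts to rewriting each relationship pair in terms of standard names. A minimal sketch follows, assuming relationship pairs are stored as (relationship r, entity x, entity y) tuples and that pairs containing an entity with no resolved standard name are skipped; the function name and sample data are hypothetical.

```python
def map_relation_pairs(relation_pairs, standard_name_of):
    """Rewrite (relation, entity_x, entity_y) using the standard name of each entity."""
    mapped = []
    for relation, entity_x, entity_y in relation_pairs:
        name_x = standard_name_of.get(entity_x)
        name_y = standard_name_of.get(entity_y)
        if name_x is None or name_y is None:
            continue  # assumption: drop pairs with an unresolved entity
        mapped.append((relation, name_x, name_y))
    return mapped


# Usage with made-up entities and standard names.
mapped_pairs = map_relation_pairs(
    [("supplier_of", "EBT", "new energy cars"), ("competitor_of", "EBT", "???")],
    {"EBT": "Example Battery Technology Co., Ltd.", "new energy cars": "New Energy Vehicles"},
)
```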

In one embodiment, the computer device may construct a relationship network based on the standard names corresponding to the entities included in each relationship pair and the relationships among those standard names. In a practical application scenario, the embodiment of the application can be used to mine in depth the relationships between industries and enterprises involved in public opinion data, thereby constructing an industry-enterprise relationship network that supports subsequent conduction deduction and manual decision-making.

In one embodiment, the computer device may update an existing relationship network with the relationships among the standard names corresponding to the entities included in each relationship pair.
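Constructing and updating the relationship network can be sketched with a nested dictionary. The sketch assumes the network is an undirected graph whose nodes are standard names and whose edges carry a set of relationship labels; the application does not specify the data structure, so this representation and the function names are illustrative only.

```python
from collections import defaultdict


def new_network():
    """Empty network: standard name -> associated standard name -> set of relations."""
    return defaultdict(lambda: defaultdict(set))


def update_network(network, mapped_pairs):
    """Add (relation, name_x, name_y) triples; re-adding an existing edge is harmless."""
    for relation, name_x, name_y in mapped_pairs:
        network[name_x][name_y].add(relation)
        network[name_y][name_x].add(relation)  # assumption: edges are undirected
    return network


def associated_names(network, name):
    """Standard names directly connected to the given standard name."""
    return sorted(network[name]) if name in network else []
```

The same `update_network` call serves both cases described above: building a network from scratch starts from `new_network()`, while updating passes in the existing network.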

In the embodiment shown in Fig. 1, the computer device may obtain public opinion data and perform entity extraction on it to obtain a plurality of entities. The computer device can then perform relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs, and determine the standard name corresponding to each entity included in each of the relationship pairs, so that the relationship among the entities included in each relationship pair is mapped to the relationship among the corresponding standard names.

Fig. 2 is a flow chart of another data analysis method according to an embodiment of the application. The method can be applied to computer equipment, and the computer equipment can be a server or an intelligent terminal. Specifically, the method may comprise the steps of:

S201, public opinion data is obtained.

S202, entity extraction is carried out on the public opinion data to obtain a plurality of entities.

And S203, extracting the relationships of the entities according to the public opinion data to obtain a plurality of relationship pairs.

S204, determining standard naming corresponding to each entity included in each relation pair of the relation pairs.

S205, mapping the relation among the entities included in each relation pair into the relation among standard naming corresponding to the entities included in each relation pair.

For steps S201 to S205, reference may be made to steps S101 to S105 in the embodiment of Fig. 1; they are not described again here.

S206, performing emotion polarity analysis on a target sentence in the public opinion data to obtain a target entity included in the target sentence and an emotion polarity label of the target entity.

The target sentence may be, for example, the title of the public opinion data, the body text of the public opinion data, or the full text of the public opinion data. In one embodiment, the target entity may be an entity of the aforementioned second type, for example a business entity. The emotion polarity labels may be, for example, positive labels and/or negative labels, or may be other emotion polarity labels.

In one embodiment, the computer device may perform the emotion polarity analysis as follows: the computer device uses the third BERT model to perform emotion polarity analysis on the target sentence in the public opinion data, so as to obtain the target entity included in the target sentence and the emotion polarity label of the target entity.

S207, determining the target standard names corresponding to the target entities and other standard names related to the target standard names.

In one embodiment, the computer device may determine the target standard naming corresponding to the target entity in the aforementioned manner of determining the standard naming corresponding to each entity included in each of the plurality of relationship pairs. In one embodiment, the computer device may determine, according to the correspondence between the second type of entity and the standard naming, the target standard naming corresponding to the target entity.

In one embodiment, the computer device may determine the other standard names associated with the target standard name by searching the relationship network.

S208, according to the emotion polarity label of the target entity, determining the influence condition of the public opinion data on the target standard naming corresponding to the target entity and the influence condition of the public opinion data on the other standard naming.

The target standard name is the standard name corresponding to the target entity. The other standard names associated with the target standard name may be standard names corresponding to entities of the first type and/or standard names corresponding to entities of the second type with which the target standard name is associated.

Alternatively, the computer device may determine the target standard name corresponding to the target entity and the entities corresponding to the other standard names associated with the target standard name, and then determine, according to the emotion polarity label of the target entity, the influence of the public opinion data on the target entity and on the entities corresponding to the other standard names.

In an actual application scenario, public opinion data may involve multiple subjects, and the emotion polarity of each subject may differ. Unlike a traditional emotion classification task, the embodiment of the application can make full use of the sequence labeling capability of the BERT model when training the initial BERT model, and label the subjects of a multi-subject sentence with different emotion polarity labels. For example, consider the sentence "The stock price of #News rises sharply, and the stock price of *Easy falls sharply!", where "#News" is one enterprise and "*Easy" is another enterprise. The emotion polarity labels of the sentence are constructed as follows:

As can be seen from the table, the sample sentences are labeled using the BIO labeling scheme, and the initial BERT model is trained with the labeled sample sentences to obtain a BERT model for emotion polarity analysis, which serves as the third BERT model. The labels include B-POS, I-POS, B-NEG, I-NEG, and O. B-POS indicates that the character is at the beginning (Begin) of an entity and that the emotion polarity of that entity is positive (Positive); I-POS indicates that the character is inside (Inside) an entity and that the emotion polarity of that entity is positive. Similarly, B-NEG indicates that the character is at the beginning of an entity whose emotion polarity is negative (Negative), I-NEG indicates that the character is inside an entity whose emotion polarity is negative, and O indicates that the character is outside (Outside) any entity. With this labeling scheme, the BERT model can learn during training that the first enterprise in the example sentence is positive and the second enterprise is negative, so that a BERT model for emotion analysis that can distinguish multiple subjects is obtained.
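The BIO labeling scheme with polarity suffixes can be illustrated with a short sketch. It assumes that the entity spans and their polarities are already known for a sample sentence, and it uses word-level English tokens for readability, whereas the application labels individual characters. The function names, the span format, and the sample sentence are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, "POS" or "NEG")


def bio_polarity_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Build one B-/I-/O tag per token from entity spans with polarities."""
    tags = ["O"] * len(tokens)
    for start, end, polarity in spans:
        tags[start] = f"B-{polarity}"
        for i in range(start + 1, end):
            tags[i] = f"I-{polarity}"
    return tags


def decode_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Recover (entity text, polarity) pairs from a tagged sequence."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    polarity = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), polarity))
            current, polarity = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == polarity:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), polarity))
            current, polarity = [], ""
    if current:
        entities.append((" ".join(current), polarity))
    return entities


# Hypothetical two-subject sentence: one enterprise positive, one negative.
tokens = ["Alpha", "Corp", "shares", "surge", "while", "Beta", "Inc", "shares", "plunge"]
tags = bio_polarity_tags(tokens, [(0, 2, "POS"), (5, 7, "NEG")])
entities = decode_entities(tokens, tags)
```

In a sequence labeling setup of this kind, `bio_polarity_tags` corresponds to preparing training labels, and `decode_entities` corresponds to reading the target entities and their emotion polarity labels off the model's predicted tags.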

In one embodiment, the computer device may first determine the relationship between the target standard name and the other standard names, or the relationship between the target entity and the entities corresponding to the other standard names, and then determine, according to the determined relationship and the emotion polarity label of the target entity, the influence of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.

In actual production and life, industries and enterprises are always a focus of industrial analysis and research. For governments, industry research can effectively assist policy decisions and macro regulation; for enterprises, industry dynamics can reflect industry prospects and point to new business directions; for individuals, industry analysis can inform investment and career choices. Industrial analysis based on public opinion makes it possible to better grasp the dynamics and development of an industry and to mine relationships among industries and enterprises that have not yet been discovered. The embodiment of the application enables the computer device to deduce the influence on related industries or enterprises after a positive or negative event concerning a certain subject occurs. For example, from massive public opinion data the computer device may mine that the upstream supplier of enterprise A is enterprise B and that enterprise A belongs to industry I. If strongly positive news about enterprise A appears, supplier B and industry I may both be affected: the upstream supplier B is expected to benefit from the positive news about enterprise A, and industry I is expected to benefit as well. With this method, the system can mine the hidden information in public opinion and obtain the positive outlook for enterprise B and industry I.
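The enterprise A, enterprise B, and industry I example can be sketched as a lookup over a small relationship network. The sketch assumes that the network is stored as an adjacency list keyed by standard name and that, for the two relation types in the example, the polarity of the target carries over unchanged to directly related names. The application does not specify the propagation rule, so the rule, the relation labels, and all identifiers below are hypothetical.

```python
from typing import Dict, List, Tuple

# Relationship network over standard names: name -> [(relation, related name)].
RelationNetwork = Dict[str, List[Tuple[str, str]]]

# Assumed rule: these relation types pass the polarity on unchanged.
SAME_DIRECTION_RELATIONS = {"upstream_supplier", "belongs_to_industry"}


def deduce_influence(
    network: RelationNetwork, target: str, polarity: str
) -> Dict[str, str]:
    """Return the deduced polarity for the target and its direct neighbours."""
    influence = {target: polarity}
    for relation, other in network.get(target, []):
        if relation in SAME_DIRECTION_RELATIONS:
            influence.setdefault(other, polarity)
    return influence


network: RelationNetwork = {
    "Enterprise A": [
        ("upstream_supplier", "Enterprise B"),
        ("belongs_to_industry", "Industry I"),
    ]
}
influence = deduce_influence(network, "Enterprise A", "POS")
```

Relations outside the assumed set are ignored here; a fuller rule set would have to state how each relation type transforms the polarity.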

It can be seen that, in the embodiment shown in fig. 2, the computer device may further perform emotion polarity analysis on the target sentence in the public opinion data to obtain the target entity included in the target sentence and the emotion polarity label of the target entity, and determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name. The computer device then determines, according to the emotion polarity label of the target entity, the influence of the public opinion data on the target standard name and on the other standard names. Based on emotion polarity analysis, this process can effectively deduce how influence is conducted across enterprises and industries.

The application also relates to blockchain technology. For example, digest information of the public opinion data can be obtained from a blockchain, and the public opinion data can be queried based on the digest information. Alternatively, official data can be synchronized from the blockchain node associated with each entity of the second type among the plurality of entities of the second type, and false data in the public opinion data can be replaced based on the official data, thereby ensuring the correctness of the subsequently mapped relationships and of the deduced influence.
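Querying public opinion data by its digest (the "abstract information" obtained from the blockchain) can be sketched as follows. The sketch assumes that the digest is a SHA-256 hash of the text and that the data itself is kept in an ordinary key-value store; the application names neither the hash algorithm nor the storage layout, and the class `OpinionStore` is hypothetical.

```python
import hashlib
from typing import Dict, Optional


def digest_of(text: str) -> str:
    """Compute a hex digest of the text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class OpinionStore:
    """In-memory stand-in for a store of public opinion data keyed by digest."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def put(self, text: str) -> str:
        """Store the text and return the digest that would be recorded on chain."""
        digest = digest_of(text)
        self._by_digest[digest] = text
        return digest

    def get(self, digest: str) -> Optional[str]:
        """Look up the text for a digest obtained from the blockchain."""
        return self._by_digest.get(digest)


store = OpinionStore()
sample_digest = store.put("Sample public opinion article text.")
found = store.get(sample_digest)
```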

Fig. 3 is a schematic structural diagram of a data analysis device according to an embodiment of the application. The apparatus may be applied to a computer device. Specifically, the apparatus may include:

The obtaining module 301 is configured to obtain public opinion data.

The entity extraction module 302 is configured to perform entity extraction on the public opinion data to obtain a plurality of entities.

The relationship extraction module 303 is configured to perform relationship extraction on the plurality of entities according to the public opinion data, so as to obtain a plurality of relationship pairs.

The determining module 304 is configured to determine the standard name corresponding to each entity included in each of the plurality of relationship pairs.

The mapping module 305 is configured to map the relationship between the entities included in each relationship pair to the relationship between the standard names corresponding to the entities included in that relationship pair.

In an optional implementation, when performing entity extraction on the public opinion data to obtain a plurality of entities, the entity extraction module 302 is specifically configured to: encode a plurality of words included in the public opinion data to obtain a first word vector set, where the first word vector set includes a word vector of each of the plurality of words; perform vocabulary enhancement on the first word vector set to obtain a second word vector set; and perform entity recognition based on the second word vector set to obtain the plurality of entities.
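One possible reading of the vocabulary enhancement step is sketched below: for each token, the embeddings of the lexicon words that cover the token are grouped by the token's position inside the word, pooled per position tag, and spliced onto the token's own vector. The tag set B/M/E/S (begin, middle, end, single), the use of mean pooling, and every identifier in the sketch are assumptions and are not taken from the application.

```python
from typing import Dict, List

Vector = List[float]
POSITION_TAGS = ("B", "M", "E", "S")  # assumed tag set


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; return zeros when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def enhance(
    tokens: List[str],
    token_vectors: List[Vector],
    lexicon: Dict[str, Vector],
    lexicon_dim: int,
) -> List[Vector]:
    """Splice per-position lexicon encodings onto each token vector."""
    matched = [{tag: [] for tag in POSITION_TAGS} for _ in tokens]
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, n + 1):
            word = "".join(tokens[start:end])
            if word not in lexicon:
                continue
            vec = lexicon[word]
            if end - start == 1:
                matched[start]["S"].append(vec)
            else:
                matched[start]["B"].append(vec)
                matched[end - 1]["E"].append(vec)
                for mid in range(start + 1, end - 1):
                    matched[mid]["M"].append(vec)
    enhanced: List[Vector] = []
    for i, base in enumerate(token_vectors):
        spliced = list(base)
        for tag in POSITION_TAGS:
            spliced.extend(mean_vector(matched[i][tag], lexicon_dim))
        enhanced.append(spliced)
    return enhanced


# Hypothetical toy data: three single-character tokens and a tiny lexicon.
toy_tokens = ["a", "b", "c"]
toy_vectors = [[1.0], [2.0], [3.0]]
toy_lexicon = {"ab": [0.5, 0.5], "abc": [1.0, 0.0], "c": [0.0, 1.0]}
second_set = enhance(toy_tokens, toy_vectors, toy_lexicon, lexicon_dim=2)
```

Each enhanced vector has the original dimension plus four times the lexicon dimension, so a downstream entity recognizer sees both the token itself and the words it may belong to.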

In an alternative embodiment, when performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs, the relationship extraction module 303 is specifically configured to: obtain a target entity pair according to the plurality of entities; determine, from the public opinion data, a target sentence that includes the target entity pair, and label the position information of each entity of the target entity pair in the target sentence; input the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, so as to obtain the relationship between the entities in the target entity pair; and construct a target relationship pair according to the target entity pair and the relationship between the entities in the target entity pair, so as to obtain a plurality of relationship pairs including the target relationship pair.

In an alternative embodiment, when inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model to perform relationship prediction and obtain the relationship between the entities in the target entity pair, the relationship extraction module 303 is specifically configured to: perform encoding processing, using an encoding layer included in the relationship prediction model, according to the target sentence and the position information of each entity of the target entity pair in the target sentence, to obtain an encoding result of each entity in the target entity pair; perform pooling processing on the encoding result of each entity in the target entity pair using a pooling layer included in the relationship prediction model, to obtain a pooling result of each entity in the target entity pair; and perform a classification operation on the pooling result of each entity in the target entity pair using a classification layer included in the relationship prediction model, to obtain the relationship between the entities in the target entity pair.
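The pooling and classification stages can be sketched in plain Python. The sketch assumes that the encoding layer has already produced one vector per token, that the position information of each entity is a (start, end) token span, that pooling is max pooling over the span, and that classification is a single linear layer followed by an argmax. The application fixes none of these choices, and the weights, labels, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def max_pool(encodings: Sequence[Vector], span: Tuple[int, int]) -> Vector:
    """Element-wise maximum over the token encodings inside an entity span."""
    start, end = span
    rows = encodings[start:end]
    return [max(column) for column in zip(*rows)]


def classify(features: Vector, weights: Sequence[Vector], labels: Sequence[str]) -> str:
    """Score each relation label with a linear layer and return the best one."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return labels[scores.index(max(scores))]


def predict_relation(
    encodings: Sequence[Vector],
    head_span: Tuple[int, int],
    tail_span: Tuple[int, int],
    weights: Sequence[Vector],
    labels: Sequence[str],
) -> str:
    """Pool both entities, concatenate the results, and classify the relation."""
    features = max_pool(encodings, head_span) + max_pool(encodings, tail_span)
    return classify(features, weights, labels)


# Hypothetical token encodings (stand-in for the encoding layer's output).
token_encodings = [[1.0, 0.0], [0.5, 2.0], [0.0, 0.0], [0.0, 3.0]]
relation_labels = ["supplier", "no_relation"]
layer_weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]
relation = predict_relation(token_encodings, (0, 2), (3, 4), layer_weights, relation_labels)
```

Concatenating the two pooled vectors keeps the head and tail entities distinguishable, which matters for directed relations such as supplier and customer.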

In an alternative embodiment, when determining the standard name corresponding to each entity included in each of the plurality of relationship pairs, the determining module 304 is specifically configured to: match each entity of the first type in the plurality of relationship pairs against each standard name included in a database, so as to determine, from the database, the standard name corresponding to each entity of the first type; and determine the standard name corresponding to each entity of the second type in the plurality of relationship pairs according to a correspondence between entities of the second type and standard names, where the first type is different from the second type.

In an alternative embodiment, when matching each entity of the first type in the plurality of relationship pairs against each standard name included in the database so as to determine, from the database, the standard name corresponding to each entity of the first type, the determining module 304 is specifically configured to: calculate, through a short text matching model, a relationship coefficient between each entity of the first type in the plurality of relationship pairs and each standard name included in the database; and determine, from the database, a standard name whose relationship coefficient with an entity of the first type is greater than or equal to a preset value as the standard name corresponding to that entity.
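The threshold test on the relationship coefficient can be sketched as follows. The short text matching model is replaced here by the character-level similarity ratio from Python's `difflib`, purely as a stand-in; the preset value of 0.6, the rule of keeping the highest-scoring name above the preset value, and the function names are also assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def relation_coefficient(entity: str, standard_name: str) -> float:
    """Stand-in for the short text matching model: a similarity in [0, 1]."""
    return SequenceMatcher(None, entity, standard_name).ratio()


def match_standard_name(
    entity: str, standard_names: Iterable[str], preset_value: float = 0.6
) -> Optional[str]:
    """Return the best standard name whose coefficient reaches the preset value."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name in standard_names:
        score = relation_coefficient(entity, name)
        if score >= preset_value and score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical database of standard names for first-type entities.
database = ["new energy vehicles", "semiconductors"]
matched_name = match_standard_name("new energy vehicle", database)
unmatched = match_standard_name("zzz", database)
```

Returning `None` when no coefficient reaches the preset value leaves the entity unmapped rather than forcing a poor match.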

In an alternative embodiment, the data analysis device further comprises an analysis module 306.

In an optional implementation, the analysis module 306 is configured to: perform emotion polarity analysis on a target sentence in the public opinion data to obtain a target entity included in the target sentence and an emotion polarity label of the target entity; determine the target standard name corresponding to the target entity and other standard names associated with the target standard name; and determine, according to the emotion polarity label of the target entity, the influence of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.

In the embodiment shown in fig. 3, the data analysis device may obtain public opinion data, and perform entity extraction on the public opinion data to obtain a plurality of entities; and then the data analysis device can extract the relationships of the entities according to the public opinion data to obtain a plurality of relationship pairs, and determine standard names corresponding to the entities included in each relationship pair in the plurality of relationship pairs, so that the relationship among the entities included in each relationship pair is mapped into the relationship among the standard names corresponding to the entities included in each relationship pair.

Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device described in the present embodiment may include: one or more processors 1000 and a memory 2000. The processor 1000 and the memory 2000 may be connected by a bus or the like.

The processor 1000 may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 2000 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 2000 is used to store a set of program codes, and the processor 1000 may call the program codes stored in the memory 2000. Specifically:

A processor 1000 for obtaining public opinion data; extracting the entities from the public opinion data to obtain a plurality of entities; extracting the relationships of the entities according to the public opinion data to obtain a plurality of relationship pairs; determining standard naming corresponding to each entity included in each relation pair in the relation pairs; and mapping the relation among the entities included in each relation pair into the relation among the standard names corresponding to the entities included in each relation pair.

In one embodiment, the processor 1000 is specifically configured to encode a plurality of words included in the public opinion data to obtain a first word vector set, where the first word vector set includes a word vector of each word in the plurality of words; vocabulary enhancement is carried out on the first word vector set to obtain a second word vector set; and carrying out entity recognition based on the second word vector set to obtain a plurality of entities.

In one embodiment, the processor 1000 is further specifically configured to: obtain a target entity pair according to the plurality of entities; determine, from the public opinion data, a target sentence that includes the target entity pair, and label the position information of each entity of the target entity pair in the target sentence; input the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, so as to obtain the relationship between the entities in the target entity pair; and construct a target relationship pair according to the target entity pair and the relationship between the entities in the target entity pair, so as to obtain a plurality of relationship pairs including the target relationship pair.

In one embodiment, the processor 1000 is further specifically configured to perform encoding processing according to the target sentence and the position information of each entity in the target entity pair by using an encoding layer included in the relational prediction model, so as to obtain an encoding result of each entity in the target entity pair; carrying out pooling treatment on the coding results of each entity in the target entity pair by using a pooling layer included in the relation prediction model to obtain pooling results of each entity in the target entity pair; and executing classification operation on the pooling result of each entity in the target entity pair by using a classification layer included in the relation prediction model to obtain the relation of each entity in the target entity pair.

In one embodiment, the processor 1000 is further specifically configured to match each of the first types of entities in the plurality of relationship pairs with each of the standard names included in the database, so as to determine, from the database, a standard name corresponding to each of the first types of entities; and determining the standard naming corresponding to each entity of the second type in the plurality of relation pairs according to the corresponding relation between the entity of the second type and the standard naming, wherein the first type is different from the second type.

In one embodiment, the processor 1000 is further specifically configured to: calculate, through a short text matching model, a relationship coefficient between each entity of the first type in the plurality of relationship pairs and each standard name included in the database; and determine, from the database, a standard name whose relationship coefficient with an entity of the first type is greater than or equal to a preset value as the standard name corresponding to that entity.

In one embodiment, the processor 1000 is further specifically configured to: perform emotion polarity analysis on a target sentence in the public opinion data to obtain a target entity included in the target sentence and an emotion polarity label of the target entity; determine the target standard name corresponding to the target entity and other standard names associated with the target standard name; and determine, according to the emotion polarity label of the target entity, the influence of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.

In a specific implementation, the processor 1000 described in the embodiment of the present application may perform the implementation described in the embodiment of fig. 1 and the embodiment of fig. 2, and may also perform the implementation described in the embodiment of the present application, which is not described herein again.

The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

Those skilled in the art will appreciate that all or part of the procedures of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored on a computer readable storage medium and, when executed, may include the procedures of the method embodiments described above. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The computer readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function, and the like, and the storage data area may store data created from the use of blockchain nodes, and the like.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

The above disclosure is only a preferred embodiment of the present application and does not limit the scope of the application. Those skilled in the art will understand that implementations of all or part of the procedures of the above embodiments, and equivalent changes made in accordance with the claims of the application, still fall within the scope of the application.

Claims

1. A method of data analysis, comprising:

Obtaining public opinion data;

Extracting the entities from the public opinion data to obtain a plurality of entities; the plurality of entities includes: a first type of entity, a second type of entity; wherein the entity extraction comprises: encoding a plurality of words included in the public opinion data to obtain a first word vector set, wherein the first word vector set comprises word vectors of each word in the plurality of words; acquiring a target word coding set of a target word in the plurality of words, wherein the target word is any word in the plurality of words, and the target word coding set comprises word codes of words corresponding to each position tag in the plurality of position tags; performing splicing processing on the target word coding set and word vectors of the target words in the first word vector set to obtain spliced word vectors corresponding to the target words, and generating a second word vector set according to the spliced word vectors corresponding to the target words; performing entity recognition based on the second word vector set to obtain a plurality of entities;

Mapping the relation among the entities included in each relation pair into the relation among standard naming corresponding to the entities included in each relation pair;

the determining the standard naming corresponding to each entity included in each relation pair in the plurality of relation pairs includes:

Matching each first type entity in the plurality of relation pairs with each standard name included in a database to determine the standard name corresponding to each first type entity from the database; the standard naming is determined by calculating a relationship coefficient between each entity of the first type in the plurality of relationship pairs and each standard naming included in the database, and the relationship coefficient is determined in the following manner: encoding a first type entity and a standard naming selected from a first database in each first type entity respectively to obtain an encoding result of the first type entity and an encoding result of the standard naming; inputting a first type entity coding result and a standard naming coding result into a local reasoning modeling layer, calculating the similarity between each word included in the first entity and each word included in a selected standard naming by the local reasoning modeling layer, and carrying out local reasoning on the first type entity and the standard naming according to the calculated similarity to obtain local reasoning information of the first type entity and local reasoning information of the standard naming; calculating the enhanced local reasoning information of the first type of entity according to the coding result of the first type of entity and the local reasoning information of the first type of entity, and calculating the enhanced local reasoning information of the standard naming according to the coding result of the standard naming and the local reasoning information of the standard naming; the enhanced local reasoning information is input into a maximum pooling layer and a full connection layer, and a similarity coefficient between a first type entity and a standard naming is output as a relationship coefficient between the first type entity and the standard naming;

Determining standard names corresponding to the entities of the second type in the plurality of relation pairs according to the corresponding relation between the entities of the second type and the standard names, wherein the first type is different from the second type, and the mode of determining the standard names adopted for the entities of the first type and the entities of the second type is different;

The method further comprises the steps of:

2. The method of claim 1, wherein the extracting relationships between the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs comprises:

Obtaining a target entity pair according to the plurality of entities;

3. The method according to claim 2, wherein inputting the target sentence and the position information of each entity in the target entity pair in the target sentence into a relationship prediction model to perform relationship prediction, to obtain the relationship between each entity in the target entity pair, includes:

4. The method of claim 1, wherein said matching each of the first type of entity in the plurality of relationship pairs with each of the standard designations included in the database to determine from the database a standard designation corresponding to each of the first type of entity comprises:

5. A data analysis device, comprising:

the acquisition module is used for acquiring public opinion data;

The entity extraction module is used for carrying out entity extraction on the public opinion data to obtain a plurality of entities; the plurality of entities includes: a first type of entity, a second type of entity; wherein the entity extraction comprises: encoding a plurality of words included in the public opinion data to obtain a first word vector set, wherein the first word vector set comprises word vectors of each word in the plurality of words; acquiring a target word coding set of a target word in the plurality of words, wherein the target word is any word in the plurality of words, and the target word coding set comprises word codes of words corresponding to each position tag in the plurality of position tags; performing splicing processing on the target word coding set and word vectors of the target words in the first word vector set to obtain spliced word vectors corresponding to the target words, and generating a second word vector set according to the spliced word vectors corresponding to the target words; performing entity recognition based on the second word vector set to obtain a plurality of entities;

The mapping module is used for mapping the relation among the entities included in each relation pair into the relation among the standard naming corresponding to the entities included in each relation pair;

The determining module is specifically configured to:

The apparatus further comprises: an analysis module;

The analysis module is used for carrying out emotion polarity analysis on the target sentences in the public opinion data to obtain target entities included in the target sentences and emotion polarity labels of the target entities; determining a target standard name corresponding to the target entity and other standard names associated with the target standard name; and determining the influence condition of the public opinion data on the target standard naming corresponding to the target entity and the influence condition of the public opinion data on the other standard naming according to the emotion polarity label of the target entity.

6. A computer device comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-4.

7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any of claims 1-4.