WO2022227196A1 - Data analysis method, apparatus, computer device and storage medium - Google Patents

Data analysis method, apparatus, computer device and storage medium

Info

Publication number
WO2022227196A1
WO2022227196A1 PCT/CN2021/097114 CN2021097114W WO2022227196A1 WO 2022227196 A1 WO2022227196 A1 WO 2022227196A1 CN 2021097114 W CN2021097114 W CN 2021097114W WO 2022227196 A1 WO2022227196 A1 WO 2022227196A1
Authority
WO
WIPO (PCT)
Prior art keywords
relationship
entity
target
pair
entities
Prior art date
Application number
PCT/CN2021/097114
Inventor
黄振宇
陈思业
吴文哲
王磊
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022227196A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Definitions

  • The present application relates to the technical field of data analysis, and in particular to a data analysis method, apparatus, computer device and storage medium.
  • Internet public opinion and other forms of public opinion have become the main channels through which people express their opinions.
  • Internet public opinion is social public opinion expressed through the Internet.
  • As online public opinion builds up, it affects individuals, enterprises, industries and even society as a whole, and these effects may be positive or negative.
  • The inventor realized that the emergence of new things, a lack of knowledge and other factors increase the difficulty of extracting effective information from public opinion data, which in turn makes it harder to discover potential connections between things. How to extract effective information from public opinion data in order to discover such connections has therefore become an urgent problem.
  • The embodiments of the present application provide a data analysis method, apparatus, computer device and storage medium, which can extract effective information from public opinion data to discover potential connections between things.
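  • An embodiment of the present application provides a data analysis method, including:
  • acquiring public opinion data;
  • performing entity extraction on the public opinion data to obtain multiple entities;
  • performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
  • determining the standard name corresponding to each entity included in each of the multiple relationship pairs; and
  • mapping the relationship between the entities included in each relationship pair to the relationship between the standard names corresponding to those entities.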
  • An embodiment of the present application provides a data analysis apparatus, including:
  • an acquisition module configured to acquire public opinion data;
  • an entity extraction module configured to perform entity extraction on the public opinion data to obtain multiple entities;
  • a relationship extraction module configured to perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
  • a determining module configured to determine the standard name corresponding to each entity included in each of the multiple relationship pairs; and
  • a mapping module configured to map the relationship between the entities included in each relationship pair to the relationship between the standard names corresponding to those entities.
  • An embodiment of the present application provides a computer device, including a processor and a memory that are connected to each other, where the memory is used to store a computer program that includes program instructions, and the processor is configured to invoke the program instructions to perform the data analysis method described above, in which the relationship between the entities included in each relationship pair is mapped to the relationship between the standard names corresponding to those entities.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program, when executed by a processor, implements the data analysis method described above, in which the relationship between the entities included in each relationship pair is mapped to the relationship between the standard names corresponding to those entities.
  • This application can extract effective information from public opinion data to discover potential connections between things.
  • FIG. 1 is a schematic flowchart of a data analysis method provided in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another data analysis method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a data analysis device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The technical solution of the present application may relate to the field of big data technology and may be applied to scenarios such as the analysis of public opinion data, so as to extract effective information from the public opinion data and thereby promote the construction of smart cities.
  • The data involved in this application, such as public opinion data and/or relationship information between entities, can be stored in a database or in a blockchain (for example, as distributed storage through a blockchain), which is not limited in this application.
  • FIG. 1 is a schematic flowchart of a data analysis method according to an embodiment of the present application.
  • The method can be applied to a computer device, which may be a server or an intelligent terminal. Specifically, the method may include the following steps:
  • Public opinion data includes, but is not limited to, news, online comments, and articles published by individuals or official sources.
  • The plurality of entities may include at least one of the following types of entities: a first type of entity (e.g., an industry entity), a second type of entity (e.g., a business entity), time, place, and person.
  • The plurality of entities may also include other types of entities, which are not listed here.
  • The computer device may perform entity extraction on the public opinion data to obtain multiple entities as follows: the computer device encodes the multiple words included in the public opinion data to obtain a first word vector set, where the first word vector set includes the word vector of each of the multiple words; the computer device then performs lexical enhancement on the first word vector set to obtain a second word vector set, and performs entity recognition based on the second word vector set to obtain the multiple entities.
  • The computer device may encode the multiple words included in the public opinion data by using a first BERT (Bidirectional Encoder Representations from Transformers) model to obtain the first word vector set.
  • The computer device can perform lexical enhancement on the first word vector set through a lexicon augmentation method, such as the Soft Lexicon method, to obtain the second word vector set.
  • The computer device may perform entity recognition on the second word vector set by using an LSTM+CRF model to obtain the multiple entities.
  • The computer device may perform lexical enhancement on the first word vector set to obtain the second word vector set as follows: the computer device obtains a target word encoding set of a target word, where the target word is any one of the multiple words, and the target word encoding set includes the word encoding of the words corresponding to each of multiple position labels; the computer device splices the target word encoding set with the word vector of the target word in the first word vector set to obtain a spliced word vector corresponding to the target word, and generates the second word vector set according to the spliced word vector corresponding to the target word.
  • The word vector of the target word is the basic vector representation of the target word, and the spliced word vector corresponding to the target word is its final vector representation.
  • The embodiment of the present application thus enhances the vector representation of the target word by using the target word encoding set.
  • The target word encoding set may be a BMES word encoding set, and the multiple position labels may include label B, label M, label E, and label S, where:
  • B is the start position;
  • M is the middle position;
  • E is the end position;
  • S is the single (standalone) position.
  • The BMES word encoding set can be obtained by Formula 1.1.
  • In Formula 1.1 and Formula 1.2, e^s represents the BMES word encoding set, v^s represents the lexical encoding, and x^c represents the word vector of the target word.
  • Formula 1.2 splices x^c with the v^s of the words corresponding to label B, label M, label E, and label S, respectively, to obtain the spliced word vector corresponding to the target word.
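  • The BMES grouping and splicing described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes, as in the published Soft Lexicon method, that the target word is a single character and that the lexicon words are multi-character matches in the sentence; it also assumes plain mean pooling for each label's encoding. The names `bmes_word_sets`, `mean_vector`, and `splice`, the lexicon, and the vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

LABELS = ("B", "M", "E", "S")


def bmes_word_sets(sentence: str, index: int, lexicon: Sequence[str]) -> Dict[str, List[str]]:
    """Group lexicon words covering sentence[index] by where that character sits in the word."""
    sets: Dict[str, List[str]] = {label: [] for label in LABELS}
    for word in lexicon:
        if not word:
            continue
        start = sentence.find(word)
        while start != -1:
            end = start + len(word) - 1
            if start <= index <= end:
                if len(word) == 1:
                    label = "S"      # the character is a word on its own
                elif index == start:
                    label = "B"      # the character begins the word
                elif index == end:
                    label = "E"      # the character ends the word
                else:
                    label = "M"      # the character is in the middle
                if word not in sets[label]:
                    sets[label].append(word)
            start = sentence.find(word, start + 1)
    return sets


def mean_vector(words: List[str], word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Assumed pooling: average the vectors of one label's words (zeros if the set is empty)."""
    if not words:
        return [0.0] * dim
    return [sum(word_vectors[w][k] for w in words) / len(words) for k in range(dim)]


def splice(char_vector: List[float], sets: Dict[str, List[str]],
           word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate the basic vector with the B, M, E and S encodings, in that order."""
    spliced = list(char_vector)
    for label in LABELS:
        spliced.extend(mean_vector(sets[label], word_vectors, dim))
    return spliced
```

  • With a basic vector of size d and lexical encodings of size k, the spliced vector has size d + 4k; repeating this for every position yields the second word vector set.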
  • The computer device may perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs as follows: the computer device uses a relationship extraction tool to perform relationship extraction on the multiple entities according to the public opinion data, obtaining the multiple relationship pairs.
  • Alternatively, the computer device may obtain the multiple relationship pairs as follows: the computer device obtains a target entity pair according to the multiple entities, determines from the public opinion data a target sentence that includes the target entity pair, and marks the position information of each entity of the target entity pair in the target sentence; the computer device then inputs the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model for relationship prediction to obtain the relationship between the entities in the target entity pair, constructs a target relationship pair according to the target entity pair and that relationship, and thereby obtains multiple relationship pairs including the target relationship pair.
  • To obtain the target entity pair according to the multiple entities, the computer device may determine the target entity pair from among the multiple entities.
  • The target entity pair may consist of two entities of the first type, two entities of the second type, or one entity of the first type and one entity of the second type.
  • A target sentence is a sentence that includes a target entity pair. In general, a sentence may correspond to one or more entity pairs; in most cases, a sentence corresponds to one entity pair.
  • The position information may be start position information.
  • The relationship prediction model may be, for example, a second BERT model.
  • For example, the target entity pair can be represented as (entity x, entity y), and the target relationship pair as (relation r, entity x, entity y).
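  • The marking of start positions and the construction of a (relation r, entity x, entity y) pair can be sketched as below. The predictor is passed in as a plain callable standing in for the relationship prediction model; `RelationPair`, `mark_positions`, and `build_relation_pair` are hypothetical names, and using the first occurrence of each entity as its start position is an assumption.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class RelationPair(NamedTuple):
    relation: str
    entity_x: str
    entity_y: str


def mark_positions(sentence: str, entity_pair: Tuple[str, str]) -> Optional[Tuple[int, int]]:
    """Return the start offset of each entity in the sentence, or None if one is missing."""
    start_x = sentence.find(entity_pair[0])
    start_y = sentence.find(entity_pair[1])
    if start_x == -1 or start_y == -1:
        return None
    return (start_x, start_y)


def build_relation_pair(
    sentence: str,
    entity_pair: Tuple[str, str],
    predict_relation: Callable[[str, Tuple[int, int]], str],
) -> Optional[RelationPair]:
    """Mark positions, ask the (stand-in) model for a relation, and build the relation pair."""
    positions = mark_positions(sentence, entity_pair)
    if positions is None:
        return None  # not a target sentence for this entity pair
    relation = predict_relation(sentence, positions)
    return RelationPair(relation, entity_pair[0], entity_pair[1])
```

  • Collecting the non-empty results over all candidate sentences and entity pairs gives the multiple relationship pairs.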
  • The computer device may input the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model to obtain the relationship between the entities in the target entity pair as follows: the computer device uses the encoding layer included in the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair in the target sentence, obtaining the encoding result of each entity in the target entity pair; uses the pooling layer included in the relationship prediction model to pool the encoding results, obtaining the pooling result of each entity in the target entity pair; and uses the classification layer included in the relationship prediction model to perform a classification operation on the pooling results, obtaining the relationship between the entities in the target entity pair. In this way, the relationship between entities can be accurately predicted by the relationship prediction model.
  • The classification operation on the pooling results of the entities in the target entity pair may be performed by the classification layer as follows: the computer device substitutes the pooling results of the entities in the target entity pair into Formula 1.3 to calculate, for the target entity pair, the probability value of each of multiple relationships, and selects the relationship with the largest probability value as the relationship between the entities in the target entity pair.
  • In Formula 1.3, x represents the target sentence, and r represents the relationship between the entities included in the target entity pair.
  • e_i and e_j represent entity i and entity j, which make up the target entity pair.
  • o_i and o_j represent the pooling result of entity i and the pooling result of entity j, respectively.
  • W is the weight, and b is the classification layer parameter.
  • The loss function used when training the relationship prediction model is a logarithmic loss function.
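  • A classification layer of this kind can be sketched as a linear map followed by a softmax. Formula 1.3 is not reproduced in this text, so the sketch assumes that o_i and o_j are concatenated before applying W and b; the function names and the relation labels used in any call are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_relation(
    o_i: Sequence[float],
    o_j: Sequence[float],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    relations: Sequence[str],
) -> Tuple[str, List[float]]:
    """Score every candidate relation and return the most probable one with all probabilities."""
    features = list(o_i) + list(o_j)  # assumption: the two pooling results are concatenated
    scores = [
        sum(w * f for w, f in zip(row, features)) + bias
        for row, bias in zip(W, b)
    ]
    probs = softmax(scores)
    best = max(range(len(relations)), key=probs.__getitem__)
    return relations[best], probs


def log_loss(probs: Sequence[float], true_index: int) -> float:
    """Logarithmic loss for one training example."""
    return -math.log(probs[true_index])
```

  • Here W has one row per candidate relation, so the probability vector has the same length as the list of relations; the logarithmic loss is smallest when the true relation receives the highest probability.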
  • The computer device may determine the standard name in two different ways, one for entities of the first type and one for entities of the second type.
  • The two ways of determining the standard name are described below.
  • The computer device may determine the standard name corresponding to each entity included in each of the multiple relationship pairs as follows: the computer device matches each entity of the first type in the multiple relationship pairs against the standard names included in a database, so as to determine from the database the standard name corresponding to each entity of the first type.
  • The way the computer device determines the standard name corresponding to an entity of the first type may be referred to as a short text matching algorithm. It should be noted that, in this embodiment of the present application, a relationship pair does not necessarily include an entity of the first type; likewise, a relationship pair does not necessarily include an entity of the second type.
  • The computer device may match each entity of the first type in the multiple relationship pairs against the standard names included in the database as follows: the computer device calculates, through a short text matching model, the relationship coefficient between each entity of the first type in the multiple relationship pairs and each standard name included in the database, and determines from the database the standard name whose relationship coefficient with an entity of the first type is greater than or equal to a preset value, as the standard name corresponding to that entity.
  • The short text matching model may be an ESIM model, which is a model capable of short text matching.
  • For example, the multiple relationship pairs include relationship pair 1, which includes entity 1 and entity 2, both of which are entities of the first type.
  • The database includes standard name 1 and standard name 2.
  • The computer device can calculate, through the short text matching model, the relationship coefficient between entity 1 and standard name 1 and the relationship coefficient between entity 1 and standard name 2, and then select, from standard name 1 and standard name 2, the one with the largest relationship coefficient as the standard name corresponding to entity 1.
  • Similarly, the computer device can calculate the relationship coefficient between entity 2 and standard name 1 and the relationship coefficient between entity 2 and standard name 2, and then select, from standard name 1 and standard name 2, the one with the largest relationship coefficient as the standard name corresponding to entity 2.
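  • The selection logic in this example can be sketched independently of the matching model. In the sketch below the relationship coefficient is supplied as a callable; the default, `difflib.SequenceMatcher`, is only a stand-in for illustration and is not the ESIM model. Combining the largest-coefficient rule with the preset-value rule in one function, the default threshold of 0.5, and the name `match_standard_name` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


def similarity(a: str, b: str) -> float:
    """Stand-in relationship coefficient in [0, 1]; a trained matching model would replace this."""
    return SequenceMatcher(None, a, b).ratio()


def match_standard_name(
    entity: str,
    standard_names: Iterable[str],
    coefficient: Callable[[str, str], float] = similarity,
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the standard name with the largest coefficient, if it reaches the preset value."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name in standard_names:
        score = coefficient(entity, name)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

  • Returning None when no coefficient reaches the preset value leaves the entity unmapped instead of forcing a poor match.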
  • The computer device calculates the relationship coefficients between the entities of the first type in the multiple relationship pairs and the standard names included in the database by using the short text matching model, as follows:
  • The encoding result of an entity of the first type includes the encoding result of each word included in that entity.
  • The encoding result of a standard name includes the encoding result of each word included in that standard name.
  • For the encoding of each word included in an entity of the first type and of each word included in a standard name, reference may be made to Formula 1.4 and Formula 1.5.
  • l_a represents the length of the entity of the first type, and l_b represents the length of the standard name.
  • The local inference modeling layer calculates the similarity between the words included in an entity of the first type and the words included in a standard name, and performs local inference on the entity of the first type and the standard name according to the calculated similarity, obtaining the local inference information of the entity of the first type and the local inference information of the standard name.
  • The local inference information of an entity of the first type may include the local inference information of each word included in that entity, and the local inference information of a standard name may include the local inference information of each word included in that standard name.
  • For the process of local inference, reference may be made to Formula 1.6 and Formula 1.7.
  • Formula 1.6 gives the local inference information of the i-th word of an entity of the first type, and Formula 1.7 gives the local inference information of the j-th word of a standard name.
  • e_ij represents the similarity between the i-th word of an entity of the first type and the j-th word of a standard name.
  • e_ik represents the similarity between the i-th word of an entity of the first type and the k-th word of a standard name.
  • e_kj represents the similarity between the k-th word of an entity of the first type and the j-th word of a standard name.
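  • Local inference of this kind can be sketched as soft alignment between the two encoded sequences. Formulas 1.6 and 1.7 are not reproduced in this text, so the sketch follows the published ESIM formulation as an assumption: e_ij is the dot product of the two word encodings, each word of the entity is summarized by a softmax-weighted sum over the words of the standard name (normalizing over k in e_ik), and vice versa (normalizing over k in e_kj). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def local_inference(a: Sequence[Vector], b: Sequence[Vector]) -> Tuple[List[List[float]], List[Vector], List[Vector]]:
    """a: encoded words of the entity; b: encoded words of the standard name."""
    # e[i][j]: similarity between word i of the entity and word j of the standard name.
    e = [[dot(a_i, b_j) for b_j in b] for a_i in a]
    # Each entity word attends over the words of the standard name (row-wise softmax).
    a_tilde = [weighted_sum(softmax(row), b) for row in e]
    # Each standard-name word attends over the words of the entity (column-wise softmax).
    columns = [[e[i][j] for i in range(len(a))] for j in range(len(b))]
    b_tilde = [weighted_sum(softmax(col), a) for col in columns]
    return e, a_tilde, b_tilde
```

  • The outputs have one vector per word on each side, matching the statement that the local inference information covers every word of the entity and of the standard name.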
  • The enhanced local inference information of an entity of the first type is calculated according to the encoding result of that entity and its local inference information, and the enhanced local inference information of a standard name is calculated according to the encoding result of that standard name and its local inference information.
  • For the process of calculating the enhanced local inference information, reference may be made to the following formula.
  • The enhanced local inference information is denoted by m.
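  • The enhancement formula is not reproduced in this text. As an assumption, the sketch below uses the enhancement from the published ESIM model, in which m concatenates, for each word, the encoding result, the local inference information, their difference, and their element-wise product. The name `enhance` is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def enhance(encoded: Sequence[Vector], inferred: Sequence[Vector]) -> List[Vector]:
    """Build m for every word: [encoding; inference; difference; element-wise product]."""
    enhanced = []
    for enc, inf in zip(encoded, inferred):
        difference = [x - y for x, y in zip(enc, inf)]
        product = [x * y for x, y in zip(enc, inf)]
        enhanced.append(list(enc) + list(inf) + difference + product)
    return enhanced
```

  • Under this assumption each enhanced vector is four times as long as the encoding result, and the same function applies to both the entity side and the standard-name side.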
  • The computer device may also determine the standard name corresponding to each entity included in each of the multiple relationship pairs as follows: the computer device determines, according to the correspondence between entities of the second type and standard names, the standard name corresponding to each entity of the second type in the multiple relationship pairs, where the first type is different from the second type.
  • The way the computer device determines the standard name corresponding to an entity of the second type may be referred to as a full-name/abbreviation matching algorithm.
  • The computer device determines the standard name corresponding to each entity of the second type from another database, according to each entity of the second type in the multiple relationship pairs and the correspondence between entities of the second type and standard names recorded in that other database.
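  • A lookup of this kind can be sketched with an in-memory index that maps every recorded full name or abbreviation to its standard name. The record layout, the normalization (removing whitespace and ignoring case), and the names `normalize`, `build_alias_index`, and `resolve` are assumptions for illustration; the application does not specify how the correspondence is stored.

```python
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Remove whitespace and ignore case so trivially different spellings compare equal."""
    return "".join(name.split()).casefold()


def build_alias_index(records: Dict[str, List[str]]) -> Dict[str, str]:
    """records maps each standard name to its known abbreviations or alternative names."""
    index: Dict[str, str] = {}
    for standard_name, aliases in records.items():
        for alias in [standard_name, *aliases]:
            index[normalize(alias)] = standard_name
    return index


def resolve(entity: str, index: Dict[str, str]) -> Optional[str]:
    """Return the standard name recorded for an entity of the second type, if any."""
    return index.get(normalize(entity))
```

  • Because the correspondence is an exact lookup, an entity that is not recorded resolves to None rather than to the nearest name.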
  • The computer device may determine the relationship between the entities included in each relationship pair to be the relationship between the standard names corresponding to those entities. In this way, the relationships between entities extracted from the public opinion data are mapped onto the corresponding standard names.
  • The computer device may construct a relationship network according to the standard names corresponding to the entities included in each relationship pair and the relationships between those standard names.
  • In this way, the relationships between the industries and enterprises involved in public opinion data can be mined in depth to construct an industry-enterprise relationship network, which supports subsequent transmission deduction and manual decision-making.
  • The computer device may also update an existing relationship network by using the relationships between the standard names corresponding to the entities included in each relationship pair.
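  • The mapping step and the construction or update of the relationship network can be sketched as follows. The network is represented here as a nested dictionary from a source standard name to a target standard name to a set of relations; this representation, the choice to skip pairs with an unmapped entity, and all names in the sketch are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, x, y)
Network = DefaultDict[str, DefaultDict[str, Set[str]]]


def new_network() -> Network:
    return defaultdict(lambda: defaultdict(set))


def map_to_standard(relation_pairs: Iterable[Triple], standard_name_of: Dict[str, str]) -> List[Triple]:
    """Replace each entity with its standard name; skip pairs with an unmapped entity."""
    mapped = []
    for relation, x, y in relation_pairs:
        sx, sy = standard_name_of.get(x), standard_name_of.get(y)
        if sx is not None and sy is not None:
            mapped.append((relation, sx, sy))
    return mapped


def update_network(network: Network, mapped_pairs: Iterable[Triple]) -> Network:
    """Add each mapped relation as a directed edge; works for a new or an existing network."""
    for relation, sx, sy in mapped_pairs:
        network[sx][sy].add(relation)
    return network
```

  • Calling `update_network` on an existing network merges new relations into it, since edges are stored as sets and repeated relations are not duplicated.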
  • The computer device can obtain public opinion data and perform entity extraction on it to obtain multiple entities; the computer device can then perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs, and determine the standard name corresponding to each entity included in each of the multiple relationship pairs, so that the relationship between the entities included in each relationship pair is mapped to the relationship between the standard names corresponding to those entities. This process can extract effective information from public opinion data to discover potential connections between things.
  • FIG. 2 is a schematic flowchart of another data analysis method provided by an embodiment of the present application.
  • the method can be applied to computer equipment, and the computer equipment can be a server or an intelligent terminal. Specifically, the method may include the following steps:
  • For steps S201 to S205, reference may be made to the corresponding steps in the embodiment of FIG. 1; details are not repeated here.
  • The target sentence may be, for example, the title of the public opinion data, the body text of the public opinion data, or the full text of the public opinion data.
  • the target entity may be an entity of the second type mentioned above, for example, a business entity.
  • Sentiment polarity labels may be, for example, positive labels and/or negative labels, or may also be other sentiment polarity labels.
  • The computer device may perform sentiment polarity analysis on the target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity as follows: the computer device uses a third BERT model to analyze the sentiment polarity of the target sentence in the public opinion data, obtaining the target entity included in the target sentence and the sentiment polarity label of the target entity.
  • the computer device may determine the target standard name corresponding to the target entity by using the aforementioned method of determining the standard name corresponding to each entity included in each relation pair in the plurality of relation pairs. In one embodiment, the computer device may determine the target standard name corresponding to the target entity according to the corresponding relationship between the entity of the second type and the standard name.
  • The computer device may determine the other standard names associated with the target standard name by searching the relationship network.
  • The target standard name is the standard name corresponding to the target entity.
  • The other standard names associated with the target standard name may be standard names corresponding to entities of the first type and/or entities of the second type with which the target standard name is associated.
  • The computer device can also determine the target standard name corresponding to the target entity and the entities corresponding to the other standard names associated with the target standard name, and then determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target entity and on the entities corresponding to the other standard names.
  • the public opinion data may involve multiple subjects, and each subject has a different emotional polarity.
  • The embodiment of the present application can make full use of the sequence labeling strengths of the BERT model to label the different subjects in a multi-subject sentence with separate sentiment polarity labels. For example, in the sentence "*Xun's stock price has risen sharply, while *Yi's stock price has fallen sharply!", *Xun is one enterprise and *Yi is another enterprise.
  • the sentiment polarity label of the sentence is constructed as follows:
  • This solution uses the BIO labeling method to label sample sentences and uses the labeled sample sentences to train an initial BERT model, obtaining a BERT model for sentiment polarity analysis as the third BERT model.
  • The labels used include B-POS, I-POS, B-NEG, I-NEG, and O.
  • B-POS means that the character is at the beginning (Begin) of an entity and the sentiment polarity of that entity is positive (Positive); I-POS means that the character is inside (Inside) an entity and the sentiment polarity of that entity is positive (Positive). Similarly, B-NEG means that the character is at the beginning (Begin) of an entity and the sentiment polarity of that entity is negative (Negative); I-NEG means that the character is inside (Inside) an entity and the sentiment polarity of that entity is negative (Negative); and O means that the character is outside (Outside) any entity.
  • During training, the BERT model thus learns to treat "*Xun" as positive and "*Yi" as negative, yielding a BERT model for sentiment analysis that can distinguish multiple subjects.
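  • Once a model has produced one label per character, the entities and their sentiment polarities can be recovered by decoding the label sequence. The decoder below is a minimal sketch of that step only (it does not involve the BERT model); `decode_bio` is a hypothetical name, and treating an I- label that does not continue an open entity of the same polarity as O is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def decode_bio(chars: Sequence[str], labels: Sequence[str]) -> List[Tuple[str, str]]:
    """Turn per-character labels (B-POS, I-POS, B-NEG, I-NEG, O) into (entity, polarity) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    polarity: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current and polarity is not None:
            entities.append(("".join(current), polarity))

    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            flush()
            current, polarity = [ch], label[2:]
        elif label.startswith("I-") and current and label[2:] == polarity:
            current.append(ch)
        else:
            # "O", or an I- label that does not continue the open entity.
            flush()
            current, polarity = [], None
    flush()
    return entities
```

  • For a sentence with two subjects, the decoder yields one (entity, polarity) pair per subject, which is what allows the two subjects to carry opposite polarities.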
  • The computer device may determine the relationship between the target standard name and the other standard names, or the relationship between the target entity and the entities corresponding to the other standard names, and then determine, according to the determined relationship and the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
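  • A deduction of this kind can be sketched as a one-hop lookup in the relationship network combined with a rule table. The application does not state how a relationship and a polarity combine into an impact, so the rule table, the relation names, and `deduce_impact` are purely hypothetical; the network is assumed to map a standard name to its neighbors and the relations that connect them.

```python
from typing import Dict, Mapping, Set, Tuple


def deduce_impact(
    network: Mapping[str, Mapping[str, Set[str]]],
    target: str,
    polarity: str,
    rules: Mapping[Tuple[str, str], str],
) -> Dict[str, str]:
    """Return the impact on the target and on each associated standard name covered by a rule."""
    impacts = {target: polarity}
    for neighbor, relations in network.get(target, {}).items():
        for relation in sorted(relations):
            effect = rules.get((relation, polarity))
            if effect is not None:
                impacts[neighbor] = effect
                break  # the first applicable rule (in sorted relation order) decides
    return impacts
```

  • Neighbors whose relationship has no rule for the given polarity are left out of the result, so the output lists only impacts that the rule table actually supports.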
  • Industries and enterprises have always been a focus of industry analysis and research.
  • Industry research can effectively assist policy decision-making and macro-control.
  • Industry dynamics can reflect industry prospects and help develop new business directions.
  • Industry analysis can inform individual investment and career directions.
  • Industry analysis based on public opinion can better track the dynamics and development of an industry, and can also uncover relationships between industries and enterprises that have not yet been discovered.
  • The embodiments of the present application make it possible, after a positive or negative event occurs for a certain subject, to deduce its impact on related industries or enterprises. For example, from massive public opinion data, the computer device can mine the fact that enterprise B is upstream of A.
  • The computer device can also perform sentiment polarity analysis on the target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity, and determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name, so as to determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names. This process can effectively deduce how impact is transmitted between enterprises and industries based on sentiment polarity analysis.
  • This application involves blockchain technology.
  • The digest information of public opinion data can be obtained from the blockchain, and the public opinion data can be queried based on that digest information.
  • The application can also synchronize official data from the blockchain nodes associated with each of the entities of the second type, and replace false data in the public opinion data with that official data, so as to ensure the correctness of the subsequently mapped relationships and of the deduced impact.
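  • Querying data by its digest can be sketched with a content-addressed store. The sketch uses SHA-256 and an in-memory dictionary as a stand-in for the ledger; the application names neither a hash function nor a blockchain platform, so both choices, along with `digest` and `DigestIndex`, are assumptions.

```python
import hashlib
from typing import Dict, Optional


def digest(text: str) -> str:
    """Digest information for a piece of public opinion data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DigestIndex:
    """In-memory stand-in for a store that is queried by digest."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, text: str) -> str:
        key = digest(text)
        self._store[key] = text
        return key

    def get(self, key: str) -> Optional[str]:
        text = self._store.get(key)
        # Re-hashing on read detects data that no longer matches its digest.
        if text is not None and digest(text) != key:
            raise ValueError("stored data does not match its digest")
        return text
```

  • Because the digest is derived from the content, any change to the stored data is detectable when it is read back.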
  • FIG. 3 is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present application.
  • the apparatus can be applied to computer equipment.
  • the device may include:
  • the obtaining module 301 is used for obtaining public opinion data.
  • the entity extraction module 302 is configured to perform entity extraction on the public opinion data to obtain multiple entities.
  • the relationship extraction module 303 is configured to perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs.
  • the determining module 304 is configured to determine a standard name corresponding to each entity included in each relationship pair in the plurality of relationship pairs.
  • the mapping module 305 is configured to map the relationship between the entities included in each relationship pair to the relationship between standard names corresponding to the entities included in each relationship pair.
  • In an optional implementation, the entity extraction module 302 performs entity extraction on the public opinion data to obtain multiple entities by: encoding multiple words included in the public opinion data to obtain a first word vector set, which includes a word vector of each of the multiple words; performing vocabulary enhancement on the first word vector set to obtain a second word vector set; and performing entity recognition based on the second word vector set to obtain multiple entities.
  • In an optional implementation, the relationship extraction module 303 performs relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs by: obtaining a target entity pair from the multiple entities; determining, from the public opinion data, a target sentence that includes the target entity pair, and marking the position information of each entity of the target entity pair in the target sentence; inputting the target sentence and that position information into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair; and constructing a target relationship pair from the target entity pair and the relationship between its entities, to obtain multiple relationship pairs that include the target relationship pair.
  • In an optional implementation, the relationship extraction module 303 performs the relationship prediction by: using the encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result for each entity of the target entity pair; using the pooling layer of the relationship prediction model to pool the encoding result of each entity, to obtain a pooling result for each entity; and using the classification layer of the relationship prediction model to perform a classification operation on the pooling results, to obtain the relationship between the entities of the target entity pair.
  • In an optional implementation, the determining module 304 determines the standard name corresponding to each entity included in each of the multiple relationship pairs by: matching each first-type entity in the multiple relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each first-type entity; and determining the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type.
  • In an optional implementation, the relationship extraction module 303 matches each first-type entity in the multiple relationship pairs against the standard names included in the database by: computing, with a short text matching model, the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database; and determining from the database, according to those relationship coefficients, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.
  • In an optional implementation, the data analysis apparatus further includes an analysis module 306.
  • In an optional implementation, the analysis module 306 is configured to: perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity; determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name; and determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
  • As can be seen, in the embodiment shown in FIG. 3, the data analysis apparatus can obtain public opinion data and perform entity extraction on it to obtain multiple entities; it can then perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs, determine the standard name corresponding to each entity included in each relationship pair, and map the relationship between the entities of each relationship pair to a relationship between the corresponding standard names. This process can extract effective information from public opinion data to discover potential connections between things.
  • FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the computer device described in this embodiment may include: one or more processors 1000 and a memory 2000 .
  • the processor 1000 and the memory 2000 may be connected through a bus or the like.
  • the processor 1000 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 2000 can be a high-speed RAM memory, or a non-volatile memory, such as a disk memory.
  • The memory 2000 is used to store a set of program codes, and the processor 1000 can call the program codes stored in the memory 2000. Specifically:
  • The processor 1000 is configured to: obtain public opinion data; perform entity extraction on the public opinion data to obtain multiple entities; perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs; determine the standard name corresponding to each entity included in each of the multiple relationship pairs; and map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to those entities.
  • In one embodiment, the processor 1000 is specifically configured to: encode multiple words included in the public opinion data to obtain a first word vector set, which includes a word vector of each of the multiple words; perform vocabulary enhancement on the first word vector set to obtain a second word vector set; and perform entity recognition based on the second word vector set to obtain multiple entities.
  • In one embodiment, the processor 1000 is further specifically configured to: obtain a target entity pair from the multiple entities; determine, from the public opinion data, a target sentence that includes the target entity pair, and mark the position information of each entity of the target entity pair in the target sentence; input the target sentence and that position information into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair; and construct a target relationship pair from the target entity pair and the relationship between its entities, to obtain multiple relationship pairs that include the target relationship pair.
  • In one embodiment, the processor 1000 is further specifically configured to: use the encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result for each entity of the target entity pair; use the pooling layer of the relationship prediction model to pool the encoding result of each entity, to obtain a pooling result for each entity; and use the classification layer of the relationship prediction model to perform a classification operation on the pooling results, to obtain the relationship between the entities of the target entity pair.
  • In one embodiment, the processor 1000 is further specifically configured to: match each first-type entity in the multiple relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each first-type entity; and determine the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type.
  • In one embodiment, the processor 1000 is further specifically configured to: compute, with a short text matching model, the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database; and determine from the database, according to those relationship coefficients, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.
  • In one embodiment, the processor 1000 is further specifically configured to: perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity; determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name; and determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
  • In a specific implementation, the processor 1000 described in the embodiments of the present application may execute the implementations described in the embodiments of FIG. 1 and FIG. 2, and may also execute the other implementations described in the embodiments of the present application, which are not repeated here.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement the steps of the methods in the foregoing embodiments, or the functions of each module of the apparatus in the foregoing embodiments, which are not repeated here.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • Each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the computer-readable storage medium can be volatile or non-volatile.
  • the computer storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), and the like.
  • the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created through the use of a blockchain node, and the like.
  • A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods. Each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

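A data analysis method, apparatus, computer device, and storage medium. The method is applied in the field of big data technology and may include: obtaining public opinion data (S101); performing entity extraction on the public opinion data to obtain multiple entities (S102); performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs (S103); determining the standard name corresponding to each entity included in each of the multiple relationship pairs (S104); and mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to those entities (S105). The method can extract effective information from public opinion data to discover potential connections between things. The method also involves blockchain technology; for example, digest information of the public opinion data can be obtained from a blockchain, and the public opinion data can be queried based on the digest information.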

Description

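Data analysis method, apparatus, computer device, and storage medium
The present application claims priority to Chinese patent application No. 202110459121.2, filed with the China Patent Office on April 27, 2021 and entitled "Data analysis method, apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data analysis, and in particular to a data analysis method, apparatus, computer device, and storage medium.
Background
With the development of information globalization, the Internet and other media have become an indispensable part of daily life. Public opinion data, such as online public opinion, has become the main channel through which people express their views. Online public opinion is social public opinion expressed through the Internet. As online public opinion ferments, it affects individuals, enterprises, industries, and even society as a whole, and the effect may be positive or negative. The inventors realized that the emergence of new things, insufficient knowledge, and similar factors make it harder to extract effective information from public opinion data, and therefore harder to discover potential connections between things. How to extract effective information from public opinion data to discover potential connections between things has thus become an urgent problem.
Summary
The embodiments of the present application provide a data analysis method, apparatus, computer device, and storage medium that can extract effective information from public opinion data to discover potential connections between things.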
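In a first aspect, an embodiment of the present application provides a data analysis method, including:
obtaining public opinion data;
performing entity extraction on the public opinion data to obtain multiple entities;
performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
determining the standard name corresponding to each entity included in each of the multiple relationship pairs;
mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
In a second aspect, an embodiment of the present application provides a data analysis apparatus, including:
an obtaining module, configured to obtain public opinion data;
an entity extraction module, configured to perform entity extraction on the public opinion data to obtain multiple entities;
a relationship extraction module, configured to perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
a determining module, configured to determine the standard name corresponding to each entity included in each of the multiple relationship pairs;
a mapping module, configured to map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory connected to each other, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the following method:
obtaining public opinion data;
performing entity extraction on the public opinion data to obtain multiple entities;
performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
determining the standard name corresponding to each entity included in each of the multiple relationship pairs;
mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program is executed by a processor to implement the following method:
obtaining public opinion data;
performing entity extraction on the public opinion data to obtain multiple entities;
performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
determining the standard name corresponding to each entity included in each of the multiple relationship pairs;
mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
The present application can extract effective information from public opinion data to discover potential connections between things.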
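Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data analysis method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data analysis method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.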
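Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The technical solution of the present application may relate to the field of big data technology and may be applied to data analysis scenarios, such as the analysis of public opinion data, to extract effective information from public opinion data and thereby promote the construction of smart cities. Optionally, the data involved in the present application, such as public opinion data and/or relationship information between entities, may be stored in a database or in a blockchain, for example through distributed blockchain storage; the present application does not limit this.
Refer to FIG. 1, which is a schematic flowchart of a data analysis method according to an embodiment of the present application. The method may be applied to a computer device, which may be a server or an intelligent terminal. Specifically, the method may include the following steps: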
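S101. Obtain public opinion data.
S102. Perform entity extraction on the public opinion data to obtain multiple entities.
The public opinion data includes, but is not limited to, news, online comments, and articles published by individuals or official bodies. The multiple entities may include at least one of the following types of entity: entities of a first type (such as industry entities), entities of a second type (such as enterprise entities), times, places, and persons. In one embodiment, the multiple entities may also include other types of entities, which are not listed here one by one.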
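In one embodiment, the computer device may perform entity extraction on the public opinion data as follows: the computer device encodes the multiple words included in the public opinion data to obtain a first word vector set, which includes a word vector of each of the multiple words; the computer device then performs vocabulary enhancement on the first word vector set to obtain a second word vector set, and performs entity recognition based on the second word vector set to obtain multiple entities. In one embodiment, the computer device may encode the multiple words with a first BERT (Bidirectional Encoder Representations from Transformers) model to obtain the first word vector set. In one embodiment, the computer device may apply a vocabulary enhancement (Lexicon Augment) method, such as the Soft Lexicon method, to the first word vector set to obtain the second word vector set. In one embodiment, the computer device may perform entity recognition on the second word vector set with an LSTM+CRF model to obtain multiple entities.
In one embodiment, the computer device may perform vocabulary enhancement on the first word vector set as follows: the computer device obtains a target word encoding set for a target word among the multiple words, where the target word is any one of the multiple words and the target word encoding set includes, for each of multiple position labels, the word encoding of the words corresponding to that position label; the computer device concatenates the target word encoding set with the word vector of the target word in the first word vector set to obtain a concatenated word vector for the target word, and generates the second word vector set from the concatenated word vectors. The word vector of the target word is its basic vector representation, and the concatenated word vector is its final vector representation; using the target word encoding set thus enriches the vector representation of the target word.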
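In one embodiment, the target word encoding set may be a BMES word encoding set, and the multiple position labels may include label B, label M, label E, and label S, where B denotes the beginning position, M a middle position, E the end position, and S a single (standalone) position. The BMES word encoding set can be obtained by Formula 1.1:
$e^{s}(B,M,E,S)=[v^{s}(B);v^{s}(M);v^{s}(E);v^{s}(S)]$ (Formula 1.1)
The concatenation of the BMES word encoding set obtained from Formula 1.1 with the word vector of the target word can be expressed by Formula 1.2:
$x^{c}\leftarrow[x^{c};e^{s}(B,M,E,S)]$ (Formula 1.2)
In Formulas 1.1 and 1.2, $e^{s}$ denotes the BMES word encoding set, $v^{s}$ denotes a lexicon encoding, and $x^{c}$ denotes the word vector of the target word. Formula 1.2 concatenates $x^{c}$ with the $v^{s}$ of the words corresponding to labels B, M, E, and S, yielding the concatenated word vector of the target word.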
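As a reading aid, the concatenation in Formulas 1.1 and 1.2 can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function name `soft_lexicon_augment`, the dictionary layout of the per-label encodings, and the 2-dimensional toy values are all hypothetical and are not taken from the application.

```python
from typing import Dict, List

Vector = List[float]


def soft_lexicon_augment(x_c: Vector, v_s: Dict[str, Vector]) -> Vector:
    """Return [x_c; v_s(B); v_s(M); v_s(E); v_s(S)] as one flat vector."""
    e_s: Vector = []
    for label in ("B", "M", "E", "S"):  # fixed label order, as in Formula 1.1
        e_s.extend(v_s[label])
    return x_c + e_s  # concatenation, as in Formula 1.2


# Hypothetical 2-dimensional values, chosen only to make the shapes visible.
x_c = [0.1, 0.2]
v_s = {"B": [1.0, 0.0], "M": [0.0, 0.0], "E": [0.0, 1.0], "S": [0.5, 0.5]}
augmented = soft_lexicon_augment(x_c, v_s)
```

The augmented vector keeps the base word vector as its prefix and appends the four label encodings in a fixed order, so its length is the base dimension plus four times the encoding dimension.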
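S103. Perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs.
In one embodiment, the computer device may use a relationship extraction tool to perform relationship extraction on the multiple entities according to the public opinion data, obtaining multiple relationship pairs.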
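In one embodiment, the computer device may alternatively perform the relationship extraction as follows: the computer device obtains a target entity pair from the multiple entities, determines from the public opinion data a target sentence that includes the target entity pair, and marks the position information of each entity of the target entity pair in the target sentence; the computer device inputs the target sentence and that position information into a relationship prediction model to predict the relationship between the entities of the target entity pair, constructs a target relationship pair from the target entity pair and the predicted relationship, and obtains multiple relationship pairs that include the target relationship pair. In one embodiment, the computer device obtains the target entity pair by selecting it from the multiple entities. A target entity pair may consist of two first-type entities, two second-type entities, or one first-type entity and one second-type entity. A target sentence is a sentence that includes the target entity pair. In general, one sentence may correspond to one or more entity pairs; in most cases, one sentence corresponds to one entity pair. In one embodiment, the position information may be start position information. The relationship prediction model may be, for example, a second BERT model. A target entity pair can be written as (entity x, entity y), and a target relationship pair, for example, as (relation r, entity x, entity y).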
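The flow described above (find the target sentence, mark start positions, predict, build the relationship pair) can be illustrated with a small sketch. Everything here is an assumption for illustration: the naive punctuation-based sentence split, the function names, and especially `stub_model`, which merely stands in for the relationship prediction model and returns a fixed, hypothetical label.

```python
import re
from typing import Callable, List, Tuple

RelationPair = Tuple[str, str, str]  # (relation r, entity x, entity y)


def build_relation_pairs(
    text: str,
    entity_pair: Tuple[str, str],
    predict_relation: Callable[[str, int, int], str],
) -> List[RelationPair]:
    """Find target sentences for one entity pair and predict a relation for each."""
    x, y = entity_pair
    # Naive sentence split on common end punctuation (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?。!?])\s*", text) if s]
    pairs = []
    for sentence in sentences:
        start_x, start_y = sentence.find(x), sentence.find(y)
        if start_x >= 0 and start_y >= 0:  # sentence contains both entities
            relation = predict_relation(sentence, start_x, start_y)
            pairs.append((relation, x, y))
    return pairs


def stub_model(sentence: str, start_x: int, start_y: int) -> str:
    # Placeholder for the relationship prediction model; always returns one label.
    return "supplier_of"


text = "Company B supplies chips to Company A. Company C opened a new office."
pairs = build_relation_pairs(text, ("Company B", "Company A"), stub_model)
```

Only the first sentence contains both entities, so only one relationship pair is produced; a pair such as ("Company B", "Company C") yields nothing because no single sentence mentions both.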
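In one embodiment, the relationship prediction may proceed as follows: the computer device uses the encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, obtaining an encoding result for each entity of the target entity pair; it then uses the pooling layer of the relationship prediction model to pool the encoding result of each entity, obtaining a pooling result for each entity, and uses the classification layer of the relationship prediction model to perform a classification operation on the pooling results, obtaining the relationship between the entities of the target entity pair. With the relationship prediction model, the relationship between entities can be predicted accurately.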
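In one embodiment, the classification operation may be performed as follows: the computer device substitutes the pooling results of the entities of the target entity pair into Formula 1.3 to compute the probability of the target entity pair under each of multiple relations, and selects the relation with the highest probability as the relationship between the entities of the target entity pair.
$P(r_{ij}\mid x,e_i,e_j)=\mathrm{softmax}(W[o_i:o_j]+b)$ (Formula 1.3)
Here, $x$ denotes the target sentence and $r$ denotes the relationship between the entities of the target entity pair. $e_i$ and $e_j$ denote entity $i$ and entity $j$, which together form the target entity pair. $o_i$ and $o_j$ denote the pooling results of entity $i$ and entity $j$, respectively. $W$ is the weight and $b$ is a classification-layer parameter.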
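Formula 1.3 can be worked through numerically with a few lines of standard-library Python. This is a sketch, not the trained classification layer: the relation labels, the weight matrix `W`, the parameter `b`, and the pooled vectors are made-up toy values, and `b` is treated here as an additive bias.

```python
import math
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify_relation(
    o_i: List[float],
    o_j: List[float],
    W: List[List[float]],
    b: List[float],
    relations: List[str],
) -> Tuple[str, List[float]]:
    features = o_i + o_j  # [o_i : o_j], the concatenated pooling results
    logits = [
        sum(w * f for w, f in zip(row, features)) + bias
        for row, bias in zip(W, b)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return relations[best], probs  # relation with the highest probability


# Hypothetical labels and parameters, for illustration only.
relations = ["upstream_of", "belongs_to", "no_relation"]
W = [[2.0, 0.0, 0.0, 2.0], [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
label, probs = classify_relation([1.0, 0.0], [0.0, 1.0], W, b, relations)
```

With these toy values the first relation receives the largest logit, so it is the one selected.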
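In one embodiment, the loss function used to train the relationship prediction model is the logarithmic loss function.
S104. Determine the standard name corresponding to each entity included in each of the multiple relationship pairs.
In the embodiments of the present application, the computer device may determine standard names in two different ways, one for first-type entities and one for second-type entities. Both are described below.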
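In one embodiment, the computer device may determine the standard names as follows: the computer device matches each first-type entity in the multiple relationship pairs against the standard names included in a database, so as to determine from the database the standard name corresponding to each first-type entity. In one embodiment, the method by which the computer device determines the standard name of a first-type entity may be called a short text matching algorithm. Note that, in the embodiments of the present application, not every relationship pair necessarily includes a first-type entity; likewise, not every relationship pair necessarily includes a second-type entity.
In one embodiment, the matching may proceed as follows: the computer device uses a short text matching model to compute the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database and, based on these coefficients, determines from the database the standard name whose relationship coefficient with a given first-type entity is greater than or equal to a preset value, as the standard name corresponding to that entity. In one embodiment, the short text matching model may be an ESIM model, that is, a model capable of short text matching.
For example, suppose the multiple relationship pairs include relationship pair 1, which includes entity 1 and entity 2, both first-type entities, and the database includes standard name 1 and standard name 2. The computer device can use the short text matching model to compute the relationship coefficient between entity 1 and standard name 1 and between entity 1 and standard name 2, and then select whichever of the two standard names has the larger coefficient as the standard name corresponding to entity 1. Likewise, the computer device can compute the relationship coefficients between entity 2 and standard names 1 and 2, and select the one with the larger coefficient as the standard name corresponding to entity 2.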
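The selection logic in this example (score every standard name, keep the best one if it reaches the preset value) can be sketched as follows. Note the substitution: the scorer below is `difflib.SequenceMatcher` from the Python standard library, used purely as a stand-in so the sketch runs without a trained model; it is not the ESIM model. The names in `STANDARD_NAMES` and the threshold of 0.5 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def relation_coefficient(entity: str, standard_name: str) -> float:
    # Stand-in scorer: character-level similarity in [0, 1], NOT the ESIM model.
    return SequenceMatcher(None, entity, standard_name).ratio()


def match_standard_name(
    entity: str, standard_names: List[str], threshold: float = 0.5
) -> Optional[str]:
    """Return the best-scoring standard name if its score reaches the threshold."""
    best = max(standard_names, key=lambda name: relation_coefficient(entity, name))
    if relation_coefficient(entity, best) >= threshold:
        return best
    return None  # no standard name is close enough


# Hypothetical database contents, for illustration only.
STANDARD_NAMES = ["new energy vehicles", "semiconductors"]
matched = match_standard_name("new-energy vehicle", STANDARD_NAMES)
```

Returning `None` below the threshold is a design choice of this sketch; the application only states that the selected name's coefficient is greater than or equal to a preset value.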
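In one embodiment, the computer device computes the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database with the short text matching model as follows:
① Encode (using the BiLSTM algorithm) one of the first-type entities and one standard name selected from the database, obtaining an encoding result for the first-type entity and an encoding result for the standard name. The encoding result of the first-type entity includes the encoding result of each word it contains, and the encoding result of the standard name includes the encoding result of each word it contains. The encoding of the words of the first-type entity and of the standard name follows the two formulas below, Formula 1.4 and Formula 1.5.
Figure PCTCN2021097114-appb-000001
denotes the encoding result of the i-th word of the first-type entity,
Figure PCTCN2021097114-appb-000002
denotes the encoding result of the i-th word of the standard name. $l_a$ denotes the length of the first-type entity and $l_b$ denotes the length of the standard name.
Figure PCTCN2021097114-appb-000003
Figure PCTCN2021097114-appb-000004
② Input the encoding result of the first-type entity and the encoding result of the standard name into the Local Inference Modeling layer. This layer computes the similarity between each word of the first-type entity and each word of the selected standard name and, based on the computed similarities, performs local inference on the first-type entity and the standard name, obtaining local inference information for each. The local inference information of the first-type entity may include that of each word it contains, and the local inference information of the standard name may include that of each word it contains. The local inference process follows the two formulas below, Formula 1.6 and Formula 1.7.
Figure PCTCN2021097114-appb-000005
denotes the local inference information of the i-th word of the first-type entity,
Figure PCTCN2021097114-appb-000006
denotes the local inference information of the j-th word of the standard name. $e_{ij}$ denotes the similarity between the i-th word of the first-type entity and the j-th word of the standard name. $e_{ik}$ denotes the similarity between the i-th word of the first-type entity and the k-th word of the standard name. $e_{kj}$ denotes the similarity between the k-th word of the first-type entity and the j-th word of the standard name.
Figure PCTCN2021097114-appb-000007
Figure PCTCN2021097114-appb-000008
③ Compute the enhanced local inference information (Enhancement of local inference) of the first-type entity from its encoding result and its local inference information, and compute the enhanced local inference information of the standard name from its encoding result and its local inference information. The computation follows the formula below, in which the enhanced local inference information is denoted m.
Figure PCTCN2021097114-appb-000009
④ Input the enhanced local inference information into a max pooling layer and a fully connected layer, which output a similarity coefficient between the first-type entity and the standard name; this coefficient serves as the relationship coefficient between the first-type entity and the standard name.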
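The formula images referenced in steps ① to ④ are not reproduced in this text. As a reading aid only, the block below gives the commonly published ESIM formulation, which the symbols used above ($l_a$, $l_b$, $e_{ij}$, $e_{ik}$, $e_{kj}$, $m$) appear to match. Treating these as identical to Formulas 1.4 to 1.7 and to the enhancement formula of the application is an assumption; the symbols $a$, $b$, $\bar{a}$, $\tilde{a}$ and the dot-product similarity are taken from the ESIM literature, not from the application.

```latex
\begin{align}
\bar{a}_i &= \mathrm{BiLSTM}(a, i), \quad i \in [1, \dots, l_a] \\
\bar{b}_j &= \mathrm{BiLSTM}(b, j), \quad j \in [1, \dots, l_b] \\
e_{ij} &= \bar{a}_i^{\top} \bar{b}_j \\
\tilde{a}_i &= \sum_{j=1}^{l_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_b} \exp(e_{ik})} \, \bar{b}_j \\
\tilde{b}_j &= \sum_{i=1}^{l_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_a} \exp(e_{kj})} \, \bar{a}_i \\
m_a &= [\bar{a};\ \tilde{a};\ \bar{a} - \tilde{a};\ \bar{a} \odot \tilde{a}], \qquad
m_b = [\bar{b};\ \tilde{b};\ \bar{b} - \tilde{b};\ \bar{b} \odot \tilde{b}]
\end{align}
```

Read this way, the first two lines correspond to the encoding step, the softmax-weighted sums to the local inference step, and $m_a$, $m_b$ to the enhanced local inference information that is then pooled and passed to the fully connected layer.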
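In one embodiment, the computer device may also determine the standard names as follows: the computer device determines the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type. In one embodiment, the method by which the computer device determines the standard name of a second-type entity may be called a full-name/abbreviation matching algorithm. In one embodiment, the computer device determines the standard name corresponding to each second-type entity from another database, based on the second-type entities in the multiple relationship pairs and the correspondence between second-type entities and standard names recorded in that other database.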
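A correspondence between second-type entities and standard names can be pictured as a lookup table from every known full or abbreviated name to one standard name. The sketch below assumes exactly that; the table contents, the company name, and the function name are hypothetical, and a real correspondence database may be organized differently.

```python
from typing import Dict, Optional

# Hypothetical alias table: every known full or short name maps to one standard name.
ALIAS_TO_STANDARD: Dict[str, str] = {
    "Acme": "Acme Holdings Co., Ltd.",
    "Acme Holdings": "Acme Holdings Co., Ltd.",
    "Acme Holdings Co., Ltd.": "Acme Holdings Co., Ltd.",
}


def lookup_standard_name(
    entity: str, aliases: Dict[str, str] = ALIAS_TO_STANDARD
) -> Optional[str]:
    """Exact lookup after trimming whitespace; returns None for unknown entities."""
    return aliases.get(entity.strip())


standard = lookup_standard_name(" Acme ")
```

Unlike the scored matching used for first-type entities, this lookup is exact, which suits enterprise names whose aliases can be enumerated in advance.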
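S105. Map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
In the embodiments of the present application, the computer device may take the relationship between the entities included in each relationship pair as the relationship between the standard names corresponding to those entities. In this way, the relationships between entities extracted from the public opinion data are mapped onto the corresponding standard names.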
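In one embodiment, the computer device may build a relationship network from the standard names corresponding to the entities included in each relationship pair and the relationships between those standard names. In practice, the embodiments of the present application can be used to mine in depth the relationships between the industries and enterprises mentioned in public opinion data, building an industry-enterprise relationship network that supports subsequent transmission deduction and human decision-making.
In one embodiment, the computer device may use the relationships between the standard names corresponding to the entities included in each relationship pair to update an existing relationship network.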
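Building or updating a relationship network from mapped relationship pairs can be sketched with an adjacency dictionary. The representation (standard name mapped to a set of (relation, other name) tuples), the function name, and the relation labels are assumptions made for illustration; the application does not specify how the network is stored.

```python
from typing import Dict, Iterable, Set, Tuple

Network = Dict[str, Set[Tuple[str, str]]]  # name -> {(relation, other name)}


def update_network(
    network: Network, mapped_pairs: Iterable[Tuple[str, str, str]]
) -> Network:
    """Add (relation, name_x, name_y) triples to the network in place."""
    for relation, name_x, name_y in mapped_pairs:
        network.setdefault(name_x, set()).add((relation, name_y))
    return network


# Build a new network, then update it with a second batch of pairs.
net = update_network({}, [("upstream_of", "Company B", "Company A")])
update_network(
    net,
    [
        ("belongs_to", "Company A", "Industry I"),
        ("upstream_of", "Company B", "Company A"),  # duplicate, absorbed by the set
    ],
)
```

Using sets means that relationship pairs seen again in later public opinion data do not create duplicate edges.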
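As can be seen, in the embodiment shown in FIG. 1, the computer device can obtain public opinion data and perform entity extraction on it to obtain multiple entities; it can then perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs, determine the standard name corresponding to each entity included in each relationship pair, and map the relationship between the entities of each relationship pair to a relationship between the corresponding standard names. This process can extract effective information from public opinion data to discover potential connections between things.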
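The control flow of steps S101 to S105 can be summarized in a short skeleton. This is only a sketch of how the steps chain together: the three callables stand in for the entity extraction, relationship extraction, and standard-name components, and the toy lambdas and names below are hypothetical placeholders rather than the models described in the application.

```python
from typing import Callable, List, Optional, Tuple

RelationPair = Tuple[str, str, str]  # (relation, entity x, entity y)


def analyze(
    opinion_text: str,
    extract_entities: Callable[[str], List[str]],
    extract_relations: Callable[[str, List[str]], List[RelationPair]],
    standard_name_of: Callable[[str], Optional[str]],
) -> List[RelationPair]:
    entities = extract_entities(opinion_text)              # S102
    pairs = extract_relations(opinion_text, entities)      # S103
    mapped = []
    for relation, x, y in pairs:
        name_x, name_y = standard_name_of(x), standard_name_of(y)  # S104
        if name_x is not None and name_y is not None:
            mapped.append((relation, name_x, name_y))      # S105
    return mapped


# Toy stand-ins for the three components, used only to exercise the control flow.
names = {"B Corp": "Company B Ltd.", "A Corp": "Company A Ltd."}
result = analyze(
    "B Corp supplies A Corp.",                             # S101 (already obtained)
    lambda text: [e for e in names if e in text],
    lambda text, ents: [("supplier_of", ents[0], ents[1])] if len(ents) >= 2 else [],
    names.get,
)
```

Skipping pairs whose entities have no standard name is a choice made in this sketch; the application does not state how unmatched entities are handled.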
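Refer to FIG. 2, which is a schematic flowchart of another data analysis method according to an embodiment of the present application. The method may be applied to a computer device, which may be a server or an intelligent terminal. Specifically, the method may include the following steps:
S201. Obtain public opinion data.
S202. Perform entity extraction on the public opinion data to obtain multiple entities.
S203. Perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs.
S204. Determine the standard name corresponding to each entity included in each of the multiple relationship pairs.
S205. Map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
For steps S201 to S205, refer to steps S101 to S105 in the embodiment of FIG. 1; they are not described again here.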
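S206. Perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity.
The target sentence may be, for example, the title of the public opinion data, its body, or its full text. In one embodiment, the target entity may be a second-type entity as mentioned above, for example an enterprise entity. The sentiment polarity label may be, for example, a positive label and/or a negative label, or another sentiment polarity label.
In one embodiment, the computer device may perform the sentiment polarity analysis by using a third BERT model to analyze the target sentence in the public opinion data, obtaining the target entity included in the target sentence and the sentiment polarity label of the target entity.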
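S207. Determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name.
In one embodiment, the computer device may determine the target standard name corresponding to the target entity in the aforementioned way of determining the standard name corresponding to each entity included in each of the multiple relationship pairs. In one embodiment, the computer device may determine the target standard name corresponding to the target entity according to the correspondence between second-type entities and standard names.
In one embodiment, the computer device may determine the other standard names associated with the target standard name by searching the relationship network.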
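S208. According to the sentiment polarity label of the target entity, determine the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
The target standard name is the standard name corresponding to the target entity. The other standard names associated with the target standard name may be standard names corresponding to first-type entities and/or standard names corresponding to second-type entities associated with the target standard name.
Alternatively, the computer device may determine the target standard name corresponding to the target entity and the entities corresponding to the other standard names associated with the target standard name, and then, according to the sentiment polarity label of the target entity, determine the impact of the public opinion data on the target entity and on the entities corresponding to the other standard names.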
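In practice, public opinion data may involve multiple subjects, each with a different sentiment polarity. Unlike a conventional sentiment classification task, the embodiments of the present application, when training the initial BERT model, can make full use of the sequence labeling strength of the BERT model to label the different subjects of a sentence with different sentiment polarity labels. For example, in the sentence "*讯股价大涨,而*易股价大跌!" ("*讯's share price soared, while *易's share price plunged!"), *讯 is one enterprise and *易 is another. The sentiment polarity labels of this sentence are constructed as in the following table:
Figure PCTCN2021097114-appb-000010
As the table shows, this solution labels sample sentences with the BIO labeling scheme and trains the initial BERT model on the labeled sample sentences, obtaining a BERT model for sentiment polarity analysis that serves as the third BERT model. The labels are B-POS, I-POS, B-NEG, I-NEG, and O. B-POS indicates that the character is at the beginning (Begin) of an entity whose sentiment polarity is positive (Positive); I-POS indicates that the character is inside (Inside) an entity whose sentiment polarity is positive (Positive). Similarly, B-NEG indicates that the character is at the beginning (Begin) of an entity whose sentiment polarity is negative (Negative); I-NEG indicates that the character is inside (Inside) an entity whose sentiment polarity is negative (Negative); and O indicates that the character is outside (Outside) any entity. With this labeling, the BERT model learns during training that "*讯" is positive and "*易" is negative, yielding a BERT model for sentiment analysis that can distinguish multiple subjects.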
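The BIO labeling with polarity suffixes can be reproduced for the example sentence with a few lines of Python. This sketch only generates character-level training labels from a known entity-to-polarity mapping; the function name and the dictionary input are assumptions, and the BERT training step itself is not shown.

```python
from typing import Dict, List


def bio_sentiment_tags(sentence: str, entity_polarity: Dict[str, str]) -> List[str]:
    """Label each character with B-/I- plus the entity's polarity, or O."""
    tags = ["O"] * len(sentence)
    for entity, polarity in entity_polarity.items():
        if not entity:
            continue  # guard against empty strings
        start = sentence.find(entity)
        while start >= 0:
            tags[start] = f"B-{polarity}"  # first character of the entity
            for k in range(start + 1, start + len(entity)):
                tags[k] = f"I-{polarity}"  # remaining characters of the entity
            start = sentence.find(entity, start + len(entity))
    return tags


sentence = "*讯股价大涨,而*易股价大跌!"
tags = bio_sentiment_tags(sentence, {"*讯": "POS", "*易": "NEG"})
```

Each character of "*讯" receives a POS label and each character of "*易" a NEG label, while all other characters are labeled O, which is what lets one sentence carry two different polarities.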
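In one embodiment, the computer device may determine the relationship between the target standard name and the other standard names, or the relationship between the target entity and the entities corresponding to the other standard names, and then, according to the determined relationship and the sentiment polarity label of the target entity, determine the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.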
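In real production and daily life, industries and enterprises have long been a focus of industry analysis and research. For governments, industry research effectively supports policy decisions and macro-level regulation; for enterprises, industry dynamics reflect the outlook of a sector and point to new business directions; for individuals, industry analysis informs investment and career choices. Industry analysis based on public opinion gives a better grasp of the dynamics and development of a sector and can also uncover relationships between industries and enterprises that have not yet been noticed. The embodiments of the present application make it possible, after a positive or negative event concerning some subject, to deduce its impact on associated industries or enterprises. For example, from massive public opinion data, the computer device can mine that enterprise B is an upstream supplier of enterprise A and that industry I is the industry of enterprise A. If there is major positive news about enterprise A, supplier B and industry I will both be affected: upstream supplier B benefits from A's major positive news, and industry I benefits as well. With this approach, the system can mine the information hidden in the public opinion itself and derive whether the outlook for enterprise B and industry I is favorable or unfavorable.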
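The A, B, and I example can be traced through a small sketch of impact deduction over a relationship network. The rule table `TRANSFER`, the relation labels, and the encoding of polarity as +1 or -1 are all hypothetical; the application states that impact is determined from the relationship and the sentiment polarity label but does not specify concrete rules.

```python
from typing import Dict, Set, Tuple

Network = Dict[str, Set[Tuple[str, str]]]  # name -> {(relation, other name)}

# Hypothetical rule table: +1 keeps the polarity; a -1 entry would flip it.
TRANSFER = {"has_upstream_supplier": 1, "belongs_to_industry": 1}


def deduce_impact(network: Network, target: str, polarity: int) -> Dict[str, int]:
    """polarity: +1 for a positive label, -1 for a negative label."""
    impact = {target: polarity}
    for relation, other in network.get(target, set()):
        if relation in TRANSFER:
            impact[other] = polarity * TRANSFER[relation]
    return impact


network = {
    "Company A": {
        ("has_upstream_supplier", "Company B"),
        ("belongs_to_industry", "Industry I"),
    }
}
impact = deduce_impact(network, "Company A", +1)
```

Positive news about Company A is carried over to Company B and Industry I as favorable, matching the example; this sketch only looks one hop away from the target.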
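As can be seen, in the embodiment shown in FIG. 2, the computer device can also perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity, and determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name. According to the sentiment polarity label of the target entity, it then determines the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names. This process can effectively deduce how impact is transmitted across enterprises and industries based on sentiment polarity analysis.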
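The present application involves blockchain technology. For example, digest information of the public opinion data can be obtained from a blockchain, and the public opinion data can be queried based on the digest information. Alternatively, the present application may determine the blockchain node associated with each second-type entity among multiple second-type entities, synchronize official data from each of those nodes, and replace false data in the public opinion data with the official data, thereby ensuring the correctness of the subsequently mapped relationships and of the deduced impact.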
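Querying public opinion data by its digest can be illustrated with a content-addressed store. This is a local, in-memory sketch only: SHA-256 is an assumed digest function, `OpinionStore` is a hypothetical class, and no actual blockchain client or node API is involved.

```python
import hashlib
from typing import Dict, Optional


def digest(text: str) -> str:
    """SHA-256 digest of the text, as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class OpinionStore:
    """Toy store that indexes public opinion data by its digest."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def put(self, text: str) -> str:
        d = digest(text)
        self._by_digest[d] = text
        return d  # the digest is what would be recorded on the chain

    def get(self, d: str) -> Optional[str]:
        text = self._by_digest.get(d)
        if text is not None and digest(text) != d:
            raise ValueError("stored data does not match its digest")
        return text


store = OpinionStore()
key = store.put("Company A announces record quarterly results.")
found = store.get(key)
```

Recomputing the digest on retrieval shows why a digest is useful here: it both locates the data and lets the reader check that the data has not been altered.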
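Refer to FIG. 3, which is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present application. The apparatus may be applied to a computer device. Specifically, the apparatus may include:
an obtaining module 301, configured to obtain public opinion data;
an entity extraction module 302, configured to perform entity extraction on the public opinion data to obtain multiple entities;
a relationship extraction module 303, configured to perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
a determining module 304, configured to determine the standard name corresponding to each entity included in each of the multiple relationship pairs;
a mapping module 305, configured to map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.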
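In an optional implementation, the entity extraction module 302 performs entity extraction on the public opinion data to obtain multiple entities by: encoding multiple words included in the public opinion data to obtain a first word vector set, which includes a word vector of each of the multiple words; performing vocabulary enhancement on the first word vector set to obtain a second word vector set; and performing entity recognition based on the second word vector set to obtain multiple entities.
In an optional implementation, the relationship extraction module 303 performs relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs by: obtaining a target entity pair from the multiple entities; determining, from the public opinion data, a target sentence that includes the target entity pair, and marking the position information of each entity of the target entity pair in the target sentence; inputting the target sentence and that position information into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair; and constructing a target relationship pair from the target entity pair and the relationship between its entities, to obtain multiple relationship pairs that include the target relationship pair.
In an optional implementation, the relationship extraction module 303 performs the relationship prediction by: using the encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result for each entity of the target entity pair; using the pooling layer of the relationship prediction model to pool the encoding result of each entity, to obtain a pooling result for each entity; and using the classification layer of the relationship prediction model to perform a classification operation on the pooling results, to obtain the relationship between the entities of the target entity pair.
In an optional implementation, the determining module 304 determines the standard name corresponding to each entity included in each of the multiple relationship pairs by: matching each first-type entity in the multiple relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each first-type entity; and determining the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type.
In an optional implementation, the relationship extraction module 303 matches each first-type entity in the multiple relationship pairs against the standard names included in the database by: computing, with a short text matching model, the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database; and determining from the database, according to those relationship coefficients, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.
In an optional implementation, the data analysis apparatus further includes an analysis module 306.
In an optional implementation, the analysis module 306 is configured to: perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity; determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name; and determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
As can be seen, in the embodiment shown in FIG. 3, the data analysis apparatus can obtain public opinion data and perform entity extraction on it to obtain multiple entities; it can then perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs, determine the standard name corresponding to each entity included in each relationship pair, and map the relationship between the entities of each relationship pair to a relationship between the corresponding standard names. This process can extract effective information from public opinion data to discover potential connections between things.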
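Refer to FIG. 4, which is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device described in this embodiment may include one or more processors 1000 and a memory 2000. The processor 1000 and the memory 2000 may be connected through a bus or the like.
The processor 1000 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 2000 may be a high-speed RAM memory, or a non-volatile memory, such as a disk memory. The memory 2000 is used to store a set of program codes, and the processor 1000 can call the program codes stored in the memory 2000. Specifically: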
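The processor 1000 is configured to: obtain public opinion data; perform entity extraction on the public opinion data to obtain multiple entities; perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs; determine the standard name corresponding to each entity included in each of the multiple relationship pairs; and map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to those entities.
In one embodiment, the processor 1000 is specifically configured to: encode multiple words included in the public opinion data to obtain a first word vector set, which includes a word vector of each of the multiple words; perform vocabulary enhancement on the first word vector set to obtain a second word vector set; and perform entity recognition based on the second word vector set to obtain multiple entities.
In one embodiment, the processor 1000 is further specifically configured to: obtain a target entity pair from the multiple entities; determine, from the public opinion data, a target sentence that includes the target entity pair, and mark the position information of each entity of the target entity pair in the target sentence; input the target sentence and that position information into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair; and construct a target relationship pair from the target entity pair and the relationship between its entities, to obtain multiple relationship pairs that include the target relationship pair.
In one embodiment, the processor 1000 is further specifically configured to: use the encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result for each entity of the target entity pair; use the pooling layer of the relationship prediction model to pool the encoding result of each entity, to obtain a pooling result for each entity; and use the classification layer of the relationship prediction model to perform a classification operation on the pooling results, to obtain the relationship between the entities of the target entity pair.
In one embodiment, the processor 1000 is further specifically configured to: match each first-type entity in the multiple relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each first-type entity; and determine the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type.
In one embodiment, the processor 1000 is further specifically configured to: compute, with a short text matching model, the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database; and determine from the database, according to those relationship coefficients, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.
In one embodiment, the processor 1000 is further specifically configured to: perform sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity; determine the target standard name corresponding to the target entity and the other standard names associated with the target standard name; and determine, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.
In a specific implementation, the processor 1000 described in the embodiments of the present application may execute the implementations described in the embodiments of FIG. 1 and FIG. 2, and may also execute the other implementations described in the embodiments of the present application, which are not repeated here.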
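Embodiments of the present application further provide a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement the steps of the methods in the foregoing embodiments, or the functions of each module of the apparatus in the foregoing embodiments, which are not repeated here. Optionally, the storage medium involved in the present application, such as a computer-readable storage medium, may be non-volatile or volatile.
The functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules can be implemented in the form of hardware or in the form of software function modules.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The computer-readable storage medium may be volatile or non-volatile. For example, the computer storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM). The computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created through the use of a blockchain node, and the like.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
What is disclosed above is merely a preferred embodiment of the present application and certainly cannot be used to limit the scope of rights of the present application. A person of ordinary skill in the art can understand and implement all or part of the processes of the above embodiments, and equivalent changes made in accordance with the claims of the present application still fall within the scope covered by the present application.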

Claims (20)

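  1. A data analysis method, comprising:
    obtaining public opinion data;
    performing entity extraction on the public opinion data to obtain multiple entities;
    performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
    determining the standard name corresponding to each entity included in each of the multiple relationship pairs;
    mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.
  2. The method according to claim 1, wherein performing entity extraction on the public opinion data to obtain multiple entities comprises:
    encoding multiple words included in the public opinion data to obtain a first word vector set, the first word vector set including a word vector of each of the multiple words;
    performing vocabulary enhancement on the first word vector set to obtain a second word vector set;
    performing entity recognition based on the second word vector set to obtain multiple entities.
  3. The method according to claim 1, wherein performing relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs comprises:
    obtaining a target entity pair from the multiple entities;
    determining, from the public opinion data, a target sentence that includes the target entity pair, and marking the position information of each entity of the target entity pair in the target sentence;
    inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair;
    constructing a target relationship pair from the target entity pair and the relationship between the entities of the target entity pair, and obtaining multiple relationship pairs that include the target relationship pair.
  4. The method according to claim 3, wherein inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair, comprises:
    using an encoding layer of the relationship prediction model to perform encoding according to the target sentence and the position information of each entity of the target entity pair, to obtain an encoding result for each entity of the target entity pair;
    using a pooling layer of the relationship prediction model to pool the encoding result of each entity of the target entity pair, to obtain a pooling result for each entity of the target entity pair;
    using a classification layer of the relationship prediction model to perform a classification operation on the pooling result of each entity of the target entity pair, to obtain the relationship between the entities of the target entity pair.
  5. The method according to claim 1, wherein determining the standard name corresponding to each entity included in each of the multiple relationship pairs comprises:
    matching each first-type entity in the multiple relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each first-type entity;
    determining the standard name corresponding to each second-type entity in the multiple relationship pairs according to a correspondence between second-type entities and standard names, the first type being different from the second type.
  6. The method according to claim 5, wherein matching each first-type entity in the multiple relationship pairs against the standard names included in the database, to determine from the database the standard name corresponding to each first-type entity, comprises:
    computing, with a short text matching model, the relationship coefficient between each first-type entity in the multiple relationship pairs and each standard name included in the database;
    determining from the database, according to the relationship coefficients between each first-type entity and each standard name included in the database, the standard name whose relationship coefficient with each first-type entity is greater than or equal to a preset value, as the standard name corresponding to that first-type entity.
  7. The method according to claim 1, wherein the method further comprises:
    performing sentiment polarity analysis on a target sentence in the public opinion data to obtain the target entity included in the target sentence and the sentiment polarity label of the target entity;
    determining the target standard name corresponding to the target entity and the other standard names associated with the target standard name;
    determining, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.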
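  8. A data analysis apparatus, comprising:
    an obtaining module, configured to obtain public opinion data;
    an entity extraction module, configured to perform entity extraction on the public opinion data to obtain multiple entities;
    a relationship extraction module, configured to perform relationship extraction on the multiple entities according to the public opinion data to obtain multiple relationship pairs;
    a determining module, configured to determine the standard name corresponding to each entity included in each of the multiple relationship pairs;
    a mapping module, configured to map the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to the entities included in that relationship pair.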
  9. 一种计算机设备,包括处理器和存储器,所述处理器和所述存储器相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行以下方法:
    获取舆情数据;
    对所述舆情数据进行实体抽取,得到多个实体;
    根据所述舆情数据对所述多个实体进行关系抽取,得到多个关系对;
    确定所述多个关系对中每个关系对包括的各实体对应的标准命名;
    将所述每个关系对包括的各实体间的关系映射为所述每个关系对包括的各实体对应的标准命名间的关系。
  10. 根据权利要求9所述的计算机设备,其中,执行所述对所述舆情数据进行实体抽取,得到多个实体,包括:
    对所述舆情数据包括的多个词进行编码,得到第一词向量集合,所述第一词向量集合包括所述多个词中每个词的词向量;
    对所述第一词向量集合进行词汇增强,得到第二词向量集合;
    基于所述第二词向量集合进行实体识别,得到多个实体。
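  11. The computer device according to claim 9, wherein performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs comprises:
    obtaining a target entity pair from the plurality of entities;
    determining, from the public opinion data, a target sentence that includes the target entity pair, and annotating position information of each entity of the target entity pair in the target sentence;
    inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair;
    constructing a target relationship pair from the target entity pair and the relationship between its entities, and obtaining a plurality of relationship pairs that include the target relationship pair.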
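  12. The computer device according to claim 11, wherein inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair, comprises:
    encoding, by an encoding layer of the relationship prediction model, according to the target sentence and the position information of each entity of the target entity pair in the target sentence, to obtain an encoding result for each entity of the target entity pair;
    pooling, by a pooling layer of the relationship prediction model, the encoding result of each entity of the target entity pair, to obtain a pooling result for each entity of the target entity pair;
    performing, by a classification layer of the relationship prediction model, a classification operation on the pooling result of each entity of the target entity pair, to obtain the relationship between the entities of the target entity pair.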
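  13. The computer device according to claim 9, wherein determining the standard name corresponding to each entity included in each of the plurality of relationship pairs comprises:
    matching each entity of a first type in the plurality of relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each entity of the first type;
    determining the standard name corresponding to each entity of a second type in the plurality of relationship pairs according to a correspondence between entities of the second type and standard names, the first type being different from the second type.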
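  14. The computer device according to claim 9, wherein the processor is further configured to perform:
    performing sentiment polarity analysis on a target sentence in the public opinion data, to obtain a target entity included in the target sentence and a sentiment polarity label of the target entity;
    determining a target standard name corresponding to the target entity and other standard names associated with the target standard name;
    determining, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.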
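  15. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the following method:
    acquiring public opinion data;
    performing entity extraction on the public opinion data to obtain a plurality of entities;
    performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs;
    determining the standard name corresponding to each entity included in each of the plurality of relationship pairs;
    mapping the relationship between the entities included in each relationship pair to a relationship between the standard names corresponding to those entities.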
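  16. The computer-readable storage medium according to claim 15, wherein performing entity extraction on the public opinion data to obtain a plurality of entities comprises:
    encoding a plurality of words included in the public opinion data to obtain a first word vector set, the first word vector set including a word vector of each of the plurality of words;
    performing lexical enhancement on the first word vector set to obtain a second word vector set;
    performing entity recognition based on the second word vector set to obtain a plurality of entities.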
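  17. The computer-readable storage medium according to claim 15, wherein performing relationship extraction on the plurality of entities according to the public opinion data to obtain a plurality of relationship pairs comprises:
    obtaining a target entity pair from the plurality of entities;
    determining, from the public opinion data, a target sentence that includes the target entity pair, and annotating position information of each entity of the target entity pair in the target sentence;
    inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into a relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair;
    constructing a target relationship pair from the target entity pair and the relationship between its entities, and obtaining a plurality of relationship pairs that include the target relationship pair.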
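  18. The computer-readable storage medium according to claim 17, wherein inputting the target sentence and the position information of each entity of the target entity pair in the target sentence into the relationship prediction model for relationship prediction, to obtain the relationship between the entities of the target entity pair, comprises:
    encoding, by an encoding layer of the relationship prediction model, according to the target sentence and the position information of each entity of the target entity pair in the target sentence, to obtain an encoding result for each entity of the target entity pair;
    pooling, by a pooling layer of the relationship prediction model, the encoding result of each entity of the target entity pair, to obtain a pooling result for each entity of the target entity pair;
    performing, by a classification layer of the relationship prediction model, a classification operation on the pooling result of each entity of the target entity pair, to obtain the relationship between the entities of the target entity pair.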
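  19. The computer-readable storage medium according to claim 15, wherein determining the standard name corresponding to each entity included in each of the plurality of relationship pairs comprises:
    matching each entity of a first type in the plurality of relationship pairs against the standard names included in a database, to determine from the database the standard name corresponding to each entity of the first type;
    determining the standard name corresponding to each entity of a second type in the plurality of relationship pairs according to a correspondence between entities of the second type and standard names, the first type being different from the second type.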
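  20. The computer-readable storage medium according to claim 15, wherein the computer program, when executed by the processor, further implements:
    performing sentiment polarity analysis on a target sentence in the public opinion data, to obtain a target entity included in the target sentence and a sentiment polarity label of the target entity;
    determining a target standard name corresponding to the target entity and other standard names associated with the target standard name;
    determining, according to the sentiment polarity label of the target entity, the impact of the public opinion data on the target standard name corresponding to the target entity and on the other standard names.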
PCT/CN2021/097114 2021-04-27 2021-05-31 Data analysis method, apparatus, computer device, and storage medium WO2022227196A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110459121.2A CN113157866B (zh) 2021-04-27 2021-04-27 Data analysis method, apparatus, computer device, and storage medium
CN202110459121.2 2021-04-27

Publications (1)

Publication Number Publication Date
WO2022227196A1 (zh) 2022-11-03

Family

ID=76871468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097114 WO2022227196A1 (zh) 2021-04-27 2021-05-31 Data analysis method, apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN113157866B (zh)
WO (1) WO2022227196A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114386422B (zh) * 2022-01-14 2023-09-15 淮安市创新创业科技服务中心 Intelligent decision-support method and apparatus based on extraction of enterprise pollution public opinion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
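CN109635074A (zh) * 2018-11-13 2019-04-16 平安科技(深圳)有限公司 Entity relationship analysis method and terminal device based on public opinion information
US20190155898A1 (en) * 2017-11-23 2019-05-23 Beijing Baidu Netcom Science And Technology Co. Ltd. Method and device for extracting entity relation based on deep learning, and server
CN110781683A (zh) * 2019-11-04 2020-02-11 河海大学 Joint entity-relationship extraction method
CN110990525A (zh) * 2019-11-15 2020-04-10 华融融通(北京)科技有限公司 Public opinion information extraction and knowledge base generation method based on natural language processing
CN112395410A (zh) * 2021-01-13 2021-02-23 北京智源人工智能研究院 Industry public opinion recommendation method and apparatus based on entity extraction, and electronic device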

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
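CN104199972B (zh) * 2013-09-22 2018-08-03 中科嘉速(北京)信息技术有限公司 Named entity relationship extraction and construction method based on deep learning
CN108170742A (zh) * 2017-12-19 2018-06-15 百度在线网络技术(北京)有限公司 Picture public opinion acquisition method and apparatus, computer device, and storage medium
CN110633373B (zh) * 2018-06-20 2023-06-09 上海财经大学 Automobile public opinion analysis method based on knowledge graph and deep learning
CN110837568A (zh) * 2019-11-26 2020-02-25 精硕科技(北京)股份有限公司 Entity alignment method and apparatus, electronic device, and storage medium
CN111104524B (zh) * 2019-12-25 2024-06-21 北京航天云路有限公司 Method for identifying a TV-end user set
CN112131881B (zh) * 2020-09-27 2023-11-21 腾讯科技(深圳)有限公司 Information extraction method and apparatus, electronic device, and storage medium
CN112256828B (zh) * 2020-10-20 2023-08-08 平安科技(深圳)有限公司 Medical entity relationship extraction method and apparatus, computer device, and readable storage medium
CN112257422B (zh) * 2020-10-22 2024-06-11 京东方科技集团股份有限公司 Named entity normalization processing method and apparatus, electronic device, and storage medium
CN112347759A (zh) * 2020-11-10 2021-02-09 华夏幸福产业投资有限公司 Entity relationship extraction method, apparatus, device, and storage medium
CN112613306A (zh) * 2020-12-31 2021-04-06 恒安嘉新(北京)科技股份公司 Method and apparatus for extracting entity relationships, electronic device, and storage medium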

Also Published As

Publication number Publication date
CN113157866B (zh) 2024-05-14
CN113157866A (zh) 2021-07-23

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21938666; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21938666; Country of ref document: EP; Kind code of ref document: A1)