CN111177391A - Method and device for acquiring social public opinion volume and computer-readable storage medium - Google Patents

Method and device for acquiring social public opinion volume and computer-readable storage medium

Publication number
CN111177391A
Authority
CN
China
Prior art keywords
entity
entity word
word
public opinion
social public
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911409854.4A
Other languages
Chinese (zh)
Other versions
CN111177391B (en)
Inventor
付骁弈
吴信东
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201911409854.4A
Publication of CN111177391A
Application granted
Publication of CN111177391B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The embodiment of the invention discloses a method and a device for acquiring social public opinion volume and a computer-readable storage medium. The method comprises the following steps: for each piece of social public opinion text, determining all first entity words in the social public opinion text; acquiring, from an expert knowledge base, a second entity word which represents the same entity as a certain first entity word, and adding 1 to the number of times the entity corresponding to the second entity word is mentioned; wherein the expert knowledge base comprises entity words representing known entities. The embodiment of the invention collects the entity words representing known entities in the expert knowledge base, and treats any entity word in the text that represents the same entity as an entity word in the expert knowledge base as a mention of that entity. The number of times the entity is mentioned can therefore be increased by 1 without relying on inclusion words, exclusion words, and the corresponding rules, which improves the precision with which the number of mentions is calculated.

Description

Method and device for acquiring social public opinion volume and computer-readable storage medium
Technical Field
Embodiments of the present invention relate to, but are not limited to, data processing technologies, and in particular to a method and an apparatus for obtaining social public opinion volume and a computer-readable storage medium.
Background
Social public opinion text refers to, for example, microblog posts, WeChat official account articles, Xiaohongshu (Little Red Book) posts, and the like. The social public opinion volume includes any combination of one or more of the following: the number of times that a certain entity is mentioned in the social public opinion text, and the number of times that two entities are mentioned together in the social public opinion text.
At present, a method for obtaining social public opinion volume roughly comprises: defining a series of inclusion words and exclusion words for an entity, defining a series of rules according to the times and positions of the inclusion words and the exclusion words appearing in the social public opinion text to judge whether the entity is mentioned, and calculating the number of the social public opinion text for referring to the entity, namely the times of referring to the entity; and calculating the number of the social public opinion texts which refer to a certain two entities together, namely the number of times that the two entities are referred to together.
In the above method, the precision of the finally obtained social public opinion volume depends on the inclusion words and exclusion words defined for the entity, and that precision is low for the following reasons:
(1) new subject words that are not included in the inclusion and exclusion words or outside the rules cannot be identified;
(2) the defined inclusion words and exclusion words have synonyms, and any synonym that is not itself listed among the inclusion words and exclusion words is left out of the calculation;
(3) the defined inclusion words and exclusion words may be ambiguous in actual social public opinion text. Such ambiguity includes: lexical ambiguity, e.g., "apple" may refer to a fruit or a company; grammatical ambiguity, e.g., "arrange to work" can be segmented as "arrange | work" or "arrange to work"; semantic ambiguity, e.g., in "chicken does not eat", "chicken" can be read as either the subject or the object; and language ambiguity, e.g., 'going round this year, and the woman will not wear the trousers uniformly', where the lack of context easily leads to an embarrassing misunderstanding. If the real meaning of a word in its context cannot be determined, the number of times that a certain entity is mentioned cannot be calculated accurately;
(4) in some cases, two entities can be regarded as mentioned together only when a relationship exists between them, and determining whether that relationship exists requires context-based relational reasoning in order to accurately calculate the number of times the two entities are mentioned together; for example, a white bottle and a black bottle are good products, but the latter is not good products.
Disclosure of Invention
The embodiment of the invention provides a method and a device for acquiring social public opinion volume and a computer-readable storage medium, which can improve the precision.
The embodiment of the invention provides a method for acquiring social public opinion volume, which comprises the following steps:
for each piece of social public opinion text, determining all first entity words in the social public opinion text;
acquiring a second entity word which represents the same entity with a certain first entity word in an expert knowledge base, and adding 1 to the number of times of referring to the entity corresponding to the second entity word; wherein, the expert knowledge base comprises entity words representing known entities.
In an embodiment of the present invention, the determining all first entity words in the social public opinion text includes:
and carrying out sequence marking on the social public opinion text to obtain all first entity words in the social public opinion text.
In this embodiment of the present invention, the obtaining a second entity word in the expert knowledge base, where the second entity word represents the same entity as a certain first entity word, includes:
acquiring all third entity words in the expert knowledge base, wherein the similarity between the third entity words and a certain first entity word is greater than or equal to a preset threshold;
and taking a third entity word with the highest relevance with the first entity word as the second entity word.
In the embodiment of the present invention, the similarity between the first entity word and the third entity word is calculated according to the following information:
the first entity word, the context of the first entity word in the social public opinion text, the second entity word, and the attribute of the second entity word.
In the embodiment of the invention, the entity words representing the known entities are standard entity words representing the known entities.
The embodiment of the invention provides a method for acquiring social public opinion volume, which comprises the following steps:
for each piece of social public opinion text, when the social public opinion text refers to an entity corresponding to a second entity word and an entity corresponding to a fourth entity word in an expert knowledge base, obtaining a relation between the second entity word and the fourth entity word; wherein the expert knowledge base includes entity words representing known entities;
acquiring one or more sentences containing fifth entity words and sixth entity words from the social public opinion text; wherein the fifth entity word and the second entity word represent the same entity, and the fourth entity word and the sixth entity word represent the same entity;
adding 1 to the number of times of referring to the entity corresponding to the second entity word and the entity corresponding to the fourth entity word together when there is an obtained relationship between the fifth entity word and the sixth entity word in at least one obtained sentence.
In an embodiment of the present invention, before obtaining one or more sentences including a fifth entity word and a sixth entity word from social public opinion text, the method further includes: performing reference resolution processing on the social public opinion text;
the obtaining one or more sentences containing fifth entity words and sixth entity words from social public opinion texts comprises:
and acquiring one or more sentences comprising fifth entity words and sixth entity words from the social public opinion text after the reference resolution processing.
In an embodiment of the present invention, the performing reference resolution processing on the social public opinion text includes:
determining a seventh entity word corresponding to each pronoun in the social public opinion text by adopting a reference resolution model;
for each pronoun, when an entity word representing the same entity as the seventh entity word exists in the expert knowledge base, replacing the pronoun with the seventh entity word.
In the embodiment of the invention, the expert knowledge base also comprises the types of the entity words and the relations among different types;
the obtaining of the relationship between the second entity word and the fourth entity word includes:
acquiring the type of the second entity word and the type of the fourth entity word from the expert knowledge base;
and acquiring the relationship between the type of the second entity word and the type of the fourth entity word from the expert knowledge base as the relationship between the fifth entity word and the sixth entity word.
In the embodiment of the present invention, it is determined whether there is an obtained relationship between the fifth entity word and the sixth entity word in the obtained sentence according to the following information:
the fifth entity word, the context of the fifth entity word, the sixth entity word, the context of the sixth entity word, and the obtained relationship.
The embodiment of the invention provides a device for obtaining social public opinion sound volume, which comprises a processor and a computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, the method for obtaining social public opinion sound volume according to any one of claims 1 to 10 is implemented.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above methods for obtaining social public sentiment volume.
One embodiment of the invention comprises: for each piece of social public opinion text, determining all first entity words in the social public opinion text; acquiring a second entity word which represents the same entity with a certain first entity word in an expert knowledge base, and adding 1 to the number of times of referring to the entity corresponding to the second entity word; wherein, the expert knowledge base comprises entity words representing known entities. The embodiment of the invention collects the entity words representing the known entities in the expert knowledge base, and then considers that the entity words representing the same entity as any entity word in the expert knowledge base refer to the entity, namely, the number of times of referring to the entity can be increased by 1 without depending on the inclusion words, the exclusion words and the corresponding rules, thereby improving the calculation precision of the number of times of referring to the entity.
Another embodiment of the invention comprises: for each piece of social public opinion text, when the social public opinion text refers to an entity corresponding to a second entity word and an entity corresponding to a fourth entity word in an expert knowledge base, obtaining a relation between the second entity word and the fourth entity word; wherein the expert knowledge base includes entity words representing known entities; acquiring one or more sentences containing fifth entity words and sixth entity words from the social public opinion text; wherein the fifth entity word and the second entity word represent the same entity, and the fourth entity word and the sixth entity word represent the same entity; adding 1 to the number of times of referring to the entity corresponding to the second entity word and the entity corresponding to the fourth entity word together when there is an obtained relationship between the fifth entity word and the sixth entity word in at least one obtained sentence. According to the embodiment of the invention, whether a corresponding relation exists between two entity words in a sentence in the social public opinion text is judged, if so, the social public opinion text is determined to refer to the entities corresponding to the two entity words together, that is, the number of times of referring to the entities corresponding to the two entity words is increased by 1, so that the calculation accuracy of the number of times of referring to the entities corresponding to the two entity words together is improved.
Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of embodiments of the invention. The objectives and other advantages of the embodiments of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the examples of the invention serve to explain the principles of the embodiments of the invention and not to limit the embodiments of the invention.
Fig. 1 is a flowchart illustrating a method for obtaining social public opinion volume according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating a method for obtaining social public opinion volume according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for obtaining social public opinion volume according to another embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for obtaining social public opinion volume according to another embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments of the present invention may be arbitrarily combined with each other without conflict.
The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
Referring to fig. 1, an embodiment of the present invention provides a method for obtaining social public opinion volume, including:
step 100, for each piece of social public opinion text, determining all first entity words in the social public opinion text.
In one illustrative example, the entity word is the name of the entity, or the product name of the entity.
In an illustrative example, all first entity words in the social public opinion text can be obtained by adopting a mode of carrying out sequence annotation on the social public opinion text.
In an exemplary example, the social public opinion text may be sequence labeled using any sequence labeling method well known to those skilled in the art; the specific labeling method does not limit the protection scope of the embodiment of the present invention. For example, a BIO sequence labeling method may be used, in which each character in the social public opinion text is labeled with one of B-X, I-X, and O, where B-X indicates that the segment containing the character belongs to type X (e.g., an entity word) and the character is at the beginning of the segment, I-X indicates that the segment containing the character belongs to type X (e.g., an entity word) and the character is in the middle of the segment, and O indicates that the character does not belong to any type. For example, in a certain social public opinion text "… … successfully followed 47.24% of the stock right into a contact security with 30 years of history, Guangzhou development area financial holdings group, Inc. wanted a name with Guangzhou characteristics: a security in a knowledge city.", "contact securities" can be labeled as: B-subj I-subj I-subj I-subj, "Guangzhou development area financial holdings group, Inc." can be labeled as: B-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj, and the other characters are labeled O. After labeling, each continuous character string that begins with a character labeled B-subj and continues with one or more consecutive characters labeled I-subj is taken as a first entity word in the social public opinion text.
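The extraction of entity words from BIO labels described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_entities` and the toy input are hypothetical, the labels are assumed to be already produced by some sequence labeling model, and, as a simplification, a B-X label with no following I-X label is also accepted as a one-character entity word.

```python
def extract_entities(chars, tags, label="subj"):
    """Collect entity words from parallel lists of characters and BIO tags."""
    entities = []
    current = []  # characters of the entity word currently being built
    for ch, tag in zip(chars, tags):
        if tag == f"B-{label}":
            # A new entity word starts; flush the one in progress, if any.
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == f"I-{label}" and current:
            # Continue the entity word that was opened by a B- tag.
            current.append(ch)
        else:
            # "O" (or a stray I- tag without a preceding B- tag) closes the span.
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities


# Toy input (hypothetical): two labeled spans inside an eight-character string.
chars = list("xABCyyDE")
tags = ["O", "B-subj", "I-subj", "I-subj", "O", "O", "B-subj", "I-subj"]
first_entity_words = extract_entities(chars, tags)
```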
Step 101, acquiring a second entity word which represents the same entity with a certain first entity word in an expert knowledge base, and adding 1 to the number of times of referring to the entity corresponding to the second entity word; wherein, the expert knowledge base comprises entity words representing known entities.
In one illustrative example, an entity word representing a known entity is a standard entity word representing a known entity.
In an exemplary embodiment, obtaining a second entity word in the expert knowledge base that represents the same entity as a certain first entity word comprises:
acquiring all third entity words in the expert knowledge base, wherein the similarity between the third entity words and a certain first entity word is greater than or equal to a preset threshold; and taking a third entity word with the highest relevance with the first entity word as the second entity word.
In another exemplary example, when the similarity between a certain first entity word and all entity words in the expert knowledge base is less than a preset threshold, it is determined that the first entity word represents a new entity.
In one illustrative example, the similarity between the first entity word and the third entity word is calculated based on: the first entity word, the context of the first entity word in the social public opinion text, the second entity word, and the attribute of the second entity word.
In one illustrative example, the attribute of the second entity word may be a brand, a category, a composition, a potency, a scene, a need, a consumer, etc. of the entity to which the second entity word corresponds.
In an illustrative example, the context of a certain entity word in the social public opinion text may be a character string in the social public opinion text, wherein the distance between the certain entity word and the position of the certain entity word is less than or equal to a preset threshold.
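One way to read this definition of context is as a fixed-size character window on either side of the entity word. The sketch below assumes that reading; the function name `context_window`, the parameter `max_distance`, and the sample sentence are hypothetical.

```python
def context_window(text, start, end, max_distance):
    """Return the characters within max_distance of the entity word text[start:end]."""
    left = text[max(0, start - max_distance):start]   # characters before the entity word
    right = text[end:end + max_distance]               # characters after the entity word
    return left, right


# Toy usage (hypothetical text and entity word).
text = "I love the new Acme phone a lot"
start = text.index("Acme")
left, right = context_window(text, start, start + len("Acme"), max_distance=4)
```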
In an exemplary example, an Elasticsearch (ES) search server may be used to obtain, from the expert knowledge base, all third entity words whose similarity to a certain first entity word is greater than or equal to a preset threshold. Obtaining the third entity words in this way selects the entity words in the expert knowledge base that may represent the same entity as the first entity word, and thereby reduces the amount of computation in the subsequent relevance calculation.
In an exemplary example, all the third entity words may be ranked in relevance to the first entity word by using an entity ranking model (e.g., a search ranking algorithm such as BM25, or a ranking learning method) well known to those skilled in the art, and the third entity word with the highest ranking is the third entity word with the highest relevance to the first entity word.
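Steps 100 and 101 can be sketched end to end as follows. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for both the ES candidate retrieval and the ranking model (the source uses two separate stages and also takes context and attributes into account), the threshold 0.6 and all entity names are hypothetical, and an entity is counted at most once per text, following the background's definition of the mention count as a number of texts.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in score (assumption): plain string similarity, ignoring case.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(first_word, knowledge_base, threshold=0.6):
    """Return the knowledge-base entity word that best matches first_word, or None."""
    scored = [(similarity(first_word, kb_word), kb_word) for kb_word in knowledge_base]
    candidates = [pair for pair in scored if pair[0] >= threshold]  # the "third entity words"
    if not candidates:
        return None  # below the threshold for every entry: treated as a new entity
    return max(candidates)[1]  # highest-scoring candidate becomes the "second entity word"


def count_mentions(entity_words_per_text, knowledge_base, threshold=0.6):
    """Count, for each known entity, the number of texts that mention it."""
    counts = Counter()
    for first_words in entity_words_per_text:
        linked = {link_entity(w, knowledge_base, threshold) for w in first_words}
        linked.discard(None)
        for entity in linked:
            counts[entity] += 1  # add 1 per text that mentions the entity
    return counts


# Toy usage (hypothetical knowledge base and already-extracted first entity words).
knowledge_base = ["Acme Corporation", "Bolt Motors"]
texts = [["Acme Corp", "acme corporation"], ["Bolt Motor"], ["Zeta"]]
mention_counts = count_mentions(texts, knowledge_base)
```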
The embodiment of the invention collects the entity words representing the known entities in the expert knowledge base, and then considers that the entity words representing the same entity as any entity word in the expert knowledge base refer to the entity, namely, the number of times of referring to the entity can be increased by 1 without depending on the inclusion words, the exclusion words and the corresponding rules, thereby improving the calculation precision of the number of times of referring to the entity.
Referring to fig. 2, another embodiment of the present invention provides a method for obtaining social public opinion volume, including:
Step 200, for each piece of social public opinion text, when the social public opinion text refers to an entity corresponding to a second entity word and an entity corresponding to a fourth entity word in an expert knowledge base, obtaining a relation between the second entity word and the fourth entity word; wherein the expert knowledge base includes entity words representing known entities.
In one illustrative example, an entity word representing a known entity is a standard entity word representing a known entity.
In one illustrative example, determining whether the social public opinion text refers to an entity corresponding to the second entity word includes:
determining all first entity words in the social public opinion text; when at least one first entity word represents the same entity as the second entity word, determining that the social public opinion text refers to the entity corresponding to the second entity word; when none of the first entity words represents the same entity as the second entity word, determining that the social public opinion text does not refer to the entity corresponding to the second entity word.
In one illustrative example, determining whether the social public opinion text refers to an entity corresponding to the fourth entity word includes:
determining all first entity words in the social public opinion text; when at least one first entity word represents the same entity as the fourth entity word, determining that the social public opinion text refers to the entity corresponding to the fourth entity word; when none of the first entity words represents the same entity as the fourth entity word, determining that the social public opinion text does not refer to the entity corresponding to the fourth entity word.
The specific implementation process for determining all the first entity words in the social public opinion text is the same as the specific implementation process of step 100 in the foregoing embodiment, and is not described herein again.
The specific implementation process of determining whether the first entity word and the second entity word or the fourth entity word represent the same entity is the same as the specific implementation process of step 101 in the foregoing embodiment, and is not described here again.
In one illustrative example, the expert knowledge base further includes types to which the entity words belong and relationships between the different types; the obtaining of the relationship between the second entity word and the fourth entity word includes:
acquiring the type of the second entity word and the type of the fourth entity word from the expert knowledge base; and acquiring the relationship between the type of the second entity word and the type of the fourth entity word from the expert knowledge base as the relationship between the fifth entity word and the sixth entity word.
In one illustrative example, the types to which entity words included in the expert knowledge base belong and the relationships between the different types may be represented using an ontology model.
In one illustrative example, the ontology model includes an entity model and a relationship model.
In one illustrative example, the entity model uses a knowledge graph to represent how many types of entity words are included in the expert knowledge base, what the name of each type of entity word is, and the entity words in the entity model include, but are not limited to, entity words of the following types: brand, category, product, ingredient, efficacy, scenario, need, consumer, etc.
In one illustrative example, the relationship model uses a knowledge graph to represent the names of the relationships that may exist between the various types of entity words, including, but not limited to, the following relationships: including, having, developing, using, solving, providing, etc.
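A type-level relationship lookup of this kind can be sketched with two dictionaries. The data structures, the function name `expected_relation`, and the sample entries are all hypothetical; the source describes a knowledge graph and does not prescribe a storage format.

```python
# Entity model (assumption): maps each known entity word to its type.
ENTITY_TYPES = {
    "Acme": "brand",
    "Acme Serum": "product",
    "retinol": "ingredient",
}

# Relationship model (assumption): maps an ordered pair of types to a relationship name.
TYPE_RELATIONS = {
    ("brand", "product"): "developing",
    ("product", "ingredient"): "including",
}


def expected_relation(second_word, fourth_word):
    """Look up the relationship between the types of two known entity words."""
    type_a = ENTITY_TYPES.get(second_word)
    type_b = ENTITY_TYPES.get(fourth_word)
    if type_a is None or type_b is None:
        return None  # at least one word is not in the knowledge base
    return TYPE_RELATIONS.get((type_a, type_b))  # None if the types are unrelated
```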
Step 201, acquiring one or more sentences containing fifth entity words and sixth entity words from the social public opinion text; wherein the fifth entity word and the second entity word represent the same entity, and the fourth entity word and the sixth entity word represent the same entity.
Step 202, when there is an obtained relationship between the fifth entity word and the sixth entity word in at least one obtained sentence, adding 1 to the number of times of referring to the entity corresponding to the second entity word and the entity corresponding to the fourth entity word together.
In one illustrative example, it is determined whether an obtained relationship exists between a fifth entity word and a sixth entity word in the obtained sentence according to the following information:
the fifth entity word, the context of the fifth entity word, the sixth entity word, the context of the sixth entity word, and the obtained relationship.
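Steps 201 and 202 can be sketched as a loop over sentences. This is a simplified illustration: sentence splitting uses a basic punctuation rule, the mentions of each entity (the fifth and sixth entity words) are assumed to be already known, and `has_relation` is a hypothetical callable standing in for whatever relation classifier is used.

```python
import re
from collections.abc import Callable


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at ., !, ? or their full-width forms.
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def co_mentioned(text, mentions_a, mentions_b, relation, has_relation: Callable):
    """True if some sentence contains both entities and the relation holds in it."""
    for sentence in split_sentences(text):
        for a in mentions_a:          # fifth entity words
            for b in mentions_b:      # sixth entity words
                if a in sentence and b in sentence and has_relation(sentence, a, b, relation):
                    return True
    return False


def count_co_mentions(texts, mentions_a, mentions_b, relation, has_relation):
    """Number of texts in which the two entities are mentioned together."""
    return sum(co_mentioned(t, mentions_a, mentions_b, relation, has_relation) for t in texts)


# Toy usage with a keyword-based stand-in for the relation classifier (hypothetical).
def toy_has_relation(sentence, a, b, relation):
    return "launched" in sentence


texts = [
    "Acme launched Bolt today. Nice weather.",
    "Acme is big. Bolt is small.",
    "I like Acme and Bolt.",
]
together = count_co_mentions(texts, ["Acme"], ["Bolt"], "developing", toy_has_relation)
```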
In an exemplary example, the deep neural network model may be used, but is not limited to, to determine whether there is an obtained relationship between the fifth entity word and the sixth entity word in the obtained sentence.
For example, one possible implementation method includes:
converting each character in the obtained sentence into a fixed-length real-valued vector formed by concatenating three vectors: the first vector is the word vector corresponding to the character, which can be obtained by looking up a word vector table; the second vector is a fixed-length random vector obtained by mapping the distance between the character and the fifth entity word in the obtained sentence; the third vector is a fixed-length random vector obtained by mapping the distance between the character and the sixth entity word in the obtained sentence;
inputting the real-valued vectors corresponding to all characters into a unidirectional or bidirectional Long Short-Term Memory (LSTM) network to obtain a coding matrix corresponding to the obtained sentence;
mapping the obtained coding matrix corresponding to the sentence into a probability value of an obtained relation between a fifth entity word and a sixth entity word in the obtained sentence by using a feedforward neural network, and determining that the obtained relation exists between the fifth entity word and the sixth entity word in the obtained sentence when the probability value is greater than or equal to a preset threshold value; and when the probability value is smaller than a preset threshold value, determining that no obtained relation exists between the fifth entity word and the sixth entity word in the obtained sentence.
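The input representation and the final threshold decision can be sketched without a deep learning library. The sketch covers only the first and last stages: the LSTM encoder and the feedforward network are omitted, the character vectors are randomly initialized rather than read from a trained word vector table, and the dimensions, the default threshold of 0.5, and the function names are all assumptions.

```python
import random


def build_features(sentence, span_a, span_b, char_dim=4, pos_dim=2, seed=0):
    """Build one vector per character: char vector + two position vectors, concatenated."""
    rng = random.Random(seed)
    char_table, pos_table_a, pos_table_b = {}, {}, {}

    def lookup(table, key, dim):
        # Randomly initialize a vector the first time a key is seen, then reuse it.
        if key not in table:
            table[key] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[key]

    def distance(index, span):
        # Signed character distance to the entity span [start, end); 0 inside the span.
        start, end = span
        if index < start:
            return index - start
        if index >= end:
            return index - end + 1
        return 0

    features = []
    for index, char in enumerate(sentence):
        features.append(
            lookup(char_table, char, char_dim)
            + lookup(pos_table_a, distance(index, span_a), pos_dim)
            + lookup(pos_table_b, distance(index, span_b), pos_dim)
        )
    return features


def relation_exists(probability, threshold=0.5):
    """Decision rule applied to the probability produced by the (omitted) network."""
    return probability >= threshold


# Toy usage: "Acme" occupies characters 0-3 and "Bolt" characters 11-14.
features = build_features("Acme makes Bolt", span_a=(0, 4), span_b=(11, 15))
```

In a full model, `features` would be fed to the LSTM encoder, and the probability passed to `relation_exists` would come from the feedforward network.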
In another embodiment of the invention, before obtaining one or more sentences containing fifth entity words and sixth entity words from social public opinion text, the method further comprises: performing reference resolution processing on the social public opinion text;
the obtaining one or more sentences containing fifth entity words and sixth entity words from social public opinion texts comprises: and acquiring one or more sentences comprising fifth entity words and sixth entity words from the social public opinion text after the reference resolution processing.
In an illustrative example, the reference resolution processing of social public opinion text comprises:
determining a seventh entity word corresponding to each pronoun in the social public opinion text by adopting a reference resolution model; for each pronoun, replacing the pronoun with the seventh entity word when an entity word representing the same entity as the seventh entity word exists in the expert knowledge base; and leaving the pronoun unreplaced when no entity word representing the same entity as the seventh entity word exists in the expert knowledge base.
For example, in "Mary has eaten breakfast, she is happy.", "she" is replaced with "Mary" by the reference resolution processing.
In an exemplary embodiment, the specific reference resolution model is not used to limit the protection scope of the embodiment of the present invention, and is not described herein again.
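The conditional replacement rule can be sketched as follows, assuming the reference resolution model has already run. The `antecedents` mapping (token index to seventh entity word) is a hypothetical stand-in for that model's output, and `in_knowledge_base` is a hypothetical predicate for the knowledge-base check.

```python
def resolve_references(tokens, antecedents, in_knowledge_base):
    """Replace each resolved pronoun with its entity word if that entity is known."""
    resolved = []
    for index, token in enumerate(tokens):
        entity = antecedents.get(index)  # seventh entity word, if this token is a pronoun
        if entity is not None and in_knowledge_base(entity):
            resolved.append(entity)      # known entity: substitute it for the pronoun
        else:
            resolved.append(token)       # not a pronoun, or entity unknown: keep as is
    return resolved


# Toy usage based on the example sentence above.
tokens = ["Mary", "has", "eaten", "breakfast", ",", "she", "is", "happy", "."]
antecedents = {5: "Mary"}  # assumed output of a reference resolution model
known = {"Mary"}
resolved = resolve_references(tokens, antecedents, lambda e: e in known)
```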
According to the embodiment of the invention, whether a corresponding relation exists between two entity words in a sentence in the social public opinion text is judged, if so, the social public opinion text is determined to refer to the entities corresponding to the two entity words together, that is, the number of times of referring to the entities corresponding to the two entity words is increased by 1, so that the calculation accuracy of the number of times of referring to the entities corresponding to the two entity words together is improved.
The invention further provides a device for obtaining social public opinion sound volume, which comprises a processor and a computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are executed by the processor, the method for obtaining social public opinion sound volume is realized.
Another embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the above methods for obtaining social public opinion volume.
Referring to fig. 3, another embodiment of the present invention provides an apparatus for obtaining social public opinion volume, including:
the entity word recognition module 301 is configured to determine, for each piece of social public opinion text, all first entity words in the social public opinion text;
a first social public opinion sound volume calculation module 302, configured to obtain a second entity word that represents the same entity as one of the first entity words in an expert knowledge base, and add 1 to the number of times of referring to an entity corresponding to the second entity word; wherein, the expert knowledge base comprises entity words representing known entities.
In one illustrative example, an entity word representing a known entity is a standard entity word representing a known entity.
In one illustrative example, the entity word is the name of the entity, or the product name of the entity.
In an illustrative example, the entity word recognition module 301 may obtain all the first entity words in the social public opinion text by performing sequence tagging on the social public opinion text.
In an exemplary example, the entity word recognition module 301 may perform sequence tagging on the social public opinion text by using a sequence tagging method well known to those skilled in the art; the particular tagging method does not limit the protection scope of the embodiment of the present invention. For example, a BIO sequence tagging method may be used, in which each character in the social public opinion text is tagged with one of B-X, I-X, and O: B-X indicates that the segment containing the character belongs to type X (e.g., an entity word) and the character is at the beginning of the segment; I-X indicates that the segment containing the character belongs to type X (e.g., an entity word) and the character is in the middle of the segment; and O indicates that the character does not belong to any type. For example, in a certain social public opinion text "… … successfully followed 47.24% of the stock right into a contact security with 30 years of history, Guangzhou development area financial holdings group, Inc. wanted a name with Guangzhou characteristics: a security in a knowledge city.", "contact securities" can be labeled as: B-subj I-subj I-subj I-subj, "Guangzhou development area finance holdings group, Inc." can be labeled as: B-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj I-subj, and the other characters are labeled as O. After the tagging is finished, each continuous character string in the social public opinion text that begins with B-subj and is followed by one or more consecutive I-subj tags is taken as a first entity word in the social public opinion text.
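As a non-limiting illustration, the extraction of entity words from a BIO-tagged character sequence may be sketched in Python as follows. The function name `decode_bio`, the default label `subj`, and the toy input are assumptions made for this sketch; the tagging model that produces the tags is not shown. Following the description above, a span is kept only when a B tag is followed by at least one I tag.

```python
def decode_bio(chars, tags, label="subj"):
    """Collect character spans tagged B-<label> followed by one or more I-<label>."""
    begin, inside = f"B-{label}", f"I-{label}"
    entities = []
    current = []

    def flush():
        # Keep only spans with a B tag plus at least one I tag.
        if len(current) >= 2:
            entities.append("".join(current))
        current.clear()

    for ch, tag in zip(chars, tags):
        if tag == begin:
            flush()
            current.append(ch)
        elif tag == inside and current:
            current.append(ch)
        else:
            flush()
    flush()
    return entities


chars = list("abXYZcd")
tags = ["O", "O", "B-subj", "I-subj", "I-subj", "O", "O"]
print(decode_bio(chars, tags))
```

An I tag that does not follow a B tag is ignored in this sketch, which is a design choice rather than something the embodiment prescribes.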
In an exemplary example, the first social public opinion sound quantity calculating module 302 is specifically configured to obtain a second entity word representing the same entity as a certain first entity word in the expert knowledge base by:
acquiring all third entity words in the expert knowledge base, wherein the similarity between the third entity words and a certain first entity word is greater than or equal to a preset threshold; and taking a third entity word with the highest relevance with the first entity word as the second entity word.
In another exemplary example, when the similarity between a certain first entity word and all entity words in the expert knowledge base is less than a preset threshold, it is determined that the first entity word represents a new entity.
In an illustrative example, the first social public opinion volume calculation module 302 is further configured to: calculating a similarity between the first entity word and the third entity word according to the following information: the first entity word, the context of the first entity word in the social public opinion text, the second entity word, and the attribute of the second entity word.
In one illustrative example, the attribute of the second entity word may be a brand, a category, a composition, a potency, a scene, a need, a consumer, etc. of the entity to which the second entity word corresponds.
In an illustrative example, the context of a certain entity word in the social public opinion text may be a character string in the social public opinion text, wherein the distance between the certain entity word and the position of the certain entity word is less than or equal to a preset threshold.
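As a non-limiting illustration, such a context may be taken as the characters within a preset distance on either side of the entity word. The function name `context_window` and the parameter `max_distance` are assumptions made for this Python sketch.

```python
def context_window(text, start, end, max_distance):
    """Return the characters within max_distance before and after text[start:end]."""
    left = max(0, start - max_distance)
    right = min(len(text), end + max_distance)
    return text[left:start], text[end:right]


# The entity word occupies positions 4..5 of a toy string.
before, after = context_window("0123456789", 4, 6, 2)
print(before, after)
```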
In an exemplary example, the first social public opinion volume calculation module 302 may use the search server ES (Elasticsearch) to obtain, from the expert knowledge base, all third entity words whose similarity to a certain first entity word is greater than or equal to a preset threshold. The purpose of obtaining all the third entity words is to reduce the amount of computation in the subsequent relevance calculation and to select the entity words in the expert knowledge base that may represent the same entity as the first entity word.
In an exemplary example, the first social public opinion volume calculation module 302 may use an entity ranking model well known to those skilled in the art (e.g., a search ranking algorithm such as BM25, or a learning-to-rank method) to rank all the third entity words by their relevance to the first entity word, where the highest-ranked third entity word is the third entity word with the highest relevance to the first entity word.
The embodiment of the invention collects entity words representing known entities in the expert knowledge base and treats any entity word that represents the same entity as an entity word in the expert knowledge base as a reference to that entity, so that the number of times the entity is referred to is increased by 1. Because this does not depend on inclusion words, exclusion words, and corresponding rules, the accuracy of the calculated number of references to the entity is improved.
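As a non-limiting illustration, the candidate retrieval, ranking, and counting described above may be sketched in Python as follows. Several points are assumptions made for this sketch only: the string similarity from the standard library `difflib` stands in for both the similarity and the relevance calculation (the embodiment mentions Elasticsearch and ranking models such as BM25, which are not used here), the threshold value is arbitrary, and each entity is counted at most once per text.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    # Assumption: character-level similarity as a stand-in for the real measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(first_word, knowledge_base, threshold=0.6, relevance=similarity):
    """Return the knowledge-base entity word matched to first_word, or None."""
    candidates = [w for w in knowledge_base if similarity(first_word, w) >= threshold]
    if not candidates:
        return None  # below threshold for every entry: treated as a new entity
    return max(candidates, key=lambda w: relevance(first_word, w))


def count_mentions(texts_entity_words, knowledge_base, threshold=0.6):
    """Add 1 to an entity's count for every text that mentions it."""
    volume = Counter()
    for first_words in texts_entity_words:
        linked = {link_entity(w, knowledge_base, threshold) for w in first_words}
        linked.discard(None)
        volume.update(linked)
    return volume


kb = ["Acme Cola", "Globex Tea"]
texts = [["acme cola", "Acme Col"], ["Acme Cola", "Zzz"]]
print(count_mentions(texts, kb))
```

The two-stage structure (threshold filter, then ranking) mirrors the description; a real system would replace `similarity` and `relevance` with the retrieval and ranking components it actually uses.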
Referring to fig. 4, another embodiment of the present invention provides an apparatus for obtaining social public opinion volume, including:
a relationship obtaining module 401, configured to, for each social public opinion text, obtain a relationship between a second entity word and a fourth entity word when the social public opinion text refers to an entity corresponding to the second entity word and an entity corresponding to the fourth entity word in an expert knowledge base; wherein the expert knowledge base includes entity words representing known entities;
a sentence obtaining module 402, configured to obtain one or more sentences including a fifth entity word and a sixth entity word from the social public opinion text; wherein the fifth entity word and the second entity word represent the same entity, and the fourth entity word and the sixth entity word represent the same entity;
a second social public opinion sound volume calculating module 403, configured to add 1 to the number of times that an entity corresponding to the second entity word and an entity corresponding to the fourth entity word are referred to together when there is an obtained relationship between the fifth entity word and the sixth entity word in at least one obtained sentence.
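As a non-limiting illustration, the cooperation of the sentence obtaining module and the second calculation module may be sketched in Python as follows. The sentence splitter, the function names, and the keyword-based `keyword_relation` check are assumptions made for this sketch; in the embodiment the relationship check may be performed by a trained model, which is not shown.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: split on common Latin and CJK sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def count_co_mention(text, fifth_word, sixth_word, relation, has_relation, counts, pair_key):
    """Add 1 to counts[pair_key] if any sentence holds the relation between the two words."""
    sentences = [s for s in split_sentences(text) if fifth_word in s and sixth_word in s]
    if any(has_relation(s, fifth_word, sixth_word, relation) for s in sentences):
        counts[pair_key] += 1  # at most once per text
    return counts


def keyword_relation(sentence, a, b, relation):
    # Hypothetical stand-in for the relation classifier.
    return relation in sentence


counts = Counter()
pair = ("Acme", "Cola")
count_co_mention("Acme makes Cola. Cola is sweet. Acme is old.",
                 "Acme", "Cola", "makes", keyword_relation, counts, pair)
count_co_mention("Acme and Cola appear here.",
                 "Acme", "Cola", "makes", keyword_relation, counts, pair)
print(counts[pair])
```

Only the first text increments the count: the second text mentions both entity words in one sentence but the relationship is not found there.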
In an exemplary example, the relationship obtaining module 401 is specifically configured to determine whether the social public opinion text refers to an entity corresponding to the second entity word by using the following method:
determining all first entity words in the social public opinion text; when at least one first entity word and the second entity word represent the same entity, determining that the social public opinion text refers to the entity corresponding to the second entity word; and when every first entity word represents a different entity from the second entity word, determining that the social public opinion text does not refer to the entity corresponding to the second entity word.
In an exemplary example, the relationship obtaining module 401 is specifically configured to determine whether the social public opinion text refers to an entity corresponding to the fourth entity word by using the following method:
determining all first entity words in the social public opinion text; when at least one first entity word and the fourth entity word represent the same entity, determining that the social public opinion text refers to the entity corresponding to the fourth entity word; and when every first entity word represents a different entity from the fourth entity word, determining that the social public opinion text does not refer to the entity corresponding to the fourth entity word.
The specific implementation process for determining all the first entity words in the social public opinion text is the same as the specific implementation process of step 100 in the foregoing embodiment, and is not described herein again.
The specific implementation process of determining whether the first entity word and the second entity word or the fourth entity word represent the same entity is the same as the specific implementation process of step 101 in the foregoing embodiment, and is not described here again.
In one illustrative example, the expert knowledge base further includes types to which the entity words belong and relationships between the different types; the relationship obtaining module 401 is specifically configured to implement the obtaining of the relationship between the second entity word and the fourth entity word in the following manner:
acquiring the type of the second entity word and the type of the fourth entity word from the expert knowledge base; and acquiring the relationship between the type of the second entity word and the type of the fourth entity word from the expert knowledge base as the relationship between the fifth entity word and the sixth entity word.
In one illustrative example, the types to which entity words included in the expert knowledge base belong and the relationships between the different types may be represented using an ontology model.
In one illustrative example, the ontology model includes an entity model and a relationship model.
In one illustrative example, the entity model uses a knowledge graph to represent how many types of entity words are included in the expert knowledge base, what the name of each type of entity word is, and the entity words in the entity model include, but are not limited to, entity words of the following types: brand, category, product, ingredient, efficacy, scenario, need, consumer, etc.
In one illustrative example, the relationship model uses a knowledge graph to represent the names of the relationships that may exist between the various types of entity words, including, but not limited to, the following relationships: including, having, developing, using, solving, providing, etc.
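As a non-limiting illustration, the lookup of a relationship through the types of two entity words may be sketched in Python as follows. The dictionaries `ENTITY_TYPES` and `TYPE_RELATIONS` and their contents are hypothetical examples built from the type and relationship names listed above; a real ontology model would typically be stored in a knowledge graph rather than in plain dictionaries.

```python
# Hypothetical entity model: entity word -> type.
ENTITY_TYPES = {"Acme": "brand", "Cola": "product", "caffeine": "ingredient"}

# Hypothetical relationship model: (type, type) -> relationship name.
TYPE_RELATIONS = {
    ("brand", "product"): "developing",
    ("product", "ingredient"): "including",
}


def relation_between(second_word, fourth_word,
                     entity_types=ENTITY_TYPES, type_relations=TYPE_RELATIONS):
    """Look up the relationship between two entity words via their types."""
    type_a = entity_types.get(second_word)
    type_b = entity_types.get(fourth_word)
    if type_a is None or type_b is None:
        return None  # at least one word is not in the knowledge base
    return type_relations.get((type_a, type_b))


print(relation_between("Acme", "Cola"))
print(relation_between("Cola", "caffeine"))
```

In this sketch the relationship is directional, so `relation_between("Cola", "Acme")` returns `None`; whether relationships are directional is not stated in the description.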
In an exemplary instance, the second social public opinion volume calculation module 403 is further configured to: determining whether an obtained relationship exists between a fifth entity word and a sixth entity word in the obtained sentence according to the following information:
the fifth entity word, the context of the fifth entity word, the sixth entity word, the context of the sixth entity word, and the obtained relationship.
In an illustrative example, the second social public opinion sound quantity calculation module 403 may specifically employ, but is not limited to, employing a deep neural network model to determine whether there is an obtained relationship between a fifth entity word and a sixth entity word in the obtained sentence.
For example, one possible implementation method includes:
converting each character in the obtained sentence into a real-valued vector with a fixed length, wherein the real-valued vector is obtained by concatenating three vectors: the first vector is the word vector corresponding to the character, which can be obtained by querying a word vector table; the second vector is a fixed-length random vector obtained by mapping the distance between the character and the fifth entity word in the obtained sentence; and the third vector is a fixed-length random vector obtained by mapping the distance between the character and the sixth entity word in the obtained sentence;
inputting the real-valued vectors corresponding to all the characters into a unidirectional or bidirectional Long Short-Term Memory (LSTM) network to obtain a coding matrix corresponding to the obtained sentence;
mapping the obtained coding matrix corresponding to the sentence into a probability value of an obtained relation between a fifth entity word and a sixth entity word in the obtained sentence by using a feedforward neural network, and determining that the obtained relation exists between the fifth entity word and the sixth entity word in the obtained sentence when the probability value is greater than or equal to a preset threshold value; and when the probability value is smaller than a preset threshold value, determining that no obtained relation exists between the fifth entity word and the sixth entity word in the obtained sentence.
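As a non-limiting illustration, the first and last steps above (building the per-character input vectors and applying the threshold) may be sketched in Python as follows. The LSTM encoder and the feedforward network are deliberately not implemented here. The dimensions, the signed-distance convention, the span format, and the use of randomly initialised tables in place of a queried word vector table are all assumptions made for this sketch.

```python
import random


def build_inputs(sentence, fifth_span, sixth_span, char_dim=8, pos_dim=4, seed=0):
    """Concatenate a character vector with two position vectors per character."""
    rng = random.Random(seed)
    char_table, pos_table_5, pos_table_6 = {}, {}, {}

    def lookup(table, key, dim):
        # Assumption: vectors are created randomly on first use.
        if key not in table:
            table[key] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[key]

    def distance(i, span):
        # Signed distance from position i to the span [start, end); 0 inside it.
        start, end = span
        if i < start:
            return i - start
        if i >= end:
            return i - end + 1
        return 0

    vectors = []
    for i, ch in enumerate(sentence):
        vectors.append(
            lookup(char_table, ch, char_dim)
            + lookup(pos_table_5, distance(i, fifth_span), pos_dim)
            + lookup(pos_table_6, distance(i, sixth_span), pos_dim)
        )
    return vectors


def has_relation(probability, threshold=0.5):
    """Threshold decision applied to the probability output by the classifier."""
    return probability >= threshold


sentence = "Acme makes Cola"
vectors = build_inputs(sentence, fifth_span=(0, 4), sixth_span=(11, 15))
print(len(vectors), len(vectors[0]))
```

Each character yields one vector of length `char_dim + 2 * pos_dim`; the sequence of these vectors would then be fed to the encoder, whose output the classifier maps to the probability passed to `has_relation`.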
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Although the embodiments of the present invention have been described above, the descriptions are only used for understanding the embodiments of the present invention, and are not intended to limit the embodiments of the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the embodiments of the invention as defined by the appended claims.

Claims (12)

1. A method for obtaining social public opinion volume comprises the following steps:
for each piece of social public opinion text, determining all first entity words in the social public opinion text;
acquiring a second entity word which represents the same entity with a certain first entity word in an expert knowledge base, and adding 1 to the number of times of referring to the entity corresponding to the second entity word; wherein, the expert knowledge base comprises entity words representing known entities.
2. The method of claim 1, wherein the determining all first entity words in social public opinion text comprises:
and carrying out sequence marking on the social public opinion text to obtain all first entity words in the social public opinion text.
3. The method of claim 1, wherein obtaining a second entity word in the expert knowledge base that represents the same entity as a first entity word comprises:
acquiring all third entity words in the expert knowledge base, wherein the similarity between the third entity words and a certain first entity word is greater than or equal to a preset threshold;
and taking a third entity word with the highest relevance with the first entity word as the second entity word.
4. The method of claim 3, wherein the similarity between the first entity word and the third entity word is calculated according to the following information:
the first entity word, the context of the first entity word in the social public opinion text, the second entity word, and the attribute of the second entity word.
5. The method of claim 1, wherein the entity words representing known entities are standard entity words representing known entities.
6. A method for obtaining social public opinion volume comprises the following steps:
for each piece of social public opinion text, when the social public opinion text refers to an entity corresponding to a second entity word and an entity corresponding to a fourth entity word in an expert knowledge base, obtaining a relation between the second entity word and the fourth entity word; wherein the expert knowledge base includes entity words representing known entities;
acquiring one or more sentences containing fifth entity words and sixth entity words from the social public opinion text; wherein the fifth entity word and the second entity word represent the same entity, and the fourth entity word and the sixth entity word represent the same entity;
adding 1 to the number of times of referring to the entity corresponding to the second entity word and the entity corresponding to the fourth entity word together when there is an obtained relationship between the fifth entity word and the sixth entity word in at least one obtained sentence.
7. The method of claim 6, wherein before the obtaining one or more sentences containing fifth entity words and sixth entity words from social public opinion text, the method further comprises: performing reference resolution processing on the social public opinion text;
the obtaining one or more sentences containing fifth entity words and sixth entity words from social public opinion texts comprises:
and acquiring one or more sentences comprising fifth entity words and sixth entity words from the social public opinion text after the reference resolution processing.
8. The method of claim 7, wherein the reference resolution processing of social public opinion text comprises:
determining a seventh entity word corresponding to each pronoun in the social public opinion text by adopting a reference resolution model;
for each pronoun, when an entity word representing the same entity as the seventh entity word exists in the expert knowledge base, replacing the pronoun with the seventh entity word.
9. The method according to any one of claims 6 to 8, wherein the expert knowledge base further comprises types to which entity words belong and relationships between the different types;
the obtaining of the relationship between the second entity word and the fourth entity word includes:
acquiring the type of the second entity word and the type of the fourth entity word from the expert knowledge base;
and acquiring the relationship between the type of the second entity word and the type of the fourth entity word from the expert knowledge base as the relationship between the fifth entity word and the sixth entity word.
10. The method according to any one of claims 6 to 8, wherein it is determined whether an obtained relationship exists between a fifth entity word and a sixth entity word in the obtained sentence according to the following information:
the fifth entity word, the context of the fifth entity word, the sixth entity word, the context of the sixth entity word, and the obtained relationship.
11. An apparatus for obtaining social public opinion sound volume, comprising a processor and a computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, the method for obtaining social public opinion sound volume according to any one of claims 1 to 10 is implemented.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for obtaining social public opinion sound volume according to any one of claims 1 to 10.
CN201911409854.4A 2019-12-31 2019-12-31 Method and device for acquiring social public opinion volume and computer readable storage medium Active CN111177391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911409854.4A CN111177391B (en) 2019-12-31 2019-12-31 Method and device for acquiring social public opinion volume and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911409854.4A CN111177391B (en) 2019-12-31 2019-12-31 Method and device for acquiring social public opinion volume and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111177391A (en) 2020-05-19
CN111177391B (en) 2023-08-08

Family

ID=70655829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911409854.4A Active CN111177391B (en) 2019-12-31 2019-12-31 Method and device for acquiring social public opinion volume and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111177391B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214663A (en) * 2020-10-22 2021-01-12 上海明略人工智能(集团)有限公司 Method, system, device, storage medium and mobile terminal for obtaining public opinion volume

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078918A1 (en) * 2010-09-28 2012-03-29 Siemens Corporation Information Relation Generation
CN107544988A (en) * 2016-06-27 2018-01-05 百度在线网络技术(北京)有限公司 A kind of method and apparatus for obtaining public sentiment data
CN108460014A (en) * 2018-02-07 2018-08-28 百度在线网络技术(北京)有限公司 Recognition methods, device, computer equipment and the storage medium of business entity
CN109710918A (en) * 2018-11-26 2019-05-03 平安科技(深圳)有限公司 Public sentiment relation recognition method, apparatus, computer equipment and storage medium
CN110188168A (en) * 2019-05-24 2019-08-30 北京邮电大学 Semantic relation recognition methods and device
CN110472019A (en) * 2019-08-22 2019-11-19 北京明略软件系统有限公司 Public sentiment searching method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨献祥: ""面向中文微博的产品名实体识别与规范化算法设计与实现"" *

Also Published As

Publication number Publication date
CN111177391B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
US20180158078A1 (en) Computer device and method for predicting market demand of commodities
US20130159277A1 (en) Target based indexing of micro-blog content
CN111522915A (en) Extraction method, device and equipment of Chinese event and storage medium
US11263400B2 (en) Identifying entity attribute relations
Bauer et al. # MeTooMaastricht: Building a chatbot to assist survivors of sexual harassment
CN111078837A (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN112241626A (en) Semantic matching and semantic similarity model training method and device
Liu et al. Correlation identification in multimodal weibo via back propagation neural network with genetic algorithm
US9563847B2 (en) Apparatus and method for building and using inference engines based on representations of data that preserve relationships between objects
CN116821372A (en) Knowledge graph-based data processing method and device, electronic equipment and medium
US20220164546A1 (en) Machine Learning Systems and Methods for Many-Hop Fact Extraction and Claim Verification
CN114528418A (en) Text processing method, system and storage medium
Sarkar A hidden markov model based system for entity extraction from social media english text at fire 2015
CN111177391B (en) Method and device for acquiring social public opinion volume and computer readable storage medium
CN113378090A (en) Internet website similarity analysis method and device and readable storage medium
CN115210705A (en) Vector embedding model for relational tables with invalid or equivalent values
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN111708870A (en) Deep neural network-based question answering method and device and storage medium
Drury A Text Mining System for Evaluating the Stock Market's Response To News
CN114328902A (en) Text labeling model construction method and device
CN112434126A (en) Information processing method, device, equipment and storage medium
Vargas et al. Hierarchical clustering of aspects for opinion mining: a corpus study
Ramos-Flores et al. Probabilistic vs deep learning based approaches for narrow domain NER in Spanish
CN111858860A (en) Search information processing method and system, server, and computer readable medium
CN112966511B (en) Entity word recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant