CN109918669B - Entity determining method, device and storage medium - Google Patents


Info

Publication number
CN109918669B
Authority
CN
China
Prior art keywords
target
entity
text information
entities
acquiring
Legal status
Active
Application number
CN201910177268.5A
Other languages
Chinese (zh)
Other versions
CN109918669A (en)
Inventor
赵创钿
谢润泉
连凤宗
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910177268.5A
Publication of CN109918669A
Application granted
Publication of CN109918669B
Active (current legal status)
Anticipated expiration


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an entity determining method, an entity determining device and a storage medium, and belongs to the field of natural language processing. The method comprises: extracting a target sentence from target text information; acquiring a candidate entity set comprising a plurality of candidate entities; acquiring the relevance between each of the plurality of candidate entities and the target sentence; and determining a target entity of the target text information according to the acquired relevances, wherein the relevance between the target entity and the target sentence is greater than the relevance between the other candidate entities and the target sentence. This expands the ways in which a target entity can be determined: because the meaning of the target sentence is fully considered, target entities semantically related to the target sentence can be selected, rather than only entities that appear in the target text information, which improves the accuracy of the target entities and broadens the range of functions.

Description

Entity determining method, device and storage medium
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a method and apparatus for determining an entity, and a storage medium.
Background
With the rapid development and wide adoption of Internet technology, much of the information on the Internet is spread in the form of text information, which may include company names, person names, organization names and the like, collectively referred to as entities. To analyze text information and accurately understand its meaning, the entities in the text information need to be extracted.
In the related art, when target text information is acquired, word segmentation is performed on it to obtain a plurality of words; words such as person names, organization names and geographic names are identified from these words based on a word recognition model, and the identified words are determined as the target entities of the target text information.
The inventors have found that in this approach the target entities are determined only from words that appear in the target text information, while other information in the target text information is not considered. As a result, the determined target entities are not accurate enough, and the ways of determining target entities are limited.
Disclosure of Invention
Embodiments of the present invention provide an entity determining method, an entity determining device and a storage medium, which solve the problems in the related art. The technical solutions are as follows:
in one aspect, there is provided a method of entity determination, the method comprising:
extracting a target sentence from target text information;
acquiring a candidate entity set, wherein the candidate entity set comprises a plurality of candidate entities;
acquiring the relevance between each of the plurality of candidate entities and the target sentence; and
determining a target entity of the target text information according to the acquired plurality of relevances, wherein the relevance between the target entity and the target sentence is greater than the relevance between the other entities among the plurality of candidate entities and the target sentence.
In another aspect, there is provided an entity determining apparatus, the apparatus comprising:
an extraction module, configured to extract a target sentence from target text information;
a first acquisition module, configured to acquire a candidate entity set, wherein the candidate entity set comprises a plurality of candidate entities;
a second acquisition module, configured to acquire the relevance between each of the plurality of candidate entities and the target sentence; and
a determining module, configured to determine a target entity of the target text information according to the acquired plurality of relevances, wherein the relevance between the target entity and the target sentence is greater than the relevance between the other entities among the plurality of candidate entities and the target sentence.
In another aspect, an entity determining apparatus is provided, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the operations performed in the entity determining method.
In another aspect, a computer-readable storage medium is provided, storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the operations performed in the entity determining method.
With the entity determining method, device and storage medium provided by the embodiments of the present invention, a target sentence is extracted from the target text information, a candidate entity set is acquired, the relevance between each of the plurality of candidate entities and the target sentence is acquired, and the target entity of the target text information is determined according to the acquired relevances. This expands the ways in which a target entity can be determined: because the meaning of the target sentence is fully considered, target entities semantically related to the target sentence can be selected, rather than only entities that appear in the target text information, which improves the accuracy of the target entities and broadens the range of functions.
In addition, when a plurality of target entities of the target text information are determined, a plurality of specified entities with the same meaning among them can be replaced with a single entity having that meaning, so that only one entity is retained for each group of entities with the same meaning. This reduces the amount of data while preserving the accuracy of the extracted entities.
Furthermore, the relevance between the target entity and the target text information can be obtained, which measures how closely the target entity relates to the target text information and increases the amount of information provided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for determining an entity according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an association diagram according to an embodiment of the present invention;
FIG. 3 is a flowchart for obtaining descriptive text information provided by an embodiment of the present invention;
FIG. 4 is a flowchart of calculating a correlation based on a correlation model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a display interface of a terminal according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an event and a target entity according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an entity determining apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The entity determining method provided by the embodiment of the invention is applied to processing equipment, and the processing equipment is used for determining the target entity of the target text information.
The processing device may be a server, a terminal, or another device with processing capability. The terminal may be a mobile phone, a tablet computer, a computer or another type of terminal, and the server may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center.
In one possible implementation, when the method provided by the embodiments of the present invention is applied to a terminal, whenever the terminal acquires any text information, it may use that text information as the target text information and determine the target entity of the target text information using the method.
In another possible implementation, when the method is applied to a server, whenever the server acquires any text information, for example by collecting text information displayed by one or more terminals or text information that has been published, it may use that text information as the target text information and determine the target entity of the target text information using the method.
FIG. 1 is a flowchart of an entity determining method according to an embodiment of the present invention. Referring to FIG. 1, the method is applied to a processing device and includes the following steps:
101. Extract a target sentence from the target text information.
Regarding the target text information: in terms of content, the target text information may include various types of text information, such as news and advertisements. In terms of source, it may include text information in a web page, text information in an application interface, text information contained in a captured image or video, and the like. In terms of display, it may be displayed in a web page, an application interface, an image or a video, where the application interface may include a video playing interface, an instant messaging interface and the like.
During operation of the processing device, text information may be received, displayed, sent, or other operations related to the text information may be performed. When this operation is performed, text information is acquired. In the embodiment of the invention, each time any text information is acquired, the text information is taken as target text information so as to extract the target entity of the target text information.
When entities are extracted from the target text information, the target sentence is extracted first. The target sentence serves as a key sentence of the target text information and represents its meaning, so entity extraction can be performed based on the target sentence.
Wherein the target sentence comprises one or more sentences in the target text information.
In one possible implementation, the target text information is processed with a key sentence extraction algorithm to obtain the target sentence. The key sentence extraction algorithm may be the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, the TextRank algorithm, or another algorithm.
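As a rough illustration of TF-IDF-based key sentence extraction, the sketch below treats each sentence as a "document", scores a sentence by the length-normalized sum of its tokens' TF-IDF weights, and keeps the k best sentences in their original order. Whitespace tokenization (standing in for Chinese word segmentation), the smoothed IDF formula, and the function name `tfidf_key_sentences` are assumptions, not details given in the patent.

```python
import math
from collections import Counter
from typing import List


def tfidf_key_sentences(sentences: List[str], k: int = 1) -> List[str]:
    """Return the k sentences with the highest TF-IDF score, in document order.

    Assumption: each sentence is one "document" for IDF purposes, and tokens
    come from whitespace splitting (a stand-in for real word segmentation).
    """
    docs = [s.split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    scored = []
    for idx, doc in enumerate(docs):
        if not doc:
            scored.append((0.0, idx))
            continue
        tf = Counter(doc)
        # Smoothed IDF keeps tokens that occur in every sentence above zero.
        score = sum(
            (cnt / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, cnt in tf.items()
        )
        scored.append((score, idx))
    # Highest scores first (ties broken by position), then restore document order.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]
```

Sentences built from rarer terms score higher, so a sentence that repeats vocabulary shared with the rest of the text is less likely to be chosen as the key sentence.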
In another possible implementation manner, sentence division is performed on the target text information, each sentence in the target text information is obtained, and the target sentence is extracted from each sentence obtained by the division.
When dividing sentences, the punctuation marks in the target text information that indicate the end of a sentence, such as periods, question marks and exclamation marks, are identified, and the target text information is divided at these punctuation marks to obtain each sentence.
When extracting the target sentence, if the target text information includes a plurality of sentences, any one or more of them may be extracted as the target sentence. Alternatively, the first sentence may be extracted as the target sentence, or the first preset number of sentences may be extracted as target sentences; other ways of extracting the target sentence may also be adopted.
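Punctuation-based sentence division followed by taking the first n sentences can be sketched as below. The set of terminal marks (ASCII `.?!` plus full-width `。？！`) and the helper names are assumptions; a naive rule like this also splits on decimal points and abbreviations.

```python
import re
from typing import List

# One sentence = a run of non-terminal characters followed by any
# sentence-ending marks (ASCII . ? ! and full-width 。？！).
_SENTENCE = re.compile(r"[^.?!。？！]+[.?!。？！]*")


def split_sentences(text: str) -> List[str]:
    """Divide text into sentences at sentence-ending punctuation."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


def first_sentences(text: str, n: int = 1) -> List[str]:
    """Take the first n sentences as the target sentence(s)."""
    return split_sentences(text)[:n]
```

For example, `split_sentences("今天天气很好。我们去公园吧！")` yields the two Chinese sentences with their full-width marks kept.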
102. Acquire a candidate entity set.
An entity is an object or thing that exists objectively and can be distinguished from other objects or things. Entities can be included in text information, where an entity has a specific meaning and can be used to describe an object or thing. Entities may be of multiple types, such as company names, person names, institution names, geographic location names, event names and product names.
In the embodiment of the present invention, the processing device may pre-store a candidate entity set, which includes a plurality of candidate entities that can be used in the subsequent entity extraction for the target text information.
In one possible implementation, a plurality of pieces of sample text information are collected, entities are extracted from them, and the extracted entities are added to the candidate entity set as candidate entities. Further sample text information can later be collected continuously, and the entities extracted from it are added to the candidate entity set as candidate entities, thereby updating the candidate entity set.
The plurality of sample text information can be obtained by grabbing a web page, extracting from instant messaging records, or collecting by other methods. The candidate entity set may be obtained by the processing device, or may be obtained by another device and then sent to the processing device.
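The collect, extract and add loop for building and updating the candidate entity set can be sketched as a small set wrapper. The class name and the pluggable `extract` callback are hypothetical; the patent does not specify which entity extraction model is applied to the sample text, so any extractor returning the entities of one text can be plugged in.

```python
from typing import Callable, Iterable, List, Set


class CandidateEntitySet:
    """Hypothetical candidate entity store that grows as samples arrive."""

    def __init__(self, extract: Callable[[str], Iterable[str]]):
        # `extract` stands in for whatever entity extraction model is used.
        self._extract = extract
        self._entities: Set[str] = set()

    def add_samples(self, samples: Iterable[str]) -> int:
        """Extract entities from new sample texts; return how many were new."""
        before = len(self._entities)
        for text in samples:
            self._entities.update(self._extract(text))
        return len(self._entities) - before

    def entities(self) -> List[str]:
        """Current candidate entities, sorted for stable output."""
        return sorted(self._entities)
```

Because the store is a set, re-collecting a sample that mentions an existing entity does not create a duplicate candidate; only genuinely new entities extend the set.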
103. Acquire the relevance between each of the plurality of candidate entities and the target sentence.
In the embodiment of the present invention, besides the entities that already appear in the target text information, there may be other words related to the meaning of the target text information. Since the target sentence represents the meaning of the target text information, the candidate entities related to the target sentence can be obtained from the plurality of candidate entities; these candidate entities can be considered related to the meaning of the target text information and can be used as target entities of the target text information.
To obtain the candidate entities related to the target sentence, the relevance between each of the plurality of candidate entities and the target sentence is acquired. The relevance indicates how closely a candidate entity is related to the target sentence: the greater the relevance, the more closely the candidate entity is related to the target sentence; the smaller the relevance, the less related they are.
In one possible implementation, a vector of the target sentence is obtained as a first vector. For each candidate entity, a vector of the candidate entity is obtained as a second vector, and the relevance between the first vector and the second vector is calculated and used as the relevance between the candidate entity and the target sentence. In this way, the relevance between each of the plurality of candidate entities and the target sentence is obtained.
Optionally, the vector of the target sentence or of a candidate entity may be obtained with the Word2Vec (word to vector) algorithm.
After the first vector and the second vector are obtained, the cosine similarity between them can be calculated to represent the relevance between the candidate entity and the target sentence: the greater the cosine similarity, the greater the relevance, and the more related the candidate entity and the target sentence are. Alternatively, the Euclidean distance between the two vectors can be calculated to represent the relevance: the smaller the Euclidean distance, the greater the relevance, and the more related the candidate entity and the target sentence are. The relevance of the first vector and the second vector may also be calculated in other ways and used to represent the relevance between the candidate entity and the target sentence.
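The vector-based relevance can be sketched as below. The patent names Word2Vec for the vectors but does not say how a multi-word sentence vector is formed; this sketch assumes the common choice of averaging word vectors, uses a tiny hand-made embedding table in place of a trained Word2Vec model, and maps Euclidean distance to a relevance with 1 / (1 + d), which is also an assumption.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the vectors of tokens that have one (assumed sentence embedding)."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        raise ValueError("none of the tokens has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_relevance(a: Vector, b: Vector) -> float:
    """Cosine similarity: larger means more relevant (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_relevance(a: Vector, b: Vector) -> float:
    """Map Euclidean distance into (0, 1]: smaller distance, larger relevance."""
    return 1.0 / (1.0 + math.dist(a, b))


def relevance_scores(
    sentence_tokens: Sequence[str],
    candidates: Iterable[str],
    emb: Dict[str, Vector],
    measure: Callable[[Vector, Vector], float] = cosine_relevance,
) -> Dict[str, float]:
    """Relevance of every candidate entity that has a vector to the target sentence."""
    first = mean_vector(sentence_tokens, emb)  # the "first vector"
    # Each emb[c] plays the role of the "second vector" for candidate c.
    return {c: measure(first, emb[c]) for c in candidates if c in emb}
```

Candidate entities without a vector are simply skipped here; a real system would need a policy for out-of-vocabulary entities.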
104. Determine the target entity of the target text information according to the acquired plurality of relevances.
After the relevance between each of the plurality of candidate entities and the target sentence is obtained in step 103, the candidate entities with larger relevances can be selected according to the obtained relevances. The selected candidate entities are entities related to the target text information, while the remaining candidate entities are not; the selected candidate entities are therefore determined as target entities, which realizes recall of the target entities. The relevance between a target entity and the target sentence is greater than the relevance between the other candidate entities and the target sentence.
The other candidate entities are the candidate entities, among the plurality of candidate entities, that differ from the target entity; there may be one or more of them. For example, the other candidate entities may be all of the candidate entities other than the target entity, or only some of them.
In one possible implementation, the plurality of relevances are sorted, a preset number of relevances are selected according to the sort order such that the selected relevances are greater than the other relevances, and the candidate entities corresponding to the selected relevances are determined as target entities.
The obtained relevances may be sorted in descending order and the first preset number of relevances selected; alternatively, they may be sorted in ascending order and the last preset number of relevances selected. In either case, the candidate entities corresponding to the selected relevances are determined as target entities. The preset number is a positive integer, such as 3, 4 or 5, and can be determined by weighing the accuracy against the number of target entities to be extracted.
For example, if the relevances sorted in descending order are 0.8, 0.7, 0.6, 0.5 and 0.4 and the preset number is 2, the two entities with relevances 0.8 and 0.7 are determined as target entities of the target text information.
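Selecting a preset number of candidates by descending relevance can be sketched with `heapq.nlargest`, which gives the same result as sorting in descending order and slicing without sorting the whole list. The dictionary-of-scores input and the function name are assumptions for illustration.

```python
import heapq
from typing import Dict, List


def top_k_entities(relevance: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate entities with the largest relevance, highest first."""
    best = heapq.nlargest(k, relevance.items(), key=lambda kv: kv[1])
    return [entity for entity, _ in best]
```

With relevances 0.8, 0.7, 0.6, 0.5 and 0.4 and k = 2, this returns the two entities scored 0.8 and 0.7, matching the example above.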
When the preset number of relevances is selected, the selection may be based on the sort order alone, so that the selected relevances are greater than the unselected ones. Alternatively, the selection may be based on the sort order together with the type of the candidate entity corresponding to each relevance, so that the selected candidate entities are of a preset type and their relevances are greater than those of the other candidate entities of that type. Alternatively, the selection may be based on the sort order together with the domain of the candidate entity corresponding to each relevance, so that the selected candidate entities belong to a preset domain and their relevances are greater than those of the other candidate entities in that domain.
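The type- or domain-constrained variant amounts to filtering candidates by a label before ranking. The sketch assumes a separate mapping from each candidate entity to its type (or domain); that mapping and the function name are hypothetical, since the patent does not say where entity types are stored.

```python
from typing import Dict, List


def top_k_of_type(relevance: Dict[str, float], entity_type: Dict[str, str],
                  wanted: str, k: int) -> List[str]:
    """Top-k candidates among those whose type (or domain) equals `wanted`."""
    pool = [(e, r) for e, r in relevance.items() if entity_type.get(e) == wanted]
    # Rank only within the preset type, highest relevance first.
    pool.sort(key=lambda er: er[1], reverse=True)
    return [e for e, _ in pool[:k]]
```

The same function covers the domain case by passing a candidate-to-domain mapping instead of a candidate-to-type mapping.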
In another possible implementation, the relevances greater than a preset threshold are selected from the plurality of relevances, and the candidate entities corresponding to the selected relevances are determined as target entities. Candidate entities whose relevance is smaller than the preset threshold are filtered out, so that one or more target entities are determined.
The preset threshold may be 0.5, 0.6 or another value, and can be determined by weighing the accuracy against the number of target entities to be extracted.
For example, if the relevances are 0.8, 0.7, 0.6, 0.5 and 0.4 and the preset threshold is 0.6, the two entities with relevances 0.8 and 0.7 are determined as target entities of the target text information.
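Threshold-based selection can be sketched as a strict "greater than" filter, which reproduces the example above (0.6 itself is not selected at threshold 0.6). Returning the result sorted by relevance is an assumption made only for readable output.

```python
from typing import Dict, List


def above_threshold(relevance: Dict[str, float], threshold: float) -> List[str]:
    """Keep candidates whose relevance is strictly greater than the threshold."""
    kept = [e for e, r in relevance.items() if r > threshold]
    return sorted(kept, key=lambda e: relevance[e], reverse=True)
```

Unlike the top-k rule, the number of target entities here varies with the text: a sentence with no strongly related candidate can yield none at all.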
When the relevances are selected, the selection may be based on the preset threshold alone, so that every selected relevance is greater than the preset threshold. Alternatively, the selection may be based on the preset threshold together with the type of the candidate entity corresponding to each relevance, so that the selected candidate entities are of a preset type and their relevances are greater than the preset threshold. Alternatively, the selection may be based on the preset threshold together with the domain of the candidate entity corresponding to each relevance, so that the selected candidate entities belong to a preset domain and their relevances are greater than the preset threshold.
In another possible implementation, the plurality of relevances are sorted, a preset number of relevances, each greater than a preset threshold, are selected according to the sort order, and the candidate entities corresponding to the selected relevances are determined as target entities.
For example, if the relevances are 0.8, 0.7, 0.6, 0.5 and 0.4, the preset threshold is 0.5 and the preset number is 2, the two entities with relevances 0.8 and 0.7 are determined as target entities of the target text information.
In the embodiment of the present invention, candidate entities relevant to the target text information are selected according to the relevance between each candidate entity and the target sentence, and candidate entities irrelevant to the target text information are no longer selected. This makes it possible to extract not only explicit target entities that appear in the target text information but also implicit target entities that are related to the target sentence, which improves the accuracy of the target entities.
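Combining the threshold with the preset number can be sketched as a filter feeding `heapq.nlargest`. One assumption worth stating: when fewer candidates than the preset number pass the threshold, this sketch returns only those that pass rather than padding the result.

```python
import heapq
from typing import Dict, List


def select_target_entities(relevance: Dict[str, float],
                           threshold: float, k: int) -> List[str]:
    """At most k candidates, each with relevance strictly above the threshold."""
    passed = ((e, r) for e, r in relevance.items() if r > threshold)
    return [e for e, _ in heapq.nlargest(k, passed, key=lambda er: er[1])]
```

With threshold 0.5 and k = 2 on the relevances 0.8, 0.7, 0.6, 0.5 and 0.4, this selects the entities scored 0.8 and 0.7; the threshold guards precision while k caps the output size.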
105. If a plurality of target entities of the target text information have been determined and a plurality of designated entities having the same meaning are included in the plurality of target entities, the plurality of designated entities are replaced with one entity having the same meaning as the plurality of designated entities.
The target entity determined in the above steps may be one or more. In one possible implementation manner, if a plurality of target entities of the target text information are determined, the plurality of target entities may be detected first, whether target entities with the same meaning exist in the plurality of target entities is determined, and processing is performed according to the detection result.
In the first case, no target entity with the same meaning exists in the plurality of target entities, that is, different target entities in the plurality of target entities have different meanings. In this case, the multiple target entities do not need to be subjected to deduplication processing.
In the second case, a plurality of specified entities having the same meaning exist in the plurality of target entities, and the plurality of specified entities are replaced with one entity having the same meaning as the plurality of specified entities.
In one possible implementation manner, when the entity is replaced, one entity with the same meaning corresponding to the specified entities is obtained, the specified entities are deleted, and the one entity is determined to be the target entity of the target text information, so that the effect of replacing the specified entities with the one entity with the same meaning is achieved.
In another possible implementation, when the entity replacement is performed, any one of the plurality of specified entities is retained and the other specified entities are deleted, which likewise replaces the plurality of specified entities with one entity having the same meaning.
It should be noted that the plurality of target entities can be checked according to a preset association relationship. That is, a preset association relationship is acquired, which includes at least one association entry, each association entry including a plurality of entities with the same meaning. The preset association relationship is queried for each target entity to obtain the association entry to which it belongs. When the target entities are determined to belong to different association entries, it is determined that no target entities with the same meaning exist among them. When a plurality of specified entities among the target entities are determined to belong to the same association entry, it is determined that a plurality of specified entities with the same meaning exist, and they are replaced with any entity in that association entry; this entity may be one of the specified entities or another entity in the association entry that differs from them.
For example, as shown in Table 1 below, when the plurality of specified entities include "XX share company" and "XX limited company", "XX share company" may be deleted and "XX limited company" retained; alternatively, both may be deleted and "XX limited liability company" used as the target entity of the target text information.
TABLE 1
In the embodiment of the present invention, when a plurality of target entities of the target text information are determined, a plurality of specified entities with the same meaning are identified among them and replaced with one entity having that meaning, so that only one entity is retained for each group of entities with the same meaning. This disambiguates the target entities and reduces the amount of data while preserving the accuracy of the extracted entities.
In another embodiment, if only one target entity of the target text information has been determined, no deduplication is required.
In the embodiment of the present invention, different entities are related to the target text information to different degrees. After the target entity of the target text information is determined, the relevance between the target entity and the target text information can be obtained, indicating how closely the target entity is related to the target text information. Taking steps 106 to 107 below as an example, the embodiment of the present invention describes a process of acquiring the relevance according to the descriptive text information of the target entity.
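The association-entry check and the two replacement strategies (keep one of the specified entities, or substitute another entity from the same entry) can be sketched as follows. Representing the preset association relationship as a list of entries, and the helper names `build_index` and `deduplicate`, are assumptions for illustration; the entity names reuse the placeholder companies from the Table 1 example.

```python
from typing import Dict, Iterable, List, Optional


def build_index(association: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Map each entity to the id of the association entry it belongs to."""
    index: Dict[str, int] = {}
    for entry_id, entry in enumerate(association):
        for entity in entry:
            index[entity] = entry_id
    return index


def deduplicate(targets: List[str], association: List[List[str]],
                canonical: Optional[Dict[int, str]] = None) -> List[str]:
    """Keep one entity per association entry among the target entities.

    By default the first specified entity of each entry is kept; if
    `canonical` maps an entry id to a preferred name, that name replaces the
    group instead (it need not be one of the targets). Targets that belong to
    no entry are kept unchanged. Targets are assumed to be distinct strings.
    """
    index = build_index(association)
    preferred = canonical or {}
    kept_entries = set()
    result: List[str] = []
    for entity in targets:
        entry_id = index.get(entity)
        if entry_id is None:
            result.append(entity)
        elif entry_id not in kept_entries:
            kept_entries.add(entry_id)
            result.append(preferred.get(entry_id, entity))
        # Otherwise an entity with the same meaning is already kept: drop it.
    return result
```

Passing `canonical={0: "XX limited liability company"}` corresponds to the second option in the example, where both specified entities are deleted and another entity from the same entry is used instead.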
106. Acquire descriptive text information of the target entity.
The descriptive text information is used for describing the target entity and can be used as a semantic tag of the target entity. Acquiring descriptive text information of the target entity may include at least one of the following steps 1061-1064:
1061. and acquiring a historical search record of the target entity, wherein the historical search record comprises search results obtained by searching by taking the target entity as a keyword, and acquiring descriptive text information according to the historical search record.
The generation method of the history search record is as follows: in the operation process of any device, keywords can be obtained, searching is performed according to the keywords to obtain search results related to the keywords, and search records of the keywords are generated, wherein the search records comprise the obtained search results, the search results can comprise articles, web pages or link addresses and the like, and the search results can comprise text information such as news and advertisements. In addition, the search record may include, in addition to the search results, a search time, a user initiating the search, and the like.
The search result obtained by searching the target entity by using the target entity as a keyword can be considered as a search result related to the target entity, can represent the meaning of the target entity, can describe the target entity, and the search result which is searched by using the target entity as a keyword but not obtained can be considered as a search result unrelated to the target entity, so that the processing device can acquire a historical search record of the target entity, acquire descriptive text information according to the historical search record, and can describe the target entity according to the search result in the historical search record.
The processing device may acquire historical search records generated locally, acquire those generated by other devices, or collect those generated by a plurality of devices, in order to determine the historical search record of the target entity.
In one possible implementation, obtaining descriptive text information from the historical search record may include: obtaining the search results in the historical search record and then either extracting words from the search results and using the extracted words as the descriptive text information of the target entity, using the titles of the search results as the descriptive text information, or using the search results themselves as the descriptive text information.
The descriptive text information may be derived from every search result in the historical search record or only from selected search results. For example, a preset number of search results may be selected such that each selected result was published later than every unselected one, that is, the most recent results are used.
1062. Acquire a historical access record of the target entity, the historical access record including the search results on which an access operation was performed after searching with the target entity as a keyword, and acquire the descriptive text information according to the historical access record.
The historical access record is generated as follows: while any device is operating, it may obtain a keyword and search according to it to obtain related search results, which may include articles, web pages, link addresses and the like, and may contain text information such as news and advertisements. When the search results are displayed, the user can trigger an access operation on a result to open it, and an access record for the keyword is generated. The access record contains the search results on which the access operation was performed; it may also include the access time, the user who initiated the access, and so on.
A search result obtained by searching with the target entity as the keyword is regarded as related to the target entity: it reflects the meaning of the target entity and can describe it. Results that were not returned by such a search, or that were returned but never accessed, are regarded as unrelated to the target entity. The processing device can therefore acquire the historical access record of the target entity and derive the descriptive text information from it, so that the target entity is described by the search results in that record.
The processing device may acquire historical access records generated locally, acquire those generated by other devices, or collect those generated by a plurality of devices, in order to determine the historical access record of the target entity.
In one possible implementation, obtaining descriptive text information from the historical access record may include: obtaining the search results in the historical access record and then either extracting words from the search results and using the extracted words as the descriptive text information of the target entity, using the titles of the search results as the descriptive text information, or using the search results themselves as the descriptive text information.
The descriptive text information may be derived from every search result in the historical access record or only from selected search results. For example, a preset number of search results may be selected such that each selected result was published later than every unselected one.
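Steps 1061 and 1062 differ only in which search results qualify, so both can be sketched with one helper. The sketch assumes a hypothetical record layout (`SearchRecord`, one result per record with an `accessed` flag), uses result titles as the descriptive text, and keeps the most recently published results; none of these names or choices come from the patent.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchRecord:
    # Hypothetical layout: the text only says a record holds search results
    # plus metadata such as time and the initiating user.
    keyword: str
    title: str
    published: date
    accessed: bool  # True if the user performed an access operation on this result


def descriptive_titles(records: list[SearchRecord], entity: str,
                       limit: int = 2, require_access: bool = False) -> list[str]:
    """Titles of the `limit` most recently published results for `entity`.

    require_access=False follows step 1061 (search records);
    require_access=True follows step 1062 (access records).
    """
    hits = [r for r in records
            if r.keyword == entity and (r.accessed or not require_access)]
    hits.sort(key=lambda r: r.published, reverse=True)  # newest first
    return [r.title for r in hits[:limit]]


records = [
    SearchRecord("A", "old news", date(2018, 1, 1), True),
    SearchRecord("A", "new report", date(2019, 3, 1), False),
    SearchRecord("A", "mid update", date(2018, 6, 1), True),
    SearchRecord("B", "other", date(2019, 1, 1), True),
]
print(descriptive_titles(records, "A"))                       # ['new report', 'mid update']
print(descriptive_titles(records, "A", require_access=True))  # ['mid update', 'old news']
```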
1063. Acquire an association graph containing a plurality of entities, in which any two associated entities are connected; acquire, according to the association graph, at least one entity connected to the target entity; and use the at least one entity as the descriptive text information of the target entity.
The association graph contains a plurality of entities, and any two associated entities are connected to each other. For example, the association graph includes a plurality of nodes, each representing an entity; a connection between two nodes indicates that the two entities they represent are associated.
The association graph may be an event graph, a company graph, a name graph, or the like. An event graph represents association relationships between event entities, a company graph represents association relationships between company entities, and a name graph represents association relationships between the different names of the same object.
In the association graph, each entity connected to the target entity can be regarded as associated with the target entity; it reflects the meaning of the target entity and can describe it. The processing device therefore acquires, according to the association graph, the at least one entity connected to the target entity and uses it as the descriptive text information of the target entity, so that the target entity is described by its associated entities.
For example, as shown in Fig. 2, the entities connected to target entity A are C, E, G and I, so C, E, G and I are used as the descriptive text information of target entity A.
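Step 1063 amounts to reading the direct neighbours of a node in an undirected graph. The sketch below stores the association graph as an adjacency map; the edges from A follow the Fig. 2 example, while the edges B-D and C-F are made up to show that indirect neighbours are not returned.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected association graph: each edge links two associated entities."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbours(graph: dict[str, set[str]], entity: str) -> list[str]:
    """Entities directly connected to `entity`, used as its descriptive text."""
    return sorted(graph.get(entity, set()))


# A-C, A-E, A-G, A-I follow the Fig. 2 example; B-D and C-F are hypothetical.
edges = [("A", "C"), ("A", "E"), ("A", "G"), ("A", "I"), ("B", "D"), ("C", "F")]
graph = build_graph(edges)
print(neighbours(graph, "A"))  # ['C', 'E', 'G', 'I']
```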
1064. Acquire notification messages whose publisher is the target entity, and acquire the descriptive text information according to those notification messages.
One or more publishers may publish notification messages for other users to view. Publishers may include companies, groups, organizations, individuals and so on; the notification messages may include news, advertisements, business notices, announcements, research reports and so on, and contain text information. A publisher may publish a notification message on a web page, in an instant messaging application, or in other ways.
In this embodiment of the invention, the processing device may collect notification messages published by a plurality of publishers. Once the target entity is determined, it queries the notification messages published by the target entity, that is, those whose publisher is the target entity, and acquires the descriptive text information according to them.
A notification message published by the target entity can be regarded as associated with the target entity; it reflects the meaning of the target entity and can describe it. The processing device therefore derives the descriptive text information from these notification messages, so that the target entity is described by the messages it has published.
In one possible implementation, obtaining descriptive text information from the notification messages includes: obtaining the notification messages published by the target entity and then either extracting words from them and using the extracted words as the descriptive text information of the target entity, using the titles of the notification messages as the descriptive text information, or using the notification messages themselves as the descriptive text information.
As shown in Fig. 3, in this embodiment of the invention, all four of steps 1061-1064 may be executed to obtain four pieces of descriptive text information for the target entity, and at least one of them, alone or combined, is then used as the descriptive text information of the target entity.
In this embodiment of the invention, the acquired descriptive text information reflects the meaning of the target entity, and the correlation degree between the target entity and the target text information is obtained according to it. This provides a new way of obtaining that correlation degree and can improve its accuracy.
107. Acquire the correlation degree between the target entity and the target text information according to the target text information, the target entity, and the descriptive text information of the target entity.
The processing device acquires the target text information, the target entity and the descriptive text information of the target entity, and computes from them the correlation degree between the target entity and the target text information, which measures how relevant the target entity is to the text.
It should be noted that steps 106-107 are optional. In another embodiment, the correlation degree between the target entity and the target text information may be obtained according to at least one of the target text information, the target entity, and the descriptive text information of the target entity. This process may include at least one of the following steps 1071-1077:
1071. Obtain the correlation degree between the target entity and the target text information according to the number of times the target entity occurs in the target text information.
The number of occurrences of the target entity in the target text information can be determined by counting the instances of the target entity in the target text information.
In one possible implementation, the number of occurrences of the target entity in the target text information and the number of words in the target text information are obtained, the ratio of the number of occurrences to the number of words is calculated, and the ratio is used as the correlation degree between the target entity and the target text information.
Alternatively, the correlation degree may be obtained in other ways, which are not described in detail here.
In this embodiment of the invention, the correlation degree is determined from the number of occurrences of the target entity in the target text information: the more often the target entity occurs, the greater its influence on the meaning of the text, so a larger correlation degree is obtained, indicating that the target entity is more relevant to the target text information.
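Step 1071 can be sketched as an occurrence count divided by the word count. This assumes the text has already been segmented into words (Chinese text would need a word segmenter first; the example uses whitespace-separated English) and represents the entity as a word sequence, which may span several words.

```python
def frequency_relevance(words: list[str], entity: list[str]) -> float:
    """Number of occurrences of the entity word sequence divided by the word count."""
    if not words or not entity:
        return 0.0
    k = len(entity)
    # Count every starting position where the entity's words appear contiguously.
    occurrences = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == entity)
    return occurrences / len(words)


text = "acme shares rose after acme reported profit".split()
print(frequency_relevance(text, ["acme"]))  # 2 occurrences / 7 words ≈ 0.2857
```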
1072. Obtain the number of words shared by the target text information and the descriptive text information, obtain the total number of words in the target text information and the descriptive text information, and obtain the correlation degree between the target entity and the target text information according to the number of shared words and the total number of words.
The words in the target text information and the words in the descriptive text information are obtained and matched against each other to determine the words they share, giving the number of shared words. The total number of words in the target text information and the descriptive text information is then obtained, and the correlation degree is computed from the number of shared words and the total number of words.
The total number of words may be obtained either by directly adding the number of words in the target text information and the number of words in the descriptive text information, or by first deduplicating the words of both to obtain the distinct words and then counting them.
In one possible implementation, the ratio of the number of shared words to the total number of words is calculated with the Jaccard similarity algorithm, which uses the deduplicated total, and the ratio is used as the correlation degree between the target entity and the target text information.
That is, the correlation degree between the target entity and the target text information is obtained by the following formula:
$$\mathrm{Score}_{Jac} = \frac{N(T_{news} \cap T_e)}{N(T_{news} \cup T_e)}$$
where $T_{news}$ represents the set of words in the target text information, $T_e$ represents the set of words in the descriptive text information, $N(T_{news} \cap T_e)$ represents the number of words shared by the target text information and the descriptive text information, $N(T_{news} \cup T_e)$ represents the total number of words in the target text information and the descriptive text information, and $\mathrm{Score}_{Jac}$ represents the correlation degree between the target entity and the target text information.
For example, if the words in the target text information are A, B, C, D and F, and the words in the descriptive text information are A, C, D, Q, E and R, then the shared words are A, C and D (3 words) and the total number of distinct words is 8 (A, B, C, D, E, F, Q, R), so the correlation degree between the target entity and the target text information is 3/8 = 0.375.
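The worked example above can be reproduced with Python sets. The optional `dedupe_total=False` branch illustrates the other way of counting the total mentioned for step 1072 (plain sum of both word counts); with that option the ratio is no longer the Jaccard index. The function name is hypothetical.

```python
def jaccard_relevance(text_words: list[str], desc_words: list[str],
                      dedupe_total: bool = True) -> float:
    """Shared words divided by the total number of words."""
    t, d = set(text_words), set(desc_words)
    same = len(t & d)
    if dedupe_total:
        total = len(t | d)  # distinct words of both texts: the Jaccard index
    else:
        total = len(text_words) + len(desc_words)  # plain sum of the two word counts
    return same / total if total else 0.0


text_words = ["A", "B", "C", "D", "F"]
desc_words = ["A", "C", "D", "Q", "E", "R"]
print(jaccard_relevance(text_words, desc_words))  # 3 / 8 = 0.375
```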
1073. Obtain the correlation degree between the target entity and the target text information according to a third vector of the target text information and a fourth vector of the target entity.
In one possible implementation, a vector of the target text information is obtained as the third vector, a vector of the target entity is obtained as the fourth vector, and the correlation between the third vector and the fourth vector is calculated as the correlation degree between the target entity and the target text information.
Optionally, the vector of the target text information may be obtained with the Doc2Vec (Document to Vector, which represents a text as a vector) algorithm.
In one possible implementation, after the third vector and the fourth vector are obtained, their cosine similarity may be calculated and used to represent the correlation degree between the target entity and the target text information, that is:
$$\mathrm{Score}_{w2v} = \frac{V_{news} \cdot V_e}{\lVert V_{news} \rVert \, \lVert V_e \rVert}$$
where $V_{news}$ represents the third vector, $V_e$ represents the fourth vector, $\lVert V_{news} \rVert$ and $\lVert V_e \rVert$ represent the lengths of the third and fourth vectors, and $\mathrm{Score}_{w2v}$ represents the correlation degree between the target entity and the target text information.
Alternatively, the Euclidean distance between the third vector and the fourth vector is calculated and the correlation degree is determined from it, the correlation degree being negatively correlated with the Euclidean distance (the smaller the distance, the larger the correlation degree). The correlation between the third and fourth vectors may also be calculated in other ways and used as the correlation degree between the target entity and the target text information.
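A sketch of both vector comparisons in step 1073, assuming the third and fourth vectors have already been produced (for example by Doc2Vec) as plain lists of floats. The text only requires the Euclidean variant to decrease with distance; the mapping `1 / (1 + d)` used here is a hypothetical choice among many.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Score_w2v: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_relevance(u: list[float], v: list[float]) -> float:
    """Hypothetical decreasing mapping of the Euclidean distance to (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))


v_news = [0.2, 0.8, 0.1]  # made-up third vector (target text information)
v_e = [0.1, 0.9, 0.0]     # made-up fourth vector (target entity)
print(round(cosine(v_news, v_e), 3), round(euclidean_relevance(v_news, v_e), 3))
```

`math.dist` requires Python 3.8 or later and both vectors must have the same length.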
1074. Acquire a first domain vector of the target text information and a second domain vector of the target entity, and obtain the correlation degree between the target entity and the target text information according to the first domain vector and the second domain vector.
In this embodiment of the invention, each domain represents a class of objects sharing the same characteristics and may contain a plurality of objects. Divided by industry, the domains may include agriculture, the service industry and so on; divided by discipline, they may include philosophy, economics, medicine and so on. Other division schemes may of course also be used to obtain the domains.
The target text information and the target entity each belong to one or more domains, and differences between their domains affect how related they are. Therefore, the first domain vector and the second domain vector can be acquired and used to obtain the correlation degree between the target entity and the target text information. Each element of the first domain vector measures the probability that the target text information belongs to the domain corresponding to that element, and each element of the second domain vector measures the probability that the target entity belongs to the corresponding domain.
The processing device may acquire the target text information and the target entity, use a preset domain algorithm to obtain the probability that the target text information belongs to each domain and combine these probabilities into the first domain vector of the target text information, and use the preset domain algorithm to obtain the probability that the target entity belongs to each domain and combine these probabilities into the second domain vector of the target entity.
The preset domain algorithm may be a word-matching method, a naive Bayes algorithm, or another algorithm.
For example, Table 2 below shows the probabilities of the industries to which the target text information belongs.
TABLE 2
In one possible implementation, after the first domain vector and the second domain vector are obtained, their cosine similarity may be calculated and used as the correlation degree between the target entity and the target text information, that is:
$$\mathrm{Score}_{topic} = \frac{L_{news} \cdot L_e}{\lVert L_{news} \rVert \, \lVert L_e \rVert}$$
where $L_{news}$ represents the first domain vector, $L_e$ represents the second domain vector, $\lVert L_{news} \rVert$ and $\lVert L_e \rVert$ represent the lengths of the first and second domain vectors, and $\mathrm{Score}_{topic}$ represents the correlation degree between the target entity and the target text information.
Alternatively, the Euclidean distance between the first domain vector and the second domain vector is calculated and the correlation degree is determined from it, the correlation degree being negatively correlated with the Euclidean distance. The correlation between the two domain vectors may also be calculated in other ways and used as the correlation degree between the target entity and the target text information.
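Step 1074 can be sketched with a simple word-matching domain algorithm: the share of keyword hits per domain serves as the domain probabilities, and the two domain vectors are compared by cosine similarity. The keyword lists are hypothetical, and computing the target entity's domain vector from its descriptive words is an assumption; the text does not say which words of the entity are matched.

```python
import math

# Hypothetical keyword lists for a word-matching domain algorithm.
DOMAIN_KEYWORDS = {
    "agriculture": {"farm", "crop", "harvest"},
    "medicine": {"drug", "hospital", "clinical"},
    "services": {"bank", "retail", "hotel"},
}
DOMAINS = sorted(DOMAIN_KEYWORDS)  # fixed element order shared by every domain vector


def domain_vector(words: list[str]) -> list[float]:
    """Share of keyword hits per domain, used as the domain probabilities."""
    hits = [sum(w in DOMAIN_KEYWORDS[d] for w in words) for d in DOMAINS]
    total = sum(hits)
    return [h / total for h in hits] if total else [0.0] * len(DOMAINS)


def cosine(u: list[float], v: list[float]) -> float:
    """Score_topic: cosine similarity of two domain vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


l_news = domain_vector("farm harvest bank loan".split())  # first domain vector: [2/3, 0, 1/3]
l_e = domain_vector("crop farm".split())                  # second domain vector: [1, 0, 0]
print(round(cosine(l_news, l_e), 4))  # 0.8944
```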
1075. Extract a plurality of sentences from the target text information, and obtain the correlation degree between the target entity and the target text information according to the number of occurrences of the target entity and the descriptive text information in each of the sentences.
After the target text information is obtained, the punctuation marks in it that end a sentence, such as full stops, question marks and exclamation marks, are located, and the target text information is split at these marks to obtain its sentences.
From the number of occurrences of the target entity and the descriptive text information in each sentence, their distribution over the target text information can be determined, and the correlation degree between the target entity and the target text information is obtained from these occurrence counts.
In one possible implementation, the words in the target entity and in the descriptive text information are obtained. For each sentence in the target text information, the words in the sentence are obtained, and the number of occurrences in that sentence of words from the target entity and the descriptive text information is determined by matching the sentence's words against them. The correlation degree is then obtained from these per-sentence occurrence counts.
In one possible implementation, the total number of occurrences of the target entity and the descriptive text information in the target text information and the number of sentences in the target text information are obtained, and the average number of occurrences per sentence is determined from them. The variance is then calculated from the per-sentence occurrence counts and this average, and used as the correlation degree between the target entity and the target text information, that is:
$$\mathrm{Score}_{std} = \frac{1}{n}\sum_{i=1}^{n}\left(b_i - \bar{b}\right)^2$$
where $b_i$ represents the number of occurrences of words from the target entity and the descriptive text information in the $i$-th sentence, $\bar{b}$ represents the average number of occurrences, $n$ represents the number of sentences in the target text information, and $\mathrm{Score}_{std}$ represents the correlation degree between the target entity and the target text information.
Alternatively, the ratio of the number of occurrences of the target entity and the descriptive text information in each sentence to their total number of occurrences in the target text information is calculated, and the correlation degree is determined from these ratios.
For example, for each sentence, the ratio of the number of occurrences of the target entity and the descriptive text information in that sentence to their total number of occurrences in the target text information is calculated, together with the logarithm of that ratio, and the correlation degree is determined from each sentence's ratio and its logarithm, that is:
$$\mathrm{Score}_{entropy} = -\sum_{i=1}^{n} p(b_i)\log p(b_i)$$
where $p(b_i)$ represents the ratio of the number of occurrences of the target entity and the descriptive text information in the $i$-th sentence to their total number of occurrences in the target text information, $n$ represents the number of sentences in the target text information, and $\mathrm{Score}_{entropy}$ represents the correlation degree between the target entity and the target text information.
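A sketch of step 1075: split the text at sentence-ending punctuation, count per-sentence hits ($b_i$), then compute the variance ($\mathrm{Score}_{std}$) and the entropy ($\mathrm{Score}_{entropy}$, natural logarithm, with empty sentences contributing zero). It assumes whitespace-separated words and a hypothetical term set combining the entity and its descriptive words.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending marks (ASCII and full-width Chinese)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p for p in parts if p.strip()]


def sentence_counts(text: str, terms: set[str]) -> list[int]:
    """b_i: number of words in sentence i that belong to the entity/descriptive terms."""
    terms = {t.lower() for t in terms}
    return [sum(tok in terms for tok in re.findall(r"\w+", s.lower()))
            for s in split_sentences(text)]


def variance_score(counts: list[int]) -> float:
    """Score_std as defined above: population variance of the b_i."""
    n = len(counts)
    if n == 0:
        return 0.0
    mean = sum(counts) / n
    return sum((b - mean) ** 2 for b in counts) / n


def entropy_score(counts: list[int]) -> float:
    """Score_entropy: -sum p(b_i) log p(b_i), skipping sentences with no hits."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((b / total) * math.log(b / total) for b in counts if b > 0)


text = "Acme raised prices. Farmers complained. Acme blamed weather and acme costs."
counts = sentence_counts(text, {"Acme", "prices"})  # [2, 0, 2]
print(counts, round(variance_score(counts), 4), round(entropy_score(counts), 4))
```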
1076. Obtain the correlation degree between the target entity and the target text information according to a value score of the target entity, the value score being determined by at least one of the number of resources, the number of interactions, and the number of occurrences of the target entity.
The number of resources of the target entity is the amount of resources corresponding to the target entity and measures its value. The number of interactions of the target entity is the number of interactions it performs, which may include the amount of interaction data between target entities, the number of interactions between target entities, and so on. The number of occurrences of the target entity is the number of times it appears in the collected text information.
The number of resources measures the value of the target entity, while the numbers of interactions and occurrences measure how active it is, so the value score measures the importance of the target entity: the higher the value score, the more important and influential the target entity. The value score can therefore take part in calculating the correlation degree between the target entity and the target text information, which can be determined according to it.
In this embodiment of the invention, the processing device may acquire the number of resources, the number of interactions and the number of occurrences of the target entity in real time, and determine the value score from each acquisition to obtain the correlation degree between the target entity and the target text information. Alternatively, the processing device may acquire these quantities periodically, determine the value score from the values acquired in the current period to obtain the correlation degree, and repeat this in the next period so that the correlation degree is kept up to date.
In one case, the value score is the number of resources of the target entity. The processing device presets a correspondence between ranges of the number of resources and correlation degrees; when the number of resources of the target entity is obtained, the range it falls in is determined, and the correlation degree corresponding to that range is looked up in the correspondence and used as the correlation degree between the target entity and the target text information.
In another case, the value score is the number of interactions of the target entity. The processing device presets a correspondence between ranges of the number of interactions and correlation degrees; when the number of interactions of the target entity is obtained, the range it falls in is determined, and the corresponding correlation degree is looked up and used as the correlation degree between the target entity and the target text information.
In another case, the value score is the number of occurrences of the target entity. The processing device presets a correspondence between ranges of the number of occurrences and correlation degrees; when the number of occurrences of the target entity is obtained, the range it falls in is determined, and the corresponding correlation degree is looked up and used as the correlation degree between the target entity and the target text information.
In one possible implementation, combining the three cases above, the correlation degrees corresponding to the number of resources, the number of interactions and the number of occurrences of the target entity are each determined, a weight is obtained for each of the three quantities, and the three correlation degrees are summed with these weights to obtain the correlation degree between the target entity and the target text information.
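The range lookups and the weighted sum of step 1076 can be sketched with `bisect`. All range boundaries, correlation levels and weights below are hypothetical placeholders; the text only states that such correspondences and weights are preset.

```python
import bisect

# Hypothetical correspondences: (upper bounds of the ranges, correlation per range).
# Values below the first bound take the first level; values in [bound_k-1, bound_k)
# take level k; values at or above the last bound take the final level.
RANGE_TABLES = {
    "resources": ([1e6, 1e8], [0.2, 0.5, 0.8]),
    "interactions": ([100, 10_000], [0.2, 0.5, 0.8]),
    "occurrences": ([10, 1_000], [0.1, 0.4, 0.9]),
}
WEIGHTS = {"resources": 0.5, "interactions": 0.3, "occurrences": 0.2}  # hypothetical


def range_relevance(kind: str, value: float) -> float:
    """Correlation degree of the range that `value` falls in."""
    bounds, levels = RANGE_TABLES[kind]
    return levels[bisect.bisect_right(bounds, value)]


def value_relevance(stats: dict[str, float]) -> float:
    """Weighted sum of the per-quantity correlation degrees."""
    return sum(WEIGHTS[k] * range_relevance(k, v) for k, v in stats.items())


stats = {"resources": 5e7, "interactions": 50, "occurrences": 2000}
print(round(value_relevance(stats), 2))  # 0.5*0.5 + 0.3*0.2 + 0.2*0.9 = 0.49
```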
1077. Acquire the correlation degree between the target entity and the target text information according to the position of the target entity in the target text information.
The target text information may be divided into a header portion and a body portion, in which case the position of the target entity is expressed as the portion (header or body) in which it is located. Alternatively, the target text information may be divided into sentences, in which case the position of the target entity is expressed as the order, within the target text information, of the sentence to which the target entity belongs.
The farther forward the target entity is positioned in the target text information, the more important it is in the target text information and the more relevant it is to the target text information.
The processing device may obtain a correspondence between positions of the target entity in the target text information and correlation degrees, and determine the correlation degree between the target entity and the target text information according to the position of the target entity and the correspondence. For example, if the target entity is located in the header portion of the target text information, the correlation degree is 0.7; if it is located in the body portion, the correlation degree is 0.4.
It should be noted that when the correlation degree is calculated by only one of the above methods, the result may be directly determined as the correlation degree between the target entity and the target text information. When the correlation degree is calculated by several of the above methods, a weight is obtained for each method, the correlation degrees calculated by these methods are weighted and summed according to the weights, and the resulting value is determined as the correlation degree between the target entity and the target text information.
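A minimal Python sketch of the two position-based variants follows. The header/body values 0.7 and 0.4 come from the example above; the sentence-order rule (a hypothetical geometric decay `decay ** index` over the index of the first sentence containing the entity), the value 0.0 for an absent entity, and plain substring matching as the way of locating the entity are all assumptions.

```python
from __future__ import annotations

# Correlation degree per portion; 0.7 and 0.4 are the example values in the text.
PORTION_CORRELATION = {"header": 0.7, "body": 0.4}


def portion_correlation(entity: str, header: str, body: str) -> float:
    """Header/body variant: an entity in the header scores higher than one only in the body."""
    if entity in header:
        return PORTION_CORRELATION["header"]
    if entity in body:
        return PORTION_CORRELATION["body"]
    return 0.0  # assumption: an entity that never appears gets 0


def sentence_order_correlation(entity: str, sentences: list[str], decay: float = 0.8) -> float:
    """Sentence-order variant (hypothetical rule): decay ** index of the first sentence containing the entity."""
    for index, sentence in enumerate(sentences):
        if entity in sentence:
            return decay ** index
    return 0.0


print(portion_correlation("AA", "AA comments on chips", "XX company announced a new chip"))  # 0.7
```

A monotonically decreasing rule such as the decay above is one simple way to encode "farther forward means more relevant"; a lookup table per sentence index would serve equally well.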
In another possible implementation, a relevance model is obtained, and based on the relevance model, a relevance of the target entity to the target text information is determined.
For example, as shown in fig. 4, the target text information and the target entity are acquired, and the words in the target text information and the descriptive text information of the target entity are obtained. Features of the words in the target text information and features of the descriptive text information are extracted based on the relevance model, and the relevance between the target entity and the target text information is obtained from these features. The relevance model may be a logistic regression model or another model.
After step 107 is performed to obtain the relevance between the target entity and the target text information, in one possible implementation manner, the correspondence among the target text information, the target entity and the relevance may be stored for subsequent queries. In another possible implementation manner, target entities with a low correlation degree with the target text information can be filtered out according to the correlation degree.
For example, if a plurality of target entities of the target text information are determined, target entities with larger correlation degrees can be selected and those with smaller correlation degrees filtered out according to the correlation degree between each target entity and the target text information. This reduces interference from irrelevant entities and improves the accuracy of the target entities.
In one possible implementation manner, the determined correlation degrees are ranked, and a preset number of correlation degrees are selected according to the ranking order such that the selected correlation degrees are larger than the others; the target entities corresponding to the other correlation degrees are filtered out.
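The embodiment names logistic regression as one possible relevance model without specifying its features. The sketch below assumes two hypothetical overlap features between the words of the target text information and the descriptive text information, and applies the standard logistic function to a weighted sum of them; the weights and bias shown are placeholders, not trained values, and the function names are illustrative.

```python
from __future__ import annotations

import math


def extract_features(text_words: list[str], description_words: list[str]) -> list[float]:
    """Hypothetical hand-crafted features comparing the text with the entity's description."""
    text_set, desc_set = set(text_words), set(description_words)
    overlap = len(text_set & desc_set)
    return [
        overlap / max(len(text_set), 1),  # share of text words found in the description
        overlap / max(len(desc_set), 1),  # share of description words found in the text
    ]


def logistic_relevance(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic regression score: sigmoid(bias + w . x), always in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Weights would normally be learned from labeled (text, entity, relevant?) pairs;
# these values are placeholders.
features = extract_features(["XX", "company", "chip", "bandwidth"], ["chip", "company", "semiconductor"])
print(features)  # [0.5, 0.6666666666666666]
print(logistic_relevance(features, weights=[2.0, 2.0], bias=-1.5))
```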
The correlation degrees may be ranked in descending order, the first preset number of correlation degrees selected, and the target entities corresponding to the remaining correlation degrees filtered out.
Alternatively, the correlation degrees may be ranked in ascending order, the last preset number of correlation degrees selected, and the target entities corresponding to the remaining correlation degrees filtered out. The preset number is a positive integer, such as 3, 4, 5 or another value, and may be determined by jointly considering the accuracy and the number of target entities to be extracted.
In another possible implementation manner, target entities whose correlation degrees are greater than a preset threshold are selected according to the plurality of correlation degrees, and target entities whose correlation degrees are not greater than the preset threshold are filtered out.
The preset threshold may be 0.5, 0.6 or another value, and may be determined by jointly considering the accuracy and the number of target entities to be extracted.
In another possible implementation manner, the plurality of correlation degrees are ranked, at most a preset number of correlation degrees that are greater than the preset threshold are selected according to the ranking order, and the candidate entities corresponding to the selected correlation degrees are determined as target entities.
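The three filtering strategies (a preset number, a preset threshold, and both combined) can be sketched as follows. The scores and entity names are hypothetical; ties keep their dictionary order because Python's sort is stable, and the threshold comparison is strict, matching "greater than the preset threshold".

```python
from __future__ import annotations


def _ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    # Descending order of correlation degree.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_n(scores: dict[str, float], n: int) -> list[str]:
    """Keep the n entities with the largest correlation degrees."""
    return [entity for entity, _ in _ranked(scores)[:n]]


def select_above_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep entities whose correlation degree is strictly greater than the threshold."""
    return [entity for entity, score in _ranked(scores) if score > threshold]


def select_top_n_above_threshold(scores: dict[str, float], n: int, threshold: float) -> list[str]:
    """Keep at most n entities, all above the threshold, in descending order."""
    return select_above_threshold(scores, threshold)[:n]


scores = {"e1": 0.9, "e2": 0.3, "e3": 0.75, "e4": 0.55}
print(select_top_n(scores, 2))  # ['e1', 'e3']
print(select_above_threshold(scores, 0.5))  # ['e1', 'e3', 'e4']
print(select_top_n_above_threshold(scores, 5, 0.6))  # ['e1', 'e3']
```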
According to the method provided by the embodiment of the invention, the target sentence in the target text information is extracted, the candidate entity set is obtained, the correlation degree between each of the plurality of candidate entities and the target sentence is obtained, and the target entity of the target text information is determined according to the obtained correlation degrees. This expands the ways of determining target entities: the meaning of the target sentence is fully considered, so target entities semantically related to the target sentence can be selected rather than only entities that appear in the target text information, which improves the accuracy of the target entities and extends the functional range.
In addition, when a plurality of target entities of the target text information are determined, multiple specified entities with the same meaning among them can be replaced by a single entity with that meaning, so that only one entity is retained for each group of synonymous entities, reducing the data volume while preserving the accuracy of the extracted entities.
Furthermore, the correlation degree between the target entity and the target text information can be obtained, which measures how relevant the target entity is to the target text information and increases the amount of information provided.
It should be noted that the above embodiment provides a way to determine the target entity according to the correlation degree. On this basis, the following two ways can further be used to determine target entities of the target text information, thereby expanding the set of target entities.
In one possible implementation, the target entity of the target text information is determined by an NER (Named Entity Recognition) algorithm, that is, based on an entity recognition model. The entity recognition model may be an HMM (Hidden Markov Model), a CRF (Conditional Random Field), a model combining LSTM (Long Short-Term Memory) with CRF, or another entity recognition model.
As shown in Table 3 below, when the target entity is determined based on the model combining LSTM with CRF, character vectors, word vectors, or a combination of character vectors and word vectors may be used, and the precision, recall and F1 score (the harmonic mean of precision and recall) of the target entities are collected for each of these three modes. Precision is the ratio of the number of correctly recognized entities to the total number of recognized entities, recall is the ratio of the number of correctly recognized entities to the total number of entities in the test set, and the F1 score is determined from the precision and the recall.
Table 3
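The precision, recall and F1 definitions given above can be computed as in the following sketch. Treating a recognized entity as correct only when its span and type exactly match a test-set entity is an assumption (a common convention for entity-level evaluation); the example spans are made up and are not the data behind Table 3.

```python
from __future__ import annotations


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1.

    Entities are compared as whole items, e.g. (start, end, type) tuples,
    so a recognized entity counts as correct only if it matches exactly.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = {(0, 2, "ORG"), (10, 12, "PER"), (20, 25, "TECH"), (30, 33, "LOC")}
gold = {(0, 2, "ORG"), (10, 12, "PER"), (40, 44, "TECH")}
p, r, f = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```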
To train the entity recognition model, sample text messages are crawled and each is labeled; the labeled sample text messages serve as the training data set, with which the entity recognition model is trained so that it acquires the ability to recognize entities.
In another possible implementation, entities in the target text information are matched against an alternative entity set: an alternative entity set comprising a plurality of alternative entities is obtained and used as a whitelist; word segmentation is performed on the target text information to obtain a plurality of words; the words are matched against the alternative entities in the set, and any word contained in the alternative entity set is taken as a target entity of the target text information.
The matching may be performed using a trie (prefix tree) or by other means.
In summary, the embodiments of the present invention provide three ways of determining the target entity: in addition to determining the target entity according to the correlation degree, the target entity may be determined in the two implementations above, so that different types of target entities can be extracted and the comprehensiveness of target entity extraction is enhanced.
Any combination of the above optional solutions may form an optional embodiment of the present disclosure, which is not described in detail herein.
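A minimal trie (prefix tree) sketch for the whitelist matching above follows. `contains` implements the described per-word lookup after segmentation; `longest_matches`, which scans unsegmented text and keeps the longest whitelist entry at each position, is an assumed variant shown because it is a common use of a trie, not something the embodiment prescribes. The class and method names are hypothetical.

```python
from __future__ import annotations

_END = object()  # marks the end of a stored entity inside a node


class Trie:
    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def contains(self, word: str) -> bool:
        """Exact lookup of one segmented word in the whitelist."""
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return _END in node

    def longest_matches(self, text: str) -> list[str]:
        """Scan text left to right, taking the longest stored entity at each position."""
        found: list[str] = []
        i = 0
        while i < len(text):
            node, end = self.root, -1
            for j in range(i, len(text)):
                node = node.get(text[j])
                if node is None:
                    break
                if _END in node:
                    end = j + 1  # remember the longest match so far
            if end != -1:
                found.append(text[i:end])
                i = end
            else:
                i += 1
        return found


trie = Trie()
for entity in ["XX company", "XX", "AA"]:
    trie.insert(entity)

words = ["XX", "develops", "AA"]  # output of a word segmenter
print([w for w in words if trie.contains(w)])  # ['XX', 'AA']
print(trie.longest_matches("XX company president AA"))  # ['XX company', 'AA']
```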
The entity determining method provided by the embodiment of the invention can be applied to various application scenes.
In an entity recommendation scenario, when a news item is acquired, target entities can be determined from a sentence, a paragraph or the whole news item, and the determined target entities are recommended to the user when the news item is displayed.
As shown in fig. 5, the display interface of the terminal is displaying an application interface of the XX application, which includes a plurality of news items; each news item includes a brief introduction and the target entities extracted from it, and the news items and target entities are displayed to the user at the same time for convenient viewing.
Alternatively, the determined target entities and their correlation degrees with the news item are recommended to the user when the news item is displayed.
For example, a news item states: "XX company has developed a three-dimensional integration technology that can simultaneously increase bandwidth, reduce latency, and bring higher performance and lower power consumption; AA, president of XX company, stated that this three-dimensional integration technology has reached an internationally leading level." With the method provided by the embodiment of the invention, the target entities can be determined as XX company, integration technology, AA, and technology research and development, with correlation degrees of 0.672, 0.821, 0.763 and 0.989 respectively, and the determined target entities and their correlation degrees are displayed below the news item.
In a public opinion monitoring scenario, target entities of a plurality of events are determined and a map of the events and target entities is established. The relation between any two events, or between any two entities, is analyzed through the map and the entity associations are summarized, thereby realizing public opinion monitoring. As shown in fig. 6, different events may be linked to one or more entities to form the map.
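One way to represent the event-entity map is a bipartite index, as in the sketch below. The relation measures used here (two events are related if they share an entity; two entities are related by the number of events in which both appear) are assumptions chosen for illustration, since the embodiment does not define them, and the event and entity names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def build_entity_index(event_entities: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the event -> entities map into entity -> events (the other side of the map)."""
    index: dict[str, set[str]] = {}
    for event, entities in event_entities.items():
        for entity in entities:
            index.setdefault(entity, set()).add(event)
    return index


def related_events(event_entities: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Pairs of events linked through at least one shared entity."""
    pairs = {}
    for a, b in combinations(sorted(event_entities), 2):
        shared = event_entities[a] & event_entities[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


def entity_cooccurrence(entity_events: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Pairs of entities with the number of events in which both appear."""
    pairs = {}
    for x, y in combinations(sorted(entity_events), 2):
        common = len(entity_events[x] & entity_events[y])
        if common:
            pairs[(x, y)] = common
    return pairs


events = {
    "event 1": {"XX company", "AA"},
    "event 2": {"AA", "integration technology"},
    "event 3": {"BB"},
}
entity_events = build_entity_index(events)
print(related_events(events))  # {('event 1', 'event 2'): {'AA'}}
print(entity_cooccurrence(entity_events))
```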
Fig. 7 is a schematic structural diagram of an entity determining apparatus according to an embodiment of the present invention. Referring to fig. 7, the apparatus includes:
an extracting module 701, configured to extract a target sentence in the target text information;
a first obtaining module 702, configured to obtain an alternative entity set, where the alternative entity set includes a plurality of alternative entities;
a second obtaining module 703, configured to obtain a relevance between each of the plurality of candidate entities and the target sentence;
and the determining module 704 is configured to determine, according to the obtained multiple correlations, a target entity of the target text information, where the correlation between the target entity and the target sentence is greater than the correlation between other entities in the multiple candidate entities and the target sentence.
In one possible implementation, the second acquisition module 703 includes:
A first obtaining unit, configured to obtain a first vector of a target sentence;
and the second acquisition unit is used for acquiring a second vector of each candidate entity, and determining the correlation degree of the second vector and the first vector as the correlation degree of the candidate entity and the target sentence.
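The two units above leave open how the first and second vectors are built and compared. The sketch below assumes that the first vector is the mean of hypothetical word embeddings of the target sentence, that each candidate entity already has a second vector of the same dimension, and that cosine similarity serves as the correlation degree; all of these are illustrative choices, as are the function names.

```python
from __future__ import annotations

import math


def mean_vector(words: list[str], embeddings: dict[str, list[float]], dim: int) -> list[float]:
    """First vector: average of the embeddings of known words (hypothetical choice)."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_correlations(
    sentence_words: list[str],
    embeddings: dict[str, list[float]],
    entity_vectors: dict[str, list[float]],
    dim: int,
) -> dict[str, float]:
    """Correlation degree of each candidate entity's second vector with the first vector."""
    first = mean_vector(sentence_words, embeddings, dim)
    return {entity: cosine(first, second) for entity, second in entity_vectors.items()}


embeddings = {"chip": [1.0, 0.0], "design": [0.0, 1.0]}
entity_vectors = {"semiconductor": [1.0, 1.0], "music": [-1.0, 1.0]}
correlations = candidate_correlations(["chip", "design", "the"], embeddings, entity_vectors, dim=2)
print(correlations)
```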
In another possible implementation, the determining module 704 includes:
the selection unit is used for selecting a preset number of correlation degrees according to the ranking order of the plurality of correlation degrees, the selected correlation degrees being larger than the others, and determining the candidate entities corresponding to the selected correlation degrees as target entities;
the selecting unit is further configured to select, according to the plurality of correlations, a correlation greater than a preset threshold, and determine an alternative entity corresponding to the selected correlation as a target entity.
In another possible implementation, the apparatus further includes:
and the replacing module is used for, if a plurality of target entities of the target text information are determined and they include a plurality of specified entities with the same meaning, replacing the specified entities with one entity having that meaning.
In another possible implementation, the replacement module includes:
The acquisition unit is used for acquiring a preset association relationship, wherein the preset association relationship comprises at least one association item, and each association item comprises a plurality of entities with the same meaning;
the query unit is used for respectively querying preset association relations according to a plurality of target entities to obtain association items of each target entity;
and the replacing unit is used for replacing the specified entities with any entity in the association items when the specified entities in the target entities are determined to belong to the same association item.
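The replacement module can be sketched as follows, with association items given as sets of synonymous entities. Keeping the first-seen target entity of each association item as its representative is an assumption (the embodiment allows any entity in the item to be used), and the function name and example entities are hypothetical.

```python
from __future__ import annotations


def replace_synonyms(targets: list[str], association_items: list[set[str]]) -> list[str]:
    """Keep one entity per association item; entities outside every item are kept as-is."""
    item_index: dict[str, int] = {}
    for index, item in enumerate(association_items):
        for entity in item:
            item_index[entity] = index

    result: list[str] = []
    seen_items: set[int] = set()
    for entity in targets:
        index = item_index.get(entity)
        if index is None:
            result.append(entity)  # no known synonyms
        elif index not in seen_items:
            seen_items.add(index)
            result.append(entity)  # first-seen entity represents the whole item
    return result


association_items = [{"XX company", "XX Co."}, {"integration technology", "3D integration"}]
targets = ["XX company", "AA", "XX Co.", "integration technology"]
print(replace_synonyms(targets, association_items))  # ['XX company', 'AA', 'integration technology']
```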
In another possible implementation, the apparatus further includes:
the word segmentation module is used for carrying out word segmentation processing on the target text information to obtain at least one word in the target text information;
the determining module is further configured to determine each of the at least one word that is included in the candidate entity set as a target entity of the target text information.
In another possible implementation, the apparatus further includes:
and the third acquisition module is used for acquiring the correlation degree between the target entity and the target text information according to at least one of the target text information, the target entity and the descriptive text information of the target entity.
In another possible implementation, the apparatus further includes: the descriptive text acquisition module is used for executing at least one of the following:
Acquiring a history search record of a target entity, wherein the history search record comprises search results obtained by searching by taking the target entity as a keyword, and acquiring descriptive text information according to the history search record;
acquiring a historical access record of a target entity, wherein the historical access record comprises a search result of executing access operation after searching by taking the target entity as a keyword, and acquiring descriptive text information according to the historical access record;
acquiring an association diagram comprising a plurality of entities, wherein any two associated entities in the association diagram are connected, acquiring at least one entity connected with a target entity according to the association diagram, and taking the at least one entity as descriptive text information of the target entity;
and acquiring the notification message of which the publisher is a target entity, and acquiring descriptive text information according to the notification message.
In another possible implementation manner, the third obtaining module is configured to perform at least one of the following:
acquiring the correlation degree between the target entity and the target text information according to the number of occurrences of the target entity in the target text information;
acquiring the number of words shared by the target text information and the descriptive text information and the total number of words of the target text information and the descriptive text information, and acquiring the correlation degree between the target entity and the target text information according to the number of shared words and the total number of words;
acquiring the correlation degree between the target entity and the target text information according to the third vector of the target text information and the fourth vector of the target entity;
acquiring a first domain vector of the target text information and a second domain vector of the target entity, wherein the elements of the first domain vector respectively measure the probabilities that the target text information belongs to the domains corresponding to those elements, and the elements of the second domain vector respectively measure the probabilities that the target entity belongs to the domains corresponding to those elements, and acquiring the correlation degree between the target entity and the target text information according to the first domain vector and the second domain vector;
extracting a plurality of sentences in the target text information, and acquiring the correlation degree between the target entity and the target text information according to the number of occurrences of the target entity and the descriptive text information in each of the plurality of sentences;
acquiring the correlation degree between the target entity and the target text information according to the value score of the target entity, wherein the value score is determined by at least one of the number of resources, the number of interactions and the number of occurrences of the target entity;
and acquiring the correlation degree between the target entity and the target text information according to the position of the target entity in the target text information.
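Two of the listed measures can be sketched as below. The embodiment does not give formulas, so using the ratio of shared distinct words to total distinct words (a Jaccard ratio) for the word-count measure, and the dot product of the two domain probability vectors (the probability that both fall into the same domain, if the two distributions are treated as independent) for the domain measure, are assumptions, as are the function names.

```python
from __future__ import annotations


def overlap_correlation(text_words: list[str], description_words: list[str]) -> float:
    """Shared distinct words / total distinct words (Jaccard ratio; assumed formula)."""
    text_set, desc_set = set(text_words), set(description_words)
    total = len(text_set | desc_set)
    return len(text_set & desc_set) / total if total else 0.0


def domain_correlation(text_domains: list[float], entity_domains: list[float]) -> float:
    """Sum over domains of p_i * q_i (assumed formula)."""
    if len(text_domains) != len(entity_domains):
        raise ValueError("domain vectors must cover the same domains")
    return sum(p * q for p, q in zip(text_domains, entity_domains))


print(overlap_correlation(["chip", "design", "AA"], ["chip", "AA", "company"]))  # 0.5
print(domain_correlation([0.8, 0.2], [0.5, 0.5]))
```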
According to the device provided by the embodiment of the invention, the target sentence in the target text information is extracted, the candidate entity set is obtained, the correlation degree between each of the plurality of candidate entities and the target sentence is obtained, and the target entity of the target text information is determined according to the obtained correlation degrees. This expands the ways of determining target entities: the meaning of the target sentence is fully considered, so target entities semantically related to the target sentence can be selected rather than only entities that appear in the target text information, which improves the accuracy of the target entities and extends the functional range.
It should be noted that the division into the above functional modules in the entity determining apparatus of the above embodiment is only illustrative. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the processing device may be divided into different functional modules to complete all or part of the functions described above. In addition, the entity determining apparatus and the entity determining method provided in the above embodiments belong to the same concept; for the specific implementation process of the apparatus, refer to the method embodiments, which are not repeated here.
Fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal 800 may be a portable mobile terminal such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, a desktop computer, a head-mounted device, or any other intelligent terminal. The terminal 800 may also be referred to by other names such as user device, portable terminal, laptop terminal or desktop terminal.
In general, the terminal 800 includes: a processor 801 and a memory 802.
Processor 801 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 stores at least one instruction, which is executed by processor 801 to implement the entity determining method provided by the method embodiments herein.
In some embodiments, the terminal 800 may optionally further include a peripheral interface 803 and at least one peripheral. The processor 801, the memory 802 and the peripheral interface 803 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 803 by a bus, a signal line or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 804, a touch display screen 805, a camera assembly 806, an audio circuit 807 and a power supply 809.
Peripheral interface 803 may be used to connect at least one I/O (Input/Output) related peripheral to processor 801 and memory 802. In some embodiments, processor 801, memory 802 and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802 and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission, or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 805 is a touch display screen, it can also collect touch signals on or above its surface. The touch signals may be input to the processor 801 as control signals for processing. In this case, the display screen 805 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, there may be one display screen 805, disposed on the front panel of the terminal 800; in other embodiments, there may be at least two display screens 805, disposed on different surfaces of the terminal 800 or in a folding design; in still other embodiments, the display screen 805 may be a flexible display disposed on a curved or folded surface of the terminal 800. The display screen 805 may even be given a non-rectangular irregular shape, i.e., a shaped screen. The display screen 805 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera on its rear surface. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to achieve background blurring, and the main camera and the wide-angle camera can be fused to achieve panoramic shooting, VR (Virtual Reality) shooting or other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash, which may be a single-color-temperature or dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
Audio circuitry 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 801 for processing, or inputting the electric signals to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction purposes, a plurality of microphones may be respectively disposed at different portions of the terminal 800. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 807 may also include a headphone jack.
A power supply 809 is used to power the various components in the terminal 800. The power supply 809 may be an alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyroscope sensor 812, pressure sensor 813, optical sensor 815, and proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 801 may control the touch display screen 805 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 811. Acceleration sensor 811 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 812 may detect the body direction and rotation angle of the terminal 800, and may cooperate with the acceleration sensor 811 to collect the user's 3D motions on the terminal 800. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing the UI according to a user's tilting operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed at a side frame of the terminal 800 and/or at a lower layer of the touch display screen 805. When the pressure sensor 813 is disposed on a side frame of the terminal 800, it can detect the user's grip signal on the terminal 800, and the processor 801 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the touch display screen 805, the processor 801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 805. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch display screen 805 based on the intensity of ambient light collected by the optical sensor 815. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera module 806 based on the ambient light intensity collected by the optical sensor 815.
A proximity sensor 816, also referred to as a distance sensor, is typically provided on the front panel of the terminal 800. The proximity sensor 816 is used to collect the distance between the user and the front of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually decreases, the processor 801 controls the touch display 805 to switch from the bright screen state to the off screen state; when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually increases, the processor 801 controls the touch display 805 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in Fig. 8 does not limit the terminal 800, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 900 may vary considerably in configuration or performance and may include one or more processors (central processing units, CPUs) 901 and one or more memories 902, where the memories 902 store at least one instruction that is loaded and executed by the processors 901 to implement the methods provided in the foregoing method embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, as well as other components for implementing device functions, which are not described here.
The server 900 may be configured to perform the steps performed by the processing device in the entity determination method described above.
An embodiment of the present invention further provides an entity determining apparatus comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the operations of the entity determining method in the foregoing embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the operations of the entity determining method in the foregoing embodiments.
Those skilled in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. An entity determination method, comprising:
extracting a target sentence from target text information;
acquiring a candidate entity set, wherein the candidate entity set comprises a plurality of candidate entities;
acquiring a relevance between each of the plurality of candidate entities and the target sentence;
determining a target entity of the target text information according to the plurality of acquired relevances, wherein the relevance between the target entity and the target sentence is greater than the relevances between the other candidate entities of the plurality of candidate entities and the target sentence;
acquiring a relevance between the target entity and the target text information according to at least one of the target text information, the target entity, and descriptive text information of the target entity;
the method further comprising at least one of:
acquiring a historical search record of the target entity, wherein the historical search record comprises search results obtained by searching with the target entity as a keyword, and acquiring the descriptive text information according to the historical search record;
acquiring a historical access record of the target entity, wherein the historical access record comprises search results on which an access operation was performed after searching with the target entity as a keyword, and acquiring the descriptive text information according to the historical access record;
acquiring an association graph comprising a plurality of entities, wherein any two associated entities in the association graph are connected, acquiring at least one entity connected to the target entity according to the association graph, and using the at least one entity as the descriptive text information of the target entity;
and acquiring a notification message about the target entity issued by a publisher, and acquiring the descriptive text information according to the notification message.
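The steps of claim 1 can be illustrated with a minimal sketch. It assumes several things the claim leaves open: the first sentence of the text (for example a headline) is taken as the target sentence, the entity-to-sentence relevance is a simple token-overlap fraction rather than a learned model, the descriptive text is built from the association-graph option (the entity's direct neighbours), and the entity-to-text relevance is the shared-word ratio. All function names are hypothetical.

```python
import re

def tokens(text: str) -> list[str]:
    # Lower-cased word tokens; a stand-in for a real tokenizer (assumption).
    return re.findall(r"\w+", text.lower())

def extract_target_sentence(text: str) -> str:
    # Assumption: the first sentence (e.g. a headline) is the target sentence.
    parts = [p.strip() for p in re.split(r"[.!?\n]", text) if p.strip()]
    return parts[0] if parts else ""

def sentence_relevance(entity: str, sentence: str) -> float:
    # Placeholder score: fraction of the entity's tokens that appear in the sentence.
    ent_tokens = tokens(entity)
    sent_tokens = set(tokens(sentence))
    if not ent_tokens:
        return 0.0
    return sum(t in sent_tokens for t in ent_tokens) / len(ent_tokens)

def graph_description(entity: str, graph: dict[str, set[str]]) -> str:
    # Descriptive text = entities directly connected to `entity` in the association graph.
    return " ".join(sorted(graph.get(entity, set())))

def text_relevance(text: str, description: str) -> float:
    # Shared distinct words divided by all distinct words of both texts.
    a, b = set(tokens(text)), set(tokens(description))
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def determine_entity(text: str, candidates: list[str], graph: dict[str, set[str]]):
    sentence = extract_target_sentence(text)
    # Pick the candidate most relevant to the target sentence (first one wins ties).
    target = max(candidates, key=lambda c: sentence_relevance(c, sentence))
    description = graph_description(target, graph)
    return target, text_relevance(text, description)

graph = {"Tencent": {"WeChat", "QQ"}, "WeChat": {"Tencent"}, "QQ": {"Tencent"}}
text = "Tencent releases new game. The game runs on WeChat and QQ."
print(determine_entity(text, ["WeChat", "Tencent", "Apple"], graph))  # ('Tencent', 0.2)
```

In practice each placeholder scorer would be swapped for whatever relevance model is actually used; the control flow (sentence, candidate scoring, maximum, descriptive text, text-level relevance) is the part that mirrors the claim.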
2. The method of claim 1, wherein acquiring the relevance between each of the plurality of candidate entities and the target sentence comprises:
acquiring a first vector of the target sentence;
and, for each candidate entity, acquiring a second vector of the candidate entity, and determining the relevance between the second vector and the first vector as the relevance between the candidate entity and the target sentence.
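Claim 2 does not name a vector-similarity measure. The sketch below assumes cosine similarity, a common choice, and assumes the first and second vectors are already computed (for example by some embedding model) and passed in as plain lists of floats; the function names are hypothetical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_relevances(sentence_vec: list[float],
                         entity_vecs: dict[str, list[float]]) -> dict[str, float]:
    # sentence_vec is the "first vector"; entity_vecs maps each candidate to its "second vector".
    return {name: cosine(vec, sentence_vec) for name, vec in entity_vecs.items()}

print(candidate_relevances([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}))  # {'a': 1.0, 'b': 0.0}
```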
3. The method of claim 1, wherein determining the target entity of the target text information according to the plurality of acquired relevances comprises at least one of:
selecting a preset number of relevances according to the ordering of the plurality of relevances, wherein the selected relevances are greater than the other relevances among the plurality of relevances, and determining the candidate entities corresponding to the selected relevances as target entities;
and selecting, from the plurality of relevances, the relevances greater than a preset threshold, and determining the candidate entities corresponding to the selected relevances as target entities.
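The two selection rules of claim 3 (keep the top preset number, or keep everything above a preset threshold) can be sketched as follows, assuming the relevances are held in a dictionary from candidate entity to score; names and example values are illustrative only.

```python
def select_top_k(relevances: dict[str, float], k: int) -> list[str]:
    # Rank candidates by relevance, highest first, and keep the first k.
    ranked = sorted(relevances, key=relevances.get, reverse=True)
    return ranked[:k]

def select_above_threshold(relevances: dict[str, float], threshold: float) -> list[str]:
    # Keep every candidate whose relevance strictly exceeds the threshold.
    return [name for name, score in relevances.items() if score > threshold]

scores = {"Tencent": 0.9, "Apple": 0.2, "WeChat": 0.7}
print(select_top_k(scores, 2))              # ['Tencent', 'WeChat']
print(select_above_threshold(scores, 0.5))  # ['Tencent', 'WeChat']
```

Top-k guarantees a fixed number of entities per text, while the threshold rule may return none or many; the claim allows either or both.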
4. The method of claim 1, wherein after determining the target entity of the target text information according to the plurality of acquired relevances, the method further comprises:
if a plurality of target entities of the target text information are determined and the plurality of target entities include a plurality of designated entities having the same meaning, replacing the plurality of designated entities with one entity having the same meaning as the plurality of designated entities.
5. The method of claim 4, wherein replacing the plurality of designated entities with one entity having the same meaning, when a plurality of target entities of the target text information have been determined and the plurality of target entities include a plurality of designated entities having the same meaning, comprises:
acquiring a preset association relationship, wherein the preset association relationship comprises at least one association entry, and each association entry comprises a plurality of entities having the same meaning;
querying the preset association relationship for each of the plurality of target entities to obtain the association entry of each target entity;
and, when it is determined that a plurality of designated entities among the plurality of target entities belong to the same association entry, replacing the plurality of designated entities with any one entity in the association entry.
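The synonym merging of claims 4 and 5 can be sketched under the assumption that the preset association relationship is a list of sets, each set being one association entry of same-meaning entities. Since the claim allows replacing the designated entities with any entity of the entry, this hypothetical helper simply keeps the first one encountered.

```python
def merge_synonyms(targets: list[str], entries: list[set[str]]) -> list[str]:
    # Index: entity -> position of the association entry that contains it.
    entry_of: dict[str, int] = {}
    for idx, entry in enumerate(entries):
        for ent in entry:
            entry_of.setdefault(ent, idx)
    result: list[str] = []
    seen_entries: set[int] = set()
    for ent in targets:
        idx = entry_of.get(ent)
        if idx is None:
            # Not in any association entry: keep it once.
            if ent not in result:
                result.append(ent)
            continue
        if idx in seen_entries:
            continue  # a same-meaning entity from this entry is already kept
        seen_entries.add(idx)
        result.append(ent)
    return result

entries = [{"Tencent", "Tencent Holdings"}, {"WeChat", "Weixin"}]
print(merge_synonyms(["Tencent Holdings", "WeChat", "Tencent", "Weixin", "QQ"], entries))
# ['Tencent Holdings', 'WeChat', 'QQ']
```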
6. The method of claim 1, wherein after acquiring the candidate entity set, the method further comprises:
performing word segmentation on the target text information to obtain at least one word in the target text information;
and determining each of the at least one word that is included in the candidate entity set as a target entity of the target text information.
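Claim 6 does not specify a segmenter. Because Chinese text has no spaces, the sketch below swaps in greedy forward maximum matching that uses the candidate entity set itself as the dictionary; a production system would more likely use a general-purpose segmenter. Function names are hypothetical.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int) -> list[str]:
    # Greedy forward maximum matching: at each position take the longest vocab word,
    # falling back to a single character when nothing in vocab matches.
    words: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                words.append(piece)
                i += size
                break
    return words

def entities_by_segmentation(text: str, candidate_set: set[str]) -> list[str]:
    max_len = max((len(c) for c in candidate_set), default=1)
    words = forward_max_match(text, candidate_set, max_len)
    # Words that appear in the candidate set become target entities (deduplicated, in order).
    found: list[str] = []
    for w in words:
        if w in candidate_set and w not in found:
            found.append(w)
    return found

print(entities_by_segmentation("腾讯发布微信新版本", {"腾讯", "微信", "QQ"}))  # ['腾讯', '微信']
```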
7. The method of claim 1, wherein acquiring the relevance between the target entity and the target text information according to at least one of the target text information, the target entity, and the descriptive text information of the target entity comprises at least one of:
acquiring the relevance between the target entity and the target text information according to the number of occurrences of the target entity in the target text information;
acquiring the number of words shared by the target text information and the descriptive text information, acquiring the total number of words in the target text information and the descriptive text information, and acquiring the relevance between the target entity and the target text information according to the number of shared words and the total number of words;
acquiring the relevance between the target entity and the target text information according to a third vector of the target text information and a fourth vector of the target entity;
acquiring a first domain vector of the target text information and a second domain vector of the target entity, wherein the elements of the first domain vector respectively measure the probabilities that the target text information belongs to the domains corresponding to those elements, and the elements of the second domain vector respectively measure the probabilities that the target entity belongs to the domains corresponding to those elements, and acquiring the relevance between the target entity and the target text information according to the first domain vector and the second domain vector;
extracting a plurality of sentences from the target text information, and acquiring the relevance between the target entity and the target text information according to the number of occurrences of the target entity and of the descriptive text information in each of the plurality of sentences;
acquiring the relevance between the target entity and the target text information according to a value score of the target entity, wherein the value score is determined by at least one of the number of resources, the number of interactions, and the number of occurrences of the target entity;
and acquiring the relevance between the target entity and the target text information according to the position of the target entity in the target text information.
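A few of the claim 7 signals can be sketched as standalone features. The claim fixes only what each signal is based on, not its formula, so each function below is an assumed formulation: raw substring counting for occurrences, a dot product of the two domain-probability vectors (the probability both fall in the same domain if they were independent), an earlier-is-better position score, and a count of sentences where the entity co-occurs with a descriptive word. How the features are weighted or combined is left open.

```python
def occurrence_count(entity: str, text: str) -> int:
    # Non-overlapping occurrences of the entity string in the text.
    return text.count(entity)

def domain_relevance(text_domains: list[float], entity_domains: list[float]) -> float:
    # Both vectors hold per-domain probabilities in the same domain order.
    # Their dot product is the chance both land in the same domain, assuming independence.
    return sum(p * q for p, q in zip(text_domains, entity_domains))

def position_score(entity: str, text: str) -> float:
    # Earlier first mention -> score closer to 1.0; 0.0 if the entity is absent.
    pos = text.find(entity)
    if not entity or pos < 0:
        return 0.0
    return 1.0 - pos / len(text)

def sentence_cooccurrence(entity: str, description_words: list[str],
                          sentences: list[str]) -> int:
    # Number of sentences containing the entity together with at least one descriptive word.
    return sum(1 for s in sentences
               if entity in s and any(w in s for w in description_words))

text = "Tencent released WeChat. WeChat grew fast."
print(occurrence_count("WeChat", text))  # 2
print(position_score("Tencent", text))   # 1.0
```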
8. An entity determining apparatus, comprising:
an extraction module, configured to extract a target sentence from target text information;
a first acquisition module, configured to acquire a candidate entity set, wherein the candidate entity set comprises a plurality of candidate entities;
a second acquisition module, configured to acquire a relevance between each of the plurality of candidate entities and the target sentence;
a determining module, configured to determine a target entity of the target text information according to the plurality of acquired relevances, wherein the relevance between the target entity and the target sentence is greater than the relevances between the other candidate entities of the plurality of candidate entities and the target sentence;
a third acquisition module, configured to acquire a relevance between the target entity and the target text information according to at least one of the target text information, the target entity, and descriptive text information of the target entity;
the apparatus further comprising a descriptive text acquisition module, configured to perform at least one of:
acquiring a historical search record of the target entity, wherein the historical search record comprises search results obtained by searching with the target entity as a keyword, and acquiring the descriptive text information according to the historical search record;
acquiring a historical access record of the target entity, wherein the historical access record comprises search results on which an access operation was performed after searching with the target entity as a keyword, and acquiring the descriptive text information according to the historical access record;
acquiring an association graph comprising a plurality of entities, wherein any two associated entities in the association graph are connected, acquiring at least one entity connected to the target entity according to the association graph, and using the at least one entity as the descriptive text information of the target entity;
and acquiring a notification message about the target entity issued by a publisher, and acquiring the descriptive text information according to the notification message.
9. The apparatus of claim 8, wherein the second acquisition module comprises:
a first acquisition unit, configured to acquire a first vector of the target sentence;
and a second acquisition unit, configured to, for each candidate entity, acquire a second vector of the candidate entity and determine the relevance between the second vector and the first vector as the relevance between the candidate entity and the target sentence.
10. The apparatus of claim 8, wherein the determining module comprises:
a selecting unit, configured to select a preset number of relevances according to the ordering of the plurality of relevances, wherein the selected relevances are greater than the other relevances among the plurality of relevances, and determine the candidate entities corresponding to the selected relevances as target entities;
the selecting unit being further configured to select, from the plurality of relevances, the relevances greater than a preset threshold, and determine the candidate entities corresponding to the selected relevances as target entities.
11. The apparatus of claim 8, further comprising:
a replacing module, configured to, if a plurality of target entities of the target text information are determined and the plurality of target entities include a plurality of designated entities having the same meaning, replace the plurality of designated entities with one entity having the same meaning as the plurality of designated entities.
12. The apparatus of claim 11, wherein the replacing module comprises:
an acquisition unit, configured to acquire a preset association relationship, wherein the preset association relationship comprises at least one association entry, and each association entry comprises a plurality of entities having the same meaning;
a query unit, configured to query the preset association relationship for each of the plurality of target entities to obtain the association entry of each target entity;
and a replacing unit, configured to replace the plurality of designated entities with any one entity in the association entry when a plurality of designated entities among the plurality of target entities belong to the same association entry.
13. The apparatus of claim 8, further comprising:
a word segmentation module, configured to perform word segmentation on the target text information to obtain at least one word in the target text information;
the determining module being further configured to determine each of the at least one word that is included in the candidate entity set as a target entity of the target text information.
14. The apparatus of claim 8, wherein the third acquisition module is configured to perform at least one of:
acquiring the relevance between the target entity and the target text information according to the number of occurrences of the target entity in the target text information;
acquiring the number of words shared by the target text information and the descriptive text information, acquiring the total number of words in the target text information and the descriptive text information, and acquiring the relevance between the target entity and the target text information according to the number of shared words and the total number of words;
acquiring the relevance between the target entity and the target text information according to a third vector of the target text information and a fourth vector of the target entity;
acquiring a first domain vector of the target text information and a second domain vector of the target entity, wherein the elements of the first domain vector respectively measure the probabilities that the target text information belongs to the domains corresponding to those elements, and the elements of the second domain vector respectively measure the probabilities that the target entity belongs to the domains corresponding to those elements, and acquiring the relevance between the target entity and the target text information according to the first domain vector and the second domain vector;
extracting a plurality of sentences from the target text information, and acquiring the relevance between the target entity and the target text information according to the number of occurrences of the target entity and of the descriptive text information in each of the plurality of sentences;
acquiring the relevance between the target entity and the target text information according to a value score of the target entity, wherein the value score is determined by at least one of the number of resources, the number of interactions, and the number of occurrences of the target entity;
and acquiring the relevance between the target entity and the target text information according to the position of the target entity in the target text information.
15. An entity determining apparatus, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the operations performed in the entity determining method according to any one of claims 1 to 7.
16. A computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the operations performed in the entity determining method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910177268.5A CN109918669B (en) 2019-03-08 2019-03-08 Entity determining method, device and storage medium

Publications (2)

Publication Number Publication Date
CN109918669A (en) 2019-06-21
CN109918669B (en) 2023-08-08

Family

ID=66963981

Country Status (1)

Country Link
CN (1) CN109918669B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241839B (en) * 2020-01-16 2022-04-05 Tencent Technology (Shenzhen) Co., Ltd. Entity identification method, entity identification device, computer readable storage medium and computer equipment
CN113761214A (en) * 2020-06-05 2021-12-07 Zhihuiya Information Technology (Suzhou) Co., Ltd. Information flow extraction method, device and equipment
CN111930722A (en) * 2020-09-21 2020-11-13 Beijing Didi Infinity Technology and Development Co., Ltd. Heterogeneous information network processing method, heterogeneous information network processing device, server and readable storage medium
CN112507198B (en) * 2020-12-18 2022-09-23 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device, medium, and program for processing query text
CN112836513A (en) * 2021-02-20 2021-05-25 Glodon Company Limited Linking method, device and equipment of named entities and readable storage medium
CN112989165B (en) * 2021-03-26 2022-07-01 Zhejiang Youshu Shuzhi Technology Co., Ltd. Method for calculating public opinion entity relevance
CN112907301B (en) * 2021-03-29 2022-06-14 Harbin Institute of Technology Bi-LSTM-CRF model-based content-related advertisement delivery method and system
CN113111648B (en) * 2021-04-06 2022-09-09 Beijing Zitiao Network Technology Co., Ltd. Information processing method and device, terminal and storage medium
CN113486373A (en) * 2021-07-13 2021-10-08 Suzhou Yiwo Zhikong Technology Co., Ltd. eCTD universal technology document submission management method and system
CN113657100B (en) * 2021-07-20 2023-12-15 Beijing Baidu Netcom Science and Technology Co., Ltd. Entity identification method, entity identification device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
EP2605150A1 (en) * 2011-12-16 2013-06-19 Presans Method for identifying the named entity that corresponds to an owner of a web page
CN103748582A (en) * 2011-08-24 2014-04-23 International Business Machines Corporation Entity Resolution based on relationships to common entity
CN108197137A (en) * 2017-11-20 2018-06-22 Guangzhou Shiyuan Electronic Technology Co., Ltd. Text processing method, device, storage medium, processor and terminal
CN109241294A (en) * 2018-08-29 2019-01-18 Guoxin Youyi Data Co., Ltd. Entity linking method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9189473B2 (en) * 2012-05-18 2015-11-17 Xerox Corporation System and method for resolving entity coreference
US10592807B2 (en) * 2016-09-01 2020-03-17 Facebook, Inc. Systems and methods for recommending content items

Similar Documents

Publication Publication Date Title
CN109918669B (en) Entity determining method, device and storage medium
CN108304441B (en) Network resource recommendation method and device, electronic equipment, server and storage medium
CN110471858B (en) Application program testing method, device and storage medium
CN110458360B (en) Method, device, equipment and storage medium for predicting hot resources
CN111291200B (en) Multimedia resource display method and device, computer equipment and storage medium
CN111339737B (en) Entity linking method, device, equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN112269853A (en) Search processing method, search processing device and storage medium
CN113032587B (en) Multimedia information recommendation method, system, device, terminal and server
CN110929159B (en) Resource release method, device, equipment and medium
CN113569042A (en) Text information classification method and device, computer equipment and storage medium
CN108416026B (en) Index generation method, content search method, device and equipment
WO2021218634A1 (en) Content pushing
CN110837557B (en) Abstract generation method, device, equipment and medium
CN112818080A (en) Search method, device, equipment and storage medium
CN110929137B (en) Article recommendation method, device, equipment and storage medium
CN111553163A (en) Text relevance determining method and device, storage medium and electronic equipment
CN113222771B (en) Method and device for determining target group based on knowledge graph and electronic equipment
CN111259252B (en) User identification recognition method and device, computer equipment and storage medium
CN111597823B (en) Method, device, equipment and storage medium for extracting center word
CN111414496B (en) Artificial intelligence-based multimedia file detection method and device
CN109635153B (en) Migration path generation method, device and storage medium
CN111259161B (en) Ontology establishing method and device and storage medium
CN113783909B (en) Data demand generation method, device, terminal, server and storage medium
CN112925963B (en) Data recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant