CN114201598A - Text recommendation method and text recommendation device - Google Patents

Info

Publication number
CN114201598A
Authority
CN
China
Prior art keywords
text
candidate
entity
entities
relationship
Prior art date
Legal status
Granted
Application number
CN202210148616.8A
Other languages
Chinese (zh)
Other versions
CN114201598B (en)
Inventor
丁红霞
余志颖
伍星
吴忠毅
徐更惟
李靖
李琪
廖宛玲
Current Assignee
Jingwei Jingwei Information Technology Beijing Co ltd
Original Assignee
Jingwei Jingwei Information Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Jingwei Jingwei Information Technology Beijing Co ltd
Priority to CN202210148616.8A
Publication of CN114201598A
Application granted
Publication of CN114201598B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the disclosure provide a text recommendation method and a text recommendation device. The text recommendation method comprises the following steps: acquiring a plurality of reference entities included in a reference text and reference entity relationships among the plurality of reference entities; establishing a reference relationship graph for the reference text according to the plurality of reference entities and the reference entity relationships; acquiring a plurality of texts to be recommended; taking each of the plurality of texts in turn as a candidate text and performing the following operations on it: acquiring a plurality of candidate entities included in the candidate text and candidate entity relationships among the plurality of candidate entities; establishing a candidate relationship graph for the candidate text according to the candidate entities and the candidate entity relationships; determining a common subgraph of the reference relationship graph and the candidate relationship graph; and calculating the similarity between the candidate text and the reference text according to the frequencies with which the entities and entity relationships in the common subgraph occur in the reference text and in the candidate text, respectively; and recommending texts according to the similarities calculated for the plurality of texts.

Description

Text recommendation method and text recommendation device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a text recommendation method and a text recommendation device.
Background
With the development of internet technology, the number of texts available for online reading is growing explosively, and people read more online every day. Most users now read in fragmented spare moments, and finding content of interest in a huge volume of text is time-consuming. Intelligently recommending content that a user may be interested in can therefore save the user retrieval time and improve the user experience.
Disclosure of Invention
Embodiments described herein provide a text recommendation method, a text recommendation apparatus, and a computer-readable storage medium storing a computer program.
According to a first aspect of the present disclosure, a text recommendation method is provided. The text recommendation method comprises the following steps: acquiring a plurality of reference entities included in a reference text and reference entity relationships among the plurality of reference entities; establishing a reference relationship graph for the reference text according to the plurality of reference entities and the reference entity relationships; acquiring a plurality of texts to be recommended; taking each of the plurality of texts in turn as a candidate text and performing the following operations on it: acquiring a plurality of candidate entities included in the candidate text and candidate entity relationships among the plurality of candidate entities; establishing a candidate relationship graph for the candidate text according to the candidate entities and the candidate entity relationships; determining a common subgraph of the reference relationship graph and the candidate relationship graph; and calculating the similarity between the candidate text and the reference text according to the frequencies with which the entities and entity relationships in the common subgraph occur in the reference text and in the candidate text, respectively; and recommending one or more of the plurality of texts according to the similarities calculated for the plurality of texts.
In some embodiments of the present disclosure, the similarity of the candidate text to the reference text is calculated as:
[Formula: similarity S (image not reproduced)]
where S represents the similarity between the candidate text and the reference text, and the other symbols in the formula represent, respectively:
the frequency with which the i-th entity in the common subgraph appears in the reference text;
the frequency with which the i-th entity in the common subgraph appears in the candidate text;
the total frequency with which all reference entities in the reference relationship graph appear in the reference text;
the total frequency with which all candidate entities in the candidate relationship graph appear in the candidate text;
the frequency with which the i-th entity relationship in the common subgraph appears in the reference text;
the frequency with which the i-th entity relationship in the common subgraph appears in the candidate text;
the total frequency with which all reference entity relationships in the reference relationship graph appear in the reference text;
the total frequency with which all candidate entity relationships in the candidate relationship graph appear in the candidate text;
n represents the number of entities in the common subgraph, m represents the number of entity relationships in the common subgraph, X represents a first parameter, and Y represents a second parameter.
In some embodiments of the present disclosure, the values of X and Y are determined based on the current user's browsing history.
In some embodiments of the present disclosure, the values of X and Y are determined from the browsing history of other users.
In some embodiments of the present disclosure, the values of X and Y are determined based on the browsing history of the current user and other users.
In some embodiments of the present disclosure, the values of X and Y are updated in real-time according to the current user's browsing history.
In some embodiments of the present disclosure, the values of X and Y are updated at predetermined time intervals based on the current user's browsing history.
In some embodiments of the present disclosure, the values of X and Y are updated in real-time according to the browsing history of other users.
In some embodiments of the present disclosure, the values of X and Y are updated at predetermined time intervals based on the browsing history of other users.
In some embodiments of the present disclosure, the values of X and Y are updated in real-time according to the current user's and other users' browsing histories.
In some embodiments of the present disclosure, the values of X and Y are updated at predetermined time intervals based on the browsing history of the current user and other users.
In some embodiments of the present disclosure, the text is news.
In some embodiments of the present disclosure, the reference text is news that the user is currently browsing.
In some embodiments of the present disclosure, the plurality of texts to be recommended is a plurality of news stored in a database.
In some embodiments of the present disclosure, the reference entity and/or the candidate entity comprise one or more of: drug name, disease name, symptom name, body tissue name, target name, medical regulation name, instrument name, examination name, and organization name.
In some embodiments of the present disclosure, a reference entity relationship comprises a relationship between any two of the reference entities.
In some embodiments of the present disclosure, the candidate entity relationship comprises a relationship between any two of the candidate entities.
In some embodiments of the present disclosure, the text recommendation method further comprises: determining a main word corresponding to the obtained reference entity or the reference entity relationship, wherein the main word is used for representing a uniform name of the obtained reference entity or the reference entity relationship; converting the obtained reference entity or reference entity relationship into a corresponding main word; determining a main word corresponding to the obtained candidate entity or candidate entity relationship, wherein the main word is used for representing a uniform name of the obtained candidate entity or candidate entity relationship; and converting the obtained candidate entity or candidate entity relationship into the corresponding main word.
In some embodiments of the disclosure, obtaining the plurality of reference entities included in the reference text and the reference entity relationships between the plurality of reference entities comprises: labeling entities and entity relationships in the reference text using a predetermined dictionary; identifying entities and entity relationships in the reference text using a bidirectional encoder representation model; quantitatively scoring, using a conditional random field weight network, the entities and entity relationships labeled using the predetermined dictionary and the entities and entity relationships identified using the bidirectional encoder representation model; and determining the plurality of reference entities and the reference entity relationships between them according to the result of the quantitative scoring.
According to a second aspect of the present disclosure, a text recommendation apparatus is provided. The text recommendation apparatus comprises at least one processor and at least one memory storing a computer program. When executed by the at least one processor, the computer program causes the text recommendation apparatus to: acquire a plurality of reference entities included in the reference text and reference entity relationships between the plurality of reference entities; establish a reference relationship graph for the reference text according to the plurality of reference entities and the reference entity relationships; acquire a plurality of texts to be recommended; take each of the plurality of texts in turn as a candidate text and perform the following operations on it: acquire a plurality of candidate entities included in the candidate text and candidate entity relationships among the plurality of candidate entities; establish a candidate relationship graph for the candidate text according to the candidate entities and the candidate entity relationships; determine a common subgraph of the reference relationship graph and the candidate relationship graph; and calculate the similarity between the candidate text and the reference text according to the frequencies with which the entities and entity relationships in the common subgraph occur in the reference text and in the candidate text, respectively; and recommend one or more of the plurality of texts according to the similarities calculated for the plurality of texts.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, further causes the text recommendation apparatus to: determine a main word corresponding to the obtained reference entity or reference entity relationship, the main word representing a uniform name for the obtained reference entity or reference entity relationship; convert the obtained reference entity or reference entity relationship into the corresponding main word; determine a main word corresponding to the obtained candidate entity or candidate entity relationship, the main word representing a uniform name for the obtained candidate entity or candidate entity relationship; and convert the obtained candidate entity or candidate entity relationship into the corresponding main word.
In some embodiments of the disclosure, the computer program, when executed by the at least one processor, causes the text recommendation apparatus to obtain the plurality of reference entities included in the reference text and the reference entity relationships between them by: labeling entities and entity relationships in the reference text using a predetermined dictionary; identifying entities and entity relationships in the reference text using a bidirectional encoder representation model; quantitatively scoring, using a conditional random field weight network, the entities and entity relationships labeled using the predetermined dictionary and the entities and entity relationships identified using the bidirectional encoder representation model; and determining the plurality of reference entities and the reference entity relationships between them according to the result of the quantitative scoring.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the text recommendation method according to the first aspect of the present disclosure.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly described below. It should be understood that the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure, wherein:
FIG. 1 is an exemplary flow diagram of a text recommendation method according to an embodiment of the present disclosure;
FIG. 2 is an exemplary diagram of a reference relationship graph for reference text;
FIG. 3 is an exemplary diagram of a candidate relationship graph for candidate text;
FIG. 4 is an exemplary diagram of the reference relationship graph of FIG. 2 after merging with the candidate relationship graph of FIG. 3; and
FIG. 5 is a schematic block diagram of a text recommendation device according to an embodiment of the present disclosure.
The elements in the drawings are schematic and not drawn to scale.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described below in detail and completely with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are also within the scope of protection of the disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the presently disclosed subject matter belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the specification and relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. Terms such as "first" and "second" are used only to distinguish one element (or a portion of an element) from another element (or another portion of an element).
As described above, the number of texts available for online reading has grown explosively; in particular, news information texts (simply "news") are released almost continuously. As technology develops, news is published in increasingly diverse forms and read in increasingly varied ways. There is therefore a real and widespread need, in today's "fast food" reading culture, to recommend to users texts (particularly news) that may interest them.
Some news recommendation approaches rely on entity tags. Entity tags alone cannot intuitively represent the events that a news item describes, which limits the quality of recommendations based on them.
The embodiments of the disclosure provide a text recommendation method. FIG. 1 illustrates an exemplary flow diagram of a text recommendation method 100 according to an embodiment of the disclosure. In some embodiments of the present disclosure, the text comes from, for example, a microblog, a WeChat official account, an electronic newspaper, an electronic magazine, a forum, or web page information. In one example, the text is news. The text recommendation process is described below with reference to FIG. 1.
At block S102, a plurality of reference entities included in the reference text and the reference entity relationships between them are obtained. In some embodiments of the present disclosure, the reference text may be, for example, the news that the user is currently browsing. If that news is medical news, the reference entities may be, for example, drug names, disease names, symptom names, body tissue names, target names, medical regulation names, instrument names, examination names, and organization names. A reference entity relationship may be a relationship between any two reference entities, e.g., a drug treating a disease, a disease being diagnosed as a symptom, or a body tissue exhibiting a symptom.
In some embodiments of the present disclosure, a domain knowledge base may be established. The domain knowledge base is mainly used to interpret and extract data acquired from targeted data sources. The purpose of interpretation and extraction is to extract the relationships between entities in a text from keywords and from the descriptions of relationships in the text, thereby forming a relationship network diagram (also called a relationship graph) of the text. The domain knowledge base has two main parts: named entities and semantic relationships. Semantic relationships may include hierarchical relationships, associative relationships, and the like.
In some embodiments of the present disclosure, all specified entities (e.g., drug name entities) in the reference text may be extracted first. Then, taking each specified entity in turn as a condition, a pre-trained prediction model is used to predict relation triples. A relation triple may include, for example, the specified entity, the relationship, and the associated entity.
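The per-entity conditioning loop can be sketched as follows. This is an illustrative sketch only: the predictor is a hypothetical callable standing in for the pre-trained prediction model, whose architecture is not specified here, and `RelationTriple`, `extract_triples`, and `toy_predict` are assumed names.

```python
from typing import Callable, Iterable, NamedTuple


class RelationTriple(NamedTuple):
    """A relation triple: (specified entity, relationship, associated entity)."""
    head: str
    relation: str
    tail: str


# Hypothetical model interface: given the text and one specified entity,
# return the (relationship, associated entity) pairs predicted for that entity.
Predictor = Callable[[str, str], Iterable[tuple[str, str]]]


def extract_triples(text: str, specified_entities: list[str],
                    predict: Predictor) -> list[RelationTriple]:
    """Condition on each specified entity in turn and collect the predicted triples."""
    triples: list[RelationTriple] = []
    for entity in specified_entities:
        for relation, tail in predict(text, entity):
            triples.append(RelationTriple(entity, relation, tail))
    return triples


# Toy stand-in for the pre-trained prediction model (illustration only).
def toy_predict(text: str, entity: str) -> list[tuple[str, str]]:
    rules = {"A1": [("treats", "B1")]}
    return rules.get(entity, []) if entity in text else []


print(extract_triples("Drug A1 treats disease B1.", ["A1", "B1"], toy_predict))
```

Keeping the predictor behind a plain callable means a real model can be swapped in without changing the loop.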
In one example, labels and relationships in the text can be extracted and converted based on a professional thesaurus and a manually curated dictionary (put simply, by dictionary matching). Dictionary matching has the advantages that its results are interpretable, it can be maintained manually, and it iterates quickly. Its disadvantages are that it has no disambiguation capability, so it cannot resolve ambiguity, and it easily mismatches entity words.
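A minimal sketch of dictionary matching, assuming a greedy longest-match strategy over a term-to-label dictionary (the actual matching rules and dictionary format are not specified; `dictionary_match` is a hypothetical name):

```python
def dictionary_match(text: str, dictionary: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Label spans by greedy longest match; returns (start, end, surface, label)."""
    terms = sorted(dictionary, key=len, reverse=True)  # try longer terms first
    spans: list[tuple[int, int, str, str]] = []
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                spans.append((i, i + len(term), term, dictionary[term]))
                i += len(term)  # skip past the matched term
                break
        else:
            i += 1  # no term starts here; advance one character
    return spans


lexicon = {"Drug A1": "drug", "disease B1": "disease"}
print(dictionary_match("Drug A1 treats disease B1", lexicon))
# [(0, 7, 'Drug A1', 'drug'), (15, 25, 'disease B1', 'disease')]
```

Because the match is purely lexical, a term such as "A1" would also match inside "A12", which is the kind of mismatch described above.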
In another example, a model prediction approach, such as a Bidirectional Encoder Representations from Transformers (BERT) model, may be used to identify the specified entities and extract entity relationships. The BERT model has some disambiguation capability and can resolve ambiguity. Its disadvantages are that its results are less interpretable, its versions iterate slowly, and it tends to miss entity words.
Weighing the advantages and disadvantages of the two approaches, in some embodiments of the present disclosure the results of dictionary matching are combined with the results of the BERT model: the results of dictionary matching and of model prediction are quantitatively scored using a Conditional Random Field (CRF) weight network to obtain a composite result. In one example, entities and entity relationships in the reference text are labeled using a predetermined dictionary, and a bidirectional encoder representation model is used to identify entities and entity relationships in the reference text. The entities and entity relationships labeled using the predetermined dictionary and those identified using the bidirectional encoder representation model are then quantitatively scored using a conditional random field weight network, and the plurality of reference entities and the reference entity relationships between them are determined from the scoring result. This allows entities and entity relationships to be identified more accurately.
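The CRF weight network is not described in enough detail to reproduce. To show only the idea of scoring the two sources jointly, the sketch below swaps in a plain weighted vote with a threshold; it is explicitly not a CRF, and the weights, threshold, and `combine_scores` name are assumptions.

```python
def combine_scores(dict_hits: set[str], model_hits: dict[str, float],
                   w_dict: float = 0.4, w_model: float = 0.6,
                   threshold: float = 0.5) -> set[str]:
    """Weighted-vote stand-in (NOT a CRF) for fusing dictionary and model results.

    dict_hits: items labeled by dictionary matching (score 1 if present).
    model_hits: item -> model confidence in [0, 1].
    """
    accepted: set[str] = set()
    for item in dict_hits | set(model_hits):
        score = (w_dict * (1.0 if item in dict_hits else 0.0)
                 + w_model * model_hits.get(item, 0.0))
        if score >= threshold:
            accepted.add(item)
    return accepted


print(sorted(combine_scores({"A1", "B1", "C9"}, {"A1": 0.9, "B1": 0.3, "E1": 0.95})))
# ['A1', 'B1', 'E1']
```

Here "C9", found only by the dictionary, is rejected, while "E1", found only by the model with high confidence, is kept, mirroring the complementary strengths described above.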
In some embodiments of the disclosure, a main word corresponding to the obtained reference entity or reference entity relationship may be determined. The main word denotes a uniform name for the reference entity or reference entity relationship; it is, for example, a professional term, and the obtained reference entity may be an alternative name for that term. The obtained reference entity or reference entity relationship may then be converted into its corresponding main word; if it already matches its main word, no conversion is needed. In one example, the correspondence between main words and their alternative names may be stored as a mapping table, and once an alternative name of an entity is obtained, the corresponding main word can be looked up in the table. For example, if the obtained reference entity is one of several alternative names or transliterations of ibrutinib, it may be converted into the main word "Ibrutinib" according to the mapping table. In this way, entities or entity relationships that are essentially the same are given a uniform term, which simplifies subsequent operations.
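The mapping-table lookup can be sketched as a dictionary lookup that falls back to the name itself when it is already uniform. The table entries and the `to_main_word` name are illustrative assumptions, not the actual table.

```python
# Illustrative mapping table: alternative name -> main word (entries are examples only).
MAIN_WORD_TABLE = {
    "依鲁替尼": "Ibrutinib",
    "伊布替尼": "Ibrutinib",
    "ibrutinib": "Ibrutinib",
    "treated": "treats",          # relationships can be normalized the same way
    "is used to treat": "treats",
}


def to_main_word(name: str, table: dict[str, str] = MAIN_WORD_TABLE) -> str:
    """Return the main word for `name`; names already uniform are returned unchanged."""
    return table.get(name, name)


print([to_main_word(n) for n in ["依鲁替尼", "ibrutinib", "Ibrutinib", "treated"]])
# ['Ibrutinib', 'Ibrutinib', 'Ibrutinib', 'treats']
```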
At block S104, a reference relationship graph for the reference text is established from the plurality of reference entities and the reference entity relationships. FIG. 2 shows an exemplary schematic diagram of a reference relationship graph 210 for reference text 201. In the example of FIG. 2, the following reference entities are identified in reference text 201: drug A1, disease B1, disease B2, symptom C1, symptom C2, body tissue D1, and target E1. The following reference entity relationships are identified in reference text 201: drug A1 treats disease B1, drug A1 treats disease B2, disease B2 is diagnosed as symptom C1, and body tissue D1 exhibits symptom C1.
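One simple way to hold such a relationship graph, assumed here for illustration, is a set of entity nodes plus a set of labeled edges (head, relationship, tail). The sketch builds graph 210 from the FIG. 2 example; `RelationGraph`, `build_graph`, and the relationship labels are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RelationGraph:
    """Entities are nodes; each relationship is a labeled edge (head, relation, tail)."""
    entities: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)


def build_graph(entities: list[str],
                relations: list[tuple[str, str, str]]) -> RelationGraph:
    graph = RelationGraph(entities=set(entities))
    for head, relation, tail in relations:
        graph.entities.update((head, tail))  # endpoints of an edge are always nodes
        graph.relations.add((head, relation, tail))
    return graph


# Reference relationship graph 210 from the FIG. 2 example.
reference_graph = build_graph(
    ["A1", "B1", "B2", "C1", "C2", "D1", "E1"],
    [("A1", "treats", "B1"), ("A1", "treats", "B2"),
     ("B2", "diagnosed_as", "C1"), ("D1", "exhibits", "C1")],
)
print(len(reference_graph.entities), len(reference_graph.relations))  # 7 4
```

Isolated entities such as symptom C2 and target E1 remain nodes even though no relationship touches them.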
At block S106, a plurality of texts to be recommended are obtained. In some embodiments of the present disclosure, the texts to be recommended are, for example, texts stored in a database. If the news that the user is currently browsing is medical news, the texts to be recommended are, for example, other medical news stored in the database. In some embodiments of the disclosure, texts having the same label as the reference text can be screened out in advance to serve as the texts to be recommended, which narrows the recommendation scope to a more relevant range and reduces subsequent computation.
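The label pre-screening can be sketched as a filter over database records. This assumes each text carries a list of labels and that "the same label" means sharing at least one label with the reference text; the record layout and `prefilter_by_label` are hypothetical.

```python
def prefilter_by_label(texts: list[dict], reference_labels: set[str]) -> list[dict]:
    """Keep texts sharing at least one label with the reference text (assumed rule)."""
    return [t for t in texts if reference_labels & set(t["labels"])]


database = [
    {"id": 1, "labels": ["oncology", "drug"]},
    {"id": 2, "labels": ["finance"]},
    {"id": 3, "labels": ["drug"]},
]
print([t["id"] for t in prefilter_by_label(database, {"drug"})])  # [1, 3]
```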
At block S108, one of the plurality of texts to be recommended is determined as the candidate text. In some embodiments of the present disclosure, the texts to be recommended may be sorted by the timestamps at which they were generated, and a sorted list may be generated; in one example, the sorted list is in reverse chronological order. One of the texts may then be determined as the candidate text according to the sorted list. In other embodiments of the present disclosure, the sorted list may be generated randomly, and in still other embodiments no sorted list is generated. A text that has been determined as a candidate text may be marked as processed so that it is not selected again in subsequent operations.
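The reverse-chronological sorted list and the "processed" marking can be sketched as follows; the record fields and function names are illustrative assumptions.

```python
from datetime import datetime
from typing import Optional


def sort_by_recency(texts: list[dict]) -> list[dict]:
    """Sorted list in reverse chronological order of the generation timestamp."""
    return sorted(texts, key=lambda t: t["timestamp"], reverse=True)


def next_candidate(sorted_texts: list[dict], processed: set[int]) -> Optional[dict]:
    """Return the first unprocessed text and mark it processed; None when exhausted."""
    for text in sorted_texts:
        if text["id"] not in processed:
            processed.add(text["id"])
            return text
    return None


texts = [
    {"id": 1, "timestamp": datetime(2022, 1, 5)},
    {"id": 2, "timestamp": datetime(2022, 2, 1)},
    {"id": 3, "timestamp": datetime(2022, 1, 20)},
]
ordered = sort_by_recency(texts)
processed: set[int] = set()
order = []
while (candidate := next_candidate(ordered, processed)) is not None:
    order.append(candidate["id"])
print(order)  # [2, 3, 1]
```

Each text is visited exactly once because it is added to `processed` the moment it becomes the candidate.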
At block S110, a plurality of candidate entities included in the candidate text and the candidate entity relationships between them are obtained. If the candidate text is medical news stored in a database, the candidate entities may be, for example, drug names, disease names, symptom names, body tissue names, target names, medical regulation names, instrument names, examination names, and organization names. A candidate entity relationship may be a relationship between any two candidate entities, e.g., a drug treating a disease, a disease being diagnosed as a symptom, or a body tissue exhibiting a symptom.
In some embodiments of the present disclosure, the candidate entities and candidate entity relationships may be obtained in the same manner as the reference entities and reference entity relationships at block S102, so the details are not repeated here.
At block S112, a candidate relationship graph for the candidate text is established from the plurality of candidate entities and the candidate entity relationships. In some embodiments of the present disclosure, the candidate relationship graph may be established in the same manner as the reference relationship graph at block S104. FIG. 3 shows an exemplary schematic diagram of a candidate relationship graph 310 for candidate text 301. In the example of FIG. 3, the following candidate entities are identified in candidate text 301: drug A1, disease B1, disease B3, body tissue D2, and target E1. The following candidate entity relationship is identified in candidate text 301: drug A1 treats disease B1.
At block S114, a common subgraph of the reference relationship graph and the candidate relationship graph is determined. In some embodiments of the present disclosure, identical entities in the reference relationship graph and the candidate relationship graph may be merged; correspondingly, identical entity relationships are also merged. The merged entities and entity relationships form the common subgraph of the two graphs. FIG. 4 shows an exemplary schematic diagram of the reference relationship graph 210 of FIG. 2 merged with the candidate relationship graph 310 of FIG. 3. In the example of FIG. 4, the common subgraph 410 of the reference relationship graph and the candidate relationship graph is indicated by a dashed box. The common subgraph 410 includes the entities drug A1, disease B1, and target E1, and the entity relationship drug A1 treats disease B1.
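Assuming entities and relationships have already been normalized to main words, so that identical items carry identical labels, merging identical entities and relationships reduces to set intersection. The sketch reproduces common subgraph 410 from the FIG. 2 and FIG. 3 examples; `common_subgraph` is a hypothetical name.

```python
def common_subgraph(ref_entities: set[str], ref_relations: set[tuple[str, str, str]],
                    cand_entities: set[str], cand_relations: set[tuple[str, str, str]]):
    """Merge identical entities and identical relationships via set intersection."""
    return ref_entities & cand_entities, ref_relations & cand_relations


ref_entities = {"A1", "B1", "B2", "C1", "C2", "D1", "E1"}        # FIG. 2
ref_relations = {("A1", "treats", "B1"), ("A1", "treats", "B2"),
                 ("B2", "diagnosed_as", "C1"), ("D1", "exhibits", "C1")}
cand_entities = {"A1", "B1", "B3", "D2", "E1"}                   # FIG. 3
cand_relations = {("A1", "treats", "B1")}

entities, relations = common_subgraph(ref_entities, ref_relations,
                                      cand_entities, cand_relations)
print(sorted(entities), relations)  # ['A1', 'B1', 'E1'] {('A1', 'treats', 'B1')}
```

Under this labeling assumption no general (and computationally expensive) maximum common subgraph search is needed; an edge shared by both graphs automatically has both endpoints in the shared entity set.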
At block S116, the similarity between the candidate text and the reference text is calculated according to how often the entities and entity relationships in the common subgraph appear in the reference text and in the candidate text, respectively. In some embodiments of the present disclosure, the following frequencies may be calculated: the frequency with which each entity in the common subgraph appears in the reference text and in the candidate text; the frequency with which each entity relationship in the common subgraph appears in the reference text and in the candidate text; the total frequency with which all entities in the reference relationship graph appear in the reference text, and the total frequency with which all entities in the candidate relationship graph appear in the candidate text; and the total frequency with which all entity relationships in the reference relationship graph appear in the reference text, and the total frequency with which all entity relationships in the candidate relationship graph appear in the candidate text. The similarity between the candidate text and the reference text is then calculated from these frequencies.
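The frequency statistics can be sketched as below. How occurrences are counted is not specified, so two assumptions are made: an entity's frequency is the number of non-overlapping occurrences of its main word in the text, and a relationship's frequency is the number of times it was extracted from the text. The function names are hypothetical.

```python
from collections import Counter


def entity_frequencies(text: str, entities: set[str]) -> dict[str, int]:
    """Count non-overlapping occurrences of each entity's main word in the text."""
    return {e: text.count(e) for e in entities}


def relation_frequencies(mentions: list[tuple[str, str, str]],
                         relations: set[tuple[str, str, str]]) -> dict[tuple[str, str, str], int]:
    """Count how often each relationship was extracted (one list entry per mention)."""
    counts = Counter(mentions)
    return {r: counts[r] for r in relations}


reference_text = "A1 treats B1. A1 is approved. E1 is a target of A1."
per_entity = entity_frequencies(reference_text, {"A1", "B1", "E1"})   # common subgraph
print(per_entity["A1"], per_entity["B1"], per_entity["E1"])           # 3 1 1

# Total over all entities of the (here abbreviated) reference relationship graph.
total = sum(entity_frequencies(reference_text, {"A1", "B1", "B2", "E1"}).values())
print(total)  # 5
```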
In some embodiments of the present disclosure, the similarity S of the candidate text and the reference text may be calculated according to the following formula (1):
[Equation (1): formula image not reproduced]
where S represents the similarity between the candidate text and the reference text, and the remaining symbols of equation (1) represent, respectively:
the frequency with which the i-th entity in the common subgraph appears in the reference text;
the frequency with which the i-th entity in the common subgraph appears in the candidate text;
the total frequency with which all entities in the reference relationship graph appear in the reference text;
the total frequency with which all entities in the candidate relationship graph appear in the candidate text;
the frequency with which the i-th entity relationship in the common subgraph appears in the reference text;
the frequency with which the i-th entity relationship in the common subgraph appears in the candidate text;
the total frequency with which all entity relationships in the reference relationship graph appear in the reference text; and
the total frequency with which all entity relationships in the candidate relationship graph appear in the candidate text.
n represents the number of entities in the common subgraph, m represents the number of entity relationships in the common subgraph, X represents a first parameter, and Y represents a second parameter.
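The exact form of equation (1) is not reproduced in text. The following is therefore only a hypothetical stand-in built from the quantities defined above, and it should not be read as the patent's formula. It assumes a histogram intersection, meaning the sum of element-wise minima, of the normalized entity frequencies weighted by X. It adds the same intersection of the normalized relationship frequencies weighted by Y.

```python
def similarity(entity_ref, entity_cand, entity_ref_total, entity_cand_total,
               relation_ref, relation_cand, relation_ref_total, relation_cand_total,
               x: float, y: float) -> float:
    """Hypothetical stand-in for equation (1): X- and Y-weighted histogram
    intersection of normalized frequencies over the common subgraph."""
    def overlap(ref, cand, ref_total, cand_total):
        if ref_total == 0 or cand_total == 0:
            return 0.0  # nothing to compare; also avoids division by zero
        # Runs over the n entities (or m relationships) of the common subgraph.
        return sum(min(r / ref_total, c / cand_total) for r, c in zip(ref, cand))

    return (x * overlap(entity_ref, entity_cand, entity_ref_total, entity_cand_total)
            + y * overlap(relation_ref, relation_cand, relation_ref_total, relation_cand_total))


# Illustrative numbers only: n = 3 entities, m = 1 relationship.
s = similarity([1, 2, 1], [2, 1, 1], 5, 4, [1], [1], 2, 1, x=0.6, y=0.4)
print(round(s, 2))  # 0.59
```

The common-subgraph counts never exceed the graph totals, so each normalized list sums to at most 1. Each intersection therefore lies in [0, 1], and S lies in [0, X + Y] for non-negative X and Y.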
In some embodiments of the present disclosure, the values of X and Y in equation (1) may be determined according to the browsing history of the current user. In one example, the browsing history of the current user over the past K days (K being a positive integer) may be recorded. The values of X and Y may then be adjusted according to the relevance of the browsing history to the candidate text. In another example, each day's browsing history may be given a different weight. The values of X and Y may then be adjusted according to both the relevance of the browsing history to the candidate text and the corresponding weights.
In some embodiments of the present disclosure, the values of X and Y in equation (1) may be determined according to the browsing histories of other users. In one example, the other users may be users who have browsed the reference text or the candidate text.
In some embodiments of the present disclosure, the values of X and Y in equation (1) may be determined according to the browsing history of the current user and other users. In one example, the browsing history of the current user may be given different weights than the browsing histories of the other users, and the values of X and Y may be determined from a weighted sum of the browsing histories of the current user and the other users.
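The following is one hypothetical way to turn browsing histories into X and Y. It follows the description of per-day weights and of a weighted combination of the current user's and other users' histories. Several details are assumptions introduced for illustration: the `DayRecord` fields, the exponential per-day decay, the 0.7 weight for the current user, and the normalization X + Y = 1.

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    days_ago: int              # 0 = today, up to K - 1
    entity_relevance: float    # hypothetical relevance to the candidate, in [0, 1]
    relation_relevance: float


def weighted_relevance(history: list[DayRecord], decay: float = 0.8) -> tuple[float, float]:
    """Day-weighted mean relevance; more recent days weigh more (assumed decay)."""
    if not history:
        return 0.0, 0.0
    weights = [decay ** rec.days_ago for rec in history]
    total = sum(weights)
    ent = sum(w * rec.entity_relevance for w, rec in zip(weights, history)) / total
    rel = sum(w * rec.relation_relevance for w, rec in zip(weights, history)) / total
    return ent, rel


def choose_xy(current: list[DayRecord], others: list[DayRecord],
              user_weight: float = 0.7) -> tuple[float, float]:
    """Weighted sum of current-user and other-user relevance, normalized so
    that X + Y = 1 (an assumption, not from the source)."""
    ce, cr = weighted_relevance(current)
    oe, orel = weighted_relevance(others)
    ent = user_weight * ce + (1 - user_weight) * oe
    rel = user_weight * cr + (1 - user_weight) * orel
    if ent + rel == 0:
        return 0.5, 0.5  # no signal: weight entities and relationships equally
    return ent / (ent + rel), rel / (ent + rel)


x, y = choose_xy(current=[DayRecord(0, 0.8, 0.2)], others=[])
print(round(x, 2), round(y, 2))  # 0.8 0.2
```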
In some embodiments of the present disclosure, the values of X and Y may be updated in real time if the text recommendation device has sufficient computing power. Alternatively, the values of X and Y may be updated offline at predetermined time intervals, for example, daily.
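The following sketch covers the two update policies, assuming a lazy refresh: X and Y are recomputed on read once they are older than a configurable interval. `XYCache`, `compute_xy`, and the injectable clock are hypothetical names. An interval of 0 approximates real-time updating, and 86,400 seconds approximates daily updating. A true offline update would instead run as a scheduled batch job.

```python
import time
from typing import Callable


class XYCache:
    """Holds (X, Y) and recomputes them when older than `interval` seconds."""

    def __init__(self, compute: Callable[[], tuple[float, float]],
                 interval: float, clock: Callable[[], float] = time.monotonic):
        self._compute = compute
        self._interval = interval
        self._clock = clock
        self._value = compute()      # initial computation
        self._stamp = clock()

    def get(self) -> tuple[float, float]:
        now = self._clock()
        if now - self._stamp >= self._interval:
            self._value = self._compute()   # stale: recompute from history
            self._stamp = now
        return self._value


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
calls = []


def compute_xy() -> tuple[float, float]:
    calls.append(fake_now[0])  # record when a recomputation happened
    return 0.5, 0.5


cache = XYCache(compute_xy, interval=86_400, clock=lambda: fake_now[0])
cache.get()                # still fresh: no recomputation
fake_now[0] = 86_400.0     # one day later
cache.get()                # stale: recomputed
print(len(calls))          # 2
```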
In the example of Figs. 2 to 4, the following quantities can be calculated:
the frequency with which drug A1 appears in the reference text of Fig. 2;
the frequency with which disease B1 appears in the reference text;
the frequency with which target E1 appears in the reference text;
the frequency with which the entity relationship "drug A1 treats disease B1" appears in the reference text;
the frequency with which drug A1 appears in the candidate text of Fig. 3;
the frequency with which disease B1 appears in the candidate text;
the frequency with which target E1 appears in the candidate text;
the frequency with which the entity relationship "drug A1 treats disease B1" appears in the candidate text;
the total frequency with which all reference entities in the reference relationship graph (drug A1, disease B1, disease B2, symptom C1, symptom C2, body tissue D1, and target E1) appear in the reference text;
the total frequency with which all candidate entities in the candidate relationship graph (drug A1, disease B1, disease B3, body tissue D2, and target E1) appear in the candidate text;
the total frequency with which all reference entity relationships in the reference relationship graph (drug A1 treats disease B1, drug A1 treats disease B2, disease B2 diagnosed as symptom C1, and body tissue D1 exhibits symptom C1) appear in the reference text; and
the total frequency with which all candidate entity relationships in the candidate relationship graph (drug A1 treats disease B1) appear in the candidate text.
The number of entities in the common subgraph is n = 3, and the number of entity relationships in the common subgraph is m = 1. The similarity S of the candidate text and the reference text is then calculated according to equation (1).
At block S118, it is determined whether all of the texts to be recommended have been processed. If a sorted list has been generated for the plurality of texts to be recommended, it may be determined whether the last text in the sorted list has been processed. If so, all of the texts to be recommended have been processed. Alternatively, each text may be marked as processed once it has been taken as a candidate text. In that case, it may be determined whether all of the texts are marked as processed, and if so, all of the texts to be recommended have been processed.
If not all of the texts to be recommended have been processed ("no" at block S118), the process returns to block S108, and the next text among the plurality of texts to be recommended is determined as the candidate text. If a sorted list has been generated, the text immediately following the current candidate text in the sorted list may be determined as the new candidate text. If texts are marked as processed once they have been taken as candidate texts, any text not yet marked as processed may be determined as the candidate text.
If all of the texts to be recommended have been processed ("yes" at block S118), then at block S120 one or more of the plurality of texts are recommended according to the similarities calculated for them. In some embodiments of the present disclosure, the plurality of texts may be ranked by the calculated similarity. In some embodiments of the present disclosure, the texts may instead be ranked by a weighted sum of the calculated similarity and one or more other signals. These signals may include the text's posting-time ranking, click-volume ranking, reading-volume ranking, comment-volume ranking, like-volume ranking, and the like.
One or more of the plurality of texts are then recommended in ranked order. For example, higher-ranked texts are displayed before lower-ranked texts. If the number of recommendations is limited, a specified number of the top-ranked texts may be recommended.
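The following is a minimal sketch of block S120. It assumes that each extra signal, for example a click-volume ranking, has already been converted to a per-text score in which higher is better. The signal names and weights below are illustrative assumptions. Without extra signals, the sketch simply ranks by similarity and keeps the top `top_k` texts.

```python
from typing import Optional


def recommend(similarity: dict[str, float],
              signals: Optional[dict[str, dict[str, float]]] = None,
              weights: Optional[dict[str, float]] = None,
              top_k: Optional[int] = None) -> list[str]:
    """Rank text IDs by similarity, or by a weighted sum of similarity and
    other per-text scores, and return the top_k IDs (all if top_k is None)."""
    signals = signals or {}
    weights = weights or {}

    def score(text_id: str) -> float:
        total = weights.get("similarity", 1.0) * similarity[text_id]
        for name, per_text in signals.items():
            total += weights.get(name, 0.0) * per_text.get(text_id, 0.0)
        return total

    ranked = sorted(similarity, key=score, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


sims = {"t1": 0.59, "t2": 0.80, "t3": 0.10}
print(recommend(sims, top_k=2))  # ['t2', 't1']
print(recommend(sims, signals={"clicks": {"t3": 1.0}},
                weights={"similarity": 0.5, "clicks": 0.5}))  # ['t3', 't2', 't1']
```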
In some embodiments of the present disclosure, the operations performed at blocks S102 and S104 may be performed in parallel with the operations performed at blocks S110 and S112. In some embodiments of the present disclosure, the operations performed at blocks S108 to S116 may be performed in parallel for at least two of the texts to be recommended.
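The per-candidate processing in blocks S108 to S116 is independent across texts, so it can be mapped over a worker pool. The following sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor`. `score_one` is a hypothetical function that stands in for the per-candidate steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def score_all(texts: dict[str, str], score_one: Callable[[str], float],
              max_workers: int = 4) -> dict[str, float]:
    """Score several candidate texts concurrently.

    `score_one` (hypothetical) would extract the candidate graph, intersect
    it with the reference graph and return the similarity.
    """
    ids = list(texts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with ids.
        results = pool.map(score_one, (texts[i] for i in ids))
        return dict(zip(ids, results))


texts = {"a": "drug A1 treats disease B1", "b": "target E1"}
print(score_all(texts, lambda t: float(len(t.split()))))  # {'a': 5.0, 'b': 2.0}
```

For CPU-bound pure-Python scoring, `ProcessPoolExecutor` usually scales better because of the global interpreter lock. In that case, `score_one` must be picklable, for example a module-level function rather than a lambda.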
Fig. 5 shows a schematic block diagram of a text recommendation device 500 according to an embodiment of the present disclosure. As shown in Fig. 5, the text recommendation device 500 may include a processor 510 and a memory 520 storing a computer program. The computer program, when executed by the processor 510, causes the text recommendation device 500 to perform the steps of the text recommendation method 100 shown in Fig. 1. In one example, the text recommendation device 500 may be a computer device or a cloud computing node.

The text recommendation device 500 may acquire a plurality of reference entities included in the reference text and the reference entity relationships between the plurality of reference entities. It may establish a reference relationship graph for the reference text according to the plurality of reference entities and the reference entity relationships. It may then obtain a plurality of texts to be recommended. The text recommendation device 500 may take each of the plurality of texts as a candidate text and perform the following operations on the candidate text: obtaining a plurality of candidate entities included in the candidate text and the candidate entity relationships between the plurality of candidate entities; establishing a candidate relationship graph for the candidate text according to the candidate entities and the candidate entity relationships; determining a common subgraph of the reference relationship graph and the candidate relationship graph; and calculating the similarity of the candidate text to the reference text according to how often the entities and entity relationships in the common subgraph appear in the reference text and the candidate text, respectively. Finally, the text recommendation device 500 may recommend one or more of the plurality of texts according to the similarities calculated for the plurality of texts.
In some embodiments of the present disclosure, the text recommendation device 500 may determine a main word corresponding to the obtained reference entity or reference entity relationship and convert the obtained reference entity or reference entity relationship into its corresponding main word.
In some embodiments of the present disclosure, the text recommendation device 500 may determine a main word corresponding to the acquired candidate entity or candidate entity relationship and convert the acquired candidate entity or candidate entity relationship into its corresponding main word.
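Main-word conversion amounts to a lookup from surface forms to a uniform name, applied to entities and relationship names alike. The following is a minimal sketch that assumes a hand-built synonym table. `MAIN_WORDS` and its entries are illustrative and do not come from the source. Unknown terms are passed through unchanged.

```python
# Hypothetical synonym table: lowercase surface form -> main word (uniform name).
MAIN_WORDS = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "heart attack": "myocardial infarction",
    "cures": "treats",
}


def to_main_word(term: str, table: dict[str, str] = MAIN_WORDS) -> str:
    """Return the main word for an entity or relationship name."""
    key = term.strip().lower()
    return table.get(key, term.strip())  # unknown terms keep their original form


def normalize_triple(triple: tuple[str, str, str]) -> tuple[str, ...]:
    """Convert head, relation and tail of a relationship to their main words."""
    return tuple(to_main_word(part) for part in triple)


print(normalize_triple(("ASA", "cures", "heart attack")))
# ('aspirin', 'treats', 'myocardial infarction')
```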
In some embodiments of the present disclosure, the text recommendation device 500 may use a predetermined dictionary to label entities and entity relationships in the reference text. The text recommendation device 500 may employ a bidirectional encoder representation model to identify entities and entity relationships in the reference text. The text recommendation device 500 may use a conditional random field weight network to quantitatively score both sets of entities and entity relationships: those labeled using the predetermined dictionary and those identified using the bidirectional encoder representation model. The text recommendation device 500 may then determine the plurality of reference entities and the reference entity relationships between them based on the results of the quantitative scoring.
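Only the predetermined-dictionary labeling step lends itself to a short sketch. The bidirectional encoder representation model and the conditional random field scoring are not shown. The sketch assumes forward maximum matching, a common dictionary-matching technique that the source does not name. It runs over a hypothetical lexicon that maps surface forms to entity or relationship types.

```python
def dictionary_label(text: str, lexicon: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Forward maximum matching: scan left to right and take the longest
    lexicon entry starting at each position.

    Returns (start, end, surface, type) spans; `lexicon` maps surface -> type.
    """
    max_len = max(map(len, lexicon), default=0)
    spans = []
    i = 0
    while i < len(text):
        # Try the longest possible window first, shrinking one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            surface = text[i:i + length]
            if surface in lexicon:
                spans.append((i, i + length, surface, lexicon[surface]))
                i += length
                break
        else:
            i += 1  # no entry starts here; advance one character
    return spans


lexicon = {"aspirin": "drug", "treats": "relation",
           "myocardial infarction": "disease", "myocardial": "body tissue"}
print(dictionary_label("aspirin treats myocardial infarction", lexicon))
# [(0, 7, 'aspirin', 'drug'), (8, 14, 'treats', 'relation'), (15, 36, 'myocardial infarction', 'disease')]
```

Character-level matching suits unsegmented Chinese text. For English, it can match inside longer words, so a real system would also check word boundaries.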
In an embodiment of the present disclosure, the processor 510 may be, for example, a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), a processor based on a multi-core processor architecture, or the like. The memory 520 may be any type of memory implemented using data storage technology including, but not limited to, random access memory, read only memory, semiconductor-based memory, flash memory, disk memory, and the like.
Furthermore, in the embodiment of the present disclosure, the text recommendation device 500 may also include an input device 530, such as a keyboard, a mouse, or a touch screen, for obtaining the reference text. In addition, the text recommendation device 500 may further include an output device 540, such as a display, for outputting the recommended texts.
In other embodiments of the present disclosure, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program, when executed by a processor, is capable of implementing the steps of the method as shown in fig. 1.
In summary, the embodiments of the present disclosure perform named entity recognition on the text and analyze it against a domain knowledge base. As a result, the text can be fully interpreted and its entities organized into triples, from which a relationship graph of the text is generated using knowledge graph techniques. Introducing entity relationships into the calculation of text similarity improves the accuracy of similar-text recommendation and thus the user's satisfaction with recommended content.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As used herein and in the appended claims, the singular forms of words include the plural and vice versa, unless the context clearly dictates otherwise. Thus, when reference is made to the singular, it is generally intended to include the plural of the corresponding term. Similarly, the terms "comprising" and "including" are to be construed as inclusive rather than exclusive. Likewise, the terms "include" and "or" should be construed as inclusive unless such an interpretation is explicitly prohibited herein. Where the term "example" is used herein, particularly when it follows a set of terms, it is merely exemplary and illustrative and should not be considered exclusive or exhaustive.
Further aspects and ranges of adaptability will become apparent from the description provided herein. It should be understood that various aspects of the present application may be implemented alone or in combination with one or more other aspects. It should also be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
Several embodiments of the present disclosure have been described in detail above, but it is apparent that various modifications and variations can be made to the embodiments of the present disclosure by those skilled in the art without departing from the spirit and scope of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A text recommendation method, comprising:
acquiring a plurality of reference entities included in a reference text and reference entity relations among the plurality of reference entities;
establishing a reference relationship graph for the reference text according to the plurality of reference entities and the reference entity relationships;
acquiring a plurality of texts to be recommended;
respectively taking each text in the plurality of texts as a candidate text, and performing the following operations on the candidate text:
obtaining a plurality of candidate entities included in the candidate text and candidate entity relations among the candidate entities;
establishing a candidate relationship graph for the candidate text according to the plurality of candidate entities and the candidate entity relationships;
determining a common subgraph of the reference relationship graph and the candidate relationship graph; and
calculating the similarity of the candidate text and the reference text according to the occurrence frequency of the entities and entity relations in the common subgraph in the reference text and the candidate text respectively; and
recommending one or more of the plurality of texts according to the calculated similarities for the plurality of texts.
2. The text recommendation method of claim 1, wherein the similarity of the candidate text to the reference text is calculated as:
[Formula: image not reproduced]
wherein S represents the similarity of the candidate text and the reference text, and the remaining symbols of the formula represent, respectively:
the frequency of occurrence of the i-th entity in the common subgraph in the reference text;
the frequency of occurrence of the i-th entity in the common subgraph in the candidate text;
the total frequency of occurrence of all reference entities in the reference relationship graph in the reference text;
the total frequency of occurrence of all candidate entities in the candidate relationship graph in the candidate text;
the frequency of occurrence of the i-th entity relationship in the common subgraph in the reference text;
the frequency of occurrence of the i-th entity relationship in the common subgraph in the candidate text;
the total frequency of occurrence of all reference entity relationships in the reference relationship graph in the reference text; and
the total frequency of occurrence of all candidate entity relationships in the candidate relationship graph in the candidate text;
n represents the number of entities in the common subgraph, m represents the number of entity relationships in the common subgraph, X represents a first parameter, and Y represents a second parameter.
3. The text recommendation method of claim 2, wherein the values of X and Y are determined according to browsing history of the current user and/or other users.
4. The text recommendation method of claim 3, wherein the values of X and Y are updated in real time and/or at predetermined time intervals according to a browsing history of the current user and/or other users.
5. The text recommendation method of claim 1,
the text is news, and/or
the reference text is news that the user is currently browsing, and/or
the plurality of texts to be recommended are a plurality of news items stored in a database.
6. The text recommendation method of claim 1,
the reference entity and/or the candidate entity comprise one or more of: drug name, disease name, symptom name, body tissue name, target name, medical regulation name, instrument name, examination name, and organization name, and/or
the reference entity relationships comprise relationships between any two of the reference entities, and/or
the candidate entity relationships comprise relationships between any two of the candidate entities.
7. The text recommendation method of claim 1, further comprising:
determining a main word corresponding to the obtained reference entity or reference entity relationship, wherein the main word is used for representing a uniform name of the obtained reference entity or reference entity relationship;
converting the obtained reference entity or reference entity relationship into a corresponding main word;
determining a main word corresponding to the obtained candidate entity or candidate entity relationship, wherein the main word is used for representing a uniform name of the obtained candidate entity or candidate entity relationship; and
converting the obtained candidate entity or candidate entity relationship into the corresponding main word.
8. The text recommendation method of claim 1, wherein obtaining a plurality of reference entities included in a reference text and reference entity relationships between the plurality of reference entities comprises:
labeling entities and entity relationships in the reference text using a predetermined dictionary;
identifying entities and entity relationships in the reference text using a bi-directional encoder representation model;
quantitatively scoring the entities and entity relationships labeled using the predetermined dictionary and the entities and entity relationships identified using the bi-directional encoder representation model using a conditional random field weight network; and
determining the plurality of reference entities and the reference entity relationships between the plurality of reference entities according to a result of the quantitative scoring.
9. A text recommendation apparatus comprising:
at least one processor; and
at least one memory storing a computer program;
wherein the computer program, when executed by the at least one processor, causes the text recommendation apparatus to perform the steps of the text recommendation method of any of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the text recommendation method according to any one of claims 1 to 8.
CN202210148616.8A 2022-02-18 2022-02-18 Text recommendation method and text recommendation device Active CN114201598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210148616.8A CN114201598B (en) 2022-02-18 2022-02-18 Text recommendation method and text recommendation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210148616.8A CN114201598B (en) 2022-02-18 2022-02-18 Text recommendation method and text recommendation device

Publications (2)

Publication Number Publication Date
CN114201598A true CN114201598A (en) 2022-03-18
CN114201598B CN114201598B (en) 2022-05-17

Family

ID=80645657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210148616.8A Active CN114201598B (en) 2022-02-18 2022-02-18 Text recommendation method and text recommendation device

Country Status (1)

Country Link
CN (1) CN114201598B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164394A (en) * 2012-07-16 2013-06-19 上海大学 Text similarity calculation method based on universal gravitation
US20170076178A1 (en) * 2015-09-14 2017-03-16 International Business Machines Corporation System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection
CN110909153A (en) * 2019-10-22 2020-03-24 中国船舶重工集团公司第七0九研究所 Knowledge graph visualization method based on semantic attention model
CN113743467A (en) * 2021-08-03 2021-12-03 浙江工商大学 Use case graph similarity judgment method based on maximum public subgraph calculation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114647733A (en) * 2022-05-23 2022-06-21 中国平安财产保险股份有限公司 Question and answer corpus evaluation method and device, computer equipment and storage medium
CN114840693A (en) * 2022-07-05 2022-08-02 深圳市拓保软件有限公司 Financial image data searching method and system based on distributed graph database
CN114840693B (en) * 2022-07-05 2022-09-16 深圳市拓保软件有限公司 Financial image data searching method and system based on distributed graph database

Also Published As

Publication number Publication date
CN114201598B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
Hariri et al. Uncertainty in big data analytics: survey, opportunities, and challenges
Hoffart et al. Discovering emerging entities with ambiguous names
Hu et al. Identification of highly-cited papers using topic-model-based and bibliometric features: The consideration of keyword popularity
Stein et al. Intrinsic plagiarism analysis
US20130110839A1 (en) Constructing an analysis of a document
Bisandu et al. Clustering news articles using efficient similarity measure and N-grams
US20160350294A1 (en) Method and system for peer detection
CN114201598B (en) Text recommendation method and text recommendation device
JP5057474B2 (en) Method and system for calculating competition index between objects
CN114238573B (en) Text countercheck sample-based information pushing method and device
KR20070089898A (en) Method and apparatus for evaluating searched contents by using user feedback and providing search result by utilizing evaluation result
CA2956627A1 (en) System and engine for seeded clustering of news events
Al-Obaydy et al. Document classification using term frequency-inverse document frequency and K-means clustering
El-Kishky et al. k NN-Embed: Locally Smoothed Embedding Mixtures for Multi-interest Candidate Retrieval
Zhang et al. Document keyword extraction based on semantic hierarchical graph model
Ebrahimi et al. Developing a prediction model for author collaboration in bioinformatics research using graph mining techniques and big data applications
CA2614774A1 (en) Method and system for automatically extracting data from web sites
Ganguli et al. Nonparametric method of topic identification using granularity concept and graph-based modeling
Venkateswara Rao et al. The societal communication of the Q&A community on topic modeling
Zhang et al. Stock trend forecasting method based on sentiment analysis and system similarity model
Qumsiyeh et al. Enhancing web search by using query-based clusters and multi-document summaries
CN112215006B (en) Organization named entity normalization method and system
Guadie et al. Amharic text summarization for news items posted on social media
CN115098619A (en) Information duplication eliminating method and device, electronic equipment and computer readable storage medium
Nuamah et al. Calculating error bars on inferences from web data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant