CN111930962A - Document data value evaluation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111930962A
Authority
CN
China
Prior art keywords
document data
knowledge graph
value
graph
data
Legal status
Pending
Application number
CN202010912004.2A
Other languages
Chinese (zh)
Inventor
马旋
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202010912004.2A
Publication of CN111930962A

Classifications

    • G06F16/367 Ontology
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F40/295 Named entity recognition
    • G06F2216/03 Data mining
    • G06F2216/11 Patent retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to big data technology and discloses a document data value evaluation method, comprising: acquiring document data from a preset database, and constructing an initial knowledge graph according to the entity information and correlation relationships of an original data set in the document data; performing relationship completion on the initial knowledge graph with a pre-constructed algorithm model to obtain a standard knowledge graph; and scoring the document data with a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result through a display screen of an electronic device. The invention also relates to blockchain technology: the document data can be acquired from a blockchain. The invention further discloses a document data value evaluation device, an electronic device and a computer-readable storage medium. The invention can improve the accuracy of document data value evaluation.

Description

Document data value evaluation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of big data, and in particular to a document data value evaluation method and device, an electronic device, and a computer-readable storage medium.
Background
The value of document data, particularly patents, as an important intangible asset is receiving increasing attention in practice. However, there is currently no unified method for evaluating the value of document data. Traditionally, the value of document data has been evaluated manually, which is strongly influenced by subjective factors. Existing evaluation methods mainly score document data by analyzing the information of the document data itself; this analysis is not comprehensive enough, consumes a large amount of computing resources while computing inefficiently, and easily leads to misestimation of the value of the document data.
Disclosure of Invention
The invention provides a document data value evaluation method and device, an electronic device, and a computer-readable storage medium, whose main aim is to improve the accuracy of document data value evaluation.
To achieve the above object, the present invention provides a document data value evaluation method, executed in an electronic device, which includes:
acquiring document data from a database communicatively connected to the electronic device, and constructing an initial knowledge graph according to the entity information and correlation relationships of an original data set in the document data;
performing relationship completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph;
and scoring the document data by using a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result through a display screen of the electronic device.
Optionally, constructing the initial knowledge graph according to the entity information and correlation relationships of the original data set in the document data includes:
extracting a plurality of keywords from the original data set by natural language processing, and taking the keywords as target entities to obtain the entity information;
analyzing the correlation relationships among the entities contained in the entity information;
and constructing the initial knowledge graph according to the entity information and the correlation relationships.
Optionally, extracting a plurality of keywords from the original data set by natural language processing includes:
performing word segmentation on the text contained in the original data set and removing stop words to obtain a word segmentation result;
and selecting one or more keywords from the word segmentation result.
Optionally, the constructing an initial knowledge graph according to the entity information and the correlation includes:
taking an entity in the entity information as a node of a knowledge graph;
constructing a plurality of triples according to the correlation relationship between every two entities and the attribute of each node;
and visualizing the triples to form an initial knowledge graph.
Optionally, performing relationship completion on the initial knowledge graph by using the pre-constructed algorithm model includes:
converting the entities and correlation relationships in the initial knowledge graph into vectors of the same dimension by using a pre-constructed knowledge representation algorithm, and mapping the vectors into the same vector space;
calculating a distance for the vectors according to a preset distance formula, and judging whether the vectors are similar according to the calculation result;
and when the vectors are similar, completing the correlation relationships among the entities in the initial knowledge graph.
Optionally, judging whether the vectors are similar according to the calculation result includes:
comparing the calculation result with a preset similarity threshold;
if the calculation result is smaller than the preset similarity threshold, determining that the vectors are similar;
and if the calculation result is greater than or equal to the preset similarity threshold, determining that the vectors are not similar.
Optionally, the pre-constructed value analysis model is:
$$PR(p_i) = \frac{1-\alpha}{N} + \alpha \sum_{p_j \in M_{p_i}} \frac{PR(p_j)}{L(p_j)}$$
where p_i and p_j are document data nodes in the standard knowledge graph, M_{p_i} is the set of nodes that point to p_i in the standard knowledge graph, L(p_j) is the number of nodes that p_j points to in the standard knowledge graph, N is the total number of nodes in the standard knowledge graph, α is a damping coefficient, and PR(p_i) and PR(p_j) are the scores of p_i and p_j, respectively.
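To make the value analysis model concrete, here is a hand-worked example on a hypothetical three-node graph (not taken from the source), assuming the standard PageRank form of the model and α = 0.85. Substituting the second equation into the third and then into the first gives a single linear equation in PR(A):

```latex
\begin{aligned}
&\text{Edges: } A \to B,\; A \to C,\; B \to C,\; C \to A;\quad N = 3,\ \alpha = 0.85,\ \tfrac{1-\alpha}{N} = 0.05\\
&PR(A) = 0.05 + 0.85\,PR(C)\\
&PR(B) = 0.05 + 0.85\,\tfrac{PR(A)}{2} = 0.05 + 0.425\,PR(A)\\
&PR(C) = 0.05 + 0.85\left(\tfrac{PR(A)}{2} + PR(B)\right) = 0.0925 + 0.78625\,PR(A)\\
&PR(A) = 0.128625 + 0.6683125\,PR(A) \;\Rightarrow\; PR(A) \approx 0.3878\\
&PR(B) \approx 0.2148,\quad PR(C) \approx 0.3974,\quad PR(A) + PR(B) + PR(C) \approx 1
\end{aligned}
```

Node C scores highest because it receives links from both other nodes, while B, which is linked only by half of A's outgoing weight, scores lowest.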
To solve the above problem, the present invention also provides a document data value evaluation device, including:
a knowledge graph construction module, configured to acquire document data from a preset database and construct an initial knowledge graph according to the entity information and correlation relationships of an original data set in the document data;
a knowledge graph refinement module, configured to perform relationship completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph;
and a scoring module, configured to score the document data by using a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and output the value evaluation result through a display screen of an electronic device.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program instruction; and
a processor executing the computer program instructions stored in the memory to implement the document data value evaluation method according to any one of the above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium storing a computer program which is executed by a processor to implement the document data value evaluation method according to any one of the above.
In the embodiment of the invention, the initial knowledge graph is constructed according to the entity information and correlation relationships of the original data set in the document data, which reduces the interference of subjective human factors and grounds the result in the document data itself, improving its accuracy. Relationship completion is performed on the initial knowledge graph with a pre-constructed algorithm model to obtain a standard knowledge graph, so that the hidden value of the document data can be mined, the utilization of the document data is increased, and the accuracy of value evaluation is improved. The document data are then scored with a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result; ranking and scoring the nodes of the knowledge graph yields an importance score for each document and reduces the occupation and waste of computing resources. Therefore, the document data value evaluation method, device, electronic device and computer-readable storage medium provided by the invention can improve the accuracy of document data value evaluation.
Drawings
FIG. 1 is a schematic flow chart of a document data value evaluation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of an initial knowledge-graph construction method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for refining an initial knowledge-graph according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a method for scoring and ranking the nodes of a knowledge graph according to an embodiment of the present invention;
FIG. 5 is a block diagram of a document data value evaluation apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the internal structure of an electronic device implementing a document data value evaluation method according to an embodiment of the present invention.
The realization of the objects, the functional features and the advantages of the present invention will be further explained with reference to the accompanying drawings in connection with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The execution subject of the document data value evaluation method provided by the embodiments of the application includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method. In other words, the document data value evaluation method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to fig. 1, a flow chart of a document data value evaluation method according to an embodiment of the present invention is schematically shown. In this embodiment, the document data value evaluation method includes:
S1: acquiring document data from a preset database, and constructing an initial knowledge graph according to the entity information and correlation relationships of the original data set in the document data.
Preferably, the document data in the embodiment of the present invention are patent documents, and the original data set includes, but is not limited to, patent information, inventor information, enterprise information, and the like. The document data can be retrieved from the back-end database of a public patent website, or from nodes of a blockchain.
In detail, referring to FIG. 2, constructing the initial knowledge graph according to the entity information and correlation relationships of the original data set in the document data includes:
S10: extracting a plurality of keywords from the original data set by natural language processing, and taking the keywords as target entities to obtain the entity information;
S11: analyzing the correlation relationships among the entities contained in the entity information;
S12: constructing the initial knowledge graph according to the entity information and the correlation relationships.
The entity information includes, but is not limited to, the title, application number, inventors, applicants, patent family and citation fields of a patent document. The correlation relationships include an invention relationship between an inventor and a patent document, an application relationship between an enterprise and a patent document, and family and citation relationships between patent documents. A patent family is a group of patent documents with the same or substantially the same content that have been filed, published or granted by patent offices in different countries or regions on the basis of the same priority document.
In the embodiment of the present invention, analyzing the correlation relationships between the entities contained in the entity information includes, for example: determining the relationship between an inventor and a patent document title (or application number) as an invention relationship; determining the relationship between an applicant and a patent document title (or application number) as an application relationship; and determining the relationship between two patent document titles (or application numbers) as a family relationship or a citation relationship.
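A minimal sketch of this rule-based relation typing, assuming each entity already carries a type label ("inventor", "applicant", "patent") and that family membership and citation facts between two patents are known as flags. The function name, the labels and the flags are hypothetical, not from the source:

```python
from typing import Optional

# Hypothetical (head type, tail type) -> relation type rules from the text.
RELATION_RULES = {
    ("inventor", "patent"): "invention",
    ("applicant", "patent"): "application",
}


def classify_relation(head_type: str, tail_type: str,
                      same_family: bool = False,
                      cites: bool = False) -> Optional[str]:
    """Return the relation type between two typed entities, or None."""
    if head_type == "patent" and tail_type == "patent":
        # Patent-to-patent pairs are either family members or citations.
        if same_family:
            return "family"
        if cites:
            return "citation"
        return None
    return RELATION_RULES.get((head_type, tail_type))


print(classify_relation("inventor", "patent"))             # invention
print(classify_relation("patent", "patent", cites=True))   # citation
```

Keeping the rules in a lookup table makes it easy to add further entity-type pairs without touching the control flow.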
In detail, in the embodiment of the present invention, extracting a plurality of keywords from the original data set by natural language processing includes:
performing word segmentation on the text contained in the original data set, and removing stop words to obtain word segmentation results;
one or more keywords are picked out from the word segmentation result.
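The segmentation and stop-word step can be sketched as follows. This assumes English text split with a simple regular expression and a tiny illustrative stop-word list; Chinese patent text would instead need a dedicated word segmenter, and `segment` and `STOP_WORDS` are hypothetical names:

```python
import re
from typing import List

# Illustrative stop-word list; a real system would load a complete list.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "is", "by", "with"}


def segment(text: str) -> List[str]:
    """Split text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(segment("A method for evaluating the value of document data"))
# ['method', 'evaluating', 'value', 'document', 'data']
```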
In the embodiment of the invention, the one or more keywords can be selected with an existing algorithm such as TextRank or a semantics-based keyword extraction algorithm.
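A compact sketch of TextRank keyword selection over an already-segmented token list, assuming a co-occurrence window of adjacent words, a damping factor of 0.85 and a fixed iteration count (all illustrative choices; `textrank_keywords` is a hypothetical helper, not a library call):

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(tokens: List[str], window: int = 2, top_k: int = 3,
                      damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Return the top_k tokens ranked by a minimal TextRank."""
    # Undirected co-occurrence graph: link words that appear within `window`
    # positions of each other (window=2 links only adjacent words).
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration of the TextRank (PageRank-style) update.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = ["patent", "value", "evaluation", "patent",
          "knowledge", "graph", "patent", "value"]
print(textrank_keywords(tokens))
```

In this toy input "patent" ranks first because it co-occurs with the largest number of distinct words.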
Further, the constructing an initial knowledge graph according to the entity information and the correlation relationship includes:
taking an entity in the entity information as a node of a knowledge graph;
constructing a plurality of triples according to the correlation relationship between every two entities and the attribute of each node;
and visualizing the triples to form an initial knowledge graph.
Specifically, in the embodiment of the present invention, information fusion processing is performed on the entities and the correlation relationships to obtain a plurality of triples, and the initial knowledge graph is composed of these triples. A triple has the form "(A, B, C)", where B is a relationship and A and C are graph nodes; for example, the fact that C is the inventor of patent document A is represented by the triple "(patent document A, inventor, C)".
Preferably, the graph structure of the initial knowledge graph provides the basic data structure on which subsequent graph algorithms operate.
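The triple construction can be sketched as follows, assuming each patent record is a dictionary with hypothetical keys `application_no`, `inventors`, `applicants` and `citations`. The relation labels are illustrative and follow the "(A, B, C)" form described above, with the relationship in the middle position:

```python
from typing import Dict, List, Set, Tuple

# (head node, relationship, tail node): the "(A, B, C)" form with B the relationship.
Triple = Tuple[str, str, str]


def build_triples(records: List[Dict]) -> List[Triple]:
    """Convert simplified patent records into knowledge-graph triples."""
    triples: List[Triple] = []
    for rec in records:
        doc = rec["application_no"]  # the patent document node
        for person in rec.get("inventors", []):
            triples.append((doc, "inventor", person))
        for org in rec.get("applicants", []):
            triples.append((doc, "applicant", org))
        for cited in rec.get("citations", []):
            triples.append((doc, "cites", cited))
    return triples


def graph_nodes(triples: List[Triple]) -> Set[str]:
    """The graph's nodes are all heads and tails of the triples."""
    return {h for h, _, _ in triples} | {t for _, _, t in triples}


records = [{"application_no": "CN-A", "inventors": ["C"],
            "applicants": ["Org X"], "citations": ["CN-B"]}]
triples = build_triples(records)
print(triples)
# [('CN-A', 'inventor', 'C'), ('CN-A', 'applicant', 'Org X'), ('CN-A', 'cites', 'CN-B')]
```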
S2: performing relationship completion on the initial knowledge graph by using the pre-constructed algorithm model to obtain a standard knowledge graph.
Preferably, the entities in the initial knowledge graph have different types and attributes, and the correlation relationships between them are also of different types. However, some relationships that actually exist between entities are not reflected in the initial knowledge graph, so the initial knowledge graph needs to be completed further to find these missing relationships.
In the embodiment of the invention, the initial knowledge graph is completed with a pre-constructed algorithm model that embeds the graph into a continuous vector space while preserving part of the information in the graph.
In detail, referring to FIG. 3, performing relationship completion on the initial knowledge graph by using the pre-constructed algorithm model includes:
S20: converting the entities and correlation relationships in the initial knowledge graph into vectors of the same dimension by using a pre-constructed knowledge representation algorithm, and mapping the vectors into the same vector space;
S21: calculating a distance for the vectors according to a preset distance formula, and judging whether the vectors are similar according to the calculation result;
S22: when the vectors are similar, completing the correlation relationship between the corresponding entities in the initial knowledge graph.
The distance formula is as follows:
$$d(h, t, r) = \lVert h + t - r \rVert$$
where h is the head (front) entity vector of a triple, t is the relationship vector of the triple, r is the tail (rear) entity vector of the triple, and d is the relationship distance between the two entities of the triple; the closer d is to 0, the more similar the translated head vector h + t and the tail vector r are.
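The distance can be computed as below. This is a sketch assuming the TransE-style form d = ||h + t - r|| with the document's naming (h head entity, t relationship, r tail entity) and a choice of L1 or L2 norm; `transe_distance` is a hypothetical helper name:

```python
import math
from typing import Sequence


def transe_distance(h: Sequence[float], t: Sequence[float],
                    r: Sequence[float], norm: int = 2) -> float:
    """d = ||h + t - r||, where h is the head entity vector, t the relationship
    vector and r the tail entity vector (naming as in this document)."""
    residual = [a + b - c for a, b, c in zip(h, t, r)]
    if norm == 1:
        return sum(abs(x) for x in residual)          # L1 norm
    return math.sqrt(sum(x * x for x in residual))    # L2 norm


print(transe_distance([0.5, 1.0], [0.25, -0.5], [0.75, 0.5]))  # 0.0
print(transe_distance([0.5, 1.0], [0.25, -0.5], [0.75, 1.5]))  # 1.0
```

The first call is an exact match (h + t lands on r), the second misses by one unit along the second axis.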
Further, when the vectors are not similar, the correlation relationship between the entities in the initial knowledge graph does not need to be completed.
In detail, judging whether the vectors are similar according to the calculation result includes:
comparing the calculation result with a preset similarity threshold;
if the calculation result is smaller than the preset similarity threshold, determining that the vectors are similar;
and if the calculation result is greater than or equal to the preset similarity threshold, determining that the vectors are not similar.
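Putting the distance and the threshold rule together, relationship completion can be sketched as a brute-force scan over candidate triples. The embeddings, the threshold value and the name `complete_relations` are illustrative assumptions; a real system would likely restrict candidates (for example by entity type) rather than scan every pair:

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Triple = Tuple[str, str, str]


def distance(h: Vector, t: Vector, r: Vector) -> float:
    """L2 distance ||h + t - r|| (document naming: t is the relationship)."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, t, r)))


def complete_relations(entities: Dict[str, Vector],
                       relations: Dict[str, Vector],
                       known: Set[Triple],
                       threshold: float) -> List[Triple]:
    """Propose missing triples whose distance is below `threshold`;
    candidates at or above the threshold are not added."""
    added: List[Triple] = []
    for head, h in entities.items():
        for rel, t in relations.items():
            for tail, r in entities.items():
                if head == tail or (head, rel, tail) in known:
                    continue
                if distance(h, t, r) < threshold:
                    added.append((head, rel, tail))
    return added


entities = {"P1": [0.0, 0.0], "P2": [1.0, 0.0], "P3": [5.0, 5.0]}
relations = {"cites": [1.0, 0.0]}
print(complete_relations(entities, relations, known=set(), threshold=0.5))
# [('P1', 'cites', 'P2')]
```

The triple loop costs O(E² · R) distance evaluations, which is why candidate filtering matters on real graphs.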
Preferably, the pre-constructed knowledge representation algorithm in the embodiment of the present invention may be the publicly available TransE algorithm, which represents entities and relationships as vectors in a low-dimensional dense vector space, making calculation and reasoning over them convenient.
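For context, TransE learns these vectors by making true triples score a smaller distance than corrupted ones. The sketch below shows only the negative sampling and the margin ranking loss, assuming a margin of 1.0 and the document's h/t/r naming; it omits the gradient updates and vector normalization of a full training loop, and all names are hypothetical:

```python
import math
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)
Vectors = Dict[str, List[float]]


def distance(h: List[float], t: List[float], r: List[float]) -> float:
    """||h + t - r||_2 with t the relationship vector (document naming)."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, t, r)))


def corrupt(triple: Triple, entity_names: List[str], rng: random.Random) -> Triple:
    """Negative sample: replace either the head or the tail with another entity."""
    head, rel, tail = triple
    candidates = [e for e in entity_names if e not in (head, tail)]
    replacement = rng.choice(candidates)
    if rng.random() < 0.5:
        return (replacement, rel, tail)
    return (head, rel, replacement)


def margin_loss(pos: Triple, neg: Triple, ent: Vectors, rel: Vectors,
                margin: float = 1.0) -> float:
    """Margin ranking loss max(0, margin + d(pos) - d(neg))."""
    d_pos = distance(ent[pos[0]], rel[pos[1]], ent[pos[2]])
    d_neg = distance(ent[neg[0]], rel[neg[1]], ent[neg[2]])
    return max(0.0, margin + d_pos - d_neg)


ent = {"P1": [0.0, 0.0], "P2": [1.0, 0.0], "P3": [4.0, 0.0], "P4": [1.5, 0.0]}
rel = {"cites": [1.0, 0.0]}
pos = ("P1", "cites", "P2")
print(margin_loss(pos, ("P1", "cites", "P3"), ent, rel))  # 0.0: negative is far enough
print(margin_loss(pos, ("P1", "cites", "P4"), ent, rel))  # 0.5: negative is too close
print(corrupt(pos, list(ent), random.Random(0)))
```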
S3: scoring the document data by using a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result to the user through a display screen of the electronic device.
In detail, referring to FIG. 4, scoring the document data by using the pre-constructed value analysis model according to the standard knowledge graph includes:
S30: scoring each document data node in the standard knowledge graph by using the pre-constructed value analysis model to obtain a scoring result;
S31: sorting according to the scoring result to obtain a sorting result;
S32: analyzing the scoring result and the sorting result to obtain a value evaluation result of the document data.
The pre-constructed value analysis model is as follows:
$$PR(p_i) = \frac{1-\alpha}{N} + \alpha \sum_{p_j \in M_{p_i}} \frac{PR(p_j)}{L(p_j)}$$
where p_i and p_j are document data nodes in the standard knowledge graph, M_{p_i} is the set of nodes that point to p_i in the standard knowledge graph, L(p_j) is the number of nodes that p_j points to in the standard knowledge graph, N is the total number of nodes in the standard knowledge graph, α is a damping coefficient and a preset constant, and PR(p_i) and PR(p_j) are the scores of p_i and p_j, respectively.
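A minimal power-iteration sketch of the value analysis model, assuming the standard PageRank update PR(p_i) = (1 - α)/N + α · Σ PR(p_j)/L(p_j) over the nodes p_j that point to p_i, with α = 0.85 and a fixed iteration count. `pagerank` and the example graph are illustrative; nodes with no outgoing edges simply leak their score in this sketch, which a production implementation would usually redistribute:

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], alpha: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Iterate PR(p_i) = (1 - alpha)/N + alpha * sum(PR(p_j) / L(p_j))."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    # M(p_i): the nodes that point to p_i.
    incoming: Dict[str, List[str]] = {p: [] for p in nodes}
    for src, dsts in out_links.items():
        for dst in dsts:
            incoming[dst].append(src)
    # L(p_j): the number of nodes p_j points to.
    out_degree = {p: len(out_links.get(p, [])) for p in nodes}
    pr = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        pr = {
            p: (1 - alpha) / n
            + alpha * sum(pr[q] / out_degree[q] for q in incoming[p])
            for p in nodes
        }
    return pr


scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(s, 4) for p, s in scores.items()})
# {'A': 0.3878, 'B': 0.2148, 'C': 0.3974}
```

A node that is never a source (out-degree 0) never appears in any `incoming` list, so the division by `out_degree[q]` is always safe.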
Preferably, the embodiment of the invention judges the value of the document data according to relationships such as inventor and citation relationships: the document data are scored and sorted with the preset scoring formula based on the standard knowledge graph to obtain their evaluated value, and the value evaluation result is pushed to a system page shown on the display screen of the electronic device, so that the user can easily understand it.
In the embodiment of the invention, the initial knowledge graph is constructed according to the entity information and correlation relationships of the original data set in the document data, which reduces the interference of subjective human factors and grounds the result in the document data itself, improving its accuracy. Relationship completion is performed on the initial knowledge graph with a pre-constructed algorithm model to obtain a standard knowledge graph, so that the hidden value of the document data can be mined, the utilization of the document data is increased, and the accuracy of value evaluation is improved. The document data are then scored with a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result; ranking and scoring the nodes of the knowledge graph yields an importance score for each document and reduces the occupation and waste of computing resources. Therefore, the document data value evaluation method, device, electronic device and computer-readable storage medium provided by the invention can improve the accuracy of document data value evaluation.
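Steps S30 to S32 can be sketched as follows: keep only the document nodes (inventor and applicant nodes also receive scores but are not the items being valued), sort them by score, and attach ranks for display. The function names, the example scores and the plain-text report format are assumptions for illustration:

```python
from typing import Dict, Iterable, List, Tuple

Ranked = Tuple[int, str, float]  # (rank, document id, score)


def rank_documents(scores: Dict[str, float],
                   document_ids: Iterable[str]) -> List[Ranked]:
    """Sort document nodes by score (descending), breaking ties by id."""
    docs = [(doc, scores[doc]) for doc in document_ids if doc in scores]
    docs.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, doc, score) for rank, (doc, score) in enumerate(docs, start=1)]


def format_report(ranking: List[Ranked]) -> str:
    """Render the ranking as plain text lines for a results page."""
    return "\n".join(f"{rank}. {doc}: {score:.4f}" for rank, doc, score in ranking)


scores = {"CN-A": 0.4, "CN-B": 0.2, "Inventor C": 0.3, "CN-D": 0.2}
ranking = rank_documents(scores, ["CN-A", "CN-B", "CN-D"])
print(format_report(ranking))
# 1. CN-A: 0.4000
# 2. CN-B: 0.2000
# 3. CN-D: 0.2000
```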
FIG. 5 is a functional block diagram of the document data value evaluation device according to the present invention.
The document data value evaluation device 100 according to the present invention may be installed in an electronic device. According to the functions implemented, the document data value evaluation device may include a knowledge graph construction module 101, a knowledge graph refinement module 102 and a scoring module 103. A module in the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by its processor to perform a fixed function.
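One way to picture this modular split in code is to inject each module as a function and chain them. The class, the field names and the toy stand-in functions below are hypothetical and only illustrate how modules 101 to 103 hand data to one another:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class DocumentValueEvaluator:
    """Chains the three modules: construction, refinement, scoring."""
    build_graph: Callable[[List[dict]], List[Triple]]        # module 101
    complete_graph: Callable[[List[Triple]], List[Triple]]   # module 102
    score_graph: Callable[[List[Triple]], Dict[str, float]]  # module 103

    def evaluate(self, records: List[dict]) -> Dict[str, float]:
        initial_graph = self.build_graph(records)
        standard_graph = self.complete_graph(initial_graph)
        return self.score_graph(standard_graph)


# Toy stand-ins for the real modules, just to show the data flow.
evaluator = DocumentValueEvaluator(
    build_graph=lambda recs: [(r["id"], "inventor", r["inventor"]) for r in recs],
    complete_graph=lambda triples: triples,  # no completion in this toy
    score_graph=lambda triples: {head: 1.0 for head, _, _ in triples},
)
print(evaluator.evaluate([{"id": "CN-A", "inventor": "C"}]))  # {'CN-A': 1.0}
```

Injecting the modules as callables keeps each stage independently replaceable, mirroring the device's split into separately executable program segments.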
In the present embodiment, the functions regarding the respective modules/units are as follows:
The knowledge graph construction module 101 is configured to acquire document data from a preset database, and construct an initial knowledge graph according to the entity information and correlation relationships of an original data set in the document data.
Preferably, the document data in the embodiment of the present invention are patent documents, and the original data set includes, but is not limited to, patent information, inventor information, enterprise information, and the like. The document data can be retrieved from the back-end database of a public patent website, or from nodes of a blockchain.
In detail, the knowledge-graph building module 101 is specifically configured to:
extracting a plurality of keywords from the original data set by a natural language processing technology, and taking the keywords as target entities to obtain entity information;
analyzing the correlation among the entities contained in the entity information;
and constructing an initial knowledge graph according to the entity information and the correlation relationship.
The entity information includes, but is not limited to, the title, application number, inventors, applicants, patent family and citation fields of a patent document. The correlation relationships include an invention relationship between an inventor and a patent document, an application relationship between an enterprise and a patent document, and family and citation relationships between patent documents. A patent family is a group of patent documents with the same or substantially the same content that have been filed, published or granted by patent offices in different countries or regions on the basis of the same priority document.
In the embodiment of the present invention, analyzing the correlation relationships between the entities contained in the entity information includes, for example: determining the relationship between an inventor and a patent document title (or application number) as an invention relationship; determining the relationship between an applicant and a patent document title (or application number) as an application relationship; and determining the relationship between two patent document titles (or application numbers) as a family relationship or a citation relationship.
In detail, in the embodiment of the present invention, extracting a plurality of keywords from the original data set by natural language processing includes:
performing word segmentation on the text contained in the original data set, and removing stop words to obtain word segmentation results;
one or more keywords are picked out from the word segmentation result.
In the embodiment of the invention, the one or more keywords can be selected with an existing algorithm such as TextRank or a semantics-based keyword extraction algorithm.
Further, the constructing an initial knowledge graph according to the entity information and the correlation relationship includes:
taking an entity in the entity information as a node of a knowledge graph;
constructing a plurality of triples according to the correlation relationship between every two entities and the attribute of each node;
and visualizing the triples to form an initial knowledge graph.
Specifically, in the embodiment of the present invention, information fusion processing is performed on the entities and the correlation relationships to obtain a plurality of triples, and the initial knowledge graph is composed of these triples. A triple has the form "(A, B, C)", where B is a relationship and A and C are graph nodes; for example, the fact that C is the inventor of patent document A is represented by the triple "(patent document A, inventor, C)".
Preferably, the graph structure of the initial knowledge graph provides the basic data structure on which subsequent graph algorithms operate.
The knowledge graph refinement module 102 is configured to perform relationship completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph.
Preferably, the entities in the initial knowledge graph have different types and attributes, and the correlation relationships between them are also of different types. However, some relationships that actually exist between entities are not reflected in the initial knowledge graph, so the initial knowledge graph needs to be completed further to find these missing relationships.
In the embodiment of the invention, the initial knowledge graph is completed with a pre-constructed algorithm model that embeds the graph into a continuous vector space while preserving part of the information in the graph.
In detail, the knowledge graph refinement module 102 performs relationship completion on the initial knowledge graph through the following steps:
converting the entities and correlation relationships in the initial knowledge graph into vectors of the same dimension by using a pre-constructed knowledge representation algorithm, and mapping the vectors into the same vector space;
calculating a distance for the vectors according to a preset distance formula, and judging whether the vectors are similar according to the calculation result;
and when the vectors are similar, completing the correlation relationships among the entities in the initial knowledge graph.
Wherein the distance formula is as follows:
$$d(h + r, t) = \lVert h + r - t \rVert$$
where $h$ is the head (front) entity vector of a triple, $r$ is the relationship vector of the triple, $t$ is the tail (rear) entity vector of the triple, and $d$ is the relationship distance between the two entities of the triple, measured with the L1 or L2 norm. The closer $d$ is to 0, the closer $h + r$ is to $t$, that is, the higher the similarity between the vectors.
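The distance formula can be checked numerically with a small sketch. It assumes the standard TransE dissimilarity d(h + r, t) = ||h + r − t||, with h the head entity vector, r the relationship vector and t the tail entity vector; the function name `transe_distance` and the toy vectors are hypothetical.

```python
import math


def transe_distance(h, r, t, norm=2):
    """TransE dissimilarity ||h + r - t|| under the L1 or L2 norm."""
    if not (len(h) == len(r) == len(t)):
        raise ValueError("h, r and t must have the same dimension")
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(x) for x in diff)
    if norm == 2:
        return math.sqrt(sum(x * x for x in diff))
    raise ValueError("norm must be 1 or 2")


# Toy 3-dimensional vectors chosen by hand so that h + r lands on t.
h = [0.1, 0.2, 0.0]
r = [0.3, -0.1, 0.5]
t = [0.4, 0.1, 0.5]
print(transe_distance(h, r, t))
```

A distance near zero means that translating the head vector by the relationship vector reaches the tail vector, which is what the formula treats as high similarity.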
Conversely, when the vectors are not similar, the correlation relationship between the entities in the initial knowledge graph does not need to be completed.
In detail, judging whether the vectors are similar according to the calculation result includes:
comparing the calculation result with a preset similarity threshold;
if the calculation result is smaller than the preset similarity threshold, determining that the vectors are similar;
and if the calculation result is greater than or equal to the preset similarity threshold, determining that the vectors are not similar.
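The similarity judgement and the completion step can be combined as follows. This is a minimal sketch, assuming that entity and relationship vectors are already available as dictionaries and that the caller supplies the candidate triples to test; `complete_relations`, the toy vectors and the threshold value 0.1 are all illustrative assumptions, not values from the source.

```python
import math


def l2_distance(h, r, t):
    """TransE-style distance ||h + r - t|| (L2 norm)."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def complete_relations(graph_triples, candidates, ent_vec, rel_vec, threshold):
    """Add candidate triples whose distance is below the similarity threshold."""
    completed = set(graph_triples)
    added = []
    for h, r, t in candidates:
        if (h, r, t) in completed:
            continue  # relation already present; nothing to complete
        d = l2_distance(ent_vec[h], rel_vec[r], ent_vec[t])
        if d < threshold:  # smaller than the threshold: similar, so complete
            completed.add((h, r, t))
            added.append((h, r, t, d))
        # d >= threshold: not similar, the graph is left unchanged
    return completed, added


# Hypothetical 2-dimensional embeddings for illustration.
ent_vec = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0], "E": [1.0, 1.05]}
rel_vec = {"cites": [1.0, 0.0]}
existing = {("A", "cites", "B")}
candidates = [("A", "cites", "B"), ("C", "cites", "E"), ("C", "cites", "B")]
completed, added = complete_relations(existing, candidates, ent_vec, rel_vec, threshold=0.1)
print(added)
```

Here (C, cites, E) is added because C + cites lies about 0.05 from E, while (C, cites, B) is rejected because its distance of 1.0 is not below the threshold.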
Preferably, the pre-constructed knowledge representation algorithm in the embodiment of the present invention may adopt the publicly known TransE algorithm, which represents entities and relationships as vectors in a low-dimensional dense vector space and thereby facilitates calculation and reasoning over them.
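For reference, a heavily simplified TransE training loop is sketched below in pure Python. It follows the general TransE scheme (uniform initialisation in [−6/√k, 6/√k], relation vectors normalized once, entity vectors normalized every epoch, margin-based ranking loss against triples with a corrupted head or tail), but it uses the squared L2 distance to keep the gradients simple and omits mini-batching. The function name `train_transe` and all hyperparameter defaults are placeholders, not values from the source.

```python
import math
import random


def train_transe(triples, dim=8, margin=1.0, lr=0.01, epochs=200, seed=0):
    """Simplified TransE training; returns (entity_vectors, relation_vectors)."""
    rng = random.Random(seed)
    triples = set(triples)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    bound = 6 / math.sqrt(dim)

    def rand_vec():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    ent = {e: rand_vec() for e in entities}
    rel = {r: normalize(rand_vec()) for r in relations}

    def residual(h, r, t):
        # h + r - t; its squared norm is the (squared L2) TransE distance
        return [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]

    def sq_norm(v):
        return sum(x * x for x in v)

    def update(h, r, t, grad, sign):
        # sign=+1 lowers the distance of (h, r, t); sign=-1 raises it
        ent[h] = [x - sign * lr * g for x, g in zip(ent[h], grad)]
        rel[r] = [x - sign * lr * g for x, g in zip(rel[r], grad)]
        ent[t] = [x + sign * lr * g for x, g in zip(ent[t], grad)]

    ordered = sorted(triples)
    for _ in range(epochs):
        for e in entities:
            ent[e] = normalize(ent[e])
        for h, r, t in ordered:
            # Corrupt either the head or the tail with a random entity.
            if rng.random() < 0.5:
                nh, nt = rng.choice(entities), t
            else:
                nh, nt = h, rng.choice(entities)
            if (nh, r, nt) in triples:
                continue  # skip corruptions that are actually true triples
            pos, neg = residual(h, r, t), residual(nh, r, nt)
            if margin + sq_norm(pos) - sq_norm(neg) > 0:
                # Gradient of ||x||^2 with respect to the residual x is 2x.
                update(h, r, t, [2 * x for x in pos], +1)
                update(nh, r, nt, [2 * x for x in neg], -1)
    for e in entities:
        ent[e] = normalize(ent[e])
    return ent, rel


toy = {("A", "cites", "B"), ("B", "cites", "C"), ("A", "inventor", "X")}
ent, rel = train_transe(toy)
print(sorted(ent))
```

Both gradients are computed before either update is applied, so the positive and negative steps are based on the same parameter snapshot. A production system would normally use an existing knowledge-graph embedding library instead of a hand-written loop.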
The scoring module 103 is configured to score the document data according to the standard knowledge graph by using a pre-constructed value analysis model to obtain a value evaluation result of the document data, and output the value evaluation result through a display screen of an electronic device.
In detail, the scoring module 103 is specifically configured to:
scoring each document data node in the standard knowledge graph by using the pre-constructed value analysis model to obtain a scoring result;
sorting the document data nodes according to the scoring result to obtain a ranking result;
and analyzing the scoring result and the ranking result to obtain a value evaluation result of the document data.
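The scoring and ranking steps can be illustrated with a small ranking helper. This sketch assumes the scores of the document data nodes are already available as a dictionary; the function name `rank_documents` and the tie rule (equal scores share a rank) are illustrative choices that the source does not specify.

```python
def rank_documents(scores):
    """Sort document nodes by score (descending) and attach 1-based ranks.

    Equal scores share a rank (standard competition ranking, e.g. 1, 1, 3).
    Names are used as a secondary key so that the order of ties is stable.
    """
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    results = []
    prev_score, prev_rank = None, 0
    for position, (doc, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        results.append({"document": doc, "score": score, "rank": rank})
        prev_score, prev_rank = score, rank
    return results


# Hypothetical scores for three documents.
scores = {"doc A": 0.42, "doc B": 0.31, "doc C": 0.42}
for row in rank_documents(scores):
    print(row)
```

Keeping both the score and the rank in each result row mirrors the final step above, which analyzes the scoring result and the ranking result together.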
Wherein the pre-constructed value analysis model is as follows:
$$PR(p_i) = \frac{1 - \alpha}{N} + \alpha \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
where $p_i$ and $p_j$ are document data nodes in the standard knowledge graph, $M(p_i)$ is the set of nodes in the standard knowledge graph that point to $p_i$, $L(p_j)$ is the number of nodes that $p_j$ points to in the standard knowledge graph, $N$ is the total number of nodes in the standard knowledge graph, $\alpha$ is the damping coefficient (a preset constant), and $PR(p_i)$ and $PR(p_j)$ are the score values of $p_i$ and $p_j$, respectively.
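The value analysis model has the form of the PageRank formula, which is usually solved by power iteration. A minimal sketch, assuming the graph is given as (source, target) edges and that nodes with no out-links (where L(p_j) would be 0) spread their score evenly over all nodes, a common convention the source does not specify; α = 0.85 is a conventional default, not a value stated in the source.

```python
def pagerank(edges, alpha=0.85, tol=1e-12, max_iter=500):
    """Power iteration for PR(p_i) = (1 - alpha)/N + alpha * sum PR(p_j)/L(p_j).

    edges: iterable of (source, target) pairs meaning "source points to target".
    """
    edges = set(edges)
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    if not nodes:
        return {}
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    for u, v in edges:
        out_links[u].append(v)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes is redistributed uniformly.
        dangling = sum(pr[u] for u in nodes if not out_links[u])
        new = {u: (1 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += alpha * pr[u] / len(out_links[u])
        delta = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if delta < tol:
            break
    return pr


# Hypothetical citation edges "citing -> cited": A and B both cite C, C cites A.
scores = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Pointing citation edges from the citing document to the cited document makes frequently cited documents accumulate score; how inventor relations would be turned into edges is left open here, as the source does not say.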
Preferably, the embodiment of the invention judges the value of the document data according to relationships such as inventors and citations: based on the standard knowledge graph, the document data are scored and ranked with the preset scoring formula to obtain their evaluated value, and the value evaluation result is pushed to a system page shown on the display screen of the electronic device, so that the user can readily understand it.
Fig. 6 is a schematic structural diagram of an electronic device for implementing the document data value evaluation method according to the present invention.
The electronic device 1 may include a processor 10, a memory 11 and a bus, and may further include a computer program, such as a document data value evaluation program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, a magnetic disk, or an optical disk. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the document data value evaluation program 12, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the whole electronic device through various interfaces and lines, and executes the functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 11 (for example, the document data value evaluation program) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 6 shows only an electronic device with certain components. It will be understood by a person skilled in the art that the structure shown in Fig. 6 does not constitute a limitation on the electronic device 1, which may comprise fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface) and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may include a display and an input unit (such as a keyboard), and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the electronic device 1 and to display a visualized user interface.
It is to be understood that the embodiments described are for illustrative purposes only and are not to be construed as limiting the scope of the application to the disclosed structure.
The document data value evaluation program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, enable:
acquiring document data from a preset database, and constructing an initial knowledge graph according to the entity information and correlation relationships of an original data set in the document data;
performing relation completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph;
and scoring the document data by using a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result to a user through a display screen of the electronic device.
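The three stages executed by the program can be wired together as a pipeline with pluggable stages. This is a structural sketch only: the class `ValueEvaluationPipeline`, its field names, and the toy stages (a citation-only graph builder, a no-op completion step, and an in-degree scorer standing in for the value analysis model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Ranking = List[Tuple[str, float]]


@dataclass
class ValueEvaluationPipeline:
    build_graph: Callable[[Iterable[dict]], Set[Triple]]    # stage 1: initial graph
    complete_graph: Callable[[Set[Triple]], Set[Triple]]    # stage 2: relation completion
    score_graph: Callable[[Set[Triple]], Dict[str, float]]  # stage 3: value scoring
    display: Callable[[Ranking], None] = print              # output to the user

    def run(self, records: Iterable[dict]) -> Ranking:
        initial = self.build_graph(records)
        standard = self.complete_graph(initial)
        scores = self.score_graph(standard)
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        self.display(ranked)
        return ranked


def build_citation_triples(records):
    """Toy stage 1: one (citing, "cites", cited) triple per citation."""
    return {(rec["id"], "cites", cited) for rec in records for cited in rec["cites"]}


def citation_in_degree(triples):
    """Toy stage 3: score each node by how many times it is cited."""
    scores = {}
    for head, _, tail in triples:
        scores.setdefault(head, 0.0)
        scores[tail] = scores.get(tail, 0.0) + 1.0
    return scores


pipeline = ValueEvaluationPipeline(
    build_graph=build_citation_triples,
    complete_graph=lambda triples: triples,  # placeholder: no completion
    score_graph=citation_in_degree,
)
records = [
    {"id": "A", "cites": ["B", "C"]},
    {"id": "B", "cites": ["C"]},
    {"id": "C", "cites": []},
]
ranked = pipeline.run(records)
```

Injecting each stage as a callable keeps the three steps independent, so a TransE-based completion step or a PageRank-style scorer can replace the toy stages without changing the pipeline itself.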
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive (U-disk), a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims should not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A document data value evaluation method, executed in an electronic device, the method comprising:
acquiring document data from a database communicatively connected to the electronic device, and constructing an initial knowledge graph according to entity information and correlation relationships of an original data set in the document data;
performing relation completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph;
and scoring the document data by using a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result through a display screen of the electronic device.
2. The document data value evaluation method according to claim 1, wherein constructing an initial knowledge graph according to the entity information and correlation relationships of the original data set in the document data comprises:
extracting a plurality of keywords from the original data set by a natural language processing technology, and taking the keywords as target entities to obtain entity information;
analyzing the correlation among the entities contained in the entity information;
and constructing an initial knowledge graph according to the entity information and the correlation relationship.
3. The document data value evaluation method of claim 2, wherein said extracting a plurality of keywords from said raw data set by natural language processing techniques comprises:
performing word segmentation on the text contained in the original data set, and removing stop words to obtain word segmentation results;
selecting one or more keywords from the word segmentation result.
4. The document data value evaluation method according to claim 2, wherein constructing the initial knowledge graph according to the entity information and the correlation relationships comprises:
taking an entity in the entity information as a node of a knowledge graph;
constructing a plurality of triples according to the correlation relationship between every two entities and the attribute of each node;
and visualizing the triples to form an initial knowledge graph.
5. The document data value evaluation method according to claim 1, wherein performing relation completion on the initial knowledge graph by using the pre-constructed algorithm model comprises:
converting the entities and the correlation relationships in the initial knowledge graph into vectors of the same dimension by using a pre-constructed knowledge representation algorithm, and mapping the vectors into the same vector space;
calculating the vectors according to a preset distance formula, and judging whether the vectors have similarity according to the calculation result;
and when the vectors are similar, completing the correlation among the entities in the initial knowledge graph.
6. The document data value evaluation method according to claim 5, wherein judging whether the vectors are similar according to the calculation result comprises:
comparing the calculation result with a preset similarity threshold;
if the calculation result is smaller than the preset similarity threshold, determining that the vectors are similar;
and if the calculation result is greater than or equal to the preset similarity threshold, determining that the vectors are not similar.
7. The document data value evaluation method according to any one of claims 1 to 6, wherein the pre-constructed value analysis model is:
$$PR(p_i) = \frac{1 - \alpha}{N} + \alpha \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
wherein $p_i$ and $p_j$ are document data nodes in the standard knowledge graph, $M(p_i)$ is the set of nodes in the standard knowledge graph that point to $p_i$, $L(p_j)$ is the number of nodes that $p_j$ points to in the standard knowledge graph, $N$ is the total number of nodes in the standard knowledge graph, $\alpha$ is a damping coefficient, and $PR(p_i)$ and $PR(p_j)$ are the score values of $p_i$ and $p_j$, respectively.
8. An apparatus for evaluating value of document data, the apparatus comprising:
the knowledge graph construction module is used for acquiring document data from a preset database and constructing an initial knowledge graph according to entity information and correlation relationships of an original data set in the document data;
the knowledge graph perfecting module is used for performing relationship completion on the initial knowledge graph by using a pre-constructed algorithm model to obtain a standard knowledge graph;
and the scoring module is used for scoring the document data by utilizing a pre-constructed value analysis model according to the standard knowledge graph to obtain a value evaluation result of the document data, and outputting the value evaluation result through a display screen of electronic equipment.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one computer program instruction; and
a processor executing computer program instructions stored in the memory to perform the document data value evaluation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the document data value evaluation method according to any one of claims 1 to 7.
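The keyword-extraction step of claim 3 (word segmentation, stop-word removal, keyword selection) can be sketched as follows. The regex tokenizer is a stand-in for a real word segmenter (Chinese text would need a dedicated segmenter), the stop-word list is a tiny illustrative sample, and selecting keywords by term frequency is an assumption, since the source does not say how keywords are picked; `extract_keywords` is a hypothetical name.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use a curated list).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "is", "by", "on", "with"}


def segment(text):
    """Lower-case word segmentation via a regex (stand-in for a real segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text, top_k=3):
    """Remove stop words, then keep the top_k most frequent remaining words."""
    tokens = [tok for tok in segment(text) if tok not in STOP_WORDS]
    counts = Counter(tokens)
    # Sort by frequency (descending), then alphabetically for stable ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


text = ("A knowledge graph of patent documents: the knowledge graph "
        "links each patent to its inventors.")
print(extract_keywords(text))
```

Term frequency is the simplest selection rule; TF-IDF or graph-based keyword ranking would slot into `extract_keywords` without changing the segmentation and stop-word steps.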
CN202010912004.2A 2020-09-02 2020-09-02 Document data value evaluation method and device, electronic equipment and storage medium Pending CN111930962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010912004.2A CN111930962A (en) 2020-09-02 2020-09-02 Document data value evaluation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010912004.2A CN111930962A (en) 2020-09-02 2020-09-02 Document data value evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111930962A true CN111930962A (en) 2020-11-13

Family

ID=73309045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010912004.2A Pending CN111930962A (en) 2020-09-02 2020-09-02 Document data value evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111930962A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699250A (en) * 2021-01-13 2021-04-23 北京创安恒宇科技有限公司 Knowledge graph construction method and device, readable storage medium and electronic equipment
CN113191870A (en) * 2021-01-19 2021-07-30 迅鳐成都科技有限公司 Intellectual property value evaluation method and system based on block chain
CN113191870B (en) * 2021-01-19 2023-08-08 迅鳐成都科技有限公司 Intellectual property value evaluation method and system based on blockchain
CN112885478B (en) * 2021-01-28 2023-07-07 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic device and storage medium
CN112885478A (en) * 2021-01-28 2021-06-01 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic device, and storage medium
CN113538178A (en) * 2021-06-10 2021-10-22 北京易创新科信息技术有限公司 Intellectual property value evaluation method and device, electronic equipment and readable storage medium
CN113987017A (en) * 2021-10-26 2022-01-28 北京华宜信科技有限公司 Scientific and technological achievement evaluation service network platform construction method based on block chain
CN115617927A (en) * 2022-11-08 2023-01-17 北京数安行科技有限公司 Safety metering method and device for big data value
CN116681056A (en) * 2023-05-24 2023-09-01 人民网股份有限公司 Text value calculation method and device based on value scale
CN116681056B (en) * 2023-05-24 2024-01-26 人民网股份有限公司 Text value calculation method and device based on value scale
CN117891959A (en) * 2024-03-15 2024-04-16 中国标准化研究院 Document metadata storage method and system based on Bayesian network
CN117891959B (en) * 2024-03-15 2024-05-10 中国标准化研究院 Document metadata storage method and system based on Bayesian network
CN118036902A (en) * 2024-04-11 2024-05-14 中国科学院自动化研究所 Knowledge graph-based ocean typical scene evaluation index system construction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111930962A (en) Document data value evaluation method and device, electronic equipment and storage medium
CN112541338A (en) Similar text matching method and device, electronic equipment and computer storage medium
CN112883190A (en) Text classification method and device, electronic equipment and storage medium
CN113449187A (en) Product recommendation method, device and equipment based on double portraits and storage medium
CN112364107A (en) System analysis visualization method and device, electronic equipment and computer readable storage medium
CN112906377A (en) Question answering method and device based on entity limitation, electronic equipment and storage medium
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN111831708A (en) Missing data-based sample analysis method and device, electronic equipment and medium
CN111522782A (en) File data writing method and device and computer readable storage medium
CN113505273A (en) Data sorting method, device, equipment and medium based on repeated data screening
CN112579781A (en) Text classification method and device, electronic equipment and medium
CN115409041B (en) Unstructured data extraction method, device, equipment and storage medium
CN111429085A (en) Contract data generation method and device, electronic equipment and storage medium
CN113705201B (en) Text-based event probability prediction evaluation algorithm, electronic device and storage medium
CN111930961A (en) Competitive relationship analysis method and device, electronic equipment and storage medium
CN111444159B (en) Refined data processing method, device, electronic equipment and storage medium
CN114708073A (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium
CN113343102A (en) Data recommendation method and device based on feature screening, electronic equipment and medium
CN113656586A (en) Emotion classification method and device, electronic equipment and readable storage medium
CN113434660A (en) Product recommendation method, device, equipment and storage medium based on multi-domain classification
CN113204962A (en) Word sense disambiguation method, device, equipment and medium based on graph expansion structure
CN112733537A (en) Text duplicate removal method and device, electronic equipment and computer readable storage medium
CN112287676A (en) New word discovery method, device, electronic equipment and medium
CN111738005A (en) Named entity alignment method and device, electronic equipment and readable storage medium
CN114969385B (en) Knowledge graph optimization method and device based on document attribute assignment entity weight

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination