CN110458466B - Patent estimation method and system based on data mining and heterogeneous knowledge association - Google Patents


Publication number
CN110458466B
Authority
CN
China
Prior art keywords
value
text
heterogeneous
association
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910758922.1A
Other languages
Chinese (zh)
Other versions
CN110458466A (en)
Inventor
刘维东
刘鑫
张程
郭旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University filed Critical Inner Mongolia University
Priority to CN201910758922.1A priority Critical patent/CN110458466B/en
Publication of CN110458466A publication Critical patent/CN110458466A/en
Application granted granted Critical
Publication of CN110458466B publication Critical patent/CN110458466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • G06Q50/184Intellectual property management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of patent value evaluation and discloses a patent valuation method and system based on data mining and heterogeneous knowledge association. A heterogeneous knowledge association network is constructed from the complex association relations between patent texts and market information texts and serves as the network environment for patent valuation. Machine learning and data mining models are then trained in this network environment to measure the patent value accurately. The invention not only extracts value features from the patent itself but also analyzes the association relations to introduce complex external entity features. Associating a large number of patent texts with texts about the complex external market forms a large heterogeneous knowledge association network, and the patent value is measured by training machine learning and data mining models on the extracted internal and external value features.

Description

Patent estimation method and system based on data mining and heterogeneous knowledge association
Technical Field
The invention belongs to the technical field of patent value evaluation, and particularly relates to a patent valuation method and a patent valuation system based on data mining and heterogeneous knowledge association.
Background
The closest prior art at present is as follows.
Common value measurement models currently include citation-count methods, latent graph model methods, deep learning methods and the like. These methods study patent value starting from the value features of the patent itself, or by introducing only simple external features.
(1) From the perspective of the patent itself, current methods treat the patent value as single and deterministic.
For an intangible asset such as a patent, the value changes as external factors change and cannot remain constant; the same patent may show different values in different environments. Existing methods view the patent value from a single, fixed angle, so the value visible from other angles is buried and the patent cannot be valued accurately.
(2) From the perspective of association, current methods treat the patent value as unrelated to external factors and to the market.
In fact, patent value usually stands in heterogeneous relations with the complex features of external entities. The value of an intangible asset such as a patent is inseparable from market trends. For example, the more closely the technology proposed by a patent relates to the current market, the greater the value of the patent. Through these heterogeneous relations, the patent value is linked to the external entities. If these complex heterogeneous relations are not clearly represented and used, the true value of the patent is difficult to mine and the patent value cannot be measured accurately.
In summary, the problem with the prior art is:
current methods cannot clearly represent and use these complex heterogeneous relations, so the true value of the patent is difficult to mine and the patent value cannot be measured accurately.
The difficulties in solving this technical problem are as follows:
Difficulty one: establishing the heterogeneous knowledge association network. A heterogeneous knowledge association network consists of heterogeneous nodes of different types and the relations between them; how to relate nodes of different types to one another is one of the difficulties.
Difficulty two: extracting external entity features. The value of a patent is very closely associated with market conditions, but how to associate a patent with market conditions and how to convert that association into computer-recognizable features is a technical difficulty.
The significance of solving this technical problem:
Competition between countries today has become a contest of knowledge and technology, and the knowledge economy is increasingly important. As representative knowledge outputs, patents are valued by every country. Accurate measurement of patent value is indispensable for converting patents into transactions. Valuing a patent in different markets lets patent users understand the patent better and facilitates such conversion.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a patent valuation method and system based on data mining and heterogeneous knowledge association.
The invention is realized as follows. A patent valuation method based on data mining and heterogeneous knowledge association comprises:
constructing a heterogeneous knowledge association network as the network environment for patent valuation, according to the complex association relations between patent texts and market information texts; and
training machine learning and data mining models in the network environment to measure the patent value accurately.
Further, the patent valuation method based on data mining and heterogeneous knowledge association comprises the following steps:
First, data acquisition and preprocessing: selecting the data sources and extracting information.
Second, establishing the heterogeneous knowledge association network: constructing network nodes, extracting text features, performing association calculation and constructing edges.
Third, extracting the value features of the patent and the associated external entity features, generating the probabilistic graphical model, and calculating the posterior distribution of the patent value.
Further, the first step specifically includes:
1) Selecting data sources: the data sources are patent information from the China National Intellectual Property Administration, annual reports of listed companies from the Juchao Information Network (CNINFO), and commodity information from e-commerce websites.
2) Information extraction: for patent data, the text of the abstract, claims and specification is extracted. For annual report data, the text describing the company's main business and business scope is extracted, together with five relatively stable company indicators: return on net assets, return on investment, net profit, current ratio and gross profit. For commodity data, the text of the commodity introduction and of the specification and packaging details is extracted, together with two commodity indicators: commodity value and number of comments.
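The extracted fields for the three data sources can be pictured as simple records. The following is a minimal sketch, assuming hypothetical class and field names (`PatentRecord`, `AnnualReportRecord`, `CommodityRecord`) that do not appear in the source; only the listed items themselves come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatentRecord:
    """Text fields extracted from one patent (names are illustrative)."""
    publication_number: str
    abstract: str
    claims: str
    specification: str

    def text(self) -> str:
        # Concatenate the three text parts into one node text.
        return " ".join([self.abstract, self.claims, self.specification])


@dataclass
class AnnualReportRecord:
    """Text fields and the five company indicators from one annual report."""
    company: str
    main_business: str
    business_scope: str
    return_on_net_assets: float
    return_on_investment: float
    net_profit: float
    current_ratio: float  # assumed reading of the "flow rate" indicator
    gross_profit: float

    def text(self) -> str:
        return " ".join([self.main_business, self.business_scope])

    def indicators(self) -> List[float]:
        # Fixed order so the indicators can be used as a feature vector.
        return [
            self.return_on_net_assets,
            self.return_on_investment,
            self.net_profit,
            self.current_ratio,
            self.gross_profit,
        ]


@dataclass
class CommodityRecord:
    """Text fields and the two commodity indicators from one product page."""
    name: str
    introduction: str
    specification_and_packaging: str
    value: float
    comment_count: int

    def text(self) -> str:
        return " ".join([self.introduction, self.specification_and_packaging])
```

Keeping the free text and the numeric indicators in the same record makes it straightforward to use the text for association calculation and the indicators as value features later on.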
Further, the second step includes:
a) Construction of heterogeneous nodes: patent texts, annual report texts of listed companies and commodity information texts are selected as the heterogeneous nodes of the heterogeneous knowledge association network.
b) Extraction of text features.
c) Association calculation.
d) Determination of the edges between heterogeneous nodes.
The heterogeneous knowledge association network is formed according to the complex association relations between the patent texts and the external market.
Further, in the third step, the construction and calculation of the probabilistic graphical model includes:
a) Extracting value features, which comprise:
Value features inside the text: text features are extracted for each node V. For patent texts, the popularity of the patent keywords, the contextual consistency of the patent text and the complexity of the patent text content are extracted as the value features of the patent text. For annual report texts, the five stable company indicators (return on net assets, return on investment, net profit, current ratio and gross profit) are used as the value features of the annual report. For commodity information texts, the commodity value and the number of comments are used as the value features of the commodity information.
External entity features from the associations between texts: association features are extracted for each node V, for example the association between a patent text and a company annual report text. The association features between texts are obtained by calculating the distance between the texts.
b) Generation process of the probabilistic graphical model: the weights of the value features follow a Dirichlet distribution; the text values follow a Gamma distribution; the association intensities between different types of texts follow a Gamma distribution; each association feature follows a Poisson distribution.
c) Calculating the posterior distribution of the patent value: the joint probability distribution of the whole model is calculated from the generated probabilistic graphical model; once the joint probability distribution is obtained, the posterior distribution of the patent value parameter is calculated by a sampling algorithm.
Further, in the generation of the probabilistic graphical model, the weight node of each value feature follows a Dirichlet distribution whose parameter is a hyperparameter; each value node follows a Gamma distribution whose parameter is a hyperparameter; the intensity node follows $\lambda \sim \mathrm{Gamma}(\alpha_\lambda)$, where $\alpha_\lambda$ is a hyperparameter; and each association feature node follows $w \sim \mathrm{Poisson}(\lambda, r)$.
The generation process is specifically as follows:
Let W be the number of value weights, V the number of value distributions, N the number of intensity types and M the number of associations.
for n = 1, …, N do
sample λ ~ Gamma(α_λ)
for m = 1, …, M do
for w = 1, …, W do
sample the weight of value feature w from its Dirichlet distribution
for v = 1, …, V do
sample the value of node v from its Gamma distribution
sample w ~ Poisson(λ, r)
The sampling method of the present invention is implemented using the MCMC algorithm.
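The generation process above can be sketched as forward sampling. This is only an illustration under stated assumptions, not the patented implementation: the hyperparameter values are hypothetical, the Gamma scale is fixed at 1, the two-argument Poisson is read as a Poisson with rate λ·r, and the loop nesting is a guess because the source lists the loops without indentation.

```python
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def generate(n_types, n_assoc, n_weights, n_values,
             alpha_lambda=2.0, alpha_weight=1.0, alpha_value=2.0,
             r=1.0, seed=0):
    """Forward-sample every node of the model once (assumed structure)."""
    rng = random.Random(seed)
    # One association intensity per type of text pair: lambda ~ Gamma.
    intensities = [rng.gammavariate(alpha_lambda, 1.0) for _ in range(n_types)]
    # Weights of the value features: Dirichlet, so they sum to 1.
    weights = sample_dirichlet([alpha_weight] * n_weights, rng)
    # Value of each text node: Gamma, so it is positive.
    values = [rng.gammavariate(alpha_value, 1.0) for _ in range(n_values)]
    # Association features: Poisson counts, assumed rate = lambda * r.
    associations = [
        [sample_poisson(lam * r, rng) for _ in range(n_assoc)]
        for lam in intensities
    ]
    return {
        "intensities": intensities,
        "weights": weights,
        "values": values,
        "associations": associations,
    }
```

Calling `generate(3, 4, 5, 6)` returns three intensities, five feature weights that sum to 1, six positive node values and a 3 × 4 table of association counts.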
The posterior distribution of the patent value is calculated as follows: the joint probability distribution of the whole model is calculated from the generated probabilistic graphical model, and the posterior distribution of the patent value parameter is then obtained by a sampling algorithm.
Further, the process of constructing the heterogeneous knowledge association network comprises:
First, construction of heterogeneous nodes: patent texts, annual report texts of listed companies and commodity information texts are selected as the heterogeneous nodes of the heterogeneous knowledge association network.
Second, extraction of text features: keywords are extracted from the text of each heterogeneous node using natural language processing techniques, and the extracted keywords serve as the features of the text.
Third, association calculation: based on the extracted text features, the distance between texts is calculated using natural language processing techniques and used as the degree of association between the texts.
Fourth, construction of edges: the distance between two texts defines the edge between them, weighted by their degree of association.
Following this construction of edges between heterogeneous nodes, a large number of patent texts, annual report texts and commodity information texts are connected to form a complex heterogeneous knowledge association network. The patent value is then measured accurately in this network environment by machine learning and data mining methods.
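The four construction steps can be sketched end to end as below. This is a minimal illustration, not the patented implementation: keyword extraction is reduced to frequency counting with a tiny hypothetical stop-word list, Jaccard distance on keyword sets stands in for the text distance, and linking only nodes of different types is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a proper NLP pipeline.
STOPWORDS = {"the", "a", "of", "and", "for", "with", "to", "in"}


def extract_keywords(text, top_k=5):
    """Step 2: keep the most frequent non-stop-word tokens as keywords."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def jaccard_distance(a, b):
    """Step 3 (stand-in): 1 - |a & b| / |a | b| on two keyword sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_network(documents):
    """Steps 1 and 4: typed nodes plus distance-weighted edges.

    documents is a list of (node_id, node_type, text) tuples.
    """
    nodes = {
        node_id: {"type": node_type, "keywords": extract_keywords(text)}
        for node_id, node_type, text in documents
    }
    edges = {}
    for u, v in combinations(nodes, 2):
        if nodes[u]["type"] == nodes[v]["type"]:
            continue  # assumption: only link nodes of different types
        edges[(u, v)] = jaccard_distance(nodes[u]["keywords"],
                                         nodes[v]["keywords"])
    return nodes, edges
```

For instance, a patent text "lithium battery electrode coating" and an annual report text "battery electrode manufacturing" share two of five distinct keywords, so the edge between them gets distance 0.6.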
Another object of the present invention is to provide an information data processing terminal implementing the patent valuation method based on data mining and heterogeneous knowledge association.
It is a further object of the present invention to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the patent valuation method based on data mining and heterogeneous knowledge association.
Another object of the present invention is to provide a patent valuation system based on heterogeneous knowledge association, which implements the patent valuation method based on data mining and heterogeneous knowledge association.
Another object of the present invention is to provide a patent valuation device based on heterogeneous knowledge association, which implements the patent valuation method based on data mining and heterogeneous knowledge association.
In summary, the advantages and positive effects of the invention are as follows:
The invention designs a patent pricing method based on a heterogeneous knowledge association network. From the perspective of the patent itself, the invention treats the patent value as multidimensional and relative. From the perspective of association, the invention considers the value features of the patent itself and introduces complex external entity features, associating the two to form a complex heterogeneous knowledge association network. This network serves as the network environment for patent pricing, in which the patent value is then measured accurately by machine learning and data mining methods.
Compared with the prior art, the technical means adopted by the method and their effects are as follows:
By establishing a probabilistic graphical model, the patent value is modeled as a multidimensional probability distribution that expresses the uncertainty and relativity of the patent value: the value fluctuates within a range with a certain probability, different dimensions provide different perspectives for value analysis, and the value distribution differs from one perspective to another.
The association network serves as the value environment of the patent. On this basis, the patent's own value features are combined with the association environment and a probabilistic generation process for the patent value is established, reflecting the association and linkage of the patent value: the patent value is associated with its environment, differs as the surrounding environment differs, and changes in linkage with the external environment.
The whole patent valuation method and system are implemented with automatic web crawlers, association data mining, a probabilistic graphical model, automatic sampling and other machine learning techniques, so the valuation is fully automatic.
The outstanding technical advantages are as follows. Compared with existing methods, the modeling process fully considers the uncertainty, relativity, multidimensionality, association and value linkage of the patent value, so that the value representation conforms to the characteristics and laws of value and the measurement is more accurate. In addition, the method places the patent valuation process in the knowledge association environment and ties it to market data and value linkage, and carries out the process automatically with data mining and machine learning algorithms. No manual participation is required, which removes the subjectivity of manual assessment and makes the valuation more intelligent and its results more accurate and objective.
Drawings
Fig. 1 is a flow chart of the patent valuation method based on data mining and heterogeneous knowledge association provided by an embodiment of the present invention.
Fig. 2 is a correlation analysis graph provided by an embodiment of the present invention. In the figure, T denotes the node type, the subscript i denotes the i-th node, and w denotes the degree of association.
Fig. 3 is the probabilistic graphical model provided by an embodiment of the present invention.
Fig. 4 is the first-dimension value diagram of CN200310108255 provided by an embodiment of the present invention.
Fig. 5 is the second-dimension value diagram of CN200310108255 provided by an embodiment of the present invention.
Fig. 6 is the third-dimension value diagram of CN200310108255 provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The patent value refers to the economic return that a patent can bring to the patentee. The patent value measurement system quantifies this economic return from the patent submitted by the patentee, using model calculations in the system back end.
In the prior art: (1) From the perspective of the patent itself, the patent value is treated as single and deterministic. For an intangible asset such as a patent, the value changes as external factors change and cannot remain constant; the same patent may show different values in different environments. Existing methods view the patent value from a single, fixed angle, so the value visible from other angles is buried and the patent cannot be valued accurately. (2) From the perspective of association, existing methods treat the patent value as unrelated to external factors and to the market. In fact, patent value usually stands in heterogeneous relations with the complex features of external entities, and through these relations the patent value is linked to those entities. If these complex heterogeneous relations are not clearly represented and used, the true value of the patent is difficult to mine and the patent value cannot be measured accurately.
In view of the problems in the prior art, the invention provides a patent valuation method based on data mining and heterogeneous knowledge association, in which the value is measured using natural language processing, machine learning and data mining techniques. The invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the patent valuation method based on data mining and heterogeneous knowledge association provided by the embodiment of the invention includes:
First: data acquisition and preprocessing.
1) Selecting data sources: the invention determines three data sources: patent information from the China National Intellectual Property Administration, annual reports of listed companies from the Juchao Information Network (CNINFO), and commodity information from e-commerce websites (Taobao, Jingdong and the like).
2) Information extraction: for patent data, the invention extracts the text of the abstract, claims and specification. For annual report data, the invention extracts the text describing the company's main business and business scope, together with five stable company indicators: return on net assets, return on investment, net profit, current ratio and gross profit. For commodity data, the invention extracts the text of the commodity introduction and of the specification and packaging details, together with two commodity indicators: commodity value and number of comments.
Second: establishing the heterogeneous knowledge association network, which includes:
1) Construction of heterogeneous nodes: patent texts, annual report texts of listed companies and commodity information texts are selected as the heterogeneous nodes of the heterogeneous knowledge association network.
2) Extraction of text features: keywords are extracted from the text of each heterogeneous node constructed in step 1) using natural language processing and similar techniques, and the extracted keywords serve as the features of the text.
3) Association calculation: based on the text features obtained in step 2), the distance between texts is calculated and used as the association between the texts. The distance calculation is described in detail in reference 1.
That article proposes a method of calculating the distance between text documents, the Word Mover's Distance (WMD). The main idea is to represent each text document as a weighted set of word embeddings obtained with word2vec. The distance between two documents A and B is defined as the minimum cumulative distance that all words in A must travel to exactly match document B, given by the formula:
$$\min_{T \ge 0} \sum_{i,j} T_{i,j}\, c(i,j) \quad \text{subject to} \quad \sum_{j} T_{i,j} = d_i, \qquad \sum_{i} T_{i,j} = d'_j,$$ where $c(i,j)$ is the Euclidean distance between the word vectors of words $i$ and $j$, and $T_{i,j}$ indicates how much of word $i$ is moved to word $j$.
Here $d_i$ is the weight of word $i$ in document A, and $d'_j$ is the corresponding weight of word $j$ in document B.
4) Edge construction: the distance between texts obtained in step 3) defines the edge between them, weighted by their degree of association.
Following this construction of edges between heterogeneous nodes, a large number of patent texts, annual report texts and commodity information texts are connected to form a complex heterogeneous knowledge association network.
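To make the Word Mover's Distance of step 3) concrete, the sketch below computes the relaxed WMD, a cheap lower bound in which each word simply sends all of its weight to the nearest word of the other document. It is not the exact transport problem, which needs a linear-programming solver. The toy 2-D embeddings and function names are hypothetical; a real system would use word2vec vectors.

```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words weights d_i for one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean(u, v):
    """c(i, j): Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(source, target, embeddings):
    """Move every source word entirely to its nearest target word."""
    return sum(
        weight * min(euclidean(embeddings[word], embeddings[other])
                     for other in target)
        for word, weight in source.items()
    )


def relaxed_wmd(tokens_a, tokens_b, embeddings):
    """Lower bound on WMD: the larger of the two one-sided relaxations."""
    d_a, d_b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided_cost(d_a, d_b, embeddings),
               one_sided_cost(d_b, d_a, embeddings))


# Hypothetical 2-D embeddings, for illustration only.
toy_embeddings = {
    "battery": (0.0, 0.0),
    "cell": (0.0, 1.0),
    "charger": (3.0, 4.0),
}
distance = relaxed_wmd(["battery", "cell"], ["charger"], toy_embeddings)
```

Because the second document contains a single word, every unit of weight must travel to it, so in this toy case the relaxed value coincides with the exact WMD.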
Third: the construction and calculation of the probabilistic graphical model, which includes:
1) Generation process of the probabilistic graphical model:
Table: description of the symbols in the model diagram
(a) Let the weight node of each value feature follow a Dirichlet distribution whose parameter is a hyperparameter.
(b) Let each value node follow a Gamma distribution whose parameter is a hyperparameter.
(c) Let the intensity node follow $\lambda \sim \mathrm{Gamma}(\alpha_\lambda)$, where $\alpha_\lambda$ is a hyperparameter.
(d) Let each association feature node follow $w \sim \mathrm{Poisson}(\lambda, r)$.
The generation process is specifically as follows:
Let W be the number of value weights, V the number of value distributions, N the number of intensity types and M the number of associations.
for n = 1, …, N do
sample λ ~ Gamma(α_λ)
for m = 1, …, M do
for w = 1, …, W do
sample the weight of value feature w from its Dirichlet distribution
for v = 1, …, V do
sample the value of node v from its Gamma distribution
sample w ~ Poisson(λ, r)
The sampling is implemented with an MCMC algorithm.
2) Calculating the posterior distribution of the patent value includes:
(i) Calculating the joint probability distribution of the whole model from the probabilistic graphical model generated in step 1).
(ii) After the joint probability distribution is obtained, calculating the posterior distribution of the patent value parameter by a sampling algorithm.
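The idea of obtaining a posterior by MCMC sampling can be illustrated on a deliberately simplified piece of the model: a single intensity λ with a Gamma prior and Poisson-distributed association counts. The sketch below uses random-walk Metropolis on log λ. The prior rate `beta`, the observed counts and the step size are all assumptions for illustration; the source does not specify which MCMC variant is used or how the full joint distribution is factored.

```python
import math
import random


def log_posterior(theta, counts, alpha, beta):
    """Unnormalized log posterior of theta = log(lambda).

    Gamma(alpha, rate=beta) prior, Poisson likelihood, plus the
    log-Jacobian term theta from the change of variables.
    """
    return (alpha + sum(counts)) * theta - (beta + len(counts)) * math.exp(theta)


def metropolis_lambda(counts, alpha=2.0, beta=1.0, steps=20000,
                      burn_in=2000, step_size=0.5, seed=0):
    """Random-walk Metropolis sampler for the intensity lambda."""
    rng = random.Random(seed)
    # Start near the sample mean of the observed counts.
    theta = math.log(max(sum(counts) / len(counts), 1e-3))
    current = log_posterior(theta, counts, alpha, beta)
    samples = []
    for step in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, counts, alpha, beta)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if step >= burn_in:
            samples.append(math.exp(theta))
    return samples


observed_counts = [3, 5, 4, 6, 2]  # hypothetical association counts
posterior_samples = metropolis_lambda(observed_counts)
posterior_mean = sum(posterior_samples) / len(posterior_samples)
```

This toy case is conjugate, so the sampler can be checked against the closed-form posterior Gamma(α + Σw, β + n), whose mean here is 22/6. The full model has no such closed form, which is why a sampling algorithm is needed there.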
The invention is further described below in connection with specific embodiments.
Examples
According to the patent valuation method based on data mining and heterogeneous knowledge association provided by the embodiment of the invention, the value features of the patent itself are analyzed from the perspective of association and complex external entity features are introduced, so that the two are associated to form a complex heterogeneous knowledge association network that serves as the network environment for patent pricing.
In the embodiment of the invention, the environment is established as follows. Before the network environment is established, the heterogeneous knowledge association network is defined, which first requires the notion of an information network. An information network is a network modeled as a graph with two kinds of elements, vertices and edges. Vertices represent entities in the real world and edges represent the links between entities; the entities and the links between them form the information network. A heterogeneous information network is an information network that includes multiple types of nodes and multiple types of edges. The process of building the heterogeneous knowledge association network therefore includes:
1) Construction of heterogeneous nodes: patent texts, annual report texts of listed companies and commodity information texts are selected as the heterogeneous nodes of the heterogeneous knowledge association network.
2) Extraction of text features: keywords are extracted from the text of each heterogeneous node constructed in step 1) using natural language processing techniques, and the extracted keywords serve as the features of the text.
3) Association calculation: based on the text features obtained in step 2), the distance between texts is calculated using natural language processing techniques and used as the degree of association between the texts.
4) Edge construction: the distance between texts obtained in step 3) defines the edge between them, weighted by their degree of association.
In the embodiment of the invention, following this construction of edges between heterogeneous nodes, a large number of patent texts, annual report texts and commodity information texts are connected to form a complex heterogeneous knowledge association network. The patent value is then measured accurately in this network environment by machine learning and data mining methods. The measurement proceeds as follows:
table 1 description of the symbols in the model
A) Generation of the probabilistic graphical model: let the weight node of each value feature follow a Dirichlet distribution whose parameter is a hyperparameter.
Let each value node follow a Gamma distribution whose parameter is a hyperparameter.
Let the intensity node λ ~ Gamma(α_λ), where α_λ is a hyperparameter.
Let each associated feature node w ~ Poisson(λ, r).
The procedure is as follows:
Let W be the number of value weights, V the number of value distributions, N the number of intensity types, and M the number of associations.
for n = 1, …, N do
sample λ ~ Gamma(α_λ)
for m = 1, …, M do
for w = 1, …, W do
sample the weight node
for v = 1, …, V do
sample the value node
sample w ~ Poisson(λ, r)
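The sampling procedure above can be illustrated with a small ancestral-sampling sketch in Python. Several details are assumptions made purely for illustration and are not given in the text: a single feature dimension `num_features` replaces the separate counts W and V, all Gamma draws use unit scale, and the Poisson rate is taken to be λ times the weighted value score. The function names and default hyperparameter values are likewise hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (small rates only)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate(num_types, num_links, num_features,
             alpha_lambda=2.0, theta_r=2.0, alpha_w=1.0, seed=0):
    """Sample intensity, weights, values and association counts top-down."""
    rng = random.Random(seed)
    samples = []
    for n in range(num_types):
        lam = rng.gammavariate(alpha_lambda, 1.0)          # intensity node
        for m in range(num_links):
            weights = sample_dirichlet([alpha_w] * num_features, rng)
            values = [rng.gammavariate(theta_r, 1.0) for _ in range(num_features)]
            score = sum(w * v for w, v in zip(weights, values))
            # Assumed link between (lambda, r) and the Poisson rate.
            count = sample_poisson(lam * score, rng)
            samples.append({"type": n, "link": m, "lambda": lam,
                            "weights": weights, "values": values, "count": count})
    return samples


for row in generate(num_types=2, num_links=2, num_features=3):
    print(row["type"], row["link"], round(row["lambda"], 3), row["count"])
```

Drawing the Dirichlet sample from normalised Gamma draws keeps the sketch within the Python standard library.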
B) Calculation of the posterior distribution of the patent value: the joint probability distribution of the whole model is calculated from the generated probabilistic graphical model.
C) Once the joint probability distribution is obtained, the posterior distribution of the patent value parameters is calculated by a sampling algorithm.
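The text does not say which sampling algorithm is used. As a hedged illustration of the idea, the sketch below applies a random-walk Metropolis-Hastings sampler to a deliberately reduced model: a single intensity λ with a Gamma(shape, rate) prior and Poisson-distributed association counts. The reduced model, the function names, the toy counts and the tuning values are all assumptions. This reduced case has a known conjugate posterior, Gamma(shape + Σx, rate + n), which gives a reference value for checking the sampler.

```python
import math
import random


def log_posterior(lam, counts, shape, rate):
    """Unnormalised log posterior: Gamma(shape, rate) prior times Poisson likelihood."""
    if lam <= 0:
        return -math.inf
    log_prior = (shape - 1.0) * math.log(lam) - rate * lam
    log_likelihood = sum(c * math.log(lam) - lam for c in counts)
    return log_prior + log_likelihood


def metropolis(counts, shape=2.0, rate=1.0, steps=20000,
               burn_in=2000, step_size=0.8, seed=0):
    """Random-walk Metropolis-Hastings draws of lambda given the observed counts."""
    rng = random.Random(seed)
    lam = 1.0
    current = log_posterior(lam, counts, shape, rate)
    draws = []
    for t in range(steps):
        proposal = lam + rng.gauss(0.0, step_size)   # symmetric proposal
        candidate = log_posterior(proposal, counts, shape, rate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            lam, current = proposal, candidate
        if t >= burn_in:
            draws.append(lam)
    return draws


counts = [3, 5, 4, 6, 2]                 # toy association counts
draws = metropolis(counts)
sample_mean = sum(draws) / len(draws)
exact_mean = (2.0 + sum(counts)) / (1.0 + len(counts))   # conjugate posterior mean
print(round(sample_mean, 2), round(exact_mean, 2))
```

In the full model the same accept/reject step would be applied to the weight, value and intensity nodes in turn, using the joint distribution from step B) in place of `log_posterior`.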
A comparative analysis of the method of the present invention against the prior art is described below.
The method of the invention is applied to patent CN200310108255, and its three-dimensional value distribution is shown in the figures: Fig. 1 shows the first-dimension value of CN200310108255, Fig. 2 the second-dimension value, and Fig. 3 the third-dimension value.
The comparison method is the LGM method proposed in the prior art. The method of the present invention is first compared qualitatively with the LGM method. The model of the invention is clearly better than the comparison model in terms of relativity, linkage and interpretability.
Table 1: patent estimation method comparison
3000 patents are selected from the three IPC second-level classes A44, C23 and D04, respectively, for an experimental comparison of the method of the invention with the LGM method. The comparison result is as follows:
The method of the invention is significantly better than the LGM method in terms of accuracy.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in whole or in part in software, the implementation takes the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (6)

1. The patent estimation method based on the data mining and the heterogeneous knowledge association is characterized by comprising the following steps of:
according to the complex association relation between the patent text and the market information text, constructing a heterogeneous knowledge association network as a network environment of patent valuation;
training machine learning in the network environment, and accurately measuring the patent value by a data mining model;
accurately measuring the patent value in the network environment through machine learning and a data mining method;
the patent estimation method based on data mining and heterogeneous knowledge association further comprises the following steps:
firstly, data acquisition and preprocessing, namely selecting a data source and extracting information;
secondly, establishing a heterogeneous knowledge association network, and constructing network nodes, extracting text features, performing association calculation and constructing edges;
thirdly, extracting the value features of the patent and the external entity features associated with them, generating a probabilistic graphical model, and calculating the posterior distribution of the patent value;
the third step, constructing and calculating the probabilistic graphical model, further comprises:
a) Extracting value features, wherein the value features comprise value features inside the texts and external entity features associated with the texts;
b) Probabilistic graphical model generation: making the weights of the value features obey a Dirichlet distribution; making the text value obey a Gamma distribution; making the intensity among different types of texts obey a Gamma distribution; making each associated feature obey a Poisson distribution;
c) Calculating the posterior distribution of the patent value: calculating the joint probability distribution of the whole model according to the generated probabilistic graphical model; and, after the joint probability distribution is obtained, calculating the posterior distribution of the patent value parameters by a sampling algorithm.
2. The patent estimation method based on data mining and heterogeneous knowledge association according to claim 1, wherein the process of constructing the heterogeneous knowledge association network is as follows:
first, construction of heterogeneous nodes: selecting patent texts, annual report texts of listed companies, and commodity information texts as the heterogeneous nodes for constructing the heterogeneous knowledge association network;
second, extraction of text features: for the texts of the constructed heterogeneous nodes, extracting keywords using natural language processing techniques, and taking the extracted keywords as the features of each text;
third, association calculation: based on the obtained text features, calculating the distance between the texts using natural language processing techniques, so that the distance between the texts is used as the degree of association between the texts;
fourth, construction of edges: using the obtained distance between the texts as the edge representing the degree of association between the texts; constructing edges between heterogeneous nodes in this manner for a large number of patent texts, annual report texts and commodity information texts to form a complex heterogeneous knowledge association network; and then accurately measuring the patent value in the network environment through machine learning and data mining methods.
3. The patent valuation method based on data mining and heterogeneous knowledge correlation of claim 1, wherein the first step specifically comprises:
1) Selecting data sources: determining the data sources, namely patent information from the China National Intellectual Property Administration, annual report information of listed companies from the CNINFO (Juchao) information network, and commodity information from e-commerce websites;
2) Information extraction: first, extracting the text of the abstract, claims and description from the patent data; then, for the annual report data, extracting the text describing the company's main business and business scope, together with five relatively stable company indexes, namely return on net assets, return on investment, net profit margin, current ratio and gross profit margin; and, for the commodity data, extracting the text of the commodity introduction and specification and packaging, together with the commodity indexes of commodity value and number of comments.
4. The patent valuation method based on data mining and heterogeneous knowledge correlation of claim 1, wherein the second step comprises:
a) Constructing heterogeneous nodes: selecting patent texts, annual report texts of listed companies, and commodity information texts as the heterogeneous nodes for constructing the heterogeneous knowledge association network;
b) Extracting text features;
c) Association calculation;
d) Determining edges between heterogeneous nodes;
and forming a heterogeneous knowledge association network according to the complex association relation between the patent text and the external market.
5. The patent valuation method based on data mining and heterogeneous knowledge correlation of claim 1, wherein the value features inside the text are text features extracted for each node v; for the extracted patent text, the popularity of the patent keywords, the contextual consistency of the patent text, and the complexity of the content of the patent text are taken as the value features of the patent text; for the extracted annual report text, the five extracted stable company indexes, namely return on net assets, return on investment, net profit margin, current ratio and gross profit margin, are used as the value features of the annual report; for the extracted commodity information text, the extracted commodity value and number of comments are used as the value features of the commodity information;
the external entity features of the text are the association features of the text extracted for each node v;
the patent text and the company annual report text are associated with each other; the association features among texts are obtained by calculating the distance between the texts;
in the probabilistic graphical model generation method, the weight node of each value feature follows a Dirichlet distribution whose parameter is a hyperparameter; each value node follows a Gamma distribution whose parameter is a hyperparameter;
the intensity node λ ~ Gamma(α_λ), where α_λ is a hyperparameter;
each associated feature node w ~ Poisson(λ, r);
the procedure is as follows:
setting a value weight number W, a value distribution number V, an intensity type number N and an association number M;
for n = 1, …, N do;
sample λ ~ Gamma(α_λ);
for m = 1, …, M do;
for w = 1, …, W do;
for v = 1, …, V do;
sample w ~ Poisson(λ, r);
the posterior distribution of the patent value is calculated as follows: the joint probability distribution of the whole model is calculated according to the generated probabilistic graphical model;
where T_a and T_b are the node types corresponding to node i and node j, respectively; the remaining symbols of the joint distribution denote, in order: the weight vector of the type T_a corresponding to node i; the weight vector of the type T_b corresponding to node j; the strength of association between nodes i and j; the strength between types T_a and T_b; the k-th dimension value of node i; and the k-th dimension value of node j; θ_λ, θ_r and α_w are hyperparameters.
6. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the patent valuation method based on data mining and heterogeneous knowledge correlation of any of claims 1-5.
CN201910758922.1A 2019-08-16 2019-08-16 Patent estimation method and system based on data mining and heterogeneous knowledge association Active CN110458466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910758922.1A CN110458466B (en) 2019-08-16 2019-08-16 Patent estimation method and system based on data mining and heterogeneous knowledge association

Publications (2)

Publication Number Publication Date
CN110458466A CN110458466A (en) 2019-11-15
CN110458466B true CN110458466B (en) 2023-09-26

Family

ID=68487187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910758922.1A Active CN110458466B (en) 2019-08-16 2019-08-16 Patent estimation method and system based on data mining and heterogeneous knowledge association

Country Status (1)

Country Link
CN (1) CN110458466B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291114B (en) * 2020-01-14 2023-04-07 内蒙古大学 Service information processing method of agricultural and livestock product tracing system based on block chain
CN112085104B (en) * 2020-09-10 2024-04-12 杭州中奥科技有限公司 Event feature extraction method and device, storage medium and electronic equipment
CN112733549B (en) * 2020-12-31 2024-03-01 厦门智融合科技有限公司 Patent value information analysis method and device based on multiple semantic fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030269A (en) * 2006-03-03 2007-09-05 鸿富锦精密工业(深圳)有限公司 Patent valve estimating system and method
CN102567476A (en) * 2011-12-15 2012-07-11 浙江大学 Screening and valuing method of technical similarity patent
CN103117877A (en) * 2013-01-29 2013-05-22 四川大学 Automatic network topology generation device based on iterative TTL-IPID data package classification
CN103167570A (en) * 2011-12-19 2013-06-19 中国科学院声学研究所 Switch trigger and judgment method and switch trigger and judgment system in indoor cellular network
CN103679291A (en) * 2013-12-17 2014-03-26 江苏大学 Patent value assessment method
CN109194583A (en) * 2018-08-07 2019-01-11 中国地质大学(武汉) Network congestion Diagnosis of Links method and system based on depth enhancing study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Patent_Maintenance_Recommendation_with_Patent_Information_Network_Model;Xin Jin等;《2011 11th IEEE International Conference on Data Mining》;20111231;280-289 *

Also Published As

Publication number Publication date
CN110458466A (en) 2019-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant