CN115146912A - Enterprise patent set and business correlation measuring method and system - Google Patents

Enterprise patent set and business correlation measuring method and system

Info

Publication number
CN115146912A
Authority
CN
China
Prior art keywords
enterprise
business
abstract
target
target enterprise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210563931.7A
Other languages
Chinese (zh)
Inventor
阮传宏
田继阳
吴胜建
王驭
张邦华
徐绡绡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Credit Bureau Co ltd
Original Assignee
Anhui Credit Bureau Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Credit Bureau Co ltd
Priority to CN202210563931.7A
Publication of CN115146912A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for measuring the correlation between an enterprise patent set and a business, and relates to the technical field of computer information processing. The embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector and target enterprise patent abstract vectors are obtained from the business description and the patent abstracts of a target enterprise. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is then calculated, and the similarities between all the patent abstracts and the business are jointly taken as the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

Description

Enterprise patent set and business correlation measuring method and system
Technical Field
The invention relates to the technical field of computer information processing, in particular to a method and a system for measuring correlation between an enterprise patent set and a business.
Background
Patents are important intangible assets of an enterprise and play an important role in its production and operation. A high correlation between an enterprise's patents and its business indicates, on the one hand, that the enterprise has the technical basis for commercializing the patents, so the patents have greater potential to bring economic benefits to the enterprise; on the other hand, it also reflects the authority of the patents. Judging the correlation between an enterprise's patents and its business has therefore become an important step in assessing the strength of a science and technology enterprise.
At present, the correlation between patents and business is mostly judged manually on the basis of expert experience, which is highly subjective and inefficient. A technique capable of determining the correlation between enterprise patents and business is therefore needed.
Disclosure of Invention
(I) technical problem to be solved
Aiming at the defects of the prior art, the invention provides a method and a system for measuring the correlation between an enterprise patent set and a business, and solves the problems of strong subjectivity and low efficiency of the conventional method for judging the correlation between the patent set and the business of the enterprise.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
In a first aspect, a method for measuring correlation between an enterprise patent set and a business is provided, the method comprising:
training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;
calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;
and calculating the correlation between the enterprise patent sets and the business based on the similarity.
Further, the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.
Further, the training of the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract text and the enterprise business description text includes:
preprocessing an enterprise patent abstract text to obtain a patent abstract corpus;
training a Word2vec model based on the patent abstract corpus to obtain a patent abstract Word vector model;
preprocessing an enterprise business description text to obtain an enterprise business corpus;
and training a Word2vec model based on the enterprise business corpus to obtain an enterprise description text Word vector model.
Further, the generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text includes:
acquiring a target enterprise business description text and preprocessing the target enterprise business description text;
extracting key words from the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain business key words and business key word weights of the target enterprise;
the business keywords of the target enterprise are used as the input of the trained enterprise description text word vector model to generate the business keyword vector of the target enterprise;
and carrying out weighted average on the business keyword vectors of the target enterprise based on the business keyword weight of the target enterprise to obtain the business vectors of the target enterprise.
Further, the business keyword weight is obtained by:
taking the TF-IDF value of the j-th business keyword of the target enterprise business description text as the weight of the j-th business keyword.
Further, generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text includes:
acquiring a target enterprise patent abstract text, and preprocessing the target enterprise patent abstract text;
extracting key words of the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain abstract key words and abstract key word weights of the target enterprise;
taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate an abstract keyword vector of the target enterprise;
and carrying out weighted average on the abstract keyword vectors of the target enterprises based on the abstract keyword weights of the target enterprises to obtain the abstract vectors of the target enterprises.
Further, the weight of the abstract keyword is determined as follows:
the weight of an abstract keyword located in the first sentence of the patent abstract is
Figure BDA0003657530240000031
Figure BDA0003657530240000032
otherwise, the weight of the abstract keyword is
Figure BDA0003657530240000033
wherein
Figure BDA0003657530240000034
denotes the weight of the k-th abstract keyword of the i-th patent abstract $p_i$ of the target enterprise;
Figure BDA0003657530240000035
denotes the TF-IDF value of the k-th abstract keyword of the i-th patent abstract $p_i$ of the target enterprise;
γ and δ are coefficients, with γ + δ = 1 and γ > δ.
Further, the calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector includes:
and calculating the cosine distance between the target enterprise business vector and each target enterprise patent abstract vector of the target enterprise as the similarity.
Further, the calculating the correlation between the enterprise patent set and the business based on the similarity comprises:
and taking the average value of all the similarity degrees as the correlation between the enterprise patent set and the business.
In a second aspect, an enterprise patent set and business correlation measurement system is provided, the system includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method when executing the computer program.
(III) advantageous effects
The invention provides a method and a system for measuring correlation between an enterprise patent set and a business. Compared with the prior art, the method has the following beneficial effects:
The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector and target enterprise patent abstract vectors are obtained from the business description and the patent abstracts of a target enterprise. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is then calculated, and the similarities between all the patent abstracts and the business are jointly taken as the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the application solves the problems of strong subjectivity and low efficiency of the existing patent and enterprise business correlation judgment method by providing the enterprise patent set and business correlation measurement method and system.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector and target enterprise patent abstract vectors are obtained from the business description and the patent abstracts of a target enterprise. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is then calculated, and the similarities between all the patent abstracts and the business are jointly taken as the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
As shown in FIG. 1, the present invention provides a method for measuring correlation between an enterprise patent set and a business, the method comprising:
training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;
calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;
and calculating the correlation between the enterprise patent sets and the business based on the similarity.
The beneficial effects of this embodiment are:
The embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector and target enterprise patent abstract vectors are obtained from the business description and the patent abstracts of a target enterprise. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is then calculated, and the similarities between all the patent abstracts and the business are jointly taken as the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
The following describes the implementation process of the embodiment of the present invention in detail:
S1, training a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstract texts and enterprise business description texts.
In specific implementation, the construction process of the patent abstract word vector model comprises the following steps:
S101, collecting enterprise patent abstract texts and enterprise business description texts as the original corpus;
S102, preprocessing the enterprise patent abstract texts in the original corpus, for example by word segmentation and stop-word removal, to obtain a patent abstract corpus;
S103, taking the patent abstract corpus as the input of a Word2vec model and training it in the conventional manner to obtain the patent abstract word vector model.
Word2vec is a group of related models used to generate word vectors. After training is completed, the Word2vec model maps each word to a vector that can represent word-to-word relationships; this vector is taken from the hidden layer of the neural network.
Similar to the patent abstract word vector model, the construction process of the enterprise description text word vector model comprises the following steps:
S104, preprocessing the enterprise business description texts in the original corpus, for example by word segmentation and stop-word removal, to obtain an enterprise business corpus;
S105, taking the enterprise business corpus as the input of a Word2vec model and training it in the conventional manner to obtain the enterprise description text word vector model.
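The preprocessing in S102 and S104 can be illustrated with a minimal sketch. It assumes whitespace-delimited English text and a tiny hypothetical stop-word list; the patent names neither a segmenter nor a stop-word list, and Chinese text would need a dedicated word segmenter in place of the regular expression. The function names `preprocess` and `build_corpus` are illustrative only.

```python
import re

# Hypothetical stop-word list for illustration; the patent does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "is", "to", "in"}

def preprocess(text):
    """Lower-case the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

def build_corpus(texts):
    """Turn raw texts into a list of token lists, skipping texts left empty."""
    corpus = [preprocess(t) for t in texts]
    return [doc for doc in corpus if doc]

# Toy patent abstracts (invented for this example).
patent_abstracts = [
    "The invention provides a battery cooling plate for electric vehicles.",
    "A method for welding the battery housing.",
]
patent_corpus = build_corpus(patent_abstracts)
```

A corpus in this list-of-token-lists form is the usual input shape for Word2vec training; the training call itself is left out here because it depends on the library chosen.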
S2, generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text.
In specific implementation, the method can be realized by adopting the following steps:
S201, acquiring the enterprise business description text of the target enterprise and preprocessing it; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;
S202, extracting keywords from the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain the business keywords and business keyword weights of the target enterprise;
Let t denote the number of keywords of the business description text b of the target enterprise, and let $tfidf_b^j$ denote the TF-IDF value of the j-th keyword of the business description text b of the target enterprise;
the TF-IDF values of all business keywords of the target enterprise can then be written as:
$tfidf_b = [tfidf_b^1, tfidf_b^2, \ldots, tfidf_b^j, \ldots, tfidf_b^t]$
Specifically, the inventors found that business description texts have little regular structure, so position weight is not considered for them; the TF-IDF value of the j-th business keyword of the target enterprise business description text is therefore used directly as the weight of the j-th business keyword.
S203, taking the business keywords of the target enterprise as the input of the trained enterprise description text word vector model to generate the business keyword vectors of the target enterprise, written as:
$w_b = [w_b^1, w_b^2, \ldots, w_b^j, \ldots, w_b^t]$
wherein $w_b^j$ denotes the vector of the j-th keyword of the business description text b of the target enterprise.
S204, computing the weighted average of the business keyword vectors of the target enterprise, weighted by the business keyword weights of the target enterprise, to obtain the business vector β of the target enterprise. The calculation formula is as follows:
Figure BDA0003657530240000071
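Steps S202 to S204 can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact formula: the smoothed IDF is one common variant (the patent does not say which TF-IDF definition it uses), the weighted average is normalised by the sum of the weights, and the two-dimensional `toy_vectors` stand in for the output of a trained Word2vec model. All names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus, top_t=3):
    """Return the top_t (keyword, TF-IDF) pairs of one tokenised document."""
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF: an assumed variant, chosen so that idf is never zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_t]

def weighted_average(keywords, word_vectors):
    """Weighted average of keyword vectors; keywords without a vector are skipped."""
    known = [(w, wt) for w, wt in keywords if w in word_vectors]
    total = sum(wt for _, wt in known)
    if total == 0:
        raise ValueError("no keyword with a known vector and positive weight")
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for w, wt in known:
        for d in range(dim):
            vec[d] += wt * word_vectors[w][d]
    return [x / total for x in vec]

# Toy data: three tokenised business descriptions and invented word vectors.
corpus = [
    ["battery", "cooling", "battery", "vehicle"],
    ["software", "cloud", "service"],
    ["battery", "welding"],
]
toy_vectors = {"battery": [1.0, 0.0], "cooling": [0.0, 1.0], "vehicle": [1.0, 1.0]}

keywords = tfidf_weights(corpus[0], corpus, top_t=3)
beta = weighted_average(keywords, toy_vectors)  # target enterprise business vector
```

Because the similarity used later is a cosine, multiplying β by any positive constant leaves the result unchanged, so the choice of normalising constant in the weighted average does not affect the final correlation.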
S3, generating target enterprise patent abstract vectors based on the trained patent abstract word vector model and the target enterprise patent abstract text.
S301, acquiring the target enterprise patent abstract text and preprocessing it; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;
S302, extracting keywords from the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain the abstract keywords and abstract keyword weights of the target enterprise;
Specifically, the patent abstract set of the target enterprise is denoted $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i$ is the i-th patent abstract in P. The TF-IDF values of all keywords of the i-th patent abstract $p_i$ of the target enterprise can be written as:
Figure BDA0003657530240000072
wherein
Figure BDA0003657530240000073
denotes the TF-IDF value of the k-th keyword of the patent abstract $p_i$ of the target enterprise, and m denotes the number of keywords of the patent abstract $p_i$ of the target enterprise.
Further considering the influence of the position of an abstract keyword within the abstract on its weight, the inventors found that setting the weight in the following way achieves unexpected effects and effectively improves accuracy:
the weight of an abstract keyword located in the first sentence of the patent abstract is
Figure BDA0003657530240000081
Figure BDA0003657530240000082
otherwise, the weight of the abstract keyword is
Figure BDA0003657530240000083
wherein
Figure BDA0003657530240000084
denotes the weight of the k-th abstract keyword of the i-th patent abstract $p_i$ of the target enterprise;
Figure BDA0003657530240000085
denotes the TF-IDF value of the k-th abstract keyword of the i-th patent abstract $p_i$ of the target enterprise;
γ and δ are coefficients, with γ + δ = 1 and γ > δ.
The keyword weights of the target enterprise patent abstract $p_i$ can then be written as:
Figure BDA0003657530240000086
S303, taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate the abstract keyword vectors of the target enterprise;
Specifically, the keyword word vectors of the i-th patent abstract $p_i$ of the target enterprise can be written as:
Figure BDA0003657530240000087
wherein
Figure BDA0003657530240000088
denotes the word vector of the k-th keyword of the target enterprise patent abstract $p_i$, and m denotes the number of keywords of the patent abstract $p_i$.
S304, computing the weighted average of the abstract keyword vectors of the target enterprise, weighted by the abstract keyword weights of the target enterprise, to obtain the abstract vectors of the target enterprise.
Let $\alpha_i$ denote the target enterprise abstract vector of the i-th patent abstract $p_i$ of the target enterprise; the calculation formula is as follows:
Figure BDA0003657530240000089
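Steps S302 to S304 can be sketched as below. The exact weight formula is contained in figure images that are not available in this text, so the rule used here is an assumption: the TF-IDF value is multiplied by γ when the keyword occurs in the first sentence and by δ otherwise, which is consistent with the stated constraints γ + δ = 1 and γ > δ but may differ from the patent's formula. The function names, the toy TF-IDF values, and the toy word vectors are hypothetical.

```python
# Coefficient values reported in the experimental verification section.
GAMMA, DELTA = 2 / 3, 1 / 3

def position_weights(keyword_tfidf, first_sentence_tokens, gamma=GAMMA, delta=DELTA):
    """ASSUMED rule: scale TF-IDF by gamma in the first sentence, by delta elsewhere."""
    first = set(first_sentence_tokens)
    return {
        word: (gamma if word in first else delta) * score
        for word, score in keyword_tfidf.items()
    }

def abstract_vector(weights, word_vectors):
    """Weighted average of the keyword vectors of one patent abstract.

    Assumes every keyword in `weights` has an entry in `word_vectors`
    and that the weights sum to a positive number.
    """
    total = sum(weights.values())
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, wt in weights.items():
        for d in range(dim):
            vec[d] += wt * word_vectors[word][d]
    return [x / total for x in vec]

# Toy abstract: "battery" appears in the first sentence, "welding" does not.
keyword_tfidf = {"battery": 0.6, "welding": 0.6}
weights = position_weights(keyword_tfidf, ["battery", "housing"])
toy_vectors = {"battery": [1.0, 0.0], "welding": [0.0, 1.0]}
alpha_i = abstract_vector(weights, toy_vectors)
```

With equal TF-IDF values, the first-sentence keyword ends up with twice the influence of the other keyword, which is the intended effect of choosing γ > δ.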
S4, calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector.
In specific implementation, the cosine distance $S_i$ between the target enterprise business vector $\beta$ and the vector $\alpha_i$ of each patent abstract $p_i$ in the target enterprise patent set $P = \{p_1, p_2, \ldots, p_n\}$ is calculated, and the calculation formula is as follows:
$$S_i = \frac{\alpha_i \cdot \beta}{\lVert \alpha_i \rVert \, \lVert \beta \rVert}$$
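The similarity in S4 is the cosine of the angle between the two vectors. A minimal sketch in plain Python follows; returning 0.0 for an all-zero vector is a convention chosen here, not something the patent specifies.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `cosine_similarity(alpha_i, beta)` would give the similarity between one patent abstract vector and the business vector, assuming both come from embedding spaces of the same dimension.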
S5, calculating the correlation between the enterprise patent set and the business based on the similarity.
Specifically, the inventors found that existing approaches mostly judge the similarity to the business for a single patent at a time, which cannot evaluate the capability of the enterprise well; the correlation between the whole enterprise patent set and the business needs to be considered. The average value S of the cosine distances between the target enterprise business vector and all of its patent abstract vectors is therefore calculated to determine the correlation between the enterprise patent set and the business. The specific formula is as follows:
$$S = \frac{1}{n} \sum_{i=1}^{n} S_i$$
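The aggregation in S5 is an arithmetic mean over the n per-patent similarities. A minimal sketch, with a hypothetical function name and an explicit error for an enterprise that has no patent abstracts (a case the patent does not discuss):

```python
def patent_set_correlation(similarities):
    """Arithmetic mean of the per-patent similarities S_1, ..., S_n."""
    if not similarities:
        raise ValueError("the target enterprise has no patent abstracts")
    return sum(similarities) / len(similarities)

# Toy similarities for an enterprise with three patents.
S = patent_set_correlation([0.9, 0.5, 0.1])
```

Averaging means that a few patents unrelated to the business lower the score without driving it to zero, which matches the stated goal of evaluating the patent set as a whole.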
Experimental verification:
The accuracy of the embodiment of the invention was verified with a concrete example, with γ = 2/3 and δ = 1/3. One thousand small and medium-sized science and technology enterprises in Anhui Province were selected, and their patent set data and enterprise description data were collected. Three researchers were invited to label the data by judging the correlation between each enterprise's patent set and its business. As a comparison experiment, text vectorization was performed with the existing method based on keyword TF-IDF weights alone. The experimental results are shown in Table 1.
TABLE 1. Performance of the patent and enterprise business correlation measurement

Metric                                  TF-IDF    TF-IDF + position weight
Mean Square Error (MSE)                 0.0085    0.0021
Root Mean Square Error (RMSE)           0.0923    0.0457
Mean Absolute Error (MAE)               0.0792    0.0347
Mean Absolute Percent Error (MAPE)      0.2776    0.1214
The results show that, on all four metrics (mean square error, root mean square error, mean absolute error, and mean absolute percent error), combining the position weight with the TF-IDF weight effectively improves the accuracy of the correlation measurement between the enterprise patent set and the business compared with the existing method of text vectorization based on TF-IDF weights alone.
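The four metrics in Table 1 follow their standard definitions, sketched below for a list of predicted correlations and a list of expert labels. How the three researchers' labels were combined into one reference value is not described in the text, so `labelled` is simply assumed to be one number per enterprise; MAPE additionally assumes that no label is zero. The sample values are invented.

```python
import math

def error_metrics(predicted, labelled):
    """Return MSE, RMSE, MAE and MAPE of predictions against reference labels."""
    errors = [p - y for p, y in zip(predicted, labelled)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the label, so labels must be non-zero.
    mape = sum(abs(e) / abs(y) for e, y in zip(errors, labelled)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}

# Invented values for two enterprises.
metrics = error_metrics([0.5, 0.8], [0.4, 1.0])
```

As a consistency check on Table 1, RMSE is the square root of MSE: the square roots of 0.0085 and 0.0021 are about 0.0922 and 0.0458, which agree with the reported 0.0923 and 0.0457 up to rounding of the MSE values.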
Example 2
The invention also provides an enterprise patent set and business correlation measurement system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method.
It can be understood that the system for measuring correlation between enterprise patent sets and businesses provided in the embodiments of the present invention corresponds to the method for measuring correlation between enterprise patent sets and businesses, and the explanation, examples, and beneficial effects of the relevant contents may refer to the corresponding contents in the method for measuring correlation between enterprise patent sets and businesses, which are not described herein again.
In summary, compared with the prior art, the invention has the following beneficial effects:
The method for measuring the correlation between the enterprise patent set and the business takes both the patent semantic information and the enterprise semantic information into consideration, and can effectively measure the correlation between an enterprise patent set and the enterprise's business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments. In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for measuring correlation between an enterprise patent set and a business is characterized by comprising the following steps:
training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;
calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;
and calculating the correlation between the enterprise patent sets and the business based on the similarity.
2. The method as claimed in claim 1, wherein the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.
3. The method for measuring the correlation between the enterprise patent set and the business as claimed in claim 1, wherein the training of the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract text and the enterprise business description text comprises:
preprocessing the enterprise patent abstract text to obtain a patent abstract corpus;
training a Word2vec model based on the patent abstract corpus to obtain the patent abstract word vector model;
preprocessing the enterprise business description text to obtain an enterprise business corpus;
and training a Word2vec model based on the enterprise business corpus to obtain the enterprise description text word vector model.
4. The method for measuring correlation between enterprise patent sets and businesses as claimed in claim 1, wherein said generating target enterprise business vectors based on the trained enterprise description text word vector model and the target enterprise business description text comprises:
acquiring a target enterprise business description text, and preprocessing the target enterprise business description text;
performing keyword extraction on the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain the business keywords and business keyword weights of the target enterprise;
using the business keywords of the target enterprise as the input of the trained enterprise description text word vector model to generate the business keyword vectors of the target enterprise;
and performing a weighted average of the business keyword vectors of the target enterprise based on the business keyword weights of the target enterprise to obtain the target enterprise business vector.
5. The method as claimed in claim 4, wherein determining the business keyword weight comprises:
acquiring the TF-IDF value of the j-th business keyword of the target enterprise business description text as the weight of the j-th business keyword.
6. The method for measuring correlation between an enterprise patent set and a business as claimed in claim 1, wherein said generating a target enterprise patent abstract vector based on a trained patent abstract word vector model and a target enterprise patent abstract text comprises:
acquiring a target enterprise patent abstract text, and preprocessing the target enterprise patent abstract text;
performing keyword extraction on the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain the abstract keywords and abstract keyword weights of the target enterprise;
using the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate the abstract keyword vectors of the target enterprise;
and performing a weighted average of the abstract keyword vectors of the target enterprise based on the abstract keyword weights of the target enterprise to obtain the target enterprise patent abstract vector.
7. The method as claimed in claim 6, wherein the abstract keyword weight is determined as follows:
for an abstract keyword that appears in the first sentence of the patent abstract, the weight is
[formula images FDA0003657530230000021 and FDA0003657530230000022, not reproduced in this text]
otherwise, the weight of the abstract keyword is
[formula image FDA0003657530230000023, not reproduced in this text]
wherein
[symbol image FDA0003657530230000024] denotes the weight of the k-th abstract keyword of the i-th patent abstract p_i of the target enterprise;
[symbol image FDA0003657530230000025] denotes the TF-IDF value of the k-th abstract keyword of the i-th patent abstract p_i of the target enterprise;
γ and δ are coefficients, with γ + δ = 1 and γ > δ.
8. The method for measuring the correlation between the enterprise patent sets and the business as claimed in claim 1, wherein the calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector comprises:
calculating the cosine distance between the target enterprise business vector and each target enterprise patent abstract vector of the target enterprise as the similarity.
9. The method for measuring the correlation between the enterprise patent set and the business as claimed in claim 1, wherein the calculating the correlation between the enterprise patent set and the business based on the similarity comprises:
taking the average value of all the similarities as the correlation between the enterprise patent set and the business.
10. A system for measuring the correlation between an enterprise patent set and a business, the system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1-9.
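
The scoring steps recited in claims 4, 5, 8 and 9 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function names are hypothetical, the smoothed TF-IDF variant is an assumption (the claims do not fix one), and the word vectors are assumed to be supplied as plain dictionaries taken from already trained Word2vec models. The position-dependent weights of claim 7 are not modelled, because the formula images are not reproduced in this text; any such weights could be passed to weighted_average in place of the plain TF-IDF values.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus_tokens):
    """Return {term: TF-IDF value} for one tokenized document against a tokenized corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus_tokens if term in doc)
        # Smoothed IDF: an assumption, the claims do not specify the TF-IDF variant.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (count / len(doc_tokens)) * idf
    return weights


def weighted_average(word_vectors, weights):
    """Weighted average of keyword vectors; keywords without a vector are skipped."""
    terms = [t for t in weights if t in word_vectors]
    total = sum(weights[t] for t in terms)
    if not terms or total == 0:
        raise ValueError("no keyword has both a weight and a vector")
    dim = len(word_vectors[terms[0]])
    return [
        sum(weights[t] * word_vectors[t][d] for t in terms) / total
        for d in range(dim)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def patent_set_correlation(business_vec, abstract_vecs):
    """Mean similarity between the business vector and every patent abstract vector."""
    if not abstract_vecs:
        raise ValueError("the patent set is empty")
    sims = [cosine_similarity(business_vec, v) for v in abstract_vecs]
    return sum(sims) / len(sims)
```

In use, tfidf_weights would be called once on the business description and once per patent abstract, weighted_average would turn each weight dictionary into a vector using the corresponding word vector table, and patent_set_correlation would reduce the per-patent similarities to a single score.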
CN202210563931.7A 2022-05-23 2022-05-23 Enterprise patent set and business correlation measuring method and system Pending CN115146912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210563931.7A CN115146912A (en) 2022-05-23 2022-05-23 Enterprise patent set and business correlation measuring method and system

Publications (1)

Publication Number Publication Date
CN115146912A true CN115146912A (en) 2022-10-04

Family

ID=83406543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210563931.7A Pending CN115146912A (en) 2022-05-23 2022-05-23 Enterprise patent set and business correlation measuring method and system

Country Status (1)

Country Link
CN (1) CN115146912A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523473A (en) * 2023-06-29 2023-08-01 湖南省拾牛网络科技有限公司 Similar enterprise-based item matching method, device, equipment and medium
CN116523473B (en) * 2023-06-29 2023-08-25 湖南省拾牛网络科技有限公司 Similar enterprise-based item matching method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN109670191B (en) Calibration optimization method and device for machine translation and electronic equipment
CN110032641B (en) Method and device for extracting event by using neural network and executed by computer
WO2024131111A1 (en) Intelligent writing method and apparatus, device, and nonvolatile readable storage medium
CN109033132B (en) Method and device for calculating text and subject correlation by using knowledge graph
CN111694940A (en) User report generation method and terminal equipment
CN111930931B (en) Abstract evaluation method and device
CN107908698A (en) A kind of theme network crawler method, electronic equipment, storage medium, system
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN112347758A (en) Text abstract generation method and device, terminal equipment and storage medium
CN112395875A (en) Keyword extraction method, device, terminal and storage medium
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN112434514A (en) Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment
CN115099310A (en) Method and device for training model and classifying enterprises
CN115146912A (en) Enterprise patent set and business correlation measuring method and system
CN112562736B (en) Voice data set quality assessment method and device
CN110399477A (en) A kind of literature summary extracting method, equipment and can storage medium
CN111881264B (en) Method and electronic equipment for searching long text in question-answering task in open field
CN112668305B (en) Attention mechanism-based thesis reference quantity prediction method and system
CN116522912B (en) Training method, device, medium and equipment for package design language model
CN112668838A (en) Scoring standard word bank establishing method and device based on natural language analysis
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN110705287B (en) Method and system for generating text abstract
CN109558481B (en) Method, device and equipment for measuring correlation between patent and enterprise and readable storage medium
CN111178038B (en) Document similarity recognition method and device based on latent semantic analysis
CN108733824B (en) Interactive theme modeling method and device considering expert knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination