CN115146912A - Enterprise patent set and business correlation measuring method and system - Google Patents
Enterprise patent set and business correlation measuring method and system
- Publication number
- CN115146912A (application CN202210563931.7A)
- Authority
- CN
- China
- Prior art keywords
- enterprise
- business
- abstract
- target
- target enterprise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Tourism & Hospitality (AREA)
- Strategic Management (AREA)
- Artificial Intelligence (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Technology Law (AREA)
- Game Theory and Decision Science (AREA)
- Quality & Reliability (AREA)
- Probability & Statistics with Applications (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a system for measuring the correlation between an enterprise patent set and the enterprise's business, and relates to the technical field of computer information processing. The embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. The two trained models are then applied to the patent abstracts and the business description of a target enterprise to obtain the target enterprise patent abstract vectors and the target enterprise business vector. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all patent abstracts to the business are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
Description
Technical Field
The invention relates to the technical field of computer information processing, in particular to a method and a system for measuring correlation between an enterprise patent set and a business.
Background
Patents are important intangible assets of an enterprise and play an important role in its production and operation. A high correlation between an enterprise's patents and its business indicates, on the one hand, that the enterprise has the technical basis to commercialize the patents, so the patents have greater potential to bring economic benefits; on the other hand, it reflects the authority of the patents. Judging the correlation between enterprise patents and business has therefore become an important step in assessing the strength of a science and technology enterprise.
At present, the relevance between patents and business is mostly judged manually on the basis of expert experience, which is highly subjective and inefficient. A technique capable of determining the correlation between enterprise patents and business is therefore needed.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides a method and a system for measuring the correlation between an enterprise patent set and a business, and solves the problems of strong subjectivity and low efficiency of the conventional method for judging the correlation between the patent set and the business of the enterprise.
(II) Technical solution
In order to achieve the above purpose, the invention is realized by the following technical solution:
In a first aspect, a method for measuring correlation between an enterprise patent set and a business is provided, where the method includes:
training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;
calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;
and calculating the correlation between the enterprise patent sets and the business based on the similarity.
Further, the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.
Further, the training of the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract text and the enterprise business description text includes:
preprocessing an enterprise patent abstract text to obtain a patent abstract corpus;
training a Word2vec model based on the patent abstract corpus to obtain a patent abstract Word vector model;
preprocessing an enterprise business description text to obtain an enterprise business corpus;
and training a Word2vec model based on the enterprise business corpus to obtain an enterprise description text Word vector model.
Further, the generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text includes:
acquiring a target enterprise business description text and preprocessing the target enterprise business description text;
extracting key words from the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain business key words and business key word weights of the target enterprise;
the business keywords of the target enterprise are used as the input of the trained enterprise description text word vector model to generate the business keyword vector of the target enterprise;
and carrying out weighted average on the business keyword vectors of the target enterprise based on the business keyword weight of the target enterprise to obtain the business vectors of the target enterprise.
Further, the business keyword weight is obtained by:
acquiring the TF-IDF value of the jth business keyword of the target enterprise business description text as the weight of the jth business keyword.
Further, generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text includes:
acquiring a target enterprise patent abstract text, and preprocessing the target enterprise patent abstract text;
extracting key words of the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain abstract key words and abstract key word weights of the target enterprise;
taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate an abstract keyword vector of the target enterprise;
and carrying out weighted average on the abstract keyword vectors of the target enterprises based on the abstract keyword weights of the target enterprises to obtain the abstract vectors of the target enterprises.
Further, the weight of the abstract keyword comprises:
the weight of an abstract keyword located in the first sentence of the patent abstract is $\omega_{p_i}^{k} = \gamma \cdot tfidf_{p_i}^{k}$; otherwise, the weight of the abstract keyword is $\omega_{p_i}^{k} = \delta \cdot tfidf_{p_i}^{k}$;
wherein $\omega_{p_i}^{k}$ represents the weight of the $k$th abstract keyword of the $i$th patent abstract $p_i$ of the target enterprise;
$tfidf_{p_i}^{k}$ represents the TF-IDF value of the $k$th abstract keyword of the $i$th patent abstract $p_i$ of the target enterprise;
$\gamma$ and $\delta$ are coefficients, with $\gamma + \delta = 1$ and $\gamma > \delta$.
Further, the calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector includes:
and calculating the cosine distance between the target enterprise business vector and each target enterprise patent abstract vector of the target enterprise as the similarity.
Further, the calculating the correlation between the enterprise patent set and the business based on the similarity comprises:
and taking the average value of all the similarity degrees as the correlation between the enterprise patent set and the business.
In a second aspect, an enterprise patent set and business correlation measurement system is provided, the system includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method when executing the computer program.
(III) Advantageous effects
The invention provides a method and a system for measuring correlation between an enterprise patent set and a business. Compared with the prior art, the method has the following beneficial effects:
The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. The two trained models are then applied to the patent abstracts and the business description of a target enterprise to obtain the target enterprise patent abstract vectors and the target enterprise business vector. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all patent abstracts to the business are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the application solves the problems of strong subjectivity and low efficiency of the existing patent and enterprise business correlation judgment method by providing the enterprise patent set and business correlation measurement method and system.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. The two trained models are then applied to the patent abstracts and the business description of a target enterprise to obtain the target enterprise patent abstract vectors and the target enterprise business vector. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all patent abstracts to the business are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
As shown in FIG. 1, the present invention provides a method for measuring correlation between an enterprise patent set and a business, the method includes:
training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;
calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;
and calculating the correlation between the enterprise patent sets and the business based on the similarity.
The beneficial effects of this embodiment are:
The embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. The two trained models are then applied to the patent abstracts and the business description of a target enterprise to obtain the target enterprise patent abstract vectors and the target enterprise business vector. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all patent abstracts to the business are combined to give the correlation between the enterprise patent set and the business. Moreover, manual intervention is avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
The following describes the implementation process of the embodiment of the present invention in detail:
S1, training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text.
In specific implementation, the construction process of the patent abstract word vector model comprises the following steps:
S101, collecting enterprise patent abstract texts and enterprise business description texts as the original corpus;
S102, preprocessing the enterprise patent abstract texts in the original corpus, for example by word segmentation and stop-word removal, to obtain a patent abstract corpus;
S103, taking the patent abstract corpus as the input of a Word2vec model and training it in the conventional manner to obtain the patent abstract word vector model.
Word2vec is a group of related models used to generate word vectors. After training is completed, the Word2vec model maps each word to a vector that can be used to represent word-to-word relationships; this vector corresponds to the hidden layer of the neural network.
Similar to the patent abstract word vector model, the construction process of the enterprise description text word vector model comprises the following steps:
S104, preprocessing the enterprise business description texts in the original corpus, for example by word segmentation and stop-word removal, to obtain an enterprise business corpus;
S105, taking the enterprise business corpus as the input of a Word2vec model and training it in the conventional manner to obtain the enterprise description text word vector model.
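The preprocessing in S102 and S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes whitespace-separated text and a tiny hypothetical stop-word list (`STOP_WORDS`), whereas Chinese patent text would need a dedicated word segmenter. The function names `preprocess` and `build_corpus` are illustrative only.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; a real corpus would use a much larger one.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "and", "for", "to", "is"}


def preprocess(text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Split a text into lowercase tokens and drop stop words."""
    # Whitespace splitting stands in for real word segmentation (assumption).
    tokens = [tok.strip(".,;:()").lower() for tok in text.split()]
    return [tok for tok in tokens if tok and tok not in stop_words]


def build_corpus(texts: Iterable[str]) -> List[List[str]]:
    """Turn raw texts into one token list per document."""
    return [preprocess(t) for t in texts]
```

Token lists of this shape, one per document, are the kind of input a Word2vec implementation is typically trained on; a separate corpus would be built for the patent abstracts and for the business descriptions.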
S2, generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text.
In specific implementation, the method can be realized by adopting the following steps:
S201, acquiring the business description text of the target enterprise and preprocessing it; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;
S202, extracting keywords from the preprocessed target enterprise business description text based on the TF-IDF algorithm to obtain the business keywords and business keyword weights of the target enterprise;
Let $t$ denote the number of keywords of the business description text $b$ of the target enterprise, and let $tfidf_b^{j}$ denote the TF-IDF value of the $j$th keyword of $b$.
The TF-IDF values of all business keywords of the target enterprise can be written as:
$$tfidf_b = [tfidf_b^{1}, tfidf_b^{2}, \dots, tfidf_b^{j}, \dots, tfidf_b^{t}]$$
Specifically, the inventors found that business description texts have no regular structure, so position weights are not considered for them. The TF-IDF value of the $j$th business keyword of the target enterprise business description text is therefore used directly as the weight of the $j$th business keyword.
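The keyword extraction in S202 can be sketched in plain Python as below. The source does not state which TF-IDF variant is used, so the smoothed inverse document frequency here is an assumption, and `tfidf_keywords` and its parameters are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_keywords(
    doc: List[str], corpus: List[List[str]], top_t: int
) -> List[Tuple[str, float]]:
    """Return the top_t (keyword, TF-IDF) pairs of doc, highest score first."""
    n_docs = len(corpus)
    counts = Counter(doc)
    scores: Dict[str, float] = {}
    for term, count in counts.items():
        tf = count / len(doc)  # term frequency within this document
        df = sum(1 for d in corpus if term in d)  # documents containing term
        # Smoothed IDF; the exact variant is an assumption, not from the source.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_t]
```

The returned scores play the role of the business keyword weights: for business descriptions they are used as-is, with no position adjustment.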
S203, taking the business keywords of the target enterprise as the input of the trained enterprise description text word vector model to generate the business keyword vectors of the target enterprise, written as:
$$w_b = [w_b^{1}, w_b^{2}, \dots, w_b^{j}, \dots, w_b^{t}]$$
where $w_b^{j}$ denotes the word vector of the $j$th keyword of the business description text $b$ of the target enterprise.
S204, carrying out a weighted average of the business keyword vectors of the target enterprise based on the business keyword weights of the target enterprise to obtain the business vector $\beta$ of the target enterprise. With $tfidf_b^{j}$ the TF-IDF weight of the $j$th business keyword and $t$ the number of keywords, the calculation formula is as follows:
$$\beta = \frac{\sum_{j=1}^{t} tfidf_b^{j}\, w_b^{j}}{\sum_{j=1}^{t} tfidf_b^{j}}$$
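As an illustration of the weighted average in S204, the sketch below combines keyword vectors using their weights. It assumes the weights are normalised by their sum, which is the usual meaning of a weighted average but is an assumption here; `weighted_average` is a hypothetical helper name.

```python
from typing import List, Sequence


def weighted_average(
    vectors: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of equal-length vectors, normalised by the weight sum."""
    if len(vectors) != len(weights) or not vectors:
        raise ValueError("need one weight per vector and at least one vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    # Accumulate each dimension separately, then divide by the total weight.
    return [
        sum(w * vec[d] for vec, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]
```

For example, two keyword vectors `[1.0, 0.0]` and `[0.0, 1.0]` with weights 3 and 1 give a business vector that leans toward the first keyword. The same helper applies unchanged to the patent abstract vectors.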
S3, generating target enterprise patent abstract vectors based on the trained patent abstract word vector model and the target enterprise patent abstract text.
S301, acquiring the target enterprise patent abstract text and preprocessing it; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;
S302, extracting keywords from the preprocessed target enterprise patent abstract text based on the TF-IDF algorithm to obtain the abstract keywords and abstract keyword weights of the target enterprise;
Specifically, the patent abstract set of the target enterprise is denoted $P = \{p_1, p_2, \dots, p_n\}$, where $p_i$ is the $i$th patent abstract in $P$. The TF-IDF values of all keywords of the $i$th patent abstract $p_i$ of the target enterprise can be written as:
$$tfidf_{p_i} = [tfidf_{p_i}^{1}, tfidf_{p_i}^{2}, \dots, tfidf_{p_i}^{k}, \dots, tfidf_{p_i}^{m}]$$
where $tfidf_{p_i}^{k}$ represents the TF-IDF value of the $k$th keyword of the patent abstract $p_i$ of the target enterprise, and $m$ represents the number of keywords of $p_i$.
Further considering the influence of the position of an abstract keyword within the abstract on its weight, the inventors found that setting the weight in the following way effectively improves accuracy:
the weight of an abstract keyword located in the first sentence of the patent abstract is $\omega_{p_i}^{k} = \gamma \cdot tfidf_{p_i}^{k}$; otherwise, the weight of the abstract keyword is $\omega_{p_i}^{k} = \delta \cdot tfidf_{p_i}^{k}$;
where $\omega_{p_i}^{k}$ represents the weight of the $k$th abstract keyword of the $i$th patent abstract $p_i$ of the target enterprise;
$tfidf_{p_i}^{k}$ represents the TF-IDF value of the $k$th abstract keyword of the $i$th patent abstract $p_i$ of the target enterprise;
$\gamma$ and $\delta$ are coefficients, with $\gamma + \delta = 1$ and $\gamma > \delta$.
The keyword weights of the target enterprise patent abstract $p_i$ can then be written as:
$$\omega_{p_i} = [\omega_{p_i}^{1}, \omega_{p_i}^{2}, \dots, \omega_{p_i}^{k}, \dots, \omega_{p_i}^{m}]$$
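A small sketch of the position weighting follows. It assumes the position coefficient multiplies the TF-IDF value, which is an assumption about the exact form of the weighting; the defaults $\gamma = 2/3$ and $\delta = 1/3$ are the values reported in the experimental verification. `position_weights` and its arguments are hypothetical names.

```python
from typing import Dict, Set


def position_weights(
    tfidf: Dict[str, float],
    first_sentence_terms: Set[str],
    gamma: float = 2 / 3,
    delta: float = 1 / 3,
) -> Dict[str, float]:
    """Scale TF-IDF by gamma for first-sentence keywords, by delta otherwise."""
    # Enforce the stated constraints: gamma + delta = 1 and gamma > delta.
    if abs(gamma + delta - 1.0) > 1e-9 or gamma <= delta:
        raise ValueError("require gamma + delta == 1 and gamma > delta")
    return {
        term: (gamma if term in first_sentence_terms else delta) * value
        for term, value in tfidf.items()
    }
```

With the default coefficients, a keyword in the first sentence keeps twice as much of its TF-IDF score as a keyword elsewhere in the abstract.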
S303, taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate the abstract keyword vectors of the target enterprise;
Specifically, the keyword word vectors of the $i$th patent abstract $p_i$ of the target enterprise can be written as:
$$w_{p_i} = [w_{p_i}^{1}, w_{p_i}^{2}, \dots, w_{p_i}^{k}, \dots, w_{p_i}^{m}]$$
where $w_{p_i}^{k}$ represents the word vector of the $k$th keyword of the target enterprise patent abstract $p_i$, and $m$ represents the number of keywords of $p_i$.
S304, carrying out weighted average on the abstract keyword vectors of the target enterprises based on the abstract keyword weights of the target enterprises to obtain the abstract vectors of the target enterprises.
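Let $\alpha_i$ denote the target enterprise abstract vector of the $i$th patent abstract $p_i$ of the target enterprise. With $\omega_{p_i}^{k}$ the weight and $w_{p_i}^{k}$ the word vector of the $k$th keyword of $p_i$, and $m$ the number of keywords, the calculation formula is as follows:
$$\alpha_i = \frac{\sum_{k=1}^{m} \omega_{p_i}^{k}\, w_{p_i}^{k}}{\sum_{k=1}^{m} \omega_{p_i}^{k}}$$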
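S4, calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector.
In specific implementation, the cosine distance $S_i$ between the target enterprise business vector $\beta$ and the vector $\alpha_i$ of each patent abstract $p_i$ in the target enterprise patent set $P = \{p_1, p_2, \dots, p_n\}$ is calculated, that is, the cosine of the angle between the two vectors, which serves as the similarity. The calculation formula is as follows:
$$S_i = \frac{\alpha_i \cdot \beta}{\lVert \alpha_i \rVert \, \lVert \beta \rVert}$$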
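The similarity in S4 can be sketched as a plain cosine computation between two vectors. Returning 0.0 when either vector has zero length is a convention chosen for this sketch, not something the source specifies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)
```

Identical directions give 1, orthogonal vectors give 0, so a higher value means the patent abstract is semantically closer to the business description.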
S5, calculating the correlation between the enterprise patent set and the business based on the similarity.
Specifically, the inventors found that existing approaches mostly judge the similarity to the business for a single patent at a time, which does not evaluate the capability of the enterprise well; the correlation between the whole enterprise patent set and the business needs to be considered. The average value $S$ of the cosine distances between the target enterprise business vector and all of its patent abstract vectors is therefore calculated to determine the correlation between the enterprise patent set and the business. With $S_i$ the similarity of the $i$th patent abstract and $n$ the number of patent abstracts, the specific formula is as follows:
$$S = \frac{1}{n} \sum_{i=1}^{n} S_i$$
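The aggregation in S5 is an arithmetic mean over the per-patent similarities, sketched below; `patent_set_correlation` is a hypothetical name, and the input values in the usage note are illustrative rather than taken from the experiment.

```python
from typing import Sequence


def patent_set_correlation(similarities: Sequence[float]) -> float:
    """Mean of the per-patent similarities to the business vector."""
    if not similarities:
        raise ValueError("the patent set must contain at least one patent")
    return sum(similarities) / len(similarities)
```

For instance, three patents with similarities 0.8, 0.6 and 0.4 would yield a set-level correlation of about 0.6.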
Experimental verification:
The accuracy of the embodiment of the invention is verified with a specific example, with $\gamma = 2/3$ and $\delta = 1/3$. One thousand small and medium-sized science and technology enterprises in Anhui Province were selected, and their patent set data and enterprise description data were collected. Three researchers were invited to label the data by judging the correlation between each enterprise's patent set and its business. Text vectorization with the existing method, which is based on keyword TF-IDF weights alone, served as the comparison experiment. The results are shown in Table 1.
TABLE 1 Performance of the patent and enterprise business relevance measure
Metric | TF-IDF | TF-IDF + location weight |
Mean Square Error (MSE) | 0.0085 | 0.0021 |
Root Mean Square Error (RMSE) | 0.0923 | 0.0457 |
Mean Absolute Error (MAE) | 0.0792 | 0.0347 |
Mean Absolute Percent Error (MAPE) | 0.2776 | 0.1214 |
The results show that, under all four indexes (mean square error, root mean square error, mean absolute error and mean absolute percentage error), combining the position weight with the TF-IDF weight effectively improves the accuracy of the correlation measurement between the enterprise patent set and the business, compared with the existing method of text vectorization based on TF-IDF weights alone.
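The four error indexes in Table 1 can be computed as in the sketch below, which compares predicted correlations against reference labels. It assumes MAPE is expressed as a fraction rather than a percentage, in line with the magnitudes in the table; the function name and any sample inputs are illustrative and are not the experimental data.

```python
import math
from typing import Dict, Sequence


def error_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MSE, RMSE, MAE and MAPE (as a fraction) for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the reference value, so every y_true must be non-zero.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}
```

RMSE is the square root of MSE, which is consistent with the paired values in each column of the table (for example, 0.0085 and 0.0923).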
Example 2
The invention also provides an enterprise patent set and business correlation measurement system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method.
It can be understood that the system for measuring correlation between enterprise patent sets and businesses provided in the embodiments of the present invention corresponds to the method for measuring correlation between enterprise patent sets and businesses, and the explanation, examples, and beneficial effects of the relevant contents may refer to the corresponding contents in the method for measuring correlation between enterprise patent sets and businesses, which are not described herein again.
In summary, compared with the prior art, the invention has the following beneficial effects:
The method for measuring the correlation between the enterprise patent set and the business takes both patent semantic information and enterprise semantic information into account, and can effectively measure the correlation between an enterprise's patent set and its business. In addition, manual intervention is avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.
It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments. In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for measuring the correlation between an enterprise patent set and a business, characterized by comprising the following steps:
training a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstract texts and enterprise business description texts;
generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;
generating target enterprise patent abstract vectors based on the trained patent abstract word vector model and the target enterprise patent abstract texts;
calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector;
and calculating the correlation between the enterprise patent set and the business based on the similarities.
2. The method as claimed in claim 1, wherein the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.
3. The method for measuring the correlation between an enterprise patent set and a business as claimed in claim 1, wherein training the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract texts and the enterprise business description texts comprises:
preprocessing the enterprise patent abstract texts to obtain a patent abstract corpus;
training a Word2vec model based on the patent abstract corpus to obtain the patent abstract word vector model;
preprocessing the enterprise business description texts to obtain an enterprise business corpus;
and training a Word2vec model based on the enterprise business corpus to obtain the enterprise description text word vector model.
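The two training steps of claim 3 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the stop-word list, the regular-expression tokenizer, and the function names `preprocess`, `build_corpus`, and `train_word_vector_model` are hypothetical and do not come from the patent, and Chinese text would need a word segmenter in place of the regular expression. The training helper assumes the third-party gensim package (version 4 or later) is installed; it is imported lazily so that the preprocessing part runs without it.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "is"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stop words.

    Stand-in for the preprocessing step; Chinese text would need a
    word segmenter here instead of a regular expression.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_corpus(texts):
    """Turn raw texts into a list of token lists, one per document."""
    return [preprocess(t) for t in texts]


def train_word_vector_model(corpus, dim=100):
    """Train a Word2vec model on a tokenized corpus (assumes gensim >= 4)."""
    from gensim.models import Word2Vec  # third-party, imported lazily
    return Word2Vec(sentences=corpus, vector_size=dim, window=5,
                    min_count=1, workers=1, seed=0)


# One corpus per model: patent abstracts and business descriptions.
patent_corpus = build_corpus(["A method for training the neural network."])
business_corpus = build_corpus(["The enterprise sells network equipment."])
```

Calling `train_word_vector_model(patent_corpus)` and `train_word_vector_model(business_corpus)` would then yield the two separate models. Because the later claims compare vectors from both models, the sketch assumes both are trained with the same `dim`.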
4. The method for measuring the correlation between an enterprise patent set and a business as claimed in claim 1, wherein generating the target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text comprises:
acquiring the target enterprise business description text, and preprocessing the target enterprise business description text;
performing keyword extraction on the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain the business keywords and business keyword weights of the target enterprise;
using the business keywords of the target enterprise as the input of the trained enterprise description text word vector model to generate the business keyword vectors of the target enterprise;
and performing a weighted average of the business keyword vectors of the target enterprise based on the business keyword weights of the target enterprise to obtain the target enterprise business vector.
5. The method as claimed in claim 4, wherein the business keyword weight is obtained by:
acquiring the TF-IDF value of the jth business keyword of the target enterprise business description text as the weight of the jth business keyword.
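The keyword extraction and weighted averaging of claims 4 and 5 can be illustrated with a small, dependency-free sketch. Several details are assumptions rather than part of the claims: the smoothed IDF formula, the `top_n` cut-off, the function names, and the toy two-dimensional `toy_vectors` dictionary, which stands in for the trained enterprise description text word vector model.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_n=5):
    """Return the top_n (keyword, TF-IDF weight) pairs for one tokenized document."""
    if not doc_tokens:
        return []
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumed variant)
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def weighted_average(vectors, weights):
    """Average the vectors, each scaled by its weight and normalized by the weight sum."""
    total = sum(weights)
    if not vectors or total == 0:
        return []
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def business_vector(doc_tokens, corpus, word_vectors, top_n=5):
    """Build a document vector from its TF-IDF keywords; unknown words are skipped."""
    keywords = [(w, s) for w, s in tfidf_keywords(doc_tokens, corpus, top_n)
                if w in word_vectors]
    return weighted_average([word_vectors[w] for w, _ in keywords],
                            [s for _, s in keywords])


# Toy data: two tokenized business descriptions and hypothetical word vectors.
corpus = [["chip", "design", "chip"], ["chip", "sales"]]
toy_vectors = {"chip": [1.0, 0.0], "design": [0.0, 1.0]}
target_vector = business_vector(corpus[0], corpus, toy_vectors)
```

In this toy case "chip" receives the larger TF-IDF weight, so the resulting vector leans toward the "chip" direction. The same three functions apply unchanged to patent abstracts when a different weighting is not required.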
6. The method for measuring the correlation between an enterprise patent set and a business as claimed in claim 1, wherein generating the target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text comprises:
acquiring the target enterprise patent abstract text, and preprocessing the target enterprise patent abstract text;
performing keyword extraction on the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain the abstract keywords and abstract keyword weights of the target enterprise;
using the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate the abstract keyword vectors of the target enterprise;
and performing a weighted average of the abstract keyword vectors of the target enterprise based on the abstract keyword weights of the target enterprise to obtain the target enterprise patent abstract vector.
7. The method as claimed in claim 6, wherein the abstract keyword weight comprises:
the weight of an abstract keyword that appears in the first sentence of the patent abstract is [formula missing from the source text]; otherwise, the weight of the abstract keyword is [formula missing from the source text];
wherein [symbol missing from the source text] denotes the weight of the kth abstract keyword of the ith patent abstract p_i of the target enterprise;
[symbol missing from the source text] denotes the TF-IDF value of the kth abstract keyword of the ith patent abstract p_i of the target enterprise;
and γ and δ are coefficients, with γ + δ = 1 and γ > δ.
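The position-dependent weighting of claim 7 can be sketched as below. The two weight formulas are not reproduced in the text of this claim, so the sketch assumes the simplest form consistent with the stated constraints: the TF-IDF value is multiplied by γ for keywords found in the first sentence and by δ otherwise. That multiplicative form, the default values 0.6 and 0.4, and the function names are all assumptions made for illustration only.

```python
import re


def first_sentence(abstract):
    """Return the text before the first sentence-ending punctuation mark."""
    return re.split(r"[.。!?]", abstract, maxsplit=1)[0]


def abstract_keyword_weights(tfidf_scores, first_sentence_tokens,
                             gamma=0.6, delta=0.4):
    """Scale each keyword's TF-IDF value by gamma or delta (assumed form).

    tfidf_scores maps keyword -> TF-IDF value. Keywords that occur in the
    first sentence get gamma * tfidf; all others get delta * tfidf.
    """
    # Enforce the constraints stated in the claim.
    if abs(gamma + delta - 1.0) > 1e-9 or gamma <= delta:
        raise ValueError("require gamma + delta = 1 and gamma > delta")
    head = set(first_sentence_tokens)
    return {word: (gamma if word in head else delta) * score
            for word, score in tfidf_scores.items()}


abstract = "A sensor is disclosed. The housing is sealed."
head_tokens = first_sentence(abstract).lower().split()
weights = abstract_keyword_weights({"sensor": 0.5, "housing": 0.5}, head_tokens)
```

Here "sensor" and "housing" start with the same TF-IDF value, but "sensor" ends up with the larger weight because it occurs in the first sentence of the abstract.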
8. The method for measuring the correlation between an enterprise patent set and a business as claimed in claim 1, wherein calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector comprises:
calculating the cosine distance between the target enterprise business vector and each target enterprise patent abstract vector of the target enterprise as the similarity.
9. The method for measuring the correlation between an enterprise patent set and a business as claimed in claim 1, wherein calculating the correlation between the enterprise patent set and the business based on the similarities comprises:
taking the average value of all the similarities as the correlation between the enterprise patent set and the business.
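Claims 8 and 9 together reduce to a per-patent similarity followed by a mean, which the following sketch illustrates. It assumes that "cosine distance" here denotes the cosine of the angle between the two vectors (so that larger values mean greater similarity) and that the business vector and the abstract vectors have the same dimension. The function names and the handling of zero vectors and empty patent sets are hypothetical choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def patent_set_relevance(business_vector, abstract_vectors):
    """Mean similarity between the business vector and every patent abstract vector."""
    sims = [cosine_similarity(business_vector, p) for p in abstract_vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Toy example: one patent aligned with the business, one orthogonal to it.
relevance = patent_set_relevance([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the toy example the two similarities are 1.0 and 0.0, so the resulting correlation for the patent set is 0.5.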
10. A system for measuring the correlation between an enterprise patent set and a business, the system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210563931.7A CN115146912A (en) | 2022-05-23 | 2022-05-23 | Enterprise patent set and business correlation measuring method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115146912A (en) | 2022-10-04 |
Family
ID=83406543
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210563931.7A Pending CN115146912A (en) | 2022-05-23 | 2022-05-23 | Enterprise patent set and business correlation measuring method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115146912A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116523473A (en) * | 2023-06-29 | 2023-08-01 | 湖南省拾牛网络科技有限公司 | Similar enterprise-based item matching method, device, equipment and medium |
CN116523473B (en) * | 2023-06-29 | 2023-08-25 | 湖南省拾牛网络科技有限公司 | Similar enterprise-based item matching method, device, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109670191B (en) | Calibration optimization method and device for machine translation and electronic equipment | |
CN110032641B (en) | Method and device for extracting event by using neural network and executed by computer | |
WO2024131111A1 (en) | Intelligent writing method and apparatus, device, and nonvolatile readable storage medium | |
CN109033132B (en) | Method and device for calculating text and subject correlation by using knowledge graph | |
CN111694940A (en) | User report generation method and terminal equipment | |
CN111930931B (en) | Abstract evaluation method and device | |
CN107908698A (en) | A kind of theme network crawler method, electronic equipment, storage medium, system | |
CN113392209A (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN112347758A (en) | Text abstract generation method and device, terminal equipment and storage medium | |
CN112395875A (en) | Keyword extraction method, device, terminal and storage medium | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN112434514A (en) | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment | |
CN115099310A (en) | Method and device for training model and classifying enterprises | |
CN115146912A (en) | Enterprise patent set and business correlation measuring method and system | |
CN112562736B (en) | Voice data set quality assessment method and device | |
CN110399477A (en) | A kind of literature summary extracting method, equipment and can storage medium | |
CN111881264B (en) | Method and electronic equipment for searching long text in question-answering task in open field | |
CN112668305B (en) | Attention mechanism-based thesis reference quantity prediction method and system | |
CN116522912B (en) | Training method, device, medium and equipment for package design language model | |
CN112668838A (en) | Scoring standard word bank establishing method and device based on natural language analysis | |
CN116186219A (en) | Man-machine dialogue interaction method, system and storage medium | |
CN110705287B (en) | Method and system for generating text abstract | |
CN109558481B (en) | Method, device and equipment for measuring correlation between patent and enterprise and readable storage medium | |
CN111178038B (en) | Document similarity recognition method and device based on latent semantic analysis | |
CN108733824B (en) | Interactive theme modeling method and device considering expert knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||