CN115146912A

CN115146912A - Enterprise patent set and business correlation measuring method and system

Info

Publication number: CN115146912A
Application number: CN202210563931.7A
Authority: CN
Inventors: 阮传宏; 田继阳; 吴胜建; 王驭; 张邦华; 徐绡绡
Original assignee: Anhui Credit Bureau Co ltd
Current assignee: Anhui Credit Bureau Co ltd
Priority date: 2022-05-23
Filing date: 2022-05-23
Publication date: 2022-10-04

Abstract

The invention provides a method and a system for measuring the correlation between an enterprise patent set and a business, and relates to the technical field of computer information processing. An embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector is generated from the business description of a target enterprise, and a target enterprise patent abstract vector is generated from each of its patent abstracts. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all the patent abstracts are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

Description

Enterprise patent set and business correlation measuring method and system

Technical Field

The invention relates to the technical field of computer information processing, in particular to a method and a system for measuring correlation between an enterprise patent set and a business.

Background

Patents are important intangible assets of an enterprise and play an important role in its production and operation. A high correlation between an enterprise's patents and its business indicates, on the one hand, that the enterprise has the technical basis to commercialize the patents, so the patents have greater potential to bring economic benefits to the enterprise; on the other hand, it also reflects the authority of the patents. Judging the correlation between an enterprise's patents and its business has therefore become an important step in assessing the strength of a science and technology enterprise.

At present, the relevance between patents and business is mostly judged manually on the basis of expert experience, which is highly subjective and inefficient. A technique capable of determining the correlation between enterprise patents and business is therefore needed.

Disclosure of Invention

(I) technical problem to be solved

Aiming at the defects of the prior art, the invention provides a method and a system for measuring the correlation between an enterprise patent set and a business, and solves the problems of strong subjectivity and low efficiency of the conventional method for judging the correlation between the patent set and the business of the enterprise.

(II) technical scheme

In order to achieve the purpose, the invention is realized by the following technical scheme:

in a first aspect, a method for measuring correlation between an enterprise patent set and a business is provided, where the method includes:

training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text;

generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text;

generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text;

calculating the similarity between each target enterprise patent abstract vector and a target enterprise business vector;

and calculating the correlation between the enterprise patent sets and the business based on the similarity.

Further, the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.

Further, the training of the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract text and the enterprise business description text includes:

preprocessing an enterprise patent abstract text to obtain a patent abstract corpus;

training a Word2vec model based on the patent abstract corpus to obtain the patent abstract word vector model;

preprocessing an enterprise business description text to obtain an enterprise business corpus;

and training a Word2vec model based on the enterprise business corpus to obtain the enterprise description text word vector model.

Further, the generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text includes:

acquiring a target enterprise business description text and preprocessing the target enterprise business description text;

extracting key words from the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain business key words and business key word weights of the target enterprise;

the business keywords of the target enterprise are used as the input of the trained enterprise description text word vector model to generate the business keyword vector of the target enterprise;

and carrying out weighted average on the business keyword vectors of the target enterprise based on the business keyword weight of the target enterprise to obtain the business vectors of the target enterprise.

Further, the business keyword weight is obtained as follows:

the TF-IDF value of the jth business keyword of the target enterprise business description text is taken as the weight of the jth business keyword.

Further, generating a target enterprise patent abstract vector based on the trained patent abstract word vector model and the target enterprise patent abstract text includes:

acquiring a target enterprise patent abstract text, and preprocessing the target enterprise patent abstract text;

extracting key words of the preprocessed target enterprise patent abstract text based on a TF-IDF algorithm to obtain abstract key words and abstract key word weights of the target enterprise;

taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate an abstract keyword vector of the target enterprise;

and carrying out weighted average on the abstract keyword vectors of the target enterprises based on the abstract keyword weights of the target enterprises to obtain the abstract vectors of the target enterprises.

Further, the abstract keyword weight is determined as follows:

the weight of an abstract keyword located in the first sentence of the patent abstract is

$$\lambda^{p_i}_{k} = \gamma \cdot \mathrm{tfidf}^{p_i}_{k}$$

otherwise, the weight of the abstract keyword is

$$\lambda^{p_i}_{k} = \delta \cdot \mathrm{tfidf}^{p_i}_{k}$$

where

$\lambda^{p_i}_{k}$ represents the weight of the kth abstract keyword of the ith patent abstract $p_i$ of the target enterprise;

$\mathrm{tfidf}^{p_i}_{k}$ represents the TF-IDF value of the kth abstract keyword of the ith patent abstract $p_i$ of the target enterprise;

γ and δ are coefficients, γ + δ = 1, γ > δ.

Further, the calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector includes:

and calculating the cosine distance between the target enterprise business vector and each target enterprise patent abstract vector of the target enterprise as the similarity.

Further, the calculating the correlation between the enterprise patent set and the business based on the similarity comprises:

and taking the average value of all the similarity degrees as the correlation between the enterprise patent set and the business.

In a second aspect, an enterprise patent set and business correlation measurement system is provided, the system includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method when executing the computer program.

(III) advantageous effects

The invention provides a method and a system for measuring correlation between an enterprise patent set and a business. Compared with the prior art, the method has the following beneficial effects:

The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector is generated from the business description of a target enterprise, and a target enterprise patent abstract vector is generated from each of its patent abstracts. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all the patent abstracts are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the application solves the problems of strong subjectivity and low efficiency of the existing patent and enterprise business correlation judgment method by providing the enterprise patent set and business correlation measurement method and system.

In order to solve the technical problems, the general idea of the embodiment of the application is as follows:

The invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector is generated from the business description of a target enterprise, and a target enterprise patent abstract vector is generated from each of its patent abstracts. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all the patent abstracts are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.

Example 1:

As shown in FIG. 1, the present invention provides a method for measuring correlation between an enterprise patent set and a business; the method includes steps S1 to S5, which are described in detail below.

The beneficial effects of this embodiment are as follows:

The embodiment of the invention first constructs and trains a patent abstract word vector model and an enterprise description text word vector model based on enterprise patent abstracts and business descriptions. Using the two trained models, a target enterprise business vector is generated from the business description of a target enterprise, and a target enterprise patent abstract vector is generated from each of its patent abstracts. The similarity between each target enterprise patent abstract vector and the target enterprise business vector is calculated, and the similarities of all the patent abstracts are combined to give the correlation between the enterprise patent set and the business. Manual intervention is thereby avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

The following describes the implementation process of the embodiment of the present invention in detail:

S1, training a patent abstract word vector model and an enterprise description text word vector model based on an enterprise patent abstract text and an enterprise business description text.

In specific implementation, the construction process of the patent abstract word vector model comprises the following steps:

S101, collecting enterprise patent abstract texts and enterprise business description texts as the original corpus;

S102, preprocessing the enterprise patent abstract texts in the original corpus, for example by word segmentation, stop-word removal and other operations, to obtain a patent abstract corpus;

S103, taking the patent abstract corpus as the input of a Word2vec model and training it in a conventional manner to obtain the patent abstract word vector model.

Word2vec is a group of related models used to generate word vectors. After training is completed, the Word2vec model can be used to map each word to a vector that can represent word-to-word relationships; this vector is taken from the hidden layer of the neural network.

Similar to the patent abstract word vector model, the construction process of the enterprise description text word vector model comprises the following steps:

S104, preprocessing the enterprise business description texts in the original corpus, for example by word segmentation, stop-word removal and other operations, to obtain an enterprise business corpus;

S105, taking the enterprise business corpus as the input of a Word2vec model and training it in a conventional manner to obtain the enterprise description text word vector model.
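The preprocessing in S102 and S104 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the stop-word list `STOP_WORDS` and the helper names `preprocess` and `build_corpus` are hypothetical, and a simple regular-expression tokenizer stands in for a real word segmenter (Chinese text would need a dedicated segmentation tool). The result is a list of token lists, which is the usual input shape for training a word vector model such as Word2vec.

```python
import re

# Hypothetical stop-word list; a real system would load a full list
# suited to the language of the corpus.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "is"}


def preprocess(text: str) -> list[str]:
    """Segment a text into lowercase tokens and drop stop words.

    The regular expression is a stand-in for a real word segmenter.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_corpus(texts: list[str]) -> list[list[str]]:
    """Turn raw texts into a corpus: one token list per non-empty text."""
    corpus = [preprocess(text) for text in texts]
    return [tokens for tokens in corpus if tokens]


# Two made-up abstracts, used only to show the shape of the corpus.
patent_abstract_corpus = build_corpus([
    "The invention provides a method for training a word vector model.",
    "A system for measuring the correlation of patents and business.",
])
```

The same two helpers would be applied separately to the business description texts, so that each word vector model is trained on its own corpus.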

S2, generating a target enterprise business vector based on the trained enterprise description text word vector model and the target enterprise business description text.

In specific implementation, the following steps can be adopted:

S201, acquiring the enterprise business description text of the target enterprise and preprocessing it; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;

S202, extracting keywords from the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain the business keywords and business keyword weights of the target enterprise;

Let $t$ denote the number of keywords of the business description text $b$ of the target enterprise, and let $\mathrm{tfidf}^{b}_{j}$ denote the TF-IDF value of the jth keyword of the business description text $b$.

The TF-IDF values of all business keywords of the target enterprise can then be written as:

$$\mathrm{tfidf}^{b} = [\mathrm{tfidf}^{b}_{1}, \mathrm{tfidf}^{b}_{2}, \ldots, \mathrm{tfidf}^{b}_{j}, \ldots, \mathrm{tfidf}^{b}_{t}]$$

Specifically, the inventors found that business description text has no strong regular structure, so position weights are not considered for it; the TF-IDF value of the jth business keyword of the target enterprise business description text is used directly as the weight of the jth business keyword.
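A minimal sketch of the TF-IDF keyword extraction in S202, assuming an already tokenized corpus. The source does not state which TF-IDF variant is used; the smoothed form below (term frequency multiplied by log((1 + N) / (1 + df)) + 1), the function name `tfidf_keywords`, and the sample corpus are assumptions made for illustration. The returned TF-IDF values serve directly as the business keyword weights.

```python
import math
from collections import Counter


def tfidf_keywords(
    doc: list[str], corpus: list[list[str]], top_t: int = 3
) -> list[tuple[str, float]]:
    """Return the top_t (keyword, TF-IDF value) pairs of one tokenized document.

    Assumed variant: tf = count / document length,
    idf = log((1 + N) / (1 + df)) + 1, with N documents in the corpus.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for tokens in corpus for word in set(tokens))
    counts = Counter(doc)
    scores = {
        word: (count / len(doc))
        * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
        for word, count in counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_t]


# Made-up tokenized business descriptions.
business_corpus = [
    ["credit", "report", "service", "credit"],
    ["software", "service"],
    ["patent", "service", "software"],
]
business_keywords = tfidf_keywords(business_corpus[0], business_corpus, top_t=2)
```

In this toy corpus the word "service" occurs in every document, so it receives the lowest score and is not selected as a keyword.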

S203, taking the business keywords of the target enterprise as the input of the trained enterprise description text word vector model to generate the business keyword vectors of the target enterprise, recorded as:

$$w^{b} = [w^{b}_{1}, w^{b}_{2}, \ldots, w^{b}_{j}, \ldots, w^{b}_{t}]$$

where $w^{b}_{j}$ denotes the jth keyword vector of the business description text $b$ of the target enterprise.

S204, carrying out a weighted average of the business keyword vectors of the target enterprise based on the business keyword weights to obtain the business vector $\beta$ of the target enterprise. The calculation formula is as follows:

$$\beta = \frac{\sum_{j=1}^{t} \mathrm{tfidf}^{b}_{j}\, w^{b}_{j}}{\sum_{j=1}^{t} \mathrm{tfidf}^{b}_{j}}$$
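The weighted average in S204 can be illustrated with a short sketch. It assumes that the average is normalized by the sum of the weights, which is the usual meaning of a weighted average but is an assumption about the embodiment; the function name `weighted_average` and the two-dimensional sample vectors are hypothetical.

```python
def weighted_average(vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of equally sized vectors, normalized by the weight sum."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need at least one vector and one weight per vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(weight * vector[d] for vector, weight in zip(vectors, weights)) / total
        for d in range(dim)
    ]


# Hypothetical 2-dimensional keyword vectors and their TF-IDF weights.
keyword_vectors = [[1.0, 0.0], [0.0, 1.0]]
keyword_weights = [0.75, 0.25]
business_vector = weighted_average(keyword_vectors, keyword_weights)
```

Because cosine similarity is insensitive to vector length, the choice of normalization does not change the similarity computed later from this vector.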

S3, generating target enterprise patent abstract vectors based on the trained patent abstract word vector model and the target enterprise patent abstract text.

S301, acquiring the target enterprise patent abstract texts and preprocessing them; specifically, the preprocessing includes operations such as word segmentation and stop-word removal;

S302, extracting keywords from the preprocessed target enterprise patent abstract texts based on a TF-IDF algorithm to obtain the abstract keywords and abstract keyword weights of the target enterprise;

Specifically, the patent abstract set of the target enterprise is denoted by $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i$ is the ith patent abstract in $P$. The TF-IDF values of all keywords of the ith patent abstract $p_i$ of the target enterprise can be written as:

$$\mathrm{tfidf}^{p_i} = [\mathrm{tfidf}^{p_i}_{1}, \mathrm{tfidf}^{p_i}_{2}, \ldots, \mathrm{tfidf}^{p_i}_{k}, \ldots, \mathrm{tfidf}^{p_i}_{m}]$$

where $\mathrm{tfidf}^{p_i}_{k}$ represents the TF-IDF value of the kth keyword of the patent abstract $p_i$ of the target enterprise, and $m$ represents the number of keywords of the patent abstract $p_i$.

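Further considering the influence of the position of an abstract keyword within the abstract on its weight, the inventors found that setting the weight in the following way achieves unexpected effects and effectively improves accuracy:

the weight of an abstract keyword located in the first sentence of the patent abstract is

$$\lambda^{p_i}_{k} = \gamma \cdot \mathrm{tfidf}^{p_i}_{k}$$

otherwise, the weight of the abstract keyword is

$$\lambda^{p_i}_{k} = \delta \cdot \mathrm{tfidf}^{p_i}_{k}$$

where $\lambda^{p_i}_{k}$ represents the weight of the kth abstract keyword of the ith patent abstract $p_i$, $\mathrm{tfidf}^{p_i}_{k}$ represents its TF-IDF value, and γ and δ are coefficients with γ + δ = 1 and γ > δ.

The keyword weights of the target enterprise patent abstract $p_i$, which has $m$ keywords, can then be recorded as:

$$\lambda^{p_i} = [\lambda^{p_i}_{1}, \lambda^{p_i}_{2}, \ldots, \lambda^{p_i}_{k}, \ldots, \lambda^{p_i}_{m}]$$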
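A minimal sketch of position-dependent keyword weighting. It assumes that the weight is the keyword's TF-IDF value scaled by γ when the keyword occurs in the first sentence of the abstract and by δ otherwise; this scaling form, the function name `position_weights`, and the sample values are assumptions of the sketch. The defaults γ = 2/3 and δ = 1/3 are the values used in the experimental verification.

```python
GAMMA = 2 / 3  # coefficient for keywords in the first sentence
DELTA = 1 / 3  # coefficient for all other keywords


def position_weights(
    keyword_tfidf: dict[str, float],
    first_sentence_tokens: set[str],
    gamma: float = GAMMA,
    delta: float = DELTA,
) -> dict[str, float]:
    """Scale each keyword's TF-IDF value by gamma or delta (assumed form)."""
    # Enforce the stated constraints: gamma + delta = 1 and gamma > delta.
    if abs(gamma + delta - 1.0) > 1e-9 or gamma <= delta:
        raise ValueError("coefficients must satisfy gamma + delta = 1, gamma > delta")
    return {
        word: (gamma if word in first_sentence_tokens else delta) * value
        for word, value in keyword_tfidf.items()
    }


# Hypothetical TF-IDF values for one abstract; "vector" is in its first sentence.
abstract_weights = position_weights({"vector": 0.6, "model": 0.3}, {"vector"})
```

With these coefficients a first-sentence keyword counts twice as much as a keyword with the same TF-IDF value that appears later in the abstract.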

S303, taking the abstract keywords of the target enterprise as the input of the trained patent abstract word vector model to generate the abstract keyword vectors of the target enterprise;

Specifically, the keyword word vectors of the ith patent abstract $p_i$ of the target enterprise can be written as:

$$w^{p_i} = [w^{p_i}_{1}, w^{p_i}_{2}, \ldots, w^{p_i}_{k}, \ldots, w^{p_i}_{m}]$$

where $w^{p_i}_{k}$ represents the word vector of the kth keyword of the target enterprise patent abstract $p_i$, and $m$ represents the number of keywords of the patent abstract $p_i$.

S304, carrying out a weighted average of the abstract keyword vectors of the target enterprise based on the abstract keyword weights to obtain the abstract vectors of the target enterprise.

Let $\alpha_i$ denote the abstract vector of the ith patent abstract $p_i$ of the target enterprise; the calculation formula is then as follows:

$$\alpha_i = \frac{\sum_{k=1}^{m} \lambda^{p_i}_{k}\, w^{p_i}_{k}}{\sum_{k=1}^{m} \lambda^{p_i}_{k}}$$

where $\lambda^{p_i}_{k}$ is the weight of the kth abstract keyword of $p_i$.

S4, calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector.

In specific implementation, the cosine distance $S_i$ between the target enterprise business vector $\beta$ and the vector $\alpha_i$ of each patent abstract $p_i$ in the target enterprise patent set $P = \{p_1, p_2, \ldots, p_n\}$ is calculated. The calculation formula is as follows:

$$S_i = \frac{\alpha_i \cdot \beta}{\lVert \alpha_i \rVert \, \lVert \beta \rVert}$$
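The similarity in S4 can be sketched as the cosine of the angle between the two vectors. The function name `cosine_similarity` is hypothetical, and returning 0.0 for a zero-length vector is a choice made for this sketch, not something stated in the source.

```python
import math


def cosine_similarity(alpha: list[float], beta: list[float]) -> float:
    """Cosine of the angle between a patent abstract vector and the business vector."""
    if len(alpha) != len(beta):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(alpha, beta))
    norm_alpha = math.sqrt(sum(a * a for a in alpha))
    norm_beta = math.sqrt(sum(b * b for b in beta))
    if norm_alpha == 0.0 or norm_beta == 0.0:
        return 0.0  # sketch-level convention for an all-zero vector
    return dot / (norm_alpha * norm_beta)


# Hypothetical business vector and three patent abstract vectors.
beta = [1.0, 0.0]
similarities = [cosine_similarity(alpha, beta) for alpha in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

A value near 1 means the abstract points in the same direction as the business description in the vector space, and a value near 0 means the two are unrelated.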

S5, calculating the correlation between the enterprise patent set and the business based on the similarity.

Specifically, the inventors found that existing approaches mostly judge the similarity between patents and the business for a single patent at a time, which cannot properly evaluate the capability of the enterprise; the correlation between the whole enterprise patent set and the business needs to be considered. The average $S$ of the cosine distances between the target enterprise business vector and all of its patent abstract vectors is therefore calculated to determine the correlation between the enterprise patent set and the business. The specific formula is as follows:

$$S = \frac{1}{n} \sum_{i=1}^{n} S_i$$
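Step S5 reduces to averaging the per-patent similarities. A minimal sketch, in which the function name `patent_set_correlation` and the sample similarity values are hypothetical:

```python
from statistics import fmean


def patent_set_correlation(similarities: list[float]) -> float:
    """Mean of the per-patent similarities S_1..S_n for one enterprise."""
    if not similarities:
        raise ValueError("the patent set must contain at least one abstract")
    return fmean(similarities)


# Made-up similarities for an enterprise with three patents.
correlation = patent_set_correlation([0.82, 0.64, 0.91])
```

Using the mean lets a few weakly related patents lower the score without dominating it, which matches the stated aim of judging the patent set as a whole rather than a single patent.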

Experimental verification:

The accuracy of the embodiment of the invention was verified with a specific example, with γ = 2/3 and δ = 1/3. One thousand small and medium-sized science and technology enterprises in Anhui province were selected, and their patent set data and enterprise description data were collected. Three researchers were invited to label the data by judging the correlation between each enterprise's patent set and its business. As a comparison experiment, text vectorization was performed with the existing method based on keyword TF-IDF weights alone. The experimental results are shown in Table 1.

TABLE 1 Patent and enterprise relevance measurement performance

Metric	TF-IDF	TF-IDF + location weight
Mean Square Error (MSE)	0.0085	0.0021
Root Mean Square Error (RMSE)	0.0923	0.0457
Mean Absolute Error (MAE)	0.0792	0.0347
Mean Absolute Percent Error (MAPE)	0.2776	0.1214

The results show that, on all four indexes (mean square error, root mean square error, mean absolute error and mean absolute percentage error), combining the position weight with the TF-IDF weight effectively improves the accuracy of the correlation measurement between the enterprise patent set and the business, compared with the existing method of text vectorization based on TF-IDF weights alone.
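For reference, the four error indexes can be computed from predicted correlations and the researchers' labels as in the sketch below. The function name `error_metrics` and the sample numbers are hypothetical, and MAPE is expressed as a fraction rather than a percentage, which is an assumption based on the magnitude of the values reported in Table 1.

```python
import math


def error_metrics(predicted: list[float], labelled: list[float]) -> dict[str, float]:
    """MSE, RMSE, MAE and MAPE between predicted and labelled correlations."""
    if not predicted or len(predicted) != len(labelled):
        raise ValueError("need two non-empty lists of equal length")
    if any(label == 0 for label in labelled):
        raise ValueError("MAPE is undefined when a label is zero")
    n = len(predicted)
    errors = [p - y for p, y in zip(predicted, labelled)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(e) / abs(y) for e, y in zip(errors, labelled)) / n,
    }


# Made-up predictions and labels, not data from the experiment.
metrics = error_metrics([0.5, 0.8], [0.4, 1.0])
```

Since RMSE is the square root of MSE, the two rows of Table 1 can be checked against each other: the square roots of 0.0085 and 0.0021 are about 0.092 and 0.046, in line with the reported RMSE values.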

Example 2

The invention also provides an enterprise patent set and business correlation measurement system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method.

It can be understood that the system for measuring correlation between enterprise patent sets and businesses provided in the embodiments of the present invention corresponds to the method for measuring correlation between enterprise patent sets and businesses, and the explanation, examples, and beneficial effects of the relevant contents may refer to the corresponding contents in the method for measuring correlation between enterprise patent sets and businesses, which are not described herein again.

In summary, compared with the prior art, the invention has the following beneficial effects:

The method for measuring the correlation between the enterprise patent set and the business takes both patent semantic information and enterprise semantic information into consideration, and can effectively measure the correlation between an enterprise patent set and the enterprise's business. Manual intervention is avoided, labor cost is reduced, and the efficiency of correlation measurement is improved in a big-data setting.

It should be noted that, through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform. With this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments. In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for measuring correlation between an enterprise patent set and a business is characterized by comprising the following steps:

2. The method as claimed in claim 1, wherein the enterprise business description text includes: enterprise profiles, business boundaries, product introductions, contest introductions, and company business profiles.

3. The method for measuring the correlation between the enterprise patent set and the business as claimed in claim 1, wherein the training of the patent abstract word vector model and the enterprise description text word vector model based on the enterprise patent abstract text and the enterprise business description text comprises:

preprocessing an enterprise patent abstract text to obtain a patent abstract corpus;

4. The method for measuring correlation between enterprise patent sets and businesses as claimed in claim 1, wherein said generating target enterprise business vectors based on the trained enterprise description text word vector model and the target enterprise business description text comprises:

acquiring a target enterprise business description text, and preprocessing the target enterprise business description text;

performing keyword extraction on the preprocessed target enterprise business description text based on a TF-IDF algorithm to obtain business keywords and business keyword weight of the target enterprise;

5. The method as claimed in claim 4, wherein the business keyword weight comprises:

6. The method for measuring correlation between an enterprise patent set and a business as claimed in claim 1, wherein said generating a target enterprise patent abstract vector based on a trained patent abstract word vector model and a target enterprise patent abstract text comprises:

7. The method as claimed in claim 6, wherein the abstract keyword weight comprises:

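the weight of an abstract keyword located in the first sentence of the patent abstract is

$$\lambda^{p_i}_{k} = \gamma \cdot \mathrm{tfidf}^{p_i}_{k}$$

otherwise, the weight of the abstract keyword is

$$\lambda^{p_i}_{k} = \delta \cdot \mathrm{tfidf}^{p_i}_{k}$$

where

$\lambda^{p_i}_{k}$ represents the weight of the kth abstract keyword of the ith patent abstract $p_i$ of the target enterprise;

$\mathrm{tfidf}^{p_i}_{k}$ represents the TF-IDF value of the kth abstract keyword of the ith patent abstract $p_i$ of the target enterprise;

γ and δ are coefficients, γ + δ = 1, γ > δ.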

8. The method for measuring the correlation between the enterprise patent sets and the business as claimed in claim 1, wherein the calculating the similarity between each target enterprise patent abstract vector and the target enterprise business vector comprises:

9. The method for measuring the correlation between the enterprise patent set and the business as claimed in claim 1, wherein the calculating the correlation between the enterprise patent set and the business based on the similarity comprises:

10. An enterprise patent collection and business correlation measurement system, the system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method of any of claims 1-9.