CN109558481A - Patent-Enterprise Relevance Measurement Method, Device, Equipment and Readable Storage Medium - Google Patents


Info

Publication number
CN109558481A
Authority
CN
China
Prior art keywords
enterprise
text
characteristic word
frequency
paper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811466764.4A
Other languages
Chinese (zh)
Other versions
CN109558481B (en)
Inventor
高影繁
刘志辉
姚长青
李岩
崔笛
郑明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Original Assignee
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Priority to CN201811466764.4A
Publication of CN109558481A
Application granted
Publication of CN109558481B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method, device, equipment and readable storage medium for measuring the relevance between patents and enterprises. The method comprises: obtaining the patent feature words in an enterprise patent text; determining the weight of each patent feature word in the enterprise patent text; determining, according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs, the association frequency between each patent feature word and the enterprise; and determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise. Determining this relevance with the scheme of the present application avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of patent-enterprise relevance judgments.

Description

Patent-Enterprise Relevance Measurement Method, Device, Equipment and Readable Storage Medium
Technical field
This application relates to the technical field of information processing, and in particular to a method, apparatus, equipment and readable storage medium for measuring the relevance between patents and enterprises.
Background
Patents have become an embodiment of an enterprise's key technologies and core competitiveness, the exploitation of patents plays an increasingly important role in enterprise development, and competition over patent rights has become one of the effective competitive strengths of an enterprise, a region and even a country. Against this background, enterprises attach growing importance to patent applications, and the annual number of patent applications in China keeps rising. However, as the number of patents grows, patent quality problems are emerging. On the one hand, the state has formulated policies that support and encourage patent applications, and some enterprises file patents carelessly to obtain policy rewards: the number of patents increases, but their quality and influence in the field are low, they contribute nothing to enterprise development or profit, and they have no direct bearing on the enterprise's main products (services). On the other hand, the state offers a series of preferential tax policies to high-tech enterprises, and to obtain these benefits some enterprises acquire, by purchase or other means, patents with little connection to their business, so that "pseudo high-tech" enterprises occur frequently. For enterprises and auditing bodies, accurately judging the validity of patents can improve the efficiency of patent applications and reduce audit error rates, and the relevance between a patent and the enterprise is a key factor in judging that validity.
At present, whether a patent is directly related to an enterprise's main products (services) is usually judged manually by domain experts. The drawbacks of this approach are increasingly apparent: on the one hand, the surge in the number of patents means that patent quality is increasingly mixed, so more manpower and time must be invested in the judgment; on the other hand, different experts have different subjective understandings, which introduces judgment errors. How to judge patent-enterprise relevance quickly, effectively and at scale has therefore become an urgent problem.
Summary of the invention
The purpose of the present application is to solve at least one of the above technical defects. The technical solutions adopted by the present application are as follows:
In a first aspect, the present application provides a method for measuring patent-enterprise relevance, the method comprising:
Obtaining the patent feature words in an enterprise patent text;
Determining the weight of each patent feature word in the enterprise patent text;
Determining, according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs, the association frequency between each patent feature word and the enterprise;
Determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
Optionally, the patent feature words include base feature words and the expansion words corresponding to the base feature words.
Optionally, the enterprise patent text includes: the patent texts of patents held by the enterprise, the patent texts of patents held by the enterprise under its former names, and the patent texts of patents held by the enterprise's branches.
Optionally, determining the weight of each patent feature word in the enterprise patent text comprises:
Determining the weight of each patent feature word in the enterprise patent text according to the frequency of each patent feature word in the enterprise patent text and/or the frequency of each patent feature word in the enterprise papers published by the enterprise.
Optionally, determining the weight of each patent feature word in the enterprise patent text according to its frequency in the enterprise patent text and its frequency in the enterprise papers comprises:
Determining the weight of each patent feature word in the enterprise patent text by the following formula:
$$w_i = \mathrm{idf}_i \cdot (p\_tf_i + c\_tf_i)$$
Where $w_i$ is the weight of the i-th patent feature word in the enterprise patent text, $\mathrm{idf}_i$ is the inverse document frequency of the i-th patent feature word, $p\_tf_i$ is the frequency of the i-th patent feature word in the enterprise patent text, and $c\_tf_i$ is the frequency of the i-th patent feature word in the enterprise papers.
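As a rough illustration of the weight formula $w_i = \mathrm{idf}_i \cdot (p\_tf_i + c\_tf_i)$, the Python sketch below combines precomputed frequencies into weights. The patent does not say how the inverse document frequency is computed, so the helper `inverse_document_frequency` and its smoothed form $\log(N / (1 + df))$ over a hypothetical reference corpus are assumptions, as are all function names.

```python
import math


def inverse_document_frequency(word, corpus):
    """Hypothetical IDF helper: log(N / (1 + df)).

    corpus: list of documents, each given as a set of words. The patent does
    not define its IDF variant; this smoothed form is an assumption.
    """
    df = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / (1 + df))


def feature_word_weights(idf, p_tf, c_tf):
    """Compute w_i = idf_i * (p_tf_i + c_tf_i) for every feature word in idf.

    idf:  word -> inverse document frequency
    p_tf: word -> frequency in the enterprise patent text
    c_tf: word -> frequency in the enterprise papers
    A word missing from p_tf or c_tf contributes a frequency of 0.
    """
    return {w: idf[w] * (p_tf.get(w, 0) + c_tf.get(w, 0)) for w in idf}


weights = feature_word_weights(
    idf={"water reuse": 2.0, "decontamination": 1.5},
    p_tf={"water reuse": 3, "decontamination": 1},
    c_tf={"water reuse": 0.5},
)
print(weights)  # {'water reuse': 7.0, 'decontamination': 1.5}
```

Adding the two frequencies before multiplying by the IDF means a word that appears in the enterprise's papers but rarely in the patent itself still gains weight, which is how the formula lets the papers reinforce the patent's own vocabulary.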
Optionally, the enterprise papers include: papers published by the enterprise, papers published by the enterprise under its former names, and papers published by the enterprise's branches.
Optionally, the method further comprises:
Determining the frequency of each patent feature word in the enterprise papers according to the number of occurrences of each patent feature word in the first specified fields of the enterprise papers and the weight of each first specified field in the enterprise papers.
Optionally, determining the frequency of each patent feature word in the enterprise papers according to its number of occurrences in the first specified fields and the weight of each first specified field comprises:
Determining the frequency of each patent feature word in the enterprise papers by the following formula:
$$c\_tf_i = \sum_{j \in I} c\_tf_i(j) \cdot c\_weight(j)$$
Where $c\_tf_i$ is the frequency of the i-th patent feature word in the enterprise papers, $I$ is the set of first specified fields in the enterprise papers, $j$ denotes the j-th first specified field in $I$, $c\_tf_i(j)$ is the number of occurrences of the i-th patent feature word in the j-th first specified field of the enterprise papers, and $c\_weight(j)$ is the weight of the j-th first specified field in the enterprise papers.
Optionally, the first specified fields include at least one of the following:
The article title, the abstract and the keywords of the enterprise paper.
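The field-weighted paper frequency $c\_tf_i$ can be sketched as below. The summation form (occurrence count in each first specified field times that field's weight) follows from the variable definitions; the field weights, the substring-based counting, and summing over several papers are illustrative assumptions, not values or rules given in the patent.

```python
def field_weighted_frequency(word, fields, field_weights):
    """Sum over fields j of (occurrences of `word` in field j) * weight(j).

    fields: field name -> text of that field in one paper
    field_weights: field name -> weight (hypothetical values; fields without
    a weight contribute 0). Occurrences are non-overlapping substring
    matches, an assumption that suits unsegmented Chinese text.
    """
    return sum(text.count(word) * field_weights.get(name, 0.0)
               for name, text in fields.items())


def paper_frequency(word, papers, field_weights):
    """c_tf for one feature word, accumulated over all enterprise papers."""
    return sum(field_weighted_frequency(word, paper, field_weights)
               for paper in papers)


FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0, "keywords": 2.0}  # hypothetical
paper = {
    "title": "water reuse system",
    "abstract": "a water reuse method and a water reuse device",
    "keywords": "water reuse; decontamination",
}
print(paper_frequency("water reuse", [paper], FIELD_WEIGHTS))  # 7.0
```

Giving the title and keywords larger weights than the abstract reflects the intuition that a term placed there is more central to the paper; the actual weights would have to be tuned.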
Optionally, determining the association frequency between each patent feature word and the enterprise according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs comprises:
Determining the association frequency between each patent feature word and the enterprise according to the number of occurrences of each patent feature word in the second specified fields of the enterprise description text and the weight of each second specified field of the enterprise description text.
Optionally, determining the association frequency according to the number of occurrences of each patent feature word in the second specified fields of the enterprise description text and the weight of each second specified field comprises:
Determining the association frequency between each patent feature word and the enterprise based on the following formula:
$$r\_tf_i = \sum_{l \in J} r\_tf_i(l) \cdot r\_weight(l)$$
Where $r\_tf_i$ is the association frequency between the i-th patent feature word and the enterprise, $J$ is the set of second specified fields in the enterprise description text, $l$ denotes the l-th second specified field in $J$, $r\_tf_i(l)$ is the number of occurrences of the i-th patent feature word in the l-th second specified field of the enterprise description text, and $r\_weight(l)$ is the weight of the l-th second specified field in the enterprise description text.
Optionally, the second specified fields include at least one of the following:
The board of directors' discussion, enterprise development project priorities, industry technology / key technologies, core competitiveness, main products, business scope, risk areas, personnel structure, and basic enterprise information.
Optionally, determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise comprises:
Determining the relevance value between the enterprise patent text and the enterprise based on the following formula:
$$r = \sum_{k_i \in K} w(k_i) \cdot r\_tf(k_i)$$
Where $r$ is the relevance value between the enterprise patent text and the enterprise, $K$ is the set of patent feature words, $w(k_i)$ is the weight of the $k_i$-th patent feature word in the enterprise patent text, and $r\_tf(k_i)$ is the association frequency between the $k_i$-th patent feature word and the enterprise;
The relevance value between the enterprise patent text and the enterprise characterizes the relevance between the enterprise patent text and the enterprise.
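A minimal sketch of the last two steps, assuming the association frequency $r\_tf_i$ is a section-weighted occurrence count over the enterprise description text and the relevance value $r$ is the sum of weight times association frequency over all feature words, as the variable definitions suggest. The annual-report section names, texts and weights are hypothetical, and counting by substring match is an assumption.

```python
def association_frequency(word, sections, section_weights):
    """r_tf_i: sum over sections l of (occurrences in section l) * r_weight(l)."""
    return sum(text.count(word) * section_weights.get(name, 0.0)
               for name, text in sections.items())


def relevance(weights, sections, section_weights):
    """r: sum over feature words k of w(k) * r_tf(k)."""
    return sum(w * association_frequency(k, sections, section_weights)
               for k, w in weights.items())


# Hypothetical annual-report sections and section weights.
sections = {
    "main products": "water reuse equipment and decontamination agents",
    "core competitiveness": "proprietary water reuse technology",
}
section_weights = {"main products": 2.0, "core competitiveness": 1.5}
weights = {"water reuse": 7.0, "decontamination": 1.5, "high-speed railway": 4.0}
print(relevance(weights, sections, section_weights))  # 27.5
```

In this toy case "high-speed railway" carries a high weight in the patent but never appears in the report, so it contributes nothing; a patent dominated by such words would score low, which is the signal the method uses to flag patents unrelated to the enterprise's business.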
In a second aspect, the present application provides a device for measuring patent-enterprise relevance, the device comprising:
A patent feature word obtaining module, configured to obtain the patent feature words in an enterprise patent text;
A weight determining module, configured to determine the weight of each patent feature word in the enterprise patent text;
An association frequency determining module, configured to determine the association frequency between each patent feature word and the enterprise according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs;
A relevance determining module, configured to determine the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
In a third aspect, the present application provides an electronic device comprising a processor and a memory;
The memory is configured to store operation instructions;
The processor is configured to execute, by calling the operation instructions, the patent-enterprise relevance measurement method shown in any embodiment of the first aspect of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the patent-enterprise relevance measurement method shown in any embodiment of the first aspect of the present application.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
The scheme of the embodiments of the present application extracts from the enterprise patent text the patent feature words that characterize it, determines the association frequency between each patent feature word and the enterprise from the patent feature words and the enterprise description text, and determines the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and its association frequency with the enterprise. Determining this relevance with the scheme effectively avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of patent-enterprise relevance judgments.
Description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a flow diagram of a patent-enterprise relevance measurement method provided by an embodiment of the present application;
Fig. 2 is a design flowchart of a method for calculating the relevance between an enterprise patent text and an enterprise provided by an embodiment of the present application;
Fig. 3 is a structural schematic diagram of a patent-enterprise relevance measurement device provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and shall not be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the description of the present application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" used herein includes all of, any one of, and all combinations of one or more of the associated listed items.
The technical solutions of the present application, and how they solve the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the drawings.
An embodiment of the present application provides a patent-enterprise relevance measurement method. As shown in Fig. 1, the method may mainly include:
Step S110: Obtain the patent feature words in the enterprise patent text;
Step S120: Determine the weight of each patent feature word in the enterprise patent text.
In the embodiments of the present application, the patent feature words may be specialized vocabulary obtained from the title and abstract of the enterprise patent text and used to characterize the enterprise patent text; the number of patent feature words obtained can be set according to actual needs.
Step S130: Determine, according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs, the association frequency between each patent feature word and the enterprise.
In the embodiments of the present application, the association frequency between a patent feature word and the enterprise characterizes the relevance between the patent feature word and the enterprise description text, and can be determined from the number of occurrences of the patent feature word in the enterprise description text, or in specified fields of the enterprise description text.
Specifically, the enterprise description text may be a text covering the enterprise's main products, main business, core technologies and the like, for example the enterprise's annual report.
In the embodiments of the present application, when a patent feature word appears in the enterprise description text, the patent feature word is associated with the enterprise description text; the association frequency between each patent feature word and the enterprise can therefore be determined from the patent feature words and the enterprise description text.
The embodiments of the present application may use the enterprise annual report text as the enterprise description text. Under the China Securities Regulatory Commission's "Standards for the Contents and Formats of Information Disclosure by Companies Publicly Issuing Securities No. 2: Contents and Formats of Annual Reports", an enterprise must disclose company information according to the standard: besides financial information, it must report in detail on the main business the company engaged in during the reporting period, its main products and their uses, its business model, its main performance drivers, the industry it belongs to, its core competitiveness, and similar information. The annual report text therefore contains a large amount of information about the enterprise.
Step S140: Determine the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
In the embodiments of the present application, the relevance between the enterprise patent text and the enterprise refers to the degree to which the enterprise patent text relates to the enterprise's main products, main business, core technologies and the like; information such as the main products, main business and core technologies can be obtained from the enterprise description text.
The method provided by the embodiments of the present application extracts from the enterprise patent text the patent feature words that characterize it, determines the association frequency between each patent feature word and the enterprise from the patent feature words and the enterprise description text, and determines the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and its association frequency with the enterprise. This avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of patent-enterprise relevance judgments.
In the embodiments of the present application, the above patent feature words include base feature words and the expansion words corresponding to the base feature words.
In practice, base feature words can be extracted directly from the title and abstract of the enterprise patent text. However, the title and abstract are short, so only a very limited number of specialized base feature words characterizing the enterprise patent text can be extracted from them, which may be insufficient to capture the statistical properties of the text's feature word frequencies. Expanding each extracted base feature word to obtain its expansion words therefore greatly increases the number of specialized words that characterize the enterprise patent text, and effectively improves the statistical characterization of its feature word frequencies.
Specifically, the base feature words of the enterprise patent text may be determined by the TextRank algorithm.
The TextRank algorithm is a graph-based ranking algorithm for text whose basic idea comes from Google's PageRank algorithm: the text is split into component units (such as words or sentences), a graph model is built, and a voting mechanism ranks the important components of the text, so that keywords can be extracted using only the information in a single document. Unlike models such as LDA (Latent Dirichlet Allocation, a document topic generation model) and HMM (Hidden Markov Model), TextRank does not require prior training on multiple documents, and it is widely used because it is simple and effective. TextRank ranks candidate keywords using the relationships between local words (a co-occurrence window) and extracts them directly from the text itself.
Further, determining the base feature words of the enterprise patent text by the TextRank algorithm includes the following steps:
1) Split the given enterprise patent text into complete sentences;
2) For each sentence, perform word segmentation and part-of-speech tagging, filter out stop words, and keep only words of specified parts of speech, such as nouns, verbs and adjectives; the retained words are the candidate keywords;
3) Build the candidate keyword graph G = (V, E), where V is the node set and E is the edge set. The candidate keywords produced in 2) form the nodes, and the edge between any two nodes is constructed from co-occurrence: an edge exists between two nodes when their corresponding words co-occur within a window of length K, where K is the window size, i.e., at most K words co-occur;
4) Iteratively propagate the weight of each node over the graph G = (V, E) until convergence;
5) Sort the node weights in descending order to obtain the T most important words as candidate keywords, i.e., the base feature words in the embodiments of the present application;
6) Mark the T most important words obtained in 5) in the original text; if they form adjacent phrases, combine them into multi-word keywords.
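The TextRank steps above can be sketched in Python as follows. This is not the patent's implementation: it assumes steps 1) and 2) (sentence splitting, segmentation, part-of-speech filtering, stop-word removal) have already produced the token list, uses the unweighted update rule from the original TextRank formulation with the commonly used damping factor d = 0.85, and treats adjacency in the supplied token sequence as adjacency in the original text for step 6). Function names are hypothetical.

```python
from collections import defaultdict


def textrank(tokens, window=5, top_t=5, d=0.85, tol=1e-6, max_iter=100):
    """Return the top_t words by TextRank score, plus all scores.

    tokens: candidate keywords in text order (output of steps 1-2, assumed).
    window: co-occurrence window length K.
    Update rule (unweighted TextRank): S(v) = (1 - d) + d * sum(S(u) / deg(u))
    over the neighbours u of v.
    """
    # Step 3: undirected graph; two words are linked if they co-occur
    # within a window of `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Step 4: iterate the scores until the largest change drops below tol.
    scores = {v: 1.0 for v in neighbors}
    for _ in range(max_iter):
        new = {v: (1 - d) + d * sum(scores[u] / len(neighbors[u])
                                    for u in neighbors[v])
               for v in neighbors}
        delta = max((abs(new[v] - scores[v]) for v in neighbors), default=0.0)
        scores = new
        if delta < tol:
            break

    # Step 5: sort in descending order of score (ties broken alphabetically).
    top = sorted(scores, key=lambda v: (-scores[v], v))[:top_t]
    return top, scores


def merge_adjacent(tokens, keywords):
    """Step 6: join runs of adjacent top keywords into multi-word keywords."""
    keyset, phrases, run = set(keywords), [], []
    for tok in list(tokens) + [None]:  # None is a sentinel that ends the last run
        if tok in keyset:
            run.append(tok)
        else:
            if len(run) > 1:
                phrases.append(" ".join(run))
            run = []
    return list(dict.fromkeys(phrases))  # de-duplicate, keep first-seen order


tokens = ["water", "reuse", "system", "water", "reuse",
          "decontamination", "water", "reuse"]
top, _ = textrank(tokens, window=2, top_t=2)
print(top, merge_adjacent(tokens, top))
```

Words that never co-occur with any other word get no node in this sketch and are therefore never ranked; a production version would also need a real segmenter for Chinese text in place of the pre-split token list.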
In the embodiments of the present application, the base feature words extracted from the enterprise patent text by the TextRank algorithm are highly specialized, and no prior training on multiple documents is required, so the approach is simpler and more efficient.
After the base feature words of the enterprise patent text are determined, each base feature word may be expanded based on the trained text word vector library to obtain its corresponding expansion words.
Expanding each base feature word to obtain its corresponding expansion words may include the following steps:
Querying the trained text word vector library to obtain the first word vector of any given base feature word;
Calculating the cosine similarity between the first word vector and each second word vector, a second word vector being any word vector in the trained text word vector library other than the first word vector;
Determining the words corresponding to a predetermined number of second word vectors whose cosine similarity is greater than the first preset threshold, and taking them as the expansion words of that base feature word.
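A sketch of these expansion steps in plain Python. The `vectors` dictionary is a hypothetical stand-in for the trained text word vector library, the toy two-dimensional vectors are invented, and the defaults for the first preset threshold (0.5) and the predetermined number (6) are placeholders rather than values from the patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_base_word(base_word, vectors, threshold=0.5, top_n=6):
    """Return up to top_n words whose similarity to base_word exceeds threshold.

    vectors: word -> vector, a stand-in for the trained word vector library.
    threshold / top_n: placeholders for the first preset threshold and the
    predetermined number. Results are ordered from most to least similar.
    """
    if base_word not in vectors:
        return []
    first = vectors[base_word]  # the "first word vector"
    candidates = [
        (cosine_similarity(first, vec), word)
        for word, vec in vectors.items()
        if word != base_word  # every other entry is a "second word vector"
    ]
    kept = [(sim, word) for sim, word in candidates if sim > threshold]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in kept[:top_n]]


vectors = {
    "decontamination": [1.0, 0.0],
    "cleaning": [0.9, 0.1],
    "purification": [0.8, 0.3],
    "high-speed railway": [0.0, 1.0],
}
print(expand_base_word("decontamination", vectors))  # ['cleaning', 'purification']
```

This brute-force scan compares the base word against every entry, which is fine for illustration; with a library of about a million 100-dimensional vectors, a vectorized or approximate nearest-neighbour search would be the practical choice.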
Specifically, the embodiments of the present application expand the base feature words using deep learning techniques, with the following steps:
1) Train the text word vector library with the Word2Vec (word vector) method
Representing the words of a text as word vectors is a core technique that brought deep learning into natural language processing. Word2vec, open-sourced by Google in 2013, is a widely used tool for obtaining word vectors; it mainly uses the CBOW (Continuous Bag-of-Words) and Skip-gram models. The embodiments of the present application use the more efficient CBOW neural network model to train on the texts in a preset database, yielding the trained text word vector library.
For example, when the texts are patent texts, the embodiments of the present application train on about 20 million patent texts (about 10 GB) to obtain the trained patent word vector library. The patent texts include text fields such as the patent title and abstract, the generated word vectors have 100 dimensions, and the trained patent word vector library contains about 1 million words and is about 990 MB in size.
2) Expand the base feature words based on the trained text word vector library
Specifically, when the target text is a patent text, the base feature words extracted from each patent text are expanded as follows: the first predetermined number of base feature words obtained by the TextRank algorithm above are looked up one by one in the patent word vector library to obtain the word vector of each base feature word (the first word vector above), and a cosine similarity calculation is then performed. In this calculation, the cosine similarity is computed between the word vector of a given base feature word and every other word vector in the patent word vector library (the second word vectors above), and the expansion words of that base feature word are determined by comparing the cosine similarities against the first preset threshold and keeping a predetermined number of words.
Further, the above cosine similarity calculation is performed for each determined base feature word, so that the expansion words of every base feature word are determined.
For example, when the base feature words are "installation procedure", "cheap", "water reuse", "decontamination", "high-speed railway" and "partially fall", and the second predetermined number is 6, the expansion words obtained for each base feature word are shown in Table 1:
Table 1: Base feature words and their corresponding expansion words
The embodiment of the present application thus gives the detailed process and operating steps for determining the expansion words of each base feature word from the trained text word-vector library, so that those skilled in the art can follow these steps to expand the base feature words quickly and accurately. This greatly increases the number of extracted domain-specific terms that can characterize the enterprise patent text and effectively improves the statistical properties of the feature-word frequencies that characterize it.
Specifically, after the expansion words corresponding to each base feature word are obtained, they are filtered further. As needed, the filtering may remove only the stop words among them, only the words whose inverse document frequency is below a second preset threshold, or both the stop words and the words whose inverse document frequency is below the second preset threshold. Filtering the expansion words in this way allows them to characterize the target text better.
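A minimal sketch of the three filtering options follows, assuming the expansion words are held in a dictionary keyed by base feature word. The function name, and the choice to treat a word with no known IDF as 0 (so it is dropped whenever an IDF threshold is set), are assumptions of this sketch rather than details from the text.

```python
def filter_expansion_words(expansions, stop_words=None, idf=None, idf_threshold=None):
    """Filter the expansion words of each base feature word.

    expansions: dict base word -> list of expansion words.
    stop_words: if given, stop words are removed.
    idf / idf_threshold: if both given, words whose IDF is below the threshold
    (the "second preset threshold") are removed. A word missing from `idf`
    is treated as IDF 0.0, an assumption of this sketch.
    Passing both options applies both filters.
    """
    filtered = {}
    for base, words in expansions.items():
        kept = []
        for w in words:
            if stop_words is not None and w in stop_words:
                continue
            if idf is not None and idf_threshold is not None and idf.get(w, 0.0) < idf_threshold:
                continue
            kept.append(w)
        filtered[base] = kept
    return filtered
```

For example, with expansions `{"decontamination": ["cleaning", "the", "method"]}`, stop words `{"the"}` and IDF values `{"cleaning": 5.2, "method": 0.8}` against a threshold of 1.0, only `"cleaning"` survives when both filters are applied.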
In the embodiment of the present application, the enterprise patent text may include: the patent texts of patents held by the enterprise, the patent texts of patents held by the enterprise before it was renamed, and the patent texts of patents held by the enterprise's branches.
In practical applications, the accuracy and credibility of the patent-enterprise relevance judgment can be improved by collecting the enterprise's patents comprehensively and completely. Specifically, the enterprise's renaming history may be considered, so that both the patents held under its current name and those held before renaming are collected; the enterprise's branches may also be considered, so that the patents held by its subsidiaries, affiliated enterprises, joint ventures and the like are collected.
The embodiment of the present application measures, with the enterprise as the unit, the degree of relevance between each patent held by an enterprise and that enterprise, so all patents held by the enterprise as a whole must be obtained. When obtaining enterprise patents, the patents held under the parent company's current name must be considered, the enterprise's renaming history must be taken into account, and the patents held by branches such as subsidiaries, affiliated enterprises and joint ventures are attributed to the parent company. This fuller and more complete patent collection scheme improves the accuracy and credibility of the method.
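One way to implement this collection step is to build the full set of names under which an enterprise or its branches may appear and keep every record held under any of them. The sketch below assumes exact string matching on holder names and a flat list of branch names (real assignee names usually need normalization, and branches may have former names of their own); the class and function names are hypothetical. The same filter can be applied to enterprise papers by matching on author affiliation instead of patent holder.

```python
from dataclasses import dataclass, field


@dataclass
class Enterprise:
    """Hypothetical record of the names an enterprise's documents may be filed under."""
    current_name: str
    former_names: list = field(default_factory=list)  # names used before renaming
    branch_names: list = field(default_factory=list)  # subsidiaries, affiliates, joint ventures

    def all_names(self):
        return {self.current_name, *self.former_names, *self.branch_names}


def collect_enterprise_documents(enterprise, records):
    """Keep the IDs of records held under any of the enterprise's names.

    records: iterable of (holder_name, document_id) pairs, e.g. patent
    assignees or paper author affiliations. Exact string matching is assumed.
    """
    names = enterprise.all_names()
    return [doc_id for holder, doc_id in records if holder in names]
```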
In the embodiment of the present application, determining the weight of each patent feature word in the enterprise patent text may include:
Determining the weight of each patent feature word in the enterprise patent text according to the frequency of each patent feature word in the patent text file and/or the frequency of each patent feature word in the enterprise papers published by the enterprise.
In the embodiment of the present application, if a patent feature word has a higher frequency in the enterprise patent text file, it can be considered to be mentioned more often in the enterprise patent text and therefore to characterize the enterprise patent text better from a technical standpoint, so a higher weight can be assigned to it.
In actual use, the enterprise papers can also be taken into account when determining the weight of a patent feature word in the enterprise patent text.
In the embodiment of the present application, enterprise papers are papers published in journals with the enterprise as the authors' affiliation. Enterprise papers and enterprise patent texts both reflect the enterprise's technology to some extent. If a patent feature word appears in the patent text file and also in the enterprise papers, it can be considered to characterize the enterprise's technology better. The frequency of the patent feature word in the enterprise papers can therefore be combined with its frequency in the patent text file, and patent feature words with high frequencies in both are assigned higher weights.
In the embodiment of the present application, determining the weight of each patent feature word in the enterprise patent text according to its frequency in the patent text file and its frequency in the enterprise's papers may include:
Determining the weight of each patent feature word in the enterprise patent text by the following formula:
$$w_i = \mathit{idf}_i \times \left(\mathit{p\_tf}_i + \mathit{c\_tf}_i\right) \qquad (1)$$
Here, w_i denotes the weight of the i-th patent feature word in the enterprise patent text, idf_i denotes the inverse document frequency of the i-th patent feature word, p_tf_i denotes the frequency of the i-th patent feature word in the enterprise patent text, and c_tf_i denotes the frequency of the i-th patent feature word in the enterprise papers, where 1 ≤ i ≤ N and N is the total number of patent feature words.
p_tf_i may be calculated as (number of occurrences of the patent feature word in the titles and abstracts of the enterprise patent text + 1) / (total word count of the patent feature words + 1). Adding 1 smooths the result for words that do not occur in the patent titles and abstracts.
Based on formula (1), the weight of each patent feature word determined from the enterprise patent text (including base feature words and expansion words) can be calculated as w_1, w_2, …, w_i, …, w_N, and these weights are used in the subsequent calculations.
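Formula (1) together with the smoothed p_tf_i can be sketched as below. The function names are hypothetical, and the inverse document frequencies and paper frequencies c_tf_i are passed in precomputed, because the text does not fix how idf_i is calculated. The denominator "total word count of the patent feature words" is read here as the summed occurrences of all feature words; if the intended meaning is the number of feature words N, `total` should be changed accordingly.

```python
def patent_tf(title_abstract_counts, feature_words):
    """Smoothed p_tf_i = (occurrences of word i in titles/abstracts + 1) / (total + 1).

    Assumption: `total` is the summed occurrences of all feature words.
    """
    total = sum(title_abstract_counts.get(w, 0) for w in feature_words)
    return {w: (title_abstract_counts.get(w, 0) + 1) / (total + 1) for w in feature_words}


def feature_weights(feature_words, title_abstract_counts, idf, c_tf):
    """Formula (1): w_i = idf_i * (p_tf_i + c_tf_i).

    idf: precomputed inverse document frequencies (method not specified in the text).
    c_tf: frequencies in enterprise papers; words absent from papers get 0.
    """
    p_tf = patent_tf(title_abstract_counts, feature_words)
    return {w: idf[w] * (p_tf[w] + c_tf.get(w, 0.0)) for w in feature_words}
```

For feature words `["a", "b"]` where "a" occurs 3 times and "b" never, p_tf is 1.0 for "a" and 0.25 for "b"; with `idf = {"a": 2.0, "b": 4.0}` and `c_tf = {"a": 0.5}` the weights are 3.0 and 1.0.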
In the embodiment of the present application, the enterprise papers may include: papers published by the enterprise, papers published by the enterprise before it was renamed, and papers published by the enterprise's branches.
In practical applications, to collect enterprise papers comprehensively and completely, the enterprise's renaming history may be considered, so that papers published under its current name and under former names are both collected; the enterprise's branches may also be considered, so that papers published by its subsidiaries, affiliated enterprises, joint ventures and the like are collected.
In the embodiment of the present application, the above method further includes:
Determining the frequency of each patent feature word in the enterprise papers according to the number of occurrences of each patent feature word in the first specified fields of the enterprise papers and the weight of each first specified field of the enterprise papers.
Specifically, the frequency of each patent feature word in the enterprise papers is determined by the following formula:
$$\mathit{c\_tf}_i = \sum_{j \in I} \mathit{c\_tf}_i(j) \times \mathit{c\_weight}(j) \qquad (2)$$
Here, c_tf_i denotes the frequency of the i-th patent feature word in the enterprise papers, I denotes the set of first specified fields of the enterprise papers, j denotes the j-th first specified field in I, c_tf_i(j) denotes the number of occurrences of the i-th patent feature word in the j-th first specified field of the enterprise papers, and c_weight(j) denotes the weight of the j-th first specified field, where 1 ≤ j ≤ n and n is the total number of first specified fields, i.e., the number of fields in set I.
The first specified fields can be designated from among the fields of the enterprise papers. The set I of first specified fields contains the 1st first specified field, the 2nd first specified field, …, the j-th first specified field, …, and the n-th first specified field.
The numbers of occurrences of the i-th patent feature word in the respective first specified fields of the enterprise papers are c_tf_i(1), c_tf_i(2), …, c_tf_i(j), …, c_tf_i(n).
The first specified fields may include at least one of the following:
The article title of the enterprise paper, the abstract of the enterprise paper, and the keywords of the enterprise paper.
A weight can be set for each first specified field at the same time: c_weight(1), c_weight(2), …, c_weight(j), …, c_weight(n).
For example, the article title and the keywords of the enterprise papers each have a weight of 3, and the abstract has a weight of 2.
Based on formula (2), the frequency of each patent feature word (including base feature words and expansion words) in the enterprise papers can be calculated as c_tf_1, c_tf_2, …, c_tf_i, …, c_tf_N, for use in further calculating the weight of each patent feature word in the enterprise patent text.
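Formula (2) is a weighted sum over the first specified fields. The sketch below uses the example weights given above (title and keywords 3, abstract 2); the field keys and function names are illustrative, and counts for fields outside set I are ignored.

```python
# Example weights from the text: article title and keywords 3, abstract 2.
PAPER_FIELD_WEIGHTS = {"title": 3, "keywords": 3, "abstract": 2}


def paper_tf(field_counts, weights=PAPER_FIELD_WEIGHTS):
    """Formula (2): c_tf_i = sum over j in I of c_tf_i(j) * c_weight(j).

    field_counts: dict field -> occurrences of word i in that field of the
    enterprise papers. Fields outside the weight table (set I) are ignored.
    """
    return sum(count * weights[f] for f, count in field_counts.items() if f in weights)


def paper_tf_all(counts_by_word, weights=PAPER_FIELD_WEIGHTS):
    """Apply formula (2) to every patent feature word."""
    return {word: paper_tf(fc, weights) for word, fc in counts_by_word.items()}
```

A word that occurs once in titles, twice in abstracts and once in keywords gets c_tf_i = 1 × 3 + 2 × 2 + 1 × 3 = 10.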
In the embodiment of the present application, determining the association frequency between each patent feature word and the enterprise according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs includes:
Determining the association frequency between each patent feature word and the enterprise according to the number of occurrences of each patent feature word in the second specified fields of the enterprise description text and the weight of each second specified field of the enterprise description text.
Specifically, determining the association frequency from these occurrence counts and field weights includes:
Determining the association frequency between each patent feature word and the enterprise based on the following formula:
$$\mathit{r\_tf}_i = \sum_{l \in J} \mathit{r\_tf}_i(l) \times \mathit{r\_weight}(l) \qquad (3)$$
Here, r_tf_i denotes the association frequency between the i-th patent feature word and the enterprise, J denotes the set of second specified fields in the enterprise description text, l denotes the l-th second specified field in J, r_tf_i(l) denotes the number of occurrences of the i-th patent feature word in the l-th second specified field of the enterprise description text, and r_weight(l) denotes the weight of the l-th second specified field, where 1 ≤ l ≤ m and m is the total number of second specified fields, i.e., the number of fields in set J.
The second specified fields can be designated from among the fields of the enterprise description text. The set J of second specified fields contains the 1st field, the 2nd field, …, the l-th field, …, and the m-th field.
When the enterprise annual report text is selected as the enterprise description text, the fields may include at least one of the following:
Board of directors' discussion, development plan priorities, industry technology/key technology, core competitiveness, main products, business scope, risk factors, personnel structure, and basic enterprise information.
In the embodiment of the present application, the information in the enterprise annual report text is extracted and classified into fields, mainly including the board of directors' discussion, development plan priorities, industry technology/key technology, core competitiveness, main products, business scope, risk factors, personnel structure and basic enterprise information. Seven of these fields are selected to build the enterprise description text: development plan priorities, industry technology/key technology, core competitiveness, board of directors' discussion, main products, business scope and risk factors. Based on an analysis of each field, different fields are assigned different weights; the higher the value, the greater the weight.
The embodiment of the present application sets the weights with a simple rank-based coding method, first ranking the fields by importance. Specifically, the four fields development plan priorities, industry technology/key technology, core competitiveness and main products directly describe the enterprise's own main products (services), key technologies and the like; their wording is specialized and domain-specific, so their importance is highest. The business scope field gives an overall introduction to the company's business functions, so its scope is broad and its wording general; the risk factors field describes the risks the enterprise faces in terms of market, technology, policy and so on. These two fields are second in importance. The board of directors' discussion field contains management's evaluation and analysis of the enterprise's past operations and forward-looking judgments about its future development, including explanations of the financial position and operating performance reported in the enterprise's financial statements; its importance is lowest. Initial weights are then assigned as natural numbers in order of importance, and fields of equal importance may be assigned the same natural number. Specifically, the initial weight of the four fields development plan priorities, industry technology/key technology, core competitiveness and main products is set to 3; the initial weight of the business scope and risk factors fields is set to 2; and the initial weight of the board of directors' discussion field is set to 1. Finally, the initial weights of the fields are normalized, giving the following results: 0.18 for each of the four fields development plan priorities, industry technology/key technology, core competitiveness and main products; 0.12 for each of the business scope and risk factors fields; and 0.04 for the board of directors' discussion field.
The weights finally used in formula (3) are the normalized initial weights of the fields.
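The normalization can be sketched as below, assuming a plain sum-to-one normalization (the text does not name the method). Under that assumption the weights are 3/17 ≈ 0.18, 2/17 ≈ 0.12 and 1/17 ≈ 0.06; the source lists 0.04 for the board of directors' discussion field, so its published figures may include an adjustment that is not described here. The field labels and function name are illustrative.

```python
# Initial weights from the rank-based coding described above.
INITIAL_WEIGHTS = {
    "development plan priorities": 3,
    "industry technology/key technology": 3,
    "core competitiveness": 3,
    "main products": 3,
    "business scope": 2,
    "risk factors": 2,
    "board of directors' discussion": 1,
}


def normalize_weights(initial):
    """Sum-to-one normalization (an assumption; the text does not name the method)."""
    total = sum(initial.values())
    return {name: w / total for name, w in initial.items()}
```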
Table 2 shows the initial weights and normalized results of the second specified fields in one example of the present application.
Each of the above fields describes the enterprise's technology; for example, the industry technology/key technology field describes the enterprise's key technologies. It will be understood that in practical applications these fields may be named differently. Taking the industry technology/key technology field again as an example, it may also be called the industry technology field, the key technology field, the main technology field, and so on.
The setting of 2 second specific field initial weight of table and normalization result
Field Initial weight Normalize result
Development project emphasis 3 0.18
Industry technology _ key technology 3 0.18
Core competitiveness 3 0.18
Major product 3 0.18
Business scope 2 0.12
Risk field 2 0.12
The board of directors discusses 1 0.04
The numbers of occurrences of the i-th patent feature word in the respective second specified fields of the enterprise description text are r_tf_i(1), r_tf_i(2), …, r_tf_i(l), …, r_tf_i(m).
A weight can be set for each second specified field at the same time: r_weight(1), r_weight(2), …, r_weight(l), …, r_weight(m).
Based on formula (3), the association frequency r_tf_i between the i-th patent feature word and the enterprise can be calculated, for use in subsequently determining the relevance between the enterprise patent text and the enterprise.
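Formula (3) can be sketched the same way, here with the normalized weights from Table 2 used as r_weight(l). The field keys and function name are illustrative; counts for fields outside set J, such as personnel structure, are ignored.

```python
# Normalized field weights as listed in Table 2 (used as r_weight(l)).
ANNUAL_REPORT_FIELD_WEIGHTS = {
    "development plan priorities": 0.18,
    "industry technology/key technology": 0.18,
    "core competitiveness": 0.18,
    "main products": 0.18,
    "business scope": 0.12,
    "risk factors": 0.12,
    "board of directors' discussion": 0.04,
}


def association_tf(field_counts, weights=ANNUAL_REPORT_FIELD_WEIGHTS):
    """Formula (3): r_tf_i = sum over l in J of r_tf_i(l) * r_weight(l).

    field_counts: dict field -> occurrences of word i in that annual-report field.
    Fields outside set J are ignored.
    """
    return sum(count * weights[f] for f, count in field_counts.items() if f in weights)
```

A word occurring twice in the main products field, once in the risk factors field and four times in the personnel structure field gets r_tf_i = 2 × 0.18 + 1 × 0.12 ≈ 0.48.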
In the embodiment of the present application, determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise includes:
Determining the relevance value between the enterprise patent text and the enterprise based on the following formula:
Here, r denotes the relevance value between the enterprise patent text and the enterprise, K denotes the set of patent feature words, w(k_i) denotes the weight of the k_i-th patent feature word in the enterprise patent text, and r_tf(k_i) denotes the association frequency between the k_i-th patent feature word and the enterprise.
Specifically, w(k_i) may be calculated by formula (1), and r_tf(k_i) may be calculated by formula (3).
The relevance value between the enterprise patent text and the enterprise characterizes how relevant the enterprise patent text is to the enterprise: the larger the relevance value, the stronger the relevance.
In formula (4), the relevance value r between the enterprise patent text and the enterprise is determined from the weights of the patent feature words determined from the enterprise patent text (including base feature words and expansion words), w(k_1), w(k_2), …, w(k_i), …, w(k_N), and from the association frequencies between the patent feature words and the enterprise, r_tf(k_1), r_tf(k_2), …, r_tf(k_i), …, r_tf(k_N).
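Formula (4) itself is not reproduced in this text. Since only w(k_i) and r_tf(k_i) over the set K are defined as its inputs, one plausible reading is a weighted sum of association frequencies, r = Σ_{k_i ∈ K} w(k_i) · r_tf(k_i). The sketch below assumes that form, which may differ from the applicant's formula (for example, by a normalization term); the function name is illustrative.

```python
def relevance(weights, assoc_tf):
    """Assumed form of formula (4): r = sum over k in K of w(k) * r_tf(k).

    weights: dict feature word -> w(k) from formula (1).
    assoc_tf: dict feature word -> r_tf(k) from formula (3); missing words count as 0.
    """
    return sum(w * assoc_tf.get(k, 0.0) for k, w in weights.items())
```

With weights `{"a": 3.0, "b": 1.0}` and association frequencies `{"a": 0.48}`, this gives r ≈ 1.44.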
After the relevance values between all patents and the enterprise have been calculated, the embodiment of the present application can judge the strength of relevance from those values, for example as follows: normalize the calculated relevance values, then use the quartile method to divide patent-enterprise relevance into four grades, "strongly related", "moderately related", "weakly related" and "unrelated", and label the corresponding patents. The boundaries between grades can be set according to actual needs; for example, when the relevance is 0 or not greater than 0.01, the patent is judged "unrelated" to the enterprise. In actual use, the relevance measurement results, i.e., the relevance values between the different enterprise patent texts and the enterprise, can be displayed as a list. Specifically, the patents can be sorted by relevance value, for example from high to low, so that users can easily see how relevant each of the enterprise's patents is to the enterprise.
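The grading and list display can be sketched as follows. The normalization method and quartile convention are not fixed by the text, so this sketch assumes non-negative relevance values, divides them by the maximum, computes quartiles with Python's `statistics.quantiles` (inclusive method), and maps the bottom quartile (or any value ≤ 0.01) to "unrelated", then "weak", "medium" and "strong" for the higher quartiles. The function name and labels are illustrative.

```python
import statistics


def grade_patents(relevance_by_patent, unrelated_cutoff=0.01):
    """Normalize relevance values r and bin patents into four grades by quartile.

    Assumptions (not fixed by the text): values are non-negative, normalized by
    dividing by the maximum; quartiles use the inclusive method.
    Returns (patent, normalized value, grade) tuples, highest relevance first.
    """
    hi = max(relevance_by_patent.values())
    norm = {p: (v / hi if hi > 0 else 0.0) for p, v in relevance_by_patent.items()}
    if len(norm) >= 2:
        q1, q2, q3 = statistics.quantiles(list(norm.values()), n=4, method="inclusive")
    else:
        q1 = q2 = q3 = 0.0  # quantiles need at least two data points
    ranked = []
    for patent, v in norm.items():
        if v <= unrelated_cutoff or v <= q1:
            grade = "unrelated"
        elif v <= q2:
            grade = "weak"
        elif v <= q3:
            grade = "medium"
        else:
            grade = "strong"
        ranked.append((patent, v, grade))
    # List display: sort from high to low relevance (ties by patent ID).
    ranked.sort(key=lambda t: (-t[1], t[0]))
    return ranked
```

For relevance values `{"P1": 10, "P2": 6, "P3": 3, "P4": 1, "P5": 0}`, the patents are listed P1 to P5 with grades strong, medium, weak, unrelated and unrelated.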
Fig. 2 shows a design flowchart of the method for calculating the relevance between an enterprise patent text and an enterprise provided by the embodiment of the present application. As can be seen from Fig. 2 and the description above, the method mainly includes the following aspects: obtaining the enterprise patent text; extracting and expanding the patent feature words; determining, from the enterprise patent text, the enterprise papers and the enterprise description text, the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise (shown in Fig. 2 as estimating the patent feature word weights and the patent feature word-enterprise association frequencies); and finally calculating the relevance between the enterprise patent text and the enterprise from the patent feature word weights and the enterprise association frequencies.
Based on the same principle as the method shown in Fig. 1, the embodiment of the present application further provides a patent-enterprise relevance measurement device 20. As shown in Fig. 3, the patent-enterprise relevance measurement device 20 includes:
Patent feature word obtaining module 210, configured to obtain the patent feature words in the enterprise patent text;
Weight determining module 220, configured to determine the weight of each patent feature word in the enterprise patent text;
Association frequency determining module 230, configured to determine the association frequency between each patent feature word and the enterprise according to the patent feature words and the enterprise description text of the enterprise to which the enterprise patent text belongs;
Relevance determining module 240, configured to determine the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
The patent-enterprise relevance measurement device provided by the embodiment of the present application extracts from the enterprise patent text the patent feature words that can characterize it, determines the association frequency between each patent feature word and the enterprise from the patent feature words and the enterprise description text, and determines the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and its association frequency with the enterprise. This avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of judging patent-enterprise relevance.
Optionally, the patent feature words include base feature words and the expansion words corresponding to the base feature words.
Optionally, the enterprise patent text includes: the patent texts of patents held by the enterprise, the patent texts of patents held by the enterprise before it was renamed, and the patent texts of patents held by the enterprise's branches.
Optionally, the weight determining module is specifically configured to:
Determine the weight of each patent feature word in the enterprise patent text according to the frequency of each patent feature word in the patent text file and/or the frequency of each patent feature word in the enterprise papers published by the enterprise.
Optionally, when determining the weight of each patent feature word in the enterprise patent text according to its frequency in the patent text file and its frequency in the enterprise's papers, the weight determining module is specifically configured to:
Determine the weight of each patent feature word in the enterprise patent text by the following formula:
$$w_i = \mathit{idf}_i \times \left(\mathit{p\_tf}_i + \mathit{c\_tf}_i\right)$$
Here, w_i denotes the weight of the i-th patent feature word in the enterprise patent text, idf_i denotes the inverse document frequency of the i-th patent feature word, p_tf_i denotes the frequency of the i-th patent feature word in the enterprise patent text, and c_tf_i denotes the frequency of the i-th patent feature word in the enterprise papers.
Optionally, the enterprise papers include: papers published by the enterprise, papers published by the enterprise before it was renamed, and papers published by the enterprise's branches.
Optionally, the device further includes:
A frequency determining module, configured to determine the frequency of each patent feature word in the enterprise papers according to the number of occurrences of each patent feature word in the first specified fields of the enterprise papers and the weight of each first specified field of the enterprise papers.
Optionally, the frequency determining module is specifically configured to:
Determine the frequency of each patent feature word in the enterprise papers by the following formula:

$$\mathit{c\_tf}_i = \sum_{j \in I} \mathit{c\_tf}_i(j) \times \mathit{c\_weight}(j)$$
Here, c_tf_i denotes the frequency of the i-th patent feature word in the enterprise papers, I denotes the set of first specified fields of the enterprise papers, j denotes the j-th first specified field in I, c_tf_i(j) denotes the number of occurrences of the i-th patent feature word in the j-th first specified field of the enterprise papers, and c_weight(j) denotes the weight of the j-th first specified field of the enterprise papers.
Optionally, the first specified fields of the enterprise papers include at least one of the following:
The article title of the enterprise paper, the abstract of the enterprise paper, and the keywords of the enterprise paper.
Optionally, the association frequency determining module is specifically configured to:
Determine the association frequency between each patent feature word and the enterprise according to the number of occurrences of each patent feature word in the second specified fields of the enterprise description text and the weight of each second specified field of the enterprise description text.
Optionally, the association frequency determining module is specifically configured to:
Determine the association frequency between each patent feature word and the enterprise based on the following formula:

$$\mathit{r\_tf}_i = \sum_{l \in J} \mathit{r\_tf}_i(l) \times \mathit{r\_weight}(l)$$
Here, r_tf_i denotes the association frequency between the i-th patent feature word and the enterprise, J denotes the set of second specified fields in the enterprise description text, l denotes the l-th second specified field in J, r_tf_i(l) denotes the number of occurrences of the i-th patent feature word in the l-th second specified field of the enterprise description text, and r_weight(l) denotes the weight of the l-th second specified field of the enterprise description text.
Optionally, the second specified fields include at least one of the following:
Board of directors' discussion, development plan priorities, industry technology/key technology, core competitiveness, main products, business scope, risk factors, personnel structure, and basic enterprise information.
Optionally, the relevance determining module is specifically configured to:
Determine the relevance value between the enterprise patent text and the enterprise based on the following formula:
Here, r denotes the relevance value between the enterprise patent text and the enterprise, K denotes the set of patent feature words, w(k_i) denotes the weight of the k_i-th patent feature word in the enterprise patent text, and r_tf(k_i) denotes the association frequency between the k_i-th patent feature word and the enterprise;
The relevance value between the enterprise patent text and the enterprise characterizes the relevance between the enterprise patent text and the enterprise.
It will be understood that each module of the patent-enterprise relevance measurement device in this embodiment has the function of implementing the corresponding step of the patent-enterprise relevance measurement method in the embodiment shown in Fig. 1. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. The modules may be software and/or hardware, and may be implemented separately or with multiple modules integrated. For a description of the functions of the modules of the device, reference may be made to the corresponding description of the patent-enterprise relevance measurement method in the embodiment shown in Fig. 1, and details are not repeated here.
An embodiment of the present application provides an electronic device. As shown in Fig. 4, the electronic device 2000 includes a processor 2001 and a memory 2003. The processor 2001 is connected to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may further include a transceiver 2004. It should be noted that in practical applications the number of transceivers 2004 is not limited to one, and the structure of the electronic device 2000 does not limit the embodiments of the present application.
The processor 2001 is used in the embodiment of the present application to implement the method shown in the above method embodiment. The transceiver 2004 may include a receiver and a transmitter; in the embodiment of the present application it is used to implement communication between the electronic device of the embodiment and other devices.
The processor 2001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. The processor 2001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 2002 may include a path for transferring information between the above components. The bus 2002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation it is drawn as a single thick line in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory 2003 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
Optionally, the memory 2003 stores the application program code for executing the solution of the present application, and its execution is controlled by the processor 2001. The processor 2001 executes the application program code stored in the memory 2003 to implement the patent-enterprise relevance measurement method shown in the above method embodiment.
The electronic device provided by the embodiment of the present application is applicable to any of the above method embodiments, and details are not repeated here.
Compared with the prior art, the electronic device provided by the embodiment of the present application extracts from the enterprise patent text the patent feature words that can characterize it, determines the association frequency between each patent feature word and the enterprise from the patent feature words and the enterprise description text, and determines the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and its association frequency with the enterprise. This avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of judging patent-enterprise relevance.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the patent-enterprise relevance measurement method shown in the above method embodiment.
The computer-readable storage medium provided by the embodiment of the present application is applicable to any of the above method embodiments, and details are not repeated here.
The embodiments of the present application provide a computer-readable storage medium. Compared with the prior art, it extracts from the enterprise patent text the patent feature words that characterize that text, determines the association frequency between each patent feature word and the enterprise from the patent feature words and the enterprise description text, and then determines the relevance between the enterprise patent text and the enterprise from the weight of each patent feature word in the enterprise patent text together with each word's association frequency. This avoids the drawbacks of manual judgment and greatly improves the accuracy and efficiency of judging the relevance between patents and enterprises.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily executed one after another: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A method for measuring the relevance between a patent and an enterprise, comprising:
obtaining patent feature words from an enterprise patent text;
determining a weight of each patent feature word in the enterprise patent text;
determining an association frequency between each patent feature word and the enterprise according to the patent feature words and an enterprise description text of the enterprise to which the enterprise patent text belongs; and
determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
2. The method according to claim 1, wherein the patent feature words comprise base feature words and expansion words corresponding to the base feature words.
3. The method according to claim 1, wherein the enterprise patent text comprises: the patent texts of patents held by the enterprise, the patent texts of patents held by the enterprise before it was renamed, and the patent texts of patents held by branches of the enterprise.
4. The method according to claim 1, wherein determining the weight of each patent feature word in the enterprise patent text comprises:
determining the weight of each patent feature word in the enterprise patent text according to the frequency of each patent feature word in the enterprise patent text and/or the frequency of each patent feature word in the enterprise papers published by the enterprise.
5. The method according to claim 4, wherein determining the weight of each patent feature word in the enterprise patent text according to the frequency of each patent feature word in the enterprise patent text and the frequency of each patent feature word in the enterprise papers of the enterprise comprises:
determining the weight of each patent feature word in the enterprise patent text by the following formula:
$$w_i = \mathit{idf}_i \cdot \left(\mathit{p\_tf}_i + \mathit{c\_tf}_i\right)$$
where $w_i$ denotes the weight of the $i$-th patent feature word in the enterprise patent text, $\mathit{idf}_i$ denotes the inverse document frequency of the $i$-th patent feature word, $\mathit{p\_tf}_i$ denotes the frequency of the $i$-th patent feature word in the enterprise patent text, and $\mathit{c\_tf}_i$ denotes the frequency of the $i$-th patent feature word in the enterprise papers.
6. The method according to claim 4, wherein the enterprise papers comprise: papers published by the enterprise, papers published by the enterprise before it was renamed, and papers published by branches of the enterprise.
7. The method according to claim 5, further comprising:
determining the frequency of each patent feature word in the enterprise papers according to the number of occurrences of each patent feature word in first specified fields of the enterprise papers and the weight of each first specified field in the enterprise papers.
8. The method according to claim 7, wherein determining the frequency of each patent feature word in the enterprise papers according to the number of occurrences of each patent feature word in the first specified fields of the enterprise papers and the weight of each first specified field in the enterprise papers comprises:
determining the frequency of each patent feature word in the enterprise papers by the following formula:
$$\mathit{c\_tf}_i = \sum_{j \in I} \mathit{c\_tf}_i(j) \cdot \mathit{c\_weight}(j)$$
where $\mathit{c\_tf}_i$ denotes the frequency of the $i$-th patent feature word in the enterprise papers, $I$ denotes the set of first specified fields in the enterprise papers, $j$ denotes the $j$-th first specified field in $I$, $\mathit{c\_tf}_i(j)$ denotes the number of occurrences of the $i$-th patent feature word in the $j$-th first specified field of the enterprise papers, and $\mathit{c\_weight}(j)$ denotes the weight of the $j$-th first specified field in the enterprise papers.
9. The method according to claim 7, wherein the first specified fields comprise at least one of the following:
the article titles of the enterprise papers, the abstracts of the enterprise papers, and the keywords of the enterprise papers.
10. The method according to claim 1, wherein determining the association frequency between each patent feature word and the enterprise according to the enterprise description text of the enterprise to which the enterprise patent text belongs and the patent feature words comprises:
determining the association frequency between each patent feature word and the enterprise according to the number of occurrences of each patent feature word in second specified fields of the enterprise description text and the weight of each second specified field of the enterprise description text.
11. The method according to claim 10, wherein determining the association frequency between each patent feature word and the enterprise according to the number of occurrences of each patent feature word in the second specified fields of the enterprise description text and the weight of each second specified field of the enterprise description text comprises:
determining the association frequency between each patent feature word and the enterprise based on the following formula:
$$\mathit{r\_tf}_i = \sum_{l \in J} \mathit{r\_tf}_i(l) \cdot \mathit{r\_weight}(l)$$
where $\mathit{r\_tf}_i$ denotes the association frequency between the $i$-th patent feature word and the enterprise, $J$ denotes the set of second specified fields in the enterprise description text, $l$ denotes the $l$-th second specified field in $J$, $\mathit{r\_tf}_i(l)$ denotes the number of occurrences of the $i$-th patent feature word in the $l$-th second specified field of the enterprise description text, and $\mathit{r\_weight}(l)$ denotes the weight of the $l$-th second specified field in the enterprise description text.
12. The method according to claim 11, wherein the second specified fields comprise at least one of the following:
board of directors' discussion, enterprise development project focus, industry technology_key technology, core competitiveness, major products, business scope, risk areas, personnel structure, and basic enterprise information.
13. The method according to any one of claims 1 to 12, wherein determining the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise comprises:
determining a relevance value between the enterprise patent text and the enterprise based on the following formula:
$$r = \sum_{k_i \in K} w(k_i) \cdot \mathit{r\_tf}(k_i)$$
where $r$ denotes the relevance value between the enterprise patent text and the enterprise, $K$ denotes the set of patent feature words, $w(k_i)$ denotes the weight of the patent feature word $k_i$ in the enterprise patent text, and $\mathit{r\_tf}(k_i)$ denotes the association frequency between the patent feature word $k_i$ and the enterprise;
the relevance value between the enterprise patent text and the enterprise characterizes the relevance between the enterprise patent text and the enterprise.
14. A device for determining the relevance between a patent and an enterprise, comprising:
a patent feature word obtaining module, configured to obtain patent feature words from an enterprise patent text;
a weight determining module, configured to determine a weight of each patent feature word in the enterprise patent text;
an association frequency determining module, configured to determine an association frequency between each patent feature word and the enterprise according to the patent feature words and an enterprise description text of the enterprise to which the enterprise patent text belongs; and
a relevance determining module, configured to determine the relevance between the enterprise patent text and the enterprise based on the weight of each patent feature word in the enterprise patent text and the association frequency between each patent feature word and the enterprise.
15. An electronic device, comprising a processor and a memory, wherein:
the memory is configured to store operation instructions; and
the processor is configured to execute, by calling the operation instructions, the method for measuring the relevance between a patent and an enterprise according to any one of claims 1 to 13.
16. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for measuring the relevance between a patent and an enterprise according to any one of claims 1 to 13.
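The quantities defined in claims 5, 8, 11 and 13 can be computed as in the Python sketch below. It is a hypothetical illustration under stated assumptions, not the applicant's code. The sketch takes $\mathit{c\_tf}_i(j)$ and $\mathit{r\_tf}_i(l)$ to be raw token counts summed over all enterprise papers or description records, and treats feature words as single tokens. It also combines each field-level count as a weighted sum over the specified fields. The field names follow claims 9 and 12, but every weight and idf value is invented, because the claims fix no numbers.

```python
import re
from typing import Iterable, List, Mapping

# Hypothetical field weights: the claims name the fields but give no values.
PAPER_FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0, "keywords": 2.0}
DESCRIPTION_FIELD_WEIGHTS = {
    "major_products": 2.0,
    "core_competitiveness": 1.5,
    "business_scope": 1.0,
}


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; assumes single-token feature words."""
    return re.findall(r"\w+", text.lower())


def field_weighted_count(
    word: str,
    documents: Iterable[Mapping[str, str]],
    field_weights: Mapping[str, float],
) -> float:
    """Sum over fields of (occurrences of `word` in that field across all documents) x field weight.

    With paper fields this plays the role of c_tf_i; with description fields, r_tf_i.
    """
    word = word.lower()
    docs = list(documents)  # materialize so generators can be scanned once per field
    total = 0.0
    for field, weight in field_weights.items():
        occurrences = sum(tokenize(doc.get(field, "")).count(word) for doc in docs)
        total += occurrences * weight
    return total


def feature_weight(idf: float, p_tf: float, c_tf: float) -> float:
    """Claim 5: w_i = idf_i * (p_tf_i + c_tf_i)."""
    return idf * (p_tf + c_tf)


def relevance_value(
    words: Iterable[str], weights: Mapping[str, float], assoc: Mapping[str, float]
) -> float:
    """Relevance value r: sum over feature words k of w(k) * r_tf(k)."""
    return sum(weights[k] * assoc[k] for k in words)


if __name__ == "__main__":
    papers = [{
        "title": "Lithium battery anode",
        "abstract": "A battery with a new anode. The battery lasts.",
        "keywords": "battery; anode",
    }]
    description = [{
        "major_products": "power battery systems",
        "core_competitiveness": "battery management",
        "business_scope": "battery sales",
    }]
    words = ["battery", "anode"]
    p_tf = {"battery": 2, "anode": 1}  # hypothetical counts in the enterprise patent text
    idf = {"battery": 0.5, "anode": 1.0}  # hypothetical idf values
    c_tf = {k: field_weighted_count(k, papers, PAPER_FIELD_WEIGHTS) for k in words}
    r_tf = {k: field_weighted_count(k, description, DESCRIPTION_FIELD_WEIGHTS) for k in words}
    w = {k: feature_weight(idf[k], p_tf[k], c_tf[k]) for k in words}
    print(c_tf)  # {'battery': 7.0, 'anode': 6.0}
    print(r_tf)  # {'battery': 4.5, 'anode': 0.0}
    print(relevance_value(words, w, r_tf))  # 20.25
```

In this toy run "anode" has a large weight but contributes nothing to r, because it never appears in the description fields. The relevance value is unnormalized, so its scale depends on how many fields are used and on the weights chosen for them.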

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811466764.4A CN109558481B (en) 2018-12-03 2018-12-03 Method, device and equipment for measuring correlation between patent and enterprise and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811466764.4A CN109558481B (en) 2018-12-03 2018-12-03 Method, device and equipment for measuring correlation between patent and enterprise and readable storage medium

Publications (2)

Publication Number Publication Date
CN109558481A 2019-04-02
CN109558481B 2022-05-24

Family

ID=65868432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811466764.4A Active CN109558481B (en) 2018-12-03 2018-12-03 Method, device and equipment for measuring correlation between patent and enterprise and readable storage medium

Country Status (1)

Country Link
CN (1) CN109558481B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028651A1 (en) * 2001-07-31 2003-02-06 Schreckengast James O. Proprietary information utility
WO2005041094A1 (en) * 2003-10-23 2005-05-06 Intellectual Property Bank Corp. Enterprise evaluation device and enterprise evaluation program
CN101563687A (en) * 2006-10-13 2009-10-21 Google Inc. Business listing search
CN103885934A (en) * 2014-02-19 2014-06-25 China Patent Information Center Method for automatically extracting key phrases of patent documents
CN104376406A (en) * 2014-11-05 2015-02-25 Shanghai Development Center of Computer Software Technology Enterprise innovation resource management and analysis system and method based on big data
CN106446129A (en) * 2016-09-19 2017-02-22 Hefei Qingzhuo Information Technology Co., Ltd. Patent data analysis system
CN107133726A (en) * 2017-04-20 2017-09-05 Beijing Institute of Technology Method for evaluating the competitiveness of product schemes based on patent information
CN107193915A (en) * 2017-05-15 2017-09-22 Beijing Yinguoshu Network Technology Co., Ltd. Enterprise information classification method and device
CN107247806A (en) * 2017-07-04 2017-10-13 Shandong Inspur Cloud Service Information Technology Co., Ltd. Patent big data analysis and enterprise application platform
CN107844478A (en) * 2017-11-20 2018-03-27 Shandong Inspur Cloud Service Information Technology Co., Ltd. Patent document processing method and device
CN108563636A (en) * 2018-04-04 2018-09-21 Guangzhou GCI Science & Technology Co., Ltd. Method, apparatus, device and storage medium for extracting text keywords
CN108804421A (en) * 2018-05-28 2018-11-13 Institute of Science and Technology Information of China Text similarity analysis method and device, electronic device and computer storage medium

Non-Patent Citations (3)

Title
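ANDREW BUCHANAN: "Measuring the Evolution of the Drivers of Technological Innovation in the Patent Record", Artificial Life *
LIU Pingfen: "Executive Pay Gap, Product Market Competition and Enterprise Technological Innovation", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's) *
CHEN Jixi et al.: "Method for Determining Similar Patents Based on the Vector Space Model and Patent Document Features", Journal of Zhejiang University (Engineering Science) *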

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114331766A (en) * 2022-01-05 2022-04-12 Institute of Science and Technology Information of China Method and device for determining patent technology core degree, electronic equipment and storage medium
CN114331766B (en) * 2022-01-05 2022-07-08 Institute of Science and Technology Information of China Method and device for determining patent technology core degree, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109558481B (en) 2022-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant