Disclosure of Invention
The invention provides an intellectual property information processing method and device that address the technical problem of intelligently processing intellectual property information, so that an enterprise can learn the current state of technical development and patent layout in advance.
The invention realizes intelligent processing of intellectual property information through the following steps:
step one, constructing a patent information base: a multi-way tree index of class numbers is constructed according to the international classification table; a storage space is allocated for each class number to store the corresponding class number semantic vector; keywords of each class number are extracted in advance from the definition of the class number and stored in its class number semantic vector, thereby constructing a class number index table; the class number semantic vectors in the class number index table are supplemented according to patent information crawled from a patent database; and a patent semantic vector is generated for each crawled patent by a text vector generation method, thereby generating the patent information base;
step two, crawling the enterprise's own patent information from the patent database and forming a self patent information base by the method of step one;
step three, setting competitor information and crawling the competitors' patent information from the patent database to form a competitor patent information base;
step four, cross-comparing the self patent information base and the competitor patent information base according to the class number index to obtain the direction similarity and the patent similarity; the cross comparison comprises traversing the patents under the same class number in the two information bases and calculating the patent similarity from the patent semantic vectors, and calculating the direction similarity from the class number semantic vectors of the same class number in the two information bases;
step five, visualizing the ranking of the direction similarities, so that a user can adjust the research and development direction according to the ranking; and, based on the patent similarity, giving an expected value for the grant of each patent, to which the user can refer in deciding how to handle the patent.
The method further comprises, for a technical proposal to be submitted, extracting a text vector of the proposal, calculating its similarity to the semantic vector of each class number, taking the class numbers whose similarity exceeds a certain threshold as recommended class numbers for the proposal, and determining whether to file the proposal as a patent application according to the direction similarity of the recommended class numbers.
With the intellectual property information processing method and device of the invention, the patent layouts of an enterprise and its competitors are analyzed from two angles, class numbers and individual patents; the calculated direction similarity and patent similarity guide the enterprise in adjusting its research and development direction and patent layout; and the likelihood that a technical solution to be submitted will be granted can be analyzed intelligently, so that the enterprise avoids wasting unnecessary application costs. Intelligent processing and analysis of intellectual property information is thereby realized.
Detailed Description
The embodiments are described in detail below with reference to the accompanying drawings.
The method of the invention is shown in the flow chart of figure 1:
Step one, constructing a patent information base: a multi-way tree index of class numbers is constructed according to the international classification table; a storage space is allocated for each class number to store the corresponding class number semantic vector; keywords of each class number are extracted in advance from the definition of the class number and stored in its class number semantic vector, thereby constructing a class number index table; the class number semantic vectors in the class number index table are supplemented according to patent information crawled from a patent database; and a patent semantic vector is generated for each crawled patent by a text vector generation method, thereby generating the patent information base.
When the multi-way tree index based on the IPC classification table is established, each root node indexes one section of the classification: a root node is created for each section, and the tree is subdivided in the order of the IPC class, subclass, main group, and subgroup. A corresponding class number semantic vector is generated for each node and updated in real time, so that it can guide the subsequent classification of other patent information. The class number index table may also be constructed from only selected parts of the classification table as needed, determined by the enterprise's specific research field, which reduces the amount of data to be processed.
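The tree construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `ipc_path`, `ClassNode`, and `ClassIndex` are hypothetical, symbols are assumed to be written in the usual IPC notation (for example `G06F 16/35`), and a plain keyword-to-weight dictionary stands in for the class number semantic vector.

```python
import re
from dataclasses import dataclass, field

# Assumed symbol layout, e.g. "G06F 16/35":
# section G, class G06, subclass G06F, main group G06F 16/00, subgroup G06F 16/35.
_IPC = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_path(symbol):
    """Split an IPC symbol into node keys from section down to subgroup."""
    m = _IPC.match(symbol.strip().upper())
    if m is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, sub, main, group = m.groups()
    subclass = f"{section}{cls}{sub}"
    path = [section, f"{section}{cls}", subclass, f"{subclass} {main}/00"]
    if group != "00":  # ".../00" denotes the main group itself
        path.append(f"{subclass} {main}/{group}")
    return path


@dataclass
class ClassNode:
    key: str
    # keyword -> weight; stand-in for the class number semantic vector
    semantic_vector: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


class ClassIndex:
    """Multi-way tree with one root node per IPC section."""

    def __init__(self, sections=None):
        self.roots = {}
        # Optional restriction to the sections relevant to the enterprise.
        self.sections = set(sections) if sections else None

    def insert(self, symbol):
        """Create all nodes on the path to `symbol`; return the deepest node."""
        path = ipc_path(symbol)
        if self.sections is not None and path[0] not in self.sections:
            return None  # outside the selected sections, so not indexed
        level, node = self.roots, None
        for key in path:
            node = level.setdefault(key, ClassNode(key))
            level = node.children
        return node

    def find(self, symbol):
        """Return the node for `symbol`, or None if it is not indexed."""
        level, node = self.roots, None
        for key in ipc_path(symbol):
            node = level.get(key)
            if node is None:
                return None
            level = node.children
        return node
```

Restricting `sections` at construction time corresponds to building the index table from only part of the classification table, which keeps the tree small for an enterprise active in a few fields.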
When the class number semantic vectors are supplemented, the class number information given in each patent is used: the semantic vector of the corresponding class number is supplemented and updated with the keywords of the patent, and the weight of each keyword in the semantic vector is adjusted according to the keyword's source. The keyword sources comprise the abstract, the background art, the claims, and the description. Because the abstract relates to the inventive points of the patent and the background art reflects the field to which the patent belongs, higher weights are set for keywords extracted from the abstract and the background art.
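The source-dependent weighting can be illustrated with a short sketch. The weight values below are hypothetical; the text only states that keywords from the abstract and the background art receive higher weights than those from the claims and the description. The function name `update_class_vector` and the dictionary representation of the semantic vector are likewise assumptions.

```python
from collections import Counter

# Hypothetical weights: only the ordering (abstract and background higher)
# comes from the description; the numbers are illustrative.
SOURCE_WEIGHTS = {
    "abstract": 2.0,
    "background": 2.0,
    "claims": 1.0,
    "description": 1.0,
}


def update_class_vector(class_vector, patent_keywords):
    """Add one patent's keywords to a class number semantic vector in place.

    class_vector:    dict mapping keyword -> accumulated weight
    patent_keywords: dict mapping source name -> list of extracted keywords
    """
    for source, keywords in patent_keywords.items():
        weight = SOURCE_WEIGHTS[source]
        for keyword, count in Counter(keywords).items():
            class_vector[keyword] = class_vector.get(keyword, 0.0) + weight * count
    return class_vector
```

For example, a keyword that appears once in the abstract contributes 2.0 to the vector, while the same keyword appearing once in the claims contributes 1.0.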
The text vector generation method may be any of various known text vector generation methods, such as a neural network or doc2vec.
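As a self-contained stand-in for such a method, the sketch below builds a fixed-length vector by hashing word counts. This is deliberately simpler than a neural network or doc2vec and is not the method named in the text; it only shows the interface the later steps rely on, namely text in and a fixed-length numeric vector out. The name `text_vector` and the dimension of 64 are assumptions.

```python
import math
import re
import zlib


def text_vector(text, dim=64):
    """Return an L2-normalized hashed bag-of-words vector of length `dim`."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

A learned embedding would capture meaning that word hashing cannot, so this stand-in is suitable only for illustrating the data flow.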
Step two, the enterprise's own patent information is crawled from the patent database, and a self patent information base is formed by the method of step one.
The self patent information base comprises a self class number index table and patent information formed from the self patent semantic vectors; the class number semantic vectors in the self class number index table are updated according to the enterprise's own patent information.
Step three, competitor information is set, the competitors' patent information is crawled from the patent database, and a competitor patent information base is formed by the method of step one.
The competitor patent information base comprises a competitor class number index table and patent information formed from the competitor patent semantic vectors; the class number semantic vectors in the competitor class number index table are updated according to the competitor patent information.
Step four, the self patent information base and the competitor patent information base are cross-compared according to the class number index table to obtain the direction similarity and the patent similarity. The cross comparison comprises traversing the patents under the same class number in the two information bases and calculating the patent similarity from the patent semantic vectors, and taking the class number semantic vectors of the same class number in the two information bases and calculating the direction similarity from them.
For example, under class number A, the self patent information base contains patents a1, b1, and c1, and the competitor patent information base contains patents a2 and b2. The patent semantic vectors of the pairs (a1, a2), (a1, b2), (b1, a2), (b1, b2), (c1, a2), and (c1, b2) are compared, and the resulting similarities define the patent similarity. The similarity between the semantic vector of class number A in the self patent information base and the semantic vector of class number A in the competitor patent information base is calculated and defines the direction similarity.
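The example for class number A can be sketched in a few lines. The text does not name the similarity measure, so cosine similarity is assumed here; the function names and the dictionary layout of the two information bases are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cross_compare(own, competitor):
    """Compare the entries of two information bases for the same class number.

    Each argument: {"class_vector": [...], "patents": {patent_id: vector}}.
    Returns (direction_similarity, {(own_id, competitor_id): similarity}).
    """
    patent_similarity = {
        (p, q): cosine(own["patents"][p], competitor["patents"][q])
        for p, q in product(own["patents"], competitor["patents"])
    }
    direction_similarity = cosine(own["class_vector"], competitor["class_vector"])
    return direction_similarity, patent_similarity


# Class number A with toy two-dimensional vectors.
own_a = {"class_vector": [3.0, 4.0],
         "patents": {"a1": [1.0, 0.0], "b1": [0.0, 1.0], "c1": [1.0, 1.0]}}
competitor_a = {"class_vector": [4.0, 3.0],
                "patents": {"a2": [1.0, 0.0], "b2": [0.0, 2.0]}}
direction, pairs = cross_compare(own_a, competitor_a)
```

With three own patents and two competitor patents, `pairs` holds the six pairwise similarities listed above, and `direction` is the single class-level value for A.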
The direction similarity can be divided into main-group (large group) similarity and subgroup (small group) similarity, where main group and subgroup refer to the corresponding levels of the classification table. Depending on how far the competitors' research fields overlap with the enterprise's own, the parameters can be adjusted dynamically to calculate either the main-group similarity or the subgroup similarity, which gives the enterprise finer guidance on its research and development direction.
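One way to switch between the two granularities is sketched below. It assumes that a main-group vector is obtained by summing the keyword weights of its subgroups and that cosine similarity is the measure; neither choice is stated in the text. The function names and the `level` parameter are hypothetical.

```python
import math
from collections import defaultdict


def to_main_group(symbol):
    """Map 'G06F 16/35' to its main group 'G06F 16/00' (assumed IPC notation)."""
    return symbol.split("/")[0] + "/00"


def cosine(u, v):
    """Cosine similarity of two sparse keyword -> weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_vectors(class_vectors, level):
    """Return the vectors as stored (subgroup) or merged per main group."""
    if level == "subgroup":
        return dict(class_vectors)
    merged = defaultdict(lambda: defaultdict(float))
    for symbol, vec in class_vectors.items():
        for keyword, weight in vec.items():
            merged[to_main_group(symbol)][keyword] += weight
    return {symbol: dict(vec) for symbol, vec in merged.items()}


def direction_similarity(own, competitor, level="subgroup"):
    """Direction similarity for every class number present in both bases."""
    a = group_vectors(own, level)
    b = group_vectors(competitor, level)
    return {symbol: cosine(a[symbol], b[symbol]) for symbol in a.keys() & b.keys()}
```

At subgroup level only class numbers that both parties hold are compared; at main-group level, neighbouring subgroups are merged first, so overlap that is invisible at the finer level can still appear.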
Step five, the ranking of the direction similarities is visualized, so that a user can adjust the research and development direction according to the ranking; and, based on the patent similarity, an expected value for the grant of each patent is given, to which the user can refer in deciding how to handle the patent.
In step three, the competitor information can be dynamically adjusted.
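The data behind such a view might be prepared as in the sketch below. The ranking follows directly from the text. The grant expectation formula does not: the text gives no formula, so the sketch uses one minus the highest similarity to any competitor patent purely as a placeholder, and both function names are hypothetical.

```python
def rank_directions(direction_similarity):
    """Class numbers sorted from most to least similar to the competitor."""
    return sorted(direction_similarity.items(), key=lambda kv: kv[1], reverse=True)


def grant_expectation(patent_similarity):
    """Placeholder score per own patent: 1 - highest similarity to any
    competitor patent. The actual mapping is not given in the description."""
    highest = {}
    for (own_id, _competitor_id), similarity in patent_similarity.items():
        highest[own_id] = max(highest.get(own_id, 0.0), similarity)
    return {own_id: 1.0 - similarity for own_id, similarity in highest.items()}
```

The ranked list can be passed to any charting tool; the input to `grant_expectation` is a dictionary keyed by (own patent, competitor patent) pairs.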
In another embodiment, the invention further comprises, for a technical solution to be submitted, extracting a text vector of the solution, calculating its similarity to the semantic vector of each class number, taking the class numbers whose similarity exceeds a certain threshold as recommended class numbers for the solution, and determining whether to file the solution as a patent application according to the direction similarity of the recommended class numbers.
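The recommendation step can be sketched as follows, again assuming cosine similarity and a hypothetical threshold of 0.5. Because the text does not state the rule for the final filing decision, the sketch only collects the values a user would consult and leaves the decision open. The name `recommend_classes` is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend_classes(proposal_vector, class_vectors, direction_similarity,
                      threshold=0.5):
    """Return (class number, proposal similarity, direction similarity) for
    every class number whose similarity to the proposal exceeds `threshold`,
    best match first. Direction similarity is None where none was computed."""
    scored = [(symbol, cosine(proposal_vector, vector))
              for symbol, vector in class_vectors.items()]
    return [(symbol, score, direction_similarity.get(symbol))
            for symbol, score in sorted(scored, key=lambda item: item[1], reverse=True)
            if score > threshold]
```

A high direction similarity for a recommended class number indicates a direction in which the competitor's layout already resembles the enterprise's own, which is the information the filing decision draws on.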
In one embodiment of the present invention as shown in fig. 2, an intellectual property information processing apparatus includes the following modules:
the information base construction module is used for constructing a patent information base: a multi-way tree index of class numbers is constructed according to the international classification table; a storage space is allocated for each class number to store the corresponding class number semantic vector; keywords of each class number are extracted in advance from the definition of the class number and stored in its class number semantic vector, thereby constructing a class number index table; the class number semantic vectors in the class number index table are supplemented according to patent information crawled from a patent database; and a patent semantic vector is generated for each crawled patent by a text vector generation method, thereby generating the patent information base;
the self patent information base generation module is used for crawling the enterprise's own patent information from the patent database and forming a self patent information base by the method of step one;
the competitor patent information base generation module is used for setting competitor information and crawling the competitors' patent information from the patent database to form a competitor patent information base;
the similarity calculation module is used for cross-comparing the self patent information base and the competitor patent information base according to the class number index to obtain the direction similarity and the patent similarity; the cross comparison comprises traversing the patents under the same class number in the two information bases and calculating the patent similarity from the patent semantic vectors, and calculating the direction similarity from the class number semantic vectors of the same class number in the two information bases;
the visualization module is used for visualizing the ranking of the direction similarities, so that a user can adjust the research and development direction according to the ranking; and, based on the patent similarity, for giving an expected value for the grant of each patent, to which the user can refer in deciding how to handle the patent.
When the information base construction module supplements the class number semantic vectors, it uses the class number information given in each patent: the semantic vector of the corresponding class number is supplemented and updated with the keywords of the patent, and the weight of each keyword in the semantic vector is adjusted according to the keyword's source. The keyword sources comprise the abstract, the background art, the claims, and the description, and higher weights are set for keywords extracted from the abstract and the background art.
The competitor information in the competitor patent information base generation module can be dynamically adjusted.
The direction similarity in the similarity calculation module can be divided into main-group (large group) similarity and subgroup (small group) similarity.
In another embodiment, the device further includes an analysis module configured, for a technical solution to be submitted, to extract a text vector of the solution, calculate its similarity to the semantic vector of each class number, take the class numbers whose similarity exceeds a certain threshold as recommended class numbers for the solution, and determine whether to file the solution as a patent application according to the direction similarity of the recommended class numbers.
The above embodiments are merely preferred embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.