Summary of the Invention
The primary object of the present invention is to provide a method and a device for calculating company similarity, with the aim of accurately calculating the similarity between different companies.
To achieve the above object, the present invention provides a method for calculating company similarity, the method comprising the following steps:
If a similarity calculation request for two companies is received, calculating the similarity of the two companies under each information dimension according to the child node positions of the two companies in a pre-established information structure tree of each information dimension; wherein the information structure tree is a tree with a main node and child nodes, established for each predetermined information dimension from company information obtained from a predetermined database;
Calculating, according to the similarity of the two companies under each information dimension and based on a predetermined similarity calculation rule, the overall similarity of the two companies under all information dimensions, and taking the overall similarity as the similarity between the two companies.
Preferably, the step of calculating the similarity of the two companies under each information dimension comprises:
In the information structure tree of one information dimension, calculating the number of child nodes and/or main nodes contained in the node path from the child node position of one of the two companies to the child node position of the other company, and taking the calculated number of nodes as the similarity of the two companies under that information dimension.
Preferably, the predetermined similarity calculation rule is:
Performing a weighted calculation on the similarities of the two companies under each information dimension according to predetermined similarity weights, to obtain the overall similarity of the two companies under all information dimensions.
Preferably, the predetermined similarity calculation rule is:
Averaging the similarities of the two companies under each information dimension to obtain the overall similarity of the two companies under all information dimensions.
Preferably, the predetermined database includes a company information database of the administration for industry and commerce and/or a company information database of a stock exchange; the information dimensions include industry classification, holding relationship and/or personnel structure.
In addition, to achieve the above object, the present invention also provides a device for calculating company similarity, the device comprising:
A first computing module, configured to, if a similarity calculation request for two companies is received, calculate the similarity of the two companies under each information dimension according to the child node positions of the two companies in a pre-established information structure tree of each information dimension; wherein the information structure tree is a tree with a main node and child nodes, established for each predetermined information dimension from company information obtained from a predetermined database;
A second computing module, configured to calculate, according to the similarity of the two companies under each information dimension and based on a predetermined similarity calculation rule, the overall similarity of the two companies under all information dimensions, and to take the overall similarity as the similarity between the two companies.
Preferably, the first computing module is further configured to:
In the information structure tree of one information dimension, calculate the number of child nodes and/or main nodes contained in the node path from the child node position of one of the two companies to the child node position of the other company, and take the calculated number of nodes as the similarity of the two companies under that information dimension.
Preferably, the predetermined similarity calculation rule is:
Performing a weighted calculation on the similarities of the two companies under each information dimension according to predetermined similarity weights, to obtain the overall similarity of the two companies under all information dimensions.
Preferably, the predetermined similarity calculation rule is:
Averaging the similarities of the two companies under each information dimension to obtain the overall similarity of the two companies under all information dimensions.
Preferably, the predetermined database includes a company information database of the administration for industry and commerce and/or a company information database of a stock exchange; the information dimensions include industry classification, holding relationship and/or personnel structure.
With the method and device for calculating company similarity proposed by the present invention, when a similarity calculation request for two companies is received, the overall similarity of the two companies under all information dimensions is calculated from their similarities under each information dimension, to obtain the similarity between the two companies. Because the calculation does not rely only on the characters and words of the two company names, but also uses the similarity of the two companies under each of the different information dimensions, the overall similarity is calculated in greater depth and more accurately, which improves the accuracy of inter-company similarity calculation.
Detailed Description of the Embodiments
To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The present invention provides a method for calculating company similarity.
Referring to Fig. 1, Fig. 1 is a flow diagram of one embodiment of the method for calculating company similarity of the present invention.
In one embodiment, the method for calculating company similarity comprises:
Step S10: if a similarity calculation request for two companies is received, calculate the similarity of the two companies under each information dimension according to the child node positions of the two companies in the pre-established information structure tree of each information dimension; wherein the information structure tree is a tree with a main node and child nodes, established for each predetermined information dimension from company information obtained from a predetermined database;
In this embodiment, a request sent by a user to calculate the similarity of different companies is received. For example, the request may be one sent after the user enters relevant information (for example, the names of the companies whose similarity is to be calculated) on a terminal such as a mobile phone, tablet computer or self-service terminal device; one sent after the user enters relevant information in a company similarity calculation application (APP) pre-installed on such a terminal; or one sent after the user enters relevant information in a browser on such a terminal.
It should be noted that, for ease of description, this embodiment is described only with the example of calculating the similarity of two companies. The process of calculating similarity for more than two companies can follow the process for two companies and is not repeated here.
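As a rough illustration of how the two-company process might be reused for several companies, the sketch below applies a two-company similarity function to every unordered pair. The function names and the placeholder scores are assumptions made for illustration; the text above only states that the multi-company case can follow the two-company process.

```python
from itertools import combinations

def pairwise_similarities(companies, similarity):
    """Apply a two-company similarity function to every unordered pair."""
    return {(a, b): similarity(a, b) for a, b in combinations(companies, 2)}

# Placeholder scores standing in for the real two-company calculation.
PLACEHOLDER_SCORES = {
    frozenset({"Company A", "Company B"}): 0.9,
    frozenset({"Company A", "Company C"}): 0.4,
    frozenset({"Company B", "Company C"}): 0.3,
}

def placeholder_similarity(a, b):
    """Hypothetical stand-in: look the pair up in the placeholder table."""
    return PLACEHOLDER_SCORES[frozenset({a, b})]

results = pairwise_similarities(
    ["Company A", "Company B", "Company C"], placeholder_similarity
)
```

Each key of `results` is one pair of companies, so three companies yield three pairwise similarities.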
After the similarity calculation request for the two companies is received, the similarity of the two companies under each information dimension can be calculated according to their child node positions in the pre-established information structure tree of each information dimension. For example, the length of the node path between the two companies, the number of nodes the path passes through, and the like can be calculated from those child node positions and used as the similarity of the two companies under each information dimension. The information structure tree is established as follows:
Company information (for example, company name, industry, main business scope, personnel structure information, shareholder information) is obtained from a predetermined database (for example, the company information database of the administration for industry and commerce, or the company information database of a stock exchange). According to the company information obtained, an information structure tree with a main node and child nodes is established for each predetermined information dimension (for example, industry classification, holding relationship, personnel structure).
For example, the specific process of establishing an information structure tree with a main node and child nodes for the different information dimensions is as follows:
1. For the industry classification dimension, the industry classification dimension is the main node, and the tree structure division rule under it is: all industries are divided into a preset number of industry categories, which are the first-level child nodes; each industry category is subdivided level by level into sub-industries, which are the lower-level child nodes. For example, if the construction industry is a main node, the real estate industry, the bridge construction industry and so on are child nodes of the construction industry; the industrial real estate industry, the commercial real estate industry and so on are child nodes of the real estate industry; and the last-level child nodes are the individual companies. Each next-level child node is a branch of its upper-level main node or upper-level child node.
2. For the holding relationship dimension, the holding relationship dimension is the main node, and the tree structure division rule under it is: the next-level child nodes of the main node are first-level companies that are not controlled by any other company; the next-level child nodes of a first-level company are the second-level companies it controls; the next-level child nodes of a second-level company are the third-level companies it controls; and lower-level child nodes are extended in the same way. A company jointly controlled by multiple companies can serve as a common node that forms upper-level and lower-level node relationships with each of its controlling companies.
3. For the personnel structure dimension, the personnel structure dimension is the main node, and the tree structure division rule under it is: the next-level child nodes of the main node are first-level managers who hold predetermined positions in the companies (for example, middle and senior positions such as chief executive officer, chief financial officer, sales director, executive director and executive vice president); the next-level child nodes of a first-level manager are the first-level companies in which that manager holds office; the next-level child nodes of a first-level company are the second-level managers who hold the predetermined positions within it, excluding the first-level manager of the upper-level node; the next-level child nodes of a second-level manager are the second-level companies in which that manager holds office; the next-level child nodes of a second-level company are the third-level managers who hold any management position within it, excluding the first-level and second-level managers of the upper-level nodes; the next-level child nodes of a third-level manager are the third-level companies in which that manager holds office; and lower-level child nodes are extended in the same way for all positions.
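The division rules above can be sketched for the industry classification dimension as a child-to-parent map. This is a minimal sketch under stated assumptions: the representation, the function names `build_tree` and `ancestors`, and the company names are all hypothetical; only the industry names follow the example in the text.

```python
# Parent-pointer representation of an information structure tree.
# Company names are illustrative and do not come from any database.
INDUSTRY_ROOT = "industry classification"  # the main node of this dimension

def build_tree(paths):
    """Build a child -> parent map from paths that run from just below
    the main node down to a company (the last-level child node)."""
    parent = {INDUSTRY_ROOT: None}
    for path in paths:
        upper = INDUSTRY_ROOT
        for node in path:
            parent.setdefault(node, upper)  # keep the first parent seen
            upper = node
    return parent

def ancestors(parent, node):
    """Return the chain from a node up to the main node, inclusive."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

industry_tree = build_tree([
    ["construction", "real estate", "industrial real estate", "Company A"],
    ["construction", "real estate", "commercial real estate", "Company B"],
    ["construction", "bridge construction", "Company C"],
])
```

A parent-pointer map is used here only because it makes walking from a company up to the main node straightforward; a strict tree is assumed, which fits the industry classification dimension but not a jointly controlled company in the holding relationship dimension.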
Step S20: according to the similarity of the two companies under each information dimension, and based on a predetermined similarity calculation rule, calculate the overall similarity of the two companies under all information dimensions, and take the overall similarity as the similarity between the two companies.
After the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, the overall similarity of the two companies under all information dimensions can be calculated based on the predetermined similarity calculation rule. For example, if the two companies belong to the same industry classification, the overall similarity between them, that is, the final similarity of the two companies, can be determined with emphasis on their similarity under the industry classification dimension.
In this embodiment, when a similarity calculation request for two companies is received, the overall similarity of the two companies under all information dimensions is calculated from their similarities under each information dimension, to obtain the similarity between the two companies. Because the calculation does not rely only on the characters and words of the two company names, but also uses the similarity of the two companies under each of the different information dimensions, the overall similarity is calculated in greater depth and more accurately, which improves the accuracy of inter-company similarity calculation.
Further, in other embodiments, the above step of calculating the similarity of the two companies under each information dimension can comprise:
In the information structure tree of one information dimension, calculating the number of child nodes and/or main nodes contained in the node path from the child node position of one of the two companies to the child node position of the other company, and taking the calculated number of nodes as the similarity of the two companies under that information dimension.
In this embodiment, to obtain the similarity of the two companies under each information dimension, the child node positions of the two companies in the information structure tree of the same information dimension are first obtained, and the degree of association of the two companies in that tree is then calculated from those positions. For example, in the information structure tree of one information dimension, the number of child nodes and/or main nodes contained in the node path from the child node position of one company to that of the other can be calculated. This number reflects how closely the two companies are associated in the tree, and therefore how similar they are under that information dimension: the more nodes on the path, the lower the degree of association and the lower the similarity under that dimension; the fewer nodes, the higher the degree of association and the higher the similarity.
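The node count described above can be sketched as follows for a strict tree stored as a child-to-parent map. The function name, the sample tree and the choice to count both endpoint companies are assumptions; the text does not say whether the endpoints are included in the count.

```python
def path_node_count(parent, a, b):
    """Count the nodes on the path between a and b in a tree given as a
    child -> parent map. Both endpoints are counted (an assumption)."""
    def chain(node):
        out = []
        while node is not None:
            out.append(node)
            node = parent[node]
        return out

    up_a, up_b = chain(a), chain(b)
    steps_from_b = {node: i for i, node in enumerate(up_b)}
    for i, node in enumerate(up_a):
        if node in steps_from_b:  # first shared node: lowest common ancestor
            return i + steps_from_b[node] + 1
    raise ValueError("nodes are not in the same tree")

# Illustrative tree for the industry classification dimension.
parent = {
    "industry classification": None,
    "construction": "industry classification",
    "real estate": "construction",
    "bridge construction": "construction",
    "Company A": "real estate",
    "Company B": "real estate",
    "Company C": "bridge construction",
}

same_branch = path_node_count(parent, "Company A", "Company B")
other_branch = path_node_count(parent, "Company A", "Company C")
```

In this sample, Company A and Company B share the real estate node, so their path contains fewer nodes than the path from Company A to Company C, which has to climb to the construction node.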
Further, in other embodiments, the predetermined similarity calculation rule is:
Performing a weighted calculation on the similarities of the two companies under each information dimension according to predetermined similarity weights, to obtain the overall similarity of the two companies under all information dimensions.
In this embodiment, after the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, those similarities can be weighted according to predetermined similarity weights to calculate the overall similarity of the two companies under all information dimensions. For example, a similarity weight can be set in advance for each information dimension. If the companies whose similarity is to be calculated include both companies in the same industry and companies in different industries, the similarity weight of the industry classification dimension can be set to the largest weight. In this way, when overall similarity is calculated, the similarity between companies in the same industry is identified more accurately, companies in different industries are not mistakenly calculated as highly similar, and the accuracy of company similarity calculation is further improved.
In one embodiment, the overall similarity of the two companies under all information dimensions can be calculated from their similarities under each information dimension by the following similarity formula:
where S(V) denotes the overall similarity of the two companies under all information dimensions, H_i denotes the predetermined similarity weight of the i-th information dimension, and V_i denotes the similarity of the two companies under the i-th information dimension.
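The formula itself does not appear in this text. A minimal sketch of the weighted calculation follows, assuming it is a plain weighted sum of the per-dimension similarities V_i with weights H_i, that is, S(V) = H_1*V_1 + H_2*V_2 + ... The function name, dimension keys and numeric values are illustrative only.

```python
def overall_similarity(similarities, weights):
    """Weighted sum of per-dimension similarities: sum of H_i * V_i.
    Assumed form; the original formula is not reproduced in the text."""
    if similarities.keys() != weights.keys():
        raise ValueError("each information dimension needs exactly one weight")
    return sum(weights[dim] * similarities[dim] for dim in similarities)

# V_i: illustrative per-dimension similarities of two companies.
V = {"industry": 0.8, "holding": 0.5, "personnel": 0.2}
# H_i: illustrative weights, with the industry dimension weighted highest.
H = {"industry": 0.5, "holding": 0.3, "personnel": 0.2}

score = overall_similarity(V, H)
```

Giving the industry classification dimension the largest weight, as the text suggests, makes that dimension dominate the overall score.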
Further, in other embodiments, the predetermined similarity calculation rule is:
Averaging the similarities of the two companies under each information dimension to obtain the overall similarity of the two companies under all information dimensions.
In this embodiment, after the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, those similarities can be averaged to calculate the overall similarity of the two companies under all information dimensions. Using the average of the per-dimension similarities as the overall similarity balances the influence of each information dimension on the final result, makes the resulting overall similarity more accurate and reasonable, and in turn improves the accuracy of company similarity calculation.
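The averaging rule can be sketched in a few lines. The function name and the sample values are illustrative assumptions; an arithmetic mean is assumed because the text does not name another kind of average.

```python
from statistics import fmean

def average_similarity(similarities):
    """Arithmetic mean of the per-dimension similarities."""
    if not similarities:
        raise ValueError("at least one information dimension is required")
    return fmean(similarities.values())

per_dimension = {"industry": 0.8, "holding": 0.5, "personnel": 0.2}
overall = average_similarity(per_dimension)
```

Every dimension contributes equally here, which is the balancing effect the paragraph above describes.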
The present invention further provides a device for calculating company similarity.
Referring to Fig. 2, Fig. 2 is a functional block diagram of one embodiment of the device for calculating company similarity of the present invention.
In one embodiment, the device for calculating company similarity comprises:
A first computing module 01, configured to, if a similarity calculation request for two companies is received, calculate the similarity of the two companies under each information dimension according to the child node positions of the two companies in the pre-established information structure tree of each information dimension; wherein the information structure tree is a tree with a main node and child nodes, established for each predetermined information dimension from company information obtained from a predetermined database;
In this embodiment, a request sent by a user to calculate the similarity of different companies is received. For example, the request may be one sent after the user enters relevant information (for example, the names of the companies whose similarity is to be calculated) on a terminal such as a mobile phone, tablet computer or self-service terminal device; one sent after the user enters relevant information in a company similarity calculation application (APP) pre-installed on such a terminal; or one sent after the user enters relevant information in a browser on such a terminal.
It should be noted that, for ease of description, this embodiment is described only with the example of calculating the similarity of two companies. The process of calculating similarity for more than two companies can follow the process for two companies and is not repeated here.
After the similarity calculation request for the two companies is received, the similarity of the two companies under each information dimension can be calculated according to their child node positions in the pre-established information structure tree of each information dimension. For example, the length of the node path between the two companies, the number of nodes the path passes through, and the like can be calculated from those child node positions and used as the similarity of the two companies under each information dimension. The information structure tree is established as follows:
Company information (for example, company name, industry, main business scope, personnel structure information, shareholder information) is obtained from a predetermined database (for example, the company information database of the administration for industry and commerce, or the company information database of a stock exchange). According to the company information obtained, an information structure tree with a main node and child nodes is established for each predetermined information dimension (for example, industry classification, holding relationship, personnel structure).
For example, the specific process of establishing an information structure tree with a main node and child nodes for the different information dimensions is as follows:
1. For the industry classification dimension, the industry classification dimension is the main node, and the tree structure division rule under it is: all industries are divided into a preset number of industry categories, which are the first-level child nodes; each industry category is subdivided level by level into sub-industries, which are the lower-level child nodes. For example, if the construction industry is a main node, the real estate industry, the bridge construction industry and so on are child nodes of the construction industry; the industrial real estate industry, the commercial real estate industry and so on are child nodes of the real estate industry; and the last-level child nodes are the individual companies. Each next-level child node is a branch of its upper-level main node or upper-level child node.
2. For the holding relationship dimension, the holding relationship dimension is the main node, and the tree structure division rule under it is: the next-level child nodes of the main node are first-level companies that are not controlled by any other company; the next-level child nodes of a first-level company are the second-level companies it controls; the next-level child nodes of a second-level company are the third-level companies it controls; and lower-level child nodes are extended in the same way. A company jointly controlled by multiple companies can serve as a common node that forms upper-level and lower-level node relationships with each of its controlling companies.
3. For the personnel structure dimension, the personnel structure dimension is the main node, and the tree structure division rule under it is: the next-level child nodes of the main node are first-level managers who hold predetermined positions in the companies (for example, middle and senior positions such as chief executive officer, chief financial officer, sales director, executive director and executive vice president); the next-level child nodes of a first-level manager are the first-level companies in which that manager holds office; the next-level child nodes of a first-level company are the second-level managers who hold the predetermined positions within it, excluding the first-level manager of the upper-level node; the next-level child nodes of a second-level manager are the second-level companies in which that manager holds office; the next-level child nodes of a second-level company are the third-level managers who hold any management position within it, excluding the first-level and second-level managers of the upper-level nodes; the next-level child nodes of a third-level manager are the third-level companies in which that manager holds office; and lower-level child nodes are extended in the same way for all positions.
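In the holding relationship dimension, a jointly controlled company has more than one upper-level node, so the structure is not strictly a tree and there can be more than one path between two companies. One possible way to handle this, assumed here and not stated in the text, is to take the path with the fewest nodes using a breadth-first search. The function name, the edge list and the company names are hypothetical.

```python
from collections import deque

def shortest_path_nodes(edges, start, goal):
    """Fewest nodes on any path from start to goal, endpoints included.
    Edges are (upper-level node, lower-level node) pairs, walked in
    either direction. Returns None if the two nodes are not connected."""
    neighbours = {}
    for upper, lower in edges:
        neighbours.setdefault(upper, set()).add(lower)
        neighbours.setdefault(lower, set()).add(upper)
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    return None

# Illustrative holding relationships; "Joint Venture" is a common node
# controlled by both Parent X and Parent Y.
holding_edges = [
    ("holding relationship", "Parent X"),
    ("holding relationship", "Parent Y"),
    ("Parent X", "Joint Venture"),
    ("Parent Y", "Joint Venture"),
    ("Parent X", "Subsidiary X1"),
]

parents_count = shortest_path_nodes(holding_edges, "Parent X", "Parent Y")
sibling_count = shortest_path_nodes(holding_edges, "Subsidiary X1", "Joint Venture")
```

Here the two parents are linked both through the main node and through their joint venture; the search simply returns the smaller node count.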
A second computing module 02, configured to calculate, according to the similarity of the two companies under each information dimension and based on a predetermined similarity calculation rule, the overall similarity of the two companies under all information dimensions, and to take the overall similarity as the similarity between the two companies.
After the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, the overall similarity of the two companies under all information dimensions can be calculated based on the predetermined similarity calculation rule. For example, if the two companies belong to the same industry classification, the overall similarity between them, that is, the final similarity of the two companies, can be determined with emphasis on their similarity under the industry classification dimension.
In this embodiment, when a similarity calculation request for two companies is received, the overall similarity of the two companies under all information dimensions is calculated from their similarities under each information dimension, to obtain the similarity between the two companies. Because the calculation does not rely only on the characters and words of the two company names, but also uses the similarity of the two companies under each of the different information dimensions, the overall similarity is calculated in greater depth and more accurately, which improves the accuracy of inter-company similarity calculation.
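The division of work between the two modules can be sketched as two small classes composed into a device. The class names, method names and placeholder per-dimension functions are assumptions for illustration; the text describes the modules functionally and does not prescribe an implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

@dataclass
class FirstComputingModule:
    """Computes one similarity per information dimension."""
    # dimension name -> function(company_a, company_b) -> similarity
    per_dimension: Mapping[str, Callable[[str, str], float]]

    def compute(self, a: str, b: str) -> Dict[str, float]:
        return {dim: fn(a, b) for dim, fn in self.per_dimension.items()}

@dataclass
class SecondComputingModule:
    """Combines per-dimension similarities using a predetermined rule."""
    rule: Callable[[Mapping[str, float]], float]

    def compute(self, similarities: Mapping[str, float]) -> float:
        return self.rule(similarities)

@dataclass
class SimilarityDevice:
    first: FirstComputingModule
    second: SecondComputingModule

    def handle_request(self, a: str, b: str) -> float:
        """Serve one similarity calculation request for two companies."""
        return self.second.compute(self.first.compute(a, b))

device = SimilarityDevice(
    FirstComputingModule({
        "industry": lambda a, b: 0.75,  # placeholder per-dimension values
        "holding": lambda a, b: 0.25,
    }),
    SecondComputingModule(lambda sims: sum(sims.values()) / len(sims)),
)
overall = device.handle_request("Company A", "Company B")
```

Passing the rule in as a function lets the same second module apply either the weighted rule or the averaging rule.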
Further, in other embodiments, the above first computing module 01 can also be configured to:
In the information structure tree of one information dimension, calculate the number of child nodes and/or main nodes contained in the node path from the child node position of one of the two companies to the child node position of the other company, and take the calculated number of nodes as the similarity of the two companies under that information dimension.
In this embodiment, to obtain the similarity of the two companies under each information dimension, the child node positions of the two companies in the information structure tree of the same information dimension are first obtained, and the degree of association of the two companies in that tree is then calculated from those positions. For example, in the information structure tree of one information dimension, the number of child nodes and/or main nodes contained in the node path from the child node position of one company to that of the other can be calculated. This number reflects how closely the two companies are associated in the tree, and therefore how similar they are under that information dimension: the more nodes on the path, the lower the degree of association and the lower the similarity under that dimension; the fewer nodes, the higher the degree of association and the higher the similarity.
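The text uses the node count itself as the per-dimension similarity, with a larger count meaning lower similarity. If a score that rises with similarity is wanted before weighting or averaging, one possible mapping, which is an assumption of this sketch and not part of the described method, is the reciprocal of the node count.

```python
def count_to_similarity(node_count):
    """Map a path node count (>= 1) to a score in (0, 1].
    Fewer nodes on the path give a higher score. Hypothetical helper."""
    if node_count < 1:
        raise ValueError("a path contains at least one node")
    return 1.0 / node_count

close_pair = count_to_similarity(3)    # short path between two companies
distant_pair = count_to_similarity(5)  # longer path between two companies
```

Any strictly decreasing function of the node count would preserve the ordering described above; the reciprocal is chosen only for simplicity.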
Further, in other embodiments, the predetermined similarity calculation rule is:
Performing a weighted calculation on the similarities of the two companies under each information dimension according to predetermined similarity weights, to obtain the overall similarity of the two companies under all information dimensions.
In this embodiment, after the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, those similarities can be weighted according to predetermined similarity weights to calculate the overall similarity of the two companies under all information dimensions. For example, a similarity weight can be set in advance for each information dimension. If the companies whose similarity is to be calculated include both companies in the same industry and companies in different industries, the similarity weight of the industry classification dimension can be set to the largest weight. In this way, when overall similarity is calculated, the similarity between companies in the same industry is identified more accurately, companies in different industries are not mistakenly calculated as highly similar, and the accuracy of company similarity calculation is further improved.
In one embodiment, the overall similarity of the two companies under all information dimensions can be calculated from their similarities under each information dimension by the following similarity formula:
where S(V) denotes the overall similarity of the two companies under all information dimensions, H_i denotes the predetermined similarity weight of the i-th information dimension, and V_i denotes the similarity of the two companies under the i-th information dimension.
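The formula referred to above is not reproduced in this text. A form consistent with the definitions of S(V), H_i and V_i, offered as an assumption rather than as the original formula, is a weighted sum over n information dimensions, where n is a symbol introduced only for this sketch:

```latex
S(V) = \sum_{i=1}^{n} H_i \, V_i
```

Under this assumed form, choosing equal weights H_i = 1/n gives the arithmetic mean of the V_i, so the averaging rule would be a special case of the weighted rule.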
Further, in other embodiments, the predetermined similarity calculation rule is:
Averaging the similarities of the two companies under each information dimension to obtain the overall similarity of the two companies under all information dimensions.
In this embodiment, after the similarity of the two companies under each information dimension has been calculated from their child node positions in the pre-established information structure tree of each information dimension, those similarities can be averaged to calculate the overall similarity of the two companies under all information dimensions. Using the average of the per-dimension similarities as the overall similarity balances the influence of each information dimension on the final result, makes the resulting overall similarity more accurate and reasonable, and in turn improves the accuracy of company similarity calculation.
It should be noted that, in this document, the terms "comprise" and "include", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to perform the methods described in the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the drawings, and the scope of rights of the present invention is not limited thereby. The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Those skilled in the art can implement the present invention in many variations without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to obtain a further embodiment. Any modification, equivalent replacement or improvement made within the technical concept of the present invention shall fall within the scope of rights of the present invention.