CN107248023B - Method and device for screening benchmarking enterprise list - Google Patents

Publication number
CN107248023B
Authority
CN
China
Prior art keywords
enterprise
core
sequence
feature
ordered
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710344912.4A
Other languages
Chinese (zh)
Other versions
CN107248023A (en)
Inventor
赵璐
戴光华
郭林海
丁春明
王芙萍
张冰洁
曹思佳
施敬思
王瑞
曹印杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Minsheng Banking Corp Ltd
Original Assignee
China Minsheng Banking Corp Ltd
Application filed by China Minsheng Banking Corp Ltd
Priority to CN201710344912.4A
Publication of CN107248023A
Application granted
Publication of CN107248023B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
    • G06Q10/063 Operations research or analysis
    • G06Q10/0639 Performance analysis
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6201 Matching; Proximity measures
    • G06K9/6215 Proximity measures, i.e. similarity or distance measures

Abstract

The embodiment of the invention provides a method and a device for screening a benchmarking enterprise list. Specifically, preset ordered element sequences are extracted from the business scopes of a plurality of comparison enterprises, and the ordered element sequences are de-duplicated and segmented to obtain the core elements and modification elements of those business scopes; a plurality of feature clusters are constructed from the core elements and the modification elements, and cluster measurement is performed on each feature cluster to obtain its metric value; an enterprise feature matrix is constructed for the target enterprise to be benchmarked, the matrix comprising a core element vector and a modification element vector of the target enterprise; the similarity between the target enterprise and each comparison enterprise is calculated from the enterprise feature matrix and the metric values; and the benchmarking enterprise list is determined from the comparison enterprises according to the similarity, thereby providing a benchmarking enterprise list for industry benchmarking analysis.

Description

Method and device for screening benchmarking enterprise list
Technical Field
The invention relates to the technical field of industry benchmarking, in particular to a method and a device for screening a benchmarking enterprise list.
Background
Industry benchmarking analysis is a scientific and advanced enterprise management method in which an enterprise takes excellent enterprises inside or outside its industry as benchmarks, compares, analyzes and judges itself against them from various aspects, and remedies its own shortcomings by learning from their advanced experience, thereby catching up with and overtaking the benchmark enterprises in a virtuous cycle of continuously pursuing excellent performance.
With the development of productivity, the business elements of enterprises have become increasingly rich, so the situations that industry benchmarking analysis must handle have become increasingly complicated. For example, some enterprises are fully diversified, with relatively balanced development across the industries they are involved in and no obvious main business, so the analysis tools and indexes defined under a single industry item, including industry mean values, industry cycles and the like, cannot analyze them accurately. As another example, the continuous subdivision of industries and the crossover between subdivided industries have produced diversified, differentiated and mixed business forms; enterprises may belong to the same national-standard subdivided industry and yet differ in operating and financial characteristics. As a further example, the business content of some enterprises corresponds to several national-standard industries depending on processing depth: producing ammonia belongs to chemical raw material production, while processing it further into chemical fertilizer belongs to chemical fertilizer production.
Meanwhile, some industries are classified separately under the national standard yet are related in actual operation and finance. The traditional industry benchmarking analysis method, which is based on a single national-standard industry classification item, cannot bring these industries and the enterprises in them together for analysis. On the one hand this wastes data; on the other hand, when samples are insufficient, data from similar enterprises in other industries cannot be borrowed for analysis.
These factors reduce the utility of industry benchmarking analysis, so how to screen the benchmarking enterprise list has become a difficulty that must be solved before industry benchmarking analysis can be carried out.
Disclosure of Invention
In view of this, the present invention provides a method and a device for screening a benchmarking enterprise list, which are used to provide a benchmarking enterprise list for industry benchmarking analysis.
To achieve this purpose, the invention discloses a method for screening a benchmarking enterprise list, comprising the following steps:
extracting preset ordered element sequences from the business scopes of a plurality of comparison enterprises, de-duplicating and segmenting the ordered element sequences to obtain core elements and modification elements of the business scopes of the comparison enterprises, and de-duplicating the ordered element sequences of other operating features of the enterprises (discrete features such as products and raw materials) to obtain other operating feature elements of the comparison enterprises;
constructing a plurality of feature clusters from the core elements, the modification elements and the other operating feature elements, and performing cluster measurement on each feature cluster to obtain a metric value of each feature cluster;
constructing an enterprise feature matrix of a target enterprise to be benchmarked, wherein the enterprise feature matrix comprises a core element vector, a modification element vector and other operating feature element vectors of the target enterprise;
calculating the similarity between the target enterprise and each comparison enterprise according to the enterprise feature matrix and the metric values;
and determining a benchmarking enterprise list from the plurality of comparison enterprises according to the similarity.
Optionally, extracting the preset ordered element sequences from the business scopes of the plurality of comparison enterprises, and de-duplicating and segmenting the ordered element sequences to obtain the core elements and the modification elements of the business scopes of the comparison enterprises, includes:
simplifying the ordered element sequence by removing meaningless characters from it;
performing a first segmentation on the simplified ordered element sequence according to preset separators;
performing a second segmentation on the ordered element sequence obtained by the first segmentation, in which elements of more than two characters are word-segmented;
and classifying the ordered element sequence obtained by the second segmentation to obtain the core elements and the modification elements.
Optionally, extracting the preset ordered element sequences from the business scopes of the plurality of comparison enterprises, and de-duplicating and segmenting the ordered element sequences to obtain the core elements and the modification elements of the business scopes of the comparison enterprises, further includes:
adjusting the groups of the ordered elements according to part of speech.
Optionally, constructing the plurality of feature clusters from the core elements, the modification elements and the other operating feature elements, and performing cluster measurement on each feature cluster to obtain the metric value of each feature cluster, includes:
de-duplicating the core elements and the modification elements separately to form a core element sequence and a modification element sequence of the comparison enterprises, and constructing layer matrices from the core element sequence and the modification element sequence; de-duplicating the ordered elements of other operating features of the enterprises (discrete features such as products and raw materials) to obtain other operating feature element sequences of the comparison enterprises, and constructing layer matrices from those sequences, wherein the layer matrices comprise a core element layer matrix, a modification element layer matrix and other operating feature element matrices;
clustering the layer matrices according to a preset clustering rule to obtain a plurality of feature clusters;
adjusting the feature clusters;
and calculating the metric value of each feature cluster according to a preset calculation rule.
Optionally, determining the benchmarking enterprise list from the plurality of comparison enterprises according to the similarity includes:
searching the plurality of comparison enterprises according to a preset similarity threshold, and selecting the enterprises whose similarity is greater than the similarity threshold to form the benchmarking enterprise list;
or ranking the comparison enterprises by their similarity to the target enterprise, and picking a preset number of enterprises from the ranked list to form the benchmarking enterprise list.
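The two selection options above can be sketched as follows. This is a minimal illustration, assuming the similarities have already been computed and are held in a dictionary keyed by enterprise name; the function names and sample values are hypothetical and do not come from the patent.

```python
def select_by_threshold(similarities, threshold):
    """Keep every comparison enterprise whose similarity exceeds the threshold."""
    return [name for name, s in similarities.items() if s > threshold]


def select_top_n(similarities, n):
    """Rank comparison enterprises by similarity (descending) and keep the first n."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical similarity scores between the target and four comparison enterprises.
sims = {"A": 0.91, "B": 0.42, "C": 0.77, "D": 0.65}
by_threshold = select_by_threshold(sims, 0.7)
top_two = select_top_n(sims, 2)
```

The threshold variant returns a list of variable length, while the ranking variant always returns at most `n` enterprises, which may be preferable when few enterprises are similar to the target.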
Correspondingly, to ensure the implementation of the method, the invention also provides a device for screening a benchmarking enterprise list, which comprises:
a data extraction module, used for extracting preset ordered element sequences from the business scopes of a plurality of comparison enterprises, and de-duplicating and segmenting the ordered element sequences to obtain core elements and modification elements of the business scopes of the comparison enterprises; and for extracting preset ordered elements from other operating features of the enterprises (discrete features such as products and raw materials) and de-duplicating the ordered element sequences to obtain other operating feature element sequences of the enterprises;
a feature cluster construction module, used for constructing a plurality of feature clusters from the core elements, the modification elements and the other operating feature elements, and performing cluster measurement on each feature cluster to obtain a metric value of each feature cluster;
a feature matrix construction module, used for constructing an enterprise feature matrix of a target enterprise to be benchmarked, wherein the enterprise feature matrix comprises a core element vector, a modification element vector and other operating feature element vectors of the target enterprise;
a similarity calculation module, used for calculating the similarity between the target enterprise and each comparison enterprise according to the enterprise feature matrix and the metric values;
and a benchmarking enterprise list determining module, used for determining a benchmarking enterprise list from the comparison enterprises according to the similarity.
Optionally, the data extraction module includes:
a sequence simplifying unit, used for simplifying the ordered element sequence by removing meaningless characters from it;
a first segmentation unit, used for performing a first segmentation on the simplified ordered element sequence according to preset separators;
a second segmentation unit, used for performing a second segmentation on the ordered element sequence obtained by the first segmentation, in which elements of more than two characters are word-segmented;
and a classification processing unit, used for classifying the ordered element sequence obtained by the second segmentation to obtain the core elements and the modification elements.
Optionally, the data extraction module further includes:
a group adjusting unit, used for adjusting the groups of the ordered elements according to part of speech.
Optionally, the feature cluster construction module includes:
a de-duplication processing unit, used for de-duplicating the core elements and the modification elements separately to form a core element sequence and a modification element sequence of the comparison enterprises, and constructing layer matrices from the core element sequence and the modification element sequence; and for de-duplicating the ordered elements of other operating features of the enterprises (discrete features such as products and raw materials) to obtain other operating feature element sequences of the comparison enterprises, and constructing layer matrices from those sequences, wherein the layer matrices comprise a core element layer matrix, a modification element layer matrix and other operating feature element matrices;
a clustering processing unit, used for clustering the layer matrices according to a preset clustering rule to obtain a plurality of feature clusters;
a cluster adjusting unit, used for adjusting the feature clusters;
and a metric value calculating unit, used for calculating the metric value of each feature cluster according to a preset calculation rule.
Optionally, the benchmarking enterprise list determining module includes:
a first determining unit, used for searching the plurality of comparison enterprises according to a preset similarity threshold, and selecting the enterprises whose similarity is greater than the similarity threshold to form the benchmarking enterprise list;
and a second determining unit, used for ranking the comparison enterprises by their similarity to the target enterprise, and picking a preset number of enterprises from the ranked list to form the benchmarking enterprise list.
The invention thus provides a method and a device for screening a benchmarking enterprise list. Specifically, preset ordered element sequences are extracted from the business scopes of a plurality of comparison enterprises, and the ordered element sequences are de-duplicated and segmented to obtain the core elements and modification elements of those business scopes; a plurality of feature clusters are constructed from the core elements and the modification elements, and cluster measurement is performed on each feature cluster to obtain its metric value; an enterprise feature matrix is constructed for the target enterprise to be benchmarked, the matrix comprising a core element vector and a modification element vector of the target enterprise; the similarity between the target enterprise and each comparison enterprise is calculated from the enterprise feature matrix and the metric values; and the benchmarking enterprise list is determined from the comparison enterprises according to the similarity, thereby providing a benchmarking enterprise list for industry benchmarking analysis.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the steps of an embodiment of the method for screening a benchmarking enterprise list provided by the present invention;
FIG. 2 is a structural block diagram of an embodiment of the device for screening a benchmarking enterprise list provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and the device for screening a benchmarking enterprise list are based on the following existing information:
a) business scope data of a plurality of enterprises, from which an enterprise business scope database is constructed;
b) a plurality of common words and a part-of-speech lookup table for those words;
c) operating feature data of the plurality of enterprises, such as products and raw materials, from which an enterprise operating feature database is constructed.
Example one
Fig. 1 is a flowchart of the steps of an embodiment of the method for screening a benchmarking enterprise list provided by the present invention.
The execution subject of this embodiment is an electronic computing device. The device may be an application located in a local terminal, or a functional unit such as a plug-in or a Software Development Kit (SDK) located in an application of the local terminal; this embodiment of the present invention does not particularly limit this.
It should be understood that the application may be a native application (native app) installed on the terminal, or a web application (web app) running in a browser on the terminal; this embodiment of the present invention does not limit this.
As shown in fig. 1, the method for screening a benchmarking enterprise list provided in this embodiment specifically includes the following steps:
S101: extracting preset business scope elements of a plurality of enterprises.
On the basis of the business scope data of a plurality of enterprises, the business scope elements of the enterprises are extracted, and the ordered element sequences describing the business scopes are integrated into an element matrix comprising a core element layer and a modification element layer. The specific process is as follows. First, the business scope elements are simplified: parts without practical meaning are removed, including auxiliary words, prepositions and bracketed text (in round, square or curly brackets).
For example, for the business scope "Production, processing, selling computer electronic equipment and telecommunication electrical components (excluding commodities specially regulated by the state). Education, tourism industry and other industry investment.", the simplified result is: "Production, processing, selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investment."
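The simplification step can be sketched as below. This is only an illustration on English text: the regular expression, the `simplify` name and the optional `stop_words` parameter are assumptions of this sketch, and a real implementation would operate on the original Chinese business scope with its own list of auxiliary words and prepositions.

```python
import re

# Matches one bracketed span (round, square or curly) plus any space before it.
BRACKETED = re.compile(r"\s*[(\[{][^)\]}]*[)\]}]")


def simplify(scope, stop_words=()):
    """Drop bracketed text and any caller-supplied filler words from a business scope."""
    text = BRACKETED.sub("", scope)
    for word in stop_words:  # hypothetical list, an assumption of this sketch
        text = re.sub(rf"\b{re.escape(word)}\b\s*", "", text)
    return text.strip()


raw = ("Production, processing, selling computer electronic equipment and "
       "telecommunication electrical components (excluding commodities specially "
       "regulated by the state). Education, tourism industry and other industry investment.")
simplified = simplify(raw)
```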
Next, the simplified business scope is segmented. Conjunctions (such as "and" and "or") and punctuation marks (such as commas and enumeration commas) in the business scope serve as separators for the first segmentation, which yields the ordered element I sequence and the initial group of each element.
Segmentation proceeds from left to right. If the current split is made at a punctuation mark that differs from the punctuation mark of the previous split, the elements after the current separator do not belong to the same group as the elements before it.
For example, for the business scope "Production, processing, selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investment.", the ordered element I sequence formed by segmentation is:
[production, processing, selling computer electronic equipment, telecommunication electrical components, education, tourism industry, other industry investment]
The group number of each element is:
[1,1,1,1,2,2,2].
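The first segmentation and the group numbering can be sketched as follows for English text. Two points are assumptions of this sketch rather than statements of the patent: the separator set (comma, full stop, "and", "or"), and the choice to forget the previous punctuation mark whenever a new group starts, which is what makes the sketch reproduce the group numbers of the example above.

```python
import re

PUNCTUATION = {",", "."}
# Capturing group so that re.split also returns the separators themselves.
SEPARATOR = re.compile(r"(,|\.|\band\b|\bor\b)")


def first_split(scope):
    """Split a simplified business scope into elements and assign initial groups."""
    elements, groups = [], []
    group, last_punct = 1, None
    for token in SEPARATOR.split(scope):
        if token in PUNCTUATION:
            if last_punct is not None and token != last_punct:
                group += 1         # punctuation changed: start a new group
                last_punct = None  # assumption: comparison restarts in the new group
            else:
                last_punct = token
        elif token not in ("and", "or") and token.strip():
            elements.append(token.strip())
            groups.append(group)
    return elements, groups


scope = ("production, processing, selling computer electronic equipment and "
         "telecommunication electrical components. education, tourism industry "
         "and other industry investment.")
elements_1, groups_1 = first_split(scope)
```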
Then the ordered element I sequence obtained by the first segmentation is segmented again: every element of more than two characters in the ordered element I sequence is word-segmented, keeping the longest word segmentation result, to give the ordered element II sequence.
For example:
for the business scope "Production, processing, selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investment.", the ordered element I sequence is [production, processing, selling computer electronic equipment, telecommunication electrical components, education, tourism industry, other industry investment] and the corresponding group numbers are [1,1,1,1,2,2,2]. After the second segmentation, the ordered element II sequence is:
[production, processing, sales, computer electronic equipment, telecommunication electrical components, education, tourism industry, other industries, investment]
The group numbers of the corresponding elements are:
[1,1,1,1,1,2,2,2,2].
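One way to keep the longest word segmentation result is greedy forward maximum matching against a vocabulary. The sketch below works on space-separated English words and uses a small hypothetical vocabulary; the patent does not say which word segmenter is used, so this is an illustrative stand-in only. Each piece inherits the group number of the element it came from.

```python
def second_split(elements, groups, vocabulary):
    """Greedy longest-match segmentation of each element into known phrases."""
    out_elements, out_groups = [], []
    for element, group in zip(elements, groups):
        words = element.split()
        i = 0
        while i < len(words):
            # Try the longest remaining span first; fall back to a single word.
            for j in range(len(words), i, -1):
                phrase = " ".join(words[i:j])
                if phrase in vocabulary or j == i + 1:
                    out_elements.append(phrase)
                    out_groups.append(group)  # pieces keep the parent's group
                    i = j
                    break
    return out_elements, out_groups


# Hypothetical vocabulary of known multi-word phrases.
vocab = {"computer electronic equipment", "telecommunication electrical components",
         "tourism industry", "other industry"}
elements_1 = ["production", "processing", "selling computer electronic equipment",
              "telecommunication electrical components", "education",
              "tourism industry", "other industry investment"]
groups_1 = [1, 1, 1, 1, 2, 2, 2]
elements_2, groups_2 = second_split(elements_1, groups_1, vocab)
```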
Finally, the ordered element sequence obtained by the two segmentations is classified by element property. Each element in the ordered element II sequence is matched against the historical element library to obtain its property; if the element does not exist in the historical element library, it is classified according to its part of speech, specifically:
verbs and verb phrases are classified as core elements;
other words and phrases are classified as modification elements.
For example:
after the second segmentation, the ordered element II sequence is:
[production, processing, sales, computer electronic equipment, telecommunication electrical components, investment, education, tourism industry, other industries]
After element property classification, the property sequence of ordered element III is:
[core, core, core, modification, modification, core, modification, modification, modification]
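The classification rule can be sketched as a lookup with a part-of-speech fallback. The `history` dictionary stands in for the historical element library and `part_of_speech` for the part-of-speech table; both, together with their contents, are hypothetical placeholders for this sketch.

```python
def classify(elements, history, part_of_speech):
    """Label each element 'core' or 'modification'."""
    labels = []
    for element in elements:
        if element in history:                       # known element: reuse stored property
            labels.append(history[element])
        elif part_of_speech.get(element) == "verb":  # verbs and verb phrases
            labels.append("core")
        else:                                        # all other words and phrases
            labels.append("modification")
    return labels


elements_2 = ["production", "processing", "sales", "computer electronic equipment",
              "telecommunication electrical components", "investment", "education",
              "tourism industry", "other industries"]
history = {"production": "core"}  # hypothetical library entry
pos = {"processing": "verb", "sales": "verb", "investment": "verb", "education": "noun"}
labels_3 = classify(elements_2, history, pos)
```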
In addition, the element groups may be adjusted.
The elements of the ordered element III sequence are scanned one by one after property classification. If the property of the current element differs from that of the previous element, the two elements belong to the same group, and the consecutive same-group elements before the current element already contain both the "core" and the "modification" property, then the current element and the following elements of that group are moved into another group.
For example:
after the second segmentation, the ordered element II sequence is:
[production, processing, sales, computer electronic equipment, telecommunication electrical components, investment, education, tourism industry, other industries]
After element property classification, the property sequence of ordered element III is:
[core, core, core, modification, modification, core, modification, modification, modification]
The group numbers of the corresponding elements are:
[1,1,1,1,1,1,1,1,1]
After the element group adjustment, the group numbers of the elements of ordered element IV are:
[1,1,1,1,1,2,2,2,2]
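The group adjustment rule can be sketched as a single left-to-right scan. The function name and the way new group numbers are allocated (one more than the largest existing number) are assumptions of this sketch; the input corresponds to a property sequence of three core elements, two modification elements, one core element and three modification elements, all initially in group 1.

```python
def adjust_groups(labels, groups):
    """Split a group where a property change follows a run that already mixes both properties."""
    if not labels:
        return []
    groups = list(groups)
    next_group = max(groups) + 1  # fresh label; not necessarily consecutive in order
    run_labels = {labels[0]}      # properties seen so far in the current same-group run
    for i in range(1, len(labels)):
        if groups[i] != groups[i - 1]:
            run_labels = {labels[i]}  # a new group starts a new run
            continue
        if labels[i] != labels[i - 1] and len(run_labels) == 2:
            old, j = groups[i], i
            while j < len(groups) and groups[j] == old:
                groups[j] = next_group  # move current and following same-group elements
                j += 1
            next_group += 1
            run_labels = {labels[i]}
        else:
            run_labels.add(labels[i])
    return groups


labels_3 = ["core"] * 3 + ["modification"] * 2 + ["core"] + ["modification"] * 3
groups_4 = adjust_groups(labels_3, [1] * 9)
```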
S102: constructing a plurality of feature clusters from the core elements and the modification elements.
After the feature clusters are constructed, a metric value is further calculated for each feature cluster. The specific steps are as follows.
First, the core elements and the modification elements in the ordered element IV sequences constructed from all enterprise business scopes are de-duplicated separately to form a core element sequence A and a modification element sequence B. The core element sequence has $k_a$ elements in total, each denoted $a_i$; the modification element sequence has $k_b$ elements, each denoted $b_i$.
A core element layer matrix $M_A$ of size $k_a \times k_a$ is constructed, in which each entry $a_{i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the core element sequence occur together: if the two elements appear in the same group of the ordered element IV sequence for $n$ enterprise business scopes, then $a_{i,j} = n$.
A modification element layer matrix $M_B$ of size $k_b \times k_b$ is constructed, in which each entry $b_{i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the modification element sequence occur together: if the two elements appear in the same group of the ordered element IV sequence for $n$ enterprise business scopes, then $b_{i,j} = n$.
The other operating feature elements, such as products and raw materials, are de-duplicated to form the operating feature element sequences $S_1, S_2, \dots$, the $n$-th of which is called the $n$-th operating feature element sequence (for example, $S_1$ corresponds to products and $S_2$ to raw materials). The $n$-th operating feature element sequence has $k_{sn}$ elements, each denoted $s_{n,i}$.
An operating feature layer matrix $M_{Sn}$ of size $k_{sn} \times k_{sn}$ is constructed, in which each entry $s_{n,i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the $n$-th operating feature element sequence occur together: if the two elements appear together in the corresponding operating feature (such as products or raw materials) of $m$ enterprises, then $s_{n,i,j} = m$.
In the above construction, each $a_i$ can be understood as a point and each $a_{i,j}$ as the edge between $a_i$ and $a_j$; the larger $a_{i,j}$ is, the more tightly $a_i$ and $a_j$ are connected. A graph of points and edges can thus be constructed.
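The construction of a layer matrix can be sketched as a co-occurrence count. In this illustration each enterprise is given as a list of groups (sets of elements); a pair of elements is counted once per enterprise in which it shares a group. Leaving the diagonal at zero is an assumption of the sketch, since the text does not define the diagonal entries.

```python
from itertools import combinations


def cooccurrence_matrix(sequence, enterprises):
    """sequence: de-duplicated elements; enterprises: per enterprise, a list of groups (sets)."""
    index = {element: i for i, element in enumerate(sequence)}
    k = len(sequence)
    matrix = [[0] * k for _ in range(k)]
    for groups in enterprises:
        pairs = set()  # count each pair at most once per enterprise
        for group in groups:
            present = sorted(index[e] for e in group if e in index)
            pairs.update(combinations(present, 2))
        for i, j in pairs:
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


core_sequence = ["production", "processing", "sales"]
enterprises = [
    [{"production", "processing"}],
    [{"production", "processing", "sales"}],
    [{"production"}, {"sales"}],  # different groups: no co-occurrence
]
m_a = cooccurrence_matrix(core_sequence, enterprises)
```

The same function can be applied to the modification element sequence or to an operating feature sequence by passing the corresponding de-duplicated sequence.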
Next, the layer matrices are clustered according to a preset clustering rule to obtain a plurality of initial feature clusters.
Take a random number sequence $V_n$ whose elements are denoted $n_i$ and whose length is $k_n$, with $k_n < k_a$. The sequence must satisfy the following requirement:
for any two elements $n_i$ and $n_j$ of $V_n$ and any natural number $l < k_a$, the core elements corresponding to $n_i$ and $n_j$ are not connected via a third element $a_l$ in the core matrix $M_A$.
$V_n$ is then the initial cluster-centre sequence of the core element layer. Each of its elements is a cluster centre in the core element layer: $n_i$ is the centre of core element group $i$, and the corresponding $a_{n_i}$ belongs to core element group $i$.
For any $a_l$ with $0 < l < k_a$: if $a_{n_i,l} > 0$ for some centre $n_i$, then $a_l$ belongs to core element group $i$; if $a_{n_i,l} = 0$ for every centre $n_i$, then one $j$ is chosen at random among those satisfying $a_{l,j} > 0$, and $a_l$ is assigned to the $j$-th core element group.
Following these steps gives the cluster attribution sequence $G_a$ of the core elements, whose elements are $g_{a,i}$, where $g_{a,i}$ is the number of the group to which $a_i$ belongs; for example, $g_{a,i} = 2$ means that $a_i$ belongs to core element group 2.
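The initial clustering can be sketched as below, using zero-based indices. Two readings are assumptions of this sketch: a set of centres is treated as valid when no two centres share a common neighbour, and an element with no link to any centre joins the group of a randomly chosen neighbour that already has a group. The helper `path_graph` only builds test data.

```python
import random
from itertools import combinations


def centres_valid(matrix, centres):
    """True if no two centres are linked through a common third element."""
    k = len(matrix)
    for x, y in combinations(centres, 2):
        if any(matrix[x][l] > 0 and matrix[y][l] > 0 for l in range(k) if l not in (x, y)):
            return False
    return True


def initial_clusters(matrix, centres, rng):
    """Assign every element a group number (1-based), seeded by the centres."""
    k = len(matrix)
    group = [None] * k
    for number, centre in enumerate(centres, start=1):
        group[centre] = number
    for l in range(k):  # elements directly linked to a centre join that centre's group
        if group[l] is None:
            linked = [group[c] for c in centres if matrix[c][l] > 0]
            if linked:
                group[l] = linked[0]
    changed = True
    while changed:      # assumption: others follow a random already-grouped neighbour
        changed = False
        for l in range(k):
            if group[l] is None:
                candidates = [j for j in range(k) if matrix[l][j] > 0 and group[j] is not None]
                if candidates:
                    group[l] = group[rng.choice(candidates)]
                    changed = True
    return group


def path_graph(k):
    """Test data: elements 0..k-1 linked in a chain."""
    m = [[0] * k for _ in range(k)]
    for i in range(k - 1):
        m[i][i + 1] = m[i + 1][i] = 1
    return m


m = path_graph(6)
g_a = initial_clusters(m, [0, 3], random.Random(0))
```

Because valid centres share no neighbour, an element can be directly linked to at most one centre, so the first pass is unambiguous. Elements with no path to any centre stay unassigned (`None`) in this sketch.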
Similarly, take a random number sequence $V_m$ whose elements are denoted $m_i$ and whose number of elements is $k_m$, with $k_m < k_b$; the random number sequence satisfies the following requirement:
For any two elements $m_i$ and $m_j$ of the random number sequence $V_m$ and any natural number $l < k_b$, the modification elements $b_{m_i}$ and $b_{m_j}$ are not both connected to $b_l$; that is, the elements of the modification matrix $M_B$ corresponding to any two elements of $V_m$ are not connected via a third element.
Then $V_m$ is the initial sequence of cluster centers of the modification element layer: each element is a cluster center in the modification element layer, $m_i$ is the center of group $i$ of the modification element layer, and the corresponding $b_{m_i}$ belongs to the $i$-th group of modification elements.
For any $b_l$ with $0 < l < k_b$: if $b_{m_i,l} > 0$, then $b_l$ belongs to the $i$-th group of modification elements; if $b_{m_i,l} = 0$ for every cluster center $m_i$, then one $j$ is selected at random from those satisfying $b_{l,j} > 0$, and $b_l$ is assigned to the $j$-th group of modification elements.
The cluster attribution sequence $G_b$ of the modification elements can be obtained by the above steps, where each element $g_{b,i}$ is defined as the number of the group in which $b_i$ is located; for example, $g_{b,i} = 2$ means that $b_i$ belongs to modification element group 2.
Similarly, for other business feature elements such as products and raw materials, the cluster attribution sequence $G_{sn}$ of the corresponding business feature elements can be obtained by the above method, where each element $g_{sn,i}$ is defined as the number of the group in which $sn_i$ is located; for example, $g_{sn,i} = 2$ means that $sn_i$ belongs to group 2 of the corresponding business feature.
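The initial grouping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name `initial_groups`, the 0-based group numbers, and the reading that an element with no link to any center adopts the group of a randomly chosen, already grouped neighbor are all hypothetical choices. Elements with no neighbors at all are given their own groups, which the source does not specify.

```python
import random

def initial_groups(adj, centers, rng=None):
    """Assign every element of a co-occurrence matrix to an initial group.

    adj     : symmetric matrix (list of lists); adj[i][j] > 0 means i and j co-occur
    centers : indices of the cluster centers (assumed not to share any neighbor)
    Returns a list g where g[l] is the 0-based group number of element l.
    """
    rng = rng or random.Random(0)
    k = len(adj)
    group = [None] * k
    for gi, c in enumerate(centers):
        group[c] = gi                      # each center founds its own group
    # Pass 1: an element directly linked to a center joins that center's group.
    for l in range(k):
        if group[l] is None:
            linked = [gi for gi, c in enumerate(centers) if adj[c][l] > 0]
            if linked:
                group[l] = linked[0]
    # Pass 2 (assumed reading): an element with no center link follows a
    # randomly chosen neighbor that already has a group.
    pending = [l for l in range(k) if group[l] is None]
    progressed = True
    while pending and progressed:
        progressed = False
        for l in list(pending):
            nbrs = [j for j in range(k) if adj[l][j] > 0 and group[j] is not None]
            if nbrs:
                group[l] = group[rng.choice(nbrs)]
                pending.remove(l)
                progressed = True
    # Isolated elements (not covered by the source) each get a new group.
    next_group = len(centers)
    for l in pending:
        group[l] = next_group
        next_group += 1
    return group
```

The same function would apply unchanged to the modification element matrix and to the other business feature matrices, since only the matrix and the chosen centers differ.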
Then, a clustering measurement is performed on each feature cluster, i.e. the metric value of each feature cluster is calculated.
For the element clustering result of the core element matrix, the clustering effect metric is defined as $Q$, calculated as follows:
where $\delta(g_{a,i}, g_{a,j})$ is calculated as follows: $\delta(g_{a,i}, g_{a,j}) = 1$ when $g_{a,i} = g_{a,j}$; otherwise $\delta(g_{a,i}, g_{a,j}) = 0$.
Similarly, for the element clustering results of the modification element matrix and the other business feature element matrices, the clustering effect metrics are defined as $Q_b$, $Q_{s1}$, $Q_{s2}$, …, calculated as follows:
where $\delta(g_{b,i}, g_{b,j})$ is calculated as follows: $\delta(g_{b,i}, g_{b,j}) = 1$ when $g_{b,i} = g_{b,j}$; otherwise $\delta(g_{b,i}, g_{b,j}) = 0$;
and where $\delta(g_{sn,i}, g_{sn,j})$ is calculated as follows: $\delta(g_{sn,i}, g_{sn,j}) = 1$ when $g_{sn,i} = g_{sn,j}$; otherwise $\delta(g_{sn,i}, g_{sn,j}) = 0$.
Preferably, the clustering metric is the Q-modularity metric proposed by Newman, which is a published method and is not described in detail here.
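For reference, the standard weighted form of Newman's modularity, written in the notation of this section for the core element layer, is sketched below. The symbols $d_i$ (the weighted degree of $a_i$) and $m$ (the total edge weight) are introduced here for illustration only and do not come from the source; $\delta(x, y)$ is 1 when $x = y$ and 0 otherwise. Whether the embodiment uses exactly this normalisation is an assumption.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ a_{i,j} - \frac{d_i \, d_j}{2m} \right] \delta(g_{a,i}, g_{a,j}),
\qquad d_i = \sum_{j} a_{i,j},
\qquad m = \frac{1}{2} \sum_{i,j} a_{i,j}
```

Under the same assumption, $Q_b$ and $Q_{sn}$ would follow by substituting $b_{i,j}$, $g_{b,i}$ or $sn_{i,j}$, $g_{sn,i}$ for the core element quantities.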
Finally, the feature clusters are adjusted.
Take any $a_i$ with $0 < i < k_a$; for $j = 0, 1, 2, …, k_a$ with $i \neq j$, if $g_{a,i} \neq g_{a,j}$, let $g_{a,i} = g_{a,j}$ and calculate the cluster metric $Q'$; if $Q' > Q$, keep the assignment, otherwise cancel it.
Perform the above step for all $a_i$ in $A$ until no reassignment in a round yields a $Q'$ greater than $Q$. The current $G_a$ is then the grouping result of the core element layer, with $k_{ga}$ groups.
Similarly, the grouping result $G_b$ of the modification element layer, with $k_{gb}$ groups, and the grouping result $G_{sn}$ of each other business feature element layer, with $k_{gsn}$ groups, can be obtained.
In essence, the above operation moves each element into a nearby group, recalculates the $Q$ value, and keeps those moves that make the $Q$ value larger.
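The adjustment loop can be sketched as follows. This is an illustrative sketch only: it assumes the standard weighted form of Newman's modularity as the metric $Q$, and the function names `modularity` and `refine_groups` are hypothetical. A move is kept only when it strictly increases $Q$, and rounds repeat until a full round produces no improvement.

```python
def modularity(adj, group):
    """Weighted Newman modularity of a grouping (assumed form of Q)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if group[i] == group[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def refine_groups(adj, group):
    """Greedily move elements into other elements' groups while Q increases."""
    group = list(group)
    best = modularity(adj, group)
    improved = True
    while improved:                        # one iteration = one round over all elements
        improved = False
        for i in range(len(group)):
            for j in range(len(group)):
                if i == j or group[i] == group[j]:
                    continue
                old = group[i]
                group[i] = group[j]        # tentative reassignment g_i = g_j
                q_new = modularity(adj, group)
                if q_new > best + 1e-12:   # keep only strict improvements
                    best = q_new
                    improved = True
                else:
                    group[i] = old         # cancel the assignment
    return group, best
```

Because each trial recomputes $Q$ from scratch, this sketch costs on the order of $k^4$ operations per round; a practical implementation would likely update $Q$ incrementally.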
S103: Construct an enterprise feature matrix of the target enterprise.
Extract the business scope elements of the target enterprise to be benchmarked, form the ordered element IV sequence, construct the core element vector $V_A$ and the modification element vector $V_B$, and form the enterprise feature matrix.
The core element vector $V_A$ has size $1 \times k_a$, where each element $w_{a,i}$ is defined as follows:
if element $a_i$ of the core element sequence $A$ is in the ordered element IV sequence of the current enterprise, then $w_{a,i} = 1$; otherwise $w_{a,i} = 0$;
Similarly, the modification element vector $V_B$ has size $1 \times k_b$, where each element $w_{b,i}$ is defined as follows:
if element $b_i$ of the modification element sequence $B$ is in the ordered element IV sequence of the current enterprise, then $w_{b,i} = 1$; otherwise $w_{b,i} = 0$.
Similarly, each other business feature element vector $V_{SN}$ has size $1 \times k_{sn}$, where each element $w_{sn,i}$ is defined as follows:
if element $sn_i$ of the other business feature element sequence $S_N$ is in the $N$-th other business feature of the current enterprise, then $w_{sn,i} = 1$; otherwise $w_{sn,i} = 0$.
The constructed matrix $M_C$ has size $k_a \times k_b$ and is defined by $V_B = V_A \cdot M_C$.
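The indicator vectors $V_A$, $V_B$ and $V_{SN}$ all follow the same pattern, which can be sketched as below. The function name `indicator_vector` and the sample element names are hypothetical; the sketch does not attempt to construct $M_C$, since the source only gives the relation $V_B = V_A \cdot M_C$.

```python
def indicator_vector(sequence, enterprise_elements):
    """Return a 0/1 vector marking which elements of `sequence` the enterprise has.

    sequence            : deduplicated element sequence (e.g. A, B or S_N)
    enterprise_elements : elements of the current enterprise's ordered element IV
                          sequence (or of its N-th other business feature)
    """
    present = set(enterprise_elements)
    return [1 if element in present else 0 for element in sequence]


# Hypothetical data: four core elements known across all enterprises.
core_sequence = ["production", "processing", "sales", "investment"]
target_elements = ["production", "sales", "education"]
v_a = indicator_vector(core_sequence, target_elements)
```

Elements of the enterprise that are absent from the sequence (here "education") simply do not contribute to the vector.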
S104: Calculate the similarity between the target enterprise and each enterprise.
For the target enterprise to be benchmarked, the features are $V_A$, $V_B$, $M_C$, $V_{S1}$, $V_{S2}$, …; any enterprise is taken as a comparison enterprise, whose features are $V'_A$, $V'_B$, $M'_C$, $V'_{S1}$, $V'_{S2}$, …; the similarity between the two is defined as $p_i$:
where $\gamma_a$, $\gamma_b$, $\gamma_1$, … are coefficients greater than 0, and $p_a$, $p_b$, $p_n$, … are the core element similarity, the modification element similarity and the other business feature element similarities, respectively, calculated as follows:
For the target enterprise, the core element group attribution feature is $F_A$, with $k_{ga}$ elements in total, each element $f_{a,i}$ being defined as:
if $g_{a,i} = j$, then $\alpha_j = 1$; otherwise $\alpha_j = 0$.
$F_A$ is essentially the number of core elements in the target enterprise's business scope that fall in each core element group; the resulting vector has a form such as [4, 5, 0, 0, 6].
Similarly, for a comparison enterprise, its core element group attribution feature $F_A^i$ can be obtained.
For the target enterprise and the comparison enterprise, the core element similarity $p_{a,i}$ of the two enterprises is calculated as follows:
In essence, the core element similarity is the cosine distance between the core element group attribution features of the two enterprises.
For the target enterprise, the modification element group attribution feature is $F_B$, with $k_{gb}$ elements in total; each element $f_{b,i}$ is defined as follows: for any $0 < j < k_{gb}$, if $g_{b,i} = j$, then $f_{b,i} = 1$; otherwise $f_{b,i} = 0$.
$F_B$ in effect records whether the modification elements in the target enterprise's business scope appear in each modification element group: 1 if they appear, 0 if they do not. The definition of $F_B$ differs from that of $F_A$ mainly because the number of core elements in a group reflects how heavily the enterprise is engaged in that area, whereas modification elements do not have this property.
Similarly, for a comparison enterprise, its modification element group attribution feature $F_B^i$ can be obtained.
For the target enterprise and the comparison enterprise, the modification element similarity $p_{b,i}$ of the two enterprises is calculated as follows:
Like the core element similarity, the modification element similarity is in essence the cosine distance between the modification element group attribution features of the two enterprises.
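The two kinds of group attribution feature and their comparison can be sketched as follows. This is a sketch under assumptions: the function names are hypothetical, group numbers are taken to be 0-based, and the "cosine distance" of the text is read as the usual cosine of the angle between the two feature vectors, so that a larger value means more similar.

```python
import math

def group_counts(indicator, group, n_groups):
    """Core-style feature F_A: how many of the enterprise's elements fall in each group."""
    counts = [0] * n_groups
    for present, g in zip(indicator, group):
        if present:
            counts[g] += 1
    return counts

def group_presence(indicator, group, n_groups):
    """Modification-style feature F_B: 1 if any element of the enterprise is in the group."""
    return [1 if c > 0 else 0 for c in group_counts(indicator, group, n_groups)]

def cosine(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical data: five elements assigned to three groups.
group = [0, 0, 1, 1, 2]
target = [1, 1, 0, 1, 0]       # indicator vector of the target enterprise
comparison = [1, 0, 1, 1, 0]   # indicator vector of a comparison enterprise
p_core = cosine(group_counts(target, group, 3), group_counts(comparison, group, 3))
p_mod = cosine(group_presence(target, group, 3), group_presence(comparison, group, 3))
```

With this data the count features are [2, 1, 0] and [1, 2, 0], giving a core-style similarity of 0.8, while the presence features coincide and give a similarity of 1. This illustrates why counting is reserved for core elements: it distinguishes enterprises that touch the same groups with different emphasis.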
In addition, for an element of the comparison enterprise's modification element group attribution feature $F_B^i$ with $f^i_{b,i} \neq f_{b,i}$, a substitute attribution feature may be calculated as follows:
where, if $f'_{b,j} \neq f_{b,j}$, then $\sigma_j = 1$; otherwise $\sigma_j = 0$.
The feature vector obtained by replacing an attribution feature with its substitute attribution feature is called the attribution feature substitute vector; the corresponding substitute similarity is as follows:
In essence, the substitute feature measures, for an element group of the comparison enterprise that does not match the element groups of the target enterprise, how strongly that group is connected to the element groups of the target enterprise: its value is the number of connections between the comparison enterprise's element group and the target enterprise's element group, divided by the number of connections between the target enterprise's element group and all other element groups.
Similarly to the modification element similarity, the similarities of other business features $p_{s1}$, $p_{s2}$, … and the substitute similarities of the other business features can be calculated.
Like the core element similarity, the modification element similarity is in essence the cosine distance between the group attribution features of the two enterprises.
S105: Determine the benchmarking enterprise list according to the similarity between the target enterprise and other enterprises.
For industry benchmarking of a target enterprise, the benchmarking enterprise list can be determined in one of the following two ways:
One is to set an enterprise similarity threshold and add the enterprises in the enterprise business scope database whose similarity with the target enterprise is higher than the threshold to the benchmarking enterprise list;
The other is to set the number of benchmarking enterprises, calculate the similarity between each enterprise in the enterprise business scope database and the target enterprise, rank the enterprises by similarity from high to low, and add the first set number of enterprises to the benchmarking enterprise list;
Alternatively, the two ways are combined to determine the benchmarking enterprise list, namely:
Set both the number of benchmarking enterprises and the enterprise similarity threshold. Calculate the similarity between each enterprise in the enterprise business scope database and the target enterprise, rank the enterprises by similarity from high to low, take the first set number of enterprises and the enterprises whose similarity is higher than the threshold, and add them to the benchmarking enterprise list.
In the above ways, if the benchmarking enterprise list has a minimum size requirement, the similarities of the modification elements and other business feature elements may be replaced by the corresponding substitute similarities, which are then compared with the threshold or ranked, so as to extract a benchmarking enterprise list that meets the requirement.
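The selection step can be sketched as follows. The function name `select_benchmarks` and the sample enterprise names are hypothetical. For the combined mode the sketch assumes that both conditions must hold (an enterprise must exceed the threshold and rank within the set number); the text could also be read as taking the union, which this sketch does not implement.

```python
def select_benchmarks(similarities, threshold=None, top_n=None):
    """Pick benchmarking enterprises from a {name: similarity} mapping.

    threshold : keep only enterprises whose similarity is strictly higher
    top_n     : keep at most this many enterprises, highest similarity first
    Passing both applies the threshold first and then the count limit.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, s) for name, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [name for name, _ in ranked]


# Hypothetical similarities of four comparison enterprises to the target.
sims = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.75}
by_threshold = select_benchmarks(sims, threshold=0.5)
by_count = select_benchmarks(sims, top_n=2)
combined = select_benchmarks(sims, threshold=0.8, top_n=2)
```

If `combined` returns fewer enterprises than a required minimum, the caller could recompute `sims` with the substitute similarities and select again, as the text describes.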
It can be seen from the above technical solutions that the present embodiment provides a method for screening a benchmarking enterprise list, specifically, a plurality of preset ordered element sequences in accordance with the business scope of an enterprise are extracted, and the ordered element sequences are deduplicated and segmented to obtain core elements and modification elements in accordance with the business scope of the enterprise, and at the same time, the ordered elements of other business characteristics (e.g., discrete characteristics such as products, raw materials, etc.) of the enterprise are deduplicated to obtain other business characteristic elements in accordance with the enterprise; constructing a plurality of feature clusters according to the core elements, the modification elements and other operation feature elements, and performing grouping measurement on each feature cluster to obtain a measurement value of each feature cluster; constructing an enterprise feature matrix of a target enterprise to be targeted, wherein the enterprise feature matrix comprises a core element vector, a modified element vector and other operation feature element vectors of the target enterprise; calculating the similarity between the target enterprise and each comparison enterprise according to the enterprise characteristic matrix and the metric value; and determining the benchmarking enterprise list from a plurality of comparison enterprises according to the similarity, thereby realizing providing the benchmarking enterprise list for the industry benchmarking analysis.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Example two
Fig. 2 is a block diagram illustrating a structure of an embodiment of a filtering apparatus for a benchmarking enterprise list provided in the present invention.
The device of this embodiment may be understood as an electronic computing device, and the electronic computing device may be an application located in the local terminal, or may also be a functional unit such as a plug-in or Software Development Kit (SDK) located in the application of the local terminal, which is not particularly limited in this embodiment of the present invention.
It should be understood that the application may be an application program (native app) installed on the terminal, or may also be a web program (webApp) of a browser on the terminal, which is not limited in this embodiment of the present invention.
As shown in Fig. 2, the screening apparatus for the benchmarking enterprise list provided in this embodiment specifically includes a data extraction module 10, a feature cluster construction module 20, a feature matrix construction module 30, a similarity calculation module 40, and a benchmarking list determination module 50.
The data extraction module is used for extracting the preset business scope elements of a plurality of enterprises.
On the basis of the operation range data of a plurality of enterprises, the operation range elements of the plurality of enterprises are extracted, and the ordered element sequences for describing the operation ranges are integrated into an element matrix, wherein the element matrix comprises a core element layer and a modification element layer.
The module comprises a sequence simplifying unit 11, a first segmentation unit 12, a second segmentation unit 13 and a classification processing unit 14. The sequence simplifying unit simplifies the business scope elements during processing by removing the parts that carry no practical meaning, including auxiliary words, prepositions, and bracketed content (in round, square and curly brackets).
For example, for the business scope "Producing, processing and selling computer electronic equipment and telecommunication electrical components (not including the specialized commodities specified by the state). Education, tourism industry and other industry investments.", the simplified result is: "Producing, processing and selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investments."
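The simplification step can be sketched as follows. This is a minimal sketch: the function name `simplify_scope` and the `stop_words` parameter are hypothetical, bracketed content is removed together with the brackets (as the example above suggests), and nested brackets are not handled. The actual list of auxiliary words and prepositions is not recoverable from the text, so it is left to the caller.

```python
import re

# Bracketed spans in ASCII and full-width forms, with any whitespace before them.
_BRACKETED = re.compile(
    r"\s*(?:\([^()]*\)|\[[^\[\]]*\]|\{[^{}]*\}|（[^（）]*）|【[^【】]*】)"
)

def simplify_scope(text, stop_words=()):
    """Remove bracketed content and caller-supplied filler words from a business scope."""
    text = _BRACKETED.sub("", text)
    for word in stop_words:              # plain substring removal, suited to Chinese particles
        text = text.replace(word, "")
    return re.sub(r"\s+", " ", text).strip()
```

Substring removal of stop words is only appropriate for scripts without word spacing; for English text a token-based filter would be needed instead.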
The first segmentation unit is used for segmenting the simplified business scope: taking the conjunctions (such as "and", "or") and punctuation marks (such as commas and enumeration commas) in the business scope as separators, it segments the business scope for the first time to obtain the ordered element I sequence and the initial element grouping.
Segmentation proceeds from left to right; if the current split is made at a punctuation mark that differs from the punctuation mark of the previous split, the elements after the current split do not belong to the same group as the elements before it.
For example, for the business scope "Producing, processing and selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investments.", the ordered element I sequence formed by segmentation is:
[production, processing, selling computer electronic equipment, telecommunication electrical components, education, tourism industry, other industry investments]
The group sequence number of each element is as follows:
[1,1,1,1,2,2,2]
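One reading of the first segmentation that reproduces the group numbers above can be sketched as follows. The separator set, the function name `first_segmentation`, and the English sample text are assumptions made for illustration. In particular, the sketch compares each punctuation separator with the previous punctuation separator inside the current group, so the first separator of a new group never starts another group; the source does not state this detail explicitly.

```python
import re

# Separators: punctuation marks, or the conjunctions "and" / "or" / "及".
_SEP = re.compile(r"\s*([,;.，、；。]|\band\b|\bor\b|及)\s*")
_PUNCT = set(",;.，、；。")

def first_segmentation(text):
    """Split a simplified business scope into elements with initial group numbers."""
    parts = _SEP.split(text)          # even indices: element text, odd indices: separators
    elements, groups = [], []
    group = 1
    last_punct = None                 # last punctuation separator seen in the current group
    for idx, part in enumerate(parts):
        if idx % 2 == 0:
            if part:
                elements.append(part)
                groups.append(group)
        elif part in _PUNCT:
            if last_punct is not None and part != last_punct:
                group += 1            # punctuation changed: following elements form a new group
                last_punct = None
            else:
                last_punct = part
        # conjunctions separate elements but never change the group
    return elements, groups


elements, groups = first_segmentation(
    "production, processing, selling computer equipment, telecom components. "
    "Education, tourism industry and other industry investments."
)
```

Here the full stop after "telecom components" differs from the preceding commas and opens group 2, while "and" only separates two elements inside that group.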
The second segmentation unit is used for segmenting again the ordered elements obtained by the first segmentation: each element of more than two characters in the ordered element I sequence is word-segmented and the longest word segmentation result is kept, giving the ordered element II sequence.
For example:
For the business scope "Producing, processing and selling computer electronic equipment and telecommunication electrical components. Education, tourism industry and other industry investments.", the ordered element I sequence formed by the first segmentation is [production, processing, selling computer electronic equipment, telecommunication electrical components, education, tourism industry, other industry investments], and the corresponding group sequence numbers are [1,1,1,1,2,2,2]. After the second segmentation, the ordered element II sequence is as follows:
[production, processing, sales, computer electronics, telecommunication apparatus, education, tourism industry, other industries, investments]
The group sequence number of each corresponding element is as follows:
[1,1,1,1,1,2,2,2,2]
The classification processing unit is used for classifying the ordered element sequence obtained by the two segmentations according to element properties. The elements in the ordered element II sequence are matched against the elements in the historical element library to obtain their properties; if an element does not exist in the historical element library, it is classified according to its part of speech, specifically:
verbs and verb phrases are classified as core elements;
other words and phrases are classified as modification elements.
For example:
After the second segmentation, the ordered element II sequence is as follows:
[production, processing, sales, computer electronics, telecommunication and electrical components, investment, education, tourism industry, other industries]
After element property classification, the property sequence of the ordered element III is as follows:
[ core, modification ]
In addition, the module comprises a group adjustment unit 15 for adjusting the element groups.
The elements in the ordered element III sequence after element property classification are scanned one by one. If the property of the current element differs from that of the previous element, the current element and the previous element belong to the same group, and the consecutive same-group elements before the current element already contain both the "core" and the "modification" property, then the current element and the elements after it that belong to the same group are moved to another group.
For example:
After the second segmentation, the ordered element II sequence is as follows:
[production, processing, sales, computer electronics, telecommunication and electrical components, investment, education, tourism industry, other industries]
After element property classification, the property sequence of the ordered element III is as follows:
[ core, modification ]
The group sequence number sequence to which each corresponding element belongs is as follows:
[1,1,1,1,1,1,1,1,1]
After the element group adjustment, the group sequence number of each element of the ordered element IV is as follows:
[1,1,1,1,1,2,2,2,2]
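The group adjustment rule can be sketched as follows. The function name `adjust_groups` and the property sequence used in the usage example are assumptions for illustration (the property list is one that is consistent with the classification rule and with the adjusted group numbers above, not a quotation of the source). New groups are numbered after the largest existing group number, which is also an assumed convention.

```python
def adjust_groups(properties, groups):
    """Split a group where a property change follows a run that already mixes both properties.

    properties : list of "core" / "modification", one per element
    groups     : list of group numbers, one per element
    """
    groups = list(groups)
    next_group = max(groups) + 1
    for i in range(1, len(groups)):
        same_group = groups[i] == groups[i - 1]
        if not same_group or properties[i] == properties[i - 1]:
            continue
        # Collect the properties of the consecutive same-group elements before element i.
        seen = set()
        j = i - 1
        while j >= 0 and groups[j] == groups[i]:
            seen.add(properties[j])
            j -= 1
        if {"core", "modification"} <= seen:
            old = groups[i]
            k = i
            while k < len(groups) and groups[k] == old:
                groups[k] = next_group   # move element i and its same-group successors
                k += 1
            next_group += 1
    return groups


# Assumed properties for: production, processing, sales, computer electronics,
# telecommunication components, investment, education, tourism industry, other industries.
props = ["core"] * 3 + ["modification"] * 2 + ["core"] + ["modification"] * 3
adjusted = adjust_groups(props, [1] * 9)
```

The split happens at "investment": it is a core element following modification elements, and the run before it already contains both properties, so a new core-plus-modifier phrase is taken to begin there.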
The feature cluster construction module is used for constructing a plurality of feature clusters according to the core elements and the modification elements.
After the plurality of feature clusters are constructed, a metric value for each feature cluster is further calculated. The module includes a deduplication processing unit 21, a grouping processing unit 22, a metric value calculation unit 23, and a grouping adjustment unit 24.
The deduplication processing unit is used for deduplicating, respectively, the core elements and the modification elements in the ordered element IV sequences constructed from all enterprises' business scopes, forming a core element sequence $A$ and a modification element sequence $B$; the core element sequence has $k_a$ elements in total, each denoted $a_i$, and the modification element sequence has $k_b$ elements in total, each denoted $b_i$.
Construct the core element layer matrix $M_A$ with $k_a$ rows and $k_a$ columns, where each element $a_{i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the core element sequence occur together; that is, if the $i$-th and $j$-th elements of the core element sequence appear together in the same group of the ordered element IV sequences constructed from $n$ enterprises' business scopes, then $a_{i,j} = n$.
Construct the modification element layer matrix $M_B$ of size $k_b \times k_b$, where each element $b_{i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the modification element sequence occur together; that is, if the $i$-th and $j$-th elements of the modification element sequence appear together in the same group of the ordered element IV sequences constructed from $n$ enterprises' business scopes, then $b_{i,j} = n$.
Deduplicate the other business feature elements such as products and raw materials, forming the business feature element sequences $S_1$, $S_2$, … (for example, $S_1$ corresponds to products, $S_2$ to raw materials, and so on), where the $n$-th business feature element sequence has $k_{sn}$ elements in total, each denoted $sn_i$.
Construct the business feature layer matrix $M_{SN}$ of size $k_{sn} \times k_{sn}$, where each element $sn_{i,j}$ is defined as the frequency with which the $i$-th and $j$-th elements of the $n$-th business feature element sequence occur together; that is, if the $i$-th and $j$-th elements of the $n$-th business feature element sequence appear together in the other business features (such as products and raw materials) of $m$ enterprises, then $sn_{i,j} = m$.
In the above operation, each $a_i$ can be understood as a point and each $a_{i,j}$ as the edge between $a_i$ and $a_j$; the larger $a_{i,j}$ is, the more tightly $a_i$ and $a_j$ are connected, so a graph of points and edges can be constructed.
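Building such a layer matrix from grouped enterprise data can be sketched as follows. The function name `cooccurrence_matrix`, the nested-list input layout and the sample data are hypothetical. Each enterprise contributes at most 1 to any pair, so an entry equals the number of enterprises in which the two elements share a group; the diagonal is left at 0.

```python
from itertools import combinations

def cooccurrence_matrix(sequence, enterprises):
    """Count, per pair of elements, the enterprises in which both share a group.

    sequence    : deduplicated element sequence (e.g. the core element sequence A)
    enterprises : one entry per enterprise; each entry is a list of groups,
                  and each group is a list of elements
    """
    index = {element: i for i, element in enumerate(sequence)}
    k = len(sequence)
    matrix = [[0] * k for _ in range(k)]
    for groups in enterprises:
        pairs = set()                     # count each pair once per enterprise
        for members in groups:
            ids = sorted({index[e] for e in members if e in index})
            pairs.update(combinations(ids, 2))
        for i, j in pairs:
            matrix[i][j] += 1
            matrix[j][i] += 1             # keep the matrix symmetric
    return matrix


core_sequence = ["production", "processing", "sales", "investment"]
enterprises = [
    [["production", "sales"], ["investment"]],
    [["production", "sales", "processing"]],
]
m_a = cooccurrence_matrix(core_sequence, enterprises)
```

Read as a weighted graph, `m_a[i][j]` is the weight of the edge between elements `i` and `j`, and a row of zeros marks an element that never shares a group with another element.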
The grouping processing unit performs grouping processing on each layer matrix according to a preset grouping rule to obtain a plurality of initial feature clusters.
Take a random number sequence $V_n$ whose elements are denoted $n_i$ and whose number of elements is $k_n$, with $k_n < k_a$; the random number sequence satisfies the following requirement:
For any two elements $n_i$ and $n_j$ taken from the random number sequence $V_n$ and any natural number $l < k_a$, the core elements $a_{n_i}$ and $a_{n_j}$ are not both connected to $a_l$; that is, the elements of the core matrix $M_A$ corresponding to any two elements of $V_n$ are not connected via a third element.
Then $V_n$ is the initial sequence of cluster centers of the core element layer: each element is a cluster center in the core element layer, $n_i$ is the center of group $i$ of the core element layer, and the corresponding $a_{n_i}$ belongs to the $i$-th group of core elements.
For any $a_l$ with $0 < l < k_a$: if $a_{n_i,l} > 0$, then $a_l$ belongs to the $i$-th group of core elements; if $a_{n_i,l} = 0$ for every cluster center $n_i$, then one $j$ is selected at random from those satisfying $a_{l,j} > 0$, and $a_l$ is assigned to the $j$-th group of core elements.
The cluster attribution sequence $G_a$ of the core elements can be obtained by the above steps, where each element $g_{a,i}$ is defined as the number of the group in which $a_i$ is located; for example, $g_{a,i} = 2$ means that $a_i$ belongs to core element group 2.
Similarly, take a random number sequence $V_m$ whose elements are denoted $m_i$ and whose number of elements is $k_m$, with $k_m < k_b$; the random number sequence satisfies the following requirement:
For any two elements $m_i$ and $m_j$ of the random number sequence $V_m$ and any natural number $l < k_b$, the modification elements $b_{m_i}$ and $b_{m_j}$ are not both connected to $b_l$; that is, the elements of the modification matrix $M_B$ corresponding to any two elements of $V_m$ are not connected via a third element.
Then $V_m$ is the initial sequence of cluster centers of the modification element layer: each element is a cluster center in the modification element layer, $m_i$ is the center of group $i$ of the modification element layer, and the corresponding $b_{m_i}$ belongs to the $i$-th group of modification elements.
For any $b_l$ with $0 < l < k_b$: if $b_{m_i,l} > 0$, then $b_l$ belongs to the $i$-th group of modification elements; if $b_{m_i,l} = 0$ for every cluster center $m_i$, then one $j$ is selected at random from those satisfying $b_{l,j} > 0$, and $b_l$ is assigned to the $j$-th group of modification elements.
The cluster attribution sequence $G_b$ of the modification elements can be obtained by the above steps, where each element $g_{b,i}$ is defined as the number of the group in which $b_i$ is located; for example, $g_{b,i} = 2$ means that $b_i$ belongs to modification element group 2.
Similarly, for other business feature elements such as products and raw materials, the cluster attribution sequence $G_{sn}$ of the corresponding business feature elements can be obtained by the above method, where each element $g_{sn,i}$ is defined as the number of the group in which $sn_i$ is located; for example, $g_{sn,i} = 2$ means that $sn_i$ belongs to group 2 of the corresponding business feature.
The metric value calculation unit is used for performing a clustering measurement on each feature cluster, i.e. calculating the metric value of each feature cluster.
For the element clustering result of the core element matrix, the clustering effect metric is defined as $Q$, calculated as follows:
where $\delta(g_{a,i}, g_{a,j})$ is calculated as follows: $\delta(g_{a,i}, g_{a,j}) = 1$ when $g_{a,i} = g_{a,j}$; otherwise $\delta(g_{a,i}, g_{a,j}) = 0$.
Similarly, for the element clustering results of the modification element matrix and the other business feature element matrices, the clustering effect metrics are defined as $Q_b$, $Q_{s1}$, $Q_{s2}$, …, calculated as follows:
where $\delta(g_{b,i}, g_{b,j})$ is calculated as follows: $\delta(g_{b,i}, g_{b,j}) = 1$ when $g_{b,i} = g_{b,j}$; otherwise $\delta(g_{b,i}, g_{b,j}) = 0$;
and where $\delta(g_{sn,i}, g_{sn,j})$ is calculated as follows: $\delta(g_{sn,i}, g_{sn,j}) = 1$ when $g_{sn,i} = g_{sn,j}$; otherwise $\delta(g_{sn,i}, g_{sn,j}) = 0$.
Preferably, the clustering metric is the Q-modularity metric proposed by Newman, which is a published method and is not described in detail here.
The grouping adjustment unit is used for adjusting the feature clusters.
Take any $a_i$ with $0 < i < k_a$; for $j = 0, 1, 2, …, k_a$ with $i \neq j$, if $g_{a,i} \neq g_{a,j}$, let $g_{a,i} = g_{a,j}$ and calculate the cluster metric $Q'$; if $Q' > Q$, keep the assignment, otherwise cancel it.
Perform the above step for all $a_i$ in $A$ until no reassignment in a round yields a $Q'$ greater than $Q$. The current $G_a$ is then the grouping result of the core element layer, with $k_{ga}$ groups.
Similarly, the grouping result $G_b$ of the modification element layer, with $k_{gb}$ groups, and the grouping result $G_{sn}$ of each other business feature element layer, with $k_{gsn}$ groups, can be obtained.
In essence, the above operation moves each element into a nearby group, recalculates the $Q$ value, and keeps those moves that make the $Q$ value larger.
The feature matrix construction module is used for constructing an enterprise feature matrix of the target enterprise.
Extract the business scope elements of the target enterprise to be benchmarked, form the ordered element IV sequence, construct the core element vector $V_A$ and the modification element vector $V_B$, and form the enterprise feature matrix.
The core element vector $V_A$ has size $1 \times k_a$, where each element $w_{a,i}$ is defined as follows:
if element $a_i$ of the core element sequence $A$ is in the ordered element IV sequence of the current enterprise, then $w_{a,i} = 1$; otherwise $w_{a,i} = 0$;
Similarly, the modification element vector $V_B$ has size $1 \times k_b$, where each element $w_{b,i}$ is defined as follows:
if element $b_i$ of the modification element sequence $B$ is in the ordered element IV sequence of the current enterprise, then $w_{b,i} = 1$; otherwise $w_{b,i} = 0$.
Similarly, each other business feature element vector $V_{SN}$ has size $1 \times k_{sn}$, where each element $w_{sn,i}$ is defined as follows:
if element $sn_i$ of the other business feature element sequence $S_N$ is in the $N$-th other business feature of the current enterprise, then $w_{sn,i} = 1$; otherwise $w_{sn,i} = 0$.
The constructed matrix $M_C$ has size $k_a \times k_b$ and is defined by $V_B = V_A \cdot M_C$.
The similarity calculation module is used for calculating the similarity between the target enterprise and each enterprise.
For the target enterprise to be benchmarked, the features are $V_A$, $V_B$, $M_C$, $V_{S1}$, $V_{S2}$, …; any enterprise is taken as a comparison enterprise, whose features are $V'_A$, $V'_B$, $M'_C$, $V'_{S1}$, $V'_{S2}$, …; the similarity between the two is defined as $p_i$:
where $\gamma_a$, $\gamma_b$, $\gamma_1$, … are coefficients greater than 0, and $p_a$, $p_b$, $p_n$, … are the core element similarity, the modification element similarity and the other business feature element similarities, respectively, calculated as follows:
For the target enterprise, the core element group attribution feature is $F_A$, with $k_{ga}$ elements in total, each element $f_{a,i}$ being defined as:
if $g_{a,i} = j$, then $\alpha_j = 1$; otherwise $\alpha_j = 0$.
$F_A$ is essentially the number of core elements in the target enterprise's business scope that fall in each core element group; the resulting vector has a form such as [4, 5, 0, 0, 6].
Similarly, for a comparison enterprise, its core element group attribution feature $F_A^i$ can be obtained.
For the target enterprise and the comparison enterprise, the core element similarity $p_{a,i}$ of the two enterprises is calculated as follows:
In essence, the core element similarity is the cosine distance between the core element group attribution features of the two enterprises.
For the target enterprise, the modification element group attribution feature is $F_B$, with $k_{gb}$ elements in total; each element $f_{b,i}$ is defined as follows: for any $0 < j < k_{gb}$, if $g_{b,i} = j$, then $f_{b,i} = 1$; otherwise $f_{b,i} = 0$.
$F_B$ in effect records whether the modification elements in the target enterprise's business scope appear in each modification element group: 1 if they appear, 0 if they do not. The definition of $F_B$ differs from that of $F_A$ mainly because the number of core elements in a group reflects how heavily the enterprise is engaged in that area, whereas modification elements do not have this property.
Similarly, for a comparison enterprise, its modification element group attribution feature $F_B^i$ can be obtained.
For the target enterprise and the comparison enterprise, the modification element similarity $p_{b,i}$ of the two enterprises is calculated as follows:
Like the core element similarity, the modification element similarity is in essence the cosine distance between the modification element group attribution features of the two enterprises.
In addition, for an element of the comparison enterprise's modification element group attribution feature $F_B^i$ with $f^i_{b,i} \neq f_{b,i}$, a substitute attribution feature may be calculated as follows:
where, if $f'_{b,j} \neq f_{b,j}$, then $\sigma_j = 1$; otherwise $\sigma_j = 0$.
The feature vector obtained by replacing an attribution feature with its substitute attribution feature is called the attribution feature substitute vector; the corresponding substitute similarity is as follows:
In essence, the substitute feature measures, for an element group of the comparison enterprise that does not match the element groups of the target enterprise, how strongly that group is connected to the element groups of the target enterprise: its value is the number of connections between the comparison enterprise's element group and the target enterprise's element group, divided by the number of connections between the target enterprise's element group and all other element groups.
Similarly to the modification element similarity, the similarities of other business features $p_{s1}$, $p_{s2}$, … and the substitute similarities of the other business features can be calculated.
Like the core element similarity, the modification element similarity is in essence the cosine distance between the group attribution features of the two enterprises.
The benchmarking list determining module 50 is configured to determine the benchmarking enterprise list according to the similarity between the target enterprise and other enterprises. The module comprises a first determining unit 51 and a second determining unit 52.
The first determining unit is configured to set an enterprise similarity threshold and then add to the benchmarking enterprise list those enterprises in the enterprise business scope database whose similarity to the target enterprise is higher than the threshold;
The second determining unit is configured to set the number of benchmarking enterprises, calculate the similarity between each enterprise in the enterprise business scope database and the target enterprise, sort the enterprises by similarity from high to low, take the set number of enterprises from the top of the ranking, and add them to the benchmarking enterprise list;
Alternatively, the two modes may be used together to determine the benchmarking enterprise list, namely:
set both the number of benchmarking enterprises and the enterprise similarity threshold; calculate the similarity between each enterprise in the enterprise business scope database and the target enterprise; sort the enterprises by similarity from high to low; take the set number of enterprises from the top of the ranking and the enterprises whose similarity is higher than the threshold; and add them to the benchmarking enterprise list.
In the above method, if the benchmarking enterprise list is subject to a minimum size, the similarities of the modification elements and of the other business feature elements may be replaced by the corresponding substitute similarities. The substitute similarities are then compared with the threshold, or sorted, so as to extract a benchmarking enterprise list that meets the requirement.
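The selection modes described above can be sketched in Python. The function names and the example similarity values are hypothetical. The combined mode is written here under one possible reading, namely that an enterprise must be within the top ranks and also exceed the threshold; the text does not state whether the two conditions are meant to be combined this way.

```python
def select_by_threshold(similarities, threshold):
    """Enterprises whose similarity exceeds the threshold, most similar first."""
    selected = [name for name, score in similarities.items() if score > threshold]
    return sorted(selected, key=similarities.get, reverse=True)


def select_top_n(similarities, n):
    """The n enterprises most similar to the target enterprise."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:n]


def select_combined(similarities, threshold, n):
    """Assumed reading of the combined mode: top n, restricted to those above the threshold."""
    return [
        name
        for name in select_top_n(similarities, n)
        if similarities[name] > threshold
    ]


# Hypothetical similarities between four comparison enterprises and the target enterprise.
sims = {"A": 0.9, "B": 0.4, "C": 0.75, "D": 0.6}
above = select_by_threshold(sims, 0.5)  # ["A", "C", "D"]
top_two = select_top_n(sims, 2)         # ["A", "C"]
both = select_combined(sims, 0.8, 3)    # ["A"]
```

If a minimum list size is required and a selection comes back too short, the same functions could be called again with substitute similarities in place of the original ones.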
As the above technical solution shows, this embodiment provides a screening device for benchmarking enterprise lists. Specifically, the device extracts a plurality of preset ordered element sequences of the business scope of the reference enterprise, and de-duplicates and segments the ordered element sequences to obtain the core elements and modification elements of the business scope of the reference enterprise; constructs a plurality of feature clusters from the core elements and the modification elements, and performs a clustering measurement on each feature cluster to obtain the metric value of each feature cluster; constructs an enterprise feature matrix of the target enterprise to be benchmarked, the enterprise feature matrix comprising a core element vector and a modification element vector of the target enterprise; calculates the similarity between the target enterprise and each comparison enterprise according to the enterprise feature matrix and the metric values; and determines the benchmarking enterprise list from the plurality of comparison enterprises according to the similarity, thereby providing a benchmarking enterprise list for industry benchmarking analysis.
Because the device embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding parts of the description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for screening a benchmarking enterprise list, characterized by comprising the following steps:
extracting a plurality of preset ordered element sequences of the business scope of the reference enterprise, and carrying out duplication removal and segmentation on the ordered element sequences to obtain core elements and modification elements of the business scope of the reference enterprise; the core elements are property classifications of verbs and verb phrases, and the modification elements are property classifications of other words and phrases that do not include verbs;
constructing a plurality of feature clusters according to the core elements and the modification elements, and performing clustering measurement on each feature cluster to obtain a measurement value of each feature cluster;
constructing an enterprise feature matrix of a target enterprise to be benchmarked, wherein the enterprise feature matrix comprises a core element vector and a modification element vector of the target enterprise;
calculating the similarity between the target enterprise and each comparison enterprise according to the enterprise feature matrix and the metric value;
and determining a benchmarking enterprise list from the plurality of comparison enterprises according to the similarity.
2. The screening method of claim 1, wherein the extracting of a plurality of preset ordered element sequences of the business scope of the reference enterprise, and the duplication removal and segmentation of the ordered element sequences to obtain the core elements and the modification elements of the business scope of the reference enterprise, comprises the steps of:
simplifying the ordered element sequence and removing meaningless characters in the ordered element sequence;
performing first segmentation processing on the simplified ordered element sequence according to a preset segmentation symbol;
performing second segmentation processing on the ordered element sequence subjected to the first segmentation processing, and performing word segmentation processing on elements with more than two characters;
and classifying the ordered element sequence subjected to the second segmentation treatment to obtain the core element and the modification element.
3. The screening method of claim 2, wherein the extracting of a plurality of preset ordered element sequences of the business scope of the reference enterprise, and the duplication removal and segmentation of the ordered element sequences to obtain the core elements and the modification elements of the business scope of the reference enterprise, further comprises the step of:
and adjusting the group of the ordered elements according to the part of speech.
4. The screening method of claim 1, wherein the constructing a plurality of feature clusters according to the core element and the modifier element, and performing a clustering metric on each of the feature clusters to obtain a metric value for each of the feature clusters comprises the steps of:
performing duplicate removal treatment on the core elements and the modified elements respectively to form a core element sequence and a modified element sequence of the reference enterprise, and constructing a layer matrix according to the core element sequence and the modified element sequence, wherein the layer matrix comprises a core element layer matrix and a modified element layer matrix;
clustering the layer matrix according to a preset clustering rule to obtain a plurality of feature clusters;
calculating the metric value of each feature cluster according to a preset calculation rule;
adjusting the feature cluster.
5. The screening method of claim 1, wherein said determining a list of benchmarking businesses from the plurality of comparison businesses based on the similarity comprises:
searching from the plurality of comparison enterprises according to a preset similarity threshold, and selecting the enterprises with the similarity greater than the similarity threshold to form the benchmarking enterprise list;
or, the comparison enterprises are ranked according to the similarity between the comparison enterprises and the target enterprise, and a preset number of enterprises are picked out from the ranking list to form the benchmarking enterprise list.
6. A screening device for benchmarking enterprise lists is characterized by comprising:
the data extraction module is used for extracting a plurality of preset ordered element sequences of the business scope of the reference enterprise, and carrying out duplication removal and segmentation on the ordered element sequences to obtain core elements and modification elements of the business scope of the reference enterprise; the core elements are property classifications of verbs and verb phrases, and the modification elements are property classifications of other words and phrases that do not include verbs;
the feature cluster construction module is used for constructing a plurality of feature clusters according to the core elements and the modification elements, and performing clustering measurement on each feature cluster to obtain a measurement value of each feature cluster;
the feature matrix construction module is used for constructing an enterprise feature matrix of a target enterprise to be benchmarked, wherein the enterprise feature matrix comprises a core element vector and a modification element vector of the target enterprise;
the similarity calculation module is used for calculating the similarity between the target enterprise and each comparison enterprise according to the enterprise feature matrix and the metric value;
and the benchmarking enterprise list determining module is used for determining a benchmarking enterprise list from the plurality of comparison enterprises according to the similarity.
7. The screening apparatus of claim 6, wherein the data extraction module comprises:
the sequence simplifying unit is used for simplifying the ordered element sequence and eliminating meaningless characters in the ordered element sequence;
the first segmentation unit is used for performing first segmentation processing on the simplified ordered element sequence according to a preset segmentation symbol;
the second segmentation unit is used for performing second segmentation processing on the ordered element sequence subjected to the first segmentation processing and performing word segmentation processing on elements with more than two characters;
and the classification processing unit is used for classifying the ordered element sequence subjected to the second segmentation processing to obtain the core element and the modification element.
8. The screening apparatus of claim 7, wherein the data extraction module further comprises:
and the group adjusting unit is used for adjusting the groups of the ordered elements according to the parts of speech.
9. The screening apparatus of claim 6, wherein the feature cluster construction module comprises:
the duplication removal processing unit is used for respectively carrying out duplication removal processing on the core elements and the modified elements to form a core element sequence and a modified element sequence of the reference enterprise, and constructing a layer matrix according to the core element sequence and the modified element sequence, wherein the layer matrix comprises a core element layer matrix and a modified element layer matrix;
the grouping processing unit is used for grouping the layer matrix according to a preset grouping rule to obtain a plurality of feature clusters;
the metric value calculating unit is used for calculating the metric value of each feature cluster according to a preset calculating rule;
and the cluster adjusting unit is used for adjusting the feature clusters.
10. The screening apparatus of claim 6, wherein the benchmarking enterprise list determining module comprises:
the first determining unit is used for searching from the plurality of comparison enterprises according to a preset similarity threshold value, and selecting the enterprises with the similarity greater than the similarity threshold value to form the benchmarking enterprise list;
and the second determining unit is used for sequencing the comparison enterprises according to the similarity between the comparison enterprises and the target enterprise, and picking out a preset number of enterprises from the sequencing list to form the benchmarking enterprise list.
CN201710344912.4A 2017-05-16 2017-05-16 Method and device for screening benchmarking enterprise list Active CN107248023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710344912.4A CN107248023B (en) 2017-05-16 2017-05-16 Method and device for screening benchmarking enterprise list

Publications (2)

Publication Number Publication Date
CN107248023A (en) 2017-10-13
CN107248023B (en) 2020-09-25

Family

ID=60017576

Country Status (1)

Country Link
CN (1) CN107248023B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844570A (en) * 2017-11-03 2018-03-27 国云科技股份有限公司 A kind of Corporate Identity system and its implementation based on configuration layer
CN110659960A (en) * 2019-09-11 2020-01-07 深圳传世智慧科技有限公司 Automatic generation method of change management service product, server and change management system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104182732A (en) * 2014-08-12 2014-12-03 南京师范大学 Handwritten Chinese character stroke confirmation method for carrying out similarity matching on the basis of characteristic matrix
CN104951866A (en) * 2015-05-19 2015-09-30 广西大学 Line loss comprehensive-management benchmarking evaluating system and method for county-level power enterprises
CN104966132A (en) * 2015-06-11 2015-10-07 安徽融信金模信息技术有限公司 Method for simulating and analyzing enterprise operation information
CN105512245A (en) * 2015-11-30 2016-04-20 青岛智能产业技术研究院 Enterprise figure building method based on regression model
CN106227722A (en) * 2016-09-12 2016-12-14 中山大学 A kind of extraction method based on listed company's bulletin summary
CN106504084A (en) * 2016-11-16 2017-03-15 航天信息股份有限公司 A kind of method and system for recognizing core enterprise in supply chain
US20170091692A1 (en) * 2015-09-30 2017-03-30 Linkedln Corporation Inferring attributes of organizations using member graph

Also Published As

Publication number Publication date
CN107248023A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN102193936B (en) Data classification method and device
WO2015035864A1 (en) Method, apparatus and system for data analysis
CN106844407B (en) Tag network generation method and system based on data set correlation
CN107066616A (en) Method, device and electronic equipment for account processing
CN105787025B (en) Network platform public account classification method and device
CN104077407B (en) A kind of intelligent data search system and method
CN107248023B (en) Method and device for screening benchmarking enterprise list
CN103559630A (en) Customer segmentation method based on customer attribute and behavior characteristic analysis
CN104573130B (en) The entity resolution method and device calculated based on colony
CN105022754A (en) Social network based object classification method and apparatus
CN106844416B (en) A kind of sub-topic method for digging
CN103136683A (en) Method and device for calculating product reference price and method and system for searching products
CN107341199A (en) A kind of recommendation method based on documentation & info general model
Vall et al. The Importance of Song Context in Music Playlists.
CN107180093A (en) Information search method and device and ageing inquiry word recognition method and device
CN106919699A (en) A kind of recommendation method for personalized information towards large-scale consumer
CN104572623B (en) A kind of efficient data analysis and summary method of online LDA models
US10387805B2 (en) System and method for ranking news feeds
CN106846029B (en) Collaborative filtering recommendation algorithm based on genetic algorithm and novel similarity calculation strategy
Lumauag et al. An enhanced recommendation algorithm based on modified user-based collaborative filtering
CN112035449A (en) Data processing method and device, computer equipment and storage medium
CN111639690A (en) Fraud analysis method, system, medium, and apparatus based on relational graph learning
CN110955767A (en) Algorithm and device for generating intention candidate set list set in robot dialogue system
CN109389321B (en) Item list classification method and device
Jiang et al. Durable product review mining for customer segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant