Disclosure of Invention
The technical problem addressed by the invention is how to give people living in cities a more detailed and more engaging information presentation experience, so that, by checking the operating status of enterprises in life-related industries, they can choose effectively and obtain better safety assurance.
The present invention is proposed to solve this technical problem, in particular the classified presentation of detailed information about enterprises in different industries. Embodiments of the invention provide an enterprise data classification display method and apparatus, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present invention, there is provided an enterprise data classification display method, including:
acquiring initial data associated with a plurality of target objects, and performing feature extraction on the initial data to acquire dimension data of each target object;
extracting region information from the dimension data to determine a region attribute of each target object based on the region information, constructing a classification rule from the dimension data and determining a classification attribute of each target object based on the classification rule, and calculating an additional attribute of each target object based on the classification attribute and the dimension data; and
determining a classification position according to the region attribute and the classification attribute, and generating presentation content according to the dimension data and the additional attribute, so that classification presentation information of each target object is generated based on the classification position and the presentation content.
Optionally, in the foregoing method embodiments of the present invention, the acquiring initial data associated with a plurality of target objects includes:
acquiring raw data relating to a target object;
performing content recognition on the raw data to determine the plurality of target objects to which the raw data relates;
determining identification information of each target object according to the raw data, constructing a raw data set for each target object, and labeling each raw data set with the corresponding identification information; and
aggregating the raw data sets of all target objects to obtain the initial data associated with the plurality of target objects.
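As a rough illustration of the acquisition steps above, the following Python sketch groups raw text records into one raw data set per target object and returns the aggregated result as the initial data. It assumes, purely for illustration, that each record is a plain string and that the identification information is an 18-character code of digits and uppercase letters standing in for a unified social credit code; the pattern `ID_PATTERN` and the function `build_initial_data` are hypothetical and not part of the disclosure.

```python
import re
from collections import defaultdict

# Hypothetical, simplified identifier pattern: 18 digits/uppercase letters,
# standing in for a unified social credit code (no check-digit validation).
ID_PATTERN = re.compile(r"\b[0-9A-Z]{18}\b")


def build_initial_data(raw_records):
    """Group raw records into one raw data set per target object.

    Returns a dict mapping each identifier to the list of raw records that
    mention it; the dict as a whole plays the role of the initial data.
    """
    raw_sets = defaultdict(list)
    for record in raw_records:
        # Content recognition: find every target object this record mentions.
        for object_id in set(ID_PATTERN.findall(record)):
            # The raw data set is labeled (keyed) by the identification info.
            raw_sets[object_id].append(record)
    # Aggregation: all labeled raw data sets together form the initial data.
    return dict(raw_sets)


records = [
    "91110000AAAAAAAAAA listed under a restricted consumption order",
    "91110000BBBBBBBBBB contract dispute announcement",
    "91110000AAAAAAAAAA annual report filed",
]
initial_data = build_initial_data(records)
```

In this sketch a record that mentions two enterprises is placed in both raw data sets, which is one possible reading of splitting and merging the raw data by identifier.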
Optionally, in the foregoing method embodiments of the present invention, the performing feature extraction on the initial data to obtain the dimension data of each target object includes:
acquiring the raw data set of each target object from the initial data; and
performing feature extraction on each of the plurality of raw data items in the raw data set to obtain the dimension data of each target object.
Optionally, in the foregoing method embodiments of the present invention, the dimension data includes a plurality of dimension data items, each of which describes a feature of the target object; and
the dimension data is divided into base dimension data and additional dimension data according to the category of feature described by each dimension data item.
Optionally, in the foregoing method embodiments of the present invention, the extracting region information from the dimension data to determine the region attribute of each target object based on the region information includes:
extracting a plurality of dimension data items associated with region information from base dimension data of the dimension data;
determining a plurality of region features from the plurality of dimension data items associated with the region information; and
determining the region attribute of each target object according to the plurality of region features.
Optionally, in the foregoing method embodiments of the present invention, the determining the region attribute of each target object according to a plurality of region features includes:
parsing the plurality of region features to determine at least one candidate region;
determining the region weight of each region feature according to a preset region weight rule;
performing a weighted calculation using the region weight of each region feature to obtain a weighted result for each candidate region;
determining a target region from the at least one candidate region according to the weighted result of each candidate region; and
determining the region attribute of each target object according to the region features of the target region.
Optionally, in the foregoing method embodiments of the present invention, the parsing the plurality of region features to determine at least one candidate region includes:
parsing each of the plurality of region features to obtain feature content associated with the region information; and
determining at least one candidate region based on the feature content associated with the region information.
Optionally, in the foregoing method embodiments of the present invention, the determining the region weight of each region feature according to a preset region weight rule includes:
determining the region weight of each dimension data item associated with the region information according to a preset region weight rule;
and determining the region weight of the region feature corresponding to each such dimension data item according to the region weight of that item, thereby obtaining the region weight of each region feature.
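The region-attribute steps above amount to a weighted vote over candidate regions. The sketch below assumes a hypothetical list of known region names (`KNOWN_REGIONS`), parses candidate regions by simple substring matching, and uses a hypothetical preset weight rule (`REGION_WEIGHT_RULE`) keyed by the dimension data item each region feature came from; the actual parsing method and weight values are not specified in the text.

```python
from collections import defaultdict

# Hypothetical gazetteer and preset region weight rule (illustrative values).
KNOWN_REGIONS = ["Beijing", "Shanghai", "Hangzhou"]
REGION_WEIGHT_RULE = {"registered_address": 0.5, "office_address": 0.3, "name": 0.2}


def parse_candidates(feature_content):
    """Parse one region feature: return the known regions its content mentions."""
    return [region for region in KNOWN_REGIONS if region in feature_content]


def determine_region(region_features):
    """region_features maps a dimension data item name to its region-related text.

    Each feature inherits the weight of the dimension data item it came from;
    each candidate region accumulates the weights of the features mentioning it.
    Returns the target region with the highest weighted result, or None.
    """
    weighted = defaultdict(float)
    for item_name, content in region_features.items():
        weight = REGION_WEIGHT_RULE.get(item_name, 0.0)
        for region in parse_candidates(content):
            weighted[region] += weight
    if not weighted:
        return None
    return max(weighted, key=weighted.get)
```

Because the weighted results are summed, a single low-weight mention (for example, a place name inside the enterprise name) cannot outvote two agreeing address fields.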
Optionally, in the foregoing method embodiments of the present invention, the constructing a classification rule according to the dimension data includes:
extracting a plurality of dimension data items associated with an object classification from base dimension data of the dimension data;
performing natural language processing on the contents of the plurality of dimension data items associated with the object classification to obtain a plurality of classification features; and
generating a classification sample set based on the plurality of classification features, generating a classification model based on the classification sample set, and constructing the classification rule from the classification model.
Optionally, in the foregoing method embodiments of the present invention, the performing natural language processing on the content of the multiple dimension data items associated with the object classification to obtain multiple classification features includes:
filtering the contents of the plurality of dimension data items associated with the object classification to remove invalid information; and
performing word segmentation on the filtered contents to obtain the plurality of classification features.
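A minimal sketch of the filtering and segmentation steps follows. What counts as invalid information is not defined in the text, so this example assumes it means bracketed notes, digits, punctuation, and a small hypothetical stop-word list (`STOP_WORDS`); whitespace splitting of English text stands in for the word segmentation that Chinese enterprise data would actually require.

```python
import re

# Hypothetical stop-word list; a production system would use a curated one.
STOP_WORDS = {"and", "of", "the", "for", "co", "ltd", "limited", "company"}


def extract_classification_features(texts):
    """Filter invalid information from each text, then segment it into terms.

    Returns the distinct remaining terms in first-seen order.
    """
    features = []
    for text in texts:
        # Filtering: drop bracketed notes, then anything that is not a letter.
        cleaned = re.sub(r"\(.*?\)", " ", text)
        cleaned = re.sub(r"[^A-Za-z\s]", " ", cleaned).lower()
        # Segmentation: whitespace split stands in for a real word segmenter.
        for token in cleaned.split():
            if token not in STOP_WORDS and token not in features:
                features.append(token)
    return features
```

For instance, an enterprise name and a business-scope description can be passed together as the contents of the classification-related dimension data items.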
Optionally, in the foregoing method embodiments of the present invention, the generating a classification sample set based on a plurality of classification features includes:
performing word embedding on the plurality of classification features to obtain a plurality of classification feature vectors; and
generating the classification sample set from the plurality of classification feature vectors.
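The embedding and sample-set steps can be sketched as follows. The text does not name an embedding method or a classification model, so this example assumes a tiny hypothetical word-vector table (`EMBEDDINGS`), represents each sample as the average of its word vectors, and uses a nearest-centroid model (one mean vector per classification item) as a stand-in for the unspecified classification model.

```python
# Hypothetical 3-dimensional word vectors; a real system would load
# pretrained embeddings rather than hard-code them.
EMBEDDINGS = {
    "noodle": [0.9, 0.1, 0.0],
    "catering": [0.8, 0.2, 0.0],
    "food": [0.7, 0.1, 0.1],
    "haircut": [0.0, 0.9, 0.1],
    "salon": [0.1, 0.8, 0.0],
    "repair": [0.0, 0.1, 0.9],
}


def embed(features):
    """Embed each known feature and average the vectors into one sample vector."""
    vectors = [EMBEDDINGS[f] for f in features if f in EMBEDDINGS]
    if not vectors:
        return None  # no known words: this sample is skipped
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_sample_set(labeled_features):
    """labeled_features: list of (classification features, classification item)."""
    samples = []
    for features, item in labeled_features:
        vector = embed(features)
        if vector is not None:
            samples.append((vector, item))
    return samples


def train_centroids(samples):
    """Nearest-centroid model: the mean sample vector of each classification item."""
    sums, counts = {}, {}
    for vector, item in samples:
        acc = sums.setdefault(item, [0.0] * len(vector))
        for i, x in enumerate(vector):
            acc[i] += x
        counts[item] = counts.get(item, 0) + 1
    return {item: [x / counts[item] for x in acc] for item, acc in sums.items()}
```

The resulting centroid dictionary is one simple way to materialize a classification rule: each key is a classification item and each value is the vector that new objects are compared against.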
Optionally, in the foregoing method embodiments of the present invention, the determining the classification attribute of each target object based on the classification rule includes:
acquiring at least one dimension data item associated with the object classification of each target object;
determining a matching degree between the at least one dimension data item and each classification item in the classification rule; and
determining, based on the matching degree of each classification item, the classification item to which the target object belongs, and determining the classification attribute of each target object based on that classification item.
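One way to realize the matching degree is cosine similarity between an object's feature vector and each classification item's vector. The sketch below assumes that choice, a hypothetical hard-coded rule (`CLASSIFICATION_RULE`, for example centroids from a nearest-centroid model), and a hypothetical minimum threshold below which the object is left unclassified; none of these specifics come from the text.

```python
import math

# Hypothetical classification rule: one centroid vector per classification item.
CLASSIFICATION_RULE = {
    "dining": [0.85, 0.15, 0.0],
    "beauty": [0.05, 0.85, 0.05],
}


def cosine(a, b):
    """Cosine similarity, used here as the matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(object_vector, rule=CLASSIFICATION_RULE, threshold=0.5):
    """Return (classification item, matching degree) for the best match.

    The item is None when even the best matching degree is below the
    (hypothetical) threshold, i.e. the object is left unclassified.
    """
    degrees = {item: cosine(object_vector, centroid) for item, centroid in rule.items()}
    best = max(degrees, key=degrees.get)
    if degrees[best] < threshold:
        return None, degrees[best]
    return best, degrees[best]
```

Returning the degree alongside the item lets later steps decide whether a weak match should still be displayed.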
Optionally, in the foregoing method embodiments of the present invention, the calculating an additional attribute of each target object based on the classification attribute and the dimension data includes:
extracting a plurality of dimension data items associated with additional content from the additional dimension data of the dimension data;
determining a plurality of additional features of the target object from the plurality of dimension data items associated with the additional content; and
determining, based on the classification attribute, the classification item to which each target object belongs, and calculating the additional attribute of each target object according to the classification item and the plurality of additional features.
Optionally, in the foregoing method embodiments of the present invention, the calculating an additional attribute of each target object according to the classification item and the plurality of additional features includes:
determining a feature score for each target object based on the plurality of additional features of the target object;
determining all target objects included in each classification item, and calculating the average feature score of those target objects; and
calculating a presentation score of each target object according to its feature score and the average feature score of the classification item to which it belongs, and using the presentation score as the additional attribute of the target object.
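The scoring steps above can be sketched as follows. The text does not give the formulas, so this example assumes that the feature score is a weighted sum of additional features with hypothetical weights (`FEATURE_WEIGHTS`, negative for risk signals) and that the presentation score is the object's feature score minus the average feature score of its classification item, i.e. its standing relative to peers in the same industry.

```python
from collections import defaultdict

# Hypothetical per-feature weights; negative weights penalize risk signals.
FEATURE_WEIGHTS = {
    "dispute_announcements": -2.0,
    "restricted_consumption": -5.0,
    "registered_capital_level": 1.0,
}


def feature_score(features):
    """Weighted sum of a target object's additional features."""
    return sum(FEATURE_WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def presentation_scores(objects):
    """objects: dict id -> (classification item, additional-feature dict).

    Returns dict id -> presentation score, taken here (an assumption) as the
    object's feature score minus the average feature score of its item.
    """
    scores = {oid: feature_score(feats) for oid, (_, feats) in objects.items()}
    members = defaultdict(list)
    for oid, (item, _) in objects.items():
        members[item].append(scores[oid])
    averages = {item: sum(v) / len(v) for item, v in members.items()}
    return {oid: scores[oid] - averages[item] for oid, (item, _) in objects.items()}
```

Comparing against the item average rather than a global average keeps enterprises in naturally riskier industries from being uniformly penalized.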
Optionally, in the foregoing method embodiments of the present invention, the determining a classification position according to the region attribute and the classification attribute includes:
determining region names at one or more levels according to the region attribute, and determining a first classification position of each target object according to the region name at each level;
determining classification names at one or more levels according to the classification attribute, and determining a second classification position of each target object according to the classification name at each level; and
combining the first classification position and the second classification position into the classification position of each target object.
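A classification position can be pictured as a path through a tree: region levels first, then classification levels. The sketch below assumes that ordering (coarse to fine within each part), plus a hypothetical nested-dict tree and `_objects` key for storing placed objects; the text itself does not prescribe a data structure.

```python
def classification_position(region_levels, class_levels):
    """First position from region names, second from classification names
    (both ordered coarse to fine); their concatenation is the position."""
    first_position = tuple(region_levels)
    second_position = tuple(class_levels)
    return first_position + second_position


def place(tree, position, object_id):
    """Insert an object into a nested dict tree keyed by its position path."""
    node = tree
    for name in position:
        node = node.setdefault(name, {})
    node.setdefault("_objects", []).append(object_id)
    return tree
```

Usage: `place({}, classification_position(["Zhejiang", "Hangzhou"], ["Life services", "Dining"]), "A")` files object `A` under province, city, industry, and sub-industry in that order.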
Optionally, in the foregoing method embodiments of the present invention, the generating presentation content according to the dimension data and the additional attribute includes:
extracting a plurality of dimension data items associated with additional content from the additional dimension data of the dimension data;
determining a plurality of additional features of the target object from the plurality of dimension data items associated with the additional content;
selecting at least one of the plurality of additional features as at least one presentation feature; and
generating the presentation content based on the at least one presentation feature and the additional attribute.
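A minimal sketch of generating presentation content: select a few additional features and render them together with the additional attribute (the presentation score). The selection rule (non-zero values, largest absolute value first, at most `k`) and the text format are hypothetical choices made for illustration only.

```python
def presentation_content(additional_features, presentation_score, k=2):
    """Pick up to k non-zero additional features as presentation features and
    render them with the presentation score (hypothetical text format)."""
    nonzero = [(name, value) for name, value in additional_features.items() if value]
    # Hypothetical selection rule: largest absolute values first, ties by name.
    nonzero.sort(key=lambda nv: (-abs(nv[1]), nv[0]))
    selected = nonzero[:k]
    text = f"score {presentation_score:+.1f}"
    if selected:
        parts = [f"{name.replace('_', ' ')}: {value}" for name, value in selected]
        text += " | " + "; ".join(parts)
    return text
```

In practice the selection would more likely be driven by which features matter to users of a given industry; ranking by raw magnitude is only a placeholder.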
Optionally, in the foregoing method embodiments of the present invention, the generating classification presentation information of each target object based on the classification position and the presentation content includes:
determining a region identifier and a classification identifier of each target object based on the classification position; and
generating the classification presentation information of each target object from the region identifier, the classification identifier, and the presentation content.
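The final assembly step can be sketched as a small record type. The identifier scheme (joining level names with `/`) and the class `PresentationInfo` are assumptions for illustration; the text does not specify how identifiers are encoded or how the information is serialized.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PresentationInfo:
    region_id: str
    classification_id: str
    content: str


def make_presentation_info(region_path, classification_path, content):
    """Derive the identifiers from the classification position and bundle them
    with the presentation content. Joining level names with '/' is an assumed
    identifier scheme."""
    return PresentationInfo(
        region_id="/".join(region_path),
        classification_id="/".join(classification_path),
        content=content,
    )


def to_json(info):
    """Serialize one record, keeping non-ASCII (e.g. Chinese) names readable."""
    return json.dumps(asdict(info), ensure_ascii=False)
```

A display front end could then filter records by `region_id` prefix (for example, everything under one city) and group them by `classification_id`.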
According to another aspect of the embodiments of the present invention, there is provided an enterprise data classification display apparatus, including:
an acquisition module, configured to acquire initial data associated with a plurality of target objects and perform feature extraction on the initial data to obtain dimension data of each target object;
a processing module, configured to extract region information from the dimension data to determine a region attribute of each target object based on the region information, construct a classification rule from the dimension data and determine a classification attribute of each target object based on the classification rule, and calculate an additional attribute of each target object based on the classification attribute and the dimension data; and
a generation module, configured to determine a classification position according to the region attribute and the classification attribute and generate presentation content according to the dimension data and the additional attribute, so as to generate classification presentation information of each target object based on the classification position and the presentation content.
Optionally, in each of the above apparatus embodiments of the present invention, the acquisition module includes:
a first acquisition unit, configured to acquire raw data relating to a target object;
a recognition unit, configured to perform content recognition on the raw data to determine the plurality of target objects to which the raw data relates;
an identification unit, configured to determine identification information of each target object according to the raw data, and to label the raw data set constructed for each target object with the identification information; and
an aggregation unit, configured to aggregate the raw data sets of the target objects to obtain the initial data associated with the plurality of target objects.
Optionally, in each of the above apparatus embodiments of the present invention, the acquisition module further includes:
a second acquisition unit, configured to acquire the raw data set of each target object from the initial data; and
a first extraction unit, configured to perform feature extraction on each of the plurality of raw data items in the raw data set to obtain the dimension data of each target object.
Optionally, in the above apparatus embodiments of the present invention, the dimension data includes a plurality of dimension data items, each of which describes a feature of the target object;
and the apparatus further includes an initialization module, configured to divide the dimension data into base dimension data and additional dimension data according to the category of feature described by each dimension data item.
Optionally, in each of the above apparatus embodiments of the present invention, the processing module includes:
a second extraction unit, configured to extract a plurality of dimension data items associated with region information from the base dimension data of the dimension data;
a first determining unit, configured to determine a plurality of region features from the plurality of dimension data items associated with the region information; and
a second determining unit, configured to determine the region attribute of each target object according to the plurality of region features.
Optionally, in each of the above apparatus embodiments of the present invention, the second determining unit is specifically configured to: parse the plurality of region features to determine at least one candidate region; determine the region weight of each region feature according to a preset region weight rule; perform a weighted calculation using the region weight of each region feature to obtain a weighted result for each candidate region; determine a target region from the at least one candidate region according to the weighted result of each candidate region; and determine the region attribute of each target object according to the region features of the target region.
Optionally, in each of the above apparatus embodiments of the present invention, the second determining unit is specifically configured to: parse each of the plurality of region features to obtain feature content associated with the region information; and determine at least one candidate region based on the feature content associated with the region information.
Optionally, in each of the above apparatus embodiments of the present invention, the second determining unit is specifically configured to: determine the region weight of each dimension data item associated with the region information according to a preset region weight rule; and determine the region weight of the region feature corresponding to each such dimension data item according to the region weight of that item, thereby obtaining the region weight of each region feature.
Optionally, in each of the above apparatus embodiments of the present invention, the processing module further includes:
a third extraction unit, configured to extract a plurality of dimension data items associated with an object classification from the base dimension data of the dimension data;
a language processing unit, configured to perform natural language processing on the contents of the plurality of dimension data items associated with the object classification to obtain a plurality of classification features; and
a construction unit, configured to generate a classification sample set based on the plurality of classification features, generate a classification model based on the classification sample set, and construct the classification rule from the classification model.
Optionally, in each of the above apparatus embodiments of the present invention, the language processing unit is specifically configured to: filter the contents of the plurality of dimension data items associated with the object classification to remove invalid information; and perform word segmentation on the filtered contents to obtain the plurality of classification features.
Optionally, in each of the above apparatus embodiments of the present invention, the construction unit is specifically configured to: perform word embedding on the plurality of classification features to obtain a plurality of classification feature vectors; and generate the classification sample set from the plurality of classification feature vectors.
Optionally, in each of the above apparatus embodiments of the present invention, the processing module further includes:
a third acquisition unit, configured to acquire at least one dimension data item associated with the object classification of each target object;
a third determining unit, configured to determine a matching degree between the at least one dimension data item and each classification item in the classification rule; and
a fourth determining unit, configured to determine, based on the matching degree of each classification item, the classification item to which the target object belongs, and to determine the classification attribute of each target object based on that classification item.
Optionally, in each of the above apparatus embodiments of the present invention, the processing module further includes:
a fourth extraction unit, configured to extract a plurality of dimension data items associated with additional content from the additional dimension data of the dimension data;
a fifth determining unit, configured to determine a plurality of additional features of the target object from the plurality of dimension data items associated with the additional content; and
a calculating unit, configured to determine, based on the classification attribute, the classification item to which each target object belongs, and to calculate the additional attribute of each target object according to the classification item and the plurality of additional features.
Optionally, in each of the above apparatus embodiments of the present invention, the calculating unit is specifically configured to: determine a feature score for each target object based on the plurality of additional features of the target object; determine all target objects included in each classification item and calculate the average feature score of those target objects; and calculate a presentation score of each target object according to its feature score and the average feature score of the classification item to which it belongs, and use the presentation score as the additional attribute of the target object.
Optionally, in each of the above apparatus embodiments of the present invention, the generating module includes:
a sixth determining unit, configured to determine region names at one or more levels according to the region attribute, and determine a first classification position of each target object according to the region name at each level;
a seventh determining unit, configured to determine classification names at one or more levels according to the classification attribute, and determine a second classification position of each target object according to the classification name at each level; and
a forming unit, configured to combine the first classification position and the second classification position into the classification position of each target object.
Optionally, in each of the above apparatus embodiments of the present invention, the generating module further includes:
a fifth extraction unit, configured to extract a plurality of dimension data items associated with additional content from the additional dimension data of the dimension data;
an eighth determining unit, configured to determine a plurality of additional features of the target object from the plurality of dimension data items associated with the additional content;
a selection unit, configured to select at least one of the plurality of additional features as at least one presentation feature; and
a first generating unit, configured to generate the presentation content based on the at least one presentation feature and the additional attribute.
Optionally, in each of the above apparatus embodiments of the present invention, the generating module further includes:
a ninth determining unit, configured to determine a region identifier and a classification identifier of each target object based on the classification position; and
a second generating unit, configured to generate the classification presentation information of each target object from the region identifier, the classification identifier, and the presentation content.
According to yet another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium storing a computer program, the computer program being used to execute the method according to any one of the above embodiments of the present invention.
According to still another aspect of the embodiments of the present invention, there is provided an electronic device, including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute them to implement the method according to any one of the above embodiments of the present invention.
With the enterprise data classification display method and apparatus, the storage medium, and the electronic device provided by the embodiments of the invention, a specific industry classification (for example, the local life services industry) is defined based on the industry characteristics of enterprises, and industry information for each enterprise, labeled with local-life tags, is generated by analyzing each data dimension of the enterprise. The invention thus redefines the "life industry" and generates industry-based risk indicators for existing enterprises. These risk indicators can further be analyzed algorithmically, so that, based on them, users can identify more competitive enterprises and obtain more transparent safety assurance.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the invention, and that the invention is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present invention are used merely to distinguish one element, step, device, module, or the like from another element, and do not denote any particular technical or logical order therebetween.
It should also be understood that in embodiments of the present invention, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the invention may be generally understood as one or more, unless explicitly defined otherwise or stated to the contrary hereinafter.
In addition, the term "and/or" in the present invention merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone. In the present invention, the character "/" generally indicates an "or" relationship between the objects before and after it.
It should also be understood that the description of each embodiment of the present invention emphasizes its differences from the other embodiments; for the same or similar parts, the embodiments may be read with reference to one another, and those parts are not repeated for brevity.
Meanwhile, it should be understood that, for ease of description, the parts shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations, and with numerous other electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart illustrating an enterprise data classification display method according to an exemplary embodiment of the present invention. This embodiment can be applied to an electronic device and, as shown in Fig. 1, includes the following steps:
Step 101, obtaining initial data associated with a plurality of target objects, and performing feature extraction on the initial data to obtain dimension data of each target object. Obtaining the initial data associated with the plurality of target objects includes: acquiring raw data relating to the target objects; performing content recognition on the raw data to determine the plurality of target objects to which the raw data relates; determining identification information of each target object according to the raw data, constructing a raw data set for each target object, and labeling it with the identification information; and aggregating the raw data sets of all target objects to obtain the initial data associated with the plurality of target objects.
For example, the raw data related to enterprises can be obtained from the Internet with a network data acquisition tool such as a web crawler. As one example, the target object may be a company of any type or another enterprise, and its identification information may be any information that uniquely identifies it, such as its name, code, or identification number. When the target object is an enterprise, the identification information may be the enterprise name, the unified social credit code, an enterprise identification code, or the like.
As one embodiment, the enterprises to which the raw data relates are determined by performing content recognition on the raw data. For example, if the raw data contains identification information of 10,000 enterprises, it can be determined that the raw data relates to 10,000 enterprises. The raw data is then split and merged according to the identification information, so that a raw data set is constructed for each enterprise and labeled with that enterprise's identification information. The initial data is then generated from the raw data sets of all enterprises, for example by aggregating or merging them, so that the initial data contains the raw data sets of all enterprises needed by the invention. As one embodiment, the raw data set of each target object includes a plurality of raw data items, each of which may serve as a basis for obtaining dimension data. For example, if a raw data item records that on 20 February 2021 the legal representative of Company A was listed as a judgment debtor subject to consumption restrictions, the value of Company A's restricted-consumption dimension is increased by one.
As an embodiment, performing feature extraction on the initial data to obtain the dimension data of each target object includes: acquiring the raw data set of each target object from the initial data; and performing feature extraction on each of the plurality of raw data items in the raw data set to obtain the dimension data of each target object. In other words, the content relevant to one or more dimensions is extracted from each raw data item to obtain the dimension data. For example, the restricted-consumption dimension data is extracted from the raw data items that relate to consumption restrictions.
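The per-item extraction described above, including the restricted-consumption example, can be sketched as rule-based counting. The rule table `DIMENSION_RULES` and its keyword patterns are hypothetical; the text does not say how relevance to a dimension is detected, and a real system would likely use structured source fields or a trained extractor instead of keywords.

```python
import re
from collections import Counter

# Hypothetical extraction rules: dimension name -> pattern that signals it.
DIMENSION_RULES = {
    "restricted_consumption": re.compile(r"restricted consumption", re.IGNORECASE),
    "dispute_announcement": re.compile(r"dispute|lawsuit", re.IGNORECASE),
}


def extract_dimension_data(raw_data_set):
    """Run every rule over each raw data item and count hits per dimension."""
    dimensions = Counter()
    for item in raw_data_set:
        for dimension, pattern in DIMENSION_RULES.items():
            if pattern.search(item):
                # e.g. a new restricted-consumption record adds one to that dimension
                dimensions[dimension] += 1
    return dict(dimensions)
```

Dimensions with no matching raw data item are simply absent from the result, so callers should treat a missing key as a count of zero.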
As one embodiment, the dimension data includes a plurality of dimension data items, and each dimension data item is used to describe a feature of the target object. For example, dimension data for an enterprise may include 100 dimension data items, and each dimension data item may describe a feature of the enterprise.
As an embodiment, in terms of the category of feature that each dimension data item describes, the dimension data obtained from the raw data or raw data items may include base dimension data and risk dimension data. The base dimension data includes an enterprise name dimension, an enterprise address information dimension, an enterprise unified credit code dimension, a business scope dimension, and the like. The risk dimension data includes public information such as an enterprise scale dimension, a dispute announcement dimension, a restricted consumption dimension, and the like. In addition, a single dimension may contribute to both the region attribute and the classification attribute; the enterprise name dimension is one example.
By performing feature extraction on the initial data to obtain the dimension data of each target object, the invention can describe each target object more clearly and completely. Based on this clearer and more complete description, accurate classification presentation information can then be generated for each target object.
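A minimal sketch of per-item feature extraction that splits the result into base and risk dimension data is shown below. The item `type` values, the field names, and the mapping `RISK_DIMENSION_OF_TYPE` are hypothetical; each matching raw data item simply increments its risk dimension, as in the restricted-consumption example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from a raw item's "type" to the risk dimension it feeds.
RISK_DIMENSION_OF_TYPE = {
    "restricted_consumption": "restricted_consumption",
    "dispute_announcement": "dispute_announcement",
    "administrative_penalty": "administrative_penalty",
}
# Hypothetical fields that carry base dimension data.
BASE_FIELDS = ("name", "address", "credit_code", "business_scope")


def extract_dimension_data(raw_data_set: List[Dict[str, str]]) -> Dict[str, dict]:
    """Extract base and risk dimension data from one enterprise's raw data set."""
    base: Dict[str, str] = {}
    risk: Counter = Counter()
    for item in raw_data_set:
        dimension = RISK_DIMENSION_OF_TYPE.get(item.get("type", ""))
        if dimension is not None:
            # e.g. one more restricted-consumption record -> count + 1
            risk[dimension] += 1
        for field in BASE_FIELDS:
            # Keep the first value seen for each base dimension.
            if field in item and field not in base:
                base[field] = item[field]
    return {"base": base, "risk": dict(risk)}
```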
Step 102, extracting region information from the dimension data to determine a region attribute of each target object based on the region information, constructing a classification rule according to the dimension data and determining a classification attribute of each target object based on the classification rule, and calculating an additional attribute of each target object based on the classification attribute and the dimension data.
In general, the invention performs algorithmic analysis on the base data dimensions to derive comprehensive characteristic information for each target object, such as its city, industry classification, and risk index. The target objects are then mapped to life-related industries according to this comprehensive characteristic information, providing users with more detailed and professional information about enterprises that are closely related to everyday life. Specifically, content such as the name information and business scope information of an enterprise is extracted and algorithmically resolved into the enterprise's industry scope, while extracted data such as enterprise scale and dispute data are algorithmically resolved into risk index data that serves as a risk prompt for the enterprise.
As one embodiment, extracting region information from the dimension data to determine a region attribute of each target object based on the region information includes: extracting a plurality of dimension data items associated with the region information from base dimension data of the dimension data; determining a plurality of region features from a plurality of dimensional data items associated with the region information; the region attribute of each target object is determined according to the plurality of region features.
For example, a plurality of dimension data items associated with region information, such as the enterprise name dimension, the enterprise address information dimension, and the enterprise unified credit code dimension, are extracted from the base dimension data of an enterprise. A plurality of region features are then determined from these dimensions. For example, if the enterprise name dimension of enterprise A is "Happy Childhood Kindergarten, Haidian District, Beijing" and its enterprise address information dimension is "No. 16 Courtyard, Chunlu, Haidian District, Beijing", the region features "Beijing" and "Haidian District" can be determined.
In this way, the city and district to which an enterprise belongs can be calculated according to the national administrative divisions by extracting basic information such as the enterprise's name, address, and unified credit code. For example, when the name of an enterprise contains features such as "Beijing" or "Haidian", its city may be matched as "Beijing" and its district as "Haidian District".
However, because enterprises may be renamed or relocated, and national administrative divisions may change, a unified analysis based on multidimensional data such as enterprise names, address information, and unified credit codes is required. A region code dictionary model is therefore established, and enterprises are associated with regions by matching against this dictionary model.
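One way to realize a region code dictionary model is sketched below, under stated assumptions. The dictionary is keyed by six-digit administrative division codes (the GB/T 2260 codes, e.g. 110108 for Haidian District, Beijing); the two entries and their keywords are illustrative only. The sketch also assumes the 18-character unified social credit code layout, in which characters 3 to 8 hold the administrative division code of the registering authority, so the credit code contributes one region feature while the name and address contribute keyword matches.

```python
from typing import Dict, List, Tuple

# Illustrative region code dictionary:
# division code -> (first-level name, second-level name, matching keywords).
REGION_CODE_DICT: Dict[str, Tuple[str, str, Tuple[str, ...]]] = {
    "110108": ("Beijing", "Haidian District", ("Haidian",)),
    "120114": ("Tianjin", "Wuqing District", ("Wuqing",)),
}


def region_features(name: str, address: str, credit_code: str) -> List[Tuple[str, str]]:
    """Return (source, division_code) pairs matched against the dictionary."""
    features: List[Tuple[str, str]] = []
    # Characters 3-8 of an 18-character unified social credit code are the
    # division code of the registering authority.
    if len(credit_code) == 18 and credit_code[2:8] in REGION_CODE_DICT:
        features.append(("credit_code", credit_code[2:8]))
    for source, text in (("name", name), ("address", address)):
        for code, (_first, _second, keywords) in REGION_CODE_DICT.items():
            if any(keyword in text for keyword in keywords):
                features.append((source, code))
    return features


def region_names(code: str) -> Tuple[str, str]:
    """Map a division code back to its two levels of region names."""
    first, second, _keywords = REGION_CODE_DICT[code]
    return first, second
```

Because each source is matched independently, a renamed or relocated enterprise still yields features from the sources that remain current, which the weighting step can then reconcile.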
As described above, the present invention determines the region attribute of each target object based on a plurality of region features. Fig. 2 is a flowchart illustrating a method for determining a region attribute according to an exemplary embodiment of the present invention. As shown in fig. 2, determining the region attribute of each target object according to the plurality of region features includes:
Step 201, parsing the plurality of region features to determine at least one candidate region. Parsing the plurality of region features to determine at least one candidate region includes: analyzing each of the plurality of region features to obtain feature content associated with the region information; and determining at least one candidate region based on the feature content associated with the region information. For example, when the registered-region feature of enterprise B is "Beijing" and its office-region feature is "Tianjin", the candidate regions of enterprise B are determined from these two region features to be Beijing and Tianjin.
Step 202, determining the region weight of each region feature according to a preset region weight rule. Specifically, this includes: determining the region weight of each dimension data item associated with the region information according to the preset region weight rule; and determining, from the region weight of each such dimension data item, the region weight of the corresponding region feature, thereby determining the region weight of every region feature. For example, the preset region weight rule may assign a weight of 0.4 to the registered-region feature and 0.6 to the office-region feature.
Step 203, performing a weighted calculation according to the region weight of each region feature to obtain a weighted result for each candidate region. For example, if counting the region information shows that "Beijing" appears 5 times and "Tianjin" appears 4 times among the region features of the dimension data, the weighted result of "Beijing" is 5 × 0.4 = 2 and the weighted result of "Tianjin" is 4 × 0.6 = 2.4.
Step 204, determining a target region from the at least one candidate region according to the weighted result of each candidate region. Continuing the example, since the weighted result of "Beijing" is 2 and that of "Tianjin" is 2.4, "Tianjin", which has the larger weighted result, is selected as the target region.
Step 205, determining the region attribute of each target object according to the region features of the target region. For example, the region feature of the target region "Tianjin" is Tianjin City. The region attribute includes a region name of at least one level. For example, the region attribute of enterprise B is Wuqing District, Tianjin City, where Tianjin City is the first-level region name and Wuqing District is the second-level region name.
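Steps 201 to 204 amount to a weighted vote over candidate regions. The sketch below reproduces the worked example; representing each region feature as a (feature kind, region) pair and the kind names `registration` and `office` are assumptions, while the 0.4 and 0.6 weights come from the example rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Preset region weight rule from the example above.
REGION_WEIGHTS: Dict[str, float] = {"registration": 0.4, "office": 0.6}


def select_target_region(
    features: List[Tuple[str, str]],
    weights: Dict[str, float] = REGION_WEIGHTS,
) -> Tuple[str, Dict[str, float]]:
    """Return the target region and the weighted result of every candidate.

    Each feature is a (feature_kind, region_name) pair.
    """
    scores: Dict[str, float] = defaultdict(float)
    for kind, region in features:
        # Step 201: every region mentioned becomes a candidate.
        # Steps 202-203: add that feature's region weight to the candidate.
        scores[region] += weights.get(kind, 0.0)
    if not scores:
        raise ValueError("no region features to weigh")
    # Step 204: the candidate with the largest weighted result wins.
    target = max(scores, key=scores.__getitem__)
    return target, dict(scores)
```

With five registration features pointing at Beijing and four office features pointing at Tianjin, Beijing scores 2 and Tianjin 2.4, so Tianjin is selected. Ties go to whichever candidate was seen first, a choice the text does not specify.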
As an embodiment, constructing a classification rule from dimensional data includes: extracting a plurality of dimension data items associated with the object classification from base dimension data of the dimension data; performing natural language processing on the contents of a plurality of dimensional data items associated with the object classification to obtain a plurality of classification features; a classification sample set is generated based on the plurality of classification features and a classification model is generated based on the classification sample set, and a classification rule is constructed from the classification model.
For example, the plurality of dimension data items associated with the object classification may include an enterprise's business scope, name information, and the like. Note that the name information may serve as a basis for determining either the region attribute or the classification attribute. Performing natural language processing on the contents of the plurality of dimension data items associated with the object classification to obtain a plurality of classification features includes: filtering the contents of these dimension data items to remove invalid information; and performing word segmentation on the filtered contents to obtain a plurality of classification features.
The method and device form base data by extracting the name information and business scope information of enterprises, process this information with Natural Language Processing (NLP) techniques such as filtering, word segmentation, and word embedding, construct a vector space model using an NLP algorithm, establish an industry classification model, and calculate the industry information of each enterprise.
As one embodiment, generating a classification sample set based on the plurality of classification features includes: performing word embedding on the plurality of classification features to obtain a plurality of classification feature vectors; and generating a classification sample set from the plurality of classification feature vectors.
As an embodiment, determining the classification attribute of each target object based on the classification rule includes: acquiring at least one dimension data item associated with the object classification of each target object; determining the matching degree between the at least one dimension data item and each classification item in the classification rule; and determining the classification item to which the target object belongs based on its matching degree with each classification item, and determining the classification attribute of each target object from that classification item. For example, if "Beijing Hailpeng Tianpeyu Center" has a 95% matching degree with the parent-child industry and a 5% matching degree with the beauty industry, it can be determined that "Beijing Hailpeng Tianpeyu Center" belongs to the parent-child industry.
The classification attribute includes a classification name of at least one level. For example, the first-level classification name of "Happy Childhood Kindergarten, Haidian District, Beijing" is the parent-child industry, and its second-level classification name is kindergarten.
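The filtering and word-segmentation steps can be illustrated on English text. This is an assumption-laden sketch: the patterns treated as invalid information and the stop-word list are hypothetical, and Chinese names and business scopes would need a dedicated word segmenter rather than the simple regular-expression tokenizer used here.

```python
import re
from typing import Iterable, List

# Hypothetical "invalid information": parenthesised remarks, legal-form
# suffixes and numbers carry no industry signal.
INVALID_PATTERNS = [r"\(.*?\)", r"co\.,? ltd\.?", r"\blimited\b", r"\d+"]
STOPWORDS = {"and", "of", "the", "for", "in"}


def classification_features(texts: Iterable[str]) -> List[str]:
    """Filter invalid information, then segment the remainder into features."""
    features: List[str] = []
    for text in texts:
        cleaned = text.lower()
        for pattern in INVALID_PATTERNS:
            cleaned = re.sub(pattern, " ", cleaned)  # filtering step
        for token in re.findall(r"[a-z]+", cleaned):  # segmentation step
            if token not in STOPWORDS:
                features.append(token)
    return features
```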
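Word embedding and sample-set construction can be sketched as below. The three-dimensional vectors in `EMBEDDINGS` are toy values standing in for embeddings trained on a real corpus, and averaging word vectors into one vector per enterprise is one common choice, assumed here rather than taken from the text.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Toy 3-dimensional word vectors standing in for trained embeddings.
EMBEDDINGS: Dict[str, Vector] = {
    "kindergarten": [0.9, 0.1, 0.0],
    "preschool": [0.8, 0.2, 0.0],
    "playground": [0.7, 0.3, 0.1],
    "beauty": [0.0, 0.1, 0.9],
    "massage": [0.1, 0.0, 0.8],
}
DIM = 3


def embed(features: Sequence[str]) -> Vector:
    """Average the vectors of known features; unknown words are skipped."""
    known = [EMBEDDINGS[f] for f in features if f in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


def build_sample_set(
    labelled: Sequence[Tuple[Sequence[str], str]]
) -> List[Tuple[Vector, str]]:
    """Turn (classification features, industry label) pairs into (vector, label) samples."""
    return [(embed(features), label) for features, label in labelled]
```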
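One plausible reading of the matching degree is sketched below: a nearest-centroid model built from the classification sample set, with cosine similarity normalized into degrees that sum to 1, so that a 95%/5% split like the example above selects the parent-child industry. The text does not specify the model or the similarity measure; both are assumptions here.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def train_centroids(samples: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Nearest-centroid model: the mean vector of each classification item."""
    grouped: Dict[str, List[Vector]] = {}
    for vector, label in samples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def cosine(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def matching_degrees(vector: Vector, centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Clip negative similarities to 0 and normalise so the degrees sum to 1."""
    sims = {label: max(cosine(vector, c), 0.0) for label, c in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return {label: 0.0 for label in sims}
    return {label: s / total for label, s in sims.items()}


def classify(vector: Vector, centroids: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Return the classification item with the highest matching degree."""
    degrees = matching_degrees(vector, centroids)
    return max(degrees, key=degrees.__getitem__), degrees
```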
As described above, the present invention calculates additional attributes for each target object based on the classification attributes and the dimension data. Fig. 3 is a flowchart illustrating a method for calculating an additional attribute according to an exemplary embodiment of the present invention. As shown in FIG. 3, computing additional attributes for each target object based on the classification attributes and the dimension data includes:
Step 301, extracting a plurality of dimension data items associated with additional content from the additional dimension data of the dimension data. The additional dimension data is, for example, the risk dimension data, and the additional content is, for example, content describing risk prompts for the target object. The invention can obtain an enterprise risk index by analyzing the risk dimension data. For example, by analyzing risk dimension data such as enterprise scale and litigation, a development index and a risk index of the enterprise during its operation can be obtained, from which the enterprise's risk index information is determined.
Step 302, determining a plurality of additional features of the target object from the plurality of dimension data items associated with the additional content. The additional content is used to present a more detailed introduction of the target object to the user. Accordingly, the dimension data items associated with the additional content span multiple dimensions, such as the enterprise scale dimension, the dispute announcement dimension, the restricted consumption dimension, the administrative penalty dimension, and so forth, and the determined additional features may be enterprise scale, dispute announcements, restricted consumption, administrative penalties, and the like.
Step 303, determining the classification item to which the target object belongs based on the classification attribute, and calculating the additional attribute of each target object according to the classification item and the plurality of additional features. This specifically includes: determining a feature score for each target object based on its plurality of additional features; determining all target objects included in each classification item and calculating the average feature score of those target objects; and calculating the presentation score of each target object from its feature score and the average feature score of the classification item to which it belongs, the presentation score being used as the additional attribute of the target object.
As one example, the feature score ranges from 0 to 10 points. After the additional features of "Happy Childhood Kindergarten" are acquired, its feature score is determined to be 6 points from those additional features (e.g., enterprise scale, dispute announcements, restricted consumption, administrative penalties). Specifically, each additional feature may be converted into a positive or negative score, and the sum of these scores is taken as the feature score. "Happy Childhood Kindergarten" belongs to the kindergarten classification of the parent-child industry, and the average feature score of all kindergartens in that classification is calculated to be 5. The ratio of the feature score of "Happy Childhood Kindergarten" to the average feature score is therefore 6/5 = 1.2. Assuming that the initial value of the presentation score is 2.5 stars, the presentation score of "Happy Childhood Kindergarten" is 1.2 × 2.5 = 3 stars.
The invention determines the region to which each target object belongs by calculating the region attribute, and determines the specific class of each target object by calculating the classification attribute, so that each target object can be classified more accurately. By calculating the additional attribute of each target object based on the classification attribute and the dimension data, the relevant information of the target object can be presented more effectively according to the additional attribute.
Step 103, determining a classification position according to the region attribute and the classification attribute, and generating presentation content according to the dimension data and the additional attribute, so that the classification presentation information of each target object is generated based on the classification position and the presentation content. As an embodiment, determining the classification position based on the region attribute and the classification attribute includes: determining a region name of at least one level according to the region attribute, and determining a first classification position of each target object according to the region name of each level. For example, if the first classification position is Haidian District, Beijing, the target object is placed under the classification directory of Haidian District, Beijing. A classification name of at least one level is likewise determined according to the classification attribute, and a second classification position of each target object is determined according to the classification name of each level. For example, if the second classification position is parent-child kindergarten, the target object is placed under the parent-child kindergarten classification directory. The first classification position and the second classification position together form the classification position of each target object.
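The scoring in the worked example can be reproduced as follows. The feature names and the signed points per feature are hypothetical, and clamping to the stated 0 to 10 range is an assumption about how out-of-range sums are handled; the ratio-times-initial-value formula follows the example directly.

```python
from statistics import mean
from typing import Dict


def feature_score(features: Dict[str, float], points: Dict[str, float]) -> float:
    """Sum signed per-feature points and clamp the result to the 0-10 range.

    `points` maps a feature name to the (positive or negative) points each unit
    of that feature contributes; the values used are hypothetical.
    """
    raw = sum(points.get(name, 0.0) * value for name, value in features.items())
    return min(max(raw, 0.0), 10.0)


def presentation_scores(
    scores_in_item: Dict[str, float], initial_stars: float = 2.5
) -> Dict[str, float]:
    """Scale each object's feature score by its classification item's average."""
    average = mean(scores_in_item.values())
    if average == 0:
        # Ratio undefined: fall back to the initial value (an assumption).
        return {name: initial_stars for name in scores_in_item}
    return {name: score * initial_stars / average for name, score in scores_in_item.items()}
```

With feature scores 6, 5 and 4 in one kindergarten classification item (average 5), the first object receives 6 × 2.5 / 5 = 3 stars, as in the text. Names other than "Happy Childhood Kindergarten" would be placeholders in such a call.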
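The classification position can be represented as two ordered lists of names, one per hierarchy. The `ClassificationPosition` type and the slash-joined directory path are illustrative choices, not part of the described method.

```python
from typing import List, NamedTuple, Sequence


class ClassificationPosition(NamedTuple):
    region_path: List[str]    # first classification position, e.g. city -> district
    category_path: List[str]  # second classification position, e.g. industry -> subcategory


def classification_position(
    region_names: Sequence[str], category_names: Sequence[str]
) -> ClassificationPosition:
    """Combine region names and classification names, each ordered by level."""
    return ClassificationPosition(list(region_names), list(category_names))


def directory_path(position: ClassificationPosition) -> str:
    """Render the directory under which the target object is listed."""
    return "/".join(position.region_path + position.category_path)
```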
As one embodiment, generating presentation content from the dimension data and the additional attribute includes: extracting a plurality of dimension data items associated with the additional content from the additional dimension data of the dimension data; determining a plurality of additional features of the target object from these dimension data items; selecting at least one of the additional features as at least one presentation feature; and generating the presentation content based on the at least one presentation feature and the additional attribute. For example, the presentation content of "Happy Childhood Kindergarten" includes its 3-star presentation score together with its enterprise scale, dispute announcements, restricted consumption, administrative penalties, and the like.
As an embodiment, generating the classification presentation information of each target object based on the classification location and the presentation content includes: determining a region identifier and a classification identifier of each target object based on the classification position; and generating the classified presentation information of each target object by using the area identification, the classified identification and the presentation content.
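Generating presentation content and assembling the classification presentation information might look like the sketch below. Deriving the region and classification identifiers by joining path names, the feature keys, and the `PresentationInfo` record itself are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class PresentationInfo:
    region_id: str    # region identifier derived from the classification position
    category_id: str  # classification identifier derived from the classification position
    stars: float      # additional attribute (presentation score)
    details: Dict[str, object] = field(default_factory=dict)  # presentation features


def build_presentation_info(
    region_path: Sequence[str],
    category_path: Sequence[str],
    additional_features: Dict[str, object],
    presentation_keys: Sequence[str],
    stars: float,
) -> PresentationInfo:
    """Select presentation features and attach region and classification identifiers."""
    details = {k: additional_features[k] for k in presentation_keys if k in additional_features}
    return PresentationInfo(
        region_id="/".join(region_path),
        category_id="/".join(category_path),
        stars=stars,
        details=details,
    )
```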
By generating the classification presentation information of each target object from the region identifier, the classification identifier, and the presentation content, the invention allows that information to be presented by region and by hierarchy. In addition, the presentation content conveys the key information about the target object, so that the user can understand the target object intuitively.
Fig. 4 is a schematic diagram of the classification presentation information provided by an exemplary embodiment of the present invention. As one embodiment, after the industry classification of each enterprise is resolved, the enterprises belonging to the "life industry field" are aggregated according to the relevant standards. In this way, enterprises in industries closely related to everyday life are gathered together, their commonalities and differences are analyzed, and the industry classifications that define life labels are determined. For example, industries such as kindergartens and playgrounds can be defined as the "parent-child" industry, and "beauty" and "massage" can be defined as the "beauty" industry. The classification criteria may be established based on national industry standards.
Enterprises are then grouped by industry within each city according to the defined "life industry field", and the distribution of the different industries in each city is counted algorithmically. Calculating the competitiveness of different industries within a city facilitates personalized enterprise recommendation. In addition, the risk indexes of enterprises in each industry are calculated through the models to provide risk prompts, giving users a more detailed view of everyday services and safer service guarantees.
As shown in Fig. 4, in terms of the region attribute, the first-level regions include, for example, Beijing, Shanghai, Guangzhou, Shenzhen, and so on. When the user selects Beijing, the second-level regions are presented, including Dongcheng District, Xicheng District, Haidian District, and so on. After the user selects Haidian District, Beijing, the first-level classifications are presented, including, for example, the beauty industry and the parent-child industry. The beauty industry comprises 789 enterprises across 8 subcategories, and the parent-child industry comprises 568 enterprises across 6 subcategories.
When the user selects the parent-child industry, the second-level classifications are presented, including 51 kindergartens, 36 playgrounds, and so on. When the user selects kindergartens, summary information for each kindergarten can be viewed. For example, Small Sapling Bilingual Kindergarten is rated 5 stars and has 0 risk items, Happy Childhood Kindergarten is rated 3 stars and has 5 risk items, and Happy Youyou Kindergarten is rated 4 stars and has 3 risk items. When the user selects Happy Childhood Kindergarten, more detailed presentation information is provided, including: 1. restricted consumption of the legal representative; 2. administrative penalty; 3. lawsuit; 4. administrative penalty; 5. person subject to enforcement.
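The per-city industry distribution can be counted with a simple mapping from fine-grained industries to life labels. The mapping mirrors the examples in the text (kindergartens and playgrounds as parent-child; beauty and massage as beauty), but the label strings and the (city, industry) input format are assumptions; a real mapping would follow the national industry standards mentioned above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Illustrative life-industry labels based on the examples in the text.
LIFE_LABELS: Dict[str, str] = {
    "kindergarten": "parent-child",
    "playground": "parent-child",
    "beauty": "beauty",
    "massage": "beauty",
}


def industry_distribution(enterprises: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count enterprises per life label in each city.

    `enterprises` yields (city, industry) pairs; industries without a life
    label are outside the "life industry field" and are ignored.
    """
    distribution: Dict[str, Counter] = defaultdict(Counter)
    for city, industry in enterprises:
        label = LIFE_LABELS.get(industry)
        if label is not None:
            distribution[city][label] += 1
    return dict(distribution)
```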
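The drill-down view in Fig. 4 can be backed by a nested directory keyed by the two region levels and the two classification levels. The record layout is hypothetical, and `category_overview` computes the "enterprises across subcategories" figures shown for each first-level classification.

```python
from typing import Any, Dict, Iterable, List, Tuple

Summary = Dict[str, Any]
# city -> district -> category -> subcategory -> list of enterprise summaries
Directory = Dict[str, Dict[str, Dict[str, Dict[str, List[Summary]]]]]


def build_directory(records: Iterable[Tuple[str, str, str, str, Summary]]) -> Directory:
    """Place each enterprise summary under its region and classification path."""
    root: Directory = {}
    for city, district, category, subcategory, summary in records:
        (
            root.setdefault(city, {})
            .setdefault(district, {})
            .setdefault(category, {})
            .setdefault(subcategory, [])
            .append(summary)
        )
    return root


def category_overview(root: Directory, city: str, district: str) -> Dict[str, Tuple[int, int]]:
    """Return {category: (number of subcategories, number of enterprises)}."""
    categories = root.get(city, {}).get(district, {})
    return {
        category: (len(subs), sum(len(items) for items in subs.values()))
        for category, subs in categories.items()
    }
```

In a usage example, records for "Happy Childhood Kindergarten" (3 stars, 5 risk items) and "Small Sapling Bilingual Kindergarten" (5 stars, 0 risk items) would sit under Beijing / Haidian District / parent-child / kindergarten; any other entries would be placeholders.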
Exemplary devices
Fig. 5 is a schematic structural diagram of an enterprise data classification display device according to an exemplary embodiment of the present invention. As shown in fig. 5, the present embodiment includes:
an obtaining module 51, configured to obtain initial data associated with a plurality of target objects, and perform feature extraction on the initial data to obtain dimension data of each target object.
As an embodiment, the obtaining module 51 includes: a first acquisition unit configured to acquire original data relating to the target objects; a recognition unit configured to perform content recognition on the original data to determine the plurality of target objects to which the original data relates; an identification unit configured to determine the identification information of each target object according to the original data and to label the original data set constructed for each target object with that identification information; and an aggregation unit configured to aggregate the original data sets of the target objects to acquire the initial data associated with the plurality of target objects. The original data set includes a plurality of original data items.
As an embodiment, the obtaining module 51 further includes: a second acquisition unit configured to acquire the original data set of each target object from the initial data; and a first extraction unit configured to perform feature extraction on each of the plurality of original data items in the original data set to obtain the dimension data of each target object. The dimension data includes a plurality of dimension data items, each describing a feature of the target object;
a processing module 52 for extracting region information from the dimension data to determine a region attribute of each target object based on the region information, constructing a classification rule from the dimension data and determining a classification attribute of each target object based on the classification rule, and calculating an additional attribute of each target object based on the classification attribute and the dimension data.
As one embodiment, the processing module 52 includes: a second extraction unit configured to extract a plurality of dimensional data items associated with the region information from base dimensional data of the dimensional data; a first determination unit configured to determine a plurality of region features from a plurality of dimensional data items associated with region information; a second determining unit for determining a region attribute of each target object according to the plurality of region features.
As an embodiment, the second determining unit is specifically configured to: parsing the plurality of region features to determine at least one candidate region; determining the region weight of each region feature according to a preset region weight rule; performing weighting calculation according to the region weight of each region feature so as to obtain a weighting result of each candidate region; determining a target region from at least one candidate region according to the weighting result of each candidate region; and determining the region attribute of each target object according to the region characteristics of the target region.
As an embodiment, the second determining unit is specifically configured to: analyzing each of the plurality of regional characteristics to obtain characteristic content associated with the regional information; at least one candidate region is determined based on the feature content associated with the region information.
As an embodiment, the second determining unit is specifically configured to: determining the region weight of each dimension data item associated with the region information according to a preset region weight rule; and determining, from the region weight of each such dimension data item, the region weight of the corresponding region feature, thereby determining the region weight of each region feature. The region attribute includes a region name of at least one level.
As one embodiment, the processing module 52 further includes: a third extraction unit configured to extract a plurality of dimensional data items associated with the object classification from base dimensional data of the dimensional data; a language processing unit for performing natural language processing for the contents of a plurality of dimensional data items associated with the object classification to obtain a plurality of classification features; a construction unit for generating a classification sample set based on the plurality of classification features, generating a classification model based on the classification sample set, and constructing a classification rule according to the classification model.
As an embodiment, the language processing unit is specifically configured to: performing filtering processing on the contents of a plurality of dimension data items associated with the object classification to obtain contents with invalid information filtered; and segmenting the contents with the invalid information filtered, thereby obtaining a plurality of classification characteristics.
As an embodiment, the construction unit is specifically configured to: performing word embedding on the plurality of classification features to obtain a plurality of classification feature vectors; a classification sample set is generated from the plurality of classification feature vectors.
As one embodiment, the processing module 52 further includes: a third acquisition unit configured to acquire at least one dimension data item associated with the object classification of each target object; a third determining unit configured to determine the matching degree between the at least one dimension data item and each classification item in the classification rule; and a fourth determining unit configured to determine the classification item to which the target object belongs based on its matching degree with each classification item, and to determine the classification attribute of each target object from that classification item. The classification attribute includes a classification name of at least one level.
As one embodiment, the processing module 52 further includes: a fourth extraction unit operable to extract a plurality of dimensional data items associated with the additional content from additional dimensional data of the dimensional data; a fifth determining unit configured to determine a plurality of additional features of the target object from the plurality of dimensional data items associated with the additional content; and the calculating unit is used for determining the classification items to which the target objects belong based on the classification attributes and calculating the additional attributes of each target object according to the classification items and the plurality of additional features.
As an embodiment, the calculating unit is specifically configured to: determining a feature score for each target object based on the plurality of additional features of the target object; determining all target objects included in each classification item and calculating the average feature score of those target objects; and calculating the presentation score of each target object according to its feature score and the average feature score of the classification item to which it belongs, and taking the presentation score as the additional attribute of each target object.
a generating module 53, configured to determine a classification position according to the region attribute and the classification attribute, and generate presentation content according to the dimension data and the additional attribute, so as to generate classification presentation information of each target object based on the classification position and the presentation content.
As an embodiment, the generating module 53 includes: a sixth determining unit, configured to determine, according to the area attribute, an area name of at least one level, and determine, according to the area name of each level, a first classification position of each target object; a seventh determining unit, configured to determine a classification name of at least one level according to the classification attribute, and determine a second classification position of each target object according to the classification name of each level; and the forming unit is used for forming the first classification position and the second classification position into the classification position of each target object.
As an embodiment, the generating module 53 further includes: a fifth extraction unit operable to extract a plurality of dimensional data items associated with the additional content from additional dimensional data of the dimensional data; an eighth determining unit configured to determine a plurality of additional features of the target object from the plurality of dimensional data items associated with the additional content; a selection unit for selecting at least one additional feature from the plurality of additional features to be used as at least one presentation feature; a first generating unit for generating presentation content based on the at least one presentation feature and the additional attribute.
As an embodiment, the generating module 53 further includes: a ninth determining unit for determining a region identification and a classification identification of each target object based on the classification position; and the second generation unit is used for generating the classification presentation information of each target object by using the area identification, the classification identification and the presentation content.
An initialization module 54 is configured to divide the dimension data into basic dimension data and additional dimension data according to the categories of the features described by the dimension data items.
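The division performed by the initialization module 54 amounts to routing each dimension data item by its feature category. The sketch below assumes that each item is stored as a pair of (feature category, value) and that a fixed, hypothetical set of categories (`BASIC_CATEGORIES`) counts as basic; the embodiment does not specify which categories are basic, so this set is only an example.

```python
from typing import Dict, Set, Tuple

# Hypothetical feature categories treated as basic; every other category counts as additional.
BASIC_CATEGORIES: Set[str] = {"identity", "location", "industry"}


def split_dimension_data(
    items: Dict[str, Tuple[str, object]],
    basic_categories: Set[str] = BASIC_CATEGORIES,
) -> Tuple[Dict[str, object], Dict[str, object]]:
    # items maps each dimension data item name to (feature category, value).
    basic: Dict[str, object] = {}
    additional: Dict[str, object] = {}
    for name, (category, value) in items.items():
        # Route the item according to the category of the feature it describes.
        target = basic if category in basic_categories else additional
        target[name] = value
    return basic, additional
```

Passing the category set as a parameter lets the basic/additional boundary differ between industries without changing the routing logic.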
Exemplary electronic device
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them that can communicate with the first device and the second device to receive the acquired input signals from them. As shown in Fig. 6, the electronic device includes one or more processors 61 and a memory 62.
The processor 61 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory 62 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 61 to implement the enterprise data classification display method of the various embodiments of the present disclosure described above and/or other desired functions. In one example, the electronic device may further include an input device 63 and an output device 64, which are interconnected by a bus system and/or another form of connection mechanism (not shown).
The input device 63 may include, for example, a keyboard and a mouse.
The output device 64 may output various kinds of information to the outside. The output device 64 may include, for example, a display, speakers, a printer, and a communication network together with remote output devices connected to it.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 6, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the enterprise data classification display method according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification.
Program code for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the enterprise data classification display method according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification.
The computer-readable storage medium may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The general principles of the present disclosure have been described above in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and they should not be regarded as required by every embodiment of the present disclosure. Furthermore, the specific details disclosed above are provided only for the purposes of illustration and ease of understanding and are not limiting; the present disclosure is not required to be implemented with these specific details.
In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for the parts they share. Because the system embodiment basically corresponds to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The block diagrams of devices, apparatuses, equipment, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.