CN113239126A - Business activity information standardization scheme based on BOR method - Google Patents


Publication number
CN113239126A
Authority
CN
China
Prior art keywords
data
service
entity
business
preset
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110511892.1A
Other languages
Chinese (zh)
Inventor
路艳玲
李水旺
徐林
吴江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank Of China Insurance Information Technology Management Co ltd
Original Assignee
Bank Of China Insurance Information Technology Management Co ltd
Application filed by Bank Of China Insurance Information Technology Management Co ltd
Priority to CN202110511892.1A
Publication of CN113239126A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The invention discloses a business activity information standardization scheme based on the BOR method and relates to the technical field of information. Its main aim is to standardize business object information so that business activities and business information are converted into data elements in a standardized, high-quality way, which facilitates the management and application of business data and provides a basis for raising the level of business informatization and achieving high-quality development. The method comprises the following steps: acquiring business activity information to be processed; determining attribute information corresponding to a business object in the business activity information; inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and determining, according to the data entity and the topic domain, a standardized processing result corresponding to the business activity information. The invention is suitable for processing business information.

Description

Business activity information standardization scheme based on BOR method
Technical Field
The invention relates to the technical field of information, in particular to a business activity information standardization scheme based on a BOR method.
Background
As user requirements continue to grow, more and more kinds of business activities become available to users, and users generate a large amount of business activity information while participating in them. Standardizing this information effectively, so that business activities and business information are converted into data elements with high quality, facilitates the management and storage of business data, raises the level of business informatization, and forms the basis of enterprise digital transformation and high-quality development. At present, however, there is no unified way of processing business activity information, so the business data of business activities cannot be defined in a standardized way, which hinders the effective management and application of that data.
Disclosure of Invention
The invention provides a business activity information standardization scheme based on a BOR method, which is mainly used for carrying out standardization processing on business object information, thereby realizing high-quality conversion from business activities and business information to data elements and being beneficial to management and application of business data.
According to a first aspect of the present invention, a business activity information standardization scheme based on a BOR method is provided, which includes:
acquiring business activity information to be processed;
determining attribute information corresponding to a business object in the business activity information;
inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity;
and determining a standardized processing result corresponding to the business activity information according to the data entity and the topic domain.
According to a second aspect of the present invention, there is provided a business activity information standardizing device based on a BOR method, including:
an acquisition unit, configured to acquire business activity information to be processed;
a first determining unit, configured to determine attribute information corresponding to a business object in the business activity information;
a first classification unit, configured to input the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
a second classification unit, configured to input the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity;
and a second determining unit, configured to determine a standardized processing result corresponding to the business activity information according to the data entity and the topic domain.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring business activity information to be processed;
determining attribute information corresponding to a business object in the business activity information;
inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity;
and determining a standardized processing result corresponding to the business activity information according to the data entity and the topic domain.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program:
acquiring business activity information to be processed;
determining attribute information corresponding to a business object in the business activity information;
inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity;
and determining a standardized processing result corresponding to the business activity information according to the data entity and the topic domain.
Compared with the prior art, in which the business data of business activities cannot be defined in a standardized way, the business activity information standardization scheme based on the BOR method provided by the invention acquires the business activity information to be processed; determines attribute information corresponding to the business object in the business activity information; inputs the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputs the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and finally determines, according to the data entity and the topic domain, a standardized processing result corresponding to the business activity information. By extracting the attribute information of the business objects in the business activity information, clustering that attribute information into data entities, and clustering the resulting data entities into topic domains, the scheme standardizes the business activity information and defines the business data of business activities in a unified way. This achieves high-quality conversion from business activities and business information to data elements and facilitates the management and application of business data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
Fig. 1 is a flowchart of a business activity information standardization scheme based on a BOR method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another business activity information standardization scheme based on a BOR method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a business activity information standardization apparatus based on a BOR method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another business activity information standardization apparatus based on a BOR method according to an embodiment of the present invention;
Fig. 5 is a physical structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
At present, there is no unified way of processing business activity information, so the business data of business activities cannot be defined in a standardized way, which hinders the effective management and application of business data.
In order to solve the above problem, an embodiment of the present invention provides a business activity information standardization scheme based on a BOR method, as shown in fig. 1, where the method includes:
101. Acquiring the business activity information to be processed.
Here, business activity information refers to the business information generated by a user while participating in a business activity. For example, in the underwriting stage in the insurance field, the business activities in which a user participates include insurance application, underwriting review, policy issuance, premium collection, and the like, and during insurance application the user generates policy information, insurance applicant information, insurance type information, and the like. To overcome the drawbacks of the prior art, in which business activity information cannot be defined in a standardized way and business data is therefore hard to manage and apply, the embodiment of the invention extracts the attribute information corresponding to the business objects from the business activity information, clusters the attribute information into data entities, and clusters the resulting data entities into topic domains. This gives the business activity information a standardized definition, achieves high-quality conversion from business activities and business information to data elements, and in turn enables effective management and efficient application of business data. The embodiment of the invention is mainly suitable for the standardized definition of business activity information. Its execution subject is an apparatus or device capable of standardizing business activity information, which may be deployed on the client side or the server side.
For the embodiment of the invention, a user generates corresponding business activity information while participating in a business activity. For example, during insurance application the user generates policy information, insurance applicant information, insurance type information, and the like. The policy information specifically includes the policy number, the insurance company, and the like; the insurance applicant information specifically includes the name, gender, ID number, bank account number, and the like; and the insurance type information specifically includes the insurance type name, insurance type category, premium, sum insured, and the like. All of this is business activity information generated by the user during insurance application. To facilitate its management and application, the business activity information needs to be defined in a standardized way, thereby achieving high-quality conversion from business activities and business information to data elements.
102. Determining attribute information corresponding to a business object in the business activity information.
For example, in the insurance field, the business activity information involves business objects that participate in the activity, such as the policy, the insurance applicant, and the insurance type, and each business object can be described by several items of attribute information; the insurance applicant, for instance, can be described by attribute information such as name, gender, and ID number.
For the embodiment of the present invention, in the process of standardizing the business activity information, the business activity information first needs to be split into attribute information corresponding to a plurality of business objects, after which data entity clustering and topic domain clustering are performed on the attribute information to obtain a standardized definition of the business data. Specifically, to split the business activity information in this way, a preset entity recognition algorithm may be used to recognize the business objects in the business activity information, for example the policy, the insurance applicant, and the insurance type in insurance business activity information. The attribute information corresponding to each business object is then extracted from the business activity information, so that the business activity information is split into attribute information corresponding to a plurality of business objects. This is the whole-to-parts stage of data processing in the standardized definition of business data.
103. Inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information.
For the embodiment of the present invention, after the business activity information is split into attribute information corresponding to the business objects, the attribute information needs to be clustered so that the standardized business data does not consist of a large amount of unordered attribute information. Specifically, the attribute information may be input into the preset data entity classification model for entity classification to obtain the data entity to which each item of attribute information belongs, and the attribute information belonging to the same data entity is then clustered, yielding the attribute information under each data entity. For example, suppose the attribute information corresponding to the business objects comprises the policy number, insurance company name, applicant name, applicant gender, insurance type name, and insurance type category. These items are input one by one into the preset data entity classification model, which determines that the policy number and insurance company name belong to the policy data entity, that the applicant name and applicant gender belong to the insurance applicant data entity, and that the insurance type name and insurance type category belong to the insurance type data entity. Clustering the attribute information that belongs to the same data entity then gives the attribute information under each data entity. Attribute information with similar business logic is thereby combined into data entities, which is the parts-to-whole stage of data processing in the standardized definition of business data.
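As a minimal sketch of this grouping step, the Python fragment below replaces the preset data entity classification model with a hypothetical lookup table (`ATTRIBUTE_TO_ENTITY`) and groups the example attributes by the data entity returned for each. The table, the function names, and the English attribute names are assumptions made for illustration; the source does not specify them.

```python
from collections import defaultdict

# Hypothetical stand-in for the preset data entity classification model:
# a fixed lookup from attribute name to data entity.
ATTRIBUTE_TO_ENTITY = {
    "policy number": "policy",
    "insurance company name": "policy",
    "applicant name": "insurance applicant",
    "applicant gender": "insurance applicant",
    "insurance type name": "insurance type",
    "insurance type category": "insurance type",
}


def classify_attribute(attribute):
    """Return the data entity of one attribute (stub for a trained model)."""
    return ATTRIBUTE_TO_ENTITY[attribute]


def cluster_by_entity(attributes):
    """Group the attributes that were classified into the same data entity."""
    clusters = defaultdict(list)
    for attribute in attributes:
        clusters[classify_attribute(attribute)].append(attribute)
    return dict(clusters)


attributes = [
    "policy number",
    "applicant name",
    "insurance type name",
    "insurance company name",
    "applicant gender",
    "insurance type category",
]
entity_clusters = cluster_by_entity(attributes)
```

In a real deployment the lookup would be replaced by the trained classifier; the grouping logic itself would stay the same.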
104. Inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity.
For the embodiment of the present invention, after the split attribute information has been clustered into data entities, the data entities themselves need to be clustered in order to summarize them further, finally forming the data entities under different topic domains. For example, suppose the clustered data entities comprise applicant basic information, applicant address information, insured basic information, insured address information, claim report information, case filing information, claim settlement information, and the like. These data entities are input one by one into the preset topic domain classification model for classification. The applicant basic information, applicant address information, insured basic information, and insured address information are information shared by the different roles a client plays in different business stages; they have a common nature and can be abstracted into a client topic domain. The claim report information, case filing information, and claim settlement information are tightly coupled in the business activity and can be combined into a claim topic domain. The data entities are thereby clustered and the topic domain corresponding to each data entity is obtained.
105. Determining a standardized processing result corresponding to the business activity information according to the data entity and the topic domain.
For the embodiment of the invention, the business activity information is split into the attribute information of the business objects, and data entity clustering and topic domain clustering are performed on the attribute information, so that the attribute information under each data entity and the data entities under each topic domain can be determined. This completes the standardized definition of the business activity information, achieves high-quality conversion from business activities and business information to data elements, and facilitates the subsequent effective management and application of business data.
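One way to picture the standardized processing result is as a three-level nesting: topic domain, then data entity, then attribute information. The sketch below assembles such a structure from two inputs that are assumed to come from the two classification steps. The function name `build_standard`, the dictionary layout, and the assignment of attributes to the entities shown are illustrative assumptions, not part of the source.

```python
def build_standard(entity_attributes, entity_domains):
    """Nest attribute information under data entities under topic domains.

    entity_attributes: data entity -> list of attribute names
    entity_domains:    data entity -> topic domain
    """
    standard = {}
    for entity, attributes in entity_attributes.items():
        domain = entity_domains[entity]
        standard.setdefault(domain, {})[entity] = list(attributes)
    return standard


# Illustrative inputs; the attribute lists are assumed for the example.
entity_attributes = {
    "applicant basic information": ["name", "gender", "ID number"],
    "policy basic information": ["policy number", "insurance company"],
}
entity_domains = {
    "applicant basic information": "client",
    "policy basic information": "policy",
}
standard = build_standard(entity_attributes, entity_domains)
```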
Compared with the prior art, in which business activity information cannot be defined in a standardized way, the business activity information standardization scheme based on the BOR method provided by the embodiment of the invention acquires the business activity information to be processed; determines attribute information corresponding to the business object in the business activity information; inputs the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputs the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and finally determines, according to the data entity and the topic domain, a standardized processing result corresponding to the business activity information. By extracting the attribute information of the business objects in the business activity information, clustering that attribute information into data entities, and clustering the resulting data entities into topic domains, the scheme standardizes the business activity information and defines it in a unified way. This achieves high-quality conversion from business activities and business information to data elements and facilitates the effective management and efficient application of business data.
Further, to better describe the standardization process for business activity information, and as a refinement and extension of the foregoing embodiment, an embodiment of the present invention provides another business activity information standardization scheme based on a BOR method, as shown in fig. 2, where the method includes:
201. Acquiring the business activity information to be processed.
For the embodiment of the invention, business activities can be divided into different levels. For example, first-level insurance business activities include sales, underwriting, policy servicing, claim settlement, service, and the like, and the underwriting stage can be further divided into second-level or even third-level business activities such as insurance application, underwriting review, policy issuance, premium collection, and the like. The acquired business activity information is accordingly the business information generated when the user participates in business activities at these different levels. It should be noted that the business activity information in the embodiment of the present invention may specifically be insurance business activity information generated by the user while participating in insurance business activities, or business activity information generated by the user while participating in business activities in other fields, which is not specifically limited in the embodiment of the present invention.
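The levels of business activity described here form a tree. A minimal sketch, assuming a nested-dictionary representation and English activity names chosen for this example, shows how each activity's level can be derived from its depth:

```python
# Hypothetical activity tree; keys are activity names, values are sub-activities.
ACTIVITY_TREE = {
    "sales": {},
    "underwriting": {
        "insurance application": {},
        "underwriting review": {},
        "policy issuance": {},
        "premium collection": {},
    },
    "policy servicing": {},
    "claim settlement": {},
    "service": {},
}


def activity_levels(tree, level=1):
    """Yield (activity name, level) pairs in depth-first order."""
    for name, children in tree.items():
        yield name, level
        yield from activity_levels(children, level + 1)


levels = dict(activity_levels(ACTIVITY_TREE))
second_level = [name for name, level in levels.items() if level == 2]
```

Third-level activities would simply be further nested dictionaries under a second-level key.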
202. Identifying the business object in the business activity information by using a preset entity recognition algorithm.
For the embodiment of the present invention, to give the business activity information a standardized definition, the business object entities contained in the business activity information must first be identified. As an optional implementation, step 202 specifically includes: performing word segmentation on the business activity information to obtain the segmented words corresponding to the business activity information; and inputting these segmented words into a preset entity recognition model for entity recognition to determine the business objects contained in the business activity information.
Specifically, a preset natural language model, which may be a BERT natural language model, is used to perform word segmentation on the business activity information to obtain its segmented words. The segmented words are then input into a preset entity recognition model, which may specifically be an LSTM network, for entity recognition to determine the business object entities contained in the business activity information; the business activity information contains at least one business object. Specifically, each segmented word is input into the LSTM network to obtain the probability that it belongs to each entity category, the entity category of each segmented word is determined from these probability values, and the target business object entities are screened out. For example, business objects participating in insurance activities, such as the policy, the insurance applicant, and the insurance type, are screened out of the insurance business activity information. Business objects can thus be identified from the business activity information in this way, so that the attribute information corresponding to them can then be determined. It should be noted that, in a specific application scenario, the preset natural language model may be deployed at the client and the LSTM network at the server, which can reduce the data volume of a query.
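The screening step at the end of this procedure can be sketched without the models themselves. The fragment below assumes the entity recognition model has already produced, for each segmented word, a probability per entity category; it then assigns each word its most probable category and keeps the words classified as business objects. The category labels, the probabilities, and the function name are hypothetical.

```python
def recognize_business_objects(token_probabilities, object_category="business_object"):
    """Keep the words whose most probable category is the business-object category."""
    objects = []
    for token, probabilities in token_probabilities:
        best_category = max(probabilities, key=probabilities.get)
        if best_category == object_category and token not in objects:
            objects.append(token)
    return objects


# Hypothetical per-word category probabilities, standing in for the output
# of the preset entity recognition model (an LSTM network in the source).
token_probabilities = [
    ("applicant", {"business_object": 0.8, "attribute": 0.1, "other": 0.1}),
    ("name", {"business_object": 0.2, "attribute": 0.7, "other": 0.1}),
    ("policy", {"business_object": 0.9, "attribute": 0.05, "other": 0.05}),
    ("submitted", {"business_object": 0.1, "attribute": 0.1, "other": 0.8}),
]
business_objects = recognize_business_objects(token_probabilities)
```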
203. Extracting attribute information from the business activity information by using a preset business object attribute lexicon to obtain the attribute information corresponding to the business object.
The preset business object attribute lexicon records the attribute words of different business objects; for example, it records the attribute words "policy number" and "insurance company name" for the policy. For the embodiment of the invention, after word segmentation has been performed on the business activity information and the business objects it contains have been determined, the preset business object attribute lexicon is queried by business object to determine the attribute fields corresponding to that business object. The business activity information is then searched for each attribute field, and when an attribute field is present, the attribute information corresponding to it is extracted. The business activity information is thereby split into the attribute information corresponding to the business objects, which is the whole-to-parts stage of data processing in the standardized definition of business data.
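A minimal sketch of this lookup-and-extract step follows. It assumes the business activity information is available as a flat mapping from field name to value and that the lexicon maps each business object to its attribute fields; both representations, along with all sample values, are assumptions made for the example.

```python
# Hypothetical attribute lexicon: business object -> attribute fields.
ATTRIBUTE_LEXICON = {
    "policy": ["policy number", "insurance company name"],
    "insurance applicant": ["name", "gender", "ID number"],
}


def extract_attributes(activity_info, business_objects, lexicon):
    """For each business object, extract the lexicon fields present in the record."""
    extracted = {}
    for business_object in business_objects:
        fields = lexicon.get(business_object, [])
        extracted[business_object] = {
            field: activity_info[field] for field in fields if field in activity_info
        }
    return extracted


# Made-up business activity record; "channel" is not in the lexicon and
# "ID number" is absent from the record, so neither appears in the result.
activity_info = {
    "policy number": "P-0001",
    "insurance company name": "Example Insurance Co.",
    "name": "Zhang San",
    "gender": "F",
    "channel": "online",
}
extracted = extract_attributes(
    activity_info, ["policy", "insurance applicant"], ATTRIBUTE_LEXICON
)
```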
204. Inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information.
For the embodiment of the present invention, in order to determine the data entity category to which the attribute information of the business object belongs, step 204 specifically includes: inputting the attribute information into a preset decision tree data entity classification model to obtain first probability values of the attribute information belonging to different data entities; and selecting the first maximum probability value among the first probability values, and determining the data entity corresponding to the first maximum probability value as the data entity corresponding to the attribute information.
For example, the attribute information "insurance company name" is input into the preset decision tree data entity classification model for entity classification, and the probabilities that it belongs to the policy, insurance applicant, and insurance type data entities are 0.5, 0.3, and 0.2 respectively, so it can be determined that "insurance company name" belongs to the policy data entity. After the data entity of each item of attribute information is determined, the attribute information belonging to the same data entity is clustered to obtain the attribute information under each data entity, so that attribute information with similar business logic is combined. This is the parts-to-whole stage of data processing in the standardized definition of business data.
It should be noted that, to construct the preset decision tree data entity classification model, historical business data is collected and labeled with the data entity category to which it belongs, the labeled business data is used as a training set, and the model is trained on this training set.
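The selection rule in step 204 amounts to taking the data entity with the largest first probability value. The sketch below uses the probabilities from the example above; `stub_entity_model` is a hypothetical placeholder for the trained decision tree and simply returns those fixed values.

```python
def stub_entity_model(attribute):
    """Hypothetical stand-in for the trained decision tree model.

    Returns the first probability values from the worked example,
    regardless of the attribute passed in.
    """
    return {"policy": 0.5, "insurance applicant": 0.3, "insurance type": 0.2}


def classify_entity(attribute, model):
    """Return the data entity with the maximum probability, and that probability."""
    probabilities = model(attribute)
    entity = max(probabilities, key=probabilities.get)
    return entity, probabilities[entity]


entity, first_max_probability = classify_entity(
    "insurance company name", stub_entity_model
)
```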
205. And inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity.
For the embodiment of the present invention, in order to determine the subject domain to which the data entity belongs, step 205 specifically includes: inputting the data entity into a preset random forest topic domain classification model for topic domain classification to obtain a second probability value of the data entity belonging to different topic domains; and screening a second maximum probability value in the second probability values, and determining a topic domain corresponding to the second maximum probability value as a topic domain corresponding to the data entity.
For example, the data entity "policy basic information" is input into the preset random forest topic domain classification model for topic domain classification, and the probability values of the policy basic information belonging to the customer, policy and claim topic domains are 0.3, 0.5 and 0.2, respectively, so that the data entity "policy basic information" can be determined to belong to the policy topic domain. After the topic domain of each data entity is determined, the data entities belonging to the same topic domain are clustered to obtain the data entities under the different topic domains, so that the data entities can be further summarized.
It should be noted that, in the process of constructing the preset random forest topic domain classification model, historical business data is collected, historical business data under different data entities is determined, then the data entities are labeled according to topic domains corresponding to the different data entities, the labeled data entities are used as a training set, and the preset random forest topic domain classification model is constructed based on the training set.
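One way the second probability values could arise from a random forest is as the share of trees voting for each topic domain. The sketch below assumes ten hypothetical trees whose votes reproduce the 0.3, 0.5 and 0.2 figures of the example; the function names are illustrative, and some libraries average per-tree probability estimates instead of counting hard votes.

```python
from collections import Counter


def forest_probabilities(tree_votes):
    """Convert per-tree class votes into class probabilities (share of trees)."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: count / total for label, count in counts.items()}


def pick_topic_domain(tree_votes):
    """Return the topic domain with the second maximum probability value, plus all values."""
    probabilities = forest_probabilities(tree_votes)
    return max(probabilities, key=probabilities.get), probabilities


# Assumed votes of ten trees for the data entity "policy basic information".
votes = ["customer"] * 3 + ["policy"] * 5 + ["claim"] * 2
domain, probs = pick_topic_domain(votes)
print(domain, probs)
```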
206. And determining a standardized processing result corresponding to the business activity information according to the data entity and the subject domain.
For the embodiment of the present invention, in order to collect data according to the defined data standard, after step 206, the method further includes: respectively adjusting the data entity range and the subject domain range according to the data acquisition scene; and acquiring the business data based on the adjusted data entity range and the adjusted subject domain range. Specifically, after the data is defined, the range of attribute information within a data entity and the range of data entities within a subject domain can be adjusted for the specific scene on the basis of the defined data, and data acquisition is performed based on the adjusted ranges.
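The range adjustment described above can be illustrated with nested dictionaries. The layout (subject domain, then data entity, then attribute list), the function names `adjust_scope` and `collect`, and all sample values are assumptions made for this sketch only.

```python
def adjust_scope(standard_scope, scenario_scope):
    """Keep only the domains, entities and attributes requested by the scenario."""
    adjusted = {}
    for domain, entities in standard_scope.items():
        wanted_entities = scenario_scope.get(domain, {})
        kept = {}
        for entity, attributes in entities.items():
            if entity in wanted_entities:
                wanted = wanted_entities[entity]
                kept[entity] = [a for a in attributes if a in wanted]
        if kept:
            adjusted[domain] = kept
    return adjusted


def collect(record, scope):
    """Collect only the fields of a raw record that fall inside the adjusted scope."""
    allowed = {
        attribute
        for entities in scope.values()
        for attributes in entities.values()
        for attribute in attributes
    }
    return {key: value for key, value in record.items() if key in allowed}


# Defined data standard (hypothetical) and a scenario that needs only part of it.
standard = {
    "policy": {"policy basic information": ["policy number", "insurance company name", "premium"]},
    "customer": {"applicant": ["applicant name", "applicant age"]},
}
scenario = {"policy": {"policy basic information": {"policy number", "premium"}}}

adjusted = adjust_scope(standard, scenario)
collected = collect(
    {"policy number": "P001", "premium": 1200, "applicant name": "Zhang San"}, adjusted
)
print(adjusted, collected)
```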
In a specific application scenario, in order to ensure that the acquired data meets the corresponding requirements, a preset check rule needs to be used to check the service data. Based on this, before acquiring the service data based on the adjusted data entity range and the adjusted subject domain range, the method further includes: judging whether the file identifier corresponding to the service data meets a preset file identification rule; if the preset file identification rule is met, judging whether the service data meets a preset data check rule; and if the preset data check rule is met, acquiring the service data based on the adjusted data entity range and the adjusted subject domain range. The file identifier may specifically be a file name, or a hash value of the file such as an MD5 value. The naming rule for files bearing the service data can be used to determine whether the file name meets the corresponding requirements; if so, the preset data check rule is further used to check the service data in the file; if the check passes, the service data is collected, and the data that passes the check is uploaded to the server so that it can be queried there.
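The two-stage check before collection (file identifier first, then data check) might look like the following sketch. The file naming pattern, the required-field rule and the function names are all hypothetical; the source does not specify any of them.

```python
import hashlib
import re

# Hypothetical naming rule: <4-letter company code>_<topic domain>_<YYYYMMDD>.csv
FILE_NAME_RULE = re.compile(r"[A-Z]{4}_[a-z]+_\d{8}\.csv")


def file_identifier_ok(file_name, content=b"", expected_md5=None):
    """Check the file name and, if a digest is supplied, the MD5 value of the content."""
    if FILE_NAME_RULE.fullmatch(file_name) is None:
        return False
    if expected_md5 is not None:
        return hashlib.md5(content).hexdigest() == expected_md5
    return True


def data_check_ok(rows, required_fields):
    """Assumed data check rule: every required field is present and non-empty."""
    return all(row.get(field) not in (None, "") for row in rows for field in required_fields)


def gate(file_name, rows, required_fields):
    """Apply the file identifier rule first, then the data check rule."""
    if not file_identifier_ok(file_name):
        return "rejected: file identifier"
    if not data_check_ok(rows, required_fields):
        return "rejected: data check"
    return "collect"


print(gate("ABCD_policy_20210511.csv", [{"policy number": "P001"}], ["policy number"]))
```

Only data for which `gate` returns "collect" would proceed to collection and upload.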
Further, after the relevant service departments complete the collection of the service data, quality detection needs to be performed on the collected service data to further ensure that it has no quality problems. Specifically, corresponding data detection rules can be set according to data usage requirements, service types and data standards, and the collected service data is quality-checked using these rules: if the service data meets the preset data detection rules, its quality has no problem; if it does not, its quality has a problem. Based on this, the method further includes: carrying out quality detection on the acquired service data by using a preset data detection rule; if a quality problem exists in the service data, generating a quality problem report corresponding to the service data based on the quality problem type corresponding to the service data, and sending the quality problem report to a related service personnel terminal so that the service data can be corrected based on the quality problem report; and receiving the corrected service data, and performing quality detection on the corrected service data by using the preset data detection rule again.
The collected service data can be comprehensively quality-checked according to the data usage requirement, the service type and the data standard. For example, a data format detection rule is used to check the format of the collected service data; a sensitive information detection rule is used to check its sensitivity, so as to prevent the collected service data from containing user privacy and causing a privacy leak; or a data logic detection rule is used to check whether logic errors exist in the collected service data. For instance, if the logic detection rule detects a conflict between the age of the user and the guarantee period, the collected service data is not logical. After the data detection rules are set, a corresponding script writing tool is selected according to the service requirement, for example the Notepad++ tool, to write the corresponding quality detection script. When a quality detection instruction for the service data is received, the corresponding quality detection script is called to perform data quality detection. If the collected service data does not meet the preset data detection rules, the problem type of the service data is determined according to the rule that the service data failed, and a report template corresponding to the problem type is used to generate the quality problem report corresponding to the service data, which is sent to the service personnel. The service personnel analyze the cause of the problem according to the quality problem report, formulate a corresponding rectification scheme, and either correct the collected service data based on the rectification scheme or re-collect the service data according to it. The corrected service data is then sent to the data detection platform again, the data detection platform re-checks the corrected service data using the preset data detection rules, and the process is repeated until the service data meets the preset data detection rules. In this way, a closed-loop management mode of rule making, problem finding, cause analysis, and tracking and rectification is formed, so that data quality can be continuously optimized.
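The closed loop of detection, reporting, correction and re-detection can be sketched as below. The three rules, the field names, the age limit logic and the `fix` callback (which stands in for the manual rectification by service personnel) are assumptions chosen only to make the loop runnable.

```python
import re


def check_format(row):
    """Assumed format rule: policy numbers look like 'P' followed by three digits."""
    return re.fullmatch(r"P\d{3}", str(row.get("policy number", ""))) is not None


def check_sensitive(row):
    """Assumed sensitivity rule: the record must not carry an ID card number."""
    return "id card number" not in row


def check_logic(row):
    """Assumed logic rule: age plus guarantee period must not exceed the covered age limit."""
    return row["applicant age"] + row["coverage years"] <= row["max covered age"]


RULES = [("format", check_format), ("sensitive information", check_sensitive), ("logic", check_logic)]


def detect(row):
    """Return the problem types (names of the rules the record fails)."""
    return [name for name, rule in RULES if not rule(row)]


def quality_report(row):
    """Build a minimal quality problem report for one record."""
    problems = detect(row)
    return {"passed": not problems, "problem types": problems}


def closed_loop(row, correct, max_rounds=5):
    """Detect, report, correct and re-detect until the record passes."""
    for round_no in range(1, max_rounds + 1):
        report = quality_report(row)
        if report["passed"]:
            return row, round_no
        row = correct(row, report)
    raise RuntimeError("record still fails after the allowed number of rounds")


def fix(row, report):
    """Stand-in for the rectification that service personnel would perform."""
    fixed = dict(row)
    if "sensitive information" in report["problem types"]:
        fixed.pop("id card number", None)
    if "logic" in report["problem types"]:
        fixed["coverage years"] = fixed["max covered age"] - fixed["applicant age"]
    return fixed


raw = {"policy number": "P001", "id card number": "X", "applicant age": 60,
       "coverage years": 30, "max covered age": 80}
clean, rounds = closed_loop(raw, fix)
print(detect(raw), clean, rounds)
```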
Compared with the prior art, in which business activity information cannot be standardized and defined, the business activity information standardization scheme based on the BOR method provided by the embodiment of the invention acquires the business activity information to be processed; determines attribute information corresponding to the service object in the business activity information; inputs the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputs the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and finally determines a standardized processing result corresponding to the business activity information according to the data entity and the subject domain. Therefore, by extracting the attribute information corresponding to the business object in the business activity information, clustering the attribute information into data entities, and clustering the obtained data entities into subject domains, standardized processing of the business activity information can be realized, and high-quality conversion from business activities and business information to data elements is achieved, which facilitates the management and application of the service data. Further, extracting the attribute information corresponding to the service object by using the preset entity recognition algorithm improves the extraction efficiency of the attribute information and reduces labor cost. In addition, by means of the preset subject domain classification model and the preset data entity classification model, automatic classification of business data is achieved, which avoids inaccurate classification results caused by excessive manual intervention.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a business activity information standardization apparatus based on a BOR method, and as shown in fig. 3, the apparatus includes: an acquisition unit 31, a first determination unit 32, a first classification unit 33, a second classification unit 34, and a second determination unit 35.
The acquisition unit 31 may be configured to obtain to-be-processed service activity information.
The first determining unit 32 may be configured to determine attribute information corresponding to a service object in the service activity information.
The first classification unit 33 may be configured to input the attribute information to a preset data entity classification model for entity classification, so as to obtain a data entity corresponding to the attribute information.
The second classification unit 34 may be configured to input the data entity to a preset topic domain classification model for topic domain classification, so as to obtain a topic domain corresponding to the data entity.
The second determining unit 35 may be configured to determine a standardized processing result corresponding to the service activity information according to the data entity and the subject domain.
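The cooperation of the five units can be pictured as a small pipeline in which each unit is a replaceable callable. The class name `StandardizationPipeline`, the lookup-table stand-ins for the two classification models and the sample text are hypothetical and only show how the outputs of one unit feed the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StandardizationPipeline:
    determine_attributes: Callable[[str], List[str]]   # role of the first determining unit
    classify_entity: Callable[[str], str]              # role of the first classification unit
    classify_topic_domain: Callable[[str], str]        # role of the second classification unit

    def run(self, activity_info: str) -> Dict[str, Dict[str, List[str]]]:
        """Return topic domain -> data entity -> attributes (the standardized result)."""
        result: Dict[str, Dict[str, List[str]]] = {}
        for attribute in self.determine_attributes(activity_info):
            entity = self.classify_entity(attribute)
            domain = self.classify_topic_domain(entity)
            result.setdefault(domain, {}).setdefault(entity, []).append(attribute)
        return result


# Lookup tables standing in for the preset models (assumed values).
ENTITY_OF = {"policy number": "policy basic information", "applicant name": "applicant"}
DOMAIN_OF = {"policy basic information": "policy", "applicant": "customer"}

pipeline = StandardizationPipeline(
    determine_attributes=lambda text: [a for a in ENTITY_OF if a in text],
    classify_entity=ENTITY_OF.get,
    classify_topic_domain=DOMAIN_OF.get,
)
standardized = pipeline.run("underwriting records the policy number and applicant name")
print(standardized)
```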
For the embodiment of the present invention, as shown in fig. 4, in order to split the service activity information into the attribute information corresponding to the service object, the first determining unit 32 includes: an identification module 321 and an extraction module 322.
The identifying module 321 may be configured to identify the service object in the service activity information by using a preset entity identification algorithm.
The extracting module 322 may be configured to extract the attribute information in the service activity information by using a preset service object attribute thesaurus, so as to obtain attribute information corresponding to the service object.
Further, in order to identify the business object in the business activity information, the identifying module 321 includes: a word segmentation sub-module and a recognition sub-module.
The word segmentation sub-module may be configured to perform word segmentation processing on the service activity information to obtain each word segmentation corresponding to the service activity information.
The recognition submodule may be configured to input each word segmentation corresponding to the service activity information to a preset entity recognition model for entity recognition, and determine a service object included in the service activity information.
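The word segmentation, entity recognition and attribute extraction performed by these modules can be approximated as follows. A dictionary lookup stands in for the preset entity recognition model, a regular expression stands in for word segmentation (Chinese text would need a dedicated segmenter), and the object set and attribute lexicon are invented for illustration.

```python
import re

# Hypothetical business objects and service object attribute lexicon.
BUSINESS_OBJECTS = {"policy", "applicant", "claim"}
ATTRIBUTE_LEXICON = {
    "policy": ["policy number", "premium"],
    "applicant": ["applicant name", "applicant age"],
}


def segment(text):
    """Naive word segmentation: lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize_objects(tokens):
    """Dictionary lookup standing in for the entity recognition model."""
    seen = []
    for token in tokens:
        if token in BUSINESS_OBJECTS and token not in seen:
            seen.append(token)
    return seen


def extract_attributes(text, objects):
    """Keep the lexicon attributes of each recognized object that occur in the text."""
    lowered = text.lower()
    return {
        obj: [attr for attr in ATTRIBUTE_LEXICON.get(obj, []) if attr in lowered]
        for obj in objects
    }


text = ("The applicant submits the applicant name and applicant age; "
        "the policy records the policy number.")
objects = recognize_objects(segment(text))
attributes = extract_attributes(text, objects)
print(objects, attributes)
```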
Further, the preset data entity classification model is a preset decision tree data entity classification model, and the first classification unit 33 includes: an entity classification module 331 and a determination module 332.
The entity classification module 331 is configured to input the attribute information to a preset decision tree data entity classification model for entity classification, so as to obtain a first probability value that the attribute information belongs to different data entities.
The determining module 332 may be configured to filter a first maximum probability value of the first probability values, and determine a data entity corresponding to the first maximum probability value as a data entity corresponding to the attribute information.
Further, the preset topic domain classification model is a preset random forest topic domain classification model, and the second classification unit 34 includes: a topic domain classification module 341 and a determination module 342.
The topic domain classification module 341 may be configured to input the data entity to a preset random forest topic domain classification model for topic domain classification, so as to obtain a second probability value that the data entity belongs to different topic domains.
The determining module 342 may be configured to filter a second maximum probability value of the second probability values, and determine a topic domain corresponding to the second maximum probability value as a topic domain corresponding to the data entity.
Further, for collecting the service data, the apparatus further comprises an adjusting unit 36 and a collecting unit 37.
The adjusting unit 36 may be configured to adjust the data entity range and the subject domain range according to the data acquisition scenario.
The collecting unit 37 may be configured to collect the service data based on the adjusted data entity range and the adjusted subject domain range.
Further, the apparatus further comprises a determining unit 38, which may be configured to determine whether a file identifier corresponding to the service data meets a preset file identification rule.
The determining unit 38 may be further configured to determine whether the service data meets a preset data checking rule if the preset file identification rule is met.
The collecting unit 37 may be specifically configured to collect the service data based on the adjusted data entity range and the adjusted subject domain range if the preset data checking rule is satisfied.
It should be noted that other corresponding descriptions of the functional modules related to the business activity information standardization apparatus based on the BOR method provided in the embodiment of the present invention may refer to the corresponding descriptions of the method shown in fig. 1, and are not described herein again.
Based on the method shown in fig. 1, correspondingly, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps: acquiring service activity information to be processed; determining attribute information corresponding to a service object in the service activity information; inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputting the data entity into a preset theme domain classification model for theme domain classification to obtain a theme domain corresponding to the data entity; and determining a standardized processing result corresponding to the business activity information according to the data entity and the subject domain.
Based on the above embodiments of the method shown in fig. 1 and the apparatus shown in fig. 3, an embodiment of the present invention further provides a physical structure diagram of a computer device, as shown in fig. 5, where the computer device includes: a processor 41, a memory 42 and a computer program stored on the memory 42 and executable on the processor 41, wherein the memory 42 and the processor 41 are both arranged on a bus 43, and the processor 41 performs the following steps when executing the program: acquiring service activity information to be processed; determining attribute information corresponding to a service object in the service activity information; inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputting the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and determining a standardized processing result corresponding to the business activity information according to the data entity and the subject domain.
With this technical scheme, the invention acquires the service activity information to be processed; determines attribute information corresponding to the service object in the service activity information; inputs the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information; inputs the data entity into a preset topic domain classification model for topic domain classification to obtain a topic domain corresponding to the data entity; and finally determines a standardized processing result corresponding to the business activity information according to the data entity and the topic domain. Therefore, by extracting the attribute information corresponding to the business object in the business activity information, clustering the attribute information into data entities and clustering the obtained data entities into topic domains, standardized processing of the business activity information can be realized, and high-quality conversion from business activities and business information to data elements is achieved, which benefits the management and application of the business data.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented using a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of multiple computing devices; and they may alternatively be implemented in program code executable by a computing device, such that they are stored in a memory device and executed by a computing device. In some cases the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be separately fabricated into individual integrated circuit modules, or several of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A business activity information standardization scheme based on BOR method is characterized by comprising:
acquiring service activity information to be processed;
determining attribute information corresponding to a service object in the service activity information;
inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
inputting the data entity into a preset theme domain classification model for theme domain classification to obtain a theme domain corresponding to the data entity;
and determining a standardized processing result corresponding to the business activity information according to the data entity and the subject domain.
2. The method of claim 1, wherein the determining attribute information corresponding to the business object in the business activity information comprises:
identifying the business object in the business activity information by using a preset entity identification algorithm;
and extracting attribute information in the service activity information by using a preset service object attribute word bank to obtain attribute information corresponding to the service object.
3. The method according to claim 2, wherein the identifying the business object in the business activity information by using a preset entity identification algorithm comprises:
performing word segmentation processing on the service activity information to obtain each word segmentation corresponding to the service activity information;
and inputting each word segmentation corresponding to the business activity information into a preset entity recognition model for entity recognition, and determining a business object contained in the business activity information.
4. The method according to claim 1, wherein the preset data entity classification model is a preset decision tree data entity classification model, and the inputting the attribute information into the preset data entity classification model for entity classification to obtain the data entity corresponding to the attribute information comprises:
inputting the attribute information into a preset decision tree data entity classification model for entity classification to obtain a first probability value that the attribute information belongs to different data entities;
and screening a first maximum probability value in the first probability values, and determining a data entity corresponding to the first maximum probability value as a data entity corresponding to the attribute information.
5. The method as claimed in claim 1, wherein the preset topic domain classification model is a preset random forest topic domain classification model, and the step of inputting the data entity into the preset topic domain classification model for topic domain classification to obtain the topic domain corresponding to the data entity comprises:
inputting the data entity into a preset random forest topic domain classification model for topic domain classification to obtain a second probability value of the data entity belonging to different topic domains;
and screening a second maximum probability value in the second probability values, and determining a topic domain corresponding to the second maximum probability value as a topic domain corresponding to the data entity.
6. The method according to claim 1, wherein after determining the standardized processing result corresponding to the business activity information according to the data entity and the subject domain, the method further comprises:
respectively adjusting the data entity range and the subject domain range according to the data acquisition scene;
and acquiring the business data based on the adjusted data entity range and the adjusted subject domain range.
7. The method of claim 6, wherein after the collecting the business data based on the adjusted data entity scope and subject domain scope, the method further comprises:
carrying out quality detection on the acquired service data by using a preset data detection rule;
if the quality problem exists in the service data, generating a quality problem report corresponding to the service data based on the quality problem type corresponding to the service data, and sending the quality problem report to related service personnel so as to correct the service data based on the quality problem report;
receiving the corrected service data, and performing quality detection on the corrected service data by using a preset data detection rule again;
the quality detection of the collected service data by using the preset data detection rule includes:
selecting a corresponding script compiling tool according to the service requirement;
compiling a detection script corresponding to the preset data detection rule by using the script compiling tool;
and responding to the quality detection instruction of the service data, and calling the detection script to carry out quality detection on the service data.
8. A business activity information standardization device based on BOR method is characterized in that the business activity information standardization device comprises:
the acquisition unit is used for acquiring the service activity information to be processed;
a first determining unit, configured to determine attribute information corresponding to a service object in the service activity information;
the first classification unit is used for inputting the attribute information into a preset data entity classification model for entity classification to obtain a data entity corresponding to the attribute information;
the second classification unit is used for inputting the data entity into a preset topic domain classification model to perform topic domain classification, so as to obtain a topic domain corresponding to the data entity;
and the second determining unit is used for determining a standardized processing result corresponding to the business activity information according to the data entity and the subject domain.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110511892.1A 2021-05-11 2021-05-11 Business activity information standardization scheme based on BOR method Pending CN113239126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110511892.1A CN113239126A (en) 2021-05-11 2021-05-11 Business activity information standardization scheme based on BOR method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110511892.1A CN113239126A (en) 2021-05-11 2021-05-11 Business activity information standardization scheme based on BOR method

Publications (1)

Publication Number Publication Date
CN113239126A true CN113239126A (en) 2021-08-10

Family

ID=77133381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110511892.1A Pending CN113239126A (en) 2021-05-11 2021-05-11 Business activity information standardization scheme based on BOR method

Country Status (1)

Country Link
CN (1) CN113239126A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626447A (en) * 2021-10-12 2021-11-09 民航成都信息技术有限公司 Civil aviation data management platform and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination