WO2021232595A1 - Enterprise state supervision method, apparatus, and device, and computer readable storage medium - Google Patents

Enterprise state supervision method, apparatus, and device, and computer readable storage medium

Info

Publication number
WO2021232595A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
state
enterprise
score
model
Prior art date
Application number
PCT/CN2020/106230
Other languages
French (fr)
Chinese (zh)
Inventor
刘春�
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021232595A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants

Definitions

  • This application relates to the field of data processing technology, and in particular to a method, device, equipment and computer-readable storage medium for monitoring the state of an enterprise.
  • The main purpose of this application is to provide a method, device, equipment, and computer-readable storage medium for monitoring the state of an enterprise, aiming to solve the technical problem that, in the prior art, monitoring of the operating state of catering enterprises is untimely and inefficient.
  • an embodiment of the present application provides a method for monitoring the state of an enterprise, and the method for monitoring the state of an enterprise includes the following steps:
  • this application also provides an enterprise state monitoring device, the enterprise state monitoring device includes:
  • a generating module, used for transmitting preset sample data to an initial model and training the initial model based on a federated learning algorithm to generate a state data model;
  • an obtaining module, used to obtain structured data corresponding to the enterprise state of the enterprise to be checked, and to transmit the structured data to the state data model to generate a data score of the structured data;
  • a supervision module, used to supervise whether the enterprise state of the enterprise to be checked is valid according to the data score.
  • the present application also provides an enterprise state monitoring device, the enterprise state monitoring device including a memory, a processor, and an enterprise state monitoring program stored on the memory and running on the processor; when the enterprise state monitoring program is executed by the processor, the following steps are implemented:
  • the present application also provides a computer-readable storage medium with an enterprise state monitoring program stored on the computer-readable storage medium, and when the enterprise state monitoring program is executed by a processor, the following steps are implemented:
  • This application provides a method, device, equipment, and computer-readable storage medium for monitoring the state of an enterprise.
  • The preset sample data is first transmitted to the initial model, and the initial model is trained based on the federated learning algorithm to generate the state data model. Structured data corresponding to the enterprise state of the enterprise to be checked is then obtained and transmitted to the state data model to generate the data score of the structured data. Finally, according to the data score, it is supervised whether the enterprise state of the enterprise to be checked is valid.
  • The preset sample data consists of the various types of data that characterize the state of each enterprise, and is real and valid data of each enterprise.
  • Training with the federated learning algorithm combines the preset sample data of a large number of enterprises, which enlarges the training sample and makes the generated state data model more accurate.
  • The state data model is therefore used to supervise the validity of the state of the enterprise to be checked. This draws on the various real data of that enterprise to reflect its state, avoids relying on inspection of the industrial and commercial registration for supervision, and ensures that the supervised enterprise state is authentic. Supervision is thus valid and accurate as well as timely and efficient.
  • FIG. 1 is a schematic diagram of the structure of an enterprise state monitoring device in a hardware operating environment involved in a solution according to an embodiment of the application;
  • FIG. 2 is a schematic flowchart of the first embodiment of the enterprise state monitoring method of this application;
  • FIG. 3 is a schematic diagram of functional modules of a preferred embodiment of the enterprise state monitoring device of this application.
  • FIG. 1 is a schematic diagram of the structure of an enterprise state monitoring device in a hardware operating environment involved in the solution of an embodiment of the present application.
  • the enterprise state monitoring device in the embodiment of the present application may be a PC, or a portable terminal device such as a tablet computer and a portable computer.
  • the enterprise state monitoring device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the structure of the enterprise state monitoring device shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a detection program.
  • the network interface 1004 is mainly used to connect to the back-end server and communicate with the back-end server;
  • the user interface 1003 is mainly used to connect to the client (user side) to communicate with the client;
  • the processor 1001 can be used to call the detection program stored in the memory 1005 and perform the following operations:
  • the step of obtaining the structured data corresponding to the state of the enterprise in the enterprise to be checked includes:
  • the step of separately extracting state keywords from the multiple types of state text data includes:
  • noise words that are irrelevant to the enterprise state are eliminated from the multiple types of word segments to be recognized, to obtain the state keywords in the multiple types of state text data.
  • the step of transmitting the structured data to the state data model to generate the data score of the structured data includes:
  • the data score of the structured data is generated according to the sub-scores of the various types of sub-data.
  • the step of supervising whether the enterprise state of the enterprise to be checked is valid according to the data score includes:
  • the processor 1001 may be used to call the detection program stored in the memory 1005 and perform the following operations:
  • the preset sample data is updated, and the state data model is optimized for training based on the updated preset sample data.
  • the step of transmitting preset sample data to the initial model, and training the initial model based on a federated learning algorithm, and generating a state data model includes:
  • the preset sample data is transmitted to the initial model, the initial model is trained, and the model gradient is generated;
  • the model gradient is transmitted to the coordinating party of the federated learning algorithm, so that the coordinating party aggregates it with at least one other model gradient generated under the federated learning algorithm to produce a return gradient;
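The aggregation performed by the coordinating party can be sketched as follows. The source only says that the coordinator aggregates the gradients; element-wise averaging and the function name `aggregate_gradients` are assumptions made for illustration, not the method the application prescribes.

```python
from typing import List, Sequence


def aggregate_gradients(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Average the gradients reported by the federated parties, element by element.

    Averaging is an assumed aggregation rule; the source does not name one.
    """
    if len(gradients) < 2:
        raise ValueError("expected the local gradient plus at least one other gradient")
    size = len(gradients[0])
    if any(len(g) != size for g in gradients):
        raise ValueError("all gradients must have the same length")
    # zip(*gradients) groups the i-th component of every party's gradient.
    return [sum(column) / len(gradients) for column in zip(*gradients)]


# Two regions report gradients for the same two model parameters.
return_gradient = aggregate_gradients([[1.0, 2.0], [3.0, 4.0]])
```

Only gradients cross the boundary between parties in this sketch; the preset sample data itself never leaves the region that owns it.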
  • the first embodiment of the present application provides a schematic flowchart of a method for monitoring the state of an enterprise.
  • the enterprise state monitoring method includes the following steps:
  • Step S10 transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
  • the method for monitoring the state of an enterprise in this embodiment is applied to a monitoring server, and is suitable for monitoring the state of an enterprise through the monitoring server.
  • the state of an enterprise is the state of operation of the enterprise.
  • a large amount of data in various aspects such as procurement, arrears, information disclosure, industrial and commercial registration, supervision, sales, training, etc. is used to determine whether the enterprise is in an operating state and then supervise it.
  • the enterprise may be various types of enterprises such as catering, clothing, travel, finance, construction, etc.
  • This embodiment preferably takes a catering enterprise as an example for description.
  • The initial model is trained on a large amount of data, covering various aspects, from catering enterprises whose enterprise state has already been determined; the resulting state data model for enterprise supervision is deployed to the supervision server.
  • Data indicators that characterize the operating condition of catering enterprises are set in advance, such as procurement, industrial and commercial registration, sales, training, and bright kitchens. A large amount of data for these indicators is then obtained from catering enterprises whose enterprise state has been determined, that is, the procurement data, industrial and commercial registration data, sales data, training data, and bright-kitchen data of multiple catering enterprises.
  • Different weights and scores are set for the acquired data according to how strongly each type influences the validity of the enterprise state. For example, procurement data influences the validity of the enterprise state more than training data does, so it is given a higher weight and score. After each type of acquired data has been assigned its weight and score, all of the data is transmitted to the initial model as preset sample data for training.
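A minimal sketch of how such labeled sample data might be represented is given below. The class name `LabeledSample`, the table `INDICATOR_LABELS`, and every numeric value are hypothetical; the source states only that each type of data carries its own score and weight and that procurement outranks training.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LabeledSample:
    indicator: str             # data indicator, e.g. "procurement" or "training"
    values: Tuple[float, ...]  # the raw data of one record
    score: float               # score assigned to this indicator
    weight: float              # weight assigned to this indicator


# Hypothetical (score, weight) per indicator; procurement outranks training.
INDICATOR_LABELS: Dict[str, Tuple[float, float]] = {
    "procurement": (90.0, 0.4),
    "training": (60.0, 0.1),
}


def label_samples(raw: Dict[str, List[Tuple[float, ...]]]) -> List[LabeledSample]:
    """Attach the indicator's score and weight to every raw record."""
    samples = []
    for indicator, rows in raw.items():
        score, weight = INDICATOR_LABELS[indicator]
        for values in rows:
            samples.append(LabeledSample(indicator, values, score, weight))
    return samples


preset_samples = label_samples({"procurement": [(1.0, 2.0)], "training": [(3.0,)]})
```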
  • the initial model is a preset network model, and the training is implemented based on a federated learning algorithm.
  • The federated learning algorithm is a way to carry out machine learning while protecting data privacy and meeting legal compliance requirements. It encrypts the model being built, so the federated parties can train the model without handing over their own data.
  • Federated learning protects user data privacy by exchanging parameters under an encryption mechanism. Neither the data nor the model itself is transmitted, and no party can infer another party’s data. Nothing can therefore leak at the data level, and stricter data protection legislation of the People’s Republic of China is not violated; data privacy is protected while data integrity is largely preserved.
  • Regions of the same level that need to supervise enterprise state are regarded as the parties to the federation, such as two different counties or two different cities.
  • Each region sets up its own initial model and trains it in a federated manner using the data of catering enterprises whose enterprise state has been determined, that is, by combining the regions' respective preset sample data. Each region thereby obtains a state data model for supervising the enterprise state of catering enterprises in its own area.
  • In other words, every region in the federation obtains a local state data model through training and uses it to supervise the operating state of local catering enterprises. Training with data from other regions enlarges the training sample and makes the state data model more accurate while keeping the data secure.
  • Step S20 Obtain structured data corresponding to the state of the enterprise in the enterprise to be checked, and transmit the structured data to the state data model to generate a data score of the structured data;
  • The structured data is the data, in a specific data structure, obtained by structuring the data of the various data indicators that characterize the operating state of the catering enterprise. The data corresponding to each data indicator forms one type of sub-data in the structured data. For example, the sales data and procurement data form two types of sub-data, and the two types share the same data structure.
  • The structured data is transmitted to the state data model, which predicts the type of each kind of sub-data. The score and weight of each kind of sub-data are then looked up according to the predicted type, and the data score of the structured data is generated from these scores and weights to indicate whether the operating state of the enterprise is normal.
  • the steps of transmitting structured data to the state data model and generating the data score of the structured data include:
  • Step S21 transmitting the structured data to the state data model, and determining target sample data respectively matching various types of sub-data in the structured data;
  • The model parameters obtained by training the state data model are used to classify the various types of sub-data in the structured data and to search the preset sample data for the target sample data that matches each type of sub-data.
  • Matching is determined by the degree of similarity.
  • For example, suppose the structured data contains sub-data a and b, and the preset sample data in the state data model includes three types of data p1, p2, and p3. Both a and b are transmitted to the state data model, which calculates, based on its model parameters, the similarity between a and each of p1, p2, and p3; if the similarity with p3 is greater than the preset threshold, p3 is taken as the target sample data matching a. Sub-data b is matched in the same way.
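The threshold-based matching in this example can be sketched as below. The source does not say how similarity is computed from the model parameters, so cosine similarity over numeric vectors is an assumption, as are the vectors, the threshold of 0.9, and the choice to return the most similar candidate when several exceed the threshold.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_target_sample(
    sub_data: Sequence[float],
    samples: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the name of the most similar sample above the threshold, else None."""
    best_name, best_similarity = None, threshold
    for name, vector in samples.items():
        similarity = cosine_similarity(sub_data, vector)
        if similarity > best_similarity:
            best_name, best_similarity = name, similarity
    return best_name


# Hypothetical feature vectors for the three kinds of preset sample data.
PRESET = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
match_for_a = match_target_sample([2.0, 2.0], PRESET)
match_for_b = match_target_sample([1.0, 0.0], PRESET)
```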
  • Step S22 Determine the sub-scores of the various types of sub-data according to the scores and weight values respectively corresponding to each of the target sample data;
  • each data constituting the preset sample data has its own score and weight.
  • The score and weight corresponding to each target sample data can be looked up. The score and weight belonging to the same target sample data are then multiplied to obtain the sub-score of the matching sub-data. For example, for the aforementioned target sample data p1 and p3, if the score and weight of p1 are w1 and k1, and the score and weight of p3 are w3 and k3, then the sub-score of sub-data a is k3*w3 and the sub-score of sub-data b is k1*w1.
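The multiplication in this example is small enough to check directly. In the sketch below the numeric scores and weights are hypothetical; the matching of a to p3 and b to p1 follows the example in the text.

```python
from typing import Dict, Tuple

# Hypothetical (score w, weight k) labels for target sample data p1 and p3.
LABELS: Dict[str, Tuple[float, float]] = {"p1": (80.0, 0.5), "p3": (60.0, 0.25)}


def sub_scores(matches: Dict[str, str], labels: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Compute weight * score for the target sample matched to each sub-data."""
    result = {}
    for sub_name, target in matches.items():
        score, weight = labels[target]
        result[sub_name] = weight * score
    return result


# a matched p3 and b matched p1, as in the example above.
scores = sub_scores({"a": "p3", "b": "p1"}, LABELS)
```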
  • Step S23 Generate data scores of the structured data according to the sub-scores of the various types of sub-data.
  • the data scores of the structured data can be obtained according to each sub-score.
  • the data scores can be set as a collection of multiple scores, including at least the minimum, maximum, and average values of each sub-score.
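A data score of this shape can be sketched as a small dictionary. The key names `min`, `max`, and `mean` are illustrative choices, not names taken from the source.

```python
from statistics import mean
from typing import Dict, Sequence


def data_score(sub_scores: Sequence[float]) -> Dict[str, float]:
    """Summarise the sub-scores as their minimum, maximum, and average."""
    if not sub_scores:
        raise ValueError("at least one sub-score is required")
    return {"min": min(sub_scores), "max": max(sub_scores), "mean": mean(sub_scores)}


summary = data_score([15.0, 40.0])
```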
  • Step S30 according to the data score, supervise whether the enterprise status of the enterprise to be checked is valid.
  • A correspondence between data scores and operating states is set in advance. The data score obtained for the structured data is compared against this correspondence to find the target data score in the correspondence that is consistent with it.
  • The actual enterprise state of the enterprise to be checked is determined from the operating state that the correspondence assigns to the target data score.
  • The actual enterprise state represents the current true operating state of the enterprise to be checked.
  • Based on the actual enterprise state, it is supervised whether the registered enterprise state of the enterprise to be checked is valid, and it is determined whether the registered enterprise state lags behind in being updated.
  • the steps to supervise whether the enterprise status of the enterprise to be checked is effective include:
  • Step S31 Determine the target state corresponding to the combination formed by the maximum value, the minimum value and the average value in the data score according to the preset correspondence between the combined score and the state;
  • The data score is a collection of multiple scores including the maximum value, the minimum value, and the average value, so score ranges are set in advance for these values, and each combination of ranges, called a combined score, is preset to correspond to a state.
  • The correspondence is then called, and the combination of the maximum, minimum, and average of the data score is compared with the combined scores in the correspondence to determine whether each value falls within the ranges of a combined score. If it does, that is, if the maximum, minimum, and average fall within the maximum range, minimum range, and average range of a certain combined score, the state corresponding to that combined score is looked up in the correspondence and taken as the target state of the data score, which represents the current actual operating state of the enterprise to be checked.
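The range lookup can be sketched as below. The state names, the range boundaries, and the half-open interval convention are all hypothetical; the source only states that each combined score is a set of ranges for the maximum, minimum, and average, and that it maps to a state.

```python
import math
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # half-open interval [low, high)

# Hypothetical correspondence between combined scores and states.
CORRESPONDENCE: List[Tuple[str, Dict[str, Range]]] = [
    ("operating", {"min": (10.0, math.inf), "max": (30.0, math.inf), "mean": (20.0, math.inf)}),
    ("not operating", {"min": (0.0, 10.0), "max": (0.0, 30.0), "mean": (0.0, 20.0)}),
]


def target_state(
    data_score: Dict[str, float],
    correspondence: List[Tuple[str, Dict[str, Range]]],
) -> Optional[str]:
    """Return the state whose ranges contain the minimum, maximum, and average."""
    for state, ranges in correspondence:
        if all(low <= data_score[key] < high for key, (low, high) in ranges.items()):
            return state
    return None  # no combined score covers this data score


state = target_state({"min": 15.0, "max": 40.0, "mean": 27.5}, CORRESPONDENCE)
```

Returning `None` for an uncovered combination is a design choice of this sketch; the source does not say what happens when no combined score matches.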
  • Step S32 searching for the registration status corresponding to the enterprise to be checked, and supervising whether the enterprise status of the enterprise to be checked is valid according to the consistency between the target state and the registration status.
  • The registration status of the enterprise to be checked is looked up and compared with the target state to judge whether the two are consistent. If they are consistent, the registered status of the enterprise agrees with its actual operating state, and the enterprise state of the enterprise to be checked is determined to be valid. Conversely, if they are inconsistent, the registered status does not agree with the actual operating state, which means the registered status lags behind in being updated, and the enterprise state of the enterprise to be checked is determined to be invalid.
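The consistency check reduces to a lookup and a comparison. In this sketch the registry is a plain dictionary and the enterprise identifiers are invented; a real deployment would query whatever registration system holds the records.

```python
from typing import Dict


def is_state_valid(enterprise_id: str, target_state: str, registry: Dict[str, str]) -> bool:
    """Return True when the registered state agrees with the target state."""
    registered_state = registry[enterprise_id]
    return registered_state == target_state


# Hypothetical registration records.
REGISTRY = {"E001": "operating", "E002": "operating"}

valid = is_state_valid("E001", "operating", REGISTRY)
lagging = not is_state_valid("E002", "not operating", REGISTRY)
```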
  • the real data of all aspects of the enterprise to be checked can be combined to supervise its operating status, and the authenticity and effectiveness of supervision can be ensured. At the same time, supervision can be achieved by obtaining structured data, and the timeliness and efficiency of supervision can be ensured.
  • The enterprise state monitoring method of this embodiment first transmits preset sample data to the initial model and trains the initial model based on the federated learning algorithm to generate a state data model. It then obtains structured data corresponding to the enterprise state of the enterprise to be checked and transmits the structured data to the state data model to generate the data score of the structured data. Finally, according to the data score, it supervises whether the enterprise state of the enterprise to be checked is valid.
  • The preset sample data consists of the various types of data that characterize the state of each enterprise, and is real and valid data of each enterprise.
  • Training with the federated learning algorithm combines the preset sample data of a large number of enterprises, which enlarges the training sample and makes the generated state data model more accurate.
  • The state data model is used to supervise the validity of the state of the enterprise to be checked. This draws on the various real data of that enterprise to reflect its state, avoids relying on inspection of the industrial and commercial registration for supervision, and ensures that the supervised enterprise state is authentic. Supervision is thus valid and accurate as well as timely and efficient.
  • a second embodiment of the enterprise state supervision method of the present application is proposed.
  • the steps include:
  • Step S24 Collect enterprise text data of the enterprise to be checked, and extract the text data corresponding to the enterprise status from each of the enterprise text data for classification, to obtain multiple types of status text data;
  • The supervision server connects to the enterprise to be checked in order to collect its enterprise text data.
  • The enterprise text data is the various types of data, in text form, involved in the business process of the enterprise to be checked, including at least procurement, arrears, information disclosure, industrial and commercial registration, supervision, sales, training, corporate structure, and employee composition.
  • Based on the data indicators that characterize the operating condition of catering enterprises, the text data corresponding to the enterprise state is extracted from the collected enterprise text data and then classified to obtain multiple types of state text data corresponding to the data indicators. That is, the data indicator to which each extracted piece of text data belongs is determined, and data belonging to the same data indicator is grouped into the same type, forming multiple types of state text data such as sales and procurement.
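One simple way to group text by data indicator is a keyword lookup, sketched below. The source does not specify the classification technique, so the keyword approach, the table `INDICATOR_TERMS`, and the English sample sentences are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger terms for two data indicators.
INDICATOR_TERMS: Dict[str, Tuple[str, ...]] = {
    "procurement": ("purchase", "supplier"),
    "sales": ("sold", "revenue"),
}


def classify_state_text(documents: List[str]) -> Dict[str, List[str]]:
    """Group documents by indicator; documents matching no indicator are dropped."""
    grouped: Dict[str, List[str]] = {name: [] for name in INDICATOR_TERMS}
    for document in documents:
        lowered = document.lower()
        for name, terms in INDICATOR_TERMS.items():
            if any(term in lowered for term in terms):
                grouped[name].append(document)
    return grouped


DOCS = [
    "Purchase of flour from supplier A",
    "Revenue report: 120 meals sold",
    "Staff birthday party",
]
grouped = classify_state_text(DOCS)
```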
  • Step S25 extracting state keywords in multiple types of the state text data respectively, and performing format conversion on the extracted multiple types of state keywords according to a preset data format to obtain the structured data.
  • the status keywords in various status text data are extracted to characterize the operating status of the enterprise to be checked in various aspects through various types of status keywords.
  • a preset data format is set in advance according to the required data structure.
  • the extracted multiple types of status keywords are formatted separately, and each type of status keyword is converted into the preset data format.
  • For example, suppose the preset data format is: purchasing category-purchasing time-purchasing data. After the state keywords of each purchase are extracted from the procurement text data, they are arranged according to the preset data format, and the time keyword among them is checked against the purchase time format that the preset data format requires.
  • If the required purchase time format is XXXX year-XX month-XX day and the time keyword is in the format XX.XX.XX, the time formats are judged to be inconsistent. In that case, while the state keywords are arranged according to the preset data format, the time keyword is converted to the required time format, and the result is structured data that meets the preset data format.
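The time-format check and conversion can be sketched with the standard `datetime` module. Reading XX.XX.XX as two-digit year, month, and day, and rendering the target format as `YYYY-MM-DD`, are assumptions; the record keys `category`, `time`, and `data` are illustrative names.

```python
from datetime import datetime
from typing import Dict

TARGET_FORMAT = "%Y-%m-%d"   # assumed rendering of "XXXX year-XX month-XX day"
SOURCE_FORMAT = "%y.%m.%d"   # assumed reading of "XX.XX.XX"


def normalize_time(keyword: str) -> str:
    """Return the time keyword in the target format, converting it if needed."""
    for fmt in (TARGET_FORMAT, SOURCE_FORMAT):
        try:
            return datetime.strptime(keyword, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # format did not match, try the next one
    raise ValueError(f"unrecognised time keyword: {keyword!r}")


def to_structured(category: str, time_keyword: str, data: str) -> Dict[str, str]:
    """Arrange the state keywords of one purchase in the preset order."""
    return {"category": category, "time": normalize_time(time_keyword), "data": data}


record = to_structured("flour", "20.05.18", "50 kg")
```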
  • the step of separately extracting state keywords from the multiple types of state text data includes:
  • Step S251 performing segmentation processing and sentence processing on the multiple types of state text data respectively, generating multiple types of to-be-recognized clauses, and removing invalid clauses in the multiple types of the to-be-recognized clauses;
  • Extracting state keywords from the various types of state text data means processing each type of state text data separately, either serially or in parallel. Specifically, each type of state text data is first split into paragraphs to obtain its text data segments, and each text data segment is then split into sentences to obtain multiple text sentences as the to-be-recognized clauses. After that, sentences that are unrelated to the operating state are searched for among the to-be-recognized clauses and removed as invalid clauses, which ensures that the state keywords extracted from the remaining clauses all relate to the operating state.
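Step S251 can be sketched with regular expressions. The rule used here to decide that a clause is invalid (it contains none of a few trigger terms) is an assumption, as are the terms themselves; the source does not say how unrelated sentences are detected. The sentence pattern targets English punctuation and would need adapting for Chinese text.

```python
import re
from typing import List, Sequence

# Hypothetical terms that mark a clause as related to the operating state.
STATE_TERMS = ("purchase", "sales", "closed")


def split_clauses(text: str) -> List[str]:
    """Split text into paragraphs on blank lines, then into sentences."""
    clauses = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            sentence = sentence.strip()
            if sentence:
                clauses.append(sentence)
    return clauses


def drop_invalid_clauses(clauses: Sequence[str], terms: Sequence[str]) -> List[str]:
    """Keep only clauses that mention at least one state-related term."""
    return [c for c in clauses if any(t in c.lower() for t in terms)]


TEXT = "We purchased flour. The weather was nice.\n\nSales rose!"
clauses = split_clauses(TEXT)
valid_clauses = drop_invalid_clauses(clauses, STATE_TERMS)
```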
  • Step S252 performing word segmentation processing on the multiple types of the to-be-recognized clauses after removing the invalid clauses, respectively, to generate multiple types of to-be-recognized word segmentation;
  • Each type of to-be-recognized clause is subjected to word segmentation: the clause is divided into multiple words according to the logic of the language, which yields the to-be-recognized word segments of each type of state text data.
  • Step S253 Eliminate noise words that are irrelevant to the state of the enterprise among the multiple types of word segmentation to be recognized, and obtain multiple types of state keywords in the state text data.
  • Words related to the operating state are set in advance to form a dictionary, and the to-be-recognized word segments are compared with the words in the dictionary one by one to determine whether each segment exists in the dictionary. A segment found in the dictionary is a valid word related to the operating state; a segment not found in the dictionary is an invalid word unrelated to the operating state. Once all invalid words in each type of to-be-recognized word segments have been found, they are eliminated as noise words unrelated to the enterprise state, leaving the state keywords of each type of state text data.
  • The various state keywords are then converted according to the preset data format to obtain structured data that represents the actual state of the enterprise from various aspects.
  • Every sub-data in the structured data uses the same preset data format, so all sub-data can be processed in the same way, which improves processing efficiency.
  • State keywords are extracted from the various enterprise text data of the enterprise to be checked and turned into structured data that represents the actual state of the enterprise. Because the enterprise text data is the real data of the enterprise and reflects its operating state from various aspects, the structured data generated from it reflects the true state of the enterprise from many angles and improves the validity and accuracy of the represented enterprise state.
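Steps S252 and S253 can be sketched as tokenization followed by a dictionary lookup. The regex tokenizer below only handles space-delimited English and stands in for a real word segmenter, which Chinese text would require; the dictionary contents are hypothetical.

```python
import re
from typing import List, Sequence, Set

# Hypothetical dictionary of words related to the operating state.
STATE_DICTIONARY: Set[str] = {"purchased", "flour", "sales"}


def segment_words(clause: str) -> List[str]:
    """Naive stand-in for word segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", clause.lower())


def state_keywords(clauses: Sequence[str], dictionary: Set[str]) -> List[str]:
    """Keep word segments found in the dictionary; the rest are noise words."""
    return [word for clause in clauses for word in segment_words(clause) if word in dictionary]


keywords = state_keywords(["We purchased flour.", "Sales rose!"], STATE_DICTIONARY)
```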
  • a third embodiment of the enterprise state supervision method of the present application is proposed.
  • after the step of supervising whether the enterprise state of the enterprise to be checked is valid, the method further includes:
  • Step S40 transmitting the research and judgment score corresponding to the enterprise to be checked to the state data model, and judging whether the research and judgment score matches the data score;
  • This embodiment is provided with an optimization mechanism for the state data model.
  • The enterprise to be checked is scored manually on the basis of the enterprise state represented by the structured data, and the resulting research and judgment score is transmitted to the supervision server.
  • The supervision server transmits the research and judgment score to the state data model and judges whether the data score generated by the state data model matches it. Matching does not require the two to be identical.
  • If the numerical difference between the data score and the research and judgment score is within a certain range, the two are relatively close and are considered to match. Otherwise the two are far apart and are determined not to match.
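The tolerance check can be sketched in one function. The tolerance of 5.0 is hypothetical, and comparing single numbers is a simplification: the source does not say which part of the data score (for example its average) is compared with the manual score.

```python
def scores_match(data_score: float, research_score: float, tolerance: float = 5.0) -> bool:
    """Return True when the two scores differ by no more than the tolerance."""
    return abs(data_score - research_score) <= tolerance


close_enough = scores_match(27.5, 30.0)
too_far = not scores_match(27.5, 40.0)
```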
  • Step S50 if it matches the data score, store the data score and the structured data correspondingly;
  • If the two match, the state data model currently processes structured data accurately and does not need to be optimized.
  • The data score and the structured data are then stored in correspondence with each other as a basis for enterprise state supervision.
  • Step S60 if it does not match the data score, search for target sample data that matches the structured data in the preset sample data;
  • A mismatch indicates that the state data model currently processes structured data with low accuracy and needs to be optimized. The data score is generated from the preset sample data in the state data model that is similar to the structured data of the enterprise to be checked. Since the score generated from that similar sample data is inaccurate, optimizing the state data model means processing that similar preset sample data.
  • The structured data is compared with the preset sample data to find the data whose similarity to the structured data is greater than a preset similarity threshold. The data found is taken as the target sample data matching the structured data, that is, the preset sample data that is similar to the structured data of the enterprise to be checked and was used to generate the data score.
  • Step S70 removing the target sample data and the score label corresponding to the target sample data, and generating the research and judgment score as the to-be-trained score label of the structured data;
  • The state data model generates data scores according to the score labels, representing scores and weights, that are carried by the target sample data.
  • The target sample data and the score label it carries are therefore removed from the preset sample data and are no longer used as training samples for the state data model.
  • Because the research and judgment score is taken to be accurate, it is generated as the to-be-trained score label of the structured data, which is used to train and optimize the state data model.
  • Step S80, updating the preset sample data according to the structured data and the to-be-trained score label, and performing optimization training on the state data model based on the updated preset sample data.
  • The structured data is converted into sample data, and the converted sample data together with the to-be-trained score label is added as new preset sample data to update the preset sample data.
  • The state data model is then trained on the updated preset sample data to improve its accuracy.
  • Alternatively, this embodiment can reset the score label of the target sample data for training: only the score label corresponding to the target sample data is removed, and the target sample data itself is retained. A new score label is set for the target sample data according to the research and judgment score, and the target sample data with its new score label is used as new preset sample data for optimization training of the state data model, improving the accuracy of the state data model.
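  • Steps S60 to S80 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Sample` record, the `similarity` measure (share of fields with equal values) and the threshold of 0.8 are all hypothetical, because the text does not specify how similarity is computed or what the threshold is.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    features: Dict[str, str]  # structured data fields (hypothetical layout)
    score_label: float        # score label used for training


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Share of fields, over the union of keys, whose values agree (assumed measure)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    same = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return same / len(keys)


def update_samples(samples: List[Sample], structured: Dict[str, str],
                   judged_score: float, threshold: float = 0.8) -> List[Sample]:
    """Drop target samples that are too similar, then add the manually scored record."""
    # S60/S70: samples whose similarity exceeds the threshold are the target samples
    kept = [s for s in samples if similarity(s.features, structured) <= threshold]
    # S70/S80: the research and judgment score becomes the to-be-trained label
    kept.append(Sample(dict(structured), judged_score))
    return kept
```

  • The returned list would then serve as the updated preset sample data on which the state data model is retrained.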
  • the fourth embodiment of the enterprise state supervision method of the present application is proposed.
  • The step of transmitting the preset sample data to the initial model and training the initial model based on the federated learning algorithm to generate the state data model includes:
  • Step S11, obtaining positive sample data corresponding to a preset positive field name and negative sample data corresponding to a preset negative field name, transmitting each piece of positive sample data and each piece of negative sample data to the initial model as the preset sample data, and training the initial model to generate a model gradient;
  • This embodiment is based on a federated learning algorithm to perform federated training on the initial model to generate a state data model.
  • The federated training involves at least two regions. Each region has its own initial model, and the preset sample data used for training in each region is independent of that of the other regions. All regions follow the same training process for their respective initial models, so this embodiment describes any one of them.
  • the preset sample data includes positive sample data indicating that the state of the enterprise is valid, and negative sample data indicating that the state of the enterprise is invalid.
  • A positive field name representing positive samples and a negative field name representing negative samples are preset.
  • The large amount of collected data is filtered by the preset positive and negative field names to obtain the positive sample data corresponding to the preset positive field name and the negative sample data corresponding to the preset negative field name. Different scores and weights are then set for each piece of positive sample data and for each piece of negative sample data, after which the positive and negative sample data are transmitted to the initial model as the preset sample data for training, generating the model gradient that this party uses to update its model parameters.
  • Step S12, transmitting the model gradient to the coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
  • A coordinator corresponding to the federated learning algorithm is set up for the federated training process.
  • The coordinator can be any one of the parties in the regions, or it can be independent of all of them.
  • The generated model gradient is transmitted to the coordinator, which aggregates it with the model gradients generated by the other parties based on the federated learning algorithm.
  • The aggregation can be set to mean aggregation or weighted aggregation as required. The result is a return gradient, which is returned to the supervision server in each region.
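  • The coordinator's aggregation can be sketched as below. This is an illustrative assumption rather than the claimed implementation: gradients are represented as plain lists of floats, and the function name `aggregate_gradients` and the optional `weights` argument (for example, sample counts per region) are hypothetical.

```python
from typing import List, Optional, Sequence


def aggregate_gradients(gradients: Sequence[Sequence[float]],
                        weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine per-region model gradients into one return gradient.

    With `weights=None` this is mean aggregation; otherwise it is a
    weighted average, normalised by the sum of the weights.
    """
    if not gradients:
        raise ValueError("at least one gradient is required")
    size = len(gradients[0])
    if any(len(g) != size for g in gradients):
        raise ValueError("all gradients must have the same length")
    if weights is None:
        weights = [1.0] * len(gradients)
    if len(weights) != len(gradients):
        raise ValueError("one weight per gradient is required")
    total = float(sum(weights))
    return [sum(w * g[i] for w, g in zip(weights, gradients)) / total
            for i in range(size)]
```

  • Only gradients reach the coordinator in this sketch; the preset sample data of each region stays local.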
  • Step S13, receiving the return gradient returned by the coordinator, and continuing to train the initial model according to the return gradient until the initial model converges, to obtain the state data model.
  • The initial model is trained repeatedly according to the return gradient. After each round of training, it is judged whether the initial model has converged. If it has, the trained initial model can generate data scores accurately and is used as the state data model. If it has not, training continues until it converges and the state data model is obtained.
  • Convergence of the initial model can be determined by the convergence function in the initial model. After each round of training, test sample data is processed with the model parameters obtained through training to produce a processing result, and the convergence function calculates the loss value between the processing result and the expected result. Once the loss value has stayed below the preset value for several consecutive rounds, the initial model is determined to have converged and training stops. Otherwise, training continues.
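  • The stopping rule can be sketched as follows. The function name `has_converged` and the `patience` of three rounds are assumptions; the text says only that the loss must remain below the preset value several times in a row.

```python
from typing import Sequence


def has_converged(losses: Sequence[float], threshold: float, patience: int = 3) -> bool:
    """Return True when the last `patience` loss values are all below `threshold`.

    `patience` is a hypothetical parameter standing in for the unspecified
    number of rounds the loss must stay below the preset value.
    """
    if len(losses) < patience:
        return False
    return all(loss < threshold for loss in losses[-patience:])
```

  • In a training loop, the loss of each round would be appended to a list and training would stop as soon as `has_converged` returns True.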
  • A federated learning algorithm is used to perform federated training on the initial model to obtain the state data model.
  • The preset sample data in each region is not transmitted outside that region, which protects data privacy while enlarging the pool of training samples and improving the training effect of the state data model, making enterprise state supervision based on the state data model more accurate.
  • This application also provides an enterprise state supervision apparatus.
  • Fig. 3 is a schematic diagram of the functional modules of the first embodiment of the enterprise state supervision apparatus of this application.
  • The enterprise state supervision apparatus includes:
  • the generating module 10 is used for transmitting preset sample data to the initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
  • the obtaining module 20 is configured to obtain structured data corresponding to the state of the enterprise in the enterprise to be checked, and transmit the structured data to the state data model to generate a data score of the structured data;
  • the supervision module 30 is configured to supervise whether the enterprise status of the enterprise to be checked is valid according to the data score.
  • The generating module 10 first transmits the preset sample data to the initial model and trains the initial model based on the federated learning algorithm to generate the state data model.
  • The obtaining module 20 then obtains the structured data corresponding to the enterprise state of the enterprise to be checked and transmits it to the state data model to generate the data score of the structured data, and the supervision module 30 supervises whether the enterprise state of the enterprise to be checked is valid according to the data score.
  • The preset sample data consists of the various types of data that represent the state of each enterprise and is real, valid data of those enterprises.
  • Training with the federated learning algorithm on the preset sample data of a large number of enterprises enlarges the training sample size and makes the generated state data model more accurate.
  • Supervising the validity of the state of the enterprise to be checked through the state data model therefore combines the various real data of that enterprise to reflect its state and avoids relying on queries of the industrial and commercial registration. This ensures the authenticity and accuracy of the supervised enterprise state as well as the timeliness of supervision, which is conducive to timely and efficient supervision.
  • the obtaining module 20 includes:
  • the collection unit is used to collect enterprise text data of the enterprise to be checked, and extract the text data corresponding to the enterprise status from each of the enterprise text data for classification, and obtain multiple types of status text data;
  • the conversion unit is used to extract state keywords in multiple types of the state text data, and perform format conversion on the extracted multiple types of state keywords according to a preset data format to obtain the structured data.
  • conversion unit is also used for:
  • noise words that are irrelevant to the state of the enterprise in the multiple types of word segmentation to be recognized are eliminated to obtain the state keywords in the multiple types of state text data.
  • the obtaining module 20 further includes:
  • the first transmission unit is configured to transmit the structured data to the state data model, and determine target sample data respectively matching various types of sub-data in the structured data;
  • the first determining unit is configured to determine the sub-scores of various types of the sub-data according to the scores and weights respectively corresponding to each of the target sample data;
  • the generating unit is configured to generate the data score of the structured data according to the sub-score of the various types of sub-data.
  • the supervision module 30 further includes:
  • the second determining unit is configured to determine the target state corresponding to the combination formed by the maximum value, the minimum value and the average value in the data score according to the preset correspondence between the combined score and the state;
  • the supervision unit is configured to search for the registration status corresponding to the enterprise to be checked, and supervise whether the enterprise status of the enterprise to be checked is valid according to the consistency between the target state and the registration status.
  • the enterprise state supervision apparatus further includes:
  • the judgment module is configured to transmit the research judgment score corresponding to the enterprise to be checked to the state data model, and judge whether the research judgment score matches the data score;
  • a storage module configured to store the data score and the structured data correspondingly if the research and judgment score matches the data score;
  • a search module configured to search the preset sample data for target sample data matching the structured data if the research and judgment score does not match the data score;
  • a rejecting module configured to remove the target sample data and the score label corresponding to the target sample data, and generate the research and judgment score as the to-be-trained score label of the structured data;
  • the update module is configured to update the preset sample data according to the structured data and the score label to be trained, and optimize the training of the state data model based on the updated preset sample data.
  • the generating module 10 further includes:
  • the acquiring unit is configured to acquire positive sample data corresponding to the preset positive field name and negative sample data corresponding to the preset negative field name, transmit each piece of positive sample data and each piece of negative sample data to the initial model as the preset sample data, and train the initial model to generate a model gradient;
  • the first transmission unit is configured to transmit the model gradient to the coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
  • the receiving unit is configured to receive the return gradient returned by the coordinating party, and continuously train the initial model according to the return gradient until the initial model converges to obtain the state data model.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores an enterprise state supervision program, and the enterprise state supervision program, when executed by a processor, implements the steps of the enterprise state supervision method described above.
  • The technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An enterprise state supervision method, apparatus, and device, and a computer readable storage medium. Said method comprises: transmitting preset sample data to an initial model, and training the initial model on the basis of a federated learning algorithm, so as to generate a state data model (S10); acquiring structured data corresponding to an enterprise state in an enterprise to be checked, and transmitting the structured data to the state data model, so as to generate data score values of the structured data (S20); and according to the data score values, determining whether the enterprise state of said enterprise is valid (S30).

Description

Enterprise state supervision method, apparatus, device, and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on May 22, 2020, with application number 202010445603.8 and the invention title "Enterprise state supervision method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated in this application by reference.
Technical field
This application relates to the field of data processing technology, and in particular to an enterprise state supervision method, apparatus, device, and computer-readable storage medium.
Background
At present, there are a large number of domestic catering enterprises. Many new ones open every year, and many also cease operations. For a catering enterprise that has ceased operations, if its business license is not cancelled in time, the business status in its industrial and commercial registration lags behind, and supervision is needed to update it. Currently, supervisors rely on querying the industrial and commercial registration to supervise the business status of catering enterprises. The inventor realized that supervisors usually make such queries at set intervals, so a catering enterprise may already have been closed for a long time when the query is made, which easily makes supervision untimely. In addition, querying the industrial and commercial registration is inefficient, which also reduces the efficiency of supervision.
Technical solutions
The main purpose of this application is to provide an enterprise state supervision method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that supervision of the operating status of catering enterprises in the prior art is untimely and inefficient.
To achieve the above purpose, an embodiment of this application provides an enterprise state supervision method, which includes the following steps:
Transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
Acquiring structured data corresponding to the enterprise state of an enterprise to be checked, and transmitting the structured data to the state data model to generate a data score of the structured data;
Supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
To achieve the above purpose, this application also provides an enterprise state supervision apparatus, which includes:
A generating module, configured to transmit preset sample data to an initial model and train the initial model based on a federated learning algorithm to generate a state data model;
An obtaining module, configured to obtain structured data corresponding to the enterprise state of an enterprise to be checked and transmit the structured data to the state data model to generate a data score of the structured data;
A supervision module, configured to supervise, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
Further, to achieve the above purpose, this application also provides an enterprise state supervision device, which includes a memory, a processor, and an enterprise state supervision program stored in the memory and executable on the processor. When the enterprise state supervision program is executed by the processor, the following steps are implemented:
Transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
Acquiring structured data corresponding to the enterprise state of an enterprise to be checked, and transmitting the structured data to the state data model to generate a data score of the structured data;
Supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium on which an enterprise state supervision program is stored. When the enterprise state supervision program is executed by a processor, the following steps are implemented:
Transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
Acquiring structured data corresponding to the enterprise state of an enterprise to be checked, and transmitting the structured data to the state data model to generate a data score of the structured data;
Supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
This application provides an enterprise state supervision method, apparatus, device, and computer-readable storage medium. Preset sample data is first transmitted to an initial model, and the initial model is trained based on a federated learning algorithm to generate a state data model. Structured data corresponding to the enterprise state of the enterprise to be checked is then obtained and transmitted to the state data model to generate a data score of the structured data, and whether the enterprise state of the enterprise to be checked is valid is supervised according to that data score. The preset sample data consists of the various types of data that represent the state of each enterprise and is real, valid data of those enterprises. Training with the federated learning algorithm on the preset sample data of a large number of enterprises enlarges the training sample size and makes the generated state data model more accurate. Supervising the validity of the state of the enterprise to be checked through the state data model therefore combines the various real data of that enterprise to reflect its state and avoids relying on queries of the industrial and commercial registration. This ensures the authenticity and accuracy of the supervised enterprise state as well as the timeliness of supervision, which is conducive to timely and efficient supervision.
Description of the drawings
FIG. 1 is a schematic structural diagram of the enterprise state supervision device in the hardware operating environment involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the enterprise state supervision method of this application;
FIG. 3 is a schematic diagram of the functional modules of a preferred embodiment of the enterprise state supervision apparatus of this application.
Embodiments of the present invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of the enterprise state supervision device in the hardware operating environment involved in the solutions of the embodiments of this application.
In the following description, suffixes such as "module", "part", or "unit" used to denote elements serve only to facilitate the description of this application and have no specific meaning in themselves. Therefore, "module", "part", and "unit" can be used interchangeably.
The enterprise state supervision device of the embodiments of this application may be a PC, or a portable terminal device such as a tablet computer or a portable computer.
As shown in FIG. 1, the enterprise state supervision device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the enterprise state supervision device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a detection program.
In the device shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 can be used to call the detection program stored in the memory 1005 and perform the following operations:
Transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
Acquiring structured data corresponding to the enterprise state of an enterprise to be checked, and transmitting the structured data to the state data model to generate a data score of the structured data;
Supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
Further, the step of acquiring structured data corresponding to the enterprise state of the enterprise to be checked includes:
Collecting enterprise text data of the enterprise to be checked, and extracting and classifying the text data corresponding to the enterprise state from each piece of enterprise text data to obtain multiple types of state text data;
Extracting the state keywords from each type of state text data, and converting the extracted state keywords according to a preset data format to obtain the structured data.
Further, the step of extracting the state keywords from each type of state text data includes:
Splitting each type of state text data into paragraphs and then into clauses to generate multiple types of clauses to be recognized, and removing the invalid clauses among them;
Performing word segmentation on the clauses to be recognized that remain after the invalid clauses are removed, to generate multiple types of segmented words to be recognized;
Removing from the segmented words to be recognized the noise words that are irrelevant to the enterprise state, to obtain the state keywords of each type of state text data.
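The extraction steps above can be sketched on English text as follows. This is a simplified illustration, not the method's actual tokenizer: regular-expression splitting stands in for Chinese word segmentation, a clause is treated as invalid when it has fewer than two tokens, and the `NOISE_WORDS` set is invented for the example. All three are assumptions.

```python
import re
from typing import List, Set

# Hypothetical noise-word list; the source does not enumerate noise words.
NOISE_WORDS: Set[str] = {"the", "a", "is", "was", "and", "of", "for", "this"}


def extract_state_keywords(text: str,
                           noise_words: Set[str] = NOISE_WORDS,
                           min_clause_tokens: int = 2) -> List[str]:
    """Paragraphs -> clauses -> tokens, dropping invalid clauses and noise words."""
    keywords: List[str] = []
    for paragraph in text.split("\n"):                        # paragraph splitting
        for clause in re.split(r"[.;!?。；！？]", paragraph):  # clause splitting
            tokens = re.findall(r"\w+", clause.lower())       # stand-in for word segmentation
            if len(tokens) < min_clause_tokens:               # assumed rule for invalid clauses
                continue
            keywords.extend(t for t in tokens if t not in noise_words)
    return keywords
```

For Chinese enterprise text, a dedicated word segmentation library would replace the `re.findall` call in this sketch.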
Further, the step of transmitting the structured data to the state data model to generate the data score of the structured data includes:
Transmitting the structured data to the state data model, and determining the target sample data that matches each type of sub-data in the structured data;
Determining the sub-score of each type of sub-data according to the score and weight corresponding to each piece of target sample data;
Generating the data score of the structured data according to the sub-scores of the various types of sub-data.
Further, the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid includes:
Determining, according to a preset correspondence between combined scores and states, the target state corresponding to the combination formed by the maximum value, the minimum value and the average value in the data score;
Looking up the registered state of the enterprise to be checked, and supervising whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
Further, after the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid, the processor 1001 may be used to call the detection program stored in the memory 1005 and perform the following operations:
Transmitting the research and judgment score corresponding to the enterprise to be checked to the state data model, and judging whether the research and judgment score matches the data score;
If the research and judgment score matches the data score, storing the data score in association with the structured data;
If the research and judgment score does not match the data score, searching the preset sample data for the target sample data that matches the structured data;
Removing the target sample data and the score label corresponding to the target sample data, and using the research and judgment score as the score label to be trained of the structured data;
Updating the preset sample data according to the structured data and the score label to be trained, and optimizing the training of the state data model based on the updated preset sample data.
Further, the step of transmitting preset sample data to the initial model, training the initial model based on a federated learning algorithm, and generating the state data model includes:
Obtaining positive sample data corresponding to preset positive field names and negative sample data corresponding to preset negative field names, transmitting the positive sample data and the negative sample data to the initial model as the preset sample data, and training the initial model to generate a model gradient;
Transmitting the model gradient to the coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
Receiving the return gradient returned by the coordinator, and continuing to train the initial model according to the return gradient until the initial model converges, to obtain the state data model.
The specific implementations of the enterprise state supervision device of the present application are basically the same as the embodiments of the enterprise state supervision method described below, and are not repeated here.
For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
For a better understanding of the above technical solutions, they are described in detail below in conjunction with the accompanying drawings and specific implementations.
Referring to FIG. 2, which is a schematic flowchart of the enterprise state supervision method provided by the first embodiment of the present application, the enterprise state supervision method in this embodiment includes the following steps:
Step S10, transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
The enterprise state supervision method in this embodiment is applied to a supervision server and is suitable for supervising the state of an enterprise through the supervision server. Here, the state of an enterprise is its operating state: a large amount of data on the enterprise's procurement, arrears, information disclosure, industrial and commercial registration, supervision, sales, training and other aspects is used to judge whether the enterprise is in operation, and supervision is carried out on that basis. The enterprise may be of any type, such as catering, clothing, travel, finance or construction; this embodiment is described taking a catering enterprise as the preferred example. Specifically, the initial model is trained on a large amount of data, covering various aspects, from catering enterprises whose enterprise state has already been determined, and the resulting state data model for enterprise supervision is deployed to the supervision server. Data indicators that characterize the operating condition of a catering enterprise are set in advance, such as procurement, industrial and commercial registration, sales, training and bright kitchen (明厨亮灶) indicators. A large amount of data under these indicators is then obtained from the catering enterprises whose enterprise state has been determined, that is, the procurement data, industrial and commercial registration data, sales data, training data, bright kitchen data and the like of multiple catering enterprises. Different weights and scores are set for the acquired data according to how strongly each type of data influences the validity of the enterprise state. For example, procurement data influences the validity of the enterprise state more strongly than training data does, so a higher weight and score are set for procurement data. After a weight and a score have been set for each type of acquired data, the data is transmitted to the initial model as the preset sample data for training.
It should be noted that the initial model is a preset network model and that the training is implemented based on a federated learning algorithm. Federated learning is a way of carrying out machine learning while protecting data privacy and meeting legal and compliance requirements. The model is built under encryption, so the parties to the federation can train the model and obtain model parameters without handing over their own data. Federated learning protects user data privacy by exchanging parameters under an encryption mechanism: the data and the model themselves are not transmitted, and neither party can infer the other party's data. There is therefore no possibility of leakage at the data level, and stricter data protection laws are not violated. As a result, data privacy is protected while data integrity is maintained to a high degree.
In this embodiment, regions of the same administrative level that need to supervise enterprise states act as the parties to the federation, for example two different counties or two different cities. Each region sets up its own initial model, and the regions jointly use the data on various aspects of their catering enterprises whose enterprise state has been determined, that is, their respective preset sample data, to perform federated training of their respective initial models. Each region thereby obtains a state data model for supervising the enterprise state of the catering enterprises within that region. In other words, every participating region obtains a local state data model through training and uses it to supervise the enterprise state of local catering enterprises. Training jointly with data from other regions enlarges the training sample, which makes the state data model more accurate, while still ensuring the security of the data.
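The gradient exchange described above can be illustrated with a minimal sketch. It assumes that the coordinator aggregates by plain element-wise averaging and that each region then takes one gradient-descent step with the returned gradient; the description specifies neither the aggregation rule nor the update rule, and the encryption of the exchanged gradients is omitted here. The function names and the learning rate are hypothetical.

```python
from typing import List, Sequence


def aggregate_gradients(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Coordinator side: average the model gradients of all regions element-wise.

    Plain averaging is an assumption; the description only says "aggregate".
    """
    if not gradients:
        raise ValueError("at least one model gradient is required")
    count = len(gradients)
    return [sum(values) / count for values in zip(*gradients)]


def apply_return_gradient(params: Sequence[float],
                          return_gradient: Sequence[float],
                          learning_rate: float = 0.1) -> List[float]:
    """Region side: one gradient-descent step using the returned gradient."""
    return [p - learning_rate * g for p, g in zip(params, return_gradient)]


# Two regions (for example two counties) each contribute a local gradient.
county_a = [1.0, 2.0]
county_b = [3.0, 4.0]
return_gradient = aggregate_gradients([county_a, county_b])
updated_params = apply_return_gradient([1.0, 1.0], return_gradient, learning_rate=0.5)
```

In practice the round would be repeated until the local model converges, as the step describes.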
Step S20, obtaining the structured data corresponding to the enterprise state in the enterprise to be checked, and transmitting the structured data to the state data model to generate the data score of the structured data;
Further, a catering enterprise whose enterprise state needs to be checked is taken as the enterprise to be checked, and the structured data corresponding to its enterprise state is obtained. The structured data is the data of the enterprise to be checked under each of the data indicators that characterize the operating condition of a catering enterprise, after that data has been structured into a specific data structure. Each type of data corresponding to a data indicator forms one type of sub-data in the structured data. For example, the sales data and the procurement data corresponding to the data indicators form two types of sub-data in the structured data, and the two types of sub-data have the same data structure.
Furthermore, the structured data is transmitted to the state data model, and the state data model predicts the type of each type of sub-data in the structured data. The score and weight of each type of sub-data are then looked up according to the predicted type, and the data score of the structured data is generated from these scores and weights to indicate whether the operating state of the enterprise is normal. Specifically, the step of transmitting the structured data to the state data model to generate the data score of the structured data includes:
Step S21, transmitting the structured data to the state data model, and determining the target sample data that matches each type of sub-data in the structured data;
Further, after the structured data is transmitted to the state data model, the model parameters obtained by training the state data model are used to classify each type of sub-data in the structured data and to find, in the preset sample data, the target sample data that matches each type of sub-data. Whether two items match is determined by their similarity: when the similarity between a type of sub-data and an item in the preset sample data is greater than a preset threshold, that type of sub-data is judged to match the item, and the item is taken as the target sample data. For example, suppose the structured data contains sub-data a and b, and the preset sample data in the state data model includes three types of data p1, p2 and p3. Both a and b are transmitted to the state data model, and the similarity between a and each of p1, p2 and p3 is calculated according to the model parameters; if the similarity between a and p3 is greater than the preset threshold, p3 is taken as the target sample data matching a. Similarly, the similarity between b and each of p1, p2 and p3 is calculated; if the similarity between b and p1 is greater than the preset threshold, p1 is taken as the target sample data matching b.
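The threshold matching in this step can be sketched as follows. The description does not name a similarity measure, so the sketch assumes that sub-data and sample data are represented as numeric feature vectors and compared by cosine similarity; choosing the most similar sample among those above the threshold is likewise an assumption. The names `cosine_similarity` and `match_target_sample` and the example vectors are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_target_sample(sub_data: Sequence[float],
                        samples: Dict[str, Sequence[float]],
                        threshold: float) -> Optional[str]:
    """Return the name of the most similar sample whose similarity exceeds the
    threshold, or None when no sample exceeds it."""
    best_name, best_similarity = None, threshold
    for name, vector in samples.items():
        similarity = cosine_similarity(sub_data, vector)
        if similarity > best_similarity:
            best_name, best_similarity = name, similarity
    return best_name


# Hypothetical feature vectors for the sample data p1, p2, p3 and sub-data a, b.
samples = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
match_for_a = match_target_sample([2.0, 2.0], samples, threshold=0.9)
match_for_b = match_target_sample([3.0, 0.0], samples, threshold=0.9)
```

With these made-up vectors, a is matched to p3 and b to p1, mirroring the example in the text.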
Step S22, determining the sub-score of each type of sub-data according to the score and the weight value corresponding to each piece of target sample data;
Understandably, every item of the preset sample data has its own score and weight, so once the target sample data has been obtained, the score and weight corresponding to each piece of target sample data can be looked up. The score and the weight corresponding to the same piece of target sample data are then multiplied to obtain the sub-score of the corresponding type of sub-data. For example, for the target sample data p1 and p3 above, if the score and weight value of p1 are w1 and k1 respectively, and the score and weight value of p3 are w3 and k3 respectively, then the sub-score of sub-data a is k3*w3 and the sub-score of sub-data b is k1*w1.
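As a small illustration of this multiplication, the sketch below looks up the score and weight of each matched target sample and multiplies them. The table `SAMPLE_SCORES` and its numbers are invented for the example and do not come from the application.

```python
from typing import Dict, Tuple

# Hypothetical (score, weight) pairs for the target sample data.
SAMPLE_SCORES: Dict[str, Tuple[float, float]] = {
    "p1": (80.0, 0.5),   # (w1, k1)
    "p3": (60.0, 0.25),  # (w3, k3)
}


def compute_sub_scores(matches: Dict[str, str],
                       table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each sub-data name to score * weight of its matched target sample."""
    return {sub: table[target][0] * table[target][1]
            for sub, target in matches.items()}


# Sub-data a matched p3 and sub-data b matched p1, as in the example above.
sub_scores = compute_sub_scores({"a": "p3", "b": "p1"}, SAMPLE_SCORES)
```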
Step S23, generating the data score of the structured data according to the sub-scores of the types of sub-data.
Further, once the sub-scores of all types of sub-data have been calculated, the data score of the structured data can be obtained from them. So that the data score characterizes the operating state of the enterprise accurately, the data score can be defined as a set of several values that includes at least the minimum, the maximum and the average of the sub-scores. The sub-scores of the types of sub-data are compared to select the minimum and the maximum, and the sub-scores are averaged to obtain the average; the minimum, maximum and average are then taken together as the data score of the structured data. For example, for the structured data above containing sub-data a and b, the sub-scores k3*w3 and k1*w1 are compared; supposing the maximum is k1*w1 and the minimum is k3*w3, the two sub-scores are averaged to give (k1*w1+k3*w3)/2, and k1*w1, k3*w3 and (k1*w1+k3*w3)/2 together form the data score of the structured data.
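The composition of the data score can be sketched as below, assuming the sub-scores are plain numbers. The `DataScore` type and the function name are hypothetical; the description only requires that the data score contain at least the minimum, maximum and average.

```python
from typing import NamedTuple, Sequence


class DataScore(NamedTuple):
    """Data score as a set of values: minimum, maximum and average sub-score."""
    minimum: float
    maximum: float
    average: float


def build_data_score(sub_scores: Sequence[float]) -> DataScore:
    """Combine the sub-scores of all types of sub-data into one data score."""
    if not sub_scores:
        raise ValueError("at least one sub-score is required")
    return DataScore(
        minimum=min(sub_scores),
        maximum=max(sub_scores),
        average=sum(sub_scores) / len(sub_scores),
    )


# Made-up sub-scores standing in for k3*w3 = 15.0 and k1*w1 = 40.0.
score = build_data_score([15.0, 40.0])
```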
Step S30, supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
Furthermore, a correspondence between data scores and operating states is set in advance. The data score obtained for the structured data is compared against this correspondence to find the target data score in the correspondence that is consistent with it, and the operating state associated with that target data score is taken as the actual enterprise state of the enterprise to be checked, which represents its current real operating state. The registered enterprise state of the enterprise to be checked is then supervised against this actual enterprise state to determine whether the registered state lags behind in being updated. If it does, the enterprise state is judged invalid; if it does not, the enterprise state is judged valid. Specifically, the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid includes:
Step S31, determining, according to a preset correspondence between combined scores and states, the target state corresponding to the combination formed by the maximum value, the minimum value and the average value in the data score;
Understandably, because the data score is a set of several values including the maximum, the minimum and the average, the correspondence between data scores and operating states is set up by grouping several score ranges into a combined score and presetting the state that each combined score corresponds to. After the data score is determined, the correspondence is called, and the combination formed by the maximum, minimum and average in the data score is compared with the combined scores in the correspondence to judge whether each value of the combination falls within the score ranges of a combined score. If all values fall within the respective ranges of one combined score, that is, the maximum, minimum and average fall within the maximum range, minimum range and average range of that combined score at the same time, the state corresponding to that combined score is looked up in the correspondence and taken as the target state corresponding to the data score, which represents the current actual operating state of the enterprise to be checked.
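The range lookup described in this step can be sketched as follows. The correspondence table, its range bounds and the state names are made up for illustration; inclusive bounds and first-match-wins ordering are assumptions, since the description does not say how range boundaries are handled.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]
# Each entry: (state, minimum range, maximum range, average range).
CombinedScore = Tuple[str, Range, Range, Range]

# Hypothetical correspondence between combined scores and states.
CORRESPONDENCE: List[CombinedScore] = [
    ("operating", (30.0, 100.0), (60.0, 100.0), (50.0, 100.0)),
    ("not operating", (0.0, 30.0), (0.0, 60.0), (0.0, 50.0)),
]


def in_range(value: float, bounds: Range) -> bool:
    """Inclusive range check (assumed boundary handling)."""
    return bounds[0] <= value <= bounds[1]


def find_target_state(minimum: float, maximum: float, average: float,
                      table: List[CombinedScore]) -> Optional[str]:
    """Return the state of the first combined score whose three ranges contain
    the minimum, maximum and average at the same time, or None."""
    for state, min_range, max_range, avg_range in table:
        if (in_range(minimum, min_range)
                and in_range(maximum, max_range)
                and in_range(average, avg_range)):
            return state
    return None


target_state = find_target_state(40.0, 90.0, 65.0, CORRESPONDENCE)
```

The target state found this way would then be compared with the registered state of the enterprise to be checked.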
Step S32, looking up the registered state of the enterprise to be checked, and supervising whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
Further, the registered state of the enterprise to be checked is looked up and compared with the target state to judge whether the two are consistent. If they are consistent, the state registered for the enterprise to be checked agrees with its actual operating state, and the enterprise state of the enterprise to be checked is judged valid. If they are inconsistent, the registered state disagrees with the actual operating state, meaning that the registered enterprise state lags behind in being updated, and the enterprise state of the enterprise to be checked is judged invalid. In this way, the operating state of the enterprise to be checked is supervised using its real data from various aspects, which ensures that the supervision is authentic and effective. At the same time, supervision requires only that the structured data be obtained, which keeps it timely and efficient.
In the enterprise state supervision method of this embodiment, preset sample data is first transmitted to an initial model, and the initial model is trained based on a federated learning algorithm to generate a state data model. The structured data corresponding to the enterprise state in the enterprise to be checked is then obtained and transmitted to the state data model to generate the data score of the structured data, and whether the enterprise state of the enterprise to be checked is valid is supervised according to that data score. The preset sample data consists of the various types of data that characterize the state of each enterprise and is real, valid data of those enterprises. Training jointly on the preset sample data of a large number of enterprises through the federated learning algorithm enlarges the training sample and makes the generated state data model more accurate. Supervising the validity of the state of the enterprise to be checked through the state data model therefore reflects the state of the enterprise using its various types of real data and avoids relying on queries of the industrial and commercial registration. This ensures the authenticity and accuracy of the supervised enterprise state as well as the timeliness of supervision, which facilitates timely and efficient supervision.
Further, based on the first embodiment of the enterprise state supervision method of the present application, a second embodiment of the enterprise state supervision method of the present application is proposed. In the second embodiment, the step of obtaining the structured data corresponding to the enterprise state in the enterprise to be checked includes:
Step S24, collecting enterprise text data of the enterprise to be checked, extracting from the enterprise text data the text data corresponding to the enterprise state, and classifying the extracted text data to obtain multiple types of state text data;
Understandably, an enterprise's operations involve a large amount of data. To obtain structured data that characterizes the operating state of the enterprise to be checked, the various types of data generated during its operations must first be obtained, and the data related to the operating state must then be extracted from them and processed into structured data. Specifically, the supervision server connects to the enterprise to be checked and collects its enterprise text data. The enterprise text data is the various types of data involved in the operations of the enterprise to be checked and includes at least data in text form on procurement, arrears, information disclosure, industrial and commercial registration, supervision, sales, training, organizational structure and staff composition.
Further, the collected enterprise text data is filtered according to the data indicators that characterize the operating condition of a catering enterprise, so that the text data corresponding to the enterprise state is extracted. The extracted text data is then classified to obtain multiple types of state text data corresponding to the data indicators. That is, the data indicator to which each piece of extracted text data belongs is determined, and data belonging to the same data indicator is grouped into the same type, forming multiple types of state text data such as sales and procurement.
Step S25, extracting the state keywords from each type of the state text data, and converting the extracted state keywords according to a preset data format to obtain the structured data.
Furthermore, the state keywords in each type of state text data are extracted, so that the operating state of the enterprise to be checked in each aspect is characterized by the corresponding type of state keywords. A preset data format is set in advance according to the required data structure, and each type of extracted state keywords is converted according to it, so that every type of state keyword takes the data form defined by the preset data format and multiple types of structured data are obtained. For example, suppose the preset data format for procurement data is: procurement category - procurement time - procurement data. After the state keywords of each purchase are extracted from the procurement text data, they are arranged according to the preset data format, and the time keyword among them is checked against the procurement time format required by the preset data format. If the required procurement time format is XXXX年-XX月-XX日 (year-month-day) and the time keyword is in the format XX.XX.XX, the time formats are judged inconsistent, and the time keyword is converted to the required format while the state keywords are arranged, so as to form structured data that meets the preset data format.
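The time-format conversion in this example can be sketched as follows. The sketch assumes that XX.XX.XX means two-digit year, month and day, and that the year belongs to the century starting at 2000; neither is stated in the description. The function names and the record layout are hypothetical.

```python
import re


def convert_purchase_time(raw: str, century: int = 2000) -> str:
    """Convert a 'YY.MM.DD' time keyword to the 'XXXX年-XX月-XX日' format.

    The YY.MM.DD reading and the default century are assumptions.
    """
    match = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d{2})", raw)
    if match is None:
        raise ValueError(f"unexpected time keyword format: {raw!r}")
    year, month, day = (int(group) for group in match.groups())
    return f"{century + year:04d}年-{month:02d}月-{day:02d}日"


def to_procurement_record(category: str, raw_time: str, data: str) -> dict:
    """Arrange state keywords as: procurement category - time - data."""
    return {
        "category": category,
        "time": convert_purchase_time(raw_time),
        "data": data,
    }


record = to_procurement_record("flour", "20.05.18", "50 kg")
```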
Further, the step of extracting the state keywords from each type of the state text data includes:
Step S251, performing paragraph segmentation and sentence segmentation on each type of the state text data to generate multiple types of clauses to be recognized, and removing the invalid clauses from the multiple types of clauses to be recognized;
Further, in this embodiment the state keywords are extracted from each type of state text data separately, and this separate processing may be serial or parallel. Specifically, each type of state text data is first split into paragraphs to obtain its text data segments, and each text data segment is then split into sentences to obtain multiple text sentences as the clauses to be recognized. After that, the clauses to be recognized that are unrelated to the operating state are identified and removed as invalid clauses, which ensures that the state keywords extracted from the clauses to be recognized are all related to the operating state.
Step S252, performing word segmentation on each type of the clauses to be recognized that remain after the invalid clauses are removed, to generate multiple types of segmented words to be recognized;
Furthermore, word segmentation is performed on each type of clauses to be recognized that remain after the invalid clauses are removed: each clause is divided into words according to the logic of the language, which yields the segmented words to be recognized for each type of state text data.
Step S253, removing, from the multiple types of segmented words to be recognized, the noise words that are irrelevant to the enterprise state, to obtain the state keywords in the multiple types of state text data.
Further, words related to the operating state are collected in advance into a dictionary, and each segmented word to be recognized is compared with the words in the dictionary to judge whether it is present there. If it is present in the dictionary, the segmented word is judged to be a valid word related to the operating state; if it is not present in the dictionary, it is judged to be an invalid word unrelated to the operating state. After all invalid words in each type of segmented words to be recognized have been found, they are removed as noise words irrelevant to the enterprise state, which leaves the state keywords of each type of state text data. The state keywords of each type are then converted according to the preset data format to obtain structured data that characterizes the actual state of the enterprise from various aspects. Because every item of sub-data in the structured data uses the same preset data format, all sub-data can be processed in the same way, which improves processing efficiency.
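Steps S251 to S253 can be sketched end to end as follows. The sketch makes several assumptions that the description leaves open: English text with regular-expression tokenization stands in for Chinese word segmentation, a clause is treated as invalid when it contains no dictionary word, and the dictionary contents are invented. All names are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical dictionary of words related to the operating state.
STATE_DICTIONARY: Set[str] = {"purchased", "sold", "flour", "invoice"}


def split_clauses(text: str) -> List[str]:
    """S251: split text into paragraphs, then into non-empty clauses."""
    clauses = []
    for paragraph in text.split("\n"):
        for clause in re.split(r"[.!?;。!?;]", paragraph):
            clause = clause.strip()
            if clause:
                clauses.append(clause)
    return clauses


def tokenize(clause: str) -> List[str]:
    """S252: stand-in word segmentation (lower-cased word tokens)."""
    return re.findall(r"\w+", clause.lower())


def extract_state_keywords(text: str, dictionary: Set[str]) -> List[str]:
    """S251-S253: drop invalid clauses, segment the rest, remove noise words."""
    keywords = []
    for clause in split_clauses(text):
        tokens = tokenize(clause)
        if not any(token in dictionary for token in tokens):
            continue  # invalid clause: nothing related to the operating state
        keywords.extend(token for token in tokens if token in dictionary)
    return keywords


text = "The shop purchased flour today. The weather was nice.\nAn invoice was issued."
keywords = extract_state_keywords(text, STATE_DICTIONARY)
```

For Chinese text, the `tokenize` stand-in would need to be replaced by a proper word segmenter.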
In this embodiment, state keywords are extracted from the various types of enterprise text data collected from the enterprise to be checked and are turned into structured data that characterizes the actual state of the enterprise. Because the enterprise text data is the enterprise's real data and characterizes its operating state from various aspects, the structured data generated from it reflects the real state of the enterprise from multiple aspects, which improves the validity and accuracy of the determined actual state of the enterprise.
Further, based on the first or second embodiment of the enterprise state supervision method of the present application, a third embodiment of the method is proposed. In the third embodiment, after the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid, the method includes:
Step S40: transmitting the judgment score corresponding to the enterprise to be checked to the state data model, and determining whether the judgment score matches the data score;
This embodiment provides an optimization mechanism for the state data model. Specifically, a human reviewer scores the enterprise to be checked according to the enterprise state characterized by the structured data, and the resulting judgment score corresponding to the enterprise to be checked is transmitted to the supervision server. The supervision server transmits the judgment score to the state data model and determines whether the data score generated by the state data model matches it. A match does not require the two scores to be identical: if the numerical difference between the data score and the judgment score falls within a certain range, the two are close and are deemed to match; otherwise the two differ substantially and are deemed not to match.
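The tolerance-based matching described above can be sketched as follows. The function name and the default tolerance of 5 points are assumptions made for illustration; the application only states that the difference must fall within "a certain range".

```python
def scores_match(data_score: float, judgment_score: float, tolerance: float = 5.0) -> bool:
    """Return True when the two scores differ by no more than the assumed tolerance."""
    return abs(data_score - judgment_score) <= tolerance
```

A caller would store the data score with the structured data when this returns True, and start the sample update path otherwise.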
Step S50: if the judgment score matches the data score, storing the data score in association with the structured data;
Further, if the judgment score is determined to match the data score, the state data model is currently processing structured data accurately and does not need to be optimized. In this case, the data score and the structured data are associated with each other and stored as a basis for enterprise state supervision.
Step S60: if the judgment score does not match the data score, searching the preset sample data for target sample data that matches the structured data;
Furthermore, if the judgment score is determined not to match the data score, the state data model is currently processing structured data with low accuracy and needs to be optimized. The data score is generated from the preset sample data in the state data model that is similar to the structured data of the enterprise to be checked; since the data score generated from that similar preset sample data is inaccurate, optimizing the state data model amounts to processing that similar preset sample data. Specifically, the structured data is compared with each item of the preset sample data to find the items whose similarity to the structured data exceeds a preset similarity threshold. The items found are taken as the target sample data matching the structured data, that is, the preset sample data that is similar to the structured data of the enterprise to be checked and was used to generate the data score.
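The application does not name a similarity measure. Purely as an illustration, the sketch below assumes that both the structured data and each preset sample can be reduced to a set of keywords and uses Jaccard similarity; the `keywords` field, the function names, and the default threshold are hypothetical.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (assumed measure, not from the source)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_target_samples(
    structured_keywords: Iterable[str],
    preset_samples: List[dict],
    threshold: float = 0.5,
) -> List[dict]:
    """Return the preset samples whose similarity exceeds the preset threshold."""
    target = set(structured_keywords)
    return [
        sample
        for sample in preset_samples
        if jaccard(target, set(sample["keywords"])) > threshold
    ]
```

Any other similarity measure with a comparable threshold could be substituted without changing the surrounding flow.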
Step S70: removing the target sample data and the score label corresponding to the target sample data, and generating the judgment score as the to-be-trained score label of the structured data;
Further, the state data model generates the data score from the score label carried by the target sample data, which represents a score and a weight. When the generated data score is inaccurate and the state data model is being optimized, the target sample data and the score label it carries are removed from the preset sample data and are no longer used as training samples for the state data model. Meanwhile, because the judgment score is taken to be accurate, it is generated as the to-be-trained score label of the structured data, to be used for training and thereby optimizing the state data model.
Step S80: updating the preset sample data according to the structured data and the to-be-trained score label, and performing optimization training on the state data model based on the updated preset sample data.
Furthermore, the structured data is converted into sample data according to the data indicators, and the converted sample data together with the to-be-trained score label is added as new preset sample data, thereby updating the preset sample data. The state data model is then trained for optimization on the updated preset sample data to improve its accuracy.
It should be noted that in this embodiment the target sample data may alternatively be given a new score label and reused for training; that is, only the score label corresponding to the target sample data is removed, while the target sample data itself is retained. A new score label for the target sample data is set according to the judgment score, and the target sample data together with its new score label is used as new preset sample data for optimization training of the state data model, improving its accuracy.
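The two update strategies described in steps S70 and S80 and in the note above can be sketched as follows. The sample layout (`id`, `features`, `label`) and both function names are assumptions made for illustration only.

```python
from typing import List, Set


def replace_samples(
    preset_samples: List[dict],
    target_ids: Set[int],
    structured_data: dict,
    judgment_score: float,
) -> List[dict]:
    """Drop the target samples and add the structured data labelled with the judgment score."""
    kept = [s for s in preset_samples if s["id"] not in target_ids]
    kept.append(
        {
            "id": structured_data["id"],
            "features": structured_data["features"],
            "label": judgment_score,
        }
    )
    return kept


def relabel_samples(
    preset_samples: List[dict], target_ids: Set[int], judgment_score: float
) -> List[dict]:
    """Alternative: keep the target samples but replace their score labels."""
    return [
        {**s, "label": judgment_score} if s["id"] in target_ids else s
        for s in preset_samples
    ]
```

Both functions return a new list rather than mutating the input, so the previous sample set remains available if retraining needs to be rolled back.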
Further, based on the first, second, or third embodiment of the enterprise state supervision method of the present application, a fourth embodiment of the method is proposed. In the fourth embodiment, the step of transmitting preset sample data to an initial model and training the initial model based on a federated learning algorithm to generate a state data model includes:
Step S11: obtaining positive sample data corresponding to preset positive field names and negative sample data corresponding to preset negative field names, transmitting each item of positive sample data and each item of negative sample data to the initial model as the preset sample data, and training the initial model to generate a model gradient;
In this embodiment, the initial model is trained by federated training based on a federated learning algorithm to generate the state data model. The federated training involves at least two regional parties; each party has its own initial model, and the preset sample data used for training by the parties is mutually independent. Every party trains its own initial model in the same way, so this embodiment describes the process for any one of them. Specifically, the preset sample data includes positive sample data, which represents a valid enterprise state, and negative sample data, which represents an invalid enterprise state. Preset positive field names identifying positive samples and preset negative field names identifying negative samples are set in advance. After a large amount of data on the data indicators has been collected from the various catering enterprises of that party whose enterprise state has already been determined, the collected data is filtered by the preset positive and negative field names to obtain the positive sample data corresponding to the preset positive field names and the negative sample data corresponding to the preset negative field names. Different scores and weights are then set for each item of positive sample data and for each item of negative sample data, after which all positive and negative sample data is transmitted to the initial model as the preset sample data for training, generating the model gradient that the party uses to update the model parameters.
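The filtering and labelling of samples in step S11 might look like the following sketch. The record layout (`enterprise`, `field`), the field names used in the usage note, and the score table are hypothetical; the application does not specify them.

```python
from typing import Dict, List, Set, Tuple


def split_samples(
    records: List[dict], positive_fields: Set[str], negative_fields: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Filter collected records into positive and negative samples by field name."""
    positive = [r for r in records if r["field"] in positive_fields]
    negative = [r for r in records if r["field"] in negative_fields]
    return positive, negative


def attach_labels(
    samples: List[dict], score_table: Dict[str, Tuple[float, float]]
) -> List[dict]:
    """Attach the assumed (score, weight) pair configured for each sample's field name."""
    return [
        {
            **s,
            "score": score_table[s["field"]][0],
            "weight": score_table[s["field"]][1],
        }
        for s in samples
    ]
```

For example, a record whose field name is `tax_paid` could be treated as a positive sample and one whose field name is `license_revoked` as a negative sample; records matching neither set are simply discarded.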
Step S12: transmitting the model gradient to the coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
Further, to coordinate the training of the initial models across the regional parties, the federated training process includes a coordinator corresponding to the federated learning algorithm. The coordinator may be any one of the regional parties or a third party independent of them. The generated model gradient is transmitted to the coordinator, which aggregates it with the other model gradients generated by the other parties based on the federated learning algorithm. The aggregation may be configured as required, for example as mean aggregation or weighted aggregation. The resulting return gradient is sent back to the supervision server of each regional party.
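The mean and weighted aggregation options mentioned above can be illustrated with a short sketch. Gradients are represented here as plain lists of floats, and the function name and the choice of weights (for example, each party's sample count) are assumptions rather than details taken from the application.

```python
from typing import List, Optional, Sequence


def aggregate_gradients(
    gradients: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Aggregate per-party gradients; equal weights give the mean, others a weighted mean."""
    if not gradients:
        raise ValueError("at least one gradient is required")
    if weights is None:
        weights = [1.0] * len(gradients)
    if len(weights) != len(gradients):
        raise ValueError("one weight per gradient is required")
    total = sum(weights)
    dim = len(gradients[0])
    return [
        sum(w * g[i] for g, w in zip(gradients, weights)) / total
        for i in range(dim)
    ]
```

Only gradients cross the party boundary in this sketch; the underlying sample data never leaves the party that produced it.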
Step S13: receiving the return gradient from the coordinator, and continuing to train the initial model according to the return gradient until the initial model converges, thereby obtaining the state data model.
Furthermore, after the return gradient is received from the coordinator, the initial model continues to be trained according to it. After each round of training, whether the initial model has converged is determined. If it has converged, the trained initial model can generate data scores accurately and is taken as the state data model. Otherwise, training continues until convergence, yielding the state data model.
It should be noted that whether the initial model has converged can be determined by a convergence function in the initial model. After each round of training, the test sample data is processed with the model parameters obtained from training to produce a processing result. The convergence function calculates the loss value between the processing result and the expected result. Once the loss value has stayed below a preset value for several consecutive rounds, the initial model is deemed to have converged and training stops; otherwise training continues.
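The convergence criterion described above, a loss that stays below a preset value for several consecutive rounds, can be sketched as follows. The function name, the `patience` parameter, and its default of three rounds are illustrative assumptions.

```python
from typing import Sequence


def has_converged(loss_history: Sequence[float], threshold: float, patience: int = 3) -> bool:
    """Return True when the last `patience` losses are all below the preset threshold."""
    if len(loss_history) < patience:
        return False
    return all(loss < threshold for loss in loss_history[-patience:])
```

A training loop would append the loss computed on the test sample data after each round and stop as soon as this check returns True.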
In this embodiment, the state data model is obtained by federated training of the initial model with a federated learning algorithm. The preset sample data of each regional party is never transmitted outside that party, so data privacy is protected while the number of training samples is enlarged. This improves the training of the state data model and makes enterprise state supervision based on it more accurate.
Further, the present application also provides an enterprise state supervision apparatus.
Referring to FIG. 3, FIG. 3 is a schematic diagram of the functional modules of the first embodiment of the enterprise state supervision apparatus of the present application. The enterprise state supervision apparatus includes:
a generation module 10, configured to transmit preset sample data to an initial model and train the initial model based on a federated learning algorithm to generate a state data model;
an acquisition module 20, configured to acquire structured data corresponding to the enterprise state of an enterprise to be checked, and transmit the structured data to the state data model to generate a data score of the structured data;
a supervision module 30, configured to supervise, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
In the enterprise state supervision apparatus of this embodiment, the generation module 10 first transmits the preset sample data to the initial model and trains the initial model based on the federated learning algorithm to generate the state data model. The acquisition module 20 then acquires the structured data corresponding to the enterprise state of the enterprise to be checked and transmits it to the state data model to generate the data score of the structured data. Finally, the supervision module 30 supervises, according to the data score, whether the enterprise state of the enterprise to be checked is valid. The preset sample data is the various types of data that characterize the state of each enterprise and is real, valid data of those enterprises. Training jointly on the preset sample data of a large number of enterprises through the federated learning algorithm enlarges the training sample size and makes the generated state data model more accurate. Supervising the validity of the enterprise state through the state data model therefore reflects the state of the enterprise to be checked using its various real data and avoids relying on queries of the industrial and commercial registration for supervision. This ensures the authenticity and accuracy of the supervised enterprise state and also the timeliness of supervision, which facilitates timely and efficient supervision.
Further, the acquisition module 20 includes:
a collection unit, configured to collect enterprise text data of the enterprise to be checked, and to extract from the enterprise text data the text data corresponding to the enterprise state and classify it, obtaining multiple categories of state text data;
a conversion unit, configured to extract the state keywords from each of the multiple categories of state text data, and to convert the extracted state keywords into a preset data format to obtain the structured data.
Further, the conversion unit is also configured to:
perform paragraph segmentation and sentence segmentation on each of the multiple categories of state text data to generate multiple categories of clauses to be recognized, and remove the invalid clauses from them;
perform word segmentation on each of the multiple categories of clauses to be recognized that remain after the invalid clauses are removed, generating multiple categories of segmented words to be recognized;
remove, from the multiple categories of segmented words to be recognized, the noise words unrelated to the enterprise state, obtaining the state keywords in the multiple categories of state text data.
Further, the acquisition module 20 also includes:
a first transmission unit, configured to transmit the structured data to the state data model and determine the target sample data matching each category of sub-data in the structured data;
a first determination unit, configured to determine the sub-score of each category of sub-data according to the score and weight value corresponding to each item of target sample data;
a generation unit, configured to generate the data score of the structured data according to the sub-scores of the categories of sub-data.
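The application does not state how a sub-score is computed from the score and weight value of the matched target sample data. The sketch below assumes, purely for illustration, that a sub-score is the product of the two and that the data score is kept as the list of sub-scores; the function names and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def compute_sub_scores(matched: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each sub-data category to score * weight of its matched target sample (assumed rule)."""
    return {category: score * weight for category, (score, weight) in matched.items()}


def compute_data_score(sub_scores: Dict[str, float]) -> List[float]:
    """Collect the sub-scores into the data score, assumed here to be a list of values."""
    return list(sub_scores.values())
```

Keeping the individual values, rather than a single sum, leaves room for later steps that examine the maximum, minimum, and average of the data score.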
Further, the supervision module 30 also includes:
a second determination unit, configured to determine, according to a preset correspondence between combined scores and states, the target state corresponding to the combination formed by the maximum, minimum, and average values in the data score;
a supervision unit, configured to look up the registered state corresponding to the enterprise to be checked, and to supervise whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
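The state lookup and consistency check performed by these two units can be sketched as follows. The preset correspondence is modelled here as an ordered list of (predicate, state) rules, which is an assumption; the application does not describe how the correspondence is stored, and the function names and the `"unknown"` fallback are likewise hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[Callable[[float, float, float], bool], str]


def target_state(scores: Sequence[float], rules: List[Rule]) -> str:
    """Return the state of the first rule satisfied by (max, min, mean) of the scores."""
    hi, lo, avg = max(scores), min(scores), mean(scores)
    for predicate, state in rules:
        if predicate(hi, lo, avg):
            return state
    return "unknown"


def registration_is_valid(
    scores: Sequence[float], registered_state: str, rules: List[Rule]
) -> bool:
    """The enterprise state is treated as valid when it equals the inferred target state."""
    return target_state(scores, rules) == registered_state
```

A rule list might, for instance, map a high average with a sufficiently high minimum to an operating state and everything else to an abnormal state.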
Further, the enterprise state supervision apparatus also includes:
a judgment module, configured to transmit the judgment score corresponding to the enterprise to be checked to the state data model and determine whether the judgment score matches the data score;
a storage module, configured to store the data score in association with the structured data if the judgment score matches the data score;
a search module, configured to search the preset sample data for target sample data that matches the structured data if the judgment score does not match the data score;
a removal module, configured to remove the target sample data and the score label corresponding to the target sample data, and to generate the judgment score as the to-be-trained score label of the structured data;
an update module, configured to update the preset sample data according to the structured data and the to-be-trained score label, and to perform optimization training on the state data model based on the updated preset sample data.
Further, the generation module 10 also includes:
an obtaining unit, configured to obtain positive sample data corresponding to preset positive field names and negative sample data corresponding to preset negative field names, transmit each item of positive sample data and each item of negative sample data to the initial model as the preset sample data, and train the initial model to generate a model gradient;
a first transmission unit, configured to transmit the model gradient to the coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
a receiving unit, configured to receive the return gradient from the coordinator and continue training the initial model according to the return gradient until the initial model converges, thereby obtaining the state data model.
The specific implementation of the enterprise state supervision apparatus of the present application is substantially the same as the embodiments of the enterprise state supervision method described above and is not repeated here.
In addition, an embodiment of the present application provides a computer-readable storage medium, which may be non-volatile or volatile.
The computer-readable storage medium stores an enterprise state supervision program which, when executed by a processor, implements the steps of the enterprise state supervision method described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the enterprise state supervision method described above and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or system. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. An enterprise state supervision method, wherein the enterprise state supervision method comprises the following steps:
    transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
    acquiring structured data corresponding to the enterprise state of an enterprise to be checked, and transmitting the structured data to the state data model to generate a data score of the structured data;
    supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
  2. The enterprise state supervision method according to claim 1, wherein the step of acquiring structured data corresponding to the enterprise state of the enterprise to be checked comprises:
    collecting enterprise text data of the enterprise to be checked, and extracting from the enterprise text data the text data corresponding to the enterprise state and classifying it, to obtain multiple categories of state text data;
    extracting the state keywords from each of the multiple categories of state text data, and converting the extracted state keywords into a preset data format to obtain the structured data.
  3. The enterprise state supervision method according to claim 2, wherein the step of extracting the state keywords from each of the multiple categories of state text data comprises:
    performing paragraph segmentation and sentence segmentation on each of the multiple categories of state text data to generate multiple categories of clauses to be recognized, and removing the invalid clauses from them;
    performing word segmentation on each of the multiple categories of clauses to be recognized that remain after the invalid clauses are removed, to generate multiple categories of segmented words to be recognized;
    removing, from the multiple categories of segmented words to be recognized, the noise words unrelated to the enterprise state, to obtain the state keywords in the multiple categories of state text data.
  4. The enterprise state supervision method according to claim 1, wherein the step of transmitting the structured data to the state data model to generate a data score of the structured data comprises:
    transmitting the structured data to the state data model, and determining the target sample data matching each category of sub-data in the structured data;
    determining the sub-score of each category of sub-data according to the score and weight value corresponding to each item of target sample data;
    generating the data score of the structured data according to the sub-scores of the categories of sub-data.
  5. The enterprise state supervision method according to claim 1, wherein the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid comprises:
    determining, according to a preset correspondence between combined scores and states, the target state corresponding to the combination formed by the maximum, minimum, and average values in the data score;
    looking up the registered state corresponding to the enterprise to be checked, and supervising whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
  6. The enterprise state supervision method according to any one of claims 1-5, wherein, after the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid, the method comprises:
    transmitting the judgment score corresponding to the enterprise to be checked to the state data model, and determining whether the judgment score matches the data score;
    if the judgment score matches the data score, storing the data score in association with the structured data;
    if the judgment score does not match the data score, searching the preset sample data for target sample data that matches the structured data;
    removing the target sample data and the score label corresponding to the target sample data, and generating the judgment score as the to-be-trained score label of the structured data;
    updating the preset sample data according to the structured data and the to-be-trained score label, and performing optimization training on the state data model based on the updated preset sample data.
  7. The enterprise state supervision method according to any one of claims 1 to 5, wherein the step of transmitting preset sample data to an initial model and training the initial model based on a federated learning algorithm to generate a state data model comprises:
    acquiring positive sample data corresponding to preset positive field names and negative sample data corresponding to preset negative field names, transmitting the positive sample data and the negative sample data to the initial model as the preset sample data, and training the initial model to generate a model gradient;
    transmitting the model gradient to a coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
    receiving the return gradient returned by the coordinator, and continuing to train the initial model according to the return gradient until the initial model converges, thereby obtaining the state data model.
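The gradient exchange in claim 7 can be illustrated with a small single-process simulation. The sketch makes several assumptions that the claim does not state: the model is a linear model trained with a squared-error loss, the coordinator aggregates by plain element-wise averaging, and convergence is declared when every component of the returned gradient falls below `tol`. All function names are hypothetical, and a real deployment would exchange gradients over a network, typically in encrypted form.

```python
def local_gradient(weights, samples):
    """Squared-error gradient of a linear model on one party's own samples."""
    grad = [0.0] * len(weights)
    for features, label in samples:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += 2 * error * x / len(samples)
    return grad


def aggregate(gradients):
    """Coordinator side: element-wise average of the parties' gradients (an assumption)."""
    return [sum(column) / len(gradients) for column in zip(*gradients)]


def federated_train(parties, dim, lr=0.1, tol=1e-6, max_rounds=10000):
    """Repeat local gradient -> aggregation -> update until the returned gradient is small."""
    weights = [0.0] * dim
    for _ in range(max_rounds):
        returned = aggregate([local_gradient(weights, samples) for samples in parties])
        if max(abs(g) for g in returned) < tol:
            break  # treated as convergence in this sketch
        weights = [w - lr * g for w, g in zip(weights, returned)]
    return weights
```

Each party's raw samples stay inside `local_gradient`; only gradients reach `aggregate`, which is the property the federated setup relies on.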
  8. An enterprise state supervision apparatus, comprising:
    a generating module, configured to transmit preset sample data to an initial model and train the initial model based on a federated learning algorithm to generate a state data model;
    an acquiring module, configured to acquire structured data of an enterprise to be checked that corresponds to an enterprise state, and transmit the structured data to the state data model to generate a data score of the structured data;
    a supervision module, configured to supervise, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
  9. An enterprise state supervision device, comprising a memory, a processor, and an enterprise state supervision program stored in the memory and executable on the processor, wherein the enterprise state supervision program, when executed by the processor, implements the following steps:
    transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
    acquiring structured data of an enterprise to be checked that corresponds to an enterprise state, and transmitting the structured data to the state data model to generate a data score of the structured data;
    supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
  10. The enterprise state supervision device according to claim 9, wherein the step of acquiring structured data of the enterprise to be checked that corresponds to the enterprise state comprises:
    collecting enterprise text data of the enterprise to be checked, extracting from the enterprise text data the text data corresponding to the enterprise state, and classifying it to obtain multiple classes of state text data;
    extracting state keywords from each class of the state text data, and converting the extracted state keywords of each class according to a preset data format to obtain the structured data.
  11. The enterprise state supervision device according to claim 10, wherein the step of extracting state keywords from each class of the state text data comprises:
    performing paragraph segmentation and clause splitting on each class of the state text data to generate multiple classes of clauses to be recognized, and removing invalid clauses from the clauses to be recognized;
    performing word segmentation on each class of the clauses to be recognized that remain after the invalid clauses are removed, to generate multiple classes of segmented words to be recognized;
    removing, from the segmented words to be recognized, noise words unrelated to the enterprise state, to obtain the state keywords of each class of the state text data.
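The keyword extraction steps of claim 11 can be sketched for a single class of state text as below. The sketch is a stand-in rather than the claimed implementation: it tokenizes with a simple regular expression, which suits space-delimited text, whereas Chinese text would need a dedicated word segmenter. It also assumes that an "invalid clause" is one with fewer than `min_clause_len` whitespace-separated words, a criterion the claim leaves open. The function name and parameters are hypothetical.

```python
import re


def extract_state_keywords(text, noise_words, min_clause_len=2):
    """Paragraphs -> clauses -> drop invalid clauses -> tokens -> drop noise words."""
    keywords = []
    for paragraph in text.split("\n"):                       # paragraph segmentation
        for clause in re.split(r"[.;!?。;!?]", paragraph):  # clause splitting
            clause = clause.strip()
            # Assumption: clauses with too few words are "invalid" and removed.
            if len(clause.split()) < min_clause_len:
                continue
            for token in re.findall(r"\w+", clause.lower()):  # word segmentation
                if token not in noise_words:                  # noise-word removal
                    keywords.append(token)
    return keywords
```

To cover multiple classes of state text, the function would be called once per class with a noise-word set suited to that class.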
  12. The enterprise state supervision device according to claim 9, wherein the step of transmitting the structured data to the state data model to generate a data score of the structured data comprises:
    transmitting the structured data to the state data model, and determining target sample data matching each class of sub-data in the structured data;
    determining a sub-score for each class of sub-data according to the score and the weight value corresponding to each piece of target sample data;
    generating the data score of the structured data according to the sub-scores of the classes of sub-data.
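The scoring steps of claim 12 can be sketched with a lookup table. The claim does not say how a score and a weight value combine, so this example assumes the sub-score is their product, and assumes that sub-data with no matching sample contributes nothing. The table layout and the name `score_structured_data` are hypothetical.

```python
def score_structured_data(structured, sample_table):
    """Return a sub-score per class of sub-data.

    structured:   {category: value} for one enterprise.
    sample_table: {(category, value): (score, weight)} derived from the sample data.
    """
    sub_scores = {}
    for category, value in structured.items():
        match = sample_table.get((category, value))  # matching target sample data
        if match is None:
            continue  # assumption: unmatched sub-data yields no sub-score
        score, weight = match
        sub_scores[category] = score * weight        # assumption: sub-score = score x weight
    return sub_scores
```

The collection of values in the returned dictionary can then serve as the data score from which a maximum, minimum, and average are taken.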
  13. The enterprise state supervision device according to claim 9, wherein the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid comprises:
    determining, according to a preset correspondence between combined scores and states, a target state corresponding to the combination formed by the maximum value, the minimum value, and the average value of the data score;
    looking up the registered state corresponding to the enterprise to be checked, and supervising whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
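The validity check of claim 13 can be sketched as a rule lookup followed by a comparison. The thresholds and state names in `EXAMPLE_RULES` are invented for illustration only; the claim says merely that a preset correspondence between the (maximum, minimum, average) combination and a state exists. The function names are hypothetical as well.

```python
from statistics import mean

# Hypothetical correspondence: each rule tests (maximum, minimum, average) in order.
EXAMPLE_RULES = [
    (lambda hi, lo, avg: lo >= 60, "active"),
    (lambda hi, lo, avg: avg >= 40, "abnormal"),
    (lambda hi, lo, avg: True, "cancelled"),
]


def target_state(data_scores, rules=EXAMPLE_RULES):
    """Map the max/min/average combination of the data scores to a target state."""
    combo = (max(data_scores), min(data_scores), mean(data_scores))
    for predicate, state in rules:
        if predicate(*combo):
            return state
    return None


def is_state_valid(data_scores, registered_state, rules=EXAMPLE_RULES):
    """The registered state is treated as valid when it agrees with the target state."""
    return target_state(data_scores, rules) == registered_state
```

An enterprise registered as "active" whose scores map to "abnormal" would therefore be flagged for review.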
  14. The enterprise state supervision device according to any one of claims 9 to 13, wherein, after the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid, the steps further comprise:
    transmitting a research and judgment score corresponding to the enterprise to be checked to the state data model, and determining whether the research and judgment score matches the data score;
    if the research and judgment score matches the data score, storing the data score in association with the structured data;
    if the research and judgment score does not match the data score, searching the preset sample data for target sample data that matches the structured data;
    removing the target sample data and the score label corresponding to the target sample data, and taking the research and judgment score as a to-be-trained score label of the structured data;
    updating the preset sample data according to the structured data and the to-be-trained score label, and performing optimization training on the state data model based on the updated preset sample data.
  15. The enterprise state supervision device according to any one of claims 9 to 13, wherein the step of transmitting preset sample data to an initial model and training the initial model based on a federated learning algorithm to generate a state data model comprises:
    acquiring positive sample data corresponding to preset positive field names and negative sample data corresponding to preset negative field names, transmitting the positive sample data and the negative sample data to the initial model as the preset sample data, and training the initial model to generate a model gradient;
    transmitting the model gradient to a coordinator corresponding to the federated learning algorithm, so that the coordinator aggregates the model gradient with at least one other model gradient generated based on the federated learning algorithm to generate a return gradient;
    receiving the return gradient returned by the coordinator, and continuing to train the initial model according to the return gradient until the initial model converges, thereby obtaining the state data model.
  16. A computer-readable storage medium storing an enterprise state supervision program, wherein the enterprise state supervision program, when executed by a processor, implements the following steps:
    transmitting preset sample data to an initial model, and training the initial model based on a federated learning algorithm to generate a state data model;
    acquiring structured data of an enterprise to be checked that corresponds to an enterprise state, and transmitting the structured data to the state data model to generate a data score of the structured data;
    supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid.
  17. The computer-readable storage medium according to claim 16, wherein the step of acquiring structured data of the enterprise to be checked that corresponds to the enterprise state comprises:
    collecting enterprise text data of the enterprise to be checked, extracting from the enterprise text data the text data corresponding to the enterprise state, and classifying it to obtain multiple classes of state text data;
    extracting state keywords from each class of the state text data, and converting the extracted state keywords of each class according to a preset data format to obtain the structured data.
  18. The computer-readable storage medium according to claim 17, wherein the step of extracting state keywords from each class of the state text data comprises:
    performing paragraph segmentation and clause splitting on each class of the state text data to generate multiple classes of clauses to be recognized, and removing invalid clauses from the clauses to be recognized;
    performing word segmentation on each class of the clauses to be recognized that remain after the invalid clauses are removed, to generate multiple classes of segmented words to be recognized;
    removing, from the segmented words to be recognized, noise words unrelated to the enterprise state, to obtain the state keywords of each class of the state text data.
  19. The computer-readable storage medium according to claim 16, wherein the step of transmitting the structured data to the state data model to generate a data score of the structured data comprises:
    transmitting the structured data to the state data model, and determining target sample data matching each class of sub-data in the structured data;
    determining a sub-score for each class of sub-data according to the score and the weight value corresponding to each piece of target sample data;
    generating the data score of the structured data according to the sub-scores of the classes of sub-data.
  20. The computer-readable storage medium according to claim 16, wherein the step of supervising, according to the data score, whether the enterprise state of the enterprise to be checked is valid comprises:
    determining, according to a preset correspondence between combined scores and states, a target state corresponding to the combination formed by the maximum value, the minimum value, and the average value of the data score;
    looking up the registered state corresponding to the enterprise to be checked, and supervising whether the enterprise state of the enterprise to be checked is valid according to the consistency between the target state and the registered state.
PCT/CN2020/106230 2020-05-22 2020-07-31 Enterprise state supervision method, apparatus, and device, and computer readable storage medium WO2021232595A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010445603.8A CN111798352A (en) 2020-05-22 2020-05-22 Enterprise state supervision method, device, equipment and computer readable storage medium
CN202010445603.8 2020-05-22

Publications (1)

Publication Number Publication Date
WO2021232595A1 (en) 2021-11-25

Family

ID=72806510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106230 WO2021232595A1 (en) 2020-05-22 2020-07-31 Enterprise state supervision method, apparatus, and device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111798352A (en)
WO (1) WO2021232595A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170715A (en) * 2017-12-01 2018-06-15 厦门快商通信息技术有限公司 A kind of text classification method for extracting content and text structure processing method
CN109388805A (en) * 2018-10-23 2019-02-26 重庆誉存大数据科技有限公司 A kind of industrial and commercial analysis on altered project method extracted based on entity
WO2019059717A1 (en) * 2017-09-22 2019-03-28 김민준 System for providing associated service matching platform on basis of company analysis
CN110414688A (en) * 2019-07-29 2019-11-05 卓尔智联(武汉)研究院有限公司 Information analysis method, device, server and storage medium
CN110443067A (en) * 2019-07-30 2019-11-12 卓尔智联(武汉)研究院有限公司 Federal model building device, method and readable storage medium storing program for executing based on secret protection
CN110956331A (en) * 2019-12-04 2020-04-03 汇鼎数据科技(上海)有限公司 Method, system and device for predicting operation state of digital factory

Also Published As

Publication number Publication date
CN111798352A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
WO2021103492A1 (en) Risk prediction method and system for business operations
WO2020253358A1 (en) Service data risk control analysis processing method, apparatus and computer device
US11748416B2 (en) Machine-learning system for servicing queries for digital content
JP4920023B2 (en) Inter-object competition index calculation method and system
CN110462604A (en) The data processing system and method for association internet device are used based on equipment
CN110008288A (en) The construction method in the knowledge mapping library for Analysis of Network Malfunction and its application
CN111835585B (en) Inspection method and device for Internet of things equipment, computer equipment and storage medium
WO2021159834A1 (en) Abnormal information processing node analysis method and apparatus, medium and electronic device
US20230139783A1 (en) Schema-adaptable data enrichment and retrieval
CN110880075A (en) Employee departure tendency detection method
CN109977291A (en) Search method, device, equipment and storage medium based on physical knowledge map
CN105389341A (en) Text clustering and analysis method for repeating caller work orders of customer service calls
CN109062936B (en) Data query method, computer readable storage medium and terminal equipment
CN111813956A (en) Knowledge graph construction method and device, and information penetration method and system
CN109947952A (en) Search method, device, equipment and storage medium based on english knowledge map
Wu et al. Hybrid TODIM method with crisp number and probability linguistic term set for urban epidemic situation evaluation
CN112395351A (en) Visual identification group complaint risk method, device, computer equipment and medium
CN104636386A (en) Information monitoring method and device
US20240127143A1 (en) Method, device and storage medium for information processing based on data interaction
CN114003600A (en) Data processing method, system, electronic device and storage medium
US20230367821A1 (en) Machine-learning system for servicing queries for digital content
WO2021128721A1 (en) Method and device for text classification
CN107609203A (en) A kind of data analysis system and method for search engine optimization effect quantitative evaluation
WO2021232595A1 (en) Enterprise state supervision method, apparatus, and device, and computer readable storage medium
WO2024001102A1 (en) Method and apparatus for intelligently identifying family circle in communication industry, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20936420

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/03/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20936420

Country of ref document: EP

Kind code of ref document: A1