CN116260866A

CN116260866A - Government information pushing method and device based on machine learning and computer equipment

Info

Publication number: CN116260866A
Application number: CN202310106241.3A
Authority: CN
Inventors: 杨超; 高文飞; 张�荣; 田野
Original assignee: Beijing Wucoded Technology Co ltd
Current assignee: Beijing Wucoded Technology Co ltd
Priority date: 2023-02-13
Filing date: 2023-02-13
Publication date: 2023-06-13

Abstract

The invention discloses a government affair information pushing method and device based on machine learning, and computer equipment. The method comprises the following steps: acquiring information of a plurality of enterprises in a government affair information pushing area; dividing the plurality of enterprises into a plurality of enterprise groups; collecting government affair information and cleaning it; classifying the cleaned government affair information through similarity calculation, and labeling each type of government affair information; training to obtain a government affair classification model, and inputting a government affair file to be pushed into the government affair classification model to obtain a government affair category classification result for the content of each paragraph in the government affair file to be pushed; and pushing the content of each paragraph in the government affair file to be pushed to the corresponding enterprise group according to the corresponding government affair category. According to the invention, government affair files can be classified accurately and efficiently by using the machine learning model, and after classification the content of each paragraph in a government affair file can be actively pushed to the corresponding enterprises, so that enterprises do not need to manually screen for content useful to themselves, and the labor cost of the enterprises is reduced.

Description

Government information pushing method and device based on machine learning and computer equipment

Technical Field

The present disclosure relates to the field of machine learning technologies, and in particular, to a government affair information pushing method, device and computer equipment based on machine learning.

Background

Government affair information is an important category of information. It is a general term for the messages, reports, data, charts, text materials, audio-visual materials and the like that reflect government work and related matters in government activities. Once a government affair has been decided, it needs to be published so that the public and government offices can learn of it.

With the deepening of national initiatives such as e-government, digital government, Digital China, big data and smart government affairs, government departments at all levels increasingly publicize and push information to the public over the network. Government affair release has expanded from the original paper documents to online release, producing a large number of official government documents and notices.

Government departments have long generated and recorded large amounts of government affair data for their related departments, and these data are an important basis for government management. With the development of big data and the Internet, the government and society place increasingly high demands on mining the value of government affair data within each department's field. According to incomplete statistics, national ministries and commissions have released more than 100,000 documents through government open websites in the last five years.

In the face of an ever-growing volume of government document text data, it is very difficult for existing government affair data systems to classify each government document correctly and efficiently and to actively push it to the corresponding enterprises, so enterprises find it hard to accurately obtain the government affair information that matches them. For example, Chinese patent document CN202210808392.9 discloses a method for pushing government affair information by region, which includes the following steps. Step one: establishing a subordinate relation model between regions, and determining the relevance between the regions through the subordinate relation model. Step two: acquiring published government affair information content, extracting key features from the government affair information content, and inputting the key features into the subordinate relation model. Step three: outputting the influence of the government affair information on each region through the subordinate relation model, marking the part of the government affair information content corresponding to the key features, and pushing that part to terminals in the designated region.

With this method, government affair content can only be pushed to all users in the corresponding region; it cannot be pushed specifically to the enterprises in the region that it matches. Faced with this huge volume of information, enterprises in the region must read the official documents carefully to screen out useful content, which wastes a large amount of human resources and increases the labor cost of the enterprises.

Disclosure of Invention

In view of the above technical problems, a government affair information pushing method, device and computer equipment based on machine learning are provided, so as to solve the technical problem that existing government affair data systems have difficulty accurately and efficiently classifying each government affair file and actively pushing it to the corresponding enterprises.

In order to achieve the above object, the present application provides the following technical solutions:

In a first aspect, a government affair information pushing method based on machine learning includes:

acquiring information of a plurality of enterprises in a government affair information pushing area;

dividing the enterprises into a plurality of enterprise groups according to the information of the enterprises;

collecting government affair information, and cleaning the collected government affair information;

classifying the government information after cleaning through similarity calculation, and labeling each type of government information according to government class labels contained in a pre-established government label library;

inputting the government affair information marked with the labels into a machine learning model, and training to obtain a government affair classification model;

receiving a government affair file to be pushed, and inputting the government affair file to be pushed into the government affair classification model to obtain a government affair class classification result of each paragraph content in the government affair file to be pushed;

and pushing the content of each paragraph in the government affair file to be pushed to a corresponding enterprise group according to the corresponding government affair category.

Optionally, the collecting government affair information includes:

capturing government affair disclosure information in a network through a web crawler technology;

and acquiring data of an external government system through a standard API.

Further optionally, the government affair disclosure information includes policy regulations, the business environment and expert consensus.

Optionally, the algorithm used by the machine learning model is a classification algorithm or a regression algorithm.

Optionally, the similarity calculation is specifically performed using the HanLP algorithm.

Optionally, the method further comprises:

and evaluating the effect of the government classification model by adopting a deep learning evaluation method.

Optionally, the method further comprises:

and periodically updating the collected government affair information and the government affair classification model, and maintaining the government affair classification model.

Optionally, the method further comprises:

and storing the classification result of the government affair file to be pushed into a relational database.

In a second aspect, a government information pushing device based on machine learning includes:

the enterprise information acquisition module is used for acquiring information of a plurality of enterprises in the government affair information pushing area;

the enterprise dividing module is used for dividing the enterprises into a plurality of enterprise groups according to the information of the enterprises;

the government affair information acquisition module is used for acquiring government affair information and cleaning the acquired government affair information;

the similarity calculation module is used for classifying the government information subjected to cleaning through similarity calculation and labeling each type of government information according to government class labels contained in a pre-established government label library;

the model training module is used for inputting the government affair information subjected to label marking into the machine learning model and training to obtain a government affair classification model;

the government affair file classification module to be pushed is used for receiving the government affair file to be pushed, inputting the government affair file to be pushed into the government affair classification model, and obtaining a government affair class classification result of each paragraph content in the government affair file to be pushed;

and the pushing module is used for pushing the content of each paragraph in the government affair file to be pushed to the corresponding enterprise group according to the corresponding government affair category.

In a third aspect, a computer device comprises a memory storing a computer program and a processor implementing the steps of the method of any of the first aspects when the computer program is executed.

The invention has at least the following beneficial effects:

in the government information pushing method based on machine learning provided by the embodiment of the invention, the information of a plurality of enterprises in a government information pushing area is acquired, the enterprises are divided into a plurality of enterprise groups, the government information is collected and cleaned, the government information after cleaning is classified through similarity calculation, each type of government information is labeled, a government classification model is obtained through training, a government file to be pushed is input into the government classification model, a government classification result of each paragraph content in the government file to be pushed is obtained, and each paragraph content in the government file to be pushed is pushed to the corresponding enterprise group according to the corresponding government class; the machine learning model can be utilized to accurately and efficiently classify the government affair files, and after classification, the contents of each section in the government affair files can be actively pushed to corresponding enterprises, so that enterprises do not need to manually read document information to screen contents useful for themselves, the labor cost of the enterprises is reduced, the execution efficiency and quality of the government affair work are effectively improved, and the implementation burden and cost of the government affair work are reduced.

Drawings

FIG. 1 is a schematic flow chart of a government affair information pushing method based on machine learning according to an embodiment of the invention;

FIG. 2 is a block diagram of the module architecture of a government affair information pushing device based on machine learning according to an embodiment of the invention;

FIG. 3 is an internal structural diagram of a computer device according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, a government affair information pushing method based on machine learning is provided, which includes the following steps:

S101, acquiring information of a plurality of enterprises in a government affair information pushing area.

The enterprise information includes the enterprise name, the industry to which the enterprise belongs, the enterprise's business scope, and the like.

S102, dividing the enterprises into a plurality of enterprise groups according to the information of the enterprises.

That is, the enterprises of the same type are divided into the same enterprise group, and government information possibly needed by the enterprises in the same enterprise group is similar or identical.
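The grouping in S102 can be illustrated with a minimal sketch. The patent does not say which field decides the group, so grouping by the industry field is an assumption here, and the `Enterprise` record, its field names and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Enterprise:
    # Hypothetical record mirroring the fields named in S101.
    name: str
    industry: str
    business_scope: str


def group_enterprises(enterprises):
    """Group enterprises so that those of the same type share one group.

    Assumption: the industry field alone defines the type.
    """
    groups = defaultdict(list)
    for ent in enterprises:
        # Normalise the key so "Catering" and " catering " land in one group.
        groups[ent.industry.strip().lower()].append(ent)
    return dict(groups)


enterprises = [
    Enterprise("Acme Foods", "Catering", "restaurants"),
    Enterprise("Bolt Software", "Software", "SaaS development"),
    Enterprise("City Noodles", " catering ", "noodle shops"),
]
groups = group_enterprises(enterprises)
```

A real deployment might instead cluster on several fields at once (industry plus business scope, for example); the dictionary-of-lists shape would stay the same.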

S103, collecting government affair information and cleaning the collected government affair information.

A large amount of government data needs to be collected, dirty data may exist in the collected data, and the data needs to be cleaned to remove useless information and abnormal values.
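As an illustration of the cleaning step, the sketch below removes markup, normalises whitespace, and drops fragments that are too short or duplicated. The patent does not define what counts as dirty data, so these three rules and the `min_chars` threshold are assumptions.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_record(raw):
    """Strip HTML tags and entities, then collapse runs of whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(records, min_chars=10):
    """Clean every record; drop too-short fragments and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in records:
        text = clean_record(raw)
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


records = [
    "<p>Tax relief &amp; subsidies for small enterprises</p>",
    "Tax relief & subsidies   for small enterprises",  # duplicate after cleaning
    "   ",                                             # empty
    "<div>ok</div>",                                   # too short
]
cleaned = clean_corpus(records)
```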

Collecting government affair information includes:

capturing government affair disclosure information in a network through a web crawler technology, the government affair disclosure information comprising policy regulations, the business environment and expert consensus;

and acquiring data of an external government system through a standard API.

That is, data can be collected comprehensively from various data sources using various methods, covering the whole life cycle of government affair data, and a government affair data asset system is built by collecting sufficiently comprehensive attributes, dimensions and indicators. For multi-source data aggregation, user access behavior data are collected through SDK event tracking, and user access indicators such as UV (unique visitors) and PV (page views) are analyzed; network data are captured through web crawler technology to obtain government affair disclosure information such as public opinion monitoring information, policies and regulations, and the business environment; and external government affair system data are acquired through a standard API. For data analysis, multi-dimensional and multi-view analysis is realized through data analysis rules, and the potential value of associated data is deeply mined; parallel real-time computation over massive data is supported, which greatly improves the efficiency of data analysis and processing.
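The UV and PV indicators mentioned above can be computed from tracked access events as in the following sketch. The event format, a `(user_id, page)` tuple, is an assumption; the patent does not describe the SDK payload.

```python
from collections import Counter


def pv_uv(events):
    """Return {page: (pv, uv)} from an iterable of (user_id, page) events.

    PV counts every view of a page; UV counts distinct users per page.
    """
    page_views = Counter()
    visitors = {}
    for user_id, page in events:
        page_views[page] += 1
        visitors.setdefault(page, set()).add(user_id)
    return {page: (page_views[page], len(visitors[page])) for page in page_views}


events = [
    ("u1", "/policy/1"),
    ("u1", "/policy/1"),  # same user again: adds to PV, not to UV
    ("u2", "/policy/1"),
    ("u2", "/policy/2"),
]
stats = pv_uv(events)
```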

S104, classifying the government information after cleaning through similarity calculation, and labeling each type of government information according to government class labels contained in a pre-established government label library.

Specifically, the similarity calculation is performed by utilizing a HanLP algorithm.

An entity matching method based on fast similarity calculation and a rule-based entity matching method can be used in combination to construct analysis models and knowledge items for government performance management, government supervision, financial assessment, budget performance analysis and the like. For different application scenarios, the similarity between knowledge information and business information is calculated through a text matcher, a structure matcher, an instance-based matcher, mapping relations and a similarity algorithm model, so as to realize accurate service pushing. Matching instance pairs are searched for in massive business data and subjected to feature analysis, and the matching rule algorithm learns and improves itself based on intelligent algorithms such as statistical algorithms, clustering algorithms, deep learning and neural networks; the algorithm rules iterate as the volume of business data grows.
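To make the labeling in S104 concrete, here is a minimal sketch that assigns each text the label whose seed description is most similar to it. It swaps in plain cosine similarity over word counts rather than HanLP, so it is not the similarity calculation named in the patent. The label library contents, the `threshold` value and the regex tokenizer are assumptions; Chinese text would first need a word segmenter (HanLP provides one) in place of `tokenize`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase word tokens. Replace for Chinese text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_text(text, label_library, threshold=0.1):
    """Return the best-matching label, or None if nothing is similar enough."""
    vec = tokenize(text)
    scored = [(cosine(vec, tokenize(seed)), label)
              for label, seed in label_library.items()]
    best_score, best_label = max(scored)
    return best_label if best_score >= threshold else None


# Hypothetical label library: label -> seed description.
LABEL_LIBRARY = {
    "tax": "tax reduction and tax exemption policy",
    "environment": "environmental protection and emission standards",
}
tax_label = label_text("Notice on tax exemption for small enterprises", LABEL_LIBRARY)
no_label = label_text("Weather forecast", LABEL_LIBRARY)
```

Texts that fall below the threshold are left unlabeled here so that they can be reviewed by hand instead of polluting the training set.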

S105, inputting the government affair information subjected to label marking into a machine learning model, and training to obtain a government affair classification model.

The algorithm used by the machine learning model is a classification algorithm or a regression algorithm.
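The patent leaves the concrete classification algorithm open. As one possible instance, the sketch below trains a multinomial naive Bayes text classifier with add-one smoothing on labeled texts. The choice of naive Bayes, the tokenizer and the tiny training set are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "tax reduction for small enterprises",
    "value added tax refund policy",
    "emission standards for factories",
    "pollution control and emission permits",
]
train_labels = ["tax", "tax", "environment", "environment"]
model = NaiveBayesClassifier().fit(train_texts, train_labels)
```

Calling `model.predict("new tax refund notice")` then returns the category of an unseen text.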

S106, receiving the government affair file to be pushed, and inputting the government affair file to be pushed into a government affair classification model to obtain a government affair class classification result of the content of each paragraph in the government affair file to be pushed.

A government affair file to be pushed generally contains a plurality of paragraphs or a plurality of itemized points, and each point may be aimed at different industries and enterprises. Through the government affair classification model, each paragraph in the government affair file to be pushed can be classified and labeled, so that the category of each paragraph is known and the file can be pushed more accurately.
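Paragraph-level classification in S106 might look like the sketch below. Splitting on blank lines is an assumption about how paragraphs are delimited, and `keyword_classifier` is a hypothetical stand-in for the trained government affair classification model; any callable that maps text to a category fits.

```python
import re


def split_paragraphs(document):
    """Split a document on blank lines into non-empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def classify_paragraphs(document, classify):
    """Return (paragraph, category) pairs using any callable classifier."""
    return [(p, classify(p)) for p in split_paragraphs(document)]


def keyword_classifier(paragraph):
    """Hypothetical stand-in for the trained classification model."""
    text = paragraph.lower()
    if "tax" in text:
        return "tax"
    if "emission" in text:
        return "environment"
    return "general"


document = (
    "1. Small enterprises may apply for a tax refund.\n\n"
    "2. Factories must meet the new emission standards.\n\n"
    "3. This notice takes effect on publication."
)
classified = classify_paragraphs(document, keyword_classifier)
categories = [category for _, category in classified]
```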

S107, pushing the content of each paragraph in the government affair file to be pushed to the corresponding enterprise group according to the corresponding government affair category.

According to the classification result of each section of the government affair file to be pushed, each section of content of the government affair file to be pushed can be respectively pushed to all enterprises in the required enterprise group, so that the trained model is deployed into practical application for accurately pushing government affair service.
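The pushing in S107 can be sketched as a routing step followed by delivery. The `subscriptions` mapping from category to enterprise groups, the `members` mapping from group to enterprises, and the `send` callback are hypothetical; the patent does not specify how categories are tied to groups or which delivery channel is used.

```python
from collections import defaultdict


def route_paragraphs(classified, subscriptions):
    """Map each enterprise group to the paragraphs in its categories."""
    outbox = defaultdict(list)
    for paragraph, category in classified:
        # Categories that no group subscribes to are simply not routed.
        for group in subscriptions.get(category, ()):
            outbox[group].append(paragraph)
    return dict(outbox)


def push(outbox, members, send):
    """Deliver each group's paragraphs to every enterprise in the group."""
    delivered = 0
    for group, paragraphs in outbox.items():
        for enterprise in members.get(group, ()):
            send(enterprise, paragraphs)
            delivered += 1
    return delivered


classified = [
    ("Tax refund rules.", "tax"),
    ("Emission limits.", "environment"),
    ("Effective date.", "general"),
]
subscriptions = {"tax": ["catering", "software"], "environment": ["manufacturing"]}
members = {"catering": ["Acme Foods", "City Noodles"], "software": ["Bolt Software"]}

outbox = route_paragraphs(classified, subscriptions)
sent = []
delivered = push(outbox, members, lambda ent, paras: sent.append((ent, paras)))
```

Keeping routing separate from delivery means the same outbox could feed e-mail, SMS or an in-app message queue.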

Further, the method further comprises:

evaluating the effect of the government affair classification model by adopting a deep learning evaluation method.

The deep learning evaluation method is specifically a cross-validation method, a hold-out validation method or a prediction validation method.
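Of the evaluation methods listed, cross-validation can be sketched as follows. The fold layout (contiguous, unshuffled folds), the accuracy metric and the `train` callback contract are assumptions; `train_majority` is a trivial baseline used only to make the example runnable, not the patent's model.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]
        yield train_idx, test_idx
        start += size


def cross_validate(texts, labels, train, k=3):
    """Mean accuracy over k folds.

    `train(texts, labels)` must return a callable that predicts a label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        predict = train([texts[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        hits = sum(predict(texts[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def train_majority(texts, labels):
    """Baseline trainer: always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda text: majority


texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["a", "a", "a", "a", "a", "b"]
accuracy = cross_validate(texts, labels, train_majority, k=3)
```

In practice the samples would be shuffled (or stratified by label) before splitting, since collected documents are often ordered by source or date.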

Further, the method further comprises:

periodically updating the collected government affair information, cleaning the updated government affair information, calculating and classifying similarity, labeling labels, and training the government affair classification model again by using the updated government affair information to update the government affair classification model; and maintaining the government classification model to ensure that the government classification model can normally operate.

Further, the method further comprises: and storing the classification result of the government file to be pushed into a relational database.
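Storing the classification results in a relational database could be done as in the sketch below, which uses SQLite from the Python standard library. The table name, column names and the choice of SQLite are assumptions; the patent only states that a relational database is used.

```python
import sqlite3

# Hypothetical schema: one row per classified paragraph.
SCHEMA = """
CREATE TABLE IF NOT EXISTS paragraph_classification (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id TEXT NOT NULL,
    paragraph_no INTEGER NOT NULL,
    content TEXT NOT NULL,
    category TEXT NOT NULL,
    UNIQUE (document_id, paragraph_no)
)
"""


def save_results(conn, document_id, classified):
    """Insert (or overwrite) the classification of every paragraph."""
    conn.execute(SCHEMA)
    rows = [(document_id, no, text, category)
            for no, (text, category) in enumerate(classified, start=1)]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO paragraph_classification "
            "(document_id, paragraph_no, content, category) VALUES (?, ?, ?, ?)",
            rows,
        )


def paragraphs_for(conn, category):
    """Fetch stored paragraphs of one category, in document order."""
    cur = conn.execute(
        "SELECT document_id, paragraph_no, content "
        "FROM paragraph_classification WHERE category = ? "
        "ORDER BY document_id, paragraph_no",
        (category,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
classified = [("Tax refund rules.", "tax"), ("Emission limits.", "environment")]
save_results(conn, "doc-001", classified)
save_results(conn, "doc-001", classified)  # re-saving does not duplicate rows
tax_rows = paragraphs_for(conn, "tax")
row_count = conn.execute("SELECT COUNT(*) FROM paragraph_classification").fetchone()[0]
```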

In the invention, in order to solve the problems of data silos and information islands among government applications, data can be shared across systems by establishing a unified data standard interface specification, so as to realize data integration, modeling and display, and to provide unified data authorization and effective data access and sharing services among the application systems. For data management, unified storage and management of data from different sources (external applications, web crawlers, offline import and the like) are realized by using multi-source standard data interfaces, visual metadata management and a heterogeneous data conversion engine. A data access layer provides a standardized data resource service directory and supports complete access management, such as data access application, application review, access authentication and access auditing; external users acquire authorized data resource information through the standardized access interface.

Through catalog management of data standards, all data standards in the platform can be managed in a unified standard catalog according to the definitions and attributes of the different types of standards; this mainly covers unified cataloging of all standards, unified catalog model management and the like. The catalog allows all defined data standard information and resources to be used more effectively and more accurately. Based on the standard catalog management capability, all data standard resources in the data standard module can be combined for centralized management, which on the one hand makes it easier for users or machines to search for and use resources, and on the other hand ensures stricter security control.

The invention applies natural language processing technology and, drawing on extensive accumulated experience in the professional field of government affairs, independently develops a government affair word segmentation technology. Using specialized vocabulary covering government performance, finance, government supervision, the business environment and the like, the technology trains a vertical-domain word segmentation model with a multi-layer convolutional neural network based on word vectorization, thereby improving word segmentation accuracy and category relevance in the professional field of government affairs. The word segmentation model has self-learning and self-organizing capabilities: through continued application, learning and training, its result set keeps expanding, and the model and its weights are further optimized. The technology is widely applied to business functions such as government work report decomposition, sensitive word monitoring, case recommendation and matching checks, effectively improving the efficiency and quality of government work and reducing its implementation burden and cost.

In the government information pushing method based on machine learning, the information of a plurality of enterprises in a government information pushing area is acquired, the enterprises are divided into a plurality of enterprise groups, the government information is collected and cleaned, the cleaned government information is classified through similarity calculation, each type of government information is labeled, a government classification model is obtained through training, a government file to be pushed is input into the government classification model, a government classification result of each paragraph content in the government file to be pushed is obtained, and each paragraph content in the government file to be pushed is pushed to the corresponding enterprise group according to the corresponding government class; the machine learning model can be utilized to accurately and efficiently classify the government affair files, and after classification, the contents of each section in the government affair files can be actively pushed to corresponding enterprises, so that enterprises do not need to manually read document information to screen contents useful for themselves, the labor cost of the enterprises is reduced, the execution efficiency and quality of the government affair work are effectively improved, and the implementation burden and cost of the government affair work are reduced.

It should be understood that, although the steps in the flowchart of fig. 1 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in fig. 1 may include a plurality of steps or stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily sequential, but may be performed in rotation or alternatively with at least a portion of the steps or stages in other steps or other steps.

In one embodiment, as shown in fig. 2, there is provided a government affair information pushing device based on machine learning, including the following program modules:

the enterprise information acquisition module 201 is configured to acquire information of a plurality of enterprises in the government affair information pushing area;

an enterprise dividing module 202, configured to divide a plurality of enterprises into a plurality of enterprise groups according to information of the plurality of enterprises;

the government affair information acquisition module 203 is configured to acquire government affair information and clean the acquired government affair information;

the similarity calculation module 204 is configured to classify the government information after cleaning through similarity calculation, and label each type of government information according to a government class label contained in a pre-established government label library;

the model training module 205 is configured to input the government affair information labeled with the label into a machine learning model, and train to obtain a government affair classification model;

the to-be-pushed government affair file classification module 206 is configured to receive the to-be-pushed government affair file, input the to-be-pushed government affair file into the government affair classification model, and obtain a government affair classification result of each paragraph content in the to-be-pushed government affair file;

the pushing module 207 is configured to push each paragraph content in the government file to be pushed to a corresponding enterprise group according to a corresponding government class.

For specific limitations on the government affair information pushing device based on machine learning, reference may be made to the above limitations on the government affair information pushing method based on machine learning, which are not repeated here. The above modules in the device may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 3. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a machine learning based government information pushing method.

It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor, the memory having a computer program stored therein; when the processor executes the computer program, all or part of the flow of the methods of the embodiments described above is implemented.

In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon; when the computer program is executed by a processor, all or part of the flow of the methods of the embodiments described above is implemented.

Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A government affair information pushing method based on machine learning, characterized by comprising the following steps:

2. The machine learning based government affair information pushing method according to claim 1, wherein the collecting government affair information comprises:

and acquiring data of an external government system through a standard API.

3. The machine learning based government affair information pushing method according to claim 2, wherein the government affair disclosure information includes policy regulations, a business environment and expert consensus.

4. The machine learning-based government information pushing method according to claim 1, wherein the algorithm used by the machine learning model is a classification algorithm or a regression algorithm.

5. The machine learning-based government information pushing method according to claim 1, wherein the similarity calculation is performed by utilizing a HanLP algorithm.

6. The machine learning based government information pushing method of claim 1, further comprising:

7. The machine learning based government information pushing method of claim 1, further comprising:

8. The machine learning based government information pushing method of claim 1, further comprising:

9. A government affair information pushing device based on machine learning, characterized by comprising:

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.