CN115238065B - Intelligent document recommendation method based on federated learning - Google Patents

Intelligent document recommendation method based on federated learning

Info

Publication number
CN115238065B
CN115238065B (application CN202211154292.5A)
Authority
CN
China
Prior art keywords
official document
document
sub
database
official
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211154292.5A
Other languages
Chinese (zh)
Other versions
CN115238065A (en)
Inventor
肖益
陈轮
韩国权
黄海峰
黄铁淳
周伟
程建润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiji Computer Corp Ltd
Original Assignee
Taiji Computer Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiji Computer Corp Ltd filed Critical Taiji Computer Corp Ltd
Priority to CN202211154292.5A priority Critical patent/CN115238065B/en
Publication of CN115238065A publication Critical patent/CN115238065A/en
Application granted granted Critical
Publication of CN115238065B publication Critical patent/CN115238065B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Abstract

The application provides an intelligent document recommendation method based on federated learning, comprising the following steps: determining the document type of the target official document text and the document type corresponding to each document sub-database; determining a federated learning architecture for the document sub-databases; broadcasting the federated learning architecture from a federated learning aggregation server to each document sub-database; and outputting, after federated learning training, a document recommendation result corresponding to the target document text. Because the document recommendation model is trained under a federated learning architecture, the method overcomes the shortage of training corpus caused by the fact that the document sub-databases cannot share data. At the same time, by exploiting the characteristics of the corpus data in each sub-database, the recommendation model learns both the grammatical structure features common to all official documents and the structure features specific to each document type, so its recommendation results are highly accurate.

Description

Intelligent document recommendation method based on federated learning
Technical Field
The application relates to the field of computer technology, and in particular to an intelligent document recommendation method based on federated learning.
Background
Against the background of rapidly developing e-government applications, electronic official document approval systems are becoming widespread. When approving a target document, an approver generally consults a large number of related documents in order to give accurate approval opinions. However, existing electronic document approval systems do not yet provide a related-document recommendation function.
If an existing machine-learning algorithm were applied to this problem, with the current document database used as training data to obtain a document recommendation model, the following difficulties would arise. Because official document texts are confidential, the document data of different departments in different regions are stored separately in individual document sub-databases, and the data in these databases are difficult to share. A recommendation model can therefore be trained only on the local sub-database, and a sub-database that has accumulated few documents suffers from a shortage of training corpus. Moreover, official document texts differ from general texts in having special grammatical structure features, and these features are not the same across document types, which further amplifies the corpus shortage; as a result, the recommendation results of a model trained this way are inaccurate.
Disclosure of Invention
An object of the embodiments of the present application is to provide an intelligent document recommendation method based on federated learning, comprising:
determining the document type of the target official document text and the document type corresponding to each document sub-database;
determining, according to the document type of the target document text and the document types corresponding to the sub-databases, a federated learning architecture for the document sub-databases, the architecture specifying the horizontal or vertical federated learning relationships among the document recommendation models corresponding to the document sub-databases;
broadcasting the federated learning architecture from a federated learning aggregation server to each document sub-database;
and outputting, after the document recommendation models corresponding to the document sub-databases complete federated learning training according to the architecture, a document recommendation result corresponding to the target document text.
Optionally, determining the federated learning architecture for the document sub-databases according to the document type of the target document text and the document types corresponding to the sub-databases specifically comprises:
dividing the document sub-databases into a plurality of document sub-database sets according to their corresponding document types;
determining the similarity weight of each document sub-database set relative to the document type of the target document text;
and constructing the federated learning architecture with each document sub-database set as a node of vertical federated learning and each document sub-database within a set as a child node of horizontal federated learning, wherein the ordering of the vertical federated learning nodes in the architecture is determined according to the similarity weights.
Optionally, dividing the document sub-databases into a plurality of document sub-database sets according to their corresponding document types specifically comprises:
determining a main type vector for each document sub-database according to the document types of the main document texts in that sub-database;
extracting document keywords from all the document texts in each sub-database and determining a main keyword vector for each sub-database;
and clustering the document sub-databases according to their main type vectors and main keyword vectors to obtain the plurality of document sub-database sets.
Optionally, determining the main type vector of each document sub-database according to the document types of the main document texts in that sub-database specifically comprises:
counting, according to the number of document texts of each document type in the sub-database, the top-ranked document types up to a preset number;
and determining the main type vector of the sub-database from those top-ranked document types.
Optionally, determining the similarity weights of the plurality of document sub-database sets relative to the document type of the target document text specifically comprises:
determining the similarity weight of each document sub-database set relative to the document type of the target document text according to the proportion, within that set, of document texts whose type is the same as or similar to the type of the target document text.
Optionally, the similarity relationship of the document types is pre-calculated according to the grammatical structure features of the document texts of the plurality of document types.
Optionally, the federated learning training process of the document recommendation models corresponding to the document sub-databases proceeds as follows:
each horizontal federated learning child node in the architecture trains locally and then transmits its trained document recommendation model parameters to the federated learning aggregation server;
the aggregation server horizontally aggregates, per document sub-database set, the model parameters trained by the child nodes to obtain node aggregation parameters, and transmits each set's node aggregation parameters to the corresponding vertical federated learning node in the architecture;
following the ordering of the vertical federated learning nodes, the node aggregation parameters of each pair of adjacent nodes are vertically aggregated according to their similarity weights, the similarity weights are combined, and the result is passed to the next node, until the node aggregation parameters of all nodes have been vertically aggregated;
and after evaluating the vertically aggregated node aggregation parameters, the aggregation server starts the next round of federated learning, until the document recommendation model parameters satisfy a preset condition.
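The training round above can be sketched as follows. This is a minimal illustration with scalar model parameters; the FedAvg-style size weighting in the horizontal step and the normalized pairwise merge in the vertical step are assumptions, since the text does not fix the exact aggregation formulas.

```python
def horizontal_aggregate(child_params, child_sizes):
    """Horizontal step: average the trained child models of one document
    sub-database set, weighted by each child's corpus size (FedAvg-style
    weighting is an assumption; the text only says the server
    'transversely aggregates' the child parameters)."""
    total = sum(child_sizes)
    return sum(p * n / total for p, n in zip(child_params, child_sizes))

def vertical_aggregate(node_params, sim_weights):
    """Vertical step: fold the node aggregation parameters pairwise,
    starting from the farthest node, mixing each adjacent pair by its
    similarity weights (assumed normalized pairwise merge)."""
    agg, w = node_params[0], sim_weights[0]
    for p, wn in zip(node_params[1:], sim_weights[1:]):
        agg = (w * agg + wn * p) / (w + wn)  # merge with the next node
        w = wn  # carry the nearer node's weight forward (assumption)
    return agg

# Two sets of child nodes, then a vertical pass over the two node results:
node1 = horizontal_aggregate([1.0, 3.0], [100, 100])   # -> 2.0
node2 = horizontal_aggregate([4.0, 4.0], [50, 150])    # -> 4.0
print(vertical_aggregate([node1, node2], [0.2, 0.8]))  # -> 3.6
```

In a real deployment the parameters would be full model weight tensors, but the per-round flow, local training, horizontal aggregation per set, then a single sequential vertical pass toward the server, is the same.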
Optionally, combining the similarity weights of two adjacent nodes specifically comprises:
performing a weighted summation of the similarity weights of the two adjacent nodes, using the numbers of document texts in the document sub-databases corresponding to the two nodes as the weights.
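This weight merge can be sketched directly from the description above; the document counts used here are hypothetical:

```python
def combine_weights(w1, n1, w2, n2):
    """Merge the similarity weights of two adjacent nodes by a weighted
    summation, with the number of document texts held by each node
    (n1, n2) serving as the weights."""
    return (n1 * w1 + n2 * w2) / (n1 + n2)

# A node holding 100 texts with weight 0.8 merged with one holding
# 50 texts with weight 0.4:
print(combine_weights(0.8, 100, 0.4, 50))  # 0.666...
```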
Optionally, the similarity weight corresponding to a node is stored on the local server corresponding to each document sub-database.
Optionally, before the document recommendation models corresponding to the document sub-databases are trained, the method further comprises:
the federated learning aggregation server broadcasting the initial structure and parameters of the document recommendation model, together with the document type of the target document text, to all child nodes of the federated learning architecture.
According to the intelligent document recommendation method, a document recommendation model is obtained by training under a federated learning architecture, which overcomes the shortage of training corpus caused by the fact that the document sub-databases cannot share data. Meanwhile, based on the characteristics of each sub-database's corpus data, an architecture combining horizontal and vertical federated learning is designed, so that the recommendation model learns both the grammatical structure features common to all official document texts and the structure features specific to each document type, and its recommendation results are more accurate. In addition, the privacy of each sub-database's data is preserved, since document text data never leaves its own domain.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below.
FIG. 1 is a schematic flow chart of an intelligent document recommendation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a federated learning architecture determination method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a federated learning architecture provided by an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a federated learning training method provided in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an intelligent document recommendation device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Against the background of rapidly developing e-government applications, electronic official document approval systems are becoming widespread. When approving a target document, an approver generally consults a large number of related documents in order to give accurate approval opinions. However, existing electronic document approval systems do not yet provide a related-document recommendation function.
Faced with this problem, if an existing machine-learning algorithm were used with the current document database as training data to obtain a document recommendation model, the following difficulties would arise. Because official document texts are confidential, the document data of different departments in different regions are stored separately in individual document sub-databases, and the data in these databases are difficult to share. A recommendation model can therefore be trained only on the local sub-database, and a sub-database that has accumulated few documents suffers from a shortage of training corpus. Moreover, official document texts differ from general texts in having special grammatical structure features, and these features are not the same across document types, which further amplifies the corpus shortage; as a result, the recommendation results of a model trained this way are inaccurate.
Based on the above, the embodiment of the invention provides an intelligent document recommendation method based on federated learning. Fig. 1 shows the flow of the method.
Step S110: determining the document type of the target official document text and the document types corresponding to the document sub-databases.
When training the document recommendation model, the embodiment of the invention takes into account that, owing to the confidentiality of official document texts, the document data of different departments in different regions are stored separately in individual sub-databases whose data are difficult to share. Federated learning is therefore adopted, so that the corpus of every document sub-database contributes to training the recommendation model without affecting the privacy of any sub-database's data.
Official document texts, however, come in many different types, such as announcements, resolutions, meeting minutes, and requests. An official document thus differs from a general text in two respects: first, all types of official documents share common grammatical structure features, arising from standardized formatting and written expression; second, each type has its own specific paragraph formats and stock phrases, so different types have their own specific grammatical structure features. Both kinds of grammatical structure features must therefore be considered when the recommendation model is trained by federated learning.
Specifically, for a target document text of a given type, document corpora of the same or a similar type and corpora of completely different types have very different value for text recommendation, yet the common grammatical structure features present even in completely different types of corpora cannot simply be discarded during training. The embodiment of the invention therefore combines two structures, horizontal federated learning and vertical federated learning, when training the recommendation model. Here, horizontal federated learning refers to training models separately on several similar corpora and then aggregating the models, while vertical federated learning refers to aggregating several different corpora and then training one model on the aggregate.
The embodiment of the invention targets a single-task document approval scenario, in which one approval task usually reviews a single type of document text. The type of the target document text must therefore be considered when the recommendation model is trained, and an appropriate federated learning architecture can be determined according to that type, so that the trained model provides accurate recommendations for documents of that type.
Accordingly, this step determines the document type of the target document text and the document types corresponding to the document sub-databases; the subsequent steps then determine the training scheme of the recommendation model. Note that, in the embodiment of the invention, federated learning is performed with the document sub-databases of different regions and departments as the corpus data, and all the texts of one sub-database together form the minimum corpus unit, which guarantees that document data are never exchanged between sub-databases.
Step S120: determining, according to the document type of the target document text and the document types corresponding to the document sub-databases, a federated learning architecture for the sub-databases; the architecture specifies the horizontal or vertical federated learning relationships among the document recommendation models corresponding to the sub-databases.
After the document type of the target document text and the types corresponding to the sub-databases are determined in step S110, this step determines the federated learning architecture from that information. As fig. 2 shows, the architecture determination of step S120 can be subdivided into steps S121 to S123.
Step S121: dividing each document sub-database into a plurality of document sub-database sets according to its corresponding document type.
When determining the federated learning architecture in the embodiment of the invention, each document sub-database is a unit structure of federated learning. Since the federated learning architecture can be regarded as a tree, each unit structure can be called a "node" or "child node" of that tree. What must be determined is the horizontal or vertical federated learning relationship among the sub-databases serving as nodes or child nodes, and the position of each in the tree.
The principle behind combining horizontal and vertical federated learning is that different document types have their own specific grammatical structure features, so different document sub-databases, as corpus data, suit the recommendation of different document types. Sub-databases whose corpora involve the same or similar document types suit horizontal federated learning: several similar corpora each train a model, and the models are then aggregated. Because the grammatical structure features targeted by their training data are similar, the recommendation model can be trained cooperatively, and the data used effectively, without any data leaving its own domain. Between sub-databases whose corpora involve completely different document types, the specific grammatical structure features differ too much for horizontal federated learning; only vertical federated learning can be used, i.e., several different corpora are aggregated and one model is trained on the aggregate, so that the training results from one part of the sub-databases supplement the corpora of the others. Under vertical federated learning, even corpus data whose document types differ greatly from that of the target document text can still be used to train the common grammatical structure features of official documents.
In this step, therefore, the document sub-databases, the minimum corpus units, are grouped according to the document types they involve. The rationale for the grouping is that some different document types have similar grammatical structure features, e.g., decisions and resolutions, or announcements and proclamations, and their corpora can be used together for horizontal federated learning, while other types, e.g., orders and meeting minutes, have quite different features, and their corpora can only serve vertical federated learning. Whether the grammatical structure features of two document types are similar is pre-computed from the features of the corresponding document texts, assisted where needed by manual experience.
Specifically, the document type corresponding to a document sub-database refers to the types of document text that occupy the main part of that sub-database; that is, the types of the main document texts represent the sub-database's corresponding document types. For each sub-database, the document types can be ranked by the number of document texts of each type, and the top-ranked types counted up to a preset number, for example the top five types in the sub-database. The main type vector of the sub-database is then determined from these top-ranked types; for example, the identifiers of the top five types are combined into a feature vector that identifies the document-type attribute of the sub-database.
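A minimal sketch of building the main type vector, assuming the top five types as in the example above (the `<none>` padding token is a hypothetical detail for sub-databases with fewer than five types):

```python
from collections import Counter

def main_type_vector(doc_types, top_n=5):
    """Form a sub-database's main type vector from the identifiers of
    its top-N document types, ranked by number of document texts."""
    top = [t for t, _ in Counter(doc_types).most_common(top_n)]
    return top + ["<none>"] * (top_n - len(top))  # pad short sub-databases

# A sub-database dominated by announcements and resolutions:
types = ["announcement"] * 40 + ["resolution"] * 25 + ["request"] * 5
print(main_type_vector(types))
# ['announcement', 'resolution', 'request', '<none>', '<none>']
```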
To improve the accuracy of the grouping, this step also identifies the document-type attribute of a sub-database from its dominant keywords, because each type of document text has its own specific high-frequency vocabulary as a distinguishing characteristic. Document keywords are therefore extracted from all the document texts in each sub-database to determine a main keyword vector per sub-database; concretely, this can be implemented with a high-frequency word-count script, with the top ten keywords combined into the main keyword vector.
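The high-frequency word count can be sketched as follows; the whitespace tokenization and stopword set are assumptions (a Chinese-language deployment would need a word segmenter instead):

```python
import re
from collections import Counter

def main_keyword_vector(texts, top_k=10, stopwords=frozenset()):
    """High-frequency word statistics over every document text in one
    sub-database; the top-K words become the main keyword vector."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    return [w for w, _ in counts.most_common(top_k)]

print(main_keyword_vector(["hereby announce the decision",
                           "announce the meeting decision"],
                          top_k=2, stopwords={"the"}))  # ['announce', 'decision']
```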
Once the main type vector and the main keyword vector of each document sub-database are determined, the sub-databases can be clustered into a plurality of document sub-database sets, using an existing algorithm such as k-means or DBSCAN. The sub-databases participating in federated learning are thus divided into sets in which the document types corresponding to the sub-databases within one set are the same or similar, while the types corresponding to different sets differ.
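The grouping can be illustrated with a dependency-free sketch. The text names k-means or DBSCAN; the greedy Jaccard grouping below is a simple stand-in for either, and the 50/50 blend of type and keyword similarity and the threshold are assumptions:

```python
def jaccard(a, b):
    """Jaccard similarity of two vectors treated as sets of identifiers."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_subdatabases(type_vecs, keyword_vecs, threshold=0.5):
    """Greedily group sub-databases whose main type vectors and main
    keyword vectors are similar (stand-in for k-means/DBSCAN)."""
    labels = [-1] * len(type_vecs)
    next_label = 0
    for i in range(len(type_vecs)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(type_vecs)):
            sim = 0.5 * jaccard(type_vecs[i], type_vecs[j]) \
                + 0.5 * jaccard(keyword_vecs[i], keyword_vecs[j])
            if labels[j] == -1 and sim >= threshold:
                labels[j] = next_label
        next_label += 1
    return labels

tv = [["announcement"], ["announcement"], ["order"]]
kv = [["notify"], ["notify"], ["command"]]
print(cluster_subdatabases(tv, kv))  # [0, 0, 1]
```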
Step S122: determining the similarity weights of the plurality of document sub-database sets relative to the document type of the target document text.
The document recommendation models corresponding to the sub-databases within one document sub-database set can be trained simultaneously, with no particular ordering required. For vertical federated learning between different sets, however, the training order, i.e., the way the corpus data are aggregated, must be determined.
Fig. 3 shows a schematic diagram of the federated learning architecture in an embodiment of the invention. Vertical federated learning unfolds among the nodes: the nodes' training results are aggregated from the farthest node to the nearest with respect to the federated learning aggregation server, i.e., the node of document sub-database set 1 is first aggregated with the node of set 2, the result is then aggregated with the node of set 3, and so on. It can be understood that the farther a node is from the aggregation server, the smaller its contribution to the recommendation model. The model is thus trained mainly on the common grammatical structure features and the type-specific features of texts of the same type as the target, while documents of other types supplement the learning of the common features, which makes the recommended documents more accurate.
Therefore, to order the nodes in the federated learning architecture, this step determines the similarity weight of each document sub-database set relative to the document type of the target document text. Specifically, the weight can be determined from the proportion, within each set, of document texts whose type is the same as or similar to the type of the target document text; its value lies between 0 and 1.
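The proportion just described can be sketched directly; the type names and the pre-computed similar-type set are hypothetical examples:

```python
def similarity_weight(set_doc_types, target_type, similar_types):
    """Proportion of document texts in a sub-database set whose type is
    the same as, or pre-computed to be similar to, the target document
    type; the result lies in [0, 1]."""
    if not set_doc_types:
        return 0.0
    hits = sum(1 for t in set_doc_types
               if t == target_type or t in similar_types)
    return hits / len(set_doc_types)

# Target "announcement", with "proclamation" pre-marked as similar:
w = similarity_weight(["announcement"] * 6 + ["proclamation"] * 2 + ["order"] * 2,
                      "announcement", {"proclamation"})
print(w)  # 0.8
```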
Step S123, constructing and determining the federated learning architecture by taking the official document sub-database sets as nodes for longitudinal federated learning and taking the official document sub-databases in each set as sub-nodes for horizontal federated learning; wherein the ranking of the longitudinal federated learning nodes in the architecture is determined according to the similarity weights.
After the plurality of official document sub-database sets and their similarity weights relative to the official document type of the target official document text have been determined, the federated learning architecture in the embodiment of the invention can be determined. As shown in fig. 3, the N official document sub-databases participating in federated learning are divided into M official document sub-database sets serving as nodes of the federated learning framework, and each set contains a different number of official document sub-databases serving as sub-nodes. For example, official document sub-database set 1 contains official document sub-databases 1 to 10, set 2 contains sub-databases 11 to 15, and set M contains sub-databases N-9 to N. The ordering of the M nodes is determined by their similarity weights relative to the official document type of the target text: official document sub-database set 1 has the smallest similarity weight and is farthest from the federated learning aggregation server, set 2 is next, and so on.
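The node ordering rule above (the set with the smallest similarity weight is placed farthest from the aggregation server, so aggregation proceeds from low weight to high weight) amounts to an ascending sort; a minimal sketch, with illustrative names assumed:

```python
def order_nodes(sets_with_weights):
    """sets_with_weights: list of (set_name, similarity_weight) pairs.

    Returns set names ordered farthest-first from the aggregation
    server, i.e. ascending by similarity weight, so that the lowest-
    weight set contributes earliest (and therefore least) during the
    far-to-near longitudinal aggregation.
    """
    return [name for name, _ in sorted(sets_with_weights,
                                       key=lambda pair: pair[1])]

order = order_nodes([("set3", 0.8), ("set1", 0.1), ("set2", 0.4)])
# "set1" has the smallest weight, so it sits farthest from the server
```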
And step S130, broadcasting the federal learning architecture to each official document sub-database by a federal learning aggregation server.
In the federated learning training process, computations of the federated learning framework are performed after the information of each official document sub-database has been collected at the federated learning aggregation server. Therefore, before federated training of the official document recommendation model begins, the federated learning aggregation server must broadcast not only the initial structure and parameters of the official document recommendation model and the official document type of the target official document text to each child node of the federated learning architecture, but also the determined federated learning architecture to the node of each official document sub-database, so that during training each node knows how to transmit its training data through a communication channel to the next node or to the federated learning aggregation server.
And step S140, after the official document recommendation model corresponding to each official document sub-database performs federated learning training according to the federated learning architecture, outputting an official document recommendation result corresponding to the target official document text.
After the federated learning architecture is determined and the federated learning aggregation server has broadcast the pre-training information described in the previous step to the child nodes where the official document sub-databases reside, training in this step proceeds in the federated learning manner: the sub-nodes in a horizontal federated learning relationship are trained separately and their model parameters are then aggregated to their nodes, and during longitudinal federated learning the nodes are aggregated sequentially in the order determined by the federated learning architecture.
Fig. 4 shows a federal learning training procedure of the document recommendation model provided in the embodiment of the present invention, which specifically includes the following steps.
And S410, after local training is performed at each horizontal federated learning child node in the federated learning architecture, transmitting the trained official document recommendation model parameters to the federated learning aggregation server.
As described above, in the federated learning training process of the official document recommendation model in the embodiment of the present invention, local training is performed at each horizontal federated learning child node in the federated learning architecture, and the training target is the initial official document recommendation model broadcast by the federated learning aggregation server to each child node. To ensure data privacy among the sub-nodes during training, the sub-nodes within a node cannot aggregate their data directly at a local aggregation server; the trained model parameters must instead be transmitted to the federated learning aggregation server of the whole system.
And S420, the federated learning aggregation server horizontally aggregates the official document recommendation model parameters trained by the sub-nodes according to the information of each official document sub-database set to obtain node aggregation parameters, and transmits the node aggregation parameters to the corresponding longitudinal federated learning nodes in the federated learning architecture.
After receiving the trained parameters reported by each sub-node, the federated learning aggregation server performs the data aggregation operation of horizontal federated learning and transmits the result to the corresponding node as the node aggregation parameter. The aggregation method is not particularly limited in the embodiment of the invention; federated learning parameter aggregation methods such as naive weighted averaging, Euclidean-distance statistical aggregation, and median-based aggregation may be adopted.
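Two of the aggregation methods named above, naive weighted averaging and median-based aggregation, can be sketched for a single scalar parameter per sub-node; in practice the same operation would be applied element-wise to whole parameter tensors. The function names and the scalar simplification are assumptions for illustration.

```python
import statistics

def weighted_average(params, weights):
    """Naive weighted-average aggregation across sub-nodes.
    params: one trained scalar parameter per sub-node.
    weights: per-sub-node weights (e.g. local sample counts)."""
    total = sum(weights)
    return sum(p * w for p, w in zip(params, weights)) / total

def median_aggregate(params):
    """Median-based aggregation, more robust to outlier sub-nodes."""
    return statistics.median(params)

params = [1.0, 2.0, 9.0]
# with equal weights, the weighted average is pulled up by the
# outlier 9.0, while the median ignores it
avg = weighted_average(params, [1, 1, 1])   # 4.0
med = median_aggregate(params)              # 2.0
```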
The node aggregation parameters are transmitted to the corresponding nodes so that, during the subsequent longitudinal aggregation, data aggregation can be performed within a local scope rather than uniformly at the federated learning aggregation server, which reduces the risk of data leakage.
And S430, according to the ordering of the longitudinal federated learning nodes in the federated learning architecture, sequentially performing longitudinal aggregation on the node aggregation parameters of two adjacent nodes according to their similarity weights, combining the similarity weights, and transmitting the result to the next node, until the node aggregation parameters of all nodes have completed longitudinal aggregation in sequence.
Original longitudinal federated learning needs to aggregate training data at the federated learning aggregation server and then complete training uniformly, but the longitudinal federated learning step in the embodiment of the invention differs from this common approach. The embodiment adopts a special federated learning architecture in which, for the official document recommendation scenario, the corpus data at each node carries its own grammatical structure feature information. Therefore, to avoid the large amount of data standardization processing that would arise if the data of many official document sub-database sets were aggregated at the federated learning aggregation server, and to avoid increasing that server's burden, the embodiment directly performs longitudinal aggregation on the node aggregation parameters already produced by the server's horizontal aggregation, thereby reducing both the computation of longitudinal aggregation and the data communication cost.
Specifically, referring to fig. 3, according to the ordering of the longitudinal federated learning nodes in the federated learning architecture, the node aggregation parameters of two adjacent nodes are sequentially subjected to longitudinal aggregation according to their similarity weights. That is, the node aggregation parameter of the node of official document sub-database set 1 is aggregated with that of the node of official document sub-database set 2, and the similarity weights corresponding to the two sets can serve as the weights in their weighted-sum aggregation. In addition to aggregating the node aggregation parameters by weighted summation, the similarity weights themselves must also be combined, to serve as the similarity weight when aggregation is performed with the next node. Combining the similarity weights of two adjacent nodes specifically means taking the numbers of official document texts in the official document sub-databases corresponding to the two nodes as weights and performing a weighted summation of the two similarity weights.
Aggregating the node aggregation parameters and similarity weights of the nodes of official document sub-database sets 1 and 2 yields a longitudinal comprehensive node aggregation parameter and a longitudinal comprehensive similarity weight. These are next aggregated with the node aggregation parameter and similarity weight of the node of official document sub-database set 3, and the process repeats until the node aggregation parameters of all nodes have completed longitudinal aggregation in sequence, yielding the final model training parameters of the federated learning.
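The chained pairwise longitudinal aggregation described above can be sketched as follows, again for a single scalar parameter per node. The patent does not say whether the weighted sums are normalized; the normalization here, and all the names, are assumptions for illustration.

```python
def merge_pair(a, b):
    """Aggregate two adjacent nodes, each given as a tuple
    (node_aggregation_parameter, similarity_weight, document_count).

    Parameters are combined using the similarity weights as the
    weighted-sum weights; the combined similarity weight is a
    document-count-weighted sum of the two weights, matching the
    combining step described in the text."""
    (pa, wa, na), (pb, wb, nb) = a, b
    param = (pa * wa + pb * wb) / (wa + wb)    # similarity-weighted params
    weight = (wa * na + wb * nb) / (na + nb)   # count-weighted similarity
    return (param, weight, na + nb)

def longitudinal_aggregate(nodes):
    """Fold the ordered node list pairwise, far-to-near: node 1 merges
    with node 2, the result merges with node 3, and so on."""
    acc = nodes[0]
    for node in nodes[1:]:
        acc = merge_pair(acc, node)
    return acc

final_param, final_weight, total_docs = longitudinal_aggregate(
    [(1.0, 0.2, 100), (3.0, 0.6, 300)])
# the low-weight far node (set 1) contributes less to final_param
```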
It should be noted that, during longitudinal federated learning, the similarity weight corresponding to each node is stored in the local server corresponding to its official document sub-databases. The federated learning aggregation server, after using the similarity weights of the nodes in the step of determining the federated learning architecture, does not store this data long-term, so as to prevent the similarity weight data from leaking information about the official document sub-databases.
It can be understood that the node aggregation parameter of the node of official document sub-database set 1, farthest from the federated learning aggregation server, contributes progressively less to the final training result during longitudinal aggregation, which is consistent with that node having the lowest similarity weight relative to the official document type of the target official document text.
And S440, after the federated learning aggregation server evaluates the node aggregation parameters after longitudinal aggregation, performing the next federated learning cycle until the document recommendation model parameters meet preset conditions.
After the above steps, one round of the federated learning training process is complete. A model evaluation module is provided at the federated learning aggregation server to evaluate the longitudinally aggregated model parameters output in step S430; if the evaluation on the test set does not meet the preset condition, the above steps are repeated for the next round of federated learning training.
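The overall round loop (steps S410 to S440) can be sketched at a high level as follows. The helper functions are placeholders standing in for the local training, aggregation, and evaluation steps, and the stopping rule shown is merely an assumed form of the "preset condition".

```python
def federated_training(max_rounds, train_subnodes, aggregate, evaluate,
                       target_metric):
    """One-loop sketch of the training rounds:
    S410: each sub-node trains locally on the current global parameters;
    S420-S430: parameters are aggregated horizontally, then
               longitudinally, into new global parameters;
    S440: the server evaluates on a test set and either stops or
          starts the next round."""
    params = None
    for _ in range(max_rounds):
        local_params = train_subnodes(params)   # S410: local training
        params = aggregate(local_params)        # S420-S430: aggregation
        if evaluate(params) >= target_metric:   # S440: preset condition
            break
    return params
```

As a toy usage, dummy callbacks that improve a scalar "metric" by one per round reach a target of 3 after three rounds.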
After training finishes, the trained model parameters can be used as the parameters of the official document recommendation model in the embodiment of the invention. In the model application stage, upon receiving the target official document text as input, the model outputs the corresponding official document recommendation result. The result can be one or more official document texts highly related to the target official document text, and the query range for recommended texts can be determined according to the approver's authority. The official document recommendation model obtained by the embodiment of the invention can recommend official document texts more accurately when given input of the same type as the target official document text. If the official document type the approver is responsible for approving differs from that of the target official document text, the federated learning architecture is determined anew according to that type and the corresponding model training is performed, yielding an official document recommendation result suitable for that approver.
According to the intelligent official document recommendation method provided by the embodiment of the invention, an official document recommendation model is obtained by training under a federated learning framework, which solves the problem of lost training corpus caused by the lack of data interchange among the official document sub-databases. Meanwhile, based on the characteristics of the corpus data of each official document sub-database, an architecture combining horizontal and longitudinal federated learning is designed, so that the model learns both the common grammatical structure features of official document texts and the specific grammatical structure features of each official document type, making its recommendation results more accurate. In addition, the privacy of each official document sub-database's data is preserved, so that official document text data does not excessively leave its own domain.
Based on any of the above embodiments, fig. 5 shows a schematic structural diagram of an intelligent document recommendation device provided in an embodiment of the present invention, and the specific contents are as follows:
the official document classification module 501 is used for determining the official document types of the target official document texts and the official document types corresponding to the official document sub-databases;
the architecture determining module 502 is configured to determine a federated learning architecture corresponding to each official document sub-database according to the official document type of the target official document text and the official document type corresponding to each official document sub-database; the federated learning architecture comprises horizontal or longitudinal federated learning relationships among the official document recommendation models corresponding to the official document sub-databases;
a framework broadcasting module 503, configured to broadcast the federated learning framework to the respective official document sub-databases through a federated learning aggregation server;
and the official document recommendation module 504 is configured to output an official document recommendation result corresponding to the target official document text after the official document recommendation model corresponding to each official document sub-database performs federated learning training according to the federated learning architecture.
According to the intelligent official document recommendation device provided by the embodiment of the invention, an official document recommendation model is obtained by training under a federated learning framework, which solves the problem of lost training corpus caused by the lack of data interchange among the official document sub-databases. Meanwhile, based on the characteristics of the corpus data of each official document sub-database, an architecture combining horizontal and longitudinal federated learning is designed, so that the model learns both the common grammatical structure features of official document texts and the specific grammatical structure features of each official document type, making its recommendation results more accurate. In addition, the privacy of each official document sub-database's data is preserved, so that official document text data does not excessively leave its own domain.
Based on any of the above embodiments, fig. 6 shows a schematic physical structure diagram of an electronic device provided in an embodiment of the present invention, where the electronic device may include: a processor (processor) 610, a communication Interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method:
according to the intelligent official document recommendation method provided by the embodiment of the invention, an official document recommendation model is obtained by training under a federated learning framework, which solves the problem of lost training corpus caused by the lack of data interchange among the official document sub-databases. Meanwhile, based on the characteristics of the corpus data of each official document sub-database, an architecture combining horizontal and longitudinal federated learning is designed, so that the model learns both the common grammatical structure features of official document texts and the specific grammatical structure features of each official document type, making its recommendation results more accurate. In addition, the privacy of each official document sub-database's data is preserved, so that official document text data does not excessively leave its own domain.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method provided by the foregoing embodiments, for example, the method includes:
according to the intelligent official document recommendation method provided by the embodiment of the invention, an official document recommendation model is obtained by training under a federated learning framework, which solves the problem of lost training corpus caused by the lack of data interchange among the official document sub-databases. Meanwhile, based on the characteristics of the corpus data of each official document sub-database, an architecture combining horizontal and longitudinal federated learning is designed, so that the model learns both the common grammatical structure features of official document texts and the specific grammatical structure features of each official document type, making its recommendation results more accurate. In addition, the privacy of each official document sub-database's data is preserved, so that official document text data does not excessively leave its own domain.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An intelligent document recommendation method based on federal learning is characterized by comprising the following steps:
determining the official document type of the target official document text and the official document types corresponding to all the official document sub-databases;
determining a federal learning framework corresponding to each document sub-database according to the document type of the target document text and the document type corresponding to each document sub-database; the federated learning architecture comprises a horizontal federated learning or vertical federated learning relationship among the document recommendation models corresponding to the document sub-databases;
broadcasting the federal learning architecture to each official document sub-database by a federal learning aggregation server;
after the document recommendation model corresponding to each document sub-database performs federal learning training according to the federal learning framework, outputting a document recommendation result corresponding to the target document text;
the determining the federal learning architecture corresponding to each official document sub-database according to the official document type of the target official document text and the official document type corresponding to each official document sub-database specifically includes:
dividing each official document sub-database into a plurality of official document sub-database sets according to the corresponding official document types;
determining similarity weights of the official document sub-database sets relative to the official document types of the target official document texts;
constructing and determining the federal learning architecture by taking each official document sub-database set as a node for longitudinal federal learning and taking each official document sub-database in each official document sub-database set as a sub-node for horizontal federal learning; wherein the rank of the longitudinal federated learning node in the federated learning architecture is determined according to the similarity weight, and the node with the minimum similarity weight is farthest from a federated learning aggregation server in the federated learning architecture.
2. The intelligent official document recommendation method according to claim 1, wherein the dividing of the official document sub-databases into a plurality of official document sub-database sets according to corresponding official document types specifically comprises:
determining a main type vector of each official document sub-database according to the official document type of the main official document text in each official document sub-database;
extracting official document keywords from all official document texts in each official document sub-database, and determining a main keyword vector of each official document sub-database;
and clustering the document sub-databases according to the main type vector and the main keyword vector of each document sub-database to obtain a plurality of document sub-database sets.
3. The intelligent official document recommendation method according to claim 2, wherein the determining of the main type vector of each official document sub-database according to the official document type of the main official document text in each official document sub-database specifically comprises:
counting, according to the number of official document texts of each official document type in each official document sub-database, the top-ranked official document types up to a preset number;
and determining the main type vector of the official document sub-database according to the top-ranked official document types of the preset number.
4. The intelligent official document recommendation method according to claim 1, wherein the determining similarity weights of the official document sub-database sets with respect to the official document type of the target official document text specifically comprises:
and determining the similarity weight of the official document sub-database set relative to the official document type of the target official document text according to the proportion of the number of the official document texts which are the same as or similar to the official document type of the target official document text in each official document sub-database set.
5. The intelligent recommendation method for official documents according to claim 4, wherein the similarity relation of the official document types is pre-calculated according to the grammatical structure characteristics of the official document texts of a plurality of official document types.
6. The intelligent document recommendation method according to claim 1, further comprising a federal learning training process of the document recommendation model corresponding to each document sub-database as follows:
after local training is carried out on each horizontal federally-learned child node in the federal learning framework, the well-trained document recommendation model parameters are transmitted to a federal learning aggregation server;
the federated learning aggregation server transversely aggregates the document recommendation model parameters trained by each sub-node according to the information of the document sub-database set to obtain node aggregation parameters, and respectively transmits the node aggregation parameters to corresponding longitudinal federated learning nodes in the federated learning architecture;
according to the sequencing of each longitudinal federated learned node in the federated learning framework, sequentially carrying out longitudinal aggregation on the node aggregation parameters of two adjacent nodes according to the similarity weight, combining the similarity weights, and transmitting to the next node until the node aggregation parameters of all the nodes sequentially complete longitudinal aggregation;
and after evaluating the node aggregation parameters after longitudinal aggregation, the federated learning aggregation server performs the next round of federated learning until the document recommendation model parameters meet preset conditions.
7. The intelligent official document recommendation method according to claim 6, wherein the step of combining similarity weights of two adjacent nodes specifically comprises:
and taking the number of the official document texts in the official document sub-database corresponding to the two adjacent nodes as a weight, and carrying out weighted summation on the similarity weights of the two adjacent nodes.
8. The intelligent official document recommendation method according to claim 6, wherein the similarity weights corresponding to the nodes are stored in the local server corresponding to each official document sub-database.
9. The intelligent official document recommendation method according to claim 6, wherein before training the official document recommendation model corresponding to each official document sub-database, the method further comprises:
and the federal learning aggregation server broadcasts the initial structure and parameters of the official document recommendation model and the official document types of the target official document texts to all child nodes of the federal learning framework.
CN202211154292.5A 2022-09-22 2022-09-22 Intelligent document recommendation method based on federal learning Active CN115238065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211154292.5A CN115238065B (en) 2022-09-22 2022-09-22 Intelligent document recommendation method based on federal learning

Publications (2)

Publication Number Publication Date
CN115238065A CN115238065A (en) 2022-10-25
CN115238065B true CN115238065B (en) 2022-12-20

Family

ID=83667116

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825970A (en) * 2019-11-07 2020-02-21 浙江同花顺智能科技有限公司 Information recommendation method, device, equipment and computer readable storage medium
CN111666401A (en) * 2020-05-29 2020-09-15 平安科技(深圳)有限公司 Official document recommendation method and device based on graph structure, computer equipment and medium
CN112836130A (en) * 2021-02-20 2021-05-25 四川省人工智能研究院(宜宾) Context-aware recommendation system and method based on federated learning
CN114117210A (en) * 2021-11-12 2022-03-01 中国银行股份有限公司 Intelligent financial product recommendation method and device based on federal learning
CN114625976A (en) * 2022-05-16 2022-06-14 深圳市宏博信息科技有限公司 Data recommendation method, device, equipment and medium based on federal learning
CN115049011A (en) * 2022-06-27 2022-09-13 支付宝(杭州)信息技术有限公司 Method and device for determining contribution degree of training member model of federal learning

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012592A1 (en) * 2017-07-07 2019-01-10 Pointr Data Inc. Secure federated neural networks
CN110782042B (en) * 2019-10-29 2022-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation
US11461593B2 (en) * 2019-11-26 2022-10-04 International Business Machines Corporation Federated learning of clients
CN111079022B (en) * 2019-12-20 2023-10-03 深圳前海微众银行股份有限公司 Personalized recommendation method, device, equipment and medium based on federal learning
CN111522948A (en) * 2020-04-22 2020-08-11 中电科新型智慧城市研究院有限公司 Method and system for intelligently processing official document
CN113254574A (en) * 2021-03-15 2021-08-13 河北地质大学 Method, device and system for auxiliary generation of customs official documents
CN113689003B (en) * 2021-08-10 2024-03-22 华东师范大学 Mixed federal learning framework and method for safely removing third party
CN113992360B (en) * 2021-10-01 2024-01-30 浙商银行股份有限公司 Federal learning method and equipment based on block chain crossing
CN113704386A (en) * 2021-10-27 2021-11-26 深圳前海环融联易信息科技服务有限公司 Text recommendation method and device based on deep learning and related media
CN114169412A (en) * 2021-11-23 2022-03-11 北京邮电大学 Federal learning model training method for large-scale industrial chain privacy calculation
CN115034816A (en) * 2022-06-07 2022-09-09 青岛文达通科技股份有限公司 Demand prediction method and system based on unsupervised and federal learning

Also Published As

Publication number Publication date
CN115238065A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN105022754B (en) Object classification method and device based on social network
CN110705301B (en) Entity relationship extraction method and device, storage medium and electronic equipment
CN103885937B (en) Method for judging repetition of enterprise Chinese names on basis of core word similarity
CN103116588A (en) Method and system for personalized recommendation
CN107545038B (en) Text classification method and equipment
CN104077417A (en) Figure tag recommendation method and system in social network
CN106997379B (en) Method for merging similar texts based on click volumes of image texts
CN110020022B (en) Data processing method, device, equipment and readable storage medium
CN105069129A (en) Self-adaptive multi-label prediction method
CN110162624A (en) A kind of text handling method, device and relevant device
CN111539612B (en) Training method and system of risk classification model
CN109960719A (en) A kind of document handling method and relevant apparatus
CN115186654A (en) Method for generating document abstract
CN112035449A (en) Data processing method and device, computer equipment and storage medium
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN104077288B (en) Web page contents recommend method and web page contents recommendation apparatus
CN114254615A (en) Volume assembling method and device, electronic equipment and storage medium
CN114491149A (en) Information processing method and apparatus, electronic device, storage medium, and program product
CN113934848A (en) Data classification method and device and electronic equipment
CN117235238A (en) Question answering method, question answering device, storage medium and computer equipment
CN115238065B (en) Intelligent document recommendation method based on federal learning
CN105512270B (en) Method and device for determining related objects
CN108536666A (en) A kind of short text information extracting method and device
CN111382265B (en) Searching method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant