WO2022105554A1

WO2022105554A1 - Region portrait correction method and apparatus, and electronic device and readable storage medium

Info

Publication number: WO2022105554A1
Application number: PCT/CN2021/126483
Authority: WO
Inventors: 王若兰; 刘洋; 张钧波; 郑宇
Original assignee: 京东城市(北京)数字科技有限公司
Priority date: 2020-11-18
Filing date: 2021-10-26
Publication date: 2022-05-27
Also published as: CN113781082B; CN113781082A

Abstract

A region portrait correction method and apparatus, and an electronic device and a computer-readable storage medium, which relate to the field of machine learning. The region portrait correction method comprises: sending, to a collaborative server, screened region information that is obtained by means of screening a plurality of regions, so as to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to screened region information sent by a first server and screened region information sent by a second server (S202); determining, on the basis of the overlapping region information, a region to be corrected (S204); calling the overlapping region information to execute an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to an interactive training result (S206); and correcting, on the basis of the correction model, the region to be corrected, so as to correct region portraits of the plurality of regions (S208). By means of the technical solution of the present disclosure, the accuracy of the description of a region portrait can be improved, thereby improving the reliability of subsequent utilization of the region portrait.

Description

Method and apparatus for correcting a region portrait, electronic device, and readable storage medium

The present disclosure claims priority to Chinese patent application No. 202011291786.9, entitled "The Correction Method, Apparatus, Electronic Device and Readable Storage Medium of Area Portrait" and filed on November 18, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of machine learning, and in particular, to a method, device, electronic device, and computer-readable storage medium for correcting a region portrait.

Background

The construction of regional portraits is of great significance to site selection and refined urban management. However, due to the operating characteristics of a single institution and the limited user population it covers, it is difficult to accurately describe a target index in a certain region using unilateral data from an enterprise. Therefore, in order to obtain a more accurate regional portrait, data fusion needs to be carried out between institutions to combine multi-party data to correct the target index in the region.

However, if some data cannot be shared between enterprises, it will seriously affect the accuracy of regional portraits, which will have a huge impact on urban service construction or commercial construction in the later stage.

It should be noted that the information disclosed in the above Background section is only for enhancement of understanding of the background of the present disclosure, and therefore may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.

SUMMARY OF THE INVENTION

The purpose of the present disclosure is to provide a method, device, electronic device and computer-readable storage medium for correcting a region portrait, at least to a certain extent, to overcome the problem of poor description accuracy of the region portrait in the related art.

Other features and advantages of the present disclosure will become apparent from the following detailed description, or be learned in part by practice of the present disclosure.

According to a first aspect of the present disclosure, there is provided a method for correcting a region portrait, comprising: sending screening region information screened out from a plurality of regions to a collaborative server, so as to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to the screening region information sent by a first server and the screening region information sent by a second server; determining a region to be corrected based on the overlapping region information; invoking the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result; and correcting the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.

In one embodiment, sending the screening region information screened out from the plurality of regions to the collaborative server, so as to receive the overlapping region information sent by the collaborative server, includes: performing a screening operation on the plurality of regions based on a first screening rule to obtain first screening region information; and sending the first screening region information to the collaborative server and receiving information of a first overlapping region sent by the collaborative server, wherein the information of the first overlapping region indicates an invalid region and is generated by the collaborative server according to the first screening region information and second screening region information sent by the second server.

In one embodiment, performing the screening operation on the plurality of regions according to the screening rule to obtain the screening region information further includes: deleting the first overlapping region from the plurality of regions to obtain remaining regions; performing a screening operation on the remaining regions based on a second screening rule to obtain third screening region information; and sending the third screening region information to the collaborative server and receiving information of a second overlapping region sent by the collaborative server, wherein the information of the second overlapping region indicates a reliable region and is generated by the collaborative server according to the third screening region information and fourth screening region information sent by the second server.

In one embodiment, correcting the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions, includes: inputting the region features of the region to be corrected into the correction model to output corrected target features; replacing the original target features of the region to be corrected with the corrected target features, so as to update the target features of the plurality of regions; determining target indices of the plurality of regions based on the updated target features of the plurality of regions; and correcting the region portraits of the plurality of regions based on the target indices.

In one embodiment, determining the target indices of the plurality of regions based on the updated target features of the plurality of regions includes: performing a clustering operation on the target features of the plurality of regions to obtain a plurality of cluster centers and their corresponding clusters; sorting the plurality of cluster centers and configuring a score interval for each cluster center; and matching each cluster to its corresponding score interval to generate the target indices of the plurality of regions.
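As an illustration of the clustering embodiment above, the following minimal Python sketch clusters a single scalar target feature per region, sorts the cluster centers, and assigns each cluster an equal-width score interval. The use of plain 1-D k-means, the function names `kmeans_1d` and `target_indices`, the 0 to 100 scale, and the equal-width intervals are all assumptions made for illustration; the disclosure does not specify the clustering algorithm or how the score intervals are configured.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[float]:
    """Plain 1-D k-means with deterministic, evenly spread initial centers.

    Assumes 1 <= k <= len(values).
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def target_indices(features: Dict[str, float], k: int,
                   max_score: float = 100.0) -> Dict[str, Tuple[float, float]]:
    """Map each region ID to the score interval of its cluster."""
    centers = sorted(kmeans_1d(list(features.values()), k))
    # The r-th smallest center (r = 0 is the lowest) gets the r-th
    # equal-width slice of the score scale (hypothetical configuration).
    intervals = [(max_score * r / k, max_score * (r + 1) / k) for r in range(k)]
    result = {}
    for region_id, value in features.items():
        rank = min(range(k), key=lambda c: abs(value - centers[c]))
        result[region_id] = intervals[rank]
    return result
```

Regions whose target features fall in the same cluster receive the same score interval, and clusters with larger centers receive higher intervals.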

In one embodiment, determining the target indices of the plurality of regions based on the updated target features of the plurality of regions includes: inputting the target features of the plurality of regions into a preset classification model, so that the classification model outputs the target indices of the plurality of regions according to its classification results for the corrected target features, wherein the classification model is generated by training on historical target indices in a supervised learning manner.

In one embodiment, invoking the overlapping region information to perform the interactive training operation with the collaborative server and the second server, so as to generate the correction model according to the interactive training result, includes: receiving key information sent by the collaborative server; and invoking the key information and the overlapping region information to perform interactive encrypted training of a federated learning model with the second server, so as to generate the correction model.

According to a second aspect of the present disclosure, there is provided a method for correcting a region portrait, comprising: respectively receiving screening region information sent by a first server and a second server; taking the intersection of the screening region information to generate overlapping region information; sending the overlapping region information to the first server and the second server; and performing, based on the overlapping region information, an interactive training operation with the first server and/or the second server, so that the first server and/or the second server generates a correction model according to the interactive training result and corrects its respective region to be corrected based on the correction model.

In one embodiment, respectively receiving the screening region information sent by the first server and the second server includes: receiving first screening information sent by the first server and second screening information sent by the second server, so as to take the intersection of the first screening information and the second screening information; and receiving third screening information sent by the first server and fourth screening information sent by the second server, so as to take the intersection of the third screening information and the fourth screening information.

In one embodiment, performing the interactive training operation with the first server and/or the second server based on the overlapping region information includes: sending key information to the first server and the second server respectively, so that the first server and/or the second server performs interactive encrypted training of a federated learning model based on the key information.

According to a third aspect of the present disclosure, there is provided an apparatus for correcting a region portrait, comprising: a sending module configured to send screening region information screened out from a plurality of regions to a collaborative server, so as to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to the screening region information sent by a first server and the screening region information sent by a second server; a determination module configured to determine a region to be corrected based on the overlapping region information; an interactive training module configured to invoke the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result; and a correction module configured to correct the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.

According to a fourth aspect of the present disclosure, there is provided an apparatus for correcting a region portrait, comprising: a transmission module configured to respectively receive screening region information sent by a first server and a second server; a processing module configured to take the intersection of the screening region information to generate overlapping region information; a sending module configured to send the overlapping region information to the first server and the second server; and an auxiliary training module configured to perform, based on the overlapping region information, auxiliary interactive training with the first server and/or the second server, so that the first server and/or the second server generates a correction model according to the interactive training result and corrects its respective region to be corrected based on the correction model.

According to a fifth aspect of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform, by executing the executable instructions, any one of the foregoing methods for correcting a region portrait.

According to a sixth aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements any one of the above-mentioned correction methods for a region portrait.

In the region portrait correction scheme provided by the embodiments of the present disclosure, the screening region information is sent to the collaborative server, and the overlapping region information, which the collaborative server obtains by combining the screening region information of the first server with the screening region information of the second server, is received. By determining the overlapping region information, not only can the overlapping regions be eliminated from the plurality of regions to obtain the region to be corrected, but data fusion with the second server can also be realized.

Further, the correction model is obtained based on the fused data and is used to correct the portrait of the region to be corrected. On the one hand, this can improve the accuracy of the description of the region portrait, thereby improving the reliability of its subsequent use; on the other hand, throughout the interaction process the collaborative server is used only to assist training, which helps reduce the resource occupation of the collaborative server.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

FIG. 1 shows a schematic diagram of the structure of a correction system for a region portrait in an embodiment of the present disclosure;

FIG. 2 shows a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure;

FIG. 3 shows a flowchart of another method for correcting a region portrait in an embodiment of the present disclosure;

FIG. 4 shows a flowchart of still another method for correcting a region portrait in an embodiment of the present disclosure;

FIG. 5 shows a flowchart of another method for correcting a region portrait in an embodiment of the present disclosure;

FIG. 6 shows an interactive schematic diagram of a correction scheme of a region portrait according to an embodiment of the present disclosure;

FIG. 7 shows a schematic diagram of an apparatus for correcting a region portrait in an embodiment of the present disclosure;

FIG. 8 shows a schematic diagram of another apparatus for correcting a region portrait in an embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of an electronic device in an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments, however, can be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repeated descriptions will be omitted. Some of the block diagrams shown in the figures are functional entities that do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The solution provided by the present application obtains a correction model from fused data and uses the correction model to correct the portrait of the region to be corrected. On the one hand, the accuracy of the description of the region portrait can be improved, thereby improving the reliability of the subsequent use of the region portrait; on the other hand, throughout the interaction process the collaborative server is used only to assist training, which helps reduce the resource occupation of the collaborative server.

For ease of understanding, the following first explains several terms involved in this application.

City Profile (a multi-factor profile based on big data and machine learning) is a SaaS product for planning, real estate, retail, and many other GIS application industries. Its innovation is twofold. On the one hand, it integrates machine learning computation with interactive visualization, breaking through the limitations of traditional GIS in exploring and analyzing multi-dimensional and high-dimensional spatiotemporal data. On the other hand, it combines massive urban data, Spark/ElasticSearch big data processing engines, distributed computing, online data processing, online index calculation, and multi-factor mining analysis into an easy-to-use and powerful SaaS service that lowers the professional barriers of GIS and makes data acquisition, data processing, and multi-factor spatial data mining efficient and easy for each user. It also supports secondary development through an API/SDK and can be readily integrated into a user's existing platform.

Federated Learning: When multiple data owners (such as enterprises) F_i (i = 1, ..., N) want to jointly train a machine learning model with their respective data D_i, the traditional approach is to gather the data at one party and train on the combined data D = {D_i, i = 1, ..., N} to obtain a model M_sum. However, this scheme is often difficult to implement because of legal issues such as privacy and data security. Federated learning was introduced to solve this problem. It refers to a computation process in which each data owner F_i takes part in model training to obtain a model M_fed without handing over its own data D_i, while ensuring that the gap between the performance V_fed of the model M_fed and the performance V_sum of the model M_sum is sufficiently small, that is, |V_fed - V_sum| < δ, where δ is an arbitrarily small positive value.

Multi-party lending: refers to a dishonest user borrowing money from one financial institution and using it to repay another lending institution. A large amount of such illegal behavior can bring down the entire financial system. To discover such users, the traditional method is for financial institutions to query user information in a central database, to which each institution must upload all of its user information. Doing so, however, amounts to exposing all of the important user privacy and data security of the financial institutions, which is not allowed under GDPR. Under the federated learning mechanism, there is no need to establish a central database: any financial institution participating in federated learning can send a new-user query request to the other institutions in the federation, and the other institutions answer the question about local lending without learning the specific information of the user. This not only protects the privacy and data integrity of existing users at the various financial institutions, but also accomplishes the important task of querying for multi-party lending.

The solutions provided by the embodiments of the present application involve technologies such as network modeling and machine learning, and are specifically described by the following embodiments.

FIG. 1 shows a schematic structural diagram of a system for correcting a region portrait in an embodiment of the present disclosure. The system includes multiple terminals 120 and a server cluster 140.

The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet computer, an e-book reader, smart glasses, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a smart home device, an AR (Augmented Reality) device, or a VR (Virtual Reality) device; alternatively, the terminal 120 may be a personal computer (PC), such as a laptop computer or a desktop computer.

Wherein, the terminal 120 may be installed with an application program for providing correction of the area portrait.

The terminal 120 and the server cluster 140 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.

The server cluster 140 is a single server, or consists of several servers, or is a virtualization platform or a cloud computing service center. The server cluster 140 is used to provide background services for the application that provides correction of the region portrait and for the training application of the traffic prediction model. Optionally, the server cluster 140 undertakes the main computing work and the terminal 120 undertakes the secondary computing work; alternatively, the server cluster 140 undertakes the secondary computing work and the terminal 120 undertakes the main computing work; or the terminal 120 and the server cluster 140 perform collaborative computing using a distributed computing architecture.

In some optional embodiments, the server cluster 140 is used to store the correction model and prediction method of the region portrait.

Optionally, the clients of the applications installed in different terminals 120 are the same, or the clients of the applications installed on the two terminals 120 are clients of the same type of application on different control system platforms. Based on different terminal platforms, the specific form of the client of the application program may also be different, for example, the client of the application program may be a mobile phone client, a PC client, or a World Wide Web (Web) client.

Those skilled in the art may know that the number of the above-mentioned terminals 120 may be more or less. For example, the above-mentioned terminal may be only one, or the above-mentioned terminal may be dozens or hundreds, or more. The embodiments of the present application do not limit the number of terminals and device types.

Optionally, the system may further include a management device (not shown in FIG. 1 ), and the management device and the server cluster 140 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.

Optionally, the above-mentioned wireless network or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof. In some embodiments, data exchanged over the network is represented using technologies and/or formats including Hyper Text Mark-up Language (HTML), Extensible Markup Language (XML), and the like. In addition, all or some of the links may be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.

Hereinafter, each step in the method for correcting the region portrait and the method for training the traffic prediction model in this exemplary embodiment will be described in more detail with reference to the accompanying drawings and embodiments.

FIG. 2 shows a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure. The methods provided in the embodiments of the present disclosure may be executed by any electronic device with computing processing capability, for example, the terminal 120 and/or the server cluster 140 in FIG. 1 . In the following illustration, the server cluster 140 is used as the execution subject for illustration.

There may be multiple first servers and multiple second servers, and the collaborative server interacts with the first server and the second server respectively.

As shown in FIG. 2, the server cluster 140 is specifically the first server, and the method for correcting the region portrait that it performs includes the following steps:

Step S202: sending screening region information screened out from multiple regions to the collaborative server, so as to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to the screening region information sent by the first server and the screening region information sent by the second server.

The screening area information may specifically be the ID of the screening area.

The multiple regions may be generated by grid division using Geohash6, Geohash7, or another division method. Specifically, feature data in the enterprise database, such as regional population, geographical location, and regional POIs (Points of Interest), can be preprocessed to obtain the region features corresponding to each grid cell, and regional consumption is used as the target feature of the region, so that the region portrait of the region is obtained based on the region features and the target feature.
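To make the grid division concrete, the following Python sketch encodes a latitude/longitude pair with the standard public Geohash algorithm and groups records by their grid cell, so that a 6-character hash corresponds to Geohash6 and a 7-character hash to Geohash7. The function names `geohash_encode` and `assign_to_grid` and the record layout are hypothetical; the disclosure does not describe how the enterprise database is organized or which library performs the encoding.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Standard Geohash base-32 alphabet (digits plus letters without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a Geohash string with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: List[str] = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def assign_to_grid(points: Iterable[Tuple[str, float, float]],
                   precision: int = 6) -> Dict[str, List[str]]:
    """Group (record_id, lat, lon) tuples by the grid cell that contains them."""
    grid: Dict[str, List[str]] = defaultdict(list)
    for record_id, lat, lon in points:
        grid[geohash_encode(lat, lon, precision)].append(record_id)
    return dict(grid)
```

The resulting Geohash string can serve as the region ID that is later exchanged with the collaborative server, which is consistent with the statement that the screening region information may be the ID of the screened region.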

In addition, the collaborative server can be understood as a collaboration platform. On the one hand, the collaboration platform is used to perform screening operations based on the screening region information sent by the first server and the second server; on the other hand, it is used to assist the first server and/or the second server in model training while ensuring the security of the feature information that the first server and/or the second server hold for the multiple regions. The second server is specifically a server that exchanges parameter information with the first server. With the cooperation of the collaborative server, the second server can perform the same processing as the first server, and the first server and the second server can store different region features of the same region.

Step S204: Determine the to-be-corrected area based on the information of the overlapping area.

Those skilled in the art can understand that the screening region information can be screened out from the multiple regions based on screening rules, and that the collaborative server performs an intersection operation on the region information screened out by those rules to determine the regions that are shared by the first server and the second server and that satisfy the screening rules on both. The purpose of screening is to obtain the regions that do not need to be corrected, so that the regions to be corrected are obtained by eliminating from the multiple regions those that do not need correction.

Specifically, the collaborative server performs a screening operation on the screening region information sent by the first server and the second server to obtain the overlapping region information. On the one hand, the region to be corrected can be obtained by eliminating the overlapping regions from the multiple regions; on the other hand, once the overlapping region information has been obtained, relevant data of the second server can be introduced based on it, so as to realize data fusion between different servers.
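The intersection and elimination just described can be sketched with ordinary set operations. In this minimal Python example, the function names `overlapping_regions` and `regions_to_correct` are hypothetical, and region IDs are assumed to be plain strings; only IDs, not features, are exchanged with the collaborative server.

```python
from typing import Iterable, List, Set


def overlapping_regions(first_ids: Iterable[str],
                        second_ids: Iterable[str]) -> Set[str]:
    """Collaborative-server side: IDs screened out by both servers."""
    return set(first_ids) & set(second_ids)


def regions_to_correct(all_ids: Iterable[str],
                       overlap_ids: Set[str]) -> List[str]:
    """First-server side: remove the overlapping regions, keep the rest in order."""
    return [region_id for region_id in all_ids if region_id not in overlap_ids]


# Example with made-up Geohash-style IDs.
overlap = overlapping_regions(["wx4g0b", "wx4g0c", "wx4g0f"],
                              ["wx4g0c", "wx4g0f", "wx4g0u"])
pending = regions_to_correct(["wx4g0b", "wx4g0c", "wx4g0f", "wx4g0g"], overlap)
```

Here `overlap` contains the two IDs that both servers screened out, and `pending` keeps the remaining regions of the first server as candidates for correction.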

Step S206: invoking the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result.

The overlapping region information can be used as training data, and the correction model can be generated through interactive training with the collaborative server and the second server.

Step S208: correcting the region to be corrected based on the correction model, so as to correct the region portraits of the multiple regions.

In this embodiment, the correction model is used to correct target features with poor reliability among the multiple regions, for example the target features of regions whose consumption index has low reliability; the target features may be consumption data.
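A minimal Python sketch of this correction step follows. It treats the correction model as an arbitrary callable that maps a region's features to a corrected target feature, and replaces the target feature only for regions marked for correction. The record layout (`"features"`, `"target"`), the function name `apply_correction`, and the stand-in model used in the example are assumptions for illustration; the actual correction model in the disclosure is produced by federated training and is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List


def apply_correction(regions: Dict[str, dict],
                     to_correct: Iterable[str],
                     model: Callable[[List[float]], float]) -> Dict[str, dict]:
    """Return a copy of `regions` with corrected target features.

    Each record is assumed to look like
    {"features": [..region features..], "target": consumption_value}.
    """
    targets = set(to_correct)
    corrected = {}
    for region_id, record in regions.items():
        updated = dict(record)  # shallow copy; the input is left unchanged
        if region_id in targets:
            # Replace the original target feature with the model output.
            updated["target"] = model(record["features"])
        corrected[region_id] = updated
    return corrected


# Example with a stand-in model (hypothetical): the sum of the features.
portraits = {
    "g1": {"features": [1.0, 2.0], "target": 0.0},
    "g2": {"features": [4.0, 1.0], "target": 5.0},
}
updated_portraits = apply_correction(portraits, ["g1"], lambda f: sum(f))
```

Regions that are not in the to-be-corrected set keep their original target feature, so the updated target features of all regions can then be used to recompute the target index.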

In addition, regional portraits can be understood as portraits generated based on features such as regional population, geographic location, regional POI (Point of Interest), and regional consumption.

In this embodiment, the multiple regions may include overlapping regions and regions to be corrected. The screening region information is sent to the collaborative server, and the overlapping region information, which the collaborative server obtains by combining the screening region information of the first server with that of the second server, is received. By determining the overlapping region information, not only can the overlapping regions be eliminated from the multiple regions to obtain the region to be corrected, but data fusion with the second server can also be realized.

For example, the subsequent use of regional portraits includes business scenarios such as regional location selection and advertisement placement in the later stage.

In one embodiment, sending the screening region information screened out from the multiple regions to the collaborative server to receive the overlapping region information sent by the collaborative server includes: performing a screening operation on the multiple regions based on the first screening rule to obtain the first screening region information; and sending the first screening region information to the collaborative server and receiving the information of the first overlapping region sent by the collaborative server, wherein the information of the first overlapping region indicates an invalid region and is generated by the collaborative server according to the first screening region information and the second screening region information sent by the second server.

Taking the consumption index as an example, the invalid area refers to the area where the consumption index is 0, that is, it is considered that the consumption data in this area cannot reflect the real consumption level.

For example, the first screening rule may be: total population < C & total number of characteristic POIs < D & consumption amount < E, so as to screen out the areas where the consumption index is 0. The first server and the second server each screen out the areas whose own features satisfy the first screening rule, and transmit such areas to the collaborative server.
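As a minimal sketch of how a server might apply the first screening rule locally, the following Python fragment filters a list of regions with a threshold predicate. The `Region` record, its field names, the function names, and the sample data are illustrative assumptions and are not taken from the disclosure; only the form of the rule (population < C & POIs < D & consumption < E) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Hypothetical per-grid feature record held by one server.
    region_id: str
    population: int
    poi_count: int
    consumption: float


def first_rule(region, c, d, e):
    # First screening rule: population < C & POI count < D & consumption < E.
    return (region.population < c
            and region.poi_count < d
            and region.consumption < e)


def screen(regions, rule, **thresholds):
    # Return only the IDs; raw features never leave the server.
    return {r.region_id for r in regions if rule(r, **thresholds)}


regions = [
    Region("g1", population=2, poi_count=0, consumption=30.0),
    Region("g2", population=50, poi_count=4, consumption=900.0),
    Region("g3", population=8, poi_count=0, consumption=250.0),
]

# IDs that this server would send to the collaborative server.
first_screening_ids = screen(regions, first_rule, c=10, d=1, e=100)
```

Only region IDs are produced, which matches the description that the servers transmit the screened areas rather than their underlying features.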

In one embodiment, performing a screening operation on the multiple regions according to the screening rule to obtain the screening region information further includes: deleting the first overlapping region from the multiple regions to obtain the remaining regions; performing the screening operation on the remaining regions based on the second screening rule to obtain the third screening area information; and sending the third screening area information to the collaborative server and receiving the information of the second overlapping area sent by the collaborative server, wherein the information of the second overlapping area is used to indicate a reliable area, and the information of the second overlapping area is generated by the collaborative server according to the third screening area information and the fourth screening area information sent by the second server.

Taking the consumption index as an example, the reliable area refers to the area with high reliability of the consumption index, that is, the consumption data in this area is considered to reflect the real consumption level.

In addition, the second screening rule can be generated based on the above features. Taking the consumption index of the region as an example, the second screening rule is set as: regional population > A & total number of POIs > 0 & consumption amount >= B. Based on this second screening rule, the areas with high reliability of the consumption index can be screened out.

In this embodiment, the areas where the consumption index is 0 are screened out first, so that the remaining areas are those where the consumption index is not 0. The remaining areas are then divided into areas with high index reliability and areas with low index reliability. The areas with high index reliability are used for federated modeling, and the target features of the areas with low index reliability are then re-determined, thereby realizing the correction of the regional portrait.

In one embodiment, in step S204, determining the region to be corrected based on the overlapping region information includes: deleting the first overlapping region and the second overlapping region from the multiple regions to obtain the region to be corrected.

In this embodiment, by setting the first screening rule and the second screening rule, the first overlapping area, that is, the invalid area, and the second overlapping area, that is, the reliable area, among the multiple areas are determined. The first overlapping area and the second overlapping area are then eliminated from the multiple areas, and the remaining area is the area with low reliability, that is, the area that needs to be corrected. Through the above operations, the accuracy of the object of the correction operation can be guaranteed.
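The elimination described above amounts to a set difference over region IDs. The sketch below illustrates this under the assumption that regions are identified by string IDs; the function name and the sample IDs are hypothetical.

```python
def regions_to_correct(all_ids, invalid_ids, reliable_ids):
    """Remove the invalid and reliable regions; what remains needs correction."""
    invalid, reliable = set(invalid_ids), set(reliable_ids)
    # The second screening runs on the regions left after the first overlap
    # is deleted, so the two overlaps are expected to be disjoint.
    if invalid & reliable:
        raise ValueError("a region cannot be both invalid and reliable")
    return set(all_ids) - invalid - reliable


all_ids = {"g1", "g2", "g3", "g4", "g5"}
first_overlap = {"g1"}           # invalid areas (consumption index is 0)
second_overlap = {"g2", "g4"}    # reliable areas
to_correct = regions_to_correct(all_ids, first_overlap, second_overlap)
```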

As shown in Figure 3, in one embodiment, step S208 of correcting the area to be corrected based on the correction model, so as to correct the area portraits of the multiple areas, includes:

In step S302, the regional features of the region to be corrected are input into the correction model to output the corrected target features.

In step S304, the modified target features are used to replace the original target features in the region to be corrected, so as to update the target features of multiple regions.

Specifically, the trained model is used to perform inference on the regions with low reliability of the target index, and the target features of such regions are obtained anew to replace the original inaccurate target features.

Determining the target indices of the multiple regions based on the updated target features of the multiple regions includes:
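Steps S302 and S304 can be sketched as follows. The correction model is represented here by an arbitrary callable; `toy_model` is a hypothetical stand-in with made-up coefficients, not the federated model of the disclosure, and the dictionary layout is likewise an assumption.

```python
def apply_correction(target_features, region_features, to_correct, model):
    """Replace the target feature of each region to be corrected (S302, S304)."""
    updated = dict(target_features)  # leave the caller's data untouched
    for region_id in to_correct:
        # S302: feed the regional features to the correction model.
        # S304: overwrite the original, unreliable target feature.
        updated[region_id] = model(region_features[region_id])
    return updated


def toy_model(features):
    # Hypothetical stand-in for the trained correction model.
    population, poi_count = features
    return 10.0 * population + 50.0 * poi_count


consumption = {"g2": 900.0, "g3": 250.0}
features = {"g2": (50, 4), "g3": (8, 0)}
updated = apply_correction(consumption, features, {"g3"}, toy_model)
```

Regions outside the set to be corrected keep their original target features, so the result covers all regions as required for the subsequent index computation.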

Step S306, perform a clustering operation on the target features of multiple regions, and obtain multiple cluster centers and corresponding cluster clusters.

Step S308, sort the plurality of cluster centers, and configure a score interval corresponding to each cluster center.

In step S310, the clusters are matched to corresponding score intervals to generate target indices of multiple regions.

Step S312, correcting the regional portraits of the multiple regions based on the target index.

In this embodiment, the revised consumption data is clustered, and the data can be clustered into 5-10 categories, and the specific number is determined according to the scene or business requirements.

After the target features are clustered, the cluster centers are sorted from small to large, and the index data of each cluster are matched in turn to the corresponding score interval, so that the final index score lies between 0 and 100. The target index score is the corrected urban area target index. The resulting accurate portrait target index can be used for consumption analysis and regional consumption power estimation.
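The clustering and scoring steps (S306 to S310) can be illustrated with a small one-dimensional k-means written with the standard library only. The disclosure does not fix a clustering algorithm, an initialization, or the way values are placed inside a score interval; the quantile initialization, the equal-width intervals, and the linear placement within each interval used below are all assumptions made for the sake of a runnable sketch.

```python
def kmeans_1d(values, k, iters=50):
    """Plain Lloyd iterations on scalars with a deterministic quantile init."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def score_regions(consumption, k):
    """Map each region to a 0-100 score via sorted cluster centers."""
    centers = sorted(kmeans_1d(list(consumption.values()), k))  # S306, S308
    labels = {rid: min(range(k), key=lambda i: abs(v - centers[i]))
              for rid, v in consumption.items()}
    scores = {}
    for rank in range(k):  # S310: cluster of rank r gets [100r/k, 100(r+1)/k]
        members = {rid: consumption[rid]
                   for rid, lab in labels.items() if lab == rank}
        if not members:
            continue
        lo, hi = min(members.values()), max(members.values())
        for rid, v in members.items():
            frac = (v - lo) / (hi - lo) if hi > lo else 0.5
            scores[rid] = 100.0 * (rank + frac) / k
    return scores


consumption = {"a": 10.0, "b": 12.0, "c": 100.0,
               "d": 110.0, "e": 500.0, "f": 520.0}
scores = score_regions(consumption, k=3)
```

With this placement the top of one interval coincides with the bottom of the next, so the largest value of a cluster and the smallest value of the following cluster receive the same score; a half-open mapping could be used if strict separation is required.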

In one embodiment, determining the target indices of the multiple regions based on the updated target features of the multiple regions can also be achieved by the following step: inputting the target features of the multiple regions into a preset classification model, so that the classification model outputs the target index of the area to be corrected according to the classification result of the corrected target features, wherein the classification model is generated by supervised learning on historical target indices.

In one embodiment, invoking the overlapping area information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result, includes: receiving key information sent by the collaborative server; and invoking the key information and the overlapping area information to perform interactive encrypted training of a federated learning model with the second server, so as to generate the correction model.

In this embodiment, when correcting the urban area portrait index, federated learning can be used to correct the regional portrait index without enterprise data leaving its own database. This method trains the federated model using the regions whose portrait indicators have high reliability, and corrects the indicator features of the regions with low reliability. Compared with a regional portrait obtained from unilateral data, the regional portrait correction technology of the present disclosure can depict a more accurate urban regional portrait while protecting the security of multi-party data, so as to serve the later application scenarios of regional portraits.

In addition, provided that the original data does not leave its database, secure multi-party computation and other cross-domain modeling methods that preserve the security and privacy of multi-party data can also be used in place of the federated learning algorithm.

Specifically, the system architecture of federated learning is introduced by taking a scenario with two data owners (i.e., the first server and the second server) as an example. The architecture can be extended to scenarios involving multiple data owners. Suppose that the first server and the second server jointly train a machine learning model, and that their business systems hold relevant data about their respective users. In addition, the second server also holds the label data that the model needs to predict. For reasons of data privacy protection and security, the first server and the second server cannot directly exchange data, and a federated learning system can be used to build the model. The architecture of the federated learning system consists of three parts.

Part 1: Encrypted sample alignment. Since the user groups of the two parties do not completely overlap, the system uses encryption-based user sample alignment technology to confirm the common users of both parties without the first server or the second server disclosing its own data, and without exposing the users that do not overlap, so that the features of these common users can be combined for modeling.

Part 2: Encrypted model training. Once the shared user group is identified, the data can be used to train a machine learning model. To ensure the confidentiality of the data during training, a third-party collaborative server is needed for encrypted training.

As shown in Figure 4, taking the linear regression model as an example, the training process includes:

Step S402, the collaborative server distributes the public key to the first server and the second server to encrypt the data to be exchanged in the training process.

Step S404, the first server and the second server exchange the intermediate result for calculating the gradient in encrypted form.

Step S406, the first server and the second server each perform calculations based on the encrypted gradient values, while the second server calculates the loss according to its label data, and the results are aggregated to the collaborative server.

Step S408, the collaborative server calculates the total gradient value from the aggregated results and decrypts it.

Step S410, the collaborative server transmits the decrypted gradients back to the first server and the second server respectively.

Step S412, the first server and the second server update the parameters of the respective models according to the gradient.

Step S414, the above steps are iterated until the loss function converges to generate a revised model.

In the process of sample alignment and model training, the respective data of the first server and the second server are kept locally, and the data interaction during training will not lead to data privacy leakage. Therefore, the two parties can cooperate to train the model with the help of federated learning.
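The message flow of steps S402 to S414 can be simulated in a single process as follows. This is only a plaintext sketch: `encrypt` and `decrypt` are no-op placeholders marking where the public-key operations would occur, whereas a real deployment would typically rely on an additively homomorphic scheme (Paillier is one common choice, which is an assumption here and not stated in the text). The function names, the learning rate, the stopping tolerance, and the toy data are likewise hypothetical.

```python
def dot(weights, row):
    return sum(w * x for w, x in zip(weights, row))


def encrypt(values):
    # Placeholder for encryption with the collaborator's public key (S402).
    return list(values)


def decrypt(values):
    # Placeholder for decryption by the collaborative server (S408).
    return list(values)


def train(xa, xb, y, lr=0.1, tol=1e-12, max_iters=5000):
    """xa: features on the first server; xb and labels y: on the second."""
    n = len(y)
    wa = [0.0] * len(xa[0])
    wb = [0.0] * len(xb[0])
    loss = prev_loss = float("inf")
    for _ in range(max_iters):
        # S404: exchange intermediate results in (placeholder) encrypted form.
        ua = encrypt(dot(wa, row) for row in xa)
        ub_minus_y = encrypt(dot(wb, row) - yi for row, yi in zip(xb, y))
        residual = [a + b for a, b in zip(ua, ub_minus_y)]
        # S406: each party computes the gradient for its own weights;
        # the label holder also computes the loss.
        grad_a = [sum(r * row[j] for r, row in zip(residual, xa)) / n
                  for j in range(len(wa))]
        grad_b = [sum(r * row[j] for r, row in zip(residual, xb)) / n
                  for j in range(len(wb))]
        loss = sum(r * r for r in residual) / (2 * n)
        # S408, S410: the collaborator decrypts and returns the gradients.
        grad_a, grad_b = decrypt(grad_a), decrypt(grad_b)
        # S412: each party updates its own parameters locally.
        wa = [w - lr * g for w, g in zip(wa, grad_a)]
        wb = [w - lr * g for w, g in zip(wb, grad_b)]
        # S414: stop once the loss no longer changes noticeably.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return wa, wb, loss


# Toy aligned samples: the label equals 2 * (feature A) + 3 * (feature B).
xa = [[1.0], [2.0], [3.0], [4.0]]
xb = [[1.0], [0.0], [1.0], [0.0]]
y = [2.0 * a[0] + 3.0 * b[0] for a, b in zip(xa, xb)]
wa, wb, final_loss = train(xa, xb, y)
```

Each party only ever updates the weights for its own features, which mirrors the statement that the respective data stay local; the sketch does not demonstrate any privacy property, since nothing is actually encrypted.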

Part 3: Effect incentives. Institutions that provide more data obtain better-performing models, and the model performance depends on each data provider's contribution to itself and to others. The performance of these models is fed back to the institutions through the federation mechanism, which continues to motivate more institutions to join the data federation.

As shown in FIG. 5 , the server cluster 140 is specifically a collaborative server, and a method for correcting an area portrait according to another embodiment of the present disclosure includes:

Step S502, respectively receiving the screening area information sent by the first server and the second server.

Step S504, taking the intersection of the information of the screening area, and generating the information of the overlapping area.

Step S506, the information of the overlapping area is sent to the first server and the second server.

Step S508, perform an interactive training operation with the first server and/or the second server based on the overlapping area information, so that the first server and/or the second server generates a correction model according to the interactive training result and corrects its respective area to be corrected.

In this embodiment, on the collaborative server side, an intersection operation is performed on the screening regions sent by the first server and the screening regions sent by the second server to obtain the region IDs present on both the first server and the second server, so as to determine the overlapping regions among the multiple regions. The overlapping regions may include reliable regions, so that in training the correction model, the feature information of the overlapping regions stored on the first server and the feature information of the overlapping regions stored on the second server can be combined for model training. A correction model is obtained based on the fused data and is used to correct the portrait of the area to be corrected. On the one hand, this can improve the accuracy of the description of the area portrait, thereby improving the reliability of subsequent use of the area portrait. On the other hand, throughout the interaction process the collaborative server only assists in training, which helps to reduce the resource occupation of the collaborative server.
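Steps S502 to S506 on the collaborative server side can be sketched as a small class that collects ID sets, intersects them, and returns the result to every sender. The class and method names are hypothetical, and transport, authentication, and any encryption of the IDs are deliberately left out.

```python
class CollaborativeServer:
    """Collects screened region IDs and returns their intersection."""

    def __init__(self):
        self._received = {}

    def receive(self, server_name, region_ids):
        # S502: store the screening area information of one server.
        self._received[server_name] = frozenset(region_ids)

    def overlap(self):
        # S504: intersection of all ID sets received so far.
        if not self._received:
            return frozenset()
        return frozenset.intersection(*self._received.values())

    def broadcast(self):
        # S506: every participating server gets the same overlap.
        common = self.overlap()
        return {name: common for name in self._received}


server = CollaborativeServer()
server.receive("first", {"g1", "g2", "g3"})
server.receive("second", {"g2", "g3", "g9"})
replies = server.broadcast()
```

Because the intersection is taken over however many ID sets have been received, the same sketch extends to more than two data owners.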

In one embodiment, respectively receiving the screening area information sent by the first server and the second server includes: receiving the first screening information sent by the first server and the second screening information sent by the second server, so as to obtain the intersection of the first screening information and the second screening information; and receiving the third screening information sent by the first server and the fourth screening information sent by the second server, so as to obtain the intersection of the third screening information and the fourth screening information.

In this embodiment, on the collaborative server side, the first overlapping area, that is, the invalid area, is determined by comparing the first screening area with the second screening area, and the second overlapping area, that is, the reliable area, is determined by comparing the third screening area with the fourth screening area. The first overlapping area and the second overlapping area are then eliminated from the multiple areas on the first server side and the second server side, and the remaining area is the low-reliability area, that is, the area that needs to be corrected. Through the above operations, the accuracy of the object of the correction operation can be guaranteed.

In one embodiment, performing an interactive training operation with the first server and/or the second server based on the overlapping area information includes: sending key information to the first server and the second server respectively, so that the first server and/or the second server performs interactive encrypted training of the federated learning model based on the key information.

In this embodiment, by sending the key information to the first server and the second server, the regional portrait correction technology of the present disclosure can depict a more accurate urban area portrait while protecting the security of multi-party data, so as to serve later application scenarios of area portraits.

In the following, with reference to FIG. 6, the correction scheme for the regional portrait of the present disclosure is further described by taking the correction of the urban regional consumption index as an example, with the consumption data as the target feature and the consumption index as the target index.

Each organization (the first server 10 and the second server 20 are taken as examples, without limitation) divides the urban area into grids according to Geohash7 (Geohash6 or other division methods can also be used), and preprocesses data such as regional population, geographic location, regional consumption, and regional POI in the enterprise database to obtain the corresponding features of each regional grid. For example, the regional consumption features are obtained by matching order data to addresses and consumption amounts. Combining the data features of the two servers, the first screening rule (total population < 10 & total number of characteristic POIs < 1 & consumption amount < 100) is formulated to screen out areas where the consumption index is 0, and the second screening rule (total population > 3 & total number of POIs > 0 & consumption amount >= 100) is formulated to screen out areas with high reliability of the consumption index; the correction process is then completed in conjunction with the collaborative server 30.
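To make the Geohash7 grid division concrete, the sketch below implements the standard public Geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character), so that a coordinate can be mapped to a 7-character grid ID. The function name is an assumption, and the disclosure does not specify how the servers compute their grid cells.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=7):
    """Return the Geohash cell ID of the given length for a coordinate."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits yield one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


cell_id = geohash_encode(57.64911, 10.40744, precision=7)
```

Orders, POIs, and population records whose coordinates encode to the same cell ID would then be aggregated into the features of that grid; a shorter precision such as 6 gives a prefix of the same ID and therefore a coarser cell.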

Step S602, the first server and the second server respectively screen out areas whose own characteristics satisfy the first screening rule, and transmit such areas to the collaborative server.

Step S604, the collaborative server collects the area ID sets transmitted by each server, and then takes the intersection of the ID sets to obtain the information of the first overlapping area.

Step S606, the collaborative server transmits the information of the first overlapping area to each server.

Here, the intersection area is defined as the area where the consumption index is 0, and the remaining areas comprise the areas where the reliability of the consumption index is high and the areas where it is low.

Combining the characteristics of the two platforms again, the second screening rule is applied to the remaining areas; the consumption data of the areas that satisfy it is considered to reflect the real consumption level of the area.

Step S608, after deleting the areas where the consumption index is 0, the first server and the second server each screen out the areas whose own features satisfy the second screening rule, and transmit these areas to the collaborative server.

Step S610, the collaborative server collects the area ID sets transmitted by each server, and then takes the intersection of the ID sets to obtain the information of the second overlapping area.

Step S612, the cooperative server transmits the information of the second overlapping area to each server.

Here, the intersection area is defined as the area with high reliability of the consumption index.

In step S614, the areas where the index is 0 and the areas with high reliability are eliminated, and the remaining areas are the areas to be corrected, whose consumption index has low reliability.

Step S616, first align the IDs of the low-reliability and high-reliability regions of both parties, use the high-reliability regions as training data for federated modeling, train the model multiple times with adjusted parameters, and select appropriate parameters to train the best model as the correction model; the two servers each save the model locally.

In addition, different federated models (such as federated boosting, federated forest, etc.) can also be tried and tuned multiple times to select appropriate parameters and train the best model, and the two platforms save the models locally.

Step S618, use the trained model to perform inference on the areas with low reliability of the consumption index, and obtain the consumption data of such areas anew to replace the original inaccurate regional consumption data.

In step S620, the revised consumption data are clustered to obtain revised urban area consumption indicators.

Specifically, the data can generally be clustered into 5 to 10 categories, and the specific number is determined according to the scenario or business requirements. After the consumption data is clustered, the cluster centers are sorted from small to large, and the index data of each cluster are matched in turn to the corresponding score interval, so that the final index score lies between 0 and 100. The consumption index score is the corrected urban regional consumption index. The resulting accurate portrait consumption index can be used for consumption analysis and regional consumption power estimation.

It should be noted that the above-mentioned drawings are only schematic illustrations of the processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It is easy to understand that the processes shown in the above figures do not indicate or limit the chronological order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, in multiple modules.

As will be appreciated by one skilled in the art, various aspects of the present invention may be implemented as a system, method or program product. Therefore, various aspects of the present invention can be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module" or "system".

Next, referring to FIG. 7 , an apparatus 700 for correcting a region portrait according to this embodiment of the present invention will be described. The apparatus 700 for correcting a region portrait shown in FIG. 7 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present invention.

The region portrait correction device 700 is represented in the form of hardware modules. The components of the region portrait correction device 700 may include, but are not limited to: a transmission module 702, configured to send the screening region information screened out from multiple regions to the collaborative server, so as to receive the overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to the screening region information sent by the first server and the screening region information sent by the second server; a determination module 704, configured to determine the region to be corrected based on the overlapping region information; an interactive training module 706, configured to invoke the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result; and a correction module 708, configured to correct the region to be corrected based on the correction model, so as to correct the region portraits of the multiple regions.

Next, referring to FIG. 8 , an apparatus 800 for correcting a region portrait according to this embodiment of the present invention will be described. The apparatus 800 for correcting a region portrait shown in FIG. 8 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present invention.

The region portrait correction device 800 is represented in the form of hardware modules. The components of the region portrait correction device 800 may include, but are not limited to: a receiving module 802, configured to respectively receive the screening region information sent by the first server and the second server; a processing module 804, configured to take the intersection of the screening region information and generate the overlapping region information; a sending module 806, configured to send the overlapping region information to the first server and the second server; and an auxiliary training module 808, configured to perform auxiliary interactive training with the first server and/or the second server based on the overlapping region information, so that the first server and/or the second server generates a correction model according to the interactive training result and corrects its respective region to be corrected based on the correction model.

An electronic device 900 according to this embodiment of the present invention is described below with reference to FIG. 9 . The electronic device 900 shown in FIG. 9 is only an example, and should not impose any limitations on the function and scope of use of the embodiments of the present invention.

As shown in FIG. 9, electronic device 900 takes the form of a general-purpose computing device. Components of the electronic device 900 may include, but are not limited to, the above-mentioned at least one processing unit 910 , the above-mentioned at least one storage unit 920 , and a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910 ).

The storage unit stores program code, which can be executed by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present invention described in the above-mentioned "Exemplary Methods" section of this specification. For example, the processing unit 910 may perform steps S202 to S208 as shown in FIG. 2, and other steps defined in the method for correcting a region portrait of the present disclosure.

The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 9201 and/or a cache storage unit 9202 , and may further include a read only storage unit (ROM) 9203 .

The storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, including, but not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 930 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

Electronic device 900 may also communicate with one or more external devices 960 (e.g., keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 950. As shown, network adapter 950 communicates with other modules of electronic device 900 via bus 930. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the description of the above embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein may be implemented by software, or may be implemented by software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to an embodiment of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium on which a program product capable of implementing the above-described method of the present specification is stored. In some possible implementations, various aspects of the present invention can also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to various exemplary embodiments of the present invention described in the above-mentioned "Exemplary Methods" section of this specification.

A program product for implementing the above method according to an embodiment of the present invention may adopt a portable compact disc read only memory (CD-ROM) and include program codes, and may run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal in baseband or as part of a carrier wave with readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A readable signal medium can also be any readable medium, other than a readable storage medium, that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

It should be noted that although several modules or units of the apparatus for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied in multiple modules or units.

Additionally, although the various steps of the methods of the present disclosure are depicted in the figures in a particular order, this does not require or imply that the steps must be performed in the particular order or that all illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, and the like.

From the description of the above embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein may be implemented by software, or may be implemented by software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of software products, and the software products may be stored in a non-volatile storage medium (which may be CD-ROM, U disk, mobile hard disk, etc.) or on the network , including several instructions to cause a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to an embodiment of the present disclosure.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow its general principles and include common general knowledge or customary techniques in the technical field that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the appended claims.

Industrial Applicability

In the solution provided by the present disclosure, the screening area information is sent to the collaborative server, and the overlapping area information, which the collaborative server obtains by combining the screening area information of the first server with the screening area information of the second server, is received. Determining the overlapping area information not only allows the overlapping area to be eliminated from the multiple areas to obtain the area to be corrected, but also enables data fusion with the second server. Further, the correction model is obtained based on the fused data, and the correction model is used to correct the portrait of the area to be corrected. On the one hand, this improves the accuracy of the description of the area portrait and the reliability of the subsequent use of the area portrait. On the other hand, throughout the interaction process the collaborative server is used to assist training, which helps reduce the resource occupation of the collaborative server.

Claims

A method for correcting an area portrait, applicable to a first server, characterized in that it comprises:

sending screening area information screened out from multiple areas to a collaborative server, so as to receive information of an overlapping area sent by the collaborative server, wherein the information of the overlapping area is generated by the collaborative server according to the screening area information sent by the first server and screening area information sent by a second server;

determining an area to be corrected based on the information of the overlapping area;

invoking the overlapping area information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result; and

correcting the area to be corrected based on the correction model, so as to correct the area portraits of the multiple areas.
The method for correcting an area portrait according to claim 1, wherein the sending of the screening area information screened out from the multiple areas to the collaborative server, so as to receive the information of the overlapping area sent by the collaborative server, comprises:

performing a screening operation on the multiple areas based on a first screening rule to obtain first screening area information; and

sending the first screening area information to the collaborative server, and receiving information of a first overlapping area sent by the collaborative server,

wherein the information of the first overlapping area is used to indicate an invalid area, and is generated by the collaborative server according to the first screening area information and second screening area information sent by the second server.
The method for correcting a region portrait according to claim 2, wherein performing the screening operation on the plurality of regions according to a screening rule to obtain the screening region information further comprises:

deleting the first overlapping area from the plurality of regions to obtain a remaining area;

performing a screening operation on the remaining area based on a second screening rule to obtain third screening area information; and

sending the third screening area information to the collaborative server, and receiving information of a second overlapping area sent by the collaborative server,

wherein the information of the second overlapping area is used to indicate a reliable area, and is generated by the collaborative server according to the third screening area information and fourth screening area information sent by the second server.
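The two screening rounds described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the area attributes (`population`, `samples`), both screening rules, the helper names, and the stand-in for the collaborative server are all hypothetical. It further assumes that the area to be corrected is whatever remains after the invalid and the reliable overlapping areas are removed, which is consistent with the statement that the overlapping area is eliminated from the multiple areas.

```python
from typing import Callable, Dict, Set

Area = Dict[str, float]


def screen(areas: Dict[str, Area], rule: Callable[[Area], bool]) -> Set[str]:
    """Return the identifiers of the areas that satisfy a screening rule."""
    return {area_id for area_id, area in areas.items() if rule(area)}


def fake_collaborative_server(local_ids: Set[str], peer_ids: Set[str]) -> Set[str]:
    """Hypothetical stand-in for the collaborative server: intersect both sets."""
    return local_ids & peer_ids


# Hypothetical local data held by the first server.
areas = {
    "a1": {"population": 0.0, "samples": 0.0},
    "a2": {"population": 1200.0, "samples": 300.0},
    "a3": {"population": 800.0, "samples": 12.0},
    "a4": {"population": 950.0, "samples": 220.0},
}
# Hypothetical screening results reported by the second server.
peer_invalid = {"a1"}
peer_reliable = {"a2", "a4", "a9"}

# Round 1: the first screening rule flags candidate invalid areas.
first_screened = screen(areas, lambda a: a["population"] == 0)
invalid = fake_collaborative_server(first_screened, peer_invalid)
remaining = {k: v for k, v in areas.items() if k not in invalid}

# Round 2: the second screening rule flags candidate reliable areas.
third_screened = screen(remaining, lambda a: a["samples"] >= 100)
reliable = fake_collaborative_server(third_screened, peer_reliable)

# Assumption: every remaining area that is not reliable needs correction.
to_correct = set(remaining) - reliable
print(sorted(to_correct))
```

In this sketch the reliable areas would serve as the shared training set, while the leftover areas receive corrected values from the trained model.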
The method for correcting a region portrait according to claim 1, wherein correcting the region to be corrected based on the correction model, so as to correct the region portraits of the multiple regions, comprises:

inputting the regional feature of the region to be corrected into the correction model to output a corrected target feature;

replacing the original target feature of the region to be corrected with the corrected target feature, so as to update the target features of the multiple regions;

determining target indices of the multiple regions based on the updated target features of the multiple regions; and

correcting the region portraits of the multiple regions based on the target indices.
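The feature replacement step above can be illustrated with a short sketch. It assumes the correction model is any callable that maps a regional feature vector to a target feature. The linear model, its weights, and all region identifiers below are hypothetical placeholders rather than the trained federated model.

```python
from typing import Callable, Dict, Iterable, List


def apply_correction(
    target_features: Dict[str, float],
    regional_features: Dict[str, List[float]],
    to_correct: Iterable[str],
    model: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Return a copy of target_features with corrected values substituted."""
    updated = dict(target_features)  # leave the caller's mapping untouched
    for region_id in to_correct:
        updated[region_id] = model(regional_features[region_id])
    return updated


def linear_model(features: List[float]) -> float:
    """Hypothetical stand-in for the correction model (weights are made up)."""
    weights = [2.0, 0.5]
    bias = 1.0
    return sum(w * x for w, x in zip(weights, features)) + bias


original = {"r1": 4.0, "r2": 7.0, "r3": 0.0}
regional = {"r1": [1.0, 2.0], "r2": [2.0, 4.0], "r3": [3.0, 4.0]}

updated = apply_correction(original, regional, ["r3"], linear_model)
print(updated)
```

Only the regions listed for correction change. The target features of all other regions pass through unmodified, so the later index computation sees one consistent, updated set.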
The method for correcting an area portrait according to claim 4, wherein the determining of the target indices of the plurality of areas based on the updated target features of the plurality of areas comprises:

performing a clustering operation on the target features of the multiple regions to obtain multiple cluster centers and the corresponding clusters;

sorting the multiple cluster centers, and configuring a score interval corresponding to each cluster center; and

matching the clusters to the corresponding score intervals to generate the target indices of the multiple regions.
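The clustering, sorting, and interval matching steps can be sketched as below. The sketch makes several assumptions the claim does not state: target features are one-dimensional, the clustering algorithm is a plain k-means with deterministic initialisation, the score range is 0 to 100 split into equal-width intervals, and the number of clusters does not exceed the number of regions. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int, iterations: int = 20) -> Tuple[List[float], List[int]]:
    """Plain 1-D k-means; initial centers are spread over the sorted values."""
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    def assign() -> List[int]:
        return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = sum(members) / len(members)
    return centers, assign()


def target_indices(
    features: Dict[str, float], k: int, score_range: Tuple[float, float] = (0.0, 100.0)
) -> Dict[str, Tuple[float, float]]:
    """Map each region to the score interval of the cluster it belongs to."""
    ids = list(features)
    centers, labels = kmeans_1d([features[i] for i in ids], k)
    # Sort clusters by center so that a larger center gets a higher interval.
    order = sorted(range(k), key=lambda c: centers[c])
    low, high = score_range
    interval = {
        c: (low + (high - low) * rank / k, low + (high - low) * (rank + 1) / k)
        for rank, c in enumerate(order)
    }
    return {i: interval[lab] for i, lab in zip(ids, labels)}


features = {"r1": 1.0, "r2": 1.2, "r3": 5.0, "r4": 5.4, "r5": 9.8, "r6": 10.0}
indices = target_indices(features, k=3)
for region_id, score_interval in indices.items():
    print(region_id, score_interval)
```

Equal-width intervals are only one possible configuration. Intervals of different widths, or a single score per cluster, would fit the same sort-then-match structure.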
The method for correcting an area portrait according to claim 4, wherein the determining of the target indices of the plurality of areas based on the updated target features of the plurality of areas comprises:

inputting the target features of the multiple regions into a preset classification model, so that the classification model outputs the target indices of the multiple regions according to the classification results of the corrected target features,

wherein the classification model is generated by training on historical target indices based on supervised learning.
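As an illustration of the supervised alternative, the sketch below trains a nearest-centroid classifier on historical pairs of target features and target indices, then uses it to assign an index to new features. The claim does not name a model type, so the nearest-centroid classifier is a substitute chosen for brevity, and the historical data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train_nearest_centroid(
    samples: Sequence[Tuple[Sequence[float], int]]
) -> Dict[int, List[float]]:
    """Compute one mean feature vector (centroid) per historical target index."""
    grouped: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for features, index in samples:
        grouped[index].append(features)
    return {
        index: [sum(column) / len(rows) for column in zip(*rows)]
        for index, rows in grouped.items()
    }


def predict(centroids: Dict[int, List[float]], features: Sequence[float]) -> int:
    """Return the target index whose centroid is closest to the features."""

    def squared_distance(centroid: List[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(centroids, key=lambda index: squared_distance(centroids[index]))


# Hypothetical history: (target features, target index) pairs.
history = [
    ([1.0, 1.0], 1),
    ([1.0, 3.0], 1),
    ([8.0, 8.0], 3),
    ([10.0, 8.0], 3),
]
centroids = train_nearest_centroid(history)

low_index = predict(centroids, [2.0, 2.0])
high_index = predict(centroids, [7.0, 9.0])
print(low_index, high_index)
```

Any supervised classifier with a fit and predict interface could take the place of this stand-in without changing the surrounding flow.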
The method for correcting an area portrait according to any one of claims 1 to 6, wherein the invoking of the overlapping area information to perform the interactive training operation with the collaborative server and the second server, so as to generate the correction model according to the interactive training result, comprises:

receiving key information sent by the collaborative server; and

invoking the key information and the overlapping area information to perform interactive encrypted training of a federated learning model with the second server, so as to generate the correction model.
A method for correcting an area portrait, applicable to a collaborative server, characterized in that it comprises:

respectively receiving screening area information sent by a first server and a second server;

taking the intersection of the screening area information to generate information of an overlapping area;

sending the information of the overlapping area to the first server and the second server; and

performing an interactive training operation with the first server and/or the second server based on the overlapping area information, so that the first server and/or the second server generate a correction model according to the interactive training result and correct their respective regions to be corrected based on the correction model.
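The intersection step performed by the collaborative server can be sketched in a few lines. The sketch assumes the screening area information is a collection of area identifiers exchanged in plain form; it omits any encryption of those identifiers, and the identifiers and function name are hypothetical.

```python
from typing import Iterable, Set


def overlapping_areas(first_ids: Iterable[str], second_ids: Iterable[str]) -> Set[str]:
    """Return the area identifiers reported by both servers."""
    return set(first_ids) & set(second_ids)


# Hypothetical area identifiers screened by each server.
first_server_screened = ["area_01", "area_02", "area_05"]
second_server_screened = ["area_02", "area_03", "area_05"]

overlap = overlapping_areas(first_server_screened, second_server_screened)
print(sorted(overlap))  # ['area_02', 'area_05']
```

The resulting set is what the collaborative server would send back to both servers as the information of the overlapping area.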
The method for correcting a region portrait according to claim 8, wherein the respectively receiving of the screening region information sent by the first server and the second server comprises:

receiving first screening information sent by the first server and second screening information sent by the second server, so as to obtain the intersection of the first screening information and the second screening information; and

receiving third screening information sent by the first server and fourth screening information sent by the second server, so as to obtain the intersection of the third screening information and the fourth screening information.
The method for correcting an area portrait according to claim 8, wherein the performing of the interactive training operation with the first server and/or the second server based on the overlapping area information comprises:

sending key information to the first server and the second server respectively, so that the first server and/or the second server perform interactive encrypted training of a federated learning model based on the key information.
A device for correcting an area portrait, applicable to a first server, characterized in that it comprises:

a transmission module, configured to send screening area information screened out from multiple areas to a collaborative server, so as to receive information of an overlapping area sent by the collaborative server, wherein the information of the overlapping area is generated by the collaborative server according to the screening area information sent by the first server and screening area information sent by a second server;

a determining module, configured to determine an area to be corrected based on the information of the overlapping area;

an interactive training module, configured to invoke the overlapping area information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to the interactive training result; and

a correction module, configured to correct the area to be corrected based on the correction model, so as to correct the area portraits of the multiple areas.
A device for correcting an area portrait, applicable to a collaborative server, characterized in that it comprises:

a receiving module, configured to respectively receive screening area information sent by a first server and a second server;

a processing module, configured to take the intersection of the screening area information to generate information of an overlapping area;

a sending module, configured to send the information of the overlapping area to the first server and the second server; and

an auxiliary training module, configured to perform auxiliary interactive training with the first server and/or the second server based on the overlapping area information, so that the first server and/or the second server generate a correction model according to the interactive training result and correct their respective regions to be corrected based on the correction model.
An electronic device, comprising:

a processor; and

a memory for storing executable instructions for the processor;

wherein the processor is configured to execute, by executing the executable instructions, the method for correcting a region portrait according to any one of claims 1 to 7 and/or the method for correcting a region portrait according to any one of claims 8 to 10.
A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the method for correcting a region portrait according to any one of claims 1 to 10 is implemented.