WO2022105554A1 - Region portrait correction method and apparatus, and electronic device and readable storage medium - Google Patents


Info

Publication number
WO2022105554A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
information
area
screening
region
Application number
PCT/CN2021/126483
Other languages
French (fr)
Chinese (zh)
Inventor
王若兰
刘洋
张钧波
郑宇
Original Assignee
京东城市(北京)数字科技有限公司
Application filed by 京东城市(北京)数字科技有限公司
Publication of WO2022105554A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the technical field of machine learning, and in particular to a method, apparatus, electronic device, and computer-readable storage medium for correcting a region portrait.
  • The purpose of the present disclosure is to provide a method, apparatus, electronic device, and computer-readable storage medium for correcting a region portrait, so as to overcome, at least to a certain extent, the poor descriptive accuracy of region portraits in the related art.
  • A method for correcting a region portrait, comprising: sending screening region information, selected from a plurality of regions, to a collaborative server, and receiving overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server from the screening region information sent by a first server and the screening region information sent by a second server; determining a region to be corrected based on the overlapping region information; invoking the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and correcting the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
  • Sending the screening region information selected from the plurality of regions to the collaborative server and receiving the overlapping region information includes: performing a screening operation on the plurality of regions based on a first screening rule to obtain first screening region information; and sending the first screening region information to the collaborative server and receiving first overlapping region information from it, wherein the first overlapping region information indicates invalid regions and is generated by the collaborative server from the first screening region information and second screening region information sent by the second server.
  • Performing the screening operation on the plurality of regions further includes: deleting the first overlapping regions from the plurality of regions to obtain the remaining regions; performing a screening operation on the remaining regions based on a second screening rule to obtain third screening region information; and sending the third screening region information to the collaborative server and receiving second overlapping region information from it, wherein the second overlapping region information indicates reliable regions and is generated by the collaborative server from the third screening region information and fourth screening region information sent by the second server.
  • Correcting the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions, includes: inputting the region features of the region to be corrected into the correction model to output corrected target features; replacing the original target features of the region to be corrected with the corrected target features, so as to update the target features of the plurality of regions, and determining target indices of the plurality of regions based on the updated target features; and correcting the region portraits of the plurality of regions based on the target indices.
  • Determining the target indices of the plurality of regions based on the updated target features includes: performing a clustering operation on the target features of the plurality of regions to obtain a plurality of cluster centers and the corresponding clusters; sorting the cluster centers and configuring a score interval for each cluster center; and matching each cluster to its score interval to generate the target indices of the plurality of regions.
  • Alternatively, determining the target indices includes: inputting the target features of the plurality of regions into a preset classification model, which outputs the target indices of the plurality of regions according to its classification of the corrected target features, wherein the classification model is generated by supervised training on historical target indices.
  • Invoking the overlapping region information to perform the interactive training operation with the collaborative server and the second server, so as to generate the correction model, includes: receiving key information sent by the collaborative server; and invoking the key information and the overlapping region information to perform interactive encrypted training of a federated learning model with the second server, so as to generate the correction model.
  • A method for correcting a region portrait, comprising: receiving screening region information sent by a first server and by a second server respectively; taking the intersection of the screening region information to generate overlapping region information; sending the overlapping region information to the first server and the second server; and performing, based on the overlapping region information, an interactive training operation with the first server and/or the second server, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its respective region to be corrected based on the correction model.
  • Receiving the screening region information sent by the first server and the second server includes: receiving first screening information sent by the first server and second screening information sent by the second server, and taking the intersection of the first and second screening information; and receiving third screening information sent by the first server and fourth screening information sent by the second server, and taking the intersection of the third and fourth screening information.
  • Performing the interactive training operation with the first server and/or the second server based on the overlapping region information includes: sending key information to the first server and the second server respectively, so that the first server and/or the second server perform interactive encrypted training of a federated learning model based on the key information.
  • An apparatus for correcting a region portrait, comprising: a sending module, configured to send screening region information screened from a plurality of regions to a collaborative server and to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server from the screening region information sent by a first server and that sent by a second server; a determination module, configured to determine a region to be corrected based on the overlapping region information; an interactive training module, configured to invoke the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and a correction module, configured to correct the region to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
  • An apparatus for correcting a region portrait, comprising: a transmission module, configured to receive screening region information sent by a first server and by a second server respectively; a processing module, configured to take the intersection of the screening region information to generate overlapping region information; a sending module, configured to send the overlapping region information to the first server and the second server; and an auxiliary training module, configured to perform auxiliary interactive training with the first server and/or the second server based on the overlapping region information, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its respective region to be corrected based on the correction model.
  • An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform any one of the foregoing methods for correcting a region portrait by executing the executable instructions.
  • A computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the foregoing methods for correcting a region portrait.
  • By sending the screening region information to the collaborative server and receiving the overlapping region information that the collaborative server derives from the screening region information of both the first server and the second server, the overlapping regions can be eliminated from the plurality of regions to obtain the region to be corrected, and data fusion with the second server can also be achieved.
  • The correction model is obtained from the fused data and is used to correct the portrait of the region to be corrected.
  • On the one hand, this improves the descriptive accuracy of the region portrait, and thereby the reliability of its subsequent use; on the other hand, because the collaborative server only assists the training, its resource occupation is reduced.
  • FIG. 1 shows a schematic diagram of the structure of a correction system for a region portrait in an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of another method for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of still another method for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of yet another method for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 6 shows an interactive schematic diagram of a correction scheme of a region portrait according to an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of an apparatus for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of another apparatus for correcting a region portrait in an embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of an electronic device in an embodiment of the present disclosure.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments can be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
  • The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • The solution provided by the present application obtains a correction model from fused data and uses it to correct the portrait of the region to be corrected. This improves the descriptive accuracy of the region portrait and thereby the reliability of its subsequent use, while the collaborative server only assists the training, which reduces its resource occupation.
  • City Profile (a multi-factor profiling product based on big data and machine learning) is a SaaS product for planning, real estate, retail, and many other GIS application industries. Its innovation is twofold. On the one hand, it combines machine-learning computation with interactive visualization to overcome the limitations of traditional GIS in exploring and analyzing multi-dimensional and high-dimensional spatiotemporal data. On the other hand, it integrates massive urban data, Spark/Elasticsearch big-data processing engines, distributed computing, online data processing, online index calculation, and multi-factor mining analysis into an easy-to-use and powerful SaaS service that lowers the professional barrier of GIS and empowers every user, making data acquisition, data processing, and multi-factor spatial data mining efficient and easy. It also supports secondary development through an API/SDK and can easily be connected to a user's existing platform.
  • Multi-party lending refers to bad borrowers who borrow from one financial institution to repay another. A large volume of such illegal behavior can bring down the entire financial system. To find such users, the traditional approach is for financial institutions to query a central database to which every institution must upload all of its user information; doing so, however, exposes important user privacy and compromises the data security of the financial institutions, and it is not permitted under the GDPR. Under a federated learning mechanism, no central database is needed: any financial institution participating in the federation can send a query about a new user to the other institutions, and they answer questions about that user's local lending without learning the user's specific information. This protects the privacy and data integrity of each institution's existing users while still accomplishing the important task of detecting multi-party lending.
  • FIG. 1 shows a schematic structural diagram of a system for correcting a region portrait in an embodiment of the present disclosure, which includes multiple terminals 120 and a server cluster 140.
  • The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet computer, an e-book reader, smart glasses, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a smart home device, an AR (Augmented Reality) device, or a VR (Virtual Reality) device; alternatively, the terminal 120 may be a personal computer (PC), such as a laptop or desktop computer.
  • An application program for correcting region portraits may be installed on the terminal 120.
  • The terminal 120 and the server cluster 140 are connected through a communication network.
  • The communication network is a wired network or a wireless network.
  • The server cluster 140 is a single server, a group of several servers, a virtualization platform, or a cloud computing service center.
  • The server cluster 140 provides background services for the region portrait correction application and for the training application of the traffic prediction model.
  • The server cluster 140 undertakes the main computing work and the terminal 120 the secondary computing work; alternatively, the server cluster 140 undertakes the secondary computing work and the terminal 120 the main computing work; or the terminal 120 and the server cluster 140 compute collaboratively using a distributed computing architecture.
  • The server cluster 140 stores the correction model and the prediction method for the region portrait.
  • The clients of the application installed on different terminals 120 are the same, or the clients installed on two terminals 120 are clients of the same type of application on different operating system platforms.
  • The specific form of the client may also differ; for example, it may be a mobile phone client, a PC client, or a World Wide Web (Web) client.
  • The number of terminals 120 may be larger or smaller: there may be only one terminal, or dozens, hundreds, or more.
  • The embodiments of the present application do not limit the number or device type of the terminals.
  • The system may further include a management device (not shown in FIG. 1), which is connected to the server cluster 140 through a communication network.
  • The communication network is a wired network or a wireless network.
  • The wireless or wired network uses standard communication technologies and/or protocols.
  • The network is usually the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, a virtual private network, or any combination thereof.
  • Data exchanged over the network is represented using technologies and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like.
  • In addition, all or some of the links can be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
  • Custom and/or dedicated data communication techniques may also be used in place of or in addition to the above data communication techniques.
  • FIG. 2 shows a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure.
  • The methods provided in the embodiments of the present disclosure may be performed by any electronic device with computing and processing capability, for example the terminal 120 and/or the server cluster 140 in FIG. 1.
  • In the following, the server cluster 140 is taken as the execution subject for illustration.
  • There may be multiple first servers and second servers, and the collaborative server interacts with the first server and the second server respectively.
  • Here the server cluster 140 is specifically the first server, and the method for correcting the region portrait includes the following steps:
  • Step S202: send the screening region information screened from multiple regions to the collaborative server, and receive the overlapping region information sent by the collaborative server, where the overlapping region information is generated by the collaborative server from the screening region information sent by the first server and the screening region information sent by the second server.
  • The screening region information may specifically be the IDs of the screened regions.
  • The multiple regions may be generated by dividing the map into a grid using Geohash6, Geohash7, or another division method, combined with feature data in the enterprise database such as regional population, geographic location, and regional POIs (points of interest).
  • The regional features of each grid cell are obtained through preprocessing, and regional consumption is taken as the target feature of the region, so that the region portrait is obtained from these features and the target feature.
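The grid division above can be sketched as follows. This is a minimal illustration, assuming WGS84 latitude/longitude inputs and per-record features that are simply summed per cell; the helper names (`geohash_encode`, `aggregate_by_cell`) and field names (`population`, `poi_count`) are hypothetical, and a production system would more likely rely on an existing Geohash library or a database's geohash support.

```python
from collections import defaultdict

# Standard Geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a Geohash with `precision` characters (6 or 7 here)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits = [], 0, 0
    even = True  # Geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        n_bits += 1
        if n_bits == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def aggregate_by_cell(records, precision=6):
    """Sum the (hypothetical) numeric features of each record into its grid cell."""
    cells = defaultdict(lambda: defaultdict(float))
    for rec in records:
        cell = geohash_encode(rec["lat"], rec["lon"], precision)
        for key, value in rec.items():
            if key not in ("lat", "lon"):
                cells[cell][key] += value
    return {cell: dict(feats) for cell, feats in cells.items()}
```

Passing `precision=7` yields the finer Geohash7 grid; each resulting cell ID can then serve as the region ID exchanged in step S202.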
  • The collaborative server can be understood as a collaboration platform.
  • On the one hand, the collaboration platform performs screening operations on the screening region information sent by the first server and the second server; on the other hand, it assists the first server and/or the second server in model training while ensuring the security of their feature information for the multiple regions.
  • The second server is specifically a server that exchanges parameters with the first server; together with the collaborative server, it can perform the same processing as the first server. The first server and the second server may store different regional features of the same region.
  • Step S204: determine the region to be corrected based on the overlapping region information.
  • The screening of the multiple regions can be performed based on screening rules, and the collaborative server intersects the regions screened out by each server under those rules, so as to determine the regions that are shared by the first server and the second server and satisfy the screening rules on both.
  • The purpose of the screening is to find regions that do not need correction; the regions to be corrected are then obtained by eliminating those regions from the multiple regions.
  • The collaborative server intersects the screening region information sent by the first server and the second server to obtain the overlapping region information.
  • On the one hand, the regions to be corrected can be obtained by eliminating the overlapping regions from the multiple regions; on the other hand, once the overlapping region information is available, relevant data of the second server can be introduced on that basis, achieving data fusion between different servers.
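The collaborative server's intersection step can be sketched as below. The source only states that the collaborative server intersects the screening region information (for example, region IDs). The HMAC blinding with a salt shared by the data servers, but not by the collaborative server, is an added assumption so that raw IDs are not sent in the clear. It is a simplification, not a cryptographic private set intersection protocol, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Set


def blind_ids(region_ids: Iterable[str], shared_salt: bytes) -> Dict[str, str]:
    """Runs on a data server: map an HMAC token for each region ID back to the ID."""
    return {
        hmac.new(shared_salt, rid.encode("utf-8"), hashlib.sha256).hexdigest(): rid
        for rid in region_ids
    }


def collaborative_intersection(*token_sets: Iterable[str]) -> Set[str]:
    """Runs on the collaborative server: keep tokens reported by every server."""
    sets = [set(s) for s in token_sets]
    return set.intersection(*sets) if sets else set()


def unblind(overlap_tokens: Iterable[str], local_map: Dict[str, str]) -> Set[str]:
    """Runs on a data server: translate overlapping tokens back to region IDs."""
    return {local_map[tok] for tok in overlap_tokens if tok in local_map}


if __name__ == "__main__":
    salt = b"hypothetical-shared-secret"
    first_map = blind_ids(["wx4g0b", "wx4g0c", "wx4g0d"], salt)   # first server
    second_map = blind_ids(["wx4g0c", "wx4g0d", "wx4g0e"], salt)  # second server
    overlap = collaborative_intersection(first_map.keys(), second_map.keys())
    print(sorted(unblind(overlap, first_map)))  # ['wx4g0c', 'wx4g0d']
```

Because Geohash IDs come from a small, enumerable space, anyone holding the salt could brute-force the tokens; the salt therefore has to stay with the data servers.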
  • Step S206: invoke the overlapping region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result.
  • The overlapping region information can be used as training data, and the correction model is generated through interactive training with the collaborative server and the second server.
  • Step S208: correct the region to be corrected based on the correction model, so as to correct the region portraits of the multiple regions.
  • The correction model corrects target features with poor reliability in the multiple regions, for example the target features of regions whose consumption index has low reliability; the target feature may be consumption data.
  • A region portrait can be understood as a portrait generated from features such as regional population, geographic location, regional POIs (points of interest), and regional consumption.
  • The multiple regions may include the overlapping regions and the regions to be corrected.
  • By sending the screening region information to the collaborative server and receiving the overlapping region information that the collaborative server derives from the screening region information of the first server and the second server, the overlapping regions can be eliminated from the multiple regions to obtain the regions to be corrected, and data fusion with the second server can be achieved.
  • The correction model obtained from the fused data corrects the portrait of the region to be corrected, which improves the descriptive accuracy of the region portrait and thereby the reliability of its subsequent use; meanwhile, because the collaborative server only assists the training, its resource occupation is reduced.
  • Subsequent uses of region portraits include business scenarios such as regional site selection and advertisement placement.
  • Sending the screening region information screened from the multiple regions to the collaborative server and receiving the overlapping region information includes: performing a screening operation on the multiple regions based on a first screening rule to obtain first screening region information; and sending the first screening region information to the collaborative server and receiving first overlapping region information from it, where the first overlapping region information indicates invalid regions and is generated by the collaborative server from the first screening region information and second screening region information sent by the second server.
  • An invalid region is a region whose consumption index is 0, that is, a region whose consumption data is considered unable to reflect the real consumption level.
  • The first screening rule may be that the total population, the total number of characteristic POIs, and the consumption amount fall below thresholds C, D, and E respectively, so as to filter out the regions whose consumption index is 0.
  • The first server and the second server each screen out the regions whose own features satisfy the first screening rule and transmit them to the collaborative server.
  • Performing the screening operation further includes: deleting the first overlapping regions from the multiple regions to obtain the remaining regions; performing a screening operation on the remaining regions based on a second screening rule to obtain third screening region information; and sending the third screening region information to the collaborative server and receiving second overlapping region information from it, where the second overlapping region information indicates reliable regions and is generated by the collaborative server from the third screening region information and fourth screening region information sent by the second server.
  • A reliable region is a region whose consumption index has high reliability, that is, a region whose consumption data is considered to reflect the real consumption level.
  • The second screening rule can be generated from the above features. Taking the regional consumption index as an example, the second screening rule is: regional population greater than A, total number of POIs greater than 0, and consumption amount greater than or equal to B. This rule identifies the regions whose consumption index has high reliability.
  • The remaining regions are those whose consumption index is not 0; they are further divided into regions with high and low index reliability. Federated modeling is then performed on the high-reliability regions, and the target features of the low-reliability regions are re-determined, thereby correcting the region portrait.
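The two screening rules can be expressed as predicates that each data server applies locally. This is a minimal sketch: the threshold values are placeholders (the source names A to E but gives no numbers), the strict "below" comparisons in the first rule are an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Iterable, Set


@dataclass
class RegionFeatures:
    region_id: str
    population: float
    poi_count: int        # total number of (characteristic) POIs in the region
    consumption: float    # consumption amount, used as the target feature


@dataclass
class Thresholds:
    # Placeholder values: the source names thresholds A to E without numbers.
    a: float = 500.0      # rule 2: population must exceed A
    b: float = 10_000.0   # rule 2: consumption must be at least B
    c: float = 50.0       # rule 1: population below C
    d: int = 1            # rule 1: POI count below D
    e: float = 100.0      # rule 1: consumption below E


Rule = Callable[[RegionFeatures, Thresholds], bool]


def first_rule(r: RegionFeatures, t: Thresholds) -> bool:
    """Invalid-region rule; the strict '<' comparisons are an assumption."""
    return r.population < t.c and r.poi_count < t.d and r.consumption < t.e


def second_rule(r: RegionFeatures, t: Thresholds) -> bool:
    """Reliable-region rule: population > A, POIs > 0, consumption >= B."""
    return r.population > t.a and r.poi_count > 0 and r.consumption >= t.b


def screen(regions: Iterable[RegionFeatures], rule: Rule, t: Thresholds,
           exclude: AbstractSet[str] = frozenset()) -> Set[str]:
    """Return the IDs of regions satisfying `rule`, skipping IDs in `exclude`."""
    return {r.region_id for r in regions if r.region_id not in exclude and rule(r, t)}
```

In this sketch a server would first call `screen(regions, first_rule, t)` and send the IDs to the collaborative server. After receiving the first overlapping regions, it would call `screen(regions, second_rule, t, exclude=first_overlap)` on what remains.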
  • Step S204, generating the region to be corrected based on the overlapping region information, includes: deleting the first overlapping regions and the second overlapping regions from the multiple regions to obtain the regions to be corrected.
  • After the first overlapping regions (the invalid regions) and the second overlapping regions (the reliable regions) are eliminated from the multiple regions, the remaining regions are those with low reliability, that is, the regions that need to be corrected.
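The elimination described above amounts to set differences over region IDs. The sketch below assumes the first and second overlapping regions arrive from the collaborative server as sets of IDs. The function name is hypothetical, and giving invalid regions precedence when an ID appears in both sets is an added safeguard, not something the source specifies.

```python
from typing import Iterable, Set, Tuple


def partition_regions(all_ids: Iterable[str], invalid: Set[str],
                      reliable: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split local regions into (invalid, reliable, to_correct).

    `invalid` and `reliable` are the first and second overlapping regions
    received from the collaborative server; everything else must be corrected.
    """
    ids = set(all_ids)
    invalid_here = ids & invalid
    reliable_here = (ids & reliable) - invalid_here  # invalid takes precedence
    to_correct = ids - invalid_here - reliable_here
    return invalid_here, reliable_here, to_correct
```

The reliable set provides the aligned training samples for the correction model, and the to-correct set is what the model later re-infers.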
  • Step S208, correcting the area to be corrected based on the correction model so as to revise its regional portrait, includes:
  • Step S302: input the regional features of the region to be corrected into the correction model, which outputs the corrected target features.
  • Step S304: replace the original target features of the region to be corrected with the corrected ones, thereby updating the target features of the multiple regions.
  • In other words, the trained model infers the target features of the regions whose target index has low reliability, and these replace the original, inaccurate target features.
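Steps S302 and S304 amount to overwriting the target feature of each region to be corrected with the model's output. The sketch below assumes the correction model can be called as a plain function of a region's feature vector. `revise_targets` and the `model` callable are hypothetical stand-ins. In a real vertical federated deployment, inference would itself involve both servers.

```python
from typing import Callable, Dict, Sequence, Set

Features = Sequence[float]


def revise_targets(
    region_features: Dict[str, Features],
    target_features: Dict[str, float],
    to_correct: Set[str],
    model: Callable[[Features], float],
) -> Dict[str, float]:
    """Re-infer the target feature for low-reliability regions only."""
    revised = dict(target_features)  # copy, so the original mapping is left untouched
    for region_id in to_correct:
        # S302: feed the region's features to the correction model
        # S304: overwrite the original, inaccurate target feature
        revised[region_id] = model(region_features[region_id])
    return revised
```

Regions outside `to_correct`, including the reliable ones used for training, keep their original values.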
  • Determining the target indices of the multiple regions includes:
  • Step S306: perform a clustering operation on the target features of the multiple regions to obtain multiple cluster centers and their corresponding clusters.
  • Step S308: sort the cluster centers and assign a score interval to each cluster center.
  • Step S310: match each cluster to its score interval to generate the target indices of the multiple regions.
  • Step S312: correct the regional portraits of the multiple regions based on the target indices.
  • The revised consumption data are clustered, for example into 5 to 10 categories. The exact number depends on the scenario or business requirements.
  • The cluster centers are sorted in ascending order, and the index data of each cluster are mapped in turn to the corresponding score interval, so that the final index score lies between 0 and 100. This score is the revised urban-area target index.
  • The resulting accurate portrait target index can be used for consumption analysis and for estimating regional consumption power.
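One possible reading of steps S306 to S310 for scalar consumption data works as follows:

1. Run k-means on the consumption values.
2. Sort the cluster centers in ascending order.
3. Give the i-th cluster the i-th equal-width band of [0, 100].
4. Place each member linearly inside its band.

The equal-width bands, the linear placement, and the order-statistic initialization are all assumptions. The text only states that sorted clusters are matched to score intervals. `kmeans_1d` and `target_index` are hypothetical names.

```python
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's k-means on scalars; initial centers are evenly spaced order statistics."""
    s = sorted(values)
    centers = [s[(len(s) - 1) * i // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        # an empty cluster keeps its previous center
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def target_index(data: Dict[str, float], k: int = 5) -> Dict[str, float]:
    """Map each region's revised value to a 0-100 score via sorted cluster bands."""
    centers = sorted(kmeans_1d(list(data.values()), k))  # S306, then S308 (ascending)
    label = {r: min(range(k), key=lambda j: abs(v - centers[j])) for r, v in data.items()}
    scores: Dict[str, float] = {}
    for j in range(k):  # S310: cluster j is mapped to band j
        members = {r: data[r] for r in data if label[r] == j}
        if not members:
            continue
        lo, hi = min(members.values()), max(members.values())
        for r, v in members.items():
            frac = (v - lo) / (hi - lo) if hi > lo else 0.5  # position inside the band
            scores[r] = 100.0 * (j + frac) / k
    return scores
```

In one-dimensional data, nearest-center clusters are contiguous ranges. Sorting the centers therefore keeps the scores monotone in the underlying consumption values.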
  • Determining the target indices of the multiple regions from their updated target features can also be done as follows: the target features of the multiple regions are input into a preset classification model, which outputs the target index of the area to be corrected according to its classification of the corrected target features. The classification model is generated by supervised learning on historical target indices.
  • Invoking the overlapping-area information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result, includes:
    ◦ receiving key information sent by the collaborative server; and
    ◦ using the key information and the overlapping-area information to perform interactive encrypted training of a federated learning model with the second server, thereby generating the correction model.
  • When revising the urban-area portrait index, federated learning allows the regional portrait index to be corrected while each enterprise's data stays in its own database.
  • The method trains the federated model on the regions whose portrait indicators have high reliability, and uses it to correct the indicator features of the regions with low reliability.
  • The regional portrait correction technique described here can produce a more accurate urban regional portrait while protecting the security of multi-party data, serving later regional-portrait application scenarios.
  • Multi-party secure computation and other cross-domain modeling methods that protect the security and privacy of multi-party data can also be used in place of the federated learning algorithm.
  • The system architecture of federated learning is described below, taking a scenario with two data owners (the first server and the second server) as an example.
  • The architecture can be extended to scenarios involving multiple data owners.
  • The first server and the second server jointly train a machine learning model, and each of their business systems holds data about its own users.
  • The second server also holds the label data that the model needs to predict.
  • The first server and the second server cannot exchange data directly, so a federated learning system is used to build the model.
  • The federated learning system architecture consists of three parts.
  • Part 1: encrypted sample alignment. The user groups of the two companies do not fully overlap. The system therefore uses encryption-based user sample alignment to identify the users common to both parties. Neither the first server nor the second server discloses its data, and users that are not shared are not exposed. The features of these common users can then be combined for modeling.
  • Part 2: encrypted model training. Once the common user group is identified, its data can be used to train a machine learning model. To keep the data confidential during training, a third-party collaborative server is used for encrypted training.
  • The training process includes:
  • Step S402: the collaborative server distributes a public key to the first server and the second server, which use it to encrypt the data exchanged during training.
  • Step S404: the first server and the second server exchange, in encrypted form, the intermediate results used to compute the gradients.
  • Step S406: the first server and the second server each compute on the encrypted gradient values, and the second server computes the loss from its label data. The results are then aggregated at the collaborative server.
  • Step S408: the collaborative server computes the total gradient from the aggregated results and decrypts it.
  • Step S410: the collaborative server sends the decrypted gradients back to the first server and the second server respectively.
  • Step S412: the first server and the second server update the parameters of their respective models using the gradients.
  • Step S414: the above steps are repeated until the loss function converges, yielding the correction model.
  • The first server and the second server each keep their data locally, and the data exchanged during training does not leak private data. The two parties can therefore cooperate to train the model through federated learning.
  • Part 3: effect incentives. Institutions that contribute more data obtain better models, so model performance reflects each data provider's contribution to itself and to others. These model effects are fed back to the institutions through the federation mechanism, which keeps motivating more institutions to join the data federation.
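Steps S402 to S412 can be illustrated with a toy vertical federated linear regression. It uses an additively homomorphic scheme, Paillier. The text does not name an encryption scheme, so Paillier is an assumption here. The roles are as follows:

- The first server holds feature `xa`.
- The second server holds feature `xb` and the labels `y`.
- The collaborative server holds the private key.

Several simplifications apply:

- The key primes are small, fixed Mersenne primes, which is insecure.
- Floats are fixed-point encoded with `SCALE`.
- Gradients are blinded with random masks before decryption.
- Training runs a fixed number of rounds instead of the encrypted loss check of steps S406/S414.

All class and function names are hypothetical.

```python
import math
import random

# Toy Paillier key material: two Mersenne primes. Far too weak for real use.
P, Q = 2**61 - 1, 2**89 - 1
SCALE = 10**6  # fixed-point scale: floats are sent as round(x * SCALE)


def encode(x: float) -> int:
    return round(x * SCALE)


class PublicKey:
    def __init__(self, n: int):
        self.n, self.n2 = n, n * n

    def encrypt(self, m: int) -> int:
        # Enc(m) = (1 + n)^m * r^n = (1 + m*n) * r^n (mod n^2); negatives wrap mod n
        r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def add(self, c1: int, c2: int) -> int:
        return c1 * c2 % self.n2  # Enc(a) * Enc(b) = Enc(a + b)

    def mul_plain(self, c: int, k: int) -> int:
        return pow(c, k % self.n, self.n2)  # Enc(a) ** k = Enc(k * a)


class CollaborativeServer:
    """Holds the private key (S402) and decrypts masked gradients (S408-S410)."""

    def __init__(self):
        n = P * Q
        self.public_key = PublicKey(n)
        self.lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
        self.mu = pow(self.lam, -1, n)  # valid because the generator is g = n + 1

    def decrypt(self, c: int) -> int:
        n = self.public_key.n
        v = (pow(c, self.lam, n * n) - 1) // n * self.mu % n
        return v - n if v > n // 2 else v  # back to a signed integer


def federated_linear_regression(xa, xb, y, lr=0.1, rounds=200):
    """First server owns xa; second server owns xb and the labels y."""
    m = len(y)
    coordinator = CollaborativeServer()
    pk = coordinator.public_key  # S402: public key sent to both servers
    wa = wb = 0.0
    for _ in range(rounds):  # S414 simplified: fixed number of rounds
        # S404: first server sends Enc(wa*xa_i); second server adds wb*xb_i - y_i,
        # giving Enc(d_i) for the residual d_i, and sends it back to the first server
        enc_d = [pk.add(pk.encrypt(encode(wa * a)), pk.encrypt(encode(wb * b - t)))
                 for a, b, t in zip(xa, xb, y)]
        new_weights = []
        for w, xs in ((wa, xa), (wb, xb)):
            # S406: each server computes Enc(sum_i d_i * x_i) and blinds it with a mask
            g = pk.encrypt(0)
            for e, x in zip(enc_d, xs):
                g = pk.add(g, pk.mul_plain(e, encode(x)))
            mask = random.randrange(SCALE**3)
            # S408-S410: the collaborative server only ever sees gradient + mask
            masked = coordinator.decrypt(pk.add(g, pk.encrypt(mask)))
            grad = (masked - mask) / (SCALE**2 * m)
            new_weights.append(w - lr * grad)  # S412: local parameter update
        wa, wb = new_weights
    return wa, wb
```

The masks keep the collaborative server from learning the raw gradients. A production system would use a vetted library with properly sized keys rather than this hand-rolled scheme.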
  • In another embodiment of the present disclosure, the server cluster 140 is the collaborative server, and the method for correcting a regional portrait includes:
  • Step S502: receive the screening-area information sent by the first server and by the second server.
  • Step S504: take the intersection of the screening-area information to generate the overlapping-area information.
  • Step S506: send the overlapping-area information to the first server and the second server.
  • Step S508: perform an interactive training operation with the first server and/or the second server based on the overlapping-area information. The first server and/or the second server then generates a correction model from the interactive training result and corrects its own area to be corrected.
  • The region IDs present on both the first server and the second server are obtained to determine the overlapping regions among the multiple regions.
  • The overlapping regions may include the reliable regions. When the correction model is trained, the feature information of the overlapping regions stored on the first server can therefore be combined with that stored on the second server.
  • Training on the fused data yields a correction model, which is used to correct the portrait of the area to be corrected. On the one hand, this improves the accuracy with which the regional portrait is described, and thus the reliability of its subsequent use.
  • On the other hand, the collaborative server only assists with training, which helps reduce its resource usage.
  • Receiving the screening-area information sent by the first server and the second server includes:
    ◦ receiving first screening information from the first server and second screening information from the second server, and taking the intersection of the two; and
    ◦ receiving third screening information from the first server and fourth screening information from the second server, and taking the intersection of those two.
  • Among the multiple areas, the first overlapping area is the invalid area and the second overlapping area is the reliable area. Both are eliminated from the multiple areas on the first-server and second-server sides. The remaining low-reliability areas are the areas that need to be corrected.
  • Performing the interactive training operation with the first server and/or the second server based on the overlapping-area information includes: sending key information to the first server and the second server, so that the first server and/or the second server performs interactive encrypted training of the federated learning model based on that key information.
  • The regional portrait correction technique described here can thus depict a more accurate urban-area portrait while protecting the security of multi-party data, serving later regional-portrait application scenarios.
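The collaborative server's part in steps S502 to S506 is a set intersection followed by a broadcast. Below is a minimal in-memory sketch. It assumes each server reports plain region IDs, as the area ID sets of steps S604 and S610 suggest. `OverlapCoordinator` and its methods are hypothetical.

```python
from typing import Dict, Iterable, Set


class OverlapCoordinator:
    """Hypothetical in-memory stand-in for the collaborative server in S502-S506."""

    def __init__(self) -> None:
        self.reports: Dict[str, Set[str]] = {}  # server name -> screened region IDs
        self.outbox: Dict[str, Set[str]] = {}   # server name -> overlap sent back

    def receive(self, server: str, region_ids: Iterable[str]) -> None:
        # S502: record the screening-area information reported by one server
        self.reports[server] = set(region_ids)

    def intersect_and_send(self) -> Set[str]:
        # S504: the overlapping area is the intersection of every reported ID set
        overlap = set.intersection(*self.reports.values()) if self.reports else set()
        # S506: every server receives the same overlapping-area information
        for server in self.reports:
            self.outbox[server] = set(overlap)
        return overlap
```

The same object can serve both screening rounds, because a second `receive` from a server replaces that server's earlier report. If even the region IDs had to stay private, a private set intersection protocol could replace the plain intersection.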
  • Below, consumption data is taken as the target feature and the consumption index as the target index to further describe the regional portrait correction scheme of the present disclosure.
  • Each organization (including, but not limited to, the first server 10 and the second server 20) divides the urban area into grid cells according to Geohash7. Geohash6 or other division methods can also be used. Each organization then preprocesses the regional population, geographic location, regional consumption, regional POI and other data in its enterprise database to obtain the features of each regional grid cell. For example, regional consumption features are obtained by matching order data to addresses and consumption amounts.
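Geohash7 cells come from the standard Geohash encoding. The longitude and latitude ranges are bisected alternately, starting with longitude, and every 5 bits become one base-32 character. A 7-character cell is roughly 150 m on a side. The `consumption_by_cell` helper, which sums order amounts per cell, is a hypothetical example of the preprocessing step and assumes orders arrive as (lat, lon, amount) tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash(lat: float, lon: float, precision: int = 7) -> str:
    """Standard Geohash encoding: interleaved bisection bits, 5 bits per character."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, ch, even = [], 0, 0, True  # even bit positions refine longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = ch * 2 + 1, mid
            else:
                ch, lon_hi = ch * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = ch * 2 + 1, mid
            else:
                ch, lat_hi = ch * 2, mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)


def consumption_by_cell(orders: Iterable[Tuple[float, float, float]],
                        precision: int = 7) -> Dict[str, float]:
    """Sum order amounts per Geohash cell; each order is a (lat, lon, amount) tuple."""
    totals: Dict[str, float] = defaultdict(float)
    for lat, lon, amount in orders:
        totals[geohash(lat, lon, precision)] += amount
    return dict(totals)
```

Switching to Geohash6, as the text allows, only means passing `precision=6`. Each cell then covers a larger area.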
  • The first screening rule is: total population ⁇ 10 & total number of characteristic POIs ⁇ 1 & consumption amount ⁇ 100.
  • Step S602: the first server and the second server each screen out the areas whose own features satisfy the first screening rule, and transmit these areas to the collaborative server.
  • Step S604: the collaborative server collects the area ID sets transmitted by the servers and takes their intersection to obtain the first overlapping-area information.
  • Step S606: the collaborative server transmits the first overlapping-area information to each server.
  • This intersection is defined as the area whose consumption index is 0. The remaining areas comprise areas with high consumption-index reliability and areas with low consumption-index reliability.
  • The consumption data of the high-reliability areas can reflect their real consumption level.
  • Step S608: after deleting the areas whose consumption index is 0, the first server and the second server each screen out the areas whose own features satisfy the second screening rule, and transmit these areas to the collaborative server.
  • Step S610: the collaborative server collects the area ID sets transmitted by the servers and takes their intersection to obtain the second overlapping-area information.
  • Step S612: the collaborative server transmits the second overlapping-area information to each server.
  • This intersection is defined as the area with high consumption-index reliability.
  • Step S614: the areas with a consumption index of 0 and the high-reliability areas are eliminated. The remaining areas, whose consumption data has low reliability, are the areas to be corrected.
  • Step S616: first align the IDs of both parties' low-reliability and high-reliability regions. Then use the high-reliability regions as training data for federated modeling, train the model several times with different parameters, and select the parameters that yield the best model.
  • This model serves as the correction model, and the two servers each save it locally.
  • Federated models such as federated Boosting or federated forest may be used.
  • Step S618: use the trained model to infer the consumption data of the regions whose consumption index has low reliability, replacing their original, inaccurate consumption data.
  • Step S620: cluster the revised consumption data to obtain the revised urban-area consumption indices.
  • The resulting consumption index score is the revised urban regional consumption index.
  • The resulting accurate portrait consumption indices can be used for consumption analysis and for estimating regional consumption power.
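The two screening rounds of steps S602 to S614 can be simulated end to end in one process. The sketch rests on three assumptions:

- Every server indexes the same grid of region IDs.
- Each server stores (population, POI count, consumption) per region.
- The first rule uses strict "less than", because its comparison operators are not legible in this text.

A and B are passed in as hypothetical parameters, and all function names are illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

# region_id -> (population, poi_count, consumption), held locally by one server
Local = Dict[str, Tuple[int, int, float]]
Rule = Callable[[int, int, float], bool]


def rule1(pop: int, poi: int, amount: float) -> bool:
    # First screening rule; "<" is an ASSUMPTION (the operators are not legible)
    return pop < 10 and poi < 1 and amount < 100


def make_rule2(a: int, b: float) -> Rule:
    # Second screening rule: population > A & POIs > 0 & consumption >= B
    return lambda pop, poi, amount: pop > a and poi > 0 and amount >= b


def screen(local: Local, ids: Set[str], rule: Rule) -> Set[str]:
    """One server's local screening: IDs whose own features satisfy the rule."""
    return {r for r in ids if rule(*local[r])}


def split_regions(servers: List[Local], a: int, b: float) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (invalid, reliable, to_correct) region IDs following S602-S614."""
    all_ids = set(servers[0])  # assumes all servers share one grid of region IDs
    # S602-S606: each server screens with rule 1; the collaborative server intersects
    invalid = set.intersection(*(screen(s, all_ids, rule1) for s in servers))
    remaining = all_ids - invalid
    # S608-S612: screen the remaining regions with rule 2 and intersect again
    rule2 = make_rule2(a, b)
    reliable = set.intersection(*(screen(s, remaining, rule2) for s in servers))
    # S614: what is left has low consumption-index reliability and must be corrected
    return invalid, reliable, remaining - reliable
```

A region counts as invalid or reliable only if every server agrees. A single server's optimistic or pessimistic view therefore leaves the region in the to-be-corrected set.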
  • Aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, they may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software. Such implementations may be collectively referred to herein as a "circuit", "module", or "system".
  • An apparatus 700 for correcting a regional portrait according to this embodiment of the present invention is described below with reference to FIG. 7.
  • The apparatus 700 shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
  • The apparatus 700 for correcting a regional portrait is embodied as hardware modules.
  • The components of the apparatus 700 may include, but are not limited to, the following modules:
    ◦ a transmission module 702, configured to send the screening-region information screened from multiple regions to the collaborative server and to receive the overlapping-region information sent by the collaborative server. The collaborative server generates the overlapping-region information from the screening-region information sent by the first server and that sent by the second server;
    ◦ a determination module 704, configured to determine the area to be corrected based on the overlapping-region information;
    ◦ an interactive training module 706, configured to invoke the overlapping-region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and
    ◦ a correction module 708, configured to correct the area to be corrected based on the correction model, so as to correct the regional portraits of the multiple areas.
  • An apparatus 800 for correcting a regional portrait according to this embodiment of the present invention is described below with reference to FIG. 8.
  • The apparatus 800 shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
  • The apparatus 800 for correcting a regional portrait is embodied as hardware modules.
  • The components of the apparatus 800 may include, but are not limited to, the following modules:
    ◦ a receiving module 802, configured to receive the screening-region information sent by the first server and by the second server;
    ◦ a processing module 804, configured to take the intersection of the screening-region information to generate the overlapping-region information;
    ◦ a sending module 806, configured to send the overlapping-region information to the first server and the second server; and
    ◦ an auxiliary training module 808, configured to perform assisted interactive training with the first server and/or the second server based on the overlapping-region information. The first server and/or the second server then generates a correction model from the interactive training result and corrects its respective regions to be corrected based on that model.
  • An electronic device 900 according to this embodiment of the present invention is described below with reference to FIG. 9.
  • The electronic device 900 shown in FIG. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
  • The electronic device 900 takes the form of a general-purpose computing device.
  • Components of the electronic device 900 may include, but are not limited to, at least one processing unit 910, at least one storage unit 920, and a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910).
  • The storage unit stores program code that can be executed by the processing unit 910, causing the processing unit 910 to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
  • For example, the processing unit 910 may perform steps S202 to S210 shown in FIG. 2, as well as other steps defined in the method for correcting a regional portrait of the present disclosure.
  • The storage unit 920 may include a readable medium in the form of volatile storage, such as a random access memory (RAM) 9201 and/or a cache 9202, and may further include a read-only memory (ROM) 9203.
  • The storage unit 920 may also include a program/utility 9204 having a set of (at least one) program modules 9205. These program modules include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
  • The bus 930 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, an accelerated graphics port, and a processing-unit or local bus using any of a variety of bus structures.
  • The electronic device 900 may also communicate with one or more external devices 960 (e.g., keyboards, pointing devices, Bluetooth devices). It may also communicate with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., a router or modem) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 950. The electronic device 900 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 950. As shown, the network adapter 950 communicates with the other modules of the electronic device 900 via the bus 930.
  • The exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. The technical solutions according to the embodiments of the present disclosure may therefore be embodied as a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network. It includes instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to execute the method according to the embodiments of the present disclosure.
  • A computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored.
  • Various aspects of the present invention can also be implemented as a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
  • A program product for implementing the above method according to an embodiment of the present invention may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer.
  • However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal, propagated in baseband or as part of a carrier wave, that carries readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • a readable signal medium can also be any readable medium, other than a readable storage medium, that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • The program code may execute:
    ◦ entirely on the user's computing device;
    ◦ partly on the user's device, as a stand-alone software package;
    ◦ partly on the user's computing device and partly on a remote computing device; or
    ◦ entirely on a remote computing device or server.
  • The remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
  • Although several modules or units of the apparatus for performing actions are mentioned in the detailed description above, this division is not mandatory. According to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
  • A server sends its screening-area information to the collaborative server. It then receives the overlapping-area information that the collaborative server derives from the screening-area information of the first server and of the second server. With this information, the server can eliminate the overlapping areas from the multiple areas to obtain the area to be corrected, and can also achieve data fusion with the second server. A correction model is then obtained from the fused data and used to correct the portrait of the area to be corrected. On the one hand, this improves the accuracy of the regional-portrait description and the reliability of its subsequent use. On the other hand, the collaborative server only assists training throughout the interaction, which helps reduce its resource usage.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A region portrait correction method and apparatus, and an electronic device and a computer-readable storage medium, which relate to the field of machine learning. The region portrait correction method comprises: sending, to a collaborative server, screened region information that is obtained by means of screening a plurality of regions, so as to receive overlapping region information sent by the collaborative server, wherein the overlapping region information is generated by the collaborative server according to screened region information sent by a first server and screened region information sent by a second server (S202); determining, on the basis of the overlapping region information, a region to be corrected (S204); calling the overlapping region information to execute an interactive training operation with the collaborative server and the second server, so as to generate a correction model according to an interactive training result (S206); and correcting, on the basis of the correction model, the region to be corrected, so as to correct region portraits of the plurality of regions (S208). By means of the technical solution of the present disclosure, the accuracy of the description of a region portrait can be improved, thereby improving the reliability of subsequent utilization of the region portrait.

Description

Region portrait correction method and apparatus, electronic device, and readable storage medium
The present disclosure claims priority to Chinese patent application No. 202011291786.9, titled "Region portrait correction method and apparatus, electronic device, and readable storage medium", filed on November 18, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of machine learning, and in particular to a method, apparatus, electronic device, and computer-readable storage medium for correcting a regional portrait.
Background
Regional portraits are important for site selection, fine-grained urban management, and similar tasks. However, a single institution has its own business characteristics and covers only a limited user population. Its data alone therefore makes it difficult to characterize a target index of a region accurately. To obtain a more accurate regional portrait, institutions need to fuse data and combine multi-party data to correct the target index of a region.
However, if certain data cannot be shared between enterprises, the accuracy of regional portraits suffers severely. This has a major impact on later urban service or commercial development.
It should be noted that the information disclosed in the above Background section is only intended to enhance understanding of the background of the present disclosure. It may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the Invention
The purpose of the present disclosure is to provide a method, apparatus, electronic device, and computer-readable storage medium for correcting a regional portrait. They overcome, at least to some extent, the poor accuracy with which regional portraits are described in the related art.
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.
According to a first aspect of the present disclosure, a method for correcting a regional portrait is provided, comprising:
- sending screening-region information, screened from a plurality of regions, to a collaborative server, so as to receive overlapping-region information sent by the collaborative server, wherein the collaborative server generates the overlapping-region information from the screening-region information sent by the first server and the screening-region information sent by the second server;
- determining a region to be corrected based on the overlapping-region information;
- invoking the overlapping-region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and
- correcting the region to be corrected based on the correction model, so as to correct the regional portraits of the plurality of regions.
In one embodiment, sending the screening area information selected from the plurality of regions to the collaborative server and receiving the overlapping area information includes:
- performing a screening operation on the plurality of regions based on a first screening rule to obtain first screening area information; and
- sending the first screening area information to the collaborative server and receiving information of a first overlapping area from it.

The information of the first overlapping area indicates invalid areas. The collaborative server generates it from the first screening area information and second screening area information sent by the second server.
In one embodiment, performing the screening operation on the plurality of regions according to the screening rules to obtain the screening area information further includes:
- deleting the first overlapping area from the plurality of regions to obtain remaining regions;
- performing a screening operation on the remaining regions based on a second screening rule to obtain third screening area information; and
- sending the third screening area information to the collaborative server and receiving information of a second overlapping area from it.

The information of the second overlapping area indicates reliable areas. The collaborative server generates it from the third screening area information and fourth screening area information sent by the second server.
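The two screening rounds described above can be sketched with plain set operations. This is a minimal illustration, not the patented implementation. It assumes region IDs are strings and each server applies its own pair of predicate rules (an "invalid" rule and a "reliable" rule) to its local features. The rule contents, field names (`pop`, `spend`) and the `coordinator_intersect` helper are all hypothetical. The IDs are also exchanged in plain text here, which a real deployment may protect differently.

```python
from typing import Callable, Dict, Set, Tuple

Features = Dict[str, float]          # hypothetical per-region feature record
Rule = Callable[[Features], bool]    # hypothetical screening rule


def screen(regions: Dict[str, Features], rule: Rule) -> Set[str]:
    """Return the IDs of the regions that satisfy a local screening rule."""
    return {rid for rid, feats in regions.items() if rule(feats)}


def coordinator_intersect(ids_first: Set[str], ids_second: Set[str]) -> Set[str]:
    """Stand-in for the collaborative server: intersect the submitted ID sets."""
    return ids_first & ids_second


def two_round_screening(first: Dict[str, Features], second: Dict[str, Features],
                        rules_first: Tuple[Rule, Rule], rules_second: Tuple[Rule, Rule]):
    # Round 1: the overlap of both servers' "invalid" hits is the first overlapping area.
    invalid = coordinator_intersect(screen(first, rules_first[0]),
                                    screen(second, rules_second[0]))
    # Each server deletes the invalid areas to obtain its remaining regions.
    rest_first = {rid: f for rid, f in first.items() if rid not in invalid}
    rest_second = {rid: f for rid, f in second.items() if rid not in invalid}
    # Round 2: the overlap of the "reliable" hits is the second overlapping area.
    reliable = coordinator_intersect(screen(rest_first, rules_first[1]),
                                     screen(rest_second, rules_second[1]))
    # What the first server holds outside both overlaps is left to be corrected.
    to_correct = set(rest_first) - reliable
    return invalid, reliable, to_correct


first = {"r1": {"pop": 0}, "r2": {"pop": 500}, "r3": {"pop": 800}, "r4": {"pop": 300}}
second = {"r1": {"spend": 0}, "r2": {"spend": 9e4}, "r3": {"spend": 2e3}, "r4": {"spend": 1e5}}
invalid, reliable, to_correct = two_round_screening(
    first, second,
    (lambda f: f["pop"] == 0, lambda f: f["pop"] >= 400),
    (lambda f: f["spend"] == 0, lambda f: f["spend"] >= 5e4),
)
```

With this toy data, `r1` is invalid on both servers. `r2` is the only area both servers consider reliable, so `r3` and `r4` remain as areas to be corrected.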
In one embodiment, correcting the areas to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions, includes:
- inputting the regional features of the areas to be corrected into the correction model to output corrected target features;
- replacing the original target features of the areas to be corrected with the corrected target features, thereby updating the target features of the plurality of regions;
- determining target indices of the plurality of regions based on the updated target features; and
- correcting the region portraits of the plurality of regions based on the target indices.
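The replacement step above amounts to overwriting the target feature of each area to be corrected with the model's output. The sketch below assumes each region's features are a list of floats and the correction model is any callable from features to a corrected target value. The toy linear model in the usage lines is purely hypothetical.

```python
from typing import Callable, Dict, List, Set


def correct_targets(features: Dict[str, List[float]],
                    targets: Dict[str, float],
                    to_correct: Set[str],
                    model: Callable[[List[float]], float]) -> Dict[str, float]:
    """Return a new target map in which flagged regions carry the model's output."""
    updated = dict(targets)  # leave the original map untouched
    for rid in to_correct:
        # Feed the region's features to the correction model and overwrite its target.
        updated[rid] = model(features[rid])
    return updated


features = {"r2": [1.0], "r3": [10.0], "r4": [5.0]}
targets = {"r2": 90.0, "r3": 0.5, "r4": 1.0}
toy_model = lambda x: 2 * x[0] + 1  # hypothetical stand-in for the trained correction model
updated = correct_targets(features, targets, {"r3", "r4"}, toy_model)
```

Regions outside the correction set (`r2` here) keep their original target feature. The updated map would then feed the target-index step.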
In one embodiment, determining the target indices of the plurality of regions based on the updated target features includes:
- performing a clustering operation on the target features of the plurality of regions to obtain a plurality of cluster centers and the corresponding clusters;
- sorting the cluster centers and assigning one score interval to each cluster center; and
- matching each cluster to its score interval to generate the target indices of the plurality of regions.
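A minimal sketch of the clustering-based index follows, under several assumptions the text does not fix:
- the target feature is a single scalar (for example regional consumption);
- the clustering is plain 1-D k-means;
- the score range is split into `k` equal intervals;
- a region is placed inside its cluster's interval by linear interpolation between the cluster's minimum and maximum.

The interpolation rule and the 0 to 100 scale are illustrative choices, not taken from the source.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar values; initial centers are spread over the sorted data."""
    data = sorted(values)
    centers = [data[(len(data) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def target_index(targets: Dict[str, float], k: int = 3, top: float = 100.0) -> Dict[str, float]:
    """Map each region's target feature to a score in [0, top] via ranked clusters."""
    ids = list(targets)
    values = [targets[r] for r in ids]
    centers, labels = kmeans_1d(values, k)
    # Sort centers ascending; the cluster of rank i owns the interval [i*w, (i+1)*w].
    rank = {c: i for i, c in enumerate(sorted(range(k), key=lambda c: centers[c]))}
    width = top / k
    bounds = {}
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if members:
            bounds[c] = (min(members), max(members))
    scores = {}
    for rid, v, c in zip(ids, values, labels):
        lo, hi = bounds[c]
        # Interpolate linearly inside the cluster's interval (an assumption of this sketch).
        frac = 0.5 if hi == lo else (v - lo) / (hi - lo)
        scores[rid] = (rank[c] + frac) * width
    return scores


scores = target_index({"a": 1.0, "b": 2.0, "c": 10.0, "d": 11.0, "e": 50.0, "f": 52.0})
```

Sorting the centers is what makes the index monotone across clusters. Every region in a low-consumption cluster scores no higher than any region in a higher one.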
In one embodiment, determining the target indices of the plurality of regions based on the updated target features includes inputting the target features of the plurality of regions into a preset classification model. The model outputs the target indices of the plurality of regions according to its classification of the corrected target features. The classification model is generated by training on historical target indices in a supervised manner.
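The source does not name the classifier. As a hedged stand-in, a nearest-centroid classifier shows the supervised pattern: fit on historical (feature vector, index level) pairs, then predict index levels for updated features. The integer level labels and the two-dimensional toy features are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence


class NearestCentroid:
    """Minimal supervised classifier: one mean vector per historical index level."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[label] += 1
        # Each centroid is the mean feature vector of its historical index level.
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # Assign each sample the level whose centroid is closest in Euclidean distance.
        return [min(self.centroids, key=lambda lab: dist(x, self.centroids[lab])) for x in X]


history_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8], [9.0, 9.0], [8.7, 9.2]]
history_y = [1, 1, 2, 2, 3, 3]
model = NearestCentroid().fit(history_X, history_y)
levels = model.predict([[0.9, 1.1], [5.1, 5.0], [9.5, 8.8]])
```

Any supervised classifier trained on the historical indices could take its place. Nearest centroid is used only because it fits in a few dependency-free lines.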
In one embodiment, performing the interactive training operation with the collaborative server and the second server using the overlapping area information, so as to generate the correction model from the interactive training result, includes:
- receiving key information sent by the collaborative server; and
- performing interactive encrypted training of a federated learning model with the second server, using the key information and the overlapping area information, to generate the correction model.
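The source does not say which encryption scheme the key information belongs to. Additively homomorphic Paillier encryption is a common choice in vertical federated learning, so the sketch below uses it to illustrate one encrypted step:
1. The collaborative server generates a key pair and shares only the public modulus `n`.
2. The first server encrypts its partial prediction for an overlapping region.
3. The second server encrypts its partial prediction minus the label it holds.
4. The ciphertexts are combined without decryption, and only the key holder can recover the residual.

This is an assumption-laden illustration of a single step, not the full training protocol. The two Mersenne primes are far too small for real security, and the fixed-point `SCALE` is hypothetical.

```python
import math
import random
from typing import Tuple

SCALE = 10_000  # hypothetical fixed-point scale for real-valued scores


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Paillier key generation with g = n + 1; returns (public n, private (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n: int, value: float) -> int:
    n2 = n * n
    m = round(value * SCALE) % n  # negative values wrap around modulo n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """The product of two Paillier ciphertexts decrypts to the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, priv: Tuple[int, int], c: int) -> float:
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    if m > n // 2:  # undo the modular wrap for negative values
        m -= n
    return m / SCALE


# Collaborative server: generate keys; only n is sent to the first and second servers.
n, priv = keygen(2**31 - 1, 2**61 - 1)
u_first = 12.5               # first server: partial prediction from its own features
u_second_minus_y = -3.25     # second server: its partial prediction minus its label
enc_residual = add_encrypted(n, encrypt(n, u_first), encrypt(n, u_second_minus_y))
residual = decrypt(n, priv, enc_residual)
print(residual)  # 9.25
```

In a complete protocol, the decrypted quantities would typically be masked gradients rather than raw residuals. Those details are outside what the source specifies.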
According to a second aspect of the present disclosure, a method for correcting a region portrait is provided, comprising:
- receiving screening area information sent by a first server and by a second server, respectively;
- taking the intersection of the screening area information to generate overlapping area information, and sending it to the first server and the second server; and
- performing, based on the overlapping area information, an interactive training operation with the first server and/or the second server, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its own areas to be corrected based on that model.
In one embodiment, receiving the screening area information sent by the first server and the second server respectively includes:
- receiving first screening information from the first server and second screening information from the second server, and taking the intersection of the first and second screening information; and
- receiving third screening information from the first server and fourth screening information from the second server, and taking the intersection of the third and fourth screening information.
In one embodiment, performing the interactive training operation with the first server and/or the second server based on the overlapping area information includes sending key information to the first server and the second server respectively. The first server and/or the second server then perform interactive encrypted training of a federated learning model based on the key information.
According to a third aspect of the present disclosure, an apparatus for correcting a region portrait is provided, comprising:
- a sending module, configured to send screening area information selected from a plurality of regions to a collaborative server and to receive overlapping area information from it, wherein the collaborative server generates the overlapping area information from the screening area information sent by the first server and that sent by a second server;
- a determination module, configured to determine areas to be corrected based on the overlapping area information;
- an interactive training module, configured to perform, using the overlapping area information, an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and
- a correction module, configured to correct the areas to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
According to a fourth aspect of the present disclosure, an apparatus for correcting a region portrait is provided, comprising:
- a transmission module, configured to receive screening area information sent by a first server and by a second server, respectively;
- a processing module, configured to take the intersection of the screening area information to generate overlapping area information;
- a sending module, configured to send the overlapping area information to the first server and the second server; and
- an auxiliary training module, configured to perform auxiliary interactive training with the first server and/or the second server based on the overlapping area information, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its own areas to be corrected based on that model.
According to a fifth aspect of the present disclosure, an electronic device is provided, comprising a processor and a memory storing instructions executable by the processor. The processor is configured to perform any one of the above methods for correcting a region portrait by executing the executable instructions.
According to a sixth aspect of the present disclosure, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements any one of the above methods for correcting a region portrait.
In the region portrait correction scheme provided by the embodiments of the present disclosure, the screening area information is sent to the collaborative server. The collaborative server combines the screening area information of the first server with that of the second server and returns the overlapping area information. With this information, the overlapping areas can be removed from the plurality of regions to obtain the areas to be corrected, and data fusion with the second server can also be achieved.
Further, a correction model is obtained from the fused data and used to correct the portraits of the areas to be corrected. This has two benefits. First, it improves the accuracy with which region portraits describe their regions, and hence the reliability of subsequent uses of the portraits. Second, the collaborative server only assists the training throughout the interaction, which helps keep its resource usage low.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles. Obviously, the drawings described below show only some embodiments of the present disclosure. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the structure of a region portrait correction system in an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure;
FIG. 3 is a flowchart of another method for correcting a region portrait in an embodiment of the present disclosure;
FIG. 4 is a flowchart of yet another method for correcting a region portrait in an embodiment of the present disclosure;
FIG. 5 is a flowchart of still another method for correcting a region portrait in an embodiment of the present disclosure;
FIG. 6 is an interaction diagram of a region portrait correction scheme in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for correcting a region portrait in an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of another apparatus for correcting a region portrait in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an electronic device in an embodiment of the present disclosure.
DETAILED DESCRIPTION
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein. Rather, they are provided so that this disclosure will be thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams in the drawings are functional entities that do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the solution provided by the present application, a correction model is obtained from fused data and used to correct the portraits of the areas to be corrected. This has two benefits. First, it improves the accuracy with which region portraits describe their regions, and hence the reliability of subsequent uses of the portraits. Second, the collaborative server only assists the training throughout the interaction, which helps keep its resource usage low.
For ease of understanding, several terms used in this application are first explained below.
City Profile (a city portrait: a multi-factor profile based on big data and machine learning) is a SaaS product for urban planning, real estate, retail, and many other GIS application industries. Its innovation is twofold.

First, it integrates machine-learning computation with interactive visualization, overcoming the limitations of traditional GIS in exploring and analyzing multi-dimensional and high-dimensional spatiotemporal data.

Second, it integrates massive urban data, Spark/ElasticSearch big-data processing engines, distributed computing, online data processing, online indicator calculation, and multi-factor mining analysis. Together these form an easy-to-use yet powerful SaaS service that breaks down the professional barriers of GIS and makes data acquisition, data processing, and multi-factor spatial data mining efficient and easy for every user.

It also supports secondary development through an API/SDK, so it can be easily integrated into users' existing platforms.
Federated learning: Suppose multiple data owners (e.g., enterprises) F_i (i = 1, ..., N) want to jointly train a machine learning model on their respective data D_i. The traditional approach is to pool the data at one party and train on D = {D_i, i = 1, ..., N} to obtain a model M_sum. However, this approach is often difficult to implement because of legal issues such as privacy and data security.

Federated learning was introduced to address this problem. It is a computation process in which each data owner F_i trains a model M_fed without handing over its own data D_i. At the same time, it ensures that the performance V_fed of M_fed is sufficiently close to the performance V_sum of M_sum, that is, |V_fed - V_sum| < δ, where δ is an arbitrarily small positive value.
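For the vertical setting used in this application (two servers holding different features of the same overlapping regions), the |V_fed - V_sum| < δ condition can be made concrete with linear regression trained by gradient descent. The sketch runs the same optimisation twice:
- centrally, on concatenated features;
- split, where the first server holds `XA`, the second server holds `XB` and the labels `y`, and only partial predictions and residuals cross the boundary.

For this model the split computation reproduces the centralized weights up to floating-point rounding. Encryption of the exchanged values is omitted, and the data and learning rate are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def gd_step(w: List[float], X: Matrix, residuals: List[float], lr: float) -> List[float]:
    """One gradient step for mean squared loss, given residuals (prediction - label)."""
    n = len(X)
    grad = [sum(r * x[j] for r, x in zip(residuals, X)) / n for j in range(len(w))]
    return [wj - lr * gj for wj, gj in zip(w, grad)]


def centralized(X: Matrix, y: List[float], lr: float, steps: int) -> List[float]:
    w = [0.0] * len(X[0])
    for _ in range(steps):
        res = [dot(w, x) - t for x, t in zip(X, y)]
        w = gd_step(w, X, res, lr)
    return w


def vertical_federated(XA: Matrix, XB: Matrix, y: List[float], lr: float, steps: int):
    """First server holds XA; second server holds XB and labels y, row-aligned by region ID."""
    wa, wb = [0.0] * len(XA[0]), [0.0] * len(XB[0])
    for _ in range(steps):
        ua = [dot(wa, x) for x in XA]                              # on first server, sent over
        res = [u + dot(wb, x) - t for u, x, t in zip(ua, XB, y)]  # on second server
        wa = gd_step(wa, XA, res, lr)                              # first server, own block
        wb = gd_step(wb, XB, res, lr)                              # second server, own block
    return wa, wb


XA = [[1.0], [2.0], [3.0], [4.0]]
XB = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.5], [1.0, 2.0]]
y = [3.0, 5.5, 5.0, 10.0]
w_central = centralized([a + b for a, b in zip(XA, XB)], y, lr=0.05, steps=200)
wa, wb = vertical_federated(XA, XB, y, lr=0.05, steps=200)
```

The equality holds because the squared-loss gradient for each weight block depends on the other party only through the shared residual. This is why exchanging residuals, rather than raw features, suffices in the vertical case.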
Multi-party lending: a bad actor borrows from one financial institution to repay a loan at another lending institution. Large-scale behavior of this kind can bring down the whole financial system.

To identify such users, the traditional approach is for financial institutions to query user information from a central database, to which every institution must upload all of its user information. Doing so, however, exposes the privacy and data security of all of the institutions' important users, which is not allowed under the GDPR.

Under a federated learning mechanism, no central database is needed. Any financial institution participating in federated learning can send a query about a new user to the other institutions in the federation. Those institutions answer questions about the user's local borrowing without learning the user's specific information. This protects the privacy and data integrity of existing users at each institution while still solving the important problem of detecting multi-party lending.
The solutions provided by the embodiments of the present application involve technologies such as network modeling and machine learning, and are described in detail through the following embodiments.
FIG. 1 is a schematic structural diagram of a system for correcting a region portrait in an embodiment of the present disclosure. The system includes a plurality of terminals 120 and a server cluster 140.
The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet computer, an e-book reader, smart glasses, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a smart home device, an AR (Augmented Reality) device, or a VR (Virtual Reality) device. Alternatively, the terminal 120 may be a personal computer (PC), such as a laptop or a desktop computer.
An application for correcting region portraits may be installed on the terminal 120.
The terminal 120 and the server cluster 140 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.
The server cluster 140 is a single server, a group of several servers, a virtualization platform, or a cloud computing service center. It provides background services for the region portrait correction application and for the training application of the traffic prediction model. The division of computing work is flexible:
- the server cluster 140 may undertake the primary computing work and the terminal 120 the secondary computing work;
- or the server cluster 140 may undertake the secondary computing work and the terminal 120 the primary computing work;
- or the terminal 120 and the server cluster 140 may compute collaboratively using a distributed computing architecture.
In some optional embodiments, the server cluster 140 stores the correction model for region portraits, prediction methods, and the like.
Optionally, the application clients installed on different terminals 120 are the same. Alternatively, the clients installed on two terminals 120 are clients of the same type of application on different operating-system platforms. Depending on the terminal platform, the client may take different forms, for example a mobile phone client, a PC client, or a World Wide Web (Web) client.
Those skilled in the art will appreciate that there may be more or fewer terminals 120. For example, there may be only one terminal, or dozens, hundreds, or more. The embodiments of the present application do not limit the number or device types of the terminals.
Optionally, the system may further include a management device (not shown in FIG. 1) connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
Optionally, the wireless or wired network uses standard communication technologies and/or protocols. The network is usually the Internet but may be any network, including but not limited to any combination of a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, or a virtual private network.

In some embodiments, data exchanged over the network is represented using technologies and/or formats such as HyperText Markup Language (HTML) and Extensible Markup Language (XML). In addition, all or some links can be encrypted using conventional encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication technologies may be used in place of or in addition to those described above.
Each step of the method for correcting a region portrait and of the method for training the traffic prediction model in this example embodiment is described in more detail below, with reference to the accompanying drawings and embodiments.
FIG. 2 is a flowchart of a method for correcting a region portrait in an embodiment of the present disclosure. The methods provided by the embodiments of the present disclosure may be executed by any electronic device with computing and processing capability, such as the terminal 120 and/or the server cluster 140 in FIG. 1. In the following description, the server cluster 140 is taken as the executing entity.
There may be multiple first servers and multiple second servers. The collaborative server interacts with the first servers and the second servers separately.
As shown in FIG. 2, with the server cluster 140 acting as the first server, the method for correcting a region portrait includes the following steps:
Step S202: send the screening area information selected from a plurality of regions to the collaborative server, and receive the overlapping area information from it. The collaborative server generates the overlapping area information from the screening area information sent by the first server and that sent by the second server.
The screening area information may specifically be the IDs of the screened areas.
The plurality of regions are generated by dividing the map into a grid using Geohash6, Geohash7, or another division method. Specifically, feature data in the enterprise database, such as regional population, geographic location, and regional POIs (Points of Interest), can be preprocessed to obtain the regional features of each grid cell. Regional consumption and the like can be used as the target feature of the region. The region portrait of the region is then obtained from these regional features and the target feature.
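The grid division above can be illustrated with a standard geohash encoder. It interleaves longitude and latitude bisection bits and emits one base-32 character per 5 bits. Geohash6 and Geohash7 cells are roughly 1.2 km × 0.6 km and 153 m × 153 m respectively. The `count_by_cell` helper, which buckets POI coordinates into cells as one example regional feature, is a hypothetical addition; the source does not specify how features are aggregated per cell.

```python
from collections import Counter
from typing import Iterable, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: alternate longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, even = [], 0, 0, True  # even bit positions encode longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def count_by_cell(points: Iterable[Tuple[float, float]], precision: int = 6) -> Counter:
    """Bucket (lat, lon) points, e.g. POIs, into geohash grid cells."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in points)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Because a longer geohash always extends the shorter one for the same point, switching between Geohash6 and Geohash7 changes only how finely regions are split, not how they nest.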
In addition, the collaborative server can be understood as a collaboration platform with two roles. First, it performs the screening operation based on the screening area information sent by the first server and the second server. Second, it assists the first server and/or the second server in model training while keeping the feature information of the regions stored on those servers secure.

The second server is the party that exchanges parameters with the first server. With the coordination of the collaborative server, it can perform the same processing as the first server. The first server and the second server may store different regional features for the same regions.
Step S204: determine the areas to be corrected based on the overlapping area information.
Those skilled in the art will understand that the screening of the plurality of regions may be performed according to screening rules. The collaborative server intersects the area information screened out under these rules. This determines the areas that are held by both the first server and the second server and satisfy the screening rules on both. The purpose of the screening is to find areas that do not need correction; removing these areas from the plurality of regions yields the areas to be corrected.
Specifically, the collaborative server performs a screening operation on the screening area information sent by the first server and the second server to obtain the overlapping area information. This serves two purposes. First, removing the overlapping areas from the plurality of regions yields the areas to be corrected. Second, the overlapping area information allows relevant data from the second server to be brought in, achieving data fusion between different servers.
Step S206: perform, using the overlapping area information, an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result.
The overlapping area information can be used as training data. The correction model is generated through interactive training with the collaborative server and the second server.
Step S208: correct the areas to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
In this embodiment, the correction model is used to correct unreliable target features among the plurality of regions, such as the target features of regions whose consumption indicators have low credibility. The target feature may be consumption data.
另外,区域画像可以理解为基于区域人口、地理位置、区域POI(Point of interesting,兴趣点)、区域消费等特征生成的画像。In addition, regional portraits can be understood as portraits generated based on features such as regional population, geographic location, regional POI (Point of Interest), and regional consumption.
In this embodiment, the multiple regions may include overlapping regions and regions to be corrected. The first server sends its screening-region information to the collaborative server and receives the overlapping-region information that the collaborative server derives from the screening-region information of both the first server and the second server. Determining the overlapping regions not only allows them to be eliminated from the multiple regions to obtain the regions to be corrected, but also enables data fusion with the second server.
Further, the correction model is obtained from the fused data and used to correct the portraits of the regions to be corrected. On the one hand, this improves the accuracy with which the region portraits describe the regions, and therefore the reliability of any subsequent use of those portraits; on the other hand, the collaborative server only assists the training throughout the interaction, which helps reduce the resources it consumes.
Subsequent uses of region portraits include, for example, business scenarios such as site selection and advertisement placement.
In one embodiment, sending the screening-region information selected from the multiple regions to the collaborative server and receiving the overlapping-region information from the collaborative server includes: performing a screening operation on the multiple regions based on a first screening rule to obtain first screening-region information; and sending the first screening-region information to the collaborative server and receiving the information of a first overlapping region from it. The first overlapping region represents invalid regions, and its information is generated by the collaborative server from the first screening-region information and the second screening-region information sent by the second server.
Taking the consumption index as an example, an invalid region is a region whose consumption index is 0, i.e., a region whose consumption data is considered unable to reflect the real consumption level.
For example, the first screening rule may be: total population < C & number of characteristic POIs < D & consumption amount < E, which screens out the regions whose consumption index is 0. The first server and the second server each screen out the regions whose own features satisfy the first screening rule and send them to the collaborative server.
In one embodiment, performing the screening operation on the multiple regions according to the screening rules to obtain the screening-region information further includes: deleting the first overlapping region from the multiple regions to obtain the remaining regions; performing a screening operation on the remaining regions based on a second screening rule to obtain third screening-region information; and sending the third screening-region information to the collaborative server and receiving the information of a second overlapping region from it. The second overlapping region represents reliable regions, and its information is generated by the collaborative server from the third screening-region information and the fourth screening-region information sent by the second server.
Taking the consumption index as an example, a reliable region is a region whose consumption index has high reliability, i.e., a region whose consumption data is considered to reflect the real consumption level.
The second screening rule can also be built from the features above. Taking the regional consumption index as an example, the second screening rule can be set to: population > A & total number of POIs > 0 & consumption amount >= B. Applying this rule screens out the regions whose consumption index is highly reliable.
In this embodiment, the regions whose consumption index is 0 are screened out first, so all remaining regions have a nonzero consumption index. These are then further divided into regions with high and with low index reliability. The high-reliability regions are used for federated modeling, and the target features of the low-reliability regions are then re-determined, thereby revising the region portraits.
In one embodiment, in step S204, determining the regions to be corrected based on the overlapping-region information includes: deleting the first overlapping region and the second overlapping region from the multiple regions to obtain the regions to be corrected.
In this embodiment, the first and second screening rules determine, among the multiple regions, the first overlapping region (the invalid regions) and the second overlapping region (the reliable regions). After both are eliminated, the remaining regions are those with low reliability, i.e., the regions that need correction. This ensures that the correction is applied to the right set of regions.
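The two-round screening and elimination described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each server holds a dictionary mapping region (grid-cell) IDs to numeric features named `population`, `poi_count` and `consumption`, and it borrows the example thresholds from the FIG. 6 walkthrough (C=10, D=1, E=100; A=3, B=100). The helper names are hypothetical, the sample data is invented, and the collaborative server's intersection is simulated by a plain set intersection.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical layout: region ID -> feature name -> value. Real servers would each
# hold different feature columns and evaluate the rule on their own features.
Features = Dict[str, float]
Rule = Callable[[Features], bool]


def invalid_rule(f: Features) -> bool:
    """First screening rule (example thresholds): consumption index treated as 0."""
    return f["population"] < 10 and f["poi_count"] < 1 and f["consumption"] < 100


def reliable_rule(f: Features) -> bool:
    """Second screening rule (example thresholds): highly reliable consumption index."""
    return f["population"] > 3 and f["poi_count"] > 0 and f["consumption"] >= 100


def screen(regions: Dict[str, Features], rule: Rule) -> Set[str]:
    """Local screening on one server: IDs of regions whose own features satisfy `rule`."""
    return {rid for rid, f in regions.items() if rule(f)}


def partition(server_a: Dict[str, Features], server_b: Dict[str, Features],
              first_rule: Rule, second_rule: Rule) -> Tuple[Set[str], Set[str], Set[str]]:
    # Round 1: both servers screen locally; the collaborator intersects the ID sets.
    invalid = screen(server_a, first_rule) & screen(server_b, first_rule)
    # Round 2: drop the invalid regions, screen the rest, intersect again.
    rest_a = {rid: f for rid, f in server_a.items() if rid not in invalid}
    rest_b = {rid: f for rid, f in server_b.items() if rid not in invalid}
    reliable = screen(rest_a, second_rule) & screen(rest_b, second_rule)
    # Regions to be corrected on the first server: neither invalid nor reliable.
    to_correct = set(server_a) - invalid - reliable
    return invalid, reliable, to_correct


# Invented sample data for two servers.
server_a = {
    "g1": {"population": 2, "poi_count": 0, "consumption": 10},
    "g2": {"population": 50, "poi_count": 5, "consumption": 500},
    "g3": {"population": 50, "poi_count": 5, "consumption": 50},
}
server_b = {
    "g1": {"population": 1, "poi_count": 0, "consumption": 0},
    "g2": {"population": 40, "poi_count": 3, "consumption": 300},
    "g3": {"population": 40, "poi_count": 2, "consumption": 400},
}
invalid, reliable, to_correct = partition(server_a, server_b, invalid_rule, reliable_rule)
```

Here `g3` passes the second rule only on the second server, so it is not in the intersection and ends up among the regions to be corrected.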
As shown in FIG. 3, in one embodiment, step S208 (correcting the regions to be corrected based on the correction model so as to correct their region portraits) includes:
Step S302: input the regional features of the regions to be corrected into the correction model, which outputs the corrected target features.
Step S304: replace the original target features of the regions to be corrected with the corrected target features, thereby updating the target features of the multiple regions.
Specifically, the trained model is used to infer the target features of the regions whose target index has low reliability, and the inferred values replace the original, inaccurate target features of those regions.
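Steps S302 and S304 amount to running the model only on the regions to be corrected and overwriting their stored target feature. A minimal sketch, assuming the trained correction model can be wrapped as a plain callable from a region's feature vector to a corrected target value; `correct_targets` and `toy_model` are hypothetical names, and `toy_model` merely stands in for the federated model.

```python
from typing import Callable, Dict, List, Set


def correct_targets(features: Dict[str, List[float]],
                    targets: Dict[str, float],
                    to_correct: Set[str],
                    model: Callable[[List[float]], float]) -> Dict[str, float]:
    """S302-S304: re-infer the target feature of each region to be corrected."""
    updated = dict(targets)                  # copy; other regions keep their values
    for rid in to_correct:
        updated[rid] = model(features[rid])  # S302 output replaces the old value (S304)
    return updated


def toy_model(x: List[float]) -> float:
    """Hypothetical stand-in for the trained correction model."""
    return 2.0 * x[0] + x[1]
```

Copying the input dictionary keeps the original (uncorrected) values available for auditing or comparison.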
Determining the target indices of the multiple regions from their updated target features specifically includes:
Step S306: cluster the target features of the multiple regions to obtain multiple cluster centers and the corresponding clusters.
Step S308: sort the cluster centers and assign a score interval to each cluster center.
Step S310: map each cluster to its score interval to generate the target indices of the multiple regions.
Step S312: correct the region portraits of the multiple regions based on the target indices.
In this embodiment, the corrected consumption data is clustered, typically into 5 to 10 clusters; the exact number depends on the scenario or business requirements.
After the target features are clustered, the cluster centers are sorted in ascending order, and the index data of each cluster is mapped in turn to the corresponding score interval so that the final index score lies between 0 and 100. This score is the corrected target index of the urban region. The resulting precise portrait target index can be used for consumption analysis, regional purchasing-power estimation, and similar tasks.
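Steps S306 to S310 can be sketched as follows, under stated assumptions rather than as the patented procedure. The target feature is taken to be a single scalar per region, clustering is plain one-dimensional k-means (Lloyd's algorithm) with a deterministic initialisation, the k score intervals are equal-width slices of 0 to 100 assigned by ascending cluster center, and values are min-max scaled inside their cluster's interval. The text does not say how values are placed within an interval, so that last step is an assumption. The sketch also assumes k does not exceed the number of regions.

```python
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's k-means on scalar values; returns the k cluster centers."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation: evenly spaced order statistics.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def index_scores(values: Dict[str, float], k: int) -> Dict[str, float]:
    """Map each region's corrected value to a 0-100 score via ranked clusters."""
    centers = sorted(kmeans_1d(list(values.values()), k))  # S306 + S308: ascending centers
    width = 100.0 / k                                      # cluster r -> [r*width, (r+1)*width]
    members: Dict[int, Dict[str, float]] = {r: {} for r in range(k)}
    for rid, v in values.items():
        r = min(range(k), key=lambda c: abs(v - centers[c]))
        members[r][rid] = v
    scores: Dict[str, float] = {}
    for r, group in members.items():                       # S310: scale inside the interval
        if not group:
            continue
        lo, hi = min(group.values()), max(group.values())
        for rid, v in group.items():
            frac = (v - lo) / (hi - lo) if hi > lo else 0.5
            scores[rid] = r * width + frac * width
    return scores
```

Because clusters are ranked by their centers, a region in a higher-consumption cluster never receives a lower score than a region in a lower cluster.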
In one embodiment, the target indices of the multiple regions can alternatively be determined from their updated target features as follows: input the target features of the multiple regions into a preset classification model, which outputs the target index of each region to be corrected according to its classification of the corrected target features. The classification model is generated by supervised learning on historical target indices.
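The classification-model alternative can be illustrated with a deliberately simple supervised classifier. The sketch below uses a nearest-centroid classifier trained on hypothetical historical (feature vector, index grade) pairs; the text does not name a model, so the choice of classifier, the function names and the grade labels are all assumptions.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple


def fit_nearest_centroid(history: Sequence[Tuple[Sequence[float], int]]) -> Dict[int, List[float]]:
    """Supervised training: one mean feature vector (centroid) per historical index grade."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, grade in history:
        acc = sums.setdefault(grade, [0.0] * len(features))
        sums[grade] = [a + x for a, x in zip(acc, features)]
        counts[grade] = counts.get(grade, 0) + 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def predict_grade(centroids: Dict[int, List[float]], features: Sequence[float]) -> int:
    """Classify a region's corrected features as the grade of the nearest centroid."""
    return min(centroids, key=lambda g: dist(centroids[g], features))
```

Any supervised classifier trained on historical indices could take its place; nearest-centroid is used here only because it fits in a few lines.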
In one embodiment, using the overlapping-region information to perform interactive training with the collaborative server and the second server, and generating the correction model from the result, includes: receiving key information sent by the collaborative server; and using the key information and the overlapping-region information to perform interactive encrypted training of a federated learning model with the second server, thereby generating the correction model.
In this embodiment, federated learning allows the urban region-portrait indices to be corrected without enterprise data leaving its owner's database. The method trains a federated model on the regions whose portrait indices are highly reliable and uses it to correct the index features of the low-reliability regions. Compared with a region portrait derived from a single party's data, this correction technique yields a more accurate urban region portrait while protecting the security of every party's data, serving downstream applications of region portraits.
Alternatively, when raw data must not leave its database, other privacy-preserving cross-domain modeling methods, such as secure multi-party computation, can be used in place of the federated learning algorithm.
Specifically, we describe the federated learning system architecture using a scenario with two data owners (the first server and the second server); the architecture extends to scenarios with more data owners. Suppose the first server and the second server jointly train a machine learning model, and each of their business systems holds data about its own users. In addition, the second server holds the label data that the model must predict. For data privacy and security reasons, the two servers cannot exchange data directly, so a federated learning system is used to build the model. The federated learning system architecture consists of three parts.
Part 1: encrypted sample alignment. Because the two companies' user groups do not fully overlap, the system uses encryption-based sample alignment to identify the users common to both parties without either server disclosing its data and without exposing the users that do not overlap, so that the features of the common users can be combined for modeling.
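One common way to realise encryption-based sample alignment is Diffie-Hellman-style private set intersection: each party masks hashed IDs with a secret exponent, and only doubly masked values are compared, which works because (h^a)^b = (h^b)^a mod P. The text does not say which alignment protocol is used, so this is an assumed stand-in. The sketch simulates both parties in one function, uses the Mersenne prime 2**127 - 1 as an illustrative modulus, and is not hardened for production use.

```python
import hashlib
import math
import secrets
from typing import Iterable, Set

# Illustrative modulus only (Mersenne prime 2**127 - 1); a real deployment would
# use a vetted group and a reviewed PSI protocol.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash an ID to an integer modulo P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P


def random_key() -> int:
    """Secret exponent coprime to P - 1, so x -> x**key mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def psi(ids_a: Iterable[str], ids_b: Iterable[str]) -> Set[str]:
    """Return the IDs shared by both parties, as learned by party A."""
    key_a, key_b = random_key(), random_key()  # each party's own secret (simulated here)
    a_list = sorted(set(ids_a))
    # A -> B: H(x)^a, in a fixed order so A can match the replies to its own IDs.
    a_masked = [pow(hash_to_group(x), key_a, P) for x in a_list]
    # B -> A: (H(x)^a)^b in the same order, plus H(y)^b for B's own IDs.
    a_double = [pow(v, key_b, P) for v in a_masked]
    b_masked = [pow(hash_to_group(y), key_b, P) for y in set(ids_b)]
    # A: raise B's values to a; equal doubly masked values indicate a shared ID.
    b_double = {pow(v, key_a, P) for v in b_masked}
    return {x for x, v in zip(a_list, a_double) if v in b_double}
```

In this sketch party A learns the intersection while B sees only masked values; which party learns the result is a protocol design choice.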
Part 2: encrypted model training. Once the common user group is identified, its data can be used to train the machine learning model. To keep the data confidential during training, a third-party collaborative server is used for encrypted training.
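Encrypted training of this kind needs an additively homomorphic scheme so that the collaborative server can aggregate values it only ever decrypts as a sum. Paillier encryption is commonly used for this in vertical federated learning, although the text does not name the scheme. The toy sketch below assumes Paillier with generator g = n + 1, tiny hard-coded primes (the 10,000th and 100,000th primes, far too small to be secure) and a fixed-point scale of 10,000 for real-valued gradients. It shows the collaborator generating a key pair, data owners encrypting partial gradients with the public key, and the collaborator adding ciphertexts and decrypting only the aggregate.

```python
import math
import secrets

SCALE = 10_000  # assumed fixed-point scale for real-valued gradients


def keygen(p: int = 104_729, q: int = 1_299_709):
    """Toy Paillier key pair from tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid because g = n + 1
    return n, (lam, mu)                                  # public key, private key


def encrypt(n: int, x: float) -> int:
    m = round(x * SCALE) % n             # negative values wrap around modulo n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (1 + m * n) * pow(r, n, n2) % n2  # (1 + n)^m * r^n mod n^2


def add_encrypted(n: int, c1: int, c2: int) -> int:
    return c1 * c2 % (n * n)             # Enc(a) * Enc(b) decrypts to a + b


def decrypt(n: int, priv, c: int) -> float:
    lam, mu = priv
    n2 = n * n
    m = (pow(c, lam, n2) - 1) // n * mu % n
    if m > n // 2:                       # undo the wrap-around for negative values
        m -= n
    return m / SCALE


# Collaborative server: keeps `priv`, distributes the public key `n`.
n, priv = keygen()
# Data owners encrypt their (hypothetical) partial gradients with the public key.
c_first = encrypt(n, 1.5)
c_second = encrypt(n, -0.25)
# Collaborative server: adds the ciphertexts, then decrypts only the aggregate.
total = decrypt(n, priv, add_encrypted(n, c_first, c_second))
```

Here `total` is 1.25. The randomness `r` makes two encryptions of the same value differ, so ciphertexts reveal nothing about equality of inputs.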
As shown in FIG. 4, taking a linear regression model as an example, the training process includes:
Step S402: the collaborative server distributes a public key to the first server and the second server, which use it to encrypt the data exchanged during training.
Step S404: the first server and the second server exchange, in encrypted form, the intermediate results needed to compute gradients.
Step S406: the first server and the second server each compute their encrypted gradient values; meanwhile, the second server computes the loss from its label data. Both send their results to the collaborative server.
Step S408: the collaborative server aggregates the results to obtain the total gradient and decrypts it.
Step S410: the collaborative server sends the decrypted gradients back to the first server and the second server, respectively.
Step S412: the first server and the second server update the parameters of their respective models using the gradients.
Step S414: repeat the steps above until the loss function converges, thereby generating the correction model.
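The arithmetic of steps S404 to S414 can be shown with a plaintext simulation of vertically partitioned linear regression. The first server holds feature columns `X_a`, the second server holds `X_b` and the labels `y`, and the model y ≈ X_a·w_a + X_b·w_b is trained by gradient descent on half the mean squared error. Encryption and the collaborative server are deliberately elided; the comments mark where each exchange would be encrypted in the real protocol. All names, the learning rate and the stopping rule are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(X: Matrix, w: List[float]) -> List[float]:
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def gradient(X: Matrix, d: List[float]) -> List[float]:
    n = len(X)
    return [sum(X[i][j] * d[i] for i in range(n)) / n for j in range(len(X[0]))]


def train_vertical_lr(X_a: Matrix, X_b: Matrix, y: List[float], lr: float = 0.1,
                      tol: float = 1e-12,
                      max_iter: int = 10_000) -> Tuple[List[float], List[float], float]:
    """Plaintext simulation of S404-S414; encryption and the collaborator are elided."""
    w_a, w_b = [0.0] * len(X_a[0]), [0.0] * len(X_b[0])
    prev_loss = loss = float("inf")
    for _ in range(max_iter):
        u_a = matvec(X_a, w_a)   # first server's partial prediction (sent encrypted, S404)
        u_b = matvec(X_b, w_b)   # second server's partial prediction
        d = [ua + ub - t for ua, ub, t in zip(u_a, u_b, y)]  # residual needs B's labels
        loss = sum(r * r for r in d) / (2 * len(y))           # S406: loss on the second server
        g_a, g_b = gradient(X_a, d), gradient(X_b, d)         # S406-S410: via the collaborator
        w_a = [w - lr * g for w, g in zip(w_a, g_a)]          # S412: local parameter updates
        w_b = [w - lr * g for w, g in zip(w_b, g_b)]
        if prev_loss - loss < tol:                            # S414: stop once converged
            break
        prev_loss = loss
    return w_a, w_b, loss
```

For example, with `X_a = [[1.0], [2.0], [3.0], [4.0]]`, `X_b = [[1.0], [0.0], [1.0], [0.0]]` and `y = [5.0, 4.0, 9.0, 8.0]` (generated as 2·a + 3·b), training recovers weights close to 2 and 3, although neither party ever holds both feature columns.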
During sample alignment and model training, each server's data stays local, and the data exchanged during training does not leak private data. Federated learning therefore allows the two parties to train a model cooperatively.
Part 3: incentives. Institutions that contribute more data obtain better models; model performance depends on what each data provider contributes to itself and to others. The results of these models are fed back to each institution through the federation mechanism, which in turn encourages more institutions to join the data federation.
As shown in FIG. 5, where the server cluster 140 serves as the collaborative server, a method for correcting region portraits according to another embodiment of the present disclosure includes:
Step S502: receive the screening-region information sent by the first server and by the second server.
Step S504: intersect the screening-region information to generate the overlapping-region information.
Step S506: send the overlapping-region information to the first server and the second server.
Step S508: perform interactive training with the first server and/or the second server based on the overlapping-region information, so that the first server and/or the second server generate a correction model from the training result and correct their respective regions to be corrected based on it.
In this embodiment, the collaborative server intersects the screening regions sent by the first server and by the second server to obtain the region IDs present on both servers, thereby determining the overlapping regions among the multiple regions; these may include the reliable regions. During training of the correction model, the feature information of the overlapping regions stored on the first server and on the second server can then be combined. Obtaining the correction model from the fused data and using it to correct the portraits of the regions to be corrected, on the one hand, improves the accuracy of the region portraits and thus the reliability of their subsequent use; on the other hand, the collaborative server only assists the training throughout the interaction, which helps reduce the resources it consumes.
In one embodiment, receiving the screening-region information from the first server and the second server includes: receiving first screening information from the first server and second screening information from the second server, and intersecting them; and receiving third screening information from the first server and fourth screening information from the second server, and intersecting them.
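On the collaborative server, steps S502 to S506 reduce to collecting one ID set per party per screening round and intersecting them once every party has submitted. A minimal sketch of that bookkeeping, with a hypothetical `CollaborativeServer` class in which round 1 carries the first/second screening information and round 2 the third/fourth:

```python
from typing import Dict, Iterable, Set


class CollaborativeServer:
    """Hypothetical collaborator-side state for steps S502-S506."""

    def __init__(self, parties: Iterable[str]):
        self.parties: Set[str] = set(parties)
        # round number -> party name -> submitted region IDs
        self.submissions: Dict[int, Dict[str, Set[str]]] = {}

    def receive(self, round_no: int, party: str, region_ids: Iterable[str]) -> None:
        """S502: store one party's screened region IDs for a screening round."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        self.submissions.setdefault(round_no, {})[party] = set(region_ids)

    def overlap(self, round_no: int) -> Set[str]:
        """S504: intersect the round's ID sets; the result is what S506 sends back."""
        subs = self.submissions.get(round_no, {})
        missing = self.parties - set(subs)
        if missing:
            raise RuntimeError(f"round {round_no}: waiting for {sorted(missing)}")
        return set.intersection(*subs.values())
```

Refusing to answer until every party has submitted prevents a round's overlap from being computed on a partial view.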
In this embodiment, the collaborative server determines the first overlapping region (the invalid regions) from the first and second screening regions, and the second overlapping region (the reliable regions) from the third and fourth screening regions. The first server and the second server then eliminate both overlapping regions from the multiple regions; the remaining regions are those with low reliability, i.e., the regions that need correction. This ensures that the correction is applied to the right set of regions.
In one embodiment, performing interactive training with the first server and/or the second server based on the overlapping-region information includes: sending key information to the first server and the second server, so that the first server and/or the second server perform interactive encrypted training of the federated learning model based on the key information.
In this embodiment, by sending key information to the first server and the second server, the region-portrait correction technique can produce a more accurate urban region portrait while protecting the security of every party's data, serving downstream applications of region portraits.
With reference to FIG. 6, the region-portrait correction scheme of the present disclosure is further described using the correction of urban regional consumption indices as an example, with consumption data as the target feature and the consumption index as the target index.
Each institution (for example, the first server 10 and the second server 20) divides the urban area into grid cells using Geohash7 (Geohash6 or another partitioning method may also be used) and preprocesses the population, geographic location, consumption, POI, and other data in its enterprise database to obtain features for each grid cell; for example, regional consumption features are obtained by matching order data to addresses and consumption amounts. Based on the data features of both servers, a first screening rule (total population < 10 & number of characteristic POIs < 1 & consumption amount < 100) is defined to screen out the regions whose consumption index is 0, and a second screening rule (total population > 3 & total POIs > 0 & consumption amount >= 100) is defined to screen out the regions whose consumption index is highly reliable. The correction process is then completed together with the collaborative server 30.
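The Geohash gridding and the aggregation of order data into per-cell consumption features can be sketched as follows. The encoder implements the standard Geohash algorithm (alternating longitude/latitude bisection, 5 bits per base-32 character), so precision 7 corresponds to Geohash7. The `(lat, lon, amount)` order schema and the function names are hypothetical, and matching orders to addresses is assumed to have already produced coordinates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard Geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits, rng[0] = bits * 2 + 1, mid   # upper half: emit 1
        else:
            bits, rng[1] = bits * 2, mid       # lower half: emit 0
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def consumption_by_cell(orders: Iterable[Tuple[float, float, float]],
                        precision: int = 7) -> Dict[str, float]:
    """Sum order amounts per Geohash cell; each order is (lat, lon, amount)."""
    totals: Dict[str, float] = defaultdict(float)
    for lat, lon, amount in orders:
        totals[geohash_encode(lat, lon, precision)] += amount
    return dict(totals)
```

For instance, `geohash_encode(42.6, -5.6, 5)` yields `"ezs42"`. Switching to Geohash6 only changes `precision`, giving coarser cells.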
Step S602: the first server and the second server each screen out the regions whose own features satisfy the first screening rule and send them to the collaborative server.
Step S604: the collaborative server collects the region ID sets sent by the servers and intersects them to obtain the information of the first overlapping region.
Step S606: the collaborative server sends the information of the first overlapping region to each server.
The intersection is defined as the regions whose consumption index is 0; the remaining regions are those whose consumption index has either high or low reliability.
The features of both platforms are then combined again to identify the regions whose consumption data is considered to reflect their real consumption level.
Step S608: after deleting the regions whose consumption index is 0, the first server and the second server each screen out the regions whose own features satisfy the second screening rule and send them to the collaborative server.
Step S610: the collaborative server collects the region ID sets sent by the servers and intersects them to obtain the information of the second overlapping region.
Step S612: the collaborative server sends the information of the second overlapping region to each server.
This intersection is defined as the regions whose consumption index is highly reliable.
Step S614: eliminate the regions whose index is 0 and the high-reliability regions; the remaining regions are the regions to be corrected, whose consumption index has low reliability.
Step S616: first align the IDs of both parties' low-reliability and high-reliability regions, then perform federated modeling with the high-reliability regions as training data. Train the model several times with different parameters and select the parameters that yield the best model as the correction model; each server saves the model locally.
Alternatively, different federated models (such as federated boosting or federated random forests) can be tried over multiple rounds to select suitable parameters and train the best model; each platform saves the model locally.
Step S618: use the trained model to infer the consumption data of the regions whose consumption index has low reliability, and replace their original, inaccurate consumption data with the inferred values.
Step S620: cluster the corrected consumption data to obtain the corrected urban regional consumption indices.
Specifically, the data can generally be clustered into 5 to 10 clusters, with the exact number determined by the scenario or business requirements. After clustering, the cluster centers are sorted in ascending order and the index data of each cluster is mapped in turn to the corresponding score interval, so that the final index score lies between 0 and 100; this score is the corrected urban regional consumption index. The resulting precise portrait consumption index can be used for consumption analysis, regional purchasing-power estimation, and similar tasks.
It should be noted that the above drawings are only schematic illustrations of the processing included in the methods according to exemplary embodiments of the present invention and are not intended to be limiting. The processing shown in the drawings does not indicate or limit its chronological order, and it may be performed, for example, synchronously or asynchronously across multiple modules.
Those skilled in the art will appreciate that the various aspects of the present invention may be implemented as a system, method, or program product. Accordingly, they may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
An apparatus 700 for correcting region portraits according to this embodiment of the present invention is described below with reference to FIG. 7. The apparatus 700 shown in FIG. 7 is only an example and should not limit the functions or scope of use of the embodiments of the present invention.
The apparatus 700 takes the form of hardware modules. Its components may include, but are not limited to: a transmission module 702, configured to send the screening-region information selected from the multiple regions to the collaborative server and receive the overlapping-region information from it, where the overlapping-region information is generated by the collaborative server from the screening-region information sent by the first server and by the second server; a determination module 704, configured to determine the regions to be corrected based on the overlapping-region information; an interactive training module 706, configured to use the overlapping-region information to perform interactive training with the collaborative server and the second server and generate a correction model from the result; and a correction module 708, configured to correct the regions to be corrected based on the correction model, thereby correcting the region portraits of the multiple regions.
An apparatus 800 for correcting region portraits according to this embodiment of the present invention is described below with reference to FIG. 8. The apparatus 800 shown in FIG. 8 is only an example and should not limit the functions or scope of use of the embodiments of the present invention.
The apparatus 800 takes the form of hardware modules. Its components may include, but are not limited to: a receiving module 802, configured to receive the screening-region information sent by the first server and by the second server; a processing module 804, configured to intersect the screening-region information and generate the overlapping-region information; a sending module 806, configured to send the overlapping-region information to the first server and the second server; and an auxiliary training module 808, configured to assist interactive training with the first server and/or the second server based on the overlapping-region information, so that the first server and/or the second server generate a correction model from the training result and correct their respective regions to be corrected based on it.
An electronic device 900 according to this embodiment of the present invention is described below with reference to FIG. 9. The electronic device 900 shown in FIG. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 9, the electronic device 900 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 910, at least one storage unit 920, and a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910).
The storage unit stores program code that can be executed by the processing unit 910, causing the processing unit 910 to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section above. For example, the processing unit 910 may perform steps S202 and S204 to S210 shown in FIG. 2, as well as the other steps defined in the region portrait correction method of the present disclosure.
The storage unit 920 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 9201 and/or a cache memory 9202, and may further include a read-only memory (ROM) 9203.
The storage unit 920 may also include a program/utility 9204 having a set of (at least one) program modules 9205. Such program modules 9205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 930 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with:
- one or more external devices 960 (e.g., a keyboard, a pointing device, or a Bluetooth device);
- one or more devices that enable a user to interact with the electronic device 900;
- any device (e.g., a router or modem) that enables the electronic device 900 to communicate with one or more other computing devices.

Such communication may take place through an input/output (I/O) interface 950. The electronic device 900 may also communicate with one or more networks through a network adapter 950. These networks may include a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet. As shown, the network adapter 950 communicates with the other modules of the electronic device 900 via the bus 930. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device. These include, but are not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solutions according to the embodiments of the present disclosure may be embodied as a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network. It includes instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
An exemplary embodiment of the present disclosure further provides a computer-readable storage medium storing a program product capable of implementing the method described above in this specification. In some possible implementations, aspects of the present invention may also be implemented as a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Methods" section above.
A program product for implementing the above method according to an embodiment of the present invention may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program usable by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal, propagated in baseband or as part of a carrier wave, that carries readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of these. A readable signal medium may also be any readable medium, other than a readable storage medium, that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wireline, optical fiber cable, or RF, or any suitable combination of these.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages. These include object-oriented languages such as Java and C++, as well as conventional procedural languages such as "C" or similar languages. The program code may execute:
- entirely on the user's computing device;
- partly on the user's device, as a stand-alone software package;
- partly on the user's computing device and partly on a remote computing device;
- entirely on a remote computing device or server.

Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that, although several modules or units of the apparatus for performing actions are mentioned in the detailed description above, this division is not mandatory. According to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be split into multiple steps.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations that follow its general principles, including common general knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure indicated by the appended claims.
Industrial Applicability
In the solution provided by the present disclosure, the first server sends its screening region information to the collaborative server. It then receives overlapping-region information, which the collaborative server obtains by combining the first server's screening region information with the second server's. Determining the overlapping regions serves two purposes. The overlapping regions can be removed from the plurality of regions to obtain the regions to be corrected, and data can be fused with the second server. A correction model is then obtained from the fused data and used to correct the portraits of the regions to be corrected. This brings two benefits:
- it makes region portraits describe their regions more accurately, and therefore makes their subsequent use more reliable;
- the collaborative server only assists with training throughout the interaction, which reduces its resource usage.

Claims (14)

  1. A method for correcting a region portrait, applicable to a first server, characterized by comprising:
    sending screening region information, screened from a plurality of regions, to a collaborative server, and receiving overlapping-region information sent by the collaborative server, wherein the overlapping-region information is generated by the collaborative server from the screening region information sent by the first server and the screening region information sent by a second server;
    determining regions to be corrected based on the overlapping-region information;
    using the overlapping-region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and
    correcting the regions to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
  2. The method for correcting a region portrait according to claim 1, wherein sending the screening region information screened from the plurality of regions to the collaborative server and receiving the overlapping-region information sent by the collaborative server comprises:
    performing a screening operation on the plurality of regions based on a first screening rule to obtain first screening region information;
    sending the first screening region information to the collaborative server, and receiving first overlapping-region information sent by the collaborative server,
    wherein the first overlapping-region information indicates invalid regions and is generated by the collaborative server from the first screening region information and second screening region information sent by the second server.
  3. The method for correcting a region portrait according to claim 2, wherein performing the screening operation on the plurality of regions according to a screening rule to obtain the screening region information further comprises:
    deleting the first overlapping regions from the plurality of regions to obtain remaining regions;
    performing a screening operation on the remaining regions based on a second screening rule to obtain third screening region information;
    sending the third screening region information to the collaborative server, and receiving second overlapping-region information sent by the collaborative server,
    wherein the second overlapping-region information indicates reliable regions and is generated by the collaborative server from the third screening region information and fourth screening region information sent by the second server.
  4. The method for correcting a region portrait according to claim 1, wherein correcting the regions to be corrected based on the correction model so as to correct the region portraits of the plurality of regions comprises:
    inputting region features of the regions to be corrected into the correction model to output corrected target features;
    replacing the original target features of the regions to be corrected with the corrected target features, so as to update the target features of the plurality of regions;
    determining target indices of the plurality of regions based on the updated target features of the plurality of regions; and
    correcting the region portraits of the plurality of regions based on the target indices.
  5. The method for correcting a region portrait according to claim 4, wherein determining the target indices of the plurality of regions based on the updated target features of the plurality of regions comprises:
    performing a clustering operation on the target features of the plurality of regions to obtain a plurality of cluster centers and their corresponding clusters;
    sorting the plurality of cluster centers and assigning a score interval to each cluster center; and
    matching each cluster to its corresponding score interval to generate the target indices of the plurality of regions.
  6. The method for correcting a region portrait according to claim 4, wherein determining the target indices of the plurality of regions based on the updated target features of the plurality of regions comprises:
    inputting the target features of the plurality of regions into a preset classification model, so that the classification model outputs the target indices of the plurality of regions according to its classification of the corrected target features,
    wherein the classification model is generated by training on historical target indices through supervised learning.
  7. The method for correcting a region portrait according to any one of claims 1 to 6, wherein using the overlapping-region information to perform the interactive training operation with the collaborative server and the second server so as to generate the correction model from the interactive training result comprises:
    receiving key information sent by the collaborative server; and
    using the key information and the overlapping-region information to perform interactive encrypted training of a federated learning model with the second server, so as to generate the correction model.
  8. A method for correcting a region portrait, applicable to a collaborative server, characterized by comprising:
    receiving screening region information sent by a first server and by a second server, respectively;
    taking the intersection of the screening region information to generate overlapping-region information;
    sending the overlapping-region information to the first server and the second server; and
    performing an interactive training operation with the first server and/or the second server based on the overlapping-region information, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its own regions to be corrected based on the correction model.
  9. The method for correcting a region portrait according to claim 8, wherein receiving the screening region information sent by the first server and by the second server, respectively, comprises:
    receiving first screening information sent by the first server and second screening information sent by the second server, and taking the intersection of the first screening information and the second screening information; and
    receiving third screening information sent by the first server and fourth screening information sent by the second server, and taking the intersection of the third screening information and the fourth screening information.
  10. The method for correcting a region portrait according to claim 8, wherein performing the interactive training operation with the first server and/or the second server based on the overlapping-region information comprises:
    sending key information to the first server and the second server, respectively, so that the first server and/or the second server performs interactive encrypted training of a federated learning model based on the key information.
  11. An apparatus for correcting a region portrait, applicable to a first server, characterized by comprising:
    a transmission module, configured to send screening region information, screened from a plurality of regions, to a collaborative server and to receive overlapping-region information sent by the collaborative server, wherein the overlapping-region information is generated by the collaborative server from the screening region information sent by the first server and the screening region information sent by a second server;
    a determination module, configured to determine regions to be corrected based on the overlapping-region information;
    an interactive training module, configured to use the overlapping-region information to perform an interactive training operation with the collaborative server and the second server, so as to generate a correction model from the interactive training result; and
    a correction module, configured to correct the regions to be corrected based on the correction model, so as to correct the region portraits of the plurality of regions.
  12. An apparatus for correcting a region portrait, applicable to a collaborative server, characterized by comprising:
    a receiving module, configured to receive screening region information sent by a first server and by a second server, respectively;
    a processing module, configured to take the intersection of the screening region information to generate overlapping-region information;
    a sending module, configured to send the overlapping-region information to the first server and the second server; and
    an auxiliary training module, configured to perform auxiliary interactive training with the first server and/or the second server based on the overlapping-region information, so that the first server and/or the second server generates a correction model from the interactive training result and corrects its own regions to be corrected based on the correction model.
  13. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the executable instructions to perform the method for correcting a region portrait according to any one of claims 1 to 7 and/or the method for correcting a region portrait according to any one of claims 8 to 10.
  14. A computer-readable storage medium storing a computer program, characterized in that, when executed by a processor, the computer program implements the method for correcting a region portrait according to any one of claims 1 to 10.
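Claim 5 derives target indices by clustering the regions, ranking the cluster centers, and giving each center a score interval. The following Python sketch shows one way to do this. Its choices and all names are hypothetical assumptions:
- clustering uses Lloyd's k-means with a deterministic farthest-point initialisation;
- centers are ranked by the mean of their components, since the claims do not specify a sort key;
- a score range of 0 to 100 is split into k equal-width intervals, and the lowest-ranked center receives the lowest interval.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Sequence[float], centers: List[Vector]) -> int:
    # Index of the center closest to p (ties go to the lower index).
    return min(range(len(centers)), key=lambda c: _dist2(p, centers[c]))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means; returns (centers, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialisation (assumption, keeps runs repeatable).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [_nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the cluster; keep the previous center if the cluster is empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment so that labels always match the returned centers.
    labels = [_nearest(p, centers) for p in points]
    return centers, labels


def target_indices(features: Dict[str, Vector], k: int,
                   score_range: Tuple[float, float] = (0.0, 100.0)
                   ) -> Dict[str, Tuple[float, float]]:
    """Map each region ID to the score interval assigned to its cluster."""
    ids = sorted(features)
    centers, labels = kmeans([features[i] for i in ids], k)
    # Rank centers by the mean of their components (assumed sort key).
    order = sorted(range(k), key=lambda c: sum(centers[c]) / len(centers[c]))
    lo, hi = score_range
    width = (hi - lo) / k
    interval = {c: (lo + rank * width, lo + (rank + 1) * width)
                for rank, c in enumerate(order)}
    return {rid: interval[labels[n]] for n, rid in enumerate(ids)}


if __name__ == "__main__":
    feats = {"a": [0.0, 0.1], "b": [0.1, 0.0], "c": [10.0, 10.0], "d": [10.1, 9.9]}
    print(target_indices(feats, k=2))
```

Farthest-point initialisation is used only to keep the sketch deterministic. k-means++ seeding, or scikit-learn's `KMeans`, would be common alternatives in practice.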
PCT/CN2021/126483 2020-11-18 2021-10-26 Region portrait correction method and apparatus, and electronic device and readable storage medium WO2022105554A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011291786.9 2020-11-18
CN202011291786.9A CN113781082B (en) 2020-11-18 2020-11-18 Method and device for correcting regional portrait, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022105554A1 true WO2022105554A1 (en) 2022-05-27

Family

ID=78835317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/126483 WO2022105554A1 (en) 2020-11-18 2021-10-26 Region portrait correction method and apparatus, and electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN113781082B (en)
WO (1) WO2022105554A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3477527A1 (en) * 2017-10-31 2019-05-01 Twinpeek Privacy management
CN110210626A (en) * 2019-05-31 2019-09-06 京东城市(北京)数字科技有限公司 Data processing method, device and computer readable storage medium
CN110797124A (en) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 Model multi-terminal collaborative training method, medical risk prediction method and device
CN111935156A (en) * 2020-08-12 2020-11-13 科技谷(厦门)信息技术有限公司 Data privacy protection method for federated learning

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN109902742B (en) * 2019-02-28 2021-07-16 深圳前海微众银行股份有限公司 Sample completion method, terminal, system and medium based on encryption migration learning
CN111949859B (en) * 2019-05-16 2023-09-29 Oppo广东移动通信有限公司 User portrait updating method, device, computer equipment and storage medium
KR20190106942A (en) * 2019-08-30 2019-09-18 엘지전자 주식회사 Artificial device and method for controlling the same
CN111325267B (en) * 2020-02-18 2024-02-13 京东城市(北京)数字科技有限公司 Data fusion method, device and computer readable storage medium
CN111582508A (en) * 2020-04-09 2020-08-25 上海淇毓信息科技有限公司 Strategy making method and device based on federated learning framework and electronic equipment
CN111666576B (en) * 2020-04-29 2023-08-04 平安科技(深圳)有限公司 Data processing model generation method and device, and data processing method and device
CN111860868B (en) * 2020-07-27 2023-10-31 深圳前海微众银行股份有限公司 Training sample construction method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113781082B (en) 2023-04-07
CN113781082A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN110245510B (en) Method and apparatus for predicting information
US11875400B2 (en) Systems, methods, and apparatuses for dynamically assigning nodes to a group within blockchains based on transaction type and node intelligence using distributed ledger technology (DLT)
US20220230071A1 (en) Method and device for constructing decision tree
CN110383791B (en) Map application crowdsourcing based on blockchain
JP2022529967A (en) Extracting data from the blockchain network
CN111695697A (en) Multi-party combined decision tree construction method and device and readable storage medium
CN111081337B (en) Collaborative task prediction method and computer readable storage medium
CN112270597A (en) Business processing and credit evaluation model training method, device, equipment and medium
US20240012641A1 (en) Model construction method and apparatus, and medium and electronic device
WO2022237194A1 (en) Abnormality detection method and apparatus for accounts in federal learning system, and electronic device
WO2021203919A1 (en) Method and apparatus for evaluating joint training model
CN111310204A (en) Data processing method and device
CN113129149A (en) Transaction risk identification method and device based on block chain and safe multi-party calculation
US20230068770A1 (en) Federated model training method and apparatus, electronic device, computer program product, and computer-readable storage medium
CN114547658B (en) Data processing method, device, equipment and computer readable storage medium
Lisdorf Demystifying smart cities: practical perspectives on how cities can leverage the potential of new technologies
CN115392718A (en) Processing method, device, equipment and medium of process model
US20240176841A1 (en) Centralized dynamic portal for creating and hosting static and dynamic applications
CN110837657B (en) Data processing method, client, server and storage medium
CN108765579A (en) One kind being based on VR technology exhibition display connection methods and device
WO2022105554A1 (en) Region portrait correction method and apparatus, and electronic device and readable storage medium
CN113923225A (en) Distributed architecture-based federated learning platform, method, device and storage medium
CN112215710A (en) Annuity data processing method, block chain system, medium and electronic device
CN111402045A (en) Account data supervision method and device
US20180123967A1 (en) Provisioning insight services in a data provider landscape

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/07/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21893702

Country of ref document: EP

Kind code of ref document: A1