WO2021135474A1 - Method and apparatus for fusing data from multiple data sources, electronic device, and storage medium - Google Patents

Info

Publication number
WO2021135474A1
WO2021135474A1 PCT/CN2020/119073 CN2020119073W WO2021135474A1 WO 2021135474 A1 WO2021135474 A1 WO 2021135474A1 CN 2020119073 W CN2020119073 W CN 2020119073W WO 2021135474 A1 WO2021135474 A1 WO 2021135474A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
fused
fusion
training feature
original
Prior art date
Application number
PCT/CN2020/119073
Other languages
French (fr)
Chinese (zh)
Inventor
喻宁
陈克炎
朱艳乔
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021135474A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This application relates to the field of big data technology, and in particular to a data fusion method, device, electronic device, and readable storage medium from multiple data sources.
  • current data fusion approaches are mainly the empirical value method and the unsupervised method. Both can complete data fusion, but the inventor realized that the empirical value method is highly subjective, while the unsupervised method lacks the guidance of label data, which tends to reduce the accuracy of the fused data.
  • a data fusion device with multiple data sources comprising:
  • the data mapping module is used to obtain the original data set to be fused, the training feature set, and the training feature label set from the client, and perform a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
  • the data fusion module is used to input the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and return the fused data to the client.
  • the processor executes the instructions stored in the memory to implement the following steps:
  • the standard to-be-fused data set is input to the standard fusion model to perform a fusion operation to obtain fused data, and the fused data is returned to the client.
  • This application can solve the problems of strong subjectivity in the data fusion process and low fusion accuracy.
  • FIG. 1 is a schematic flowchart of a data fusion method for multiple data sources provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of modules of a data fusion method with multiple data sources provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device of a data fusion method with multiple data sources provided by an embodiment of the application;
  • the data fusion method of multiple data sources includes:
  • the data uploaded by Xiaoyu also covers a Toyota car purchased for 170,000, three claims on record with the insurance company (including claims for accidents involving the car being driven), and past purchases of medical insurance and unemployment insurance.
  • the data Xiaoyu uploaded for auto insurance pricing is the original data set to be fused.
  • the purpose of this application is to derive the final fused data from the original data set to be fused.
  • the data mapping operation includes data normalization. Because the data comes from different channels, the value ranges differ. To reduce the computational load, the data from the different channels must be normalized, that is, mapped uniformly onto the interval [0,1].
  • the normalization method used here is dispersion standardization (min-max normalization): $x^{*} = \dfrac{x - \min}{\max - \min}$
  • $x^{*}$ is the standard data to be fused
  • $\min$ is the minimum value of the original data set to be fused
  • $\max$ is the maximum value of the original data set to be fused
  • $x$ is a data item in the original data set to be fused.
  • for example, game A is in public beta, and score label data for game A is obtained from different channels: in channel 1 the score is 65 on a range of [0,100]; in channel 2 the score is 0.46 on a range of [0,1]; in channel 3 the score is 0 on a range of [-1,1].
  • after normalization, the scores of game A in channel 1, channel 2, and channel 3 become 0.65, 0.46, and 0.50.
  • $\operatorname{logit}(y_{is}) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_s x_{is} + \dots + \beta_k x_{ik} + e_i$
  • $y_{is}$ represents the predicted fusion value corresponding to the $s$-th training feature
  • $e_i$ is the preset error value
  • $\beta_0, \beta_1, \dots, \beta_k$ are the weight coefficients. If the dimension of each training feature in the training feature set is 3, the number of weight coefficients is also 3.
  • the loss function is further obtained as:
  • the training feature set $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$ and the training feature label set are substituted into the loss function to calculate the weight update values.
  • step S2 mainly obtains the weight coefficients $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ by minimizing the loss function $J(\beta)$, where $e_i$ represents the error of the training process.
  • $\beta_0, \beta_1, \dots, \beta_s, \dots, \beta_k$ represent the weight update values.
  • this embodiment further includes: when the fused data is successfully returned to the client, establishing a one-to-one correspondence between the fused data and the original data set to be fused in the client, and storing the fused data and the original data set to be fused according to that correspondence.
  • the data fusion device 100 with multiple data sources described in this application can be installed in an electronic device.
  • the data fusion device 100 with multiple data sources may include a data mapping module 101, a model training module 102, and a data fusion module 103.
  • the module described in this application can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the data mapping module 101 is configured to obtain an original data set to be fused, a training feature set, and a training feature label set from a client, and perform a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
  • the model training module 102 is configured to use the training feature set and the training feature tag set to train a pre-built original fusion model to obtain a standard fusion model;
  • the data fusion module 103 is configured to input the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and return the fused data to the client.
  • the data mapping module 101 obtains an original data set to be fused, a training feature set, and a training feature label set from the client, and performs a data mapping operation on the original data set to be fused to obtain a standard data set to be fused.
  • the main purpose of this application is to fuse data from different channels, which has considerable application value; collecting the data from the different channels yields the original data set to be fused.
  • for example, when Xiaoyu applies for auto insurance pricing, Xiaoyu uploads a large amount of data relevant to the pricing, including basic information: 32 years old, male, bachelor's degree, urban household registration, one residential property in the urban area, and a record of gastric perforation surgery.
  • the upload also covers a Toyota car purchased for 170,000, three claims on record with the insurance company (including claims for accidents involving the car being driven), and past purchases of medical insurance and unemployment insurance.
  • the data Xiaoyu uploaded for auto insurance pricing is the original data set to be fused.
  • the training feature set has the form $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$, where $x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik}$ represent training features from different channels and all have the same feature dimension, and $k$ represents the number of training feature sets.
  • training the pre-built original fusion model with the training data set to obtain the standard fusion model includes: initializing the weight coefficients to obtain initial weight values, where the weight coefficients have the same feature dimension as the training feature set; constructing an original logistic regression model from the initial weight values, and constructing a loss function for computing the loss value of the original logistic regression model; taking the training feature set as the input of the loss function and the training feature label set as its label values, and minimizing the loss function to obtain the weight update values; and replacing the initial weight values of the original logistic regression model with the weight update values to obtain the standard fusion model.
  • $\operatorname{logit}(y_{is}) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_s x_{is} + \dots + \beta_k x_{ik} + e_i$
  • $y_{js}$ represents the training feature label corresponding to the $s$-th training feature.
  • the training feature set $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$ and the training feature label set are substituted into the loss function to calculate the weight update values.
  • the model training module 102 mainly obtains the weight coefficients $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ by minimizing the loss function $J(\beta)$, where $e_i$ represents the error of the training process.
  • the data fusion module 103 inputs the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and returns the fused data to the client.
  • the standard fusion model including the weight update value is obtained as follows:
  • $\beta_0, \beta_1, \dots, \beta_s, \dots, \beta_k$ represent the weight update values.
  • FIG. 4 is a schematic diagram of the structure of an electronic device that implements the data fusion method for multiple data sources in this application.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of a data fusion program with multiple data sources, but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits in some embodiments, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • the processor 10 is the control unit of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (for example, the data fusion program for multiple data sources), and calls data stored in the memory 11 to perform the functions of the electronic device 1 and process data.
  • the bus may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power source (such as a battery) for supplying power to various components.
  • the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • the electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface.
  • the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may be a display and an input unit (such as a keyboard).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • the data fusion program 12 for multiple data sources stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run by the processor 10, can implement:
  • the pre-built original fusion model is trained to obtain the standard fusion model.
  • the standard to-be-fused data set is input to the standard fusion model to perform a fusion operation to obtain fused data, and the fused data is returned to the client.
  • the integrated module/unit of the electronic device 1 can be stored in a non-volatile computer-readable storage medium, or in a volatile computer-readable storage medium.
  • the computer-readable medium may include any entity or device capable of carrying the computer program code: a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, computer memory, or read-only memory (ROM).
  • the computer-readable medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
  • to further ensure the privacy and security of all the above-mentioned data in the data fusion method for multiple data sources provided in this application, all of that data can also be stored in a node of a blockchain.
  • for example, the fused data and similar data can be stored in the blockchain node.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for fusing data from multiple data sources, relating to big data technologies, and comprising: obtaining an original data set to be fused, a training feature set, and a training feature tag set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused (S1); training a pre-constructed original fusion model by using the training feature set and the training feature tag set to obtain a standard fusion model (S2); and inputting the standard data set to be fused into the standard fusion model to implement a fusion operation to obtain fused data, and returning the fused data to the client (S3). Also provided are an apparatus for fusing data from multiple data sources, an electronic device, and a computer readable storage medium. The problems of strong subjectivity and low fusion accuracy in a data fusion process can be solved.

Description

Data fusion method, apparatus, electronic device, and storage medium for multiple data sources
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 2, 2020, with application number CN202010004568.6 and the invention title "Data fusion method, apparatus, electronic device, and storage medium for multiple data sources", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of big data technology, and in particular to a data fusion method, apparatus, electronic device, and readable storage medium for multiple data sources.
Background
With the development of big data and artificial intelligence, data sources are becoming more numerous and more complex, which poses a major challenge for data analysis. Fusing the data before analysis begins is therefore an essential step. Current data fusion approaches are mainly the empirical value method and the unsupervised method. Both can complete data fusion, but the inventor realized that the empirical value method is highly subjective, while the unsupervised method lacks the guidance of label data, which tends to reduce the accuracy of the fused data.
Summary of the invention
A data fusion method for multiple data sources, including:
obtaining an original data set to be fused, a training feature set, and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
training a pre-built original fusion model using the training feature set and the training feature label set to obtain a standard fusion model;
inputting the standard data set to be fused into the standard fusion model to perform a fusion operation to obtain fused data, and returning the fused data to the client.
A data fusion apparatus for multiple data sources, the apparatus including:
a data mapping module, configured to obtain an original data set to be fused, a training feature set, and a training feature label set from a client, and perform a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
a model training module, configured to train a pre-built original fusion model using the training feature set and the training feature label set to obtain a standard fusion model;
a data fusion module, configured to input the standard data set to be fused into the standard fusion model to perform a fusion operation to obtain fused data, and return the fused data to the client.
An electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the following steps:
obtaining an original data set to be fused, a training feature set, and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
training a pre-built original fusion model using the training feature set and the training feature label set to obtain a standard fusion model;
inputting the standard data set to be fused into the standard fusion model to perform a fusion operation to obtain fused data, and returning the fused data to the client.
A computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the following steps:
obtaining an original data set to be fused, a training feature set, and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
training a pre-built original fusion model using the training feature set and the training feature label set to obtain a standard fusion model;
inputting the standard data set to be fused into the standard fusion model to perform a fusion operation to obtain fused data, and returning the fused data to the client.
This application can solve the problems of strong subjectivity and low fusion accuracy in the data fusion process.
Description of the drawings
FIG. 1 is a schematic flowchart of a data fusion method for multiple data sources provided by an embodiment of this application;
FIG. 2 is a detailed flowchart of step S2 in the data fusion method for multiple data sources provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the modules of the data fusion method for multiple data sources provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the internal structure of an electronic device for the data fusion method for multiple data sources provided by an embodiment of this application;
The realization of the purpose, the functional characteristics, and the advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides a data fusion method for multiple data sources. FIG. 1 is a schematic flowchart of the data fusion method for multiple data sources provided by an embodiment of this application. The method can be executed by an apparatus, and the apparatus can be implemented in software and/or hardware.
In this embodiment, the data fusion method for multiple data sources includes:
S1. Obtain an original data set to be fused, a training feature set, and a training feature label set from a client, and perform a data mapping operation on the original data set to be fused to obtain a standard data set to be fused.
The main purpose of this application is to fuse data from different channels, which has considerable application value; collecting the data from the different channels yields the original data set to be fused. For example, when Xiaoyu applies for auto insurance pricing, Xiaoyu uploads a large amount of data relevant to the pricing, including basic information (32 years old, male, bachelor's degree, urban household registration, one residential property in the urban area, a record of gastric perforation surgery, a Toyota car purchased for 170,000, and so on), three claims on record with the insurance company (including claims for accidents involving the car being driven), and past purchases of medical insurance and unemployment insurance. The data Xiaoyu uploaded for auto insurance pricing is the original data set to be fused.
The purpose of this application is to derive the final fused data from the original data set to be fused.
Further, the data mapping operation includes data normalization. Because the data comes from different channels, the value ranges differ. To reduce the computational load, the data from the different channels must be normalized, that is, mapped uniformly onto the interval [0,1]. The normalization method used here is dispersion standardization (min-max normalization), as shown below:
$$x^{*} = \frac{x - \min}{\max - \min}$$
Here, $x^{*}$ is the standard data to be fused, $\min$ is the minimum value of the original data set to be fused, $\max$ is the maximum value of the original data set to be fused, and $x$ is a data item in the original data set to be fused.
For example, game A is in public beta, and score label data for game A is obtained from different channels: in channel 1 the score is 65 on a range of [0,100]; in channel 2 the score is 0.46 on a range of [0,1]; in channel 3 the score is 0 on a range of [-1,1]. After the above normalization, the scores of game A in channel 1, channel 2, and channel 3 become 0.65, 0.46, and 0.50.
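The game A example can be checked with a few lines of Python. This is a minimal sketch, not the applicant's implementation: the function name `min_max_normalize` and the tuple layout are hypothetical, and it assumes that each channel's stated score range supplies the minimum and maximum, as the example does.

```python
def min_max_normalize(x: float, lo: float, hi: float) -> float:
    """Map x from the range [lo, hi] onto [0, 1] (dispersion standardization)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (x - lo) / (hi - lo)


# (raw score, range minimum, range maximum) for channels 1, 2, and 3 of game A.
channel_scores = [(65, 0, 100), (0.46, 0, 1), (0, -1, 1)]

# Each channel is normalized against its own score range (an assumption
# taken from the example, where the ranges differ per channel).
normalized = [min_max_normalize(x, lo, hi) for x, lo, hi in channel_scores]
print(normalized)  # [0.65, 0.46, 0.5]
```

After this step every channel's score lies on the same [0,1] scale, so the scores can be passed to a single fusion model.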
Preferably, the training feature set and the training feature label set are collectively referred to as the training data set. To fuse the data Xiaoyu uploaded for auto insurance pricing as described above, an auto insurance pricing fusion model must be pre-trained, and pre-training requires a large number of existing training data sets, such as the data uploaded by Xiao Zhang for auto insurance pricing together with the corresponding fused data, and the data uploaded by Xiao Chi for auto insurance pricing together with the corresponding fused data. The uploaded data is the training feature set, and the fused data is the training feature label set.
Further, the training feature set has the form $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$, where $x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik}$ represent training features from different channels and all have the same feature dimension, and $k$ represents the number of training feature sets.
S2. Train a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model.
In detail, training the pre-built original fusion model with the training data set to obtain the standard fusion model, as shown in the detailed flowchart of FIG. 2, includes:
S21. Initialize the weight coefficients to obtain initial weight values, where the weight coefficients have the same feature dimension as the training feature set;
S22. Construct an original logistic regression model from the initial weight values, and construct a loss function for computing the loss value of the original logistic regression model;
S23. Take the training feature set as the input of the loss function and the training feature label set as its label values, and minimize the loss function to obtain updated weight values;
S24. Replace the initial weight values of the original logistic regression model with the updated weight values to obtain the standard fusion model.
Specifically, the original logistic regression model is based on the publicly known logistic equation, which has the following mathematical form:
$$\operatorname{logit}(y_{is}) = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \dots + \theta_s x_{is} + \dots + \theta_k x_{ik} + e_i$$
where $y_{is}$ is the predicted fusion value corresponding to the s-th training feature, $e_i$ is a preset error value, and $\theta_0, \theta_1, \dots, \theta_k$ are the weight coefficients. If each training feature in the training feature set has dimension 3, the number of weight coefficients is also 3.
Further,
$$\operatorname{logit}(y_{is}) = \ln\frac{y_{is}}{1 - y_{is}}$$
Combining the above formulas gives the original logistic regression model:
$$y_{is} = \frac{1}{1 + \exp\left(-\left(\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \dots + \theta_k x_{ik} + e_i\right)\right)}$$
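The step from the logit form to the explicit model can be written out as follows. This is a sketch that assumes the standard definition of the logit function, $\operatorname{logit}(y) = \ln\frac{y}{1-y}$; the shorthand $z_i$ for the linear predictor is introduced here only for illustration and does not appear in the original text.

```latex
\begin{aligned}
z_i &= \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \dots + \theta_k x_{ik} + e_i \\
\ln\frac{y_{is}}{1 - y_{is}} &= z_i \\
\frac{y_{is}}{1 - y_{is}} &= \exp(z_i) \\
y_{is} &= \frac{\exp(z_i)}{1 + \exp(z_i)} = \frac{1}{1 + \exp(-z_i)}
\end{aligned}
```

Under this assumption the predicted fusion value always lies strictly between 0 and 1, which matches the [0, 1] range produced by the normalization step.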
The loss function $J(\theta)$ is:
[Equation images PCTCN2020119073-appb-000004 and PCTCN2020119073-appb-000005: formulas not recoverable from the extracted text]
The loss function is further obtained as:
[Equation image PCTCN2020119073-appb-000006: formula not recoverable from the extracted text]
where $y_{js}$ denotes the training feature label corresponding to the s-th training feature.
In detail, the training feature set $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$ and the training feature label set are substituted into the loss function to compute the updated weight values.
Step S2 thus obtains the weight coefficients $\theta_0, \theta_1, \theta_2, \dots, \theta_k$ by minimizing the loss function $J(\theta)$, where $e_i$ represents the error of the training process.
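Steps S21 to S24 can be sketched as below. Several things here are assumptions rather than the application's own method: the exact loss function is not recoverable from the text, so the sketch uses the cross-entropy loss that is conventional for logistic regression; minimization is done by plain batch gradient descent; the error term $e_i$ is taken as zero; and the function names, learning rate, epoch count and toy training data are all invented for illustration.

```python
import math


def predict(theta, x):
    """Logistic model: theta[0] is the intercept, theta[1:] weight the features."""
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(theta, features, labels):
    """Assumed loss J(theta): mean cross-entropy between predictions and labels."""
    total = 0.0
    for x, y in zip(features, labels):
        p = predict(theta, x)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(features)


def train(features, labels, lr=0.5, epochs=2000):
    k = len(features[0])
    theta = [0.0] * (k + 1)                      # S21: initial weight values
    n = len(features)
    for _ in range(epochs):                      # S23: minimize the loss
        grad = [0.0] * (k + 1)
        for x, y in zip(features, labels):
            err = predict(theta, x) - y          # S22: model output vs. label
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta                                 # S24: updated weight values


# Toy data (invented): normalized values from three channels and fused labels.
features = [[0.65, 0.46, 0.50], [0.20, 0.30, 0.25],
            [0.90, 0.80, 0.85], [0.40, 0.50, 0.45]]
labels = [0.55, 0.25, 0.85, 0.45]

theta = train(features, labels)
print(theta, cross_entropy(theta, features, labels))
```

The returned list has k + 1 entries because the sketch counts the intercept $\theta_0$ separately from the k feature weights.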
S3. Input the standard data set to be fused into the standard fusion model, perform the fusion operation to obtain fused data, and return the fused data to the client.
The standard fusion model obtained in S2, which contains the updated weight values, is as follows:
$$y_{is} = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_s x_{is} + \dots + \beta_k x_{ik} + e_i\right)\right)}$$
where $\beta_0, \beta_1, \dots, \beta_s, \dots, \beta_k$ denote the updated weight values.
Continuing the example above, the ratings of game A after normalization are 0.65, 0.46 and 0.50. Setting $x_{i1} = 0.65$, $x_{i2} = 0.46$, and so on, and evaluating the standard fusion model yields the fused data $y_{is}$.
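As a rough illustration of S3, the sketch below evaluates a logistic fusion model on the three normalized ratings of game A. It assumes the standard fusion model keeps the logistic form of the model trained in S2, with the error term taken as zero. The weights in `beta` are hypothetical placeholders chosen only to make the example run; the application does not state any trained values.

```python
import math


def fuse(beta, x):
    """Evaluate the logistic fusion model; beta[0] is the intercept."""
    if len(beta) != len(x) + 1:
        raise ValueError("expected one intercept plus one weight per input")
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


beta = [-1.0, 1.2, 0.8, 1.0]      # hypothetical updated weight values
ratings = [0.65, 0.46, 0.50]      # normalized ratings of game A (from the text)

fused = fuse(beta, ratings)
print(fused)
```

Whatever weights training produces, the fused value stays within (0, 1), so it can be read on the same scale as the normalized inputs.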
Further, this embodiment also includes: when the fused data are successfully returned to the client, establishing in the client a one-to-one correspondence between the fused data and the original data set to be fused, and storing the fused data and the original data set to be fused according to that correspondence.
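One possible way to keep such a one-to-one correspondence on the client is sketched below. The application does not say how the correspondence is built, so the `FusionStore` class, the SHA-256 key derived from the original data set, and the in-memory dictionary are all assumptions made for illustration.

```python
import hashlib
import json


class FusionStore:
    """Hypothetical client-side store pairing each original data set with its fused result."""

    def __init__(self):
        self._records = {}

    def save(self, original, fused):
        # Derive a stable key from the original data set so that each
        # data set maps to exactly one stored fused result.
        payload = json.dumps(original, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        self._records[key] = {"original": original, "fused": fused}
        return key

    def load(self, key):
        return self._records[key]


store = FusionStore()
key = store.save([65, 0.46, 0], 0.66)   # 0.66 is a made-up fused value
record = store.load(key)
print(record)
```

Saving the same original data set again overwrites its record rather than adding a second one, which is what keeps the mapping one-to-one in this sketch.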
FIG. 3 is a functional block diagram of the multi-source data fusion apparatus of this application.
The multi-source data fusion apparatus 100 of this application may be installed in an electronic device. According to the functions implemented, the apparatus 100 may include a data mapping module 101, a model training module 102 and a data fusion module 103. A module in this application, which may also be called a unit, is a series of computer program segments that can be executed by the processor of the electronic device to perform a fixed function, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The data mapping module 101 is configured to obtain an original data set to be fused, a training feature set and a training feature label set from a client, and to perform a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
The model training module 102 is configured to train a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model;
The data fusion module 103 is configured to input the standard data set to be fused into the standard fusion model, perform the fusion operation to obtain fused data, and return the fused data to the client.
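The data flow between the three modules can be sketched as follows. The class name `DataFusionApparatus`, its field names and the two lambda stand-ins are hypothetical; the stand-ins are placeholders that only make the flow executable and are not the normalization or training described in the application.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], float]


@dataclass
class DataFusionApparatus:
    map_data: Callable[[Vector], Vector]                   # role of module 101
    train_model: Callable[[List[Vector], Vector], Model]   # role of module 102

    def fuse(self, raw: Vector, train_features: List[Vector],
             train_labels: Vector) -> float:               # role of module 103
        standard = self.map_data(raw)
        model = self.train_model(train_features, train_labels)
        return model(standard)


# Placeholder stand-ins: divide by 100 as "mapping", average as "model".
apparatus = DataFusionApparatus(
    map_data=lambda raw: [v / 100.0 for v in raw],
    train_model=lambda xs, ys: (lambda x: sum(x) / len(x)),
)

result = apparatus.fuse([65.0, 46.0, 50.0], [], [])
print(result)
```

Passing the mapping and training steps in as callables mirrors the text's point that the module split is a logical division of functions rather than a fixed implementation.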
In detail, when executed by the processor of an electronic device, the modules of the multi-source data fusion apparatus can implement the following method steps:
The data mapping module 101 obtains an original data set to be fused, a training feature set and a training feature label set from the client, and performs a data mapping operation on the original data set to be fused to obtain a standard data set to be fused.
The main purpose of this application is to fuse data from different channels, which has considerable practical value; pooling the data from the different channels yields the original data set to be fused. For example, when Xiaoyu applies for auto insurance pricing, he uploads a large amount of related data, including his basic information (32 years old, male, bachelor's degree, urban household registration, one residential property in the urban area, a record of surgery for gastric perforation, a Toyota car purchased for 170,000), three past claims with the insurance company (including a claim for an accident involving the car he was driving), and purchases of medical insurance and unemployment insurance. The data Xiaoyu uploads for auto insurance pricing constitute the original data set to be fused, and the purpose of this application is to compute the final fused data from that original data set.
Further, the data mapping operation includes data normalization. Because the data come from different channels, their value ranges differ. To reduce the computational load, the data from each channel are normalized, that is, mapped uniformly onto the interval [0, 1]. The normalization method used here is min-max (dispersion) normalization:
$$x^{*} = \frac{x - \min}{\max - \min}$$
where $x^{*}$ is the standard data to be fused, min is the minimum value of the original data set to be fused, max is the maximum value of the original data set to be fused, and x is a data item in the original data set to be fused.
For example, game A enters public beta, and rating-label data for game A are obtained from different channels: channel 1 rates the game 65 on a scale of [0, 100], channel 2 rates it 0.46 on a scale of [0, 1], and channel 3 rates it 0 on a scale of [-1, 1]. After the above normalization, with min and max taken as the bounds of each channel's rating scale, the ratings of game A in channels 1, 2 and 3 become 0.65, 0.46 and 0.50.
Preferably, the training feature set and the training feature label set are collectively referred to as the training data set. In the example above, fusing the data that Xiaoyu uploaded for auto insurance pricing requires a pre-trained auto insurance pricing fusion model, and pre-training that model requires a large amount of existing training data, such as the data Xiao Zhang uploaded for auto insurance pricing together with its fused result, and the data Xiao Chi uploaded for auto insurance pricing together with its fused result. The uploaded data form the training feature set, and the fused results form the training feature label set.
Further, the training feature set has the form $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$, where $x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik}$ are training features from different channels that share the same feature dimension, and k denotes the size of the training feature set.
The model training module 102 trains a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model.
In detail, training the pre-built original fusion model with the training data set to obtain the standard fusion model includes: initializing the weight coefficients to obtain initial weight values, where the weight coefficients have the same feature dimension as the training feature set; constructing an original logistic regression model from the initial weight values; constructing a loss function for computing the loss value of the original logistic regression model; taking the training feature set as the input of the loss function and the training feature label set as its label values, and minimizing the loss function to obtain updated weight values; and replacing the initial weight values of the original logistic regression model with the updated weight values to obtain the standard fusion model.
Specifically, the original logistic regression model is based on the publicly known logistic equation, which has the following mathematical form:
$$\operatorname{logit}(y_{is}) = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \dots + \theta_s x_{is} + \dots + \theta_k x_{ik} + e_i$$
where $y_{is}$ is the predicted fusion value corresponding to the s-th training feature, $e_i$ is a preset error value, and $\theta_0, \theta_1, \dots, \theta_k$ are the weight coefficients. If each training feature in the training feature set has dimension 3, the number of weight coefficients is also 3.
Further,
$$\operatorname{logit}(y_{is}) = \ln\frac{y_{is}}{1 - y_{is}}$$
Combining the above formulas gives the original logistic regression model:
$$y_{is} = \frac{1}{1 + \exp\left(-\left(\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \dots + \theta_k x_{ik} + e_i\right)\right)}$$
The loss function $J(\theta)$ is:
[Equation images PCTCN2020119073-appb-000011 and PCTCN2020119073-appb-000012: formulas not recoverable from the extracted text]
The loss function is further obtained as:
[Equation image PCTCN2020119073-appb-000013: formula not recoverable from the extracted text]
where $y_{js}$ denotes the training feature label corresponding to the s-th training feature.
In detail, the training feature set $X(x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik})$ and the training feature label set are substituted into the loss function to compute the updated weight values.
The model training module 102 thus obtains the weight coefficients $\theta_0, \theta_1, \theta_2, \dots, \theta_k$ by minimizing the loss function $J(\theta)$, where $e_i$ represents the error of the training process.
The data fusion module 103 inputs the standard data set to be fused into the standard fusion model, performs the fusion operation to obtain fused data, and returns the fused data to the client.
The standard fusion model obtained by the model training module 102, which contains the updated weight values, is as follows:
$$y_{is} = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 x_{i1} + \dots + \beta_s x_{is} + \dots + \beta_k x_{ik} + e_i\right)\right)}$$
where $\beta_0, \beta_1, \dots, \beta_s, \dots, \beta_k$ denote the updated weight values.
Continuing the example above, the ratings of game A after normalization are 0.65, 0.46 and 0.50. Setting $x_{i1} = 0.65$, $x_{i2} = 0.46$, and so on, and evaluating the standard fusion model yields the fused data $y_{is}$.
Further, this embodiment also includes: when the fused data are successfully returned to the client, establishing in the client a one-to-one correspondence between the fused data and the original data set to be fused, and storing the fused data and the original data set to be fused according to that correspondence.
FIG. 4 is a schematic structural diagram of an electronic device that implements the multi-source data fusion method of this application.
The electronic device 1 may include a processor 10, a memory 11 and a bus, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a multi-source data fusion program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc and the like. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, for example a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the code of the multi-source data fusion program, but also to temporarily store data that have been output or are to be output.
In some embodiments the processor 10 may consist of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the multi-source data fusion program) and by calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. The bus is arranged to provide connection and communication between the memory 11, the at least one processor 10 and other components.
FIG. 4 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may also include a power supply (such as a battery) for powering the components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that charging, discharging and power consumption are managed through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator and any other such components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module and the like, which are not described further here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the embodiments are for illustration only and do not limit the scope of the patent application to this structure.
The multi-source data fusion program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
Obtaining an original data set to be fused, a training feature set and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused.
Training a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model.
Inputting the standard data set to be fused into the standard fusion model, performing the fusion operation to obtain fused data, and returning the fused data to the client.
Specifically, for how the processor 10 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 3, which is not repeated here.
Further, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium or in a volatile computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM). The computer-readable medium stores a computer program which, when executed by a processor, implements the following steps:
Obtaining an original data set to be fused, a training feature set and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
Training a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model;
Inputting the standard data set to be fused into the standard fusion model, performing the fusion operation to obtain fused data, and returning the fused data to the client.
Specifically, the steps implemented when the computer program is executed by the processor are substantially the same as the relevant steps described in the foregoing embodiments, and are not repeated here.
In another embodiment, to further ensure the privacy and security of all the data involved in the multi-source data fusion method provided by this application, all of the above data may also be stored in nodes of a blockchain. For example, the fused data and the like may all be stored in blockchain nodes.
It should be noted that the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms.
In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated in one processing unit, may each exist physically on their own, or two or more of them may be integrated in one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments should be regarded in every respect as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in this application. No reference sign in the claims should be construed as limiting the claim concerned.
In addition, the word "including" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in the system claims may also be implemented by a single unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from their spirit and scope.

Claims (20)

  1. A multi-source data fusion method, wherein the method is applied to an electronic device and comprises:
    obtaining an original data set to be fused, a training feature set and a training feature label set from a client, and performing a data mapping operation on the original data set to be fused to obtain a standard data set to be fused;
    training a pre-built original fusion model with the training feature set and the training feature label set to obtain a standard fusion model;
    inputting the standard data set to be fused into the standard fusion model, performing a fusion operation to obtain fused data, and returning the fused data to the client.
  2. The multi-source data fusion method according to claim 1, wherein training the pre-built original fusion model with the training feature set and the training feature label set to obtain the standard fusion model comprises:
    initializing weight coefficients to obtain initial weight values, wherein the weight coefficients have the same feature dimension as the training feature set;
    constructing an original logistic regression model from the initial weight values;
    constructing a loss function for computing the loss value of the original logistic regression model;
    taking the training feature set as the input of the loss function and the training feature label set as the label values of the loss function, and minimizing the loss function to obtain updated weight values;
    replacing the initial weight values of the original logistic regression model with the updated weight values to obtain the standard fusion model.
  3. The data fusion method for multiple data sources according to claim 2, wherein the loss function comprises:
    Figure PCTCN2020119073-appb-100001
    where J(θ) represents the loss function, k represents the number of training feature sets, y_is represents the predicted fused data obtained by applying the original logistic regression model to the s-th training feature, y_js represents the training feature label corresponding to the s-th training feature, and θ represents the weight coefficient.
  4. The data fusion method for multiple data sources according to claim 1, wherein the data mapping operation comprises:
    performing a data normalization operation using the following calculation:
    Figure PCTCN2020119073-appb-100002
    where x* is the data in the standard to-be-fused data set, min is the minimum value of the original to-be-fused data set, max is the maximum value of the original to-be-fused data set, and x is the data in the original to-be-fused data set.
  5. The data fusion method for multiple data sources according to claim 1, wherein the method further comprises:
    when the fused data is successfully returned to the client, establishing, in the client, a one-to-one correspondence between the fused data and the original to-be-fused data set; and
    storing the fused data and the original to-be-fused data set according to the one-to-one correspondence.
  6. The data fusion method for multiple data sources according to claim 1, wherein the training feature set has the form X(x_i1, x_i2, x_i3, …, x_ik), where x_i1, x_i2, x_i3, …, x_ik represent training features from different channels and have the same feature dimension, and k represents the number of training feature sets.
  7. The data fusion method for multiple data sources according to claim 1, wherein the standard fusion model comprises:
    Figure PCTCN2020119073-appb-100003
    where β_0, β_1, …, β_s, …, β_k represent the weight update values, x_i1, x_i2, x_i3, …, x_ik represent training features from different channels, and y_is represents the fused data.
  8. A data fusion apparatus for multiple data sources, wherein the apparatus comprises:
    a data mapping module, configured to obtain an original to-be-fused data set, a training feature set, and a training feature label set from a client, and to perform a data mapping operation on the original to-be-fused data set to obtain a standard to-be-fused data set;
    a model training module, configured to train a pre-built original fusion model by using the training feature set and the training feature label set to obtain a standard fusion model; and
    a data fusion module, configured to input the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and to return the fused data to the client.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    obtaining an original to-be-fused data set, a training feature set, and a training feature label set from a client, and performing a data mapping operation on the original to-be-fused data set to obtain a standard to-be-fused data set;
    training a pre-built original fusion model by using the training feature set and the training feature label set to obtain a standard fusion model; and
    inputting the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and returning the fused data to the client.
  10. The electronic device according to claim 9, wherein training the pre-built original fusion model by using the training feature set and the training feature label set to obtain the standard fusion model comprises:
    initializing a weight coefficient to obtain a weight initial value, wherein the weight coefficient has the same feature dimension as the training feature set;
    constructing an original logistic regression model according to the weight initial value;
    constructing a loss function for computing the loss value of the original logistic regression model;
    using the training feature set as the input value of the loss function and the training feature label set as the label value of the loss function, and minimizing the loss function to obtain a weight update value; and
    replacing the weight initial value of the original logistic regression model with the weight update value to obtain the standard fusion model.
  11. The electronic device according to claim 10, wherein the loss function comprises:
    Figure PCTCN2020119073-appb-100004
    where J(θ) represents the loss function, k represents the number of training feature sets, y_is represents the predicted fused data obtained by applying the original logistic regression model to the s-th training feature, y_js represents the training feature label corresponding to the s-th training feature, and θ represents the weight coefficient.
  12. The electronic device according to claim 9, wherein the data mapping operation comprises:
    performing a data normalization operation using the following calculation:
    Figure PCTCN2020119073-appb-100005
    where x* is the data in the standard to-be-fused data set, min is the minimum value of the original to-be-fused data set, max is the maximum value of the original to-be-fused data set, and x is the data in the original to-be-fused data set.
  13. The electronic device according to claim 9, wherein the instructions are executed by the at least one processor so that the at least one processor further performs the following steps:
    when the fused data is successfully returned to the client, establishing, in the client, a one-to-one correspondence between the fused data and the original to-be-fused data set; and
    storing the fused data and the original to-be-fused data set according to the one-to-one correspondence.
  14. The electronic device according to claim 9, wherein the training feature set has the form X(x_i1, x_i2, x_i3, …, x_ik), where x_i1, x_i2, x_i3, …, x_ik represent training features from different channels and have the same feature dimension, and k represents the number of training feature sets.
  15. The electronic device according to claim 9, wherein the standard fusion model comprises:
    Figure PCTCN2020119073-appb-100006
    where β_0, β_1, …, β_s, …, β_k represent the weight update values, x_i1, x_i2, x_i3, …, x_ik represent training features from different channels, and y_is represents the fused data.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    obtaining an original to-be-fused data set, a training feature set, and a training feature label set from a client, and performing a data mapping operation on the original to-be-fused data set to obtain a standard to-be-fused data set;
    training a pre-built original fusion model by using the training feature set and the training feature label set to obtain a standard fusion model; and
    inputting the standard to-be-fused data set into the standard fusion model to perform a fusion operation to obtain fused data, and returning the fused data to the client.
  17. The computer-readable storage medium according to claim 16, wherein training the pre-built original fusion model by using the training feature set and the training feature label set to obtain the standard fusion model comprises:
    initializing a weight coefficient to obtain a weight initial value, wherein the weight coefficient has the same feature dimension as the training feature set;
    constructing an original logistic regression model according to the weight initial value;
    constructing a loss function for computing the loss value of the original logistic regression model;
    using the training feature set as the input value of the loss function and the training feature label set as the label value of the loss function, and minimizing the loss function to obtain a weight update value; and
    replacing the weight initial value of the original logistic regression model with the weight update value to obtain the standard fusion model.
  18. The computer-readable storage medium according to claim 17, wherein the loss function comprises:
    Figure PCTCN2020119073-appb-100007
    where J(θ) represents the loss function, k represents the number of training feature sets, y_is represents the predicted fused data obtained by applying the original logistic regression model to the s-th training feature, y_js represents the training feature label corresponding to the s-th training feature, and θ represents the weight coefficient.
  19. The computer-readable storage medium according to claim 16, wherein the data mapping operation comprises:
    performing a data normalization operation using the following calculation:
    Figure PCTCN2020119073-appb-100008
    where x* is the data in the standard to-be-fused data set, min is the minimum value of the original to-be-fused data set, max is the maximum value of the original to-be-fused data set, and x is the data in the original to-be-fused data set.
  20. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by the processor, further implements the following steps:
    when the fused data is successfully returned to the client, establishing, in the client, a one-to-one correspondence between the fused data and the original to-be-fused data set; and
    storing the fused data and the original to-be-fused data set according to the one-to-one correspondence.
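
The flow recited in the method claims (data mapping, training a logistic regression fusion model, then fusing) can be illustrated with a minimal Python sketch. The formula images are not reproduced in this text, so the sketch assumes the conventional forms suggested by the symbol definitions: min-max normalization for the data mapping, a sigmoid over a weighted sum with an intercept for the fusion model, and mean binary cross-entropy for the loss. All function names, the learning rate, the epoch count, the per-column normalization, and the toy data are hypothetical and are not taken from the application.

```python
import math

def min_max_normalize(values):
    """Data mapping, assumed form: x* = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, features):
    """Assumed model form: sigmoid(b0 + b1*x_i1 + ... + bk*x_ik)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def loss(weights, X, y):
    """Assumed loss J(theta): mean binary cross-entropy."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(weights, xi), eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(X)

def train(X, y, lr=0.5, epochs=2000):
    """Minimize the loss by batch gradient descent (one possible minimizer)."""
    weights = [0.0] * (len(X[0]) + 1)  # weight initial values, intercept first
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict(weights, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights  # weight update values that define the fusion model

def fuse(weights, raw_rows):
    """Normalize each source column, then apply the trained model."""
    columns = [min_max_normalize(list(col)) for col in zip(*raw_rows)]
    return [predict(weights, row) for row in zip(*columns)]

# Hypothetical toy data: two channels per record, labels in {0, 1}.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y_train = [0, 0, 1, 1]
weights = train(X_train, y_train)
fused = fuse(weights, [[10.0, 200.0], [55.0, 640.0], [90.0, 980.0]])
```

Gradient descent is used here only because it needs no third-party library; the claims require that the loss be minimized but do not name an optimizer. Each entry of `fused` is a value between 0 and 1 for one input record, which a client could store alongside the corresponding original record to keep the one-to-one correspondence described in the claims.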
PCT/CN2020/119073 2020-01-02 2020-09-29 Method and apparatus for fusing data from multiple data sources, electronic device, and storage medium WO2021135474A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010004568.6 2020-01-02
CN202010004568.6A CN111191733B (en) 2020-01-02 2020-01-02 Data fusion method and device for multiple data sources, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021135474A1 (en) 2021-07-08

Family

ID=70708372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119073 WO2021135474A1 (en) 2020-01-02 2020-09-29 Method and apparatus for fusing data from multiple data sources, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111191733B (en)
WO (1) WO2021135474A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191733B (en) * 2020-01-02 2020-09-29 平安科技(深圳)有限公司 Data fusion method and device for multiple data sources, electronic equipment and storage medium
CN117349785B (en) * 2023-08-24 2024-04-05 长江水上交通监测与应急处置中心 Multi-source data fusion method and system for shipping government information resources

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610194A (en) * 2017-08-14 2018-01-19 成都大学 MRI super resolution ratio reconstruction method based on Multiscale Fusion CNN
CN108647716A (en) * 2018-05-09 2018-10-12 北京理工大学 A kind of diagnosing failure of photovoltaic array method based on composite information
CN109271901A (en) * 2018-08-31 2019-01-25 武汉大学 A kind of sign Language Recognition Method based on Multi-source Information Fusion
CN110197218A (en) * 2019-05-24 2019-09-03 绍兴达道生涯教育信息咨询有限公司 Thunderstorm gale grade forecast classification method based on multi-source convolutional neural networks
US20190279111A1 (en) * 2018-03-09 2019-09-12 Zestfinance, Inc. Systems and methods for providing machine learning model evaluation by using decomposition
CN111191733A (en) * 2020-01-02 2020-05-22 平安科技(深圳)有限公司 Data fusion method and device for multiple data sources, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970271B (en) * 2014-04-04 2017-06-20 浙江大学 The daily routines recognition methods of fusional movement and physiology sensing data
CN110288023A (en) * 2019-06-26 2019-09-27 广州小鹏汽车科技有限公司 Fusion method and device, detection method, acquisition methods, server and vehicle
CN110349652B (en) * 2019-07-12 2022-02-22 之江实验室 Medical data analysis system fusing structured image data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HE YAQI: "Research and Applications on the Key Technology of Multi-source Heterogeneous Data Fusion", CHINA MASTER’S THESES FULL-TEXT DATABASE, 23 March 2018 (2018-03-23), pages 1 - 80, XP055826797, ISSN: 1674-0246 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592019A (en) * 2021-08-10 2021-11-02 平安银行股份有限公司 Fault detection method, device, equipment and medium based on multi-model fusion
CN113592019B (en) * 2021-08-10 2023-09-15 平安银行股份有限公司 Fault detection method, device, equipment and medium based on multi-model fusion
CN116303392A (en) * 2023-03-02 2023-06-23 重庆市规划和自然资源信息中心 Multi-source data table management method for real estate registration data
CN116303392B (en) * 2023-03-02 2023-09-01 重庆市规划和自然资源信息中心 Multi-source data table management method for real estate registration data
CN117648670A (en) * 2024-01-24 2024-03-05 润泰救援装备科技河北有限公司 Rescue data fusion method, electronic equipment, storage medium and rescue fire truck
CN117648670B (en) * 2024-01-24 2024-04-12 润泰救援装备科技河北有限公司 Rescue data fusion method, electronic equipment, storage medium and rescue fire truck

Also Published As

Publication number Publication date
CN111191733A (en) 2020-05-22
CN111191733B (en) 2020-09-29

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20908576; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20908576; Country of ref document: EP; Kind code of ref document: A1)