CN115203138A - Data retrieval method, device and storage medium - Google Patents

Data retrieval method, device and storage medium

Info

Publication number
CN115203138A
Authority
CN
China
Prior art keywords
data
retrieved
file index
neural network
retrieval
Prior art date
Legal status
Pending
Application number
CN202210780491.0A
Other languages
Chinese (zh)
Inventor
刘千仞
薛淼
任梦璇
任杰
Current Assignee
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202210780491.0A
Publication of CN115203138A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Abstract

The disclosure provides a data retrieval method, apparatus and storage medium, which relate to the field of data processing and can improve the accuracy of data retrieval. The method comprises: sending data to be retrieved to a data server; inputting the data to be retrieved into a preset neural network model to determine a file index vector corresponding to the data to be retrieved; and sending the file index vector to a blockchain system. The embodiments of the disclosure are used in the data retrieval process.

Description

Data retrieval method, device and storage medium
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a data retrieval method, apparatus, and storage medium.
Background
In the related art, a data retrieval system generally includes a data providing end, a data using end and a data server. The data providing end encrypts the data to be retrieved and sends the encrypted data to the data server, and the data server determines a file index vector for the data to be retrieved according to a vector space model. When the data using end needs to retrieve data, it sends a retrieval request to the data server; the data server generates a retrieval vector from the retrieval request, matches the file index vectors related to the retrieval vector, and sends the data to be retrieved corresponding to the related file index vectors to the data using end.
However, because the file index vector of the data to be retrieved is determined with a vector space model, the accuracy of the resulting vectors is low, and therefore the accuracy of the final retrieval result determined by the data server is also low.
Disclosure of Invention
The present disclosure provides a data retrieval method, apparatus and storage medium for improving accuracy of data retrieval.
To achieve the above purpose, the following technical solutions are adopted:
In a first aspect, a data retrieval method is provided, which is applied to a first data provider and includes: sending data to be retrieved to a data server; inputting the data to be retrieved into a preset neural network model and determining a file index vector corresponding to the data to be retrieved; and sending the file index vector to a blockchain system.
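To make the first aspect concrete, the following is a minimal sketch (in Python/PyTorch) of how a data providing end could turn a document into a file index vector with a neural network encoder before sending the vector to the blockchain system; the hashed bag-of-words input, the layer sizes and the 128-dimensional output are illustrative assumptions, not parameters fixed by the disclosure.

```python
# A minimal sketch (not the patented model): a data providing end turns a document
# into a fixed-length file index vector with a small neural network encoder.
# The hashing trick, vector size and two-layer MLP are illustrative assumptions.
import hashlib
import torch
import torch.nn as nn

VOCAB_BUCKETS = 2048   # assumed size of the hashed bag-of-words input
INDEX_DIM = 128        # assumed dimensionality of the file index vector

class IndexEncoder(nn.Module):
    """Toy stand-in for the 'preset neural network model'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VOCAB_BUCKETS, 256),
            nn.ReLU(),
            nn.Linear(256, INDEX_DIM),
        )

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.normalize(self.net(bow), dim=-1)

def to_bow(text: str) -> torch.Tensor:
    """Hash words into a fixed-size bag-of-words vector."""
    vec = torch.zeros(VOCAB_BUCKETS)
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % VOCAB_BUCKETS
        vec[bucket] += 1.0
    return vec

encoder = IndexEncoder()                       # the preset neural network model
doc = "contract data of the user to be retrieved"
file_index_vector = encoder(to_bow(doc))       # this vector is sent to the blockchain system
# The (typically encrypted) document itself is sent to the data server.
print(file_index_vector.shape)                 # torch.Size([128])
```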
With reference to the first aspect, in a possible implementation manner, the first data provider is one of a plurality of data providers, and the method further comprises the following steps. Step 1: obtain a first model parameter; at the first iteration, the first model parameter is an initial model parameter; at the N-th iteration, the first model parameter is a model parameter determined according to the first parameters of the plurality of data providers acquired from the blockchain system, where the first parameters are the model parameters of the neural network models determined after N-1 iterations of the initial neural network model, and N is a positive integer greater than or equal to 2. Step 2: adjust the model parameters of a first neural network model according to the first model parameter to determine a second neural network model; at the first iteration, the first neural network model is the initial neural network model; at the N-th iteration, the first neural network model is the neural network model obtained after N-1 iterations of the initial neural network model. Step 3: train the second neural network model according to training data to determine a third neural network model. Step 4: determine whether the third neural network model meets a preset condition. Step 5: if so, determine the third neural network model as the preset neural network model. Step 6: if not, take the parameters of the third neural network model as the first parameters and send them to the blockchain system. Steps 1 to 6 are executed iteratively until the determined third neural network model meets the preset condition.
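The iterative procedure of steps 1 to 6 can be summarised, under stated assumptions, by the following sketch of one data providing end; the chain interface (a per-round dictionary of submitted parameters), the averaging of the first parameters, the toy linear model used in step 3 and the loss threshold used as the preset condition are all illustrative choices rather than requirements of the disclosure.

```python
# A hedged sketch of steps 1-6 on a single data providing end.
import numpy as np

def fetch_first_params(chain, round_n):
    """Step 1 (N >= 2): first model parameter derived from all round-(N-1) submissions."""
    submitted = chain.get(round_n - 1, [])
    return np.mean(submitted, axis=0) if submitted else None

def local_train(params, data, labels, lr=0.1, epochs=5):
    """Step 3: toy gradient-descent training standing in for neural-network training."""
    w = params.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def meets_preset_condition(params, data, labels, tol=1e-3):
    """Step 4: here, training loss below a threshold."""
    return np.mean((data @ params - labels) ** 2) < tol

rng = np.random.default_rng(0)
data = rng.normal(size=(32, 4))
labels = data @ np.array([1.0, -2.0, 0.5, 3.0])   # synthetic, exactly learnable targets
chain = {}                                        # stands in for the blockchain ledger
params = np.zeros(4)                              # initial model parameters (first iteration)

for round_n in range(1, 31):
    if round_n > 1:                               # step 1 at the N-th iteration
        fetched = fetch_first_params(chain, round_n)
        if fetched is not None:
            params = fetched                      # step 2: adjust the local model
    params = local_train(params, data, labels)                # step 3
    if meets_preset_condition(params, data, labels):          # step 4
        preset_neural_network_params = params                 # step 5
        break
    chain.setdefault(round_n, []).append(params.copy())       # step 6: send to the chain
```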
With reference to the foregoing first aspect, in a possible implementation manner, sending the data to be retrieved to the data server includes: determining a private key of the first data provider; encrypting the data to be retrieved according to the private key; and sending the encrypted data to be retrieved to the data server.
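A minimal sketch of this encrypted upload path is shown below, assuming a symmetric key stands in for the "private key of the first data provider" and using the Fernet recipe from the Python cryptography package as an illustrative cipher, not the cipher mandated by the disclosure.

```python
# Hedged sketch: encrypt the data to be retrieved before sending it to the data server.
from cryptography.fernet import Fernet

provider_key = Fernet.generate_key()          # stands in for the provider's private key
cipher = Fernet(provider_key)

data_to_be_retrieved = b"quarterly network traffic report"
encrypted_payload = cipher.encrypt(data_to_be_retrieved)
# encrypted_payload is what would be sent to the data server; the data using end can
# later request provider_key (first request/response messages) and decrypt:
assert Fernet(provider_key).decrypt(encrypted_payload) == data_to_be_retrieved
```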
With reference to the foregoing first aspect, in a possible implementation manner, the method further includes: receiving a first request message from a data using end, where the first request message is used for requesting the private key of the first data provider; and sending a first response message to the data using end, where the first response message includes the private key of the first data provider.
With reference to the foregoing first aspect, in a possible implementation manner, sending a file index vector to a blockchain system includes: encrypting the file index vector according to the private key; and sending the encrypted file index vector to a block chain system.
In a second aspect, a data retrieval method is provided, which is applied to a blockchain system and includes: receiving and storing file index vectors from a plurality of data providers, where the file index vectors correspond to data to be retrieved; receiving a retrieval request from a data using end and generating a retrieval vector according to the retrieval request; matching, among the stored file index vectors, the K file index vectors with the highest similarity to the retrieval vector, where K is a positive integer; and sending first indication information to the data server, where the first indication information is used for instructing the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end.
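The matching step of the second aspect could, for example, look like the following sketch, in which cosine similarity over the stored file index vectors is an assumed similarity measure; the disclosure only requires selecting the K vectors with the highest similarity.

```python
# Sketch of the top-K match a blockchain node could perform between the retrieval
# vector and the stored file index vectors.
import numpy as np

def top_k_matches(retrieval_vec, index_vectors, k=3):
    index_mat = np.stack(index_vectors)                        # shape (num_files, dim)
    sims = index_mat @ retrieval_vec / (
        np.linalg.norm(index_mat, axis=1) * np.linalg.norm(retrieval_vec) + 1e-12
    )
    best = np.argsort(sims)[::-1][:k]                          # indices of the K best files
    return best.tolist(), sims[best].tolist()

rng = np.random.default_rng(1)
stored = [rng.normal(size=128) for _ in range(10)]             # file index vectors on the chain
query = rng.normal(size=128)                                   # retrieval vector
file_ids, scores = top_k_matches(query, stored, k=3)
# file_ids would be carried in the first indication information sent to the data server.
```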
With reference to the second aspect, in a possible implementation manner, the method further includes: receiving first parameters from the plurality of data providers, where the first parameters are the model parameters of the neural network models determined after N-1 iterations of the initial neural network model, and N is a positive integer greater than or equal to 2; storing the first parameters in at least one blockchain node; receiving a second request message from a first data provider, where the second request message is used for requesting the first parameters of the plurality of data providers, and the first data provider is one of the plurality of data providers; and, in response to the second request message, sending the first parameters of the plurality of data providers to the first data provider.
With reference to the second aspect, in a possible implementation manner, after receiving the second request message from the first data provider, the method further includes: determining the number of data providers connected to the blockchain system; determining whether the number of received first parameters is equal to the number of data providers; if so, sending the first parameters of the plurality of data providers to the first data provider; and if not, sending a second parameter to the first data provider, where the second parameter is a model parameter of the neural network model determined after N-2 iterations of the initial neural network model.
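The fallback between the round-(N-1) and round-(N-2) parameters can be illustrated with the small sketch below, where the per-round dictionary standing in for the blockchain ledger is an assumption made purely for illustration.

```python
# Hand out round-(N-1) parameters only once every connected data providing end has
# submitted them; otherwise fall back to the round-(N-2) parameters.
def params_for_provider(ledger, round_n, num_connected_providers):
    current = ledger.get(round_n - 1, [])          # "first parameters" (round N-1)
    if len(current) == num_connected_providers:
        return current                             # all providers reported: send round N-1
    return ledger.get(round_n - 2, [])             # otherwise: send round N-2 ("second parameter")

ledger = {1: ["p_a", "p_b", "p_c"], 2: ["p_a"]}    # toy per-round submissions
print(params_for_provider(ledger, round_n=3, num_connected_providers=3))  # falls back to round 1
```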
In a third aspect, a data retrieval method is provided, which is applied to a data server and includes: receiving data to be retrieved from a plurality of data providers; receiving first indication information from the blockchain system, where the first indication information is used for instructing the data server to send the data to be retrieved corresponding to K file index vectors to the data using end, and K is a positive integer; in response to the first indication information, matching, among the data to be retrieved of the plurality of data providers, the data to be retrieved corresponding to the K file index vectors; and sending the data to be retrieved corresponding to the K file index vectors to the data using end.
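On the data server side, a simple way to realise this matching is to store the (encrypted) data to be retrieved under the same identifiers as the corresponding file index vectors, as in the following sketch; keying the first indication information by such identifiers is an assumption, since the disclosure does not fix the matching mechanism.

```python
# Sketch of the data server: encrypted documents stored under the identifiers that the
# blockchain system references in the first indication information.
encrypted_store = {
    "doc-001": b"<ciphertext of document 1>",
    "doc-002": b"<ciphertext of document 2>",
    "doc-003": b"<ciphertext of document 3>",
}

def handle_first_indication(indicated_ids):
    """Return the stored (encrypted) data to be retrieved for the K indicated vectors."""
    return {doc_id: encrypted_store[doc_id] for doc_id in indicated_ids if doc_id in encrypted_store}

payload_for_consumer = handle_first_indication(["doc-003", "doc-001"])
```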
In a fourth aspect, a data retrieval method is provided, which is applied to a data using end and includes: sending a retrieval request to the blockchain system, so that a blockchain node determines the K file index vectors with the highest similarity to the retrieval vector of the retrieval request and sends first indication information to the data server, where the first indication information is used for instructing the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end, and K is a positive integer; and receiving the data to be retrieved corresponding to the K file index vectors, sent by the data server in response to the first indication information.
With reference to the fourth aspect, in a possible implementation manner, the data to be retrieved corresponding to the K file index vectors is data that has been encrypted by the first data provider according to its private key, and the method further includes: sending a first request message to the first data provider, where the first data provider is the data provider, among the plurality of data providers, that holds the data to be retrieved corresponding to the K file index vectors, and the first request message is used for requesting the private key of the first data provider; receiving a first response message sent by the first data provider, where the first response message includes the private key of the first data provider; and decrypting the data to be retrieved corresponding to the K file index vectors according to the private key.
In a fifth aspect, a data retrieval apparatus is provided, which is applied to a first data provider, and includes: a communication unit and a processing unit; the communication unit is used for sending data to be retrieved to the data server; the processing unit is used for inputting the data to be retrieved into a preset neural network model and determining a file index vector corresponding to the data to be retrieved; the communication unit is further used for sending the file index vector to the block chain system.
With reference to the fifth aspect, in a possible implementation manner, the first data provider is one of a plurality of data providers, and the processing unit is further configured to perform the following step 1 to step 6. Step 1: obtain a first model parameter; at the first iteration, the first model parameter is an initial model parameter; at the N-th iteration, the first model parameter is a model parameter determined according to the first parameters of the plurality of data providers acquired from the blockchain system, where the first parameters are the model parameters of the neural network models determined after N-1 iterations of the initial neural network model, and N is a positive integer greater than or equal to 2. Step 2: adjust the model parameters of a first neural network model according to the first model parameter to determine a second neural network model; at the first iteration, the first neural network model is the initial neural network model; at the N-th iteration, the first neural network model is the neural network model obtained after N-1 iterations of the initial neural network model. Step 3: train the second neural network model according to training data to determine a third neural network model. Step 4: determine whether the third neural network model meets a preset condition. Step 5: if so, determine the third neural network model as the preset neural network model. Step 6: if not, take the parameters of the third neural network model as the first parameters and send them to the blockchain system. Step 1 to step 6 are executed iteratively until the determined third neural network model meets the preset condition.
With reference to the fifth aspect, in a possible implementation manner, the processing unit is further configured to determine a private key of the first data provider; the processing unit is also used for encrypting the data to be retrieved according to the private key; and the communication unit is also used for sending the encrypted data to be retrieved to the data server.
With reference to the fifth aspect, in a possible implementation manner, the communication unit is further configured to receive a first request message from the data consumer; the first request message is used for requesting to acquire a private key of the first data providing end; the communication unit is also used for sending a first response message to the data using end; the first response message includes a private key of the first data provider.
With reference to the fifth aspect, in a possible implementation manner, the processing unit is further configured to encrypt the file index vector according to a private key; and the communication unit is also used for sending the encrypted file index vector to the block chain system.
In a sixth aspect, a data retrieving apparatus is provided, which is applied to a block chain system, and the apparatus includes: a communication unit and a processing unit; the communication unit is used for receiving and storing file index vectors from a plurality of data providing terminals; the file index vector corresponds to the data to be retrieved; the communication unit is also used for receiving a retrieval request from the data using end, and the processing unit is used for generating a retrieval vector according to the retrieval request; the processing unit is also used for matching K file index vectors with the highest similarity with the retrieval vector in the file index vectors, wherein K is a positive integer; and the communication unit is also used for sending first indication information to the data server, wherein the first indication information is used for indicating the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end.
With reference to the sixth aspect, in a possible implementation manner, the communication unit is further configured to: receiving first parameters from a plurality of data providing ends, wherein the first parameters are model parameters of the neural network model determined after N-1 iterations are carried out on the initial neural network model; n is a positive integer greater than or equal to 2; storing the first parameter in at least one blockchain node; receiving a second request message from the first data provider; the second request message is used for requesting the first parameters of the plurality of data providing terminals; the first data providing end is one of the at least one data providing end; and responding to the second request message, and sending the first parameters of the plurality of data providers to the first data provider.
With reference to the sixth aspect, in a possible implementation manner, the processing unit is further configured to determine the number of data providers connected to the blockchain system and to determine whether the number of received first parameters is equal to the number of data providers; if so, the communication unit is further configured to send the first parameters of the plurality of data providers to the first data provider; if not, the communication unit is further configured to send a second parameter to the first data provider, where the second parameter is a model parameter of the neural network model determined after N-2 iterations of the initial neural network model.
In a seventh aspect, a data retrieval apparatus is provided, which is applied to a data server, and includes: a communication unit and a processing unit; the communication unit is used for receiving data to be retrieved from a plurality of data providing terminals; the communication unit is also used for receiving first indication information from the block chain system; the first indication information is used for indicating the data server to send data to be retrieved corresponding to K file index vectors to the data using end, and K is a positive integer; the processing unit is also used for responding to the first indication information and matching the data to be retrieved corresponding to the K file index vectors in the data to be retrieved of the plurality of data providing ends; and the communication unit is also used for sending the data to be retrieved corresponding to the K file index vectors to the data using end.
In an eighth aspect, a data retrieval device is provided, which is applied to a data using end and includes: a communication unit and a processing unit. The processing unit is configured to instruct the communication unit to send a retrieval request to the blockchain system, so that a blockchain node determines the K file index vectors with the highest similarity to the retrieval vector of the retrieval request and sends first indication information to the data server, where the first indication information is used for instructing the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end, and K is a positive integer. The processing unit is further configured to instruct the communication unit to receive the data to be retrieved corresponding to the K file index vectors, sent by the data server in response to the first indication information.
With reference to the eighth aspect, in a possible implementation manner, the processing unit is configured to instruct the communication unit to send a first request message to a first data provider, where the first data provider is the data provider, among the plurality of data providers, that holds the data to be retrieved corresponding to the K file index vectors, and the first request message is used for requesting the private key of the first data provider; the processing unit is further configured to instruct the communication unit to receive a first response message sent by the first data provider, where the first response message includes the private key of the first data provider; and the processing unit is further configured to decrypt the data to be retrieved corresponding to the K file index vectors according to the private key.
In a ninth aspect, the present disclosure provides a data retrieval system, comprising: the data retrieval apparatus described in the fifth aspect or any possible implementation manner of the fifth aspect; the data retrieval apparatus described in the sixth aspect or any possible implementation manner of the sixth aspect; the data retrieval apparatus described in the seventh aspect or any possible implementation manner of the seventh aspect; and the data retrieval apparatus described in the eighth aspect or any possible implementation manner of the eighth aspect.
In a tenth aspect, the present disclosure provides a data retrieval apparatus comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions, and when the data retrieval device is running, the processor executes the computer-executable instructions stored by the memory to cause the data retrieval device to perform the data retrieval method as described in the first aspect and any one of the possible implementations of the first aspect.
In an eleventh aspect, the present disclosure provides a data retrieval apparatus comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions, and when the data retrieval device is running, the processor executes the computer-executable instructions stored by the memory to cause the data retrieval device to perform the data retrieval method as described in the second aspect and any possible implementation manner of the second aspect.
In a twelfth aspect, the present disclosure provides a data retrieval apparatus comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions, and when the data retrieval apparatus is running, the processor executes the computer-executable instructions stored by the memory to cause the data retrieval apparatus to perform the data retrieval method as described in the third aspect and any one of the possible implementation manners of the third aspect.
In a thirteenth aspect, the present disclosure provides a data retrieval apparatus comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions, and when the data retrieval device is running, the processor executes the computer-executable instructions stored by the memory to cause the data retrieval device to perform the data retrieval method as described in any one of the possible implementations of the fourth aspect and the fourth aspect.
In a fourteenth aspect, the present disclosure provides a computer-readable storage medium having instructions stored therein, which when executed by a processor of a data retrieval apparatus, enable the data retrieval apparatus to perform the data retrieval method as described in the first aspect and any one of the possible implementations of the first aspect.
In a fifteenth aspect, the present disclosure provides a computer-readable storage medium having stored therein instructions, which, when executed by a processor of a data retrieval device, enable the data retrieval device to perform a data retrieval method as described in the second aspect and any one of the possible implementations of the second aspect.
In a sixteenth aspect, the present disclosure provides a computer-readable storage medium having stored therein instructions that, when executed by a processor of a data retrieval apparatus, enable the data retrieval apparatus to perform a data retrieval method as described in the third aspect and any one of the possible implementations of the third aspect.
In a seventeenth aspect, the present disclosure provides a computer-readable storage medium having stored therein instructions that, when executed by a processor of a data retrieval device, enable the data retrieval device to perform the data retrieval method as described in the fourth aspect and any one of its possible implementations.
In the present disclosure, the names of the above data retrieval devices do not limit the devices or functional modules themselves, and in actual implementation, the devices or functional modules may appear by other names. Insofar as the functions of the respective devices or functional modules are similar to those of the present disclosure, they are within the scope of the claims of the present disclosure and their equivalents.
These and other aspects of the disclosure will be more readily apparent from the following description.
The technical solutions provided by the disclosure bring at least the following beneficial effects. The application provides a data retrieval method and a data providing end: after determining the data to be retrieved that needs to be sent to the data server, the data providing end processes the data with a preset neural network model to obtain the file index vector of the data to be retrieved. The data providing end then sends the data to be retrieved to the data server and sends the file index vector of the data to be retrieved to the blockchain system. Therefore, when the data using end needs to retrieve data, it can send a retrieval request to the blockchain system; the blockchain system matches, according to the retrieval vector of the retrieval request, the file index vectors most relevant to the request and instructs the data server to send the corresponding data to be retrieved to the data using end.
Based on this method, the data to be retrieved is processed by the neural network model directly at the data providing end, the resulting file index vector is sent to the blockchain system where it cannot easily be tampered with, and the data to be retrieved is sent to the data server, which provides it to the data using end. In this way the data can be processed by a neural network model at the data providing end, which alleviates the inaccuracy of retrieval results obtained with a vector space model in the related art. In addition, storing the index in the blockchain system prevents it from being tampered with by outside parties, which further improves retrieval accuracy.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of a data retrieval device provided in the present disclosure;
FIG. 2 is a schematic diagram of a data retrieval system according to the present disclosure;
fig. 3 is a schematic diagram of a unit module in a first data provider and an internal flow for performing model training and data processing according to the present disclosure;
fig. 4 is a schematic diagram of a unit module of a data using end and an internal flow for data decryption provided by the present disclosure;
fig. 5 is a schematic diagram of the unit modules of the blockchain system and the internal flow for storing data on the chain according to the present disclosure;
fig. 6 is a schematic diagram of a unit module of a data server and an internal flow for storing and sending data to be retrieved according to the present disclosure;
FIG. 7 is a schematic flow chart diagram illustrating a data retrieval method provided by the present disclosure;
FIG. 8 is a schematic flow chart diagram illustrating yet another data retrieval method provided by the present disclosure;
fig. 9 is a schematic structural diagram of a data retrieval device applied to a first data provider according to the present disclosure;
FIG. 10 is a schematic diagram illustrating a data retrieval device applied to a blockchain system according to the present disclosure;
fig. 11 is a schematic structural diagram of a data retrieval device applied to a data server according to the present disclosure;
fig. 12 is a schematic structural diagram of a data retrieval device applied to a data using end according to the present disclosure.
Detailed Description
The following describes the data retrieval method, apparatus, and storage medium provided by the embodiments of the present disclosure in detail with reference to the accompanying drawings.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The terms "first" and "second" and the like in the specification and drawings of the present disclosure are used for distinguishing different objects or for distinguishing different processes for the same object, and are not used for describing a specific order of the objects.
Furthermore, the terms "including" and "having," and any variations thereof, mentioned in the description of the present disclosure, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It is noted that in the embodiments of the present disclosure, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "e.g.," in an embodiment of the present disclosure is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
Fig. 1 is a schematic structural diagram of a data retrieval device according to an embodiment of the present disclosure. As shown in fig. 1, the data retrieval device 100 includes at least one processor 101, a communication line 102, and at least one communication interface 104, and may also include a memory 103. The processor 101, the memory 103 and the communication interface 104 may be connected via a communication line 102.
The processor 101 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure, such as: one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs).
The communication link 102 may include a path for transmitting information between the aforementioned components.
The communication interface 104 is used for communicating with other devices or a communication network such as Ethernet, a Radio Access Network (RAN) or a Wireless Local Area Network (WLAN), and may use any transceiver-like device.
The memory 103 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disk read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to include or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In a possible design, the memory 103 may exist independently of the processor 101, that is, the memory 103 may be a memory external to the processor 101. In this case, the memory 103 may be connected to the processor 101 through the communication line 102 and used for storing execution instructions or application program code, which are executed under the control of the processor 101 to implement the data retrieval method provided by the following embodiments of the present disclosure. In yet another possible design, the memory 103 may be integrated with the processor 101, that is, the memory 103 may be an internal memory of the processor 101; for example, the memory 103 is a cache memory and may be used for temporarily storing some data and instruction information.
As one implementation, processor 101 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1. As another implementation, the data retrieval device 100 may include multiple processors, such as the processor 101 and the processor 107 of FIG. 1. As yet another implementation, the data retrieval apparatus 100 may further include an output device 105 and an input device 106.
Through the description of the foregoing embodiments, it may be clearly understood by those skilled in the art that, for convenience and simplicity of description, only the division of each functional module is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the network node is divided into different functional modules, so as to complete all or part of the functions described above. For the specific working processes of the system, the module and the network node described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
Hereinafter, terms related to the present application will be explained.
1. Block chain
Blockchain technology is the core supporting technology of digital cryptocurrency systems represented by Bitcoin. A blockchain has the characteristics of decentralization, flexibility and security.
A blockchain typically includes a plurality of blockchain nodes, and all blockchain nodes jointly maintain the data written into the blockchain ledger. Because writing, deleting or modifying data on a blockchain node requires the consensus of all blockchain nodes before it takes effect, a blockchain is a distributed database whose data is difficult to tamper with.
The distributed nature of a blockchain is embodied not only in the distributed storage of data but also in the distributed recording of data; distributed recording means that the participants of the blockchain system jointly maintain the recorded data. The data stored in a blockchain is organized into blocks, and blockchain nodes connect the blocks into a data chain according to the order in which the blocks were generated.
From a technical point of view, a blockchain is not a single technique but the integration of multiple techniques. These technologies are combined in a new structure to form a new way of recording, storing and expressing data, namely the blockchain.
Generally, a blockchain has at least the following three features, which are respectively: non-tamperproof, decentralized, and smart contracts.
Non-tampering means that blockchain technology can trace the whole process of data being put on the chain: every transfer on the chain is recorded and permanently stored by the blockchain nodes. Because the data is shared by all blockchain nodes and a modification made to the database on any single node is invalid, the blockchain can ensure the stability and reliability of the data and reduce the risk of it being tampered with.
Decentralization means that the blockchain maintains the shared platform collectively in a decentralized way; each node can directly acquire information within its authority according to its own needs, without an intermediate platform for information transmission, which reduces dependence on third parties and avoids risks such as the breach or failure of a centralized third-party platform.
Smart contracts can be used to reduce default events, particularly in domains whose business can be expressed as data.
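The non-tampering property can be illustrated with a minimal hash-chained ledger: each block records the hash of its predecessor, so altering any stored payload (for example a file index vector) invalidates the stored hashes. The field names and the SHA-256/JSON encoding below are illustrative assumptions, not the structure mandated by the disclosure.

```python
# Minimal sketch of a tamper-evident chain of blocks.
import hashlib, json

def block_hash(block):
    body = {"prev_hash": block["prev_hash"], "payload": block["payload"]}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(prev_hash, payload):
    block = {"prev_hash": prev_hash, "payload": payload}
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain):
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):                     # contents were altered
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:   # link to predecessor broken
            return False
    return True

genesis = make_block("0" * 64, {"file_index_vector": [0.1, 0.9]})
chain = [genesis, make_block(genesis["hash"], {"file_index_vector": [0.4, 0.2]})]
print(chain_is_valid(chain))                              # True
genesis["payload"]["file_index_vector"] = [9.9, 9.9]      # tamper with an on-chain record
print(chain_is_valid(chain))                              # False
```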
The block chain can be divided into the following three types according to the application type, which are respectively: public, private, and federation chains, each described below.
A public chain is a type of blockchain without an official organization, management organization or central server. Nodes participating in the public chain can freely join and leave the network according to the system rules without being controlled by any party, and they work based on a consensus mechanism. All nodes in the blockchain can read and send transactions and take part in their valid confirmation, and any node can participate in the consensus process. In a public chain, the security, transparency and tamper-resistance of the whole blockchain are jointly maintained by all nodes through cryptographic techniques and consensus mechanisms such as Proof of Work (PoW) and Proof of Stake (PoS). Typical applications of public chains include Bitcoin and Ethereum.
A private chain is a blockchain established within an internal network (for example, an enterprise intranet). The operating rules of the blockchain system are set according to requirements, and only a few nodes have permissions such as modification and reading. A private chain retains the authenticity and partially decentralized nature of a blockchain. Typical applications of private chains include Eris Industries and Overstock.
A federation chain is a blockchain between a public chain and a private chain that is jointly initiated by several enterprises, and it has a partially decentralized character. Each organization operates one or more nodes whose data can only be read, written and transacted by the different organizations within the blockchain, which together record the transaction data. Typical applications of federation chains include Hyperledger and the R3 blockchain consortium (R3 CEV).
When the development history of the blockchain is divided from the application point of view, the blockchain is generally divided into a blockchain 1.0 stage to a blockchain 3.0 stage.
Specifically, the earliest phase, in which the blockchain was applied to programmable digital currencies represented by Bitcoin, is referred to as the blockchain 1.0 phase. The blockchain technology then expanded into the financial field, mainly as an ownership registration and authentication system in which market trading and commercial credit activities such as stocks, bonds, foreign exchange and insurance are realized through smart contracts; this stage is called the blockchain 2.0 phase. On that basis, the application in the financial field of tokens, distributed ledgers, data-layer blockchains and their combination with new technologies such as artificial intelligence was further emphasized; this stage is also called the blockchain 2.5 phase. Currently, blockchains have been applied to fields other than finance, such as government, science, art, health and communications, through authentication-based services built on credit consensus; this is referred to as the blockchain 3.0 phase.
2. Federated learning
Federated learning is a distributed machine learning technology that can ensure the security of data and solve the problem of data islands.
In the development of machine learning technology, two problems have always had to be faced: how to ensure that private data is not leaked, and how to overcome the data islands created by network security isolation and industry privacy barriers between different industries.
To solve the problems of data security and data islands, the related art proposes federated learning, which can be applied to a federated learning system mainly composed of a plurality of clients and a central server. Each client provides data. The central server sends a machine learning model to each client; after receiving the model, each client trains it on its own local data and sends the trained parameters back to the central server. The central server collects and integrates the parameters of all clients and then sends the integrated parameters back to each client so that the clients can carry out the next training iteration; each client continues iterative training according to the integrated parameters until the trained model meets the required conditions, at which point the iterative training stops.
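The central server's "collect and integrate" step is often realised as FedAvg-style parameter averaging; the sketch below assumes a plain mean over the clients' parameter vectors, which is one common choice rather than the only possible integration rule.

```python
# Hedged sketch of the server-side integration step in federated learning.
import numpy as np

client_params = {
    "hospital_a": np.array([0.10, 0.52, -0.33]),
    "hospital_b": np.array([0.08, 0.47, -0.29]),
    "hospital_c": np.array([0.12, 0.55, -0.35]),
}

integrated = np.mean(np.stack(list(client_params.values())), axis=0)
# Each client would start the next training iteration from `integrated`
# instead of exchanging any raw training data.
print(integrated)
```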
According to the characteristic information X of the data sets of the clients participating in training, federated learning is divided into horizontal federated learning, vertical federated learning and federated transfer learning, described below.
Horizontal federated learning handles data in which the data set characteristics X and label information are the same but the sample identities (IDs) are different. For example, the medical record fields and characteristic data of different hospitals are generally the same (that is, the data set characteristics X and label information are the same), but the users of different hospitals are different (that is, the sample IDs are different).
Vertical federated learning handles data in which the sample IDs are the same but the characteristics X of each data set are different. For example, a bank cooperates with an e-commerce platform to determine a user's credit standing: the same group of users has the same sample IDs, but because the businesses are different, the user characteristics held by the bank differ from those held by the e-commerce platform.
Federated transfer learning handles data in which both the sample IDs and the data characteristics X are different, and is used to solve the problems of few labelled samples and insufficient data sets. For example, data migration between Chinese e-commerce platforms and government information systems is difficult to realize directly, and federated transfer learning can alleviate this difficulty.
3. Ciphertext retrieval
In the current data processing field, as the volume of data keeps growing, large-scale data is transferred to untrusted clouds for processing, and authorized users share and retrieve the data at the cloud. To ensure the security of the data, ciphertext retrieval schemes have been proposed: the encrypted data is uploaded to the cloud, and users share and retrieve the encrypted data, thereby improving data security.
In the related art, ciphertext retrieval techniques include symmetric searchable encryption schemes based on ciphertext scanning, attribute-based ciphertext retrieval schemes (based on attribute-based encryption, ABE), and the like.
Since the encryption key and the decryption key of the symmetric encryption are the same, the key leakage is easily caused by the transmission of the symmetric encryption key. The attribute-based ciphertext retrieval scheme solves the problem of secret key leakage caused by symmetric encryption key transmission, and realizes fine-grained access control of encrypted data, namely, a data owner can specify a person with a characteristic attribute to access the encrypted data.
In addition, the related art also proposes that a vector space model is adopted to vectorize the file and the retrieval request, and a secure nearest neighbor (KNN) method is used to encrypt the vector to construct an encryption index with linear structural features. The technical scheme of ciphertext retrieval is realized by calculating the relevancy score of the file and the retrieval request. In addition, in the related art, a ciphertext retrieval scheme aiming at fuzzy search and semantic retrieval also exists, which expands the retrieval keywords and solves the problem of ciphertext retrieval under the conditions of spelling errors, synonyms and the like.
In addition to the above features, the ciphertext retrieval scheme may be combined with other technologies to perform function expansion according to different use backgrounds and requirements. For example, a ciphertext search scheme based on WordNet search term expansion and a ciphertext search scheme based on user interests. Or a specific data structure is used to accelerate the retrieval speed, such as a B + tree-based ciphertext retrieval scheme and an inverted index-based ciphertext retrieval scheme.
In the scheme that vectorizes the files and the retrieval request with a vector space model, encrypts the vectors with a secure KNN method to construct an encrypted index with a linear structure, and realizes ciphertext retrieval by computing relevance scores between files and the retrieval request, the file and retrieval-request vectors are generally produced with a vector space model such as term frequency-inverse document frequency (TF-IDF), and the relevance score of a file is computed based on the distribution of the query terms. Because the file index vector of the data to be retrieved is determined with a vector space model, the accuracy of the resulting vectors is low, and therefore the accuracy of the final retrieval result determined by the data server is also low.
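For comparison, the vector space baseline discussed above can be sketched as follows with TF-IDF vectors and cosine-similarity relevance scores; scikit-learn is used here purely as an illustrative implementation.

```python
# Sketch of the TF-IDF vector-space baseline: vectorize files and query, rank by similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

files = [
    "monthly traffic statistics for the core network",
    "customer complaint records and handling results",
    "core network fault logs and traffic anomalies",
]
query = ["core network traffic"]

vectorizer = TfidfVectorizer()
file_vectors = vectorizer.fit_transform(files)          # file index vectors
query_vector = vectorizer.transform(query)              # retrieval vector

scores = cosine_similarity(query_vector, file_vectors)[0]
ranked = scores.argsort()[::-1]                          # most relevant file first
```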
With the development of deep neural networks in the field of natural language processing, data can be trained through a neural network model in the field of plaintext retrieval at present, and the accuracy of retrieval matching is improved. However, in ciphertext retrieval, data is generally encrypted and then uploaded to the cloud, and at this time, the cloud cannot directly train a neural network model for a ciphertext because the cloud cannot determine specific content of the ciphertext.
To solve this technical problem, the application provides a data retrieval method and a data providing end: after determining the data to be retrieved that needs to be sent to the data server, the data providing end processes the data with a preset neural network model to obtain the file index vector of the data to be retrieved. The data providing end then sends the data to be retrieved to the data server and sends the file index vector of the data to be retrieved to the blockchain system. Therefore, when the data using end needs to retrieve data, it can send a retrieval request to the blockchain system; the blockchain system matches, according to the retrieval vector of the retrieval request, the file index vectors most relevant to the request and instructs the data server to send the corresponding data to be retrieved to the data using end.
Based on this method, the data to be retrieved is processed by the neural network model directly at the data providing end, the resulting file index vector is sent to the blockchain system where it cannot easily be tampered with, and the data to be retrieved is sent to the data server, which provides it to the data using end. In this way the data can be processed by a neural network model at the data providing end, which alleviates the inaccuracy of retrieval results obtained with a vector space model in the related art. In addition, storing the index in the blockchain system prevents it from being tampered with by outside parties, which further improves retrieval accuracy.
The data retrieval method provided by the present application can be applied to the data retrieval system 20 shown in fig. 2, and as shown in fig. 2, the data retrieval system 20 includes: a plurality of data providing terminals 201, data using terminals 202, a block chain system 203 and a data server 204.
Wherein, the first data provider 201 is configured to: sending data to be retrieved to the data server 204; inputting the data to be retrieved into a preset neural network model, and determining a file index vector corresponding to the data to be retrieved; the file index vector is sent to the blockchain system 203. The first data provider 201 is one of the plurality of data providers 201.
A data consumer 202 configured to: a search request is sent to the blockchain system 203.
A blockchain system 203 configured to: receive the file index vectors from the data providers 201 and store them on the chain; receive a retrieval request from the data consumer 202 and generate a retrieval vector according to the retrieval request; match, among the stored file index vectors, the K file index vectors with the highest similarity to the retrieval vector; and send first indication information to the data server 204, where the first indication information is used to instruct the data server 204 to send the data to be retrieved corresponding to the K file index vectors to the data consumer 202.
A data server 204 configured to: receiving data to be retrieved from a plurality of data providing terminals 201; receiving a first indication from the blockchain system 203; the first indication information is used for indicating the data server 204 to send the data to be retrieved corresponding to the K file index vectors to the data consumer 202; and in response to the first indication information, matching the data to be retrieved corresponding to the K file index vectors in the data to be retrieved of the multiple data providers 201.
A data consumer 202 configured to: and receiving data to be retrieved corresponding to the K file index vectors from the data server 204.
This scheme has at least the following beneficial effects. The application provides a data retrieval method and a data providing end: after determining the data to be retrieved that needs to be sent to the data server, the data providing end processes the data with a preset neural network model to obtain the file index vector of the data to be retrieved. The data providing end then sends the data to be retrieved to the data server and sends the file index vector of the data to be retrieved to the blockchain system. Therefore, when the data using end needs to retrieve data, it can send a retrieval request to the blockchain system; the blockchain system matches, according to the retrieval vector of the retrieval request, the file index vectors most relevant to the request and instructs the data server to send the corresponding data to be retrieved to the data using end.
Based on this method, the data to be retrieved is processed by the neural network model directly at the data providing end, the resulting file index vector is sent to the blockchain system where it cannot easily be tampered with, and the data to be retrieved is sent to the data server, which provides it to the data using end. In this way the data can be processed by a neural network model at the data providing end, which alleviates the inaccuracy of retrieval results obtained with a vector space model in the related art. In addition, storing the index in the blockchain system prevents it from being tampered with by outside parties, which further improves retrieval accuracy.
In a possible implementation manner, the data provider 201 is further configured to: the following steps 1-6 are performed iteratively.
Step 1, obtaining a first model parameter; at the first iteration, the first model parameter is an initial model parameter; at the N-th iteration, the first model parameter is a model parameter determined according to the first parameters of the plurality of data providers 201 acquired from the blockchain system 203; the first parameters are the model parameters of the neural network models determined after N-1 iterations of the initial neural network model; N is a positive integer greater than or equal to 2.
Step 2, adjusting model parameters of the first neural network model according to the first model parameters, and determining a second neural network model; at a first iteration, the first neural network model is the initial neural network model; and in the Nth iteration, the first neural network model is a neural network model determined after the initial neural network model is iterated for N-1 times.
And 3, training the second neural network model according to training data, and determining a third neural network model.
And 4, determining whether the third neural network model meets a preset condition.
And 5, if yes, determining the third neural network model as the preset neural network model.
Step 6, if not, taking the parameters of the third neural network model as the first parameters and sending them to the blockchain system 203; and iteratively executing step 1 to step 6 until the determined third neural network model meets the preset condition.
A blockchain system 203 further configured to: receiving first parameters from the data providing terminals 201, wherein the first parameters are model parameters of the neural network model determined after N-1 iterations of the initial neural network model; n is a positive integer greater than or equal to 2.
Storing the first parameter into at least one blockchain node.
Receiving a second request message from the first data provider 201; the second request message is used to request the first parameters of the plurality of data providers 201.
In response to the second request message, the first parameters of the plurality of data providers 201 are sent to the first data provider 201.
Based on the method, the neural network model is trained among the data providing ends in a federated learning manner, so that the model parameters obtained by each data providing end are consistent, and the file index vectors that the data providing ends derive from the neural network model therefore have better consistency. In addition, each data providing end sends its model parameters to the blockchain system during every training iteration; the blockchain system aggregates and stores them and returns them to the first data providing end, which prevents the model parameters from being tampered with while the data providing ends exchange them and improves the accuracy of model training.
In a possible implementation manner, the data provider 201 is further configured to: and determining a private key of the first data provider, encrypting the data to be retrieved according to the private key, and sending the encrypted data to be retrieved to the data server.
A data server 204, further configured to: receiving and storing the encrypted data to be retrieved; after receiving the first indication information, sending the encrypted data to be retrieved corresponding to the K file index vectors to the data using end 202.
The data consumer 202 is further configured to: after receiving the encrypted data to be retrieved corresponding to the K file index vectors, send a first request message to the data provider 201, where the first request message is used to request the private key of the first data provider.
The data provider 201, further configured to: receiving a first request message, and sending a first response message to the data using end; the first response message includes a private key of the first data provider.
The data consumer 202 is further configured to: and receiving a first response message, and decrypting the encrypted data to be retrieved corresponding to the K file index vectors according to the private key.
Based on the scheme, the data to be retrieved is transmitted among the data providing end, the data server and the data using end in an encryption mode, the problem of data leakage caused by plaintext transmission in the transmission process can be reduced, and meanwhile, the scheme based on the encryption transmission enables the application to be more conveniently applied to the ciphertext retrieval process.
In a possible implementation manner, the data provider 201 is further configured to: encrypting the file index vector according to the private key; and sending the encrypted file index vector to the block chain system.
A blockchain system 203 further configured to: and receiving the encrypted file index vector, and uplink-storing the encrypted file index vector.
Based on the technical scheme, the file index vector is transmitted between the data providing end and the block chain system in an encryption mode, the problem of file index vector leakage caused by plaintext transmission in the transmission process can be reduced, and meanwhile, the method and the device can be more conveniently applied to the ciphertext retrieval process based on the scheme of encryption transmission.
In the above, the description is given of each device in the data retrieval system and the interaction flow between the devices provided in the embodiment of the present application.
Hereinafter, a unit module in each device involved in the above search system and an internal data processing flow will be described.
Fig. 3 is a schematic diagram of the unit modules in the first data provider 201 and the internal flow of performing model training and data processing. As shown in fig. 3, the first data providing terminal 201 includes an initial neural network model 2011, a storage unit 2012, a model training module 2013, a KNN module 2014 and a Rivest-Shamir-Adleman (RSA) module 2015. The storage unit 2012 stores the data to be retrieved. The first data providing terminal 201 trains the initial neural network model through the model training module 2013 to obtain the preset neural network model, and inputs the data to be retrieved into the preset neural network model to obtain the file index vector. The first data providing terminal encrypts the model parameters, the file index vectors and the data to be retrieved through the KNN module 2014 and the RSA module 2015, then uploads the encrypted model parameters and file index vectors to the blockchain system and uploads the encrypted data to be retrieved to the data server. In addition, during each iteration the first data providing end interacts with the blockchain to obtain the model parameters of all data providing ends from the previous round, and uploads its own model parameters after the iteration is completed.
Fig. 4 is a schematic diagram of the unit modules of the data consumer 202 and the internal flow of data decryption provided in the present application. As shown in fig. 4, the data consumer 202 includes an RSA module 2021 and a storage module 2022. The data using end interacts with the data server to obtain the encrypted data to be retrieved, and interacts with the data providing end to obtain the public key of the data providing end. The storage module 2022 stores the encrypted data to be retrieved and the public key of the data provider. The RSA module 2021 decrypts the data to be retrieved according to the public key of the data provider and determines the plaintext of the data to be retrieved.
Fig. 5 is a schematic diagram of the unit modules of the blockchain system 203 and the internal flow of performing uplink data storage according to the present application. As shown in fig. 5, the blockchain system includes M blockchain nodes. The blockchain nodes are used to chain and store the data (model parameters, user information, file index vectors and the like) reported by the data providing ends after each iteration is completed, and to update this data in subsequent iterations. Specifically, after the Nth iteration is completed, each blockchain node chains and stores the model parameters determined by the data providing end in the Nth iteration and the user information of the data providing end, and updates the file index vector summary table according to the file index vectors provided by the data providing end after the Nth iteration (if this is the first iteration, the file index vector summary table is generated directly from the file index vectors provided by the data providing end). After the (N+1)th iteration is completed, each blockchain node likewise chains and stores the model parameters determined after the (N+1)th iteration and the user information of the data providing end, and updates the file index vector summary table according to the file index vectors provided after the (N+1)th iteration.
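As a non-limiting sketch of the per-round record that each blockchain node keeps according to the description above, the structure below groups the model parameters, user information and file index vectors reported after a round. The class and field names are assumptions for illustration only.

```python
# Illustrative record stored by a blockchain node after round N completes.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class RoundRecord:
    round_number: int
    model_params: Dict[str, list]             # provider id -> model parameters
    user_info: Dict[str, dict]                # provider id -> user information
    file_index_vectors: Dict[str, List[float]] = field(default_factory=dict)

record = RoundRecord(round_number=1,
                     model_params={"provider-A": [[0.1, 0.2]]},
                     user_info={"provider-A": {"org": "example"}})
```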
Fig. 6 is a schematic diagram of the unit modules of the data server 204 and the internal flow of storing and sending the data to be retrieved according to the present application. As shown in fig. 6, the data server 204 includes a processing module 2041 and a storage module 2042. The data server receives the data to be retrieved from the data provider 201 and stores it in the storage module 2042. The data server further receives first indication information from a blockchain node; the processing module 2041 determines the data to be retrieved corresponding to the K file index vectors according to the first indication information, acquires that data from the storage module 2042, and sends it to the data consumer.
The unit modules in the respective devices and the internal data processing flow in the search system are described above.
Hereinafter, the data search method provided by the present application will be described in detail.
The embodiment of the application provides a data retrieval method, which can be applied to a data retrieval system shown in fig. 2. As shown in fig. 7, which is a schematic flow chart of a data retrieval method provided in an embodiment of the present application, the data retrieval method may specifically be implemented by the following steps:
step 700, the data providing terminal sends the data to be retrieved to the data server. Correspondingly, the data server receives the data to be retrieved from the data providing end.
It should be noted that the number of data providing terminals may be plural. Correspondingly, in step 700, the data to be retrieved may also be sent to the data server by the plurality of data providing terminals, respectively.
It can be understood that, when the application is applied to a scenario of ciphertext retrieval, the data to be retrieved, which is sent by the data providing end to the data server, may be encrypted data to be retrieved.
After receiving the data to be retrieved from the data providing end, the data server stores the data to be retrieved in the storage space of the data server.
Optionally, the data server in the present application may be a public cloud server.
Step 701, the data providing end inputs the data to be retrieved into a preset neural network model, and determines a file index vector corresponding to the data to be retrieved.
In a possible implementation manner, the preset neural network model is a pre-trained neural network model, an input parameter of the neural network model is data to be retrieved, and an output parameter is a file index vector corresponding to the data to be retrieved.
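The embodiment does not fix a network architecture for the preset neural network model. Purely as an illustrative stand-in for step 701, the sketch below maps text data to be retrieved to a fixed-length, unit-normalised file index vector using hashed bag-of-words features and a single dense layer; the dimensions and weights are assumptions and do not come from the application.

```python
# Illustrative stand-in for the preset neural network model in step 701.
import numpy as np

DIM_IN, DIM_OUT = 1024, 64
rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(DIM_IN, DIM_OUT))    # assumed pre-trained weights

def file_index_vector(text: str) -> np.ndarray:
    feats = np.zeros(DIM_IN)
    for token in text.lower().split():
        feats[hash(token) % DIM_IN] += 1.0          # hashed bag-of-words features
    vec = np.tanh(feats @ W)                        # single dense layer
    return vec / (np.linalg.norm(vec) + 1e-12)      # unit-length file index vector

vec = file_index_vector("driving information of vehicle A")
```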
Optionally, in the embodiment of the present application, when there are multiple data providing terminals, the multiple data providing terminals train the neural network model in a federated learning manner, so as to ensure that the preset neural network models used by the data providing terminals are the same, and further ensure consistency of the file index vectors determined by the data providing terminals according to the neural network model.
Step 702, the data providing end sends the file index vector to the blockchain system. Accordingly, the blockchain system receives the file index vector from the data provider.
In a possible implementation manner, in the case that there are multiple data providers, after receiving the file index vectors sent by the multiple data providers, the blockchain updates the file index vector summary table according to the file index vectors sent by the multiple data providers. And the block chain stores the updated file index vector summary list in a chain mode.
It can be understood that, when the application is applied to a scenario of ciphertext retrieval, the file index vector sent by the data providing end to the block chain system may be an encrypted file index vector.
Step 703, the data using end sends a search request to the blockchain system. Accordingly, the blockchain system receives a retrieval request from the data consumer.
Wherein the retrieval request is used to request retrieval of related data files. Optionally, the retrieval request may be a retrieval keyword entered by the user at the data using end through a first operation, or a piece of text for which retrieval is required.
For example, when the driving information of vehicle A needs to be retrieved, the user inputs "query the driving information of vehicle A" at the data using end through a first operation, and the data using end sends "query the driving information of vehicle A" to the blockchain system as the retrieval request.
It is to be understood that the first operation may be a first operation that is input through an input device (e.g., a keyboard, a touch screen), or may be a first operation that is input by determining a motion of a user through image recognition, or a first operation that is input by determining a voice of the user through voice recognition, or another first operation that can be used to input information, which is not limited in this application.
Step 704, the block chain system generates a search vector according to the search request.
In a possible implementation manner, the preset neural network model is also configured in the blockchain system; the blockchain system inputs the retrieval request into the preset neural network model, determines the output result of the preset neural network model, and takes that output result as the retrieval vector.
It can be understood that the preset neural network model in the block chain has the same model parameters as the preset neural network model at the data providing end.
Step 705, the blockchain system matches the K file index vectors with the highest similarity to the search vector in the file index vectors.
In a possible implementation manner, the block chain system compares the retrieval vector with the file index vectors in the file index vector summary table, and determines the similarity between the retrieval vector and each file index vector; and the block chain system sorts the file index vectors according to the sequence of the similarity from large to small, and selects the K file index vectors with the highest similarity according to the sorting.
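A minimal sketch of this matching step follows, assuming the stored file index vectors are gathered into a matrix and cosine similarity is used as the similarity measure; the embodiment only requires some similarity ranking, so the measure is an assumption.

```python
# Illustrative top-K matching for step 705.
import numpy as np

def top_k_matches(retrieval_vec, index_vectors, k):
    """index_vectors: (num_files, dim) array; returns indices and scores of the K best matches."""
    a = retrieval_vec / (np.linalg.norm(retrieval_vec) + 1e-12)
    b = index_vectors / (np.linalg.norm(index_vectors, axis=1, keepdims=True) + 1e-12)
    sims = b @ a                                # cosine similarity per stored vector
    order = np.argsort(-sims)                   # largest similarity first
    return order[:k], sims[order[:k]]

# e.g. ids, scores = top_k_matches(retrieval_vec, np.stack(list_of_index_vectors), k=5)
```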
Step 706, the blockchain system sends the first indication information to the data server. Accordingly, the data server receives the first indication information from the blockchain system.
The first indication information is used for indicating the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end.
In a possible implementation manner, the blockchain system generates the first indication information according to the K file index vectors and related information of the K file index vectors (such as the information of the data provider). The blockchain system then sends the first indication information to the data server.
Step 707, responding to the first indication information, the data server matches the data to be retrieved corresponding to the K file index vectors in the data to be retrieved of the multiple data providing terminals.
In a possible implementation manner, after receiving the first indication information, the data server determines the data providing ends of the K file index vectors. The data server determines the data providing terminals and obtains the data to be retrieved of the data providing terminals. And the data server searches a mapping relation table of the file index vectors and the data to be retrieved, and determines the data to be retrieved corresponding to the K file index vectors according to the mapping relation table.
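A minimal sketch of this lookup follows, assuming the mapping relation table is keyed by a file index vector identifier and stores the owning data providing end together with the stored (encrypted) data; the table layout and identifiers are hypothetical.

```python
# Illustrative mapping-table lookup at the data server (step 707).
mapping_table = {
    # file index vector id -> (provider id, stored ciphertext)
    "vec-001": ("provider-A", b"...encrypted file 1..."),
    "vec-007": ("provider-B", b"...encrypted file 7..."),
}

def resolve(indicated_vector_ids):
    """Return the stored data for the vector ids named in the first indication information."""
    return [mapping_table[vid] for vid in indicated_vector_ids if vid in mapping_table]

hits = resolve(["vec-001", "vec-007"])
```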
And 708, the data server sends the data to be retrieved corresponding to the K file index vectors to the data using end. Correspondingly, the data using end receives data to be retrieved corresponding to the K file index vectors from the data server.
The scheme at least has the following beneficial effects: the application provides a data retrieval method and a data providing end. After determining the data to be retrieved that needs to be sent to the data server, the data providing end processes the data to be retrieved through a preset neural network model to obtain a file index vector of the data to be retrieved. The data providing end then sends the data to be retrieved to the data server and sends the file index vector of the data to be retrieved to the blockchain system. Therefore, when the data using end needs to retrieve data, it can send a retrieval request to the blockchain system; the blockchain system matches, according to the retrieval vector of the retrieval request, the file index vectors most relevant to the request, and instructs the data server to send the data to be retrieved corresponding to those file index vectors to the data using end.
Based on the method, the file index vector is generated directly at the data providing end through the preset neural network model and is sent to the blockchain system, which prevents the file index vector from being tampered with by the outside; the data to be retrieved is sent to the data server, and the data server provides the data to be retrieved to the data using end. Because the file index vector is produced by a neural network model at the data providing end, the problem of inaccurate retrieval results caused by using a vector space model in the related art is solved. In addition, storing the index in the blockchain system prevents it from being tampered with by the outside, improving retrieval accuracy.
In a possible implementation manner, the data providing end may further encrypt the data to be retrieved and the file index vector with its own private key. Before step 702, the data providing end may also obtain the above-mentioned preset neural network model through training in a federated learning manner, which is described in detail below.
As shown in fig. 8, the data retrieval method provided in the embodiment of the present application may specifically be implemented by the following steps:
step 800, the data providing terminal encrypts the data to be retrieved and sends the encrypted data to be retrieved to the data server.
In one possible implementation, the initial neural network model is included in the data provider. And the data providing end carries out iterative training on the initial neural network model to determine the preset neural network model. After the first iteration is completed, the data providing end encrypts the data to be retrieved and uploads the encrypted data to the data server.
If new data to be retrieved is generated in the data providing end in a certain iteration process, after the iteration is completed, the data providing end encrypts the updated data to be retrieved and sends the encrypted data to the data server.
In a possible implementation manner, the data providing end may use an RSA encryption algorithm, and use its own private key to encrypt the data to be retrieved.
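The sketch below illustrates RSA encryption of the data to be retrieved using the widely used Python cryptography package. It is only an approximation of the step described here: standard RSA libraries encrypt with the public key and decrypt with the private key, whereas this embodiment describes encryption with the provider's own private key, so the key direction, payload size and padding choice in the sketch are illustrative assumptions; large files would normally be protected with hybrid (RSA plus symmetric) encryption.

```python
# Illustrative RSA encryption/decryption with the "cryptography" package.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

plaintext = b"data to be retrieved (small payload for illustration)"
ciphertext = public_key.encrypt(plaintext, oaep)     # ciphertext sent to the data server
recovered = private_key.decrypt(ciphertext, oaep)    # decryption by the key holder
assert recovered == plaintext
```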
Step 801, the data providing end downloads model parameters of each data providing end in the N-1 iteration process from the block chain system.
In one possible implementation, the initial neural network model is included in the data provider. In the first iterative training process, the data providing end trains the initial neural network model according to the training data to obtain a trained neural network model, and reports the model parameters of the trained neural network model to the blockchain system. After receiving the model parameters reported by each data providing end, the blockchain system aggregates them and chains them for storage. In this first iteration, the data providing end trains directly from the initial model parameters of the initial neural network model, without downloading model parameters from the blockchain system.
In the Nth iteration process, the data providing end firstly downloads model parameters of each data providing end in the (N-1) th iteration process from the block chain system, then updates the parameters of the current neural network model according to the model parameters of each data providing end in the (N-1) th iteration process, and trains the updated neural network model. After the training is finished, the data providing end reports the model parameters of the trained neural network model to the block chain system. And after receiving the model parameters reported by each data providing end, the block chain system collects and chains up for storage.
And adding 1 to the value N after each iteration, and iteratively executing the nth iteration process by the data providing end until the trained neural network model meets the preset condition, and taking the neural network model meeting the preset condition as the preset neural network model.
It should be noted that the preset condition may be that the model parameters determined after each data providing end trains the neural network model are the same, or that the number of iterations is greater than a preset number; the preset condition may also be another condition, which is not limited in the present application.
It can be understood that, the process of updating the parameters of the current neural network model by the data providing terminal according to the model parameters of each data providing terminal in the N-1 th iteration process may be: and the data providing end averages the model parameters of each data providing end in the N-1 th iteration process, updates the obtained average value as the parameters of the neural network model in the Nth iteration process, and trains the updated neural network model.
In a possible implementation manner, the data providing end may specifically determine the preset neural network model through the following steps 1 to 6.
Step 1, obtaining a first model parameter; wherein, during the first iteration, the first model parameter is an initial model parameter; in the Nth iteration, the first model parameter is a model parameter determined according to the first parameters of the multiple data providing ends acquired from the block chain system; the first parameter is a model parameter of the neural network model determined after N-1 iterations are carried out on the initial neural network model; n is a positive integer greater than or equal to 2;
step 2, adjusting model parameters of the first neural network model according to the first model parameters, and determining a second neural network model; at a first iteration, the first neural network model is the initial neural network model; during the Nth iteration, the first neural network model is a neural network model determined after N-1 iterations are carried out on the initial neural network model;
step 3, training the second neural network model according to training data, and determining a third neural network model;
step 4, determining whether the third neural network model meets a preset condition;
step 5, if yes, determining the third neural network model as the preset neural network model;
step 6, if not (that is, if the third neural network model does not meet the preset condition), taking the parameters of the third neural network model as the first parameter, and sending the first parameter to the blockchain system; and iteratively executing the step 1, the step 2, the step 3, the step 4, the step 5 and the step 6 until the determined third neural network model meets the preset condition.
Step 802, the data providing end generates a file index vector in the Nth iteration.
Specifically, the data providing end updates parameters of the neural network model according to model parameters of each data providing end in the N-1 th iteration process, and generates a file index vector in the Nth iteration process according to the updated neural network model.
Step 803, the data providing end sends the Nth-round parameters to the blockchain node.
Wherein, the Nth round parameter includes: parameters of the neural network model obtained after the Nth iteration, user information of the data providing end and the encrypted file index vector.
In a specific implementation manner, after the Nth iterative training is completed, the data providing end encrypts three items, namely the parameters of the neural network model obtained by the current iterative training, the user information of the data providing end providing the data to be retrieved, and the file index vector determined by inputting the data to be retrieved into the neural network model obtained by the current iterative training, and uploads the encrypted items to the blockchain system.
Step 804, the blockchain system generates block parameters according to the Nth-round parameters of all the data providing terminals.
In a possible implementation manner, the blockchain system determines the number of data providers connected to the blockchain system, and determines whether the number of received first parameters is equal to the number of data providers; if yes, the block chain system sends first parameters of the multiple data providing ends to the first data providing end; if not, the block chain system sends a second parameter to the first data providing end; and the second parameter is a model parameter of the neural network model determined after N-2 iterations are carried out on the initial neural network model.
Specifically, the block chain system is connected to T data providers, and after receiving the N-th round parameters of the data providers, the block chain system determines whether the T data providers all send the N-th round parameters to the block chain system, and if yes, the block chain system stores all the N-th round parameters of the data providers in the block chain, and sends all the N-th round parameters of the data providers to the first data provider after receiving a first request message from the first data provider. If not, the block chain system does not store the currently received N-th round parameters from the data providing end, and sends all the N-1-th round parameters of the data providing end to the first data providing end after receiving the first request message of the first data providing end.
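A minimal sketch of this round-completeness check follows; the in-memory dictionaries stand in for on-chain storage, and all names are assumptions introduced for illustration.

```python
# Illustrative check in step 804: chain the Nth-round parameters only when all
# T connected data providing ends have reported them; otherwise serve round N-1.
rounds = {}            # round number -> {provider id: model parameters}
chained_rounds = {}    # rounds whose parameters have been stored on-chain

def report(round_n, provider_id, params, num_providers):
    rounds.setdefault(round_n, {})[provider_id] = params
    if len(rounds[round_n]) == num_providers:         # all T providers reported
        chained_rounds[round_n] = dict(rounds[round_n])

def fetch(round_n):
    """Answer a provider's request: this round if complete, else round N-1."""
    if round_n in chained_rounds:
        return round_n, chained_rounds[round_n]
    return round_n - 1, chained_rounds.get(round_n - 1, {})
```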
Step 805, the block chain system updates the file index vector summary table according to the parameters of the nth round of all the data providing terminals.
Specifically, after receiving the parameters of the nth round of all the data providing terminals, the block chain system decrypts the parameters of the nth round of all the data providing terminals and determines the file index vectors provided by all the data providing terminals. When the block chain node receives the file index vector for the first time, a file index vector summary table is established. And when a new file index vector is received subsequently, updating the file index vector summary table according to the new file index vector.
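A minimal sketch of how a blockchain node might create and update the file index vector summary table follows, assuming entries are keyed by the data providing end and a per-provider vector identifier; both keys are hypothetical.

```python
# Illustrative file index vector summary table maintenance (step 805).
class SummaryTable:
    """File index vector summary table kept by a blockchain node."""
    def __init__(self):
        self.entries = None                         # created on first receipt

    def update(self, provider_id, vectors):
        """vectors: {local vector id: file index vector} reported after a round."""
        if self.entries is None:                    # first receipt: create the table
            self.entries = {}
        for vid, vec in vectors.items():
            self.entries[(provider_id, vid)] = vec  # insert or overwrite the entry

table = SummaryTable()
table.update("provider-A", {"vec-001": [0.1, 0.2, 0.3]})
```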
Step 806, the data using end sends a retrieval request to the blockchain system.
The specific implementation manner of step 806 may refer to step 703, which is not described herein again.
In step 807, the blockchain system determines the K file index vectors with the highest relevance to the search request and sends the K file index vectors to the data server.
The specific implementation of step 807 may refer to steps 704 to 706, which are not described herein again.
And 808, the data server determines and sends the data to be retrieved corresponding to the K file index vectors to the data using end.
The specific implementation manner of step 808 may refer to step 707 and step 708, which is not described herein again.
Step 809, the data using end interacts with the data providing end to obtain the public key of the data providing end.
In a specific implementation manner, when the data providing end sends data to be retrieved to the data server and sends a file index vector to the block chain system, the data providing end encrypts the data to be retrieved and the file index vector by using a private key of the data providing end and a corresponding encryption algorithm. Therefore, when the data using end receives the data to be retrieved from the data server, the public key of the data providing end needs to be acquired to decrypt the data to be retrieved.
It should be noted that, when the data to be retrieved come from different data providing terminals, the data to be retrieved sent by the data server to the data using terminal may further include an identifier of the data providing terminal to which the data belong; the data using end determines the data providing terminal according to the identifier and interacts with it to obtain the public key of the data providing terminal.
And step 810, the data using end decrypts the data to be retrieved corresponding to the K file index vectors according to the public key of the data providing end.
After the data using end obtains the public key of the data providing end, the data using end decrypts the data to be retrieved according to the public key and the encryption algorithm used by the data providing end, and determines the data to be retrieved in the plaintext. This data retrieval process is now complete.
Based on the technical scheme, the data providing end sends the data to be retrieved to the data server in an encrypted mode, the file retrieval vectors are sent to the block chain system in an encrypted mode, the block chain system determines K most relevant file retrieval vectors according to matching results in the retrieval process, and the data server is instructed to send the corresponding data to be retrieved to the data using end.
The data providing end processes the data to be retrieved through the neural network model to determine the file retrieval vector of the data to be retrieved, and the accuracy of this vector is higher than that of a file retrieval vector determined through a vector space model in the prior art. In addition, the neural network model is trained directly at the data providing end, which solves the problem in the related art that a data server cannot directly perform model training on encrypted data.
When the data providing ends carry out model training, federated learning is adopted, and the model parameters of the multiple data providing ends in each iteration are exchanged through the blockchain, so that the model parameters of the preset neural network models obtained by the data providing ends are consistent, and the problem of data islands is solved. Because the data generated in the training process are all uploaded to the blockchain system and chained for storage, the federated learning training process of the neural network model is traceable and verifiable, whether the training parameters reported by each data providing end are correct can be verified through the blockchain system, and poisoning attacks, Byzantine faults and the like can be effectively prevented.
After receiving the file retrieval vector, the block chain system collects the file retrieval vector into a file retrieval vector table, so that the retrieval efficiency can be improved.
It should be noted that, in the embodiment of the present disclosure, the data retrieval device encrypts the collected data and encrypts and transmits the data in the data transmission process, so as to avoid user information leakage caused by data theft.
In the above, the functions of the devices in the data retrieval system and the interaction between the devices according to the embodiments of the present disclosure are described in detail.
It can be seen that the technical solutions provided by the embodiments of the present disclosure are introduced mainly from the perspective of the methods. To implement the above functions, corresponding hardware structures and/or software modules for performing the respective functions are included. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as combinations of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The data retrieval device according to the embodiments of the present disclosure may be divided into functional modules according to the above method examples, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. Optionally, the division of the modules in the embodiment of the present disclosure is illustrative, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 9 is a schematic structural diagram of a data retrieval device applied to a first data provider according to an embodiment of the present disclosure. The data retrieval apparatus includes: a communication unit 901 and a processing unit 902.
A communication unit 901, configured to send data to be retrieved to a data server; the processing unit 902 is configured to input data to be retrieved into a preset neural network model, and determine a file index vector corresponding to the data to be retrieved; the communication unit 901 is further configured to send the file index vector to the blockchain system.
In a possible implementation manner, the first data provider is one of a plurality of data providers; processing unit 902 is further configured to perform the following steps 1 to 6: step 1, obtaining a first model parameter; when the iteration is carried out for the first time, the first model parameter is an initial model parameter; during the Nth iteration, the first model parameter is a model parameter determined according to the first parameters of a plurality of data providing ends acquired from the blockchain system; the first parameter is a model parameter of the neural network model determined after N-1 iterations are carried out on the initial neural network model; N is a positive integer greater than or equal to 2; step 2, adjusting model parameters of the first neural network model according to the first model parameters, and determining a second neural network model; during the first iteration, the first neural network model is the initial neural network model; during the Nth iteration, the first neural network model is a neural network model determined after N-1 iterations are carried out on the initial neural network model; step 3, training the second neural network model according to the training data, and determining a third neural network model; step 4, determining whether the third neural network model meets a preset condition; step 5, if yes, determining the third neural network model as the preset neural network model; step 6, if not, taking the parameters of the third neural network model as the first parameter, and sending the first parameter to the blockchain system; and iteratively executing step 1, step 2, step 3, step 4, step 5 and step 6 until the determined third neural network model meets the preset condition.
In a possible implementation manner, the processing unit 902 is further configured to determine a private key of the first data provider; the processing unit 902 is further configured to encrypt data to be retrieved according to a private key; the communication unit 901 is further configured to send the encrypted data to be retrieved to the data server.
In a possible implementation manner, the communication unit 901 is further configured to receive a first request message from the data consumer; the first request message is used for requesting to acquire a private key of the first data providing end; a communication unit 901, further configured to send a first response message to the data consumer; the first response message includes a private key of the first data provider.
In a possible implementation manner, the processing unit 902 is further configured to encrypt the file index vector according to a private key; the communication unit 901 is further configured to send the encrypted file index vector to the blockchain system.
Fig. 10 is a schematic structural diagram of a data retrieval device applied to a block chain system according to an embodiment of the disclosure. The data retrieval device includes: a communication unit 1001 and a processing unit 1002.
A communication unit 1001 for receiving and storing file index vectors from a plurality of data providers; the file index vector corresponds to data to be retrieved; the communication unit 1001 is further configured to receive a retrieval request from a data using end, and the processing unit 1002 is configured to generate a retrieval vector according to the retrieval request; the processing unit 1002 is further configured to match K file index vectors with the highest similarity to the retrieval vector in the file index vectors, where K is a positive integer; the communication unit 1001 is further configured to send first indication information to the data server, where the first indication information is used to indicate the data server to send data to be retrieved corresponding to the K file index vectors to the data using end.
In one possible implementation, the communication unit 1001 is further configured to: receiving first parameters from a plurality of data providing ends, wherein the first parameters are model parameters of the neural network model determined after N-1 iterations are carried out on the initial neural network model; n is a positive integer greater than or equal to 2; storing the first parameter into at least one blockchain node; receiving a second request message from the first data provider; the second request message is used for requesting the first parameters of the plurality of data providing terminals; the first data providing end is one of the at least one data providing end; and responding to the second request message, and sending the first parameters of the plurality of data providers to the first data provider.
In a possible implementation manner, the processing unit 1002 is further configured to determine the number of data providers connected by the blockchain system; determining whether the number of received first parameters is equal to the number of data providers; if yes, the communication unit 1001 is further configured to send a first parameter of the multiple data providing terminals to the first data providing terminal; if not, the communication unit 1001 is further configured to send a second parameter to the first data providing end; the second parameter is a model parameter of the neural network model determined after N-2 iterations of the initial neural network model.
Fig. 11 is a schematic structural diagram of a data retrieval device applied to a data server according to an embodiment of the present disclosure. The data retrieval apparatus includes: a communication unit 1101 and a processing unit 1102.
A communication unit 1101 for receiving data to be retrieved from a plurality of data providing terminals; a communication unit 1101, further configured to receive first indication information from the block chain system; the first indication information is used for indicating the data server to send data to be retrieved corresponding to K file index vectors to the data using end, wherein K is a positive integer; the processing unit 1102 is further configured to match, in response to the first indication information, to-be-retrieved data corresponding to the K file index vectors in to-be-retrieved data of the multiple data providers; the communication unit 1101 is further configured to send data to be retrieved corresponding to the K file index vectors to the data consumer.
Fig. 12 is a schematic structural diagram of a data retrieval device applied to a data using end according to an embodiment of the present disclosure. The data retrieval apparatus includes: a communication unit 1201 and a processing unit 1202.
A processing unit 1202, configured to instruct the communication unit 1201 to send a retrieval request to the blockchain system; the blockchain node determines the K file index vectors with the highest similarity to the retrieval vector of the retrieval request and sends first indication information to the data server, where the first indication information is used to indicate the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end, and K is a positive integer; the processing unit 1202 is further configured to instruct the communication unit 1201 to receive the data to be retrieved corresponding to the K file index vectors sent by the data server in response to the first indication information.
In a possible implementation manner, the processing unit 1202 is configured to instruct the communication unit 1201 to send a first request message to a first data provider, where the first data provider is a data provider having data to be retrieved corresponding to K file index vectors in a plurality of data providers; the first request message is used for requesting to acquire a private key of a first data providing end; the processing unit 1202 is further configured to instruct the communication unit 1201 to receive the first response message sent by the first data provider; the first response message comprises a private key of the first data provider; the processing unit 1202 is further configured to decrypt, according to the private key, to-be-retrieved data corresponding to the K file index vectors.
The embodiment of the present disclosure provides a data retrieval apparatus, which is configured to execute the method that any device in the data retrieval system needs to execute. The data retrieval apparatus may be the data retrieval device referred to in the disclosure, a module in the data retrieval device, a chip in the data retrieval device, or another apparatus for executing the data retrieval method, which is not limited in this disclosure.
The embodiment of the present disclosure further provides a computer-readable storage medium, in which instructions are stored, and when the computer executes the instructions, the computer executes each step in the method flow shown in the foregoing method embodiment.
Embodiments of the present disclosure provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data retrieval method in the above-described method embodiments.
Embodiments of the present disclosure provide a chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to run a computer program or instructions to implement the data retrieval method in the above-described method embodiments.
The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), registers, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any other form of computer-readable storage medium known in the art, in any suitable combination. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). In the disclosed embodiments, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
Since the apparatus, the device, the computer-readable storage medium and the computer program product in the embodiments of the disclosure may all be applied to the method described above, the technical effects they obtain may also refer to the method embodiments and are not described herein again.
The above description is only an embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims (18)

1. A data retrieval method, applied to a first data provider, the method comprising:
sending data to be retrieved to a data server;
inputting the data to be retrieved into a preset neural network model, and determining a file index vector corresponding to the data to be retrieved;
and sending the file index vector to a block chain system.
2. The method of claim 1, wherein the first data provider is one of a plurality of data providers; the method further comprises the following steps:
step 1, obtaining a first model parameter; wherein, during the first iteration, the first model parameter is an initial model parameter; during the Nth iteration, the first model parameter is a model parameter determined according to the first parameters of the plurality of data providing ends acquired from the blockchain system; the first parameter is a model parameter of the neural network model determined after N-1 iterations are carried out on the initial neural network model; n is a positive integer greater than or equal to 2;
step 2, adjusting model parameters of the first neural network model according to the first model parameters, and determining a second neural network model; at a first iteration, the first neural network model is the initial neural network model; during the Nth iteration, the first neural network model is a neural network model determined after the initial neural network model is iterated for N-1 times;
step 3, training the second neural network model according to training data, and determining a third neural network model;
step 4, determining whether the third neural network model meets a preset condition;
step 5, if yes, determining the third neural network model as the preset neural network model;
step 6, if not, taking the parameter of the third neural network model as the first parameter, and sending the first parameter to the block chain system; and iteratively executing the step 1, the step 2, the step 3, the step 4, the step 5 and the step 6 until the determined third neural network model meets the preset condition.
3. The method according to claim 1 or 2, wherein the sending the data to be retrieved to the data server comprises:
determining a private key of the first data provider;
encrypting the data to be retrieved according to the private key;
and sending the encrypted data to be retrieved to the data server.
4. The method of claim 3, further comprising:
receiving a first request message from a data using end; the first request message is used for requesting to acquire a private key of the first data provider;
sending a first response message to the data using end; the first response message includes a private key of the first data provider.
5. The method of claim 4, wherein sending the file index vector to a blockchain system comprises:
encrypting the file index vector according to the private key;
and sending the encrypted file index vector to the block chain system.
6. A data retrieval method, applied to a block chain system, the method comprising:
receiving and storing file index vectors from a plurality of data providers; the file index vector corresponds to data to be retrieved;
receiving a retrieval request from a data using end, and generating a retrieval vector according to the retrieval request;
matching K file index vectors with the highest similarity with the retrieval vector in the file index vectors, wherein K is a positive integer;
and sending first indication information to a data server, wherein the first indication information is used for indicating the data server to send the data to be retrieved corresponding to the K file index vectors to the data using end.
7. The method of claim 6, further comprising:
receiving first parameters from the data providing ends, wherein the first parameters are model parameters of the neural network model determined after N-1 iterations of the initial neural network model; n is a positive integer greater than or equal to 2;
storing the first parameter into at least one blockchain node;
receiving a second request message from the first data provider; the second request message is used for requesting the first parameters of the plurality of data providers; the first data provider is one of the at least one data provider;
and responding to the second request message, and sending the first parameters of the plurality of data providing terminals to the first data providing terminal.
8. The method of claim 7, wherein after receiving the second request message from the first data provider, the method further comprises:
determining the number of data providers connected by the blockchain system;
determining whether the number of received first parameters is equal to the number of data providers;
if so, sending first parameters of the plurality of data providing ends to the first data providing end;
if not, sending a second parameter to the first data providing end; and the second parameter is a model parameter of the neural network model determined after N-2 iterations of the initial neural network model.
9. A data retrieval method is applied to a data server, and the method comprises the following steps:
receiving data to be retrieved from a plurality of data providing terminals;
receiving first indication information from a block chain system; the first indication information is used for indicating a data server to send data to be retrieved corresponding to the K file index vectors to a data using end, and K is a positive integer;
responding to the first indication information, and matching to-be-retrieved data corresponding to the K file index vectors in to-be-retrieved data of the multiple data providing ends;
and sending the data to be retrieved corresponding to the K file index vectors to the data using end.
10. A data retrieval method is applied to a data using end, and the method comprises the following steps:
sending a retrieval request to the blockchain system; determining K file index vectors with the highest similarity to the retrieval vector of the retrieval request by using the block chain nodes, and sending first indication information to a data server by using the block chain nodes, wherein the first indication information is used for indicating the data server to send data to be retrieved corresponding to the K file index vectors to the data using end, and K is a positive integer;
and receiving data to be retrieved corresponding to the K file index vectors sent by the data server in response to the first indication information.
11. The method according to claim 10, wherein the data to be retrieved corresponding to the K file index vectors is the data to be retrieved encrypted by the first data provider according to a private key; the method further comprises the following steps:
sending a first request message to a first data providing end, wherein the first data providing end is a data providing end which has data to be retrieved corresponding to the K file index vectors in a plurality of data providing ends; the first request message is used for requesting to acquire a private key of the first data provider;
receiving a first response message sent by the first data providing terminal; the first response message comprises a private key of the first data provider;
and decrypting the data to be retrieved corresponding to the K file index vectors according to the private key.
12. A data retrieval device, comprising: a communication unit and a processing unit;
the communication unit is used for receiving and storing file index vectors from a plurality of data providing terminals; the file index vector corresponds to data to be retrieved;
the communication unit is also used for receiving a retrieval request from a data using end and generating a retrieval vector according to the retrieval request;
the processing unit is used for matching K file index vectors with the highest similarity with the retrieval vector in the file index vectors, wherein K is a positive integer;
the communication unit is further configured to send first indication information to a data server, where the first indication information is used to indicate the data server to send to-be-retrieved data corresponding to the K file index vectors to the data using end.
13. A data retrieval device, comprising: a communication unit and a processing unit;
the communication unit is used for sending data to be retrieved to the data server;
the processing unit is used for inputting the data to be retrieved into a preset neural network model and determining a file index vector corresponding to the data to be retrieved;
the communication unit is further configured to send the file index vector to a blockchain system.
14. A data retrieval device, comprising: a communication unit and a processing unit;
the communication unit is used for receiving data to be retrieved from a plurality of data providing terminals;
the communication unit is further used for receiving first indication information from a block chain system; the first indication information is used for indicating a data server to send data to be retrieved corresponding to the K file index vectors to a data using end, and K is a positive integer;
the processing unit is used for responding to the first indication information and matching the data to be retrieved corresponding to the K file index vectors in the data to be retrieved of the data providing ends;
the communication unit is further configured to send the data to be retrieved corresponding to the K file index vectors to the data using end.
15. A data retrieval device, comprising: a communication unit and a processing unit;
the processing unit is used for instructing the communication unit to send a retrieval request to the blockchain system; determining K file index vectors with the highest similarity to the retrieval vector of the retrieval request by using the block chain nodes, and sending first indication information to a data server by using the block chain nodes, wherein the first indication information is used for indicating the data server to send data to be retrieved corresponding to the K file index vectors to the data using end, and K is a positive integer;
the processing unit is further configured to instruct the communication unit to receive data to be retrieved corresponding to the K file index vectors sent by the data server in response to the first indication information.
16. A data retrieval system, comprising: the data retrieval device of claim 12, the data retrieval device of claim 13, the data retrieval device of claim 14, and the data retrieval device of claim 15.
17. A data retrieval device, comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions that, when executed by the data retrieval device, cause the data retrieval device to perform the data retrieval method of any one of claims 1-11.
18. A computer-readable storage medium having stored therein instructions, which when executed by a processor of a data retrieval device, cause the data retrieval device to perform the data retrieval method of any one of claims 1-11.
CN202210780491.0A 2022-07-04 2022-07-04 Data retrieval method, device and storage medium Pending CN115203138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210780491.0A CN115203138A (en) 2022-07-04 2022-07-04 Data retrieval method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210780491.0A CN115203138A (en) 2022-07-04 2022-07-04 Data retrieval method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115203138A true CN115203138A (en) 2022-10-18

Family

ID=83577279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210780491.0A Pending CN115203138A (en) 2022-07-04 2022-07-04 Data retrieval method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115203138A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880120A (en) * 2023-02-24 2023-03-31 江西微博科技有限公司 Online government affair service system and service method

Similar Documents

Publication Publication Date Title
Ding et al. Privacy-preserving multi-keyword top-k similarity search over encrypted data
Li et al. Toward privacy-assured and searchable cloud data storage services
Ge et al. Enabling efficient verifiable fuzzy keyword search over encrypted data in cloud computing
WO2020024904A1 (en) Method and device for searching blockchain data, and readable storage medium
US11764940B2 (en) Secure search of secret data in a semi-trusted environment using homomorphic encryption
CN107948146A (en) A kind of connection keyword retrieval method based on encryption attribute in mixed cloud
Sun et al. Research on logistics information blockchain data query algorithm based on searchable encryption
Tang et al. A secure and trustworthy medical record sharing scheme based on searchable encryption and blockchain
CN112000632A (en) Ciphertext sharing method, medium, sharing client and system
CN111651779B (en) Privacy protection method for encrypted image retrieval in block chain
Li et al. A blockchain-based credible and secure education experience data management scheme supporting for searchable encryption
US20230252416A1 (en) Apparatuses and methods for linking action data to an immutable sequential listing identifier of a user
Singh et al. Privacy-preserving multi-keyword hybrid search over encrypted data in cloud
CN115203138A (en) Data retrieval method, device and storage medium
Zou et al. Verifiable keyword-based semantic similarity search on social data outsourcing
Sheeba et al. Digital Hash Data Encryption for IoT Financial Transactions using Blockchain Security in the Cloud
Verykios et al. Privacy‐preserving record linkage
US20230252185A1 (en) Apparatuses and methods for revealing user identifiers on an immutable sequential listing
CN113495945A (en) Text search method, text search device and storage medium
Djeddai et al. Keeping the Privacy and the Security of the Knowledge Graph Completion Using Blockchain Technology
Zhu et al. Authentication of skyline query over road networks
Li et al. An efficient, secure and reliable search scheme for dynamic updates with blockchain
Al-Dhlan et al. Customizable encryption algorithms to manage data assets based on blockchain technology in smart city
Unger et al. Elxa: Scalable privacy-preserving plagiarism detection
CN106777233A (en) The personalized search system of the support secret protection based on cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination