CN115842818A

CN115842818A - Big data transmission method and device, computer equipment and storage medium

Info

Publication number: CN115842818A
Application number: CN202211392117.XA
Authority: CN
Inventors: 吴有亮
Original assignee: Ping An E Wallet Electronic Commerce Co Ltd
Current assignee: Ping An E Wallet Electronic Commerce Co Ltd
Priority date: 2022-11-08
Filing date: 2022-11-08
Publication date: 2023-03-24

Abstract

The application relates to the technical field of computers, and discloses a big data transmission method, a device, computer equipment and a storage medium, which comprise: if the data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database; writing the target data into at least one target file according to a preset transmission data volume; storing at least one target file in a distributed storage; marking at least one target file, and determining second identification information of each target file; sending the second identification information to the client through a data transmission interface; if the data transmission interface receives a data calling request sent by the client, the target file corresponding to the target identification information in the data calling request in the distributed storage is sent to the client through the data transmission interface, and efficient and flexible big data transmission is achieved.

Description

Big data transmission method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for big data transmission, a computer device, and a storage medium.

Background

With the advent of the cloud era, big data has attracted more and more attention. Big data typically covers large amounts of unstructured and semi-structured data. At present, a conventional big data transmission scheme cleans the data offline on a big data platform and then sends the cleaned data file to a destination address of the receiving party over SFTP (Secure File Transfer Protocol). However, this scheme can only deliver files on a T+1 (next-day) basis, cannot meet high timeliness requirements, and is inefficient, particularly for large data volumes. Moreover, once the content and format of the data file are fixed, they cannot be changed, so the scheme is inflexible.

Disclosure of Invention

In view of this, the present application provides a method and an apparatus for transmitting big data, a computer device, and a storage medium, so as to solve the problems of low efficiency and inflexibility of data transmission of big data volume.

In a first aspect, a big data transmission method is provided, including:

if the data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database;

calling the calculation engine to write target data into at least one target file according to a preset transmission data volume;

storing at least one target file in a distributed storage;

marking at least one target file, and determining second identification information of each target file;

sending the second identification information to the client through the data transmission interface;

and if the data transmission interface receives a data calling request sent by the client, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.

Further, writing the target data into at least one target file according to a preset transmission data volume, including:

determining the file quantity of at least one target file according to the preset transmission data quantity and the total data quantity of the target data;

matching parallelism according to the number of files;

and writing the target data into at least one target file according to the parallelism and the preset transmission data volume.

Further, the big data transmission method further comprises the following steps:

and if the sensitive data exist in the target data, encrypting the sensitive data.

Further, the sensitive data is encrypted, and the encryption processing comprises the following steps:

analyzing the target data to obtain sensitive data meeting sensitive screening conditions;

encrypting the sensitive data according to a preset secret key;

and writing the encrypted sensitive data into the target data.

Further, invoking the computing engine to obtain the target data corresponding to the first identification information in the big data transmission request from the database, including:

processing the first identification information to generate a control instruction;

configuring parameter information of a calculation engine;

and calling a calculation engine according to the parameter information and the control instruction to acquire target data.

Further, after writing the target data into at least one target file according to the preset transmission data volume, the method further comprises: updating the state information of the target data to a processed state;

sending the second identification information to the client through the data transmission interface, including: if the data transmission interface receives a state query request sent by a client, acquiring state information of target data; and if the state information is in the processed state, sending the second identification information to the client through the data transmission interface.

Further, the distributed storage includes at least one of: GFS storage, NAS storage, NFS storage.

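Further, invoking the computing engine to obtain the target data corresponding to the first identification information in the big data transmission request from the database includes:

comparing the user information of the client in the big data transmission request with the authority information of the target data;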

and if the user information accords with the authority information, calling a calculation engine to acquire target data corresponding to the first identification information in the big data transmission request from the database.

In a second aspect, a big data transmission device is provided, including:

the processing module is used for calling the computing engine to acquire target data corresponding to the first identification information in the big data transmission request from the database if the data transmission interface receives the big data transmission request sent by the client; and,

calling a calculation engine to write target data into at least one target file according to a preset transmission data volume;

the storage module is used for storing at least one target file in the distributed storage;

the processing module is further used for marking at least one target file and determining second identification information of each target file;

the communication module is used for sending the second identification information to the client through the data transmission interface; and,

if a data calling request sent by the client is received, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.

Further, the processing module is specifically configured to determine the file number of the at least one target file according to a preset transmission data amount and a total data amount of the target data; matching the parallelism according to the number of files; and writing the target data into at least one target file according to the parallelism and the preset transmission data amount.

Further, the big data transmission device further comprises:

and the encryption and decryption module is used for encrypting the sensitive data if the sensitive data exists in the target data.

Further, the encryption and decryption module is specifically used for analyzing and processing the target data to acquire sensitive data meeting the sensitive screening conditions; encrypting the sensitive data according to a preset secret key; and writing the encrypted sensitive data into the target data.

Further, the processing module is specifically configured to process the first identification information and generate a control instruction; configuring parameter information of a calculation engine; and calling a calculation engine according to the parameter information and the control instruction to acquire target data.

Further, the processing module is further configured to update the state information of the target data to a processed state;

the communication module is specifically used for acquiring the state information of the target data if the data transmission interface receives a state query request sent by the client; and if the state information is the processed state, sending the second identification information to the client through the data transmission interface.

Further, the processing module is also used for comparing the user information of the client in the big data transmission request with the authority information of the target data; and if the user information accords with the authority information, calling a calculation engine to acquire target data corresponding to the first identification information in the big data transmission request from the database.

In a third aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the big data transmission method are implemented.

In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above big data transmission method.

In the scheme implemented by the big data transmission method, the device, the computer equipment and the storage medium, when the client needs to acquire the target data, the client accesses a data transmission interface provided by the big data system and initiates a big data transmission request. After the big data system receives the big data transmission request, the big data system analyzes first identification information in the big data transmission request and calls target data required by a client by using the first identification information. After the target data are obtained, the big data system distributes the target data to at least one target file according to the preset transmission data volume allowed by the data transmission interface, so that the big data are split. Further, the big data system stores at least one target file in the distributed memory and marks the target file, and second identification information obtained after marking is fed back to the client. Therefore, the client user can flexibly acquire the required data in batches by taking the target file as a unit. Through the technical scheme, on one hand, the data acquisition right is directly placed on the client side, data can be acquired according to the self-defined first or second identification information, the timeliness of data exchange is improved, and the requirement that the client flexibly acquires data content according to the requirement is met. On the other hand, a large amount of target data are split into at least one target file, so that the transmission efficiency and convenience of the large data file can be greatly improved, data redundancy and transmission channel blockage are avoided, and the possibility of transmission failure is reduced. 
In addition, the computing capacity of the big data system is controlled through the interface and resources are allocated through it, so that the utilization efficiency of network resources is improved and the data utilization rate of the big data system is increased. While user requirements are still met, unclear naming and code confusion caused by developers naming things arbitrarily are prevented, development efficiency is improved, and the system is easier to maintain and extend.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the present application are more readily apparent, specific embodiments of the present application are described below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.

FIG. 1 is a schematic flow chart of a big data transmission method in the present application;

FIG. 2 is a schematic diagram of a scenario of a big data transmission method in the present application;

FIG. 3 is a schematic diagram of a big data transmission device in the present application;

FIG. 4 is a schematic structural diagram of a computer device in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

Referring to fig. 1, fig. 1 is a schematic flow chart of a big data transmission method provided in an embodiment of the present application, which is suitable for a big data system, and the method includes the following steps:

S10: if the data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database;

The big data transmission request includes first identification information of the target data. The first identification information includes information used to identify the data, such as a data table name, the column in which the data is located, or a data time. The database stores all data of the system, including the target data.

Specifically, the big data transmission method provided by the application can be applied to big data systems in various application scenarios. For example, in the medical field, medical record information required by a user can be inquired from massive electronic medical records based on a big data system, and medical record reference is provided for the user. As shown in fig. 2, the big data system includes a server, a Hadoop cluster, and an encryption/decryption module. The server is used for information interaction with the client, and the server is provided with a data transmission interface which is used for establishing communication between the server and the client so as to realize control and resource allocation. The Hadoop cluster can interact with the server and the encryption and decryption module, is a distributed system infrastructure, is used for storing and analyzing a large amount of unstructured data in a distributed computing environment, provides computing or application service for the server and the client in a network environment, and can copy each data block to other nodes to ensure that data cannot be lost when a single node fails, and therefore the fault resistance is improved. The encryption and decryption module is used for encrypting or decrypting data.

It should be noted that Hadoop is a distributed framework composed of three parts: storage (HDFS), resource scheduling (YARN), and computation (MapReduce). Hadoop can distribute a huge data set across a plurality of nodes in a cluster of ordinary computers for storage, and can index and track the data, thereby greatly improving the efficiency of processing and analyzing big data. The computing engine is integrated in the Hadoop cluster.

In one possible implementation, the big data system may be a medical platform and the target data may be medical data, such as personal health records, prescriptions, exam reports, and the like.

In this embodiment, when the client needs to acquire the target data, the client accesses a data transmission interface provided by the big data system and initiates a big data transmission request. The data transmission interface transmits the received big data transmission request to a big data system, the big data system analyzes first identification information in the big data transmission request after receiving the big data transmission request, and calls a calculation engine to call target data required by a client from a database by taking the first identification information as a screening condition. Therefore, the big data system directly puts the data acquisition right to the client side, can configure data to be transmitted according to the first user-defined identification information, avoids transmitting a large amount of data at one time, ensures that non-target data in the database cannot be leaked, further improves the timeliness of data exchange, and meets the requirement that the client side flexibly acquires data contents according to requirements.

It should be noted that the big data transmission request further includes user information of the client, and the invoking of the computing engine obtains target data corresponding to the first identification information in the big data transmission request from the database specifically includes: comparing the user information of the client in the big data transmission request with the authority information of the target data; and if the user information accords with the authority information, calling a calculation engine to acquire target data corresponding to the first identification information in the big data transmission request from the database.

In this embodiment, the big data system may compare the user information with the authority information of the target data corresponding to the first identification information. And if the user information accords with the authority information, the client side is proved to have the authority for acquiring the target data, and then the subsequent step of acquiring the target data is carried out. If the user information does not accord with the authority information, the client cannot acquire the target data, at the moment, the big data system feeds back prompt information to the client through the data transmission interface, and meanwhile, the big data system refuses to acquire the target data. Therefore, the safety of data query and transmission is improved, and the reliability of a big data system is improved.
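The permission check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `TransferRequest` shape, the `AUTHORITY` mapping, and the response dictionaries are all hypothetical names introduced here, and the text does not say how authority information is actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical shape of a big data transmission request."""
    user_id: str      # user information of the client
    table: str        # part of the first identification information
    columns: tuple    # columns requested by the client


# Hypothetical authority information: table name -> users allowed to read it.
AUTHORITY = {
    "medical_records": {"alice", "bob"},
    "billing": {"carol"},
}


def is_authorized(request: TransferRequest, authority: dict) -> bool:
    """Return True when the requesting user may read the requested table."""
    return request.user_id in authority.get(request.table, set())


def handle_request(request: TransferRequest, authority: dict) -> dict:
    """Reject early, before the computing engine would be invoked."""
    if not is_authorized(request, authority):
        # Prompt information fed back to the client; no data is acquired.
        return {"status": "rejected", "message": "no permission for target data"}
    return {"status": "accepted", "message": "query submitted"}
```

Checking authority before any query is issued means a rejected request costs no cluster resources, which matches the order of steps in the text.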

In some embodiments of the present application, a specific implementation of S10 is provided. That is, calling the computing engine to obtain the target data corresponding to the first identification information in the big data transmission request from the database specifically includes the following steps:

S11: processing the first identification information to generate a control instruction;

Specifically, the computing engine may be a Spark computing engine. Its computing model, based on the RDD (Resilient Distributed Dataset), makes it easier to develop and implement various complex functions, and Spark provides the Spark SQL component for executing interactive query tasks. In this case, the control instruction is a task instruction for controlling Spark SQL.

S12: configuring parameter information of a calculation engine;

The parameter information includes, but is not limited to, at least one of the following: deployment mode, cache memory amount, thread number, thread core number, and thread memory. For example, the Spark computing engine supports three cluster deployment modes: independent scheduling (Standalone), resource scheduling (YARN), and resource management (Mesos). The cache memory amount is used to store the intermediate data generated during computation, so as to improve the performance of processing streaming data and iterative data. The thread number, thread core number, and thread memory limit the parallel performance of the computing engine.

S13: and calling a calculation engine according to the parameter information and the control instruction to acquire target data.

In this embodiment, a task instruction for controlling the computing engine is generated based on the first identification information, and the parameter information required by the computing engine is set as needed. The query component of the computing engine is then started in the Executor according to the configured parameter information, and the control instruction drives the query component to search the database, so as to obtain the target data of the fields corresponding to the first identification information.
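As an illustration of S11 and S12, the sketch below turns first identification information into a SQL string and collects engine parameters into a dictionary. It assumes the control instruction is a plain SQL query; the `FirstIdentification` fields, the default values, and the helper names are hypothetical. The dictionary keys are standard Spark configuration property names, but the text does not say which properties the applicant actually sets.

```python
import re
from dataclasses import dataclass
from typing import Optional

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


@dataclass(frozen=True)
class FirstIdentification:
    """Hypothetical first identification information: table, columns, data time."""
    table: str
    columns: tuple
    date_column: Optional[str] = None
    date: Optional[str] = None  # e.g. "2022-11-08"


def build_control_instruction(ident: FirstIdentification) -> str:
    """S11: turn the first identification information into a SQL query string."""
    if not ident.columns:
        raise ValueError("at least one column is required")
    names = (ident.table, *ident.columns)
    if ident.date_column is not None:
        names += (ident.date_column,)
    # Accept only plain identifiers so client input cannot inject SQL.
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(ident.columns)} FROM {ident.table}"
    if ident.date_column is not None and ident.date is not None:
        if not _DATE.match(ident.date):
            raise ValueError(f"unexpected date format: {ident.date!r}")
        sql += f" WHERE {ident.date_column} = '{ident.date}'"
    return sql


def build_engine_config(executors: int = 4, cores: int = 2,
                        memory: str = "4g", master: str = "yarn") -> dict:
    """S12: collect engine parameters (assumed defaults) under Spark property names."""
    return {
        "spark.master": master,
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,
    }
```

In S13 the two results would be handed to the engine together, for example by applying each configuration entry when the Spark session is created and then submitting the SQL string to Spark SQL. That step needs a running cluster and is left out of the sketch.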

In some embodiments of the present application, after S10, that is, after the computing engine is invoked to obtain the target data corresponding to the first identification information in the big data transmission request from the database, the big data transmission method further includes: if sensitive data exists in the target data, encrypting the sensitive data.

In this embodiment, if the target data includes a sensitive data column, for example, information such as a name, a phone number, an identification number, and the like, the big data system may encrypt the suspected sensitive data related to the target data, so as to prevent the private data of the user from being leaked and protect the security of the private data.

In a possible implementation manner, the encrypting process is performed on the sensitive data, and specifically includes the following steps:

(1) Analyzing the target data to obtain sensitive data meeting sensitive screening conditions;

The definition of sensitive data can be determined according to the requirements of the user and related information such as the user's industry. Because the definition may differ between user groups, the sensitive screening conditions can be determined based on the user group to which the user belongs.

In a specific application scenario, the sensitive data may include information specified by law, such as a mobile phone number, an identity card number, an IP address, and an IMEI (International Mobile Equipment Identity), as well as spam information relating to politics and customs, and may also include sensitive information specific to the industry to which the user belongs. The screening condition may include characteristics of the sensitive data, such as the arrangement pattern of numeric data (for example, an identity card number or an IP address) within the data structure, or short sentences and phrases containing specific keywords, and may also include the type of the sensitive data.

(2) Encrypting the sensitive data according to a preset secret key;

The preset key may be a preset fixed key, a key selected for each user from the encryption algorithm library according to the user information, or a preset key matched from the encryption algorithm library according to the type of sensitive data with the largest data volume. Setting different keys for different users or data types prevents malicious users from guessing the key. In particular, when two users have identical sensitive data, the different keys produce different character strings, which makes guessing and exhaustive attacks harder and improves the security level of the sensitive data.

(3) And writing the encrypted sensitive data into the target data.

Specifically, the encryption algorithms corresponding to the preset key may be at least two of the following: the Data Encryption Standard (DES), 3DES, the International Data Encryption Algorithm (IDEA), the RSA algorithm, the Digital Signature Algorithm (DSA), the Advanced Encryption Standard (AES), and the digest algorithm MD5. Of course, other encryption algorithms may also be added, and the embodiment of the present application is not particularly limited in this respect.

In this embodiment, considering that unstructured data cannot be recognized directly in the way structured data can, the big data system parses the unstructured target data into data that can be recognized and processed. The parsed target data is then traversed, and the sensitive data meeting the sensitive screening conditions is screened out. The sensitive data in the target data is then encrypted according to the preset key, and the original unencrypted sensitive data in the target data is overwritten with the encrypted sensitive data, thereby completing the desensitization of the sensitive data. This improves the security level of the sensitive data and prevents the user's privacy from being disclosed.
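The effect of per-user keys on desensitized output can be illustrated with the sketch below. Note the substitution: the Python standard library has no AES, DES, or RSA implementation, so the sketch uses HMAC-SHA256, a one-way keyed digest, as a stand-in for the ciphers listed above. The function names, the master key, and the record layout are hypothetical.

```python
import hashlib
import hmac


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a distinct key per user from an assumed master key."""
    return hmac.new(master_key, user_id.encode("utf-8"), hashlib.sha256).digest()


def protect(value: str, key: bytes) -> str:
    """Stand-in for step (2): keyed one-way transform of one sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def desensitize(record: dict, sensitive_fields: set, key: bytes) -> dict:
    """Step (3): return a copy of the record with sensitive fields overwritten."""
    return {
        field: protect(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }
```

With this arrangement, the same phone number desensitized for two different users yields two unrelated strings, which is the property the text relies on to make guessing harder. A real deployment that needs to decrypt the data later would use a reversible cipher instead of a digest.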

Further, analyzing the target data to obtain the sensitive data meeting the sensitive screening conditions includes: if the target data contains unstructured data, performing field segmentation on the unstructured data in the target data and determining the format of each field in the target data; if the similarity between any field in the target data and a preset field in the dictionary is higher than a preset threshold, taking the characteristic information of that preset field as the characteristic information of the field; and if the characteristic information of the field satisfies the screening condition, determining the field to be sensitive data.
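The dictionary-matching step can be sketched as below. The text does not specify the similarity measure, so the sketch assumes string similarity over field names using `difflib.SequenceMatcher` from the standard library. The dictionary contents, the 0.8 threshold, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical dictionary: preset field name -> characteristic information.
FIELD_DICTIONARY = {
    "phone_number": "phone",
    "id_card_number": "id_card",
    "ip_address": "ip",
    "department": "organization",
}

# Hypothetical screening condition: characteristic types treated as sensitive.
SCREENING_CONDITION = {"phone", "id_card", "ip"}


def classify_field(name: str, dictionary: dict, threshold: float = 0.8) -> Optional[str]:
    """Return the characteristic of the most similar preset field, if similar enough."""
    best_feature, best_score = None, 0.0
    for preset, feature in dictionary.items():
        score = SequenceMatcher(None, name.lower(), preset).ratio()
        if score > best_score:
            best_feature, best_score = feature, score
    return best_feature if best_score > threshold else None


def find_sensitive_fields(field_names: list, dictionary: dict,
                          condition: set, threshold: float = 0.8) -> list:
    """Keep the fields whose inherited characteristic satisfies the screening condition."""
    return [
        name for name in field_names
        if classify_field(name, dictionary, threshold) in condition
    ]
```

A field such as `phone_numbers` inherits the `phone` characteristic from the near-identical preset `phone_number`, while `department` matches a preset exactly but is not sensitive because its characteristic is outside the screening condition.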

S20: calling a calculation engine to write target data into at least one target file according to a preset transmission data volume;

The preset transmission data volume can be set according to the data volume that the data transmission interface allows to be transmitted. For example, if the data transmission interface allows a file of 20 MB to be transmitted at one time, the preset transmission data volume can be set to less than 20 MB.

In this embodiment, after the target data is obtained, the big data system allocates the target data to at least one target file by using the calculation engine on the condition of the preset transmission data amount allowed by the data transmission interface, so that splitting of the big data is realized. The transmission efficiency and the convenience of big data files can be greatly improved, data redundancy and transmission channel blockage are avoided, and the possibility of transmission failure is reduced.

In some embodiments of the present application, a specific implementation of S20 is provided. That is, writing the target data into at least one target file according to the preset transmission data volume specifically includes the following steps:

S21: determining the file quantity of at least one target file according to the preset transmission data quantity and the total data quantity of the target data;

Specifically, the file number of the at least one target file = the total data amount of the target data / the preset transmission data amount. For example, if the total data volume of the target data is 100000 pieces of log data and the preset transmission data volume is 10000 pieces of data, the number of files required = 100000 / 10000 = 10.

S22: matching the parallelism according to the number of files;

S23: writing the target data into at least one target file according to the parallelism and the preset transmission data amount.

In this embodiment, the number of files required is calculated from the preset transmission data volume and the total data volume of the target data, and the optimal parallelism for the processors of the big data system is dynamically matched according to the number of files and the real-time load of the big data system. Finally, according to the parallelism and the preset transmission data volume, the write operations for the target data run in parallel in multiple threads to produce the required number of target files. This speeds up computation, effectively reduces the time needed to split the data, and improves data transmission efficiency.
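The file-count calculation in S21 can be written as a small function. The text only gives an example where the total divides evenly; rounding up when it does not, and returning one file for an empty result so that "at least one target file" still holds, are assumptions made here. The function name is hypothetical.

```python
def file_count(total_records: int, records_per_file: int) -> int:
    """Number of target files needed for the given total and per-file limit."""
    if records_per_file <= 0:
        raise ValueError("records_per_file must be positive")
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    # Integer ceiling division: a partial last file still counts as a file.
    needed = -(-total_records // records_per_file)
    # Assumption: an empty result is still delivered as one (empty) file.
    return max(1, needed)
```

For the figures used in the text, `file_count(100000, 10000)` gives 10; one extra record would push the count to 11.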

For example, with a preset transmission data volume of 10000 records, the total number of records in the target data is obtained by the computing engine. The total number of records is divided by 10000 to obtain the number of files to be distributed, and the parallelism of the distribution process is then calculated. After the parallelism is set, the target data is read and written into a file; when the current file reaches 10000 records, subsequent target data is written into a new file, and so on until all target data has been distributed.
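The splitting and parallel writing described in S21 to S23 can be sketched on a single machine as follows. This is a stand-in for what the computing engine would do on the cluster: the records are an in-memory list, parallelism is simply the file count capped by an assumed worker limit (the text also considers real-time load, which is not modelled here), and all function and file names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_records(records: list, records_per_file: int) -> list:
    """Cut the records into consecutive chunks of at most records_per_file."""
    return [records[i:i + records_per_file]
            for i in range(0, len(records), records_per_file)]


def match_parallelism(n_files: int, max_workers: int = 8) -> int:
    """Assumed rule: one worker per file, capped by max_workers, at least one."""
    return max(1, min(n_files, max_workers))


def write_chunk(directory: str, index: int, chunk: list) -> str:
    """Write one chunk to its own target file and return the file path."""
    path = os.path.join(directory, f"target_{index:05d}.txt")
    with open(path, "w", encoding="utf-8") as handle:
        for record in chunk:
            handle.write(record + "\n")
    return path


def write_target_files(records: list, records_per_file: int,
                       directory: str, max_workers: int = 8) -> list:
    """Split the records and write the chunks in parallel threads."""
    chunks = split_records(records, records_per_file)
    workers = match_parallelism(len(chunks), max_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so paths line up with chunk indexes.
        return list(pool.map(lambda item: write_chunk(directory, *item),
                             enumerate(chunks)))


def demo() -> list:
    """Write 25 records in files of 10 and return the per-file record counts."""
    with tempfile.TemporaryDirectory() as directory:
        records = [f"record-{i}" for i in range(25)]
        paths = write_target_files(records, 10, directory)
        counts = []
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                counts.append(sum(1 for _ in handle))
        return counts
```

Because each chunk goes to its own file, the writer threads never share a file handle and need no locking.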

S30: storing at least one target file in a distributed storage;

Specifically, the distributed storage may be GFS (Google File System) storage. GFS is a scalable distributed file system for large-scale distributed applications that access large amounts of data, and a GFS cluster consists of a master (management node) and a large number of chunkservers (storage servers). The distributed storage may also be NAS (Network Attached Storage), which is data-centric, completely separates storage devices from servers, and manages data centrally, thereby freeing bandwidth, improving performance, reducing total cost of ownership, and protecting investment. The distributed storage may also be NFS (Network File System) storage; NFS allows computers in a network to share resources over a TCP/IP network, saving local storage space.

In this embodiment, at least one target file obtained after splitting is stored in the distributed storage, so that target data required by a user is stored in the distributed storage, and some data that cannot be used temporarily are stored in the original disk, and memory pressure and bottleneck caused by big data are reduced by the classification of target data storage. When a user needs to look up target data, only the target file in the distributed memory needs to be called and checked, and the data query efficiency and the system performance are improved.

S40: marking at least one target file, and determining second identification information of each target file;

The second identification information comprises a calculation task list number, a target file number, the data content corresponding to the target file, and the like.

S50: sending the second identification information to the client through a data transmission interface;

S60: if the data transmission interface receives a data calling request sent by the client, sending the target file in the distributed storage that corresponds to the target identification information in the data calling request to the client through the data transmission interface.

Wherein the second identification information includes target identification information.

In this embodiment, the at least one target file is stored in the distributed storage and marked, and the second identification information obtained after marking is fed back to the client. The client user then inputs target identification information as required, so that the big data system retrieves the target file from the distributed storage according to the target identification information and feeds it back to the client. The user can therefore flexibly acquire the required data in batches, with the target file as the unit. Meanwhile, the computing capacity of the big data system is controlled and resources are allocated through the interface, which improves the utilization efficiency of network resources and increases the data utilization rate of the big data system. On the basis of meeting user requirements, unclear naming and code confusion caused by arbitrary naming by developers are prevented, development efficiency is improved, and maintenance and expansion are made easier.
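Steps S30 to S60 can be illustrated with a small in-memory sketch. Everything named here is hypothetical: `FileMark` is one possible shape for the second identification information (task list number, file number, content summary), and `FileCatalog` uses a Python dictionary as a stand-in for the distributed storage and its index; a real deployment would use GFS, NAS or NFS storage as described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileMark:
    """Hypothetical shape of the second identification information."""
    task_no: str      # calculation task list number
    file_no: int      # target file number
    description: str  # summary of the data content in the file


class FileCatalog:
    """In-memory stand-in for the distributed storage plus its index."""

    def __init__(self) -> None:
        self._files: Dict[Tuple[str, int], bytes] = {}

    def store_and_mark(self, task_no: str, files: List[bytes]) -> List[FileMark]:
        """Store each target file and return one mark per file (S30, S40)."""
        marks = []
        for file_no, content in enumerate(files, start=1):
            self._files[(task_no, file_no)] = content
            marks.append(FileMark(task_no, file_no, f"{len(content)} bytes"))
        return marks

    def fetch(self, task_no: str, file_no: int) -> bytes:
        """Return the file named by the target identification information (S60)."""
        try:
            return self._files[(task_no, file_no)]
        except KeyError:
            raise LookupError(f"no target file {file_no} for task {task_no}") from None
```

The list returned by `store_and_mark` corresponds to what S50 sends to the client; the client then passes one `(task_no, file_no)` pair back to `fetch` as the target identification information.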

In some embodiments of the present application, a specific implementation is provided. After S20, that is, after the target data is written into at least one target file according to the preset transmission data amount, the big data transmission method further includes the following step: updating the state information of the target data to a processed state.

In this embodiment, each time the big data system generates a target file, the state information of the target data written into the target file is updated to the processed state, so that the client user can learn the data transmission progress in time.

Further, S50, that is, sending the second identification information to the client through the data transmission interface, specifically includes the following steps:

S51: if the data transmission interface receives a state query request sent by the client, acquiring the state information of the target data;

S52: if the state information is the processed state, sending the second identification information to the client through the data transmission interface.

In this embodiment, when the client needs to check the progress of data processing, the client sends a state query request. The big data system obtains the state information of the target data in response to the request. If the state information of the target data is the processed state, this indicates that splitting of the target data is complete, and the big data system sends second identification information, such as the number of the target file in which the target data is located and the task list number of the splitting task, to the client through the data transmission interface, so that the client user can select the target file to be checked as required.

Specifically, for example, the newly allocated target files are moved to the shared GFS disk and numbered starting from 1 in increments of 1. The file number and the task list number (the second identification information) are stored in a relational database, where the file number represents a page number of the data transmission interface and the total count of file numbers represents the total number of pages of the data transmission interface, and the task status is updated to "processed". The client calls the interface with the task list number to query whether the task is completed; if the cluster computation is complete, the interface returns "completed" together with the total number of pages and the task list number. The client user then calls the data transmission interface with the task list number and a page number to acquire the data of one target file per call.
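The status query and page-by-page retrieval described in this example can be sketched as below. This is a simplified illustration under assumptions: `TransferTask`, `query_status`, `get_page` and `download_all` are hypothetical names, an in-memory dictionary replaces the relational database and the shared GFS disk, and a single status value, "processed", is used for a completed task.

```python
from typing import Dict, List


class TransferTask:
    """Hypothetical server-side record for one splitting task."""

    def __init__(self, task_no: str) -> None:
        self.task_no = task_no
        self.status = "processing"
        self.pages: List[bytes] = []  # index 0 holds file number 1

    def finish(self, files: List[bytes]) -> None:
        """Register the numbered files, then mark the task as processed."""
        self.pages = list(files)
        self.status = "processed"


def query_status(tasks: Dict[str, TransferTask], task_no: str) -> dict:
    """Status query: the page total is reported only once the task is processed."""
    task = tasks[task_no]
    if task.status != "processed":
        return {"task_no": task_no, "status": task.status}
    return {"task_no": task_no, "status": "processed",
            "total_pages": len(task.pages)}


def get_page(tasks: Dict[str, TransferTask], task_no: str, page: int) -> bytes:
    """Data call: page numbers start at 1 and map directly to file numbers."""
    task = tasks[task_no]
    if task.status != "processed" or not 1 <= page <= len(task.pages):
        raise LookupError(f"page {page} of task {task_no} is not available")
    return task.pages[page - 1]


def download_all(tasks: Dict[str, TransferTask], task_no: str) -> List[bytes]:
    """Client side: check the status, then fetch one target file per call."""
    reply = query_status(tasks, task_no)
    if reply["status"] != "processed":
        return []
    return [get_page(tasks, task_no, page)
            for page in range(1, reply["total_pages"] + 1)]
```

Because each call returns exactly one target file, the client can also request a single page of interest instead of downloading every page.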

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

The big data transmission method provided by the embodiment of the application can be applied to an application environment as shown in fig. 2, wherein a client communicates with a server of a big data system through an interface, the server of the big data system can perform information interaction with a Hadoop cluster of the big data system, the server has a data transmission interface, and the big data system further comprises an encryption and decryption module used for encrypting the sensitive data.

In an embodiment, a big data transmission device is provided, which corresponds one-to-one to the big data transmission method in the above embodiments. As shown in fig. 3, the big data transmission device includes a processing module 301, a storage module 302, and a communication module 303. The functional modules are explained in detail as follows:

the processing module 301 is configured to, if the data transmission interface receives a big data transmission request sent by a client, invoke a computing engine to obtain target data corresponding to first identification information in the big data transmission request from a database; calling a calculation engine to write target data into at least one target file according to a preset transmission data volume;

a storage module 302, configured to store at least one target file in a distributed storage;

the processing module 301 is further configured to perform a marking process on at least one target file, and determine second identification information of each target file;

the communication module 303 is configured to send the second identification information to the client; and if a data calling request sent by the client is received, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client, wherein the second identification information comprises the target identification information.

In an embodiment, the processing module 301 is specifically configured to determine, according to a preset transmission data amount and a total data amount of the target data, a file number of at least one target file; matching the parallelism according to the number of files; and writing the target data into at least one target file according to the parallelism and the preset transmission data amount.

In one embodiment, the big data transmission apparatus further includes: and the encryption and decryption module (not shown in the figure) is used for encrypting the sensitive data if the sensitive data exists in the target data.

In an embodiment, the encryption and decryption module is specifically configured to perform analysis processing on target data to obtain sensitive data meeting a sensitive screening condition; encrypting the sensitive data according to a preset secret key; and writing the encrypted sensitive data into the target data.

In an embodiment, the processing module 301 is specifically configured to process the first identification information and generate a control instruction; configuring parameter information of a calculation engine; and calling a calculation engine according to the parameter information and the control instruction to acquire target data.

In an embodiment, the processing module 301 is further configured to update the state information of the target data to a processed state; the communication module 303 is specifically configured to obtain state information of the target data if the data transmission interface receives a state query request sent by the client; and if the state information is in the processed state, sending the second identification information to the client through the data transmission interface.

In one embodiment, the distributed storage includes at least one of: GFS storage, NAS storage, NFS storage.

In an embodiment, the processing module 301 is further configured to compare user information of the client in the big data transmission request with authority information of the target data; and if the user information accords with the authority information, calling a calculation engine to acquire target data corresponding to the first identification information in the big data transmission request from the database.
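The permission comparison in the last embodiment above can be sketched as a guard placed in front of the calculation engine. The source does not specify how user information and authority information are represented, so this sketch assumes role sets: `authorized`, `handle_transfer_request`, the request keys `first_id` and `user_roles`, and the "at least one shared role" rule are all hypothetical.

```python
from typing import Callable, Dict, Set


def authorized(user_roles: Set[str], required_roles: Set[str]) -> bool:
    """Assumed rule: the user needs at least one role that the data permits."""
    return bool(user_roles & required_roles)


def handle_transfer_request(request: dict,
                            permissions: Dict[str, Set[str]],
                            run_engine: Callable[[str], list]) -> list:
    """Call the engine only if the user information matches the authority information."""
    data_id = request["first_id"]               # first identification information
    required = permissions.get(data_id, set())  # authority information of the data
    if not authorized(set(request["user_roles"]), required):
        raise PermissionError(f"user may not read {data_id}")
    return run_engine(data_id)
```

Data with no registered authority information is treated as inaccessible in this sketch, so an unknown identifier is rejected before the engine is invoked.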

When a client needs to acquire target data, the client accesses the data transmission interface provided by the big data system and initiates a big data transmission request. After receiving the request, the big data system parses the first identification information in the request and uses it to retrieve the target data required by the client. After the target data is obtained, the big data system distributes it into at least one target file according to the preset transmission data volume allowed by the data transmission interface, thereby splitting the big data. The at least one target file is then stored in the distributed storage and marked, and the second identification information obtained after marking is fed back to the client, so that the client user can flexibly acquire the required data in batches, with the target file as the unit. This technical scheme has the following effects. First, the right to acquire data is placed directly with the client, which can acquire data according to self-defined first or second identification information; this improves the timeliness of data exchange and lets the client flexibly acquire data content as required. Second, a large amount of target data is split into at least one target file, which can greatly improve the efficiency and convenience of transmitting large data files, avoids data redundancy and transmission channel blockage, and reduces the possibility of transmission failure.
Third, the computing capacity of the big data system is controlled and resources are allocated through the interface, which improves the utilization efficiency of network resources and increases the data utilization rate of the big data system. On the basis of meeting user requirements, unclear naming and code confusion caused by arbitrary naming by developers are prevented, development efficiency is improved, and maintenance and expansion are made easier.

For specific limitations of the big data transmission device, reference may be made to the above limitations on the big data transmission method, which are not repeated here. The modules in the big data transmission device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: if the data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database; writing the target data into at least one target file according to a preset transmission data volume; storing at least one target file in a distributed storage; marking at least one target file, and determining second identification information of each target file; sending the second identification information to the client through a data transmission interface; and if the data transmission interface receives a data calling request sent by the client, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: if the data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database; writing the target data into at least one target file according to a preset transmission data volume; storing at least one target file in a distributed storage; marking at least one target file, and determining second identification information of each target file; sending the second identification information to the client through the data transmission interface; and if the data transmission interface receives a data calling request sent by the client, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.

In one embodiment, a computer device is provided, which may be a client or a server, and its internal structure is shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer program is executed by a processor to implement the functions or steps of a big data transmission method.

Specifically, the computer devices provided in the embodiments of the present application may be, but are not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and servers. The server may be an independent server or a server cluster composed of a plurality of servers.

It should be noted that the functions or steps that can be implemented by the computer-readable storage medium or the computer device correspond to the descriptions in the foregoing method embodiments and, to avoid repetition, are not described again here.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that various changes and modifications may be made, and equivalents may be substituted for elements thereof without departing from the scope of the present invention; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims

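1. A big data transmission method is characterized by comprising the following steps:

if a data transmission interface receives a big data transmission request sent by a client, calling a computing engine to acquire target data corresponding to first identification information in the big data transmission request from a database;

writing the target data into at least one target file according to a preset transmission data volume;

storing the at least one target file in a distributed storage;

marking the at least one target file, and determining second identification information of each target file;

sending the second identification information to the client through the data transmission interface;

and if the data transmission interface receives a data calling request sent by the client, sending a target file corresponding to target identification information in the data calling request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.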

2. The big data transmission method according to claim 1, wherein the writing of the target data into at least one target file according to a preset transmission data amount comprises:

determining the file quantity of the at least one target file according to the preset transmission data quantity and the total data quantity of the target data;

matching parallelism according to the number of the files;

and writing the target data into the at least one target file according to the parallelism and the preset transmission data volume.

3. The big data transmission method according to claim 1, wherein after the invoking computing engine obtains the target data corresponding to the first identification information in the big data transmission request from a database, the method further comprises:

if sensitive data exists in the target data, encrypting the sensitive data.

4. The big data transmission method according to claim 3, wherein the encrypting the sensitive data comprises:

analyzing the target data to obtain the sensitive data meeting sensitive screening conditions;

encrypting the sensitive data according to a preset secret key;

and writing the encrypted sensitive data into the target data.

5. The big data transmission method according to claim 1, wherein the invoking calculation engine obtains the target data corresponding to the first identification information in the big data transmission request from a database, and includes:

processing the first identification information to generate a control instruction;

configuring parameter information of the calculation engine;

and calling the calculation engine according to the parameter information and the control instruction to acquire the target data.

6. The big data transmission method according to claim 1,

after writing the target data into at least one target file according to a preset transmission data volume, the method further comprises:

updating the state information of the target data to a processed state;

the sending the second identification information to the client through the data transmission interface includes:

if the data transmission interface receives a state query request sent by the client, acquiring state information of the target data;

and if the state information is in a processed state, sending the second identification information to the client through the data transmission interface.

7. The big data transmission method according to any one of claims 1 to 6, wherein the invoking calculation engine obtains target data corresponding to the first identification information in the big data transmission request from a database, and includes:

comparing user information of the client in the big data transmission request with authority information of the target data;

and if the user information accords with the authority information, calling a calculation engine to acquire target data corresponding to the first identification information in the big data transmission request from a database.

8. A big data transmission apparatus, comprising:

the processing module is used for calling a computing engine to acquire target data corresponding to first identification information in a big data transmission request from a database if the data transmission interface receives the big data transmission request sent by a client; calling the calculation engine to write the target data into at least one target file according to a preset transmission data volume;

the storage module is used for storing the at least one target file in a distributed storage;

the processing module is further configured to perform marking processing on the at least one target file, and determine second identification information of each target file;

the communication module is used for sending the second identification information to the client through the data transmission interface; and if a data call request sent by the client is received, sending a target file corresponding to target identification information in the data call request in the distributed storage to the client through the data transmission interface, wherein the second identification information comprises the target identification information.

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the big data transmission method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the big data transmission method according to any of claims 1 to 7.