CN116684437A - Distributed data management method based on natural language analysis - Google Patents

Distributed data management method based on natural language analysis

Info

Publication number
CN116684437A
Authority
CN
China
Prior art keywords
data
information
calculation module
information calculation
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310976377.XA
Other languages
Chinese (zh)
Other versions
CN116684437B (en)
Inventor
张玉磊
梅雪明
丁皓
张敬超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Liangjie Data Technology Co ltd
Original Assignee
Jiangsu Liangjie Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Liangjie Data Technology Co ltd filed Critical Jiangsu Liangjie Data Technology Co ltd
Priority to CN202310976377.XA
Publication of CN116684437A
Application granted
Publication of CN116684437B
Legal status: Active (granted)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/288 Distributed intermediate devices, i.e. intermediate devices for interaction with other intermediate devices on the same level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed data management method based on natural language analysis, comprising: in response to an instruction to gather data information, obtaining the data information from a data storage device; calculating the data information by using a second information calculation module to obtain intermediate data and sending the intermediate data to a data exchange device; sending the intermediate data in the data exchange device to a server; and calculating, by the server, the intermediate data by using a first information calculation module to obtain target data, wherein the second information calculation module is obtained from the first information calculation module through natural language analysis. When the volume of data the server must obtain from the data storage devices is very large, the technical solution of the present application greatly reduces the transmission of data information, saves time, and avoids the security risks, such as data leakage, caused by transmitting a large amount of data in segments.

Description

Distributed data management method based on natural language analysis
Technical Field
The application belongs to the field of distributed data management, and particularly relates to a distributed data management method based on natural language analysis.
Background
In the prior art, massive data are often distributed across different data storage devices; when a server needs to access the data, the data are typically read from the different data storage devices through a data exchange device and gathered at the server. However, the data storage devices themselves may be dispersed across different cities in different countries. Therefore, when data are acquired from the data storage devices through the data exchange device, the huge data volume forces the data to be transmitted in segments, which makes transmission slow and leaves the data vulnerable to attack, causing data leakage.
Disclosure of Invention
In order to overcome the defects in the prior art, the present application provides a distributed data management method based on natural language analysis.
The application adopts the following technical scheme.
The first aspect of the application discloses a distributed data management method based on natural language analysis, which comprises the following steps:
step S1, responding to an instruction for collecting data information, and acquiring the data information from a data storage device;
step S2, calculating the data information by using a second information calculation module to obtain intermediate data and sending the intermediate data to the data exchange equipment;
step S3, intermediate data in the data exchange equipment are sent to a server;
step S4, the server calculates the intermediate data by using a first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
A second aspect of the present application discloses a distributed data management system based on natural language analysis for performing the method of the first aspect, comprising: a plurality of data storage devices, a data exchange device, and a server;
the data information is stored in a distributed manner across the plurality of data storage devices; the data storage devices obtain the data information in response to the instruction for gathering the data information and calculate the data information by using the second information calculation module, thereby obtaining intermediate data, which is sent to the data exchange device;
the data exchange equipment is used for sending the intermediate data to the server;
the server side is used for sending an instruction for collecting data information, and calculating the intermediate data by using the first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
A third aspect of the present application discloses a terminal comprising a processor and a storage medium, characterized in that:
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of the first aspect.
A fourth aspect of the present application discloses a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to the first aspect.
Compared with the prior art, the application has the following advantages:
when the volume of data the server must obtain from the data storage devices is very large, the technical solution of the present application greatly reduces the transmission of data information, saves time, and avoids the security risks, such as data leakage, caused by transmitting a large amount of data in segments.
Drawings
FIG. 1 is a schematic diagram of a distributed data management system.
FIG. 2A is a schematic diagram of a distributed data management system based on natural language analysis, according to an embodiment of the present application.
FIG. 2B is a schematic diagram of a first logic module and a second logic module according to an embodiment of the application.
Fig. 3A is a code schematic diagram of a machine learning algorithm of an embodiment of the present application.
FIG. 3B is a code schematic diagram of another machine learning algorithm according to an embodiment of the present application.
Fig. 3C is a code schematic diagram after the first cutting module adapts to the interface according to an embodiment of the present application.
FIG. 4A is a schematic diagram of a communication method between a first logic module and a second logic module according to an embodiment of the present application.
Fig. 4B is a schematic diagram of a communication method between the first and second logic modules according to an embodiment of the present application based on the scenario of fig. 3B.
FIG. 5 is a flow chart of a distributed data management method based on natural language analysis according to an embodiment of the present application.
Detailed Description
The application is further described below with reference to the accompanying drawings. The following examples are only intended to illustrate the technical solutions of the present application more clearly and are not intended to limit its scope of protection.
A distributed data management system is shown in FIG. 1. The distributed data management system may include a plurality of data storage devices, at least one data exchange device, and a server.
The data storage device is used for storing data and generally consists mainly of magnetic disks, on which massive data are stored in the form of a database. The data storage device should also include a first interface for external access to the above-mentioned massive data and for storing it.
The data exchange device may be integrated inside the server, but it is usually deployed separately from the server so as to relieve the server's load. The data exchange device comprises at least a second interface and a third interface. It can be understood that the second interface is communicatively connected to the first interface of the data storage device, and the third interface is communicatively connected to the logic module of the server. The switching end of the data exchange device is also generally responsible for the secure transmission of data.
It can be understood that the first interface, the second interface, and the third interface are essentially also logic modules whose internal logic can be implemented in code. They are named separately only because the functions they bear are limited to serving as external interfaces.
In one general scenario of the present application, for example, the year-round wind power information of the cities of Guangzhou, Chengdu, and Harbin needs to be collected, and the wind power information with the highest wind power value then needs to be obtained. With reference to FIG. 1, it is not difficult to see that this process may include steps 1 to 3.
Step 1, data information is acquired from a data storage device in response to an instruction for gathering the data information.
In the general scenario, the data information corresponds to the wind power information. It will be appreciated that the instruction for gathering data information is typically sent by the server to the data storage devices via the data exchange device.
Step 2, the data information is sent to the server through the data exchange device.
Step 3, the server performs calculation according to the data information to obtain the target data.
In the general scenario, the server performs the calculation according to the data information by sorting the wind power information by its wind power value, and the target data corresponds to the wind power information with the highest wind power value.
The drawbacks of the above steps are readily apparent from the Background. In step 2, massive data information must be transmitted from the data storage devices to the data exchange device and then forwarded to the server. In addition, since the data storage devices are distributed across different cities, the transmission from a data storage device to the data exchange device may pass through many routing nodes, which puts no small pressure on communication transmission, and this transmission mode also makes data leakage very likely.
In view of the above, the present application discloses a distributed data management system based on natural language analysis, as shown in FIG. 2A. In contrast to FIG. 1, each data storage device further includes a second logic module, and the second logic module includes the first interface.
To illustrate the second logic module in more detail, FIG. 2B subdivides the first logic module and the second logic module. The first logic module includes a fourth interface, a first information calculation module, and other modules, where the first information calculation module is configured to handle the calculation process described in step 3 above, and the other modules are configured to handle other processes, for example, issuing the instruction to gather data information. The second information calculation module is obtained from the first information calculation module through natural language analysis.
Accordingly, referring to fig. 2A, it is not difficult to generalize the above process to include steps S1 to S4.
Step S1, data information is acquired from a data storage device in response to an instruction for gathering the data information.
And S2, calculating the data information by using a second information calculation module to obtain intermediate data and sending the intermediate data to the data exchange equipment.
And step S3, transmitting the intermediate data in the data exchange equipment to the server.
Step S4, the server calculates the intermediate data by using a first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
It will be appreciated that the above communication process may be as shown in FIG. 4A. It should be noted that the data exchange device is not shown in FIG. 4A, but it should be understood that when the data storage devices send the intermediate data to the server, the intermediate data must be combined at the data exchange device and then forwarded to the server. Likewise, when the server sends the second information calculation module to the data storage devices, it relies on the data exchange device as an intermediary to distribute the second information calculation module to the plurality of data storage devices.
It will be appreciated that, in the general scenario of the present application described above, natural language analysis readily shows that the second information calculation module is identical to the first information calculation module.
It should be noted that, in the present application, natural language analysis is mainly used to analyze the meaning of code. In fact, since compilers already come with certain analysis tools and the grammar of code is much stricter, analyzing the meaning of code is far less difficult than analyzing natural language. Therefore, a natural language analysis model only needs to be specifically trained on code to achieve the effects that the present application obtains through natural language analysis.
To describe the general principles of natural language analysis, this paragraph describes a non-general scenario. In some machine learning or deep learning scenarios, a neural network must be trained by collecting massive wind data, temperature data, and humidity data in order to produce a weather prediction; the code framework may be as shown in FIG. 3A. The first information calculation module may take the form of the function f1 in FIG. 3A or the function f3 in FIG. 3B. The input parameters are the wind data wind_datas, the temperature data temp_datas, and the humidity data humi_datas, respectively. The function ft_pre typically preprocesses the initial data, e.g., processes the "raw" temperature data to obtain the processed temperature data temp_datas2. The function ft_class classifies the data: one part of the processed temperature data, l_temp_datas, is used for machine learning, and the other part, s_temp_datas, is used for verifying the learning result. The functions ft_learning, ft_confirm, and f_adjustment are, respectively, the machine learning method, the method of verifying the learning result, and the method of adjusting the machine learning method according to the verification confidence.
It should be noted that, for simplicity, the code of FIG. 3A is written in C++-like form, but it does not strictly follow C++ syntax; for example, the input parameters on row #40 lack types. This does not affect the analysis of the embodiments of the present application.
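Since the figures themselves are not reproduced in this text, the following is a minimal C++ sketch of what the function f1 of FIG. 3A might look like, assembled purely from the description above. Only the names f1, ft_pre, ft_class, ft_learning, ft_confirm, f_adjustment, alg_params and the row numbers #40 to #44 come from the text; all types, the stub bodies, and the exact splitting logic are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Records = std::vector<double>;      // flat sensor readings
using DataSet = std::vector<Records>;     // readings cut into parts
struct ModelParams { Records weights; };  // the training model variable

// Stub bodies; in the figure these are real algorithm steps.
Records ft_pre(const Records& raw) { return raw; }            // preprocessing
void ft_class(const Records& flat, std::size_t base,          // cut into parts of
              DataSet& l, DataSet& s) {                       // `base` records,
    for (std::size_t off = 0; off < flat.size(); off += base) {  // split 9:1
        const std::size_t end = std::min(off + base, flat.size());
        const std::size_t cut = off + (end - off) * 9 / 10;
        l.emplace_back(flat.begin() + off, flat.begin() + cut);  // learning part
        s.emplace_back(flat.begin() + cut, flat.begin() + end);  // verification part
    }
}
void ft_learning(ModelParams&, const Records&) {}             // one learning step
double ft_confirm(ModelParams&, const Records&) { return 1.0; } // -> confidence
void f_adjustment(ModelParams&, double) {}                    // tune by confidence

// #40: the first information calculation module f1 (parameter types added here)
ModelParams f1(Records wind_datas, Records temp_datas, Records humi_datas) {
    (void)wind_datas; (void)humi_datas;  // handled analogously to temp_datas
    ModelParams alg_params;                                   // training model variable
    Records temp_datas2 = ft_pre(temp_datas);                 // #41
    DataSet l_temp_datas, s_temp_datas;
    ft_class(temp_datas2, 100, l_temp_datas, s_temp_datas);   // #42
    for (std::size_t i = 0; i < s_temp_datas.size(); ++i) {   // #43 (cf. #64, FIG. 3B)
        ft_learning(alg_params, l_temp_datas[i]);
        f_adjustment(alg_params, ft_confirm(alg_params, s_temp_datas[i]));
    }
    // #44: remaining work kept on the server after the cut; rendered here simply
    // as returning the result, since the actual figure row is not reproduced.
    return alg_params;
}
```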
It can be appreciated that, in the scenario where the data information is applied to machine learning or deep learning, in step S4 the second information calculation module is obtained from the first information calculation module through natural language analysis specifically via steps S101 to S103.
Step S101, obtaining the name of the training model variable. The training model variables are the set of all parameters in the algorithm model.
Step S102, finding the name of the training model variable in the first information calculation module, and cutting the first information calculation module into a first cutting module and a second cutting module, wherein the second cutting module does not contain any information of the training model variable.
Step S103, the training model variable is referenced in an input interface of the first cutting module, the first cutting module is used as a second information calculation module, and the second cutting module is used for replacing the first information calculation module.
Correspondingly, the data storage devices comprise at least a first data storage device, a second data storage device, and a third data storage device, and step S2 specifically comprises steps S201 to S203.
Step S201, calculating the data information by using the second information calculation module of the first data storage device so as to iterate the reference quantity, wherein the reference quantity comprises the training model variable.
Step S202, calculating the data information by using the second information calculation module of the second data storage device so as to iterate the reference quantity.
It will be appreciated that the second information calculation module in the second data storage device is transmitted not by the server but by the first data storage device via the data exchange device; it therefore carries the information of the reference quantity. The same applies to the second information calculation module of the third data storage device below.
Step S203, calculating the data information by using the second information calculation module of the third data storage device so as to iterate the reference quantity, and sending the iterated reference quantity to the server.
The communication method between the first logic module and the second logic module in the machine learning or deep learning scenario differs from the communication method in the general scenario; the difference can be seen by comparing FIG. 4B with FIG. 4A. The first data storage device, the second data storage device, and the third data storage device may correspond to the data storage device 1, the data storage device 2, and the data storage device 3 in FIG. 4B, respectively. It will be appreciated that the data exchange device may act as an intermediary when the data storage device 1 transmits the training model variables to the data storage device 2. Furthermore, when the data storage device 1 transmits the training model variables to the data storage device 2, the variables have at that point already been iterated by the data storage device 1. This explains why, as mentioned in step S103, the training model variables must be referenced in the input interface of the first cutting module; a sketch of this relay follows.
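A hedged sketch of the relay, continuing the types and helpers of the FIG. 3A sketch above; DataStorageDevice, second_info_calc, and relay are illustrative names, not from the original.

```cpp
#include <vector>

// Assumes Records, ModelParams, and ft_learning from the FIG. 3A sketch above.
struct DataStorageDevice {
    Records local_temp;  // this device's shard of the data information
    // Second information calculation module: rows #41-#43 of FIG. 3A resumed
    // from the incoming variables; the body is elided to a single stand-in step
    // and is spelled out in the cutting sketch further below.
    ModelParams second_info_calc(ModelParams alg_params) {
        ft_learning(alg_params, local_temp);   // stands in for rows #41-#43
        return alg_params;                     // the iterated reference quantity
    }
};

// S201 -> S202 -> S203: the variables enter and leave through the module's
// interface, so each device resumes where its predecessor stopped.
ModelParams relay(std::vector<DataStorageDevice>& devices) {
    ModelParams alg_params;                    // initial variables from the server
    for (auto& dev : devices)                  // forwarded via the exchange device
        alg_params = dev.second_info_calc(alg_params);
    return alg_params;                         // finally sent back to the server
}
```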
It will be appreciated that, in FIG. 3A for example, the training model variable is alg_params. In general, taking FIG. 3A as an example, the training result is stored in a class type and should at least include: the algorithm of the training model, the parameters of the training model, and the interface of the training model. The algorithm of the training model refers to the machine learning algorithm, deep neural network algorithm, or the like; the parameters of the training model refer to all the parameters corresponding to the algorithm of the training model, which are continuously optimized as each piece of data information is learned; and the interface of the training model internally references the training model variable and is used to obtain a specific conclusion from the input of new data information. The specific conclusion is the final goal of the algorithm of the training model. In short, the training model variables are trained with known data information so that, when new data information is input, a specific conclusion can be reached through the interface of the training model containing the training model variables.
It is not difficult to see that the parameters of a training model must generally be given certain parameter names, regardless of the content of the algorithm itself, for example: model parameters, tuning parameters, hyper-parameters, and so on. Thus, obtaining the name of the training model variable in step S101 may specifically include: analyzing the role of each variable in the first information calculation module one by one through natural language analysis; and determining the name of the training model variable according to the role of each variable. More importantly, an interface is inevitably overloaded in the training model, and the input parameters of that interface contain the type of the data information. Obtaining the name of the training model variable in step S101 may therefore further include: obtaining the type of the data information, searching for a function whose input parameters contain the type of the data information, and determining the name of the training model variable from that function, as sketched below.
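A hypothetical illustration of this second clue, continuing the sketch above; TrainResult, predict, and Conclusion are invented names. The point is only that the inference interface takes the data information type as input and references the training model variable internally, so the variable's name can be traced from it.

```cpp
// Assumes Records and ModelParams from the FIG. 3A sketch above.
struct Conclusion { double value; };   // the "specific conclusion" of the model

struct TrainResult {                   // training result stored in a class type
    ModelParams alg_params;            // <- candidate training model variable
    // Interface of the training model: its input parameter carries the type of
    // the data information, and its body references alg_params.
    Conclusion predict(const Records& new_datas) {
        double sum = 0.0;
        for (double w : alg_params.weights) sum += w;
        return Conclusion{sum + (new_datas.empty() ? 0.0 : new_datas.front())};
    }
};
```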
Typically, an algorithm is packaged in a separately written algorithm file, which may include a number of comments on the algorithm. Thus, step S101 further includes: confirming the role of a variable according to the comment information in the first information calculation module. In addition, considering that current natural language processing capability is still at a developing stage, such confirmation may also be assisted by manual judgment.
In FIG. 3A, it can be understood that the first cutting module includes the row codes #41 to #43, and the second cutting module includes the row code #44; the cut is sketched below.
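Rendered against the f1 sketch above, the cut of steps S102 and S103 might look as follows; this is an assumed reconstruction, not the figure itself.

```cpp
// Assumes the types and helpers of the FIG. 3A sketch above.
// First cutting module (rows #41-#43): alg_params is lifted into the input
// interface (step S103) so that a device can resume from a predecessor's state.
ModelParams first_cutting_module(ModelParams alg_params, Records temp_datas) {
    Records temp_datas2 = ft_pre(temp_datas);                // #41
    DataSet l_temp_datas, s_temp_datas;
    ft_class(temp_datas2, 100, l_temp_datas, s_temp_datas);  // #42
    for (std::size_t i = 0; i < s_temp_datas.size(); ++i) {  // #43
        ft_learning(alg_params, l_temp_datas[i]);
        f_adjustment(alg_params, ft_confirm(alg_params, s_temp_datas[i]));
    }
    return alg_params;
}

// Second cutting module (row #44): replaces f1 on the server. It contains no
// information about the training model variable itself; it only passes the
// relayed intermediate result on as the target data.
ModelParams second_cutting_module(ModelParams intermediate) {
    return intermediate;                                     // #44
}
```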
FIG. 3B gives another example scenario. In FIG. 3B, the training model variables are iteratively adjusted through the variable belief in the for loop; the variable belief represents the credibility of the current training model variables. In FIG. 3B, it can be understood that the first cutting module includes the row codes #61 to #68, and the second cutting module is empty. With reference to FIGS. 3B and 4B, it is not difficult to verify the feasibility of the transmission and invocation of the data information in steps S101 to S103 and S201 to S203 in the scenario of FIG. 3B.
More specifically regarding this feasibility, assume a first scenario in which the ft_class function cuts the data information into equal parts of a base data amount of 100w, where the ratio of l_temp_datas to s_temp_datas is 9:1. Assume further that there are three data storage devices and that the amounts of data on the first, second, and third data storage devices are 520w, 730w, and 930w, respectively. If the most primitive method is adopted, that is, all the data are transmitted to the server at once and processed uniformly by the server, then s_temp_datas.length() at the row code #64 is ceiling((520+730+930)/100) = 22, where ceiling denotes rounding up. If the methods of steps S101 to S103 and S201 to S203 are adopted instead, s_temp_datas.length() in the second information calculation modules of the first, second, and third data storage devices is 6, 8, and 10, respectively. The only difference between the two is that the number of cut parts is greater in the example of the present application. In many cases, when the amount of data in the final cut is too small compared with the base data amount, that data is effectively wasted. That is, the last 20w of data in the first data storage device, the last 30w in the second data storage device, and the last 30w in the third data storage device may not achieve a good training effect because their amounts are too small compared with the base data amount, thereby wasting a total of 80w of data.
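These counts can be checked mechanically. A small standalone program, assuming only the 100w base amount and the per-device volumes stated above (w denotes a unit of 10,000 records in the original):

```cpp
#include <cstdio>

int main() {
    const int base = 100;                  // base data amount per part, in w
    const int devs[3] = {520, 730, 930};   // data on the three storage devices
    int parts = 0, leftover = 0, total = 0;
    for (int d : devs) {
        parts    += (d + base - 1) / base; // ceiling(d / base): 6, 8, 10
        leftover += d % base;              // sub-threshold tails: 20, 30, 30
        total    += d;
    }
    std::printf("centralised parts : %d\n", (total + base - 1) / base); // 22
    std::printf("distributed parts : %d\n", parts);                     // 24
    std::printf("wasted data       : %dw\n", leftover);                 // 80w
    return 0;
}
```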
The above situation is usually left unaddressed, because handling it requires more advanced natural language analysis techniques that must at least analyze the effect of ft_class and track the flow of the verification data, e.g., s_temp_datas. This is a great difficulty for both current code semantic analysis and natural language analysis. However, since the function ft_class is generally independent of the actual algorithm module and is usually determined by the actual project and engineering, ft_class can be handled by preprocessing means such as manual marking. For example, a first mapping table can be defined directly and manually, in which a key represents a function name, e.g., ft_class, and a value represents the role of the function, e.g., "data cutting function". The first mapping table is used to perform targeted refinement of details, and the values of the first mapping table should be expressed as uniformly as possible. The reason for the uniform expression is that the preprocessing means further comprise a second mapping table, whose keys correspond exactly to the values of the first mapping table and whose values contain the processing means information; a sketch of the two tables follows.
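A minimal sketch of the two tables, assuming plain string keys; only the ft_class entry and its "data cutting function" value come from the text, and the ProcessingMeans fields paraphrase the processing means information described below.

```cpp
#include <map>
#include <string>

// Processing means information attached to a role (fields are paraphrases).
struct ProcessingMeans {
    std::string code_segment_condition;  // condition on the code segment that
                                         // references the output of g (e.g. #65)
    std::string param_threshold;         // threshold setting on a parameter,
                                         // e.g. the size of l_temp_datas[i]
    std::string adapt_function;          // adaptation function, e.g. "merge"
};

// First mapping table: function name -> role of the function.
const std::map<std::string, std::string> first_mapping = {
    {"ft_class", "data cutting function"},
};

// Second mapping table: role (exactly the values above) -> processing means.
const std::map<std::string, ProcessingMeans> second_mapping = {
    {"data cutting function",
     {"code segments referencing the output of ft_class",
      "l_temp_datas[i] smaller than the base data amount",
      "merge"}},
};
```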
Based on this, in some embodiments, in step S4 the process by which the second information calculation module is obtained from the first information calculation module through natural language analysis further includes steps S301 to S304, where steps S301 to S304 are performed before step S103.
Step S301, determining whether a function in the first information calculation module appears in a preset first mapping table.
Step S302, if a function g appears in the preset first mapping table, obtaining the role of the function g according to the first mapping table, and obtaining the processing means information according to a second mapping table.
Step S303, determining the verification data according to the processing means information in combination with natural language analysis.
Step S304, referencing the verification data in the input interface of the first cutting module, and modifying the first cutting module, in combination with the processing means information, so that it adapts to the interface.
Correspondingly, the reference quantity also includes the verification data.
Taking FIG. 3B as an example, assume that the function g is ft_class and that its role is "data cutting function"; the processing means information may then include condition settings for a specific code segment, threshold settings for parameters, and an adaptation function. The specific code segment here is typically the code segment that references the output of the function g (e.g., l_temp_datas), e.g., the row code #65. The threshold setting for the parameter may be a determination of the size of l_temp_datas[i]. The adaptation function is used to modify the first cutting module so that it adapts to the interface.
For convenience of explanation, the verification data is divided into first verification data and second verification data, where the first verification data is associated with the verification data in the reference quantity in step S201, and the second verification data with the verification data in the reference quantity in step S202. In the first scenario, it can be understood that the first verification data are l_temp_datas[5], l_wind_datas[5], l_humi_datas[5], s_temp_datas[5], s_wind_datas[5], and s_humi_datas[5], and the second verification data are l_temp_datas[7], l_wind_datas[7], l_humi_datas[7], s_temp_datas[7], s_wind_datas[7], and s_humi_datas[7]. In step S201, when the row code #64 begins its 6th execution, the data amount of l_temp_datas[5] is only 20×9/10 = 18w, so, according to the threshold setting for the parameter, the data not yet processed are packed and transmitted together (i.e., the verification data are referenced in the input interface of the first cutting module) to the second data storage device. In the second data storage device, the adaptation function may, as in the merge function of FIG. 3C, merge the 20w of data referenced from the first data storage device, i.e., quote. This solves the problem of data waste. The first cutting module, after being modified to adapt to the interface, is as shown in FIG. 3C and sketched below.
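A sketch of this adaptation under the same assumptions as above; merge and first_cutting_module_adapted are assumed renderings of the FIG. 3C code, which is not reproduced here.

```cpp
// Assumes Records, ModelParams, and first_cutting_module from the sketches above.
Records merge(const Records& quote, const Records& local) {
    Records out = quote;                               // predecessor's tail first
    out.insert(out.end(), local.begin(), local.end()); // then the local data
    return out;
}

// Adapted first cutting module: the sub-threshold shard referenced from the
// previous device (quote) is merged in front of the local data before cutting,
// so the 20w tail of the first device is no longer wasted.
ModelParams first_cutting_module_adapted(ModelParams alg_params,
                                         const Records& quote,      // verification
                                         const Records& temp_datas) // data in the
{                                                                   // reference
    return first_cutting_module(alg_params, merge(quote, temp_datas)); // quantity
}
```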
In summary, the present application discloses a distributed data management system based on natural language analysis, comprising a plurality of data storage devices, a data exchange device, and a server.
The data information is stored in a distributed manner across the plurality of data storage devices; the data storage devices obtain the data information in response to the instruction for gathering the data information and calculate the data information by using the second information calculation module, thereby obtaining intermediate data, which is sent to the data exchange device.
The data exchange device is used for sending the intermediate data to the server.
The server side is used for sending an instruction for collecting data information, and calculating the intermediate data by using the first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
While the applicant has described and illustrated the embodiments of the present application in detail with reference to the drawings, those skilled in the art should understand that the above embodiments are only preferred embodiments of the present application. The detailed description is intended only to help the reader better understand the spirit of the present application, not to limit its scope of protection; on the contrary, any improvement or modification based on the spirit of the present application shall fall within the scope of protection of the present application.

Claims (9)

1. A distributed data management method based on natural language analysis is characterized by comprising the following steps S1-S4;
step S1, responding to an instruction for collecting data information, and acquiring the data information from a data storage device;
step S2, calculating the data information by using a second information calculation module to obtain intermediate data and sending the intermediate data to the data exchange equipment;
step S3, intermediate data in the data exchange equipment are sent to a server;
step S4, the server calculates the intermediate data by using a first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
2. The distributed data management method based on natural language analysis according to claim 1, wherein in step S4, when the data information is applied to a machine learning or deep learning scenario, the second information calculation module is obtained from the first information calculation module through natural language analysis specifically via steps S101 to S103;
step S101, obtaining the name of the training model variable;
step S102, finding the name of the training model variable in the first information calculation module, and cutting the first information calculation module into a first cutting module and a second cutting module, wherein the second cutting module does not contain any information about the training model variable;
step S103, referencing the training model variable in an input interface of the first cutting module, taking the first cutting module as the second information calculation module, and replacing the first information calculation module with the second cutting module;
correspondingly, the data storage devices comprise at least a first data storage device, a second data storage device and a third data storage device, and step S2 specifically comprises steps S201-S203;
step S201, calculating the data information by using the second information calculation module of the first data storage device so as to iterate the training model variables;
step S202, calculating the data information by using the second information calculation module of the second data storage device, according to the training model variables iterated in step S201, so as to iterate the training model variables;
step S203, calculating the data information by using the second information calculation module of the third data storage device, according to the training model variables iterated in step S202, so as to iterate the training model variables, and sending the iterated training model variables to the server.
3. The distributed data management method based on natural language analysis according to claim 2, wherein obtaining the name of the training model variable in step S101 specifically comprises: analyzing the role of each variable in the first information calculation module one by one through natural language analysis; and determining the name of the training model variable according to the role of each variable.
4. The method of claim 2, wherein obtaining the name of the training model variable in step S101 further comprises: obtaining the type of the data information, searching for a function whose input parameters contain the type of the data information, and determining the name of the training model variable from that function.
5. The distributed data management method according to claim 2, wherein step S101 further comprises: confirming the role of a variable according to the comment information in the first information calculation module.
6. The distributed data management method based on natural language analysis according to claim 2, wherein in step S4, the process by which the second information calculation module is obtained from the first information calculation module through natural language analysis further comprises steps S301 to S304, wherein steps S301 to S304 are performed before step S103;
step S301, judging whether a function in the first information calculation module appears in a preset first mapping table;
step S302, if a function g appears in the preset first mapping table, obtaining the role of the function g according to the first mapping table, and obtaining the processing means information according to a second mapping table;
step S303, determining the verification data according to the processing means information in combination with natural language analysis;
step S304, referencing the verification data in the input interface of the first cutting module, and modifying the first cutting module, in combination with the processing means information, so that it adapts to the interface;
correspondingly, the reference quantity also includes the verification data.
7. A distributed data management system based on natural language analysis for performing the method of any one of claims 1-6, comprising: a plurality of data storage devices, a data exchange device, and a server;
the data information is stored in a distributed manner across the plurality of data storage devices; the data storage devices obtain the data information in response to the instruction for gathering the data information and calculate the data information by using the second information calculation module, thereby obtaining intermediate data, which is sent to the data exchange device;
the data exchange equipment is used for sending the intermediate data to the server;
the server side is used for sending an instruction for collecting data information, and calculating the intermediate data by using the first information calculation module to obtain target data; the second information calculation module is obtained by the first information calculation module through natural language analysis.
8. A terminal comprising a processor and a storage medium, characterized in that:
the storage medium is used for storing instructions;
the processor being operative according to the instructions to perform the steps of the method according to any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
CN202310976377.XA 2023-08-04 2023-08-04 Distributed data management method based on natural language analysis Active CN116684437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310976377.XA CN116684437B (en) 2023-08-04 2023-08-04 Distributed data management method based on natural language analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310976377.XA CN116684437B (en) 2023-08-04 2023-08-04 Distributed data management method based on natural language analysis

Publications (2)

Publication Number Publication Date
CN116684437A true CN116684437A (en) 2023-09-01
CN116684437B CN116684437B (en) 2023-10-03

Family

ID=87789512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310976377.XA Active CN116684437B (en) 2023-08-04 2023-08-04 Distributed data management method based on natural language analysis

Country Status (1)

Country Link
CN (1) CN116684437B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164083A1 (en) * 2017-11-28 2019-05-30 Adobe Inc. Categorical Data Transformation and Clustering for Machine Learning using Natural Language Processing
US20200301932A1 (en) * 2019-03-20 2020-09-24 Promethium, Inc. Using stored execution plans for efficient execution of natural language questions
US20220019934A1 (en) * 2020-07-15 2022-01-20 Bank Of America Corporation System for artificial intelligence-based electronic data analysis in a distributed server network
CN112769907A (en) * 2020-12-29 2021-05-07 苏宁消费金融有限公司 Internal data exchange system of distributed system based on internet financial industry
CN113297218A (en) * 2021-05-20 2021-08-24 广州光点信息科技有限公司 Multi-system data interaction method, device and system
CN113807950A (en) * 2021-09-22 2021-12-17 平安银行股份有限公司 Business analysis method based on natural language processing model and related device
CN114035936A (en) * 2021-10-15 2022-02-11 北京潞晨科技有限公司 Multidimensional parallel processing method, system and equipment based on artificial intelligence and readable storage medium
CN114238438A (en) * 2021-12-10 2022-03-25 北京天融信网络安全技术有限公司 Method, device, equipment and medium for real-time calculation and statistics of data
CN115017420A (en) * 2022-05-06 2022-09-06 上海捷晓信息技术有限公司 Intelligent address searching system and method based on deep learning
CN115879541A (en) * 2022-12-05 2023-03-31 阿里巴巴(中国)有限公司 Data processing method, data representation learning method, system and equipment
CN116306897A (en) * 2023-02-06 2023-06-23 杭州电子科技大学 Neural network distributed automatic parallel training method based on AC reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SEONGIK PARK et al.: "A Neural Language Model for Multi-Dimensional Textual Data based on CNN-LSTM Network", 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) *
刘旭东; 苏马婧; 朱广宇: "Research and Design of a Multi-source Intelligence Analysis System Based on Natural Language Processing" (in Chinese), Information Technology and Network Security, No. 05
杨越童: "Research on Distributed Training Optimization Technology for Sparse Models in the NLP Field" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology

Also Published As

Publication number Publication date
CN116684437B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN110704518B (en) Business data processing method and device based on Flink engine
CN106528880B (en) Method and system for regulating data structure format of multi-source power business data
CN105072130B (en) A kind of ASN.1 decoders code automatic generation method
CN111585344B (en) Substation intelligent checking method and device based on total station IED simulation
CN111191767A (en) Vectorization-based malicious traffic attack type judgment method
CN111427940B (en) Self-adaptive database conversion method and device
CN111048080A (en) Intelligent dispatching command system based on voice recognition technology
CN115170344A (en) Intelligent processing method and device, medium and equipment for operation events of regulation and control system
CN116684437B (en) Distributed data management method based on natural language analysis
CN101021916A (en) Business process analysis method
CN114817178A (en) Industrial Internet data storage method, system, storage medium and electronic equipment
CN113434123A (en) Service processing method and device and electronic equipment
CN109033483B (en) Method, device and system for defining data relationship in YANG model
CN116628451B (en) High-speed analysis method for information to be processed
CN113094932A (en) Method, device, equipment and storage medium for acquiring construction cost of power transformation project
CN111522705A (en) Intelligent operation and maintenance solution method for industrial big data
CN111507477A (en) Automatic machine learning platform based on block chain
CN115205030A (en) Wind-controlled user portrait system based on configurable big data analysis
CN112015726B (en) User activity prediction method, system and readable storage medium
CN113553728A (en) Method and system for generating intelligent substation station control layer application system model
CN113689310A (en) Data processing method based on industrial Internet of things intelligent chip and related equipment
CN113051445A (en) Industrial production data processing method and device, computer equipment and storage medium
CN110647546A (en) Third-party rule engine generation method and device
CN114884937B (en) New energy centralized control system data breakpoint continuous transmission method
CN117786705B (en) Statement-level vulnerability detection method and system based on heterogeneous graph transformation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant