CN114936187A - Data file processing method, device, equipment and storage medium - Google Patents

Data file processing method, device, equipment and storage medium

Info

Publication number
CN114936187A
CN114936187A
Authority
CN
China
Prior art keywords
rule
fragmentation
rule set
data file
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210557890.0A
Other languages
Chinese (zh)
Inventor
蒋冬良
周如龙
欧阳晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yaxin Technology Co ltd
Original Assignee
Guangzhou Yaxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yaxin Technology Co ltd
Priority to CN202210557890.0A
Publication of CN114936187A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this application provide a data file processing method and related products, relating to the field of big data. The method comprises the following steps: if a first rule set is not configured for the data file to be processed, verifying the fragmentation rules in a rule pool, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained through the verification; and fragmenting the data file to be processed through the second rule set and sending the resulting fragment files to the server, wherein the first rule set is a rule set adapted to the server. If an adapted first rule set exists, fragmentation can be performed directly through it; if not, the second rule set can be determined from past experience through the verification operation, and fragmentation can be performed through the second rule set. In this way, the fragmentation operation can significantly improve inference efficiency across different scenarios.

Description

Data file processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and an apparatus for processing a data file, an electronic device, and a computer-readable storage medium.
Background
With the advent of the artificial-intelligence era, industries have begun to focus on mining potential value from massive amounts of data. For example, operators, financial institutions, and governments can apply AI models to reason over the large volumes of data generated in various scenarios and so capture the value hidden in that data. Because server resources are limited, big data to be inferred is generally fragmented before inference, and inference is then performed on the fragmented data.
Currently, there are two ways of doing this. In the first, the big data file is given only a basic fragmentation after it is acquired, and inference is performed on the fragment files; this consumes enormous server resources (mainly CPU and memory) and yields low inference efficiency. The second makes only slight improvements to the basic fragmentation; it raises inference efficiency in a small fraction of scenarios, and the improvement in most scenarios is not significant.
Disclosure of Invention
The solution presented in the embodiments of this application aims to solve at least one of the technical problems described above.
According to an aspect of an embodiment of the present application, there is provided a method for processing a data file, the method including:
if a first rule set is not configured for the data file to be processed, verifying the fragmentation rules in a rule pool, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained through the verification;
and fragmenting the data file to be processed through the second rule set and sending the resulting fragment files to the server, wherein the first rule set is a rule set adapted to the server.
In one possible implementation, each fragmentation rule is configured with a priority level, and verifying the fragmentation rules in the rule pool may specifically include:
verifying each fragmentation rule in the rule pool in order of priority level from high to low, so as to determine a first matching degree between each fragmentation rule and the server resources; and determining the fragmentation rules whose first matching degree is greater than a preset threshold as matching rules.
In another possible implementation, determining the second rule set from the plurality of experience sets according to the at least one matching rule obtained through verification may specifically include:
determining a second matching degree between each experience set and the at least one matching rule, wherein each experience set comprises at least one fragmentation rule from the rule pool; and determining the second rule set according to the experience set with the largest second matching degree.
In another possible implementation, verifying each fragmentation rule in the rule pool to determine a first matching degree between each fragmentation rule and the server resources includes:
determining a first resource for each fragmentation rule, the first resource being the server resource required by that fragmentation rule to process the data file to be processed; and comparing a second resource against each first resource to determine the first matching degree between the corresponding fragmentation rule and the second resource, the second resource being the server resource currently available.
In another possible implementation, if the second rule set includes the feature-protection fragmentation rule, fragmenting the data file to be processed through the second rule set specifically comprises:
fragmenting the data file to be processed according to the other fragmentation rules in the second rule set to obtain at least two fragment files; and adjusting the qualifying data in each of the at least two fragment files.
In another possible implementation, if the data file to be processed is configured with the first rule set, the method comprises:
fragmenting the data file to be processed through the first rule set, and sending the resulting fragment files to the server;
wherein the first rule set being a rule set adapted to the server comprises: the first rule set is adapted to an inference model configured in the server.
In another possible implementation, if the first rule set is not configured for the data file to be processed, the method further comprises:
if the total memory occupied by the data file to be processed is not greater than a first threshold and the total number of entries in the data file to be processed is not greater than a second threshold, sending the data file to be processed to a single server.
According to another aspect of the embodiments of this application, there is provided a data file processing apparatus, comprising:
a verification module, configured to verify the fragmentation rules in the rule pool if the first rule set is not configured for the data file to be processed, and to determine a second rule set from a plurality of experience sets according to at least one matching rule obtained through the verification;
and a fragmentation module, configured to fragment the data file to be processed through the second rule set and send the resulting fragment files to the server, the first rule set being a rule set adapted to the server.
According to another aspect of embodiments of the present application, there is provided an electronic device including:
a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to perform the steps of the data file processing method illustrated herein.
According to a further aspect of the embodiments of this application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data file processing method presented herein.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the embodiment of the application provides a data file processing method, which comprises the following steps: if the data file to be processed is not configured with the first rule set, checking the fragmentation rules in the rule pool, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained by checking; carrying out fragment processing on the data file to be processed through a second rule set, and sending the obtained fragment file to a server; wherein the first rule set is a rule set adapted to the server. If the first rule set matched with the server exists, the data file to be processed can be subjected to fragmentation processing through the first rule set, and if the first rule set does not exist, the second rule set can be determined from past experience according to verification operation, and fragmentation processing is performed on the data file to be processed through the second rule set. Therefore, the reasoning efficiency can be obviously improved for the slicing operation under different scenes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1a is a schematic flowchart of a conventional offline reasoning method according to an embodiment of the present application;
fig. 1b is a schematic flowchart of an offline inference method based on basic segments according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data file processing method according to an embodiment of the present application;
fig. 3 is a schematic view of an application scenario of processing a data file according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a data file processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The drawings herein describe embodiments of the application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions used to explain the technical solutions of the embodiments of this application, and do not limit those technical solutions.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The existing approaches referred to in this application are first introduced and explained.
As described in the Background, fig. 1a and fig. 1b are flowcharts of specific implementations of the first and second existing modes, respectively.
Fig. 1a shows a flow diagram of a conventional offline inference method: after a big data file is received, it is given a basic fragmentation, and the resulting fragment files are sent to the inference nodes of the server for inference. However, this method consumes a large amount of server resources (mainly CPU and memory), and its inference efficiency is low.
Fig. 1b shows a flow diagram of an offline inference method based on basic fragmentation: after a big data file is received, an arbitrary fragmentation rule is selected from a rule pool to fragment it. Although the fragmentation operation in this mode helps improve inference efficiency in some scenarios, the improvement is not significant in most of them; for example, if the fragment files processed by the inference model vary greatly in data volume and the model requires complete data, the fragment files produced by a single fragmentation rule cannot meet its requirements.
The application provides a data file processing method, a data file processing device, an electronic device and a computer-readable storage medium, and aims to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application are explained below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps, etc. in different embodiments is not repeated.
Referring to fig. 2, an embodiment of this application provides a data file processing method. The method is applied to a terminal, which may be an electronic device such as a computer, and comprises steps S210-S220:
s210, if the first rule set is not configured in the data file to be processed, the fragment rules in the rule pool are verified, and a second rule set is determined from a plurality of experience sets according to at least one matching rule obtained through verification.
The data file to be processed may be an offline big data file, characterized by a large data volume or a large memory footprint. The big data file may be a single file of recorded data, or a folder containing a plurality of subfiles of recorded data. For a folder, the total number of entries of the data file to be processed is the sum of the entries of its subfiles, and the total memory occupied by the data file to be processed is the memory occupied by the folder.
Specifically, all validated fragmentation rules are recorded in the rule pool, and each fragmentation rule may be configured with a unique identifier, interface information, and other information. The first rule set or the second rule set may each include one or more fragmentation rules.
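As a rough sketch of the structure just described (all names and fields here are illustrative, not taken from the patent), such a rule pool might be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class FragmentationRule:
    """One validated fragmentation rule in the rule pool (illustrative fields)."""
    rule_id: str    # unique identifier
    interface: str  # interface information, e.g. an entry-point name
    priority: int   # 1 = first (highest) level, 3 = third (lowest) level

# The rule pool is simply the collection of all validated rules.
rule_pool = [
    FragmentationRule("data_volume", "shard_by_entries", priority=1),
    FragmentationRule("file_size", "shard_by_bytes", priority=1),
    FragmentationRule("node_efficiency", "shard_by_free_memory", priority=2),
    FragmentationRule("thread_count", "shard_by_threads", priority=2),
    FragmentationRule("feature_protect", "adjust_boundaries", priority=3),
]

# Verification order: priority level from high (1) to low (3).
ordered = sorted(rule_pool, key=lambda r: r.priority)
```

A rule set (first or second) would then simply be a subset of `rule_pool`.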
S220, fragment the data file to be processed through the second rule set and send the resulting fragment files to the server, wherein the first rule set is a rule set adapted to the server.
The inference model may be configured in the server before the offline big data is processed. Once the inference model is configured, a customized rule set that fits it may also be configured. Fragment files produced from the big data file through this customized rule set meet the inference model's requirements on fragment files. This customized rule set is the first rule set.
The embodiment of this application thus provides a data file processing method comprising the following steps: if a first rule set is not configured for the data file to be processed, verifying the fragmentation rules in the rule pool, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained through the verification; fragmenting the data file to be processed through the second rule set, and sending the resulting fragment files to the server; the first rule set being a rule set adapted to the server. If a first rule set adapted to the server exists, the data file to be processed can be fragmented through it; if not, the second rule set can be determined from past experience through the verification operation, and the data file to be processed can be fragmented through the second rule set. In this way, the fragmentation operation can significantly improve inference efficiency across different scenarios.
The embodiment of this application also provides a possible implementation in which each fragmentation rule is configured with a priority level, and verifying the fragmentation rules in the rule pool includes steps Sa1-Sa2 (not shown).
Sa1, verify each fragmentation rule in the rule pool in order of priority level from high to low, to determine a first matching degree between each fragmentation rule and the server resources.
Optionally, the rule pool includes, but is not limited to, the following fragmentation rules: the data volume fragmentation rule, the inference node efficiency fragmentation rule, the thread count fragmentation rule, the file size fragmentation rule, and the feature-protection fragmentation rule.
In one example, the priority levels may be set, from high to low, as a first level, a second level, and a third level. For example, the priority levels of the data volume fragmentation rule and the file size fragmentation rule are set to the first level, those of the inference node efficiency fragmentation rule and the thread count fragmentation rule to the second level, and that of the feature-protection fragmentation rule to the third level. Based on this setting, the verification process may check the fragmentation rules of each level in order: first level, then second level, then third level.
In one example, a high priority level may be assigned to frequently used fragmentation rules that meet most inference models' requirements on fragment files, such as the data volume fragmentation rule; a low priority level may be assigned to less frequently used rules, such as the feature-protection fragmentation rule.
In one example, each fragmentation rule can be understood as follows. Data volume fragmentation rule: fragment the data file to be processed according to the second threshold; for example, if the total number of entries is 1,000,000 and the second threshold is 300,000, four fragment files are obtained, with data volumes of 300,000, 300,000, 300,000, and 100,000 entries respectively; the second threshold may be input by a user or take a default value. File size fragmentation rule: fragment the data file to be processed according to the first threshold; for example, if the file occupies 8 GB of memory and the first threshold is 2 GB, four fragment files of 2 GB each are obtained. Inference node efficiency fragmentation rule: fragment with reference to the servers whose remaining memory is greater than zero, those servers being configured with the inference model. Thread count fragmentation rule: fragment according to the total number of threads available to process the fragment files; for example, if the total number of threads is 3, the data file to be processed is divided into 3 fragment files occupying equal memory; the total number of threads may be input by a user or take a default value. The feature-protection fragmentation rule is generally applied last in the fragmentation process and is used to preserve the integrity of the data in each individual fragment file.
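The data volume and file size rules described above reduce to the same chunking arithmetic. A minimal sketch, using the thresholds from the example (function names are illustrative):

```python
def shard_by_entries(total_entries, threshold):
    """Data volume rule: split into chunks of at most `threshold` entries."""
    sizes = []
    remaining = total_entries
    while remaining > 0:
        sizes.append(min(threshold, remaining))
        remaining -= threshold
    return sizes

# 1,000,000 entries with a 300,000-entry second threshold -> four fragment files
print(shard_by_entries(1_000_000, 300_000))  # [300000, 300000, 300000, 100000]

def shard_by_size(total_bytes, threshold_bytes):
    """File size rule: the same arithmetic applied to the memory footprint."""
    return shard_by_entries(total_bytes, threshold_bytes)

# An 8 GB file with a 2 GB first threshold -> four 2 GB fragment files
print(shard_by_size(8, 2))  # [2, 2, 2, 2]
```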
In one possible implementation, the method further includes:
and responding to the newly added operation carrying the new fragmentation rule, adding the new fragmentation rule to the rule pool, and setting a priority level for the new fragmentation rule.
In one possible implementation, the method further includes:
and updating the priority level of the target fragmentation rule in response to the updating operation aiming at the target fragmentation rule.
With the progress and development of the reasoning model, the old fragmentation rule may not adapt to changes and become infrequently used or unused, and the priority level of the old fragmentation rule can be reduced; or, if the existing fragmentation rules are not applicable, the rule pool can be enriched by expanding new fragmentation rules.
The embodiment of this application further provides a possible implementation: if the second rule set includes the feature-protection fragmentation rule, fragmenting the data file to be processed through the second rule set may specifically include:
fragmenting the data file to be processed according to the other fragmentation rules in the second rule set to obtain at least two fragment files; and adjusting the qualifying data in each of the at least two fragment files based on the feature-protection fragmentation rule.
Specifically, when the data file to be processed is fragmented according to the second rule set, every fragmentation rule in the second rule set must be executed. Note that when a fragmentation rule is applied at this stage, its priority level need not be consulted.
Specifically, the following processing is performed on each of the at least two fragment files: retrieve the target feature of each data record, sort the records within each fragment file, match the first and last records of adjacent fragment files, and move the qualifying data between fragment files and merge it.
In one example, fragment files A, B, and C are obtained, each describing quarterly production detail data. Fragment file A contains the production detail data for quarters 1-4 of 2019; fragment file B contains quarters 1-4 of 2020 plus quarter 1 of 2021; fragment file C contains quarters 2-4 of 2021. After the above processing is performed on fragment files A, B, and C, it is found that the "quarter 1 of 2021" data in fragment file B needs to be adjusted, so that data is moved from fragment file B to fragment file C.
It should be noted that this example merely illustrates one way of using the feature-protection fragmentation rule and is not intended to be limiting; the rule may be applied in other ways, and this application is not limited in that respect.
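One hypothetical way to implement the boundary adjustment from the quarterly example, sorting each fragment and then moving records whose group spills across a fragment boundary, is sketched below; the grouping key (the year) and data layout are assumptions for illustration only:

```python
def protect_features(fragments, group_key):
    """Feature-protection pass (sketch): sort each fragment, then move
    boundary records forward so that no group is split across two
    adjacent fragment files."""
    for frag in fragments:
        frag.sort(key=group_key)
    for i in range(len(fragments) - 1):
        cur, nxt = fragments[i], fragments[i + 1]
        # Match the last record of this fragment against the first record
        # of the next; move qualifying records forward while the boundary
        # still splits a group.
        while cur and nxt and group_key(cur[-1]) == group_key(nxt[0]):
            nxt.insert(0, cur.pop())
    return fragments

# Quarterly production records as (year, quarter), from the example above.
a = [(2019, q) for q in (1, 2, 3, 4)]
b = [(2020, q) for q in (1, 2, 3, 4)] + [(2021, 1)]
c = [(2021, q) for q in (2, 3, 4)]
protect_features([a, b, c], group_key=lambda rec: rec[0])
print(b)  # the 2021-Q1 record has been moved out of fragment B
print(c)  # fragment C now holds all four quarters of 2021
```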
The embodiment of this application further provides a possible implementation in which each fragmentation rule in the rule pool is verified to determine a first matching degree between that rule and the server resources, specifically including:
determining a first resource for each fragmentation rule, the first resource being the server resource required by that fragmentation rule to process the data file to be processed; and comparing a second resource against each first resource to determine the first matching degree between the corresponding fragmentation rule and the second resource, the second resource being the server resource currently available.
The second resource may include the remaining memory resources of each available server. The first resource may include the server memory resources required for each fragment file.
Specifically, the total number of entries and the total occupied memory of the data file to be processed are obtained, and for any fragmentation rule the following operations are performed: the fragmentation rule is used to perform a simulated fragmentation of the data file to be processed, yielding the memory occupied by each fragment file; the memory occupied by each fragment file is then compared, in order, against the remaining memory resources of each server in the second resource, to obtain the first matching degree between the first resource and the second resource.
Optionally, the ordered comparison may start from the fragment file occupying the most memory and proceed in decreasing order. During the comparison, the first matching degree may be determined from several indices, such as a first index and a second index. The first index reflects whether every fragment file can be matched to a server (that is, whether the server's remaining memory can accommodate the fragment file): if so, the first index takes its full score; if not, it is 0. The second index is determined as follows: after the server matching each fragment file has been determined, the utilization rate of each server is computed, and the value of the second index is determined from all the utilization rates.
Performing this verification on each fragmentation rule during the verification stage ensures that the available server resources are adapted to the fragment files that will be produced.
Sa2, determine the fragmentation rules whose first matching degree is greater than the preset threshold as matching rules.
One or more matching rules may be determined.
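Steps Sa1-Sa2 can be sketched as follows. The greedy placement and the way the two indices are combined (zero when any fragment cannot be placed, otherwise the mean utilization of the servers used) are assumptions, since the exact scoring is not fixed above:

```python
def first_matching_degree(fragment_sizes, server_free_mem):
    """Sketch of the first matching degree between one rule's simulated
    fragments (first resource) and the currently available servers
    (second resource)."""
    free = sorted(server_free_mem, reverse=True)
    used = [0.0] * len(free)
    # Compare in decreasing order of fragment size, placing each fragment
    # on the server with the most remaining room.
    for size in sorted(fragment_sizes, reverse=True):
        target = max(range(len(free)), key=lambda i: free[i] - used[i])
        if free[target] - used[target] < size:
            return 0.0  # first index fails: some fragment cannot be placed
        used[target] += size
    # Second index: average utilization of the servers that received work.
    utilizations = [u / f for u, f in zip(used, free) if u > 0]
    return sum(utilizations) / len(utilizations)

def matching_rules(simulated_scores, threshold=0.5):
    """Sa2: keep the rules whose first matching degree exceeds the threshold."""
    return [rule for rule, score in simulated_scores.items() if score > threshold]
```

For example, four 2 GB fragments against three servers with 4 GB free each can all be placed, yielding a nonzero degree, while a single 5 GB fragment against 4 GB servers scores zero.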
The embodiment of this application further provides a possible implementation in which determining the second rule set from the plurality of experience sets according to the at least one matching rule specifically includes:
determining a second matching degree between each experience set and the at least one matching rule, wherein each experience set comprises at least one fragmentation rule from the rule pool; and determining the second rule set according to the experience set with the largest second matching degree.
Optionally, a number of inference runs with high inference efficiency are screened out of the historical inference record, and the fragmentation rule sets used in those runs are taken as the experience sets. Alternatively, a user-input set of fragmentation rules is received and taken as an experience set. Each experience set may include one or more fragmentation rules.
Optionally, the second matching degree is determined as follows. If there is a single matching rule, the second matching degree is determined by computing the proportion that the matching rule represents within each experience set, and the experience set with the highest second matching degree is determined as the second rule set. If there is more than one matching rule, the experience sets that include at least one matching rule are screened out as qualifying, and the second matching degree of the remaining, non-qualifying experience sets is set to zero; then, for each qualifying experience set, the following operations are performed: determine the fragmentation rules that the experience set shares with the set of all matching rules, determine in turn a first proportion (the share of all matching rules covered by that overlap) and a second proportion (the share of the experience set covered by that overlap), and determine the second matching degree from the first and second proportions.
Optionally, if a single experience set has the largest second matching degree, that experience set is determined as the second rule set; if more than one experience set ties for the largest second matching degree, the second rule set is screened out of them according to the priority levels of their fragmentation rules.
In one example, screening the second rule set out of more than one experience set according to priority level may specifically include: counting, for each experience set, a first total of its fragmentation rules belonging to the first level, and determining the experience set with the largest first total as the second rule set; if several experience sets share the same largest first total, counting a second total of fragmentation rules belonging to the second level and determining the experience set with the largest second total as the second rule set; and so on until the second rule set is determined.
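The selection of the second rule set can be sketched as follows. Combining the two overlap proportions by multiplication is an assumption (the exact combination is left open above), and the priority-level tie-break is omitted for brevity:

```python
def second_matching_degree(experience_set, matching):
    """Sketch: overlap-based second matching degree between one experience
    set and the matching rules obtained from verification."""
    exp, match = set(experience_set), set(matching)
    overlap = exp & match
    if not overlap:  # experience sets with no matching rule score zero
        return 0.0
    first_prop = len(overlap) / len(match)  # share of all matching rules covered
    second_prop = len(overlap) / len(exp)   # share of the experience set matched
    return first_prop * second_prop         # assumed combination of the two

def select_second_rule_set(experience_sets, matching):
    """Pick the experience set with the largest second matching degree."""
    return max(experience_sets,
               key=lambda s: second_matching_degree(s, matching))
```

For example, with matching rules `["data_volume", "node_efficiency"]`, an experience set containing exactly those two rules scores 1.0 and is selected over partial overlaps.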
The embodiment of the present application further provides a possible implementation manner, where if the to-be-processed data file is configured with the first rule set, the method includes:
performing fragmentation processing on the data file to be processed through the first rule set, and sending the obtained fragment files to the server.
Wherein, the first rule set is a rule set adapted to the server, and comprises: the first rule set is adapted to an inference model configured in the server.
In one example, the server is configured with inference model A, inference model B, and inference model C, and the data volume processed by each inference model differs. The fragment files processed by inference model A are moderate in data volume and the business logic is simple, so a data-volume fragmentation rule may be designated as the first rule set of inference model A; the data volume of the fragment files processed by inference model B varies greatly and the business logic is simple, so a data-volume fragmentation rule and an inference-node-efficiency fragmentation rule may be designated; the fragment files processed by inference model C are moderate in data volume, and since inference model C is a time-series model and requires data integrity, a data-volume fragmentation rule and a feature-protection fragmentation rule may be designated as the first rule set of inference model C.
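The example above amounts to a configuration table mapping each inference model to its customized first rule set. A hypothetical sketch (model and rule identifiers are illustrative only):

```python
# Hypothetical mapping of inference models to their customized (first)
# rule sets, mirroring the example above.
FIRST_RULE_SETS = {
    "inference_model_A": ["data_volume"],                        # moderate volume, simple business
    "inference_model_B": ["data_volume", "node_efficiency"],     # highly variable volume
    "inference_model_C": ["data_volume", "feature_protection"],  # time-series, needs integrity
}

def first_rule_set(model_name):
    # Returns None when no customized rule set is configured, in which case
    # the file falls through to the rule-pool check described elsewhere.
    return FIRST_RULE_SETS.get(model_name)
```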
In actual processing, the memory occupied by the large data files to be analyzed varies: some files are larger and some are smaller, and fragmenting the smaller ones would only waste time.
The embodiment of the present application further provides a possible implementation manner, and if the to-be-processed data file is not configured with the first rule set, the method further includes:
if the total memory occupied by the data file to be processed is not greater than the first threshold and the total number of items in the data file to be processed is not greater than the second threshold, sending the data file to be processed to a single server.
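A minimal sketch of this two-threshold routing check (threshold values and units are deployment-specific and hypothetical):

```python
def route_unconfigured_file(memory_bytes, item_count, first_threshold, second_threshold):
    # A file that is small on both axes skips fragmentation entirely and is
    # sent to a single server; otherwise it proceeds to rule screening.
    if memory_bytes <= first_threshold and item_count <= second_threshold:
        return "single_server"
    return "fragment"
```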
The data file processing method disclosed in the embodiments of the present application is applicable to various fragmentation scenarios for large data files. To describe the method more clearly, the embodiments of the present application further provide a flow diagram of an offline big-data-file processing scheme, as shown in fig. 3. The scheme includes steps S1001 to S1006.
S1001, the system receives a large data file A.
S1002, the system determines whether file A is configured with a customized rule set (corresponding to the first rule set).
If the customized rule set is configured, S1005 is executed. If the customized rule set is not configured, S1003 is executed.
S1003, the system starts a verification module to verify the file A.
The data volume and the occupied memory of file A are obtained. If the data volume of file A is not greater than the data-volume threshold (corresponding to the second threshold) and the memory occupied by file A is not greater than the memory threshold (corresponding to the first threshold), S1006 is executed directly; otherwise, S1004 is executed.
S1004, the system checks all the fragmentation rules in the rule pool against file A, and screens out the fragmentation rule set adapted to file A (corresponding to the second rule set).
The rule pool includes the following fragmentation rules: a data-volume rule, an inference-node-efficiency fragmentation rule, a thread-count fragmentation rule, a file-size fragmentation rule, and a feature-protection fragmentation rule. New fragmentation rules can also be added to the rule pool.
Each fragmentation rule in the rule pool is provided with a priority level, and the priority level of each fragmentation rule is referred to in the process of determining the fragmentation rule set.
S1005, the system performs fragmentation processing on file A using the customized rule set or the screened fragmentation rule set.
Specifically, after the fragmentation processing, at least one fragmented file, for example, fragmented file 1, fragmented file 2, fragmented file 3, and fragmented file 4, is obtained.
And S1006, the system sends the fragment file 1, the fragment file 2, the fragment file 3 and the fragment file 4 to a server cluster for processing.
The server cluster comprises a plurality of servers, each server is an inference node, and each inference node is provided with an inference model. And after the server cluster receives the fragment file, distributing the fragment file to a corresponding server node for processing.
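Steps S1001 to S1006 above can be sketched as one pipeline. The collaborators are injected as callables because the text does not specify their interfaces (all signatures here are hypothetical):

```python
def process_big_data_file(file, *, customized_rules, mem_threshold,
                          item_threshold, screen_rules, fragment,
                          to_single_server, to_cluster):
    # S1001: receive the file (passed in as a dict with size metadata).
    if customized_rules is None:                       # S1002: no custom set
        if (file["memory"] <= mem_threshold
                and file["items"] <= item_threshold):  # S1003: verification
            return to_single_server(file)              # small file: skip to S1006
        rules = screen_rules(file)                     # S1004: adapted rule set
    else:
        rules = customized_rules
    return to_cluster(fragment(file, rules))           # S1005, then S1006
```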
Referring to fig. 4, an embodiment of the present application provides a data file processing apparatus, where the apparatus 400 may include: a verification module 410 and a fragmentation module 420.
The checking module 410 is configured to: if the data file to be processed is not configured with the first rule set, check the fragmentation rules in the rule pool, and determine a second rule set from multiple experience sets according to at least one matching rule obtained through the check.
The fragmentation module 420 is configured to perform fragmentation processing on the data file to be processed through the second rule set and send the obtained fragment files to the server, where the first rule set is a rule set adapted to the server.
In one possible implementation, each fragmentation rule is configured with a priority level; the checking module 410 is specifically configured to, in checking the fragmentation rule in the rule pool to obtain at least one matching rule:
sequentially checking each fragmentation rule in the rule pool in descending order of priority level, so as to determine a first matching degree between each fragmentation rule and the server resources; and determining each fragmentation rule whose first matching degree is greater than a preset threshold as a matching rule.
In one possible implementation, the checking module 410 is specifically configured, in determining the second rule set from the plurality of experience sets according to the at least one matching rule, to:
determining a second matching degree of each experience set and at least one matching rule, wherein the experience sets comprise at least one fragment rule in a rule pool; and determining a second rule set according to the experience set corresponding to the maximum second matching degree.
In a possible implementation manner, the checking module 410 is specifically configured, in checking each fragmentation rule in the rule pool and determining the first matching degree between each fragmentation rule and the server resources, to:
determine a first resource of each fragmentation rule, where the first resource is the server resource required by the corresponding fragmentation rule to process the data file to be processed; and compare a second resource with each first resource to determine the first matching degree between the corresponding fragmentation rule and the second resource, where the second resource is the currently available server resource.
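A sketch of the resource comparison and priority-ordered screening just described. Scoring by the worst availability ratio, capped at 1.0, is an assumption; the text only says the first and second resources are compared to obtain a degree:

```python
def first_matching_degree(first_resource, second_resource):
    # Compare the server resources a rule needs (first resource) with the
    # currently available resources (second resource), per resource kind.
    return min(min(second_resource[k] / first_resource[k], 1.0)
               for k in first_resource)

def screen_matching_rules(rule_pool, required_of, available, threshold):
    # Check rules from highest to lowest priority (1 = highest); keep those
    # whose first matching degree exceeds the preset threshold.
    ordered = sorted(rule_pool, key=lambda r: r["priority"])
    return [r["name"] for r in ordered
            if first_matching_degree(required_of[r["name"]], available) > threshold]
```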
In a possible implementation manner, if the second rule set includes the feature-protection fragmentation rule, the fragmentation module 420 is specifically configured, in performing fragmentation processing on the data file to be processed through the second rule set to obtain fragment files, to:
fragmenting the data file to be processed according to other fragmentation rules in the second rule set to obtain at least two fragment files;
and adjusting qualified data in each of the at least two fragmented files.
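One plausible form of this adjustment step is moving records so that all records sharing a feature key land in the same fragment file. The "first shard owns the key" policy below is an assumption; the text only requires that qualifying data be adjusted to preserve feature integrity:

```python
def apply_feature_protection(shard_files, feature_key):
    # After fragmenting by the other rules, reassign each record to the
    # fragment where its feature key first appeared, so one feature is
    # never split across fragments.
    owner = {}                                 # feature key -> fragment index
    adjusted = [[] for _ in shard_files]
    for index, shard in enumerate(shard_files):
        for record in shard:
            adjusted[owner.setdefault(feature_key(record), index)].append(record)
    return adjusted
```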
In a possible implementation manner, if the pending data file is configured with the first rule set, the fragmentation module 420 is further configured to:
process the data file to be processed through the first rule set, and send the obtained fragment files to the server.
Wherein, the first rule set is a rule set adapted to the server, and comprises: the first rule set is adapted to an inference model configured in the server.
In a possible implementation manner, if the first rule set is not configured for the data file to be processed, the checking module 410 may be further configured to:
if the total memory occupied by the data file to be processed is not greater than the first threshold and the total number of items in the data file to be processed is not greater than the second threshold, send the data file to be processed to a single server.
The apparatus of the embodiment of the present application may execute the method provided by the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus of the embodiments of the present application correspond to the steps in the method of the embodiments of the present application, and for the detailed functional description of the modules of the apparatus, reference may be specifically made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
In an embodiment of the present application, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory; the processor executes the computer program to implement the steps of the data file processing method described above.
In an alternative embodiment, an electronic device is provided, as shown in fig. 5, the electronic device 5000 shown in fig. 5 includes: a processor 5001 and a memory 5003. The processor 5001 and the memory 5003 are coupled, such as via a bus 5002. Optionally, the electronic device 5000 may further include a transceiver 5004, and the transceiver 5004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 5004 is not limited to one in practical application, and the structure of the electronic device 5000 is not limited to the embodiment of the present application.
The Processor 5001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 5001 may also be a combination of processors implementing computing functionality, e.g., a combination comprising one or more microprocessors, a combination of DSPs and microprocessors, or the like.
Bus 5002 can include a path that conveys information between the aforementioned components. The bus 5002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 5002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 5003 may be a ROM (read-only memory) or other type of static storage device capable of storing static information and instructions, a RAM (random-access memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (electrically erasable programmable read-only memory), a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and that can be read by a computer, without limitation.
The memory 5003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 5001 for execution. The processor 5001 is configured to execute computer programs stored in the memory 5003 to implement the steps shown in the foregoing method embodiments.
The electronic device includes, but is not limited to, a computer device.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program may implement the steps and corresponding contents of the foregoing method embodiments.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments may be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and claims of this application and in the preceding drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times. Under the scenario that the execution time is different, the execution sequence of the sub-steps or phases may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (10)

1. A method for processing a data file, the method comprising:
if the data file to be processed is not configured with the first rule set, checking the fragmentation rules in the rule pool, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained by checking;
and carrying out fragment processing on the data file to be processed through the second rule set, and sending the obtained fragment file to a server, wherein the first rule set is a rule set matched with the server.
2. The method of claim 1, wherein each fragmentation rule is configured with a priority level; the slicing rules in the check rule pool include:
sequentially verifying each fragmentation rule in the rule pool according to the sequence of the priority levels from high to low so as to determine a first matching degree of each fragmentation rule and server resources;
and determining the slicing rule corresponding to the first matching degree which is greater than a preset threshold value as the matching rule.
3. The method of claim 2, wherein determining the second set of rules from the plurality of empirical sets based on the at least one matching rule from the verification comprises:
determining a second matching degree of each experience set and the at least one matching rule, wherein the experience set comprises at least one slicing rule in the rule pool;
and determining the second rule set according to the experience set corresponding to the maximum second matching degree.
4. The method according to claim 2 or 3, wherein the checking each sharding rule in the rule pool to determine a first matching degree of each sharding rule with the server resource comprises:
determining a first resource of each fragmentation rule, wherein the first resource is a server resource required by the corresponding fragmentation rule for processing the data file to be processed;
and comparing the matching degree of the second resource with each item of the first resource to determine the first matching degree of the corresponding fragmentation rule and the second resource, wherein the second resource is the currently provided server resource.
5. The method of claim 1, wherein if the second rule set includes a feature protection fragmentation rule, the fragmentation processing of the data file to be processed through the second rule set comprises:
fragmenting the data file to be processed according to other fragmentation rules in the second rule set to obtain at least two fragment files;
and adjusting the qualified data in each of the at least two fragmented files.
6. The method of claim 1, wherein if the pending data file is configured with the first rule set, the method comprises:
carrying out fragment processing on the data file to be processed through the first rule set, and sending the obtained fragment file to a server;
wherein the first rule set is a rule set adapted to the server, and includes:
the first rule set is adapted to an inference model configured in the server.
7. The method of claim 1, wherein if the first rule set is not configured for the pending data file, the method further comprises:
and if the total memory occupied by the data files to be processed is not more than a first threshold value and the total number of the items of the data files to be processed is not more than a second threshold value, sending the data files to be processed to a single server.
8. An apparatus for processing a data file, the apparatus comprising:
the verification module is used for verifying the fragmentation rules in the rule pool if the first rule set is not configured in the data file to be processed, and determining a second rule set from a plurality of experience sets according to at least one matching rule obtained through verification;
and the fragmentation module is used for carrying out fragmentation processing on the data file to be processed through the second rule set and sending the obtained fragmentation file to a server, and the first rule set is a rule set matched with the server.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210557890.0A 2022-05-19 2022-05-19 Data file processing method, device, equipment and storage medium Pending CN114936187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210557890.0A CN114936187A (en) 2022-05-19 2022-05-19 Data file processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114936187A true CN114936187A (en) 2022-08-23

Family

ID=82865790

Country Status (1)

Country Link
CN (1) CN114936187A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501706A * 2023-06-28 2023-07-28 中国人民解放军总医院 Data configuration method and device for medical artificial intelligence model detection
CN116501706B * 2023-06-28 2023-09-19 中国人民解放军总医院 Data configuration method and device for medical artificial intelligence model detection


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination