CN118093059A - Multi-mode unstructured data processing method and device and electronic equipment - Google Patents

Publication number
CN118093059A
Authority
CN
China
Prior art keywords
processing
data
task
unstructured data
processing task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311595351.7A
Other languages
Chinese (zh)
Inventor
苏萌
刘译璟
李亚博
李彦泽
毛健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Percent Technology Group Co ltd
Original Assignee
Beijing Percent Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Percent Technology Group Co ltd filed Critical Beijing Percent Technology Group Co ltd
Priority to CN202311595351.7A priority Critical patent/CN118093059A/en
Publication of CN118093059A publication Critical patent/CN118093059A/en
Pending legal-status Critical Current

Landscapes

  • Stored Programmes (AREA)

Abstract

The embodiment of the application provides a multi-mode unstructured data processing method, a device, electronic equipment and a storage medium, wherein the multi-mode unstructured data processing method comprises the following steps: obtaining unstructured data to be processed in a target scene; configuring a processing task corresponding to the unstructured data according to the target scene by utilizing a predefined data format to obtain a configured processing task; and calling an application interface corresponding to the configured processing task, and processing unstructured data corresponding to the configured processing task through the application interface.

Description

Multi-mode unstructured data processing method and device and electronic equipment
Technical Field
The present application relates to the field of data processing, and in particular, to a method and apparatus for processing multi-mode unstructured data, an electronic device, and a storage medium.
Background
Unstructured data refers to data whose structure is irregular or incomplete, which has no predefined data model, and which is inconvenient to represent with a two-dimensional logical table of a database. Processing unstructured data often requires different processing methods for different data contents.
In some scenarios, because unstructured data takes diverse forms and its processing methods are disordered, the unstructured data of each specific scenario has to be handled at the code level of the application. This makes the processing cumbersome and the processing efficiency of unstructured data low.
Disclosure of Invention
The embodiment of the application aims to provide a multi-mode unstructured data processing method, a multi-mode unstructured data processing device, electronic equipment and a storage medium, which can improve the processing efficiency of unstructured data.
In order to solve the technical problems, the embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a method for processing multi-mode unstructured data, where the method for processing multi-mode unstructured data includes: obtaining unstructured data to be processed in a target scene; configuring processing tasks corresponding to unstructured data according to a target scene by utilizing a predefined data format to obtain configured processing tasks; and calling an application interface corresponding to the configured processing task, and processing unstructured data corresponding to the configured processing task through the application interface.
In a second aspect, an embodiment of the present application provides a multi-modal unstructured data processing apparatus, including: the acquisition module is used for acquiring unstructured data to be processed in a target scene; the configuration module is used for configuring the processing task corresponding to the unstructured data according to the target scene by utilizing a predefined data format to obtain the configured processing task; and the processing module is used for calling an application interface corresponding to the configured processing task and processing unstructured data corresponding to the configured processing task through the application interface.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete communication with each other through a communication bus; a memory for storing a computer program; a processor for executing a program stored on a memory for implementing the steps of the method for processing multi-modal unstructured data as mentioned in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the multi-modal unstructured data processing method as mentioned in the first aspect.
According to the technical scheme provided by the embodiment of the application, the unstructured data to be processed in the target scene is obtained, the data structure of the processing task corresponding to the unstructured data is configured according to the target scene by utilizing the predefined data format, the configured processing task is obtained, finally, the application interface corresponding to the configured processing task is called, and the data corresponding to the configured processing task is processed through the application interface.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-mode unstructured data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a module composition of a multi-mode unstructured data processing apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiment of the application aims to provide a multi-mode unstructured data processing method, a multi-mode unstructured data processing device, electronic equipment and a storage medium, which can improve the processing efficiency of unstructured data.
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, shall fall within the scope of the application.
As shown in fig. 1, an embodiment of the present application provides a multi-mode unstructured data processing method, where an execution body of the method may be a server, and the multi-mode unstructured data processing method may specifically include the following steps:
In step S101, unstructured data to be processed in a target scene is acquired.
Specifically, the target scene may be a video picture content recognition scene, a video speech recognition scene, a document element extraction scene, a document specific-content extraction scene, an uploaded-document translation scene, a false message recognition scene, a dataset field extraction scene, and the like. It should be noted that the target scene may also be another scene, and the embodiments of the present application are not limited in this respect. The unstructured data to be processed refers to data in the scene whose data structure is irregular or incomplete, which has no predefined data model, and which is inconvenient to represent with a two-dimensional logical table of a database. The processing of the unstructured data may include storage, information extraction, conversion, flow splitting (diversion), flow merging (convergence), parallel processing, and other processing, and each processing step of the unstructured data may correspond to one processing task.
In step S103, processing tasks corresponding to the unstructured data are configured according to the target scene by using a predefined data format, and the configured processing tasks are obtained.
Specifically, the predefined data format may be used for all processing steps of the unstructured data. The processing task corresponding to each processing step is configured through the data format, which connects the unstructured data to the applications: once a processing task has been configured according to the data format, the data corresponding to that task is suitable for processing by the corresponding application interface.
The predefined data format may be as shown in table 1 below:
TABLE 1 data format
When a processing task is configured, the requirements of the data format include at least one of the following: the configured processing tasks may contain nodes whose start_tag is true, and multiple threads may run the start stage at the same time; each node may have multiple wait conditions, each condition being that the application processing state of an upstream node id is complete; the execution flowchart must not contain a cycle, which is checked by an algorithm at both the front end and the back end; and the node corresponding to each processing task may fan out directly into multiple downstream steps, enabling multithreaded parallel processing. The nodes are used to execute the processing tasks.
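The start-node and no-cycle requirements above can be checked mechanically. The following is a minimal sketch, not the applicant's implementation: the text does not say which algorithm performs the check, so Kahn's topological sort is used here as one common choice, and the node layout (a dict with a start_tag flag) and the function name validate_flow are assumptions made for illustration.

```python
from collections import deque


def validate_flow(nodes, edges):
    """Check two of the stated constraints on a configured flow.

    nodes: dict mapping node id -> {"start_tag": bool}  (hypothetical layout)
    edges: list of (upstream_id, downstream_id) pairs
    Returns a (ok, message) tuple.
    """
    # Constraint: at least one node must be marked as a start node.
    if not any(node.get("start_tag") for node in nodes.values()):
        return False, "no node has start_tag set to true"

    indegree = {node_id: 0 for node_id in nodes}
    downstream = {node_id: [] for node_id in nodes}
    for up, down in edges:
        if up not in nodes or down not in nodes:
            return False, f"edge ({up}, {down}) references an unknown node"
        downstream[up].append(down)
        indegree[down] += 1

    # Kahn's algorithm: repeatedly release nodes whose upstream nodes are done.
    ready = deque(node_id for node_id, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node never released sits on a cycle or behind one.
    if visited != len(nodes):
        return False, "execution flowchart contains a cycle"
    return True, "ok"


example_nodes = {"a": {"start_tag": True}, "b": {}, "c": {}}
fan_out_check = validate_flow(example_nodes, [("a", "b"), ("a", "c")])
cycle_check = validate_flow(example_nodes, [("a", "b"), ("b", "c"), ("c", "b")])
```

In this sketch the fan-out from node a to nodes b and c passes, while the b-to-c-to-b loop is rejected. The same function could run at both the front end and the back end, which is one way to read the double check described above.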
In one possible implementation manner, configuring processing tasks corresponding to unstructured data according to a target scene by using a predefined data format, and obtaining the configured processing tasks includes: configuring a data structure of a processing task according to scene configuration information in a data format; reading a relation field value in a data structure of a processing task, wherein the relation field value comprises starting node information and ending node information; configuring an execution flow chart between processing tasks according to the relation field values; and configuring parameters of each processing task and configuring task result data among each processing task according to the execution flow chart.
Specifically, the scene configuration information refers to the scene id, the basic scene configuration parameters, and the like in the data format. When the data structure of a processing task is configured, the relationship field "relations" of the processing task is configured. This field contains the dependency relationships between the processing tasks of the unstructured data, together with the start node information and end node information corresponding to the processing tasks. Parsing the field values in "relations" yields the dependency relationships between the processing tasks.
Further, each processing task has parameters for its processing step. Specifically, the different parts of the processing task are combined into a complete parameter set according to predefined rules: according to the attribute and type of the processing task, the required key information is extracted from it and suitably format-converted, so that the configured parameters meet the requirements of subsequent processing tasks.
Furthermore, the processing tasks depend on one another. When a later processing task processes data, the data content produced by the processing task of the previous node needs to be parsed and assembled so that the result is integrated into a result set field, which makes the data source convenient for each application to use.
In one possible implementation, configuring an execution flow diagram between processing tasks according to a relationship field value includes: determining the dependency relationship among the processing tasks according to the relationship field value; and configuring an execution flow chart based on the dependency relationship, the initial node information and the end node information, wherein each node in the execution flow chart corresponds to one processing task, and each side represents the dependency relationship among the processing tasks.
Specifically, the relationship field "relations" in Table 1 includes the start node information "start_code", the end node information "end_code", and the id of the relationship table between the processing tasks. The relationship table indicates the dependency relationships between the processing tasks and can be regarded as a sequential or logical relationship describing their dependencies or execution order. By extracting "start_code" and "end_code", the dependency relationships between the processing tasks can be serialized into an execution flowchart, in which nodes and edges present the flow and the relationships between tasks: each node represents one processing task or step, and each edge represents a dependency or an execution order between processing tasks.
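As an illustration of how relation records might be turned into an execution flowchart, the sketch below builds downstream and upstream maps from a list of records. It assumes that each record carries one start_code and one end_code naming the two ends of a single edge; the text does not spell out the record layout, so this shape, the function name build_flow, and the task names are all hypothetical.

```python
def build_flow(relations):
    """Turn relation records into edges and per-node upstream sets.

    Each record is assumed to look like
    {"start_code": "<upstream task>", "end_code": "<downstream task>"}.
    """
    downstream = {}
    upstream = {}
    for relation in relations:
        start, end = relation["start_code"], relation["end_code"]
        # Register both ends so isolated ends still appear as nodes.
        for node in (start, end):
            downstream.setdefault(node, [])
            upstream.setdefault(node, set())
        downstream[start].append(end)  # edge: start -> end
        upstream[end].add(start)       # end waits for start to complete
    # Nodes with no upstream dependency can begin immediately.
    start_nodes = sorted(node for node, ups in upstream.items() if not ups)
    return {"downstream": downstream, "upstream": upstream, "start_nodes": start_nodes}


relations = [
    {"start_code": "ocr", "end_code": "extract"},
    {"start_code": "ocr", "end_code": "translate"},
    {"start_code": "extract", "end_code": "store"},
    {"start_code": "translate", "end_code": "store"},
]
flow = build_flow(relations)
```

Here the upstream set of a node plays the role of its wait conditions: the hypothetical store task may run only after both extract and translate are complete.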
In one possible implementation, configuring parameters of each processing task according to an execution flow chart includes: extracting target fields corresponding to the data format from the processing tasks according to the task sequence in the execution flow chart; extracting key data from processing task data of a processing task; and configuring parameters of each processing task according to the target field and the key data.
Specifically, when the parameters of a processing task are configured, the specific fields or data corresponding to the data format need to be extracted from the processing task and formatted. According to the design and requirements of the processing task and its attribute and type, the required key data is extracted from the processing task; the key data is the meaningful part of the unstructured data that the task has to process. From the target fields and the key data, the processing task can be accurately identified and the corresponding parameter set generated, and the parameters are suitably format-converted during assembly so as to meet the requirements of the application that subsequently executes the processing task.
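The parameter assembly described above might be sketched as follows. This is an assumption-laden illustration rather than the disclosed implementation: the field names scene_id, timeout, and payload, the example address, and the helper configure_parameters are invented for the example; only the address field is named in the text.

```python
def configure_parameters(task, field_converters, extract_key_data):
    """Assemble a parameter set for one processing task.

    task: dict describing the task (hypothetical layout)
    field_converters: target field name -> function performing format conversion
    extract_key_data: function that pulls the key data out of the task payload
    """
    params = {}
    for field, convert in field_converters.items():
        if field not in task:
            raise KeyError(f"task is missing required field {field!r}")
        params[field] = convert(task[field])  # format conversion per target field
    params["key_data"] = extract_key_data(task["payload"])
    return params


task = {
    "scene_id": 7,
    "address": "http://app.example/ocr",
    "timeout": "30",
    "payload": "  Invoice No. 42  ",
}
params = configure_parameters(
    task,
    field_converters={"scene_id": str, "address": str, "timeout": int},
    extract_key_data=str.strip,
)
```

Keeping the converters in a mapping means a new target field can be supported through configuration alone, which fits the stated aim of avoiding per-scene code changes.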
In one possible implementation, configuring task result data between processing tasks includes: determining a processing task corresponding to a previous node of the current node according to the execution flow chart; extracting a target field corresponding to a data format from task result data of a processing task corresponding to a previous node; the target field is configured into a task result dataset.
Specifically, after the processing task corresponding to the previous node has been executed, its processing result needs to be integrated into a result set field so that applications can conveniently use the data source. Once the previous node has finished processing, its task result is parsed; parsing involves extracting the target fields, separating data blocks, converting formats, and the like, in order to obtain the key data content from the task result. The parsed data is then assembled into the result set field, which involves inserting the parsed data into the corresponding field of the result set (the data field in the data format of Table 1 above) according to a predefined structure and rules.
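A minimal sketch of assembling the result set for a node from the results of its previous nodes is shown below. The data field name comes from the text; the upstream map, the result layout, and the function name build_result_set are assumptions for illustration.

```python
def build_result_set(current_node, upstream, results, target_fields):
    """Collect the target fields from each previous node into a `data` field.

    upstream: node id -> set of previous node ids (from the execution flowchart)
    results: node id -> task result dict produced by that node
    """
    data = {}
    for previous in sorted(upstream.get(current_node, ())):
        previous_result = results[previous]
        # Keep only the target fields; anything else in the result is dropped.
        data[previous] = {
            field: previous_result[field]
            for field in target_fields
            if field in previous_result
        }
    return {"node": current_node, "data": data}


upstream = {"extract": {"ocr"}}
results = {"ocr": {"text": "Invoice No. 42", "language": "en", "debug": "raw trace"}}
result_set = build_result_set("extract", upstream, results, ["text", "language"])
```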
In step S105, an application interface corresponding to the configured processing task is called, and unstructured data corresponding to the configured processing task is processed through the application interface.
Specifically, after the processing tasks are configured, they need to be processed further. The configured processing task is passed as a parameter to the distributor, the distributor sends a request to the corresponding application interface address in the data format (such as the address field in Table 1), and the unstructured data to be processed by the processing task is passed to the corresponding application for processing. The application interface is the core of unstructured data processing: it receives the parameters of the processing task from the distributor and processes the unstructured data according to a preset algorithm, logic, or configuration. These multi-mode unstructured data processing operations may involve screening, cleaning, conversion, aggregation, analysis, and the like, and other processing operations may be performed on the unstructured data according to actual requirements. The application interface processes the input unstructured data through the application's internal processing flows and functions, and outputs the processed results asynchronously.
According to the technical solution disclosed in the embodiments of the present application, unstructured data to be processed in a target scene is acquired; the data structure of the processing task corresponding to the unstructured data is configured according to the target scene by using a predefined data format, yielding a configured processing task; finally, the application interface corresponding to the configured processing task is called, and the data corresponding to the configured processing task is processed through that interface. In this way, when unstructured data is processed by an application, the data structure of the corresponding processing task is configured through the predefined data format; that is, processing tasks can be configured per scene using the predefined data format, and the tasks so configured are then used to process the unstructured data. The unstructured data therefore does not need to be handled at the code level of the application, which improves the processing efficiency of unstructured data.
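The distributor step can be sketched as a lookup from the address field to a handler, with results delivered asynchronously. This is only an illustration: a real deployment would presumably issue a network request to the application interface, whereas here plain callables registered under made-up app:// addresses stand in for the applications, and the Distributor class is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real application interfaces: address -> callable (assumption).
APPLICATIONS = {
    "app://clean": lambda params: params["text"].strip(),
    "app://upper": lambda params: params["text"].upper(),
}


class Distributor:
    """Routes each configured task to the application named by its address."""

    def __init__(self, applications, max_workers=4):
        self._applications = applications
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def dispatch(self, task):
        # Returns a Future, so the caller receives the result asynchronously.
        handler = self._applications[task["address"]]
        return self._pool.submit(handler, task["params"])

    def shutdown(self):
        self._pool.shutdown(wait=True)


distributor = Distributor(APPLICATIONS)
futures = [
    distributor.dispatch({"address": "app://clean", "params": {"text": "  raw text "}}),
    distributor.dispatch({"address": "app://upper", "params": {"text": "raw text"}}),
]
outputs = [future.result() for future in futures]
distributor.shutdown()
```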
In one possible implementation, after configuring the execution flow chart between processing tasks according to the relationship field values, the method further includes: converting the execution flow chart into JSON text; storing the JSON text in a Redis database; invoking an application interface corresponding to the configured processing task, and processing unstructured data corresponding to the configured processing task through the application interface comprises: reading a JSON text from the Redis database, respectively calling corresponding application interfaces according to processing tasks corresponding to nodes in the JSON text, and processing data corresponding to the configured processing tasks according to the task sequence in the JSON text through the application interfaces.
Specifically, the execution flowchart is converted into JSON text for recording, and the JSON text is stored in a Redis database. Redis is a high-performance in-memory database that can store and retrieve large amounts of data quickly. Storing the execution flowchart in the Redis database helps ensure its durability and reliability and protects it from damage and loss; the Redis database also supports efficient read and write operations, so reading the JSON text from it improves the read/write efficiency of the execution flowchart. Further, multiple nodes can read the execution flowchart information stored in the Redis database and then call the corresponding application interfaces to execute the processing tasks in the task order given by the execution flowchart. In this way, processing tasks can be executed in a distributed manner: the computing resources of multiple nodes are fully used, the parallel processing capacity for processing tasks is increased, and data processing efficiency is further improved. During distributed execution of the flow, the processing task of the first node may be written into an in-memory task queue and set to be executed with priority.
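The round trip of the execution flowchart through JSON text can be sketched as below. To keep the example runnable without a server, a small in-memory class stands in for the Redis client; it mimics only the set and get calls, which a redis-py client also provides. The key naming scheme flow:<scene id> and the flowchart layout are assumptions for illustration.

```python
import json


class InMemoryStore:
    """Stand-in exposing the same set/get calls that a Redis client offers."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value
        return True

    def get(self, key):
        return self._values.get(key)


def save_flow(store, scene_id, flow):
    # Serialize the execution flowchart to JSON text before storing it.
    store.set(f"flow:{scene_id}", json.dumps(flow, sort_keys=True))


def load_flow(store, scene_id):
    raw = store.get(f"flow:{scene_id}")
    if raw is None:
        raise KeyError(f"no execution flowchart stored for scene {scene_id!r}")
    if isinstance(raw, bytes):
        # A real Redis client returns bytes unless configured to decode responses.
        raw = raw.decode("utf-8")
    return json.loads(raw)


store = InMemoryStore()
flow = {
    "nodes": ["ocr", "extract", "store"],
    "edges": [["ocr", "extract"], ["extract", "store"]],
}
save_flow(store, "scene-1", flow)
loaded = load_flow(store, "scene-1")
```

Edges are written as lists rather than tuples because JSON has no tuple type; a tuple would come back as a list after the round trip.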
In one possible implementation manner, after calling an application interface corresponding to the configured processing task and processing unstructured data corresponding to the configured processing task through the application interface, the method further includes: and writing the processed data after the application interface processes the data corresponding to the processing task into a result memory queue and a persistence queue.
Specifically, the processing results of the processing tasks can be monitored in real time through Kafka, and the processing result data is written into a result memory queue and a persistence queue. A storage thread is then responsible for reading the processed data from the persistence queue, splitting it, and writing the split results field by field into Elasticsearch, HDFS, or Kafka for persistence, which avoids loss of the processed data and ensures its long-term storage and queryability. Further, a result processing thread reads the processed data from the result memory queue, parses it, and determines the next operation by obtaining the nextid of the processing task of the next node. Storing the processed data in both the result memory queue and the persistence queue improves real-time multi-mode unstructured data processing and result parsing: the data produced by the application interface for each processing task can be handled efficiently, and tasks can be scheduled according to the dependency relationships between the processing tasks.
Further, in the result processing thread, in addition to parsing the processed data and obtaining the nextid of the processing task of the next node, the necessary content also needs to be assembled into a suitable data structure. This data structure may include some of the key fields, parameters, or metadata in Table 1, so that the processing task of the next node can be processed correctly. After assembling the data structure, the result processing thread sends it to a task queue. The task queue acts as a transfer station: it receives the data structure from the result processing thread and distributes it to the node of the next processing task. Sending the data structure to the task queue closes the loop, so that the processing task of each node can process the result produced by the processing task of the previous node and generate, from that result, the data structure needed by the next node.
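The dual write and the storage thread can be sketched with standard-library queues. This is a simplified illustration: Kafka monitoring is replaced by a direct function call, a Python list stands in for Elasticsearch, HDFS, or Kafka, and the record layout (task_id, nextid, data) and the function names are assumptions.

```python
import queue
import threading

result_queue = queue.Queue()       # later consumed by the result processing thread
persistence_queue = queue.Queue()  # consumed by the storage thread
stored_fields = []                 # stand-in for the persistent store (assumption)
STOP = object()                    # sentinel telling the storage thread to exit


def on_result(result):
    """Write one processing result into both queues."""
    result_queue.put(result)
    persistence_queue.put(result)


def storage_worker():
    while True:
        item = persistence_queue.get()
        if item is STOP:
            break
        # Split the result and persist it field by field.
        for field, value in item["data"].items():
            stored_fields.append((item["task_id"], field, value))


worker = threading.Thread(target=storage_worker)
worker.start()
on_result({
    "task_id": "ocr-1",
    "nextid": "extract",
    "data": {"text": "Invoice No. 42", "language": "en"},
})
persistence_queue.put(STOP)
worker.join()
```

Because the two queues are independent, slow persistence does not delay scheduling of the next task, and the result queue still holds the record for the result processing thread.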
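The closed loop between the task queue and the result processing thread might look like the following sketch. It runs one task worker and one result worker for a two-step linear flow; the PIPELINE table, the handler functions, and the record layout are hypothetical, and only the nextid name is taken from the text.

```python
import queue
import threading

task_queue = queue.Queue()
result_queue = queue.Queue()
final_results = []
STOP = object()  # sentinel used to shut both workers down

# Hypothetical two-step flow: each node has a handler and the id of its next node.
PIPELINE = {
    "clean": {"handler": lambda data: {"text": data["text"].strip()}, "nextid": "upper"},
    "upper": {"handler": lambda data: {"text": data["text"].upper()}, "nextid": None},
}


def task_worker():
    while True:
        task = task_queue.get()
        if task is STOP:
            result_queue.put(STOP)
            break
        step = PIPELINE[task["node"]]
        result_queue.put({
            "node": task["node"],
            "nextid": step["nextid"],
            "data": step["handler"](task["data"]),
        })


def result_worker():
    while True:
        result = result_queue.get()
        if result is STOP:
            break
        if result["nextid"] is None:
            final_results.append(result["data"])
            task_queue.put(STOP)  # nothing left to schedule in this sketch
        else:
            # Assemble the structure the next node needs and close the loop.
            task_queue.put({"node": result["nextid"], "data": result["data"]})


threads = [threading.Thread(target=task_worker), threading.Thread(target=result_worker)]
for thread in threads:
    thread.start()
task_queue.put({"node": "clean", "data": {"text": "  invoice no. 42 "}})
for thread in threads:
    thread.join()
```

The shutdown logic is specific to this single-task example; a long-running service would keep both workers alive and would also have to handle nodes with several upstream dependencies.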
Corresponding to the multi-mode unstructured data processing method provided in the foregoing embodiment, based on the same technical concept, the embodiment of the present application further provides a multi-mode unstructured data processing device, and fig. 2 is a schematic block diagram of the multi-mode unstructured data processing device provided in the embodiment of the present application, where the multi-mode unstructured data processing device is configured to execute the multi-mode unstructured data processing method described in fig. 1, as shown in fig. 2, and the multi-mode unstructured data processing device 200 includes: an acquisition module 201, configured to acquire unstructured data to be processed in a target scene; the configuration module 202 is configured to configure a processing task corresponding to unstructured data according to a target scene by using a predefined data format, so as to obtain a configured processing task; and the processing module 203 is configured to call an application interface corresponding to the configured processing task, and process unstructured data corresponding to the configured processing task through the application interface.
According to the technical scheme provided by the embodiment of the application, the unstructured data to be processed in the target scene is obtained, the data structure of the processing task corresponding to the unstructured data is configured according to the target scene by utilizing the predefined data format, the configured processing task is obtained, finally, the application interface corresponding to the configured processing task is called, and the data corresponding to the configured processing task is processed through the application interface.
In one possible implementation, the configuration module 202 is further configured to configure a data structure of the processing task according to the scene configuration information in the data format; reading a relation field value in a data structure of a processing task, wherein the relation field value comprises starting node information and ending node information; configuring an execution flow chart between processing tasks according to the relation field values; and configuring parameters of each processing task and configuring task result data among each processing task according to the execution flow chart.
In one possible implementation, the configuration module 202 is further configured to determine a dependency relationship between the processing tasks according to the relationship field value; and configuring an execution flow chart based on the dependency relationship, the initial node information and the end node information, wherein each node in the execution flow chart corresponds to one processing task, and each side represents the dependency relationship among the processing tasks.
In one possible implementation, the configuration module 202 is further configured to convert the execution flow chart into JSON text; storing the JSON text in a Redis database; reading a JSON text from the Redis database, respectively calling corresponding application interfaces according to processing tasks corresponding to nodes in the JSON text, and processing data corresponding to the configured processing tasks according to the task sequence in the JSON text through the application interfaces.
In a possible implementation manner, the configuration module 202 is further configured to extract, from the processing tasks, a target field corresponding to the data format according to a task order in the execution flow chart; extracting key data from processing task data of a processing task; and configuring parameters of each processing task according to the target field and the key data.
In a possible implementation manner, the configuration module 202 is further configured to determine, according to the execution flow chart, a processing task corresponding to a previous node of the current node; extracting a target field corresponding to a data format from task result data of a processing task corresponding to a previous node; the target field is configured into a task result dataset.
In one possible implementation, the apparatus further includes: a writing module, configured to write the processed data, obtained after the application interface processes the data corresponding to the processing task, into the result memory queue and the persistence queue.
The multi-mode unstructured data processing device provided by the embodiment of the application can realize each process in the embodiment corresponding to the multi-mode unstructured data processing method, has the same or similar beneficial effects, and is not repeated here for avoiding repetition.
It should be noted that the multi-mode unstructured data processing device provided by the embodiment of the present application and the multi-mode unstructured data processing method provided by the embodiment of the present application are based on the same inventive concept, so the implementation of this embodiment can refer to the implementation of the multi-mode unstructured data processing method and has the same or similar beneficial effects; the repeated description is omitted.
Corresponding to the multi-mode unstructured data processing method provided in the foregoing embodiments, and based on the same technical concept, an embodiment of the present application further provides an electronic device configured to execute the multi-mode unstructured data processing method. FIG. 3 is a schematic structural diagram of an electronic device implementing the embodiments of the present application. As shown in FIG. 3, the electronic device may vary considerably depending on its configuration or performance, and may include one or more processors 301 and a memory 302, in which one or more applications or data may be stored. The memory 302 may be transient storage or persistent storage. The application programs stored in the memory 302 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the electronic device.
Still further, the processor 301 may be arranged to communicate with the memory 302 and execute a series of computer executable instructions in the memory 302 on an electronic device. The electronic device may also include one or more power supplies 303, one or more wired or wireless network interfaces 304, one or more input/output interfaces 305, and one or more keyboards 306.
In this embodiment, the electronic device includes a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete communication with each other through a bus; a memory for storing a computer program; the processor is configured to execute the program stored in the memory, implement each step in the method embodiment in fig. 1, and have the beneficial effects of the method embodiment, so that the embodiments of the present application are not repeated herein.
The embodiment also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the embodiment of the method of fig. 1, and has the advantages of the embodiment of the method, and in order to avoid repetition, the embodiment of the application is not described herein.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the electronic device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method of processing multi-modal unstructured data, the method comprising:
obtaining unstructured data to be processed in a target scene;
configuring a processing task corresponding to the unstructured data according to the target scene by utilizing a predefined data format, to obtain a configured processing task; and
calling an application interface corresponding to the configured processing task, and processing the unstructured data corresponding to the configured processing task through the application interface.
2. The method for processing multi-modal unstructured data according to claim 1, wherein configuring the processing task corresponding to the unstructured data according to the target scene by utilizing a predefined data format comprises:
configuring a data structure of the processing task according to scene configuration information in the data format;
reading a relation field value in the data structure of the processing task, wherein the relation field value comprises starting node information and ending node information;
configuring an execution flow chart among the processing tasks according to the relation field values; and
configuring parameters of each processing task according to the execution flow chart, and configuring task result data among the processing tasks.
3. The method of claim 2, wherein configuring the execution flow chart among the processing tasks according to the relation field values comprises:
determining the dependency relationships among the processing tasks according to the relation field values; and
configuring the execution flow chart based on the dependency relationships, the starting node information, and the ending node information, wherein each node in the execution flow chart corresponds to a processing task, and each edge represents a dependency relationship between processing tasks.
4. The method of claim 2, further comprising, after configuring the execution flow chart among the processing tasks according to the relation field values:
converting the execution flow chart into JSON text; and
storing the JSON text in a Redis database;
wherein calling the application interface corresponding to the configured processing task, and processing the unstructured data corresponding to the configured processing task through the application interface, comprises:
reading the JSON text from the Redis database, calling the corresponding application interface for the processing task corresponding to each node in the JSON text, and processing, through the application interfaces, the data corresponding to the configured processing tasks in the task order given in the JSON text.
5. The method of claim 2, wherein configuring parameters of each of the processing tasks according to the execution flow chart comprises:
extracting target fields corresponding to the data format from the processing tasks according to the task order in the execution flow chart;
extracting key data from the processing task data of the processing task; and
configuring parameters of each processing task according to the target fields and the key data.
6. The method of claim 2, wherein configuring task result data among the processing tasks comprises:
determining, according to the execution flow chart, the processing task corresponding to the node preceding the current node;
extracting a target field corresponding to the data format from the task result data of the processing task corresponding to the preceding node; and
configuring the target field into a task result dataset.
7. The method for processing multi-modal unstructured data according to claim 1, further comprising, after calling the application interface corresponding to the configured processing task and processing the unstructured data corresponding to the configured processing task through the application interface:
writing the processed data, obtained after the application interface processes the data corresponding to the processing task, into a result memory queue and a persistence queue.
8. A multi-modal unstructured data processing apparatus, comprising:
an acquisition module, configured to obtain unstructured data to be processed in a target scene;
a configuration module, configured to configure a processing task corresponding to the unstructured data according to the target scene by utilizing a predefined data format, to obtain a configured processing task; and
a processing module, configured to call an application interface corresponding to the configured processing task and to process the unstructured data corresponding to the configured processing task through the application interface.
9. An electronic device comprising a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete communication with each other through a communication bus; a memory for storing a computer program; a processor for executing programs stored on a memory to perform the steps of the multi-modal unstructured data processing method according to any of claims 1-7.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which, when executed by a processor, implements the multi-modal unstructured data processing method steps of any of claims 1-7.
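The execution flow chart of claims 2 to 4 can be pictured as a directed graph built from relation field values, serialized to JSON text, and then read back to derive a task order. The sketch below is only an illustration under stated assumptions: the field names `start` and `end`, the key `flow:demo`, and the task names are hypothetical, a plain dictionary stands in for the Redis database, and the ordering uses Kahn's topological sort, which the claims do not prescribe.

```python
import json
from collections import deque
from typing import Dict, List


def build_flow(relations: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Build an adjacency list; each relation is an edge start -> end."""
    graph: Dict[str, List[str]] = {}
    for rel in relations:
        graph.setdefault(rel["start"], []).append(rel["end"])
        graph.setdefault(rel["end"], [])
    return graph


def task_order(graph: Dict[str, List[str]]) -> List[str]:
    """Return a dependency-respecting order (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if len(order) != len(graph):
        raise ValueError("execution flow chart contains a cycle")
    return order


# Hypothetical relation field values for three processing tasks.
relations = [
    {"start": "ocr", "end": "extract"},
    {"start": "extract", "end": "classify"},
    {"start": "ocr", "end": "classify"},
]

store: Dict[str, str] = {}                 # stand-in for the Redis database
store["flow:demo"] = json.dumps(build_flow(relations))

flow = json.loads(store["flow:demo"])      # read the JSON text back
order = task_order(flow)                   # order in which interfaces are called
```

With a real Redis deployment, the dictionary writes and reads would be replaced by the corresponding client calls; the graph construction and ordering would stay the same.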
CN202311595351.7A 2023-11-27 2023-11-27 Multi-mode unstructured data processing method and device and electronic equipment Pending CN118093059A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311595351.7A CN118093059A (en) 2023-11-27 2023-11-27 Multi-mode unstructured data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311595351.7A CN118093059A (en) 2023-11-27 2023-11-27 Multi-mode unstructured data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN118093059A (en) 2024-05-28

Family

ID=91160288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311595351.7A Pending CN118093059A (en) 2023-11-27 2023-11-27 Multi-mode unstructured data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN118093059A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination