CN115795046A - Data processing method, device, system, electronic device and storage medium - Google Patents

Publication number
CN115795046A
Authority
CN
China
Prior art keywords
knowledge
unit
data
content
knowledge unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211409956.8A
Other languages
Chinese (zh)
Inventor
周立运
请求不公布姓名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rubik's Cube Medical Technology Suzhou Co ltd
Original Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rubik's Cube Medical Technology Suzhou Co ltd filed Critical Rubik's Cube Medical Technology Suzhou Co ltd
Priority to CN202211409956.8A priority Critical patent/CN115795046A/en
Publication of CN115795046A publication Critical patent/CN115795046A/en
Pending legal-status Critical Current

Abstract

The invention provides a data processing method, a device, a system, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units; determining a source text of any knowledge unit from a corresponding source knowledge base in the data management module based on a source in the attribute of any knowledge unit; processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit; and based on the destination in the attribute of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in the data management module for storage. The data processing method, the device, the system, the electronic equipment and the storage medium improve the reusability of the data source and avoid management confusion caused by data copying.

Description

Data processing method, device, system, electronic device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, apparatus, system, electronic device, and storage medium.
Background
A knowledge graph allows machines to establish, to a certain extent, the semantic relations implicit in natural language text, and gives people in the Internet era an effective means of mining text data. With a rich knowledge graph, the relations between pieces of knowledge can be discovered more easily and flexibly.
To construct a knowledge graph, a large amount of data must be processed into retrievable structured data. A common solution for the data processing workflow is to combine AI and HI. AI refers to artificial intelligence: data sets are predicted automatically in batches using machine learning, deep learning, rule matching, or similar methods, for example techniques from the field of natural language processing (NLP). HI refers to human intelligence: domain experts manually process data from a source knowledge base.
In a classical workflow, AI and HI do not process data simultaneously. In the initial stage of data processing, training data accumulated from HI processing is provided to the AI for model training; in a later stage, the AI prediction results are provided to HI for review and verification. Processing the same data in this alternating, decentralized way generates a large amount of backup data and hinders data management.
Therefore, how to improve the reusability of data in the data processing workflow is an urgent problem to be solved.
Disclosure of Invention
The invention provides a data processing method, device, system, electronic device and storage medium to overcome the defects of the prior art, in which decentralized processing of the same data produces a large amount of backup data and hinders data management.
The invention provides a data processing method, which comprises the following steps:
acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
determining a source text of any knowledge unit from a corresponding source knowledge base in a data management module based on a source in the attribute of the knowledge unit;
processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and based on the destination in the attributes of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in a data management module for storage so as to generate a target data product.
According to the data processing method provided by the invention, the processing mode in the attribute of any knowledge unit comprises at least one of the following modes:
an extraction processing mode for extracting entities contained in the source text of any knowledge unit;
a selection processing mode for selecting the knowledge content of any knowledge unit from predefined options;
and an input processing mode for manually entering the knowledge content of any knowledge unit.
According to the data processing method provided by the present invention, the processing method in the attribute of any knowledge unit further includes:
and the mapping processing mode is used for reprocessing the knowledge content of any knowledge unit based on a mapping relation, and the mapping relation represents the corresponding relation between the knowledge content of any knowledge unit before reprocessing and the knowledge content after reprocessing.
According to the data processing method provided by the present invention, the mapping processing mode includes normalization, and processing the source text of any knowledge unit, based on the knowledge type of the knowledge unit and the processing mode in its attributes, to obtain the knowledge content of the knowledge unit includes:
and matching the source text of any knowledge unit with the dictionary of the corresponding knowledge type in the data management module based on the knowledge type of any knowledge unit, and determining the knowledge content of any knowledge unit based on the matching result.
According to the data processing method provided by the invention, the processing of the source text of any knowledge unit to obtain the knowledge content of any knowledge unit comprises the following steps:
and processing the source text of any knowledge unit by adopting a human intelligence and artificial intelligence cooperation mode to obtain the knowledge content of any knowledge unit.
According to the data processing method provided by the invention, the attribute of each knowledge unit further comprises a hierarchical structure relationship between each knowledge unit and other knowledge units, and the method further comprises the following steps:
and displaying the hierarchical structure relationship between each knowledge unit and other knowledge units so that the user can verify the target data product based on the hierarchical structure relationship.
The present invention also provides a data processing apparatus comprising:
a document acquisition unit, used for acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
the text determining unit is used for determining the source text of any knowledge unit from a corresponding source knowledge base in the data management module based on the source in the attribute of the knowledge unit;
the text processing unit is used for processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and the knowledge storage unit is used for sending the knowledge content of any knowledge unit to a corresponding output knowledge base in the data management module for storage based on the destination in the attribute of the knowledge unit so as to generate a target data product.
The present invention also provides a data processing system, comprising:
the data processing device described above;
the knowledge modeling module is connected with the data processing device and used for creating a knowledge modeling document aiming at a target data product, recording updating contents and managing the version of the knowledge modeling document under the condition that the update of the knowledge modeling document is detected;
and the data management module is respectively connected with the data processing device and the knowledge modeling module and is used for storing and managing the source text and the knowledge content of any knowledge unit and the dictionary.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the data processing method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as described in any one of the above.
According to the data processing method, the data processing device, the data processing system, the electronic equipment and the storage medium, data processing is driven by the knowledge modeling document, the processing flow is configured through the knowledge modeling document, heterogeneous data resources and processing modes can be flexibly configured, and the data processing speed and flexibility are improved; in the data processing workflow, the source knowledge base and the output knowledge base are managed uniformly by the data management module, so that the reusability of a data source is improved, and management confusion caused by data copying is avoided.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a data processing method provided by the present invention;
FIG. 2 is a schematic structural diagram of a data processing apparatus provided in the present invention;
FIG. 3 is a schematic diagram of a data processing system according to the present invention;
FIG. 4 is a second schematic flow chart of the data processing method provided by the present invention;
FIG. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
In the current data processing workflow, AI and HI do not process data simultaneously. In the initial stage of data processing, training data accumulated from HI processing is provided to the AI for model training; in a later stage, the AI prediction results are provided to HI for review and verification. Processing the same data in this alternating, decentralized way generates a large amount of backup data and hinders data management.
In order to solve the above problem, an embodiment of the present invention provides a data processing method, which integrates all shared data resources in a data processing workflow based on a pre-configured knowledge modeling document.
Fig. 1 is a schematic flow chart of the data processing method provided by the present invention. Each step of the method may be executed by a data processing apparatus, which may be implemented in software and/or hardware. The apparatus may be integrated in an electronic device such as a personal computer, a cloud device, a smartphone, or a tablet computer. As shown in fig. 1, the data processing method provided by the embodiment of the present invention may include the following steps:
Step 110, a knowledge modeling document is obtained, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units.
Specifically, the data processing flow is driven by a knowledge modeling document, which may be created in advance in the knowledge modeling module before step 110 is performed. An enterprise's data products typically present processed data in a particular form, so different data products require different knowledge modeling documents. The knowledge modeling document can be produced by domain experts, who model their understanding of the domain knowledge according to the form of the target data product.
The knowledge modeling document includes the knowledge type and attributes of each knowledge unit. The knowledge type of a knowledge unit is obtained by abstracting the entities contained in the knowledge unit; "person", "country" and "drug", for example, are such abstract concepts. As another example, in knowledge modeling for a "clinical trial", the knowledge types of the knowledge units may include "drug", "indication", "clinical stage", "patient baseline characteristics", "regimen", etc. Here "drug" and "combination SOC" are subunits of the "combination scheme", which gives the knowledge model a hierarchical structure.
The knowledge type of a knowledge unit can be understood as the field type of the processed data. After a number of documents have been processed to obtain the text content for each field type, that content is represented by hierarchically structured knowledge units, forming a knowledge graph for the clinical trial project.
The attributes of a knowledge unit are a structural description of the knowledge unit and characterize it. For example, the concept "person" has attributes such as gender, name, and date of birth. In the embodiment of the present invention, the attributes of each knowledge unit may include the following dimensions: hierarchical structure relationship, source, destination, and processing mode.
The hierarchical structure relationship in the attributes of a knowledge unit represents its relationship to other knowledge units; for example, the "combination scheme" is the parent of "drug" and "combination SOC", and "drug" and "combination SOC" are children of the "combination scheme". The hierarchical structure relationship can also be visualized as a tree structure.
The source in the attributes of a knowledge unit characterizes where its content comes from. For example, the content of a "drug" knowledge unit may come from the databases of national or regional drug regulatory agencies or from a commercial database; the content of the "combination scheme" may come from a clinical trial database, the Chinese Clinical Trial Registry, and so on. These are collectively called source knowledge bases.
The destination in the attributes of a knowledge unit represents the output position of the knowledge content obtained after processing. For example, if the knowledge content of the "drug" knowledge unit obtained after processing comprises "Lorlatinib" and "Nivolumab", these drug entities can be output to a drug output knowledge base.
The processing mode in the attribute of each knowledge unit can represent the processing mode of each knowledge unit, such as a mode of extracting from a text, and a mode of manually inputting or selecting.
It is to be understood that after the knowledge modeling module creates the knowledge modeling document, the knowledge modeling document may be sent to the data processing apparatus, or the knowledge modeling document may be shared so that the data processing apparatus may obtain the created knowledge modeling document.
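As an illustration of what step 110 consumes, the following is a minimal sketch of a knowledge modeling document as an in-memory structure. The source specifies only that each knowledge unit carries a knowledge type and four attribute dimensions (hierarchical structure relationship, source, destination, processing mode); the class names, field names, and knowledge base identifiers below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgeUnit:
    # Field names are illustrative assumptions, not taken from the patent.
    name: str
    knowledge_type: str
    source: str                    # identifier of a source knowledge base
    destination: str               # identifier of an output knowledge base
    processing_mode: str           # e.g. "extract", "select", "input"
    parent: Optional[str] = None   # hierarchical structure relationship


@dataclass
class KnowledgeModelingDocument:
    product: str
    units: List[KnowledgeUnit] = field(default_factory=list)

    def children_of(self, parent_name: str) -> List[str]:
        """Return the names of the units whose parent is parent_name."""
        return [u.name for u in self.units if u.parent == parent_name]


doc = KnowledgeModelingDocument(
    product="clinical trial",
    units=[
        KnowledgeUnit("combination scheme", "combination scheme",
                      "trial_registry", "scheme_output", "input"),
        KnowledgeUnit("drug", "drug", "regulatory_db", "drug_output",
                      "extract", parent="combination scheme"),
        KnowledgeUnit("combination SOC", "combination SOC",
                      "trial_registry", "soc_output", "select",
                      parent="combination scheme"),
    ],
)
print(doc.children_of("combination scheme"))
```

A serialized form such as JSON or YAML would serve equally well; the point is only that every unit declares its own source, destination, and processing mode.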
Step 120, based on the source in the attribute of any knowledge unit, determining the source text of any knowledge unit from the corresponding source knowledge base in the data management module.
Specifically, after the data processing device acquires the knowledge modeling document, the knowledge modeling document can be analyzed, and for any knowledge unit, a source in the attribute of the knowledge unit is obtained through analysis, so that a source knowledge base corresponding to the source can be determined from the data management module, and a source text of any knowledge unit can be determined from the corresponding source knowledge base.
Preferably, the source text can be a document with different structures such as HTML, PDF, JSON, etc. The source text may be an original text in the source knowledge base, or may be a knowledge text obtained by processing the original text, which is not specifically limited in this embodiment of the present invention.
It should be noted that, after the source text is determined, the source text of the knowledge unit may also be displayed on a front-end page to facilitate data processing.
Step 130, processing the source text of any knowledge unit based on the knowledge type of the knowledge unit and the processing mode in its attributes to obtain the knowledge content of the knowledge unit.
Specifically, after the source text of the knowledge unit is obtained, the source text can be processed to obtain the knowledge content of the knowledge unit. The knowledge content here may be specifically names of the respective entities, or scores, and is specifically determined according to the knowledge type of the knowledge unit.
The processing mode of any knowledge unit is well defined in the knowledge modeling document, and the processing specifically can be to acquire knowledge content matched with the type of the knowledge unit from a source text. The data processing device provides a plurality of processing mode marking functions, and is adapted to the processing mode configuration of each knowledge unit in the knowledge modeling document through modes of clicking, dragging, drawing, inputting and the like.
Step 140, based on the destination in the attributes of any knowledge unit, sending the knowledge content of the knowledge unit to a corresponding output knowledge base in the data management module for storage, so as to generate a target data product.
Specifically, after the knowledge content of the knowledge unit is obtained, it can be stored in the corresponding output knowledge base according to the destination in the attributes of the knowledge unit. The corresponding output knowledge base is the output knowledge base that matches the knowledge type of the knowledge unit. For example, the output knowledge bases can include a drug output knowledge base, a target output knowledge base, and the like: if the knowledge type of the knowledge unit is drug, the processed knowledge content is stored in the drug output knowledge base; if the knowledge type is target, the processed knowledge content is stored in the target output knowledge base.
And after the knowledge content of each knowledge unit is obtained, a target data product can be generated.
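Steps 110 to 140 can be sketched end to end as follows. This is a toy illustration under stated assumptions: the data management module is modeled as two in-memory dictionaries, the extractor is a simple vocabulary match, and all identifiers (`source_bases`, `output_bases`, `extract_drugs`, `run`) are hypothetical rather than part of the described system.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the data management module.
source_bases: Dict[str, List[str]] = {
    "regulatory_db": ["Drug A was approved in 2020.", "Drug B is under review."],
}
output_bases: Dict[str, List[str]] = {"drug_output": []}


def extract_drugs(text: str) -> List[str]:
    # Toy extractor; the source does not prescribe an extraction algorithm.
    vocabulary = ["Drug A", "Drug B"]
    return [term for term in vocabulary if term in text]


# Processing functions are chosen by (knowledge type, processing mode).
processors: Dict[Tuple[str, str], Callable[[str], List[str]]] = {
    ("drug", "extract"): extract_drugs,
}


def run(modeling_document: List[dict]) -> None:
    for unit in modeling_document:                   # step 110: units of the document
        texts = source_bases[unit["source"]]         # step 120: locate source text
        process = processors[(unit["type"], unit["mode"])]
        for text in texts:
            content = process(text)                  # step 130: obtain knowledge content
            output_bases[unit["destination"]].extend(content)  # step 140: store


modeling_document = [
    {"type": "drug", "source": "regulatory_db",
     "destination": "drug_output", "mode": "extract"},
]
run(modeling_document)
print(output_bases["drug_output"])
```

Because the source and output knowledge bases are addressed by identifier rather than copied, several knowledge units can read the same source knowledge base without duplicating it.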
The method provided by the embodiment of the invention drives data processing by the knowledge modeling document, configures the processing flow by the knowledge modeling document, can flexibly configure heterogeneous data resources and processing modes, and improves the data processing speed and flexibility; in the data processing workflow, the source knowledge base and the output knowledge base are managed in a unified mode through the data management module, reusability of a data source is improved, and management chaos caused by data copying is avoided.
In addition, prior-art knowledge modeling basically takes summarizing an ontology as its final goal: a knowledge model is summarized from big data by manual, automatic, or semi-automatic construction, so that the generation of the knowledge modeling document is driven by data. These technologies mostly focus on how to refine a knowledge model from existing data and pay little attention to the reverse effect of knowledge modeling on data production.
The method provided by the embodiment of the invention offers a paradigm in which knowledge modeling documents drive data processing. The constructed knowledge modeling document is not the end point of the data flow; instead, a centralized knowledge modeling document is designed that can be read and understood by people, the tool platform, and AI algorithm tasks alike. This emphasizes the reverse effect of knowledge modeling on data processing and widens the application range of knowledge modeling.
Based on any embodiment, the processing mode in the attribute of any knowledge unit comprises at least one of the following modes:
the extraction processing mode is used for extracting entities contained in the source text of any knowledge unit;
a selection processing mode for selecting the knowledge content of any knowledge unit from predefined options;
and an input processing mode for manually entering the knowledge content of any knowledge unit.
Specifically, the processing mode may be configured in the knowledge modeling document, and the processing function is realized in the data processing apparatus. The processing mode can comprise at least one of the following: the extraction processing mode, the selection processing mode, and the input processing mode.
The extraction processing mode extracts the entities contained in the source text of a knowledge unit and is suitable for configuring entity extraction, named entity recognition, or entity linking. Extraction requires the location of the source knowledge base or a normalization method to be provided. The normalization method can be: a pointer to a dictionary location, dictionary entries entered manually by the user, or none.
Entity extraction, or named entity recognition (NER), is a natural language processing method that extracts valid entities from noisy natural language text in order to build the output knowledge base. Entity linking further converts the extracted content into another expression and is also called normalization.
The selection processing mode selects the knowledge content of the knowledge unit from predefined options, i.e. predefined enumerable items; for example, selectable score options can be predefined for classification-type knowledge types.
The input processing mode is used when knowledge cannot be extracted directly from the source text: the output knowledge is derived by a person and entered manually.
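The three processing modes can be sketched as three small functions. This is an illustrative simplification: extraction is approximated here by case-insensitive vocabulary matching rather than a trained NER model, and the function names and example values are hypothetical.

```python
from typing import List, Sequence


def extract(source_text: str, vocabulary: Sequence[str]) -> List[str]:
    """Extraction mode: return the vocabulary terms found in the source text.

    Substring matching stands in for entity extraction / NER.
    """
    lowered = source_text.lower()
    return [term for term in vocabulary if term.lower() in lowered]


def select(choice: str, options: Sequence[str]) -> str:
    """Selection mode: accept a value only if it is a predefined option."""
    if choice not in options:
        raise ValueError(f"{choice!r} is not one of the predefined options")
    return choice


def manual_input(value: str) -> str:
    """Input mode: accept knowledge content typed in by a data person."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("manually entered knowledge content must not be empty")
    return cleaned


found = extract("Patients received drug a plus chemotherapy.", ["Drug A", "Drug B"])
stage = select("Phase 2", ["Phase 1", "Phase 2", "Phase 3"])
note = manual_input("  summarized by reviewer  ")
print(found, stage, note)
```

Keeping each mode behind the same kind of callable makes it straightforward to look the function up from the processing mode named in the knowledge modeling document.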
Based on any of the above embodiments, the processing manner in the attribute of any knowledge unit further includes:
and the mapping processing mode is used for carrying out reprocessing on the knowledge content of any knowledge unit based on the mapping relation, and the mapping relation represents the corresponding relation between the knowledge content of any knowledge unit before reprocessing and the knowledge content after reprocessing.
Specifically, the processing mode may further include a mapping processing mode, which is a data processing mode that derives new knowledge from existing knowledge based on a mapping relationship, that is, it reprocesses knowledge content that has already been generated. The mapping processing mode is generally used for secondary or multi-stage processing on top of primary knowledge (obtained by extraction, selection, input, and the like). Normalization is also a form of reprocessing, so the mapping processing mode also covers normalization. The mapping relationship represents the correspondence between the knowledge content of the knowledge unit before reprocessing and the knowledge content after reprocessing.
It should be noted that, according to the source of the mapping, the mapping processing method can be divided into two types: source table mapping and result mapping. The source table mapping refers to mapping a certain field in source data into target data, and the result mapping refers to generating an additional field according to the generated target data.
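The difference between the two mapping types can be sketched as follows. The field names (`phase_raw`, `clinical_stage`, `is_late_stage`) and the derivation rule are hypothetical examples chosen for illustration only.

```python
from typing import Callable, Dict


def source_table_mapping(source_record: Dict[str, str],
                         field_map: Dict[str, str]) -> Dict[str, str]:
    """Source table mapping: copy selected source fields into target fields."""
    return {target: source_record[src] for src, target in field_map.items()}


def result_mapping(target_record: Dict[str, object],
                   derive: Callable[[Dict[str, object]], Dict[str, object]]
                   ) -> Dict[str, object]:
    """Result mapping: add fields derived from already generated target data."""
    enriched = dict(target_record)
    enriched.update(derive(target_record))
    return enriched


source_record = {"phase_raw": "Phase 3", "registry_id": "X-001"}
target = source_table_mapping(source_record, {"phase_raw": "clinical_stage"})

# Hypothetical derivation rule: flag Phase 3 trials as late stage.
enriched = result_mapping(
    target, lambda rec: {"is_late_stage": rec["clinical_stage"] == "Phase 3"}
)
print(target, enriched)
```

Source table mapping reads from the source data, while result mapping reads only from target data that has already been produced, which is why the second function never touches `source_record`.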
Based on any of the above embodiments, the mapping processing mode includes normalization, and processing the source text of any knowledge unit, based on the knowledge type of the knowledge unit and the processing mode in its attributes, to obtain the knowledge content of the knowledge unit includes:
and matching the source text of any knowledge unit with the dictionary of the corresponding knowledge type in the data management module based on the knowledge type of any knowledge unit, and determining the knowledge content of any knowledge unit based on the matching result.
Specifically, the mapping processing mode may include normalization. During normalization, the source text of the knowledge unit is matched against the dictionary of the corresponding knowledge type in the data management module according to the knowledge type of the knowledge unit; here the source text is knowledge content already processed from an original document and may be structured data. The knowledge content of the knowledge unit is then determined according to the matching result.
In one embodiment, the knowledge unit "country" that needs to be extracted comes directly from the locationlist. The knowledge content of the location count field in the source knowledge base is "CN"; the normalization_name field is queried in the country normalization dictionary count_normalization, and the result in the all_name field, "[China, china]", flows to the output knowledge base.
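The dictionary lookup in this embodiment can be sketched as follows. The dictionary layout is an assumption: each entry is taken to hold a normalization_name key and an all_name list, which is one possible reading of the example above; the second entry and the function name are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed layout of a country normalization dictionary; only the "CN" entry
# reflects the example in the text, the "US" entry is made up for illustration.
country_dictionary: List[Dict] = [
    {"normalization_name": "CN", "all_name": ["China", "china"]},
    {"normalization_name": "US", "all_name": ["United States", "USA"]},
]


def normalize(value: str, dictionary: List[Dict]) -> Optional[List[str]]:
    """Return the all_name list of the entry whose normalization_name matches."""
    for entry in dictionary:
        if entry["normalization_name"] == value:
            return list(entry["all_name"])
    return None  # no match: the caller decides how to handle unknown values


result = normalize("CN", country_dictionary)
print(result)
```

Returning `None` for an unmatched value keeps the matching result explicit, so an unmatched record can be routed to manual input instead of being silently dropped.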
The method provided by the embodiment of the invention adapts to various processing modes aiming at different knowledge units, expands the supportable range of data processing modes and provides corresponding processing mode marking functions, thereby completing data processing tasks.
Based on any embodiment, the method for processing the source text of any knowledge unit to obtain the knowledge content of any knowledge unit comprises the following steps:
and processing the source text of any knowledge unit by adopting a human intelligence and artificial intelligence cooperative mode to obtain the knowledge content of any knowledge unit.
Specifically, the cooperation of human intelligence and artificial intelligence is a semi-automated data processing mode that integrates human interactive operation with algorithmic analysis. Artificial intelligence is the role of the machine learning model that processes data automatically; human intelligence is the role of the data personnel who process data manually.
The cooperation between human intelligence and artificial intelligence is embodied in two aspects. Artificial intelligence helps human intelligence: data personnel can selectively import results generated by artificial intelligence. Human intelligence helps artificial intelligence: data processed by human intelligence serves as the training data set that helps the artificial intelligence model update iteratively.
Through this cooperative iteration, updates are rapid, data production capacity improves, and data personnel can produce target data more quickly. The AI algorithm can use the data processed by data personnel as a training set to iterate the AI model's performance automatically, so that data meeting the requirements of the target product can be predicted in batches.
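The two directions of cooperation can be sketched with a toy model. The `VocabularyModel` class is a deliberately trivial stand-in for a machine learning model (it only remembers labeled terms), and the reviewer decision is simulated; neither is taken from the source.

```python
from typing import Dict, List, Set


class VocabularyModel:
    """Toy stand-in for an AI model: predicts terms seen in training labels."""

    def __init__(self) -> None:
        self.vocabulary: Set[str] = set()

    def train(self, labelled: List[Dict]) -> None:
        for record in labelled:
            self.vocabulary.update(record["entities"])

    def predict(self, text: str) -> List[str]:
        return sorted(term for term in self.vocabulary if term in text)


model = VocabularyModel()
training_set: List[Dict] = []

# Human intelligence helps artificial intelligence:
# manually processed data becomes training data.
training_set.append({"text": "Drug A plus Drug B", "entities": ["Drug A", "Drug B"]})
model.train(training_set)

# Artificial intelligence helps human intelligence:
# the data person selectively imports the model's suggestions.
suggestions = model.predict("Drug A monotherapy")
accepted = [s for s in suggestions if s == "Drug A"]  # simulated reviewer decision

# The reviewed result feeds the next training iteration.
training_set.append({"text": "Drug A monotherapy", "entities": accepted})
model.train(training_set)
print(suggestions, len(training_set))
```

Both roles read and write the same training set, which mirrors the idea that the two kinds of processing share one managed data resource instead of exchanging copies.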
Based on any of the above embodiments, the attribute of each knowledge unit further includes a hierarchical relationship between each knowledge unit and other knowledge units, and the method further includes:
and displaying the hierarchical structure relationship between each knowledge unit and other knowledge units so that the user can verify the target data product based on the hierarchical structure relationship.
Specifically, considering that a hierarchical structure relationship exists between knowledge units, the attribute of each knowledge unit also includes the hierarchical structure relationship between each knowledge unit and other knowledge units, and the hierarchical structure relationship can be shown in a tree structure form during data processing. Both data personnel and artificial intelligence algorithms can obtain the hierarchical structure relationship. In the data product stage, the target data product can be verified by querying the hierarchical structure relationship.
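Displaying the hierarchical structure relationship as a tree can be sketched as follows, assuming each knowledge unit records at most one parent. The `render_tree` function and the indentation format are illustrative choices, not part of the described system.

```python
from typing import Dict, List, Optional


def render_tree(parents: Dict[str, Optional[str]]) -> List[str]:
    """Render a unit -> parent mapping as indented lines, roots first."""
    children: Dict[Optional[str], List[str]] = {}
    for unit, parent in parents.items():
        children.setdefault(parent, []).append(unit)

    lines: List[str] = []

    def walk(node: Optional[str], depth: int) -> None:
        for child in children.get(node, []):
            lines.append("  " * depth + child)
            walk(child, depth + 1)

    walk(None, 0)  # units whose parent is None are the roots
    return lines


hierarchy = {
    "combination scheme": None,
    "drug": "combination scheme",
    "combination SOC": "combination scheme",
}
tree_lines = render_tree(hierarchy)
print("\n".join(tree_lines))
```

The same parent mapping can be queried during verification, for example to check that every "drug" entry in the target data product sits under a "combination scheme".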
The data processing apparatus provided by the present invention will be described below, and the data processing apparatus described below and the data processing method described above may be referred to in correspondence with each other.
Based on any of the above embodiments, fig. 2 is a schematic structural diagram of the data processing device provided by the present invention, and as shown in fig. 2, the data processing device 200 includes a document acquiring unit 210, a text determining unit 220, a text processing unit 230, and a knowledge storage unit 240, wherein,
a document acquiring unit 210, configured to acquire a knowledge modeling document, where the knowledge modeling document includes the knowledge type and attributes of each knowledge unit;
a text determining unit 220, configured to determine a source text of any knowledge unit from a corresponding source knowledge base in the data management module based on a source in the attribute of the knowledge unit;
a text processing unit 230, configured to process the source text of any knowledge unit based on the knowledge type of the knowledge unit and the processing mode in the attribute of the knowledge unit, so as to obtain the knowledge content of the knowledge unit;
and a knowledge storage unit 240, configured to send the knowledge content of any knowledge unit to a corresponding output knowledge base in the data management module for storage based on the destination in the attribute of the knowledge unit, so as to generate a target data product.
According to the data processing device provided by the embodiment of the invention, the data processing is driven by the knowledge modeling document, the processing flow is configured by the knowledge modeling document, heterogeneous data resources and processing modes can be flexibly configured, and the data processing speed and flexibility are improved; in the data processing workflow, the source knowledge base and the output knowledge base are managed uniformly by the data management module, so that the reusability of a data source is improved, and management confusion caused by data copying is avoided.
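The cooperation of the four units can be sketched roughly as below. This is a minimal sketch under stated assumptions: the dictionary layout of the knowledge modeling document, the in-memory `knowledge_bases` mapping standing in for the data management module, and the `extract_dose` processor are all hypothetical and are not specified by the embodiment.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the data management module:
# knowledge base name -> {knowledge unit name -> text or content}.
KnowledgeBases = Dict[str, Dict[str, str]]
Processor = Callable[[str, str], str]  # (knowledge_type, source_text) -> content


def run_pipeline(modeling_document: List[dict],
                 knowledge_bases: KnowledgeBases,
                 processors: Dict[str, Processor]) -> None:
    """Process every knowledge unit declared in the modeling document."""
    for unit in modeling_document:  # role of the document acquiring unit
        attrs = unit["attributes"]
        # Role of the text determining unit: look up the source text.
        source_text = knowledge_bases[attrs["source"]][unit["name"]]
        # Role of the text processing unit: apply the configured mode.
        process = processors[attrs["processing_mode"]]
        content = process(unit["knowledge_type"], source_text)
        # Role of the knowledge storage unit: store at the destination.
        knowledge_bases.setdefault(attrs["destination"], {})[unit["name"]] = content


def extract_dose(knowledge_type: str, text: str) -> str:
    """Toy extraction: return the first "<number> <word>" pair in the text."""
    words = text.split()
    for i, word in enumerate(words[:-1]):
        if word.isdigit():
            return f"{word} {words[i + 1]}"
    return ""


modeling_document = [{
    "name": "dose",
    "knowledge_type": "quantity",
    "attributes": {"source": "source_kb", "destination": "output_kb",
                   "processing_mode": "extraction"},
}]
knowledge_bases = {"source_kb": {"dose": "Take 5 mg daily."}}
run_pipeline(modeling_document, knowledge_bases, {"extraction": extract_dose})
```

Because the source, destination and processing mode are read from the document rather than hard-coded, changing the configuration changes the flow without touching `run_pipeline`.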
Based on any embodiment, the processing mode in the attribute of any knowledge unit comprises at least one of the following modes:
an extraction processing mode for extracting entities contained in the source text of any knowledge unit;
a selection processing mode, used for selecting the knowledge content of any knowledge unit from predefined options;
and an input processing mode, used for manually inputting the knowledge content of any knowledge unit.
Based on any of the above embodiments, the processing manner in the attribute of any knowledge unit further includes:
and the mapping processing mode is used for reprocessing the knowledge content of any knowledge unit based on a mapping relation, and the mapping relation represents the corresponding relation between the knowledge content of any knowledge unit before reprocessing and the knowledge content after reprocessing.
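For illustration, the processing modes named above could be represented as follows. The `ProcessingMode` enum, the two helper functions and the example values are assumptions of this sketch; only the selection and mapping modes are given a toy implementation here.

```python
from enum import Enum
from typing import Dict, List


class ProcessingMode(Enum):
    """Hypothetical labels for the processing modes of a knowledge unit."""
    EXTRACTION = "extraction"
    SELECTION = "selection"
    INPUT = "input"
    MAPPING = "mapping"


def select_content(choice: str, options: List[str]) -> str:
    """Selection mode: the content must be one of the predefined options."""
    if choice not in options:
        raise ValueError(f"{choice!r} is not a predefined option")
    return choice


def map_content(content: str, mapping: Dict[str, str]) -> str:
    """Mapping mode: reprocess content through a before -> after mapping.

    Content without an entry is returned unchanged; that fallback is a
    choice made for this sketch only.
    """
    return mapping.get(content, content)


selected = select_content("oral", ["oral", "intravenous"])
mapped = map_content("po", {"po": "oral"})
```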
Based on any of the above embodiments, the mapping processing mode includes normalization, and the text processing unit is specifically configured to:
and matching the source text of any knowledge unit with the dictionary of the corresponding knowledge type in the data management module based on the knowledge type of any knowledge unit, and determining the knowledge content of any knowledge unit based on the matching result.
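A minimal sketch of such dictionary-based normalization is given below, assuming one dictionary per knowledge type that maps lower-cased surface forms to normalized terms. The `DICTIONARIES` table, its entries, and the whole-word, longest-match rule are hypothetical; the embodiment does not prescribe a particular matching algorithm.

```python
import re
from typing import Dict, Optional

# Hypothetical dictionaries keyed by knowledge type; each maps a
# lower-cased surface form to its normalized term.
DICTIONARIES: Dict[str, Dict[str, str]] = {
    "disease": {
        "heart attack": "myocardial infarction",
        "mi": "myocardial infarction",
    },
}


def normalize(knowledge_type: str, source_text: str) -> Optional[str]:
    """Return the normalized term for the longest whole-word match, if any."""
    dictionary = DICTIONARIES.get(knowledge_type, {})
    text = source_text.lower()
    matches = [surface for surface in dictionary
               if re.search(r"\b" + re.escape(surface) + r"\b", text)]
    if not matches:
        return None
    return dictionary[max(matches, key=len)]


normalized = normalize("disease", "Patient admitted after a heart attack.")
```

Whole-word matching keeps the short entry "mi" from firing inside words such as "admitted"; a production system would likely need fuzzier matching than this.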
Based on any of the embodiments described above, the text processing unit is specifically configured to:
and processing the source text of any knowledge unit by adopting a human intelligence and artificial intelligence cooperative mode to obtain the knowledge content of any knowledge unit.
Based on any of the above embodiments, the attributes of the knowledge units further include hierarchical relationships between the knowledge units and other knowledge units, and the data processing device further includes a display module configured to: and displaying the hierarchical structure relationship between each knowledge unit and other knowledge units so that the user can verify the target data product based on the hierarchical structure relationship.
Based on any of the above embodiments, fig. 3 is a schematic structural diagram of the data processing system provided by the present invention, and as shown in fig. 3, the data processing system provided by the embodiment of the present invention includes a data processing apparatus 200, a knowledge modeling module 310 and a data management module 320, wherein,
a knowledge modeling module 310, connected to the data processing apparatus 200, for creating a knowledge modeling document for a target data product, recording an update content when it is detected that the knowledge modeling document is updated, and managing a version of the knowledge modeling document;
and the data management module 320 is respectively connected with the data processing device 200 and the knowledge modeling module 310 and is used for storing and managing the source text and knowledge content of any knowledge unit and a dictionary.
Specifically, the knowledge modeling module is used for editing and creating a knowledge modeling document, recording the knowledge modeling document modified by a user each time, and performing version management on the knowledge modeling document. The knowledge modeling document defines the knowledge type and the attribute of each knowledge unit, wherein the attribute comprises the hierarchical structure relationship between each knowledge unit and other knowledge units, the source and destination of each knowledge unit and the processing mode of each knowledge unit.
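The recording of updates and the version management described here might look like the following sketch. The `KnowledgeModelingModule` class, its method names and the shape of the change log are assumptions for illustration and are not taken from the embodiment.

```python
import copy
from typing import Any, Dict, List


class KnowledgeModelingModule:
    """Hypothetical module that keeps every saved version of a modeling document."""

    def __init__(self, document: Dict[str, Any]) -> None:
        self._versions: List[Dict[str, Any]] = [copy.deepcopy(document)]
        self.change_log: List[Dict[str, Any]] = []

    @property
    def current(self) -> Dict[str, Any]:
        """Return a copy of the latest version of the document."""
        return copy.deepcopy(self._versions[-1])

    def update(self, new_document: Dict[str, Any]) -> int:
        """Store a new version if anything changed; return the latest version number."""
        old = self._versions[-1]
        changed = {key: new_document.get(key)
                   for key in set(old) | set(new_document)
                   if old.get(key) != new_document.get(key)}
        if changed:
            self._versions.append(copy.deepcopy(new_document))
            self.change_log.append({"version": len(self._versions),
                                    "changed": changed})
        return len(self._versions)


module = KnowledgeModelingModule({"dose": {"processing_mode": "input"}})
version = module.update({"dose": {"processing_mode": "extraction"}})
```

Keeping full snapshots is the simplest choice for a sketch; it lets a later reviewer compare the document that drove a given processing run with the current one.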
The data management module is used for storing and managing the source text and knowledge content of any knowledge unit. It is designed so that a plurality of projects can reuse the same data sources, which avoids the management confusion caused by data copying. The data sources may include a source knowledge base, an output knowledge base, and a dictionary. The database in which collected data are uniformly stored is called the source knowledge base and is the source for data processing. The database to which processed and sorted data are output is called the output knowledge base and is the initial form of a data product. A dictionary is a knowledge base of normalized terms in a professional domain, and is typically used for entity linking (normalization) tasks.
The functions provided by the data management module may include: adding, modifying, and deleting data sources; the various data sources are presented in a tabular or nested configuration.
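As a rough sketch of these functions, the class below keeps a registry of data sources and exposes a tabular view. The `DataManagementModule` class, its method names, the three `KINDS` labels and the row layout are hypothetical choices for this example only.

```python
from typing import Dict, List, Tuple


class DataManagementModule:
    """Hypothetical registry of data sources shared by several projects."""

    KINDS = ("source", "output", "dictionary")

    def __init__(self) -> None:
        self._sources: Dict[str, dict] = {}

    def add(self, name: str, kind: str, records: List[dict]) -> None:
        """Register a new data source of one of the known kinds."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown data source kind: {kind}")
        if name in self._sources:
            raise KeyError(f"data source already exists: {name}")
        self._sources[name] = {"kind": kind, "records": list(records)}

    def modify(self, name: str, records: List[dict]) -> None:
        """Replace the records of an existing data source."""
        self._sources[name]["records"] = list(records)

    def delete(self, name: str) -> None:
        """Remove a data source from the registry."""
        del self._sources[name]

    def as_table(self) -> List[Tuple[str, str, int]]:
        """Tabular view: one (name, kind, record count) row per data source."""
        return [(name, src["kind"], len(src["records"]))
                for name, src in sorted(self._sources.items())]


dm = DataManagementModule()
dm.add("papers", "source", [{"id": 1}])
dm.add("terms", "dictionary", [])
dm.modify("papers", [{"id": 1}, {"id": 2}])
dm.delete("terms")
table = dm.as_table()
```

Projects refer to a data source by name instead of copying it, which is the reuse property the module is designed for.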
The data processing device provides two main functions: providing a data annotation environment, and providing an AI + HI (artificial intelligence plus human intelligence) cooperative interaction function.
Providing the data annotation environment may specifically include:
displaying documents of different structures, such as HTML, PDF and JSON, from the source knowledge base of the data management module;
displaying the hierarchical structure relationship planned in the knowledge modeling module;
displaying the document and the annotated content by version, where a version is either the annotation of a data person or the prediction of an AI model;
showing the context of the annotated content in the document, which can be understood as highlighting the annotated content in the source document;
providing annotation functions for a plurality of processing modes, adapting to the processing mode configured for each knowledge unit in the knowledge modeling module through clicking, dragging, swiping, typing and the like;
and outputting the annotation result to the output knowledge base of the data management module.
The data processing system provided by the embodiment of the invention drives data processing by the knowledge modeling document, configures the processing flow by the knowledge modeling document, can flexibly configure heterogeneous data resources and processing modes, and improves the data processing speed and flexibility; in the data processing workflow, the source knowledge base and the output knowledge base are managed in a unified mode through the data management module, reusability of a data source is improved, and management chaos caused by data copying is avoided.
Based on any of the above embodiments, fig. 4 is a second schematic flow chart of the data processing method provided by the present invention. As shown in fig. 4, a document to be processed is first collected into the source knowledge base and is then processed cooperatively by data personnel and the AI algorithm in the data processing system; the processed data is published to the output knowledge base and is finally used to generate a data product.
In this data flow, the knowledge modeling document occupies a central position. First, the "source and destination" dimension of the knowledge modeling document is tied to the source knowledge base, the output knowledge base, and the dictionary in the data management module. Second, the "processing mode" dimension of the knowledge modeling document drives the data processing system to process data of different granularities in diverse ways, and the "target data structure" dimension of the knowledge modeling document is visible to the data personnel and AI algorithms in the data processing device, where the target data structure is the hierarchical structure relationship between each knowledge unit and other knowledge units. Finally, in the data product stage, product personnel can trace the source of deviations that occurred during data processing by consulting the knowledge modeling document.
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (processor) 510, a communication Interface (Communications Interface) 520, a memory (memory) 530 and a communication bus 540, wherein the processor 510, the communication Interface 520 and the memory 530 communicate with each other via the communication bus 540. Processor 510 may invoke logic instructions in memory 530 to perform a data processing method comprising: acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
determining a source text of any knowledge unit from a corresponding source knowledge base in a data management module based on a source in the attribute of the knowledge unit;
processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and based on the destination in the attributes of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in a data management module for storage so as to generate a target data product.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the data processing method provided by the above methods, the method comprising: acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
determining a source text of any knowledge unit from a corresponding source knowledge base in a data management module based on a source in the attribute of the knowledge unit;
processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and based on the destination in the attributes of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in a data management module for storage so as to generate a target data product.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method provided by the above methods, the method comprising: acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
determining a source text of any knowledge unit from a corresponding source knowledge base in a data management module based on a source in the attribute of the knowledge unit;
processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and based on the destination in the attributes of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in a data management module for storage so as to generate a target data product.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of data processing, comprising:
acquiring a knowledge modeling document, wherein the knowledge modeling document comprises knowledge types and attributes of knowledge units;
determining a source text of any knowledge unit from a corresponding source knowledge base in a data management module based on a source in the attribute of the knowledge unit;
processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and based on the destination in the attributes of any knowledge unit, sending the knowledge content of any knowledge unit to a corresponding output knowledge base in a data management module for storage so as to generate a target data product.
2. The data processing method of claim 1, wherein the processing mode in the attribute of any knowledge unit comprises at least one of the following modes:
an extraction processing mode for extracting entities contained in the source text of any knowledge unit;
a selection processing mode, used for selecting the knowledge content of any knowledge unit from predefined options;
and an input processing mode, used for manually inputting the knowledge content of any knowledge unit.
3. The data processing method of claim 2, wherein the processing mode in the attribute of any knowledge unit further comprises:
and the mapping processing mode is used for reprocessing the knowledge content of any knowledge unit based on a mapping relation, and the mapping relation represents the corresponding relation between the knowledge content of any knowledge unit before reprocessing and the knowledge content after reprocessing.
4. The data processing method of claim 3, wherein the mapping process includes normalization, and the processing of the source text of any knowledge unit based on the knowledge type of any knowledge unit and the process manner in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit includes:
and matching the source text of any knowledge unit with the dictionary of the corresponding knowledge type in the data management module based on the knowledge type of any knowledge unit, and determining the knowledge content of any knowledge unit based on the matching result.
5. The data processing method of claim 1, wherein the processing the source text of any knowledge unit to obtain the knowledge content of any knowledge unit comprises:
and processing the source text of any knowledge unit by adopting a human intelligence and artificial intelligence cooperative mode to obtain the knowledge content of any knowledge unit.
6. The data processing method of any one of claims 1-5, wherein the attributes of each knowledge unit further comprise a hierarchical relationship between each knowledge unit and other knowledge units, the method further comprising:
and displaying the hierarchical structure relationship between each knowledge unit and other knowledge units so that the user can verify the target data product based on the hierarchical structure relationship.
7. A data processing apparatus, comprising:
a document acquisition unit, used for acquiring a knowledge modeling document, wherein the knowledge modeling document comprises the knowledge type and the attribute of each knowledge unit;
the text determining unit is used for determining the source text of any knowledge unit from a corresponding source knowledge base in the data management module based on the source in the attribute of the knowledge unit;
the text processing unit is used for processing the source text of any knowledge unit based on the knowledge type of any knowledge unit and the processing mode in the attribute of any knowledge unit to obtain the knowledge content of any knowledge unit;
and the knowledge storage unit is used for sending the knowledge content of any knowledge unit to a corresponding output knowledge base in the data management module for storage based on the destination in the attribute of the knowledge unit so as to generate a target data product.
8. A data processing system, comprising:
the data processing apparatus of claim 7;
the knowledge modeling module is connected with the data processing device and used for creating a knowledge modeling document aiming at a target data product, recording updating contents and managing the version of the knowledge modeling document under the condition that the update of the knowledge modeling document is detected;
and the data management module is respectively connected with the data processing device and the knowledge modeling module and is used for storing and managing the source text and the knowledge content of any knowledge unit and a dictionary.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data processing method of any one of claims 1 to 6 when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the data processing method according to any one of claims 1 to 6.
CN202211409956.8A 2022-11-10 2022-11-10 Data processing method, device, system, electronic device and storage medium Pending CN115795046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211409956.8A CN115795046A (en) 2022-11-10 2022-11-10 Data processing method, device, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211409956.8A CN115795046A (en) 2022-11-10 2022-11-10 Data processing method, device, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115795046A true CN115795046A (en) 2023-03-14

Family

ID=85436861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211409956.8A Pending CN115795046A (en) 2022-11-10 2022-11-10 Data processing method, device, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115795046A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116401703A (en) * 2023-03-28 2023-07-07 广东利元亨智能装备股份有限公司 Data processing method, data management platform, device and storage medium

Similar Documents

Publication Publication Date Title
US9652719B2 (en) Authoring system for bayesian networks automatically extracted from text
WO2019051426A1 (en) Pruning engine
CN111813963B (en) Knowledge graph construction method and device, electronic equipment and storage medium
EP3671526B1 (en) Dependency graph based natural language processing
CN111656453A (en) Hierarchical entity recognition and semantic modeling framework for information extraction
CN111897836A (en) Search system, method and storage medium
Kamalabalan et al. Tool support for traceability of software artefacts
CN115795046A (en) Data processing method, device, system, electronic device and storage medium
US9898467B1 (en) System for data normalization
US10360208B2 (en) Method and system of process reconstruction
US11816770B2 (en) System for ontological graph creation via a user interface
Randles et al. A vocabulary for describing mapping quality assessment, refinement and validation
Gkotse et al. Automatic Web Application Generation from an Irradiation Experiment Data Management Ontology (IEDM)
CN113434658A (en) Thermal power generating unit operation question-answer generation method, system, equipment and readable storage medium
Jain et al. Semantic web, ontologies and e-government: A review
CN115757823B (en) Data processing method, device, electronic equipment and storage medium
US11940964B2 (en) System for annotating input data using graphs via a user interface
CN117891531A (en) System parameter configuration method, system, medium and electronic equipment for SAAS software
CN106855842A (en) A kind of Program Static Analysis method based on regular expression
CN115309391A (en) Code segment multiplexing method and device, electronic equipment and storage medium
CN116431481A (en) Code parameter verification method and device based on multi-code condition
CN113886446A (en) Job automatic scheduling method and device, electronic equipment and readable storage medium
CN117332092A (en) Database knowledge graph construction method and device and electronic equipment
Gkotse et al. JACoW: Automatic Web Application Generation From an Irradiation Experiment Data Management Ontology (IEDM)
CN115563239A (en) Question answering method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination