CN116805012A - Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment - Google Patents

Info

Publication number
CN116805012A
Authority
CN
China
Prior art keywords
score
evaluation
quality
data
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310862945.3A
Other languages
Chinese (zh)
Inventor
夏晓晴
李馨迟
褚洪澎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Original Assignee
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Technology Innovation Center and China Telecom Corp Ltd
Priority to CN202310862945.3A
Publication of CN116805012A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to the technical field of knowledge graphs and provides a quality evaluation method for a multi-modal knowledge graph, a quality evaluation device for the multi-modal knowledge graph, a computer storage medium and electronic equipment. The method comprises the following steps: after a multi-modal data set for constructing a multi-modal knowledge graph is acquired, performing quality evaluation on the multi-modal data set to obtain a first evaluation score; in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set, performing quality evaluation on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score; after the constructed multi-modal knowledge graph is obtained, performing quality evaluation on the output result of the multi-modal knowledge graph to obtain a third evaluation score; and obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score. The method and the device can evaluate the quality of the entire construction process of the multi-modal knowledge graph, which makes it easier to improve the graph quality in a targeted way.

Description

Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment
Technical Field
The disclosure relates to the technical field of knowledge graphs, and in particular relates to a quality evaluation method for a multi-modal knowledge graph, a quality evaluation device for the multi-modal knowledge graph, a computer storage medium and electronic equipment.
Background
With the rapid development of the telecommunication field, telecommunication terminals store large amounts of data of different types, covering modalities such as text, audio, image and video. Storing this data occupies substantial resources, yet its utilization rate is low. A knowledge graph can effectively store the relationships among such data, so that the associations and knowledge among things can be recorded and modeled.
In recent years, with the continuous development of multi-modal technology, techniques based on multi-modal knowledge graphs have been proposed one after another, and how to accurately evaluate the quality of a multi-modal knowledge graph is a problem that currently needs to be solved.
In view of the foregoing, there is a need in the art to develop a new quality assessment method and apparatus for multi-modal knowledge graphs.
It should be noted that the information disclosed in the foregoing background section is only for enhancing understanding of the background of the present disclosure.
Disclosure of Invention
The disclosure aims to provide a quality evaluation method for a multi-modal knowledge graph, a quality evaluation device for the multi-modal knowledge graph, a computer storage medium and an electronic device, so as to overcome, at least to a certain extent, the technical problem in the related art that the quality of a multi-modal knowledge graph cannot be accurately evaluated.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a quality assessment method of a multi-modal knowledge-graph, including: after a multi-modal data set for constructing a multi-modal knowledge graph is acquired, performing quality evaluation on the multi-modal data set to obtain a first evaluation score; in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set, performing quality evaluation on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score; after the constructed multi-mode knowledge graph is obtained, carrying out quality evaluation on the output result of the multi-mode knowledge graph to obtain a third evaluation score; and obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
In an exemplary embodiment of the present disclosure, the method further comprises: collecting original multi-mode data sets from a plurality of service channels in a target service field; preprocessing the original multi-modal data set to obtain the multi-modal data set.
In an exemplary embodiment of the present disclosure, the multimodal dataset comprises a plurality of pieces of sample data; the quality evaluation of the multi-mode dataset to obtain the first evaluation score includes: acquiring index scores of the multi-modal dataset corresponding to each of at least two quality evaluation indexes; and weighting at least two index scores corresponding to the at least two quality evaluation indexes to obtain the first evaluation score.
In an exemplary embodiment of the present disclosure, the quality assessment indicator comprises a data integrity indicator; the obtaining an index score of the multi-modal dataset corresponding to each of at least two quality assessment indices includes: carrying out integrity detection on each sample data in the multi-mode data set to obtain a data quantity passing through the integrity detection; an index score of the multimodal dataset corresponding to the data integrity index is determined from a ratio between an amount of data passing the integrity detection and an overall amount of data of the plurality of sample data.
In an exemplary embodiment of the present disclosure, the quality assessment indicator comprises a data consistency indicator; the obtaining an index score of the multi-modal dataset corresponding to each of at least two quality assessment indices includes: performing similarity detection on any two pieces of sample data in the multi-mode data set to obtain an accumulated value of similarity between any two pieces of sample data; acquiring the product of the total data quantity of the plurality of pieces of sample data and the associated value thereof; and acquiring a ratio between the accumulated value and the product, and determining an index score of the multi-mode data set corresponding to the data consistency index according to a difference value between a preset numerical value and the ratio.
In an exemplary embodiment of the present disclosure, the sample data in the multimodal dataset includes at least two of: text sample data, image sample data, and audio sample data; the quality assessment index includes a data accuracy index, and the obtaining an index score of the multi-modal dataset for each of at least two quality assessment indices includes: performing error text detection on the text sample data to obtain a first number of error sample data in the text sample data; performing image quality detection on the image sample data to obtain a second number of image sample data which do not pass the image quality detection; performing audio quality detection on the audio sample data to obtain a third number of audio sample data which do not pass the audio quality detection; acquiring accumulated values of the first quantity, the second quantity and the third quantity; determining a data error rate corresponding to the multi-mode data set according to a ratio between the accumulated value and a total data amount of the plurality of pieces of sample data; and determining an index score of the multi-mode data set corresponding to the data accuracy index according to a difference value between a preset numerical value and the data error rate.
In an exemplary embodiment of the disclosure, the performing quality evaluation on the intermediate process data obtained based on the multi-modal dataset, to obtain a second evaluation score, includes: after extracting entity association relations from the multi-mode data set, acquiring a first quality score corresponding to the entity association relations; after generating an entity link result according to the entity and the entity association relationship, acquiring a second quality score of the entity link result; after the multi-mode knowledge graph is generated according to the entity link result, a third quality score of connectivity of the entity association relationship in the multi-mode knowledge graph is obtained; and weighting the first quality score, the second quality score and the third quality score to obtain the second evaluation score.
In an exemplary embodiment of the present disclosure, the performing quality evaluation on the output result of the multi-modal knowledge graph to obtain a third evaluation score includes: collecting L output results of the multi-modal knowledge graph within a preset time period, and obtaining K scoring results given by K domain experts for each output result, where L and K are integers greater than 0; for each output result, calculating the average value of the K scoring results, calculating the square of the difference between each scoring result and the average value, and determining the accumulated value of the K squared values; acquiring the average value of the L accumulated values corresponding to the L output results; and taking the square root of the average value of the L accumulated values to obtain the third evaluation score.
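The calculation of the third evaluation score described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `third_evaluation_score` and the nested-list input layout are assumptions made for the example.

```python
import math
from typing import Sequence


def third_evaluation_score(expert_scores: Sequence[Sequence[float]]) -> float:
    """expert_scores[l][k] is the score given by expert k to output result l.

    Hypothetical helper: the name and input layout are assumptions.
    """
    if not expert_scores or any(len(row) == 0 for row in expert_scores):
        raise ValueError("need L > 0 output results, each with K > 0 scores")

    accumulated_values = []
    for row in expert_scores:
        mean = sum(row) / len(row)  # average of the K scoring results
        # accumulated value of the K squared differences from the average
        accumulated_values.append(sum((s - mean) ** 2 for s in row))

    # average of the L accumulated values, followed by the square root
    return math.sqrt(sum(accumulated_values) / len(accumulated_values))


# Two output results (L = 2), each scored by two experts (K = 2).
print(third_evaluation_score([[1.0, 3.0], [2.0, 2.0]]))
```

Because the score is built from squared deviations around each per-result average, it reflects how much the experts disagree rather than how high their scores are.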
In an exemplary embodiment of the present disclosure, the obtaining a quality assessment result according to the first assessment score, the second assessment score, and the third assessment score includes: determining a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score; and weighting the first evaluation score, the second evaluation score and the third evaluation score according to the first weight, the second weight and the third weight to obtain the quality evaluation result.
In an exemplary embodiment of the present disclosure, the determining a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score includes: data normalization is carried out on the first evaluation score, the second evaluation score and the third evaluation score, and a first normalization score, a second normalization score and a third normalization score are obtained; calculating information entropy corresponding to the first normalized score, and determining a first relative entropy corresponding to the information entropy; calculating information entropy corresponding to the second normalized score, and determining second relative entropy corresponding to the information entropy; calculating information entropy corresponding to the third normalized score, and determining a third relative entropy corresponding to the information entropy; and determining the first weight, the second weight and the third weight according to the first relative entropy, the second relative entropy and the third relative entropy.
In an exemplary embodiment of the disclosure, the determining the first weight, the second weight, and the third weight according to the first relative entropy, the second relative entropy, and the third relative entropy includes: acquiring accumulated values among the first relative entropy, the second relative entropy and the third relative entropy; determining the first weight according to the ratio between the first relative entropy and the accumulated value; determining the second weight according to the ratio between the second relative entropy and the accumulated value; and determining the third weight according to the ratio between the third relative entropy and the accumulated value.
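The last two steps, turning three relative entropies into weights and then weighting the three evaluation scores, can be sketched as below. How each relative entropy is derived from its normalized score is not reproduced here; the inputs are hypothetical values, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def weights_from_relative_entropy(relative_entropies: Sequence[float]) -> List[float]:
    """Each weight is the ratio of one relative entropy to the accumulated value."""
    accumulated = sum(relative_entropies)
    if accumulated == 0:
        raise ValueError("accumulated relative entropy must be non-zero")
    return [e / accumulated for e in relative_entropies]


def quality_evaluation_result(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the first, second and third evaluation scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical relative entropies and evaluation scores.
weights = weights_from_relative_entropy([1.0, 1.0, 2.0])
print(weights, quality_evaluation_result([0.8, 0.6, 0.4], weights))
```

Dividing by the accumulated value makes the three weights sum to 1, so the final result stays on the same scale as the individual scores.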
According to a second aspect of the present disclosure, there is provided a quality assessment apparatus for a multi-modal knowledge-graph, comprising: the first evaluation module is used for carrying out quality evaluation on the multi-modal data set after the multi-modal data set for constructing the multi-modal knowledge graph is acquired, so as to obtain a first evaluation score; the second evaluation module is used for performing quality evaluation on intermediate process data obtained based on the multi-modal data set in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set to obtain a second evaluation score; the third evaluation module is used for carrying out quality evaluation on the output result of the multi-modal knowledge graph after the constructed multi-modal knowledge graph is obtained, so as to obtain a third evaluation score; and the result output module is used for obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
According to a third aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the quality assessment method of a multimodal knowledge graph as described in the first aspect above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the quality assessment method of the multimodal knowledge graph of the first aspect described above via execution of the executable instructions.
As can be seen from the above technical solutions, the quality evaluation method of the multi-modal knowledge graph, the quality evaluation device of the multi-modal knowledge graph, the computer storage medium and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
In the technical solutions provided by some embodiments of the present disclosure, after a multi-modal data set for constructing a multi-modal knowledge graph is obtained, quality evaluation is performed on the multi-modal data set to obtain a first evaluation score; in the process of constructing the multi-modal knowledge graph by using the multi-modal data set, quality evaluation is performed on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score; after the constructed multi-modal knowledge graph is obtained, quality evaluation is performed on an output result of the multi-modal knowledge graph to obtain a third evaluation score; and a quality evaluation result is obtained according to the first evaluation score, the second evaluation score and the third evaluation score.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 is a flow chart of a method for quality assessment of a multimodal knowledge graph in an embodiment of the disclosure;
FIG. 2 is a flow chart illustrating how a multi-modal dataset may be quality evaluated to obtain a first evaluation score in an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating how index scores corresponding to data integrity indices for a multi-modal dataset are obtained in an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating how index scores corresponding to data consistency indices for a multi-modal dataset are obtained in an embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating how index scores corresponding to data accuracy indices for a multimodal dataset are obtained in an embodiment of the disclosure;
FIG. 6 is a flow chart illustrating how intermediate process data obtained based on the multi-modal dataset may be quality evaluated to obtain a second evaluation score in an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating how the output result of the multi-modal knowledge-graph is evaluated for quality, and a third evaluation score is obtained in an embodiment of the present disclosure;
FIG. 8 shows a flow diagram of how quality assessment results are obtained from a first assessment score, a second assessment score, and a third assessment score in an embodiment of the present disclosure;
FIG. 9 illustrates a flow diagram of how a first weight corresponding to the first assessment score, a second weight corresponding to the second assessment score, and a third weight corresponding to the third assessment score are determined in an embodiment of the present disclosure;
FIG. 10 is a schematic diagram showing a structure of a quality assessment apparatus of a multi-modal knowledge graph in an exemplary embodiment of the present disclosure;
FIG. 11 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first" and "second" and the like are used merely as labels, and are not intended to limit the number of their objects.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.
With the rapid development of the telecommunication field, telecommunication terminals store large amounts of data of different types, covering modalities such as text, audio, image and video. However, storing this data occupies substantial resources while its utilization rate is low, so how to make effective use of the data is currently an important research topic.
As a graph-model association network, the knowledge graph has strong semantic processing capability and open organization capability; it can effectively store the relationships among data, and can record and model the associations among things.
In recent years, with the continuous development of multi-modal techniques, multi-modal knowledge graph techniques have been proposed. Because multi-modal data exhibits large semantic differences across modalities, knowledge extraction for a multi-modal knowledge graph tends to suffer from low accuracy and low coverage. How to effectively evaluate and optimize the quality of a multi-modal knowledge graph is therefore a problem that urgently needs to be solved.
In the embodiment of the disclosure, a quality evaluation method of a multi-mode knowledge graph is provided first, which overcomes the defect that the quality of the multi-mode knowledge graph cannot be accurately evaluated in the related art at least to a certain extent.
Fig. 1 is a flow chart illustrating a method for quality assessment of a multi-modal knowledge-graph in an embodiment of the disclosure, where an execution subject of the method for quality assessment of a multi-modal knowledge-graph may be a server for quality assessment of a multi-modal knowledge-graph.
Referring to fig. 1, a quality assessment method of a multi-modal knowledge-graph according to one embodiment of the present disclosure includes the steps of:
Step S110, after a multi-modal data set for constructing a multi-modal knowledge graph is acquired, performing quality evaluation on the multi-modal data set to obtain a first evaluation score;
Step S120, in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set, performing quality evaluation on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score;
Step S130, after the constructed multi-modal knowledge graph is obtained, performing quality evaluation on the output result of the multi-modal knowledge graph to obtain a third evaluation score;
Step S140, obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
In the technical solution provided in the embodiment shown in FIG. 1, after a multi-modal data set for constructing a multi-modal knowledge graph is obtained, quality evaluation is performed on the multi-modal data set to obtain a first evaluation score; in the process of constructing the multi-modal knowledge graph by using the multi-modal data set, quality evaluation is performed on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score; after the constructed multi-modal knowledge graph is obtained, quality evaluation is performed on an output result of the multi-modal knowledge graph to obtain a third evaluation score; and a quality evaluation result is obtained according to the first evaluation score, the second evaluation score and the third evaluation score.
The specific implementation of each step in fig. 1 is described in detail below:
Each source or form of information may be referred to as a modality. Multi-modal data refers to data in different modalities, or in different formats of the same modality, and generally denotes mixed data such as text, pictures, audio and video.
Before step S110, it should be noted that the present disclosure may collect an original multi-modal dataset from a plurality of service channels in a target service domain (for example, the telecommunication domain). The service channels may be, for example, the text library, corpus and video library of a telecommunication service system, or the text library, corpus and video library of an Internet of Things terminal device; they may be set according to the actual situation, and the present disclosure does not specifically limit this. The collected original multi-modal dataset may then be preprocessed, for example by error data filtering and feature extraction, to obtain the multi-modal dataset.
The multi-modal dataset may contain N pieces of sample data (N being an integer greater than 1), which may be mixed data of various types such as text, picture, audio and video.
In step S110, after a multi-modal dataset for constructing a multi-modal knowledge-graph is acquired, a quality evaluation is performed on the multi-modal dataset, resulting in a first evaluation score.
In this step, the multi-modal data set may be obtained, and the quality evaluation may be performed on the multi-modal data set to obtain a first evaluation score.
Specifically, referring to fig. 2, fig. 2 is a flow chart illustrating how quality evaluation is performed on a multi-modal dataset to obtain a first evaluation score in an embodiment of the disclosure, including steps S201-S202:
In step S201, an index score of the multi-modal dataset corresponding to each of the at least two quality evaluation indexes is acquired.
In this step, it should be noted that the at least two quality evaluation indexes may include: a data integrity index, a data consistency index, and a data accuracy index.
For the above data integrity index, reference may be made to fig. 3, and fig. 3 is a flow chart illustrating how to obtain index scores corresponding to the data integrity index for a multi-modal data set in an embodiment of the disclosure, including steps S301-S302:
In step S301, integrity detection is performed on each piece of sample data in the multimodal dataset, and the amount of data passing the integrity detection is obtained.
In this step, integrity detection may be used to detect whether the data contains missing or null values. For example, after integrity detection is performed on the above N pieces of sample data, the amount of data passing the integrity detection may be counted and denoted M, where M is less than or equal to N.
In step S302, an index score of the multi-modal dataset corresponding to the data integrity index is determined from a ratio between the amount of data passing the integrity detection and the total amount of data of the plurality of pieces of sample data.
In this step, an index score of the multi-modal dataset corresponding to the data integrity index may be determined based on the ratio between the data amount M passing the integrity detection and the total data amount N of the plurality of pieces of sample data, i.e., M / N.
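Steps S301 and S302 can be illustrated with a short sketch. The concrete integrity rule below (a sample passes when none of its fields is `None` or an empty string) is an assumption chosen for the example; the text only says that integrity detection looks for missing or null values. The function names are likewise hypothetical.

```python
from typing import Any, Mapping, Sequence


def is_complete(sample: Mapping[str, Any]) -> bool:
    """Assumed rule: a sample passes when no field is None or an empty string."""
    return all(value is not None and value != "" for value in sample.values())


def integrity_score(samples: Sequence[Mapping[str, Any]]) -> float:
    """Ratio M / N of samples passing integrity detection to all samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("the dataset must contain at least one sample")
    m = sum(1 for sample in samples if is_complete(sample))
    return m / n


samples = [
    {"text": "router manual", "image": "router.png"},
    {"text": None, "image": "antenna.png"},   # missing text
    {"text": "", "image": "modem.png"},       # empty text
    {"text": "modem manual", "image": "modem2.png"},
]
print(integrity_score(samples))  # 0.5
```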
For the above data consistency index, reference may be made to fig. 4, and fig. 4 is a flow chart illustrating how to obtain index scores corresponding to the data consistency index in the multimodal dataset according to an embodiment of the disclosure, including steps S401 to S403:
In step S401, similarity detection is performed on any two pieces of sample data in the multimodal dataset, and an accumulated value of similarity between any two pieces of sample data is obtained.
In this step, similarity detection may be performed on any two pieces of sample data (e.g., i and j) out of the above-described N pieces of sample data to obtain similarity (i, j) thereof.
By way of example, assuming that there are 10 pieces of sample data in total, 9 pieces of similarity can be obtained, and thus, the 9 pieces of similarity can be accumulated to obtain an accumulated value Σ similarity(i, j).
The similarity detection may specifically use cosine distance, Euclidean distance, or the like; it may be set according to the actual situation, and the present disclosure does not specifically limit this.
In step S402, the product between the total data amount of the plurality of pieces of sample data and the associated value thereof is acquired.
In this step, the associated value may be [expression missing in source], denoted A below. Thus, the product between the total data amount N of the plurality of pieces of sample data and its associated value is N × A.
In step S403, a ratio between the accumulated value and the product is obtained, and an index score of the multi-modal dataset corresponding to the data consistency index is determined according to a difference between the preset value and the ratio.
In this step, the ratio between the accumulated value and the product can be obtained as Σ similarity(i, j) / (N × A). The preset value may be 1, whereby 1 − Σ similarity(i, j) / (N × A) is determined as the index score of the multi-modal dataset corresponding to the data consistency index.
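Steps S401 to S403 can be sketched as below, under two loudly stated assumptions. First, the expression for the associated value is not available in this text, so the sketch assumes it is (N − 1) / 2, which makes the product equal to the number of unordered sample pairs; the original formula may differ. Second, each sample is assumed to have already been converted to a numeric feature vector, and cosine similarity is used as one of the options the text mentions. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_score(features: Sequence[Sequence[float]]) -> float:
    """1 minus (accumulated pairwise similarity / (N x associated value))."""
    n = len(features)
    if n < 2:
        raise ValueError("at least two samples are required")
    # S401: accumulate similarity over every unordered pair (i, j)
    accumulated = sum(cosine_similarity(a, b) for a, b in combinations(features, 2))
    # S402: ASSUMPTION - associated value taken as (N - 1) / 2
    product = n * (n - 1) / 2
    # S403: preset value 1 minus the ratio
    return 1.0 - accumulated / product


print(consistency_score([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Under this assumed associated value, the ratio is simply the mean pairwise similarity, so the score stays between 0 and 1 whenever the similarities do.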
For the above data accuracy index, reference may be made to fig. 5, and fig. 5 is a flowchart illustrating how to obtain an index score corresponding to the data accuracy index of the multi-modal data set in an embodiment of the disclosure, including steps S501-S506:
In step S501, error text detection is performed on the text sample data, and a first number of error sample data in the text sample data is obtained.
In this step, given that the sample data in the multimodal dataset may include text sample data, erroneous text detection (e.g., detection of spelling or grammar errors) may be performed on the text sample data to obtain a first number of erroneous sample data in the text sample data.
In step S502, image quality detection is performed on the image sample data, and a second number of image sample data that has not passed the image quality detection is obtained.
In this step, image quality detection (e.g., detection of image sharpness, noise) may be performed on the image sample data in view of the fact that the sample data in the multimodal dataset may contain image sample data, to obtain a second amount of image sample data that fails the image quality detection.
In step S503, audio quality detection is performed on the audio sample data, and a third number of audio sample data that does not pass the audio quality detection is obtained.
In this step, in view of the fact that the sample data in the multi-modal dataset may include audio sample data, audio quality detection (e.g., detection of audio clarity, noise) may be performed on the audio sample data to obtain a third number of audio sample data that fails the audio quality detection.
In step S504, the accumulated values of the first number, the second number, and the third number are acquired.
In step S505, a data error rate corresponding to the multi-modal data set is determined according to a ratio between the accumulated value and a total data amount of the plurality of pieces of sample data.
In this step, the accumulated value may be divided by the total data amount N of the plurality of pieces of sample data to obtain the data error rate corresponding to the multi-modal data set.
In step S506, an index score of the multi-modal dataset corresponding to the data accuracy index is determined according to a difference between the preset value and the data error rate.
In this step, the preset value may be 1, so that an index score of the multi-mode dataset corresponding to the data accuracy index may be determined according to a difference between 1 and the data error rate.
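Steps S504 to S506 reduce to a few lines of arithmetic. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and the three counts are assumed to come from whatever text, image, and audio detectors are used in steps S501 to S503.

```python
def accuracy_score(text_errors: int, image_failures: int, audio_failures: int,
                   total: int, preset: float = 1.0) -> float:
    """Data accuracy index score: preset value minus the data error rate."""
    if total <= 0:
        raise ValueError("total data amount N must be positive")
    # Step S504: accumulate the first, second, and third numbers.
    error_count = text_errors + image_failures + audio_failures
    # Step S505: data error rate = accumulated value / total data amount N.
    error_rate = error_count / total
    # Step S506: index score = preset value - data error rate.
    return preset - error_rate
```

For instance, 3 erroneous text samples, 1 failed image, and 1 failed audio clip out of N = 100 samples give an error rate of 0.05 and an index score of 0.95.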
Referring next to fig. 2, in step S202, weighting processing is performed on at least two index scores corresponding to at least two quality evaluation indexes, to obtain a first evaluation score.
In this step, after obtaining the index score of the multi-modal data set for the data integrity index, the index score of the multi-modal data set for the data consistency index, and the index score of the multi-modal data set for the data accuracy index, the first evaluation score may be calculated based on the following formula 1:
Pre-score = w1 × S1 + w2 × S2 + w3 × S3 (formula 1)
Wherein Pre-score indicates the first evaluation score, S1 indicates the index score of the multi-modal dataset with respect to the data integrity index, w1 indicates a weight coefficient preset for S1, S2 indicates the index score of the multi-modal dataset with respect to the data consistency index, w2 indicates a weight coefficient preset for S2, S3 indicates the index score of the multi-modal dataset with respect to the data accuracy index, and w3 indicates a weight coefficient preset for S3.
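Formula 1 is a plain weighted sum. A minimal Python sketch follows; the function name and the example weights are hypothetical, since the disclosure only states that the weight coefficients are preset.

```python
from typing import Sequence


def first_evaluation_score(index_scores: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Pre-score = w1*S1 + w2*S2 + w3*S3 (formula 1)."""
    if len(index_scores) != len(weights):
        raise ValueError("each index score needs exactly one weight")
    return sum(w * s for w, s in zip(weights, index_scores))
```

With S1 = 0.9, S2 = 0.8, S3 = 0.95 and illustrative weights 0.4, 0.3, 0.3, the Pre-score is 0.36 + 0.24 + 0.285 = 0.885.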
After the first evaluation score is obtained, step S120 may be performed, in which quality evaluation is performed on intermediate process data obtained based on the multi-modal dataset in the process of constructing the multi-modal knowledge-graph using the multi-modal dataset, to obtain a second evaluation score.
In this step, referring to fig. 6, fig. 6 is a schematic flow chart showing how to perform quality evaluation on intermediate process data obtained based on a multi-modal dataset to obtain a second evaluation score in the embodiment of the present disclosure, including steps S601-S604:
in step S601, after extracting the entity association relationship from the multimodal dataset, a first quality score corresponding to the entity association relationship is obtained.
In this step, the entity and the entity association relationship, that is, the correlation between two or more entities, may be extracted from the multimodal dataset.
After the entity association relationship is extracted from the multi-mode data set, whether the entity association relationship can accurately reflect the semantic association relationship between different sample data can be evaluated, so that the accuracy rate P1 and the recall rate R1 of the entity association relationship are obtained, and further, the accuracy rate P1 and the recall rate R1 can be weighted and integrated to obtain a first quality score.
In step S602, after generating the entity link result according to the entity and entity association relationship, a second quality score of the entity link result is obtained.
In this step, entity linking refers to a process of unambiguously pointing an identified entity object (e.g., a person name, place name, organization name, etc.) in free text to a target entity in a knowledge base.
For example, after the entity link result is generated according to the entity and the entity association relationship, the quality of the entity link result may be evaluated to evaluate whether the entity link result is correct, so that the accuracy P2 and the recall R2 of the entity link result may be obtained, and further, the accuracy P2 and the recall R2 may be weighted and combined to obtain the second quality score.
In step S603, after the multi-mode knowledge graph is generated according to the entity linking result, a third quality score of connectivity of the entity association relationship in the multi-mode knowledge graph is obtained.
In this step, after the multi-mode knowledge graph is generated according to the entity linking result, connectivity of the entity association relationship in the multi-mode knowledge graph may be evaluated to obtain an accuracy rate P3 and a recall rate R3 corresponding to the connectivity of the entity association relationship, and further, the accuracy rate P3 and the recall rate R3 may be weighted and integrated to obtain a third quality score.
In step S604, the first quality score, the second quality score, and the third quality score are weighted to obtain a second evaluation score.
In this step, the first quality score, the second quality score, and the third quality score may be weighted based on the following formula 2 to obtain the second evaluation score:
F1-score = α × F1(P1, R1) + β × F1(P2, R2) + γ × F1(P3, R3) (formula 2)
Wherein F1-score represents the second evaluation score, F1 (P1, R1) represents the first quality score, F1 (P2, R2) represents the second quality score, F1 (P3, R3) represents the third quality score, α represents a weighting coefficient corresponding to the first quality score, β represents a weighting coefficient corresponding to the second quality score, γ represents a weighting coefficient corresponding to the third quality score, and the weighting coefficient can be set according to the actual requirement, which is not particularly limited in the present disclosure.
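Formula 2 can be sketched in Python as follows. The disclosure does not spell out how F1(P, R) combines the accuracy rate and the recall rate; the sketch assumes the conventional F1 measure, i.e. the harmonic mean 2PR / (P + R). The function names and the example coefficients are hypothetical.

```python
from typing import Sequence, Tuple


def f1(precision: float, recall: float) -> float:
    """Conventional F1 measure (assumed): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def second_evaluation_score(pr_pairs: Sequence[Tuple[float, float]],
                            coefficients: Sequence[float]) -> float:
    """F1-score = alpha*F1(P1,R1) + beta*F1(P2,R2) + gamma*F1(P3,R3) (formula 2)."""
    if len(pr_pairs) != len(coefficients):
        raise ValueError("each (P, R) pair needs exactly one coefficient")
    return sum(c * f1(p, r) for c, (p, r) in zip(coefficients, pr_pairs))
```

Here `pr_pairs` holds (P1, R1), (P2, R2), (P3, R3) in that order, and `coefficients` holds α, β, γ.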
After the first evaluation score and the second evaluation score are obtained, step S130 may be performed, and after the constructed multi-modal knowledge-graph is obtained, quality evaluation is performed on the output result of the multi-modal knowledge-graph to obtain a third evaluation score.
In this step, after the constructed multi-modal knowledge graph is obtained, fig. 7 may be referred to, and fig. 7 shows a flow chart of how to perform quality evaluation on the output result of the multi-modal knowledge graph in the embodiment of the disclosure, and a third evaluation score is obtained, including step S701-step S704:
in step S701, L output results of the multi-mode knowledge graph within a preset period are collected, and K scoring results of K domain experts for each output result are obtained.
In this step, L output results of the multi-modal knowledge graph within a preset time period (for example, 10 days; the period can be set according to actual conditions and is not specifically limited by the present disclosure) may be collected, and K scoring results given by K domain experts for each output result may be acquired. For example, assuming that L is 3 and K is 5, each domain expert may score each output result, so that 5 scoring results are obtained for each output result.
In step S702, for each output result, the average value of K scoring results is calculated, and the square value of the difference between each scoring result and the average value is calculated, and the accumulated value of K square values is determined.
In this step, for each output result, the average value of the 5 scoring results may be calculated. For example, assuming that the 5 scoring results for output result A are 9, 8, 9, 10, and 9, respectively, the average value of the 5 scoring results is (9 + 8 + 9 + 10 + 9) / 5 = 9 points.
Thereafter, the square of the difference between each scoring result and the average value may be calculated. For output result A, these squared differences are: (9-9)^2 = 0, (8-9)^2 = 1, (9-9)^2 = 0, (10-9)^2 = 1, (9-9)^2 = 0.
Thus, the accumulated value of the K square values may be: 0+1+0+1+0=2.
In step S703, the average value of the L accumulated values corresponding to the L output results is acquired.
In this step, the accumulated values of the plurality of square values corresponding to each output result may be sequentially calculated with reference to the explanation related to step S702, until L accumulated values corresponding to L output results are obtained, and then, the average value of the L accumulated values may be calculated.
In step S704, a square root operation is performed on the average value of the L accumulated values to obtain a third evaluation score.
In this step, the square root of the average value of the L accumulated values may be calculated to obtain the third evaluation score After-score.
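Steps S702 to S704 can be summarized in a short Python sketch. The function name and the nested-list input layout (one inner list of K expert scores per output result) are assumptions made for illustration.

```python
import math
from typing import Sequence


def third_evaluation_score(ratings: Sequence[Sequence[float]]) -> float:
    """After-score: square root of the mean, over L outputs, of the summed squared deviations."""
    if not ratings or any(len(scores) == 0 for scores in ratings):
        raise ValueError("need at least one output result with at least one score")
    accumulated = []
    for scores in ratings:
        mean = sum(scores) / len(scores)                          # S702: average of K scores
        accumulated.append(sum((s - mean) ** 2 for s in scores))  # S702: sum of K squares
    average = sum(accumulated) / len(accumulated)                 # S703: mean of L sums
    return math.sqrt(average)                                     # S704: square root
```

For output result A alone (scores 9, 8, 9, 10, 9) the accumulated value is 2, so the function returns the square root of 2. Adding a second output result on which all experts agree (accumulated value 0) lowers the average to 1 and the score to 1.0.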
After the first, second, and third evaluation scores are obtained, step S140 may be entered to obtain a quality evaluation result based on the first, second, and third evaluation scores.
In this step, reference may be made to fig. 8, and fig. 8 is a flow chart showing how quality evaluation results are obtained according to the first evaluation score, the second evaluation score, and the third evaluation score in the embodiment of the present disclosure, including step S801 to step S802:
in step S801, a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score are determined.
In this step, referring to fig. 9, fig. 9 is a flowchart illustrating how to determine a first weight corresponding to a first evaluation score, a second weight corresponding to a second evaluation score, and a third weight corresponding to a third evaluation score in the embodiment of the present disclosure, including steps S901-S905:
in step S901, the first evaluation score, the second evaluation score, and the third evaluation score are data-normalized, and the first normalized score, the second normalized score, and the third normalized score are obtained.
In this step, the first evaluation score, the second evaluation score, and the third evaluation score may be normalized to the [0,1] interval to obtain a first normalized score, a second normalized score, and a third normalized score.
In step S902, an information entropy corresponding to the first normalized score is calculated, and a first relative entropy corresponding to the information entropy is determined.
In this step, the information entropy corresponding to the first normalized score may be calculated based on the following formula 3, and the first relative entropy corresponding to the information entropy may be calculated based on the following formula 4:
E_i = -(x_i × log(x_i)) (formula 3)
Wherein when i is 1, x is as defined above i Representing a first normalized score, E i Representing the information entropy corresponding to the first normalized score, R i And the first relative entropy corresponding to the information entropy is represented.
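Formula 3 can be evaluated as in the sketch below, which covers only the information entropy term and not formula 4. Two points are assumptions rather than statements of the disclosure: the logarithm is taken as the natural logarithm, and a normalized score of 0 is given entropy 0 (the usual convention that 0 × log 0 = 0), because log(0) is undefined. The function name is hypothetical.

```python
import math


def information_entropy(x: float) -> float:
    """E_i = -(x_i * log(x_i)) for a normalized score x_i in [0, 1] (formula 3)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("normalized score must lie in [0, 1]")
    if x == 0.0:
        return 0.0  # assumed convention: 0 * log(0) is treated as 0
    return -(x * math.log(x))  # natural logarithm assumed
```

Under these assumptions the entropy is 0 at both ends of the interval and positive in between.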
In step S903, an information entropy corresponding to the second normalized score is calculated, and a second relative entropy corresponding to the information entropy is determined.
In this step, similarly, the relevant explanation of step S902 may be referred to, and when i is taken to be 2, the information entropy corresponding to the second normalized score is calculated by means of the above-described formula 3, and the second relative entropy is calculated based on the formula 4.
In step S904, an information entropy corresponding to the third normalized score is calculated, and a third relative entropy corresponding to the information entropy is determined.
In this step, similarly, the relevant explanation of step S902 may be referred to, and when i takes 3, the information entropy corresponding to the third normalized score is calculated by means of the above-described formula 3, and the third relative entropy is calculated based on the formula 4.
In step S905, the first weight, the second weight, and the third weight are determined according to the first relative entropy, the second relative entropy, and the third relative entropy.
In this step, normalization processing may be performed on the first relative entropy, the second relative entropy, and the third relative entropy, to determine a first weight, a second weight, and a third weight.
Specifically, the accumulated value (sum) of the first relative entropy, the second relative entropy, and the third relative entropy can be obtained first. Then, a first weight w4 is determined according to the ratio between the first relative entropy and the accumulated value, a second weight w5 is determined according to the ratio between the second relative entropy and the accumulated value, and a third weight w6 is determined according to the ratio between the third relative entropy and the accumulated value.
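The normalization in step S905 can be sketched in Python as follows. The function name is hypothetical, and the three relative entropies are taken as given inputs; how they are computed is outside this sketch.

```python
from typing import Tuple


def weights_from_relative_entropies(r1: float, r2: float, r3: float) -> Tuple[float, float, float]:
    """Return (w4, w5, w6): each relative entropy divided by the sum of all three."""
    total = r1 + r2 + r3
    if total == 0:
        raise ValueError("sum of relative entropies must be non-zero")
    return r1 / total, r2 / total, r3 / total
```

Because each weight is a share of the same total, the three weights sum to 1; relative entropies of 1, 1, and 2 yield weights of 0.25, 0.25, and 0.5.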
After determining the first weight, the second weight, and the third weight, step S802 may be performed to perform weighting processing on the first evaluation score, the second evaluation score, and the third evaluation score according to the first weight, the second weight, and the third weight, so as to obtain a quality evaluation result.
In this step, the product value between the first weight and the first evaluation score may be calculated, the product value between the second weight and the second evaluation score may be calculated, and the product value between the third weight and the third evaluation score may be calculated, and further, the quality evaluation result may be determined according to the accumulated values of the three product values.
After the quality evaluation result is obtained, the first evaluation score, the second evaluation score, the third evaluation score and the quality evaluation result can be jointly displayed to a constructor of the multi-modal knowledge graph, so that the constructor can purposefully refine, complement or improve the constructed multi-modal knowledge graph based on the data, and the quality of the multi-modal knowledge graph is further improved.
Based on the technical scheme, the method has at least the following technical effects:
According to the method and the device, quality evaluation is carried out before, during, and after construction of the multi-modal knowledge graph, so that the whole construction process of the multi-modal knowledge graph is subject to quality evaluation and control. The technical problems of large semantic differences among multi-modal data and low accuracy of the extracted knowledge can thereby be effectively mitigated, and the knowledge construction accuracy and the construction quality of the knowledge graph are improved.
Based on the above technical solutions, the present disclosure may be applicable at least to the following application scenarios:
the method can be applied to a scene of multi-mode data knowledge graph construction, and can solve the problems of large semantic difference among multi-mode data, low accuracy of knowledge extraction and low coverage rate by performing quality evaluation operation on the multi-mode knowledge graph. By carrying out quality evaluation on the whole process of multi-modal knowledge graph construction, data errors and detection errors in the knowledge graph construction process can be effectively detected, so that the multi-modal knowledge graph is improved in a targeted manner, the quality of the multi-modal knowledge graph is improved continuously, richer data are provided for an application end, and the application quality of the multi-modal data is improved.
The present disclosure further provides a quality evaluation device for a multi-modal knowledge graph, and fig. 10 shows a schematic structural diagram of the quality evaluation device for a multi-modal knowledge graph in an exemplary embodiment of the present disclosure. As shown in fig. 10, the quality evaluation device 1000 of the multi-modal knowledge graph may include a first evaluation module 1010, a second evaluation module 1020, a third evaluation module 1030, and a result output module 1040.
Wherein:
the first evaluation module 1010 is configured to perform quality evaluation on a multi-modal dataset for constructing a multi-modal knowledge graph after acquiring the multi-modal dataset, so as to obtain a first evaluation score;
a second evaluation module 1020, configured to perform quality evaluation on intermediate process data obtained based on the multi-modal dataset in a process of constructing the multi-modal knowledge graph by using the multi-modal dataset, to obtain a second evaluation score;
a third evaluation module 1030, configured to perform quality evaluation on an output result of the multi-modal knowledge-graph after obtaining the constructed multi-modal knowledge-graph, to obtain a third evaluation score;
and a result output module 1040, configured to obtain a quality evaluation result according to the first evaluation score, the second evaluation score, and the third evaluation score.
In an exemplary embodiment of the present disclosure, the first evaluation module 1010 is configured to:
collecting original multi-mode data sets from a plurality of service channels in a target service field; preprocessing the original multi-modal data set to obtain the multi-modal data set.
In an exemplary embodiment of the present disclosure, the multi-modal dataset includes a plurality of pieces of sample data; the first evaluation module 1010 performs quality evaluation on the multi-modal dataset to obtain the first evaluation score, including: acquiring index scores of the multi-modal dataset corresponding to each of at least two quality evaluation indexes; and weighting at least two index scores corresponding to the at least two quality evaluation indexes to obtain the first evaluation score.
In an exemplary embodiment of the present disclosure, the quality assessment indicator comprises a data integrity indicator; the first evaluation module 1010 obtains an index score for the multimodal dataset corresponding to each of at least two quality assessment indices, comprising: carrying out integrity detection on each sample data in the multi-mode data set to obtain a data quantity passing through the integrity detection; an index score of the multimodal dataset corresponding to the data integrity index is determined from a ratio between an amount of data passing the integrity detection and an overall amount of data of the plurality of sample data.
In an exemplary embodiment of the present disclosure, the quality assessment indicator comprises a data consistency indicator; the first evaluation module 1010 obtains an index score for the multimodal dataset corresponding to each of at least two quality assessment indices, comprising: performing similarity detection on any two pieces of sample data in the multi-mode data set to obtain an accumulated value of similarity between any two pieces of sample data; acquiring the product of the total data quantity of the plurality of pieces of sample data and the associated value thereof; and acquiring a ratio between the accumulated value and the product, and determining an index score of the multi-mode data set corresponding to the data consistency index according to a difference value between a preset numerical value and the ratio.
In an exemplary embodiment of the present disclosure, the sample data in the multimodal dataset includes at least two of: text sample data, image sample data, and audio sample data; the quality-assessment indicators include data accuracy indicators, and the first assessment module 1010 obtains an indicator score for the multi-modal dataset corresponding to each of the at least two quality-assessment indicators, including: performing error text detection on the text sample data to obtain a first number of error sample data in the text sample data; performing image quality detection on the image sample data to obtain a second number of image sample data which do not pass the image quality detection; performing audio quality detection on the audio sample data to obtain a third number of audio sample data which do not pass the audio quality detection; acquiring accumulated values of the first quantity, the second quantity and the third quantity; determining a data error rate corresponding to the multi-mode data set according to a ratio between the accumulated value and a total data amount of the plurality of pieces of sample data; and determining an index score of the multi-mode data set corresponding to the data accuracy index according to a difference value between a preset numerical value and the data error rate.
In an exemplary embodiment of the present disclosure, the second evaluation module 1020 performs a quality evaluation on the intermediate process data obtained based on the multi-modal dataset to obtain a second evaluation score, including: after extracting entity association relations from the multi-mode data set, acquiring a first quality score corresponding to the entity association relations; after generating an entity link result according to the entity and the entity association relationship, acquiring a second quality score of the entity link result; after the multi-mode knowledge graph is generated according to the entity link result, a third quality score of connectivity of the entity association relationship in the multi-mode knowledge graph is obtained; and weighting the first quality score, the second quality score and the third quality score to obtain the second evaluation score.
In an exemplary embodiment of the present disclosure, the third evaluation module 1030 performs quality evaluation on the output result of the multi-modal knowledge-graph to obtain a third evaluation score, including: collecting L output results of the multi-mode knowledge graph within a preset time period, and obtaining K scoring results of K field experts aiming at each output result; l and K are integers greater than 0; calculating the average value of the K scoring results aiming at each output result, calculating the square value of the difference between each scoring result and the average value, and determining the accumulated value of the K square values; acquiring the average value of L accumulated values corresponding to the L output results; and carrying out root-opening operation on the average value of the L accumulated values to obtain the third evaluation score.
In an exemplary embodiment of the present disclosure, the result output module 1040 obtains a quality assessment result according to the first assessment score, the second assessment score, and the third assessment score, including: determining a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score; and weighting the first evaluation score, the second evaluation score and the third evaluation score according to the first weight, the second weight and the third weight to obtain the quality evaluation result.
In an exemplary embodiment of the present disclosure, the result output module 1040 determines a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score, including: data normalization is carried out on the first evaluation score, the second evaluation score and the third evaluation score, and a first normalization score, a second normalization score and a third normalization score are obtained; calculating information entropy corresponding to the first normalized score, and determining a first relative entropy corresponding to the information entropy; calculating information entropy corresponding to the second normalized score, and determining second relative entropy corresponding to the information entropy; calculating information entropy corresponding to the third normalized score, and determining a third relative entropy corresponding to the information entropy; and determining the first weight, the second weight and the third weight according to the first relative entropy, the second relative entropy and the third relative entropy.
In an exemplary embodiment of the present disclosure, the result output module 1040 determines the first weight, the second weight, and the third weight according to the first relative entropy, the second relative entropy, and the third relative entropy, including: acquiring accumulated values among the first relative entropy, the second relative entropy and the third relative entropy; determining the first weight according to the ratio between the first relative entropy and the accumulated value; determining the second weight according to the ratio between the second relative entropy and the accumulated value; and determining the third weight according to the ratio between the third relative entropy and the accumulated value.
The specific details of each module in the quality evaluation device of the multi-modal knowledge graph are described in detail in the quality evaluation method of the corresponding multi-modal knowledge graph, so that the details are not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
The present application also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may exist alone without being incorporated into the electronic device.
The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by one such electronic device, cause the electronic device to implement the methods described in the embodiments above.
In addition, an electronic device capable of realizing the method is provided in the embodiment of the disclosure.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
An electronic device 1100 according to such an embodiment of the present disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 11, the electronic device 1100 is embodied in the form of a general purpose computing device. Components of electronic device 1100 may include, but are not limited to: the at least one processing unit 1110, the at least one memory unit 1120, a bus 1130 connecting the different system components (including the memory unit 1120 and the processing unit 1110), and a display unit 1140.
Wherein the storage unit stores program code that is executable by the processing unit 1110 such that the processing unit 1110 performs steps according to various exemplary embodiments of the present disclosure described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 1110 may perform the steps as shown in fig. 1: step S110, after a multi-modal data set for constructing a multi-modal knowledge graph is acquired, performing quality evaluation on the multi-modal data set to obtain a first evaluation score; step S120, in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set, performing quality evaluation on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score; step S130, after the constructed multi-mode knowledge graph is obtained, performing quality evaluation on the output result of the multi-mode knowledge graph to obtain a third evaluation score; and step S140, obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
The storage unit 1120 may include a readable medium in the form of a volatile storage unit, such as a Random Access Memory (RAM) 11201 and/or a cache memory 11202, and may further include a Read Only Memory (ROM) 11203.
The storage unit 1120 may also include a program/utility 11204 having a set (at least one) of program modules 11205, such program modules 11205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 1130 may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a bus using any of a variety of bus architectures.
The electronic device 1100 may also communicate with one or more external devices 1200 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1150. The electronic device 1100 can also communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through the network adapter 1160. As shown, the network adapter 1160 communicates with the other modules of the electronic device 1100 via the bus 1130. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 1100, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims (14)

1. A quality assessment method for a multi-modal knowledge graph, characterized by comprising the following steps:
after a multi-modal data set for constructing a multi-modal knowledge graph is acquired, performing quality evaluation on the multi-modal data set to obtain a first evaluation score;
in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set, performing quality evaluation on intermediate process data obtained based on the multi-modal data set to obtain a second evaluation score;
after the constructed multi-modal knowledge graph is obtained, performing quality evaluation on the output result of the multi-modal knowledge graph to obtain a third evaluation score;
and obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
2. The method according to claim 1, wherein the method further comprises:
collecting an original multi-modal data set from a plurality of service channels in a target service field;
preprocessing the original multi-modal data set to obtain the multi-modal data set.
3. The method of claim 1, wherein the multimodal dataset comprises a plurality of pieces of sample data;
the performing quality evaluation on the multi-modal dataset to obtain the first evaluation score comprises:
acquiring index scores of the multi-modal dataset corresponding to each of at least two quality evaluation indexes;
and weighting at least two index scores corresponding to the at least two quality evaluation indexes to obtain the first evaluation score.
4. A method according to claim 3, wherein the quality assessment indicator comprises a data integrity indicator;
the obtaining an index score of the multi-modal dataset corresponding to each of at least two quality assessment indices includes:
performing integrity detection on each piece of sample data in the multi-modal dataset to obtain the amount of data passing the integrity detection;
determining an index score of the multi-modal dataset corresponding to the data integrity index according to a ratio between the amount of data passing the integrity detection and the total amount of data of the plurality of pieces of sample data.
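The ratio described in claim 4 can be illustrated with a minimal Python sketch. The claim does not say how integrity is detected, so the `has_required_fields` check and the `REQUIRED_FIELDS` names below are hypothetical placeholders, not part of the disclosure.

```python
from typing import Callable, Sequence

# Hypothetical field names; the claim does not specify what "integrity" checks.
REQUIRED_FIELDS = ("id", "text", "image_path")


def has_required_fields(sample: dict) -> bool:
    """Assumed integrity check: every required field is present and non-empty."""
    return all(sample.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def integrity_score(samples: Sequence[dict],
                    is_complete: Callable[[dict], bool]) -> float:
    """Ratio of samples passing the integrity check to all samples."""
    if not samples:
        raise ValueError("the dataset must contain at least one sample")
    passed = sum(1 for sample in samples if is_complete(sample))
    return passed / len(samples)
```

Passing the check as a callable keeps the ratio itself independent of how completeness is defined for text, image, or audio samples.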
5. A method according to claim 3, wherein the quality assessment indicator comprises a data consistency indicator;
the obtaining an index score of the multi-modal dataset corresponding to each of at least two quality assessment indices includes:
performing similarity detection on any two pieces of sample data in the multi-modal data set to obtain an accumulated value of similarity between any two pieces of sample data;
acquiring the product of the total data quantity of the plurality of pieces of sample data and the associated value thereof;
and acquiring a ratio between the accumulated value and the product, and determining an index score of the multi-modal data set corresponding to the data consistency index according to a difference value between a preset numerical value and the ratio.
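The computation in claim 5 might look like the following Python sketch. Several points are assumptions rather than statements of the claim: the "associated value" is taken to be N - 1 for N samples (so the product N(N - 1) counts ordered pairs), the preset numerical value is taken to be 1, and the `jaccard` token-overlap function is only a stand-in for whatever similarity detection is actually used.

```python
from itertools import permutations
from typing import Callable, Sequence


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def consistency_score(samples: Sequence[str],
                      similarity: Callable[[str, str], float],
                      preset: float = 1.0) -> float:
    """preset - (accumulated pairwise similarity) / (N * (N - 1)).

    Assumes the claim's "associated value" is N - 1 and that similarity is
    accumulated over ordered pairs (i, j) with i != j.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed for pairwise similarity")
    accumulated = sum(similarity(a, b) for a, b in permutations(samples, 2))
    return preset - accumulated / (n * (n - 1))
```

Under these assumptions the ratio is the mean pairwise similarity, so the score falls as samples become more alike.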
6. A method according to claim 3, wherein the sample data in the multimodal dataset comprises at least two of: text sample data, image sample data, and audio sample data;
the quality assessment index comprises a data accuracy index, and the obtaining an index score of the multi-modal dataset for each of at least two quality assessment indices includes:
performing error text detection on the text sample data to obtain a first number of error sample data in the text sample data;
performing image quality detection on the image sample data to obtain a second number of image sample data which do not pass the image quality detection;
performing audio quality detection on the audio sample data to obtain a third number of audio sample data which do not pass the audio quality detection;
acquiring accumulated values of the first quantity, the second quantity and the third quantity;
determining a data error rate corresponding to the multi-modal data set according to a ratio between the accumulated value and a total data amount of the plurality of pieces of sample data;
and determining an index score of the multi-modal data set corresponding to the data accuracy index according to a difference value between a preset numerical value and the data error rate.
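The arithmetic of claim 6 is sketched below in Python. The three counts are assumed to come from separate text, image, and audio detectors that are outside this sketch, and the preset numerical value is assumed to be 1; neither is fixed by the claim.

```python
def accuracy_score(text_errors: int,
                   image_failures: int,
                   audio_failures: int,
                   total_samples: int,
                   preset: float = 1.0) -> float:
    """preset - (first + second + third number) / total number of samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    counts = (text_errors, image_failures, audio_failures)
    if any(count < 0 for count in counts):
        raise ValueError("failure counts must be non-negative")
    accumulated = sum(counts)
    if accumulated > total_samples:
        raise ValueError("failure counts cannot exceed the number of samples")
    error_rate = accumulated / total_samples
    return preset - error_rate
```

For instance, 3 erroneous texts, 1 rejected image and 1 rejected audio clip out of 100 samples give an error rate of 0.05.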
7. The method of claim 1, wherein the performing a quality assessment of the intermediate process data obtained based on the multimodal dataset to obtain a second assessment score comprises:
after extracting entity association relations from the multi-modal data set, acquiring a first quality score corresponding to the entity association relations;
after generating an entity link result according to the entity and the entity association relationship, acquiring a second quality score of the entity link result;
after the multi-modal knowledge graph is generated according to the entity link result, acquiring a third quality score of connectivity of the entity association relationship in the multi-modal knowledge graph;
and weighting the first quality score, the second quality score and the third quality score to obtain the second evaluation score.
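As an illustration of claim 7, the Python sketch below weights three intermediate quality scores into the second evaluation score. The claim does not define how connectivity is scored; the `connectivity_score` function uses the share of entities in the largest connected component purely as an assumed example metric, and the default equal weights are likewise an assumption.

```python
import math
from collections import defaultdict, deque
from typing import Hashable, Iterable, Sequence, Tuple


def connectivity_score(entities: Iterable[Hashable],
                       relations: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Assumed metric: share of entities in the largest connected component.

    Every relation endpoint is expected to appear in `entities`.
    """
    nodes = set(entities)
    if not nodes:
        raise ValueError("the graph must contain at least one entity")
    adjacency = defaultdict(set)
    for head, tail in relations:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(nodes)


def second_evaluation_score(relation_score: float,
                            linking_score: float,
                            connectivity: float,
                            weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three intermediate quality scores."""
    if len(weights) != 3 or not math.isclose(sum(weights), 1.0):
        raise ValueError("exactly three weights summing to 1 are required")
    scores = (relation_score, linking_score, connectivity)
    return sum(w * s for w, s in zip(weights, scores))
```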
8. The method according to claim 1, wherein performing quality evaluation on the output result of the multi-modal knowledge-graph to obtain a third evaluation score includes:
collecting L output results of the multi-modal knowledge graph within a preset time period, and obtaining K scoring results given by K domain experts for each output result, L and K being integers greater than 0;
for each output result, calculating the average value of the K scoring results, calculating the square of the difference between each scoring result and the average value, and determining the accumulated value of the K squared values;
acquiring the average value of the L accumulated values corresponding to the L output results;
and taking the square root of the average value of the L accumulated values to obtain the third evaluation score.
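Read literally, claim 8 computes sqrt((1/L) * sum over outputs of sum over experts of (score - mean)^2). The Python sketch below follows that reading; the function name and the nested-list input layout (one inner list of K expert scores per output result) are assumptions made for illustration.

```python
import math
from statistics import fmean
from typing import Sequence


def third_evaluation_score(scores: Sequence[Sequence[float]]) -> float:
    """scores[l][k] is the score given by expert k to output result l."""
    if not scores:
        raise ValueError("at least one output result is required")
    accumulated_values = []
    for expert_scores in scores:
        if not expert_scores:
            raise ValueError("each output result needs at least one expert score")
        mean = fmean(expert_scores)
        # Accumulated squared deviation of the K scores from their mean.
        accumulated_values.append(sum((s - mean) ** 2 for s in expert_scores))
    # Square root of the average of the L accumulated values.
    return math.sqrt(fmean(accumulated_values))
```

Note that, as claimed, the value measures how far expert scores spread around their mean for each output, so identical scores from all experts yield 0.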
9. The method of claim 1, wherein the obtaining a quality assessment result based on the first assessment score, the second assessment score, and the third assessment score comprises:
determining a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score;
and weighting the first evaluation score, the second evaluation score and the third evaluation score according to the first weight, the second weight and the third weight to obtain the quality evaluation result.
10. The method of claim 9, wherein the determining a first weight corresponding to the first evaluation score, a second weight corresponding to the second evaluation score, and a third weight corresponding to the third evaluation score comprises:
data normalization is carried out on the first evaluation score, the second evaluation score and the third evaluation score, and a first normalization score, a second normalization score and a third normalization score are obtained;
calculating information entropy corresponding to the first normalized score, and determining a first relative entropy corresponding to the information entropy;
calculating information entropy corresponding to the second normalized score, and determining second relative entropy corresponding to the information entropy;
calculating information entropy corresponding to the third normalized score, and determining a third relative entropy corresponding to the information entropy;
and determining the first weight, the second weight and the third weight according to the first relative entropy, the second relative entropy and the third relative entropy.
11. The method of claim 10, wherein the determining the first, second, and third weights from the first, second, and third relative entropies comprises:
acquiring accumulated values among the first relative entropy, the second relative entropy and the third relative entropy;
determining the first weight according to the ratio between the first relative entropy and the accumulated value;
determining the second weight according to the ratio between the second relative entropy and the accumulated value;
and determining the third weight according to the ratio between the third relative entropy and the accumulated value.
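Claims 9 to 11 do not give formulas for the normalization, the information entropy, or the "relative entropy". The Python sketch below therefore substitutes the conventional entropy weight method applied to several hypothetical evaluation rounds: each score column is normalized to proportions, its entropy is scaled by ln(m) for m rounds, and 1 minus that entropy is used as the quantity the claims call relative entropy. Only the final step, dividing each relative entropy by the accumulated value, is taken directly from claim 11.

```python
import math
from typing import List, Sequence


def entropy_weights(rounds: Sequence[Sequence[float]]) -> List[float]:
    """rounds[i][j] is the j-th evaluation score in the i-th evaluation round."""
    m = len(rounds)
    if m < 2:
        raise ValueError("the entropy weight method needs at least two rounds")
    n = len(rounds[0])
    relative = []
    for j in range(n):
        column = [row[j] for row in rounds]
        total = sum(column)
        if min(column) < 0 or total <= 0:
            raise ValueError("scores must be non-negative with a positive sum")
        proportions = [x / total for x in column]  # normalization step
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(m)
        # Assumed meaning of "relative entropy": degree of divergence 1 - e_j.
        relative.append(max(0.0, 1.0 - entropy))
    accumulated = sum(relative)
    if accumulated < 1e-12:  # all scores constant: fall back to equal weights
        return [1.0 / n] * n
    return [r / accumulated for r in relative]


def quality_result(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted combination of the evaluation scores (claim 9)."""
    return sum(w * s for w, s in zip(weights, scores))
```

With this substitution, a score that barely changes across rounds receives a small weight, while a score that varies strongly receives a large one.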
12. A quality assessment device for a multi-modal knowledge graph, comprising:
the first evaluation module is used for carrying out quality evaluation on the multi-modal data set after the multi-modal data set for constructing the multi-modal knowledge graph is acquired, so as to obtain a first evaluation score;
the second evaluation module is used for performing quality evaluation on intermediate process data obtained based on the multi-modal data set in the process of constructing the multi-modal knowledge graph by utilizing the multi-modal data set to obtain a second evaluation score;
the third evaluation module is used for carrying out quality evaluation on the output result of the multi-modal knowledge graph after the constructed multi-modal knowledge graph is obtained, so as to obtain a third evaluation score;
and the result output module is used for obtaining a quality evaluation result according to the first evaluation score, the second evaluation score and the third evaluation score.
13. A computer storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for quality assessment of a multimodal knowledge graph according to any one of claims 1-11.
14. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the quality assessment method of the multimodal knowledge graph of any of claims 1-11 via execution of the executable instructions.
CN202310862945.3A 2023-07-13 2023-07-13 Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment Pending CN116805012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310862945.3A CN116805012A (en) 2023-07-13 2023-07-13 Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN116805012A (en) 2023-09-26

Family

ID=88079501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310862945.3A Pending CN116805012A (en) 2023-07-13 2023-07-13 Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN116805012A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033668A (en) * 2023-10-07 2023-11-10 之江实验室 Knowledge graph quality assessment method and device, storage medium and electronic equipment
CN117033668B (en) * 2023-10-07 2024-01-26 之江实验室 Knowledge graph quality assessment method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination