CN117033724B

CN117033724B - Multi-mode data retrieval method based on semantic association

Info

Publication number: CN117033724B
Application number: CN202311071657.2A
Authority: CN
Inventors: 张鸡环
Original assignee: Guangzhou Joysim Technology Co ltd
Current assignee: Guangzhou Joysim Technology Co ltd
Priority date: 2023-08-24
Filing date: 2023-08-24
Publication date: 2024-05-03
Anticipated expiration: 2043-08-24
Also published as: CN117033724A

Abstract

The invention discloses a multi-modal data retrieval method based on semantic association, which relates to the technical field of multi-modal data retrieval and comprises the following steps: collecting modal data information and retrieval evaluation index information while a semantic-association-based multi-modal data retrieval system is operating; comprehensively analyzing this information to generate accuracy evaluation indexes; establishing a data set of the accuracy evaluation indexes and comprehensively analyzing the indexes in the set to generate running state signals; and sending a different prompt for each running state signal. By evaluating the accuracy of the system's semantic association modeling, the invention allows a drop in accuracy to be detected promptly and prompts the relevant maintenance personnel to take corresponding maintenance and optimization measures. This safeguards the accuracy of the semantic association modeling, so that the model captures the semantic associations among data well, and it prevents both a decline in the relevance of the returned retrieval results to the user query and the delivery of misleading retrieval results to the user.

Description

Multi-mode data retrieval method based on semantic association

Technical Field

The invention relates to the technical field of multi-mode data retrieval, in particular to a multi-mode data retrieval method based on semantic association.

Background

A multi-modal data retrieval system based on semantic association is a comprehensive software system that establishes semantic associations among various data types (text, images, audio and the like), so that related cross-modal data can be retrieved when a user provides a query. Such a system can give the user a more accurate, comprehensive and intelligent information retrieval experience.

A multi-modal data retrieval method based on semantic association generally comprises data preprocessing and feature extraction, semantic association modeling, query and retrieval, and feedback and optimization. Data of each modality must first be preprocessed and have its features extracted, so that the data is represented in a vector form suitable for processing. For example, for text data, natural language processing techniques (e.g., word embedding, TF-IDF, etc.) may be used to convert text to a vector representation. For image data, a Convolutional Neural Network (CNN) may be used to extract image features, while for audio data, a spectrogram or MFCC method may be used to extract audio features. The objective of semantic association modeling is to map data of different modalities into a shared semantic space. After semantic association modeling, when a user provides a query, the query data is converted into the shared semantic representation and matched against the data in the database that has already been converted into that representation. Finally, the semantic associations of the model can be further optimized according to user feedback.

The prior art has the following defects:

Semantic association modeling is the core of a multi-modal data retrieval system, yet when the accuracy of the semantic association modeling is poor, the system cannot detect this in time. Poor modeling accuracy means that the model cannot capture the semantic associations between data well, so the relevance of the retrieval results returned by the system to the user query declines. This degrades the retrieval accuracy of the whole multi-modal data retrieval system and may cause misleading retrieval results to be provided to users.

The above information disclosed in the background section is only for enhancement of understanding of the background of the disclosure and therefore it may include information that does not form the prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a multi-modal data retrieval method based on semantic association that evaluates the accuracy of the multi-modal data retrieval system's semantic association modeling. When that accuracy declines, the decline is detected in time and the relevant maintenance personnel are prompted to take corresponding maintenance and optimization measures. This safeguards the accuracy of the semantic association modeling, ensures that the model captures the semantic associations between data well, prevents a decline in the relevance of the returned retrieval results to the user query from degrading the retrieval accuracy of the whole system, and prevents misleading retrieval results from being provided to the user, thereby solving the problems described in the background.

In order to achieve the above object, the present invention provides the following technical solutions: the multi-mode data retrieval method based on semantic association comprises the following steps:

S100, collecting a plurality of items of information while the semantic-association-based multi-modal data retrieval system is operating, the information comprising modal data information and retrieval evaluation index information, and processing the modal data information and the retrieval evaluation index information after collection;

S200, comprehensively analyzing the processed modal data information and retrieval evaluation index information obtained during operation of the multi-modal data retrieval system to generate an accuracy evaluation index;

S300, establishing a data set from the plurality of accuracy evaluation indexes generated during operation of the multi-modal data retrieval system, and comprehensively analyzing the accuracy evaluation indexes in the data set to generate a running state signal;

S400, sending a different prompt for each running state signal generated during operation of the multi-modal data retrieval system.

Preferably, the modal data information comprises a modal sample data volume balance coefficient and a modal data volume similarity degree anomaly coefficient, which after acquisition are denoted PH^MT and XS^MT respectively; the retrieval evaluation index information comprises a retrieval recall rate anomaly concealment coefficient, which after acquisition is denoted JS^YC.

Preferably, the logic for obtaining the modal sample data size balance coefficient is as follows:

S101, acquiring the sample data amount of each modality at the same moment during operation of the multi-modal data retrieval system, and denoting it β^SJ_x, where x is the index of the modality, x = 1, 2, 3, 4, ..., m, and m is a positive integer;

S102, calculating the standard deviation of the sample data amounts across the different modalities at the same moment during operation of the multi-modal data retrieval system, and denoting it R. The standard deviation is taken about the mean $\bar{\beta}^{SJ}$ of the sample data amounts across the different modalities at that moment, which is obtained as:

$$\bar{\beta}^{SJ} = \frac{1}{m}\sum_{x=1}^{m}\beta^{SJ}_{x}$$

S103, obtaining the sample data amount standard deviations generated at different moments within time T while the multi-modal data retrieval system operates, and denoting them R_y, where y is the index of the standard deviation, y = 1, 2, 3, 4, ..., n, and n is a positive integer;

S104, establishing a data set of the sample data amount standard deviations generated within time T, sorting the standard deviations in the data set by magnitude, and denoting the largest standard deviation in the data set R_max;

S105, calculating the modal sample data volume balance coefficient from the maximum sample data amount standard deviation R_max in the data set, wherein the calculated expression is as follows:

Preferably, the logic for obtaining the similarity degree anomaly coefficient of the modal data volume is as follows:

S201, converting the data of all modalities into vector representations;

S202, normalizing each vector to unit norm, so that data of different modalities carry the same weight in the distance calculation;

S203, for each modality, calculating the internal Euclidean distance of the modality;

For the i-th modality, assuming that its vector is expressed as A_iv, the Euclidean distance is calculated as:

$$Distance = \sqrt{\sum_{v=1}^{p}\left(A_{iv} - A'_{iv}\right)^{2}}$$

where A'_iv is the element of another vector of the same modality that corresponds to A_iv, v is the index of the dimension of the i-th modality's vectors, v = 1, 2, 3, 4, ..., p, and p is a positive integer.

S204, acquiring the internal Euclidean distances of each modality of the multi-modal data retrieval system at different moments within time T, and denoting them Distance_j, where j is the index of the internal Euclidean distance, j = 1, 2, 3, 4, ..., q, and q is a positive integer;

S205, establishing a data set of the internal Euclidean distances of each modality within time T, sorting the internal Euclidean distances in the data set by magnitude, and denoting the largest internal Euclidean distance of each modality Distance_max;

S206, calculating the modal data volume similarity degree anomaly coefficient, wherein the calculated expression is as follows: where x is the index of the modality of the multi-modal data retrieval system, x = 1, 2, 3, 4, ..., m, and m is a positive integer.

Preferably, the logic for obtaining the retrieval recall rate anomaly concealment coefficient is as follows:

S301, acquiring the optimal retrieval recall rate range of the multi-modal data retrieval system, and denoting it γ^ZH_min to γ^ZH_max;

S302, acquiring the retrieval recall rates of the multi-modal data retrieval system in different time periods within time T, and denoting them γ^ZH_r, where r is the index of the retrieval recall rate, r = 1, 2, 3, 4, ..., a, and a is a positive integer;

The recall rate is calculated as: recall rate = number of relevant data items retrieved / total number of relevant data items;

S303, denoting each retrieval recall rate that falls below the optimal retrieval recall rate range γ^ZH_min to γ^ZH_max (that is, below γ^ZH_min) as γ^ZH_u, where u is the index of such a retrieval recall rate, u = 1, 2, 3, 4, ..., e, and e is a positive integer;

S304, calculating the retrieval recall rate anomaly concealment coefficient, wherein the calculated expression is as follows:

Preferably, after the modal sample data volume balance coefficient PH^MT, the modal data volume similarity degree anomaly coefficient XS^MT and the retrieval recall rate anomaly concealment coefficient JS^YC are obtained, an evaluation model is established, and an accuracy evaluation index θ^zqd_w is generated according to the following formula:

where x1, x2 and x3 are the preset proportional coefficients of the modal sample data volume balance coefficient PH^MT, the modal data volume similarity degree anomaly coefficient XS^MT and the retrieval recall rate anomaly concealment coefficient JS^YC respectively, and x1, x2 and x3 are all greater than 0.

Preferably, a data set is established from the plurality of accuracy evaluation indexes generated while the multi-modal data retrieval system operates, and the data set is denoted F, so that F = {θ^zqd_w} = {θ^zqd_1, θ^zqd_2, ..., θ^zqd_s}, where w = 1, 2, 3, 4, ..., s and s is a positive integer;

The mean and the standard deviation of the accuracy evaluation indexes in the data set are calculated and denoted P1 and P2 respectively. The mean P1 is compared with a preset accuracy evaluation index reference threshold K1, and the standard deviation P2 is compared with a preset standard deviation reference threshold K2, which yields the following cases:

If P1 is greater than or equal to K1, generating a first running state signal;

If P1 is smaller than K1 and P2 is larger than or equal to K2, generating a second running state signal;

if P1 is less than K1 and P2 is less than K2, a third operating state signal is generated.
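The three cases above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `running_state` and the use of the population standard deviation (`statistics.pstdev`) are assumptions, since the text does not say which form of the standard deviation is intended.

```python
from statistics import fmean, pstdev


def running_state(indexes, k1, k2):
    """Return 1, 2 or 3 for the first, second or third running state signal.

    indexes: the accuracy evaluation indexes collected in the data set F.
    k1: reference threshold K1 for the mean P1.
    k2: reference threshold K2 for the standard deviation P2.
    """
    p1 = fmean(indexes)   # P1: mean of the accuracy evaluation indexes
    p2 = pstdev(indexes)  # P2: standard deviation (population form is an assumption)
    if p1 >= k1:
        return 1          # mean at or above K1
    if p2 >= k2:
        return 2          # mean below K1, spread at or above K2
    return 3              # mean below K1 and spread below K2


print(running_state([0.9, 1.1, 1.0], k1=0.8, k2=0.2))
```

Because the mean is checked first, a data set whose mean reaches K1 always yields the first signal, regardless of its spread.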

Preferably, when the first running state signal is obtained, a first-level accuracy early warning prompt is sent to inform the relevant maintenance personnel that the accuracy of semantic association modeling is poor while the multi-modal data retrieval system operates, and that the system needs to be maintained and optimized in time;

When the second running state signal is obtained, a second-level accuracy early warning prompt is sent to inform the relevant maintenance personnel that the accuracy of semantic association modeling fluctuates between good and poor while the multi-modal data retrieval system operates, that the running state is extremely unstable, and that the system needs to be maintained and optimized in time;

When the third running state signal is obtained, no early warning prompt is sent.
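The mapping from running state signals to prompts can be sketched as a simple lookup. The message wording and the names `PROMPTS` and `prompt_for` are hypothetical; only the rule that the third signal produces no prompt comes from the text.

```python
from typing import Optional

# Hypothetical prompt texts; the source specifies only their level and intent.
PROMPTS = {
    1: "Level-1 accuracy warning: semantic association modeling accuracy is poor; "
       "maintain and optimize the system.",
    2: "Level-2 accuracy warning: modeling accuracy is unstable; "
       "maintain and optimize the system.",
}


def prompt_for(signal: int) -> Optional[str]:
    """Return the warning text for a running state signal, or None for signal 3."""
    if signal not in (1, 2, 3):
        raise ValueError(f"unknown running state signal: {signal}")
    return PROMPTS.get(signal)  # signal 3 has no entry, so no prompt is issued


for s in (1, 2, 3):
    print(s, prompt_for(s))
```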

The technical solution of the invention has the following technical effects and advantages:

By evaluating the accuracy of the multi-modal data retrieval system's semantic association modeling, the invention detects a decline in that accuracy in time and prompts the relevant maintenance personnel to take corresponding maintenance and optimization measures. This safeguards the accuracy of the semantic association modeling, ensures that the model captures the semantic associations among data well, prevents a decline in the relevance of the returned retrieval results to the user query from degrading the retrieval accuracy of the whole system, and prevents misleading retrieval results from being provided to the user;

By establishing a data set of the accuracy conditions of the system's semantic association modeling and analyzing it comprehensively, the invention makes it possible to analyze abnormal conditions in the semantic association modeling, so that the relevant maintenance personnel can understand those conditions and carry out targeted maintenance and optimization, which improves maintenance and optimization efficiency;

Because the data set is analyzed as a whole, an early warning prompt is not sent merely because of a single sudden abnormality in modeling accuracy. This improves the accuracy of the early warning, increases the trust of the relevant maintenance personnel in the early warning prompts, and helps ensure stable and efficient operation of the multi-modal data retrieval system.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for those skilled in the art.

FIG. 1 is a flow chart of the multi-modal data retrieval method based on semantic association.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

The invention provides a multi-mode data retrieval method based on semantic association as shown in figure 1, which comprises the following steps:

The modal data information comprises a modal sample data volume balance coefficient and a modal data volume similarity degree anomaly coefficient, which after acquisition are denoted PH^MT and XS^MT respectively;

In the semantic association modeling stage, a large difference in sample data amount between modalities can seriously affect the accuracy of the multi-modal data retrieval system in the following ways:

Unbalanced sample problem: sample data volume unbalance may cause that information of certain modes is ignored or not fully considered in the modeling process, while other modes with more samples may dominate the learning process of the whole semantic association model, which causes that the response of the system to certain modes is poor in query, thereby affecting the accuracy of the retrieval result;

Overfitting problem: modalities with smaller sample data sizes can easily cause overfitting, and particularly in the case of higher sample dimensions, the model can excessively depend on the characteristics of a small number of samples, so that more data conditions cannot be generalized, higher errors can be generated during actual retrieval, and the robustness of the system is reduced;

Cross-modality consistency is difficult to capture: in semantic association modeling, cross-modal consistency is a key problem, and modalities with smaller data volume often cannot capture consistency information between other modalities well, so that the model is difficult to accurately establish semantic association between modalities, and accuracy of a retrieval result is affected;

Information loss: modalities with a smaller data volume may not fully express their inherent semantic information, resulting in information loss, which affects the accurate understanding of semantic association between the system and the modalities, thereby reducing the overall accuracy performance of the retrieval system;

Therefore, by monitoring the sample data amounts of the multi-modal data retrieval system in the semantic association modeling stage, a large difference in sample data amount between modalities that would affect the accuracy of the semantic association modeling can be detected;

The logic for acquiring the modal sample data volume balance coefficient is as follows:

It should be noted that, for a system that stores its data in a database, a database monitoring tool can be used to monitor the number of samples of each modality in the database in real time. Such tools can report information such as the size, row count and index usage of database tables. For example, DataDog is a cloud monitoring platform that supports multiple databases, including MySQL, PostgreSQL and MongoDB, and provides real-time monitoring and alerting for database performance indexes and data volume information; as another example, Prometheus is an open-source system monitoring and alerting tool that supports a variety of databases, such as MySQL and PostgreSQL, and can obtain sample data amounts and other database metrics in real time through its query language PromQL;


From the sample data amount standard deviation R it follows that the larger the standard deviation of the sample data amounts across the different modalities at the same moment during operation of the multi-modal data retrieval system, the worse the balance of the sample data amounts across the modalities at that moment; conversely, the smaller R is, the better the balance;
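As an illustration of how R and the maximum R_max over a time window might be computed, the following Python sketch takes one snapshot of per-modality sample counts for each moment. The function names, the modality labels and the use of the population standard deviation (`statistics.pstdev`) are assumptions made for the example; the balance coefficient PH^MT itself is not computed here.

```python
from statistics import pstdev


def spread_at_moment(counts_by_modality):
    """R for one moment: standard deviation of sample counts across modalities."""
    return pstdev(list(counts_by_modality.values()))


def max_spread(snapshots):
    """R_max: the largest R among the snapshots taken within time T."""
    return max(spread_at_moment(s) for s in snapshots)


# Two hypothetical snapshots: a balanced moment and an unbalanced one.
snapshots = [
    {"text": 1000, "image": 1000, "audio": 1000},
    {"text": 1200, "image": 900, "audio": 300},
]

for s in snapshots:
    print(s, "R =", spread_at_moment(s))
print("R_max =", max_spread(snapshots))
```

The balanced snapshot gives R = 0, while the unbalanced one gives a large R, matching the interpretation that a larger R means worse balance.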

From the expression for the modal sample data volume balance coefficient it follows that the larger the value of the coefficient generated while the multi-modal data retrieval system operates within time T, the worse the accuracy of the system's semantic association modeling; conversely, the smaller the value, the better the accuracy;

If the Euclidean distance between data is large after semantic association modeling, the accuracy of the semantic association modeling is seriously affected in the following ways:

Low correlation matching: a larger euclidean distance means that the vector representations between the data are far apart in the semantic space, which may result in the associated data not being matched exactly, e.g., a pair of related images and text should be a smaller distance in the semantic space, but if the distance is larger, the system may not be able to match correctly;

Inaccurate search results: in a multi-mode data retrieval task, a user generally hopes that the system can return data of multiple modes related to query, and if the distance of the data in a semantic space is large, the system can return irrelevant or inaccurate data, so that the quality of a retrieval result is reduced;

Misunderstood semantic association: a large distance between data in the semantic space may bias the system's understanding of semantic association, so that the system erroneously treats irrelevant data as relevant or ignores data that is actually relevant, thereby misunderstanding the real semantic association between the data;

Reducing system efficiency: a vector representation with a larger distance may increase the computational burden of the retrieval system, requiring more time and resources for distance computation of high-dimensional vectors on a large-scale dataset, which will reduce the efficiency and response speed of the system;

It should be noted that similarity measures such as cosine similarity, Euclidean distance and Manhattan distance can be computed between the data representations obtained after semantic association modeling. These measures quantify how similar different data are in the semantic space and can therefore be used to evaluate the modeling accuracy;
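For reference, the three measures named above can be written out directly in plain Python. The function names are chosen for this example, and the vectors are assumed to be equal-length lists of numbers.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean([0, 0], [3, 4]))          # 5.0
print(manhattan([0, 0], [3, 4]))          # 7
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Smaller distances and larger cosine similarity both indicate that two representations lie close together in the semantic space.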

Therefore, by monitoring the Euclidean distance between sample data after the semantic association modeling of the multi-modal data retrieval system, an abnormal Euclidean distance between sample data that would affect the accuracy of the semantic association modeling can be detected;

The logic for acquiring the abnormal coefficient of the similarity degree of the modal data volume is as follows:

S201, converting the data of all modalities into vector representations;

This can be achieved by various methods, such as extracting features using a pre-trained deep learning model, or converting text data into a vector representation using word embedding techniques; assuming there are n modalities, a vector representation is obtained for each modality;

S202, normalizing each vector so that data of different modalities carry the same weight in the distance calculation and have unit norm (that is, an L2 norm of 1);

Doing so eliminates order-of-magnitude differences between modalities and makes the distance calculation more robust;

S203, for each modality, calculating the internal Euclidean distance of the modality;

S206, calculating the modal data volume similarity degree anomaly coefficient, wherein the calculated expression is as follows: where x is the index of the modality of the multi-modal data retrieval system, x = 1, 2, 3, 4, ..., m, and m is a positive integer;

From the expression for the modal data volume similarity degree anomaly coefficient it follows that the larger the value of the coefficient generated while the multi-modal data retrieval system operates within time T, the worse the accuracy of the system's semantic association modeling; conversely, the smaller the value, the better the accuracy;
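The normalization and distance steps can be sketched as follows. This is an illustration under stated assumptions: the internal Euclidean distance of a modality is read here as the distance between pairs of vectors belonging to that modality, and the function names are hypothetical. The sketch stops at the largest internal distance and does not attempt the anomaly coefficient of S206.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (the normalization of S202)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def max_internal_distance(vectors):
    """Largest pairwise Euclidean distance among one modality's unit vectors.

    Needs at least two vectors; the pairwise reading of "internal distance"
    is an assumption of this sketch.
    """
    unit = [l2_normalize(v) for v in vectors]
    return max(math.dist(a, b) for a, b in combinations(unit, 2))


# Hypothetical 2-dimensional embeddings of one modality.
image_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(max_internal_distance(image_vectors))
```

After normalization, `[1.0, 0.0]` and `[2.0, 0.0]` coincide, so only direction, not magnitude, contributes to the distance.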

The retrieval evaluation index information comprises a retrieval recall rate anomaly concealment coefficient, which after acquisition is denoted JS^YC;

In a multi-modal data retrieval system, the semantic retrieval stage is the stage in which, after a user inputs a query, the system performs a semantic search and retrieves related data according to the query content. If the retrieval recall rate in this stage is low, that is, the data related to the query cannot be retrieved effectively, the accuracy of the semantic association modeling is seriously affected in the following ways:

Incomplete semantic association modeling: the low recall means that the system fails to retrieve all query-related data, which results in a lack of a portion of important data samples during the semantic association modeling phase, and thus the semantic association model may lack a complete understanding of global data relationships, thereby affecting modeling accuracy and comprehensiveness;

Biased semantic association model: because some relevant data samples are missed, the system is more prone to bias and error during modeling, which may lead the semantic association model to an inaccurate understanding of certain categories or topics;

Data imbalance problem: a low recall rate may lead to an imbalanced distribution of data samples across categories or topics in semantic association modeling, so that the model performs better on some categories and worse on others, making the model's performance unstable and unreliable;

The accuracy of the search result is reduced: the accuracy of semantic association modeling depends on the effect of the retrieval stage, and if the retrieval recall rate is low, the model can be modeled in limited data samples, and the data samples can not necessarily represent the whole semantic association, so that the accuracy of the final retrieval result is reduced;

Therefore, by monitoring the retrieval recall rate of the multi-modal data retrieval system after semantic association modeling, an abnormal retrieval recall rate that would affect the accuracy of the semantic association modeling can be detected;

The logic for obtaining the retrieval recall rate anomaly concealment coefficient is as follows:

It should be noted that the optimal retrieval recall rate range of the multi-modal data retrieval system can be obtained by designing controlled experiments that compare different recall thresholds or other semantic association models, and by evaluating performance under different recall ranges using multiple data sets, different query types and different models. The optimal retrieval recall rate range is not specifically limited here and is set according to actual requirements;

S302, acquiring the retrieval recall rates of the multi-modal data retrieval system in different time periods within time T (the time periods may be equal or unequal), and denoting them γ^ZH_r, where r is the index of the retrieval recall rate, r = 1, 2, 3, 4, ..., a, and a is a positive integer;

It should be noted that, if a labeled data set is available that contains related data samples of various categories or topics and defines a standard answer of related data for each query, the total number of relevant data items can be read directly from the labeled data set. In the semantic retrieval stage, the number of data samples returned by the system that match the standard answer is the number of relevant data items retrieved;
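Given such a labeled data set, the recall rate of a single query and the selection of recall rates below the optimal range can be sketched as follows. The identifiers and function names are hypothetical, and the lower bound stands in for γ^ZH_min; the concealment coefficient itself is not computed here.

```python
def recall(retrieved, relevant):
    """Recall rate = relevant items retrieved / all relevant items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("the labeled answer set must not be empty")
    return len(relevant & set(retrieved)) / len(relevant)


def below_optimal(recalls, lower_bound):
    """Recall rates that fall below the lower end of the optimal range."""
    return [r for r in recalls if r < lower_bound]


# Hypothetical query: the system returns three items, two of them relevant.
print(recall(retrieved=["a", "b", "x"], relevant=["a", "b", "c", "d"]))  # 0.5

# Hypothetical recall rates for four time periods within T.
print(below_optimal([0.9, 0.6, 0.75, 0.5], lower_bound=0.7))  # [0.6, 0.5]
```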

From the expression for the retrieval recall rate anomaly concealment coefficient it follows that the larger the value of the coefficient generated while the multi-modal data retrieval system operates within time T, the worse the accuracy of the system's semantic association modeling; conversely, the smaller the value, the better the accuracy;

After the modal sample data volume balance coefficient PH ^MT, the modal data volume similarity degree anomaly coefficient XS ^MT and the retrieval recall anomaly concealment coefficient JS ^YC are obtained, an evaluation model is built, an accuracy evaluation index theta ^zqd _w is generated according to the following formula:

Wherein x1, x2 and x3 are respectively preset proportional coefficients of a modal sample data volume balance coefficient PH ^MT, a modal data volume similarity degree anomaly coefficient XS ^MT and a retrieval recall rate anomaly concealment coefficient JS ^YC, and x1, x2 and x3 are all larger than 0;

The calculation formula shows that the larger the modal sample data volume balance coefficient generated by the multi-modal data retrieval system when running in the T time is, the larger the modal data volume similarity degree anomaly coefficient is, the larger the retrieval recall ratio anomaly concealment coefficient is, namely the larger the expression value of the accuracy evaluation index theta ^zqd _w generated by the multi-modal data retrieval system when running in the T time is, the worse the accuracy of the multi-modal data retrieval system when carrying out semantic association modeling is shown, the smaller the modal sample data volume balance coefficient generated by the multi-modal data retrieval system when running in the T time is, the smaller the modal data volume similarity degree anomaly coefficient is, the smaller the retrieval recall ratio anomaly concealment coefficient is, namely the smaller the expression value of the accuracy evaluation index theta ^zqd _w generated by the multi-modal data retrieval system when running in the T time is, and the better the accuracy of the multi-modal data retrieval system when carrying out semantic association modeling is shown;

Establishing a data set from the plurality of accuracy evaluation indexes generated when the multi-modal data retrieval system runs, and calibrating the data set as F, wherein F = {θ^zqd_w} = {θ^zqd_1, θ^zqd_2, ..., θ^zqd_s}, w = 1, 2, 3, 4, ..., s, and s is a positive integer;

If P1 is greater than or equal to K1, generating a first running state signal, wherein the first running state signal indicates that the accuracy of semantic association modeling is poor when the multi-modal data retrieval system runs;

If P1 is smaller than K1 and P2 is greater than or equal to K2, generating a second running state signal, wherein the second running state signal indicates that the accuracy of semantic association modeling fluctuates between good and poor when the multi-modal data retrieval system runs, and the running state is extremely unstable;

If P1 is smaller than K1 and P2 is smaller than K2, generating a third running state signal, wherein the third running state signal indicates that the accuracy of semantic association modeling is good when the multi-modal data retrieval system runs;

It should be noted that, if an accuracy evaluation index is greater than or equal to the accuracy evaluation index reference threshold, the accuracy of the multi-modal data retrieval system in semantic association modeling is relatively poor, and if the accuracy evaluation index is smaller than the reference threshold, the accuracy is relatively good;
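The three-way decision on P1 and P2 can be sketched as follows. This passage does not define P1 and P2, so the sketch assumes that P1 is the mean and P2 is the population standard deviation of the accuracy evaluation indexes in the set F, with K1 and K2 as the corresponding preset reference thresholds. The function name, the integer encoding of the signals and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def running_state_signal(indices, k1, k2):
    """Classify the set F of accuracy evaluation indexes into signal 1, 2 or 3.

    ASSUMPTIONS: P1 = mean of F, P2 = population standard deviation of F.
    Signals are encoded as the integers 1 (first), 2 (second), 3 (third).
    """
    if not indices:
        raise ValueError("the data set F must not be empty")
    p1 = mean(indices)
    p2 = pstdev(indices)
    if p1 >= k1:
        return 1  # accuracy of semantic association modeling is poor
    if p2 >= k2:
        return 2  # accuracy fluctuates, running state unstable
    return 3      # accuracy is good


# Hypothetical index values and thresholds, for illustration only.
signal = running_state_signal([0.1, 0.9, 0.1, 0.9], k1=0.7, k2=0.2)
```

Checking P1 first mirrors the order of the conditions: the dispersion P2 is consulted only when the level P1 is already below K1.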

S400, sending different prompts according to the running state signals generated when the multi-modal data retrieval system runs;

When a first running state signal is acquired, a first-level accuracy early warning prompt is sent out to inform the relevant maintenance personnel that the accuracy of semantic association modeling is poor when the multi-modal data retrieval system runs and that the system needs to be maintained and optimized in time. This ensures the accuracy of the semantic association modeling, so that the model captures the semantic association among data well, prevents a decrease in the correlation between the retrieval results returned by the system and the user query from degrading the retrieval accuracy of the whole multi-modal data retrieval system, and prevents misleading retrieval results from being provided to users;

When a second running state signal is acquired, a second-level accuracy early warning prompt is sent out to inform the relevant maintenance personnel that the accuracy of semantic association modeling fluctuates between good and poor when the multi-modal data retrieval system runs and that the running state is extremely unstable, so the system needs to be maintained and optimized in time in order to run stably and efficiently;

When a third running state signal is acquired, no early warning prompt is sent out, which indicates that the accuracy of semantic association modeling is good when the multi-modal data retrieval system runs;
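The mapping from running state signals to prompts in step S400 can be sketched as a small dispatch function. The integer encoding of the signals (1, 2, 3), the function name and the message wording are hypothetical; the text only specifies which warning level each signal triggers.

```python
from typing import Optional

# Hypothetical message texts; the source specifies only the warning levels.
FIRST_LEVEL = ("First-level accuracy early warning: semantic association "
               "modeling accuracy is poor; maintain and optimize in time.")
SECOND_LEVEL = ("Second-level accuracy early warning: accuracy fluctuates and "
                "the running state is unstable; maintain and optimize in time.")


def early_warning_prompt(signal: int) -> Optional[str]:
    """Return the prompt for a running state signal, or None when no
    early warning is sent (third running state signal)."""
    if signal == 1:
        return FIRST_LEVEL
    if signal == 2:
        return SECOND_LEVEL
    if signal == 3:
        return None
    raise ValueError(f"unknown running state signal: {signal}")
```

Returning None for the third signal keeps "no prompt" distinct from an empty message, so a caller can skip notification entirely.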

The above formulas are all dimensionless formulas that operate on numerical values; each formula was obtained by collecting a large amount of data and performing software simulation so as to approximate the real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the embodiments described above are merely illustrative: the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The foregoing is merely a specific embodiment of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A multi-modal data retrieval method based on semantic association, characterized by comprising the following steps:

The modal data information comprises a modal sample data volume balance coefficient and a modal data volume similarity degree anomaly coefficient, which are calibrated as PH^MT and XS^MT respectively after acquisition; the retrieval evaluation index information comprises a retrieval recall rate anomaly concealment coefficient, which is calibrated as JS^YC after acquisition;


S103, obtaining the standard deviations of the sample data amounts generated when the multi-modal data retrieval system runs at different moments within time T, and calibrating the standard deviations of the sample data amounts as R_y, wherein y represents the index of the standard deviation of the sample data amount generated at each moment within time T, y = 1, 2, 3, 4, ..., n, and n is a positive integer;

S201, converting all modal data into vector representation;

S203, for each mode, calculating the internal Euclidean distance of the mode;

wherein a_iv' is the value of the element corresponding to A_iv in another vector of the i-th mode in the same dimension, v represents the dimension index, v = 1, 2, 3, 4, ..., p, and p is a positive integer;

The logic for obtaining the retrieval recall rate anomaly concealment coefficient is as follows:


Wherein x1, x2 and x3 are the preset proportional coefficients of the modal sample data volume balance coefficient PH^MT, the modal data volume similarity degree anomaly coefficient XS^MT and the retrieval recall rate anomaly concealment coefficient JS^YC, respectively, and x1, x2 and x3 are all greater than 0;

If P1 is greater than or equal to K1, generating a first running state signal;

If P1 is smaller than K1 and P2 is smaller than K2, generating a third running state signal;

2. The multi-modal data retrieval method based on semantic association according to claim 1, wherein when a first running state signal is obtained, a first-level accuracy early warning prompt is sent out to prompt relevant maintenance personnel that the accuracy is poor in semantic association modeling when a multi-modal data retrieval system runs, and the multi-modal data retrieval system needs to be maintained and optimized in time;