CN112579841A

CN112579841A - Multi-mode database establishing method, multi-mode database retrieving method and multi-mode database retrieving system

Info

Publication number: CN112579841A
Application number: CN202011542924.6A
Authority: CN
Inventors: 贾红; 李植民
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2020-12-23
Filing date: 2020-12-23
Publication date: 2021-03-30
Anticipated expiration: 2040-12-23
Also published as: CN112579841B

Abstract

The invention provides a multi-modal database establishing method, a multi-modal database retrieval method and corresponding systems. The establishing method comprises: acquiring mixed modal data; extracting the data features corresponding to each modal data from the mixed modal data to construct a mixed modal feature set; clustering the feature data in the mixed modal feature set according to a preset label category number range to obtain, for each preset label category number, the sub-labels corresponding to each modal data and the total label corresponding to the mixed modal data; calculating, for each preset label category number, the score value of a preset clustering evaluation index for the total label, and determining the target label category number from these score values; and establishing the multi-modal database according to the sub-labels of each modal data and the total label of the mixed modal data under the target label category number. The established database carries both a global total label and local sub-labels in each modality, which improves the accuracy of detail retrieval and gives the method good universality.

Description

Multi-mode database establishing method, multi-mode database retrieving method and multi-mode database retrieving system

Technical Field

The invention relates to the technical field of computers, in particular to a multi-mode database establishing method, a multi-mode database searching method and a multi-mode database searching system.

Background

Multimodal data refers to data collected under a variety of different devices or scenarios. Real-world datasets tend to be multi-modal, for example: a story can be described by text narration or by image or audio; a document may be represented in a number of different languages and may also be represented by user ratings, etc. The establishment of the multi-modal database aims to obtain important features and representative retrieval labels of the multi-modal data by analyzing and processing the multi-modal data, and establish the database which is convenient for subsequent data retrieval on the basis of the important features and the representative retrieval labels.

Because a multi-modal database can make full use of the complementarity among modalities and eliminate the redundancy among them, it reflects the data more faithfully and comprehensively than a traditional single-modal database, so the need to establish multi-modal databases is pressing. However, existing multi-modal databases generally hold a single data type at the data level, for example only image data or only audio data, and when labels are established for the multi-modal data, the number of sub-label categories in each modality is set equal to the number of total-label categories. This setting ignores the characteristic information of the data in the different modalities and reduces the accuracy of detail retrieval in the multi-modal database.

Disclosure of Invention

In view of this, embodiments of the present invention provide a multimodal database establishing method, a multimodal database retrieving method, and a multimodal database retrieving system, so as to overcome a problem in the prior art that the multimodal database establishing method ignores data characteristic information of data in different modalities, so that accuracy of detail retrieval of the multimodal database is low.

The embodiment of the invention provides a method for establishing a multi-mode database, which comprises the following steps:

acquiring mixed modal data, wherein the mixed modal data comprises multi-modal data of different data types;

extracting data characteristics corresponding to the modal data from the mixed modal data to construct a mixed modal characteristic set;

clustering the feature data in the mixed modal feature set according to the range of the number of the preset label categories to obtain sub-labels corresponding to the modal data under different preset label categories and total labels corresponding to the mixed modal data;

respectively calculating the score values of preset clustering evaluation indexes corresponding to the total labels under different preset label category numbers;

and determining the category number of target labels based on the score value, and establishing a multi-mode database according to the sub-labels corresponding to the modal data under the category number of the target labels and the total labels corresponding to the mixed modal data.

Optionally, the extracting data features corresponding to each modality data from the mixed modality data to construct a mixed modality feature set includes:

classifying the mixed modal data based on the data type to obtain each single modal data;

acquiring preset feature extraction parameters of each single-mode data, and performing feature extraction on each single-mode data according to the preset feature extraction parameters to obtain feature data corresponding to each single-mode data;

and constructing the mixed modal feature set based on the feature data corresponding to each single modal data.

Optionally, the clustering the feature data in the mixed modality feature set according to the preset tag category number range to obtain sub tags corresponding to each modality data under different preset tag category numbers and a total tag corresponding to the mixed modality data includes:

acquiring the category number of the current preset labels;

clustering the characteristic data corresponding to each single-mode data to obtain a current sub-label corresponding to each single-mode data;

and clustering the current sub-labels corresponding to the modal data based on the current preset label category number to obtain the current total label of the mixed modal data.

Optionally, the clustering the feature data in the mixed modality feature set according to the preset tag category number range to obtain sub tags corresponding to each modality data under different preset tag category numbers and a total tag corresponding to the mixed modality data further includes:

respectively calculating the adjusted Rand index value of each current sub-label based on the current total label;

and performing dimension reduction updating on the feature data corresponding to each current sub-label that does not meet the preset adjusted Rand index value, and clustering the updated feature data again to update the current sub-label corresponding to each single-mode data.

Optionally, the determining the number of categories of the target tags based on the score value, and establishing a multi-modal database according to the sub-tags corresponding to each modal data under the number of categories of the target tags and the total tags corresponding to the mixed modal data includes:

sorting the scoring values corresponding to the category numbers of the preset labels;

determining the preset label category number with the maximum score value as the target label category number;

and respectively adding a total tag and corresponding sub-tags to each modal data to establish a multi-modal database of the mixed modal data.

The embodiment of the invention also provides a multi-mode database retrieval method, which comprises the following steps:

acquiring a retrieval tag set, wherein the retrieval tag set comprises a plurality of retrieval tags;

and searching in the multi-modal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, wherein the multi-modal database is established by adopting the multi-modal database establishing method according to another embodiment of the invention.

The embodiment of the invention also provides a multi-mode database establishing system, which comprises:

a first acquisition module, used for acquiring mixed modal data, wherein the mixed modal data comprises multi-modal data of different data types;

the first processing module is used for extracting data features corresponding to the modal data from the mixed modal data to construct a mixed modal feature set;

the second processing module is used for clustering the feature data in the mixed modal feature set according to the range of the preset tag category number to obtain sub tags corresponding to the modal data under different preset tag category numbers and a total tag corresponding to the mixed modal data;

the third processing module is used for respectively calculating the score values of the preset clustering evaluation indexes corresponding to the total labels under different preset label category numbers;

and the fourth processing module is used for determining the category number of the target tags based on the score value and establishing a multi-mode database according to the sub-tags corresponding to the modal data under the category number of the target tags and the total tags corresponding to the mixed modal data.

The embodiment of the invention also provides a multi-modal database retrieval system, which is characterized by comprising the following components:

the second acquisition module is used for acquiring a retrieval tag set, and the retrieval tag set comprises a plurality of retrieval tags;

and a fifth processing module, configured to perform retrieval in the multi-modal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, where the multi-modal database is a multi-modal database established by using the multi-modal database establishment system according to another embodiment of the present invention.

An embodiment of the present invention further provides an electronic device, including a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions so as to execute the multi-modal database establishing method or the multi-modal database retrieval method provided by the embodiment of the invention.

The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and the computer instructions are used to enable the computer to execute the multimodal database establishment method provided in the embodiment of the present invention, or execute the multimodal database retrieval method provided in the embodiment of the present invention.

The technical scheme of the invention has the following advantages:

The embodiment of the invention provides a method and a system for establishing a multi-modal database, wherein mixed modal data are obtained, the mixed modal data comprising multi-modal data of different data types; data features corresponding to each modal data are extracted from the mixed modal data to construct a mixed modal feature set; the feature data in the mixed modal feature set are clustered according to the preset label category number range to obtain the sub-labels corresponding to each modal data and the total label corresponding to the mixed modal data under different preset label category numbers; the score values of the preset clustering evaluation index corresponding to the total labels under the different preset label category numbers are calculated; and the target label category number is determined based on the score values, and the multi-modal database is established according to the sub-labels of each modal data and the total label of the mixed modal data under the target label category number. In this way, data features are extracted from the modal data of each data type, sub-labels for each modal data and a total label for the mixed modal data are constructed by feature clustering for each preset label category number in the range, and the sub-labels and total label used to establish the database are determined from the score values of the preset clustering evaluation index. The database therefore carries both a global total label and local sub-labels in each modality, so the data can be given second-level labels, which improves the accuracy of detail retrieval; the method also has good universality and can establish databases containing various different data types.

The embodiment of the invention provides a multi-modal database retrieval method and a multi-modal database retrieval system. Therefore, the retrieval of the retrieval tag set is carried out by utilizing the multi-mode database which is based on the overall tags and the local sub-tags in each mode, the detail retrieval of the data stored in the database is facilitated, the accuracy of the retrieval result is improved, and the application range of the multi-mode database is further improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a multimodal database creation method in an embodiment of the invention;

FIG. 2 is a schematic diagram of a tag generation process in an embodiment of the invention;

FIG. 3 is a schematic diagram of a multimodal database created in an embodiment of the invention;

FIG. 4 is a flowchart of a multimodal database retrieval method in an embodiment of the invention;

FIG. 5 is a schematic structural diagram of a multimodal database building system in an embodiment of the invention;

FIG. 6 is a schematic structural diagram of a multimodal database retrieval system in an embodiment of the invention;

FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.

To address the above problem, an embodiment of the present invention provides a method for establishing a multi-modal database specifically for multi-modal data of different data types. As shown in FIG. 1, the method mainly includes the following steps:

Step S101: acquiring mixed modal data.

In the embodiment of the present invention, the mixed modal data is exemplified by multi-modal data composed of images, texts, user evaluations and audio; this is only an example, and the invention is not limited thereto.

Step S102: extracting the data features corresponding to each modal data from the mixed modal data to construct a mixed modal feature set.

Different modal data contain different types of data information. The data features are typical features reflecting the type of data, and the specific feature types to extract may be set according to actual needs, which is not limited in the present invention. In the embodiment of the present invention, the data features extracted for the image data include color distribution, texture, edges, histogram of oriented gradients, and the like; the data features extracted for the text data are word frequency features; the data features extracted for the user evaluation data are keyword frequencies; and the data features extracted for the audio data are spectrum features and the like.

Step S103: clustering the feature data in the mixed modal feature set according to the preset tag category number range to obtain the sub-tags corresponding to each modal data and the total tag corresponding to the mixed modal data under different preset tag category numbers.

The preset tag category number range is an approximate range for the number of total tags in the database, determined from the requirements for establishing the multi-modal database. For example, a preset tag category number range of (5, 10) indicates that the data in the multi-modal database are divided into 5 to 10 total-tag categories.

Step S104: respectively calculating the score values of the preset clustering evaluation index corresponding to the total labels under different preset label category numbers.

The preset clustering evaluation index is used to evaluate the accuracy of the classification result of the multi-modal data under different preset label category numbers. In the embodiment of the invention, the Calinski-Harabasz index (CH coefficient for short) is selected as the preset clustering evaluation index; in practical applications, other clustering evaluation indexes, such as the contour (silhouette) coefficient, may also be selected, which is not limited by the invention.

Step S105: determining the target tag category number based on the score values, and establishing a multi-modal database according to the sub-tags corresponding to each modal data and the total tag corresponding to the mixed modal data under the target tag category number.

Specifically, after sub-tags and total tags are respectively set for the different modal data according to the labeling results under the target tag category number, the mixed modal data is stored with the sub-tags as serial numbers, yielding the multi-modal database corresponding to the mixed modal data.

Through steps S101 to S105, the multi-modal database establishment method provided in the embodiment of the present invention extracts data features from the modal data of each data type, constructs sub-tags for each modal data and a total tag for the mixed modal data by feature clustering for each preset tag category number in the range, and determines the sub-tags and total tag used to establish the multi-modal database from the score values of the preset clustering evaluation index. The database therefore carries both a global total tag and local sub-tags in each modality, so the data can be given second-level tags, which improves the accuracy of detail retrieval; the method also has good universality and can establish databases containing various different data types.

Specifically, in an embodiment, the step S102 specifically includes the following steps:

Step S201: classifying the mixed modal data based on data type to obtain each single-mode data.

Specifically, the mixed modal data is classified according to data type to obtain the single-mode data, where each single-mode data represents one data type; for example, the mixed modal data is divided into image, text, audio and user evaluation data.

Step S202: acquiring preset feature extraction parameters of each single-mode data, and performing feature extraction on each single-mode data according to the preset feature extraction parameters to obtain the feature data corresponding to each single-mode data.

Step S203: constructing a mixed modal feature set based on the feature data corresponding to each single-mode data.

The mixed modal feature set is constructed by representing each single-mode data with its feature data. For example, if the single-mode data is image data, features such as color distribution, texture, edges and histogram of oriented gradients are extracted from the image data and used as the feature data representing it. Feature extraction reduces the data size of the single-mode data, which facilitates the subsequent cluster analysis used to obtain the sub-labels of each modal data and improves the calculation speed.
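As an illustration of steps S201 to S203, the following minimal Python sketch groups mixed modal records by data type and applies one feature extractor per type. The record layout, the function names and the two toy extractors (word frequency over a fixed vocabulary for text, a coarse intensity histogram for images) are assumptions made for illustration; the embodiment does not prescribe them.

```python
from collections import Counter, defaultdict

def text_features(text, vocabulary):
    """Word-frequency vector over a fixed vocabulary (hypothetical extractor)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def image_features(pixels, bins=4, max_value=256):
    """Coarse intensity histogram of a flat list of grey levels (hypothetical extractor)."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // max_value, bins - 1)] += 1
    return hist

def build_feature_set(records, vocabulary):
    """S201: split by data type; S202: extract features; S203: collect them."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(rec)
    extractors = {
        "text": lambda r: text_features(r["content"], vocabulary),
        "image": lambda r: image_features(r["content"]),
    }
    # Result: data type -> {record id -> feature vector}
    return {
        dtype: {r["id"]: extractors[dtype](r) for r in recs}
        for dtype, recs in by_type.items()
    }

records = [
    {"id": 0, "type": "text", "content": "the cat sat on the mat"},
    {"id": 0, "type": "image", "content": [0, 10, 200, 255]},
    {"id": 1, "type": "text", "content": "the dog"},
]
feature_set = build_feature_set(records, vocabulary=["the", "cat", "dog"])
```

Keeping one extractor per data type mirrors the "preset feature extraction parameters of each single-mode data" in step S202; a real system would presumably register audio and user-evaluation extractors in the same table.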

Specifically, in an embodiment, the step S103 specifically includes the following steps:

Step S301: acquiring the current preset label category number.

The current preset label category number k lies within the preset label category number range, i.e. k ∈ [Min_k, Max_k].

Step S302: clustering the feature data corresponding to each single-mode data to obtain the current sub-label corresponding to each single-mode data.

Specifically, a single-view clustering algorithm is applied to each single-mode data to obtain the sub-labels L^1 to L^4 of the modalities. In practical applications, the clustering algorithm may be selected according to the data type of the single-mode data, which is not limited in the present invention. In the embodiment of the invention, a distance-based clustering algorithm is used for images and audio, and a cosine-similarity-based clustering algorithm is used for text and user evaluations. The category number of the sub-labels under each modality is a random value within the preset label category number range [Min_k, Max_k].
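The per-modality clustering of step S302 can be sketched as follows. This is a minimal sketch under stated assumptions: plain k-means stands in for the unspecified "distance-based" algorithm, and k-means on unit-normalized vectors stands in for the "cosine-similarity-based" algorithm (for unit vectors, squared Euclidean distance equals 2 - 2·cos). The function names and the modality names are hypothetical, and the number of clusters is fixed here for reproducibility, whereas the embodiment draws it at random from [Min_k, Max_k].

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Assumption: these modalities are compared by cosine similarity.
COSINE_MODALITIES = {"text", "user_evaluation"}

def cluster_modality(name, features, k):
    """Return one sub-label per sample for a single modality."""
    if name in COSINE_MODALITIES:
        features = [unit(f) for f in features]
    return kmeans(features, k)

sub_labels = {
    "image": cluster_modality("image", [[0, 0], [0, 1], [10, 10], [10, 11]], k=2),
    "text": cluster_modality("text", [[3, 0], [6, 0], [0, 2], [0, 5]], k=2),
}
```

In the text example the raw vectors differ greatly in length but point in only two directions, so normalizing first groups them by direction rather than by magnitude.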

Step S303: clustering the current sub-labels corresponding to each modal data based on the current preset label category number to obtain the current total label of the mixed modal data.

Specifically, the sub-labels L^1 to L^4 of the modalities are taken as input, and an overall total label L is obtained by using a multi-view clustering algorithm, where the category number of the total label L is the current preset label category number k. In the embodiment of the present invention, the total label L is obtained by using a multi-view clustering algorithm based on a co-occurrence matrix; in practical applications, the multi-view clustering algorithm may be selected from the prior art according to actual needs, which is not limited in the present invention.
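One way to read the co-occurrence-matrix approach of step S303 is sketched below. The matrix entry for samples i and j is the fraction of modalities in which they share a sub-label; the consensus step that turns the matrix into k clusters is not specified in the text, so average-link agglomeration is used here purely as an assumption, and all function names are hypothetical.

```python
def co_occurrence(sub_labels):
    """Entry [i][j]: fraction of modalities in which samples i and j share a sub-label."""
    n, m = len(sub_labels[0]), len(sub_labels)
    return [
        [sum(view[i] == view[j] for view in sub_labels) / m for j in range(n)]
        for i in range(n)
    ]

def total_label(sub_labels, k):
    """Merge the most similar clusters (average link) until k clusters remain."""
    sim = co_occurrence(sub_labels)
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: link(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Three modalities, five samples; each inner list is one modality's sub-labels.
views = [
    [0, 0, 1, 1, 1],   # e.g. image
    [0, 0, 0, 1, 1],   # e.g. text
    [0, 0, 0, 1, 1],   # e.g. user evaluation
]
L = total_label(views, k=2)
```

Because only label agreement is used, the sub-labels of different modalities may have different category numbers, which matches the embodiment's point that sub-label and total-label category numbers need not coincide.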

Step S304: respectively calculating the adjusted Rand index value of each current sub-label based on the current total label.

Step S305: performing dimension reduction updating on the feature data corresponding to each current sub-label that does not meet the preset adjusted Rand index value, and clustering the updated feature data again to update the current sub-label corresponding to each single-mode data.

Specifically, collaborative learning is performed on the sub-labels L^1 to L^4 based on the current total label L. In a specific implementation, supervised linear dimensionality reduction based on L is performed on the single-mode data whose sub-label has an adjusted Rand index below the average value, yielding new single-mode data, and cluster analysis is performed again to obtain new sub-labels. This is repeated until a stopping criterion is fulfilled, namely that the class relationships among the feature data samples no longer change; the labels obtained at that point are the sub-labels and total label of the loop result for the current k. The above steps are then repeated until the total label category number k has traversed [Min_k, Max_k]. FIG. 2 is a schematic diagram of the label generation process.
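The selection rule in steps S304 and S305 can be sketched as follows, taking the adjusted coefficient to be the adjusted Rand index (ARI) computed from the contingency table of two labelings. The helper `modalities_to_refine` and the modality names are hypothetical; the supervised linear dimensionality reduction and re-clustering that would follow are not shown, since the text does not name a specific method.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same samples (needs at least 2 samples)."""
    n = len(labels_a)
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)

def modalities_to_refine(total_label, sub_labels):
    """Modalities whose ARI against the total label is below the mean ARI."""
    scores = {
        name: adjusted_rand_index(total_label, labels)
        for name, labels in sub_labels.items()
    }
    mean = sum(scores.values()) / len(scores)
    return [name for name, s in scores.items() if s < mean], scores

total = [0, 0, 1, 1]
subs = {"image": [1, 1, 0, 0], "text": [0, 1, 0, 1]}
to_refine, scores = modalities_to_refine(total, subs)
```

The ARI is invariant to renaming the clusters, so the image sub-labels above agree perfectly with the total label even though the label values are swapped; only the text modality falls below the mean and would be sent through dimension reduction and re-clustering.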

Specifically, in an embodiment, the step S105 specifically includes the following steps:

Step S501: sorting the score values corresponding to the preset label category numbers.

Step S502: determining the preset label category number with the maximum score value as the target label category number.

Specifically, the CH coefficient of the total label of the loop result corresponding to each preset label category number is calculated, and a line graph is drawn with the CH coefficient as the ordinate and k ∈ [Min_k, Max_k] as the abscissa; the k value at the "inflection point" of the graph is selected by the elbow method, and the sub-labels and total label obtained for that k are the final result.
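The scoring and selection in steps S501 and S502 can be sketched as follows, using the standard definition of the Calinski-Harabasz index (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom). Which feature vectors the index is computed on is not stated in the text; this sketch assumes one numeric feature vector per sample, and the variable names and toy data are hypothetical. It picks the k with the maximum score; reading an elbow off the line graph is a separate, manual choice that is not implemented here.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def calinski_harabasz(points, labels):
    """CH index; requires 1 < k < n and a non-zero within-cluster dispersion."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
# Hypothetical total labels produced by the loop for each candidate k.
total_labels_by_k = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}

scores = {k: calinski_harabasz(points, labels) for k, labels in total_labels_by_k.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)  # S501
target_k = ranked[0][0]                                              # S502
```

Here k = 2 scores higher than k = 3 because splitting the right-hand pair into singletons removes little within-cluster dispersion while spending an extra degree of freedom.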

Step S503: respectively adding a total tag and the corresponding sub-tags to each modal data to establish a multi-modal database of the mixed modal data.

Specifically, the multi-modal database is built by storing the mixed modal data, storing a total tag for each piece of data, and storing the respective sub-tag under each modality of each piece of data. FIG. 3 is a schematic diagram of the multi-modal database built according to the embodiment of the present invention. By setting the preset label category number range and using collaborative learning between the sub-labels and the total label, the accuracy of the sub-label and total-label classification of the data in the multi-modal database is improved, and so is the accuracy of the retrieval results when label retrieval is performed on the established database.

In practical application, assume the established multi-modal database contains tens of thousands of pieces of data, each with a corresponding image modality and text modality. Assume the total tags of the database point to 6 different people, i.e. the number of categories is 6; for the image data, the sub-tags may point to 7 types of expressions, i.e. the number of categories is 7; and for the text data, the sub-tags may point to 8 types of subjects. In this database, a piece of data whose total label is 'Li Ming' may have the sub-label 'Happy' in the image modality and the sub-label 'Play' in the text modality. The labels in the multi-modal database established by the embodiment of the invention therefore include not only a global total label but also local sub-labels in each modality, so the data can be given second-level labels, which facilitates later secondary retrieval. The number of sub-tags of the single-mode data obtained by the embodiment of the invention is not fixed, which better matches practical situations. In addition, the multi-modal database established by the embodiment of the invention has good universality and is suitable for various types of data, such as text, images and video.
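The storage layout described above (one total label per piece of data plus one sub-label per modality) can be sketched with a simple in-memory record type. The `Record` class, the field names and the file names are hypothetical; the 'Li Ming', 'Happy' and 'Play' labels come from the example in the text, while 'Sad' is an invented value added only to show a second record.

```python
from dataclasses import dataclass

@dataclass
class Record:
    record_id: int
    data: dict          # modality name -> raw data or a reference to it
    total_label: str    # global, first-level label
    sub_labels: dict    # modality name -> local, second-level label

def build_database(samples, total_labels, sub_labels_by_modality):
    """Attach one total label and one sub-label per modality to every sample."""
    database = []
    for i, sample in enumerate(samples):
        database.append(Record(
            record_id=i,
            data=sample,
            total_label=total_labels[i],
            sub_labels={m: labels[i] for m, labels in sub_labels_by_modality.items()},
        ))
    return database

samples = [
    {"image": "img_000.png", "text": "txt_000.txt"},
    {"image": "img_001.png", "text": "txt_001.txt"},
]
database = build_database(
    samples,
    total_labels=["Li Ming", "Li Ming"],
    sub_labels_by_modality={"image": ["Happy", "Sad"], "text": ["Play", "Play"]},
)
```

Storing the sub-labels as a per-modality mapping lets each modality keep its own category number, independent of the number of total-label categories.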

The embodiment of the invention also provides a multi-modal database retrieval method, as shown in fig. 4, the multi-modal database retrieval method comprises the following steps:

Step S1: acquiring a retrieval tag set, wherein the retrieval tag set comprises a plurality of retrieval tags.

Specifically, the retrieval tag set is composed of one first-level tag (corresponding to the total tag of the data in the multi-modal database) and a plurality of second-level tags (corresponding to the sub-tags of the data in the multi-modal database).

Step S2: searching in the multi-modal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, wherein the multi-modal database is established by the multi-modal database establishing method provided by another embodiment of the invention.

Specifically, in the multi-modal database established in another embodiment of the present invention, each piece of data carries a total tag and sub-tags, so secondary retrieval can be performed in the multi-modal database with the retrieval tag set, that is, the data matching each sub-tag is obtained, which improves the user experience.
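Retrieval with a tag set of one first-level tag and several second-level tags can be sketched as a two-stage filter. The dictionary record layout and the `retrieve` function are assumptions for illustration; 'Li Ming', 'Happy' and 'Play' are taken from the example in the text, and the remaining label values are invented for the demonstration.

```python
def retrieve(database, total_label, sub_labels=None):
    """Return records matching the first-level tag and every given second-level tag."""
    sub_labels = sub_labels or {}
    return [
        rec for rec in database
        if rec["total_label"] == total_label
        and all(rec["sub_labels"].get(m) == v for m, v in sub_labels.items())
    ]

database = [
    {"id": 1, "total_label": "Li Ming", "sub_labels": {"image": "Happy", "text": "Play"}},
    {"id": 2, "total_label": "Li Ming", "sub_labels": {"image": "Sad", "text": "Play"}},
    {"id": 3, "total_label": "Wang Fang", "sub_labels": {"image": "Happy", "text": "Work"}},
]

# First-level retrieval only: everything under the total tag.
coarse = retrieve(database, "Li Ming")
# Secondary retrieval: narrow the result with per-modality sub-tags.
fine = retrieve(database, "Li Ming", {"image": "Happy", "text": "Play"})
```

Passing only some modalities in `sub_labels` narrows the result on those modalities alone, so the same function covers both coarse and detail retrieval.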

By executing the above steps, the multi-modal database retrieval method provided by the embodiment of the invention retrieves the retrieval tag set from a multi-modal database that carries both a global total tag and local sub-tags in each modality, which facilitates detail retrieval of the stored data, improves the accuracy of the retrieval results, and broadens the application range of the multi-modal database.

An embodiment of the present invention further provides a multimodal database establishment system, as shown in fig. 5, the multimodal database establishment system includes:

The first obtaining module 101 is configured to obtain mixed-modality data, where the mixed-modality data includes multi-modality data of different data types. For details, refer to the related description of step S101 in the above method embodiment.

The first processing module 102 is configured to extract data features corresponding to the modality data from the mixed modality data, and construct a mixed modality feature set. For details, refer to the related description of step S102 in the above method embodiment.

The second processing module 103 is configured to cluster the feature data in the mixed modality feature set according to the preset tag category number range, so as to obtain sub tags corresponding to the modality data under different preset tag category numbers and a total tag corresponding to the mixed modality data. For details, refer to the related description of step S103 in the above method embodiment.

The third processing module 104 is configured to calculate score values of preset clustering evaluation indexes corresponding to the total tags under different preset tag category numbers, respectively. For details, refer to the related description of step S104 in the above method embodiment.

The fourth processing module 105 is configured to determine the number of categories of the target tags based on the score value, and establish a multi-modal database according to the sub-tags corresponding to each modal data and the total tags corresponding to the mixed modal data under the number of categories of the target tags. For details, refer to the related description of step S105 in the above method embodiment.

Through the cooperation of the above components, the multi-modal database establishment system provided by the embodiment of the invention extracts data features from the modal data of each data type, constructs sub-tags for each modal data and a total tag for the mixed modal data by feature clustering for each preset tag category number in the range, and determines the sub-tags and total tag used to establish the multi-modal database from the score values of the preset clustering evaluation index. The database therefore carries both a global total tag and local sub-tags in each modality, so the data can be given second-level tags, which improves the accuracy of detail retrieval; the system also has good universality and can establish databases containing various different data types.

An embodiment of the present invention further provides a multi-modal database retrieval system, as shown in fig. 6, the multi-modal database retrieval system includes:

The second obtaining module 1 is configured to obtain a retrieval tag set, where the retrieval tag set includes a plurality of retrieval tags. For details, refer to the related description of step S1 in the above method embodiment.

The fifth processing module 2 is configured to perform retrieval in the multimodal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, where the multimodal database is a multimodal database established by using the multimodal database establishing system provided by another embodiment of the present invention. For details, reference is made to the description relating to step S2 in the above method embodiment.

Through the cooperation of the above components, the multi-modal database retrieval system provided by the embodiment of the invention retrieves by tag set from a multi-modal database that has both global total tags and local sub-tags in each modality. This facilitates detail retrieval of the data stored in the database, improves the accuracy of retrieval results, and broadens the application range of the multi-modal database.
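A retrieval over such a database might look like the following minimal Python sketch. The matching rule is an assumption made for illustration only: a record is returned when it matches every tag in the retrieval tag set, where the key "total" addresses the total tag and any other key names a modality's sub-tag. The record layout, the key names and the sample records are hypothetical.

```python
def retrieve(records, tag_set):
    """Return the records that match every tag in the retrieval tag set."""
    def matches(record):
        for key, wanted in tag_set.items():
            if key == "total":
                actual = record["total_tag"]
            else:
                actual = record["sub_tags"].get(key)
            if actual != wanted:
                return False
        return True
    return [record for record in records if matches(record)]

# Hypothetical database records carrying a total tag and per-modality sub-tags.
database = [
    {"id": 0, "total_tag": 0, "sub_tags": {"image": 0, "text": 0}},
    {"id": 1, "total_tag": 0, "sub_tags": {"image": 1, "text": 0}},
    {"id": 2, "total_tag": 1, "sub_tags": {"image": 1, "text": 2}},
]

coarse = retrieve(database, {"total": 0})            # total tag only
fine = retrieve(database, {"total": 0, "image": 1})  # total tag narrowed by a sub-tag
```

In this sketch the total tag gives a coarse match, and adding a sub-tag narrows it to the records that also agree in one modality, which is the kind of detail retrieval the two tag levels are meant to support.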

An embodiment of the present invention further provides an electronic device. As shown in fig. 7, the electronic device may include a processor 901 and a memory 902, which may be connected by a bus or in another manner; fig. 7 takes a bus connection as an example.

The processor 901 may be a Central Processing Unit (CPU). The processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.

The memory 902, which is a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the method embodiments of the present invention. The processor 901 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 902, that is, implements the methods in the above-described method embodiments.

The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor 901, and the like. Further, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory 902 and, when executed by the processor 901, perform the methods in the above-described method embodiments.

The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.

Those skilled in the art will understand that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A method for multimodal database creation, comprising:

2. The method according to claim 1, wherein the extracting data features corresponding to each modality data from the mixed modality data to construct a mixed modality feature set comprises:

3. The method according to claim 2, wherein the clustering the feature data in the mixed modality feature set according to the preset tag category number range to obtain sub tags corresponding to each modality data and a total tag corresponding to the mixed modality data under different preset tag category numbers comprises:

acquiring the category number of the current preset labels;

4. The method according to claim 3, wherein the clustering is performed on the feature data in the mixed modality feature set according to a preset tag category number range to obtain sub tags corresponding to each modality data and a total tag corresponding to the mixed modality data under different preset tag category numbers, further comprising:

5. The method according to claim 1, wherein the determining a number of target label categories based on the score value, and establishing a multi-modal database according to the sub-labels corresponding to the modal data under the number of target label categories and the general label corresponding to the mixed modal data comprises:

6. A multimodal database retrieval method, comprising:

and searching in the multi-modal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, wherein the multi-modal database is established by adopting the multi-modal database establishing method according to any one of claims 1 to 5.

7. A multimodal database building system, comprising:

8. A multimodal database retrieval system, comprising:

a fifth processing module, configured to perform retrieval in the multi-modal database based on the retrieval tag set to obtain a retrieval result corresponding to the retrieval tag set, where the multi-modal database is the multi-modal database established by using the multi-modal database establishing system according to claim 7.

9. An electronic device, comprising:

a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor performing the method of any one of claims 1-5 or performing the method of claim 6 by executing the computer instructions.

10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5 or to perform the method of claim 6.