CN117786137A - Method, device and equipment for inquiring multimedia data and readable storage medium - Google Patents

Method, device and equipment for inquiring multimedia data and readable storage medium

Info

Publication number
CN117786137A
CN117786137A
Authority
CN
China
Prior art keywords
data
tag
audio
image
multimedia data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311809163.XA
Other languages
Chinese (zh)
Inventor
张生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Transtrue Technology Co ltd
Original Assignee
Beijing Transtrue Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Transtrue Technology Co ltd filed Critical Beijing Transtrue Technology Co ltd
Priority to CN202311809163.XA
Publication of CN117786137A
Legal status: Pending


Abstract

The embodiment of the application provides a multimedia data query method, apparatus, device and readable storage medium. Similarity calculation is performed between the search keywords and each classification word to obtain a target classification word. Candidate multimedia data are acquired based on the target tag set, and the tag set of the candidate multimedia data is acquired as the tag set to be compared. The matching degree between the tag set to be compared and the search keyword set is then obtained, and if the matching degree is greater than a preset matching degree threshold, the candidate multimedia data are determined as a search result. Because the classification words are obtained by classifying the tags of multimedia data of multiple data types, and the candidate multimedia data are multimedia data having at least one tag belonging to the tag set of the target classification, classifying the tags of multimedia data of multiple data types in advance associates multimedia data of different data types, thereby enabling efficient retrieval of multimedia data across data types.

Description

Method, device and equipment for inquiring multimedia data and readable storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a readable storage medium for querying multimedia data.
Background
With the development of computer technology, the scale of multimedia data such as images, audio and video on the Internet keeps growing, and different types of multimedia data are often stored separately in different multimedia databases. The prior art mainly retrieves each type of multimedia data individually, so how to quickly retrieve target data from a large-scale cross-type database is a problem to be solved urgently.
Disclosure of Invention
The application provides a multimedia data query method, a device, equipment and a readable storage medium, wherein the method comprises the following steps:
a multimedia data query method, comprising:
acquiring a search keyword set, wherein the search keyword set comprises at least one search keyword;
performing similarity calculation on the search keywords and each classified word to obtain target classified words, wherein each classified word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types at least comprise images, audios and videos;
acquiring candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data with at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
Acquiring a tag set of the candidate multimedia data as a tag set to be compared;
obtaining the matching degree of the tag set to be compared and the search keyword set;
and if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a search result.
Optionally, the acquiring the search keyword set includes:
acquiring data to be retrieved, wherein the data to be retrieved comprises one or more of image data, text data, audio data and video data;
identifying at least one keyword of the data to be searched as a search keyword;
and obtaining the search keyword set based on each search keyword.
Optionally, the multimedia data query method further includes:
acquiring multimedia data of a plurality of data types;
respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data;
classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words, wherein the label sets of the classified words comprise a plurality of labels belonging to the classified words;
correspondingly storing the classified words and a label set of the classified words;
The identification and tag set of the multimedia data are correspondingly stored.
Optionally, acquiring the tag set of the image data includes:
recognizing characters in the image data by using an optical character recognition OCR technology to obtain character data of the image data, and extracting keywords of the character data of the image data as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image tags based on the image characteristics;
and acquiring a tag set of the image data, wherein the tag set of the image data comprises a text tag and an image tag of the image data.
Optionally, acquiring a tag set of the audio data includes:
extracting words in the audio data by using a voice recognition technology to obtain word data of the audio data, extracting keywords of the word data of the audio data as word labels, wherein the keywords of the word data comprise semantic keywords and emotion keywords;
acquiring audio characteristics of the audio data, and acquiring an audio tag based on the audio characteristics of the audio data;
And acquiring a tag set of the audio data, wherein the tag set of the audio data comprises a text tag and an audio tag of the audio data.
Optionally, obtaining a tag set of the video data includes:
extracting the audio of the video data as audio to be identified;
extracting characters in the audio to be identified by using a voice identification technology to obtain character data of the audio to be identified, and extracting keywords of the character data of the audio to be identified as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
acquiring audio characteristics of the audio to be identified, and acquiring an audio tag based on the audio characteristics of the audio to be identified;
extracting an image of a preset key frame of the video data to serve as an image to be identified;
recognizing characters in the image to be recognized by using an optical character recognition OCR technology to obtain character data of the image to be recognized, extracting keywords of the character data of the image to be recognized as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
identifying object elements in the image to be identified by using an image identification technology, identifying image characteristics of the object elements, and obtaining an image tag based on the image characteristics;
And acquiring a tag set of the video data, wherein the tag set of the video data comprises a text tag and an audio tag of the audio to be identified, and a text tag and an image tag of the image to be identified.
Optionally, obtaining the matching degree of the tag set to be compared and the search keyword set includes:
acquiring the text similarity of the tag set to be compared and the search keyword set;
acquiring the association degree of the candidate multimedia data and the search keyword set, wherein the association degree is positively correlated with the times of taking the candidate multimedia data as the search result of the search keyword set;
and determining the matching degree of the tag set to be compared and the search keyword set based on the text similarity and the association degree, wherein the matching degree is positively correlated with the text similarity and the association degree respectively.
A multimedia data query apparatus comprising:
a search information acquisition unit configured to acquire a search keyword set including at least one search keyword;
the classifying and searching unit is used for carrying out similarity calculation on the search keywords and each classifying word to obtain target classifying words, wherein each classifying word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types at least comprise images, audios and videos;
The data screening unit is used for acquiring candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data with at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
the to-be-compared data acquisition unit is used for acquiring a tag set of the candidate multimedia data as a tag set to be compared;
the matching unit is used for acquiring the matching degree of the tag set to be compared and the search keyword set;
and the search result acquisition unit is used for determining the candidate multimedia data as a search result if the matching degree is greater than a preset matching degree threshold value.
A multimedia data query device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the multimedia data query method as described above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a multimedia data query method as described above.
As can be seen from the above technical solutions, the multimedia data query method, apparatus, device and readable storage medium provided in the embodiments of the present application acquire a search keyword set, where the search keyword set includes at least one search keyword. Similarity calculation is performed between the search keywords and each classification word to obtain a target classification word, candidate multimedia data are acquired based on the target tag set, and the tag set of the candidate multimedia data is acquired as the tag set to be compared. The matching degree between the tag set to be compared and the search keyword set is then obtained, and if the matching degree is greater than a preset matching degree threshold, the candidate multimedia data are determined as a search result. Because the classification words are obtained by classifying the tags of multimedia data of multiple data types, and the candidate multimedia data are multimedia data having at least one tag belonging to the tag set of the target classification, classifying the tags of multimedia data of multiple data types in advance associates multimedia data of different data types, thereby enabling efficient retrieval of multimedia data across data types.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a specific implementation of a multimedia data query method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a specific implementation of a tag obtaining method according to an embodiment of the present application;
Fig. 3 is a flow chart of a method for querying multimedia data according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a multimedia data query apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a multimedia data query device according to an embodiment of the present application.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely.
It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure. The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. The terms "a" and "an" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The multimedia data query method provided by the embodiment of the application is applied to, but not limited to, a scenario of retrieving target multimedia data, based on information to be retrieved, from a plurality of multimedia databases storing different data types, and is applicable to mobile devices (such as a mobile phone or a tablet), a personal computer (PC), or a server (including a single server or a server cluster).
Fig. 1 is a specific implementation flow of a multimedia data query method provided in an embodiment of the present application, where, as shown in fig. 1, the method includes:
s101, acquiring multimedia data of a plurality of data types.
In this embodiment, the data types include at least image, audio and video, and the multimedia data of a plurality of data types are respectively obtained from a plurality of multimedia databases storing different data types, for example, the image data is obtained from the image database.
S102, respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data.
In this embodiment, the plurality of tags included in the tag set of the multimedia data are used for characterizing the multimedia data from different dimensions, where the tags include emotion tags, content tags, and the like.
The tag extraction method differs for multimedia data of different data types. Fig. 2 shows a tag obtaining method provided in an embodiment of the present application; as shown in fig. 2, the method includes:
S201, characters in the image data are recognized by using an optical character recognition OCR technology, character data of the image data are obtained, and keywords of the character data of the image data are extracted to serve as character labels.
In this embodiment, the keywords of the character data include semantic keywords and emotion keywords. Optionally, the method for extracting the keywords of the character data of the image data includes recognizing the semantic keywords of the character data by using a semantic recognition model and recognizing the emotion keywords of the character data by using an emotion recognition model. It should be noted that the artificial intelligence models for recognizing the semantic keywords or the emotion keywords are trained in advance, and the model structure and training method may refer to the prior art.
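For illustration, the following minimal sketch of S201 assumes pytesseract as the OCR engine and jieba's TF-IDF extractor for the semantic keywords; the emotion-keyword model is only a hypothetical placeholder, since the embodiment merely states that such a model is trained in advance:

```python
# Sketch of S201 under stated assumptions: pytesseract for OCR, jieba for
# semantic keywords; the emotion model below is a hypothetical stand-in.
from PIL import Image
import pytesseract
import jieba.analyse

def hypothetical_emotion_model(text: str) -> list[str]:
    # Placeholder: a real system would call a pre-trained emotion classifier here.
    return []

def image_text_tags(image_path: str, top_k: int = 5) -> list[str]:
    # Recognize the characters contained in the image data (OCR).
    text = pytesseract.image_to_string(Image.open(image_path), lang="chi_sim+eng")
    # Semantic keywords of the character data, approximated by TF-IDF extraction.
    semantic_keywords = jieba.analyse.extract_tags(text, topK=top_k)
    # Emotion keywords of the character data (assumed pre-trained model).
    emotion_keywords = hypothetical_emotion_model(text)
    return semantic_keywords + emotion_keywords  # character labels
```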
S202, identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image labels based on the image characteristics.
In this embodiment, the object elements include elements such as key objects and key scenes in the picture, and the image features of the object elements include, but are not limited to, content features such as object names and object positional relationships. Optionally, the image features are used as image tags. For example, if the object elements include a mountain and a house, the image features include the mountain and the house, and the image tags may include "landscape", "mountain", "house", and the like.
The image recognition technology and the recognition method of the image features of the object element can be referred to the prior art.
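As one possible realization of S202 (not prescribed by this embodiment), the sketch below uses a pre-trained torchvision Faster R-CNN detector as the image recognition technology and turns the detected object-element names into image tags; the model choice and score threshold are assumptions:

```python
# Sketch of S202 under stated assumptions: a COCO-pretrained Faster R-CNN
# detector stands in for the unspecified image recognition technology.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

def image_object_tags(image_path: str, score_threshold: float = 0.8) -> list[str]:
    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn(weights=weights).eval()
    preprocess = weights.transforms()
    img = read_image(image_path)
    with torch.no_grad():
        prediction = model([preprocess(img)])[0]
    categories = weights.meta["categories"]
    # Object-element names above the confidence threshold become image tags.
    return [categories[int(label)]
            for label, score in zip(prediction["labels"], prediction["scores"])
            if float(score) >= score_threshold]
```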
S203, acquiring a label set of the image data.
In this embodiment, the tag set of the image data includes a text tag and an image tag of the image data.
S204, extracting characters in the audio data by using a voice recognition technology to obtain character data of the audio data, and extracting keywords of the character data of the audio data as character labels.
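The embodiment does not fix a speech recognition engine; a minimal sketch of S204, assuming the open-source Whisper model for transcription and jieba for keyword extraction, could look as follows:

```python
# Sketch of S204 under stated assumptions: Whisper as the voice recognition
# technology and TF-IDF keyword extraction over the recognized characters.
import whisper
import jieba.analyse

def audio_text_tags(audio_path: str, top_k: int = 5) -> list[str]:
    model = whisper.load_model("base")            # any ASR engine would do
    text = model.transcribe(audio_path)["text"]   # character data of the audio
    # Keywords of the character data serve as the character (word) labels.
    return jieba.analyse.extract_tags(text, topK=top_k)
```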
S205, acquiring audio characteristics of the audio data, and obtaining an audio tag based on the audio characteristics of the audio data.
In this embodiment, the audio features refer to time-domain features, timbre features, and music-theory features, and the audio tags include, but are not limited to, timbre tags, emotion tags, and audio type tags (e.g., bel canto, folk style).
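A minimal sketch of the feature extraction in S205, assuming librosa; mapping these features to concrete timbre, emotion or genre tags would require a classifier that the embodiment leaves to the prior art:

```python
# Sketch of S205 feature extraction under stated assumptions (librosa); the
# time-domain, timbre and music-theory features shown are illustrative only.
import librosa
import numpy as np

def audio_feature_vector(audio_path: str) -> dict:
    y, sr = librosa.load(audio_path, sr=None)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return {
        "rms_energy": float(np.mean(librosa.feature.rms(y=y))),          # time domain
        "mfcc_mean": np.mean(librosa.feature.mfcc(y=y, sr=sr), axis=1),  # timbre
        "chroma_mean": np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1),
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),                     # music theory
    }
```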
S206, acquiring a tag set of the audio data.
In this embodiment, the tag set of the audio data includes text tags and audio tags of the audio data.
S207, extracting the audio of the video data as the audio to be identified.
S208, extracting characters in the audio to be identified by using a voice identification technology, obtaining character data of the audio to be identified, and extracting keywords of the character data of the audio to be identified as character labels.
S209, acquiring audio characteristics of the audio to be identified, and acquiring an audio tag based on the audio characteristics of the audio to be identified.
S210, extracting images of preset key frames of video data to serve as images to be identified.
In this embodiment, the preset key frame may be acquired based on a preset acquisition frequency. For example, image frames are acquired every 5 seconds.
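A minimal sketch of S210 with the 5-second acquisition frequency mentioned above, assuming OpenCV for frame grabbing:

```python
# Sketch of S210 under stated assumptions: sample one preset key frame every
# interval_s seconds of the video using OpenCV.
import cv2

def extract_key_frames(video_path: str, interval_s: float = 5.0) -> list:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0        # fall back if FPS is unknown
    step = max(int(round(fps * interval_s)), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)                   # each sample is an image to be identified
        index += 1
    cap.release()
    return frames
```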
S211, recognizing characters in the image to be recognized by using an optical character recognition OCR technology, obtaining character data of the image to be recognized, and extracting keywords of the character data of the image to be recognized as character labels.
S212, identifying object elements in the image to be identified by using an image identification technology, identifying image characteristics of the object elements, and obtaining an image tag based on the image characteristics.
S213, acquiring a tag set of the video data.
In this embodiment, the tag set of the video data includes a text tag and an audio tag of the audio to be recognized, and a text tag and an image tag of the image to be recognized.
It should be noted that, in the method, the text data (i.e., structured text data) of the multimedia data of different data types may also be stored in correspondence with the identifier of the multimedia data.
In summary, for each type of multimedia data, tag extraction is achieved by acquiring the different kinds of data (text, image and audio) contained in the multimedia data, and the obtained tag set characterizes the multimedia data in multiple dimensions, which improves the accuracy of the tags.
S103, classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words.
In this embodiment, the tag set of a classification word includes a plurality of tags belonging to the classification word. There are various specific classification methods; for example, all tags are clustered by using a k-means clustering algorithm or another clustering algorithm to obtain a plurality of clusters, the classification word of each cluster is obtained, and the tags belonging to the cluster corresponding to a classification word form the tag set corresponding to that classification word. The specific implementation of the clustering algorithms may refer to the prior art.
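The following sketch illustrates the k-means variant of S103 mentioned above; representing tags by TF-IDF character n-grams and using the tag nearest each cluster centre as the classification word are assumptions, not requirements of this embodiment:

```python
# Sketch of S103 under stated assumptions: cluster all tags with k-means over
# TF-IDF character n-gram vectors; each cluster yields one classification word
# (the member tag closest to the cluster centre) and its tag set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_tags(all_tags: list[str], n_clusters: int = 10) -> dict[str, list[str]]:
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    x = vectorizer.fit_transform(all_tags)
    km = KMeans(n_clusters=n_clusters, n_init="auto", random_state=0).fit(x)
    classification = {}
    for c in range(n_clusters):
        idx = [i for i, label in enumerate(km.labels_) if label == c]
        members = [all_tags[i] for i in idx]
        dists = np.linalg.norm(x[idx].toarray() - km.cluster_centers_[c], axis=1)
        classification[members[int(np.argmin(dists))]] = members
    return classification
```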
S104, correspondingly storing the classified words and the label set of the classified words.
S105, correspondingly storing the identification and the label set of the multimedia data.
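A minimal sketch of the storage in S104 and S105, assuming a simple in-memory store (the embodiment does not prescribe a storage backend); the inverted tag-to-identifier index is an added convenience for the later candidate lookup:

```python
# Sketch of S104/S105 under stated assumptions: in-memory mappings from
# classification word to tag set and from media identifier to tag set.
from collections import defaultdict

class TagStore:
    def __init__(self) -> None:
        self.class_to_tags: dict[str, set[str]] = {}   # classification word -> tag set
        self.media_to_tags: dict[str, set[str]] = {}   # media identifier -> tag set
        self.tag_to_media: dict[str, set[str]] = defaultdict(set)  # inverted index

    def save_classification(self, word: str, tags: set[str]) -> None:
        self.class_to_tags[word] = set(tags)

    def save_media(self, media_id: str, tags: set[str]) -> None:
        self.media_to_tags[media_id] = set(tags)
        for tag in tags:
            self.tag_to_media[tag].add(media_id)       # used when screening candidates
```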
S106, acquiring data to be retrieved.
In this embodiment, the data to be retrieved includes one or more of image data, text data, audio data, and video data.
S107, identifying at least one keyword of the data to be retrieved as a retrieval keyword.
In this embodiment, the method for identifying at least one keyword of the data to be retrieved may refer to S201 to S213, where the data to be retrieved is used as multimedia data, and a tag of the data to be retrieved is obtained as a keyword.
S108, similarity calculation is carried out on the search keywords in the search keyword set and each classification word, and the target classification word is obtained.
In this embodiment, the search keyword set includes at least one search keyword.
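The embodiment does not fix the similarity measure used in S108; in the sketch below a simple character-overlap ratio from the standard library stands in for it:

```python
# Sketch of S108 under stated assumptions: difflib's ratio as a stand-in text
# similarity; the classification word most similar to any search keyword wins.
from difflib import SequenceMatcher

def target_classification_word(search_keywords: list[str], class_words: list[str]) -> str:
    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()
    return max(class_words,
               key=lambda word: max(sim(keyword, word) for keyword in search_keywords))
```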
And S109, acquiring candidate multimedia data based on the target tag set.
In this embodiment, the candidate multimedia data is multimedia data having at least one tag belonging to a target tag set, where the target tag set is a tag set corresponding to a target classification.
In this embodiment, because the identifier and tag set of each multimedia data are stored correspondingly in advance, the identifiers of the multimedia data whose tags fall in the target tag set can be obtained.
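A minimal sketch of S109, assuming an inverted index from tag to media identifiers such as the one built when the identifiers and tag sets were stored:

```python
# Sketch of S109 under stated assumptions: tag_to_media maps each tag to the
# identifiers of the multimedia data carrying that tag.
def candidate_media(target_tag_set: set[str],
                    tag_to_media: dict[str, set[str]]) -> set[str]:
    candidates: set[str] = set()
    for tag in target_tag_set:
        # Any multimedia data with at least one tag in the target tag set qualifies.
        candidates |= tag_to_media.get(tag, set())
    return candidates
```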
S110, acquiring a tag set of the candidate multimedia data as a tag set to be compared.
In this embodiment, the tag set of the candidate multimedia data is obtained through the identifier of the multimedia data corresponding to the target tag set.
S111, obtaining the matching degree of the tag set to be compared and the search keyword set.
In this embodiment, the method for obtaining the matching degree between the tag set to be compared and the search keyword set includes: obtaining the text similarity between the tag set to be compared and the search keyword set, obtaining the association degree between the candidate multimedia data and the search keyword set, and then determining the matching degree between the tag set to be compared and the search keyword set based on the text similarity and the association degree. The matching degree is positively correlated with both the text similarity and the association degree, and the association degree is positively correlated with the number of times the candidate multimedia data has been returned as a search result for the search keyword set.
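The embodiment only requires the matching degree to be positively correlated with both factors; the weighted sum and the specific weights in the sketch below are therefore illustrative assumptions:

```python
# Sketch of S111 under stated assumptions: weighted sum of a text similarity
# (difflib ratio, averaged over the tag set) and a saturating association term.
from difflib import SequenceMatcher

def matching_degree(tags_to_compare: set[str], search_keywords: set[str],
                    hit_count: int, w_sim: float = 0.7, w_assoc: float = 0.3) -> float:
    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()
    # Text similarity: best keyword match per tag, averaged over the tag set.
    text_similarity = (sum(max(sim(tag, kw) for kw in search_keywords)
                           for tag in tags_to_compare)
                       / max(len(tags_to_compare), 1))
    # Association degree: increases with the number of times this data was returned.
    association = hit_count / (hit_count + 1)
    return w_sim * text_similarity + w_assoc * association
```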
And S112, if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a search result.
According to the above technical solution, the multimedia data query method provided by the embodiment of the application preprocesses multimedia data of various data types in advance to generate structured text data and various types of tags, which improves retrieval and analysis efficiency. By classifying the tags of multimedia data of different types, associated retrieval of multimedia data is realized. In addition, retrieval accuracy and efficiency are continuously improved by adopting a retrieval performance optimization method based on user feedback and deep learning.
It should be noted that S101 to S112 are only one optional implementation of the multimedia data query method provided in the embodiments of the present application, and other implementations are also covered by the embodiments of the present application. For example, S102 is only one optional way of acquiring the tag set of each multimedia data, S201 to S203 are only one optional way of acquiring the tag set of image data, S204 to S206 are only one optional way of acquiring the tag set of audio data, and S207 to S213 are only one optional way of acquiring the tag set of video data; moreover, the method is not limited to the order, shown in fig. 2, of acquiring the tag set of video data, the tag set of image data, and the tag set of audio data. For another example, in an optional embodiment, the method further includes, after S112: displaying the search results in descending order of matching degree. For another example, in an optional implementation, the method constructs a matching degree model through a search algorithm, the matching degree model obtains the matching degree between the tag set to be compared and the search keyword set through a preset search algorithm, and the method further includes a retrieval performance optimization flow in which the specific flow of the multimedia data query method is optimized according to user feedback and a user-defined strategy. This specifically includes:
Optimization based on user feedback: when the search results are returned, a user evaluation function is provided to collect the user's satisfaction with the search results; the user satisfaction data are analyzed, and the parameters of the search algorithm are adjusted accordingly.
Automatic optimization based on deep learning: training a retrieval model by using a deep learning technology, learning the relation among keywords, labels, multimedia data and internal structures thereof, and optimizing a retrieval result by using the trained retrieval model in the retrieval process.
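One possible (not prescribed) realization of the feedback-based adjustment is to refit the weights of the matching-degree model from user satisfaction labels, for example with a simple logistic regression:

```python
# Sketch of the user-feedback optimization under stated assumptions: the text
# similarity / association weights are refit from collected satisfaction labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def refit_matching_weights(text_sims: list[float], associations: list[float],
                           satisfied: list[int]) -> tuple[float, float]:
    x = np.column_stack([text_sims, associations])
    model = LogisticRegression().fit(x, satisfied)   # satisfied: 1 = positive feedback
    w_sim, w_assoc = model.coef_[0]
    total = abs(w_sim) + abs(w_assoc) or 1.0
    return w_sim / total, w_assoc / total            # renormalized weights for the matcher
```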
In summary, the method for querying multimedia data provided in the embodiments of the present application may be summarized as a flowchart of a method for querying multimedia data shown in fig. 3, and as shown in fig. 3, the method may include S301 to S305.
S301, acquiring a search keyword set.
In this embodiment, the search keyword set includes at least one search keyword.
S302, similarity calculation is carried out on the search keywords and each classified word, and the target classified word is obtained.
In this embodiment, each classification word is obtained by classifying tags of multimedia data of a plurality of data types, where the data types include at least images, audio, and video.
S303, acquiring candidate multimedia data based on the target tag set.
In this embodiment, the candidate multimedia data is multimedia data having at least one tag belonging to a target tag set, where the target tag set is a tag set corresponding to a target classification.
S304, acquiring a tag set of the candidate multimedia data as a tag set to be compared.
S305, obtaining the matching degree of the tag set to be compared and the search keyword set.
And S306, if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a retrieval result.
According to the above technical solution, the multimedia data query method provided by the embodiment of the application acquires a search keyword set, where the search keyword set includes at least one search keyword. Similarity calculation is performed between the search keywords and each classification word to obtain a target classification word, candidate multimedia data are acquired based on the target tag set, and the tag set of the candidate multimedia data is acquired as the tag set to be compared. The matching degree between the tag set to be compared and the search keyword set is then obtained, and if the matching degree is greater than a preset matching degree threshold, the candidate multimedia data are determined as a search result. Because the classification words are obtained by classifying the tags of multimedia data of multiple data types, and the candidate multimedia data are multimedia data having at least one tag belonging to the tag set of the target classification, classifying the tags of multimedia data of multiple data types in advance associates multimedia data of different data types, thereby enabling efficient retrieval of multimedia data across data types.
Fig. 4 is a schematic structural diagram of a multimedia data query apparatus according to an embodiment of the present application, where, as shown in fig. 4, the apparatus may include:
a search information acquisition unit 401 for acquiring a search keyword set including at least one search keyword;
a classification search unit 402, configured to perform similarity calculation on the search keyword and each classification word to obtain a target classification word, where each classification word is obtained by classifying tags of multimedia data of a plurality of data types, where the data types at least include images, audio, and video;
a data filtering unit 403, configured to obtain candidate multimedia data based on a target tag set, where the candidate multimedia data is multimedia data having at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
a to-be-compared data obtaining unit 404, configured to obtain a tag set of the candidate multimedia data as a to-be-compared tag set;
a matching unit 405, configured to obtain a matching degree of the tag set to be compared and the search keyword set;
And the search result obtaining unit 406 is configured to determine the candidate multimedia data as a search result if the matching degree is greater than a preset matching degree threshold.
The search information acquisition unit, in acquiring the search keyword set, is specifically configured to:
acquiring data to be retrieved, wherein the data to be retrieved comprises one or more of image data, text data, audio data and video data; identifying at least one keyword of the data to be searched as a search keyword; and obtaining the search keyword set based on each search keyword.
Optionally, the multimedia data query device further includes a tag library construction unit, configured to obtain multimedia data of a plurality of data types; respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data; classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words, wherein the label sets of the classified words comprise a plurality of labels belonging to the classified words; correspondingly storing the classified words and a label set of the classified words; the identification and tag set of the multimedia data are correspondingly stored.
Optionally, the tag library construction unit, in obtaining the tag set of the image data, is specifically configured to:
recognizing characters in the image data by using an optical character recognition OCR technology to obtain character data of the image data, extracting keywords of the character data of the image data, wherein the keywords of the character data comprise semantic keywords and emotion keywords as character labels; identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image tags based on the image characteristics; and acquiring a tag set of the image data, wherein the tag set of the image data comprises a text tag and an image tag of the image data.
Optionally, the tag library construction unit, in obtaining the tag set of the audio data, is specifically configured to:
extracting words in the audio data by using a voice recognition technology to obtain word data of the audio data, extracting keywords of the word data of the audio data as word labels, wherein the keywords of the word data comprise semantic keywords and emotion keywords; acquiring audio characteristics of the audio data, and acquiring an audio tag based on the audio characteristics of the audio data; and acquiring a tag set of the audio data, wherein the tag set of the audio data comprises a text tag and an audio tag of the audio data.
Optionally, the tag library construction unit, in obtaining the tag set of the video data, is specifically configured to:
extracting the audio of the video data as audio to be identified; extracting characters in the audio to be identified by using a voice identification technology to obtain character data of the audio to be identified, extracting keywords of the character data of the audio to be identified, wherein the keywords of the character data comprise semantic keywords and emotion keywords as character labels; acquiring audio characteristics of the audio to be identified, and acquiring an audio tag based on the audio characteristics of the audio to be identified; extracting an image of a preset key frame of the video data to serve as an image to be identified; recognizing characters in the image to be recognized by using an optical character recognition OCR technology to obtain character data of the image to be recognized, extracting keywords of the character data of the image to be recognized as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords; identifying object elements in the image to be identified by using an image identification technology, identifying image characteristics of the object elements, and obtaining an image tag based on the image characteristics; and acquiring a tag set of the video data, wherein the tag set of the video data comprises a text tag and an audio tag of the audio to be identified, and a text tag and an image tag of the image to be identified.
Optionally, the matching unit, in obtaining the matching degree between the tag set to be compared and the search keyword set, is specifically configured to:
acquiring the text similarity of the tag set to be compared and the search keyword set; acquiring the association degree of the candidate multimedia data and the search keyword set, wherein the association degree is positively correlated with the times of taking the candidate multimedia data as the search result of the search keyword set; and determining the matching degree of the tag set to be compared and the search keyword set based on the text similarity and the association degree, wherein the matching degree is positively correlated with the text similarity and the association degree respectively.
It should be noted that, the units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware. The name of the unit does not constitute a limitation of the unit itself in some cases, and for example, a "matching unit" may also be described as "a unit for acquiring matching degree of a tag set to be compared and a search keyword set".
Fig. 5 shows a schematic structural diagram of the multimedia data query apparatus, which may include: at least one processor 501, at least one communication interface 502, at least one memory 503, and at least one communication bus 504;
In the embodiment of the present application, the number of the processor 501, the communication interface 502, the memory 503, and the communication bus 504 is at least one, and the processor 501, the communication interface 502, and the memory 503 complete communication with each other through the communication bus 504;
the processor 501 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
the memory 503 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one magnetic disk memory;
the memory stores a program, and the processor may execute the program stored in the memory to implement each step of the multimedia data query method provided in the embodiment of the present application, as follows:
acquiring a search keyword set, wherein the search keyword set comprises at least one search keyword;
performing similarity calculation on the search keywords and each classified word to obtain target classified words, wherein each classified word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types at least comprise images, audios and videos;
Acquiring candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data with at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
acquiring a tag set of the candidate multimedia data as a tag set to be compared;
obtaining the matching degree of the tag set to be compared and the search keyword set;
and if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a search result.
Optionally, the acquiring the search keyword set includes: acquiring data to be retrieved, wherein the data to be retrieved comprises one or more of image data, text data, audio data and video data; identifying at least one keyword of the data to be searched as a search keyword; and obtaining the search keyword set based on each search keyword.
Optionally, the multimedia data query method further includes: acquiring multimedia data of a plurality of data types; respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data; classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words, wherein the label sets of the classified words comprise a plurality of labels belonging to the classified words; correspondingly storing the classified words and a label set of the classified words; the identification and tag set of the multimedia data are correspondingly stored.
Optionally, acquiring the tag set of the image data includes: recognizing characters in the image data by using an optical character recognition OCR technology to obtain character data of the image data, extracting keywords of the character data of the image data, wherein the keywords of the character data comprise semantic keywords and emotion keywords as character labels; identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image tags based on the image characteristics; and acquiring a tag set of the image data, wherein the tag set of the image data comprises a text tag and an image tag of the image data.
Optionally, acquiring a tag set of the audio data includes: extracting words in the audio data by using a voice recognition technology to obtain word data of the audio data, extracting keywords of the word data of the audio data as word labels, wherein the keywords of the word data comprise semantic keywords and emotion keywords; acquiring audio characteristics of the audio data, and acquiring an audio tag based on the audio characteristics of the audio data; and acquiring a tag set of the audio data, wherein the tag set of the audio data comprises a text tag and an audio tag of the audio data.
Optionally, obtaining a tag set of the video data includes: extracting the audio of the video data as audio to be identified; extracting characters in the audio to be identified by using a voice identification technology to obtain character data of the audio to be identified, extracting keywords of the character data of the audio to be identified, wherein the keywords of the character data comprise semantic keywords and emotion keywords as character labels; acquiring audio characteristics of the audio to be identified, and acquiring an audio tag based on the audio characteristics of the audio to be identified; extracting an image of a preset key frame of the video data to serve as an image to be identified; recognizing characters in the image to be recognized by using an optical character recognition OCR technology to obtain character data of the image to be recognized, extracting keywords of the character data of the image to be recognized as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords; identifying object elements in the image to be identified by using an image identification technology, identifying image characteristics of the object elements, and obtaining an image tag based on the image characteristics; and acquiring a tag set of the video data, wherein the tag set of the video data comprises a text tag and an audio tag of the audio to be identified, and a text tag and an image tag of the image to be identified.
Optionally, obtaining the matching degree of the tag set to be compared and the search keyword set includes: acquiring the text similarity of the tag set to be compared and the search keyword set; acquiring the association degree of the candidate multimedia data and the search keyword set, wherein the association degree is positively correlated with the times of taking the candidate multimedia data as the search result of the search keyword set; and determining the matching degree of the tag set to be compared and the search keyword set based on the text similarity and the association degree, wherein the matching degree is positively correlated with the text similarity and the association degree respectively.
The embodiments of the application further provide a readable storage medium, which may store a computer program suitable for being executed by a processor. When the computer program is executed by the processor, the steps of the multimedia data query method provided in the embodiments of the present application are implemented, as follows:
acquiring a search keyword set, wherein the search keyword set comprises at least one search keyword;
performing similarity calculation on the search keywords and each classified word to obtain target classified words, wherein each classified word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types at least comprise images, audios and videos;
Acquiring candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data with at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
acquiring a tag set of the candidate multimedia data as a tag set to be compared;
obtaining the matching degree of the tag set to be compared and the search keyword set;
and if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a search result.
Optionally, the acquiring the search keyword set includes:
acquiring data to be retrieved, wherein the data to be retrieved comprises one or more of image data, text data, audio data and video data;
identifying at least one keyword of the data to be searched as a search keyword;
and obtaining the search keyword set based on each search keyword.
Optionally, the multimedia data query method further includes:
acquiring multimedia data of a plurality of data types;
respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data;
Classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words, wherein the label sets of the classified words comprise a plurality of labels belonging to the classified words;
correspondingly storing the classified words and a label set of the classified words;
the identification and tag set of the multimedia data are correspondingly stored.
Optionally, acquiring the tag set of the image data includes:
recognizing characters in the image data by using an optical character recognition OCR technology to obtain character data of the image data, and extracting keywords of the character data of the image data as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image tags based on the image characteristics;
and acquiring a tag set of the image data, wherein the tag set of the image data comprises a text tag and an image tag of the image data.
Optionally, acquiring a tag set of the audio data includes:
extracting words in the audio data by using a voice recognition technology to obtain word data of the audio data, extracting keywords of the word data of the audio data as word labels, wherein the keywords of the word data comprise semantic keywords and emotion keywords;
Acquiring audio characteristics of the audio data, and acquiring an audio tag based on the audio characteristics of the audio data;
and acquiring a tag set of the audio data, wherein the tag set of the audio data comprises a text tag and an audio tag of the audio data.
Optionally, obtaining a tag set of the video data includes:
extracting the audio of the video data as audio to be identified;
extracting characters in the audio to be identified by using a voice identification technology to obtain character data of the audio to be identified, and extracting keywords of the character data of the audio to be identified as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
acquiring audio characteristics of the audio to be identified, and acquiring an audio tag based on the audio characteristics of the audio to be identified;
extracting an image of a preset key frame of the video data to serve as an image to be identified;
recognizing characters in the image to be recognized by using an optical character recognition OCR technology to obtain character data of the image to be recognized, extracting keywords of the character data of the image to be recognized as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
Identifying object elements in the image to be identified by using an image identification technology, identifying image characteristics of the object elements, and obtaining an image tag based on the image characteristics;
and acquiring a tag set of the video data, wherein the tag set of the video data comprises a text tag and an audio tag of the audio to be identified, and a text tag and an image tag of the image to be identified.
Optionally, obtaining the matching degree of the tag set to be compared and the search keyword set includes:
acquiring the text similarity of the tag set to be compared and the search keyword set;
acquiring the association degree of the candidate multimedia data and the search keyword set, wherein the association degree is positively correlated with the times of taking the candidate multimedia data as the search result of the search keyword set;
and determining the matching degree of the tag set to be compared and the search keyword set based on the text similarity and the association degree, wherein the matching degree is positively correlated with the text similarity and the association degree respectively.
It should be noted that in the context of this disclosure, a readable storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The readable storage medium may be a machine-readable signal medium or a machine-readable storage medium. The readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Finally, it is further noted that in the context of the present disclosure, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
While several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
The foregoing description merely illustrates the preferred embodiments of the present disclosure and the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).

Claims (10)

1. A method for querying multimedia data, comprising:
acquiring a search keyword set, wherein the search keyword set comprises at least one search keyword;
Performing similarity calculation on the search keywords and each classified word to obtain target classified words, wherein each classified word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types at least comprise images, audios and videos;
acquiring candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data with at least one tag belonging to the target tag set, and the target tag set is a tag set corresponding to the target classification;
acquiring a tag set of the candidate multimedia data as a tag set to be compared;
obtaining the matching degree of the tag set to be compared and the search keyword set;
and if the matching degree is larger than a preset matching degree threshold value, determining the candidate multimedia data as a search result.
2. The method of claim 1, wherein the obtaining the search keyword set includes:
acquiring data to be retrieved, wherein the data to be retrieved comprises one or more of image data, text data, audio data and video data;
identifying at least one keyword of the data to be searched as a search keyword;
And obtaining the search keyword set based on each search keyword.
3. The method of claim 1, further comprising:
acquiring multimedia data of a plurality of data types;
respectively acquiring a plurality of labels of each multimedia data to obtain a label set of each multimedia data;
classifying the labels of all the data types of the multimedia data to obtain a plurality of classified words and label sets corresponding to the classified words, wherein the label sets of the classified words comprise a plurality of labels belonging to the classified words;
correspondingly storing the classified words and a label set of the classified words;
the identification and tag set of the multimedia data are correspondingly stored.
4. A method of querying multimedia data as claimed in claim 3, wherein obtaining a set of tags for image data comprises:
recognizing characters in the image data by using an optical character recognition OCR technology to obtain character data of the image data, and extracting keywords of the character data of the image data as character labels, wherein the keywords of the character data comprise semantic keywords and emotion keywords;
Identifying object elements in the image data by using an image identification technology, identifying image characteristics of the object elements, and obtaining image tags based on the image characteristics;
and acquiring a tag set of the image data, wherein the tag set of the image data comprises a text tag and an image tag of the image data.
5. A method of querying multimedia data as claimed in claim 3, wherein obtaining a tag set of audio data comprises:
extracting words in the audio data by using a voice recognition technology to obtain word data of the audio data, extracting keywords of the word data of the audio data as word labels, wherein the keywords of the word data comprise semantic keywords and emotion keywords;
acquiring audio characteristics of the audio data, and acquiring an audio tag based on the audio characteristics of the audio data;
and acquiring a tag set of the audio data, wherein the tag set of the audio data comprises a text tag and an audio tag of the audio data.
6. The method for querying multimedia data according to claim 3, wherein acquiring the tag set of video data comprises:
extracting the audio of the video data as audio to be recognized;
extracting text from the audio to be recognized by using a speech recognition technology to obtain text data of the audio to be recognized, and extracting keywords of the text data of the audio to be recognized as text tags, wherein the keywords of the text data comprise semantic keywords and emotion keywords;
acquiring audio features of the audio to be recognized, and obtaining audio tags based on the audio features of the audio to be recognized;
extracting an image at a preset key frame of the video data as an image to be recognized;
recognizing text in the image to be recognized by using optical character recognition (OCR) to obtain text data of the image to be recognized, and extracting keywords of the text data of the image to be recognized as text tags, wherein the keywords of the text data comprise semantic keywords and emotion keywords;
identifying object elements in the image to be recognized by using an image recognition technology, identifying image features of the object elements, and obtaining image tags based on the image features;
and acquiring the tag set of the video data, wherein the tag set of the video data comprises the text tags and the audio tags of the audio to be recognized, and the text tags and the image tags of the image to be recognized.
7. The method for querying multimedia data according to claim 3, wherein obtaining the matching degree between the tag set to be compared and the search keyword set comprises:
acquiring the text similarity between the tag set to be compared and the search keyword set;
acquiring the association degree between the candidate multimedia data and the search keyword set, wherein the association degree is positively correlated with the number of times the candidate multimedia data has been returned as a search result for the search keyword set;
and determining the matching degree between the tag set to be compared and the search keyword set based on the text similarity and the association degree, wherein the matching degree is positively correlated with both the text similarity and the association degree.
8. A multimedia data query apparatus, comprising:
a search information acquisition unit configured to acquire a search keyword set including at least one search keyword;
a classification retrieval unit configured to perform similarity calculation between the search keywords and each classification word to obtain a target classification word, wherein each classification word is obtained by classifying tags of multimedia data of a plurality of data types, and the data types comprise at least image, audio and video;
a data screening unit configured to acquire candidate multimedia data based on a target tag set, wherein the candidate multimedia data is multimedia data having at least one tag belonging to the target tag set, and the target tag set is the tag set corresponding to the target classification word;
a to-be-compared data acquisition unit configured to acquire a tag set of the candidate multimedia data as a tag set to be compared;
a matching unit configured to obtain a matching degree between the tag set to be compared and the search keyword set;
and a search result acquisition unit configured to determine the candidate multimedia data as a search result if the matching degree is greater than a preset matching degree threshold.
9. A multimedia data query device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the multimedia data query method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the multimedia data query method according to any one of claims 1 to 7.
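
A minimal Python sketch of the query flow of claims 1 and 2: each search keyword is compared against the pre-built classification words, the tag set of the best-matching classification narrows the candidate pool, and a matching-degree threshold filters the final results. The similarity measure, the in-memory index structures and the example threshold below are assumptions for illustration, not part of the claims.

```python
from difflib import SequenceMatcher

# Assumed in-memory indexes, built in advance as described in claim 3:
# classification word -> tags grouped under it, and media identifier -> its tag set.
CLASS_TO_TAGS = {
    "scenery": {"mountain", "lake", "sunset"},
    "meeting": {"speech", "slide", "applause"},
}
MEDIA_TO_TAGS = {
    "video_001": {"mountain", "sunset", "birdsong"},
    "audio_007": {"speech", "applause"},
}

def similarity(a: str, b: str) -> float:
    # Assumed text-similarity measure; any edit-distance or embedding metric could stand in.
    return SequenceMatcher(None, a, b).ratio()

def query(search_keywords: set, threshold: float = 0.5) -> list:
    results = []
    for keyword in search_keywords:
        # Pick the classification word most similar to the keyword.
        target_class = max(CLASS_TO_TAGS, key=lambda c: similarity(keyword, c))
        target_tags = CLASS_TO_TAGS[target_class]
        # Candidates are media items with at least one tag in the target tag set.
        for media_id, tags in MEDIA_TO_TAGS.items():
            if not tags & target_tags:
                continue
            # Compare the candidate's tag set with the keyword set and keep it
            # if the matching degree clears the threshold.
            match_degree = max(similarity(t, k) for t in tags for k in search_keywords)
            if match_degree > threshold and media_id not in results:
                results.append(media_id)
    return results

print(query({"sunset"}))  # expected to return ["video_001"]
```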
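
Claim 3 builds the offline structures the query relies on: every piece of multimedia data gets a tag set, tags from all data types are grouped under classification words, and both the classification-word-to-tag-set mapping and the identifier-to-tag-set mapping are stored. A minimal sketch, assuming the tag-to-classification assignment is supplied by an external classifier (the classify_tag callable here is hypothetical):

```python
from collections import defaultdict
from typing import Callable, Dict, Set

def build_indexes(
    media_tags: Dict[str, Set[str]],
    classify_tag: Callable[[str], str],
):
    """media_tags maps a multimedia identifier to its tag set (all data types mixed);
    classify_tag maps a single tag to its classification word."""
    class_to_tags: Dict[str, Set[str]] = defaultdict(set)
    for tags in media_tags.values():
        for tag in tags:
            class_to_tags[classify_tag(tag)].add(tag)
    # Both mappings would be persisted (a key-value store, a relational table, etc.);
    # plain dicts stand in for that storage here.
    return dict(class_to_tags), dict(media_tags)

# Usage with a toy classifier that files every tag under its first letter:
classes, media_index = build_indexes(
    {"img_001": {"lake", "logo"}, "audio_002": {"laughter"}},
    classify_tag=lambda tag: tag[0],
)
```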
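
Claim 4 derives two kinds of tags from an image: text tags from the OCR'd text (semantic and emotion keywords) and image tags from recognized object elements and their features. The sketch below assumes pytesseract for OCR and jieba for keyword extraction; the object-recognition step and the emotion-keyword step are hypothetical helpers, since the claim does not tie them to any particular model.

```python
import jieba.analyse   # TF-IDF keyword extraction for Chinese/English text
import pytesseract     # OCR wrapper around Tesseract
from PIL import Image

def detect_objects(image):
    # Hypothetical object-recognition step; any detector could supply
    # (object element, image feature) pairs here.
    return [("person", "smiling"), ("whiteboard", "dense text")]

def emotion_keywords(text: str) -> set:
    # Hypothetical emotion-keyword step; a sentiment lexicon or classifier would go here.
    return set()

def image_tag_set(path: str) -> set:
    image = Image.open(path)
    # Text tags: OCR the text, then keep the top semantic keywords plus emotion keywords.
    text = pytesseract.image_to_string(image, lang="chi_sim+eng")
    text_tags = set(jieba.analyse.extract_tags(text, topK=10)) | emotion_keywords(text)
    # Image tags: recognized object elements and their image features.
    objects = detect_objects(image)
    image_tags = {obj for obj, _ in objects} | {feature for _, feature in objects}
    return text_tags | image_tags
```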
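
Claim 5 mirrors the image case for audio: a speech-recognition pass yields text tags, and acoustic features yield audio tags. The sketch assumes librosa for the acoustic features; the transcription step and the feature-to-tag mapping are simplifications, since the claim prescribes neither a recognizer nor a feature set (RMS energy is used here purely as an example of an audio feature).

```python
import jieba.analyse
import librosa

def transcribe(path: str) -> str:
    # Hypothetical speech-recognition step; any ASR engine could be plugged in.
    return "示例 会议 纪要 文本"

def audio_tag_set(path: str) -> set:
    # Text tags: transcribe the speech, then extract semantic keywords.
    text_tags = set(jieba.analyse.extract_tags(transcribe(path), topK=10))
    # Audio tags: map a coarse acoustic feature (mean RMS energy) onto a label.
    samples, _ = librosa.load(path, sr=None)
    mean_energy = librosa.feature.rms(y=samples).mean()
    audio_tags = {"loud"} if mean_energy > 0.1 else {"quiet"}
    return text_tags | audio_tags
```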
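
Claim 6 composes the two procedures above: the video's audio track goes through the claim-5 path, images taken at preset key frames go through the claim-4 path, and all resulting tags are merged into one set. A sketch assuming OpenCV for frame extraction; the audio-track split is a hypothetical helper (ffmpeg or similar would typically perform it), and the two tagger callables are meant to be functions like the image and audio sketches above.

```python
import cv2  # OpenCV, used here only to step through video frames

def extract_audio_track(video_path: str) -> str:
    # Hypothetical: split out the audio track (e.g. with ffmpeg) and return its file path.
    return video_path.rsplit(".", 1)[0] + ".wav"

def video_tag_set(video_path: str, audio_tagger, image_tagger, frame_step: int = 250) -> set:
    # Audio side of claim 6: text tags + audio tags from the extracted track.
    tags = set(audio_tagger(extract_audio_track(video_path)))
    # Image side of claim 6: text tags + image tags from preset key frames.
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_step == 0:  # "preset key frame" taken as every frame_step-th frame
            frame_path = f"keyframe_{index}.png"
            cv2.imwrite(frame_path, frame)
            tags |= image_tagger(frame_path)
        index += 1
    capture.release()
    return tags

# Usage with the earlier sketches: video_tag_set("talk.mp4", audio_tag_set, image_tag_set)
```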
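
Claim 7 combines two signals into the matching degree: the text similarity between the candidate's tag set and the search keywords, and an association degree that grows with how often the candidate has already been returned for that keyword set. The claim only requires positive correlation with both terms; the weighted sum and the saturating hit-count normalization below are one assumed reading.

```python
def matching_degree(text_similarity: float, past_hit_count: int,
                    weight_similarity: float = 0.7, weight_association: float = 0.3) -> float:
    # Keep both terms in [0, 1]; the degree rises with either one, as claim 7 requires.
    association = past_hit_count / (past_hit_count + 1)  # assumed saturation of past hits
    return weight_similarity * text_similarity + weight_association * association

# A candidate with text similarity 0.8 that has been returned 3 times before:
print(matching_degree(0.8, 3))  # 0.7*0.8 + 0.3*0.75 ≈ 0.785
```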
CN202311809163.XA 2023-12-26 2023-12-26 Method, device and equipment for inquiring multimedia data and readable storage medium Pending CN117786137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311809163.XA CN117786137A (en) 2023-12-26 2023-12-26 Method, device and equipment for inquiring multimedia data and readable storage medium

Publications (1)

Publication Number Publication Date
CN117786137A 2024-03-29

Family

ID=90401347

Country Status (1)

Country Link
CN (1) CN117786137A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination