CN111291204A - Multimedia data fusion method and device - Google Patents

Multimedia data fusion method and device Download PDF

Info

Publication number
CN111291204A
CN111291204A
Authority
CN
China
Prior art keywords
multimedia data
data
vector
feature
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911259689.9A
Other languages
Chinese (zh)
Other versions
CN111291204B (en)
Inventor
何志强
刘鑫
张继勇
庄浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Finance University
Original Assignee
Huarui Xinzhi Technology Beijing Co ltd
Hebei Finance University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huarui Xinzhi Technology Beijing Co ltd, Hebei Finance University filed Critical Huarui Xinzhi Technology Beijing Co ltd
Priority to CN201911259689.9A priority Critical patent/CN111291204B/en
Publication of CN111291204A publication Critical patent/CN111291204A/en
Application granted granted Critical
Publication of CN111291204B publication Critical patent/CN111291204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45Clustering; Classification

Abstract

Embodiments of the present application provide a multimedia data fusion method and device. The method comprises the following steps: receiving multimedia data from a plurality of terminal devices, where the data types of the multimedia data include at least two of the following: text, image, audio; performing corresponding recognition on the multimedia data of each data type to obtain a feature vector of each piece of multimedia data, where the feature vector represents the features of that piece of multimedia data; performing vector conversion on the feature vectors of the multimedia data based on the relationship between each feature vector and a preset conversion vector, so that the feature vectors of multimedia data of different data types lie in the same vector space; and clustering the multimedia data of different data types according to the converted feature vectors.

Description

Multimedia data fusion method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a multimedia data fusion method and device.
Background
With the rapid development of information technology, large-scale multimedia data is generated across multiple dimensions. For example, video and picture data are obtained from cameras, text data is obtained from documents, and audio data is obtained through event-tracking ("buried point") instrumentation. Many different forms of data express the same theme: their high-level semantics are very similar, but their underlying features differ greatly across media, and such data are strongly correlated. Correlated data of this kind can be applied in many scenarios, such as search, where any one item can be linked to other related items. For example, a celebrity's name can be entered as a keyword in the Baidu search engine to retrieve information about that celebrity, including photographs, personal data, lecture audio, videos, and so on. Multimedia data fusion therefore becomes crucial.
In existing data fusion technology, corresponding labels are often assigned to multimedia data by manual annotation, and the multimedia data are clustered according to these labels to realize fusion. On one hand, this method requires a large number of annotators and auditors and consumes a great deal of manpower. On the other hand, owing to the subjectivity of annotators and auditors and the richness of semantic content, the labels used to annotate multimedia data cannot express the meaning of the data sufficiently clearly and completely, so the established relevance between multimedia data is weak.
Disclosure of Invention
Embodiments of this specification provide a multimedia data fusion method and device, which are used to solve the problems of low efficiency, poor quality, and the like in multimedia data fusion caused by the need for manual labeling in the prior art.
In one aspect, an embodiment of the present application provides a multimedia data fusion method, where the method includes: receiving multimedia data from each of a plurality of terminal devices, the data types of the multimedia data including at least two of the following: text, image, audio; performing corresponding recognition on the multimedia data of each data type to obtain a feature vector of each piece of multimedia data, where the feature vector represents the features of that piece of multimedia data; performing vector conversion on the feature vectors of the multimedia data based on the relationship between the feature vectors and preset conversion vectors, so that the feature vectors of multimedia data of different data types are in the same vector space; and clustering the multimedia data of different data types according to the converted feature vectors.
In a possible implementation manner, based on the feature vector of each multimedia data and the number of preset multimedia data categories, vector conversion is performed on the feature vector of each multimedia data, and a specific preset algorithm is shown in the following formula:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

where k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
In a possible implementation manner, clustering multimedia data of different data types according to the converted feature vector of each multimedia data specifically includes: determining whether multimedia data of different data types belong to one type according to the converted feature vector of each multimedia data; and clustering the multimedia data of different data types that belong to one type based on a preset clustering algorithm.
In a possible implementation manner, determining whether multimedia data of different data types belong to one type according to the converted feature vector of each multimedia data specifically includes: calculating the Euclidean distance between the converted feature vectors of multimedia data of different data types; and determining the multimedia data of different data types to be one type when the Euclidean distance is smaller than a preset threshold.
In one possible implementation, the data types of the multimedia data further include: video.
In a possible implementation manner, before performing corresponding identification on the multimedia data of each data type to obtain the feature vector of each multimedia data, the method further includes: respectively carrying out corresponding preprocessing on the multimedia data of different data types.
On the other hand, an embodiment of the present application further provides a multimedia data fusion device, which includes: the receiving module is used for receiving multimedia data from a plurality of terminal devices, and the data types of the multimedia data comprise at least two of the following types: text, images, audio; the identification module is used for correspondingly identifying the multimedia data of each data type to obtain the characteristic vector of each multimedia data; wherein, the feature vector is used for representing the feature of each multimedia data; the vector conversion module is used for performing vector conversion on the feature vectors of the multimedia data based on the relationship between the feature vectors of the multimedia data and the preset conversion vectors so as to enable the feature vectors of the multimedia data of different data types to be in the same vector space; and the clustering module is used for clustering the multimedia data of different data types according to the converted feature vectors of the multimedia data.
In a possible implementation manner, based on the feature vector of each multimedia data and the number of preset multimedia data categories, vector conversion is performed on the feature vector of each multimedia data, and a specific preset algorithm is shown in the following formula:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

where k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
In one possible implementation, the clustering module includes: a determining unit and a clustering unit. The determining unit is configured to determine whether multimedia data of different data types belong to one type according to the converted feature vector of each multimedia data; the clustering unit is configured to cluster the multimedia data of different data types that belong to one type based on a preset clustering algorithm.
In a possible implementation manner, the determining unit is specifically configured to: calculate the Euclidean distance between the converted feature vectors of multimedia data of different data types; and determine the multimedia data of different data types to be one type when the Euclidean distance is smaller than a preset threshold.
According to the multimedia data fusion method and device provided by the embodiments of the present application, multimedia data of different data types can be classified through their feature vectors, and multimedia data of one type can be clustered. On one hand, compared with classification by manual labeling, a large amount of manpower and material resources can be saved, and the classification is objective. On the other hand, when fusing multimedia data, clustering by manual labeling is avoided, so the efficiency and quality of data fusion are further improved, and user experience is improved as well.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a multimedia data fusion method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a multimedia data fusion device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in the description belong to the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multimedia data fusion method according to an embodiment of the present application. As shown in fig. 1, the data processing method includes the steps of:
s101, the server receives multimedia data from a plurality of terminal devices.
The data types of the multimedia data include at least two of the following: text, image, audio. In some embodiments of the present application, the data types of the multimedia data further include video, which may be video with or without audio.
The terminal device may be hardware or software. When the terminal device is hardware, it may be various electronic devices such as a computer, a camera, a scanner, and the like. When the terminal device is software, the software can be installed in the electronic devices listed above. For example, when the terminal device is a video camera, the multimedia data received by the server is video data; when the terminal equipment is music software, the multimedia data received by the server is audio data; when the terminal device is a camera, the multimedia data received by the server is image data.
S102, respectively preprocessing the multimedia data with different data types.
Text-type multimedia data (hereinafter referred to as text data) can be preprocessed by data cleaning, for example case normalization with regular expressions, semantic disambiguation, and synonym replacement.
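As a rough sketch of this cleaning step (the function name, synonym table, and rules below are illustrative assumptions, not taken from the patent):

```python
import re

def preprocess_text(text, synonyms=None):
    """Minimal text-cleaning sketch: case normalization, whitespace
    collapsing via a regular expression, and naive word-for-word
    synonym replacement standing in for the semantic steps."""
    synonyms = synonyms or {}
    text = text.lower()                       # normalize case
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    words = [synonyms.get(w, w) for w in text.split(" ")]
    return " ".join(words)

print(preprocess_text("The  FAMOUS   Singer", {"famous": "well-known"}))
```

A real pipeline would add context-aware disambiguation; this only shows where such rules would plug in.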
Preprocessing image-type multimedia data (hereinafter referred to as image data) may appropriately discard low-quality images, for example blurred images or images with highly complex scenes.
The preprocessing of text data and image data is not limited to the above methods; other methods may also be used. For example, image data may be retouched to improve image resolution.
Audio-type multimedia data (hereinafter referred to as audio data) may be preprocessed by noise reduction to reduce the influence of noise.
For preprocessing video-type multimedia data (hereinafter referred to as video data), a composite image may be generated from the video frame sequence of the video data and then processed according to the image-data preprocessing method.
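The frame-combination idea can be sketched by tiling sampled frames (represented here as plain 2-D lists of pixel values) side by side; the tiling layout and function name are illustrative assumptions:

```python
def composite_frames(frames, cols=2):
    """Tile video frames (2-D lists of pixel values) into one
    composite image, placing `cols` frames per tile row."""
    composite = []
    for start in range(0, len(frames), cols):
        group = frames[start:start + cols]
        for y in range(len(group[0])):
            row = []
            for frame in group:
                row.extend(frame[y])  # concatenate pixel rows
            composite.append(row)
    return composite

# two 2x2 "frames" tiled side by side into a 2x4 composite
grid = composite_frames([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
```

The composite can then be fed to whatever image preprocessing is in use.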
It should be noted that the server may send request information to the corresponding terminal device, and the terminal device sends the corresponding multimedia data to the server based on the received request information.
S103, respectively carrying out corresponding identification on the multimedia data of each data type to obtain the feature vector of each multimedia data.
The feature vector referred to here is a vector representing the features of the multimedia data. For example, the feature vector corresponding to image-type multimedia data is an image feature vector, which represents features such as shapes in the image.
For text data, the feature vector can be obtained through a preset text feature extraction model. The text feature extraction model may be a pre-trained neural network model, such as the BERT model. Training BERT involves two stages: pre-training and fine-tuning. Pre-training is independent of downstream tasks but is very time-consuming and expensive; an open-source pre-trained model can therefore be called instead of repeating this process. Such a model summarizes prior knowledge of the language and, once obtained, need not be rebuilt. A network-extension architecture can then be adopted to fine-tune for the specific downstream task. In general, fine-tuning BERT is a lightweight task: it mainly adjusts the extension network rather than BERT itself. Furthermore, an important role of the BERT model is to generate word vectors, which can resolve the polysemy problem that the word2vec model cannot.
For image data, feature vectors can be obtained through an image feature extraction model, which is a neural network model — for example, a classic deep convolutional neural network combined with pooling layers. Because images are large signal sources, the network has a huge number of parameters; to reduce the training computation, pooling layers further abstract the outputs of the convolution layers, reduce the number of weights to be trained, and also help prevent overfitting.
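A minimal 2x2 max-pooling pass illustrates how pooling abstracts a convolution output and shrinks the number of downstream weights (a generic example, not the patent's specific network):

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a 2-D feature map (list of
    lists): each output value is the maximum of a 2x2 window, so the
    map's height and width are both halved."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = (feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled
```

A 4x4 map thus becomes 2x2, quartering the activations the next layer must consume.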
For audio data, the feature vector can be obtained directly through a corresponding audio feature extraction model; alternatively, the audio data can be converted into text data, and the feature vector obtained by inputting that text data into a corresponding text feature extraction model.
For video data, the feature vector can be obtained directly through a corresponding video feature extraction model; alternatively, a composite image can be generated from the video frame sequence and input into a corresponding image feature extraction model to obtain the feature vector of the video data.
Both the audio feature extraction model and the video feature extraction model are pre-trained neural network models.
It should be noted that the feature vector of the multimedia data can be obtained not only by the corresponding model, but also by other algorithms, which is not limited in the embodiment of the present application.
And S104, performing vector conversion on the feature vectors of the multimedia data based on the relationship between the feature vectors of the multimedia data and the preset conversion vectors so as to enable the feature vectors of the multimedia data of different data types to be in the same vector space.
The preset transformation vector may be obtained by learning through a neural network model.
Because the data types of the multimedia data differ, it cannot be directly determined from the corresponding feature vectors whether multimedia data of different data types belong to one type.
Therefore, in some embodiments of the present application, the feature vectors of the multimedia data may be vector-converted according to a preset algorithm, so that the feature vectors of the multimedia data of different data types are in the same vector space.
In some embodiments of the present application, vector conversion is performed on the feature vector of each multimedia data based on the feature vector of each multimedia data and the number of preset categories of multimedia data, and the following formula is specifically shown:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

where k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
The value of k may be a user-defined parameter.
Through the formula, the feature vectors of the multimedia data with different data types can be converted into the vectors of the same vector space.
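Assuming the preset algorithm is a softmax-style mapping over the scores θ_k^T x (an assumption consistent with the symbol definitions, since the patent's formula appears only as a figure), the conversion could be sketched as:

```python
import math

def softmax_convert(theta, x):
    """Map each score theta_k . x to a component of P via a softmax.
    `theta` is a list of k feature vectors and `x` the preset
    conversion vector; this is an illustrative reconstruction, not
    the patent's verified formula."""
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax_convert([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because the outputs are positive and sum to 1, vectors from any modality land in the same bounded space, which is what makes the later distance comparison meaningful.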
And S105, determining whether the multimedia data of different data types are of one type or not according to the converted feature vectors of the multimedia data.
Specifically, the Euclidean distance between the converted feature vectors of multimedia data of different data types is calculated;
and when the Euclidean distance is smaller than a preset threshold, the multimedia data of different data types are determined to be one type.
For example, the Euclidean distance between the converted feature vector of a piece of text data and that of a piece of image data is calculated; if this distance is smaller than the preset threshold, the text data and the image data are determined to be one type.
For another example, after a piece of text data and a piece of image data have been determined to be one type, if the Euclidean distance between the converted feature vector of the text data and that of a piece of audio data is also smaller than the preset threshold, then the text data, the image data, and the audio data are all determined to be one type.
It should be noted that the preset threshold may be set in advance, or may be adjusted in real time according to actual conditions.
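The distance test of S105 can be sketched as follows (the threshold value is arbitrary, and `same_type` is a hypothetical helper name):

```python
import math

def same_type(vec_a, vec_b, threshold):
    """Return True when the Euclidean distance between two converted
    feature vectors falls below the preset threshold."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return dist < threshold

print(same_type([0.0, 0.0], [3.0, 4.0], 5.1))  # distance here is 5.0
```

In practice the threshold would be tuned, or adjusted in real time as the text above notes.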
And S106, clustering the multimedia data of different data types based on a preset clustering algorithm.
In the embodiment of the present application, multimedia data of different data types that belong to one type can be clustered through a preset clustering algorithm, such as the k-means clustering algorithm.
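A bare-bones k-means pass over the converted feature vectors might look like this sketch (fixed initial centers are an assumption made for determinism; a production version would initialize them randomly):

```python
import math

def kmeans(points, centers, iters=10):
    """Naive k-means: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        centers = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)]
            if cluster else c  # keep a center whose cluster went empty
            for cluster, c in zip(clusters, centers)
        ]
    return centers, clusters

centers, clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]],
                           [[0.0, 0.0], [10.0, 10.0]])
```

Each cluster then groups multimedia data items of mixed types whose converted vectors sit close together.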
Based on the above scheme, the multimedia data fusion method provided in the embodiments of the present application can determine, through the feature vector of each multimedia data, whether multimedia data of different data types belong to one type, and cluster the multimedia data of different data types that belong to one type, thereby realizing multimedia data fusion. On one hand, compared with classification by manual labeling, a large amount of manpower and material resources can be saved, and the classification is objective. On the other hand, when fusing multimedia data, clustering by manual labeling is avoided, so the efficiency and quality of data fusion are further improved, and user experience is improved as well.
Based on the same idea, some embodiments of the present application further provide a device corresponding to the above method.
Fig. 2 is a schematic structural diagram of a multimedia data fusion device according to an embodiment of the present application. As shown in fig. 2, the apparatus 200 includes: a receiving module 210, an identification module 220, a vector conversion module 230, and a clustering module 240.
The receiving module 210 is configured to receive multimedia data from a plurality of terminal devices, where data types of the multimedia data include at least two of the following: text, image, audio. The identification module 220 is configured to perform corresponding identification on the multimedia data of each data type to obtain a feature vector of each multimedia data; wherein the feature vector is used to represent the feature of each multimedia data. The vector transformation module 230 is configured to perform vector transformation on the feature vectors of the multimedia data based on a relationship between the feature vectors of the multimedia data and a preset transformation vector, so that the feature vectors of the multimedia data of different data types are in the same vector space. The clustering module 240 is configured to cluster the multimedia data of different data types according to the feature vector of each converted multimedia data.
In a possible implementation manner, based on the feature vector of each multimedia data and the number of preset multimedia data categories, vector conversion is performed on the feature vector of each multimedia data, and a specific preset algorithm is shown in the following formula:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

where k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
In one possible implementation, the clustering module 240 includes: a determining unit (not shown in the figure) and a clustering unit (not shown in the figure). The determining unit is configured to determine whether multimedia data of different data types belong to one type according to the converted feature vector of each multimedia data. The clustering unit is configured to cluster the multimedia data of different data types that belong to one type based on a preset clustering algorithm.
In a possible implementation manner, the determining unit is specifically configured to: calculate the Euclidean distance between the converted feature vectors of multimedia data of different data types; and determine the multimedia data of different data types to be one type when the Euclidean distance is smaller than a preset threshold.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The devices and the methods provided by the embodiment of the application are in one-to-one correspondence, so the devices also have beneficial technical effects similar to the corresponding methods.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. A method for multimedia data fusion, the method comprising:
receiving multimedia data from a plurality of terminal devices, wherein the data types of the multimedia data comprise at least two of the following types: text, images, audio;
respectively carrying out corresponding identification on the multimedia data of each data type to obtain a feature vector of each multimedia data, wherein the feature vector is used for expressing the feature of each multimedia data;
performing vector conversion on the feature vectors of the multimedia data based on the relationship between the feature vectors of the multimedia data and preset conversion vectors so as to enable the feature vectors of the multimedia data of different data types to be in the same vector space;
and clustering the multimedia data of different data types according to the converted feature vectors of the multimedia data.
2. The method of claim 1, wherein the feature vector of each multimedia data is vector-converted based on the feature vector of each multimedia data and a predetermined number of categories of the multimedia data, as shown in the following formula:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

wherein k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
3. The method of claim 1, wherein clustering multimedia data of different data types according to the feature vectors of the multimedia data after vector conversion specifically comprises:
determining whether the multimedia data of different data types are of one type or not according to the converted feature vectors of the multimedia data;
and clustering the multimedia data of different data types that belong to one type based on a preset clustering algorithm.
4. The method according to claim 3, wherein determining whether multimedia data of different data types are of one type according to the feature vector of each multimedia data after vector conversion is specifically:
calculating the Euclidean distance between the converted feature vectors of the multimedia data of different data types;
and determining the multimedia data of different data types to be one type when the Euclidean distance is smaller than a preset threshold.
5. The method of claim 1, wherein the data type of the multimedia data further comprises: video.
6. The method of claim 1, wherein before the corresponding identification of the multimedia data of each data type is performed to obtain the feature vector of each multimedia data, the method further comprises:
respectively carrying out corresponding preprocessing on the multimedia data of different data types.
7. A multimedia data fusion device, characterized in that the device comprises:
the receiving module is used for receiving multimedia data from a plurality of terminal devices, and the data types of the multimedia data comprise at least two of the following types: text, images, audio;
the identification module is used for correspondingly identifying the multimedia data of each data type to obtain the characteristic vector of each multimedia data; wherein, the feature vector is used for representing the feature of each multimedia data;
the vector conversion module is used for performing vector conversion on the feature vectors of the multimedia data based on the relationship between the feature vectors of the multimedia data and the preset conversion vectors so as to enable the feature vectors of the multimedia data of different data types to be in the same vector space;
and the clustering module is used for clustering the multimedia data of different data types according to the converted feature vectors of the multimedia data.
8. The apparatus of claim 7, wherein the feature vector of each multimedia data is vector-converted based on the feature vector of each multimedia data and a number of predetermined categories of multimedia data, as shown in the following formula:
P(i) = exp(θ_i^T x) / Σ_k exp(θ_k^T x)

wherein k is the number of preset multimedia data categories (the summation runs over all k categories), θ_k is the feature vector of the k-th multimedia data, x is the preset conversion vector, T denotes transposition, and P(i) is the feature vector after vector conversion.
9. The apparatus of claim 7, wherein the clustering module comprises: a determining unit and a clustering unit;
the determining unit is used for determining whether the multimedia data of different data types are of one type or not according to the converted feature vectors of the multimedia data;
the clustering unit is used for clustering multimedia data of different data types based on a preset clustering algorithm.
10. The device according to claim 9, wherein the determining unit is specifically configured to:
calculating the Euclidean distance between the converted feature vectors of the multimedia data of different data types;
and determining the multimedia data of different data types to be one type when the Euclidean distance is smaller than a preset threshold.
CN201911259689.9A 2019-12-10 2019-12-10 Multimedia data fusion method and device Active CN111291204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911259689.9A CN111291204B (en) 2019-12-10 2019-12-10 Multimedia data fusion method and device

Publications (2)

Publication Number Publication Date
CN111291204A true CN111291204A (en) 2020-06-16
CN111291204B CN111291204B (en) 2023-08-29

Family

ID=71021287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911259689.9A Active CN111291204B (en) 2019-12-10 2019-12-10 Multimedia data fusion method and device

Country Status (1)

Country Link
CN (1) CN111291204B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6086006A (en) * 1996-10-29 2000-07-11 Scerbvo, III; Frank C. Evidence maintaining tape recording reels and cassettes
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN103440292A (en) * 2013-08-16 2013-12-11 新浪网技术(中国)有限公司 Method and system for retrieving multimedia information based on bit vector
CN104182421A (en) * 2013-05-27 2014-12-03 华东师范大学 Video clustering method and detecting method
CN104679902A (en) * 2015-03-20 2015-06-03 湘潭大学 Information abstract extraction method in conjunction with cross-media fusion
CN110209844A (en) * 2019-05-17 2019-09-06 腾讯音乐娱乐科技(深圳)有限公司 Multi-medium data matching process, device and storage medium

Also Published As

Publication number Publication date
CN111291204B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US8818916B2 (en) System and method for linking multimedia data elements to web pages
CN111259215A (en) Multi-modal-based topic classification method, device, equipment and storage medium
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN110580500A (en) Character interaction-oriented network weight generation few-sample image classification method
CN113255755A (en) Multi-modal emotion classification method based on heterogeneous fusion network
CN109885796B (en) Network news matching detection method based on deep learning
CN111464881B (en) Full-convolution video description generation method based on self-optimization mechanism
CN106354856B (en) Artificial intelligence-based deep neural network enhanced search method and device
CN111753133A (en) Video classification method, device and storage medium
CN115292470B (en) Semantic matching method and system for intelligent customer service of petty loan
CN110717421A (en) Video content understanding method and device based on generation countermeasure network
CN113704506A (en) Media content duplication eliminating method and related device
US11537636B2 (en) System and method for using multimedia content as search queries
CN112381114A (en) Deep learning image annotation system and method
CN116662565A (en) Heterogeneous information network keyword generation method based on contrast learning pre-training
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN116977701A (en) Video classification model training method, video classification method and device
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN111291204B (en) Multimedia data fusion method and device
CN114973086A (en) Video processing method and device, electronic equipment and storage medium
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
Jiao et al. Realization and improvement of object recognition system on raspberry pi 3b+
CN114491010A (en) Training method and device of information extraction model
Liu et al. Research on graphic-text relationship in film and television works based on big data model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230803

Address after: No.3188 Hengxiang North Street, Baoding City, Hebei Province 071051

Applicant after: Hebei Finance University

Address before: No.3188 Hengxiang North Street, Baoding City, Hebei Province 071051

Applicant before: Hebei Finance University

Applicant before: HUARUI XINZHI TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant