CN111291204B - Multimedia data fusion method and device - Google Patents

Multimedia data fusion method and device

Info

Publication number
CN111291204B
CN111291204B (application CN201911259689.9A)
Authority
CN
China
Prior art keywords
multimedia data
data
vector
multimedia
feature vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911259689.9A
Other languages
Chinese (zh)
Other versions
CN111291204A (en)
Inventor
何志强
刘鑫
张继勇
庄浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Finance University
Original Assignee
Hebei Finance University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Finance University filed Critical Hebei Finance University
Priority to CN201911259689.9A priority Critical patent/CN111291204B/en
Publication of CN111291204A publication Critical patent/CN111291204A/en
Application granted granted Critical
Publication of CN111291204B publication Critical patent/CN111291204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45Clustering; Classification

Abstract

An embodiment of the present application provides a multimedia data fusion method and device. The method includes: receiving multimedia data from a plurality of terminal devices, the data types of the multimedia data comprising at least two of: text, image, audio; identifying the multimedia data of each data type to obtain a feature vector of each multimedia data, the feature vector representing the features of that multimedia data; performing vector conversion on the feature vector of each multimedia data based on the relation between the feature vector and a preset conversion vector, so that the feature vectors of multimedia data of different data types lie in the same vector space; and clustering the multimedia data of different data types according to the converted feature vectors.

Description

Multimedia data fusion method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for multimedia data fusion.
Background
With the rapid development of information technology, large-scale multimedia data is generated in many dimensions. For example, video and picture data are captured by video cameras, text data is extracted from documents, and audio data is collected through event-tracking (buried-point) techniques. The same subject can be presented in many different forms of data whose high-level semantics are very similar but whose low-level features differ greatly across media; such data is strongly correlated. Correlated data of this kind can be applied in many ways, such as search, where a query about one item can retrieve related items in other media. For example, a celebrity's name can be used as a keyword in the Baidu search engine to find information about that celebrity, including photographs, personal data, speech audio, and video. Multimedia data fusion therefore becomes critical.
In existing data fusion technology, multimedia data is manually annotated with labels, and clustering is performed on those labels to fuse the multimedia data. This approach requires many annotators and reviewers and consumes a great deal of manpower. Moreover, because of the subjectivity of annotators and reviewers and the richness of semantic content, the labels are often insufficient to express the meaning of the data clearly and completely, which weakens the relevance among the multimedia data.
Disclosure of Invention
The embodiments of this specification provide a multimedia data fusion method and device to solve the problems of low efficiency and poor quality in multimedia data fusion caused by the need for manual annotation in the prior art.
In one aspect, an embodiment of the present application provides a multimedia data fusion method. The method includes: receiving multimedia data from each of a plurality of terminal devices, the data types of the multimedia data including at least two of: text, image, audio; identifying the multimedia data of each data type to obtain a feature vector of each multimedia data, the feature vector representing the features of that multimedia data; performing vector conversion on the feature vector of each multimedia data based on the relation between the feature vector and a preset conversion vector, so that the feature vectors of multimedia data of different data types lie in the same vector space; and clustering the multimedia data of different data types according to the converted feature vectors.
In one possible implementation, vector conversion is performed on the feature vector of each multimedia data based on the feature vector and the preset number of categories of multimedia data; the specific preset algorithm has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector.
In one possible implementation, clustering the multimedia data of different data types according to the vector-converted feature vectors specifically includes: determining, according to the converted feature vector of each multimedia data, whether multimedia data of different data types belong to one class; and clustering the multimedia data of different data types based on a preset clustering algorithm.
In one possible implementation, determining whether multimedia data of different data types belong to one class according to the vector-converted feature vectors specifically includes: calculating the Euclidean distance between the vector-converted feature vectors of multimedia data of different data types; and determining that the multimedia data of different data types belong to one class when the Euclidean distance is smaller than a preset threshold.
In one possible implementation, the data types of the multimedia data further include: video.
In one possible implementation manner, before respective identification is performed on the multimedia data of each data type, and a feature vector of each multimedia data is obtained, the method further includes: and respectively preprocessing the multimedia data with different data types.
On the other hand, the embodiment of the application also provides a multimedia data fusion device, which comprises: the receiving module is used for receiving the multimedia data from a plurality of terminal devices, and the data types of the multimedia data comprise at least two of the following: text, image, audio; the identification module is used for respectively carrying out corresponding identification on the multimedia data of each data type to obtain the feature vector of each multimedia data; wherein, the feature vector is used for representing the feature of each multimedia data; the vector conversion module is used for carrying out vector conversion on the feature vectors of the multimedia data based on the relation between the feature vectors of the multimedia data and the preset conversion vectors so as to enable the feature vectors of the multimedia data with different data types to be in the same vector space; and the clustering module is used for clustering the multimedia data with different data types according to the feature vectors of the converted multimedia data.
In one possible implementation, vector conversion is performed on the feature vector of each multimedia data based on the feature vector and the preset number of categories of multimedia data; the specific preset algorithm has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector.
In one possible implementation, the clustering module includes: a determining unit and a clustering unit; the determining unit is used for determining whether the multimedia data with different data types are of one type according to the feature vectors of the converted multimedia data; and the clustering unit is used for clustering the multimedia data of different data types based on a preset clustering algorithm.
In one possible implementation, the determining unit is specifically configured to: calculate the Euclidean distance between the vector-converted feature vectors of multimedia data of different data types; and determine that the multimedia data of different data types belong to one class when the Euclidean distance is smaller than a preset threshold.
According to the multimedia data fusion method and device provided by the embodiments of the present application, multimedia data of different data types can be classified through their feature vectors, and multimedia data of one class can be clustered. On the one hand, compared with manual labeling, this saves a great deal of manpower and material resources and is more objective. On the other hand, when fusing multimedia data, clustering through manual labeling is avoided, which further improves the efficiency and quality of data fusion and also improves user experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a flowchart of a multimedia data fusion method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a multimedia data fusion device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present specification clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the specification and the corresponding drawings. It will be apparent that the described embodiments are only some, not all, embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments in this specification without inventive effort fall within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multimedia data fusion method according to an embodiment of the present application. As shown in fig. 1, the method includes the following steps:
s101, the server receives multimedia data from a plurality of terminal devices.
The data types of the multimedia data include at least two of: text, image, audio. In some embodiments of the present application, the data types further include video, which may be video with sound or video without audio.
The terminal device may be hardware or software. When the terminal device is hardware, it may be any of various electronic devices, such as a computer, a camera, or a scanner. When the terminal device is software, it can be installed in the electronic devices listed above. For example, when the terminal device is a video camera, the multimedia data received by the server is video data; when the terminal device is music software, the multimedia data received by the server is audio data; when the terminal device is a still camera, the multimedia data received by the server is image data.
S102, respectively preprocessing the multimedia data with different data types.
Preprocessing of text-type multimedia data (hereinafter, text data) may include, for example, case normalization, semantic disambiguation, and synonym substitution.
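As a rough illustration of this kind of text preprocessing (the patent names only the categories; the synonym table and helper below are hypothetical), case normalization and synonym substitution might look like:

```python
import re

# Hypothetical synonym table; the patent does not specify one.
SYNONYMS = {"photo": "picture", "image": "picture"}

def preprocess_text(text: str) -> str:
    """Normalize case, collapse whitespace, and substitute synonyms."""
    text = text.lower()                       # case normalization
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    words = [SYNONYMS.get(w, w) for w in text.split(" ")]
    return " ".join(words)

print(preprocess_text("A  Photo of\nthe IMAGE"))  # → "a picture of the picture"
```

Semantic disambiguation would require a language model and is omitted from this sketch.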
Preprocessing of image-type multimedia data (hereinafter, image data) may discard low-quality images as appropriate, such as blurred images and images whose scenes are overly complex.
The preprocessing of text data and image data is not limited to the above methods; other methods may also be used, for example, editing the image data (e.g., with image-editing software such as Photoshop) to increase its resolution.
For preprocessing audio-type multimedia data (hereinafter, audio data), noise reduction processing may be performed on the audio data to reduce the influence of noise.
The preprocessing of multimedia data of video type (hereinafter referred to as video data) may be performed by generating a combined picture of the video data from a sequence of video frames of the video data, and processing the combined picture according to a preprocessing method of the image data.
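A minimal sketch of building a combined picture from a frame sequence (frames are treated as equal-sized 2D grayscale arrays, and the row-major grid layout is an assumption — the patent does not specify how frames are combined):

```python
def combine_frames(frames, cols):
    """Tile equally sized 2D frames into one combined picture, row-major."""
    h = len(frames[0])
    # Pad with blank frames so the grid is completely filled.
    blank = [[0] * len(frames[0][0]) for _ in range(h)]
    while len(frames) % cols:
        frames = frames + [blank]
    combined = []
    for r in range(0, len(frames), cols):
        row_frames = frames[r:r + cols]
        for y in range(h):
            # Concatenate row y of each frame in this grid row.
            combined.append([px for f in row_frames for px in f[y]])
    return combined

frames = [[[i] * 2 for _ in range(2)] for i in range(3)]  # three 2x2 frames
grid = combine_frames(frames, cols=2)  # 2x2 grid; the last slot is blank
```

The resulting combined picture can then go through the same preprocessing path as ordinary image data.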
It should be noted that the server may send request information to the corresponding terminal devices, and each terminal device sends its multimedia data to the server based on the received request information.
S103, respectively carrying out corresponding identification on the multimedia data of each data type to obtain the feature vector of each multimedia data.
The feature vector here is a vector used to represent the features of multimedia data. For example, the feature vector corresponding to image-type multimedia data is an image feature vector used to represent features such as shapes in the image.
For text data, the feature vector can be obtained through a preset text feature extraction model. The text feature extraction model may be a pre-trained neural network model, such as a BERT model. BERT training is divided into two steps: pre-training and fine-tuning. Pre-training is independent of downstream tasks but is very time-consuming and costly; therefore, an open-source pre-trained model can be called instead of repeating this process. The pre-trained model encapsulates prior knowledge of the language and, once obtained, need not be rebuilt. A network extension architecture fine-tuned to a specific downstream task may then be employed. Overall, fine-tuning BERT is a lightweight task; the main tuning targets the extension network rather than BERT itself. Furthermore, one important role of the BERT model is to generate contextual word vectors, which can solve the polysemy problem that the word2vec model cannot.
For image data, the feature vector can be obtained through an image feature extraction model, which is a neural network model, for example, a fairly classical deep convolutional neural network combined with pooling layers. Because an image is a large signal source, the network's parameters are huge; to reduce the amount of training computation, pooling layers further abstract the outputs of the convolutional layers, reducing the number of weights to be trained while also preventing overfitting.
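The pooling step described above can be illustrated in isolation. A 2x2 max-pooling pass (a sketch, not the patent's actual network) reduces a feature map to a quarter of its size, which is why it cuts the downstream weight count:

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over a 2D feature map."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(max_pool_2x2(fmap))  # → [[4, 2], [2, 8]]
```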
For audio data, the feature vector can be obtained directly through a corresponding audio feature extraction model; alternatively, the audio data can be converted into text data, which is then input into a corresponding text feature extraction model to obtain the feature vector.
For video data, the feature vector can be obtained directly through a corresponding video feature extraction model; alternatively, a combined picture can be generated from the video frame sequence and input into a corresponding image feature extraction model to obtain the feature vector of the video data.
The audio feature extraction model and the video feature extraction model are all pre-trained neural network models.
It should be noted that, the feature vector of the multimedia data may be obtained not only through a corresponding model, but also through other algorithms, which is not limited in the embodiment of the present application.
And S104, carrying out vector conversion on the feature vectors of the multimedia data based on the relation between the feature vectors of the multimedia data and the preset conversion vectors so that the feature vectors of the multimedia data with different data types are in the same vector space.
The preset conversion vector may be obtained by learning through a neural network model.
Because the data types of the multimedia data differ, whether multimedia data of different data types belong to one class cannot be determined directly from their feature vectors.
Therefore, in some embodiments of the present application, the feature vectors of each multimedia data may be subjected to vector conversion according to a preset algorithm, so that the feature vectors of the multimedia data of different data types are in the same vector space.
In some embodiments of the present application, vector conversion is performed on the feature vector of each multimedia data based on the feature vector and the preset number of categories of multimedia data; the specific formula has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector.
The value of k may be a user-defined parameter.
Through the formula, the characteristic vectors of the multimedia data with different data types can be converted into vectors in the same vector space.
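Under the assumption that the preset algorithm is the softmax suggested by the variable definitions (k feature vectors θ, a conversion vector x), the conversion step can be sketched as:

```python
import math

def convert(theta, x):
    """Map feature vectors theta[0..k-1] into a shared probability-like
    space via a softmax of their inner products with conversion vector x."""
    scores = [sum(t * xi for t, xi in zip(th, x)) for th in theta]  # theta_j^T x
    m = max(scores)                        # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]           # P(i), summing to 1

theta = [[1.0, 0.0], [0.0, 1.0]]  # feature vectors for k = 2 items
p = convert(theta, x=[2.0, 0.0])  # converted vector in the shared space
```

The max-subtraction is a standard numerical-stability trick and does not change the result.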
S105, determining whether the multimedia data with different data types are of one type according to the feature vector of each converted multimedia data.
Specifically, the Euclidean distance between the vector-converted feature vectors of multimedia data of different data types is calculated; when the Euclidean distance is smaller than a preset threshold, the multimedia data of different data types are determined to belong to one class.
For example, the Euclidean distance between the converted feature vector of a text item and the converted feature vector of an image item is calculated, and the text data and the image data are determined to belong to one class when that distance is smaller than a preset threshold.
For another example, where a text item and an image item already belong to one class, if the Euclidean distance between the converted feature vector of the text item and that of an audio item is also smaller than the preset threshold, the text, image, and audio data are all determined to belong to one class.
The preset threshold value may be set in advance, or may be adjusted in real time according to actual situations.
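The distance test in S105 can be sketched as follows (the vectors and the threshold value are illustrative, not from the patent):

```python
import math

def same_class(v1, v2, threshold=0.5):
    """Return True when the Euclidean distance between two converted
    feature vectors falls below the preset threshold."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    return dist < threshold

text_vec  = [0.20, 0.70, 0.10]   # converted feature vector of a text item
image_vec = [0.25, 0.65, 0.10]   # converted feature vector of an image item
print(same_class(text_vec, image_vec))  # → True
```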
S106, based on a preset clustering algorithm, clustering the multimedia data of different data types.
In the embodiment of the application, the multimedia data of different data types can be clustered through a preset clustering algorithm, such as a k-means clustering algorithm.
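A minimal k-means pass over the converted vectors (a plain sketch; the patent only names k-means as one possible clustering algorithm):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels

points = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9]]
labels = kmeans(points, k=2)  # two well-separated groups of two
```

In practice a library implementation (e.g., scikit-learn's KMeans) would be used instead of this sketch.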
Based on the above scheme, the multimedia data fusion method provided by the embodiments of the present application can determine, through the feature vector of each multimedia data, whether multimedia data of different data types belong to one class, and cluster those that do, thereby realizing the fusion of multimedia data. On the one hand, compared with manual labeling, this saves a great deal of manpower and material resources and is more objective. On the other hand, when fusing multimedia data, clustering through manual labeling is avoided, which further improves the efficiency and quality of data fusion and also improves user experience.
Based on the same thought, some embodiments of the present application further provide a device corresponding to the above method.
Fig. 2 is a schematic structural diagram of a multimedia data fusion device according to an embodiment of the present application. As shown in fig. 2, the apparatus 200 includes: a receiving module 210, an identifying module 220, a vector conversion module 230, and a clustering module 240.
The receiving module 210 is configured to receive multimedia data from a plurality of terminal devices, where data types of the multimedia data include at least two of the following: text, images, audio. The identifying module 220 is configured to identify the multimedia data of each data type respectively, so as to obtain a feature vector of each multimedia data; wherein the feature vector is used for representing the features of each multimedia data. The vector conversion module 230 is configured to perform vector conversion on the feature vectors of each multimedia data based on a relationship between the feature vectors of each multimedia data and a preset conversion vector, so that the feature vectors of the multimedia data with different data types are in the same vector space. The clustering module 240 is configured to cluster the multimedia data with different data types according to the feature vector of each of the converted multimedia data.
In one possible implementation, vector conversion is performed on the feature vector of each multimedia data based on the feature vector and the preset number of categories of multimedia data; the specific preset algorithm has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector.
In one possible implementation, the clustering module 240 includes: a determining unit (not shown in the figure) and a clustering unit (not shown in the figure). And the determining unit is used for determining whether the multimedia data with different data types are of one type according to the feature vector of each converted multimedia data. And the clustering unit is used for clustering the multimedia data of different data types based on a preset clustering algorithm.
In one possible implementation, the determining unit is specifically configured to: calculate the Euclidean distance between the vector-converted feature vectors of multimedia data of different data types; and determine that the multimedia data of different data types belong to one class when the Euclidean distance is smaller than a preset threshold.
The embodiments of the present application are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The devices provided in the embodiments of the present application correspond one-to-one with the methods, so the devices have beneficial technical effects similar to those of the corresponding methods; since those effects have been described in detail above, they are not repeated here.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (4)

1. A method of multimedia data fusion, the method comprising:
receiving multimedia data from a plurality of terminal devices, wherein the data types of the multimedia data comprise at least two of the following: text, image, audio;
respectively carrying out corresponding identification on the multimedia data of each data type to obtain feature vectors of each multimedia data, wherein the feature vectors are used for representing the features of each multimedia data;
based on the relation between the feature vector of each multimedia data and a preset conversion vector, performing vector conversion on the feature vector of each multimedia data so that the feature vectors of multimedia data of different data types are in the same vector space, where the specific formula has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector;
clustering the multimedia data with different data types according to the feature vector of each converted multimedia data, wherein the clustering comprises the following steps:
determining, according to the converted feature vector of each multimedia data, whether multimedia data of different data types belong to one class; specifically, calculating the Euclidean distance between the converted feature vectors of multimedia data of different data types, and determining that the multimedia data of different data types belong to one class when the Euclidean distance is smaller than a preset threshold;
based on a preset clustering algorithm, clustering the multimedia data of different data types.
2. The method of claim 1, wherein the data type of the multimedia data further comprises: video.
3. The method of claim 1, wherein before the respective identification of the multimedia data of each data type is performed to obtain the feature vector of each multimedia data, the method further comprises:
and respectively preprocessing the multimedia data with different data types.
4. A multimedia data fusion device, the device comprising:
a receiving module, configured to receive multimedia data from a plurality of terminal devices, where data types of the multimedia data include at least two of the following: text, image, audio;
the identification module is used for respectively carrying out corresponding identification on the multimedia data of each data type to obtain the feature vector of each multimedia data; wherein the feature vector is used for representing the features of each multimedia data;
the vector conversion module is used for performing vector conversion on the feature vector of each multimedia data based on the relation between the feature vector and a preset conversion vector, so that the feature vectors of multimedia data of different data types are in the same vector space, where the specific formula has the following (inferred) softmax form:
P(i) = exp(θ_i^T x) / Σ_{j=1}^{k} exp(θ_j^T x)
wherein k is the preset number of categories of multimedia data, θ_j is the feature vector of the j-th multimedia data, x is a preset conversion vector, T denotes transposition, and P(i) is the vector-converted feature vector;
the clustering module is used for clustering the multimedia data of different data types according to the feature vectors of the converted multimedia data;
the clustering module comprises: a determining unit and a clustering unit;
the determining unit is configured to determine, according to the converted feature vector of each multimedia data, whether multimedia data of different data types belong to one class; specifically, to calculate the Euclidean distance between the converted feature vectors of multimedia data of different data types, and to determine that the multimedia data of different data types belong to one class when the Euclidean distance is smaller than a preset threshold;
the clustering unit is used for clustering the multimedia data of different data types based on a preset clustering algorithm.
CN201911259689.9A 2019-12-10 2019-12-10 Multimedia data fusion method and device Active CN111291204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911259689.9A CN111291204B (en) 2019-12-10 2019-12-10 Multimedia data fusion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911259689.9A CN111291204B (en) 2019-12-10 2019-12-10 Multimedia data fusion method and device

Publications (2)

Publication Number Publication Date
CN111291204A CN111291204A (en) 2020-06-16
CN111291204B true CN111291204B (en) 2023-08-29

Family

ID=71021287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911259689.9A Active CN111291204B (en) 2019-12-10 2019-12-10 Multimedia data fusion method and device

Country Status (1)

Country Link
CN (1) CN111291204B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6086006A (en) * 1996-10-29 2000-07-11 Scerbo, III; Frank C. Evidence maintaining tape recording reels and cassettes
CN101021849A (en) * 2006-09-14 2007-08-22 浙江大学 Transmedia searching method based on content correlation
CN103440292A (en) * 2013-08-16 2013-12-11 新浪网技术(中国)有限公司 Method and system for retrieving multimedia information based on bit vector
CN104182421A (en) * 2013-05-27 2014-12-03 华东师范大学 Video clustering method and detecting method
CN104679902A (en) * 2015-03-20 2015-06-03 湘潭大学 Information abstract extraction method combined with cross-media fusion
CN110209844A (en) * 2019-05-17 2019-09-06 腾讯音乐娱乐科技(深圳)有限公司 Multi-medium data matching process, device and storage medium


Also Published As

Publication number Publication date
CN111291204A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN107491534B (en) Information processing method and device
US11288444B2 (en) Optimization techniques for artificial intelligence
US20220121906A1 (en) Task-aware neural network architecture search
CN110532554A (en) A kind of Chinese abstraction generating method, system and storage medium
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN114298158A (en) Multi-mode pre-training method based on image-text linear combination
CN112883731B (en) Content classification method and device
CN106354856B (en) Artificial intelligence-based deep neural network enhanced search method and device
CN109885796B (en) Network news matching detection method based on deep learning
CN112307164A (en) Information recommendation method and device, computer equipment and storage medium
CN115292470B (en) Semantic matching method and system for intelligent customer service of petty loan
CN111723295A (en) Content distribution method, device and storage medium
CN110659392B (en) Retrieval method and device, and storage medium
CN113486174B (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN111460224B (en) Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium
CN111291204B (en) Multimedia data fusion method and device
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN115623134A (en) Conference audio processing method, device, equipment and storage medium
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN114973086A (en) Video processing method and device, electronic equipment and storage medium
CN114842301A (en) Semi-supervised training method of image annotation model
Jiao et al. Realization and improvement of object recognition system on raspberry pi 3b+
CN110472140B (en) Object word recommendation method and device and electronic equipment
CN114357111A (en) Policy association influence analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230803

Address after: No.3188 Hengxiang North Street, Baoding City, Hebei Province 071051

Applicant after: Hebei Finance University

Address before: No.3188 Hengxiang North Street, Baoding City, Hebei Province 071051

Applicant before: Hebei Finance University

Applicant before: HUARUI XINZHI TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant