CN113591655A - Video contrast loss calculation method, system, storage medium and electronic device - Google Patents
Video contrast loss calculation method, system, storage medium and electronic device
- Publication number
- CN113591655A (application number CN202110835232.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- sound
- contrast loss
- encoder
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a video contrast loss calculation method, system, storage medium and electronic device. The contrast loss calculation method comprises: a sampling step of continuously sampling a plurality of video equal parts for each video; a processing step of processing the video frames and the sound spectrum of each video equal part through an Encoder network and a co-attention module to obtain the modal features of vision and sound; and a contrast loss calculation step of calculating the contrast loss from the modal features of vision and sound. The processing step comprises: an input step of sending the video frames and the sound spectrum of each video equal part to the Encoder network respectively; and an Encoder network processing step in which the Encoder network processes the video frames and the sound spectrum to obtain visual features and sound features. The invention achieves cross-modal information fusion: the sound information guides the learning of the visual model, and the visual information guides the learning of the sound model.
Description
Technical Field
The invention belongs to the field of video contrast loss calculation, and particularly relates to a video contrast loss calculation method, a video contrast loss calculation system, a storage medium and electronic equipment.
Background
1) Representation learning based on direct classification: for example, a video is directly assigned a class, and the encoder part of the model is trained through classification. Direct classification is a supervised method that requires labeled data, whereas contrastive learning is a self-supervised method that requires no labels and learns the abstract semantic information of the image directly from the characteristics of the data.
2) Representation learning based on generative methods: these methods focus on pixel-level details of the image, whereas contrastive learning only needs to discriminate in the feature space; it ignores pixel-level details and focuses on abstract semantic information.
The prior art is as follows:
Existing video contrast loss calculation methods are used to extract features from a video; with better features, downstream tasks such as action recognition, scene segmentation and scene classification can be performed better.
The prior-art Transformer implements self-attention, and its attention is not affected by distance.
Disclosure of Invention
The embodiment of the application provides a method, a system, a storage medium and an electronic device for calculating the contrast loss of a video, so as to at least solve the problem that the conventional method for calculating the contrast loss of the video has a complex program.
The invention provides a video contrast loss calculation method, which comprises the following steps:
a sampling step: continuously sampling a plurality of video equal parts for each video;
a processing step: processing the video frame and the sound frequency spectrum of each video equal part through an Encoder network and a co-attention module to obtain modal characteristics of vision and sound;
a contrast loss calculation step: calculating the contrast loss according to the modal characteristics of vision and sound.
The above-mentioned contrast loss calculation method, wherein the processing step includes:
an input step: respectively sending the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
encoder network processing: the video frame and the sound frequency spectrum are processed by the Encoder network to obtain visual characteristics and sound characteristics;
a co-attention module processing step: the visual features and the sound features are input into the co-attention module together, and after the co-attention module finishes processing, the result is input into a multi-layer perceptron layer to obtain the modal features of vision and sound.
The above contrast loss calculation method, wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
The above contrast loss calculation method, wherein the contrast loss calculation step includes: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
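The averaging described above can be written compactly. In the sketch below, $n$ is the number of video equal parts and $\ell_i$ is the contrast loss of the $i$-th part. The form given for $\ell_i$ is an assumption: the text does not state the per-part loss, so a common InfoNCE-style choice is shown, with $z_i^{v}$ and $z_i^{a}$ the visual and sound modal features of part $i$, $\operatorname{sim}$ the cosine similarity, and $\tau$ a hypothetical temperature.

```latex
\mathcal{L}_{\text{total}} = \frac{1}{n}\sum_{i=1}^{n} \ell_i ,
\qquad
\ell_i = -\log
\frac{\exp\!\big(\operatorname{sim}(z_i^{v}, z_i^{a})/\tau\big)}
     {\sum_{j=1}^{n}\exp\!\big(\operatorname{sim}(z_i^{v}, z_j^{a})/\tau\big)}
```

Only the first equation follows directly from the text; a symmetric variant of the second would add the corresponding sound-to-vision term.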
The invention also provides a system for calculating the contrast loss of a video, which comprises:
the sampling module is used for continuously sampling a plurality of video equal parts for each video;
the processing module is used for processing the video frame and the sound frequency spectrum of each video equal part through an Encoder network and a co-attention module to obtain modal characteristics of vision and sound;
a contrast loss calculation module that performs a contrast loss calculation according to the modal characteristics of vision and sound.
The above contrast loss calculation system, wherein the processing module comprises:
the input unit is used for respectively transmitting the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
the Encoder network processing unit is used for processing the video frame and the sound spectrum to obtain visual characteristics and sound characteristics;
the co-attention module unit is used for inputting the visual features and the sound features into the co-attention module together and, after the co-attention module finishes processing, inputting the result into a multi-layer perceptron layer to obtain the modal features of vision and sound.
The above contrast loss calculation system, wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
The above contrast loss calculation system, wherein the contrast loss calculation module is used for: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the contrast loss calculation method as described in any one of the above when executing the computer program.
A storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a contrast loss calculation method as described in any one of the above.
The invention has the beneficial effects that:
The invention belongs to the field of computer vision within deep learning. The invention uses cross-modal attention for contrastive learning, and exchanges information between sequences by operating on sequences of clips. 1) The invention achieves cross-modal information fusion: the sound information guides the learning of the visual model, and the visual information guides the learning of the sound model. 2) The invention uses a self-supervised method and requires no labeled data. 3) The invention performs contrastive learning on sequences of consecutive clips, which increases the information interaction between the sequences.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.
In the drawings:
FIG. 1 is a flow chart of a method of contrast loss calculation for video in accordance with the present invention;
FIG. 2 is a flow chart of substep S2 of the present invention;
FIG. 3 is a diagram of a model of the present invention;
FIG. 4a is a diagram of the cross-modal attention module of the present invention;
FIG. 4b is a diagram of the self-attention module of the present invention;
FIG. 5 is a loss calculation diagram of the present invention;
FIG. 6 is a schematic diagram of a system for calculating contrast loss of a video according to the present invention;
FIG. 7 is a frame diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Before describing in detail the various embodiments of the present invention, the core inventive concepts of the present invention are summarized and described in detail by the following several embodiments.
Example one:
Referring to FIG. 1, FIG. 1 is a flowchart of a video contrast loss calculation method. As shown in FIG. 1, the method for calculating the contrast loss of a video according to the present invention includes:
sampling step S1: continuously sampling a plurality of video equal parts for each video;
a processing step S2, in which the video frame and the sound frequency spectrum of each video equal part are processed by an Encoder network and a co-attention module to obtain the modal characteristics of vision and sound;
contrast loss calculation step S3: calculating the contrast loss according to the modal characteristics of vision and sound.
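Sampling step S1 can be illustrated with a small index computation. This is a minimal sketch, assuming that "video equal parts" are back-to-back clips of equal length; the function name, the clip length and the start offset are hypothetical and not taken from the text.

```python
import numpy as np


def sample_consecutive_clips(num_frames: int, n_clips: int,
                             clip_len: int, start: int = 0) -> np.ndarray:
    """Return frame indices of shape (n_clips, clip_len) for back-to-back clips."""
    needed = n_clips * clip_len
    if start < 0 or start + needed > num_frames:
        raise ValueError("video is too short for the requested clips")
    # Consecutive frame indices, cut into equal parts with no gap or overlap.
    return start + np.arange(needed).reshape(n_clips, clip_len)


# Hypothetical example: 4 equal parts of 16 frames from a 120-frame video.
clip_indices = sample_consecutive_clips(num_frames=120, n_clips=4,
                                        clip_len=16, start=8)
```

Each row of `clip_indices` selects the frames of one equal part; the matching sound spectrum would be cut over the same time span.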
Referring to fig. 2, fig. 2 is a flowchart of the processing step S2. As shown in fig. 2, the processing step S2 includes:
input step S21: respectively sending the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
encoder network processing step S22: the video frame and the sound frequency spectrum are processed by the Encoder network to obtain visual characteristics and sound characteristics;
co-attention module processing step S23: the visual features and the sound features are input into the co-attention module together, and after the co-attention module finishes processing, the result is input into a multi-layer perceptron layer to obtain the modal features of vision and sound.
Wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
Wherein the contrast loss calculating step includes: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
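The last part of step S23, the multi-layer perceptron layer, can be sketched as a small projection head. This is an illustration only: the two-layer shape, the ReLU activation, the hidden width and the random weights are assumptions; the output width of 256 follows the feature dimension given later in the description.

```python
import numpy as np


def mlp_head(x, w1, b1, w2, b2):
    """Two-layer perceptron: linear, ReLU, linear."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


rng = np.random.default_rng(0)
n, D, hidden_dim, out_dim = 4, 512, 512, 256  # n, D, hidden_dim are hypothetical

# Stand-in for the visual output of the co-attention module, shape (n, D).
fused_visual = rng.standard_normal((n, D))

w1 = rng.standard_normal((D, hidden_dim)) * 0.02
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, out_dim)) * 0.02
b2 = np.zeros(out_dim)

visual_modal = mlp_head(fused_visual, w1, b1, w2, b2)  # shape (n, 256)
```

The sound branch would be projected the same way; whether the two branches share one head is not stated in the text.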
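Keeping the video frames and the sound spectrum of one equal part consistent in time amounts to cutting both over the same time span. The following minimal sketch assumes a fixed video frame rate and a fixed number of spectrogram columns per second; the function name and both rates are hypothetical.

```python
def spectrogram_columns_for_clip(first_frame, end_frame, fps, columns_per_second):
    """Map video frames [first_frame, end_frame) to the spectrogram columns
    that cover the same time span."""
    start_sec = first_frame / fps
    end_sec = end_frame / fps
    return round(start_sec * columns_per_second), round(end_sec * columns_per_second)


# Hypothetical rates: 25 video frames and 100 spectrogram columns per second.
start_col, end_col = spectrogram_columns_for_clip(50, 75, fps=25.0,
                                                  columns_per_second=100.0)
```

With these rates, frames 50 to 74 cover seconds 2.0 to 3.0, so the sound branch receives the spectrogram columns from `start_col` up to, but not including, `end_col`.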
Specifically, the invention uses deep learning to learn representations of the visual part and the sound part of a video, so that the feature space faithfully reflects the original visual and sound information. Once a video has good visual and sound features, downstream tasks can be performed better.
Further, as shown in fig. 3, 4 and 5, the method for calculating the contrast loss of the video of the present invention includes:
The visual frames and the sound spectrum of each clip are sent to Encoder_v and Encoder_a respectively; the network weights of Encoder_v and Encoder_a are not shared, and the video frames and the sound spectrum of the same clip remain temporally aligned.
Step 3. After passing through the Encoders, the information of the visual and sound modalities yields n × D features, where n is the clip length and D is the feature dimension.
Step 4. The visual features (n × D) and the sound features (n × D) are sent to the co-attention module together. Each co-attention module consists of a cross-modal attention module, shown in FIG. 4(a), and a self-attention module, shown in FIG. 4(b).
Step 5. Step 4 may contain a plurality of co-attention modules. After all co-attention modules have been applied, the features enter a multi-layer perceptron (MLP) layer.
Step 6. Step 5 yields the features of the visual and sound modalities, with feature dimension n × 256, from which the contrast loss is then calculated as shown in FIG. 5: each pair of clip features produces one contrast loss, and the total loss is the average of the n pairwise losses.
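The loss calculation of step 6 can be sketched as follows. The exact loss of FIG. 5 is not reproduced in the text, so this sketch assumes a symmetric InfoNCE-style loss in which, for each clip, the matching visual/sound pair is the positive and the other clips of the same sequence are the negatives. It also reads n as the number of clips. The temperature value and all function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax_rows(logits):
    """Numerically stable log-softmax over each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def clip_contrast_losses(visual, audio, temperature=0.1):
    """One contrast loss per clip pair; inputs have shape (n, 256)."""
    v, a = l2_normalize(visual), l2_normalize(audio)
    logits = v @ a.T / temperature            # (n, n) similarity matrix
    idx = np.arange(len(v))
    loss_v2a = -log_softmax_rows(logits)[idx, idx]    # vision -> sound
    loss_a2v = -log_softmax_rows(logits.T)[idx, idx]  # sound -> vision
    return 0.5 * (loss_v2a + loss_a2v)


rng = np.random.default_rng(0)
visual = rng.standard_normal((4, 256))  # stand-in modal features, n = 4
audio = rng.standard_normal((4, 256))

per_clip = clip_contrast_losses(visual, audio)  # n losses, one per clip pair
total_loss = per_clip.mean()                    # total loss = mean of n losses
```

Only the final averaging over the n per-clip losses is taken directly from the description; the choice of negatives and the symmetric form are design assumptions.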
Still further, the present invention uses cross-modal attention for contrastive learning.
Still further, the invention exchanges information between sequences by operating on sequences of consecutive clips.
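The co-attention module of step 4 can be sketched with plain scaled dot-product attention. The internals of FIG. 4(a) and FIG. 4(b) are not reproduced in the text, so the single attention head, the residual connections and the random projection weights below are assumptions. In the cross-modal part each modality queries the other; in the self-attention part each modality attends over its own sequence of clips.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query_src, key_value_src, wq, wk, wv):
    """Single-head scaled dot-product attention."""
    q, k, v = query_src @ wq, key_value_src @ wk, key_value_src @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def co_attention_block(visual, audio, params):
    # Cross-modal attention: vision queries sound, and sound queries vision.
    visual_x = visual + attention(visual, audio, *params["v_from_a"])
    audio_x = audio + attention(audio, visual, *params["a_from_v"])
    # Self-attention: each modality attends over its own clip sequence.
    visual_out = visual_x + attention(visual_x, visual_x, *params["v_self"])
    audio_out = audio_x + attention(audio_x, audio_x, *params["a_self"])
    return visual_out, audio_out


rng = np.random.default_rng(0)
n, D = 4, 32  # hypothetical sizes


def init_projections():
    return tuple(rng.standard_normal((D, D)) * 0.1 for _ in range(3))


params = {name: init_projections()
          for name in ("v_from_a", "a_from_v", "v_self", "a_self")}
visual = rng.standard_normal((n, D))
audio = rng.standard_normal((n, D))
visual_out, audio_out = co_attention_block(visual, audio, params)
```

Because the outputs keep the n × D shape, several such blocks can be applied one after another before the multi-layer perceptron layer, as step 5 allows.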
Example two:
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the video contrast loss calculation system according to the present invention. As shown in FIG. 6, the video contrast loss calculation system of the present invention includes:
the sampling module is used for continuously sampling a plurality of video equal parts for each video;
the processing module is used for processing the video frame and the sound frequency spectrum of each video equal part through an Encoder network and a co-attention module to obtain modal characteristics of vision and sound;
a contrast loss calculation module that performs a contrast loss calculation according to the modal characteristics of vision and sound.
Wherein the processing module comprises:
the input unit is used for respectively transmitting the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
the Encoder network processing unit is used for processing the video frame and the sound spectrum to obtain visual characteristics and sound characteristics;
the co-attention module unit is used for inputting the visual features and the sound features into the co-attention module together and, after the co-attention module finishes processing, inputting the result into a multi-layer perceptron layer to obtain the modal features of vision and sound.
Wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
Wherein the contrast loss calculation module is used for: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
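The way the three modules of the system feed one another can be sketched as a simple composition. The class name, the method name and the toy stand-in modules below are all hypothetical; they only illustrate the data flow from sampling to processing to loss calculation, not the actual encoders or loss.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ContrastLossSystem:
    sampling_module: Callable[[Sequence[float]], List[Sequence[float]]]
    processing_module: Callable[[List[Sequence[float]]],
                                Tuple[List[float], List[float]]]
    loss_module: Callable[[List[float], List[float]], float]

    def run(self, video):
        parts = self.sampling_module(video)            # sampling module
        visual, sound = self.processing_module(parts)  # processing module
        return self.loss_module(visual, sound)         # loss calculation module


# Toy stand-ins so the wiring can be executed end to end.
def toy_sampling(video):
    half = len(video) // 2
    return [video[:half], video[half:]]


def toy_processing(parts):
    visual = [float(sum(p)) for p in parts]
    sound = [v + 1.0 for v in visual]
    return visual, sound


def toy_loss(visual, sound):
    return sum(abs(v - s) for v, s in zip(visual, sound)) / len(visual)


system = ContrastLossSystem(toy_sampling, toy_processing, toy_loss)
result = system.run(list(range(8)))
```

In a real system each callable would be replaced by the corresponding module of the embodiment, with the interfaces between them unchanged.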
Example three:
Referring to FIG. 7, this embodiment discloses an embodiment of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 reads and executes the computer program instructions stored in the memory 82 to implement the contrast loss calculation method for video in any of the above embodiments.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 7, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between the modules, devices, units and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external equipment, image/data acquisition equipment, a database, external storage, and an image/data processing workstation.
The bus 80 includes hardware, software, or both, and couples the components of the electronic device to one another. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may implement the video contrast loss calculation method described in connection with FIGS. 1 and 2.
In addition, in combination with the video contrast loss calculation method of the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementation. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement the video contrast loss calculation method of any of the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
In summary, the scheme calculates the contrast loss of a video and has the following beneficial effects. 1) The invention achieves cross-modal information fusion: the sound information guides the learning of the visual model, and the visual information guides the learning of the sound model. 2) The invention uses a self-supervised method and requires no labeled data. 3) The invention performs contrastive learning on sequences of consecutive clips, which increases the information interaction between the sequences.
The above-mentioned embodiments express only several embodiments of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. A method for calculating a contrast loss of a video, comprising:
a sampling step: continuously sampling a plurality of video equal parts for each video;
a processing step: processing the video frame and the sound frequency spectrum of each video equal part through an Encoder network and a co-attention module to obtain modal characteristics of vision and sound;
a contrast loss calculation step: calculating the contrast loss according to the modal characteristics of vision and sound.
2. The method of calculating the contrast loss of a video according to claim 1, wherein the processing step comprises:
an input step: respectively sending the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
encoder network processing: the video frame and the sound frequency spectrum are processed by the Encoder network to obtain visual characteristics and sound characteristics;
a co-attention module processing step: the visual features and the sound features are input into the co-attention module together, and after the co-attention module finishes processing, the result is input into a multi-layer perceptron layer to obtain the modal features of vision and sound.
3. The video contrast loss calculation method of claim 2, wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
4. The method of calculating the contrast loss of a video according to claim 1, wherein the contrast loss calculating step comprises: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
5. A system for calculating contrast loss of a video, comprising:
the sampling module is used for continuously sampling a plurality of video equal parts for each video;
the processing module is used for processing the video frame and the sound frequency spectrum of each video equal part through an Encoder network and a co-attention module to obtain modal characteristics of vision and sound;
a contrast loss calculation module that performs a contrast loss calculation according to the modal characteristics of vision and sound.
6. The video contrast loss calculation system of claim 5, wherein the processing module comprises:
the input unit is used for respectively transmitting the video frame and the sound frequency spectrum of each video equal part to an Encoder network;
the Encoder network processing unit is used for processing the video frame and the sound spectrum to obtain visual characteristics and sound characteristics;
the co-attention module unit is used for inputting the visual features and the sound features into the co-attention module together and, after the co-attention module finishes processing, inputting the result into a multi-layer perceptron layer to obtain the modal features of vision and sound.
7. The video contrast loss calculation system of claim 6, wherein the Encoder network comprises an Encoder_v network and an Encoder_a network: the video frame is input into the Encoder_v network, the sound spectrum is input into the Encoder_a network, the weights of the Encoder_v network and the Encoder_a network are not shared, and the video frame and the sound spectrum of the same video equal part are kept consistent in time.
8. The system for calculating the contrast loss of a video according to claim 5, wherein the contrast loss calculating module is used for: calculating the contrast loss of each video equal part from the modal features of vision and sound, and averaging the plurality of contrast losses to obtain the total loss.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the contrast loss calculation method according to any one of claims 1 to 4 when executing the computer program.
10. A storage medium on which a computer program is stored, which program, when being executed by a processor, carries out the contrast loss calculation method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110835232.9A CN113591655A (en) | 2021-07-23 | 2021-07-23 | Video contrast loss calculation method, system, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110835232.9A CN113591655A (en) | 2021-07-23 | 2021-07-23 | Video contrast loss calculation method, system, storage medium and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113591655A (en) | 2021-11-02 |
Family
ID=78249202
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110835232.9A Pending CN113591655A (en) | 2021-07-23 | 2021-07-23 | Video contrast loss calculation method, system, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113591655A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112203122A (en) * | 2020-10-10 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based similar video processing method and device and electronic equipment |
CN112820320A (en) * | 2020-12-31 | 2021-05-18 | 中国科学技术大学 | Cross-modal attention consistency network self-supervision learning method |
CN112926379A (en) * | 2021-01-07 | 2021-06-08 | 上海明略人工智能(集团)有限公司 | Method and device for constructing face recognition model |
WO2021115180A1 (en) * | 2019-12-13 | 2021-06-17 | 北京金山云网络技术有限公司 | Sample image processing method and apparatus, electronic device, and medium |
-
2021
- 2021-07-23 CN CN202110835232.9A patent/CN113591655A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021115180A1 (en) * | 2019-12-13 | 2021-06-17 | 北京金山云网络技术有限公司 | Sample image processing method and apparatus, electronic device, and medium |
CN112203122A (en) * | 2020-10-10 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based similar video processing method and device and electronic equipment |
CN112820320A (en) * | 2020-12-31 | 2021-05-18 | 中国科学技术大学 | Cross-modal attention consistency network self-supervision learning method |
CN112926379A (en) * | 2021-01-07 | 2021-06-08 | 上海明略人工智能(集团)有限公司 | Method and device for constructing face recognition model |
Non-Patent Citations (1)
Title |
---|
YING CHENG等: "Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning", 《PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111666960B (en) | Image recognition method, device, electronic equipment and readable storage medium | |
CN107545889A (en) | Suitable for the optimization method, device and terminal device of the model of pattern-recognition | |
CN110189260B (en) | Image noise reduction method based on multi-scale parallel gated neural network | |
CN113569705A (en) | Scene segmentation point judgment method and system, storage medium and electronic device | |
CN112784572A (en) | Marketing scene conversational analysis method and system | |
JP2023535108A (en) | Video tag recommendation model training method, video tag determination method, device, electronic device, storage medium and computer program therefor | |
CN113743277A (en) | Method, system, equipment and storage medium for short video frequency classification | |
CN113012689B (en) | Electronic equipment and deep learning hardware acceleration method | |
CN114048288A (en) | Fine-grained emotion analysis method and system, computer equipment and storage medium | |
CN113902636A (en) | Image deblurring method and device, computer readable medium and electronic equipment | |
CN113591655A (en) | Video contrast loss calculation method, system, storage medium and electronic device | |
CN113569703B (en) | Real division point judging method, system, storage medium and electronic equipment | |
CN113569704B (en) | Segmentation point judging method, system, storage medium and electronic equipment | |
CN110414527A (en) | Character identifying method, device, storage medium and electronic equipment | |
CN111784567B (en) | Method, apparatus, electronic device, and computer-readable medium for converting image | |
CN114254563A (en) | Data processing method and device, electronic equipment and storage medium | |
CN112560970A (en) | Abnormal picture detection method, system, equipment and storage medium based on self-coding | |
CN113742525A (en) | Self-supervision video hash learning method, system, electronic equipment and storage medium | |
CN113570417A (en) | Social digital marketing method and system, storage medium and electronic equipment | |
CN112257726A (en) | Target detection training method, system, electronic device and computer readable storage medium | |
CN113569706B (en) | Video scene segmentation point judging method, system, storage medium and electronic equipment | |
CN113343669B (en) | Word vector learning method, system, electronic equipment and storage medium | |
CN112863497B (en) | Method and device for speech recognition, electronic equipment and computer readable storage medium | |
CN113821661B (en) | Image retrieval method, system, storage medium and electronic device | |
CN116596043B (en) | Convolutional neural network calculation method, system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||