CN116156279A - Short video transmission processing method and system based on artificial intelligence - Google Patents

Short video transmission processing method and system based on artificial intelligence

Info

Publication number
CN116156279A
Authority
CN
China
Prior art keywords
video; vector; content; video expression; expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310222805.XA
Other languages
Chinese (zh)
Inventor
曾海兵
闫国范
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by: Individual
Priority claimed: CN202310222805.XA
Publication: CN116156279A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical fields of artificial intelligence, live broadcast, and short video processing, and provides a short video transmission processing method and system based on artificial intelligence. In the method, when a thread is configured, the video expression vector of the example video expression content is identified in combination with the secondary video expression content associated with it. The identified video expression vector is therefore more accurate and reliable, which improves the accuracy with which video expression content is processed, enables precise control of video data transmission, and improves the efficiency of short video transmission processing.

Description

Short video transmission processing method and system based on artificial intelligence
Technical Field
The application relates to the technical field of artificial intelligence, live broadcast and short video processing, in particular to a short video transmission processing method and system based on artificial intelligence.
Background
In the internet era, short videos have become part of how people record their lives and an important component of live-streaming commerce and online sales. In conventional technology, transmitted videos often suffer from defects such as incomplete playback and stalling, which degrade the user experience; existing approaches do not effectively resolve these problems, and the fluency and integrity of the video cannot be improved.
Disclosure of Invention
In order to address the technical problems in the related art, the application provides a short video transmission processing method and system based on artificial intelligence.
In a first aspect, there is provided a short video transmission processing method based on artificial intelligence, the method at least comprising: obtaining first video interaction data, wherein the first video interaction data comprises an important video segment and at least one secondary video segment, the segment vector of the important video segment represents a video expression vector of target video expression content, the video expression vector is controlled to transmit, the segment vector of the secondary video segment represents a video expression vector of secondary video expression content, and the secondary video expression content is video expression content associated with the target video expression content; and loading the first video interaction data to a vector optimization thread, wherein the vector optimization thread optimizes the segment vector of the important video segment in combination with the segment vector of the secondary video segment in the first video interaction data to obtain the video expression vector of the optimized target video expression content, and controls the video expression vector to transmit.
In an independent embodiment, the vector optimization thread optimizing the segment vector of the important video segment in combination with the segment vector of the secondary video segment in the first video interaction data to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector to transmit, comprises: determining a confidence level between the important video segment and each secondary video segment in the first video interaction data; fusing the video expression vectors of the secondary video segments according to the confidence levels to obtain a weighting vector of the important video segment; and combining the video expression vector of the important video segment with the weighting vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector to transmit.
In an independently implemented embodiment, before obtaining the first video interaction data, the method further comprises: acquiring, in combination with the target video expression content, the secondary video expression content associated with the target video expression content from a pre-stored video expression content set.
In an independently implemented embodiment, the acquiring, in combination with the target video expression content, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set comprises: obtaining, one by one through a vector identification thread, the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, and controlling the video expression vectors to transmit; and determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set.
In an independent embodiment, the determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content comprises: ranking the vector commonality association degrees between the target video expression content and each audio content in descending order of association coefficient; and screening the audio contents corresponding to the top-ranked vector commonality association degrees, and regarding them as the secondary video expression content associated with the target video expression content.
In an independently implemented embodiment, the determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set comprises: obtaining, from each audio content and in combination with the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content, a first video expression content associated with the target video expression content; obtaining, from each audio content and in combination with the vector commonality association degree between the video expression vector of the first video expression content and the video expression vector of the audio content, a second video expression content associated with the first video expression content; and regarding the first video expression content and the second video expression content as the secondary video expression content of the target video expression content.
In an independent embodiment, the number of vector optimization threads is one, or several accumulated one by one; when the number of vector optimization threads is several, the input of any given vector optimization thread other than the first is the first video interaction data output by the preceding vector optimization thread.
In an independent embodiment, the fusing the video expression vectors of the secondary video segments in combination with the confidence levels to obtain the weighting vector of the important video segment comprises: clustering the video expression vectors of the secondary video segments in combination with the confidence levels to obtain the weighting vector of the important video segment.
In an independent embodiment, the combining the video expression vector of the important video segment with the weighting vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector to transmit, comprises: combining the video expression vector of the important video segment with the weighting vector; and performing a first projection on the combined vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector to transmit.
In an independently implemented embodiment, the determining a confidence level between the important video segment and each secondary video segment in the first video interaction data comprises: performing a second projection on the important video segment and the secondary video segment; determining an association relation between the important video segment and the secondary video segment after the second projection; and determining the confidence level according to the association relation after the projection processing.
In an independently implemented embodiment, the target video expression content comprises the video expression content to be searched and each audio content in the pre-stored video expression content set; after obtaining the video expression vector of the target video expression content corresponding to the important video segment and controlling the video expression vector to transmit, the method further comprises: obtaining, based on the vector commonality association degree between the video expression vector of the optimized target video expression content and the video expression vector of each audio content, the near video expression content of the target video expression content from the audio contents as a search result.
In a second aspect, an artificial intelligence based short video transmission processing system is provided, comprising a processor and a memory in communication with each other, the processor being adapted to read a computer program from the memory and execute the computer program to implement the method described above.
According to the short video transmission processing method and system based on artificial intelligence, when a thread is configured, the video expression vector of the example video expression content is identified in combination with the secondary video expression content associated with it, so that the identified video expression vector is more accurate and reliable. This improves the accuracy with which video expression content is processed, and video data transmission can therefore be accurately controlled.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; a person skilled in the art may obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a short video transmission processing method based on artificial intelligence according to an embodiment of the present application.
Detailed Description
In order to better understand the above technical solutions, the technical solutions of the present application are described in detail below through the accompanying drawings and specific embodiments. It should be understood that the specific features of the embodiments of the present application are detailed descriptions of the technical solutions of the present application and do not limit them, and the technical features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, a short video transmission processing method based on artificial intelligence is shown, which may include the following technical solutions described in step 101 and step 102.
In step 101, first video interaction data is obtained, where the first video interaction data includes an important video segment and at least one secondary video segment; the segment vector of the important video segment represents a video expression vector of the target video expression content, the video expression vector is controlled to transmit, the segment vector of the secondary video segment represents a video expression vector of secondary video expression content, and the secondary video expression content is video expression content associated with the target video expression content.
In this embodiment, the target video expression content is the video expression content whose video expression vector is to be screened. It may be video expression content in different application scenarios; for example, it may be video expression content to be searched in a video expression content search application, in which case the pre-stored video expression content set may be the search pre-stored video expression content set of that application.
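To make the data organization concrete, the following is a minimal sketch of how the first video interaction data might be represented; all names (VideoSegment, VideoInteractionData, expression_vector) are illustrative assumptions rather than structures defined by this application.

```python
from dataclasses import dataclass, field
from typing import List
import torch

@dataclass
class VideoSegment:
    """A video segment; its segment vector is the video expression
    vector of the video expression content the segment represents."""
    content_id: str
    expression_vector: torch.Tensor  # assumed shape: (d,)

@dataclass
class VideoInteractionData:
    """First video interaction data: one important video segment plus
    at least one secondary video segment associated with it."""
    important: VideoSegment
    secondary: List[VideoSegment] = field(default_factory=list)
```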
For example, before obtaining the first video interaction data, the secondary video expression content associated with the target video expression content may be acquired from the pre-stored video expression content set. The secondary video expression content may be determined according to a video expression vector similarity metric: the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set are obtained one by one through a vector identification thread and controlled to transmit; then, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content, the secondary video expression content associated with the target video expression content is determined from the pre-stored video expression content set.
In one embodiment, the vector commonality association degrees between the target video expression content and each audio content may be ranked in descending order of association coefficient, and the audio contents corresponding to the first X vector commonality association degrees are screened out and regarded as the secondary video expression content associated with the target video expression content.
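The application does not fix the "vector commonality association degree" to a particular metric. The sketch below assumes cosine similarity and shows the descending ranking and first-X screening described above; top_x_secondary is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def top_x_secondary(target_vec: torch.Tensor,
                    corpus: dict[str, torch.Tensor],
                    x: int) -> list[str]:
    """Rank pre-stored audio contents by an assumed cosine vector
    commonality association degree, descending, and keep the first X."""
    scores = {cid: F.cosine_similarity(target_vec, vec, dim=0).item()
              for cid, vec in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:x]
```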
In one possible implementation, it is also possible to obtain a first video expression content associated with the target video expression content according to the similarity between the video expression vectors, then obtain a second video expression content associated with the first video expression content, and regard both the first and the second video expression content as secondary video expression content of the target video expression content.
In step 102, the first video interaction data is loaded to a vector optimization thread, and the vector optimization thread optimizes the segment vector of the important video segment in combination with the segment vector of the secondary video segment in the first video interaction data to obtain a video expression vector of the optimized target video expression content, and controls the video expression vector to transmit.
Taking a feature extraction unit as an example, the feature extraction unit in this embodiment may optimize the segment vector of the important video segment according to the segment vectors of the secondary video segments. For example, it may determine the confidence level between the important video segment and each secondary video segment in the first video interaction data, fuse the video expression vectors of the secondary video segments according to the confidence levels to obtain the weighting vector of the important video segment, and combine the video expression vector of the important video segment with the weighting vector to obtain the video expression vector of the optimized target video expression content, so as to control the video expression vector to transmit.
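One plausible realization of this confidence-weighted fusion is an attention-style aggregation. The sketch below is an assumption about what a feature extraction unit could look like, not the definitive implementation; a second projection produces the confidence levels, and a first projection is applied after combining the important segment's vector with the weighting vector, matching the steps described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractionUnit(nn.Module):
    """Optimizes the important segment's vector from the secondary
    segments' vectors: confidence -> weighted fusion -> combination."""
    def __init__(self, dim: int):
        super().__init__()
        self.second_projection = nn.Linear(dim, dim)      # used for confidence
        self.first_projection = nn.Linear(2 * dim, dim)   # used after combining

    def forward(self,
                important: torch.Tensor,                   # shape (d,)
                secondary: torch.Tensor) -> torch.Tensor:  # shape (n, d)
        q = self.second_projection(important)
        k = self.second_projection(secondary)
        # confidence level between the important segment and each secondary one
        confidence = F.softmax(k @ q / q.shape[-1] ** 0.5, dim=0)  # (n,)
        weighted = confidence @ secondary                  # weighting vector, (d,)
        combined = torch.cat([important, weighted])        # combine the two vectors
        return self.first_projection(combined)             # optimized expression vector
```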
In this embodiment, the number of feature extraction units may be one, or several accumulated one by one. For example, when there are two feature extraction units, the first video interaction data is loaded to the first feature extraction unit, which optimizes the video expression vector of the important video segment according to the video expression vector of each secondary video segment; in the first video interaction data output by the first feature extraction unit, the video expression vector of the important video segment has already been optimized once. This optimized first video interaction data is then loaded to the second feature extraction unit, which continues to optimize the video expression vector of the important video segment according to the video expression vector of each secondary video segment and outputs the twice-optimized first video interaction data.
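As a sketch of this accumulation (reusing the hypothetical FeatureExtractionUnit above), each unit's output simply becomes the next unit's input:

```python
def stacked_optimize(units, important, secondary):
    """Pass the important segment's vector through each feature
    extraction unit in turn; every unit re-optimizes it against the
    same secondary segment vectors."""
    for unit in units:  # e.g. [FeatureExtractionUnit(128), FeatureExtractionUnit(128)]
        important = unit(important, secondary)
    return important
```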
The first video interaction data in this embodiment includes a plurality of segments (e.g., the important video segment and the secondary video segments), where the segment vector of each segment characterizes the video expression vector of the video expression content represented by that segment. In addition, each segment in the first video interaction data can in turn be regarded as an important video segment, and the video expression vector of the video expression content corresponding to that segment is optimized through this embodiment: when a segment is regarded as the important video segment, first video interaction data in which that segment is the important video segment is obtained and loaded to the vector optimization thread to optimize the segment's video expression vector.
According to the short video transmission processing method based on artificial intelligence, the video expression vector is optimized and screened by the vector optimization thread, which optimizes the video expression vector of the important video segment according to the video expression vectors of its secondary video segments. The identified video expression vector of the target video expression content can thus be controlled to transmit, the target video expression content can be expressed more accurately, and the identification process of the video expression content becomes more accurate and reliable.
The processing flow of the vector optimization thread in this embodiment describes how the vector optimization thread optimizes the video expression vector of the video expression content loaded to the thread. Taking a feature extraction unit as an example, the vector optimization thread may include the following steps.
In step 200, a confidence level between the important video segment and the secondary video segment is determined based on the video expression vectors of the important video segment and the secondary video segment.
In this embodiment, the important video segment may correspond to the target video expression content of the thread application stage, and the secondary video segments may correspond to the secondary video expression contents of the target video expression content.
In step 202, the video expression vectors of the secondary video segments are clustered in combination with the confidence levels to obtain the weighting vector of the important video segment.
In step 204, the video expression vector of the important video segment and the weighting vector are combined to obtain an optimized vector of the optimized target video expression content.
Through the steps 200 to 204, the segment vector of the important video segment in the first video interaction data is optimized, and the video expression vector of the optimized important video segment is obtained.
According to the short video transmission processing method based on artificial intelligence, the feature extraction unit clusters the video expression vectors of the secondary video segments of the important video segment to determine the vector of the important video segment, so that the video expression vector of the example video expression content and the vectors of other related video expression contents can be referred to comprehensively. The identified video expression vector is therefore more accurate and reliable, the accuracy with which video expression content is processed is improved, and video data transmission can be accurately controlled.
This embodiment provides a method for configuring the vector optimization thread, describing the configuration process, which specifically includes the following steps.
In step 300, configuration secondary video expression content associated with the example video expression content used for configuring the vector optimization thread is obtained from a configuration pre-stored video expression content set according to the example video expression content.
For example, in an application scenario in which video expression content is to be processed, the pre-stored video expression content set may be a search pre-stored video expression content set; that is, the search pre-stored video expression content set is searched to obtain video expression content associated with the example video expression content.
In this embodiment, the video expression content obtained that is associated with the example video expression content may be referred to as the "configuration secondary video expression content".
The configuration secondary video expression content may be obtained, for example, by determining video expression content with higher proximity as the configuration secondary video expression content according to the vector commonality association degree between video expression contents.
In step 302, second video interaction data is obtained, where the second video interaction data includes a configuration important video segment and at least one configuration secondary video segment; the segment vector of the configuration important video segment represents the video expression vector of the example video expression content, the segment vector of the configuration secondary video segment represents the video expression vector of the configuration secondary video expression content, and the configuration secondary video expression content is video expression content associated with the example video expression content.
In this embodiment, the second video interaction data may include a plurality of segments.
The segments may comprise one configuration important video segment and not less than one configuration secondary video segment. The configuration important video segment represents the example video expression content, and each configuration secondary video segment represents one configuration secondary video expression content determined in step 300. The segment vector of each segment is a video expression vector; e.g., the segment vector of the configuration important video segment is the video expression vector of the example video expression content, and the segment vector of a configuration secondary video segment is the video expression vector of the corresponding configuration secondary video expression content.
In step 304, the second video interaction data is loaded to the vector optimization thread, which optimizes the segment vector of the configuration important video segment in combination with the segment vectors of the configuration secondary video segments in the second video interaction data.
The number of feature extraction units may be one, or several accumulated one by one. For example, when there are two feature extraction units, the video interaction data is loaded to the first feature extraction unit, which optimizes the video expression vector of the important video segment based on the video expression vector of each secondary video segment, so that the video interaction data it outputs has been optimized once. This optimized video interaction data is then loaded to the second feature extraction unit, which continues to optimize the video expression vector of the important video segment according to the video expression vector of each secondary video segment and outputs the twice-optimized video expression vector of the important video segment.
In step 306, regression analysis data of the example video expression content is obtained according to the video expression vector of the example video expression content screened by the vector optimization thread.
In step 308, the thread coefficients of the vector optimization thread are debugged in combination with the regression analysis data.
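The application leaves the form of the regression analysis data open. The sketch below assumes it is a scalar loss, here a placeholder mean-squared error against a hypothetical supervision vector, and shows coefficient debugging as one gradient update on the unit's parameters.

```python
import torch
import torch.nn.functional as F

def debug_thread_coefficients(unit, optimizer, important, secondary, supervision):
    """One configuration step: screen the example content's vector
    (steps 302-306), compute the regression analysis data, and debug
    (update) the vector optimization thread's coefficients (step 308)."""
    optimized = unit(important, secondary)
    regression_analysis_data = F.mse_loss(optimized, supervision)  # assumed loss form
    optimizer.zero_grad()
    regression_analysis_data.backward()
    optimizer.step()
    return regression_analysis_data.item()

# usage, with the hypothetical unit from earlier:
# unit = FeatureExtractionUnit(dim=128)
# optimizer = torch.optim.Adam(unit.parameters(), lr=1e-3)
```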
According to this configuration method for the vector optimization thread, when the thread is configured, the video expression vector of the example video expression content is identified in combination with the near video expression content of the example video expression content, so that the video expression vector of the example video expression content and the vectors of other related video expression contents can be referred to comprehensively. The identified video expression vector is therefore more accurate and reliable, the accuracy with which video expression content is processed is improved, and video data transmission can be accurately controlled.
In another embodiment, the method for configuring the vector optimization thread screens the video expression vector through a pre-configured thread for screening vectors (which may be referred to as a vector identification thread), and obtains the configuration secondary video expression content associated with the example video expression content from the configuration pre-stored video expression content set according to a similarity measure on the video expression vectors. Specifically, the method comprises the following steps.
In step 400, a thread for filtering vectors is preconfigured using a configuration set.
The video expression content in the configuration set may be referred to as configuration video expression content. The configuration process of the vector identification thread may include: screening the video expression vector of the configuration video expression content through the vector identification thread; obtaining regression analysis data of the configuration video expression content in combination with its video expression vector; and debugging the thread coefficients of the vector identification thread based on the regression analysis data and the identification information of the configuration video expression content.
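A minimal sketch of such a vector identification thread follows, under two assumptions: the thread is an encoder network, and the identification information is a class label used with a cross-entropy loss during configuration. Both are illustrative choices, not details fixed by this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorIdentificationThread(nn.Module):
    """Hypothetical encoder that screens a video expression vector from
    raw content features; the classification head is used only while
    configuring the thread."""
    def __init__(self, in_dim: int, dim: int, num_ids: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
        self.head = nn.Linear(dim, num_ids)

    def forward(self, content_feats: torch.Tensor) -> torch.Tensor:
        return self.encoder(content_feats)  # the screened video expression vector

def configure_step(thread, optimizer, feats, id_labels):
    """Debug the thread coefficients from the regression analysis data
    (here: cross-entropy against the identification information)."""
    logits = thread.head(thread(feats))
    loss = F.cross_entropy(logits, id_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```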
It should be understood that the configuration video expression content mentioned above refers to the video expression content used to configure the vector identification thread, while the example video expression content refers to content used in the configuration process of the vector optimization thread after the vector identification thread has been configured. For example, the preconfigured vector identification thread first screens the video expression vector of the example video expression content and of each audio content in the configuration pre-stored video expression content set; after the video interaction data is generated, these vectors are loaded into the vector optimization thread for video expression vector optimization. The video expression content whose vector is loaded into the vector optimization thread during its configuration is the example video expression content. The example video expression content and the configuration video expression content may be the same or different.
In step 402, the video expression vectors of the example video expression content and of each audio content in the configuration pre-stored video expression content set are obtained one by one through the vector identification thread.
In step 404, a first video expression content associated with the example video expression content is obtained from each audio content in combination with the vector commonality association degree between the example video expression content and the video expression vector of each audio content.
In this embodiment, the audio content is video expression content in a search pre-stored video expression content set.
For example, the vector commonality association degree between the video expression vector of the example video expression content and the video expression vector of each audio content may be calculated one by one, and the audio contents may be ranked by similarity, for example in descending order. The audio contents ranked in the first K positions of the ranking result are selected as the first video expression content of the example video expression content.
In step 406, a second video expression content associated with the first video expression content is obtained from the audio contents according to the vector commonality association degree between the video expression vectors of the first video expression content and of the audio contents.
In this embodiment, the vector commonality association degree between the video expression vectors of the first video expression content and of the audio contents may then be calculated, and the audio content associated with the first video expression content is obtained from the audio contents and regarded as the second video expression content.
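Reusing the hypothetical top_x_secondary helper above, the two-hop expansion of steps 404 and 406 could be sketched as follows; the value of k and the exclusion of already-selected contents are illustrative choices.

```python
def two_hop_secondary(example_vec, corpus, k):
    """First hop: the K audio contents nearest the example content.
    Second hop: the contents nearest each first-hop content."""
    first = top_x_secondary(example_vec, corpus, k)
    second = []
    for cid in first:
        remaining = {c: v for c, v in corpus.items()
                     if c not in first and c not in second}
        second += top_x_secondary(corpus[cid], remaining, k)
    return first + second  # together, the (configuration) secondary contents
```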
In this embodiment, the search for secondary video expression content may stop once the first video expression content of the important video segment corresponding to the example video expression content has been found. Alternatively, more layers of secondary video expression content may be found, such as a third or a fourth video expression content; how many layers to search can be determined according to real-time testing in different application scenarios. The first video expression content, the second video expression content, and so on may all be called secondary video expression content; in the configuration stage of the thread they may be called configuration secondary video expression content, and in the thread application stage simply secondary video expression content.
It will also be appreciated that the secondary video expression content may be obtained in ways other than the example of this step. For instance, a similarity target value may be set, and all or part of the audio contents whose vector commonality association degree is higher than the target value may be directly regarded as the secondary video expression content of the example video expression content. As another example, instead of using a vector identification thread to screen video expression vectors, the video expression vectors may be determined by taking values of several dimensions of the video expression content.
In step 408, second video interaction data is generated from the example video expression content and the configuration secondary video expression content. The segments in the second video interaction data comprise a configuration important video segment representing the example video expression content and at least one configuration secondary video segment representing configuration secondary video expression content; the segment vector of each segment is the video expression vector of the example video expression content or of the corresponding secondary video expression content.
In step 410, the second video interaction data is loaded to the vector optimization thread, which optimizes the video expression vector of the configuration important video segment in combination with the video expression vectors of the configuration secondary video segments in the second video interaction data, screens out the video expression vector of the example video expression content, and obtains regression analysis data of the example video expression content according to that vector.
In step 412, the thread coefficients of the vector optimization thread and the thread coefficients of the vector identification thread are debugged based on the regression analysis data of the example video expression content.
In this step, the thread coefficients of the vector identification thread may or may not be debugged together with those of the vector optimization thread, which can be determined according to the real-time configuration conditions.
According to this configuration method for the vector optimization thread, when the thread is configured, the video expression vector of the example video expression content is identified in combination with the near video expression content of the example video expression content, so that the video expression vector of the example video expression content and the vectors of other related video expression contents can be referred to comprehensively. The identified video expression vector is therefore more accurate and reliable, the accuracy with which video expression content is processed is improved, and video data transmission can be accurately controlled. In addition, screening the video expression vectors with the vector identification thread improves the screening efficiency and thus the thread configuration speed, and the thread coefficients of the vector identification thread can be debugged according to the loss value, making the vectors it screens more accurate.
The embodiment of the application also provides a video expression content searching method, which aims to search for video expression content related to the target video expression content in the pre-stored video expression content set. Specifically, the method comprises the following steps.
In step 700, the target video expression content to be processed is obtained.
In step 702, a video expression vector of the target video expression content is obtained through screening, and the video expression vector is controlled to be transmitted.
In this embodiment, the screening may be performed using the artificial-intelligence-based short video transmission processing method of any of the above embodiments.
In step 704, the video expression vector of each audio content in the pre-stored video expression content set is screened.
In this embodiment, the video expression vector of each audio content in the pre-stored video expression content set may likewise be screened using the artificial-intelligence-based short video transmission processing method of any of the above embodiments.
In step 706, based on the video expression vector of the target video expression content, the vector commonality association degree between that vector and the video expression vector of each audio content is evaluated to obtain the near video expression content of the target video expression content as the search result.
In this embodiment, the video expression vector of the target video expression content may be controlled to transmit, and the vector commonality association degree between it and the video expression vector of each audio content is measured, so that the associated audio contents are regarded as the search result.
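Putting the hypothetical pieces above together, the search flow of steps 700 to 706 might look as follows; ident_thread, units, target_feats, and corpus_feats are all assumed names carried over from the earlier sketches.

```python
import torch

with torch.no_grad():
    target_vec = ident_thread(target_feats)                             # step 702
    corpus = {cid: ident_thread(f) for cid, f in corpus_feats.items()}  # step 704
    secondary_ids = top_x_secondary(target_vec, corpus, x=5)
    secondary = torch.stack([corpus[c] for c in secondary_ids])
    optimized = stacked_optimize(units, target_vec, secondary)          # step 102
    search_result = top_x_secondary(optimized, corpus, x=10)            # step 706
```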
According to this video expression content searching method, the screened video expression vector of the example video expression content is more accurate and reliable, so the accuracy of the search result is improved.
Video interaction data may be generated based on the example video expression content and the searched secondary video expression content; it comprises an important video segment and a number of secondary video segments. The important video segment represents the example video expression content, and each secondary video segment represents one secondary video expression content, the secondary video segments including the first video expression content and also the second video expression content. The segment vector of each segment is the video expression vector of the video expression content represented by that segment, namely the vector screened when the secondary video expression content was obtained for the vector commonality association degree comparison; for example, the video expression vector screened by the vector identification thread can be used.
In step 102, the first video interaction data is loaded to a vector optimization thread, and the vector optimization thread optimizes the segment vector of the important video segment in combination with the segment vector of the secondary video segment in the first video interaction data to obtain a video expression vector of the optimized target video expression content, and controls the video expression vector to transmit.
It can be appreciated that when the content described in steps 101 and 102 above is executed, the video expression vector of the example video expression content is identified, during thread configuration, in combination with the secondary video expression content associated with it, so that the identified video expression vector is more accurate and reliable, the accuracy with which video expression content is processed is improved, and video data transmission can thus be accurately controlled.
On the basis of the above, there is provided an artificial intelligence based short video transmission processing apparatus 200 applied to an artificial intelligence based short video transmission processing system, the apparatus comprising:
a data obtaining module 210, configured to obtain first video interaction data, where the first video interaction data includes an important video segment and at least one secondary video segment, a segment vector of the important video segment represents a video expression vector of a target video expression content, the video expression vector is controlled to transmit, and a segment vector of the secondary video segment represents a video expression vector of a secondary video expression content, and the secondary video expression content is a video expression content associated with the target video expression content;
The vector transmission module 220 is configured to load the first video interaction data to a vector optimization thread, where the vector optimization thread optimizes a segment vector of the important video segment in combination with a segment vector of a secondary video segment in the first video interaction data to obtain a video expression vector of the optimized target video expression content, and controls the video expression vector to transmit.
Based on the above, an artificial intelligence based short video transmission processing system 300 is provided, comprising a processor 310 and a memory 320 in communication with each other, the processor 310 being adapted to read a computer program from the memory 320 and execute it to implement the method described above.
On the basis of the above, there is also provided a computer-readable storage medium on which a computer program is stored which, when run, implements the above method.
In summary, based on the above scheme, when a thread is configured, the video expression vector of the example video expression content is identified in combination with the secondary video expression content associated with it, so that the identified video expression vector is more accurate and reliable, the accuracy with which video expression content is processed is improved, and video data transmission can be accurately controlled.
It should be appreciated that the systems and modules thereof shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may then be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only with hardware circuitry, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also with software, such as executed by various types of processors, and with a combination of the above hardware circuitry and software (e.g., firmware).
It should be noted that different embodiments may produce different advantages, and in different embodiments the advantages produced may be any one or a combination of those described above, or any other advantage that may be obtained.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly stated herein, various modifications, improvements, and adaptations of the present application may occur to those skilled in the art. Such modifications, improvements, and adaptations are intended to be covered by this application and thus fall within the spirit and scope of its exemplary embodiments.
Meanwhile, the present application uses specific words to describe embodiments of the present application. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present application may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the invention are illustrated and described in the context of a number of patentable categories or circumstances, including any novel and useful procedures, machines, products, or materials, or any novel and useful modifications thereof. Accordingly, aspects of the present application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.) or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," module, "" engine, "" unit, "" component, "or" system. Furthermore, aspects of the present application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take on a variety of forms, including electro-magnetic, optical, etc., or any suitable combination thereof. A computer storage medium may be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.
The computer program code necessary for operation of portions of the present application may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or a cloud-computing service such as software as a service (SaaS) may be used.
Furthermore, the order in which the elements and sequences are presented, the use of numerical letters, or other designations are used in the application and are not intended to limit the order in which the processes and methods of the application are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present application. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, in order to simplify the presentation disclosed herein and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in the claims; indeed, claimed subject matter may lie in less than all features of a single embodiment disclosed above.
In some embodiments, numbers describing quantities of components and attributes are used; it should be understood that such numbers used in the description of embodiments are in some examples modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows for adaptive variation. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties of individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and employ a general digit-preserving method. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments are approximations, in particular embodiments such numerical values are set as precisely as practicable.
Each patent, patent application, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this application is hereby incorporated by reference in its entirety, excepting application history documents that are inconsistent with or in conflict with the content of this application, and documents that limit the broadest scope of the claims of this application (currently or later appended to this application). It is noted that where the descriptions, definitions, and/or use of terms in the materials accompanying this application are inconsistent or in conflict with the content described herein, the descriptions, definitions, and/or use of terms in this application prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of this application. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present application may be considered in keeping with the teachings of the present application. Accordingly, embodiments of the present application are not limited to only the embodiments explicitly described and depicted herein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. An artificial intelligence-based short video transmission processing method is characterized in that the method at least comprises the following steps:
obtaining first video interaction data, wherein the first video interaction data comprises an important video segment and at least one secondary video segment, the segment vector of the important video segment represents a video expression vector of target video expression content, the video expression vector is controlled to transmit, the segment vector of the secondary video segment represents a video expression vector of secondary video expression content, and the secondary video expression content is video expression content associated with the target video expression content;
and loading the first video interaction data to a vector optimization thread, wherein the vector optimization thread optimizes the segment vector of the important video segment by combining the segment vector of the secondary video segment in the first video interaction data to obtain a video expression vector of the optimized target video expression content, and controls the video expression vector to transmit.
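For orientation only (not part of the claims), the following Python sketch illustrates one possible shape of the first video interaction data and of a vector optimization thread; the names (first_video_interaction_data, vector_optimization_thread) and the additive update are assumptions, not the claimed implementation.

import numpy as np

# Hypothetical layout of the "first video interaction data": one important
# segment vector plus one or more secondary segment vectors.
first_video_interaction_data = {
    "important": np.random.rand(128),                       # segment vector of the important video segment
    "secondary": [np.random.rand(128) for _ in range(3)],   # segment vectors of the secondary video segments
}

def vector_optimization_thread(data):
    """Stand-in for the claimed vector optimization thread: refine the
    important segment's vector using the secondary segment vectors
    (uniform weights here, purely for illustration)."""
    weights = np.ones(len(data["secondary"])) / len(data["secondary"])
    weighted = sum(w * v for w, v in zip(weights, data["secondary"]))
    return data["important"] + weighted  # optimized video expression vector

optimized_vector = vector_optimization_thread(first_video_interaction_data)
print(optimized_vector.shape)  # (128,) -- ready to be controlled for transmission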
2. The method of claim 1, wherein the vector optimization thread optimizing the segment vector of the important video segment in combination with the segment vectors of the secondary video segments in the first video interaction data to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector for transmission, comprises:
determining a confidence level between the important video segment and each secondary video segment in the first video interaction data;
fusing the video expression vectors of the secondary video segments in combination with the confidence levels to obtain a weighting vector of the important video segment;
and combining the video expression vector of the important video segment with the weighting vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector for transmission.
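A minimal sketch of the three steps of claim 2, assuming cosine similarity as a stand-in for the claimed confidence measure and a softmax-weighted sum as the fusion; the claim itself fixes neither choice.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def optimize(important, secondaries):
    # Step 1: confidence level between the important segment and each secondary
    # segment (cosine similarity is an assumed stand-in).
    conf = np.array([cosine(important, s) for s in secondaries])
    conf = np.exp(conf) / np.exp(conf).sum()  # normalize into fusion weights
    # Step 2: fuse the secondary vectors under the confidence weights -> weighting vector.
    weighting_vector = sum(c * s for c, s in zip(conf, secondaries))
    # Step 3: combine with the important segment's vector -> optimized expression vector.
    return important + weighting_vector

rng = np.random.default_rng(0)
important = rng.normal(size=64)
secondaries = [rng.normal(size=64) for _ in range(4)]
print(optimize(important, secondaries).shape)  # (64,)

The softmax normalization is one design choice that keeps the fusion weights positive and summing to one; any other confidence-to-weight mapping would fit the claim equally well.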
3. The method of claim 1, wherein, prior to obtaining the first video interaction data, the method further comprises: obtaining, in combination with the target video expression content, the secondary video expression content associated with the target video expression content from a pre-stored video expression content set.
4. The method of claim 3, wherein the obtaining, in combination with the target video expression content, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set comprises:
obtaining, one by one through a vector identification thread, the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, and controlling the video expression vectors for transmission;
and determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set.
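As a hedged illustration, the vector commonality association degree can be read as any pairwise similarity; the sketch below uses cosine similarity between the target vector and each pre-stored vector, which is an assumption rather than the patent's definition.

import numpy as np

def commonality_association(q, library):
    """Assumed stand-in for the claimed 'vector commonality association degree':
    cosine similarity between the target vector q and each pre-stored vector."""
    library = np.asarray(library)
    q_n = q / (np.linalg.norm(q) + 1e-12)
    lib_n = library / (np.linalg.norm(library, axis=1, keepdims=True) + 1e-12)
    return lib_n @ q_n  # one association coefficient per pre-stored content

rng = np.random.default_rng(1)
target_vec = rng.normal(size=32)       # from the vector identification thread
prestored = rng.normal(size=(10, 32))  # vectors of each audio content in the set
print(commonality_association(target_vec, prestored))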
5. The method of claim 4, wherein the determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content comprises:
sorting the vector commonality association degrees between the target video expression content and each audio content in descending order of association coefficient;
and screening the audio contents whose vector commonality association degrees fall within a set interval, and regarding the screened audio contents as the secondary video expression content associated with the target video expression content.
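One possible reading of the descending-order screening in claim 5, assuming the "interval" is the leading top-k positions of the sorted association coefficients:

import numpy as np

def screen_secondary(assoc, k=3):
    """Sort association coefficients in descending order and keep the contents
    falling in the leading interval (top-k is an assumed reading of the claim)."""
    order = np.argsort(assoc)[::-1]  # descending by association coefficient
    return order[:k]                 # indices of the secondary video expression contents

assoc = np.array([0.12, 0.87, 0.45, 0.91, 0.33])
print(screen_secondary(assoc, k=2))  # -> [3 1]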
6. The method of claim 4, wherein the determining, based on the vector commonality association degree between the video expression vector of the target video expression content and the video expression vector of each audio content in the pre-stored video expression content set, the secondary video expression content associated with the target video expression content from the pre-stored video expression content set comprises:
obtaining, in combination with the video expression vector of the target video expression content and the vector commonality association degree between it and the video expression vector of each audio content, first video expression content associated with the target video expression content from the audio contents; and obtaining, in combination with the vector commonality association degree between the video expression vector of the first video expression content and the video expression vector of each audio content, second video expression content associated with the first video expression content from the audio contents;
and regarding the first video expression content and the second video expression content as the secondary video expression content of the target video expression content.
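Claim 6 describes a two-hop expansion: first-hop contents associated with the target, then second-hop contents associated with those. A sketch under the same cosine/top-k assumptions as above:

import numpy as np

def two_hop_secondary(q, library, k=2):
    """First hop: contents most associated with the target vector q.
    Second hop: contents most associated with those first-hop contents.
    Cosine similarity and top-k are assumed stand-ins for the claimed measures."""
    lib = np.asarray(library)
    lib_n = lib / (np.linalg.norm(lib, axis=1, keepdims=True) + 1e-12)

    def top_k(vec):
        scores = lib_n @ (vec / (np.linalg.norm(vec) + 1e-12))
        return list(np.argsort(scores)[::-1][:k])

    first_hop = top_k(q)
    second_hop = [j for i in first_hop for j in top_k(lib[i]) if j not in first_hop]
    return sorted(set(first_hop) | set(second_hop))

rng = np.random.default_rng(2)
print(two_hop_secondary(rng.normal(size=16), rng.normal(size=(8, 16))))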
7. The method of claim 2, wherein the number of vector optimization threads is one or several accumulated one by one; when the number of vector optimization threads is several, the vector optimization threads are connected in series: the input of any one vector optimization thread is the first video interaction data output by the preceding vector optimization thread.
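Read as a cascade, claim 7 composes several optimization threads so that each consumes the output of the one before it. A toy sketch, where the stand-in threads are purely hypothetical:

def cascade(threads, data):
    """Run vector optimization threads in series: each thread takes the first
    video interaction data output by the preceding thread."""
    for thread in threads:
        data = thread(data)
    return data

# Toy stand-in threads that each refine the data.
double = lambda d: {k: v * 2 for k, v in d.items()}
print(cascade([double, double], {"x": 1}))  # {'x': 4}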
8. The method of claim 2, wherein the fusing the video expression vectors of the secondary video segments in combination with the confidence levels to obtain the weighting vector of the important video segment comprises: clustering the video expression vectors of the secondary video segments in combination with the confidence levels to obtain the weighting vector of the important video segment.
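Claim 8 leaves the clustering method open; one hypothetical instantiation treats the confidence-weighted centroid of the secondary vectors as a single-cluster weighted k-means step:

import numpy as np

def cluster_fuse(secondaries, conf):
    """Assumed clustering-style fusion: a confidence-weighted centroid of the
    secondary segment vectors serves as the weighting vector."""
    S = np.asarray(secondaries)
    w = np.asarray(conf) / (np.sum(conf) + 1e-12)
    return (w[:, None] * S).sum(axis=0)  # weighting vector of the important segment

rng = np.random.default_rng(3)
print(cluster_fuse(rng.normal(size=(4, 8)), [0.4, 0.3, 0.2, 0.1]).shape)  # (8,)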
9. The method of claim 8, wherein the combining the video expression vector of the important video segment with the weighting vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector for transmission, comprises: combining the video expression vector of the important video segment with the weighting vector; and performing a first projection on the combined vector to obtain the video expression vector of the optimized target video expression content, and controlling the video expression vector for transmission;
wherein the determining a confidence level between the important video segment and each secondary video segment in the first video interaction data comprises:
performing a second projection on the important video segment and the secondary video segment;
determining an association relationship between the important video segment and the secondary video segment after the second projection;
and determining the confidence level according to the association relationship after the first projection processing;
wherein the target video expression content comprises: to-be-processed search video expression content and each audio content in the pre-stored video expression content set;
and after obtaining the video expression vector of the target video expression content corresponding to the important video segment and controlling the video expression vector for transmission, the method further comprises: obtaining, based on the vector commonality association degree between the video expression vector of the optimized target video expression content and the video expression vector of each audio content, near video expression content of the target video expression content from the audio contents as a search result.
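The projections in claim 9 are unspecified; the sketch below assumes random linear maps P1 and P2 and an elementwise association relationship, purely to make the data flow concrete, and ends with the near-content search of the final wherein-clause.

import numpy as np

rng = np.random.default_rng(4)
dim, proj_dim = 32, 16
P1 = rng.normal(size=proj_dim)         # first projection (assumed map to a scalar)
P2 = rng.normal(size=(dim, proj_dim))  # second projection (assumed linear map)

def confidence(important, secondary):
    # Second-project both segment vectors, relate them elementwise, then
    # first-project the relationship into a scalar confidence (one hypothetical
    # reading of the projection steps in claim 9).
    a, b = important @ P2, secondary @ P2
    relation = a * b
    return float(np.tanh(relation @ P1))

def search(optimized, library, k=3):
    # Near video expression contents of the target, returned as the search result.
    lib = np.asarray(library)
    scores = lib @ optimized / (
        np.linalg.norm(lib, axis=1) * np.linalg.norm(optimized) + 1e-12)
    return np.argsort(scores)[::-1][:k]

print(confidence(rng.normal(size=dim), rng.normal(size=dim)))
print(search(rng.normal(size=dim), rng.normal(size=(10, dim))))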
10. An artificial intelligence-based short video transmission processing system, comprising a processor and a memory in communication with each other, wherein the processor is configured to read a computer program from the memory and execute the program to implement the method of any one of claims 1-9.
CN202310222805.XA 2023-03-09 2023-03-09 Short video transmission processing method and system based on artificial intelligence Pending CN116156279A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310222805.XA CN116156279A (en) 2023-03-09 2023-03-09 Short video transmission processing method and system based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310222805.XA CN116156279A (en) 2023-03-09 2023-03-09 Short video transmission processing method and system based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN116156279A (en) 2023-05-23

Family

ID=86358223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310222805.XA Pending CN116156279A (en) 2023-03-09 2023-03-09 Short video transmission processing method and system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN116156279A (en)

Similar Documents

Publication Publication Date Title
US11113181B2 (en) Debugging a live streaming application
US11288047B2 (en) Heterogenous computer system optimization
CN115732050A (en) Intelligent medical big data information acquisition method and system
CN116112746B (en) Online education live video compression method and system
CN115473822B (en) 5G intelligent gateway data transmission method, system and cloud platform
CN115514570B (en) Network diagnosis processing method, system and cloud platform
CN117037982A (en) Medical big data information intelligent acquisition method and system
CN116156279A (en) Short video transmission processing method and system based on artificial intelligence
CN113626538B (en) Medical information intelligent classification method and system based on big data
CN115756576B (en) Translation method of software development kit and software development system
CN115564476A (en) Advertisement playing progress adjusting method and system and cloud platform
CN115509811B (en) Distributed storage data recovery method, system and cloud platform
CN113626559B (en) Semantic-based intelligent network document retrieval method and system
CN112600939B (en) Monitor control information detection method, system, server and storage medium
CN112685328B (en) Graphical interface testing method and device and storage medium
CN113611425B (en) Method and system for intelligent regional medical integrated database based on software definition
CN114863585B (en) Intelligent vehicle testing and monitoring system and method and cloud platform
CN115409510B (en) Online transaction security system and method
CN113609323B (en) Image dimension reduction method and system based on neural network
CN113643701B (en) Method and system for intelligently recognizing voice to control home
CN115495017B (en) Data storage method and system based on big data
CN113609362B (en) Data management method and system based on 5G
CN113613252B (en) 5G-based network security analysis method and system
CN115511524B (en) Advertisement pushing method, system and cloud platform
CN115858418B (en) Data caching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination