CN116992010A

CN116992010A - Content distribution and interaction method and system based on multi-mode large model

Info

Publication number: CN116992010A
Application number: CN202310961722.2A
Authority: CN
Inventors: 朱玮; 杨波; 沈峰
Original assignee: Ignorance Beijing Smart Technology Co ltd
Current assignee: Ignorance Beijing Smart Technology Co ltd
Priority date: 2023-08-02
Filing date: 2023-08-02
Publication date: 2023-11-03

Abstract

The application relates to the technical field of multi-mode big models, in particular to a content distribution and interaction method and system based on a multi-mode big model, wherein the method comprises the following steps: s1, acquiring content; s2, storing contents; s3, transmitting the content to a multi-modal large model server; s4, processing the content through the multi-mode large model; s5, receiving a content request instruction input by a user; s6, processing and storing the instruction. Generating semantic features of a user content request according to the instruction, and accordingly obtaining target content to be sent; s7, requesting the content distribution server to send the content; s8, sending the content to a user dialogue interface; s9, sending the content to the multi-mode large model; s10, the multi-mode large model generates the dialogue answers of the round and sends the dialogue answers to the user dialogue interface. By adopting the technical scheme of the application, the content can be automatically tidied, processed and recommended according to the request of the user, and related content is continuously sent to the user, so that a brand new content organization mode, a content presentation mode and reading experience are provided for the user.

Description

Content distribution and interaction method and system based on multi-mode large model

Technical Field

The application relates to the technical field of multi-modal large models of artificial intelligence, and particularly discloses a content distribution and interaction method and system based on a multi-modal large model.

Background

The content on the internet includes news, articles, graphics, video, etc. The distribution and interaction methods of content on the internet have undergone multiple changes. Keyword searches from the initial web page list and mail group to the search engine, then blog subscriptions and microblog subscriptions take up the main stream, with automatic recommendation based on algorithms, etc. After mobile internet, sharing content and public number subscription in the dialog boxes of chat and social App is also an important channel for content distribution.

These existing distribution methods greatly increase the acquisition and propagation speeds of the content. However, all these methods have weaknesses, such as web page lists and mail groups depend on operators, the search engine has the problems of fuzzy semantics and inaccurate answers, the subscription mode needs manual subscription, algorithm recommendation can be controlled, and the sharing of chatting and social apps has the risks of randomness and control.

The user reads the content on the internet, still similar to traditional paper reading. Although hyperlinks can jump between text, the content to which the jump is made is fixed and not necessarily what the user wants to see. The content reading of the Internet is not greatly different from the human-computer interaction with the paper reading.

With the advent of multi-modal large models, AI can smoothly converse with people and represent a leaping progress in both semantic understanding and logical reasoning.

Therefore, there is a need for a method and a system for content distribution and interaction based on a multi-modal large model, which can automatically sort, process and recommend content according to a user's request and continuously send related content to the user.

Disclosure of Invention

The application aims to provide a content distribution and interaction method based on a multi-mode large model, which can automatically sort, process and recommend contents according to the request of a user and continuously send related contents to the user.

In order to solve the technical problems, the application provides the following technical scheme:

a content distribution and interaction method based on a multi-mode big model comprises the following steps:

s1, acquiring content;

s2, storing the content to a content management server;

s3, transmitting the content in the content management server to a multi-modal large model server;

s4, processing the content through the multi-mode large model in the multi-mode large model server, and storing the processed content to the content management server; the processing includes generating vector data for the content and generating a summary, an illustration, a deep annotation, and a corresponding set of semantic feature labels; the method also comprises the steps of generating vector data for feedback and comments of the user on the content and a corresponding semantic feature tag set;

s5, receiving a content request instruction input by a user, and sending the content request instruction to a multi-modal large model server;

s6, the multi-mode large model server generates characteristics of a user content request according to the instruction, and sends the characteristics to the content management server for storage, and performs matching analysis with a content semantic characteristic set stored on the content server to obtain target content to be sent; judging whether the target content needs to be processed through the multi-mode big model, if not, jumping to the step S7, and if so, jumping to the step S9;

s7, the content server requests the content distribution server to send the content;

s8, the content server sends the content to the user dialogue interface;

s9, the content server sends the content to the multi-mode big model;

and S10, the multi-mode large model generates a dialogue answer of the round according to the historical content of the user dialogue acquired by the user dialogue interface and the content sent by the content server, and sends the dialogue answer to the user dialogue interface.

Further, in the step S1, the acquired content includes content manually input by a user who provides the content at the content input interface, content captured by a crawler of the input server, or content acquired from an external content distribution server.

Further, in the step S1, the content includes news, stories, papers, and internet texts.

Further, the step S4 specifically includes:

s401, generating a summary;

s402, generating an illustration;

s403, generating word embedding vectors and storing the word embedding vectors into a pre-created vector database;

s404, generating TF-IDF characteristic values;

s405, generating a semantic feature label set;

s406, generating basic depth annotation;

s407, generating batch content of the briefs and storing the batch content to a content management server.

Further, in the step S403, the content is cut and divided into text blocks of a plurality of words, word embedding vectors are generated for each text block through a language model of the multi-modal large model, and are stored into a vector database;

in step S404, a TF-IDF algorithm is adopted, a keyword TF-IDF value is generated on the basis of a pre-constructed corpus, a abstract based on TF-IDF word frequency is generated, and a content vector based on TF-IDF word frequency is generated.

Further, in the step S406, the deep annotation includes performing background introduction, noun interpretation and deep mining on knowledge points formed by paragraphs and phrases in the content.

Further, in the step S5, the content request instruction includes an instruction of a free text dialogue and a structured instruction;

when the content request instruction is an instruction of a free text dialogue, the multi-mode big model carries out semantic understanding on the content request instruction and carries out execution based on the characteristics of the semantic understanding;

the content server stores the content request instruction.

Further, in the step S6, for the instruction of the free text dialogue, the multimodal big model understands the user intention, extracts the characteristics of the content instruction required by the user, the extraction of the characteristics includes a tag set generated by a word embedding algorithm, a TF-IDF algorithm or a tag generating algorithm of the multimodal big model, and the generated result is sent to the content management server for storage.

It is a second object of the present application to provide a content distribution and interaction system based on a multi-modal large model, using the above method, comprising:

the content input and collection server is used for inputting and collecting content;

the multi-mode large model server is preset with a multi-mode large model and is used for running the multi-mode large model, analyzing and processing the content and carrying out dialogue with a user;

the content management server is used for storing the content and the demand instruction of the user;

the content distribution server is used for sending the content to the outside;

the terminal comprises a user dialogue interface for receiving the content input by the user or displaying the content to the user.

The application has the beneficial effects that: under the support of the multi-mode large model, the scheme can realize the conversation window type content distribution and interaction modes for the user. Related content can be automatically and continuously sent according to a natural language dialogue request of a user, wherein the content is text content such as news, reports, papers, internet texts and the like. Operators of content providing services do not need to manually select, sign and distribute content, but rely on multi-modal large models for automated processing. The content distribution more closely matches the user's request. The automatic recommendation and automatic arrangement of the content are completely based on the neutral multi-mode large model and the requirements of the user, so that the disadvantages of algorithm control are avoided. The reading of the content can be deeply mined in a dialogue interaction mode, so that the traditional simple reading and interaction mode is changed.

Drawings

FIG. 1 is a flow diagram of a method for content distribution and interaction based on a multimodal big model according to an embodiment;

fig. 2 is a schematic flow chart of acquiring content in step S2 in the method of the embodiment;

FIG. 3 is a flow chart of the process of the content of step S4 in the method of the embodiment;

fig. 4 is a schematic diagram of two news bulletin presenting manners in step S8 in the method of the embodiment;

FIG. 5 is a schematic diagram of presentation and interaction of news content at step S8 in an embodiment of a method;

FIG. 6 is a schematic diagram of user interaction and content pushing in a method of an embodiment.

Detailed Description

The following is a further detailed description of the embodiments:

examples

The embodiment also provides a content distribution and interaction method based on a multi-mode big model, as shown in fig. 1, which comprises the following contents:

s1, receiving content input by a user (namely, content manually input by a user providing the content on a content input interface), or capturing the content through a crawler, or acquiring the content from an external content distribution server, namely, acquiring the content from a content distribution server of another system;

the content includes text content of news, report, paper, internet, and the like, and in this embodiment, the content is industry news of different kinds. News collections may be entered through a user dialog interface of the terminal, including title, author, time, tags, text, etc. The content collection and input server crawlers can also capture the content on the internet.

In this embodiment, a manner of acquiring content through interfaces between the content distribution and interaction systems of the respective multi-modal large models is also adopted. As shown in fig. 2, in addition to entering and crawling content by conventional means, the content is invoked by the content distribution and interaction system a of the multimodal mass model and the content distribution and interaction system B of the multimodal mass model.

S2, storing the collected content to a content management server;

for example, content crawled on the internet is written to a content management server through an API.

In this embodiment, the content management server includes a series of services: storing news content, and storing and retrieving text files by adopting Solr; storing formatted data of news, such as title, author, time, etc., using mysql; storing word embedding vectors generated by multi-mode large model analysis or other algorithms by using a Langchain vector database; the news request instructions for each user are stored.

S3, transmitting the content to a multi-modal large model server;

the multimodal big model preset in the multimodal big model server of the embodiment may be a chatGPT of OpenAI, and is called through an API; or may be an own large model, running on an own server, such as a proprietary large model obtained by fine-tuning with LLAMA. To meet the requirements of content processing and dialogue, the parameters of the large model should be in the billion scale.

S4, processing the content through the multi-mode large model in the multi-mode large model server, and storing the processed content to the content management server; the processing includes generating vector data for the content and generating a summary, an illustration, a deep annotation, and a corresponding set of semantic feature labels; vector data is generated for feedback and comments of the user on the content, and a corresponding semantic feature tag set is also included.

The abstract is a summary of the content, is used for being displayed on the cover when being pushed to the user dialogue interface, and the length of the abstract can give instructions to the multi-mode large model according to the requirement. The illustration is generated by the diagram generating function of the multi-mode large model based on the keywords extracted from the abstract. Depth annotation is the advanced generation of content for the part of the content that needs annotation and further exploration. For example, names of persons, names of places, terms, and economic data, production data appearing in news, for example, are interpreted deeply, informing the user what these data represent. The content to be generated in the step can be further set according to the requirement of a system operator, so that the steps of content acquisition and AI analysis are completed;

in this embodiment, the steps of processing the content by the multimodal big model are as shown in fig. 3, and specifically include:

s401, generating a abstract. The main parameters for generating the abstract are set as follows:

"prompt" please generate a summary for the article,

“temperature”:0.5,

“max_tokens”:200,

“frequency_penalty”:0.5,

“presence_penalty”:0.0

s402, generating an illustration. And (3) calling a graph generation model of the multi-mode large model, and generating an illustration by taking the abstract generated in the step S401 as a Prompt.

S403, generating word embedding vectors. Word embedding functionality provided by a multi-modal large model is employed. Firstly, cutting the content, dividing the content into small text blocks according to the period number as a dividing basis, and if one sentence exceeds 1000 words, searching for the middle comma division again. Word embedding vectors are then generated for each text block through a language model of the multimodal big model and stored into a pre-created vector database.

Specifically, the enabling API is called, the usage model is "text-embedding-ada-002", word embedding vectors are generated for all text contents, and the word embedding vectors are stored into a vector database. The vector database used in this embodiment is fass.

S404, generating TF-IDF characteristic values. And generating a keyword TF-IDF value based on a pre-constructed corpus by adopting a TF-IDF algorithm, generating a abstract based on TF-IDF word frequency, and generating a content vector based on the TF-IDF word frequency.

On content query matching for news, the TF-IDF algorithm has better effects in keyword matching, article semantic matching and abstract matching in some scenes. In this embodiment, two kinds of vector matching are adopted in combination.

S405, generating a semantic feature label set. The semantic feature label set is a content classification label based on understanding the content. Multiple labels can be preset for the multi-mode large model, and the labels can be freely generated by the multi-mode large model. The purpose is to facilitate later screening of content according to the user's content request instructions. For example, a semantic feature tag set of science and technology news is:

[ news, science and technology, AI, computer, artificial intelligence, big model, language model, openai, chatGPT, chatGPT3.5, sam altman, AI New product release, 3/1/2023, great news, … ]

The present embodiment calls the API of Openai, and generates a semantic feature tag set for each news using the model "GPT-3.5". In a traditional deep learning model, this is a text multi-classification task and an entity recognition task. Both types of tasks can be generated by a multimodal big model.

In this embodiment, the categories of news categorization are specified as follows:

politics, science and technology, financial, military, sports, entertainment, academic, health, nature, history, culture, society;

in this embodiment, the classifications that need to be analyzed are also: news country, news severity, news length, news writing;

in this embodiment, the generated tag types include: place name, person name, date, time, organization name, monetary amount, brand, term, etc.

Generating word embedding vectors for all tags by using the ebeddingAPI of Openai, and storing the word embedding vectors into a vector database.

S406, generating basic depth annotation. The deep annotation is to make necessary background introduction, noun interpretation, deep mining and other processes on knowledge points formed by paragraphs and phrases in the content. These deep annotations may help the user to better understand news.

In the present embodiment, explanation is made for a tag set generated by news, such as explaining place names, explaining person names, etc., and explanation for terms and numbers includes deep mining of terms and numbers, such as what an economic index means.

In this embodiment, a special deep annotation is set, i.e. news background introduction, and the chatGPT answers such a question: why this news has a reporting value and why this news has such a great significance.

S407, generating batch content of the briefing type, and storing the batch content in a content management server for unified call.

In this embodiment, a large number of batch contents of bulletin types are set, for example, daily scientific news bulletins are generated for the user, or daily financial news bulletins are generated for the user. Such tasks can enter the content management server along with the acquisition of news, update the briefing in real time by the scrolling of the multi-mode big model, and send to the terminal according to the time required by the user.

S5, receiving a content request instruction input by a user on a user dialogue interface, and sending the content request instruction to a multi-modal large model server; the content server stores the content request instruction;

the content request instruction sent by the user can be the setting of a content distribution mode under the multi-mode model, or the request of a certain content;

in this embodiment, the user may input a content request instruction in the user session interface, where the content request instruction has two formats, one is a free text session instruction and the other is a structured instruction.

For free text conversations, the multimodal big model needs to understand the semantics and execute based on the semantic understanding, and the semantics may contain more complex logic, and needs to be formatted, analyzed and stored. The coverage time and scope of the content request instructions is also understood by the multimodal big model. For example, the text dialog of the user may be: please send me a news bulletin about AI today. This text conversation is performed only once. The text dialog of the user may also be: please send news bulletins in AI for me every day in the future. This piece of text needs to be stored and becomes the content request rule for the user. It is also possible that: please no further news in AI later. It is necessary to understand that the rules in this text conversation are negative.

In this embodiment, the structured instructions use a slash as the command parser. The executable command is entered after the slash. The basic format is as follows:

/command[r][expression7]

where r is a custom parameter of the Request command, such as NOW, DAILY, etc.

One example is news of a request Now artificial intelligence big model

S6, the multi-mode large model server generates characteristics of a user content request according to the content request instruction, and performs matching analysis with a semantic characteristic tag set stored on the content server to obtain target content to be sent; it is determined whether the target content needs to be processed via the multimodal big model, if not, it jumps to step S7, if so, it jumps to step S9.

For the instruction of free text dialogue, the multi-mode big model understands the user intention, and the feature extraction is carried out on the content required by the user, wherein the extraction can be a word embedding (embedding) algorithm of the multi-mode big model, a TF-IDF algorithm or a tag set generated by a tag generation algorithm, and the generated result is transmitted to a content management server for storage. And correspondingly, matching and comparing the content with a feature set which is generated in advance for the content on the content server. In other embodiments, a full text word vector matching search may also be performed.

In the embodiment, three feature matching methods of keyword features, TF-IDF algorithm features and large model Embedding sentence vectors are combined.

The content request instructions of the users and the characteristic values calculated based on the multi-mode large model server are stored by the content server, namely, each user stores the corresponding content request instructions and the characteristic values of the instructions on the content server for matching content.

S7, the content management server requests the content distribution server to send the content;

in the present embodiment, the content distribution server is designed to reduce the stress of the content management server. In a system in which the user pressure is not great, two servers may be combined, and the content management server may realize the function of the content distribution server.

S8, the content distribution server sends the content to a user dialogue interface of the terminal for users to read;

if the user applies for news in a fixed format, the content is directly sent to the user dialogue interface for the user to read.

In the present embodiment, the transmitted contents include four types:

the first category is a presentation category, namely news presentation which is sorted by a multi-mode big model according to the content requirements of users. As shown in fig. 4, the presentation may be presented in two ways, one displaying only the title and the other displaying the title and summary.

The second category is a single news bulletin category, that is, a news bulletin is sent by the multi-mode large model according to the content requirement of the user, and the sending can be continuous, and the multi-mode large model is sent to the user at any time as new news enters the content management server. A single news item displays headlines, summaries, schematics, etc.

And the third category is a single news full text category, namely, aiming at the content application of a user, the multi-mode large model sends news full text to a user dialogue interface. At this time, in order to support deep reading of the user, all supporting contents, such as background introduction, digital parsing, noun interpretation and the like, generated by the multi-mode large model for news are sent to the user dialogue interface, and the contents can be presented in a separate display frame at any time along with the reading of the user.

The user dialogue interface in this embodiment is a multi-box dialogue interface, including a dialogue main interface of a user and a multi-mode large model, and an auxiliary window for displaying annotation paraphrasing, and the auxiliary window may have a plurality of auxiliary windows.

A specific display layout is shown in fig. 5, and the elements of the user session interface are explained as follows:

the content browsing main window can be a page opened by a browser or a homepage of a mobile App, and comprises:

a model dialog window, in which dialogues of the multimodal mass model with the user are displayed. The news briefs and news pushed by the multi-mode large model are also automatically rolled out. For example, a summary display of a piece of content, typically a news feed, is displayed in summary form, with only the illustration icons and subject summaries being displayed. The record, with hyperlinks, can be clicked to jump to a news detail page, which is shown in the interface on the right side of fig. 5.

The user enters a command window. I.e. the user can enter a dialogue here for a multimodal big model.

The display window of the specific content, namely the news detail page, is jumped to the news detail page after being clicked by the abstract, and comprises the following steps:

and displaying a main window of the specific content. The content of the news is displayed in this window.

Annotation of content and an explanation window. The comments and explanations of the content, such as noun explanation of a noun in news, background introduction of news, comprehensive judgment of a number, and encyclopedic information retrieval of a person are all shown in the auxiliary window. Moreover, this window is also a dialog window with the user. When the user inputs a sentence of dialogue in the lower user instruction input window, the multi-mode large model makes an answer in the auxiliary window. The main window does not react to the user's problem and displays news content statically.

And closing the display window of the specific content, and returning to the content browsing main window.

The fourth category is a conversational category, i.e., the user expands discussions with a multimodal big model for a piece of news. For example, the user is reading a "news about the huge amount of effort used for large model training" and discussing the price problem of the graphic card with the multi-modal large model. This content is in a general dialog format.

S9, the content server sends the content to the multi-modal large model server;

if the news of the content server does not need to be processed, it can be directly transferred to the user dialogue interface. If the processing is needed, the processing is carried out by the multi-mode large model in the multi-mode large model server, and then the processed multi-mode large model is sent to the user dialogue interface.

In this embodiment, in most cases, processing via a multi-modal large model is required. For example, in step S8, the first class, the second class, and the fourth class are all organized by the multimodal big model language and sent to the user dialogue interface. Only the third category is the content which is preprocessed by the multi-mode big model and is directly sent.

S10, the multi-mode large model generates a dialogue answer of the round according to the historical content of the user dialogue acquired by the user dialogue interface and the content sent by the content server, and sends the dialogue answer to the user dialogue interface. For example, the multimodal big model matches the obtained content from the content server according to the content request instructions of the user, and after combining, generates an appropriate answer, and sends the answer to the user dialogue interface.

In this embodiment, user feedback and comments are also important links to news distribution and interaction. As shown in fig. 6, after the user makes a comment on the news, the comment content is sent to the content management server, and then is acquired by the multi-mode large model, semantic understanding is performed on the comment content, and various feature data sets are generated. The comment data of the user will also be one dimension in the distribution and propagation thereafter. However, in order to prevent the distribution and propagation mechanism from malicious manipulation, comment data is analyzed by the multimodal big model only for agreed users. Namely, if some readers set, accept the tags of the evaluation semantics, the accepted content is influenced by the evaluation; the reader does not set, and the content that he receives is not affected by the rating.

In this embodiment, under the condition that the user agrees to the real name, the reader a may set to preferentially accept the content positively evaluated by the reader B or preferentially accept the content negatively evaluated by the reader B.

In this embodiment, under the condition that the user agrees to the real name, the reader a may evaluate "send this article to the reader B", and under the premise that the reader B sets to accept "evaluate influence distribution", the reader B may receive this piece of content evaluated by the reader a.

Under the mechanism, the functions of 'copy', 'circle read', 'feed read' and the like under the traditional paper flow and the functions of 'attention', 'friend circle' and the like of internet content and social application can be realized, and the difference is that in the embodiment, the operation is automatically made after the natural language comments of the users are understood by the multi-mode large model.

Therefore, user comments as a dimension can dynamically affect the transmission and dissemination of news. If the user does not want to be affected by the comment, it may be set that "evaluation affects distribution" is not accepted.

Based on the above method, the present embodiment further provides a content distribution and interaction system based on a multi-mode big model, including:

a content management server for storing content and content-related data, and a content request instruction of a user;

the content distribution server is used for sending the content to the outside;

the terminal comprises a user dialogue interface, a user interaction interface and a display interface, wherein the user dialogue interface is used for receiving content input by a user and displaying the content to the user; the terminal comprises a PC, a smart phone, a tablet personal computer and the like; for example, the user dialog interface runs on the browser of a PC or on the App of a smartphone.

According to the scheme of the embodiment, the multi-mode large model of artificial intelligence is adopted to conduct feature analysis on news and other contents, and rich semantic features are built for the news for distribution. The user informs the multi-mode big model of the application requirement of news through the form of dialogue, and the multi-mode big model is matched with the proper content from the news base based on the semantic understanding of the user request. The multi-mode large model is used for carrying out processing such as sorting, extracting, abstracting, deepening mining and the like on the matched content and providing the processed content for a user. The content distribution and interaction mode based on the multi-mode large model is completely dependent on a reliable multi-mode large model algorithm, and is different from the traditional distribution algorithm in the possibility of being controlled, and the authority of automatically screening the content is given to a user for the first time. Meanwhile, due to the processing capacity of the multi-mode large model, news can be processed in advance, background knowledge, noun interpretation, deep analysis and other contents are provided for the news, and various problems of a user can be solved at any time while the user reads the news, so that the news cannot be automatically completed under the prior technical conditions. Under this scheme, artificial intelligence becomes people's news collection and editing assistant provides targeted news briefs, news to everyone, and the degree of depth interpretation to the news. This is a revolutionary method of content distribution and dissemination.

The foregoing is merely an embodiment of the present application, the present application is not limited to the field of this embodiment, and the specific structures and features well known in the schemes are not described in any way herein, so that those skilled in the art will know all the prior art in the field before the application date or priority date of the present application, and will have the capability of applying the conventional experimental means before the date, and those skilled in the art may, in light of the present application, complete and implement the present scheme in combination with their own capabilities, and some typical known structures or known methods should not be an obstacle for those skilled in the art to practice the present application. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the structure of the present application, and these should also be considered as the scope of the present application, which does not affect the effect of the implementation of the present application and the utility of the patent. The protection scope of the present application is subject to the content of the claims, and the description of the specific embodiments and the like in the specification can be used for explaining the content of the claims.

Claims

1. A content distribution and interaction method and system based on a multi-mode big model are characterized by comprising the following steps:

s1, acquiring content;

s2, storing the content to a content management server;

s6, the multi-mode large model server generates semantic features of the user content request according to the instruction, and the semantic features are stored by the content management server and are matched and analyzed with the content semantic feature set stored on the content server to obtain target content to be sent; judging whether the target content needs to be processed through the multi-mode big model, if not, jumping to the step S7, and if so, jumping to the step S9;

s8, the content server sends the content to the user dialogue interface;

s9, the content server sends the content to the multi-mode big model;

2. The multi-modal large model based content distribution and interaction method according to claim 1, wherein: in the step S1, the acquired content includes content manually input by a user providing the content at the content input interface, content captured by a crawler of the input server, or content acquired from an external content distribution server.

3. The multi-modal large model based content distribution and interaction method according to claim 2, wherein: in the step S1, the content includes news, stories, papers and internet texts.

4. The multi-modal large model based content distribution and interaction method according to claim 1, wherein: the step S4 specifically includes:

s401, generating a summary;

s402, generating an illustration;

s404, generating TF-IDF characteristic values;

s405, generating a semantic feature label set;

s406, generating basic depth annotation;

5. The multi-modal large model based content distribution and interaction method of claim 4, wherein: in the step S403, the content is cut and divided into text blocks of a plurality of words, and an encoding vector is generated for each text block through a language model of the multi-modal large model and stored in a vector database;

6. The multi-modal large model-based content distribution and interaction method of claim 5, wherein: in step S406, the deep annotation includes performing background introduction, noun interpretation and deep mining on knowledge points formed by paragraphs and phrases in the content.

7. The multi-modal large model based content distribution and interaction method according to claim 1, wherein: in the step S5, the content request instruction includes an instruction of a free text dialogue and a structured instruction;

when the content request instruction is an instruction of a free text dialogue, the multi-mode big model carries out semantic understanding processing on the content request instruction and carries out execution based on semantic understanding;

the content server stores the content request instruction.

8. The multi-modal large model-based content distribution and interaction method of claim 7, wherein: in step S6, for the instruction of the free text dialogue, the multimodal big model understands the user intention, extracts the characteristics of the content required by the user, the extraction of the characteristics includes a tag set generated by a word embedding algorithm, a TF-IDF algorithm or a tag generating algorithm of the multimodal big model, and the generated result is sent to the content management server for storage.

9. A content distribution and interaction system based on a multimodal mass model, using the method of any of claims 1-8, comprising:

a content management server for storing content and a content request instruction of a user;

the content distribution server is used for sending the content to the outside;