CN117591697A - Text recommendation method and system based on artificial intelligence and video processing - Google Patents
- Publication number: CN117591697A (application CN202410078311.3A)
- Authority: CN (China)
- Prior art keywords: book, paragraph, user, target, interest
- Prior art date
- Legal status: Granted (the listed status is an assumption, not a legal conclusion)
Classifications
- G06F16/7844 — Retrieval of video data characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or transcripts of audio data
- G06F16/735 — Querying video data; filtering based on additional data, e.g. user or group profiles
- G06F40/30 — Handling natural language data; semantic analysis
- G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/0499 — Feedforward networks
- G06N3/08 — Learning methods
Abstract
The invention provides a text recommendation method and system based on artificial intelligence and video processing, and relates to the technical field of text recommendation. If a book reading operation by the user is detected, the front camera is opened to capture a video of the user reading the book, and the phone screen is recorded at the same time to obtain a screen-recorded video; the reading video and the screen-recorded video are input into an interest paragraph determination model to determine a plurality of initial interest paragraphs in the book; a target paragraph is determined based on the text content corresponding to the plurality of initial interest paragraphs; a paragraph description image is generated from the text content of the target paragraph using a generative adversarial network; the paragraph description image is input into a cover determination model to obtain a target book cover; and the target book corresponding to the target book cover is recommended to the user. The method can accurately recommend book text suited to the user.
Description
Technical Field
The invention relates to the technical field of text recommendation, in particular to a text recommendation method and system based on artificial intelligence and video processing.
Background
With the continuous progress of technology, artificial intelligence has become a hot topic in today's society. In the field of book recommendation, conventional methods generally recommend based on factors such as a user's historical reading records and the popularity of books. However, these methods often fail to accurately meet users' personalized needs. First, although recommendation based on a user's reading history can reflect reading interest to some extent, that interest changes continuously; the history only reflects past behavior and cannot predict future reading needs, so recommendation accuracy is often low. Second, although popularity-based recommendation can surface popular books, popularity does not necessarily reflect how much a given user will like a book, so this method also has low accuracy and cannot accurately meet personalized needs.
How to accurately recommend book text suitable for users is thus an urgent problem to be solved.
Disclosure of Invention
The invention mainly solves the technical problem of how to accurately recommend the book text suitable for the user.
According to a first aspect, the present invention provides a text recommendation method based on artificial intelligence and video processing, comprising: detecting whether a user starts a book reading operation; if the book reading operation is detected, opening the front camera to capture a video of the user reading the book while recording the phone screen to obtain a screen-recorded video; inputting the reading video and the screen-recorded video into an interest paragraph determination model to determine a plurality of initial interest paragraphs in the book; determining a target paragraph based on the text content corresponding to the plurality of initial interest paragraphs; generating a paragraph description image from the text content of the target paragraph using a generative adversarial network; inputting the paragraph description image into a cover determination model to obtain a target book cover; and recommending the target book corresponding to the target book cover to the user.
Further, the interest paragraph determination model is a Transformer model; its input is the video of the user reading the book together with the screen-recorded video, and its output is a plurality of initial interest paragraphs in the book.
Still further, the method further comprises: obtaining an operation indicating that the user is not interested in the target book; in response to that operation, removing the target paragraph from the plurality of initial interest paragraphs to obtain a plurality of culled paragraphs; generating a plurality of paragraph description images using the generative adversarial network based on the text content of the culled paragraphs; obtaining the paragraph description image selected by the user from the plurality of paragraph description images; inputting the selected paragraph description image into the cover determination model to obtain a book cover to be recommended; and recommending the book corresponding to the book cover to be recommended to the user.
Still further, detecting whether the user starts the book reading operation includes detecting whether the user clicks a start-reading button.
Still further, the input of the generative adversarial network is the text content of the target paragraph, and its output is the paragraph description image.
According to a second aspect, the present invention provides a text recommendation system based on artificial intelligence and video processing, comprising: the detection module is used for detecting whether a user starts book reading operation or not;
the acquisition module is used for opening the front camera to acquire the video when the user reads the book and recording the mobile phone screen to acquire the screen recorded video if the user is detected to open the book reading operation;
an initial paragraph determination module for inputting the video of the user reading the book and the screen recorded video into an interest paragraph determination model to determine a plurality of initial interest paragraphs in the book;
a target paragraph determining module, configured to determine a target paragraph based on text contents corresponding to a plurality of initial interest paragraphs in the book;
a paragraph description image generation module for generating a paragraph description image based on the text content of the target paragraph using a generative adversarial network;
the target book cover determining module is used for inputting the paragraph description image into a cover determining model to obtain a target book cover;
and the recommending module is used for recommending the target book corresponding to the target book cover to the user.
Further, the interest paragraph determination model is a Transformer model; its input is the video of the user reading the book together with the screen-recorded video, and its output is a plurality of initial interest paragraphs in the book.
Still further, the system is further configured to:
obtaining an operation indicating that the user is not interested in the target book;
in response to that operation, removing the target paragraph from the plurality of initial interest paragraphs to obtain a plurality of culled paragraphs;
generating a plurality of paragraph description images using the generative adversarial network based on the text content of the culled paragraphs;
acquiring a paragraph description image selected by a user, wherein the paragraph description image selected by the user is a paragraph description image selected by the user from a plurality of paragraph description images;
inputting the paragraph description image selected by the user into the cover determination model to obtain a book cover to be recommended;
and recommending the books to be recommended corresponding to the book covers to be recommended to the user.
Still further, the detection module is further configured to detect whether the user clicks a start-reading button.
Still further, the input of the generative adversarial network is the text content of the target paragraph, and its output is the paragraph description image.
The invention provides a text recommendation method and system based on artificial intelligence and video processing. The method comprises: detecting whether a user starts a book reading operation; if so, opening the front camera to capture a video of the user reading the book while recording the phone screen to obtain a screen-recorded video; inputting the reading video and the screen-recorded video into an interest paragraph determination model to determine a plurality of initial interest paragraphs in the book; determining a target paragraph based on the text content corresponding to the plurality of initial interest paragraphs; generating a paragraph description image from the text content of the target paragraph using a generative adversarial network; inputting the paragraph description image into a cover determination model to obtain a target book cover; and recommending the target book corresponding to the target book cover to the user. The method can accurately recommend book text suited to the user.
Drawings
FIG. 1 is a schematic flow chart of a text recommendation method based on artificial intelligence and video processing according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of recommending books to be recommended according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a text recommendation system based on artificial intelligence and video processing according to an embodiment of the present invention.
Detailed Description
In an embodiment of the invention, a text recommendation method based on artificial intelligence and video processing is provided, as shown in fig. 1. The method comprises the following steps S1 to S7:
step S1, detecting whether a user starts book reading operation.
Whether the user is performing a book reading operation is determined by monitoring the user's behavior or operation, such as clicking a specific button or entering a specific mode. In some embodiments, whether the user initiates the book reading operation may be determined by detecting whether the user clicks a start reading button. As an example, a button of "start reading" is displayed in the mobile phone screen, and if the user clicks the button of "start reading", it is indicated that the user opens the book reading operation.
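As a minimal illustrative sketch (the event dictionary shape, field names, and button identifier below are assumptions for illustration, not taken from the invention), the button-based detection in this step might look like:

```python
def is_reading_started(event):
    """Return True when the event is a tap on the 'start reading' button.

    The event shape and the button identifier are illustrative assumptions;
    a real app would hook its UI framework's click handler instead.
    """
    return event.get("type") == "click" and event.get("target") == "start_reading_button"

# A tap on the "start reading" button opens the book reading operation.
started = is_reading_started({"type": "click", "target": "start_reading_button"})
```

Any other event (a different button, a scroll) would leave the reading operation closed.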
And S2, if the fact that the user starts the book reading operation is detected, opening a front camera to obtain videos when the user reads the book, and recording a mobile phone screen to obtain screen recorded videos.
The front camera is the camera on the user-facing side of the device and is used to capture the user's face.
The video of the user reading the book is a continuous image sequence containing the user's face and actions acquired by the front camera. The video of the user reading the book records the user's behavior and response during the reading process.
The screen-recorded video is a recording of the on-screen content obtained through the phone's screen-recording function. It can capture operations such as scrolling, page turning, and annotating while the user reads the book.
And S3, inputting the video of the user reading the book and the screen recorded video into an interest paragraph determining model to determine a plurality of initial interest paragraphs in the book.
The interest paragraph determination model is a Transformer model; its input is the video of the user reading the book together with the screen-recorded video, and its output is a plurality of initial interest paragraphs in the book. The Transformer model is one implementation of artificial intelligence.

The Transformer establishes global dependencies within the input sequence through a self-attention mechanism and can process the relevance among different elements simultaneously. It can therefore understand the semantics of each element in the reading video and the screen-recorded video and capture the associations between the two. The Transformer model consists of an Encoder and a Decoder, each formed by stacking multiple identical layers. The encoder learns a representation of the input sequence using self-attention and a Feed-Forward Network; the decoder adds a further multi-head attention sub-layer on top of the encoder structure, which attends to the encoder output when generating the target sequence.

The Transformer model can capture both global information and local associations in the input sequence, and so better understand the context of the sequence. The reading video and the screen-recorded video can be treated as time-series data, containing information such as the user's actions and facial expressions during reading and the sequential display of text and pictures on the screen. From these videos, the Transformer can learn the text content and the user's behavioral feedback and map them into a semantic space related to the book content for understanding and analysis. As an example, a user may briefly return to a paragraph or page several times; such repeated gazing or dwelling behavior can be captured effectively by the Transformer model and helps it determine which paragraphs interest the user.
In some embodiments, the interest paragraph determination model includes a video matching layer, a paragraph action determination layer, an interest degree determination layer, and an interest paragraph screening layer, each of which contains a Transformer structure. The video matching layer takes the reading video and the screen-recorded video as input and outputs, for each paragraph of the book, a segmented reading video and a segmented screen-recorded video. The paragraph action determination layer takes these segmented videos as input and outputs, for each paragraph, the reading duration, the facial expression sequence, the user's gesture operations, and the eye action sequence. The interest degree determination layer takes these per-paragraph signals as input and outputs the degree of interest in each paragraph. The interest paragraph screening layer takes the per-paragraph interest degrees as input and outputs the plurality of initial interest paragraphs.
The reading duration, facial expression sequence, gesture operations, and eye action sequence corresponding to each paragraph can be used to judge the user's degree of interest in that paragraph, and thereby determine the plurality of initial interest paragraphs. As an example, the reading duration for a paragraph is how long the user stays on it: a long dwell may indicate interest in the paragraph content, while a short dwell may indicate a lack of interest. The facial expression sequence is the sequence of changes in the user's facial expressions while reading a paragraph and can likewise reflect interest; for instance, if the user shows excited or highly focused expressions, they may be considered interested in the paragraph content. The gesture operations record actions such as page turning, marking, and swiping during reading; if the user marks or highlights a paragraph, this indicates a high level of interest in it. The eye action sequence is the sequence of the user's eye movements. By analyzing it, one can learn where and how intently the user's attention is focused on each paragraph. For example, gazing at a paragraph for a long time may indicate interest in its content, while frequent glances and rapid jumps may indicate that the paragraph is of low interest or difficult to understand.
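The interest-degree determination and screening layers described above are Transformer-based. Purely as a simplified stand-in for the idea they implement (combine per-paragraph reading signals into a score, then keep the top-k paragraphs), one can sketch a fixed weighted sum; the weights, feature names, and signal values below are illustrative assumptions, not taken from the patent:

```python
def interest_score(dwell_seconds, expression_score, gesture_marks, long_fixations):
    """Weighted combination of per-paragraph reading signals (weights assumed)."""
    return (0.4 * dwell_seconds
            + 0.3 * expression_score
            + 0.2 * gesture_marks
            + 0.1 * long_fixations)

def screen_interest_paragraphs(paragraph_signals, k=2):
    """Return the ids of the k paragraphs with the highest interest score."""
    ranked = sorted(paragraph_signals,
                    key=lambda pid: interest_score(*paragraph_signals[pid]),
                    reverse=True)
    return ranked[:k]

# Per-paragraph signals: (dwell seconds, expression score, gesture marks, long fixations).
signals = {
    "para_1": (12.0, 0.2, 0, 1),
    "para_2": (45.0, 0.8, 2, 5),
    "para_3": (30.0, 0.6, 1, 3),
}
top = screen_interest_paragraphs(signals, k=2)
```

In the real model the scoring function is learned rather than hand-weighted, but the screening step, keeping the highest-scoring paragraphs as the initial interest paragraphs, is the same shape.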
And S4, determining a target paragraph based on the text contents corresponding to the initial interest paragraphs in the book.
The target paragraph is the paragraph, screened from the plurality of initial interest paragraphs in the book, whose content is most representative; it is used for the book recommendation. As an example, one target paragraph may be: "a beautiful sunset, an afterglow of the setting sun, is scattered on the lake surface, showing a golden-yellow brilliance".
In some embodiments, the target paragraph may be determined by a target paragraph determination model, which is a deep neural network model. Its input is the text content corresponding to the plurality of initial interest paragraphs in the book, and its output is the target paragraph. Deep neural networks (Deep Neural Networks, DNN) are one implementation of artificial intelligence and may include recurrent neural networks (Recurrent Neural Network, RNN), convolutional neural networks (Convolutional Neural Networks, CNN), and so on. During training, the model optimizes its weights and biases over many iterations, gradually improving its ability to understand text data. The learned representation can capture similarity, relevance, and semantic information between different paragraphs, enabling the model to infer the most representative target paragraph from the initial interest paragraphs it is given. When a new set of initial interest paragraphs is input, the model maps them to the most representative associated target paragraph based on the knowledge learned during training. This mapping rests on the model's abstraction and understanding of text data: it can identify the most representative paragraph through pattern matching, semantic association, and similar mechanisms.
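The patent does not specify how the deep neural network identifies the most representative paragraph. As a hedged, dependency-free sketch of one plausible reading of "most representative", a paragraph can be chosen whose bag-of-words vector is, on average, most similar to all the others (a learned model would use dense embeddings instead):

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector of a paragraph."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (missing keys count as 0)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_representative(paragraphs):
    """Index of the paragraph most similar, on average, to all the others."""
    vecs = [bow(p) for p in paragraphs]
    def avg_sim(i):
        return sum(cosine(vecs[i], vecs[j])
                   for j in range(len(vecs)) if j != i) / (len(vecs) - 1)
    return max(range(len(paragraphs)), key=avg_sim)

paras = [
    "the sunset spread golden light over the quiet lake",
    "golden sunset light shimmered on the lake surface",
    "the stock market closed lower on tuesday",
]
idx = most_representative(paras)
```

The two sunset paragraphs reinforce each other, so one of them is selected over the unrelated third paragraph; that is the centroid-like behavior the deep model's learned representation would approximate.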
And step S5, generating paragraph description images based on the text content of the target paragraphs by using a generation countermeasure network.
The input of the generative adversarial network is the text content of the target paragraph, and its output is the paragraph description image. A Generative Adversarial Network (GAN) comprises a generator and a discriminator that play a game against each other and are continuously optimized, so that the network comes to generate realistic data.

The generative adversarial network learns a feature representation of the data through training. The generator produces realistic image samples, while the discriminator attempts to distinguish generated images from real ones. As training proceeds, the generator gradually learns to produce more realistic images, and the discriminator becomes more accurate at judging which images are generated. In this way, the network can translate the textual content of the target paragraph into a paragraph description image.
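The generator/discriminator game described above corresponds to the standard GAN minimax objective. For the text-to-image case it may be written in conditional form, where t denotes the target paragraph's text (or its embedding), z a noise vector, and p_data the distribution of real images; the conditional notation is an assumption about the setup, not quoted from the patent:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x \mid t)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z \mid t) \mid t)\big)\big]
```

The discriminator D is trained to maximize this value while the generator G is trained to minimize it; conditioning both networks on t is what steers the generated image toward the paragraph's content.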
The paragraph description image is the image, generated by the generative adversarial network, that corresponds to the textual content of the target paragraph.
And S6, inputting the paragraph description image into a cover determination model to obtain the target book cover.
The cover determination model is a deep neural network model. Its input is the paragraph description image, and its output is the target book cover, i.e., the book cover that is most similar to or best matches the paragraph description image.
As an example, suppose the paragraph description image depicts: "on the afternoon of a sunny day, a girl plants bright carnations in a garden with her grandfather, both faces overflowing with happy smiles". The target book cover determined by the cover determination model is then the cover of a related storybook named "Memory in the Sun", which presents a similar scene, a warm moment between the girl and her grandfather, together with the title and author information. The cover determination model learns the association between paragraph description images and book covers through training. Because the paragraph description image generated by the generative adversarial network reflects the content and characteristics of the paragraph comparatively accurately, matching target book covers against paragraph description images improves matching accuracy and precision. Converting the paragraph description into an image and generating a corresponding book cover also provides the user with a more visual and intuitive reading experience: the user can grasp the theme, emotion, and style of a book directly from its cover image, making it easier to select and read books of interest.
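The cover determination model itself is a learned deep neural network. Purely as an illustrative sketch of the matching step it performs, cover selection can be viewed as a nearest-neighbor search over feature vectors; the cover names, the 4-dimensional features, and all values below are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_matching_cover(query_features, cover_library):
    """Id of the cover whose feature vector best matches the query image."""
    return max(cover_library,
               key=lambda cid: cosine(query_features, cover_library[cid]))

# Hypothetical 4-dimensional embeddings standing in for learned image features.
covers = {
    "memory_in_the_sun": [0.9, 0.8, 0.1, 0.0],
    "deep_sea_mystery":  [0.1, 0.0, 0.9, 0.8],
}
query = [0.85, 0.75, 0.2, 0.1]   # features of the paragraph description image
best = best_matching_cover(query, covers)
```

In the actual model the feature extractor and the similarity judgment are learned end to end, but the outcome is the same kind of decision: the cover closest to the paragraph description image in feature space is returned.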
And S7, recommending the target book corresponding to the target book cover to a user.
The target book corresponding to the target book cover is a book corresponding to the target book cover.
In some embodiments, if the user is not interested in the target book, a book to be recommended may be recommended through the flow of fig. 2. Fig. 2 is a schematic flow chart of recommending a book to be recommended according to an embodiment of the present invention; the flow includes steps S21 to S26:
step S21, the uninteresting operation of the user on the target book is obtained.
This step refers to obtaining operations or feedback that the user is not interested in the target book representation. As an example, a user clicking on a "no interest" button or other corresponding operation on an application or platform indicates that the target book is not of interest.
Step S22, in response to the user's not-interested operation on the target book, removing the target paragraph from the plurality of initial interest paragraphs to obtain a plurality of culled paragraphs.

That is, based on the operation indicating that the user is not interested in the target book, the target paragraph is culled from the plurality of initial interest paragraphs that contain it, yielding the plurality of culled paragraphs.
As an example, the plurality of initial interest paragraphs include an "a" segment, a "B" segment, a "C" segment, a "D" segment, and the target paragraph is a "B" segment, and the plurality of culled paragraphs are an "a" segment, a "C" segment, and a "D" segment.
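Using the document's own example, the culling in step S22 reduces to removing the rejected target paragraph from the list while preserving the order of the rest:

```python
def cull_target(initial_interest_paragraphs, target_paragraph):
    """Remove the rejected target paragraph; keep the remaining ones in order."""
    return [p for p in initial_interest_paragraphs if p != target_paragraph]

# The "B" segment is the rejected target paragraph.
remaining = cull_target(["A", "B", "C", "D"], "B")
```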
Step S23, generating a plurality of paragraph description images using the generative adversarial network based on the text content of the plurality of culled paragraphs.

For a description of the generative adversarial network, refer to step S5. The network generates one paragraph description image per culled paragraph from its text content.
Step S24, acquiring a paragraph description image selected by a user, wherein the paragraph description image selected by the user is a paragraph description image selected by the user from a plurality of paragraph description images.
This step refers to the user selecting and confirming a paragraph description image of interest from the generated plurality of paragraph description images. As an example, a user selects one of the presented plurality of paragraph description images, indicating an interest in the corresponding paragraph.
Step S25, the user-selected paragraph description image is input into the cover determination model to obtain a book cover to be recommended.
The cover determination model takes the user-selected paragraph description image as input and generates the corresponding book cover to be recommended. For a description of the cover determination model, refer to step S6.
Step S26, the book to be recommended corresponding to the book cover to be recommended is recommended to the user.
That is, the book cover to be recommended, which was generated from the user-selected paragraph description image, is associated with the corresponding book to be recommended, and that book is recommended to the user.
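Steps S21 to S26 can be sketched end to end as follows. The generator, cover model, user selection, and book lookup are stubbed as callables; every name is a placeholder for illustration, not the claimed implementation.

```python
def rerecommend(initial_paragraphs, target_paragraph, generator, cover_model,
                user_select, recommend):
    # S22: remove the target paragraph the user rejected.
    remaining = [p for p in initial_paragraphs if p != target_paragraph]
    # S23: generate one description image per remaining paragraph.
    images = [generator(p) for p in remaining]
    # S24: the user picks one of the generated images.
    chosen = user_select(images)
    # S25: the cover model turns the chosen image into a book cover.
    cover = cover_model(chosen)
    # S26: recommend the book associated with that cover.
    return recommend(cover)

result = rerecommend(
    ["A", "B", "C"], "B",
    generator=lambda p: f"img({p})",
    cover_model=lambda img: f"cover({img})",
    user_select=lambda imgs: imgs[0],   # stand-in for the user's choice
    recommend=lambda cover: f"book-for-{cover}",
)
print(result)  # book-for-cover(img(A))
```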
Based on the same inventive concept, Fig. 3 is a schematic diagram of a text recommendation system based on artificial intelligence and video processing according to an embodiment of the present invention, where the text recommendation system based on artificial intelligence and video processing includes:
a detection module 31, configured to detect whether a user starts a book reading operation;
an acquiring module 32, configured to, if it is detected that the user starts the book reading operation, open the front camera to capture a video of the user reading the book and simultaneously record the mobile phone screen to obtain a screen-recorded video;
an initial paragraph determination module 33, configured to input the video of the user reading the book and the screen-recorded video into an interest paragraph determination model to determine multiple initial interest paragraphs in the book;
a target paragraph determination module 34, configured to determine a target paragraph based on the text content corresponding to the multiple initial interest paragraphs in the book;
a paragraph description image generation module 35, configured to generate a paragraph description image based on the text content of the target paragraph using a generative adversarial network;
a target book cover determination module 36, configured to input the paragraph description image into a cover determination model to obtain a target book cover;
and a recommendation module 37, configured to recommend the target book corresponding to the target book cover to the user.
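A hypothetical wiring of modules 31 to 37 into a single pipeline, with each model stubbed as a callable. The class and field names are illustrative only; the patent does not prescribe this structure.

```python
from dataclasses import dataclass

@dataclass
class TextRecommendationSystem:
    interest_model: callable   # modules 31-33: (reading video, screen video) -> interest paragraphs
    select_target: callable    # module 34: paragraphs -> target paragraph
    generator: callable        # module 35: paragraph text -> description image
    cover_model: callable      # module 36: description image -> target book cover
    lookup_book: callable      # module 37: cover -> target book

    def recommend(self, reading_video, screen_video):
        paragraphs = self.interest_model(reading_video, screen_video)
        target = self.select_target(paragraphs)
        image = self.generator(target)
        cover = self.cover_model(image)
        return self.lookup_book(cover)

# Stubbed models; string transforms stand in for the neural networks.
system = TextRecommendationSystem(
    interest_model=lambda rv, sv: ["A", "B"],
    select_target=lambda ps: ps[0],
    generator=lambda t: f"img({t})",
    cover_model=lambda i: f"cover({i})",
    lookup_book=lambda c: f"book-for-{c}",
)
print(system.recommend("reading.mp4", "screen.mp4"))  # book-for-cover(img(A))
```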
Claims (10)
1. A text recommendation method based on artificial intelligence and video processing, comprising:
detecting whether a user starts a book reading operation;
if it is detected that the user starts the book reading operation, opening a front camera to capture a video of the user reading the book, and recording a mobile phone screen to obtain a screen-recorded video;
inputting the video of the user reading the book and the screen-recorded video into an interest paragraph determination model to determine multiple initial interest paragraphs in the book;
determining a target paragraph based on text content corresponding to the multiple initial interest paragraphs in the book;
generating a paragraph description image based on the text content of the target paragraph by using a generative adversarial network;
inputting the paragraph description image into a cover determination model to obtain a target book cover;
and recommending the target book corresponding to the target book cover to the user.
2. The text recommendation method based on artificial intelligence and video processing of claim 1, wherein the interest paragraph determination model is a Transformer model, the input of the interest paragraph determination model is the video of the user reading the book and the screen-recorded video, and the output of the interest paragraph determination model is the multiple initial interest paragraphs in the book.
3. The artificial intelligence and video processing based text recommendation method according to claim 1, wherein the method further comprises:
acquiring a not-interested operation of the user on the target book;
in response to the not-interested operation of the user on the target book, removing the target paragraph from the multiple initial interest paragraphs to obtain multiple removed paragraphs;
generating multiple paragraph description images by using the generative adversarial network based on the text content of the multiple removed paragraphs;
acquiring a paragraph description image selected by the user, wherein the user-selected paragraph description image is a paragraph description image selected by the user from the multiple paragraph description images;
inputting the user-selected paragraph description image into the cover determination model to obtain a book cover to be recommended;
and recommending the book to be recommended corresponding to the book cover to be recommended to the user.
4. The text recommendation method based on artificial intelligence and video processing according to claim 1, wherein the detecting whether a user starts a book reading operation comprises: detecting whether the user clicks a start-reading button.
5. The text recommendation method based on artificial intelligence and video processing of claim 1, wherein the input of the generative adversarial network is the text content of the target paragraph, and the output of the generative adversarial network is the paragraph description image.
6. A text recommendation system based on artificial intelligence and video processing, comprising:
a detection module, configured to detect whether a user starts a book reading operation;
an acquisition module, configured to, if it is detected that the user starts the book reading operation, open a front camera to capture a video of the user reading the book and record a mobile phone screen to obtain a screen-recorded video;
an initial paragraph determination module, configured to input the video of the user reading the book and the screen-recorded video into an interest paragraph determination model to determine multiple initial interest paragraphs in the book;
a target paragraph determination module, configured to determine a target paragraph based on text content corresponding to the multiple initial interest paragraphs in the book;
a paragraph description image generation module, configured to generate a paragraph description image based on the text content of the target paragraph using a generative adversarial network;
a target book cover determination module, configured to input the paragraph description image into a cover determination model to obtain a target book cover;
and a recommendation module, configured to recommend the target book corresponding to the target book cover to the user.
7. The text recommendation system based on artificial intelligence and video processing of claim 6, wherein the interest paragraph determination model is a Transformer model, the input of the interest paragraph determination model is the video of the user reading the book and the screen-recorded video, and the output of the interest paragraph determination model is the multiple initial interest paragraphs in the book.
8. The artificial intelligence and video processing based text recommendation system according to claim 6, wherein the system is further configured to:
acquiring a not-interested operation of the user on the target book;
in response to the not-interested operation of the user on the target book, removing the target paragraph from the multiple initial interest paragraphs to obtain multiple removed paragraphs;
generating multiple paragraph description images by using the generative adversarial network based on the text content of the multiple removed paragraphs;
acquiring a paragraph description image selected by the user, wherein the user-selected paragraph description image is a paragraph description image selected by the user from the multiple paragraph description images;
inputting the user-selected paragraph description image into the cover determination model to obtain a book cover to be recommended;
and recommending the book to be recommended corresponding to the book cover to be recommended to the user.
9. The text recommendation system based on artificial intelligence and video processing of claim 6, wherein the detection module is further configured to: detect whether the user clicks a start-reading button.
10. The text recommendation system based on artificial intelligence and video processing of claim 6, wherein the input of the generative adversarial network is the text content of the target paragraph, and the output of the generative adversarial network is the paragraph description image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410078311.3A CN117591697B (en) | 2024-01-19 | 2024-01-19 | Text recommendation method and system based on artificial intelligence and video processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117591697A true CN117591697A (en) | 2024-02-23 |
CN117591697B CN117591697B (en) | 2024-03-29 |
Family
ID=89922412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410078311.3A Active CN117591697B (en) | 2024-01-19 | 2024-01-19 | Text recommendation method and system based on artificial intelligence and video processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117591697B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103839062A (en) * | 2014-03-11 | 2014-06-04 | 东方网力科技股份有限公司 | Image character positioning method and device |
CN104298682A (en) * | 2013-07-18 | 2015-01-21 | 广州华久信息科技有限公司 | Information recommendation effect evaluation method and mobile phone based on facial expression images |
CN107169002A (en) * | 2017-03-31 | 2017-09-15 | 咪咕数字传媒有限公司 | A kind of personalized interface method for pushing and device recognized based on face |
CN107679070A (en) * | 2017-08-22 | 2018-02-09 | 科大讯飞股份有限公司 | Intelligent reading recommendation method and device and electronic equipment |
CN109213932A (en) * | 2018-08-09 | 2019-01-15 | 咪咕数字传媒有限公司 | Information pushing method and device |
CN111930667A (en) * | 2020-07-09 | 2020-11-13 | 上海连尚网络科技有限公司 | Method and device for book recommendation in reading application |
CN111931062A (en) * | 2020-08-28 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Training method and related device of information recommendation model |
CN113642673A (en) * | 2021-08-31 | 2021-11-12 | 北京字跳网络技术有限公司 | Image generation method, device, equipment and storage medium |
US20220253721A1 (en) * | 2021-01-30 | 2022-08-11 | Walmart Apollo, Llc | Generating recommendations using adversarial counterfactual learning and evaluation |
US20220343100A1 (en) * | 2021-04-23 | 2022-10-27 | Ping An Technology (Shenzhen) Co., Ltd. | Method for cutting video based on text of the video and computing device applying method |
CN115461793A (en) * | 2020-02-29 | 2022-12-09 | 具象有限公司 | System and method for interactive multimodal book reading |
CN115797488A (en) * | 2022-11-28 | 2023-03-14 | 科大讯飞股份有限公司 | Image generation method and device, electronic equipment and storage medium |
US20230196000A1 (en) * | 2021-12-21 | 2023-06-22 | Woongjin Thinkbig Co., Ltd. | System and method for providing personalized book |
CN116485948A (en) * | 2023-04-30 | 2023-07-25 | 上海芯赛云计算科技有限公司 | Text image generation method and system based on recommendation algorithm and diffusion model |
CN117216535A (en) * | 2023-02-16 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Training method, device, equipment and medium for recommended text generation model |
Non-Patent Citations (2)
Title |
---|
QU W et al.: "A novel approach based on multi-view content analysis and semi-supervised enrichment for movie recommendation", Journal of Computer Science and Technology, 30 September 2013 (2013-09-30), page 776, XP035351298, DOI: 10.1007/s11390-013-1376-7 *
CAO Bin et al.: "E-book recommendation service based on combining user implicit feedback and collaborative filtering", Journal of Chinese Computer Systems (《小型微型计算机系统》), vol. 38, no. 2, 30 June 2015 (2015-06-30), pages 252-255 *
Also Published As
Publication number | Publication date |
---|---|
CN117591697B (en) | 2024-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6431231B1 (en) | Imaging system, learning apparatus, and imaging apparatus | |
Del Molino et al. | Summarization of egocentric videos: A comprehensive survey | |
Newman et al. | Multimodal memorability: Modeling effects of semantics and decay on video memorability | |
CN102207950B (en) | Electronic installation and image processing method | |
CN103988202A (en) | Image attractiveness based indexing and searching | |
CN101783886A (en) | Information processing apparatus, information processing method, and program | |
CN114390217B (en) | Video synthesis method, device, computer equipment and storage medium | |
CN103959227A (en) | Life-logging and memory sharing | |
US10755087B2 (en) | Automated image capture based on emotion detection | |
WO2019196795A1 (en) | Video editing method, device and electronic device | |
KR20190053481A (en) | Apparatus and method for user interest information generation | |
Maybury | Multimedia information extraction: Advances in video, audio, and imagery analysis for search, data mining, surveillance and authoring | |
Sharma et al. | Emotion-based music recommendation system | |
CN115909390B (en) | Method, device, computer equipment and storage medium for identifying low-custom content | |
CN113079420A (en) | Video generation method and device, electronic equipment and computer readable storage medium | |
Jishan et al. | Hybrid deep neural network for bangla automated image descriptor | |
CN117591697B (en) | Text recommendation method and system based on artificial intelligence and video processing | |
US20230066331A1 (en) | Method and system for automatically capturing and processing an image of a user | |
CN114443916A (en) | Supply and demand matching method and system for test data | |
Yang et al. | Learning the synthesizability of dynamic texture samples | |
CN111931510B (en) | Intention recognition method and device based on neural network and terminal equipment | |
Venkatesh et al. | “You Tube and I Find”—Personalizing multimedia content access | |
Hoy | Deep learning and online video: Advances in transcription, automated indexing, and manipulation | |
Vayadande et al. | The Rise of AI‐Generated News Videos: A Detailed Review | |
Shah et al. | Video to text summarisation and timestamp generation to detect important events |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||