CN115935049A - Recommendation processing method and device based on artificial intelligence and electronic equipment


Info

Publication number: CN115935049A
Application number: CN202111005050.5A
Authority: CN (China)
Prior art keywords: candidate, recommendation information, recommendation, recommended, target
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Inventor: 程川
Current Assignee: Tencent Technology Shenzhen Co Ltd (listed assignees may be inaccurate)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Priority: CN202111005050.5A


Abstract

The application provides an artificial intelligence-based recommendation processing method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring an image of an object to be recommended, where the image is captured while the object to be recommended views information in an environment; performing recognition processing on the image to obtain at least one of a state feature of the object to be recommended and an environment feature of the environment; acquiring target recommendation information for the object to be recommended, and generating a plurality of candidate display modes of the target recommendation information; determining, from the plurality of candidate display modes, a target display mode adapted to at least one of the state feature and the environment feature; and displaying the target recommendation information according to the target display mode. Through the method and apparatus, recommendation accuracy can be improved by means of images captured in real time.

Description

Recommendation processing method and device based on artificial intelligence and electronic equipment
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a recommendation processing method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results.
Information recommendation is an important application of artificial intelligence. A ranking module in a recommendation system usually predicts click-through rates based on a machine learning model and ranks candidates accordingly, taking highly scored candidates as priority recommendation objects. Various efforts have been made in the related art to improve the click-through rate prediction accuracy of the machine learning model, for example, constructing a large amount of feature data in the feature engineering stage so that the machine learning model can learn sufficiently. However, even when the matching degree between the instant recommendation information and the object to be recommended is very high, there are still cases where no interactive behavior is generated.
Disclosure of Invention
The embodiments of the present application provide an artificial intelligence-based recommendation processing method and apparatus, an electronic device, and a computer-readable storage medium, which can improve recommendation accuracy by means of images captured in real time.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a recommendation processing method based on artificial intelligence, which comprises the following steps:
acquiring an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches information in the environment;
performing identification processing on the image to obtain at least one of the state characteristic of the object to be recommended and the environmental characteristic of the environment;
acquiring target recommendation information for the object to be recommended and generating a plurality of candidate display modes of the target recommendation information;
determining a target display mode which is adaptive to at least one of the state characteristic and the environment characteristic from the plurality of candidate display modes;
and displaying the target recommendation information according to the target display mode.
The embodiment of the application provides a recommendation processing apparatus based on artificial intelligence, including:
the device comprises a shooting module, a recommendation module and a recommendation module, wherein the shooting module is used for obtaining an image of an object to be recommended, and the image is obtained by shooting when the object to be recommended watches information in an environment;
the identification module is used for identifying the image to obtain at least one of the state characteristic of the object to be recommended and the environment characteristic of the environment;
the recommendation module is used for acquiring target recommendation information for the object to be recommended and generating a plurality of candidate display modes of the target recommendation information;
a display module for determining a target display mode adapted to at least one of the status feature and the environmental feature from the plurality of candidate display modes;
the display module is further configured to display the target recommendation information according to the target display mode.
In the foregoing solution, the recommending module is further configured to: acquiring recommendation information of a plurality of candidates for the object to be recommended; target recommendation information adapted to at least one of the status feature and the environmental feature is determined from the plurality of candidate recommendation information.
In the foregoing solution, the recommending module is further configured to: performing the following processing for each of the candidate recommendation information: acquiring content characteristics of the candidate recommendation information and portrait characteristics of the object to be recommended; performing first feature cross processing on at least one of the content features of the candidate recommendation information, the portrait features of the object to be recommended, and the state features and the environment features to obtain a first feature cross processing result; performing logistic regression processing on the first feature cross processing result to obtain a first recommendation index of the candidate recommendation information; and determining the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold value as target recommendation information.
In the foregoing solution, the recommending module is further configured to: selecting a plurality of candidate recommendation information corresponding to the first recommendation index exceeding a first recommendation index threshold; selecting at least one piece of recommendation information meeting diversity conditions from the selected multiple candidate recommendation information as target recommendation information; wherein the diversity condition specifies a maximum number of target recommendation information belonging to the same category.
In the foregoing solution, the recommending module is further configured to: generating at least one candidate template of the target recommendation information; generating at least one candidate summary of the target recommendation information; generating at least one candidate cover of the target recommendation information; combining the at least one candidate template, the at least one candidate summary, and the at least one candidate cover in different manners to obtain the plurality of candidate display modes; wherein the candidate summary comprises at least one of: the title of the target recommendation information, the recommendation reason of the target recommendation information, and an introduction of the target recommendation information.
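As an illustrative sketch only (not part of the claimed solution), the combination of candidate components into display modes can be expressed as a Cartesian product; all component values below are assumed placeholders:

```python
# Hedged sketch: candidate display modes as the Cartesian product of candidate
# templates, summaries, and covers; the component values are assumptions.
from itertools import product

candidate_templates = ["large-cover template", "text-first template"]
candidate_summaries = ["title summary", "recommendation-reason summary"]
candidate_covers = ["cover frame 1", "cover frame 2"]

candidate_display_modes = [
    {"template": t, "summary": s, "cover": c}
    for t, s, c in product(candidate_templates, candidate_summaries, candidate_covers)
]
# 2 x 2 x 2 = 8 candidate display modes; the target display mode adapted to the
# state feature and/or environment feature is selected from these later.
```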
In the foregoing solution, the recommending module is further configured to: performing at least one of the following processes: extracting key content of the target recommendation information, and generating the candidate summary according to the key content; determining an interactive object having an interactive behavior with the target recommendation information, and generating the candidate summary according to the interactive object; and determining association information whose association degree with the target recommendation information exceeds an association degree threshold, and generating the candidate summary according to the association information.
In the foregoing solution, the recommending module is further configured to: when the target recommendation information is a recommendation video, dividing the recommendation video into a plurality of shots, wherein each shot comprises a plurality of continuous video frames; performing the following processing for each of the shots: clustering a plurality of video frames of the shot to obtain a plurality of clusters under the shot; and determining the video frame in each cluster closest to the corresponding cluster center, and determining the closest video frame as the candidate cover.
In the foregoing solution, the recommending module is further configured to: performing the following for each video frame of the recommended video: determining the position of each pixel point of the video frame; any two continuous video frames are combined into a segmentation unit, and the following processing is executed for each segmentation unit of the recommended video: determining gray level difference values of pixel points at the same positions of two video frames of the segmentation unit, and averaging the gray level difference values of the pixel points at multiple positions to obtain difference values of the segmentation unit; and determining that the segmentation unit corresponding to the difference value larger than the difference value threshold value has a shot boundary, and dividing two video frames of the segmentation unit into different shots.
In the foregoing solution, the recommending module is further configured to: composing the plurality of consecutive video frames into a video frame set; randomly selecting N video frames from the video frame set, taking image features corresponding to the N video frames as initial cluster centers of a plurality of cluster sets, and removing the N video frames from the video frame set, where N is the number of candidate covers corresponding to the shot, and N is an integer greater than or equal to 2; initializing the number of iterations of the clustering processing to M, and establishing an empty set corresponding to each cluster, where M is an integer greater than or equal to 2; performing the following processing in each iteration of the clustering processing: updating each cluster set, performing cluster center generation processing based on the updating result to obtain a new cluster center of each cluster, and, when the new cluster center differs from the initial cluster center, adding the video frame corresponding to the initial cluster center back to the video frame set and updating the initial cluster center based on the new cluster center; and determining the cluster sets obtained after M iterations as the clustering result, or determining the cluster sets obtained after m iterations as the clustering result, where the cluster centers obtained after m iterations are the same as those obtained after m-1 iterations, m is an integer variable, and 2 ≤ m ≤ M.
In the foregoing solution, the recommending module is further configured to: performing the following for each of the clusters: averaging the image characteristics of each video frame of the cluster to obtain a cluster center of the cluster; and determining the video frame closest to the clustering center according to the distance between the image characteristic of each video frame of the cluster and the clustering center.
In the foregoing solution, the identification module is further configured to perform at least one of the following processes: performing state recognition processing on the image to obtain state characteristics of the object to be recommended; and carrying out environment recognition processing on the image to obtain the environmental characteristics of the environment.
In the foregoing solution, the display module is configured to execute the following processing for each candidate display mode: extracting text features and image features of the candidate display mode; performing second feature cross processing on at least one of the text features and image features of the candidate display mode, the portrait features, the state features, and the environment features to obtain a second feature cross processing result; performing logistic regression processing on the second feature cross processing result to obtain a second recommendation index of the candidate display mode of the target recommendation information; and determining the candidate display mode corresponding to the second recommendation index exceeding the second recommendation index threshold as the target display mode.
The embodiment of the application provides a recommendation processing method based on artificial intelligence, which comprises the following steps:
acquiring an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches information in the environment;
identifying the image to obtain at least one of the state characteristic of the object to be recommended and the environment characteristic of the environment;
acquiring recommendation information of a plurality of candidates for the object to be recommended;
determining target recommendation information adapted to at least one of the state feature and the environmental feature from the plurality of candidate recommendation information;
and displaying the target recommendation information.
The embodiment of the application provides a recommendation processing apparatus based on artificial intelligence, including:
the device comprises a shooting module, a recommendation module and a recommendation module, wherein the shooting module is used for obtaining an image of an object to be recommended, and the image is obtained by shooting when the object to be recommended watches information in an environment;
the identification module is used for identifying the image to obtain at least one of the state characteristic of the object to be recommended and the environment characteristic of the environment;
the recommending module is used for acquiring recommending information of a plurality of candidates aiming at the object to be recommended;
the recommendation module is further used for determining target recommendation information which is adaptive to at least one of the state characteristic and the environment characteristic from the plurality of candidate recommendation information;
and the display module is used for displaying the target recommendation information.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the recommendation processing method based on artificial intelligence provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for realizing the recommendation processing method based on artificial intelligence provided by the embodiment of the application when being executed by a processor.
The embodiment of the application has the following beneficial effects:
the method comprises the steps of obtaining an image obtained by shooting an object (user) to be recommended when the user watches information in an environment, obtaining at least one of state characteristics and environment characteristics of the user, subsequently determining a target display mode adaptive to the at least one of the state characteristics and the environment characteristics from a plurality of candidate display modes, and displaying target recommendation information according to the target display mode.
Drawings
FIG. 1 is a schematic diagram of recommendation logic for a recommendation processing method provided in the related art;
FIG. 2 is a schematic diagram of an architecture of an artificial intelligence based recommendation processing system provided by an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIGS. 4A-4D are schematic flowcharts of artificial intelligence-based recommendation processing methods provided by embodiments of the present application;
FIG. 5 is a schematic flowchart of a recommendation processing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of a recommendation processing method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of key frame extraction provided by an embodiment of the present application;
FIG. 8 is a flowchart of image processing provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a display process provided in an embodiment of the present application;
FIG. 10 is a product interface diagram of an artificial intelligence based recommendation processing method provided by an embodiment of the application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application will be described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without making creative efforts fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and "third" are used only to distinguish similar objects and do not denote a particular order. It may be understood that "first", "second", and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions referred to in the embodiments of the present application are explained as follows.
1) Feature: a data representation that can effectively reflect different characteristics of the problem.
2) Feature extraction: the process of converting raw data into a more effective representation for the problem to be solved;
3) Feature engineering: the process of using relevant knowledge in the data domain to create features that enable an algorithm to achieve optimal performance;
4) User portrait: a user model abstracted from information such as the user's social attributes, living habits, and consumption behaviors;
5) Personalized recommendation: a technology for pushing different content to different users based on user portraits;
6) Personalized display: a technique for displaying the same content to a user in different ways based on the user portrait.
Referring to fig. 1, fig. 1 is a schematic diagram of the recommendation logic of a recommendation processing method provided in the related art. User features, such as age, residence, gender, occupation, school, favorite stars, and frequently viewed information types, are extracted, and a user portrait is constructed based on them. Content features, such as content category, content keywords, the work the content corresponds to, and content popularity, are also extracted, and the content features and user features are used together as input features of a recommendation model. The machine-learned recommendation model performs click-through rate prediction according to the input features, recommended content is generated for the user based on the prediction result, and the recommended content is finally displayed to the user.
When videos are pushed to a user in a personalized manner, the portrait features represented by the user's historical viewing behaviors need to be matched with the content features of candidate videos to determine the most suitable videos to push. Portrait features are therefore very important for a recommendation system and are an important guarantee for improving user interest and video click-through rate. Useful features are usually extracted from a large amount of training data, and a model is obtained through machine learning training; the effectiveness of the model depends to a large extent on whether the features extracted from the raw data by feature engineering are effective.
The applicant has found that, in the recommendation process, both which video is recommended and how it is presented affect whether the user views the content, and the related art has the following disadvantages. The same content may be recommended to different users for different reasons, yet the same display modes, such as titles and covers, are shown to all of them, so a user may not perceive what is interesting about the recommended content and therefore may not click into the detail page of the recommendation information to view its detailed content and related content. Moreover, a user portrait is a representation of long-term behavior and cannot reflect the user's current mood and the environment in which the recommendation information is viewed, so the recommendation information or its display mode is likely to be inconsistent with the user's current mood and viewing environment, which disturbs the user.
In view of the foregoing technical problems, embodiments of the present application provide an artificial intelligence-based recommendation processing method and apparatus, an electronic device, and a computer-readable storage medium, which can automatically generate video summaries for different users according to their current expression states and user portraits. In this way, users are more receptive to the content pushed by the recommendation system, and recommendation effectiveness is improved.
The recommendation processing method provided by the embodiment of the application may be implemented by various electronic devices, for example, may be implemented by a terminal or a server alone, or may be implemented by cooperation of the terminal and the server.
Referring to fig. 2, fig. 2 is a schematic diagram of an architecture of an artificial intelligence based recommendation processing system provided in an embodiment of the present application, in which a terminal 400 is connected to a server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
In some embodiments, the functions of the artificial intelligence based recommendation processing system are implemented based on the server 200 and the terminal 400, and in the process of using the terminal 400 by the user, in response to the terminal 400 receiving the refresh operation of the user, the terminal 400 acquires an image of the user, wherein the image is obtained by shooting when the user watches information in the environment; the terminal 400 sends the image and the recommendation request to the server 200, and the server 200 identifies the image to obtain at least one of the state characteristics of the user and the environmental characteristics of the environment; acquiring target recommendation information for a user through a server 200, and generating a plurality of candidate display modes of the target recommendation information; determining a target display mode adapted to at least one of the state characteristic and the environment characteristic from a plurality of candidate display modes; the server 200 returns the target display mode and the target recommendation information to the terminal 400, and displays the target recommendation information on the terminal 400 according to the target display mode.
In some embodiments, when the recommendation processing system is applied to a video recommendation scene, in response to the terminal 400 receiving a refresh operation of a user, the terminal 400 obtains an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches a video in an environment; the terminal 400 sends the image to the server 200, and the server 200 identifies the image to obtain at least one of the state characteristic of the object to be recommended and the environmental characteristic of the environment; acquiring a target recommendation video for an object to be recommended through a server 200, and generating a plurality of candidate display modes of the target recommendation video; determining a target display mode adapted to at least one of the state characteristic and the environment characteristic from a plurality of candidate display modes; the server 200 returns the target display mode and the target recommended video to the terminal 400, and displays the target recommended video on the terminal 400 according to the target display mode.
In other embodiments, when the recommendation processing method provided by the embodiment of the present application is implemented by a terminal alone, in various application scenarios described above, in response to the terminal 400 receiving a refresh operation of a user, the terminal 400 acquires an image of an object to be recommended, where the image is obtained by shooting when the object to be recommended views information in an environment, performs recognition processing on the image, obtains at least one of a state feature of the object to be recommended and an environment feature of the environment, acquires target recommendation information for the object to be recommended, generates a plurality of candidate display modes of the target recommendation information, determines a target display mode adapted to at least one of the state feature and the environment feature from the plurality of candidate display modes, and displays the target recommendation information according to the target display mode.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as cloud services, a cloud database, cloud computing, cloud functions, cloud storage, a network service, cloud communication, middleware services, domain name services, security services, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart television, a smart car device, and the like, and the terminal 400 may be provided with a client, for example, but not limited to, a video client, a browser client, an information flow client, an image capturing client, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Next, a structure of an electronic device for implementing the artificial intelligence based recommendation processing method according to the embodiment of the present application is described, and as described above, the electronic device according to the embodiment of the present application may be the server 200 or the terminal 400 in fig. 2. Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device provided in the embodiment of the present application, and the electronic device is taken as a terminal 400 for example. The terminal 400 shown in fig. 3 includes: at least one processor 410, memory 450, at least one network interface 420. The various components in the terminal 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in FIG. 3.
The processor 410 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other electronic devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the artificial intelligence based recommendation processing apparatus provided in the embodiments of the present application may be implemented in software, and fig. 3 shows an artificial intelligence based recommendation processing apparatus 455-1 stored in a memory 450, which may be software in the form of programs and plug-ins, and includes the following software modules: a photographing module 4551-1, a recognition module 4552-1, a recommendation module 4553-1 and a display module 4554-1, and fig. 3 shows an artificial intelligence based recommendation processing apparatus 455-2 stored in a memory 450, which may be software in the form of programs and plug-ins, etc., including the following software modules: a photographing module 4551-2, an identification module 4552-2, a recommendation module 4553-2, and a display module 4554-2, which are logical and thus may be arbitrarily combined or further divided according to functions implemented, and functions of the respective modules will be described hereinafter.
The following describes an artificial intelligence based recommendation processing method provided by the embodiment of the present application, in conjunction with an exemplary application and implementation of the terminal 400 provided by the embodiment of the present application.
Referring to fig. 4A, fig. 4A is a schematic flowchart of an artificial intelligence-based recommendation processing method provided in an embodiment of the present application, which will be described with reference to the steps shown in fig. 4A.
In step 101, an image of an object to be recommended is acquired.
As an example, the object to be recommended is a user using a terminal. Taking a video APP as an example, the object to be recommended is a user using the video APP on the terminal, and the image is captured while the object to be recommended views information in an environment; for example, when user A views the home page information of the video APP on a subway, the image is captured while user A views the home page information of the video APP in the subway environment. The image may be a photo or a video, that is, a captured picture or a video frame in a captured video.
In step 102, the image is identified to obtain at least one of the state feature of the object to be recommended and the environment feature of the environment.
In some embodiments, the step 102 of performing recognition processing on the image to obtain at least one of the state feature of the object to be recommended and the environment feature of the environment may be implemented by the following technical solutions: performing at least one of the following processes: performing state recognition processing on the image to obtain state characteristics of an object to be recommended; and carrying out environment recognition processing on the image to obtain the environmental characteristics of the environment.
As an example, referring to fig. 8, fig. 8 is an image processing flowchart provided in an embodiment of the present application. Image capture is performed only with user authorization; for example, in response to an authorization operation of the object to be recommended, the recommendation client permanently obtains the right to photograph the user, or the recommendation client obtains that right only while the user is using the recommendation client and a recommendation request is triggered. After the image is obtained, a state feature representing the user's mood state and an environment feature of the environment in which the recommendation APP is viewed can be obtained based on an expression recognition model (mood recognition model) and a scene recognition model (environment recognition model), where the expression recognition model and the scene recognition model may be neural network models obtained through machine learning training.
As an example, in the training process of the expression recognition model, an image sample and a pre-labeled state are input to the expression recognition model, the prediction state of the image sample is predicted by the expression recognition model, the error between the prediction state and the pre-labeled state is used as a first loss, and the parameters of the expression recognition model are updated with the goal of minimizing the first loss, thereby obtaining the trained expression recognition model. In the training process of the scene recognition model, an image sample and a pre-labeled environment are input to the scene recognition model, the prediction environment of the image sample is predicted by the scene recognition model, the error between the prediction environment and the pre-labeled environment is used as a second loss, and the parameters of the scene recognition model are updated with the goal of minimizing the second loss, thereby obtaining the trained scene recognition model.
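As a minimal sketch of this training procedure, assuming a PyTorch image classifier; the backbone, the number of state labels, and the hyperparameters are illustrative assumptions rather than details specified by this application:

```python
# Sketch of the supervised training described above for the expression
# recognition model; the scene recognition model is trained analogously with
# environment labels and a second loss. Architecture and data are assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_STATES = 7  # hypothetical number of mood-state labels (happy, sad, ...)

expression_model = models.resnet18(num_classes=NUM_STATES)
criterion = nn.CrossEntropyLoss()  # error between prediction and pre-labeled state
optimizer = torch.optim.Adam(expression_model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, state_labels: torch.Tensor) -> float:
    """One parameter update aimed at minimizing the first loss."""
    optimizer.zero_grad()
    predicted_states = expression_model(images)  # predict the state of each sample
    first_loss = criterion(predicted_states, state_labels)
    first_loss.backward()
    optimizer.step()
    return first_loss.item()
```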
In step 103, target recommendation information for the object to be recommended is acquired, and a plurality of candidate display modes of the target recommendation information are generated.
In some embodiments, referring to fig. 4B, fig. 4B is a flowchart of a recommendation processing method based on artificial intelligence provided in an embodiment of the present application, and the acquiring of the target recommendation information for the object to be recommended in step 103 may be implemented by steps 1031 to 1032, which will be described with reference to steps 1031 to 1032 shown in fig. 4B.
In step 1031, recommendation information for a plurality of candidates of an object to be recommended is acquired.
As an example, the candidate recommendation information may be at least one of: text, video, or image, and the candidate recommendation information is obtained through recall processing, sorting processing, or reordering processing.
In step 1032, target recommendation information adapted to at least one of the state feature and the environmental feature is determined from the plurality of candidate recommendation information.
In some embodiments, the determining of the target recommendation information adapted to at least one of the state feature and the environmental feature from the plurality of candidate recommendation information may be implemented by the following technical solutions: the following processing is performed for each candidate recommendation information: acquiring content characteristics of candidate recommendation information and portrait characteristics of an object to be recommended; performing first feature cross processing on at least one of content features of the candidate recommendation information, portrait features of the object to be recommended, and state features and environment features to obtain a first feature cross processing result; performing logistic regression processing on the first feature cross processing result to obtain a first recommendation index of candidate recommendation information; and determining the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold value as target recommendation information.
By way of example, the portrait features are single features or combined features related to user portrait data; a single feature may be the user's age or occupation, and a combined feature may combine different dimensions of the user, such as the user's age and occupation. The content features are single features or combined features related to the recommendation information; a single feature may be the publisher or the category of the recommendation information, and a combined feature may combine different dimensions of the recommendation information. The state features and the environment features are obtained by recognizing the image: the state features represent the user's mood state, such as happy or sad, and the environment features represent the environment in which the recommendation APP is viewed, such as a subway or a library. The first feature cross processing is a Cartesian cross or a factorization machine cross, that is, the above features are combined in different ways, the features used in each combination being partially or completely different, to form a plurality of combined features of the recommendation information. The first recommendation index factor of each combined feature is used as a weighting parameter, and the combined features of the recommendation information are weighted and summed to obtain the first recommendation index of the recommendation information, where the first recommendation index factor of a combined feature is the product of the first recommendation index correlation factors of the features included in the combined feature, and the first recommendation index correlation factor is an updatable parameter in the recommendation model training process. The logistic regression processing performed on the first feature cross processing result may be linear processing, or linear processing combined with a logistic regression equation. For linear processing, the first recommendation index, for example, the click-through rate, is obtained by linearly combining the first feature cross processing results, as shown in formula (1):
z = w_0 + \sum_{i=1}^{n} w_i x_i \qquad (1)
where w_0 is the bias of the offset processing, and w_i is the first recommendation index factor corresponding to each first feature cross processing result x_i.
In some embodiments, based on the above linear processing, the linear combination may also be substituted into the logistic regression equation (2), so that the linear combination of the features is used as the independent variable. Since the value range of the independent variable is negative infinity to positive infinity, the logistic regression equation maps it onto (0,1), and the result obtained after substitution into the logistic regression equation is used as the first recommendation index, such as the click-through rate:
g = \frac{1}{1 + e^{-z}} \qquad (2)
where z is a linear combination of the features shown in equation (1), and g is the first recommendation index.
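Formulas (1) and (2) can be sketched in code as follows; the cross-feature values, weights, and recommendation index threshold are assumptions for illustration:

```python
# Sketch of formulas (1) and (2): a weighted linear combination of the first
# feature cross processing results, mapped to (0, 1) by the logistic equation.
import math

def first_recommendation_index(x: list[float], w: list[float], w0: float) -> float:
    """g = 1 / (1 + e^(-z)), where z = w0 + sum_i(w_i * x_i)."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))  # formula (1)
    return 1.0 / (1.0 + math.exp(-z))              # formula (2)

# Hypothetical usage: crossed features of one piece of candidate recommendation
# information, e.g. (age x category), (mood x publisher), ...
crossed = [1.0, 0.0, 0.5, 1.0]
weights = [0.8, -0.3, 0.4, 0.1]  # learned first recommendation index factors
ctr = first_recommendation_index(crossed, weights, w0=-0.5)
is_target = ctr > 0.6  # keep candidates whose index exceeds the threshold
```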
In some embodiments, the determining, as the target recommendation information, the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold may be implemented by the following technical solutions: selecting a plurality of candidate recommendation information corresponding to the first recommendation index exceeding a first recommendation index threshold; selecting at least one piece of recommendation information meeting diversity conditions from the selected plurality of candidate recommendation information as target recommendation information; wherein the diversity condition specifies a maximum number of target recommendation information belonging to the same category.
As an example, selecting at least one piece of recommendation information meeting the diversity condition from the selected candidate recommendation information as the target recommendation information may be implemented as follows. The candidate recommendation information forms a first information set, and the recommendation information with the highest first recommendation index in the first information set is transferred to a second information set as the first information of the second information set. When the number of pieces of recommendation information in the second information set is smaller than the information number threshold, the semantic distance between the first information and the recommendation information with the highest first recommendation index remaining in the first information set is determined; when the semantic distance is greater than the semantic distance threshold, that information is transferred from the first information set to the second information set and updated as the first information of the second information set. The transfer process may also be based only on the first recommendation index; for example, first recommendation index prediction is performed on 10,000 pieces of recommendation information respectively, the 10,000 pieces are then ranked from high to low by recommendation index, and the 200 pieces with the highest indexes may be selected as head recommendation information and transferred to the second information set. The numbers of pieces of recommendation information in the first and second information sets can be set according to actual needs, and the information number threshold serves as the minimum number of pieces of recommendation information in the second information set.
In some embodiments, the information number threshold is the number of pieces of information that the second information set needs to contain after deduplication processing. When the number of pieces of information in the second information set is smaller than the information number threshold, the preset number has not been reached, which means deduplication processing needs to continue on the first information set.
In some embodiments, when the semantic distance is greater than the semantic distance threshold, the recommendation information with the highest first recommendation index in the first information set is deleted from the first information set, added to the second information set, and updated as the first information of the second information set. Semantic distance determination then continues between the recommendation information with the highest first recommendation index remaining in the first information set and the information just updated as the first information of the second information set, so as to keep screening out, from the first information set, recommendation information that is dissimilar to the recommendation information most recently transferred to the second information set. Similarity is defined by the semantic distance threshold: when the semantic distance between two pieces of recommendation information is greater than the threshold, they are determined to be dissimilar; otherwise they are determined to be similar, that is, to belong to the same category, for example, the game category, where the category hierarchy can be determined according to actual conditions.
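A hedged sketch of this greedy transfer between the first and second information sets follows; the semantic distance function, thresholds, and data shapes are placeholders, not specified by this application:

```python
# Sketch of diversity selection: repeatedly move the highest-index candidate to
# the second set if it is dissimilar to the most recently transferred item.
def select_diverse(candidates, semantic_distance, dist_threshold, target_count):
    """candidates: list of (info, first_recommendation_index) pairs."""
    first_set = sorted(candidates, key=lambda c: c[1], reverse=True)
    second_set = [first_set.pop(0)]  # highest first recommendation index seeds the set
    while first_set and len(second_set) < target_count:
        head = first_set.pop(0)      # next highest index in the first set
        if semantic_distance(head[0], second_set[-1][0]) > dist_threshold:
            second_set.append(head)  # dissimilar to the last transfer: keep it
    return [info for info, _ in second_set]
```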
In some embodiments, referring to fig. 4C, fig. 4C is a schematic flowchart of a recommendation processing method based on artificial intelligence provided in this embodiment of the present application, and the multiple candidate display manners for generating the target recommendation information in step 103 may be implemented through steps 1033 to 1036, which will be described with reference to steps 1033 to 1036 shown in fig. 4C.
In step 1033, at least one candidate template of the target recommendation information is generated.
In some embodiments, the candidate template may be generated according to the target recommendation information. For example, if the target recommendation information is a newly released video, a candidate template corresponding to the new video is generated, and the proportion of the cover in this candidate template is greater than that in other candidate templates.
In step 1034, at least one candidate summary of the target recommendation information is generated.
In some embodiments, the generating of the at least one candidate summary of the target recommendation information in step 1034 may be implemented by the following technical solutions: performing at least one of the following processes: extracting key content of the target recommendation information, and generating a candidate summary according to the key content; determining an interactive object having an interactive behavior with the target recommendation information, and generating a candidate abstract according to the interactive object; and determining the association information of which the association degree with the target recommendation information exceeds an association degree threshold, and generating the candidate abstract according to the association information.
As an example: extracting the key content of the target recommendation information and generating a candidate summary according to the key content, for example, for movie A, generating a candidate summary according to the actor information of movie A; determining interactive objects having interactive behavior with the target recommendation information and generating a candidate summary according to the interactive objects, for example, if movie A has been watched by user A and user B, generating a candidate summary indicating that user A and user B have watched movie A; and determining associated information whose association degree with the target recommendation information exceeds the association degree threshold and generating a candidate summary according to the associated information, for example, for movie A, if the association degree between a certain advertisement A and movie A exceeds the threshold, generating a candidate summary according to advertisement A, so as to recommend advertisement A to the user through the candidate summary.
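The three summary-generation strategies can be sketched as follows; the key-content extraction, interaction lookup, and association degrees are assumed placeholders:

```python
# Sketch of the three candidate-summary strategies described above.
def summary_from_key_content(key_content: str) -> str:
    return f"Highlights: {key_content}"  # e.g. actor information of movie A

def summary_from_interactions(interactors: list[str]) -> str:
    return f"Watched by {', '.join(interactors)}"  # e.g. user A and user B

def summaries_from_associations(associations: list[tuple[str, float]],
                                threshold: float = 0.8) -> list[str]:
    # keep only associated information whose association degree exceeds the threshold
    return [f"Related: {name}" for name, degree in associations if degree > threshold]

candidate_summaries = [
    summary_from_key_content("lead actors of movie A"),
    summary_from_interactions(["user A", "user B"]),
    *summaries_from_associations([("advertisement A", 0.9), ("short clip B", 0.5)]),
]
```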
In step 1035, at least one candidate cover for the target recommendation information is generated;
in some embodiments, when the target recommendation information is a recommendation video, the step 1035 of generating at least one candidate cover of the target recommendation information may be implemented by: segmenting the recommended video into a plurality of shots, wherein each shot comprises a plurality of consecutive video frames; the following processing is performed for each shot: clustering a plurality of video frames of the shot to obtain a plurality of clusters under the shot; and determining the video frame closest to the corresponding cluster center in each cluster, and determining the closest video frame as a candidate cover.
In some embodiments, the aforementioned dividing of the recommended video into a plurality of shots may be implemented by the following technical solutions: performing the following for each video frame of the recommended video: determining the position of each pixel point of a video frame; any two consecutive video frames are grouped into segmentation units, and the following processing is performed for each segmentation unit of the recommended video: determining gray level difference values of pixel points at the same positions of two video frames of the segmentation unit, and averaging the gray level difference values of the pixel points at multiple positions to obtain difference values of the segmentation unit; and determining that the segmentation unit corresponding to the difference value larger than the difference value threshold value has a shot boundary, and dividing two video frames of the segmentation unit into different shots.
As an example, dividing a video into a plurality of shots mainly requires finding the boundaries of the shots. One shot depicts an event or a continuous action in the same scene, so when the shot changes, the change between video frames is relatively large, while the difference between video frames within the same shot is relatively small. Whether two video frames belong to the same shot can therefore be determined by the difference between two adjacent video frames, and the difference between two video frames can be represented by a gray difference, as shown in formula (3):
\mathrm{Dis}(I_1, I_2) = \frac{1}{M} \sum_{x,y} \left| I_1(x,y) - I_2(x,y) \right| \qquad (3)
where Dis(I_1, I_2) is the gray difference between video frame I_1 and video frame I_2 (the difference value of the segmentation unit), M is the number of pixels in each of video frame I_1 and video frame I_2, (x, y) is the position of corresponding pixels in the two frames, and |I_1(x,y) - I_2(x,y)| is the gray difference value of the pixels at the same position of the two video frames. The difference between two frames can also be calculated by comparing the histograms and edges of the two frames.
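A minimal sketch of formula (3) and the shot-boundary test, assuming grayscale frames stored as equally sized NumPy arrays and an illustrative difference threshold:

```python
# Sketch of shot segmentation: each pair of consecutive frames is a segmentation
# unit; a mean gray difference above the threshold marks a shot boundary.
import numpy as np

def unit_difference(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """Formula (3): mean absolute gray difference at the same pixel positions."""
    return float(np.mean(np.abs(frame1.astype(np.int32) - frame2.astype(np.int32))))

def split_into_shots(frames: list[np.ndarray], diff_threshold: float = 30.0):
    shots, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if unit_difference(prev, cur) > diff_threshold:
            shots.append(current)  # boundary found: close the current shot
            current = [cur]
        else:
            current.append(cur)    # same shot: small inter-frame difference
    shots.append(current)
    return shots
```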
In some embodiments, the clustering processing performed on the plurality of video frames of the shot to obtain a plurality of clusters under the shot can be implemented by the following technical scheme: forming the plurality of consecutive video frames into a video frame set; randomly selecting N video frames from the video frame set, taking the image features corresponding to the N video frames as the initial cluster centers of a plurality of cluster sets, and removing the N video frames from the video frame set, where N is the number of candidate covers corresponding to the shot, and N is an integer greater than or equal to 2; initializing the number of iterations of the clustering processing to M, and establishing an empty set corresponding to each cluster, where M is an integer greater than or equal to 2; performing the following processing in each iteration of the clustering processing: updating each cluster set, performing cluster center generation processing based on the updating result to obtain a new cluster center of each cluster, and, when the new cluster center differs from the initial cluster center, adding the video frame corresponding to the initial cluster center back to the video frame set and updating the initial cluster center based on the new cluster center; and determining the cluster sets obtained after M iterations as the clustering result, or determining the cluster sets obtained after m iterations as the clustering result, where the cluster centers obtained after m iterations are the same as those obtained after m-1 iterations, m is an integer variable, and 2 ≤ m ≤ M.
As an example, the above-mentioned updating of each cluster set and the cluster center generation processing based on the updating result to obtain a new cluster center of each cluster may be implemented by the following technical solution: performing the following for each video frame in the video frame set: determining the similarity between the image features of the video frame and the initial cluster center of each cluster; determining that the video frame belongs to the same cluster as the initial cluster center corresponding to the maximum similarity, and transferring the video frame to the cluster set of that initial cluster center; and averaging the image features of the video frames in each cluster set to obtain the new cluster center of each cluster.
Continuing the above example, if there are 30 video frames and N is 2, the clustering processing aims to divide the 30 video frames into two clusters, each cluster having a corresponding set that includes the video frames belonging to that cluster. First, the image features of 2 randomly selected video frames are taken as the initial cluster centers of the two clusters. For each of the remaining 28 video frames, the similarity between the video frame and the 2 initial cluster centers is calculated, for example using the L2 distance to evaluate similarity; for a video frame E whose image features are closer to initial cluster center A, the video frame E is allocated to the cluster set corresponding to initial cluster center A. After the allocation operation is performed on the 28 video frames, a new cluster center is recalculated for each cluster. If the new cluster centers of the two clusters are the same as the initial cluster centers, or the similarity between each new cluster center and its initial cluster center is greater than the similarity threshold, each set is directly determined as the result of the clustering processing; otherwise, the video frames are added back to the video frame set, the initial cluster centers are replaced by the new cluster centers, and the allocation is performed again in the next iteration, until the cluster centers no longer change or the maximum number M of iterations is reached.
In some embodiments, the determining the video frame closest to the corresponding cluster center in each cluster may be implemented by the following technical solutions: the following processing is performed for each cluster: averaging the image characteristics of each video frame of the cluster to obtain a cluster center of the cluster; and determining the video frame closest to the clustering center according to the distance between the image characteristics of each video frame of the cluster and the clustering center.
As an example, for a certain cluster A containing 10 video frames, the image features of the 10 video frames of cluster A are averaged to obtain the cluster center of the cluster, and the video frame closest to the cluster center is determined according to the distance between the image features of each video frame of cluster A and the cluster center; for example, if the image features of video frame a are closest to the cluster center, video frame a is determined as the video frame closest to the cluster center.
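The per-shot clustering and candidate cover selection described above amount to a plain K-Means loop. The following is a minimal Python sketch, assuming the per-frame image features are already extracted as rows of a numpy array and that no cluster ends up empty when the keyframes are read out:

```python
import numpy as np

def kmeans_keyframes(features: np.ndarray, n_clusters: int,
                     max_iter: int = 20, seed: int = 0) -> list:
    """Cluster per-frame image features with plain K-Means and return, for each
    cluster, the index of the frame closest to the cluster center."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=n_clusters, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(max_iter):
        # assign every frame to its nearest cluster center (L2 distance)
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned frames
        new_centers = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)])
        if np.allclose(new_centers, centers):  # centers unchanged: converged early
            break
        centers = new_centers
    # per cluster, pick the frame whose features are nearest to the center
    return [int(np.where(labels == k)[0][
            np.linalg.norm(features[labels == k] - centers[k], axis=1).argmin()])
            for k in range(n_clusters)]
```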
In step 1036, at least one candidate template, at least one candidate summary, and at least one candidate cover are combined in different ways to obtain a plurality of candidate display modes.
As an example, the candidate summary includes at least one of: a title of the target recommendation information, a recommendation reason of the target recommendation information, and an introduction of the target recommendation information; each of the candidate display modes comprises a candidate template, a candidate abstract, and a candidate cover, and the candidate display modes are different from one another.
In some embodiments, only at least one candidate abstract and at least one candidate cover may be generated, and then the at least one candidate abstract and the at least one candidate cover may be combined in different manners to obtain a plurality of candidate display manners.
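Because the combination step is a plain Cartesian product of the candidate lists, a short Python sketch suffices; the candidate values below are placeholders, not values from the embodiment:

```python
from itertools import product

# Placeholder candidates; in practice they come from the generation steps above.
templates = ["template_a", "template_b"]
summaries = ["title-style abstract", "recommendation-reason abstract"]
covers = ["keyframe_1.jpg", "keyframe_2.jpg", "keyframe_3.jpg"]

# Every (template, abstract, cover) triple is one candidate display mode.
candidate_display_modes = list(product(templates, summaries, covers))
print(len(candidate_display_modes))  # 2 * 2 * 3 = 12 distinct candidates
```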
In step 104, a target display mode adapted to at least one of the status characteristic and the environmental characteristic is determined from the plurality of candidate display modes.
In some embodiments, the determining, in step 104, of the target display mode adapted to at least one of the state feature and the environmental feature from the plurality of candidate display modes may be implemented by the following technical solutions: performing the following processing for each candidate display mode: extracting the text features and the image features of the candidate display mode; performing second feature cross processing on the text features and at least one of the image features, the portrait features, the state features, and the environment features of the candidate display mode to obtain a second feature cross processing result; performing logistic regression processing on the second feature cross processing result to obtain a second recommendation index of the candidate display mode of the target recommendation information; and determining the candidate display mode corresponding to the second recommendation index exceeding the second recommendation index threshold as the target display mode.
As an example, the portrait features are single features or combined features related to the portrait data of the user: a single feature may be the age of the user or the occupation of the user, and a combined feature combines different dimensions of the user, such as the age of the user combined with the occupation of the user. The state features and the environment features are obtained by identifying the image: the state features characterize the mood state of the user, such as happy or unhappy, and the environment features characterize the environment in which the recommendation APP is browsed, such as the environment features of a subway or of a library. The feature cross processing (whether the first or the second) is Cartesian crossing or factorized crossing, in which the features are combined in different ways, the features used in each combination being partially or completely different, so as to form a plurality of combined features of the recommendation information. In the linear processing, each second feature cross processing result is weighted by a corresponding second recommendation index factor, the weighted results are summed, and an offset is added, so as to obtain the linear combination used for the second recommendation index of the candidate display mode, see formula (4):
$$z = w_0 + \sum_{i=1}^{n} w_i x_i \tag{4}$$
where $w_0$ is the offset of the offset processing, and each $w_i$ is the second recommendation index factor corresponding to the second feature cross processing result $x_i$.
In some embodiments, on the basis of the above linear processing, the linear combination may further be substituted into the logistic regression formula (5), so that the linear combination of the features is used as the argument; since the value range of the argument is negative infinity to positive infinity, the logistic regression formula maps the argument onto (0, 1), and the result obtained after substitution into the logistic regression formula is used as the second recommendation index, such as a click-through rate:
$$g = \frac{1}{1 + e^{-z}} \tag{5}$$
where $z$ is the linear combination of the features shown in formula (4), and $g$ is the second recommendation index.
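The scoring of one candidate display mode by formulas (4) and (5) can be sketched as follows in Python; the feature values and weights are assumed solely for illustration:

```python
import numpy as np

def second_recommendation_index(x: np.ndarray, w: np.ndarray, w0: float) -> float:
    """Formula (4): linear combination of the second feature cross processing
    results; formula (5): logistic function mapping z onto (0, 1)."""
    z = w0 + float(np.dot(w, x))      # z = w_0 + sum_i w_i * x_i
    return 1.0 / (1.0 + np.exp(-z))   # g = 1 / (1 + e^(-z))

x = np.array([0.8, 0.1, 0.5])   # assumed cross-feature results for one candidate
w = np.array([1.2, -0.4, 0.7])  # assumed second recommendation index factors
print(second_recommendation_index(x, w, w0=-0.3))  # predicted click-through rate
```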
In step 105, the target recommendation information is displayed in a target display manner.
Referring to fig. 4D, fig. 4D is a schematic flowchart of a recommendation processing method based on artificial intelligence according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 4D.
In step 201, an image of an object to be recommended is acquired, wherein the image is obtained by shooting when the object to be recommended watches information in the environment.
As an example, the image is a photograph or a video frame in a video taken when the object to be recommended views information in the environment.
In step 202, identifying the image to obtain at least one of a state feature of the object to be recommended and an environment feature of the environment;
in step 203, acquiring recommendation information of a plurality of candidates for an object to be recommended;
in step 204, target recommendation information adapted to at least one of the state characteristic and the environment characteristic is determined from the plurality of candidate recommendation information;
in step 205, the target recommendation information is displayed.
As an example, the implementation of steps 201-202 may refer to the implementation of steps 101-102, and the implementation of steps 203-205 may refer to the implementation of steps 1031-1032.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In some embodiments, when the recommendation processing system is applied to a video recommendation scene, the terminal acquires an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches a video in an environment; the terminal sends the image to the server, and the server identifies the image to obtain at least one of the state feature of the object to be recommended and the environment feature of the environment; a target recommended video for the object to be recommended is acquired through the server, and a plurality of candidate display modes of the target recommended video are generated; a target display mode adapted to at least one of the state feature and the environment feature is determined from the plurality of candidate display modes; and the server returns the target display mode and the target recommended video to the terminal, so that the target recommended video is displayed on the terminal according to the target display mode. Referring to fig. 10, fig. 10 is a product interface diagram of the recommendation processing method based on artificial intelligence provided by the embodiment of the present application: in response to a trigger operation of the user on the target display information 501, a playing page 502 of the target recommendation information is displayed. The display mode of the target display information 501 is personalized, so that different covers, titles, and the like can be displayed for different users.
Referring to fig. 5, fig. 5 is a flowchart of a recommendation processing method provided in an embodiment of the present application, which includes: extracting the user portrait and the content features of the videos through feature engineering; then obtaining a user photo in real time (after user authorization) and extracting the state features and the environment features (the features of the user photo); training a recommendation model using historical data, and obtaining the recommended video corresponding to the user according to the recommendation model; training a video display model using the historical data; generating personalized display content from the features of the user photo, the user portrait, and the recommended video using the video display model; and pushing the personalized display content (cover, title, introduction, recommendation reason, and the like) to the user.
The core idea of the embodiment of the present application is to improve the click-through rate and the accuracy of video recommendation by automatically generating the display mode of a recommended video; see fig. 6, which is a display flowchart of a recommendation processing method provided by an embodiment of the present application. First, the recommendation system obtains the personalized recommended video corresponding to the user. For the video text data, abstracts (titles, brief introductions, and the like) of a plurality of texts can be generated from different angles as candidate abstracts, and the abstracts can be automatically generated by a machine learning method. The same video can highlight information in a certain dimension; for example, for a movie-related video, the content abstract can be edited and generated in the following dimensions (a minimal code sketch of these dimension-based abstracts follows below):

a) Information dimension: for example, highlighting the actors in the movie;

b) User dimension: for example, highlighting that the video has been viewed by multiple friends;

c) Propagation dimension: for example, highlighting the popularity of the related movie.

For the cover of the display mode, a picture or video frame is extracted from the recommended video as a candidate cover. The user portrait of the user and the current photo of the user are obtained, the state features and the environment features are obtained based on the current photo of the user, and a personalized display mode is then generated through the video display model (a machine learning model). When a machine learning model is used to select the final display mode, a machine learning algorithm other than the logistic regression model, such as a neural network or a random forest, can also be selected, mainly based on the data distribution and the actual scene.
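As a minimal Python sketch of generating one candidate abstract per dimension, where the `video` dictionary and its keys (`lead_actor`, `friend_view_count`, `movie_rank`) are hypothetical placeholders rather than names taken from the embodiment:

```python
def generate_candidate_summaries(video: dict) -> list:
    """One candidate abstract per dimension of the recommended video."""
    return [
        f"Starring {video['lead_actor']}",                             # information dimension
        f"{video['friend_view_count']} of your friends watched this",  # user dimension
        f"From the movie currently ranked #{video['movie_rank']}",     # propagation dimension
    ]

print(generate_candidate_summaries(
    {"lead_actor": "Actor X", "friend_view_count": 5, "movie_rank": 3}))
```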
In some embodiments, referring to fig. 7, fig. 7 is a schematic diagram of key frame extraction provided in an embodiment of the present application. For the candidate covers of the display mode, a plurality of key frames are extracted from the video as cover candidates, which requires a video key frame extraction technique. Using a clustering-based idea, the video is first divided into a plurality of shots, image features (such as color, histogram, shape, and motion features) are then extracted for each frame in a shot, all video frames under the shot are divided into K classes through the K-Means algorithm, and finally the video frame closest to the cluster center is selected in each class as the key frame of that class. Dividing a video into a plurality of shots mainly means finding the boundaries of the shots: one shot is an event or continuous action depicting the same scene, so when the shot changes, the change between video frames is large, while the difference between video frames within the same shot is small. Therefore, whether two video frames belong to the same shot can be judged by the difference between two adjacent video frames, and the difference between two video frames can be represented by the gray difference, see formula (6):
$$\mathrm{Dis}(I_1, I_2) = \frac{1}{M}\sum_{(x,y)}\left|I_1(x,y) - I_2(x,y)\right| \tag{6}$$
the difference between the two frames can be calculated by comparing histograms and edges of the two frames, and the like.
In some embodiments, when extracting key frames in a video, other key frame extraction methods may also be used, for example, a method based on video content may extract a picture in which a certain actor appears as a key frame, different content recognition models may be trained for different types of videos, and then key frames are extracted through the content recognition models.
In some embodiments, referring to fig. 8, fig. 8 is an image processing flowchart provided in an embodiment of the present application. Image shooting is performed only on the premise of user authorization: for example, in response to an authorization operation of the object to be recommended, the recommendation client obtains permanent permission to photograph the user, or the recommendation client obtains permission to photograph the user only while the user is using the recommendation client and a recommendation request is triggered. After the image is obtained, the state features representing the mood state of the user and the environment features of the environment in which the recommendation APP is browsed may be obtained based on an expression recognition model and a scene recognition model, so as to form the photo features associated with the video the user is currently viewing; the expression recognition model and the scene recognition model may be neural network models obtained through machine learning training.
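A minimal Python sketch of assembling the photo features from the two recognition models; the `predict` interface is a hypothetical stand-in for whatever trained expression and scene recognizers are actually used:

```python
from dataclasses import dataclass

@dataclass
class PhotoFeatures:
    state: str        # e.g. "happy" or "unhappy", from the expression model
    environment: str  # e.g. "subway" or "library", from the scene model

def extract_photo_features(image, expression_model, scene_model) -> PhotoFeatures:
    """Run both recognition models on one authorized user photo; the models are
    assumed to expose a predict(image) -> label interface (hypothetical)."""
    return PhotoFeatures(state=expression_model.predict(image),
                         environment=scene_model.predict(image))
```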
In some embodiments, referring to fig. 9, fig. 9 is a schematic display processing diagram provided in an embodiment of the present application. Based on a machine learning model, the logistic regression model used in the recommendation system may be reused to select, according to the state features, the environment features, and the user portrait, the most suitable text and picture from the candidate texts and pictures as the display content. For the user, the portrait features and the environment features may be extracted; for the candidate texts, text features such as bag-of-words features and article features may be extracted; and for the candidate pictures, image features such as color and histogram features may be extracted. These features are then cross-processed, and the logistic regression model is used to predict whether the user will click on a certain display mode of the recommended video; the output of the logistic regression model is the probability that the user clicks on that display mode, so a click probability is obtained for each candidate display mode of the recommended video, and the display mode with the highest probability is selected as the final result. The training process of the logistic regression model is as follows: first, training data D is collected, the training data D being a plurality of groups of data, e.g., $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where $x$ represents a multi-dimensional feature vector (including the features of the user side and the features of the video side), and $y$ represents whether the user clicks the candidate display mode of the recommended video, 1 being a click and 0 being no click. The training target is the minimization of the loss function, and the loss function is shown in formula (7):
$$L(\theta) = -\sum_{i=1}^{N}\left[y_i \log P(y_i = 1 \mid x_i; \theta) + (1 - y_i)\log\bigl(1 - P(y_i = 1 \mid x_i; \theta)\bigr)\right] \tag{7}$$
wherein $P(y_i = 1 \mid x_i; \theta)$ is the predicted click-through rate and $x$ is the multi-dimensional feature vector; the aim of training is to find a set of parameters $\theta$ that minimizes the loss function, and this set of parameters is then applied to predict new data.
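A minimal Python sketch of this training procedure, minimizing the log loss of formula (7) by batch gradient descent; the learning rate and epoch count are assumed hyperparameters, and a bias term can be absorbed into X as a constant-1 column:

```python
import numpy as np

def train_logistic_regression(X: np.ndarray, y: np.ndarray,
                              lr: float = 0.1, epochs: int = 200) -> np.ndarray:
    """Minimize the log loss of formula (7) by batch gradient descent.
    X: N x d feature vectors (user-side and video-side features);
    y: N labels, 1 = clicked the candidate display mode, 0 = no click."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted click-through rates
        grad = X.T @ (p - y) / len(y)           # gradient of the log loss
        theta -= lr * grad
    return theta
```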
A display scheme for the recommended video that is personalized based on the user photo can better attract the user, thereby improving the click-through rate and the accuracy of video recommendation.
Continuing with the exemplary structure of the artificial intelligence based recommendation processing devices 455-1 and 455-2 provided in the embodiments of the present application as software modules, in some embodiments, as shown in FIG. 3, the software modules stored in the artificial intelligence based recommendation processing device 455-1 of the memory 450 may include: the shooting module 4551-1 is configured to acquire an image of an object to be recommended, where the image is obtained by shooting when the object to be recommended watches information in an environment; the recognition module 4552-1 is configured to perform recognition processing on the image to obtain at least one of a state feature of the object to be recommended and an environment feature of the environment; the recommendation module 4553-1 is configured to acquire target recommendation information for an object to be recommended and generate multiple candidate display modes of the target recommendation information; a display module 4554-1 configured to determine a target display mode adapted to at least one of the status feature and the environmental feature from the plurality of candidate display modes; the display module 4554-1 is further configured to display the target recommendation information according to the target display mode.
In some embodiments, the recommendation module 4553-1 is further configured to: acquiring recommendation information of a plurality of candidates aiming at an object to be recommended; target recommendation information adapted to at least one of the state feature and the environmental feature is determined from the plurality of candidate recommendation information.
In some embodiments, the recommendation module 4553-1 is further configured to: the following processing is performed for each candidate recommendation information: acquiring content characteristics of candidate recommendation information and portrait characteristics of an object to be recommended; performing first feature cross processing on at least one of the content feature of the candidate recommendation information, the portrait feature of the object to be recommended, and the state feature and the environmental feature to obtain a first feature cross processing result; performing logistic regression processing on the first feature cross processing result to obtain a first recommendation index of candidate recommendation information; and determining the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold value as target recommendation information.
In some embodiments, the recommendation module 4553-1 is further configured to: selecting a plurality of candidate recommendation information corresponding to the first recommendation index exceeding a first recommendation index threshold; selecting at least one piece of recommendation information meeting diversity conditions from the selected multiple candidate recommendation information as target recommendation information; wherein the diversity condition specifies a maximum number of target recommendation information belonging to the same category.
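A small Python sketch of applying such a diversity condition, assuming each candidate is represented as a (category, first recommendation index) pair — a shape chosen purely for illustration:

```python
from collections import Counter

def diversity_filter(candidates, max_per_category: int):
    """Select candidates in descending order of the first recommendation index
    while keeping at most max_per_category items of the same category."""
    counts, selected = Counter(), []
    for category, index in sorted(candidates, key=lambda c: c[1], reverse=True):
        if counts[category] < max_per_category:
            counts[category] += 1
            selected.append((category, index))
    return selected

# Hypothetical candidates: (category, first recommendation index)
print(diversity_filter([("sports", 0.9), ("sports", 0.8), ("sports", 0.7),
                        ("music", 0.6)], max_per_category=2))
# -> [('sports', 0.9), ('sports', 0.8), ('music', 0.6)]
```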
In some embodiments, the recommendation module 4553-1 is further configured to: generating at least one candidate template of the target recommendation information; generating at least one candidate abstract of the target recommendation information; generating at least one candidate cover page of the target recommendation information; combining at least one candidate template, at least one candidate abstract and at least one candidate cover in different modes to obtain a plurality of candidate display modes; wherein the candidate summary comprises at least one of: title of the target recommendation information, reason for recommending the target recommendation information, and introduction of the target recommendation information.
In some embodiments, the recommendation module 4553-1 is further configured to: performing at least one of the following processes: extracting key content of the target recommendation information, and generating a candidate abstract according to the key content; determining an interactive object having an interactive behavior with the target recommendation information, and generating a candidate abstract according to the interactive object; and determining the association information of which the association degree with the target recommendation information exceeds an association degree threshold, and generating the candidate abstract according to the association information.
In some embodiments, the recommendation module 4553-1 is further configured to: when the target recommendation information is a recommendation video, dividing the recommendation video into a plurality of shots, wherein each shot comprises a plurality of continuous video frames; the following processing is performed for each shot: clustering a plurality of video frames of the shot to obtain a plurality of clusters under the shot; and determining the video frame closest to the corresponding cluster center in each cluster, and determining the closest video frame as a candidate cover.
In some embodiments, the recommendation module 4553-1 is further configured to: performing the following for each video frame of the recommended video: determining the position of each pixel point of a video frame; any two consecutive video frames are grouped into segmentation units, and the following processing is performed for each segmentation unit of the recommended video: determining gray level difference values of pixel points at the same positions of two video frames of the segmentation unit, and averaging the gray level difference values of the pixel points at multiple positions to obtain difference values of the segmentation unit; and determining that the segmentation unit corresponding to the difference value larger than the difference value threshold value has a shot boundary, and dividing two video frames of the segmentation unit into different shots.
In some embodiments, the recommendation module 4553-1 is further configured to: form the plurality of consecutive video frames into a video frame set; randomly select N video frames from the video frame set, take the image features corresponding to the N video frames as the initial cluster centers of a plurality of cluster sets, and remove the N video frames from the video frame set, wherein N is the number of candidate covers corresponding to the shot, and N is an integer greater than or equal to 2; initialize the number of iterations of the clustering processing to M, and establish an empty set corresponding to each cluster, wherein M is an integer greater than or equal to 2; perform the following processing during each iteration of the clustering processing: update each cluster set, perform cluster center generation processing based on the updating result to obtain a new cluster center of each cluster, add the video frame corresponding to the initial cluster center back to the video frame set when the new cluster center is different from the initial cluster center, and update the initial cluster center based on the new cluster center; and determine the set of each cluster obtained after M iterations as the clustering result, or determine the set of each cluster obtained after m iterations as the clustering result, wherein the cluster centers of the clusters obtained after m iterations are the same as those of the clusters obtained after m-1 iterations, m is an integer variable, and the value of m is greater than or equal to 2 and less than or equal to M.
In some embodiments, the recommendation module 4553-1 is further configured to: the following processing is performed for each cluster: averaging the image characteristics of each video frame of the cluster to obtain a cluster center of the cluster; and determining the video frame closest to the clustering center according to the distance between the image characteristics of each video frame of the cluster and the clustering center.
In some embodiments, the identifying module 4552-1 is further configured to perform at least one of the following: performing state recognition processing on the image to obtain state characteristics of the object to be recommended; and carrying out environment recognition processing on the image to obtain the environmental characteristics of the environment.
In some embodiments, the display module 4554-1 is configured to perform the following processing for each candidate display mode: extract the text features and the image features of the candidate display mode; perform second feature cross processing on at least one of the text features, the image features, the portrait features, the state features, and the environmental features of the candidate display mode to obtain a second feature cross processing result; perform logistic regression processing on the second feature cross processing result to obtain a second recommendation index of the candidate display mode of the target recommendation information; and determine the candidate display mode corresponding to the second recommendation index exceeding the second recommendation index threshold as the target display mode.
In some embodiments, as shown in FIG. 3, the software modules stored in the artificial intelligence based recommendation processing device 455-2 of the memory 450 may include: the shooting module 4551-2 is configured to acquire an image of an object to be recommended, where the image is obtained by shooting when the object to be recommended watches information in an environment; the identification module 4552-2 is configured to perform identification processing on the image to obtain at least one of a state feature of the object to be recommended and an environmental feature of the environment; a recommending module 4553-2, configured to obtain recommendation information for multiple candidates of an object to be recommended; the recommendation module 4553-2 is further configured to determine target recommendation information adapted to at least one of the state feature and the environmental feature from the plurality of candidate recommendation information; and a display module 4554-2 for displaying the target recommendation information.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the electronic device executes the artificial intelligence based recommendation processing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will be executed by the processor to perform an artificial intelligence based recommendation processing method provided by embodiments of the present application, for example, the artificial intelligence based recommendation processing method shown in fig. 4A-4D.
In some embodiments, the computer-readable storage medium may be a memory such as an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiments of the present application, an image is obtained by shooting when the object to be recommended (the user) watches information in the environment, at least one of the state feature and the environment feature of the user is obtained from the image, a target display mode adapted to at least one of the state feature and the environment feature is subsequently determined from a plurality of candidate display modes, and the target recommendation information is displayed according to the target display mode, so that the recommendation accuracy is improved through the image shot in real time.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (16)

1. A recommendation processing method based on artificial intelligence is characterized by comprising the following steps:
acquiring an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches information in the environment;
performing identification processing on the image to obtain at least one of the state characteristic of the object to be recommended and the environmental characteristic of the environment;
acquiring target recommendation information for the object to be recommended and generating a plurality of candidate display modes of the target recommendation information;
determining a target display mode which is adapted to at least one of the state characteristic and the environment characteristic from the plurality of candidate display modes;
and displaying the target recommendation information according to the target display mode.
2. The method according to claim 1, wherein the obtaining target recommendation information for the object to be recommended comprises:
acquiring recommendation information of a plurality of candidates for the object to be recommended;
target recommendation information adapted to at least one of the status feature and the environmental feature is determined from the plurality of candidate recommendation information.
3. The method of claim 2, wherein determining target recommendation information from the plurality of candidate recommendation information that is adapted to at least one of the status feature and the environmental feature comprises:
performing the following processing for each of the candidate recommendation information:
acquiring content characteristics of the candidate recommendation information and portrait characteristics of the object to be recommended;
performing first feature cross processing on at least one of the content features of the candidate recommendation information, the portrait features of the object to be recommended, and the state features and the environment features to obtain a first feature cross processing result;
performing logistic regression processing on the first feature cross processing result to obtain a first recommendation index of the candidate recommendation information;
and determining the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold value as target recommendation information.
4. The method according to claim 3, wherein the determining the candidate recommendation information corresponding to the first recommendation index exceeding the first recommendation index threshold as the target recommendation information comprises:
selecting a plurality of candidate recommendation information corresponding to the first recommendation index exceeding a first recommendation index threshold;
selecting at least one piece of recommendation information meeting diversity conditions from the selected multiple candidate recommendation information as target recommendation information;
wherein the diversity condition specifies a maximum number of target recommendation information belonging to the same category.
5. The method of claim 1, wherein the generating the plurality of candidate display modes of the target recommendation information comprises:
generating at least one candidate template of the target recommendation information;
generating at least one candidate summary of the target recommendation information;
generating at least one candidate cover page of the target recommendation information;
combining the at least one candidate template, the at least one candidate abstract and the at least one candidate cover page in different modes to obtain a plurality of candidate display modes;
wherein the candidate summary comprises at least one of: the title of the target recommendation information, the recommendation reason of the target recommendation information and the introduction of the target recommendation information.
6. The method of claim 5, wherein the generating at least one candidate summary of the target recommendation information comprises:
performing at least one of the following processes:
extracting key content of the target recommendation information, and generating the candidate abstract according to the key content;
determining an interactive object having an interactive behavior with the target recommendation information, and generating the candidate abstract according to the interactive object;
and determining the association information of which the association degree with the target recommendation information exceeds an association degree threshold, and generating the candidate abstract according to the association information.
7. The method of claim 5, wherein when the target recommendation information is a recommendation video, the generating at least one candidate cover page of the target recommendation information comprises:
segmenting the recommended video into a plurality of shots, wherein each shot comprises a plurality of consecutive video frames;
performing the following processing for each of the shots:
clustering a plurality of video frames of the shot to obtain a plurality of clusters under the shot;
and determining the video frame closest to the corresponding cluster center in each cluster, and determining the closest video frame as the candidate cover.
8. The method of claim 7, wherein the segmenting the recommended video into a plurality of shots comprises:
performing the following for each video frame of the recommended video: determining the position of each pixel point of the video frame;
any two consecutive video frames are grouped into segmentation units, and the following processing is executed for each segmentation unit of the recommended video: determining gray level difference values of pixel points at the same positions of two video frames of the segmentation unit, and averaging the gray level difference values of the pixel points at multiple positions to obtain difference values of the segmentation unit;
and determining that the segmentation unit corresponding to the difference value larger than the difference value threshold value has a shot boundary, and dividing two video frames of the segmentation unit into different shots.
9. The method of claim 7, wherein clustering the plurality of video frames of the shot to obtain a plurality of clusters under the shot comprises:
composing the plurality of consecutive video frames into a set of video frames;
randomly selecting N video frames from the video frame set, taking image features corresponding to the N video frames as initial clustering centers of a plurality of clustering sets, and removing the N video frames from the video frame set, wherein N is the number of candidate covers corresponding to the shot, and N is an integer greater than or equal to 2;
initializing the iteration number of clustering processing to be M, and establishing a null set corresponding to each cluster, wherein M is an integer greater than or equal to 2;
performing the following processing during each iteration of the clustering processing: updating each cluster set, executing cluster center generation processing based on an updating processing result to obtain a new cluster center of each cluster, adding the video frame corresponding to the initial cluster center to the video frame set again when the new cluster center is different from the initial cluster center, and updating the initial cluster center based on the new cluster center;
determining the set of each cluster obtained after M iterations as the clustering processing result, or determining the set of each cluster obtained after m iterations as the clustering processing result;
wherein the cluster centers of the plurality of clusters obtained after m iterations are the same as the cluster centers of the plurality of clusters obtained after m-1 iterations, m is an integer variable, and the value of m is greater than or equal to 2 and less than or equal to M.
10. The method according to claim 1, wherein the identifying the image to obtain at least one of the state feature of the object to be recommended and the environmental feature of the environment comprises:
performing at least one of the following processes:
performing state identification processing on the image to obtain state characteristics of the object to be recommended;
and carrying out environment recognition processing on the image to obtain the environmental characteristics of the environment.
11. The method according to any one of claims 1-10, wherein said determining a target display mode from said plurality of candidate display modes that is adapted to at least one of said status feature and said environmental feature comprises:
performing the following processing for each of the candidate display modes:
extracting text features and image features of the candidate display modes;
performing second feature cross processing on at least one of the text feature and the image feature of the candidate display mode, the portrait feature, the state feature and the environment feature to obtain a second feature cross processing result;
performing logistic regression processing on the second feature cross processing result to obtain a second recommendation index of the candidate display mode of the target recommendation information;
and determining the candidate display mode corresponding to the second recommendation index exceeding the second recommendation index threshold value as the target display mode.
12. A recommendation processing method based on artificial intelligence is characterized by comprising the following steps:
acquiring an image of an object to be recommended, wherein the image is obtained by shooting when the object to be recommended watches information in the environment;
performing identification processing on the image to obtain at least one of the state characteristic of the object to be recommended and the environmental characteristic of the environment;
acquiring recommendation information of a plurality of candidates for the object to be recommended;
determining target recommendation information adapted to at least one of the state feature and the environmental feature from the plurality of candidate recommendation information;
and displaying the target recommendation information.
13. An artificial intelligence based recommendation processing apparatus, comprising:
the device comprises a shooting module, a recommendation module and a recommendation module, wherein the shooting module is used for obtaining an image of an object to be recommended, and the image is obtained by shooting when the object to be recommended watches information in an environment;
the identification module is used for identifying the image to obtain at least one of the state characteristic of the object to be recommended and the environment characteristic of the environment;
the recommendation module is used for acquiring target recommendation information for the object to be recommended and generating a plurality of candidate display modes of the target recommendation information;
a display module for determining a target display mode adapted to at least one of the status feature and the environmental feature from the plurality of candidate display modes;
the display module is further configured to display the target recommendation information according to the target display mode.
14. An artificial intelligence based recommendation processing apparatus, comprising:
the device comprises a shooting module, a recommendation module and a recommendation module, wherein the shooting module is used for obtaining an image of an object to be recommended, and the image is obtained by shooting when the object to be recommended watches information in an environment;
the identification module is used for identifying the image to obtain at least one of the state characteristic of the object to be recommended and the environment characteristic of the environment;
the recommending module is used for acquiring recommending information of a plurality of candidates aiming at the object to be recommended;
the recommendation module is further used for determining target recommendation information which is adapted to at least one of the state characteristic and the environment characteristic from the plurality of candidate recommendation information;
and the display module is used for displaying the target recommendation information.
15. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence based recommendation processing method of any of claims 1 to 11 or 12 when executing executable instructions stored in the memory.
16. A computer-readable storage medium storing executable instructions for implementing the artificial intelligence based recommendation processing method of any one of claims 1 to 11 or 12 when executed by a processor.
CN202111005050.5A 2021-08-30 2021-08-30 Recommendation processing method and device based on artificial intelligence and electronic equipment Pending CN115935049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111005050.5A CN115935049A (en) 2021-08-30 2021-08-30 Recommendation processing method and device based on artificial intelligence and electronic equipment

Publications (1)

Publication Number Publication Date
CN115935049A true CN115935049A (en) 2023-04-07

Family

ID=86651183


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351545A (en) * 2023-10-11 2024-01-05 广东圣千科技有限公司 Skin care management method and system based on big data mining
CN117351545B (en) * 2023-10-11 2024-03-15 广东圣千科技有限公司 Skin care management method and system based on big data mining


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40084601

Country of ref document: HK