CN111309936A - Method for constructing portrait of movie user

Info

Publication number: CN111309936A
Application number: CN201911373310.7A
Authority: CN
Inventors: 胡亚娇; 谢志峰; 丁友东
Original assignee: Beijing Transpacific Technology Development Ltd
Current assignee: Beijing Transpacific Technology Development Ltd; University of Shanghai for Science and Technology
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-06-19

Abstract

The invention relates to a method for constructing a movie user portrait, comprising the following steps: step one, selecting users who have posted Chinese-language movie reviews on a movie community website, and collecting static data and dynamic data of those users; step two, constructing a three-layer label system for the movie user portrait from the collected multi-dimensional data of the sample movie users; step three, predicting the first-layer and second-layer labels of each movie user, working bottom-up through the label hierarchy according to the correspondence between the user's multi-dimensional data and the labels in the label system, so as to construct a relatively complete single-user portrait model; and step four, performing movie preference analysis on groups of movie users who share certain characteristics, so as to generate the third-layer labels of the movie user portrait and construct a group user portrait. The invention generates a label system by analyzing and mapping the users' raw data, thereby labeling the attributes of movie users and constructing portrait models for groups of users who share an attribute.

Description

Method for constructing portrait of movie user

Technical Field

The invention relates to a construction method of a movie user portrait, and belongs to the fields of big data, data mining, natural language processing and machine learning.

Background

Against the background of big data and social media, Internet platforms analyze the latent preferences in user information and behavior and use them to promote platform content in a personalized way. A user portrait, i.e., the tagging of user information, is a model of the target user built from real data. A user portrait describes a user's social attributes, living habits, consumption behavior and so on in the form of labels. As a core component of recommendation systems, user portraits mine the characteristics of individual users, the differences between users, and the characteristics of platform user groups, and are widely used in commercial fields such as e-commerce product recommendation and advertisement delivery. With user portraits, a platform can make personalized recommendations, users get a better experience, and the platform attracts more traffic.

At present, most research on and implementations of user portraits rely on personality-scale surveys of volunteers: users are first surveyed with a scale and the scores are used to determine their personality types; a model relating the words in social data to personality types is then trained; and finally personality types are predicted from users' social data. Because this approach depends on surveying a subset of users, it consumes considerable manpower and material resources, limits the scope of the research, and relies on scales that are difficult to design and of unknown accuracy.

Dittman et al., in "Random forest: A reliable tool for patient response prediction", IEEE International Conference on Bioinformatics and Biomedicine Workshops, IEEE, 2011, apply random forest to predicting patients' response to drugs. In their experiments on high-dimensional data, random forest is compared with five other classification learners, and the results show that random forest gives the best classification performance under every feature selection strategy tested.

Wangli et al. propose an AdaBoost algorithm for multi-label classification (MLR), which exploits the correlations among the labels to be predicted and thereby improves the accuracy of multi-label classification.

Liu Sha Jian et al., in "Graph Based Keyphrase Extraction Using LDA Topic Model", Journal of the Chinese Society for Scientific and Technical Information, 2016, 35(6): 664-672, propose a keyword extraction model combining LDA and TextRank and evaluate it on the short and medium text data set Hulth2003 and the long text data set DUC2001; the results show the effectiveness of the method.

Fang Long et al., in "Structure-Function Recognition of Academic Text: Application in Automatic Keywords Extraction", Journal of the Chinese Society for Scientific and Technical Information, 2017, 36(6): 599-605, propose a method for recognizing the structural functions of academic text together with a multi-feature extraction method that fuses the structural and functional features of academic text. Structural functions are recognized from the section titles of the text, and keywords are extracted from a literature set in the field of computer languages using SVM binary classification and the LambdaMART learning-to-rank algorithm respectively. The experimental results show that the multi-feature combination greatly improves keyword extraction over the baseline features.

Tengfei et al., in "Opinion Target Extraction in Chinese News Comments", Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Beijing: [s.n.], 2010: 782-, proceed as follows. First, the NLP tool LTP parses each sentence syntactically to judge whether it contains a subject, dividing sentences into implicit sentences without a subject and explicit sentences containing a subject. For explicit sentences, all nouns in the sentence are extracted and ranked as candidate subjects by grammatical analysis. For implicit sentences, the focus concepts are converted into Wikipedia concept vectors to measure relatedness, so that candidate subjects are extracted from the context by ranking key concepts; news topics and candidate subjects are then ranked. Finally, the subject of the sentence is selected from the ranking and the context information by means of centering theory.

Shiu-Li Huang et al., in Electronic Commerce Research and Applications, propose extracting opinion phrases from comment sentences with customized POS templates and deriving opinion sentiment scores from those phrases. In their experiments, one set of POS templates is induced for movie reviews and one for automobile reviews in order to extract short opinions on movies and automobiles, and cross-domain POS templates are induced for comparison. Nouns, verbs, adjectives, degree adverbs and negation words are obtained from the short sentences and assigned word scores, and an algorithm combines them into a total score for each short sentence, which serves as the opinion score.

Movie reviews comment not only on elements of the movie, such as the movie as a whole, its content, actors and acting, shooting style, music and sound, and visuals and special effects, but also on the reviewer's personal emotions, circumstances and experiences, and even on the movie market and society at large. The subject of a review sentence may therefore belong to the movie category or to other categories. The method extracts the subjects of movie-category opinion sentences by means of a rich movie lexicon, and extracts the subjects of opinion sentences in other categories by combining centering theory with templates.

The invention combines Chinese NLP techniques such as opinion extraction and syntactic analysis, provides a method for extracting opinions from Chinese movie reviews, mines movie user portrait labels in depth, and constructs a complete movie user portrait model.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide an improved method for constructing a movie user portrait. A user portrait label system containing multiple layers of labels is constructed according to the correspondence between users' raw data and their target attributes; structured data and unstructured text are analyzed by different methods to generate labels; a complete movie user portrait that reflects the user's viewing characteristics is thereby drawn; and movie preference analysis is further carried out on groups of movie users sharing certain characteristics to construct a group user portrait model.

In order to achieve the purpose, the invention adopts the following technical scheme:

A method for constructing a movie user portrait comprises the following steps:

step one, selecting users who have posted Chinese-language movie reviews on a movie community website, and collecting static data and dynamic data of those users;

step two, constructing a three-layer label system for the movie user portrait from the collected multi-dimensional data of the sample movie users;

step three, predicting each first-layer and second-layer label of the movie user bottom-up according to the correspondence between the user's multi-dimensional data and the labels in the label system, and constructing a relatively complete portrait model of the individual movie user;

step four, performing movie preference analysis, according to user characteristics, on groups of movie users who share certain characteristics, so as to generate the third-layer labels of the movie user portrait and construct the group user portrait.

In step one, an equal number of popular movies is selected from each of twenty movie types (drama, comedy, action, romance, science fiction, animation, suspense, thriller, horror, crime, same-sex, music, song and dance, biography, history, war, western, fantasy, adventure, disaster and martial arts), and the users who reviewed those movies are collected as sample users. This ensures the diversity of movie types as well as the activity and the diversity of characteristics of the sampled movie users.

The static data and dynamic data of the user in step one comprise four types of information: basic user information, user movie review information, movie information and user label information. Four tables are created in a database to store the four types of information respectively.

The multi-dimensional data in step two comprise the basic data, review data, diary data and viewing data of the movie user, and different labels are constructed from the data of each dimension.

And constructing the movie user portrait according to the corresponding relation between the labels and the data, wherein the labels of the movie user portrait use a classification model, and the labels of the movie user portrait use a clustering model.

In the third step, the movie user portrait label system is divided into three layers on the basis of the user's raw data, and the label system is built by generating labels layer by layer, from the lower layer to the upper layer, using statistics, machine learning classification algorithms and NLP (natural language processing) methods.

According to the static and dynamic data of the movie user, the user data are classified into four fields: basic attributes, social attributes, viewing preferences and individual characteristics. The data of each field correspond to the labels of that field; each field contains more than two movie user labels, each label has at least two label values, and the set of all labels forms the label library of the movie user portrait.

The user social ability label among the social attributes measures the degree of a movie user's two-way social interaction. Its underlying data are two one-way social counts: the number of other movie users that the user follows (the following count) and the number of users who follow the user (the follower count). The following count and the follower count are each divided into three grades: strong, medium and weak. Based on the maximum and minimum of each count over all users, two thresholds are set for each count to separate the strong, medium and weak grades, and users are classified on each one-way count. The user social ability category is then divided into nine levels according to the grades of the following count and the follower count.
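The nine-level scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how the two thresholds are derived from the minimum and maximum, so equal-width thirds are assumed here, and the function names `grade` and `social_ability` are hypothetical.

```python
def grade(value, lowest, highest):
    """Grade a count as weak/medium/strong using two thresholds.

    Assumption: the thresholds split [lowest, highest] into equal thirds.
    """
    t1 = lowest + (highest - lowest) / 3
    t2 = lowest + 2 * (highest - lowest) / 3
    if value < t1:
        return "weak"
    if value < t2:
        return "medium"
    return "strong"


def social_ability(following, followers, following_range, follower_range):
    """Return one of nine (following grade, follower grade) categories."""
    return (
        grade(following, *following_range),
        grade(followers, *follower_range),
    )


if __name__ == "__main__":
    # Ranges are the (min, max) of each count over all users (toy values).
    print(social_ability(50, 900, (0, 300), (0, 900)))
```

Keeping the category as a pair of grades makes the nine levels explicit (three grades for each of the two counts) without inventing an ordering between them.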

The labels related to viewing time characteristics are derived from the raw review timestamps. Each timestamp is split into a review date and a review time of day, which are used to predict the user's monthly review count and the user's active time respectively. The number of reviews a user will post in the current month is predicted from the user's historical review counts over the preceding one month, three months, one year, two years and three years, and the user's annual activity is predicted with an XGBoost model. The user's most likely future active time is predicted from the times of day at which the user posted reviews.
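The feature preparation behind these time labels can be sketched with the standard library alone. The sketch below is illustrative: the helper names are hypothetical, the window lengths are expressed in months, and taking the three busiest hours as the active time follows the embodiment described later in the document. The regression model itself (XGBoost in the text) is not shown.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Count reviews per calendar month, keyed by (year, month)."""
    return Counter((ts.year, ts.month) for ts in timestamps)


def history_features(monthly, year, month, windows=(1, 3, 12, 24, 36)):
    """Review counts over the N months preceding the target month.

    The windows correspond to one month, three months, one year,
    two years and three years.
    """
    target = year * 12 + (month - 1)
    features = []
    for window in windows:
        total = sum(
            count
            for (y, m), count in monthly.items()
            if target - window <= y * 12 + (m - 1) < target
        )
        features.append(total)
    return features


def active_hours(timestamps, top=3):
    """Hours of the day with the most reviews, busiest first."""
    counts = Counter(ts.hour for ts in timestamps)
    return [hour for hour, _ in counts.most_common(top)]


if __name__ == "__main__":
    reviews = [
        datetime(2019, 11, 3, 21), datetime(2019, 11, 20, 21),
        datetime(2019, 10, 1, 22), datetime(2019, 6, 5, 21),
        datetime(2018, 12, 25, 9), datetime(2019, 12, 1, 22),
    ]
    print(history_features(monthly_counts(reviews), 2019, 12))
    print(active_hours(reviews))
```

The feature vector returned by `history_features` would be the model input, and the count for the target month would be the regression target.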

One of the viewing preference features is the type label of the movies a user has watched. Movie labels are classified using ten categories of movie features, following the template "a film shot in (year) in (region/country) that, against an (environmental background) and a (historical background), tells the (content) of a (role) in an (era) in a given (form), (manner) and (style)":

(1) Shooting year (1900 to 2019, one segment per decade)

(2) Region/country (China, Japan, USA, Europe, India, other regions)

(3) Environmental background (such as highways, cities, poverty, desert, palace, west, etc.)

(4) Historical background (such as the Cultural Revolution, the Renaissance, anti-war, etc.)

(5) Form (opera, cartoon, drama, music drama, documentary)

(6) Manner (action, comedy, tragedy, thriller, stream of consciousness, suspense)

(7) Style (e.g., ensemble style, amalgamation style, co-occurrence style, painting style, TV style)

(8) Role (family, superhero, second generation, earth, country, common people, father, gangs, mutants, etc.)

(9) Era (dynasties, the Republic of China, etc.)

(10) Content (love, disaster, cult, fantasy, myth, police and gangsters, adventure, biography, sex, biography, history, etc.)

The movies in the user's viewing history are categorized, with each movie matching at most one value within each category; the viewing types are then matched for each movie user and attached as viewing labels.
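The "at most one value per category" rule can be illustrated with a small sketch. The category vocabulary below is a toy subset of the lists above, and the names `CATEGORY_VALUES`, `classify_movie` and `viewing_labels` are hypothetical. Matching raw tags by exact string equality is an assumption made for brevity; the text does not say how the matching is done.

```python
from collections import Counter

# Toy subset of the ten feature categories (assumption: exact-match tags).
CATEGORY_VALUES = {
    "region": {"China", "Japan", "USA", "Europe", "India"},
    "manner": {"action", "comedy", "tragedy", "thriller", "suspense"},
    "content": {"love", "disaster", "fantasy", "adventure", "history"},
}


def decade(year):
    """Map a shooting year to its decade segment, e.g. 1994 -> '1990s'."""
    return f"{year // 10 * 10}s"


def classify_movie(year, raw_tags):
    """Assign at most one value per category to a single movie."""
    features = {"year": decade(year)}
    for category, values in CATEGORY_VALUES.items():
        for tag in raw_tags:
            if tag in values:
                features[category] = tag
                break  # keep only the first match in this category
    return features


def viewing_labels(history):
    """Aggregate (category, value) counts over a user's viewing history."""
    counts = Counter()
    for year, raw_tags in history:
        counts.update(classify_movie(year, raw_tags).items())
    return counts


if __name__ == "__main__":
    history = [
        (1994, ["USA", "comedy", "love", "tragedy"]),
        (1999, ["USA", "action"]),
    ]
    print(viewing_labels(history).most_common(3))
```

The aggregated counts give a per-user profile from which the most frequent values can be attached as viewing labels.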

Viewing preference is derived from the Chinese text of users' movie reviews. The review sentence is the minimum unit for obtaining a user's preferences. Reviews often contain long sentences in which one sentence comprises several clauses, and clauses may share an object or carry an object over from one to the next. The review text is therefore analyzed sentence by sentence to obtain the different opinions in each sentence, and the main idea or opinion expressed by the review as a whole is extracted.

Sometimes the target of a comment cannot be found within the review sentence itself. An implicit object is a comment object that does not appear in the current sentence, and such a sentence is called an implicit sentence; an explicit object is a comment object that appears in the current sentence, and such a sentence is called an explicit sentence. Implicit objects are quite common in movie reviews: in our data set, implicit sentences account for nearly 30% of all review sentences. Because the referent is often not obvious in movie review data and most objects center on the topic the movie sets out to express, the comment object of an implicit sentence is determined from four sources: the movie topic, the review title, the subject of the preceding clause and the subject of the following clause. For an explicit sentence, the movie type, the theme and the values conveyed are the most likely opinion targets, so the subject referred to by the explicit sentence is determined in the order of movie theme, object and so on. The search for the implicit object in a movie review relies on the following method:

1) find all nouns, adjectives and verbs in the same sentence and put them in the set $S = \{t_i\}$;

2) calculate the mutual information MI between each $t_i$ and $t_0$;

3) select the ten words with the highest MI and combine them with $t_0$ into a word vector;

4) let $\langle k_{ij} \rangle$ be the inverted index entry of $t_i$, where $k_{ij}$ quantifies the strength of association between $t_i$ and the Wikipedia concept $c_j$; the vector $V$ is then interpreted as a vector over all Wikipedia concepts, in which each concept $c_j$ has the weight $w_j = \sum_{t_i \in V} v_i k_{ij}$;

5) The N concepts with the highest weights are selected.
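The five steps above can be sketched as follows, under several stated assumptions: mutual information is approximated by pointwise mutual information over sentence-level co-occurrence counts, $t_0$ is treated as a given focus word, the word weights $v_i$ are supplied by the caller, and the inverted index is a toy dictionary rather than a real Wikipedia index. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pmi_table(sentences):
    """Build a PMI function from sentence-level co-occurrence counts."""
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for words in sentences:
        unique = sorted(set(words))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    def pmi(a, b):
        joint = pair_count[tuple(sorted((a, b)))]
        if joint == 0:
            return float("-inf")  # never seen together
        return math.log(joint * n / (word_count[a] * word_count[b]))

    return pmi


def top_related(t0, candidates, pmi, k=10):
    """Steps 2-3: keep the k candidates with the highest MI to t0."""
    return sorted(candidates, key=lambda t: pmi(t, t0), reverse=True)[:k]


def concept_weights(vector, inverted_index):
    """Step 4: w_j = sum over words t_i of v_i * k_ij."""
    weights = defaultdict(float)
    for word, v in vector.items():
        for concept, k in inverted_index.get(word, {}).items():
            weights[concept] += v * k
    return dict(weights)


def top_concepts(weights, n):
    """Step 5: the n concepts with the highest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


if __name__ == "__main__":
    index = {
        "plot": {"Narrative": 0.8, "Land": 0.1},
        "twist": {"Narrative": 0.5, "Dance": 0.4},
    }
    weights = concept_weights({"plot": 1.0, "twist": 1.0}, index)
    print(top_concepts(weights, 2))
```

In practice the word vector passed to `concept_weights` would contain $t_0$ plus the words returned by `top_related`.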

In step three, the second-layer user age label divides users into age brackets. The labels in the user viewing-label library are first classified according to the ten categories of movie features; the classified viewing labels, the user's social strength label and the user's influence label are then fed as input features into a random forest classification model to predict the age bracket to which the user belongs.

In step three, the second-layer user personality label classifies the user's personality according to the "Big Five" personality traits of psychology. The viewing labels, the user's social strength label and the user's influence label are fed as input features into a random forest classification model to predict the user's personality.

In step three, the second-layer user income label divides user income into three categories. The viewing labels, the user's social strength label and the user's influence label are fed as input features into a random forest classification model to predict the user's income.
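The age, personality and income predictions above share the same three inputs. A minimal sketch of how those inputs might be assembled into one numeric feature vector is shown below; the classifier itself is not shown. Normalizing tag counts to fractions and encoding the influence label as 0/1/2 are assumptions, and `build_features` is a hypothetical helper.

```python
# Assumed ordinal encoding of the influence label.
INFLUENCE_CODES = {"weak": 0, "normal": 1, "strong": 2}


def build_features(viewing_counts, vocabulary, strength, influence):
    """Concatenate viewing-label shares with the two social labels.

    viewing_counts: mapping of viewing label -> count for one user
    vocabulary:     fixed, ordered list of viewing labels
    strength:       social strength level (1-5)
    influence:      'weak', 'normal' or 'strong'
    """
    total = sum(viewing_counts.get(tag, 0) for tag in vocabulary) or 1
    tag_part = [viewing_counts.get(tag, 0) / total for tag in vocabulary]
    return tag_part + [strength, INFLUENCE_CODES[influence]]


if __name__ == "__main__":
    vocabulary = ["comedy", "action", "love"]
    print(build_features({"comedy": 3, "love": 1}, vocabulary, 4, "strong"))
```

Using a fixed vocabulary order keeps the feature positions consistent across users, which a tree-based classifier such as a random forest requires.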

In step three, the second-layer user role labels are calculated as follows: user roles are divided into three parts (gender, marital status and children), each of which is classified separately, and the user's social situation, viewing time characteristic labels and viewing preference labels are passed as input features into an AdaBoost.

In step four, group viewing preferences are computed statistically for users grouped by age bracket, personality, income and role, and a group movie user portrait model is constructed.
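The group statistics can be illustrated as a simple cross-tabulation. The record layout (a dictionary per user with a single `preference` field) and the function names are assumptions made for the sketch; the text does not specify which statistics are computed beyond grouping by attribute.

```python
from collections import Counter, defaultdict


def group_preferences(users, attribute):
    """Count viewing preferences within each value of a user attribute."""
    table = defaultdict(Counter)
    for user in users:
        table[user[attribute]][user["preference"]] += 1
    return {group: dict(counts) for group, counts in table.items()}


def preference_share(table, group):
    """Share of each preference within one group."""
    total = sum(table[group].values())
    return {pref: count / total for pref, count in table[group].items()}


if __name__ == "__main__":
    users = [
        {"age_group": 2, "preference": "comedy"},
        {"age_group": 2, "preference": "comedy"},
        {"age_group": 2, "preference": "history"},
        {"age_group": 4, "preference": "history"},
    ]
    table = group_preferences(users, "age_group")
    print(table)
    print(preference_share(table, 4))
```

The same function can be called with a personality, income or role field in place of `age_group` to produce the other group portraits.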

Compared with the prior art, the invention has the following prominent substantive characteristics and obvious advantages:

The method not only comprehensively analyzes and maps users' raw data to generate a label system and labels users in terms of their viewing characteristics, but also, on the basis of those viewing characteristics, builds group user portrait models describing how much different groups of users like different types of movies. It is a novel user research method for movie user portraits and is significant for the movie industry and for personalized recommendation.

Drawings

FIG. 1 is a flow chart of a movie user representation construction method.

FIG. 2 is a block diagram of a movie user portrait labeling architecture.

Fig. 3 is a block diagram of a user movie review viewpoint extraction flow.

FIG. 4 is a user social situation calculation flow diagram.

FIG. 5 is a block diagram of a calculation process of the region situation to which the static attribute of the user belongs.

FIG. 6 is a block diagram of a user viewing time characteristic correlation label calculation process.

FIG. 7 is a block diagram of a user social attribute viewing preference tag computation flow.

FIG. 8 is a block diagram of a user static attribute age group prediction process.

FIG. 9 is a block diagram of a user static attribute personality prediction process.

FIG. 10 is a block diagram of a user static attribute revenue prediction process.

FIG. 11 is a block diagram of a user static attribute role prediction flow.

FIG. 12 is a chart of movie preference statistics for user groups of different ages.

FIG. 13 is a pie chart of the degree of preference of user groups with different personalities for historical films.

FIG. 14 is a histogram of different incomes of a user population versus classical dubbing preferences.

Detailed Description

So that the manner in which the features and aspects of the embodiments of the present disclosure can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to the appended drawings.

As shown in fig. 1, a method for constructing a movie user portrait includes the following steps:

Step one, selecting users who have posted Chinese-language movie reviews on a movie community website, and collecting static data and dynamic data of those users. The specific data acquisition method of this embodiment comprises the following steps:

(1) An equal number of movie reviews is selected from the popular listings of each of the twenty movie types: drama, comedy, action, romance, science fiction, animation, suspense, thriller, horror, crime, same-sex, music, song and dance, biography, history, war, western, fantasy, adventure, disaster and martial arts.

(2) The movie reviews are collected, and the user ID (field name: user_ID), the link to the user's social homepage (field name: user_url) and the user nickname (field name: user_name) are parsed from the HTML text of each review. These three items are stored in a user table of the MongoDB database.

(3) The user_url field in the user table is read, the user's homepage is accessed, and the user's basic information is acquired: place of residence, registration time, personal signature and following count. This information is stored in a user basic information table of the MongoDB database.

(4) The user_url field in the user table is read, the link to the user's review list page is assembled, each review detail link is accessed, and the review-related information is collected: movie ID, review title, review time, full review text, the user's rating of the movie, and the numbers of "useful" and "useless" votes on the review. This information is stored in a user movie review information table of the MongoDB database.

(5) The value of the movie ID field in the user movie review information table is read, the movie link is assembled and accessed, and the movie information is collected: movie rating, movie type, shooting region and movie duration. This information is stored in a movie information table of the MongoDB database.

(6) The user_url field in the user table is read, the link to the user's viewing label page is assembled, the page is read, and the user's viewing labels and the count for each label are collected. This information is stored in a user viewing label information table of the MongoDB database.

Step two, constructing a three-layer label system for the movie user portrait from the collected multi-dimensional data of the sample movie users, as shown in FIG. 2. The movie user portrait label system is divided into three layers on the basis of the user's raw data, and labels are generated layer by layer from the lower layer to the upper layer using statistics, machine learning classification algorithms and NLP (natural language processing) methods. The specific method of constructing the label system in this embodiment comprises the following steps:

(1) Part of the raw data is used to predict the user's social situation, region, viewing time characteristics and viewing preferences, which form the first-level labels.

(2) Part of the first-level label data is used to predict the user's age, personality, income, family role and social role, which form the second-level labels.

(3) Part of the first-level and second-level label data is used to compute statistics on the movie preferences of user groups sharing common attributes: the viewing preferences of groups of different ages, of different personalities and of different incomes are analyzed statistically to form the third-level labels.

Step three, predicting each first-layer and second-layer label of the movie user bottom-up according to the correspondence between the user's multi-dimensional data and the labels in the label system, and constructing a relatively complete portrait model of the individual movie user. The specific label data mining method of this embodiment comprises the following steps:

(1) The steps for calculating the user's social situation are shown in FIG. 4 and are as follows. In the raw data, the user's following count is divided into five levels, (0,20), (20,50), (50,100), (100,300) and more than 300, and the user's follower count is divided into five levels, (0,50), (50,200), (200,500), (500,1500) and more than 1500; each user is thereby assigned a level i and a level j for the two counts. The social situation of a user on the platform is represented by two labels. One is the social strength label, graded from weak to strong in five levels (1, 2, 3, 4, 5); its value is the larger of the user's two count levels. The other is the user influence label, obtained by dividing level i by level j: if i/j equals 1 the label value is "normal"; if it is less than 1 the label value is "weak", meaning the user's influence is weak; and if it is greater than 1 the label value is "strong", meaning the user's influence is strong.

(2) The steps for calculating the user's region are shown in FIG. 5 and are as follows. First, the city names in the user's same-city activities in the raw data are extracted and used to fill in missing values of the user's place of residence; the user's region words are then expanded into a province and city region lexicon; finally, the province and city labels of each user are matched from the region lexicon.

(3) The steps for calculating the labels related to the user's viewing time characteristics are shown in FIG. 6 and are as follows. The review time in the raw data is divided into two parts: the review date and the review time of day. For the monthly review count label, the user's reviews are aggregated by calendar month according to the review date, and the monthly review count is used as the output of the prediction model; the review counts over the one month, two months, three months and one year preceding the current calendar month are aggregated and used as the input features of the prediction model; finally, an XGBoost machine learning model is used for regression prediction. For the user active time label, the review time of day is first extracted, all of the user's reviews are aggregated by hour, and the three hours with the highest review counts among the 24 hours of the day are taken as the user's active time.

(4) One of the viewing preference features is the type label of the movies a user has watched. The movies in the user's viewing history are classified using ten categories of movie features, following the template "a film shot in (year) in (region/country) that, against an (environmental background) and a (historical background), tells the (content) of a (role) in an (era) in a given (form), (manner) and (style)", with each movie matching at most one value in each category. The viewing types are then matched for each movie user and attached as viewing labels.
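The social-situation rule in item (1) above can be sketched as follows. Two points are assumptions rather than statements from the text: the level boundaries are treated as upper-inclusive (so a count of exactly 300 falls in the fourth level and "more than 300" in the fifth), and level i is taken to be the follower-count level and level j the following-count level, so that a user with relatively more followers is labeled as having strong influence. The names `level` and `social_labels` are hypothetical.

```python
from bisect import bisect_left

# Upper edges of the first four levels, taken from the embodiment.
FOLLOWING_EDGES = (20, 50, 100, 300)
FOLLOWER_EDGES = (50, 200, 500, 1500)


def level(count, edges):
    """Map a count to a level from 1 to 5 (upper-inclusive boundaries)."""
    return bisect_left(edges, count) + 1


def social_labels(following, followers):
    """Return (social strength level, influence label) for one user."""
    j = level(following, FOLLOWING_EDGES)  # assumed: j = following level
    i = level(followers, FOLLOWER_EDGES)   # assumed: i = follower level
    strength = max(i, j)
    if i == j:          # i / j == 1
        influence = "normal"
    elif i < j:         # i / j < 1
        influence = "weak"
    else:               # i / j > 1
        influence = "strong"
    return strength, influence


if __name__ == "__main__":
    print(social_labels(following=10, followers=2000))
    print(social_labels(following=400, followers=30))
```

Comparing the two levels directly is equivalent to testing the ratio i/j against 1 and avoids floating-point division.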

The viewing preference label of the user is obtained as shown in FIG. 7, in the following steps: (1) review data cleaning; (2) subject extraction; (3) review opinion extraction; (4) review sentiment extraction.

(1) Review data cleaning

First, the review data need to be cleaned. Because the review platform is a Chinese-language platform, collected English reviews are converted into Chinese. Film-specific vocabulary covering six aspects (the movie as a whole, content, actors and acting, shooting style, music and sound, and visuals and special effects) and the ten categories of movie features, together with Internet slang, is added to the Chinese lexicon. The Jieba word segmentation tool is then used to segment each single sentence of a user's review into words and to tag the part of speech of each word. Stop words are removed from the reviews using a Chinese stop word list, leaving the Chinese review words and user sentiment words that carry actual meaning.

(2) Subject extraction

For an implicit sentence, the referent is often not obvious in the review data and most objects center on the topic the movie sets out to express; the comment object of an implicit sentence is therefore determined from the movie topic, the nouns near adjectives, the subject of the preceding clause, the subject of the following clause and the review title. For an explicit sentence, the nouns before and after an adjective are extracted as the subject of the sentence.

(3) Review opinion extraction

Using POS templates with the nouns removed, the negation words, adjectives and degree words in each sentence are extracted, and a sentiment score for the user's opinion of the quality of a given aspect of the movie is calculated from a sentiment dictionary.

(4) Review sentiment extraction

Each user review has a theme and an emotional tendency; an LDA topic model is used to extract the topic words of the review.
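The opinion scoring in step (3) above can be illustrated with a small sketch. The text does not give the scoring formula, so the rule used here is an assumption: each adjective's dictionary polarity is multiplied by any preceding degree words and flipped by a preceding negation word. The three tiny lexicons and the function name `opinion_score` are hypothetical, and the input is assumed to be already segmented into words with nouns removed.

```python
# Toy lexicons (assumptions); a real system would load full dictionaries.
POLARITY = {"精彩": 1.0, "出色": 1.0, "无聊": -1.0, "糟糕": -1.0}
DEGREE = {"非常": 1.5, "很": 1.25, "有点": 0.5}
NEGATION = {"不", "没", "没有"}


def opinion_score(words):
    """Score a segmented sentence from negation, degree and adjective words."""
    score = 0.0
    weight = 1.0
    negate = False
    for word in words:
        if word in NEGATION:
            negate = not negate
        elif word in DEGREE:
            weight *= DEGREE[word]
        elif word in POLARITY:
            value = POLARITY[word] * weight
            score += -value if negate else value
            # Modifiers apply only to the next adjective, then reset.
            weight, negate = 1.0, False
    return score


if __name__ == "__main__":
    print(opinion_score(["非常", "精彩"]))
    print(opinion_score(["有点", "无聊", "但", "很", "出色"]))
```

Words that appear in none of the three lexicons are ignored, so the function can be applied directly to the output of a word segmenter after stop-word removal.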

(5) For the user age labels, as shown in fig. 8, the user ages are classified into four age groups of 18 years or less, 18-25 years, 25-35 years, 35-50 years, and more than 50 years, which are respectively labeled as categories (1,2,3,4, 5). Firstly, classifying labels of a user film watching label library according to ten types of movie features, then manually labeling age groups of some users, and finally, inputting the classified film watching labels, the social strong and weak labels of the users and the influence labels of the users into a random forest classification model as input features to predict the age groups of the users.

(6) As shown in fig. 9, the user personality labels are divided according to the "Big Five" personality traits in psychology into openness, conscientiousness, extraversion, agreeableness and neuroticism, labeled as categories (1, 2, 3, 4, 5) respectively. The personalities of a subset of users are manually labeled, and the viewing labels classified in step (5), the user social strength labels and the user influence labels are fed as input features into a random forest classification model to predict the personality of each user.

(7) As shown in fig. 10, the user income label is divided into three categories, including general and rich, labeled as categories (1, 2, 3). The income categories of a subset of users are manually labeled, and the viewing labels classified in step (5), the user social strength labels and the user influence labels are then fed as input features into a random forest classification model to predict the income of each user.

(8) The calculation steps for the user role labels are shown in fig. 11. The user role is divided into three parts: gender, marital status and child status. Gender is labeled as category (1, 2); marital status (unmarried or married) is labeled as category (1, 2); and whether the user has children (yes or no) is labeled as category (1, 2). First, the categories of a subset of users are manually labeled; then the user social labels classified in step 1, the viewing-time feature labels classified in step 3 and the viewing preference labels classified in step 3 are passed as input features into an AdaBoost.
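The category encoding of the age label in item (5) can be sketched as follows. The age ranges in the text share their boundary values (for example, 25 appears in both 18-25 and 25-35), so this sketch assumes that each boundary age belongs to the lower group, which agrees with "18 years or less" and "more than 50". The function name `age_category` is hypothetical.

```python
from bisect import bisect_left

# Upper bounds (inclusive, by assumption) of age categories 1 to 4; older ages fall in category 5.
AGE_UPPER_BOUNDS = [18, 25, 35, 50]


def age_category(age: int) -> int:
    """Map an age in years to the category label 1-5 used for the age tag."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return bisect_left(AGE_UPPER_BOUNDS, age) + 1


# Example: manually labeled ages converted to training targets for a classifier.
labeled_ages = [16, 18, 22, 30, 50, 63]
targets = [age_category(a) for a in labeled_ages]
```

The resulting integer targets are the kind of class labels that a classification model such as the random forest mentioned above would be trained to predict.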

The viewing preferences of users of different age groups, personalities, incomes and roles are calculated separately using statistical methods, and the group movie user portrait is constructed as follows:

(1) A bar graph is drawn as shown in fig. 12, in which the horizontal axis represents the viewing preference of the users and the vertical axis represents the number of users in each of the 5 age groups, and the relationship between user age and viewing preference is analyzed.

(2) A pie chart is drawn as shown in fig. 13, in which the area of each sector represents the share of each personality type among the users who like a certain kind of movie, and the degree of preference of different groups for that kind of movie is analyzed.

(3) The histogram is plotted as in fig. 14, with the horizontal axis representing user income and the vertical axis representing viewing preferences, and viewing preferences for different income groups are analyzed.
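The group statistics behind the three charts can be sketched as simple counting. The sample records, field layout and function names below are hypothetical; the sketch only shows how per-group preference counts (as for a bar graph) and within-group shares (as for a pie chart) might be derived from single-user portraits.

```python
from collections import Counter, defaultdict

# Hypothetical single-user portraits: (age category 1-5, preferred movie type).
users = [(2, "comedy"), (2, "science fiction"), (2, "comedy"),
         (3, "drama"), (3, "comedy"), (5, "history")]


def preference_counts(records):
    """Count preferred movie types within each group label."""
    counts = defaultdict(Counter)
    for group, preference in records:
        counts[group][preference] += 1
    return counts


def preference_shares(counter):
    """Convert the counts of one group into fractions of that group."""
    total = sum(counter.values())
    return {pref: n / total for pref, n in counter.items()}


by_age = preference_counts(users)
top_for_group_2 = by_age[2].most_common(1)
shares_for_group_3 = preference_shares(by_age[3])
```

The same two functions apply unchanged when the group label is a personality, income or role category instead of an age category.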

Claims

1. A method for constructing a user portrait of a movie is characterized by comprising the following steps:

2. The method for constructing a movie user portrait according to claim 1, wherein in step one, users who have posted reviews under popular movies are selected and their data are collected to form the sample users, the movies being drawn from twenty movie types including drama, comedy, action, romance, science fiction, animation, suspense, thriller, horror, crime, same-sex, music, musical, biography, history, war, western, fantasy, adventure, disaster and martial arts, so that the diversity of movie types and the activity and feature diversity of the movie users are ensured.

3. The method for constructing a user portrait of movie as claimed in claim 1, wherein the static data and dynamic data of the user in the first step include four types of basic user information, movie rating information, movie information and tag information, and four tables are established in the database for storing the four types of information.

4. The method for constructing a portrait of a movie user as defined in claim 1, wherein the multidimensional data in the second step includes basic data, comment data, diary data and view data of the movie user, and different tags are constructed according to the data of each dimension of the movie user.

5. The method for constructing a movie user portrait according to claim 1, wherein the model in step three is based on the correspondence between tags and data; the construction of the movie user portrait includes a movie user personal portrait and a movie user group portrait; the tags of the movie user personal portrait are obtained using machine learning classification models and natural language processing, and the tags of the movie user group portrait are obtained by statistical analysis.

6. A method for constructing a user portrait of a movie as defined in claim 1, wherein in the third step, each tag of the movie user is predicted by using one of statistics, machine learning random forest, XGBoost classification algorithm, adaboost, mlr multi-tag classification algorithm, and syntactic analysis of natural language processing.

7. The method for constructing a movie user portrait according to claim 3, wherein the movie user data is classified into four fields of basic attributes, social attributes, viewing preferences and personality characteristics according to the static data and dynamic data of the movie user, and the data of each field is respectively corresponding to tags of the four fields of the movie user, wherein each field comprises more than two movie user tags, each tag corresponds to at least two tag values, and the set of all tags is a tag library of the movie user portrait.

8. The method for constructing a movie user portrait according to claim 7, wherein the user social ability tag in the social attributes measures the two-way social degree of the movie user; the data underlying the tag consist of the number of other users who follow the movie user (the follower count) and the number of other users whom the movie user follows (the following count), each of which is divided into three levels: strong, medium and weak; according to the maximum and minimum values of the follower count and of the following count over all users, two thresholds separating the strong, medium and weak levels are set for each count, so that users are classified by one-way social degree; and the user social ability category is divided into nine levels according to the follower count level and the following count level.

9. The method for constructing a movie user portrait according to claim 7, wherein one of the viewing preference features is the type tag of the movies the user has watched; the movie tags are classified into ten categories of movie features following the template "shot in (year) in (region/country), against (environmental background) and (historical background), telling the (content) of (character) in (year), in (form), (way) and (style)"; the movies in the user's viewing history are categorized, wherein each movie matches at most one value within each category; and the viewing types are matched for each movie user and viewing labels are attached.

10. The method for constructing a movie user portrait according to claim 7, wherein one of the viewing preference features is the user's movie rating time, which is divided into the rating date and the rating time of day, used respectively to predict the user's monthly rating volume and the user's active time; the movie user's number of ratings in the current month is predicted from the user's historical rating volume over one month, three months, one year, two years and three years, and the movie user's annual activity is predicted by the XGBoost model; and the movie user's most likely future active time is predicted from the times of day at which the user rates movies.

11. The method of claim 7, wherein the viewing preferences comprise the Chinese text data of the movie user's reviews; the review sentence is the minimum unit for acquiring the movie user's preferences; reviews often contain long sentences in which one sentence comprises several clauses, and comment objects may be shared or extended across clauses; and the review text data are analyzed sentence by sentence to obtain the different viewpoints in each sentence, and the theme or viewpoint expressed by the review as a whole is extracted.

12. The method for constructing a movie user portrait according to claim 11, wherein the comment object of a sentence sometimes cannot be found in the review sentence itself, a phenomenon called the implicit object; an implicit object is a comment object that does not appear in the current sentence, and such a sentence is called an implicit sentence; an explicit object is a comment object that appears in the current sentence, and such a sentence is called an explicit sentence; implicit objects are quite common in movie reviews, and in the data set of review sentences, sentences with an implicit object account for nearly 30% of the total; the movie review data have the problem that the object referred to is not obvious, and most objects concern the theme the movie expresses, so the comment object of an implicit sentence is determined from four aspects: the movie theme, the review title, the subject of the preceding clause and the subject of the following clause; for an explicit sentence, the movie type, the theme and the conveyed values are the most likely opinion targets, so the movie theme, the object and other sequences are used to determine the subject referred to by the explicit sentence; and the search for implicit objects in movie reviews relies on the following method:

2) calculate the mutual information MI of each t_i and t₀;

4) let <k_ij> be the inverted index entry of t_i, where k_ij quantifies the strength of association between t_i and the Wikipedia concept c_j; the vector V is then interpreted as a vector constructed from all Wikipedia concepts; each concept c_j has a weight

5) The N concepts with the highest weights are selected.

13. The method for constructing a user representation of a movie as recited in claim 1, wherein the second layer movie user age tag in the third step segments the user age; firstly, classifying labels of a user film watching label library according to ten types of film characteristics, then using the classified film watching labels, user social contact strong and weak labels and user influence labels as input characteristics to enter a random forest classification model, and predicting the age bracket to which the user belongs.

14. The method for constructing a movie user portrait according to claim 1, wherein the second-layer movie user personality label in step three classifies user personality according to the Big Five personality traits in psychology, and the viewing labels, the user social strength labels and the user influence labels are fed as input features into a random forest classification model to predict the user's personality.

15. The method of claim 1, wherein the second-layer movie user income label in step three classifies user income into three categories; and the viewing labels, the user social strength labels and the user influence labels are fed as input features into a random forest classification model to predict the user's income.

16. The method for constructing a user portrait of movie as recited in claim 1, wherein the step of calculating the related labels of the user roles of movie in the second layer in the third step comprises: dividing user roles into three parts of gender, marriage and children, classifying the three parts respectively, and transmitting user social situations, user film watching time characteristic related labels and user film watching preference labels as input characteristics into an AdaBoost.

17. The method for constructing a movie user portrait according to claim 1, wherein the group viewing preferences in step four are obtained by using statistical methods to calculate the viewing preferences of users of different age groups, personalities, incomes and role categories, and a group movie user portrait model is constructed.
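As an illustration of the nine-level social ability classification recited in claim 8, a minimal sketch is given below. The claim states only that two thresholds are set for each count from its maximum and minimum over all users; splitting that range into three equal-width intervals is an assumption of this sketch, and the function names, sample counts and the numbering of the nine levels are hypothetical.

```python
def thresholds(values):
    """Two cut points splitting [min, max] into three equal-width intervals (assumed rule)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / 3
    return lo + step, lo + 2 * step


def level(value, cuts):
    """Return 0 (weak), 1 (medium) or 2 (strong) for a one-way social count."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return 0
    if value < high_cut:
        return 1
    return 2


def social_ability(followers, following, follower_cuts, following_cuts):
    """Combine the two one-way levels into one of nine categories, numbered 1-9."""
    return 3 * level(followers, follower_cuts) + level(following, following_cuts) + 1


# Hypothetical follower and following counts for four users.
follower_counts = [0, 30, 90, 300]
following_counts = [0, 60, 120, 150]

follower_cuts = thresholds(follower_counts)    # (100.0, 200.0)
following_cuts = thresholds(following_counts)  # (50.0, 100.0)

categories = [social_ability(f, g, follower_cuts, following_cuts)
              for f, g in zip(follower_counts, following_counts)]
```

With real follower data, which tend to be highly skewed, equal-width thresholds may place most users in the weak level; thresholds chosen another way would fit the same two-threshold structure.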