Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to understand each user's reading interests and then recommend reading content accordingly, an embodiment of the present application provides a user portrait construction method for determining the reading interests of the user.
The method can be applied to a reading device, such as a picture book robot, or to another intelligent terminal, for example one on which an application program implementing the method is installed.
Referring to Fig. 1, which is a schematic flow chart of a user portrait construction method provided in an embodiment of the present application, the method may include the following steps:
Step 101, obtaining historical reading data of a user.
For example, a picture book robot may obtain data on the picture books the user has read. This may include picture books supplied by the user that the robot reads using computer vision technology, picture books stored on the robot itself or downloaded from the internet, and picture books that the user has ordered or collected.
Taking reading application software installed on an intelligent terminal as an example, by executing the software the intelligent terminal can obtain data on the novels and comics the user has read and the audio novels the user has listened to, as well as the novels and comics the user has ordered or collected.
Step 102, obtaining the label of each piece of historical reading content in the historical reading data, and determining the weight of each label.
Each piece of reading content may correspond to one or more tags. Extracting the tags of the user's historical reading content makes it easier to understand and analyze the user's reading interests. Optionally, a tag may be set according to the type of the content: for example, the tag for a Danish fairy tale may be "fairy tale", and the tag for a picture book about animals may be "animal". A piece of content may also have several type tags; for example, a book on adolescent psychology education may have the tags "teenager", "psychology", and "education". In addition to the content type, tags may be set according to the author, the publisher, the suitable reading age group, and the like.
In one possible implementation, the tags of each piece of historical reading content may be obtained by a web crawler, by automatic generation, or by manual input. A web crawler is a program or script that automatically captures web information according to certain rules; here it automatically captures, from web pages on the internet, tags that already exist for the content. Manual input means that the user enters tags for the reading content. Automatic generation means that the picture book robot, the intelligent terminal, or the like uses AI technology to generate the corresponding tags automatically from the acquired reading content.
Optionally, when the tags of reading content are generated automatically, keywords may be extracted from the text of the reading content. For example, as shown in Fig. 2, natural language processing (NLP) techniques may be used to perform word segmentation and part-of-speech tagging, keeping only key words such as nouns and verbal nouns and removing noise such as stop words and punctuation marks. The weight of each keyword is then determined along three dimensions: word frequency, maximum entropy, and a keyword extraction algorithm.
The word frequency algorithm may be term frequency-inverse document frequency (TF-IDF). Its principle is as follows: if a word or phrase appears frequently in one article (high TF) and rarely appears in other articles, the word or phrase is considered to have good discriminating power and to be suitable for classification. For example, if the word "plant" occurs frequently in a picture book but infrequently in the corpus as a whole, the weight of the keyword "plant" may be determined to be high. The maximum entropy principle is a criterion for selecting the statistical characteristics of a random variable that best fit the objective conditions: when only partial knowledge of an unknown distribution is available, the probability distribution that is consistent with that knowledge and has the maximum entropy is selected.
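The TF-IDF principle described above can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `tf_idf`, the toy corpus, and the smoothed IDF variant are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Return {term: TF-IDF weight} for one tokenized document.

    `corpus` is a list of tokenized documents and is assumed to
    include `doc_tokens` itself.
    """
    counts = Counter(doc_tokens)
    total_terms = len(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / total_terms
        # Number of documents in the corpus that contain the term.
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed IDF (an assumption); other variants are equally valid.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


# Hypothetical data: "plant" is frequent in the book but rare elsewhere.
book = ["plant", "seed", "plant", "water", "plant"]
corpus = [book, ["dog", "cat", "water"], ["moon", "star", "water"]]
weights = tf_idf(book, corpus)
```

In this toy corpus "plant" receives the highest weight because it is frequent in the book and absent from the other documents, while "water", which appears in every document, receives the lowest.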
Then, k keywords are selected from the extracted keywords as the tags of the reading content using a multidimensional top-k ranking technique. The specific process may be as shown in Fig. 2. The keywords are first clustered and merged; for example, the keywords "home" and "family" have similar meanings and may be merged by a clustering technique. A top-k algorithm is then applied to the merged keywords, and the top k keywords are retained as the tags of the reading content, thereby automatically generating the tags.
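The merge-then-rank process might look like the sketch below. The application does not specify the clustering technique, so a hand-written synonym map (`synonyms`, hypothetical) stands in for it here, and summing the weights of merged keywords is likewise an assumption.

```python
import heapq


def merge_and_top_k(keyword_weights, synonym_map, k):
    """Merge similar keywords, then keep the k highest-weighted ones.

    `synonym_map` maps a keyword to its canonical form and stands in
    for the unspecified clustering step.
    """
    merged = {}
    for word, weight in keyword_weights.items():
        canonical = synonym_map.get(word, word)
        # Summing the weights of merged keywords is an assumption.
        merged[canonical] = merged.get(canonical, 0.0) + weight
    return heapq.nlargest(k, merged.items(), key=lambda item: item[1])


# Hypothetical keyword weights for one piece of reading content.
keywords = {"home": 0.30, "family": 0.25, "garden": 0.40, "rain": 0.10}
synonyms = {"home": "family"}
labels = merge_and_top_k(keywords, synonyms, k=2)
label_names = [name for name, _ in labels]
```

Here "home" is folded into "family", so the merged keyword outranks "garden" even though neither original keyword did on its own.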
After the tags of each piece of reading content are acquired, the weight of each tag needs to be determined. In one possible implementation, the following may be determined for each tag: a first weight of the corresponding user behavior type, the number of times the user behavior occurred, a time decay factor, and a second weight determined according to word frequency. The tag weight may then be determined by the following formula:
label weight = fun(behavior type weight, number of behaviors, time decay factor, TF-IDF label weight)
Here, the function fun may be a weighted combination using weighting factors; the user behavior types may include reading, purchasing, collecting, and the like, and different behavior types may correspond to different weights; the TF-IDF label weight is the weight determined by applying the TF-IDF algorithm.
Step 103, filtering the labels according to the weights.
Specifically, the labels may be filtered using an interest label library together with the weights of the labels; alternatively, a threshold may be set on the label weight and labels below it filtered out; alternatively, the k labels with the highest weights may be selected.
Step 104, determining the reading interest portrait of the user according to the filtered labels.
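One possible form of the function fun is sketched below. The application only states that fun may be a weighted combination, so the specific combination used here (a sum over behavior events, scaled by the TF-IDF weight) and the values in `BEHAVIOR_WEIGHTS` are assumptions chosen for illustration.

```python
# Hypothetical weights per behavior type; the text only says that
# different behavior types may correspond to different weights.
BEHAVIOR_WEIGHTS = {"read": 1.0, "collect": 2.0, "purchase": 3.0}


def label_weight(events, tfidf_weight):
    """Combine behavior signals and the TF-IDF weight for one label.

    `events` is a list of (behavior_type, count, decay_factor) tuples,
    where decay_factor is a precomputed time decay factor.
    """
    behavior_score = sum(
        BEHAVIOR_WEIGHTS[behavior] * count * decay
        for behavior, count, decay in events
    )
    # Multiplying by the TF-IDF weight is one possible weighting scheme.
    return behavior_score * tfidf_weight


# Two recent-ish reads and one purchase made today, for the same label.
weight = label_weight([("read", 2, 0.5), ("purchase", 1, 1.0)], tfidf_weight=0.4)
```

With these assumed inputs the behavior score is 1.0 × 2 × 0.5 + 3.0 × 1 × 1.0 = 4.0, and the label weight is 4.0 × 0.4 = 1.6.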
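The three filtering options can be sketched as small helper functions. The function names and sample weights are hypothetical, and a real interest label library would likely be larger and curated.

```python
def filter_by_library(label_weights, interest_library):
    """Keep only labels that appear in the interest label library."""
    return {l: w for l, w in label_weights.items() if l in interest_library}


def filter_by_threshold(label_weights, threshold):
    """Keep only labels whose weight reaches the threshold."""
    return {l: w for l, w in label_weights.items() if w >= threshold}


def filter_top_k(label_weights, k):
    """Keep the k labels with the highest weights."""
    ranked = sorted(label_weights.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Hypothetical label weights for one user.
label_weights = {"animal": 0.8, "fairy tale": 0.5, "education": 0.1}
by_library = filter_by_library(label_weights, {"animal", "education"})
by_threshold = filter_by_threshold(label_weights, 0.5)
top_one = filter_top_k(label_weights, 1)
```

The options can also be chained, for example restricting to the library first and then applying a threshold.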
For example, the set of filtered tags may be used as a reading interest portrait of the user; alternatively, the reading interest portrait of the user may be generated according to a preset rule for the tag obtained after filtering.
Furthermore, in addition to the tags, the reading interest portrait of the user may also include the weight of each tag, so as to reflect the degree of the user's interest in different areas.
The user portrait construction method provided by this embodiment constructs a reading interest portrait of the user, gives a full picture of the user's reading interests, and lays the groundwork for subsequently recommending reading content to the user. Because prior-art user portrait systems for picture book reading are still incomplete, with incomplete tagging of picture books and no interest portraits of children for picture books, the method is particularly suitable for constructing children's reading interest portraits for picture books.
Based on the same technical concept, an embodiment of the present application also provides a content recommendation method for recommending to a user reading content that the user may be interested in. The method can be applied to a reading device, such as a picture book robot, or to another intelligent terminal, for example one on which an application program implementing the method is installed.
Referring to Fig. 3, which is a schematic flow chart of a content recommendation method provided in an embodiment of the present application, the method may include the following steps:
Step 301, obtaining historical reading data of a user.
Similar to the foregoing embodiment, the obtained historical reading data may include content read by the reading device by using a computer vision technology, reading content stored by the reading device itself or downloaded from the internet, and reading content ordered or collected by the user.
Step 302, obtaining the label of each piece of historical reading content in the historical reading data, and determining the weight of each label; and filtering the labels according to the weight, and determining the reading interest portrait of the user according to the filtered labels.
As described above, for each piece of reading content, the tag related to the reading content may be obtained through a web crawler technology, or a manually input tag is received, or a corresponding tag may be automatically generated, and the method for automatically generating a tag is similar to the foregoing embodiment, and is not described here again.
Step 303, determining a multivariate intelligent theoretical value of the user according to the historical reading data of the user, wherein the multivariate intelligent theoretical value is used for reflecting the reading condition of the user in multiple intelligent categories.
The multivariate intelligence theory (the theory of multiple intelligences) was proposed in 1983 by Howard Gardner of Harvard University in the United States. Traditionally, schools have focused on developing logical-mathematical and linguistic intelligence (reading and writing), but these do not make up the whole of human intelligence, which is a combination of intelligences along multiple dimensions. Taking the commonly used eight-dimension form of the theory as an example, the human intelligences and how each manifests in children are introduced one by one below:
Self-cognitive intelligence: the ability to look into and understand one's own inner world and to guide one's behavior accordingly. It manifests as the child having a deep understanding of himself or herself.
Musical intelligence: the ability to perceive, appreciate, play, sing, and create music. It manifests as the child being sensitive to rhythm, pitch, timbre, and melody.
Interpersonal intelligence: the ability to understand others and to collaborate with people. It manifests as the child perceiving the emotional changes of others and reacting appropriately.
Language intelligence: the ability to master and use spoken and written language. It manifests as the child being able to describe events in language and to express ideas when communicating with people.
Bodily-kinesthetic intelligence: the ability to use the whole body or a part of the body (including the mouth and hands) to solve a problem or create a product.
Logical-mathematical intelligence: the ability to perform logical reasoning, mathematical operations, and scientific analysis. It manifests as the child being interested in causal, logical, and similar relationships between things.
Spatial intelligence: the ability to transform what is observed into a model or image in the mind. It manifests as the child being sensitive to lines, shapes, colors, space, and the like.
Naturalist intelligence: the ability to study, summarize, and classify the things found in nature. It manifests as the child liking to explore nature and to grow plants and raise animals.
Different people may have different combinations of intelligences. For example, architects and sculptors have stronger spatial intelligence, athletes and ballet dancers have stronger bodily-kinesthetic intelligence, customs personnel have stronger interpersonal intelligence, and writers have stronger self-cognitive intelligence.
In order to comprehensively understand the user's development across the multiple intelligences, the user's development in each intelligent category can be judged according to the historical reading content of the user or the reading interest portrait of the user.
The following description takes determining the multivariate intelligent theoretical value from the user's reading interest portrait as an example. As described in the previous embodiments, the reading interest portrait of the user may include a set of tags. After the reading interest portrait is obtained, it can be determined for each tag whether the tag belongs to a certain intelligent category. For example, suppose the reading interest portrait of the user includes the following tags: "natural spelling", "astronomical knowledge", "natural knowledge", "everyday words", "Chinese character learning", "human body manufacturer", etc. Analysis by AI technology shows that the tags "natural spelling", "everyday words", and "Chinese character learning" reflect the user's reading interest in the language intelligence category, while the tags "astronomical knowledge", "natural knowledge", and "human body manufacturer" reflect the user's reading interest in the naturalist intelligence category.
Alternatively, a corresponding tag may be set for each intelligent category in advance, and then the tag in the user reading interest portrait may be matched with the tag corresponding to each intelligent category, as shown in the following table.
| Intelligent category | Label 1 | Label 2 | Label 3 | ... |
|---|---|---|---|---|
| Language intelligence | Natural spelling | Everyday words | Chinese character learning | ... |
| Naturalist intelligence | Astronomical knowledge | Natural knowledge | Human body manufacturer | ... |
| ... | ... | ... | ... | ... |
If the user matches a large number of tags in a certain intelligent category, the user can be considered to be relatively well developed in that category. In another possible case, the reading interest portrait of the user includes a group of tags and a weight for each tag; when judging the user's development in a certain intelligent category, the weights of the matched tags can then also be taken into account. For example, if the reading interest portrait of the user includes the tag "natural knowledge" and the weight of that tag is high, the user may be considered to be well developed in the naturalist intelligence category to which the tag "natural knowledge" corresponds.
Step 304, determining the content recommended for the user according to the reading interest portrait of the user and the multivariate intelligent theoretical value.
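The tag matching described above can be sketched as a lookup against preset category tags, accumulating the weights of the matched tags. The mapping `CATEGORY_LABELS` reuses example tags from the text, while the portrait weights and the choice to sum weights per category are assumptions.

```python
# Preset tags per intelligent category, taken from the examples in the text.
CATEGORY_LABELS = {
    "language": {"natural spelling", "everyday words", "Chinese character learning"},
    "naturalist": {"astronomical knowledge", "natural knowledge"},
}


def category_scores(portrait):
    """Sum the weights of the portrait tags matched by each category.

    `portrait` maps tag -> weight. Summation is one possible way of
    combining the weights of the matched tags.
    """
    scores = {category: 0.0 for category in CATEGORY_LABELS}
    for label, weight in portrait.items():
        for category, labels in CATEGORY_LABELS.items():
            if label in labels:
                scores[category] += weight
    return scores


# Hypothetical reading interest portrait with weights.
portrait = {"natural knowledge": 0.9, "everyday words": 0.3, "fairy tale": 0.5}
scores = category_scores(portrait)
best_category = max(scores, key=scores.get)
```

The tag "fairy tale" matches no preset category and is ignored; the high weight of "natural knowledge" makes the naturalist category score highest for this user.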
When the reading content is recommended for the user according to the reading interest portrait of the user, the recommended content is more likely to be the content in which the user is interested, that is, the possibility that the user reads the recommended content is higher.
When content is recommended for the user according to the multivariate intelligent theoretical value, three approaches are possible. First, content that the user is likely to be interested in can be recommended: for example, if the user is well developed in the language intelligence category, the user is more likely to be interested in content related to that category. Second, content related to an intelligent category in which the user is relatively weak can be recommended to help the user develop in a well-rounded way: for example, if the user is relatively weak in the logical-mathematical intelligence category, reading content related to logical-mathematical intelligence can be recommended. Third, recommendations can be made according to the user's own goals: for example, if the user hopes to become an architect, and an architect needs strong spatial intelligence, reading content related to spatial intelligence can be recommended.
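A minimal sketch of choosing which intelligent categories to recommend from, covering the three cases above, is given below. The function name, the score values, and the decision to return all three targets at once are assumptions; a real system would presumably balance them.

```python
def target_categories(scores, goal_category=None):
    """Pick categories to draw recommendations from.

    Returns the strongest category (likely interest), the weakest
    category (well-rounded development), and an optional goal-driven
    category, without duplicates and in that order.
    """
    strongest = max(scores, key=scores.get)
    weakest = min(scores, key=scores.get)
    targets = [strongest, weakest]
    if goal_category is not None:
        targets.append(goal_category)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(targets))


# Hypothetical per-category values for a user who wants to become an architect.
scores = {"language": 0.9, "logical-mathematical": 0.2, "spatial": 0.5}
targets = target_categories(scores, goal_category="spatial")
```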
Therefore, the content recommended to the user is determined according to the reading interest portrait of the user and the multivariate intelligent theoretical value, so that the reading interest of the user can be met, and the development requirement of the user can be met.
Further, after step 301, information such as the age, gender, and location of the user may be inferred by statistical analysis of the obtained historical reading data, and reading content may be recommended to the user according to the inferred user information. For example, if many of the picture books read by the user are suitable for children aged 3 to 5, then in step 304 content suitable for children aged 3 to 5 may be recommended to the user; if the historical reading content of the user is content that interests adolescent girls, content suitable for adolescent girls may be recommended to the user.
In addition, reading content can be recommended for the user according to popularity. For example, if picture books A and B, which are suitable for children aged 6 to 10, have high numbers of reading clicks, picture books A and B can be recommended to users aged 6 to 10.
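Age-based filtering combined with popularity ranking can be sketched as follows. The record fields (`min_age`, `max_age`, `clicks`) and the sample catalog are hypothetical.

```python
def popular_for_age(books, age, top_n):
    """Return the titles of the most-clicked books suitable for `age`."""
    suitable = [b for b in books if b["min_age"] <= age <= b["max_age"]]
    suitable.sort(key=lambda b: b["clicks"], reverse=True)
    return [b["title"] for b in suitable[:top_n]]


# Hypothetical catalog of picture books.
books = [
    {"title": "A", "min_age": 6, "max_age": 10, "clicks": 900},
    {"title": "B", "min_age": 6, "max_age": 10, "clicks": 700},
    {"title": "C", "min_age": 6, "max_age": 10, "clicks": 300},
    {"title": "D", "min_age": 3, "max_age": 5, "clicks": 1000},
]
recommended = popular_for_age(books, age=8, top_n=2)
```

Book D has the most clicks overall but is excluded for an 8-year-old because it targets ages 3 to 5.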
In one embodiment, the specific flow for determining recommended content for a user may be as shown in Fig. 4. First, behavior data of the user is obtained, such as reading history, data captured and recorded through computer vision technology, ordering history, and collection history. Information such as the user's age (or age group) and gender is then analyzed from this data, the user's reading interest portrait is constructed, and the user's multivariate intelligent theoretical value is determined. The analyzed information, the reading interest portrait, and the multivariate intelligent theoretical value are input into a recommendation system through a user feature embedding layer. Next, the large amount of reading content in the reading library is filtered, for example according to the user's age (group), gender, popularity, and the like, and the candidate reading content obtained after filtering is input into the recommendation system, where it passes through a dense layer and a dropout (discarding regularization) layer. The dense layer is used for classification, and the dropout layer temporarily drops a portion of the neural network units from the network with a certain probability during training of the deep learning network. Finally, the reading content that has passed through these two layers is input into a binary classification layer (sigmoid), which judges whether each piece of reading content should be recommended to the user, and the reading content determined to be recommended is output.
Based on the same technical concept, an embodiment of the present application further provides a user portrait construction apparatus. As shown in Fig. 5, the apparatus may include:
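The dense, dropout, and sigmoid stages can be illustrated with a small forward pass in plain Python. This is only a sketch under several assumptions: the ReLU activation, the layer sizes, the parameter values, and the idea of feeding one combined user-and-content feature vector are not taken from the application, and a practical system would use a deep learning framework with trained parameters.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer with ReLU activation (ReLU is an assumption)."""
    return [
        max(0.0, sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def dropout(values, rate, training, rng):
    """Inverted dropout: randomly zero units during training only."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recommend_probability(features, params, training=False, rng=None):
    """Score one candidate: dense -> dropout -> sigmoid output unit."""
    hidden = dense(features, params["w1"], params["b1"])
    hidden = dropout(hidden, params["rate"], training, rng or random.Random(0))
    logit = sum(h * w for h, w in zip(hidden, params["w2"])) + params["b2"]
    return sigmoid(logit)


# Hypothetical, untrained parameters and a two-element feature vector.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]], "b1": [0.0, 0.0],
    "rate": 0.5, "w2": [1.0, -1.0], "b2": 0.0,
}
probability = recommend_probability([1.0, 0.0], params)
is_recommended = probability >= 0.5
```

At inference time dropout is a no-op, so the output depends only on the parameters; during training each hidden unit is dropped with probability `rate` and the survivors are rescaled.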
an obtaining module 501, configured to obtain historical reading data of a user;
a determining module 502, configured to obtain a tag of each piece of historical reading content in the historical reading data, and determine a weight of each tag;
a filtering module 503, configured to filter the tags according to the weights;
and a construction module 504, configured to construct a reading interest representation of the user according to the filtered tags.
Optionally, the determining module 502 is specifically configured to:
determining a first weight of a user behavior type corresponding to each label, the times of the user behavior, a time attenuation factor and a second weight determined according to the word frequency;
and determining the weight of the label according to the first weight, the user behavior times, the time attenuation factor and the second weight.
Optionally, the time decay factor is calculated by the following formula:
$$N(t) = N_0 e^{-\alpha (t + l)}$$
where t represents the decay time, N_0 denotes the initial value of the decay, α denotes the decay constant, and l denotes the amount of leftward shift.
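The decay formula above translates directly into code. In this sketch the default values for the initial value, the decay constant, and the shift are assumptions chosen for illustration, as is measuring t in days.

```python
import math


def time_decay(t, n0=1.0, alpha=0.1, shift=0.0):
    """Exponential time decay: N(t) = N0 * exp(-alpha * (t + shift)).

    t:     time elapsed since the behavior (for example, in days)
    n0:    initial value of the decay
    alpha: decay constant; larger values forget old behavior faster
    shift: leftward shift l of the curve
    """
    return n0 * math.exp(-alpha * (t + shift))


today = time_decay(0)          # no decay yet
ten_days_ago = time_decay(10)  # exp(-1) with the default alpha
shifted = time_decay(0, shift=10)
```

A behavior from ten days ago is discounted to about 37% of its initial value with these defaults, and a leftward shift of 10 has the same effect as ten additional days of elapsed time.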
Optionally, the determining module 502 is specifically configured to:
obtaining the label of each piece of historical reading content through one or more of the following modes: a web crawler, automatic generation, and receiving manually input labels.
Optionally, the determining module 502 is specifically configured to:
acquiring text content of each piece of historical reading content, and acquiring keywords from the text content;
determining the weight of each keyword by using a word frequency, maximum entropy and keyword extraction algorithm;
and taking the N keywords with the largest weight as the labels of the historical reading content, wherein N is an integer greater than or equal to 1.
Based on the same technical concept, an embodiment of the present application further provides a content recommendation apparatus, as shown in fig. 6, the apparatus may include:
a construction module 601, configured to construct a reading interest portrait of a user according to any embodiment of the user portrait construction method described above;
a determining module 602, configured to determine a multivariate intelligent theoretical value of a user according to historical reading data of the user, where the multivariate intelligent theoretical value is used to reflect reading conditions of the user in multiple intelligent categories;
and the recommending module 603 is configured to determine content recommended for the user according to the reading interest portrait and the multivariate intelligent theoretical value.
Optionally, the determining module 602 is specifically configured to:
obtaining a label of each piece of historical reading content in the historical reading data, and determining the intelligent category to which the content belongs according to the label;
counting the historical reading number of each intelligent category;
and generating a multivariate intelligent theoretical value according to the historical reading number of each intelligent category.
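The three steps above (map labels to categories, count readings per category, generate the value) might be sketched as follows. The label-to-category mapping reuses example tags from the text; counting each piece of content at most once per category and normalizing the counts to proportions are assumptions.

```python
from collections import Counter

# Hypothetical mapping from label to intelligent category.
LABEL_TO_CATEGORY = {
    "natural spelling": "language",
    "everyday words": "language",
    "Chinese character learning": "language",
    "astronomical knowledge": "naturalist",
    "natural knowledge": "naturalist",
}


def intelligence_values(history, label_to_category):
    """Turn per-category reading counts into proportions.

    `history` is a list of label lists, one per piece of content read.
    """
    counts = Counter()
    for labels in history:
        # A set ensures each piece of content counts once per category.
        categories = {label_to_category[l] for l in labels if l in label_to_category}
        counts.update(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


# Hypothetical reading history of four items.
history = [
    ["natural spelling", "everyday words"],
    ["astronomical knowledge"],
    ["Chinese character learning"],
    ["fairy tale"],
]
values = intelligence_values(history, LABEL_TO_CATEGORY)
```

Two of the three categorized items fall under the language category, so its value is twice that of the naturalist category; the uncategorized "fairy tale" item is ignored.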
Optionally, the apparatus may further include a deep learning module 604, configured to perform deep learning according to the historical reading data;
the recommending module 603 is further configured to determine content recommended for the user according to the deep learning result.
Based on the same technical concept, an embodiment of the present application further provides a user portrait construction device. As shown in Fig. 7, the device 700 includes: at least one processor 710, and a memory 720 communicatively coupled to the at least one processor 710;
the at least one processor 710 is configured to read a program in the memory, and to perform the following steps:
acquiring historical reading data of a user;
obtaining the label of each piece of historical reading content in the historical reading data, and determining the weight of each label;
filtering the tags according to the weights;
and constructing a reading interest portrait of the user according to the filtered tags.
Optionally, when determining the weight of each tag, the processor 710 is specifically configured to:
determining a first weight of a user behavior type corresponding to each label, the times of the user behavior, a time attenuation factor and a second weight determined according to the word frequency;
and determining the weight of the label according to the first weight, the user behavior times, the time attenuation factor and the second weight.
Optionally, the time attenuation factor is calculated by the following formula:
$$N(t) = N_0 e^{-\alpha (t + l)}$$
where t represents the decay time, N_0 denotes the initial value of the decay, α denotes the decay constant, and l denotes the amount of leftward shift.
Optionally, when the processor 710 obtains the tag of each piece of historical reading content, the processor is specifically configured to:
obtaining the label of each piece of historical reading content through one or more of the following modes: a web crawler, automatic generation, and receiving manually input labels.
Optionally, when the processor 710 obtains the tag of each piece of historical reading content in an automatic generation manner, the processor is specifically configured to:
acquiring text content of each piece of historical reading content, and acquiring keywords from the text content;
determining the weight of each keyword by using a word frequency, maximum entropy and keyword extraction algorithm;
and taking the N keywords with the largest weight as the labels of the historical reading content, wherein N is an integer greater than or equal to 1.
Based on the same technical concept, an embodiment of the present application further provides a content recommendation device, as shown in fig. 8, where the device 800 includes: at least one processor 810, a memory 820 communicatively coupled to the at least one processor 810;
the at least one processor 810 is configured to read a program in the memory, and is configured to perform the following steps:
constructing a reading interest portrait of a user according to the user portrait construction method of any of the foregoing embodiments;
determining a multivariate intelligent theoretical value of a user according to historical reading data of the user, wherein the multivariate intelligent theoretical value is used for reflecting the reading condition of the user in multiple intelligent categories;
and determining the content recommended for the user according to the reading interest portrait and the multivariate intelligent theoretical value.
Optionally, when determining the multivariate intelligent theoretical value of the user according to the historical reading data, the processor 810 is specifically configured to:
obtaining a label of each piece of historical reading content in the historical reading data, and determining the intelligent category to which the content belongs according to the label;
counting the historical reading number of each intelligent category;
and generating a multivariate intelligent theoretical value according to the historical reading number of each intelligent category.
Optionally, the processor 810 is further configured to:
and performing deep learning according to the historical reading data, and determining the content recommended for the user according to the deep learning result.
Based on the same technical concept, embodiments of the present application further provide a computer-readable storage medium, where instructions are stored on the computer-readable storage medium, and when executed by a processor, the instructions may implement the user portrait construction method or the content recommendation method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.