CN112749329A - Content search method, content search device, computer equipment and storage medium - Google Patents

Content search method, content search device, computer equipment and storage medium

Info

Publication number
CN112749329A
CN112749329A CN202010354375.3A CN202010354375A CN112749329A CN 112749329 A CN112749329 A CN 112749329A CN 202010354375 A CN202010354375 A CN 202010354375A CN 112749329 A CN112749329 A CN 112749329A
Authority
CN
China
Prior art keywords
user
content
search
class
subclasses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010354375.3A
Other languages
Chinese (zh)
Inventor
彭江军
周智昊
熊欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010354375.3A priority Critical patent/CN112749329A/en
Publication of CN112749329A publication Critical patent/CN112749329A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a content search method, a content search apparatus, a computer device and a storage medium, and belongs to the technical field of computers. The method comprises: receiving a content search request carrying a user identifier and a search term; searching a content database for at least one content corresponding to the search term; and ranking the at least one content according to the user attribute features and user behavior features corresponding to the user identifier and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display. Based on a user collaboration mechanism, the application can complement the interest points of a single user and mine content the user may be interested in, so that such content can be ranked higher, which improves the precision of personalized search and the accuracy of content search.

Description

Content search method, content search device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a content search method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of the internet, the content available on it has become increasingly abundant, and a user can search for the content they need through a computer device. Analysis shows that clicks on content ranked below fifth place account for more than 40% of all clicks, so personalized search is needed to move content that matches the user's interests higher in the ranking.
To implement personalized search, the related art generally builds a user portrait from the user's inherent attributes, browsing behaviors and viewing behaviors, depicting the user's gender, region, favorite content types or categories, and so on. After the user submits a search term, content is recalled, the recall results are ranked using the user portrait, and the ranked results are displayed to the user. Although browsing information from a single user can capture that user's interests over a recent period well, all of the mined information comes from the user's own interests and is limited by the user's historical behavior. The content displayed to the user therefore gradually solidifies, and the content the user actually needs cannot be displayed accurately.
Disclosure of Invention
The embodiment of the application provides a content searching method, a content searching device, computer equipment and a storage medium, which can be used for mining content which a user may be interested in, improving the accuracy of personalized search and improving the accuracy of content search. The technical scheme is as follows:
in one aspect, a content search method is provided, and the method includes:
receiving a content search request, wherein the content search request carries a user identifier and a search word;
searching at least one content corresponding to the search word from a content database according to the search word;
and ranking the at least one content according to the user attribute features and user behavior features corresponding to the user identifier and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display.
In one aspect, a content search apparatus is provided, the apparatus including:
the receiving module is used for receiving a content searching request, and the content searching request carries a user identifier and a searching word;
the recall module is used for searching at least one content corresponding to the search word from a content database according to the search word;
and the sorting module is used for ranking the at least one content according to the user attribute features and user behavior features corresponding to the user identifier and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display.
In one possible implementation, the sorting module is configured to:
acquiring at least one type of feature among a first feature of the search term, a second feature of the at least one content, and a third feature between the search term and the at least one content;
and ranking the at least one content according to the user attribute features and user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the at least one type of feature, to obtain a content search result for display.
In one possible implementation, the sorting module is configured to:
inputting the user attribute features and user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the at least one type of feature into a ranking model;
fusing the input features based on the ranking model, applying activation processing to the fused features, and outputting ranking information of the at least one content;
and ranking the at least one content according to its ranking information to obtain a content search result for display.
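The fusion and activation steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ranking model of the application: fusion is taken to be plain concatenation, the model is a single linear layer with a sigmoid activation, and the names `fuse`, `score`, `rank` and the weight values are hypothetical.

```python
import math


def fuse(*feature_groups):
    """Fuse feature groups by concatenation (an assumed fusion choice)."""
    fused = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def score(user_attr, user_behavior, group_behavior, general, weights, bias=0.0):
    """Linear layer over the fused features followed by a sigmoid activation."""
    fused = fuse(user_attr, user_behavior, group_behavior, general)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank(contents, user_attr, user_behavior, group_behavior, weights):
    """contents: list of (content_id, general_features); best score first."""
    scored = [
        (cid, score(user_attr, user_behavior, group_behavior, general, weights))
        for cid, general in contents
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here the sigmoid output plays the role of the ranking information, and the contents are sorted by it in descending order. In practice the weights would be learned from click data rather than set by hand.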
In one possible implementation, the apparatus further includes:
the first obtaining module is used for obtaining the user attribute characteristics and the user behavior characteristics corresponding to the user identification group to which the user identification belongs from the stored user attribute characteristics and the user behavior characteristics corresponding to at least one user identification group according to the user identification.
In one possible implementation, the apparatus further includes:
the clustering module is used for clustering the at least one user identifier according to the user attribute characteristics and the user behavior characteristics corresponding to the at least one user identifier to obtain at least one user identifier group;
and the second obtaining module is used for obtaining the user behavior characteristics corresponding to the at least one user identification group according to the user behavior characteristics corresponding to the user identifications included in the at least one user identification group.
In one possible implementation, the clustering module is configured to:
dividing the at least one user identifier into a first number of major classes according to the user attribute features corresponding to the at least one user identifier;
for any one of the first number of major classes, clustering the user identifiers in that major class according to their corresponding user behavior features to obtain the subclasses under that major class;
and taking the subclasses under the first number of major classes as the at least one user identifier group.
In one possible implementation, the clustering module is configured to:
determining, according to the user behavior features corresponding to the user identifiers in any one major class and a second number of initial class centers, the subclass to which each of those user identifiers belongs, to obtain the second number of subclasses;
updating the class center of each of the second number of subclasses;
merging, according to the distances between the class centers of different subclasses, the subclasses whose class centers are closer than a distance threshold, to obtain a new class;
updating the class center of the new class obtained by merging;
and repeating the steps of determining the subclass to which each user identifier in the major class belongs, updating the class centers of the subclasses, merging the subclasses whose class centers are closer than the distance threshold, and updating the class center of the merged new class until convergence, and taking the subclasses obtained at convergence as the subclasses under that major class.
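The two-level clustering described above can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text does not fix: Euclidean distance is used, the initial class centers are the behavior vectors of the first few users in each major class, convergence means that the partition stops changing, and the names `cluster_with_merging` and `group_users` are hypothetical.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_with_merging(behavior, init_centers, threshold, max_iter=100):
    """behavior: dict user_id -> behavior vector. Returns a list of subclasses."""
    centers = [list(c) for c in init_centers]
    previous, groups = None, []
    for _ in range(max_iter):
        # Assign every user identifier to the subclass with the nearest center.
        members = defaultdict(list)
        for uid, vec in behavior.items():
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(vec, centers[i]))
            members[nearest].append(uid)
        groups = [members[i] for i in sorted(members)]  # empty subclasses drop out
        # Update the class center of each subclass.
        centers = [mean([behavior[u] for u in g]) for g in groups]
        # Merge subclasses whose centers are closer than the threshold,
        # updating the center of each merged class as we go.
        merged = True
        while merged:
            merged = False
            for i in range(len(groups)):
                for j in range(i + 1, len(groups)):
                    if math.dist(centers[i], centers[j]) < threshold:
                        groups[i] = groups[i] + groups.pop(j)
                        centers.pop(j)
                        centers[i] = mean([behavior[u] for u in groups[i]])
                        merged = True
                        break
                if merged:
                    break
        # Treat an unchanged partition as convergence (an assumed criterion).
        partition = {frozenset(g) for g in groups}
        if partition == previous:
            break
        previous = partition
    return groups


def group_users(attributes, behavior, k_init, threshold):
    """attributes: dict user_id -> hashable attribute tuple (the major class)."""
    majors = defaultdict(list)
    for uid, attr in attributes.items():
        majors[attr].append(uid)
    groups = []
    for uids in majors.values():
        sub = {u: behavior[u] for u in uids}
        init = [sub[u] for u in uids[:k_init]]  # assumed initialization
        groups.extend(cluster_with_merging(sub, init, threshold))
    return groups
```

Because close subclasses are merged, the number of subclasses under a major class can end up smaller than the second number of initial class centers.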
In one aspect, a computer device is provided, which includes one or more processors and one or more memories having at least one program code stored therein, which is loaded and executed by the one or more processors to implement the above-mentioned content search method.
In one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the at least one program code being loaded and executed by a processor to implement the above-mentioned content search method.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
after a content search request is received, the corresponding content is recalled from the content database according to the search term carried in the request, and the recalled content is then ranked according to the user identifier carried in the request and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display to the user. In this technical solution, the recall results for the search term are ranked according to both the user's individual behavior and the behavior of the user's group. Because the behavior of the user's group is likely to be behavior the user may also exhibit, this user collaboration mechanism can complement the interest points of a single user and mine content the user may be interested in, so that such content can be ranked higher, which improves the precision of personalized search and the accuracy of content search.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a content search method provided by an embodiment of the present application;
Fig. 2 is a flowchart of a content search method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a search ranking feature configuration provided by an embodiment of the present application;
Fig. 4 is a flowchart of a content search method provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a search ranking feature configuration provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a content search apparatus provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, some terms related to the embodiments of the present application will be explained.
Personalized search: unlike a search that returns the same ranked results to all users, personalized search provides a tailored search experience, so that for the same search term different users are shown different search results.
Artificial Intelligence (AI): the theory, methods, technologies and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Its basic technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like.
Machine Learning (ML)/deep learning: a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The scheme provided by the embodiment of the application relates to the machine learning technology of artificial intelligence, and provides a content searching method, and the specific implementation process of the method is explained by the following embodiment.
Fig. 1 is a schematic diagram of an implementation environment of a content search method provided in an embodiment of the present application, and referring to fig. 1, the implementation environment may include a terminal 101 and a server 102.
The terminal 101 is connected to the server 102 through a wireless network or a wired network. The terminal 101 may be a smart phone, a tablet computer, a portable computer, or the like. An application program supporting content search is installed and runs on the terminal 101. Illustratively, the terminal 101 is a terminal used by a user, and a user account is logged in to the application running on the terminal 101.
The server 102 may be a cloud computing platform, a virtualization center, or the like. The server 102 provides background services for applications that support content search. Optionally, the server 102 undertakes the primary content search work and the terminal 101 undertakes the secondary content search work; or the server 102 undertakes the secondary content search work and the terminal 101 undertakes the primary content search work; or the server 102 or the terminal 101 undertakes the content search work alone.
Optionally, the server 102 comprises an access server, a content search server and a database. The access server provides access services for the terminal 101. The content search server provides background services related to content search. The database may include a content database, a user information database, and the like. There may be one or more content search servers, and they may correspond to different databases depending on the services they provide. When there are multiple content search servers, at least two of them provide different services, and/or at least two of them provide the same service, for example in a load-balancing manner, which is not limited in the embodiments of the present application.
The terminal 101 may generally refer to one of a plurality of terminals; this embodiment uses the terminal 101 only as an example.
Those skilled in the art will appreciate that the number of terminals may be greater or smaller. For example, there may be only one terminal, or there may be tens or hundreds of terminals or more, in which case the implementation environment also includes the other terminals. The number of terminals and the device types are not limited in the embodiments of the present application.
Fig. 2 is a flowchart of a content search method according to an embodiment of the present application. The method is executed by a computer device, which may be a terminal or a server, and referring to fig. 2, the method may include:
201. The computer device receives a content search request, the content search request carrying a user identifier and a search term.
The user identifier may be that of any user who submits a content search request to the computer device; for example, it may be the user account logged in to the application program. The search term (query) may comprise one word or a plurality of words.
202. The computer device searches at least one content corresponding to the search word from the content database according to the search word.
The content database may be used to store content published by each user to the internet, including any form of content such as video, audio, pictures or documents.
Searching the content database for content corresponding to (matching) the search term may also be referred to as a recall process. The content corresponding to the search term may include content that hits the search term, content that hits a term having a similarity greater than a similarity threshold with the search term, or content that hits a keyword in the search term.
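The three recall conditions above can be sketched in Python as follows. This is an illustrative sketch, not the recall engine of the application: the content database is reduced to a dictionary of titles, keywords are obtained by whitespace splitting, and `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used. The function name `recall` and the threshold value are assumptions.

```python
from difflib import SequenceMatcher


def recall(query, database, similarity_threshold=0.8):
    """database: dict content_id -> title. Returns the recalled content ids."""
    query = query.lower()
    keywords = query.split()
    recalled = []
    for content_id, title in database.items():
        text = title.lower()
        tokens = text.split()
        # Content that hits the whole search term.
        hits_query = query in text
        # Content that hits a keyword of the search term.
        hits_keyword = any(k in tokens for k in keywords)
        # Content that hits a term similar enough to a keyword.
        hits_similar = any(
            SequenceMatcher(None, k, t).ratio() > similarity_threshold
            for k in keywords
            for t in tokens
        )
        if hits_query or hits_keyword or hits_similar:
            recalled.append(content_id)
    return recalled
```

The result depends only on the search term and the database, which is why a separate ranking step is needed for personalization.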
The at least one content found in step 202 may be unordered or may have a default order. For different users who submit the same search term, the content found in step 202 is the same; if it were presented directly, every user would see the same content search results and personalized search could not be achieved. The results therefore need to be further ranked using user information.
203. The computer device ranks the at least one content according to the user attribute features and user behavior features corresponding to the user identifier and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display.
The user attribute feature corresponding to any user identifier may be generated from the user attribute data of the user to whom the user identifier (ID) belongs, where the user attribute data includes age, gender, region, and the like. The computer device may generate the user attribute feature from the user attribute data using a preset feature transformation, for example by converting the user attribute data into numerical values. The user behavior feature corresponding to any user identifier may be generated from the historical behavior data, with respect to content, of the user to whom the user identifier belongs, and reflects the user's interest points over a period of time. The historical behavior data includes browsing behavior data and click behavior data. The computer device may generate the user behavior feature from the user's historical behavior data using a preset feature generation method, for example by converting the historical behavior data into numerical values. The user behavior feature corresponding to any user identifier group may be the average of the behavior features corresponding to the user identifiers in that group, and reflects the interest points of all users in the group to which the user belongs.
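The group-level behavior feature described above is a component-wise average, which can be sketched in Python as follows. The function name `group_behavior_feature` and the dictionary layout are assumptions made for illustration.

```python
def group_behavior_feature(member_ids, behavior_by_user):
    """Average the behavior vectors of the user identifiers in one group."""
    vectors = [behavior_by_user[uid] for uid in member_ids]
    if not vectors:
        raise ValueError("a user identifier group must have at least one member")
    # zip(*vectors) walks the vectors dimension by dimension.
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

For a group of two users with behavior vectors `[0.2, 0.8]` and `[0.4, 0.2]`, the group feature is approximately `[0.3, 0.5]`.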
Through this user collaboration mechanism, the interest points of a single user can be complemented and content the user may be interested in can be mined, so that such content can be ranked higher during ranking. This adds novelty to the user's personalized search and improves the accuracy of content search.
According to the method provided by the embodiment of the application, after a content search request is received, the corresponding content is recalled from the content database according to the search term carried in the request, and the recalled content is then ranked according to the user identifier carried in the request and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display to the user. In this technical solution, the recall results for the search term are ranked according to both the user's individual behavior and the behavior of the user's group. Because the behavior of the user's group is likely to be behavior the user may also exhibit, this user collaboration mechanism can complement the interest points of a single user and mine content the user may be interested in, so that such content can be ranked higher, which improves the precision of personalized search and the accuracy of content search.
The flow shown in fig. 2 is a basic flow of the embodiment of the present application, and in the embodiment corresponding to fig. 4, a detailed flow of the embodiment of the present application will be described based on the basic flow shown in fig. 2.
Before describing the detailed flow of the embodiments of the present application, description will be given to feature types that can be used in the sorting of the embodiments of the present application.
Referring to fig. 3, a schematic diagram of a search ranking feature configuration is provided, and as shown in fig. 3, the features used for ranking may include a first feature (Query dimension feature), a second feature (Document dimension feature), a third feature (Query-Document dimension feature), and a user dimension feature.
The first, second and third features may be referred to as general features; in a general (non-personalized) search they are the same for every user who submits the same search term.
The first feature, namely the Query dimension feature, is a feature related to the search term and can be obtained by analyzing the search term. The first feature may include the entity in the search term, the freshness of the search term, the category tendency of the search term, and the weights of the word segments of the search term. For example, for the search term "star A married", search term analysis can give the entity (star A), the freshness (day), the category tendency (celebrity event), and the word-segment weights: segmenting "star A married" yields the two segments "star A" and "married", with weights of 0.6 and 0.4 respectively. These are all Query dimension features.
The second feature, namely the Document dimension feature, is a feature related to the recalled content (the content returned by the recall engine, that is, the content found according to the search term) and can be obtained by analyzing the recalled content. The second feature may include the publication time (upload time) of the content, the content quality, the number of clicks, and the number of exposures. For example, the search term is "star A married" and one of the recalled documents is titled "Star A discloses the news and will marry in the near future"; analyzing that document gives its publication time, its clarity, and the number of times it has historically been clicked and exposed, all of which are Document dimension features.
The third feature, namely the Query-Document dimension feature, is a feature between the search term and a recalled document and can be obtained by analyzing the search term together with the recalled content. The third feature may include the keyword relevance between the search term and the recalled document, the number of hits, the semantic relevance, and the number of clicks accumulated in the Query-Document dimension, among others. For example, the search term is "star A married" and one of the recalled documents is titled "Star A discloses the news and will marry in the near future". Analyzing the keyword relevance, the number of hits and the semantic relevance between "star A married" and that title, together with the number of clicks on this document accumulated after users entered the search term "star A married", yields Query-Document dimension features.
The user dimension feature is also a user portrait feature, and may include a user attribute feature and a user behavior feature (a user behavior feature corresponding to a single user identifier), which is a basis for personalization.
The user attribute feature corresponding to any user identifier may be generated from the user attribute data of the user to whom the user identifier (ID) belongs, where the user attribute data includes age, gender, region, and the like. The computer device may generate the user attribute feature from the user attribute data using a preset feature transformation, for example by converting the user attribute data, such as the gender and region in Table 1, into numerical values. The user attribute features are shown in Table 1.
TABLE 1
User identifier | Age | ... | Gender | ... | Region
763245 | 32 | ... | 1 (female) | ... | 1 (first-tier city)
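The numerical conversion of attribute data shown in Table 1 can be sketched in Python as follows. Only the codes that appear in Table 1 (female = 1, first-tier city = 1) come from the source; the remaining codes, the lookup tables and the function name `encode_user_attributes` are assumptions made for illustration.

```python
# female = 1 follows Table 1; male = 0 is an assumed code.
GENDER_CODES = {"male": 0, "female": 1}
# first-tier city = 1 follows Table 1; any other region falls back to an assumed 0.
REGION_CODES = {"first-tier city": 1}


def encode_user_attributes(age, gender, region):
    """Turn raw attribute data into a numeric user attribute feature vector."""
    return [age, GENDER_CODES[gender], REGION_CODES.get(region, 0)]
```

With these codes, the user in Table 1 is encoded as `[32, 1, 1]`.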
Besides the user attribute characteristics, namely age, gender and region, the computer equipment can also count the historical behavior data of the user on the content and generate the user behavior characteristics based on the historical behavior data.
The user behavior features may include comprehensive features of the content the user is interested in (the content the user clicked on). Taking video as an example, the computer device may collect the viewing history data over a period of time, for example 30 days, clean it (for example, by de-duplicating repeated clicks), and use the cleaned behavior data to generate the comprehensive features of the videos the user is interested in, as shown in Table 2.
TABLE 2
User identifier | Relevance | Newness | ... | ... | Duration | Viewing time
763245 | 0.7 | 0.5 | ... | ... | 1200s | 120s
The features shown in Table 2 are specific to the individual user. The relevance value (0.7) in Table 2 indicates the degree of relevance between the videos the user is interested in and the search term, and lies between 0 and 1. The newness value (0.5) in Table 2 represents the newness of the videos the user is interested in; it lies between 0 and 1 and indicates how popular a video is at the current point in time. The duration value (1200s) in Table 2 indicates the length of the video. The viewing time value (120s) in Table 2 indicates how long the user watched the video, for example 120s.
The user behavior features may also include type features of the content the user is interested in. The computer device may categorize that content by type. Taking video as an example, the computer device may categorize videos into movies, TV series, animation, sports, entertainment, games, podcasts, hot topics, documentaries, variety shows, news, finance, fashion, travel, education, and the like (not all shown in Table 3). By counting the type categories of the videos the user has watched, the computer device obtains normalized data on the video types the user is interested in (likes), which serves as the type feature of the content the user is interested in, as shown in Table 3. The values in Table 3 sum to 1.
TABLE 3
User identification Film TV play Sports Comprehensive art ... Finance and economics
763245 0.2 0.3 0.02 0.15 ... 0.01
The value corresponding to any type of video in table 3 is the interest value of the user in such video, for example, the value (0.2) corresponding to the movie in table 3 means that the interest value of the user in the video is 0.2, and other types are the same and are not described in detail. The interest value is used for representing the interest degree of the user, and the greater the value, the greater the interest degree of the user.
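The normalization behind Table 3 can be illustrated with a short sketch: count the types of the videos a user watched and divide each count by the total, so that the values sum to 1. The function name type_features and the sample viewing list are assumptions made for illustration only.

```python
from collections import Counter

def type_features(watched_types):
    """Return the share of each video type among the videos a user watched."""
    counts = Counter(watched_types)
    total = sum(counts.values())
    if total == 0:
        return {}  # no viewing history, no type feature
    return {video_type: n / total for video_type, n in counts.items()}

# Hypothetical list of the types of ten videos watched by one user.
watched = ["movie"] * 2 + ["tv_series"] * 3 + ["sports"] + ["variety"] * 4
features = type_features(watched)
```

The same counting and normalization would apply to the content categories of Table 4.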
The advantage of categorizing by video type is that, when a search term matches videos in several video categories at once, the computer device can give priority to the category the user prefers. For example, if the search term (query) is "If You Are the One" and the user likes movies, the movie can be ranked ahead of the variety show.
In addition to categorizing by video type, the computer device may also categorize by video content, for example dividing videos into ancient costume, music, kids, adults, speech, workplace, and so on (not all shown in Table 4). The computer device counts the content categories of the videos watched by the user and normalizes the counts, and the resulting distribution over video content is used as the type feature of the content the user is interested in (likes), as shown in Table 4, where the values in Table 4 sum to 1.
TABLE 4
User identification | Ancient costume | Fantasy | Workplace
763245 | 0.2 | 0.5 | 0.01
The value for any content type in Table 4 is the user's interest value for that video content. For example, the value (0.2) for ancient costume in Table 4 means that the user's interest value for ancient-costume video content is 0.2; the other content types are read in the same way and are not described one by one.
The user behavior features may also include time features of the content the user is interested in. Taking video as an example, the time may be the time at which the user watches videos. This is also an important user behavior feature: it reflects when the user likes to watch videos, and behind it lies the regular pattern of leisure periods in the user's daily routine, so the similarity of behavior between users can be captured well from this feature. The distribution of the time feature is shown in Table 5.
TABLE 5
User identification | 1:00-3:00 | 11:00-13:00 | 19:00-21:00
763245 | 0.01 | 0.15 | 0.4
The value for any time period in Table 5 may be the ratio of the number of times the user watched videos in that period to the total number of times the user watched videos across all periods, and may indicate how receptive the user is to watching videos in that period. Table 5 reflects the user's viewing time distribution; the user's receptiveness to recalled content differs between time periods. Taking video as an example, a user is more receptive to long videos during relatively idle periods, that is, more willing to spend time watching them.
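The ratio described for Table 5 could be computed as in the following sketch. The two-hour periods aligned to odd hours (1-3, 11-13, 19-21) are inferred from the column examples of Table 5; the function names period_label and time_features and the sample hours are hypothetical.

```python
from collections import Counter

def period_label(hour):
    """Map an hour of day (0-23) to a two-hour period label such as '19-21'."""
    start = ((hour - 1) % 24) // 2 * 2 + 1
    return f"{start:02d}-{(start + 2) % 24:02d}"

def time_features(view_hours):
    """Share of a user's viewings that fall into each two-hour period."""
    total = len(view_hours)
    if total == 0:
        return {}
    counts = Counter(period_label(h) for h in view_hours)
    return {label: n / total for label, n in counts.items()}

# Hypothetical start hours of five viewings by one user.
view_hours = [19, 20, 20, 12, 2]
dist = time_features(view_hours)
```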
As can be seen from Tables 1 to 5, a user's behavior features are very sparse, that is, the values of a large number of features are close to 0. If the user's behavior features are used directly, the ranking model learns ever higher weights for the non-sparse features, content with those features is ranked ever higher, and the user's personalized search results concentrate in a few particularly narrow classes. As a result, fresh content from different categories cannot be brought to the user, who may lose interest over time. Therefore, in the technical solution provided by the embodiments of the present application, the recall results are ranked according to both the user's individual behavior and the behavior of the group to which the user belongs. This user-collaboration mechanism can complement the interest points of a single user and mine content the user may be interested in, so that such content can be moved forward in the ranking, which improves the precision of personalized search and the accuracy of content search. The specific technical solution is described in the embodiment corresponding to Fig. 4.
Fig. 4 is a flowchart of a content search method according to an embodiment of the present application. The method is performed by a computer device, and referring to fig. 4, the method may comprise:
401. The computer device clusters the at least one user identifier according to the user attribute features and the user behavior features corresponding to the at least one user identifier, to obtain at least one user identifier group.
According to the user attribute features and the user behavior features corresponding to the at least one user identifier, the computer device can cluster together the user identifiers whose user attribute features and user behavior features are close to each other, to obtain user identifier groups. Taking video content as an example, with the user behavior features being the user's behavior toward the content, the computer device can cluster users who have the same user attributes and the same browsing and viewing habits, so that users with the same behavior are gathered together as far as possible and personalized generalization can be performed using the features of the cluster.
Regarding clustering, the number of clusters k (k is an integer greater than 0) need not be specified, because it is difficult to know in advance how many classes the users should be divided into. The embodiments of the present application may use a hierarchical clustering method that does not specify the number of clusters k.
The computer equipment can combine the user attribute characteristics such as age, gender and region of the user and the user behavior characteristics such as watching time and watching type into clustering characteristics, and can obtain a characteristic matrix X of the user.
[Formula image BDA0002472965700000121: definition of the feature matrix X]
where each X_i in the feature matrix X denotes the user portrait feature of one user (the i-th user), which includes the user attribute features and the user behavior features and may be represented as a feature vector.
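One way to picture the feature matrix X is as a list of per-user rows, each row concatenating attribute features and behavior features. The sketch below is illustrative only: the numeric encoding of gender and region, the chosen behavior keys, the attribute values and the second user are all assumptions, and the formula in the original figure is not reproduced here.

```python
def user_vector(attrs, behaviors, behavior_keys):
    """Concatenate attribute and behavior features into one row of X."""
    # Assumption: attributes are already encoded as numbers.
    attribute_part = [float(attrs["age"]), float(attrs["gender"]), float(attrs["city_tier"])]
    # Missing behavior features are treated as 0, which keeps the row sparse.
    behavior_part = [float(behaviors.get(key, 0.0)) for key in behavior_keys]
    return attribute_part + behavior_part

behavior_keys = ["movie", "tv_series", "sports", "period_19_21"]
users = {
    763245: ({"age": 28, "gender": 1, "city_tier": 1},
             {"movie": 0.2, "tv_series": 0.3, "sports": 0.02, "period_19_21": 0.4}),
    100001: ({"age": 45, "gender": 0, "city_tier": 3},
             {"movie": 0.6}),
}
user_ids = sorted(users)
X = [user_vector(*users[uid], behavior_keys) for uid in user_ids]
```

In practice categorical attributes would more likely be one-hot encoded and the columns scaled before any distance is computed; that choice is left open here.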
In general, the feature matrix X is very large. To reduce computational complexity, in the embodiments of the present application all users may first be divided into major classes at coarse granularity, and the users in each major class are then divided into subclasses.
In one possible implementation, the step 401 may include the following steps one to three:
Step one: divide the at least one user identifier into a first number of major classes according to the user attribute features corresponding to the at least one user identifier.
The computer device may divide all of the at least one user identifier at coarse granularity according to the user attribute features, which reduces computational complexity. Specifically, the computer device may divide each type of user attribute feature into several categories and combine the categories of different attribute types with one another; the user identifiers that fall into the same combined category are taken as one major class. A plurality of major classes is thus obtained, which completes the first-level division.
For example, the computer device may divide all user identifiers into 20 to 30 major classes according to age and region. Specifically, the computer device may divide age into 5 categories, such as 10-20, 21-30, 31-40, 41-50 and 51-60 years old, divide region into 5 categories, such as first-tier, second-tier, third-tier, fourth-tier and fifth-tier cities, and then combine the 5 age categories with the 5 region categories to obtain 25 major classes: 10-20 years old in a first-tier city, 10-20 years old in a second-tier city, ..., and 51-60 years old in a fifth-tier city.
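The first-level division in the example above can be sketched as a simple grouping by (age band, city tier). The names AGE_BANDS, age_band and major_classes are hypothetical; the last age band is taken as 51-60 so that the bands do not overlap, and the handling of ages outside the listed bands is an assumption.

```python
from collections import defaultdict

AGE_BANDS = [(10, 20), (21, 30), (31, 40), (41, 50), (51, 60)]

def age_band(age):
    """Return the label of the age band that contains age."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"  # assumption: ages outside the listed bands share one bucket

def major_classes(users):
    """Group user identifiers by the combination (age band, city tier)."""
    groups = defaultdict(list)
    for user_id, attrs in users.items():
        groups[(age_band(attrs["age"]), attrs["city_tier"])].append(user_id)
    return dict(groups)

# Hypothetical users with already-known attributes.
users = {
    1: {"age": 18, "city_tier": 1},
    2: {"age": 20, "city_tier": 1},
    3: {"age": 35, "city_tier": 2},
}
classes = major_classes(users)
```

With 5 age bands and 5 city tiers this grouping yields at most 25 major classes, and each can then be clustered independently.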
Step two: for any one of the first number of major classes, cluster the user identifiers included in that major class according to the user behavior features corresponding to those user identifiers, to obtain the subclasses under that major class.
After the first-level division of the user identifiers, the computer device may perform second-level clustering of the user identifiers in each major class by specifying a minimum distance between class centers, which has the advantage that the number of classes does not need to be specified.
In one possible implementation, the second step may include the following steps a to e:
Step a: determine the subclass to which each user identifier included in the major class belongs, according to the user behavior features corresponding to the user identifiers included in the major class and a second number of initial class centers, to obtain a second number of subclasses.
The second number of initial class centers may be any designated class centers, or may be a second number of user behavior features arbitrarily selected from the user behavior features corresponding to all user identifiers.
Each initial class center represents a different subclass. Following the K-Means algorithm (a hard clustering algorithm), the computer device may specify a relatively large number of classes N (the second number), for example N = 100 (N may be as large as possible and may be set freely). The computer device may then determine, according to the K-Means algorithm, which subclass each user identifier included in the major class belongs to.
Specifically, for any user identifier included in the major class, the computer device may, according to the distance (similarity) between the user behavior feature corresponding to the user identifier and each initial class center, assign the user identifier to the subclass represented by the initial class center closest to it, and take that subclass as the subclass to which the user identifier belongs. Once all user identifiers in the major class have been assigned, a second number of subclasses is obtained.
Step b: update the class centers of the second number of subclasses.
The computer device can calculate the class center of each subclass according to the K-Means algorithm and use the calculated class center as the updated class center of that subclass. Specifically, for any subclass, the computer device may use the average of the user behavior features corresponding to all the user identifiers in that subclass as the updated class center of that subclass.
Step c: according to the distances between the class centers of different subclasses among the second number of subclasses, merge the subclasses whose class centers are closer than a distance threshold, to obtain a new class.
The computer device may set a distance threshold for controlling the minimum distance between class centers of different subclasses, and for class centers of subclasses generated in each iteration, the computer device may automatically merge two subclasses into a new class if the distance between the class centers of any two subclasses is less than the distance threshold.
Step d: update the class center of the new class obtained by merging.
The computer device can calculate the class center of the new class obtained by combination according to the K-Means algorithm, and the calculated class center is used as the updated class center of the new class. Specifically, the computer device may use an average of the user behavior characteristics corresponding to all the user identifiers in the new class as an updated class center of the new class.
Step e: repeat the steps of determining the subclass to which each user identifier included in the major class belongs, updating the class centers of the subclasses, merging the subclasses whose class centers are closer than the distance threshold, and updating the class center of the new class obtained by merging, until convergence; the subclasses obtained at convergence are taken as the subclasses under the major class.
The computer device may repeat steps a to d above, with the number of subclasses changing over the iterations, until the class center of each subclass no longer changes, that is, until convergence; the subclasses obtained at that point are taken as the subclasses under the major class.
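Steps a to e above can be sketched in plain Python as a K-Means-style loop with an added merge step. This is a minimal illustration under stated assumptions, not the embodiment's implementation: Euclidean distance, random selection of initial centers with a fixed seed, the tolerance tol used to detect convergence, the max_iter cap, and all function names are hypothetical choices.

```python
import math
import random

def mean(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest(point, centers):
    """Index of the class center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def cluster_with_merging(points, n_init, min_center_dist,
                         max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Initial class centers: behavior vectors picked at random.
    centers = [list(p) for p in rng.sample(points, min(n_init, len(points)))]
    for _ in range(max_iter):
        # Step a: assign every point to its nearest class center.
        labels = [nearest(p, centers) for p in points]
        # Step b: collect the members of each subclass; empty ones are dropped.
        members = [[p for p, lab in zip(points, labels) if lab == j]
                   for j in range(len(centers))]
        members = [m for m in members if m]
        # Steps c and d: merge subclasses whose centers are closer than the
        # threshold; centers are recomputed as means after every merge.
        merged = True
        while merged:
            merged = False
            current = [mean(m) for m in members]
            for i in range(len(current)):
                for j in range(i + 1, len(current)):
                    if math.dist(current[i], current[j]) < min_center_dist:
                        members[i] = members[i] + members.pop(j)
                        merged = True
                        break
                if merged:
                    break
        new_centers = [mean(m) for m in members]
        # Step e: stop once the class centers no longer change.
        converged = len(new_centers) == len(centers) and all(
            math.dist(a, b) < tol for a, b in zip(new_centers, centers))
        centers = new_centers
        if converged:
            break
    return [nearest(p, centers) for p in points], centers

# Two well-separated hypothetical groups of behavior vectors.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centers = cluster_with_merging(points, n_init=4, min_center_dist=1.0)
```

Starting from four initial centers, the merge step collapses the centers that fall inside the same group, so the number of subclasses is determined by the distance threshold rather than fixed in advance.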
Step three: take the subclasses under the first number of major classes as the at least one user identifier group.
Through step two, the computer device can obtain the subclasses under each of the first number of major classes, and takes all of these subclasses as the at least one user identifier group finally obtained by clustering.
With this hierarchical clustering method, the number of subclasses may differ from one major class to another, because second-level clustering can yield a different number of subclasses for each of the major classes obtained at the first level.
In this method, all users are first divided at the first level according to user attribute features to obtain a plurality of major classes, and the users in each major class are then clustered according to user behavior features to obtain the subclasses under each major class; each subclass is taken as a user identifier group, so that at least one user identifier group is obtained. Clustering users with similar user attribute features and user behavior features in this way gathers users with the same behavior together as far as possible, so that personalized generalization can be achieved using the features of the group.
402. The computer device acquires the user behavior features corresponding to the at least one user identifier group according to the user behavior features corresponding to the user identifiers included in the at least one user identifier group.
After the computer device obtains the at least one user identifier group through step 401, for any user identifier group in the at least one user identifier group, the computer device may average the user behavior features corresponding to the user identifiers included in that group and use the average as the user behavior features corresponding to that group.
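The averaging in step 402 can be illustrated as follows. The function name group_behavior_features, the group identifiers and the two-dimensional behavior vectors are hypothetical.

```python
def group_behavior_features(groups, user_behavior):
    """Average the behavior vectors of the users in each user identifier group."""
    result = {}
    for group_id, user_ids in groups.items():
        vectors = [user_behavior[uid] for uid in user_ids]
        result[group_id] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return result

# Hypothetical behavior vectors and group membership.
user_behavior = {1: [0.2, 0.8], 2: [0.4, 0.6], 3: [1.0, 0.0]}
groups = {"g0": [1, 2], "g1": [3]}
group_features = group_behavior_features(groups, user_behavior)
```

Because the group vector is an average, features that are 0 for one user can be non-zero for the group, which is how the group complements a sparse individual profile.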
403. The computer device stores the user behavior characteristics corresponding to the at least one user identification group.
The computer device can store the at least one user identification group obtained by clustering together with the corresponding user behavior characteristics, so that it can directly retrieve the stored user behavior characteristics corresponding to any user identification group when needed.
Each user identification group includes one or more user identifications. The users to which these identifications belong may subsequently generate new behavior data on the Internet, and new users may register, so the computer device may also periodically update the stored user identification groups and their corresponding user behavior characteristics.
It should be noted that steps 401 to 403 are optional steps: they are executed in advance of content search and need not be executed each time a content search is performed.
404. The computer device receives a content search request, the content search request carrying a user identification and a search term.
The computer device may receive a content search request sent by another device, for example, the computer device is a server, and a user may input a search term on a terminal and trigger a content search request for the search term, for example, click a search button, so that the terminal may send the content search request carrying the search term input by the user and a user identifier of the user to the server.
In some possible embodiments, the computer device may also receive a content search request triggered by a user on the computer device, for example, the computer device is a terminal, and after the content search request is triggered by the user on the terminal, the terminal may receive the content search request.
405. The computer device searches the content database, according to the search term carried in the content search request, for at least one content corresponding to the search term.
This step 405 is the same as step 202 in the embodiment corresponding to fig. 2, and is not described here again.
406. According to the user identifier carried in the content search request, the computer device acquires the user attribute features and user behavior features corresponding to the user identifier, and the user behavior features corresponding to the user identifier group to which the user identifier belongs, from the stored user attribute features and user behavior features corresponding to at least one user identifier and the stored user behavior features corresponding to at least one user identifier group.
The computer device may obtain, according to the user identifier carried in the content search request, the user attribute feature and the user behavior feature corresponding to the user identifier carried in the content search request from among the user attribute features and the user behavior features corresponding to the stored at least one user identifier, and obtain, from among the user behavior features corresponding to the stored at least one user identifier group, the user behavior feature corresponding to the user identifier group to which the user identifier belongs.
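The retrieval in step 406 amounts to two keyed lookups: one for the user's own features and one for the features of the user's group. The sketch below uses plain dictionaries as a stand-in for whatever storage the computer device actually uses; the store layout, the group key format, the function name lookup_features and all values are hypothetical.

```python
def lookup_features(user_id, user_store, user_to_group, group_store):
    """Fetch a user's own features and the behavior features of the user's group."""
    attrs, behavior = user_store[user_id]
    group_behavior = group_store[user_to_group[user_id]]
    return attrs, behavior, group_behavior

# Hypothetical precomputed stores (the results of steps 401 to 403).
user_store = {763245: ({"age": 28, "city_tier": 1}, [0.2, 0.3, 0.02])}
user_to_group = {763245: "21-30/tier1/sub3"}
group_store = {"21-30/tier1/sub3": [0.25, 0.28, 0.05]}

attrs, behavior, group_behavior = lookup_features(
    763245, user_store, user_to_group, group_store)
```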
It should be noted that the step 406 is an optional step, and in some possible embodiments, when receiving a content search request, the computer device may also obtain historical behavior data corresponding to the user identifier according to the user identifier carried in the content search request, generate a user behavior feature corresponding to the user identifier according to the historical behavior data, and perform clustering according to the user behavior feature corresponding to the user identifier and user behavior features corresponding to other user identifiers to obtain a user identifier group to which the user identifier belongs and user behavior features corresponding to the user identifier group.
407. The computer device ranks the at least one content according to the user attribute features and user behavior features corresponding to the user identifier and the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display.
According to the user attribute features and user behavior features corresponding to the user identifier carried in the content search request, and the user behavior features corresponding to the user identifier group to which the user identifier belongs, the computer device can move forward in the ranking those contents, among the at least one content found for the search term, that match both sets of user behavior features.
In one possible implementation, the step 407 may include the following steps a and B:
Step A: acquire at least one type of feature among a first feature of the search term, a second feature of the at least one content, and a third feature between the search term and the at least one content.
The first, second and third features have been described before in relation to the embodiment of fig. 3 and will not be described further here. The first feature, the second feature and the third feature may be obtained by analyzing, by the computer device, the search term and the recalled content after receiving the content search request and recalling the corresponding content according to the search term carried in the content search request.
Through this step a, the computer device may obtain at least one type of feature selected from the first feature of the search term, the second feature of the at least one content, and the third feature between the search term and the at least one content.
Step B: rank the at least one content according to the user attribute features and user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the at least one type of feature, to obtain a content search result for display.
According to the user attribute features and user behavior features corresponding to the user identifier carried in the content search request, the user behavior features corresponding to the user identifier group to which the user identifier belongs, and the at least one type of feature among the first feature, the second feature and the third feature, the computer device may move forward in the ranking those contents, among the at least one content, that match these features.
Referring to Fig. 5, a schematic diagram of the composition of search ranking features is provided. As shown in Fig. 5, in addition to the features shown in Fig. 3, the ranking features include, among the user-dimension features, the user behavior features of the group to which the user belongs (the user behavior features corresponding to the user identifier group to which a single user identifier belongs).
In one possible implementation, step B includes the following steps B1 and B2:
Step B1: input the user attribute features and user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the at least one type of feature into a ranking model, fuse the input features based on the ranking model, apply activation processing to the fused features, and output ranking information of the at least one content.
The ranking model is used for outputting ranking information of recalled content according to user behavior characteristics corresponding to the input user identification, user behavior characteristics corresponding to the user identification group, and at least one type of characteristics of first characteristics of search terms, second characteristics of content recalled by the search terms or third characteristics between the search terms and the recalled content (the content searched according to the search terms). The ranking information may be a ranking score used to determine the ranking order of the content.
The ranking model may be obtained by the computer device through training on training data using a machine learning method. The training data may include the user attribute features and user behavior features corresponding to sample user identifiers, the user behavior features corresponding to sample user identifier groups, and at least one type of feature among the first features of sample search terms, the second features of the content recalled by the sample search terms, and the third features between the sample search terms and the recalled content. The computer device may input the training data into an initial model, which fuses the input features, applies activation processing to the fused features based on an activation function (such as a sigmoid function), and outputs ranking information of the content. The computer device then adjusts the model parameters according to the output of the initial model until a target number of training iterations is reached or the model converges, and takes the model obtained at that point as the ranking model. The ranking model may be a DNN (Deep Neural Network) model or a GBDT (Gradient Boosting Decision Tree) model; the type of model is not limited in the embodiments of the present application.
The ranking model processes features in use in the same way as during training: it performs fusion processing and activation processing and outputs the ranking information of the content.
For personalized search, if information is collected for each person and a model is built for each person, the model gradually solidifies over iterations and lacks generalization ability at later stages. In the embodiments of the present application, group features are introduced into the training data, so that the feature composition of the training data becomes that shown in Fig. 5. Feeding these features into the ranking model enables personalized content search based on combined group and individual features, which meets the user's personalized needs while overcoming the lack of novelty in the user's personalized content and the narrowing of the model over iterations, and improves generalization ability. In some possible embodiments, the ranking model may be a one-stage content search ranking model, that is, the final ranking result may be obtained directly through a single ranking pass.
The user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the user attribute features corresponding to the user identifier may constitute user portrait features, also referred to as user-dimension features. Because these features take part in ranking directly, time can be saved and precision improved compared with an approach in which, after the content is recalled, it is first ranked based on general features and then re-ranked based on user information.
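The fusion and activation described for the ranking model can be pictured with a deliberately small stand-in: concatenate the individual, group and general features, apply one linear layer, and pass the result through a sigmoid to obtain a ranking score. This is not the DNN or GBDT model of the embodiment. Concatenation as the fusion operation, the single linear layer, the function names and the weights are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def rank_score(user_feats, group_feats, general_feats, weights, bias=0.0):
    """Fuse three feature groups by concatenation, then apply a sigmoid."""
    fused = list(user_feats) + list(group_feats) + list(general_feats)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(z)

# With all-zero (untrained) weights every content receives the neutral score 0.5.
neutral = rank_score([0.2, 0.3], [0.25, 0.35], [0.7, 0.5], weights=[0.0] * 6)
# Hypothetical positive weights push the score above 0.5 for these features.
boosted = rank_score([0.2, 0.3], [0.25, 0.35], [0.7, 0.5], weights=[1.0] * 6)
```

In a trained model the weights would be learned from the training data described above, so that group features can raise the score of content the individual profile alone would not surface.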
Step B2: rank the at least one content according to the ranking information of the at least one content to obtain a content search result for display.
The computer device may rank the at least one content recalled by the search term according to the ranking information. Taking a ranking score as the ranking information as an example, the higher the ranking score, the earlier the content is placed. The computer device may take the ranked result as the content search result presented to the user, which includes the ranked at least one content.
In steps B1 and B2, the ranking information is obtained with the ranking model and the content is then ranked according to it. Because the ranking model can be trained in advance, the ranking information can be output efficiently, which improves ranking efficiency.
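Once a ranking score is available for each recalled content, step B2 reduces to a descending sort. A minimal sketch, with hypothetical content names and scores:

```python
def rank_contents(contents, scores):
    """Return the contents ordered from highest to lowest ranking score."""
    order = sorted(range(len(contents)), key=lambda i: scores[i], reverse=True)
    return [contents[i] for i in order]

results = rank_contents(["video_a", "video_b", "video_c"], [0.31, 0.92, 0.57])
```

Python's sort is stable, so contents with equal scores keep their recall order.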
The computer device can display the content search result to the user after obtaining the content search result, and for the case that the computer device is the server, the computer device can send the content search result to the terminal, and the terminal displays the content search result on the interface.
The embodiments of the present application provide a personalized search scheme based on individual user behavior and user group behavior. Interest mining with a user-collaboration mechanism compensates for gaps in an individual's interest points and avoids the sparsity and bias of mining individual interests alone. The user portrait features and the general features (the first, second and third features) take part in ranking directly, so that the user's personal information interacts with the content features; this avoids a process in which, after content is recalled, it is first ranked on general features and then re-ranked on user information, which saves time and improves precision.
In the method provided by the embodiments of the present application, after a content search request is received, the corresponding content is recalled from the content database according to the search term carried in the request, and the recalled content is then ranked according to the user behavior features corresponding to the user identifier carried in the request and to the user identifier group to which the user identifier belongs, to obtain the content search result presented to the user. In this technical solution, the recall results for the search term are ranked according to both the user's individual behavior and the behavior of the user's group. Because the behavior of the group is likely to be behavior the user would also engage in, this user-collaboration mechanism can complement the interest points of a single user and mine content the user may be interested in, so that such content can be moved forward in the ranking, which improves the precision of personalized search and the accuracy of content search.
Fig. 6 is a schematic structural diagram of a content search apparatus according to an embodiment of the present application. Referring to fig. 6, the apparatus includes:
a receiving module 601, configured to receive a content search request, where the content search request carries a user identifier and a search term;
a recall module 602, configured to search, according to the search term, at least one content corresponding to the search term from a content database;
the sorting module 603 is configured to sort the at least one content according to the user attribute feature and the user behavior feature corresponding to the user identifier, and the user behavior feature corresponding to the user identifier group to which the user identifier belongs, so as to obtain a content search result for display.
In one possible implementation, the sorting module 603 is configured to:
acquiring at least one type of characteristics of a first characteristic of the search word, a second characteristic of the at least one content or a third characteristic between the search word and the at least one content;
and sequencing the at least one content according to the user attribute characteristics and the user behavior characteristics corresponding to the user identification, and the user behavior characteristics and the at least one class of characteristics corresponding to the user identification group to obtain a content search result for display.
In one possible implementation, the sorting module 603 is configured to:
input the user attribute features and user behavior features corresponding to the user identifier, the user behavior features corresponding to the user identifier group, and the at least one class of features into a ranking model;
fuse the input features based on the ranking model, apply activation processing to the fused features, and output ranking information of the at least one content;
and sort the at least one content according to its ranking information to obtain a content search result for display.
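The fusion and activation steps can be illustrated with a very small scoring function. This is a sketch only: the application does not specify the fusion operator, the activation function, or the model structure, so concatenation, a single linear layer, and a sigmoid are assumptions, and the weights and feature values are made up.

```python
import math


def fuse(*feature_groups):
    """Fusion by concatenation (assumed; the source does not fix the operator)."""
    fused = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def sigmoid(x):
    """Activation that maps a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_score(user_attr, user_behavior, group_behavior, extra, weights, bias=0.0):
    """Fuse the four feature groups, apply a linear layer, then the activation."""
    fused = fuse(user_attr, user_behavior, group_behavior, extra)
    z = sum(w * f for w, f in zip(weights, fused)) + bias
    return sigmoid(z)


# Hypothetical one-dimensional features and weights.
user_attr, user_behavior, group_behavior = [1.0], [0.2], [0.8]
weights = [0.1, 0.5, 0.5, 1.0]
content_features = {"a": [0.3], "b": [0.9]}  # e.g. a term-content relevance feature

scores = {
    cid: rank_score(user_attr, user_behavior, group_behavior, feats, weights)
    for cid, feats in content_features.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by the output score puts content "b" ahead of "a", because only the content-side feature differs and its weight is positive.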
In one possible implementation, the apparatus further includes:
a first obtaining module, configured to obtain, according to the user identifier, the user attribute features and user behavior features corresponding to the user identifier, and the user behavior features corresponding to the user identifier group to which the user identifier belongs, from the stored user attribute features and user behavior features corresponding to at least one user identifier and the stored user behavior features corresponding to at least one user identifier group.
In one possible implementation, the apparatus further includes:
the clustering module is used for clustering the at least one user identifier according to the user attribute characteristics and the user behavior characteristics corresponding to the at least one user identifier to obtain at least one user identifier group;
and the second obtaining module is used for obtaining the user behavior characteristics corresponding to the at least one user identification group according to the user behavior characteristics corresponding to the user identifications included in the at least one user identification group.
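One simple way to derive a group's behavior features from its members is to average the members' behavior vectors. The sketch below assumes exactly that; the mean is an illustrative choice, since the application only says the group features are obtained from the members' features, and the group and user names are hypothetical.

```python
def group_behavior_features(groups, user_behavior):
    """Average the behavior vectors of each group's members.
    The mean is an assumed aggregate, not one named in the source."""
    result = {}
    for group_id, members in groups.items():
        vectors = [user_behavior[u] for u in members]
        dim = len(vectors[0])
        result[group_id] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return result


# Hypothetical groups and two-dimensional behavior vectors.
groups = {"g1": ["u1", "u2"], "g2": ["u3"]}
user_behavior = {"u1": [1.0, 0.0], "u2": [3.0, 2.0], "u3": [0.5, 0.5]}
features = group_behavior_features(groups, user_behavior)
```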
In one possible implementation, the clustering module is to:
divide the at least one user identifier into a first number of major classes according to the user attribute features corresponding to the at least one user identifier;
for any one of the first number of major classes, cluster the user identifiers included in that major class according to the user behavior features corresponding to those user identifiers, to obtain the subclasses under that major class;
and take the subclasses under the first number of major classes as the at least one user identifier group.
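The two-level grouping described above (major classes by attributes, then subclasses by behavior) can be sketched as follows. The rule that identical attribute tuples form one major class, and the threshold-based stand-in for the behavior clustering, are assumptions made only to keep the example short; any clustering function with the same interface could be passed in.

```python
from collections import defaultdict


def split_into_major_classes(user_attrs):
    """Users with identical attribute tuples share a major class (assumed rule)."""
    major = defaultdict(list)
    for user_id, attrs in user_attrs.items():
        major[attrs].append(user_id)
    return dict(major)


def build_user_groups(user_attrs, user_behavior, cluster_fn):
    """Cluster each major class by behavior; every subclass becomes one user group."""
    groups = {}
    for attrs, members in split_into_major_classes(user_attrs).items():
        vectors = {u: user_behavior[u] for u in members}
        for sub_label, sub_members in cluster_fn(vectors).items():
            groups[(attrs, sub_label)] = sorted(sub_members)
    return groups


def threshold_cluster(vectors):
    """Stand-in sub-clustering: split on whether the first behavior value is >= 0.5."""
    out = defaultdict(list)
    for user_id, vec in vectors.items():
        out["high" if vec[0] >= 0.5 else "low"].append(user_id)
    return dict(out)


# Hypothetical attributes (age band, gender) and one-dimensional behavior vectors.
user_attrs = {"u1": ("18-24", "F"), "u2": ("18-24", "F"), "u3": ("25-34", "M")}
user_behavior = {"u1": [0.9], "u2": [0.1], "u3": [0.7]}
user_groups = build_user_groups(user_attrs, user_behavior, threshold_cluster)
```

Users "u1" and "u2" share a major class but land in different subclasses, so the example yields three user groups.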
In one possible implementation, the clustering module is to:
determine, according to the user behavior features corresponding to the user identifiers included in the major class and a second number of initial class centers, the subclass to which each of those user identifiers belongs, to obtain the second number of subclasses;
update the class centers of the second number of subclasses;
merge, according to the distances between the class centers of different subclasses among the second number of subclasses, the subclasses whose class centers are closer than a distance threshold, to obtain a new class;
update the class center of the new class obtained by merging;
and repeat the steps of determining the subclass to which each user identifier included in the major class belongs, updating the class centers of the subclasses, merging the subclasses whose class centers are closer than the distance threshold, and updating the class center of the new class obtained by merging, until convergence, where the subclasses obtained at convergence are used as the subclasses under the major class.
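The iterative procedure above resembles k-means with an extra merge step. The sketch below is one possible reading, with several assumptions not fixed by the application: Euclidean distance, nearest-center assignment, mean-based center updates, dropping of empty subclasses, and "centers unchanged between rounds" as the convergence test. All data and the threshold value are hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def cluster_with_merging(points, init_centers, merge_threshold, max_iter=100):
    """Assign, update, merge close subclasses, update again; repeat until stable."""
    centers = [list(c) for c in init_centers]
    members = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest current class center.
        members = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            members[nearest].append(p)
        members = [m for m in members if m]  # drop empty subclasses (assumption)

        # Steps 2-4: update centers, merge subclasses whose centers are closer
        # than the threshold, and recompute centers after every merge.
        merged = True
        while merged:
            merged = False
            current = [mean(m) for m in members]
            for i in range(len(current)):
                for j in range(i + 1, len(current)):
                    if math.dist(current[i], current[j]) < merge_threshold:
                        members[i] = members[i] + members[j]
                        del members[j]
                        merged = True
                        break
                if merged:
                    break

        new_centers = [mean(m) for m in members]
        if new_centers == centers:  # convergence: centers no longer change
            break
        centers = new_centers
    return centers, members


# Hypothetical behavior vectors: two initial centers start close together.
points = [[0, 0], [0, 1], [0.5, 0.5], [10, 10], [10, 11]]
init_centers = [[0, 0], [0.6, 0.6], [10, 10]]
final_centers, final_members = cluster_with_merging(points, init_centers, merge_threshold=2.0)
```

Because of the merge step, the number of subclasses does not have to be known in advance: starting from three initial centers, the two nearby subclasses collapse into one and two subclasses remain.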
In the embodiments of the present application, after a content search request is received, content is recalled from a content database according to the search term carried in the request, and the recalled content is then sorted according to the user attribute features and user behavior features corresponding to the user identifier carried in the request, together with the user behavior features corresponding to the user identifier group to which the user identifier belongs, to obtain a content search result for display to the user. In this technical solution, the recall results for the search term are sorted according to both the user's individual behavior and the behavior of the group to which the user belongs. Because the behaviors of that group are ones the user is also likely to perform, this user-collaboration mechanism can supplement the interest points of a single user and surface content the user may be interested in, so that such content can be ranked higher during sorting, which improves the accuracy of personalized content search.
It should be noted that, when the content search apparatus provided in the above embodiment performs a content search, the division into the above functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the content search apparatus and the content search method provided by the above embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiments and is not repeated here.
The computer device in the above embodiments may be a terminal.
Fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.
In general, terminal 700 includes: one or more processors 701 and one or more memories 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement a content search method provided by method embodiments herein.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power source 709.
The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display screen 705 may be a flexible display disposed on a curved surface or a folded surface of the terminal 700. The display screen 705 may even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined to realize a background blurring function, and the main camera and the wide-angle camera can be combined to realize panoramic shooting and VR (Virtual Reality) shooting functions or other combined shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 709 is provided to supply power to various components of terminal 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 713 may be disposed on a side frame of terminal 700 and/or under the display screen 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, it can detect the user's grip signal on the terminal 700, and the processor 701 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed under the display screen 705, the processor 701 controls operable controls on the UI according to the user's pressure operation on the display screen 705. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a user's fingerprint, and the processor 701 identifies the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a manufacturer logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical button or the manufacturer logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is adjusted down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.
The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the display screen 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 7 does not limit the terminal 700, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
The computer device in the above embodiments may be a server.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where the memory 802 stores at least one program code, and the at least one program code is loaded and executed by the processor 801 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, there is also provided a computer readable storage medium, such as a memory, storing at least one program code, which is loaded and executed by a processor, to implement the content search method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The present application is intended to cover various modifications, alternatives, and equivalents, which may be included within the spirit and scope of the present application.

Claims (10)

1. A method for searching for content, the method comprising:
receiving a content search request, wherein the content search request carries a user identifier and a search word;
searching at least one content corresponding to the search word from a content database according to the search word;
and sequencing the at least one content according to the user attribute characteristics and the user behavior characteristics corresponding to the user identification group to which the user identification belongs to obtain a content search result for display.
2. The method according to claim 1, wherein the step of ordering the at least one content according to the user attribute features and the user behavior features corresponding to the user identifier group to which the user identifier belongs to obtain a content search result for presentation comprises:
acquiring at least one type of characteristics of a first characteristic of the search word, a second characteristic of the at least one content or a third characteristic between the search word and the at least one content;
and sequencing the at least one content according to the user attribute characteristics and the user behavior characteristics corresponding to the user identification, and the user behavior characteristics and the at least one class of characteristics corresponding to the user identification group to obtain a content search result for display.
3. The method according to claim 2, wherein the ranking the at least one content according to the user attribute features and the user behavior features corresponding to the user identifier, and the user behavior features and the at least one category of features corresponding to the user identifier group to obtain a content search result for presentation comprises:
inputting the user attribute characteristics and the user behavior characteristics corresponding to the user identification, the user behavior characteristics corresponding to the user identification group and the at least one type of characteristics into a sequencing model;
performing fusion processing on the input features based on the sequencing model, performing activation processing on the fused features, and outputting sequencing information of the at least one content;
and sequencing the at least one content according to the sequencing information of the at least one content to obtain a content search result for display.
4. The method according to claim 1, wherein before the at least one content is sorted according to the user attribute features and the user behavior features corresponding to the user identifier group to which the user identifier belongs to obtain the content search result for presentation, the method further comprises:
according to the user identification, the user attribute characteristics and the user behavior characteristics corresponding to the user identification group to which the user identification belongs are obtained from the stored user attribute characteristics and the user behavior characteristics corresponding to at least one user identification group.
5. The method of claim 4, wherein prior to receiving the content search request, the method further comprises:
clustering the at least one user identifier according to the user attribute characteristics and the user behavior characteristics corresponding to the at least one user identifier to obtain at least one user identifier group;
and acquiring the user behavior characteristics corresponding to the at least one user identification group according to the user behavior characteristics corresponding to the user identifications included in the at least one user identification group.
6. The method according to claim 5, wherein the clustering the at least one user identifier according to the user attribute feature and the user behavior feature corresponding to the at least one user identifier to obtain the at least one user identifier group comprises:
dividing the at least one user identifier into a first number of large classes according to the user attribute characteristics corresponding to the at least one user identifier;
for any one of the first number of major classes, clustering the user identifications included in the any one major class according to the user behavior characteristics corresponding to the user identifications included in the any one major class to obtain minor classes under the any one major class;
and taking the subclass under the first number of major classes as the at least one user identification group.
7. The method according to claim 6, wherein for any one of the first number of major classes, clustering the user identifiers included in the any one major class according to the user behavior characteristics corresponding to the user identifiers included in the any one major class to obtain a minor class under the any one major class includes:
determining the subclasses to which the user identifications included in any one of the major classes belong according to the user behavior characteristics corresponding to the user identifications included in the major classes and the initial class centers of a second number to obtain the subclasses of the second number;
updating respective class centers of the second number of subclasses;
merging the subclasses of which the distances between the class centers are smaller than a distance threshold value according to the distances between the class centers of different subclasses in the second number of subclasses to obtain a new class;
updating the class center of the new class obtained by merging;
and repeatedly executing the steps of determining the subclasses to which the user identifications included in any one of the major classes belong, updating the class centers of the subclasses, merging the subclasses of which the distances between the class centers are smaller than a distance threshold value, and updating the class center of the new class obtained by merging until convergence, wherein the subclasses obtained during convergence are used as the subclasses under any one of the major classes.
8. A content search apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving a content searching request, and the content searching request carries a user identifier and a searching word;
the recall module is used for searching at least one content corresponding to the search word from a content database according to the search word;
and the sequencing module is used for sequencing the at least one content according to the user attribute characteristics and the user behavior characteristics corresponding to the user identification group to which the user identification belongs to obtain a content search result for display.
9. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, in which at least one program code is stored, which is loaded and executed by the one or more processors to implement the content search method according to any one of claims 1 to 7.
10. A computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the content search method of any one of claims 1 to 7.
CN202010354375.3A 2020-04-29 2020-04-29 Content search method, content search device, computer equipment and storage medium Pending CN112749329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010354375.3A CN112749329A (en) 2020-04-29 2020-04-29 Content search method, content search device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112749329A (en) 2021-05-04

Family

ID=75645309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010354375.3A Pending CN112749329A (en) 2020-04-29 2020-04-29 Content search method, content search device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112749329A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297398A (en) * 2021-05-24 2021-08-24 百果园技术(新加坡)有限公司 User recall method and device, computer equipment and storage medium
WO2022247671A1 (en) * 2021-05-24 2022-12-01 百果园技术(新加坡)有限公司 User recall method and apparatus, and computer device and storage medium
WO2023234865A1 (en) * 2022-06-01 2023-12-07 Grabtaxi Holdings Pte. Ltd. A communication server, a method, a user device, and a system

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40049212; Country of ref document: HK)
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20210504)