CN113378020A - Acquisition method, device and computer readable storage medium for similar film watching users - Google Patents

Info

Publication number
CN113378020A
Authority
CN
China
Prior art keywords
users
user
film watching
user group
film
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110638357.2A
Other languages
Chinese (zh)
Inventor
张潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202110638357.2A priority Critical patent/CN113378020A/en
Publication of CN113378020A publication Critical patent/CN113378020A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

A method for acquiring similar viewing users comprises the following steps: acquiring viewing statistical data for each of a plurality of viewing users, wherein the viewing statistical data comprise label proportion data, namely the proportion of each video type and the proportion of each channel watched by the viewing user within a preset time period; clustering the plurality of viewing users into n user groups according to the label proportion data, where n is a natural number greater than 1; setting, for each of the n user groups, a corresponding similarity threshold according to the number of viewing users the group contains; and determining viewing users with similar characteristics within each user group based on the similarity threshold set for that group. This scheme reduces the amount of calculation: when the similarities of viewing users are subsequently sorted, low-ranking similarities can be filtered out, which reduces the sorting workload and prevents memory overflow.

Description

Acquisition method, device and computer readable storage medium for similar film watching users
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method and equipment for acquiring similar film watching users and a computer readable storage medium.
Background
With the advancement of big data technology, it has become common in the video industry to recommend video content to users based on user profiles. In essence, a user profile attaches labels such as gender, age, region, and personality to each user, so that measures for improving the user experience can be targeted at the user's characteristic attributes. User similarity is calculated from the user profiles, and corresponding video content is then recommended to the users. At present, work on user similarity calculation focuses on optimizing the algorithms, that is, on defining which labels to attach to each user and on continually searching for the most effective and efficient algorithm. However, as the user population grows, the data volume multiplies, and the demands on the algorithm increase accordingly. Once the data volume exceeds the tens of millions, conventional similarity calculation methods are no longer practical. The fundamental reason is that the sudden increase in data volume causes a sharp rise in calculation time, so that the algorithm cannot converge normally.
Disclosure of Invention
The application provides a method and a device for acquiring similar viewing users and a computer-readable storage medium, so as to reduce the calculation time and improve the accuracy of the similarity calculation.
In one aspect, the present application provides a method for acquiring similar viewing users, including:
acquiring viewing statistical data for each of a plurality of viewing users, wherein the viewing statistical data comprise the channels watched by each viewing user within a preset time period, the types of videos watched, the number of times watched, and label proportion data, and the label proportion data comprise the proportion of each video type and the proportion of each channel watched by the viewing user within the preset time period;
clustering the plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1;
setting similarity threshold values corresponding to the user groups for the n user groups according to the number of film watching users contained in the user groups;
and determining film watching users with similar characteristics in each user group based on the similarity threshold set by each user group.
In another aspect, the present application provides an apparatus for acquiring similar viewing users, comprising:
an acquisition module, configured to acquire viewing statistical data for each of a plurality of viewing users, wherein the viewing statistical data comprise the channels watched by each viewing user within a preset time period, the types of videos watched, the number of times watched, and label proportion data, and the label proportion data comprise the proportion of each video type and the proportion of each channel watched by the viewing user within the preset time period;
the clustering module is used for clustering the plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1;
the setting module is used for setting a similarity threshold value corresponding to each user group for the n user groups according to the number of film watching users contained in each user group of the n user groups;
and the determining module is used for determining the film watching users with similar characteristics in each user group based on the similarity threshold set by each user group.
In a third aspect, the present application provides an apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the technical solution of the acquisition method for the similar viewing user as described above when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the technical solution of the acquisition method for the similar viewing user as described above.
The technical scheme provided by the application has the following effects. First, because the plurality of viewing users are clustered into n user groups in advance and the search for viewing users with similar characteristics is then carried out within each user group, the amount of calculation is reduced compared with the prior art, in which all viewing users are treated as a single group. Second, because a corresponding similarity threshold is set for each of the n user groups according to the number of viewing users it contains, low-ranking similarities can be filtered out when the similarities of viewing users are subsequently sorted, which reduces the sorting workload and prevents memory overflow. Third, calculating similarity within the clustered user groups narrows the range of candidates, so the similarity calculation can be more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for obtaining similar viewing users according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an acquisition apparatus for similar viewing users according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this specification, adjectives such as first and second may only be used to distinguish one element or action from another, without necessarily requiring or implying any actual such relationship or order. References to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but rather to one or more of the element, component, or step, etc., where the context permits.
In the present specification, the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The application provides a method for acquiring similar viewing users. As shown in FIG. 1, the method mainly includes steps S101 to S104, detailed as follows:
Step S101: acquire viewing statistical data for each of a plurality of viewing users, wherein the viewing statistical data comprise the channels watched by each viewing user within a preset time period, the types of videos watched, the number of times watched, and label proportion data, and the label proportion data comprise the proportion of each video type and the proportion of each channel watched by the viewing user within the preset time period.
In the embodiments of the present application, a viewing user is a user who watches video content such as TV series, movies, variety shows, and documentaries, not merely a user who watches films in the narrow sense. Video information and the channel it belongs to, such as a drama channel, a movie channel, or a kids channel, can be extracted from the associated media asset library. The proportion of each video type and the proportion of each channel watched by a viewing user within the preset time period are then counted, which yields the viewing statistical data of each of the plurality of viewing users. The volume of viewing records is large: for example, three million viewing records may be generated each day, and one viewing record can correspond to several video types, so after the data are associated and an explode operation is applied, the data volume can exceed the ten-million level. The calculation can therefore be carried out with Spark user-defined functions (UDFs): data are read from a Hive table, transformed by the UDFs, and stored back into a Hive table, forming a closed-loop data processing flow.
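The "explode then count" step described above can be illustrated with a minimal, standard-library Python sketch. It is not the Spark/Hive pipeline itself: the record layout `(user_id, channel, video_types)`, the sample data, and the function names `explode` and `count_views` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical viewing records: (user_id, channel, list of video types).
records = [
    ("u1", "movie", ["romance", "comedy"]),
    ("u1", "drama", ["romance"]),
    ("u2", "kids", ["animation"]),
]

def explode(records):
    """Yield one (user_id, channel, video_type) row per video type of a record."""
    for user_id, channel, video_types in records:
        for video_type in video_types:
            yield user_id, channel, video_type

def count_views(records):
    """Count watched video types (after exploding) and watched channels per user."""
    type_counts = defaultdict(Counter)
    channel_counts = defaultdict(Counter)
    for user_id, channel, _video_types in records:
        channel_counts[user_id][channel] += 1
    for user_id, _channel, video_type in explode(records):
        type_counts[user_id][video_type] += 1
    return type_counts, channel_counts

rows = list(explode(records))
type_counts, channel_counts = count_views(records)
```

Three records become four rows after exploding, which shows on a small scale why the row count can grow well beyond the number of raw viewing records.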
It should be noted that the embodiments of the present application use the proportion of each video type watched by a viewing user within the preset time period (and/or the proportion of each channel watched), rather than the absolute number of videos of each type watched, as the label data, because the proportion better reflects the viewing user's preferences. For example, suppose viewing user A watched 100 romance, comedy, and horror films in the past month (60 romance, 30 comedy, and 10 horror), while viewing user B watched 300 (100 romance, 20 comedy, and 180 horror). Although user B watched more romance films than user A in absolute terms (100 versus 60), and user A watched more comedies than user B (30 versus 20), romance, comedy, and horror account for 60%, 30%, and 10% of user A's viewing but 33.33%, 6.67%, and 60% of user B's. It is therefore reasonable to conclude that user A prefers romance films and user B prefers horror films.
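The arithmetic in this example can be checked with a short Python sketch. The function name `label_proportions` and the dictionary layout are illustrative assumptions; only the counts come from the text.

```python
def label_proportions(counts):
    """Convert absolute per-label counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}

# Counts taken from the example in the text.
user_a = {"romance": 60, "comedy": 30, "horror": 10}
user_b = {"romance": 100, "comedy": 20, "horror": 180}

prop_a = label_proportions(user_a)
prop_b = label_proportions(user_b)

# The label with the largest proportion is read as the user's preference.
favorite_a = max(prop_a, key=prop_a.get)
favorite_b = max(prop_b, key=prop_b.get)
```

Here `prop_b["romance"]` is 100/300, about 33.33%, even though user B's absolute romance count is higher than user A's.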
Step S102: cluster the plurality of viewing users into n user groups according to the label proportion data, where n is a natural number greater than 1.
In this embodiment of the present application, a MapReduce framework may be used to cluster the plurality of viewing users into n user groups according to the label proportion data. The clustering algorithm may be any clustering algorithm in the prior art, for example hierarchical clustering, the k-means algorithm, the EM algorithm, the DBSCAN algorithm, the OPTICS algorithm, the Mean Shift algorithm, or spectral clustering, or a combination thereof, which is not limited in this application.
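As one possible illustration of this step, the sketch below runs plain k-means (one of the algorithms listed above) on label proportion vectors in single-process Python. It does not use MapReduce; the sample vectors, the choice of initial centroids, and the function names are assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means; returns (assignments, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical (romance, comedy, horror) proportion vectors for four users.
points = [(0.6, 0.3, 0.1), (0.7, 0.2, 0.1), (0.33, 0.07, 0.6), (0.2, 0.1, 0.7)]
assignments, centroids = kmeans(points, [points[0], points[2]])
```

With these inputs the two romance-leaning users fall into one group and the two horror-leaning users into the other (n = 2).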
It should be noted that, because the plurality of viewing users are clustered into n user groups and the similarity of viewing users is then calculated within each user group, rather than over the entire set of viewing users as in the prior art, the amount of calculation is greatly reduced. For example, suppose there are 30,000 viewing users. If all 30,000 are treated as a single user group, as in the prior art, the relationship matrix used to calculate similarity is of order 30000 × 30000, which means 900 million entries; traversing 900 million entries to calculate the similarity of viewing users is an extremely complex and computationally expensive task. If the 30,000 viewing users are first clustered, say into 5 user groups of 6,000 viewing users each, the relationship matrix of each user group is of order 6000 × 6000, so only 36 million entries need to be traversed when calculating similarity within each user group, and the amount and complexity of calculation are greatly reduced compared with the prior art. As for the clustering itself, because it is performed in the MapReduce framework, which has strong computing power, it can cope even when the number of viewing users is in the tens of millions.
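The figures in this example can be verified with a few lines of Python. The helper name `relationship_entries` is an illustrative assumption; the group sizes come from the text.

```python
def relationship_entries(group_sizes):
    """Total number of entries across one square relationship matrix per group."""
    return sum(size * size for size in group_sizes)

single_group = relationship_entries([30_000])     # all users in one group
per_group = relationship_entries([6_000])         # one of the five groups
five_groups = relationship_entries([6_000] * 5)   # all five groups together
```

Even summed over all five groups, the total is 180 million entries, one fifth of the 900 million needed when all users form a single group.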
When the plurality of viewing users are clustered, the trained clustering model is stored in the file system of the Hadoop framework, and the clustering result is stored in a Hive table.
Step S103: set, for each of the n user groups, a corresponding similarity threshold according to the number of viewing users contained in the user group.
Generally, provided the clustering performed on the users in the earlier stage is sound, if user group 1 contains more viewing users than user group 2 and the same similarity threshold is set for both groups as in the prior art, the number of viewing users with similar characteristics in user group 1 will tend to be greater than that in user group 2. This means that when the similarity of viewing users in each user group is calculated and only the top-ranked viewing users with similar characteristics are needed, more sorting work has to be done for user group 1. Therefore, in the embodiments of the present application, a similarity threshold corresponding to each user group may be set for the n user groups according to the number of viewing users contained in each user group. Specifically, a reference similarity threshold Th_b may be set, and the number of viewing users contained in a first user group and in a second user group of the n user groups may be counted. If the first user group contains more viewing users than the second user group, a first similarity threshold Th1 is set for the first user group and a second similarity threshold Th2 is set for the second user group, where the reference similarity threshold Th_b, the first similarity threshold Th1, and the second similarity threshold Th2 satisfy Th2 < Th_b < Th1.
For example, assume that, as in the prior art, a similarity threshold of 0.8 is set for both user group 1 and user group 2, and that under this threshold there are 7 viewing users with similar characteristics in user group 1 and 3 in user group 2. Assume further that a top-3 similarity ranking is to be produced for each user group. The 7 viewing users with similar characteristics in user group 1 must then be traversed and sorted before the 3 with the highest similarity are selected. In the embodiments of the present application, by contrast, a reference similarity threshold Th_b is set first. When user group 1 is found to contain more viewing users than user group 2, the similarity threshold of user group 1 is raised relative to Th_b, for example to 0.88, and the similarity threshold of user group 2 is lowered relative to Th_b, for example to 0.75. Under these settings, the number of viewing users with similar characteristics in user group 1 may fall to 4, while the number in user group 2 may rise to 4. Compared with the threshold of 0.8, under which 7 viewing users in user group 1 had to be traversed and sorted, the traversal and sorting work for user group 1 is significantly reduced; although the work for user group 2 increases, the total for the two groups decreases (8 viewing users to traverse and sort instead of 10).
The numbers assumed above for user group 1 and user group 2 are small. In practice, when the groups contain thousands or tens of thousands of viewing users, raising or lowering each user group's similarity threshold appropriately relative to the reference similarity threshold Th_b significantly reduces the amount of calculation. Unnecessary similarity results then need not be stored or cached, which prevents memory overflow.
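The text fixes only the ordering of the thresholds (a larger group gets a threshold above the reference, a smaller group one below it), not a formula. The following Python sketch shows one hypothetical rule that satisfies that ordering: each group's threshold is shifted from the reference in proportion to how far the group's size deviates from the mean group size, with the shift capped at `max_shift`. The function name, the linear rule, `max_shift`, and the group sizes are all assumptions, not part of the described method.

```python
def group_thresholds(group_sizes, reference=0.8, max_shift=0.1):
    """Shift each group's threshold from the reference according to group size.

    Assumed rule (not specified by the source): the shift is proportional to
    the group's relative deviation from the mean size, capped at +/- max_shift.
    Assumes at least one non-empty group, so the mean size is positive.
    """
    mean_size = sum(group_sizes.values()) / len(group_sizes)
    thresholds = {}
    for group, size in group_sizes.items():
        relative = (size - mean_size) / mean_size
        shift = max(-max_shift, min(max_shift, relative * max_shift))
        thresholds[group] = reference + shift
    return thresholds

# Hypothetical sizes: group_1 is larger than group_2.
thresholds = group_thresholds({"group_1": 8000, "group_2": 4000})
```

With these inputs the larger group's threshold ends up above 0.8 and the smaller group's below it, matching the relationship Th2 < Th_b < Th1 described above.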
Step S104: determine viewing users with similar characteristics within each user group based on the similarity threshold set for that user group.
Specifically, as one embodiment of the present application, this may be done as follows: calculate the similarity of the viewing users within each user group according to the label proportion data; retain the similarities that are greater than the similarity threshold set for the user group and sort them in descending order; and determine the viewing users ranked in the top m of the sorted result to be viewing users with similar characteristics, where m is a preset value. It should be noted that similarity is calculated among viewing users within the user group to which they belong, not across user groups. The similarity may be calculated as cosine similarity or in any other manner, which is not limited in this application.
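A minimal Python sketch of this step follows, using cosine similarity. It assumes the top-m list is produced per viewing user within a group (the text does not say whether the ranking is per user or per group); the function names and sample vectors are likewise illustrative.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_users(group, threshold, m):
    """For each user in one group, list up to m users above the threshold."""
    candidates = {user: [] for user in group}
    for (u, vec_u), (v, vec_v) in combinations(group.items(), 2):
        sim = cosine_similarity(vec_u, vec_v)
        # Filter before sorting: pairs at or below the threshold are dropped.
        if sim > threshold:
            candidates[u].append((sim, v))
            candidates[v].append((sim, u))
    # Sort the surviving similarities in descending order and keep the top m.
    return {
        user: [other for _sim, other in sorted(pairs, reverse=True)[:m]]
        for user, pairs in candidates.items()
    }

# Hypothetical (romance, comedy, horror) proportions for one user group.
group = {"a": (0.6, 0.3, 0.1), "b": (0.7, 0.2, 0.1), "c": (0.1, 0.1, 0.8)}
result = similar_users(group, threshold=0.8, m=3)
```

Because pairs below the threshold are discarded before sorting, only the surviving candidates are held in memory and sorted, which is the saving the threshold is meant to provide.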
For newly added viewing users, the embodiments of the present application may adopt a cold-start strategy: the trained model is used to re-cluster the newly added users together with the plurality of viewing users, where the trained model is the model used in the foregoing embodiment when the plurality of viewing users were clustered according to the label proportion data.
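One simple way to picture the cold-start step, assuming the trained model is a set of k-means centroids (the text does not fix the model type), is to place a new viewing user in the group whose centroid is nearest. The centroid values, the new user's vector, and the function name below are hypothetical.

```python
def assign_to_group(proportions, centroids):
    """Return the index of the centroid nearest to the given proportion vector."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(centroids)),
               key=lambda k: squared_distance(proportions, centroids[k]))

# Hypothetical centroids from a previously trained model:
# index 0 is a romance-leaning group, index 1 a horror-leaning group.
centroids = [(0.65, 0.25, 0.10), (0.25, 0.10, 0.65)]

# Hypothetical (romance, comedy, horror) proportions of a new viewing user.
new_user = (0.2, 0.2, 0.6)
group_index = assign_to_group(new_user, centroids)
```

This sketch only assigns the new user to an existing group; a full re-clustering of all users with the trained model, as the text describes, could also move existing users between groups.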
As can be seen from the method for acquiring similar viewing users illustrated in FIG. 1, the method has the following effects. First, because the plurality of viewing users are clustered into n user groups in advance and the search for viewing users with similar characteristics is then carried out within each user group, the amount of calculation is reduced compared with the prior art, in which all viewing users are treated as a single group. Second, because a corresponding similarity threshold is set for each of the n user groups according to the number of viewing users it contains, low-ranking similarities can be filtered out when the similarities of viewing users are subsequently sorted, which reduces the sorting workload and prevents memory overflow. Third, calculating similarity within the clustered user groups narrows the range of candidates, so the similarity calculation can be more accurate.
Referring to FIG. 2, an apparatus for acquiring similar viewing users provided in an embodiment of the present application may include an acquisition module 201, a clustering module 202, a setting module 203, and a determining module 204, detailed as follows:
the acquisition module 201 is configured to acquire viewing statistical data of each viewing user of a plurality of viewing users, where the viewing statistical data includes a channel watched by each viewing user in a preset time period, a type and a number of watched movies, and tag proportion data, and the tag proportion data includes a proportion of a type of movies watched by each viewing user in the preset time period and a proportion of watched channels;
the clustering module 202 is configured to cluster the plurality of film watching users into n user groups according to the tag proportion data, where n is a natural number greater than 1;
the setting module 203 is used for setting a similarity threshold corresponding to each user group for the n user groups according to the number of film watching users contained in each user group of the n user groups;
and the determining module 204 is configured to determine viewing users with similar features in each user group based on the similarity threshold set by each user group.
Optionally, the setting module 203 illustrated in FIG. 2 may include a reference value setting unit, a statistical unit, and a threshold setting unit, where:
the reference value setting unit is used for setting a reference similarity threshold Th_b;
the statistical unit is used for counting the number of viewing users contained in a first user group and the number of viewing users contained in a second user group of the n user groups;
the threshold setting unit is configured to set a first similarity threshold Th1 for the first user group and a second similarity threshold Th2 for the second user group if the number of viewing users contained in the first user group is greater than the number of viewing users contained in the second user group, where the reference similarity threshold Th_b, the first similarity threshold Th1, and the second similarity threshold Th2 satisfy Th2 < Th_b < Th1.
Optionally, the determining module 204 illustrated in fig. 2 may include a similarity calculating unit, a sorting unit, and a similarity user determining unit, where:
the similarity calculation unit is used for calculating the similarity of the viewing users in each user group according to the label proportion data;
the sorting unit is used for sorting the similarity according to a descending order if the similarity is greater than a similarity threshold set by each user group;
and the similarity user determining unit is used for determining the film watching users with the top m of the sequencing result as film watching users with similar characteristics, wherein m is a preset value.
Optionally, the apparatus illustrated in fig. 2 may further include a re-clustering module, configured to re-cluster, if a new film viewing user is added, the new user and the multiple film viewing users by using a trained model, where the trained model is a model used when the multiple film viewing users are clustered according to the tag proportion data.
As can be seen from the apparatus for acquiring similar viewing users illustrated in FIG. 2, the apparatus has the following effects. First, because the plurality of viewing users are clustered into n user groups in advance and the search for viewing users with similar characteristics is then carried out within each user group, the amount of calculation is reduced compared with the prior art, in which all viewing users are treated as a single group. Second, because a corresponding similarity threshold is set for each of the n user groups according to the number of viewing users it contains, low-ranking similarities can be filtered out when the similarities of viewing users are subsequently sorted, which reduces the sorting workload and prevents memory overflow. Third, calculating similarity within the clustered user groups narrows the range of candidates, so the similarity calculation can be more accurate.
FIG. 3 is a schematic structural diagram of an apparatus provided in an embodiment of the present application. As shown in FIG. 3, the apparatus 3 of this embodiment mainly includes: a processor 30, a memory 31, and a computer program 32 stored in the memory 31 and executable on the processor 30, such as a program implementing the method for acquiring similar viewing users. The processor 30, when executing the computer program 32, implements the steps in the above-described embodiment of the method for acquiring similar viewing users, such as steps S101 to S104 shown in FIG. 1. Alternatively, the processor 30, when executing the computer program 32, implements the functions of the modules/units in the above-described apparatus embodiments, such as the functions of the acquisition module 201, the clustering module 202, the setting module 203, and the determining module 204 shown in FIG. 2.
Illustratively, the computer program 32 of the acquisition method for similar viewing users mainly includes: acquiring the film watching statistical data of each film watching user in a plurality of film watching users, wherein the film watching statistical data comprise channels watched by each film watching user in a preset time period, watched film types, watched times and label proportion data, and the label proportion data comprise proportion of the film types watched by each film watching user in the preset time period and proportion of the watched channels; clustering a plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1; setting a similarity threshold corresponding to each user group for the n user groups according to the number of viewing users contained in each user group of the n user groups; and determining film watching users with similar characteristics in each user group based on the similarity threshold set by each user group.
The computer program 32 may be partitioned into one or more modules/units, which are stored in the memory 31 and executed by the processor 30 to accomplish the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 32 in the device 3. For example, the computer program 32 may be divided into functions of the acquisition module 201, the clustering module 202, the setting module 203, and the determination module 204 (modules in the virtual device), and the specific functions of each module are as follows: the acquisition module 201 is configured to acquire viewing statistical data of each viewing user of a plurality of viewing users, where the viewing statistical data includes a channel watched by each viewing user in a preset time period, a type and a number of watched movies, and tag proportion data, and the tag proportion data includes a proportion of a type of movies watched by each viewing user in the preset time period and a proportion of watched channels; the clustering module 202 is configured to cluster the plurality of film watching users into n user groups according to the tag proportion data, where n is a natural number greater than 1; the setting module 203 is used for setting a similarity threshold corresponding to each user group for the n user groups according to the number of film watching users contained in each user group of the n user groups; and the determining module 204 is configured to determine viewing users with similar features in each user group based on the similarity threshold set by each user group.
The device 3 may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of the device 3 and does not constitute a limitation of the device 3, which may include more or fewer components than shown, combine some components, or use different components; for example, the device may also include input-output devices, network access devices, buses, and the like.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor or the like.
The memory 31 may be an internal storage unit of the device 3, such as a hard disk or a memory of the device 3. The memory 31 may also be an external storage device of the device 3, such as a plug-in hard disk provided on the device 3, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 31 may also include both an internal storage unit of the device 3 and an external storage device. The memory 31 is used for storing computer programs and other programs and data required by the device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as required to different functional units and modules, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-transitory computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may also be implemented by a computer program instructing related hardware. The computer program of the method for acquiring similar film watching users may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of the method embodiments described above, that is: acquiring film watching statistical data of each film watching user in a plurality of film watching users, wherein the film watching statistical data comprise channels watched by each film watching user in a preset time period, watched film types, watched times and label proportion data, and the label proportion data comprise the proportion of the film types watched by each film watching user in the preset time period and the proportion of the watched channels; clustering the plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1; setting a similarity threshold corresponding to each user group for the n user groups according to the number of film watching users contained in each user group of the n user groups; and determining film watching users with similar characteristics in each user group based on the similarity threshold set for each user group. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The non-transitory computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the non-transitory computer readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the non-transitory computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the present application in detail. It should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present application should be included in the scope of the present application.
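The threshold-setting and determining steps restated above (and detailed in claims 2 and 3 below) can be illustrated with a minimal sketch. Several choices here are assumptions that the text does not fix: comparing each group's size with the mean group size, the default values of `th_b` and `delta`, cosine similarity as the similarity measure, and ranking candidates against a single target user. All function names are hypothetical.

```python
import math

def group_thresholds(group_sizes, th_b=0.8, delta=0.05):
    """Give larger groups a threshold above th_b and smaller groups one below.

    For two groups of different sizes this yields Th_2 < Th_b < Th_1, where
    group 1 is the larger group. Comparing with the mean size is an assumption.
    """
    mean_size = sum(group_sizes) / len(group_sizes)
    thresholds = []
    for size in group_sizes:
        if size > mean_size:
            thresholds.append(th_b + delta)
        elif size < mean_size:
            thresholds.append(th_b - delta)
        else:
            thresholds.append(th_b)
    return thresholds

def cosine_similarity(u, v):
    """Cosine similarity of two label proportion vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_users(target, group, threshold, m):
    """Return up to m users in the group most similar to the target user.

    group: dict mapping user id -> label proportion vector.
    Only similarities above the group's threshold are kept, then sorted in
    descending order and truncated to the top m.
    """
    scored = [
        (cosine_similarity(group[target], vec), uid)
        for uid, vec in group.items()
        if uid != target
    ]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in kept[:m]]
```

Raising the threshold for a larger group keeps the candidate list short where many users are available, while a lower threshold in a small group avoids returning no similar users at all.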

Claims (10)

1. A method for acquiring similar film watching users, the method comprising:
acquiring the film watching statistical data of each film watching user in a plurality of film watching users, wherein the film watching statistical data comprise channels watched by each film watching user in a preset time period, watched film types, watched times and label proportion data, and the label proportion data comprise the proportion of the film types watched by each film watching user in the preset time period and the proportion of the watched channels;
clustering the plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1;
setting similarity threshold values corresponding to the user groups for the n user groups according to the number of film watching users contained in the user groups;
and determining film watching users with similar characteristics in each user group based on the similarity threshold set by each user group.
2. The method for acquiring similar film viewing users as claimed in claim 1, wherein said setting a similarity threshold corresponding to each user group for said n user groups according to the number of film viewing users included in each user group of said n user groups comprises:
setting a reference similarity threshold Th_b;
counting the number of film watching users contained in a first user group and the number of film watching users contained in a second user group of the n user groups;
if the number of film watching users contained in the first user group is larger than that contained in the second user group, setting a first similarity threshold Th_1 for the first user group and a second similarity threshold Th_2 for the second user group, wherein the reference similarity threshold Th_b, the first similarity threshold Th_1 and the second similarity threshold Th_2 satisfy Th_2 < Th_b < Th_1.
3. The method for acquiring similar viewing users as claimed in claim 1, wherein said determining viewing users with similar characteristics in each user group based on the similarity threshold set by each user group comprises:
calculating the similarity of the viewing users in each user group according to the label proportion data;
if the similarity is larger than the similarity threshold set for the user group, sorting the similarities in descending order;
and determining the film watching users ranked in the top m of the sorting result as film watching users with similar characteristics, wherein m is a preset value.
4. The method for acquiring similar film watching users according to any one of claims 1 to 3, characterized in that the method further comprises:
and if the film watching users are newly added, re-clustering the newly added users and the plurality of film watching users by adopting a trained model, wherein the trained model is a model adopted when the plurality of film watching users are clustered according to the label proportion data.
5. An apparatus for obtaining a similar viewing user, the apparatus comprising:
the acquisition module is used for acquiring the film watching statistical data of each film watching user in a plurality of film watching users, wherein the film watching statistical data comprise channels watched by each film watching user in a preset time period, the types and times of watched films, and label proportion data, and the label proportion data comprise the proportion of the film types watched by each film watching user in the preset time period and the proportion of the watched channels;
the clustering module is used for clustering the plurality of film watching users into n user groups according to the label proportion data, wherein n is a natural number greater than 1;
the setting module is used for setting a similarity threshold value corresponding to each user group for the n user groups according to the number of film watching users contained in each user group of the n user groups;
and the determining module is used for determining the film watching users with similar characteristics in each user group based on the similarity threshold set by each user group.
6. The apparatus for obtaining a similar viewing user as in claim 5, wherein said setting module comprises:
a reference value setting unit, configured to set a reference similarity threshold Th_b;
a statistical unit, configured to count the number of film watching users contained in a first user group and the number of film watching users contained in a second user group of the n user groups;
a threshold setting unit, configured to set a first similarity threshold Th_1 for the first user group and a second similarity threshold Th_2 for the second user group if the number of film watching users contained in the first user group is greater than the number of film watching users contained in the second user group, wherein the reference similarity threshold Th_b, the first similarity threshold Th_1 and the second similarity threshold Th_2 satisfy Th_2 < Th_b < Th_1.
7. The acquisition apparatus for users of similar viewing as claimed in claim 5, wherein said determination module comprises:
the similarity calculation unit is used for calculating the similarity of the film watching users in each user group according to the label proportion data;
the sorting unit is used for sorting the similarities in descending order if the similarity is greater than the similarity threshold set for the user group;
and the similar user determining unit is used for determining the film watching users ranked in the top m of the sorting result as film watching users with similar characteristics, wherein m is a preset value.
8. The apparatus for acquiring similar viewing users as in any one of claims 5 to 7, wherein said apparatus further comprises:
and the re-clustering module is used for re-clustering the newly added users and the plurality of film watching users by adopting a trained model if the film watching users are newly added, wherein the trained model is a model adopted when the plurality of film watching users are clustered according to the label proportion data.
9. An apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
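
The re-clustering recited in claims 4 and 8 can be illustrated with a minimal sketch, assuming the trained model is a set of cluster centroids and that re-clustering means assigning every user, existing and newly added, to the nearest centroid. Neither assumption is stated in the claims, and the function name and sample values are hypothetical.

```python
import math

def assign_with_trained_model(centroids, vectors):
    """Assign each label proportion vector to its nearest trained centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
        for v in vectors
    ]

# Hypothetical trained model and users, for illustration only.
trained_centroids = [[0.85, 0.15], [0.15, 0.85]]
existing_users = [[0.9, 0.1], [0.1, 0.9]]
new_users = [[0.7, 0.3]]

# Existing and newly added users are clustered together with the same model.
groups = assign_with_trained_model(trained_centroids, existing_users + new_users)
print(groups)  # [0, 1, 0]
```

Reusing the trained model avoids fitting a new model each time a user is added; the group sizes, and therefore the per-group thresholds, may still need to be recomputed afterwards.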
CN202110638357.2A 2021-06-08 2021-06-08 Acquisition method, device and computer readable storage medium for similar film watching users Pending CN113378020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110638357.2A CN113378020A (en) 2021-06-08 2021-06-08 Acquisition method, device and computer readable storage medium for similar film watching users

Publications (1)

Publication Number Publication Date
CN113378020A true CN113378020A (en) 2021-09-10

Family

ID=77576626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110638357.2A Pending CN113378020A (en) 2021-06-08 2021-06-08 Acquisition method, device and computer readable storage medium for similar film watching users

Country Status (1)

Country Link
CN (1) CN113378020A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140068676A1 (en) * 2012-08-28 2014-03-06 Industrial Technology Research Institute Method and system for video collection management, invalid video replacement and continuous video replay
CN103686237A (en) * 2013-11-19 2014-03-26 乐视致新电子科技(天津)有限公司 Method and system for recommending video resource
CN105320702A (en) * 2014-08-04 2016-02-10 Tcl集团股份有限公司 Analysis method and device for user behavior data and smart television
CN106446078A (en) * 2016-09-08 2017-02-22 乐视控股(北京)有限公司 Information recommendation method and recommendation apparatus
CN109063737A (en) * 2018-07-03 2018-12-21 Oppo广东移动通信有限公司 Image processing method, device, storage medium and mobile terminal
WO2020224222A1 (en) * 2019-05-05 2020-11-12 北京三快在线科技有限公司 Target group detection method, device, computer apparatus, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116781984A (en) * 2023-08-21 2023-09-19 深圳市华星数字有限公司 Set top box data optimized storage method
CN116781984B (en) * 2023-08-21 2023-11-07 深圳市华星数字有限公司 Set top box data optimized storage method

Similar Documents

Publication Publication Date Title
US10341701B2 (en) Clustering and adjudication to determine a recommendation of multimedia content
US9967628B2 (en) Rating videos based on parental feedback
US9659384B2 (en) Systems, methods, and computer program products for searching and sorting images by aesthetic quality
US8335763B2 (en) Concurrently presented data subfeeds
US20190377955A1 (en) Generating digital video summaries utilizing aesthetics, relevancy, and generative neural networks
US10380249B2 (en) Predicting future trending topics
US9280565B1 (en) Systems, methods, and computer program products for displaying images
CN108259939B (en) New video push control method and device and server
CN109408639B (en) Bullet screen classification method, bullet screen classification device, bullet screen classification equipment and storage medium
US20160147752A1 (en) User-specific media playlists
CN106326391A (en) Method and device for recommending multimedia resources
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
CN103765421A (en) Content control method, content control apparatus, and program
CN112364202A (en) Video recommendation method and device and electronic equipment
CN114359563B (en) Model training method, device, computer equipment and storage medium
CN111368141A (en) Video tag expansion method and device, computer equipment and storage medium
CN112199582B (en) Content recommendation method, device, equipment and medium
CN111259195A (en) Video recommendation method and device, electronic equipment and readable storage medium
CN111177559A (en) Text travel service recommendation method and device, electronic equipment and storage medium
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN113761253A (en) Video tag determination method, device, equipment and storage medium
CN111861550A (en) OTT (over the Top) equipment-based family portrait construction method and system
CN113378020A (en) Acquisition method, device and computer readable storage medium for similar film watching users
US9432477B2 (en) Identifying matching video content
US20230069999A1 (en) Method and apparatus for updating recommendation model, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination