CN116842211B - User analysis method and system based on live big data - Google Patents

Publication number
CN116842211B
Authority
CN
China
Prior art keywords
data
analysis
user
preset
trusted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310819540.1A
Other languages
Chinese (zh)
Other versions
CN116842211A (en
Inventor
邹旭光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Energy Time Education Technology Co ltd
Original Assignee
Beijing Energy Time Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Energy Time Education Technology Co ltd filed Critical Beijing Energy Time Education Technology Co ltd
Priority to CN202310819540.1A priority Critical patent/CN116842211B/en
Publication of CN116842211A publication Critical patent/CN116842211A/en
Application granted granted Critical
Publication of CN116842211B publication Critical patent/CN116842211B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs

Abstract

The invention provides a user analysis method and system based on live broadcast big data. The method comprises: collecting live broadcast big data; performing trusted data extraction on the live broadcast big data to obtain trusted data; acquiring a user analysis task; generating a data analysis template based on the user analysis task; and performing user analysis on the trusted data based on the data analysis template to obtain a user analysis result. When analyzing users who watch live broadcasts, the method and system perform the analysis on trusted data extracted from the live broadcast big data, which avoids user analysis errors caused by untrusted data that may be present in the live broadcast big data and improves the accuracy of live broadcast software optimization.

Description

User analysis method and system based on live big data
Technical Field
The invention relates to the technical field of computer data processing, in particular to a user analysis method and system based on live big data.
Background
Currently, as the live broadcast industry develops, live broadcast software is becoming popular and more and more users watch live broadcasts. To optimize live broadcast software, the users who watch live broadcasts need to be analyzed regularly, for example: analyzing users' preferences for live broadcast content and adding live broadcast personnel to the corresponding content section of the live broadcast software to satisfy those preferences. User analysis usually has to be performed on live broadcast big data, for example: collecting the viewing content histories of users of various live broadcast software and analyzing the users' content preferences based on those histories.
However, live broadcast big data is usually collected using big data technology, and the collected data may contain untrusted data, for example data from unreliable sources. If such data is used directly for user analysis, it may cause user analysis errors and make live broadcast software optimization inaccurate.
Thus, a solution is needed.
Disclosure of Invention
The invention aims to provide a user analysis method based on live broadcast big data, which is used for carrying out user analysis based on trusted data extracted from live broadcast big data when carrying out user analysis on users watching live broadcast, so that the problem that the user analysis error is caused by the fact that unreliable data possibly exist in the live broadcast big data is avoided, and the accuracy of live broadcast software optimization is improved.
The user analysis method based on live big data provided by the embodiment of the invention comprises the following steps:
collecting live broadcast big data;
performing trusted data extraction on the live broadcast big data to obtain trusted data;
acquiring a user analysis task;
generating a data analysis template based on the user analysis task;
based on the data analysis template, user analysis is carried out according to the trusted data, and a user analysis result is obtained.
Preferably, performing trusted data extraction on the live broadcast big data to obtain trusted data includes:
parsing a plurality of data items from the live broadcast big data;
obtaining the pathway type of the source pathway of each data item, the pathway type including: active pathway and passive pathway;
when the path type is an active path, respectively acquiring data information of the data item and path information of a source path;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value;
when the first credibility value is larger than or equal to a preset first credibility threshold value, taking the data item as credible data;
when the path type is a passive path, credit information of a source path and guarantee information of the source path for carrying out trusted guarantee on the data item are respectively obtained;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, taking the data item as credible data.
Preferably, the obtaining a user analysis task includes:
acquiring a user analysis task input by a worker;
and/or,
interactively acquiring user analysis requirements of staff;
based on the user analysis requirements, a user analysis task is determined.
Preferably, the interactive obtaining the user analysis requirement of the staff includes:
acquiring a preset authority scene library corresponding to a worker;
acquiring generating operation generated in any authority scene in an authority scene library within the latest preset time of a worker and corresponding operation generating time;
setting the generating operation on a preset time axis based on the operation generating time;
determining a generation operation cluster from the time axis based on the generation operation cluster constraint condition;
determining a predicted demand corresponding to the generated operation cluster from a preset user analysis demand prediction library;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list;
receiving predicted demands selected by staff from a user analysis demand selection list and taking the predicted demands as user analysis demands;
wherein the generation operation cluster constraint conditions include:
the generating operation cluster consists of N generating operations on a time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold;
the scene relation generated by the generating operation in the generating operation cluster is matched with one standard scene relation in a preset standard scene relation library.
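The lookup and selection steps above can be sketched as follows. This is a minimal illustration only: the patent does not specify how the user analysis demand prediction library is stored, so the dictionary keyed by operation sequence, the operation names, and the second library entry are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

ClusterKey = Tuple[str, ...]

# Hypothetical prediction library: an ordered sequence of generation
# operations maps to the predicted user analysis demands it suggests.
PREDICTION_LIBRARY: Dict[ClusterKey, List[str]] = {
    ("view_hottest_section", "search_popular_content"): [
        "analyze users' live broadcast content preferences",
    ],
    # Entry invented purely for illustration.
    ("view_barrage_stats", "search_barrage_trends"): [
        "analyze users' bullet comment habits",
    ],
}


def build_selection_list(
    clusters: Sequence[Sequence[str]],
    library: Dict[ClusterKey, List[str]],
) -> List[str]:
    """Collect the predicted demands of every cluster, without duplicates."""
    selection: List[str] = []
    for cluster in clusters:
        for demand in library.get(tuple(cluster), []):
            if demand not in selection:
                selection.append(demand)
    return selection


def select_demand(selection: List[str], choice: int) -> str:
    """Return the demand the worker picked; it becomes the user analysis demand."""
    if not 0 <= choice < len(selection):
        raise IndexError("choice is outside the selection list")
    return selection[choice]


clusters = [["view_hottest_section", "search_popular_content"]]
selection_list = build_selection_list(clusters, PREDICTION_LIBRARY)
chosen = select_demand(selection_list, 0)
```

Clusters with no entry in the library contribute nothing to the list, so the worker is only shown demands that the library actually predicts.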
Preferably, generating the data analysis template based on the user analysis task includes:
based on a preset feature extraction template, performing feature extraction on a user analysis task to obtain a plurality of task features;
constructing a feature description vector of a user analysis task based on task features;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff;
based on the complementary processing rules, carrying out complementary processing on the first data analysis rules and the second data analysis rules to obtain a plurality of third data analysis rules;
generating a data analysis template based on the third data analysis rule;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of staff for the data analysis rule of the same type;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the amount by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules;
the remaining first data analysis rule and second data analysis rule are used as third data analysis rules.
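The complementary processing rules above can be sketched as follows. This is an illustrative reading, not the patented implementation: the `AnalysisRule` class, the rule type names, and all numeric preference values and thresholds are assumptions, and "and/or" is interpreted as an inclusive OR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AnalysisRule:
    rule_type: str     # hypothetical: rules of the "same type" share this key
    description: str


def complement(
    first_rules: List[AnalysisRule],
    second_rules: List[AnalysisRule],
    universal_pref: Dict[str, float],
    personal_pref: Dict[str, float],
    personal_threshold: float,
    difference_threshold: float,
) -> List[AnalysisRule]:
    """Merge general (first) and worker-preferred (second) rules into third rules."""
    second_types = {rule.rule_type for rule in second_rules}
    kept_first: List[AnalysisRule] = []
    for rule in first_rules:
        if rule.rule_type in second_types:
            personal = personal_pref[rule.rule_type]
            universal = universal_pref[rule.rule_type]
            # Drop the general rule when the worker's own preference is strong
            # enough, or clearly exceeds the universal preference.
            if (personal >= personal_threshold
                    or personal - universal >= difference_threshold):
                continue
        kept_first.append(rule)
    return kept_first + list(second_rules)


first = [
    AnalysisRule("content_type", "find which content type each user watches"),
    AnalysisRule("watch_time", "find when users watch live broadcasts"),
]
second = [
    AnalysisRule("content_type", "find the top 5 content types by viewer count"),
]
third = complement(
    first, second,
    universal_pref={"content_type": 40.0},
    personal_pref={"content_type": 85.0},
    personal_threshold=80.0,       # assumed value
    difference_threshold=30.0,     # assumed value
)
```

When neither threshold condition holds, both the general rule and the worker's rule of that type remain in the result, which matches the source's statement that the remaining first rules and the second rules together form the third rules.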
The user analysis system based on live big data provided by the embodiment of the invention is characterized by comprising:
the live broadcast big data collection module is used for collecting live broadcast big data;
the trusted data extraction module is used for performing trusted data extraction on the live broadcast big data to obtain trusted data;
the user analysis task acquisition module is used for acquiring user analysis tasks;
the data analysis template generation module is used for generating a data analysis template based on the user analysis task;
and the user analysis module is used for carrying out user analysis according to the trusted data based on the data analysis template to obtain a user analysis result.
Preferably, the trusted data extraction module performs trusted data extraction on live big data to obtain trusted data, including:
analyzing a plurality of data items in the live big data;
obtaining the pathway type of the source pathway of each data item, the pathway type including: active pathway and passive pathway;
when the path type is an active path, respectively acquiring data information of the data item and path information of a source path;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value;
when the first credibility value is larger than or equal to a preset first credibility threshold value, taking the data item as credible data;
when the path type is a passive path, credit information of a source path and guarantee information of the source path for carrying out trusted guarantee on the data item are respectively obtained;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, taking the data item as credible data.
Preferably, the user analysis task obtaining module obtains a user analysis task, including:
acquiring a user analysis task input by a worker;
and/or,
interactively acquiring user analysis requirements of staff;
based on the user analysis requirements, a user analysis task is determined.
Preferably, the user analysis task acquisition module interactively acquires user analysis requirements of staff, including:
acquiring a preset authority scene library corresponding to a worker;
acquiring generating operation generated in any authority scene in an authority scene library within the latest preset time of a worker and corresponding operation generating time;
setting the generating operation on a preset time axis based on the operation generating time;
determining a generation operation cluster from the time axis based on the generation operation cluster constraint condition;
determining a predicted demand corresponding to the generated operation cluster from a preset user analysis demand prediction library;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list;
receiving predicted demands selected by staff from a user analysis demand selection list and taking the predicted demands as user analysis demands;
wherein the generation operation cluster constraint conditions include:
the generating operation cluster consists of N generating operations on a time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold;
the scene relation generated by the generating operation in the generating operation cluster is matched with one standard scene relation in a preset standard scene relation library.
Preferably, the data analysis template generating module generates a data analysis template based on the user analysis task, including:
based on a preset feature extraction template, performing feature extraction on a user analysis task to obtain a plurality of task features;
constructing a feature description vector of a user analysis task based on task features;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff;
based on the complementary processing rules, carrying out complementary processing on the first data analysis rules and the second data analysis rules to obtain a plurality of third data analysis rules;
generating a data analysis template based on the third data analysis rule;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of staff for the data analysis rule of the same type;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the amount by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules;
the remaining first data analysis rule and second data analysis rule are used as third data analysis rules.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic diagram of a user analysis method based on live big data in an embodiment of the present invention;
fig. 2 is a schematic diagram of a user analysis system based on live big data in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a user analysis method based on live big data, which comprises the following steps as shown in fig. 1:
step S1: collecting live broadcast big data; wherein the live broadcast big data includes a large amount of data about users who watch live broadcasts in various live broadcast software, specifically, for example: users' viewing content histories, users' bullet comment (barrage) posting histories, users' viewing histories, the historical popularity ranking of each live broadcast content section (such as a game section, an entertainment section, a science and technology section and the like) in the live broadcast software, and the like;
step S2: performing trusted data extraction on the live broadcast big data to obtain trusted data; the trusted data is the portion of the live broadcast big data that is judged credible;
step S3: acquiring a user analysis task; wherein the user analysis tasks include: tasks requiring user analysis, specifically, for example: analyzing content preferences of a user for watching live broadcast and the like;
step S4: generating a data analysis template based on the user analysis task; the data analysis template is a template which is executed against the user analysis task, and specifically, for example: analyzing which type of content live broadcast is watched by the user in the watching content history of live broadcast, and the like;
step S5: based on the data analysis template, user analysis is carried out according to the trusted data, and a user analysis result is obtained.
The working principle and the beneficial effects of the technical scheme are as follows:
when the user analysis is carried out on the user watching live broadcast, the user analysis is carried out based on the trusted data extracted from the live broadcast big data, so that the problem that the user analysis error is caused by the fact that unreliable data possibly exist in the live broadcast big data is avoided, and the accuracy of live broadcast software optimization is improved. When the method is applied specifically, a data analysis template is generated based on a user analysis task, user analysis is performed rapidly according to trusted data, and a user analysis result obtained through analysis is output for a worker to check.
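Steps S1 to S5 can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the `DataItem` fields, the task name `content_preference`, and the single counting rule are hypothetical, and the `trusted` flag stands in for the outcome of the credibility check in step S2.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataItem:
    category: str   # hypothetical field: content section the user watched
    trusted: bool   # stand-in for the outcome of the credibility check in S2


Template = Dict[str, Callable[[List[DataItem]], Optional[str]]]


def most_watched_category(items: List[DataItem]) -> Optional[str]:
    """Example analysis rule: the content section watched most often."""
    counts = Counter(item.category for item in items)
    return counts.most_common(1)[0][0] if counts else None


def build_template(task: str) -> Template:
    """S4: map a user analysis task to a data analysis template (a set of rules)."""
    if task == "content_preference":
        return {"most_watched_category": most_watched_category}
    raise ValueError(f"no template for task: {task}")


def run_user_analysis(items: List[DataItem], task: str) -> Dict[str, Optional[str]]:
    trusted = [item for item in items if item.trusted]                 # S2
    template = build_template(task)                                    # S3 + S4
    return {name: rule(trusted) for name, rule in template.items()}   # S5


# S1: collected data; three entertainment records come from an untrusted source.
big_data = (
    [DataItem("game", True)] * 2
    + [DataItem("entertainment", True)]
    + [DataItem("entertainment", False)] * 3
)
result = run_user_analysis(big_data, "content_preference")
# {'most_watched_category': 'game'}
```

Without the filter in S2 the untrusted records would make "entertainment" the most watched section, which is the kind of analysis error the method is meant to avoid.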
In one embodiment, the trusted data extraction is performed on live big data to obtain trusted data, including:
analyzing a plurality of data items in the live big data; the live broadcast big data consists of a plurality of data items, and the data items are obtained through analysis;
obtaining the pathway type of the source pathway of each data item, the pathway type including: active pathway and passive pathway; a source pathway whose type is active pathway is one through which the local system actively acquires live broadcast big data, specifically, for example: searching web pages of websites (some websites publish statistics on content preference types in the live broadcast industry); a source pathway whose type is passive pathway is one through which another party provides live broadcast big data to the local system, specifically, for example: data provided by a host platform, data provided by a big data platform (a big data platform can offer a service for collecting live broadcast big data), and the like;
when the path type is an active path, respectively acquiring data information of the data item and path information of a source path; wherein the data information includes: total data amount of the data items, the proportion of the picture content in the data items and the like; the route information includes: website credibility, total website users, daily average popularity of websites and the like;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value; the larger the first credibility value is, the higher the credibility degree of the data information and the route information representing the data item is; the first trusted quantization template is a template for quantizing data information and path information into a first trusted value, specifically, for example: the total data amount of the data items is more than 100KB, the proportion of the picture content in the data items is more than 20%, the website credibility is more than 70, the total number of website users is more than 120, the daily average popularity of the website is more than 35, and the first credibility value is 80;
when the first credibility value is larger than or equal to a preset first credibility threshold value, taking the data item as credible data; the first trusted threshold is specifically, for example: 70; when the first trusted value is greater than or equal to a first trusted threshold, the data item is trusted and is used as trusted data;
when the path type is a passive path, credit information of the source path and guarantee information with which the source path vouches for the credibility of the data item are respectively obtained; wherein the credit information includes: a credit value of the source path, where the lower the quality of the live broadcast big data the source path has historically provided (specifically, for example, providing false information), the lower the credit value; the guarantee information includes: a guarantee value, where the stronger the source path's guarantee of the data item's credibility, the greater the guarantee value;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value; wherein the larger the second trusted value, the higher the trusted degree of the credit information and the guarantee information representing the data item; the second trusted quantization template is a template for quantizing credit information and guarantee information into a second trusted value, specifically, for example: the credit value is more than 5, the guarantee value is more than 6, and the second trusted value is 60;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, taking the data item as credible data. The second trusted threshold is specifically, for example: 75; and when the second credibility value is larger than or equal to a second credibility threshold value, the data item is credible and is used as credible data.
The working principle and the beneficial effects of the technical scheme are as follows:
when the embodiment of the invention is used for carrying out the trusted data extraction on the live big data, the trusted data extraction is respectively carried out according to different path types of the data items in the live big data, so that the comprehensiveness and applicability of carrying out the trusted data extraction on the live big data are improved. The first trusted quantization template and the second trusted quantization template are respectively introduced, the first trusted value and the second trusted value are respectively determined rapidly, and the extraction efficiency of extracting the trusted data of the live big data is improved.
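The two-branch extraction described above can be sketched as follows. The condition values and thresholds (80 and 70 for the active pathway, 60 and 75 for the passive pathway) are the examples given in the text; the feature names, the first-match lookup, the default value of 0, and the extra high-credit tier are assumptions added for illustration.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
QuantizationTemplate = List[Tuple[Callable[[Features], bool], float]]

# First trusted quantization template (active pathway), from the text's example.
FIRST_TEMPLATE: QuantizationTemplate = [
    (lambda f: f["size_kb"] > 100 and f["image_ratio"] > 0.20
     and f["site_credibility"] > 70 and f["site_users"] > 120
     and f["site_daily_popularity"] > 35, 80.0),
]

# Second trusted quantization template (passive pathway).
SECOND_TEMPLATE: QuantizationTemplate = [
    (lambda f: f["credit"] > 8 and f["guarantee"] > 8, 90.0),  # hypothetical tier
    (lambda f: f["credit"] > 5 and f["guarantee"] > 6, 60.0),  # example from the text
]

FIRST_THRESHOLD = 70.0    # example value from the text
SECOND_THRESHOLD = 75.0   # example value from the text


def quantize(features: Features, template: QuantizationTemplate) -> float:
    """Return the trusted value of the first matching template entry."""
    for condition, value in template:
        if condition(features):
            return value
    return 0.0  # assumption: nothing matched, so no credibility is granted


def is_trusted(pathway_type: str, features: Features) -> bool:
    """Apply the template and threshold that belong to the pathway type."""
    if pathway_type == "active":
        return quantize(features, FIRST_TEMPLATE) >= FIRST_THRESHOLD
    if pathway_type == "passive":
        return quantize(features, SECOND_TEMPLATE) >= SECOND_THRESHOLD
    raise ValueError(f"unknown pathway type: {pathway_type}")


web_item = {"size_kb": 150, "image_ratio": 0.3, "site_credibility": 85,
            "site_users": 500, "site_daily_popularity": 40}
platform_item = {"credit": 6, "guarantee": 7}

web_ok = is_trusted("active", web_item)             # value 80 against threshold 70
platform_ok = is_trusted("passive", platform_item)  # value 60 against threshold 75
```

With the example numbers from the text, a passive item that only meets the "credit above 5, guarantee above 6" condition scores 60 and falls below the threshold of 75, so it is not kept as trusted data.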
In one embodiment, obtaining a user analysis task includes:
acquiring a user analysis task input by a worker; the staff can input the user analysis task by himself;
and/or ("and/or" refers to two ways of obtaining the user analysis task),
interactively acquiring user analysis requirements of staff; the user analysis requirement is the staff member's requirement for analyzing users who watch live broadcasts, specifically, for example: analyzing what kind of live broadcast users like to watch; interactive acquisition refers to acquisition performed through interaction with the staff member, and the specific acquisition means are described in detail in the following embodiments;
based on the user analysis requirements, a user analysis task is determined. The user analysis task may be determined directly based on the user analysis requirements.
The working principle and the beneficial effects of the technical scheme are as follows:
according to the embodiment of the invention, two ways are introduced to acquire the user analysis task, so that the comprehensiveness of acquiring the user analysis task and the applicability of the system are improved.
In one embodiment, interactively obtaining user analysis requirements of a worker includes:
acquiring a preset authority scene library corresponding to a worker; the authority scene library comprises: operation scenes, set by the worker, in which data acquisition is permitted, specifically, for example: a live broadcast backend data viewing interface, a Baidu search interface and the like;
acquiring the generation operations produced by the worker in any authority scene in the authority scene library within the most recent preset time, and the corresponding operation generation times; the preset time is, for example: 20 minutes; the generation operations are, for example: viewing the currently most popular live broadcast content section in the live broadcast backend data viewing interface, searching Baidu for "which live broadcast contents are popular", and the like;
setting the generating operation on a preset time axis based on the operation generating time; the time axis is an axis with a plurality of time points arranged from left to right according to time sequence, and when the generating operation is set, the position of the time point corresponding to the generating time of the operation is found on the time axis to be set;
determining a generation operation cluster from the time axis based on the generation operation cluster constraint condition;
determining a predicted demand corresponding to the generation operation cluster from a preset user analysis demand prediction library; the user analysis demand prediction library stores predicted demands corresponding to different generation operation clusters; a predicted demand is the demand for analyzing users who watch live broadcasts that is indicated by the generation operations in the generation operation cluster, specifically, for example: the generation operations in the cluster are, in order, viewing the currently most popular live broadcast content section in the live broadcast backend data viewing interface and searching for "which live broadcast contents are popular", and the predicted demand is analyzing users' preferences for live broadcast content;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list; arranging the predicted demands into a table to obtain a user analysis demand selection list;
receiving predicted demands selected by staff from a user analysis demand selection list and taking the predicted demands as user analysis demands;
wherein generating the operational cluster constraint includes:
the generating operation cluster consists of N generating operations on a time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold; the time difference threshold is, for example: 100 seconds;
the scene relation formed by the scenes in which the generation operations in the generation operation cluster are produced matches one standard scene relation in a preset standard scene relation library. The scene relation may be, for example: live broadcast backend data viewing interface -> Baidu search interface, and the like; when a worker's operations are produced in authority scenes that form a standard scene relation, the worker is more likely to have a requirement for analyzing users who watch live broadcasts, for example: the standard scene relation is live broadcast backend data viewing interface -> Baidu search interface; the worker views the most popular live broadcast content section in the live broadcast backend data viewing interface and then searches Baidu for which live broadcast contents are popular, so the worker likely has a requirement to analyze users' preferences for live broadcast content.
The working principle and the beneficial effects of the technical scheme are as follows:
When all three conditions, namely the generation operation cluster constraint conditions, are met, the worker has, within a short time, successively produced generation operations in authority scenes that are likely to reflect the worker's analysis demands. Such generation operation clusters are screened out and the predicted demands are determined from the user analysis demand prediction library, which improves the accuracy and efficiency of interactively acquiring the worker's user analysis requirements. A user analysis demand selection list is then generated from the predicted demands for the worker to choose from, which realizes the interaction and improves the worker's experience.
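The three cluster constraint conditions can be sketched as follows. The 100-second gap is the example threshold from the text; the `Operation` class, the scene and action names, and the decision to test only maximal runs of closely spaced operations (rather than every sub-sequence) are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Operation:
    scene: str        # authority scene in which the operation was produced
    action: str       # what the worker did
    timestamp: float  # operation generation time, in seconds


def find_clusters(
    operations: Sequence[Operation],
    standard_relations: Set[Tuple[str, ...]],
    max_gap: float = 100.0,
) -> List[List[Operation]]:
    """Return runs of operations that satisfy all three constraint conditions."""
    ordered = sorted(operations, key=lambda op: op.timestamp)  # place on time axis
    runs: List[List[Operation]] = []
    current: List[Operation] = []
    for op in ordered:
        # Condition 2: adjacent operations must be at most max_gap apart.
        if current and op.timestamp - current[-1].timestamp > max_gap:
            runs.append(current)
            current = []
        current.append(op)
    if current:
        runs.append(current)
    return [
        run for run in runs
        if len(run) >= 2                                        # condition 1: N >= 2
        and tuple(op.scene for op in run) in standard_relations  # condition 3
    ]


STANDARD_RELATIONS = {("live_backend_data_view", "baidu_search")}

operations = [
    Operation("live_backend_data_view", "view hottest content section", 0.0),
    Operation("baidu_search", "search which live contents are popular", 60.0),
    Operation("live_backend_data_view", "view hottest content section", 1000.0),
]
clusters = find_clusters(operations, STANDARD_RELATIONS)
```

Here the first two operations are 60 seconds apart and follow a standard scene relation, so they form one cluster; the third operation is isolated and is discarded because a cluster needs at least two operations.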
In one embodiment, generating a data analysis template based on a user analysis task includes:
based on a preset feature extraction template, performing feature extraction on the user analysis task to obtain a plurality of task features; the task features include the analysis tasks to be performed on users who watch live broadcasts, for example: analyzing the content preference of users who watch live broadcasts;
constructing a feature description vector of the user analysis task based on the task features; the feature description vector is the task features arranged in vector form;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base; the data analysis rule base stores first data analysis rules corresponding to different feature description vectors, a first data analysis rule being a rule for executing the analysis task described by the feature description vector, for example: analyzing which type of live content a user has watched in the user's live viewing history;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff; the preference data analysis rule base stores second data analysis rules that reflect the staff member's preferences for executing the analysis tasks described by different feature description vectors, for example: analyzing, from the user's live viewing history, the five live content types with the most viewers;
based on the complementary processing rules, carrying out complementary processing on the first data analysis rules and the second data analysis rules to obtain a plurality of third data analysis rules;
generating a data analysis template based on the third data analysis rule; when the data analysis template is generated, third data analysis rules are arranged one by one to obtain the data analysis template;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of the staff for the data analysis rule of the same type; the type of a data analysis rule is, for example: analyzing the content preference of users who watch live broadcasts; the universal preference value represents the preference degree of most staff for the data analysis rule of the same type, and the larger the universal preference value is, the higher the preference degree is; the personal preference value is the staff member's own preference degree for the data analysis rule of the same type, and the larger the personal preference value is, the higher the preference degree is;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the difference by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules; the personal preference threshold is, for example: 75; when the personal preference value is greater than or equal to the personal preference threshold, the staff member particularly prefers the data analysis rule of the same type, so the data analysis rule of the same type in the first data analysis rules is removed; when the difference by which the personal preference value exceeds the universal preference value is greater than or equal to the preset difference threshold, the staff member's preference degree for the data analysis rule of the same type is far higher than that of most staff, so the data analysis rule of the same type in the first data analysis rules is likewise removed;
the remaining first data analysis rule and second data analysis rule are used as third data analysis rules.
The working principle and the beneficial effects of the technical scheme are as follows:
when the data analysis template is generated based on the user analysis task, not only the staff member's preferences but also the degree of those preferences, namely the personal preference value, is considered, and the first data analysis rules and the second data analysis rules are subjected to complementary processing, which makes the generated data analysis template more reasonable. In addition, with the complementary processing rules, the trusted data can be analyzed using only the same-type data analysis rule that the staff member particularly prefers, outputting the result the staff member prefers; alternatively, the trusted data can also be analyzed using the same-type data analysis rule preferred by most staff, outputting an additional result for the staff member's reference, which is more user-friendly.
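The complementary processing rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's own implementation: rules are modeled as dictionaries keyed by rule type, "and/or" is read as an inclusive or, and the difference threshold of 20 is a hypothetical value (only the personal preference threshold of 75 comes from the text).

```python
def complement_rules(first_rules, second_rules, universal_pref, personal_pref,
                     personal_threshold=75, diff_threshold=20):
    """Merge general (first) and staff-preferred (second) rules into third rules.

    first_rules and second_rules map a rule type to a rule description.
    """
    kept_first = {}
    for rule_type, rule in first_rules.items():
        if rule_type in second_rules:  # a same-type rule exists on both sides
            personal = personal_pref[rule_type]
            universal = universal_pref[rule_type]
            strongly_preferred = personal >= personal_threshold
            far_above_universal = personal - universal >= diff_threshold
            if strongly_preferred or far_above_universal:
                continue  # remove the same-type rule from the first rules
        kept_first[rule_type] = rule
    # The remaining first rules plus all second rules become the third rules.
    return list(kept_first.values()) + list(second_rules.values())

# Hypothetical rule sets for illustration.
first = {
    "content_preference": "count views per content type",
    "watch_duration": "average watch time per user",
}
second = {"content_preference": "top 5 content types by viewer count"}

strong = complement_rules(first, second, {"content_preference": 60}, {"content_preference": 80})
weak = complement_rules(first, second, {"content_preference": 60}, {"content_preference": 50})
```

With a personal preference value of 80 the general same-type rule is dropped, so `strong` holds two rules; with a value of 50 neither condition is met and `weak` keeps all three.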
The embodiment of the invention provides a user analysis system based on live big data which, as shown in fig. 2, comprises:
the live big data searching module 1 is used for searching for live big data;
the trusted data extraction module 2 is used for extracting the trusted data of the live big data to obtain the trusted data;
the user analysis task acquisition module 3 is used for acquiring a user analysis task;
the data analysis template generation module 4 is used for generating a data analysis template based on the user analysis task;
and the user analysis module 5 is used for carrying out user analysis according to the trusted data based on the data analysis template to obtain a user analysis result.
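How the five modules hand data to one another can be sketched as a simple pipeline. The function names and the stand-in module bodies below are hypothetical and exist only to make the data flow concrete; the patent does not prescribe these interfaces.

```python
from collections import Counter

def run_user_analysis(search, extract_trusted, get_task, build_template, analyze):
    """Chain the five modules in the order the embodiment lists them."""
    live_data = search()                   # module 1: live big data searching
    trusted = extract_trusted(live_data)   # module 2: trusted data extraction
    task = get_task()                      # module 3: user analysis task acquisition
    template = build_template(task)        # module 4: data analysis template generation
    return analyze(template, trusted)      # module 5: user analysis

# Hypothetical stand-ins for each module.
def search():
    return [
        {"user": "u1", "content": "math", "trusted": True},
        {"user": "u2", "content": "math", "trusted": True},
        {"user": "u3", "content": "art", "trusted": False},
    ]

def extract_trusted(records):
    return [r for r in records if r["trusted"]]

def get_task():
    return "analyze content preference of users watching live broadcasts"

def build_template(task):
    return ["count_by_content"] if "content preference" in task else []

def analyze(template, records):
    result = {}
    if "count_by_content" in template:
        result["count_by_content"] = dict(Counter(r["content"] for r in records))
    return result

result = run_user_analysis(search, extract_trusted, get_task, build_template, analyze)
```

Passing the modules in as callables keeps each one replaceable, which mirrors the way the embodiments below refine modules 2, 3 and 4 independently.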
In one embodiment, the trusted data extraction module 2 performs trusted data extraction on live big data to obtain trusted data, including:
analyzing a plurality of data items in the live big data;
acquiring the pathway type of the source pathway of the data item, the pathway type including: an active pathway and a passive pathway;
when the path type is an active path, respectively acquiring data information of the data item and path information of a source path;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value;
when the first credibility value is larger than or equal to a preset first credibility threshold value, taking the data item as credible data;
when the path type is a passive path, credit information of a source path and guarantee information of the source path for carrying out trusted guarantee on the data item are respectively obtained;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, taking the data item as credible data.
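The two-branch trust check can be sketched as below. The patent does not disclose the contents of the trusted quantization templates, so this sketch assumes each template is a set of integer weights applied to pre-scored fields in the range 0 to 100; the weights, thresholds, and field names are all hypothetical.

```python
def trusted_value(scores, template):
    """Weighted average of per-field scores; a stand-in for a quantization template."""
    total_weight = sum(template.values())
    return sum(scores[name] * weight for name, weight in template.items()) / total_weight

# Hypothetical templates: field name -> weight.
FIRST_TEMPLATE = {"data_info": 1, "pathway_info": 1}        # active pathway
SECOND_TEMPLATE = {"credit_info": 3, "guarantee_info": 2}   # passive pathway

def extract_trusted(items, first_threshold=70, second_threshold=80):
    """Return the ids of data items whose trusted value reaches the branch threshold."""
    trusted = []
    for item in items:
        if item["pathway_type"] == "active":
            value = trusted_value(item["scores"], FIRST_TEMPLATE)
            threshold = first_threshold
        elif item["pathway_type"] == "passive":
            value = trusted_value(item["scores"], SECOND_TEMPLATE)
            threshold = second_threshold
        else:
            continue  # unknown pathway type: not treated as trusted
        if value >= threshold:
            trusted.append(item["id"])
    return trusted

items = [
    {"id": "a", "pathway_type": "active", "scores": {"data_info": 80, "pathway_info": 70}},
    {"id": "b", "pathway_type": "passive", "scores": {"credit_info": 90, "guarantee_info": 50}},
    {"id": "c", "pathway_type": "passive", "scores": {"credit_info": 90, "guarantee_info": 80}},
]
trusted_ids = extract_trusted(items)
```

Item "a" scores 75 against a threshold of 70 and item "c" scores 86 against 80, so both are kept; item "b" scores 74 and is rejected.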
In one embodiment, the user analysis task acquisition module 3 acquires user analysis tasks, including:
acquiring a user analysis task input by a worker;
and/or,
interactively acquiring user analysis requirements of staff;
based on the user analysis requirements, a user analysis task is determined.
In one embodiment, the user analysis task acquisition module 3 interactively acquires user analysis requirements of the staff member, including:
acquiring a preset authority scene library corresponding to a worker;
acquiring the generation operations generated by the staff member in any authority scene in the authority scene library within the most recent preset time period, together with the corresponding operation generation times;
setting the generating operation on a preset time axis based on the operation generating time;
determining a generation operation cluster from the time axis based on the generation operation cluster constraint condition;
determining a predicted demand corresponding to the generated operation cluster from a preset user analysis demand prediction library;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list;
receiving predicted demands selected by staff from a user analysis demand selection list and taking the predicted demands as user analysis demands;
wherein the generation operation cluster constraint conditions include:
the generating operation cluster consists of N generating operations on a time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold;
the scene relation generated by the generating operation in the generating operation cluster is matched with one standard scene relation in a preset standard scene relation library.
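Once a generation operation cluster has been found, the lookup in the user analysis demand prediction library and the selection list can be sketched as follows. The library layout (keyed by the ordered sequence of scenes in the cluster), the function names, and the sample entries are assumptions for illustration only.

```python
def predict_demands(cluster_scenes, prediction_library):
    """Look up predicted demands for a cluster by its ordered scene sequence."""
    return prediction_library.get(tuple(cluster_scenes), [])

def build_selection_list(demands):
    """Render the predicted demands as a numbered list for display to the staff member."""
    return [f"{index}. {demand}" for index, demand in enumerate(demands, start=1)]

def choose_demand(demands, choice):
    """Return the demand the staff member picked (1-based), as the user analysis requirement."""
    if not 1 <= choice <= len(demands):
        raise ValueError("choice is outside the selection list")
    return demands[choice - 1]

# Hypothetical prediction library.
library = {
    ("live_background_data_view", "baidu_search"): [
        "analyze content preference of users watching live broadcasts",
        "analyze peak viewing periods",
    ],
}

demands = predict_demands(["live_background_data_view", "baidu_search"], library)
selection_list = build_selection_list(demands)
requirement = choose_demand(demands, 1)
```

The staff member's choice, rather than the prediction itself, becomes the user analysis requirement, which is what makes the acquisition interactive.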
In one embodiment, the data analysis template generation module 4 generates a data analysis template based on the user analysis task, including:
based on a preset feature extraction template, performing feature extraction on a user analysis task to obtain a plurality of task features;
constructing a feature description vector of a user analysis task based on task features;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff;
based on the complementary processing rules, carrying out complementary processing on the first data analysis rules and the second data analysis rules to obtain a plurality of third data analysis rules;
generating a data analysis template based on the third data analysis rule;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of staff for the data analysis rule of the same type;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the difference by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules;
the remaining first data analysis rule and second data analysis rule are used as third data analysis rules.
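The first steps of template generation (feature extraction, feature description vector, rule lookup) can be sketched as below. The patent does not specify the form of the feature extraction template or the rule base, so this sketch assumes the template is a fixed keyword list, the vector is a binary presence vector over that list, and the rule base is keyed by the vector; all names and entries are hypothetical.

```python
# Hypothetical feature extraction template: one keyword per vector position.
FEATURE_TEMPLATE = ["content preference", "watch duration", "active period"]

def feature_vector(task_text, template=FEATURE_TEMPLATE):
    """Build a binary feature description vector: 1 if the keyword appears in the task."""
    text = task_text.lower()
    return tuple(int(keyword in text) for keyword in template)

def lookup_rules(vector, rule_base):
    """Return the data analysis rules stored for this feature description vector."""
    return rule_base.get(vector, [])

# Hypothetical rule bases keyed by feature description vector.
GENERAL_RULE_BASE = {
    (1, 0, 0): ["analyze which type of live content the user watched in the viewing history"],
}
STAFF_PREFERENCE_RULE_BASE = {
    (1, 0, 0): ["analyze the five live content types with the most viewers"],
}

vector = feature_vector("Analyze content preference of users watching live broadcasts")
first_rules = lookup_rules(vector, GENERAL_RULE_BASE)
second_rules = lookup_rules(vector, STAFF_PREFERENCE_RULE_BASE)
```

The two rule lists obtained this way are the inputs to the complementary processing step, which then yields the third data analysis rules that make up the template.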
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (4)

1. The user analysis method based on the live big data is characterized by comprising the following steps:
searching for live big data;
extracting the trusted data from the live big data to obtain the trusted data;
acquiring a user analysis task;
generating a data analysis template based on the user analysis task;
based on the data analysis template, user analysis is carried out according to the trusted data to obtain user analysis results,
the obtaining the user analysis task comprises the following steps:
interactively acquiring user analysis requirements of staff;
determining the user analysis task based on the user analysis requirements,
the interactive obtaining of the user analysis requirement of the staff includes:
acquiring a preset authority scene library corresponding to a worker;
acquiring generating operation generated in any authority scene in the authority scene library within the latest preset time of a worker and corresponding operation generating time;
setting the generating operation on a preset time axis based on the operation generating time;
determining a generation operation cluster from the time axis based on a generation operation cluster constraint condition;
determining a predicted demand corresponding to the generated operation cluster from a preset user analysis demand prediction library;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list;
receiving the predicted demand selected by a worker from the user analysis demand selection list, and taking the predicted demand as the user analysis demand;
wherein the generation operation cluster constraint conditions comprise:
said cluster of generating operations consists of N consecutive generating operations on said time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold;
the scene relation between every two rights scenes generated by the generating operation in the generating operation cluster is matched with one standard scene relation in a preset standard scene relation library;
the step of extracting the trusted data from the live big data to obtain the trusted data comprises the following steps:
analyzing a plurality of data items in the live big data;
acquiring the pathway type of the source pathway of the data item, the pathway type comprising: an active pathway and a passive pathway;
when the path type is the active path, respectively acquiring the data information of the data item and the path information of the source path;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value;
when the first credibility value is larger than or equal to a preset first credibility threshold value, the data item is used as the credible data;
when the path type is the passive path, respectively acquiring credit information of the source path and guarantee information of the source path for carrying out trusted guarantee on the data item;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, the data item is used as the credible data.
2. The live big data based user analysis method of claim 1, wherein the generating a data analysis template based on the user analysis task comprises:
performing feature extraction on the user analysis task based on a preset feature extraction template to obtain a plurality of task features;
constructing a feature description vector of the user analysis task based on the task features;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff;
performing complementary processing on the first data analysis rule and the second data analysis rule based on the complementary processing rule to obtain a plurality of third data analysis rules;
generating the data analysis template based on the third data analysis rule;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of staff for the data analysis rule of the same type;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the difference by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules;
and taking the rest of the first data analysis rules and the second data analysis rules as the third data analysis rules.
3. A user analysis system based on live big data, comprising:
the live big data searching module is used for searching for live big data;
the trusted data extraction module is used for extracting the trusted data from the live big data to obtain the trusted data;
the user analysis task acquisition module is used for acquiring user analysis tasks;
the data analysis template generation module is used for generating a data analysis template based on the user analysis task;
the user analysis module is used for carrying out user analysis according to the trusted data based on the data analysis template to obtain a user analysis result,
the user analysis task acquisition module acquires a user analysis task, including:
interactively acquiring user analysis requirements of staff;
determining the user analysis task based on the user analysis requirements,
the user analysis task acquisition module interactively acquires the user analysis requirements of the staff, and comprises the following steps:
acquiring a preset authority scene library corresponding to a worker;
acquiring generating operation generated in any authority scene in the authority scene library within the latest preset time of a worker and corresponding operation generating time;
setting the generating operation on a preset time axis based on the operation generating time;
determining a generation operation cluster from the time axis based on a generation operation cluster constraint condition;
determining a predicted demand corresponding to the generated operation cluster from a preset user analysis demand prediction library;
generating a user analysis demand selection list based on the predicted demand, and outputting and displaying the user analysis demand selection list;
receiving the predicted demand selected by a worker from the user analysis demand selection list, and taking the predicted demand as the user analysis demand;
wherein the generation operation cluster constraint conditions comprise:
said cluster of generating operations consists of N consecutive generating operations on said time axis; n is an integer greater than or equal to 2;
the operation generation time difference between adjacent generation operations in the generation operation cluster is smaller than or equal to a preset time difference threshold;
the scene relation between every two rights scenes generated by the generating operation in the generating operation cluster is matched with one standard scene relation in a preset standard scene relation library;
the trusted data extraction module performs trusted data extraction on the live big data to obtain trusted data, and the trusted data extraction module comprises:
analyzing a plurality of data items in the live big data;
acquiring the pathway type of the source pathway of the data item, the pathway type comprising: an active pathway and a passive pathway;
when the path type is the active path, respectively acquiring the data information of the data item and the path information of the source path;
based on a preset first trusted quantization template, carrying out trusted quantization on the data information and the path information to obtain a first trusted value;
when the first credibility value is larger than or equal to a preset first credibility threshold value, the data item is used as the credible data;
when the path type is the passive path, respectively acquiring credit information of the source path and guarantee information of the source path for carrying out trusted guarantee on the data item;
based on a preset second trusted quantization template, performing trusted quantization on the credit information and the guarantee information to obtain a second trusted value;
and when the second credibility value is larger than or equal to a preset second credibility threshold value, the data item is used as the credible data.
4. The live big data based user analysis system of claim 3, wherein the data analysis template generation module generates a data analysis template based on the user analysis task, comprising:
performing feature extraction on the user analysis task based on a preset feature extraction template to obtain a plurality of task features;
constructing a feature description vector of the user analysis task based on the task features;
determining at least one first data analysis rule corresponding to the feature description vector from a preset data analysis rule base;
determining at least one second data analysis rule corresponding to the feature description vector from a preset preference data analysis rule base corresponding to the staff;
performing complementary processing on the first data analysis rule and the second data analysis rule based on the complementary processing rule to obtain a plurality of third data analysis rules;
generating the data analysis template based on the third data analysis rule;
wherein the complementary processing rules include:
when one data analysis rule of the same type exists in each of the first data analysis rule and the second data analysis rule, respectively acquiring a preset universal preference value corresponding to the data analysis rule of the same type and a personal preference value of staff for the data analysis rule of the same type;
when the personal preference value is greater than or equal to a preset personal preference threshold and/or the difference by which the personal preference value exceeds the universal preference value is greater than or equal to a preset difference threshold, removing the data analysis rule of the same type from the first data analysis rules;
and taking the rest of the first data analysis rules and the second data analysis rules as the third data analysis rules.
CN202310819540.1A 2023-07-05 2023-07-05 User analysis method and system based on live big data Active CN116842211B (en)

Publications (2)

Publication Number Publication Date
CN116842211A CN116842211A (en) 2023-10-03
CN116842211B true CN116842211B (en) 2024-03-15

Family

ID=88164881

Country Status (1)

Country Link
CN (1) CN116842211B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102956009A (en) * 2011-08-16 2013-03-06 阿里巴巴集团控股有限公司 Electronic commerce information recommending method and electronic commerce information recommending device on basis of user behaviors
CN105590055A (en) * 2014-10-23 2016-05-18 阿里巴巴集团控股有限公司 Method and apparatus for identifying trustworthy user behavior in network interaction system
CN107818491A (en) * 2017-09-30 2018-03-20 平安科技(深圳)有限公司 Electronic installation, Products Show method and storage medium based on user's Internet data
CN107958020A (en) * 2017-10-24 2018-04-24 中国南方电网有限责任公司超高压输电公司检修试验中心 It is a kind of based on cluster electric network data processing and data visualization method
CN111194005A (en) * 2019-12-05 2020-05-22 中国科学院地理科学与资源研究所 Indoor pedestrian semantic position extraction method and prediction method
CN111932427A (en) * 2020-09-24 2020-11-13 北京泰策科技有限公司 Method and system for detecting emergent public security incident based on multi-mode data
CN112380190A (en) * 2020-11-27 2021-02-19 北京三维天地科技股份有限公司 Data quality health degree analysis method and system based on multidimensional analysis technology
CN113486983A (en) * 2021-08-02 2021-10-08 东莞市道滘洪诺计算机技术开发服务中心 Big data office information analysis method and system for anti-fraud processing
KR20220000436A (en) * 2020-06-25 2022-01-04 윤성종 Social big data analysis report automatic provision system using big data and artificial intelligence
CN114530038A (en) * 2022-01-11 2022-05-24 江苏大学 Travel interest region extraction method and system based on spatio-temporal data clustering
KR20220077184A (en) * 2020-11-30 2022-06-09 가천대학교 산학협력단 System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same
CN114664401A (en) * 2022-03-31 2022-06-24 夏文江 User demand analysis method and system serving intelligent medical big data
CN115563196A (en) * 2022-09-09 2023-01-03 上海市大数据股份有限公司 Method and system for enhancing object information value based on multi-source data
CN116090789A (en) * 2023-03-03 2023-05-09 麦高(广东)数字科技有限公司 Lean manufacturing production management system and method based on data analysis
CN116186692A (en) * 2023-01-11 2023-05-30 中央民族大学 Safety protection system and method for electronic economic data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant