CN116245555B - User information collecting and analyzing system based on big data - Google Patents

Info

Publication number
CN116245555B
Authority
CN
China
Prior art keywords
user
browsing
health
information
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310222160.XA
Other languages
Chinese (zh)
Other versions
CN116245555A (en)
Inventor
庞清瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhangjiakou Qiaogong Technology Service Co ltd
Original Assignee
Zhangjiakou Qiaogong Technology Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhangjiakou Qiaogong Technology Service Co ltd filed Critical Zhangjiakou Qiaogong Technology Service Co ltd
Priority to CN202310222160.XA priority Critical patent/CN116245555B/en
Publication of CN116245555A publication Critical patent/CN116245555A/en
Application granted granted Critical
Publication of CN116245555B publication Critical patent/CN116245555B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The application discloses a user information collection and analysis system based on big data, and relates in particular to the technical field of user big data analysis. The system comprises a user information collection module, a user internet surfing time health degree coefficient acquisition module, a user network speaking health degree coefficient acquisition module, a user browsing information analysis module, a user network behavior health degree evaluation module and a user health degree development analysis module.

Description

User information collecting and analyzing system based on big data
Technical Field
The application relates to the technical field of big data analysis of users, in particular to a big data-based user information collection and analysis system.
Background
With the spread of the internet and the growing availability of big data processing technologies, users leave more and more "traces" on network platforms, including text, pictures, video and browsing records. Collecting these traces yields big data, from which information about the user can be mined through big data processing. Health comprises both physical health and mental health. Modern medicine largely ensures that people with physical illness can receive treatment, but mental health still receives too little attention, and too few measures are taken to protect it.
Healthy network behavior means that a user plans internet surfing time reasonably, uses the network for learning, browses positive information, posts positive comments and helps build a positive network environment, so that the user's mental health can develop continuously in a positive direction. At present, however, the health of network behavior is not analyzed or evaluated, so the health of a user's network behavior cannot be known.
How to obtain, evaluate and monitor the health degree of a user's network behavior is therefore a major concern for managers.
Disclosure of Invention
In order to overcome the problems in the prior art, the application provides a user information collection and analysis system based on big data, which is used for obtaining big data by collecting user network behavior information, mining a user internet surfing time health degree coefficient, a user network speaking health degree coefficient and a user browsing health degree coefficient from the big data, evaluating the health degree of the user network behavior, and judging the user network behavior based on a health degree value so as to solve the problem that the network behavior health degree of the user is difficult to obtain and quantify in the background art.
In order to achieve the above purpose, the present application provides the following technical solution: a user information collection and analysis system based on big data, comprising a user information collection module, a user internet surfing time health degree coefficient acquisition module, a user network speaking health degree coefficient acquisition module, a user browsing information analysis module, a user network behavior health degree evaluation module and a user health degree development analysis module, wherein the user information collection module collects information on active users by installing a monitor in the users' internet surfing platform, the collected information is stored in a database in units of users, and each user unit comprises four categories: user basic information, internet surfing time information, network speaking information and browsing information;
the user internet surfing time health degree coefficient acquisition module acquires the user's internet surfing time information from the database, including the user's total online time in a day, online learning time in a day and online stay-up (late-night) time in a day, and analyzes it to obtain the user's internet surfing time health degree coefficient $\varepsilon_1$; the longer the online time, the shorter the learning time and the longer the stay-up time, the lower the coefficient $\varepsilon_1$;
the user network speaking health degree coefficient acquisition module acquires the user's network speaking information from the database, including the user's network speaking documents in a day, and analyzes it to obtain the user's network speaking health degree coefficient $\varepsilon_2$; a speaking document comprises n word vectors, the emotion coefficient y of each word vector is obtained based on a text emotion analysis model, and $\varepsilon_2$ satisfies the formula [formula omitted], where $y_i$ denotes the emotion coefficient of the i-th word vector;
the user browsing information analysis module acquires the user's browsing information from the database, including the browsing duration, browsing content and browsing emotion direction in a day, and analyzes it to obtain the user browsing health degree coefficient $\varepsilon_3$;
the user network behavior health degree evaluation module evaluates the user's network behavior health degree in a day; the daily network behavior health degree JKD satisfies the formula [formula omitted], where $w_1$, $w_2$ and $w_3$ are the weight factors of the user internet surfing time health degree, the user network speaking health degree and the user browsing health degree, respectively, in evaluating the user network behavior health degree, with a constraint on $w_1 + w_2 + w_3$ [value omitted]. The greater the network behavior health degree value, the healthier the user's network behavior.
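The JKD formula itself is not reproduced in this text. As a minimal sketch, assuming JKD is a weighted sum of the three coefficients and that the weights sum to 1 (both are assumptions, as are the function name and the default weights), the daily evaluation could look like this:

```python
def network_behavior_health(eps1, eps2, eps3, weights=(0.4, 0.3, 0.3)):
    """Daily network behavior health JKD as a weighted sum (assumed form).

    eps1, eps2, eps3: surfing-time, speaking and browsing coefficients.
    weights: hypothetical weight factors (w1, w2, w3).
    """
    w1, w2, w3 = weights
    # Assumption: the weight factors are normalised so that they sum to 1.
    if abs((w1 + w2 + w3) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return w1 * eps1 + w2 * eps2 + w3 * eps3


# Example with made-up coefficient values.
jkd = network_behavior_health(0.5, -0.2, 0.8)
```

With this form a larger JKD corresponds to healthier behavior, which matches the statement in the text.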
In a preferred embodiment, the user information collection module comprises an information screening unit and a user network behavior collection unit. The information screening unit screens active users based on screening conditions, namely the number of days per month on which the user is active online and the data volume of the user's network behavior; the user network behavior collection unit collects the user's internet surfing time information, network speaking information and browsing information.
In a preferred embodiment, the user internet surfing time health degree coefficient $\varepsilon_1$ satisfies the formula [formula omitted], where t denotes the user's total online duration, $t_a$ the duration the user spends on learning, $t_b$ the duration the user stays up late, and $k_1$, $k_2$ and $k_3$ the influence factors of the online duration, the online learning duration and the online stay-up duration, respectively.
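The formula for $\varepsilon_1$ is not reproduced in this text. The sketch below assumes a simple linear form chosen only because it matches the stated behavior: the coefficient falls as total online time or stay-up time grows and rises with learning time. The linear form, the function name and the default influence factors are all hypothetical.

```python
def surfing_time_health(t, t_a, t_b, k1=0.05, k2=0.2, k3=0.3):
    """Surfing-time health coefficient under an ASSUMED linear form.

    t   : total online time in the day (hours)
    t_a : online learning time (hours), part of t
    t_b : online stay-up (late-night) time (hours), part of t
    k1, k2, k3: hypothetical influence factors, not taken from the patent.
    """
    if not (0 <= t_a <= t and 0 <= t_b <= t):
        raise ValueError("learning and stay-up time must lie within total time")
    # Learning time raises the coefficient; total and late-night time lower it.
    return k2 * t_a - k1 * t - k3 * t_b


eps1 = surfing_time_health(t=6, t_a=2, t_b=1)
```

Any other form with the same monotonic behavior would serve equally well for illustration.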
In a preferred embodiment, the user browsing information analysis module sets a browsing time interval and screens the L pieces of browsing information that fall within that interval on a given day, divides them into positive browsing information and negative browsing information according to the keywords and browsing emotion direction of each piece, and sums the browsing durations of all positive browsing information and of all negative browsing information separately. The user browsing health degree coefficient $\varepsilon_3$ satisfies the formula [formula omitted], where Zt denotes the total duration of positive browsing information and Ft the total duration of negative browsing information.
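The formula relating $\varepsilon_3$ to Zt and Ft is not reproduced in this text. A minimal sketch, assuming the coefficient is the normalised difference (Zt - Ft) / (Zt + Ft), which is an illustrative choice and not the patent's formula, is:

```python
def browsing_health(records):
    """Browsing health coefficient under an ASSUMED normalised-difference form.

    records: list of (duration_seconds, is_positive) pairs, one per piece
             of browsing information inside the chosen time interval.
    """
    zt = sum(duration for duration, positive in records if positive)      # Zt
    ft = sum(duration for duration, positive in records if not positive)  # Ft
    total = zt + ft
    if total == 0:
        return 0.0  # no browsing recorded: treated as neutral (assumption)
    return (zt - ft) / total


eps3 = browsing_health([(300, True), (100, False)])
```

Under this assumed form the value ranges from -1 (only negative browsing) to 1 (only positive browsing).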
In a preferred embodiment, a user browsing preference portrait construction unit extracts the keywords of each piece of browsing content and passes the browsing durations and keywords to a user browsing habit analysis subunit. The user browsing habit analysis subunit analyzes the user's browsing preferences based on the browsing durations and keyword information: it classifies all keywords with a clustering algorithm and then calculates the occurrence frequency of each keyword category, defined as the accumulated browsing duration of the keywords in that category divided by the total browsing duration, and builds the user browsing preference portrait from these frequencies.
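The frequency computation described above can be sketched as follows. The clustering algorithm is not specified in the text, so a hand-written keyword-to-category mapping (`category_of`, hypothetical) stands in for its output; the function name and the sample data are also illustrative.

```python
from collections import defaultdict


def browsing_preference_portrait(records, category_of):
    """Share of browsing time per keyword category.

    records     : iterable of (keyword, browsing_duration) pairs.
    category_of : dict mapping keyword -> category; a stand-in for the
                  clustering step, which the text does not specify.
    """
    per_category = defaultdict(float)
    total = 0.0
    for keyword, duration in records:
        per_category[category_of.get(keyword, "other")] += duration
        total += duration
    if total == 0:
        return {}
    # Frequency = accumulated duration of the category / total duration,
    # listed from the most to the least browsed category.
    ranked = sorted(per_category.items(), key=lambda kv: kv[1], reverse=True)
    return {category: duration / total for category, duration in ranked}


portrait = browsing_preference_portrait(
    [("python", 30), ("football", 10), ("java", 20)],
    {"python": "programming", "java": "programming", "football": "sports"},
)
```

The resulting dictionary is ordered by share of browsing time, so its first key is the user's dominant interest.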
In a preferred embodiment, the user health development evaluation module comprises a user current month health development state evaluation unit, a user health trend evaluation unit and a user health comprehensive evaluation unit, wherein the user current month health development state evaluation unit is used for evaluating the ratio of unhealthy network behavior values to healthy network behavior values of the user in one month; the user health degree trend evaluation unit is used for evaluating the change trend of unhealthy network behavior values of each month of a user; the comprehensive user health evaluation unit is used for evaluating the network behavior development state of each month of the user.
In a preferred embodiment, the user health development evaluation module is used for analyzing and evaluating the development state of the network behavior health of the user, and comprises the following steps:
S1, acquiring the user's daily network behavior health degree to obtain the data set (di, z) of the current month's behavior health degree, where $d_i$ denotes the user's network behavior health degree on the i-th day;
S2, establishing a rectangular coordinate system in which the x-axis represents time and the y-axis represents network behavior health degree, plotting the data set of the current month's behavior health degree in it to obtain discrete points, and fitting the discrete points to a function curve;
S3, acquiring the user's current-month health degree development state coefficient: calculating the area S enclosed by the function curve and the coordinate axis, where the area $S_a$ in the first quadrant represents the healthy network behavior value and the area $S_b$ in the fourth quadrant represents the unhealthy network behavior value, to obtain the user's current-month health degree development state coefficient $\gamma_1$ [formula omitted]; $\gamma_1 > 1$ indicates that the network behavior in the current month is unhealthy, and the greater $\gamma_1$ is, the more unhealthy the network behavior in the current month;
S4, acquiring the user health degree trend coefficient: acquiring the user's unhealthy network behavior value for each month to obtain a data set B, recorded as [data set omitted], and calculating the growth rate of the unhealthy network behavior value from month to month to obtain the user health degree trend coefficient $\gamma_2$ [formula omitted], where $S_{n+1}$ denotes the unhealthy network behavior value of the current month and $S_n$ that of the previous month; $\gamma_2 > 0$ indicates that the user's health degree is trending unhealthy, and the greater $\gamma_2$ is, the more unhealthy the network behavior in the current month;
S5, comprehensively evaluating the user's health degree: based on $\gamma_1$ and $\gamma_2$ obtained in steps S3 and S4, the development of the user's health degree is evaluated comprehensively; the evaluation parameter G of the user's health development satisfies the formula [formula omitted], where $k_4$ and $k_5$ are constants.
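Steps S1 to S5 can be sketched as follows. The formulas are not reproduced in this text, so several assumptions are made and should be read as such: the fitted curve is replaced by a piecewise-linear curve through the daily points (one day apart), $\gamma_1$ is taken as the ratio $S_b / S_a$ (the text elsewhere describes it as the ratio of unhealthy to healthy values), $\gamma_2$ is taken as the relative growth $(S_{n+1} - S_n) / S_n$, and G is taken as the linear combination $k_4 \gamma_1 + k_5 \gamma_2$ with hypothetical constants.

```python
import math


def signed_areas(daily_health):
    """Return (S_a, S_b): areas above and below the time axis.

    Uses a piecewise-linear curve through the daily values, a stand-in
    for the fitted function curve of step S2.
    """
    s_a = 0.0
    s_b = 0.0
    for y0, y1 in zip(daily_health, daily_health[1:]):
        if y0 >= 0 and y1 >= 0:
            s_a += (y0 + y1) / 2          # trapezoid above the axis
        elif y0 <= 0 and y1 <= 0:
            s_b += -(y0 + y1) / 2         # trapezoid below the axis
        else:
            x0 = y0 / (y0 - y1)           # where the segment crosses zero
            if y0 > 0:
                s_a += y0 * x0 / 2
                s_b += -y1 * (1 - x0) / 2
            else:
                s_b += -y0 * x0 / 2
                s_a += y1 * (1 - x0) / 2
    return s_a, s_b


def state_coefficient(s_a, s_b):
    """gamma_1, assumed to be S_b / S_a."""
    if s_a == 0:
        return math.inf if s_b > 0 else 0.0
    return s_b / s_a


def trend_coefficient(prev_unhealthy, curr_unhealthy):
    """gamma_2, assumed to be the relative growth of the unhealthy value."""
    if prev_unhealthy == 0:
        raise ValueError("previous month's unhealthy value must be non-zero")
    return (curr_unhealthy - prev_unhealthy) / prev_unhealthy


def overall_parameter(gamma1, gamma2, k4=0.5, k5=0.5):
    """G, assumed linear in gamma_1 and gamma_2; k4 and k5 are hypothetical."""
    return k4 * gamma1 + k5 * gamma2


s_a, s_b = signed_areas([1, 1, -1, -1])
g = overall_parameter(state_coefficient(s_a, s_b), trend_coefficient(2.0, 3.0))
```

A least-squares polynomial or spline fit could replace the piecewise-linear curve without changing the rest of the sketch.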
In a preferred embodiment, the text emotion analysis model is used to analyze the emotion coefficients of a network speaking document. The model is based on a neural network algorithm and comprises an input layer, a hidden layer and a classification output layer. The input layer receives the word vectors x contained in the speaking document; the output layer outputs, through a classifier, the emotion category y corresponding to each word vector; the hidden layer extracts features from the word vectors and comprises n layers of neurons, each of which produces its output through an activation function. The hidden layer satisfies the formula [formula omitted], where $x_i$ denotes the input word vector, $w_i$ the connection weight of the i-th neuron, b the activation threshold, and f the activation function.
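The hidden-layer formula is not reproduced in this text. As a minimal sketch, assuming the common single-neuron form $f(\sum_i w_i x_i - b)$ with a sigmoid activation (the sign convention for the threshold b, the choice of sigmoid and the reading of $x_i$ as the i-th input component are all assumptions), one neuron could be computed like this:

```python
import math


def sigmoid(z):
    """Logistic activation, used here as an example of f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, b, f=sigmoid):
    """Output of one hidden neuron under the ASSUMED form f(sum(w_i * x_i) - b).

    x: input components, w: connection weights, b: activation threshold.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return f(weighted_sum - b)


# The weighted sum equals the threshold here, so the sigmoid returns 0.5.
out = neuron_output([1.0, 2.0], [0.5, 0.25], b=1.0)
```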
In a preferred embodiment, the text emotion analysis model comprises the steps of:
S11, extracting document keywords and emotion words: extracting the user's speaking documents from the database, splitting each document into words using regular expressions, and filtering out meaningless words to obtain the effective word set A of the document; obtaining the high-frequency word set B over all documents; subtracting the intersection of A and B from A to obtain the keyword set C of the document; and marking the emotion words in the document;
S12, vectorizing the speech: expressing the words in each sentence as word vectors, that is, representing each word as a vector in a high-dimensional space, with a word vector dimension between 260 and 280, to obtain n word vectors;
S13, extracting features: inputting the word vectors into the neural network model to extract features, adding weight to the emotion words and keywords obtained in step S11 during feature extraction, and extracting the features P(X, Y) from the word vectors, where P(X, Y) denotes the joint probability distribution of the sample features X and the category Y to which the sample belongs; the features are extracted using a continuous bag-of-words (CBOW) model;
S14, classifying with an emotion classifier: using a classifier to divide the features obtained in step S13 into graded emotion states, namely negative, normal and positive, quantized numerically as -1, 0 and 1 points respectively, to obtain the emotion value y of each word vector.
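The set operation in step S11, C = A - (A ∩ B), can be illustrated with a short sketch. The stop-word list, the tokenising regular expression (which only handles lowercase Latin text; Chinese text would need a word segmenter) and the rule for what counts as "high-frequency" are hypothetical choices, since the text does not specify them.

```python
import re
from collections import Counter

# Hypothetical list of "meaningless" words to filter out.
STOP_WORDS = {"the", "a", "is", "and", "of", "to"}


def effective_words(document):
    """Set A: words of one document after stop-word filtering."""
    words = re.findall(r"[a-z']+", document.lower())
    return {w for w in words if w not in STOP_WORDS}


def high_frequency_words(documents, min_share=0.5):
    """Set B: words occurring in at least min_share of all documents (assumed rule)."""
    counts = Counter()
    for doc in documents:
        counts.update(effective_words(doc))
    threshold = min_share * len(documents)
    return {w for w, c in counts.items() if c >= threshold}


def document_keywords(document, documents):
    """Set C = A - (A & B): effective words that are not corpus-wide common."""
    a = effective_words(document)
    b = high_frequency_words(documents)
    return a - (a & b)


docs = [
    "the weather is nice today",
    "today the exam is hard",
    "today is a holiday",
]
keywords = document_keywords(docs[0], docs)
```

In this toy corpus "today" appears in every document, so it lands in B and is removed from the keywords of the first document.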
The application has the technical effects and advantages that:
the application provides an analysis method of network behavior health degree, which respectively obtains influence coefficients of network behavior health degree from three dimensions of internet surfing time, network speaking and browsing information, and finally carries out comprehensive evaluation to obtain a network behavior health degree score of a user.
Drawings
Fig. 1 is a block diagram of a system architecture of the present application.
FIG. 2 is a flowchart of the user health development evaluation according to the present application.
FIG. 3 is a flow chart of a text emotion analysis model of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "module," "system," and the like as used herein are intended to include a computer-related entity, such as, but not limited to, hardware, firmware, a combination of hardware and software, or software in execution. For example, a module may be, but is not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a module. One or more modules may be located in one process and/or thread of execution, and one module may be located on one computer and/or distributed between two or more computers.
This embodiment provides a user information collection and analysis system based on big data, as shown in fig. 1, comprising a user information collection module, a user internet surfing time health degree coefficient acquisition module, a user network speaking health degree coefficient acquisition module, a user browsing information analysis module, a user network behavior health degree evaluation module and a user health degree development analysis module, wherein the user information collection module collects information on active users by installing a monitor in the users' internet surfing platform, the collected information is stored in a database in units of users, and each user unit comprises four categories: user basic information, internet surfing time information, network speaking information and browsing information;
the user internet surfing time health degree coefficient acquisition module acquires the user's internet surfing time information from the database, including the user's total online time in a day, online learning time in a day and online stay-up (late-night) time in a day, and analyzes it to obtain the user's internet surfing time health degree coefficient $\varepsilon_1$; the longer the online time, the shorter the learning time and the longer the stay-up time, the lower the coefficient $\varepsilon_1$;
the user network speaking health degree coefficient acquisition module acquires the user's network speaking information from the database, including the user's network speaking documents in a day, and analyzes it to obtain the user's network speaking health degree coefficient $\varepsilon_2$; a speaking document comprises n word vectors, the emotion coefficient y of each word vector is obtained based on a text emotion analysis model, and $\varepsilon_2$ satisfies the formula [formula omitted], where $y_i$ denotes the emotion coefficient of the i-th word vector;
the user browsing information analysis module acquires the user's browsing information from the database, including the browsing duration, browsing content and browsing emotion direction in a day, and analyzes it to obtain the user browsing health degree coefficient $\varepsilon_3$;
the user network behavior health degree evaluation module evaluates the user's network behavior health degree in a day; the daily network behavior health degree JKD satisfies the formula [formula omitted], where $w_1$, $w_2$ and $w_3$ are the weight factors of the user internet surfing time health degree, the user network speaking health degree and the user browsing health degree, respectively, in evaluating the user network behavior health degree, with a constraint on $w_1 + w_2 + w_3$ [value omitted]. The greater the network behavior health degree value, the healthier the user's network behavior.
Further, the user information collection module comprises an information screening unit and a user network behavior collection unit. The information screening unit screens active users based on screening conditions, namely the number of days per month on which the user is active online and the data volume of the user's network behavior; the user network behavior collection unit collects the user's internet surfing time information, network speaking information and browsing information.
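The screening step can be sketched as a simple filter on the two stated conditions. The record layout, the field names and the threshold values below are hypothetical; the text names the conditions but gives no concrete values.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """Hypothetical per-user monthly activity summary."""
    user_id: str
    active_days_in_month: int   # days with online activity this month
    behavior_records: int       # volume of collected network behavior data


def screen_active_users(users, min_active_days=15, min_records=100):
    """Keep users who meet both screening conditions (thresholds assumed)."""
    return [
        u.user_id
        for u in users
        if u.active_days_in_month >= min_active_days
        and u.behavior_records >= min_records
    ]


active = screen_active_users([
    UserActivity("u1", 20, 500),
    UserActivity("u2", 3, 900),
    UserActivity("u3", 25, 40),
])
```

Only the first user passes both conditions in this example; the other two each fail one of them.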
Further, the user internet surfing time health degree coefficient $\varepsilon_1$ satisfies the formula [formula omitted], where t denotes the user's total online duration, $t_a$ the duration the user spends on learning, $t_b$ the duration the user stays up late, and $k_1$, $k_2$ and $k_3$ the influence factors of the online duration, the online learning duration and the online stay-up duration, respectively.
Further, the user browsing information analysis module sets a browsing time interval and screens the L pieces of browsing information that fall within that interval on a given day, divides them into positive browsing information and negative browsing information according to the keywords and browsing emotion direction of each piece, and sums the browsing durations of all positive browsing information and of all negative browsing information separately. The user browsing health degree coefficient $\varepsilon_3$ satisfies the formula [formula omitted], where Zt denotes the total duration of positive browsing information and Ft the total duration of negative browsing information.
Further, a user browsing preference portrait construction unit extracts the keywords of each piece of browsing content and passes the browsing durations and keywords to a user browsing habit analysis subunit. The user browsing habit analysis subunit analyzes the user's browsing preferences based on the browsing durations and keyword information: it classifies all keywords with a clustering algorithm and then calculates the occurrence frequency of each keyword category, defined as the accumulated browsing duration of the keywords in that category divided by the total browsing duration, and builds the user browsing preference portrait from these frequencies.
Further, the user health degree development evaluation module comprises a user current month health degree development state evaluation unit, a user health degree trend evaluation unit and a user health degree comprehensive evaluation unit, wherein the user current month health degree development state evaluation unit is used for evaluating the ratio of unhealthy network behavior values to healthy network behavior values of a user in one month; the user health degree trend evaluation unit is used for evaluating the change trend of unhealthy network behavior values of each month of a user; the comprehensive user health evaluation unit is used for evaluating the network behavior development state of each month of the user.
Further, as shown in fig. 2, the user health development evaluation module is configured to analyze and evaluate a development state of the network behavior health of the user, and includes the following steps:
S1, acquiring the user's daily network behavior health degree to obtain the data set (di, z) of the current month's behavior health degree, where $d_i$ denotes the user's network behavior health degree on the i-th day;
S2, establishing a rectangular coordinate system in which the x-axis represents time and the y-axis represents network behavior health degree, plotting the data set of the current month's behavior health degree in it to obtain discrete points, and fitting the discrete points to a function curve;
S3, acquiring the user's current-month health degree development state coefficient: calculating the area S enclosed by the function curve and the coordinate axis, where the area $S_a$ in the first quadrant represents the healthy network behavior value and the area $S_b$ in the fourth quadrant represents the unhealthy network behavior value, to obtain the user's current-month health degree development state coefficient $\gamma_1$ [formula omitted]; $\gamma_1 > 1$ indicates that the network behavior in the current month is unhealthy, and the greater $\gamma_1$ is, the more unhealthy the network behavior in the current month;
S4, acquiring the user health degree trend coefficient: acquiring the user's unhealthy network behavior value for each month to obtain a data set B, recorded as [data set omitted], and calculating the growth rate of the unhealthy network behavior value from month to month to obtain the user health degree trend coefficient $\gamma_2$ [formula omitted], where $S_{n+1}$ denotes the unhealthy network behavior value of the current month and $S_n$ that of the previous month; $\gamma_2 > 0$ indicates that the user's health degree is trending unhealthy, and the greater $\gamma_2$ is, the more unhealthy the network behavior in the current month;
S5, comprehensively evaluating the user's health degree: based on $\gamma_1$ and $\gamma_2$ obtained in steps S3 and S4, the development of the user's health degree is evaluated comprehensively; the evaluation parameter G of the user's health development satisfies the formula [formula omitted], where $k_4$ and $k_5$ are constants.
Further, the text emotion analysis model is used to analyze the emotion coefficients of a network speaking document. The model is based on a neural network algorithm and comprises an input layer, a hidden layer and a classification output layer. The input layer receives the word vectors x contained in the speaking document; the output layer outputs, through a classifier, the emotion category y corresponding to each word vector; the hidden layer extracts features from the word vectors and comprises n layers of neurons, each of which produces its output through an activation function. The hidden layer satisfies the formula [formula omitted], where $x_i$ denotes the input word vector, $w_i$ the connection weight of the i-th neuron, b the activation threshold, and f the activation function.
Further, as shown in fig. 3, the text emotion analysis model includes the following steps:
S11, extracting document keywords and emotion words: extracting the user's speaking documents from the database, splitting each document into words using regular expressions, and filtering out meaningless words to obtain the effective word set A of the document; obtaining the high-frequency word set B over all documents; subtracting the intersection of A and B from A to obtain the keyword set C of the document; and marking the emotion words in the document;
S12, vectorizing the speech: expressing the words in each sentence as word vectors, that is, representing each word as a vector in a high-dimensional space, with a word vector dimension between 260 and 280, to obtain n word vectors;
S13, extracting features: inputting the word vectors into the neural network model to extract features, adding weight to the emotion words and keywords obtained in step S11 during feature extraction, and extracting the features P(X, Y) from the word vectors, where P(X, Y) denotes the joint probability distribution of the sample features X and the category Y to which the sample belongs; the features are extracted using a continuous bag-of-words (CBOW) model;
S14, classifying with an emotion classifier: using a classifier to divide the features obtained in step S13 into graded emotion states, namely negative, normal and positive, quantized numerically as -1, 0 and 1 points respectively, to obtain the emotion value y of each word vector.
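The numeric quantization in step S14 and its use in the network speaking health degree coefficient can be sketched as follows. The formula for $\varepsilon_2$ is not reproduced in this text, so the sketch assumes it is the mean of the per-word-vector emotion values $y_i$; the label strings and the function name are also hypothetical.

```python
# Score mapping taken from step S14: negative, normal, positive -> -1, 0, 1.
EMOTION_SCORES = {"negative": -1, "normal": 0, "positive": 1}


def speaking_health(labels):
    """Speaking health coefficient, ASSUMED to be the mean emotion value.

    labels: classifier output, one emotion label per word vector.
    """
    if not labels:
        return 0.0  # no speech recorded: treated as neutral (assumption)
    scores = [EMOTION_SCORES[label] for label in labels]
    return sum(scores) / len(scores)


eps2 = speaking_health(["positive", "positive", "negative", "normal"])
```

Under this assumed form the coefficient lies between -1 and 1, with higher values indicating more positive speech.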
To sum up: the application designs an evaluation method of the network behavior health degree, which respectively obtains the influence coefficient on the network behavior health degree from three dimensions of the internet surfing time, the network speaking and the browsing information, and finally carries out comprehensive evaluation to obtain the network behavior health degree score of the user, and the health degree development trend of the user is tracked and analyzed by utilizing the user health degree development analysis module, so that the network behavior health degree of the user is monitored, and the effects of obtaining, quantifying and monitoring the network behavior health degree of the user are realized.
Finally: the foregoing describes preferred embodiments of the application and is not intended to limit the application to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and principles of the application are intended to be included within its scope.

Claims (8)

1. A user information collection and analysis system based on big data, characterized in that: the system comprises a user information collection module, a user internet surfing time health degree coefficient acquisition module, a user network speaking health degree coefficient acquisition module, a user browsing information analysis module, a user network behavior health degree assessment module and a user health degree development analysis module, wherein the user information collection module collects information on active users by installing a monitor in the users' internet surfing platform, the collected information is stored in a database in user units, and each user unit comprises four categories: user basic information, internet surfing time information, network speaking information and browsing information;
the user internet surfing time health degree coefficient acquisition module acquires the user's internet surfing time information from the database, including the user's total online time in a day, online learning time in a day and online stay-up time in a day, and analyzes it to obtain the user's internet surfing time health coefficient $\varepsilon_1$; the longer the online time, the shorter the learning time and the longer the stay-up time, the lower the coefficient $\varepsilon_1$; the coefficient $\varepsilon_1$ satisfies a formula [formula missing from source] in which $t$ represents the user's total online time, $t_a$ represents the time the user spends on learning, $t_b$ represents the time the user stays up at night, and $k_1$, $k_2$, $k_3$ respectively represent the influence factors of the online duration, the online learning time and the online stay-up duration;
the user network speaking health degree coefficient acquisition module acquires the user's speaking information in the network from the database, including the user's network speaking documents in a day, and analyzes it to obtain the user's network speaking health coefficient $\varepsilon_2$; the speaking document comprises n word vectors, the emotion coefficient y of each word vector is obtained based on a text emotion analysis model, and the coefficient $\varepsilon_2$ satisfies a formula [formula missing from source] in which $y_i$ represents the emotion coefficient of the i-th word vector;
the user browsing information analysis module comprises a user browsing health degree coefficient acquisition unit and a user browsing preference portrait construction unit, wherein the user browsing health degree coefficient acquisition unit acquires the user's browsing information in the network from the database, including browsing duration, browsing content and browsing emotion direction in a day, and analyzes it to obtain the user browsing health coefficient $\varepsilon_3$, which satisfies a formula [formula missing from source] in which $Zt$ represents the total duration of positive browsing information and $Ft$ represents the total duration of negative browsing information;
the user network behavior health degree assessment module is used for assessing the user's network behavior health degree in a day, the network behavior health degree JKD in a day satisfying the formula $JKD = w_1\varepsilon_1 + w_2\varepsilon_2 + w_3\varepsilon_3$, wherein $w_1$, $w_2$, $w_3$ respectively represent the weight factors of the user internet surfing time health degree, the user network speaking health degree and the user browsing health degree in assessing the user's network behavior health degree, and $w_1 + w_2 + w_3 = 100\%$.
2. The big data based user information collection and analysis system of claim 1, wherein: the user information collection module comprises an information screening unit and a user network behavior collection unit, wherein the information screening unit screens active users based on screening conditions, and the screening conditions are the number of active days of the users on line each month and the data size of the user network behavior data; the user network behavior collection unit collects user internet surfing time information, user network speaking information and user browsing information.
3. The big data based user information collection and analysis system of claim 1, wherein: the user browsing health coefficient obtaining unit screens and obtains L pieces of browsing information in a time interval of a day of a user by setting a browsing time interval, divides the browsing information into positive browsing information and negative browsing information according to keywords and browsing emotion directions of each piece of browsing information, adds the browsing time durations of all the positive browsing information, and adds the browsing time durations of all the negative browsing information.
4. The big data based user information collection and analysis system of claim 1, wherein: the user browsing preference portrait construction unit is used for extracting keywords of each browsing content and transmitting the obtained browsing time and keywords to the user browsing habit analysis subunit; the user browsing habit analysis subunit is used for analyzing the preference of the user during browsing, analyzing the browsing preference of the user based on the browsing time and the keyword information, classifying all keywords through a clustering algorithm, and then calculating the occurrence frequency of each category of keywords, wherein the occurrence frequency is the accumulated browsing duration of the keywords divided by the total duration, and establishing a user browsing preference portrait according to the occurrence frequency of the keywords.
5. The big data-based user information collection and analysis system according to claim 1, wherein the user network behavior health assessment module comprises a user current month health development state assessment unit, a user health trend assessment unit and a user health comprehensive assessment unit, wherein the user current month health development state assessment unit is used for assessing the ratio of unhealthy network behavior values to healthy network behavior values of a user in one month; the user health degree trend evaluation unit is used for evaluating the change trend of unhealthy network behavior values of each month of a user; the comprehensive user health evaluation unit is used for evaluating the network behavior development state of each month of the user.
6. The big data based user information collection and analysis system of claim 5, wherein the user network behavior health assessment module is configured to analyze and assess a development status of the user's network behavior health, and comprises the steps of:
S1, acquiring the user's daily network behavior health degree to obtain the data set $(d_i, z)$ of the current month's behavior health degree, wherein $d_i$ represents the user's network behavior health degree on the i-th day;
S2, establishing a rectangular coordinate system in which the x-axis represents time and the y-axis represents network behavior health degree, plotting the data set of the current month's behavior health degree in the coordinate system to obtain discrete points, and fitting the discrete points to a function curve;
S3, acquiring the user's current-month health degree development state coefficient: the area S enclosed by the function curve and the coordinate axis is calculated, the area $S_a$ in the first quadrant representing the healthy network behavior value and the area $S_b$ in the fourth quadrant representing the unhealthy network behavior value, to obtain the user's current-month health degree development state coefficient $\gamma_1 = S_b / S_a$; $\gamma_1 > 1$ indicates that the network behavior in the current month is unhealthy;
S4, acquiring the user health degree trend coefficient: the unhealthy network behavior values of the user in each month are acquired to obtain a data set B, recorded as $B = [S_1, S_2, \ldots, S_{12}]$, and the growth rate of the unhealthy network behavior value in each month, namely the user health degree trend coefficient $\gamma_2 = (S_{n+1} - S_n) / S_n$, is calculated, wherein $S_{n+1}$ represents the unhealthy network behavior value of the current month and $S_n$ represents that of the previous month; $\gamma_2 > 0$ indicates that the user's health degree development trend is unhealthy;
S5, comprehensively evaluating the user health degree: the development of the user's health degree is comprehensively evaluated based on $\gamma_1$ and $\gamma_2$ obtained in steps S3 and S4, the evaluation parameter of the user's health development being G, which satisfies $G = k_4(\gamma_1 + \gamma_2) + k_5\gamma_1\gamma_2$, wherein $k_4$ and $k_5$ are constants.
7. The big data based user information collection and analysis system of claim 1, wherein: the text emotion analysis model is based on a neural network algorithm and comprises an input layer, a hidden layer and a classification output layer; the input layer is used for inputting the word vectors x contained in a speaking document; the output layer outputs, through a classifier, the emotion category y corresponding to each word vector; the hidden layer is used for extracting features from the word vectors and comprises n layers of neurons, each neuron producing its output through an activation function; and the hidden layer satisfies a formula [formula missing from source] in which $x_i$ represents the input word vector, $w_i$ represents the connection weight of the i-th neuron, $b$ is the activation threshold, and $f$ represents the activation function.
8. The big data based user information collection and analysis system of claim 7, wherein: the text emotion analysis model comprises the following steps:
S11, extracting document keywords and emotion words: the user's speaking documents are extracted from a database and split into words using regular expressions; meaningless words are filtered out to obtain the effective word set A of the document; the high-frequency word set B across all documents is obtained; the intersection of set A and set B is subtracted from set A to obtain the keyword set C of the document; and the emotion words in the document are marked;
S12, vectorized representation of speech: each word in each sentence is expressed as a word vector, that is, as a vector in a high-dimensional space, the dimension of the word vector being between 260 and 280, so that n word vectors are obtained;
S13, extracting features: the word vectors are input into a neural network model to extract features; during feature extraction, additional weight is given to the emotion words and keywords obtained in step S11; the features P(X, Y) of the word vectors are extracted, where P(X, Y) represents the joint probability distribution of the sample features X and the category Y to which the sample belongs; and the features are extracted using a continuous bag-of-words model;
S14, emotion classification: a classifier divides the features obtained in step S13 into gradient emotion states, namely negative, normal and positive, which are quantized numerically as -1, 0 and 1 points respectively, thereby obtaining the emotion value y of each word vector.
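The development-state analysis of claim 6 (steps S1 to S5) can be sketched in Python under several stated assumptions. The formulas for γ1 and γ2 are not legible in this text, so the sketch assumes γ1 = Sb / Sa, following the ratio of unhealthy to healthy values described in claim 5, and γ2 = (S_{n+1} − S_n) / S_n, following the "growth rate" wording of step S4. Piecewise-linear interpolation between daily points stands in for the fitted function curve, and all function names and the zero-area handling are hypothetical.

```python
def signed_areas(daily_health: list[float]) -> tuple[float, float]:
    """Approximate Sa (area above the time axis) and Sb (area below it).

    Consecutive days are joined by straight lines with unit spacing on
    the time axis; this replaces the fitted curve of step S2.
    """
    sa = sb = 0.0
    for y0, y1 in zip(daily_health, daily_health[1:]):
        if y0 >= 0 and y1 >= 0:
            sa += (y0 + y1) / 2
        elif y0 <= 0 and y1 <= 0:
            sb += -(y0 + y1) / 2
        else:
            # The segment crosses the axis: split it at the zero crossing.
            t = y0 / (y0 - y1)  # fraction of the interval before crossing
            first = abs(y0) * t / 2
            second = abs(y1) * (1 - t) / 2
            if y0 > 0:
                sa, sb = sa + first, sb + second
            else:
                sb, sa = sb + first, sa + second
    return sa, sb


def month_state_coefficient(sa: float, sb: float) -> float:
    """Assumed gamma1 = Sb / Sa; values above 1 indicate an unhealthy month."""
    if sa == 0:
        # Assumption: no healthy area at all is treated as unbounded.
        return float("inf") if sb > 0 else 0.0
    return sb / sa


def trend_coefficient(previous: float, current: float) -> float:
    """Assumed gamma2 = (S_{n+1} - S_n) / S_n, the month-over-month growth."""
    return (current - previous) / previous


def overall_assessment(g1: float, g2: float, k4: float, k5: float) -> float:
    """G = k4 * (gamma1 + gamma2) + k5 * gamma1 * gamma2 (step S5)."""
    return k4 * (g1 + g2) + k5 * g1 * g2
```

A month of daily scores can be passed to `signed_areas`, its result to `month_state_coefficient`, and the monthly unhealthy values to `trend_coefficient`; the constants k4 and k5 are left to the caller because the application does not give their values.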

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310222160.XA CN116245555B (en) 2023-03-09 2023-03-09 User information collecting and analyzing system based on big data

Publications (2)

Publication Number Publication Date
CN116245555A CN116245555A (en) 2023-06-09
CN116245555B CN116245555B (en) 2023-12-08

Family

ID=86635735

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202310222160.XA Active CN116245555B (en) 2023-03-09 2023-03-09 User information collecting and analyzing system based on big data

Country Status (1)

Country Link
CN (1) CN116245555B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710054A (en) * 2023-12-20 2024-03-15 塞奥斯(北京)网络科技有限公司 Intelligent display system for commodity in online mall

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140056637A (en) * 2012-10-30 2014-05-12 에스케이플래닛 주식회사 System and method for providing analysis information, and apparatus applied to the same
CN106780073A (en) * 2017-01-11 2017-05-31 中南大学 A kind of community network maximizing influence start node choosing method for considering user behavior and emotion
CN107291739A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 Evaluation method, system and the equipment of network user's health status
CN108536757A (en) * 2018-03-19 2018-09-14 武汉大学 One kind being based on the potentially harmful theme bootstrap technique of user's history network
CN108573411A (en) * 2018-04-17 2018-09-25 重庆理工大学 Depth sentiment analysis and multi-source based on user comment recommend the mixing of view fusion to recommend method
CN110245816A (en) * 2019-01-07 2019-09-17 西南科技大学 User job efficiency visualized evaluation method based on browser history record
CN111563190A (en) * 2020-04-07 2020-08-21 中国电子科技集团公司第二十九研究所 Multi-dimensional analysis and supervision method and system for user behaviors of regional network
KR20200127654A (en) * 2019-05-03 2020-11-11 주식회사 자이냅스 A operating method for an automatic sentiment information labeling apparatus to news articles

Also Published As

Publication number Publication date
CN116245555A (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20231116
Address after: Room 109, 1st Floor, Xinxiangshicheng Entrepreneurship and Employment Incubation Base, Building 51, No. 5 Baishan Road, Qiaoxi District, Zhangjiakou City, Hebei Province, 075000
Applicant after: Zhangjiakou Qiaogong Technology Service Co.,Ltd.
Address before: 274000 Store 1005, Building 3, Shidai Aocheng, Zhonghua Road, Xicheng Street, Mudan District, Heze City, Shandong Province
Applicant before: Qingrui Network Technology (Shandong) Co.,Ltd.
GR01 Patent grant