CN115983873B - User data analysis management system and method based on big data - Google Patents

Publication number
CN115983873B
CN115983873B CN202211711362.2A CN202211711362A CN115983873B CN 115983873 B CN115983873 B CN 115983873B CN 202211711362 A CN202211711362 A CN 202211711362A CN 115983873 B CN115983873 B CN 115983873B
Authority
CN
China
Legal status
Active
Application number
CN202211711362.2A
Other languages
Chinese (zh)
Other versions
CN115983873A (en
Inventor
黄诚龙
张德明
Current Assignee
Zhuhai Landu Technology Co ltd
Original Assignee
Zhuhai Landu Technology Co ltd
Application filed by Zhuhai Landu Technology Co ltd
Priority to CN202211711362.2A
Publication of CN115983873A
Application granted
Publication of CN115983873B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data analysis, and in particular to a user data analysis management system and method based on big data. The system comprises a data acquisition module, a database, a data analysis module, an intelligent screening module and a data use module. The data acquisition module collects comment data of all stores in a network; the database stores all collected data; the data analysis module analyzes the authenticity of user comments and judges whether a store engages in false marketing; the intelligent screening module screens out stores that engage in false marketing; and the data use module identifies user needs and performs intelligent pushing. By screening out all stores that engage in false marketing and pushing only the screened data, the invention greatly reduces the deception of consumers by such stores and improves the user experience.

Description

User data analysis management system and method based on big data
Technical Field
The invention relates to the technical field of data analysis, in particular to a user data analysis management system and method based on big data.
Background
False publicity refers to conduct in which operators use advertisements or other means to fabricate facts or conceal the truth, causing consumers and users to misunderstand goods or services, in order to trade with them, win market share and obtain benefits. Such conduct violates the principle of good faith and accepted business ethics, and is a serious form of unfair competition.
With the continuous development of network data, business models that combine the Internet with physical stores have emerged, allowing many physical stores to increase revenue through online advertising. However, many merchants engage in false marketing to mislead consumers. When people who rarely get the chance to go out on holiday choose a store with false publicity, their mood is badly affected and the experience is very poor.
Therefore, a user data analysis management system and method based on big data is needed to solve these problems: stores are screened by analyzing the authenticity of user comments, and suitable stores are intelligently pushed to users according to their needs, greatly improving the user experience.
Disclosure of Invention
The invention aims to provide a user data analysis management system and method based on big data, so as to solve the problems in the background technology.
To solve the above technical problems, the invention provides the following technical solution: a user data analysis management system based on big data, the system comprising a data acquisition module, a database, a data analysis module, an intelligent screening module and a data use module;
the output end of the data acquisition module is connected to the input end of the database, the output end of the database is connected to the input end of the data analysis module, the output end of the data analysis module is connected to the input end of the intelligent screening module, and the output end of the intelligent screening module is connected to the input end of the data use module;
the data acquisition module collects comment data of all stores in a network;
the database stores all collected data;
the data analysis module analyzes the authenticity of user comments and judges whether a store engages in false marketing;
the intelligent screening module screens out stores that engage in false marketing;
and the data use module identifies user needs and performs intelligent pushing.
Further, the data acquisition module comprises an information acquisition unit and a picture acquisition unit;
the information acquisition unit is used for collecting the text data of store comments; the picture acquisition unit is used for collecting the image information of store comments.
Further, the database is used for storing the name information of all stores in the network and the comment information of those stores;
Further, the data analysis module comprises a text analysis unit, a picture analysis unit and an authenticity judgment unit;
the text analysis unit is used for analyzing the authenticity of the text in users' store comments; the picture analysis unit is used for analyzing the authenticity of the image data in the comments; the authenticity judgment unit is used for judging, according to the analysis results, whether the store engages in false marketing;
the text analysis unit comprises a content comparison subunit and a similarity analysis subunit;
the content comparison subunit is used for comparing the numbers of positive and negative comments among the user comments: if the two numbers are close, the authenticity of the store's comments is low; if they differ substantially, the similarity analysis subunit is entered; the similarity analysis subunit is used for analyzing the relevance between comment contents: if the relevance is high, the authenticity of the store's comments is low; otherwise, the picture analysis unit is entered;
the picture analysis unit comprises a parameter analysis subunit and an image analysis subunit;
the parameter analysis subunit is used for analyzing the consistency of image parameters, where the image parameters include: the photographing device (body, lens, flash, etc.), the photographing parameters (shutter speed, aperture F-number, ISO speed) and the image processing settings (sharpening, contrast, saturation, white balance, etc.); the image analysis subunit is used for analyzing the consistency of the angles from which the store is photographed in the images: if the shooting angles are consistent, the corresponding comments are treated as false comments, and when their number exceeds a threshold, the images in the comments are judged not to be authentic.
Further, the intelligent screening module is used for screening out, from all stores, those that engage in false marketing.
Further, the data use module comprises an information identification unit, a keyword analysis unit, an intelligent pushing unit and a user feedback unit;
the information identification unit is used for recognizing the user's voice data; the keyword analysis unit is used for analyzing the user's needs according to keywords in the voice data and matching related stores; the intelligent pushing unit is used for sending stores with a high matching degree to the mobile terminal for the user to choose from; the user feedback unit is used for receiving user satisfaction information and sending it to the intelligent pushing unit, which facilitates optimization of the system and improves the user experience.
A user data analysis management method based on big data comprises the following steps:
S1: collect comment data of all stores in a network;
S2: store all collected data;
S3: analyze the authenticity of user comments and judge whether a store engages in false marketing;
S4: screen the stores that engage in false marketing;
S5: identify user needs and perform intelligent pushing.
Further, in step S1, collecting comment data of all stores in the network comprises:
1. collecting all store names to form a store name set D;
2. collecting the text data of the comments of any store d, d ∈ D, to form a text set A;
3. collecting the image information of the comments of any store d to form an image data set B.
Further, in step S2, the name information of all stores in the network and the comment information of those stores are stored;
Further, in step S3, the authenticity of the user comments is analyzed and whether the store engages in false marketing is judged, with the following specific steps:
S301: analyze the authenticity of the text in the user comments of any store d, as follows:
S301_1: acquire all text data in the comments to form a text set A = {a_i}, i = 1, 2, …, n, where a_i represents the text content of one comment and n is relatively large;
S301_2: compare positive and negative comments: identify x positive comments and y negative comments in the text set, where x + y = n; if (x − y) > β, where β is a set threshold, the credibility of the store is not low and the comment content is analyzed; otherwise, if (x − y) < β, the credibility of the store is low and the method proceeds to step S4 to screen the store;
S301_3: compare text similarity and classify: extract and traverse the positive comments in the text set, A_x = {a_i}, i = 1, 2, …, x; compute the similarity γ according to the formula (not reproduced in the source text); if γ > ε, the text content of the comments is similar, and the comments are grouped into one class to form a same-class set; the number of comments in the class is recorded using a statistical algorithm; the statistical algorithm is a conventional technique for those skilled in the art and is not described in detail;
S301_4: analyze the number of comments and judge the authenticity of the text: if the number of same-class comments exceeds μ, where μ is a set threshold, the number of highly similar comments is large, indicating that the text content of that class is not authentic, and step S303 is entered; otherwise, the number of highly similar comments is small, and step S302 is entered to further analyze the authenticity of the user comments;
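A minimal sketch of the text screening in steps S301_2 to S301_4, under stated assumptions. The similarity formula for γ is not reproduced in the text, so Python's `difflib.SequenceMatcher` stands in for it here; the function names and the threshold values in the usage lines are hypothetical.

```python
from difflib import SequenceMatcher


def passes_rating_check(good: int, bad: int, beta: int) -> bool:
    """S301_2: credibility is 'not low' when (x - y) > beta."""
    return (good - bad) > beta


def largest_similar_group(texts, epsilon):
    """S301_3: greedily group comments whose similarity to a group's first
    comment exceeds epsilon, then return the size of the largest group.
    SequenceMatcher.ratio() is an assumed stand-in for the patent's gamma."""
    groups = []  # each group is a list of texts; groups[k][0] is its representative
    for text in texts:
        for group in groups:
            if SequenceMatcher(None, group[0], text).ratio() > epsilon:
                group.append(text)
                break
        else:
            groups.append([text])
    return max((len(group) for group in groups), default=0)


def text_is_suspect(positive_texts, epsilon, mu):
    """S301_4: text is treated as not authentic when a same-class set has more than mu comments."""
    return largest_similar_group(positive_texts, epsilon) > mu


comments = [
    "great food and service",
    "great food and service!",
    "great food and service!!",
    "the rooms were quiet",
]
print(passes_rating_check(450, 50, beta=100))
print(text_is_suspect(comments, epsilon=0.8, mu=2))
```

Greedy grouping against a single representative keeps the sketch short; a production system would more likely use a proper clustering step.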
S302: analyze the authenticity of the image data in the comments of any store d, as follows:
S302_1: extract the store name information d = {s_c}, c = 1, 2, …, σ, where s_c represents a word in the name; the store-related images are screened by name keyword to remove irrelevant pictures:
traverse the store name information d and obtain a keyword X according to the formula (not reproduced in the source text);
acquire all image data sets B in the comments and use the GBDT algorithm to screen out the image data related to the keyword X, forming a screened image data set in which each image b_j corresponds to a piece of text data a_i; the GBDT algorithm is a conventional technique for those skilled in the art and is not described in detail;
where p(s_c) represents the number of occurrences of the word s_c in the name information d, |D| represents the number of stores in the database, and |s_c ∈ d| represents the number of stores in the database whose names contain the word s_c;
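The keyword formula itself is not reproduced in the text, but the three quantities defined for it (word count in the name, number of stores, number of stores containing the word) are those of a TF-IDF weight. The sketch below therefore assumes a TF-IDF style score, which is an interpretation rather than the patent's confirmed formula; `name_keyword` and the sample store names are hypothetical.

```python
import math


def name_keyword(name_words, all_store_names):
    """Return the word of a store name with the highest assumed TF-IDF score.

    name_words      -- list of words s_c in the name d
    all_store_names -- list of word lists, one per store in the database (the set D)
    """
    total_stores = len(all_store_names)
    best_word, best_score = None, float("-inf")
    for word in dict.fromkeys(name_words):  # unique words, original order
        occurrences = name_words.count(word)                      # p(s_c)
        containing = sum(1 for n in all_store_names if word in n)  # |s_c in d|
        # Guard against a word missing from the database (assumption).
        score = occurrences * math.log(total_stores / max(containing, 1))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


stores = [["sunrise", "hotel"], ["harbor", "hotel"], ["lotus", "hotel"]]
print(name_keyword(["sunrise", "hotel"], stores))  # "hotel" appears in every name, so it scores 0
```

A word shared by every store name gets a score of zero, which is why the distinctive word is chosen as the keyword used to filter the comment images.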
S302_2: analyze the consistency of image parameters: traverse the screened image data set and obtain the parameter data of any image b_j as K_j = {k_z}, z = 1, 2, …, τ; obtain the parameter comparison result Λ between different images according to the comparison formula (not reproduced in the source text);
when Λ < φ, where φ is the parameter comparison threshold, the parameter data of the images are consistent; the consistent images are screened out and their number ω is recorded; when ω > ζ, where ζ is the image number threshold, the number of images with consistent parameters in the comments is large and the images are not authentic, and step S303 is entered; otherwise, if ω < ζ, step S302_3 is entered to further analyze the authenticity of the user comments;
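A rough sketch of the parameter consistency check in S302_2. Because the comparison formula for Λ is not reproduced, the fraction of mismatched metadata fields is used here as an assumed distance; the field names (`body`, `iso`, `f_number`) and thresholds are illustrative only.

```python
def param_distance(a: dict, b: dict) -> float:
    """Assumed stand-in for Lambda: share of metadata fields that differ."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    mismatches = sum(1 for k in keys if a.get(k) != b.get(k))
    return mismatches / len(keys)


def consistent_image_count(params, phi):
    """Return omega: the largest number of images whose parameters are
    within phi of one reference image (the reference counts itself)."""
    best = 0
    for ref in params:
        count = sum(1 for other in params if param_distance(ref, other) < phi)
        best = max(best, count)
    return best


studio = {"body": "X1", "iso": 100, "f_number": 2.8}
phone = {"body": "P9", "iso": 400, "f_number": 1.8}
images = [studio, dict(studio), dict(studio), phone]

omega = consistent_image_count(images, phi=0.2)
zeta = 2  # hypothetical image number threshold
print(omega, omega > zeta)
```

Many comment images sharing one camera body and identical exposure settings is the signal this step looks for; genuine customers would normally produce varied metadata.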
S302_3: analyze the consistency of the angles from which the store is photographed in the images, including shooting height, shooting direction and shooting distance, as follows:
1) Traverse the screened image data set and resize every image b_j to the same size using a picture processing algorithm; the picture processing algorithm is a conventional technique for those skilled in the art and is not described in detail;
2) Compare image similarity: obtain the similarity ξ between images according to the similarity formula (not reproduced in the source text); when ξ exceeds the similarity threshold, the images share a high proportion of the same elements and are grouped into one class, forming q different classes of image data;
3) Extract one class of image data and overlap the images using a coincidence algorithm: extract the texture features of the images using the LBP feature extraction method, fuse the image textures that share the same pixel points pixel by pixel in sequence, and take the region that produces the most new pixel points as the distinguishing mark; the LBP feature extraction method is a conventional technique for those skilled in the art and is not described in detail;
4) Construct a three-dimensional model of a scene of the store from the images in the image data, and determine the approximate shooting direction of each image with the distinguishing mark as the reference object, forming a direction vector set P;
5) Analyze the included angles and judge the consistency of shooting angles: traverse the direction vector set P and compute cos θ according to the formula (not reproduced in the source text); when cos θ > Γ, the shooting angles are considered similar and the number u of similar images is recorded; when u > ζ, the number of images with similar shooting angles is large and the images are not authentic; otherwise, when u < ζ, return to step 3) to extract another class of image data until the traversal is finished;
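The included-angle test in step 5) can be sketched with the standard cosine of the angle between two vectors, cos θ = (p · q) / (|p| |q|). The patent's own formula is not reproduced, so using this standard definition is an assumption; the vectors and thresholds below are made up for illustration, and the vectors are assumed to be non-zero.

```python
import math


def cos_angle(p, q):
    """Cosine of the angle between two non-zero direction vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def similar_angle_count(vectors, gamma_threshold):
    """Return u: the largest number of images whose shooting direction is
    within the cosine threshold of one reference direction."""
    best = 0
    for ref in vectors:
        count = sum(1 for v in vectors if cos_angle(ref, v) > gamma_threshold)
        best = max(best, count)
    return best


directions = [(1, 0, 0), (0.99, 0.05, 0), (0.98, 0, 0.1), (0, 1, 0)]
u = similar_angle_count(directions, gamma_threshold=0.95)
zeta = 2  # hypothetical image number threshold
print(u, u > zeta)
```

Three of the four sample directions fall within the threshold of each other, so under these hypothetical values the class would be flagged as shot from a consistent angle.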
S303: judge whether the store engages in false marketing according to the analysis results:
when the analysis finds that the text content is not authentic or the images are not authentic, the store is judged to engage in false marketing.
Further, in step S4, all stores that engage in false marketing are screened out.
Further, in step S5, user needs are identified and intelligent pushing is performed; only the screened data is pushed, which greatly reduces the deception of consumers by stores that engage in false marketing and improves the user experience; the specific steps are as follows:
S501: recognize the user's voice data and extract the user's voice keywords using a keyword recognition technique;
S502: match related stores according to the voice keyword information;
S503: send stores with a high matching degree to the mobile terminal for the user to choose from;
S504: after the user confirms a store, receive the user's satisfaction information and feed the data back to step S503.
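Steps S502 and S503 can be illustrated with a simple keyword-overlap ranking. This is only a sketch: speech recognition (S501) is assumed to have already produced a keyword list, the matching rule is not specified in the text, and `push_stores`, the tag lists and `top_k` are hypothetical.

```python
def push_stores(keywords, store_tags, flagged, top_k=3):
    """Rank stores by the number of user keywords they match, skipping
    stores already flagged for false marketing in step S4."""
    wanted = set(keywords)
    scored = []
    for name, tags in store_tags.items():
        if name in flagged:
            continue  # screened-out stores are never pushed
        overlap = len(wanted & set(tags))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


store_tags = {
    "A": ["hotel", "sea"],
    "B": ["hotel"],
    "C": ["hotel", "sea", "pool"],
}
print(push_stores(["hotel", "sea"], store_tags, flagged={"C"}))
```

Store C matches best on keywords but is excluded because it was flagged, which is the intended effect of pushing only screened data.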
Compared with the prior art, the invention has the following beneficial effects:
By comparing the numbers of positive and negative comments, the invention performs a preliminary screening of the authenticity of the comment text, which facilitates the subsequent analysis of the relevance of the text content and improves the accuracy of data analysis. Comments are classified according to the content of the positive comments, and the number of similar comments is used to analyze whether false marketing exists, which greatly improves the accuracy with which the system analyzes comment content. Keywords are extracted from the store information, and the GBDT algorithm is used to screen out image data related to the keywords and remove irrelevant images, reducing errors in system analysis. The consistency of image parameters is compared using a formula, and the authenticity of the comment images is preliminarily analyzed according to the number of consistent images, which facilitates the subsequent analysis of shooting angles. Images are compared for similarity and images containing the same elements are grouped together, which facilitates the subsequent overlap analysis. The images are overlapped using a coincidence algorithm, and the region that produces the most new pixel points is taken as the distinguishing mark, which facilitates establishing the subsequent three-dimensional reference object. By constructing a three-dimensional space, using the reference object to determine the approximate direction from which each image was shot, and analyzing the included angles between those directions, the accuracy of image analysis is greatly improved and the judgment of image authenticity is facilitated. By screening out all stores that engage in false marketing and pushing only the screened data, the deception of consumers by such stores is greatly reduced and the user experience is improved.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a block diagram of a big data based user data analysis management system of the present invention;
fig. 2 is a flowchart of a user data analysis management method based on big data according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-2, the present invention provides the following technical solution: a user data analysis management system based on big data, the system comprising a data acquisition module, a database, a data analysis module, an intelligent screening module and a data use module;
the output end of the data acquisition module is connected to the input end of the database, the output end of the database is connected to the input end of the data analysis module, the output end of the data analysis module is connected to the input end of the intelligent screening module, and the output end of the intelligent screening module is connected to the input end of the data use module;
the data acquisition module collects comment data of all stores in a network;
the data acquisition module comprises an information acquisition unit and a picture acquisition unit;
the information acquisition unit is used for collecting the text data of store comments; the picture acquisition unit is used for collecting the image information of store comments.
The database stores all collected data;
the database is used for storing the name information of all stores in the network and the comment information of those stores;
the data analysis module analyzes the authenticity of user comments and judges whether a store engages in false marketing;
the data analysis module comprises a text analysis unit, a picture analysis unit and an authenticity judgment unit;
the text analysis unit is used for analyzing the authenticity of the text in users' store comments; the picture analysis unit is used for analyzing the authenticity of the image data in the comments; the authenticity judgment unit is used for judging, according to the analysis results, whether the store engages in false marketing;
the text analysis unit comprises a content comparison subunit and a similarity analysis subunit;
the content comparison subunit is used for comparing the numbers of positive and negative comments among the user comments: if the two numbers are close, the authenticity of the store's comments is low; if they differ substantially, the similarity analysis subunit is entered; the similarity analysis subunit is used for analyzing the relevance between comment contents: if the relevance is high, the authenticity of the store's comments is low; otherwise, the picture analysis unit is entered;
the picture analysis unit comprises a parameter analysis subunit and an image analysis subunit;
the parameter analysis subunit is used for analyzing the consistency of image parameters, where the image parameters include: the photographing device (body, lens, flash, etc.), the photographing parameters (shutter speed, aperture F-number, ISO speed) and the image processing settings (sharpening, contrast, saturation, white balance, etc.); the image analysis subunit is used for analyzing the consistency of the angles from which the store is photographed in the images: if the shooting angles are consistent, the corresponding comments are treated as false comments, and when their number exceeds a threshold, the images in the comments are judged not to be authentic.
The intelligent screening module screens out stores that engage in false marketing;
the intelligent screening module is used for screening out, from all stores, those that engage in false marketing.
The data use module identifies user needs and performs intelligent pushing.
The data use module comprises an information identification unit, a keyword analysis unit, an intelligent pushing unit and a user feedback unit;
the information identification unit is used for recognizing the user's voice data; the keyword analysis unit is used for analyzing the user's needs according to keywords in the voice data and matching related stores; the intelligent pushing unit is used for sending stores with a high matching degree to the mobile terminal for the user to choose from; the user feedback unit is used for receiving user satisfaction information and sending it to the intelligent pushing unit, which facilitates optimization of the system and improves the user experience.
A user data analysis management method based on big data comprises the following steps:
S1: collect comment data of all stores in a network;
in step S1, collecting comment data of all stores in the network comprises:
1. collecting all store names to form a store name set D;
2. collecting the text data of the comments of any store d, d ∈ D, to form a text set A;
3. collecting the image information of the comments of any store d to form an image data set B.
S2: store all collected data;
in step S2, the name information of all stores in the network and the comment information of those stores are stored;
S3: analyze the authenticity of user comments and judge whether a store engages in false marketing;
in step S3, the authenticity of the user comments is analyzed and whether the store engages in false marketing is judged, with the following specific steps:
S301: analyze the authenticity of the text in the user comments of any store d, as follows:
S301_1: acquire all text data in the comments to form a text set A = {a_i}, i = 1, 2, …, n, where a_i represents the text content of one comment and n is relatively large;
S301_2: compare positive and negative comments, which makes the data analysis more rigorous: identify x positive comments and y negative comments in the text set, where x + y = n; if (x − y) > β, where β is a set threshold, the credibility of the store is not low and the comment content is analyzed; otherwise, if (x − y) < β, the credibility of the store is low and the method proceeds to step S4 to screen the store;
S301_3: compare text similarity and classify: extract and traverse the positive comments in the text set, A_x = {a_i}, i = 1, 2, …, x; compute the similarity γ according to the formula (not reproduced in the source text); if γ > ε, the text content of the comments is similar, and the comments are grouped into one class to form a same-class set; the number of comments in the class is recorded using a statistical algorithm;
S301_4: analyze the number of comments and judge the authenticity of the text, which greatly improves the accuracy with which the system analyzes comment content: if the number of same-class comments exceeds μ, where μ is a set threshold, the number of highly similar comments is large, indicating that the text content of that class is not authentic, and step S303 is entered; otherwise, the number of highly similar comments is small, and step S302 is entered to further analyze the authenticity of the user comments;
S302: analyze the authenticity of the image data in the comments of any store d, as follows:
S302_1: extract the store name information d = {s_c}, c = 1, 2, …, σ, where s_c represents a word in the name; the store-related images are screened by name keyword to remove irrelevant pictures and reduce errors in system analysis:
traverse the store name information d and obtain a keyword X according to the formula (not reproduced in the source text);
acquire all image data sets B in the comments and use the GBDT algorithm to screen out the image data related to the keyword X, forming a screened image data set in which each image b_j corresponds to a piece of text data a_i; the GBDT algorithm is a conventional technique for those skilled in the art and is not described in detail;
where p(s_c) represents the number of occurrences of the word s_c in the name information d, |D| represents the number of stores in the database, and |s_c ∈ d| represents the number of stores in the database whose names contain the word s_c;
S302_2: analyze the consistency of image parameters: traverse the screened image data set and obtain the parameter data of any image b_j as K_j = {k_z}, z = 1, 2, …, τ; obtain the parameter comparison result Λ between different images according to the comparison formula (not reproduced in the source text);
when Λ < φ, where φ is the parameter comparison threshold, the parameter data of the images are consistent; the consistent images are screened out and their number ω is recorded; when ω > ζ, where ζ is the image number threshold, the number of images with consistent parameters in the comments is large and the images are not authentic, and step S303 is entered; otherwise, if ω < ζ, step S302_3 is entered to further analyze the authenticity of the user comments;
S302_3: analyze the consistency of the angles from which the store is photographed in the images, including shooting height, shooting direction and shooting distance, as follows:
1) Traverse the screened image data set and resize every image b_j to the same size using a picture processing algorithm; the picture processing algorithm is a conventional technique for those skilled in the art and is not described in detail;
2) Compare image similarity: obtain the similarity ξ between images according to the similarity formula (not reproduced in the source text); when ξ exceeds the similarity threshold, the images share a high proportion of the same elements and are grouped into one class, forming q different classes of image data;
3) Extract one class of image data and overlap the images using a coincidence algorithm: extract the texture features of the images using the LBP feature extraction method, fuse the image textures that share the same pixel points pixel by pixel in sequence, and take the region that produces the most new pixel points as the distinguishing mark; the LBP feature extraction method is a conventional technique for those skilled in the art and is not described in detail;
4) Construct a three-dimensional model of a scene of the store from the images in the image data, and determine the approximate shooting direction of each image with the distinguishing mark as the reference object, forming a direction vector set P;
5) Analyze the included angles and judge the consistency of shooting angles: traverse the direction vector set P and compute cos θ according to the formula (not reproduced in the source text); when cos θ > Γ, the shooting angles are considered similar and the number u of similar images is recorded; when u > ζ, the number of images with similar shooting angles is large and the images are not authentic; otherwise, when u < ζ, return to step 3) to extract another class of image data until the traversal is finished; this greatly improves the accuracy of image analysis and facilitates the judgment of image authenticity;
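Step 3) relies on LBP (local binary pattern) texture features, which the text treats as conventional. For readers unfamiliar with it, the following sketch computes the basic 3x3 LBP code for each interior pixel of a grayscale image held as a nested list. It shows only the feature extraction; the pixel fusion and distinguishing-mark selection described in the step are not covered, and the neighbour ordering used here is one common convention rather than anything the patent specifies.

```python
# Neighbour offsets (row, col), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(gray):
    """Return the 8-bit LBP code of every interior pixel of a 2D grayscale image.

    A neighbour contributes a 1 bit when its intensity is greater than or
    equal to the centre pixel's intensity.
    """
    height, width = len(gray), len(gray[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            centre = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if gray[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
ramp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_codes(flat))  # [[255]]
print(lbp_codes(ramp))  # [[120]]
```

In the ramp example only the right, bottom-right, bottom and bottom-left neighbours are at least as bright as the centre, which sets bits 3 to 6 and gives 8 + 16 + 32 + 64 = 120.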
S303: judge whether the store engages in false marketing according to the analysis results:
when the analysis finds that the text content is not authentic or the images are not authentic, the store is judged to engage in false marketing.
S4: screen the stores that engage in false marketing;
in step S4, all stores that engage in false marketing are screened out.
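The decision cascade from S301_2 through S4 can be summarized in one function. This is a sketch of the control flow as described, not a definitive implementation: the four counts are assumed to have been computed by earlier steps, the parameter names are hypothetical, and the text does not say what happens when a count exactly equals its threshold, so equality is treated here as not flagged.

```python
def store_is_flagged(good, bad, same_class_comments, consistent_params,
                     similar_angles, *, beta, mu, zeta):
    """Return True when the store would be screened out for false marketing."""
    if good - bad < beta:           # S301_2: low credibility, go straight to S4
        return True
    if same_class_comments > mu:    # S301_4: too many near-identical comments
        return True
    if consistent_params > zeta:    # S302_2: too many images with the same parameters
        return True
    if similar_angles > zeta:       # S302_3: too many images from the same angle
        return True
    return False                    # S303: neither text nor images judged inauthentic


def screen_stores(stats, **thresholds):
    """S4: return the names of flagged stores, sorted for a stable result."""
    return sorted(name for name, s in stats.items()
                  if store_is_flagged(*s, **thresholds))


stats = {
    "hotel_a": (450, 50, 3, 2, 1),
    "hotel_b": (60, 50, 0, 0, 0),
    "hotel_c": (300, 20, 40, 0, 0),
}
print(screen_stores(stats, beta=100, mu=10, zeta=5))
```

With these made-up numbers, hotel_b fails the rating check and hotel_c has too many near-identical comments, so both are flagged while hotel_a passes every stage.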
S5: and identifying the user demand and performing intelligent pushing.
In step S5: identifying user demands, performing intelligent pushing, and performing data pushing on screened data, so that phenomena of false marketing deception consumers of shops are reduced to a great extent, and user experience and use sense are improved; the method comprises the following specific steps:
s501: recognizing voice data of a user, and extracting voice keywords of the user by utilizing a keyword recognition technology;
s502: matching relevant stores according to the voice keyword information;
s503: transmitting stores with high matching degree to the mobile terminal for selection by a user;
s504: after confirming the store, the user use satisfaction information is received, and the data is returned to step S503.
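Steps S502 and S503 can be sketched as a simple keyword-overlap ranking. The store records, their tags, and the scoring rule (number of shared keywords) are all assumptions made for illustration; the text does not define how the matching degree is computed.

```python
def match_stores(keywords, stores, top_k=3):
    """Rank stores by how many user keywords appear among their tags.

    `stores` maps a store name to a list of descriptive tags (hypothetical
    data layout). Stores with no shared keyword are dropped.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for name, tags in stores.items():
        score = len(wanted & {t.lower() for t in tags})
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]

# Hypothetical stores that passed the false-marketing screening in S4.
screened_stores = {
    "Hotel A": ["hotel", "budget"],
    "Cafe B": ["coffee"],
    "Hotel C": ["hotel", "budget", "cost performance"],
}
candidates = match_stores(["hotel", "cost performance", "budget"], screened_stores)
```

The satisfaction feedback of S504 could then be used to re-weight the scores before the next push, although the text leaves that mechanism open.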
Embodiment one:
In step S1: comment data of all stores on the network is collected, including:
1. collecting all store names to form a store name set D;
2. collecting the text data of the comments of any store d, d ∈ D, to form a text set A;
3. collecting the image information of the comments of any store d to form an image data set B.
In step S2: the name information of all stores on the network and the stores' comment information are stored;
In step S3: the authenticity of the user comments is analyzed and whether the store engages in false marketing is judged; the specific steps are as follows:
S301: analyzing the authenticity of the text in the user comments of any store d, taking "xxx Hotel" as an example; the steps are as follows:
S301_1: acquiring all text data in the comments to form a text set A = {a_i}, i = 1, 2, …, 500, i.e. 500 comments in total;
S301_2: comparing positive and negative reviews: the text set is identified as containing 450 positive reviews and 50 negative reviews; if the difference between the two exceeds the set threshold, the store's credibility is not low and the comment content is analyzed;
S301_3: comparing text similarity and classifying: extracting and traversing the positive-review content A_x = {a_i}, i = 1, 2, …, 450, in the text set and computing the text similarity γ according to the similarity formula; if γ > 0.8, the text contents of the comments are similar and they are grouped into one category to form a same-category set, and the number of comments in each same-category set is recorded;
S301_4: analyzing the number of comments and judging the authenticity of the text: traversing the recorded counts shows that the number of highly similar comments is small, so step S302 is entered to further analyze the authenticity of the user comments;
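Steps S301_3 and S301_4 can be sketched as follows. The similarity formula itself is not reproduced in this text, so the sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in for γ; the greedy grouping rule and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher

def group_similar(comments, epsilon=0.8):
    """Greedily group comments: a comment joins the first group whose
    first member it resembles with a ratio above epsilon."""
    groups = []
    for text in comments:
        for group in groups:
            if SequenceMatcher(None, group[0], text).ratio() > epsilon:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups

def suspicious_groups(comments, epsilon, mu):
    """Groups whose size exceeds the number threshold mu (S301_4)."""
    return [g for g in group_similar(comments, epsilon) if len(g) > mu]

reviews = [
    "great hotel, clean room and nice staff",
    "great hotel, clean rooms and nice staff",
    "breakfast was cold",
    "great hotel, clean room and nice staff!",
]
group_sizes = [len(g) for g in group_similar(reviews)]
```

A production system would more likely compare comments on token or embedding level; the character-level ratio is used here only because it needs no external dependencies.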
S302: analyzing the authenticity of the image data in the comments of store d = "xxx Hotel", as follows:
S302_1: extracting the store name information d = {s_c} = "xxx Hotel", c = 1, 2, …, σ; screening store-related images according to the name keyword and removing unrelated pictures:
traversing the store name information d and applying the keyword formula yields the keyword X = "hotel";
acquiring the full image data set B from the comments and using the GBDT algorithm to screen out the image data related to keyword X, forming a screened image data set in which each image b_j has a corresponding piece of text data a_i;
where p(s_c) denotes the number of times the word s_c occurs in the name information d, |D| denotes the number of stores in the database, and |s_c ∈ d| denotes the number of stores in the database whose names contain the word s_c;
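The three quantities defined above (a word's occurrence count, the total number of stores, and the number of stores containing the word) are the usual ingredients of a TF-IDF-style weight. The keyword formula is not reproduced in this text, so the sketch below uses the textbook TF-IDF form with add-one smoothing purely as an assumption; it may rank words differently from the patent's own formula.

```python
import math

def tfidf_scores(name_words, all_store_names):
    """Textbook TF-IDF per word of one store name (assumed form).

    name_words      : list of words in the store name d
    all_store_names : list of word lists, one per store in the database
    """
    total_stores = len(all_store_names)
    scores = {}
    for word in set(name_words):
        tf = name_words.count(word) / len(name_words)
        containing = sum(1 for name in all_store_names if word in name)
        scores[word] = tf * math.log(total_stores / (1 + containing))
    return scores

# Hypothetical database of tokenized store names.
database = [
    ["sunrise", "hotel"],
    ["harbor", "hotel"],
    ["lakeside", "hotel"],
    ["corner", "cafe"],
]
scores = tfidf_scores(["sunrise", "hotel"], database)
```

Note that plain TF-IDF favors the distinctive word ("sunrise") over the category word ("hotel"), whereas the embodiment obtains X = "hotel"; the patent's formula therefore presumably weights the terms differently.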
S302_2: analyzing the consistency of image parameters: traversing the screened image data set and acquiring, for any image b_j, its parameter data K_j = {k_z}, z = 1, 2, …, τ; applying the comparison formula yields the parameter comparison result Λ between different images;
when Λ < 0.9, the parameter data of the images are consistent; the consistent images are then screened out and their counts ω = {80, 78, 69, …} are recorded; because every ω < 100, step S302_3 is entered to further analyze the authenticity of the user comments;
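The parameter check in S302_2 can be sketched as below. Because the comparison formula is not reproduced in this text, the sketch assumes that Λ behaves like a distance (smaller means more consistent) and uses the mean relative difference of numeric parameters as a stand-in; the parameter layout (width, height, focal length, ISO) is also hypothetical.

```python
def param_difference(k_a, k_b):
    """Mean relative absolute difference of two equal-length parameter
    lists; 0.0 means identical parameters (assumed stand-in for Λ)."""
    diffs = []
    for a, b in zip(k_a, k_b):
        scale = max(abs(a), abs(b))
        diffs.append(abs(a - b) / scale if scale else 0.0)
    return sum(diffs) / len(diffs)

def count_consistent(params, phi):
    """Assumed definition of ω: the largest number of other images whose
    difference to one given image is below the threshold phi."""
    best = 0
    for i, k_i in enumerate(params):
        omega = sum(1 for j, k_j in enumerate(params)
                    if j != i and param_difference(k_i, k_j) < phi)
        best = max(best, omega)
    return best

# Hypothetical parameters: width, height, focal length (mm), ISO.
image_params = [
    [4000, 3000, 26, 100],
    [4000, 3000, 26, 100],
    [4000, 3000, 26, 200],
    [640, 480, 50, 800],
]
omega = count_consistent(image_params, phi=0.2)
```

The threshold 0.2 is chosen only so that the sample data separates; the embodiment's value of 0.9 applies to the patent's own formula, not to this stand-in.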
S302_3: analyzing the consistency of the shooting angles of the store in the images, including shooting height, shooting direction and shooting distance; the steps are as follows:
traversing the screened image data set and resizing every image b_j to the same size with a picture-processing algorithm;
1) Comparing image similarity: the similarity ξ between images is obtained according to the similarity formula; when ξ > 0.75, the images share many identical elements, so their similarity is high and they are grouped into one class; 45 different classes of image data are formed at this point;
2) Extracting one class of image data, containing 200 images, and superimposing the images with an overlay algorithm: extracting texture features from the images using the LBP feature extraction method, fusing in turn the pixels of image textures that share the same pixel points, and taking the region that gains the most new pixel points as the distinguishing mark;
3) Constructing a three-dimensional model of a scene of the store from the images in the image data, and determining the approximate shooting direction of each image with the distinguishing mark as the reference object, forming a direction vector set P;
4) Analyzing the included angle and judging the consistency of shooting angles: traversing the direction vector set P and computing the cosine cos θ of the included angle between direction vectors; when cos θ > 0.85, the shooting angles of the images are identified as similar and the number u of similar images is recorded; when u > 100, the number of images with similar shooting angles is large and the images are not authentic;
S303: judging whether the store engages in false marketing according to the analysis result:
the analysis finds that the images are not authentic, so the store is judged to have engaged in false marketing.
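The LBP texture extraction used in step 2) is a standard technique; a minimal pure-Python sketch of the basic 3x3 operator is given below. The image is assumed to be a grayscale grid stored as a list of lists, and the bit ordering (clockwise from the top-left neighbor) is one common convention chosen here as an assumption.

```python
# Neighbor offsets, clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit local binary pattern of the interior pixel at (r, c):
    a neighbor contributes a 1 bit when it is >= the center value."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels,
    usable as a simple texture descriptor."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

patch = [
    [1, 2, 3],
    [8, 5, 4],
    [7, 6, 9],
]
code = lbp_code(patch, 1, 1)
```

Comparing such histograms between images is one way to locate shared textures before the pixel-fusion step, though the text does not specify how the fusion itself is carried out.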
Further, in step S4: all stores that engage in false marketing, including "xxx Hotel", are screened out.
Further, in step S5: user demand is identified and the screened data is pushed to the user, which reduces the deception of consumers by stores' false marketing and improves the user experience; the specific steps are as follows:
S501: recognizing the user's voice data: "I want a hotel reasonably close to xx subway station, with high cost performance and a budget of xx-xx", and extracting the user's voice keywords using keyword recognition technology: "hotel", "cost performance", "budget";
S502: matching relevant stores according to the voice keywords;
S503: sending the stores with a high matching degree to the mobile terminal for the user to choose from;
S504: after the user confirms a store, receiving the user's satisfaction information and returning that data to step S503.
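The keyword extraction of S501 can be sketched on an already transcribed utterance. The sketch assumes a fixed vocabulary of known keywords and simple substring matching; the actual keyword recognition technology is not specified in the text, and the vocabulary below is hypothetical.

```python
def extract_keywords(utterance, vocabulary):
    """Return the vocabulary phrases found in the utterance, ordered by
    their first position in the text (case-insensitive matching)."""
    text = utterance.lower()
    found = []
    for phrase in vocabulary:
        position = text.find(phrase.lower())
        if position != -1:
            found.append((position, phrase))
    return [phrase for _, phrase in sorted(found)]

# Hypothetical keyword vocabulary maintained by the platform.
vocabulary = ["budget", "restaurant", "hotel", "cost performance"]
utterance = ("I want a hotel reasonably close to xx subway station, "
             "with high cost performance and a budget of xx-xx")
keywords = extract_keywords(utterance, vocabulary)
```

The resulting keyword list is what S502 would consume when matching stores.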
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and does not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (5)

1. A user data analysis and management method based on big data, characterized in that the method comprises the following steps:
S1: collecting comment data of all stores on the network;
S2: storing all the collected data;
S3: analyzing the authenticity of the user comments and judging whether a store engages in false marketing;
S4: screening out stores that engage in false marketing;
S5: identifying user demand and performing intelligent pushing;
in step S1, collecting comment data of all stores on the network comprises:
I, collecting all store names to form a store name set D;
II, collecting the text data of the comments of any store d, d ∈ D, to form a text set A;
III, collecting the image information of the comments of any store d to form an image data set B;
in step S3, analyzing the authenticity of the user comments and judging whether the store engages in false marketing comprises the following specific steps:
S301: analyzing the authenticity of the text in the user comments of any store d, as follows:
S301_1: acquiring all text data in the comments to form a text set A = {a_i}, i = 1, 2, …, n, where a_i denotes the text content of one comment;
S301_2: comparing positive and negative reviews: identifying that the text set contains x positive reviews and y negative reviews, where x + y = n; if (x - y) > β, where β is a set threshold, analyzing the comment content;
S301_3: comparing text similarity and classifying: extracting and traversing the positive-review content A_x = {a_i}, i = 1, 2, …, x, in the text set and computing the text similarity γ according to the similarity formula; if γ > ε, where ε denotes a text similarity threshold, the text contents of the comments are similar, and they are grouped into one category to form a same-category set, the number of comments in each same-category set being recorded using a statistical algorithm;
S301_4: analyzing the number of comments and judging the authenticity of the text: if the number of comments in a same-category set is greater than μ, where μ is a number threshold, the number of highly similar comments is large and the text content of that category is not authentic; conversely, if the number is less than μ, the number of highly similar comments is small, and step S302 is entered to further analyze the authenticity of the user comments;
S302: analyzing the authenticity of the image data in the comments of any store d, as follows:
S302_1: extracting the store name information d = {s_c}, c = 1, 2, …, σ, where s_c denotes a word in the name; screening store-related images according to the name keyword:
traversing the store name information d and applying the keyword formula to obtain a keyword X;
acquiring the full image data set B from the comments and using the GBDT algorithm to screen out the image data related to keyword X, forming a screened image data set in which each image b_j has a corresponding piece of text data a_i;
where p(s_c) denotes the number of times the word s_c occurs in the name information d, |D| denotes the number of stores in the database, and |s_c ∈ d| denotes the number of stores in the database whose names contain the word s_c;
S302_2: analyzing the consistency of image parameters: traversing the screened image data set and acquiring, for any image b_j, its parameter data K_j = {k_z}, z = 1, 2, …, τ; applying the comparison formula to obtain the parameter comparison result Λ between different images;
when Λ < φ, where φ is a parameter comparison threshold, the parameter data of the images are consistent; the consistent images are then screened out and their number ω is recorded; when ω > ζ, where ζ is an image number threshold, the number of images with consistent parameters in the comments is large and the images are not authentic; otherwise, if ω < ζ, step S302_3 is entered to further analyze the authenticity of the user comments;
S302_3: analyzing the consistency of the shooting angles of the store in the images, comprising the following steps:
1) Traversing the screened image data set and resizing every image b_j to the same size with a picture-processing algorithm;
2) Comparing image similarity: obtaining the similarity ξ between images according to the similarity formula; when ξ exceeds a similarity threshold, the image similarity is high and the images are grouped into one class, q different classes of image data being formed at this point;
3) Extracting one class of image data and superimposing its images with an overlay algorithm: extracting texture features from the images using the LBP feature extraction method, fusing in turn the pixels of image textures that share the same pixel points, and taking the region that gains the most new pixel points as the distinguishing mark;
4) Constructing a three-dimensional model of a scene of the store from the images in the image data, and determining the approximate shooting direction of each image with the distinguishing mark as the reference object, forming a direction vector set P;
5) Analyzing the included angle and judging the consistency of shooting angles: traversing the direction vector set P and computing the cosine cos θ of the included angle between direction vectors; when cos θ > Γ, where Γ is an angle similarity threshold, the shooting angles of the images are identified as similar and the number u of similar images is recorded; when u > ζ, the number of images with similar shooting angles is large and the images are not authentic; otherwise, when u < ζ, returning to step 3) to extract another class of image data until the traversal is finished;
S303: judging whether the store engages in false marketing according to the analysis result:
when the analysis finds that the text content is not authentic or that the images are not authentic, judging that the store has engaged in false marketing;
in step S5, identifying user demand and performing intelligent pushing comprises the following specific steps:
S501: recognizing the user's voice data and extracting the user's voice keywords using keyword recognition technology;
S502: matching relevant stores according to the voice keywords;
S503: sending the stores with a high matching degree to the mobile terminal for the user to choose from;
S504: after the user confirms a store, receiving the user's satisfaction information and returning that data to step S503.
2. A user data analysis management system applying the big-data-based user data analysis management method of claim 1, characterized in that the system comprises a data acquisition module, a database, a data analysis module, an intelligent screening module and a data use module;
the output end of the data acquisition module is connected with the input end of the database, the output end of the database is connected with the input end of the data analysis module, the output end of the data analysis module is connected with the input end of the intelligent screening module, and the output end of the intelligent screening module is connected with the input end of the data use module;
the data acquisition module collects comment data of all stores on the network;
the database stores all the collected data;
the data analysis module analyzes the authenticity of the user comments and judges whether a store engages in false marketing;
the intelligent screening module screens out stores that engage in false marketing;
the data use module identifies user demand and performs intelligent pushing.
3. The big-data-based user data analysis management system according to claim 2, wherein the data acquisition module comprises an information acquisition unit and a picture acquisition unit;
the information acquisition unit is used for acquiring text data of shop comments; the picture acquisition unit is used for acquiring image information of shop comments.
4. The big-data-based user data analysis management system according to claim 2, wherein the data analysis module comprises a text analysis unit, a picture analysis unit and an authenticity judgment unit;
the text analysis unit is used for analyzing the authenticity of the text in the shop comments of the user; the picture analysis unit is used for analyzing the authenticity of the image data in the comments; the authenticity judging unit is used for judging whether the store performs false marketing or not according to the analysis result;
the text analysis unit comprises a content comparison subunit and a similarity analysis subunit;
the content comparison subunit is used for comparing the number of the good comments and the poor comments in the user comments; the similarity analysis subunit is used for analyzing the relevance between comment contents;
the picture analysis unit comprises a parameter analysis subunit and an image analysis subunit;
the parameter analysis subunit is used for analyzing the consistency of the image parameters; the image analysis subunit is used for analyzing consistency of shooting angles of shops in the images.
5. The big-data-based user data analysis management system according to claim 2, wherein the data use module comprises an information identification unit, a keyword analysis unit, an intelligent pushing unit and a user feedback unit;
the information identification unit is used for identifying voice data of a user; the keyword analysis unit is used for analyzing the user demand according to keywords in the voice data and matching with related shops; the intelligent pushing unit is used for sending shops with high matching degree to the mobile terminal for users to select; the user feedback unit is used for receiving the user satisfaction information and sending the data to the intelligent pushing unit.
CN202211711362.2A 2022-12-29 2022-12-29 User data analysis management system and method based on big data Active CN115983873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211711362.2A CN115983873B (en) 2022-12-29 2022-12-29 User data analysis management system and method based on big data

Publications (2)

Publication Number Publication Date
CN115983873A CN115983873A (en) 2023-04-18
CN115983873B true CN115983873B (en) 2023-07-28

Family

ID=85957608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211711362.2A Active CN115983873B (en) 2022-12-29 2022-12-29 User data analysis management system and method based on big data

Country Status (1)

Country Link
CN (1) CN115983873B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant