CN110737662A - data analysis method, device, server and computer storage medium - Google Patents


Info

Publication number
CN110737662A
Authority
CN
China
Prior art keywords
data
user data
tag
analyzed
target
Prior art date
Legal status
Granted
Application number
CN201910958968.8A
Other languages
Chinese (zh)
Other versions
CN110737662B (en)
Inventor
俄万有
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910958968.8A
Publication of CN110737662A
Application granted
Publication of CN110737662B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the invention provide a data analysis method, a device, a server and a computer storage medium. The method comprises: when a data analysis request is received, obtaining the identifier of the number packet to be analyzed carried by the request and the target data label of the requested user data to be analyzed; obtaining the target label category corresponding to the target data label, and obtaining the index table corresponding to the target label category, where the index table comprises a mapping relationship among data labels, information of user data, and number packet identifiers; obtaining, according to the index table, the user data to be analyzed that matches the identifier of the number packet to be analyzed and the target data label, the user data to be analyzed being the user data indicated by the information of the user data in the index table; and analyzing the obtained user data to obtain the user attribute features corresponding to the numbers in the number packet to be analyzed.

Description

Data analysis method, device, server and computer storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data analysis method, an apparatus, a server, and a computer storage medium.
Background
With the rapid development of internet and big data technologies, user portraits and refined user operation strategies based on mining massive user data have become important means of increasing the traffic of internet services. When analyzing user data, a common technique is to map and associate offline user data with a number packet to generate a number packet tag table, and then analyze the user data based on that table. However, this approach requires offline data computation during analysis; offline computation is time-consuming and cannot return analysis results to the user in real time, so data analysis efficiency is low.
Disclosure of Invention
The embodiments of the invention provide a data analysis method, a device, a server and a computer storage medium, which enable real-time analysis of user data and effectively improve the efficiency of user data analysis.
In one aspect, embodiments of the present invention provide a data analysis method, the method comprising:
when a data analysis request is received, acquiring an identifier of a number packet to be analyzed carried by the data analysis request and a target data tag of requested user data to be analyzed;
acquiring a target label category corresponding to the target data label, and acquiring an index table corresponding to the target label category, wherein the index table comprises a mapping relation among the data label, the user data information and the number packet identifier;
acquiring user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is user data indicated by the information of the user data in the index table;
and analyzing the acquired user data to be analyzed to obtain the user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
In an embodiment, the number packet includes at least one number, and each number in the number packet indicates a communication identifier of a user, where the communication identifier is any one of a device identifier, an application account, and a routing address.
In an embodiment, before the identifier of the number packet to be analyzed and the target data tag of the requested user data to be analyzed are obtained from a received data analysis request, the method further includes:
when detecting that a newly added number packet exists, acquiring the newly added number packet, wherein the newly added number packet is matched with the number packet to be analyzed;
acquiring a feature tag table generated from user data tagged with data tags of the target tag category, wherein the feature tag table comprises a mapping relation between the data tags and the user data;
and creating an index table according to the newly added number packet and the feature tag table.
In an embodiment, the creating an index table according to the newly added number packet and the feature tag table includes:
acquiring the identification of the newly added number packet and the number in the newly added number packet;
adding a number packet mark, carrying the identification of the newly added number packet, to the user data information in the feature tag table that meets a preset condition, wherein the preset condition is that the number identifying the user data information is a number in the newly added number packet;
and creating an index table according to the characteristic label table added with the number packet marks.
In an embodiment, the creating an index table according to the feature tag table added with the number packet tag includes:
and creating an inverted index table according to the feature tag table with the number packet marks added, wherein the inverted index table takes the data tags and the number packet marks as attribute values, and takes the information of the user data as the data having those attribute values.
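The marking and inverted-index steps above can be illustrated with a minimal Python sketch. It assumes an in-memory feature tag table in which each record carries the number identifying it, its data tags, and a set of number packet marks; the record layout, the `("tag", ...)` and `("packet", ...)` attribute keys, and the function names are hypothetical. A production system would build the index in a search engine rather than in a Python dictionary.

```python
from collections import defaultdict


def mark_packet(feature_table, packet_id, packet_numbers):
    """Add packet_id as a number packet mark to every record whose number is in the packet."""
    members = set(packet_numbers)
    for record in feature_table.values():
        if record["number"] in members:  # preset condition: the record's number is in the packet
            record["packets"].add(packet_id)


def build_inverted_index(feature_table):
    """Map each attribute value (a data tag or a number packet mark) to the ids of records carrying it."""
    index = defaultdict(set)
    for record_id, record in feature_table.items():
        for tag in record["tags"]:
            index[("tag", tag)].add(record_id)
        for packet_id in record["packets"]:
            index[("packet", packet_id)].add(record_id)
    return dict(index)


# Hypothetical vuid feature tag table: record id -> record.
table = {
    "r1": {"number": "vuid_a", "tags": {"vip"}, "packets": set()},
    "r2": {"number": "vuid_b", "tags": {"vip", "annual"}, "packets": set()},
    "r3": {"number": "vuid_c", "tags": {"annual"}, "packets": set()},
}
mark_packet(table, "pkg_1", ["vuid_a", "vuid_c"])
index = build_inverted_index(table)
print(sorted(index[("packet", "pkg_1")]))  # ['r1', 'r3']
```

Keeping the data tags and the packet marks in the same index means a later query needs only two posting-set lookups and an intersection.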
In an embodiment, the obtaining of the target label category corresponding to the target data label includes:
querying a preset label dictionary for at least one label category corresponding to the target data label, wherein the target label category is any one of the at least one label category.
In an embodiment, the obtaining, according to the index table, the to-be-analyzed user data that matches the identifier of the to-be-analyzed number packet and the target data tag includes:
acquiring, according to the index table, the information of the target user data matched with the identifier of the number packet to be analyzed and the target data label;
acquiring a storage address of target user data from the information of the target user data, wherein the target user data is user data indicated by the information of the target user data;
and acquiring the target user data according to the storage address, and taking the target user data as user data to be analyzed.
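The three lookup steps can be sketched in Python under a simple in-memory model: the index maps attribute values to record ids, each record's information holds a storage address, and a dictionary stands in for the storage system. All names, addresses and data values here are hypothetical.

```python
def find_user_data(index, records, storage, packet_id, target_tag):
    """Return the user data marked with both the number packet identifier and the target data tag."""
    in_packet = index.get(("packet", packet_id), set())
    with_tag = index.get(("tag", target_tag), set())
    matched = in_packet & with_tag  # information of the target user data: both marks present
    result = []
    for record_id in sorted(matched):  # sorted only to make the output order deterministic
        address = records[record_id]["storage_address"]  # storage address taken from the information
        result.append(storage[address])  # fetch the target user data by its address
    return result


index = {("packet", "pkg_1"): {"r1", "r3"}, ("tag", "vip"): {"r1", "r2"}}
records = {
    "r1": {"storage_address": "addr-1"},
    "r2": {"storage_address": "addr-2"},
    "r3": {"storage_address": "addr-3"},
}
storage = {
    "addr-1": {"vuid": "vuid_a", "member_level": 3},
    "addr-2": {"vuid": "vuid_b", "member_level": 1},
    "addr-3": {"vuid": "vuid_c", "member_level": 2},
}
print(find_user_data(index, records, storage, "pkg_1", "vip"))
```

Only record r1 carries both the pkg_1 mark and the vip tag, so a single user data record is fetched.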
In another aspect, an embodiment of the present invention provides an apparatus for analyzing data, the apparatus including:
the acquiring unit is used for acquiring the identifier of the number packet to be analyzed carried by the data analysis request and the target data label of the requested user data to be analyzed when the transceiver unit receives the data analysis request;
the acquiring unit is further configured to acquire a target tag category corresponding to the target data tag, and acquire an index table corresponding to the target tag category, where the index table includes a mapping relationship among the data tag, information of the user data, and a number packet identifier;
the processing unit is used for acquiring user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is user data indicated by the information of the user data in the index table;
the processing unit is further configured to analyze the acquired user data to be analyzed to obtain user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
In yet another aspect, an embodiment of the invention provides a server, which includes a processor and a memory, wherein the memory stores executable program code, and the processor is used for calling the executable program code to execute the above data analysis method.
Accordingly, the present invention also provides a computer storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the above data analysis method.
The embodiment of the invention acquires the identification of the number packet to be analyzed and the target data label in response to the data analysis request, acquires the index table matched with the target label type corresponding to the target data label, acquires the user data to be analyzed matched with the identification and the target data label according to the index table, and analyzes the user data to be analyzed to obtain the user attribute characteristics, thereby realizing the real-time analysis of the user data and effectively improving the analysis efficiency of the user data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a data analysis method provided by an embodiment of the invention;
FIG. 2 is a schematic flow chart of another data analysis method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data analysis system according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an index table creation method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of another data analysis method provided by an embodiment of the invention;
FIG. 6 is a schematic structural diagram of a data analysis device provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, rather than all, of the embodiments of the present invention.
Embodiments of the present invention are described in detail below with reference to the drawings, and features of the following examples and embodiments may be combined without conflict.
In internet services, the data describing users is often multi-dimensional. For example, in a video service, the user data includes large-disk user playing data, activity data and the like keyed by device id (guid), where the device id includes the physical address (MAC address) of the device; member attribute data and the like keyed by user application account (vuid); and family attribute data keyed by family routing address (wifi-MAC). Each key identifies a user and corresponds to a group of user data.
When analyzing data of internet services, user groups often need to be analyzed across data types. For example, a video member group (a number packet composed of vuids) may need to be analyzed for both its member feature distribution and its playing feature distribution: the playing feature distribution can be analyzed from the large-disk playing data and similar data corresponding to the video member group, and the member feature distribution from the member attribute data and similar data corresponding to it. Such requirements call for cross-type data integration. In one feasible implementation, at least two types of offline user data are mapped and associated with the number packet to generate a number packet tag table covering the cross-type data for that packet, and user data is then analyzed based on that table. The processing flow of this approach is shown in FIG. 1 and comprises the following steps:
1. A data analyst submits a data analysis task in the analysis system and configures the related data analysis settings.
2. The system parses the submitted configuration, extracts the corresponding user attribute tags and number packets, and classifies the extracted tags by type, such as large-disk tags, member tags and family tags.
3. According to the tag types, the original number packet is mapped into number packets corresponding to each tag type.
4. Each tag type is combined with its number packet to generate a sub-tag table for that tag type.
5. The sub-tag tables are combined into one large tag table keyed by the type of the target number packet.
6. The tag table is imported into a search engine, and the user data is extracted and analyzed.
7. After the analysis is complete, the analysis result is returned to the data analyst.
However, this method requires offline data computation during analysis (including number packet mapping, sub-tag table extraction, and tag table combination); offline computation is time-consuming and cannot return results to the user in real time, so data analysis efficiency is low.
In the data analysis method provided by the embodiments of the present invention, the feature tag tables corresponding to the various kinds of user data are imported into a search system in advance, inverted index tables are created, and account mapping data is stored. When a data analysis request is received, the identifier of the number packet to be analyzed and the target data tag of the requested user data are obtained from the request; the target tag category corresponding to the target data tag and the index table for that category are obtained, where the index table comprises a mapping relationship among data tags, user data information and number packet identifiers; the user data to be analyzed that matches the packet identifier and the target data tag is obtained according to the index table, this being the user data indicated by the user data information in the index table; and the obtained user data is analyzed to obtain the user attribute features corresponding to the numbers in the number packet to be analyzed.
Referring to FIG. 2, FIG. 2 is a schematic flow chart of a data analysis method according to an embodiment of the present invention. The method is applied to a server and includes:
S201, when a data analysis request is received, acquiring an identifier of a number packet to be analyzed carried by the data analysis request and a target data tag of requested user data to be analyzed.
In the embodiment of the invention, a data analyst submits a data analysis task through a terminal and configures the related data analysis settings; the terminal sends a data analysis request to the server according to the configuration entered by the user, and the request carries the identifier of the number packet to be analyzed and the target data tag of the requested user data to be analyzed. When receiving the data analysis request sent by the terminal, the server responds to it by acquiring the identifier of the number packet to be analyzed and the target data tag.
The number packet comprises at least one number of the same type; the numbers in the number packet represent communication identifiers of users, and the communication identifier is one of a device identifier, an application account, and a routing address. The target data tag comprises one or more data tags, and a data tag indicates the data type of the user data. In this embodiment, the user data includes large-disk user playing data and activity data keyed by device id, member attribute data keyed by user application account, family attribute data keyed by family routing address, and the like.
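The constraints in this paragraph (a packet holds at least one number, all of one type; a request carries one or more data tags) can be captured as plain data types. This is a hypothetical Python sketch; the class and field names are illustrative and are not defined in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class NumberType(Enum):
    """Communication identifier types named in the description."""
    DEVICE_ID = "guid"            # device identifier
    APP_ACCOUNT = "vuid"          # application account
    ROUTING_ADDRESS = "wifi-mac"  # family routing address


@dataclass(frozen=True)
class NumberPacket:
    packet_id: str
    number_type: NumberType       # every number in the packet has this type
    numbers: FrozenSet[str]

    def __post_init__(self):
        if not self.numbers:
            raise ValueError("a number packet must contain at least one number")


@dataclass(frozen=True)
class DataAnalysisRequest:
    packet_id: str                # identifier of the number packet to be analyzed
    target_tags: Tuple[str, ...]  # one or more data tags of the requested user data

    def __post_init__(self):
        if not self.target_tags:
            raise ValueError("a request must name at least one target data tag")


packet = NumberPacket("pkg_100", NumberType.APP_ACCOUNT, frozenset({"vuid_a", "vuid_c"}))
request = DataAnalysisRequest(packet.packet_id, ("member_level",))
```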
S202, obtaining a target label category corresponding to the target data label, and obtaining an index table corresponding to the target label category, wherein the index table comprises a mapping relation among the data label, the user data information and the number packet identifier.
In the embodiment of the invention, the server queries a preset tag dictionary for at least one tag category corresponding to the target data tag, such as a large-disk tag, a member tag, a family tag, a guid tag, a vuid tag, or a wifi-mac tag; the target tag category is any one of the at least one tag category. The information of the user data in the index table includes the number identifying that information, the storage address of the user data it indicates, and/or one or more types of the user data it indicates. In one implementation, the number identifying the information of the user data is the same as the number identifying the user data it indicates, and is the communication identifier of the corresponding user, namely one of a device identifier, an application account, and a routing address.
In one embodiment, the index table corresponding to the target tag category is created in advance. When detecting a newly added number packet, the server obtains it; the newly added number packet matches the number packet to be analyzed, that is, it is the same packet and its identifier matches the identifier of the number packet to be analyzed. The server then obtains the feature tag table corresponding to the target tag category, which is generated in advance from user data tagged with data tags of the target tag category and includes the mapping relationship between the data tags and the information of the user data.
In this embodiment, the server creates the index table according to the newly added number packet and the feature tag table. Specifically, the server obtains the identifier of the newly added number packet and the numbers in it, and adds a number packet mark carrying that identifier to the user data in the feature tag table that meets a preset condition. The preset condition is either that the number identifying the user data is a number in the newly added number packet, or that the number identifying the user data is a number of a target category that corresponds to a number in the newly added number packet, where the target category differs from the category of the numbers in the newly added number packet and the server stores the mapping relationships between numbers of different categories. The server then creates the index table from the feature tag table with the number packet marks added, namely an inverted index table that takes the data tags and the number packet marks as attribute values, so that the user data to be analyzed can be located quickly.
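The second branch of the preset condition, where a packet of one number category marks records keyed by another category, can be sketched in Python as follows. The function name, the record layout and the `number_mapping` dictionary (standing in for the server's stored mapping between numbers of different categories) are hypothetical.

```python
def mark_with_mapping(feature_table, packet_id, packet_numbers, number_mapping):
    """Mark records whose number is in the packet or is mapped from a packet number.

    number_mapping stands in for the stored mapping between numbers of different
    categories, e.g. vuid -> list of guids identifying the same user.
    Returns the set of numbers that satisfy the preset condition.
    """
    members = set(packet_numbers)
    for number in packet_numbers:
        members.update(number_mapping.get(number, ()))  # numbers of the target category
    for record in feature_table.values():
        if record["number"] in members:
            record["packets"].add(packet_id)
    return members


# Hypothetical guid feature tag table marked with a vuid number packet.
guid_table = {
    "g_rec1": {"number": "guid_1", "tags": {"long_viewer"}, "packets": set()},
    "g_rec2": {"number": "guid_2", "tags": {"long_viewer"}, "packets": set()},
    "g_rec3": {"number": "guid_3", "tags": {"short_viewer"}, "packets": set()},
}
vuid_to_guid = {"vuid_a": ["guid_1", "guid_3"]}
mark_with_mapping(guid_table, "pkg_1", ["vuid_a"], vuid_to_guid)
```

Because the mapping is applied when the packet is added, the later query against the guid index needs no conversion step.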
S203, obtaining the user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is the user data indicated by the information of the user data in the index table.
In the embodiment of the invention, the server obtains, according to the index table corresponding to the target tag category, the information of the target user data marked with the identifier of the number packet to be analyzed and with the target data tag, and obtains the user data to be analyzed according to that information. When the information of the target user data comprises the target user data itself, the target user data is taken directly from that information and used as the user data to be analyzed; the target user data is the user data indicated by the information of the target user data. When the information of the target user data comprises the storage address of the target user data, the storage address is read from that information, the target user data is obtained according to the storage address, and the obtained target user data is used as the user data to be analyzed.
In another embodiment, the target data tag corresponds to at least two tag categories in a preset tag category set, where the preset set comprises at least two of a guid tag, a vuid tag and a wifi-mac tag, and the target tag category is any one of the at least two tag categories. In this embodiment, for a first tag category, which is any one of the at least two tag categories, the server obtains, from the index table corresponding to the first tag category, the information of the target user data marked with the identifier of the number packet to be analyzed and with a data tag of the first tag category, and obtains first user data to be analyzed according to that information. The server also obtains, from the index table corresponding to the first tag category, at least one first number marked with the identifier of the number packet to be analyzed, where the first number is of the number category corresponding to the first tag category, and forms a first number packet from these numbers. The server then obtains the preset mapping relationship between numbers of that category and numbers of the category corresponding to a second tag category, derives a second number packet corresponding to the first number packet according to the mapping, and obtains, from the index table corresponding to the second tag category, the information of the target user data identified by numbers in the second number packet and marked with a data tag of the second tag category, from which second user data to be analyzed is obtained. A third tag category, if present, is handled in the same way.
For example, suppose the at least two tag categories are the vuid tag and the guid tag, the target tag category is either of them, the first tag category is the vuid tag, and the second tag category is the guid tag. For the vuid tag, the information of the target user data marked with the identifier of the number packet to be analyzed and with a data tag of the vuid category is obtained from the index table corresponding to the vuid tag, and the first user data to be analyzed is obtained from that information. At least one vuid number marked with the identifier of the number packet to be analyzed is also obtained from the index table corresponding to the vuid tag, and these numbers form a vuid number packet. The preset mapping relationship between vuid numbers and guid numbers is then obtained, and the guid number packet corresponding to the vuid number packet is derived from that mapping. Finally, the information of the target user data identified by guid numbers in the guid number packet and marked with a data tag of the guid category is obtained from the index table corresponding to the guid tag, and the second user data to be analyzed is obtained from it.
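The vuid-to-guid example can be traced in a short Python sketch. Each index table is modeled as a plain list of rows and scanned linearly, which a real system would replace with index queries; the row layout, tag names and mapping contents are hypothetical.

```python
def cross_category_query(vuid_rows, guid_rows, packet_id, vuid_tag, guid_tag, vuid_to_guid):
    """Query-time conversion from a vuid number packet to a guid number packet."""
    # First tag category (vuid): rows carrying the packet mark and the vuid-category data tag.
    first_data = [r["data"] for r in vuid_rows
                  if packet_id in r["packets"] and vuid_tag in r["tags"]]
    # vuid numbers marked with the packet identifier form the vuid number packet.
    vuid_packet = {r["number"] for r in vuid_rows if packet_id in r["packets"]}
    # Derive the guid number packet through the preset vuid -> guid mapping.
    guid_packet = set()
    for vuid in vuid_packet:
        guid_packet.update(vuid_to_guid.get(vuid, ()))
    # Second tag category (guid): rows whose guid is in the guid packet and carry the guid-category tag.
    second_data = [r["data"] for r in guid_rows
                   if r["number"] in guid_packet and guid_tag in r["tags"]]
    return first_data, second_data


vuid_rows = [
    {"number": "vuid_a", "tags": {"vip"}, "packets": {"pkg_1"}, "data": "member data of vuid_a"},
    {"number": "vuid_b", "tags": {"vip"}, "packets": set(), "data": "member data of vuid_b"},
]
guid_rows = [
    {"number": "guid_1", "tags": {"play"}, "data": "play data of guid_1"},
    {"number": "guid_2", "tags": {"play"}, "data": "play data of guid_2"},
]
first, second = cross_category_query(vuid_rows, guid_rows, "pkg_1", "vip", "play", {"vuid_a": ["guid_1"]})
```

Unlike marking at ingestion time, this variant needs no packet marks in the guid index, at the cost of one mapping lookup per query.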
In another implementation, the server obtains, according to an index table corresponding to a target tag category, information of target user data marked as an identifier of the number packet to be analyzed and marked as a data tag of the target tag category, and obtains user data to be analyzed according to the information of the target user data.
S204, analyzing the acquired user data to be analyzed to obtain the user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
For example, when the acquired data to be analyzed is the playing data of the large disk user, the acquired playing data of the large disk user is analyzed, and the playing characteristics of the application user group corresponding to the number packet to be analyzed can be obtained; when the acquired data to be analyzed is member attribute data, the acquired member attribute data is analyzed, and member characteristics of the application user group corresponding to the number packet to be analyzed can be acquired.
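The patent does not specify how the retrieved data is analyzed in S204. One simple reading of "member characteristics of the user group" is a value distribution, so the Python sketch below (an assumption, not the patent's method) computes the share of users per value of one attribute; the field names are hypothetical.

```python
from collections import Counter


def feature_distribution(user_data, feature):
    """Share of records per value of `feature`; records lacking the feature are skipped."""
    counts = Counter(record[feature] for record in user_data if feature in record)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


member_data = [
    {"vuid": "vuid_a", "member_level": 1},
    {"vuid": "vuid_b", "member_level": 2},
    {"vuid": "vuid_c", "member_level": 2},
    {"vuid": "vuid_d"},  # no member_level recorded
]
print(feature_distribution(member_data, "member_level"))
```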
The embodiment of the invention obtains the identification of the number packet to be analyzed and the target data label in response to the data analysis request, obtains the index table matched with the target label type corresponding to the target data label, obtains the user data to be analyzed matched with the identification and the target data label according to the index table, and analyzes the user data to be analyzed to obtain the user attribute characteristics, thereby realizing the real-time analysis of the user data and improving the analysis efficiency of the user data.
In a possible implementation manner, the data analysis method provided by the embodiment of the present invention may be applied to a data analysis system, and the data analysis system may be disposed in a server. As shown in fig. 3, the data analysis system includes five parts, namely an interaction layer, an access layer, a logic layer, a data synchronization layer and a data storage layer. The functions of each part are as follows:
1. Interaction layer: provides the user with a web system having input and output capabilities, including a simple, clear page for configuring data analysis tasks and a page for displaying analysis results, to meet the user's data analysis requirements.
2. Access layer: in one implementation, when a data analysis request includes several data tags belonging to different tag categories, the request can be split into several concurrent data analysis requests according to the tag categories of the data tags, and each request is adapted to the syntax supported by the search module to generate the corresponding query statement.
3. Logic layer: provides the search service, which can be provided by a search module comprising a search cluster such as an Elasticsearch cluster. Elasticsearch is a search server based on Lucene that provides a distributed, multi-user full-text search engine; Lucene is an open-source full-text search engine toolkit. A cluster is organized from one or more nodes that together hold the entire data set and provide indexing and search functions. According to the query statements supplied by the access layer, the logic layer provides online, real-time data search and aggregation services.
4. Data synchronization layer: comprises a tag data synchronization module, a number packet data synchronization module, and an account mapping data storage database. The tag data synchronization module periodically loads user tag data of multiple data types into the search cluster, that is, groups of user-portrait data consisting of information such as gender, age, education, income and viewing preferences. The data synchronization layer also provides an account mapping data storage service that periodically loads the mapping relationships between the various accounts into the account mapping data storage database, which supports online real-time queries; for example, a Redis (Remote Dictionary Server) database. The number packet data synchronization module adds a number packet identifier for each newly added user number packet, queries the stored account mapping relationships, and marks the number packet identifier on the tag records corresponding both to the original numbers and to the numbers converted from them. For example, for a guid number packet pkg_100, the pkg_100 mark is added to the matching records of the guid tag table and, through the account mapping, extended to the corresponding records of tag tables keyed by other account types.
5. The data storage layer provides an offline big data storage service, which can be provided by the Hadoop Distributed File System (HDFS), and comprises a tag data storage database, a user number packet data storage service, and a user account mapping data storage service. The user tag data storage service stores user tag data computed by the big data platform, the user number packet data storage service stores user number packet data computed by the big data platform, and the account mapping data storage service stores user account mapping data computed by the big data platform in a second account mapping data storage database. The first account mapping data storage database and the second account mapping data storage database can be the same database or different databases.
Referring to fig. 4, fig. 4 is a flow chart for creating index tables, taking guid and vuid tag data as examples. guid tag data is user data that uses the device ID as its data tag, for example user device ID + aggregate playback data; vuid tag data is user data that uses the user application account as its data tag, for example user application account + membership attribute data. As shown in fig. 4, the flow for creating the index tables includes the following steps:
1. The tag data synchronization module sends a guid tag data read request to the tag data storage database and, in parallel, a vuid tag data read request to the same database. The tag data storage database can periodically update the stored user tag data, including the guid tag data and the vuid tag data.
2. The tag data storage database returns guid tag data and vuid tag data to the tag data synchronization module.
3. After obtaining the guid tag data, the tag data synchronization module sends a guid tag index table creation request to the search module so that the search module creates the guid tag index table guid_tag. In parallel, after obtaining the vuid tag data, it sends a vuid tag index table creation request to the search module so that the search module creates the vuid tag index table vuid_tag. In this implementation, the guid tag index table creation request carries the guid tag data obtained by the tag data synchronization module, and the vuid tag index table creation request carries the vuid tag data obtained by the tag data synchronization module.
4. After receiving the guid tag index table creation request, the search module creates a guid tag index table for the guid tag data. Specifically, an inverted index table can be created for the guid tag data to support real-time indexing of it: a feature tag table is first created from the guid tag data as a forward index table, organized as device ID + information of the aggregate playback data (e.g., the storage address of the aggregate playback data), and the search module then creates an inverted index table from the data in the feature tag table. The search module returns the creation result of the guid tag index table to the tag data synchronization module. Similarly, after receiving the vuid tag index table creation request, the search module creates a vuid tag index table for the vuid tag data, for example an inverted index table that supports real-time indexing of the vuid tag data, and returns the creation result of the vuid tag index table to the tag data synchronization module.
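Step 4 turns a forward feature tag table (device ID -> tag values and a pointer to the play data) into an inverted index (tag value -> device IDs). The following is a minimal in-memory sketch of that inversion, assuming a simple dict layout; the row format, example device IDs, storage paths, and `build_inverted_index` are hypothetical, and a real search cluster such as Elasticsearch builds and stores its inverted index internally.

```python
from collections import defaultdict

# Hypothetical forward (feature tag) table keyed by device id (guid). Each row
# holds tag values plus the storage address of the underlying play data.
FEATURE_TAG_TABLE = {
    "dev-001": {"tags": {"gender": "F", "age_band": "18-24"}, "address": "/play/dev-001"},
    "dev-002": {"tags": {"gender": "M", "age_band": "25-34"}, "address": "/play/dev-002"},
    "dev-003": {"tags": {"gender": "F", "age_band": "25-34"}, "address": "/play/dev-003"},
}


def build_inverted_index(feature_tag_table):
    """Invert the forward table: (tag, value) -> set of device ids."""
    inverted = defaultdict(set)
    for device_id, row in feature_tag_table.items():
        for tag, value in row["tags"].items():
            inverted[(tag, value)].add(device_id)
    return dict(inverted)


index = build_inverted_index(FEATURE_TAG_TABLE)
print(sorted(index[("gender", "F")]))  # ['dev-001', 'dev-003']
```

The forward table is kept alongside the inverted one: the inverted index answers "which devices have this tag value", and the forward row then supplies the storage address of each device's data.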
5. After receiving responses indicating that the guid tag index table and the vuid tag index table were created successfully, the tag data synchronization module notifies the number packet data synchronization module that tag data synchronization has succeeded. On receiving this notification, the number packet data synchronization module returns a confirmation message to the tag data synchronization module, which then updates the synchronization status of the tag data to completed.
6. The number packet data synchronization module requests a vuid number packet from the number packet data storage database, where the requested vuid number packet is any newly added number packet consisting of user application accounts. The number packet data storage database returns the corresponding vuid number packet to the number packet data synchronization module; the identifier of the returned number packet is pkg1.
7. For each vuid number x in the number packet pkg1, the number packet data synchronization module queries the first account mapping data storage database for the guid number y corresponding to x; x and y belong to the same user. For vuid number x, the number packet data synchronization module sends a vuid tag index table update request to the search module. After receiving the request, the search module adds a tag pkg1 to the vuid tag index table vuid_tag, sets the pkg1 tag of the data corresponding to vuid = x (including the account, data tags, tag data, and similar information) to 1, and returns the update result to the number packet data synchronization module.
For the guid number y corresponding to vuid number x, the number packet data synchronization module sends a guid tag index table update request to the search module. After receiving the request, the search module adds a new tag pkg1 to the guid tag index table guid_tag, sets the pkg1 tag of the data corresponding to guid = y to 1, and returns the update result to the number packet data synchronization module.
8. After all the numbers in the number packet pkg1 are processed, the number packet data synchronization module updates the synchronization status of the number packet pkg1 to be completed.
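Steps 6 to 8 can be sketched as below. This is an illustrative simplification, assuming plain dicts: `vuid_to_guid` stands in for the first account mapping data storage database (a Redis lookup in the text), `vuid_table` and `guid_table` stand in for vuid_tag and guid_tag, and the packet tag is modeled as a new key set to 1. The function name and example data are hypothetical.

```python
def sync_number_packet(package_id, vuids, vuid_to_guid, vuid_table, guid_table):
    """Propagate a vuid number packet into both index tables.

    Rows that are not in the packet simply lack the package_id key, which
    plays the role of the tag being 0 for them.
    """
    for x in vuids:
        # Step 7: mark the data corresponding to vuid == x.
        if x in vuid_table:
            vuid_table[x][package_id] = 1
        # Look up the same user's device id y and mark guid == y as well.
        y = vuid_to_guid.get(x)
        if y is not None and y in guid_table:
            guid_table[y][package_id] = 1
    # Step 8: every number has been processed, so the packet is synchronized.
    return "completed"


vuid_table = {"u1": {"member_level": "gold"}, "u2": {"member_level": "none"}}
guid_table = {"d1": {"gender": "F"}, "d2": {"gender": "M"}}
status = sync_number_packet("pkg1", ["u1"], {"u1": "d1", "u2": "d2"},
                            vuid_table, guid_table)
print(status, vuid_table["u1"], guid_table["d1"])
# completed {'member_level': 'gold', 'pkg1': 1} {'gender': 'F', 'pkg1': 1}
```

Writing the packet tag into both tables at sync time is what later lets a single packet filter be applied independently to each tag category.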
It should be noted that, in steps 6 and 7, the number packet data synchronization module may also request a guid number packet from the number packet data storage database, where the guid number packet is any newly added number packet consisting of user device identifiers. The number packet data storage database returns the corresponding guid number packet, whose identifier is pkg2. For each guid number y in pkg2, the number packet data synchronization module queries the first account mapping data storage database for the vuid number x corresponding to y; x and y belong to the same user. For guid number y, the module sends a guid tag index table update request to the search module, which adds a new tag pkg2 to the guid tag index table guid_tag, sets the pkg2 tag of the data corresponding to guid = y to 1, and returns the update result to the number packet data synchronization module. For the corresponding vuid number x, the module sends a vuid tag index table update request, and the search module adds the tag pkg2 to the vuid tag index table vuid_tag, sets the pkg2 tag of the data corresponding to vuid = x to 1, and returns the update result to the number packet data synchronization module.
When data analysis is required, data analysis is performed in response to a data analysis request. Fig. 5 shows the user data analysis flow, which includes the following steps:
1. A data analyst (i.e., a user of the data analysis system) sends a data analysis request to the query adaptation module through the web system. The request carries the identifier pkg1 of the number packet to be analyzed and the data tags of the requested user data to be analyzed.
2. After receiving the data analysis request, the query adaptation module parses the data tags and the number packet identifier from it and queries the tag dictionary for the category of each data tag; the tag dictionary returns the tag category corresponding to each target data tag.
3. For vuid tags, the query adaptation module adds the analysis condition that the pkg1 tag equals 1 and sends a vuid tag analysis request to the search module to obtain the user data corresponding to the vuid tags. For guid tags, the query adaptation module likewise adds the condition that the pkg1 tag equals 1 and sends a guid tag analysis request to the search module to obtain the user data corresponding to the guid tags.
4. After receiving the vuid tag analysis request, the search module obtains the first user data to be analyzed from the pre-established vuid tag index table, namely the data of the users whose pkg1 tag equals 1, analyzes it, and returns the analysis result to the query adaptation module. Likewise, after receiving the guid tag analysis request, the search module obtains the second user data to be analyzed, namely the data of the users whose pkg1 tag equals 1, from the pre-established guid tag index table, analyzes it, and returns the analysis result to the query adaptation module.
5. After receiving the analysis results of all types of user data, the query adaptation module carries out data aggregation to obtain a final data analysis result; and sending the data analysis result to a web system so as to display the data analysis result to a data analyst.
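Steps 3 to 5 (filter each category's index table by the packet flag, analyze it, then merge the per-category results) can be sketched in memory as follows. The analysis is assumed here to be a value distribution per tag; the source does not specify which statistics are computed, and the function names, table contents, and tags are hypothetical.

```python
from collections import Counter


def analyze_category(index_table, package_id, tags):
    """Search-module side: distribution of each requested tag among rows
    whose number packet flag equals 1 (the added analysis condition)."""
    rows = [row for row in index_table.values() if row.get(package_id) == 1]
    return {tag: Counter(row[tag] for row in rows if tag in row) for tag in tags}


def aggregate(partial_results):
    """Query-adaptation side: merge per-category results into one report."""
    report = {}
    for partial in partial_results:
        report.update(partial)
    return report


guid_tag = {"d1": {"gender": "F", "pkg1": 1}, "d2": {"gender": "M"},
            "d3": {"gender": "F", "pkg1": 1}}
vuid_tag = {"u1": {"member_level": "gold", "pkg1": 1},
            "u3": {"member_level": "none", "pkg1": 1}}
report = aggregate([
    analyze_category(guid_tag, "pkg1", ["gender"]),
    analyze_category(vuid_tag, "pkg1", ["member_level"]),
])
print(report)
```

Since every tag belongs to exactly one category, the partial results have disjoint keys and a plain dict merge is enough for the final aggregation.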
The method enables real-time analysis of user data through a joint index across data types. For each number packet to be analyzed, adding the corresponding number packet tag to the feature tag table of each data type allows the user data of that packet to be analyzed in real time. When user data spanning several data types is analyzed, the request is split into corresponding index requests according to the categories to which the requested attribute features of the users belong, so that cross-type user data can be analyzed in real time and the efficiency of data analysis is greatly improved.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data analysis device according to an embodiment of the present invention. The data analysis device described in this embodiment corresponds to the server described above and includes:
an obtaining unit 601, configured to obtain, when the transceiving unit 602 receives a data analysis request, an identifier of a to-be-analyzed number packet carried by the data analysis request and a target data tag of requested to-be-analyzed user data;
the obtaining unit 601 is further configured to obtain a target tag category corresponding to the target data tag, and obtain an index table corresponding to the target tag category, where the index table includes a mapping relationship among the data tag, the information of the user data, and the number packet identifier;
a processing unit 603, configured to obtain, according to the index table, to-be-analyzed user data that matches the identifier of the to-be-analyzed number packet and the target data tag, where the to-be-analyzed user data is user data indicated by information of the user data in the index table;
the processing unit 603 is further configured to analyze the obtained user data to be analyzed, so as to obtain a user attribute feature corresponding to a number in the number packet to be analyzed.
In an embodiment, the number packet includes at least one number, and each number in the number packet indicates a communication identifier of a user, the communication identifier being any one of a device identifier, an application account, and a routing address.
In an embodiment, the obtaining unit 601 is further configured to: when a newly added number packet is detected, obtain the newly added number packet, which matches the number packet to be analyzed; and obtain a feature tag table generated from user data tagged with the data tags of the target tag category, where the feature tag table includes a mapping relationship between the data tags and information of the user data;
the processing unit 603 is further configured to create an index table according to the new number packet and the feature tag table.
In an embodiment, the information of the user data includes a number used to identify the information of the user data, and when the processing unit 603 creates an index table according to the new number packet and the feature tag table, the processing unit is specifically configured to:
acquiring the identification of the newly added number packet and the number in the newly added number packet;
adding number packet marks to the feature tag table, so that the information of the user data meeting a preset condition in the feature tag table is marked with the identifier of the newly added number packet, where the information of the user data meets the preset condition when the number identifying it is a number in the newly added number packet;
and creating an index table according to the feature tag table with the number packet marks added.
In an embodiment, when the processing unit 603 creates the index table according to the feature tag table with the number packet marks added, it is specifically configured to:
create an inverted index table according to the feature tag table with the number packet marks added, where the inverted index table takes the data tags and the number packet marks as attribute values, and takes the information of the user data as the data having those attribute values.
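The lookup this inverted index supports can be written as a set intersection. The notation is introduced here for illustration only and does not appear in the source: let $D$ be the set of user data information entries in the feature tag table, and $A(d)$ the attribute values of entry $d$, i.e., its data tags plus any number packet marks added to it. The index maps each attribute value $a$ to the entries carrying it, and the entries matching a request with number packet identifier $p$ and target data tag $t$ are:

```latex
I(a) = \{\, d \in D \mid a \in A(d) \,\}, \qquad
R(p, t) = I(t) \cap I(p)
```

Because the number packet mark is indexed as an attribute value just like a data tag, restricting an analysis to one packet costs one additional posting-list intersection rather than a scan over the numbers in the packet.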
In an embodiment, when the obtaining unit 601 obtains the target tag category corresponding to the target data tag, it is specifically configured to:
query a preset tag dictionary for at least one tag category corresponding to the target data tag, where the target tag category is any one of the at least one tag category.
In an embodiment, the information of the user data includes a storage address of the user data indicated by the information of the user data, and when the processing unit 603 acquires, according to the index table, the user data to be analyzed that matches the identifier of the number packet to be analyzed and the target data tag, the processing unit is specifically configured to:
acquiring the identifier of the number packet to be analyzed and the information of the target user data matched with the target data label according to the index table;
acquiring a storage address of target user data from the information of the target user data, wherein the target user data is user data indicated by the information of the target user data;
and acquiring the target user data according to the storage address, and taking the target user data as user data to be analyzed.
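The three acquisition steps above (match the index entry, read its storage address, fetch the data at that address) can be sketched as follows. This assumes a hypothetical entry layout with `tags`, `packages`, and `address` fields and uses a dict in place of the backing store (HDFS or otherwise); none of these names come from the source.

```python
def fetch_user_data(index_table, storage, package_id, target_tag):
    """Resolve index matches to the user data they point at.

    index_table: number -> {"tags": set, "packages": set, "address": str}
    storage: address -> user data record (stand-in for the real data store)
    """
    matched = []
    for number in sorted(index_table):  # sorted for a deterministic order
        info = index_table[number]
        # Match on both the number packet identifier and the target data tag.
        if package_id in info["packages"] and target_tag in info["tags"]:
            # The index entry holds only the storage address; fetch the data.
            matched.append(storage[info["address"]])
    return matched


index_table = {
    "d1": {"tags": {"play_data"}, "packages": {"pkg1"}, "address": "/play/d1"},
    "d2": {"tags": {"play_data"}, "packages": set(), "address": "/play/d2"},
}
storage = {"/play/d1": {"minutes": 120}, "/play/d2": {"minutes": 45}}
print(fetch_user_data(index_table, storage, "pkg1", "play_data"))
# [{'minutes': 120}]
```

Keeping only the address in the index keeps index entries small; the actual user data is read only for entries that survive the packet and tag match.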
It can be understood that the functions of the functional units of the data analysis apparatus in the embodiment of the present invention can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
The embodiment of the invention acquires the identification of the number packet to be analyzed and the target data label in response to the data analysis request, acquires the index table matched with the target label type corresponding to the target data label, acquires the user data to be analyzed matched with the identification and the target data label according to the index table, and analyzes the user data to be analyzed to obtain the user attribute characteristics, thereby realizing the real-time analysis of the user data and effectively improving the analysis efficiency of the user data.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention. The server described in this embodiment includes a processor 701, a communication interface 702, and a memory 703, which may be connected by a bus or in another manner; this embodiment is described using a bus connection.
The processor 701 (or CPU) is the computing and control core of the server; it can parse the various instructions in the server and process the various data of the server, for example by transferring interaction data between the internal components of the server. The communication interface 702 may optionally include a standard wired interface and a wireless interface (e.g., Wi-Fi or a mobile communication interface) and is controlled by the processor 701 to send and receive data. The memory 703 is the memory device of the server used to store programs and data. It is understood that the memory 703 may include both the built-in memory of the server and any expansion memory supported by the server. The memory 703 provides storage space that stores the operating system of the server, which may include, but is not limited to, Android, iOS, Windows Phone, and the like; this is not limited herein.
In the embodiment of the present invention, the processor 701 executes the executable program code in the memory 703 to perform the following operations:
when a data analysis request is received through the communication interface 702, acquiring an identifier of a number packet to be analyzed carried by the data analysis request and a target data tag of requested user data to be analyzed;
acquiring a target label category corresponding to the target data label, and acquiring an index table corresponding to the target label category, wherein the index table comprises a mapping relation among the data label, the user data information and the number packet identifier;
acquiring user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is user data indicated by the information of the user data in the index table;
and analyzing the acquired user data to be analyzed to obtain the user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
In an embodiment, the number packet includes at least one number, and each number in the number packet indicates a communication identifier of a user, the communication identifier being any one of a device identifier, an application account, and a routing address.
In an embodiment, before obtaining, upon receiving a data analysis request through the communication interface 702, the identifier of the number packet to be analyzed carried by the request and the target data tag of the requested user data to be analyzed, the processor 701 is further configured to: when detecting that a newly added number packet exists, obtain the newly added number packet, which matches the number packet to be analyzed; obtain a feature tag table generated from user data tagged with the data tags of the target tag category, where the feature tag table includes a mapping relationship between the data tags and information of the user data; and create an index table according to the newly added number packet and the feature tag table.
In an embodiment, the information of the user data includes a number used to identify the information of the user data, and when creating the index table according to the newly added number packet and the feature tag table, the processor 701 is specifically configured to: obtain the identifier of the newly added number packet and the numbers in it; add number packet marks to the feature tag table so that the information of the user data meeting a preset condition is marked with the identifier of the newly added number packet, where the preset condition is that the number identifying the information is a number in the newly added number packet; and create the index table according to the feature tag table with the number packet marks added.
In an embodiment, when the processor 701 creates the index table according to the feature tag table with the number packet marks added, it is specifically configured to create an inverted index table according to that feature tag table, where the inverted index table takes the data tags and the number packet identifiers as attribute values, and takes the information of the user data as the data having those attribute values.
In an embodiment, when obtaining the target tag category corresponding to the target data tag, the processor 701 is specifically configured to query a preset tag dictionary for at least one tag category corresponding to the target data tag, where the target tag category is any one of the at least one tag category.
In an embodiment, the information of the user data includes a storage address of the user data indicated by the information of the user data, and when the processor 701 acquires the user data to be analyzed matching with the identifier of the number packet to be analyzed and the target data tag according to the index table, the processor 701 is specifically configured to acquire the information of the target user data matching with the identifier of the number packet to be analyzed and the target data tag according to the index table, acquire the storage address of the target user data from the information of the target user data, where the target user data is the user data indicated by the information of the target user data, acquire the target user data according to the storage address, and use the target user data as the user data to be analyzed.
In a specific implementation, the processor 701, the communication interface 702, and the memory 703 described in this embodiment of the present invention may execute the implementation of the server described in the data analysis method provided in the embodiments of the present invention, and may also execute the implementation described for the data analysis device provided in the embodiments of the present invention, which is not described herein again.
The embodiment of the invention acquires the identification of the number packet to be analyzed and the target data label in response to the data analysis request, acquires the index table matched with the target label type corresponding to the target data label, acquires the user data to be analyzed matched with the identification and the target data label according to the index table, and analyzes the user data to be analyzed to obtain the user attribute characteristics, thereby realizing the real-time analysis of the user data and effectively improving the analysis efficiency of the user data.
Embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the data analysis method according to the embodiments of the present invention.
Embodiments of the present invention also provide a computer program product containing instructions that, when executed on a computer, cause the computer to perform a data analysis method according to embodiments of the present invention.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because some steps can be performed in other orders or simultaneously according to the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware; the program may be stored in a computer-readable storage medium, which may include flash memory, Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disk, etc.
The above disclosure is intended to be illustrative of only some embodiments of the invention, and is not intended to limit the scope of the invention.

Claims (10)

1. A method of data analysis, the method comprising:
when a data analysis request is received, acquiring an identifier of a number packet to be analyzed carried by the data analysis request and a target data tag of requested user data to be analyzed;
acquiring a target label category corresponding to the target data label, and acquiring an index table corresponding to the target label category, wherein the index table comprises a mapping relation among the data label, the user data information and the number packet identifier;
acquiring user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is user data indicated by the information of the user data in the index table;
and analyzing the acquired user data to be analyzed to obtain the user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
2. The method of claim 1, wherein the number packet includes at least one number, each number in the number packet represents a communication identifier of a user, and the communication identifier is any one of a device identifier, an application account, and a routing address.
3. The method according to claim 1 or 2, wherein before obtaining the identifier of the number packet to be analyzed carried by the data analysis request and the target data tag of the requested user data to be analyzed when receiving the data analysis request, the method further comprises:
when detecting that a newly added number packet exists, acquiring the newly added number packet, wherein the newly added number packet is matched with the number packet to be analyzed;
acquiring a feature tag table generated according to user data with the data tag of the target tag category as a tag, wherein the feature tag table comprises a mapping relation between the data tag and the user data;
and creating an index table according to the newly added number packet and the feature tag table.
4. The method of claim 3, wherein the information of the user data includes a number for identifying the information of the user data, and the creating an index table based on the newly added number packet and the feature tag table comprises:
acquiring the identification of the newly added number packet and the number in the newly added number packet;
adding number packet marks to the feature tag table, so that the information of the user data meeting a preset condition in the feature tag table is marked with the identifier of the newly added number packet, wherein the information of the user data meets the preset condition when the number identifying it is a number in the newly added number packet;
and creating an index table according to the feature tag table with the number packet marks added.
5. The method of claim 4, wherein creating an index table from the feature tag table with number packet tags added comprises:
creating an inverted index table according to the feature tag table with the number packet marks added, wherein the inverted index table takes the data tags and the number packet marks as attribute values, and takes the information of the user data as the data having those attribute values.
6. The method of claim 1, wherein the obtaining of the target tag category corresponding to the target data tag comprises:
querying a preset tag dictionary for at least one tag category corresponding to the target data tag, wherein the target tag category is any one of the at least one tag category.
7. The method of claim 1, wherein the information of the user data includes a storage address of the user data indicated by the information of the user data, and the obtaining the user data to be analyzed that matches the identifier of the number packet to be analyzed and the target data tag according to the index table includes:
acquiring the identifier of the number packet to be analyzed and the information of the target user data matched with the target data label according to the index table;
acquiring a storage address of target user data from the information of the target user data, wherein the target user data is user data indicated by the information of the target user data;
and acquiring the target user data according to the storage address, and taking the target user data as user data to be analyzed.
8. An apparatus for analyzing data, the apparatus comprising:
the acquiring unit is used for acquiring the identifier of the number packet to be analyzed carried by the data analysis request and the target data label of the requested user data to be analyzed when the receiving and sending unit receives the data analysis request;
the acquiring unit is further configured to acquire a target tag category corresponding to the target data tag, and acquire an index table corresponding to the target tag category, where the index table includes a mapping relationship among the data tag, information of the user data, and a number packet identifier;
the processing unit is used for acquiring user data to be analyzed matched with the identification of the number packet to be analyzed and the target data label according to the index table, wherein the user data to be analyzed is user data indicated by the information of the user data in the index table;
the processing unit is further configured to analyze the acquired user data to be analyzed to obtain user attribute characteristics corresponding to the numbers in the number packet to be analyzed.
9. A server, comprising a processor and a memory, the memory storing executable program code, and the processor being configured to invoke the executable program code to perform the data analysis method of any one of claims 1-7.
10. A computer storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the data analysis method of any one of claims 1-7.
CN201910958968.8A 2019-10-10 2019-10-10 Data analysis method, device, server and computer storage medium Active CN110737662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910958968.8A CN110737662B (en) 2019-10-10 2019-10-10 Data analysis method, device, server and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910958968.8A CN110737662B (en) 2019-10-10 2019-10-10 Data analysis method, device, server and computer storage medium

Publications (2)

Publication Number Publication Date
CN110737662A true CN110737662A (en) 2020-01-31
CN110737662B CN110737662B (en) 2024-06-18

Family

ID=69270051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910958968.8A Active CN110737662B (en) 2019-10-10 2019-10-10 Data analysis method, device, server and computer storage medium

Country Status (1)

Country Link
CN (1) CN110737662B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421108A * 2021-05-11 2021-09-21 北京沃东天骏信息技术有限公司 (Beijing Wodong Tianjun Information Technology Co., Ltd.) Method, device and equipment for determining data relationship and storage medium
CN117579456A * 2023-10-18 2024-02-20 中移互联网有限公司 (China Mobile Internet Co., Ltd.) Service message sending method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106959965A * 2016-01-12 2017-07-18 腾讯科技(北京)有限公司 (Tencent Technology (Beijing) Co., Ltd.) Information processing method and server
WO2017125020A1 * 2016-01-22 2017-07-27 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.) Message processing method, device and system
CN107918618A * 2016-10-10 2018-04-17 腾讯科技(北京)有限公司 (Tencent Technology (Beijing) Co., Ltd.) Data processing method and device
CN110020333A * 2017-07-27 2019-07-16 北京嘀嘀无限科技发展有限公司 (Beijing Didi Infinity Technology and Development Co., Ltd.) Data analysing method and device, electronic equipment, storage medium

Also Published As

Publication number Publication date
CN110737662B 2024-06-18

Similar Documents

Publication Publication Date Title
CN109299110B (en) Data query method and device, storage medium and electronic equipment
US11899681B2 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN111459985B (en) Identification information processing method and device
CN108846753B (en) Method and apparatus for processing data
CN109829287A (en) Api interface permission access method, equipment, storage medium and device
CN111339171B (en) Data query method, device and equipment
US11244153B2 (en) Method and apparatus for processing information
CN113076729B (en) Method and system for importing report, readable storage medium and electronic equipment
CN107491463B (en) Optimization method and system for data query
CN110457346A (en) Data query method, apparatus and computer readable storage medium
CN112800197A (en) Method and device for determining target fault information
CN114328632A (en) User data analysis method and device based on bitmap and computer equipment
CN111177481B (en) User identifier mapping method and device
CN108959294B (en) Method and device for accessing search engine
CN110737662A (en) data analysis method, device, server and computer storage medium
CN114238767B (en) Service recommendation method, device, computer equipment and storage medium
CN111858617A (en) User searching method and device, computer readable storage medium and electronic equipment
CN116263659A (en) Data processing method, apparatus, computer program product, device and storage medium
US20140006438A1 (en) Virtual agent response to customer inquiries
CN110674383B (en) Public opinion query method, device and equipment
US12050634B2 (en) Method and apparatus for distributing content across platforms, device and storage medium
CN110781375A (en) User state identification determining method and device
CN115098738A (en) Service data extraction method and device, storage medium and electronic equipment
CN115481026A (en) Test case generation method and device, computer equipment and storage medium
CN117009430A (en) Data management method, device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40021043

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant