CN107908606A - Method and system for automatic report generation based on different information sources - Google Patents
Method and system for automatic report generation based on different information sources
- Publication number
- CN107908606A (application number CN201711055134.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- label
- information
- weight
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/177—Editing, e.g. inserting or deleting of tables; using ruled lines
- G06F40/18—Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- General Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Business, Economics & Management (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method and system for automatically generating reports based on different information sources. The method includes: step 1, obtaining static information data of a user from a first information source; step 2, obtaining dynamic information data of the user from a second information source; step 3, analyzing the acquired static and dynamic information data, performing data cleansing, and obtaining after filtering/formatting the data needed to generate the report, including multiple labels of the user; step 4, calculating a weight for each label of the user according to the data and labels obtained in step 3; step 5, combining the calculation result of step 4 with the data of step 3 to form a data set containing each dimension of the user. The invention takes the different sources of the data into account and processes them differently, which improves the granularity and accuracy of processing; it takes the timeliness of user data into account and establishes a data update mechanism; and it uses the user portrait as reference information, which greatly speeds up related business processing and makes customer relationship management more precise.
Description
Technical field
The invention relates to the field of Internet service technology, and more particularly to a method for automatically generating reports based on different information sources.
Background
With the explosive growth of the Internet, massive amounts of data are produced every day. Analyzing this data effectively and finding useful patterns or information in it has become a trend.
In traditional methods, user behavior must be classified and judged manually, and targeted services or other responses are provided afterwards. In massive-data scenarios the data often has many dimensions and a large volume, so it is difficult to compile comprehensive statistics on user-behavior indicators by hand. In addition, because people become fatigued, the accuracy of this traditional manual identification is not high.
As the Internet has gradually entered the big data era, user behavior has become visible to service providers. Service providers have increasingly focused on how to use big data for precision marketing and to mine potential business value more deeply. Big data allows service providers to conveniently obtain broader user feedback over the Internet, which provides a sufficient data basis for analyzing important business information, such as user behavior habits and consumption habits, more precisely and quickly. As the understanding of users deepened, the concept of the user portrait (UserProfile) emerged. It abstracts the overall information picture of a user through user labels and is regarded as the basis for a service provider's use of big data. A typical user portrait labels user information: after collecting and analyzing data on a consumer's main information, such as social attributes, living habits, and consumer behavior, the service provider abstracts a complete business picture of the user. This is regarded as the basic way in which service providers apply big data technology.
At present, however, identifying user labels and building user portraits is still done mainly through manual intervention combined with simple computer processing and conversion, which has the following shortcomings: 1. it is very time-consuming; 2. labor costs are high; 3. the generated results are not intuitive enough; 4. manual data entry carries a risk of error.
In addition, over-reliance on the individual judgment of back-office staff leads to large variation in user-portrait results, and the timeliness of labels is not considered, so the resulting user portraits are not accurate enough.
In the prior art, the methods for classifying and predicting user behavior are relatively simple, and their combined effect is unsatisfactory. Because user behavior includes both online and offline behavior and the data sources are complex, there is a need for a scheme that handles different data sources, combines multiple classification and prediction techniques to judge and predict user attributes comprehensively, and generates user portraits.
Summary of the invention
In view of this, how to capture information quickly and generate intuitive, easy-to-understand charts (user labels and user portraits) as a basis for decision-makers has become an important problem. The applicant aggregates and analyzes multiple data sources and automatically generates various charts according to a specified business scenario, so that decision-makers can make decisions quickly.
The main object of the invention is to provide a method for automatically generating various charts based on different information sources. The method crawls relevant website information using Scrapy and combines it with existing business data; parses and classifies the data from the different sources using the PageRank algorithm and classification algorithms from web data mining; generates descriptions using cluster analysis; and calls Python Charts to generate user labels, which are further abstracted and aggregated into a user portrait.
According to an embodiment of the invention, a method for automatically generating reports based on different information sources is provided, including:
Step 1: obtain static information data of a user from a first information source;
Step 2: obtain dynamic information data of the user from a second information source;
Step 3: analyze the acquired static and dynamic information data, perform data cleansing, and obtain after filtering/formatting the data needed to generate the report, including multiple labels of the user;
Step 4: calculate a weight for each label of the user according to the data and labels obtained in step 3;
Step 5: combine the calculation result of step 4 with the data of step 3 to form a data set containing each dimension of the user.
According to an embodiment of the invention, the first information source is user data at the business server, and the second information source includes user behavior data obtained from third parties, offline business data, and user behavior data at the business server. The user behavior data includes user behavior data crawled from third-party websites using Scrapy.
According to an embodiment of the invention, in step 4 the label weight is determined as follows:
Label weight = decay factor × behavior weight × network address weight,
where the decay factor is determined by the behavior time of the user behavior data involved in each label of the user, the behavior weight is determined by the behavior category of the user behavior data involved in each label of the user, and the network address weight is determined by the information source involved in each label of the user.
According to an embodiment of the invention, step 3 includes variable interval processing of the static and dynamic information data: intervals are delimited, according to business rules, for the behavior data that serves as variables, and the delimited intervals are mapped to business indicators for use as subsequent numerical inputs.
According to an embodiment of the invention, the behavior category is determined by the following steps:
Calculate the attributes of the user behavior data in each preset dimension;
Select a corresponding classification model according to the source of the user behavior data and the attributes corresponding to that source;
Classify the user behavior data according to the selected classification model.
According to an embodiment of the invention, the step of determining the behavior category further includes:
Identifying the user's identity and obtaining the attributes of the user behavior data in each dimension; if the attributes of the user behavior data are incomplete in some dimensions, the user's historical behavior data is retrieved and merged with the user behavior data of the preset period to supplement the attributes in those dimensions.
According to an embodiment of the invention, a decision tree classification model is selected for user behavior data from the business server, and a random forest classification model is selected for offline business data and/or user behavior data obtained from third parties.
According to an embodiment of the invention, in step 4 each label of the user has different weight values in different business scenarios; in step 5, the highest-weighted user labels are selected from the data set of each dimension of the user according to the current business scenario and are visualized to generate the user portrait.
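Selecting the highest-weighted labels for the current business scenario can be sketched as follows. The data layout (a dictionary from label to per-scenario weights), the function name `top_labels`, and the cutoff `k` are assumptions made for illustration; the source does not specify how many labels are kept.

```python
def top_labels(label_weights, scenario, k=3):
    """Return the k highest-weighted (label, weight) pairs for one scenario.

    label_weights maps each label to a dict of {scenario: weight}.
    Labels that have no weight in the given scenario are ignored.
    """
    scored = [
        (label, weights[scenario])
        for label, weights in label_weights.items()
        if scenario in weights
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical example data: the same label carries different weights
# in different business scenarios.
example_weights = {
    "sports shoes": {"shopping": 0.9, "news": 0.1},
    "discount hunter": {"shopping": 0.7},
    "politics": {"news": 0.8},
}
```

With the example data, `top_labels(example_weights, "shopping", k=2)` keeps "sports shoes" and "discount hunter" and drops "politics", which has no weight in the shopping scenario.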
According to an embodiment of the invention, a system for automatically generating reports based on different information sources is provided, including:
First acquisition module 101, for obtaining static information data of a user from a first information source;
Second acquisition module 102, for obtaining dynamic information data of the user from a second information source;
Data analysis module 103, for analyzing the acquired static and dynamic information data, performing data cleansing, and obtaining after filtering/formatting the data needed to generate the report, including multiple labels of the user;
Weight computation module 104, for calculating a weight for each label of the user according to the data acquired by the first acquisition module 101 and the second acquisition module 102 and the labels generated by the data analysis module;
Data combination module 105, for combining the calculation result of the weight computation module 104 with the data obtained by the data analysis module 103 to form a data set containing each dimension of the user.
According to an embodiment of the invention, a computer-readable storage medium is provided, on which is stored a program for the method for automatically generating reports based on different information sources; when the program is executed by a processor, the steps of the method are carried out.
The main beneficial effects of the invention are as follows: the different sources of the data are taken into account and processed differently, which improves the granularity and accuracy of processing; the timeliness of user data is taken into account and a data update mechanism is established; the user portrait is used as reference information, which greatly speeds up related business processing; and different classification models can be selected, cascaded, and/or connected in parallel according to the source of the sample data, which makes customer relationship management more precise.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for automatically generating reports based on different information sources according to an embodiment of the invention;
Fig. 2 is a schematic diagram of user classification results according to an embodiment of the invention;
Fig. 3 is another schematic diagram of user classification results according to an embodiment of the invention;
Fig. 4 is a flow diagram of classifying user behavior based on decision tree induction according to an embodiment of the invention;
Fig. 5 is a schematic diagram of user labels generated in a shopping scenario according to an embodiment of the invention;
Fig. 6 is a structural diagram of the system for automatically generating reports based on different information sources according to an embodiment of the invention;
Fig. 7 is a schematic diagram of the running environment of a system on which an application program is installed according to an embodiment of the invention.
Detailed description
The implementation of the technical solution is described in further detail below with reference to the drawings.
Those skilled in the art will appreciate that although the following description involves many technical details of embodiments of the invention, these are only examples illustrating the principle of the invention and do not imply any limitation. The invention can be applied in settings other than those of the technical details exemplified below, without departing from its principle and spirit.
In addition, to keep the description from becoming overly lengthy, some technical details that can be obtained from prior-art materials may have been omitted, simplified, or adapted in this specification. Those skilled in the art will understand this, and it does not affect the sufficiency of disclosure of this specification.
Embodiments for carrying out the invention are described below, in the following order: 1. the method for automatically generating reports based on different information sources (Fig. 1); 2. the method for determining user behavior categories (Figs. 2 to 5); 3. the system, according to an embodiment of the invention, on which the application program is installed (Fig. 6).
1. Method for automatically generating reports based on different information sources
As shown in Fig. 1, according to an embodiment of the invention, an automatic report generation method is provided, including:
Step S100: obtain the user's static and dynamic information data from different information sources.
Static information data is information about the user that is relatively stable (does not change easily over time). It comes mainly from public data and includes, for example, demographic attributes and commercial attributes. This kind of information carries its own labels: if the enterprise already holds the real information, little modeling or prediction is needed, and the work is mostly data cleansing (filtering, screening).
Dynamic information data is the user's constantly changing information, including the user's behavior information. In a broad sense, a user opening a web page, buying a cup, walking the dog in the evening, yawning, and so on are all user behavior. At present, user behavior tends to be concentrated on the Internet (for example e-commerce and social networks), so attention can be narrowed to a smaller scope: for example, posting a microblog about shoe quality, or liking a microblog post about the "Double 11 big promotion", can be regarded as Internet user behavior. A user's behavior on the Internet is regarded as the main source of the user's dynamic information.
Step S200: analyze the acquired information data, perform data cleansing (filtering), and obtain after filtering/formatting the data needed to generate the report, including the labels of each user.
The goal of the user portrait is to analyze user behavior and ultimately attach labels to each user, each label having a corresponding weight. A label represents content: the content in which the user is interested, or which the user prefers or needs. A weight represents an index: the user's degree of interest or preference, or possibly the user's degree of need. It can be understood simply as a confidence or a probability.
Step S300: calculate a weight for each user's labels according to the data and labels obtained in the above steps (the formula is given below).
The user data model can be summarized as: user identifier + time + behavior type + contact point (network address + content), that is, which user did what, at what time and in what place.
The weight of a user label may decay as time passes, so time is defined as a decay factor r; the behavior type and the network address determine the weight, and the content determines the label. This can be expressed as the formula:
Label weight = decay factor × behavior weight × information source weight
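The formula can be sketched in a few lines. The source says only that the decay factor depends on the behavior time; the exponential half-life form used here, the 30-day default, and the function names are assumptions chosen for illustration, not the patent's stated method.

```python
def decay_factor(days_since_behavior: float, half_life_days: float = 30.0) -> float:
    """Assumed decay: the factor halves every half_life_days (1.0 for behavior today)."""
    return 0.5 ** (days_since_behavior / half_life_days)


def label_weight(days_since_behavior: float,
                 behavior_weight: float,
                 source_weight: float,
                 half_life_days: float = 30.0) -> float:
    """Label weight = decay factor x behavior weight x information source weight."""
    return (decay_factor(days_since_behavior, half_life_days)
            * behavior_weight
            * source_weight)
```

Under these assumptions, a behavior with behavior weight 0.8 from a source with weight 0.5 contributes 0.4 on the day it happens and 0.2 thirty days later, so older behavior gradually loses influence on the label.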
Step S400: combine the calculation result of the above step with the data of step S100 to form a data set containing each dimension of the user.
Step S500: analyze and classify the data in the data set, generate user descriptions using cluster analysis, and finally generate the user report.
Specifically, the data in the database can be analyzed and classified using the PageRank algorithm and classification algorithms from web data mining; descriptions are then generated using cluster analysis, and Python Charts is called to generate the report. Examples of user classification results are shown in Figs. 2 and 3.
Fig. 2 shows the user classification results as a histogram, from which the number and proportion of each type of user can be seen.
Fig. 3 shows the feature distribution of each type of user as a radar chart, for three representative classes of user selected from Fig. 2. Features A, B, and C may be, for example, age, region (regional consumption level), location (regional population), or other features.
Optionally, step S100 includes:
S101: preprocess (cleanse, screen) the static and dynamic information data, and obtain from the preprocessed network access information the user's behavior data in each preset behavior category, so that behavior data of the same category has the same format.
In step S101, the network access information can be preprocessed in order to extract the behavior data of each category. Preprocessing of the network access information includes variable collection, variable interval processing, minimum/maximum rule processing, missing-value processing, and format processing.
Variable collection gathers, from the network access information, the access time, login time, browsing information, search information, and purchase information of each network access by the user, for example the access time, login time, browsing information, search information, and purchase information recorded when the user visits a specific e-commerce website. Once the server has collected this information for each access, it can call a relevant accumulator or calculator to compute the user's login count, purchase count, browse count, search count, purchase amount, and so on within a preset time period.
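The accumulation step can be sketched with a counter over event records. The event layout (dictionaries with `type`, `time`, and an optional `amount`) and the function name `collect_variables` are hypothetical; the source does not describe how the server stores access records.

```python
from collections import Counter
from datetime import datetime


def collect_variables(events, start, end):
    """Count a user's events by type within [start, end) and sum purchase amounts."""
    counts = Counter()
    purchase_amount = 0.0
    for event in events:
        if start <= event["time"] < end:
            counts[event["type"]] += 1
            if event["type"] == "purchase":
                purchase_amount += event.get("amount", 0.0)
    return {
        "login_count": counts["login"],
        "browse_count": counts["browse"],
        "search_count": counts["search"],
        "purchase_count": counts["purchase"],
        "purchase_amount": purchase_amount,
    }


# Hypothetical access records for one user.
events = [
    {"type": "login", "time": datetime(2009, 10, 11, 9, 0)},
    {"type": "browse", "time": datetime(2009, 10, 11, 9, 5)},
    {"type": "purchase", "time": datetime(2009, 10, 11, 9, 30), "amount": 25.0},
    {"type": "login", "time": datetime(2009, 11, 2, 8, 0)},  # outside the window below
]
summary = collect_variables(events, datetime(2009, 10, 1), datetime(2009, 11, 1))
```

Here `summary` covers October 2009 only, so the November login is not counted.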
Variable interval processing delimits intervals for each variable according to business rules and maps the delimited intervals to business indicators for use as subsequent numerical inputs, for example to calculate features such as user behavior entropy. For example, the user's login count and purchase amount can each be assigned to one of several intervals, and each interval corresponds to a specific value, such as a standardized index (0 to 100) of the user behavior related to the count or amount.
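Mapping a raw count to an interval and then to a standardized index can be sketched with a sorted list of boundaries. The boundary values and index values below are invented for illustration; in practice they would come from the business rules the text refers to.

```python
from bisect import bisect_right

# Hypothetical business rule for login counts:
#   fewer than 1 -> 0, 1 to 4 -> 30, 5 to 19 -> 60, 20 or more -> 100
LOGIN_BOUNDS = [1, 5, 20]
LOGIN_INDEX = [0, 30, 60, 100]


def to_index(value, bounds, index_values):
    """Map a raw value to the index of the interval it falls into.

    bounds must be sorted ascending; index_values has one more entry than
    bounds, one for each interval the boundaries create.
    """
    return index_values[bisect_right(bounds, value)]
```

Because the boundaries and index values are plain lists, each variable (login count, purchase amount, and so on) can carry its own pair without changing `to_index`.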
Minimum/maximum rule processing handles the magnitudes of the numerical values contained in the collected network access information, in order to reduce the interference of abnormal data in judging the user's behavior category. Specifically, the minimum/maximum rule can be applied to the user's age in the collected network access information: ages such as -1, 0, or 999, which clearly do not correspond to a normal user age, are processed under the rule.
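A minimum/maximum rule for age can be sketched as a range check. The source does not say what happens to out-of-range values; treating them as missing (`None`) is an assumption made here, as are the bounds 1 and 120 and the function name.

```python
def apply_min_max_rule(value, minimum=1, maximum=120):
    """Return the value if it lies within [minimum, maximum], otherwise None.

    Assumption: abnormal values are treated as missing so that later
    missing-value handling can deal with them; clamping would be another option.
    """
    if value is None or not (minimum <= value <= maximum):
        return None
    return value


raw_ages = [-1, 0, 999, 35]
clean_ages = [apply_min_max_rule(age) for age in raw_ages]
```

Only the plausible age, 35, survives in `clean_ages`; the other three entries become `None`.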
Missing-value processing applies when behavior data in a preset behavior category is absent from the collected network access information. The missing data can, for example, be marked as "0" or replaced with other information. For instance, when a user visits a shopping website anonymously or without logging in with a user name, the login information that the server records for that user is missing. The server can apply missing-value processing to this information, for example by obtaining the unique identifier of the user's access terminal and associating that identifier with the user as the login name.
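Both treatments mentioned above, marking missing counts as 0 and falling back to the terminal's unique identifier for the login name, can be sketched as follows. The field names (`login_name`, `terminal_id`, and the count fields) are hypothetical.

```python
def fill_missing(record, count_fields=("login_count", "purchase_count")):
    """Return a copy of the record with missing values filled in.

    Missing counts are marked as 0; a missing login name is replaced by the
    unique identifier of the access terminal, if one was recorded.
    """
    filled = dict(record)
    for field in count_fields:
        if filled.get(field) is None:
            filled[field] = 0
    if not filled.get("login_name"):
        filled["login_name"] = filled.get("terminal_id")
    return filled


# Hypothetical record for an anonymous visit.
anonymous_visit = {"login_name": None, "terminal_id": "device-42", "purchase_count": 2}
filled_visit = fill_missing(anonymous_visit)
```

The function returns a copy, so the original record is left unchanged for auditing.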
Format processing handles the format of the time information contained in the network access information so that it is uniform. For example, recorded time information such as the user's login time may appear in forms such as 20091011, 2009-10-11, and October 11, 2009; all of it can be converted to a single format, such as 20091011.
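Date normalization of this kind can be sketched by trying a list of known input formats in turn. The list of accepted formats is an assumption (the long English form in particular), and `%B` matches English month names only under an English or C locale.

```python
from datetime import datetime

# Input formats to try, in order. The target format is the compact YYYYMMDD form.
KNOWN_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%B %d, %Y")


def normalize_date(text: str) -> str:
    """Convert a date string in any known format to YYYYMMDD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {text!r}")
```

Unrecognized inputs raise an error rather than being passed through, so malformed timestamps are caught during preprocessing instead of reaching the classifier.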
2. Method for determining user behavior categories
In generating a user portrait, user behavior may need to be classified and mapped to user labels. Many classification techniques can be used, such as decision trees, Bayesian networks, neural networks, genetic algorithms, and association rules. Among these, decision trees are a major technique for classification and prediction. Decision tree learning is an instance-based inductive learning algorithm that infers classification rules, represented as a decision tree, from a set of unordered, irregular instances. It works top-down and recursively: attribute values are compared at the internal nodes of the tree, the branch taken downward from each node is determined by the attribute value, the tree is then pruned, and the classification result is obtained at a leaf node. Each path from the root to a leaf node thus corresponds to a conjunctive rule, and the whole tree corresponds to a set of disjunctive rules.
Taking the decision tree as an example, the following describes how to classify (predict) user behavior so that various user labels can be generated according to the business scenario.
As shown in Fig. 4, the classification and prediction method mainly includes the following steps:
S600: obtain the behavior features of the user behavior to be identified, and determine the attributes of the behavior features;
S700: load the attributes of the behavior features into the generated decision tree model;
S800: recursively traverse the decision tree model to find the leaf node corresponding to the behavior features, and determine the category of the user's network access behavior from that leaf node;
S900: generate user labels for different scenarios according to the determined category.
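The recursive traversal in S800 can be sketched with a small hand-built tree. The node layout, the attribute names, the thresholds, and the category names are all hypothetical; a real model would be learned from data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    category: str  # behavior category decided at this leaf


@dataclass
class Node:
    attribute: str                        # behavior attribute tested at this node
    threshold: float                      # split point for the attribute
    below: Union["Node", Leaf]            # branch taken when value < threshold
    at_or_above: Union["Node", Leaf]      # branch taken when value >= threshold


def classify(tree, features):
    """Recursively walk from the root to a leaf and return its category."""
    if isinstance(tree, Leaf):
        return tree.category
    value = features[tree.attribute]
    branch = tree.below if value < tree.threshold else tree.at_or_above
    return classify(branch, features)


# Hypothetical tree for a shopping scenario.
tree = Node(
    attribute="purchase_count",
    threshold=5,
    below=Node("browse_count", 20, Leaf("casual_visitor"), Leaf("window_shopper")),
    at_or_above=Leaf("frequent_buyer"),
)
```

Each root-to-leaf path reads as one conjunctive rule: for example, "purchase_count < 5 and browse_count >= 20" leads to the category `window_shopper`.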
Optionally, step S600 includes identifying the user's identity. Once the identity has been identified, the user's historical behavior data is retrieved to fill in the attributes of the user's network behavior features in multiple dimensions. If the attributes of the user's historical behavior data are incomplete in some dimensions, the incomplete behavior attributes are completed according to preset rules so as to meet the requirements of the decision tree model.
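The two-stage completion, first from the user's history and then from preset rules, can be sketched as a dictionary merge. Representing the preset rules as a table of default values is an assumption made for illustration, as are the function and field names.

```python
def supplement_attributes(current, history, defaults):
    """Fill missing dimensions of current behavior attributes.

    For every dimension the model requires (the keys of defaults), use the
    current value if present, otherwise the historical value, otherwise the
    preset default.
    """
    merged = dict(current)
    for dimension, default in defaults.items():
        if merged.get(dimension) is None:
            merged[dimension] = history.get(dimension)
        if merged.get(dimension) is None:
            merged[dimension] = default
    return merged


# Hypothetical attributes: the current session lacks two dimensions.
current = {"purchase_count": 2, "browse_count": None}
history = {"browse_count": 14}
defaults = {"purchase_count": 0, "browse_count": 0, "search_count": 0}
completed = supplement_attributes(current, history, defaults)
```

In this example `browse_count` is taken from history and `search_count` falls back to the default, so every dimension the decision tree tests has a value.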
Optionally, in step S700, there may be one or more generated decision tree models, and one of them can be selected according to the classification purpose. Alternatively, the generated decision tree models may form a multi-level relationship, in which decision tree models of the same or different types are cascaded to meet the final classification requirement.
Optionally, in step S800, the category can be a multi-dimensional output, and user labels can be produced according to predetermined rules in order to build the user portrait.
Optionally, step S900 includes controlling the user's network access permissions.
Optionally, in step S900, an information knowledge base is called according to the classification result to generate the user labels for each scenario; the information knowledge base records the relationship between users' action processes and behavior purposes.
Fig. 5 is a schematic diagram of user labels generated in a shopping scenario according to an embodiment of the invention, in which different user labels are displayed differently according to each label's weight in that scenario.
3. System for automatically generating reports based on different information sources
In addition, those of ordinary skill in the art will understand that aspects of the invention can be implemented as a system, a method, or a computer program product. Accordingly, aspects of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to here as a "circuit", "module", or "system". Aspects of the invention can also take the form of a computer program product embodied in one or more computer-readable media that contain computer-readable program code.
Where implemented as the above-mentioned "system" according to an embodiment of the present invention, the invention further relates to a system for automatically generating reports based on different information sources, comprising:
a first acquisition module, for obtaining static information data of a user from a first information source;
a second acquisition module, for obtaining dynamic information data of the user from a second information source;
a data analysis module, for analyzing the acquired static and dynamic information data and performing data cleaning and filtering/formatting to obtain the data needed to generate the report, including multiple labels of the user;
a weight calculation module, for calculating a weight for each label of the user according to the data obtained by the first and second acquisition modules and the labels generated by the data analysis module;
a data combination module, for combining the calculation results of the weight calculation module with the data obtained by the data analysis module to form a data set containing each dimension of the user.
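The five modules listed above can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the function names, the in-memory sample data, and the placeholder weighting rule (each label's share of the cleaned events) are hypothetical and are not taken from the patent.

```python
from collections import Counter

def acquire_static(user_id):
    """First acquisition module: static data (hypothetical in-memory source)."""
    return {"user_id": user_id, "city": "Shenzhen", "nickname": None}

def acquire_dynamic(user_id):
    """Second acquisition module: behavior events (hypothetical sample data)."""
    return [
        {"action": "view", "category": "phones"},
        {"action": "purchase", "category": "phones"},
        {"action": "view", "category": "books"},
        {"action": "view", "category": None},  # dirty record, dropped by cleaning
    ]

def analyze(static, events):
    """Data analysis module: clean both inputs and derive the user's labels."""
    clean_static = {k: v for k, v in static.items() if v is not None}
    clean_events = [e for e in events if e.get("category")]
    labels = sorted({e["category"] for e in clean_events})
    return clean_static, clean_events, labels

def compute_weights(events, labels):
    """Weight calculation module: placeholder rule, share of events per label."""
    counts = Counter(e["category"] for e in events)
    total = sum(counts.values())
    return {label: counts[label] / total for label in labels}

def combine(static, weights):
    """Data combination module: one record holding every dimension of the user."""
    return {**static, "labels": weights}

def build_report_data(user_id):
    """Run the five modules in the order in which the system lists them."""
    static = acquire_static(user_id)
    events = acquire_dynamic(user_id)
    clean_static, clean_events, labels = analyze(static, events)
    return combine(clean_static, compute_weights(clean_events, labels))
```

Calling `build_report_data("u1")` returns a single dictionary that merges the cleaned static fields with a `labels` mapping from label to weight, which corresponds to the "data set containing each dimension of the user" in the text.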
4. System on which an application program for implementing embodiments of the present invention is installed
In addition, the different embodiments of the present invention may also be implemented by software modules or by computer-readable instructions stored on one or more computer-readable media, wherein the computer-readable instructions, when executed by a processor or device component, perform the different embodiments of the invention. Likewise, any combination of software modules, computer-readable media, and hardware components is contemplated by the invention. The software modules may be stored on any type of computer-readable storage medium, such as RAM, EPROM, EEPROM, flash memory, registers, hard disk, CD-ROM, DVD, and the like.
Specifically, another aspect of the invention relates to implementing the above embodiments using hardware and/or software. Those skilled in the art will understand that a computing device or one or more processors may be used to implement or perform embodiments of the invention. The computing device or processor may be, for example, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device. Embodiments of the invention may also be performed or embodied by a combination of these devices.
Fig. 7 shows the operating environment of a system on which an application program is installed according to an embodiment of the invention.
In this embodiment, the system with the installed application program is installed and runs in an electronic device. The electronic device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a server. The electronic device may include, but is not limited to, a memory, a processor, and a display. Fig. 6 shows only an electronic device having the above components; it should be understood that not all of the illustrated components are required, and alternative implementations may have more or fewer components.
In some embodiments the memory may be an internal storage unit of the electronic device, such as the hard disk or internal memory of the electronic device. In other embodiments the memory may be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. Further, the memory may include both an internal storage unit and an external storage device of the electronic device. The memory is used to store the application software installed on the electronic device and various kinds of data, such as the program code of the system with the installed application program. The memory may also be used to temporarily store data that has been output or is about to be output.
In some embodiments the processor may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory or to process data, for example to execute the system with the installed application program.
In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display is used to show the information processed in the electronic device and to present a visual user interface, such as an application menu interface or an application icon interface. The components of the electronic device communicate with one another over a system bus.
From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without departing from the scope of the invention. It will be apparent to those skilled in the art that the steps drawn in the flowcharts and the operations and routines described herein can be varied in many ways. More specifically, the order of the steps may be rearranged, steps may be performed in parallel, steps may be omitted, other steps may be included, and various combinations or omissions of routines may be made. Accordingly, the invention is limited only by the appended claims.
Claims (10)
1. A method for automatically generating reports based on different information sources, comprising:
Step 1: obtaining static information data of a user from a first information source;
Step 2: obtaining dynamic information data of the user from a second information source;
Step 3: analyzing the acquired static and dynamic information data, and performing data cleaning and filtering/formatting to obtain the data needed to generate the report, including multiple labels of the user;
Step 4: calculating a weight for each label of the user according to the data and labels obtained in step 3;
Step 5: combining the calculation results of step 4 with the data of step 3 to form a data set containing each dimension of the user.
2. The method according to claim 1, wherein the first information source is user data of a business server, and the second information source includes user behavior data obtained from third parties, offline business data, and user behavior data of the business server, the user behavior data including user behavior data crawled from third-party websites using Scrapy.
3. The method according to claim 2, wherein, in step 4, the label weight is determined as follows:
label weight = decay factor × behavior weight × network address weight,
wherein the decay factor is determined by the behavior time of the user behavior data involved in each label of the user, the behavior weight is determined by the behavior category of the user behavior data involved in each label of the user, and the network address weight is determined by the information source involved in each label of the user.
4. The method according to claim 1, wherein step 3 includes: performing variable interval processing on the static and dynamic information data, wherein intervals are delimited for the behavior data serving as variables according to business rules, and the delimited intervals are mapped to business indicators so as to serve as subsequent numerical inputs.
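The interval processing in claim 4 amounts to binning a raw variable and mapping each bin to an indicator value. A minimal sketch follows, assuming a hypothetical "monthly spend" variable with two made-up boundaries; the boundary values and the indicator codes are illustrative only and do not come from the patent.

```python
from bisect import bisect_right

# Hypothetical business rule: spend below 100 is "low" (0),
# from 100 up to but not including 1000 is "medium" (1), otherwise "high" (2).
SPEND_BOUNDS = [100, 1000]
SPEND_INDICATORS = [0, 1, 2]

def to_indicator(value, bounds=SPEND_BOUNDS, indicators=SPEND_INDICATORS):
    """Map a raw value to the indicator of the interval it falls into."""
    return indicators[bisect_right(bounds, value)]
```

Because `bisect_right` places a value equal to a boundary into the upper interval, a spend of exactly 100 is treated as "medium" in this sketch; a different business rule could use `bisect_left` instead.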
5. The method according to claim 4, wherein the behavior category is determined by the following steps:
computing the attributes of the user behavior data in each preset dimension;
selecting a corresponding classification model according to the source of the user behavior data and the attributes corresponding to that source;
classifying the user behavior data according to the selected classification model.
6. The method according to claim 5, wherein the step of determining the behavior category further includes:
identifying the user's identity and obtaining the attributes of the user behavior data in each dimension; if the user behavior data is incomplete in the attributes of some dimensions, retrieving the user's historical behavior data, merging it with the user behavior data of the preset time period, and supplementing the attributes of those dimensions.
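The supplementing step in claim 6 can be sketched as filling missing attributes from the same user's history. This is a sketch under assumptions: records are plain dictionaries, a missing attribute is represented by `None` or an absent key, the history list is ordered from oldest to newest, and the most recent known value wins. None of these representation choices is stated in the patent.

```python
def supplement_attributes(current, history, dimensions):
    """Fill attributes missing in `current` from the user's historical records.

    `history` is assumed to be ordered oldest to newest, so it is scanned in
    reverse to prefer the most recent known value for each dimension.
    """
    merged = dict(current)
    for dim in dimensions:
        if merged.get(dim) is not None:
            continue  # attribute already present, nothing to supplement
        for past in reversed(history):
            if past.get(dim) is not None:
                merged[dim] = past[dim]
                break
    return merged
```

Dimensions that are missing both in the current record and in every historical record stay unset, so a later step can still detect that the attribute is unknown.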
7. The method according to claim 5, wherein a decision tree classification model is selected for the user behavior data of the business server, and a random forest classification model is selected for offline business data and/or user behavior data obtained from third parties.
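The source-based model selection in claim 7 reduces to a lookup from data source to model type. The sketch below only returns a model name; the source keys and model names are hypothetical identifiers, and wiring them to actual classifiers (for example scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier`) is left out because the patent does not name a library.

```python
# Hypothetical source identifiers mapped to the model types named in claim 7.
MODEL_BY_SOURCE = {
    "business_server": "decision_tree",
    "offline": "random_forest",
    "third_party": "random_forest",
}

def select_model(source):
    """Return the classification model type to use for a given data source."""
    try:
        return MODEL_BY_SOURCE[source]
    except KeyError:
        raise ValueError(f"unknown data source: {source!r}") from None
```

Keeping the mapping in a table rather than in branching code makes it easy to add a new source or to change the model assigned to an existing one.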
8. The method according to claim 1, wherein, in step 4, each label of the user has a different weight value under different business scenarios,
and in step 5, from the data set of each dimension of the user, the highest-weighted user labels are selected according to the current business scenario and visualized to generate a user portrait.
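The selection step in claim 8 can be illustrated by ranking a user's labels by their weight in the current scenario and keeping the top few. The nested-dictionary layout, the scenario names, and the cut-off `k` are assumptions for this sketch; the visualization itself is not shown.

```python
def top_labels(weights_by_scenario, scenario, k=3):
    """Return the k highest-weighted labels for one business scenario.

    Ties are broken alphabetically so the result is deterministic.
    """
    weights = weights_by_scenario[scenario]
    ranked = sorted(weights, key=lambda label: (-weights[label], label))
    return ranked[:k]
```

The same label can rank differently across scenarios, which is why the weights are keyed by scenario first and by label second.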
9. A system for automatically generating reports based on different information sources, comprising:
a first acquisition module, for obtaining static information data of a user from a first information source;
a second acquisition module, for obtaining dynamic information data of the user from a second information source;
a data analysis module, for analyzing the acquired static and dynamic information data and performing data cleaning and filtering/formatting to obtain the data needed to generate the report, including multiple labels of the user;
a weight calculation module, for calculating a weight for each label of the user according to the data obtained by the first and second acquisition modules and the labels generated by the data analysis module;
a data combination module, for combining the calculation results of the weight calculation module with the data obtained by the data analysis module to form a data set containing each dimension of the user.
10. A computer-readable storage medium storing a program of the method for automatically generating reports based on different information sources, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711055134.3A CN107908606A (en) | 2017-10-31 | 2017-10-31 | Method and system based on different aforementioned sources automatic report generation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107908606A (en) | 2018-04-13 |
Family
ID=61843183
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711055134.3A Pending CN107908606A (en) | 2017-10-31 | 2017-10-31 | Method and system based on different aforementioned sources automatic report generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107908606A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150324857A1 (en) * | 2014-04-15 | 2015-11-12 | TapFwd, Inc. | Cross-platform advertising systems and methods |
CN106504099A (en) * | 2015-09-07 | 2017-03-15 | 国家计算机网络与信息安全管理中心 | A kind of system for building user's portrait |
CN106934412A (en) * | 2015-12-31 | 2017-07-07 | 中国科学院深圳先进技术研究院 | A kind of user behavior sorting technique and system |
CN106469191A (en) * | 2016-08-31 | 2017-03-01 | 洑云龙 | A kind of adaptive user portrait automotive engine system of Behavior-based control scene and method |
CN106803190A (en) * | 2017-01-03 | 2017-06-06 | 北京掌阔移动传媒科技有限公司 | A kind of ad personalization supplying system and method |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984668A (en) * | 2018-06-29 | 2018-12-11 | 深圳鼎盛电脑科技有限公司 | A kind of method, apparatus of data processing, equipment and storage medium |
CN109255067A (en) * | 2018-07-19 | 2019-01-22 | 国政通科技有限公司 | One kind being based on big data intelligent recommendation method and apparatus |
CN109034970A (en) * | 2018-07-20 | 2018-12-18 | 政采云有限公司 | Integrity index evaluation method, device, equipment and access medium |
CN109447126A (en) * | 2018-09-27 | 2019-03-08 | 长威信息科技发展股份有限公司 | A kind of method and apparatus of entity and entity attribute dynamic aggregation construction personage's portrait |
WO2020082596A1 (en) * | 2018-10-23 | 2020-04-30 | 深圳壹账通智能科技有限公司 | Data processing-based automatic user profile generating method and system |
CN109558530A (en) * | 2018-10-23 | 2019-04-02 | 深圳壹账通智能科技有限公司 | User's portrait automatic generation method and system based on data processing |
CN109635011A (en) * | 2018-10-31 | 2019-04-16 | 北京辰森世纪科技股份有限公司 | Multistage gauge outfit report processing method, device and equipment based on data service metadata |
CN109522333A (en) * | 2018-11-23 | 2019-03-26 | 北京锐安科技有限公司 | Data analysing method, device, equipment and medium |
US11176170B2 (en) | 2018-11-30 | 2021-11-16 | Advanced New Technologies Co., Ltd. | Blockchain-based data processing methods and apparatuses and computer devices |
WO2020108153A1 (en) * | 2018-11-30 | 2020-06-04 | 阿里巴巴集团控股有限公司 | Blockchain-based data processing method and apparatus, and computer device |
CN109684330A (en) * | 2018-12-17 | 2019-04-26 | 深圳市华云中盛科技有限公司 | User's portrait base construction method, device, computer equipment and storage medium |
CN110148049A (en) * | 2019-04-15 | 2019-08-20 | 深圳壹账通智能科技有限公司 | A kind of risk control method, device, computer equipment and readable storage medium storing program for executing |
CN110442670A (en) * | 2019-06-11 | 2019-11-12 | 天津交通职业学院 | A kind of consumer representation generation method based on document indexing |
CN110442670B (en) * | 2019-06-11 | 2023-05-26 | 天津交通职业学院 | Consumer portrait generation method based on text indexing |
CN110287308A (en) * | 2019-06-13 | 2019-09-27 | 薛映杜 | A kind of computer data formula statistical method |
CN110347739A (en) * | 2019-06-26 | 2019-10-18 | 联动优势科技有限公司 | A kind of the general data source access method and device of composite data item label |
CN110347739B (en) * | 2019-06-26 | 2021-04-20 | 联动优势科技有限公司 | Universal data source access method and device for composite data item label |
CN110490729A (en) * | 2019-08-16 | 2019-11-22 | 南京汇银迅信息技术有限公司 | A kind of financial user classification method based on user's portrait model |
CN110490729B (en) * | 2019-08-16 | 2022-11-18 | 南京汇银迅信息技术有限公司 | Financial user classification method based on user portrait model |
CN111177123A (en) * | 2019-12-30 | 2020-05-19 | 联想(北京)有限公司 | Method, apparatus, electronic device and medium for optimizing tag library |
CN111597179A (en) * | 2020-05-18 | 2020-08-28 | 北京思特奇信息技术股份有限公司 | Method and device for automatically cleaning data, electronic equipment and storage medium |
CN111597179B (en) * | 2020-05-18 | 2023-12-05 | 北京思特奇信息技术股份有限公司 | Method and device for automatically cleaning data, electronic equipment and storage medium |
CN111831636A (en) * | 2020-07-28 | 2020-10-27 | 平安国际融资租赁有限公司 | Data processing method, device, computer system and readable storage medium |
CN112182333A (en) * | 2020-09-25 | 2021-01-05 | 山东亿云信息技术有限公司 | Talent space-time big data processing method and system based on random forest |
CN112214556A (en) * | 2020-09-30 | 2021-01-12 | 招商局金融科技有限公司 | Label generation method and device, electronic equipment and computer readable storage medium |
CN112214556B (en) * | 2020-09-30 | 2024-02-23 | 招商局金融科技有限公司 | Label generation method, label generation device, electronic equipment and computer readable storage medium |
CN112818023A (en) * | 2021-01-26 | 2021-05-18 | 龚世燕 | Big data analysis method and cloud computing server in associated cloud service scene |
CN113449103A (en) * | 2021-01-28 | 2021-09-28 | 民生科技有限责任公司 | Bank transaction flow classification method and system integrating label and text interaction mechanism |
CN113449103B (en) * | 2021-01-28 | 2024-05-10 | 民生科技有限责任公司 | Bank transaction running water classification method and system integrating label and text interaction mechanism |
CN113094424A (en) * | 2021-04-09 | 2021-07-09 | 北京元年科技股份有限公司 | Method and system for identifying chart mode by constructing multi-level index system |
CN116340302A (en) * | 2023-03-30 | 2023-06-27 | 呼和浩特市凡诚电子科技有限公司 | Computer data integration management system and method based on Internet |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20180606. Address after: 518052 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong. Applicant after: Shenzhen one ledger Intelligent Technology Co., Ltd. Address before: 200030 Xuhui District, Shanghai Kai Bin Road 166, 9, 10 level. Applicant before: Shanghai Financial Technologies Ltd |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180413 |