CN107908606A - Method and system for automatic report generation based on different information sources - Google Patents

Method and system for automatic report generation based on different information sources

Info

Publication number
CN107908606A
Authority
CN
China
Prior art keywords
data
user
label
information
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711055134.3A
Other languages
Chinese (zh)
Inventor
王盼
李晨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201711055134.3A
Publication of CN107908606A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Business, Economics & Management (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and system for automatically generating reports based on different information sources. The method includes: step 1, obtaining static information data of a user from a first information source; step 2, obtaining dynamic information data of the user from a second information source; step 3, analyzing the obtained static and dynamic information data, performing data cleaning, and, after filtering/formatting, obtaining the data needed to generate the report, including multiple labels of the user; step 4, calculating a weight for each label of the user according to the data and labels obtained in step 3; step 5, combining the calculation results of step 4 with the data of step 3 to form a data set covering each dimension of the user. The invention takes the different sources of the data into account and processes them differently, which improves the granularity and accuracy of processing; it takes the timeliness of user data into account and establishes a data update mechanism; and, by using the user portrait as reference information, it greatly speeds up the handling of related business, so that customer relationship management is more accurate.

Description

Method and system for automatic report generation based on different information sources
Technical field
The present invention relates to the field of Internet service technology, and more particularly to a method for automatically generating reports based on different information sources.
Background
With the explosive growth of the Internet, massive amounts of data are produced every day. Analyzing this data effectively and extracting useful patterns or information from it has become a trend.
In traditional methods, user behavior must be classified and judged manually so that targeted services or other responses can be provided later. With massive data, which often has many dimensions and a large volume, it is difficult to compile complete statistics on the indicators related to user behavior by hand. Moreover, because people become fatigued, the accuracy of such manual identification is not high.
As the Internet enters the big data era, all user behavior becomes visible to service providers. Service providers increasingly focus on how to use big data for precision marketing and to mine potential business value in depth. Big data lets service providers conveniently obtain broader user feedback through the Internet, which provides a sufficient data basis for further analyzing, accurately and quickly, important business information such as users' behavior habits and consumption habits. As the understanding of users deepened, the concept of the "user portrait" (User Profile) emerged. A user portrait abstracts an overall picture of a user's information through user labels. A typical user portrait labels the user's information: after collecting and analyzing data on the main information about a consumer, such as social attributes, living habits and consumption behavior, the service provider abstracts an overall business picture of the user. This is regarded as the basic way in which service providers apply big data technology.
At present, however, identifying user labels and building user portraits is still done mainly through manual intervention and simple computer processing and conversion, which has the following shortcomings: 1. it is time-consuming; 2. labor costs are high; 3. the generated results are not intuitive enough; 4. manual data entry carries a risk of errors.
In addition, excessive reliance on the individual judgment of back-office staff can lead to large differences between user portrait results, and the timeliness of labels is not considered, so the resulting user portraits are not accurate enough.
In the prior art, the methods for classifying and predicting user behavior are relatively simple and their combined effect is unsatisfactory. Because user behavior includes both online and offline behavior, the data sources are complex. There is therefore a need for a scheme that can handle different data sources, combine multiple classification and prediction techniques to comprehensively judge and predict user attributes, and generate user portraits.
Summary of the invention
In view of this, how to capture information quickly and generate intuitive, easy-to-understand charts (user labels and user portraits) as a basis for decision-makers has become an important problem. The applicant aggregates and analyzes multiple data sources and automatically generates various charts for specified business scenarios, so that decision-makers can make decisions quickly.
The main object of the present invention is to provide a method for automatically generating various charts based on different information sources. The method can use Scrapy to crawl relevant website information and combine it with existing business data; parse and classify the data from the different sources using the PageRank algorithm from web data mining together with classification algorithms; generate descriptions using cluster analysis; and call Python Charts to generate user labels, which are further abstracted and aggregated into a user portrait.
According to an embodiment of the present invention, there is provided a method for automatically generating reports based on different information sources, comprising:
Step 1, obtaining static information data of a user from a first information source;
Step 2, obtaining dynamic information data of the user from a second information source;
Step 3, analyzing the obtained static and dynamic information data, performing data cleaning, and, after filtering/formatting, obtaining the data needed to generate the report, including multiple labels of the user;
Step 4, calculating a weight for each label of the user according to the data and labels obtained in step 3;
Step 5, combining the calculation results of step 4 with the data of step 3 to form a data set covering each dimension of the user.
According to an embodiment of the present invention, the first information source is user data from the business server, and the second information source includes user behavior data obtained from third parties, offline business data, and user behavior data from the business server; the user behavior data includes user behavior data captured from third-party websites using Scrapy.
According to an embodiment of the present invention, in step 4 the label weight is determined as follows:
Label weight = decay factor × behavior weight × network address weight,
where the decay factor is determined by the time of the behavior in the user behavior data associated with each label of the user, the behavior weight is determined by the behavior category of that user behavior data, and the network address weight is determined by the information source associated with each label of the user.
According to an embodiment of the present invention, step 3 includes: performing variable interval processing on the static and dynamic information data, in which intervals are delimited for the behavior data serving as variables according to business rules, and the delimited intervals are mapped to values with business meaning (indicators) that serve as subsequent numerical input.
According to an embodiment of the present invention, the behavior category is determined by the following steps:
calculating the attributes of the user behavior data in each preset dimension;
selecting a corresponding classification model according to the source of the user behavior data and the attributes corresponding to that source;
classifying the user behavior data according to the selected classification model.
According to an embodiment of the present invention, the step of determining the behavior category further includes:
identifying the user's identity and obtaining the attributes of the user behavior data in each dimension; if the attributes of the user behavior data are incomplete in some dimensions, calling the user's historical behavior data, merging it with the user behavior data of the preset period, and supplementing the attributes of those dimensions.
According to an embodiment of the present invention, a decision tree classification model is selected for user behavior data from the business server, and a random forest classification model is selected for offline business data and/or user behavior data obtained from third parties.
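The source-dependent choice of model described above can be sketched as follows. This is only an illustration: the source identifiers (`business_server`, `offline_business`, `third_party`) and the function `select_model` are hypothetical names, and the returned strings merely stand in for whatever decision tree or random forest implementation is actually used.

```python
from typing import Dict

# Hypothetical identifiers for the three kinds of data source named in the text.
BUSINESS_SERVER = "business_server"
OFFLINE_BUSINESS = "offline_business"
THIRD_PARTY = "third_party"

# Mapping stated in the embodiment: a decision tree for business-server data,
# a random forest for offline business data and third-party data.
MODEL_BY_SOURCE: Dict[str, str] = {
    BUSINESS_SERVER: "decision_tree",
    OFFLINE_BUSINESS: "random_forest",
    THIRD_PARTY: "random_forest",
}


def select_model(source: str) -> str:
    """Return the classification model family to use for a given data source."""
    if source not in MODEL_BY_SOURCE:
        raise ValueError(f"unknown data source: {source!r}")
    return MODEL_BY_SOURCE[source]
```

Keeping the mapping in a table rather than in branching code makes it easy to add a further source or to swap the model assigned to one.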
According to an embodiment of the present invention, in step 4 each label of the user has different weight values in different business scenarios; in step 5, from the data set covering each dimension of the user, the highest-weighted user labels are selected according to the current business scenario and visualized to generate the user portrait.
According to an embodiment of the present invention, there is provided a system for automatically generating reports based on different information sources, comprising:
a first acquisition module 101, configured to obtain static information data of a user from a first information source;
a second acquisition module 102, configured to obtain dynamic information data of the user from a second information source;
a data analysis module 103, configured to analyze the obtained static and dynamic information data, perform data cleaning, and, after filtering/formatting, obtain the data needed to generate the report, including multiple labels of the user;
a weight calculation module 104, configured to calculate a weight for each label of the user according to the data obtained by the first acquisition module 101 and the second acquisition module 102 and the labels generated by the data analysis module 103;
a data combination module 105, configured to combine the calculation results of the weight calculation module 104 with the data obtained by the data analysis module 103 to form a data set covering each dimension of the user.
According to an embodiment of the present invention, there is provided a computer-readable storage medium storing a program for the method of automatically generating reports based on different information sources; when the program is executed by a processor, the steps of the method are implemented.
The beneficial effects of the present invention are mainly as follows: the different sources of the data are taken into account and processed differently, which improves the granularity and accuracy of processing; the timeliness of user data is taken into account and a data update mechanism is established; using the user portrait as reference information greatly speeds up the handling of related business; and different classification models can be selected, cascaded and/or run in parallel according to the source of the sample data, so that customer relationship management is more accurate.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for automatically generating reports based on different information sources according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a user classification result according to an embodiment of the present invention;
Fig. 3 is another schematic diagram of a user classification result according to an embodiment of the present invention;
Fig. 4 is a flow diagram of classifying user behavior based on decision tree induction according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of user labels generated for a shopping scenario according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the composition of the system for automatically generating reports based on different information sources according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the operating environment of a system on which the application program is installed according to an embodiment of the present invention.
Detailed description
The implementation of the technical solution is described in further detail below with reference to the accompanying drawings.
Those skilled in the art will appreciate that although the following description involves many technical details of embodiments of the present invention, these are only examples used to illustrate the principle of the invention and do not imply any limitation. The invention can be applied in settings other than the technical details exemplified below without departing from its principle and spirit.
In addition, to keep this specification from becoming unduly long, some technical details that can be found in prior art materials may have been omitted, simplified or adapted in the description. Those skilled in the art will understand this, and it does not affect the sufficiency of disclosure of this specification.
Embodiments for carrying out the present invention are described below, in the following order: 1. the method for automatically generating reports based on different information sources (Fig. 1); 2. the method for determining user behavior categories (Figs. 2 to 5); 3. the system on which the application program is installed according to an embodiment of the present invention (Fig. 6).
1. Method for automatically generating reports based on different information sources
As shown in Fig. 1, according to an embodiment of the present invention, there is provided an automatic report generation method, comprising:
Step S100: obtain static and dynamic information data of a user from different information sources.
Static information data is information about the user that is relatively stable (not prone to change over time). It comes mainly from public data and includes, for example, demographic attributes and commercial attributes. Information of this kind carries its own labels; if the enterprise already holds the real information, little modeling or prediction is needed and the work is mostly data cleaning (filtering, screening).
Dynamic information data is information about the user that changes constantly, including the user's behavior. In a broad sense, everything the user does is user behavior: opening a web page, buying a cup, walking the dog in the evening, yawning, and so on. At present, user behavior tends to be concentrated on the Internet (e.g. e-commerce and social networks), so the behavior considered can be narrowed to a smaller scope, for example posting a microblog about shoe quality or liking a microblog post about the "Double 11 big promotion". Such actions can be regarded as Internet user behavior. Behavior on the Internet is considered the main data source of the user's dynamic information.
Step S200: analyze the obtained information data, perform data cleaning (filtering), and, after filtering/formatting, obtain the data needed to generate the report, including the labels of each user.
The goal of the user portrait is to analyze user behavior and finally attach labels to each user, each label having a corresponding weight. A label characterizes content: the content the user is interested in, prefers, or needs. A weight characterizes a degree: the user's level of interest or preference, or possibly the degree of the user's need; it can be understood simply as a confidence or probability.
Step S300: according to the data and labels obtained in the above steps, calculate a weight for each user's labels (the formula is given below).
The user data model can be summarized as: user identifier + time + behavior type + contact point (network address + content), that is, which user did what, at what time and in what place.
The weight of a user label may decay as time passes, so time is defined as a decay factor r. The behavior type and the network address determine the weight, and the content determines the label. This can be further converted into the formula:
Label weight = decay factor × behavior weight × information source weight
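A minimal sketch of the data model and the weight formula follows. The text does not specify how the decay factor r is computed, so the exponential half-life decay used here is an assumption, as are the class `BehaviorEvent`, the 30-day half-life, and every value in the two weight tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BehaviorEvent:
    """User identifier + time + behavior type + contact point (source + content)."""
    user_id: str
    timestamp: datetime
    behavior_type: str  # e.g. "browse", "search", "purchase"
    source: str         # where the behavior happened (network address / information source)
    label: str          # label derived from the content


# Hypothetical weights; real values would come from business rules.
BEHAVIOR_WEIGHT = {"browse": 0.3, "search": 0.5, "purchase": 1.0}
SOURCE_WEIGHT = {"own_site": 1.0, "third_party": 0.6}


def decay_factor(event_time: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Assumed form of r: the factor halves every `half_life_days` days."""
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def label_weight(event: BehaviorEvent, now: datetime) -> float:
    """Label weight = decay factor x behavior weight x information source weight."""
    return (
        decay_factor(event.timestamp, now)
        * BEHAVIOR_WEIGHT[event.behavior_type]
        * SOURCE_WEIGHT[event.source]
    )


if __name__ == "__main__":
    now = datetime(2017, 11, 1)
    event = BehaviorEvent("u1", now - timedelta(days=30), "purchase", "own_site", "shoes")
    print(label_weight(event, now))
```

With these assumed values, a purchase on the provider's own site made exactly one half-life ago contributes half of the weight it would contribute if made today.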
Step S400: combine the calculation results of the above steps with the data of step S100 to form a data set covering each dimension of the user.
Step S500: analyze and classify the data in the data set, generate user descriptions using cluster analysis, and finally generate the user report.
Specifically, the data in the database can be analyzed and classified using the PageRank algorithm from web data mining together with classification algorithms; descriptions are then generated using cluster analysis, and Python Charts is called to generate the report. Examples of user classification results are shown in Figs. 2 and 3.
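For reference, the PageRank algorithm mentioned above can be sketched in pure Python as a power iteration. The text does not say which graph PageRank is applied to; the sketch assumes a link graph between crawled pages, and the function name, the damping value of 0.85 and the iteration count are illustrative choices rather than anything specified by the embodiment.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a graph given as {page: [pages it links to]}."""
    nodes = set(links) | {target for targets in links.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Pages with higher rank could then be given a higher information source weight, although that link between the two steps is an assumption and is not stated in the text.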
Fig. 2 shows the user classification results as a histogram, from which the number and proportion of each type of user can be seen.
Fig. 3 shows the feature distribution of each type of user as a radar chart, for three representative classes of users selected from Fig. 2. Features A, B and C can be, for example, age, region (regional consumption level), location (regional population), or other features.
Optionally, step S100 includes:
S101: preprocess (clean and screen) the static and dynamic information data, and obtain the user's behavior data in each preset behavior category from the preprocessed network access information, so that the behavior data obtained for the same category has the same format.
In step S101, the network access information can be preprocessed in order to extract the behavior data of each category. Preprocessing of the network access information includes variable collection, variable interval processing, min-max rule processing, missing value processing, format processing, and so on.
Variable collection extracts from the network access information, for each network access by the user, the access time, login time, browsing information, search information, purchase information and so on, for example the access time, login time, browsing information, search information and purchase information when visiting a specific e-commerce website. Once the server has collected this information for each of the user's visits, it can call a corresponding accumulator or calculator to count the user's number of logins, purchases, page views and searches, the purchase amount, and so on within a preset time period.
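The counting performed by the accumulator can be sketched as below. The event layout (a dict with a `type` key and an optional `amount`) and the function name `collect_variables` are assumptions made for illustration; the events are taken to be already restricted to the preset time period.

```python
from collections import Counter
from typing import Dict, List


def collect_variables(events: List[dict]) -> Dict[str, float]:
    """Count logins, purchases, page views and searches, and sum the purchase amount."""
    counts = Counter(event["type"] for event in events)
    purchase_amount = sum(
        event.get("amount", 0.0) for event in events if event["type"] == "purchase"
    )
    return {
        "login_count": counts["login"],
        "purchase_count": counts["purchase"],
        "browse_count": counts["browse"],
        "search_count": counts["search"],
        "purchase_amount": purchase_amount,
    }
```

A `Counter` returns 0 for behavior types that never occur, so every variable is present in the result even for inactive users.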
Variable interval processing delimits intervals for each variable according to business rules and maps the delimited intervals to values with business meaning, which serve as subsequent numerical input, for example for calculating features such as user behavior entropy. For example, the user's login count and purchase amount above can each be assigned to one of multiple intervals, each of which corresponds to a specific value, such as a standardized index (0 to 100) of the user behavior related to that count or amount.
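As an illustrative sketch of this interval mapping, the following uses the standard `bisect` module. The interval boundaries and index values for the login count are hypothetical; in practice they would be set by business rules.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical business rule for login counts:
# [0, 1) -> 0, [1, 5) -> 30, [5, 20) -> 60, [20, inf) -> 100
LOGIN_BOUNDS = [1, 5, 20]
LOGIN_INDEX = [0, 30, 60, 100]


def to_index(value: float, bounds: Sequence[float], index_values: Sequence[int]) -> int:
    """Map a raw value to the standardized index of the interval it falls into."""
    if len(index_values) != len(bounds) + 1:
        raise ValueError("need exactly one index value per interval")
    return index_values[bisect_right(bounds, value)]
```

For example, `to_index(4, LOGIN_BOUNDS, LOGIN_INDEX)` falls into the interval [1, 5) and yields 30.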
Min-max rule processing handles the magnitudes of the values contained in the collected network access information, in order to reduce the interference of abnormal data with the judgment of the user's behavior category. Specifically, min-max rule processing can be applied to the user's age in the collected network access information: for example, ages of -1, 0 or 999, which clearly do not match a normal user age, are subjected to min-max rule processing.
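The text does not state what the min-max rule does with an out-of-range value. One plausible reading, sketched below as an assumption, is to treat such a value as invalid so that it can be handled later as a missing value; the bounds `MIN_AGE` and `MAX_AGE` are likewise hypothetical.

```python
from typing import Optional

# Hypothetical plausible age range; the actual bounds would be a business decision.
MIN_AGE = 1
MAX_AGE = 120


def apply_min_max_rule(age: Optional[int]) -> Optional[int]:
    """Return the age if it lies within [MIN_AGE, MAX_AGE], otherwise None."""
    if age is None or not (MIN_AGE <= age <= MAX_AGE):
        return None
    return age
```

Clamping to the nearest bound would be an alternative, but it would turn an obviously wrong age such as 999 into a believable one, which is why the sketch discards the value instead.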
Missing value processing applies when behavior data in a preset behavior category is absent from the collected network access information. The missing data can, for example, be marked as "0" or replaced with other information. For example, when a user visits a shopping website anonymously or without logging in with a user name, the login information recorded by the server for that user is missing. The server can perform missing value processing on this kind of information, for instance by obtaining the unique identifier of the user's access terminal and associating it with the user as the login name.
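Both kinds of replacement can be sketched in a few lines. The record layout and the field names (`login_name`, `terminal_id`, `login_count`, `purchase_count`) are hypothetical.

```python
from typing import Dict, Sequence


def fill_missing(record: Dict[str, object],
                 count_fields: Sequence[str] = ("login_count", "purchase_count")) -> Dict[str, object]:
    """Return a copy of the record with missing counts set to 0 and a missing
    login name replaced by the unique identifier of the access terminal."""
    filled = dict(record)
    for field in count_fields:
        if filled.get(field) is None:
            filled[field] = 0
    if not filled.get("login_name"):
        filled["login_name"] = filled.get("terminal_id")
    return filled
```

The function works on a copy, so the raw record stays available if a different replacement rule is needed later.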
Format processing handles the format of the time information contained in the network access information so that it is kept consistent. For example, recorded time information such as the user's login time may appear in forms such as 20091011, 2009-10-11 and October 11, 2009; all of these can be converted into a single unified format, such as 20091011.
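A minimal sketch of this normalization with the standard `datetime` module follows. The list of accepted input formats is an assumption based on the three examples in the text; the month-name format relies on the default English ("C") locale.

```python
from datetime import datetime

# Accepted input formats (assumed from the examples in the text).
INPUT_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%B %d, %Y")


def normalize_date(text: str) -> str:
    """Convert a date string in any accepted format to the unified form YYYYMMDD."""
    cleaned = text.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")
```

Strings that match none of the formats raise an error instead of being passed through, so malformed timestamps are noticed during preprocessing.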
2. Method for determining user behavior categories
When generating a user portrait, it may be necessary to classify user behavior and establish its correspondence with user labels. Many classification techniques can be used, such as decision trees, Bayesian networks, neural networks, genetic algorithms and association rules. Among them, the decision tree is a major technique for classification and prediction; decision tree learning is an instance-based inductive learning algorithm. It focuses on inferring classification rules, expressed as a decision tree, from a set of unordered, irregular instances. It works in a top-down recursive manner: attribute values are compared at the internal nodes of the tree, the branch to follow downward from each node is chosen according to the attribute value, pruning is then performed, and the classification result is obtained at a leaf node. Each path from the root to a leaf node thus corresponds to a conjunctive rule, and the whole tree corresponds to a set of disjunctive rules.
Taking the decision tree as an example, the following illustrates how to classify (predict) user behavior so that various user labels can be generated according to the business scenario.
As shown in Fig. 4, the classification and prediction method mainly includes the following steps:
S600: obtain the behavior features of the user behavior to be identified and determine the attributes of the behavior features;
S700: load the attributes of the behavior features into the generated decision tree model;
S800: recursively traverse the decision tree model, find the decision leaf node corresponding to the behavior features, and determine the category of the user's network access behavior from that leaf node;
S900: generate user labels for different scenarios according to the determined category.
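The recursive traversal in step S800 can be sketched as follows. The tree is represented as nested dicts, and both its attributes (`purchase_level`, `browse_level`) and its categories are hypothetical; a real model would be learned from data, not written by hand.

```python
from typing import Dict

# Hypothetical, hand-written decision tree. An internal node names the attribute
# it tests and its branches; a leaf node carries the resulting category.
TREE = {
    "attribute": "purchase_level",
    "branches": {
        "high": {"category": "frequent_buyer"},
        "low": {
            "attribute": "browse_level",
            "branches": {
                "high": {"category": "window_shopper"},
                "low": {"category": "inactive"},
            },
        },
    },
}


def classify(node: dict, attributes: Dict[str, str]) -> str:
    """Recursively follow the branch matching each tested attribute until a leaf is reached."""
    if "category" in node:
        return node["category"]
    value = attributes[node["attribute"]]
    if value not in node["branches"]:
        raise ValueError(f"no branch for {node['attribute']}={value!r}")
    return classify(node["branches"][value], attributes)
```

Each root-to-leaf path reads as one conjunctive rule: for instance, low purchase level and high browse level gives `window_shopper`.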
Optionally, step S600 includes identifying the user's identity. Once the user's identity has been identified, the user's historical behavior data is called to fill in the attributes in the multiple dimensions of the user's network behavior features. If the attributes of the user's historical behavior data are incomplete in some dimensions, the incomplete behavior attributes are completed according to a preset rule so as to meet the requirements of the decision tree model.
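The order of precedence described here (current data, then historical data, then a preset rule) can be sketched as a layered merge. The function name and the use of a plain `defaults` dict to stand in for the preset rule are assumptions.

```python
from typing import Dict, Optional


def complete_attributes(current: Dict[str, Optional[str]],
                        history: Dict[str, Optional[str]],
                        defaults: Dict[str, str]) -> Dict[str, str]:
    """Fill attributes missing from the current data with historical values,
    and anything still missing with preset default values."""
    completed = dict(defaults)
    # Later updates take precedence: history overrides defaults, current overrides history.
    completed.update({key: value for key, value in history.items() if value is not None})
    completed.update({key: value for key, value in current.items() if value is not None})
    return completed
```

Starting from the defaults guarantees that every dimension the decision tree model tests has a value, even for a user with no history.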
Optionally, in step S700 there may be one or more generated decision tree models, and one of them can be selected according to the purpose of the classification. Optionally, the multiple generated decision tree models can also be in a multi-level relationship, in which decision tree models of the same or different kinds are cascaded to meet the final classification requirement.
Optionally, in step S800 the category can be a multi-dimensional output, from which user labels can be produced according to a predetermined rule in order to build the user portrait.
Optionally, step S900 includes controlling the user's network access permissions.
Optionally, in step S900 an information knowledge base is called according to the classification result to generate the user labels for each scenario, where the information knowledge base records the relationship between the user's course of action and the purpose of the behavior.
Fig. 5 is a schematic diagram of user labels generated for a shopping scenario according to an embodiment of the present invention. The different user labels are displayed differently according to the weight of each label in that scenario.
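Selecting the labels to display for a scenario can be sketched as a sort by weight. The nested dict layout, the scenario name and the cut-off `k` are hypothetical; how the selected labels are then rendered (for example with a larger font for a larger weight) is left open.

```python
from typing import Dict, List


def top_labels(weights_by_scenario: Dict[str, Dict[str, float]], scenario: str, k: int = 3) -> List[str]:
    """Return the k highest-weighted labels of a user for the given business scenario."""
    weights = weights_by_scenario[scenario]
    return sorted(weights, key=lambda label: weights[label], reverse=True)[:k]
```

Because the weights are stored per scenario, the same user can yield a different set of leading labels for shopping than for, say, lending.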
3rd, the system based on different aforementioned sources automatic report generation
In addition, person of ordinary skill in the field it is understood that various aspects of the invention may be implemented as system, Method or computer program product.Therefore, various aspects of the invention can be implemented as following form:Complete hardware is real Combined in terms of applying mode, complete Software Implementation (including firmware, resident software, microcode etc.), or hardware and software Embodiment, may be collectively referred to as " circuit ", " module " or " system " here.In addition, various aspects of the invention can also be realized For the form of the computer program product in one or more computer-readable mediums, meter is included in the computer-readable medium The readable program code of calculation machine.
Where an embodiment of the present invention is implemented as the above "system", the invention further relates to a system for automatically generating reports based on different information sources, comprising:
a first acquisition module, for obtaining static information data of a user from a first information source;
a second acquisition module, for obtaining dynamic information data of the user from a second information source;
a data analysis module, for analyzing the acquired static and dynamic information data and, after data cleaning and filtering/formatting, obtaining the data needed to generate the report, including multiple labels of the user;
a weight calculation module, for calculating a weight for each label of the user according to the data acquired by the first and second acquisition modules and the labels generated by the data analysis module;
a data combination module, for combining the calculation results of the weight calculation module with the data obtained by the data analysis module to form a data set covering each dimension of the user.
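The cooperation of the modules listed above can be sketched as a small pipeline. This is a minimal sketch under assumptions, not the implementation described in the specification: the record layout, the `category` field, the function names, and the frequency-based weight are all hypothetical placeholders, and the two acquisition modules are represented simply by the `static` and `events` arguments.

```python
from collections import Counter
from typing import Dict, List

def analyze(static: Dict[str, str], events: List[Dict[str, str]]) -> Dict[str, object]:
    """Data analysis module: drop malformed events, normalize text, derive labels."""
    cleaned = [
        {"category": e["category"].strip().lower()}
        for e in events
        if e.get("category", "").strip()  # filter out empty or missing categories
    ]
    labels = sorted({e["category"] for e in cleaned})
    return {"profile": dict(static), "events": cleaned, "labels": labels}

def compute_weights(analysis: Dict[str, object]) -> Dict[str, float]:
    """Weight calculation module (assumption: weight = relative frequency)."""
    counts = Counter(e["category"] for e in analysis["events"])
    total = sum(counts.values())
    return {label: counts[label] / total for label in analysis["labels"]} if total else {}

def combine(analysis: Dict[str, object], weights: Dict[str, float]) -> Dict[str, object]:
    """Data combination module: merge profile data with weighted labels."""
    return {
        "profile": analysis["profile"],
        "labels": [{"label": l, "weight": weights[l]} for l in analysis["labels"]],
    }

def build_report_data(static: Dict[str, str], events: List[Dict[str, str]]) -> Dict[str, object]:
    """Run analysis, weighting, and combination in sequence."""
    analysis = analyze(static, events)
    return combine(analysis, compute_weights(analysis))
```

A call such as `build_report_data({"name": "u1"}, events)` returns one dictionary holding the static profile together with each label and its weight, which corresponds to the data set covering each dimension of the user.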
4. System installed with an application program for implementing embodiments of the present invention
In addition, different embodiments of the invention may also be implemented by software modules or by computer-readable instructions stored on one or more computer-readable media, wherein the computer-readable instructions, when executed by a processor or device component, carry out the different embodiments described herein. Likewise, any combination of software modules, computer-readable media, and hardware components is contemplated by the invention. The software modules may be stored on any type of computer-readable storage medium, such as RAM, EPROM, EEPROM, flash memory, registers, hard disk, CD-ROM, DVD, etc.
Specifically, another aspect of the invention relates to implementing the above embodiments using hardware and/or software. Those skilled in the art will understand that a computing device or one or more processors may be used to implement or perform embodiments of the invention. The computing device or processor may be, for example, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device. Embodiments of the invention may also be performed or embodied by a combination of these devices.
Fig. 7 shows the operating environment of a system with the application program installed according to an embodiment of the present invention.
In this embodiment, the system with the application program installed is installed and runs in an electronic device. The electronic device may be a computing device such as a desktop computer, notebook, palmtop computer, or server. The electronic device may include, but is not limited to, a memory, a processor, and a display. Fig. 6 shows only an electronic device with the above components; it should be understood that not all of the illustrated components are required, and alternative implementations may have more or fewer components.
In some embodiments, the memory may be an internal storage unit of the electronic device, such as its hard disk or internal memory. In other embodiments, the memory may be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. Further, the memory may include both an internal storage unit and an external storage device of the electronic device. The memory is used to store the application software installed on the electronic device and various data, such as the program code of the system with the application program installed. The memory may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory, for example to execute the system with the application program installed.
In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc. The display is used to show the information processed in the electronic device and to present a visual user interface, such as an application menu interface or an application icon interface. The components of the electronic device communicate with one another over a system bus.
From the above, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without departing from the scope of the invention. Those skilled in the art will appreciate that the steps drawn in the flowcharts, and the operations and routines described here, may be varied in many ways. More specifically, the order of steps may be rearranged, steps may be performed in parallel, steps may be omitted, other steps may be included, and routines may be combined or omitted in various ways. Accordingly, the invention is limited only by the appended claims.

Claims (10)

1. A method for automatically generating reports based on different information sources, comprising:
Step 1: obtaining static information data of a user from a first information source;
Step 2: obtaining dynamic information data of the user from a second information source;
Step 3: analyzing the acquired static and dynamic information data and, after data cleaning and filtering/formatting, obtaining the data needed to generate the report, including multiple labels of the user;
Step 4: calculating a weight for each label of the user according to the data and labels obtained in step 3;
Step 5: combining the calculation results of step 4 with the data of step 3 to form a data set covering each dimension of the user.
2. The method according to claim 1, wherein the first information source is user data at the business server, and the second information source includes user behavior data obtained from third parties, offline business data, and user behavior data from the business server, the user behavior data including user behavior data crawled from third-party websites using Scrapy.
3. The method according to claim 2, wherein, in step 4, the label weight is determined as follows:
Label weight = decay factor × behavior weight × network address weight,
wherein the decay factor is determined by the behavior time of the user behavior data involved in each label of the user, the behavior weight is determined by the behavior category of the user behavior data involved in each label of the user, and the network address weight is determined by the information source involved in each label of the user.
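The label weight formula above can be illustrated with a short sketch. The claim fixes only the product of three factors; the exponential half-life form of the decay factor, the half-life of 30 days, and the lookup tables for behavior and source weights are assumptions chosen for illustration.

```python
# Hypothetical weight tables; the claim does not specify concrete values.
BEHAVIOR_WEIGHT = {"purchase": 1.0, "favorite": 0.6, "browse": 0.3}
SOURCE_WEIGHT = {"business_server": 1.0, "offline": 0.8, "third_party": 0.5}

def decay_factor(days_since: float, half_life_days: float = 30.0) -> float:
    """Assumed decay form: the factor halves every `half_life_days` days."""
    return 0.5 ** (days_since / half_life_days)

def label_weight(days_since: float, behavior: str, source: str,
                 half_life_days: float = 30.0) -> float:
    """Label weight = decay factor x behavior weight x network address weight."""
    return (decay_factor(days_since, half_life_days)
            * BEHAVIOR_WEIGHT[behavior]
            * SOURCE_WEIGHT[source])
```

Under these assumed tables, a browse event seen 30 days ago on a third-party site gets 0.5 × 0.3 × 0.5 = 0.075, while a purchase recorded today at the business server keeps the full weight of 1.0.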
4. The method according to claim 1, wherein step 3 includes: performing variable interval processing on the static and dynamic information data, wherein intervals are delimited for the behavior data serving as variables according to business rules, and the delimited intervals are mapped to business indicators to serve as subsequent numerical inputs.
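Variable interval processing of this kind is commonly implemented as binning. The sketch below assumes a single variable (a monthly purchase count), hypothetical interval boundaries, and integer indicator codes; none of these values come from the claims.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical business rule: boundaries for a monthly purchase count.
# Intervals are (-inf, 1), [1, 5), [5, 20), [20, +inf).
EDGES = [1, 5, 20]
# Business indicator assigned to each interval: 0 none, 1 low, 2 medium, 3 high.
INDICATORS = [0, 1, 2, 3]

def to_indicator(value: float,
                 edges: Sequence[float] = EDGES,
                 indicators: Sequence[int] = INDICATORS) -> int:
    """Map a raw behavior value to the indicator of the interval it falls in."""
    return indicators[bisect_right(edges, value)]
```

Because `bisect_right` places a value equal to a boundary into the interval on its right, each interval is closed on the left and open on the right.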
5. The method according to claim 4, wherein the behavior category is determined by the following steps:
calculating the attributes of the user behavior data in each preset dimension;
selecting a corresponding classification model according to the source of the user behavior data and the attributes corresponding to that source;
classifying the user behavior data according to the selected classification model.
6. The method according to claim 5, wherein the step of determining the behavior category further includes:
identifying the user identity and obtaining the attributes of the user behavior data in each dimension; if the attributes of the user behavior data are incomplete in some dimensions, the historical behavior data of the user is retrieved and merged with the user behavior data within a preset time period to supplement the attributes of those dimensions.
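The supplementing of incomplete attributes can be sketched as follows. The sketch assumes that missing attributes are represented as `None`, that the history list is ordered newest first, and that it has already been restricted to the relevant time period; the attribute names in the usage note are hypothetical.

```python
from typing import Dict, List, Optional

Attrs = Dict[str, Optional[str]]

def fill_missing(current: Attrs, history: List[Attrs]) -> Attrs:
    """Fill each missing (None) dimension from the most recent history record
    that has a value for it; dimensions already present are left unchanged."""
    merged = dict(current)
    for dim, value in current.items():
        if value is not None:
            continue
        for past in history:  # newest first (assumption)
            if past.get(dim) is not None:
                merged[dim] = past[dim]
                break
    return merged
```

For example, a current record with `device` and `channel` missing takes `device` from the newest history record that has it and `channel` from an older one if the newest record lacks it. A dimension that is missing everywhere stays `None`.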
7. The method according to claim 5, wherein a decision tree classification model is selected for user behavior data from the business server, and a random forest classification model is selected for offline business data and/or user behavior data obtained from third parties.
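The source-dependent model selection can be expressed as a simple dispatch table. In this sketch the enum members and the returned model names are hypothetical identifiers; in a real system each name would stand for a trained classifier (for instance scikit-learn's `DecisionTreeClassifier` or `RandomForestClassifier`), which is not shown here.

```python
from enum import Enum

class Source(Enum):
    """Where a piece of user behavior data came from (hypothetical names)."""
    BUSINESS_SERVER = "business_server"
    OFFLINE = "offline"
    THIRD_PARTY = "third_party"

# Mapping stated in the claim: decision tree for business-server data,
# random forest for offline business data and third-party data.
MODEL_FOR_SOURCE = {
    Source.BUSINESS_SERVER: "decision_tree",
    Source.OFFLINE: "random_forest",
    Source.THIRD_PARTY: "random_forest",
}

def select_model(source: Source) -> str:
    """Return the name of the classification model family for a data source."""
    return MODEL_FOR_SOURCE[source]
```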
8. The method according to claim 1, wherein, in step 4, each label of the user has different weight values under different business scenarios,
and in step 5, the highest-weighted user labels in the data set covering each dimension of the user are selected according to the current business scenario and visualized to generate a user portrait.
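Selecting the highest-weighted labels for the current business scenario can be sketched as below. The nested dictionary layout, the scenario names, and the cutoff `k` are assumptions; the visualization step itself is not shown.

```python
from typing import Dict, List, Tuple

def top_labels(weights_by_scene: Dict[str, Dict[str, float]],
               scene: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (label, weight) pairs for one scenario, highest weight
    first; ties are broken alphabetically so the result is deterministic."""
    scene_weights = weights_by_scene.get(scene, {})
    ranked = sorted(scene_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

The same label can rank differently per scenario: with weights `{"shopping": {"sports": 0.9, "books": 0.4}, "finance": {"sports": 0.1, "books": 0.7}}`, the top label is `sports` for shopping and `books` for finance.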
9. A system for automatically generating reports based on different information sources, comprising:
a first acquisition module, for obtaining static information data of a user from a first information source;
a second acquisition module, for obtaining dynamic information data of the user from a second information source;
a data analysis module, for analyzing the acquired static and dynamic information data and, after data cleaning and filtering/formatting, obtaining the data needed to generate the report, including multiple labels of the user;
a weight calculation module, for calculating a weight for each label of the user according to the data acquired by the first and second acquisition modules and the labels generated by the data analysis module;
a data combination module, for combining the calculation results of the weight calculation module with the data obtained by the data analysis module to form a data set covering each dimension of the user.
10. A computer-readable storage medium storing a program of the method for automatically generating reports based on different information sources, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
CN201711055134.3A 2017-10-31 2017-10-31 Method and system based on different aforementioned sources automatic report generation Pending CN107908606A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711055134.3A CN107908606A (en) 2017-10-31 2017-10-31 Method and system based on different aforementioned sources automatic report generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711055134.3A CN107908606A (en) 2017-10-31 2017-10-31 Method and system based on different aforementioned sources automatic report generation

Publications (1)

Publication Number Publication Date
CN107908606A true CN107908606A (en) 2018-04-13

Family

ID=61843183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711055134.3A Pending CN107908606A (en) 2017-10-31 2017-10-31 Method and system based on different aforementioned sources automatic report generation

Country Status (1)

Country Link
CN (1) CN107908606A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324857A1 (en) * 2014-04-15 2015-11-12 TapFwd, Inc. Cross-platform advertising systems and methods
CN106469191A (en) * 2016-08-31 2017-03-01 洑云龙 A kind of adaptive user portrait automotive engine system of Behavior-based control scene and method
CN106504099A (en) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 A kind of system for building user's portrait
CN106803190A (en) * 2017-01-03 2017-06-06 北京掌阔移动传媒科技有限公司 A kind of ad personalization supplying system and method
CN106934412A (en) * 2015-12-31 2017-07-07 中国科学院深圳先进技术研究院 A kind of user behavior sorting technique and system

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984668A (en) * 2018-06-29 2018-12-11 深圳鼎盛电脑科技有限公司 A kind of method, apparatus of data processing, equipment and storage medium
CN109255067A (en) * 2018-07-19 2019-01-22 国政通科技有限公司 One kind being based on big data intelligent recommendation method and apparatus
CN109034970A (en) * 2018-07-20 2018-12-18 政采云有限公司 Integrity index evaluation method, device, equipment and access medium
CN109447126A (en) * 2018-09-27 2019-03-08 长威信息科技发展股份有限公司 A kind of method and apparatus of entity and entity attribute dynamic aggregation construction personage's portrait
WO2020082596A1 (en) * 2018-10-23 2020-04-30 深圳壹账通智能科技有限公司 Data processing-based automatic user profile generating method and system
CN109558530A (en) * 2018-10-23 2019-04-02 深圳壹账通智能科技有限公司 User's portrait automatic generation method and system based on data processing
CN109635011A (en) * 2018-10-31 2019-04-16 北京辰森世纪科技股份有限公司 Multistage gauge outfit report processing method, device and equipment based on data service metadata
CN109522333A (en) * 2018-11-23 2019-03-26 北京锐安科技有限公司 Data analysing method, device, equipment and medium
US11176170B2 (en) 2018-11-30 2021-11-16 Advanced New Technologies Co., Ltd. Blockchain-based data processing methods and apparatuses and computer devices
WO2020108153A1 (en) * 2018-11-30 2020-06-04 阿里巴巴集团控股有限公司 Blockchain-based data processing method and apparatus, and computer device
CN109684330A (en) * 2018-12-17 2019-04-26 深圳市华云中盛科技有限公司 User's portrait base construction method, device, computer equipment and storage medium
CN110148049A (en) * 2019-04-15 2019-08-20 深圳壹账通智能科技有限公司 A kind of risk control method, device, computer equipment and readable storage medium storing program for executing
CN110442670A (en) * 2019-06-11 2019-11-12 天津交通职业学院 A kind of consumer representation generation method based on document indexing
CN110442670B (en) * 2019-06-11 2023-05-26 天津交通职业学院 Consumer portrait generation method based on text indexing
CN110287308A (en) * 2019-06-13 2019-09-27 薛映杜 A kind of computer data formula statistical method
CN110347739A (en) * 2019-06-26 2019-10-18 联动优势科技有限公司 A kind of the general data source access method and device of composite data item label
CN110347739B (en) * 2019-06-26 2021-04-20 联动优势科技有限公司 Universal data source access method and device for composite data item label
CN110490729A (en) * 2019-08-16 2019-11-22 南京汇银迅信息技术有限公司 A kind of financial user classification method based on user's portrait model
CN110490729B (en) * 2019-08-16 2022-11-18 南京汇银迅信息技术有限公司 Financial user classification method based on user portrait model
CN111177123A (en) * 2019-12-30 2020-05-19 联想(北京)有限公司 Method, apparatus, electronic device and medium for optimizing tag library
CN111597179A (en) * 2020-05-18 2020-08-28 北京思特奇信息技术股份有限公司 Method and device for automatically cleaning data, electronic equipment and storage medium
CN111597179B (en) * 2020-05-18 2023-12-05 北京思特奇信息技术股份有限公司 Method and device for automatically cleaning data, electronic equipment and storage medium
CN111831636A (en) * 2020-07-28 2020-10-27 平安国际融资租赁有限公司 Data processing method, device, computer system and readable storage medium
CN112182333A (en) * 2020-09-25 2021-01-05 山东亿云信息技术有限公司 Talent space-time big data processing method and system based on random forest
CN112214556A (en) * 2020-09-30 2021-01-12 招商局金融科技有限公司 Label generation method and device, electronic equipment and computer readable storage medium
CN112214556B (en) * 2020-09-30 2024-02-23 招商局金融科技有限公司 Label generation method, label generation device, electronic equipment and computer readable storage medium
CN112818023A (en) * 2021-01-26 2021-05-18 龚世燕 Big data analysis method and cloud computing server in associated cloud service scene
CN113449103A (en) * 2021-01-28 2021-09-28 民生科技有限责任公司 Bank transaction flow classification method and system integrating label and text interaction mechanism
CN113449103B (en) * 2021-01-28 2024-05-10 民生科技有限责任公司 Bank transaction running water classification method and system integrating label and text interaction mechanism
CN113094424A (en) * 2021-04-09 2021-07-09 北京元年科技股份有限公司 Method and system for identifying chart mode by constructing multi-level index system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20180606
Address after: 518052 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong
Applicant after: Shenzhen one ledger Intelligent Technology Co., Ltd.
Address before: 200030 Xuhui District, Shanghai Kai Bin Road 166, 9, 10 level.
Applicant before: Shanghai Financial Technologies Ltd
RJ01 Rejection of invention patent application after publication
Application publication date: 20180413