CN112199411B

CN112199411B - Big data analysis method and artificial intelligence platform applied to cloud computing communication architecture

Info

Publication number: CN112199411B
Application number: CN202010964887.1A
Authority: CN
Inventors: 刘明明
Original assignee: Xiamen Limayao Network Technology Co ltd
Current assignee: Xiamen limayao Network Technology Co.,Ltd.
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2021-06-29
Anticipated expiration: 2040-09-15
Also published as: CN112199411A

Abstract

A big data analysis method and an artificial intelligence platform applied to a cloud computing communication architecture are disclosed. A data screening model screens out process redundant data from the business data to be screened to obtain multiple groups of user behavior data. Data nodularization processing is performed on each group of user behavior data according to time node distribution information determined from historical data mining results, yielding multiple node data packets. Behavior feature data are determined in the node data packets that share the same data mining trigger period and are integrated, in the order of the data mining trigger periods, into feature data to be mined. Finally, the feature data to be mined are converted according to a set conversion mode to obtain an input data set, and data mining is performed on the input data set to obtain a corresponding output data set. In this way, the accuracy of the data mining result can be maintained while the size of the feature data to be mined is reduced as far as possible, which improves data mining efficiency.

Description

Big data analysis method and artificial intelligence platform applied to cloud computing communication architecture

Technical Field

The application relates to the technical field of big data analysis and cloud computing processing, in particular to a big data analysis method and an artificial intelligence platform applied to a cloud computing communication architecture.

Background

Cloud computing is a form of distributed computing in which a very large data processing task is decomposed, through a network cloud, into a plurality of smaller block services; these block services are then processed and analyzed by a system composed of multiple servers, and the corresponding data processing results are obtained.

In the big data era, data mining and data analysis must be performed on all kinds of business data in order to obtain the large amount of valuable information behind the data. Artificial intelligence (AI) is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. With the development of science and technology, artificial intelligence is applied in an increasingly wide range of fields, such as robotics, language recognition, image recognition, natural language processing, and expert systems. Applying artificial intelligence technology to big data analysis in a cloud computing environment can improve the accuracy of data mining.

However, as data scale and data volume continue to grow, existing big data analysis technology struggles to balance the time consumed by data analysis against the accuracy of data analysis.

Disclosure of Invention

The specification provides a big data analysis method and an artificial intelligence platform applied to a cloud computing communication architecture, which aim to solve, or at least partially solve, the technical problem that the prior art has difficulty balancing the time consumed by data analysis against the accuracy of data analysis.

In a first aspect of the present specification, a big data analysis method applied to a cloud computing communication architecture is provided, where the method includes:

periodically collecting business data to be screened from an application device side, and screening out process redundant data from the business data to be screened by using a pre-trained data screening model, to obtain a plurality of groups of user behavior data; wherein the business data to be screened is business interaction data that includes process redundant data, and the process redundant data increases the data mining duration;

according to time node distribution information determined based on historical data mining results, performing data nodularization processing on each group of user behavior data to obtain a plurality of node data packets;

determining behavior characteristic data in node data packets with the same data mining triggering time period, and integrating multiple groups of behavior characteristic data into characteristic data to be mined according to the sequence of the data mining triggering time periods;

converting the characteristic data to be mined according to a set conversion mode to obtain an input data set, and performing data mining on the input data set to obtain an output data set corresponding to the input data set; wherein the output data set includes user portrait data corresponding to the user behavior data.

In a second aspect of the present specification, an artificial intelligence platform is provided, where the artificial intelligence platform is in communication connection with an application device, and the artificial intelligence platform is configured to:

In a third aspect of the present description, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the above-mentioned method.

In a fourth aspect of the present specification, an artificial intelligence platform is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program.

Through one or more of the technical solutions of this specification, the following beneficial effects or advantages are obtained:

A data screening model is used to screen out process redundant data from the business data to be screened, yielding multiple groups of user behavior data. Data nodularization processing is performed on each group of user behavior data according to time node distribution information determined based on historical data mining results, yielding multiple node data packets. Behavior feature data are determined in the node data packets that share the same data mining trigger time interval and are integrated into feature data to be mined in the order of the data mining trigger time intervals. The feature data to be mined are then converted according to a set conversion mode to obtain an input data set, and data mining is performed on the input data set to obtain an output data set corresponding to the input data set.

Screening the business data to be screened therefore removes useless process redundant data and reduces the data scale of the user behavior data, and removing the process redundant data does not affect the processing of user behavior data with a higher feature recognition degree. Furthermore, determining and integrating the behavior feature data from node data packets that share the same data mining trigger time interval preserves the feature recognition degree of the feature data to be mined while reducing its data scale and data size. As a result, the accuracy of the data mining result can be maintained while the size of the feature data to be mined is reduced as far as possible, and data mining efficiency is improved.

The above is only an overview of the technical solution of the present specification. Specific embodiments are described below so that the technical means of the present specification, as well as its other objects, features, and advantages, can be understood more clearly.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the specification. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates an architectural diagram of a big data analytics system applied to a cloud computing communication architecture, according to one embodiment of the present description;

FIG. 2 illustrates a flow diagram of a big data analytics method applied to a cloud computing communication architecture, according to one embodiment of the present description;

FIG. 3 illustrates a functional block diagram of a big data analysis apparatus applied to a cloud computing communication architecture according to one embodiment of the present description;

FIG. 4 illustrates a schematic diagram of an artificial intelligence platform in accordance with one embodiment of the present description.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Having researched and analyzed existing big data analysis technology, the inventor found that it does not actively determine and identify data with a higher feature recognition degree when analyzing and mining big data, which makes data analysis and mining time-consuming. If the analysis of some data is skipped in order to reduce the time consumed, data with a higher feature recognition degree may be omitted, which reduces the accuracy of data analysis and mining.

In order to solve the technical problems, embodiments of the present invention provide a big data analysis method and an artificial intelligence platform applied to a cloud computing communication architecture, which can ensure accuracy of a data mining result and improve data mining efficiency on the premise of reducing the size of feature data to be mined as much as possible.

To this end, refer first to FIG. 1, which illustrates an architecture diagram of a big data analysis system 100 applied to a cloud computing communication architecture. The big data analysis system 100 may include an artificial intelligence platform 200 and an application device 400 that communicate with each other. The application device 400 may be a service terminal, such as a mobile phone, a tablet computer, a notebook computer, or an intelligent wearable device, which is not limited herein. The artificial intelligence platform 200 may be a server for data mining, such as a big data platform or a cloud data center.

It can be understood that the big data analysis system 100 can be applied to user portrait mining in blockchain payment, and can also be applied to the Internet of Things, the Internet of Vehicles, smart cities, cloud gaming platforms, virtual reality, online education, online office, smart medical treatment, and the like, which are not limited herein.

On the basis of FIG. 1, FIG. 2 shows a flowchart of a big data analysis method applied to a cloud computing communication architecture. The method may be applied to the artificial intelligence platform in FIG. 1, and may specifically include the following steps S21 to S24.

Step S21, periodically collecting service data to be screened from the application device side, and screening out process redundant data from the service data to be screened by using a pre-trained data screening model to obtain a plurality of groups of user behavior data.

For example, periodically collecting the service data to be screened may be understood as collecting the service data to be screened at a set event interval. The data screening model may be trained on sample service data, and the sample service data may be historical user behavior data. The service data to be screened is service interaction data that includes process redundant data, and the process redundant data increases the data mining duration.

Step S22, performing data nodularization processing on each group of user behavior data according to the time node distribution information determined based on the historical data mining results, to obtain a plurality of node data packets.

For example, historical data mining results are stored in a preset database of the artificial intelligence platform, time node distribution information is used for representing the time node distribution situation during data mining, and data nodularization processing can be understood as data packet compression and encapsulation on user behavior data.

Step S23, determining behavior feature data in node data packets with the same data mining trigger time interval, and integrating multiple groups of behavior feature data into feature data to be mined according to the sequence of the data mining trigger time interval.

For example, the feature data is used to characterize a user profile of the node packet.

Step S24, converting the feature data to be mined according to a set conversion mode to obtain an input data set, and performing data mining on the input data set to obtain an output data set corresponding to the input data set.

For example, the output data set includes user portrait data corresponding to the user behavior data.

When the contents described in the above steps S21 to S24 are applied, the data screening model is first used to screen out the process redundancy data in the service data to be screened to obtain multiple groups of user behavior data, then data nodularization processing is performed on each group of user behavior data according to the time node distribution information determined based on the historical data mining result to obtain multiple node data packets, then behavior feature data are determined in the node data packets with the same data mining trigger time period and are integrated into the feature data to be mined according to the sequence of the data mining trigger time period, finally the feature data to be mined are converted according to a set conversion mode to obtain an input data set, and data mining is performed on the input data set to obtain an output data set corresponding to the input data set.
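As a rough illustration of how steps S21 to S24 chain together, the following Python sketch treats each step as a plain function. All function names, the record layout, and the toy logic inside each step are assumptions made only for illustration: the pre-trained data screening model is replaced by a boolean flag on each record, the time node distribution information by a sorted list of integer time nodes, and data mining by a simple action count. The specification does not prescribe any of these choices.

```python
import bisect
from collections import Counter, defaultdict

# Hypothetical record layout: {"user": str, "ts": int, "action": str, "redundant": bool}
Record = dict


def screen(records: list[Record]) -> dict[str, list[Record]]:
    """S21 stand-in: drop records flagged as redundant, group the rest by user."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        if not rec["redundant"]:  # a trained screening model would decide this
            groups[rec["user"]].append(rec)
    return dict(groups)


def nodularize(group: list[Record], time_nodes: list[int]) -> dict[int, list[Record]]:
    """S22 stand-in: cut one group into node data packets at the given time nodes."""
    packets: dict[int, list[Record]] = defaultdict(list)
    for rec in group:
        packets[bisect.bisect_right(time_nodes, rec["ts"])].append(rec)
    return dict(packets)


def integrate(all_packets: list[dict[int, list[Record]]]) -> list[tuple[int, list[str]]]:
    """S23 stand-in: pool features of packets sharing a period, ordered by period."""
    by_period: dict[int, list[str]] = defaultdict(list)
    for packets in all_packets:
        for period, recs in packets.items():
            by_period[period].extend(rec["action"] for rec in recs)
    return sorted(by_period.items())


def convert_and_mine(features: list[tuple[int, list[str]]]) -> dict[int, dict[str, int]]:
    """S24 stand-in: convert to tuples (the input data set), then count actions."""
    input_set = [(period, tuple(actions)) for period, actions in features]
    return {period: dict(Counter(actions)) for period, actions in input_set}


def run_pipeline(records: list[Record], time_nodes: list[int]) -> dict[int, dict[str, int]]:
    groups = screen(records)
    packets = [nodularize(group, time_nodes) for group in groups.values()]
    return convert_and_mine(integrate(packets))
```

In this sketch, a record flagged as redundant never reaches the later steps, so the work done in S22 to S24 shrinks with the amount of data removed in S21.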

In specific implementations, the inventor found that the user behavior data obtained by screening the service data to be screened is prone to data deviation. The deviation arises because the influence of the process redundant data on the data continuity of the user behavior data is not analyzed, so that all of the process redundant data is removed. To address this technical problem, obtaining multiple groups of user behavior data by screening with the pre-trained data screening model in step S21 may include the following steps S211 to S215.

Step S211, extracting service item category labels of the service data to be screened through a label extraction subnetwork in the data screening model, identifying data streams located under each service item category label from the service data to be screened through a data stream identification subnetwork in the data screening model, integrating the data streams located under each service item category label in the service data to be screened into a first data set, and integrating data except the first data set in the service data to be screened into a second data set.

Step S212, when it is determined, based on the service item category labels, that a periodic call tag and a non-call tag exist in the service data to be screened, determining a time sequence matching coefficient between each second target data stream of the second data set under the non-call tag and each first target data stream of the second data set under the periodic call tag, according to the first target data streams of the second data set under the periodic call tag and the data call heat values of the first target data streams.

Step S213, based on the time sequence matching coefficient, allocating to the periodic call tag each second target data stream of the second data set under the non-call tag that matches, in time sequence, a first target data stream under the periodic call tag; when the non-call tag corresponding to the second data set contains a plurality of time-sequentially continuous data streams, determining a time sequence matching coefficient between the time-sequentially continuous data streams of the second data set under the non-call tag according to the first target data streams of the second data set under the periodic call tag and the data call heat values of the first target data streams, and merging the time-sequentially continuous data streams under the non-call tag according to the time sequence matching coefficients between them; and setting data stream allocation information for the third target data streams obtained by the merging according to the first target data streams of the second data set under the periodic call tag and the data call heat values of the first target data streams, and sequentially allocating some of the third target data streams to the periodic call tag according to the allocation priority in the data stream allocation information.

Step S214, determining a first numerical value for characterizing a first data size of a data flow in the first data set, a second numerical value for characterizing a second data size of a data flow of the second data set under the periodic call tag, and a third numerical value for characterizing a third data size of a data flow of the second data set under the non-call tag; and calculating the sum of the first numerical value and the second numerical value, and judging whether the ratio of the third numerical value to the sum exceeds a set ratio.

Step S215, when the ratio of the third numerical value to the sum does not exceed the set ratio, determining the data streams under the non-call tag as the process redundant data, and integrating the data streams in the first data set and the data streams under the periodic call tag into the user behavior data; when the ratio of the third numerical value to the sum exceeds the set ratio, performing data stream feature extraction on the data streams under the non-call tag by using a redundant data feature extraction subnetwork in the data screening model to obtain multiple groups of data stream features corresponding to the non-call tag; and determining a business process participation coefficient corresponding to each group of data stream features, sorting the data streams corresponding to each group of data stream features in descending order of the business process participation coefficients to obtain a sorting queue, and sequentially selecting at least one top-ranked data stream from the sorting queue and allocating it to the periodic call tag until the ratio of the third numerical value to the sum no longer exceeds the set ratio.

In this way, by executing steps S211 to S215, the influence of the process redundant data on the data continuity of the user behavior data can be taken into account, based on the proportions among the data streams in the first data set, the data streams under the periodic call tag, and the data streams under the non-call tag. This avoids removing all of the process redundant data, so that data deviation in the screened user behavior data is avoided and the accuracy and reliability of the screened user behavior data are further ensured.
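The ratio test of steps S214 and S215 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `Stream`, the function `screen_non_call`, and the use of byte counts as data sizes are hypothetical, the business process participation coefficient is assumed to be precomputed for each stream rather than derived by a feature extraction subnetwork, and a promoted stream is assumed to move its size from the third numerical value to the second.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    size: int             # data size, assumed to be in bytes
    participation: float  # business process participation coefficient (assumed given)


def screen_non_call(first_size: int, periodic: list[Stream],
                    non_call: list[Stream], set_ratio: float):
    """Return (streams kept under the periodic call tag, streams treated as redundant)."""
    kept = list(periodic)
    # Sorting queue: non-call streams in descending order of participation.
    queue = sorted(non_call, key=lambda s: s.participation, reverse=True)

    def ratio() -> float:
        total = first_size + sum(s.size for s in kept)  # first + second value
        third = sum(s.size for s in queue)              # third value
        return third / total if total else float("inf")

    # Promote top-ranked streams until the ratio no longer exceeds the set ratio.
    while queue and ratio() > set_ratio:
        kept.append(queue.pop(0))
    return kept, queue
```

With a first data set of size 100, one periodic stream of size 100, and non-call streams of sizes 60, 30, and 20, the initial ratio is 110/200 = 0.55. For a set ratio of 0.25, promoting only the stream with the highest participation coefficient (size 60) brings the ratio to 50/260, so the loop stops and the remaining two streams are treated as process redundant data.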

In a specific embodiment, in order to ensure that updating conditions between data are taken into account when integrating data streams, so as to ensure accuracy and real-time performance of user behavior data, the integration of the data streams in the first data set and the data streams under the periodic call tag into the user behavior data described in step S215 may specifically include what is described in steps S2151 to S2155 below.

Step S2151, generating a first service behavior parameter set corresponding to the data stream in the first data set based on the data correlation of the data stream in the first data set, and determining a second service behavior parameter set of the data stream under the periodic call tag through the extracted tag call frequency in the periodic call tag.

Step S2152, after the first service behavior parameter set and the second service behavior parameter set are obtained, obtaining a first behavior parameter list of the first service behavior parameter set and a second behavior parameter list of the second service behavior parameter set, where the first service behavior parameter set includes a first user behavior identification parameter, and the second service behavior parameter set includes a second user behavior identification parameter.

Step S2153, acquiring each list unit in the first behavior parameter list and each list unit in the second behavior parameter list, and obtaining a list unit distribution map.

Step S2154, determining list fusion accuracy between any two list units in the list unit distribution map to obtain a list fusion list; and adjusting the list fusion accuracy rate smaller than the set accuracy rate in the list fusion list to the set accuracy rate to obtain a fusion correction list.

Step S2155, weighting the first user behavior identification parameter and the second user behavior identification parameter according to the fusion correction list to obtain a third user behavior identification parameter, and performing multiple iterative merging on the data streams in the first data set and the data streams under the periodic call tag by using the third user behavior identification parameter to obtain the user behavior data.

In this way, based on the content described in the above steps S2151 to S2155, the update situation between data can be taken into account when integrating data streams, thereby ensuring the accuracy and real-time performance of user behavior data.
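The correction in step S2154 amounts to raising every list fusion accuracy that falls below the set accuracy up to the set accuracy. The sketch below shows that correction together with one possible weighting for step S2155. The specification does not state how the weighting is computed, so the rule used here (the mean corrected accuracy as the weight of the first parameter) is purely an assumption, as are the function names and the representation of the identification parameters as single numbers.

```python
def correct_fusion_list(fusion: dict[tuple[str, str], float],
                        set_accuracy: float) -> dict[tuple[str, str], float]:
    """S2154: lift every fusion accuracy below the set accuracy to the set accuracy."""
    return {pair: max(acc, set_accuracy) for pair, acc in fusion.items()}


def third_parameter(first: float, second: float,
                    corrected: dict[tuple[str, str], float]) -> float:
    """S2155 (assumed rule): blend the two parameters by the mean corrected accuracy."""
    if not corrected:
        raise ValueError("fusion correction list is empty")
    weight = sum(corrected.values()) / len(corrected)
    return weight * first + (1 - weight) * second
```

For fusion accuracies of 0.9, 0.4, and 0.7 and a set accuracy of 0.6, the corrected values are 0.9, 0.6, and 0.7, and under the assumed rule the parameters 10 and 4 blend to approximately 8.4.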

On the basis of step S2155, performing multiple iterative merging on the data streams in the first data set and the data streams under the periodic call tag by using the third user behavior identification parameter to obtain the user behavior data may be further implemented through the following steps S2155a to S2155d.

Step S2155a, extracting a current data stream from the data streams in the first data set according to the third user behavior recognition parameter, and determining a corresponding mapping trajectory parameter from the third user behavior recognition parameter according to the timing trajectory parameter corresponding to the current data stream.

Step S2155b, determining a corresponding data stream to be merged from the data streams under the periodic call tag through the mapping trajectory parameter, and merging the data stream to be merged with the current data stream to obtain a group of user behavior data.

Step S2155c, extracting behavior trace features corresponding to the group of user behavior data, and updating the data streams in the first data set and the data streams under the periodic call tag based on the behavior trace features, which specifically includes: removing the current data stream from the data streams in the first data set, and updating the time sequence relevance between the remaining data streams in the first data set based on the behavior trace features; and removing the data stream to be merged from the data streams under the periodic call tag, and updating the time sequence relevance between the remaining data streams under the periodic call tag based on the behavior trace features.

Step S2155d, when no data stream remains in the first data set or no data stream remains under the periodic call tag, completing the multiple iterative merging of the data streams in the first data set and the data streams under the periodic call tag.

It can be understood that, based on steps S2155a to S2155d, mutual interference between different data streams can be avoided when the data streams are iteratively merged multiple times, which ensures the accuracy and reliability of the iterative merging.
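The control flow of steps S2155a to S2155d can be sketched as a loop that consumes one stream from each side per iteration and stops as soon as either side is empty. The stream layout (`start`, `events`) and the pairing rule (nearest start time, standing in for the mapping trajectory parameter) are assumptions for illustration only, and the update of time sequence relevance in step S2155c is left out because the specification gives no formula for it.

```python
def iterative_merge(first: list[dict], periodic: list[dict]) -> list[dict]:
    """Merge streams pairwise until one side runs out (S2155d)."""
    first, periodic = list(first), list(periodic)  # work on copies
    merged = []
    while first and periodic:
        # S2155a: take the current stream; popping also removes it (part of S2155c).
        current = first.pop(0)
        # S2155b: assumed pairing rule, the periodic stream with the nearest start.
        partner = min(periodic, key=lambda s: abs(s["start"] - current["start"]))
        periodic.remove(partner)  # removal from the periodic side (part of S2155c)
        merged.append({
            "start": min(current["start"], partner["start"]),
            "events": sorted(current["events"] + partner["events"]),
        })
        # The time sequence relevance update of S2155c is omitted in this sketch.
    return merged
```

Because each iteration removes both merged streams from their source lists before the next pairing is chosen, a stream can take part in at most one merge, which is one way to read the statement that mutual interference between data streams is avoided.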

In one possible embodiment, in order to ensure the time-sequence accuracy of the data nodularization processing, performing data nodularization processing on each group of user behavior data according to the time node distribution information determined based on the historical data mining results to obtain a plurality of node data packets, as described in step S22, may further include the following steps S221 to S225.

Step S221, determining a result generation time corresponding to each data mining result in the historical data mining results, and fitting the result generation time to obtain a result generation time period curve corresponding to the historical data mining results.

Step S222, obtaining time interval information determined based on curve feature distribution in the result generation time interval curve, wherein time node parameters of the result generation time interval curve are recorded in the time interval information; and under the condition that the time node parameter is found in a preset time sequence matching list, extracting initial time node distribution information corresponding to the time node parameter, and taking the extracted initial time node distribution information as the current time node distribution information of the result generation period curve.

Step S223, determining a time node distribution map in the current time node distribution information, and extracting graph data description values of a plurality of distribution subgraphs from the time node distribution map; and the graph data description value is used for representing the data node category corresponding to the distribution subgraph.

Step S224, mapping the multiple graph data description values to data messages corresponding to each group of user behavior data, respectively, to obtain data node mapping values corresponding to each graph data description value in the data messages; and dividing the target data message corresponding to the data node mapping value in the user behavior data into a plurality of target message frames based on data packet capturing logic.

Step S225, in each target message frame of each graph data description value, determining the message frame priority corresponding to the message frame identification contained in the target data message of each graph data description value and each target message frame respectively; and based on at least the graph data description value, the message frame priority and a plurality of user behavior nodes corresponding to each group of user behavior data and execution parameter variables corresponding to each user behavior node, taking the execution parameter variables corresponding to each user behavior node as data packet capturing bases, and capturing behavior data sets corresponding to each user behavior node from the user behavior data to obtain a node data packet corresponding to each user behavior node.

It can be understood that applying steps S221 to S225 ensures the time-sequence accuracy of the data nodularization processing, and thereby the accuracy and time-sequence continuity of the node data packets.
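Step S221 does not state how the result generation times are fitted into a curve. As a minimal stand-in, the sketch below builds an hourly histogram of result generation times and reports the hours that form local peaks, which could then serve as candidate time node parameters. The hourly granularity, the function names, and the peak rule are all assumptions; steps S222 to S225 are not modeled.

```python
from collections import Counter
from datetime import datetime


def generation_curve(times: list[datetime]) -> list[int]:
    """Count historical data mining results per hour of day (24 bins)."""
    counts = Counter(t.hour for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


def peak_hours(curve: list[int]) -> list[int]:
    """Hours whose count is strictly greater than both neighbors (wrapping at midnight)."""
    n = len(curve)
    return [h for h in range(n)
            if curve[h] > curve[(h - 1) % n] and curve[h] > curve[(h + 1) % n]]
```

A smoother fit, such as a kernel density estimate, could replace the histogram without changing the rest of the sketch.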

In one possible example, in order to improve the accuracy of feature data extraction, ensure the feature recognition degree of the feature data, and further reduce the data size of the feature data, the determining of the behavior feature data in the node data packets in which the same data mining trigger period exists as described in step S23 may specifically include the following contents described in step S2311 and step S2312.

Step S2311, determining data packet compression thread parameters at least according to the data compression paths of a first node data packet and a second node data packet that share the same data mining trigger time interval; and determining data mining thread parameters from the data mining log files corresponding to the same data mining trigger time interval.

Step S2312, acquiring a data set to be extracted of node data packets with the same data mining trigger time interval; determining feature pointing path data of tag pointing information of at least one behavior trace tag included in the data set to be extracted according to the data packet compression thread parameters; and taking the feature pointing path data of the tag pointing information as reference data of the data mining thread parameters, configuring the data mining thread parameters and the reference data into corresponding feature extraction models to realize parameter adjustment of the feature extraction models, and extracting behavior feature data of the data set to be extracted.

It can be understood that based on the steps S2311 and S2312, the accuracy of feature data extraction can be improved, the feature recognition degree of the feature data can be ensured, and the data size of the feature data can be reduced.

Further, in order to ensure the feature continuity of the integrated feature data to be mined, in step S23, the multiple sets of behavior feature data are integrated into the feature data to be mined according to the sequence of the data mining trigger time period, and the method may further include the following contents described in step S2321 to step S2324.

Step S2321, determining, for each group of behavior feature data, a first ranking factor of the corresponding data mining trigger time period in time sequence order and a second ranking factor of the corresponding data mining trigger time period in terms of data mining heat.

Step S2322, sorting the multiple groups of behavior feature data in descending order of the first ranking factor to obtain a time sequence sorting queue, and sorting the multiple groups of behavior feature data in descending order of the second ranking factor to obtain a data mining heat queue.

Step S2323, acquiring a first sorting number value of each group of behavior feature data in the time sequence sorting queue and a second sorting number value of each group of behavior feature data in the data mining heat queue; calculating the difference between the first sorting number value and the second sorting number value corresponding to each group of behavior feature data, and judging whether the difference falls within a set value interval; and when the difference falls within the set value interval, moving that group of behavior feature data to the head of the time sequence sorting queue, fixing its sorting number in the time sequence sorting queue, and returning to the step of acquiring the first sorting number value of each group of behavior feature data in the time sequence sorting queue and the second sorting number value of each group of behavior feature data in the data mining heat queue, until the final time sequence sorting queue is obtained.

Step S2324, integrating the multiple groups of behavior feature data into the feature data to be mined according to the final time sequence sorting queue.

In a specific implementation, by executing steps S2321 to S2324, the sorting positions of the feature data can be adjusted by comparing the two sorting queues, thereby ensuring the feature continuity of the integrated feature data to be mined.
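As a non-limiting illustration, steps S2321 to S2324 may be sketched in Python as follows. The sketch assumes several details that the specification leaves open: sorting numbers are zero-based queue positions, the difference is the first sorting number minus the second, the set value interval is closed, and moving a group "to the queue head" places it directly behind the groups whose positions were already fixed. The names FeatureGroup and final_timing_queue are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureGroup:
    name: str
    timing_factor: float  # first ranking factor (time sequence order)
    heat_factor: float    # second ranking factor (data mining heat)


def final_timing_queue(groups, interval):
    """Return the final time sequence sorting queue (assumed reading of S2322-S2323)."""
    low, high = interval
    # S2322: both queues are sorted in descending order of their ranking factor.
    timing = sorted(groups, key=lambda g: g.timing_factor, reverse=True)
    heat = sorted(groups, key=lambda g: g.heat_factor, reverse=True)
    heat_rank = {g.name: i for i, g in enumerate(heat)}

    fixed = 0  # number of groups whose position at the head is already fixed
    while True:
        moved = False
        # S2323: re-read the sorting numbers of every group not yet fixed.
        for pos in range(fixed, len(timing)):
            diff = pos - heat_rank[timing[pos].name]
            if low <= diff <= high:
                # Move the group to the head region and fix its position.
                timing.insert(fixed, timing.pop(pos))
                fixed += 1
                moved = True
                break
        if not moved:
            return timing
```

Each pass fixes at most one group and never revisits a fixed group, so the loop ends after at most as many passes as there are groups. The feature data to be mined would then be assembled by reading the groups in the order of the returned queue.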

In an alternative embodiment, to ensure that the data format used for data mining matches the system data format of the artificial intelligence platform, so that data mining can proceed normally and interruptions or errors caused by heterogeneous data formats are avoided, the feature data to be mined needs to be converted during data mining. To this end, converting the feature data to be mined into an input data set according to a set conversion manner in step S24 can be implemented by the following steps S2411 to S2414.

Step S2411, based on the extracted current configuration file of the artificial intelligence platform, querying a preset script file list to obtain a system data format script corresponding to the current configuration file.

Step S2412, acquiring at least two current data formats in the feature data to be mined; and transcoding the system data format script in a set script coding form to obtain a reference coding string.

Step S2413, for each of the at least two current data formats, transcoding the data format in the set script coding form to obtain a coding string to be matched.

Step S2414, judging by bit-by-bit comparison whether each coding string to be matched matches the reference coding string; if not, converting the data to be converted that corresponds to that coding string in the feature data to be mined according to the system data format script to obtain converted data; and determining, as the input data set, the converted data together with the original data in the feature data to be mined whose coding strings to be matched match the reference coding string.

It can be understood that, by executing the above steps S2411 to S2414, data conversion can be performed on feature data to be mined during data mining, so that a data format for data mining can be ensured to be matched with a system data format of an artificial intelligence platform, data mining can be ensured to be normally realized, and interruption or errors in data mining caused by the heterogeneity of the data format are avoided. In addition, when data format conversion is carried out, the data format conversion efficiency can be improved by comparing the consistency of the coding strings, format conversion of all data in the feature data to be mined is avoided, and memory resources and thread resources of the artificial intelligence platform are effectively saved.
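For illustration only, steps S2411 to S2414 may be sketched in Python as follows. The specification does not define the script file list, the script coding form, or the conversion itself, so everything concrete here is an assumption: SCRIPT_LIST, the profile name, the bit-string encoding of a UTF-8 format descriptor, and the key=value to JSON converter are all hypothetical stand-ins.

```python
import json

# Hypothetical script file list: configuration profile -> system data format.
SCRIPT_LIST = {"profile-default": "json"}


def encode(descriptor: str) -> str:
    """Transcode a format descriptor into a bit string (assumed coding form)."""
    return "".join(f"{byte:08b}" for byte in descriptor.encode("utf-8"))


def bits_match(a: str, b: str) -> bool:
    """Bit-by-bit comparison of two coding strings."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def to_system_format(record: dict) -> dict:
    """Hypothetical converter: 'k=v;k=v' text payload to a JSON payload."""
    pairs = dict(item.split("=", 1) for item in record["payload"].split(";"))
    return {"format": "json", "payload": json.dumps(pairs, sort_keys=True)}


def build_input_data_set(profile: str, records: list) -> list:
    # S2411-S2412: look up the system format and derive the reference string.
    reference = encode(SCRIPT_LIST[profile])
    input_data_set = []
    for record in records:
        # S2413: coding string to be matched for this record's data format.
        candidate = encode(record["format"])
        # S2414: convert only the records whose coding string does not match.
        if bits_match(candidate, reference):
            input_data_set.append(record)
        else:
            input_data_set.append(to_system_format(record))
    return input_data_set
```

Records already in the system format pass through untouched, which reflects the stated benefit of avoiding format conversion for all data in the feature data to be mined.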

In an alternative embodiment, to mine the data accurately and completely, performing data mining on the input data set in step S24 to obtain an output data set corresponding to the input data set may exemplarily include the following steps S2421 to S2423.

Step S2421, extracting characteristic clustering data in the behavior characteristic clustering indexes of the input data set, wherein the characteristic clustering data comprise sets of to-be-mined portraits of the same user behavior event of the input data set.

Step S2422, processing the feature clustering data through a thread state queue in an associated clustering thread of the multi-dimensional feature clustering threads, and determining a feature data clustering result matched with the feature clustering data; and determining a user portrait clustering result matched with the feature clustering data through a user portrait state queue in an associated clustering thread in the multi-dimensional feature clustering threads based on the feature data clustering result.

Step S2423, mining the feature clustering data through a user portrait mining thread of the multi-dimensional feature clustering threads, based on the user portrait clustering result matched with the feature clustering data, so as to output an output data set that has undergone multi-dimensional feature mining and contains the user portrait data.

Therefore, based on the steps S2421 to S2423, the input data set can be accurately and completely mined, so as to ensure the accuracy of the user portrait data, and facilitate the matching and pushing of the user portrait data in the later period.

Based on the same inventive concept as the foregoing embodiment, and with reference to Fig. 3, a big data analysis apparatus 300 applied to a cloud computing communication architecture is shown. The apparatus is applied to the artificial intelligence platform 200 in Fig. 1 and includes:

the data screening module 310 is configured to periodically collect business data to be screened from an application device side, and to screen out flow redundancy data in the business data to be screened by using a pre-trained data screening model to obtain multiple groups of user behavior data; the business data to be screened is business interaction data that includes flow redundancy data which increases the data mining duration;

the node processing module 320 is configured to perform data nodalization processing on each group of user behavior data respectively according to time node distribution information determined based on a historical data mining result to obtain a plurality of node data packets;

the feature integration module 330 is configured to determine behavior feature data in node data packets with the same data mining trigger time period, and integrate multiple sets of behavior feature data into feature data to be mined according to the sequence of the data mining trigger time periods;

the data mining module 340 is configured to convert the feature data to be mined according to a set conversion manner to obtain an input data set, and perform data mining on the input data set to obtain an output data set corresponding to the input data set; wherein the output data set includes user portrait data corresponding to the user behavior data.

Based on the same inventive concept as the previous embodiment, an artificial intelligence platform is further shown, the artificial intelligence platform is in communication connection with the application device, and the artificial intelligence platform is used for:

Based on the same inventive concept as in the previous embodiments, the present specification further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the methods described above.

Based on the same inventive concept as the previous embodiment, an embodiment of the present specification further provides an artificial intelligence platform 200, as shown in fig. 4, including a memory 204, a processor 202, and a computer program stored on the memory 204 and executable on the processor 202, wherein the processor 202 implements the steps of any one of the methods described above when executing the program.

Through one or more embodiments of the present description, the present description has the following advantages or beneficial effects:

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, this description is not intended for any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present specification and that specific languages are described above to disclose the best modes of the specification.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present description may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the specification, various features of the specification are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the specification as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this specification.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the description and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of this description may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a gateway, proxy server, or system in accordance with embodiments of the present description. The present description may also be embodied as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such programs implementing the description may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the specification, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The description may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

On the basis of the above, the present specification also provides two possible embodiments, embodiment A and embodiment B, which are described in detail as follows.

Description of embodiment A.

A1. A big data analysis method applied to a cloud computing communication architecture, the method comprising:

converting the feature data to be mined according to a set conversion mode to obtain an input data set, and performing data mining on the input data set to obtain an output data set corresponding to the input data set; wherein the output data set comprises user portrait data corresponding to the user behavior data;

converting the feature data to be mined according to a set conversion mode to obtain an input data set, specifically comprising the following steps:

based on the extracted current configuration file of the artificial intelligence platform, inquiring a preset script file list to obtain a system data format script corresponding to the current configuration file;

acquiring at least two current data formats in the feature data to be mined; transcoding the system data format script in a form of set script coding to obtain a reference coding string;

for each data format in the at least two current data formats, transcoding each data format according to the form of the set script coding to obtain a coding string to be matched;

judging by bit-by-bit comparison whether each coding string to be matched matches the reference coding string; if not, converting the data to be converted that corresponds to that coding string in the feature data to be mined according to the system data format script to obtain converted data; and determining, as the input data set, the converted data together with the original data in the feature data to be mined whose coding strings to be matched match the reference coding string.

A2. According to the method of A1, screening out flow redundancy data in the business data to be screened by adopting a pre-trained data screening model to obtain multiple groups of user behavior data, including:

extracting service item category labels of the service data to be screened through a label extraction sub-network in the data screening model, identifying data streams under each service item category label from the service data to be screened through a data stream identification sub-network in the data screening model, integrating the data streams under each service item category label in the service data to be screened into a first data set, and integrating data except the first data set in the service data to be screened into a second data set;

when it is determined, based on the business item category labels, that the business data to be screened contains a periodic call label and a non-call label, determining a time sequence matching coefficient between each second target data stream of the second data set under the non-call label and each first target data stream of the second data set under the periodic call label, according to the first target data stream of the second data set under the periodic call label and the data call heat value of the first target data stream;

based on the time sequence matching coefficients, assigning to the periodic call label each second target data stream of the second data set under the non-call label that matches, in time sequence, a first target data stream under the periodic call label; when the non-call label corresponding to the second data set contains a plurality of time-sequentially continuous data streams, determining time sequence matching coefficients between the time-sequentially continuous data streams of the second data set under the non-call label according to the first target data stream of the second data set under the periodic call label and the data call heat value of the first target data stream, and merging the time-sequentially continuous data streams under the non-call label according to those time sequence matching coefficients; setting data stream distribution information for the third target data stream obtained by merging, according to the first target data stream of the second data set under the periodic call label and the data call heat value of the first target data stream, and sequentially distributing part of the third target data stream to the periodic call label based on the distribution priority in the data stream distribution information;

determining a first numerical value characterizing the data volume of the data streams in the first data set, a second numerical value characterizing the data volume of the data streams of the second data set under the periodic call label, and a third numerical value characterizing the data volume of the data streams of the second data set under the non-call label; calculating the sum of the first numerical value and the second numerical value, and judging whether the ratio of the third numerical value to the sum exceeds a set ratio;

when the ratio of the third numerical value to the sum does not exceed the set ratio, determining the data streams under the non-call label as the flow redundancy data, and integrating the data streams in the first data set and the data streams under the periodic call label into the user behavior data; when the ratio of the third numerical value to the sum exceeds the set ratio, performing data stream feature extraction on the data streams under the non-call label by using a redundant data feature extraction sub-network in the data screening model to obtain multiple groups of data stream features corresponding to the non-call label; and determining a business process participation coefficient corresponding to each group of data stream features, sorting the data streams corresponding to the groups of data stream features in descending order of business process participation coefficient to obtain a sorting queue, and sequentially selecting at least one top-ranked data stream from the sorting queue and assigning it to the periodic call label until the ratio of the third numerical value to the sum no longer exceeds the set ratio.
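By way of a hedged example, the ratio test and the reassignment loop described above may be sketched in Python as follows. The sketch assumes that the business process participation coefficient of each data stream has already been computed (the feature extraction sub-network is not modeled), that data volume is a plain number per stream, and that the first and second numerical values are not both zero. The function name screen_redundant and the tuple layout are hypothetical.

```python
def screen_redundant(first_sizes, periodic_sizes, non_call, max_ratio):
    """Split non-call streams into reassigned and redundant streams.

    non_call is a list of (data_volume, participation_coefficient) pairs.
    Returns (volumes under the periodic call label, remaining redundant streams).
    """
    periodic = list(periodic_sizes)
    # Sorting queue: descending business process participation coefficient.
    queue = sorted(non_call, key=lambda stream: stream[1], reverse=True)
    first_value = sum(first_sizes)

    while queue:
        second_value = sum(periodic)
        third_value = sum(volume for volume, _ in queue)
        # Stop once the non-call share no longer exceeds the set ratio.
        if third_value / (first_value + second_value) <= max_ratio:
            break
        # Otherwise move the top-ranked stream under the periodic call label.
        volume, _ = queue.pop(0)
        periodic.append(volume)

    # Whatever is left under the non-call label is treated as redundant.
    return periodic, queue
```

For instance, with a first data set of volume 100, a periodic volume of 50, non-call streams (60, 0.9), (30, 0.2), (10, 0.5) and a set ratio of 0.25, the initial ratio 100/150 exceeds the threshold, so the stream with coefficient 0.9 is reassigned; the ratio then becomes 40/210, which is below 0.25, and the remaining two streams are treated as flow redundancy data.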

A3. Integrating the data flow in the first data set and the data flow under the periodic call label into the user behavior data according to the method of A2, including:

generating a first business behavior parameter set corresponding to the data flow in the first data set based on the data relevance of the data flow in the first data set, and determining a second business behavior parameter set of the data flow under the periodic calling tag through the extracted tag calling frequency in the periodic calling tag;

after the first service behavior parameter set and the second service behavior parameter set are obtained, a first behavior parameter list of the first service behavior parameter set and a second behavior parameter list of the second service behavior parameter set are obtained, wherein the first service behavior parameter set comprises a first user behavior identification parameter, and the second service behavior parameter set comprises a second user behavior identification parameter;

acquiring each list unit in the first behavior parameter list and each list unit in the second behavior parameter list to obtain a list unit distribution diagram;

determining the list fusion accuracy rate between any two list units in the list unit distribution diagram to obtain a list fusion list; adjusting the list fusion accuracy rate smaller than the set accuracy rate in the list fusion list to the set accuracy rate to obtain a fusion correction list;

and weighting the first user behavior identification parameter and the second user behavior identification parameter according to the fusion correction list to obtain a third user behavior identification parameter, and performing multiple iterative combination on the data stream in the first data set and the data stream under the periodic call label by adopting the third user behavior identification parameter to obtain the user behavior data.

A4. According to the method described in A3, performing multiple iterative combinations on the data streams in the first data set and the data streams under the periodic call label by using the third user behavior identification parameter to obtain the user behavior data, including:

extracting a current data stream from the data streams in the first data set according to the third user behavior identification parameter, and determining a corresponding mapping track parameter from the third user behavior identification parameter according to a time sequence track parameter corresponding to the current data stream;

determining a corresponding data stream to be merged from the data streams under the periodic call label by means of the mapping track parameter, and merging the data stream to be merged with the current data stream to obtain a group of user behavior data;

extracting behavior trace features corresponding to the group of user behavior data, and updating the data streams in the first data set and the data streams under the periodic call label based on the behavior trace features, specifically including: removing the current data stream from the data streams in the first data set, updating the time sequence relevance among the remaining data streams in the first data set based on the behavior trace features, removing the data stream to be merged from the data streams under the periodic call label, and updating the time sequence relevance among the remaining data streams under the periodic call label based on the behavior trace features;

and when no data stream remains in the first data set or under the periodic call label, ending the multiple iterative combination of the data streams in the first data set and the data streams under the periodic call label.
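The control flow of this iterative combination may be illustrated with the following Python sketch. It is a simplification under stated assumptions: each data stream is reduced to a (timestamp, payload) pair, the mapping track parameter is replaced by a nearest-timestamp lookup, and the update of time sequence relevance is omitted. None of these choices comes from the specification, and the name merge_iteratively is hypothetical.

```python
def merge_iteratively(first_streams, periodic_streams):
    """Pair streams until either collection is exhausted.

    Each stream is a (timestamp, payload) tuple. Returns the merged groups
    and whatever remains in each collection.
    """
    first = sorted(first_streams)      # process in time sequence order
    periodic = list(periodic_streams)
    merged = []

    # Termination condition: no stream left on either side.
    while first and periodic:
        current = first.pop(0)         # extract and remove the current stream
        # Assumed stand-in for the mapping track parameter: closest timestamp.
        partner = min(periodic, key=lambda s: abs(s[0] - current[0]))
        periodic.remove(partner)       # remove the stream to be merged
        merged.append((current[1], partner[1]))

    return merged, first, periodic
```

Because one stream is removed from each side per iteration, the number of merged groups equals the size of the smaller collection.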

A5. According to the method of any one of A1-A4, according to the time node distribution information determined based on the historical data mining result, each group of user behavior data is subjected to data nodularization processing to obtain a plurality of node data packets, and the method comprises the following steps:

determining result generation time corresponding to each data mining result in the historical data mining results, and fitting the result generation time to obtain a result generation time period curve corresponding to the historical data mining results;

acquiring time interval information determined based on curve feature distribution in the result generation time interval curve, wherein time node parameters of the result generation time interval curve are recorded in the time interval information; under the condition that the time node parameter is found in a preset time sequence matching list, extracting initial time node distribution information corresponding to the time node parameter, and taking the extracted initial time node distribution information as the current time node distribution information of the result generation period curve;

determining a time node distribution diagram in the current time node distribution information, and extracting diagram data description values of a plurality of distribution subgraphs from the time node distribution diagram; the graph data description value is used for representing the data node category corresponding to the distribution subgraph;

mapping a plurality of graph data description values to data messages corresponding to each group of user behavior data respectively to obtain data node mapping values corresponding to each graph data description value in the data messages; dividing a target data message corresponding to the data node mapping value in the user behavior data into a plurality of target message frames based on data packet capturing logic;

determining the message frame priority corresponding to the message frame identifier contained in the target data message of each graph data description value and each target message frame in each target message frame of each graph data description value; and based on at least the graph data description value, the message frame priority and a plurality of user behavior nodes corresponding to each group of user behavior data and execution parameter variables corresponding to each user behavior node, taking the execution parameter variables corresponding to each user behavior node as data packet capturing bases, and capturing behavior data sets corresponding to each user behavior node from the user behavior data to obtain a node data packet corresponding to each user behavior node.

A6. According to the method of A1, determining behavior feature data in node packets with the same data mining trigger period comprises:

determining to obtain a data packet compression thread parameter at least according to data compression paths of a first node data packet and a second node data packet with the same data mining trigger time interval; determining to obtain data mining thread parameters from the data mining log files corresponding to the same data mining trigger time;

acquiring a data set to be extracted of node data packets with the same data mining trigger time period; determining feature pointing path data of tag pointing information of at least one behavior trace tag included in the data set to be extracted according to the data packet compression thread parameters; and taking the feature pointing path data of the tag pointing information as reference data of the data mining thread parameters, configuring the data mining thread parameters and the reference data into corresponding feature extraction models to realize parameter adjustment of the feature extraction models, and extracting behavior feature data of the data set to be extracted.

A7. According to the method of A6, integrating multiple sets of behavior feature data into feature data to be mined according to the sequence of the data mining trigger time period, including:

determining a first ranking factor of the data mining triggering time period corresponding to each group of behavior characteristic data on the time sequence and determining a second ranking factor of the data mining triggering time period corresponding to each group of behavior characteristic data under the data mining heat degree;

sequencing the multiple groups of behavior characteristic data according to the sequence of the first sequencing factors from large to small to obtain a time sequence sequencing queue, and sequencing the multiple groups of behavior characteristic data according to the sequence of the second sequencing factors from small to large to obtain a data mining heat queue;

acquiring a first sorting number value of each group of behavior characteristic data in the time sequence sorting queue and a second sorting number value of each group of behavior characteristic data in the data mining heat queue; calculating a difference value between a first sequencing number value and a second sequencing number value corresponding to each group of behavior characteristic data, and judging whether the difference value falls into a set numerical value interval; when the difference value is within the set value interval, moving the behavior characteristic data to a queue head of the time sequence sorting queue in the time sequence sorting queue, fixing the sorting number of the behavior characteristic data in the time sequence sorting queue, and returning to the step of acquiring a first sorting number value of each group of behavior characteristic data in the time sequence sorting queue and a second sorting number value of each group of behavior characteristic data in the data mining heat degree queue until a final time sequence sorting queue is obtained;

and integrating a plurality of groups of behavior characteristic data into characteristic data to be mined according to the final time sequence sorting queue.

A8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of A1-A7.

A9. An artificial intelligence platform comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps of the method of any one of A1-A7.

Description of embodiment B.

B1. A big data analysis method applied to a cloud computing communication architecture, the method comprising:

performing data mining on the input data set to obtain an output data set corresponding to the input data set, specifically including:

extracting feature clustering data in a behavior feature clustering index of the input data set, wherein the feature clustering data comprise a set of to-be-mined portraits of the same user behavior event of the input data set;

processing the feature clustering data through a thread state queue in an associated clustering thread of a multi-dimensional feature clustering thread, and determining a feature data clustering result matched with the feature clustering data; determining a user portrait clustering result matched with the feature clustering data through a user portrait state queue in an associated clustering thread in the multi-dimensional feature clustering threads based on the feature data clustering result;

and mining the feature clustering data through a user portrait mining thread of the multi-dimensional feature clustering thread based on a user portrait clustering result matched with the feature clustering data so as to output an output data set which is subjected to multi-dimensional feature mining and contains the user portrait data.

B2. According to the method of B1, screening out flow redundancy data in the business data to be screened by using a pre-trained data screening model to obtain multiple sets of user behavior data, including:

B3. Integrating the data flow in the first data set and the data flow under the periodic call label into the user behavior data according to the method of B2, including:

B4. According to the method of B3, performing multiple iterative combinations on the data streams in the first data set and the data streams under the periodic call label by using the third user behavior recognition parameter to obtain the user behavior data, including:

B5. According to the method of any one of B1-B4, according to time node distribution information determined based on historical data mining results, each group of user behavior data is subjected to data nodularization processing to obtain a plurality of node data packets, and the method comprises the following steps:

B6. According to the method of B1, the determining the behavior characteristic data in the node data packets with the same data mining triggering period comprises the following steps:

B7. According to the method of B6, integrating multiple sets of behavior feature data into feature data to be mined according to the sequence of the data mining trigger time period includes:

B8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of B1-B7.

B9. An artificial intelligence platform comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method of any one of B1-B7.
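The nodularization and integration steps named in B5 to B7 can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the text does not define the form of the time node distribution information or of the behavior characteristic data, so the sketch assumes the time nodes are a sorted list of boundary timestamps, each event is a `(timestamp, action)` pair, and the behavior characteristic data for a period is a count of actions. The names `nodularize` and `integrate` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def nodularize(behavior_group, time_nodes):
    """Split one group of (timestamp, action) events into node data packets.

    Each packet is keyed by the index of the time period the event falls in,
    where the periods are delimited by the sorted boundaries in time_nodes.
    """
    packets = defaultdict(list)
    for timestamp, action in behavior_group:
        period = bisect_right(time_nodes, timestamp)
        packets[period].append(action)
    return dict(packets)


def integrate(behavior_groups, time_nodes):
    """Combine packets that share a period, then order the result by period."""
    by_period = defaultdict(Counter)
    for group in behavior_groups:
        for period, actions in nodularize(group, time_nodes).items():
            # Assumed behavior characteristic: number of occurrences per action.
            by_period[period].update(actions)
    return [(period, dict(by_period[period])) for period in sorted(by_period)]


# Hypothetical time node boundaries and two groups of user behavior data.
time_nodes = [100, 200]
groups = [
    [(50, "click"), (150, "view")],
    [(120, "click"), (250, "view"), (60, "click")],
]
features_to_mine = integrate(groups, time_nodes)
```

Aggregating per period before ordering keeps the integrated feature data small, which is in line with the stated goal of reducing the size of the feature data to be mined.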

Claims

1. A big data analysis method applied to a cloud computing communication architecture, characterized by comprising the following steps:

screening out flow redundancy data in the business data to be screened by using a pre-trained data screening model to obtain a plurality of groups of user behavior data, wherein the screening comprises:

2. The method of claim 1, wherein integrating the data streams in the first data set and the data streams under the periodic call label into the user behavior data comprises:

3. The method of claim 2, wherein iteratively merging, a plurality of times, the data streams in the first data set and the data streams under the periodic call label by using the third user behavior recognition parameter to obtain the user behavior data comprises:

4. The method according to any one of claims 1 to 3, wherein performing data nodularization processing on each group of user behavior data, according to the time node distribution information determined based on the historical data mining results, to obtain a plurality of node data packets comprises:

5. The method of claim 1, wherein determining the behavior characteristic data in the node data packets having the same data mining triggering time period comprises:

6. The method according to claim 5, wherein integrating the multiple groups of behavior characteristic data into the feature data to be mined according to the sequence of the data mining triggering time periods comprises:

7. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

8. An artificial intelligence platform comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 6.