
CN117273765B - Multistage dealer circulation data processing method and system based on automatic check

Info

Publication number: CN117273765B
Application number: CN202311549858.9A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Guangzhou Oupai Creative Home Design Co ltd
Current assignee: Guangzhou Oupai Creative Home Design Co ltd
Priority date: 2023-11-21
Filing date: 2023-11-21
Publication date: 2024-02-06
Anticipated expiration: 2043-11-21
Also published as: CN117273765A

Abstract

The invention provides a multistage dealer circulation data processing method and system based on automatic check. A saliency recognition algorithm assigns estimated annotation information to the circulation data clusters of a first circulation data learning sample, such that the estimated annotation information assigned to a circulation data cluster with higher saliency characterizes a detail classification with a higher level of detail. The saliency weight is then adjusted according to the difference between the real annotation information and the estimated annotation information of the circulation data clusters. With the adjusted saliency weight, the saliency recognition algorithm assigns higher saliency to circulation data clusters with a higher level of detail and lower saliency to circulation data clusters with a lower level of detail, so that the data features of the more detailed circulation data clusters are mined more comprehensively, more accurate and reliable compliance description features of the circulation data are obtained, and recognition of the circulation data based on those compliance description features is more accurate and reliable.

Description

Multistage dealer circulation data processing method and system based on automatic check

Technical Field

The application relates to the field of data processing, in particular to a method and a system for processing multilevel dealer circulation data based on automatic check.

Background

In moving from the producer to the consumer, a commodity may pass through multiple levels of dealers, producing dealer circulation data, that is, commodity transfer and transaction information between the individual dealers. Compliance detection of dealer circulation data refers to monitoring and auditing the commodity circulation and transaction information between dealers to ensure compliance with relevant laws, regulations and compliance requirements. It includes, for example: ensuring comprehensive and accurate collection and recording of dealer circulation data, including commodity information, transportation information, dealer information, sales information, procurement information, inventory information, payment information, etc.; checking and verifying the collected data to ensure its authenticity and credibility; establishing rules and standards for compliance detection and defining the compliance requirements of each data indicator, for example setting an upper limit on the price fluctuation range or requiring batch tracing of a specific product; and performing anomaly detection and early warning on the dealer circulation data using artificial intelligence and data analysis methods, so that if, compared with historical data, sales rise or fall sharply, commodities are lost in transit, prices change beyond the normal range, or the like, an early warning is generated promptly and the matter is investigated and verified.
For a circulation process with good circulation records, artificial intelligence can be used to check the recorded circulation data automatically and identify the compliance problems it reflects, which saves human resources. However, because personnel are difficult to manage, the level of detail of the circulation records may differ considerably between circulation links and between the records of each link. In practice, an artificial intelligence algorithm may focus excessively on record content that discloses insufficient information, so that the feature information mined from the circulation records is seriously biased and the accuracy of the resulting global compliance recognition result is reduced.

Disclosure of Invention

The invention aims to provide a multistage dealer circulation data processing method and system based on automatic check.

The technical scheme of the embodiments of the application is realized as follows. In a first aspect, an embodiment of the present application provides a multistage dealer circulation data processing method based on automatic check, the method including: acquiring a first circulation data learning sample, where the first circulation data learning sample has a plurality of circulation data clusters, and circulation data clusters with different levels of detail have different corresponding detail classifications; detecting the plurality of circulation data clusters through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters, where the saliency recognition algorithm has a saliency weight and determines, according to the saliency weight, that the saliency of the first circulation data cluster is higher than the saliency of the second circulation data cluster; assigning, through the saliency recognition algorithm, first estimated annotation information to the first circulation data cluster according to the saliency of the first circulation data cluster, and second estimated annotation information to the second circulation data cluster according to the saliency of the second circulation data cluster, where the first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification, the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification, and the level of detail characterized by the first estimated detail classification is higher than that characterized by the second estimated detail classification; acquiring first comparison annotation information of the first circulation data cluster and second comparison annotation information of the second circulation data cluster, where the first comparison annotation information characterizes that the first circulation data cluster belongs to a first comparison detail classification and the second comparison annotation information characterizes that the second circulation data cluster belongs to a second comparison detail classification; and adjusting the saliency weight according to the difference between the first estimated annotation information and the first comparison annotation information and the difference between the second estimated annotation information and the second comparison annotation information. The saliency recognition algorithm is used for extracting compliance description features of circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data.
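
The weight adjustment of the first aspect can be illustrated with a minimal sketch. The source does not specify the scoring function or the loss, so the logistic scoring function, the binary cross-entropy gradient, the learning rate and the feature values below are all assumptions made only for illustration. The comparison annotation is encoded, again as an assumption, as 1 for the higher detail classification and 0 for the lower one.

```python
import math


def saliency_score(weights, features):
    """Hypothetical scoring function: logistic of a weighted sum, so the score lies in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def adjust_saliency_weight(weights, clusters, lr=0.5):
    """One gradient step on the binary cross-entropy between score and comparison label.

    clusters: list of (features, comparison_label) pairs, where label 1 means the
    higher detail classification and 0 the lower one (assumed encoding).
    """
    grad = [0.0] * len(weights)
    for features, label in clusters:
        # Difference between the estimated annotation (score) and the comparison annotation.
        diff = saliency_score(weights, features) - label
        for i, x in enumerate(features):
            grad[i] += diff * x
    return [w - lr * g / len(clusters) for w, g in zip(weights, grad)]


# Hypothetical cluster features, e.g. (fraction of fields filled, normalized record length).
clusters = [([0.9, 0.8], 1), ([0.2, 0.1], 0)]
weights = [0.0, 0.0]
for _ in range(200):
    weights = adjust_saliency_weight(weights, clusters)

detailed = saliency_score(weights, clusters[0][0])
sparse = saliency_score(weights, clusters[1][0])
```

Under these assumptions, the adjusted weight gives the more detailed cluster a higher saliency score than the sparse one, which is the behaviour the method aims for.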

As an embodiment, detecting the plurality of circulation data clusters through the saliency recognition algorithm to obtain the first circulation data cluster and the second circulation data cluster includes: determining a saliency score of each of the plurality of circulation data clusters according to the saliency weight through the saliency recognition algorithm, where the saliency score of a circulation data cluster represents the saliency that the saliency recognition algorithm gives to that circulation data cluster; and determining a circulation data cluster whose saliency score lies within a first scoring range as the first circulation data cluster, and a circulation data cluster whose saliency score lies within a second scoring range as the second circulation data cluster, where the scores in the first scoring range are greater than the scores in the second scoring range. The saliency recognition algorithm is embedded in a compliance recognition algorithm, the adjusted saliency weight is fixed in the compliance recognition algorithm, and the compliance recognition algorithm also has a recognition branch operator. The method further includes: obtaining a second circulation data learning sample that includes circulation annotation information characterizing a compliance result of the second circulation data learning sample; extracting features of the second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain sample compliance description features of the second circulation data learning sample; performing compliance recognition on the second circulation data learning sample according to the sample compliance description features through the recognition branch operator to obtain a compliance estimation result of the second circulation data learning sample; and adjusting, according to the difference between the compliance result and the compliance estimation result, the algorithm parameters of the compliance recognition algorithm other than the fixed adjusted saliency weight, to obtain a debugged compliance recognition algorithm.
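
Keeping the adjusted saliency weight fixed while the remaining parameters are debugged can be sketched as a parameter update that skips frozen entries. The parameter names, gradient values and learning rate below are hypothetical; the source does not say how the parameters are stored or optimized.

```python
def update_unfrozen(params, grads, frozen, lr=0.1):
    """Apply a gradient step to every parameter except those named in `frozen`."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


# Hypothetical parameter names and values, for illustration only.
params = {"saliency_weight": 0.8, "branch_weight": 0.3, "branch_bias": 0.0}
grads = {"saliency_weight": 0.5, "branch_weight": -0.2, "branch_bias": 0.1}

# The saliency weight stays fixed; only the recognition-branch parameters move.
updated = update_unfrozen(params, grads, frozen={"saliency_weight"})
```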

As one embodiment, there are a plurality of second circulation data learning samples, which correspond to the circulation records of different links. Extracting features of the second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain the sample compliance description features of the second circulation data learning sample includes: extracting features of each second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain the sample compliance description features of each second circulation data learning sample; and performing feature interaction on the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features.

As one embodiment, performing feature interaction on the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features includes: connecting the sample compliance description features of the plurality of second circulation data learning samples end to end to obtain the sample compliance description features; or summing the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features.
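
The two feature interaction options can be sketched as follows, assuming each sample compliance description feature is a plain list of numbers. The source does not fix the representation, and the function names and values are illustrative.

```python
def concat_features(feature_vectors):
    """Connect the feature vectors end to end."""
    return [value for vector in feature_vectors for value in vector]


def sum_features(feature_vectors):
    """Sum the feature vectors element-wise; all vectors must have equal length."""
    lengths = {len(vector) for vector in feature_vectors}
    if len(lengths) != 1:
        raise ValueError("element-wise summation needs vectors of equal length")
    return [sum(values) for values in zip(*feature_vectors)]


# Hypothetical features extracted from the records of three circulation links.
link_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
concatenated = concat_features(link_features)
summed = sum_features(link_features)
```

End-to-end connection keeps the per-link features separate at the cost of a longer vector, while summation keeps the dimension fixed regardless of the number of links.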

As an implementation mode, the debugged compliance recognition algorithm has a debugged saliency recognition algorithm and a debugged recognition branch operator. The method further includes: acquiring circulation data whose compliance is to be verified; extracting features of the circulation data through the debugged saliency recognition algorithm to obtain target compliance description features of the circulation data; and performing compliance recognition on the circulation data according to the target compliance description features through the debugged recognition branch operator to obtain a compliance recognition result of the circulation data.

As one implementation mode, the circulation data includes a plurality of circulation records corresponding to different links. Extracting features of the circulation data through the debugged saliency recognition algorithm to obtain the target compliance description features of the circulation data includes: extracting features of each circulation record through the debugged saliency recognition algorithm to obtain the compliance description features of each circulation record; and performing feature interaction on the compliance description features of the plurality of circulation records to obtain the target compliance description features.

In one embodiment, acquiring the first comparison annotation information of the first circulation data cluster and the second comparison annotation information of the second circulation data cluster includes: performing recognition on the level of detail of the first circulation data cluster through a debugged detail recognition algorithm to obtain the first comparison detail classification of the first circulation data cluster; performing recognition on the level of detail of the second circulation data cluster through the debugged detail recognition algorithm to obtain the second comparison detail classification of the second circulation data cluster; and assigning the first comparison annotation information to the first circulation data cluster according to the first comparison detail classification, and the second comparison annotation information to the second circulation data cluster according to the second comparison detail classification.

As an embodiment, the method further includes: acquiring a third circulation data learning sample and a detail recognition algorithm to be debugged, where the third circulation data learning sample has detail annotation information characterizing the real detail classification of the third circulation data learning sample; performing recognition on the detail classification of the third circulation data learning sample through the detail recognition algorithm to be debugged to obtain an estimated detail classification of the third circulation data learning sample; and adjusting the algorithm parameters of the detail recognition algorithm to be debugged according to the difference between the real detail classification and the estimated detail classification to obtain the debugged detail recognition algorithm.

In one embodiment, acquiring the first comparison annotation information of the first circulation data cluster and the second comparison annotation information of the second circulation data cluster includes: clustering the first circulation data learning sample through a debugged data clustering algorithm to obtain a plurality of data clusters of the first circulation data learning sample, where one data cluster corresponds to one detail classification; determining, among the plurality of data clusters, the data cluster with the largest intersection-over-union ratio with the first circulation data cluster as a first detail matching data cluster of the first circulation data cluster, and assigning the first comparison annotation information to the first circulation data cluster according to the detail classification corresponding to the first detail matching data cluster; and determining, among the plurality of data clusters, the data cluster with the largest intersection-over-union ratio with the second circulation data cluster as a second detail matching data cluster of the second circulation data cluster, and assigning the second comparison annotation information to the second circulation data cluster according to the detail classification corresponding to the second detail matching data cluster. The first comparison detail classification is the detail classification corresponding to the first detail matching data cluster, and the second comparison detail classification is the detail classification corresponding to the second detail matching data cluster.
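
The matching step can be sketched as follows, assuming each cluster is represented as a set of record identifiers. That representation, the classification names and the identifiers are hypothetical; the source only states that the data cluster with the largest intersection ratio is selected.

```python
def intersection_over_union(a, b):
    """Ratio of shared records to all records covered by either cluster."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_detail_classification(circulation_cluster, data_clusters):
    """Return the detail classification whose data cluster overlaps most.

    data_clusters maps a detail classification to the record ids of its data cluster.
    """
    return max(
        data_clusters,
        key=lambda label: intersection_over_union(circulation_cluster, data_clusters[label]),
    )


# Hypothetical clustering result: one data cluster per detail classification.
data_clusters = {"simple": {1, 2, 3}, "rich": {4, 5, 6, 7}}
first_match = match_detail_classification({4, 5, 6}, data_clusters)
second_match = match_detail_classification({2, 3, 8}, data_clusters)
```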

In a second aspect, the present application provides an automatic check-based multistage dealer circulation data processing system, including a check server and a circulation terminal communicatively connected to the check server, where the check server includes a memory and a processor, the memory stores a computer program that can be run on the processor, and the processor implements the method described above when executing the computer program.

The application has at least the following beneficial effects. The application can acquire a first circulation data learning sample, where the first circulation data learning sample has a plurality of circulation data clusters, and circulation data clusters with different levels of detail have different detail classifications. The plurality of circulation data clusters are detected through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters; the saliency recognition algorithm has a saliency weight and determines, according to the saliency weight, that the saliency of the first circulation data cluster is higher than the saliency of the second circulation data cluster. First estimated annotation information can then be assigned to the first circulation data cluster according to the saliency of the first circulation data cluster through the saliency recognition algorithm, and second estimated annotation information can be assigned to the second circulation data cluster according to the saliency of the second circulation data cluster; the first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification, the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification, and the level of detail characterized by the first estimated detail classification is higher than that characterized by the second estimated detail classification. First comparison annotation information of the first circulation data cluster and second comparison annotation information of the second circulation data cluster can also be acquired; the first comparison annotation information characterizes that the first circulation data cluster belongs to a first comparison detail classification, and the second comparison annotation information characterizes that the second circulation data cluster belongs to a second comparison detail classification. Thus, the saliency weight can be adjusted according to the difference between the first estimated annotation information and the first comparison annotation information and the difference between the second estimated annotation information and the second comparison annotation information. The saliency recognition algorithm is used for extracting compliance description features of circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data. Based on this, the saliency recognition algorithm assigns estimated annotation information to the circulation data clusters of the first circulation data learning sample, and the estimated annotation information assigned to a circulation data cluster with higher saliency characterizes a detail classification with a higher level of detail, so that the saliency weight can be adjusted through the difference between the real annotation information (such as the comparison annotation information) and the estimated annotation information of the circulation data clusters. With the adjusted saliency weight, the saliency recognition algorithm can assign higher saliency to circulation data clusters with a higher level of detail and lower saliency to circulation data clusters with a lower level of detail, so as to comprehensively mine the data features of the more detailed circulation data clusters, obtain more accurate compliance description features of the circulation data, and recognize the circulation data more accurately and reliably based on those compliance description features.

In the following description, other features will be partially set forth. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will in part discover these features or will be able to ascertain them through production or use thereof. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations that are set forth in the detailed examples described below.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.

FIG. 1 is a schematic diagram of an application scenario of a multistage dealer circulation data processing method based on automatic check according to an embodiment of the present application;

FIG. 2 is a flowchart of a multistage dealer circulation data processing method based on automatic check according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a functional module architecture of a circulation data processing device according to an embodiment of the present application;

FIG. 4 is a schematic diagram of the composition of a check server according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.

In the following description reference is made to "some embodiments," "as one implementation/scheme," "in one implementation," which describe a subset of all possible embodiments, but it is to be understood that "some embodiments," "as one implementation/scheme," "in one implementation," can be the same subset or different subsets of all possible embodiments, and can be combined with each other without conflict.

In the following description, the terms "first", "second", "third", and the like are used merely to distinguish similar objects and do not represent a particular ordering of the objects. It is to be understood that, where permitted, "first", "second", "third", and the like may be interchanged in a particular order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.

The multistage dealer circulation data processing method based on automatic check provided by the embodiment of the application can be executed by a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.

FIG. 1 is a schematic diagram of a multistage dealer circulation data processing system based on automatic check provided in an embodiment of the present application. The multistage dealer circulation data processing system 10 based on automatic check includes a plurality of circulation terminals 100, a network 200 and a check server 300, and the plurality of circulation terminals 100 and the check server 300 are communicatively connected through the network 200. The check server 300 is used for executing the method provided in the embodiment of the present application. Specifically, the embodiment of the application provides a multistage dealer circulation data processing method based on automatic check, which is applied to the check server 300 and, as shown in FIG. 2, includes the following steps.

Step S110, a first circulation data learning sample is acquired.

In the embodiment of the application, the first circulation data learning sample has a plurality of circulation data clusters, and circulation data clusters with different levels of detail have different corresponding detail classifications. The first circulation data learning sample is sample data for training a saliency recognition algorithm; the saliency recognition algorithm is used for extracting compliance description features of circulation data according to the saliency weight obtained through adjustment in the training process, and the compliance description features are used for recognizing the compliance of the circulation data. The detail of a circulation data cluster represents how detailed the disclosed information is. The first circulation data learning sample can be decomposed into a plurality of circulation data clusters, and the circulation data clusters may be divided according to the circulation node of the commodity that generated the data, or according to the type of the data. When the division is based on the type of the data, the types may include circulation records of different links, such as commodity information, transportation information, dealer information, sales information, purchase information, inventory information, payment information and dealer relationship information.

The specific composition of these data types is exemplified as follows. The commodity information may include basic information such as commodity name, specification, price and date of manufacture. The transportation information may include the means of transportation, logistics company, shipping address, shipping cost, etc. The dealer information may include the dealer name, contact person, address, telephone number, email address, etc. The sales information may include the sales time, sales quantity, sales amount, sales channel, etc. The purchase information may include the purchase time, purchase quantity, purchase price, etc. The inventory information may include the inventory quantity, inventory location, inventory change records, etc. The payment information may include the payment mode, payment time, payment amount, etc. The dealer relationship information may include the partnership, agency and supply relationships between dealers. For continuous data, the value is recorded directly; discrete data can be encoded in any feasible encoding mode (such as one-hot encoding) to obtain numerical data, for example, the payment mode of a cash transaction is encoded as 10.
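
A minimal sketch of the one-hot encoding mentioned above follows. The list of payment modes is an assumption made for illustration; with two modes, a cash transaction becomes [1, 0], matching the "10" in the example.

```python
def one_hot(value, categories):
    """Encode a discrete value as a one-hot vector over the known categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical set of payment modes; the source only gives the cash example.
PAYMENT_MODES = ["cash", "bank_transfer"]

cash_code = one_hot("cash", PAYMENT_MODES)
transfer_code = one_hot("bank_transfer", PAYMENT_MODES)
```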

The plurality of circulation data clusters may include circulation data clusters with different levels of detail, and circulation data clusters with different levels of detail have different corresponding detail classifications. The detail classifications are classifications of the level of detail of the circulation data, each of which characterizes a corresponding level of detail, for example by a hierarchical classification such as primary, secondary and tertiary.

For example, the detail classifications of the circulation data include two types, a simple detail classification and a rich detail classification, where the level of detail characterized by the simple detail classification is lower than that characterized by the rich detail classification. A difference in the detail of circulation data clusters means a difference in their level of detail.

Step S120, detecting the plurality of circulation data clusters through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters; the saliency recognition algorithm has a saliency weight, according to which it determines that the saliency of the first circulation data cluster is higher than the saliency of the second circulation data cluster.

As an implementation manner, the embodiment of the application invokes the saliency recognition algorithm to detect the input circulation data (the first circulation data learning sample) cluster by cluster, according to the circulation data clusters into which the input circulation data is divided, so as to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters. The saliency recognition algorithm may have a saliency weight (i.e., a weight among the algorithm parameters), through which it determines that the saliency of the first circulation data cluster (i.e., the focus contribution of the first circulation data cluster) is higher than the saliency of the second circulation data cluster. In other words, the saliency that the saliency recognition algorithm gives to each circulation data cluster is determined by its saliency weight.

Using the saliency weight, the saliency recognition algorithm can perform feature learning on the first circulation data learning sample, cluster by cluster, to complete the detection and obtain the saliency score (also called a saliency weight) that the algorithm gives to each circulation data cluster of the first circulation data learning sample. Each circulation data cluster corresponds to one saliency score, which represents the saliency that the saliency recognition algorithm gives to that circulation data cluster: the higher the saliency score, the higher the saliency. The value of the saliency score is limited to between 0 and 1.

Based on this, the saliency recognition algorithm obtains, through the saliency weight, a saliency score for each circulation data cluster of the first circulation data learning sample. In other words, the saliency weight of the saliency recognition algorithm makes clear which circulation data clusters of the input circulation data (the first circulation data learning sample) receive more focus and which receive less.

As an embodiment, the value interval of the saliency score may be divided into scoring ranges according to the number of detail classifications of the circulation data, so that each detail classification corresponds to one scoring range. The higher the level of detail characterized by a detail classification, the higher the scores in its corresponding scoring range, and the union of all scoring ranges is the full value interval of the saliency score. For example, if the detail classifications of the circulation data include a simple detail classification and a rich detail classification, the value interval of the saliency score is divided into two scoring ranges, such as scoring range 1 (0, 0.48) and scoring range 2 (0.48, 1), where scoring range 1 corresponds to the simple detail classification and scoring range 2 corresponds to the rich detail classification. The saliency score of the first circulation data cluster and the saliency score of the second circulation data cluster fall in two different scoring ranges, and the saliency score of the first circulation data cluster is greater than that of the second; in other words, the scores in the scoring range to which the saliency score of the first circulation data cluster belongs are greater than the scores in the scoring range to which the saliency score of the second circulation data cluster belongs. Any two of the scoring ranges into which the value interval of the saliency score is divided may be regarded as a first scoring range and a second scoring range, where the scores in the first scoring range are greater than the scores in the second scoring range.

Then, among the plurality of circulation data clusters of the first circulation data learning sample, the present application determines a circulation data cluster whose saliency score lies within the first score range to be the first circulation data cluster, and a circulation data cluster whose saliency score lies within the second score range to be the second circulation data cluster. There is at least one first circulation data cluster and at least one second circulation data cluster.

Step S130, allocating, by the saliency recognition algorithm, first estimated annotation information to the first circulation data cluster according to the saliency of the first circulation data cluster, and allocating second estimated annotation information to the second circulation data cluster according to the saliency of the second circulation data cluster; the first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification; the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification; the first estimated detail classification represents a higher level of detail than the second estimated detail classification.

As an implementation manner, the saliency recognition algorithm allocates corresponding estimated annotation information to each circulation data cluster according to the score range to which the saliency score of that cluster belongs and the detail classification corresponding to that score range. The estimated annotation information is the estimated classification label of the corresponding circulation data cluster.
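The mapping from saliency score to score range to estimated annotation can be illustrated with a minimal Python sketch. The 0.48 cut point is the example value given above; the class names, the cluster identifiers, the function names, and the treatment of a score that lies exactly on a cut point are assumptions made only for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Assumption: interior cut points of the (0, 1) saliency interval. The single
# cut at 0.48 is the example value from the text.
BOUNDARIES: List[float] = [0.48]
# Assumption: class names ordered from lowest to highest level of detail.
DETAIL_CLASSES: List[str] = ["simple", "rich"]


def estimated_annotation(score: float) -> str:
    """Return the detail class whose score range contains `score`.

    A score exactly on a cut point is placed in the higher range; the text
    leaves this boundary case open.
    """
    return DETAIL_CLASSES[bisect_right(BOUNDARIES, score)]


def split_clusters(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split cluster ids into (first, second) groups for the two-range case.

    First clusters lie in the higher score range, second clusters in the lower.
    """
    first = [cid for cid, s in scores.items() if estimated_annotation(s) == "rich"]
    second = [cid for cid, s in scores.items() if estimated_annotation(s) == "simple"]
    return first, second


# Hypothetical saliency scores for three circulation data clusters.
scores = {"cluster_a": 0.91, "cluster_b": 0.12, "cluster_c": 0.55}
first, second = split_clusters(scores)
print("first clusters:", first)
print("second clusters:", second)
```

With more than two detail classifications, additional cut points would be appended to `BOUNDARIES` and the class list extended in the same order.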

In the embodiment of the present application, the saliency recognition algorithm allocates the first estimated annotation information to the first circulation data cluster according to the saliency score of that cluster, for example according to the detail classification corresponding to the score range to which the saliency score belongs. The first estimated annotation information characterizes that the first circulation data cluster belongs to the first estimated detail classification, which is the detail classification corresponding to that score range. The first estimated detail classification can therefore be regarded as the detail classification that the saliency recognition algorithm determines for the first circulation data cluster according to the saliency weight.

Likewise, the saliency recognition algorithm allocates the second estimated annotation information to the second circulation data cluster, for example according to the detail classification corresponding to the score range to which the saliency score of the second circulation data cluster belongs. The second estimated annotation information characterizes that the second circulation data cluster belongs to the second estimated detail classification, which is the detail classification corresponding to that score range and can be regarded as the detail classification that the saliency recognition algorithm determines for the second circulation data cluster according to the saliency weight. The first estimated detail classification represents a higher level of detail than the second estimated detail classification.

By allocating estimated annotation information (the first and second estimated annotation information) to the first and second circulation data clusters in this way, the circulation data clusters on which the saliency recognition algorithm focuses more receive estimated annotation information of a detail classification that represents a higher level of detail. The saliency weight of the saliency recognition algorithm is then adjusted on this basis, so that with the adjusted saliency weight the algorithm focuses more on the richer circulation data clusters in the input circulation data and less on the simpler ones.

Step S140, obtaining first contrast annotation information of the first circulation data cluster and second contrast annotation information of the second circulation data cluster; the first contrast annotation information characterizes that the first circulation data cluster belongs to a first contrast detail classification; the second contrast annotation information characterizes that the second circulation data cluster belongs to a second contrast detail classification.

In one embodiment, the contrast annotation information of the first circulation data cluster is obtained as the first contrast annotation information, and the contrast annotation information of the second circulation data cluster is obtained as the second contrast annotation information. The first contrast annotation information may be annotation information of the actual detail classification of the first circulation data cluster, and the second contrast annotation information may be annotation information of the actual detail classification of the second circulation data cluster. That is, the first contrast annotation information characterizes that the first circulation data cluster belongs to the first contrast detail classification, and the second contrast annotation information characterizes that the second circulation data cluster belongs to the second contrast detail classification. The first and second contrast detail classifications may each be one of the plurality of detail classifications set for the circulation data.

As an implementation manner, the first contrast annotation information of the first circulation data cluster and the second contrast annotation information of the second circulation data cluster are obtained through a debugged detail identification algorithm. Specifically, the debugged detail identification algorithm identifies the level of detail of the first circulation data cluster to obtain the first contrast detail classification, that is, the actual detail classification of the first circulation data cluster as estimated by the debugged detail identification algorithm. Similarly, the debugged detail identification algorithm identifies the level of detail of the second circulation data cluster to obtain the second contrast detail classification, that is, the actual detail classification of the second circulation data cluster as estimated by the debugged detail identification algorithm. The first contrast annotation information is then allocated to the first circulation data cluster according to the first contrast detail classification, and the second contrast annotation information is allocated to the second circulation data cluster according to the second contrast detail classification.

For example, the debugged detail identification algorithm may be obtained as follows. A third circulation data learning sample and a detail identification algorithm to be debugged are obtained; the third circulation data learning sample carries detail annotation information that characterizes its true detail classification. As an embodiment, there may be a plurality of third circulation data learning samples, covering all of the preconfigured detail classifications of the circulation data, for debugging the detail identification algorithm to be debugged. The detail identification algorithm to be debugged identifies the detail classification of each third circulation data learning sample to obtain an estimated detail classification for that sample. The algorithm parameters of the detail identification algorithm to be debugged are then adjusted according to the difference between the true detail classification and the estimated detail classification of the third circulation data learning sample (such as their cross-entropy loss), yielding the debugged detail identification algorithm.

In addition, the first contrast annotation information of the first circulation data cluster and the second contrast annotation information of the second circulation data cluster can be obtained through a debugged data clustering algorithm. The debugged data clustering algorithm is an algorithm, obtained by debugging, that segments the input circulation data into parts with different detail classifications. As an implementation manner, it can be any segmentation algorithm, such as a random segmentation algorithm, a hierarchical segmentation algorithm, or a time-series segmentation algorithm. The debugged data clustering algorithm performs clustering on the first circulation data learning sample to obtain a plurality of data clusters, each of which is a portion of the circulation data in the first circulation data learning sample. Each data cluster corresponds to one detail classification, which the debugged data clustering algorithm identifies when it segments out that data cluster.

Further, among the plurality of data clusters, the data cluster with the largest intersection over union (IoU) with the first circulation data cluster, that is, the largest overlap of circulation data, is determined to be the first detail matching data cluster of the first circulation data cluster. The first contrast annotation information is then allocated to the first circulation data cluster according to the detail classification corresponding to the first detail matching data cluster, so that the first contrast detail classification characterized by the first contrast annotation information is that detail classification. Similarly, the data cluster with the largest IoU with the second circulation data cluster is determined to be the second detail matching data cluster of the second circulation data cluster, and the second contrast annotation information is allocated to the second circulation data cluster according to the detail classification corresponding to the second detail matching data cluster, so that the second contrast detail classification characterized by the second contrast annotation information is that detail classification.
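The IoU-based matching can be sketched as follows, assuming (for illustration only) that each cluster is represented as a set of record identifiers and that each segmented data cluster is paired with its detail classification. The function names and the sample data are hypothetical.

```python
from typing import List, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two sets of record ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def contrast_detail_class(
    cluster: Set[int], segments: List[Tuple[Set[int], str]]
) -> str:
    """Return the detail class of the segment with the largest IoU.

    `segments` holds (record ids, detail class) pairs, standing in for the
    output of the debugged data clustering algorithm.
    """
    best_ids, best_class = max(segments, key=lambda seg: iou(cluster, seg[0]))
    return best_class


# Hypothetical segmentation of one learning sample into two data clusters.
segments = [({1, 2}, "rich"), ({3, 4, 5}, "simple")]
first_cluster = {1, 2, 3}
print(contrast_detail_class(first_cluster, segments))
```

Here the first segment overlaps the cluster with IoU 2/3 and the second with IoU 1/5, so the cluster takes the detail classification of the first segment.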

As an embodiment, contrast annotation information may be allocated in advance to each circulation data cluster in the first circulation data learning sample in either of the two ways above (the debugged detail identification algorithm or the debugged data clustering algorithm), so that the contrast annotation information of the first and second circulation data clusters (the first and second contrast annotation information) can be obtained directly.

The data clustering algorithm is debugged through the following steps:

S1, if the detail classifications of the circulation data include a rich detail classification and a simple detail classification, preparing a circulation data learning sample with rich annotation information and simple annotation information, wherein the rich annotation information is the annotation information of the rich detail classification, assigned to the rich circulation data in the learning sample, and the simple annotation information is the annotation information of the simple detail classification, assigned to the simple circulation data in the learning sample;

S2, constructing a data clustering algorithm, for example by initializing a convolutional neural network based on the U-Net architecture as the data clustering algorithm;

S3, constructing an error function (such as a cross-entropy function) for the data clustering algorithm;

S4, obtaining the debugging sample (a circulation data learning sample) used to debug the data clustering algorithm in the current round;

S5, inputting the debugging sample into the data clustering algorithm for forward propagation, and predicting the rich data clusters and simple data clusters in the circulation data learning sample;

S6, determining the error of the data clustering algorithm from the difference between the predicted rich data clusters and the actual rich data clusters indicated by the rich annotation information, and the difference between the predicted simple data clusters and the actual simple data clusters indicated by the simple annotation information;

S7, performing gradient-based optimization of the data clustering algorithm according to the error, and updating the algorithm parameters of the data clustering algorithm;

S8, repeating S4 to S7 until the algorithm converges, wherein the algorithm is considered to have converged when the number of debugging rounds reaches a preset maximum or the error of the algorithm is smaller than an error threshold;

S9, taking the data clustering algorithm obtained when debugging stops as the debugged data clustering algorithm.
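The control flow of the steps above (forward pass, error, gradient update, and the two stopping conditions) can be sketched in plain Python. To keep the example self-contained, a one-feature logistic classifier is swapped in for the U-Net style network; this substitution, the learning rate, the thresholds, and the toy data are assumptions for illustration and are not taken from the application.

```python
import math
from typing import List, Tuple


def debug_classifier(
    samples: List[float],
    labels: List[int],            # 1 = rich, 0 = simple (hypothetical encoding)
    lr: float = 0.5,
    max_rounds: int = 1000,       # preset maximum number of debugging rounds
    error_threshold: float = 0.05,
) -> Tuple[float, float, float, int]:
    """Gradient-descent loop with the two stopping conditions of step S8.

    A logistic classifier stands in for the segmentation network.
    Returns (weight, bias, last error, rounds used).
    """
    w, b = 0.0, 0.0
    n = len(samples)
    error = float("inf")
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        # S5: forward propagation
        probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in samples]
        # S6: mean binary cross-entropy error (S3 defines the error function)
        error = -sum(
            y * math.log(p + 1e-12) + (1 - y) * math.log(1.0 - p + 1e-12)
            for p, y in zip(probs, labels)
        ) / n
        # S8: stop early once the error falls below the threshold
        if error < error_threshold:
            break
        # S7: gradient step on the algorithm parameters
        grad_w = sum((p - y) * x for p, y, x in zip(probs, labels, samples)) / n
        grad_b = sum(p - y for p, y in zip(probs, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, error, round_no


w, b, error, rounds = debug_classifier([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
print(f"stopped after {rounds} rounds with error {error:.4f}")
```

If the error threshold is never reached, the loop simply ends at `max_rounds`, which corresponds to the other convergence condition named in S8.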

Step S150, adjusting the saliency weight according to the difference between the first estimated annotation information and the first contrast annotation information and the difference between the second estimated annotation information and the second contrast annotation information; the saliency recognition algorithm is used for extracting compliance description features of the circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data.

As an embodiment, the saliency weight of the saliency recognition algorithm may be adjusted according to the difference between the first estimated annotation information and the first contrast annotation information and the difference between the second estimated annotation information and the second contrast annotation information. The former difference may be the cross-entropy error between the first estimated annotation information and the first contrast annotation information, and the latter may be the cross-entropy error between the second estimated annotation information and the second contrast annotation information.
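As a rough illustration of these two cross-entropy errors, the sketch below assumes that each estimated annotation is available as a probability distribution over the detail classifications and that each contrast annotation is a class index. Adding the two errors into one value is also an assumption; the text states only that both differences are used to adjust the saliency weight.

```python
import math
from typing import Sequence


def cross_entropy(estimated: Sequence[float], contrast_index: int) -> float:
    """Cross-entropy between an estimated distribution and a one-hot label."""
    # Clamp to avoid log(0) when the estimated probability is exactly zero.
    return -math.log(max(estimated[contrast_index], 1e-12))


def saliency_weight_error(
    first_estimated: Sequence[float],
    first_contrast: int,
    second_estimated: Sequence[float],
    second_contrast: int,
) -> float:
    """Combined error for the first and second circulation data clusters.

    Assumption: the two cross-entropy errors are simply summed.
    """
    return cross_entropy(first_estimated, first_contrast) + cross_entropy(
        second_estimated, second_contrast
    )


# Hypothetical distributions over (simple, rich); index 1 = rich, 0 = simple.
error = saliency_weight_error([0.2, 0.8], 1, [0.9, 0.1], 0)
print(f"combined error: {error:.4f}")
```

The error shrinks as the estimated annotations agree more closely with the contrast annotations, which is the signal used to adjust the saliency weight.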

In the embodiment of the present application, when the saliency weight is adjusted, the first circulation data cluster has both first estimated annotation information and first contrast annotation information, and the second circulation data cluster has both second estimated annotation information and second contrast annotation information. The difference between the first estimated annotation information and the first contrast annotation information is obtained, as is the difference between the second estimated annotation information and the second contrast annotation information. The saliency weight of the saliency recognition algorithm is then adjusted according to the cross-entropy error between the first estimated annotation information and the first contrast annotation information and the cross-entropy error between the second estimated annotation information and the second contrast annotation information, yielding the adjusted saliency weight. With the adjusted saliency weight, the saliency recognition algorithm focuses more on the circulation data clusters with a higher level of detail in the input circulation data and less on those with a lower level of detail. The compliance description features that the saliency recognition algorithm extracts with the adjusted saliency weight are therefore more accurate and reliable, and so is the compliance recognition based on them.

In summary, the application acquires a first circulation data learning sample that has a plurality of circulation data clusters, where clusters with different levels of detail have different detail classifications. The saliency recognition algorithm, which has a saliency weight, detects the plurality of circulation data clusters to obtain a first circulation data cluster and a second circulation data cluster, the saliency of the first being determined, according to the saliency weight, to be higher than that of the second. The saliency recognition algorithm then allocates first estimated annotation information to the first circulation data cluster and second estimated annotation information to the second circulation data cluster according to their respective saliency. The first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification, the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification, and the first estimated detail classification represents a higher level of detail than the second. First contrast annotation information of the first circulation data cluster and second contrast annotation information of the second circulation data cluster are also obtained, characterizing that the two clusters belong to a first contrast detail classification and a second contrast detail classification respectively. The saliency weight is then adjusted according to the difference between the first estimated annotation information and the first contrast annotation information and the difference between the second estimated annotation information and the second contrast annotation information. The saliency recognition algorithm extracts compliance description features of the circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data.

On this basis, the saliency recognition algorithm allocates estimated annotation information to the circulation data clusters of the first circulation data learning sample such that clusters with higher saliency receive a detail classification representing a higher level of detail. The saliency weight can therefore be adjusted through the difference between the real annotation information (such as the contrast annotation information) and the estimated annotation information of the circulation data clusters. With the adjusted saliency weight, the saliency recognition algorithm allocates higher saliency to circulation data clusters with a higher level of detail and lower saliency to those with a lower level of detail, so that the data features of the more detailed clusters are mined more comprehensively, the compliance description features of the circulation data are more accurate and reliable, and the recognition of the circulation data based on those features is more accurate and reliable.

The following describes the process of debugging the compliance recognition algorithm, which specifically includes the following steps:

Step S210, acquiring a second circulation data learning sample.

The second circulation data learning sample includes circulation annotation information that characterizes the compliance result of the second circulation data learning sample. The compliance result may be binary, that is, compliant or non-compliant, or may be further refined by type of non-compliance, such as payment non-compliance, transportation non-compliance, or inventory record non-compliance.

The saliency recognition algorithm can be embedded in the compliance recognition algorithm. The saliency weight adjusted through the above process can be fixed in the compliance recognition algorithm, and it does not need to be adjusted again during the subsequent debugging of the compliance recognition algorithm.

That is, the above process is a preprocessing stage that adjusts the saliency weight of the saliency recognition algorithm within the compliance recognition algorithm. Once the saliency weight has been adjusted, the compliance recognition algorithm can be debugged as a whole; at that stage only the algorithm parameters other than the adjusted saliency weight are updated, and the adjusted saliency weight itself is left unchanged.
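One simple way to picture this fixing of the saliency weight is a parameter update that skips any parameter marked as frozen. The sketch below is an assumption-laden illustration: the parameter names, the scalar parameters, and the plain gradient-descent update are hypothetical and not specified by the application.

```python
from typing import Dict, Set


def update_parameters(
    params: Dict[str, float],
    grads: Dict[str, float],
    frozen: Set[str],
    lr: float = 0.1,
) -> Dict[str, float]:
    """Apply one gradient step to every parameter that is not frozen."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


# Hypothetical parameters of the compliance recognition algorithm.
params = {"saliency_weight": 0.7, "feature_weight": 0.2, "branch_weight": -0.4}
grads = {"saliency_weight": 1.0, "feature_weight": 1.0, "branch_weight": -2.0}

updated = update_parameters(params, grads, frozen={"saliency_weight"})
print(updated)
```

The frozen saliency weight passes through unchanged while the remaining parameters move against their gradients.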

Step S220, performing feature extraction on the second circulation data learning sample through the saliency recognition algorithm according to the adjusted saliency weight, to obtain the sample compliance description feature of the second circulation data learning sample.

As an embodiment, the saliency recognition algorithm may be a branch algorithm (also called an operator) of the compliance recognition algorithm that performs feature mining on the input circulation data. In addition to the adjusted saliency weight, its algorithm parameters include other parameters used for feature extraction on the input circulation data. Feature extraction is then performed on the second circulation data learning sample (that is, the input circulation data of the saliency recognition algorithm) according to the adjusted saliency weight and these remaining parameters, yielding the sample compliance description feature of the second circulation data learning sample.

As an embodiment, there may be a plurality of second circulation data learning samples corresponding to the circulation records of different links (that is, the record information generated by the circulation nodes involved in the circulation process), with one circulation record corresponding to one second circulation data learning sample. In this case, the sample compliance description feature is obtained as follows. The saliency recognition algorithm performs feature extraction on each second circulation data learning sample according to the adjusted saliency weight and the remaining feature-extraction parameters, yielding one sample compliance description feature per second circulation data learning sample. Feature interaction is then performed on these per-sample features (for example, feature vector fusion by a vector operation such as addition, end-to-end concatenation, or multiplication) to obtain the final sample compliance description feature, which fuses the features of the circulation data recorded by the multiple circulation nodes.
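The three fusion operations mentioned above can be sketched on plain Python lists. The function name, the `mode` argument, and the sample vectors are hypothetical; addition and multiplication are taken element-wise here, which is an assumption because the text names the operations without further detail.

```python
from typing import List


def fuse_features(features: List[List[float]], mode: str = "concat") -> List[float]:
    """Fuse per-record feature vectors into one description feature.

    mode = "concat":   end-to-end concatenation
    mode = "add":      element-wise addition (vectors of equal length)
    mode = "multiply": element-wise multiplication (vectors of equal length)
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    if mode == "concat":
        return [value for vector in features for value in vector]
    if mode == "add":
        return [sum(values) for values in zip(*features)]
    if mode == "multiply":
        fused = list(features[0])
        for vector in features[1:]:
            fused = [a * b for a, b in zip(fused, vector)]
        return fused
    raise ValueError(f"unknown fusion mode: {mode}")


# Hypothetical features extracted from two circulation records.
record_features = [[1.0, 2.0], [3.0, 4.0]]
for mode in ("concat", "add", "multiply"):
    print(mode, fuse_features(record_features, mode))
```

Concatenation preserves which record each value came from at the cost of a longer vector, while addition and multiplication keep the dimension fixed regardless of how many records are fused.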

Step S230, performing compliance recognition on the second circulation data learning sample through the recognition branch operator according to the sample compliance description feature, to obtain a compliance estimation result of the second circulation data learning sample.

As an embodiment, the compliance recognition algorithm may further include a recognition branch operator, which identifies the compliance type of the circulation data. Its output may be a probability value or a confidence level. As an embodiment, the recognition branch operator may be a fully connected (affine) network layer.

The compliance recognition algorithm then performs compliance recognition on the second circulation data learning sample through the recognition branch operator, based on the sample compliance description feature, to obtain the compliance estimation result of the second circulation data learning sample. The compliance estimation result may be the distribution of probabilities or confidences over the compliance types predicted for the second circulation data learning sample, and may be represented as a vector.
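A fully connected layer that turns a description feature into a probability vector over compliance types can be sketched as below. The softmax normalization, the weight values, and the two-type setup are assumptions for illustration; the text states only that the output may be probabilities or confidences represented as a vector.

```python
import math
from typing import List


def affine(features: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output logit per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum logit first)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_compliance(
    features: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Return a probability vector over the compliance types."""
    return softmax(affine(features, weights, bias))


# Hypothetical two-type case: index 0 = compliant, index 1 = non-compliant.
weights = [[0.8, -0.3], [-0.5, 0.6]]
bias = [0.1, -0.1]
print(predict_compliance([1.0, 2.0], weights, bias))
```

The resulting vector sums to one, so it can be compared directly with a one-hot compliance result when computing a cross-entropy error.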

Step S240, adjusting the algorithm parameters of the compliance recognition algorithm other than the fixed, adjusted saliency weight according to the difference between the compliance result and the compliance estimation result, to obtain the debugged compliance recognition algorithm.

As an implementation manner, the remaining algorithm parameters of the compliance recognition algorithm, that is, all parameters other than the fixed saliency weight (such as the parameters of the saliency recognition algorithm used for feature extraction on the circulation data, apart from the adjusted saliency weight, and the parameters of the recognition branch operator), can be adjusted according to the difference between the compliance result and the compliance estimation result of the second circulation data learning sample, to obtain the debugged compliance recognition algorithm. As above, the difference may be the cross-entropy error between the two, with the compliance result represented as a vector.

For example, the above process is iterated on the compliance recognition algorithm, and debugging stops when the algorithm parameters (other than the fixed saliency weight) converge. Other stopping conditions are also possible, for example the number of iterations reaching the maximum number of debugging rounds. The compliance recognition algorithm obtained at that point is taken as the debugged compliance recognition algorithm. The saliency recognition algorithm within it is the debugged saliency recognition algorithm, and the recognition branch operator within it is the debugged recognition branch operator.

To sum up, in the debugging of the compliance recognition algorithm, the saliency weight of the saliency recognition algorithm in the compliance recognition algorithm is first adjusted through the first circulation data learning sample. Then, the algorithm parameters in the compliance recognition algorithm other than the adjusted saliency weight are adjusted through the second circulation data learning sample to obtain the debugged compliance recognition algorithm. The debugged compliance recognition algorithm obtained through this process has the debugged saliency recognition algorithm and the debugged recognition branch operator. The compliance of circulation data can then be accurately recognized through the debugged compliance recognition algorithm. Specifically: circulation data to be verified for compliance is obtained, and feature extraction is performed on it through the debugged saliency recognition algorithm to obtain target compliance description features of the circulation data to be verified for compliance. The target compliance description features are obtained in the same way as the sample compliance description features, which may include: performing feature extraction on each piece of circulation data to be verified for compliance through the debugged saliency recognition algorithm to obtain sample compliance description features of each piece of circulation data, and then performing feature interaction (such as end-to-end connection or summation) on the sample compliance description features of the plurality of pieces of circulation data to be verified for compliance to obtain the target compliance description features. Because the debugged saliency recognition algorithm includes the adjusted saliency weight, it can use the adjusted saliency weight to focus on the circulation data clusters with richer detail among all the circulation data, so that more accurate target compliance description features of the circulation data to be verified for compliance can be obtained, which improves the accuracy and reliability of compliance verification.
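The per-record feature extraction and the subsequent feature interaction (end-to-end connection or summation) can be sketched as follows. This is only an illustration under stated assumptions: the application does not define how the saliency weight enters feature extraction, so the normalized weighted sum in `extract_features` is hypothetical, as are the function names and the representation of each cluster as a numeric vector.

```python
def extract_features(cluster_vectors, saliency):
    """Hypothetical feature extraction for one circulation record.

    Each circulation data cluster is assumed to be a numeric vector of equal
    length; the record feature is the saliency-weighted average of them.
    """
    total = sum(saliency)
    if total <= 0:
        raise ValueError("saliency values must have a positive sum")
    dim = len(cluster_vectors[0])
    feature = [0.0] * dim
    for vector, weight in zip(cluster_vectors, saliency):
        for j, value in enumerate(vector):
            feature[j] += (weight / total) * value
    return feature


def interact_features(per_record_features, mode="concat"):
    """Combine per-record features into one target feature vector."""
    if not per_record_features:
        raise ValueError("at least one record feature is required")
    if mode == "concat":
        # End-to-end connection: the output length grows with the record count.
        return [v for feature in per_record_features for v in feature]
    if mode == "sum":
        # Element-wise summation: all features must share one length.
        length = len(per_record_features[0])
        if any(len(f) != length for f in per_record_features):
            raise ValueError("summation requires equal-length features")
        return [sum(column) for column in zip(*per_record_features)]
    raise ValueError(f"unknown interaction mode: {mode}")
```

End-to-end connection preserves which record each value came from but ties the output length to the number of records, whereas summation yields a fixed-length vector regardless of how many records are combined.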

Based on the foregoing embodiments, the present embodiment provides a circulation data processing device. Fig. 3 shows a circulation data processing device 350 provided in the embodiment of the present application. As shown in fig. 3, the device 350 includes:

a training sample acquisition module 351, configured to acquire a first circulation data learning sample; the first circulation data learning sample has a plurality of circulation data clusters, and circulation data clusters with different details correspond to different detail classifications;

a saliency determination module 352, configured to detect the plurality of circulation data clusters through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters; the saliency recognition algorithm has a saliency weight, and the saliency recognition algorithm determines, according to the saliency weight, that the saliency of the first circulation data cluster is higher than the saliency of the second circulation data cluster;

an annotation information distribution module 353, configured to assign, through the saliency recognition algorithm, first estimated annotation information to the first circulation data cluster according to the saliency of the first circulation data cluster, and second estimated annotation information to the second circulation data cluster according to the saliency of the second circulation data cluster; the first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification; the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification; the degree of detail characterized by the first estimated detail classification is higher than that characterized by the second estimated detail classification;

a comparison annotation acquisition module 354, configured to acquire first comparison annotation information of the first circulation data cluster and second comparison annotation information of the second circulation data cluster; the first comparison annotation information characterizes that the first circulation data cluster belongs to a first comparison detail classification; the second comparison annotation information characterizes that the second circulation data cluster belongs to a second comparison detail classification;

an algorithm parameter adjustment module 355, configured to adjust the saliency weight according to the difference between the first estimated annotation information and the first comparison annotation information and the difference between the second estimated annotation information and the second comparison annotation information; the saliency recognition algorithm is used for extracting compliance description features of circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data.
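The flow through modules 352 to 355 (score the clusters, split them by score range, assign estimated annotations, and adjust the saliency weight from the difference with the comparison annotations) can be sketched as below. The application does not disclose the scoring function, the score ranges, or the update rule, so the linear score, the single threshold, the logistic-loss gradient step, and every function and parameter name here are assumptions for illustration only.

```python
import math


def saliency_scores(weights, clusters):
    """Hypothetical linear saliency score of each cluster vector."""
    return [sum(w * x for w, x in zip(weights, c)) for c in clusters]


def estimate_labels(scores, threshold=0.0):
    """Assign estimated annotations from the score range.

    Scores at or above `threshold` fall in the (assumed) first scoring range
    and get the higher-detail class 1; the rest get the lower-detail class 0.
    """
    return [1 if s >= threshold else 0 for s in scores]


def adjust_saliency_weights(weights, clusters, comparison_labels, lr=0.1):
    """One gradient step reducing the estimated/comparison annotation gap."""
    grad = [0.0] * len(weights)
    for cluster, label in zip(clusters, comparison_labels):
        score = sum(w * x for w, x in zip(weights, cluster))
        predicted = 1.0 / (1.0 + math.exp(-score))
        for j, x in enumerate(cluster):
            grad[j] += (predicted - label) * x
    n = len(clusters)
    return [w - lr * g / n for w, g in zip(weights, grad)]


# Usage: two clusters, the first annotated as higher-detail (1) by comparison.
clusters = [[1.0, 0.0], [0.0, 1.0]]
comparison = [1, 0]
weights = adjust_saliency_weights([0.0, 0.0], clusters, comparison)
estimated = estimate_labels(saliency_scores(weights, clusters))
```

In this toy case a single update is enough for the estimated annotations to agree with the comparison annotations; real data would generally need repeated updates.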

The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the device embodiments of the present application, please refer to the description of the method embodiments of the present application for understanding.

If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs individuals of the personal information processing rules and obtains their voluntary consent before processing the personal information. If the technical solution involves sensitive personal information, a product applying the technical solution obtains the individual's separate consent before processing the sensitive personal information, also satisfies the requirement of "explicit consent", and collects the information only within the scope permitted by laws and regulations. For example, a clear and conspicuous sign is placed at a personal information collection device such as a camera to inform individuals that they are entering the collection range and that personal information will be collected; an individual who voluntarily enters the collection range is deemed to consent to the collection. Alternatively, on the device that processes personal information, where the personal information processing rules are communicated by means of a conspicuous sign or notice, the individual's authorization is obtained through a pop-up message or by asking the individual to upload the personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.

It should be noted that, in the embodiments of the present application, if the method is implemented in the form of a software functional module and sold or used as a separate product, it may also be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program codes. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.

An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor implements the method when executing the computer program.

The present embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method. The computer readable storage medium may be transitory or non-transitory.

Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

It should be noted that fig. 4 is a schematic diagram of a hardware entity of a check server according to an embodiment of the present application. As shown in fig. 4, the hardware entity of the check server 300 includes a processor 310, a communication interface 320, and a memory 330. The processor 310 generally controls the overall operation of the check server 300. The communication interface 320 enables the check server 300 to communicate with other terminals or servers over a network. The memory 330 is configured to store instructions and applications executable by the processor 310, and may also cache data (for example, circulation data) that is to be processed or has been processed by the processor 310 and by each module in the check server 300; the memory 330 may be implemented by flash memory (FLASH) or random access memory (Random Access Memory, RAM). Data may be transferred between the processor 310, the communication interface 320, and the memory 330 via a bus 340. It should be noted here that the description of the storage medium and device embodiments above is similar to that of the method embodiments, with similar advantageous effects. For technical details not disclosed in the storage medium and device embodiments of the present application, please refer to the description of the method embodiments of the present application.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.

Alternatively, the integrated units described above may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, including several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application.

Claims

1. A method for processing multi-level dealer circulation data based on automatic verification, the method comprising:

acquiring a first circulation data learning sample; the first circulation data learning sample has a plurality of circulation data clusters, and circulation data clusters with different details correspond to different detail classifications; the circulation data clusters are obtained by division according to the data type of the commodity, wherein the data types comprise circulation records of commodity information, transportation information, dealer information, sales information, purchasing information, inventory information, payment information and dealer relation information; the commodity information comprises commodity name, specification, price and production date; the transportation information comprises transportation mode, logistics company, delivery address and freight; the dealer information comprises dealer names, contacts, addresses, telephones and mailboxes; the sales information comprises sales time, sales quantity, sales amount and sales channel; the purchasing information comprises purchase time, purchase quantity and purchase price; the inventory information comprises inventory quantity, inventory position and inventory change record; the payment information comprises payment mode, payment time and payment amount; the dealer relation information comprises cooperation relations, agency relations and supply relations among dealers; wherein continuous data is recorded directly as numerical values, and discrete data is encoded to obtain numerical data;

detecting the plurality of circulation data clusters through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster among the plurality of circulation data clusters; the saliency recognition algorithm has a saliency weight, and the saliency recognition algorithm determines, according to the saliency weight, that the saliency of the first circulation data cluster is higher than the saliency of the second circulation data cluster;

assigning, through the saliency recognition algorithm, first estimated annotation information to the first circulation data cluster according to the saliency of the first circulation data cluster, and second estimated annotation information to the second circulation data cluster according to the saliency of the second circulation data cluster; the first estimated annotation information characterizes that the first circulation data cluster belongs to a first estimated detail classification; the second estimated annotation information characterizes that the second circulation data cluster belongs to a second estimated detail classification; the degree of detail characterized by the first estimated detail classification is higher than that characterized by the second estimated detail classification;

acquiring first comparison annotation information of the first circulation data cluster and second comparison annotation information of the second circulation data cluster; the first comparison annotation information characterizes that the first circulation data cluster belongs to a first comparison detail classification; the second comparison annotation information characterizes that the second circulation data cluster belongs to a second comparison detail classification; wherein the acquiring the first comparison annotation information of the first circulation data cluster and the second comparison annotation information of the second circulation data cluster comprises: performing compliance recognition on the details of the first circulation data cluster through a debugged detail recognition algorithm to obtain the first comparison detail classification of the first circulation data cluster; performing compliance recognition on the details of the second circulation data cluster through the debugged detail recognition algorithm to obtain the second comparison detail classification of the second circulation data cluster; assigning the first comparison annotation information to the first circulation data cluster according to the first comparison detail classification, and assigning the second comparison annotation information to the second circulation data cluster according to the second comparison detail classification; or, the acquiring the first comparison annotation information of the first circulation data cluster and the second comparison annotation information of the second circulation data cluster comprises: clustering the first circulation data learning sample through a debugged data clustering algorithm to obtain a plurality of data clusters of the first circulation data learning sample, wherein one data cluster corresponds to one detail classification; determining, among the plurality of data clusters, the data cluster with the largest intersection ratio with the first circulation data cluster as a first detail matching data cluster of the first circulation data cluster, and assigning the first comparison annotation information to the first circulation data cluster according to the detail classification corresponding to the first detail matching data cluster; and determining, among the plurality of data clusters, the data cluster with the largest intersection ratio with the second circulation data cluster as a second detail matching data cluster of the second circulation data cluster, and assigning the second comparison annotation information to the second circulation data cluster according to the detail classification corresponding to the second detail matching data cluster; the first comparison detail classification is the detail classification corresponding to the first detail matching data cluster, and the second comparison detail classification is the detail classification corresponding to the second detail matching data cluster;

adjusting the saliency weight according to the difference between the first estimated annotation information and the first comparison annotation information and the difference between the second estimated annotation information and the second comparison annotation information; the saliency recognition algorithm is used for extracting compliance description features of circulation data according to the adjusted saliency weight, and the compliance description features are used for recognizing the compliance of the circulation data;

the detecting the plurality of circulation data clusters through a saliency recognition algorithm to obtain a first circulation data cluster and a second circulation data cluster comprises the following steps:

determining a saliency score of each of the plurality of circulation data clusters according to the saliency weight through the saliency recognition algorithm; the saliency score of a circulation data cluster represents the saliency that the saliency recognition algorithm assigns to the circulation data cluster;

determining a circulation data cluster whose saliency score is within a first scoring range among the plurality of circulation data clusters as the first circulation data cluster, and determining a circulation data cluster whose saliency score is within a second scoring range among the plurality of circulation data clusters as the second circulation data cluster; wherein the scores in the first scoring range are greater than the scores in the second scoring range;

the saliency recognition algorithm is embedded in a compliance recognition algorithm, the adjusted saliency weight is fixed in the compliance recognition algorithm, and the compliance recognition algorithm further has a recognition branch operator; the method further comprises:

obtaining a second circulation data learning sample, the second circulation data learning sample comprising circulation annotation information characterizing a compliance result of the second circulation data learning sample;

extracting features of the second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain sample compliance description features of the second circulation data learning sample;

performing compliance recognition on the second circulation data learning sample according to the sample compliance description features through the recognition branch operator to obtain a compliance estimation result of the second circulation data learning sample;

and adjusting the algorithm parameters in the compliance recognition algorithm other than the fixed adjusted saliency weight according to the difference between the compliance result and the compliance estimation result to obtain a debugged compliance recognition algorithm.

2. The method of claim 1, wherein there are a plurality of second circulation data learning samples, and the plurality of second circulation data learning samples correspond to circulation records of different links; the extracting features of the second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain sample compliance description features of the second circulation data learning sample comprises:

extracting features of each second circulation data learning sample according to the adjusted saliency weight through the saliency recognition algorithm to obtain sample compliance description features of each second circulation data learning sample;

and performing feature interaction on the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features.

3. The method of claim 2, wherein the performing feature interaction on the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features comprises:

connecting the sample compliance description features of the plurality of second circulation data learning samples end to end to obtain the sample compliance description features;

or,

summing the sample compliance description features of the plurality of second circulation data learning samples to obtain the sample compliance description features.

4. The method of claim 1, wherein the debugged compliance recognition algorithm has a debugged saliency recognition algorithm and a debugged recognition branch operator; the method further comprises:

acquiring circulation data to be verified for compliance;

extracting features of the circulation data through the debugged saliency recognition algorithm to obtain target compliance description features of the circulation data;

and performing compliance recognition on the circulation data according to the target compliance description features through the debugged recognition branch operator to obtain a compliance recognition result of the circulation data.

5. The method of claim 4, wherein the circulation data comprises a plurality of circulation records corresponding to different links; the extracting features of the circulation data through the debugged saliency recognition algorithm to obtain target compliance description features of the circulation data comprises:

extracting features of each piece of circulation data through the debugged saliency recognition algorithm to obtain sample compliance description features of each piece of circulation data;

and performing feature interaction on the sample compliance description features of the plurality of pieces of circulation data to obtain the target compliance description features.

6. The method of claim 1, wherein the method further comprises:

acquiring a third circulation data learning sample and a detail recognition algorithm to be debugged; the third circulation data learning sample has detail annotation information characterizing a real detail classification of the third circulation data learning sample;

performing compliance recognition on the detail classification of the third circulation data learning sample through the detail recognition algorithm to be debugged to obtain an estimated detail classification of the third circulation data learning sample;

and adjusting algorithm parameters of the detail recognition algorithm to be debugged according to the difference between the real detail classification and the estimated detail classification to obtain the debugged detail recognition algorithm.

7. A multistage dealer circulation data processing system based on automatic check, characterized by comprising a check server and a circulation terminal in communication connection with the check server, wherein the check server comprises a memory and a processor, the memory stores a computer program capable of running on the processor, and the processor realizes the method of any one of claims 1-6 when executing the computer program.