WO2020062702A1 - Method, apparatus, computer device, and storage medium for sending short messages - Google Patents

Method, apparatus, computer device, and storage medium for sending short messages

Info

Publication number
WO2020062702A1
Authority
WO
WIPO (PCT)
Prior art keywords
invoice data
cluster
model
offset
real
Prior art date
Application number
PCT/CN2019/070119
Other languages
English (en)
French (fr)
Other versions
WO2020062702A8 (zh)
WO2020062702A9 (zh)
Inventor
夏良超
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020062702A1
Publication of WO2020062702A9
Publication of WO2020062702A8

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof

Definitions

  • the present application relates to a method, device, computer equipment, and storage medium for monitoring enterprise invoice data.
  • the lender will analyze the loan qualification of the enterprise based on various data to provide loans to the enterprise. However, the loan is a long-term cooperation process. If the company's operating conditions are not good, the company's ability to repay may be reduced. Considering the risk, the lender will reduce the loan amount of the company or stop lending to the company.
  • the invoice data reflects the operating status of the enterprise to a certain extent. By analyzing the invoice data of the enterprise, the operating status of the enterprise can be obtained.
  • a method, a device, a computer device, and a storage medium for monitoring enterprise invoice data are provided.
  • An enterprise invoice data monitoring method includes:
  • the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • An enterprise invoice data monitoring device includes:
  • a data acquisition module for acquiring real-time invoice data to be monitored
  • An offset calculation module configured to input the real-time invoice data into a pre-trained clustering model, and detect an offset of a cluster center of the clustering model;
  • a sample selection module configured to obtain a plurality of historical invoice data that is closest to the real-time invoice data in the cluster model when the offset of the cluster center exceeds a threshold range, and serve as a sample of a nearest neighbor algorithm
  • a monitoring module is configured to obtain a monitoring result of the real-time invoice data according to the nearest neighbor algorithm and an identification tag to which the sample belongs.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors execute the following steps:
  • the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • One or more non-volatile computer-readable storage media store computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:
  • the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • FIG. 1 is an application scenario diagram of an enterprise invoice data monitoring method according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a method for monitoring invoice data of an enterprise according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of a step of training a clustering model according to one or more embodiments.
  • FIG. 4 is a schematic flowchart of a step of detecting an offset according to one or more embodiments.
  • FIG. 5 is a schematic diagram of sample distribution in a nearest neighbor algorithm in another embodiment.
  • FIG. 6 is a structural block diagram of an enterprise invoice data monitoring device according to one or more embodiments.
  • FIG. 7 is an internal structural diagram of a computer device according to one or more embodiments.
  • the method for monitoring enterprise invoice data provided in this application can be applied to the application environment shown in FIG. 1.
  • The invoice server 102 communicates with the server 104 through the network.
  • The invoice server 102 and the server 104 may each be implemented by an independent server or by a server cluster composed of multiple servers.
  • The invoice server 102 stores the invoice data of the enterprise, or the invoice server 102 has the authority to obtain the invoice data of the enterprise.
  • When the invoice server 102 communicates with the server 104, the invoice server 102 sends the invoice data of the enterprise to the server 104.
  • The clustering model is pre-trained in the server 104.
  • The clustering model is trained based on invoice data. Different types of invoice data require different clustering models, so that when invoice data is received, the corresponding clustering model is used to detect it.
  • After the server 104 obtains the invoice data from the invoice server 102, the invoice data is input into the clustering model and the clustering model is iterated again. Once the clustering model has stabilized, new cluster centers are formed, so an offset exists relative to the original cluster centers. The server 104 calculates the offset and detects whether it exceeds a threshold range. When the threshold range is exceeded, samples for the nearest neighbor algorithm are selected from the clustering model, and the monitoring result of the invoice data is determined from the identification tags of the samples.
  • a method for monitoring enterprise invoice data is provided.
  • Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:
  • Step 202: Obtain real-time invoice data to be monitored.
  • Invoice data refer to business vouchers issued by enterprises when they engage in sales activities.
  • the invoice data mainly records information such as invoice code, invoice number, invoice detail serial number, product name, specification model, unit of measure, product quantity, unit price, unit price including tax mark, amount, tax rate, and tax amount.
  • Invoices are divided into electronic invoices and ordinary invoices.
  • For an electronic invoice, the invoice data can be obtained by reading the corresponding positions of the electronic invoice; for an ordinary invoice, an image of the invoice can be captured, and the invoice data obtained by recognizing the corresponding positions in the image.
  • Alternatively, a form of invoice data can be created from an electronic invoice or an ordinary invoice; the server receives the form and parses it to obtain the invoice data.
  • Real-time invoice data is the invoice data currently acquired during the monitoring cycle.
  • Real-time invoice data is defined relative to historical invoice data, which has already been monitored.
  • Step 204: Input the real-time invoice data into a pre-trained clustering model, and detect the offset of the cluster center of the clustering model.
  • the clustering model is a classification model that can classify a large amount of invoice data.
  • the invoice data in the trained clustering model is divided into multiple clusters, and each cluster includes a cluster center.
  • The clustering model classifies the real-time invoice data, that is, assigns it to one of the clusters, and then iterates, using the evaluation function to judge whether the iteration has reached the best clustering.
  • During this iteration the cluster center may shift, and the shift of the cluster center is then detected.
  • Step 206: When the offset of the cluster center exceeds a threshold range, obtain the plurality of historical invoice data closest to the real-time invoice data in the clustering model as samples for the nearest neighbor algorithm.
  • All cluster centers in the clustering model may shift, or only one of them may shift. Therefore, when calculating the offset, the sum of the offsets of all cluster centers can be used.
  • After a clustering model is obtained through training, a plurality of normal invoice data at the critical value can be input, and the resulting offsets analyzed to obtain the threshold range.
  • Invoice data whose offset falls within the threshold range can be determined to be normal invoice data, while invoice data whose offset exceeds the threshold range needs to be examined further.
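  • As a minimal sketch of the offset check described above, assuming two-dimensional feature points and treating the threshold range as a single upper bound (both are assumptions, as are the names `total_offset` and `exceeds_threshold`), the summed shift of all cluster centers could be computed like this:

```python
import math


def total_offset(old_centers, new_centers):
    """Sum of the Euclidean shifts of all cluster centers (paired by position)."""
    return sum(math.dist(old, new) for old, new in zip(old_centers, new_centers))


def exceeds_threshold(offset, upper_bound):
    """Assumption: the threshold range is modelled as a single upper bound."""
    return offset > upper_bound


# Illustrative values only: one center moves by 5.0, the other stays put.
before = [(0.0, 0.0), (10.0, 10.0)]
after = [(3.0, 4.0), (10.0, 10.0)]
offset = total_offset(before, after)
needs_review = exceeds_threshold(offset, upper_bound=2.0)
```

  • Summing over all centers means the check responds whether one center moves a lot or several centers each move a little, which matches the two cases the text distinguishes.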
  • the nearest neighbor algorithm is a classification algorithm that can classify and detect real-time invoice data based on multiple samples that are closest to each other.
  • the Euclidean distance between the real-time invoice data and the historical invoice data in the clustering model can be calculated, and then all Euclidean distances are sorted, and the historical invoice data with the closest Euclidean distance is selected until the sample size of the nearest neighbor algorithm is reached.
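  • The sample selection just described (compute all Euclidean distances, sort them, keep the closest until the sample size is reached) can be sketched as follows. The function name `nearest_samples`, the `(point, tag)` pair layout, and the example values are illustrative assumptions, not part of the application:

```python
import math


def nearest_samples(query, history, k):
    """Return the k historical (point, tag) pairs closest to `query`.

    Distances are Euclidean; the whole history is sorted, as in the text.
    """
    ranked = sorted(history, key=lambda item: math.dist(query, item[0]))
    return ranked[:k]


# Hypothetical feature points of historical invoice data with their tags.
history = [
    ((1.0, 1.0), "A"),
    ((5.0, 5.0), "B"),
    ((1.5, 1.2), "A"),
    ((9.0, 0.5), "C"),
]
samples = nearest_samples((1.2, 1.0), history, k=2)
```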
  • Step 208: Obtain the monitoring result of the real-time invoice data according to the nearest neighbor algorithm and the identification tags to which the samples belong.
  • An identification tag is a tag that the server can recognize. Different identification tags represent the different clusters to which invoice data belongs, so the server can determine the cluster of a piece of invoice data by reading its identification tag. Because the samples are selected from the clustering model, each sample carries an identification tag. The nearest neighbor algorithm counts the number of samples under each identification tag and from these counts determines the identification tag to which the real-time invoice data belongs, thereby determining the abnormality type of the invoice data.
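  • The tag counting in Step 208 amounts to a majority vote over the sample tags. A minimal sketch, with a hypothetical function name `vote` and made-up tag strings; how ties are broken is not stated in the text, so this sketch simply keeps the tag seen first:

```python
from collections import Counter


def vote(sample_tags):
    """Return the most frequent tag and the full tag counts.

    Assumption: on a tie, the tag encountered first among the samples wins.
    """
    counts = Counter(sample_tags)
    winner, _ = counts.most_common(1)[0]
    return winner, counts


# Hypothetical tags carried by five nearest-neighbor samples.
tags = ["price_high", "total_high", "price_high", "price_high", "total_high"]
result, counts = vote(tags)
```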
  • In the above enterprise invoice data monitoring method, the real-time invoice data to be monitored is acquired and input into a pre-trained clustering model, and the offset of the cluster center of the clustering model is detected.
  • When the offset exceeds the threshold range, the plurality of historical invoice data closest to the real-time invoice data in the clustering model are taken as samples for the nearest neighbor algorithm, and the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tags to which the samples belong.
  • In this way, the pre-trained clustering model identifies abnormal invoicing and false invoicing in the invoice data, and the nearest neighbor algorithm then determines the abnormality type of the abnormal invoice. When real-time invoice data to be monitored is received, this can improve the accuracy of identifying the abnormality type of abnormal invoice data.
  • cluster models corresponding to multiple different types of invoice data of the enterprise are set in the server in advance.
  • When the server receives the real-time invoice data of the enterprise to be monitored, it first determines the type of the invoice data and then selects the corresponding clustering model for invoice monitoring.
  • For example, the products sold by company 1 include products A, B, and C.
  • When the server obtains the historical invoice data for products A, B, and C of company 1, it first classifies the invoice data into the invoice data corresponding to product A, product B, and product C, and then trains a clustering model for each of product A, product B, and product C.
  • When monitoring, the server inputs the invoice data of product A into the clustering model corresponding to product A.
  • the clustering model is encapsulated and stored in a server.
  • The server stores multiple encapsulated clustering models of multiple enterprises. When monitoring invoice data, only the required clustering model is called through the index.
  • The encapsulated clustering model itself does not take part in the clustering operation. That is, when the server monitors invoice data, it calls the encapsulated clustering model and copies a virtual clustering model consistent with it.
  • The virtual clustering model performs the iterative calculations used to determine whether the invoice data is abnormal.
  • the encapsulated clustering model includes fixed clusters and identification labels of invoice data in the clusters.
  • the identification labels of the invoice data need to be copied at the same time.
  • the encapsulated clustering model can be updated regularly, or it can be updated through trigger conditions, which can be commodity price adjustments, etc.
  • By encapsulating the clustering model, the clustering model is not modified each time it is used, which ensures that the same clustering model is always used to monitor the invoice data and thereby helps keep invoice data monitoring accurate.
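  • One way to picture the encapsulated model and its virtual copy is a stored object that is only ever deep-copied before use. This is a sketch under assumptions: the `ClusterModel` class, its fields, and `working_copy` are hypothetical names, and `copy.deepcopy` stands in for whatever copying mechanism the server actually uses:

```python
import copy
from dataclasses import dataclass


@dataclass
class ClusterModel:
    centers: list  # cluster centers
    points: list   # feature points of historical invoice data
    tags: list     # identification tag of each point


def working_copy(model):
    """Return an independent virtual model; the stored model is left untouched."""
    return copy.deepcopy(model)


stored = ClusterModel(centers=[(1.0, 1.0)], points=[(1.0, 1.0)], tags=["normal"])

virtual = working_copy(stored)
virtual.points.append((2.0, 2.0))   # iterate on the copy only
virtual.tags.append("pending")
```

  • Because the points and their identification tags are copied together, the virtual model stays consistent with the stored one at the moment of copying, as the text requires.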
  • As shown in FIG. 3, a flowchart of the steps for training a clustering model in an embodiment is provided. The specific steps are as follows:
  • Step 302: Obtain historical invoice data for training a clustering model.
  • the historical invoice data can be the invoice data in the historical time period, and the invoice data needs to be the invoice data of similar products.
  • the historical invoice data is normal invoice data, so you can choose the invoice data after manual analysis as the historical invoice data.
  • Step 304: Extract the first feature parameter points from the historical invoice data, and select multiple first feature parameter points as the initial cluster centers.
  • The invoice data includes a large amount of information, and a combination of two types of information can be selected to obtain a first feature parameter point. For example, a combination of tax amount and unit price can be used; combinations of other information can also be used.
  • The initial cluster centers can be selected according to the distribution of the first feature parameter points, or the classifications of the invoice data can be predicted and a first feature parameter point matching each classification selected as an initial cluster center.
  • Because the invoice data has multiple classifications, multiple first feature parameter points need to be selected as the initial cluster centers.
  • Step 306: Perform cluster training according to the initial cluster centers and the first feature parameter points to obtain a clustering model.
  • Cluster training is a process of grouping the first feature parameter points.
  • Each first feature parameter point is assigned to an initial cluster center according to its distance from that center, and the cluster centers are then updated iteratively until they are stable, yielding the clustering model.
  • the specific implementation of the clustering model is as follows:
  • A distance measure is selected, for example the Euclidean distance.
  • The expression of the Euclidean distance is:
    $$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
  • where $d_{12}$ represents the Euclidean distance between the historical invoice data and the initial cluster center, $(x_1, y_1)$ are the coordinates of the first feature parameter point in the preset coordinate system, and $(x_2, y_2)$ are the coordinates of the initial cluster center in the preset coordinate system.
  • The evaluation function can be the sum of squared errors.
  • The evaluation value is calculated with the evaluation function at each iteration; the smaller the evaluation value, the more accurate the clustering. A target value also needs to be set: when the evaluation value reaches the target value, iteration stops and the clustering model is obtained.
  • the expression of the sum of squared errors is:
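  • The expression itself is not reproduced in this text. In its usual form, the sum of squared errors adds up, over every cluster, the squared Euclidean distance from each point to its cluster center. The sketch below uses that standard form inside a plain k-means style loop; it is an illustration under assumptions, not the application's implementation. In particular, it stops when the centers no longer move rather than at a target evaluation value, and all function names are hypothetical:

```python
import math


def assign(points, centers):
    """Group each point under the index of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(clusters, centers):
    """Sum of squared Euclidean distances from each point to its center."""
    return sum(
        math.dist(p, c) ** 2 for pts, c in zip(clusters, centers) for p in pts
    )


def train(points, init_centers, max_iter=100):
    """Iterate assignment and center update until the centers are stable."""
    centers = list(init_centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        updated = [mean(pts) if pts else c for pts, c in zip(clusters, centers)]
        if updated == centers:
            break
        centers = updated
    clusters = assign(points, centers)
    return centers, clusters, sse(clusters, centers)


# Illustrative feature points and initial centers.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters, score = train(points, [(1.0, 1.0), (9.0, 8.0)])
```

  • A target-value stop, as the text describes, could be added by comparing `sse(clusters, updated)` against a chosen target inside the loop.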
  • As shown in FIG. 4, a schematic flowchart of the offset detection step in an embodiment is provided. The specific steps are as follows:
  • Step 402: Extract the second feature parameter point from the real-time invoice data, and input the second feature parameter point into the clustering model.
  • the extraction of the second feature parameter point can refer to the extraction of the first feature parameter point, that is, when the server receives the real-time invoice data, it selects the corresponding clustering model, then detects the extraction rule of the first feature parameter point in the clustering model, and then Extract the second feature parameter points.
  • An extraction rule in which the first feature parameter point and the second feature parameter point are consistent can also be set in advance.
  • Step 404: Perform cluster training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model.
  • The offset clustering model is the clustering model obtained when the clustering model is iterated until stable again after the second feature parameter point has been input.
  • Step 406: Obtain the offset cluster center of the offset clustering model, and obtain the offset of the cluster center of the clustering model from the distance between the offset cluster center and the position of the original cluster center.
  • the cluster center may change, that is, the offset distance between the offset cluster center and the cluster center is the offset.
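  • Steps 402 to 406 can be sketched as re-running the clustering with the new point added and measuring how far the centers moved. The names `recluster` and `center_offset`, the two-dimensional points, and the use of the summed shift over all centers are assumptions made for illustration:

```python
import math


def recluster(points, centers, max_iter=100):
    """Iterate nearest-center assignment and mean update until stable."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def center_offset(history_points, centers, new_point):
    """Return (summed center shift, shifted centers) after adding new_point."""
    shifted = recluster(history_points + [new_point], centers)
    offset = sum(math.dist(a, b) for a, b in zip(centers, shifted))
    return offset, shifted


# Illustrative trained state: two clusters with centers at their means.
history = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
trained_centers = [(1.0, 2.0), (9.0, 10.0)]

offset, shifted = center_offset(history, trained_centers, (1.0, 8.0))
quiet_offset, _ = center_offset(history, trained_centers, (1.0, 2.0))
```

  • In this example the outlying point drags the first center from (1.0, 2.0) to (1.0, 4.0), while a point that sits on the existing center causes no shift at all.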
  • For step 404, in one embodiment, after the server selects the clustering model, it copies the clustering model to obtain a virtual clustering model, inputs the second feature parameter point into the virtual clustering model, and iterates the virtual clustering model to obtain the offset clustering model.
  • The samples for the nearest neighbor algorithm can be obtained as follows: calculate the distance between each first feature parameter point in the clustering model and the second feature parameter point, and take the plurality of historical invoice data with the closest distances as the samples of the nearest neighbor algorithm.
  • Specifically, the closest first feature parameter point not yet selected is taken as a sample, and the process is repeated K times to obtain K first feature parameter points as samples.
  • training the clustering model requires a large number of first feature parameter points to ensure the accuracy of the clustering model. Therefore, when the historical invoice data is obtained, the historical invoice data needs to be classified. Specifically, the value of the product name field and the specification model field in the historical invoice data can be obtained to filter out the historical invoice data of the same product. Then, the total amount field value and the unit price field value of the commodity historical invoice data are extracted, and the first characteristic parameter point is obtained according to the total amount field value and the unit price field value of the product. In the embodiment of the present application, the total amount and the unit price of the product can reflect the sales of the product by the enterprise.
  • the excessively high or low price of the product may be caused by false invoicing, and the abnormal total amount of sales may also be caused by false invoicing or abnormal invoicing. Therefore, using the total amount and the unit price of the product as the first characteristic parameter point can accurately reflect whether the invoice data is abnormal.
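  • The extraction of first feature parameter points described above (filter by product name and specification model, then pair total amount with unit price) might look like the following. The dictionary keys `product_name`, `spec_model`, `total_amount`, and `unit_price` are hypothetical field names chosen for this sketch, and the records are invented:

```python
def first_feature_points(invoices, product_name, spec_model):
    """Return (total amount, unit price) pairs for one product.

    Only invoices whose product name and specification model both match
    are kept, so that a model is trained on a single product.
    """
    return [
        (float(inv["total_amount"]), float(inv["unit_price"]))
        for inv in invoices
        if inv["product_name"] == product_name and inv["spec_model"] == spec_model
    ]


# Hypothetical historical invoice records.
invoices = [
    {"product_name": "A", "spec_model": "X1", "total_amount": "1000.00", "unit_price": "10.00"},
    {"product_name": "B", "spec_model": "Y2", "total_amount": "500.00", "unit_price": "25.00"},
    {"product_name": "A", "spec_model": "X1", "total_amount": "2200.00", "unit_price": "11.00"},
    {"product_name": "A", "spec_model": "X2", "total_amount": "300.00", "unit_price": "15.00"},
]
points = first_feature_points(invoices, "A", "X1")
```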
  • The identification tags include: a falsely-high-price tag, a low-price tag, a falsely-high-total tag, and a low-total tag. Therefore, when training the clustering model, 4 initial cluster centers can be selected to obtain 4 clusters, and the historical invoice data in each of the 4 clusters is labeled with the corresponding identification tag.
  • The samples selected for the nearest neighbor algorithm carry the above identification tags. The numbers of falsely-high-price tags, low-price tags, falsely-high-total tags, and low-total tags among the samples are counted, and the nearest neighbor algorithm determines the tag of the real-time invoice data from these counts, so as to output the abnormality classification found for the real-time invoice data. It is worth noting that the types of identification tags can be chosen according to the actual situation of the invoice data and are not limited to four.
  • the historical invoice data marked by the identification tags in the cluster are normal invoice data
  • the historical invoice data in the cluster is marked according to the types of abnormal invoice data and the distribution of the abnormal invoice data.
  • FIG. 5 is a schematic diagram of the sample distribution in the nearest neighbor algorithm.
  • The samples include historical invoice data corresponding to the falsely-high-price tag and the falsely-high-total tag. Counting shows that there are more falsely-high-price tags than falsely-high-total tags, so the monitoring result of falsely high price is output.
  • the clustering model can be updated when predicting real-time invoices.
  • The specific operation is as follows: when the offset of the cluster center is within the threshold range, the cluster to which the real-time invoice data belongs in the offset clustering model is obtained, and the number of historical invoice data in that cluster is counted. When the number is smaller than the average number of historical invoice data in the other clusters, one piece of historical invoice data is deleted from the cluster with the most historical invoice data, and the clustering model is updated with the real-time invoice data.
  • The trained clustering model includes multiple clusters. Because the historical invoice data used in training is random, the historical invoice data is unevenly distributed among the clusters, which may make monitoring inaccurate. Therefore, when normal invoice data is monitored, it is necessary to determine whether the clustering model can be updated with that normal invoice data.
  • The judgment condition is whether the number of historical invoice data in the cluster containing the real-time invoice data is smaller than the average number of historical invoice data in the other clusters.
  • The average for the other clusters is obtained by counting the historical invoice data in the other clusters and dividing by the number of those clusters. To keep the total number of historical invoice data in the clustering model unchanged, one piece of historical invoice data can be deleted from the cluster with the most historical invoice data, so that monitoring does not change the computational complexity.
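  • A minimal sketch of this update rule, assuming clusters are held as lists of points. The name `update_clusters` is hypothetical, and because the text does not say which record is deleted from the largest cluster, removing its first element is an assumption of this sketch:

```python
def update_clusters(clusters, target_idx, new_point):
    """Add new_point to its cluster only if that cluster is under-populated.

    The cluster is under-populated when it holds fewer points than the
    average of the other clusters. One point is then removed from the
    largest cluster so the total number of points stays the same.
    """
    clusters = [list(c) for c in clusters]  # work on a copy
    others = [len(c) for i, c in enumerate(clusters) if i != target_idx]
    if others and len(clusters[target_idx]) < sum(others) / len(others):
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        clusters[largest].pop(0)  # assumption: drop the first stored record
        clusters[target_idx].append(new_point)
    return clusters


# Illustrative clusters of sizes 1, 3, and 2.
before = [[(1.0, 1.0)], [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)], [(9.0, 1.0), (9.0, 2.0)]]
after = update_clusters(before, target_idx=0, new_point=(1.0, 2.0))
unchanged = update_clusters(before, target_idx=1, new_point=(5.5, 5.5))
```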
  • a cluster model of multiple products of multiple enterprises is trained in the server in advance, and an index of the enterprise-product name-product model is established.
  • When invoice data is monitored, the taxpayer field, product name field, and product model field in the invoice data are read to retrieve the corresponding clustering model, and the feature parameter point of the invoice data is extracted. A virtual clustering model is then copied, the feature parameter point is input into the virtual clustering model, and iterative calculation is performed.
  • When the virtual clustering model is stable, the current cluster centers are detected to obtain the offset between the current cluster centers and the cluster centers of the clustering model. When the offset exceeds the threshold, the nearest neighbor algorithm is applied to the invoice data.
  • the sample size of the nearest neighbor algorithm is determined, so that a corresponding number of feature parameter points of historical invoice data is selected as a sample in the virtual clustering model, and the monitoring result of the invoice data to be monitored is output through the identification tag to which the sample belongs.
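  • The enterprise, product name, and product model index could be as simple as a dictionary keyed by a tuple of those three fields. This is only a sketch: `model_index`, `register`, `lookup`, the invoice field names, and the dictionary-shaped model are all hypothetical:

```python
import copy

# Index: (taxpayer, product name, product model) -> stored clustering model.
model_index = {}


def register(taxpayer, product_name, product_model, model):
    """Store a trained model under its enterprise/product key."""
    model_index[(taxpayer, product_name, product_model)] = model


def lookup(invoice):
    """Return a virtual copy of the matching model, or None if none is stored."""
    key = (invoice["taxpayer"], invoice["product_name"], invoice["product_model"])
    stored = model_index.get(key)
    return None if stored is None else copy.deepcopy(stored)


register("company 1", "A", "X1", {"centers": [(1.0, 2.0)], "points": [(1.0, 2.0)]})

invoice = {"taxpayer": "company 1", "product_name": "A", "product_model": "X1"}
virtual = lookup(invoice)
virtual["points"].append((3.0, 4.0))  # changes stay in the virtual copy
missing = lookup({"taxpayer": "company 2", "product_name": "A", "product_model": "X1"})
```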
  • It should be understood that although the steps in the flowcharts of FIGS. 2-4 are displayed sequentially in the direction of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated in this document, the execution of these steps is not strictly ordered, and they can be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time but may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with at least part of another step or of the sub-steps or stages of another step.
  • As shown in FIG. 6, an enterprise invoice data monitoring device is provided, including: a data acquisition module 602, an offset calculation module 604, a sample selection module 606, and a monitoring module 608, where:
  • The data acquisition module 602 is configured to obtain real-time invoice data to be monitored.
  • An offset calculation module 604 is configured to input real-time invoice data into a pre-trained clustering model, and detect an offset of a cluster center of the clustering model.
  • a sample selection module 606 is configured to obtain a plurality of historical invoice data that are closest to the real-time invoice data in the clustering model as a sample of the nearest neighbor algorithm when the cluster center offset exceeds a threshold range.
  • the monitoring module 608 is configured to obtain the monitoring result of the real-time invoice data according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • In the above enterprise invoice data monitoring device, the data acquisition module 602 acquires the real-time invoice data to be monitored, and the offset calculation module 604 inputs the real-time invoice data into a pre-trained clustering model to detect the offset of the cluster center of the clustering model.
  • When the offset exceeds the threshold range, the sample selection module 606 obtains the plurality of historical invoice data closest to the real-time invoice data in the clustering model as samples for the nearest neighbor algorithm, and the monitoring module 608 obtains the monitoring result of the real-time invoice data according to the nearest neighbor algorithm and the identification tags to which the samples belong.
  • In this way, the pre-trained clustering model identifies abnormal invoicing and false invoicing in the invoice data, and the nearest neighbor algorithm then determines the abnormality type of the abnormal invoice. When real-time invoice data to be monitored is received, this can improve the accuracy of identifying the abnormality type of abnormal invoice data.
  • The device further includes a model training module configured to obtain historical invoice data for training the clustering model, extract the first feature parameter points from the historical invoice data, select a plurality of first feature parameter points as the initial cluster centers, and perform cluster training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
  • The offset calculation module 604 is further configured to extract the second feature parameter point from the real-time invoice data, input the second feature parameter point into the clustering model, perform cluster training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain the offset clustering model, obtain the offset cluster center of the offset clustering model, and obtain the offset of the cluster center of the clustering model from the distance between the offset cluster center and the position of the original cluster center.
  • the sample selection module 606 is further configured to calculate the distance between the first feature parameter point and the second feature parameter point in the clustering model, and obtain a plurality of closest historical invoice data as a sample of the nearest neighbor algorithm.
  • The model training module is further configured to obtain, from the historical invoice data, the product historical invoice data having the same product name field value and specification model field value, extract the total amount field value and the product unit price field value from the product historical invoice data, and obtain the first feature parameter point from the total amount field value and the product unit price field value.
  • The identification tags include: a falsely-high-price tag, a low-price tag, a falsely-high-total tag, and a low-total tag.
  • The monitoring module 608 is further configured to count the numbers of samples carrying the falsely-high-price tag, the low-price tag, the falsely-high-total tag, and the low-total tag, and to take the identification tag with the most samples among these four as the monitoring result of the real-time invoice data.
  • The device further includes an update module configured to, when the offset of the cluster center is within the threshold range, obtain the cluster to which the real-time invoice data belongs in the offset clustering model and count the number of historical invoice data in that cluster; and, when the number is smaller than the average number of historical invoice data in the other clusters, delete one piece of historical invoice data from the cluster with the most historical invoice data and update the clustering model according to the real-time invoice data.
  • Each module in the above enterprise invoice data monitoring device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above modules may be embedded in or independent of the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile computer-readable storage medium and an internal memory.
  • the non-volatile computer-readable storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile computer-readable storage medium.
  • The database of the computer device is used to store the data involved in monitoring enterprise invoice data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a method for monitoring enterprise invoice data.
  • the structure shown in FIG. 7 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
  • a computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the one or more processors execute the following steps:
  • the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • the one or more processors execute the following steps:
  • the monitoring result of the real-time invoice data is obtained according to the nearest neighbor algorithm and the identification tag to which the sample belongs.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

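An enterprise invoice data monitoring method, comprising: obtaining real-time invoice data to be monitored; inputting the real-time invoice data into a pre-trained clustering model and detecting the offset of the cluster centers of the clustering model; when the offset of the cluster centers exceeds a threshold range, obtaining the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.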

Description

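Method, Apparatus, Computer Device, and Storage Medium for Sending Short Messages
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 201811122776.5, entitled "Enterprise Invoice Data Monitoring Method, Apparatus, Computer Device, and Storage Medium", filed with the Chinese Patent Office on September 26, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to an enterprise invoice data monitoring method, apparatus, computer device, and storage medium.
Background
At present, when an enterprise needs a loan, it provides the lender with various enterprise materials and with permission to obtain data from various platforms; the lender analyzes the enterprise's loan qualifications based on these materials and provides the loan. However, lending is a long-term cooperative process. If the enterprise's operating conditions deteriorate, its ability to repay may decline, and, in view of the risk, the lender may reduce the enterprise's credit line or stop lending to it. Invoice data reflects an enterprise's operating conditions to a certain extent, so those conditions can be assessed by analyzing the enterprise's invoice data.
However, the inventor has realized that invoice data is rich in content and large in volume. When invoice data is analyzed manually, abnormally issued and falsely issued invoices are identified with low accuracy, so the abnormality type of such invoice data cannot be accurately identified.
Summary
According to various embodiments disclosed in the present application, an enterprise invoice data monitoring method, apparatus, computer device, and storage medium are provided.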
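An enterprise invoice data monitoring method includes:
obtaining real-time invoice data to be monitored;
inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
An enterprise invoice data monitoring apparatus includes:
a data collection module, configured to obtain real-time invoice data to be monitored;
an offset calculation module, configured to input the real-time invoice data into a pre-trained clustering model and detect the offset of the cluster centers of the clustering model;
a sample selection module, configured to, when the offset of the cluster centers exceeds a threshold range, obtain a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
a monitoring module, configured to obtain a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining real-time invoice data to be monitored;
inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining real-time invoice data to be monitored;
inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.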
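The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the enterprise invoice data monitoring method according to one or more embodiments.
FIG. 2 is a schematic flowchart of the enterprise invoice data monitoring method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the step of training the clustering model according to one or more embodiments.
FIG. 4 is a schematic flowchart of the step of detecting the offset according to one or more embodiments.
FIG. 5 is a schematic diagram of the sample distribution in the nearest-neighbor algorithm in another embodiment.
FIG. 6 is a structural block diagram of the enterprise invoice data monitoring apparatus according to one or more embodiments.
FIG. 7 is a diagram of the internal structure of the computer device according to one or more embodiments.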
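Detailed Description
To make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The enterprise invoice data monitoring method provided in the present application can be applied in the application environment shown in FIG. 1, in which the invoice server 102 communicates with the server 104 over a network. The invoice server 102 and the server 104 may each be implemented as a standalone server or as a server cluster composed of multiple servers.
The invoice server 102 stores the enterprise's invoice data, or has permission to obtain the enterprise's invoice data. When the invoice server 102 communicates with the server 104, the invoice server 102 sends the enterprise's invoice data to the server 104.
A clustering model is trained in advance on the server 104 from invoice data. Different types of invoice data require different clustering models, so that when invoice data is received it can be checked with the corresponding clustering model.
After the server 104 obtains invoice data from the invoice server 102, it inputs the invoice data into the clustering model. The clustering model is trained iteratively and, once stable, forms new cluster centers, so an offset exists. The server 104 computes this offset and checks whether it exceeds the threshold range. If it does, the server selects samples for the nearest-neighbor algorithm from the clustering model and determines the monitoring result of the invoice data from the identification labels of the samples.
In one embodiment, as shown in FIG. 2, an enterprise invoice data monitoring method is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps: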
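Step 202: obtain real-time invoice data to be monitored.
Invoice data refers to the business vouchers issued by an enterprise when it engages in sales activities. Invoice data mainly records information such as the invoice code, invoice number, invoice line-item serial number, product name, specification and model, unit of measure, product quantity, product unit price, tax-inclusive unit price flag, amount, tax rate, and tax amount. Invoices are divided into electronic invoices and ordinary invoices. For an electronic invoice, the invoice data can be obtained by recognizing the corresponding positions of the electronic invoice; for an ordinary invoice, the invoice data can be obtained by capturing an image of the invoice and then recognizing the corresponding positions of the invoice in the image.
In one embodiment, a form of invoice data can be built from electronic or ordinary invoices; the server receives the form and parses it to obtain the invoice data.
Real-time invoice data denotes the invoice data currently obtained within the monitoring period. It is defined relative to historical invoice data, whose monitoring has already been completed.
Step 204: input the real-time invoice data into the pre-trained clustering model and detect the offset of the cluster centers of the clustering model.
A clustering model is a kind of classification model that can sort large amounts of invoice data into groups. In a trained clustering model the invoice data is divided into multiple clusters, each of which contains one cluster center. When real-time invoice data is input into the trained clustering model, the model classifies it, that is, assigns it to one of the clusters, and then iterates; an evaluation function is used to judge whether the iteration has reached the best clustering. When the best clustering is reached, the cluster centers may have shifted, and the offset of the cluster centers is then detected.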
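Step 206: when the offset of the cluster centers exceeds the threshold range, obtain the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for the nearest-neighbor algorithm.
When the cluster centers shift, all cluster centers in the clustering model may shift, or only one of them may. Therefore, when computing the offset, the sum of the offsets of all cluster centers can be used.
When the clustering model is trained, the threshold range can be derived by inputting multiple normal invoice data that lie at critical values and analyzing the result. Invoice data whose offset is within the threshold range can be judged to be normal, while invoice data whose offset exceeds the threshold range needs further identification.
In addition, the nearest-neighbor algorithm is a classification algorithm that classifies the real-time invoice data according to the several closest samples. To determine which are closest, the Euclidean distance between the real-time invoice data and each historical invoice data in the clustering model can be computed, all the distances sorted, and the historical invoice data with the smallest distances selected until the sample size of the nearest-neighbor algorithm is reached.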
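The sample-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `k_nearest`, the label strings, and the example points (written as hypothetical (total amount, unit price) pairs) are all invented for the example, and the points are assumed to share one coordinate system.

```python
import math

def k_nearest(query, history, k):
    """Return the k (point, label) pairs closest to `query` by Euclidean distance."""
    # Sort every historical point by its distance to the real-time point,
    # then keep the first k entries (the nearest-neighbor sample size).
    ranked = sorted(history, key=lambda item: math.dist(query, item[0]))
    return ranked[:k]

# Hypothetical historical points with their identification labels.
history = [
    ((100.0, 10.0), "price_inflated"),
    ((98.0, 10.5), "price_inflated"),
    ((40.0, 3.0), "price_low"),
    ((300.0, 9.0), "total_inflated"),
]
samples = k_nearest((101.0, 10.2), history, k=2)
```

Sorting all distances mirrors the wording of the text; for large histories a partial selection such as `heapq.nsmallest` would avoid the full sort.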
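Step 208: obtain the monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
An identification label is a label that the server can recognize. Different identification labels denote the different clusters to which invoice data belongs, and by recognizing a label the server can find the cluster corresponding to the invoice data. Because the samples are selected from the clustering model, every sample is marked with an identification label. The nearest-neighbor algorithm counts the number of each identification label, then determines the identification label to which the real-time invoice data belongs, and thereby determines the abnormality type of the invoice data.
In the above enterprise invoice data monitoring method, real-time invoice data to be monitored is obtained and input into a pre-trained clustering model, and the offset of the cluster centers of the clustering model is detected; when the offset of the cluster centers exceeds the threshold range, the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data are obtained as samples for the nearest-neighbor algorithm, and the monitoring result of the real-time invoice data is obtained according to the nearest-neighbor algorithm and the identification labels to which the samples belong. In this embodiment, the pre-trained clustering model can accurately identify abnormal and false invoicing in the invoice data, and the nearest-neighbor algorithm then determines the abnormality type of the abnormal invoice, so that when real-time invoice data to be monitored is received, the abnormality type of abnormal invoice data in it can be identified more accurately.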
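In one embodiment, different clustering models need to be set up for different types of invoice data; that is, clustering models corresponding to multiple different types of the enterprise's invoice data are preset on the server. When the server receives the enterprise's real-time invoice data to be monitored, it first determines the type of the invoice data and then selects the corresponding clustering model for invoice monitoring.
In one embodiment, the goods sold by enterprise 1 include product A, product B, and product C. After obtaining the historical invoice data of enterprise 1 for products A, B, and C, the server first classifies the invoice data into the invoice data corresponding to each product, and then trains a clustering model for each of products A, B, and C. During monitoring, if invoice data for product A is received, the server inputs it into the clustering model corresponding to product A.
In other embodiments, the clustering models are encapsulated and stored on the server. The server stores multiple encapsulated clustering models for multiple enterprises, and during invoice data monitoring the required clustering model only needs to be called through an index. The encapsulated clustering model itself does not take part in the clustering computation: when monitoring invoice data, the server calls the encapsulated clustering model, makes a virtual clustering model that is an identical copy of it, and inputs the invoice data into the virtual clustering model for iterative computation to determine whether the invoice data is abnormal.
In one embodiment, the encapsulated clustering model includes fixed clusters and the identification labels of the invoice data within the clusters; when copying, the identification labels of the invoice data must be copied as well. The encapsulated clustering model can be updated periodically, or updated on a trigger condition such as a product price adjustment.
In this embodiment, encapsulating the clustering model ensures that the model is not modified each time it is used, so the same clustering model is always used to monitor invoice data, which effectively guarantees the accuracy of the monitoring.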
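One way to picture the encapsulated model and its working copy is a small registry that only ever hands out deep copies. This is a sketch under assumptions: the class name `ModelRegistry`, the tuple key, and the dictionary layout of a model (`centers`, `labels`) are hypothetical and are not taken from the application.

```python
import copy

class ModelRegistry:
    """Stores encapsulated clustering models and returns independent copies."""

    def __init__(self):
        self._models = {}

    def register(self, key, model):
        # Keep a private copy so later changes by the caller cannot leak in.
        self._models[key] = copy.deepcopy(model)

    def virtual_copy(self, key):
        # The stored model never takes part in iteration; callers get a
        # deep copy (cluster centers and identification labels included).
        return copy.deepcopy(self._models[key])

registry = ModelRegistry()
key = ("enterprise-1", "product-A")  # hypothetical index key
registry.register(key, {
    "centers": [(0.0, 1.0), (10.0, 1.0)],
    "labels": {(0.0, 0.0): "price_low", (10.0, 2.0): "total_inflated"},
})

virtual = registry.virtual_copy(key)
virtual["centers"][0] = (5.0, 5.0)  # iterate on the copy only
```

Because every lookup returns a fresh copy, any number of monitoring runs start from the same stored state.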
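In one embodiment, as shown in FIG. 3, a schematic flowchart of the step of training the clustering model is provided. The specific steps are as follows:
Step 302: obtain historical invoice data for training the clustering model.
The historical invoice data may be invoice data from a historical period and must be invoice data for the same kind of product. In addition, all historical invoice data is normal invoice data, so invoice data that has been analyzed manually can be selected as historical invoice data.
Step 304: extract first feature parameter points from the historical invoice data and select multiple first feature parameter points as initial cluster centers.
Invoice data contains a large amount of information, and a combination of two kinds of information can be chosen to obtain a first feature parameter point. For example, the tax amount and the unit price can be combined to obtain a first feature parameter point; combinations of other information can also be used.
The initial cluster centers can be selected according to the distribution of the first feature parameter points, or the classification of the invoice data can be predicted and one first feature parameter point matching each class selected as an initial cluster center. In general, invoice data falls into multiple classes, so multiple first feature parameter points need to be selected as initial cluster centers.
Step 306: perform clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
In this embodiment, clustering training is the process of grouping the first feature parameter points. Once the initial cluster centers are determined, each first feature parameter point is assigned to an initial cluster center according to its distance from that center, and the cluster centers are then updated iteratively until they stabilize, yielding the clustering model.
In one embodiment, the clustering model is implemented as follows: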
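S1: select K of the historical invoice data as initial cluster centers.
S2: choose a distance measure, for example the Euclidean distance; compute the Euclidean distance between each historical invoice data and the initial cluster centers, and assign each historical invoice data to an initial cluster center according to the ordering of the distances, forming clusters. The Euclidean distance is expressed as:
$$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
where $d_{12}$ denotes the Euclidean distance between the historical invoice data and the initial cluster center, $(x_1, y_1)$ are the coordinates of the first feature parameter point transformed into the preset coordinate system, and $(x_2, y_2)$ are the coordinates of the initial cluster center transformed into the preset coordinate system.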
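S3: re-select the cluster center within each cluster.
S4: choose an evaluation function, for example the sum of squared errors, and compute the evaluation value of each iteration from it. A smaller evaluation value indicates more precise clustering. A target value must also be set; when the evaluation value reaches the target value, iteration stops and the clustering model is obtained. The sum of squared errors is expressed as:
$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \operatorname{dist}(c_i, x)^2$$
where $C_i$ is the $i$-th cluster, $c_i$ is its cluster center, $x$ is a first feature parameter point in $C_i$, and $\operatorname{dist}()$ denotes the Euclidean distance function.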
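Steps S1 to S4 can be sketched in a few lines of Python. The sketch assumes two-dimensional feature points, takes the mean of each cluster as the re-selected center in S3 (the text does not say how the center is re-selected), and stops when the centers no longer move or when the sum of squared errors reaches a caller-supplied target. The names `assign`, `recompute`, `sse`, and `train` are illustrative only.

```python
import math

def assign(points, centers):
    """S2: put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def recompute(clusters, old_centers):
    """S3: re-select each center as the mean of its cluster (assumption)."""
    return [
        tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
        for members, old in zip(clusters, old_centers)
    ]

def sse(clusters, centers):
    """S4: sum of squared Euclidean distances from points to their center."""
    return sum(math.dist(p, c) ** 2
               for members, c in zip(clusters, centers) for p in members)

def train(points, initial_centers, target_sse=0.0, max_iter=100):
    centers = list(initial_centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        if sse(clusters, centers) <= target_sse:
            break  # evaluation value reached the target
        updated = recompute(clusters, centers)
        if updated == centers:
            break  # centers are stable
        centers = updated
    clusters = assign(points, centers)
    return centers, clusters, sse(clusters, centers)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, clusters, error = train(points, [(0.0, 0.0), (10.0, 0.0)])
```

In this toy data set the two centers settle midway between the points of each pair, so every point lies at distance 1 from its center.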
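In other embodiments, as shown in FIG. 4, a schematic flowchart of the step of detecting the offset is provided. The specific steps are as follows:
Step 402: extract a second feature parameter point from the real-time invoice data and input the second feature parameter point into the clustering model.
The second feature parameter point can be extracted in the same way as the first feature parameter points: on receiving real-time invoice data, the server selects the corresponding clustering model, checks the extraction rule used for the first feature parameter points in that model, and then extracts the second feature parameter point. Alternatively, the first and second feature parameter points can be preset to use the same extraction rule.
Step 404: perform clustering training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model.
The offset clustering model is the clustering model obtained by iterating to stability again after the second feature parameter point has been input into the clustering model.
Step 406: obtain the offset cluster centers of the offset clustering model, and obtain the offset of the cluster centers of the clustering model from the distance between the positions of the offset cluster centers and the cluster centers.
While the offset clustering model is being trained from the clustering model, the cluster centers may change; the distance between an offset cluster center and the corresponding cluster center is the offset.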
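Steps 402 to 406 can be illustrated with a compact sketch. It assumes mean-based center updates, sums the shifts of all centers as the offset (one of the options the description mentions), and treats the threshold range as a single upper bound; `kmeans`, `center_offset`, and `exceeds_threshold` are hypothetical names.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Iterate from the given centers until they stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def center_offset(first_points, trained_centers, second_point):
    """Re-cluster with the real-time point added and sum the center shifts."""
    shifted = kmeans(first_points + [second_point], trained_centers)
    return sum(math.dist(old, new) for old, new in zip(trained_centers, shifted))

def exceeds_threshold(offset, threshold):
    # Assumption: the threshold range is modelled as one upper bound.
    return offset > threshold

first_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
trained_centers = [(0.0, 1.0), (10.0, 1.0)]

typical = center_offset(first_points, trained_centers, (0.0, 1.0))
unusual = center_offset(first_points, trained_centers, (0.0, 7.0))
```

A point that sits on an existing center leaves the centers where they were, while a distant point drags the nearest center toward it, which is what the threshold comparison picks up.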
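For step 404, in one embodiment, after selecting the clustering model the server copies it to obtain a virtual clustering model, inputs the second feature parameter point into the virtual clustering model, and iterates the virtual clustering model to obtain the offset clustering model.
In other embodiments, the samples of the nearest-neighbor algorithm can be obtained as follows: compute the distances between the first feature parameter points in the clustering model and the second feature parameter point, and take the several historical invoice data with the smallest distances as samples for the nearest-neighbor algorithm.
In one embodiment, the Euclidean distances between the second feature parameter point and the first feature parameter points are computed first, and the sample size of the nearest-neighbor algorithm is then determined. For example, if the sample size is K, the first feature parameter point with the smallest Euclidean distance is selected as a sample, and this is repeated K times to obtain K first feature parameter points as samples.
In one embodiment, training the clustering model requires a large number of first feature parameter points to guarantee the model's accuracy. Therefore, once historical invoice data has been obtained it needs to be classified: specifically, the values of the product name field and the specification and model field can be read from the historical invoice data to filter out the product historical invoice data for the same product. The values of the total amount field and the product unit price field are then extracted from the product historical invoice data, and the first feature parameter point is obtained from these two values. In this embodiment, the total amount and the unit price reflect the enterprise's sales of the product. A unit price that is too high or too low may result from false invoicing, and an abnormal total sales amount may likewise result from false or abnormal invoicing, so using the total amount and the unit price as the first feature parameter point accurately reflects whether the invoice data is abnormal.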
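The filtering and extraction just described might look like the following. The dictionary keys (`product_name`, `spec_model`, `total_amount`, `unit_price`) and the sample records are hypothetical stand-ins for the invoice fields named in the text.

```python
def first_feature_points(invoices, product_name, spec_model):
    """Keep invoices for one product and map each to (total amount, unit price)."""
    return [
        (float(inv["total_amount"]), float(inv["unit_price"]))
        for inv in invoices
        if inv["product_name"] == product_name and inv["spec_model"] == spec_model
    ]

# Hypothetical historical invoice records.
invoices = [
    {"product_name": "A", "spec_model": "X1", "total_amount": "100.00", "unit_price": "10.00"},
    {"product_name": "A", "spec_model": "X2", "total_amount": "55.00", "unit_price": "11.00"},
    {"product_name": "B", "spec_model": "X1", "total_amount": "80.00", "unit_price": "4.00"},
    {"product_name": "A", "spec_model": "X1", "total_amount": "210.00", "unit_price": "10.50"},
]
feature_points = first_feature_points(invoices, "A", "X1")
```

Matching on both fields keeps different specifications of the same product out of one model, in line with the requirement that both field values be identical.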
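In other embodiments, the identification labels include an inflated-price label, a low-price label, an inflated-total label, and a low-total label. When training the clustering model, four initial cluster centers can therefore be selected, yielding four clusters, and the historical invoice data in each of the four clusters is marked with the corresponding identification label. The samples selected by the nearest-neighbor algorithm are all marked with these labels; the numbers of inflated-price, low-price, inflated-total, and low-total labels among the samples are counted, the label of the real-time invoice data is determined by the nearest-neighbor algorithm, and the abnormality class detected for the real-time invoice data is output. It should be noted that the kinds of identification labels can be chosen according to the actual invoice data and are not limited to four.
In this embodiment, the labelled historical invoice data in the clusters is all normal invoice data; it is labelled according to a prior analysis of the types of abnormal invoice data and of how abnormal invoice data is distributed. A trained clustering model can detect whether invoice data is abnormal but cannot detect the abnormality type. With the nearest-neighbor algorithm and the configured identification labels, the abnormality type of abnormal invoice data can be predicted, so that the server can output the abnormality type when it detects abnormal invoice data, completing the monitoring of the real-time invoice data.
In one embodiment, FIG. 5 is a schematic diagram of the sample distribution in the nearest-neighbor algorithm. In FIG. 5, the samples include historical invoice data with the inflated-price label and with the inflated-total label. Counting shows that inflated-price labels outnumber inflated-total labels, so a monitoring result of inflated price can be output.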
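The label count behind this kind of result is a simple majority vote. In the sketch below the label strings and the function name `monitoring_result` are hypothetical; how ties are broken is not specified in the text, and this sketch simply returns the tied label that appears first among the samples.

```python
from collections import Counter

def monitoring_result(sample_labels):
    """Return the identification label carried by the most samples."""
    counts = Counter(sample_labels)
    label, _count = counts.most_common(1)[0]
    return label

# Hypothetical labels of five nearest-neighbor samples.
labels = ["price_inflated", "total_inflated", "price_inflated",
          "price_inflated", "total_inflated"]
result = monitoring_result(labels)
```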
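In one embodiment, because the clustering model is encapsulated once trained, the clustering model can be updated while real-time invoices are being predicted, in order to keep the monitoring accurate. The specific operation is as follows: when the offset of the cluster centers is within the threshold range, obtain the cluster of the offset clustering model to which the real-time invoice data belongs and count the historical invoice data in that cluster. When that count is smaller than the mean count of historical invoice data in the other clusters, delete one historical invoice data from the cluster with the most historical invoice data and update the clustering model according to the real-time invoice data.
In this embodiment, the trained clustering model includes multiple clusters. Because the historical invoice data used in training is random, it is unevenly distributed across clusters, which may make monitoring inaccurate. Therefore, when normal invoice data is detected, it must be decided whether the clustering model can be updated with it. The condition is whether the number of historical invoice data in the cluster of the real-time invoice data is smaller than the mean for the other clusters, where that mean is the total number of historical invoice data in the other clusters divided by the number of other clusters. To keep the number of historical invoice data in the clustering model unchanged, one historical invoice data can be deleted from the cluster with the most historical invoice data, so that the computational complexity of monitoring does not change.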
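The update condition can be sketched as below. The sketch assumes at least two clusters, represents each cluster as a list of historical points keyed by a hypothetical cluster name, and drops the oldest point of the largest cluster (the text does not say which record to delete). Re-running the clustering after the swap is left out.

```python
def should_update(cluster_sizes, target):
    """True when the target cluster is smaller than the mean of the other clusters."""
    others = [size for name, size in cluster_sizes.items() if name != target]
    return cluster_sizes[target] < sum(others) / len(others)

def update_clusters(clusters, target, new_point):
    """Add a normal real-time point to an under-filled cluster, keeping the total fixed."""
    sizes = {name: len(members) for name, members in clusters.items()}
    if not should_update(sizes, target):
        return False
    largest = max(sizes, key=sizes.get)
    clusters[largest].pop(0)            # assumption: remove the oldest record
    clusters[target].append(new_point)  # the real-time point becomes history
    return True

clusters = {
    "a": [(1.0, 1.0), (1.2, 0.9)],
    "b": [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.3, 5.0)],
    "c": [(9.0, 1.0), (9.1, 1.1), (8.9, 0.9), (9.2, 1.0), (9.0, 1.2), (8.8, 1.1)],
}
total_before = sum(len(members) for members in clusters.values())
updated = update_clusters(clusters, "a", (1.1, 1.0))
total_after = sum(len(members) for members in clusters.values())
```

Here cluster `a` holds 2 points against a mean of 5 for the others, so the new point is accepted and one point leaves cluster `c`.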
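In some specific embodiments, clustering models for multiple products of multiple enterprises are trained on the server in advance, and an index of enterprise, product name, and product model is built. When invoice data to be monitored is received, the taxpayer field, product name field, and product model field of the invoice data are read to look up the corresponding clustering model, and the feature parameter point of the invoice data is extracted. A virtual clustering model is then copied, the feature parameter point is input into it, and iterative computation is performed. Once the virtual clustering model is stable, the current cluster centers are detected and the offset between the current cluster centers and the cluster centers of the clustering model is obtained. When the offset exceeds the threshold, the nearest-neighbor algorithm is selected to judge the type of the invoice data: the sample size of the nearest-neighbor algorithm is determined, the corresponding number of feature parameter points of historical invoice data are selected from the virtual clustering model as samples, and the monitoring result for the invoice data to be monitored is output according to the identification labels to which the samples belong.
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.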
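In one embodiment, as shown in FIG. 6, an enterprise invoice data monitoring apparatus is provided, including a data collection module 602, an offset calculation module 604, a sample selection module 606, and a monitoring module 608, where:
the data collection module 602 is configured to obtain real-time invoice data to be monitored;
the offset calculation module 604 is configured to input the real-time invoice data into a pre-trained clustering model and detect the offset of the cluster centers of the clustering model;
the sample selection module 606 is configured to, when the offset of the cluster centers exceeds the threshold range, obtain the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for the nearest-neighbor algorithm; and
the monitoring module 608 is configured to obtain the monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
In the above enterprise invoice data monitoring apparatus, the data collection module 602 obtains the real-time invoice data to be monitored; the offset calculation module 604 inputs the real-time invoice data into the pre-trained clustering model and detects the offset of the cluster centers of the clustering model; when the offset of the cluster centers exceeds the threshold range, the sample selection module 606 obtains the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for the nearest-neighbor algorithm; and the monitoring module 608 obtains the monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong. In this embodiment, the pre-trained clustering model can accurately identify abnormal and false invoicing in the invoice data, and the nearest-neighbor algorithm then determines the abnormality type of the abnormal invoice, so that when real-time invoice data to be monitored is received, the abnormality type of abnormal invoice data can be identified more accurately.
In one embodiment, the apparatus further includes a model training module, configured to obtain historical invoice data for training the clustering model, extract first feature parameter points from the historical invoice data, select multiple first feature parameter points as initial cluster centers, and perform clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
In one embodiment, the offset calculation module 604 is configured to extract a second feature parameter point from the real-time invoice data, input the second feature parameter point into the clustering model, perform clustering training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model, obtain the offset cluster centers of the offset clustering model, and obtain the offset of the cluster centers of the clustering model from the distance between the positions of the offset cluster centers and the cluster centers.
In one embodiment, the sample selection module 606 is further configured to compute the distances between the first feature parameter points in the clustering model and the second feature parameter point, and obtain the several historical invoice data with the smallest distances as samples for the nearest-neighbor algorithm.
In one embodiment, the model training module is further configured to obtain, from the historical invoice data, product historical invoice data whose product name field value and specification and model field value are both identical, extract the total amount field value and the product unit price field value from the product historical invoice data, and obtain the first feature parameter point from the total amount field value and the product unit price field value.
In one embodiment, the identification labels include an inflated-price label, a low-price label, an inflated-total label, and a low-total label, and the monitoring module 608 is further configured to count the numbers of inflated-price, low-price, inflated-total, and low-total labels among the identification labels of the samples, and to determine that the identification label carried by the most samples among these four labels is the monitoring result of the real-time invoice data.
In one embodiment, the apparatus further includes an update module, configured to: when the offset of the cluster centers is within the threshold range, obtain the cluster of the offset clustering model to which the real-time invoice data belongs and count the historical invoice data in that cluster; and, when that count is smaller than the mean count of historical invoice data in the other clusters, delete one historical invoice data from the cluster with the most historical invoice data and update the clustering model according to the real-time invoice data.
For the specific limitations of the enterprise invoice data monitoring apparatus, refer to the limitations of the enterprise invoice data monitoring method above, which are not repeated here. Each module in the above enterprise invoice data monitoring apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.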
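In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile computer-readable storage medium and an internal memory. The non-volatile computer-readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile computer-readable storage medium. The database of the computer device is used to store the data of enterprise invoice data monitoring. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement an enterprise invoice data monitoring method.
A person skilled in the art will understand that the structure shown in FIG. 7 is only a block diagram of the part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.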
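A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining real-time invoice data to be monitored;
inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining real-time invoice data to be monitored;
inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.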
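A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent application shall be subject to the appended claims.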

Claims (20)

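  1. An enterprise invoice data monitoring method, the method comprising:
    obtaining real-time invoice data to be monitored;
    inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
    when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
    obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
  2. The method according to claim 1, characterized in that the method further comprises:
    obtaining historical invoice data for training the clustering model;
    extracting first feature parameter points from the historical invoice data, and selecting a plurality of first feature parameter points as initial cluster centers; and
    performing clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
  3. The method according to claim 2, characterized in that inputting the real-time invoice data into the pre-trained clustering model and detecting the offset of the cluster centers of the clustering model comprises:
    extracting a second feature parameter point from the real-time invoice data, and inputting the second feature parameter point into the clustering model;
    performing clustering training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model; and
    obtaining the offset cluster centers of the offset clustering model, and obtaining the offset of the cluster centers of the clustering model from the distance between the positions of the offset cluster centers and the cluster centers.
  4. The method according to claim 3, characterized in that obtaining the plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for the nearest-neighbor algorithm comprises:
    computing the distances between the first feature parameter points in the clustering model and the second feature parameter point; and
    obtaining the plurality of historical invoice data with the smallest distances as samples for the nearest-neighbor algorithm.
  5. The method according to any one of claims 2 to 4, characterized in that extracting the first feature parameter points from the historical invoice data comprises:
    obtaining, from the historical invoice data, product historical invoice data whose product name field value and specification and model field value are both identical; and
    extracting the total amount field value and the product unit price field value from the product historical invoice data, and obtaining the first feature parameter point from the total amount field value and the product unit price field value.
  6. The method according to any one of claims 1 to 4, characterized in that the identification labels comprise an inflated-price label, a low-price label, an inflated-total label, and a low-total label;
    obtaining the monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong comprises:
    counting the numbers of inflated-price labels, low-price labels, inflated-total labels, and low-total labels among the identification labels of the samples, and determining that the identification label carried by the most samples among the inflated-price label, the low-price label, the inflated-total label, and the low-total label is the monitoring result of the real-time invoice data.
  7. The method according to any one of claims 1 to 4, characterized by further comprising:
    when the offset of the cluster centers is within the threshold range, obtaining the cluster of the offset clustering model to which the real-time invoice data belongs, and counting the historical invoice data in the cluster; and
    when the count is smaller than the mean count of historical invoice data in the other clusters, deleting one historical invoice data from the cluster with the most historical invoice data, and updating the clustering model according to the real-time invoice data.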
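  8. An enterprise invoice data monitoring apparatus, characterized in that the apparatus comprises:
    a data collection module, configured to obtain real-time invoice data to be monitored;
    an offset calculation module, configured to input the real-time invoice data into a pre-trained clustering model and detect the offset of the cluster centers of the clustering model;
    a sample selection module, configured to, when the offset of the cluster centers exceeds a threshold range, obtain a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
    a monitoring module, configured to obtain a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
  9. The apparatus according to claim 8, characterized by further comprising a model training module;
    the model training module being configured to obtain historical invoice data for training the clustering model;
    extract first feature parameter points from the historical invoice data, and select a plurality of first feature parameter points as initial cluster centers; and
    perform clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.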
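  10. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining real-time invoice data to be monitored;
    inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
    when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
    obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
  11. The computer device according to claim 10, characterized in that the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining historical invoice data for training the clustering model;
    extracting first feature parameter points from the historical invoice data, and selecting a plurality of first feature parameter points as initial cluster centers; and
    performing clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
  12. The computer device according to claim 11, characterized in that the processor, when executing the computer-readable instructions, further performs the following steps:
    extracting a second feature parameter point from the real-time invoice data, and inputting the second feature parameter point into the clustering model;
    performing clustering training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model; and
    obtaining the offset cluster centers of the offset clustering model, and obtaining the offset of the cluster centers of the clustering model from the distance between the positions of the offset cluster centers and the cluster centers.
  13. The computer device according to claim 12, characterized in that the processor, when executing the computer-readable instructions, further performs the following steps:
    computing the distances between the first feature parameter points in the clustering model and the second feature parameter point; and
    obtaining the plurality of historical invoice data with the smallest distances as samples for the nearest-neighbor algorithm.
  14. The computer device according to any one of claims 11 to 13, characterized in that the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining, from the historical invoice data, product historical invoice data whose product name field value and specification and model field value are both identical; and
    extracting the total amount field value and the product unit price field value from the product historical invoice data, and obtaining the first feature parameter point from the total amount field value and the product unit price field value.
  15. The computer device according to any one of claims 10 to 13, characterized in that the identification labels comprise an inflated-price label, a low-price label, an inflated-total label, and a low-total label;
    the processor, when executing the computer-readable instructions, further performs the following steps:
    counting the numbers of inflated-price labels, low-price labels, inflated-total labels, and low-total labels among the identification labels of the samples, and determining that the identification label carried by the most samples among the inflated-price label, the low-price label, the inflated-total label, and the low-total label is the monitoring result of the real-time invoice data.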
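  16. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining real-time invoice data to be monitored;
    inputting the real-time invoice data into a pre-trained clustering model, and detecting the offset of the cluster centers of the clustering model;
    when the offset of the cluster centers exceeds a threshold range, obtaining a plurality of historical invoice data in the clustering model that are closest to the real-time invoice data as samples for a nearest-neighbor algorithm; and
    obtaining a monitoring result of the real-time invoice data according to the nearest-neighbor algorithm and the identification labels to which the samples belong.
  17. The storage medium according to claim 16, characterized in that the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining historical invoice data for training the clustering model;
    extracting first feature parameter points from the historical invoice data, and selecting a plurality of first feature parameter points as initial cluster centers; and
    performing clustering training according to the initial cluster centers and the first feature parameter points to obtain the clustering model.
  18. The storage medium according to claim 17, characterized in that the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    extracting a second feature parameter point from the real-time invoice data, and inputting the second feature parameter point into the clustering model;
    performing clustering training according to the second feature parameter point, the cluster centers of the clustering model, and the first feature parameter points to obtain an offset clustering model; and
    obtaining the offset cluster centers of the offset clustering model, and obtaining the offset of the cluster centers of the clustering model from the distance between the positions of the offset cluster centers and the cluster centers.
  19. The storage medium according to claim 18, characterized in that the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    computing the distances between the first feature parameter points in the clustering model and the second feature parameter point; and
    obtaining the plurality of historical invoice data with the smallest distances as samples for the nearest-neighbor algorithm.
  20. The storage medium according to any one of claims 17 to 19, characterized in that the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining, from the historical invoice data, product historical invoice data whose product name field value and specification and model field value are both identical; and
    extracting the total amount field value and the product unit price field value from the product historical invoice data, and obtaining the first feature parameter point from the total amount field value and the product unit price field value.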
PCT/CN2019/070119 2018-09-26 2019-01-02 Method, apparatus, computer device, and storage medium for sending short messages WO2020062702A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811122776.5 2018-09-26
CN201811122776.5A CN109299259A (zh) 2018-09-26 2018-09-26 Enterprise invoice data monitoring method, apparatus, computer device, and storage medium

Publications (3)

Publication Number Publication Date
WO2020062702A1 true WO2020062702A1 (zh) 2020-04-02
WO2020062702A9 WO2020062702A9 (zh) 2020-11-26
WO2020062702A8 WO2020062702A8 (zh) 2020-12-30

Family

ID=65164262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070119 WO2020062702A1 (zh) 2018-09-26 2019-01-02 Method, apparatus, computer device, and storage medium for sending short messages

Country Status (2)

Country Link
CN (1) CN109299259A (zh)
WO (1) WO2020062702A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114244824A (zh) * 2021-11-25 2022-03-25 国家计算机网络与信息安全管理中心河北分中心 Method for rapidly identifying Server homogeneity of cyberspace web asset risks

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
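CN110084620B (zh) * 2019-04-16 2022-08-12 上海交通大学 Deep-learning-based system and method for detecting high-frequency abnormal issuance of electronic credentials
CN110473034A (zh) * 2019-08-22 2019-11-19 携程旅游网络技术(上海)有限公司 Red-letter reversal method, system, electronic device, and medium for electronic invoices
CN111027607B (zh) * 2019-11-29 2023-10-17 泰康保险集团股份有限公司 Method and apparatus for unsupervised evaluation and selection of feature importance in high-dimensional data
CN111126966A (zh) * 2019-12-25 2020-05-08 卓尔智联(武汉)研究院有限公司 Bill review method, apparatus, computer device, and computer-readable storage medium
CN113313213B (zh) * 2021-07-28 2021-11-19 中国航空油料集团有限公司 Dataset processing method for accelerating the training of object detection algorithms
CN114115719B (zh) * 2021-08-24 2022-10-18 深圳市木浪云科技有限公司 IO batch processing method, apparatus, and storage medium based on IO pattern recognition
CN116561693A (zh) * 2023-05-26 2023-08-08 工业富联(佛山)产业示范基地有限公司 Injection molding machine abnormality determination method, electronic device, and storage medium
CN116360956B (zh) * 2023-06-02 2023-08-08 济南大陆机电股份有限公司 Intelligent data processing method and system for big data task scheduling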

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2840542A2 (en) * 2013-08-19 2015-02-25 Compass Plus (GB) Limited Method and system for detection of fraudulent transactions
CN107133833A (zh) * 2016-02-26 2017-09-05 阿里巴巴集团控股有限公司 Abnormal transaction identification method and apparatus
CN108268898A (zh) * 2018-01-19 2018-07-10 大象慧云信息技术有限公司 K-Means-based electronic invoice user clustering method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3995099B2 (ja) * 2004-07-27 2007-10-24 国立医薬品食品衛生研究所長 Apparatus for dividing high-dimensional data into clusters
US9336484B1 (en) * 2011-09-26 2016-05-10 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration (Nasa) System and method for outlier detection via estimating clusters
CN106874923A (zh) * 2015-12-14 2017-06-20 阿里巴巴集团控股有限公司 Method and apparatus for determining the style classification of a commodity
CN107358519A (zh) * 2017-05-18 2017-11-17 新疆航天信息有限公司 Invoice monitoring method and system
CN107194400B (zh) * 2017-05-31 2019-12-20 北京天宇星空科技有限公司 Method for recognizing and processing images of all bills for financial reimbursement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU, GUOBING ET AL.: "Clustering Method Based on Nearest Neighbors Representation", JOURNAL OF SOFTWARE, no. 11, 15 November 2015 (2015-11-15), ISSN: 1000-9825 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
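CN114244824A (zh) * 2021-11-25 2022-03-25 国家计算机网络与信息安全管理中心河北分中心 Method for rapidly identifying Server homogeneity of cyberspace web asset risks
CN114244824B (zh) * 2021-11-25 2024-05-03 国家计算机网络与信息安全管理中心河北分中心 Method for rapidly identifying Server homogeneity of cyberspace web asset risks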

Also Published As

Publication number Publication date
WO2020062702A8 (zh) 2020-12-30
WO2020062702A9 (zh) 2020-11-26
CN109299259A (zh) 2019-02-01

Similar Documents

Publication Publication Date Title
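WO2020062702A1 (zh) Method, apparatus, computer device, and storage medium for sending short messages
WO2021052031A1 (zh) Commodity inventory risk early-warning method and system based on statistical interquartile range, and computer-readable storage medium
CN109598095B (zh) Scorecard model building method, apparatus, computer device, and storage medium
US10805151B2 (en) Method, apparatus, and storage medium for diagnosing failure based on a service monitoring indicator of a server by clustering servers with similar degrees of abnormal fluctuation
WO2019218699A1 (zh) Fraudulent transaction determination method, apparatus, computer device, and storage medium
AU2016201425B2 (en) Systems and methods for predictive reliability mining
CN109446061B (zh) Page detection method, computer-readable storage medium, and terminal device
CN110471945B (zh) Active data processing method, system, computer device, and storage medium
WO2023056723A1 (zh) Fault diagnosis method, apparatus, electronic device, and storage medium
CN107679734A (zh) Method and system for classification prediction of unlabeled data
CN111045894A (zh) Database anomaly detection method, apparatus, computer device, and storage medium
CN112613569B (zh) Image recognition method, and training method and apparatus for an image classification model
CN110942190A (zh) Queuing time prediction method, apparatus, computer device, and storage medium
JP2019215698A (ja) Image inspection support apparatus and method
CN115545103A (zh) Abnormal data identification method, label identification method, and abnormal data identification apparatus
CN114584377A (zh) Traffic anomaly detection method, model training method, apparatus, device, and medium
WO2022022042A1 (zh) Monitoring data reporting method, apparatus, computer device, and storage medium
CN114595765A (zh) Data processing method, apparatus, electronic device, and storage medium
CN113947076A (zh) Insurance policy data detection method, apparatus, computer device, and storage medium
CN113326177A (zh) Indicator anomaly detection method, apparatus, device, and storage medium
CN116610821B (zh) Knowledge-graph-based enterprise risk analysis method, system, and storage medium
CN109542947B (zh) Data statistics method, apparatus, computer device, and storage medium
CN111277465A (zh) Abnormal data packet detection method, apparatus, and electronic device
CN114785616A (zh) Data risk detection method, apparatus, computer device, and storage medium
JP6948470B1 (ja) Repair support system and repair support method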

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19866602

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 08/07/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19866602

Country of ref document: EP

Kind code of ref document: A1