CN115994846B - Public health management system for processing trace source of cash microorganism data of financial system - Google Patents

Publication number
CN115994846B
Authority
CN
China
Prior art keywords
time
circulation
public health
obtaining
banknote
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310295082.6A
Other languages
Chinese (zh)
Other versions
CN115994846A (en
Inventor
柳长庆
巫远程
张建中
张流波
孟凡博
李涛
廖少强
徐燕
尤元海
肖迪
张剑
吴晓松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunchao Financial Services Beijing Co ltd
Original Assignee
Yunchao Financial Services Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunchao Financial Services Beijing Co ltd
Priority to CN202310295082.6A
Publication of CN115994846A
Application granted
Publication of CN115994846B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the technical field of data processing and provides a public health management system for processing the traceability of cash microorganism data in a financial system. The system: records the date and geographic position of each banknote each time it flows back; obtains a long-time circulation feature descriptor and a short-time circulation feature descriptor for each banknote; obtains a geographic circulation range index, a reflux stability index and an activity rating index for each banknote; obtains a microorganism feature descriptor for each banknote; acquires a public health event, obtains a plurality of circulation modes by clustering, obtains a general microorganism feature descriptor for each circulation mode, and obtains an adjusted public health event; obtains the Token Q corresponding to each record; obtains an LOF value from each cash situation feature and, from it, the Dropout proportion of Token Q, which reduces hallucination in LLM training; and inputs Token Q and a plurality of guide sentences into the LLM to obtain an emergency response scheme.

Description

Public health management system for processing trace source of cash microorganism data of financial system
Technical Field
The invention relates to the technical field of data processing, in particular to a public health management system for processing financial system cash microorganism data traceability.
Background
Although online payment has replaced a large share of offline payment in China, banknotes still hold a dominant position in cash circulation and cannot be replaced. Physical cash is constantly exposed to microbial contamination, and these microorganisms may be transmitted directly (through close contact) or indirectly (through food or other inanimate objects). Banknotes therefore remain a potential threat to public health. Yet even though government big data and artificial intelligence applications have achieved remarkable results, there is still no scientific system that analyzes the microorganisms carried by cash, and the response schemes for public health events, on the basis of the cash circulation situation of the financial system.
Based on the cash circulation of the financial system, the invention analyzes the microbial characteristics and circulation characteristics of banknotes in the financial system and generates emergency response schemes for public health events from historical big data.
Disclosure of Invention
The invention provides a public health management system for processing cash microorganism data traceability of a financial system, which aims to solve the problem that an emergency response scheme cannot be generated based on circulation situation characteristics of paper money and characteristics of carrying microorganisms in the prior art.
One embodiment of the invention provides a public health management system for processing the traceability of cash microorganism data in a financial system, which comprises the following modules:
The banknote information acquisition module records the information of banknotes that exhibit reflux behavior.
The banknote reflux feature descriptor acquisition module obtains a long-time circulation feature descriptor and a short-time circulation feature descriptor from the information of each banknote that exhibits reflux behavior; obtains a geographic circulation range index, a reflux stability index and an activity rating index from the long-time and short-time circulation feature descriptors; and selects a plurality of banknotes whose activity rating index is larger than a threshold for swab testing, obtaining a microorganism feature descriptor for each banknote from the result of the swab test.
The adjusted public health event acquisition module acquires a public health event and structures its data to obtain public health event data, which consists of guide sentences and text content. It acquires the start time and end time of the public health event and takes each reflux of a banknote carrying microorganisms of the public health event as one record, so that a plurality of records is obtained for the public health event. Clustering the records yields different clusters, each of which is a circulation mode; the metric distance used by the clustering is obtained from the long-time and short-time circulation feature descriptors of any two records. Within each circulation mode, a general microorganism feature descriptor is obtained from the microorganism feature descriptors of all records; abnormal records are screened out of each circulation mode according to the microorganism feature descriptor of each record and the general microorganism feature descriptor, giving the main circulation mode of each public health event; and the public health event is adjusted according to the main circulation mode to obtain an adjusted public health event.
The Token Q acquisition module, for a plurality of adjusted public health events, uses a twin (Siamese) network to obtain the cash situation feature of each record from the record's long-time circulation feature descriptor, short-time circulation feature descriptor and microorganism feature descriptor, and constructs the cash situation features as tokens to obtain Token Q.
The cash situation feature Dropout proportion acquisition module obtains the LOF value of each cash situation feature from the cash situation features of the records, and obtains the Dropout proportion of Token Q from these LOF values.
The emergency response scheme formulation module trains the LLM with the calculated Dropout proportion of the cash situation features, inputs Token Q and a plurality of guide sentences into the LLM to obtain generated content, and obtains the public health event emergency response scheme from the generated content.
Preferably, the information of the banknote includes a date and a geographical location of each return of the banknote, and the ID of the banknote is associated with the date and the geographical location of each return of the banknote.
Preferably, the long-time and short-time circulation feature descriptors are obtained from the information of each banknote that exhibits reflux behavior as follows. A long-time threshold is set, and the date and geographic position of each reflux of each banknote within the period corresponding to the long-time threshold are collected. The difference between each pair of adjacent dates of a banknote gives a time interval, and the mean of these time intervals is recorded as the time interval central value. A position central value, expressed as longitude and latitude, is obtained from the geographic positions of the refluxes. The time mean square error of each banknote is obtained from the time intervals and their mean, and the position mean square error is obtained from the distances between the position central value and each geographic position. The long-time circulation feature descriptor consists of the position central value, the time interval central value, the time mean square error and the position mean square error. A short-time threshold is then set, and the same four quantities are calculated for each banknote within the period corresponding to the short-time threshold; they make up the short-time circulation feature descriptor.
Preferably, the geographic circulation range index, the reflux stability index and the activity rating index are obtained from the long-time and short-time circulation feature descriptors as follows: the mean of the position mean square errors of the long-time and short-time circulation feature descriptors is taken as the geographic circulation range index; the ratio of the larger of the two time interval central values (long-time and short-time) to the time interval central value of the long-time circulation feature descriptor is taken as the reflux stability index; and the product of the reciprocal of the reflux stability index and the geographic circulation range index of each banknote is taken as the activity rating index.
Preferably, the microorganism feature descriptor of each banknote is obtained from the result of the swab test as follows: in a public health event, the flora types on each banknote are detected, the flora types affecting the public health event are marked as 1 and the other flora types are marked as 0, giving a binary code of fixed dimension; this fixed-dimension binary code is the microorganism feature descriptor of the banknote.
Preferably, the method for acquiring the measurement distance of any two records comprises the following steps:
[Figure SMS_2: formula for the metric function between record a and record b]
Wherein a and b denote any two records. The formula uses the following quantities: the position mean square error in the long-time circulation feature descriptor of record a; the position mean square error in the short-time circulation feature descriptor of record a; the time mean square error in the long-time circulation feature descriptor of record a; the time mean square error in the short-time circulation feature descriptor of record a; the same four quantities for record b; and the cosine similarity function. Its value is the metric function between record a and record b.
Preferably, the general microorganism feature descriptor of each circulation mode is obtained from the microorganism feature descriptors of all records as follows: for each circulation mode, a bitwise OR operation is performed over the microorganism feature descriptors of all records in that mode, so that a position is 1 if either code has a 1 there and 0 otherwise; the result of applying the OR operation across all records is the general microorganism feature descriptor of the circulation mode.
Preferably, abnormal records are screened out of each circulation mode according to the microorganism feature descriptor of each record and the general microorganism feature descriptor as follows: the Hamming distance between the general microorganism feature descriptor and the microorganism feature descriptor of each record is calculated, the Hamming distances are sorted in descending order, and records whose distance is larger than the screening threshold are rejected as abnormal records.
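The OR aggregation and the Hamming-distance screening described in the two preceding paragraphs can be illustrated with a minimal Python sketch. The function names, the list-of-bits representation and the example threshold are assumptions made for illustration; the patent text does not specify how the screening threshold is chosen.

```python
def general_descriptor(descriptors):
    """Bitwise OR over all records' binary descriptors in one circulation mode."""
    general = [0] * len(descriptors[0])
    for descriptor in descriptors:
        general = [g | bit for g, bit in zip(general, descriptor)]
    return general


def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def screen_records(descriptors, threshold):
    """Return (general descriptor, kept indices, rejected indices).

    Records are visited in descending order of Hamming distance to the
    general descriptor; those with a distance above `threshold`
    (a hypothetical, implementer-chosen value) are rejected as abnormal.
    """
    general = general_descriptor(descriptors)
    distances = [hamming_distance(general, d) for d in descriptors]
    order = sorted(range(len(descriptors)), key=lambda i: distances[i], reverse=True)
    kept = [i for i in order if distances[i] <= threshold]
    rejected = [i for i in order if distances[i] > threshold]
    return general, kept, rejected


# Four toy records with 4-dimensional descriptors (real descriptors are much longer).
records = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
general, kept, rejected = screen_records(records, threshold=2)
```

In this toy input the last record shares no flora with the others, so it is the one furthest from the general descriptor and is rejected.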
Preferably, the public health event is adjusted according to the main circulation mode as follows: the mean of the time interval central values in the short-time circulation feature descriptors of the main circulation mode is calculated as the typical circulation time interval, and the start time of the public health event is shifted forward by the typical circulation time interval to obtain the adjusted public health event.
The beneficial effects of the invention are as follows: by using and analyzing public health event data, banknote microorganism feature data and banknote circulation feature data in the financial system, the invention provides a method for automatically forming a reliable fine-tuning dataset for a large language model (LLM), and thereby obtains an LLM-based system for generating public health event schemes; in addition, it provides a method for training the LLM in stages on the constructed fine-tuning dataset, so that a more accurate emergency response management scheme is generated.
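A minimal Python sketch of this adjustment follows. It assumes that shifting the start time "forward" means moving it to an earlier date, so that refluxes shortly before the recorded start fall inside the event; it also assumes that time interval central values are expressed in days and that the shift is rounded to whole days. The function name is hypothetical.

```python
from datetime import date, timedelta


def adjusted_event_start(event_start, short_time_tm_values):
    """Shift the event start by the typical circulation time interval.

    short_time_tm_values: time interval central values (in days, assumed)
    taken from the short-time descriptors of the main circulation mode.
    """
    typical_interval = sum(short_time_tm_values) / len(short_time_tm_values)
    # Assumption: "forward" is read as "earlier"; the shift is rounded to whole days.
    return event_start - timedelta(days=round(typical_interval))


new_start = adjusted_event_start(date(2023, 3, 10), [4, 6, 5])
```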
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below are obviously only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a public health management system for processing financial system cash microorganism data trace sources according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flowchart of a public health management system for processing the traceability of cash microorganism data in a financial system according to an embodiment of the present invention is shown. The system includes a banknote information acquisition module, a banknote reflux feature descriptor acquisition module, an adjusted public health event acquisition module, a Token Q acquisition module, a cash situation feature Dropout proportion acquisition module, and an emergency response scheme formulation module.
The banknote information acquisition module records the information of banknotes that exhibit reflux behavior. A banknote with rich circulation attributes is said to circulate more actively. The microbial records taken from the surface of such a banknote are more representative in geographic range and more timely, and are therefore better suited as reference microbial records for public health events. This embodiment selects banknotes from ATMs and bank counters. Because these two sources are directly tied to the financial system, they are easy to standardize. The embodiment may select only RMB banknotes with a denomination of 100 yuan and create an instance for each banknote according to its banknote number. An instance here means an object that the implementer creates in a database, with the banknote number as its unique ID. Using the banknote number, the time of each return of a banknote to the financial system can be recorded continuously. The numbers of the banknotes collected at counters and ATMs are therefore counted every day and an instance is created for each banknote; each time a banknote returns to the financial system, whether at a bank counter or an ATM, its instance records the date and the geographic position of the place of return. The banknote information acquisition module thus acquires the numbers of a plurality of banknotes together with the date and geographic position of each return of each banknote.
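The per-banknote instance described above can be sketched as a small in-memory data structure. This is only an illustration under stated assumptions: the class names, field names and the dictionary standing in for the database are hypothetical, and a real deployment would use persistent storage.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReturnEvent:
    """One return of a banknote to a bank counter or an ATM."""
    day: date
    latitude: float
    longitude: float


@dataclass
class BanknoteInstance:
    """Object keyed by the banknote number, which serves as the unique ID."""
    banknote_number: str
    returns: list[ReturnEvent] = field(default_factory=list)


# Hypothetical stand-in for the database: banknote number -> instance.
registry: dict[str, BanknoteInstance] = {}


def on_banknote_returned(banknote_number, day, latitude, longitude):
    """Create the instance on first sight, then append the date and position."""
    instance = registry.setdefault(banknote_number, BanknoteInstance(banknote_number))
    instance.returns.append(ReturnEvent(day, latitude, longitude))
    return instance


on_banknote_returned("AB12345678", date(2023, 1, 5), 39.90, 116.40)
note = on_banknote_returned("AB12345678", date(2023, 1, 20), 39.95, 116.30)
```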
The banknote reflux characteristic descriptor obtaining module is used for obtaining a long-time circulation characteristic descriptor and a short-time circulation characteristic descriptor according to the information of each banknote with reflux behaviors; obtaining a geographic circulation range index, a reflux stability index and an activity rating index according to the long-time circulation characteristic descriptors and the short-time circulation characteristic descriptors; selecting a plurality of paper currencies with activity rating indexes larger than a threshold value for swab test, and obtaining a microbial characteristic descriptor of each paper currencies according to the result of the swab test.
Because the banknote information is recorded continuously over a long period, the times and positions at which a plurality of banknotes flowed back to the financial system within a given period can be read from the records. From the distribution of the recorded times and positions, the circulation feature data of each banknote at that time can be obtained.
First, each time a banknote instance returns to the financial system, its return time and return geographic position are recorded. Since a banknote may not return at all, or may return several times, within one day, a relatively long observation period is required. This embodiment uses half a year as an example and establishes the long-time circulation feature descriptor from the return times and return positions of each banknote within that half year. The half year may be any half-year period before the onset of the public health event.
The time intervals of each banknote are obtained from all of its return times, and the mean of all the time intervals of a banknote is taken as the time interval central value TM. The mean square error between the time intervals and their mean is recorded as the time mean square error TMSE. The position central value LM is obtained from the return positions of each banknote; since positions are expressed as coordinates (longitude and latitude), the position central value is also a coordinate. The mean square error of the distances between the position central value and each position is recorded as the position mean square error LMSE.
The long-time circulation feature descriptor M1 of a banknote consists of the time interval central value TM, the position central value LM, the time mean square error TMSE and the position mean square error LMSE. M1 has 5 values in total: the position central value LM contributes two values and each of the others contributes one. The time mean square error TMSE measures the spread of the time intervals: the larger it is, the less regular the return times. Likewise, the larger the position mean square error LMSE, the more dispersed the return positions.
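The five values of a circulation feature descriptor can be computed as in the following Python sketch. Several choices here are assumptions rather than part of the described method: time intervals are measured in whole days, the position central value is the arithmetic mean of latitude and longitude (a simplification that breaks down near the antimeridian), and distances are great-circle distances in kilometers. The function names are hypothetical.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (distance measure assumed here)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def circulation_descriptor(returns):
    """returns: list of (date, latitude, longitude) inside the observation window.

    Produces (TM, LM_lat, LM_lon, TMSE, LMSE), i.e. the 5 values of M1 or M2.
    """
    if len(returns) < 2:
        raise ValueError("at least two returns are needed to form a time interval")
    ordered = sorted(returns, key=lambda r: r[0])
    intervals = [(b[0] - a[0]).days for a, b in zip(ordered, ordered[1:])]
    tm = sum(intervals) / len(intervals)                           # time interval central value
    tmse = sum((x - tm) ** 2 for x in intervals) / len(intervals)  # time mean square error
    lm_lat = sum(r[1] for r in ordered) / len(ordered)             # position central value
    lm_lon = sum(r[2] for r in ordered) / len(ordered)
    lmse = sum(haversine_km(lm_lat, lm_lon, r[1], r[2]) ** 2 for r in ordered) / len(ordered)
    return tm, lm_lat, lm_lon, tmse, lmse


# Three returns at the same place, 10 and 20 days apart.
m = circulation_descriptor([
    (date(2023, 1, 1), 39.9, 116.4),
    (date(2023, 1, 11), 39.9, 116.4),
    (date(2023, 1, 31), 39.9, 116.4),
])
```

Calling the same function on a half-year window and on a one-month window would give M1 and M2 respectively.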
Using the same method as for the long-time circulation feature descriptor M1, a relatively short observation period is set to obtain the short-time circulation feature descriptor M2. Depending on how sudden the public health event is, the period may be set to one month in this embodiment, meaning any one month before the public health event starts. The return data in this period are likewise processed into a time interval central value TM, a position central value LM, a time mean square error TMSE and a position mean square error LMSE, which form the short-time circulation feature descriptor M2. M2 also has 5 values: two for the position central value and one for each of the others.
With the long-period and short-period feature descriptions, the values that represent the circulation characteristics do not depend on the number of times a banknote returns to the financial system; they depend on the circulation range and timeliness. Further, the geographic circulation range index S1 is obtained from the position mean square errors of the long-time and short-time circulation feature descriptors, and the formula is as follows:
$$S_1 = \mathrm{mean}\left(LMSE_{M1},\ LMSE_{M2}\right)$$
where $LMSE_{M1}$ is the position mean square error in the long-time circulation feature descriptor M1, $LMSE_{M2}$ is the position mean square error in the short-time circulation feature descriptor M2, $\mathrm{mean}(\cdot)$ is the mean function, and $S_1$ is the geographic circulation range index. The larger S1 is, the larger the geographic span over which the cash circulates within an area, i.e., the wider the geographic circulation range. The cash circulation range is thus determined jointly by the position mean square errors LMSE of M1 and M2 through their mean.
Further, the reflux stability index S2 is obtained according to the time interval center values of the long-time flow feature descriptor M1 and the short-time flow feature descriptor M2, and the formula is as follows:
$$S_2 = \frac{\max\left(TM_{M1},\ TM_{M2}\right)}{TM_{M1}}$$
where $TM_{M1}$ is the time interval central value in the long-time circulation feature descriptor, $TM_{M2}$ is the time interval central value in the short-time circulation feature descriptor, $\max(\cdot)$ is the maximum function, and $S_2$ is the reflux stability index. Since microorganisms remain active for only a limited time, a banknote whose short-time TM differs markedly from its long-time TM may not accurately represent the active microbial content. S2 is smallest when the reflux interval in the short-time descriptor is no longer than that in the long-time descriptor, that is, when the banknote has recently been flowing back at least as quickly as over the long term; the smaller S2 is, the better the reflux stability.
The activity rating index S of each banknote is obtained from the geographic circulation range index and the reflux stability index:
$$S = S_1 \cdot \frac{1}{S_2}$$
The activity is determined jointly by the geographic range of activity and the ease of reflux. The higher the activity, the easier the banknote is to sample and the larger its geographic range of activity, so the better it represents the microbial diversity at that moment.
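The three indices can be computed directly from the descriptor values, following the verbal definitions: S1 is the mean of the two position mean square errors, S2 is the ratio of the larger time interval central value to the long-time one, and S is S1 multiplied by the reciprocal of S2. The Python function names below are hypothetical, and the numbers in the usage lines are made up purely to show the arithmetic.

```python
def geographic_range_index(lmse_long, lmse_short):
    """S1: mean of the position mean square errors of M1 and M2."""
    return (lmse_long + lmse_short) / 2


def reflux_stability_index(tm_long, tm_short):
    """S2: max(TM of M1, TM of M2) divided by TM of M1; never below 1."""
    if tm_long <= 0:
        raise ValueError("the long-time TM must be positive")
    return max(tm_long, tm_short) / tm_long


def activity_rating(lmse_long, lmse_short, tm_long, tm_short):
    """S: geographic range index times the reciprocal of the stability index."""
    s1 = geographic_range_index(lmse_long, lmse_short)
    s2 = reflux_stability_index(tm_long, tm_short)
    return s1 / s2


# Same geographic spread, different recent return behavior (illustrative values).
fast_recent = activity_rating(lmse_long=100.0, lmse_short=50.0, tm_long=20.0, tm_short=10.0)
slow_recent = activity_rating(lmse_long=100.0, lmse_short=50.0, tm_long=20.0, tm_short=40.0)
```

With these inputs, the banknote whose recent return interval doubled receives half the rating of the one that kept returning at least as quickly as before.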
Because microbiological analysis of banknotes is costly, only highly rated banknotes are sampled and analyzed. Rating the banknotes by their circulation features identifies the more active ones, which improves the ability of the banknote microbial features to represent recent public health events.
At each station where sampling is planned, K banknotes are selected as sampling objects, where K ≥ 10. In this embodiment K may be 20: for a station with planned sampling, the banknotes that have flowed back to the station at that time are taken as candidates, and microorganism sampling is performed on the 20 banknotes with the largest S values.
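Selecting the K highest-rated banknotes at a station is a plain top-K query, sketched below in Python. The function name and the dictionary input are assumptions for illustration; the default K of 20 and the lower bound of 10 come from the embodiment.

```python
import heapq


def select_for_swab(ratings, k=20):
    """ratings: banknote number -> activity rating S for notes at one station.

    Returns the banknote numbers of the k highest-rated notes, best first.
    """
    if k < 10:
        raise ValueError("the embodiment requires K >= 10")
    return heapq.nlargest(k, ratings, key=ratings.get)


# 25 candidate notes whose rating equals their index (illustrative values).
station_ratings = {f"note{i}": float(i) for i in range(25)}
selected = select_for_swab(station_ratings)
```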
Known flora are numbered with positive integers starting from 1, and each newly found flora receives the next number. Flora that have already been numbered are not numbered again, so every distinct flora has a distinct number, and the swab test result can be expressed as the presence or absence of each numbered flora. Because viral variants are more active, numbering may be limited to a small number of common influenza viruses.
For example, in the swab test result of one banknote collected during a period associated with an anthrax public health event, Bacillus, Penicillium, Fusarium and Saccharomyces cerevisiae were found, numbered 119, 3481, 2750 and 821 respectively. These may cause immunosuppression-related diseases and may also lead to anthrax infection. Each number is marked 1 (detected) or 0 (not detected). Assuming that 5000 numbers have been established in total, the positions numbered 119, 3481, 2750 and 821 are 1 and the rest are 0; the 5000 marks form the 5000-dimensional microbial feature descriptor W1, whose dimension is fixed at 5000.
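The encoding in this example can be reproduced with a few lines of Python. The function name is hypothetical, and storing the code as a list of 0/1 integers is just one possible representation.

```python
def microbial_descriptor(detected_numbers, dimension=5000):
    """Fixed-dimension binary code: position n-1 is 1 if flora number n was detected."""
    bits = [0] * dimension
    for number in detected_numbers:
        if not 1 <= number <= dimension:
            raise ValueError(f"flora number {number} is outside 1..{dimension}")
        bits[number - 1] = 1  # flora numbers start at 1, list indices at 0
    return bits


# Flora numbers from the example in the text.
w1 = microbial_descriptor([119, 3481, 2750, 821])
```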
After the banknotes are rated, the microbial feature descriptor W1 of each higher-rated banknote is obtained after each of its refluxes.
Thus, after each reflux of a banknote, if its activity is high enough, the banknote takes part in microorganism sampling, and the three feature descriptors W1, M1 and M2 are obtained.
The adjusted public health event acquisition module acquires a public health event and structures its data to obtain public health event data, which consists of guide sentences and text content. It acquires the start and end times of the public health event and takes each reflux of a banknote carrying microorganisms of the public health event as one record, so that a plurality of records is obtained for the event. Clustering the records yields different clusters, each of which is a circulation mode; the metric distance used by the clustering is obtained from the long-time and short-time circulation feature descriptors of any two records. Public health event data refers to the descriptive information from before and after the occurrence of a public health event together with its emergency response scheme, and can be arranged manually into template-like text. The semantic fields it contains may be the semantics corresponding to the guide sentences of the emergency scheme content; for example, the semantic field of the guide sentence "emergency response measures" may be "measure" or "response". A semantic field is the same concept as a field, namely the field name of a data item, and a guide sentence together with its content is the data corresponding to that field name. Given the way an LLM generates text, once a fine-tuned LLM has been built it can write the corresponding content from a guide sentence.
Once the association with the cash situation features has been established, the cash situation features can be used as tokens to drive the LLM to generate, from the guide sentences, emergency scheme content that meets expectations; in this application this is implemented by the Token Q acquisition module.
This embodiment proposes a typical field template as the format of the structured data. Each event data item of this embodiment includes the following:
Guide sentence: event type and severity. Content: the type and severity of the public health event, described in natural language.
Guide sentence: basic situation. Content: the basic conditions of the event, such as its cause, time, place and mode of propagation.
Guide sentence: emergency response measures. Content: the emergency response measures taken for the event, including isolation, quarantine, medical treatment, material assurance and the like.
Guide sentence: risk assessment and prediction. Content: an assessment and prediction of the trend of the event based on index data, including but not limited to infection rate and mortality rate.
Guide sentence: information release and coordination mechanism. Content: the information release and coordination mechanisms, including emergency communication, information sharing, command scheduling and the like.
The above is only one example of a structured data format. Its defining characteristic is that fields are created for public health events and the structured data in those fields is built from natural language and index data, so that each public health event corresponds to one structured data item. Those skilled in the art will appreciate that a conventional table can equally be constructed from the guide sentences of a public health event, as shown in the following table.
[Figure SMS_23: table of guide sentences and their corresponding content]
The table is structured data: the heading of each item in the table is a guide sentence (for example, "information release and coordination mechanism"), and the natural-language text or index data corresponding to the guide sentence is the data of that field.
The technical feature of this embodiment is that it establishes a public health event data format and obtains structured public health event data in which the semantic fields must include the semantics corresponding to "emergency response measures" and "information release and coordination mechanism"; that is, there are two field names carrying these two meanings, for example "measure" and "mechanism". These are the core content of the emergency response scheme. On this basis, the implementer may add guide sentences and content for other semantic fields.
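The structured format and its two mandatory fields can be sketched as follows in Python. The constant and function names, the dictionary layout and the sample content strings are hypothetical; only the two required guide sentences and the guide-sentence/content pairing come from the text.

```python
# Guide sentences that the text treats as mandatory.
REQUIRED_GUIDE_SENTENCES = {
    "emergency response measures",
    "information release and coordination mechanism",
}


def build_event_data(contents):
    """contents: guide sentence -> natural-language text or index data.

    Returns a list of {guide_sentence, content} items, one per field, and
    refuses input that lacks either mandatory guide sentence.
    """
    missing = REQUIRED_GUIDE_SENTENCES - set(contents)
    if missing:
        raise ValueError(f"missing required guide sentences: {sorted(missing)}")
    return [{"guide_sentence": g, "content": c} for g, c in contents.items()]


# Placeholder content strings, invented only to show the shape of the data.
event_data = build_event_data({
    "event type and severity": "placeholder description of type and severity",
    "emergency response measures": "placeholder list of measures",
    "information release and coordination mechanism": "placeholder mechanism description",
})
```

Additional guide sentences pass through unchanged, which matches the statement that the implementer may add further semantic fields.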
Manually writing the content for each guide sentence of an emergency scheme in natural language, or quantifying the public health event with index data, is regarded as record labeling. The present invention only requires that public health event data be organized into structured data in the manner described above. The level of detail, the content and the accuracy of the structured data depend on the implementer's conditions and construction method.
The guide sentence and the field name represent the corresponding information, such as emergency response measures, information release and coordination mechanism, and the content corresponding to the semantic field is the scheme content.
In this embodiment, the structured public health event data is associated with the records by analysing the time relationship with the long-time circulation feature descriptor M1 and the short-time circulation feature descriptor M2 of each record. Each return of each banknote is regarded as one record; a record contains the long-time and short-time circulation feature descriptors of the banknote and the microbial feature descriptor W1 of the banknote at the time of that return. The records that fall within the range of a public health event and contain W1, M1 and M2 are taken as the records of that event.
For this public health event, the time interval distribution and the spatial distribution of each record are determined, thus representing a different circulation pattern G under the current public health event.
For any two records, a measurement function between circulation modes is constructed according to the time mean square error and the position mean square error of the recorded long-time circulation characteristic descriptors and the short-time circulation characteristic descriptors, and the formula is as follows:
$$D(a,b) = 1 - \cos\left(\mathbf{v}_a, \mathbf{v}_b\right), \qquad \mathbf{v}_a = \left(TMSE_a^{M1},\ TMSE_a^{M2},\ LMSE_a^{M1},\ LMSE_a^{M2}\right), \qquad \mathbf{v}_b = \left(TMSE_b^{M1},\ TMSE_b^{M2},\ LMSE_b^{M1},\ LMSE_b^{M2}\right)$$
where a and b denote any two records; $TMSE_a^{M1}$ is the time mean square error in the long-time circulation feature descriptor of record a; $TMSE_a^{M2}$ is the time mean square error in the short-time circulation feature descriptor of record a; $LMSE_a^{M1}$ is the position mean square error in the long-time circulation feature descriptor of record a; $LMSE_a^{M2}$ is the position mean square error in the short-time circulation feature descriptor of record a; $TMSE_b^{M1}$, $TMSE_b^{M2}$, $LMSE_b^{M1}$ and $LMSE_b^{M2}$ are the corresponding quantities of record b; $\cos(\cdot,\cdot)$ denotes cosine similarity; and $D(a,b)$ is the metric function between record a and record b.
Since a and b are any two records, the metric function places all records in a notional space in which records with similar circulation behaviour lie close together and dissimilar records lie far apart.
Subtracting the cosine similarity from 1 converts cosine similarity into cosine distance. For each record, a four-dimensional vector is formed from the time mean square error TMSE and the position mean square error LMSE in the long-time circulation feature descriptor M1 and in the short-time circulation feature descriptor M2; the cosine similarity between the vectors of a and b is calculated and then converted into cosine distance. If two records have similar time and position mean square error values in their long-time and short-time circulation feature descriptors, their circulation modes are considered similar.
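The metric function described above can be sketched as follows. This is a minimal illustration in plain Python; the function names, the ordering of the four components and the sample values are assumptions made for the example, and the handling of a zero vector is a choice made here that the text does not address.

```python
import math
from typing import Sequence, Tuple


def record_vector(tmse_long: float, tmse_short: float,
                  lmse_long: float, lmse_short: float) -> Tuple[float, ...]:
    """Four-dimensional vector of one record: TMSE and LMSE from M1 and M2."""
    return (tmse_long, tmse_short, lmse_long, lmse_short)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector has no direction; treat it as dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Illustrative values only.
a = record_vector(2.0, 1.0, 4.0, 3.0)
b = record_vector(4.0, 2.0, 8.0, 6.0)   # same direction as a
c = record_vector(1.0, 4.0, 3.0, 0.5)   # different direction

d_ab = cosine_distance(a, b)
d_ac = cosine_distance(a, c)
```

Note that cosine distance compares the direction of the two vectors, so records whose four values are proportional (such as `a` and `b` here) are treated as the same circulation mode.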
Further, DBSCAN clustering is performed with the calculated metric function, using the parameters minpts = 20 and r = 0.3; that is, a record is a core point only if more than 20 records lie within a metric distance of 0.3 of it. This yields the circulation modes G: G contains several clusters as well as isolated records. Isolated records are removed and take no part in subsequent processing, and each cluster is one circulation mode.
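The clustering step can be sketched with a small, self-contained DBSCAN over the cosine distance. This is an illustrative implementation, not the embodiment's code: it follows the common convention that a point's neighbourhood includes the point itself and that a core point needs at least `min_pts` neighbours, which may differ by one from the wording above. The sample vectors are invented for the example.

```python
import math
from typing import Callable, List, Optional, Sequence

NOISE = -1  # label given to isolated records


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def dbscan(points: Sequence[Sequence[float]], dist: Callable[..., float],
           eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE marks isolated points."""
    n = len(points)
    # Neighbourhood of i: all points within eps, including i itself.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only from core points
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Two groups of 25 similar records plus one isolated record (invented values).
group_1 = [(1.0 + 0.001 * i, 1.0, 1.0, 1.0) for i in range(25)]
group_2 = [(1.0, 0.001 * i, 0.0, 0.0) for i in range(25)]
records = group_1 + group_2 + [(0.0, 0.0, 0.0, 1.0)]

labels = dbscan(records, cosine_distance, eps=0.3, min_pts=20)
modes = sorted(set(labels) - {NOISE})  # each remaining label is one circulation mode
```

Records labelled `NOISE` correspond to the isolated records that are removed before further processing.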
Further, between the start and the end of a public health event, the corresponding records exhibit different circulation modes. The W1 pattern of each circulation mode is determined from the records that remain after records with abnormal W1 are removed.
For each circulation mode, a general microbial feature descriptor W2 of the microorganisms that may be detected in that mode is calculated. W1 is a binary code, that is, the value of each dimension is 0 or 1, so an OR operation is performed over the W1 of all records in the cluster of one circulation mode: a position of the result is 1 if any code has the value 1 at that position, and 0 otherwise. As a result, most values in the code W2 of a circulation mode are 1. Based on W2, the Hamming distance between W2 and the W1 of every record in the cluster is calculated. Because every 1 in W1 is also a 1 in W2, the Hamming distance counts the positions at which W2 is 1 and W1 is 0. The larger the Hamming distance, the more positions at which W1 differs from W2, that is, the fewer 1 values and the more 0 values in W1, which indicates that fewer types of microorganism were detected in W1.
An excessively large distance indicates that, within a similar circulation mode, the W1 of the record is unlike that of the other banknotes and therefore does not represent the microbial characteristics at the time of the public health event well, so such abnormal records are removed. The implementer sets a threshold ratio for this purpose. In this embodiment, the records in a cluster are sorted by Hamming distance in descending order, and the first 10% are regarded as abnormal records and removed.
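The computation of W2 and the screening of abnormal records can be sketched as below. The function names and the sample codes are hypothetical; rounding the number of removed records down with `int` is an assumption, since the text only gives the 10% ratio.

```python
from typing import List, Sequence


def general_descriptor(w1_list: Sequence[Sequence[int]]) -> List[int]:
    """Bitwise OR of all binary codes W1 in one cluster, giving W2."""
    w2 = [0] * len(w1_list[0])
    for w1 in w1_list:
        w2 = [x | y for x, y in zip(w2, w1)]
    return w2


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


def keep_normal_records(w1_list: Sequence[Sequence[int]],
                        ratio: float = 0.10) -> List[int]:
    """Indices of records kept after dropping the `ratio` share farthest from W2."""
    w2 = general_descriptor(w1_list)
    order = sorted(range(len(w1_list)),
                   key=lambda i: hamming(w1_list[i], w2), reverse=True)
    n_drop = int(len(w1_list) * ratio)  # assumption: round down
    dropped = set(order[:n_drop])
    return [i for i in range(len(w1_list)) if i not in dropped]


# Nine similar records and one record that shares no flora with the others.
cluster = [[1, 1, 1, 0] for _ in range(9)] + [[0, 0, 0, 1]]
w2 = general_descriptor(cluster)
kept = keep_normal_records(cluster)
```

In this example W2 is `[1, 1, 1, 1]`, the last record has Hamming distance 3 while the others have distance 1, and the last record is therefore the one removed.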
After the records with abnormal microbial features are removed, each record of each circulation mode has an average time interval TM in its short-time circulation feature descriptor M2.
The circulation mode with the most records is taken as the main circulation mode, and the mean of the average time interval TM values in the short-time circulation feature descriptors M2 of the main circulation mode is taken as the typical circulation time interval. The start time of the public health event is then redetermined: it is shifted forward by the length of the typical circulation time interval. The purpose of correcting the start time is to capture the W1 of the public health event. The circulation modes G are then redetermined from the new start time and the end time of the public health event, records with abnormal W1 are removed on the basis of the W2 of each mode, and the resulting event is recorded as the adjusted public health event.
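The adjustment of the start time can be sketched as follows. The sketch assumes that "shifted forward" means moving the start to an earlier date, that TM is measured in days, and that noise records carry the label -1; all three are assumptions made for the example, as are the function name and the sample values.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def adjust_start(start: datetime, cluster_labels: Sequence[int],
                 tm_days: Sequence[float]) -> Tuple[datetime, float]:
    """Return the adjusted start time and the typical circulation interval."""
    # Main circulation mode: the cluster with the most records (noise excluded).
    counts = Counter(label for label in cluster_labels if label != -1)
    main_mode = counts.most_common(1)[0][0]
    # Typical interval: mean TM over the records of the main mode.
    tms = [tm for label, tm in zip(cluster_labels, tm_days) if label == main_mode]
    typical = sum(tms) / len(tms)
    # Assumption: the start is moved earlier by the typical interval.
    return start - timedelta(days=typical), typical


labels = [0, 0, 0, 1, 1, -1]               # cluster label of each record
tm = [4.0, 6.0, 5.0, 10.0, 12.0, 30.0]     # TM of each record, in days
new_start, typical_interval = adjust_start(datetime(2023, 3, 10), labels, tm)
```

With these sample values the main mode is cluster 0, the typical interval is 5 days, and the start moves from 10 March to 5 March.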
In this way, a more accurate and typical record is obtained for automatically establishing a public health event development process, and training of the twin network T is facilitated, so that more accurate cash situation characteristics are obtained.
For the TokenQ acquisition module: when a public health event occurs, the records of highly active banknotes from which microbial records were collected within the time range of the adjusted public health event represent the microbial characteristics of that event, and combining them with the circulation characteristics further represents how those microbial characteristics are distributed under different circulation characteristics. Therefore, the twin network T is trained on W1, M1 and M2 according to the association result.
The twin network distinguishes the labels of different public health events and represents each record in a high-dimensional space, so that in that space the records of one public health event stay far from the records of other public health events.
Records of observing and implementing emergency schemes for public health events are non-uniform, exist in multiple versions and lack a unified description. Because an LLM (large language model) such as GPT-2 generates text autoregressively from a sequence of input tokens, a uniform feature can be input first and a guide sentence appended after it; a method is therefore needed to describe the W1, M1 and M2 of a record as a unified cash situation feature Q, so that the LLM can output the corresponding content from the cash situation feature Q and the guide sentence. The twin network T is established for this purpose: according to the association relation, the cash situation features Q within one public health event are given the same label, so that the cash situation features Q of different public health events are distinguished.
A twin network consists of subnetworks that have the same architecture and share weights. Its purpose is to learn, from pairs of similar and dissimilar data, to output useful data descriptors that can then be used to compare the inputs of the subnetworks; face recognition, for example, is an application trained with twin networks. The twin network of the invention is an FCN (fully connected neural network) whose input is a vector of several values and whose output is the cash situation feature Q.
Each record is input to the FCN as follows: a record has three descriptors, W1, M1 and M2, which are concatenated into one high-dimensional descriptor containing all the values of W1, M1 and M2.
Each record input to the network is labeled with the event name of its public health event. For any two records input to the network, if their labels are the same, the twin network must re-describe them as descriptors that are similar in the high-dimensional space; specifically, the two descriptors output by the FCN under the same label should have a cosine similarity close to 1. Otherwise, the cosine similarity should approach 0.
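The input construction and the training target for a pair of records can be sketched as follows. This is only an illustration under stated assumptions: the squared-error loss is one possible way to push the cosine similarity toward 1 or 0 and is not taken from the text, a single shared linear layer stands in for the FCN, and the descriptor values are invented.

```python
import math
from typing import List, Sequence


def concat_record(w1: Sequence[float], m1: Sequence[float],
                  m2: Sequence[float]) -> List[float]:
    """Concatenate W1, M1 and M2 into one input vector."""
    return [float(x) for x in (*w1, *m1, *m2)]


def embed(x: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for the FCN: one linear layer whose weights both branches share."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pair_loss(x_a: Sequence[float], x_b: Sequence[float], same_event: bool,
              weights: Sequence[Sequence[float]]) -> float:
    """Squared gap between the pair's cosine similarity and its target (1 or 0)."""
    target = 1.0 if same_event else 0.0
    q_a, q_b = embed(x_a, weights), embed(x_b, weights)
    return (cosine_similarity(q_a, q_b) - target) ** 2


# Invented descriptor values; a 4-dimensional output is used instead of 128.
x = concat_record([1, 0, 1], [39.9, 116.4, 12.0, 3.0, 5.0],
                  [39.8, 116.5, 4.0, 1.0, 2.0])
weights = [[1.0 if j == i else 0.0 for j in range(len(x))] for i in range(4)]

loss_same = pair_loss(x, x, same_event=True, weights=weights)
loss_diff = pair_loss(x, x, same_event=False, weights=weights)
```

Because both records pass through the same `weights`, the two branches are identical by construction, which is the defining property of a twin network.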
The new descriptor output by the FCN has 128 dimensions, that is, the cash situation feature Q is 128-dimensional; the number of dimensions can be chosen freely by the implementer. The training method of a twin network, including how to iterate to convergence and how to tune the usual hyperparameters, is well known to those skilled in the art and is not repeated here. This method solves the problem of how to describe the microbial and circulation characteristics of banknotes uniformly as a feature suitable for guiding an LLM.
Therefore, the twin network T can generate a cash situation feature Q for any record, that is, each record obtains one cash situation feature, which lies far in the high-dimensional space from the cash situation features Q of other public health events. This allows the LLM to generate content from the cash situation feature Q and the guide sentence.
The structured public health event data of historical public health events can be used as corpus data in the training set for fine-tuning the LLM, but this corpus data by itself lacks an accurate association with the cash situation features. An association between the cash situation feature Q and the structured public health event data must therefore be established, so that the cash situation feature Q and a guide sentence can serve as the input of the LLM and the LLM can generate the expected emergency scheme content from that input.
Based on the cash situation feature Q and the guide sentences, a training set for the LLM is constructed, labeled with the public health event data. The corpus is built from Tokens, but the cash situation feature Q is not a Token. To construct the cash situation feature Q as Tokens, that is, as word segments, a relationship to a Token is established for each value of the cash situation feature Q. First, the value range that the values of the cash situation feature Q can reach is determined: when the twin network is trained, the maximum and minimum of each dimension of the cash situation feature Q are found, which gives the range each value may reach.
The invention uses the Token encoding of GPT, which encodes text with the Byte-Pair Encoding (BPE) method. Assuming that the vocabulary size of the pre-trained GPT model is 40478 and that the values of the cash situation feature Q range over [-1000, 50000], this range is converted to [0, 40478]; each value of the cash situation feature Q is thereby converted into a character, so that an LLM such as GPT can capture the feature. With this method the cash situation feature Q is converted into Tokens, denoted TokenQ, which has a fixed length of 128.
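The conversion of the cash situation feature Q into TokenQ can be sketched as a linear rescaling of each value into the index range of the vocabulary. The sketch assumes that token indices run from 0 to vocabulary size minus 1 (so the top of the range maps to 40477 rather than 40478), that out-of-range values are clipped, and that values are rounded to the nearest index; the function name is hypothetical.

```python
from typing import List, Sequence


def q_to_token_ids(q: Sequence[float], lo: float = -1000.0, hi: float = 50000.0,
                   vocab_size: int = 40478) -> List[int]:
    """Map each value of Q linearly from [lo, hi] to a token index."""
    ids = []
    for value in q:
        clipped = min(max(value, lo), hi)       # assumption: clip to the range
        fraction = (clipped - lo) / (hi - lo)   # position within [0, 1]
        ids.append(round(fraction * (vocab_size - 1)))
    return ids


# A 128-dimensional Q with invented values at the two ends of the range.
q = [-1000.0, 50000.0] + [0.0] * 126
token_q = q_to_token_ids(q)
```

The result always has the same length as Q, which matches the fixed length of 128 stated above.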
In the data set, TokenQ and the guide sentence are the input of the LLM, and the content corresponding to the guide sentence is the output.
For the module that obtains the Dropout proportion of the cash situation feature: the prior knowledge in a language model comes entirely from its pre-training corpus. A large training corpus inevitably contains some wrong information when it is collected, and this wrong knowledge is learned and stored in the model parameters. When generating text, the model gives priority to its own parameterized knowledge and is therefore prone to generating hallucinated (illusion) content; this is the source of most hallucinations in natural language generation tasks.
In fine-tuning, if the distribution of records is unbalanced, some tasks are more prone to hallucination, that is, to generating wrong content, because of the long-tail effect. For example, given the records of two distinct public health events, a severely hallucinating LLM may map the cash situation feature of one public health event to most of the scheme content of the other.
When the LLM is trained on a limited set of TokenQ, guide sentences and corresponding text content, TokenQ is randomly modified; this is called the Dropout operation. Specifically, a randomly selected part of the Tokens in TokenQ is set to zero or converted into random characters, so that the LLM learns to generate the correct scheme content from only part of the information in TokenQ.
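The Dropout operation on TokenQ can be sketched as below. The function name, the `mode` switch and the use of a seeded random generator are choices made for the example; the text only states that a random part of the Tokens is zeroed or replaced by random characters.

```python
import random
from typing import List, Optional, Sequence


def dropout_token_q(token_q: Sequence[int], ratio: float, mode: str = "zero",
                    vocab_size: int = 40478,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Zero out, or randomize, a `ratio` share of the tokens in TokenQ."""
    if rng is None:
        rng = random.Random()
    n_drop = round(len(token_q) * ratio)
    positions = rng.sample(range(len(token_q)), n_drop)
    out = list(token_q)
    for i in positions:
        out[i] = 0 if mode == "zero" else rng.randrange(vocab_size)
    return out


# Invented TokenQ with no zero entries, so dropped positions are easy to count.
token_q = list(range(1000, 1128))
dropped = dropout_token_q(token_q, ratio=0.25, rng=random.Random(0))
n_zeroed = sum(1 for t in dropped if t == 0)
```

With a ratio of 0.25 and 128 tokens, 32 positions are modified and the remaining 96 keep their original values.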
The microbial, long-time and short-time features of all cash records have been mapped into the high-dimensional space, and the cash situation feature Q of one public health event differs significantly from the cash situation features Q of other public health events. A record with a larger LOF (local outlier factor) is a long-tail record, and the LLM should still generate correct content for it as far as possible. For long-tail records, therefore, the number of Tokens affected by the Dropout operation is increased further, to avoid the overfitting that leads to hallucinated content.
Specifically, the LOF is calculated as follows. The cash situation features Q lie in a 128-dimensional space; each cash situation feature Q is a record point in this high-dimensional space, and the distance between any two cash situation features Q is the cosine distance.
For a record point p in this high-dimensional space, the LOF is the average of the ratios of the local reachability density of each of the K neighborhood points of p to the local reachability density of p itself, so the LOF expresses the degree to which p is an outlier.
If the ratio is close to 1, the density of p is almost the same as that of its neighborhood points, and p probably belongs to the same cluster as its neighborhood; if the ratio is smaller than 1, the density of p is higher than that of its neighborhood points, and p is a dense point; if the ratio is greater than 1, the density of p is lower than that of its neighborhood points, and p is more likely to be an outlier.
When the local reachability density is calculated, the neighborhood size K is set to 20; that is, one public health event is assumed to have at least 20 cash situation features Q, so the local reachability density of each record is calculated from its 20 nearest records.
Therefore, based on the cosine distance between different cash situation features Q, the LOF size corresponding to each cash situation feature Q can be calculated.
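The LOF calculation over cosine distance can be sketched in plain Python as follows. This is a simplified illustration: each point uses exactly its k nearest neighbours (ties are not expanded), all points are assumed to be distinct so that no reachability sum is zero, and the demonstration uses two-dimensional vectors with k = 3 instead of 128 dimensions with K = 20.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def lof_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every point (requires len(points) > k)."""
    n = len(points)
    d = [[cosine_distance(points[i], points[j]) for j in range(n)]
         for i in range(n)]
    knn, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        knn.append(order[:k])                # the k nearest neighbours of i
        k_dist.append(d[i][order[k - 1]])    # distance to the k-th neighbour

    def local_reachability_density(i: int) -> float:
        # Reachability distance from i to j is max(k-distance of j, d(i, j)).
        reach = [max(k_dist[j], d[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


# Six nearly parallel vectors and one vector pointing elsewhere (invented data).
angles = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 1.00]
features = [(math.cos(t), math.sin(t)) for t in angles]
scores = lof_scores(features, k=3)
outlier_index = scores.index(max(scores))
```

The last vector receives by far the largest score, which is the behaviour the text relies on to identify long-tail records.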
Correcting the Dropout proportion of each cash situation feature Q alleviates the hallucination problem of the LLM more effectively, given the data set construction method and the representational quality of the cash situation feature Q. Dropout ignores a proportion of the feature detectors in each training batch (half of them in the example given here), which markedly reduces overfitting. It does so by reducing the interaction between feature detectors, where interaction means that some detectors can function only by relying on others; this further suppresses the hallucination of the LLM. Specifically, the Dropout proportion varies dynamically with the LOF. First, a minimum and a maximum Dropout proportion are set manually; in this embodiment they are 5% and 25%, respectively.
The Dropout proportion of the cash situation feature Q of each record is calculated as follows. The LOF values of all cash situation features Q are calculated; the larger the LOF, the larger the Dropout proportion. The LOF values of all cash situation features Q are linearly normalized so that the maximum LOF value maps to 0.25 and the minimum LOF value maps to 0.05. The normalized LOF value is used as the Dropout proportion of the corresponding TokenQ. The Dropout proportion of the cash situation feature Q of each record is thus obtained.
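The linear normalization of LOF values into Dropout proportions can be sketched as below. The function name is hypothetical, and returning the minimum proportion when all LOF values are equal is an assumption, since the text does not cover that case.

```python
from typing import List, Sequence


def dropout_ratios(lof_values: Sequence[float], min_ratio: float = 0.05,
                   max_ratio: float = 0.25) -> List[float]:
    """Rescale LOF values linearly so min LOF -> min_ratio, max LOF -> max_ratio."""
    lo, hi = min(lof_values), max(lof_values)
    if hi == lo:
        # Assumption: with no spread in LOF, use the minimum proportion.
        return [min_ratio] * len(lof_values)
    span = max_ratio - min_ratio
    return [min_ratio + (v - lo) / (hi - lo) * span for v in lof_values]


# Invented LOF values for three records.
ratios = dropout_ratios([1.0, 2.0, 3.0])
```

For the three sample values the proportions are approximately 0.05, 0.15 and 0.25, so the record with the largest LOF receives the strongest Dropout.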
For the emergency response scheme making module: training the LLM with the calculated Dropout proportions of the cash situation features Q alleviates hallucination. With the trained LLM, a user can generate a TokenQ from a record (W1, M1, M2) detected from cash at the current moment and, combined with a guide sentence, obtain text content for the public health event emergency response scheme.
First, each record is converted into a TokenQ and a guide sentence is appended. An emergency response scheme has multiple guide sentences, so the 128 values of TokenQ are input first, followed by a guide sentence, for example: information release and coordination mechanism. The LLM then generates content up to a set number of characters in length. The implementer resets the network and continues with the other guide sentences on the same TokenQ, so that for one banknote the LLM quickly generates, from its microbial data and recent circulation characteristics, each part of the most suitable emergency scheme drawn from previous public health event schemes.
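The generation loop over guide sentences can be sketched as follows. The `encode` and `generate` callables are hypothetical stand-ins for a tokenizer and a fine-tuned LLM, and the stubs below only make the sketch runnable; they do not reflect any real model interface. Building a fresh prompt for each guide sentence plays the role of resetting the network.

```python
from typing import Callable, Dict, List, Sequence


def build_scheme(token_q: Sequence[int], guide_sentences: Sequence[str],
                 encode: Callable[[str], List[int]],
                 generate: Callable[[List[int], int], str],
                 max_new_tokens: int = 200) -> Dict[str, str]:
    """Generate one section of the emergency scheme per guide sentence."""
    scheme = {}
    for guide in guide_sentences:
        # TokenQ first, then the guide sentence; a new prompt each time.
        prompt = list(token_q) + encode(guide)
        scheme[guide] = generate(prompt, max_new_tokens)
    return scheme


# Stubs for illustration only (hypothetical, not a real tokenizer or LLM).
def stub_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def stub_generate(prompt: List[int], max_new_tokens: int) -> str:
    return f"<text generated from {len(prompt)} prompt tokens>"


guides = ["emergency response measures",
          "information release and coordination mechanism"]
scheme = build_scheme([7] * 128, guides, stub_encode, stub_generate)
```

Running the loop for several banknote records would give several versions of each section, from which the implementer can extract the important information.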
Based on the detection result of the microbial characteristics of each banknote, combined with the circulation feature descriptors M1 and M2, the TokenQ corresponding to each banknote can be obtained, and by using different guide sentences the implementer can obtain the text content of the required public health event emergency response scheme for different banknotes.
Therefore, an implementer can quickly obtain text contents of the public health event emergency response schemes of multiple versions according to some banknote records participating in detection, so that important information is quickly extracted from the contents of the multiple versions, and the establishment of the public health event emergency response schemes which are more scientific and effective is facilitated.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (8)

1. A public health management system for processing a financial system cash microorganism data trace source, comprising the following steps:
the banknote information acquisition module records banknote information of reflux behaviors;
the banknote reflux characteristic descriptor obtaining module is used for obtaining a long-time circulation characteristic descriptor and a short-time circulation characteristic descriptor according to the information of each banknote with reflux behaviors; obtaining a geographic circulation range index, a reflux stability index and an activity rating index according to the long-time circulation characteristic descriptors and the short-time circulation characteristic descriptors; selecting a plurality of paper currencies with activity rating indexes larger than a threshold value for swab test, and obtaining a microbial characteristic descriptor of each paper currencies according to the result of the swab test;
the adjusted public health event acquisition module acquires a public health event and structures the data of the public health event to obtain public health event data, the public health event data being composed of guide sentences and text contents; acquires the starting time and the ending time of the public health event; takes each return of a banknote containing microorganisms of the public health event as one record in the public health event, thereby acquiring a plurality of records in the public health event; obtains different clusters through clustering, each cluster being one circulation mode, the measurement distance used by the clustering being obtained according to the long-time circulation feature descriptors and short-time circulation feature descriptors of any two records; under each circulation mode, obtains a general microorganism feature descriptor of the circulation mode according to the microorganism feature descriptors of all records; screens abnormal records of each circulation mode according to the microorganism feature descriptor of each record and the general microorganism feature descriptor to obtain a main circulation mode of each public health event; and adjusts the public health event according to the main circulation mode to obtain an adjusted public health event;
the Token Q acquisition module acquires cash situation characteristics of each record according to the twin network by utilizing a long-time circulation characteristic descriptor, a short-time circulation characteristic descriptor and a microorganism characteristic descriptor of each record for a plurality of adjusted public health events, and constructs the cash situation characteristics as Token to obtain Token Q;
the module for acquiring the Dropout proportion of the cash situation features obtains the LOF value corresponding to each cash situation feature according to the cash situation features of the records, and obtains the Dropout proportion of the TokenQ according to the LOF value corresponding to the cash situation feature; and
the emergency response scheme making module trains the LLM based on the calculated proportion of the Dropout of the cash situation characteristics, inputs the TokenQ and a plurality of guide sentences into the LLM to obtain the generated content, obtains the public health event emergency response scheme according to the generated content,
the method for obtaining the geographic circulation range index, the reflux stability index and the activity rating index according to the long-time circulation characteristic descriptors and the short-time circulation characteristic descriptors comprises the following steps: taking the mean value of the position mean square errors of the long-time circulation feature descriptors and the short-time circulation feature descriptors as a geographic circulation range index, taking the ratio of the maximum value of the time interval central values of the long-time circulation feature descriptors and the short-time circulation feature descriptors to the time interval central value of the long-time circulation feature descriptors as a reflux stability index, and taking the product of the inverse of the reflux stability index and the geographic circulation range index of each paper money as an activity rating index.
2. The public health management system for processing financial system cash microorganism data traceability of claim 1, wherein the information of the banknote includes a date and a geographic location of each return of the banknote, wherein an ID of the banknote is associated with the date and the geographic location of each return of the banknote.
3. The public health management system for processing financial system cash microorganism data traceability according to claim 1, wherein the method for obtaining the long-time circulation feature descriptor and the short-time circulation feature descriptor according to each piece of paper money information with reflux behavior comprises the following steps:
setting a long-time threshold, counting the date and the geographic position of each banknote when the banknote reflows in the time corresponding to the long-time threshold, calculating the difference between two adjacent dates of each banknote to obtain a time interval, recording the average value of the calculated time intervals as a time interval central value, obtaining a position central value according to the geographic position of each reflow, wherein the position central value is represented by longitude and latitude, obtaining the time mean square error of each banknote according to the average value of the time interval and the time interval, obtaining the position mean square error according to the distance between the position central value and the geographic position, and the long-time circulation feature descriptor is composed of the position central value, the time interval central value, the time mean square error and the position mean square error;
Setting a short time threshold, and calculating the position center value, the time interval center value, the time mean square error and the position mean square error of each banknote in a short time in the time corresponding to the short time threshold, wherein the short time circulation feature descriptor consists of the position center value, the time interval center value, the time mean square error and the position mean square error which are calculated in the short time.
4. The public health management system for processing financial system cash microorganism data traceability according to claim 1, wherein the method for obtaining the microorganism feature descriptor of each banknote according to the result of the swab test is as follows:
in a public health event, detecting the flora types on each banknote, marking each flora type affecting the public health event as 1 and marking the remaining flora types as 0, to obtain a binary code with a fixed dimension, wherein the binary code with the fixed dimension represents the microorganism feature descriptor of each banknote.
5. The public health management system for processing financial system cash microorganism data tracing of claim 1, wherein the method for obtaining the measured distance of any two records is as follows:
$$D(a,b) = 1 - \cos\left(\mathbf{v}_a, \mathbf{v}_b\right), \qquad \mathbf{v}_a = \left(TMSE_a^{M1},\ TMSE_a^{M2},\ LMSE_a^{M1},\ LMSE_a^{M2}\right), \qquad \mathbf{v}_b = \left(TMSE_b^{M1},\ TMSE_b^{M2},\ LMSE_b^{M1},\ LMSE_b^{M2}\right)$$
wherein a and b represent any two records; $TMSE_a^{M1}$ represents the time mean square error in the long-time circulation feature descriptor of record a; $TMSE_a^{M2}$ represents the time mean square error in the short-time circulation feature descriptor of record a; $LMSE_a^{M1}$ represents the position mean square error in the long-time circulation feature descriptor of record a; $LMSE_a^{M2}$ represents the position mean square error in the short-time circulation feature descriptor of record a; $TMSE_b^{M1}$, $TMSE_b^{M2}$, $LMSE_b^{M1}$ and $LMSE_b^{M2}$ represent the corresponding quantities of record b; $\cos(\cdot,\cdot)$ represents cosine similarity; and $D(a,b)$ represents the metric function between record a and record b.
6. The public health management system for processing financial system cash microorganism data traceability of claim 1, wherein the method for obtaining the general microorganism feature descriptors of each circulation mode according to all recorded microorganism feature descriptors comprises the following steps:
for each circulation mode, performing an OR operation on the microorganism feature descriptors of all records in the circulation mode, wherein the result at a position is 1 when a 1 exists at that position in either of two codes and 0 when no 1 exists, and obtaining the general microorganism feature descriptor of each circulation mode after the OR operation has been performed over all records.
7. The public health management system for processing financial system cash microorganism data traceability of claim 1, wherein the method for screening abnormal records for each circulation mode according to the microorganism feature descriptor of each record and the general microorganism feature descriptor is as follows:
calculating the Hamming distance between the general microorganism feature descriptor and the microorganism feature descriptor of each record, sorting the calculated Hamming distances in descending order, and rejecting records larger than the screening threshold value as abnormal records.
8. The public health management system for processing financial system cash microorganism data traceability of claim 1, wherein the method for adjusting public health events according to the main circulation mode to obtain adjusted public health events comprises the following steps:
calculating the average value of the average time interval values in the short-time circulation feature descriptors in the main circulation mode as the time interval of the typical circulation, and shifting the starting time of the public health event forward by the time length of the time interval of the typical circulation to obtain the adjusted public health event.
CN202310295082.6A 2023-03-24 2023-03-24 Public health management system for processing trace source of cash microorganism data of financial system Active CN115994846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310295082.6A CN115994846B (en) 2023-03-24 2023-03-24 Public health management system for processing trace source of cash microorganism data of financial system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310295082.6A CN115994846B (en) 2023-03-24 2023-03-24 Public health management system for processing trace source of cash microorganism data of financial system

Publications (2)

Publication Number Publication Date
CN115994846A (en) 2023-04-21
CN115994846B (en) 2023-05-19

Family

ID=85995443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310295082.6A Active CN115994846B (en) 2023-03-24 2023-03-24 Public health management system for processing trace source of cash microorganism data of financial system

Country Status (1)

Country Link
CN (1) CN115994846B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076649B (en) * 2023-10-13 2024-01-26 卓世科技(海南)有限公司 Emergency information query method and device based on large model thinking chain

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8881040B2 (en) * 2008-08-28 2014-11-04 Georgetown University System and method for detecting, collecting, analyzing, and communicating event-related information
CN109727681A (en) * 2017-10-30 2019-05-07 戚本昊 Disease epidemic situation prevention monitoring survey system
US11194691B2 (en) * 2019-05-31 2021-12-07 Gurucul Solutions, Llc Anomaly detection using deep learning models
WO2022088066A1 (en) * 2020-10-30 2022-05-05 柳长庆 Public health security financial early-warning system based on cash microorganism tracing and analysis
CN112185584A (en) * 2020-10-30 2021-01-05 柳长庆 Public health safety finance early warning system based on cash microorganism traceability analysis
CN114091443B (en) * 2021-10-28 2022-07-19 广州大学 Network information propagation index system construction and evaluation method based on deep learning

Also Published As

Publication number Publication date
CN115994846A (en) 2023-04-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant