CN113704399A

CN113704399A - Intelligent identification and storage method, system and storage medium for big data financial information

Info

Publication number: CN113704399A
Application number: CN202110904059.3A
Authority: CN
Inventors: Not disclosed (the inventor requested non-publication of the name)
Original assignee: Jiang Zhenghao
Current assignee: Jiang Zhenghao
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2021-11-26

Abstract

The invention discloses a method and a system for intelligently identifying and storing big data financial information. A chat log between a big data user and an opposite user is obtained; keywords of each chat record are extracted through a three-layer Bayesian probability model; the importance index of the chat record is identified according to the keywords of the chat record and the keywords of the chat records preceding and following it; and the chat record is stored in the block corresponding to the importance index. Important events are thereby stored by partition, with events of the same importance stored in one block, so that a user can conveniently search and manage chat records, which improves the effectiveness and convenience of storing and managing user information.

Description

Intelligent identification and storage method, system and storage medium for big data financial information

Technical Field

The invention relates to the technical field of information security, in particular to a big data financial information intelligent identification and storage method and system.

Background

Today, chatting over the network is a common way to make friends and to work, and in the big data era, particularly in fields such as finance and education, a great deal of person-to-person communication is required. At present, people mainly rely on the chat interfaces of various chat software and websites. Chat records are either not stored at all or are all stored locally on the device, and when the local storage space of the device is full, chat records must be cleared to release space for new chats.

In life and at work, chat records contain important information, such as a statement about an event, a promise about an event, an agreed amount for an event, or an agreement on an event. If the user forgets to record this information separately and the chat records containing it are deleted, the user loses the basis for reconstructing the event when it later needs to be reviewed, which causes great trouble. However, saving all chat records occupies a large amount of storage space. Moreover, the user may be unable to judge at the time whether an event will need to be reviewed later, and may not realize that the information is important, so the user does not record it in time and only regrets this when the information is needed but has already been deleted.

For this reason, a method of storing a chat log capable of automatically recognizing important information of a user and efficiently storing the important information is desired.

Disclosure of Invention

The invention aims to provide a method, a system and a storage medium for intelligently identifying and storing big data financial information, which are used for solving the existing problems.

In a first aspect, an embodiment of the present invention provides a big data financial information intelligent identification and storage method, where the method includes:

obtaining a chat log between a big data user and an opposite user; the chat log comprises a plurality of chat records, and each chat record is information of each speaking of the user or the opposite user in a dialog box; each chat record comprises a speaking object, speaking time and speaking content;

extracting keywords of the chat records through a three-layer Bayesian probability model; the keywords can represent behavioral intents of the chat records;

identifying the importance index of the chat record according to the keywords of the chat record and the keywords of the chat records preceding and following it; the keywords of the preceding and following chat records comprise keywords of a preceding speech segment formed by the first N chat records before the chat record and keywords of a following speech segment formed by the last M chat records after the chat record; N and M are each 0 or a positive integer; the importance index represents the predicted degree of subsequent impact on the user of the event stated in the chat record;

and storing the chat records in a block corresponding to the importance index.

Optionally, before storing the chat records in the block corresponding to the importance index, the method further includes:

partitioning the storage space of the chat records to obtain a plurality of blocks, and determining the importance index of each block.

Optionally, the method further includes: obtaining keywords of the preceding speech segment formed by the first N chat records before the chat record, and obtaining keywords of the following speech segment formed by the last M chat records after the chat record.

Optionally, obtaining the keywords of the preceding speech segment formed by the first N chat records before the chat record includes:

obtaining the speaking contents of the first N chat records before the chat record, and connecting the speaking contents end to end in order of speaking time to form the preceding speech segment;

extracting keywords of the preceding speech segment based on the three-layer Bayesian probability model;

and obtaining the keywords of the following speech segment formed by the last M chat records after the chat record includes:

obtaining the speaking contents of the last M chat records after the chat record, and connecting the speaking contents end to end in order of speaking time to form the following speech segment;

and extracting keywords of the following speech segment based on the three-layer Bayesian probability model.
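The construction of the two segments described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `context_segments`, the list-of-strings input, and the choice to join contents with a single space are all assumptions made for the example.

```python
def context_segments(contents, index, n, m):
    """Split the context of contents[index] into two text segments.

    contents: speaking contents already sorted by speaking time.
    n, m: how many records before / after the target to use (0 or more).
    """
    if n < 0 or m < 0:
        raise ValueError("n and m must be 0 or positive integers")
    if not 0 <= index < len(contents):
        raise IndexError("index out of range")
    before = contents[max(0, index - n):index]
    after = contents[index + 1:index + 1 + m]
    # "End to end" is read here as joining with a single space (assumption).
    return " ".join(before), " ".join(after)


# Hypothetical placeholder messages m1..m8; the target record is m5.
messages = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
preceding, following = context_segments(messages, index=4, n=4, m=3)
print(preceding)   # m1 m2 m3 m4
print(following)   # m6 m7 m8
```

When fewer than N records precede the target (or fewer than M follow it), the slices simply return what is available, and N = 0 or M = 0 yields an empty segment.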

Optionally, the determining the importance index of the chat record according to the keyword of the chat record and the keywords of the preceding and following chat records of the chat record includes:

connecting the keywords of the chat records and the keywords of the front and back chat records of the chat records end to form a keyword entry;

forming a key phrase by the key words of the chat records and the key words of the front and back chat records of the chat records; the keyword group comprises a plurality of keywords;

obtaining a first correlation index between each keyword in the keyword group and a plurality of standard keywords in a big database;

obtaining a second correlation index between the keyword entry and a plurality of standard keywords in a big database;

the size of the first correlation index characterizes the size of similarity of the keyword to the standard keyword;

and taking the sum of the second correlation index and the first correlation index as the importance index of the chat records.

In a second aspect, an embodiment of the present invention provides a big data financial information intelligent identification and storage system, where the system includes:

the obtaining module is used for obtaining a chat log between a big data user and an opposite user; the chat log comprises a plurality of chat records, and each chat record is information of each speaking of the user or the opposite user in a dialog box; each chat record comprises a speaking object, speaking time and speaking content;

the extraction module is used for extracting the keywords of the chat records through a three-layer Bayesian probability model; the keywords can represent behavioral intents of the chat records;

the storage module is used for identifying the importance index of the chat record according to the keywords of the chat record and the keywords of the chat records preceding and following it; the keywords of the preceding and following chat records comprise keywords of a preceding speech segment formed by the first N chat records before the chat record and keywords of a following speech segment formed by the last M chat records after the chat record; N and M are each 0 or a positive integer; the importance index represents the predicted degree of subsequent impact on the user of the event stated in the chat record; and for storing the chat record in the block corresponding to the importance index.

In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the methods described above and stores the chat log.

Compared with the prior art, the invention has the following beneficial effects:

Embodiments of the invention provide a method and a system for intelligently identifying and storing big data financial information. A chat log between a big data user and an opposite user is obtained; the chat log comprises a plurality of chat records, each chat record is the information of one utterance of the user or the opposite user in a dialog box, and each chat record comprises a speaking object, a speaking time and a speaking content. Keywords of the chat record are extracted through a three-layer Bayesian probability model; the keywords can represent the behavioral intent of the chat record. The importance index of the chat record is identified according to the keywords of the chat record and the keywords of the chat records preceding and following it; the keywords of the preceding and following chat records comprise keywords of a preceding speech segment formed by the first N chat records before the chat record and keywords of a following speech segment formed by the last M chat records after the chat record; N and M are each 0 or a positive integer; the importance index represents the predicted degree of subsequent impact on the user of the event stated in the chat record. The chat record is stored in the block corresponding to the importance index. Because the importance index is identified from the keywords of the chat record and of its preceding and following chat records, and the chat record is stored in the block corresponding to that index, important events are stored by partition and events of the same importance are stored in one block, so that the user can conveniently search and manage chat records, which improves the effectiveness and convenience of storing and managing user information.

Drawings

Fig. 1 is a flowchart of a big data financial information intelligent identification and storage method according to an embodiment of the present invention.

Fig. 2 is a schematic block structure diagram of an electronic device according to an embodiment of the present invention.

The labels in the figure are: a bus 500; a receiver 501; a processor 502; a transmitter 503; a memory 504; a bus interface 505.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings.

Examples

The embodiment of the invention provides a big data financial information intelligent identification and storage method, as shown in figure 1, the method comprises the following steps:

S101: obtaining a chat log between the big data user and the opposite user.

The chat log comprises a plurality of chat records, and each chat record is the information of one utterance of the user or the opposite user in a dialog box. Each chat record includes the speaking object, the speaking time and the speaking content. The big data user may be an internet user such as a financial user or an online-education user.
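The record structure just described could be modeled as below. This is only a sketch: the class name `ChatRecord`, the field names `speaker`, `sent_at` and `content`, and the helper `build_chat_log` are hypothetical names chosen for illustration, and the sample messages are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChatRecord:
    """One utterance in the dialog box (field names are illustrative)."""
    speaker: str        # the speaking object: the user or the opposite user
    sent_at: datetime   # the speaking time
    content: str        # the speaking content


def build_chat_log(records):
    """Return the records ordered by speaking time, oldest first."""
    return sorted(records, key=lambda r: r.sent_at)


log = build_chat_log([
    ChatRecord("User B", datetime(2021, 8, 5, 12, 1), "second message"),
    ChatRecord("User A", datetime(2021, 8, 5, 12, 0), "first message"),
])
print([r.content for r in log])
```

Keeping the log sorted by speaking time makes it straightforward to pick the records sent before and after any given record in later steps.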

S102: extracting the keywords of the chat record through a three-layer Bayesian probability model.

The keywords can characterize the behavioral intent of the chat record. For example, if a chat record is "Zhang San: selling three thousand today. 2021-04-05 16:01", the keyword is "selling three thousand".

S103: identifying the importance index of the chat record according to the keywords of the chat record and the keywords of the chat records preceding and following it.

The keywords of the preceding and following chat records comprise keywords of a preceding speech segment formed by the first N chat records before the chat record and keywords of a following speech segment formed by the last M chat records after the chat record; N and M are each 0 or a positive integer; the importance index represents the expected degree of later impact on the user of the event stated in the chat record. The first N chat records are the N chat records whose sending time is before that of the chat record, and the last M chat records are the M chat records whose sending time is after that of the chat record.

S104: storing the chat record in the block corresponding to the importance index.

With this scheme, the importance index of a chat record is identified according to the keywords of the chat record and the keywords of the chat records preceding and following it, and the chat record is stored in the block corresponding to the importance index. Important events are thus intelligently identified and stored by partition, with events of the same importance stored in one block, so that a user can conveniently search and manage chat records, which improves the effectiveness and convenience of storing and managing user information.

Thus, when a user needs to find a piece of information, the user can look for that information (chat record) directly in the block with the corresponding importance index. When the user needs to delete unimportant information, the data (chat records) in the block with a low importance index can be deleted directly. In addition, the method further comprises periodically updating the chat records (information) in the storage blocks, that is, updating the storage locations of the chat records: chat records (information) that are no longer important are moved to blocks with a low importance index, and chat records (information) that have become important are moved to blocks with a high importance index. Specifically, updating the storage location of a chat record includes: obtaining the influence degree of the user's historical chat record in the storage block and the importance index of the historical chat record; and updating the storage location of the historical chat record based on the influence degree and the importance index. It should be noted that after a chat record is stored in a block, it becomes a historical chat record of the user.

Wherein the influence level is indicative of a degree of influence of an event stated in the historical chat log on the user at the present time; the importance index represents the degree of influence of the events stated by the historical chat records on the user in the future, which is expected before the historical chat records are stored in the storage block; the chat records are the information of each speaking of the user or the opposite user in the dialog box; each chat record includes the subject of the utterance, the time of the utterance, and the content of the utterance.

Obtaining the influence degree of the user's historical chat record in the storage block (the block described above) includes:

determining a time period from the moment of storing the historical chat records in the storage block to the current moment as an investigation time period; alternatively, the investigation period may be set to a time length of one week, half month, one quarter, half year, one year, or the like.

And obtaining a chat log of the user and the opposite user in the investigation time period.

The chat log comprises a plurality of chat records, and each chat record is information of each speaking of the user or the opposite user in a dialog box.

And obtaining the key words of each chat record in the chat log and the key words of the historical chat records. Wherein the keywords can characterize behavioral intent of the chat log.

And obtaining the association degree between the keywords of the chat records and the keywords of the historical chat records.

The association degree represents how closely the events mentioned in the historical chat record relate to the events mentioned in the chat record; there is one association degree between each chat record and the historical chat record, so a plurality of chat records correspond to a plurality of association degrees. In the embodiment of the invention, the cosine of the angle between the word vector formed by the keywords of the chat record and the word vector formed by the keywords of the historical chat record is taken as the association degree.

Determining the chat records with the association degree larger than a preset value as associated chat records; optionally, the preset value may be 0.5.

Taking the number of associated chat records as the influence degree of the historical chat record. For example, if there are 5 associated chat records, the influence degree of the historical chat record is 5.
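The counting procedure above can be sketched in a few lines. The text does not say how keywords are turned into word vectors, so this example assumes simple bag-of-keywords count vectors as a stand-in; the function names `cosine` and `influence_degree` and the sample keywords are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def influence_degree(historical_keywords, log_keywords, preset=0.5):
    """Count chat records whose association degree exceeds the preset value."""
    hist_vec = Counter(historical_keywords)
    degrees = [cosine(hist_vec, Counter(kws)) for kws in log_keywords]
    return sum(1 for d in degrees if d > preset)


historical = ["sell", "book", "10 yuan"]
recent = [
    ["sell", "book"],             # cosine = 2 / sqrt(6), about 0.816
    ["weather"],                  # cosine = 0
    ["book", "10 yuan", "pay"],   # cosine = 2 / 3, about 0.667
]
print(influence_degree(historical, recent))  # 2
```

Two of the three recent records exceed the preset value of 0.5, so the influence degree of the historical record in this toy example is 2.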

Optionally, the updating the storage location of the historical chat record based on the influence degree and the importance index includes:

predicting, based on the degree of influence and the chat log, a subsequent degree of influence of the event stated by the historical chat log after the current time.

Wherein the subsequent influence level is indicative of a degree of influence of an event stated in the historical chat log on the user after a current time.

Updating the storage position of the historical chat record based on the subsequent influence degree and the importance index, and specifically comprising:

obtaining the number of the chat records in the chat log;

taking the quotient of the influence degree and the number of the chat records in the chat log as an influence factor;

and taking the product of the influence factor and the influence degree as the predicted subsequent influence degree.

Optionally, the updating the storage location of the historical chat record based on the subsequent influence degree and the importance index includes:

taking the quotient of the subsequent influence degree and the number of the chat records in the chat log as an adjusting factor; for example, the subsequent influence degree is h, the number of chat records in the chat log is n, and the adjustment factor r = h/n.

Taking the product of the adjustment factor and the importance index as a new importance index of the historical chat record;

and storing the historical chat record in the storage block corresponding to the new importance index. For example, if the original importance index is 1 and the new importance index is 2, the historical chat record is moved from the storage block with importance index 1 to the storage block with importance index 2. Intelligent, automatic and effective management of user information is thereby realized, and system performance is improved at the same time.
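The arithmetic of the update can be traced with a short sketch. It follows the quotients and products literally as stated in the text; the function names and the sample numbers (influence degree 5, 10 records, importance index 2) are assumptions for illustration only.

```python
def subsequent_influence(influence, n_records):
    """Predicted subsequent influence degree: (influence / n) * influence."""
    if n_records <= 0:
        raise ValueError("the chat log must contain at least one record")
    influence_factor = influence / n_records
    return influence_factor * influence


def new_importance_index(importance, influence, n_records):
    """New importance index: r * importance, with r = h / n."""
    h = subsequent_influence(influence, n_records)
    r = h / n_records  # adjustment factor
    return r * importance


h = subsequent_influence(influence=5, n_records=10)
updated = new_importance_index(importance=2, influence=5, n_records=10)
print(h, updated)  # 2.5 0.5
```

Here the influence factor is 5/10 = 0.5, the subsequent influence degree is h = 0.5 × 5 = 2.5, the adjustment factor is r = 2.5/10 = 0.25, and the new importance index is 0.25 × 2 = 0.5. Under this literal reading, the influence degree counts records within the same log and so cannot exceed n, which keeps r at or below 1.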

Optionally, before storing the chat record in the block corresponding to the importance index, the method further includes: partitioning the storage space for chat records to obtain a plurality of blocks, and determining the importance index of each block. In the embodiment of the present invention, the storage space may be divided into 3 blocks whose importance indexes are 1, 2 and 3, respectively. The larger the importance index, the more important the chat records (information) stored in the block.
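A partitioned store of this kind might look like the sketch below. The text does not specify how a computed importance score is matched to a block index, so the rule used here (the highest block index not exceeding the score, with a fallback to the lowest block) is purely an assumption, as are the class name `BlockStore` and its method names.

```python
class BlockStore:
    """Storage space split into blocks, one per importance level."""

    def __init__(self, levels=(1, 2, 3)):
        self.blocks = {level: [] for level in levels}

    def level_for(self, importance):
        # Assumption: pick the highest block level not above the score,
        # falling back to the lowest block for very small scores.
        levels = sorted(self.blocks)
        eligible = [lv for lv in levels if lv <= importance]
        return eligible[-1] if eligible else levels[0]

    def store(self, record, importance):
        level = self.level_for(importance)
        self.blocks[level].append(record)
        return level

    def move(self, record, new_importance):
        """Relocate a stored record after its importance index is updated."""
        for records in self.blocks.values():
            if record in records:
                records.remove(record)
                break
        return self.store(record, new_importance)

    def purge_lowest(self):
        """Delete everything in the block with the lowest importance index."""
        lowest = min(self.blocks)
        removed, self.blocks[lowest] = self.blocks[lowest], []
        return removed


store = BlockStore()
level_a = store.store("record A", 2.7)   # lands in block 2
level_b = store.store("record B", 0.4)   # lands in block 1
moved_to = store.move("record A", 3.2)   # relocated to block 3
purged = store.purge_lowest()            # removes "record B"
```

Grouping records by block makes both operations described in the text cheap: searching only the high-importance block, and clearing the low-importance block to free space.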

Optionally, before S103, the method further includes: obtaining keywords of the preceding speech segment formed by the first N chat records before the chat record, and obtaining keywords of the following speech segment formed by the last M chat records after the chat record. Optionally, both sets of keywords may be obtained through the three-layer Bayesian probability model (LDA).

For example, the chat log is:

"User A: How much will you sell me this book for? 2021-08-05 12:00

User B (opposite user): 25 yuan, not a cent less. 2021-08-05 12:01

User A: A bit less. 2021-08-05 12:02

User B: Then how much do you think is suitable? 2021-08-05 12:03

User A: Will 5 yuan do? 2021-08-05 12:04

User B: Forget it, that won't do; add a bit. 2021-08-05 12:05

User A: Then 10 yuan, or no deal. 2021-08-05 12:06

User B: Not selling for 10 yuan."

The chat log thus includes 8 chat records. Assuming N = 4 and M = 3, the first N chat records before the chat record "User A: Will 5 yuan do? 2021-08-05 12:04" are:

"User A: How much will you sell me this book for? 2021-08-05 12:00

User B: 25 yuan, not a cent less. 2021-08-05 12:01

User A: A bit less. 2021-08-05 12:02

User B: Then how much do you think is suitable? 2021-08-05 12:03".

The preceding speech segment is "How much will you sell me this book for? 25 yuan, not a cent less. A bit less. Then how much do you think is suitable?".

The last M chat records after the chat record "User A: Will 5 yuan do? 2021-08-05 12:04" are:

"User B: Forget it, that won't do; add a bit. 2021-08-05 12:05

User A: Then 10 yuan, or no deal. 2021-08-05 12:06

User B: Not selling for 10 yuan.".

The following speech segment is "Will 5 yuan do? Forget it, that won't do; add a bit. Then 10 yuan, or no deal. Not selling for 10 yuan.".

The keywords of the chat record and the keywords of the preceding and following chat records are connected end to end to form a keyword entry. For example, the keywords of the chat record include "5 yuan" and "will it do", and the keywords of the preceding and following chat records include "book" and "how much". Connected end to end in order of the speaking time of the keywords in the chat records, the resulting keyword entry is "book how much 5 yuan will it do".

The keywords of the chat record and the keywords of the preceding and following chat records also form a keyword group, which comprises a plurality of keywords. In the above example, the keyword group includes the keywords "5 yuan", "will it do", "book" and "how much".

A first correlation index is obtained between each keyword in the keyword group and a plurality of standard keywords in a big database. The standard keywords are either stored in the big database in advance by the user according to personal speech habits, or are obtained by training with big data techniques. Each keyword yields one first correlation index, so a plurality of keywords correspond to a plurality of first correlation indexes. The size of the first correlation index characterizes how similar the keyword is to the standard keywords. Each standard keyword in the big database corresponds to an importance index, and the standard keywords are keywords that the user has confirmed in advance for storage in the big database.

Obtaining the first correlation index between each keyword in the keyword group and the plurality of standard keywords in the big database comprises:

converting the keywords into keyword vectors and the standard keywords into standard keyword vectors;

obtaining the cosine of the angle between each keyword vector and each standard keyword vector, so that each keyword vector corresponds to a plurality of standard keyword vectors and hence to a plurality of cosine values; for each keyword vector, obtaining the mean and the variance of its cosine values; and if the variance is greater than a set value, obtaining the maximum of the cosine values and taking the quotient of the maximum and the mean, plus the variance, as the first correlation index. For example, if the first correlation index is d1, the maximum cosine value is max, the cosine mean is p and the cosine variance is t, then d1 = (max / p) + t.

Optionally, the set value is 0.5.
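The formula d1 = (max / p) + t can be checked with a small sketch. Several points are assumptions here: the function name `first_correlation_index`, the use of the population variance, and returning `None` when the variance does not exceed the set value, since the text does not define the index for that case. The input cosine values are made-up numbers.

```python
from statistics import fmean, pvariance


def first_correlation_index(cosines, set_value=0.5):
    """Return d1 = (max / p) + t when the variance t exceeds set_value.

    cosines: cosine values between one keyword vector and each standard
    keyword vector. Returns None when the variance condition is not met,
    because the text does not define the index for that case.
    """
    if not cosines:
        raise ValueError("at least one cosine value is required")
    p = fmean(cosines)      # cosine value mean
    t = pvariance(cosines)  # cosine value variance (population, assumed)
    if t <= set_value or p == 0:
        return None
    return max(cosines) / p + t


# Made-up cosine values: mean p = 1/3, variance t = 8/9, maximum = 1.
d1 = first_correlation_index([-1.0, 1.0, 1.0])
print(d1)

# Tightly clustered values have a small variance, so no index is produced.
print(first_correlation_index([0.2, 0.3, 0.25]))
```

In the first call max / p = 3 and t = 8/9, giving d1 of roughly 3.89. The second call prints `None` because the variance is far below 0.5.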

If there is only one keyword entry and there are a plurality of standard keywords, a plurality of second correlation indexes are obtained correspondingly. The size of the second correlation index characterizes how similar the keyword entry is to the standard keywords. Obtaining the second correlation index between the keyword entry and the plurality of standard keywords in the big database specifically comprises:

converting the keyword entry into an entry vector, and taking the mean of the cosine values between the entry vector and the standard keyword vectors as the second correlation index.
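The mean-of-cosines computation can be sketched as follows. The vectorization is an assumption: token-count vectors are used as a simple stand-in because the text does not specify how the entry and the standard keywords are embedded. The names `cosine` and `second_correlation_index` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def second_correlation_index(entry_tokens, standard_keywords):
    """Mean cosine between the entry vector and each standard keyword vector."""
    if not standard_keywords:
        raise ValueError("at least one standard keyword is required")
    entry_vec = Counter(entry_tokens)
    values = [cosine(entry_vec, Counter(tokens)) for tokens in standard_keywords]
    return sum(values) / len(values)


entry = ["book", "how much", "5 yuan"]
standards = [
    ["book"],            # cosine = 1 / sqrt(3)
    ["5 yuan", "book"],  # cosine = 2 / sqrt(6)
    ["loan"],            # cosine = 0
]
d2 = second_correlation_index(entry, standards)
print(round(d2, 4))
```

With these toy inputs the three cosine values average to roughly 0.46. A real system would likely replace the count vectors with trained word embeddings.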

The embodiment of the application also correspondingly provides an executing main body for executing the steps, and the executing main body can be an intelligent big data financial information identifying and storing system. Big data financial information intelligent identification and storage system includes:

With regard to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention further provides an electronic device, as shown in fig. 2, which includes a memory 504, a processor 502 and a computer program stored in the memory 504 and executable on the processor 502, wherein the processor 502 implements the steps of any one of the above-mentioned intelligent identification and storage methods for big data financial information when executing the program.

Where in fig. 2 a bus architecture (represented by bus 500) is shown, bus 500 may include any number of interconnected buses and bridges, and bus 500 links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface 505 provides an interface between the bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.

In the embodiment of the invention, the big data financial information intelligent identification and storage system is installed in the robot, and particularly can be stored in a memory in the form of a software functional module and can be processed and run by a processor.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned big data financial information intelligent identification and storage methods, and the above-mentioned historical chat records and chat records.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an apparatus according to an embodiment of the invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims

1. A big data financial information intelligent identification and storage method is characterized by comprising the following steps:

and storing the chat records in a block corresponding to the importance index.

2. The method of claim 1, wherein prior to said storing said chat record in a block corresponding to said importance index, said method further comprises:

3. The method of claim 1, further comprising: obtaining keywords of a preceding speech segment formed by the first N chat records before the chat record, and obtaining keywords of a following speech segment formed by the last M chat records after the chat record.

4. The method of claim 1, wherein obtaining keywords of a prefix segment formed by the first N chat logs of the chat logs comprises:

5. The method of claim 1, wherein determining the importance index of the chat log based on the keywords of the chat log and the keywords of the chat logs before and after the chat log comprises:

6. An intelligent big data financial information identification and storage system, the system comprising:

7. The system of claim 6, further comprising:

and the partitioning module is used for partitioning the storage space of the chat record to obtain a plurality of blocks and determining the importance index of each block.

8. The system of claim 6, further comprising:

and the keyword extraction module is used for obtaining keywords of a preceding speech segment formed by the first N chat records before the chat record and obtaining keywords of a following speech segment formed by the last M chat records after the chat record.

9. The system of claim 6, wherein determining the importance index of the chat log based on the keywords of the chat log and the keywords of the chat logs before and after the chat log comprises:

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5 and stores the chat log.