CN116610265B - Data storage method of business information consultation system - Google Patents

Info

Publication number
CN116610265B
Authority
CN
China
Prior art keywords
character
characters
data
consultation
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310861297.XA
Other languages
Chinese (zh)
Other versions
CN116610265A (en
Inventor
易万泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pupu Digital Industry Development Co ltd
Jinan Jiutong Zhiheng Information Technology Co ltd
Original Assignee
Shenzhen Pupu Digital Industry Development Co ltd
Jinan Jiutong Zhiheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pupu Digital Industry Development Co ltd, Jinan Jiutong Zhiheng Information Technology Co ltd
Priority to CN202310861297.XA
Publication of CN116610265A
Application granted
Publication of CN116610265B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of electronic digital data processing, and in particular to a data storage method of a business information consultation system. The method comprises: obtaining a first probability of each character according to its frequency in the consultation dialogue data; obtaining an influence factor from the relationship between the frequencies of the character combinations that the characters form; obtaining a second probability of each character using the influence factor; obtaining the leading probability of each character from the first probability and the second probability; and thereby determining the character list used by the MTF (move-to-front) algorithm, which improves coding efficiency. By taking the frequency of character combinations into account, the invention reduces the index values of high-frequency characters, avoids splitting a longer character combination into several shorter ones, and improves the coding compression effect of the MTF algorithm on consultation dialogue data, thereby further reducing the storage space occupied by the compressed data.

Description

Data storage method of business information consultation system
Technical Field
The invention relates to the technical field of electronic digital data processing, in particular to a data storage method of a business information consultation system.
Background
With the continuous development and popularization of internet technology, the application range of business information consultation systems keeps expanding, and such systems have gradually evolved from a single query function into comprehensive tools covering fields such as enterprise management, market research and industry analysis. A consultation involves extensive communication with clients, for example through online chat. After a consultation is completed, the consultation data usually has to be archived, and a large volume of chat dialogue data must first be compressed.
Compressing consultation dialogue data usually requires encoding it, and because the data contains many repeated characters, the MTF (move-to-front) algorithm is commonly used for coding compression. However, the coding result depends on the character list used by the MTF algorithm. In the existing MTF algorithm the order of the character list is arbitrary, so repeated characters, and the character combinations they form, can receive large index values, and the coding compression effect is not ideal.
The invention provides a data storage method of a business information consultation system that optimizes how the character list is obtained in the coding process of the MTF algorithm and thereby improves the coding compression effect on consultation dialogue data.
Disclosure of Invention
The invention provides a data storage method of a business information consultation system, which aims to solve the existing problems.
The data storage method of the business information consultation system adopts the following technical scheme:
the invention provides a data storage method of a business information consultation system, which comprises the following steps:
acquiring consultation dialogue data;
obtaining a first probability of each character according to the frequency of the character in the consultation dialogue data; acquiring a plurality of character strings and their frequencies in the consultation dialogue data, and obtaining a merging possibility according to the frequencies of any two character strings; merging the character strings according to the merging possibility to obtain a plurality of character combinations;
recording a character string formed by any two characters in any character combination as a character combination as well, and obtaining an influence factor according to the frequencies of the characters in any character combination and the number of character combinations formed by any two characters; recording the number of characters contained in a character combination as the length of the character combination; obtaining a second probability of each character according to the frequency and length of the character combinations and the influence factor;
obtaining the leading probability of each character in the consultation dialogue data according to the first probability and the second probability, and sorting the characters by leading probability to obtain a character list; and encoding and compressing the consultation dialogue data using the character list, thereby realizing intelligent compressed storage of the business information consultation data.
Further, the first probability of the character is obtained according to the frequency of the character in the consultation dialogue data, and the specific steps are as follows:
acquiring the frequency of each character in the consultation dialogue data and the number of the characters contained in the consultation dialogue data;
the ratio between the frequency of the characters and the number of characters contained in the consultation dialogue data is recorded as a first probability of the corresponding characters.
Further, the method for obtaining the merging possibility according to the frequency of any two character strings comprises the following specific steps:
firstly, obtaining a plurality of character strings from consultation dialogue data by using a backtracking method;
then, the corresponding frequency of any two character strings in the consultation dialogue data and the corresponding frequency of a new character string in the consultation dialogue data after the two character strings are combined are obtained;
and finally, the ratio of the frequency of the merged new character string in the consultation dialogue data to the sum of the frequencies of the two character strings in the consultation dialogue data is recorded as the merging possibility of the two character strings.
Further, the step of merging the character strings according to the size of the merging possibility to obtain a plurality of character combinations includes the following specific steps:
presetting a merging possibility threshold according to experience, and merging two character strings to obtain a new character string when the merging possibility of any two character strings is larger than the merging possibility threshold; otherwise, the two character strings are not combined, and the obtained combined new character string and the obtained non-combined character string are recorded as character combinations to obtain a plurality of character combinations.
Further, the method for obtaining the influence factor according to the frequency of the characters in any character combination and the number of the character combinations formed by any two characters comprises the following specific steps:
the influence factor of the $i$-th character in any character combination is obtained as follows:
[formula image missing from the source text; the symbols below are restored placeholders]
wherein $Y_i$ represents the influence factor of the $i$-th character in the character combination; $n_{i,j}$ represents, among the character combinations containing the $i$-th character, the number of character combinations in the consultation dialogue data that contain both the $i$-th character and the $j$-th character other than the $i$-th character; $n_i$ represents the number of character combinations in the consultation dialogue data that contain the $i$-th character; $n_j$ represents the number of character combinations in the consultation dialogue data that contain the $j$-th character other than the $i$-th character; $c_{i,j}$ represents the frequency, in the consultation dialogue data, of the character combination formed by the $i$-th character and the $j$-th character; $f_j$ represents the frequency, in the consultation dialogue data, of the $j$-th character other than the $i$-th character in the character combination containing the $i$-th character; $L$ represents the number of characters contained in the character combination containing the $i$-th character.
Further, the obtaining the second probability of the character according to the frequency, the length and the influence factor of the character combination includes the following specific steps:
the second probability of the characters in the consultation dialogue data is obtained by the following specific method:
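[formula image missing from the source text; the symbols below are restored placeholders]
wherein $P^{(2)}_i$ represents the second probability of the $i$-th character in the consultation dialogue data; $M_i$ represents the number of character combinations in the consultation dialogue data that contain the $i$-th character; $F_{i,k}$ represents the frequency, in the consultation dialogue data, of the $k$-th character combination containing the $i$-th character; $L_{i,k}$ represents the length of the $k$-th character combination containing the $i$-th character; $Y_{i,k}$ represents the influence factor of the $i$-th character in the $k$-th character combination containing the $i$-th character.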
Further, obtaining the leading probability of the characters in the consultation dialogue data according to the first probability and the second probability, and sorting the characters by leading probability to obtain a character list, comprises the following specific steps:
firstly, recording a product result of a first probability and a second probability of any character in consultation dialogue data as a leading probability of a corresponding character to obtain leading probabilities of all characters in the consultation dialogue data;
and then, arranging the leading probabilities corresponding to all the characters from large to small, and determining the left-to-right sequence of the corresponding characters in the character list to obtain the character list.
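The two steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_char_list`, the tie-breaking rule and the example probability values are all assumptions made for the sketch, and the second probabilities are taken as given inputs.

```python
def build_char_list(first_prob, second_prob):
    """Order characters by leading probability (first * second), largest first.

    Both arguments are dicts keyed by character. Ties are broken by the
    character itself so the result is deterministic (an assumption of this
    sketch; the text does not specify tie handling).
    """
    leading = {ch: first_prob[ch] * second_prob[ch] for ch in first_prob}
    return sorted(leading, key=lambda ch: (-leading[ch], ch))


# Hypothetical example values, chosen only to show the reordering.
first_prob = {"a": 0.5, "b": 0.3, "c": 0.2}
second_prob = {"a": 0.2, "b": 0.6, "c": 0.2}
char_list = build_char_list(first_prob, second_prob)
print(char_list)  # ['b', 'a', 'c']
```

In this example the character `b` overtakes `a` even though `a` is more frequent on its own, because its second probability is higher.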
Furthermore, the method for implementing intelligent compression storage of business information consultation data by utilizing the character list to code and compress the consultation dialogue data comprises the following specific steps:
according to the character list, the consultation dialogue data is encoded and compressed by using an MTF algorithm, the encoded and compressed data is recorded as compressed data, and the compressed data is stored by using a data management platform, so that the intelligent compressed storage of the business information consultation data is realized.
The technical scheme of the invention has the beneficial effects that:
(1) According to the frequency relationships between characters and character combinations in the consultation dialogue data, the index values of high-frequency characters are reduced, so the coded output values of the MTF algorithm are reduced and the coding compression effect is improved. In addition, adjacent character strings in the consultation dialogue data are merged in sequence to obtain a plurality of character combinations, which prevents a longer character string from being divided into several character strings, that is, it avoids the inconsistency between the frequencies of character combinations and of characters that repeated counting of characters would cause.
(2) The MTF coding efficiency for the consultation dialogue data is improved by using the frequency characteristics of characters and character combinations. At the same time, the mutual influence of the character frequencies within character combinations raises the likelihood that character combinations are placed toward the front of the character list, and the analysis of the influence factors takes into account the combination relationships that the characters form, so that the influence between characters is quantified accurately. This further reduces the index values of high-frequency characters in the character list, improves the coding compression efficiency of the MTF algorithm, yields compressed data with a high compression rate, and realizes intelligent compressed storage of business consultation information.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart showing the steps of a data storage method of a business information consulting system according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purposes, the following detailed description refers to the specific implementation, structure, features and effects of a data storage method of a business information consultation system according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of a data storage method of a business information consultation system provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a data storage method of a business information consultation system according to an embodiment of the present invention is shown, the method includes the following steps:
step S001, acquiring consultation dialogue data generated in the business consultation system.
In the business consultation system, the business consultation module is connected to the data management platform. After each business consultation is completed, the module transmits the generated consultation dialogue data to the data management platform, which encodes, compresses and stores the consultation dialogue data.
In addition, the number of characters contained in a character combination is recorded as the length of the character combination.
Step S002, obtaining a first probability according to the frequency of the characters in the consultation dialogue data, obtaining a plurality of character strings, obtaining a merging possibility according to the frequencies of the character strings, and merging the character strings according to the merging possibility to obtain a plurality of character combinations.
When consultation dialogue data is MTF-coded, the character frequencies and the corresponding index values determine the coding effect, and the index values depend on the initial character list. This embodiment therefore builds the initial character list from the character frequencies and from the frequency relationships between the different characters within character combinations, so as to compress the consultation dialogue data efficiently;
when consultation dialogue data is coded with the MTF algorithm, the further forward a character is in the list, the smaller the output index and the better the coding effect. To obtain a better coding effect, characters with higher frequency should be placed as far forward in the list as possible so that the overall index values are reduced; this embodiment therefore obtains the initial list from the character frequencies and the frequency relationships of character combinations, so that the data is compressed efficiently.
The regular character patterns of the consultation dialogue data in the consultation system, together with the consultation topic around which the consultation revolves, mean that the consultation dialogue data contains many repeated characters, so the MTF algorithm can be used to code and compress it;
it should be noted that the MTF algorithm is an existing data coding technique. During coding it moves each character that appears toward the front of the list, so that frequently occurring characters stay near the front of the list and the index values are greatly reduced, thereby achieving data compression.
The specific process of coding consultation dialogue data by using the MTF algorithm is as follows:
firstly, initializing a list formed by all characters in consultation dialogue data, and recording the list as a character list;
then, for each character in the consultation dialogue data, searching the position of each character in the list, and outputting an index of the corresponding position;
next, moving the element corresponding to the character to the very front of the list, so that the next occurrence of the same character is found further forward in the list and yields a smaller index value;
and finally, repeating the step of inquiring the characters until all the characters in the consultation dialogue data are processed.
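The four steps above can be sketched in a few lines. This is a minimal sketch of standard move-to-front coding, assuming the initial character list is passed in by the caller; the names `mtf_encode` and `mtf_decode` are illustrative, and the decoder is included only to show that the index sequence is reversible given the same initial list.

```python
def mtf_encode(text, char_list):
    """Move-to-front encode `text`, starting from `char_list`."""
    table = list(char_list)              # working copy; caller's list is untouched
    indices = []
    for ch in text:
        idx = table.index(ch)            # find the character's current position
        indices.append(idx)              # output that position as the code
        table.insert(0, table.pop(idx))  # move the character to the front
    return indices


def mtf_decode(indices, char_list):
    """Invert mtf_encode given the same initial character list."""
    table = list(char_list)
    out = []
    for idx in indices:
        ch = table.pop(idx)              # the index identifies the character
        out.append(ch)
        table.insert(0, ch)              # mirror the encoder's list update
    return "".join(out)


initial = sorted(set("banana"))          # ['a', 'b', 'n']
codes = mtf_encode("banana", initial)
print(codes)                             # [1, 1, 2, 1, 1, 1]
print(mtf_decode(codes, initial))        # banana
```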
During MTF coding, the main operation is updating the character list. The closer a character is to the front of the list, the smaller the output index value. When the input data contains many repeated symbols, the symbols near the front of the character list are used with higher probability, so the coded compressed data is shorter; conversely, if the symbols at the front of the initial list are rarely used, the coded compressed data is longer. The chosen character list therefore has a great influence on the compression effect.
During character coding, a character nearer the front of the list has a smaller index value. For data with many repeated characters, the index values of those characters account for a large share of the coding result, so they strongly affect the compression rate of the consultation dialogue data. In actual coding, to improve the data compression rate, characters with higher frequency in the consultation dialogue data should therefore be placed nearer the front of the initial list.
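The effect of the initial list can be illustrated with a small experiment. The sketch below is an illustration under stated assumptions, not a measurement from the patent: it uses the sum of output indices as a rough proxy for coded size, and the helper name `mtf_index_sum` and the sample text are hypothetical.

```python
from collections import Counter


def mtf_index_sum(text, char_list):
    """Sum of move-to-front output indices for `text` from a given initial list."""
    table = list(char_list)
    total = 0
    for ch in text:
        idx = table.index(ch)
        total += idx
        table.insert(0, table.pop(idx))  # standard move-to-front update
    return total


text = "aaabaaac"
freq = Counter(text)                                   # a: 6, b: 1, c: 1
by_freq = sorted(freq, key=lambda c: (-freq[c], c))    # most frequent first
reverse = by_freq[::-1]                                # least frequent first

cost_freq = mtf_index_sum(text, by_freq)
cost_rev = mtf_index_sum(text, reverse)
print(by_freq, cost_freq)   # ['a', 'b', 'c'] 4
print(reverse, cost_rev)    # ['c', 'b', 'a'] 7
```

In this example the only index that differs between the two orderings is the one emitted for the first occurrence of each character, because the list is reordered after every symbol; the saving from a good initial list is therefore concentrated at the start of the data.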
Analyze the consultation dialogue data and, according to the frequency with which each character occurs in it, obtain the probability that the character is placed toward the front of the character list; this is recorded as the first probability and is obtained as follows:
$$P^{(1)}_i = \frac{f_i}{N}$$
wherein $P^{(1)}_i$ represents the first probability of the $i$-th character in the consultation dialogue data, $f_i$ represents the frequency of the $i$-th character, and $N$ represents the total number of characters in the consultation dialogue data;
the first probability reflects the proportion of the $i$-th character in the data; the larger its value, the greater the probability that the character is placed toward the front of the initial list.
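The first probability is the relative frequency of each character, which can be sketched as follows. The function name `first_probabilities` and the sample string are illustrative assumptions; "frequency" is read here as the occurrence count of the character.

```python
from collections import Counter


def first_probabilities(text):
    """Return {character: count / total number of characters} for `text`."""
    total = len(text)
    if total == 0:
        return {}                      # no characters, no probabilities
    counts = Counter(text)             # occurrence count of each character
    return {ch: n / total for ch, n in counts.items()}


p1 = first_probabilities("hello hello")
print(p1["l"])   # 4 occurrences out of 11 characters
```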
Step (2): consultation dialogue data contains many repeated characters, and the chat content of an actual consultation revolves around the consultation topic, so many fixed character combinations, such as fixed words and fixed sentences, appear in the data repeatedly. In actual coding, the characters belonging to these frequently repeated character combinations need to be placed toward the front of the character list to improve the overall coding compression effect on the consultation dialogue data.
The lengths of character combinations in the consultation dialogue data differ considerably, and character combinations that occur only by chance distort the statistics of the character combination distribution, so the character combinations in the consultation dialogue data need to be screened. The specific process is as follows:
firstly, a plurality of character strings is obtained from the consultation dialogue data using a backtracking method. A longer character string may have been divided into several smaller character strings, so character strings need to be merged: it is judged whether the new character string obtained by concatenating two adjacent character strings exists in the consultation dialogue data, and if it does, the two character strings are candidates for merging; the merging possibility between the character strings is then evaluated on this basis;
then, the merging possibility between character strings in the consultation dialogue data is obtained as follows:
$$H_{a,b} = \frac{q_{ab}}{q_a + q_b}$$
wherein $H_{a,b}$ represents the merging possibility of the $a$-th character string and the $b$-th character string, $q_a$ represents the frequency of the $a$-th character string in the consultation dialogue data, $q_b$ represents the frequency of the $b$-th character string in the consultation dialogue data, and $q_{ab}$ represents the frequency of the new character string obtained by merging the $a$-th character string and the $b$-th character string;
the merging possibility indicates the possibility that the two character strings, once merged, form one complete new character string; the larger its value, the greater the possibility.
Finally, presetting a merging possibility threshold value to be 0.4 according to experience, and merging two character strings to obtain a new character string when the merging possibility of any two character strings is larger than the merging possibility threshold value; otherwise, the two character strings are not combined, namely two independent character strings, and the obtained combined new character string and the obtained uncombined character string are recorded as character combinations to obtain a plurality of character combinations.
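The merging rule can be sketched as below. This is a hedged illustration: the text does not say how string frequencies are counted, so counting overlapping occurrences is an assumption of this sketch, as are the names `count_occurrences`, `merge_possibility` and `MERGE_THRESHOLD` and the sample sentence. Only the ratio itself and the threshold value 0.4 come from the description.

```python
MERGE_THRESHOLD = 0.4  # empirical threshold stated in the embodiment


def count_occurrences(text, s):
    """Count occurrences of `s` in `text`, allowing overlaps (an assumption)."""
    count, start = 0, 0
    while True:
        pos = text.find(s, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1


def merge_possibility(text, a, b):
    """Frequency of a+b divided by the sum of the frequencies of a and b."""
    f_a = count_occurrences(text, a)
    f_b = count_occurrences(text, b)
    if f_a + f_b == 0:
        return 0.0                      # neither string occurs in the text
    return count_occurrences(text, a + b) / (f_a + f_b)


text = "thank you, thank you very much"
h_merge = merge_possibility(text, "thank", " you")   # 2 / (2 + 2) = 0.5
h_keep = merge_possibility(text, "you", " very")     # 1 / (2 + 1)
print(h_merge > MERGE_THRESHOLD)   # True: merge into "thank you"
print(h_keep > MERGE_THRESHOLD)    # False: keep the two strings separate
```

Because the merged string can occur at most as often as either part, this ratio never exceeds 0.5, so a threshold of 0.4 accepts only pairs that almost always appear together.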
It should be noted that a merged character string may correspond to a fixed collocation in the consultation dialogue data, that is, a common word or phrase in the consultation dialogue data.
It should be noted that, when character combinations are acquired using the backtracking method, a longer character combination in the consultation dialogue data should not be divided into several character combinations; otherwise characters are counted repeatedly and the frequency relationship between the character combinations and the characters becomes inconsistent. Avoiding this makes the character combinations a better expression of character frequency in the consultation dialogue data, which is used to determine the final initial list.
So far, a plurality of character combinations in the consultation dialogue data are obtained.
Step S003, obtaining the influence factor of the character according to the frequency and the length of the character combination, and obtaining the second probability of the character according to the influence factor.
In the process of coding the consultation dialogue data by using the MTF algorithm, characters corresponding to repeated character combinations need to be positioned in front of a character list so as to improve the coding effect of the whole consultation dialogue data, so that the frequency of the character combinations also influences the positions of the characters in the consultation dialogue data in the character list.
The higher the frequency of character combinations, the more forward the contained characters are in the character list.
Step (1), obtaining the frequency of each character combination in the consultation dialogue data and the number of characters it contains, the latter being recorded as the length of the character combination; a character string formed by any two characters in a character combination is also recorded as a character combination;
it should be noted that the consultation dialogue data contains a plurality of character combinations composed of different characters, and one character may belong to several character combinations; in addition, the character combination formed by any two characters of one character combination may also occur within several character combinations in the consultation dialogue data.
In addition, the influence that the frequencies of the other characters in a character combination containing the $i$-th character have on the first probability of the $i$-th character is recorded as the influence factor; the influence factor of the $i$-th character in any character combination in the consultation dialogue data is obtained as follows:
[formula image missing from the source text; the symbols below are restored placeholders]
wherein $Y_i$ represents the influence factor of the $i$-th character in the character combination; $n_{i,j}$ represents, among the character combinations containing the $i$-th character, the number of character combinations in the consultation dialogue data that contain both the $i$-th character and the $j$-th character other than the $i$-th character; $n_i$ represents the number of character combinations in the consultation dialogue data that contain the $i$-th character; $n_j$ represents the number of character combinations in the consultation dialogue data that contain the $j$-th character other than the $i$-th character; $c_{i,j}$ represents the frequency, in the consultation dialogue data, of the character combination formed by the $i$-th character and the $j$-th character; $f_j$ represents the frequency, in the consultation dialogue data, of the $j$-th character other than the $i$-th character in the character combination containing the $i$-th character; $L$ represents the number of characters contained in the character combination containing the $i$-th character;
among the character combinations containing the $i$-th character, the greater the frequency $f_j$ of the $j$-th character other than the $i$-th character, the further forward the corresponding character combination is placed in the character list, and hence the higher the probability that the $i$-th character is placed toward the front of the character list;
one term of the formula reflects the degree to which the $i$-th character and the $j$-th character in the corresponding character combination form a fixed combination; the greater its value, the greater the influence of the frequency of the $j$-th character on the position of the $i$-th character in the character list, and hence the greater the influence on the probability that the $i$-th character is placed toward the front of the initial list;
the greater the frequency $c_{i,j}$ of the character combination formed by the $i$-th character and the $j$-th character, the more fixed the way in which the $i$-th character and the $j$-th character combine into a character combination;
the number of character combinations in which the $i$-th character and the $j$-th character are both present likewise reflects the degree to which the two form a fixed combination; the larger its value, the more fixed the combination;
the influence factor represents, for any character combination containing the $i$-th character, the influence of the frequencies of the other characters on the first probability of the $i$-th character.
In this embodiment, the frequency of each character combination in the consultation dialogue data is used to adjust the probability that its characters are positioned toward the front of the character list, that is, the influence on the first probability, because a frequent character combination as a whole sits toward the front of the initial character list; this improves the coding compression efficiency for the consultation dialogue data. In addition, the mutual influence among the frequencies of the characters within a character combination is used to raise the likelihood that the combination is positioned toward the front of the character list, and the analysis of that mutual influence takes into account the combination relationship formed by the characters, which makes the estimated mutual influence more accurate.
The probability that a character is positioned toward the front of the character list, as reflected by the frequencies of the different character combinations that contain it, is recorded as the second probability. The second probability of a character in the consultation dialogue data is obtained as follows:
where the quantities in the formula are as follows: the second probability of the character in question in the consultation dialogue data; the number of character combinations in the consultation dialogue data that contain the character in question; the frequency in the consultation dialogue data of each character combination that contains the character in question; the length of that character combination; and the influence factor of the character in question within that character combination;
the greater the frequency of a character combination, the greater the probability that its characters are positioned toward the front of the character list; the longer the character combination, the more the frequency of the combination is affected by the other characters it contains.
The frequency of each character combination that contains the character in question reflects the probability that this character is positioned toward the front of the character list;
the second probability represents the probability, as reflected by the character combinations that contain the character in question, that this character is positioned toward the front of the initial list.
When encoding with the MTF algorithm, the closer to the front of the character list the characters of frequently repeated character combinations are, the better the compression. A character combination is therefore placed further forward the higher its frequency. The characters within a combination, however, differ in frequency, and these character frequencies influence the frequency of the combination; that is, the frequency relationship among the characters of a combination affects where the combination sits in the character list, and hence the probability that each of its characters is positioned toward the front of the list.
Thus, the higher the frequency of a character combination, the higher the probability that its characters are positioned toward the front of the character list. Moreover, among the characters of a combination, a higher frequency of any single character moves the whole combination further forward in the character list, which in turn raises the probability that the other characters of that combination are positioned toward the front.
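The effect of the initial ordering can be illustrated with a minimal Python sketch of move-to-front encoding. The function name `mtf_encode`, the sample text, and the two initial lists are illustrative assumptions rather than anything specified in the source; the sketch only shows that an initial list with the frequent characters at the front yields smaller indices.

```python
def mtf_encode(text, initial_list):
    """Move-to-front encode `text`, starting from the given initial list."""
    table = list(initial_list)
    indices = []
    for ch in text:
        pos = table.index(ch)            # current position of the character
        indices.append(pos)              # emit that position
        table.insert(0, table.pop(pos))  # move the character to the front
    return indices


# Hypothetical sample: "a" and "b" are frequent, "c" is rare.
text = "abababcab"

frequent_first = mtf_encode(text, "abc")  # frequent characters at the front
frequent_last = mtf_encode(text, "cba")   # frequent characters at the back

print(sum(frequent_first), sum(frequent_last))
```

In this sample the first ordering gives the smaller index total. Because MTF reorders the list as it encodes, the initial ordering mostly affects the early part of the output.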
Step S004: obtain the front probability of each character in the consultation dialogue data from its first probability and second probability, build the character list from the front probabilities, and apply the MTF algorithm to achieve intelligent compressed storage of the business information consultation data.
Step (1): in the preceding steps, the first probability and the second probability of each character were determined from the relationship between character frequencies and character-combination frequencies. The front probability of a character in the consultation dialogue data, that is, the probability that it is positioned toward the front of the character list, is then determined from these two probabilities as follows:
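$$P_{\text{front}} = P_{\text{first}} \times P_{\text{second}}$$
where $P_{\text{front}}$ denotes the front probability of a character in the consultation dialogue data, $P_{\text{first}}$ denotes the first probability of that character, and $P_{\text{second}}$ denotes the second probability of that character.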
Step (2): the front probability of every character in the consultation dialogue data is obtained by the method above. The characters are then sorted in descending order of front probability, which fixes their left-to-right order in the character list and yields the character list required for encoding the consultation dialogue data with the MTF algorithm.
Step (3): the consultation dialogue data is encoded and compressed with the MTF algorithm using this character list; the result is recorded as compressed data and is stored and managed by a data management platform, achieving intelligent compressed storage of the business information consultation data.
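Steps (1) to (3) can be sketched in Python as follows, assuming the first and second probabilities have already been computed. The probability values, the function names, and the sample text are hypothetical; the second probability in particular is simply taken as an input. MTF itself only maps characters to list positions, and a practical pipeline would usually pass those positions to an entropy coder, which the source does not detail.

```python
def build_character_list(first_probability, second_probability):
    """Sort characters by front probability (first * second), descending."""
    front = {ch: first_probability[ch] * second_probability[ch]
             for ch in first_probability}
    return sorted(front, key=front.get, reverse=True)


def mtf_encode(text, character_list):
    """Move-to-front encode `text`, starting from `character_list`."""
    table = list(character_list)
    indices = []
    for ch in text:
        pos = table.index(ch)
        indices.append(pos)
        table.insert(0, table.pop(pos))  # move the character to the front
    return indices


def mtf_decode(indices, character_list):
    """Invert mtf_encode; requires the same initial character list."""
    table = list(character_list)
    chars = []
    for pos in indices:
        ch = table.pop(pos)
        chars.append(ch)
        table.insert(0, ch)
    return "".join(chars)


# Hypothetical probabilities for a three-character alphabet.
first_probability = {"a": 2 / 6, "b": 3 / 6, "c": 1 / 6}
second_probability = {"a": 0.9, "b": 0.5, "c": 0.1}

character_list = build_character_list(first_probability, second_probability)
encoded = mtf_encode("aabbbc", character_list)
decoded = mtf_decode(encoded, character_list)
```

The decoder needs the same initial character list as the encoder, so in this sketch the list would have to be stored alongside the compressed data.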
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (6)

1. A data storage method of a business information consultation system, the method comprising the steps of:
acquiring consultation dialogue data;
obtaining a first probability of a character according to the frequency of the character in the consultation dialogue data; acquiring a plurality of character strings and corresponding frequencies in consultation dialogue data, and acquiring merging possibility according to the frequencies of any two character strings; combining the character strings according to the combination possibility to obtain a plurality of character combinations;
the character strings formed by any two characters in any character combination are also recorded as character combinations, and influence factors are obtained according to the frequency of the characters in any character combination and the number of the character combinations formed by any two characters; recording the number of characters contained in the character combination as the length of the character combination; obtaining a second probability of the character according to the frequency, the length and the influence factor of the character combination;
obtaining the front probability of the characters in the consultation dialogue data according to the first probability and the second probability, and sorting the characters by front probability to obtain a character list; encoding and compressing the consultation dialogue data using the character list, so as to achieve intelligent compressed storage of the business information consultation data;
the method for obtaining the influence factors according to the frequency of the characters in any character combination and the number of the character combinations formed by any two characters comprises the following specific steps:
the influence factor of a character within any character combination is obtained as follows:
where the quantities in the formula are as follows: the influence factor of the character in question within the character combination; the number of character combinations in the consultation dialogue data that contain both the character in question and a given other character of the combination; the number of character combinations in the consultation dialogue data that contain the character in question; the number of character combinations in the consultation dialogue data that contain the given other character; the frequency, in the consultation dialogue data, of the two-character combination formed by the character in question and the given other character; the frequency of the given other character in the consultation dialogue data; and the number of characters contained in the character combination that contains the character in question;
the second probability of the character is obtained according to the frequency, the length and the influence factor of the character combination, and the method comprises the following specific steps:
the second probability of the characters in the consultation dialogue data is obtained by the following specific method:
where the quantities in the formula are as follows: the second probability of the character in question in the consultation dialogue data; the number of character combinations in the consultation dialogue data that contain the character in question; the frequency in the consultation dialogue data of each character combination that contains the character in question; the length of that character combination; and the influence factor of the character in question within that character combination.
2. The data storage method of a business information consultation system according to claim 1, characterized in that the first probability of the character is obtained according to the frequency of the character in the consultation dialogue data, comprising the specific steps of:
acquiring the frequency of each character in the consultation dialogue data and the number of the characters contained in the consultation dialogue data;
the ratio between the frequency of the characters and the number of characters contained in the consultation dialogue data is recorded as a first probability of the corresponding characters.
3. The data storage method of a business information consultation system according to claim 1, characterized in that said obtaining the merging possibility according to the frequency of any two character strings includes the following specific steps:
firstly, obtaining a plurality of character strings from consultation dialogue data by using a backtracking method;
then, the corresponding frequency of any two character strings in the consultation dialogue data and the corresponding frequency of a new character string in the consultation dialogue data after the two character strings are combined are obtained;
and finally, recording the ratio of the frequency, in the consultation dialogue data, of the new character string obtained by merging the two character strings to the sum of the frequencies of the two character strings in the consultation dialogue data as the merging possibility of the two character strings.
4. The data storage method of a business information consultation system according to claim 1, characterized in that the step of combining character strings according to the size of the combining possibility to obtain a plurality of character combinations includes the following specific steps:
presetting a merging possibility threshold according to experience, and merging two character strings to obtain a new character string when the merging possibility of any two character strings is larger than the merging possibility threshold; otherwise, the two character strings are not combined, and the obtained combined new character string and the obtained non-combined character string are recorded as character combinations to obtain a plurality of character combinations.
5. The data storage method of a business information consultation system according to claim 1, wherein obtaining the front probability of the characters in the consultation dialogue data according to the first probability and the second probability, and sorting the characters by front probability to obtain the character list, comprises the following specific steps:
first, recording the product of the first probability and the second probability of any character in the consultation dialogue data as the front probability of that character, thereby obtaining the front probabilities of all characters in the consultation dialogue data;
then, arranging the front probabilities of all characters in descending order, which determines the left-to-right order of the corresponding characters in the character list, to obtain the character list.
6. The data storage method of a business information consultation system according to claim 1, characterized in that said coding and compressing the consultation dialogue data by using character list to realize intelligent compression storage of the business information consultation data, comprising the following specific steps:
according to the character list, the consultation dialogue data is encoded and compressed by using an MTF algorithm, the encoded and compressed data is recorded as compressed data, and the compressed data is stored by using a data management platform, so that the intelligent compressed storage of the business information consultation data is realized.
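The computations recited in claims 2 to 4 can be sketched in Python as follows. The sketch assumes that "frequency" means an occurrence count and that occurrences are counted without overlap (the behaviour of `str.count`); the function names, the sample data, and the threshold value are hypothetical, and the backtracking step that extracts candidate character strings is not shown.

```python
from collections import Counter


def first_probability(data):
    """Claim 2: character count divided by the total number of characters."""
    counts = Counter(data)
    total = len(data)
    return {ch: n / total for ch, n in counts.items()}


def merge_possibility(data, left, right):
    """Claim 3: count of the merged string over the sum of both counts."""
    denominator = data.count(left) + data.count(right)
    if denominator == 0:
        return 0.0
    return data.count(left + right) / denominator


def combine(data, left, right, threshold):
    """Claim 4: merge the two strings only if the possibility exceeds the threshold."""
    if merge_possibility(data, left, right) > threshold:
        return [left + right]
    return [left, right]


# Hypothetical consultation text and threshold.
data = "price list, price list, price tag"
probabilities = first_probability(data)
merged = combine(data, "price ", "list", threshold=0.3)
kept_apart = combine(data, "price ", "tag", threshold=0.3)
```

Under these assumptions the ratio cannot exceed 0.5 for non-empty strings (the value reached when the two strings only ever occur together), so a workable threshold lies below that value.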
CN202310861297.XA 2023-07-14 2023-07-14 Data storage method of business information consultation system Active CN116610265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310861297.XA CN116610265B (en) 2023-07-14 2023-07-14 Data storage method of business information consultation system

Publications (2)

Publication Number Publication Date
CN116610265A CN116610265A (en) 2023-08-18
CN116610265B true CN116610265B (en) 2023-09-29

Family

ID=87676762

Country Status (1)

Country Link
CN (1) CN116610265B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056496B (en) * 2023-10-12 2024-01-26 青岛海尔乐信云科技有限公司 Intelligent customer service interaction data management method based on big data
CN117171399B (en) * 2023-11-02 2024-02-20 云图数据科技(郑州)有限公司 New energy data optimized storage method based on cloud platform

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07221652A (en) * 1994-01-31 1995-08-18 Fujitsu Ltd Data compression method
CN1115524A (en) * 1994-06-16 1996-01-24 精工爱普生株式会社 Data compressing method, data recovery method and information processing device
JPH09214353A (en) * 1996-02-08 1997-08-15 Fujitsu Ltd Data compressor and data decoder
JP2000124810A (en) * 1998-08-13 2000-04-28 Fujitsu Ltd Encoding device and decoding device
KR20010067760A (en) * 2001-03-20 2001-07-13 강찬형 Lossless data compression method for uniform entropy data
CN102088607A (en) * 2011-02-28 2011-06-08 西安电子科技大学 Memory quotient (MQ) coding method and circuit based on JPEG (joint photographic experts group) 2000 standard
GB201403038D0 (en) * 2014-02-20 2014-04-09 Gurulogic Microsystems Oy Encoder, decoder and method
CN109428602A (en) * 2017-08-30 2019-03-05 前海中科芯片控股(深圳)有限公司 A kind of data-encoding scheme, device and storage medium
CN109474281A (en) * 2018-09-30 2019-03-15 湖南瑞利德信息科技有限公司 Data encoding, coding/decoding method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11687241B2 (en) * 2017-10-30 2023-06-27 AtomBeam Technologies Inc. System and method for data compaction utilizing mismatch probability estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
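Video compressive sensing reconstruction via long-short-term double-pattern prediction; 周健; 刘浩; Optoelectronics Letters (No. 03); full text *
Implementation of multi-order context-adaptive binary arithmetic coding; 杨文涛; 刘卫忠; 郑立新; 邹雪城; Journal of Huazhong University of Science and Technology (Natural Science Edition) (No. 03); full text *
Research and design of data compression algorithms; 陈昌主; 陈小松; Computer and Information Technology (No. 06); full text *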

Also Published As

Publication number Publication date
CN116610265A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN116610265B (en) Data storage method of business information consultation system
US9223765B1 (en) Encoding and decoding data using context model grouping
US8120516B2 (en) Data compression using a stream selector with edit-in-place capability for compressed data
CN106407285B (en) A kind of optimization bit file compression &amp; decompression method based on RLE and LZW
CN101783788B (en) File compression method, file compression device, file decompression method, file decompression device, compressed file searching method and compressed file searching device
CN1183683C (en) Position adaptive coding method using prefix prediction
KR101049699B1 (en) Data Compression Method
CN107565971B (en) Data compression method and device
WO2010044100A1 (en) Lossless compression
CN116681036B (en) Industrial data storage method based on digital twinning
CN115840799B (en) Intellectual property comprehensive management system based on deep learning
CN101534124B (en) Compression algorithm for short natural language
CN116614139A (en) User transaction information compression storage method in wine selling applet
CN115801902A (en) Compression method of network access request data
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
CN111274816B (en) Named entity identification method based on neural network and vehicle machine
US9235610B2 (en) Short string compression
CN108573069B (en) Twins method for accelerating matching of regular expressions of compressed flow
CN115567058A (en) Time sequence data lossy compression method combining prediction and coding
Karpinski et al. A fast algorithm for adaptive prefix coding
CN102891730B (en) Method and device for encoding satellite short message based on binary coded decimal (BCD) code
CN112506876B (en) Lossless compression query method supporting SQL query
CN116318171B (en) LZ4 decompression hardware acceleration realization/compression method, device, medium and chip
CN113555034B (en) Compressed audio identification method, device and storage medium
CN117375631B (en) Fast coding method based on Huffman coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant