CN115718680B

CN115718680B - Data reading method, system, computer and readable storage medium

Info

Publication number: CN115718680B
Application number: CN202310029172.0A
Authority: CN
Inventors: 谷雨丰; 刘钦; 张敏
Original assignee: Jiangling Motors Corp Ltd
Current assignee: Jiangling Motors Corp Ltd
Priority date: 2023-01-09
Filing date: 2023-01-09
Publication date: 2023-06-06
Anticipated expiration: 2043-01-09
Also published as: CN115718680A

Abstract

The invention provides a data reading method, a system, a computer and a readable storage medium. The method comprises the following steps: obtaining backup data from an original database and storing the backup data in a distributed file system, so that the backup data is split and stored on different hard disks; classifying the data on the hard disks and converting the classified data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets; and calculating, through a first preset algorithm, the topic distribution corresponding to each data set, and reading out the topics and phrases contained in each data set according to the topic distribution. In this way, the backup data is stored on different hard disks so that the data can be read separately, which greatly improves the data reading rate. In addition, because the data is stored in the distributed file system, it does not need to be read directly from the database, which greatly improves the stability of data storage.

Description

Data reading method, system, computer and readable storage medium

Technical Field

The invention relates to the technical field of intelligent networking of automobiles, in particular to a data reading method, a system, a computer and a readable storage medium.

Background

Nowadays, as the internet extends deeper into the automotive field, automobile enterprise forums (Bulletin Board System, abbreviated as BBS) are also developing and maturing. In a vehicle enterprise forum, when the same problem occurs in a certain vehicle model, when a certain model sells well, or when users discuss the functions of a certain model in large numbers, discussion of the related topics increases rapidly. The vehicle enterprise needs to obtain the discussed topics quickly, because they provide important guidance for the production, manufacture and sale of vehicles.

Most existing traditional BBSs persist their data in a relational database. However, a relational database is not well suited to reading large volumes of data: reading a large amount of data from it can cause excessively high disk and CPU (central processing unit) utilization, and can even prevent the BBS system from being used normally.

Meanwhile, when a large amount of data is read in the BBS, reading a single large file takes a long time and depends on the read speed of the hard disk. The reliability of a single file is also low: when an accident occurs, such as hard disk damage or a network abnormality, stable operation of the system cannot be ensured.

Therefore, in order to overcome the shortcomings of the prior art, it is necessary to provide a stable and rapid data processing method suitable for the vehicle enterprise forum.

Disclosure of Invention

Based on this, an object of the present invention is to provide a data reading method, system, computer and readable storage medium, so as to provide a stable and fast data processing method suitable for a vehicle enterprise forum.

An embodiment of the present invention provides a data reading method, where the method includes:

the backup data in the original database is obtained and stored in a distributed file system, so that the backup data is split and stored in different hard disks;

classifying the data in the hard disk, and converting the data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets;

calculating the topic distribution corresponding to each data set through a first preset algorithm, and reading out topics and phrases contained in each data set according to the topic distribution.

The beneficial effects of the invention are as follows: firstly, obtaining backup data in an original database, and storing the current backup data into a distributed file system so as to split the current backup data and store the current backup data into different hard disks; on the basis, classifying the data in the hard disk, and converting the data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets; finally, the topic distribution corresponding to each data set is calculated through a first preset algorithm, and topics and phrases contained in each data set are read out according to the topic distribution. According to the method, the backup data in the original database can be stored in the distributed file system, meanwhile, the backup data are respectively stored in different hard disks, so that the data can be separately read, the data reading rate is correspondingly and greatly improved, in addition, the data are stored in the distributed system, the data are not required to be directly read from the database, the data storage stability is greatly improved, and the method is suitable for large-scale popularization and use.

Preferably, the method further comprises:

acquiring a log file generated in the original database, and judging whether a newly added log appears in the log file;

if the newly added log appears in the log file, analyzing the data format of the newly added log so as to store the data and the table structure in the newly added log as target formats.

Preferably, the step of calculating the topic distribution corresponding to each data set through a first preset algorithm includes:

inputting target data distribution in the data set through a Gibbs algorithm, and setting a state transition threshold and the number of sets of the data set;

randomly initializing a state value of the data set, and updating the serial numbers of topics corresponding to each word in the data set;

and carrying out convergence processing on the state values based on a coordinate axis rotation algorithm to obtain a corresponding final sample set, and counting the topics corresponding to each word in the final sample set respectively to obtain topic distribution corresponding to the data set.

Preferably, the step of reading out the topics and phrases contained in each data set according to the topic distribution includes:

when the topic distribution is acquired, acquiring target topics corresponding to each phrase in the data set respectively through a first Gibbs algorithm unit in the topic distribution;

and calculating word distribution corresponding to each target theme through a second Gibbs algorithm unit, and collecting corresponding target phrases in the word distribution.

Preferably, after the step of reading out the topics and phrases respectively contained in each data set according to the topic distribution, the method further includes:

integrating the theme and the phrase to generate a corresponding data packet, and inputting the data packet into a preset evaluation template to generate a corresponding evaluation report;

and generating corresponding guide comments according to the evaluation report, and packaging the evaluation report and the guide comments so as to send the evaluation report and the guide comments to a mobile terminal of a user.

Preferably, the method further comprises:

establishing wireless communication connection with a display terminal, and converting the theme and the phrase into corresponding display signals through a second preset algorithm;

judging whether the display signal is matched with the display terminal or not;

and if the display signal is judged to be matched with the display terminal, transmitting the display signal to the display terminal so as to display the theme and the phrase in real time on the display terminal.

A second aspect of an embodiment of the present invention proposes a data reading system, the system comprising:

the acquisition module is used for acquiring backup data in the original database, storing the backup data into a distributed file system, splitting the backup data and storing the backup data into different hard disks;

the processing module is used for classifying the data in the hard disk and converting the data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets;

the reading module is used for calculating the topic distribution corresponding to each data set through a first preset algorithm, and reading out the topic and the phrase contained in each data set according to the topic distribution.

In the above data reading system, the data reading system further includes an analysis module, where the analysis module is specifically configured to:

In the above data reading system, the reading module is specifically configured to:

In the above data reading system, the reading module is further specifically configured to:

In the above data reading system, the data reading system further includes an evaluation module, where the evaluation module is specifically configured to:

In the above data reading system, the data reading system further includes a display module, where the display module is specifically configured to:

judging whether the display signal is matched with the display terminal or not;

A third aspect of the embodiments of the present invention proposes a computer comprising a memory, a processor and a computer program stored on said memory and executable on said processor, said processor implementing a data reading method as described above when executing said computer program.

A fourth aspect of the embodiments of the present invention proposes a readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data reading method as described above.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

FIG. 1 is a flowchart of a data reading method according to a first embodiment of the present invention;

FIG. 2 is a block diagram of a data reading system according to a sixth embodiment of the present invention.

The invention will be further described in the following detailed description in conjunction with the above-described figures.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Referring to fig. 1, a data reading method according to a first embodiment of the present invention is shown, where the data reading method according to the first embodiment of the present invention can store backup data in an original database into a distributed file system, and simultaneously store the backup data in different hard disks, so that separate reading of data can be implemented, and the reading rate of data is greatly improved correspondingly.

Specifically, the data reading method provided in this embodiment specifically includes the following steps:

step S10, backup data in an original database are obtained, and the backup data are stored in a distributed file system so as to split the backup data and store the split backup data in different hard disks;

Specifically, it should first be explained that the data reading method provided in this embodiment is applied to the vehicle enterprise forum of a vehicle enterprise, and is used to extract the required topics and phrases from the forum, so as to provide guidance for the subsequent vehicle manufacturing and sales of the vehicle enterprise.

It should be noted that most of the prior art directly stores the data generated in the vehicle enterprise forum in the relational database, but the data reading process of the relational database is long, so that the data reading time is greatly increased, and the data reading rate is correspondingly reduced.

Therefore, in this step, in order to effectively improve the reading rate and the storage stability of the data, the backup data of the current original database, that is, a full copy of the data in the current original database, is obtained first. After the backup data is obtained, it is stored in real time in a preset distributed file system (Hadoop Distributed File System, HDFS). The distributed file system splits the incoming backup data internally, and the split backup data is then stored on different hard disks, which realizes split storage of the backup data. Preferably, in this embodiment, the distributed file system splits the backup data into a plurality of files of 128 MB each by default, which facilitates subsequent storage and improves the data reading rate.
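The splitting described in step S10 can be sketched as follows. This is only an illustration under stated assumptions: the function `plan_blocks`, the disk names and the round-robin placement are hypothetical and are not taken from the patent or from HDFS, where block placement is decided by the file system itself. Only the 128 MB block size comes from the text.

```python
import math

# 128 MB, the default split size mentioned in the description.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_blocks(total_bytes, disks, block_size=BLOCK_SIZE):
    """Return (block_index, disk, length_in_bytes) for each block.

    Assumption: blocks are assigned to disks round-robin. Real HDFS
    placement is chosen by the file system, not by the caller.
    """
    n_blocks = math.ceil(total_bytes / block_size)
    plan = []
    for i in range(n_blocks):
        start = i * block_size
        length = min(block_size, total_bytes - start)  # last block may be short
        plan.append((i, disks[i % len(disks)], length))
    return plan


# A hypothetical 300 MB backup spread over three hypothetical disks.
plan = plan_blocks(300 * 1024 * 1024, ["disk-a", "disk-b", "disk-c"])
```

Because each block lives on a different disk in this sketch, the blocks could be read in parallel, which is the effect the description attributes to split storage.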

Step S20, classifying the data in the hard disk, and converting the data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets;

Specifically, in this embodiment, after the backup data has been stored on different hard disks through the foregoing step, the data on each hard disk is further classified to facilitate subsequent reading; that is, the data on each hard disk is classified by topic. The classified data is then converted into corresponding execution tasks, where the execution tasks include a plurality of data sets. In this way, the whole of the backup data is split into a plurality of data sets that are processed separately, which improves the data reading rate.
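As a rough sketch of step S20, the snippet below groups records by category and cuts each group into fixed-size data sets. The record layout `(category, text)`, the function name `build_tasks` and the `chunk_size` parameter are assumptions made for illustration; the patent does not specify how classification or task conversion is implemented.

```python
from collections import defaultdict


def build_tasks(records, chunk_size=2):
    """Group (category, text) records and split each group into data sets.

    Returns {category: [data_set, data_set, ...]}, where each data set is
    a list of at most chunk_size texts. Each category stands in for one
    execution task in this sketch.
    """
    by_category = defaultdict(list)
    for category, text in records:
        by_category[category].append(text)

    tasks = {}
    for category, texts in by_category.items():
        tasks[category] = [
            texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)
        ]
    return tasks


# Hypothetical forum posts, already labelled with a category.
records = [
    ("brakes", "brake noise at low speed"),
    ("battery", "range drops in winter"),
    ("brakes", "brake pedal feels soft"),
    ("brakes", "brake pads wear quickly"),
]
tasks = build_tasks(records)
```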

Step S30, calculating the topic distribution corresponding to each data set through a first preset algorithm, and reading out topics and phrases contained in each data set according to the topic distribution.

Finally, in this step, after the backup data has been split into a plurality of data sets through the above steps, the data sets are processed by the first preset algorithm, namely a MapReduce algorithm, to calculate the topic distribution corresponding to each data set, that is, the topics contained in each data set.

On the basis, the step can further read out the topics and the phrases respectively contained in each data set according to the calculated topic distribution, so that guidance comments can be provided for subsequent vehicle production and sales of the current vehicle enterprise according to the topics and the phrases acquired in real time.
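The Map and Reduce stages named in step S30 can be illustrated with a small in-memory sketch. It is not Hadoop code and not the patent's implementation: the function names are hypothetical, and the reduce stage here only counts words per data set as a stand-in for the topic calculation described in the text.

```python
from collections import Counter, defaultdict


def map_phase(data_set_id, post):
    """Emit one (data_set_id, word) pair per word in a post."""
    for word in post.lower().split():
        yield data_set_id, word


def shuffle(pairs):
    """Group emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, words):
    """Placeholder reduce step: count words for one data set."""
    return key, Counter(words)


def run(data_sets):
    pairs = [
        pair
        for ds_id, posts in data_sets.items()
        for post in posts
        for pair in map_phase(ds_id, post)
    ]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two hypothetical data sets of forum posts.
result = run({"set-1": ["Brake noise", "brake pad"], "set-2": ["battery range"]})
```

Each data set is reduced independently, so in a real cluster the reduce calls could run on different nodes.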

When the method is used, the backup data in the original database is firstly obtained, and the current backup data is stored in the distributed file system so as to split the current backup data and store the split current backup data in different hard disks; on the basis, classifying the data in the hard disk, and converting the data into corresponding execution tasks, wherein the execution tasks comprise a plurality of data sets; finally, the topic distribution corresponding to each data set is calculated through a first preset algorithm, and topics and phrases contained in each data set are read out according to the topic distribution. According to the method, the backup data in the original database can be stored in the distributed file system, meanwhile, the backup data are respectively stored in different hard disks, so that the data can be separately read, the data reading rate is correspondingly and greatly improved, in addition, the data are stored in the distributed system, the data are not required to be directly read from the database, the data storage stability is greatly improved, and the method is suitable for large-scale popularization and use.

It should be noted that the foregoing implementation procedure only illustrates the feasibility of the application; it does not mean that the data reading method of the application has only this one implementation procedure. On the contrary, any procedure that can implement the data reading method of the application may be incorporated as a feasible implementation.

In summary, the data reading method provided by the embodiment of the invention can store the backup data in the original database into the distributed file system, and simultaneously store the backup data into different hard disks respectively, so that the data can be separately read, the data reading rate is correspondingly and greatly improved.

The second embodiment of the present invention also provides a data reading method, which is different from the data reading method provided in the first embodiment in that:

specifically, in this embodiment, it should be noted that, the method further includes:

Specifically, in this embodiment, it should be noted that the original database generates corresponding log files when it stores data. It can be understood that the data storage of each database is dynamic, that is, the data in each database changes over time. Therefore, this embodiment also determines in real time whether a newly added log appears in the log file.

Further, if it is determined that a newly added log appears in the current log file, this embodiment parses the data format of the newly added log, so that both the data and the table structure in the newly added log can be stored in the required target format.

Preferably, in this embodiment, the data and the table structure in the newly added log detected in real time are stored in CSV (Comma-Separated Values) format, so as to improve the storage stability of the newly added log.
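A minimal sketch of detecting newly added log entries and storing them as CSV is given below. The log line layout (table name followed by tab-separated `key=value` fields) and the function names are assumptions made purely for illustration; the patent does not describe the log format of the original database.

```python
import csv
import io


def new_entries(log_lines, last_seen):
    """Return the log lines appended since the last check."""
    return log_lines[last_seen:]


def parse_entry(line):
    """Parse a hypothetical 'table<TAB>key=value<TAB>key=value' log line."""
    table, *fields = line.rstrip("\n").split("\t")
    row = dict(field.split("=", 1) for field in fields)
    return table, row


def entries_to_csv(lines):
    """Store both the table name and the row data of each entry as CSV text."""
    parsed = [parse_entry(line) for line in lines]
    columns = ["table"] + sorted({key for _, row in parsed for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for table, row in parsed:
        writer.writerow({"table": table, **row})
    return buffer.getvalue()


log = ["posts\tid=1\ttitle=brake noise", "posts\tid=2\ttitle=range"]
added = new_entries(log, last_seen=1)  # one entry was already processed
csv_text = entries_to_csv(added)
```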

It should be noted that the implementation principle and some of the technical effects of the method according to the second embodiment of the present invention are the same as those of the first embodiment. For the sake of brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content provided in the first embodiment.

The third embodiment of the present invention also provides a data reading method, which is different from the data reading method provided in the first embodiment in that:

further, in this embodiment, it should be noted that the step of calculating, by the first preset algorithm, the topic distribution corresponding to each data set includes:

Specifically, the first preset algorithm provided in this embodiment is a MapReduce algorithm, which includes a Map stage and a Reduce stage. In the Reduce stage, the Gibbs algorithm is used to process the data set.

Specifically, in the Reduce stage, this embodiment inputs the target data distribution, such as $\pi(x_1, x_2, \ldots, x_n)$, where $x_n$ represents the target data, and at the same time sets the state transition threshold $n_1$ and the corresponding number of sets $n_2$ for the current data set.

Further, the embodiment randomly initializes the state value of the data set, and updates the number of the topic corresponding to each word in the data set.

Based on the above, the present embodiment further performs convergence processing on the state values based on the existing coordinate axis rotation algorithm until the Gibbs algorithm converges, to obtain a corresponding final sample set, where $x_n$ represents a sample and $n$ represents the number of samples. At the same time, the topics corresponding to each word in the current final sample set are counted, so as to finally obtain the topic distribution corresponding to the data set.
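The procedure above resembles Gibbs sampling for a topic model, and a minimal sketch under that reading follows. Several points are assumptions rather than statements from the patent: the model is taken to be an LDA-style model sampled with collapsed Gibbs sampling; the state transition threshold is interpreted as a burn-in count (`burn_in`) and the number of sets as the number of retained sweeps (`n_samples`); "coordinate axis rotation" is interpreted as updating one word's topic at a time in a fixed cycle; and `alpha`, `beta` and all names are hypothetical.

```python
import random


def gibbs_topic_distribution(docs, n_topics, burn_in, n_samples,
                             alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling sketch.

    docs: list of non-empty word lists; n_samples must be at least 1.
    Returns one topic distribution (a list summing to 1) per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Randomly initialise the state: one topic number per word occurrence.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [{} for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1

    tally = [[0] * n_topics for _ in docs]
    for sweep in range(burn_in + n_samples):
        # Visit every word in turn and resample its topic number.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
        # After burn-in, count the topic of each word in the retained state.
        if sweep >= burn_in:
            for d in range(len(docs)):
                for t in range(n_topics):
                    tally[d][t] += doc_topic[d][t]

    return [[count / sum(row) for count in row] for row in tally]


docs = [["brake", "noise", "brake"], ["battery", "range", "battery"]]
theta = gibbs_topic_distribution(docs, n_topics=2, burn_in=20, n_samples=30)
```

Counting topic assignments only after the burn-in sweeps is a common way to reduce the influence of the random initial state on the estimated distribution.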

It should be noted that the implementation principle and some of the technical effects of the method according to the third embodiment of the present invention are the same as those of the first embodiment. For the sake of brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content provided in the first embodiment.

The fourth embodiment of the present invention also provides a data reading method, which is different from the data reading method provided in the first embodiment in that:

in addition, in this embodiment, it should be noted that the step of reading the topics and the phrases included in each data set according to the topic distribution includes:

Further, in this embodiment, after the topic distribution is obtained, the target topic corresponding to each phrase in the data set is collected from the current topic distribution through a preset first Gibbs algorithm unit. On this basis, the word distribution corresponding to each target topic is calculated through a preset second Gibbs algorithm unit. Finally, the corresponding target phrases are collected from each word distribution, so that the required topics and phrases can be obtained simply and quickly from the current vehicle enterprise forum.
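The last two operations, computing a word distribution per topic and collecting the leading phrases, can be sketched as below. The input is assumed to be a list of `(word, topic)` pairs such as a sampler might produce; that layout, the function names and the tie-breaking rule are hypothetical and are not specified in the patent.

```python
from collections import Counter, defaultdict


def word_distributions(assignments):
    """Turn (word, topic) pairs into {topic: {word: probability}}."""
    counts = defaultdict(Counter)
    for word, topic in assignments:
        counts[topic][word] += 1

    distributions = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        distributions[topic] = {w: c / total for w, c in counter.items()}
    return distributions


def top_phrases(distribution, k=2):
    """Return the k most probable words, ties broken alphabetically."""
    ranked = sorted(distribution.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical topic assignments for a few forum words.
assignments = [("brake", 0), ("brake", 0), ("noise", 0), ("range", 1)]
dists = word_distributions(assignments)
brake_topic_phrases = top_phrases(dists[0], k=1)
```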

It should be noted that the implementation principle and some of the technical effects of the method according to the fourth embodiment of the present invention are the same as those of the first embodiment. For the sake of brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content provided in the first embodiment.

The fifth embodiment of the present invention also provides a data reading method, which is different from the data reading method provided in the first embodiment in that:

in addition, in this embodiment, it should be further noted that, after the step of reading the topics and the phrases included in each data set according to the topic distribution, the method further includes:

Specifically, in this embodiment, in the above manner a corresponding evaluation report can be accurately generated from the topics and phrases acquired in real time, and the required guidance opinion is then generated from the evaluation report. The guidance opinion generated in real time is sent to the mobile terminal of the user, which facilitates the subsequent development of the vehicle enterprise.
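One way to picture the evaluation template is a fixed text template that is filled in from a data packet. The sketch below uses Python's standard `string.Template`; the template fields, the `mentions` count and the threshold rule in `guidance` are all hypothetical, since the patent does not define the contents of the evaluation template or how guidance opinions are derived.

```python
from string import Template

# Hypothetical evaluation template; the field names are assumptions.
REPORT_TEMPLATE = Template(
    "Topic: $topic\n"
    "Key phrases: $phrases\n"
    "Mentions: $mentions\n"
)


def build_report(topic, phrases, mentions):
    """Integrate a topic and its phrases into a packet and fill the template."""
    packet = {
        "topic": topic,
        "phrases": ", ".join(phrases),
        "mentions": mentions,
    }
    return REPORT_TEMPLATE.substitute(packet)


def guidance(mentions, threshold=100):
    """Hypothetical rule: escalate topics mentioned at least threshold times."""
    return "review with engineering" if mentions >= threshold else "monitor"


report = build_report("brakes", ["brake noise", "soft pedal"], 120)
opinion = guidance(120)
```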

Further, in this embodiment, it should also be noted that the method further includes:

judging whether the display signal is matched with the display terminal or not;

Specifically, in this embodiment, in the above manner a wireless communication connection with the display terminal can be established. On this basis, the required topics and phrases are converted into corresponding display signals and transmitted to the current display terminal for real-time display, so that a worker can observe in real time the topics and phrases acquired from the vehicle enterprise forum, which makes it convenient to provide guidance opinions subsequently.

It should be noted that the implementation principle and some of the technical effects of the method according to the fifth embodiment of the present invention are the same as those of the first embodiment. For the sake of brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content provided in the first embodiment.

Referring to fig. 2, a data reading system according to a sixth embodiment of the present invention is shown, the system includes:

the obtaining module 12 is configured to obtain backup data in an original database, and store the backup data in a distributed file system, so as to split the backup data and store the split backup data in different hard disks;

the processing module 22 is configured to perform classification processing on data in the hard disk, and convert the data into corresponding execution tasks, where the execution tasks include a plurality of data sets;

the reading module 32 is configured to calculate, according to a first preset algorithm, a topic distribution corresponding to each data set, and read, according to the topic distribution, a topic and a phrase included in each data set.

In the above data reading system, the data reading system further includes an analysis module 42, where the analysis module 42 is specifically configured to:

In the above data reading system, the reading module 32 is specifically configured to:

In the above data reading system, the reading module 32 is further specifically configured to:

In the above data reading system, the data reading system further includes an evaluation module 52, where the evaluation module 52 is specifically configured to:

In the above data reading system, the data reading system further includes a display module 62, where the display module 62 is specifically configured to:

judging whether the display signal is matched with the display terminal or not;

A seventh embodiment of the present invention provides a computer including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the data reading method provided in the above embodiment when executing the computer program.

An eighth embodiment of the present invention provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data reading method provided in the above embodiment.

In summary, the data reading method, system, computer and readable storage medium provided in the embodiments of the present invention can store backup data in an original database into a distributed file system, and simultaneously store the backup data in different hard disks, so that separate reading of data can be implemented, and the data reading rate is greatly improved correspondingly.

The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A method of reading data, the method comprising:

calculating the topic distribution corresponding to each data set through a first preset algorithm, and reading out topics and phrases contained in each data set according to the topic distribution;

2. The data reading method according to claim 1, wherein: the method further comprises the steps of:

3. The data reading method according to claim 1, wherein: the step of reading out the topics and phrases contained in each data set according to the topic distribution includes:

4. The data reading method according to claim 1, wherein: after the step of reading out the topics and phrases respectively contained in each data set according to the topic distribution, the method further includes:

5. The data reading method according to claim 4, wherein: the method further comprises the steps of:

judging whether the display signal is matched with the display terminal or not;

6. A data reading system, the system comprising:

the reading module is used for calculating the topic distribution corresponding to each data set through a first preset algorithm, and reading out topics and phrases contained in each data set according to the topic distribution;

the reading module is specifically configured to: inputting target data distribution in the data set through a Gibbs algorithm, and setting a state transition threshold and the number of sets of the data set; randomly initializing a state value of the data set, and updating the serial numbers of topics corresponding to each word in the data set; and carrying out convergence processing on the state values based on a coordinate axis rotation algorithm to obtain a corresponding final sample set, and counting the topics corresponding to each word in the final sample set respectively to obtain topic distribution corresponding to the data set.

7. The data reading system of claim 6, wherein: the data reading system further comprises an analysis module, wherein the analysis module is specifically used for:

8. A computer comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data reading method according to any one of claims 1 to 5 when executing the computer program.

9. A readable storage medium having stored thereon a computer program, which when executed by a processor implements a data reading method according to any of claims 1 to 5.