CN117648920A - Method, device, computer equipment and storage medium for processing research report data - Google Patents
- Publication number
- CN117648920A (application CN202311736057.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present application relates to a method, apparatus, computer device, storage medium and computer program product for processing research report data. The method comprises the following steps: acquiring target report data based on a scheduled task; analyzing the target report data to obtain an analysis result of the target report data; querying subscription information of the target report data; and outputting the analysis result of the target report data according to that subscription information. The method improves the efficiency and accuracy of processing research report data.
Description
Technical Field
The present invention relates to the field of computer technology, and in particular to a method, an apparatus, a computer device, a storage medium and a computer program product for processing research report data.
Background
Currently, in the field of research report management, key information such as industry analysis and report abstracts is produced by manually downloading the original reports sent by research report suppliers and then analyzing them by hand.
This manual approach is inefficient: staff must log in to the mailbox every day, download attachments one by one, sort the mails, and read each report to extract its key information.
As a result, research report data is processed with low efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, a computer-readable storage medium and a computer program product for processing research report data that can improve processing efficiency.
In a first aspect, the present application provides a method for processing research report data, including:
acquiring target report data based on a scheduled task;
analyzing the target report data to obtain an analysis result of the target report data;
querying subscription information of the target report data;
and outputting the analysis result of the target report data according to the subscription information of the target report data.
In one embodiment, acquiring target report data based on a scheduled task includes:
acquiring initial mail attachment information based on the scheduled task;
preprocessing the initial mail attachment information to obtain target mail attachment information;
and analyzing the target mail attachment information to obtain target report data.
In one embodiment, analyzing the target report data to obtain the analysis result of the target report data includes:
converting the target report data into an image to be recognized;
recognizing the image to be recognized by optical character recognition (OCR) to obtain an initial text;
and inputting the initial text into an analysis model to obtain an analysis result of the target report data output by the analysis model.
In one embodiment, recognizing the image to be recognized by optical character recognition to obtain the initial text includes:
determining a corresponding research report template according to the supplier identifier of the target report data;
determining each position to be recognized in the image based on the research report template;
and recognizing the content at each position to be recognized by optical character recognition to obtain an initial text corresponding to each position.
In one embodiment, after the target report data is analyzed to obtain its analysis result, the method further includes:
storing the analysis result of the target report data to a message queue;
outputting the analysis result of the target report data according to the subscription information of the target report data then includes:
reading the subscription information of the target report data, and outputting the analysis result of the target report data from the message queue based on that subscription information.
In one embodiment, before the target report data is analyzed to obtain its analysis result, the method further includes:
storing the target report data in a list;
adjusting the target report data in the list in response to an operation instruction, the operation instruction including at least one of adding, deleting, modifying and querying.
In a second aspect, the present application further provides an apparatus for processing research report data, including:
an acquisition module, configured to acquire target report data based on a scheduled task;
an analysis module, configured to analyze the target report data to obtain an analysis result of the target report data;
a query module, configured to query subscription information of the target report data;
and an output module, configured to output the analysis result of the target report data according to the subscription information of the target report data.
In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above-described method.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
The above method, apparatus, computer device, storage medium and computer program product for processing research report data acquire target report data based on a scheduled task, analyze the target report data to obtain its analysis result, query the subscription information of the target report data, and output the analysis result according to that subscription information. Because the target report data acquired by the scheduled task is analyzed and the analysis result is pushed to the devices that subscribe to the target report data, the low efficiency of manual processing is avoided and the efficiency of processing the target report data is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a diagram of an application environment for a method of processing research report data according to one embodiment;
FIG. 2 is a flowchart of a method for processing research report data according to one embodiment;
FIG. 3 is a flowchart of acquiring target report data based on a scheduled task in one embodiment;
FIG. 4 is a flowchart of analyzing target report data in one embodiment;
FIG. 5 is a flowchart of recognizing an image to be recognized by optical character recognition to obtain an initial text in one embodiment;
FIG. 6 is a flowchart of the steps performed before the target report data is analyzed to obtain its analysis result, in another embodiment;
FIG. 7 is a flowchart of a method for processing research report data according to one embodiment;
FIG. 8 is a schematic diagram of the interaction flow of the mail acquisition and subscription distribution module in one embodiment;
FIG. 9 is a schematic diagram of the interaction flow of the research report management module in one embodiment;
FIG. 10 is a schematic diagram of the interaction flow of the research report data parsing module in one embodiment;
FIG. 11 is a structural block diagram of an apparatus for processing research report data in one embodiment;
FIG. 12 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The method for processing research report data provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the user device 102 communicates with the server 104 via a network. A mail server may be integrated into the server 104, or may be located in the cloud or on another network server. The server acquires target report data based on a scheduled task, analyzes the target report data to obtain its analysis result, queries the subscription information of the target report data, and outputs the analysis result according to that subscription information. The user device 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device or portable wearable device; the Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device or the like, and the portable wearable device may be a smart watch, smart bracelet, headset or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an exemplary embodiment, as shown in FIG. 2, a method for processing research report data is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps S202 to S208.
Step S202, acquiring target report data based on a scheduled task.
The research report data may be in Portable Document Format (PDF), XML format or Word format.
Optionally, the server configures a scheduled acquisition task in the research report data processing system using a task scheduling framework such as Quartz. Through the Quartz framework the task runs at fixed intervals, for example every 24 hours or every week, and acquires target report data in PDF format from sources such as the mail system, application programs, web pages and applets.
Optionally, before acquiring the target report data, the server configures the Quartz scheduling framework in the research report data processing system in advance and uses it to decide whether to initiate the acquisition task. When an acquisition task instruction is received, that is, when the acquisition task needs to be initiated, the server acquires the target report data from the mail system, application programs, web pages, applets and the like at fixed time intervals.
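The fixed-interval acquisition described above can be illustrated with a small scheduler. This is a minimal Python sketch under stated assumptions, not Quartz itself: in Quartz the same behaviour would come from a job bound to a repeating trigger, whose Java API is not reproduced here. The names `fire_times` and `run_fixed_interval` are hypothetical, and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, List


def fire_times(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the next `count` fire times of a fixed-interval trigger after `start`."""
    return [start + interval * i for i in range(1, count + 1)]


def run_fixed_interval(job: Callable[[], None], interval_s: float, runs: int,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Wait `interval_s` seconds, then run `job`; repeat `runs` times.

    `job` stands in for a hypothetical acquisition step that would pull
    PDF reports from the mail system, web pages and so on.
    """
    for _ in range(runs):
        sleep(interval_s)
        job()
```

With `interval_s=24 * 3600` this corresponds to the "every 24 hours" example. A production system would normally rely on a scheduling framework rather than a sleeping loop, so that missed runs, restarts and concurrent executions are handled.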
Optionally, the server may store the acquired target report data in a storage database through the research report data processing system, generate list information, and maintain the target report data by executing instructions that modify, delete, add or query entries in the list.
Further, through an interface of the research report data processing system, the server can upload target report data to the database, update the list information, support adding, deleting and querying the target report data, and handle other related requests in the system, such as format conversion requests.
Step S204, analyzing the target report data to obtain an analysis result of the target report data.
The analysis result may contain all of the information obtained by analyzing the target report data, or only part of it, that is, the key information of the target report data.
Optionally, the server analyzes the target report data through the research report data processing system to obtain either all of the parsed information of the target report data or only its key information, such as supplier information, analyst information and report summary information. The analysis may include format conversion, character recognition, text combination and other processing.
Step S206, querying the subscription information of the target report data.
Optionally, before the subscription information is queried, a user device subscribes to the target report data through the research report data processing system, which stores the subscription information of the user device in a subscription database. The server then queries the subscription information of the target report data through the system. The subscription information may include the subscribing user device, the subscription time, the subscription period and the like, and the server can distribute the target report data to every user device that subscribes to it.
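A subscription query of the kind described above can be sketched as a filter over stored subscription records. This Python sketch assumes a hypothetical record layout (`device_id`, `topic`, `start`, `end`); the description only says that subscription information may include the user device, subscription time and subscription period, so the field names and the idea of a "topic" (for example a supplier or industry) are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Subscription:
    device_id: str  # subscribing user device (hypothetical field)
    topic: str      # what is subscribed to, e.g. a supplier (hypothetical field)
    start: date     # first day of the subscription period
    end: date       # last day of the subscription period


def subscribers_for(subs: List[Subscription], topic: str, on: date) -> List[str]:
    """Return the device ids whose subscription covers `topic` on day `on`."""
    return sorted({s.device_id for s in subs
                   if s.topic == topic and s.start <= on <= s.end})
```

In practice the same filter would usually be a database query against the subscription database rather than an in-memory scan.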
Step S208, according to the subscription information of the target report data, the analysis result of the target report data is output.
Optionally, the server outputs the analysis result of the target report data to the user devices that subscribe to it, according to the subscription information, thereby pushing the target report data to those user devices.
Optionally, the server configures a scheduled push task through the Quartz scheduling framework based on the subscription information of the target report data. When the framework determines that the subscription push task needs to be initiated, the analysis result is pushed to the subscribers at a fixed time, for example at 8:00 every morning.
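For the daily push, the next fire time of an "every day at 8:00" schedule can be computed as shown below. This is a hedged Python sketch of what a Quartz cron trigger such as `0 0 8 * * ?` would do; it uses naive local datetimes and ignores time zones and daylight-saving changes, and the function name `next_daily_run` is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(8, 0)) -> datetime:
    """Return the first moment strictly after `now` whose clock time is `at`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

The scheduler would sleep until the returned time, run the push task, and then compute the following fire time again from the new current time.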
Optionally, before outputting the analysis result of the target report data according to the subscription information, the server may store the analysis result in a designated database and generate a list. According to received instructions, the server can add, delete, modify and query the analysis results of the target report data and update the list information accordingly.
The embodiments of the present application do not limit the timing of the scheduled task, the content of the subscription information, or the analysis result of the target report data.
In the above method for processing research report data, target report data is acquired based on a scheduled task and analyzed to obtain its analysis result; the subscription information of the target report data is queried; and the analysis result is output according to that subscription information. Because the analysis result is pushed to the devices that subscribe to the target report data, the low efficiency of manual processing is avoided and the efficiency of processing the target report data is improved.
In an exemplary embodiment, as shown in FIG. 3, acquiring target report data based on the scheduled task includes steps S302 to S306.
Step S302, acquiring initial mail attachment information based on the scheduled task.
Optionally, at each fixed time, the server uses the Quartz scheduling framework together with the mailbox account and authorization code to obtain initial mail attachment information from the mail system, either in a single thread or in multiple threads.
Further, the server may set conditions on the scheduled acquisition task, for example acquiring only the mails sent by one or more specified suppliers; conversely, by excluding specified suppliers, it can obtain the mail attachments of all other suppliers.
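The attachment acquisition and supplier conditions above can be sketched with Python's standard `email` package. This sketch assumes the raw messages have already been downloaded (for example over IMAP with the mailbox account and authorization code, which is not shown) and assumes suppliers are identified by sender address; `sender_allowed`, `pdf_parts` and `collect_pdf_attachments` are hypothetical names.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr
from typing import FrozenSet, Iterable, List, Optional, Tuple


def sender_allowed(sender: str, include: Optional[FrozenSet[str]],
                   exclude: FrozenSet[str]) -> bool:
    """Excluded senders are dropped; if `include` is given, only its senders pass."""
    address = parseaddr(sender)[1].lower()
    if address in exclude:
        return False
    return include is None or address in include


def pdf_parts(msg: EmailMessage) -> List[Tuple[str, bytes]]:
    """Return (filename, bytes) for every PDF attachment of one message."""
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            found.append((part.get_filename() or "unnamed.pdf", part.get_content()))
    return found


def collect_pdf_attachments(raw_messages: Iterable[bytes],
                            include: Optional[FrozenSet[str]] = None,
                            exclude: FrozenSet[str] = frozenset()) -> List[Tuple[str, bytes]]:
    """Parse raw messages and gather PDF attachments from permitted senders."""
    collected = []
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        if sender_allowed(str(msg["From"] or ""), include, exclude):
            collected.extend(pdf_parts(msg))
    return collected
```

Passing an `include` set models "only these suppliers", while an `exclude` set models the reverse selection of "all suppliers except these".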
Step S304, preprocessing the initial mail attachment information to obtain target mail attachment information.
Optionally, the server preprocesses the acquired initial mail attachment information through the research report data processing system. The preprocessing includes removing web page tags and illegal symbols from the mails, keeping only mails that contain Chinese text, performing sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing on the mail content, and storing the extracted results in a database.
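The first preprocessing steps above (tag removal, illegal-symbol removal, Chinese-text screening and sentence segmentation) can be sketched with regular expressions. This Python sketch makes two assumptions: "illegal symbols" are taken to mean control characters, and "Chinese text" means characters in the basic CJK Unified Ideographs block. Word segmentation, part-of-speech tagging and dependency parsing would need a Chinese NLP toolkit and are not shown.

```python
import html
import re
from typing import List

TAG_RE = re.compile(r"<[^>]+>")                                # HTML tags
ILLEGAL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")   # control chars (assumed "illegal")
CJK_RE = re.compile(r"[\u4e00-\u9fff]")                        # basic CJK ideographs
SENTENCE_RE = re.compile(r"(?<=[。！？!?])")                    # split after sentence-final marks


def clean(text: str) -> str:
    """Strip tags, decode HTML entities and drop control characters."""
    text = html.unescape(TAG_RE.sub("", text))
    return ILLEGAL_RE.sub("", text).strip()


def has_chinese(text: str) -> bool:
    """Screening rule: keep only mails that contain at least one CJK character."""
    return bool(CJK_RE.search(text))


def split_sentences(text: str) -> List[str]:
    """Split on Chinese/ASCII sentence-final punctuation, keeping the mark."""
    return [s.strip() for s in SENTENCE_RE.split(text) if s.strip()]
```

The ASCII full stop is deliberately not a split point, since it also appears in numbers such as "3.5%" in report text. Splitting on a zero-width lookbehind requires Python 3.7 or later.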
Step S306, analyzing the target mail attachment information to obtain target report data.
Optionally, the server parses the preprocessed target mail attachment information in multiple threads. For example, attachment information that the mail system presents in HyperText Markup Language (HTML) is resolved via the HyperText Transfer Protocol (HTTP) into target research report data in PDF format.
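Multi-threaded parsing of the attachments can be sketched with a thread pool from the standard library. The per-attachment step here, `inspect_attachment`, is a hypothetical placeholder that only checks the PDF signature and size; a real step would fetch and decode the report.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def inspect_attachment(data: bytes) -> Dict[str, object]:
    """Hypothetical parse step: check the PDF signature and record the size."""
    return {"is_pdf": data.startswith(b"%PDF-"), "size": len(data)}


def parse_in_parallel(attachments: List[bytes],
                      parse: Callable[[bytes], Dict[str, object]] = inspect_attachment,
                      workers: int = 4) -> List[Dict[str, object]]:
    """Parse attachments on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse, attachments))
```

Threads pay off mainly when the step is I/O-bound, such as downloading over HTTP; for CPU-heavy parsing in CPython, a process pool is usually the better choice because of the global interpreter lock.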
In this embodiment, the initial mail attachments are preprocessed to obtain the target mail attachments, which are then parsed to obtain the target report data. This reduces the amount of report data to be processed and screens out illegal report data in advance, improving the processing efficiency of research report data. In addition, parsing the target mail attachment information in multiple threads further improves processing efficiency.
In an exemplary embodiment, as shown in FIG. 4, analyzing the target report data to obtain its analysis result includes steps S402 to S406.
Step S402, converting the target report data into an image to be recognized.
Optionally, the server converts the target report data, for example a PDF document, into image format, so that the target research report document becomes the image to be recognized.
Step S404, recognizing the image to be recognized by optical character recognition to obtain an initial text.
Optical character recognition (OCR) extracts text information from an image and generally consists of text detection followed by text recognition.
Optionally, the server detects and recognizes the text in the image to be recognized using OCR and extracts the recognized text, thereby obtaining the initial text of the target report data. The initial text may include all character strings of the research report, or only specified character strings of the target report data. A specified character string may be the string at a specified position. For example, a coordinate system is defined with the lower-left corner of the page as the origin, the bottom edge of the page as the horizontal axis and the left edge of the page as the vertical axis, with distances measured in millimetres; the region bounded by the upper-left corner $(X_1, Y_1)$, upper-right corner $(X_2, Y_2)$, lower-right corner $(X_3, Y_3)$ and lower-left corner $(X_4, Y_4)$ is taken as the specified region. A specified string may also be a string of specified content, such as the analyst, report title, industry commentary, industry performance, risk warning or viewpoint summary.
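To OCR a region given in page millimetres, it has to be mapped onto the pixels of the rendered image. Assuming the page is rendered at $d$ dots per inch and has height $H$ mm, a point $(X, Y)$ maps to column $X \cdot d / 25.4$ and row $(H - Y) \cdot d / 25.4$; the row is flipped because image coordinates start at the top-left corner while the page coordinates above start at the bottom-left. The Python sketch below (the function name is hypothetical) applies this to the four corners and returns an axis-aligned crop box, assuming the region is a rectangle.

```python
from typing import Sequence, Tuple

MM_PER_INCH = 25.4


def mm_region_to_pixels(corners_mm: Sequence[Tuple[float, float]],
                        page_height_mm: float, dpi: int) -> Tuple[int, int, int, int]:
    """Convert (X, Y) corners in mm (origin at the page's lower-left corner)
    into a (left, top, right, bottom) pixel box (origin at the image's top-left)."""
    scale = dpi / MM_PER_INCH  # pixels per millimetre
    xs = [x for x, _ in corners_mm]
    ys = [y for _, y in corners_mm]
    left = round(min(xs) * scale)
    right = round(max(xs) * scale)
    top = round((page_height_mm - max(ys)) * scale)      # highest Y is nearest the top
    bottom = round((page_height_mm - min(ys)) * scale)
    return left, top, right, bottom
```

For an A4 page (297 mm high) rendered at 254 dpi, that is 10 pixels per millimetre, so the corners (20, 277), (190, 277), (190, 257), (20, 257) map to the box (200, 200, 1900, 400).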
Step S406, inputting the initial text into the analysis model to obtain the analysis result of the target report data output by the analysis model.
Optionally, the server inputs the initial text into an analysis model that has been trained in advance using machine learning, such as a neural network model or a support vector machine model. The analysis model performs semantic analysis on the input initial text and outputs an analysis result based on it; the analysis result may include the analyst, report title, industry commentary, industry performance, risk warning, viewpoint summary and other content of the target report data.
Further, the server can periodically export records to a table file, in which the analysis result of each target report data forms one record, and can display the records of the table file.
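Exporting one record per analyzed report to a table file can be sketched with the standard `csv` module. The column names below are hypothetical, chosen to mirror the fields listed above; fields missing from a result are written as empty cells, and unexpected keys are ignored.

```python
import csv
import io
from typing import Dict, List

# Hypothetical columns mirroring the analysis-result fields named in the text.
FIELDS = ["analyst", "report_title", "industry_commentary", "risk_warning", "viewpoint_summary"]


def export_records(results: List[Dict[str, str]], fields: List[str] = FIELDS) -> str:
    """Render analysis results as CSV text, one row per target report."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()
```

A scheduled job could write the returned text to a dated file (opened with `newline=""`, as the `csv` documentation recommends) for later display.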
In this embodiment, converting, recognizing and model-analyzing the target report data avoids the subjectivity and personal bias of manual analysis and improves the objectivity and accuracy of the analysis result of the target report data.
In an exemplary embodiment, as shown in FIG. 5, recognizing the image to be recognized by optical character recognition to obtain the initial text includes steps S502 to S506.
Step S502, determining the corresponding research report template according to the supplier identifier of the target report data.
Optionally, each supplier of target report data has its own research report template, which records the location of each content item, for example the location of the "analyst" text, the "report title" text and the "viewpoint summary" text. The server stores the supplier identifier of each target report data together with the corresponding research report template in the database, and determines the template for the current target report data from the database according to its supplier identifier.
Step S504, determining each position to be recognized in the image based on the research report template.
Optionally, the server determines each position to be recognized in the image based on the position information in the research report template, such as the location of the "analyst" text, the "report title" text and the "viewpoint summary" text.
Step S506, recognizing the content at each position to be recognized by optical character recognition to obtain an initial text corresponding to each position.
Optionally, the server recognizes the content at each position to be recognized using OCR to obtain the corresponding initial text.
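Steps S502 to S506 can be sketched as a template lookup followed by OCR of only the listed regions. In this Python sketch the template table, its field names and pixel boxes are hypothetical, and the OCR engine is passed in as a callable `ocr(image, box)`, so any engine that can read a cropped region could be plugged in; none is assumed here.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical templates: supplier identifier -> field name -> region to recognize.
TEMPLATES: Dict[str, Dict[str, Box]] = {
    "supplier-a": {
        "analyst": (100, 80, 600, 120),
        "report_title": (120, 150, 1800, 220),
        "viewpoint_summary": (120, 300, 1800, 900),
    },
}


def recognize_by_template(supplier_id: str, image: object,
                          ocr: Callable[[object, Box], str],
                          templates: Dict[str, Dict[str, Box]] = TEMPLATES) -> Dict[str, str]:
    """Look up the supplier's template (S502), take its positions (S504)
    and OCR only those regions (S506)."""
    template = templates.get(supplier_id)
    if template is None:
        raise KeyError(f"no research report template for supplier {supplier_id!r}")
    return {name: ocr(image, box).strip() for name, box in template.items()}
```

Raising an error for an unknown supplier makes the missing-template case explicit; a system could instead fall back to full-page OCR.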
In this embodiment, research report templates are set for different supplier identifiers to determine each position to be recognized in the image. With the template of each supplier, the server can quickly locate the regions to recognize, which avoids recognizing unnecessary areas and improves recognition efficiency; and because the positions are taken from the template, recognition accuracy is also improved.
In an exemplary embodiment, after the target report data is analyzed to obtain its analysis result, the method further includes: storing the analysis result of the target report data in a message queue. Outputting the analysis result of the target report data according to the subscription information then includes: reading the subscription information of the target report data, and outputting the analysis result from the message queue based on that subscription information.
Optionally, after analyzing the target report data to obtain its analysis result, the server stores the analysis result in a message queue, generates an analysis result list from the results stored in the queue, and adds, deletes and queries entries in that list according to instructions. The server reads the subscription information of the target report data and uses the Quartz scheduling framework to decide whether to initiate a subscription push task. When the push task needs to be initiated, at a fixed time such as 8:00 every morning, the server takes the analysis results from the message queue and outputs them to the user devices that subscribe to the target report data, thereby pushing the target report data to those user devices.
In this embodiment, storing the analysis result in a message queue and outputting it from the queue according to the subscription information spreads the pushing of target report data over a period of time, which prevents the system from being overloaded. Moreover, target report data is pushed only to the user devices that subscribe to it, which reduces server resource consumption.
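The queue-then-push flow can be sketched with Python's in-process `queue.Queue` standing in for the message queue; the description does not name a specific message-queue product. The result keys `topic` and `title` and the subscription mapping are hypothetical, and delivery is recorded in an "outbox" dictionary instead of being sent over the network.

```python
import queue
from typing import Dict, List


def push_from_queue(results: "queue.Queue[Dict[str, str]]",
                    subscriptions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drain the queue and deliver each result to the devices subscribed to its topic.

    Returns device id -> titles delivered; results nobody subscribes to are dropped.
    """
    outbox: Dict[str, List[str]] = {}
    while True:
        try:
            result = results.get_nowait()
        except queue.Empty:
            break
        for device in subscriptions.get(result["topic"], []):
            outbox.setdefault(device, []).append(result["title"])
        results.task_done()
    return outbox
```

Because producers only enqueue and the scheduled push job drains the queue, analysis and delivery are decoupled; a real deployment would typically use a persistent broker so queued results survive restarts.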
In an exemplary embodiment, as shown in fig. 6, the method further includes steps S602 to S604 before the target report data is parsed to obtain its analysis result, wherein:
Step S602: store the target report data in a list.
Optionally, the server stores the target report data in a database, and generates a list according to the target report data stored in the database. The server may also present the list information.
Step S604: adjust the target report data in the list in response to an operation instruction, where the operation instruction includes at least one of adding, deleting, modifying, and querying.
Optionally, the server receives operation instructions such as add, delete, modify, and query instructions and performs the corresponding operations on the target report data in the list, thereby maintaining and managing the target report data.
Further, the server can upload target report data to the database through an interface and update the list information, supports adding, deleting, and querying target report data, and can also handle related requests sent by other execution bodies, such as format conversion requests.
In this embodiment, adjusting the target report data in the list through operation instructions enables standardized management of the target report data, so that historical report documents can be managed efficiently.
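The list maintenance in steps S602 to S604 can be sketched as a small class that applies add, delete, modify, and query instructions. This is a minimal in-memory illustration: the `ReportList` class and its `apply` method are hypothetical, and the description itself keeps the list in a database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ReportList:
    """In-memory stand-in for the database-backed report list (hypothetical)."""

    items: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def apply(self, op: str, report_id: str,
              data: Optional[Dict[str, Any]] = None) -> Optional[Dict[str, Any]]:
        """Apply one add/delete/modify/query instruction to the list."""
        if op == "add":
            if report_id in self.items:
                raise KeyError(f"report {report_id!r} already exists")
            self.items[report_id] = dict(data or {})
        elif op == "delete":
            self.items.pop(report_id)  # KeyError if the report is unknown
        elif op == "modify":
            self.items[report_id].update(data or {})  # KeyError if unknown
        elif op == "query":
            return self.items.get(report_id)  # None if the report is unknown
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        return None
```

Rejecting unknown operations with an error, rather than ignoring them, keeps malformed instructions from silently leaving the list unchanged.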
In an exemplary embodiment, the servers specifically include a research report server and a parsing source server. The research report server may further include a mail scheduling and distribution server and a research report management center; the parsing source server may further include a research report parsing platform, an OCR server, and a parsing model server.
As shown in fig. 7, this embodiment involves interactions among user equipment, a mailbox server, and the servers, where the servers include the research report server and the parsing source server. Specifically, the embodiment involves three modules: a mail acquisition and subscription distribution module, a research report management module, and a research report parsing module.
The mail scheduling and distribution server periodically fetches and sorts mail from the mailbox server and processes the mail bodies and the research report attachments. The research report server sends a research report data processing request to the parsing source server, and the parsing source server returns the analysis result of the report data. The user equipment queries the report data list and the subscription information; the research report server processes the analysis results and the subscription information and pushes the analysis results of report data in different categories to the mailbox server, which sends them to the user equipment subscribed to the corresponding report data.
As shown in fig. 8, the mail acquisition and subscription distribution module involves user equipment, a mailbox server, and servers, where the servers include the mail scheduling and distribution server and the research report management center.
The mail scheduling and distribution server uses the Quartz task scheduling framework to determine whether to initiate an acquisition task and, when an acquisition task is due, fetches initial mail attachment information from the mailbox server. The mailbox server returns the mail body and the initial mail attachment information, and the mail scheduling and distribution server preprocesses the initial mail attachment information. Preprocessing includes removing web page tags and illegal symbols from the mail, selecting the mails that contain Chinese text, performing sentence segmentation, word segmentation, part-of-speech tagging, and dependency parsing on the mail content, and storing the extracted mail results in a database, which yields valid target mail attachment information. The mail scheduling and distribution server then parses the target mail attachment information to obtain research report data and sends the research report data in PDF format to the research report management center. For subscriptions, the user equipment sends industry subscription information to the mail scheduling and distribution server, which uses the Quartz task scheduling framework to determine at regular intervals whether to initiate a subscription push task. When a subscription push task is due, the mail scheduling and distribution server sends the subscription information to the research report management center; the research report management center returns the analysis results of the research report data to the mail scheduling and distribution server, which sends them to the mailbox server, and the mailbox server delivers them to the user equipment associated with the subscription information.
As shown in fig. 9, the research report management module involves user equipment and servers, where the servers include the mail scheduling and distribution server, the research report management center, and the research report parsing platform.
The mail scheduling and distribution server uploads the research report data in PDF format to the research report management center. The research report management center maintains research report data tasks and sends the PDF research report data to the research report parsing platform, which returns the analysis result to the research report management center. The research report management center processes the research report data information list, and the user equipment can perform routine maintenance of the research report data in the research report management center, such as deletion and modification, by sending instructions. The research report management center pushes the analysis results to the mail scheduling and distribution server.
As shown in fig. 10, the research report parsing module involves servers, including the research report management center, the research report parsing platform, the OCR server, and the parsing model server.
The research report management center sends the research report data in PDF format to the research report parsing platform. The parsing platform receives the PDF data, converts it into image format, and sends the images to the OCR server, which performs OCR and returns the OCR recognition information. The parsing platform inputs the returned OCR recognition information into the parsing model server, which returns text summary information to the parsing platform, and the parsing platform then reports the analysis result to the research report management center.
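The first preprocessing steps (removing web page tags and illegal symbols, keeping only mails with Chinese text, and sentence segmentation) can be sketched with regular expressions. This is a minimal illustration: the set of characters treated as "illegal" is an assumption, and word segmentation, part-of-speech tagging, and dependency parsing are omitted because they need a dedicated Chinese NLP toolkit.

```python
from __future__ import annotations

import html
import re
from typing import List, Optional

TAG_RE = re.compile(r"<[^>]+>")
CHINESE_RE = re.compile(r"[\u4e00-\u9fff]")
# Assumption: anything other than CJK ideographs, ASCII letters and digits,
# whitespace and common punctuation counts as an "illegal symbol".
ILLEGAL_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？；：、,.!?;:%()（）-]")
# A sentence is a run of text ending in a Chinese or ASCII terminator.
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]?")


def preprocess_mail(body: str) -> Optional[List[str]]:
    """Strip tags and illegal symbols; return sentences, or None if the
    mail contains no Chinese text and should be screened out."""
    text = html.unescape(TAG_RE.sub("", body))  # drop web page tags
    text = ILLEGAL_RE.sub("", text)            # drop illegal symbols
    if not CHINESE_RE.search(text):
        return None
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]
```

For example, `preprocess_mail("<p>新能源行业周报。</p>")` returns `["新能源行业周报。"]`, while a mail with only English text returns `None`.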
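The parsing flow (PDF to page images, OCR on each image, then the analysis model over the combined text) can be sketched as a small pipeline with pluggable components. The component signatures and the `ReportParsingPlatform` class are hypothetical; in practice each callable would wrap a PDF renderer, a client for the OCR server, and a client for the parsing model server.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List

# Hypothetical component signatures standing in for a PDF renderer,
# the OCR server and the parsing model server.
PdfToImages = Callable[[bytes], List[bytes]]
OcrEngine = Callable[[bytes], str]
SummaryModel = Callable[[str], str]


@dataclass
class ReportParsingPlatform:
    pdf_to_images: PdfToImages
    ocr: OcrEngine
    model: SummaryModel

    def parse(self, pdf: bytes) -> str:
        pages = self.pdf_to_images(pdf)             # PDF -> page images
        texts = [self.ocr(page) for page in pages]  # OCR recognition info
        initial_text = "\n".join(texts)
        return self.model(initial_text)             # text summary = result
```

Injecting the three components keeps the orchestration testable with stubs, for example `ReportParsingPlatform(lambda pdf: pdf.split(b"|"), lambda img: img.decode(), str.upper)`.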
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a research report data processing apparatus for implementing the research report data processing method described above. The solution implemented by the apparatus is similar to that described for the method, so for the specific limitations of the one or more embodiments of the research report data processing apparatus below, reference may be made to the limitations of the research report data processing method above; details are not repeated here.
In an exemplary embodiment, as shown in fig. 11, there is provided a research report data processing apparatus, including: an acquisition module 1101, a parsing module 1102, a query module 1103 and an output module 1104, wherein:
The acquisition module 1101 is configured to acquire target report data based on a timed task.
The parsing module 1102 is configured to parse the target report data to obtain an analysis result of the target report data.
The query module 1103 is configured to query subscription information of the target report data.
The output module 1104 is configured to output the analysis result of the target report data according to the subscription information of the target report data.
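The cooperation of the four modules in one timed-task cycle can be sketched as a single function that takes each module as a callable. This is an illustrative composition only: the function name, the use of plain dictionaries for report data, and the callable signatures are assumptions.

```python
from __future__ import annotations

from typing import Callable, Iterable, List


def run_scheduled_cycle(
    acquire: Callable[[], Iterable[dict]],
    parse: Callable[[dict], str],
    query_subscribers: Callable[[dict], List[str]],
    output: Callable[[str, str], None],
) -> int:
    """One timed-task cycle: acquire -> parse -> query -> output.

    Returns the number of deliveries made.
    """
    deliveries = 0
    for report in acquire():                       # acquisition module 1101
        result = parse(report)                     # parsing module 1102
        for user_id in query_subscribers(report):  # query module 1103
            output(user_id, result)                # output module 1104
            deliveries += 1
    return deliveries
```

A report with no subscribers is still parsed but produces no deliveries; an implementation could skip parsing in that case, at the cost of querying subscriptions first.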
In an exemplary embodiment, the acquisition module 1101 includes:
an acquisition unit, configured to acquire initial mail attachment information based on the timed task;
a preprocessing unit, configured to preprocess the initial mail attachment information to obtain target mail attachment information; and
a parsing unit, configured to parse the target mail attachment information to obtain the target report data.
In an exemplary embodiment, the parsing module 1102 includes:
a conversion unit, configured to convert the target report data into a picture to be recognized;
an initial text determining unit, configured to recognize the picture to be recognized through optical character recognition to obtain an initial text; and
a result acquisition unit, configured to input the initial text into an analysis model and acquire the analysis result of the target report data output by the analysis model.
In an exemplary embodiment, the initial text determining unit further includes:
a report template determining subunit, configured to determine the corresponding research report template according to the supplier identifier of the target report data;
a position determining subunit, configured to determine each position to be recognized in the picture to be recognized based on the research report template; and
an initial text determining subunit, configured to recognize the data at each position to be recognized through optical character recognition to obtain the initial text corresponding to each position to be recognized.
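Template-based recognition (supplier identifier to template, template to regions, OCR on each region) can be sketched as follows. The template contents, supplier identifiers, and pixel boxes are invented for illustration, and `crop` and `ocr` are injected so that any imaging and OCR backend can be used.

```python
from __future__ import annotations

from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical templates keyed by supplier identifier; each template names
# the regions of a report page that should be recognized.
TEMPLATES: Dict[str, Dict[str, Box]] = {
    "supplier_a": {"title": (0, 0, 800, 120), "rating": (600, 120, 800, 200)},
    "supplier_b": {"title": (0, 40, 800, 160), "summary": (0, 160, 800, 600)},
}


def recognize_by_template(
    supplier_id: str,
    image: object,
    crop: Callable[[object, Box], object],
    ocr: Callable[[object], str],
) -> Dict[str, str]:
    """Return the OCR text for every region defined by the supplier's template."""
    template = TEMPLATES[supplier_id]  # KeyError for an unknown supplier
    return {name: ocr(crop(image, box)) for name, box in template.items()}
```

With Pillow, `crop` could be `lambda img, box: img.crop(box)`, since `Image.crop` takes a (left, upper, right, lower) box; recognizing only the template regions avoids running OCR over irrelevant parts of the page.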
In an exemplary embodiment, the research report data processing apparatus further includes:
a first storage module, configured to store the analysis result of the target report data in a message queue.
The output module 1104 includes:
a read-and-output unit, configured to read the subscription information of the target report data and output the analysis result of the target report data from the message queue based on that subscription information.
In an exemplary embodiment, the research report data processing apparatus further includes:
a second storage module, configured to store the target report data in a list; and
an adjusting module, configured to adjust the target report data in the list in response to an operation instruction, where the operation instruction includes at least one of adding, deleting, modifying, and querying.
Each module in the above research report data processing apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
In an exemplary embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store research report data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements a research report data processing method.
It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the procedures in the above methods may be implemented by a computer program stored in a non-volatile computer-readable storage medium; when executed, the computer program may perform the steps of the above method embodiments. Any reference to memory, database, or other media in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments represent only several implementations of the present application and are described in relatively specific detail, but they shall not be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (10)
1. A research report data processing method, the method comprising:
acquiring target report data based on a timed task;
analyzing the target report data to obtain an analysis result of the target report data;
inquiring subscription information of the target report data;
and outputting the analysis result of the target report data according to the subscription information of the target report data.
2. The method of claim 1, wherein the acquiring target report data based on the timed task comprises:
acquiring initial mail attachment information based on the timing task;
preprocessing the initial mail attachment information to obtain target mail attachment information;
and analyzing the target mail attachment information to obtain the target report data.
3. The method of claim 1, wherein the parsing the target report data to obtain a parsing result of the target report data comprises:
converting the target report data into a picture to be recognized;
recognizing the picture to be recognized through optical character recognition to obtain an initial text;
and inputting the initial text into an analysis model to obtain an analysis result of the target report data output by the analysis model.
4. A method according to claim 3, wherein said parsing the picture to be recognized by optical character recognition technology to obtain an initial text comprises:
determining a corresponding report template according to the supplier identification of the target report data;
determining each position to be identified of the picture to be identified based on the report template;
and recognizing the data of each position to be recognized by the optical character recognition technology to obtain an initial text corresponding to each position to be recognized.
5. The method according to claim 1, wherein after the parsing the target report data to obtain the analysis result of the target report data, the method further comprises:
storing the analysis result of the target report data in a message queue;
the outputting the analysis result of the target report data according to the subscription information of the target report data comprises the following steps:
reading the subscription information of the target report data, and outputting the analysis result of the target report data from the message queue based on the subscription information of the target report data.
6. The method according to any one of claims 1 to 5, wherein before the analyzing the target report data to obtain the analysis result of the target report data, the method further comprises:
storing the target report data into a list;
adjusting the target report data in the list in response to an operation instruction, the operation instruction including at least one of adding, deleting, modifying, and querying.
7. A research report data processing device, the device comprising:
an acquisition module, configured to acquire target report data based on a timed task;
a parsing module, configured to parse the target report data to obtain an analysis result of the target report data;
a query module, configured to query subscription information of the target report data;
and the output module is used for outputting the analysis result of the target report data according to the subscription information of the target report data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311736057.3A (published as CN117648920A) | 2023-12-18 | 2023-12-18 | Method, device, computer equipment and storage medium for processing research report data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117648920A | 2024-03-05 |
Family ID: 90049441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311736057.3A (Pending; published as CN117648920A) | Method, device, computer equipment and storage medium for processing research report data | 2023-12-18 | 2023-12-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117648920A |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |