CN111782916A - Method and device for generating service information report - Google Patents


Info

Publication number
CN111782916A
Authority
CN
China
Prior art keywords
information
service
report
business
service information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010842237.XA
Other languages
Chinese (zh)
Other versions
CN111782916B (en
Inventor
苏豫陇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010842237.XA
Publication of CN111782916A
Application granted
Publication of CN111782916B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • G06F16/9538Presentation of query results
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of this specification provide a method and a device for generating a service information report. In the method, a generation request for the service information report is received, the generation request including a report subject of the service information report. Report configuration information of the service information report is determined according to the generation request, the report configuration information including a report template and an information source address list. A web crawler crawls service information according to the information source address list, target service information is determined from the crawled service information, and the service information report is generated according to the target service information and the report template.

Description

Method and device for generating service information report
Technical Field
The embodiment of the specification relates to the technical field of computer networks, in particular to a method and a device for generating a service information report.
Background
The internet is a source of all kinds of information, and practitioners in every industry can search it for the information they care about. Some practitioners need to follow information about their industry on a daily basis in order to keep up with industry developments. Business information reports are produced to make it easier for such personnel to obtain industry information. A service information report is a report obtained by collecting publicly available information through internet channels and then screening and summarizing it by service type. The information presented in a service information report is curated key, trending, and up-to-date industry information, so the relevant personnel can obtain the information they care about more directly and conveniently from the report, without searching through massive amounts of internet information.
Disclosure of Invention
In view of the above, the present specification provides a method and an apparatus for generating a service information report. In the method, in response to a request for generating a service information report, report configuration information of the service information report is determined according to a report subject of the service information report, then a web crawler is used to crawl service information according to an information source address list, target service information is determined from the crawled service information, and the service information report is generated according to the target service information and a report template. The method can generate the corresponding service information report directly, which improves report generation efficiency. In addition, the target service information obtained according to the corresponding report configuration information is more relevant to the service information report, so the information content presented by the generated report is of higher quality.
According to an aspect of an embodiment of the present specification, there is provided a method for generating a service information report, including: receiving a generation request of the business information report, wherein the generation request comprises a report subject of the business information report; determining report configuration information of the service information report according to the generation request of the service information report, wherein the report configuration information comprises a report template and an information source address list; crawling service information according to the information source address list by using a web crawler; determining target business information from the crawled business information; and generating the service information report according to the target service information and the report template.
Optionally, in one example of the above aspect, determining the target business information from the crawled business information comprises: sorting the crawled service information; and determining the target service information according to the sorting result of the service information.
Optionally, in an example of the above aspect, before sorting the crawled business information, the method further comprises: performing information screening processing or information deduplication processing on the crawled service information.
Optionally, in an example of the foregoing aspect, the report configuration information further includes keywords and/or a logical combination of the keywords, and performing information screening processing on the crawled business information includes: performing information screening processing on the crawled business information by using the keywords and/or the logical combination of the keywords.
Optionally, in one example of the above aspect, determining the target business information from the crawled business information comprises: target business information is determined from the crawled business information based at least in part on the relevancy among the business information.
Optionally, in an example of the above aspect, the report template includes at least two business sections, each of the at least two business sections is for a different business topic of the report topic, and determining the target business information from the crawled business information based at least in part on a correlation between the business information includes: and determining target business information of each business section from the crawled business information at least partially according to the correlation degree between the business information and the business information of other business sections.
Optionally, in an example of the above aspect, the information source address list of each service section is determined according to the service theme of that service section.
Optionally, in an example of the above aspect, further comprising: determining the presentation sequence of each service section in the report template according to the target service information of each service section; and generating the service information report according to the target service information and the report template comprises: and generating the service information report according to the target service information of each service section, the report template and the presentation sequence of each service section in the report template.
Optionally, in an example of the foregoing aspect, determining, according to target service information of each service section, a presentation order of each service section in the report template includes: determining a first association degree between each service section and the report subject and a second association degree between each service section and other service sections according to the target service information of each service section; and determining the presentation sequence of each business section in the report template according to the first relevance and the second relevance of each business section.
According to another aspect of the embodiments of the present specification, there is also provided an apparatus for generating a service information report, including: the request receiving unit is used for receiving a generation request of the service information report, and the generation request comprises a report subject of the service information report; the configuration information determining unit is used for determining report configuration information of the service information report according to the generation request of the service information report, wherein the report configuration information comprises a report template and an information source address list; the information crawling unit crawls business information according to the information source address list by using a web crawler; a target service information determining unit for determining target service information from the crawled service information; and a report generating unit for generating the service information report according to the target service information and the report template.
Optionally, in an example of the above aspect, the target service information determining unit: sorts the crawled service information; and determines the target service information according to the sorting result of the service information.
Optionally, in an example of the above aspect, the apparatus further comprises: an information processing unit for performing information screening processing or information deduplication processing on the crawled service information.
Optionally, in an example of the above aspect, the target service information determining unit: target business information is determined from the crawled business information based at least in part on the relevancy among the business information.
Optionally, in an example of the above aspect, the report template includes at least two service sections, each of the at least two service sections is for a different service topic of the report topic, and the target service information determining unit: and determining target business information of each business section from the crawled business information at least partially according to the correlation degree between the business information and the business information of other business sections.
Optionally, in an example of the above aspect, the apparatus further comprises: a section order determining unit for determining the presentation order of each service section in the report template according to the target service information of each service section; and the report generating unit: generates the service information report according to the target service information of each service section, the report template, and the presentation order of each service section in the report template.
Optionally, in one example of the above aspect, the section order determining unit: determines a first association degree between each service section and the report subject and a second association degree between each service section and the other service sections according to the target service information of each service section; and determines the presentation order of each service section in the report template according to the first association degree and the second association degree of each service section.
According to another aspect of embodiments herein, there is also provided an electronic device, including: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method for generating a business information report as described above.
According to another aspect of embodiments herein, there is also provided a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method for generating a business information report as described above.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 is a flowchart illustrating an example of a method for generating a business information report according to an embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating an example of a process for processing business information according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an example of generating a business information report based on a report template according to an embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating an example of a service information report generation apparatus according to an embodiment of the present disclosure.
FIG. 5 shows a block diagram of an electronic device implementing a method for generating a business information report according to an embodiment of the present description.
Detailed Description
The subject matter described herein will be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
FIG. 1 shows a flowchart of one example of a method 100 for generating a business information report, in accordance with embodiments of the present disclosure.
As shown in fig. 1, a request for generation of a service information report may be received at 110.
The business information report may be a report for a report topic, and the business information report gathers various information related to the report topic, and the information is arranged and presented in the business information report according to rules of business type, heat, authority, and the like. Each piece of information in the service information report can be from information published on the internet.
The generation request of the service information report can comprise a report subject of the service information report, the report subject is correspondingly associated with the service information report, the report subject determines the content direction and the related service range of the service information report, and the service information report is used for presenting related information aiming at the report subject. For example, if the service information report to be generated is related to the amortization cost, it may be determined that the report subject of the service information report may be the amortization cost, and further, it may be determined that the service area related to the service information report is the amortization cost.
The reporting topic may be a topic for the entire industry, for example, the reporting topic may be a topic for the financial industry; it may also be a subject for a professional area or branch in the industry, for example, the reporting subject may be a subject for amortization in the financial industry.
In one example, the request for generating the service information report may be generated automatically at a specified time by the apparatus implementing the method 100. The specified time may be a specified time interval or a specified time point. In this case the apparatus needs no user intervention while generating the service information report through the method 100, which makes it more convenient for the user to obtain the report.
In another example, the request for generating the service information report may be sent by a user, and the apparatus implementing the method 100 generates the service information report according to that request. In this example, the user may request a service information report whenever needed (e.g., at any time), which improves the user experience.
Then, at 120, report configuration information of the service information report may be determined according to the generation request of the service information report.
Each service information report corresponds to report configuration information, and the report configuration information corresponding to different service information reports may be different or the same. The report configuration information is used for guiding the generation of corresponding service information reports, and different report configuration information can correspondingly generate different service information reports.
The report configuration information corresponding to each service information report may include a report template and an information source address list for the service information report. The report template is used for determining a report overall frame of the corresponding service information report, and report templates of different service information reports can be different.
For example, a service information report whose report subject is amortization cost may include service information on two service topics, legislation and market. The report template of that report then includes two sections, one for each of the two service topics. A service information report whose report subject is deposit business violations addresses only one service topic, so its report template includes only one section.
The information source address list may include a plurality of information source addresses, which may include web addresses, database addresses, etc. The information source address list of the service information report is the source of the service information report, and the information source address list can be different or the same for different service information reports.
The information source address list of the service information report can be determined according to the report subject of the service information report, and the service information from each information source address in the information source address list is related to the report subject. For example, if the report subject of the business information report is securities finance, the information source address list of the business information report may include a plurality of securities finance portal sites.
Besides the report subject, the information source addresses in the information source address list can also be determined according to factors such as the authority of the media publishing the service information and the volume of user visits. For example, a website that is more relevant to the report topic, whose publishing media is more authoritative, and that receives more visits is more likely to be chosen as an information source address in the list.
In one example, the report configuration information corresponding to each service information report may be stored in a report configuration information base in advance. When a request for generating a service information report is received, the report configuration information corresponding to that report can be obtained from the report configuration information base. The report configuration information base can store the correspondence between report subjects and report configuration information, or the correspondence between report identifiers of service information reports and report configuration information. The corresponding report configuration information can therefore be obtained as long as the generation request includes the report subject or the report identifier. The generation request needs no complex configuration, which also makes it easy for the apparatus to create generation requests automatically.
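The lookup described above can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary as the report configuration information base; the names `ReportConfig`, `CONFIG_BASE`, and `resolve_config`, the request field `report_subject`, and the example URLs are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReportConfig:
    """Report configuration: a report template plus an information source address list."""
    template: str
    source_addresses: List[str] = field(default_factory=list)


# Hypothetical in-memory "report configuration information base",
# keyed by report subject. The entries are illustrative placeholders.
CONFIG_BASE: Dict[str, ReportConfig] = {
    "securities finance": ReportConfig(
        template="single-section",
        source_addresses=[
            "https://example.com/securities",
            "https://example.org/finance",
        ],
    ),
}


def resolve_config(request: dict) -> ReportConfig:
    """Return the configuration for the report subject carried in the request."""
    subject = request["report_subject"]
    try:
        return CONFIG_BASE[subject]
    except KeyError:
        raise LookupError(f"no report configuration for subject {subject!r}") from None


config = resolve_config({"report_subject": "securities finance"})
```

Because the request carries only the subject, a scheduler could build it from a single string, which is the simplicity the paragraph above points to.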
Next, at 130, a web crawler is utilized to crawl business information from the list of information source addresses.
The web crawler in the embodiments of the present specification may include a general web crawler, a focused web crawler, and the like. The web crawler crawls the service information from each information source address in the list of information source addresses, and in one example, the web crawler crawls all information in each information source address, where all information includes the service information and other information.
In another example, the web crawler may crawl only the service information at each information source address. For example, the web crawler may use a regular expression to selectively crawl only the service information at an information source address. This reduces the amount of data the web crawler fetches from each information source address and improves crawling efficiency.
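Selective extraction with a regular expression might look like the sketch below. It assumes, purely for illustration, that service items on a page appear as `<a class="news" ...>` links; real pages differ, and the pattern, the function name `extract_service_items`, and the sample HTML are hypothetical.

```python
import re

# Assumed page layout: each service item is an <a class="news"> link.
ITEM_PATTERN = re.compile(r'<a class="news" href="([^"]+)">([^<]+)</a>')


def extract_service_items(html: str) -> list:
    """Return only the entries matching the service-item pattern, ignoring everything else."""
    return [
        {"url": url, "title": title.strip()}
        for url, title in ITEM_PATTERN.findall(html)
    ]


page = """
<div><a class="ad" href="/promo">Buy now</a></div>
<a class="news" href="/a/1">New deposit rules issued</a>
<a class="news" href="/a/2"> Market update </a>
"""
items = extract_service_items(page)
```

The advertisement link is skipped because it does not match the pattern, so only the two service items are kept.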
In one example, a web crawler may crawl business information at specified times, which may be specified points in time or specified time intervals.
When an information source address contains service information from different points in time, the web crawler can, on each run, determine the time of its previous crawl and then crawl only the service information published between that time and the current time. This avoids re-crawling service information that was already collected and improves crawling efficiency.
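The incremental behavior can be sketched as a time-window filter. The sketch assumes each crawled item carries a `published` timestamp; that field name and the function `select_new_items` are hypothetical.

```python
from datetime import datetime


def select_new_items(items, last_crawl, now):
    """Keep items published after the previous crawl and no later than the current time."""
    return [it for it in items if last_crawl < it["published"] <= now]


items = [
    {"title": "old", "published": datetime(2020, 8, 1, 9, 0)},
    {"title": "new", "published": datetime(2020, 8, 2, 9, 0)},
]
last_crawl = datetime(2020, 8, 1, 12, 0)
now = datetime(2020, 8, 2, 12, 0)

fresh = select_new_items(items, last_crawl, now)
last_crawl = now  # remembered so the next run starts where this one ended
```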
The web crawler in the embodiments of this specification crawls service information only from the information source addresses in the information source address list rather than from the whole web, which reduces the amount of crawled service information and therefore the subsequent processing load. Because the information source address list is determined according to the report subject, the service information crawled from it is more relevant to the service information report, and the information content presented by the generated report is more accurate.
After the business information is crawled, at 140, target business information is determined from the crawled business information.
In one example, the service information may be subjected to information screening processing. Information screening selects, from the large amount of crawled service information, the target service information that is most closely associated with the service information report. The screened target service information is then used to generate the report, so that the information content of the generated report is more concise and more accurate.
In one way of information screening process, the keywords can be used to perform information screening process on the crawled business information. The keywords may include forward filtering keywords and/or backward filtering keywords, where the forward filtering keywords are words with high correlation with the report topic of the service information report, for example, the forward filtering keywords may include high-frequency words, professional words, and the like in the service range corresponding to the report topic. For example, if the report topic is stall economy, then the forward filtering keywords may include "stall" and "economy".
The reverse filtering keywords are keywords that are contrary to the reporting subject of the business information report, unrelated but confusing, or specifically excluded by the business information report.
The forward filtering keywords are used to match target service information directly, i.e., service information that matches a forward filtering keyword can be determined to be target service information. The reverse filtering keywords are used to exclude non-target service information from the crawled service information, i.e., service information that matches a reverse filtering keyword can be determined to be non-target service information and removed directly.
In this example, the report configuration information may also include keywords and/or logical combinations between the respective keywords. The keywords and/or the logic combination between the keywords are used to perform the information screening process.
When individual keywords are used, they are independent of one another, and a separate information screening pass is performed with each keyword. For example, if the forward filtering keywords include "stall" and "economy", one screening pass may be performed with the forward filtering keyword "stall" and another with the forward filtering keyword "economy".
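Keyword screening with independent keywords can be sketched as below. This is a minimal illustration using case-insensitive substring matching, which is an assumption; the specification does not state how matching is done. The function name `screen`, the sample headlines, and the reverse filtering keyword "scam" are hypothetical.

```python
def screen(items, forward, reverse):
    """Drop items hitting any reverse keyword, then keep items hitting any forward keyword."""
    kept = []
    for text in items:
        lowered = text.lower()
        if any(keyword in lowered for keyword in reverse):
            continue  # matched a reverse filtering keyword: non-target information
        if any(keyword in lowered for keyword in forward):
            kept.append(text)  # matched at least one forward filtering keyword
    return kept


items = [
    "Stall economy boosts night markets",
    "Stock market closes higher",
    "Stall rental scam exposed",
]
target = screen(items, forward=["stall", "economy"], reverse=["scam"])
```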
A logical combination of keywords used in the information screening process joins the keywords through logical relations such as AND, OR, and exclusion.
For example, if the report subject of the business information report is deposit business violations, the forward filtering keywords associated with the report subject may include: deposit, deposit receipt, account, identity, real name, management, chemical name opening, virtual opening, and the like; the reverse filtering keywords may include postal deposit, postal storage, and the like; and the resulting logical combination of the keywords may be: ((deposit/account) - (postal deposit/postal storage) + (identity/real name/management))/true/false opening, where "/" means OR, "+" means AND, and "-" means exclusion.
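One way to evaluate such a logical combination is to build it from small predicate functions. The sketch below implements a simplified rule, (deposit OR account) AND (identity OR real name), excluding (postal deposit OR postal storage). It is not a faithful encoding of the full expression above, whose operator precedence is not specified, and the helper names `any_of`, `all_of`, and `exclude` are hypothetical.

```python
def any_of(*keywords):
    """OR: true if the text contains at least one of the keywords."""
    return lambda text: any(k in text for k in keywords)


def all_of(*predicates):
    """AND: true if every sub-rule holds for the text."""
    return lambda text: all(p(text) for p in predicates)


def exclude(predicate, excluded):
    """Exclusion: true if the rule holds and the excluded rule does not."""
    return lambda text: predicate(text) and not excluded(text)


# Simplified rule, assumed for illustration only.
rule = exclude(
    all_of(any_of("deposit", "account"), any_of("identity", "real name")),
    any_of("postal deposit", "postal storage"),
)

hit = rule("bank fined for opening account without real name check")
excluded_hit = rule("postal deposit account identity checks tightened")
miss = rule("deposit rates fall")
```

Composing predicates keeps each operator separately testable and lets a configuration store the rule as a nested structure rather than as a string to be parsed.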
In another example, the service information may also be subjected to information deduplication processing. Information deduplication removes service information whose semantics are the same as or similar to those of other service information, eliminating redundant items and thereby condensing the service information.
One way to perform information deduplication is to run semantic analysis on each piece of service information to obtain its abstract, and then deduplicate the pieces whose abstracts express the same or similar semantics.
Information deduplication can be limited to the currently crawled service information: the crawled pieces are compared semantically with one another, and pieces with the same or similar semantics are deduplicated. It can also take historical service information into account. In that case, each piece of currently crawled service information is compared not only with the other current pieces but also with each piece of historical service information, and a current piece that is the same as or similar to a historical piece is removed.
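The deduplication flow, including the comparison against historical items, can be sketched as follows. Note the substitution: instead of the semantic analysis the specification describes, this sketch uses token-set Jaccard similarity as a simple stand-in. The 0.8 threshold and the names `tokens`, `jaccard`, and `deduplicate` are assumptions.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a semantic abstract."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the two texts' token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(current, history=(), threshold=0.8):
    """Keep a current item only if it is not similar to history or to an already kept item."""
    kept = []
    for item in current:
        seen = list(history) + kept
        if all(jaccard(item, other) < threshold for other in seen):
            kept.append(item)
    return kept


current = [
    "Central bank issues new deposit rules",
    "Central bank issues new deposit rules today",
    "Regulator fines bank over fake accounts",
]
history = ["Regulator fines bank over fake accounts"]

within_batch = deduplicate(current)
with_history = deduplicate(current, history)
```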
In another example, the service information may be further processed by information sorting. The sorting rule of the information sorting process can be specified, and the sorting rule can be determined according to at least one dimension of the dimensions of the relevance degree with the report topic, the information dissemination degree, the authority degree of the information distribution media, the information heat degree and the like. For example, the higher the correlation between the service information and the report topic, the higher the information dissemination, the more authoritative the information distribution medium and the higher the information popularity, the higher the ranking of the service information. The information dissemination degree may include the forwarding times and the reference times of the service information, and the information popularity degree may include the praise number and the comment number of the service information.
After the sorting result for the service information is obtained, the target service information can be determined according to the sorting result of the service information. For example, the first N pieces of service information in the sequence may be determined as the target service information. In the present specification, N may be a designated integer.
The sorting result obtained through the information sorting processing reflects the information value of each piece of service information in the service information report. Service information ranked higher has higher information value and is more likely to be determined as target service information, and the more high-value service information the generated service information report contains, the higher the information value of the report.
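As an illustration of sorting by several dimensions and taking the first N pieces, the sketch below combines the four dimensions with a weighted sum. The weighted sum, the weight values, and the names `Item`, `score` and `top_n` are assumptions; the specification only states that the rule uses at least one of these dimensions. Each dimension is assumed to be pre-normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    relevance: float      # relevance to the report topic
    dissemination: float  # e.g. derived from forwards and references
    authority: float      # authority of the publishing medium
    popularity: float     # e.g. derived from likes and comments


# Hypothetical weights; any sorting rule over these dimensions could be used.
WEIGHTS = {"relevance": 0.4, "dissemination": 0.2, "authority": 0.2, "popularity": 0.2}


def score(item, weights=WEIGHTS):
    """Weighted sum over the configured dimensions."""
    return sum(w * getattr(item, name) for name, w in weights.items())


def top_n(items, n, weights=WEIGHTS):
    """Sort from highest to lowest score and keep the first n items."""
    return sorted(items, key=lambda it: score(it, weights), reverse=True)[:n]
```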
It should be noted that at least one of the information filtering process, the information deduplication process and the information sorting process may be performed on the crawled service information. Taking fig. 2 as an example, fig. 2 shows a flowchart of an example 200 of a processing procedure for business information according to an embodiment of the present disclosure. As shown in fig. 2, after the web crawler crawls the business information (210), the crawled business information is processed by information screening (220), then processed by information deduplication (230), then processed by information sorting (240), and then the target business information is determined based on the sorting result of the information sorting (250).
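The flow of fig. 2 can be sketched as a chain of optional processing steps followed by taking the first N results. The function name `select_targets` and the representation of each step as a plain callable are assumptions made for illustration.

```python
def select_targets(crawled, steps, n):
    """Apply each processing step in order (e.g. screening, deduplication,
    sorting), then keep the first n pieces as target business information."""
    items = list(crawled)
    for step in steps:
        items = step(items)
    return items[:n]
```

Because the steps are passed as a list, any subset of screening, deduplication and sorting can be applied, matching the statement that at least one of them may be performed.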
In one example, in addition to determining the target business information based on the above-mentioned dimensions of relevance to the report topic, information dissemination, information distribution media, and information popularity, the target business information may also be determined based at least in part on the relevance between the business information. Namely, the correlation degree between the business information is used as one reference dimension in a plurality of reference dimensions for determining the target business information.
The relevance between two pieces of business information can be determined by natural language processing (NLP), or according to matched keywords. Specifically, the keywords that each piece of service information matches among the specified keywords can be determined; the more keywords two pieces of service information match in common, the higher the association degree between them.
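A minimal sketch of the keyword-based variant, assuming case-insensitive substring matching and using the count of commonly matched keywords as the association degree. The function names are hypothetical.

```python
def matched_keywords(text, keywords):
    """Return the specified keywords that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {kw for kw in keywords if kw.lower() in lowered}


def association(text_a, text_b, keywords):
    """Association degree = number of keywords matched by both texts."""
    return len(matched_keywords(text_a, keywords) & matched_keywords(text_b, keywords))
```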
In this example, the business information may be sorted according to the above-mentioned dimensions of relevance to the report topic, information dissemination, information distribution media, and information popularity, and the result of sorting may be obtained. Then, reordering is performed based on the first N pieces of service information in the sorting result and the correlation between the service information in the sorting result. For example, when N is 1, the reordering is performed based on the first service information and the correlation between the service information.
In one example, for other service information except the first N service information, the relevance between each other service information and the first N service information is calculated, and then the other service information is reordered according to the order of the relevance from high to low.
In another example, the association degrees of each other service information and the first N service information may be calculated first, the service information with the highest current association degree is ranked at position N+1, then the association degrees of each service information to be ranked and the first N+1 service information are calculated, then the service information with the highest current association degree is ranked at position N+2, and so on until the ranking is completed.
In the above example, when calculating the association degree between one service information and a plurality of service information, the association degree may be taken as an overall association degree. The individual association degree between the service information and each of the plurality of service information can be calculated first, and then the sum of all the individual association degrees is used as the overall association degree between the service information and the plurality of service information.
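The two reordering variants and the overall association degree (the sum of individual association degrees) can be sketched as below. The pairwise function `assoc` is supplied by the caller, and all function names are hypothetical.

```python
def overall(candidate, group, assoc):
    """Overall association degree: sum of the individual association degrees."""
    return sum(assoc(candidate, member) for member in group)


def reorder_once(ranked, n, assoc):
    """Keep the first n items; sort the rest by association with those n items."""
    head, tail = ranked[:n], ranked[n:]
    return head + sorted(tail, key=lambda c: overall(c, head, assoc), reverse=True)


def reorder_greedy(ranked, n, assoc):
    """Keep the first n items; repeatedly append the remaining item with the
    highest overall association with everything ranked so far."""
    ordered, remaining = list(ranked[:n]), list(ranked[n:])
    while remaining:
        best = max(remaining, key=lambda c: overall(c, ordered, assoc))
        ordered.append(best)
        remaining.remove(best)
    return ordered
```

The greedy variant can place an item earlier than the one-pass variant when that item is weakly related to the first N items but strongly related to an item ranked shortly after them.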
In the above example, service information ranked earlier in the sorting result reflects higher information value. Therefore, the first N pieces of service information can first be determined as target service information; then, using these pieces as the reference, the relevance between each other piece of service information and the reference can be calculated, and the other pieces can be reordered based on the calculated relevance. The reordered result thus incorporates relevance: the target service information determined from it is more closely related, closely related target service information reads as continuous, the presented service information is more logically organized, and a service information report generated from such target service information also has higher information value.
Returning to FIG. 1, after the target business information is determined, a business information report may be generated at 150 based on the target business information and the report template.
In one example, each target business information may be directly populated into a report template to generate a business information report. In another example, a title, a summary, and a web page link of each target business information may be extracted, and then the extracted title, summary, and web page link are filled in a report template to generate a business information report. The title, the abstract and the webpage link of the corresponding service information are presented in the service information report, the title and the abstract can summarize the service information in a short mode, and for a user who refers to the service information report, the user can view the short title and the abstract, so that the referring time is saved. When the user needs to further know the service information, the complete service information content can be viewed through the webpage link. Therefore, not only is the space of the business information report reduced, but also the efficiency of the user for looking up the business information report is improved.
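Filling extracted titles, abstracts and web page links into a template can be sketched with the standard library's `string.Template`. The template layout, the field names and the function name `render_report` are assumptions; the specification does not define a template format.

```python
from string import Template

# Hypothetical per-entry layout: title, abstract (summary) and link on separate lines.
ENTRY = Template("$title\n$summary\n$link\n")


def render_report(report_title, entries):
    """Fill each entry's title, summary and link into the entry template."""
    body = "\n".join(ENTRY.substitute(e) for e in entries)
    return f"{report_title}\n\n{body}"
```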
In one example, the report template may include at least two business sections, each of the at least two business sections being for a different one of the report topics.
The report subject is directed to the business information report, and the business subject is directed to the business block in the business information report. One report topic may include a plurality of different business topics, and the business scope of the report topic includes the business scope of each business topic.
The business topics for a report topic can be branch topics of the report topic, or can each address the report topic from a different angle, in which case each business topic is one such angle. For example, if the report topic is the stall economy, it may include two business topics: one addressing the stall economy from a legal perspective, and one addressing the stall economy from a market perspective.
The information source address lists corresponding to the service blocks can be the same or different. When the information source address lists corresponding to the service blocks are different, the information source address lists of the service blocks can be determined according to the service themes of the service blocks. For example, if the service theme of a service block is a legal theme, the information source address list of the service block is determined according to the legal theme, and each information source address in the information source address list may be a website address related to law.
For each service section, besides calculating the correlation degree between each service information in the service section, the correlation degree between each service information in the service section and other service sections can also be calculated.
Specifically, the correlation between each service information in the service layout block and each service information in other service layout blocks may be calculated, and then the sum of the correlations between the service information and each service information is determined as the correlation between the service information and the other service layout blocks. According to the method, the association degree between each service information in the service block and other service blocks can be calculated. When the other service sections include at least two, the sum of the association degrees with each other service section can be determined as the association degrees of the service information and all other service sections.
Target business information for each business section can then be determined from the crawled business information based at least in part on the correlation between the business information and the business information of other business sections.
For each piece of service information, the sum of its relevance to the other service information in the same business section and its relevance to the other business sections can be determined as the total relevance of that service information. The service information is then sorted by total relevance, and the target service information of each business section is determined according to the sorting.
In one example, the association between the business information and other business information in the same business block has a first weight, the association between the business information and other business blocks has a second weight, and the first weight and the second weight may be different. And when the total relevance of each service information is calculated, multiplying the relevance between the service information and other service information by a first weight, multiplying the relevance between the service information and other service blocks by a second weight, and determining the sum of the two multiplication values as the total relevance of the service information.
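The weighted total relevance can be sketched as follows, with the association between an item and another section taken as the sum of its pairwise association degrees with that section's items, as described above. The pairwise function `assoc`, the default weights and the function names are assumptions.

```python
def section_association(item, section_items, assoc):
    """Association between one item and a whole section: sum of pairwise degrees."""
    return sum(assoc(item, other) for other in section_items)


def total_relevance(item, own_section, other_sections, assoc, w_same=1.0, w_other=1.0):
    """w_same * (association within the item's own section)
    + w_other * (association with all other sections)."""
    same = sum(assoc(item, other) for other in own_section if other is not item)
    cross = sum(section_association(item, sec, assoc) for sec in other_sections)
    return w_same * same + w_other * cross
```

Setting `w_same` higher than `w_other` favors items that fit their own section; the reverse favors items that tie sections together.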
In one example, after the target service information in each service block is determined, the presentation sequence of each service block in the report template can be determined according to the target service information of each service block.
The association degree between any two service sections can be calculated according to the target service information in the two service sections. Specifically, the correlation degree between each target service information in one service section and each target service information in another service section is calculated, and then the sum of the correlation degrees between the target service information and each target service information is determined as the correlation degree between the target service information and the another service section. According to the method, the association degree between each target service information in the service block and the other service block can be calculated. And then determining the sum of the correlation degrees between each target service information in the service section and another service section as the correlation degree between the two service sections.
In this example, the first service section in the presentation order in the report template may be determined first, where the first service section in the presentation order may be specified, may also be determined according to a logical relationship between the service topics of the respective service sections, and may also be determined according to a first association degree between the respective service sections and the report topic. Specifically, for each service section, the association degree between each target service information in the service section and the report topic is calculated, and then the sum of the association degrees corresponding to each target service information is determined as the first association degree of the service section. And determining the service section with the maximum first relevance as the service section with the first presentation sequence.
Then, a second association degree between each business section and the other business sections is calculated. In one example, for each business section, a second association degree between that business section and the first-in-order business section may be calculated. Specifically, the association degree between each piece of target service information in the business section and the first-in-order business section may be calculated, and the sum of these association degrees is determined as the second association degree between the business section and the first-in-order business section. The other business sections are then sorted according to their second association degrees: the higher a business section's association degree with the first-in-order business section, the earlier it is placed, and the lower that association degree, the later it is placed.
In another example, a second association degree between each service block and a first-order service block may be calculated first, and the service block with the highest current second association degree may be determined as a second-order service block. Then, second association degrees between other service layout blocks and the currently sequenced service layout blocks (the currently sequenced first and second service layout blocks) are calculated, specifically, the sum of the second association degrees between each service layout block and each currently sequenced service layout block is determined as the current second association degree corresponding to each service layout block, and then the service layout block with the highest current second association degree is determined as the service layout block with the third sequence. And repeating the steps until all the service blocks are sequenced.
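The section-ordering procedure (first section chosen by the first association degree, subsequent sections chosen greedily by the summed second association degree with the sections already ordered) can be sketched as below. The pairwise function `assoc` and all names are hypothetical, and ties are broken by input order.

```python
def first_association(section, topic, assoc):
    """Sum of each target item's association with the report topic."""
    return sum(assoc(item, topic) for item in section)


def second_association(section_a, section_b, assoc):
    """Sum of pairwise association degrees between the items of two sections."""
    return sum(assoc(a, b) for a in section_a for b in section_b)


def order_sections(sections, topic, assoc):
    """sections: dict of section name -> list of target items.
    Returns the section names in presentation order."""
    remaining = dict(sections)
    first = max(remaining, key=lambda name: first_association(remaining[name], topic, assoc))
    ordered = [first]
    del remaining[first]
    while remaining:
        best = max(
            remaining,
            key=lambda name: sum(
                second_association(remaining[name], sections[placed], assoc)
                for placed in ordered
            ),
        )
        ordered.append(best)
        del remaining[best]
    return ordered
```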
After the presentation sequence of each service layout block in the report template is determined, a service information report can be generated according to the target service information of each service layout block, the report template and the presentation sequence of each service layout block in the report template.
The positions of the business sections in the report template are adjusted according to their presentation order. Within each business section, the order of the target business information can be determined from that section's target business information, and the target business information is then filled into the corresponding business section of the report template in that order, thereby generating the business information report.
Through the above example, each layout block in the report template can be adjusted according to the target business information, and the adjustment is performed based on the correlation between the target business information and the correlation between the business layout blocks, so that the information contents presented by the adjusted business information report are more consistent with each other and have stronger logic.
FIG. 3 is a diagram illustrating an example 300 of generating a business information report based on a report template according to an embodiment of the present description.
As shown in FIG. 3, the report template includes service sections A and B, wherein the service section A includes target service information A-1 and target service information A-2, and the service section B includes target service information B-1, target service information B-2 and target service information B-3. Aiming at the target business information A-1 and A-2 in the business block A, according to the relevance between each target business information and the business theme of the business block A and the relevance between each target business information and the business block B, the relevance corresponding to the target business information A-2 can be determined to be higher than the relevance corresponding to the target business information A-1.
For target business information B-1, B-2 and B-3 in business section B, according to the relevance between each target business information and the business topic of business section B and the relevance between each target business information and business section A, it can be determined that the relevance corresponding to target business information B-1 is the highest, that of target business information B-3 is the second highest, and that of target business information B-2 is the lowest.
Then, the following can be determined: the sum of the first association degrees between target business information B-1, B-2 and B-3 in business section B and the report topic; the second association degree between business section B and business section A; and the sum of the first association degrees between target business information A-1 and A-2 in business section A and the report topic. If the sum of the first association degrees corresponding to business section B is greater than that corresponding to business section A, and the second association degrees corresponding to business section B and business section A are the same, it can be determined that business section B is presented first and business section A second. The business information report generated according to the target business information of each business section, the report template and the presentation order of each business section in the report template is shown in the right-hand diagram of fig. 3.
Fig. 4 is a block diagram illustrating an example of a service information report generation apparatus 400 according to an embodiment of the present disclosure.
As shown in fig. 4, the business information report generating apparatus 400 may include a request receiving unit 410, a configuration information determining unit 420, an information crawling unit 430, a target business information determining unit 440, and a report generating unit 450.
The request receiving unit 410 is configured to receive a request for generation of a service information report, the request for generation including a report subject of the service information report. The operations performed by the request receiving unit 410 may refer to the operations of block 110 described above with reference to fig. 1.
The configuration information determining unit 420 is configured to determine report configuration information of the service information report according to the generation request of the service information report, the report configuration information including a report template and an information source address list. The operations performed by the configuration information determination unit 420 may refer to the operations of block 120 described above with reference to fig. 1.
The information crawling unit 430 is configured to crawl business information from a list of information source addresses using a web crawler. The operations performed by the information crawling unit 430 may refer to the operations of block 130 described above with reference to fig. 1.
The target business information determining unit 440 is configured to determine target business information from the crawled business information. The operation performed by the target service information determination unit 440 may refer to the operation of block 140 described above with reference to fig. 1. In one example, the target service information determining unit 440 may be further configured to: sequencing the crawled service information; and determining the target service information according to the sequencing result of the service information.
The report generating unit 450 is configured to generate a service information report based on the target service information and the report template. The operations performed by the report generation unit 450 may refer to the operations of block 150 described above with reference to fig. 1.
In one example, the business information report generating device 400 may further include an information processing unit configured to perform information filtering processing or information de-duplication processing on the crawled business information.
In one example, the target service information determining unit 440 may be further configured to: target business information is determined from the crawled business information based at least in part on the relevancy among the business information.
In one example, the report template includes at least two business sections, each of the at least two business sections being for a different business topic of the report topic, and the target business information determination unit 440 may be further configured to: and determining target business information of each business section from the crawled business information at least partially according to the correlation degree between the business information and the business information of other business sections.
In one example, the service information report generating apparatus 400 may further include a layout order determining unit configured to determine a presentation order of each service layout in the report template according to the target service information of each service layout; and the report generating unit may be further configured to: and generating a service information report according to the target service information of each service section, the report template and the presentation sequence of each service section in the report template.
In one example, the tile order determination unit may be further configured to: determining a first association degree between each service section and a report subject and a second association degree between each service section and other service sections according to the target service information of each service section; and determining the presentation sequence of each service section in the report template according to the first relevance and the second relevance of each service section.
Embodiments of a method and apparatus for generating a service information report according to embodiments of the present disclosure are described above with reference to fig. 1 to 4.
The apparatus for generating the service information report according to the embodiments of the present disclosure may be implemented by hardware, by software, or by a combination of hardware and software. Taking the software implementation as an example, the apparatus, as a logical means, is formed by the processor of the device in which it is located reading the corresponding computer program instructions from the storage into the memory and running them. In the embodiments of the present specification, the apparatus for generating the service information report may be implemented by, for example, an electronic device.
Fig. 5 shows a block diagram of an electronic device 500 implementing a method for generating a business information report according to an embodiment of the present description.
As shown in fig. 5, the electronic device 500 may include at least one processor 510, a storage (e.g., non-volatile storage) 520, a memory 530, and a communication interface 540, and the at least one processor 510, the storage 520, the memory 530, and the communication interface 540 are connected together via a bus 550. The at least one processor 510 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 510 to: receiving a generation request of a business information report, wherein the generation request comprises a report subject of the business information report; determining report configuration information of the service information report according to the generation request of the service information report, wherein the report configuration information comprises a report template and an information source address list; crawling service information according to the information source address list by using a web crawler; determining target business information from the crawled business information; and generating a service information report according to the target service information and the report template.
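The five operations listed above can be sketched end to end as follows. The crawler and the target-selection logic are injected as callables so that the sketch does not depend on any particular crawling library; `ReportConfig`, `generate_report`, the request and configuration dictionaries, and the `{topic}`/`{body}` template placeholders are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReportConfig:
    template: str                # hypothetical format, e.g. "{topic}\n{body}"
    source_addresses: List[str]  # information source address list


def generate_report(
    request: Dict[str, str],
    configs: Dict[str, ReportConfig],
    crawl: Callable[[str], List[str]],
    select: Callable[[List[str]], List[str]],
) -> str:
    topic = request["topic"]          # receive the generation request
    config = configs[topic]           # determine report configuration information
    crawled = [item for url in config.source_addresses for item in crawl(url)]
    targets = select(crawled)         # determine target business information
    return config.template.format(topic=topic, body="\n".join(targets))
```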
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 510 to perform the various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present description.
According to one embodiment, a program product, such as a machine-readable medium, is provided. A machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-4 in the various embodiments of the present specification.
Specifically, a system or apparatus may be provided which is provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and causes a computer or processor of the system or apparatus to read out and execute instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code constitute a part of the embodiments of the present specification.
Computer program code required for the operation of various portions of the present specification may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic 2003, Perl, COBOL 2002, PHP, and ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package on the user's computer, partially on the user's computer and partially on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as Software as a Service (SaaS).
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Not all steps and elements in the above flows and system structure diagrams are necessary, and some steps or elements may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the embodiments of the present disclosure are not limited to the specific details of the embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present disclosure within the technical spirit of the embodiments of the present disclosure, and all of them fall within the scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the description is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (18)

1. A method for generating a business information report, comprising:
receiving a generation request of the business information report, wherein the generation request comprises a report subject of the business information report;
determining report configuration information of the service information report according to the generation request of the service information report, wherein the report configuration information comprises a report template and an information source address list;
crawling service information according to the information source address list by using a web crawler;
determining target business information from the crawled business information; and
generating the service information report according to the target service information and the report template.
2. The method of claim 1, wherein determining target business information from the crawled business information comprises:
sequencing the crawled service information; and
and determining the target service information according to the sequencing result of the service information.
3. The method of claim 1, wherein prior to ranking the crawled business information, the method further comprises:
and performing information screening processing or information duplication removing processing on the crawled service information.
4. The method of claim 3, wherein the report configuration information further comprises keywords and/or logical combinations between respective keywords,
the information screening processing of the crawled business information comprises the following steps:
and performing information screening processing on the crawled business information by using the keywords and/or the logic combination among the keywords.
5. The method of claim 1, wherein determining target business information from the crawled business information comprises:
target business information is determined from the crawled business information based at least in part on the relevancy among the business information.
6. The method of claim 5, wherein the report template includes at least two business sections, each of the at least two business sections for a different business topic of the report topic, an
Determining target business information from the crawled business information based at least in part on the relevancy among the business information comprises:
and determining target business information of each business section from the crawled business information at least partially according to the correlation degree between the business information and the business information of other business sections.
7. The method of claim 6, wherein the list of information source addresses for each service block is determined based on a service theme for each service block.
8. The method of claim 6, further comprising:
determining the presentation sequence of each service section in the report template according to the target service information of each service section; and
generating the service information report according to the target service information and the report template comprises:
and generating the service information report according to the target service information of each service section, the report template and the presentation sequence of each service section in the report template.
9. The method of claim 8, wherein determining an order of presentation of the respective service blocks in the report template based on the target service information of the respective service blocks comprises:
determining a first association degree between each service section and the report subject and a second association degree between each service section and other service sections according to the target service information of each service section; and
and determining the presentation sequence of each service section in the report template according to the first relevance and the second relevance of each service section.
10. An apparatus for generating a business information report, comprising:
a request receiving unit configured to receive a generation request for the business information report, wherein the generation request comprises a report topic of the business information report;
a configuration information determining unit configured to determine report configuration information of the business information report according to the generation request, wherein the report configuration information comprises a report template and an information source address list;
an information crawling unit configured to crawl business information according to the information source address list by using a web crawler;
a target business information determining unit configured to determine target business information from the crawled business information; and
a report generating unit configured to generate the business information report according to the target business information and the report template.
11. The apparatus of claim 10, wherein the target business information determining unit is configured to:
rank the crawled business information; and
determine the target business information according to the ranking result of the business information.
12. The apparatus of claim 10, wherein the apparatus further comprises:
an information processing unit configured to perform information screening processing or information de-duplication processing on the crawled business information.
13. The apparatus of claim 10, wherein the target business information determining unit is configured to:
determine the target business information from the crawled business information based at least in part on the relevance among the business information.
14. The apparatus of claim 13, wherein the report template comprises at least two business sections, each of the at least two business sections being directed to a different business topic of the report topic, and
the target business information determining unit is configured to:
determine target business information of each business section from the crawled business information based at least in part on the relevance between the business information and the business information of the other business sections.
15. The apparatus of claim 14, further comprising:
a section order determining unit configured to determine a presentation order of the business sections in the report template according to the target business information of each business section; and
wherein the report generating unit is configured to:
generate the business information report according to the target business information of each business section, the report template, and the presentation order of the business sections in the report template.
16. The apparatus of claim 15, wherein the section order determining unit is configured to:
determine, according to the target business information of each business section, a first association degree between that business section and the report topic and a second association degree between that business section and the other business sections; and
determine the presentation order of the business sections in the report template according to the first association degree and the second association degree of each business section.
17. An electronic device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1-9.
18. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any of claims 1 to 9.
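As an illustration only, the method of claims 1 to 4 and the section ordering of claims 8 and 9 can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of this disclosure: the `Item` structure, the `fetch` callable standing in for a web crawler, the Jaccard token overlap used as both ranking score and association degree, the title-based de-duplication, the OR-of-AND keyword groups, and the weights `w_topic` and `w_peers` are all hypothetical choices introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template
from typing import Callable


@dataclass(frozen=True)
class Item:
    """One piece of crawled business information (hypothetical structure)."""
    url: str
    title: str
    text: str

    def words(self) -> set[str]:
        return set(f"{self.title} {self.text}".lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token overlap; an assumed stand-in for the relevance measure."""
    return len(a & b) / len(a | b) if (a | b) else 0.0


def crawl(addresses: list[str], fetch: Callable[[str], Item]) -> list[Item]:
    """Visit every address in the information source address list."""
    return [fetch(url) for url in addresses]


def screen(items: list[Item], keyword_groups: list[list[str]]) -> list[Item]:
    """Keep items matching any group (OR) whose keywords all occur (AND)."""
    return [i for i in items
            if any(all(k.lower() in i.words() for k in g) for g in keyword_groups)]


def deduplicate(items: list[Item]) -> list[Item]:
    """Drop items whose normalized title has already been seen."""
    seen: set[str] = set()
    unique = []
    for item in items:
        key = " ".join(item.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def generate_report(topic, template, addresses, fetch, keyword_groups, top_k=2):
    """Crawl, screen, de-duplicate, rank, keep the top items, fill the template."""
    topic_words = set(topic.lower().split())
    items = deduplicate(screen(crawl(addresses, fetch), keyword_groups))
    ranked = sorted(items, key=lambda i: jaccard(i.words(), topic_words), reverse=True)
    body = "\n".join(f"- {i.title} ({i.url})" for i in ranked[:top_k])
    return Template(template).substitute(topic=topic, body=body)


def order_sections(topic, sections, w_topic=0.7, w_peers=0.3):
    """Order sections by a weighted sum of the two association degrees."""
    topic_words = set(topic.lower().split())
    bags = {name: set().union(*(i.words() for i in items))
            for name, items in sections.items()}

    def combined(name):
        first = jaccard(bags[name], topic_words)  # section vs. report topic
        peers = [jaccard(bags[name], bag) for other, bag in bags.items() if other != name]
        second = sum(peers) / len(peers) if peers else 0.0  # section vs. other sections
        return w_topic * first + w_peers * second

    return sorted(sections, key=combined, reverse=True)


# Stand-in for a real crawler: a fixed mapping from address to (title, text).
PAGES = {
    "https://example.com/a": ("Mobile payment growth", "mobile payment volume grew this quarter"),
    "https://example.com/b": ("Mobile payment growth", "duplicate story about mobile payment"),
    "https://example.com/c": ("Weather update", "rain expected tomorrow"),
    "https://example.com/d": ("Payment regulation news", "new payment rules for banks announced"),
}


def fetch(url: str) -> Item:
    title, text = PAGES[url]
    return Item(url, title, text)


report = generate_report(
    topic="mobile payment",
    template="Report on $topic\n$body",
    addresses=list(PAGES),
    fetch=fetch,
    keyword_groups=[["mobile", "payment"], ["payment", "rules"]],
)
section_order = order_sections(
    "mobile payment",
    {"regulation": [fetch("https://example.com/d")],
     "payments": [fetch("https://example.com/a")]},
)
print(report)
print(section_order)
```

In this sketch, screening and de-duplication run before ranking, as in claim 3, and the report keeps only the `top_k` ranked items. A real system would replace `fetch` with an actual crawler and would likely use a stronger relevance measure than token overlap.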
CN202010842237.XA 2020-08-20 2020-08-20 Method and device for generating business information report Active CN111782916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010842237.XA CN111782916B (en) 2020-08-20 2020-08-20 Method and device for generating business information report

Publications (2)

Publication Number Publication Date
CN111782916A (en) 2020-10-16
CN111782916B (en) 2024-03-22

Family

ID=72762837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010842237.XA Active CN111782916B (en) 2020-08-20 2020-08-20 Method and device for generating business information report

Country Status (1)

Country Link
CN (1) CN111782916B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579961A (en) * 2020-12-28 2021-03-30 杭州搜车数据科技有限公司 Web navigation page construction method, device, equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120246139A1 (en) * 2010-10-21 2012-09-27 Bindu Rama Rao System and method for resume, yearbook and report generation based on webcrawling and specialized data collection
WO2018014759A1 (en) * 2016-07-18 2018-01-25 阿里巴巴集团控股有限公司 Method, device and system for presenting clustering data table
CN109669853A (en) * 2018-10-23 2019-04-23 深圳壹账通智能科技有限公司 Test report generation method and device, storage medium, electric terminal
CN109726327A (en) * 2018-12-14 2019-05-07 深圳壹账通智能科技有限公司 A kind of information-pushing method and device
CN110147541A (en) * 2019-05-23 2019-08-20 北京神州泰岳软件股份有限公司 A kind of generation method and device of economic report
CN110619568A (en) * 2019-09-17 2019-12-27 王文斌 Risk assessment report generation method, device, equipment and storage medium
CN111125204A (en) * 2019-12-17 2020-05-08 中科鼎富(北京)科技发展有限公司 Analysis report obtaining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111782916B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
Gil et al. Towards content trust of web resources
US7624102B2 (en) System and method for grouping by attribute
CN106557558B (en) Data analysis method and device
CN110637316B (en) System and method for prospective object identification
Im et al. Linked tag: image annotation using semantic relationships between image tags
CN104699725A (en) Data searching processing method and system
CN108829656B (en) Data processing method and data processing device for network information
Wu et al. Efficient near-duplicate detection for q&a forum
Li et al. A hybrid model for experts finding in community question answering
Vale et al. Experimenting with information retrieval methods in the recovery of feature-code SPL traces
Bagade et al. The Kauwa-Kaate fake news detection system
Li et al. Getting work done on the web: supporting transactional queries
CN113221535B (en) Information processing method, device, computer equipment and storage medium
CN112765966B (en) Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
KR102257139B1 (en) Method and apparatus for collecting information regarding dark web
CN111782916B (en) Method and device for generating business information report
Joshi et al. Auto-grouping emails for faster e-discovery
US9705972B2 (en) Managing a set of data
CN116775639A (en) Data processing method, storage medium and electronic device
CN110209804B (en) Target corpus determining method and device, storage medium and electronic device
Kim A document ranking method with query-related web context
Preetha et al. Personalized search engines on mining user preferences using clickthrough data
Li et al. Discovering associations between news and contents in social network sites with the D-Miner service framework
JP7003481B2 (en) Reinforcing rankings for social media accounts and content
Zhao et al. Drexel at TREC 2014 Federated Web Search Track.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant