WO2016206395A1 - Weekly report information processing method and apparatus - Google Patents

Weekly report information processing method and apparatus

Info

Publication number
WO2016206395A1
Authority
WO
WIPO (PCT)
Prior art keywords
report information
weekly report
specified
weekly
encoding format
Prior art date
Application number
PCT/CN2016/074245
Other languages
English (en)
French (fr)
Inventor
胡媛
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016206395A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • the present invention relates to the field of communications, and in particular to a method and apparatus for processing weekly report information.
  • A web crawler is a program that automatically fetches Internet information according to certain rules. It starts from an initial set of pages and traverses the network, collecting information automatically. When the crawler opens an HTML page, it analyzes the HTML markup structure to extract information, obtains the hyperlinks to other pages, and then selects the next site to visit according to a predetermined search strategy.
  • Data mining refers to the process of searching for information from a large amount of data through an algorithm.
  • Data mining is related to computer science and achieves these goals through statistics, online analytical processing, information retrieval, machine learning, expert systems, and pattern recognition.
  • Analysis methods include classification, estimation, prediction, affinity grouping or association rules, clustering, and complex data type mining (text, Web, graphics and images, video, audio), among others.
  • In many enterprises, project management requires a project weekly report to be filled out on a web page every week, usually by the person responsible for each project, and each project has its own independent weekly report.
  • Thousands of weekly reports are generally summarized manually by dedicated staff, which costs time and labor. Periodic automatic updating of thousands of project weekly reports cannot be achieved, and project management efficiency is low.
  • Although web crawler technology can acquire information from web pages, it cannot perform secondary processing and intelligent analysis of that information. It must be combined with data mining technology to achieve automatic acquisition, intelligent analysis and processing, customized output, and periodic updating of thousands of weekly reports.
  • the invention provides a method and a device for processing weekly report information, so as to at least solve the problem that the secondary processing such as automatic acquisition and intelligent analysis of weekly report information cannot be realized in the related art.
  • a weekly report information processing method includes: acquiring weekly report information from a specified web page; acquiring a specified instruction; and filtering out specified weekly report information from the weekly report information according to the specified instruction.
  • the method includes: converting the encoding format of the weekly report information into a specified encoding format; and storing the weekly report information converted to the specified encoding format.
  • obtaining the specified instruction includes: obtaining the specified instruction by using a pre-configured cleaning and comparison rule of the weekly report information.
  • the method before converting the encoding format of the weekly report information to the specified encoding format, includes: reading the weekly report information by using a byte stream.
  • the obtaining the weekly report information from the specified webpage includes: acquiring the weekly report information from the specified webpage by using an HTML tool.
  • a weekly report information processing apparatus includes: a first obtaining module configured to acquire weekly report information from a specified webpage; and a second obtaining module configured to acquire a specified instruction;
  • the processing module is configured to filter the specified weekly report information from the weekly report information according to the specified instruction.
  • the apparatus further includes: a conversion module configured to convert the encoding format of the weekly report information into a specified encoding format; and the storage module configured to store the weekly report information converted to the specified encoding format.
  • the foregoing second obtaining module is further configured to obtain the specified instruction by using a pre-configured cleaning and comparison rule of the weekly report information.
  • the apparatus further includes: a reading module configured to read the weekly report information by using a byte stream.
  • the first obtaining module is further configured to obtain the foregoing weekly report information from the specified webpage by using an HTML tool.
  • In the embodiments of the present invention, weekly report information is acquired from a specified web page; a specified instruction is acquired; and the specified weekly report information is filtered out from the weekly report information according to the specified instruction.
  • This solves the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved. It thereby enables automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages, and meets users' customization requirements.
  • FIG. 1 is a flowchart of a method for processing weekly report information according to an embodiment of the present invention
  • FIG. 2 is a block diagram showing the structure of a weekly report information processing apparatus according to an embodiment of the present invention
  • FIG. 3 is a structural block diagram (1) of a weekly report information processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram (2) of a weekly report information processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a flow chart of automatically obtaining a customizable project management weekly report according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a weekly report information processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: Acquire weekly report information from the specified web page.
  • Step S104: Acquire a specified instruction.
  • Step S106: Filter out the specified weekly report information from the weekly report information according to the specified instruction.
  • Through the above steps, weekly-report-related information is first acquired from the specified web page, and the specified weekly report information required by the user is then filtered out of the acquired information according to the specified instruction input by the user.
  • Compared with the related art, in which project weekly reports must be summarized and updated manually at regular intervals, this solves the problem that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved. It thereby enables automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages, and meets users' customization requirements.
  • the encoding format of the weekly report information is converted into a specified encoding format, and the weekly report information converted to the specified encoding format is stored. Further, the specified weekly report information is filtered out from the weekly report information converted to the specified coding format.
  • the above step S104 involves obtaining the specified instruction.
  • the above specified instruction can be obtained in various ways, which will be exemplified below.
  • the specified instruction is obtained through pre-configured cleaning and comparison rules for the weekly report information.
  • the cleaning and comparison rules describe whether the field type of a weekly report information field conforms to its definition, whether the field value conforms to its definition, whether the field may hold a value, and whether the data is consistent.
  • the weekly report information is read by using a byte stream before converting the encoding format of the weekly report information to the specified encoding format.
  • the above step S102 involves obtaining the weekly report information from the specified webpage.
  • the weekly report information can be obtained from the designated webpage in various manners, which will be exemplified below.
  • the weekly report information is obtained from the designated web page by an HTML tool.
  • the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention in essence or the contribution to the related art can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, disk, CD-ROM).
  • the instructions include a plurality of instructions for causing a terminal device (which may be a cell phone, a computer, a server, or a network device, etc.) to perform the above-described methods of various embodiments of the present invention.
  • A weekly report information processing apparatus is also provided. The apparatus is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated.
  • The term “module” may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • The apparatus includes: a first obtaining module 22 configured to acquire weekly report information from a specified webpage; a second obtaining module 24 configured to acquire a specified instruction; and a processing module 26 configured to filter out the specified weekly report information from the weekly report information according to the specified instruction.
  • FIG. 3 is a structural block diagram (1) of a weekly report information processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a conversion module 32 configured to convert the encoding format of the weekly report information into a specified encoding format; and a storage module 34 configured to store the weekly report information converted to the specified encoding format.
  • the second obtaining module 24 is further configured to obtain the specified instruction by using a pre-configured cleaning and comparison rule of the weekly report information.
  • FIG. 4 is a structural block diagram (2) of a weekly report information processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus further includes: a reading module 42 configured to read the weekly report information as a byte stream.
  • the first obtaining module 22 is further configured to obtain the weekly report information from the specified webpage by using an HTML tool.
  • each of the above modules may be implemented by software or hardware.
  • for hardware, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in multiple processors respectively.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • An object of the optional embodiment of the present invention is to provide an automated, intelligent data statistics and analysis application system and method that provides automatic collection, intelligent analysis, customized query, and flexible presentation of the weekly report data of the massive number of projects on web pages.
  • To achieve this object, an optional embodiment of the present invention provides an application system for automatically obtaining customizable project management weekly reports, including:
  • Information collection subsystem: constructs an efficient, automatic HTML acquisition tool to obtain the project weekly report content from web pages.
  • Data processing subsystem: extracts data from the collected data, reads it as a byte stream, and then converts it into a specified encoding format.
  • Data storage subsystem: stores all extracted data in a database. As the carrier of the data, it provides stable and efficient mass data storage and a data interface for applications to access.
  • Intelligent analysis subsystem: intelligently analyzes and filters the weekly report information synchronized from the database according to user-customizable rules.
  • The system provides a visual interface for configuring cleaning and comparison rules for the data source, and supports adding, deleting, modifying, and querying those rules.
  • The automatic weekly report acquisition system batch-outputs the filtered project weekly report results in a user-customizable format.
  • The system can provide an application interface, in the form of library functions and an API, for use by third-party platforms.
  • FIG. 5 is a flow chart of automatically obtaining a customizable project management weekly report according to an embodiment of the present invention. As shown in FIG. 5, the process includes the following steps:
  • Step 1: Use Web-Harvest, an open-source Java-based Web extraction tool (web crawler), to collect the specified Web pages and extract the required data from them.
  • Step 2: Locate data according to the relative path of the surrounding content, selecting attributes that relate to the page content and are independent of its formatting.
  • Step 3: Map the HTML file to an XML file and construct a HashMap hash table in which each key corresponds to an XML tag and each value to the content of that tag.
  • Step 4: Store the processed data in the database. The data storage subsystem handles data definition, loading, storage, query, backup, and recovery.
  • Step 5: According to the mapping of the required key index field attributes that were input, perform data cleaning and data comparison under the same directory ID.
  • The cleaning and data comparison rules describe whether the field type of a data source field conforms to its definition, whether the field value conforms to its definition, whether the field may hold a value, whether the data is consistent, and so on.
  • Step 6: According to the project-related keywords input by the user, batch-output the filtered, customized project weekly report results in a customizable format, so that the weekly report content can be updated automatically at regular intervals.
  • In summary, the automated, intelligent data statistics and analysis application system and method provided by the embodiments of the present invention enables automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages and meets users' customization requirements. The weekly report information of the projects to be viewed is obtained and updated automatically at regular intervals, which greatly shortens the time spent on periodic manual summarization and updating of thousands of project weekly reports and improves project management efficiency.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

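A weekly report information processing method and apparatus, wherein the method includes: acquiring weekly report information from a specified web page (S102); acquiring a specified instruction (S104); and filtering out specified weekly report information from the weekly report information according to the specified instruction (S106). This solves the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved, thereby enabling automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages and meeting users' customization requirements.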

Description

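Weekly report information processing method and apparatus
Technical Field
The present invention relates to the field of communications, and in particular to a weekly report information processing method and apparatus.
Background
With the development of information technology, governments and enterprises are becoming increasingly information-driven. In large enterprises the number of projects and the volume of project information grow rapidly, the need to share project progress information within and between departments is pressing, and making project management information-based and automated is one of the key directions of enterprise informatization.
A web crawler is a program that automatically fetches Internet information according to certain rules. It starts from an initial set of pages and traverses the network, collecting information automatically. When the crawler opens an HTML page, it analyzes the HTML markup structure to extract information, obtains the hyperlinks to other pages, and then selects the next site to visit according to a predetermined search strategy.
Data mining refers to the process of searching for information in a large amount of data by means of algorithms. Data mining is related to computer science and achieves this goal through many methods, including statistics, online analytical processing, information retrieval, machine learning, expert systems, and pattern recognition. Analysis methods include classification, estimation, prediction, affinity grouping or association rules, clustering, and complex data type mining (text, Web, graphics and images, video, audio), among others.
At present, in the project management process of many enterprises, a project weekly report must be filled out on a web page every week, usually by the person responsible for each project, and each project has its own independent weekly report. To share project information between departments, the weekly reports must be summarized every week and distributed to each department. Thousands of weekly reports are generally summarized manually by dedicated staff, which costs time and labor. Periodic automatic updating of thousands of project weekly reports cannot be achieved, and project management efficiency is low.
Although web crawler technology can acquire information from web pages, it cannot perform secondary processing and intelligent analysis of that information. It must be combined with data mining technology to achieve automatic acquisition, intelligent analysis and processing, customized output, and periodic updating of thousands of weekly reports.
No effective solution has yet been proposed for the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved.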
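Summary of the Invention
The present invention provides a weekly report information processing method and apparatus, so as to at least solve the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved.
According to one aspect of the embodiments of the present invention, a weekly report information processing method is provided, including: acquiring weekly report information from a specified web page; acquiring a specified instruction; and filtering out specified weekly report information from the weekly report information according to the specified instruction.
Optionally, after the weekly report information is acquired from the specified web page, the method includes: converting the encoding format of the weekly report information into a specified encoding format; and storing the weekly report information converted into the specified encoding format.
Optionally, acquiring the specified instruction includes: acquiring the specified instruction through pre-configured cleaning and comparison rules for the weekly report information.
Optionally, before the encoding format of the weekly report information is converted into the specified encoding format, the method includes: reading the weekly report information as a byte stream.
Optionally, acquiring the weekly report information from the specified web page includes: acquiring the weekly report information from the specified web page by means of an HTML tool.
According to another aspect of the embodiments of the present invention, a weekly report information processing apparatus is also provided, including: a first obtaining module configured to acquire weekly report information from a specified web page; a second obtaining module configured to acquire a specified instruction; and a processing module configured to filter out specified weekly report information from the weekly report information according to the specified instruction.
Optionally, the apparatus further includes: a conversion module configured to convert the encoding format of the weekly report information into a specified encoding format; and a storage module configured to store the weekly report information converted into the specified encoding format.
Optionally, the second obtaining module is further configured to acquire the specified instruction through pre-configured cleaning and comparison rules for the weekly report information.
Optionally, the apparatus further includes: a reading module configured to read the weekly report information as a byte stream.
Optionally, the first obtaining module is further configured to acquire the weekly report information from the specified web page by means of an HTML tool.
In the embodiments of the present invention, weekly report information is acquired from a specified web page; a specified instruction is acquired; and specified weekly report information is filtered out from the weekly report information according to the specified instruction. This solves the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved, thereby enabling automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages and meeting users' customization requirements.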
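Brief Description of the Drawings
The drawings described here are provided for further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a weekly report information processing method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a weekly report information processing apparatus according to an embodiment of the present invention;
FIG. 3 is a structural block diagram (1) of a weekly report information processing apparatus according to an embodiment of the present invention;
FIG. 4 is a structural block diagram (2) of a weekly report information processing apparatus according to an embodiment of the present invention;
FIG. 5 is a flowchart of automatically obtaining a customizable project management weekly report according to an embodiment of the present invention.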
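Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
This embodiment provides a weekly report information processing method. FIG. 1 is a flowchart of the weekly report information processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: acquire weekly report information from a specified web page;
Step S104: acquire a specified instruction;
Step S106: filter out specified weekly report information from the weekly report information according to the specified instruction.
Through the above steps, weekly-report-related information is first acquired from the specified web page, and the specified weekly report information required by the user is then filtered out of the acquired information according to the specified instruction input by the user. Compared with the related art, in which project weekly reports must be summarized and updated manually at regular intervals, this solves the problem that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved. It thereby enables automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages, and meets users' customization requirements.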
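Steps S102 to S106 can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the page layout (one table row per project), the field names `project`, `owner` and `status`, and the helper names are all assumptions made for the example, and the "specified instruction" is modelled simply as a dictionary of field values to match.

```python
from html.parser import HTMLParser


class ReportTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row being parsed, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def acquire_reports(html, fields=("project", "owner", "status")):
    """S102: extract weekly report records from the page markup."""
    parser = ReportTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(fields, row)) for row in parser.rows]


def filter_reports(reports, instruction):
    """S106: keep only the records that match every field in the instruction."""
    return [r for r in reports
            if all(r.get(k) == v for k, v in instruction.items())]


# Hypothetical page content standing in for the specified web page.
PAGE = """
<table>
  <tr><td>Alpha</td><td>Li</td><td>delayed</td></tr>
  <tr><td>Beta</td><td>Wang</td><td>on track</td></tr>
</table>
"""

reports = acquire_reports(PAGE)                  # S102
instruction = {"status": "delayed"}              # S104
selected = filter_reports(reports, instruction)  # S106
print(selected)
```

In a real deployment the page would be fetched over HTTP and the instruction would come from user input or configured rules; both are left out here to keep the sketch self-contained.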
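After the weekly report information is acquired from the specified web page, in an optional embodiment, the encoding format of the weekly report information is converted into a specified encoding format, and the weekly report information converted into the specified encoding format is stored. Further, the specified weekly report information is filtered out from the weekly report information that has been converted into the specified encoding format.
Step S104 above involves acquiring the specified instruction. It should be noted that the specified instruction can be acquired in several ways, which are illustrated below by example. In an optional embodiment, the specified instruction is acquired through pre-configured cleaning and comparison rules for the weekly report information. The cleaning and comparison rules describe whether the field type of a weekly report information field conforms to its definition, whether the field value conforms to its definition, whether the field may hold a value, whether the data is consistent, and so on.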
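One way to picture the cleaning and comparison rules is as a small set of per-field checks. The sketch below is an assumption-laden illustration rather than the rule format used by the application: `FieldRule`, `check_record` and `is_consistent` are hypothetical names, and the rule attributes (expected type, allowed values, whether a value is required) are one possible reading of the four criteria listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical cleaning rule for one weekly report field."""
    name: str
    type_: type                    # expected field type
    allowed: Optional[set] = None  # permitted values; None means any value
    required: bool = True          # whether the field must hold a value


def check_record(record, rules):
    """Return a list of rule violations found in one record."""
    problems = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None or value == "":
            if rule.required:
                problems.append(f"{rule.name}: missing value")
            continue
        if not isinstance(value, rule.type_):
            problems.append(f"{rule.name}: expected {rule.type_.__name__}")
        elif rule.allowed is not None and value not in rule.allowed:
            problems.append(f"{rule.name}: value {value!r} not allowed")
    return problems


def is_consistent(a, b, keys):
    """Comparison rule: two copies of a record agree on the given fields."""
    return all(a.get(k) == b.get(k) for k in keys)


rules = [
    FieldRule("project", str),
    FieldRule("progress", int),
    FieldRule("status", str, allowed={"on track", "delayed"}),
    FieldRule("remark", str, required=False),
]
record = {"project": "Alpha", "progress": "80", "status": "late"}
print(check_record(record, rules))
```

Records that produce no violations would pass cleaning; `is_consistent` could then compare, for example, the stored copy of a report with a freshly crawled one.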
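In an optional embodiment, before the encoding format of the weekly report information is converted into the specified encoding format, the weekly report information is read as a byte stream.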
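Reading as a byte stream and then converting the encoding can be sketched as follows. The source encoding (GBK) and target encoding (UTF-8) are assumptions chosen for illustration; the application does not name specific encodings, and `convert_encoding` is a hypothetical helper.

```python
import io


def convert_encoding(stream, source_encoding, target_encoding="utf-8",
                     chunk_size=4096):
    """Read raw bytes in chunks, then re-encode the text in the target encoding."""
    raw = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        raw.extend(chunk)
    # Decode only after all bytes are collected, so that a multi-byte
    # character split across two chunks is never decoded in halves.
    text = bytes(raw).decode(source_encoding)
    return text.encode(target_encoding)


# A BytesIO object stands in for the byte stream of a downloaded page.
page_bytes = "项目周报".encode("gbk")
converted = convert_encoding(io.BytesIO(page_bytes), "gbk")
print(converted.decode("utf-8"))
```

Working on bytes first keeps the acquisition step independent of whatever encoding each page declares; the text is normalized once, before storage and filtering.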
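Step S102 above involves acquiring the weekly report information from the specified web page. It should be noted that the weekly report information can be acquired from the specified web page in several ways, which are illustrated below by example. In an optional embodiment, the weekly report information is acquired from the specified web page by means of an HTML tool.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
This embodiment also provides a weekly report information processing apparatus. The apparatus is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term “module” may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.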
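FIG. 2 is a structural block diagram of a weekly report information processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a first obtaining module 22 configured to acquire weekly report information from a specified web page; a second obtaining module 24 configured to acquire a specified instruction; and a processing module 26 configured to filter out specified weekly report information from the weekly report information according to the specified instruction.
FIG. 3 is a structural block diagram (1) of a weekly report information processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a conversion module 32 configured to convert the encoding format of the weekly report information into a specified encoding format; and a storage module 34 configured to store the weekly report information converted into the specified encoding format.
Optionally, the second obtaining module 24 is further configured to acquire the specified instruction through pre-configured cleaning and comparison rules for the weekly report information.
FIG. 4 is a structural block diagram (2) of a weekly report information processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus further includes: a reading module 42 configured to read the weekly report information as a byte stream.
Optionally, the first obtaining module 22 is further configured to acquire the weekly report information from the specified web page by means of an HTML tool.
It should be noted that each of the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in multiple processors respectively.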
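Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: acquire weekly report information from a specified web page;
S2: acquire a specified instruction;
S3: filter out specified weekly report information from the weekly report information according to the specified instruction.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.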
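An object of the optional embodiment of the present invention is to provide an automated, intelligent data statistics and analysis application system and method that provides automatic collection, intelligent analysis, customized query, and flexible presentation of the weekly report data of the massive number of projects on web pages.
To achieve this object, an optional embodiment of the present invention provides an application system for automatically obtaining customizable project management weekly reports, including:
1. Information collection subsystem: constructs an efficient, automatic HTML acquisition tool to obtain the project weekly report content from web pages.
2. Data processing subsystem: extracts data from the collected data, reads it as a byte stream, and then converts it into a specified encoding format.
3. Data storage subsystem: stores all extracted data in a database. As the carrier of the data, it provides stable and efficient mass data storage and a data interface for applications to access.
4. Intelligent analysis subsystem: intelligently analyzes and filters the weekly report information synchronized from the database according to user-customizable rules. The system provides a visual interface for configuring cleaning and comparison rules for the data source, and supports adding, deleting, modifying, and querying those rules.
The automatic weekly report acquisition system batch-outputs the filtered project weekly report results in a user-customizable format. The system can provide an application interface, in the form of library functions and an API, for use by third-party platforms.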
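Batch output in a user-customizable format, exposed as a library function, might look like the sketch below. The function name `export_reports`, the choice of CSV and JSON as formats, and the sample fields are all assumptions for illustration; the application does not specify the output formats or the shape of the API.

```python
import csv
import io
import json


def export_reports(reports, fields, fmt="csv"):
    """Serialize the selected fields of each report as CSV or JSON text."""
    rows = [{f: r.get(f, "") for f in fields} for r in reports]
    if fmt == "json":
        return json.dumps(rows, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical filtered results.
sample = [{"project": "Alpha", "owner": "Li", "status": "delayed"}]
print(export_reports(sample, ["project", "status"]))
```

Letting the caller choose both the fields and the format is one simple way to make the output customizable; a third-party platform could call the same function directly.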
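FIG. 5 is a flowchart of automatically obtaining a customizable project management weekly report according to an embodiment of the present invention. As shown in FIG. 5, the process includes the following steps:
Step 1: Use Web-Harvest, an open-source Java-based Web extraction tool (web crawler), to collect the specified Web pages and extract the required data from them.
Step 2: Locate data according to the relative path of the surrounding content, selecting attributes that relate to the page content and are independent of its formatting.
Step 3: Map the HTML file to an XML file and construct a HashMap hash table in which each key corresponds to an XML tag and each value to the content of that tag.
Step 4: Store the processed data in the database. The data storage subsystem handles data definition, loading, storage, query, backup, and recovery.
Step 5: According to the mapping of the required key index field attributes that were input, perform data cleaning and data comparison under the same directory ID. The cleaning and data comparison rules describe whether the field type of a data source field conforms to its definition, whether the field value conforms to its definition, whether the field may hold a value, whether the data is consistent, and so on.
Step 6: According to the project-related keywords input by the user, batch-output the filtered, customized project weekly report results in a customizable format, so that the weekly report content can be updated automatically at regular intervals.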
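The tag-to-content hash table of Step 3 can be sketched in a few lines. The application names Java's HashMap and the Web-Harvest tool; the example below swaps in Python's built-in `dict` and the standard-library `xml.etree.ElementTree` parser purely for illustration, and the XML layout and the `xml_to_map` helper are assumptions.

```python
import xml.etree.ElementTree as ET


def xml_to_map(xml_text):
    """Build a hash table mapping each leaf tag name to its text content."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for element in root.iter():
        if len(element) == 0:  # leaf element: it has no child elements
            table[element.tag] = (element.text or "").strip()
    return table


# Hypothetical XML produced from one weekly report page.
REPORT_XML = """
<report>
  <project>Alpha</project>
  <owner>Li</owner>
  <status>delayed</status>
</report>
"""

print(xml_to_map(REPORT_XML))
```

Note that in this simplified form a repeated tag overwrites the earlier value; a page with several entries under the same tag would need a list per key or one table per record.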
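In summary, the automated, intelligent data statistics and analysis application system and method provided by the embodiments of the present invention enables automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages and meets users' customization requirements. The weekly report information of the projects to be viewed is obtained and updated automatically at regular intervals, which greatly shortens the time spent on periodic manual summarization and updating of thousands of project weekly reports and improves project management efficiency.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.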
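Industrial Applicability
In the embodiments of the present invention, weekly report information is acquired from a specified web page; a specified instruction is acquired; and specified weekly report information is filtered out from the weekly report information according to the specified instruction. This solves the problem in the related art that secondary processing of weekly report information, such as automatic acquisition and intelligent analysis, cannot be achieved, thereby enabling automatic collection and intelligent analysis of the weekly report data of the massive number of projects on web pages and meeting users' customization requirements.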

Claims (10)

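  1. A weekly report information processing method, comprising:
    acquiring weekly report information from a specified web page;
    acquiring a specified instruction;
    filtering out specified weekly report information from the weekly report information according to the specified instruction.
  2. The method according to claim 1, wherein, after the weekly report information is acquired from the specified web page, the method comprises:
    converting the encoding format of the weekly report information into a specified encoding format;
    storing the weekly report information converted into the specified encoding format.
  3. The method according to claim 1, wherein acquiring the specified instruction comprises:
    acquiring the specified instruction through pre-configured cleaning and comparison rules for the weekly report information.
  4. The method according to claim 2, wherein, before the encoding format of the weekly report information is converted into the specified encoding format, the method comprises:
    reading the weekly report information as a byte stream.
  5. The method according to any one of claims 1 to 4, wherein acquiring the weekly report information from the specified web page comprises:
    acquiring the weekly report information from the specified web page by means of an HTML tool.
  6. A weekly report information processing apparatus, the apparatus comprising:
    a first obtaining module configured to acquire weekly report information from a specified web page;
    a second obtaining module configured to acquire a specified instruction;
    a processing module configured to filter out specified weekly report information from the weekly report information according to the specified instruction.
  7. The apparatus according to claim 6, wherein the apparatus further comprises:
    a conversion module configured to convert the encoding format of the weekly report information into a specified encoding format;
    a storage module configured to store the weekly report information converted into the specified encoding format.
  8. The apparatus according to claim 6, wherein the second obtaining module is further configured to acquire the specified instruction through pre-configured cleaning and comparison rules for the weekly report information.
  9. The apparatus according to claim 7, wherein the apparatus further comprises:
    a reading module configured to read the weekly report information as a byte stream.
  10. The apparatus according to any one of claims 6 to 9, wherein the first obtaining module is further configured to acquire the weekly report information from the specified web page by means of an HTML tool.
PCT/CN2016/074245 2015-06-25 2016-02-22 Weekly report information processing method and apparatus WO2016206395A1 (zh)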

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510359653.3 2015-06-25
CN201510359653.3A CN106327039A (zh) 2015-06-25 2015-06-25 Weekly report information processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016206395A1 true WO2016206395A1 (zh) 2016-12-29

Family

ID=57584578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/074245 WO2016206395A1 (zh) 2015-06-25 2016-02-22 Weekly report information processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106327039A (zh)
WO (1) WO2016206395A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111103847A (zh) * 2019-12-31 2020-05-05 中国兵器装备集团自动化研究所 Analysis system and analysis method for real-time data streams of numerically controlled machine tools

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829729A (zh) * 2018-05-10 2018-11-16 河海大学常州校区 Method for parsing web pages and collecting news
CN109978511A (zh) * 2019-04-09 2019-07-05 艾伯资讯(深圳)有限公司 Project management inspection system and method based on web crawling

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
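CN103226599A (zh) * 2013-04-23 2013-07-31 翁杰 Method and system for accurately extracting web page content
CN103235827A (zh) * 2013-05-13 2013-08-07 济南政和科技有限公司 Method for automatic classification and screening of scientific and technological information
CN104281680A (zh) * 2014-09-30 2015-01-14 百度在线网络技术(北京)有限公司 Data processing system, method and apparatus for acquiring website resources
CN104281607A (zh) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Microblog hot topic analysis method
CN104537097A (zh) * 2015-01-09 2015-04-22 成都布林特信息技术有限公司 Microblog public opinion monitoring system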

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
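CN111103847A (zh) * 2019-12-31 2020-05-05 中国兵器装备集团自动化研究所 Analysis system and analysis method for real-time data streams of numerically controlled machine tools
CN111103847B (zh) * 2019-12-31 2023-01-24 中国兵器装备集团自动化研究所 Analysis system and analysis method for real-time data streams of numerically controlled machine tools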

Also Published As

Publication number Publication date
CN106327039A (zh) 2017-01-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16813511

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16813511

Country of ref document: EP

Kind code of ref document: A1