WO2017117912A1 - Data acquisition method, apparatus and device, and computer storage medium

Data acquisition method, apparatus and device, and computer storage medium

Info

Publication number
WO2017117912A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversion data
communication conversion
access log
website
communication
Prior art date
Application number
PCT/CN2016/084343
Other languages
French (fr)
Chinese (zh)
Inventor
吴明丹
王杨
叶峻
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2017117912A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Definitions

  • the present invention relates to the field of Internet application technologies, and in particular, to a data acquisition method, apparatus, device, and computer storage medium.
  • An Internet operator can provide a platform that includes entrances to a number of third-party websites. Through the platform, a user can directly obtain the entrance to a third-party website and then jump to that website; alternatively, the user can use, on the platform, a communication tool provided by a third-party website to communicate online with the customer service of that website.
  • The embodiments of the present invention provide a data acquisition method, apparatus, device, and computer storage medium, which are used to solve the problem that an Internet operator cannot obtain real communication conversion data on a communication tool.
  • An aspect of the embodiments of the present invention provides a data acquisition method, including:
  • receiving an access log sent by a website, the access log being generated by the website according to an operation performed by a user on a communication tool provided by the website;
  • the operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in the browser on the communication tool provided by the website;
  • the access log includes the uniform resource locator (URL) at which the user accessed the website and the page element that the user clicked.
  • the aspect described above and any possible implementation further provide an implementation in which obtaining real communication conversion data from the candidate communication conversion data includes:
  • the aspect described above and any possible implementation further provide an implementation in which determining, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data includes:
  • determining, according to the matching result, whether the candidate communication conversion data obtained according to the access log is real communication conversion data includes:
  • if the HTTP request matches the related request of the communication tool, determining that the candidate communication conversion data obtained from the access log is real communication conversion data.
  • An aspect of an embodiment of the present invention provides a data acquiring apparatus, including:
  • a receiving module configured to receive an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
  • a processing module configured to obtain candidate communication conversion data according to the access log;
  • an obtaining module configured to obtain real communication conversion data from the candidate communication conversion data.
  • the operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in the browser on the communication tool provided by the website;
  • the access log includes the uniform resource locator (URL) at which the user accessed the website and the page element that the user clicked.
  • the obtaining module is specifically configured to:
  • the obtaining module, when determining according to the HTTP request whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
  • the obtaining module, when determining according to the matching result whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
  • if the HTTP request matches the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is real communication conversion data.
  • The real communication conversion data on the communication tool can be obtained according to the access log sent by the website providing the communication tool. This solves the problem that the Internet operator cannot obtain the real communication conversion data on the communication tool, achieves the acquisition of real communication conversion data efficiently and simply, and allows resource delivery decisions to be made based on real communication conversion data.
  • When real communication conversion data is obtained, the candidate communication conversion data can be filtered to determine the real communication conversion data among it, which improves the accuracy of the real communication conversion data.
  • FIG. 1 is a schematic flowchart of a data acquisition method according to an embodiment of the present invention.
  • FIG. 2 is a functional block diagram of a data acquiring apparatus according to an embodiment of the present invention.
  • Depending on the context, the word “if” as used herein may be interpreted as “when”, “upon”, “in response to determining”, or “in response to detecting”.
  • Similarly, depending on the context, the phrase “if it is determined” or “if (a stated condition or event) is detected” may be interpreted as “when it is determined”, “in response to determining”, “when (the stated condition or event) is detected”, or “in response to detecting (the stated condition or event)”.
  • FIG. 1 is a schematic flowchart of a data acquisition method according to an embodiment of the present invention. As shown in the figure, the method includes the following steps:
  • S101: Receive an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website.
  • The technical solution of the embodiment of the present invention can be applied to a system including a website, a site, and a server where the site is located.
  • The website is a third-party website with respect to the site and the server, and the communication tool provided by the website has a corresponding entrance on the site.
  • A user of the website can perform operations on the communication tool on the site, and by doing so can make inquiries and obtain the required information.
  • the communication tool may include a communication tool in the form of a page on a browser such as Business Communication, 53 Customer Service, 51talk, QQ, or Music.
  • A monitoring module may be pre-installed in the website, and the monitoring module may automatically monitor the operations that users of the website perform on the communication tool on the website.
  • The monitoring module can be implemented in JavaScript (JS) code.
  • The JS code can be code for collecting website statistics.
  • The monitoring module may generate an access log according to the operation performed by the user on the communication tool provided by the website. For example, when the user clicks, in the browser, the button provided by the website for opening the communication tool, the monitoring module detects the click operation on the communication tool and then generates an access log from the Uniform Resource Locator (URL) at which the user visited the website and the page element clicked by the user.
  • The monitoring module may send the access log to the server where the site is located, so that the server receives the access log sent by the website.
  • The monitoring module may also store the generated access logs locally and then, at intervals, send all access logs accumulated over a period of time to the server.
  • Each access log may include the URL at which the user visited the website and the page element clicked by the user.
  • The page element clicked by the user can be the button, provided by the website, that the user clicks in the browser to open the communication tool.
  • The operation performed by the user on the communication tool provided by the website may be a click operation performed by the user in the browser on that communication tool.
  • After receiving the access log sent by the website, the server stores the access log and uses it as candidate communication conversion data. In this way, the server can obtain access logs from the monitoring module of each third-party website, thereby obtaining a large number of access logs.
  • The monitoring module can monitor and capture the URL at which the user visited the website and the page element the user clicked, and candidate communication conversion data is generated from this captured information.
  • However, candidate communication conversion data generated this way is not official data provided by the website where the communication tool is located. A button on a given URL may open the communication tool when clicked, but the URL and the button are both controlled by the third-party website.
  • The URL may later point to another page instead of the page that provides the communication tool entrance.
  • The button may likewise come to open some other tool when clicked, so candidate communication conversion data is not necessarily real communication conversion data.
  • Therefore, in the embodiment of the present invention, after the candidate communication conversion data is obtained, real communication conversion data must further be obtained from it.
  • the server may periodically obtain real communication conversion data from the candidate communication conversion data according to a preset period.
  • the method for the server to obtain the real communication conversion data from the candidate communication conversion data may include:
  • The server simulates, according to the access log, the operation performed by the user on the communication tool provided by the website, and after the simulation is completed obtains the Hypertext Transfer Protocol (HTTP) request returned by the website. The server then determines, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
  • The server may use a crawler tool to open the URL at which the user visited the website, and then use a simulation tool to simulate user behavior on the page corresponding to the opened URL, clicking the page element recorded in the access log, so as to simulate the operation the user performed on the communication tool provided by the website.
  • After the simulation is completed, the HTTP request returned by the website can be monitored and thereby obtained.
  • the method for the server to determine, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data may include, but is not limited to:
  • The server matches the HTTP request against a preset related request of the communication tool to obtain a matching result. Then, the server determines, according to the matching result, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
  • If the HTTP request does not match the related request of the communication tool, the server determines that the candidate communication conversion data obtained from the access log is not real communication conversion data.
  • If the HTTP request matches the related request of the communication tool, the server determines that the candidate communication conversion data obtained from the access log is real communication conversion data.
  • The server may preset the related request of the communication tool and then, after obtaining the HTTP request returned by the website providing the communication tool, match the HTTP request against the preset related request, for example by matching the format of the HTTP request. If the format of the HTTP request matches the format of the preset related request of the communication tool, the URL opened and the page element clicked before the HTTP request was obtained are considered to be a page that really provides the communication tool and an element that really opens it. The candidate communication conversion data on which the simulated operation was based is therefore determined to be real communication conversion data, and real communication conversion data is thereby obtained from the candidate communication conversion data.
  • Conversely, if the formats do not match, it can be determined that the candidate communication conversion data on which the simulated operation was based is not real communication conversion data, so non-real communication conversion data is identified among the candidate communication conversion data.
  • By performing this matching operation on each item of candidate communication conversion data, both real communication conversion data and non-real communication conversion data can be obtained from the candidate communication conversion data.
  • Embodiments of the present invention further provide an apparatus embodiment for implementing the steps and methods in the foregoing method embodiments.
  • FIG. 2 is a functional block diagram of a data acquiring apparatus according to an embodiment of the present invention. As shown, the device includes:
  • the receiving module 21 is configured to receive an access log sent by the website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
  • the processing module 22 is configured to obtain candidate communication conversion data according to the access log;
  • the obtaining module 23 is configured to obtain real communication conversion data from the candidate communication conversion data.
  • The operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in the browser on the communication tool provided by the website.
  • The access log includes the uniform resource locator (URL) at which the user accessed the website and the page element that the user clicked.
  • The obtaining module 23 is specifically configured to:
  • The obtaining module 23, when determining according to the HTTP request whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
  • The obtaining module 23, when determining according to the matching result whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
  • if the HTTP request matches the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is real communication conversion data.
  • A server receives an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website; the server then obtains candidate communication conversion data according to the access log; and the server further obtains real communication conversion data from the candidate communication conversion data.
  • The real communication conversion data on the communication tool can be obtained according to the access log sent by the website providing the communication tool. This solves the problem that the Internet operator cannot obtain the real communication conversion data on the communication tool, achieves the acquisition of real communication conversion data efficiently and simply, and allows resource delivery decisions to be made based on real communication conversion data.
  • When real communication conversion data is obtained, the candidate communication conversion data can be filtered to determine the real communication conversion data among it, which improves the accuracy of the real communication conversion data.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • The above integrated unit, when implemented in the form of a software functional unit, can be stored in a computer-readable storage medium.
  • The above software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform part of the steps of the methods of the various embodiments of the present invention.
  • The foregoing storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A data acquisition method, apparatus and device, and a computer storage medium. The method comprises: receiving an access log sent by a website, wherein the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website (S101); obtaining candidate communication conversion data according to the access log (S102); and then acquiring real communication conversion data from the candidate communication conversion data (S103). The method can therefore solve the problem that an Internet operator cannot obtain real communication conversion data on a communication tool.

Description

Data acquisition method, apparatus, device and computer storage medium
This application claims priority to the Chinese patent application filed on January 4, 2016, with application number 201610003715.1 and titled “A data acquisition method and apparatus”.
Technical field
The present invention relates to the field of Internet application technologies, and in particular to a data acquisition method, apparatus, device, and computer storage medium.
Background
At present, an Internet operator can provide a platform that includes entrances to a number of third-party websites. Through the platform, a user can directly obtain the entrance to a third-party website and then jump to that website; alternatively, the user can use, on the platform, a communication tool provided by a third-party website to communicate online with the customer service of that website.
To track users' online communication behavior, the Internet operator needs to obtain the users' real communication conversion data. In the prior art, however, only the third-party website that provides the communication tool can obtain the real communication conversion data, so the Internet operator cannot obtain the real communication conversion data on the communication tool.
Summary of the invention
In view of this, the embodiments of the present invention provide a data acquisition method, apparatus, device, and computer storage medium, which are used to solve the problem that an Internet operator cannot obtain real communication conversion data on a communication tool.
One aspect of the embodiments of the present invention provides a data acquisition method, including:
receiving an access log sent by a website, the access log being generated by the website according to an operation performed by a user on a communication tool provided by the website;
obtaining candidate communication conversion data according to the access log;
obtaining real communication conversion data from the candidate communication conversion data.
In the aspect described above and any possible implementation, an implementation is further provided in which the operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in the browser on the communication tool provided by the website;
the access log includes the uniform resource locator (URL) at which the user accessed the website and the page element clicked by the user.
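The access log and the candidate data derived from it can be pictured with a minimal sketch. The class names, the field names, and the use of a CSS-style selector for the clicked page element are assumptions made for illustration; the text only states that a log carries the visited URL and the clicked page element, and that each received log is kept as candidate communication conversion data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AccessLog:
    """One access log entry: where the user was and what the user clicked."""
    url: str           # URL at which the user accessed the website
    page_element: str  # assumed representation: a selector for the clicked element


@dataclass
class Candidate:
    """Candidate communication conversion data derived from one access log."""
    log: AccessLog
    is_real: Optional[bool] = None  # None means "not verified yet"


def to_candidates(logs: Iterable[AccessLog]) -> List[Candidate]:
    """Keep every received access log as one unverified candidate."""
    return [Candidate(log) for log in logs]
```

Keeping `is_real` unset until verification reflects that a candidate is only a captured click, not yet a confirmed conversion.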
In the aspect described above and any possible implementation, an implementation is further provided in which obtaining real communication conversion data from the candidate communication conversion data includes:
simulating, according to the access log, the operation performed by the user on the communication tool provided by the website, and after the simulation is completed, obtaining the Hypertext Transfer Protocol (HTTP) request returned by the website;
determining, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
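The simulation step can be sketched as replaying one access log against a browser-like object and collecting the requests observed afterwards. The `Browser` interface and the `FakeBrowser` stand-in below are hypothetical and exist only so the sketch runs without a network; a real deployment would put a crawler and a click-simulation tool behind the same three methods.

```python
from typing import Dict, List, Protocol, Tuple


class Browser(Protocol):
    """Hypothetical interface over a crawler plus a click-simulation tool."""
    def open(self, url: str) -> None: ...
    def click(self, page_element: str) -> None: ...
    def captured_requests(self) -> List[str]: ...


class FakeBrowser:
    """In-memory stand-in: maps (url, element) to the requests a click triggers."""
    def __init__(self, site: Dict[Tuple[str, str], List[str]]) -> None:
        self._site = site
        self._url = ""
        self._requests: List[str] = []

    def open(self, url: str) -> None:
        self._url = url
        self._requests = []  # start a fresh capture for each page load

    def click(self, page_element: str) -> None:
        self._requests.extend(self._site.get((self._url, page_element), []))

    def captured_requests(self) -> List[str]:
        return list(self._requests)


def simulate_click(browser: Browser, url: str, page_element: str) -> List[str]:
    """Replay one access log: open the URL, click the element, return observed requests."""
    browser.open(url)
    browser.click(page_element)
    return browser.captured_requests()
```

Passing the browser in as a parameter keeps the replay logic independent of whichever automation tool is actually used.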
In the aspect described above and any possible implementation, an implementation is further provided in which determining, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data includes:
matching the HTTP request against a preset related request of the communication tool to obtain a matching result;
determining, according to the matching result, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
In the aspect described above and any possible implementation, an implementation is further provided in which determining, according to the matching result, whether the candidate communication conversion data obtained according to the access log is real communication conversion data includes:
if the HTTP request does not match the related request of the communication tool, determining that the candidate communication conversion data obtained from the access log is not real communication conversion data; or
if the HTTP request matches the related request of the communication tool, determining that the candidate communication conversion data obtained from the access log is real communication conversion data.
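The matching decision can be illustrated with a small sketch that compares observed request URLs against preset formats. The regular expression, the example host, and the rule that one matching request is enough are all assumptions; the text says only that the HTTP request is matched against a preset related request of the communication tool, for example by format.

```python
import re
from typing import Iterable, Sequence

# Hypothetical preset formats for the requests a communication tool issues when opened.
PRESET_FORMATS: Sequence[str] = (
    r"https://chat\.example/im/open\?site=\d+",
)


def matches_tool_request(request_url: str, presets: Sequence[str] = PRESET_FORMATS) -> bool:
    """True if the whole request URL has the format of any preset related request."""
    return any(re.fullmatch(p, request_url) for p in presets)


def is_real_conversion(requests: Iterable[str], presets: Sequence[str] = PRESET_FORMATS) -> bool:
    """Assumed rule: the candidate is real if any observed request matches a preset format."""
    return any(matches_tool_request(r, presets) for r in requests)
```

A candidate whose simulated click produces no matching request, for instance because the button now opens a different tool, is classified as not real.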
One aspect of the embodiments of the present invention provides a data acquisition apparatus, including:
a receiving module, configured to receive an access log sent by a website, the access log being generated by the website according to an operation performed by a user on a communication tool provided by the website;
a processing module, configured to obtain candidate communication conversion data according to the access log;
an obtaining module, configured to obtain real communication conversion data from the candidate communication conversion data.
In the aspect described above and any possible implementation, an implementation is further provided in which the operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in the browser on the communication tool provided by the website;
the access log includes the uniform resource locator (URL) at which the user accessed the website and the page element clicked by the user.
In the aspect described above and any possible implementation, an implementation is further provided in which the obtaining module is specifically configured to:
simulate, according to the access log, the operation performed by the user on the communication tool provided by the website, and after the simulation is completed, obtain the Hypertext Transfer Protocol (HTTP) request returned by the website;
determine, according to the HTTP request, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
In the aspect described above and any possible implementation, an implementation is further provided in which the obtaining module, when determining according to the HTTP request whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
match the HTTP request against a preset related request of the communication tool to obtain a matching result;
determine, according to the matching result, whether the candidate communication conversion data obtained according to the access log is real communication conversion data.
In the aspect described above and any possible implementation, an implementation is further provided in which the obtaining module, when determining according to the matching result whether the candidate communication conversion data obtained according to the access log is real communication conversion data, is specifically configured to:
if the HTTP request does not match the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is not real communication conversion data; or
if the HTTP request matches the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is real communication conversion data.
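The three-module apparatus can be sketched as three small classes wired in sequence. The class and method names, the tuple representation of a log, and the injected `simulate` callable are assumptions for illustration, not the structure of the actual apparatus.

```python
import re
from typing import Callable, List, Tuple

Log = Tuple[str, str]  # assumed shape: (visited URL, clicked page element)


class ReceivingModule:
    """Receives access logs sent by websites."""
    def __init__(self) -> None:
        self.logs: List[Log] = []

    def receive(self, log: Log) -> None:
        self.logs.append(log)


class ProcessingModule:
    """Turns received access logs into candidate communication conversion data."""
    def candidates(self, logs: List[Log]) -> List[Log]:
        return list(logs)  # each stored log is used directly as one candidate


class ObtainingModule:
    """Keeps only the candidates whose simulated click yields a matching request."""
    def __init__(self, simulate: Callable[[str, str], List[str]], preset_format: str) -> None:
        self._simulate = simulate               # hypothetical replay function
        self._preset = re.compile(preset_format)

    def real(self, candidates: List[Log]) -> List[Log]:
        return [c for c in candidates
                if any(self._preset.fullmatch(r) for r in self._simulate(*c))]
```

Injecting `simulate` lets the obtaining module be exercised with a stub while the crawler and click-simulation tooling stay outside the sketch.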
As can be seen from the above technical solutions, the embodiments of the present invention have the following beneficial effects:
According to the technical solution provided by the embodiments of the present invention, the real communication conversion data on the communication tool can be obtained according to the access log sent by the website providing the communication tool. This solves the problem that the Internet operator cannot obtain the real communication conversion data on the communication tool, achieves the acquisition of real communication conversion data efficiently and simply, and allows resource delivery decisions to be made based on real communication conversion data. In addition, when real communication conversion data is obtained, the candidate communication conversion data can be filtered to determine the real communication conversion data among it, which improves the accuracy of the real communication conversion data.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a data acquisition apparatus according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Depending on the context, the word "if" as used herein may be interpreted as "when", "at the time of", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
An embodiment of the present invention provides a data acquisition method. Referring to FIG. 1, which is a schematic flowchart of the data acquisition method provided by an embodiment of the present invention, the method includes the following steps:
S101: Receive an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website.
Specifically, the technical solution of the embodiments of the present invention can be applied to a system that includes a website, a site, and the server hosting the site. Relative to the site and its server, the website is a third-party website. The communication tool provided by the website has a corresponding entry on the site, so a user of the website can operate the communication tool on the site, for example to ask questions and obtain the information they need.
For example, the communication tool may be a browser-page-based communication tool such as 商务通 (Shangwutong), 53客服 (53 Kefu), 51talk, QQ, or 乐语 (Leyu).
In a specific implementation, a monitoring module may be installed in the website in advance. The monitoring module can automatically monitor the operations that users of the website perform on the communication tool on the site.
In the embodiments of the present invention, the monitoring module may be implemented in JavaScript (JS) code. For example, the JS code may be code used to collect website statistics.
In a specific implementation, if a user operates the communication tool on the site, the monitoring module can generate an access log according to that operation. For example, when the user clicks, in a browser, the button provided by the website for opening the communication tool, the monitoring module detects the click and then generates an access log from the uniform resource locator (URL) of the page the user visited on the website and the page element the user clicked.
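The log-generation step above can be sketched as follows. This is only an illustration of the two-field record, written in Python to match the other sketches even though the description names JavaScript for the monitoring module. The names `AccessLog`, `build_access_log`, and `clicked_element`, the use of JSON, and the choice of a CSS selector to identify the clicked element are all assumptions, not part of the original text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AccessLog:
    """One access log holding the two fields named in the description."""
    url: str              # URL the user visited on the website
    clicked_element: str  # clicked page element (a CSS selector, by assumption)


def build_access_log(page_url: str, clicked_element: str) -> str:
    """Serialize one click on the communication-tool button as a JSON string."""
    return json.dumps(asdict(AccessLog(page_url, clicked_element)), sort_keys=True)
```

A record built this way carries exactly what the server later needs to replay the click: where the user was and what the user clicked.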
Further, after generating an access log, the monitoring module may send the access log to the server hosting the site, so that the server receives the access log sent by the website. Alternatively, the monitoring module may store the generated access logs locally and, at regular intervals, send all access logs accumulated during the interval to the server. Each access log may contain the URL the user visited on the website and the page element the user clicked.
For example, the page element clicked by the user may be the button, provided by the website, that the user clicks in the browser to open the communication tool. Correspondingly, the operation performed by the user on the communication tool provided by the website may be a click operation performed by the user in the browser on the communication tool provided by the website.
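The second delivery option described above, storing logs locally and sending them at intervals, might look like the following minimal sketch. `LogBuffer`, its `send` callback, and the explicit `now` timestamps are hypothetical; the description does not specify how the interval is measured or how logs are transmitted.

```python
from typing import Callable, Dict, List


class LogBuffer:
    """Holds access logs locally and sends them in one batch per interval."""

    def __init__(self, send: Callable[[List[Dict[str, str]]], None], interval_s: float) -> None:
        self._send = send              # hypothetical transport to the server
        self._interval_s = interval_s  # seconds between batch uploads
        self._pending: List[Dict[str, str]] = []
        self._last_flush = 0.0

    def add(self, log: Dict[str, str], now: float) -> None:
        """Store a log locally; flush if the interval has elapsed."""
        self._pending.append(log)
        if now - self._last_flush >= self._interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Send every log accumulated since the previous flush."""
        if self._pending:
            self._send(self._pending)
            self._pending = []
        self._last_flush = now
```

Passing the clock value in as `now` keeps the sketch deterministic; a real monitoring module would more likely rely on a timer.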
S102: Obtain candidate communication conversion data according to the access log.
Specifically, after receiving an access log sent by the website, the server stores the access log and treats it as one item of candidate communication conversion data. In this way, the server can receive access logs from the monitoring modules of the various third-party websites and thereby accumulate a large number of access logs.
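As a rough sketch of step S102, the server side can be pictured as a store in which every received access log becomes one candidate. The class name `CandidateStore`, the field names, and the rejection of incomplete logs are assumptions added for illustration; the description only says that each log is stored and treated as a candidate.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("url", "clicked_element")  # hypothetical field names


class CandidateStore:
    """Stores each received access log as one candidate conversion record."""

    def __init__(self) -> None:
        self._candidates: List[Dict[str, str]] = []

    def receive(self, access_log: Dict[str, str]) -> None:
        """Keep the log as a candidate; reject logs lacking a required field."""
        missing = [f for f in REQUIRED_FIELDS if f not in access_log]
        if missing:
            raise ValueError(f"access log is missing fields: {missing}")
        self._candidates.append(dict(access_log))

    def candidates(self) -> List[Dict[str, str]]:
        """Return a copy of all candidates collected so far."""
        return list(self._candidates)
```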
S103: Obtain real communication conversion data from the candidate communication conversion data.
Specifically, in the embodiments of the present invention, the monitoring module monitors and captures the URL the user visited on the website and the page element the user clicked, and candidate communication conversion data is generated from this captured information. However, candidate communication conversion data generated this way is not official data provided by the website hosting the communication tool. A page at a given URL may contain a button that opens the communication tool when clicked, but because the URL and the button are controlled by the third-party website, the URL may later point to a different page that no longer provides an entry to the communication tool, and the button may later open some other tool when clicked. Candidate communication conversion data is therefore not necessarily real communication conversion data. It is real only when the page at the URL the user visited provides an entry to the communication tool and the page element the user clicked is a button that opens the communication tool when clicked. For this reason, in the embodiments of the present invention, after the candidate communication conversion data is obtained, real communication conversion data must further be obtained from it.
In a specific implementation, the server may obtain real communication conversion data from the candidate communication conversion data periodically, according to a preset period.
For example, in the embodiments of the present invention, the method by which the server obtains real communication conversion data from the candidate communication conversion data may include:
First, the server simulates, according to the access log, the operation performed by the user on the communication tool provided by the website and, after the simulation is complete, obtains the Hypertext Transfer Protocol (HTTP) request returned by the website. Then, the server determines, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
In a specific implementation, because the access log contains the URL the user visited on the website and the page element the user clicked, the server can use a crawler tool to open that URL and then use a simulation tool to imitate user behavior on the opened page by clicking the page element recorded in the access log. This simulates the operation the user performed on the communication tool provided by the website.
It can be understood that, after the user's operation on the communication tool has been simulated, the server can listen for the HTTP request returned by the website and thereby obtain it.
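The replay described above (open the URL, click the recorded element, capture the resulting HTTP requests) can be sketched against an abstract browser interface. `BrowserDriver`, `replay_access_log`, and `FakeDriver` are hypothetical names; the description does not name a crawler or simulation tool, and a real deployment would presumably wrap a headless browser behind the same three methods.

```python
from typing import Dict, List, Protocol


class BrowserDriver(Protocol):
    """Hypothetical interface over whatever crawler/simulation tool is used."""
    def open(self, url: str) -> None: ...
    def click(self, element: str) -> None: ...
    def captured_requests(self) -> List[str]: ...


def replay_access_log(driver: BrowserDriver, log: Dict[str, str]) -> List[str]:
    """Replay one access log and return only the requests caused by the click."""
    driver.open(log["url"])
    already_seen = len(driver.captured_requests())  # ignore page-load traffic
    driver.click(log["clicked_element"])
    return driver.captured_requests()[already_seen:]


class FakeDriver:
    """In-memory stand-in so the sketch runs without a real browser."""

    def __init__(self, click_effects: Dict[str, str]) -> None:
        self._click_effects = click_effects  # element -> request URL it triggers
        self._requests: List[str] = []

    def open(self, url: str) -> None:
        self._requests.append(url)

    def click(self, element: str) -> None:
        if element in self._click_effects:
            self._requests.append(self._click_effects[element])

    def captured_requests(self) -> List[str]:
        return list(self._requests)
```

Counting the requests seen before the click is one way to separate page-load traffic from the request the button triggers; other strategies are equally plausible.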
For example, the method by which the server determines, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data may include, but is not limited to:
First, the server matches the HTTP request against a preset related request of the communication tool to obtain a matching result. Then, the server determines, according to the matching result, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
Further, if the matching result is that the HTTP request does not match the related request of the communication tool, the server determines that the candidate communication conversion data obtained from the access log is not real communication conversion data. Alternatively, if the matching result is that the HTTP request matches the related request of the communication tool, the server determines that the candidate communication conversion data obtained from the access log is real communication conversion data.
In a specific implementation, the related request of the communication tool may be preset in the server. After the HTTP request returned by the website providing the communication tool is obtained, it is matched against the preset related request, for example by comparing request formats. If the format of the HTTP request conforms to the format of the preset related request, the URL opened and the page element clicked before the HTTP request was obtained are considered to reflect a user genuinely visiting a page that provides the communication tool and opening the tool. The candidate communication conversion data on which this simulated operation was based can therefore be determined to be real communication conversion data, and real communication conversion data is thus obtained from the candidate communication conversion data.
Conversely, if the format of the HTTP request does not conform to the format of the preset related request, the URL opened and the page element clicked before the HTTP request was obtained are not considered to reflect a user visiting a page that provides the communication tool and opening the tool. The candidate communication conversion data on which this simulated operation was based can therefore be determined not to be real communication conversion data. By performing this matching operation for each item of candidate communication conversion data, both the real and the non-real communication conversion data can be identified among the candidates.
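The format-matching decision above can be illustrated with regular expressions. The pattern in `PRESET_TOOL_REQUEST_PATTERNS` and the `chat.example.com` request format are invented for the example; the description does not give the actual request format of any communication tool, nor does it say that regular expressions are used.

```python
import re
from typing import Dict, Iterable, List, Pattern, Tuple

# Hypothetical preset "related request" format for one communication tool.
PRESET_TOOL_REQUEST_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^https://chat\.example\.com/session\?site_id=\d+$"),
]


def matches_tool_request(request_url: str, patterns: Iterable[Pattern[str]]) -> bool:
    """True if the request conforms to any preset communication-tool format."""
    return any(p.search(request_url) for p in patterns)


def split_candidates(
    replayed: Iterable[Tuple[Dict[str, str], List[str]]],
    patterns: List[Pattern[str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split (candidate, captured requests) pairs into real and non-real data."""
    real: List[Dict[str, str]] = []
    not_real: List[Dict[str, str]] = []
    for candidate, requests in replayed:
        if any(matches_tool_request(r, patterns) for r in requests):
            real.append(candidate)      # replay opened the communication tool
        else:
            not_real.append(candidate)  # URL or button no longer leads to the tool
    return real, not_real
```

A candidate whose replay produces no request at all falls into the non-real group, which mirrors the "does not match" branch of the description.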
An embodiment of the present invention further provides an apparatus embodiment that implements the steps and methods of the foregoing method embodiment.
Referring to FIG. 2, which is a functional block diagram of the data acquisition apparatus provided by an embodiment of the present invention, the apparatus includes:
a receiving module 21, configured to receive an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
a processing module 22, configured to obtain candidate communication conversion data according to the access log; and
an obtaining module 23, configured to obtain real communication conversion data from the candidate communication conversion data.
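One way to picture the three modules listed above is as three small classes wired in sequence. This is a structural sketch only: the class and method names are hypothetical, and the `is_real` callback stands in for the replay-and-match check, which the description leaves to the obtaining module.

```python
from typing import Callable, Dict, Iterable, List

Log = Dict[str, str]


class ReceivingModule:
    """Counterpart of receiving module 21: accepts access logs from websites."""

    def __init__(self) -> None:
        self.logs: List[Log] = []

    def receive(self, access_log: Log) -> None:
        self.logs.append(access_log)


class ProcessingModule:
    """Counterpart of processing module 22: each log becomes one candidate."""

    def to_candidates(self, logs: Iterable[Log]) -> List[Log]:
        return [dict(log) for log in logs]


class ObtainingModule:
    """Counterpart of obtaining module 23: keeps candidates judged real."""

    def __init__(self, is_real: Callable[[Log], bool]) -> None:
        self._is_real = is_real  # hypothetical stand-in for replay plus matching

    def select_real(self, candidates: Iterable[Log]) -> List[Log]:
        return [c for c in candidates if self._is_real(c)]
```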
In a specific implementation, the operation performed by the user on the communication tool provided by the website includes a click operation performed by the user in a browser on the communication tool provided by the website.
In a specific implementation, the access log includes the uniform resource locator (URL) the user visited on the website and the page element the user clicked.
In a specific implementation, the obtaining module 23 is specifically configured to:
simulate, according to the access log, the operation performed by the user on the communication tool provided by the website and, after the simulation is complete, obtain the Hypertext Transfer Protocol (HTTP) request returned by the website; and
determine, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
In a specific implementation, the obtaining module 23, when determining according to the HTTP request whether the candidate communication conversion data obtained from the access log is real communication conversion data, is specifically configured to:
match the HTTP request against a preset related request of the communication tool to obtain a matching result; and
determine, according to the matching result, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
In a specific implementation, the obtaining module 23, when determining according to the matching result whether the candidate communication conversion data obtained from the access log is real communication conversion data, is specifically configured to:
if the HTTP request does not match the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is not real communication conversion data; or,
if the HTTP request matches the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is real communication conversion data.
Because the units in this embodiment can perform the method shown in FIG. 1, for the parts not described in detail in this embodiment, refer to the description of FIG. 1.
The technical solutions of the embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, the server receives an access log sent by a website, where the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website; the server then obtains candidate communication conversion data according to the access log; and the server then obtains real communication conversion data from the candidate communication conversion data.
According to the technical solutions provided by the embodiments of the present invention, real communication conversion data on a communication tool can be obtained from the access logs sent by the website that provides the communication tool. This solves the problem that an Internet operator cannot obtain real communication conversion data on the communication tool, makes acquiring that data efficient and simple, and allows resource placement decisions to be based on real communication conversion data.
In addition, when the real communication conversion data is obtained, the candidate communication conversion data can be screened to identify the real communication conversion data, which improves the accuracy of the real communication conversion data.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above may be understood by referring to the corresponding processes in the foregoing method embodiment, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiment described above is merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer apparatus (which may be a personal computer, a server, a network apparatus, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

  1. A data acquisition method, characterized in that the method comprises:
    receiving an access log sent by a website, wherein the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
    obtaining candidate communication conversion data according to the access log; and
    obtaining real communication conversion data from the candidate communication conversion data.
  2. The method according to claim 1, characterized in that:
    the operation performed by the user on the communication tool provided by the website comprises a click operation performed by the user in a browser on the communication tool provided by the website; and
    the access log comprises the uniform resource locator (URL) the user visited on the website and the page element the user clicked.
  3. The method according to claim 1, characterized in that obtaining real communication conversion data from the candidate communication conversion data comprises:
    simulating, according to the access log, the operation performed by the user on the communication tool provided by the website and, after the simulation is complete, obtaining the Hypertext Transfer Protocol (HTTP) request returned by the website; and
    determining, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
  4. The method according to claim 3, characterized in that determining, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data comprises:
    matching the HTTP request against a preset related request of the communication tool to obtain a matching result; and
    determining, according to the matching result, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
  5. The method according to claim 4, characterized in that determining, according to the matching result, whether the candidate communication conversion data obtained from the access log is real communication conversion data comprises:
    if the HTTP request does not match the related request of the communication tool, determining that the candidate communication conversion data obtained from the access log is not real communication conversion data; or,
    if the HTTP request matches the related request of the communication tool, determining that the candidate communication conversion data obtained from the access log is real communication conversion data.
  6. A data acquisition apparatus, characterized in that the apparatus comprises:
    a receiving module, configured to receive an access log sent by a website, wherein the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
    a processing module, configured to obtain candidate communication conversion data according to the access log; and
    an obtaining module, configured to obtain real communication conversion data from the candidate communication conversion data.
  7. The apparatus according to claim 6, characterized in that:
    the operation performed by the user on the communication tool provided by the website comprises a click operation performed by the user in a browser on the communication tool provided by the website; and
    the access log comprises the uniform resource locator (URL) the user visited on the website and the page element the user clicked.
  8. The apparatus according to claim 6, characterized in that the obtaining module is specifically configured to:
    simulate, according to the access log, the operation performed by the user on the communication tool provided by the website and, after the simulation is complete, obtain the Hypertext Transfer Protocol (HTTP) request returned by the website; and
    determine, according to the HTTP request, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
  9. The apparatus according to claim 8, characterized in that the obtaining module, when determining according to the HTTP request whether the candidate communication conversion data obtained from the access log is real communication conversion data, is specifically configured to:
    match the HTTP request against a preset related request of the communication tool to obtain a matching result; and
    determine, according to the matching result, whether the candidate communication conversion data obtained from the access log is real communication conversion data.
  10. The apparatus according to claim 9, characterized in that the obtaining module, when determining according to the matching result whether the candidate communication conversion data obtained from the access log is real communication conversion data, is specifically configured to:
    if the HTTP request does not match the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is not real communication conversion data; or,
    if the HTTP request matches the related request of the communication tool, determine that the candidate communication conversion data obtained from the access log is real communication conversion data.
  11. A device, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory, wherein the one or more programs, when executed by the one or more processors, perform the following operations:
    receiving an access log sent by a website, wherein the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
    obtaining candidate communication conversion data according to the access log; and
    obtaining real communication conversion data from the candidate communication conversion data.
  12. A computer storage medium encoded with a computer program, wherein the program, when executed by one or more computers, causes the one or more computers to perform the following operations:
    receiving an access log sent by a website, wherein the access log is generated by the website according to an operation performed by a user on a communication tool provided by the website;
    obtaining candidate communication conversion data according to the access log; and
    obtaining real communication conversion data from the candidate communication conversion data.
PCT/CN2016/084343 2016-01-04 2016-06-01 Data acquisition method, apparatus and device, and computer storage medium WO2017117912A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610003715.1 2016-01-04
CN201610003715.1A CN105701175B (en) 2016-01-04 2016-01-04 A kind of data capture method and device

Publications (1)

Publication Number Publication Date
WO2017117912A1 true WO2017117912A1 (en) 2017-07-13

Family

ID=56225965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/084343 WO2017117912A1 (en) 2016-01-04 2016-06-01 Data acquisition method, apparatus and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN105701175B (en)
WO (1) WO2017117912A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156634B (en) * 2016-07-13 2019-06-14 成都知道创宇信息技术有限公司 A method of identification Web program bug

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010123000A (en) * 2008-11-20 2010-06-03 Nippon Telegr & Teleph Corp <Ntt> Web page group extraction method, device and program
CN103067198A (en) * 2012-12-14 2013-04-24 北京集奥聚合科技有限公司 Method and system related to Cookie identity (ID) of user
CN103729380A (en) * 2012-10-16 2014-04-16 阿里巴巴集团控股有限公司 Data processing method, system and device
CN104715064A (en) * 2015-03-31 2015-06-17 北京奇虎科技有限公司 Method and server for marking keywords on webpage

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973749A (en) * 2013-02-05 2014-08-06 腾讯科技(深圳)有限公司 Cloud server and website processing method based on same
CN104579830B (en) * 2014-12-25 2018-05-25 小米科技有限责任公司 service monitoring method and device

Also Published As

Publication number Publication date
CN105701175B (en) 2017-11-07
CN105701175A (en) 2016-06-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16883053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16883053

Country of ref document: EP

Kind code of ref document: A1