CN106991175B - Customer information mining method, device, equipment and storage medium - Google Patents

Publication number
CN106991175B
Authority
CN
China
Prior art keywords
information
webpage
target
client
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710220155.XA
Other languages
Chinese (zh)
Other versions
CN106991175A (en
Inventor
齐海凤
彭长平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710220155.XA
Publication of CN106991175A
Application granted
Publication of CN106991175B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The embodiments of the invention disclose a method, an apparatus, a device and a storage medium for mining customer information. The method comprises: determining retrieval type information of a preset industry; performing retrieval through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from the retrieval results; determining a target website from the webpage information; and acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client. With this technical scheme, target clients can be accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is acquired to ensure the accuracy of subsequent marketing.

Description

Customer information mining method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of information processing, in particular to a client information mining method, a client information mining device, client information mining equipment and a storage medium.
Background
With the continuous development of electronic commerce, market competition is becoming increasingly fierce. In marketing, the competition for customer resources never stops; continuously mining potential customers and converting them into actual customers is of great significance for enterprises seeking to obtain more benefits and enhance their market competitiveness. A potential customer is a customer to be developed who has purchasing power and a need for a certain product or service. A sales cooperation opportunity exists between the potential customer and the enterprise, and the potential customer can be converted into an actual customer through the efforts of the enterprise and its sales personnel. Determining potential customers by mining customer information is therefore of great significance for expanding the marketing reach of enterprises.
Existing methods for acquiring potential customer information generally collect customer information through marketing activities, telephone consultation, customer interviews and the like, and then follow up with marketing. However, the quality of the customer information acquired in this way is uneven, some of it has no marketing value at all, and potential customers cannot be accurately identified.
Disclosure of Invention
The embodiment of the invention provides a client information mining method, a client information mining device, client information mining equipment and a storage medium, and aims to overcome the technical defect that the existing client information mining method cannot accurately lock potential clients.
In a first aspect, an embodiment of the present invention provides a method for mining customer information, including:
determining retrieval type information of a preset industry;
retrieving through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from a retrieval result;
determining a target website from the webpage information;
and acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
In a second aspect, an embodiment of the present invention further provides a client information mining apparatus, including:
the retrieval type determining module is used for determining retrieval type information of a preset industry;
the retrieval module is used for retrieving through a search engine according to the retrieval type information of the preset industry and capturing webpage information from a retrieval result;
the target website determining module is used for determining a target website from the webpage information;
and the information mining module is used for acquiring a candidate webpage set of the target website, filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology, and acquiring the portrait characteristics of the target client.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the client information mining method according to any one of the embodiments of the present invention when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the client information mining method according to any one of the embodiments of the present invention.
According to the technical scheme provided by the embodiments of the invention, retrieval is performed through a search engine according to the retrieval type information of a preset industry, webpage information is captured from the retrieval results, a target website is determined, and the portrait characteristic information of the target client is obtained by screening and analyzing the webpage set corresponding to the target website. With this technical scheme, target clients can be accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is acquired to ensure the accuracy of subsequent marketing.
Drawings
Fig. 1 is a schematic flowchart of a client information mining method according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of a client information mining method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a client information mining method according to a third embodiment of the present invention;
Fig. 4 is a schematic flowchart of a client information mining method according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a client information mining apparatus according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a schematic flowchart of a client information mining method according to a first embodiment of the present invention. The method of this embodiment may be executed by a client information mining apparatus, which may be implemented in hardware and/or software and may be integrated in a server or a terminal device having a client information mining function. The method is generally applicable to mining client information. As shown in Fig. 1, the method of this embodiment includes:
110. Determining retrieval type information of a preset industry.
For example, an industry generally refers to a category of economic activity divided according to the production of the same kind of products, the use of the same technological process, or the provision of the same kind of service, such as the food industry, the clothing industry, the machinery industry, the financial industry and the mobile internet industry; each major industry can be further divided into sub-industries. The preset industry may be any predetermined industry. The retrieval type information, also referred to as a retrieval formula, can be understood as the query condition that a user inputs into a search engine; it usually consists of input keywords and may also include search sentences. For example, the retrieval formulas corresponding to the flower industry may include keywords such as "flower shop", "flower express", "flower purchase" and "flower gift", and search sentences such as "what are the common flower express websites".
120. Performing retrieval through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from the retrieval results.
130. Determining a target website from the webpage information.
Illustratively, the search engine analyzes the retrieval type information and returns a large amount of webpage information, which can be screened according to preset rules to determine the target website.
140. Acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
For example, a content filtering technology can be understood as a technology for filtering webpages according to their content; webpage classification, webpage entity analysis, webpage industry filtering and the like all belong to content filtering technology. A customer portrait is a model of the target customer built on a series of real data. Taking e-commerce shopping as an example, an e-commerce company can construct an accurate consumption portrait for each customer by modeling, over a long period and on multiple occasions, the customer's individual consumption capacity, consumption content, consumption quality, consumption channels and consumption stimuli. In this embodiment, the webpages in the candidate webpage set may be screened and analyzed by a content filtering technology to select the webpages corresponding to the target client, and the portrait characteristics of the target client may be obtained by analyzing the information contained in those webpages.
Optionally, the portrait characteristics of the client corresponding to the target webpage may include: the industry to which the client entity belongs, the client's main products, and the client's contact information.
The massive webpage information indexed by a search engine is characterized by many index categories and disordered index content. For example, common index categories include portals, industry websites, transaction websites, forums and enterprise websites, while the index content suffers from diverse webpage templates, non-standard code, and various kinds of redundant information such as webpage advertisements. If potential customer information were mined by analyzing the massive webpage information indexed by the search engine page by page, the amount of computation would be very large.
According to the technical scheme provided by this embodiment, retrieval is performed through a search engine according to the retrieval type information of the preset industry, webpage information is captured from the retrieval results, the target website is determined, and the webpage set corresponding to the target website is further screened and analyzed to obtain the portrait characteristic information of the target client. By screening and analyzing the candidate webpage set, client information with marketing potential is extracted from the webpages, target clients are accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is obtained to ensure the accuracy of subsequent marketing.
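The four steps of this embodiment can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `mine_customer_info`, its parameters, and the `fake_*` stand-ins are all hypothetical and exist only so the flow of steps 120 to 140 can be run without a real search engine.

```python
from typing import Callable, Dict, Iterable, List

Page = Dict[str, str]


def mine_customer_info(
    queries: Iterable[str],
    search: Callable[[str], List[Page]],
    pick_target_sites: Callable[[List[Page]], List[str]],
    fetch_site_pages: Callable[[str], List[Page]],
    build_portrait: Callable[[List[Page]], Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Chain steps 120-140 for the retrieval formulas chosen in step 110."""
    results: List[Page] = []
    for query in queries:
        results.extend(search(query))          # step 120: retrieve and capture
    portraits: Dict[str, Dict[str, str]] = {}
    for site in pick_target_sites(results):    # step 130: determine target websites
        pages = fetch_site_pages(site)         # step 140: candidate webpage set
        portrait = build_portrait(pages)       # step 140: filter and analyze
        if portrait:
            portraits[site] = portrait
    return portraits


# Hypothetical stand-ins; a real system would call a search engine and a crawler.
def fake_search(query: str) -> List[Page]:
    return [{"url": "http://roses.example/", "title": query}]


def fake_pick(results: List[Page]) -> List[str]:
    return sorted({r["url"] for r in results})


def fake_fetch(site: str) -> List[Page]:
    return [{"url": site + "about", "text": "Fresh flowers, call 010-1234-5678"}]


def fake_portrait(pages: List[Page]) -> Dict[str, str]:
    return {"industry": "flowers", "pages_seen": str(len(pages))}


demo = mine_customer_info(
    ["flower shop", "flower express"],
    fake_search, fake_pick, fake_fetch, fake_portrait,
)
```

Passing each step in as a callable keeps the sketch independent of any particular search engine or filtering technique, which the text leaves open.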
Example two
Fig. 2 is a schematic flowchart of a client information mining method according to a second embodiment of the present invention. On the basis of the first embodiment, this embodiment optimizes the step of determining the retrieval type information of the preset industry. As shown in Fig. 2, the method of this embodiment includes:
210. Screening, according to historical display log information, all retrieval formula information searched within a preset time range as a candidate retrieval formula set.
For example, the historical display log information can be understood as all log information generated by user search behavior in the search engine. By analyzing the display log information within a certain time range before the current time, all retrieval formula information involved in user retrievals within that time range is obtained.
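Step 210 amounts to selecting the distinct queries logged inside a time window. The sketch below assumes each log record is a dictionary with hypothetical `time` and `query` fields; the source does not describe the log format, and the sample records are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def candidate_retrieval_formulas(
    log_records: Iterable[Dict],
    now: datetime,
    window: timedelta,
) -> List[str]:
    """Return the distinct queries logged within `window` before `now`."""
    start = now - window
    seen = set()
    candidates: List[str] = []
    for record in log_records:
        query = record["query"].strip()
        # Keep a query only if it falls in the window and was not seen before.
        if start <= record["time"] <= now and query and query not in seen:
            seen.add(query)
            candidates.append(query)
    return candidates


# Invented sample log records (assumed format).
now = datetime(2020, 4, 1)
logs = [
    {"time": datetime(2020, 3, 30), "query": "flower express"},
    {"time": datetime(2020, 1, 5), "query": "old query"},
    {"time": datetime(2020, 3, 31), "query": "flower express"},
    {"time": datetime(2020, 3, 31), "query": "flower gift"},
]
recent = candidate_retrieval_formulas(logs, now, timedelta(days=7))
```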
220. Classifying all the retrieval formula information in the candidate retrieval formula set by a short text classification technology, based on the correspondence between industries and keywords in a keyword promotion system, and screening out the retrieval formula information of the preset industry.
Illustratively, in a keyword promotion system, clients bid for and purchase keywords in a search engine, which forms a correspondence between clients and keywords; from this, the relationship between the industry to which a client belongs and the keywords can be established. A short text generally refers to a text of relatively short length: a single short text is usually only a few dozen bytes in size and contains from several to a few dozen words. The retrieval formulas in this embodiment are short texts, and the short text classification technology can be understood as a technology for classifying retrieval formulas. In this embodiment, a text classification model can be trained on the correspondence between industries and keywords in the keyword promotion system to classify short texts, obtain the industry information corresponding to each retrieval formula, and then screen out the retrieval formula information of the specified industry according to that industry information.
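To make the idea concrete, the sketch below replaces the trained classification model with a much simpler keyword-hit count over an industry-to-keyword mapping. This is a deliberate simplification and an assumption: the source does not specify the model, and the names `classify_query`, `select_industry_queries`, and the sample mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def classify_query(query: str, industry_keywords: Dict[str, List[str]]) -> Optional[str]:
    """Assign a query to the industry whose keywords it mentions most often."""
    text = query.lower()
    scores: Counter = Counter()
    for industry, keywords in industry_keywords.items():
        for keyword in keywords:
            if keyword.lower() in text:
                scores[industry] += 1
    if not scores:
        return None  # no keyword matched: the industry is unknown
    return scores.most_common(1)[0][0]


def select_industry_queries(
    queries: Iterable[str],
    industry_keywords: Dict[str, List[str]],
    preset_industry: str,
) -> List[str]:
    """Keep only the queries classified into the preset industry."""
    return [q for q in queries
            if classify_query(q, industry_keywords) == preset_industry]


# Invented mapping standing in for the keyword promotion system's data.
mapping = {
    "flowers": ["flower", "bouquet"],
    "machinery": ["excavator", "lathe"],
}
picked = select_industry_queries(
    ["flower express", "buy lathe parts", "weather today"], mapping, "flowers"
)
```

A trained short text classifier would generalize to queries that contain none of the purchased keywords, which this counting stand-in cannot do.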
230. Performing retrieval through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from the retrieval results.
240. Determining a target website from the webpage information.
250. Acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
According to the technical scheme provided by this embodiment, the retrieval formula set within the preset time range is screened out from the historical display log information, and the retrieval formulas in the set are then screened according to the short text classification technology and the correspondence between industries and keywords to determine the retrieval formula information of the preset industry. This ensures that retrieval formulas closely matching the preset industry are obtained, and therefore ensures the accuracy of the webpages retrieved with them.
Example three
Fig. 3 is a schematic flowchart of a client information mining method according to a third embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment optimizes the step of determining the target website from the webpage information. As shown in Fig. 3, the method of this embodiment includes:
310. Determining retrieval type information of a preset industry.
320. Performing retrieval through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from the retrieval results.
330. Analyzing the webpage information and extracting identification information of the webpages.
The identification information includes at least one of title information, summary information, and address information of the webpage.
For example, a web crawler may be used to crawl with the determined retrieval type information of the preset industry, obtain and parse the returned pages, and extract their identification information, such as the titles of the top N results, the summary information of the pages, and the address information of the pages, that is, the URL information.
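As an illustration of step 330, the sketch below extracts a title, a summary, and the URL from one page using Python's standard `html.parser` module. Taking the summary from the `<meta name="description">` tag is an assumption made here; the source does not say where the summary comes from, and the function name `extract_identification` is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict


class _IdentificationParser(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.summary = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.summary = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_identification(url: str, html: str) -> Dict[str, str]:
    """Return the title, summary and address information of a page."""
    parser = _IdentificationParser()
    parser.feed(html)
    return {"title": parser.title.strip(),
            "summary": parser.summary.strip(),
            "url": url}


sample_html = (
    "<html><head><title> Rose Express </title>"
    '<meta name="description" content="Same-day flower delivery">'
    "</head><body></body></html>"
)
info = extract_identification("http://roses.example/", sample_html)
```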
340. Performing data cleaning on the identification information according to a preset filtering rule to obtain the target website.
For example, data cleaning is performed on the title, summary information and URL information of the webpages according to preset filtering rules. For instance, strong rule filtering based on the main domain may be applied to the URLs: the URL information includes domain name information, so common websites such as portal websites and blogs can be filtered out according to the domain name in the URL, which can be understood as setting a URL blacklist.
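A main-domain blacklist of this kind can be sketched with the standard `urllib.parse` module. The blacklist entries below are placeholders, and `main_domain` naively takes the last two labels of the host name, which is an assumption that fails for multi-part suffixes such as `.com.cn`; a production system would need a public-suffix list.

```python
from typing import Iterable, List, Set
from urllib.parse import urlsplit

# Placeholder blacklist; the real list of portals and blogs is not given in the source.
BLACKLISTED_MAIN_DOMAINS: Set[str] = {"portal.example", "blog.example"}


def main_domain(url: str) -> str:
    """Naively reduce a URL to the last two labels of its host name."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def drop_blacklisted(urls: Iterable[str],
                     blacklist: Set[str] = BLACKLISTED_MAIN_DOMAINS) -> List[str]:
    """Keep only URLs whose main domain is not on the blacklist."""
    return [u for u in urls if main_domain(u) not in blacklist]


kept = drop_blacklisted([
    "http://news.portal.example/a",
    "https://www.roses.example/shop",
    "http://me.blog.example/post",
])
```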
For example, corresponding strong rule filtering may also be applied to the title and summary information of the webpages, for example filtering out websites whose titles contain unwanted information such as 58.com (58 Tongcheng), Baidu Zhidao, or Baidu Baike. When the title and summary information are filtered with strong rules, a rule vocabulary may be preset. Alternatively, a white list may be set, so that the websites matching the set rule vocabulary are retained after the strong rule filtering.
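The title and summary rules can be sketched as a plain substring check against a rule vocabulary. The function name, the parameter names, and the sample terms are hypothetical; the source does not give the actual vocabulary or matching method.

```python
from typing import Iterable


def passes_text_rules(
    title: str,
    summary: str,
    blocked_terms: Iterable[str],
    allowed_terms: Iterable[str] = (),
) -> bool:
    """Apply blocked terms first, then an optional white list."""
    text = f"{title} {summary}".lower()
    if any(term.lower() in text for term in blocked_terms):
        return False  # an unwanted term appears in the title or summary
    allowed = list(allowed_terms)
    if allowed:
        # With a white list, keep the page only if it matches one of its terms.
        return any(term.lower() in text for term in allowed)
    return True


blocked = ["encyclopedia", "q&a"]  # placeholder rule vocabulary
ok = passes_text_rules("Rose Express", "Same-day flower delivery", blocked)
rejected = passes_text_rules("Roses - Online Encyclopedia", "", blocked)
```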
350. Acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
According to the technical scheme of this embodiment, after candidate webpage information is captured according to the retrieval formulas, the candidate webpages can be screened by extracting their identification information. Experimental analysis shows that about 70% of irrelevant URL information can be eliminated by main domain filtering, and about 20% of irrelevant webpage information can be further filtered out by strong rule filtering on webpage titles and summaries. This greatly narrows the range of candidate webpages and helps ensure the accuracy of the target clients subsequently identified, so that target clients can be accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is acquired to ensure the accuracy of subsequent marketing.
Example four
Fig. 4 is a schematic flowchart of a client information mining method according to a fourth embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment optimizes the step of acquiring the portrait characteristics of the target client. As shown in Fig. 4, the method of this embodiment includes:
410. Determining retrieval type information of a preset industry.
420. Performing retrieval through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from the retrieval results.
430. Determining a target website from the webpage information.
440. Accessing the target website based on the address information of the target website to obtain all the webpages in the target website as a candidate webpage set.
For example, after the target website is determined, the URL of the target website is searched in a search engine to obtain the pages of the target website. The target website may include a plurality of webpages; for example, a flower express website may include webpages such as "About us", "Corporate group purchase", "Customer service" and "Shopping cart". All webpages included in the target website are taken as the candidate webpage set.
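One way to approximate a candidate webpage set is to collect the same-host links found on a page of the target website. This is an assumption for illustration only: the source obtains the pages by searching the site's URL in a search engine, and the function name `same_site_links` is hypothetical. The sketch works on an HTML string and performs no network access.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlsplit


class _LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(base_url: str, html: str) -> List[str]:
    """Return distinct absolute links that stay on the host of base_url."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlsplit(base_url).hostname
    pages: List[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments before comparing.
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlsplit(absolute).hostname == host and absolute not in pages:
            pages.append(absolute)
    return pages


home = (
    '<a href="/about">About us</a>'
    '<a href="cart.html">Shopping cart</a>'
    '<a href="http://other.example/x">Partner</a>'
    '<a href="/about#team">Team</a>'
)
candidate_pages = same_site_links("http://roses.example/", home)
```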
450. Filtering the webpages contained in the candidate webpage set according to a webpage classification technology to obtain the target webpage.
For example, the webpage classification technology classifies webpages according to webpage information, and different classification standards may apply in different application scenarios. For instance, webpages may be classified along dimensions such as the update scheduling period of the webpage, the presentation form of the webpage, the type of website, and the main content of the webpage. Along the presentation form dimension, webpages can be classified into categories such as pictures, videos and text; along the website type dimension, the corresponding webpages can be classified into forum, navigation, e-commerce and other categories. In this embodiment, the webpages in the candidate webpage set may be classified along the website type dimension, and websites of the platform type, the navigation type and the like are filtered out according to the classification result to obtain the target webpage.
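The filtering in step 450 can be sketched as dropping pages whose predicted website type is excluded. The classifier below is a toy keyword rule standing in for a real webpage classification model, which the source does not specify; the type labels, trigger phrases, and function names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

EXCLUDED_SITE_TYPES: Set[str] = {"platform", "navigation"}


def toy_site_type(page: Dict[str, str]) -> str:
    """Toy stand-in for a webpage classifier (assumed keyword rules)."""
    text = page["text"].lower()
    if "directory of sites" in text:
        return "navigation"
    if "post your listing" in text:
        return "platform"
    return "enterprise"


def keep_target_pages(
    pages: Iterable[Dict[str, str]],
    classify: Callable[[Dict[str, str]], str] = toy_site_type,
    excluded: Set[str] = EXCLUDED_SITE_TYPES,
) -> List[Dict[str, str]]:
    """Drop pages whose website type is in the excluded set."""
    return [page for page in pages if classify(page) not in excluded]


sample_pages = [
    {"url": "http://roses.example/about", "text": "We grow and deliver roses."},
    {"url": "http://nav.example/", "text": "A directory of sites about flowers."},
    {"url": "http://market.example/", "text": "Post your listing for free."},
]
targets = keep_target_pages(sample_pages)
```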
460. Analyzing the target webpage according to a webpage entity technology to acquire the portrait characteristics of the target client.
Illustratively, the webpage entity technology refers to a technology for extracting the entity information of the client corresponding to a webpage. Entity analysis is performed on the candidate webpages remaining after the filtering in step 450 to obtain the corresponding client portrait characteristic information, such as the industry to which the client entity belongs, the client's main products, the client's enterprise scale, and the client's contact information.
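As a narrow illustration of entity extraction, the sketch below pulls only contact information (e-mail addresses and phone-like digit strings) out of page text with regular expressions. The patterns are simplified assumptions rather than the patent's entity technology, and extracting the industry or main products would need a richer model.

```python
import re
from typing import Dict, List

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\- ]{6,}\d")


def extract_contacts(text: str) -> Dict[str, List[str]]:
    """Return the e-mail addresses and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
    }


page_text = "Rose Express Ltd. Contact: sales@roses.example or call 010-1234-5678."
contacts = extract_contacts(page_text)
```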
According to the technical scheme provided by this embodiment, after the target website is screened out from the retrieved webpages, the webpages of the target website are further filtered by the webpage classification technology to lock onto closely matching potential client webpages, and the filtered webpages are analyzed by the webpage entity technology to obtain the portrait characteristics of the client. In this way, client information with high marketing value is mined, target clients are accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is acquired to ensure the accuracy of subsequent marketing.
Further, according to the client entity industry information extracted by the entity analysis in step 460, industry information filtering is applied again to the candidate webpages remaining after the filtering in step 450, and irrelevant webpages among them are eliminated to obtain the final target client webpages and the portrait characteristic information of the target clients. Screening the filtered candidate webpages once more by the industry information of the webpages makes the obtained potential client webpage information more accurate and further ensures that potential clients are accurately identified.
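The second industry pass can be sketched as a filter over the already analyzed pages. The record layout with an `industry` field and the function name `refilter_by_industry` are assumptions for illustration; exact string matching is used here for simplicity.

```python
from typing import Dict, Iterable, List


def refilter_by_industry(
    analyzed_pages: Iterable[Dict[str, str]],
    preset_industry: str,
) -> List[Dict[str, str]]:
    """Keep pages whose extracted entity industry equals the preset industry."""
    wanted = preset_industry.strip().lower()
    return [
        page for page in analyzed_pages
        if page.get("industry", "").strip().lower() == wanted
    ]


analyzed = [
    {"url": "http://roses.example/about", "industry": "Flowers"},
    {"url": "http://tools.example/about", "industry": "machinery"},
    {"url": "http://unknown.example/"},
]
final_pages = refilter_by_industry(analyzed, "flowers")
```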
Example five
Fig. 5 is a schematic structural diagram of a client information mining apparatus according to a fifth embodiment of the present invention. The apparatus may be implemented in software and/or hardware, may be integrated in a server or a terminal device with a client information mining function, and can mine target clients by executing the client information mining method. As shown in Fig. 5, the apparatus includes: a retrieval type determining module 510, a retrieval module 520, a target website determining module 530, and an information mining module 540, wherein:
the retrieval type determining module 510 is configured to determine retrieval type information of a preset industry;
the retrieval module 520 is configured to perform retrieval through a search engine according to the retrieval type information of the preset industry, and capture webpage information from the retrieval results;
the target website determining module 530 is configured to determine a target website from the webpage information;
and the information mining module 540 is configured to acquire a candidate webpage set of the target website, filter and analyze the webpages in the candidate webpage set according to a content filtering technology, and acquire the portrait characteristics of the target client.
According to the technical scheme provided by this embodiment, retrieval is performed through a search engine according to the retrieval type information of the preset industry, webpage information is captured from the retrieval results, the target website is determined, and the webpage set corresponding to the target website is further screened and analyzed to obtain the portrait characteristic information of the target client. By screening and analyzing the candidate webpage set, client information with marketing potential is extracted from the webpages, target clients are accurately identified from massive amounts of information, the number of clients is increased, and the portrait characteristic information of the clients is obtained to ensure the accuracy of subsequent marketing.
On the basis of the above embodiments, the retrieval type determining module 510 may include:
a retrieval formula set acquisition unit, configured to screen, according to historical display log information, all retrieval formula information searched within a preset time range as a candidate retrieval formula set;
and a retrieval formula determining unit, configured to classify all retrieval formula information in the candidate retrieval formula set by a short text classification technology, based on the correspondence between industries and keywords in a keyword promotion system, and screen out the retrieval formula information of the preset industry.
On the basis of the foregoing embodiments, the target website determining module 530 may include:
a webpage information extraction unit, configured to analyze the webpage information and extract identification information of the webpages, where the identification information includes at least one of title information, summary information, and address information of the webpage;
and a target website determining unit, configured to perform data cleaning on the identification information according to a preset filtering rule to obtain the target website.
On the basis of the above embodiments, the information mining module 540 may include:
a web page set obtaining unit, configured to access the target website based on the address information of the target website to obtain all web pages in the target website as a candidate web page set;
the target webpage determining unit is used for filtering the webpages contained in the candidate webpage set according to a webpage classification technology to obtain the target webpage;
and the information mining unit is used for analyzing the target webpage according to a webpage entity technology and acquiring the portrait characteristics of the target client.
On the basis of the above embodiments, the portrait characteristics of the client corresponding to the target webpage include: the industry to which the client entity belongs, the client's main products, and the client's contact information.
The client information mining device can execute the client information mining method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executed client information mining method.
Example six
Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 6, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, such as implementing the client information mining method provided by the embodiments of the present invention, by executing programs stored in the system memory 28.
Namely: the processing unit implements, when executing the program: determining retrieval type information of a preset industry; retrieving through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from a retrieval result; determining a target website from the webpage information; and acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
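The four steps listed above can be read as a simple pipeline. The Python sketch below is only an illustration under stated assumptions, not the program of the invention: the function name `mine_customers` and its four callable parameters (`search`, `pick_sites`, `crawl`, `extract`) are hypothetical placeholders for the search engine retrieval, the target website determination, the candidate webpage acquisition, and the content filtering and analysis, none of whose interfaces are specified in the text.

```python
from typing import Callable, Iterable


def mine_customers(
    queries: Iterable[str],
    search: Callable[[str], list],
    pick_sites: Callable[[list], list],
    crawl: Callable[[str], list],
    extract: Callable[[list], dict],
) -> dict:
    """Run the steps in order and return one portrait per target website."""
    # Step 1 is assumed to be done already: `queries` holds the retrieval
    # type information determined for the preset industry.
    # Step 2: retrieve with each query and collect the captured result entries.
    results = []
    for query in queries:
        results.extend(search(query))
    # Step 3: determine the target websites from the captured webpage information.
    sites = pick_sites(results)
    # Step 4: acquire the candidate webpage set of each site, then filter and
    # analyse it to obtain the portrait characteristics.
    return {site: extract(crawl(site)) for site in sites}
```

Passing the stages in as callables keeps the sketch independent of any particular search engine or crawler; in a test, each stage can be replaced by a small stub.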
EXAMPLE seven
A seventh embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the client information mining method according to any of the embodiments of the present invention.
Namely, the program, when executed by a processor, implements: determining retrieval type information of a preset industry; retrieving through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from a retrieval result; determining a target website from the webpage information; and acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
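Two of these steps, determining the retrieval type information and determining the target website, are detailed further in claims 2 and 3. A minimal Python sketch of both follows, with everything beyond the claimed wording being an assumption: the table `INDUSTRY_KEYWORDS` is a hypothetical stand-in for the industry-to-keyword correspondence of the keyword promotion system, keyword overlap is a simple substitute for the short text classification technology, and `BLOCKED_HOSTS` together with the empty-title check is an invented example of a preset filtering rule.

```python
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical industry -> keyword table (format not described in the text).
INDUSTRY_KEYWORDS = {
    "machinery": {"pipe", "valve", "pump"},
    "education": {"course", "tutoring"},
}

# Hypothetical filtering rule: hosts never treated as client websites.
BLOCKED_HOSTS = {"forum.example.com"}


def screen_queries(log, start, end, industry):
    """Select logged queries within [start, end] that match the industry."""
    keywords = INDUSTRY_KEYWORDS[industry]
    selected = []
    for timestamp, query in log:
        if not (start <= timestamp <= end):
            continue
        # Keyword overlap stands in for short text classification.
        if keywords & set(query.lower().split()) and query not in selected:
            selected.append(query)
    return selected


def clean_sites(results):
    """Reduce result entries (title, summary, url) to deduplicated site hosts."""
    sites = []
    for entry in results:
        host = urlparse(entry["url"]).netloc.lower()
        # Drop entries with no host, no title, or a blocked host.
        if not host or not entry.get("title") or host in BLOCKED_HOSTS:
            continue
        if host not in sites:
            sites.append(host)
    return sites
```

Here `log` is assumed to be a list of `(datetime, query)` pairs taken from the history display log, and each result entry is assumed to be a dictionary with `title`, `summary` and `url` keys.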
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A method for mining customer information, comprising:
determining retrieval type information of a preset industry;
retrieving through a search engine according to the retrieval type information of the preset industry, and capturing webpage information from a retrieval result; the index categories of the retrieval result comprise portals, industry websites, transaction websites, forums and enterprise websites;
determining a target website from the webpage information;
and acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client.
2. The method of claim 1, wherein determining the retrieval type information of the preset industry comprises:
screening all retrieval type information retrieved within a preset time range as a candidate retrieval type set according to the history display log information;
and classifying all the retrieval type information in the candidate retrieval type set by a short text classification technology based on the corresponding relation between the industry and the keywords in the keyword popularization system, and screening out the retrieval type information of the preset industry.
3. The method of claim 1, wherein determining a target website from the web page information comprises:
analyzing the webpage information, and extracting identification information of the webpage, wherein the identification information comprises at least one of title information, summary information and address information of the webpage;
and carrying out data cleaning on the identification information according to a preset filtering rule to obtain the target website.
4. The method of claim 1, wherein acquiring a candidate webpage set of the target website, and filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology to acquire the portrait characteristics of the target client comprises:
accessing the target website based on the address information of the target website to obtain all webpages in the target website as a candidate webpage set;
filtering the web pages contained in the candidate web page set according to a web page classification technology to obtain a target web page;
and analyzing the target webpage according to a webpage entity technology to acquire the portrait characteristics of the target client.
5. The method of any of claims 1-4, wherein the portrait characteristics of the client corresponding to the target webpage comprise: the business to which the client entity belongs, the client's main products, and the client's contact information.
6. An apparatus for mining customer information, comprising:
the retrieval type determining module is used for determining retrieval type information of a preset industry;
the retrieval module is used for retrieving through a search engine according to the retrieval type information of the preset industry and capturing webpage information from a retrieval result; the index categories of the retrieval result comprise portals, industry websites, transaction websites, forums and enterprise websites;
the target website determining module is used for determining a target website from the webpage information;
and the information mining module is used for acquiring a candidate webpage set of the target website, filtering and analyzing the webpages in the candidate webpage set according to a content filtering technology, and acquiring the portrait characteristics of the target client.
7. The apparatus of claim 6, wherein the retrieval type determining module comprises:
the retrieval type set acquisition unit is used for screening all retrieval type information retrieved within a preset time range as a candidate retrieval type set according to the history display log information;
and the retrieval type determining unit is used for classifying all retrieval type information in the candidate retrieval type set through a short text classification technology based on the corresponding relation between the industry and the keywords in the keyword popularization system, and screening out the retrieval type information of the preset industry.
8. The apparatus of claim 6, wherein the target website determining module comprises:
the webpage information extraction unit is used for analyzing the webpage information and extracting identification information of a webpage, wherein the identification information comprises at least one of title information, summary information and address information of the webpage;
and the data cleaning unit is used for cleaning the data of the identification information according to a preset filtering rule to obtain the target website.
9. The apparatus of claim 6, wherein the information mining module comprises:
a web page set obtaining unit, configured to access the target website based on the address information of the target website to obtain all web pages in the target website as a candidate web page set;
the target webpage determining unit is used for filtering the webpages contained in the candidate webpage set according to a webpage classification technology to obtain target webpages;
and the information mining unit is used for analyzing the target webpage according to a webpage entity technology and acquiring the portrait characteristics of the target client.
10. The apparatus of any one of claims 6-9, wherein the portrait characteristics of the client corresponding to the target webpage comprise: the business to which the client entity belongs, the client's main products, and the client's contact information.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710220155.XA 2017-04-06 2017-04-06 Customer information mining method, device, equipment and storage medium Active CN106991175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710220155.XA CN106991175B (en) 2017-04-06 2017-04-06 Customer information mining method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710220155.XA CN106991175B (en) 2017-04-06 2017-04-06 Customer information mining method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN106991175A CN106991175A (en) 2017-07-28
CN106991175B true CN106991175B (en) 2020-08-11

Family

ID=59414845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710220155.XA Active CN106991175B (en) 2017-04-06 2017-04-06 Customer information mining method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN106991175B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563715A (en) * 2017-07-19 2018-01-09 天津云脉三六五科技有限公司 Foreign trade set-off marketing system and method
CN107679895A (en) * 2017-09-21 2018-02-09 深圳市傲天科技股份有限公司 Screen the method, apparatus and computer-readable recording medium of targeted customer
CN108596683A (en) * 2018-05-03 2018-09-28 新奥(中国)燃气投资有限公司 A kind of potential customers' information acquisition method and device
CN108683949B (en) * 2018-05-18 2021-11-02 北京奇艺世纪科技有限公司 Method and device for extracting potential users of live broadcast platform
CN109344336A (en) * 2018-12-25 2019-02-15 北京时光荏苒科技有限公司 Searching method, search set creation method, device, medium, terminal and server
CN110083623B (en) * 2019-03-12 2023-10-17 中国平安人寿保险股份有限公司 Business rule generation method and device
CN112148957A (en) * 2019-06-26 2020-12-29 北京百度网讯科技有限公司 Webpage access data analysis method, device and equipment and readable storage medium
CN112417251A (en) * 2020-11-30 2021-02-26 华能大理风力发电有限公司 Transaction information retrieval method and device based on wind power bidding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216825A (en) * 2007-12-29 2008-07-09 朱廷劭 Indexing key words extraction/ prediction method, on-line advertisement recommendation method and device
CN101645155A (en) * 2008-08-08 2010-02-10 陈列生 Network marketing method
CN103324708A (en) * 2013-06-18 2013-09-25 哈尔滨工程大学 Method of transfer learning from long text to short text
CN105488697A (en) * 2015-12-09 2016-04-13 焦点科技股份有限公司 Potential customer mining method based on customer behavior characteristics
CN105824833A (en) * 2015-01-07 2016-08-03 苏宁云商集团股份有限公司 Keyword recommendation method and system based on user behavior feedback

Also Published As

Publication number Publication date
CN106991175A (en) 2017-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant