CN111353084A - Yellow page information acquisition method and device and electronic equipment - Google Patents

Publication number
CN111353084A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811583912.0A
Other languages
Chinese (zh)
Inventor
张勇攀
周楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a yellow page information acquisition method and device and electronic equipment. The method comprises the following steps: inquiring the telephone number to be inquired through a search engine to obtain a plurality of webpages containing the telephone number to be inquired; extracting yellow page information to be selected from a plurality of webpages through a pre-trained recognition model; for any yellow page information to be selected, performing weighted calculation according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighted result; and determining the yellow page information to be selected with the maximum weighting result as the yellow page information of the telephone number to be inquired. According to the embodiment of the application, the purpose of determining the yellow page information with high accuracy and reliability from a large amount of disordered search data is achieved, so that a user can acquire the yellow page information with high accuracy and reliability matched with the inquired telephone number through one-step quick inquiry of the search engine, and the inquiry experience of the user is improved.

Description

Yellow page information acquisition method and device and electronic equipment
Technical Field
The application relates to the technical field of internet, in particular to a yellow page information acquisition method and device and electronic equipment.
Background
With the development of internet technology, various information can be acquired in a network. When people want to acquire yellow page information of one telephone number, the yellow page information can be acquired by searching through the network.
In the related art, yellow page information of a phone number is generally searched by a search engine, and the search engine returns a series of web pages related to the searched phone number, and then a user views and selects the yellow page information required by the user from a large amount of data.
However, the inventor of the present application found that the yellow page information returned in this way is generally voluminous and disorganized, and its accuracy cannot be guaranteed, which makes it difficult for the user to choose.
Disclosure of Invention
The application provides a yellow page information acquisition method, a yellow page information acquisition device and electronic equipment, which are used for acquiring accurate yellow page information from a large amount of disordered search data.
According to a first aspect, an embodiment of the present application provides a yellow page information obtaining method, including:
inquiring the telephone number to be inquired through a search engine to obtain a plurality of webpages containing the telephone number to be inquired;
extracting yellow page information to be selected from a plurality of webpages through a pre-trained recognition model;
for any yellow page information to be selected, performing weighted calculation according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighted result;
and determining the yellow page information to be selected with the maximum weighting result as the yellow page information of the telephone number to be inquired.
In one possible implementation, the pre-trained recognition model includes any one of:
a pre-trained named entity recognition model;
a regular expression set obtained by training on labeled data.
In a possible implementation manner, before performing weighted calculation on any yellow page information to be selected according to the confidence weight of the source website where at least one webpage corresponding to the yellow page information is located, the method further includes:
determining confidence weights for the respective source websites.
In one possible implementation, determining confidence weights for each source website includes:
searching a plurality of sample telephone numbers of known yellow page information through a search engine to obtain a plurality of to-be-judged webpages including each sample telephone number;
extracting yellow page information to be judged from each webpage to be judged respectively through a pre-trained named entity recognition model;
carrying out similarity calculation on any yellow page information to be judged and corresponding known yellow page information;
and determining the confidence coefficient weight of the source website of the webpage to be judged corresponding to any yellow page information to be judged according to the similarity determined by calculation.
In a possible implementation manner, the similarity calculation between any yellow page information to be judged and corresponding known yellow page information includes:
respectively carrying out word segmentation on the yellow page information to be judged and the corresponding known yellow page information to obtain the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information;
and carrying out similarity calculation on the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information.
In one possible implementation, the feature information includes:
personal or business name, phone, address.
In one possible implementation, the method further includes:
and filtering the acquired yellow page information.
According to a second aspect, an embodiment of the present application provides a yellow page information acquiring apparatus, including:
the query module is used for querying the telephone number to be queried by a user through a search engine to obtain a plurality of webpages containing the telephone number to be queried;
the extraction module is used for extracting yellow page information to be selected from a plurality of webpages obtained by the query module through a pre-trained recognition model;
the weighting calculation module is used for carrying out weighting calculation on any yellow page information to be selected extracted by the extraction module according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighting result;
and the determining module is used for determining the yellow page information to be selected with the maximum weighting result obtained by the weighting calculating module as the yellow page information of the telephone number to be inquired.
In one possible implementation, the pre-trained recognition model includes any one of:
a pre-trained named entity recognition model;
a regular expression set obtained by training on labeled data.
In one possible implementation, the apparatus further includes: and the confidence coefficient weight determining module is used for determining the confidence coefficient weight of each source website before carrying out weighted calculation on any yellow page information to be selected extracted by the extracting module according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located.
In one possible implementation, the confidence weight determination module includes:
the search unit is used for searching a plurality of sample telephone numbers of the known yellow page information through a search engine to obtain a plurality of to-be-judged webpages including the sample telephone numbers;
the extraction unit is used for respectively extracting yellow page information to be judged from each webpage to be judged, which is obtained by the search unit, through a pre-trained named entity recognition model;
the similarity calculation unit is used for calculating the similarity between any yellow page information to be judged extracted by the extraction unit and corresponding known yellow page information;
and the confidence coefficient weight determining unit is used for determining the confidence coefficient weight of the source website of the webpage to be judged corresponding to any yellow page information to be judged according to the similarity calculated and determined by the similarity calculating unit.
In one possible implementation, the similarity calculation unit includes:
the word segmentation processing subunit is used for respectively carrying out word segmentation processing on the yellow page information to be judged and the corresponding known yellow page information to acquire the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information;
and the similarity calculation subunit is used for calculating the similarity between the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information.
In one possible implementation, the feature information includes:
personal or business name, phone, address.
In one possible implementation, the apparatus further includes:
and the filtering module is used for filtering the acquired yellow page information.
According to a third aspect, embodiments of the present application provide an electronic device, including:
a processor, a memory, and a bus;
a bus for connecting the processor and the memory;
a memory for storing operating instructions;
and the processor is used for executing the yellow page information acquisition method shown in the first aspect or any implementation manner thereof by calling the operation instruction.
According to a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used for storing computer instructions, and when the computer instructions are executed on a computer, the computer may execute the yellow page information obtaining method shown in the first aspect or any implementation manner thereof.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
Compared with the prior art, the embodiment of the application queries the telephone number to be queried through a search engine to obtain a plurality of webpages containing that number; extracts yellow page information to be selected from the webpages through a pre-trained recognition model; performs, for any yellow page information to be selected, a weighted calculation according to the confidence weight of the source website where at least one corresponding webpage is located, to obtain a weighted result; and finally determines the yellow page information to be selected with the largest weighted result as the yellow page information of the telephone number to be queried. In other words, the confidence weight of the source website serves as the criterion for judging the reliability and accuracy of yellow page information, and the candidate with the highest weighted result is taken as the yellow page information of the telephone number to be queried. Yellow page information with higher accuracy and reliability can thus be determined from a large amount of disordered search data, so that a user obtains it with a single quick query through the search engine, which improves the user's query experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a yellow page information obtaining method according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a method for determining confidence weights for various source websites according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a yellow page information acquiring apparatus according to another embodiment of the present application;
Fig. 4 is a schematic structural diagram of a confidence weight determining module according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to yet another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more of them.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The application provides a yellow page information acquisition method, a yellow page information acquisition device, an electronic device and a computer readable storage medium, and aims to solve the technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a yellow page information obtaining method, as shown in fig. 1, the method includes steps S101 to S104:
Step S101, querying the telephone number to be queried through a search engine to obtain a plurality of webpages containing the telephone number to be queried.
For the embodiment of the present application, when a user wants to obtain yellow page information of a phone number, the phone number may be queried through a search engine, for example, an AA search engine. The spider crawlers of the AA search engine can crawl websites automatically in advance, in compliance with the robots protocol, and store the website data on a distributed storage platform such as HBase. When the AA search engine receives a user's query request containing the telephone number to be queried, it searches the distributed storage platform according to the request and obtains a plurality of webpages containing that telephone number.
Step S102, extracting yellow page information to be selected from the plurality of webpages through a pre-trained recognition model.
For the embodiment of the application, when yellow page information to be selected needs to be extracted from multiple webpages, that is, when the webpages need to be parsed, the system pulls the corresponding data from HBase and extracts the telephone number and its corresponding yellow page information using the trained named entity recognition model or the regular expression set automatically generated from the labeled data.
For the embodiment of the application, the pre-trained recognition model is the model output after training a named entity recognition algorithm (a conditional random field, a hidden Markov model, and the like) or a regular expression set on a high-performance server, using existing labeled yellow page data (namely a specific telephone number, the webpages found for it, and the yellow page information included in those webpages).
For the embodiment of the application, the purpose of extracting yellow page information from a plurality of webpages based on the input telephone numbers can be realized through the pre-trained recognition model.
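As an illustration only, the regular-expression variant of step S102 might be sketched as follows. The two patterns, the page layout, and the function name extract_candidates are hypothetical; the application describes a regular expression set generated from labeled data and does not disclose concrete patterns.

```python
import re

# Hypothetical pattern set. In the application these would be generated
# from labeled data; the two layouts below are assumptions for illustration.
PATTERNS = [
    re.compile(
        r"(?P<name>[^|\n]+?)\s*\|\s*Tel:\s*(?P<phone>[\d-]+)"
        r"\s*\|\s*Addr:\s*(?P<address>[^|\n]+)"
    ),
    re.compile(
        r"Phone\s+(?P<phone>[\d-]+)\s+belongs to\s+"
        r"(?P<name>[^,\n]+),\s*(?P<address>[^\n]+)"
    ),
]


def extract_candidates(page_text, phone):
    """Return candidate yellow page records on the page that match the phone."""
    wanted = re.sub(r"\D", "", phone)  # compare on digits only
    results = []
    for pattern in PATTERNS:
        for match in pattern.finditer(page_text):
            if re.sub(r"\D", "", match.group("phone")) == wanted:
                results.append({
                    "name": match.group("name").strip(),
                    "phone": wanted,
                    "address": match.group("address").strip(),
                })
    return results


page = (
    "Acme Hardware | Tel: 010-5555-0100 | Addr: 12 Main St\n"
    "Phone 01055550100 belongs to Acme Hardware Co, 12 Main Street"
)
found = extract_candidates(page, "010 5555 0100")
```

Here both lines of the sample page yield a candidate record for the queried number, which is the situation the later weighting step is meant to resolve.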
Step S103, carrying out weighted calculation on any yellow page information to be selected according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located, and obtaining a weighted result.
For the embodiment of the application, at least one webpage corresponding to any yellow page information to be selected exists. And carrying out weighted calculation on any yellow page information to be selected according to the confidence weights of the source websites where each webpage in at least one webpage is located to obtain a weighted result aiming at any yellow page information to be selected.
Step S104, determining the yellow page information to be selected with the largest weighting result as the yellow page information of the telephone number to be queried.
For the embodiment of the application, the larger the weighting result is, the higher the confidence coefficient is, and therefore the yellow page information to be selected with the largest weighting result is the yellow page information with the highest confidence coefficient that can be determined according to the embodiment of the application. Through the steps, the yellow page information with the highest accuracy and reliability can be provided for the user.
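Steps S103 and S104 can be sketched as below. The application does not fix the weighting formula, so this sketch assumes the weighted result of a candidate is the sum of the confidence weights of the source websites of all webpages that yielded it. The function name select_yellow_page, the default weight for unknown sites, and the sample data are hypothetical.

```python
from collections import defaultdict


def select_yellow_page(candidates, site_weights, default_weight=0.1):
    """Pick the candidate with the largest weighted result.

    candidates: list of (yellow_page_info, source_site) pairs, one per webpage.
    site_weights: mapping from source site to its confidence weight.
    Assumption: the weighted result is the sum of the source-site weights.
    """
    scores = defaultdict(float)
    for info, site in candidates:
        scores[info] += site_weights.get(site, default_weight)
    best = max(scores, key=scores.get)
    return best, dict(scores)


candidates = [
    ("Acme Hardware, 12 Main St", "gov.example"),
    ("Acme Hardware, 12 Main St", "forum.example"),
    ("ACME Loans", "spam.example"),
]
site_weights = {"gov.example": 0.9, "forum.example": 0.4, "spam.example": 0.2}
best, scores = select_yellow_page(candidates, site_weights)
```

With these sample weights, the candidate supported by two websites accumulates a larger weighted result than the one found only on the low-confidence site, so it is selected.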
Compared with the prior art, in the embodiment of the application, the telephone number to be inquired is inquired through the search engine, and a plurality of webpages containing the telephone number to be inquired are obtained; then extracting yellow page information to be selected from a plurality of webpages through a pre-trained recognition model; then, for any yellow page information to be selected, carrying out weighted calculation according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighted result; and finally, determining the yellow page information to be selected with the largest weighting result as the yellow page information of the telephone number to be queried, namely, in the embodiment of the application, the confidence weight of the source website of the yellow page information is used as a judgment standard for judging the reliability and the accuracy of the yellow page information, and the yellow page information with the highest confidence weighting result of the source website is used as the yellow page information of the telephone number to be queried, so that the purpose of determining the yellow page information with higher accuracy and reliability from a large amount of disordered search data is achieved, and a user can obtain the yellow page information with higher accuracy and reliability matched with the telephone number to be queried through one-step quick query of a search engine, thereby improving the query experience of the user.
In another possible implementation manner of the embodiment of the application, the pre-trained recognition model includes any one of the following:
a pre-trained named entity recognition model;
a regular expression set obtained by training on labeled data.
So-called Named Entities (NEs) are names of people, organizations, places, and all other entities identified by names.
Optionally, the named entity recognition model in the embodiment of the present application may be trained with a statistical method on a manually labeled corpus; labeling the corpus does not require extensive linguistic knowledge and can be completed in a short time. Methods based on statistical machine learning mainly include: Hidden Markov Models (HMMs), Maximum Entropy (ME) models, Support Vector Machines (SVMs), Conditional Random Fields (CRFs), and the like.
A Regular Expression (RE) is a pattern of matching character strings, and may be used to check whether a string contains a certain substring, replace the matching substring, or extract a substring that meets a certain condition from a certain string.
Optionally, a large amount of training may be performed on the named entity recognition model or the regular expression set according to a large amount of known phone numbers, a web page including phone numbers, and known yellow page information, and a model is output, so as to obtain a pre-trained named entity recognition model and a regular expression set obtained by training according to labeled data.
Compared with the prior art, the phone number and the corresponding yellow page information are extracted through the pre-trained named entity recognition model and the regular expression set obtained through training according to the labeled data, and the accuracy of analysis is improved.
In another possible implementation manner of the embodiment of the present application, before performing, in step S103, weighted calculation on information of any yellow page to be selected according to a confidence weight of a source website where at least one webpage corresponding to the information of the yellow page to be selected is located, the method further includes:
determining confidence weights for the respective source websites.
Alternatively, the confidence level of the source website (described in detail below) may be calculated by a model, or preset according to the type of the source website. For example, a higher confidence level may be preset for a government website, or a higher confidence level may be preset for an official website of a business or other organization, while a lower confidence level may be preset for a general website.
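The preset option described above could look like the following sketch. The three website types, the numeric weights, the host-suffix rule for government sites, and the OFFICIAL_HOSTS list are all assumptions made for illustration; the application only states that government and official websites may receive higher preset confidence than general websites.

```python
from urllib.parse import urlparse

# Hypothetical preset weights per website type.
PRESET_WEIGHTS = {"government": 1.0, "official": 0.8, "general": 0.3}

# Hypothetical list of hosts known to be official sites of organizations.
OFFICIAL_HOSTS = {"www.acme.example"}


def classify_site(url):
    """Classify a source website by host name (illustrative rules only)."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".gov.cn") or host.endswith(".gov"):
        return "government"
    if host in OFFICIAL_HOSTS:
        return "official"
    return "general"


def preset_weight(url):
    """Look up the preset confidence weight for the website of a page."""
    return PRESET_WEIGHTS[classify_site(url)]
```

For example, under these assumed rules preset_weight("https://www.beijing.gov.cn/a") returns the government weight, while an unlisted host falls back to the general weight.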
Compared with the prior art, the confidence coefficient weight of each source website is determined, so that the confidence coefficient weight can be directly used for weighting calculation when the confidence coefficient of the yellow page information is judged, and a simple and easy way with high reliability for judging the confidence coefficient of the yellow page information is provided.
Another possible implementation manner of the embodiment of the present application, as shown in fig. 2, a method for determining confidence weights of source websites includes steps S201 to S204:
Step S201, searching, through a search engine, for a plurality of sample phone numbers with known yellow page information to obtain a plurality of to-be-judged webpages containing each sample phone number.
Step S202, yellow page information to be judged is respectively extracted from each webpage to be judged through a pre-trained named entity recognition model.
Optionally, other pre-trained models may be selected to extract yellow page information to be determined from each web page to be determined, respectively. The process of pre-training the named entity recognition model is similar to the description in step S102, and is not repeated here.
Step S203, carrying out similarity calculation on any yellow page information to be judged and the corresponding known yellow page information.
Optionally, the similarity between any yellow page information to be judged and the corresponding known yellow page information can be calculated based on any method such as Hamming distance, Euclidean distance, or cosine distance. The embodiment of the application does not limit the similarity calculation method.
Alternatively, the similarity may be a numerical value between 0 and 1. For example, when two pieces of information are completely dissimilar, the similarity is 0; when the two pieces of information are completely identical, the similarity is 1.
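One way to obtain a similarity in the range 0 to 1 is cosine similarity over token counts, sketched below. The choice of cosine similarity on bag-of-words vectors and the function name cosine_similarity are assumptions; the application leaves the similarity method open.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using token counts.

    Counts are non-negative, so the result lies between 0 and 1:
    0 when no token is shared, 1 when the count vectors are parallel.
    """
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record is treated as completely dissimilar
    return dot / (norm_a * norm_b)
```

Identical token lists give a value of 1 up to floating-point rounding, and lists with no token in common give 0, matching the two boundary cases described above.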
Step S204, determining, according to the calculated similarity, the confidence weight of the source website where the to-be-judged webpage corresponding to any yellow page information to be judged is located.
Optionally, the confidence weight of the source website where the to-be-determined webpage corresponding to any to-be-determined yellow page information is located is determined according to the similarity determined by calculation, and may be determined according to a corresponding relationship between the similarity and the confidence weight, for example, the similarity 0.4 corresponds to the confidence weight 0.4.
Optionally, the confidence level weight of the source website where the to-be-determined webpage corresponding to any to-be-determined yellow page information is located is determined according to the similarity determined by calculation, and may be determined according to the correspondence between the similarity range and the confidence level weight, for example, the similarity (0.9-1) corresponds to the confidence level weight 1, and the similarity (0.8-0.9) corresponds to the confidence level weight 0.9.
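The two correspondences described for step S204 can be combined in a short sketch. Several points are assumptions not stated in the application: the lower bound of each range is treated as inclusive, similarities below the listed ranges fall back to the direct correspondence (similarity 0.4 gives weight 0.4), and the similarities of several sample pages from the same website are averaged before mapping. The names BANDS, weight_from_similarity, and site_confidence_weights are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (lower bound, weight) pairs in descending order, following the example
# ranges in the text: 0.9-1 -> 1, 0.8-0.9 -> 0.9.
BANDS = [(0.9, 1.0), (0.8, 0.9)]


def weight_from_similarity(similarity, bands=BANDS):
    """Map a similarity to a confidence weight.

    Assumption: lower bounds are inclusive, and values below every band
    use the direct correspondence (weight equals similarity).
    """
    for lower, weight in bands:
        if similarity >= lower:
            return weight
    return similarity


def site_confidence_weights(observations):
    """observations: iterable of (source_site, similarity) pairs.

    Assumption: similarities observed for one site are averaged first.
    """
    per_site = defaultdict(list)
    for site, similarity in observations:
        per_site[site].append(similarity)
    return {
        site: weight_from_similarity(mean(values))
        for site, values in per_site.items()
    }


weights = site_confidence_weights([
    ("a.example", 0.95),
    ("a.example", 0.92),
    ("b.example", 0.4),
])
```

In this sample, the first website averages above 0.9 and receives weight 1, while the second keeps the direct weight 0.4.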
Compared with the prior art, the confidence coefficient weight of the website is obtained by performing similarity calculation on the yellow page information to be judged and the known yellow page information, and a reliable and convenient mode is provided for the calculation of the confidence coefficient weight of the website.
In another possible implementation manner of this embodiment of the present application, in step S203, performing similarity calculation between any piece of yellow page information to be determined and corresponding piece of known yellow page information includes: respectively carrying out word segmentation on the yellow page information to be judged and the corresponding known yellow page information to obtain the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information; and carrying out similarity calculation on the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information.
Optionally, the word segmentation processing performed on the yellow page information to be judged and the corresponding known yellow page information may be dictionary-based character string matching (for example, a proximity matching algorithm, a bidirectional maximum matching method, and the like), or may be statistics-based word segmentation (for example, a hidden Markov model).
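As a simplified illustration of dictionary-based segmentation, the sketch below implements forward maximum matching only, which is one of the two passes that a bidirectional maximum matching method would compare. The dictionary contents and the function name forward_max_match are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by repeatedly taking the longest dictionary word.

    Scans left to right; at each position it tries the longest window
    first and falls back to a single character if nothing matches.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionary for illustration.
dictionary = {"北京", "奇虎", "科技", "有限公司"}
tokens = forward_max_match("北京奇虎科技有限公司", dictionary)
```

For this input the company name is split into four dictionary words; characters not covered by the dictionary would be emitted one at a time.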
Optionally, after word segmentation processing, the yellow page information to be determined and the feature information of the corresponding known yellow page information can be obtained.
Optionally, the characteristic information of the yellow page information may include: personal or business name, phone, address.
Optionally, the feature information of the yellow page information may further include: nature of industry, web site, fax, zip code, etc. The embodiment of the application does not limit the characteristic information of the yellow page information.
Based on the obtained feature information, the similarity between the feature information of the yellow page information to be judged and the feature information of the known yellow page information may be calculated by using any similarity calculation method described in step S203, so as to obtain the similarity between the yellow page information to be judged and the known yellow page information.
In another possible implementation manner of the embodiment of the present application, the method further includes: and filtering the acquired yellow page information.
Optionally, the system may cache the yellow page information on the Hadoop Distributed File System (HDFS) and, before the data goes online, filter it according to the online rules for yellow page information, removing phone names and their yellow page information that do not meet the specification or are likely to cause disputes.
Alternatively, the obtained yellow page information may be filtered by processing the yellow page information by a string matching method, a latent semantic indexing method, a neural network method, or the like.
Optionally, the filtering process on the acquired yellow page information may automatically filter the yellow page information in the web pages derived from the website blacklist by setting a website blacklist, for example, adding an illegal website to the website blacklist.
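The blacklist option can be sketched as follows. The record layout (a dictionary with a "url" key), the rule that subdomains of a blacklisted domain are also blocked, and the function name filter_by_blacklist are assumptions for illustration.

```python
from urllib.parse import urlparse


def filter_by_blacklist(records, blacklist):
    """Drop yellow page records whose source page is on a blacklisted site.

    records: list of dicts, each with a "url" key for the source webpage.
    blacklist: set of blocked domains; subdomains are blocked as well
    (an assumption of this sketch).
    """
    kept = []
    for record in records:
        host = (urlparse(record["url"]).hostname or "").lower()
        blocked = any(
            host == domain or host.endswith("." + domain)
            for domain in blacklist
        )
        if not blocked:
            kept.append(record)
    return kept


records = [
    {"name": "Acme Hardware", "url": "https://www.acme.example/contact"},
    {"name": "ACME Loans", "url": "http://ads.spam.example/p/1"},
]
clean = filter_by_blacklist(records, {"spam.example"})
```

Only the record from the non-blacklisted website remains in the result.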
For the embodiment of the application, the yellow page information can be pushed to the yellow page library through a data pushing interface. The offline yellow page library can be stored in a NoSQL database such as MongoDB and is finally synchronized to the online database by incremental updates. The online database is stored in LevelDB, which can support storage of billion-level number information and billions of telephone query requests per day.
Compared with the prior art, the embodiment of the application filters the data before it goes online, removing phone names and yellow page information that do not meet the specification or are likely to cause disputes, so that more accurate and reliable yellow page information is provided.
Fig. 3 is a schematic structural diagram of a yellow page information acquiring apparatus 300 according to another embodiment of the present application, and as shown in fig. 3, the apparatus may include: a query module 301, an extraction module 302, a weight calculation module 303, and a determination module 304.
The query module 301 is configured to query the phone number to be queried through a search engine, so as to obtain multiple webpages containing the phone number to be queried.
The extraction module 302 is configured to extract yellow page information to be selected from the multiple webpages obtained by the query module 301 through a pre-trained recognition model.
The weight calculation module 303 is configured to perform weighted calculation on any yellow page information to be selected extracted by the extraction module 302 according to the confidence weight of the source website where the at least one corresponding webpage is located, so as to obtain a weighting result.
The determination module 304 is configured to determine the yellow page information to be selected with the largest weighting result obtained by the weight calculation module 303 as the yellow page information of the phone number to be queried.
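The cooperation of the weight calculation module and the determination module can be sketched as follows. The sketch assumes that the weighted calculation is a sum of the confidence weights of all source websites on which a candidate was found; the application does not fix the formula, and the function name, the input layout, and the default weight for unknown sites are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlparse


def select_yellow_page(
    extractions: Iterable[Tuple[str, str]],
    site_weights: Dict[str, float],
    default_weight: float = 0.0,
) -> Optional[str]:
    """Pick the candidate with the largest summed source-site confidence weight.

    extractions: (candidate yellow page info, URL of the page it was extracted from).
    site_weights: confidence weight per source website host name.
    """
    totals: Dict[str, float] = defaultdict(float)
    for candidate, url in extractions:
        host = urlparse(url).hostname or ""
        totals[candidate] += site_weights.get(host, default_weight)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With a sum, a candidate confirmed by several moderately trusted sites can outrank one that appears on a single highly trusted site; taking the maximum weight instead of the sum would reverse that preference.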
Optionally, the pre-trained recognition model comprises any one of: a pre-trained named entity recognition model; a set of regular expressions trained from labeled data.
Optionally, the apparatus further comprises: the confidence weight determining module 400 is configured to determine the confidence weight of each source website before performing weighted calculation on any yellow page information to be selected extracted by the extracting module 302 according to the confidence weight of the source website where the at least one webpage corresponding to the yellow page information is located.
Optionally, as shown in fig. 4, the confidence weight determination module 400 may include: a searching unit 401, an extracting unit 402, a similarity calculation unit 403, and a confidence weight determining unit 404.
The searching unit 401 is configured to search, by a search engine, a plurality of sample phone numbers of known yellow page information to obtain a plurality of web pages to be determined including each sample phone number.
The extracting unit 402 is configured to extract yellow page information to be determined from each to-be-determined web page obtained by the searching unit 401 through a pre-trained named entity recognition model.
The similarity calculation unit 403 is configured to perform similarity calculation between any piece of yellow page information to be determined extracted by the extracting unit 402 and the corresponding piece of known yellow page information.
The confidence weight determining unit 404 is configured to determine the confidence weight of the source website where the to-be-determined webpage corresponding to any yellow page information to be determined is located, according to the similarity calculated by the similarity calculation unit 403.
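How the confidence weight determining unit might turn similarities into per-site weights can be sketched as below. Averaging the similarities observed for each source website is an assumption chosen for illustration; the application only states that the weight is determined according to the calculated similarity, and the function name and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def site_confidence_weights(observations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Map each source website to the mean similarity of its extracted yellow page info.

    observations: (source website, similarity between the yellow page information
    extracted from one of its pages and the known yellow page information).
    """
    by_site: Dict[str, List[float]] = defaultdict(list)
    for site, similarity in observations:
        by_site[site].append(similarity)
    return {site: mean(sims) for site, sims in by_site.items()}
```

A site whose pages consistently reproduce the known yellow page information for the sample numbers ends up with a weight close to 1, while a site with mostly wrong or unrelated information ends up close to 0.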
Optionally, the similarity calculation unit 403 may include: a word segmentation processing subunit and a similarity calculation subunit.
The word segmentation processing subunit is configured to perform word segmentation on the yellow page information to be determined and on the corresponding known yellow page information respectively, to obtain the feature information of the yellow page information to be determined and the feature information of the known yellow page information.
The similarity calculation subunit is configured to calculate the similarity between the feature information of the yellow page information to be determined and the feature information of the known yellow page information.
Optionally, the characteristic information includes: personal or business name, phone, address.
Optionally, the yellow page information obtaining apparatus further includes:
a filtering module, configured to filter the acquired yellow page information.
The yellow page information acquisition device provided in the embodiment of the present application can execute the yellow page information acquisition method provided in an embodiment of the present application, and the implementation principles thereof are similar, and are not described herein again.
Compared with the prior art, the embodiment of the application first queries the telephone number to be queried through a search engine to obtain a plurality of webpages containing that telephone number, and then extracts yellow page information to be selected from the webpages through a pre-trained recognition model. For any yellow page information to be selected, a weighted calculation is performed according to the confidence weight of the source website where at least one corresponding webpage is located, and the yellow page information to be selected with the largest weighting result is determined as the yellow page information of the telephone number to be queried. In other words, the confidence weight of the source website serves as the criterion for judging the reliability and accuracy of the yellow page information. Yellow page information with higher accuracy and reliability can thus be determined from a large amount of disordered search data, so that a user obtains yellow page information matching the queried telephone number through a single quick query of the search engine, which improves the user's query experience.
Yet another embodiment of the present application provides an electronic device. As shown in fig. 5, the electronic device 500 includes: a processor 501 and a memory 503. The processor 501 is coupled to the memory 503, for example via the bus 502. Optionally, the electronic device 500 may also include a transceiver 504. It should be noted that, in practical applications, the transceiver 504 is not limited to one, and the structure of the electronic device 500 does not constitute a limitation on the embodiment of the present application.
The processor 501 may be a Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 501 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 502 may include a path that transfers information between the above components. The bus 502 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory 503 may be, but is not limited to, a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 503 is used for storing the application program code for executing the scheme of the present application, and execution is controlled by the processor 501. The processor 501 is configured to execute the application program code stored in the memory 503 to implement the actions of the yellow page information acquisition device provided by the embodiment shown in fig. 3.
The electronic device provided by the embodiment of the present application can execute the yellow page information obtaining method provided by an embodiment of the present application, and the implementation principles thereof are similar and will not be described herein again.
Compared with the prior art, the embodiment of the application uses the confidence weight of the source website of the yellow page information as the criterion for judging the reliability and accuracy of the yellow page information, and takes the yellow page information with the highest source-website confidence weighting result as the yellow page information of the telephone number to be queried. Yellow page information with higher accuracy and reliability can thus be determined from a large amount of disordered search data, so that a user obtains yellow page information matching the queried telephone number through a single quick query of the search engine, which improves the user's query experience.
Yet another embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium is used to store computer instructions, and when the computer instructions are executed on a computer, the computer is enabled to execute the yellow page information obtaining method provided in one embodiment of the present application.
The computer-readable storage medium provided in the embodiment of the present application is suitable for the yellow page information obtaining method provided in the above embodiment of the present application; its implementation principle is similar and is not described herein again.
Compared with the prior art, the embodiment of the application uses the confidence weight of the source website of the yellow page information as the criterion for judging the reliability and accuracy of the yellow page information, and takes the yellow page information with the highest source-website confidence weighting result as the yellow page information of the telephone number to be queried. Yellow page information with higher accuracy and reliability can thus be determined from a large amount of disordered search data, so that a user obtains yellow page information matching the queried telephone number through a single quick query of the search engine, which improves the user's query experience.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several modifications and refinements without departing from the principle of the present application, and these modifications and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A yellow page information acquisition method is characterized by comprising the following steps:
inquiring the telephone number to be inquired through a search engine to obtain a plurality of webpages containing the telephone number to be inquired;
extracting yellow page information to be selected from the multiple webpages through a pre-trained recognition model;
for any yellow page information to be selected, performing weighted calculation according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighted result;
and determining the yellow page information to be selected with the maximum weighting result as the yellow page information of the telephone number to be inquired.
2. The yellow page information acquisition method according to claim 1, wherein the pre-trained recognition model includes any one of:
a pre-trained named entity recognition model;
a set of regular expressions trained from labeled data.
3. The yellow page information acquisition method according to claim 1, wherein before performing weighted calculation on any yellow page information to be selected according to the confidence weight of the source website where the at least one webpage corresponding to the yellow page information to be selected is located, the method further comprises:
determining the confidence weight of each source website.
4. The method for obtaining yellow page information according to claim 3, wherein the determining the confidence weight of each source website comprises:
searching a plurality of sample telephone numbers of known yellow page information through a search engine to obtain a plurality of to-be-judged webpages including each sample telephone number;
extracting yellow page information to be judged from each webpage to be judged respectively through a pre-trained named entity recognition model;
carrying out similarity calculation on any yellow page information to be judged and corresponding known yellow page information;
and determining the confidence coefficient weight of the source website of the webpage to be judged corresponding to any yellow page information to be judged according to the similarity determined by calculation.
5. The yellow page information acquisition method according to claim 4, wherein the carrying out similarity calculation on any yellow page information to be judged and corresponding known yellow page information comprises:
performing word segmentation processing on the yellow page information to be judged and corresponding known yellow page information respectively to obtain the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information;
and carrying out similarity calculation on the characteristic information of the yellow page information to be judged and the characteristic information of the known yellow page information.
6. The yellow page information acquisition method according to claim 5, wherein the feature information includes:
personal or business name, phone, address.
7. The yellow page information acquisition method according to any one of claims 1 to 6, characterized in that the method further comprises:
and filtering the acquired yellow page information.
8. A yellow page information acquisition apparatus, comprising:
the query module is used for querying the telephone number to be queried through a search engine to obtain a plurality of webpages containing the telephone number to be queried;
the extraction module is used for extracting yellow page information to be selected from the plurality of webpages obtained by the query module through a pre-trained recognition model;
the weighting calculation module is used for carrying out weighting calculation on any yellow page information to be selected extracted by the extraction module according to the confidence coefficient weight of the source website where at least one webpage corresponding to the yellow page information is located to obtain a weighting result;
and the determining module is used for determining the yellow page information to be selected with the maximum weighting result obtained by the weighting calculating module as the yellow page information of the telephone number to be inquired.
9. An electronic device, comprising:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to execute the yellow page information obtaining method according to any one of claims 1 to 7 by calling the operation instruction.
10. A computer-readable storage medium for storing computer instructions which, when executed on a computer, enable the computer to perform the yellow page information acquisition method of any one of claims 1 to 7.
CN201811583912.0A 2018-12-24 2018-12-24 Yellow page information acquisition method and device and electronic equipment Pending CN111353084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811583912.0A CN111353084A (en) 2018-12-24 2018-12-24 Yellow page information acquisition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111353084A true CN111353084A (en) 2020-06-30

Family

ID=71195527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811583912.0A Pending CN111353084A (en) 2018-12-24 2018-12-24 Yellow page information acquisition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111353084A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199851A (en) * 2014-08-11 2014-12-10 北京奇虎科技有限公司 Method for extracting telephone numbers according to yellow page information and cloud server
CN104915394A (en) * 2015-05-27 2015-09-16 腾讯科技(深圳)有限公司 Yellow page information updating method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination