CN110414518A - Network address recognition method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110414518A
Authority
CN
China
Prior art keywords
network address
identified
picture
association
party
Legal status
Granted
Application number
CN201910561370.5A
Other languages
Chinese (zh)
Other versions
CN110414518B (en)
Inventor
王建华
何四燕
金志敏
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910561370.5A
Priority claimed from CN201910561370.5A
Publication of CN110414518A
Priority to PCT/CN2019/118243 (WO2020258669A1)
Application granted
Publication of CN110414518B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This application relates to the field of big data technology and is applicable to the intelligent input industry. It provides a network address recognition method, device, computer equipment and storage medium. The method includes: identifying the network address in a picture to be identified with an OCR tool; if the directly identified network address is incomplete, identifying feature information in the picture to be identified; obtaining associated network addresses related to the feature information by accessing a third-party internet search engine; and matching the directly identified network address against the associated network addresses to obtain the complete network address. The network address carried in a picture can thus be identified efficiently and accurately.

Description

Network address recognition method, device, computer equipment and storage medium
Technical field
This application relates to the field of intelligent recognition technology, and in particular to a network address recognition method, device, computer equipment and storage medium.
Background art
With the development of science and technology, the internet has become part of everyday life. People use it to look up information, buy goods, socialize and so on, which brings great convenience.
When browsing web pages in a browser, users generally type the network address by hand. When the phone screen is too small to type comfortably, or the network address is too long, it is easy to mistype or even omit characters, which is troublesome and time-consuming. To address this, technology exists for obtaining a network address directly from a picture. In an application scenario based on this technology, user A receives a picture shared by user B that carries the network address of a news article B recommends to A. After receiving the shared picture, user A can use picture-based network address recognition to identify and extract the network address carried in the picture and enter it into the browser on user A's terminal, and can then read the news article.
However, although such technology can identify and extract the network address in a picture, when the network address carried in the picture is abnormal (for example, part of it is covered or it contains typos) or incomplete, conventional techniques cannot obtain the correct network address.
Summary of the invention
Accordingly, in view of the above technical problems, it is necessary to provide an accurate network address recognition method, device, computer equipment and storage medium.
A network address recognition method, the method comprising:
obtaining a picture to be identified, and identifying the network address carried in the picture to be identified with an OCR (Optical Character Recognition) tool;
when the identified network address is incomplete, extracting the feature information carried in the picture to be identified with the OCR tool;
obtaining the associated network addresses of the feature information returned by a third-party internet search engine;
matching the associated network addresses with the incomplete network address to obtain a target network address.
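The four steps above can be sketched as a small driver function. This is a minimal sketch, not the applicant's implementation: the function name `recognize_url` and every callable passed to it (`ocr_url`, `is_complete`, `ocr_features`, `search`, `match`) are hypothetical hooks standing in for the OCR tool, the completeness check, the feature extraction, the third-party search engine and the matching step.

```python
from typing import Callable, List, Optional, Sequence

def recognize_url(
    image: object,
    ocr_url: Callable[[object], str],
    is_complete: Callable[[str], bool],
    ocr_features: Callable[[object], List[str]],
    search: Callable[[List[str]], Sequence[str]],
    match: Callable[[str, Sequence[str]], Optional[str]],
) -> Optional[str]:
    """Driver for the four steps; every callable is a hypothetical hook."""
    url = ocr_url(image)            # OCR the network address in the picture
    if not url:
        return None                 # no network address in the picture (blank)
    if is_complete(url):
        return url                  # complete address: nothing more to do
    features = ocr_features(image)  # OCR the feature information
    candidates = search(features)   # associated addresses from a search engine
    return match(url, candidates)   # choose the target network address

# Usage with stub hooks (values mirror the application example in this document).
result = recognize_url(
    "screenshot.png",
    ocr_url=lambda img: "https://ABCD",
    is_complete=lambda u: u.endswith((".cn", ".com", ".html")),
    ocr_features=lambda img: ["XX beauty cream"],
    search=lambda feats: ["https://ABMP.com", "https://ABCDMPQ.com"],
    match=lambda partial, cands: next(
        (c for c in cands if c.startswith(partial)), None),
)
print(result)  # https://ABCDMPQ.com
```

Passing the stages in as functions keeps the control flow (stop early on a complete address, otherwise search and match) separate from whichever OCR engine or search engine is actually used.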
In one embodiment, obtaining the picture to be identified and identifying the network address carried in it with the OCR tool includes:
performing grayscale conversion and edge detection on the picture to be identified, and performing line detection based on the Hough transform;
applying a Radon transform to the line detection result, computing the projection region in each direction, finding the angle at which the projection region is narrowest, and using that angle as the skew-correction angle for skew correction;
binarizing the skew-corrected grayscale image, and determining the region carrying the network address from the horizontal and vertical projections of the binarized image;
cropping the region carrying the network address, and scaling the cropped image to a preset size;
identifying the network address carried in the scaled image with the OCR tool.
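The skew-angle search in the second step can be illustrated with a simplified, Radon-like projection over a binary point set. This is a sketch under stated assumptions: it works on a list of foreground pixel coordinates (for example, edge pixels) rather than on a full Radon transform of the Hough output, it treats "projection region width" as the number of 1-pixel bins the points occupy when projected perpendicular to a candidate direction, and the helper names and the 0.5° search grid are illustrative choices, not taken from the patent.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) of a foreground pixel, y pointing down

def projection_width(points: Sequence[Point], angle_deg: float) -> int:
    """Count the distinct 1-px bins occupied when the points are projected
    onto the axis perpendicular to a line at angle_deg.

    Text running along angle_deg collapses into few bins, so a narrow
    projection indicates the text direction.
    """
    t = math.radians(angle_deg)
    return len({round(-x * math.sin(t) + y * math.cos(t)) for x, y in points})

def estimate_skew(points: Sequence[Point],
                  angles: Optional[Iterable[float]] = None) -> float:
    """Return the candidate angle (degrees) with the narrowest projection."""
    if angles is None:
        angles = [a / 2 for a in range(-90, 91)]  # -45.0 .. 45.0 in 0.5 deg steps
    # min() keeps the first angle on ties, so grid order decides ties.
    return min(angles, key=lambda a: projection_width(points, a))

# Usage: a synthetic text baseline tilted by 10 degrees.
tilt = math.tan(math.radians(10))
baseline = [(x, round(x * tilt)) for x in range(200)]
skew = estimate_skew(baseline)
print(skew)  # 10.0; rotating the image by -skew would level this baseline
```

A brute-force grid keeps the idea visible; a production system would more likely use a library Radon or Hough implementation and a finer, adaptive search.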
In one embodiment, before identifying, with the OCR tool, the network address carried in the image-processed picture to be identified, the method further includes:
extracting individual characters from the image-processed picture to be identified using mathematical morphology processing and connected region analysis;
using each extracted character as a sub-image;
identifying, with the OCR tool, the network address carried in the image-processed picture to be identified then includes:
identifying the network address carried in the sub-images with the OCR tool.
In one embodiment, before extracting, with the OCR tool, the feature information carried in the picture to be identified when the identified network address is incomplete, the method further includes:
analyzing the identified network address and determining whether its trailing characters form a preset network address ending identifier.
In one embodiment, the feature information includes character feature information, and obtaining the associated network addresses of the feature information returned by the third-party internet search engine includes:
performing word segmentation on the character feature information to obtain multiple segmented words;
extracting the network terms among the segmented words according to a preset network feature term database;
pushing the network terms to the third-party internet search engine;
receiving the associated network addresses of the network terms found by the third-party internet search engine.
In one embodiment, the feature information includes graphic feature information, and obtaining the associated network addresses of the feature information returned by the third-party internet search engine includes:
pushing the graphic feature information to the third-party internet search engine;
receiving the associated network addresses of the graphic feature information found by the third-party internet search engine, which obtains them by looking up the product information or corporate entity name associated with the graphic feature information and then searching for that product information or corporate entity name.
In one embodiment, matching the associated network addresses with the incomplete network address to obtain the target network address includes:
performing similarity matching between the associated network addresses and the incomplete network address;
selecting the associated network address with the highest similarity as the target network address.
A network address recognition device, the device comprising:
a first identification module, configured to obtain a picture to be identified and identify the network address carried in it with an OCR tool;
a second identification module, configured to extract the feature information carried in the picture to be identified with the OCR tool when the identified network address is incomplete;
a searching module, configured to obtain the associated network addresses of the feature information returned by a third-party internet search engine;
a network address matching module, configured to match the associated network addresses with the incomplete network address to obtain a target network address.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above method when executed by a processor.
With the above network address recognition method, device, computer equipment and storage medium, the network address in the picture to be identified is identified with an OCR tool. If the directly identified network address is incomplete, feature information in the picture is identified, associated network addresses related to the feature information are obtained through a third-party internet search engine, and the directly identified network address is matched against them to obtain the complete network address, so that the network address carried in the picture can be identified accurately.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the network address recognition method in one embodiment;
Fig. 2 is a flowchart of the network address recognition method in one embodiment;
Fig. 3 is a flowchart of the network address recognition method in another embodiment;
Fig. 4 is a structural block diagram of the network address recognition device in one embodiment;
Fig. 5 is a diagram of the internal structure of the computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
The network address recognition method provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The terminal 102 sends a picture to be identified to the server 104 over the network. The server 104 receives the picture and identifies the network address carried in it with an OCR tool. When the identified network address is complete, the server feeds it back to the terminal 102, or accesses the link directly and returns the data obtained to the terminal 102, so that the user can browse the corresponding information on the terminal. When the identified network address is incomplete, the server extracts the feature information carried in the picture with the OCR tool, obtains the associated network addresses of that feature information returned by a third-party internet search engine, and matches them with the incomplete network address to obtain a target network address. The server 104 can then feed the target network address back to the terminal 102, or access the link directly and return the data obtained to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device, and the server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a network address recognition method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
S200: Obtain a picture to be identified, and identify the network address carried in the picture with an OCR tool.
OCR is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text with a character recognition method. The picture to be identified is the picture on which network address recognition is performed. It may be a picture taken by the terminal, a picture downloaded from the internet, or a picture received in a chat on a social application; the OCR tool identifies the network address carried in it. Note that when the picture to be identified carries no network address, the network address obtained is blank. Taking a chat scenario as an example: user A wants to share a food article with friend B, takes a screenshot on the terminal that contains the article's network address, and sends that picture to friend B's terminal; in this scenario, the picture containing the article's network address is the picture to be identified. Taking an internet download scenario as an example: user A comes across a product introduction picture on the internet that carries the network address of a page describing the product in detail; user A downloads the picture and uses the network address recognition method of this application to identify the network address it carries; in this scenario, the product introduction picture is the picture to be identified.
S400: When the identified network address is incomplete, extract the feature information carried in the picture to be identified with the OCR tool.
In general, network addresses are generated according to industry rules, so analyzing a network address makes it possible to determine accurately whether the currently identified network address is complete. When it is incomplete, the OCR tool identifies the feature information in the picture to be identified. The feature information mainly includes character feature information and graphic feature information. The character feature information can be the text in the picture to be identified, which forms a text data set; the text can be cleaned and keywords extracted from it. Keyword identification can use a set of network-address-related keywords built from historical empirical data, such as Ping An XX, Phoenix XX and Sina XX. The graphic feature information may specifically be a brand's trademark, the shape of an article, and so on.
S600: Obtain the associated network addresses of the feature information returned by the third-party internet search engine.
The associated network addresses of the feature information can be queried through the third-party internet search engine. The server pushes the feature information to the search engine, which uses internet-based big data query technology to find network addresses with related content. For example, when the extracted feature information is "Ping An Technology X", the server communicates with the search engine server over the internet, sends it "Ping An Technology X", and obtains network addresses related to "Ping An Technology X". When the extracted feature information is the trademark of a brand, the server sends the trademark to the search engine server and can find network addresses related to the brand, such as the brand's official website, related advertisements, product introductions and related news. There may be one or more associated network addresses. The third-party search engine can be any common web search engine, such as Baidu or Google; the server accesses the search engine, sends it the feature information, and receives the data it returns to obtain the associated network addresses of the feature information.
S800: Match the associated network addresses with the incomplete network address to obtain the target network address.
Since the incomplete network address is part of the complete network address (the target network address), it can serve as the matching condition for the complete network address. The incomplete network address is matched against the network addresses returned by the query; a successful match means the target network address has been found among them, and a browser can then be called to open it, so that the network address is identified efficiently and accurately.
With the above network address recognition method, the network address in the picture to be identified is identified with an OCR tool. If the directly identified network address is incomplete, feature information in the picture is identified, associated network addresses related to the feature information are obtained through a third-party internet search engine, and the directly identified network address is matched against them to obtain the complete network address, so that the network address carried in the picture can be identified accurately.
As shown in Fig. 3, in one embodiment, step S200 includes:
S210: Perform grayscale conversion and edge detection on the picture to be identified, and perform line detection based on the Hough transform.
S220: Apply a Radon transform to the line detection result, compute the projection region in each direction, find the angle at which the projection region is narrowest, and use that angle as the skew-correction angle for skew correction.
S230: Binarize the skew-corrected grayscale image, and determine the region carrying the network address from the horizontal and vertical projections of the binarized image.
S240: Crop the region carrying the network address, and scale the cropped image to a preset size.
S250: Identify the network address carried in the scaled image with the OCR tool.
Skew correction specifically includes: after grayscale conversion and Canny edge detection of the image, performing line detection based on the Hough transform, applying a Radon transform to the line detection result, computing the projection region in each direction, taking the angle at which the projection region is narrowest as the skew direction, and rotating the original input business card image by this angle to correct it. Cropping includes binarizing the skew-corrected grayscale image, with the threshold chosen by the maximum between-class variance (Otsu) method, then determining the business card region from the horizontal and vertical projections, with the projection thresholds determined empirically, and cropping out that region. Scaling includes resizing the cropped region to a set size, using bilinear interpolation. In this embodiment, the picture to be identified is preprocessed before OCR, which makes OCR recognition more efficient and accurate.
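The binarization and cropping described above can be sketched in pure Python on a small grayscale grid. This is an illustrative sketch only: it implements Otsu's maximum between-class variance threshold and a projection-based bounding box, assumes dark text on a light background, and replaces the empirically determined projection thresholds with a hypothetical `min_count` parameter. Bilinear scaling is left out.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # 8-bit grayscale values, one inner list per row
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

def otsu_threshold(gray: Gray) -> int:
    """Maximum between-class variance (Otsu) threshold; pixels <= t are class 0."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def binarize(gray: Gray, t: int) -> List[List[int]]:
    """1 for dark (ink) pixels, 0 for background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]

def projection_bbox(binary: List[List[int]], min_count: int = 1) -> Optional[Box]:
    """Bounding box from the horizontal (per-row) and vertical (per-column) projections."""
    rows = [sum(r) for r in binary]
    cols = [sum(c) for c in zip(*binary)]
    ys = [i for i, s in enumerate(rows) if s >= min_count]
    xs = [j for j, s in enumerate(cols) if s >= min_count]
    if not ys or not xs:
        return None
    return xs[0], ys[0], xs[-1] + 1, ys[-1] + 1

def crop(gray: Gray, box: Box) -> Gray:
    left, top, right, bottom = box
    return [row[left:right] for row in gray[top:bottom]]

# Usage: a light 6x8 image with a dark 2x3 block of "text".
image = [[220] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4, 5):
        image[y][x] = 30
t = otsu_threshold(image)                  # 30
box = projection_bbox(binarize(image, t))  # (3, 2, 6, 4)
region = crop(image, box)                  # two rows of [30, 30, 30]
```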
In one embodiment, before identifying, with the OCR tool, the network address carried in the image-processed picture to be identified, the method further includes: extracting individual characters from the image-processed picture using mathematical morphology processing and connected region analysis, and using each extracted character as a sub-image. Identifying the network address carried in the image-processed picture with the OCR tool then includes identifying the network address carried in the sub-images with the OCR tool.
The mathematical morphology processing is as follows: morphological operations are applied to the binarized image to retain the real character regions. These operations include image dilation, image erosion, opening, closing, connected region analysis, noise removal and removal of abnormal regions. Connected region analysis is then applied to the binarized image in which the real characters have been retained; each connected region is dilated horizontally, connected region analysis is performed again, the bounding rectangle of each new connected region is found, and each block region is extracted as a sub-image according to its bounding rectangle. When there are multiple sub-images, the network address fragment carried in each sub-image is obtained separately, and the fragments are combined in order to give the network address carried in the picture to be identified.
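The character extraction can be sketched with a breadth-first connected-component labelling and a horizontal dilation. This is a simplified sketch, not the patented pipeline: it uses 8-connectivity, treats the dilation radius as a hypothetical tuning parameter, orders sub-images left to right for a single text line, and omits the erosion, opening, closing and noise-removal steps.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

def dilate_horizontal(binary: Binary, radius: int = 1) -> Binary:
    """Set a pixel if any pixel within `radius` columns on the same row is set."""
    return [[1 if any(row[max(0, x - radius):x + radius + 1]) else 0
             for x in range(len(row))]
            for row in binary]

def connected_boxes(binary: Binary) -> List[Box]:
    """Bounding rectangles of 8-connected foreground regions, sorted left to right."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:  # breadth-first flood fill of one region
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right + 1, bottom + 1))
    return sorted(boxes)

# Usage: two blobs that merge only after a wide enough horizontal dilation.
glyphs = [[1, 1, 0, 0, 0, 1],
          [1, 0, 0, 0, 0, 1],
          [0, 0, 0, 0, 0, 0]]
separate = connected_boxes(glyphs)                               # two boxes
merged = connected_boxes(dilate_horizontal(glyphs, radius=2))    # one box
```

Each box would then be cropped as a sub-image and passed to OCR, and the recognized fragments concatenated in this left-to-right order.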
As shown in Fig. 3, in one embodiment, the method further includes, before step S400:
S300: Analyze the identified network address and determine whether its trailing characters form a preset network address ending identifier.
The preset network address ending identifiers form a standard character set based on industry conventions, such as the common .cn, .com or .html. For example, https://baike is an incomplete network address. When the network address is incomplete, the method proceeds to step S400.
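The completeness check in S300 can be sketched as a suffix test. This is an assumption-laden sketch: the suffix tuple holds only the examples named above (.cn, .com, .html), a real deployment would need a much larger, maintained set, and stripping a trailing slash and ignoring case are illustrative choices not stated in the source.

```python
# Illustrative ending identifiers, taken only from the examples in the text.
URL_END_SUFFIXES = (".cn", ".com", ".html")

def is_complete_url(url: str, suffixes: tuple = URL_END_SUFFIXES) -> bool:
    """True if the identified network address ends with a preset ending identifier."""
    cleaned = url.strip().rstrip("/").lower()
    return bool(cleaned) and cleaned.endswith(suffixes)

print(is_complete_url("https://baike"))        # False, so go on to S400
print(is_complete_url("https://ABCDMPQ.com"))  # True
```

A pure suffix rule also flags valid addresses such as `https://example.com/path` as incomplete, so in practice it would likely be combined with proper URL parsing.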
In one embodiment, the feature information includes character feature information, and obtaining the associated network addresses of the feature information returned by the third-party internet search engine includes:
performing word segmentation on the character feature information to obtain multiple segmented words; extracting the network terms among the segmented words according to a preset network feature term database; pushing the network terms to the third-party internet search engine; and receiving the associated network addresses of the network terms found by the third-party internet search engine.
Word segmentation divides a complete passage or sentence into multiple words. Each segmented word is looked up in the preset network feature term database to check whether any of the segmented words are network terms, and the network terms are then sent to the third-party internet search engine to find the corresponding associated network addresses. Specifically, the preset network feature term database is built from historical experience and can be continually updated during daily use. Network terms may be company entities, product names, celebrity names, internet-famous places and so on; in general, a network term is a word whose related content is typically looked up on the internet.
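The segmentation and term-filtering steps can be sketched with forward maximum matching, a common dictionary-based segmenter for Chinese text. The patent does not name a segmentation algorithm, so this choice, the tiny vocabulary, the `max_len` of 6 characters and the sample term database are all illustrative assumptions; the call that pushes terms to a search engine is omitted because no specific engine API is given. The sample sentence means "Ping An Technology launches a new product".

```python
from typing import Iterable, List, Set

def fmm_segment(text: str, vocab: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary
    word, falling back to a single character when nothing matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                words.append(piece)
                i += n
                break
    return words

def network_terms(words: Iterable[str], term_db: Set[str]) -> List[str]:
    """Keep segmented words found in the network feature term database,
    in first-seen order and without duplicates."""
    seen: Set[str] = set()
    terms: List[str] = []
    for w in words:
        if w in term_db and w not in seen:
            seen.add(w)
            terms.append(w)
    return terms

vocab = {"平安科技", "推出", "新产品"}   # hypothetical segmentation dictionary
term_db = {"平安科技"}                  # hypothetical network feature term database
words = fmm_segment("平安科技推出新产品", vocab)  # ["平安科技", "推出", "新产品"]
terms = network_terms(words, term_db)             # ["平安科技"]
```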
In one embodiment, the feature information includes graphic feature information, and obtaining the associated network addresses of the feature information returned by the third-party internet search engine includes: pushing the graphic feature information to the third-party internet search engine; and receiving the associated network addresses of the graphic feature information found by the search engine, which obtains them by looking up the product information or corporate entity name associated with the graphic feature information and then searching for that product information or corporate entity name.
The graphic feature information may specifically be a company logo, the shape of a product, and so on. Based on the graphic feature information, the associated product information or corporate entity name is identified, and network addresses associated with that product information or corporate entity name are then searched for. Taking liquor as an example, many companies now use uniquely shaped bottles for their liquor products. When the feature information obtained includes bottle shape data, the third-party internet search engine can use big data methods to find the liquor product from that shape data, and then use the product information and/or the producing company to search for network addresses related to the product and/or the company.
In one embodiment, matching the associated network addresses with the incomplete network address to obtain the target network address includes: performing similarity matching between the associated network addresses and the incomplete network address, and selecting the associated network address with the highest similarity as the target network address.
Since there may be multiple associated network addresses, this embodiment uses similarity matching to select the one with the highest similarity as the target network address, so that the target network address is identified efficiently and accurately.
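The similarity matching can be sketched with Python's standard `difflib.SequenceMatcher`. The patent does not specify a similarity measure, so using `SequenceMatcher.ratio()` here is an assumption; the sample candidates are the ones from the application example in this description.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence

def similarity(partial: str, candidate: str) -> float:
    """Ratio in [0, 1]: twice the matched characters over the total length."""
    return SequenceMatcher(None, partial, candidate).ratio()

def pick_target_url(partial: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the associated network address most similar to the partial one
    (the first candidate wins on ties), or None when there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: similarity(partial, c))

partial = "https://ABCD"
candidates = ["https://ABMP.com", "https://ATMP.com", "https://ABCDMPQ.com"]
scores = {c: round(similarity(partial, c), 3) for c in candidates}
target = pick_target_url(partial, candidates)
# scores: ABMP 0.714, ATMP 0.643, ABCDMPQ 0.774, so target is https://ABCDMPQ.com
```

Because the incomplete address is expected to be a prefix of the target, a stricter variant could first keep only candidates that start with it and fall back to the similarity ranking when none do.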
The technical solution of the above network address recognition method is explained in further detail below with a specific application example.
In one application example, a user sends a picture showing a cosmetic product to the server through a terminal; the picture is a screenshot introducing the product. The server receives the picture to be identified and, with the OCR tool, identifies the network address https://ABCD carried in it. Analyzing this network address, the server determines that it does not end with a network address ending identifier and is therefore incomplete. The server then extracts, with the OCR tool, the feature information carried in the picture, which includes "XX beauty cream" and a product shape resembling a face, and uses big data methods to search for network addresses related to XX beauty cream or to products shaped like a face, obtaining the associated network addresses 1. https://ABMP.com, 2. https://ATMP.com and 3. https://ABCDMPQ.com. Matching these three associated network addresses against the incomplete network address https://ABCD gives the target network address https://ABCDMPQ.com.
It should be understood that although the steps in the flowcharts of Figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are they necessarily executed sequentially, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in Fig. 4, a network address identification device includes:
First identification module 200, configured to obtain a picture to be identified and to identify, by an OCR tool, the network address carried in the picture.
Second identification module 400, configured to extract, by the OCR tool, the feature information carried in the picture to be identified when the identified network address is incomplete.
Searching module 600, configured to obtain the association network addresses for the feature information returned by a third-party Internet search engine.
Network address matching module 800, configured to match the association network addresses against the incomplete network address to obtain the target network address.
In the above network address identification device, the first identification module 200 identifies the network address in the picture to be identified by the OCR tool. If the directly identified network address is incomplete, the second identification module 400 identifies the feature information in the picture, the searching module 600 accesses a third-party Internet search engine to obtain association network addresses related to that feature information, and the network address matching module 800 matches the directly identified network address against the association network addresses to obtain the complete network address. In this way, the network address carried in the picture can be identified accurately.
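The cooperation of the four modules can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class name `UrlIdentifier` and its fields are hypothetical, and each module is injected as a plain callable so that any OCR engine, search client, or matcher could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UrlIdentifier:
    """Hypothetical wiring of modules 200/400/600/800 as injected callables."""
    recognize_url: Callable[[object], str]            # first identification module (OCR)
    is_complete: Callable[[str], bool]                # completeness judgment
    extract_features: Callable[[object], List[str]]   # second identification module
    search: Callable[[List[str]], List[str]]          # searching module (third-party engine)
    match: Callable[[str, List[str]], Optional[str]]  # network address matching module

    def identify(self, picture: object) -> Optional[str]:
        url = self.recognize_url(picture)
        if self.is_complete(url):
            return url  # the directly recognized address is already usable
        # Fall back to feature extraction and search only for incomplete addresses.
        features = self.extract_features(picture)
        candidates = self.search(features)
        return self.match(url, candidates)
```

Injecting the stages keeps the control flow (try direct recognition first, fall back to feature search only when the address is incomplete) independent of any particular OCR or search provider.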
In one embodiment, the first identification module 200 is further configured to: perform image grayscale processing and edge detection on the picture to be identified, and perform straight-line detection based on the Hough transform; apply the Radon transform to the line detection result, compute the projection in each direction, find the angle at which the projection width is smallest, and use that angle as the skew correction angle to perform skew correction; binarize the skew-corrected grayscale image, and determine the region carrying the website information from the horizontal and vertical projections obtained after binarization; crop the region carrying the website information, and scale the cropped image to a preset size; and identify, by the OCR tool, the network address carried in the scaled image.
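The preprocessing chain in the preceding paragraph can be approximated in pure Python on a grayscale image stored as nested lists. This is a simplified sketch, not the described implementation: the Radon transform is replaced by a brute-force search over candidate angles for the narrowest projection of the foreground pixels (the same minimum-width criterion), Hough line detection and the actual rotation are omitted, the region is simply the bounding box of all ink, and the threshold of 128 and the ±15° search range are hypothetical parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple

Grid = List[List[int]]  # 1 = dark (text) pixel, 0 = background


def binarize(gray: Sequence[Sequence[int]], threshold: int = 128) -> Grid:
    """Fixed-threshold binarization (the threshold of 128 is a hypothetical choice)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def projection_width(points: Sequence[Tuple[int, int]], angle_deg: float) -> float:
    """Spread of the points projected onto the normal of a line tilted by angle_deg."""
    t = math.radians(angle_deg)
    proj = [y * math.cos(t) - x * math.sin(t) for x, y in points]
    return max(proj) - min(proj)


def estimate_skew(grid: Grid, angles=range(-15, 16)) -> int:
    """Brute-force stand-in for the Radon step: the angle whose projection is narrowest."""
    points = [(x, y) for y, row in enumerate(grid) for x, v in enumerate(row) if v]
    if not points:
        return 0  # blank image: nothing to correct
    return min(angles, key=lambda a: projection_width(points, a))


def text_region(grid: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom; right/bottom exclusive) from projections."""
    rows = [sum(r) for r in grid]        # horizontal projection
    cols = [sum(c) for c in zip(*grid)]  # vertical projection
    ys = [i for i, v in enumerate(rows) if v]
    xs = [i for i, v in enumerate(cols) if v]
    if not xs or not ys:
        return None
    return xs[0], ys[0], xs[-1] + 1, ys[-1] + 1


def crop_and_scale(grid: Grid, box: Tuple[int, int, int, int],
                   size: Tuple[int, int]) -> Grid:
    """Crop to box, then nearest-neighbour scale to size = (width, height)."""
    left, top, right, bottom = box
    crop = [row[left:right] for row in grid[top:bottom]]
    h, w = len(crop), len(crop[0])
    out_w, out_h = size
    return [[crop[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```

Rotating the image by the negative of the returned angle would complete the skew correction; in practice an image library would handle rotation and resampling.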
In one embodiment, the first identification module 200 is further configured to: extract individual characters from the image-processed picture to be identified using morphological processing and connected-component analysis; take each extracted character as a sub-image; and identify, by the OCR tool, the network address carried in the sub-images.
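Character splitting by connected-component analysis can be sketched as a breadth-first flood fill over a binarized image. The function names are hypothetical, the morphological step is omitted (a real pipeline might first apply a closing to rejoin broken strokes), and characters made of several disconnected parts, such as "i", would come out as separate components in this simplified version.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]           # 1 = foreground pixel
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def connected_components(grid: Grid) -> List[Box]:
    """Bounding boxes of 4-connected foreground blobs, ordered by left edge."""
    h, w = len(grid), (len(grid[0]) if grid else 0)
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    # Sorting by left edge approximates reading order on a single text line.
    return sorted(boxes)


def character_subimages(grid: Grid) -> List[Grid]:
    """Crop each component into its own sub-image for per-character OCR."""
    return [[row[l:r] for row in grid[t:b]] for l, t, r, b in connected_components(grid)]
```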
In one embodiment, the network address identification device further includes a judgment module, configured to analyze the identified network address and determine whether its ending is a preset network address end identifier.
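One possible reading of the completeness check is sketched below. The patent does not list the preset end identifiers, so `URL_END_MARKERS` is a hypothetical set, and the check is applied to the host name returned by Python's `urllib.parse.urlsplit` rather than to the raw string, so that a path after the domain does not hide the ending. Under these assumptions, `https://ABCD` from the worked example is classified as incomplete.

```python
from urllib.parse import urlsplit

# Hypothetical preset network address end identifiers; the source does not enumerate them.
URL_END_MARKERS = (".com", ".cn", ".net", ".org")


def is_complete_url(url: str, end_markers=URL_END_MARKERS) -> bool:
    """True if the host part of url ends with one of the preset end identifiers."""
    text = url.strip()
    # urlsplit only fills netloc when "//" is present, so add it for scheme-less input.
    parts = urlsplit(text if "//" in text else "//" + text)
    host = parts.hostname or ""  # hostname is already lower-cased by urlsplit
    return host.endswith(tuple(m.lower() for m in end_markers))
```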
In one embodiment, the feature information includes text feature information, and the searching module 600 is further configured to: perform word segmentation on the text feature information to obtain multiple segmented words; extract the network terms from the segmented words according to a preset network feature vocabulary database; push the network terms to the third-party Internet search engine; and receive the association network addresses that the third-party Internet search engine finds for the network terms.
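The text-feature branch (segment, filter against the vocabulary database, query, collect) can be sketched as follows. The regular-expression tokenizer is only a stand-in for a proper word segmenter (Chinese text in particular needs a dictionary- or model-based segmenter), the `vocabulary` set plays the role of the preset network feature vocabulary database, and `search` is a hypothetical callable standing in for the third-party search engine client.

```python
import re
from typing import Callable, Iterable, List, Set


def segment(text: str) -> List[str]:
    """Stand-in tokenizer: split on runs of non-word characters (not a real Chinese segmenter)."""
    return [tok for tok in re.split(r"\W+", text) if tok]


def network_terms(text: str, vocabulary: Set[str]) -> List[str]:
    """Keep segmented words found in the hypothetical vocabulary (entries assumed lower-case)."""
    terms: List[str] = []
    for tok in segment(text):
        if tok.lower() in vocabulary and tok not in terms:
            terms.append(tok)
    return terms


def association_urls(text: str, vocabulary: Set[str],
                     search: Callable[[str], Iterable[str]]) -> List[str]:
    """Query the search-engine stand-in once per term and merge results without duplicates."""
    urls: List[str] = []
    for term in network_terms(text, vocabulary):
        for url in search(term):
            if url not in urls:
                urls.append(url)
    return urls
```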
In one embodiment, the feature information includes graphic feature information, and the searching module 600 is further configured to push the graphic feature information to the third-party Internet search engine and to receive the association network addresses that the search engine finds for it. The search engine obtains these association network addresses by looking up the product information or company entity name associated with the graphic feature information and then searching for that product information or company entity name.
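The graphic-feature branch is a two-stage lookup: the search engine first resolves the image to a product or company name and then runs an ordinary search on that name. In the sketch below both stages are hypothetical callables supplied by the caller; no real image-search API is implied.

```python
from typing import Callable, Iterable, List


def urls_for_graphic(
    graphic_feature: object,
    lookup_entities: Callable[[object], Iterable[str]],
    search_text: Callable[[str], Iterable[str]],
) -> List[str]:
    """Two-stage lookup: graphic -> product/company names -> association URLs (deduplicated)."""
    urls: List[str] = []
    for name in lookup_entities(graphic_feature):  # stage 1: hypothetical image-based entity lookup
        for url in search_text(name):              # stage 2: hypothetical ordinary text search
            if url not in urls:
                urls.append(url)
    return urls
```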
In one embodiment, the network address matching module 800 is configured to perform similarity matching between the association network addresses and the incomplete network address, and to select the association network address with the highest similarity as the target network address.
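The patent leaves the similarity measure unspecified. The sketch below uses Python's `difflib.SequenceMatcher` as one possible choice and ranks candidates first by how much of the recognized fragment they cover, then by the overall `ratio()`, on the assumption that the OCR result is a truncated version of the real address. On the three candidates from the worked example it selects `https://ABCDMPQ.com`.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def similarity(partial: str, candidate: str) -> Tuple[float, float]:
    """(coverage of the fragment, overall ratio), compared case-insensitively."""
    sm = SequenceMatcher(None, partial.lower(), candidate.lower())
    matched = sum(block.size for block in sm.get_matching_blocks())
    coverage = matched / len(partial) if partial else 0.0
    return coverage, sm.ratio()


def best_match(partial: str, candidates: Iterable[str]) -> Optional[str]:
    """Association address with the highest (coverage, ratio) score, or None if none given."""
    pool = list(candidates)
    if not pool:
        return None
    return max(pool, key=lambda c: similarity(partial, c))
```

Ranking on coverage first matters here: with case-insensitive comparison, `ratio()` alone would score `https://ABMP.com` (about 0.79) slightly above `https://ABCDMPQ.com` (about 0.77), because the shorter candidate is penalized less for its unmatched tail.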
For the specific limitations of the network address identification device, refer to the limitations of the network address identification method above; they are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores data such as the feature information and the association network addresses. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a network address identification method.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer devices to which the solution applies. A specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor performs the following steps:
Obtaining a picture to be identified, and identifying, by an OCR tool, the network address carried in the picture;
When the identified network address is incomplete, extracting, by the OCR tool, the feature information carried in the picture;
Obtaining the association network addresses for the feature information returned by a third-party Internet search engine;
Matching the association network addresses against the incomplete network address to obtain the target network address.
In one embodiment, the processor further performs the following steps when executing the computer program:
Performing image grayscale processing and edge detection on the picture to be identified, and performing straight-line detection based on the Hough transform; applying the Radon transform to the line detection result, computing the projection in each direction, finding the angle at which the projection width is smallest, and using that angle as the skew correction angle to perform skew correction; binarizing the skew-corrected grayscale image, and determining the region carrying the website information from the horizontal and vertical projections obtained after binarization; cropping the region carrying the website information, and scaling the cropped image to a preset size; and identifying, by the OCR tool, the network address carried in the scaled image.
In one embodiment, the processor further performs the following steps when executing the computer program:
Extracting individual characters from the image-processed picture to be identified using morphological processing and connected-component analysis; taking each extracted character as a sub-image; and identifying, by the OCR tool, the network address carried in the sub-images.
In one embodiment, the processor further performs the following step when executing the computer program:
Analyzing the identified network address, and determining whether its ending is a preset network address end identifier.
In one embodiment, the processor further performs the following steps when executing the computer program:
Extracting, according to a preset network feature vocabulary database, the network terms from the multiple segmented words; pushing the network terms to the third-party Internet search engine; and receiving the association network addresses that the third-party Internet search engine finds for the network terms.
In one embodiment, the processor further performs the following steps when executing the computer program:
Pushing the graphic feature information to the third-party Internet search engine; and receiving the association network addresses that the third-party Internet search engine finds for the graphic feature information, where the search engine obtains them by looking up the product information or company entity name associated with the graphic feature information and then searching for that product information or company entity name.
In one embodiment, the processor further performs the following steps when executing the computer program:
Performing similarity matching between the association network addresses and the incomplete network address, and selecting the association network address with the highest similarity as the target network address.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program performs the following steps:
Obtaining a picture to be identified, and identifying, by an OCR tool, the network address carried in the picture;
When the identified network address is incomplete, extracting, by the OCR tool, the feature information carried in the picture;
Obtaining the association network addresses for the feature information returned by a third-party Internet search engine;
Matching the association network addresses against the incomplete network address to obtain the target network address.
In one embodiment, the computer program, when executed by the processor, further performs the following steps:
Performing image grayscale processing and edge detection on the picture to be identified, and performing straight-line detection based on the Hough transform; applying the Radon transform to the line detection result, computing the projection in each direction, finding the angle at which the projection width is smallest, and using that angle as the skew correction angle to perform skew correction; binarizing the skew-corrected grayscale image, and determining the region carrying the website information from the horizontal and vertical projections obtained after binarization; cropping the region carrying the website information, and scaling the cropped image to a preset size; and identifying, by the OCR tool, the network address carried in the scaled image.
In one embodiment, the computer program, when executed by the processor, further performs the following steps:
Extracting individual characters from the image-processed picture to be identified using morphological processing and connected-component analysis; taking each extracted character as a sub-image; and identifying, by the OCR tool, the network address carried in the sub-images.
In one embodiment, the computer program, when executed by the processor, further performs the following step:
Analyzing the identified network address, and determining whether its ending is a preset network address end identifier.
In one embodiment, the computer program, when executed by the processor, further performs the following steps:
Extracting, according to a preset network feature vocabulary database, the network terms from the multiple segmented words; pushing the network terms to the third-party Internet search engine; and receiving the association network addresses that the third-party Internet search engine finds for the network terms.
In one embodiment, the computer program, when executed by the processor, further performs the following steps:
Pushing the graphic feature information to the third-party Internet search engine; and receiving the association network addresses that the third-party Internet search engine finds for the graphic feature information, where the search engine obtains them by looking up the product information or company entity name associated with the graphic feature information and then searching for that product information or company entity name.
In one embodiment, the computer program, when executed by the processor, further performs the following steps:
Performing similarity matching between the association network addresses and the incomplete network address, and selecting the association network address with the highest similarity as the target network address.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Although they are described in specific detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

1. A network address identification method, the method comprising:
obtaining a picture to be identified, and identifying, by an optical character recognition (OCR) tool, the network address carried in the picture to be identified;
when the identified network address is an incomplete network address, extracting, by the OCR tool, feature information carried in the picture to be identified;
obtaining association network addresses for the feature information returned by a third-party Internet search engine; and
matching the association network addresses against the incomplete network address to obtain a target network address.
2. The method according to claim 1, wherein obtaining the picture to be identified and identifying, by the OCR tool, the network address carried in the picture to be identified comprises:
performing image grayscale processing and edge detection on the picture to be identified, and performing straight-line detection based on a Hough transform;
applying a Radon transform to the line detection result, computing the projection in each direction, finding the angle at which the projection width is smallest, and using that angle as a skew correction angle to perform skew correction;
binarizing the skew-corrected grayscale image, and determining a region carrying website information based on the horizontal and vertical projections obtained after binarization;
cropping the region carrying website information, and scaling the cropped image to a preset size; and
identifying, by the OCR tool, the network address carried in the scaled image.
3. The method according to claim 2, wherein before identifying, by the OCR tool, the network address carried in the image-processed picture to be identified, the method further comprises:
extracting individual characters from the image-processed picture to be identified using morphological processing and connected-component analysis; and
taking each extracted character as a sub-image;
and identifying, by the OCR tool, the network address carried in the image-processed picture to be identified comprises:
identifying, by the OCR tool, the network address carried in the sub-images.
4. The method according to claim 1, wherein before the step of extracting, by the OCR tool, the feature information carried in the picture to be identified when the identified network address is an incomplete network address, the method further comprises:
analyzing the identified network address, and determining whether the ending of the identified network address is a preset network address end identifier.
5. The method according to claim 1, wherein the feature information comprises text feature information, and obtaining the association network addresses for the feature information returned by the third-party Internet search engine comprises:
performing word segmentation on the text feature information to obtain multiple segmented words;
extracting, according to a preset network feature vocabulary database, the network terms from the multiple segmented words;
pushing the network terms to the third-party Internet search engine; and
receiving the association network addresses that the third-party Internet search engine finds for the network terms.
6. The method according to claim 1, wherein the feature information comprises graphic feature information, and obtaining the association network addresses for the feature information returned by the third-party Internet search engine comprises:
pushing the graphic feature information to the third-party Internet search engine; and
receiving the association network addresses that the third-party Internet search engine finds for the graphic feature information, wherein the third-party Internet search engine obtains these association network addresses by looking up the product information or company entity name associated with the graphic feature information and then searching for that product information or company entity name.
7. The method according to claim 1, wherein matching the association network addresses against the incomplete network address to obtain the target network address comprises:
performing similarity matching between the association network addresses and the incomplete network address; and
selecting the association network address with the highest similarity as the target network address.
8. A network address identification device, wherein the device comprises:
a first identification module, configured to obtain a picture to be identified and identify, by an OCR tool, the network address carried in the picture to be identified;
a second identification module, configured to extract, by the OCR tool, feature information carried in the picture to be identified when the identified network address is an incomplete network address;
a searching module, configured to obtain association network addresses for the feature information returned by a third-party Internet search engine; and
a network address matching module, configured to match the association network addresses against the incomplete network address to obtain a target network address.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201910561370.5A 2019-06-26 2019-06-26 Website identification method, device, computer equipment and storage medium Active CN110414518B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910561370.5A CN110414518B (en) 2019-06-26 Website identification method, device, computer equipment and storage medium
PCT/CN2019/118243 WO2020258669A1 (en) 2019-06-26 2019-11-14 Website identification method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910561370.5A CN110414518B (en) 2019-06-26 Website identification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110414518A true CN110414518A (en) 2019-11-05
CN110414518B CN110414518B (en) 2024-06-07

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488983A (en) * 2013-09-13 2014-01-01 复旦大学 Business card OCR data correction method and system based on knowledge base
US20140366052A1 (en) * 2013-06-05 2014-12-11 David J. Ives System for Social Media Tag Extraction

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2020258669A1 (en) * 2019-06-26 2020-12-30 平安科技(深圳)有限公司 Website identification method and apparatus, and computer device and storage medium
CN111046365A (en) * 2019-12-16 2020-04-21 腾讯科技(深圳)有限公司 Face image transmission method, numerical value transfer method, device and electronic equipment
CN111046365B (en) * 2019-12-16 2023-05-05 腾讯科技(深圳)有限公司 Face image transmission method, numerical value transfer method, device and electronic equipment

Also Published As

Publication number Publication date
WO2020258669A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
CN108288078B (en) Method, device and medium for recognizing characters in image
CN108304882B (en) Image classification method and device, server, user terminal and storage medium
CN108595583B (en) Dynamic graph page data crawling method, device, terminal and storage medium
CN110321470B (en) Document processing method, device, computer equipment and storage medium
US8577882B2 (en) Method and system for searching multilingual documents
US7865492B2 (en) Semantic visual search engine
CN111898411B (en) Text image labeling system, method, computer device and storage medium
CA3174601A1 (en) Text intent identifying method, device, computer equipment and storage medium
CN111177405A (en) Data search matching method and device, computer equipment and storage medium
CN110209862B (en) Text matching method, electronic device and computer readable storage medium
CN109800319A (en) Image processing method, device, computer equipment and storage medium
US20180173681A1 (en) System and method for generating content pertaining to real property assets
CN113272803A (en) Method and apparatus for retrieving intelligent information from electronic device
CN108306878A (en) Detection method for phishing site, device, computer equipment and storage medium
CN111858977B (en) Bill information acquisition method, device, computer equipment and storage medium
EP3564833B1 (en) Method and device for identifying main picture in web page
CN110134846A (en) Proper noun processing method, device and the computer equipment of text
JP2007122398A (en) Method for determining identity of fragment, and computer program
WO2020258669A1 (en) Website identification method and apparatus, and computer device and storage medium
US20220027419A1 (en) Smart search and recommendation method for content, storage medium, and terminal
CN115757994A (en) Business name determining method, device, equipment, medium and product
CN110472121A (en) Card information searching method, device, electronic equipment and computer readable storage medium
CN110414518B (en) Website identification method, device, computer equipment and storage medium
CN111783786A (en) Picture identification method and system, electronic equipment and storage medium
CN114549118A (en) Commodity recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant