CN111428061A - Method and device for acquiring picture description information and electronic equipment - Google Patents

Method and device for acquiring picture description information and electronic equipment

Info

Publication number
CN111428061A
Authority
CN
China
Prior art keywords
domain name
picture
webpage
correlation
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910020512.7A
Other languages
Chinese (zh)
Inventor
贺宇
王东宇
周泽南
苏雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910020512.7A
Publication of CN111428061A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a device for acquiring picture description information and electronic equipment, wherein the method comprises the following steps: acquiring a target picture domain name of a target picture; acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names; and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name. In the technical scheme, more accurate description information is obtained from the webpage corresponding to the target webpage domain name with higher relevance to the target picture domain name, so that the technical problem of lower accuracy of the description information of the picture in the prior art is solved, and the beneficial effect of improving the accuracy of the picture description information is achieved.

Description

Method and device for acquiring picture description information and electronic equipment
Technical Field
The invention relates to the technical field of software, in particular to a method and a device for acquiring picture description information and electronic equipment.
Background
Search engines have become an indispensable and important way to acquire resources in our lives. The picture search is the most important vertical product in a search engine, and a user can obtain the picture contents of materials such as news, stars, landscapes and the like through the picture search. For picture searching, a large number of pictures on the internet need to be indexed, so that a user can conveniently search the pictures.
When the picture index is established, the description information of the picture needs to be acquired. This description information is usually taken from a web page containing the picture, for example from the web page text. In practice, each picture appears in a plurality of different webpages, and for the same picture the prior art randomly selects one webpage and uses its page content as the description information of the picture. Because web pages vary in quality (for example, some web pages stack many irrelevant pictures and text contents only to attract traffic), random selection may pick such a low-quality page to describe a picture, which greatly reduces the accuracy of the picture's description information and thus reduces the picture index quality and the search quality of picture search.
Disclosure of Invention
The embodiment of the invention provides a method and a device for acquiring picture description information and electronic equipment, which are used for solving the technical problem of low accuracy of picture description information in the prior art.
The embodiment of the invention provides a method for acquiring picture description information, which comprises the following steps:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
Optionally, the method for establishing the correlation includes:
capturing all picture domain names appearing under each webpage domain name and the appearance times of each picture domain name;
acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name;
and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
Optionally, the establishing the correlation relationship according to the correlation between each web domain name and each picture domain name includes:
constructing a correlation table according to each webpage domain name, each picture domain name and the correlation value between each webpage domain name and each picture domain name, wherein the correlation table is used for representing the correlation; or,
acquiring the webpage domain names and the picture domain names whose correlation value is greater than a set threshold, and constructing the correlation table from them and their correlation values.
Optionally, if obtaining the target webpage domain name based on the pre-established correlation between the webpage domain name and the picture domain name fails, the method further includes:
acquiring a reference webpage using the target picture and a reference webpage domain name of the reference webpage;
calculating to obtain the character string similarity between the reference webpage domain name and the target picture domain name;
and acquiring the reference webpage domain name with the maximum character string similarity as the target webpage domain name.
Optionally, the obtaining, by calculation, a character string similarity between the reference webpage domain name and the target picture domain name includes:
acquiring a character string editing distance between the reference webpage domain name and the target picture domain name and the longest common subsequence length of the character string;
and calculating to obtain the similarity of the character strings according to the editing distance of the character strings and the length of the longest common subsequence of the character strings.
Optionally, the similarity of the character strings is calculated from the character string editing distance and the longest common subsequence length of the character strings according to the following formula:
S=αED+(1-α)LCS;
where S represents the string similarity, ED represents the string edit distance, LCS represents the longest common subsequence length of the string, and α represents a harmonic coefficient.
An embodiment of the present application further provides an apparatus for acquiring picture description information, including:
the picture domain name acquisition unit is used for acquiring a target picture domain name of a target picture;
the webpage domain name acquisition unit is used for acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a pre-established correlation between the webpage domain name and the picture domain name, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and the description information acquisition unit is used for acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
Optionally, the apparatus further includes a correlation construction unit, configured to:
capturing all picture domain names appearing under each webpage domain name and the appearance times of each picture domain name;
acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name;
and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
Optionally, the correlation construction unit is further configured to:
constructing a correlation table according to each webpage domain name, each picture domain name and the correlation value between each webpage domain name and each picture domain name, wherein the correlation table is used for representing the correlation; or,
acquiring the webpage domain names and the picture domain names whose correlation value is greater than a set threshold, and constructing the correlation table from them and their correlation values.
Optionally, if acquiring the target webpage domain name based on the pre-established correlation between the webpage domain name and the picture domain name fails, the webpage domain name acquiring unit is further configured to:
acquiring a reference webpage using the target picture and a reference webpage domain name of the reference webpage;
calculating to obtain the character string similarity between the reference webpage domain name and the target picture domain name;
and acquiring the reference webpage domain name with the maximum character string similarity as the target webpage domain name.
Optionally, the webpage domain name obtaining unit is further configured to:
acquiring a character string editing distance between the reference webpage domain name and the target picture domain name and the longest common subsequence length of the character string;
and calculating to obtain the similarity of the character strings according to the editing distance of the character strings and the length of the longest common subsequence of the character strings.
Optionally, the webpage domain name obtaining unit is further configured to calculate and obtain the similarity of the character strings according to the following formula:
S=αED+(1-α)LCS;
where S represents the string similarity, ED represents the string edit distance, LCS represents the longest common subsequence length of the string, and α represents a harmonic coefficient.
Embodiments of the present application also provide an electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
One or more technical solutions in the embodiments of the present application have at least the following technical effects:
The embodiment of the application provides a method for acquiring picture description information. For a picture whose description information is to be acquired, namely a target picture, a target picture domain name of the target picture is acquired. Based on a pre-established correlation between webpage domain names and picture domain names, a target webpage domain name with the maximum correlation with the target picture domain name is obtained, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names; the larger this correlation, the more relevant the webpage content corresponding to the target webpage domain name is to the target picture. The description information of the target picture is then obtained from the webpage corresponding to the target webpage domain name, so that more accurate description information is obtained, thereby solving the technical problem of lower accuracy of the description information of the picture in the prior art and further achieving the beneficial effects of improving the picture index quality and the search quality of picture search.
Drawings
Fig. 1 is a schematic flowchart of a method for acquiring picture description information according to an embodiment of the present application;
Fig. 2 is a block diagram of an apparatus for acquiring picture description information according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the technical scheme provided by the embodiment of the application, the method for acquiring the picture description information is provided, the correlation between the corresponding webpage and the picture is represented by establishing the correlation between the webpage domain name and the picture domain name, and the webpage corresponding to the webpage domain name with high correlation is selected to acquire the picture description information, so that the technical problem of low picture description information accuracy caused by randomly selecting the webpage for acquiring the picture description information in the prior art is solved, and the accuracy of the picture description information is improved.
The main implementation principle, the specific implementation mode and the corresponding beneficial effects of the technical scheme of the embodiment of the present application are explained in detail with reference to the accompanying drawings.
Examples
Aiming at the acquisition of the picture description information, the embodiment of the application establishes the correlation between the webpage domain name and the picture domain name in advance, and represents the correlation between the webpage and the picture through the correlation between the webpage domain name and the picture domain name.
The web page content and the pictures of a high-quality website mainly revolve around the same theme. The more concentrated the picture domain names of the pictures under the same website are, the closer the relationship between the web page content and the pictures is, and therefore the correlation between the webpage domain name and the picture domain name can be established according to the number of times the picture domain names appear under the same webpage domain name. Since a plurality of picture domain names may appear under the same webpage domain name, the correlation can further be established according to the total number of occurrences of all picture domain names under the same webpage domain name and the number of occurrences of each picture domain name among them.
When the correlation between the webpage domain names and the picture domain names is established, all the picture domain names appearing under each webpage domain name and the number of times of each picture domain name appearing can be captured firstly; then, acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name; and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
The establishment of the correlation between the web page domain name and the picture domain name may be performed in an offline mining manner, and specifically may include the following steps:
1.1, data capture. Web pages and pictures on the network are captured by a web crawler, and the web page address page_url and the picture address pic_url of each picture appearing on the web page corresponding to that web page address are stored. Furthermore, the large amount of captured webpages and pictures can be stored in the distributed file system HDFS, so that the pictures can be searched quickly.
1.2, acquiring the correlation between the webpage domain name and the picture domain name.
First, the webpage domain name in the captured web page address page_url and the picture domain name in the captured picture address pic_url may be extracted through string processing. For example, assume that a web page address and a picture address are as follows:
page_url=http://www.fzlhw.com/2013tt/?img.b2.bc.c2.e5.c1.d;
pic_url=http://img21.mtime.cn/pi/2012/05/07/114659.36372429.jpg
Then the extracted domain names are: page_domain = fzlhw.com, pic_domain = mtime.cn.
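The extraction step above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the helper name `extract_domain` is hypothetical, and it assumes the domain name is simply the last two dot-separated labels of the host, which would not hold for multi-part suffixes such as `com.cn`.

```python
from urllib.parse import urlparse


def extract_domain(url: str, labels: int = 2) -> str:
    """Return the last `labels` dot-separated labels of the URL's host name.

    Assumption: the registrable domain is the last two labels of the host.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-labels:])


page_url = "http://www.fzlhw.com/2013tt/?img.b2.bc.c2.e5.c1.d"
pic_url = "http://img21.mtime.cn/pi/2012/05/07/114659.36372429.jpg"

page_domain = extract_domain(page_url)
pic_domain = extract_domain(pic_url)
print(page_domain, pic_domain)
```

A production system would more likely consult a public suffix list rather than count labels.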
After extracting the domain names, the same webpage domain name page_domain is taken as a key, and all picture domain names pic_domain appearing under that webpage domain name, together with the number of occurrences of each picture domain name pic_domain, are counted. Suppose that 45 different picture domain names appear under the two webpage domain names mtime.com and fzlhw.com. Under mtime.com, a large proportion of the picture domain names is concentrated on the two domain names mtime.cn and mtime.com, whereas under fzlhw.com the picture domain names are more dispersed, so fzlhw.com can be considered a lower-quality site.
Further, according to the statistical data, the correlation between the webpage domain name and the picture domain name is calculated. Specifically, the following formula can be used:
$$\mathrm{Correlation}(page, pic_i) = \frac{\mathrm{Count}(pic_i)}{\sum_{j=1}^{n} \mathrm{Count}(pic_j)}$$
where Count(pic_i) represents the number of times the i-th picture domain name appears under the webpage domain name, and n represents the total number of picture domain names appearing under the webpage domain name.
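The ratio described above (occurrences of one picture domain name divided by the total occurrences of all picture domain names under the same webpage domain name) might be computed as in the following sketch. The function name `build_correlations` and the observation counts are invented for illustration; only the domain names mtime.com, mtime.cn and fzlhw.com come from the text, and the `*.example` names are placeholders.

```python
from collections import Counter, defaultdict


def build_correlations(pairs):
    """Map each page domain to {pic_domain: Count(pic_i) / sum_j Count(pic_j)}.

    `pairs` is an iterable of (page_domain, pic_domain) observations,
    one per picture found on a crawled page.
    """
    counts = defaultdict(Counter)
    for page_domain, pic_domain in pairs:
        counts[page_domain][pic_domain] += 1

    correlations = {}
    for page_domain, pic_counts in counts.items():
        total = sum(pic_counts.values())
        correlations[page_domain] = {
            pic: count / total for pic, count in pic_counts.items()
        }
    return correlations


# Toy observations (hypothetical counts, not data from the patent).
observations = (
    [("mtime.com", "mtime.cn")] * 3
    + [("mtime.com", "mtime.com")]
    + [("fzlhw.com", d) for d in ("mtime.cn", "a.example", "b.example", "c.example")]
)
corr = build_correlations(observations)
print(corr["mtime.com"])
print(corr["fzlhw.com"])
```

In this toy data the concentrated site yields one high ratio, while the dispersed site yields uniformly low ratios.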
1.3, establishing a correlation relationship according to the correlation between the webpage domain name and the picture domain name. Specifically, a correlation table may be established in either of the following manners, and the correlation between the webpage domain name and the picture domain name is represented by the correlation table. In the first manner, the correlation table is constructed from each webpage domain name, each picture domain name and the correlation value between each webpage domain name and each picture domain name. In the second manner, only the webpage domain names and picture domain names whose correlation value is greater than a set threshold are acquired, and the correlation table is constructed from them and their correlation values. The set threshold may be chosen by a designer according to the index requirements, and its specific value is not limited in this embodiment. The correlation table established in the first manner has more comprehensive coverage but a larger data volume; the correlation table established in the second manner keeps only the higher-correlation entries, which saves storage space.
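The two table-building manners can be illustrated with one small function. This is a sketch under assumptions: the row layout `(page_domain, pic_domain, correlation)`, the name `build_table`, the sample values and the threshold 0.3 are all hypothetical, since the source does not fix a table format or a threshold.

```python
def build_table(correlations, threshold=None):
    """Flatten {page: {pic: corr}} into (page, pic, corr) rows.

    threshold=None keeps every pair (first manner); otherwise only pairs
    whose correlation is strictly greater than the threshold are kept
    (second manner).
    """
    rows = []
    for page_domain, pics in correlations.items():
        for pic_domain, corr in pics.items():
            if threshold is None or corr > threshold:
                rows.append((page_domain, pic_domain, corr))
    return rows


# Hypothetical correlation values for illustration only.
sample = {
    "mtime.com": {"mtime.cn": 0.75, "mtime.com": 0.25},
    "fzlhw.com": {"mtime.cn": 0.25, "a.example": 0.75},
}
full_table = build_table(sample)                   # first manner
pruned_table = build_table(sample, threshold=0.3)  # second manner
print(len(full_table), len(pruned_table))
```

The pruned table is smaller, but a lookup for a picture domain name whose pairs were all filtered out will find nothing, which is the case the fallback in steps 2.1 to 2.3 addresses.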
Based on a pre-established correlation relationship between a webpage domain name and a picture domain name, the embodiment of the application provides a method for acquiring picture description information, which is applied to the picture correlation fields of picture index making, picture searching and the like. Referring to fig. 1, the method for acquiring picture description information includes:
S110: acquiring a target picture domain name of a target picture;
S120: acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a pre-established correlation between the webpage domain name and the picture domain name, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
S130: obtaining the description information of the target picture from the webpage corresponding to the target webpage domain name.
In a specific implementation process, for a picture whose description information is to be obtained, that is, a target picture, S110 is executed to obtain a target picture domain name of the target picture. Similarly, the picture address pic_url of the target picture may be obtained first, and then the domain name may be extracted from the picture address pic_url to serve as the target picture domain name.
After the target picture domain name is acquired in S110, S120 of acquiring a target web page domain name is performed. Specifically, the target picture domain name, and the corresponding web domain name and the correlation thereof may be searched and obtained from a pre-established correlation between the web domain name and the picture domain name.
If the target picture domain name and the corresponding webpage domain name and the correlation thereof are found, acquiring the webpage domain name with the maximum correlation from the found correlation as the target webpage domain name corresponding to the target picture domain name, and successfully acquiring the target webpage domain name based on the pre-established correlation. In general, if the correlation is established in the first manner, the success rate of acquiring the domain name of the target webpage is high, and if the correlation is established in the second manner, the success rate of acquiring the domain name of the target webpage is relatively low, so that the failure of acquiring the domain name of the target webpage is likely to occur.
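The lookup in S120 can be sketched as a search over the correlation table for the target picture domain name, returning the webpage domain name with the largest correlation, or `None` when no entry exists. The table layout, the function name `find_target_page_domain` and the sample rows are assumptions for illustration.

```python
def find_target_page_domain(table, target_pic_domain):
    """Return the page domain with the largest correlation to the target
    picture domain, or None if the table has no entry for it."""
    candidates = [
        (corr, page_domain)
        for page_domain, pic_domain, corr in table
        if pic_domain == target_pic_domain
    ]
    if not candidates:
        return None  # lookup failed; a fallback method is needed
    return max(candidates)[1]


# Hypothetical (page_domain, pic_domain, correlation) rows.
table = [
    ("mtime.com", "mtime.cn", 0.75),
    ("fzlhw.com", "mtime.cn", 0.25),
    ("mtime.com", "mtime.com", 0.25),
]
best = find_target_page_domain(table, "mtime.cn")
missing = find_target_page_domain(table, "unknown.example")
print(best, missing)
```

For large tables an index keyed by picture domain name would avoid the linear scan; the list form is kept here for brevity.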
If the target picture domain name and the corresponding webpage domain name and the correlation thereof are not found, the target webpage domain name acquisition based on the pre-established correlation fails, so that another method is provided for acquiring the target webpage domain name in the embodiment of the application, and the specific steps are as follows:
2.1, acquiring a reference webpage using the target picture and a reference webpage domain name of the reference webpage.
2.2, calculating to obtain the character string similarity between the reference webpage domain name and the target picture domain name.
To obtain the character string similarity, the character string editing distance between the reference webpage domain name and the target picture domain name and the longest common subsequence length of the character strings can be obtained first; the similarity of the character strings is then calculated according to the editing distance of the character strings and the length of the longest common subsequence of the character strings. Specifically, the character string similarity may be obtained by calculating using the following formula:
S=αED+(1-α)LCS;
where S denotes the character string similarity, ED denotes the character string edit distance, LCS denotes the longest common subsequence length of the character strings, and α denotes a harmonic coefficient.
The similarity of the character strings is calculated by combining the editing distance of the character strings and the length of the longest common subsequence, so that the similarity between two domain names can be represented more accurately. However, the embodiment of the present application does not limit the specific algorithm of the similarity of the character strings, and the similarity of the character strings may also be calculated by the edit distance between the character strings or the longest common subsequence length alone, or may also be calculated by converting the character strings into vectors and calculating the cosine similarity, the Euclidean distance, and the like.
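As an illustration of the formula S=αED+(1-α)LCS, the sketch below computes the edit distance (Levenshtein) and the longest common subsequence length with standard dynamic programming and combines them exactly as written. The function names and the default α = 0.5 are assumptions. The source does not say whether ED is normalized or inverted before being combined; since a larger edit distance means the strings differ more, an implementation may need to transform ED first, which is left open here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """S = alpha * ED + (1 - alpha) * LCS, applied literally as in the text."""
    return alpha * edit_distance(a, b) + (1 - alpha) * lcs_length(a, b)


ed = edit_distance("mtime.com", "mtime.cn")
lcs = lcs_length("mtime.com", "mtime.cn")
s = string_similarity("mtime.com", "mtime.cn")
print(ed, lcs, s)
```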
2.3, acquiring the reference webpage domain name with the maximum character string similarity as the target webpage domain name.
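Steps 2.1 to 2.3 amount to an arg-max over the reference webpage domain names. The sketch below takes the similarity measure as a parameter, because the text leaves the algorithm open. To stay self-contained it uses `difflib.SequenceMatcher.ratio` from the Python standard library as a stand-in measure; that is not the patent's formula, and the names `pick_by_similarity` and `ratio` are hypothetical.

```python
from difflib import SequenceMatcher


def pick_by_similarity(reference_page_domains, target_pic_domain, similarity):
    """Return the reference page domain most similar to the target picture
    domain under the supplied similarity function, or None if there are no
    reference pages."""
    if not reference_page_domains:
        return None
    return max(reference_page_domains,
               key=lambda domain: similarity(domain, target_pic_domain))


def ratio(a: str, b: str) -> float:
    # Stand-in similarity in [0, 1]; swap in any other measure as needed.
    return SequenceMatcher(None, a, b).ratio()


chosen = pick_by_similarity(["fzlhw.com", "mtime.com"], "mtime.cn", ratio)
print(chosen)
```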
After the target webpage domain name is obtained, S130 is executed to obtain description information of the target picture from a webpage corresponding to the target webpage domain name. Specifically, description information regarding the target picture, such as a picture title and description content of the target picture, may be acquired from a web page corresponding to the target webpage domain name.
In this embodiment, the correlation between web pages and pictures is represented by the pre-established correlation between the webpage domain name and the picture domain name and/or by the character string similarity between the webpage domain name and the picture domain name. When picture description information is acquired, a web page with high correlation is selected from the multiple web pages corresponding to the picture, and the description information of the picture is obtained from that web page. This overcomes the noise caused by randomly selecting web pages, solves the technical problem in the prior art of low accuracy of picture description information caused by random selection, and improves the accuracy of the picture description information. Furthermore, when a picture index is established and picture search is carried out on the basis of the picture description information acquired by this method, the quality of the picture index and of the picture search can be further improved.
Corresponding to the method for acquiring picture description information provided by the above embodiment, the embodiment of the present application further provides an apparatus for acquiring picture description information. Referring to fig. 2, the apparatus includes:
a picture domain name obtaining unit 21, configured to obtain a target picture domain name of a target picture;
the web page domain name acquisition unit 22 is configured to acquire a target web page domain name having a maximum correlation with the target picture domain name based on a pre-established correlation between the web page domain name and the picture domain name, where the correlation is established according to the total occurrence frequency of all picture domain names appearing under the web page domain name and the occurrence frequency of each picture domain name in all picture domain names;
the description information obtaining unit 23 is configured to obtain the description information of the target picture from the web page corresponding to the target web page domain name.
As an optional implementation manner, the apparatus further includes a correlation construction unit 24, configured to:
capturing all picture domain names appearing under each webpage domain name and the appearance times of each picture domain name; acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name; and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
Specifically, when constructing the correlation, the correlation construction unit 24 may construct a correlation table according to each webpage domain name, each picture domain name, and the correlation value between each webpage domain name and each picture domain name, where the correlation table is used to represent the correlation; alternatively, it may acquire the webpage domain names and picture domain names whose correlation value is greater than the set threshold, together with their correlation values, to construct the correlation table.
In a specific implementation process, if the target webpage domain name fails to be acquired based on the pre-established correlation between the webpage domain name and the picture domain name, the webpage domain name acquiring unit 22 is further configured to:
acquiring a reference webpage using the target picture and a reference webpage domain name of the reference webpage; calculating to obtain the character string similarity between the reference webpage domain name and the reference picture domain name; and acquiring the reference webpage domain with the maximum character string similarity as the target webpage domain.
When calculating the similarity of the character strings, the editing distance of the character strings between the domain name of the reference webpage and the domain name of the target picture and the longest public subsequence length of the character strings can be obtained firstly; and then, calculating to obtain the similarity of the character strings according to the editing distance of the character strings and the length of the longest public subsequence of the character strings. Specifically, the similarity of the character strings can be obtained by calculating according to the following formula:
S=αED+(1-α)LCS;
where S represents the string similarity, ED represents the string edit distance, LCS represents the longest common subsequence length of the strings, and α represents a harmonic coefficient.
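The formula above can be exercised with a minimal sketch such as the following. The function names and the default α = 0.5 are assumptions for illustration. The text does not state whether ED is normalized or converted into a similarity term before being combined, so the sketch applies the formula literally to the raw edit distance and the raw longest common subsequence length.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def string_similarity(a, b, alpha=0.5):
    """S = alpha * ED + (1 - alpha) * LCS, applied literally as written."""
    return alpha * edit_distance(a, b) + (1 - alpha) * lcs_length(a, b)


# Hypothetical domain names used only to exercise the sketch.
s = string_similarity("www.example.com", "img.example.com", alpha=0.5)
```

Because a larger edit distance indicates less similar strings while a larger common subsequence indicates more similar strings, an implementation may need to transform or normalize the ED term before taking the maximum over candidate domain names; the text leaves that choice open.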
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 3 is a block diagram illustrating an electronic device 800 for implementing a method for obtaining picture description information according to an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a method of obtaining picture description information, the method comprising: acquiring a target picture domain name of a target picture; acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names; and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
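The lookup step of the method restated above might be sketched as follows, assuming the correlation relationship is held as a nested dictionary of the form {webpage domain name: {picture domain name: correlation}}. The function names, the example table, and the choice to treat the URL hostname as the "domain name" are assumptions made for this sketch, not details given in the text.

```python
from urllib.parse import urlsplit


def picture_domain(picture_url):
    """Take the hostname of the picture URL as its domain name (assumption)."""
    return urlsplit(picture_url).hostname


def best_webpage_domain(table, pic_domain):
    """Return the webpage domain with the maximum correlation to pic_domain.

    Returns None when no webpage domain in the table is correlated with
    pic_domain, which a caller could treat as a failed lookup.
    """
    best, best_score = None, 0.0
    for page_domain, pic_scores in table.items():
        score = pic_scores.get(pic_domain, 0.0)
        if score > best_score:
            best, best_score = page_domain, score
    return best


# Hypothetical pre-established correlation table.
correlation_table = {
    "news.example.com": {"img.example.com": 0.75, "cdn.example.net": 0.25},
    "blog.example.org": {"cdn.example.net": 1.0},
}

target = picture_domain("https://img.example.com/photos/1.jpg")
page = best_webpage_domain(correlation_table, target)
```

Description information for the target picture would then be taken from a webpage under the returned domain name; how that webpage is located and parsed is outside this sketch.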
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for acquiring picture description information is characterized by comprising the following steps:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
2. The method of claim 1, wherein the correlation relationship is established by a method comprising:
capturing all picture domain names appearing under each webpage domain name and the number of occurrences of each picture domain name;
acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name;
and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
3. The method of claim 2, wherein establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name comprises:
constructing a correlation table according to each webpage domain name, each picture domain name and the correlation value between each webpage domain name and each picture domain name, wherein the correlation table is used for representing the correlation relationship; or,
acquiring the webpage domain names, the picture domain names and the correlation values for which the correlation value is larger than a set threshold value, and constructing the correlation table therefrom.
4. The method of claim 1, wherein if the target webpage domain name fails to be obtained based on a pre-established correlation between the webpage domain name and the picture domain name, the method further comprises:
acquiring a reference webpage using the target picture and a reference webpage domain name of the reference webpage;
calculating the character string similarity between the reference webpage domain name and the target picture domain name;
and acquiring the reference webpage domain name with the maximum character string similarity as the target webpage domain name.
5. The method of claim 4, wherein the calculating to obtain the string similarity between the reference webpage domain name and the target picture domain name comprises:
acquiring the character string edit distance between the reference webpage domain name and the target picture domain name and the longest common subsequence length of the character strings;
and calculating the character string similarity according to the character string edit distance and the longest common subsequence length of the character strings.
6. The method of claim 5, wherein the calculating the string similarity according to the string edit distance and the longest common subsequence length of the string comprises:
S=αED+(1-α)LCS;
where S represents the string similarity, ED represents the string edit distance, LCS represents the longest common subsequence length of the strings, and α represents a harmonic coefficient.
7. An apparatus for acquiring picture description information, comprising:
the picture domain name acquisition unit is used for acquiring a target picture domain name of a target picture;
the webpage domain name acquisition unit is used for acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a pre-established correlation between the webpage domain name and the picture domain name, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and the description information acquisition unit is used for acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
8. The apparatus of claim 7, wherein the apparatus further comprises a correlation building unit to:
capturing all picture domain names appearing under each webpage domain name and the number of occurrences of each picture domain name;
acquiring the ratio of the occurrence frequency of each picture domain name under each webpage domain name to the total occurrence frequency of all picture domain names under each webpage domain name, and taking the ratio as the correlation between each webpage domain name and each picture domain name;
and establishing the correlation relationship according to the correlation between each webpage domain name and each picture domain name.
9. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
10. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the steps of:
acquiring a target picture domain name of a target picture;
acquiring a target webpage domain name with the maximum correlation with the target picture domain name based on a correlation between a webpage domain name and a picture domain name which is established in advance, wherein the correlation is established according to the total occurrence frequency of all picture domain names appearing under the webpage domain name and the occurrence frequency of each picture domain name in all picture domain names;
and acquiring the description information of the target picture from the webpage corresponding to the target webpage domain name.
CN201910020512.7A 2019-01-09 2019-01-09 Method and device for acquiring picture description information and electronic equipment Pending CN111428061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910020512.7A CN111428061A (en) 2019-01-09 2019-01-09 Method and device for acquiring picture description information and electronic equipment


Publications (1)

Publication Number Publication Date
CN111428061A true CN111428061A (en) 2020-07-17

Family

ID=71545714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910020512.7A Pending CN111428061A (en) 2019-01-09 2019-01-09 Method and device for acquiring picture description information and electronic equipment

Country Status (1)

Country Link
CN (1) CN111428061A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281320A (en) * 2013-05-23 2013-09-04 中国科学院计算机网络信息中心 Website icon matching-based detection method for brand counterfeit websites
US20130339845A1 (en) * 2011-01-25 2013-12-19 Japan Registry Services Co., Ltd. Website creation system
CN103838728A (en) * 2012-11-21 2014-06-04 腾讯科技(深圳)有限公司 Webpage information processing method and browser
US20150326606A1 (en) * 2012-06-28 2015-11-12 Beijing Qihoo Technology Company Limited System and method for identifying phishing website
US20170126730A1 (en) * 2015-10-29 2017-05-04 Duo Security, Inc. Methods and systems for implementing a phishing assesment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韩一: ""网络动态高仿真钓鱼网站识别方法研究"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, 15 July 2018 (2018-07-15), pages 139 - 82 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination