CN105512220A - Image page output method and device - Google Patents



Publication number
CN105512220A
Authority
CN
China
Prior art keywords
image
recognition
recognition result
keyword
page
Legal status
Granted
Application number
CN201510855907.0A
Other languages
Chinese (zh)
Other versions
CN105512220B (en)
Inventor
王百超
龙飞
汪平仄
Current Assignee
Beijing Xiaomi Technology Co Ltd
Xiaomi Inc
Original Assignee
Xiaomi Inc
Application filed by Xiaomi Inc
Priority to CN201510855907.0A
Publication of CN105512220A
Application granted
Publication of CN105512220B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Abstract

The invention discloses an image page output method and device in the field of multimedia technology. The method comprises: receiving a request to obtain a page; when the page contains an image, recognizing the content of the image to obtain a recognition result; generating, according to the recognition result, first text information describing the content of the image; and outputting the page, which contains the first text information. Because a text description of each image on the page being browsed is provided without the image being downloaded, the user can get a preliminary idea of what each image shows and decide whether to open or skip it. The approach is more intelligent and saves both the network traffic and the time otherwise spent downloading images.

Description

Image page output method and device
Technical field
The present disclosure relates to the field of multimedia technology, and in particular to an image page output method and device.
Background
With the development of information technology, intelligent terminals offer more and more functions; for example, a user can browse the images on a page with an intelligent terminal. Image quality keeps improving, and a single image is now typically hundreds of KB or even several MB in size. As a result, when the terminal is connected through a mobile network, browsing the images on a page consumes a large amount of data. To address this, intelligent terminals also offer a no-image browsing mode, in which only the text content of a page is displayed and its images are not. However, a page containing only text is dry and monotonous for the user. An image page output method is therefore needed that solves both problems: excessive data consumption and lack of vividness.
Summary of the invention
To overcome the problems in the related art, the present disclosure provides an image page output method and device.
According to a first aspect of the embodiments of the present disclosure, an image page output method is provided, the method comprising:
receiving a request to obtain a page;
when the page contains an image, performing content recognition on the image to obtain a recognition result;
generating, according to the recognition result, first text information describing the content of the image;
outputting the page, the page containing the first text information.
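The four steps above can be pictured as one small routine. This is a minimal sketch under stated assumptions, not the patent's implementation: the `Page` structure and the `recognize` and `describe` callables are hypothetical stand-ins for the recognition models and text generator described later, and the sketch assumes recognition runs somewhere the image is available, so the user's terminal never has to download it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical page model: body text plus the URLs of the images it contains."""
    text: str
    image_urls: List[str] = field(default_factory=list)
    # One "first text information" entry per image, shown instead of the image.
    image_descriptions: List[str] = field(default_factory=list)


def output_page(
    page: Page,
    recognize: Callable[[str], List[str]],  # hypothetical: image URL -> keywords
    describe: Callable[[List[str]], str],   # hypothetical: keywords -> description
) -> Page:
    """Steps 101-104: describe each image of a requested page in text."""
    if not page.image_urls:
        # Content recognition applies only when the page contains images.
        return page
    descriptions = []
    for url in page.image_urls:
        keywords = recognize(url)                # content recognition
        descriptions.append(describe(keywords))  # first text information
    # Output the page together with the generated descriptions.
    return Page(page.text, list(page.image_urls), descriptions)
```

For example, `output_page(Page("news", ["a.jpg"]), lambda url: ["cat"], ", ".join)` returns a page whose `image_descriptions` is `["cat"]`.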
Optionally, generating, according to the recognition result, the first text information describing the content of the image comprises:
obtaining second text information associated with the image in the page;
verifying the recognition result according to the second text information;
generating the first text information describing the content of the image according to the verification result and the recognition result.
Optionally, the method further comprises:
obtaining at least one preset blocked keyword;
if the recognition result contains any blocked keyword, filtering out the recognition result; or,
if the proportion of blocked keywords in the recognition result exceeds a first preset threshold, filtering out the recognition result.
Optionally, before content recognition is performed on the image, the method further comprises:
performing object annotation on a plurality of sample images containing preset target objects to obtain first-type annotated images;
performing model training according to the first-type annotated images to obtain a first model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the first model;
when the image contains any of the preset target objects, obtaining a first keyword describing the object in the image.
Optionally, before content recognition is performed on the image, the method further comprises:
performing scene annotation on a plurality of sample images containing preset scenes to obtain second-type annotated images;
performing model training according to the second-type annotated images to obtain a second model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the second model;
when the image contains any of the preset scenes, obtaining a second keyword describing the scene in the image.
Optionally, before content recognition is performed on the image, the method further comprises:
performing text annotation on a plurality of sample images to obtain third-type annotated images;
performing model training according to the third-type annotated images to obtain a third model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the third model;
when the image contains text, obtaining a third keyword describing the text in the image.
Optionally, each keyword in the recognition result has a corresponding recognition confidence, and verifying the recognition result according to the second text information comprises:
performing word segmentation on the second text information to obtain a plurality of segmented words;
for each keyword in the recognition result, determining whether the segmented words include the keyword;
if the segmented words include the keyword, increasing the recognition confidence of the keyword according to a preset rule;
wherein the recognition confidence characterizes the probability that the keyword has been correctly recognized.
Optionally, generating the first text information describing the content of the image according to the verification result and the recognition result comprises:
obtaining, from the recognition result, noun keywords whose recognition confidence is greater than a second preset threshold;
using an RNN (Recurrent Neural Network) model to compose the noun keywords into a sentence, and using the sentence as the first text information.
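The keyword-selection step can be sketched directly; the sentence-generation step cannot, because the patent does not specify the RNN. In the minimal sketch below, `select_noun_keywords` applies the second preset threshold (assuming every keyword in the result is already a noun, as object and scene labels are), and `compose_sentence` is a hypothetical fixed-template stand-in that is explicitly not an RNN. Ordering the kept keywords by confidence is also an assumption.

```python
from typing import Dict, List


def select_noun_keywords(result: Dict[str, float], threshold: float) -> List[str]:
    """Keep keywords whose recognition confidence is strictly greater than the
    second preset threshold, most confident first.

    Assumes every keyword in `result` is already a noun (object/scene labels).
    """
    kept = [(kw, conf) for kw, conf in result.items() if conf > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [kw for kw, _ in kept]


def compose_sentence(keywords: List[str]) -> str:
    """Fixed-template stand-in for the RNN sentence generator (not an RNN)."""
    if not keywords:
        return ""
    if len(keywords) == 1:
        return f"An image showing {keywords[0]}."
    return "An image showing " + ", ".join(keywords[:-1]) + f" and {keywords[-1]}."
```

With `{"horse": 0.9, "meadow": 0.3, "sea": 0.7}` and a threshold of 0.5, the selected keywords are `["horse", "sea"]` and the template yields "An image showing horse and sea." A trained sequence model would replace `compose_sentence` to produce more natural captions.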
According to a second aspect of the embodiments of the present disclosure, an image page output device is provided, the device comprising:
a receiving module, configured to receive a request to obtain a page;
a recognition module, configured to perform content recognition on the image when the page contains an image, to obtain a recognition result;
a generation module, configured to generate, according to the recognition result, first text information describing the content of the image;
an output module, configured to output the page, the page containing the first text information.
Optionally, the generation module is configured to obtain second text information associated with the image in the page currently being browsed; verify the recognition result according to the second text information; and generate the first text information describing the content of the image according to the verification result and the recognition result.
Optionally, the device further comprises:
an acquisition module, configured to obtain at least one preset blocked keyword;
a filtering module, configured to filter out the recognition result when the recognition result contains any blocked keyword, or when the proportion of blocked keywords in the recognition result exceeds the first preset threshold.
Optionally, the device further comprises:
a labeling module, configured to perform object annotation on a plurality of sample images containing preset target objects to obtain first-type annotated images;
a training module, configured to perform model training according to the first-type annotated images to obtain a first model;
wherein the recognition module is configured to perform image recognition on the image using the first model and, when the image contains any of the preset target objects, obtain a first keyword describing the object in the image.
Optionally, the device further comprises:
a labeling module, configured to perform scene annotation on a plurality of sample images containing preset scenes to obtain second-type annotated images;
a training module, configured to perform model training according to the second-type annotated images to obtain a second model;
wherein the recognition module is configured to perform image recognition on the image using the second model and, when the image contains any of the preset scenes, obtain a second keyword describing the scene in the image.
Optionally, the device further comprises:
a labeling module, configured to perform text annotation on a plurality of sample images to obtain third-type annotated images;
a training module, configured to perform model training according to the third-type annotated images to obtain a third model;
wherein the recognition module is configured to perform image recognition on the image using the third model and, when the image contains text, obtain a third keyword describing the text in the image.
Optionally, each keyword in the recognition result has a corresponding recognition confidence, and the verification module is configured to perform word segmentation on the second text information to obtain a plurality of segmented words; for each keyword in the recognition result, determine whether the segmented words include the keyword; and, if they do, increase the recognition confidence of the keyword according to a preset rule;
wherein the recognition confidence characterizes the probability that the keyword has been correctly recognized.
Optionally, the generation module is configured to obtain, from the recognition result, noun keywords whose recognition confidence is greater than the second preset threshold, and to use an RNN model to compose the noun keywords into a sentence, which is used as the first text information.
According to a third aspect of the embodiments of the present disclosure, an image page output device is provided, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to: receive a request to obtain a page; when the page contains an image, perform content recognition on the image to obtain a recognition result; generate, according to the recognition result, first text information describing the content of the image; and output the page, the page containing the first text information.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
When a request for a page is received and the page is determined to contain an image, content recognition is performed on the image, first text information describing the content of the image is generated from the recognition result, and the page containing the first text information is then output. Because a text description of each image on the page being browsed is provided without the images being downloaded, the user can get a preliminary understanding of their content and decide whether to open or skip each image. This is more intelligent and saves network traffic as well as the time spent waiting for images to download.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a flowchart of an image page output method according to an exemplary embodiment.
Fig. 2 is a flowchart of an image page output method according to an exemplary embodiment.
Fig. 3 is a block diagram of an image page output device according to an exemplary embodiment.
Fig. 4 is a block diagram of an image page output device according to an exemplary embodiment.
Fig. 5 is a block diagram of an image page output device according to an exemplary embodiment.
Fig. 6 is a block diagram of an image page output device according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 1 is a flowchart of an image page output method according to an exemplary embodiment. As shown in Fig. 1, the method is used in an image page output device and includes the following steps.
In step 101, a request to obtain a page is received.
In step 102, when the page contains an image, content recognition is performed on the image to obtain a recognition result.
In step 103, first text information describing the content of the image is generated according to the recognition result.
In step 104, the page is output, the page containing the first text information.
With the method provided by the embodiments of the present disclosure, when a request for a page is received and the page is determined to contain an image, content recognition is performed on the image, first text information describing the content of the image is generated from the recognition result, and the page containing the first text information is then output. Because a text description of each image on the page being browsed is provided without the images being downloaded, the user can get a preliminary understanding of their content and decide whether to open or skip each image; the approach is more intelligent and saves network traffic and the time spent waiting for images to download.
Optionally, generating, according to the recognition result, the first text information describing the content of the image comprises:
obtaining second text information associated with the image in the page;
verifying the recognition result according to the second text information;
generating the first text information describing the content of the image according to the verification result and the recognition result.
Optionally, the method further comprises:
obtaining at least one preset blocked keyword;
if the recognition result contains any blocked keyword, filtering out the recognition result; or,
if the proportion of blocked keywords in the recognition result exceeds a first preset threshold, filtering out the recognition result.
Optionally, before content recognition is performed on the image, the method further comprises:
performing object annotation on a plurality of sample images containing preset target objects to obtain first-type annotated images;
performing model training according to the first-type annotated images to obtain a first model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the first model;
when the image contains any of the preset target objects, obtaining a first keyword describing the object in the image.
Optionally, before content recognition is performed on the image, the method further comprises:
performing scene annotation on a plurality of sample images containing preset scenes to obtain second-type annotated images;
performing model training according to the second-type annotated images to obtain a second model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the second model;
when the image contains any of the preset scenes, obtaining a second keyword describing the scene in the image.
Optionally, before content recognition is performed on the image, the method further comprises:
performing text annotation on a plurality of sample images to obtain third-type annotated images;
performing model training according to the third-type annotated images to obtain a third model;
and performing content recognition on the image to obtain the recognition result comprises:
performing image recognition on the image using the third model;
when the image contains text, obtaining a third keyword describing the text in the image.
Optionally, each keyword in the recognition result has a corresponding recognition confidence, and verifying the recognition result according to the second text information comprises:
performing word segmentation on the second text information to obtain a plurality of segmented words;
for each keyword in the recognition result, determining whether the segmented words include the keyword;
if the segmented words include the keyword, increasing the recognition confidence of the keyword according to a preset rule;
wherein the recognition confidence characterizes the probability that the keyword has been correctly recognized.
Optionally, generating the first text information describing the content of the image according to the verification result and the recognition result comprises:
obtaining, from the recognition result, noun keywords whose recognition confidence is greater than a second preset threshold;
using an RNN model to compose the noun keywords into a sentence, and using the sentence as the first text information.
All of the above optional technical solutions may be combined in any manner to form embodiments of the present disclosure, which are not described again here.
Fig. 2 is a flowchart of an image page output method according to an exemplary embodiment. As shown in Fig. 2, the method is used in an image page output device and includes the following steps.
In step 201, a request to obtain a page is received; if the no-image browsing mode is enabled and the page contains images, content recognition is performed on each image in the page to obtain a recognition result.
The request to obtain the page may be triggered by a click operation that the user performs in a browser or application installed on the terminal. For example, a user who wants to visit a portal website to browse its web pages can click the link to that portal website in the browser. The page may be a web page, a message display page in an application, or the like; the embodiments of the present disclosure do not specifically limit this.
In the embodiments of the present disclosure, before content recognition is performed on an image, that is, before content information is extracted from it, models must first be built from sample images; the trained models are then used to extract content information from the image. The content information may be faces, famous landmarks or buildings, text, scenes such as indoor, outdoor, beach, meadow or snowfield, animals such as cats and horses, and so on; the embodiments of the present disclosure do not specifically limit this. By type, the content information falls into three broad classes: objects, such as faces, animals like cats and horses, and famous landmarks or buildings; text, such as the words on road signs or shop signboards; and scenes, such as indoor, outdoor, beach, meadow or snowfield. To recognize these three classes of content well, the embodiments of the present disclosure may train three corresponding models and use them to recognize the objects, text and scenes in an image, as detailed below:
First, object annotation is performed on a plurality of sample images containing preset target objects to obtain first-type annotated images; model training is performed according to the first-type annotated images to obtain a first model; and image recognition is performed on the image using the first model, so that when the image contains any preset target object, a first keyword describing the object in the image is obtained.
Multiple sample images can be collected in network.When carrying out object mark, the object region in each image all uses indicia framing by hand labeled.Multiple sample images can be divided into training dataset and test data set.Wherein, test data concentrates the image comprising and comprise object in a large number; Remaining picture construction training dataset.
During model training, a CNN (Convolutional Neural Network) model may be trained. First, for each sample image, a plurality of candidate target-object regions are extracted, and a feature is computed for each candidate region. All candidate regions are then clustered according to these features into a specified number of classes. The parameters of the CNN model are initialized, and the classification response of each candidate region is computed with the initialized model. For each candidate region, the training class it belongs to is determined from its classification response. Using the object annotation results of the sample images, obtained in advance, the actual class each candidate region belongs to is determined. Finally, according to the training classes and the actual classes, the parameters of the CNN model are optimized until its classification error is below a preset threshold.
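The clustering step, in which candidate-region features are grouped into a specified number of classes, can be sketched as follows. The patent does not name a clustering algorithm; standard k-means (Lloyd's algorithm) is an assumption here, and the feature vectors are hypothetical plain lists of floats rather than real CNN features.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    features: List[Vector], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Cluster candidate-region feature vectors into k classes with Lloyd's k-means."""
    rng = random.Random(seed)
    # Initialise the k centres with k randomly chosen candidate regions.
    centers = [list(f) for f in rng.sample(features, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: every region joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: _sq_dist(f, centers[j])) for f in features
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of the regions assigned to it.
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

The returned `labels` play the role of cluster classes for the candidate regions; the subsequent CNN optimization against the annotated classes is not shown.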
After the trained CNN model is obtained, it is used to extract the objects in each image in the page, yielding first keywords describing those objects. For example, if an image contains small animals such as a cat and a dog, and a little girl, the first keywords may be "cat, dog, girl".
It should be noted that when the CNN model is used for image recognition, it may output, in addition to the first keywords, a recognition confidence for each keyword, that is, the probability that the object has been correctly recognized. For example, if an image contains a cat and the recognition result is "cat", the recognition confidence is high and the recognition is correct; if the recognition result is "person", the recognition confidence is low and the recognition is wrong.
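A recognition result of this kind can be represented as a mapping from keyword to recognition confidence. The sketch below assumes, hypothetically, that the model emits one `(label, confidence)` pair per detected region; repeated labels are merged into a single keyword that keeps the highest confidence seen. The patent does not specify this merging rule.

```python
from typing import Dict, Iterable, Tuple


def to_keywords(detections: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Merge per-region (label, confidence) detections into a recognition result
    with one keyword per label, keeping the highest confidence seen for it."""
    result: Dict[str, float] = {}
    for label, confidence in detections:
        if label not in result or confidence > result[label]:
            result[label] = confidence
    return result
```

For the cat, dog and girl example, `to_keywords([("cat", 0.9), ("dog", 0.8), ("cat", 0.6), ("girl", 0.95)])` gives `{"cat": 0.9, "dog": 0.8, "girl": 0.95}`. The same structure fits the scene and text keywords.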
Second, scene annotation is performed on a plurality of sample images containing preset scenes to obtain second-type annotated images; model training is performed according to the second-type annotated images to obtain a second model; and image recognition is performed on the image using the second model, so that when the image contains any preset scene, a second keyword describing the scene in the image is obtained.
The sample images may be collected from the network. During scene annotation, the scene region in each image is manually marked with a bounding box. The sample images may be divided into a training data set and a test data set, where the test data set may include a large number of scene images and the remaining images form the training data set. During model training, a CNN model may be trained; the embodiments of the present disclosure do not specifically limit this. The specific CNN training process may follow the model training process described in the steps above and is not repeated here.
After the trained CNN model is obtained, it is used to extract the scene of each image in the page, yielding second keywords describing the scene. For example, if the scene in an image is the sea and a blue sky, the second keywords may be "sky, sea". It should be noted that when the CNN model is used for image recognition, it may output, in addition to the second keywords, a recognition confidence for each keyword, that is, the probability that the scene has been correctly recognized. For example, if the scene in an image is the sea and the recognition result is "sea", the recognition confidence is high and the recognition is correct; if the recognition result is "meadow", the recognition confidence is low and the recognition is wrong.
Third, text annotation is performed on a plurality of sample images to obtain third-type annotated images; model training is performed according to the third-type annotated images to obtain a third model; and image recognition is performed on the image using the third model, so that when the image contains text, a third keyword describing the text in the image is obtained.
The sample images may be collected from the network. During text-region annotation, the text region in each image is manually marked with a bounding box. The sample images may be divided into a training data set and a test data set, where the test data set includes a large number of text images and the remaining images form the training data set. During model training, a CNN model or a support vector machine (SVM) classifier may be trained; the embodiments of the present disclosure do not specifically limit this. The specific CNN training process may follow the model training process described in the steps above and is not repeated here.
When the SVM classifier is trained, the following approach may be used: for each sample image, obtain its training feature vector; among all training feature vectors, determine those corresponding to text images, and optimize the parameters of the SVM classifier according to them. After the trained third model is obtained, it is used to extract the text in each image in the page, yielding third keywords describing the text. It should be noted that when the third model is used for image recognition, it may output, in addition to the third keywords, a recognition confidence for each keyword, that is, the probability that the text has been correctly recognized.
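The SVM option can be illustrated with a linear SVM that separates text images (+1) from non-text images (-1) by their training feature vectors. The patent specifies neither the features nor the optimizer; this minimal sketch assumes Pegasos-style stochastic sub-gradient descent on the hinge loss, with the bias folded in as a constant feature, and uses tiny hypothetical 2-D feature vectors.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(
    samples: List[Tuple[Sequence[float], int]],
    lam: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> List[float]:
    """Pegasos-style SGD on the regularized hinge loss.

    Labels: +1 = text image, -1 = non-text image. A constant 1.0 feature is
    appended to every vector so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in samples]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrinks the weights at every step ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and a sample inside the margin pulls them toward y * x.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Classify a feature vector as text (+1) or non-text (-1)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A library SVM implementation would normally replace this hand-rolled loop in practice; the loop only shows what "optimizing the classifier parameters" amounts to for a linear SVM.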
In step 202, at least one preset blocked keyword is obtained, and it is determined whether the recognition result contains a blocked keyword. If the recognition result contains any blocked keyword, step 203 below is performed; if it contains none, step 204 below is performed.
Blocked keywords are set in advance by the user and are used to automatically block, in page browsing mode, images the user does not want to see, such as images with unhealthy content, or celebrities or scenes the user dislikes; the embodiments of the present disclosure do not specifically limit this. In the embodiments of the present disclosure, if an image in the page currently being browsed contains a blocked keyword, its recognition result is filtered out and no description is given for that image.
In step 203, if the recognition result contains any blocked keyword, the recognition result is filtered out.
In another embodiment, besides the filtering approach above, the embodiments of the present disclosure provide another filtering approach, as follows: if the proportion of blocked keywords in the recognition result exceeds a first preset threshold, the recognition result is filtered out. The first preset threshold may be 80%, 90%, or the like; the disclosure does not specifically limit this. For example, if the recognition result is "Fan Bingbing, meadow, horse" and the blocked keywords include all three of these keywords, the image is filtered out directly and no description of it is given in the no-image browsing mode, because the user is not interested in, or even dislikes, such images.
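Both filtering rules fit in one function. This is a minimal sketch: `should_filter` and its parameters are hypothetical names, and treating an empty recognition result as "keep" is an assumption the patent does not address. "Exceeds" is implemented as a strict comparison.

```python
from typing import List, Optional, Set


def should_filter(
    keywords: List[str], blocked: Set[str], ratio_threshold: Optional[float] = None
) -> bool:
    """Decide whether a recognition result must be filtered out.

    ratio_threshold=None -> step 203 rule: any blocked keyword filters the result.
    ratio_threshold=t    -> alternative rule: filter when the share of blocked
                            keywords in the result exceeds t (e.g. 0.8 or 0.9).
    """
    if not keywords:
        return False  # nothing recognized, nothing to block (assumption)
    hits = sum(1 for kw in keywords if kw in blocked)
    if ratio_threshold is None:
        return hits > 0
    return hits / len(keywords) > ratio_threshold
```

In the example above, all three keywords are blocked, so the proportion is 1.0, which exceeds 0.8 and the image is filtered out. A result such as `["Fan Bingbing", "sea"]` with only "Fan Bingbing" blocked is filtered under the first rule but kept under the 80% rule (proportion 0.5).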
Filtering out the recognition result means that no description is given for the corresponding image: the image is simply discarded and is not shown in the no-image browsing mode. The browser provides an on/off switch for image display. When the switch is on, the browser is in image page output mode; when it is off, the browser is in no-image browsing mode.
In step 204, if the recognition result does not contain any shielding keyword, second text information associated with the image in the page is obtained. The recognition result is then verified against the second text information.
The second text information is text in the currently browsed web page that relates to the image. For example, a news article usually contains not only text but also images that illustrate it. That text is then the second text information associated with such an image.
In the disclosed embodiments, verifying the recognition result against the second text information may be implemented as follows:
Perform word segmentation on the second text information to obtain multiple segmented words. For each keyword in the recognition result, determine whether the segmented words include that keyword. If they do, increase the recognition confidence of that keyword according to a preset rule.
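The patent does not say how the associated text is located in the page. This sketch assumes that the image's `alt` attribute and the nearest paragraphs before and after the `<img>` tag count as its associated text. Under that assumption, the second text information can be collected with Python's standard `html.parser`. The helper names are hypothetical, and text outside `<p>` elements is ignored.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class _PageScanner(HTMLParser):
    """Records <p> texts and <img> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, str, str]] = []  # (kind, src_or_text, alt)
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "p":
            self._in_p = True
            self._buf = []
        elif tag == "img":
            self.items.append(("img", attributes.get("src") or "",
                               attributes.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buf).strip()
            if text:
                self.items.append(("text", text, ""))

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def associated_text(page: str) -> Dict[str, str]:
    """Map each image src to its alt text plus the nearest paragraphs around it."""
    scanner = _PageScanner()
    scanner.feed(page)
    scanner.close()
    items = scanner.items
    result: Dict[str, str] = {}
    for i, (kind, src, alt) in enumerate(items):
        if kind != "img":
            continue
        parts = [alt] if alt else []
        for j in range(i - 1, -1, -1):      # nearest preceding paragraph
            if items[j][0] == "text":
                parts.append(items[j][1])
                break
        for j in range(i + 1, len(items)):  # nearest following paragraph
            if items[j][0] == "text":
                parts.append(items[j][1])
                break
        result[src] = " ".join(parts)
    return result


if __name__ == "__main__":
    page = '<p>Model shoot on the beach.</p><img src="a.jpg" alt="horse"><p>More text.</p>'
    print(associated_text(page))
    # {'a.jpg': 'horse Model shoot on the beach. More text.'}
```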
A simple example illustrates these steps. Suppose the second text information is: "Today, model Bingbing shot a magazine spread for a certain watch brand on a beach, including a picture of model Bingbing walking along the beach leading a horse." Word segmentation may split it into segments such as "today, a certain, beach, on, Fan Bingbing, for, a certain, watch brand, shot, magazine, among which, including, Fan Bingbing, at, beach, leading, horse, walking, picture". Suppose the recognition result contains "Fan Bingbing, meadow, horse, seashore". The keywords "Fan Bingbing" and "horse" both appear among the segments, so they are much more likely to have been recognized correctly, and their recognition confidences are increased. The word "seashore" does not appear exactly among the segments. However, the closely related word "beach" does, so its recognition confidence is also increased. For example, the preset rule may raise the recognition confidence of an exactly matched keyword by 10% or 15%, and that of a partially matched keyword by 5% or 10%. The embodiments of the disclosure impose no specific limitation on this.
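The verification step can be sketched in Python under several assumptions, none of which the patent fixes:

- The text is already segmented. A real system would run a Chinese word segmenter, such as the jieba library, first.
- Partial matches are detected through a hand-written table of related words.
- "Raise by 10%" means adding 0.10 to a confidence in [0, 1], capped at 1.0.

```python
from typing import Dict, Iterable, Mapping, Optional


def verify(confidences: Mapping[str, float],
           segments: Iterable[str],
           related: Optional[Mapping[str, Iterable[str]]] = None,
           exact_boost: float = 0.10,
           partial_boost: float = 0.05) -> Dict[str, float]:
    """Raise keyword confidences that are corroborated by the page text.

    exact match  : the keyword itself is one of the segments -> +exact_boost
    partial match: a related word (e.g. "beach" for "seashore") -> +partial_boost
    Confidences are capped at 1.0; unmatched keywords are unchanged.
    """
    seg_set = set(segments)
    related = related or {}
    boosted: Dict[str, float] = {}
    for keyword, confidence in confidences.items():
        if keyword in seg_set:
            confidence += exact_boost
        elif any(word in seg_set for word in related.get(keyword, ())):
            confidence += partial_boost
        boosted[keyword] = min(confidence, 1.0)
    return boosted


if __name__ == "__main__":
    segments = ["today", "a certain", "beach", "on", "Fan Bingbing", "for",
                "a certain", "watch brand", "shot", "magazine", "among which",
                "including", "Fan Bingbing", "at", "beach", "leading", "horse",
                "walking", "picture"]
    result = {"Fan Bingbing": 0.75, "meadow": 0.6, "horse": 0.8, "seashore": 0.7}
    boosted = verify(result, segments, related={"seashore": ["beach"]})
    print({k: round(v, 2) for k, v in boosted.items()})
    # {'Fan Bingbing': 0.85, 'meadow': 0.6, 'horse': 0.9, 'seashore': 0.75}
```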
In step 205, the first text information describing the content of the image is generated from the verification result and the recognition result.
In the disclosed embodiments, this generation may be implemented as follows:
Obtain the target keywords in the recognition result whose recognition confidence exceeds a second preset threshold. Use an RNN model to compose the target keywords into a sentence, and use that sentence as the first text information.
The second preset threshold may be 80%, 90%, or the like; the embodiments of the disclosure impose no specific limitation on this. The RNN model is an existing, mature model that can compose the generated keywords into a sentence. For example, it can insert conjunctions between keywords to make the result readable; this is not described further here. The keywords in the recognition result are first sorted by recognition confidence. Those whose recognition confidence exceeds the second preset threshold are then selected as target keywords. For example, if the target keywords are "Fan Bingbing, horse, seashore", the RNN model may compose them into a sentence such as "Fan Bingbing leads a horse by the sea", which is used as the first text information.
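Target-keyword selection is straightforward to sketch in Python. In the patent, an RNN model composes the sentence. This sketch swaps in a simple template purely to show the data flow, so `compose_sentence` is a hypothetical stand-in, not an RNN. The threshold value 0.8 is one of the example values from the text.

```python
from typing import List, Mapping


def select_target_keywords(confidences: Mapping[str, float],
                           threshold: float = 0.8) -> List[str]:
    """Sort keywords by recognition confidence; keep those above the threshold."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return [keyword for keyword, confidence in ranked if confidence > threshold]


def compose_sentence(keywords: List[str]) -> str:
    """Template stand-in for the RNN: joins keywords with commas and 'and'."""
    if not keywords:
        return ""
    if len(keywords) == 1:
        return f"Image showing {keywords[0]}."
    return f"Image showing {', '.join(keywords[:-1])} and {keywords[-1]}."


if __name__ == "__main__":
    verified = {"Fan Bingbing": 0.92, "meadow": 0.55, "horse": 0.95, "seashore": 0.85}
    targets = select_target_keywords(verified)
    print(targets)                    # ['horse', 'Fan Bingbing', 'seashore']
    print(compose_sentence(targets))  # Image showing horse, Fan Bingbing and seashore.
```

The template output is much less fluent than the RNN's example sentence. A trained sequence model can order the keywords and supply verbs and prepositions, which a fixed template cannot.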
In step 206, the page containing the first text information is output.
The first text information for each image may be displayed where that image would originally have appeared; the embodiments of the disclosure impose no specific limitation on this. The textual description gives the user a rough idea of the image content. The user can then choose either to continue downloading the image or to skip it, which is convenient.
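Placing each description where its image would have appeared can be sketched in Python as a string rewrite of the page HTML. The sketch rests on these assumptions:

- Images are identified by their quoted `src` attribute.
- A mapped value of `None` marks a filtered image, which is dropped as in step 203.
- Descriptions are rendered in a `<span>` with a hypothetical `img-desc` class.

A production browser would rewrite the DOM instead of using regular expressions.

```python
import html
import re
from typing import Mapping, Optional

_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
_SRC_ATTR = re.compile(r"""\ssrc\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def replace_images_with_text(page: str,
                             descriptions: Mapping[str, Optional[str]]) -> str:
    """Swap each described <img> for its first text information.

    descriptions maps src -> text; a value of None marks a filtered image,
    which is removed without any description. Unmapped images are left alone.
    """
    def substitute(match: "re.Match[str]") -> str:
        tag = match.group(0)
        src_match = _SRC_ATTR.search(tag)
        if src_match is None or src_match.group(1) not in descriptions:
            return tag
        text = descriptions[src_match.group(1)]
        if text is None:
            return ""  # filtered out: discard the image entirely
        return '<span class="img-desc">[' + html.escape(text) + "]</span>"

    return _IMG_TAG.sub(substitute, page)


if __name__ == "__main__":
    page = '<p>News</p><img src="a.jpg"><img src="b.jpg" alt="x"><img src="c.jpg">'
    print(replace_images_with_text(
        page, {"a.jpg": "Fan Bingbing leads a horse by the sea", "b.jpg": None}))
    # <p>News</p><span class="img-desc">[Fan Bingbing leads a horse by the sea]</span><img src="c.jpg">
```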
The method works as follows. A request to obtain a page is received, and the page is determined to contain images. Content recognition is then performed on the images, and first text information describing each image's content is generated from the recognition result. Finally, the page containing the first text information is output. Each image in the currently browsed web page gets a textual description without the image being downloaded. This helps the user get a preliminary idea of the image content and decide whether to open or skip each image. Browsing becomes smarter, and the user saves network traffic and the time spent waiting for images to download. In addition, shielding keywords can be set so that images the user does not wish to see are blocked, further improving the user experience.
Fig. 3 is a block diagram of an image page output device according to an exemplary embodiment. Referring to Fig. 3, the device includes a receiving module 301, a recognition module 302, a generation module 303, and an output module 304.
The modules are configured as follows:
- The receiving module 301 receives a request to obtain a page.
- When the page contains an image, the recognition module 302 performs content recognition on the image to obtain a recognition result.
- The generation module 303 generates, from the recognition result, first text information describing the content of the image.
- The output module 304 outputs the page, which contains the first text information.
Optionally, the generation module 303 is configured to obtain second text information associated with the image in the currently browsed web page. It then verifies the recognition result against the second text information. Finally, it generates the first text information describing the image's content from the verification result and the recognition result.
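Modules 301 to 304 can be sketched as a small Python class whose fields are pluggable callables. All four callables are hypothetical stand-ins; no real fetcher or recognizer is implied. The sketch only shows how the modules hand data to one another, and that recognition runs only for images actually found in the page.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

_IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc="([^"]*)"', re.IGNORECASE)


@dataclass
class ImagePageOutputDevice:
    """Hypothetical wiring of modules 301-304; each field is a pluggable stand-in."""
    receive: Callable[[str], str]                 # 301: request -> page HTML
    recognize: Callable[[str], List[str]]         # 302: image src -> keywords
    generate: Callable[[List[str]], str]          # 303: keywords -> first text information
    output: Callable[[str, Dict[str, str]], str]  # 304: page + descriptions -> output page

    def handle(self, request: str) -> str:
        page = self.receive(request)
        # Recognition only runs for images actually present in the page.
        descriptions = {src: self.generate(self.recognize(src))
                        for src in _IMG_SRC.findall(page)}
        return self.output(page, descriptions)


if __name__ == "__main__":
    device = ImagePageOutputDevice(
        receive=lambda url: '<p>News</p><img src="a.jpg">',
        recognize=lambda src: ["Fan Bingbing", "horse", "seashore"],
        generate=lambda keywords: ", ".join(keywords),
        output=lambda page, d: page.replace('<img src="a.jpg">', "[" + d["a.jpg"] + "]"),
    )
    print(device.handle("http://example.com/news"))
    # <p>News</p>[Fan Bingbing, horse, seashore]
```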
Referring to Fig. 4, the device further includes:
an acquisition module 305, configured to obtain at least one preset shielding keyword; and
a filtering module 306, configured to filter out the recognition result when it contains any shielding keyword, or when the proportion of shielding keywords in the recognition result exceeds the first preset threshold.
Referring to Fig. 5, the device further includes:
a labeling module 307, configured to perform object labeling on multiple sample images containing preset target objects, to obtain first-type labeled images;
a training module 308, configured to perform model training on the first-type labeled images to obtain a first model; and
the recognition module 302, configured to perform image recognition on the image using the first model and, when the image contains any of the preset target objects, to obtain a first keyword describing the object in the image.
Optionally, the device further includes:
the labeling module 307, configured to perform scene labeling on multiple sample images containing preset scenes, to obtain second-type labeled images;
the training module 308, configured to perform model training on the second-type labeled images to obtain a second model; and
the recognition module 302, configured to perform image recognition on the image using the second model and, when the image contains any of the preset scenes, to obtain a second keyword describing the scene in the image.
Optionally, the device further includes:
the labeling module 307, configured to perform text labeling on multiple sample images, to obtain third-type labeled images;
the training module 308, configured to perform model training on the third-type labeled images to obtain a third model; and
the recognition module 302, configured to perform image recognition on the image using the third model and, when the image contains text, to obtain a third keyword describing the text in the image.
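The optional first, second and third models each contribute keywords to the recognition result. The following Python sketch shows how the recognition module 302 might merge their outputs. It assumes each trained model is a callable that maps image bytes to a `{keyword: confidence}` dictionary, which is a hypothetical interface; training itself is not shown. It also assumes that a keyword produced by more than one model keeps its highest confidence.

```python
from typing import Callable, Dict, Iterable

KeywordScores = Dict[str, float]
# Hypothetical interface for a trained model: image bytes -> {keyword: confidence}
Model = Callable[[bytes], KeywordScores]


def recognize(image: bytes, models: Iterable[Model]) -> KeywordScores:
    """Merge the keywords produced by the object, scene and text models."""
    merged: KeywordScores = {}
    for model in models:
        for keyword, confidence in model(image).items():
            # Keep the highest confidence when several models emit the same keyword.
            if keyword not in merged or confidence > merged[keyword]:
                merged[keyword] = confidence
    return merged


if __name__ == "__main__":
    def object_model(img: bytes) -> KeywordScores:  # stand-in for the first model
        return {"Fan Bingbing": 0.9, "horse": 0.8}

    def scene_model(img: bytes) -> KeywordScores:   # stand-in for the second model
        return {"seashore": 0.7, "meadow": 0.6, "horse": 0.5}

    def text_model(img: bytes) -> KeywordScores:    # stand-in for the third model (no text found)
        return {}

    print(recognize(b"", [object_model, scene_model, text_model]))
    # {'Fan Bingbing': 0.9, 'horse': 0.8, 'seashore': 0.7, 'meadow': 0.6}
```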
Optionally, each keyword in the recognition result has a recognition confidence. The verification module 303 is configured to:
- perform word segmentation on the second text information to obtain multiple segmented words;
- for each keyword in the recognition result, determine whether the segmented words include that keyword; and
- if they do, increase the recognition confidence of that keyword according to a preset rule.
Here, the recognition confidence characterizes the probability that the keyword has been correctly recognized.
Optionally, the generation module 303 is configured to obtain the target keywords in the recognition result whose recognition confidence exceeds the second preset threshold. It then uses an RNN model to compose the target keywords into a sentence, which serves as the first text information.
The device provided by the embodiments of the disclosure works as follows. A request to obtain a page is received, and the page is determined to contain images. Content recognition is then performed on the images, and first text information describing each image's content is generated from the recognition result. Finally, the page containing the first text information is output. Each image in the currently browsed web page gets a textual description without the image being downloaded. This helps the user get a preliminary idea of the image content and decide whether to open or skip each image. Browsing becomes smarter, and the user saves network traffic and the time spent waiting for images to download. In addition, shielding keywords can be set so that images the user does not wish to see are blocked, further improving the user experience.
The specific way each module of the device in the above embodiments performs its operations is described in detail in the method embodiments and is not repeated here.
Fig. 6 is a block diagram of a device 600 for image page output according to an exemplary embodiment. For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 6, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an I/O (Input/Output) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 typically controls the overall operation of the device 600. This includes operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 that execute instructions to complete all or some of the steps of the method described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between it and other components. For example, it may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operating on the device 600, contact data, phonebook data, messages, pictures, and videos. The memory 604 may be implemented by any type of volatile or non-volatile storage device, or a combination of them. Examples include SRAM (Static Random Access Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 606 supplies power to the various components of the device 600. It may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include an LCD (Liquid Crystal Display) and a TP (Touch Panel). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors that sense touches, swipes, and gestures on it. The touch sensors can sense the boundary of a touch or swipe action, as well as its duration and pressure. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operating mode, such as a shooting mode or a video mode, the front and/or rear camera can receive external multimedia data. Each front and rear camera may be a fixed optical lens system or may have focal-length and optical-zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, it includes a MIC (microphone). The microphone receives external audio signals when the device 600 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker configured to output audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to, a home button, volume buttons, a power button, and a lock button.
The sensor component 614 includes one or more sensors that provide status assessments of various aspects of the device 600. For example, it may detect:
- the on/off state of the device 600;
- the relative positioning of components, such as the display and keypad of the device 600;
- a change in position of the device 600 or of one of its components;
- the presence or absence of user contact with the device 600;
- the orientation or acceleration/deceleration of the device 600; and
- temperature changes of the device 600.
The sensor component 614 may include a proximity sensor configured to detect nearby objects without any physical contact. It may also include an optical sensor for imaging applications, such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-Coupled Device) image sensor. In some embodiments, it may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination of them. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 also includes an NFC (Near Field Communication) module to facilitate short-range communication. For example, the NFC module may be implemented with RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), BT (Bluetooth), and other technologies.
In exemplary embodiments, the device 600 may be implemented by one or more of the following, configured to perform the method described above: ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components.
Exemplary embodiments also provide a non-transitory computer-readable storage medium that includes instructions, such as the memory 604. The processor 620 of the device 600 can execute these instructions to perform the method described above. For example, the storage medium may be a ROM, RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, or the like.
The non-transitory computer-readable storage medium for image page output provided by the embodiments of the disclosure works as follows. A request to obtain a page is received, and the page is determined to contain images. Content recognition is then performed on the images, and first text information describing each image's content is generated from the recognition result. Finally, the page containing the first text information is output. Each image in the currently browsed web page gets a textual description without the image being downloaded. This helps the user get a preliminary idea of the image content and decide whether to open or skip each image. Browsing becomes smarter, and the user saves network traffic and the time spent waiting for images to download. In addition, shielding keywords can be set so that images the user does not wish to see are blocked, further improving the user experience.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles. These include common knowledge or customary technical means in the art that are not disclosed here. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the invention is not limited to the precise structures described above and shown in the accompanying drawings. Various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (17)

1. An image page output method, characterized in that the method comprises:
Receiving a request to obtain a page;
When the page contains an image, performing content recognition on the image to obtain a recognition result;
Generating, according to the recognition result, first text information describing the content of the image;
Outputting the page, the page containing the first text information.
2. The method according to claim 1, characterized in that generating, according to the recognition result, the first text information describing the content of the image comprises:
Obtaining second text information associated with the image in the page;
Verifying the recognition result according to the second text information;
Generating, according to the verification result and the recognition result, the first text information describing the content of the image.
3. The method according to claim 1, characterized in that the method further comprises:
Obtaining at least one preset shielding keyword;
If the recognition result contains any shielding keyword, filtering out the recognition result; or,
If the proportion of shielding keywords in the recognition result exceeds a first preset threshold, filtering out the recognition result.
4. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:
Performing object labeling on multiple sample images containing preset target objects to obtain first-type labeled images;
Performing model training on the first-type labeled images to obtain a first model;
Performing content recognition on the image to obtain the recognition result comprises:
Performing image recognition on the image using the first model;
When the image contains any of the preset target objects, obtaining a first keyword describing the object in the image.
5. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:
Performing scene labeling on multiple sample images containing preset scenes to obtain second-type labeled images;
Performing model training on the second-type labeled images to obtain a second model;
Performing content recognition on the image to obtain the recognition result comprises:
Performing image recognition on the image using the second model;
When the image contains any of the preset scenes, obtaining a second keyword describing the scene in the image.
6. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:
Performing text labeling on multiple sample images to obtain third-type labeled images;
Performing model training on the third-type labeled images to obtain a third model;
Performing content recognition on the image to obtain the recognition result comprises:
Performing image recognition on the image using the third model;
When the image contains text, obtaining a third keyword describing the text in the image.
7. The method according to any one of claims 4 to 6, characterized in that each keyword in the recognition result corresponds to a recognition confidence, and verifying the recognition result according to the second text information comprises:
Performing word segmentation on the second text information to obtain multiple segmented words;
For each keyword in the recognition result, determining whether the multiple segmented words include the keyword;
If the multiple segmented words include the keyword, increasing the recognition confidence of the keyword according to a preset rule;
Wherein the recognition confidence characterizes the probability that the keyword has been correctly recognized.
8. The method according to claim 7, characterized in that generating, according to the verification result and the recognition result, the first text information describing the content of the image comprises:
Obtaining target keywords in the recognition result whose recognition confidence exceeds a second preset threshold;
Using a multi-layer feedback network (recurrent neural network, RNN) model to compose the target keywords into a sentence, and using the sentence as the first text information.
9. An image page output device, characterized in that the device comprises:
A receiving module, configured to receive a request to obtain a page;
A recognition module, configured to, when the page contains an image, perform content recognition on the image to obtain a recognition result;
A generation module, configured to generate, according to the recognition result, first text information describing the content of the image;
An output module, configured to output the page, the page containing the first text information.
10. The device according to claim 9, characterized in that the generation module is configured to obtain second text information associated with the image in the currently browsed web page; verify the recognition result according to the second text information; and generate, according to the verification result and the recognition result, the first text information describing the content of the image.
11. The device according to claim 9, characterized in that the device further comprises:
An acquisition module, configured to obtain at least one preset shielding keyword;
A filtering module, configured to filter out the recognition result when the recognition result contains any shielding keyword, or when the proportion of shielding keywords in the recognition result exceeds a first preset threshold.
12. The device according to claim 9, characterized in that the device further comprises:
A labeling module, configured to perform object labeling on multiple sample images containing preset target objects to obtain first-type labeled images;
A training module, configured to perform model training on the first-type labeled images to obtain a first model;
The recognition module being configured to perform image recognition on the image using the first model and, when the image contains any of the preset target objects, to obtain a first keyword describing the object in the image.
13. The device according to claim 9, characterized in that the device further comprises:
A labeling module, configured to perform scene labeling on multiple sample images containing preset scenes to obtain second-type labeled images;
A training module, configured to perform model training on the second-type labeled images to obtain a second model;
The recognition module being configured to perform image recognition on the image using the second model and, when the image contains any of the preset scenes, to obtain a second keyword describing the scene in the image.
14. The device according to claim 9, characterized in that the device further comprises:
A labeling module, configured to perform text labeling on multiple sample images to obtain third-type labeled images;
A training module, configured to perform model training on the third-type labeled images to obtain a third model;
The recognition module being configured to perform image recognition on the image using the third model and, when the image contains text, to obtain a third keyword describing the text in the image.
15. The device according to any one of claims 12 to 14, characterized in that each keyword in the recognition result corresponds to a recognition confidence, and the verification module is configured to perform word segmentation on the second text information to obtain multiple segmented words; for each keyword in the recognition result, determine whether the multiple segmented words include the keyword; and if they do, increase the recognition confidence of the keyword according to a preset rule;
Wherein the recognition confidence characterizes the probability that the keyword has been correctly recognized.
16. The device according to claim 15, characterized in that the generation module is configured to obtain target keywords in the recognition result whose recognition confidence exceeds a second preset threshold, and to use a multi-layer feedback network (recurrent neural network, RNN) model to compose the target keywords into a sentence, using the sentence as the first text information.
17. An image page output device, characterized by comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to: receive a request to obtain a page; when the page contains an image, perform content recognition on the image to obtain a recognition result; generate, according to the recognition result, first text information describing the content of the image; and output the page, the page containing the first text information.
CN201510855907.0A 2015-11-30 2015-11-30 Image page output method and device Active CN105512220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510855907.0A CN105512220B (en) 2015-11-30 2015-11-30 Image page output method and device

Publications (2)

Publication Number Publication Date
CN105512220A true CN105512220A (en) 2016-04-20
CN105512220B CN105512220B (en) 2018-12-11

Family

ID=55720202

Country Status (1)

Country Link
CN (1) CN105512220B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110016150A1 (en) * 2009-07-20 2011-01-20 Engstroem Jimmy System and method for tagging multiple digital images
CN104536973A (en) * 2014-12-03 2015-04-22 Beijing Qihoo Technology Co., Ltd. Picture identification method and browser client
CN104808979A (en) * 2014-01-28 2015-07-29 Nokia Corporation Method and device for generating or using information associated with image contents
CN105095498A (en) * 2015-08-24 2015-11-25 Beijing Megvii Technology Co., Ltd. Information processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
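Anonymous: "Getting a computer to describe the content of a picture in one sentence: Google can now do it", HTTP://WWW.PINGWEST.COM/GOOGLE-DESCRIPTION-PHOTOS/ *
Anonymous: "What are the benefits of adding an Alt attribute or a Title attribute to images - SEO", HTTP://BLOG.SINA.COM.CN/S/BLOG_830EDCF30101F3WB.HTML *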

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446782A (en) * 2016-08-29 2017-02-22 Beijing Xiaomi Mobile Software Co Ltd Image identification method and device
CN107590252A (en) * 2017-09-19 2018-01-16 Baidu Online Network Technology (Beijing) Co Ltd Method and device for information exchange
CN109885842A (en) * 2018-02-22 2019-06-14 Google LLC Processing text neural networks
CN109885842B (en) * 2018-02-22 2023-06-20 Google LLC Processing text neural networks
CN109359257A (en) * 2018-10-09 2019-02-19 Shanghai 2345 Network Technology Co Ltd Control method and control device for image-free browsing in a mobile-terminal browser
CN110489674A (en) * 2019-07-02 2019-11-22 Baidu Online Network Technology (Beijing) Co Ltd Page processing method, device and equipment
CN110489674B (en) * 2019-07-02 2020-11-06 Baidu Online Network Technology (Beijing) Co Ltd Page processing method, device and equipment
CN112149412A (en) * 2020-10-23 2020-12-29 Beijing Jinher Network Co Ltd Catering industry service supervision method, device and system
CN113239302A (en) * 2021-04-23 2021-08-10 Vivo Mobile Communication (Hangzhou) Co Ltd Page display method and device and electronic equipment
WO2022223002A1 (en) * 2021-04-23 2022-10-27 Vivo Mobile Communication (Hangzhou) Co Ltd Page display method and apparatus, and electronic device
CN115134319A (en) * 2022-06-29 2022-09-30 Vivo Mobile Communication Co Ltd Information display method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN105512220B (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN105512220A (en) Image page output method and device
CN106557768A (en) Method and device for recognizing text in a picture
CN104731688B (en) Method and device for indicating reading progress
CN104219785A (en) Real-time video providing method and device, server and terminal device
CN105302315A (en) Image processing method and device
CN104933170A (en) Information exhibition method and device
CN104298547A (en) Terminal setting method and device
CN104268547A (en) Method and device for playing music based on picture content
CN104615655A (en) Information recommendation method and device
CN104268150A (en) Method and device for playing music based on image content
CN105184313A (en) Classification model construction method and device
CN104284240A (en) Video browsing method and device
CN104079964B (en) Method and device for transmitting video information
CN105809174A (en) Method and device for identifying image
CN106126632A (en) Recommendation method and device
CN104461348A (en) Method and device for selecting information
CN107870712A (en) Screenshot processing method and device
CN104866523A (en) Page display method and device
CN104331503A (en) Information push method and device
CN104615663A (en) File sorting method and device and terminal
CN105550235A (en) Information acquisition method and information acquisition apparatuses
CN105335198A (en) Font addition method and device
CN106503131A (en) Method and device for obtaining interest information
CN107544802A (en) Device identification method and device
CN105101354A (en) Wireless network connection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant