CN105512220B - Image page output method and device

Info

Publication number: CN105512220B
Application number: CN201510855907.0A
Authority: CN
Inventors: 王百超; 龙飞; 汪平仄
Original assignee: Xiaomi Inc
Current assignee: Xiaomi Inc
Priority date: 2015-11-30
Filing date: 2015-11-30
Publication date: 2018-12-11
Anticipated expiration: 2035-11-30
Also published as: CN105512220A

Abstract

The present disclosure relates to an image page output method and device, belonging to the field of multimedia technology. The method includes: receiving an acquisition request for a page; when the page includes an image, performing content recognition on the image to obtain a recognition result; generating, according to the recognition result, first text information describing the content of the image; and outputting the page, the page including the first text information. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This makes browsing more intelligent and saves both network traffic and the time spent waiting for images to download.

Description

Image page output method and device

Technical field

The present disclosure relates to the field of multimedia technology, and in particular to an image page output method and device.

Background

With the continuous development of information technology, smart terminals offer more and more functions. For example, a smart terminal can be used to browse the images in a page. Image quality keeps improving, and a single image is now typically several hundred kilobytes or even several megabytes in size. As a result, when a smart terminal is connected over a mobile network, browsing the images in a page consumes a large amount of data traffic. For this reason, smart terminals also provide a no-image browsing mode, in which only the text content of a page is displayed and the images in the page are not. A page containing only text, however, is undoubtedly dull and monotonous for the user. An image page output method is therefore needed to address both the traffic consumption and the lack of vividness.

Summary of the invention

To overcome the problems in the related art, the present disclosure provides an image page output method and device.

According to a first aspect of the embodiments of the present disclosure, an image page output method is provided. The method includes:

receiving an acquisition request for a page;

when the page includes an image, performing content recognition on the image to obtain a recognition result;

generating, according to the recognition result, first text information describing the content of the image; and

outputting the page, the page including the first text information.

Optionally, generating, according to the recognition result, the first text information describing the content of the image includes:

obtaining second text information in the page that is associated with the image;

verifying the recognition result according to the second text information; and

generating, according to the verification result and the recognition result, the first text information describing the content of the image.

Optionally, the method further includes:

obtaining at least one preset shielding keyword; and

if the recognition result includes any shielding keyword, filtering out the recognition result; or,

if the occurrence ratio of the shielding keywords in the recognition result exceeds a first preset threshold, filtering out the recognition result.

Optionally, before performing content recognition on the image, the method further includes:

performing object annotation on multiple sample images that include a preset target object, to obtain first-class annotated images; and

performing model training on the first-class annotated images to obtain a first model.

Performing content recognition on the image to obtain a recognition result includes:

performing image recognition on the image using the first model; and

when the image includes any preset target object, obtaining a first keyword describing the object in the image.

Optionally, the method further includes:

performing scene annotation on multiple sample images that include a preset scene, to obtain second-class annotated images;

performing model training on the second-class annotated images to obtain a second model;

performing image recognition on the image using the second model; and

when the image includes any preset scene, obtaining a second keyword describing the scene in the image.

Optionally, the method further includes:

performing text annotation on multiple sample images, to obtain third-class annotated images;

performing model training on the third-class annotated images to obtain a third model;

performing image recognition on the image using the third model; and

when the image includes text, obtaining a third keyword describing the text in the image.

Optionally, each keyword in the recognition result corresponds to a recognition confidence, and verifying the recognition result according to the second text information includes:

performing word segmentation on the second text information to obtain multiple segmented words;

for each keyword in the recognition result, determining whether the segmented words include the keyword; and

if the segmented words include the keyword, increasing the recognition confidence of the keyword according to a preset rule;

where the recognition confidence characterizes the probability that the keyword was correctly recognized.

Optionally, generating, according to the verification result and the recognition result, the first text information describing the content of the image includes:

obtaining the specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold; and

composing the specified keywords into a sentence using an RNN (Recurrent Neural Network) model, and using the sentence as the first text information.

According to a second aspect of the embodiments of the present disclosure, an image page output device is provided. The device includes:

a receiving module, configured to receive an acquisition request for a page;

an identification module, configured to, when the page includes an image, perform content recognition on the image to obtain a recognition result;

a generation module, configured to generate, according to the recognition result, first text information describing the content of the image; and

an output module, configured to output the page, the page including the first text information.

Optionally, the generation module is configured to: obtain second text information in the currently browsed page that is associated with the image; verify the recognition result according to the second text information; and generate, according to the verification result and the recognition result, the first text information describing the content of the image.

Optionally, the device further includes:

an obtaining module, configured to obtain at least one preset shielding keyword; and

a filtering module, configured to filter out the recognition result when the recognition result includes any shielding keyword, or to filter out the recognition result when the occurrence ratio of the shielding keywords in the recognition result exceeds a first preset threshold.

Optionally, the device further includes:

an annotation module, configured to perform object annotation on multiple sample images that include a preset target object, to obtain first-class annotated images; and

a training module, configured to perform model training on the first-class annotated images to obtain a first model;

where the identification module is configured to perform image recognition on the image using the first model and, when the image includes any preset target object, obtain a first keyword describing the object in the image.

Optionally, the device further includes:

an annotation module, configured to perform scene annotation on multiple sample images that include a preset scene, to obtain second-class annotated images; and

a training module, configured to perform model training on the second-class annotated images to obtain a second model;

where the identification module is configured to perform image recognition on the image using the second model and, when the image includes any preset scene, obtain a second keyword describing the scene in the image.

Optionally, the device further includes:

an annotation module, configured to perform text annotation on multiple sample images, to obtain third-class annotated images; and

a training module, configured to perform model training on the third-class annotated images to obtain a third model;

where the identification module is configured to perform image recognition on the image using the third model and, when the image includes text, obtain a third keyword describing the text in the image.

Optionally, each keyword in the recognition result corresponds to a recognition confidence, and the verification module is configured to: perform word segmentation on the second text information to obtain multiple segmented words; for each keyword in the recognition result, determine whether the segmented words include the keyword; and, if the segmented words include the keyword, increase the recognition confidence of the keyword according to a preset rule.

Optionally, the generation module is configured to: obtain the specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold; and compose the specified keywords into a sentence using an RNN model, using the sentence as the first text information.

According to a third aspect of the embodiments of the present disclosure, an image page output device is provided, including:

a processor; and

a memory for storing instructions executable by the processor;

where the processor is configured to: receive an acquisition request for a page; when the page includes an image, perform content recognition on the image to obtain a recognition result; generate, according to the recognition result, first text information describing the content of the image; and output the page, the page including the first text information.

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:

After an acquisition request for a page is received and the page is determined to include an image, content recognition is performed on the image, first text information describing the content of the image is generated from the recognition result, and the page including the first text information is then output. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This makes browsing more intelligent and saves both network traffic and the time spent waiting for images to download.

It should be understood that the general description above and the detailed description below are merely exemplary and explanatory and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart of an image page output method according to an exemplary embodiment.

Fig. 2 is a flowchart of an image page output method according to an exemplary embodiment.

Fig. 3 is a block diagram of an image page output device according to an exemplary embodiment.

Fig. 4 is a block diagram of an image page output device according to an exemplary embodiment.

Fig. 5 is a block diagram of an image page output device according to an exemplary embodiment.

Fig. 6 is a block diagram of an image page output device according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention, as detailed in the appended claims.

Fig. 1 is a flowchart of an image page output method according to an exemplary embodiment. As shown in Fig. 1, the method is used in an image page output device and includes the following steps.

In step 101, an acquisition request for a page is received.

In step 102, when the page includes an image, content recognition is performed on the image to obtain a recognition result.

In step 103, first text information describing the content of the image is generated according to the recognition result.

In step 104, the page is output, the page including the first text information.
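The four steps above can be sketched in Python as follows. This is only an illustrative sketch: the `Page` type and the `recognize` and `describe` callables are hypothetical names introduced here, not part of the disclosure, and the stand-in functions return fixed values where trained models would be used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical data model; the disclosure does not define concrete types.
@dataclass
class Page:
    text: str
    image_urls: List[str] = field(default_factory=list)
    descriptions: Dict[str, str] = field(default_factory=dict)

Recognizer = Callable[[str], List[str]]  # image URL -> recognized keywords
Describer = Callable[[List[str]], str]   # keywords -> first text information

def output_page(page: Page, recognize: Recognizer, describe: Describer) -> Page:
    """Steps 102-104 for a page whose acquisition request was received (step 101)."""
    for url in page.image_urls:
        keywords = recognize(url)                    # step 102: content recognition
        page.descriptions[url] = describe(keywords)  # step 103: first text information
    return page                                      # step 104: page with the text

# Stand-ins for illustration only; real ones would wrap trained models.
def fake_recognize(url: str) -> List[str]:
    return ["cat", "meadow"]

def fake_describe(keywords: List[str]) -> str:
    return "Image showing: " + ", ".join(keywords)

page = output_page(Page("news text", ["a.jpg"]), fake_recognize, fake_describe)
print(page.descriptions)
```

Passing the recognizer and describer as parameters keeps the control flow of steps 101 to 104 separate from whichever models an implementation actually uses.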

In the method provided by the embodiments of the present disclosure, after an acquisition request for a page is received and the page is determined to include an image, content recognition is performed on the image, first text information describing the content of the image is generated from the recognition result, and the page including the first text information is then output. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This makes browsing more intelligent and saves both network traffic and the time spent waiting for images to download.

Optionally, generating, according to the recognition result, the first text information describing the content of the image includes:

obtaining second text information in the page that is associated with the image;

verifying the recognition result according to the second text information; and

generating, according to the verification result and the recognition result, the first text information describing the content of the image.

Optionally, the method further includes:

obtaining at least one preset shielding keyword; and

if the recognition result includes any shielding keyword, filtering out the recognition result; or,

if the occurrence ratio of the shielding keywords in the recognition result exceeds a first preset threshold, filtering out the recognition result.

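Optionally, before performing content recognition on the image, the method further includes:

performing object annotation on multiple sample images that include a preset target object, to obtain first-class annotated images; and

performing model training on the first-class annotated images to obtain a first model.

Performing content recognition on the image to obtain a recognition result includes:

performing image recognition on the image using the first model; and

when the image includes any preset target object, obtaining a first keyword describing the object in the image.

Optionally, the method further includes:

performing scene annotation on multiple sample images that include a preset scene, to obtain second-class annotated images;

performing model training on the second-class annotated images to obtain a second model;

performing image recognition on the image using the second model; and

when the image includes any preset scene, obtaining a second keyword describing the scene in the image.

Optionally, the method further includes:

performing text annotation on multiple sample images, to obtain third-class annotated images;

performing model training on the third-class annotated images to obtain a third model;

performing image recognition on the image using the third model; and

when the image includes text, obtaining a third keyword describing the text in the image.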

Optionally, each keyword in the recognition result corresponds to a recognition confidence, and verifying the recognition result according to the second text information includes:

performing word segmentation on the second text information to obtain multiple segmented words;

for each keyword in the recognition result, determining whether the segmented words include the keyword; and

if the segmented words include the keyword, increasing the recognition confidence of the keyword according to a preset rule;

where the recognition confidence characterizes the probability that the keyword was correctly recognized.

Optionally, generating, according to the verification result and the recognition result, the first text information describing the content of the image includes:

obtaining the specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold; and

composing the specified keywords into a sentence using an RNN model, and using the sentence as the first text information.

All of the optional technical solutions above can be combined in any way to form optional embodiments of the present disclosure, which are not described one by one here.

Fig. 2 is a flowchart of an image page output method according to an exemplary embodiment. As shown in Fig. 2, the method is used in an image page output device and includes the following steps.

In step 201, an acquisition request for a page is received; if no-image browsing mode is set and the page includes images, content recognition is performed on each image included in the page to obtain a recognition result.

The page acquisition request can be triggered by a click that the user performs in a browser or application installed on the terminal. For example, if the user wants to visit a portal website to browse web pages, the user can click the link provided for that portal in the browser. The page may be a web page, a message display page, and so on; the embodiments of the present disclosure do not specifically limit this.

In the embodiments of the present disclosure, before content recognition is performed on an image, that is, before content information is extracted from the image, models must first be built from sample images; the trained models are then used to extract the content information. The content information may be a face, a famous landmark or building, text, a scene such as indoors, outdoors, a beach, a meadow or a snowfield, an animal such as a cat or a horse, and so on; the embodiments of the present disclosure do not specifically limit this. Content information can be divided into three broad categories by type. The first is objects, such as faces, animals such as cats and horses, and famous landmarks or buildings. The second is text, such as the text on road signs or shop signboards. The third is scenes, such as indoors, outdoors, beaches, meadows or snowfields. To recognize these three categories of content well, the embodiments of the present disclosure may train three corresponding models and use them to recognize the objects, text and scenes in an image, as detailed below:
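One way to picture the use of three separate models is the sketch below, which runs one model per content category and groups the resulting keywords. The `Model` callable type and the lambda stand-ins are assumptions made for illustration; in practice each would be a trained network.

```python
from typing import Callable, Dict, List, Tuple

Detection = Tuple[str, float]               # (keyword, recognition confidence)
Model = Callable[[bytes], List[Detection]]  # hypothetical model interface

def recognize_content(image: bytes, models: Dict[str, Model]) -> Dict[str, List[Detection]]:
    """Run every category model on the image and group keywords by category."""
    result: Dict[str, List[Detection]] = {}
    for category, model in models.items():
        detections = model(image)
        if detections:  # keep only categories actually found in the image
            result[category] = detections
    return result

# Stand-in models with fixed outputs, for illustration only.
models: Dict[str, Model] = {
    "object": lambda img: [("horse", 0.92)],
    "text": lambda img: [],
    "scene": lambda img: [("seashore", 0.81)],
}
result = recognize_content(b"raw image bytes", models)
print(result)
```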

First, object annotation is performed on multiple sample images that include a preset target object, to obtain first-class annotated images; model training is performed on the first-class annotated images to obtain a first model; and image recognition is performed on the image using the first model. When the image includes any preset target object, a first keyword describing the object in the image is obtained.

Multiple sample images are collected from the network. During object annotation, the region containing the target object in each image is marked manually with a bounding box. The sample images can be divided into a training data set and a test data set: the test data set includes a large number of images containing target objects, and the remaining images form the training data set.
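The division into training and test sets could look like the following sketch. The 20% test fraction and the fixed random seed are assumptions chosen for illustration; the disclosure does not specify how the split is made.

```python
import random
from typing import List, Tuple

def split_dataset(samples: List[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the annotated samples and split them into (train, test)."""
    shuffled = samples[:]                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

samples = [f"img_{i}.jpg" for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))
```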

During model training, a CNN (Convolutional Neural Network) model may be trained. First, for each sample image, multiple candidate target-object regions are extracted from the sample image, and the feature of each candidate region is computed. All candidate regions are clustered according to these features to obtain a specified number of classes. The parameters of the CNN model are initialized, and the initialized CNN model is used to compute the classification response of each candidate region. For each candidate region, the training class to which the region belongs is determined from its classification response. The target-object annotation results of the sample images, obtained in advance, are used to determine the actual class to which each candidate region belongs. The parameters of the CNN model are then optimized according to the training class and the actual class until the classification error of the CNN model falls below a preset threshold.

After the trained CNN model is obtained, it is used to extract the objects in each image in the page, yielding a first keyword describing the objects. For example, if an image contains small animals such as a cat and a dog together with a little girl, the first keyword may be "cat, dog, girl".

It should be noted that, when image recognition is performed with the CNN model, the model can output not only the first keyword but also the recognition confidence of each keyword, that is, the probability that the object is correctly recognized. For example, if an image contains a cat and the recognition result is "cat", the recognition confidence is high and the recognition is correct; if the recognition result is "person", the recognition confidence is low and the recognition is wrong.
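The disclosure does not say how the confidence is derived from the model output. A common choice, assumed here purely for illustration, is to apply a softmax to the per-class classification responses so that they form a probability distribution; the response values below are made up.

```python
import math
from typing import Dict

def softmax_confidence(responses: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-class responses into confidences that sum to 1."""
    peak = max(responses.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - peak) for label, score in responses.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Made-up classification responses for an image that contains a cat.
confidences = softmax_confidence({"cat": 4.0, "person": 1.0, "dog": 0.5})
best = max(confidences, key=confidences.get)
print(best, confidences)
```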

Second, scene annotation is performed on multiple sample images that include a preset scene, to obtain second-class annotated images; model training is performed on the second-class annotated images to obtain a second model; and image recognition is performed on the image using the second model. When the image includes any preset scene, a second keyword describing the scene in the image is obtained.

Multiple sample images are collected from the network. During scene annotation, the scene region in each image is marked manually with a bounding box. The sample images can be divided into a training data set and a test data set: the test data set may include a large number of scene images, and the remaining images form the training data set. During model training, a CNN model may be trained; the embodiments of the present disclosure do not specifically limit this. The CNN training process can follow the model training process described above and is not repeated here.

After the trained CNN model is obtained, it is used to extract the scene of each image in the page, yielding a second keyword describing the scene. For example, if the scene in an image is the sea and a blue sky, the second keyword may be "sky, sea". It should be noted that, when image recognition is performed with the CNN model, the model can output not only the second keyword but also the recognition confidence of each keyword, that is, the probability that the scene is correctly recognized. For example, if the scene in an image is the sea and the recognition result is "sea", the recognition confidence is high and the recognition is correct; if the recognition result is "meadow", the recognition confidence is low and the recognition is wrong.

Third, text annotation is performed on multiple sample images, to obtain third-class annotated images; model training is performed on the third-class annotated images to obtain a third model; and image recognition is performed on the image using the third model. When the image includes text, a third keyword describing the text in the image is obtained.

Multiple sample images are collected from the network. During text annotation, the text region in each image is marked manually with a bounding box. The sample images can be divided into a training data set and a test data set: the test data set includes a large number of text images, and the remaining images form the training data set. During model training, a CNN model or a support vector machine (SVM) classifier may be trained; the embodiments of the present disclosure do not specifically limit this. The CNN training process can follow the model training process described above and is not repeated here.

An SVM classifier may be trained as follows: for each sample image, the training feature vector of the sample image is obtained; among all training feature vectors, those corresponding to text images are determined, and the parameters of the SVM classifier are optimized according to the training feature vectors of the text images. After the trained third model is obtained, it is used to extract the text of each image in the page, yielding a third keyword describing the text. It should be noted that, when image recognition is performed with the third model, the model can output not only the third keyword but also the recognition confidence of each keyword, that is, the probability that the text is correctly recognized.

In step 202, at least one preset shielding keyword is obtained, and it is determined whether the recognition result includes a shielding keyword. If the recognition result includes any shielding keyword, step 203 below is performed; if the recognition result includes no shielding keyword, step 204 below is performed.

The shielding keywords can be configured in advance by the user and are used, in page browsing mode, to automatically shield images the user does not want to see, such as images with unhealthy content or celebrities or scenes the user dislikes; the embodiments of the present disclosure do not specifically limit this. In the embodiments of the present disclosure, if the recognition result of an image in the currently browsed page includes a shielding keyword, the recognition result is filtered out and no description is given for that image.

In step 203, if the recognition result includes any shielding keyword, the recognition result is filtered out.

In another embodiment, in addition to the above way of filtering the recognition result, the embodiments of the present disclosure provide another filtering approach: if the occurrence ratio of shielding keywords in the recognition result exceeds a first preset threshold, the recognition result is filtered out. The first preset threshold may be 80%, 90%, or the like; the present disclosure does not specifically limit this. For example, if the recognition result is "So-and-so, meadow, horse" and the shielding keywords include all three of these keywords, the image is filtered out directly and no description of it is given in no-image browsing mode, because the user does not care about, or even dislikes, this image.
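Both filtering rules can be sketched in one function, as below. The function name and the reading of "occurrence ratio" as the fraction of recognized keywords that are shielding keywords are assumptions made for this illustration.

```python
from typing import Iterable, List, Optional

def should_filter(keywords: List[str], shield_words: Iterable[str],
                  ratio_threshold: Optional[float] = None) -> bool:
    """Decide whether a recognition result should be filtered out.

    With no threshold, any shielding keyword triggers filtering (step 203).
    With a threshold, the share of shielding keywords must exceed it.
    """
    shield = set(shield_words)
    hits = sum(1 for keyword in keywords if keyword in shield)
    if ratio_threshold is None:
        return hits > 0
    if not keywords:
        return False
    return hits / len(keywords) > ratio_threshold

shield = {"So-and-so", "meadow", "horse"}
all_shielded = should_filter(["So-and-so", "meadow", "horse"], shield, 0.8)
half_shielded = should_filter(["cat", "meadow"], shield, 0.8)
any_match = should_filter(["cat", "meadow"], shield)
print(all_shielded, half_shielded, any_match)
```

The ratio rule is the more lenient of the two: an image with one shielding keyword among several others is still described, whereas the any-match rule would drop it.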

Filtering out the recognition result means that no description is produced for the corresponding image; the image is simply discarded and is not shown in no-image browsing mode. A function switch for image display is provided in the browser. When the switch is on, the browser is in image page output mode; when the switch is off, the browser is in no-image browsing mode.

In step 204, if the recognition result includes no shielding keyword, second text information in the page that is associated with the image is obtained, and the recognition result is verified according to the second text information.

The second text information is the text in the currently browsed page that relates to the image. For example, a news article today usually contains not only text content but also pictures illustrating that text. That text content is then the second text information associated with the picture.

In the embodiments of the present disclosure, the recognition result can be verified according to the second text information as follows:

Word segmentation is performed on the second text information to obtain multiple segmented words. For each keyword in the recognition result, it is determined whether the segmented words include the keyword. If the segmented words include the keyword, the recognition confidence of the keyword is increased according to a preset rule.

The above steps are illustrated with a simple example. Suppose the second text information is "Today, at a certain seaside, So-and-so shot a pictorial for a certain brand of watch, including a picture of So-and-so leading a horse while walking at the seaside". Word segmentation may split this into multiple segmented words such as "today, certain, seaside, at, So-and-so, for, a certain, brand watch, shoot, pictorial, among, include, So-and-so, seaside, lead, horse, walk, picture". If the recognition result includes "So-and-so, meadow, horse, seashore", the keywords "So-and-so" and "horse" both appear among the segmented words, so the probability that these two keywords were correctly recognized is much higher, and their recognition confidence is increased. Although "seashore" does not appear verbatim among the segmented words, the similar word "seaside" does, so its recognition confidence is increased as well. The preset rule may be, for example, to increase the recognition confidence of an exactly matched keyword by 10% or 15% and that of a partially matched keyword by 5% or 10%; the embodiments of the present disclosure do not specifically limit this.
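The verification step can be sketched as follows, taking already segmented words as input. Several details are assumptions made for illustration: the bonuses are applied additively (the text does not say whether the increase is absolute or relative), the result is capped at 1.0, and a "partial match" is approximated by a shared prefix of at least three characters. The starting confidences are made up.

```python
from typing import Dict, List

EXACT_BONUS = 0.10    # assumed additive bonus for an exact match
PARTIAL_BONUS = 0.05  # assumed additive bonus for a partial match

def shares_prefix(a: str, b: str, min_len: int = 3) -> bool:
    """Hypothetical partial-match rule: same first min_len characters."""
    return len(a) >= min_len and len(b) >= min_len and a[:min_len] == b[:min_len]

def verify(result: Dict[str, float], tokens: List[str]) -> Dict[str, float]:
    """Raise the confidence of keywords supported by the surrounding text."""
    token_set = set(tokens)
    verified = {}
    for keyword, confidence in result.items():
        if keyword in token_set:
            confidence += EXACT_BONUS
        elif any(shares_prefix(keyword, token) for token in token_set):
            confidence += PARTIAL_BONUS
        verified[keyword] = min(confidence, 1.0)  # keep it a valid probability
    return verified

tokens = ["today", "seaside", "so-and-so", "horse", "walk", "picture"]
result = {"so-and-so": 0.80, "meadow": 0.60, "horse": 0.85, "seashore": 0.78}
verified = verify(result, tokens)
print(verified)
```

For Chinese page text, the segmented words would come from a word segmenter rather than being supplied by hand, and the partial-match rule would need to be adapted to the language.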

In step 205, first text information describing the content of the image is generated according to the verification result and the recognition result.

In the embodiments of the present disclosure, the first text information describing the content of the image can be generated from the verification result and the recognition result as follows:

The specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold are obtained; the specified keywords are composed into a sentence using an RNN model, and the sentence is used as the first text information.

The second preset threshold may be 80%, 90%, or the like; the embodiments of the present disclosure do not specifically limit this. The RNN model is an existing, mature model that can compose the generated keywords into a readable sentence, for example by adding conjunctions between keywords; it is not described further here. After the keywords in the recognition result are sorted by recognition confidence, the specified keywords whose recognition confidence is greater than the second preset threshold are selected. Taking the specified keywords "So-and-so, horse, seashore" as an example, the RNN model can compose them into a sentence such as "So-and-so leads a horse by the sea", which is used as the first text information.
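The selection step can be sketched as below. Note that the sentence composition here is a plain template standing in for the RNN model, which is outside the scope of a short example; the function names, the 0.8 threshold and the confidence values are illustrative assumptions.

```python
from typing import Dict, List

def select_keywords(result: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Sort keywords by confidence and keep those above the threshold."""
    ranked = sorted(result.items(), key=lambda item: item[1], reverse=True)
    return [keyword for keyword, confidence in ranked if confidence > threshold]

def compose_sentence(keywords: List[str]) -> str:
    """Template stand-in for the RNN: join keywords with commas and 'and'."""
    if not keywords:
        return ""
    if len(keywords) == 1:
        return f"The image shows {keywords[0]}."
    return "The image shows " + ", ".join(keywords[:-1]) + " and " + keywords[-1] + "."

verified = {"so-and-so": 0.90, "meadow": 0.60, "horse": 0.95, "seashore": 0.83}
selected = select_keywords(verified)
sentence = compose_sentence(selected)
print(sentence)
```

A template cannot produce a fluent sentence such as the one in the example above; it only shows where the generated text would plug in.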

In step 206, the page including the first text information is output.

The first text information corresponding to each image may be displayed at the position where that image would originally have been displayed; the embodiments of the present disclosure do not specifically limit this. With the first text information present, the user can get a rough idea of the image content from the text description and choose whether to go on to download the image or to skip it, which is convenient for the user.
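Placing each description at the position of its image can be sketched as follows. The page model used here, a list of `(kind, value)` blocks, and the `describe` callable are assumptions made for illustration; the disclosure does not prescribe a page representation.

```python
def render_without_images(blocks, describe):
    """Replace every image block with a description block at the same position.

    blocks: list of (kind, value) pairs, where kind is "text" or "image".
    describe: callable mapping an image reference to its first text information.
    """
    rendered = []
    for kind, value in blocks:
        if kind == "image":
            rendered.append(("description", describe(value)))
        else:
            rendered.append((kind, value))
    return rendered


# Hypothetical page and description table.
descriptions = {"beach.jpg": "So-and-so is leading a horse by the sea"}
page = [("text", "Pictorial of the day"),
        ("image", "beach.jpg"),
        ("text", "Shot on location.")]
output = render_without_images(page, descriptions.get)
```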

In the method provided by the embodiments of the present disclosure, when an acquisition request for a page is received and the page is determined to include an image, content recognition is performed on the image, first text information describing the content of the image is generated according to the obtained recognition result, and the page including the first text information is then output. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This is more intelligent and saves network traffic and the time spent waiting for images to download. In addition, by setting shielding keywords, images that the user does not wish to see can be shielded, further improving the user experience.

Fig. 3 is a block diagram of an image page output device according to an exemplary embodiment. Referring to Fig. 3, the device includes a receiving module 301, an identification module 302, a generation module 303 and an output module 304.

The receiving module 301 is configured to receive an acquisition request for a page; the identification module 302 is configured to perform content recognition on an image when the page includes the image, obtaining a recognition result; the generation module 303 is configured to generate, according to the recognition result, first text information describing the content of the image; the output module 304 is configured to output the page, the page including the first text information.

Optionally, the generation module 303 is configured to obtain second text information associated with the image in the currently browsed page; verify the recognition result according to the second text information; and generate, according to the verification result and the recognition result, the first text information describing the content of the image.

Referring to Fig. 4, the device further includes:

An obtaining module 305, configured to obtain at least one preset shielding keyword;

A filtering module 306, configured to filter out the recognition result when the recognition result includes any shielding keyword; or to filter out the recognition result when the proportion of shielding keywords appearing in the recognition result exceeds a first preset threshold.
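The two filtering conditions can be sketched as follows. The sketch assumes that the proportion of shielding keywords means the fraction of recognized keywords that are shielding keywords, which the text does not define precisely. The function name and the sample keywords are hypothetical.

```python
def should_filter(keywords, shield_keywords, ratio_threshold=None):
    """Decide whether a recognition result should be filtered out.

    keywords: keywords in the recognition result.
    shield_keywords: set of preset shielding keywords.
    ratio_threshold: if None, any shielding keyword triggers filtering;
                     otherwise filter only when the fraction of shielding
                     keywords exceeds this first preset threshold (0.0-1.0).
    """
    if not keywords:
        return False
    hits = sum(1 for keyword in keywords if keyword in shield_keywords)
    if ratio_threshold is None:
        return hits > 0
    return hits / len(keywords) > ratio_threshold


any_hit = should_filter(["horse", "snake"], {"snake"})
below_ratio = should_filter(["horse", "snake", "beach", "sky"], {"snake"}, 0.5)
above_ratio = should_filter(["snake", "spider", "sky"], {"snake", "spider"}, 0.5)
```

The ratio variant is the more lenient of the two: an image whose recognition result mentions a shielding keyword only incidentally is still described to the user.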

Referring to Fig. 5, the device further includes:

A labeling module 307, configured to perform object annotation on multiple sample images that include preset target objects, obtaining first-class annotated images;

A training module 308, configured to perform model training according to the first-class annotated images, obtaining a first model;

The identification module 302 is configured to perform image recognition on the image using the first model and, when the image includes any preset target object, obtain a first keyword for describing the object in the image.

Optionally, the device further includes:

The labeling module 307, configured to perform scene annotation on multiple sample images that include preset scenes, obtaining second-class annotated images;

The training module 308, configured to perform model training according to the second-class annotated images, obtaining a second model;

The identification module 302 is configured to perform image recognition on the image using the second model and, when the image includes any preset scene, obtain a second keyword for describing the scene in the image.

Optionally, the device further includes:

The labeling module 307, configured to perform text annotation on multiple sample images, obtaining third-class annotated images;

The training module 308, configured to perform model training according to the third-class annotated images, obtaining a third model;

The identification module 302 is configured to perform image recognition on the image using the third model and, when the image includes text, obtain a third keyword for describing the text in the image.
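How the outputs of the three models might be combined into a single recognition result can be sketched as follows. The three stand-in model functions, their return values, and the rule of keeping the highest confidence when a keyword is reported more than once are all assumptions; the text does not specify how the keywords are merged.

```python
def recognize_content(image, models):
    """Run each model on the image and merge the keywords they report.

    models: callables that take an image and return a dict mapping
            keyword -> confidence in percent; an empty dict means the model
            found nothing it was trained to recognize.
    """
    result = {}
    for model in models:
        for keyword, confidence in model(image).items():
            # Assumed merge rule: keep the highest confidence seen.
            result[keyword] = max(confidence, result.get(keyword, 0))
    return result


# Stand-ins for the trained first, second and third models.
def object_model(image):
    return {"So-and-so": 75, "horse": 85}


def scene_model(image):
    return {"seashore": 78, "meadow": 40}


def text_model(image):
    return {}  # no text found in this image


recognition = recognize_content(b"", [object_model, scene_model, text_model])
```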

Optionally, each keyword in the recognition result corresponds to a recognition confidence, and the verification module 303 is configured to perform word segmentation on the second text information, obtaining multiple segments; for each keyword in the recognition result, determine whether the segments include the keyword; and, if the segments include the keyword, increase the recognition confidence of the keyword according to a preset rule.

Optionally, the generation module 303 is configured to obtain the specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold; form the specified keywords into a sentence using an RNN model, and use the sentence as the first text information.

In the device provided by the embodiments of the present disclosure, when an acquisition request for a page is received and the page is determined to include an image, content recognition is performed on the image, first text information describing the content of the image is generated according to the obtained recognition result, and the page including the first text information is then output. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This is more intelligent and saves network traffic and the time spent waiting for images to download. In addition, by setting shielding keywords, images that the user does not wish to see can be shielded, further improving the user experience.

Regarding the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.

Fig. 6 is a block diagram of a device 600 for outputting an image page according to an exemplary embodiment. For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.

Referring to Fig. 6, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an I/O (Input/Output) interface 612, a sensor component 614 and a communication component 616.

The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operated on the device 600, contact data, phone book data, messages, pictures, videos and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as SRAM (Static Random Access Memory), EEPROM (Electrically-Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, a magnetic disk or an optical disc.

The power supply component 606 provides power to the various components of the device 600. The power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 600.

The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include an LCD (Liquid Crystal Display) and a TP (Touch Panel). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a MIC (Microphone); when the device 600 is in an operation mode, such as a call mode, a recording mode or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a loudspeaker configured to output audio signals.

I/O interface 612 provides interface between processing component 602 and peripheral interface module, and above-mentioned peripheral interface module can To be keyboard, click wheel, button etc..These buttons may include, but are not limited to: home button, volume button, start button and lock Determine button.

The sensor component 614 includes one or more sensors configured to provide status assessments of various aspects of the device 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, for example the display and keypad of the device 600; the sensor component 614 can also detect a change in position of the device 600 or of a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include an optical sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-coupled Device) image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes an NFC (Near Field Communication) module to facilitate short-range communication. For example, the NFC module may be implemented based on RFID (Radio Frequency Identification) technology, IrDA (Infra-red Data Association) technology, UWB (Ultra Wideband) technology, BT (Bluetooth) technology and other technologies.

In an exemplary embodiment, the device 600 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors or other electronic components, configured to perform the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, where the instructions can be executed by the processor 620 of the device 600 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.

With the non-transitory computer-readable storage medium provided by the embodiments of the present disclosure, when an acquisition request for a page is received and the page is determined to include an image, content recognition is performed on the image, first text information describing the content of the image is generated according to the obtained recognition result, and the page including the first text information is then output. Because a text description of each image in the currently browsed page is provided without downloading the image, the user can gain a preliminary understanding of the image content and decide whether to open or skip the image. This is more intelligent and saves network traffic and the time spent waiting for images to download. In addition, by setting shielding keywords, images that the user does not wish to see can be shielded, further improving the user experience.

Those skilled in the art will readily conceive of other embodiments of the invention after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary technical means in the art that are not disclosed in the present disclosure. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.

It should be understood that the present invention is not limited to the precise structure already described above and shown in the accompanying drawings, and And various modifications and changes may be made without departing from the scope thereof.The scope of the present invention is limited only by the attached claims.

Claims

1. An image page output method, characterized in that the method comprises:

receiving an acquisition request for a page;

outputting the page, the page including the first text information;

wherein generating, according to the recognition result, the first text information describing the content of the image comprises: obtaining second text information associated with the image in the page; verifying the recognition result according to the second text information; and generating, according to a verification result and the recognition result, the first text information describing the content of the image;

wherein each keyword in the recognition result corresponds to a recognition confidence, and verifying the recognition result according to the second text information comprises:

performing word segmentation on the second text information to obtain multiple segments; for each keyword in the recognition result, determining whether the multiple segments include the keyword; and, if the multiple segments include the keyword, increasing the recognition confidence of the keyword according to a preset rule; wherein the recognition confidence characterizes the probability of being correctly recognized.

2. The method according to claim 1, characterized in that the method further comprises:

obtaining at least one preset shielding keyword;

if the proportion of the shielding keywords appearing in the recognition result exceeds a first preset threshold, filtering out the recognition result.

3. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:

performing image recognition on the image using the first model;

when the image includes any preset target object, obtaining a first keyword for describing the object in the image.

4. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:

performing image recognition on the image using the second model;

when the image includes any preset scene, obtaining a second keyword for describing the scene in the image.

5. The method according to claim 1, characterized in that, before the content recognition is performed on the image, the method further comprises:

performing image recognition on the image using the third model;

6. The method according to claim 1, characterized in that generating, according to the verification result and the recognition result, the first text information describing the content of the image comprises:

forming the specified keywords into a sentence using a multi-layer feedback network (RNN) model, and using the sentence as the first text information.

7. An image page output device, characterized in that the device comprises:

an identification module, configured to perform content recognition on an image when the page includes the image, obtaining a recognition result;

a generation module, configured to generate, according to the recognition result, first text information describing the content of the image;

an output module, configured to output the page, the page including the first text information;

the generation module being configured to obtain second text information associated with the image in the currently browsed page; verify the recognition result according to the second text information; and generate, according to a verification result and the recognition result, the first text information describing the content of the image;

wherein each keyword in the recognition result corresponds to a recognition confidence, and a verification module is configured to perform word segmentation on the second text information to obtain multiple segments; for each keyword in the recognition result, determine whether the multiple segments include the keyword; and, if the multiple segments include the keyword, increase the recognition confidence of the keyword according to a preset rule; wherein the recognition confidence characterizes the probability of being correctly recognized.

8. The device according to claim 7, characterized in that the device further comprises:

a filtering module, configured to filter out the recognition result when the recognition result includes any shielding keyword; or to filter out the recognition result when the proportion of the shielding keywords appearing in the recognition result exceeds a first preset threshold.

9. The device according to claim 7, characterized in that the device further comprises:

a labeling module, configured to perform object annotation on multiple sample images that include preset target objects, obtaining first-class annotated images;

the identification module being configured to perform image recognition on the image using the first model and, when the image includes any preset target object, obtain a first keyword for describing the object in the image.

10. The device according to claim 7, characterized in that the device further comprises:

the identification module being configured to perform image recognition on the image using the second model and, when the image includes any preset scene, obtain a second keyword for describing the scene in the image.

11. The device according to claim 7, characterized in that the device further comprises:

the identification module being configured to perform image recognition on the image using the third model and, when the image includes text, obtain a third keyword for describing the text in the image.

12. The device according to claim 7, characterized in that the generation module is configured to obtain the specified keywords in the recognition result whose recognition confidence is greater than a second preset threshold; and form the specified keywords into a sentence using a multi-layer feedback network (RNN) model, using the sentence as the first text information.

13. An image page output device, characterized by comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to: receive an acquisition request for a page; when the page includes an image, perform content recognition on the image to obtain a recognition result; generate, according to the recognition result, first text information describing the content of the image; and output the page, the page including the first text information;

14. A computer-readable storage medium having instructions stored thereon, characterized in that the steps of the method according to any one of claims 1-6 are implemented when the instructions are executed by a processor.