CN110489674B - Page processing method, device and equipment - Google Patents

Page processing method, device and equipment

Info

Publication number
CN110489674B
CN110489674B (application CN201910591159.8A)
Authority
CN
China
Prior art keywords
image
information
acquiring
determining
introduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910591159.8A
Other languages
Chinese (zh)
Other versions
CN110489674A (en)
Inventor
王群
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910591159.8A
Publication of CN110489674A
Application granted
Publication of CN110489674B

Classifications

    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06V30/40 Document-oriented image-based pattern recognition
    • G10L13/00 Speech synthesis; Text to speech systems

Abstract

The embodiment of the invention provides a page processing method, apparatus and device, wherein the method comprises the following steps: acquiring a first image to be processed in a current page; acquiring image information of the first image and extracting text information from the first image, wherein the image information comprises the object category of an object displayed by the first image; and determining introduction information of the first image according to the image information and the text information, and playing the introduction information in a voice mode. The reliability of page processing is thereby improved.

Description

Page processing method, device and equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a page processing method, device and equipment.
Background
With the development of internet technology, content on the internet has become increasingly rich, and more and more users use it. For example, growing numbers of both young and elderly users are starting to use the internet.
To help users obtain internet content conveniently, the content of a page can be played aloud while the user browses it. At present, the text information in a page is converted into voice information and played; when an image appears in the page, the image is skipped and playback continues with the voice information for the remaining text. However, when an image in the page contains important information, the user cannot obtain the complete page content, which makes page processing unreliable.
Disclosure of Invention
The embodiment of the invention provides a page processing method, a page processing device and page processing equipment, which are used for improving the reliability of page processing.
In a first aspect, an embodiment of the present invention provides a page processing method, including:
acquiring a first image to be processed in a current page;
acquiring image information of the first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
according to the image information and the text information, determining introduction information of the first image, and playing the introduction information in a voice mode.
In a possible implementation manner, the determining, according to the image information and the text information, introduction information of the first image includes:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation, the determining the introduction information according to the context information, the image information, and the text information includes:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In one possible embodiment, determining target text information in the text information according to the context information and the image information includes:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining each entry in the text information whose matching degree with the context information and the image information is greater than or equal to a preset threshold as an entry of the target text information.
In a possible implementation manner, the determining the introduction information according to the target text information, the image keyword and the image information includes:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation manner, the padding the target text information, the image keyword, and the image information to the at least one information padding bit to obtain the introduction information includes:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible embodiment, the acquiring the image information of the first image includes:
the image information of the first image is obtained through a recognition model, wherein the recognition model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and the corresponding sample image information in the sample image.
In a possible implementation, the acquiring a first image to be processed in a current page includes:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
In a possible implementation, the acquiring a first image to be processed in a current page includes:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image;
and acquiring the first image according to the voice playing operation.
In a possible implementation manner, after receiving a voice playing operation of the first image output by a user, the method further includes:
acquiring position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the acquiring of the image information of the first image includes:
acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or,
and acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
In a second aspect, an embodiment of the present invention provides a page processing apparatus, including a first obtaining module, a second obtaining module, an extracting module, a first determining module, and a playing module,
the first acquisition module is used for acquiring a first image to be processed in a current page;
the second acquisition module is used for acquiring the image information of the first image;
the extraction module is used for extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
the first determining module is used for determining introduction information of the first image according to the image information and the text information;
the playing module is used for playing the introduction information in a voice mode.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation manner, the first determining module is specifically configured to:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining each entry in the text information whose matching degree with the context information and the image information is greater than or equal to a preset threshold as an entry of the target text information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible implementation manner, the second obtaining module is specifically configured to:
the image information of the first image is obtained through a recognition model, wherein the recognition model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and the corresponding sample image information in the sample image.
In a possible implementation manner, the first obtaining module is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
In a possible implementation manner, the first obtaining module is specifically configured to:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image;
and acquiring the first image according to the voice playing operation.
In one possible embodiment, the apparatus further comprises a second determining module, wherein,
the second determining module is configured to, after the first determining module receives a voice playing operation output by a user on the first image, acquire position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the first obtaining module is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
In a third aspect, an embodiment of the present invention provides a page processing apparatus, including: a processor coupled with a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to enable the terminal device to perform the method of any of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium, which includes a program or instructions, and when the program or instructions are run on a computer, the method according to any one of the first aspect is performed.
According to the page processing method, the page processing device and the page processing equipment, the first image to be processed is obtained from the current page; acquiring image information of a first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image; according to the image information and the text information, the introduction information of the first image is determined, and the introduction information is played in a voice mode. In the process, the first image in the page can be processed, the text information can be extracted from the first image, so that the introduction information corresponding to the first image can be obtained, and the introduction information corresponding to the first image can be played in a voice mode.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a page processing method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a page processing method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for determining introduction information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a page according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another page provided by the embodiment of the present invention;
fig. 6 is a schematic structural diagram of a page processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another page processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic hardware structure diagram of a page processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic view of an application scenario of a page processing method according to an embodiment of the present invention. Referring to fig. 1, the electronic device may display a page, which may include text, pictures, and the like. In the process of displaying the page by the electronic device, the electronic device may play content in the page in a voice manner, for example, the electronic device may convert text information in the page into voice information and play voice information corresponding to the text information, and the electronic device may further process an image in the page to obtain voice information corresponding to the image and play voice information corresponding to the image.
In the application, if the page includes the image, the image can be processed, and the text information is extracted from the image to obtain the introduction information corresponding to the image, and the introduction information corresponding to the image is played in a voice mode.
The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may be combined with each other, and the description of the same or similar contents in different embodiments is not repeated.
Fig. 2 is a schematic flowchart of a page processing method according to an embodiment of the present invention. Referring to fig. 2, the method may include:
s201, acquiring a first image to be processed in a current page.
The execution main body of the embodiment of the invention can be electronic equipment, and can also be a page processing device arranged in the electronic equipment. Optionally, the electronic device may be a mobile phone, a computer, or the like. Alternatively, the page processing device may be implemented by software, or may be implemented by a combination of software and hardware.
Optionally, the current page is the page currently displayed by the electronic device. The current page includes at least the first image and may, of course, include other content, for example text and/or other images.
Optionally, the electronic device may have a plurality of page reading modes, and the page reading mode may include a first page reading mode and a second page reading mode. The first page reading mode is to automatically read all page contents in a page. The second page reading mode is to read part of page contents in a page under the trigger of a user.
For example, referring to fig. 1, when the page reading mode of the electronic device is the first page reading mode, after the electronic device displays the page shown in fig. 1, it starts to play the text content: "The Great Wall, an ancient Chinese military defense project, is a tall, firm and continuous long wall built to limit the movements of enemy cavalry. The Great Wall is not a purely isolated wall, but a defense system that takes the wall as its main body and combines it with a large number of cities, obstacles, pavilions and marks." After the electronic device plays this text content in a voice mode, it acquires the image in the page, acquires the introduction information of the image, and plays the introduction information of the image in a voice mode.
For example, referring to fig. 1, when the page reading mode of the electronic device is the second page reading mode, after the electronic device displays the page shown in fig. 1, if the user inputs a voice playing operation to an image in the page, the electronic device acquires the image and the introduction information of the image, and plays the introduction information of the image in a voice mode.
In the actual application process, a user can set a page reading mode of the electronic equipment according to actual needs. For example, the electronic device may include an icon of the first page reading mode and an icon of the second page reading mode, and the user may select the corresponding icon to set the page reading mode of the electronic device.
Optionally, the electronic device may obtain the first image to be processed in the current page through at least two possible implementation manners:
one possible implementation is: the reading mode of the electronic equipment is a first page reading mode.
And acquiring the voice playing progress of the current page, and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as a first image.
For example, referring to fig. 1, after the electronic device finishes reading the text information in the page, the next processing object corresponding to the voice playing progress of the electronic device is an image in the page, and the electronic device acquires the image in the page.
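The first page reading mode above can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the page element structure and the `speak_text`/`handle_image` handlers are assumptions standing in for a real TTS pipeline and the image-processing steps of S202–S204.

```python
# Minimal sketch of the first page-reading mode: walk the page content in
# playback order; text is spoken directly, and when the next processing
# object is an image, that image becomes the "first image" to process (S201).

def read_page(elements, speak_text, handle_image):
    """elements: ordered list of ("text", content) or ("image", url) pairs."""
    for kind, payload in elements:
        if kind == "text":
            speak_text(payload)        # convert text to speech and play it
        else:                          # next processing object is an image
            handle_image(payload)      # hand it off as the first image

spoken, images = [], []
read_page([("text", "The Great Wall is an ancient defense project."),
           ("image", "https://example.com/great-wall.jpg")],
          spoken.append, images.append)
```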
Another possible implementation: the reading mode of the electronic equipment is a second page reading mode.
Receiving voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image; and acquiring a first image according to the voice playing operation.
Optionally, the voice playing operation may be a long press operation, a double click operation, or the like.
For example, referring to fig. 1, after the electronic device displays the page shown in fig. 1, if the user needs the electronic device to play the content of the image in the page in the voice mode, the user may input a voice playing operation to the image in the page. After the electronic device obtains the voice playing operation input by the user, the electronic device may obtain the image in the page.
Optionally, when the electronic device acquires the first image, the electronic device may acquire an address of the first image, and acquire the first image according to the address of the first image. For example, the address of the first image may be a Uniform Resource Locator (URL) of the first image.
Optionally, when acquiring the first image, the electronic device may crop the first image out of the current page. For example, the electronic device may obtain a page image corresponding to the current page and crop the first image out of that page image, where the page image corresponding to the current page includes all content in the current page.
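The screenshot path can be sketched as below, with the page image held as a plain 2D grid of pixels for illustration; a real implementation would crop an actual screenshot bitmap by the image's bounding box.

```python
# Illustrative sketch: crop the first image out of a page image by its
# bounding box. page_pixels is a 2D list standing in for a screenshot.

def crop_first_image(page_pixels, left, top, right, bottom):
    """Return the sub-grid of page_pixels inside the bounding box."""
    return [row[left:right] for row in page_pixels[top:bottom]]

page = [[(x, y) for x in range(8)] for y in range(6)]  # stand-in 8x6 page image
first_image = crop_first_image(page, 2, 1, 6, 4)        # 4 wide, 3 tall
```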
S202, acquiring image information of the first image, and extracting text information from the first image.
Wherein the image information comprises an object class of an object displayed by the first image.
Alternatively, the electronic device may obtain the object class of the object displayed in the first image through the recognition model, for example, the first image may be input to the recognition model, so that the recognition model outputs the object class of the object displayed in the first image.
Optionally, the neural network may be trained through multiple groups of sample data to obtain the recognition model, where each group of sample data includes the sample image and the corresponding sample object class in the sample image.
For example, if the first image includes an airplane, the object category of the object displayed in the first image is the airplane, and the image information of the first image includes the airplane. Assuming that the first image includes a monkey, the object category of the object displayed by the first image is the monkey, and the image information of the first image includes the monkey.
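A toy stand-in for the recognition model is sketched below. It is fit on groups of sample data (here, hand-made feature vectors paired with object classes — an assumption; the patent trains a neural network on sample images) and predicts the class of a new input by nearest mean feature.

```python
# Toy recognition model: learns one centroid per object class from groups
# of (features, label) sample data, then classifies by nearest centroid.
from statistics import mean

class RecognitionModel:
    def fit(self, samples):
        by_label = {}
        for feats, label in samples:
            by_label.setdefault(label, []).append(feats)
        # one mean feature vector per object class
        self.centroids = {lbl: [mean(col) for col in zip(*rows)]
                          for lbl, rows in by_label.items()}
        return self

    def predict(self, feats):
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(feats, c))
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl]))

model = RecognitionModel().fit([([0.9, 0.1], "airplane"),
                                ([0.1, 0.9], "monkey")])
```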
Alternatively, the text information may be extracted from the first image by OCR (optical character recognition) technology. The text information extracted from the first image includes all of the text displayed in the first image; for example, it may include an image caption, a watermark, an advertisement and so on appearing in the first image.
Optionally, when the reading mode of the electronic device is the second page reading mode, an image reading mode may be further set, where the image reading mode includes an overall reading mode and a local reading mode.
When the image reading mode is the overall reading mode, after the user inputs a voice playing operation on an image, the electronic device plays the content of the whole image in a voice mode.
When the image reading mode is the local reading mode, after the user inputs a voice playing operation on an image, the electronic device may acquire the position of the voice playing operation within the first image and determine a local image in the first image according to that position. The electronic device may then play the content of the local image in a voice mode; correspondingly, the image information of the first image includes the local object category of the local object displayed by the local image. In this reading mode, to play the content of the local image more accurately, the acquisition of the image information of the first image may refer both to the object category of the object displayed by the first image and to the local object category of the local object displayed by the local image; correspondingly, the image information of the first image then includes both the object category and the local object category.
For example, assuming that the first image is a face image, and assuming that a user inputs a voice playing operation to eyes in the face image, the acquired image information of the first image includes a face and eyes.
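The local reading mode can be sketched as follows. The region names and bounding boxes are illustrative assumptions; a real system would obtain them from a detection model rather than a hard-coded dictionary.

```python
# Hypothetical sketch: map the position of the user's voice-play operation
# inside the first image to the local region whose bounding box contains it.

def locate_local_region(tap, regions):
    """tap: (x, y); regions: dict name -> (left, top, right, bottom)."""
    x, y = tap
    for name, (l, t, r, b) in regions.items():
        if l <= x < r and t <= y < b:
            return name
    return None  # no local region hit: fall back to the whole image

face_regions = {"eyes": (20, 30, 80, 50), "mouth": (35, 70, 65, 90)}
```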
And S203, determining introduction information of the first image according to the image information and the text information.
Optionally, context information of the first image may also be obtained, and the introduction information may be determined according to the context information, the image information, and the text information. The context information of the first image may be obtained in the current page, or the context information of the first image may be obtained in the previous page or the next page of the current page.
It should be noted that, in the embodiment shown in fig. 3, a process of determining the introduction information of the first image is described, and details are not described here again.
And S204, playing introduction information by voice.
Optionally, the introduction information of the text type may be obtained first, then the introduction information of the text type may be converted into the voice information, and the voice information may be played.
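Step S204 can be sketched as a two-stage pipeline, assuming a text-to-speech engine is available as a callable; `synthesize` and `play` below are stand-ins for a real TTS engine and audio output, not APIs from the patent.

```python
# Minimal sketch of S204: text-type introduction info is first converted
# into voice info, then the voice info is played.

def play_introduction(introduction_text, synthesize, play):
    voice_info = synthesize(introduction_text)   # text -> audio
    play(voice_info)                             # audio -> speaker
    return voice_info

log = []
audio = play_introduction("This image shows an airplane.",
                          synthesize=lambda t: f"<audio:{t}>",  # mock TTS
                          play=log.append)                      # mock speaker
```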
According to the page processing method provided by the embodiment of the invention, a first image to be processed is acquired in a current page; acquiring image information of a first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image; according to the image information and the text information, the introduction information of the first image is determined, and the introduction information is played in a voice mode. In the process, the first image in the page can be processed, the text information can be extracted from the first image, so that the introduction information corresponding to the first image can be obtained, and the introduction information corresponding to the first image can be played in a voice mode.
On the basis of any of the above embodiments, the following describes a process of determining the introduction information by the embodiment shown in fig. 3.
Fig. 3 is a flowchart illustrating a method for determining introduction information according to an embodiment of the present invention. Referring to fig. 3, the method may include:
s301, determining target text information in the text information according to the context information and the image information.
Optionally, the text information includes a plurality of entries. The terms in the text information may be text extracted at different locations of the first image. One entry includes at least one character, and text properties of each character in one entry are the same, for example, the text properties may include font, size, color, font special effect, and the like.
Alternatively, the target text information may be determined in the text information in the following feasible implementation manner: acquiring the matching degree of each entry in the text information with the context information and the image information, and determining an entry whose matching degree with the context information and the image information is greater than or equal to a preset threshold value as an entry of the target text information. In this way, text that is not related to the image content, such as watermarks and advertisements, can be filtered out of the text information.
Optionally, first semantic information corresponding to the context information and the image information may be obtained, second semantic information of the entry may be obtained, and the matching degree between the entry in the text information and the context information and the image information may be determined according to the matching degree between the first semantic information and the second semantic information.
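The threshold-based filtering of S301 can be sketched as follows. This is a toy illustration under stated assumptions: the matching degree here is simple word overlap, whereas the embodiment describes comparing semantic information; the function and variable names are hypothetical.

```python
def match_degree(entry, context_tokens):
    # Toy matching degree: fraction of the entry's words that also occur
    # in the context/image information. A real system would compare
    # semantic representations rather than raw word overlap.
    words = set(entry.lower().split())
    if not words:
        return 0.0
    return len(words & context_tokens) / len(words)


def select_target_entries(entries, context_text, threshold=0.5):
    """Keep entries whose matching degree with the context information is
    greater than or equal to the preset threshold (S301)."""
    context_tokens = set(context_text.lower().split())
    return [e for e in entries if match_degree(e, context_tokens) >= threshold]


entries = ["Boeing 747 airliner", "www.example-watermark.com"]
context = "the article discusses the Boeing 747 airliner and its engines"
target = select_target_entries(entries, context)
```

With this toy score, the watermark entry shares no words with the context, falls below the threshold, and is filtered out, matching the behaviour described above.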
S302, image keywords are determined in the context information.
Optionally, semantic analysis may be performed on the context information to determine image keywords in the context information.
Alternatively, the image keyword may be a word describing the first image. For example, assuming that the first image is an airplane, the image keywords may include: the model of the aircraft, the color of the aircraft, the manufacturer of the aircraft, etc. Assuming that the first image is a human face, the image keywords may include: gender of the person, name of the person, approximate age of the person, state of the face (smiling, crying, etc.).
S303, obtaining an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit.
The fixed information refers to preset text carried by the image introduction template that does not change between images. The information filling bits are used for filling in information related to the image.
Alternatively, the image introduction template may be as follows:
Next, an image in the page is described. The content of the image is (information filling bit 1 (image information)). The (information filling bit 2 (image information)) in the image is characterized by (information filling bit 3 (image keyword)). The characters (information filling bit 5 (target text information)) are written on the (information filling bit 4 (image information)).
In the above image introduction template, the fixed information includes: "Next, an image in the page is described. The content of the image is", "in the image is characterized by", and "The characters ... are written on the". The image introduction template comprises 5 information filling bits.
It should be noted that the above is only an exemplary image introduction template; other image introduction templates may also be used, which is not specifically limited in this embodiment of the present invention.
S304, filling the target text information, the image keywords and the image information to at least one information filling position to obtain introduction information.
Optionally, the introduction information includes target text information, image keywords, and image information.
Optionally, the target text information, the image keywords, and the image information may be filled in at least one information filling bit by the following feasible implementation manners to obtain the introduction information: acquiring an information type corresponding to each information filling bit, an information type of target text information and an information type of image keywords; respectively determining information filling positions corresponding to the target text information, the image keywords and the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords; and respectively filling the target text information, the image keywords and the image information to corresponding information filling positions to obtain introduction information.
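The filling procedure of S304 can be sketched as follows. This is a minimal sketch under assumptions: the template text and the named placeholders (`{image_info}`, `{keyword}`, `{target_text}`) are hypothetical, and matching filling bits to information by type is reduced here to matching by placeholder name.

```python
# Hypothetical image introduction template: fixed text plus typed
# information filling bits, written as named placeholders.
TEMPLATE = ("The following describes an image in the page. "
            "The image shows {image_info}. "
            "Its features include {keyword}. "
            "The characters \"{target_text}\" are written on it.")


def fill_template(template, image_info, keyword, target_text):
    """S304: fill each information filling bit with information of the
    matching type to obtain the introduction information."""
    return template.format(image_info=image_info,
                           keyword=keyword,
                           target_text=target_text)


intro = fill_template(TEMPLATE, "an airplane", "a red fuselage", "Flight CA123")
```

In the described embodiment the association is made by comparing the information type of each filling bit with the information type of the target text, keywords, and image information; named placeholders serve the same role in this sketch.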
In the embodiment shown in fig. 3, the introduction information of the first image is related to the target text information, the context information and the first image in the first image, and thus the first image can be accurately described by the introduction information.
The following describes a page processing procedure by specific examples with reference to fig. 4 to 5.
Fig. 4 is a schematic diagram of a page according to an embodiment of the present invention. Referring to fig. 4, an interface 401 and an interface 402 are included, wherein,
referring to the interface 401, the interface 401 is a page reading mode setting page, which includes two page reading modes (a first page reading mode and a second page reading mode) and a check box corresponding to each page reading mode, where only the check box of one page reading mode can be selected at a time. Assuming that the user selects the first page reading mode, the electronic device sets its page reading mode to the first page reading mode.
Referring to the interface 402, after the electronic device displays the page shown in the interface 402, the electronic device plays the text information in the page, acquires the introduction information of the image in the page after the text information is played, and plays the introduction information of the image in voice.
In the embodiment shown in fig. 4, when the page reading mode of the electronic device is the first page reading mode, the electronic device may play all contents (including introduction information of the image) in the page in a voice manner, so that the user may obtain the related contents of the image in the page in a voice manner, thereby improving the reliability of page processing.
Fig. 5 is another schematic page diagram provided in the embodiment of the present invention. Referring to fig. 5, an interface 501, an interface 502 and an interface 503 are included, wherein,
referring to the interface 501, the interface 501 is a page reading mode setting page, which includes two page reading modes (a first page reading mode and a second page reading mode) and a check box corresponding to each page reading mode, where only the check box of one page reading mode can be selected at a time. If the user selects the second page reading mode, the electronic device sets its page reading mode to the second page reading mode and then displays the interface 502.
Referring to the interface 502, the interface 502 is an image reading mode setting page, which includes two image reading modes (an overall reading mode and a local reading mode) and a check box corresponding to each image reading mode, where only the check box of one image reading mode can be selected at a time. Assuming that the user selects the local reading mode, the electronic device sets its image reading mode to the local reading mode.
Referring to the interface 503, after the electronic device displays the page shown in the interface 503, when the user needs the electronic device to play the content in the image, the user inputs a voice playing operation (for example, a long-press operation) on the image. The electronic device determines a local image (the kiosk) according to the position of the voice playing operation, and the introduction information determined by the electronic device includes the introduction information of both the image and the local image, so that the user can learn the information of the local image in detail.
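Determining the local image from the press position can be sketched as follows. This is an illustrative sketch, not the patented method: it assumes the detected local objects are available as labelled bounding boxes, and the function and data names are hypothetical.

```python
def determine_local_image(regions, press_x, press_y):
    """Determine the local image in the first image from the position of
    the user's voice playing operation (e.g. a long press).

    `regions` is a hypothetical list of (label, (x0, y0, x1, y1))
    bounding boxes of local objects detected in the first image."""
    for label, (x0, y0, x1, y1) in regions:
        if x0 <= press_x <= x1 and y0 <= press_y <= y1:
            return label
    return None  # press fell outside every detected local object


regions = [("kiosk", (10, 10, 60, 80)), ("tree", (70, 20, 120, 90))]
local = determine_local_image(regions, press_x=30, press_y=40)
```

A long press at (30, 40) falls inside the kiosk's bounding box, so the kiosk is selected as the local image, mirroring the interface 503 example.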
In the embodiment shown in fig. 5, when the page reading mode of the electronic device is the second page reading mode, the electronic device may, when triggered by the user, play the introduction information of an image in the page by voice, so that the user may obtain the relevant content of the image by voice, improving the reliability of page processing.
Fig. 6 is a schematic structural diagram of a page processing apparatus according to an embodiment of the present invention. Referring to fig. 6, the page processing apparatus 10 may include a first obtaining module 11, a second obtaining module 12, an extracting module 13, a first determining module 14 and a playing module 15, wherein,
the first obtaining module 11 is configured to obtain a first image to be processed in a current page;
the second obtaining module 12 is configured to obtain image information of the first image;
the extracting module 13 is configured to extract text information from the first image, where the image information includes an object category of an object displayed in the first image;
the first determining module 14 is configured to determine introduction information of the first image according to the image information and the text information;
the playing module 15 is configured to play the introduction information in voice.
The page processing apparatus provided in the embodiment of the present invention may execute the basic scheme shown in the foregoing method embodiment, and its implementation principle and beneficial effect are similar, which are not described herein again.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation, the first determining module 14 is specifically configured to:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible implementation manner, the second obtaining module 12 is specifically configured to:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
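The sample-data-driven recognition described above can be sketched as follows. This is a toy stand-in under stated assumptions: the "model" here memorizes feature vectors and answers by nearest neighbour, whereas a real identification model would be trained on the sample images themselves; all names are hypothetical.

```python
def train_recognition_model(samples):
    """Learn a toy identification model from groups of sample data; each
    group is (feature_vector, sample_image_information)."""
    return list(samples)  # memorize the training set (1-nearest-neighbour)


def recognize(model, features):
    """Return the sample image information of the closest training sample."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda s: dist(s[0], features))[1]


samples = [((0.9, 0.1), "airplane"), ((0.1, 0.9), "human face")]
model = train_recognition_model(samples)
label = recognize(model, (0.8, 0.2))
```

The essential structure matches the text: multiple groups of (sample image, sample image information) are learned, and the model then maps a new image to image information.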
In a possible implementation manner, the first obtaining module 11 is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
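The progress-based acquisition above can be sketched as follows. This is a minimal sketch assuming the page is modelled as an ordered list of (kind, payload) processing objects and the playing progress is an index into it; these names are illustrative, not from the patent.

```python
def first_image_from_progress(page_objects, progress_index):
    """If the next processing object after the current voice playing
    progress is an image, return it as the first image; otherwise None.

    `page_objects` is a hypothetical ordered list of (kind, payload)."""
    next_index = progress_index + 1
    if next_index >= len(page_objects):
        return None  # playback has reached the end of the page
    kind, payload = page_objects[next_index]
    return payload if kind == "image" else None


page = [("text", "paragraph 1"), ("image", "img_001.png"), ("text", "paragraph 2")]
first_image = first_image_from_progress(page, progress_index=0)
```

When playback has just finished the first paragraph, the next processing object is an image, so it becomes the first image to be processed.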
In a possible implementation manner, the first obtaining module 11 is specifically configured to:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image;
and acquiring the first image according to the voice playing operation.
Fig. 7 is a schematic structural diagram of another page processing apparatus according to an embodiment of the present invention. In addition to the embodiment shown in fig. 6, referring to fig. 7, the page processing apparatus 10 further includes a second determining module 16, wherein,
the second determining module 16 is configured to, after the first determining module 14 receives a voice playing operation input by a user on the first image, acquire position information of the voice playing operation in the first image, and determine a local image in the first image according to the position information;
the first obtaining module 11 is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
The page processing apparatus provided in the embodiment of the present invention may execute the basic scheme shown in the foregoing method embodiment, and its implementation principle and beneficial effect are similar, which are not described herein again.
Fig. 8 is a schematic diagram of a hardware structure of a page processing apparatus according to an embodiment of the present invention, and as shown in fig. 8, the page processing apparatus 20 includes: at least one processor 21 and a memory 22. The processor 21 and the memory 22 are connected by a bus 23.
In a specific implementation, the at least one processor 21 executes computer-executable instructions stored in the memory 22, so that the at least one processor 21 performs the above page processing method.
For a specific implementation process of the processor 21, reference may be made to the above method embodiments, which implement similar principles and technical effects, and this embodiment is not described herein again.
In the embodiment shown in fig. 8, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The present application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the page processing method as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
The division of the units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A page processing method is characterized by comprising the following steps:
acquiring a first image to be processed in a current page;
acquiring image information of the first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
according to the image information and the text information, determining introduction information of the first image, and playing the introduction information in a voice mode;
the determining the introduction information of the first image according to the image information and the text information includes:
acquiring context information of the first image;
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information, wherein the image keyword is a vocabulary describing the first image;
determining the introduction information according to the target text information, the image keywords and the image information;
the determining of the image keyword in the context information includes:
performing semantic analysis on the context information to determine image keywords in the context information;
the acquiring a first image to be processed in a current page includes:
receiving interactive operation input by a user on the first image, wherein the interactive operation is used for indicating voice to play the content in the first image;
acquiring the first image according to the position of the interactive operation; or,
the acquiring a first image to be processed in a current page includes:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
2. The method of claim 1, wherein determining target text information in the text information based on the context information and the image information comprises:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
3. The method of claim 1 or 2, wherein the determining the introduction information according to the target text information, the image keyword and the image information comprises:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
4. The method of claim 3, wherein the padding the target text information, the image keyword, and the image information to the at least one information padding bit to obtain the introduction information comprises:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
5. The method of claim 1, wherein the obtaining image information for the first image comprises:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
6. The method of claim 1, wherein after receiving the voice playing operation input by the user on the first image, the method further comprises:
acquiring position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the acquiring of the image information of the first image includes:
acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or,
and acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
7. A page processing device is characterized by comprising a first acquisition module, a second acquisition module, an extraction module, a first determination module and a playing module, wherein,
the first acquisition module is used for acquiring a first image to be processed in a current page;
the second acquisition module is used for acquiring the image information of the first image;
the extraction module is used for extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
the first determining module is used for determining introduction information of the first image according to the image information and the text information;
the playing module is used for playing the introduction information in a voice mode;
the first determining module is specifically configured to:
acquiring context information of the first image;
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information, wherein the image keyword is a vocabulary describing the first image;
determining the introduction information according to the target text information, the image keywords and the image information;
the first determination module is further to:
performing semantic analysis on the context information to determine image keywords in the context information;
the first obtaining module is specifically configured to:
receiving interactive operation input by a user on the first image, wherein the interactive operation is used for indicating voice to play the content in the first image;
acquiring the first image according to the position of the interactive operation; or,
the first obtaining module is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
9. The apparatus of claim 7 or 8, wherein the first determining module is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
10. The apparatus of claim 8, wherein the first determining module is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
11. The apparatus of claim 7, wherein the second obtaining module is specifically configured to:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
12. The apparatus of claim 7, further comprising a second determination module, wherein,
the second determining module is configured to, after the first determining module receives a voice playing operation input by a user on the first image, acquire position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the first obtaining module is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
13. A page processing apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the memory-stored computer-executable instructions causes the at least one processor to perform the page processing method of any of claims 1 to 6.
14. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the page processing method of any one of claims 1 to 6.
CN201910591159.8A 2019-07-02 2019-07-02 Page processing method, device and equipment Active CN110489674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910591159.8A CN110489674B (en) 2019-07-02 2019-07-02 Page processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910591159.8A CN110489674B (en) 2019-07-02 2019-07-02 Page processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN110489674A CN110489674A (en) 2019-11-22
CN110489674B (en) 2020-11-06

Family

ID=68546650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910591159.8A Active CN110489674B (en) 2019-07-02 2019-07-02 Page processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN110489674B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062207B (en) * 2019-12-03 2023-01-24 腾讯科技(深圳)有限公司 Expression image processing method and device, computer storage medium and electronic equipment
CN113132781B (en) * 2019-12-31 2023-04-18 阿里巴巴集团控股有限公司 Video generation method and apparatus, electronic device, and computer-readable storage medium
CN113112984A (en) * 2020-01-13 2021-07-13 百度在线网络技术(北京)有限公司 Control method, device and equipment of intelligent sound box and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615440A (en) * 2015-02-13 2015-05-13 联想(北京)有限公司 Information processing method and electronic device
CN105512220A (en) * 2015-11-30 2016-04-20 小米科技有限责任公司 Image page output method and device
CN108182432A (en) * 2017-12-28 2018-06-19 北京百度网讯科技有限公司 Information processing method and device
CN108538300A (en) * 2018-02-27 2018-09-14 科大讯飞股份有限公司 Sound control method and device, storage medium, electronic equipment
CN109348254A (en) * 2018-09-30 2019-02-15 武汉斗鱼网络科技有限公司 Information push method, device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7437368B1 (en) * 2005-07-05 2008-10-14 Chitika, Inc. Method and system for interactive product merchandizing
JP2008191879A (en) * 2007-02-02 2008-08-21 Sharp Corp Information display device, display method for information display device, information display program, and recording medium with information display program recorded
US8346623B2 (en) * 2010-08-06 2013-01-01 Cbs Interactive Inc. System and method for navigating a collection of editorial content
US8579187B2 (en) * 2012-02-28 2013-11-12 Ebay Inc. System and method to identify machine-readable codes
CN105824819A (en) * 2015-01-05 2016-08-03 阿里巴巴集团控股有限公司 Image loading method, device and electronic device
CN106169065B (en) * 2016-06-30 2019-12-24 联想(北京)有限公司 Information processing method and electronic equipment
CN108495185B (en) * 2018-03-14 2021-04-16 北京奇艺世纪科技有限公司 Video title generation method and device
CN109167939B (en) * 2018-08-08 2021-07-16 成都西纬科技有限公司 Automatic text collocation method and device and computer storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615440A (en) * 2015-02-13 2015-05-13 联想(北京)有限公司 Information processing method and electronic device
CN105512220A (en) * 2015-11-30 2016-04-20 小米科技有限责任公司 Image page output method and device
CN108182432A (en) * 2017-12-28 2018-06-19 北京百度网讯科技有限公司 Information processing method and device
CN108538300A (en) * 2018-02-27 2018-09-14 科大讯飞股份有限公司 Sound control method and device, storage medium, electronic equipment
CN109348254A (en) * 2018-09-30 2019-02-15 武汉斗鱼网络科技有限公司 Information push method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110489674A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN110020437B (en) Emotion analysis and visualization method combining video and barrage
CN110489674B (en) Page processing method, device and equipment
CN107958030B (en) Video cover recommendation model optimization method and device
US9613268B2 (en) Processing of images during assessment of suitability of books for conversion to audio format
US20170004374A1 (en) Methods and systems for detecting and recognizing text from images
CN109558513B (en) Content recommendation method, device, terminal and storage medium
CN111241340B (en) Video tag determining method, device, terminal and storage medium
CN110334292B (en) Page processing method, device and equipment
CN107608618B (en) Interaction method and device for wearable equipment and wearable equipment
CN114401431B (en) Virtual person explanation video generation method and related device
CN106571072A (en) Method for realizing children education card based on AR
CN111178056A (en) Deep learning based file generation method and device and electronic equipment
CN111737961B (en) Method and device for generating story, computer equipment and medium
CN111090817A (en) Method for displaying book extension information, electronic equipment and computer storage medium
CN114390220A (en) Animation video generation method and related device
CN108847066A (en) A kind of content of courses reminding method, device, server and storage medium
CN110187816B (en) Automatic page turning method for cartoon type electronic book, computing device and storage medium
CN111542817A (en) Information processing device, video search method, generation method, and program
CN115393865A (en) Character retrieval method, character retrieval equipment and computer-readable storage medium
CN112114770A (en) Interface guiding method, device and equipment based on voice interaction
CN111582281B (en) Picture display optimization method and device, electronic equipment and storage medium
CN110428668B (en) Data extraction method and device, computer system and readable storage medium
CN110970030A (en) Voice recognition conversion method and system
CN114691853A (en) Sentence recommendation method, device and equipment and computer readable storage medium
CN113673414B (en) Bullet screen generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant