CN110489674B - Page processing method, device and equipment - Google Patents
- Publication number: CN110489674B (application CN201910591159.8A)
- Authority: China (CN)
- Legal status: Active (assumed; not a legal conclusion)
Classifications
- G06F16/9538 — Querying, e.g. by the use of web search engines; presentation of query results
- G06F16/957 — Browsing optimisation, e.g. caching or content distillation
- G06F16/958 — Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06V30/40 — Document-oriented image-based pattern recognition
- G10L13/00 — Speech synthesis; Text to speech systems
Abstract
An embodiment of the invention provides a page processing method, apparatus, and device. The method comprises: acquiring a first image to be processed in a current page; acquiring image information of the first image and extracting text information from the first image, wherein the image information comprises the object category of the object displayed by the first image; and determining introduction information of the first image according to the image information and the text information, and playing the introduction information by voice. The reliability of page processing is thereby improved.
Description
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a page processing method, apparatus, and device.
Background
With the development of internet technology, online content has become increasingly rich and ever more users go online. For example, more and more young and elderly users have started using the internet.
To help users obtain this content conveniently, the content of a page can be played by voice while the user browses it. Currently, the text information in a page is converted into voice information and played; when an image appears in the page, the image is skipped and playback continues with the voice information for the remaining text. However, when an image in the page contains important information, the user cannot obtain the complete page content, which lowers the reliability of page processing.
Disclosure of Invention
Embodiments of the invention provide a page processing method, apparatus, and device that improve the reliability of page processing.
In a first aspect, an embodiment of the present invention provides a page processing method, including:
acquiring a first image to be processed in a current page;
acquiring image information of the first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
according to the image information and the text information, determining introduction information of the first image, and playing the introduction information in a voice mode.
In a possible implementation manner, the determining, according to the image information and the text information, introduction information of the first image includes:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation, the determining the introduction information according to the context information, the image information, and the text information includes:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In one possible embodiment, determining target text information in the text information according to the context information and the image information includes:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining each entry whose matching degree with the context information and the image information is greater than or equal to a preset threshold as an entry of the target text information.
In a possible implementation manner, the determining the introduction information according to the target text information, the image keyword and the image information includes:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation manner, the filling of the target text information, the image keyword, and the image information into the at least one information filling bit to obtain the introduction information includes:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible embodiment, the acquiring the image information of the first image includes:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
In a possible implementation, the acquiring a first image to be processed in a current page includes:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
In a possible implementation, the acquiring a first image to be processed in a current page includes:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating that the content in the first image is to be played by voice;
and acquiring the first image according to the voice playing operation.
In a possible implementation manner, after receiving the voice playing operation input by the user on the first image, the method further includes:
acquiring position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the acquiring of the image information of the first image includes:
acquiring the image information according to the first image and the local image, wherein the image information comprises the object category and the local object category of the local object displayed by the local image; or
acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
In a second aspect, an embodiment of the present invention provides a page processing apparatus, including a first obtaining module, a second obtaining module, an extracting module, a first determining module, and a playing module,
the first acquisition module is used for acquiring a first image to be processed in a current page;
the second acquisition module is used for acquiring the image information of the first image;
the extraction module is used for extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
the first determining module is used for determining introduction information of the first image according to the image information and the text information;
the playing module is used for playing the introduction information in a voice mode.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation manner, the first determining module is specifically configured to:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining each entry whose matching degree with the context information and the image information is greater than or equal to a preset threshold as an entry of the target text information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation manner, the first determining module is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible implementation manner, the second obtaining module is specifically configured to:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
In a possible implementation manner, the first obtaining module is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
In a possible implementation manner, the first obtaining module is specifically configured to:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating that the content in the first image is to be played by voice;
and acquiring the first image according to the voice playing operation.
In one possible embodiment, the apparatus further comprises a second determining module, wherein,
the second determining module is configured to, after the voice playing operation input by the user on the first image is received, acquire position information of the voice playing operation in the first image, and determine a local image in the first image according to the position information;
the first obtaining module is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
In a third aspect, an embodiment of the present invention provides a page processing apparatus, including: a processor coupled with a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to enable the terminal device to perform the method of any of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium storing a program or instructions that, when run on a computer, cause the method of any one of the first aspect to be performed.
According to the page processing method, apparatus, and device provided by the embodiments of the invention, a first image to be processed is acquired from the current page; image information of the first image is acquired and text information is extracted from the first image, wherein the image information comprises the object category of the object displayed by the first image; introduction information of the first image is determined according to the image information and the text information, and the introduction information is played by voice. In this process, the first image in the page is processed and text information is extracted from it, so that introduction information corresponding to the first image can be obtained and played by voice, allowing the user to obtain the complete page content and improving the reliability of page processing.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a page processing method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a page processing method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for determining introductory information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a page according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another page provided by the embodiment of the present invention;
fig. 6 is a schematic structural diagram of a page processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another page processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic hardware structure diagram of a page processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic view of an application scenario of a page processing method according to an embodiment of the present invention. Referring to fig. 1, the electronic device may display a page, which may include text, pictures, and the like. While displaying the page, the electronic device may play its content by voice: for example, it may convert the text information in the page into voice information and play it, and it may further process an image in the page to obtain the voice information corresponding to the image and play that as well.
In the application, if the page includes the image, the image can be processed, and the text information is extracted from the image to obtain the introduction information corresponding to the image, and the introduction information corresponding to the image is played in a voice mode.
The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may be combined with each other, and the description of the same or similar contents in different embodiments is not repeated.
Fig. 2 is a schematic flowchart of a page processing method according to an embodiment of the present invention. Referring to fig. 2, the method may include:
s201, acquiring a first image to be processed in a current page.
The execution main body of the embodiment of the invention can be electronic equipment, and can also be a page processing device arranged in the electronic equipment. Optionally, the electronic device may be a mobile phone, a computer, or the like. Alternatively, the page processing device may be implemented by software, or may be implemented by a combination of software and hardware.
Optionally, the current page is the page currently displayed by the electronic device. The current page includes at least the first image and may also include other content, for example, text and/or other images.
Optionally, the electronic device may have a plurality of page reading modes, and the page reading mode may include a first page reading mode and a second page reading mode. The first page reading mode is to automatically read all page contents in a page. The second page reading mode is to read part of page contents in a page under the trigger of a user.
For example, referring to fig. 1, when the page reading mode of the electronic device is the first page reading mode, after the electronic device displays the page shown in fig. 1, it starts to play the text content: "The Great Wall, an ancient military defense project in China, is a tall, firm, and continuous long wall built to limit the actions of enemy cavalry. The Great Wall is not a purely isolated city wall, but a defense system that takes the city wall as its main body and combines it with a large number of cities, obstacles, pavilions, and markers." After playing this text content by voice, the electronic device acquires the image in the page, obtains the introduction information of the image, and plays the introduction information by voice.
For example, referring to fig. 1, when the page reading mode of the electronic device is the second page reading mode, after the electronic device displays the page shown in fig. 1, if the user inputs a voice playing operation to an image in the page, the electronic device acquires the image and the introduction information of the image, and plays the introduction information of the image in a voice mode.
In the actual application process, a user can set a page reading mode of the electronic equipment according to actual needs. For example, the electronic device may include an icon of the first page reading mode and an icon of the second page reading mode, and the user may select the corresponding icon to set the page reading mode of the electronic device.
Optionally, the electronic device may obtain the first image to be processed in the current page through at least two possible implementation manners:
one possible implementation is: the reading mode of the electronic equipment is a first page reading mode.
And acquiring the voice playing progress of the current page, and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as a first image.
For example, referring to fig. 1, after the electronic device finishes reading the text information in the page, the next processing object corresponding to the voice playing progress of the electronic device is an image in the page, and the electronic device acquires the image in the page.
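As an illustrative sketch (not part of the patent text), this progress check can be modeled by treating the page as an ordered list of typed objects; all names below are assumptions for illustration only:

```python
def next_image_to_process(page_objects, progress_index):
    """Return the payload of the next object if it is an image, else None.

    page_objects: ordered list of (object_type, payload) tuples, e.g.
                  ("text", "...") or ("image", "url-or-bytes").
    progress_index: index of the object the voice playback just finished.
    """
    next_index = progress_index + 1
    if next_index >= len(page_objects):
        return None  # playback has reached the end of the page
    object_type, payload = page_objects[next_index]
    return payload if object_type == "image" else None

page = [("text", "Great Wall introduction"), ("image", "greatwall.jpg")]
assert next_image_to_process(page, 0) == "greatwall.jpg"  # next object is an image
assert next_image_to_process(page, 1) is None             # end of page
```

A real browser would derive this object list from the page's DOM order; the tuple representation here is only a stand-in.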
Another possible implementation: the reading mode of the electronic equipment is a second page reading mode.
Receiving voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image; and acquiring a first image according to the voice playing operation.
Optionally, the voice playing operation may be a long press operation, a double click operation, or the like.
For example, referring to fig. 1, after the electronic device displays the page shown in fig. 1, if the user needs the electronic device to play the content of the image in the page in the voice mode, the user may input a voice playing operation to the image in the page. After the electronic device obtains the voice playing operation input by the user, the electronic device may obtain the image in the page.
Optionally, when the electronic device acquires the first image, the electronic device may acquire an address of the first image, and acquire the first image according to the address of the first image. For example, the address of the first image may be a Uniform Resource Locator (URL) of the first image.
Optionally, when acquiring the first image, the electronic device may capture it from the current page. For example, the electronic device may obtain a page image corresponding to the current page (a rendering containing all content in the current page) and crop the first image out of that page image.
S202, acquiring image information of the first image, and extracting text information from the first image.
Wherein the image information comprises an object class of an object displayed by the first image.
Alternatively, the electronic device may obtain the object class of the object displayed in the first image through the recognition model, for example, the first image may be input to the recognition model, so that the recognition model outputs the object class of the object displayed in the first image.
Optionally, the neural network may be trained through multiple groups of sample data to obtain the recognition model, where each group of sample data includes the sample image and the corresponding sample object class in the sample image.
For example, if the first image shows an airplane, the object category of the displayed object is "airplane" and the image information of the first image includes "airplane". Likewise, if the first image shows a monkey, the object category is "monkey" and the image information includes "monkey".
Alternatively, the text information may be extracted from the first image by optical character recognition (OCR). The text information extracted from the first image includes all text displayed in it, for example the image introduction, watermarks, advertisements, and so on.
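A minimal sketch of step S202, with the recognition model and the OCR engine replaced by stub functions (a real system would plug in a trained classifier and an OCR library; every name and return value here is illustrative, not specified by the patent):

```python
def classify_objects(image):
    # Stub for the recognition model: returns the object categories of the
    # objects displayed in the image. A real implementation would run a
    # classifier trained on (sample image, sample object category) pairs.
    return ["airplane"]

def extract_text(image):
    # Stub for an OCR engine: returns every text entry found in the image,
    # including watermarks and advertisements, which are filtered later
    # when the target text information is determined.
    return ["Boeing 747 at takeoff", "photo-site watermark"]

def get_image_info_and_text(image):
    """Step S202: image information (object categories) plus raw OCR text."""
    image_info = {"object_classes": classify_objects(image)}
    text_info = extract_text(image)
    return image_info, text_info

info, text = get_image_info_and_text("plane.jpg")
assert info["object_classes"] == ["airplane"]
assert "photo-site watermark" in text
```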
Optionally, when the reading mode of the electronic device is the second page reading mode, an image reading mode may be further set, where the image reading mode includes an overall reading mode and a local reading mode.
When the image reading mode is the overall reading mode, after the user inputs a voice playing operation on an image, the electronic device plays the content of the whole image by voice.
When the image reading mode is the local reading mode, after the user inputs a voice playing operation on an image, the electronic device may acquire the position information of the voice playing operation in the first image and determine a local image in the first image according to the position information; the electronic device may then play the content of the local image by voice. Correspondingly, the image information of the first image includes the local object category of the local object displayed by the local image. In this reading mode, to play the content of the local image more accurately, the object category of the object displayed by the first image may also be taken into account when acquiring the image information; in that case, the image information of the first image includes both the object category and the local object category.
For example, assuming that the first image is a face image, and assuming that a user inputs a voice playing operation to eyes in the face image, the acquired image information of the first image includes a face and eyes.
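The local reading mode needs a region around the tap position; one way to sketch that geometry is a fixed-size square clamped to the image bounds (the region size and function name are assumptions, not specified by the patent):

```python
def local_crop_box(image_size, tap_xy, region=100):
    """Return a (left, top, right, bottom) square of side `region`,
    centred on the tap position and clamped to the image bounds."""
    width, height = image_size
    x, y = tap_xy
    half = region // 2
    left = max(0, min(x - half, width - region))
    top = max(0, min(y - half, height - region))
    return (left, top, left + region, top + region)

# Tap near the top-left corner: the box is clamped to the image edge.
assert local_crop_box((640, 480), (10, 10)) == (0, 0, 100, 100)
# Tap near the bottom-right corner: clamped the other way.
assert local_crop_box((640, 480), (630, 470)) == (540, 380, 640, 480)
```

The resulting box could then be used to crop the local image before passing it (alone or together with the full image) to the recognition model.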
And S203, determining introduction information of the first image according to the image information and the text information.
Optionally, context information of the first image may also be obtained, and the introduction information may be determined according to the context information, the image information, and the text information. The context information of the first image may be obtained in the current page, or the context information of the first image may be obtained in the previous page or the next page of the current page.
It should be noted that, in the embodiment shown in fig. 3, a process of determining the introduction information of the first image is described, and details are not described here again.
And S204, playing introduction information by voice.
Optionally, the introduction information may first be obtained as text, then converted into voice information, and the voice information played.
According to the page processing method provided by the embodiment of the invention, a first image to be processed is acquired from the current page; image information of the first image is acquired and text information is extracted from the first image, wherein the image information comprises the object category of the object displayed by the first image; introduction information of the first image is determined according to the image information and the text information, and the introduction information is played by voice. In this process, the first image in the page is processed and text information is extracted from it, so that introduction information corresponding to the first image can be obtained and played by voice, allowing the user to obtain the complete page content and improving the reliability of page processing.
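Putting S201–S204 together, the flow can be sketched as a small pipeline with pluggable components (the lambdas below stand in for the recognition model, the OCR engine, and a text-to-speech backend; none of these names come from the patent):

```python
def process_page_image(image, classify, ocr, build_intro, speak):
    """S201 has already selected `image`; classify it and extract its text
    (S202), build the introduction information (S203), and play it (S204)."""
    image_info = classify(image)
    text_info = ocr(image)
    introduction = build_intro(image_info, text_info)
    speak(introduction)
    return introduction

spoken = []
intro = process_page_image(
    "greatwall.jpg",
    classify=lambda img: "the Great Wall",
    ocr=lambda img: "an ancient military defense project",
    build_intro=lambda info, text: f"The image shows {info}; it reads: {text}.",
    speak=spoken.append,  # stand-in for a TTS engine
)
assert spoken == [intro]
```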
On the basis of any of the above embodiments, the following describes a process of determining the introduction information by the embodiment shown in fig. 3.
Fig. 3 is a flowchart illustrating a method for determining introduction information according to an embodiment of the present invention. Referring to fig. 3, the method may include:
s301, determining target text information in the text information according to the context information and the image information.
Optionally, the text information includes a plurality of entries. The terms in the text information may be text extracted at different locations of the first image. One entry includes at least one character, and text properties of each character in one entry are the same, for example, the text properties may include font, size, color, font special effect, and the like.
Optionally, the target text information may be determined from the text information as follows: obtain the matching degree between each entry in the text information and the context information and image information, and take each entry whose matching degree is greater than or equal to a preset threshold as an entry of the target text information. In this way, text unrelated to the image content, such as watermarks and advertisements, can be filtered out of the text information.
Optionally, first semantic information corresponding to the context information and the image information may be obtained, second semantic information of each entry may be obtained, and the matching degree between the entry and the context information and image information may be determined according to the matching degree between the first semantic information and the second semantic information.
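The patent leaves the matching-degree computation open; as one hedged illustration, a simple token-overlap score can play the role of the semantic match (a production system would use real semantic embeddings instead):

```python
def matching_degree(entry, context_info, image_info):
    """Toy matching degree: fraction of the entry's tokens that also
    appear in the context information or image information."""
    entry_tokens = set(entry.lower().split())
    reference = set((context_info + " " + image_info).lower().split())
    if not entry_tokens:
        return 0.0
    return len(entry_tokens & reference) / len(entry_tokens)

def select_target_entries(entries, context_info, image_info, threshold=0.5):
    """S301: keep entries whose matching degree reaches the preset
    threshold, filtering out watermarks, ads, and other unrelated text."""
    return [entry for entry in entries
            if matching_degree(entry, context_info, image_info) >= threshold]

entries = ["great wall in autumn", "photo-site watermark"]
kept = select_target_entries(entries, "the great wall of china", "wall")
assert kept == ["great wall in autumn"]  # the watermark entry is filtered out
```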
S302, image keywords are determined in the context information.
Optionally, semantic analysis may be performed on the context information to determine image keywords in the context information.
Alternatively, the image keywords may be words describing the first image. For example, assuming that the first image shows an airplane, the image keywords may include: the model of the airplane, the color of the airplane, the manufacturer of the airplane, and so on. Assuming that the first image shows a human face, the image keywords may include: the gender of the person, the name of the person, the approximate age of the person, and the state of the face (smiling, crying, etc.).
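The keyword determination could be sketched as below. The attribute lexicon and the token-matching approach are illustrative assumptions only; an actual implementation would perform proper semantic analysis (e.g. named-entity recognition or dependency parsing) on the context:

```python
# Illustrative lexicon of words that can describe an image; a real
# system would derive candidate descriptors from semantic analysis.
IMAGE_ATTRIBUTE_TERMS = {"airplane", "boeing", "747", "white", "smiling"}

def extract_image_keywords(context):
    # Toy "semantic analysis": keep context words found in the lexicon.
    words = [w.strip(".,!?") for w in context.lower().split()]
    return [w for w in words if w in IMAGE_ATTRIBUTE_TERMS]
```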
S303, obtaining an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit.
The fixed information refers to the fixed information carried by the image introduction template. The information filling bits are used for filling information related to the image.
Alternatively, the image introduction template may be as follows:
Next, an image in the page is described. The content in the image is (information filling bit 1 (image information)). (Information filling bit 2 (image information)) is characterized by (information filling bit 3 (image keyword)). (Information filling bit 4 (image information)) bears the characters (information filling bit 5 (target text information)).
In the above image introduction template, the fixed information includes: "Next, an image in the page is described. The content in the image is", "is characterized by", and "bears the characters". The image introduction template comprises 5 information filling bits.
It should be noted that, the above description illustrates only one image introduction template in an exemplary form, and the image introduction template may also be other, which is not specifically limited in this embodiment of the present invention.
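A template of this shape can be represented, purely for illustration, as a format string whose named placeholders play the role of the information filling bits; the slot names below are hypothetical:

```python
# Fixed information interleaved with named information filling bits.
TEMPLATE = (
    "Next, an image in the page is described. The content in the image is "
    "{image_content}. {image_subject} is characterized by {image_keyword}. "
    "{text_region} bears the characters {target_text}."
)

def render_introduction(values):
    # Fill every information filling bit; the fixed information between
    # the placeholders is emitted unchanged.
    return TEMPLATE.format(**values)
```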
S304, filling the target text information, the image keywords and the image information to at least one information filling position to obtain introduction information.
Optionally, the introduction information includes target text information, image keywords, and image information.
Optionally, the target text information, the image keywords, and the image information may be filled into the at least one information filling bit in the following feasible manner to obtain the introduction information: acquiring the information type corresponding to each information filling bit, the information type of the target text information, and the information type of the image keywords; determining the information filling bits corresponding to the target text information, the image keywords, and the image information respectively according to the information type corresponding to each information filling bit, the information type of the target text information, and the information type of the image keywords; and filling the target text information, the image keywords, and the image information into their corresponding information filling bits to obtain the introduction information.
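The type-driven filling described above can be sketched as follows; the slot/type representation is an assumption made for illustration:

```python
def fill_by_type(slots, pieces):
    # slots: list of (slot_name, required_type) pairs in template order.
    # pieces: mapping from information type to the extracted values.
    filled = {}
    for name, required_type in slots:
        values = pieces.get(required_type, [])
        # Consume the next unused value of the matching type, if any.
        filled[name] = values.pop(0) if values else ""
    return filled
```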
In the embodiment shown in fig. 3, the introduction information of the first image is related to the target text information, the context information and the first image in the first image, and thus the first image can be accurately described by the introduction information.
The following describes the page processing procedure through specific examples with reference to figs. 4 and 5.
Fig. 4 is a schematic diagram of a page according to an embodiment of the present invention. Referring to fig. 4, an interface 401 and an interface 402 are included, wherein,
Referring to the interface 401, the interface 401 is a setting page for the page reading mode, which includes two page reading modes (a first page reading mode and a second page reading mode) and a check box corresponding to each page reading mode, where only one check box can be selected at a time. Assuming that the user selects the first page reading mode, the electronic device sets its page reading mode to the first page reading mode.
Referring to the interface 402, after the electronic device displays the page shown in the interface 402, the electronic device plays the text information in the page by voice, acquires the introduction information of the image in the page after the text information has been played, and then plays the introduction information of the image by voice.
In the embodiment shown in fig. 4, when the page reading mode of the electronic device is the first page reading mode, the electronic device may play all contents (including introduction information of the image) in the page in a voice manner, so that the user may obtain the related contents of the image in the page in a voice manner, thereby improving the reliability of page processing.
Fig. 5 is another schematic page diagram provided in the embodiment of the present invention. Referring to fig. 5, an interface 501, an interface 502 and an interface 503 are included, wherein,
Referring to the interface 501, the interface 501 is a setting page for the page reading mode, which includes two page reading modes (a first page reading mode and a second page reading mode) and a check box corresponding to each page reading mode, where only one check box can be selected at a time. If the user selects the second page reading mode, the electronic device sets its page reading mode to the second page reading mode. After the user selects the second page reading mode, the electronic device displays the interface 502.
Referring to the interface 502, the interface 502 is a setting page for the image reading mode, which includes two image reading modes (an overall reading mode and a local reading mode) and a check box corresponding to each image reading mode, where only one check box can be selected at a time. Assuming that the user selects the local reading mode, the electronic device sets its image reading mode to the local reading mode.
Referring to the interface 503, after the electronic device displays the page shown in the interface 503, when the user needs the electronic device to play the content in the image, the user inputs a voice playing operation (for example, a long-press operation) on the image. The electronic device determines a local image (the kiosk) according to the position of the voice playing operation input by the user, and the introduction information determined by the electronic device includes the introduction information of both the image and the local image, so that the user can learn about the local image in detail.
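Determining the local image from the press position can be illustrated as a simple hit test over regions detected in the image; the region representation (a label plus a bounding box) is an assumption for illustration:

```python
def locate_local_image(regions, press_xy):
    # regions: list of (label, (x0, y0, x1, y1)) bounding boxes detected
    # in the image. Returns the label under the press position, if any.
    x, y = press_xy
    for label, (x0, y0, x1, y1) in regions:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return label
    return None
```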
In the embodiment shown in fig. 5, when the page reading mode of the electronic device is the second page reading mode, under the trigger of the user, the electronic device may play the introduction information of the image in the page by voice, so that the user may obtain the relevant content of the image in the page by voice, and the reliability of page processing is improved.
Fig. 6 is a schematic structural diagram of a page processing apparatus according to an embodiment of the present invention. Referring to fig. 6, the page processing apparatus 10 may include a first obtaining module 11, a second obtaining module 12, an extracting module 13, a first determining module 14 and a playing module 15, wherein,
the first obtaining module 11 is configured to obtain a first image to be processed in a current page;
the second obtaining module 12 is configured to obtain image information of the first image;
the extracting module 13 is configured to extract text information from the first image, where the image information includes an object category of an object displayed in the first image;
the first determining module 14 is configured to determine introduction information of the first image according to the image information and the text information;
the playing module 15 is configured to play the introduction information in voice.
The page processing apparatus provided in the embodiment of the present invention may execute the basic scheme shown in the foregoing method embodiment, and its implementation principle and beneficial effect are similar, which are not described herein again.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring context information of the first image;
and determining the introduction information according to the context information, the image information and the text information.
In a possible implementation, the first determining module 14 is specifically configured to:
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information;
and determining the introduction information according to the target text information, the image keywords and the image information, wherein the introduction information comprises the target text information, the image keywords and the image information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
In a possible implementation, the first determining module 14 is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
In a possible implementation manner, the second obtaining module 12 is specifically configured to:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
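The interface of such a recognition model can be illustrated with a toy nearest-neighbour predictor trained on (feature, information) pairs; a production model would be a learned image classifier (e.g. a CNN), and the feature vectors here are placeholders:

```python
def train_recognition_model(samples):
    # samples: list of (feature_vector, image_information) pairs — the
    # "groups of sample data" the model is learned from. Nearest-neighbour
    # lookup stands in for real training, purely to show the interface.
    def predict(features):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        # Return the image information of the closest sample.
        return min(samples, key=lambda s: dist(s[0], features))[1]
    return predict
```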
In a possible implementation manner, the first obtaining module 11 is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
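The progress-based selection of the first image can be sketched as follows, assuming the page is modeled as an ordered list of (kind, payload) objects, which is an illustrative representation:

```python
def next_image_to_process(page_objects, progress_index):
    # page_objects: ordered list of (kind, payload) tuples for the page.
    # progress_index: index of the object whose voice playback just ended.
    # If the next unplayed object is an image, it becomes the first image.
    if progress_index + 1 < len(page_objects):
        kind, payload = page_objects[progress_index + 1]
        if kind == "image":
            return payload
    return None
```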
In a possible implementation manner, the first obtaining module 11 is specifically configured to:
receiving a voice playing operation input by a user on the first image, wherein the voice playing operation is used for indicating voice to play the content in the first image;
and acquiring the first image according to the voice playing operation.
Fig. 7 is a schematic structural diagram of another page processing apparatus according to an embodiment of the present invention. In addition to the embodiment shown in fig. 6, referring to fig. 7, the page processing apparatus 10 further includes a second determining module 16, wherein,
the second determining module 16 is configured to, after the first determining module 14 receives a voice playing operation input by a user on the first image, acquire position information of the voice playing operation in the first image; and determine a local image in the first image according to the position information;
the first obtaining module 11 is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
The page processing apparatus provided in the embodiment of the present invention may execute the basic scheme shown in the foregoing method embodiment, and its implementation principle and beneficial effect are similar, which are not described herein again.
Fig. 8 is a schematic diagram of a hardware structure of a page processing apparatus according to an embodiment of the present invention, and as shown in fig. 8, the page processing apparatus 20 includes: at least one processor 21 and a memory 22. The processor 21 and the memory 22 are connected by a bus 23.
In a specific implementation, the at least one processor 21 executes computer-executable instructions stored in the memory 22, so that the at least one processor 21 performs the above page processing method.
For a specific implementation process of the processor 21, reference may be made to the above method embodiments, which implement similar principles and technical effects, and this embodiment is not described herein again.
In the embodiment shown in fig. 8, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The present application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the page processing method as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
The division of the units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A page processing method is characterized by comprising the following steps:
acquiring a first image to be processed in a current page;
acquiring image information of the first image, and extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
according to the image information and the text information, determining introduction information of the first image, and playing the introduction information in a voice mode;
the determining the introduction information of the first image according to the image information and the text information includes:
acquiring context information of the first image;
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information, wherein the image keyword is a vocabulary describing the first image;
determining the introduction information according to the target text information, the image keywords and the image information;
the determining of the image keyword in the context information includes:
performing semantic analysis on the context information to determine image keywords in the context information;
the acquiring a first image to be processed in a current page includes:
receiving interactive operation input by a user on the first image, wherein the interactive operation is used for indicating voice to play the content in the first image;
acquiring the first image according to the position of the interactive operation; or,
the acquiring a first image to be processed in a current page includes:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
2. The method of claim 1, wherein determining target text information in the text information based on the context information and the image information comprises:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
3. The method of claim 1 or 2, wherein the determining the introduction information according to the target text information, the image keyword and the image information comprises:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
4. The method of claim 3, wherein the padding the target text information, the image keyword, and the image information to the at least one information padding bit to obtain the introduction information comprises:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
5. The method of claim 1, wherein the obtaining image information for the first image comprises:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
6. The method of claim 1, wherein after receiving a voice playing operation input by a user on the first image, the method further comprises:
acquiring position information of the voice playing operation in the first image; determining a local image in the first image according to the position information;
the acquiring of the image information of the first image includes:
acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or,
and acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
7. A page processing device is characterized by comprising a first acquisition module, a second acquisition module, an extraction module, a first determination module and a playing module, wherein,
the first acquisition module is used for acquiring a first image to be processed in a current page;
the second acquisition module is used for acquiring the image information of the first image;
the extraction module is used for extracting text information from the first image, wherein the image information comprises an object type of an object displayed by the first image;
the first determining module is used for determining introduction information of the first image according to the image information and the text information;
the playing module is used for playing the introduction information in a voice mode;
the first determining module is specifically configured to:
acquiring context information of the first image;
determining target text information in the text information according to the context information and the image information;
determining an image keyword in the context information, wherein the image keyword is a vocabulary describing the first image;
determining the introduction information according to the target text information, the image keywords and the image information;
the first determination module is further to:
performing semantic analysis on the context information to determine image keywords in the context information;
the first obtaining module is specifically configured to:
receiving interactive operation input by a user on the first image, wherein the interactive operation is used for indicating voice to play the content in the first image;
acquiring the first image according to the position of the interactive operation; or,
the first obtaining module is specifically configured to:
acquiring a voice playing progress of a current page;
and if the next processing object corresponding to the voice playing progress is an image, determining the next processing object corresponding to the voice playing progress in the current page as the first image.
8. The apparatus of claim 7, wherein the first determining module is specifically configured to:
acquiring the matching degree of each entry in the text information with the context information and the image information;
and determining the entry in the text information, wherein the matching degree of the entry in the text information with the context information and the image information is greater than or equal to a preset threshold value as the entry in the target text information.
9. The apparatus of claim 7 or 8, wherein the first determining module is specifically configured to:
acquiring an image introduction template, wherein the image introduction template comprises fixed information and at least one information filling bit;
and filling the target text information, the image keywords and the image information to the at least one information filling position to obtain the introduction information.
10. The apparatus of claim 8, wherein the first determining module is specifically configured to:
acquiring an information type corresponding to each information filling bit, an information type of the target text information and an information type of the image keyword;
respectively determining the target text information, the image keywords and the information filling positions corresponding to the image information according to the information type corresponding to each information filling position, the information type of the target text information and the information type of the image keywords;
and filling the target text information, the image keywords and the image information to corresponding information filling positions respectively to obtain the introduction information.
11. The apparatus of claim 7, wherein the second obtaining module is specifically configured to:
the image information of the first image is obtained through an identification model, wherein the identification model is obtained by learning multiple groups of sample data, and each group of sample data comprises a sample image and corresponding sample image information in the sample image.
12. The apparatus of claim 7, further comprising a second determination module, wherein,
the second determining module is configured to, after the first determining module receives a voice playing operation input by a user on the first image, acquire position information of the voice playing operation in the first image; and determine a local image in the first image according to the position information;
the first obtaining module is specifically configured to: acquiring the image information according to the first image and the local image, wherein the image information comprises the object type and the local object type of the local object displayed by the local image; or acquiring the image information according to the local image, wherein the image information comprises the local object category of the local object displayed by the local image.
13. A page processing apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the memory-stored computer-executable instructions causes the at least one processor to perform the page processing method of any of claims 1 to 6.
14. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the page processing method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910591159.8A CN110489674B (en) | 2019-07-02 | 2019-07-02 | Page processing method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910591159.8A CN110489674B (en) | 2019-07-02 | 2019-07-02 | Page processing method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110489674A CN110489674A (en) | 2019-11-22 |
CN110489674B true CN110489674B (en) | 2020-11-06 |
Family
ID=68546650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910591159.8A Active CN110489674B (en) | 2019-07-02 | 2019-07-02 | Page processing method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489674B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111062207B (en) * | 2019-12-03 | 2023-01-24 | 腾讯科技(深圳)有限公司 | Expression image processing method and device, computer storage medium and electronic equipment |
CN113132781B (en) * | 2019-12-31 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Video generation method and apparatus, electronic device, and computer-readable storage medium |
CN113112984A (en) * | 2020-01-13 | 2021-07-13 | 百度在线网络技术(北京)有限公司 | Control method, device and equipment of intelligent sound box and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104615440A (en) * | 2015-02-13 | 2015-05-13 | 联想(北京)有限公司 | Information processing method and electronic device |
CN105512220A (en) * | 2015-11-30 | 2016-04-20 | 小米科技有限责任公司 | Image page output method and device |
CN108182432A (en) * | 2017-12-28 | 2018-06-19 | 北京百度网讯科技有限公司 | Information processing method and device |
CN108538300A (en) * | 2018-02-27 | 2018-09-14 | 科大讯飞股份有限公司 | Sound control method and device, storage medium, electronic equipment |
CN109348254A (en) * | 2018-09-30 | 2019-02-15 | 武汉斗鱼网络科技有限公司 | Information push method, device, computer equipment and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7437368B1 (en) * | 2005-07-05 | 2008-10-14 | Chitika, Inc. | Method and system for interactive product merchandizing |
JP2008191879A (en) * | 2007-02-02 | 2008-08-21 | Sharp Corp | Information display device, display method for information display device, information display program, and recording medium with information display program recorded |
US8346623B2 (en) * | 2010-08-06 | 2013-01-01 | Cbs Interactive Inc. | System and method for navigating a collection of editorial content |
US8579187B2 (en) * | 2012-02-28 | 2013-11-12 | Ebay Inc. | System and method to identify machine-readable codes |
CN105824819A (en) * | 2015-01-05 | 2016-08-03 | 阿里巴巴集团控股有限公司 | Image loading method, device and electronic device |
CN106169065B (en) * | 2016-06-30 | 2019-12-24 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN108495185B (en) * | 2018-03-14 | 2021-04-16 | 北京奇艺世纪科技有限公司 | Video title generation method and device |
CN109167939B (en) * | 2018-08-08 | 2021-07-16 | 成都西纬科技有限公司 | Automatic text collocation method and device and computer storage medium |
- 2019-07-02: Application CN201910591159.8A filed in China (CN); granted as patent CN110489674B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN110489674A (en) | 2019-11-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110020437B (en) | Emotion analysis and visualization method combining video and barrage | |
CN110489674B (en) | Page processing method, device and equipment | |
CN107958030B (en) | Video cover recommendation model optimization method and device | |
US9613268B2 (en) | Processing of images during assessment of suitability of books for conversion to audio format | |
US20170004374A1 (en) | Methods and systems for detecting and recognizing text from images | |
CN109558513B (en) | Content recommendation method, device, terminal and storage medium | |
CN111241340B (en) | Video tag determining method, device, terminal and storage medium | |
CN110334292B (en) | Page processing method, device and equipment | |
CN107608618B (en) | Interaction method and device for wearable equipment and wearable equipment | |
CN114401431B (en) | Virtual person explanation video generation method and related device | |
CN106571072A (en) | Method for realizing children education card based on AR | |
CN111178056A (en) | Deep learning based file generation method and device and electronic equipment | |
CN111737961B (en) | Method and device for generating story, computer equipment and medium | |
CN111090817A (en) | Method for displaying book extension information, electronic equipment and computer storage medium | |
CN114390220A (en) | Animation video generation method and related device | |
CN108847066A (en) | A kind of content of courses reminding method, device, server and storage medium | |
CN110187816B (en) | Automatic page turning method for cartoon type electronic book, computing device and storage medium | |
CN111542817A (en) | Information processing device, video search method, generation method, and program | |
CN115393865A (en) | Character retrieval method, character retrieval equipment and computer-readable storage medium | |
CN112114770A (en) | Interface guiding method, device and equipment based on voice interaction | |
CN111582281B (en) | Picture display optimization method and device, electronic equipment and storage medium | |
CN110428668B (en) | Data extraction method and device, computer system and readable storage medium | |
CN110970030A (en) | Voice recognition conversion method and system | |
CN114691853A (en) | Sentence recommendation method, device and equipment and computer readable storage medium | |
CN113673414B (en) | Bullet screen generation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||